How to Debug SEO Issues in a React App
You ship a React app. It returns 200 OK. It works in Chrome. Google doesn't index it. We see this constantly: the HTML response is 2–4KB, contains a root div and scripts, and zero usable text. From a crawler's perspective, the page is empty. This is not an SEO tweak problem. It's a rendering failure.
What's Actually Happening
Most React apps ship a client-rendered shell. The server response is ~2–5KB of HTML containing a root <div id="root"> and a script bundle reference. Content only exists after JS executes in the browser.
```html
<!DOCTYPE html>
<html>
  <head>
    <title>My App</title>
  </head>
  <body>
    <div id="root"></div>
    <script src="/assets/index-9f3a1.js"></script>
  </body>
</html>
```

Bots don't dependably execute your JS. Even when they do, it's delayed and inconsistent. What actually gets indexed is the initial HTML — and whatever text exists at response time.
If that HTML has:

- <200 characters of visible text
- No `<h1>`
- No internal links

…it is treated as empty. No text, no links, no structure → no indexing. Companion read: Why Google Can't See Your SPA.
Step 1: Inspect Raw HTML (Not the Rendered DOM)
The most common debugging mistake: opening DevTools and inspecting the Elements panel. That shows the post-JS DOM. Bots don't see that. They see the raw response.
Run:

```bash
curl -A "Googlebot" https://yoursite.com -o page.html
wc -c page.html                  # byte count
grep -o '<h1' page.html | wc -l  # h1 count
```

Or in browser: View Source (Cmd/Ctrl+U) — not Inspect Element.
Hard numbers to look for:

- HTML size < 5KB → failure
- Visible text < 200 chars → failure (measured below)
- No `<h1>` or paragraphs → failure
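The visible-text number is the one these commands don't give you. A rough way to approximate it from the saved response (a tag-stripping sketch, not a real DOM text extraction; it still counts inline script bodies, so treat it as an upper bound):

```bash
# Strip tags, collapse whitespace, count characters. Crude on purpose:
# under ~200 chars is a failure no matter how you measure it.
sed 's/<[^>]*>//g' page.html | tr -s '[:space:]' ' ' | wc -c
```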
If you don't see real content in raw HTML, stop. That's the root cause. Everything else is downstream of this.
Step 2: Measure Content (Real Thresholds)
Stop guessing. Use real numbers. These are the production thresholds we apply to every site we audit:
| Metric | Healthy | Suspicious | Broken |
|---|---|---|---|
| html_bytes | 20KB+ | 5–15KB | < 5KB |
| visible_text_length | 1,500+ chars | 200–800 chars | < 200 chars |
| word_count | 300+ | 50–200 | < 50 |
| internal_links | 10+ | 1–5 | 0 |
What we see in production:
- <200 chars → indexed as blank
- <1KB HTML → guaranteed no indexing
- 0 internal links → no crawl discovery
The Page Validator applies these exact thresholds for you and flags pages as blank_page or script_shell_only.
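If you want the same thresholds as a script rather than a tool, here is a minimal shell sketch using the cutoffs from the table. The text extraction is the same rough tag-strip as in Step 1, and the link count includes external links too:

```bash
#!/usr/bin/env bash
# Rough threshold check against the table above. curl plus text munging;
# treat the numbers as approximations, not a crawler's view.
URL="${1:-https://yoursite.com}"

html=$(curl -sA "Googlebot" "$URL")
text=$(printf '%s' "$html" | sed 's/<[^>]*>//g' | tr -s '[:space:]' ' ')
bytes=${#html}
chars=${#text}
words=$(printf '%s' "$text" | wc -w)
links=$(printf '%s' "$html" | grep -oE '<a [^>]*href' | wc -l)

echo "html_bytes=$bytes visible_text=$chars words=$words links=$links"
[ "$bytes" -lt 5000 ] && echo "BROKEN: html_bytes < 5KB"
[ "$chars" -lt 200 ]  && echo "BROKEN: visible_text < 200 chars"
[ "$words" -lt 50 ]   && echo "BROKEN: word_count < 50"
[ "$links" -eq 0 ]    && echo "BROKEN: no links in raw HTML"
```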
Step 3: Identify Script Shell Pages
Script shells have a clear pattern. If your raw HTML matches this, you have one:
```text
HTML size:      ~3KB
<script> tags:  15–40
<h1>:           missing
<p>:            missing
visible text:   <100 chars
root element:   <div id="root"></div>
```

The page exists. The shell loads. The content is locked behind JS execution that bots don't run. We have a full breakdown here: Script Shell Pages: When Your App Loads But Nothing Works.
Quick sanity check (Chrome):
- DevTools → Settings → Disable JavaScript
- Hard refresh the page
- If the page is blank → bots see the same blank page
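The same fingerprint from the terminal, run against the page.html saved in Step 1 (occurrence counts via grep -o, matching the commands above):

```bash
grep -o '<script' page.html | wc -l    # 15–40 on a typical shell
grep -o '<h1' page.html | wc -l        # 0 on a shell
grep -o '<p>' page.html | wc -l        # 0 on a shell
grep -o 'id="root"' page.html | wc -l  # 1: the empty mount point
```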
Step 4: Check Internal Links and Structure
Indexing isn't enough. Bots also need to discover your other pages. That happens via internal links in the raw HTML.
```bash
# Count <a href> tags in the raw response
curl -sA "Googlebot" https://yoursite.com | grep -oE '<a [^>]*href' | wc -l
```

If this returns 0, your homepage links nowhere. Bots crawl one page and leave. We see this constantly on React Router apps where navigation is rendered by JS.
Also verify (quick greps below):

- `<title>` is unique per route — not a static "My App" everywhere
- `<meta name="description">` exists and matches the page
- Open Graph tags are present in HTML, not injected by JS
- JSON-LD structured data is in the response, not appended after render
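All four can be spot-checked in the saved response with grep. These only confirm the tags exist in the HTML, not that their content is any good:

```bash
grep -o '<title>[^<]*</title>' page.html         # generic, or per-route?
grep -o 'name="description"' page.html | wc -l   # 0 = no meta description
grep -o 'property="og:' page.html | wc -l        # 0 = no Open Graph tags
grep -o 'application/ld+json' page.html | wc -l  # 0 = no JSON-LD in response
```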
Step 5: Compare Bot vs Browser
The single most useful test: fetch the page with a bot user agent, then with a browser user agent, and diff the responses. If they're meaningfully different, Google is indexing a different page than your users see.
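A minimal version of that comparison. Note that curl only varies the User-Agent header, so this catches server-side differences (UA sniffing, prerender proxies), not anything the client JS would add:

```bash
curl -sA "Googlebot" https://yoursite.com -o bot.html
curl -sA "Mozilla/5.0" https://yoursite.com -o browser.html

wc -c bot.html browser.html    # sizes should be in the same ballpark
diff -q bot.html browser.html  # "differ" can be noise (tokens, timestamps),
                               # but a large size gap is the real signal
```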
Quick Test: What Do Bots Actually See?
Most people guess. Don't.
Run this test and look at the actual response your site returns to bots.
Fetch your page as Googlebot
Use your terminal:
curl -A "Googlebot" https://yourdomain.comLook for:
- Real visible text (not just
<div id="root">) - Meaningful content in the HTML
- Page size (should not be tiny)
Compare bot vs browser
Now test what a real browser gets:
curl -A "Mozilla/5.0" https://yourdomain.comIf these responses are different, Google is indexing a different page than your users see.
Stop guessing — measure it.
Real example: 253 words vs 13,547
We see this constantly. Here's a real example from production: Googlebot saw 253 words and 2 KB of HTML. A browser saw 13,547 words and 77.5 KB. Same URL — completely different content.
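To reproduce that measurement, the bot side is plain curl; the rendered side needs a real browser, because curl never executes JS. A rough sketch assuming Chrome is installed (the binary may be chrome, chromium, or google-chrome depending on your system):

```bash
# Bot-side word count: raw HTML, no JS execution.
curl -sA "Googlebot" https://yourdomain.com | sed 's/<[^>]*>//g' | wc -w

# Rendered word count: headless Chrome runs the JS, then dumps the DOM.
google-chrome --headless --dump-dom https://yourdomain.com 2>/dev/null \
  | sed 's/<[^>]*>//g' | wc -w
```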

If your HTML doesn't contain the content, Google doesn't either.
Compare Googlebot vs browser on your site → HTTP Debug Tool

Check for common failure signals
We see this all the time in production:
- HTML under ~1KB → usually empty shell
- Visible text under ~200 characters → thin or missing content
- Missing <title> or <h1> → weak or broken page
- Large difference between bot vs browser HTML → rendering issue
Use the DataJelly Visibility Test (Recommended)
You can run this without touching curl. It shows you:
- Raw HTML returned to bots (Googlebot, Bing, GPTBot, etc.)
- Fully rendered browser version
- Side-by-side differences in word count, HTML size, links, and content
What this test tells you (no guessing)
After running this, you'll know:
- Whether your HTML is actually indexable
- Whether bots are seeing partial content
- Whether rendering is breaking in production
This is the difference between "I think SEO is set up" and "I know what Google is indexing."
If you don't understand why this happens, read: Why Google Can't See Your SPA
If this test fails
You have three real options:
- SSR: works if you can keep it stable in production
- Prerendering: breaks with dynamic content and scale
- Edge Rendering: reflects real production output without app changes
If you do nothing, you will not rank consistently. Learn how Edge Rendering works →
This issue doesn't show up in Lighthouse. It shows up in rankings.
Common Root Causes
Four patterns. We see them constantly in React, Vite, and Lovable apps.
Pure CSR (no SSR, no prerender)
Symptom: Every route returns the same shell HTML.
Fix: Serve real HTML to bots — via SSR, prerendering, or an edge proxy.
Hydration mismatch crashes
Symptom: SSR'd HTML exists but React throws during hydration. UI breaks. Console shows "Text content does not match server-rendered HTML."
Fix: Eliminate non-deterministic rendering on first paint. Read: Hydration Crashes: The Silent Killer.
API-gated content
Symptom: The app waits for /api/page-data before rendering anything. The API is slow or fails intermittently. Bots see an empty page.
Fix: Render core content from the server response, not the client API. Show fallback HTML during loading.
Meta tags injected by JS
Symptom: react-helmet or similar adds title/description after mount. Bots index the static <title> from index.html — usually generic, identical for every page.
Fix: Generate per-route HTML at build time or at the edge so meta tags are in the response.
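A hedged sketch of the build-time option, assuming a Vite-style dist/index.html and a route list known at build time. The routes and titles below are made up; real setups usually do this with the framework's HTML plugin:

```bash
# Copy the built shell once per route and stamp in a per-route <title>.
# The same sed pattern works for meta descriptions.
while IFS='|' read -r route title; do
  mkdir -p "dist${route}"
  sed "s|<title>[^<]*</title>|<title>${title}</title>|" dist/index.html \
    > "dist${route}/index.html"
done <<'ROUTES'
/pricing|Pricing · My App
/docs|Documentation · My App
ROUTES
```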
Solutions Compared: SSR vs Prerender vs Edge
Three real approaches. Each has tradeoffs.
| Approach | Works when | Breaks when |
|---|---|---|
| SSR (Next.js, Remix) | You can rewrite to a meta-framework and absorb infra cost | Higher TTFB, infra complexity, hot path scales with traffic |
| Prerendering | Routes are known upfront, content rarely changes, <100 pages | Dynamic data, growing routes, invalidation is imperfect — see Hidden Costs |
| Edge (DataJelly) | You want bots to see live HTML without rewriting your React app | No long-lived snapshot cache → no drift, no stale content |
How Edge solves React SEO
- Generates or validates HTML at request time for bots — no app rewrite
- Search bots get full HTML snapshots with real content, links, and meta tags
- AI crawlers (GPTBot, ClaudeBot, Perplexity) get clean Markdown
- Real users still get the live SPA — zero impact on UX
- Works with React, Vite, and Lovable apps out of the box
Practical Checklist
Run all eight on your live site. If even one fails, you have a measurable React SEO problem.
1. Raw HTML > 15KB on content pages. Anything under 5KB is a script shell.
2. Visible text > 500 chars in raw response. Under 200 chars = indexed as blank.
3. `<h1>` exists in raw HTML. Missing h1 = no semantic anchor for ranking.
4. Per-route `<title>` and meta description. Same title across all routes = generic indexing.
5. Internal `<a href>` links in raw HTML. 0 links = no crawl discovery beyond the homepage.
6. Page renders with JS disabled. Blank page with JS off = bots see blank too.
7. Bot vs browser HTML are similar size. Large diff = rendering inconsistency.
8. Open Graph + JSON-LD in response, not JS-injected. Late injection = social previews and rich results break.
Want this automated? The Page Validator and HTTP Bot Comparison tool run most of these for you.
React SEO problems are rendering problems.
No amount of meta-tag tuning, sitemap fiddling, or backlink work fixes a 4KB shell. If your HTML doesn't contain the content, Google doesn't either.
What DataJelly Does About This
DataJelly Edge sits in front of your existing React app. It serves complete HTML snapshots to search bots and clean Markdown to AI crawlers — without changing your application code. The goal is simple: bots see the same complete page your users see.
Works with React, Vite, and Lovable apps. No rewrite. No SSR migration. No prerender drift.