The Real Problem
- Browser view: 3,000–8,000 words, full UI
- Raw HTML response: 4–12 KB, mostly <script> tags
- AI crawler view: ~0 usable content
We see this constantly on React, Vite, and Lovable builds. The browser shows a fully interactive page with thousands of words of content. The raw HTML response — what AI crawlers actually receive — contains almost nothing.
This is not a minor SEO issue. This is missing HTML.
If your <body> doesn't contain real text on first response, AI systems ignore it. Not "eventually index it." Not "figure it out later." Ignore it.
What's Actually Happening
AI crawlers behave like fast HTTP clients, not browsers. They don't open Chrome. They don't wait for your React app to hydrate. They don't call your APIs.
Here's what they actually do:
1. Fetch HTML: a single HTTP GET request
2. Extract visible text + structure: headings, paragraphs, lists, links
3. Convert to internal format: often Markdown-like for downstream processing
4. Store embeddings: for retrieval and citation in AI responses
They do not wait for hydration. They do not run your React app. They do not call your APIs.
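As a rough illustration of the "extract visible text" step, here is a minimal sketch that assumes the raw HTML has already been fetched as a string. It uses Python's standard `html.parser`; the class and function names are hypothetical and are not taken from any real crawler.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects the text a non-rendering client can read from raw HTML."""

    # Text inside these elements is never displayed to a reader.
    SKIPPED = {"script", "style", "template"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that sits outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the visible text of an HTML document without executing any JavaScript."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def visible_word_count(html):
    return len(visible_text(html).split())
```

Run against a typical SPA shell (a `<div id="root">` plus script tags), this returns little more than the page title, which is the gap described above.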
Concrete Example
Request your page with curl:
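curl -s https://yourdomain.com | wc -c
# HTML size: ~7 KB (wc -c reports bytes)
# Join lines first: grep matches one line at a time, and <body> usually spans many lines
curl -s https://yourdomain.com | tr '\n' ' ' | grep -oP '<body[^>]*>\K.*(?=</body>)' | sed 's/<[^>]*>/ /g' | wc -w
# <body> text: ~20 words (rough count: tags are stripped, inline script contents still count)
That's the entire page to an AI crawler. After hydration, your browser shows 5,000+ words. The crawler never sees any of it.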
💡 This is the same fundamental gap we cover in Why Google Can't See Your SPA — but AI crawlers are even less forgiving because they almost never attempt JavaScript execution.
What Most Guides Get Wrong
Most SEO content still assumes things that are flatly wrong for AI crawlers:
- "Bots execute JavaScript" → AI crawlers almost never do
- "Rendering eventually happens" → AI crawlers have near-zero delay tolerance
- "Content gets picked up later" → if it's not in the first HTML response, it doesn't exist
If you're reading a guide that says "Google renders JavaScript" and assumes the same applies to ChatGPT, Claude, or Perplexity — it's wrong. These systems optimize for fast extraction, not full rendering.
What Breaks in Production
These are not rare edge cases. We see these failures on production sites every week. They're standard failure patterns for JavaScript apps.
Script Shell Pages
- HTML: 5–15 KB, almost entirely <script> tags
- Visible text: under 50 words
- This is the exact pattern Guard flags as script_shell_only
The AI crawler receives a page that is functionally empty. Zero indexable content.
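A script shell can be spotted from the raw HTML alone. The sketch below is one way to approximate it, not how Guard implements `script_shell_only`: the 50-word limit comes from the pattern above, while the 0.5 script-ratio threshold and all function names are assumptions chosen for illustration.

```python
import re

SCRIPT_RE = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL)
STYLE_RE = re.compile(r"<style\b[^>]*>.*?</style\s*>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def shell_metrics(html):
    """Measure total size, share of bytes inside <script> elements, and visible words."""
    total_bytes = len(html.encode("utf-8"))
    script_bytes = sum(len(m.group(0).encode("utf-8")) for m in SCRIPT_RE.finditer(html))
    # Crude text extraction: drop scripts and styles, then strip remaining tags.
    text = TAG_RE.sub(" ", STYLE_RE.sub(" ", SCRIPT_RE.sub(" ", html)))
    return {
        "total_bytes": total_bytes,
        "script_ratio": script_bytes / total_bytes if total_bytes else 0.0,
        "visible_words": len(text.split()),
    }


def looks_like_script_shell(html, max_words=50, min_script_ratio=0.5):
    """Assumed heuristic: very little visible text and mostly script markup."""
    metrics = shell_metrics(html)
    return metrics["visible_words"] < max_words and metrics["script_ratio"] >= min_script_ratio
```

Regex-based stripping is a shortcut that is good enough for a size-and-ratio check; a real parser is the safer choice for anything that needs exact text.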
Partial Hydration
- Header renders server-side → visible
- Main content injected via JS → invisible to crawlers
- Page looks "fine" to humans: <h1> present but body text missing
The crawler captures an incomplete page. Your heading says "Pricing" but there's no pricing content.
Broken Deep Links
- /pricing, /features, /docs all return the same shell HTML
- Content loaded via client-side router, never present in initial response
- Crawler sees: no pricing content, no product info, no links
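One way to check for this pattern is to fetch each route's raw HTML and see whether the responses are identical. The sketch below assumes the HTML has already been collected into a dictionary keyed by route; the function name is hypothetical.

```python
import hashlib
from collections import defaultdict


def group_identical_responses(html_by_route):
    """Group routes whose raw HTML is byte-for-byte identical."""
    groups = defaultdict(list)
    for route, html in html_by_route.items():
        digest = hashlib.sha256(html.encode("utf-8")).hexdigest()
        groups[digest].append(route)
    # Only groups with more than one route indicate a shared shell.
    return [sorted(routes) for routes in groups.values() if len(routes) > 1]
```

Exact matching is a conservative signal. Shells that differ only by a nonce or a per-route `<title>` will not be grouped, so comparing visible text per route is a reasonable follow-up.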
JS Bundle Failure
- One script fails (network timeout or CDN issue)
- Browser retries → user sees the page eventually
- Crawler gets broken render → zero content
Guard flags this as critical_bundle_failure. The page is effectively dead.
CDN / Bot Blocking
- Cloudflare or other CDN returns 403 for non-browser user agents
- Crawler never reaches your origin server
- Result: zero crawlable content, zero visibility
This is surprisingly common. Your CDN's bot protection is actively blocking the systems you want to be visible to.
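A quick way to test for this is to request the same URL with different User-Agent headers and compare status codes. The sketch below uses Python's standard `urllib`. The agent strings are simplified tokens chosen for illustration (real crawlers send longer strings), and treating only 403 as "blocked" is an assumption based on the case above.

```python
import urllib.error
import urllib.request

# Simplified User-Agent tokens; real crawlers send longer strings.
USER_AGENTS = {
    "browser": "Mozilla/5.0",
    "Googlebot": "Googlebot",
    "GPTBot": "GPTBot",
}

BLOCKED_STATUSES = {403}  # extend if your CDN refuses bots with other codes


def fetch_status(url, user_agent, timeout=10.0):
    """Return the HTTP status code the server gives to this User-Agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses raise HTTPError; the code is what we want.
        return error.code


def blocked_agents(status_by_agent, baseline="browser"):
    """List agents that were refused while the baseline agent got a 200."""
    if status_by_agent.get(baseline) != 200:
        return []  # the page is failing for everyone, not just bots
    return sorted(
        agent
        for agent, status in status_by_agent.items()
        if agent != baseline and status in BLOCKED_STATUSES
    )
```

Typical usage would be `blocked_agents({name: fetch_status(url, ua) for name, ua in USER_AGENTS.items()})`. Some CDNs identify bots by IP range rather than User-Agent, so a clean result here does not prove real crawlers get through.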
The result in every case: HTML under 10 KB, visible text under 100 words, zero internal links. That page is effectively dead to AI.
How AI Crawlers Differ from Search Engines
| Behavior | Googlebot | AI Crawlers |
|---|---|---|
| JS execution | Sometimes (queued) | Almost never |
| Render delay tolerance | Seconds to minutes | Near zero |
| HTML dependency | Medium | Absolute |
| Output | Search index | Structured summaries & embeddings |
| Retry behavior | Will revisit | Usually one-shot |
The key difference: AI crawlers optimize for fast extraction, not full rendering. If your content isn't in the initial HTML, it doesn't exist in their pipeline. For a deeper look at how different bots behave, see our Bots Guide.
What Content Formats Actually Work
AI systems consistently extract content from these formats. Everything else degrades.
1. Real HTML Text
- 500–1,000+ words in <body>
- Semantic tags: <h1>, <p>, <ul>
- Content present in HTML, not injected via JavaScript
2. Clean Structure
- Headings properly nested (H1 → H2 → H3)
- Lists instead of div soup
- Links visible in HTML (not generated by JS event handlers)
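Heading nesting is easy to verify mechanically. The following is a small sketch using Python's standard `html.parser`; the names are hypothetical, and "a skip" is defined here as a heading that jumps more than one level deeper than the one before it.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Records the level of every <h1>..<h6> in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_skips(html):
    """Return (previous, current) level pairs where nesting skips a level."""
    collector = HeadingCollector()
    collector.feed(html)
    levels = collector.levels
    return [(a, b) for a, b in zip(levels, levels[1:]) if b > a + 1]
```

An empty result means every heading is at most one level deeper than its predecessor, for example H1 → H2 → H3.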
3. Markdown-Friendly Content
Internally, most AI pipelines convert HTML → Markdown before processing. If your HTML relies on JavaScript, uses dynamic rendering, or lacks structure — it degrades heavily during this conversion.
This is exactly why DataJelly generates AI Markdown snapshots — clean, structured Markdown served directly to AI crawlers, reducing token usage by up to 91% while preserving content hierarchy.
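To make the HTML → Markdown step concrete, here is a deliberately small sketch of such a conversion. It is not DataJelly's implementation or any AI vendor's pipeline: it handles only `<h1>` to `<h3>`, `<p>`, and `<li>`, and every name in it is hypothetical.

```python
from html.parser import HTMLParser


class MarkdownSketch(HTMLParser):
    """Converts a handful of semantic tags into Markdown-like lines."""

    BLOCK_PREFIX = {"h1": "# ", "h2": "## ", "h3": "### ", "p": "", "li": "- "}
    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.lines = []
        self._current = None  # tag of the block currently being collected
        self._buffer = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag in self.BLOCK_PREFIX:
            self._flush()
            self._current = tag

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1
        elif tag == self._current:
            self._flush()

    def handle_data(self, data):
        # Text outside a recognized block tag is dropped by this sketch.
        if self._skip_depth == 0 and self._current is not None:
            self._buffer.append(data)

    def _flush(self):
        text = " ".join("".join(self._buffer).split())
        if self._current is not None and text:
            self.lines.append(self.BLOCK_PREFIX[self._current] + text)
        self._current = None
        self._buffer = []


def html_to_markdown(html):
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()
    parser._flush()  # emit a trailing block that was never closed
    return "\n".join(parser.lines)
```

Semantic HTML comes through with its hierarchy intact, while a JavaScript shell converts to an empty string. Real converters are more forgiving about text in unstructured `<div>` elements than this sketch, but the general effect is the same: less structure in, less structure out.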
Solutions Compared
Prerendering
Works if:
- Under 100 routes
- Content rarely changes
Breaks when:
- Dynamic pages
- Stale builds
- Route explosion
SSR
Works if:
- Server always returns full HTML
- Hydration doesn't break
Breaks when:
- Slow backend
- Partial renders
- Caching inconsistencies
Edge Rendering
What actually works:
- Fully rendered HTML at request time
- Structured Markdown for AI
- Zero app changes required
- No hydration dependency
This is exactly what DataJelly's edge proxy + snapshot system does:
- HTML snapshots for search bots — fully rendered, real content
- AI Markdown for AI crawlers — structured, token-efficient, citation-ready
- Zero reliance on client-side rendering
For a deeper comparison, read Prerender vs SSR vs Edge Rendering.
Practical Checklist
Run these against your site. If any fail, AI crawlers are seeing a broken page.
1. Raw HTML size
curl your page and check total size
2. Text density
Check word count in <body>
3. Script ratio
Check what percentage of HTML is <script> tags
4. Deep link test
Test /pricing, /features, /docs individually
5. Bot simulation
Remove browser headers and request your page
Want to automate this? The HTTP Debug Tool runs these checks for you.
Quick Test: What Do Bots Actually See?
Most people guess. Don't.
Run this test and look at the actual response your site returns to bots.
Fetch your page as Googlebot
Use your terminal:
curl -A "Googlebot" https://yourdomain.com
Look for:
- Real visible text (not just <div id="root">)
- Meaningful content in the HTML
- Page size (should not be tiny)
Compare bot vs browser
Now test what a real browser gets:
curl -A "Mozilla/5.0" https://yourdomain.com
If these responses are different, Google is indexing a different page than your users see.
Stop guessing — measure it.
Real example: 253 words vs 13,547
We see this constantly. Here's a real example from production: Googlebot saw 253 words and 2 KB of HTML. A browser saw 13,547 words and 77.5 KB. Same URL — completely different content.

If your HTML doesn't contain the content, Google doesn't either.
Compare Googlebot vs browser on your site → HTTP Debug Tool
Check for common failure signals
We see this all the time in production:
- HTML under ~1 KB → usually empty shell
- Visible text under ~200 characters → thin or missing content
- Missing <title> or <h1> → weak or broken page
- Large difference between bot vs browser HTML → rendering issue
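These signals can be checked in a few lines once you have the raw HTML. The sketch below uses the thresholds from the list above (1 KB, 200 characters). The 2× size ratio for a "large difference" and the signal names are assumptions made for illustration, not values from any tool.

```python
import re

TITLE_RE = re.compile(r"<title\b[^>]*>\s*[^<\s]", re.IGNORECASE)  # non-empty <title>
H1_RE = re.compile(r"<h1\b[^>]*>", re.IGNORECASE)
NON_TEXT_RE = re.compile(r"<(script|style)\b[^>]*>.*?</\1\s*>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def visible_chars(html):
    """Approximate visible text length: drop scripts, styles, and tags."""
    text = TAG_RE.sub(" ", NON_TEXT_RE.sub(" ", html))
    return len(" ".join(text.split()))


def failure_signals(bot_html, browser_html=None, max_ratio=2.0):
    """Return the names of the failure signals present in the bot response."""
    signals = []
    if len(bot_html.encode("utf-8")) < 1024:
        signals.append("html_under_1kb")
    if visible_chars(bot_html) < 200:
        signals.append("thin_visible_text")
    if not TITLE_RE.search(bot_html):
        signals.append("missing_title")
    if not H1_RE.search(bot_html):
        signals.append("missing_h1")
    if browser_html is not None:
        # Assumed threshold: browser HTML more than max_ratio times larger.
        if len(browser_html) / max(len(bot_html), 1) > max_ratio:
            signals.append("bot_browser_mismatch")
    return signals
```

Pass the fully rendered HTML as `browser_html` when you have it (for example, saved from a headless browser); without it, only the first four checks run.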
Use the DataJelly Visibility Test (Recommended)
You can run this without touching curl. It shows you:
- Raw HTML returned to bots (Googlebot, Bing, GPTBot, etc.)
- Fully rendered browser version
- Side-by-side differences in word count, HTML size, links, and content
What this test tells you (no guessing)
After running this, you'll know:
- Whether your HTML is actually indexable
- Whether bots are seeing partial content
- Whether rendering is breaking in production
This is the difference between "I think SEO is set up" and "I know what Google is indexing."
If you don't understand why this happens, read: Why Google Can't See Your SPA
If this test fails
You have three real options:
- SSR: works if you can keep it stable in production
- Prerendering: breaks with dynamic content and scale
- Edge Rendering: reflects real production output without app changes
If you do nothing, you will not rank consistently.
Learn how Edge Rendering works →
This issue doesn't show up in Lighthouse. It shows up in rankings.
The Bottom Line
AI crawlers don't "figure it out later." They read exactly what you return in the first HTML response.
If your page is under 10 KB, under 100 words, and script-heavy — it does not exist to AI.
The fix is not tweaking SEO metadata. The fix is: return real HTML, return structured content, and stop depending on client-side rendering to do the heavy lifting.