The Real Problem
We see this all the time: a React or Lovable app returns:
- ~1.2KB HTML
- <div id="root"></div>
- 3–6 <script> tags
It renders perfectly in Chrome. Googlebot sees an empty page. AI crawlers extract nothing.
Same HTML. Three completely different outcomes.
This is the foundational mistake. You are serving one response to three very different consumers — and assuming it works the same way for each. It doesn't.
What's Actually Happening
Your HTML goes to three consumers that process it completely differently:
| Consumer | Behavior | Expects |
|---|---|---|
| Humans (browser) | Executes JS, waits for hydration, pulls API data after load | Full interactive app |
| Search bots | Often do not execute JS, evaluate raw HTML only | 5KB–100KB HTML, 300–2000+ words, crawlable <a> links |
| AI crawlers | Ignore most DOM structure, strip scripts and UI noise | Clean text blocks, headings + paragraphs, minimal navigation clutter |
If your HTML depends on JavaScript to become "real," any consumer that does not execute JavaScript never sees it. This isn't a theoretical problem: an empty shell that needs JavaScript is the default output of a client-rendered SPA build.
What Most Guides Get Wrong
Most guides say:
- "Just use prerendering"
- "SSR fixes SEO"
- "Make sure HTML is indexable"
That misses the real issue.
The problem is not missing HTML. The problem is wrong HTML for the audience.
A prerendered page with 48KB of HTML that includes a logged-in header, region-specific pricing, and an A/B test variant doesn't help anyone. Google indexes the wrong version. Users see personalized content leaked to the wrong audience. You fixed SEO and broke your product.
Concrete Failure Examples
These are not edge cases. We see every one of these in production, regularly.
1. Dynamic Content Freezes
Prerender snapshot taken at deploy time:
- "Top products" → same 5 items forever
- "Latest posts" → never updates
- Pricing → outdated within hours
We've seen ecommerce pages indexed with out-of-stock items for weeks. The HTML was captured once and never refreshed.
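One way to keep a frozen snapshot from outliving its data is to give each page type a maximum age and re-render once it is exceeded. The sketch below is illustrative only: the page types, the limits, and the `is_stale` helper are assumptions, not part of any specific prerender tool.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical limits per page type; tune them to how fast your data changes.
MAX_AGE = {
    "pricing": timedelta(hours=1),
    "listing": timedelta(hours=6),
    "article": timedelta(days=7),
}


def is_stale(captured_at: datetime, page_type: str,
             now: Optional[datetime] = None) -> bool:
    """Return True when a snapshot is older than the limit for its page type."""
    now = now or datetime.now(timezone.utc)
    # Unknown page types fall back to the strictest limit.
    limit = MAX_AGE.get(page_type, min(MAX_AGE.values()))
    return now - captured_at > limit
```

A snapshot that fails this check should be re-rendered before it is served to a bot again, rather than waiting for the next deploy.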
2. Personalization Leaks
Real example:
- User A logs in
- Snapshot gets captured
- HTML gets cached
Now every visitor sees "Welcome back, John" and personalized dashboard links. This happens when prerender runs after auth state loads. It is not rare.
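A simple guard against this leak is to refuse to cache any snapshot that was rendered with a session attached. The following is a minimal sketch under that assumption; `is_safe_to_cache` is a hypothetical helper, and the header checks are a starting point rather than a complete policy.

```python
from typing import Mapping


def is_safe_to_cache(request_headers: Mapping[str, str],
                     response_headers: Mapping[str, str]) -> bool:
    """Allow caching only for snapshots rendered for an anonymous request."""
    req = {name.lower() for name in request_headers}
    resp = {name.lower(): value.lower() for name, value in response_headers.items()}

    # The render request carried credentials, so the HTML may be personalized.
    if req & {"cookie", "authorization"}:
        return False
    # The origin tried to start or refresh a session while rendering.
    if "set-cookie" in resp:
        return False
    # The origin explicitly marked the response as not shareable.
    cache_control = resp.get("cache-control", "")
    return "private" not in cache_control and "no-store" not in cache_control
```

Header checks do not catch auth state held in localStorage, so the snapshot browser should also start from a clean profile.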
3. You're Caching Failures
If the snapshot runs during:
- An API outage → empty sections
- A critical JS crash → partial render
- A hydration failure → missing content
That broken state gets cached and served as the permanent output. Users and bots both see the broken page until someone manually triggers a re-render.
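The usual defense is to validate a fresh snapshot before it replaces the cached one, and to keep the last good copy when validation fails. This is a sketch, not a description of any particular snapshot service: `choose_snapshot` is a hypothetical helper, and the thresholds are assumptions borrowed from the size and text limits used in the checklist later in this article.

```python
from typing import Optional

# Assumed thresholds, taken from the checklist values in this article.
MIN_HTML_BYTES = 2 * 1024
MIN_TEXT_CHARS = 200


def choose_snapshot(new_status: int, new_html: str, new_text_chars: int,
                    previous_html: Optional[str]) -> Optional[str]:
    """Return the HTML that should be cached after a render attempt."""
    healthy = (
        new_status == 200
        and len(new_html.encode("utf-8")) >= MIN_HTML_BYTES
        and new_text_chars >= MIN_TEXT_CHARS
    )
    # A failed render never overwrites the last known-good snapshot.
    return new_html if healthy else previous_html
```

With this in place, an API outage during a render costs you freshness, not the page itself.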
4. Infrastructure Errors Get Hidden
This one is dangerous. We've seen systems where:
- TLS handshake fails
- Origin returns 502
- Proxy returns fallback HTML with a 200 status
Now bots think the page is valid. Users see degraded content. Errors go completely undetected. A proper edge layer prevents this — TLS failure returns a hard 502, no silent fallbacks. If you serve one HTML to everyone, you lose this protection.
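The rule is easy to state in code: an upstream failure must surface as an error status, never as fallback HTML with a 200. The sketch below assumes a simplified model where a failed TLS handshake or connection is represented as a missing status; `UpstreamResult` and `edge_response` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class UpstreamResult:
    # None means no HTTP response at all (TLS handshake or connection failure).
    status: Optional[int]
    body: str = ""


def edge_response(upstream: UpstreamResult) -> Tuple[int, str]:
    """Map an upstream result to the status and body the edge returns."""
    if upstream.status is None:
        # No response from origin: fail loudly instead of serving a fallback.
        return 502, "Bad Gateway"
    # Origin errors pass through unchanged so bots and monitoring both see them.
    return upstream.status, upstream.body
```

There is deliberately no branch that turns a failure into a 200.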
What We See in Production
"Fixed" SEO Pages That Still Fail
The most common pattern we see:
- HTML = 2–3KB
- Text = <150 characters
- DOM = mostly <script> tags
This triggers:
- Blank page detection
- Thin content classification
- No internal link discovery
This is exactly what DataJelly Guard flags: blank pages (<1KB or low text) and script-shell-only pages.
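You can approximate this kind of check yourself with the standard library. The sketch below is our illustration, not DataJelly Guard's implementation: the `PageStats` class, the `page_flags` helper, and the exact cutoffs (1KB, 150 characters) are assumptions based on the numbers above.

```python
from html.parser import HTMLParser


class PageStats(HTMLParser):
    """Counts <script> tags and text characters outside non-visible elements."""

    SKIP = {"script", "style", "noscript", "template", "title"}

    def __init__(self):
        super().__init__()
        self.scripts = 0
        self.text_chars = 0
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.scripts += 1
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.text_chars += len(data.strip())


def page_flags(html: str) -> list:
    """Return the failure flags that apply to a raw HTML response."""
    stats = PageStats()
    stats.feed(html)
    stats.close()
    flags = []
    if len(html.encode("utf-8")) < 1024:
        flags.append("blank")
    if stats.text_chars < 150:
        flags.append("thin")
        if stats.scripts > 0:
            flags.append("script shell")
    return flags
```

Run it against the raw response body, not against the DOM you see in DevTools, which already reflects executed JavaScript.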
AI Crawlers Get Garbage
Typical HTML snapshot breakdown:
- 60KB total
- 70% scripts + navigation
- 30% real content
AI systems extract nav labels, footer junk, and partial headings. They miss your main content, key context, and structured meaning. Same HTML — wrong format entirely.
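To see how much of a snapshot is real content, strip the noise and measure what is left. This is a rough sketch: the list of noise tags and the `content_share` helper are assumptions, and real AI crawlers use their own extraction logic that we cannot see.

```python
from html.parser import HTMLParser

# Assumed set of elements that rarely carry the main content of a page.
NOISE_TAGS = {"script", "style", "noscript", "nav", "header", "footer", "aside"}


class ContentExtractor(HTMLParser):
    """Collects text that sits outside scripts, navigation, and page chrome."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._noise_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self._noise_depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self._noise_depth:
            self._noise_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._noise_depth:
            self.chunks.append(text)


def content_share(html: str):
    """Return (clean_text, share of the HTML length that is clean text)."""
    extractor = ContentExtractor()
    extractor.feed(html)
    extractor.close()
    content = "\n".join(extractor.chunks)
    share = len(content) / len(html) if html else 0.0
    return content, share
```

A low share means most of what you send is markup, scripts, and navigation that an extractor has to guess its way around.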
The Fix
The fix is not better rendering. It's serving the right output at the edge, per consumer.
You have three real options:
- SSR: Better HTML, still one format for all consumers
- Prerendering: Breaks with dynamic content and scale
- Edge Rendering: Right output per consumer, always fresh
If you do nothing, you will not rank consistently. Learn how Edge Rendering works →
Solutions Comparison
| Approach | What Happens | Failure Pattern | Verdict |
|---|---|---|---|
| Prerender | HTML generated once, served to all traffic | Stale data, broken personalization, snapshot age | Works for static blogs. Breaks anything dynamic. |
| SSR | HTML generated per request | Still one HTML for all consumers, includes UI noise + scripts | Better HTML, still wrong abstraction. |
| Edge Rendering | Detect request type, serve different outputs per consumer | — | Each system gets what it needs. |
For a detailed breakdown of each approach, read: Prerender vs SSR vs Edge Rendering.
What We Do Differently
Instead of forcing one output, we split the problem:
| Consumer | Output |
|---|---|
| Human | Full app (unchanged) |
| Search bot | Fully rendered HTML snapshot |
| AI crawler | Clean, structured Markdown |
The edge proxy detects bot vs human vs AI at the CDN level. The snapshot service generates HTML for search bots. The AI pipeline produces structured Markdown with ~91% token reduction.
This avoids all the failure modes above:
- No stale UI for users
- No empty HTML for bots
- No noisy DOM for AI
You stop trying to make one format do everything.
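The routing decision itself can be sketched in a few lines. This is a simplified illustration, not the DataJelly edge proxy: the token lists are a small, non-exhaustive sample of published crawler user agent names, and `classify_consumer` and `OUTPUT_FOR` are hypothetical.

```python
# Illustrative, non-exhaustive samples of crawler tokens found in User-Agent strings.
SEARCH_BOT_TOKENS = ("googlebot", "bingbot", "duckduckbot")
AI_CRAWLER_TOKENS = ("gptbot", "claudebot", "perplexitybot", "ccbot")

# Which output each consumer type receives, mirroring the table above.
OUTPUT_FOR = {
    "human": "app",
    "search_bot": "html_snapshot",
    "ai_crawler": "markdown",
}


def classify_consumer(user_agent: str) -> str:
    """Classify a request as human, search_bot, or ai_crawler by User-Agent."""
    ua = user_agent.lower()
    if any(token in ua for token in AI_CRAWLER_TOKENS):
        return "ai_crawler"
    if any(token in ua for token in SEARCH_BOT_TOKENS):
        return "search_bot"
    # Anything unrecognized gets the normal app, which is the safe default.
    return "human"
```

User-Agent strings are trivially spoofed, so a production setup would also verify crawler identity (for example through reverse DNS or published IP ranges) before trusting the classification.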
Quick Test: What Do Bots Actually See?
Most people guess. Don't.
Run this test and look at the actual response your site returns to bots.
Fetch your page as Googlebot
Use your terminal:
curl -A "Googlebot" https://yourdomain.com
Look for:
- Real visible text (not just <div id="root">)
- Meaningful content in the HTML
- Page size (should not be tiny)
Compare bot vs browser
Now test what a browser user agent gets:
curl -A "Mozilla/5.0" https://yourdomain.com
curl does not execute JavaScript, so both commands show the raw HTML only. If these responses are different, Google is indexing a different page than your users see.
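If you want numbers instead of eyeballing two terminal dumps, the same comparison can be scripted. The sketch below uses only the standard library; `fetch_html` and `summarize` are hypothetical helpers, the word count is a rough regex-based estimate, and `https://yourdomain.com` is a placeholder.

```python
import re
import urllib.request


def fetch_html(url: str, user_agent: str, timeout: float = 10.0) -> str:
    """Fetch the raw HTML for a URL while sending the given User-Agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def summarize(html: str) -> dict:
    """Return the HTML size in bytes and a rough count of visible words."""
    # Drop script and style blocks first, then every remaining tag.
    text = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", text)
    return {"bytes": len(html.encode("utf-8")), "words": len(text.split())}


# Example usage (performs real network requests):
#   bot = summarize(fetch_html("https://yourdomain.com", "Googlebot"))
#   browser = summarize(fetch_html("https://yourdomain.com", "Mozilla/5.0"))
#   print(bot, browser)
```

Like curl, this does not run JavaScript, so it measures what a non-rendering crawler receives.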
Stop guessing — measure it.
Real example: 253 words vs 13,547
We see this constantly. Here's a real example from production: Googlebot saw 253 words and 2 KB of HTML. A browser saw 13,547 words and 77.5 KB. Same URL — completely different content.

If your HTML doesn't contain the content, Google doesn't either.
Compare Googlebot vs browser on your site → HTTP Debug Tool
Check for common failure signals
We see this all the time in production:
- HTML under ~1KB → usually empty shell
- Visible text under ~200 characters → thin or missing content
- Missing <title> or <h1> → weak or broken page
- Large difference between bot vs browser HTML → rendering issue
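The last two signals are easy to automate. The following is a minimal sketch with hypothetical helpers: `missing_landmarks` looks for a non-empty <title> and <h1> using a simple regex (good enough for a smoke test, not a full HTML parser), and `render_gap` expresses the bot vs browser difference as a single number.

```python
import re


def missing_landmarks(html: str) -> list:
    """Return which of <title> and <h1> are absent or empty in the raw HTML."""
    missing = []
    for tag in ("title", "h1"):
        match = re.search(rf"(?is)<{tag}\b[^>]*>(.*?)</{tag}>", html)
        # Strip nested tags so <h1><span></span></h1> still counts as empty.
        inner = re.sub(r"(?s)<[^>]+>", "", match.group(1)).strip() if match else ""
        if not inner:
            missing.append(tag)
    return missing


def render_gap(bot_words: int, browser_words: int) -> float:
    """Share of browser-visible words that the bot never receives (0.0 to 1.0)."""
    if browser_words == 0:
        return 0.0
    return max(0.0, 1 - bot_words / browser_words)
```

For the production example above, `render_gap(253, 13547)` comes out at roughly 0.98: the bot is missing about 98% of the words a browser shows.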
Use the DataJelly Visibility Test (Recommended)
You can run this without touching curl. It shows you:
- Raw HTML returned to bots (Googlebot, Bing, GPTBot, etc.)
- Fully rendered browser version
- Side-by-side differences in word count, HTML size, links, and content
What this test tells you (no guessing)
After running this, you'll know:
- Whether your HTML is actually indexable
- Whether bots are seeing partial content
- Whether rendering is breaking in production
This is the difference between "I think SEO is set up" and "I know what Google is indexing."
If you don't understand why this happens, read: Why Google Can't See Your SPA
If this test fails
You have three real options:
- SSR: Works if you can keep it stable in production
- Prerendering: Breaks with dynamic content and scale
- Edge Rendering: Reflects real production output without app changes
If you do nothing, you will not rank consistently. Learn how Edge Rendering works →
This issue doesn't show up in Lighthouse. It shows up in rankings.
Practical Checklist
You can verify this in 2 minutes.
1. Check raw HTML size
- <2KB → broken
- <5KB → likely missing content
2. Check visible text
- <200 characters → fail
- <500 characters → weak
3. Disable JavaScript
If the page becomes empty, bots see nothing. This is the single most reliable test. If your content disappears, you have a rendering problem.
4. Inspect the HTML
Look for:
- <script> tags dominating the DOM
- Missing <a> links
- No meaningful text
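This inspection can be turned into two numbers: how much of the DOM is <script> tags, and how many links a crawler can actually follow. The sketch below is an illustration with hypothetical names (`TagCounter`, `inspect_html`), and treating `#` and `javascript:` hrefs as non-crawlable is our assumption.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts start tags and <a> elements that point at a real URL."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()
        self.crawlable_links = 0

    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1
        href = dict(attrs).get("href")
        # Fragment-only and javascript: links give a crawler nothing to follow.
        if tag == "a" and href and not href.startswith(("#", "javascript:")):
            self.crawlable_links += 1


def inspect_html(html: str) -> dict:
    """Return the share of elements that are scripts and the crawlable link count."""
    counter = TagCounter()
    counter.feed(html)
    counter.close()
    total = sum(counter.tags.values())
    return {
        "script_share": counter.tags["script"] / total if total else 0.0,
        "crawlable_links": counter.crawlable_links,
    }
```

A high script share combined with zero crawlable links is the signature of the empty shell described at the top of this article.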
5. Test multiple user agents
| Agent | Typical result on a broken SPA |
|---|---|
| Browser | Full page |
| Bot | Minimal HTML |
| AI crawler | Poor or no extraction |
If the responses are identical, that's your problem.
Final Takeaway
One HTML response is not "simple." It's incorrect.
You are serving a browser, a crawler, and an AI model. Each needs a different format. If your HTML is too small, too script-heavy, or too stale — you are already broken.
The fix is not better rendering. It's serving the right output at the edge, per consumer. That's what we built DataJelly Edge to do.
Related Reading
Why Script-Based Prerendering Breaks on Real Apps
The deep-dive into why build-time rendering fails in production.
Prerender vs SSR vs Edge Rendering
Side-by-side comparison of what actually works for SEO.
Why Google Can't See Your SPA
The rendering gap that breaks indexing for JavaScript apps.
React SEO Is Broken by Default
Why React apps ship with zero SEO out of the box.
SPA SEO Checklist: 10 Things to Fix
Actionable checklist for making SPAs indexable.
Why Your Sitemap Exists But Google Ignores Pages
Discovery ≠ indexing — the rendering gap behind ignored sitemaps.