How to Test Your Site for AI Visibility (Fast)
You deploy. The site loads. No errors. Two days later: zero AI traffic, no citations, no visibility. We see this constantly — a React or Vite app returns 200 OK, the HTML is 4 KB, contains 12 words, and is mostly script tags. The browser renders fine. AI crawlers get nothing.
The Real Problem
If your initial HTML is empty, your AI visibility is zero. No partial credit. Not "mostly indexed." Not "partially extracted." Zero.
The trap is that everything looks fine. Status code 200. Lighthouse green. The page renders perfectly in your browser. Search Console eventually shows the URL as "Discovered." But ChatGPT, Claude, and Perplexity never quote you. Why? Because they read a different document than you do.
A real failing page we audited last week:
- HTML size: 4.2 KB
- Visible text: 12 words
- DOM: 1 empty <div id="root"> + 6 script tags
- Browser-rendered version: 13,547 words, 77.5 KB
Same URL. Two completely different documents. AI only ever sees the first one.
How AI Crawlers Actually Work
AI crawlers (GPTBot, ClaudeBot, PerplexityBot, Google-Extended, Bytespider, and the rest) do exactly three things, reliably:
- Request the URL
- Read the raw HTML response
- Extract text, links, and structure
They do not wait for:
- React hydration
- API calls or async data fetches
- Client-side routing or rendering
- Lazy-loaded components
- Retries on failure
So your site has two versions. The browser version is 300 KB+ of fully-painted DOM after JavaScript runs. The initial HTML is 4–10 KB, an empty root div, no meaningful text. AI systems only see the second one.
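If you want to see that gap on your own site, one rough check is to compare the byte size of the raw response with the DOM after JavaScript runs. This is a sketch, assuming curl plus a local Chrome/Chromium install (the binary name varies by platform) and yourdomain.com as a placeholder:

```bash
# Raw HTML, exactly as a non-rendering crawler receives it
curl -s https://yourdomain.com | wc -c

# DOM after JavaScript executes (requires a local Chrome/Chromium binary)
google-chrome --headless --dump-dom https://yourdomain.com | wc -c
```

On a script shell page the first number is a few KB and the second is an order of magnitude larger.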
For deeper context on why this happens, see How AI Crawlers Read Your Website and What AI Crawlers Actually Extract.
What Most Guides Get Wrong
Most SEO advice is written for Googlebot circa 2020. That advice is misleading for AI:
"Google can render JS, so SPAs are fine."
Even Google's rendering is delayed and often skipped entirely (see Why "Google Renders JavaScript" Is Misleading). AI crawlers don't render at all.
"Client-side rendering is fine."
Fine for users. Invisible to AI. CSR pages with no SSR fallback are zero-citation by default.
"Just improve content quality."
Content quality doesn't matter if the HTML response doesn't contain the content. You can't optimize what isn't there.
If the HTML doesn't contain content, AI doesn't see content. That's the whole rule.
What We See in Production
Four repeatable failure modes. We see all four every week across React, Vite, and Lovable apps.
Script shell pages (most common)
Signals: HTML < 10 KB, visible text < 50 chars, ~80% of the DOM is script tags.
Outcome: Zero AI extraction. UI renders fine in the browser, curl returns only scripts. Covered in detail in Script Shell Pages and Your HTML Is Only 4KB.
Partial deploy failures
Signals: Bundle 404s or a CDN blocks the JS chunk. HTML size unchanged (~5 KB). Console shows a bundle error. Page is visually blank.
Outcome: Users see a broken UI, AI sees a blank page, status code is still 200. See Why Your Site Randomly Breaks After Deploy.
Hydration-only content
Signals: HTML contains layout/nav only — no paragraph text, no headings. All content loads via API after mount.
Outcome: AI gets structure with no content, ignores the page. See Hydration Crashes.
Silent regressions after deploy
Signals: A page used to ship 2,500 words. After a deploy, HTML drops 120 KB → 8 KB. Visible text drops 90%.
Outcome: AI visibility disappears overnight. No alert fires unless you track it. Guard tracks this as major text drop (>40%) and major DOM drop (>50%).
Quick Test: What Do Bots Actually See?
Most people guess. Don't.
Run this test and look at the actual response your site returns to bots.
Fetch your page as Googlebot
Use your terminal:
curl -A "Googlebot" https://yourdomain.comLook for:
- Real visible text (not just
<div id="root">) - Meaningful content in the HTML
- Page size (should not be tiny)
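To put rough numbers on those checks, strip the tags and count what is left. A crude sketch (sed-based tag stripping is approximate, and inline script contents can inflate the word count slightly; the URL is a placeholder):

```bash
URL="https://yourdomain.com"   # placeholder: your page

# Total bytes in the raw HTML
curl -sA "Googlebot" "$URL" | wc -c

# Rough visible word count after stripping tags
curl -sA "Googlebot" "$URL" | sed 's/<[^>]*>/ /g' | wc -w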
Compare bot vs browser
Now test what a real browser gets:
curl -A "Mozilla/5.0" https://yourdomain.comIf these responses are different, Google is indexing a different page than your users see.
Stop guessing — measure it.
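One way to quantify the difference, again with yourdomain.com as a placeholder:

```bash
URL="https://yourdomain.com"

BOT=$(curl -sA "Googlebot" "$URL" | wc -c | tr -d ' ')
BROWSER=$(curl -sA "Mozilla/5.0" "$URL" | wc -c | tr -d ' ')

echo "Googlebot UA: $BOT bytes"
echo "Browser UA:   $BROWSER bytes"
# A large gap means the server (or a prerender layer) returns different
# documents depending on the User-Agent; either way, worth investigating.
```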
Real example: 253 words vs 13,547
We see this constantly. Here's a real example from production: Googlebot saw 253 words and 2 KB of HTML. A browser saw 13,547 words and 77.5 KB. Same URL — completely different content.

If your HTML doesn't contain the content, Google doesn't either.
Compare Googlebot vs browser on your site → HTTP Debug Tool
Check for common failure signals
We see this all the time in production:
- HTML under ~1KB → usually empty shell
- Visible text under ~200 characters → thin or missing content
- Missing <title> or <h1> → weak or broken page
- Large difference between bot vs browser HTML → rendering issue
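A quick script version of those signals, using the thresholds above. Treat it as a smoke test, not a crawler simulation; the URL is a placeholder and the tag stripping is approximate:

```bash
#!/usr/bin/env bash
# Rough smoke test for the failure signals listed above.
URL="${1:-https://yourdomain.com}"

HTML=$(curl -sA "Googlebot" "$URL")
SIZE=${#HTML}
TEXT_CHARS=$(printf '%s' "$HTML" | sed 's/<[^>]*>//g' | tr -d '[:space:]' | wc -c | tr -d ' ')

[ "$SIZE" -lt 1024 ]      && echo "WARN: HTML under ~1 KB (likely empty shell)"
[ "$TEXT_CHARS" -lt 200 ] && echo "WARN: visible text under ~200 characters"
echo "$HTML" | grep -qi '<title' || echo "WARN: missing <title>"
echo "$HTML" | grep -qi '<h1'    || echo "WARN: missing <h1>"
```

Wire a check like this into CI and it catches an empty shell before anyone notices the traffic drop.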
Use the DataJelly Visibility Test (Recommended)
You can run this without touching curl. It shows you:
- Raw HTML returned to bots (Googlebot, Bing, GPTBot, etc.)
- Fully rendered browser version
- Side-by-side differences in word count, HTML size, links, and content
What this test tells you (no guessing)
After running this, you'll know:
- Whether your HTML is actually indexable
- Whether bots are seeing partial content
- Whether rendering is breaking in production
This is the difference between "I think SEO is set up" and "I know what Google is indexing."
If you don't understand why this happens, read: Why Google Can't See Your SPA
If this test fails
You have three real options:
- SSR: works if you can keep it stable in production
- Prerendering: breaks with dynamic content and scale
- Edge Rendering: reflects real production output without app changes
If you do nothing, you will not rank consistently. Learn how Edge Rendering works →
This issue doesn't show up in Lighthouse. It shows up in rankings.
Practical Checklist (Fast Testing Workflow)
The whole workflow takes under 5 minutes per page. Run it after every deploy.
1. Fetch raw HTML as an AI bot
curl -H "User-Agent: GPTBot" https://yourdomain.comCheck immediately:
- HTML size (target: > 30 KB)
- Real paragraph text (not "Loading…" or empty divs)
- Headings, links, structured data
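For the structural part, grep gives a serviceable first pass. A sketch using crude string matches rather than a real DOM parse (counts can be slightly off on minified markup; the URL is a placeholder):

```bash
URL="https://yourdomain.com"
HTML=$(curl -sA "GPTBot" "$URL")

echo "h1/h2 tags:     $(echo "$HTML" | grep -oiE '<h[12][ >]' | wc -l | tr -d ' ')"
echo "links:          $(echo "$HTML" | grep -oi '<a ' | wc -l | tr -d ' ')"
echo "JSON-LD blocks: $(echo "$HTML" | grep -oi 'application/ld+json' | wc -l | tr -d ' ')"
```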
2. Compare to a browser response
curl -H "User-Agent: Mozilla/5.0" https://yourdomain.comIf the two responses are wildly different in size or word count, your bot version is incomplete. Run a side-by-side at /seo-tools/http-debug.
3. Inspect the actual HTML shape
A failing SPA looks like this:
```html
<!doctype html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <title>Acme — modern SaaS</title>
  </head>
  <body>
    <div id="root"></div>
    <script type="module" src="/assets/index-a3f7.js"></script>
    <script src="https://cdn.example.com/analytics.js"></script>
    <!-- ...4 more script tags... -->
  </body>
</html>
```
If your response looks like that, AI sees nothing. Check for a real <h1>, body text, and meta tags.
4. Disable JavaScript in your browser
DevTools → Command Palette → "Disable JavaScript" → reload. If the page goes blank or shows a spinner, your AI visibility is broken. Bots experience that exact view.
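If you prefer the terminal, a text-mode browser gives roughly the same no-JavaScript view, since it never executes scripts (assuming lynx is installed; any non-rendering text browser works):

```bash
# lynx does not run JavaScript, so this approximates what a non-rendering crawler extracts
lynx -dump https://yourdomain.com | head -40
```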
5. Look for script-heavy responses
Open the response. If you see multiple large script tags and no inline content, you have a script shell page. Run Page Validator to score bot-readiness automatically.
6. Track HTML size and text length over time
Diff between deploys:
- HTML size drop > 50% → broken
- Visible text drop > 40% → major issue
- Headings count drops to 0 → critical
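A minimal version of that diff, assuming a hypothetical baseline file written on each deploy; the 50% and 40% thresholds mirror the list above:

```bash
#!/usr/bin/env bash
# Compare HTML size and rough word count against the previous deploy's baseline.
URL="${1:-https://yourdomain.com}"
BASELINE=".ai-visibility-baseline"   # hypothetical file name

SIZE=$(curl -sA "GPTBot" "$URL" | wc -c | tr -d ' ')
WORDS=$(curl -sA "GPTBot" "$URL" | sed 's/<[^>]*>/ /g' | wc -w | tr -d ' ')

if [ -f "$BASELINE" ]; then
  read -r OLD_SIZE OLD_WORDS < "$BASELINE"
  [ "$SIZE"  -lt $(( OLD_SIZE / 2 ))         ] && echo "BROKEN: HTML size dropped more than 50%"
  [ "$WORDS" -lt $(( OLD_WORDS * 60 / 100 )) ] && echo "MAJOR: visible text dropped more than 40%"
fi

printf '%s %s\n' "$SIZE" "$WORDS" > "$BASELINE"
```

Run it from CI after each deploy and the baseline rolls forward automatically.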
7. Repeat after every deploy
This breaks in production when bundles fail, configs drift, or a CDN blocks an asset. If you're not checking after deploys, you're blind. Automate it with Guard.
[Screenshot placeholder: side-by-side terminal output of curl -H "User-Agent: GPTBot" vs browser fetch on the same URL]
Real Thresholds (Not Theoretical)
These map directly to production failures we see every week:
| Metric | Healthy | At-risk | Broken |
|---|---|---|---|
| HTML size | 30–200 KB | 10–30 KB | < 10 KB |
| Visible text | > 500 words | 200–500 words | < 200 chars |
| Headings (incl. H1) | ≥ 3 | 1–2 | 0 |
| Text drop vs baseline | < 10% | 10–40% | > 40% |
| HTML drop vs baseline | < 20% | 20–50% | > 50% |
Prerender vs SSR vs Edge
If your test fails, you have three real options. Most teams pick the wrong one.
Prerender
Works when: pages are static and rarely change.
Breaks when: content updates frequently or invalidation fails. We see snapshots showing two-month-old pricing. See Hidden Costs of Prerendering.
SSR (Next.js)
Works: HTML contains full content.
Costs: full app rewrite, server complexity, slower TTFB. Often a 6–12 week project.
Edge proxy (DataJelly)
Behavior: bots get fully-rendered HTML snapshots; AI crawlers get clean Markdown.
Result: 80–200 KB HTML, full content, no rewrite. Works with React, Vite, Lovable.
Full breakdown: Prerender vs SSR vs Edge Rendering.
If your content is not in the initial HTML, it does not exist for AI.
Not after hydration. Not after API calls. Only what's in the first response counts. Most modern SPAs fail this test by default.
The DataJelly Approach
DataJelly fixes this without touching your app. Edge serves fully-rendered HTML snapshots to bots and clean AI Markdown to GPTBot/ClaudeBot/PerplexityBot. Guard monitors the real signals — HTML size, visible text, DOM changes, rendering failures — so when a deploy breaks your visibility, you know in minutes, not weeks.
- Edge proxy delivers 80–200 KB rendered HTML to bots
- AI Markdown for GPTBot, ClaudeBot, PerplexityBot
- Guard tracks size/text drops across deploys
- Works with React, Vite, and Lovable SPAs — no rewrites