DataJelly
AI Visibility
April 2026

How AI Crawlers (ChatGPT, Claude, Perplexity) Actually Read Your Website

Your site renders fine in Chrome and is invisible to AI crawlers. We see this constantly. Here's what's actually happening — and why most advice about it is wrong.

The Real Problem

  • Browser view: 3,000–8,000 words, full UI
  • Raw HTML response: 4–12 KB, mostly <script> tags
  • AI crawler view: ~0 usable content

We see this constantly on React, Vite, and Lovable builds. The browser shows a fully interactive page with thousands of words of content. The raw HTML response — what AI crawlers actually receive — contains almost nothing.

This is not a minor SEO issue. This is missing HTML.

If your <body> doesn't contain real text on first response, AI systems ignore it. Not "eventually index it." Not "figure it out later." Ignore it.

What's Actually Happening

AI crawlers behave like fast HTTP clients, not browsers. They don't open Chrome. They don't wait for your React app to hydrate. They don't call your APIs.

Here's what they actually do:

  1. Fetch HTML: a single HTTP GET request
  2. Extract visible text + structure: headings, paragraphs, lists, links
  3. Convert to internal format: often Markdown-like for downstream processing
  4. Store embeddings: for retrieval and citation in AI responses

They do not wait for hydration. They do not run your React app. They do not call your APIs.
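
Steps 2 and 3 can be illustrated with a minimal sketch in Python. It is not how any particular crawler is implemented. It assumes a pipeline that keeps only headings, paragraphs, and list items and emits Markdown-like lines. The names `MarkdownishExtractor` and `html_to_markdownish` are hypothetical, and the sketch relies only on the standard library's `html.parser`.

```python
from html.parser import HTMLParser


class MarkdownishExtractor(HTMLParser):
    """Collects headings, paragraphs, and list items as Markdown-like lines."""

    PREFIXES = {"h1": "# ", "h2": "## ", "h3": "### ", "p": "", "li": "- "}
    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.lines = []       # finished Markdown-like lines
        self._tag = None      # block tag currently being collected
        self._buffer = []     # text fragments for the current block
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag in self.PREFIXES:
            self._flush()
            self._tag = tag

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1
        elif tag == self._tag:
            self._flush()

    def handle_data(self, data):
        # Text outside a tracked block (or inside a script) is ignored.
        if self._tag and not self._skip_depth:
            self._buffer.append(data)

    def _flush(self):
        text = " ".join("".join(self._buffer).split())
        if self._tag and text:
            self.lines.append(self.PREFIXES[self._tag] + text)
        self._tag, self._buffer = None, []


def html_to_markdownish(html: str) -> str:
    """Convert raw HTML to Markdown-like text without running any JavaScript."""
    parser = MarkdownishExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()
    return "\n".join(parser.lines)
```

Fed a typical SPA shell (an empty `<div id="root"></div>` plus a script tag), this returns an empty string. Nothing in the sketch executes scripts, so content that only appears after hydration never reaches the output.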

Concrete Example

Request your page with curl:

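# Total size of the raw HTML response, in bytes
curl -s https://yourdomain.com | wc -c
# example: about 7,000 bytes (7 KB)

# Rough visible word count: drop <script> and <style> blocks, strip the remaining tags, count words
curl -s https://yourdomain.com | perl -0777 -pe 's/<(script|style)\b.*?<\/\1>/ /gis; s/<[^>]+>/ /g' | wc -w
# example: ~20 words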

That's the entire page to an AI crawler. After hydration, your browser shows 5,000+ words. The crawler never sees any of it.
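
The same two measurements can be taken in Python when you want them inside a script rather than a shell pipeline. This is a rough sketch under stated assumptions: `measure` and `VisibleText` are hypothetical names, the function expects raw HTML you have already fetched, and the word count is approximate because text split across inline tags may be counted as separate words.

```python
import re
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._hidden = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._hidden += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._hidden:
            self._hidden -= 1

    def handle_data(self, data):
        if not self._hidden:
            self.chunks.append(data)


def measure(html: str) -> dict:
    """Return the raw size in bytes and the visible word count inside <body>."""
    body = re.search(r"<body\b[^>]*>(.*)</body>", html, re.I | re.S)
    parser = VisibleText()
    parser.feed(body.group(1) if body else html)
    parser.close()
    words = " ".join(parser.chunks).split()
    return {"bytes": len(html.encode("utf-8")), "words": len(words)}
```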

💡 This is the same fundamental gap we cover in Why Google Can't See Your SPA — but AI crawlers are even less forgiving because they almost never attempt JavaScript execution.

What Most Guides Get Wrong

Most SEO content still assumes things that are flatly wrong for AI crawlers:

Bots execute JavaScript → AI crawlers almost never do

Rendering eventually happens → AI crawlers have near-zero delay tolerance

Content gets picked up later → if it's not in the first HTML response, it doesn't exist

If you're reading a guide that says "Google renders JavaScript" and assumes the same applies to ChatGPT, Claude, or Perplexity — it's wrong. These systems optimize for fast extraction, not full rendering.

What Breaks in Production

These are not rare edge cases. We see these failures on production sites every week. They're standard failure patterns for JavaScript apps.

1. Script Shell Pages

  • HTML: 5–15 KB — almost entirely <script> tags
  • Visible text: under 50 words
  • This is the exact pattern Guard flags as script_shell_only

The AI crawler receives a page that is functionally empty. Zero indexable content.

2. Partial Hydration

  • Header renders server-side → visible
  • Main content injected via JS → invisible to crawlers
  • Page looks "fine" to humans, but the raw HTML has the <h1> and no body text

The crawler captures an incomplete page. Your heading says "Pricing" but there's no pricing content.

3. Broken Deep Links

  • /pricing, /features, /docs all return the same shell HTML
  • Content loaded via client-side router — never present in initial response
  • Crawler sees: no pricing content, no product info, no links

4. JS Bundle Failure

  • One script fails (network timeout or CDN issue)
  • Browser retries → user sees the page eventually
  • Crawler gets broken render → zero content

Guard flags this as critical_bundle_failure. The page is effectively dead.

5. CDN / Bot Blocking

  • Cloudflare or other CDN returns 403 for non-browser user agents
  • Crawler never reaches your origin server
  • Result: zero crawlable content, zero visibility

This is surprisingly common. Your CDN's bot protection is actively blocking the systems you want to be visible to.
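
One way to spot this pattern is to request the same URL with several user agents and compare status codes. The sketch below is an approximation, not a definitive test: the short user-agent tokens are illustrative (real crawlers send longer strings), CDNs may also verify crawlers by IP address, and `fetch_status` and `blocked_agents` are hypothetical helper names.

```python
import urllib.error
import urllib.request

# Illustrative user-agent tokens; real crawler strings are longer.
USER_AGENTS = {
    "browser": "Mozilla/5.0",
    "gptbot": "GPTBot",
    "claudebot": "ClaudeBot",
    "perplexitybot": "PerplexityBot",
}


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code the server sends to this user agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # 4xx/5xx responses are raised as exceptions


def blocked_agents(statuses: dict) -> list:
    """Names whose request failed (status >= 400) while the browser's did not."""
    if statuses.get("browser", 0) >= 400:
        return []  # the page fails for everyone, so this is not bot-specific
    return sorted(name for name, code in statuses.items()
                  if name != "browser" and code >= 400)


# Usage (needs network access; replace the URL with your own):
# statuses = {name: fetch_status("https://yourdomain.com", ua)
#             for name, ua in USER_AGENTS.items()}
# print(blocked_agents(statuses))
```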

The result in every case: HTML under 10 KB, visible text under 100 words, zero internal links. That page is effectively dead to AI.

How AI Crawlers Differ from Search Engines

| Behavior | Googlebot | AI Crawlers |
|---|---|---|
| JS execution | Sometimes (queued) | Almost never |
| Render delay tolerance | Seconds to minutes | Near zero |
| HTML dependency | Medium | Absolute |
| Output | Search index | Structured summaries & embeddings |
| Retry behavior | Will revisit | Usually one-shot |

The key difference: AI crawlers optimize for fast extraction, not full rendering. If your content isn't in the initial HTML, it doesn't exist in their pipeline. For a deeper look at how different bots behave, see our Bots Guide.

What Content Formats Actually Work

AI systems consistently extract content from these formats. Everything else degrades.

1. Real HTML Text

  • 500–1,000+ words in <body>
  • Semantic tags: <h1>, <p>, <ul>
  • Content present in HTML — not injected via JavaScript

2. Clean Structure

  • Headings properly nested (H1 → H2 → H3)
  • Lists instead of div soup
  • Links visible in HTML (not generated by JS event handlers)
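
Heading nesting is easy to check mechanically. The sketch below is a rough, regex-based check on raw HTML and assumes "properly nested" means the first heading is an h1 and no level is skipped on the way down. Both function names are hypothetical.

```python
import re


def heading_levels(html: str) -> list:
    """Heading levels in document order, e.g. [1, 2, 3, 2]."""
    return [int(level) for level in re.findall(r"<h([1-6])\b", html, re.I)]


def nesting_problems(levels: list) -> list:
    """Describe nesting issues; an empty list means the outline looks clean."""
    if not levels:
        return ["no headings found"]
    problems = []
    if levels[0] != 1:
        problems.append(f"first heading is h{levels[0]}, not h1")
    for previous, current in zip(levels, levels[1:]):
        if current > previous + 1:  # e.g. an h1 followed directly by an h3
            problems.append(f"h{previous} is followed by h{current}")
    return problems
```

Run it against the raw HTML response rather than the rendered DOM. Headings injected by JavaScript will show up as "no headings found".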

3. Markdown-Friendly Content

Internally, most AI pipelines convert HTML → Markdown before processing. If your HTML relies on JavaScript, uses dynamic rendering, or lacks structure — it degrades heavily during this conversion.

This is exactly why DataJelly generates AI Markdown snapshots — clean, structured Markdown served directly to AI crawlers, reducing token usage by up to 91% while preserving content hierarchy.

Solutions Compared

Prerendering

Works if:

  • Under 100 routes
  • Content rarely changes

Breaks when:

  • Dynamic pages
  • Stale builds
  • Route explosion

SSR

Works if:

  • Server always returns full HTML
  • Hydration doesn't break

Breaks when:

  • Slow backend
  • Partial renders
  • Caching inconsistencies

Edge Rendering

What actually works:

  • Fully rendered HTML at request time
  • Structured Markdown for AI
  • Zero app changes required
  • No hydration dependency

This is exactly what DataJelly's edge proxy + snapshot system does:

  • HTML snapshots for search bots — fully rendered, real content
  • AI Markdown for AI crawlers — structured, token-efficient, citation-ready
  • Zero reliance on client-side rendering

For a deeper comparison, read Prerender vs SSR vs Edge Rendering.

Practical Checklist

Run these against your site. If any fail, AI crawlers are seeing a broken page.

1. Raw HTML size

curl your page and check total size

HTML > 20 KB → good
HTML < 10 KB → problem (likely empty shell)

2. Text density

Check word count in <body>

1,000+ words → safe
< 200 words → likely invisible to AI

3. Script ratio

Check what percentage of HTML is <script> tags

Content dominates HTML → good
70%+ <script> → broken for AI crawlers
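
The script ratio can be estimated with a few lines of Python. This is a sketch under one assumption: the ratio is the share of characters in the raw HTML that sit inside `<script>...</script>` elements, so it mostly reflects inline scripts and embedded JSON. External bundles referenced by `src` add very little. `script_ratio` is a hypothetical name.

```python
import re

SCRIPT_BLOCK = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.I | re.S)


def script_ratio(html: str) -> float:
    """Fraction of the HTML's characters that sit inside <script> elements."""
    if not html:
        return 0.0
    script_chars = sum(len(match.group(0)) for match in SCRIPT_BLOCK.finditer(html))
    return script_chars / len(html)
```

A result above 0.7 corresponds to the "70%+ <script>" failure case in this check.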

4. Deep link test

Test /pricing, /features, /docs individually

Each returns full HTML with real content → good
All return same root shell → client-side routing issue
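
To detect the shared-shell case without reading the HTML by eye, you can hash each route's raw response and look for collisions. The sketch assumes you have already fetched the HTML for each route into a dictionary. `fingerprint` and `shared_shell_routes` are hypothetical names, and the whitespace normalization is a simplification.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(html: str) -> str:
    """Hash of the HTML with whitespace collapsed, so trivial diffs are ignored."""
    normalized = re.sub(r"\s+", " ", html).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def shared_shell_routes(pages: dict) -> list:
    """Groups of routes whose raw HTML is identical after normalization."""
    groups = defaultdict(list)
    for route, html in pages.items():
        groups[fingerprint(html)].append(route)
    return [sorted(routes) for routes in groups.values() if len(routes) > 1]
```

Any group this returns is a set of URLs that send the same document, which is what a client-side router serving one shell looks like from the outside.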

5. Bot simulation

Remove browser headers and request your page

Same content regardless of headers → good
Different response → you have bot blocking or cloaking

Want to automate this? The HTTP Debug Tool runs these checks for you.

Quick Test: What Do Bots Actually See?

~30 seconds

Most people guess. Don't.

Run this test and look at the actual response your site returns to bots.

1. Fetch your page as Googlebot

Use your terminal:

curl -A "Googlebot" https://yourdomain.com

Look for:

  • Real visible text (not just <div id="root">)
  • Meaningful content in the HTML
  • Page size (should not be tiny)

2. Compare bot vs browser

Now request the same URL with a browser user agent. This shows the raw HTML the server sends to browsers, not the rendered page:

curl -A "Mozilla/5.0" https://yourdomain.com

If these responses are different, Google is indexing a different page than your users see.

Stop guessing — measure it.

Real example: 253 words vs 13,547

We see this constantly. Here's a real example from production: Googlebot saw 253 words and 2 KB of HTML. A browser saw 13,547 words and 77.5 KB. Same URL — completely different content.

[Figure: bot vs browser comparison on the same URL, 253 words for Googlebot vs 13,547 words for a rendered browser]

If your HTML doesn't contain the content, Google doesn't either.

Compare Googlebot vs browser on your site → HTTP Debug Tool

3. Check for common failure signals

We see this all the time in production:

  • HTML under ~1 KB → usually empty shell
  • Visible text under ~200 characters → thin or missing content
  • Missing <title> or <h1> → weak or broken page
  • Large difference between bot vs browser HTML → rendering issue
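
These signals translate into a small check. The sketch below is an illustration, not DataJelly's implementation. It assumes you supply both the raw HTML a bot receives and the rendered HTML captured separately (for example from a headless browser), it treats "large difference" as the browser version having more than twice the words, and the signal names are made up for this example.

```python
import re


def visible_text(html: str) -> str:
    """Rough visible text: drop <head>, <script>, <style>, then strip all tags."""
    html = re.sub(r"<(script|style|head)\b.*?</\1\s*>", " ", html, flags=re.I | re.S)
    return " ".join(re.sub(r"<[^>]+>", " ", html).split())


def failure_signals(bot_html: str, browser_html: str) -> list:
    """Return the names of the failure signals present in the bot response."""
    signals = []
    if len(bot_html.encode("utf-8")) < 1024:
        signals.append("html_under_1kb")
    if len(visible_text(bot_html)) < 200:
        signals.append("text_under_200_chars")
    if not re.search(r"<title\b[^>]*>\s*\S", bot_html, re.I):
        signals.append("missing_title")
    if not re.search(r"<h1\b", bot_html, re.I):
        signals.append("missing_h1")
    bot_words = len(visible_text(bot_html).split())
    browser_words = len(visible_text(browser_html).split())
    if browser_words > 2 * max(bot_words, 1):  # assumed threshold for "large"
        signals.append("bot_browser_gap")
    return signals
```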

Use the DataJelly Visibility Test (Recommended)

You can run this without touching curl. It shows you:

  • Raw HTML returned to bots (Googlebot, Bing, GPTBot, etc.)
  • Fully rendered browser version
  • Side-by-side differences in word count, HTML size, links, and content

What this test tells you (no guessing)

After running this, you'll know:

  • Whether your HTML is actually indexable
  • Whether bots are seeing partial content
  • Whether rendering is breaking in production

This is the difference between "I think SEO is set up" and "I know what Google is indexing."

If you don't understand why this happens, read: Why Google Can't See Your SPA

If this test fails

You have three real options:

  • SSR: works if you can keep it stable in production
  • Prerendering: breaks with dynamic content and scale
  • Edge Rendering: reflects real production output without app changes

If you do nothing, you will not rank consistently. Learn how Edge Rendering works →

This issue doesn't show up in Lighthouse. It shows up in rankings.

The Bottom Line

AI crawlers don't "figure it out later." They read exactly what you return in the first HTML response.

If your page is under 10 KB, under 100 words, and script-heavy — it does not exist to AI.

The fix is not tweaking SEO metadata. The fix is: return real HTML, return structured content, and stop depending on client-side rendering to do the heavy lifting.

Related Reading

How to Check What Googlebot Actually Sees

Step-by-step raw HTML inspection — same technique applies to AI crawler debugging.

Why Google Can't See Your SPA

The fundamental rendering gap that makes JavaScript apps invisible to search engines.

React SEO Is Broken by Default

Why React ships empty HTML and what actually fixes it in production.

Prerender vs SSR vs Edge Rendering

Side-by-side comparison of rendering strategies with real production data.

AI Markdown Snapshots Guide

How DataJelly generates structured Markdown for AI crawlers.

Understanding the Bots

Directory of AI, search, and social bots crawling your site.

AI Visibility Infrastructure

Architecture for serving the right content to every consumer.

HTTP Debug Tool

Compare Googlebot vs browser responses on any URL.

Bot Visibility Test

See exactly what bots receive when they crawl your pages.

DataJelly

SEO snapshots for modern SPAs. Making JavaScript applications search engine friendly with enterprise-grade reliability.

© 2026 DataJelly. All rights reserved. Built with love for the modern web.