[Crawl-Date: 2026-04-11]
[Source: DataJelly Visibility Layer]
[URL: https://datajelly.com/blog/how-ai-crawlers-read-your-website]
---
title: How AI Crawlers (ChatGPT, Claude, Perplexity) Actually Read Your Website | DataJelly
description: AI crawlers don't render JavaScript. If your HTML is empty, you're invisible to ChatGPT, Claude, and Perplexity. Here's what actually happens — and how to fix it.
url: https://datajelly.com/blog/how-ai-crawlers-read-your-website
canonical: https://datajelly.com/blog/how-ai-crawlers-read-your-website
og_title: DataJelly - The Visibility Layer for Modern Apps
og_description: Rich social previews for Slack & Twitter. AI-readable content for ChatGPT & Perplexity. Zero-code setup.
og_image: https://datajelly.com/datajelly-og-image.png
twitter_card: summary_large_image
twitter_image: https://datajelly.com/datajelly-og-image.png
---

# How AI Crawlers (ChatGPT, Claude, Perplexity) Actually Read Your Website
> AI crawlers don't render JavaScript. If your HTML is empty, you're invisible to ChatGPT, Claude, and Perplexity. Here's what actually happens — and how to fix it.

---

## The Real Problem

| View | Typical size | What it contains |
| --- | --- | --- |
| Browser view | 3,000–8,000 words | Full UI |
| Raw HTML response | 4–12 KB | Mostly `<script>` tags |
| AI crawler view | ~0 | Usable content |

We see this constantly on React, Vite, and [Lovable](https://datajelly.com/guides/lovable-seo) builds. The browser shows a fully interactive page with thousands of words of content. The raw HTML response — what AI crawlers actually receive — contains almost nothing.

This is not a minor SEO issue. This is **missing HTML**.

If your `<body>` doesn't contain real text on first response, AI systems ignore it. Not "eventually index it." Not "figure it out later." **Ignore it.**

## What's Actually Happening

AI crawlers behave like fast HTTP clients, not browsers. They don't open Chrome. They don't wait for your React app to hydrate. They don't call your APIs.

Here's what they actually do:

1. **Fetch HTML**: a single HTTP GET request
2. **Extract visible text + structure**: headings, paragraphs, lists, links
3. **Convert to internal format**: often Markdown-like for downstream processing
4. **Store embeddings**: for retrieval and citation in AI responses

They do **not** wait for hydration. They do **not** run your React app. They do **not** call your APIs.
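
Steps 1 and 2 can be approximated with Python's standard library. This is a minimal sketch of what a non-rendering client can read, not the implementation used by any particular crawler. The names `VisibleTextExtractor` and `extract_visible_text`, the skipped-tag list, and both sample pages are assumptions made for illustration.

```python
from html.parser import HTMLParser

# Tags whose contents a text extractor would normally skip (an assumption).
SKIPPED_TAGS = {"script", "style", "noscript", "template"}


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes that sit outside script/style-like tags."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside skipped tags.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_visible_text(html: str) -> str:
    """Return the text readable from raw HTML without running any JavaScript."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical SPA shell: everything real is loaded by the script.
SPA_SHELL = """<!doctype html>
<html><head><title>My App</title>
<script type="module" src="/assets/index.js"></script></head>
<body><div id="root"></div></body></html>"""

# Hypothetical server-rendered page: the content is already in the HTML.
SERVER_RENDERED = """<!doctype html>
<html><head><title>Pricing</title></head>
<body><h1>Pricing</h1><p>Plans start at $10 per month.</p>
<script>window.__STATE__ = {"plan": "pro"};</script></body></html>"""

print(repr(extract_visible_text(SPA_SHELL)))
print(repr(extract_visible_text(SERVER_RENDERED)))
```

For the shell, the only text that survives is the `<title>`. For the server-rendered page, the heading and paragraph come through and the inline script is dropped.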
## Concrete Example

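Request your page with `curl`:

```bash
# Total size of the raw HTML response, in bytes
curl -s https://yourdomain.com | wc -c
# Example result: HTML size is about 7 KB

# Rough word count inside <body>; tr flattens newlines so the match can span lines (needs GNU grep for -P)
curl -s https://yourdomain.com | tr '\n' ' ' | grep -oP '<body[^>]*>\K.*(?=</body>)' | wc -w
# Example result: <body> text is about 20 words
```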

That's the entire page to an AI crawler. After hydration, your browser shows 5,000+ words. The crawler never sees any of it.

💡 This is the same fundamental gap we cover in [Why Google Can't See Your SPA](https://datajelly.com/blog/why-google-cant-see-your-spa), but AI crawlers are even less forgiving because they almost never attempt JavaScript execution.

## What Most Guides Get Wrong

Most SEO content still assumes things that are flatly wrong for AI crawlers:

- "Bots execute JavaScript" → AI crawlers almost never do
- "Rendering eventually happens" → AI crawlers have near-zero delay tolerance
- "Content gets picked up later" → if it's not in the first HTML response, it doesn't exist

If you're reading a guide that says "Google renders JavaScript" and assumes the same applies to ChatGPT, Claude, or Perplexity — it's wrong. These systems optimize for **fast extraction, not full rendering**.

## What Breaks in Production

These are not rare edge cases. We see these failures on production sites every week. They're standard failure patterns for JavaScript apps.

### 1. Script Shell Pages

- HTML: 5–15 KB — almost entirely `<script>` tags
- Visible text: under 50 words
- This is the exact pattern [Guard](https://datajelly.com/products/guard) flags as `script_shell_only`

The AI crawler receives a page that is functionally empty. Zero indexable content.

### 2. Partial Hydration

- Header renders server-side → visible
- Main content injected via JS → invisible to crawlers
- Page looks "fine" to humans — `<h1>` present but body text missing

The crawler captures an incomplete page. Your heading says "Pricing" but there's no pricing content.

### 3. Broken Deep Links

- `/pricing`, `/features`, `/docs` all return the same shell HTML
- Content loaded via client-side router — never present in initial response
- Crawler sees: no pricing content, no product info, no links

### 4. JS Bundle Failure

- One script fails (network timeout or CDN issue)
- Browser retries → user sees the page eventually
- Crawler gets broken render → zero content

Guard flags this as `critical_bundle_failure`. The page is effectively dead.

### 5. CDN / Bot Blocking

- Cloudflare or other CDN returns 403 for non-browser user agents
- Crawler never reaches your origin server
- Result: zero crawlable content, zero visibility

This is surprisingly common. Your CDN's bot protection is actively blocking the systems you want to be visible to.

The typical result across these patterns: HTML under 10 KB, visible text under 100 words, zero internal links. That page is effectively dead to AI.

## How AI Crawlers Differ from Search Engines

| Behavior | Googlebot | AI Crawlers |
| --- | --- | --- |
| JS execution | Sometimes (queued) | Almost never |
| Render delay tolerance | Seconds to minutes | Near zero |
| HTML dependency | Medium | Absolute |
| Output | Search index | Structured summaries & embeddings |
| Retry behavior | Will revisit | Usually one-shot |

The key difference: AI crawlers optimize for **fast extraction**, not full rendering. If your content isn't in the initial HTML, it doesn't exist in their pipeline. For a deeper look at how different bots behave, see our [Bots Guide](https://datajelly.com/guides/bots).

## What Content Formats Actually Work

AI systems consistently extract content from these formats. Everything else degrades.

### 1. Real HTML Text

- 500–1,000+ words in `<body>`
- Semantic tags: `<h1>`, `<p>`, `<ul>`
- Content present in HTML — not injected via JavaScript
### 2. Clean Structure

- Headings properly nested (H1 → H2 → H3)
- Lists instead of div soup
- Links visible in HTML (not generated by JS event handlers)
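
Heading nesting is easy to check mechanically. The sketch below is one possible approach: it walks the raw HTML and reports any place where a heading jumps more than one level deeper than the one before it. `HeadingCollector` and `heading_skips` are hypothetical names, and treating the document start as level 0 (so the first heading is expected to be an `<h1>`) is an assumption.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Records heading levels (1-6) in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        # Matches h1..h6 only; tags like <hr> or <header> are ignored.
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_skips(html: str) -> list:
    """Return (previous, current) pairs where a heading skips a level."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    skips = []
    previous = 0  # Document start counts as level 0 (assumption).
    for level in collector.levels:
        if level > previous + 1:
            skips.append((previous, level))
        previous = level
    return skips


print(heading_skips("<h1>Docs</h1><h2>Install</h2><h3>Linux</h3><h2>Usage</h2>"))
print(heading_skips("<h1>Docs</h1><h3>Linux</h3>"))
```

An empty list means every heading is at most one level deeper than its predecessor. Moving back up (H3 to H2) is not treated as a problem.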
### 3. Markdown-Friendly Content

Internally, most AI pipelines convert HTML → Markdown before processing. If your HTML relies on JavaScript, uses dynamic rendering, or lacks structure, it degrades heavily during this conversion.

This is exactly why DataJelly generates [AI Markdown snapshots](https://datajelly.com/guides/ai-markdown-view): clean, structured Markdown served directly to AI crawlers, reducing token usage by up to 91% while preserving content hierarchy.
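
To make the HTML → Markdown step concrete, here is a deliberately small converter. It is a sketch under stated assumptions, not DataJelly's snapshot format and not any AI vendor's pipeline: it handles only `<h1>` to `<h3>`, `<p>`, and `<li>`, and the names `MarkdownSketch` and `html_to_markdown` are hypothetical.

```python
from html.parser import HTMLParser

# Markdown prefix for each block tag this sketch understands.
BLOCK_PREFIX = {"h1": "# ", "h2": "## ", "h3": "### ", "p": "", "li": "- "}
SKIPPED_TAGS = {"script", "style"}


class MarkdownSketch(HTMLParser):
    """Turns headings, paragraphs, and list items into Markdown-like lines."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self._prefix = None  # Prefix of the block being read, or None outside a block
        self._buffer = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_PREFIX:
            self._prefix = BLOCK_PREFIX[tag]
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in BLOCK_PREFIX and self._prefix is not None:
            # Collapse runs of whitespace, then emit one line per block.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.lines.append(self._prefix + text)
            self._prefix = None

    def handle_data(self, data):
        if self._skip_depth == 0 and self._prefix is not None:
            self._buffer.append(data)


def html_to_markdown(html: str) -> str:
    """Convert the supported block tags in raw HTML to Markdown-like text."""
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)


page = (
    "<body><h1>Pricing</h1><p>Plans start at <b>$10</b> per month.</p>"
    "<ul><li>Free tier</li><li>Pro tier</li></ul><script>render()</script></body>"
)
print(html_to_markdown(page))
```

Feed the same function an SPA shell such as `<div id="root"></div>` plus a script tag and it returns an empty string. The structure has to be in the HTML before conversion for anything to survive it.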

## Solutions Compared

### Prerendering

Works if:

- Under 100 routes
- Content rarely changes

Breaks when:

- Dynamic pages
- Stale builds
- Route explosion

### SSR

Works if:

- Server always returns full HTML
- Hydration doesn't break

Breaks when:

- Slow backend
- Partial renders
- Caching inconsistencies

### Edge Rendering

What actually works:

- Fully rendered HTML at request time
- Structured Markdown for AI
- Zero app changes required
- No hydration dependency

This is exactly what [DataJelly's edge proxy + snapshot system](https://datajelly.com/products/edge) does:

- **HTML snapshots** for search bots — fully rendered, real content
- **AI Markdown** for AI crawlers — structured, token-efficient, citation-ready
- **Zero reliance** on client-side rendering

For a deeper comparison, read [Prerender vs SSR vs Edge Rendering](https://datajelly.com/blog/prerender-vs-ssr-vs-edge-rendering).

## Practical Checklist

Run these against your site. If any fail, AI crawlers are seeing a broken page.

### 1. Raw HTML size

Run `curl` against your page and check the total size.

- HTML > 20 KB → good
- HTML < 10 KB → problem (likely empty shell)
### 2. Text density

Check the word count in `<body>`.

- 1,000+ words → safe
- < 200 words → likely invisible to AI
### 3. Script ratio

Check what percentage of the HTML is `<script>` tags.

- Content dominates HTML → good
- 70%+ `<script>` → broken for AI crawlers
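
Checks 1 to 3 can be roughed out together in Python. This is a sketch that applies the fail thresholds stated above to an HTML string you have already fetched. The regex-based tag stripping is crude, and `html_metrics` and `checklist_flags` are hypothetical helper names rather than part of any DataJelly tool.

```python
import re

SCRIPT_BLOCK = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.I | re.S)
STYLE_BLOCK = re.compile(r"<style\b[^>]*>.*?</style\s*>", re.I | re.S)
BODY = re.compile(r"<body\b[^>]*>(.*?)</body\s*>", re.I | re.S)
TAG = re.compile(r"<[^>]+>")


def html_metrics(html: str) -> dict:
    """Rough size, text-density, and script-ratio numbers for one HTML string."""
    total_bytes = len(html.encode("utf-8"))
    script_bytes = sum(len(block.encode("utf-8")) for block in SCRIPT_BLOCK.findall(html))
    body_match = BODY.search(html)
    body = body_match.group(1) if body_match else html
    # Drop script and style blocks first, then any remaining tags.
    text = TAG.sub(" ", STYLE_BLOCK.sub(" ", SCRIPT_BLOCK.sub(" ", body)))
    return {
        "html_bytes": total_bytes,
        "body_words": len(text.split()),
        "script_ratio": script_bytes / total_bytes if total_bytes else 0.0,
    }


def checklist_flags(metrics: dict) -> list:
    """Return the names of the checks that fail the thresholds from the checklist."""
    flags = []
    if metrics["html_bytes"] < 10 * 1024:   # under 10 KB
        flags.append("raw_html_size")
    if metrics["body_words"] < 200:         # under 200 words
        flags.append("text_density")
    if metrics["script_ratio"] >= 0.70:     # 70%+ script
        flags.append("script_ratio")
    return flags


# Hypothetical script shell: an empty root div plus a large inline bundle.
shell = (
    '<html><head><script src="/a.js"></script></head><body><div id="root"></div>'
    "<script>" + "x" * 500 + "</script></body></html>"
)
print(html_metrics(shell))
print(checklist_flags(html_metrics(shell)))
```

A page that fails all three checks at once matches the script shell pattern described earlier.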
### 4. Deep link test

Test `/pricing`, `/features`, and `/docs` individually.

- Each returns full HTML with real content → good
- All return the same root shell → client-side routing issue
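
One way to automate the deep link test is to fingerprint the raw HTML of each route and look for routes that share a fingerprint. The sketch below assumes the shell is byte-for-byte identical across routes. Real shells sometimes differ by a nonce or timestamp, in which case a looser comparison would be needed. `fetch_html` and `routes_sharing_shell` are hypothetical names.

```python
import hashlib
from urllib.request import Request, urlopen


def fetch_html(url: str, user_agent: str = "curl/8.0") -> str:
    """Fetch the raw HTML for a URL without executing any JavaScript."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def routes_sharing_shell(pages: dict) -> list:
    """Group routes whose raw HTML is byte-for-byte identical."""
    by_digest = {}
    for route, html in pages.items():
        digest = hashlib.sha256(html.encode("utf-8")).hexdigest()
        by_digest.setdefault(digest, []).append(route)
    # Only groups with more than one route point to a shared shell.
    return [sorted(routes) for routes in by_digest.values() if len(routes) > 1]


# Offline demonstration with hypothetical responses.
shell = "<html><body><div id='root'></div></body></html>"
pages = {
    "/": shell,
    "/pricing": shell,
    "/docs": "<html><body><h1>Docs</h1><p>Real content.</p></body></html>",
}
print(routes_sharing_shell(pages))

# Against a live site (not run here):
# base = "https://yourdomain.com"
# pages = {route: fetch_html(base + route) for route in ["/", "/pricing", "/features", "/docs"]}
```

Any group in the output is a set of URLs that returned the same document, which is what a client-side router looks like from the outside.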
### 5. Bot simulation

Remove browser headers and request your page.

- Same content regardless of headers → good
- Different response → you have bot blocking or cloaking
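
The bot simulation can be scripted by requesting the same URL under several User-Agent strings and comparing status code and body size against the browser response. This is a sketch with assumptions: the short User-Agent values are simplified stand-ins (real crawlers send longer, versioned strings), exact byte equality is a blunt test for pages with dynamic content, and `fetch_status_and_size` and `compare_user_agents` are hypothetical names.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen

# Simplified User-Agent values for illustration.
USER_AGENTS = {
    "browser": "Mozilla/5.0",
    "googlebot": "Googlebot",
    "gptbot": "GPTBot",
}


def fetch_status_and_size(url: str, user_agent: str) -> tuple:
    """Return (HTTP status, body size in bytes) for one User-Agent."""
    request = Request(url, headers={"User-Agent": user_agent})
    try:
        with urlopen(request, timeout=10) as response:
            return response.status, len(response.read())
    except HTTPError as error:
        # 4xx/5xx responses, such as a CDN 403, are raised as exceptions.
        return error.code, len(error.read())


def compare_user_agents(url: str, fetch=fetch_status_and_size) -> dict:
    """Fetch one URL per User-Agent and flag responses that differ from the browser's."""
    results = {name: fetch(url, agent) for name, agent in USER_AGENTS.items()}
    baseline = results["browser"]
    return {
        name: {"status": status, "bytes": size, "differs": (status, size) != baseline}
        for name, (status, size) in results.items()
    }


# Offline demonstration: a fake fetcher that blocks one bot with a 403.
def fake_fetch(url: str, user_agent: str) -> tuple:
    return (403, 120) if user_agent == "GPTBot" else (200, 50000)


report = compare_user_agents("https://yourdomain.com", fetch=fake_fetch)
for name, row in report.items():
    print(name, row)
```

Passing the fetcher as a parameter keeps the comparison logic testable without network access. Leave the default in place to run it against a live URL.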

Want to automate this? The [HTTP Debug Tool](https://datajelly.com/seo-tools/http-debug) runs these checks for you.

## Quick Test: What Do Bots Actually See?

~30 seconds

Most people guess. Don't.

Run this test and look at the actual response your site returns to bots.

### 1. Fetch your page as Googlebot

Use your terminal:

`curl -A "Googlebot" https://yourdomain.com`

Look for:

- Real visible text (not just `<div id="root">`)
- Meaningful content in the HTML
- Page size (should not be tiny)

### 2. Compare bot vs browser

Now test what a real browser gets:

`curl -A "Mozilla/5.0" https://yourdomain.com`

If these responses are different, Google is indexing a different page than your users see.

Stop guessing — measure it.
### Real example: 253 words vs 13,547

We see this constantly. Here's a real example from production: Googlebot saw 253 words and 2 KB of HTML. A browser saw 13,547 words and 77.5 KB. Same URL — completely different content.

![Bot vs browser comparison showing 253 words for Googlebot vs 13,547 words for a rendered browser on the same URL](https://datajelly.com/assets/bot-comparison-proof-BSBvKXDf.png)

If your HTML doesn't contain the content, Google doesn't either.

[Compare Googlebot vs browser on your site → HTTP Debug Tool](https://datajelly.com/seo-tools/http-debug)

### 3. Check for common failure signals

We see this all the time in production:

- HTML under ~1 KB → usually empty shell
- Visible text under ~200 characters → thin or missing content
- Missing `<title>` or `<h1>` → weak or broken page
- Large difference between bot vs browser HTML → rendering issue
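
These four signals translate into a short Python check. It is a sketch: the list above does not quantify "large difference", so the 2x size ratio used here is an assumption, and `failure_signals` and its signal names are hypothetical.

```python
import re
from typing import Optional

NON_TEXT = re.compile(r"<(script|style)\b[^>]*>.*?</\1\s*>", re.I | re.S)
TAG = re.compile(r"<[^>]+>")


def failure_signals(bot_html: str, browser_html: Optional[str] = None) -> list:
    """Return the failure signals that apply to the HTML a bot received."""
    signals = []
    # Visible text: drop script/style blocks, strip tags, collapse whitespace.
    visible = " ".join(TAG.sub(" ", NON_TEXT.sub(" ", bot_html)).split())
    if len(bot_html.encode("utf-8")) < 1024:
        signals.append("html_under_1kb")
    if len(visible) < 200:
        signals.append("visible_text_under_200_chars")
    if not re.search(r"<title\b", bot_html, re.I):
        signals.append("missing_title")
    if not re.search(r"<h1\b", bot_html, re.I):
        signals.append("missing_h1")
    # Assumption: "large difference" means the browser HTML is over twice as long.
    if browser_html is not None and len(browser_html) > 2 * max(len(bot_html), 1):
        signals.append("bot_vs_browser_gap")
    return signals


# Hypothetical inputs: an empty shell and a content-rich page.
shell = '<html><head></head><body><div id="root"></div><script src="/app.js"></script></body></html>'
rich = (
    "<html><head><title>Docs</title></head><body><h1>Docs</h1><p>"
    + "content " * 200
    + "</p></body></html>"
)
print(failure_signals(shell))
print(failure_signals(rich))
print(failure_signals(shell, browser_html=rich))
```

The optional second argument is the HTML returned for a browser User-Agent, or a dump of the rendered DOM if you have one.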
### Use the DataJelly Visibility Test (Recommended)

You can run this without touching curl. It shows you:

- Raw HTML returned to bots (Googlebot, Bing, GPTBot, etc.)
- Fully rendered browser version
- Side-by-side differences in word count, HTML size, links, and content

[Run Visibility Test — Free](https://datajelly.com/#visibility-test)
### What this test tells you (no guessing)

After running this, you'll know:

- Whether your HTML is actually indexable
- Whether bots are seeing partial content
- Whether rendering is breaking in production

This is the difference between *"I think SEO is set up"* and **"I know what Google is indexing."**

If you don't understand why this happens, read: [Why Google Can't See Your SPA](https://datajelly.com/blog/why-google-cant-see-your-spa)
### If this test fails

You have three real options:

- **SSR**: Works if you can keep it stable in production
- **Prerendering**: Breaks with dynamic content and scale
- **Edge Rendering**: Reflects real production output without app changes

If you do nothing, you will not rank consistently. [Learn how Edge Rendering works →](https://datajelly.com/products/edge)

This issue doesn't show up in Lighthouse. It shows up in rankings.

[Run the Test](https://datajelly.com/#visibility-test) [Ask a Question](https://datajelly.com/contact)
## The Bottom Line

AI crawlers don't "figure it out later." They read exactly what you return in the first HTML response.

If your page is under 10 KB, under 100 words, and script-heavy — **it does not exist to AI**.

The fix is not tweaking SEO metadata. The fix is: return real HTML, return structured content, and stop depending on client-side rendering to do the heavy lifting.

[Run Visibility Test — Free](https://datajelly.com/#visibility-test) [Talk to Our Team](https://datajelly.com/contact) [Start 14-Day Free Trial](https://datajelly.com/pricing)

## FAQ
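
### Do AI crawlers execute JavaScript?

No. In most cases AI crawlers do not execute JavaScript. They fetch your HTML and extract text directly. If your content depends on JS to render, it will not be seen by ChatGPT, Claude, Perplexity, or similar systems.

### Why does my site work in the browser but not for AI?

Because browsers execute JavaScript, build the DOM, call your APIs, and render the full page. AI crawlers skip all of that. They read the raw HTML response, and if it's an empty shell with `<script>` tags, that's all they get.

### How can I verify what AI crawlers actually see?

Run curl on your page without browser headers. If the HTML doesn't contain your actual content (headings, paragraphs, product info), then AI crawlers can't see it either. You can also use the DataJelly Visibility Test to compare bot vs browser output side-by-side.

### Is SSR enough to fix AI visibility?

Only if your SSR consistently returns complete, fully-rendered HTML on every request. In practice, many SSR setups fail due to hydration errors, slow backends, or partial renders that produce incomplete content. You have to verify the actual output.

### What is AI Markdown?

AI Markdown is a structured, token-efficient version of your page content optimized for AI extraction. It strips layout noise, preserves content hierarchy, and reduces token usage by up to 91% compared to raw HTML, making it significantly easier for AI systems to parse and cite.

### Why do SPAs fail for AI crawlers?

SPAs ship an empty HTML shell (usually just a `<div id="root"></div>`) and rely entirely on JavaScript to populate content. Since AI crawlers don't execute JS, they see nothing: literally zero usable content from the page.

### What's the fastest fix for AI visibility?

Serve fully rendered HTML to bots and structured AI Markdown to AI crawlers at the edge. This works without changing your app code, without rewriting your frontend, and without depending on client-side rendering to work perfectly every time.

## Related Reading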

- [Why Google Can't See Your SPA](https://datajelly.com/blog/why-google-cant-see-your-spa): The fundamental rendering gap that makes JavaScript apps invisible to search engines.
- [React SEO Is Broken by Default](https://datajelly.com/blog/react-seo-broken-by-default): Why React ships empty HTML and what actually fixes it in production.
- [Prerender vs SSR vs Edge Rendering](https://datajelly.com/blog/prerender-vs-ssr-vs-edge-rendering): Side-by-side comparison of rendering strategies with real production data.
- [AI Markdown Snapshots Guide](https://datajelly.com/guides/ai-markdown-view): How DataJelly generates structured Markdown for AI crawlers.
- [Understanding the Bots](https://datajelly.com/guides/bots): Directory of AI, search, and social bots crawling your site.
- [AI Visibility Infrastructure](https://datajelly.com/guides/ai-visibility-infrastructure): Architecture for serving the right content to every consumer.
- [HTTP Debug Tool](https://datajelly.com/seo-tools/http-debug): Compare Googlebot vs browser responses on any URL.
- [Bot Visibility Test](https://datajelly.com/seo-tools/bot-test): See exactly what bots receive when they crawl your pages.

## Structured Data (JSON-LD)
```json
{"@context":"https://schema.org","@type":"FAQPage","mainEntity":[{"@type":"Question","name":"Do AI crawlers execute JavaScript?","acceptedAnswer":{"@type":"Answer","text":"No. In most cases AI crawlers do not execute JavaScript. They fetch your HTML and extract text directly. If your content depends on JS to render, it will not be seen by ChatGPT, Claude, Perplexity, or similar systems."}},{"@type":"Question","name":"Why does my site work in the browser but not for AI?","acceptedAnswer":{"@type":"Answer","text":"Because browsers execute JavaScript, build the DOM, call your APIs, and render the full page. AI crawlers skip all of that. They read the raw HTML response \u2014 and if it\u0027s an empty shell with \u003Cscript\u003E tags, that\u0027s all they get."}},{"@type":"Question","name":"How can I verify what AI crawlers actually see?","acceptedAnswer":{"@type":"Answer","text":"Run curl on your page without browser headers. If the HTML doesn\u0027t contain your actual content \u2014 headings, paragraphs, product info \u2014 then AI crawlers can\u0027t see it either. You can also use the DataJelly Visibility Test to compare bot vs browser output side-by-side."}},{"@type":"Question","name":"Is SSR enough to fix AI visibility?","acceptedAnswer":{"@type":"Answer","text":"Only if your SSR consistently returns complete, fully-rendered HTML on every request. In practice, many SSR setups fail due to hydration errors, slow backends, or partial renders that produce incomplete content. You have to verify the actual output."}},{"@type":"Question","name":"What is AI Markdown?","acceptedAnswer":{"@type":"Answer","text":"AI Markdown is a structured, token-efficient version of your page content optimized for AI extraction. It strips layout noise, preserves content hierarchy, and reduces token usage by up to 91% compared to raw HTML \u2014 making it significantly easier for AI systems to parse and cite."}},{"@type":"Question","name":"Why do SPAs fail for AI crawlers?","acceptedAnswer":{"@type":"Answer","text":"SPAs ship an empty HTML shell (usually just a \u003Cdiv id=\u0022root\u0022\u003E\u003C/div\u003E) and rely entirely on JavaScript to populate content. Since AI crawlers don\u0027t execute JS, they see nothing \u2014 literally zero usable content from the page."}},{"@type":"Question","name":"What\u0027s the fastest fix for AI visibility?","acceptedAnswer":{"@type":"Answer","text":"Serve fully rendered HTML to bots and structured AI Markdown to AI crawlers at the edge. This works without changing your app code, without rewriting your frontend, and without depending on client-side rendering to work perfectly every time."}}]}
```


