[Crawl-Date: 2026-04-13]
[Source: DataJelly Visibility Layer]
[URL: https://datajelly.com/blog/check-what-googlebot-sees]
---
title: How to Check What Googlebot Actually Sees (Step-by-Step) | DataJelly
description: Your page renders in Chrome. Googlebot gets 3KB of empty HTML. Here's how to verify what bots actually receive — and why it matters for indexing.
url: https://datajelly.com/blog/check-what-googlebot-sees
canonical: https://datajelly.com/blog/check-what-googlebot-sees
og_title: DataJelly - The Visibility Layer for Modern Apps
og_description: Rich social previews for Slack &amp; Twitter. AI-readable content for ChatGPT &amp; Perplexity. Zero-code setup.
og_image: https://datajelly.com/datajelly-og-image.png
twitter_card: summary_large_image
twitter_image: https://datajelly.com/datajelly-og-image.png
---

# How to Check What Googlebot Actually Sees (Step-by-Step)
> Your page renders in Chrome. Googlebot gets 3KB of empty HTML. Here's how to verify what bots actually receive — and why it matters for indexing.

---

We see this all the time:

- **What Googlebot receives:** 3 KB
- **Visible content for bots:** 0 words
- **What Chrome renders:** 180 KB

If you don't inspect the raw HTML response Googlebot receives, you are guessing. And guessing is why pages get crawled and never indexed.

Google indexes HTML. Not your app. Not your JavaScript. Just HTML.

## The Gap Nobody Talks About

Modern JS apps ship a tiny HTML shell and rely on client-side rendering. The browser downloads your JavaScript, executes it, hydrates the DOM, and renders content. Works great for humans.

Googlebot does something different. It requests your HTML, gets that tiny shell, and *sometimes* queues your page for JavaScript rendering later. Sometimes hours later. Sometimes never.

That gap — between what Chrome shows and what the initial HTTP response contains — is why your page gets crawled and never indexed. If your HTML response is empty, your page doesn't exist to Google. Everything else is noise.

## What's Actually Happening

Most React / Vite / SPA apps return something like this to every request — including Googlebot:

```html
<!DOCTYPE html>
<html>
<head><title>My App</title></head>
<body>
  <div id="root"></div>
  <script type="module" src="/assets/index-abc123.js"></script>
</body>
</html>
```

That's it. 2–8 KB of HTML. One empty div. Script tags. No real content.

### What the Browser Does

1. Downloads the JavaScript bundle
2. Executes it
3. Hydrates the DOM
4. Renders your content

Result: Full page, all content visible
### What Googlebot Does

1. Requests the HTML
2. Sometimes queues JS rendering
3. Often indexes the initial response
4. Moves on

Result: Empty shell, nothing to index

Our own snapshot system literally waits for DOM stability, router readiness, and hydration before extracting content. Googlebot does not wait that long.

## Step-by-Step: Check What Googlebot Actually Sees

### Step 1: Fetch the Raw HTML as Googlebot

Make a request using a Googlebot user agent. This is the single most important diagnostic step — it shows you exactly what Google's crawler receives.

```bash
curl -s -A "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" \
  https://yoursite.com | head -100
```

What you're looking for:

- **HTML size** — how many bytes came back
- **Visible text** — actual words, not script tags
- **Real markup** — h1, paragraphs, links, not just a root div

Hard signals:

| Signal | Healthy | Broken |
| --- | --- | --- |
| HTML size | 20 KB+ | < 5 KB |
| Visible text | 200+ chars | < 50 chars |
| Content tags | h1, p, a tags | Only script tags |

If you see mostly script tags, the page is not indexable. Full stop.
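
A quick way to turn that fetch into hard numbers: the sketch below measures the size and visible word count of a saved HTML response. The heredoc sample is a stand-in for your own output from the curl command above (the file name `response.html` is arbitrary), and the 5 KB / 50-word cutoffs mirror the signals table.

```bash
# Hypothetical sample standing in for a saved Googlebot response.
# In practice: curl -s -A "Googlebot/2.1" https://yoursite.com > response.html
cat > response.html <<'EOF'
<!DOCTYPE html>
<html>
<head><title>My App</title></head>
<body>
  <div id="root"></div>
  <script type="module" src="/assets/index-abc123.js"></script>
</body>
</html>
EOF

bytes=$(wc -c < response.html)
# Drop script blocks and all tags, then count the words that remain visible.
words=$(sed 's/<script[^>]*>.*<\/script>//g' response.html \
  | sed 's/<[^>]*>//g' | tr -s '[:space:]' ' ' | wc -w)
echo "bytes=$bytes visible_words=$words"
if [ "$bytes" -lt 5000 ] && [ "$words" -lt 50 ]; then
  echo "VERDICT: script shell, not indexable"
fi
```

On a healthy page the same pipeline should report tens of kilobytes and hundreds of visible words.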

### Step 2: Fetch the Same Page as a Browser

Now request the same URL with a normal browser user agent. This gives you the baseline for comparison.

```bash
curl -s -A "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36" \
  https://yoursite.com | wc -c
```

Compare these numbers:

| Metric | Browser | Googlebot |
| --- | --- | --- |
| HTML size | 180 KB | 6 KB |
| Word count | 1,200 | 0 |
| Structure | Full DOM | Empty div |
If those differ significantly, you have a rendering gap. This breaks in production constantly.
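
The size comparison can be scripted. This sketch fabricates a tiny "bot" response and a larger "browser" response so it is self-contained; in practice the two files come from the two curl commands above, and the 5x threshold is an illustrative cutoff, not a standard.

```bash
# Stand-ins for saved responses. In practice:
#   curl -s -A "<Googlebot UA>" https://yoursite.com > bot.html
#   curl -s -A "<browser UA>"   https://yoursite.com > browser.html
printf '<div id="root"></div>' > bot.html
yes '<p>real content paragraph</p>' | head -200 > browser.html

bot_bytes=$(wc -c < bot.html)
browser_bytes=$(wc -c < browser.html)
echo "bot=${bot_bytes}B browser=${browser_bytes}B"

# Flag a rendering gap when the browser response is more than 5x larger.
if [ "$browser_bytes" -gt $((bot_bytes * 5)) ]; then
  echo "rendering gap detected"
fi
```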

### Step 3: Diff the Actual Content

Don't just compare sizes — inspect the content itself. Run a direct comparison:

- Is your main text present in the raw HTML?
- Are headings (h1, h2) visible?
- Are internal links crawlable?

Example failure we see constantly:

- **Browser HTML:** 220 KB, full pricing page with tables, CTAs, and FAQ
- **Googlebot HTML:** 4 KB, empty div, three script tags

Result: Google indexes nothing. The page is invisible.

Or use our [HTTP Bot Comparison Tool](https://datajelly.com/seo-tools/bot-test) to do this automatically — it fetches as multiple user agents and shows you the diff side by side.
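
The three presence checks can also be scripted against the raw response. The `raw.html` heredoc below is an illustrative stand-in; point the greps at your own saved Googlebot output.

```bash
# Illustrative raw response where real markup survives.
cat > raw.html <<'EOF'
<h1>Pricing</h1>
<p>Plans start at $9/month.</p>
<a href="/docs">Documentation</a>
EOF

# grep -c counts matching lines; a zero count means that marker is missing.
for marker in '<h1' '<p' '<a '; do
  count=$(grep -c "$marker" raw.html || true)
  echo "$marker -> $count"
done
```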

### Step 4: Verify Response Behavior (Not Just Content)

Content doesn't matter if the request itself fails. Check the response before inspecting the body:

```bash
curl -sI -A "Mozilla/5.0 (compatible; Googlebot/2.1)" https://yoursite.com
```

- Status must be **200**
- No redirect loops
- No 403 / blocked responses
- No host mismatch issues

Real infrastructure failures we see:

- **Cloudflare returns 5xx** → bot gets an error page instead of content
- **TLS handshake fails** → bot gets nothing at all
- **Wrong host header** → origin server rejects the request
- **CDN cache miss** → origin timeout, bot gets stale or empty response

If the request fails, content doesn't matter. Check the response first.
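
One minimal way to script this: parse the status line from the saved headers before inspecting any body. The `headers.txt` heredoc is a stand-in for `curl -sI` output; the status buckets below are illustrative, not exhaustive.

```bash
# Stand-in for: curl -sI -A "Googlebot/2.1" https://yoursite.com > headers.txt
cat > headers.txt <<'EOF'
HTTP/2 200
content-type: text/html; charset=utf-8
cache-control: no-store
EOF

# The status code is the second field of the first line.
status=$(awk 'NR==1 {print $2}' headers.txt)
case "$status" in
  200) echo "status OK" ;;
  301|302|307|308) echo "redirect: follow it and re-check" ;;
  403|429) echo "blocked: the bot is being rejected" ;;
  5*) echo "server error: the bot gets an error page" ;;
  *) echo "unexpected status: $status" ;;
esac
```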

Our [HTTP Debug Tool](https://datajelly.com/seo-tools/http-debug) checks all of these automatically — status codes, redirects, headers, TLS, and response differences between user agents.

### Step 5: Validate Minimum Content Signals

Use hard thresholds. These aren't "nice-to-have checks" — they map directly to indexing failures:

| Signal | Minimum |
| --- | --- |
| Visible text | > 200 characters |
| HTML size | > 15 KB |
| Title tag | Present and populated |
| H1 tag | Present and populated |

Our [Guard system](https://datajelly.com/products/guard) already encodes these exact checks:

- **Blank page:** visible text < 200 characters
- **Script shell:** HTML present but contains no real content
- **Partial hydration:** some content present but critical sections missing

## What Most Guides Get Wrong

Most SEO guides operate on three dangerous assumptions:

- **"Google will render your page."** Rendering is delayed, inconsistent, and never guaranteed. You can't depend on it.
- **"JavaScript execution is reliable."** Bundle failures, hydration crashes, and API timeouts silently kill rendering for bots.
- **"GSC reflects reality."** The URL Inspection tool renders with a full Chrome instance. Real Googlebot crawling is far more constrained.

What they ignore entirely:

- Initial HTML size (the single most important signal)
- Missing content in the first HTTP response
- Infrastructure failures — TLS, CDN, headers

Google does not guarantee rendering. If your HTML is empty, your page is invisible. That's not an edge case — it's the default for every React, Vue, and Angular SPA.

## What We See in Production

These aren't theoretical. They're common, repeatable failure patterns we diagnose on real sites every week.

### 1. Empty HTML Shell

HTML: 3KB. Content: none. Just script tags and an empty root div.

Result: Crawled, never indexed. Google has nothing to work with.

### 2. Hydration Crash

JavaScript throws an error during hydration. The DOM never renders. This shows up intermittently — sometimes the page works, sometimes it doesn't.

Result: Intermittent indexing, unstable rankings, impossible to debug without raw HTML inspection.

### 3. CDN / Origin Blocking

The page works perfectly in a browser. But the bot gets a 403 or 5xx from Cloudflare, Vercel, or your WAF. The origin rejects the request based on user-agent, IP range, or rate limiting.

Result: Google drops the page entirely. No indexing, no ranking.

### 4. Deep Link Failure

`/pricing` works in the browser because client-side routing handles it. But when Googlebot requests `/pricing` directly, the server returns 404 because there's no server-side route configured.

Result: Key revenue pages never indexed. Your SPA fallback serves humans but not bots.
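
This failure is easy to reproduce locally. The sketch below serves a bare SPA shell with Python's built-in static file server (port 8123 is an arbitrary choice): the root returns 200, but a direct request to /pricing returns 404 because no server-side route exists for it.

```bash
# Static "SPA": only index.html exists; there is no /pricing route on the server.
mkdir -p site
printf '<div id="root"></div>' > site/index.html
(cd site && exec python3 -m http.server 8123 >/dev/null 2>&1) &
server_pid=$!
sleep 1

root_status=$(curl -s -o /dev/null -w '%{http_code}' http://127.0.0.1:8123/)
deep_status=$(curl -s -o /dev/null -w '%{http_code}' http://127.0.0.1:8123/pricing)
echo "/ -> $root_status, /pricing -> $deep_status"

kill "$server_pid"
```

A server with an SPA fallback would rewrite /pricing to index.html and return 200 — which fixes the 404, but still serves bots the empty shell.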

### 5. Content Mismatch

Browser sees 180KB of HTML with 1,500 words. Bot sees 6KB with approximately zero words. Same URL, completely different experience.

Result: Page exists in the index but Google has nothing meaningful to rank it for.

## Solutions Compared: Prerender vs SSR vs Edge

### Prerendering

Static HTML generated ahead of time. Works for fixed routes.

**Breaks when:** Pages are dynamic, routes scale beyond a few hundred, or content changes frequently.
### Server-Side Rendering (SSR)

Server returns full HTML per request. Works reliably.

**Costs:** Complexity, infrastructure overhead, and it's a hard retrofit for existing SPAs. You're effectively rewriting your rendering pipeline.
### Edge Proxy (DataJelly Approach)

Detect bots at the edge. Serve HTML snapshots to search bots. Serve structured Markdown to AI crawlers. Serve your app normally to humans.

- **Search bots** → fully rendered HTML snapshot
- **AI crawlers** → structured Markdown
- **Humans** → your SPA, unchanged

No app rewrite. No infra overhead. Fixes the empty HTML problem immediately.
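
The routing decision can be sketched as a user-agent classifier. The patterns below are illustrative only, not DataJelly's actual detection list — real bot detection also verifies IP ranges, since user agents are trivially spoofed.

```bash
# Classify a User-Agent string into a response variant (illustrative patterns).
classify_ua() {
  case "$1" in
    *Googlebot*|*bingbot*|*YandexBot*)    echo "html-snapshot" ;;
    *GPTBot*|*ClaudeBot*|*PerplexityBot*) echo "markdown" ;;
    *)                                    echo "spa" ;;
  esac
}

classify_ua "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
classify_ua "GPTBot/1.0 (+https://openai.com/gptbot)"
classify_ua "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36"
```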

## Practical Checklist

Run this every time you deploy, launch a new page, or change your rendering setup:

1. Fetch as Googlebot — inspect raw HTML
2. Check HTML size — target > 20 KB
3. Verify visible text — target > 200 characters
4. Confirm title and H1 are present
5. Compare bot vs browser output
6. Check HTTP status — must be 200
7. Test deep links directly (not via nav)
8. Verify no CDN or WAF blocking
9. Inspect raw HTML — not the rendered DOM

If any of these fail, indexing will fail.

## Skip the Terminal — Use Our HTTP Bot Comparison Tool

The curl workflow above works, but it's manual and tedious. We built the [HTTP Bot Comparison Tool](https://datajelly.com/seo-tools/bot-test) to automate the entire process — and it checks bot categories that curl can't easily simulate.

Enter any URL and the tool runs two parallel fetches: a **Raw HTTP** request (what bots get on the first hit) and a **JS Rendered** request (what a headless browser produces after executing JavaScript). You get a side-by-side comparison instantly.
### What it checks across bot types

**Search Bots** (Googlebot, Bingbot, Yandex)

- HTML size & word count
- Title & meta description
- Status code & redirects
- Content gap (raw vs rendered)

**AI Crawlers** (ChatGPT, Claude, Perplexity)

- Markdown vs HTML format detection
- Content structure quality
- Word count & readability
- Whether AI gets usable content

**Social Bots** (Facebook, Twitter, LinkedIn)

- Open Graph tags (og:title, og:image)
- Twitter Card meta
- Social preview accuracy
- Missing or generic metadata
### What the results tell you

The tool shows two result sets — **Raw HTTP** and **JS Rendered** — with a comparison view that highlights the gap:

- **Raw HTTP Results:** What bots receive on the first request — URL, status, user-agent, word count, HTML size, title, meta description, social card tags, and the actual HTML content. This is what you'd get from the curl commands above, but automated.
- **JS Rendered Results:** What a headless browser produces after JavaScript execution and DOM hydration. This represents what your users see — and what you *think* Google sees.
- **Comparison View:** Side-by-side diff showing the word count difference (e.g., +3,301 / 526%), the HTML size gap (e.g., 4.2 KB → 21.6 KB), format detection (AI Markdown vs HTML), and whether titles or social tags differ between raw and rendered responses.

Example from a real site we tested:

| | Raw HTTP | JS Rendered |
| --- | --- | --- |
| Words | 628 | 3,929 |
| HTML size | 4.2 KB | 21.6 KB |

Gap: +526% (content invisible to bots).

That's 3,301 words of content that search bots and AI crawlers never see. The page looks perfect in Chrome — and is nearly empty to Googlebot.

Try it on your own site — takes 30 seconds
[Open HTTP Bot Comparison Tool](https://datajelly.com/seo-tools/bot-test)

## Quick Test: Verify Your Site Right Now

Takes about 30 seconds. Most people guess. Don't. Run this test and look at the actual response your site returns to bots.

### Step 1: Fetch your page as Googlebot

Use your terminal:

`curl -A "Googlebot" https://yourdomain.com`

Look for:

- Real visible text (not just `<div id="root">`)
- Meaningful content in the HTML
- Page size (should not be tiny)

### Step 2: Compare bot vs browser

Now test what a real browser gets:

`curl -A "Mozilla/5.0" https://yourdomain.com`

If these responses are different, Google is indexing a different page than your users see.

Stop guessing — measure it.
### Real example: 253 words vs 13,547

We see this constantly. Here's a real example from production: Googlebot saw 253 words and 2 KB of HTML. A browser saw 13,547 words and 77.5 KB. Same URL — completely different content.
[![Bot vs browser comparison showing 253 words for Googlebot vs 13,547 words for a rendered browser on the same URL](https://datajelly.com/assets/bot-comparison-proof-BSBvKXDf.png) ](https://datajelly.com/assets/bot-comparison-proof-BSBvKXDf.png)
If your HTML doesn't contain the content, Google doesn't either.
[Compare Googlebot vs browser on your site → HTTP Debug Tool](https://datajelly.com/seo-tools/http-debug)

### Step 3: Check for common failure signals

We see this all the time in production:

- HTML under ~1 KB → usually empty shell
- Visible text under ~200 characters → thin or missing content
- Missing `<title>` or `<h1>` → weak or broken page
- Large difference between bot and browser HTML → rendering issue
### Use the DataJelly Visibility Test (Recommended)

You can run this without touching curl. It shows you:

- Raw HTML returned to bots (Googlebot, Bing, GPTBot, etc.)
- Fully rendered browser version
- Side-by-side differences in word count, HTML size, links, and content

[Run Visibility Test — Free](https://datajelly.com/#visibility-test)
### What this test tells you (no guessing)

After running this, you'll know:

- Whether your HTML is actually indexable
- Whether bots are seeing partial content
- Whether rendering is breaking in production

This is the difference between *"I think SEO is set up"* and **"I know what Google is indexing."**

If you don't understand why this happens, read: [Why Google Can't See Your SPA](https://datajelly.com/blog/why-google-cant-see-your-spa)
### If this test fails

You have three real options:

- **SSR:** works if you can keep it stable in production
- **Prerendering:** breaks with dynamic content and scale
- **Edge Rendering:** reflects real production output without app changes

If you do nothing, you will not rank consistently. [Learn how Edge Rendering works →](https://datajelly.com/products/edge)

This issue doesn't show up in Lighthouse. It shows up in rankings.

[Run the Test](https://datajelly.com/#visibility-test) [Ask a Question](https://datajelly.com/contact)

- [HTTP Bot Comparison Tool](https://datajelly.com/seo-tools/bot-test): Compare what browsers vs bots receive from your URL
- [SEO Page Validator](https://datajelly.com/seo-tools/page-validator): Check HTML size, word count, and bot-readiness
## Stop guessing. Fix your visibility.

DataJelly serves fully rendered HTML to search bots and structured Markdown to AI crawlers — without changing your app. Set up in under 15 minutes.

Start 14-Day Free Trial · [Got a question? Ask us](https://datajelly.com/contact)

No credit card required · Cancel anytime
## See the gap for yourself

Our homepage visibility test compares what bots see vs what your users see. Takes under 60 seconds.
[Run the Visibility Test](https://datajelly.com/?utm=blog-googlebot-sees#visibility-test)
(No signup required)

## Frequently Asked Questions

### How do I check what Googlebot sees?

Fetch the page using a Googlebot user agent and inspect the raw HTML response. Do not rely on browser dev tools — they show the rendered DOM, not what bots actually receive.

### What HTML size indicates a problem?

HTML under 5 KB is almost always broken. Pages between 5–15 KB are likely incomplete. Most indexable pages return at least 20 KB of HTML with real content markup.

### Why is my page not indexed even though it works in Chrome?

Because Chrome renders JavaScript and hydrates your app. Googlebot often indexes the initial HTML response — and if that response is a 3 KB shell with only script tags, there's nothing to index.

### Does Google always render JavaScript?

No. JavaScript rendering is delayed and inconsistent. Google uses a two-phase system where raw HTML is processed first and JS rendering is queued for later — sometimes hours or days. You cannot depend on it.

### What is a script shell page?

A page that returns HTML containing mostly script tags with no visible text content. The body is typically a single empty div plus JavaScript bundles. These pages are not indexable.

### How do I fix missing content for bots?

Serve real HTML using prerendering, server-side rendering, or an edge proxy like DataJelly that returns fully rendered snapshots to bots while keeping your SPA intact for users.

### Do AI crawlers behave the same as Googlebot?

No. AI crawlers (ChatGPT, Claude, Perplexity) behave like fast HTTP clients that don't execute JavaScript at all. They prefer clean structured text — serving Markdown improves visibility in AI tools.
## Related Reading

- [React SEO Is Broken by Default](https://datajelly.com/blog/react-seo-broken-by-default): Why React apps ship empty HTML and how to fix it.
- [Page Crawled But Not Indexed](https://datajelly.com/blog/crawled-not-indexed): The real reasons behind "crawled — currently not indexed".
- [Why Google Can't See Your SPA](https://datajelly.com/blog/why-google-cant-see-your-spa): The rendering gap that makes SPAs invisible to search.
- [How AI Crawlers Read Your Website](https://datajelly.com/blog/how-ai-crawlers-read-your-website): AI bots don't render JS. Here's what they actually see.
- [HTTP Bot Comparison Tool](https://datajelly.com/seo-tools/bot-test): Compare what browsers vs bots receive from your site.
- [SEO Page Validator](https://datajelly.com/seo-tools/page-validator): Check HTML size, word count, and bot-readiness in one scan.
- [Prerender vs SSR vs Edge](https://datajelly.com/blog/prerender-vs-ssr-vs-edge-rendering): Architectural comparison of all three rendering approaches.
- [Vite SEO Problems](https://datajelly.com/blog/vite-seo-problems): Why Vite apps ship 3–7KB HTML while browsers see 120KB.

## Structured Data (JSON-LD)
```json
{"@context":"https://schema.org","@type":"FAQPage","mainEntity":[{"@type":"Question","name":"How do I check what Googlebot sees?","acceptedAnswer":{"@type":"Answer","text":"Fetch the page using a Googlebot user agent and inspect the raw HTML response. Do not rely on browser dev tools \u2014 they show the rendered DOM, not what bots actually receive."}},{"@type":"Question","name":"What HTML size indicates a problem?","acceptedAnswer":{"@type":"Answer","text":"HTML under 5KB is almost always broken. Pages between 5\u201315KB are likely incomplete. Most indexable pages return at least 20KB of HTML with real content markup."}},{"@type":"Question","name":"Why is my page not indexed even though it works in Chrome?","acceptedAnswer":{"@type":"Answer","text":"Because Chrome renders JavaScript and hydrates your app. Googlebot often indexes the initial HTML response \u2014 and if that response is a 3KB shell with only script tags, there\u0027s nothing to index."}},{"@type":"Question","name":"Does Google always render JavaScript?","acceptedAnswer":{"@type":"Answer","text":"No. JavaScript rendering is delayed and inconsistent. Google uses a two-phase system where raw HTML is processed first and JS rendering is queued for later \u2014 sometimes hours or days. You cannot depend on it."}},{"@type":"Question","name":"What is a script shell page?","acceptedAnswer":{"@type":"Answer","text":"A page that returns HTML containing mostly script tags with no visible text content. The body is typically a single empty div plus JavaScript bundles. These pages are not indexable."}},{"@type":"Question","name":"How do I fix missing content for bots?","acceptedAnswer":{"@type":"Answer","text":"Serve real HTML using prerendering, server-side rendering, or an edge proxy like DataJelly that returns fully rendered snapshots to bots while keeping your SPA intact for users."}},{"@type":"Question","name":"Do AI crawlers behave the same as Googlebot?","acceptedAnswer":{"@type":"Answer","text":"No. AI crawlers (ChatGPT, Claude, Perplexity) behave like fast HTTP clients that don\u0027t execute JavaScript at all. They prefer clean structured text \u2014 serving Markdown improves visibility in AI tools."}}]}
```

