
DataJelly Guard • JavaScript SEO Pillar Guide

How to Test What Google Actually Sees

Chrome showing a page is not enough. Lighthouse passing is not enough. HTTP 200 is not enough. A page that looks fine to users can still be thin, canonicalized to the wrong URL, noindexed, missing crawlable links, or unreadable to crawlers.

Do not ask “does the page load?” Ask “what did the crawler receive?”

Need page-level monitoring after deploy? See Guard.

What “what Google sees” actually means

  • Final resolved URL and redirect path
  • HTTP status
  • Raw HTML before JavaScript
  • Rendered DOM after JavaScript
  • Title, meta description, canonical, robots directives
  • Visible text and internal links
  • Structured data
  • Resources required to render
  • Search Console selected canonical and indexing status

The real problem

A route can pass shallow checks and still fail indexing

Passes:

  • HTTP status: 200
  • Browser screenshot: looks fine
  • Deploy checks: passed
  • Lighthouse: acceptable
  • Backend logs: quiet

Googlebot raw fetch shows:

  • HTML size: 4–8 KB
  • Visible text: under 100 characters
  • Empty root div
  • No H1
  • No internal links
  • Missing product copy
  • Canonical missing or points elsewhere

Result: Crawled — currently not indexed, duplicate/canonical confusion, poor AI crawler extraction, or thin content classification.

This is not an indexing mystery. It is a crawler-output problem.

The fastest way to test what Google sees

  1. Confirm final URL and redirects.
  2. Fetch raw HTML with a normal user agent.
  3. Fetch raw HTML with Googlebot user agent.
  4. Compare raw HTML to rendered DOM.
  5. Check title, H1, canonical, noindex, and internal links.
  6. Check failed JS/CSS/API requests.
  7. Check Search Console URL Inspection.
  8. Check AI crawler-readable output if AI visibility matters.

Testing mental model: four layers

A. Transport layer

Proves: URL resolution and status behavior.

Does not prove: indexable content quality.

Common failures: mixed host variants, redirect chains, duplicate 200 URLs.

B. Raw HTML layer

Proves: pre-JS crawler-visible text and directives.

Does not prove: post-render UX.

Common failures: empty app shell, missing H1, weak links, missing canonical.

C. Rendered DOM layer

Proves: what users and rendered output show after JS.

Does not prove: the crawler got the same signals.

Common failures: content only appears after client hydration.

D. Crawler interpretation layer

Proves: Google-selected canonical/indexing outcome.

Does not prove: deploy-by-deploy stability.

Common failures: Crawled — currently not indexed, duplicate clusters, delayed recovery.

Step 1: Check final URL and redirects

If URL resolution is noisy, every downstream SEO signal becomes noisy.

# macOS/Linux
curl -sIL https://example.com/page
curl -sIL http://example.com/page
curl -sIL https://www.example.com/page
curl -sIL "https://example.com/page?utm_source=test"

# Windows (single-line)
curl.exe -s -I -L https://example.com/page

Failure examples: http and https both 200, www and non-www both 200, trailing-slash variants both indexable, UTM URLs indexable, sitemap URLs redirecting, internal links using mixed variants.

Healthy state: one canonical host, one final URL, clean 301/308 redirects, sitemap points to final URLs.
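To compare variants quickly, each `curl -sIL` output can be reduced to its status-code chain and diffed. A minimal sketch; the `redirect_chain` helper name and `headers.txt` file layout are assumptions, not part of curl or any tool:

```shell
# Hypothetical helper: reduce saved `curl -sIL` output to its status-code chain
# so host/protocol variants can be compared at a glance.
# Usage: curl -sIL https://example.com/page > headers.txt; redirect_chain headers.txt
redirect_chain() {
  awk '{ gsub(/\r/, "") }                  # strip CR from HTTP header lines
       toupper($1) ~ /^HTTP\// {           # each hop starts with a status line
         printf "%s%s", sep, $2; sep = " -> "
       }
       END { print "" }' "$1"
}
```

A clean normalization prints something like `301 -> 200`; a variant that prints a bare `200` with no redirect is a duplicate indexable URL.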

Step 2: Fetch raw HTML like a crawler

Raw HTML is what the crawler receives before JavaScript rendering. This is critical for SPAs, React/Vite/Lovable routes, link discovery, title/canonical/noindex extraction, and thin-content detection.

curl -s https://example.com/page -o raw.html
curl -s -A "Googlebot/2.1 (+http://www.google.com/bot.html)" https://example.com/page -o googlebot.html
curl -sI https://example.com/page
wc -c raw.html googlebot.html

Debug thresholds: raw HTML under 10 KB can be fine for minimal SSR pages, but risky for JavaScript-heavy content pages with little text. Visible text under 200 characters is a serious warning. Empty root div + script bundle dependency is a crawler visibility risk.
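These thresholds can be checked against the files saved above. A rough sketch, with `visible_text_stats` as an assumed helper name; it is a line-level heuristic, not an HTML parser, so inline scripts sharing a line with body copy will skew it:

```shell
# Heuristic visible-text measurement for a saved HTML file:
# drop <script>/<style> blocks line-by-line, strip remaining tags, then count.
visible_text_stats() {
  awk 'BEGIN { skip = 0 }
       {
         lower = tolower($0)
         if (lower ~ /<script|<style/) skip = 1        # enter a non-visible block
         if (!skip) text = text " " $0
         if (lower ~ /<\/script>|<\/style>/) skip = 0  # leave the block
       }
       END {
         gsub(/<[^>]*>/, "", text)                     # strip remaining tags
         gsub(/[[:space:]]+/, " ", text)               # squeeze whitespace
         n = split(text, words, " ")
         printf "chars=%d words=%d\n", length(text), n
       }' "$1"
}
```

Run it on both `raw.html` and `googlebot.html`; a chars value under ~200 on an important page lands in the warning band described above.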

Step 3: Compare raw HTML vs rendered DOM

Version         | What it represents      | Healthy                                  | Risk
Raw HTML        | Crawler pre-JS response | Meaningful text, title, canonical, links | Empty app shell
Rendered DOM    | Browser after JS        | Full page copy and CTA                   | User-only content gap
Googlebot fetch | Crawler-style request   | Same critical signals as users           | Thin or missing content

Compare: raw and rendered visible text length, word count, H1 presence, internal link count, title/canonical/noindex, and key CTA selector presence.
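One quick way to run that comparison is to count the same signals in each saved version. A sketch with `signal_counts` as an assumed helper name; `grep -c` counts matching lines rather than occurrences, which is close enough for a drift check:

```shell
# Count key SEO signals in a saved HTML file so raw/rendered/Googlebot
# saves can be compared side by side. Line counts, not exact occurrences.
signal_counts() {
  f="$1"
  printf '%s: title=%s h1=%s canonical=%s links=%s\n' "$f" \
    "$(grep -ci '<title' "$f")" \
    "$(grep -ci '<h1' "$f")" \
    "$(grep -ci 'rel="canonical"' "$f")" \
    "$(grep -ci '<a ' "$f")"
}

# Usage: signal_counts raw.html; signal_counts googlebot.html
```

Raw HTML reporting `title=0` or `links=0` while the rendered save has both is the empty-app-shell failure mode from the table above.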

Step 4: Check visible text and word thresholds

These are debugging thresholds, not ranking laws.

Healthy

  • Visible text over 1,000 chars
  • Word count over 300
  • Clear H1 and body sections
  • Internal links present

Risk

  • 200–1,000 visible chars
  • Mostly nav/footer text
  • Low internal links
  • Thin product copy

Broken

  • Under 200 visible chars
  • Empty root shell
  • Loading state only
  • No H1
  • No useful text

Step 5: Check title, meta description, H1, and body copy

Why it matters: These are primary visibility signals for relevance and extraction.

Healthy: title/H1 match intent, unique description, clear body sections.

Failure signals: placeholder title, missing H1, generic or duplicate body copy.

Step 6: Check canonical and noindex

Why it matters: Canonical and robots directives decide index eligibility and consolidation.

Healthy: self-canonical on indexable pages, intentional robots directives.

Failure signals: canonical points elsewhere, accidental noindex, conflicting robots tags.
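Both directives can be pulled out of the saved raw HTML in one pass. A hedged sketch; `index_eligibility` is an illustrative name, and attribute order or multi-line tags can defeat these greps:

```shell
# Report index-eligibility signals in a saved HTML file:
# "noindex" wins, otherwise the first canonical href, otherwise "no canonical".
index_eligibility() {
  f="$1"
  if grep -i 'name="robots"' "$f" | grep -qi 'noindex'; then
    echo "noindex"
    return
  fi
  href=$(grep -io 'rel="canonical"[^>]*href="[^"]*"' "$f" \
           | head -n 1 | sed 's/.*href="//; s/"$//')
  echo "${href:-no canonical}"
}
```

A healthy indexable page prints its own URL; printing another URL or `noindex` matches the failure signals above.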

Step 7: Check internal links and sitemap consistency

Why it matters: Crawl paths and sitemap alignment help discovery and canonical stability.

Healthy: crawlable internal links and sitemap URLs that resolve directly.

Failure signals: orphan pages, JS-only links, sitemap entries that redirect.
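Crawlable-link checks can start from the saved raw HTML: list the root-relative hrefs and confirm key pages appear. A rough sketch with `internal_links` as an assumed helper name; links injected only by JavaScript will not show up here, which is exactly the point:

```shell
# List root-relative hrefs present in crawler-visible HTML.
# Anchors that exist only in the rendered DOM are invisible to this check,
# mirroring what a non-rendering crawler can discover.
internal_links() {
  grep -o 'href="/[^"]*"' "$1" | sed 's/^href="//; s/"$//' | sort -u
}

# Usage: internal_links raw.html | wc -l   # zero on a key page is an orphan risk
```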

Step 8: Check structured data

Why it matters: Structured data improves machine interpretation and rich result eligibility.

Healthy: valid schema matching visible content.

Failure signals: broken JSON-LD, schema not present in crawler-visible output.

Step 9: Check JS/CSS/API resource failures

Why it matters: Render integrity depends on resource health.

Healthy: critical assets load, low console/resource error rate.

Failure signals: blocked JS, failed APIs, hydration crashes, missing CSS.

Step 10: Check Search Console URL Inspection

Why it matters: Search Console confirms Google’s chosen outcome for the inspected URL.

Healthy: selected canonical matches the target URL and the page is indexed.

Failure signals: Crawled — currently not indexed, Google-selected canonical mismatch.

Step 11: Check AI crawler-readable output

Why it matters: If AI visibility matters, extraction must be reliable beyond browser rendering.

Healthy: meaningful crawler-readable content and stable AI Markdown where available.

Failure signals: AI crawlers extract little text, miss key claims, or lose citation context.

What browser screenshots miss

  • Screenshots do not prove raw HTML quality.
  • Screenshots do not prove Googlebot received the same response.
  • Screenshots do not clearly expose canonical/noindex mistakes.
  • Screenshots do not prove internal links are crawlable.
  • Screenshots do not prove Search Console selected the right canonical.
  • Screenshots do not prove AI crawlers can extract the page.

What Search Console can and cannot tell you

Search Console can show

  • Inspected URL status
  • Crawled/indexed state
  • Selected canonical
  • Crawl timing
  • Rendered screenshot in URL Inspection
  • Indexing exclusions

Search Console cannot reliably show

  • Every deploy preserved content quality
  • Every key page still has enough visible text
  • CTAs/forms still function
  • AI crawlers can read the page
  • The exact deploy where regression started

How Guard helps

Guard gives page-level monitoring for production visibility signals and output regressions across scans: page snapshot history, rendered output changes, raw HTML/rendered output signals where available, visible text length, word count, HTML bytes, title/H1/canonical/noindex drift, resource errors, console errors, DOM/content changes, and Core Web Vitals regressions.

Guard does not replace Search Console. It catches page-output changes before they turn into indexing and visibility problems.

Common mistakes that create false confidence

  • Relying only on Chrome.
  • Relying only on Lighthouse.
  • Assuming Google always renders JavaScript fully.
  • Testing only the homepage.
  • Ignoring raw HTML.
  • Ignoring canonical/noindex directives.
  • Ignoring internal links.
  • Ignoring failed JS/API requests.
  • Checking GSC too late.
  • Treating Crawled — currently not indexed as random.
  • Assuming AI crawlers behave like Googlebot.
  • Using sitemap submission as a fix for thin output.

Full test checklist

URL / redirects

  • Single canonical host
  • 301/308 normalization
  • No duplicate indexable variants
  • No noisy parameters indexable

Raw HTML

  • HTML byte size sanity
  • Visible text presence
  • Title/H1 in source
  • Canonical/noindex in source

Rendered DOM

  • Main copy present
  • Key CTA present
  • H1/sections stable
  • Rendered output not loader-only

SEO signals

  • Title and meta quality
  • Canonical intent
  • Robots directives
  • Structured data validity

Links / crawl paths

  • Internal links crawlable
  • No orphan key pages
  • Sitemap contains final URLs
  • Anchor text relevance

Resources / JavaScript

  • Critical JS/CSS loads
  • API dependencies succeed
  • Console errors monitored
  • Hydration failures absent

Search Console

  • URL Inspection state
  • Google-selected canonical
  • Indexing exclusion reason
  • Recrawl after fixes

AI crawler output

  • Main claims extractable
  • Citation context present
  • AI Markdown available where used
  • No JS-only critical text

Performance / Core Web Vitals

  • TTFB stable
  • LCP regression watch
  • Layout stability
  • Error spikes after deploy

Conversion path

  • Primary CTA works
  • Form submit works
  • Important buttons visible
  • No blocker modal/errors

Command snippets for quick checks

# Redirect chain and final URL
curl -sIL https://example.com/page

# Raw HTML and Googlebot-style fetch
curl -s https://example.com/page -o raw.html
curl -s -A "Googlebot/2.1 (+http://www.google.com/bot.html)" https://example.com/page -o gbot.html

# Headers only
curl -sI https://example.com/page

# Compare byte size
wc -c raw.html gbot.html

# Quick signal extraction
rg -in '<title|rel="canonical"|name="robots"|<h1' raw.html gbot.html

# Windows one-line fetch
curl.exe -s -A "Googlebot/2.1 (+http://www.google.com/bot.html)" https://example.com/page -o gbot.html
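These one-off checks can be folded into a tiny post-deploy diff: keep the previous raw fetch and flag large byte-size drops before they become indexing problems. A sketch with assumed file names (`raw_prev.html`, `raw_now.html`):

```shell
# Flag a suspicious shrink in crawler-visible HTML between deploys.
# Assumed workflow: save each deploy's raw fetch as raw_prev.html / raw_now.html.
size_drop_pct() {
  old=$(wc -c < "$1")
  new=$(wc -c < "$2")
  echo $(( (old - new) * 100 / old ))
}

# Example gate (illustrative threshold): investigate if the page lost over a third of its bytes.
# drop=$(size_drop_pct raw_prev.html raw_now.html)
# [ "$drop" -gt 33 ] && echo "raw HTML shrank ${drop}% after deploy"
```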

FAQ

How do I see what Googlebot sees?

Check final URL resolution, fetch raw HTML, fetch with a Googlebot user agent, compare with rendered output, then confirm URL Inspection in Search Console.

Is curl enough to test Googlebot?

No. Curl is the raw-response layer. You still need rendered output checks and Search Console canonical/indexing state.

Why does my page show in Chrome but not index?

Chrome proves user rendering. It does not prove crawler-visible text, links, canonical integrity, or noindex state.

Does Google render JavaScript?

Yes, but not as a guarantee for every route and every deploy. Strong raw HTML still reduces crawler risk on JavaScript-heavy sites.

What does Crawled — currently not indexed mean?

Google fetched the page but did not add it to the index, often due to thin output, duplication, weak canonical signals, or low-value content.

How do I compare raw HTML and rendered DOM?

Compare visible text length, word count, H1, internal links, canonical/noindex, and key CTA selectors in both versions.

What visible text length is risky?

For important content pages, under 200 visible characters is a serious warning. 200–1,000 is risk. Over 1,000 is usually healthier.

Should I test with JavaScript disabled?

Yes. It is a fast stress test for whether critical copy, links, and directives exist before rendering.

Why do AI crawlers miss my content?

Many AI crawlers use lightweight extraction and may not execute complex client rendering reliably, so thin raw HTML reduces AI citation quality.

What should I check after every deploy?

Final URL, raw HTML bytes, visible text length, title/H1/canonical/noindex, resource failures, and Search Console state on key routes.

Does Guard replace Search Console?

No. Guard provides page-level monitoring and catches production output regressions before they become Search Console problems.

What pages should I test first?

Start with revenue pages, top landing pages, pages with recent ranking drops, and high-intent routes that depend on JavaScript rendering.
