Accidentally Adding Noindex: How Sites Disappear Overnight
Friday deploy. Minor copy change. Everything green. By Sunday: impressions down 80%, key pages missing from the index, revenue pages gone. Root cause: a single `<meta name="robots" content="noindex">` tag shipped globally. No errors. All 200s. We see this constantly.
The Real Failure
Noindex doesn't break your site. It removes your site. The page loads, renders, serves content — and crawlers drop it anyway. Search Console eventually shows "Excluded by 'noindex' tag" but by then you've already lost a week of traffic.
A real outage we triaged:
- Deploy: Friday 4:47 PM
- HTML size: unchanged (~84 KB)
- Visible text: unchanged (~1,800 words)
- Status code: 200
- Diff: one line — `<meta name="robots" content="noindex">` in the shared `<head>` component
- Impressions Sunday: −80%
Lighthouse 96. Browser perfect. Index gutted.
What Noindex Actually Does
Noindex is a hard directive. When present, search and AI crawlers (Googlebot, Bingbot, GPTBot, ClaudeBot, PerplexityBot) drop the page. There is no partial credit, no "we'll consider it." It shows up in two places:
- HTML: `<meta name="robots" content="noindex">` in the document head
- HTTP headers: `X-Robots-Tag: noindex` from the origin, CDN, or edge worker
Crawlers don't interpret intent. They obey directives. The browser experience is irrelevant — what matters is what shows up in the response. For the broader picture of how bots read your responses, see How to Test Your Site for AI Visibility (Fast).
Why Everything Looks Healthy
Every system you rely on says nothing is wrong, because nothing is broken. The page loads, renders, and serves content. This is not a system failure. It's a visibility failure. That's why it slips through every dashboard you have. It's the same gap that produces silent post-deploy regressions — the system is fine, the page isn't.
Why Tools Miss This
Uptime tools check availability and latency. They do not check indexability, HTML directives, or page-level signals. SEO crawlers run on schedules — once a day, once a week — and miss short outages entirely.
Result: noindex can sit live for 24–72 hours before anyone notices, and the only signal is traffic disappearing. By the time someone opens Search Console, you've already lost a week of impressions and the recovery curve is two more weeks.
If your monitoring stack only watches systems and not pages, you're going to keep eating these. Run the Page Validator or Visibility Test against your top URLs to see how directives look right now.
What We See in Production
Three repeatable patterns. We see all three across React, Vite, Next.js, and Lovable apps.
Staging config leaks to production
Scenario: Staging environments use noindex to keep them out of search. The env flag (or default value) gets copied into a production deploy by mistake.
Signals: HTML unchanged except for the robots tag. HTML size stable (~80 KB). Visible text identical.
Impact: Entire site deindexed within 1–2 crawl cycles. No alert fires because nothing else changed.
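The leak usually has a recognizable shape. A minimal sketch, assuming a Next.js-style shared head component and a hypothetical `NEXT_PUBLIC_NOINDEX` flag (the names are illustrative, not from any real codebase):

```tsx
// Hypothetical shared <head> component: the classic staging-leak shape.
export function RobotsMeta() {
  // Bug: anything except the literal string "false" means noindex,
  // so an *unset* variable in production ships noindex on every page.
  const noindex = process.env.NEXT_PUBLIC_NOINDEX !== "false";
  return noindex ? <meta name="robots" content="noindex" /> : null;
}

// Safer: default to indexable; staging must opt in explicitly.
export function RobotsMetaSafe() {
  return process.env.NEXT_PUBLIC_NOINDEX === "true"
    ? <meta name="robots" content="noindex" />
    : null;
}
```

The safe version fails open to indexable, which is the failure mode you can live with.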
Marketing tool injects global noindex
Scenario: A CMS, A/B testing tool, or tag manager injects meta tags. A rule meant for a single campaign page is configured with a selector that matches everything.
Example: Campaign page set to noindex via Google Tag Manager. The trigger fires on every page because the URL filter is missing.
Impact: 100+ pages removed from search in under 48 hours. Hardest to debug because the directive is injected client-side and may not be in your repo at all.
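One quick way to confirm this pattern: compare the rendered DOM against the raw server HTML. A sketch you can paste into the DevTools console on the affected page (top-level await works there):

```ts
// Detects a robots directive present in the rendered DOM but absent
// from the raw server HTML, i.e. injected client-side by a tool.
const meta = document.querySelector('meta[name="robots"]');
const rendered = meta?.getAttribute("content") ?? "(none)";
const raw = await (await fetch(location.href)).text();
const injected = rendered !== "(none)" && !/name=["']robots["']/i.test(raw);
console.log({ rendered, injected }); // injected: true → it's not in your repo
```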
CDN or edge layer adds X-Robots-Tag header
Scenario: A Cloudflare worker, Fastly VCL rule, or origin proxy adds X-Robots-Tag: noindex based on a config that drifted.
Signals: HTML looks correct (100 KB+, all content present). No meta tag in the document. The directive lives in the response headers — invisible in DevTools' Elements panel.
Impact: Pages excluded despite "perfect" HTML. The hardest of the three to detect because every visual check passes. Use the HTTP Bot Comparison tool — it shows full response headers, not just rendered HTML.
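For illustration, a minimal Cloudflare-Worker-style sketch of how this pattern arises, assuming a hypothetical `NOINDEX` environment binding that drifted between staging and production config:

```ts
// Pattern 3 in miniature: the origin HTML is untouched; the directive
// exists only in a header added at the edge.
export default {
  async fetch(request: Request, env: { NOINDEX?: string }): Promise<Response> {
    const upstream = await fetch(request);
    // Drift: a staging-only flag copied into the production environment.
    // Every visual check still passes, because the body never changes.
    if (env.NOINDEX === "true") {
      const patched = new Response(upstream.body, upstream);
      patched.headers.set("X-Robots-Tag", "noindex");
      return patched;
    }
    return upstream;
  },
};
```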
Before vs After Deploy
The diff is one line. Everything else is identical.
Before deploy — indexable
```html
<head>
  <title>Pricing | Acme</title>
  <meta name="description" content="..." />
  <link rel="canonical" href="..." />
  <!-- no robots tag -->
</head>
```
- HTML: 84 KB
- Words: 1,847
- Status: 200
- Headers: clean
After deploy — deindexed
```html
<head>
  <title>Pricing | Acme</title>
  <meta name="description" content="..." />
  <link rel="canonical" href="..." />
  <meta name="robots" content="noindex" />
</head>
```
- HTML: 84 KB (same)
- Words: 1,847 (same)
- Status: 200 (same)
- Headers: clean
[Screenshot placeholder: terminal output of curl -I https://yoursite.com showing X-Robots-Tag: noindex in the response headers]
How to Detect It
1. Fetch raw HTML and grep
```bash
curl -sL https://yourdomain.com | grep -iE 'robots|noindex'
```
If anything matches, stop and read it. `noindex`, `nofollow`, `none` — all hard directives.
2. Check response headers
```bash
curl -sI https://yourdomain.com | grep -i x-robots-tag
```
This header is invisible in the DOM and missed by every visual inspection. Always check headers explicitly. Or use the HTTP Bot Comparison tool — it surfaces full headers for both bot and browser fetches.
3. Diff before vs after deploy
If your HTML and content look identical but indexability changed, the directive is the diff. Track which deploy introduced it.
4. Validate indexability, not rendering
If HTML size is 80 KB, visible text is 1,000+ words, but traffic drops — check directives first, not content. Content didn't change. Indexability did.
5. Cross-check with robots.txt
A noindex meta tag and a robots.txt disallow do different things. If a page is disallowed in robots.txt, Google can't even fetch it to see the noindex. Confusing the two leads to "indexed despite blocked" warnings. Use the Robots.txt Tester to validate both layers together.
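A rough way to catch the self-defeating combination automatically. A sketch for Node 18+ (the `/pricing` path is a placeholder, and the robots.txt check is deliberately naive; a real one should parse user-agent groups and wildcards):

```ts
// Flags a page that is BOTH disallowed in robots.txt and marked noindex:
// crawlers can't fetch it, so the noindex directive is never seen.
const origin = "https://yourdomain.com";
const path = "/pricing"; // placeholder URL

const robots = await (await fetch(`${origin}/robots.txt`)).text();
const disallowed = robots.split("\n").some((line) => {
  const [field, ...rest] = line.split(":");
  if (field.trim().toLowerCase() !== "disallow") return false;
  const rule = rest.join(":").trim();
  return rule.length > 0 && path.startsWith(rule);
});

const html = await (await fetch(origin + path)).text();
const hasNoindex = /name=["']robots["'][^>]*noindex/i.test(html);

if (disallowed && hasNoindex) {
  console.warn(`${path}: disallowed in robots.txt AND noindex; the noindex can never be read`);
}
```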
What to Alert On
Treat directive changes as critical. They're cheap to detect and catastrophic to miss.
| Signal | Severity | Action |
|---|---|---|
| New `noindex` on previously indexable URL | Critical | Page-level alert, block deploy if pre-merge |
| New `X-Robots-Tag` header | Critical | Page-level alert + CDN config audit |
| Canonical URL changed | Warning | Verify intentional, check for self-referencing |
| `nofollow` added to internal links | Warning | Review link equity impact |
| robots.txt disallow added | Critical | Confirm intent, validate against sitemap |
Run These Tests Now
Don't take our word for it. Check your own site in under a minute — especially after your most recent deploy.
Quick Test: What Do Bots Actually See?
Most people guess. Don't.
Run this test and look at the actual response your site returns to bots.
Fetch your page as Googlebot
Use your terminal:
curl -A "Googlebot" https://yourdomain.comLook for:
- Real visible text (not just
<div id="root">) - Meaningful content in the HTML
- Page size (should not be tiny)
Compare bot vs browser
Now test what a real browser gets:
curl -A "Mozilla/5.0" https://yourdomain.comIf these responses are different, Google is indexing a different page than your users see.
Stop guessing — measure it.
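If you want a number instead of eyeballing two responses, here is a rough word-count comparison for Node 18+. One caveat: this compares raw HTML under two user agents; catching rendering gaps (like the example below) needs a headless browser, which the Visibility Test handles for you.

```ts
// Rough word count of the HTML a given user agent receives.
async function words(url: string, ua: string): Promise<number> {
  const res = await fetch(url, { headers: { "User-Agent": ua } });
  const html = await res.text();
  return html
    .replace(/<script[\s\S]*?<\/script>/gi, " ") // drop inline JS
    .replace(/<style[\s\S]*?<\/style>/gi, " ")   // drop inline CSS
    .replace(/<[^>]+>/g, " ")                    // strip remaining tags
    .split(/\s+/)
    .filter(Boolean).length;
}

const url = "https://yourdomain.com";
console.log("Googlebot UA:", await words(url, "Googlebot"));
console.log("Browser UA:  ", await words(url, "Mozilla/5.0"));
// A large gap means bots receive a different page than your users.
```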
Real example: 253 words vs 13,547
We see this constantly. Here's a real example from production: Googlebot saw 253 words and 2 KB of HTML. A browser saw 13,547 words and 77.5 KB. Same URL — completely different content.
If your HTML doesn't contain the content, Google doesn't either.
Compare Googlebot vs browser on your site → HTTP Debug Tool
Check for common failure signals
We see this all the time in production (a scripted version follows the list):
- HTML under ~1 KB → usually an empty shell
- Visible text under ~200 characters → thin or missing content
- Missing `<title>` or `<h1>` → weak or broken page
- Large difference between bot and browser HTML → rendering issue
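Here is a sketch of those thresholds as a script, for Node 18+ (the cutoffs mirror the list above and are heuristics, not hard rules):

```ts
// Fetch the page as Googlebot and test the failure signals above.
const res = await fetch("https://yourdomain.com", {
  headers: { "User-Agent": "Googlebot" },
});
const html = await res.text();
const text = html.replace(/<[^>]+>/g, " ").replace(/\s+/g, " ").trim();

const signals = {
  emptyShell: html.length < 1024,   // HTML under ~1 KB
  thinContent: text.length < 200,   // visible text under ~200 characters
  missingTitle: !/<title[^>]*>[^<]+<\/title>/i.test(html),
  missingH1: !/<h1[\s>]/i.test(html),
};
console.table(signals); // any `true` is a failure signal
```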
Use the DataJelly Visibility Test (Recommended)
You can run this without touching curl. It shows you:
- Raw HTML returned to bots (Googlebot, Bing, GPTBot, etc.)
- Fully rendered browser version
- Side-by-side differences in word count, HTML size, links, and content
What this test tells you (no guessing)
After running this, you'll know:
- Whether your HTML is actually indexable
- Whether bots are seeing partial content
- Whether rendering is breaking in production
This is the difference between "I think SEO is set up" and "I know what Google is indexing."
If you don't understand why this happens, read: Why Google Can't See Your SPA
If this test fails
You have three real options:
- SSR: works if you can keep it stable in production
- Prerendering: breaks with dynamic content and scale
- Edge Rendering: reflects real production output without app changes
If you do nothing, you will not rank consistently. Learn how Edge Rendering works →
This issue doesn't show up in Lighthouse. It shows up in rankings.
- Page Validator: bot-readiness scan including robots directives and indexability.
- HTTP Bot Comparison: see response headers (incl. X-Robots-Tag) and HTML for bot vs browser.
- Visibility Test: run a full bot-perspective check on your homepage.
Also useful: Robots.txt Tester for crawl-level rules and HTTP Status Checker for redirects and status codes.
Pre-Deploy Checklist
Run against the homepage and 5–10 critical URLs (pricing, signup, top blog posts) before every production deploy. Fail the deploy on any hit.
HTML directives
Target: no `noindex` / `none`
- No meta robots noindex in `<head>`
- No nofollow on internal links
- Canonical URL points to self
Response headers
Target: no `X-Robots-Tag`
- No `X-Robots-Tag: noindex`
- No `X-Robots-Tag: none`
- Cache-Control sane (not `no-store` on indexable pages)
Robots.txt
Target: intentional rules only
- No new `Disallow` on indexable paths
- Sitemap URL still correct
- User-agent rules unchanged
Diff vs previous deploy
Target: 0 directive changes
- Robots tag presence unchanged
- `X-Robots-Tag` presence unchanged
- Canonical unchanged
Any failure = block the deploy. Rolling back a noindex tag is faster than rebuilding rankings.
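A minimal sketch of that gate for CI, assuming Node 18+ and placeholder URLs (swap in your real critical pages; run it with e.g. `npx tsx deploy-gate.ts`):

```ts
// Pre-deploy gate: exit non-zero if any critical URL returns a
// noindex/none directive in either the HTML or the X-Robots-Tag header.
const urls = [
  "https://yourdomain.com/",
  "https://yourdomain.com/pricing", // placeholder list: use your top URLs
];

let failed = false;
for (const url of urls) {
  const res = await fetch(url);
  const header = res.headers.get("x-robots-tag") ?? "";
  const html = await res.text();
  // Rough regexes; a production check should parse the HTML properly.
  const metaHit =
    /<meta[^>]+name=["']robots["'][^>]+content=["'][^"']*\b(noindex|none)\b/i.test(html);
  const headerHit = /\b(noindex|none)\b/i.test(header);
  if (metaHit || headerHit) {
    console.error(`BLOCK ${url}: meta=${metaHit}, header="${header || "(none)"}"`);
    failed = true;
  }
}
process.exit(failed ? 1 : 0);
```

Wire it in as a required step before the deploy job. A red check here costs minutes; a shipped noindex costs weeks.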
The Guard Approach
Guard monitors the actual HTML and headers your pages return — not just whether the server responds. After every deploy (or on a schedule), it fetches your real pages and compares them to the previous baseline.
- Noindex detection — flagged the moment a robots directive appears in HTML or headers.
- Header-level X-Robots-Tag — caught even when the HTML is identical.
- Canonical and robots.txt drift — tracked deploy-over-deploy with diffs tied to the responsible commit.
- Content regressions — word count, key sections, CTAs verified by selector.
- Page-level alerts — fires before traffic drops, not after.
Built specifically for the apps where this fails most often: React, Vite, and Lovable apps with shared layouts and tag-manager-driven markup. See how Guard works →
The takeaway
Noindex doesn't break your site — it removes it. Everything will look fine while your visibility disappears. Most teams don't catch it because they're monitoring systems, not pages. That's the gap.
DataJelly Guard closes it. Page-level monitoring of the actual HTML and headers your site returns, with deploy-tied alerts the moment a directive changes. Built for React, Vite, and Lovable apps. Coming soon — get on the early-access list.