AI crawler policy for GEO — search=yes, ai-train=no, and full Disallow

AI crawler policy determines whether ChatGPT, Perplexity, Claude, and Google can fetch your pages in the first place. Search & AI Visibility is more than schema and blog posts: it is access, extractable answers, and public measurement. Google Search does not use llms.txt for ranking; llms.txt is optional and only helps non-Google crawlers that choose to read it.

LuminaForge publishes its crawler allow-list at robots.txt and citation snapshots on the transparency dashboard. This guide explains the signals many agencies confuse.

Three layers — not one switch

Layer	What it controls	GEO impact
robots.txt `Disallow`	Whether a bot can crawl URLs at all	Full block usually prevents live retrieval and citation
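`Content-Signal: search=yes`	Permission to index pages and return links and short excerpts	Supports search visibility; under Cloudflare's Content Signals Policy, RAG-style answer use is covered by the separate `ai-input` signal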
`Content-Signal: ai-train=no`	Opt-out of model training	Does not replace an explicit allow for retrieval

Short answer: ai-train=no is not the same as welcoming AI citations. A full Disallow: / for GPTBot tells that crawler to stay off the site entirely, and no Content-Signal line can grant back access that a Disallow rule removes.
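
The precedence described above can be checked with Python's standard-library robots.txt parser. This is a minimal sketch: the robots.txt text below is an illustrative example, not LuminaForge's live file, and the helper name `is_allowed` is hypothetical. The standard parser skips directives it does not recognise, so the Content-Signal line has no effect on whether a bot may crawl.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt (an assumption, not a real site's file).
# GPTBot is fully blocked; every other bot gets the Content-Signal
# preferences plus access to everything except /admin/.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Content-Signal: search=yes, ai-train=no
Disallow: /admin/
Allow: /
"""


def is_allowed(robots_txt: str, agent: str, path: str) -> bool:
    """Return True if `agent` may crawl `path` under `robots_txt`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines; unknown directives such as
    # Content-Signal are ignored by the standard-library parser.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for agent in ("GPTBot", "PerplexityBot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, "/services"))
```

In this sketch GPTBot is refused on every path, while PerplexityBot falls through to the wildcard group and can reach public pages. The Content-Signal line expresses a usage preference only; it never appears in the crawl decision.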

What LuminaForge allows

On luminaforge.ai, public marketing routes explicitly allow:

GPTBot, OAI-SearchBot, ChatGPT-User
PerplexityBot, Perplexity-User
ClaudeBot, Claude-Web, anthropic-ai
Google-Extended, Applebot-Extended
CCBot, Bytespider, Amazonbot

Admin paths and gated client previews stay disallowed. See the live file: robots.txt.
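
An allow-list like the one above could be generated rather than hand-edited. The sketch below is an assumption about how such a file might be built, not LuminaForge's actual tooling: the private paths `/admin/` and `/preview/` and the function names are hypothetical placeholders, and the agent list is copied from this guide.

```python
from urllib.robotparser import RobotFileParser

# Agent tokens as listed in this guide.
ALLOWED_AGENTS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User",
    "PerplexityBot", "Perplexity-User",
    "ClaudeBot", "Claude-Web", "anthropic-ai",
    "Google-Extended", "Applebot-Extended",
    "CCBot", "Bytespider", "Amazonbot",
]

# Hypothetical private paths; substitute the real admin and preview routes.
PRIVATE_PATHS = ["/admin/", "/preview/"]


def build_robots_txt(agents: list[str], private_paths: list[str]) -> str:
    """Build one robots.txt group that names every agent explicitly."""
    lines = [f"User-agent: {agent}" for agent in agents]
    # Disallow rules come first so first-match parsers also honour them.
    lines += [f"Disallow: {path}" for path in private_paths]
    lines.append("Allow: /")
    return "\n".join(lines) + "\n"


def check(robots_txt: str, agent: str, path: str) -> bool:
    """Verify the generated file with the standard-library parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    print(build_robots_txt(ALLOWED_AGENTS, PRIVATE_PATHS))
```

Listing the Disallow lines before `Allow: /` keeps the result the same for parsers that stop at the first matching rule and for those that pick the longest match.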

Common mistakes

Cloudflare one-click “block AI bots”: marketing sells GEO while IT enables a managed block list. Audit robots.txt after any CDN toggle.
Yoast llms.txt without crawler access: an auto-generated llms.txt is not a substitute for allowing bots to read your pages.
Training opt-out only: ai-train=no does not invite Perplexity or ChatGPT's browsing mode to cite you; you still need fetchable HTML and open crawl rules.
Schema without performance: crawlable but slow pages lose trust signals. We verify Core Web Vitals on the transparency dashboard.
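
The audit suggested in the first mistake above can be scripted. This is a rough sketch, assuming you have already downloaded the robots.txt text; the agent list is a subset of the bots named in this guide, and `blocked_agents` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Subset of the crawler tokens named in this guide; extend as needed.
AI_AGENTS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "PerplexityBot", "ClaudeBot"]


def blocked_agents(robots_txt: str, agents=AI_AGENTS, path: str = "/") -> list[str]:
    """Return the agents that robots.txt forbids from fetching `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, path)]


if __name__ == "__main__":
    # Example of what a managed block list might leave behind (illustrative).
    sample = """\
User-agent: GPTBot
User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""
    print(blocked_agents(sample))
```

A clean result here only covers robots.txt. A CDN or firewall rule can still refuse the request at the network level, so it is worth also confirming that a public page returns a normal response to these user agents.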

How AI platforms fetch answers (why policy differs)

Platform	Typical live web use	Citation behavior
Perplexity	Real-time search active	Source links common in answers
Google AI Overviews	Grounded in Google index	Sources from indexed pages
ChatGPT	Browse / search optional	Citations vary by mode and query
Claude	Browse optional	Citations vary by query

Even when a platform leans on an existing search index, blocking AI-specific crawlers can still hurt freshness, llms.txt discovery, and brand-controlled pages you ship after indexation.

Generative share of voice: measure what access enables

After crawlers can reach your site, measure generative share of voice (SoV):

SoV (%) = (engine checks citing your domain ÷ total engine checks) × 100

LuminaForge runs a conversational query bank weekly and publishes results — including early 0% baselines — on /transparency. Client engagements receive the same reporting model.
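
The SoV formula above can be sketched in a few lines. The data shape is an assumption made for illustration: each engine check is represented as the list of URLs the engine cited for one query, and the function names are hypothetical.

```python
from urllib.parse import urlparse


def cites_domain(cited_urls: list[str], domain: str) -> bool:
    """True if any cited URL is on `domain` or one of its subdomains."""
    for url in cited_urls:
        host = (urlparse(url).hostname or "").lower()
        if host == domain or host.endswith("." + domain):
            return True
    return False


def share_of_voice(checks: list[list[str]], domain: str) -> float:
    """SoV (%) = (checks citing the domain / total checks) x 100."""
    if not checks:
        return 0.0  # no checks run yet, so report a 0% baseline
    citing = sum(1 for cited in checks if cites_domain(cited, domain))
    return citing / len(checks) * 100


if __name__ == "__main__":
    # Four hypothetical engine checks; one cites the domain.
    checks = [
        ["https://www.luminaforge.ai/transparency"],
        ["https://example.com/a"],
        [],
        ["https://example.org/b", "https://example.com/c"],
    ]
    print(share_of_voice(checks, "luminaforge.ai"))
```

Counting a check once regardless of how many of its citations point at the domain matches the formula, which measures checks rather than individual links.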

SMB checklist (5 minutes)

Open /robots.txt — are GPTBot and PerplexityBot allowed on public pages?
Confirm llms.txt and llms-full.txt exist and list services + locations.
Spot-check one service page for FAQPage JSON-LD and a plain-language FAQ block.
Run three conversational queries in Perplexity — is your brand named?
Request a free AI Visibility Snapshot if you want LuminaForge to score the full stack.
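
The FAQPage spot-check in the list above can be automated with the standard library. This is a minimal sketch under stated assumptions: it only inspects top-level JSON-LD objects (it does not walk an `@graph` array), and the class and function names are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect parsed JSON-LD blocks from <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # malformed JSON-LD is skipped, not treated as a match


def has_faq_page(html: str) -> bool:
    """True if any top-level JSON-LD object declares @type FAQPage."""
    collector = JsonLdCollector()
    collector.feed(html)
    for block in collector.blocks:
        items = block if isinstance(block, list) else [block]
        for item in items:
            if not isinstance(item, dict):
                continue
            types = item.get("@type")
            types = types if isinstance(types, list) else [types]
            if "FAQPage" in types:
                return True
    return False
```

Pass the function the HTML of one service page. A False result means either the markup is missing or it sits somewhere this sketch does not look, so confirm by viewing the page source before concluding anything.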

Next steps

How to get cited by ChatGPT and Perplexity — full implementation playbook
GEO for home services — local SMB vertical guide
Search & AI Visibility service — how LuminaForge delivers GEO with web development

Related case studies

LuminaForge.ai — the site is the case study

Related field notes

How to get cited by ChatGPT and Perplexity

What is Generative Engine Optimization (GEO) — and why it eats SEO from below

What Google’s June 2026 SEO docs mean for AEO and GEO

Let's build the site that becomes the answer.
