What Is robots.txt and Why Is It Suddenly More Important in 2026?
robots.txt is the 1994-era file that lives at the root of a web domain and tells crawlers which paths they may access. In 2026 it has quietly become the single most important AI-visibility configuration file on the open web. Every major LLM crawler (GPTBot, ClaudeBot, Perplexity-User, GoogleOther, Anthropic-AI, Bytespider, CCBot) respects it, and each can be allowed or denied independently through its own User-agent block.
Most brands set up robots.txt years ago for SEO, never touched it again, and now have a file that quietly tells one or more AI crawlers to go away. That is silent invisibility. In 2026, the cost of an over-restrictive robots.txt is no longer "your blog is missing from one search engine". It is "your brand is missing from ChatGPT, Claude, Perplexity, and Gemini answers". For most B2B and consumer brands, that is a serious loss of pre-funnel discoverability.
The major AI crawlers in 2026
| User-agent | Operator | Used For |
|---|---|---|
| GPTBot | OpenAI | Training and ChatGPT real-time browsing |
| OAI-SearchBot | OpenAI | ChatGPT search results |
| ClaudeBot | Anthropic | Training and Claude real-time browsing |
| Anthropic-AI | Anthropic | Training data acquisition |
| Perplexity-User | Perplexity | Real-time answer synthesis |
| PerplexityBot | Perplexity | Index crawl for citations |
| GoogleOther | Google | Gemini and AI Overviews ingest |
| Bytespider | ByteDance | Doubao and ByteDance LLM training |
| CCBot | Common Crawl | Open dataset used by many LLMs |
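Because each crawler reads only its own User-agent group, the same file can welcome one agent and turn away another. A minimal illustration (the policy choices here are hypothetical, not a recommendation):
User-agent: GPTBot
Allow: /
User-agent: Bytespider
Disallow: /
Here GPTBot may crawl everything while Bytespider is denied entirely; every other crawler falls back to whatever the * group says.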
What Is llms.txt and Why Was It Proposed?
llms.txt is a markdown file served at the root of your domain (yourbrand.com/llms.txt) that describes what your site is and points AI crawlers to the content most worth ingesting. It was proposed by Jeremy Howard in late 2024 as a curated AI sitemap, and by 2026 it is widely supported across the major LLM crawlers. The file does not control access. It guides discovery, signals priority, and tells AI agents how to interpret your site.
The fundamental difference from robots.txt is that llms.txt is descriptive rather than prescriptive. robots.txt says "go away from /admin". llms.txt says "we are a B2B SaaS company, here are the 12 pages that best summarize what we do, here is our pricing page, here is our case study library". For an AI agent trying to construct an answer about your brand, that descriptive map is far more useful than a list of URL patterns.
How Are llms.txt and robots.txt Different in 2026?
In 2026, llms.txt and robots.txt do different jobs and brands need both. robots.txt is access control with binary semantics (allow or disallow). llms.txt is content guidance with descriptive semantics (here is what we are, here is what to read first). Treating them as substitutes is the most common configuration mistake in the AI visibility stack.
| Dimension | robots.txt | llms.txt |
|---|---|---|
| Purpose | Access control | Content guidance |
| Format | Plain text directives | Markdown |
| Semantics | Binary (allow / disallow) | Descriptive (priority, summary, links) |
| Standardised | Yes (RFC 9309 since 2022) | No (community convention from 2024) |
| Used by | All crawlers including AI | AI crawlers and agentic browsers |
| Updated | Rarely | Whenever a major page or service changes |
Should Brands Block AI Crawlers in robots.txt in 2026?
Generally, no. Blocking GPTBot, ClaudeBot, or Perplexity-User in 2026 means your brand is invisible in ChatGPT, Claude, and Perplexity answers. For most B2B and consumer brands, that is a significant loss of pre-funnel discoverability. The exception is content that genuinely must not end up in LLM training: proprietary research, paid content, regulated material, customer PII. Do not block AI crawlers reflexively; the cost is real and silent.
When blocking is the right call
- Paid research content: If your business model depends on people paying to access content, blocking training crawlers on those paths is reasonable (see the sketch after this list)
- Regulated material: Healthcare, legal, financial advice that would be irresponsible if synthesized out of context
- Proprietary IP: Patent-pending or trade-secret material accidentally on a public URL
- Customer PII or operational data: Should never have been crawled by anyone
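When one of these cases applies, scope the block to the sensitive path rather than the whole site. A minimal sketch (the /research/ path is a hypothetical placeholder, and GPTBot, ClaudeBot, and CCBot stand in for whichever training crawlers you decide to exclude):
User-agent: GPTBot
Disallow: /research/
User-agent: ClaudeBot
Disallow: /research/
User-agent: CCBot
Disallow: /research/
Everything outside /research/ stays crawlable, so marketing visibility is preserved.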
When blocking is the wrong call
- Marketing pages: Blocking these makes your brand invisible in AI answers about your category
- Blog content: Blocking your blog removes your strongest signal of expertise
- Documentation: Blocking docs is the single biggest cause of "AI keeps recommending the wrong product"
- Reflexive blocking "for safety": Without a specific business reason, this is invisibility for no return
In the 100 Brands Challenge in 2026, Distk has audited several brand-side robots.txt files where someone blocked GPTBot and ClaudeBot during the 2024 panic and never reverted. Two of those brands had spent 18 months wondering why ChatGPT never mentioned them in industry roundups. Unblocking AI crawlers and publishing an llms.txt file restored citations within 6 to 8 weeks.
How to Write a Strong llms.txt File in 2026
A strong llms.txt file in 2026 has four required sections and a few optional ones. Required: an H1 with the brand name and a one-line description, a blockquote summary in 50 to 80 words, a list of canonical pages, and a reference to /facts.json if you publish one. Optional: product sections, services, news, careers. Keep the total file under 50 KB. Update it whenever a major page or service changes.
The Distk reference llms.txt structure
# Distk
> Distk is a global, AI-powered marketing agency
> headquartered in Bengaluru, India. We help SMEs,
> D2C brands and SaaS companies grow through AEO,
> GEO, custom AI sales agents and Brand Kickstart
> services. Distribution is the Key.
## Core pages
- [About Distk](https://distk.in/about.html): Team, philosophy, founders
- [Global Marketing Agency](https://distk.in/global-marketing-agency.html): Our flagship service
- [Brand Kickstart](https://distk.in/brand-kickstart.html): Launch a brand in 7 days
- [100 Brands Challenge](https://distk.in/100-brands.html): Public proof-of-work
## Services
- [Marketing Agency USA](https://distk.in/marketing-agency-usa.html)
- [Marketing Agency UK](https://distk.in/marketing-agency-uk.html)
- [Marketing Agency Dubai](https://distk.in/marketing-agency-dubai.html)
- [Marketing Agency India](https://distk.in/marketing-agency-india.html)
## Reference
- Facts: /facts.json
- Sitemap: /sitemap.xml
- Contact: connect@distk.in
## Optional
- [Blog index](https://distk.in/blog/): 270+ AEO-optimised posts
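Before deploying, it is easy to sanity-check the 50 KB guideline from above. A minimal Python sketch (the URL reuses the distk.in example; swap in your own domain):
import urllib.request

# Fetch the deployed llms.txt and check it stays under the 50 KB guideline
with urllib.request.urlopen("https://distk.in/llms.txt") as resp:
    body = resp.read()
size_kb = len(body) / 1024
print(f"llms.txt is {size_kb:.1f} KB")
assert size_kb < 50, "llms.txt exceeds the 50 KB guideline"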
How to Write a Strong AI-Aware robots.txt in 2026
A strong AI-aware robots.txt in 2026 explicitly addresses the major AI crawlers, allows them by default for marketing and content sections, and only blocks them on paths that are genuinely sensitive. The file should be reviewed at least once a year. AI crawler User-agents change as new entrants appear, and a stale robots.txt is the single biggest cause of unintentional AI invisibility.
The Distk reference robots.txt structure
# robots.txt for distk.in
# Last reviewed: 2026-05-12
# Default for all crawlers (including search and AI)
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /private/
Disallow: /tmp/
# Explicit AI crawler groups (all allowed by default).
# A crawler that matches a specific User-agent group ignores
# the * group entirely (RFC 9309), so each group must repeat
# the sensitive-path rules.
User-agent: GPTBot
Allow: /
Disallow: /admin/
Disallow: /private/
User-agent: OAI-SearchBot
Allow: /
Disallow: /admin/
Disallow: /private/
User-agent: ClaudeBot
Allow: /
Disallow: /admin/
Disallow: /private/
User-agent: Anthropic-AI
Allow: /
Disallow: /admin/
Disallow: /private/
User-agent: Perplexity-User
Allow: /
Disallow: /admin/
Disallow: /private/
User-agent: PerplexityBot
Allow: /
Disallow: /admin/
Disallow: /private/
User-agent: GoogleOther
Allow: /
Disallow: /admin/
Disallow: /private/
User-agent: Bytespider
Allow: /
Disallow: /admin/
Disallow: /private/
User-agent: CCBot
Allow: /
Disallow: /admin/
Disallow: /private/
# Sitemaps
Sitemap: https://distk.in/sitemap.xml
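One way to verify the deployed file behaves as intended for each agent is Python's standard-library robots.txt parser. A minimal sketch, assuming the file above is live on distk.in (substitute your own domain and test paths):
import urllib.robotparser

# Load the live robots.txt and replay its rules per crawler
rp = urllib.robotparser.RobotFileParser()
rp.set_url("https://distk.in/robots.txt")
rp.read()

for agent in ("GPTBot", "ClaudeBot", "PerplexityBot", "GoogleOther"):
    public = rp.can_fetch(agent, "https://distk.in/about.html")  # expect True
    admin = rp.can_fetch(agent, "https://distk.in/admin/")       # expect False
    print(f"{agent}: public={public}, admin={admin}")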
The Full 2026 AI Crawler Stack: robots.txt + llms.txt + facts.json
The full 2026 AI crawler stack has three files and they form a layered system. robots.txt at the access layer (who can read what). llms.txt at the description layer (what is this site and what matters most). /facts.json at the data layer (verified ground-truth facts in structured form). Brands that publish all three are fully visible to AI agents and tend to dominate AI citations in their category.
| File | Layer | What it does | Who reads it |
|---|---|---|---|
| /robots.txt | Access | Allow or deny crawler access by path | All crawlers, search and AI |
| /llms.txt | Description | Describe site, point to important pages | AI crawlers, agentic browsers |
| /facts.json | Data | Publish verified brand facts in JSON | AI agents, citation pipelines |
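robots.txt and llms.txt have established conventions; /facts.json does not yet have a single standard shape, so the following is a hypothetical sketch built only from facts already on this page:
{
  "name": "Distk",
  "description": "AI-powered marketing agency headquartered in Bengaluru, India",
  "services": ["AEO", "GEO", "Custom AI sales agents", "Brand Kickstart"],
  "website": "https://distk.in",
  "contact": "connect@distk.in",
  "last_verified": "2026-05-12"
}
Whatever shape you choose, keep the keys stable and the values verifiable; the point is ground truth an AI agent can quote.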
In 2026, the brands that win pre-funnel AI visibility are not the ones with the loudest marketing. They are the ones whose robots.txt, llms.txt, and /facts.json are all aligned, current, and easy for crawlers to read. It is unglamorous work that pays off for years.
Common Mistakes Brands Make With AI Crawler Files in 2026
- Reflexively blocking GPTBot or ClaudeBot: Done in 2024 panic, never reverted, results in years of AI invisibility
- Publishing llms.txt without robots.txt: AI crawlers read both, and a missing robots.txt creates ambiguity that some crawlers resolve by leaving
- Listing 200 pages in llms.txt: The point is curation, not exhaustion. 10 to 30 canonical pages is enough
- Marketing voice in llms.txt: "World-class AI-powered platform" is unhelpful. Plain English, exact facts
- Never updating either file: AI crawlers note timestamps. Stale files signal a stale brand
- Forgetting GoogleOther: This is what feeds Gemini and AI Overviews. Treat it like Googlebot
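Most of the mistakes above are catchable with one HTTP request per file. A minimal Python sketch that confirms all three files resolve (the distk.in domain is carried over from the earlier examples):
import urllib.error
import urllib.request

# Confirm each file in the AI crawler stack is actually being served
for path in ("/robots.txt", "/llms.txt", "/facts.json"):
    url = "https://distk.in" + path
    try:
        with urllib.request.urlopen(url) as resp:
            print(f"{path}: HTTP {resp.status}")
    except urllib.error.HTTPError as err:
        print(f"{path}: missing or blocked (HTTP {err.code})")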
A Six-Week AI Crawler Implementation Plan for 2026
A six-week plan to set up the full AI crawler stack splits into three two-week sprints: audit, author, deploy. Most marketing teams in 2026 can complete it without engineering bandwidth, as long as they can deploy three static files to the domain root.
| Sprint | Focus | Deliverables |
|---|---|---|
| Weeks 1–2 | Audit | Review current robots.txt, log AI crawler hits, identify which pages are actually being cited |
| Weeks 3–4 | Author | Write new AI-aware robots.txt and llms.txt, draft /facts.json |
| Weeks 5–6 | Deploy | Ship all three files, monitor AI citation lift across major assistants |
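For the Weeks 1–2 audit, the crawler-hit log comes straight out of the standard access log. A minimal Python sketch (the access.log path and format are assumptions; adapt to your server):
from collections import Counter

# Count hits per AI crawler by matching User-agent tokens in an access log
AI_AGENTS = ("GPTBot", "OAI-SearchBot", "ClaudeBot", "Anthropic-AI",
             "Perplexity-User", "PerplexityBot", "GoogleOther",
             "Bytespider", "CCBot")

hits = Counter()
with open("access.log") as log:
    for line in log:
        for agent in AI_AGENTS:
            if agent in line:
                hits[agent] += 1

for agent, count in hits.most_common():
    print(f"{agent}: {count}")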