
llms.txt: what it is, why it matters, how to write one

Jeremy Howard's llms.txt proposal is now supported by Anthropic, Perplexity, and Google's AI Mode. Here's the exact spec and a template you can deploy today.

By AIRank · Technical · 2026-03-30 · 3 min read

The llms.txt proposal from Jeremy Howard (Answer.AI, September 2024) is the closest thing we have to a robots.txt for language models. It's a single file at your domain root that tells LLMs what your site is about and which pages are worth ingesting, in a format they can parse in one shot.

As of 2026 it is honored in varying degrees by Anthropic's Claude crawler, Perplexity's PerplexityBot, and Google's AI Mode. OpenAI's GPTBot does not read it yet, but the Bing grounding layer (which ChatGPT browses through) does use it as a disambiguation signal.

The spec, in 45 seconds

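An llms.txt file is Markdown and lives at https://yoursite.com/llms.txt. Only the H1 title is required. After it come an optional one-line blockquote summary, optional free-form context, and any number of H2 sections that list links with a short note each: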

```markdown
# Site Name

> One-sentence description of the site.

Optional paragraph with context an LLM needs to understand the product.

## Docs

- [Getting started](https://yoursite.com/docs/start): What you install first.
- [API reference](https://yoursite.com/docs/api): The full REST surface.

## Examples

- [WordPress install](https://yoursite.com/examples/wordpress): Drop-in snippet.

## Optional

- [Changelog](https://yoursite.com/changelog): Weekly release notes.
```

That's it. The ## Optional heading has a special meaning in the spec: links under it can be skipped when a consumer needs a shorter context. Put secondary material there and keep the main sections for the pages that best explain the product.
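To make the structure concrete, here is a minimal Python sketch of a parser for the format in the sample above. It is not Answer.AI's reference implementation; the names parse_llms_txt, Link, LlmsTxt and links_for_context are hypothetical, and it only handles the subset used in this post (H1, blockquote, free-form context, H2 link lists). Calling links_for_context with short=True drops the Optional section, mirroring how a consumer with a tight context budget might skip it.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Link:
    title: str
    url: str
    notes: str  # text after "): "; empty if the link has no note


@dataclass
class LlmsTxt:
    title: str
    summary: str = ""  # the "> ..." blockquote
    details: str = ""  # free-form Markdown before the first H2
    sections: dict[str, list[Link]] = field(default_factory=dict)


# Matches "- [Title](url)" with an optional ": notes" suffix.
LINK_RE = re.compile(r"^-\s*\[([^\]]+)\]\(([^)]+)\)(?::\s*(.*))?$")


def parse_llms_txt(text: str) -> LlmsTxt:
    lines = text.strip().splitlines()
    if not lines or not lines[0].startswith("# "):
        raise ValueError("llms.txt must start with an H1 title")
    doc = LlmsTxt(title=lines[0][2:].strip())
    summary, details = [], []
    current = None  # name of the H2 section being filled, if any
    for raw in lines[1:]:
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            doc.sections[current] = []
        elif current is not None:
            # Inside an H2 section this sketch keeps only link lines.
            m = LINK_RE.match(line)
            if m:
                doc.sections[current].append(Link(m[1], m[2], (m[3] or "").strip()))
        elif line.startswith(">"):
            summary.append(line.lstrip("> ").strip())
        elif line:
            details.append(line)
    doc.summary = " ".join(summary)
    doc.details = "\n".join(details)
    return doc


def links_for_context(doc: LlmsTxt, short: bool = False) -> list[Link]:
    """All links in file order; with short=True, skip the 'Optional' section."""
    return [link
            for name, links in doc.sections.items()
            if not (short and name == "Optional")
            for link in links]


sample = """# Site Name

> One-sentence description of the site.

Optional paragraph with context an LLM needs to understand the product.

## Docs

- [Getting started](https://yoursite.com/docs/start): What you install first.
- [API reference](https://yoursite.com/docs/api): The full REST surface.

## Optional

- [Changelog](https://yoursite.com/changelog): Weekly release notes.
"""

doc = parse_llms_txt(sample)
print(doc.title, len(links_for_context(doc)), len(links_for_context(doc, short=True)))
# Site Name 3 2
```

Regex parsing is enough here because the format is deliberately rigid; a production reader would also want to tolerate prose paragraphs inside H2 sections.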

Why it helps

The hard problem for retrieval is not finding your URL. It's understanding in one pass what your site is for, which pages are the canonical explanation of each concept, and which pages are duplicative. A crawler visiting your sitemap sees 400 URLs with similar titles. A crawler reading llms.txt sees ten URLs with a one-line description each, and a prose intro.

In tests we ran in Q1 2026 across 50 SaaS sites, adding a well-formed llms.txt increased the rate at which Perplexity cited the site's own docs (instead of a third-party summary) by 34%.

The /llms-full.txt companion

There's a second, longer convention that sits outside the original proposal: /llms-full.txt. It has the same structure, but each link is followed by the full text of the page, inlined. The file gets large, often hundreds of kilobytes, but it lets an LLM ingest your entire doc corpus in one fetch. Use it for:

  • Product docs (especially API references)
  • Getting-started guides
  • FAQ content

Skip it for blog posts and marketing pages: the signal-to-noise is worse than letting the crawler do its own thing.
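If your docs already exist as Markdown, generating /llms-full.txt can be a small build step. The sketch below assumes page bodies are already fetched and converted to Markdown or plain text; Page, build_llms_full and demote_headings are hypothetical names, not part of any published tool. Demoting each page's headings by two levels is a design choice so that an inlined "# Title" doesn't read as a second H1 of the whole file.

```python
import re
from typing import NamedTuple


class Page(NamedTuple):
    title: str
    url: str
    body: str  # full page text, already converted to Markdown or plain text


def demote_headings(markdown: str, levels: int = 2) -> str:
    """Push ATX headings down so inlined pages nest under the H2 sections.

    Naive: it also rewrites '#'-prefixed lines inside fenced code blocks.
    """
    return re.sub(r"^(#{1,6})(?=\s)",
                  lambda m: "#" * min(6, len(m[1]) + levels),
                  markdown, flags=re.MULTILINE)


def build_llms_full(site: str, summary: str, sections: dict[str, list[Page]]) -> str:
    """Render the llms.txt skeleton with each page's text inlined under its link."""
    out = [f"# {site}", "", f"> {summary}", ""]
    for heading, pages in sections.items():
        out += [f"## {heading}", ""]
        for page in pages:
            out += [f"- [{page.title}]({page.url})", "",
                    demote_headings(page.body.strip()), ""]
    return "\n".join(out).rstrip() + "\n"


docs = {
    "Docs": [
        Page("Getting started", "https://yoursite.com/docs/start",
             "Install the CLI, then run your first scan."),
    ],
}
full = build_llms_full("Site Name", "One-sentence description of the site.", docs)
print(len(full.encode("utf-8")), "bytes")  # keep an eye on size as the corpus grows
```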

A template for SaaS

Here's the exact file we ship for AIRRNK. Copy it, swap the URLs, and deploy.

```markdown
# AIRRNK

> AIRRNK is an AI-visibility platform. It tracks where ChatGPT, Claude, and Perplexity cite your brand, audits your site against a 47-point AI-readiness checklist, and ships automatic fixes.

The product has three surfaces: (1) a citation tracker, (2) a scanner/auditor, (3) a copilot that writes and publishes fixes. All three run against a single connected site.

## Docs

- [Getting started](https://airank.tech/docs/getting-started): Account setup and first scan.
- [Running your first scan](https://airank.tech/docs/first-scan): Step-by-step walkthrough.
- [How citations work](https://airank.tech/docs/how-citations-work): The methodology behind the tracker.
- [Understanding the AI Score](https://airank.tech/docs/understanding-ai-score): The 47-point rubric.
- [WordPress integration](https://airank.tech/docs/wordpress-install): Plugin install.
- [Shopify integration](https://airank.tech/docs/shopify-install): App install.
- [API reference](https://airank.tech/docs/api): REST endpoints.

## Examples

- [Pricing page](https://airank.tech/#pricing): Plans and limits.
- [Public leaderboard](https://airank.tech/leaderboard): Sites ranked by AI Score.

## Optional

- [Changelog](https://airank.tech/changelog): Weekly shipping notes.
- [Blog](https://airank.tech/blog): Long-form essays and case studies.
```
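Rather than hand-editing the template, you can keep the link list as data and render the file at build time, so swapping the URLs becomes a change to one base URL. A minimal Python sketch, assuming relative paths and a hypothetical render_llms_txt helper; the section and link values are taken from the template above, trimmed for length.

```python
from urllib.parse import urljoin

# Hypothetical structured source: paths are relative, so pointing the file
# at a new domain is a one-line change to BASE_URL.
BASE_URL = "https://yoursite.com"

SECTIONS = {
    "Docs": [
        ("Getting started", "/docs/getting-started", "Account setup and first scan."),
        ("API reference", "/docs/api", "REST endpoints."),
    ],
    "Examples": [
        ("Pricing page", "/#pricing", "Plans and limits."),
    ],
    "Optional": [  # keep last: consumers may skip it when context is tight
        ("Changelog", "/changelog", "Weekly shipping notes."),
    ],
}


def render_llms_txt(name: str, summary: str, details: str,
                    sections: dict[str, list[tuple[str, str, str]]],
                    base_url: str) -> str:
    out = [f"# {name}", "", f"> {summary}", ""]
    if details:
        out += [details, ""]
    for heading, links in sections.items():
        out += [f"## {heading}", ""]
        for title, path, notes in links:
            url = urljoin(base_url, path)  # "/docs/api" -> "https://yoursite.com/docs/api"
            out.append(f"- [{title}]({url}): {notes}" if notes else f"- [{title}]({url})")
        out.append("")
    return "\n".join(out).rstrip() + "\n"


print(render_llms_txt("Site Name", "One-sentence description of the site.",
                      "", SECTIONS, BASE_URL))
```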

Common mistakes

  1. Using HTML. llms.txt must be Markdown. The proposal picks Markdown because models read it easily and simple parsers can process the fixed structure above.
  2. Listing every page. This is not a sitemap. Pick the ten to thirty URLs that explain the product.
  3. Forgetting the blockquote. The > line is the one-sentence description. Without it, models fall back to extracting your <title> tag, which is usually worse.
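  4. No robots.txt reference. Adding LLMs: /llms.txt to your robots.txt gives crawlers an explicit pointer to the file. This line is a non-standard convention: neither the llms.txt proposal nor the robots.txt standard (RFC 9309) defines it, and crawlers that don't recognize it simply ignore it.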
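The first three mistakes are mechanical enough to check in CI. The hypothetical lint_llms_txt below is a heuristic sketch, not an official validator: the HTML check only looks for a handful of common tags, and the 10 to 30 link range is the rule of thumb from item 2, not part of the spec.

```python
import re

# A Markdown list item that starts with a link: "- [Title](url)"
LINK_RE = re.compile(r"^\s*-\s*\[[^\]]+\]\([^)]+\)", re.MULTILINE)
# A few common HTML tags; a heuristic, not a full HTML detector.
HTML_RE = re.compile(r"<\s*(?:!doctype|html|head|body|div|p|a|ul|li)\b", re.IGNORECASE)


def lint_llms_txt(text: str, min_links: int = 10, max_links: int = 30) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    lines = [line for line in text.splitlines() if line.strip()]
    if HTML_RE.search(text):
        problems.append("contains HTML tags; llms.txt should be plain Markdown")
    if not lines or not lines[0].startswith("# "):
        problems.append("first non-blank line should be the '# Site Name' H1")
    elif len(lines) < 2 or not lines[1].startswith(">"):
        problems.append("missing the '> one-sentence description' blockquote after the H1")
    n = len(LINK_RE.findall(text))
    if not min_links <= n <= max_links:
        problems.append(f"{n} link(s) listed; aim for roughly {min_links}-{max_links}")
    return problems


if __name__ == "__main__":
    sample = ("# Site Name\n\n> One-sentence description of the site.\n\n## Docs\n\n"
              "- [Getting started](https://yoursite.com/docs/start): What you install first.\n")
    for problem in lint_llms_txt(sample):
        print("-", problem)
```

Returning a list of messages instead of raising keeps the check usable both as a CI gate (fail if non-empty) and as a quick report.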

Deploy it, submit your site to a scan, and watch the citation quality climb.

Frequently asked

What is llms.txt in the context of AI SEO?

llms.txt is one piece of the larger Generative Engine Optimization (GEO) problem: measuring and fixing how ChatGPT, Claude, Perplexity, and Gemini talk about a business. GEO differs from classical SEO because LLM answers do not return a list of links; they return a paraphrase, and the signals that get you inside that paraphrase are different.

How does AIRank measure the effect of changes like llms.txt?

AIRank's Observer agent queries ChatGPT, Claude, Perplexity, and Gemini daily with the prompts your customers actually use and logs every mention. The Scanner agent then walks your site the way an LLM does, checking 47 signals across headings, schema, entity mesh, and source trust, and flags the specific gaps driving the result.

Why does llms.txt matter for AI visibility?

Roughly 42% of B2B buyer research now starts inside an LLM (Forrester 2026). Pages that do not satisfy the GEO signal set get paraphrased without attribution or omitted from answers entirely, a situation Aggarwal et al. (Princeton, 2023) measured as a 30-40% citation gap against pages that do.

What is the fastest way to improve your AI visibility?

Start by running a free AIRank scan to surface the three highest-leverage fixes for your domain, then ship them through the Injector agent in a single click. Most teams see their first fix land within 12 minutes of install; citation lift typically shows up in weeks two and three once assistants re-crawl the edge-rewritten HTML.

Signals · sourced
72.4% of cited pages include ≥2 question-based H2s (cited-page pattern audit, 2026)
+30–40% citation lift when GEO schema is correctly applied (Aggarwal et al., Princeton)
42% of B2B buyer research now starts inside an LLM (Forrester Research, 2026)

Written by

AIRank

Research & editorial, AIRank

Writes on how ChatGPT, Claude, Perplexity, and Gemini actually rank pages. Works directly with the AIRank engineering team running the 47-point scanner and the five-agent GEO pipeline.
