llms.txt
A Markdown file at a site's root that tells AI crawlers what the site is about and which URLs are canonical — a kind of robots.txt for language models.
llms.txt is a file format proposed by Jeremy Howard (Answer.AI) in September 2024. It lives at `https://yoursite.com/llms.txt` and is formatted as Markdown: an H1 with the site name (the only required element), a short blockquote summary, optional context, and a curated list of important URLs grouped under H2 sections.
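As a rough illustration of that layout, the sketch below renders a minimal llms.txt from a handful of fields. The function name `render_llms_txt`, the example store, and its URLs are hypothetical and chosen only for illustration; the structure (H1 name, blockquote summary, H2 sections of links) follows the public proposal.

```python
from typing import Dict, List, Tuple

# (title, url, note) for each link; the note may be an empty string.
Link = Tuple[str, str, str]


def render_llms_txt(name: str, summary: str, sections: Dict[str, List[Link]],
                    context: str = "") -> str:
    """Render an llms.txt body: H1, blockquote summary, optional context, H2 link lists."""
    lines = [f"# {name}", "", f"> {summary}", ""]
    if context:
        lines += [context, ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            suffix = f": {note}" if note else ""
            lines.append(f"- [{title}]({url}){suffix}")
        lines.append("")
    # Collapse trailing blank lines to a single final newline.
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical site data, not taken from any real deployment.
example = render_llms_txt(
    name="Example Store",
    summary="Hypothetical shop selling refurbished laptops.",
    sections={
        "Docs": [("Returns policy", "https://example.com/returns", "30-day window")],
        "Optional": [("Blog", "https://example.com/blog", "")],
    },
)
print(example)
```

The resulting string would be served as plain text at the site root; how it is kept in sync with the site's content is left out of this sketch.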
The goal is to give an LLM a single shot at understanding what a site is for. Crawling a full sitemap and inferring canonical structure from titles is lossy. Reading a 300-word llms.txt that explicitly says 'here's the product and here are the ten URLs that explain it' is not.
It has a companion format, `/llms-full.txt`, which inlines the full text of each linked page. This is expensive in bytes, but it lets a model read the whole documentation set in a single request, which makes it best suited to documentation-heavy sites.
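One way to picture the inlining step is the sketch below, which swaps each link bullet in an llms.txt for the text of the page it points to. There is no single agreed layout for the full file, so the `###` heading per page, the function name `build_llms_full`, and the `page_text` mapping (URL to already-extracted page text) are all assumptions made for illustration.

```python
import re
from typing import Dict

# Matches a Markdown link bullet such as "- [Title](https://example.com/page): note".
LINK_RE = re.compile(r"^- \[([^\]]+)\]\(([^)]+)\)")


def build_llms_full(llms_txt: str, page_text: Dict[str, str]) -> str:
    """Replace each link bullet with the linked page's text; keep every other line."""
    out = []
    for line in llms_txt.splitlines():
        match = LINK_RE.match(line)
        if match is None:
            out.append(line)
            continue
        title, url = match.group(1), match.group(2)
        body = page_text.get(url)
        if body is None:
            # No text available for this URL: keep the original link rather than drop it.
            out.append(line)
        else:
            out += [f"### {title}", "", body.strip(), ""]
    return "\n".join(out).rstrip() + "\n"


# Hypothetical inputs for illustration only.
index = (
    "# Example Store\n\n> Hypothetical shop.\n\n## Docs\n\n"
    "- [Returns](https://example.com/returns): 30-day window\n"
    "- [Blog](https://example.com/blog)\n"
)
pages = {"https://example.com/returns": "Items can be returned within 30 days.\n"}
full = build_llms_full(index, pages)
print(full)
```

Fetching and cleaning the page text is deliberately left outside the function, since that step depends on how the site is built.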
In AIRank
AIRank checks for a well-formed llms.txt on every scan. It's a 5-point line item on the 47-point rubric. The WordPress plugin and Shopify app can both auto-generate the file, serve it for you, and keep it in sync with your content.
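What "well-formed" means for the scanner is not spelled out here, so the following is only a guess at the kind of structural check involved. It looks for an H1 title, a blockquote summary, at least one H2 section, and at least one link bullet with an absolute URL. The function name `check_llms_txt` and the exact rules are assumptions, and the check is stricter than the proposal itself, which only requires the H1.

```python
import re
from typing import List

# A link bullet with an absolute http(s) URL, e.g. "- [Docs](https://example.com/docs)".
LINK_RE = re.compile(r"^- \[[^\]]+\]\((https?://[^)\s]+)\)")


def check_llms_txt(text: str) -> List[str]:
    """Return a list of problems; an empty list means the basic shape is present."""
    non_blank = [ln.rstrip() for ln in text.splitlines() if ln.strip()]
    problems = []
    if not non_blank or not non_blank[0].startswith("# "):
        problems.append("first non-blank line is not an H1 title")
    if not any(ln.startswith("> ") for ln in non_blank):
        problems.append("no blockquote summary")
    if not any(ln.startswith("## ") for ln in non_blank):
        problems.append("no H2 section")
    if not any(LINK_RE.match(ln) for ln in non_blank):
        problems.append("no link list item with an absolute URL")
    return problems


# Hypothetical file contents for illustration.
good = "# Site\n\n> One-line summary.\n\n## Docs\n\n- [Guide](https://example.com/guide)\n"
print(check_llms_txt(good))
print(check_llms_txt("## Docs\n- [Guide](/guide)\n"))
```

Returning a list of problems rather than a single boolean makes it easier to report which part of the file needs fixing.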
- Generative Engine Optimization
The practice of making a website more likely to be cited by AI answer engines (ChatGPT, Claude, Perplexity, Google AI Mode) rather than simply ranked on a traditional search results page.
- Schema Markup
Structured data embedded in a page (usually as JSON-LD) that describes what the page is about in a machine-readable vocabulary defined at schema.org.
- Indexing API
An API — most commonly IndexNow (Bing/Yandex) or Google's Indexing API — that pushes URL changes to a search index in real time instead of waiting for crawlers.
What is llms.txt in the context of AI SEO?
llms.txt addresses one piece of the larger Generative Engine Optimization (GEO) problem: measuring and fixing how ChatGPT, Claude, Perplexity, and Gemini talk about a business. GEO differs from classical SEO because LLM answers do not return a list of links; they return a paraphrase, and the signals that get a site cited inside that paraphrase are different.
How does AIRank measure llms.txt?
AIRank's Observer agent queries ChatGPT, Claude, Perplexity, and Gemini daily with the prompts your customers actually use and logs every mention. The Scanner agent then walks your site the way an LLM does — 47 signals across headings, schema, entity mesh, and source trust — and flags the specific gaps driving the result.
Why does llms.txt matter for AI visibility?
Roughly 42% of B2B buyer research now starts inside an LLM (Forrester 2026). Pages that do not satisfy the GEO signal set get paraphrased without attribution or omitted from answers entirely — a situation Aggarwal et al. (Princeton, 2023) measured as a 30-40% citation gap against pages that do.
What is the fastest way to improve llms.txt?
Start by running a free AIRank scan to surface the three highest-leverage fixes for your domain, then ship them through the Injector agent in a single click. Most teams see their first fix land within 12 minutes of install; citation lift typically shows up in weeks two and three once assistants re-crawl the edge-rewritten HTML.
Written by
The AIRank Editorial Team
Research & editorial, AIRank
The AIRank editorial team runs the 47-point scanner, the Observer pings, and the GEO research programme every week. Writing is reviewed by the core engineers who build the Injector, Blaster, and Surgeon agents.