llms.txt

A Markdown file at a site's root that tells AI crawlers what the site is about and which URLs are canonical. Think of it loosely as a robots.txt for language models, except that it guides them rather than restricting access.

llms.txt is a file format proposed by Jeremy Howard (Answer.AI) in September 2024. It lives at `https://yoursite.com/llms.txt` and is written in Markdown. It contains the site name as an H1, a one-sentence description in a blockquote, optional free-form context, and curated lists of important URLs grouped under H2 sections.
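
The layout described above can be sketched as a small Python generator. This is a minimal sketch that assumes you already know which URLs matter. `LlmsTxt`, `Link`, and the example.com site are hypothetical names used for illustration; they are not part of the proposal or of any library. Each link is rendered as `- [Title](url)` with an optional `: note`, following the proposal's link-list style.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    title: str
    url: str
    note: str = ""  # optional text rendered after the link as ": note"


@dataclass
class LlmsTxt:
    name: str  # rendered as the H1
    summary: str  # rendered as a blockquote directly under the H1
    context: str = ""  # optional free-form Markdown before the sections
    sections: dict[str, list[Link]] = field(default_factory=dict)

    def render(self) -> str:
        """Return the llms.txt body as Markdown, ending in a single newline."""
        lines = [f"# {self.name}", "", f"> {self.summary}", ""]
        if self.context:
            lines += [self.context, ""]
        for heading, links in self.sections.items():
            lines += [f"## {heading}", ""]
            for link in links:
                entry = f"- [{link.title}]({link.url})"
                if link.note:
                    entry += f": {link.note}"
                lines.append(entry)
            lines.append("")
        return "\n".join(lines).rstrip() + "\n"


# Hypothetical site, used only for illustration.
doc = LlmsTxt(
    name="Example Co",
    summary="Example Co sells widgets for small workshops.",
    sections={
        "Docs": [
            Link("Quickstart", "https://example.com/docs/quickstart", "set up in five minutes"),
        ],
        "Optional": [Link("Changelog", "https://example.com/changelog")],
    },
)
print(doc.render())
```

Putting secondary links under an H2 named `Optional` follows the proposal's convention that those URLs may be skipped when a shorter context is needed.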

The goal is to give an LLM a single shot at understanding what a site is for. Crawling a full sitemap and inferring canonical structure from page titles is lossy. Reading a 300-word llms.txt that says outright 'here's the product, and here are the ten URLs that explain it' is far less so.

It has a companion format, `/llms-full.txt`, which inlines the full text of every linked page into one file. That makes it far larger than llms.txt, but it is the preferred format for Claude's retrieval and gives documentation-heavy sites the largest citation uplift.
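
One way to picture how `/llms-full.txt` relates to llms.txt is to take the curated link list and replace each link with the page's full text. The sketch below assumes the pages have already been fetched and converted to text, and passes them in as the `pages` dict. `build_llms_full`, the per-page `##` heading, and the `Source:` line are hypothetical choices; this layout is an assumption, not taken from a published specification.

```python
import re

# A list item of the form "- [Title](https://url)", optionally followed by ": note".
LINK_RE = re.compile(r"^\s*[-*]\s+\[([^\]]+)\]\(([^)\s]+)\)", re.MULTILINE)


def build_llms_full(llms_txt: str, pages: dict[str, str]) -> str:
    """Inline the text of every page linked from an llms.txt file.

    `pages` maps URL -> already-fetched page text. Fetching and HTML-to-Markdown
    conversion are out of scope; links with no entry in `pages` are skipped.
    """
    # Keep the llms.txt header (H1, summary, context) up to the first H2.
    header = llms_txt.split("\n## ", 1)[0].rstrip()
    parts = [header]
    seen: set[str] = set()
    for title, url in LINK_RE.findall(llms_txt):
        if url in seen or url not in pages:
            continue  # skip duplicate links and pages we have no text for
        seen.add(url)
        parts.append(f"## {title}\n\nSource: {url}\n\n{pages[url].strip()}")
    return "\n\n".join(parts) + "\n"


index = """# Example Co

> Example Co sells widgets for small workshops.

## Docs

- [Quickstart](https://example.com/docs/quickstart): set up in five minutes
- [API reference](https://example.com/docs/api)
"""
pages = {"https://example.com/docs/quickstart": "Install the widget, then run `widget init`."}
print(build_llms_full(index, pages))
```

The byte cost mentioned above comes straight from this design: the output grows with the total length of every linked page, while llms.txt stays the size of its link list.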

In AIRRNK

AIRRNK checks for a well-formed llms.txt on every scan. It's a 5-point line item on the 47-point rubric. The WordPress plugin and the Shopify app can both generate and serve the file automatically and keep it in sync with your content.
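
What "well-formed" might mean can be sketched as a structural check. This is a hypothetical example modelled on the proposal's layout, not AIRRNK's actual scoring logic; `check_llms_txt` and its messages are illustrative names. The proposal strictly requires only the H1, so flagging a missing blockquote summary is this sketch's stricter assumption.

```python
import re

LIST_ITEM = re.compile(r"[-*]\s")
LINK_ITEM = re.compile(r"[-*]\s+\[[^\]]+\]\([^)\s]+\)(:\s*.*)?$")


def check_llms_txt(text: str) -> list[str]:
    """Return structural problems found in an llms.txt body; [] means well-formed.

    Hypothetical checks modelled on the llms.txt proposal's layout,
    not AIRRNK's actual rubric logic.
    """
    problems = []
    content = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not content or not content[0].startswith("# "):
        problems.append("first line must be an H1 with the site name")
    if sum(1 for ln in content if ln.startswith("# ")) > 1:
        problems.append("more than one H1")
    if len(content) < 2 or not content[1].startswith(">"):
        problems.append("no blockquote summary directly after the H1")

    section, links = None, 0
    for ln in content + ["## (end)"]:  # sentinel heading flushes the last section
        if ln.startswith("## "):
            if section is not None and links == 0:
                problems.append(f"section '{section}' has no links")
            section, links = ln[3:], 0
        elif section is not None and LIST_ITEM.match(ln):
            if LINK_ITEM.match(ln):
                links += 1
            else:
                problems.append(f"list item is not a Markdown link: {ln}")
    return problems


good = """# Example Co

> Example Co sells widgets for small workshops.

## Docs

- [Quickstart](https://example.com/docs/quickstart): set up in five minutes
"""
print(check_llms_txt(good))  # prints []
print(check_llms_txt("## Docs\n\n- Quickstart page\n"))
```

Returning a list of problems rather than a single boolean makes it easy to show a site owner exactly what to fix.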

Signals · sourced
72.4% of cited pages include ≥2 question-based H2s · Cited-page pattern audit, 2026
+30–40% citation lift when GEO schema is correctly applied · Aggarwal et al., Princeton
42% of B2B buyer research now starts inside an LLM · Forrester Research, 2026

Written by

The AIRank Editorial Team

Research & editorial, AIRank

The AIRank editorial team runs the 47-point scanner, the Observer pings, and the GEO research programme every week. Writing is reviewed by the core engineers who build the Injector, Blaster, and Surgeon agents.
