You have a robots.txt. You probably have a sitemap.xml. Now there's a third file AI crawlers are looking for, and most websites don't have it yet.
llms.txt is an emerging standard that gives AI assistants a plain-language summary of your site: what it is, who it's for, and which pages matter most. Think of it as a README for language models: a concise brief that helps ChatGPT, Perplexity, and other AI systems understand your content structure without having to crawl every page.
The file takes about five minutes to create. Here's exactly how to do it.
What llms.txt is (and isn't)
llms.txt was proposed by Jeremy Howard in 2024 as a lightweight convention. It is not an official standard yet, but it is increasingly supported by AI platforms. It lives at the root of your domain (https://yourdomain.com/llms.txt) and contains a brief, structured description of your site in Markdown format.
It is not a replacement for robots.txt. robots.txt controls crawler access (what bots are allowed to crawl). llms.txt controls context (what AI systems should understand about your site). The two files serve different purposes, and you need both.
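To make the access side of that distinction concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt rules and the `yourdomain.com` URL are made up for illustration, and `GPTBot` stands in for any AI crawler's user agent. Note that this parser matches rules by path prefix and, as far as I know, does not expand wildcard patterns such as `/*.txt`, so treat it as a rough check rather than a model of how every crawler reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

LLMS_URL = "https://yourdomain.com/llms.txt"  # placeholder domain


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# robots.txt answers "may this bot fetch the file?"; it says nothing about
# what the site is about. That part is the job of llms.txt.
print(may_fetch(ROBOTS_TXT, "GPTBot", LLMS_URL))  # prints True
print(may_fetch("User-agent: *\nDisallow: /llms.txt\n", "GPTBot", LLMS_URL))  # prints False
```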
Why it matters for citations: When a user asks ChatGPT "what is [topic]?", the AI pulls from indexed content. Sites with a clear llms.txt give the AI a head start in understanding what they're authoritative about, which can increase the chance of appearing in answers.
The llms.txt format
The format is simple Markdown with a few conventional sections. Here's the structure:
    # Your Site Name

    > One-sentence description of what your site is and who it's for.

    ## About

    Brief paragraph (2-4 sentences) explaining your site's purpose, primary audience, and what makes it authoritative on the topics you cover.

    ## Key Pages

    - [Homepage](https://yourdomain.com/): What the main page covers
    - [Feature/Product](https://yourdomain.com/product/): Key offering
    - [Blog](https://yourdomain.com/blog/): Content categories covered

    ## Topics

    Comma-separated list of topics this site is authoritative about.

    ## Contact

    Optional: contact information or about page link.
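If you would rather generate the file than write it by hand, the template can be sketched as a small Python function. The function name `build_llms_txt`, its parameters, and the example values are all hypothetical; this is one way to render the sections above, not an official generator.

```python
def build_llms_txt(name, description, about, pages, topics):
    """Render an llms.txt body that follows the template above.

    pages is a list of (title, url, note) tuples; topics is a list of strings.
    """
    lines = [
        f"# {name}",
        "",
        f"> {description}",
        "",
        "## About",
        "",
        about,
        "",
        "## Key Pages",
        "",
    ]
    # One Markdown link per key page, with a short note after the colon.
    lines += [f"- [{title}]({url}): {note}" for title, url, note in pages]
    lines += ["", "## Topics", "", ", ".join(topics)]
    return "\n".join(lines) + "\n"


# Example values are placeholders, not a real site.
text = build_llms_txt(
    name="Your Site Name",
    description="One-sentence description of what your site is and who it's for.",
    about="Brief paragraph explaining purpose, audience, and authority.",
    pages=[("Homepage", "https://yourdomain.com/", "What the main page covers")],
    topics=["topic one", "topic two"],
)
print(text)
```

Writing the returned string to a file named `llms.txt` with UTF-8 encoding gives you something ready to upload.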
A real example: CiteReady's llms.txt
Here's the actual llms.txt for this site:
    # CiteReady

    > CiteReady is a free GEO audit tool that checks how well a website
    > is optimized for AI search (ChatGPT, Perplexity, Google AI Overviews).

    ## About

    CiteReady analyzes any URL and returns a GEO score across four categories: AI Crawler Access, Content Citability, Structured Data, and Technical Foundation. It is built by SpryTools and available at citeready.sprytools.com.

    ## Key Pages

    - [GEO Audit Tool](https://citeready.sprytools.com/): Free audit, no signup required
    - [Blog](https://citeready.sprytools.com/blog/): Guides on GEO, llms.txt, robots.txt for AI crawlers, and structured data

    ## Topics

    Generative Engine Optimization, GEO, AI search visibility, llms.txt, robots.txt AI crawlers, structured data, ChatGPT citations, Perplexity citations, Google AI Overviews
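To see why this structure is easy for software to consume, here is a rough sketch of how a reader might split such a file into its parts. The parser below is an illustration written for this post, not how any particular AI system actually processes llms.txt; the function name `parse_llms_txt` and the returned dictionary shape are assumptions. The sample is a shortened excerpt of the file above.

```python
import re

# Matches "- [Title](url): note" list items.
LINK = re.compile(r"^- \[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\):?\s*(?P<note>.*)$")

SAMPLE = """\
# CiteReady

> CiteReady is a free GEO audit tool that checks how well a website
> is optimized for AI search (ChatGPT, Perplexity, Google AI Overviews).

## Key Pages

- [GEO Audit Tool](https://citeready.sprytools.com/): Free audit, no signup required
"""


def parse_llms_txt(text):
    """Split an llms.txt body into title, description, and named sections."""
    result = {"title": None, "description": "", "sections": {}}
    current = None  # name of the section being read, if any
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("# ") and result["title"] is None:
            result["title"] = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            result["sections"][current] = []
        elif line.startswith(">") and current is None:
            # Join a multi-line blockquote into one description string.
            joined = result["description"] + " " + line[1:].strip()
            result["description"] = joined.strip()
        elif current is not None:
            match = LINK.match(line)
            result["sections"][current].append(match.groupdict() if match else line)
    return result


parsed = parse_llms_txt(SAMPLE)
print(parsed["title"])                     # prints CiteReady
print(parsed["sections"]["Key Pages"][0]["url"])  # prints https://citeready.sprytools.com/
```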
Step-by-step: setting up llms.txt
- Draft your summary. Write one sentence that describes what your site does and who it's for. This is the most important line: it's often what AI systems extract when summarizing your site.
- List your key pages. Pick 5 to 10 pages that best represent your site. Include your homepage, main product/service pages, and your top content pages. Write a short description next to each link.
- Define your topic authority. List the topics you want to be cited for. Be specific. "Marketing" is too broad; "B2B email marketing automation" tells the AI exactly when to cite you.
- Save as plain text at /llms.txt. Create the file as plain UTF-8 text and place it at the root of your domain. Serve it as text/plain; no HTML wrapper is needed.
- Verify with a GEO audit. Run a CiteReady audit on your homepage. The Technical Foundation section checks whether llms.txt exists, is reachable, and contains a valid description.
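For a quick manual check of the last two steps, a short script can confirm that the file is reachable and served as plain text. This is a minimal sketch using only Python's standard library; it is not how the CiteReady audit works internally, and the function names are made up for this example. The checks are kept in a pure function so they can be tried without a network connection.

```python
import urllib.request


def check_response(status, content_type, body):
    """Return a list of problems for an already-fetched /llms.txt response."""
    problems = []
    if status != 200:
        problems.append(f"expected HTTP 200, got {status}")
    # Ignore parameters such as "; charset=utf-8" when comparing media types.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type != "text/plain":
        problems.append(f"expected text/plain, got {media_type or 'no Content-Type'}")
    if body.lstrip().lower().startswith(("<!doctype", "<html")):
        problems.append("body looks like an HTML page")
    return problems


def fetch_and_check(url):
    """Fetch url and run check_response on the result.

    urlopen follows redirects and raises HTTPError on 4xx/5xx responses,
    so a missing file surfaces as an exception rather than a problem entry.
    """
    with urllib.request.urlopen(url, timeout=10) as resp:
        body = resp.read().decode("utf-8", errors="replace")
        return check_response(resp.status, resp.headers.get("Content-Type", ""), body)


print(check_response(200, "text/plain; charset=utf-8", "# My Site\n"))  # prints []
```

Calling `fetch_and_check("https://yourdomain.com/llms.txt")` with your own domain runs the same checks against the live file; an empty list means none of the three problems was found.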
Common mistakes to avoid
- Serving it as HTML. Some servers route every request to an HTML page. Make sure /llms.txt returns Content-Type: text/plain, not a 301 redirect to an HTML page.
- Making it too long. Keep it under 500 words. The point is conciseness; a wall of text defeats the purpose. AI systems will read your actual pages for depth; llms.txt is for orientation.
- Listing every page. Prioritize. Listing 200 URLs is noise. List the 5 to 10 pages that best represent your site's core value.
- Ignoring the description line. The > description line under the heading is the most valuable part of the file. Don't skip it or make it generic ("Welcome to our website").
- Blocking it in robots.txt. If you have Disallow: /*.txt in your robots.txt, AI crawlers can't read your llms.txt. Check that /llms.txt is explicitly allowed.
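Three of these mistakes (length, too many links, a missing or generic description) can be caught by a simple lint over the file's text. The sketch below is illustrative only: the function name, the default thresholds taken from the guidance above, and the small set of "generic" phrases are assumptions you would tune for your own site.

```python
import re

# Assumed examples of descriptions that say nothing about the site.
GENERIC = {"welcome to our website", "welcome to my website"}


def lint_llms_txt(text, max_words=500, max_links=10):
    """Return warnings for an llms.txt body; an empty list means no issues found."""
    warnings = []
    words = len(text.split())
    if words > max_words:
        warnings.append(f"{words} words; aim for under {max_words}")
    # Count Markdown links of the form [title](url).
    links = re.findall(r"\[[^\]]+\]\([^)]+\)", text)
    if len(links) > max_links:
        warnings.append(f"{len(links)} links; list the {max_links} that matter most")
    # Collect every blockquote line into one description string.
    quote = " ".join(
        line.lstrip("> ").strip() for line in text.splitlines() if line.startswith(">")
    )
    if not quote:
        warnings.append("missing '> description' line")
    elif quote.lower().rstrip(".!") in GENERIC:
        warnings.append("description is generic")
    return warnings


print(lint_llms_txt("# Site\n\n> Welcome to our website\n"))  # prints ['description is generic']
```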
Heads up for Next.js and SPA users: Single-page apps that serve everything through a JavaScript router sometimes don't serve /llms.txt correctly. Place the file in your public/ directory (Next.js, Astro) or your static assets folder to ensure it's served as a real file.
Does llms.txt actually help with citations?
The short answer: yes, especially for newer or smaller sites where AI systems have less training data to work from. A clear llms.txt spares the AI from inferring what your site is about from scattered page content. It knows immediately, and that reduces the risk of misrepresentation or omission.
For established sites with high domain authority, the marginal benefit is smaller but still meaningful, particularly for Perplexity, which is known to crawl llms.txt files actively as part of its real-time indexing pipeline.
Does your site have a valid llms.txt?
CiteReady checks your llms.txt automatically: whether it exists, is reachable, and contains a description. Free audit, no account needed.
Check your GEO score →