# Implementing llms.txt: Making Your Website LLM-Friendly
In the rapidly evolving landscape of AI, ensuring your website is accessible and understandable to large language models (LLMs) is becoming increasingly important. Just as websites have long used `robots.txt` to provide guidance to search engine crawlers, a new standard called `llms.txt` is emerging to help websites communicate effectively with LLMs. At HelpUsWith.ai, we recently implemented this standard to optimize our content for AI consumption.
## What is llms.txt?

The `llms.txt` standard is a simple yet powerful way to organize content specifically for large language models. Similar to how `robots.txt` provides instructions for web crawlers, `llms.txt` offers guidance on how LLMs should interpret and interact with your website’s content.

The core benefit of implementing `llms.txt` is that it helps language models understand:
- What content is most relevant and authoritative
- How to navigate your website’s structure
- Which parts of your content to prioritize
- How to accurately represent your information
As Anthropic explains, “By providing a clean, text-only version of your content in a standardized location, you can ensure that language models access the most accurate and up-to-date information when responding to user queries about your organization.”
## Key Components of llms.txt

According to the standard, an effective `llms.txt` file should:

- Provide a clear, plain-text representation of your website’s key information
- Be placed at `/llms.txt` on your domain
- Include core content about your organization, products, or services
- Be structured in a simple, markdown-compatible format
- Be regularly updated to reflect your current information

The standard is also discussed at llmstxt.org, which serves as a community resource for best practices and implementation examples.
## How We Implemented llms.txt at HelpUsWith.ai

At HelpUsWith.ai, we implemented the `llms.txt` standard through a multi-step process focused on generating clean, structured content. Here’s how we did it:

### 1. Creating the Core Content Structure

We started by designing a comprehensive structure for our `llms.txt` file that includes the elements below, with a sketch of the resulting layout after the list:
- An introduction to our company and services
- Details about our core offerings
- Our consulting process
- Blog posts for deeper information
- Contact details
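In the format described at llmstxt.org (an H1 title, a blockquote summary, and H2 sections of links), the assembled file looks roughly like this. The wording and URLs are an illustrative sketch, not our exact content:

```markdown
# HelpUsWith.ai

> AI consulting and implementation services that help organizations put large language models to work.

## Services

- [AI Consulting](https://helpuswith.ai/services/): Overview of our core offerings
- [Our Process](https://helpuswith.ai/process/): How a consulting engagement works

## Blog Posts

- [Example Post Title](https://helpuswith.ai/blog/example-post/): One-line description from the post’s frontmatter

## Contact

- [Contact](https://helpuswith.ai/contact/): How to reach us
```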
### 2. Automating Blog Post Inclusion
One of the most valuable aspects of our implementation is the automatic inclusion of blog posts. We wrote a Node.js script that:
- Scans our blog directory for all posts
- Extracts titles and descriptions from frontmatter
- Sorts them alphabetically
- Adds them to the `llms.txt` file under a dedicated section

This ensures that as we add new content to our blog, it’s automatically included in our `llms.txt` file during the build process.
```javascript
// Dependencies for file access, globbing, and frontmatter parsing
const fs = require('fs');
const path = require('path');
const glob = require('glob');
const matter = require('gray-matter');

// Function to get blog posts
function getBlogPosts() {
  const blogDir = path.join('src', 'blog');
  const blogFiles = glob.sync(`${blogDir}/*.md`);
  return blogFiles.map(file => {
    // Read each post and pull its title and description from frontmatter
    const content = fs.readFileSync(file, 'utf8');
    const { data } = matter(content);
    const title = data.title || path.basename(file, '.md');
    const description = data.description || '';
    return { title, description, file: path.basename(file) };
  }).sort((a, b) => a.title.localeCompare(b.title)); // alphabetical order
}
```
### 3. Generating Clean Markdown
For all our content, we developed a robust cleaning process that:
- Strips HTML tags while preserving content
- Converts HTML headings to proper markdown format
- Ensures correct indentation and spacing
- Maintains markdown formatting for links, emphasis, and lists
This cleaning process is crucial because it ensures that the content is presented in a clean, consistent format that LLMs can easily parse.
```javascript
// Function to strip HTML tags from content and clean formatting
function cleanMarkdownContent(html) {
  // Remove structural HTML tags (divs, spans, sections) while keeping their content
  let cleaned = html.replace(/<\/?(?:div|span|section)[^>]*>/g, '');
  // Convert HTML headings to markdown
  cleaned = cleaned.replace(/<h1[^>]*>(.*?)<\/h1>/g, '# $1');
  cleaned = cleaned.replace(/<h2[^>]*>(.*?)<\/h2>/g, '## $1');
  // Additional cleaning steps...
  return cleaned.trim();
}
```
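To show how these pieces connect, here is a minimal sketch of a generation script in the spirit of our `create-llms-txt.js`, assuming the `getBlogPosts()` helper from step 2 is in scope; the header text, section heading, and output path are illustrative:

```javascript
const fs = require('fs');

// Sketch: assemble llms.txt from a static header plus the blog post list.
function buildLlmsTxt() {
  const posts = getBlogPosts(); // helper from step 2
  const postSection = posts
    .map(p => `- ${p.title}: ${p.description}`)
    .join('\n');

  const output = [
    '# HelpUsWith.ai',
    '',
    '> AI consulting and implementation services.',
    '',
    '## Blog Posts',
    '',
    postSection,
    '',
  ].join('\n');

  // Write into the build output directory (path is an assumption)
  fs.writeFileSync('_site/llms.txt', output);
}

buildLlmsTxt();
```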
### 4. Integration with Build Process

We integrated the `llms.txt` generation into our build process by:

- Adding the necessary scripts to our `package.json`
- Creating a dedicated script (`create-llms-txt.js`) to generate the file
- Setting up `copy-md-versions.js` to create clean markdown versions of all content
- Ensuring these scripts run automatically during build

Our build script now includes:

```bash
npm run preprocess && eleventy && node scripts/create-llms-txt.js && node scripts/copy-md-versions.js
```
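In `package.json`, that chain is the natural home for the `build` script; the `preprocess` entry below is an assumption about what that step points at:

```json
{
  "scripts": {
    "preprocess": "node scripts/preprocess.js",
    "build": "npm run preprocess && eleventy && node scripts/create-llms-txt.js && node scripts/copy-md-versions.js"
  }
}
```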
### 5. GitHub Actions Integration

Finally, we ensured that our GitHub Actions workflow automatically generates and deploys the `llms.txt` file with each update to our site. This way, our LLM-friendly content is always in sync with the rest of our website.
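A workflow along these lines might look like the following sketch; the trigger, Node version, and deploy step are assumptions rather than our exact configuration:

```yaml
# Illustrative sketch, not our exact workflow.
name: Build and Deploy
on:
  push:
    branches: [main]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      # Runs Eleventy plus the llms.txt generation scripts
      - run: npm run build
      # Deployment depends on hosting; shown as a placeholder
      - run: echo "deploy _site/ to your host"
```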
## Benefits of Our Implementation

Our approach to implementing `llms.txt` has several key benefits:
- Automation: The entire process runs automatically during our build process
- Completeness: All key content, including blog posts, is included
- Cleanliness: Content is presented in clean, consistent markdown
- Maintenance: As we add new content, it’s automatically incorporated
- Discoverability: LLMs can more easily find and understand our content
## Best Practices We Followed

Based on our experience, here are some best practices for implementing `llms.txt`:
- Structure content logically: Organize your content in a way that makes sense for LLMs to navigate.
- Use proper markdown: Clean, consistent markdown formatting helps LLMs parse your content.
- Automate wherever possible: Build automation to keep your `llms.txt` in sync with your site.
- Include comprehensive information: Don’t just provide basic details; include enough depth for LLMs to understand your offerings.
- Update regularly: Ensure your `llms.txt` is updated whenever your site changes.
## Conclusion

Implementing the `llms.txt` standard is an important step in making your website more accessible and understandable to large language models. By following the approach we’ve outlined, you can ensure that LLMs have accurate, up-to-date information about your organization when responding to user queries.

As AI continues to evolve, standards like `llms.txt` will become increasingly important for businesses looking to ensure their content is properly represented in the AI ecosystem. By getting ahead of this trend, you can position your organization for success in an AI-driven future.
Want to learn more about optimizing your web presence for AI? Contact us to discuss how we can help your organization navigate the evolving AI landscape.