Implementing llms.txt: Making Your Website LLM-Friendly

In the rapidly evolving landscape of AI, ensuring your website is accessible and understandable to large language models (LLMs) is becoming increasingly important. Just as websites have long used robots.txt to provide guidance to search engine crawlers, a new standard called llms.txt is emerging to help websites communicate effectively with LLMs. At HelpUsWith.ai, we recently implemented this standard to optimize our content for AI consumption.

What is llms.txt?

The llms.txt standard is a simple yet powerful way to organize content specifically for large language models. Similar to how robots.txt provides instructions for web crawlers, llms.txt offers guidance on how LLMs should interpret and interact with your website’s content.

The core benefit of implementing llms.txt is that it helps language models understand:

  1. What content is most relevant and authoritative
  2. How to navigate your website’s structure
  3. Which parts of your content to prioritize
  4. How to accurately represent your information

As Anthropic explains, “By providing a clean, text-only version of your content in a standardized location, you can ensure that language models access the most accurate and up-to-date information when responding to user queries about your organization.”

Key Components of llms.txt

According to the standard, an effective llms.txt file should:

  • Provide a clear, plain-text representation of your website’s key information
  • Be placed at /llms.txt on your domain
  • Include core content about your organization, products, or services
  • Be structured in a simple, markdown-compatible format
  • Be regularly updated to reflect your current information

The full specification is maintained at llmstxt.org, which also collects best practices and implementation examples from the community.
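For orientation, here is a generic sketch of the format described at llmstxt.org: an H1 title, a short blockquote summary, and H2 sections containing link lists. The organization name, sections, and URLs below are placeholders, not our actual file:

# Example Organization

> One-sentence summary of who you are and what you offer.

## Services

- [Consulting](https://example.com/services/consulting.md): What a typical engagement covers
- [Workshops](https://example.com/services/workshops.md): Hands-on training sessions

## Blog

- [Example post title](https://example.com/blog/example-post.md): Short description of the post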

How We Implemented llms.txt at HelpUsWith.ai

At HelpUsWith.ai, we implemented the llms.txt standard through a multi-step process focused on generating clean, structured content. Here’s how we did it:

1. Creating the Core Content Structure

We started by designing a comprehensive structure for our llms.txt file that includes the following (see the assembly sketch after this list):

  • An introduction to our company and services
  • Details about our core offerings
  • Our consulting process
  • Blog posts for deeper information
  • Contact details
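A minimal sketch of how those sections can be assembled into the final file is shown below; the section text and the output path are placeholders rather than our exact content:

// Sketch: assemble llms.txt from ordered section strings (placeholder content).
const fs = require('fs');
const path = require('path');

const sections = [
  '# HelpUsWith.ai\n\n> Placeholder one-line summary of the company and services.',
  '## Core Offerings\n\nPlaceholder description of each offering.',
  '## Consulting Process\n\nPlaceholder outline of how engagements run.',
  '## Blog Posts\n\n(Generated automatically; see the script below.)',
  '## Contact\n\nPlaceholder contact details.',
];

// Join the sections with blank lines and write the file to the build output.
fs.writeFileSync(path.join('_site', 'llms.txt'), sections.join('\n\n') + '\n', 'utf8');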

2. Automating Blog Post Inclusion

One of the most valuable aspects of our implementation is the automatic inclusion of blog posts. We wrote a Node.js script that:

  • Scans our blog directory for all posts
  • Extracts titles and descriptions from frontmatter
  • Sorts them alphabetically
  • Adds them to the llms.txt file under a dedicated section

This ensures that as we add new content to our blog, it’s automatically included in our llms.txt file during the build process.

const fs = require('fs');
const path = require('path');
const glob = require('glob');
const matter = require('gray-matter'); // parses frontmatter into { data, content }

// Collect blog posts: read each markdown file, pull the title and description
// from its frontmatter, and sort the results alphabetically by title.
function getBlogPosts() {
  const blogDir = path.join('src', 'blog');
  const blogFiles = glob.sync(`${blogDir}/*.md`);

  return blogFiles.map(file => {
    const content = fs.readFileSync(file, 'utf8');
    const { data } = matter(content);
    const title = data.title || path.basename(file, '.md');
    const description = data.description || '';
    return { title, description, file: path.basename(file) };
  }).sort((a, b) => a.title.localeCompare(b.title));
}
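To show how the result might feed into the file, here is a hedged sketch of a section builder; the buildBlogSection name and the /blog/<slug>/ URL pattern are illustrative, not our exact code:

// Sketch: render the collected posts as a markdown link list for llms.txt.
function buildBlogSection(posts) {
  const lines = posts.map(post => {
    const slug = post.file.replace(/\.md$/, '');
    // The URL pattern below is illustrative; adjust it to your site's routing.
    return `- [${post.title}](https://helpuswith.ai/blog/${slug}/): ${post.description}`;
  });
  return ['## Blog Posts', '', ...lines].join('\n');
}

// Example usage:
// const section = buildBlogSection(getBlogPosts());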

3. Generating Clean Markdown

For all our content, we developed a robust cleaning process that:

  • Strips HTML tags while preserving content
  • Converts HTML headings to proper markdown format
  • Ensures correct indentation and spacing
  • Maintains markdown formatting for links, emphasis, and lists

This cleaning process is crucial because it ensures that the content is presented in a clean, consistent format that LLMs can easily parse.

// Strip HTML from rendered content and normalize it to clean markdown
function cleanMarkdownContent(html) {
  // Remove structural wrapper tags (div, span, section) but keep their contents
  let cleaned = html.replace(/<\/?(?:div|span|section)[^>]*>/g, '');

  // Convert HTML headings to markdown
  cleaned = cleaned.replace(/<h1[^>]*>(.*?)<\/h1>/g, '# $1');
  cleaned = cleaned.replace(/<h2[^>]*>(.*?)<\/h2>/g, '## $1');

  // Additional cleaning steps...

  return cleaned.trim();
}
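For illustration, the elided cleaning steps could include inline conversions along these lines (a sketch, not our exact rules):

// Sketch of further cleaning rules: convert common inline HTML back to markdown.
function cleanInlineHtml(cleaned) {
  // Links: <a href="...">text</a> -> [text](url)
  cleaned = cleaned.replace(/<a[^>]*href="([^"]*)"[^>]*>(.*?)<\/a>/g, '[$2]($1)');
  // Emphasis: <strong> -> **text**, <em> -> *text*
  cleaned = cleaned.replace(/<strong[^>]*>(.*?)<\/strong>/g, '**$1**');
  cleaned = cleaned.replace(/<em[^>]*>(.*?)<\/em>/g, '*$1*');
  // List items: <li>text</li> -> - text
  cleaned = cleaned.replace(/<li[^>]*>(.*?)<\/li>/g, '- $1');
  return cleaned;
}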

4. Integration with Build Process

We integrated the llms.txt generation into our build process by:

  1. Adding the necessary scripts to our package.json
  2. Creating a dedicated script (create-llms-txt.js) to generate the file
  3. Setting up copy-md-versions.js to create clean markdown versions of all content
  4. Ensuring these scripts run automatically during build

Our build script now includes:

npm run preprocess && eleventy && node scripts/create-llms-txt.js && node scripts/copy-md-versions.js
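In package.json, that wiring can look roughly like this (the preprocess script body is a placeholder; the build command is the one shown above):

{
  "scripts": {
    "preprocess": "node scripts/preprocess.js",
    "build": "npm run preprocess && eleventy && node scripts/create-llms-txt.js && node scripts/copy-md-versions.js"
  }
}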

5. GitHub Actions Integration

Finally, we ensured that our GitHub Actions workflow automatically generates and deploys the llms.txt file with each update to our site. This way, our LLM-friendly content is always in sync with the rest of our website.
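As a sketch of what that can look like (the workflow name, Node version, and deploy step are placeholders for whatever your pipeline already uses):

# Sketch: regenerate llms.txt on every push as part of the normal build (details are placeholders).
name: build-and-deploy
on:
  push:
    branches: [main]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npm run build   # runs eleventy plus the llms.txt scripts
      # Deploy step for your hosting provider goes here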

Benefits of Our Implementation

Our approach to implementing llms.txt has several key benefits:

  1. Automation: The entire process runs automatically during our build process
  2. Completeness: All key content, including blog posts, is included
  3. Cleanliness: Content is presented in clean, consistent markdown
  4. Maintenance: As we add new content, it’s automatically incorporated
  5. Discoverability: LLMs can more easily find and understand our content

Best Practices We Followed

Based on our experience, here are some best practices for implementing llms.txt:

  1. Structure content logically: Organize your content in a way that makes sense for LLMs to navigate.
  2. Use proper markdown: Clean, consistent markdown formatting helps LLMs parse your content.
  3. Automate wherever possible: Build automation to keep your llms.txt in sync with your site.
  4. Include comprehensive information: Don’t just provide basic details; include enough depth for LLMs to understand your offerings.
  5. Update regularly: Ensure your llms.txt is updated whenever your site changes.

Conclusion

Implementing the llms.txt standard is an important step in making your website more accessible and understandable to large language models. By following the approach we’ve outlined, you can ensure that LLMs have accurate, up-to-date information about your organization when responding to user queries.

As AI continues to evolve, standards like llms.txt will become increasingly important for businesses looking to ensure their content is properly represented in the AI ecosystem. By getting ahead of this trend, you can position your organization for success in an AI-driven future.

Want to learn more about optimizing your web presence for AI? Contact us to discuss how we can help your organization navigate the evolving AI landscape.