catalog - AI-Ready Documentation Indexer

Generate llms.txt and llms-full.txt from directories of Markdown and HTML files. Output complies with the llms.txt standard and adds enterprise-grade features for AI-powered documentation workflows.

Generated llms.txt (Structured Index)

# Documentation Project
> Complete API and user guide documentation

## Core Documentation
- index.md - Project overview and introduction
- getting-started.md - Quick start guide
- tutorial.md - Step-by-step tutorial

## API Reference
- api/authentication.md - Authentication methods
- api/endpoints.md - API endpoints reference
- api/errors.md - Error handling guide

## Optional
- drafts/future-plans.md - Future development plans
- archive/changelog.md - Historical changes

Generated llms-full.txt (Complete Content)

# Documentation Project
> Complete API and user guide documentation

## index.md
Welcome to our documentation! This project provides...
---
## getting-started.md
To get started with our API, first install the SDK...
---
## api/authentication.md
Authentication is handled via API keys. Generate a new key...
[Full content continues for all files...]

Generated sitemap.xml (SEO Optimization)

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://docs.example.com/</loc>
    <lastmod>2024-01-01T00:00:00Z</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://docs.example.com/getting-started</loc>
    <priority>0.8</priority>
  </url>
</urlset>

Enterprise-Grade Documentation Processing

Complete llms.txt standard compliance with advanced features

llms.txt Standard Compliance

Output follows the complete H1 → blockquote → sections format. catalog validates the generated files for compliance so the structure is suitable for AI consumption.

🌐

HTML Processing

Full support for HTML files with automatic conversion to Markdown. Extracts metadata from HTML meta tags and preserves content structure.
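
To make the metadata step concrete, here is a minimal sketch of pulling a title and `<meta>` values out of an HTML page with Python's standard `html.parser`. It is an illustration of the idea only, not catalog's implementation; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect the <title> text and <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content") is not None:
            # Key by the lower-cased meta name, e.g. "description".
            self.meta[attrs["name"].lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_metadata(html: str) -> dict:
    """Hypothetical helper: return the title plus any named meta tags."""
    parser = MetaExtractor()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), **parser.meta}
```

Tags without a `name` attribute, such as `<meta charset="utf-8">`, are skipped in this sketch.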

Sitemap Generation

Generates XML sitemaps for SEO, assigning each page a priority and change frequency based on its content type.
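
As a rough sketch of how such a sitemap could be assembled, the following uses `xml.etree.ElementTree`. The priority heuristic is an assumption: `1.0` and `0.8` are taken from the sample sitemap shown earlier, while `0.5` for nested pages and both helper functions are hypothetical and not catalog's actual rules.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def priority_for(path: str) -> str:
    """Hypothetical heuristic: root index highest, nested pages lowest."""
    if path in ("index.md", "index.html"):
        return "1.0"
    return "0.8" if path.count("/") == 0 else "0.5"


def url_for(base_url: str, path: str) -> str:
    """Map a source path to a URL, dropping the extension and index names."""
    stem = path.rsplit(".", 1)[0]
    if stem == "index":
        stem = ""
    elif stem.endswith("/index"):
        stem = stem[: -len("index")]
    return base_url.rstrip("/") + "/" + stem


def build_sitemap(base_url: str, paths) -> str:
    """Return the <urlset> document as a string (no XML declaration)."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for path in paths:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = url_for(base_url, path)
        ET.SubElement(url, "priority").text = priority_for(path)
    return ET.tostring(urlset, encoding="unicode")
```

Calling `build_sitemap("https://docs.example.com", ["index.md", "getting-started.md"])` yields the same two `<loc>` values as the sample, without the `<lastmod>` and `<changefreq>` fields.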

Performance Monitoring

Real-time performance and memory usage tracking. Concurrent processing utilities with graceful degradation for large document sets.
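
For readers who want a feel for what time and memory tracking involves, this is a small sketch built on the standard `time` and `tracemalloc` modules. The `measure` helper and the report layout are hypothetical and say nothing about how catalog collects its own metrics.

```python
import time
import tracemalloc
from contextlib import contextmanager


@contextmanager
def measure(label, report):
    """Record wall-clock seconds and peak traced memory for a block."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        report[label] = {"seconds": elapsed, "peak_bytes": peak}


# Usage: time a stand-in workload and keep the numbers in a dict.
report = {}
with measure("concat", report):
    text = "\n".join(str(i) for i in range(10_000))
```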

Security Enhancements

Path validation, file scanning, input sanitization, and comprehensive security auditing with issue categorization.
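
Path validation usually means rejecting any input that resolves outside the documentation root. A minimal sketch of that check, assuming a hypothetical `is_inside` helper (catalog's real checks are not documented here):

```python
from pathlib import Path


def is_inside(root: str, candidate: str) -> bool:
    """True if candidate, resolved against root, stays within root."""
    root_path = Path(root).resolve()
    # Joining with an absolute candidate discards root, so it is rejected too.
    target = (root_path / candidate).resolve()
    return target == root_path or root_path in target.parents
```

Resolving before comparing is what defeats `../` segments and symlinks that point outside the root.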

🎯

Pattern-Based Processing

Advanced glob pattern support for include/exclude/optional content. Intelligent document ordering with path-based section generation.
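
The sketch below shows one way include, exclude, and optional patterns such as `drafts/**/*` could sort files into buckets. It assumes the common glob convention where `**/` matches zero or more directories and `*` does not cross a `/`; the translator and `classify` function are hypothetical, and catalog's own matching rules may differ.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a glob with `**`, `*`, and `?` into an anchored regex."""
    i, out = 0, []
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append(r"(?:.*/)?")  # zero or more directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(r".*")  # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            out.append(r"[^/]*")  # anything within one path segment
            i += 1
        elif pattern[i] == "?":
            out.append(r"[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")


def classify(paths, optional=(), exclude=()):
    """Split paths into (core, optional) lists, dropping excluded ones."""
    opt = [glob_to_regex(p) for p in optional]
    exc = [glob_to_regex(p) for p in exclude]
    core, extra = [], []
    for path in paths:
        if any(r.match(path) for r in exc):
            continue
        (extra if any(r.match(path) for r in opt) else core).append(path)
    return core, extra
```

With `optional=["drafts/**/*", "archive/**/*"]`, the files `drafts/future-plans.md` and `archive/changelog.md` from the sample index land in the optional bucket.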

Output Formats

Multiple formats for different use cases

llms.txt

Structured index with H1 → blockquote → sections format. Perfect for AI context windows with clear organization.

catalog --validate

llms-full.txt

Complete concatenated content with clear separators. Full document content for comprehensive AI analysis.

catalog --input docs
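
The concatenated layout can be sketched in a few lines. This follows the sample llms-full.txt shown earlier (`## path` headings separated by `---`); the `build_full` function is hypothetical and only approximates what catalog writes.

```python
def build_full(title, summary, documents):
    """Concatenate documents under `## path` headings, separated by `---`.

    `documents` is an ordered iterable of (path, content) pairs.
    """
    parts = [f"# {title}", f"> {summary}", ""]
    sections = [f"## {path}\n{content.strip()}" for path, content in documents]
    parts.append("\n---\n".join(sections))
    return "\n".join(parts) + "\n"
```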
🎯

llms-ctx.txt

Context-only output that omits the optional sections. Optimized for AI systems with limited context windows.

catalog --optional "drafts/**/*"
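
Conceptually, the context-only file is the index minus its `## Optional` section. A minimal sketch of that filtering, assuming the section is literally titled "Optional" as in the sample index (the `strip_optional` function is hypothetical):

```python
def strip_optional(llms_txt: str) -> str:
    """Return the index without its `## Optional` section."""
    kept, skipping = [], False
    for line in llms_txt.splitlines():
        if line.startswith("## "):
            # Every H2 heading switches skipping on or off.
            skipping = line[3:].strip().lower() == "optional"
        if not skipping:
            kept.append(line)
    return "\n".join(kept).rstrip() + "\n"
```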

sitemap.xml

SEO-optimized XML sitemap with metadata-based priorities and change frequencies for search engines.

catalog --sitemap --base-url https://docs.example.com

index.json

Navigation metadata for programmatic access. Comprehensive directory and file metadata with statistics.

catalog --index

llms.txt Standard Compliance

Full compliance with the llms.txt standard for AI-ready documentation indexing

✓ H1 → blockquote → sections format
✓ Proper section hierarchy validation
✓ Markdown link syntax compliance
✓ Intelligent document ordering
✓ Path-based section generation
✓ Optional content categorization
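
To illustrate what a structural check of this kind might look like, here is a small sketch that verifies the H1 → blockquote → sections ordering. It is deliberately narrow: it does not check link syntax or document ordering, and the `validate_llms_txt` function is hypothetical rather than the logic behind `catalog --validate`.

```python
def validate_llms_txt(text: str) -> list:
    """Return a list of problems; an empty list means the layout looks valid."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    problems = []
    if not lines or not lines[0].startswith("# "):
        problems.append("first non-blank line must be an H1 title")
    if len(lines) < 2 or not lines[1].startswith("> "):
        problems.append("H1 must be followed by a blockquote summary")
    section = None
    for ln in lines[2:]:
        if ln.startswith("## "):
            section = ln[3:].strip()
        elif ln.startswith("- ") and section is None:
            problems.append(f"list item outside any section: {ln}")
    return problems
```

Run against the sample index near the top of this page, the function returns an empty list.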

Installation

Get catalog running in seconds

⚑

Quick Install

# One-line install script
curl -fsSL https://raw.githubusercontent.com/fwdslsh/catalog/main/install.sh | bash

# Start indexing
catalog --validate

Automatically downloads the right binary for your platform

Bun Package

# Install with Bun
bun install -g @fwdslsh/catalog

# Or add to project
bun add @fwdslsh/catalog

Native Bun package for maximum performance

🐳

Docker

# Pull latest image
docker pull fwdslsh/catalog:latest

# Run catalog
docker run --rm fwdslsh/catalog --help

Containerized for consistent environments

Use Cases

Perfect for various documentation workflows

AI Training Data

Prepare documentation for AI model training or fine-tuning with properly structured and indexed content.

catalog --optional "examples/**/*" --validate

Knowledge Bases

Create searchable knowledge bases with comprehensive indexing and metadata for internal tools.

catalog --index --sitemap --base-url https://kb.company.com

Documentation Sites

Generate navigation and SEO-optimized sitemaps for documentation websites and static site generators.

catalog --sitemap --sitemap-no-extensions --base-url https://docs.example.com

CI/CD Pipelines

Automate documentation processing in continuous integration with validation and compliance checking.

catalog --validate --performance-report --exit-on-error

Integration Workflow

Works seamlessly with the fwdslsh ecosystem

🌐

Crawl

Use inform to extract web content

Index

catalog generates llms.txt files

Build

unify creates static sites

Commit

giv generates commit messages

# Complete documentation pipeline
inform https://docs.example.com --output-dir docs
catalog --input docs --output build --base-url https://docs.example.com \
  --optional "archive/**/*" --sitemap --validate --index
unify build --input build --output dist
giv message
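
When the pipeline runs from a script rather than a shell, it can help to stop at the first failing step. The sketch below wraps the four commands above in Python's `subprocess`; the command arguments are copied from the pipeline, while `pipeline_commands` and `run_pipeline` are hypothetical helpers and the tools are assumed to be on `PATH`.

```python
import subprocess


def pipeline_commands(site_url: str) -> list:
    """The four steps from the pipeline above, as argv lists."""
    return [
        ["inform", site_url, "--output-dir", "docs"],
        ["catalog", "--input", "docs", "--output", "build",
         "--base-url", site_url, "--optional", "archive/**/*",
         "--sitemap", "--validate", "--index"],
        ["unify", "build", "--input", "build", "--output", "dist"],
        ["giv", "message"],
    ]


def run_pipeline(commands, runner=subprocess.run):
    """Run each step in order; stop at the first non-zero exit code.

    Returns (failed_tool, exit_code), or (None, 0) if every step succeeded.
    """
    for argv in commands:
        result = runner(argv)
        if result.returncode != 0:
            return argv[0], result.returncode
    return None, 0
```

Passing each command as an argument list means the `archive/**/*` pattern reaches catalog unexpanded, with no shell quoting involved. A missing binary raises `FileNotFoundError` from `subprocess.run` rather than returning an exit code.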

🧠 AI Integration Ready

catalog is specifically designed for AI workflows:

  • Context Optimization: llms-ctx.txt for context-limited AI systems
  • Structured Format: Clean, parseable format for RAG applications
  • Metadata Extraction: Rich metadata for vector database indexing
  • Content Validation: Ensures high-quality AI training data

Ready to Index Your Documentation?

Transform your content into AI-ready knowledge bases