catalog

AI-Ready Documentation Indexer

Generate llms.txt and llms-full.txt with PAI (Programmable AI) features. Fully compliant with the llms.txt standard, with advanced embeddings, chunking, and RAG optimization.

Generated llms.txt (Structured Index)

# Documentation Project
> Complete API and user guide documentation

## Core Documentation
- index.md - Project overview and introduction
- getting-started.md - Quick start guide
- tutorial.md - Step-by-step tutorial

## API Reference
- api/authentication.md - Authentication methods
- api/endpoints.md - API endpoints reference
- api/errors.md - Error handling guide

## Optional
- drafts/future-plans.md - Future development plans
- archive/changelog.md - Historical changes

Generated llms-full.txt (Complete Content)

# Documentation Project
> Complete API and user guide documentation

## index.md
Welcome to our documentation! This project provides...
---
## getting-started.md
To get started with our API, first install the SDK...
---
## api/authentication.md
Authentication is handled via API keys. Generate a new key...
[Full content continues for all files...]
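
The layout above (title, summary, then one `## <path>` section per file with `---` separators) can be reproduced with a few lines of Python. This is only a sketch of the output shape; the function name `build_llms_full` and its arguments are hypothetical and are not part of catalog.

```python
def build_llms_full(title: str, summary: str, docs: dict[str, str]) -> str:
    """Join documents under '## <path>' headings, separated by '---' lines."""
    header = f"# {title}\n> {summary}\n"
    sections = [f"## {path}\n{body.strip()}" for path, body in docs.items()]
    return header + "\n" + "\n---\n".join(sections) + "\n"


print(build_llms_full(
    "Documentation Project",
    "Complete API and user guide documentation",
    {"index.md": "Welcome to our documentation!", "getting-started.md": "To get started..."},
))
```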

Generated sitemap.xml (SEO Optimization)

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://docs.example.com/</loc>
    <lastmod>2024-01-01T00:00:00Z</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://docs.example.com/getting-started</loc>
    <priority>0.8</priority>
  </url>
</urlset>

Enterprise-Grade Documentation Processing

Complete llms.txt standard compliance with advanced features

📋

llms.txt Standard Compliance

Complete H1 → blockquote → sections format. Validates output compliance and ensures proper structure for AI consumption.
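
As a rough illustration of what an ordering check for this format might look like, the sketch below verifies that a file opens with an H1, follows it with a blockquote, and contains at least one H2 section. The function `check_llms_txt` is a hypothetical helper, not catalog's validator, and it covers only the ordering rule.

```python
def check_llms_txt(text: str) -> list[str]:
    """Return a list of structural problems; an empty list means the order looks right."""
    lines = [ln.rstrip() for ln in text.splitlines() if ln.strip()]
    problems = []
    if not lines or not lines[0].startswith("# "):
        problems.append("first non-blank line should be an H1 title")
    if len(lines) < 2 or not lines[1].startswith("> "):
        problems.append("H1 should be followed by a blockquote summary")
    if not any(ln.startswith("## ") for ln in lines[2:]):
        problems.append("expected at least one H2 section")
    return problems


sample = """# Documentation Project
> Complete API and user guide documentation

## Core Documentation
- index.md - Project overview and introduction
"""
print(check_llms_txt(sample))  # []
```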

🌐

HTML Processing

Full support for HTML files with automatic conversion to Markdown. Extracts metadata from HTML meta tags and preserves content structure.
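
To make the metadata step concrete, here is a minimal sketch that pulls the page title and `<meta name=... content=...>` pairs out of an HTML document using only the Python standard library. The class name `MetaExtractor` is an assumption for illustration; catalog's own extraction and its Markdown conversion may work differently.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect the <title> text and named <meta> tags from an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content") is not None:
            self.meta[attrs["name"]] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


parser = MetaExtractor()
parser.feed('<html><head><title>Guide</title>'
            '<meta name="description" content="Quick start"></head></html>')
print(parser.title, parser.meta)
```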

🗺️

Sitemap Generation

XML sitemaps for SEO optimization with intelligent priority assignment and change frequency detection based on content type.
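
For readers who want to see how such a sitemap could be assembled, the sketch below builds one with `xml.etree.ElementTree`. The priority rule in `priority_for` (1.0 for the root page, 0.8 otherwise) is an assumption chosen to match the sample output; the source does not describe catalog's actual heuristics.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def priority_for(path: str) -> str:
    """Hypothetical rule: the root page gets 1.0, everything else 0.8."""
    return "1.0" if path in ("", "/") else "0.8"


def build_sitemap(base_url: str, paths: list[str]) -> str:
    """Return a sitemap document as a string, one <url> entry per path."""
    urlset = ET.Element("urlset", xmlns=NS)
    for path in paths:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = base_url.rstrip("/") + "/" + path.lstrip("/")
        ET.SubElement(url, "priority").text = priority_for(path)
    return ET.tostring(urlset, encoding="unicode")


print(build_sitemap("https://docs.example.com", ["/", "getting-started"]))
```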

📊

Performance Monitoring

Real-time performance and memory usage tracking. Concurrent processing utilities with graceful degradation for large document sets.

🛡️

Security Enhancements

Path validation, file scanning, input sanitization, and comprehensive security auditing with issue categorization.
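
Path validation usually means refusing any path that escapes the documentation root. A minimal sketch of that idea, assuming a hypothetical helper `is_inside` (the checks catalog performs are not specified here):

```python
from pathlib import Path


def is_inside(root: str, candidate: str) -> bool:
    """True when candidate, resolved against root, stays within root."""
    base = Path(root).resolve()
    target = (base / candidate).resolve()
    return target == base or base in target.parents


print(is_inside("/tmp/docs", "api/auth.md"))     # True
print(is_inside("/tmp/docs", "../secrets.txt"))  # False
```

Resolving both paths before comparing them is what defeats `..` segments and absolute paths smuggled in as "relative" input.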

🎯

Pattern-Based Processing

Advanced glob pattern support for include/exclude/optional content. Intelligent document ordering with path-based section generation.
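
The sketch below shows one way include/exclude/optional globs could be applied to a relative path. Two things are assumptions for illustration: the glob dialect (`**/` spans zero or more directories, `*` stays within one path segment) and the precedence (exclude wins, then include, then optional). Neither is confirmed by the source.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a glob where '**/' spans zero or more directories and '*' stays within one."""
    out, i = "", 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out += "(?:.*/)?"
            i += 3
        elif pattern.startswith("**", i):
            out += ".*"
            i += 2
        elif pattern[i] == "*":
            out += "[^/]*"
            i += 1
        elif pattern[i] == "?":
            out += "[^/]"
            i += 1
        else:
            out += re.escape(pattern[i])
            i += 1
    return re.compile(out + r"\Z")


def classify(path, include, exclude, optional):
    """Return 'excluded', 'optional', or 'included' for a relative path."""
    def hit(patterns):
        return any(glob_to_regex(p).match(path) for p in patterns)

    if hit(exclude) or not hit(include):
        return "excluded"
    return "optional" if hit(optional) else "included"


print(classify("drafts/future-plans.md", ["**/*.md"], [], ["drafts/**/*"]))  # optional
```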

PAI (Programmable AI) Features - Version 0.2.0

Advanced AI optimization for documentation workflows

🧠

Semantic Search with Embeddings

Generate embeddings for learnings and run semantic similarity search. Hybrid search combines FTS5 full-text search with semantic similarity. Supports OpenAI, Ollama, and custom providers (about 10 embeddings per second).
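
Hybrid search generally blends a keyword score with a vector similarity score. The sketch below is a generic weighted blend, not catalog's ranking function: the weight `alpha`, the assumption that keyword scores are already normalised to the range 0 to 1, and the names `cosine` and `hybrid_rank` are all illustrative.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_rank(keyword_scores, query_vec, doc_vecs, alpha=0.5):
    """Blend a keyword score (assumed normalised to 0..1) with cosine similarity."""
    scored = {
        doc_id: alpha * keyword_scores.get(doc_id, 0.0)
        + (1 - alpha) * cosine(query_vec, vec)
        for doc_id, vec in doc_vecs.items()
    }
    return sorted(scored, key=scored.get, reverse=True)


doc_vecs = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
print(hybrid_rank({"b": 1.0}, [1.0, 0.0], doc_vecs, alpha=0.0))  # ['a', 'b']
```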

✂️

RAG-Ready Document Chunking

Intelligent document splitting at heading boundaries with multiple profiles: default, code-heavy, faq, granular, large-context. Optimized for vector database ingestion.
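
Splitting at heading boundaries can be sketched in a few lines. The version below starts a new chunk at every Markdown heading and ignores `#` lines inside fenced code blocks; it does not model the named profiles, and `chunk_by_heading` is a hypothetical helper rather than catalog's chunker.

```python
import re

HEADING = re.compile(r"^#{1,6} ")


def chunk_by_heading(markdown: str) -> list[str]:
    """Start a new chunk at each heading, leaving fenced code blocks intact."""
    chunks, current, in_fence = [], [], False
    for line in markdown.splitlines():
        if line.startswith("```"):
            in_fence = not in_fence
        if HEADING.match(line) and not in_fence and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [c for c in chunks if c]


doc = "# A\ntext\n## B\n```\n# not a heading\n```\n"
print(len(chunk_by_heading(doc)))  # 2
```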

🔗

Link Graph Analysis

Creates graph.json with document relationships, importance scoring, and broken link detection. Analyze content structure and connectivity.
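
A link graph of this kind can be derived from the Markdown links in each file. The sketch below assumes links are written as paths relative to the docs root and uses the inbound link count as a stand-in for importance; the actual schema of graph.json and catalog's scoring are not described in the source.

```python
import re

# Capture the link target up to a ')' or '#', so fragments are dropped.
LINK = re.compile(r"\[[^\]]*\]\(([^)#\s]+)")


def build_graph(docs: dict[str, str]) -> dict:
    """Map each document to its outgoing links, count inbound links, list broken targets."""
    edges, broken = {}, []
    inbound = {path: 0 for path in docs}
    for path, text in docs.items():
        targets = [t for t in LINK.findall(text) if "://" not in t]  # skip external URLs
        edges[path] = targets
        for target in targets:
            if target in docs:
                inbound[target] += 1
            else:
                broken.append((path, target))
    return {"edges": edges, "inbound": inbound, "broken": broken}


docs = {
    "index.md": "[Start](getting-started.md) [API](api/missing.md) [ext](https://example.com)",
    "getting-started.md": "[Home](index.md#top)",
}
print(build_graph(docs)["broken"])  # [('index.md', 'api/missing.md')]
```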

🏷️

Semantic Tag Generation

Rule-based content classification with tags.json for filtering and search. Automatic categorization improves document discovery.
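
Rule-based tagging can be as simple as a table of patterns. The rules and tag names below are invented for illustration and are not catalog's rule set or the tags.json schema.

```python
import re

# Hypothetical rules: a tag applies when its pattern matches the path or the text.
RULES = {
    "api": re.compile(r"\bapi\b|\bendpoint", re.I),
    "tutorial": re.compile(r"\btutorial\b|step-by-step", re.I),
    "auth": re.compile(r"\bauthenticat|\bapi key", re.I),
}


def tag_document(path: str, text: str) -> list[str]:
    """Return the sorted list of tags whose rule matches the document."""
    haystack = f"{path}\n{text}"
    return sorted(tag for tag, rx in RULES.items() if rx.search(haystack))


print(tag_document("api/authentication.md", "Authentication is handled via API keys."))
# ['api', 'auth']
```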

📦

Context Bundle Generation

Creates llms-ctx-{size}.txt files for different LLM context windows (2k, 8k, 32k tokens). Optimized context delivery.
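
Fitting content to a context window comes down to a token budget. The sketch below adds sections in priority order until the budget would be exceeded. The four-characters-per-token estimate and the mapping of "2k" to 2000 tokens are assumptions; a real tokenizer would give different counts, and catalog's selection strategy is not documented here.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic (an assumption): about four characters per token."""
    return (len(text) + 3) // 4


def build_bundle(sections: list[str], budget: int) -> str:
    """Add sections in priority order until the next one would exceed the budget."""
    picked, used = [], 0
    for section in sections:
        cost = estimate_tokens(section)
        if used + cost > budget:
            break
        picked.append(section)
        used += cost
    return "\n\n".join(picked)


# Hypothetical size table mirroring the 2k / 8k / 32k bundles.
SIZES = {"2k": 2000, "8k": 8000, "32k": 32000}
sections = ["a" * 4000, "b" * 4000, "c" * 4000]
bundles = {name: build_bundle(sections, budget) for name, budget in SIZES.items()}
print({name: len(text) for name, text in bundles.items()})
```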

🌐

MCP Server Generation

Generate Model Context Protocol server for IDE integration (Cursor, Claude Code). Complete with configuration files for easy setup.

📥

Source Integration

Pull documentation from remote sources before processing: GitHub, Git, HTTP, S3. Seamless integration with external documentation systems.

🔄

Framework Integrations

Ready-to-use integrations for LangChain and LlamaIndex. Quick integration with popular AI frameworks.

Learning Indexing

Automatic integration with the inscribe learning system

📝

Automatic Indexing

Learnings created with inscribe are indexed automatically through the hook system: the PostLearningCreated hook spawns a catalog index-file subprocess.

⚑

Fast Indexing

Average indexing time is about 15 ms per learning. Content-hash deduplication skips re-indexing unchanged files. Upsert semantics update an existing entry or insert a new one.
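
The combination of content-hash deduplication and upsert can be sketched with SQLite. The table layout, the SHA-256 hash, and the function names are assumptions for illustration; catalog's database schema is not described in the source.

```python
import hashlib
import sqlite3


def open_index() -> sqlite3.Connection:
    """Create an in-memory index with one row per file path (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE learnings (path TEXT PRIMARY KEY, hash TEXT, body TEXT)")
    return db


def index_file(db: sqlite3.Connection, path: str, body: str) -> bool:
    """Return True if the row was written, False if the content hash was unchanged."""
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    row = db.execute("SELECT hash FROM learnings WHERE path = ?", (path,)).fetchone()
    if row and row[0] == digest:
        return False  # unchanged content: skip re-indexing
    # Upsert: insert a new row, or update the existing row for this path.
    db.execute(
        "INSERT INTO learnings (path, hash, body) VALUES (?, ?, ?) "
        "ON CONFLICT(path) DO UPDATE SET hash = excluded.hash, body = excluded.body",
        (path, digest, body),
    )
    return True


db = open_index()
print(index_file(db, "a.md", "first"), index_file(db, "a.md", "first"))  # True False
```

The `ON CONFLICT ... DO UPDATE` clause needs SQLite 3.24 or newer.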

🔍

index-file Command

Index a single learning file: catalog index-file <path>. Supports custom database paths and automatic FTS index updates.

Output Formats

Multiple formats for different use cases

📋

llms.txt

Structured index with H1 → blockquote → sections format. Perfect for AI context windows with clear organization.

catalog --validate

📚

llms-full.txt

Complete concatenated content with clear separators. Full document content for comprehensive AI analysis.

catalog --input docs

🎯

llms-ctx.txt

Context-only without optional sections. Optimized for AI systems with limited context windows.

catalog --optional "drafts/**/*"

🗺️

sitemap.xml

SEO-optimized XML sitemap with metadata-based priorities and change frequencies for search engines.

catalog --sitemap --base-url https://docs.example.com

📊

index.json

Navigation metadata for programmatic access. Comprehensive directory and file metadata with statistics.

catalog --index

🏆 llms.txt Standard Compliance

Full compliance with the llms.txt standard for AI-ready documentation indexing

✓ H1 → blockquote → sections format
✓ Proper section hierarchy validation
✓ Markdown link syntax compliance
✓ Intelligent document ordering
✓ Path-based section generation
✓ Optional content categorization

Installation

Get catalog running in seconds

⚑

Quick Install

# One-line install script
curl -fsSL https://raw.githubusercontent.com/fwdslsh/catalog/main/install.sh | bash

# Start indexing
catalog --validate

Automatically downloads the right binary for your platform

📦

Bun Package

# Install with Bun
bun install -g @fwdslsh/catalog

# Or add to project
bun add @fwdslsh/catalog

Native Bun package for maximum performance

🐳

Docker

# Pull latest image
docker pull fwdslsh/catalog:latest

# Run catalog
docker run --rm fwdslsh/catalog --help

Containerized for consistent environments

Use Cases

Perfect for various documentation workflows

🤖

AI Training Data

Prepare documentation for AI model training or fine-tuning with properly structured and indexed content.

catalog --optional "examples/**/*" --validate

🔍

Knowledge Bases

Create searchable knowledge bases with comprehensive indexing and metadata for internal tools.

catalog --index --sitemap --base-url https://kb.company.com

📖

Documentation Sites

Generate navigation and SEO-optimized sitemaps for documentation websites and static site generators.

catalog --sitemap --sitemap-no-extensions --base-url https://docs.example.com

🔄

CI/CD Pipelines

Automate documentation processing in continuous integration with validation and compliance checking.

catalog --validate --performance-report --exit-on-error

Integration Workflow

Works seamlessly with the fwdslsh ecosystem

🌐

Crawl

Use inform to extract web content

📋

Index

catalog generates llms.txt files

🚀

Deploy

Ready for any platform

# Complete documentation pipeline
inform https://docs.example.com --output-dir docs
catalog --input docs --output build --base-url https://docs.example.com \
  --optional "archive/**/*" --sitemap --validate --index

🧠 AI Integration Ready

catalog is specifically designed for AI workflows:

  • Context Optimization: llms-ctx.txt for context-limited AI systems
  • Structured Format: Clean, parseable format for RAG applications
  • Metadata Extraction: Rich metadata for vector database indexing
  • Content Validation: Ensures high-quality AI training data

Ready to Index Your Documentation?

Transform your content into AI-ready knowledge bases