catalog
AI-Ready Documentation Indexer
Generate llms.txt, llms-full.txt with PAI (Programmable AI) features. Complete llms.txt standard compliance with advanced embeddings, chunking, and RAG optimization.
Generated llms.txt (Structured Index)
Generated llms-full.txt (Complete Content)
Generated sitemap.xml (SEO Optimization)
Enterprise-Grade Documentation Processing
Complete llms.txt standard compliance with advanced features
llms.txt Standard Compliance
Complete H1 → blockquote → sections format. Validates output compliance and ensures proper structure for AI consumption.
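The llms.txt standard expects an H1 title, a blockquote summary, then H2 sections of link lists. An illustrative shape (URLs and names are placeholders, not catalog output):

```
# Project Name

> One-sentence summary of the project for AI consumers.

## Docs

- [Getting Started](https://docs.example.com/start.md): Installation and first steps
- [API Reference](https://docs.example.com/api.md): Endpoint documentation

## Optional

- [Changelog](https://docs.example.com/changelog.md): Release history
```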
HTML Processing
Full support for HTML files with automatic conversion to Markdown. Extracts metadata from HTML meta tags and preserves content structure.
Sitemap Generation
XML sitemaps for SEO optimization with intelligent priority assignment and change frequency detection based on content type.
Performance Monitoring
Real-time performance and memory usage tracking. Concurrent processing utilities with graceful degradation for large document sets.
Security Enhancements
Path validation, file scanning, input sanitization, and comprehensive security auditing with issue categorization.
Pattern-Based Processing
Advanced glob pattern support for include/exclude/optional content. Intelligent document ordering with path-based section generation.
PAI (Programmable AI) Features - Version 0.2.0
Advanced AI optimization for documentation workflows
Semantic Search with Embeddings
Generate embeddings for learnings and perform semantic similarity search. Hybrid search combines FTS5 full-text matching with semantic similarity. Supports OpenAI, Ollama, and custom providers (~10 embeddings/sec).
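One common way to blend full-text and semantic scores is a weighted sum. A minimal sketch, assuming normalized FTS scores and precomputed embeddings; the `alpha` knob and score blending are illustrative, not catalog's documented behavior:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_rank(doc_vectors, query_vec, fts_scores, alpha=0.5):
    """Rank documents by a blend of semantic and keyword scores.

    doc_vectors: {doc_id: embedding}; fts_scores: {doc_id: score in [0, 1]}.
    alpha weights the semantic component (hypothetical parameter).
    """
    ranked = []
    for doc_id, vec in doc_vectors.items():
        semantic = cosine(query_vec, vec)
        keyword = fts_scores.get(doc_id, 0.0)
        ranked.append((alpha * semantic + (1 - alpha) * keyword, doc_id))
    return [doc_id for _, doc_id in sorted(ranked, reverse=True)]
```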
RAG-Ready Document Chunking
Intelligent document splitting at heading boundaries with multiple profiles: default, code-heavy, faq, granular, large-context. Optimized for vector database ingestion.
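Heading-boundary chunking starts a new chunk at each heading and further splits oversized chunks at paragraph breaks. A minimal sketch; the character budget stands in for a token budget and is not one of catalog's actual profile settings:

```python
def chunk_at_headings(markdown: str, max_chars: int = 2000):
    """Split a Markdown document at heading boundaries.

    Each chunk begins at a heading line; chunks over max_chars are
    split again at the last blank line that fits the budget.
    """
    chunks, current = [], []
    for line in markdown.splitlines():
        if line.startswith("#") and current:
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    # Second pass: split any chunk that exceeds the budget at blank lines.
    result = []
    for chunk in chunks:
        while len(chunk) > max_chars:
            cut = chunk.rfind("\n\n", 0, max_chars)
            if cut <= 0:
                break  # no paragraph break fits; keep the chunk whole
            result.append(chunk[:cut])
            chunk = chunk[cut + 2:]
        result.append(chunk)
    return result
```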
Link Graph Analysis
Creates graph.json with document relationships, importance scoring, and broken link detection. Analyzes content structure and connectivity.
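A link graph of this kind can be built by extracting Markdown links and checking each internal target against the document set. A sketch under assumed inputs; the field names below are illustrative, not catalog's actual graph.json schema:

```python
import re

def build_link_graph(docs: dict) -> dict:
    """Build a link graph from {path: markdown content}.

    Records intra-docs edges, flags links whose targets are missing,
    and scores importance as the count of inbound links.
    """
    link_re = re.compile(r"\[[^\]]*\]\(([^)]+)\)")
    edges, broken = [], []
    for path, text in docs.items():
        for target in link_re.findall(text):
            if target.startswith("http"):
                continue  # only validate links within the doc set
            if target in docs:
                edges.append({"from": path, "to": target})
            else:
                broken.append({"from": path, "to": target})
    importance = {p: 0 for p in docs}
    for e in edges:
        importance[e["to"]] += 1
    return {"edges": edges, "broken": broken, "importance": importance}
```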
Semantic Tag Generation
Rule-based content classification with tags.json for filtering and search. Automatic categorization improves document discovery.
Context Bundle Generation
Creates llms-ctx-{size}.txt files for different LLM context windows (2k, 8k, 32k tokens). Optimized context delivery.
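Fitting a fixed token budget usually means greedily packing the most important sections first. A sketch, assuming sections arrive pre-sorted by importance and using a rough ~4 chars/token heuristic; catalog's real budgeting may differ:

```python
def build_context_bundles(sections, budgets=(2000, 8000, 32000)):
    """Pack sections into llms-ctx-{size}.txt-style bundles.

    budgets are token counts; characters approximate tokens at
    ~4 chars/token (a common heuristic, not catalog's exact method).
    """
    bundles = {}
    for budget in budgets:
        picked, used = [], 0
        for section in sections:
            if used + len(section) > budget * 4:
                continue  # section would blow the budget; try the next one
            picked.append(section)
            used += len(section)
        bundles[f"llms-ctx-{budget // 1000}k.txt"] = "\n\n".join(picked)
    return bundles
```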
MCP Server Generation
Generate Model Context Protocol server for IDE integration (Cursor, Claude Code). Complete with configuration files for easy setup.
Source Integration
Pull documentation from remote sources before processing: GitHub, Git, HTTP, S3. Seamless integration with external documentation systems.
Framework Integrations
Ready-to-use integrations for LangChain and LlamaIndex. Quick integration with popular AI frameworks.
Learning Indexing
Automatic integration with inscribe learning system
Automatic Indexing
Learnings created with inscribe are automatically indexed via hook system.
PostLearningCreated hook spawns catalog index-file subprocess.
Fast Indexing
Average indexing time of ~15ms per learning. Content-hash deduplication skips re-indexing unchanged files. Upsert semantics update existing entries or insert new ones.
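Hash-based deduplication with upsert semantics can be sketched in a few lines of SQLite. The table layout and function name here are hypothetical, not catalog's actual schema:

```python
import hashlib
import sqlite3

def index_learning(conn: sqlite3.Connection, path: str, content: str) -> bool:
    """Upsert a learning; skip when the content hash is unchanged.

    Returns True if the row was inserted or updated, False if the
    stored hash already matches (re-indexing skipped).
    """
    digest = hashlib.sha256(content.encode()).hexdigest()
    row = conn.execute(
        "SELECT hash FROM learnings WHERE path = ?", (path,)
    ).fetchone()
    if row and row[0] == digest:
        return False  # content unchanged: skip re-indexing
    conn.execute(
        "INSERT INTO learnings (path, hash, body) VALUES (?, ?, ?) "
        "ON CONFLICT(path) DO UPDATE SET hash = excluded.hash, "
        "body = excluded.body",
        (path, digest, content),
    )
    conn.commit()
    return True
```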
index-file Command
Index a single learning file: catalog index-file <path>. Supports custom database paths and automatic FTS index updates.
Output Formats
Multiple formats for different use cases
llms.txt
Structured index with H1 → blockquote → sections format. Perfect for AI context windows with clear organization.
llms-full.txt
Complete concatenated content with clear separators. Full document content for comprehensive AI analysis.
llms-ctx.txt
Context-only without optional sections. Optimized for AI systems with limited context windows.
sitemap.xml
SEO-optimized XML sitemap with metadata-based priorities and change frequencies for search engines.
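A generated sitemap entry might look like the following; the URL, dates, and values are placeholders for illustration, not actual catalog output:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://docs.example.com/getting-started</loc>
    <lastmod>2025-01-15</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```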
index.json
Navigation metadata for programmatic access. Comprehensive directory and file metadata with statistics.
llms.txt Standard Compliance
Full compliance with the llms.txt standard for AI-ready documentation indexing
Installation
Get catalog running in seconds
Quick Install
Automatically downloads the right binary for your platform
Bun Package
Native Bun package for maximum performance
Docker
Containerized for consistent environments
Use Cases
Perfect for various documentation workflows
AI Training Data
Prepare documentation for AI model training or fine-tuning with properly structured and indexed content.
Knowledge Bases
Create searchable knowledge bases with comprehensive indexing and metadata for internal tools.
Documentation Sites
Generate navigation and SEO-optimized sitemaps for documentation websites and static site generators.
CI/CD Pipelines
Automate documentation processing in continuous integration with validation and compliance checking.
Integration Workflow
Works seamlessly with the fwdslsh ecosystem
Crawl
Use inform to extract web content
Index
catalog generates llms.txt files
Deploy
Ready for any platform
# Complete documentation pipeline
inform https://docs.example.com --output-dir docs
catalog --input docs --output build --base-url https://docs.example.com \
  --optional "archive/**/*" --sitemap --validate --index
AI Integration Ready
catalog is specifically designed for AI workflows:
- Context Optimization: llms-ctx.txt for context-limited AI systems
- Structured Format: Clean, parseable format for RAG applications
- Metadata Extraction: Rich metadata for vector database indexing
- Content Validation: Ensures high-quality AI training data
Ready to Index Your Documentation?
Transform your content into AI-ready knowledge bases