catalog

AI-Ready Documentation Indexer

Generate llms.txt and llms-full.txt, plus PAI (Programmable AI) features. Full llms.txt standard compliance with advanced embeddings, chunking, and RAG optimization.


Generated llms.txt (Structured Index)

# Documentation Project

> Complete API and user guide documentation

## Core Documentation

- index.md - Project overview and introduction

- getting-started.md - Quick start guide

- tutorial.md - Step-by-step tutorial

## API Reference

- api/authentication.md - Authentication methods

- api/endpoints.md - API endpoints reference

- api/errors.md - Error handling guide

## Optional

- drafts/future-plans.md - Future development plans

- archive/changelog.md - Historical changes

Generated llms-full.txt (Complete Content)

# Documentation Project

> Complete API and user guide documentation

## index.md

Welcome to our documentation! This project provides...

---

## getting-started.md

To get started with our API, first install the SDK...

---

## api/authentication.md

Authentication is handled via API keys. Generate a new key...

[Full content continues for all files...]

Generated sitemap.xml (SEO Optimization)

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://docs.example.com/</loc>
    <changefreq>weekly</changefreq>
  </url>
  <url>
    <loc>https://docs.example.com/getting-started</loc>
  </url>
</urlset>

Enterprise-Grade Documentation Processing

Complete llms.txt standard compliance with advanced features

📋

llms.txt Standard Compliance

Complete H1 → blockquote → sections format. Validates output compliance and ensures proper structure for AI consumption.
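To make the H1 → blockquote → sections layout concrete, here is a minimal Python sketch of a structural check. It is not catalog's validator: the function name `check_llms_txt` and the specific rules are assumptions chosen for illustration, based only on the format described above.

```python
def check_llms_txt(text: str) -> list[str]:
    """Return structural problems found; an empty list means the layout looks valid.

    Hypothetical helper for illustration, not part of catalog.
    """
    # Ignore blank lines so spacing differences do not matter.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    problems = []
    if not lines or not lines[0].startswith("# "):
        problems.append("first non-blank line must be an H1 title")
    if len(lines) < 2 or not lines[1].startswith("> "):
        problems.append("the H1 must be followed by a blockquote summary")
    if sum(ln.startswith("# ") for ln in lines) > 1:
        problems.append("only one H1 is expected")
    return problems


sample = """# Documentation Project

> Complete API and user guide documentation

## Core Documentation

- index.md - Project overview and introduction
"""
print(check_llms_txt(sample))
```

An empty list means the three checks passed; a real validator would likely also inspect link syntax and section hierarchy.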

🌐

HTML Processing

Full support for HTML files with automatic conversion to Markdown. Extracts metadata from HTML meta tags and preserves content structure.

🗺️

Sitemap Generation

XML sitemaps for SEO optimization with intelligent priority assignment and change frequency detection based on content type.
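As a rough illustration of what sitemap generation involves, the Python sketch below builds a sitemap from a list of document paths. The function `build_sitemap`, the extension-stripping behaviour, and the change-frequency rule (weekly for the root page, monthly otherwise) are assumptions made for this example; catalog's actual priority and frequency logic is not documented here.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(base_url: str, paths: list[str], strip_extensions: bool = True) -> str:
    """Build a sitemap XML string (hypothetical helper, not catalog's code)."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for path in paths:
        page = path
        if strip_extensions and page.endswith(".md"):
            page = page[: -len(".md")]
        # Map index pages to their directory URL.
        if page == "index" or page.endswith("/index"):
            page = page[: -len("index")]
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = f"{base_url.rstrip('/')}/{page}"
        # Assumed rule for illustration only: the root page changes most often.
        ET.SubElement(url, "changefreq").text = "weekly" if page == "" else "monthly"
    return ET.tostring(urlset, encoding="unicode")


xml_text = build_sitemap("https://docs.example.com", ["index.md", "getting-started.md"])
print(xml_text)
```

`ET.tostring` with `encoding="unicode"` returns a string without the XML declaration, so a caller writing a file would prepend it.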

📊

Performance Monitoring

Real-time performance and memory usage tracking. Concurrent processing utilities with graceful degradation for large document sets.

🛡️

Security Enhancements

Path validation, file scanning, input sanitization, and comprehensive security auditing with issue categorization.

🎯

Pattern-Based Processing

Advanced glob pattern support for include/exclude/optional content. Intelligent document ordering with path-based section generation.

PAI (Programmable AI) Features - Version 0.2.0

Advanced AI optimization for documentation workflows

🧠

Semantic Search with Embeddings

Generate embeddings for learnings and run semantic similarity search. Hybrid search combines FTS5 full-text search with semantic similarity. Supports OpenAI, Ollama, and custom providers (about 10 embeddings/sec).

✂️

RAG-Ready Document Chunking

Intelligent document splitting at heading boundaries with multiple profiles: default, code-heavy, faq, granular, large-context. Optimized for vector database ingestion.
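Splitting at heading boundaries can be sketched in a few lines of Python. This is a simplified illustration, not catalog's chunker: the function name `chunk_by_headings` is hypothetical, and the named profiles (default, code-heavy, faq, granular, large-context) are not modelled.

```python
import re

HEADING = re.compile(r"^#{1,6} ")
FENCE = "`" * 3  # marker that opens or closes a fenced code block


def chunk_by_headings(markdown: str) -> list[str]:
    """Split Markdown into chunks that each start at a heading (illustrative only)."""
    chunks: list[str] = []
    current: list[str] = []
    in_fence = False
    for line in markdown.splitlines():
        if line.startswith(FENCE):
            in_fence = not in_fence
        # Start a new chunk at a heading, but not for '#' comments inside code fences.
        if HEADING.match(line) and not in_fence and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [chunk for chunk in chunks if chunk]


doc = "# Guide\nIntro text.\n## Install\nRun the installer.\n"
for chunk in chunk_by_headings(doc):
    print(repr(chunk))
```

Keeping each heading with its body means every chunk carries its own title, which tends to help retrieval when chunks are embedded independently.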

🔗

Link Graph Analysis

Creates graph.json with document relationships, importance scoring, and broken link detection. Analyze content structure and connectivity.
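The idea behind link graph analysis can be sketched as follows. The output shape, the function `build_link_graph`, and the use of inbound link counts as an importance score are all assumptions for illustration; the real graph.json schema may differ. Link targets are treated as paths relative to the documentation root, which is a simplification.

```python
import json
import re

# Capture the target of [text](target), stopping at ')', '#', or whitespace.
LINK = re.compile(r"\[[^\]]*\]\(([^)#\s]+)")


def build_link_graph(docs: dict[str, str]) -> dict:
    """Map document paths to outgoing links and flag targets that do not exist."""
    edges, broken = [], []
    inbound = {path: 0 for path in docs}
    for source, text in docs.items():
        for target in LINK.findall(text):
            if target.startswith(("http://", "https://", "mailto:")):
                continue  # external links are out of scope for this sketch
            if target in docs:
                edges.append({"from": source, "to": target})
                inbound[target] += 1
            else:
                broken.append({"from": source, "to": target})
    # Assumed scoring: importance is simply the number of inbound links.
    return {"edges": edges, "broken": broken, "importance": inbound}


docs = {
    "index.md": "See [start](getting-started.md) and [gone](missing.md).",
    "getting-started.md": "Back to [home](index.md#top).",
}
graph = build_link_graph(docs)
print(json.dumps(graph, indent=2))
```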

🏷️

Semantic Tag Generation

Rule-based content classification with tags.json for filtering and search. Automatic categorization improves document discovery.

📦

Context Bundle Generation

Creates llms-ctx-{size}.txt files for different LLM context windows (2k, 8k, 32k tokens). Optimized context delivery.
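A context bundle is essentially a greedy packing of documents into a token budget. The Python sketch below shows one way this might work; the four-characters-per-token estimate, the helper names, and the concrete file names such as `llms-ctx-2k.txt` are assumptions, not catalog's documented behaviour.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate (assumption: about four characters per token)."""
    return max(1, len(text) // 4)


def build_bundle(docs: list[tuple[str, str]], budget: int) -> str:
    """Pack documents, in the given priority order, into a token budget."""
    parts, used = [], 0
    for path, body in docs:
        section = f"## {path}\n\n{body}\n"
        cost = estimate_tokens(section)
        if used + cost > budget:
            continue  # skip documents that do not fit; smaller ones may still fit
        parts.append(section)
        used += cost
    return "\n---\n\n".join(parts)


# Hypothetical size labels mapped to token budgets.
SIZES = {"2k": 2_000, "8k": 8_000, "32k": 32_000}

docs = [("index.md", "Welcome. " * 50), ("api/endpoints.md", "GET /items " * 1500)]
bundles = {f"llms-ctx-{label}.txt": build_bundle(docs, budget) for label, budget in SIZES.items()}
for name, content in bundles.items():
    print(name, estimate_tokens(content) if content else 0)
```

A production implementation would use the target model's tokenizer rather than a character-based estimate.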

🌐

MCP Server Generation

Generate Model Context Protocol server for IDE integration (Cursor, Claude Code). Complete with configuration files for easy setup.

📥

Source Integration

Pull documentation from remote sources before processing: GitHub, Git, HTTP, S3. Seamless integration with external documentation systems.

🔄

Framework Integrations

Ready-to-use integrations for LangChain and LlamaIndex. Quick integration with popular AI frameworks.

Learning Indexing

Automatic integration with inscribe learning system

📝

Automatic Indexing

Learnings created with inscribe are automatically indexed via the hook system. The PostLearningCreated hook spawns a catalog index-file subprocess.

⚡

Fast Indexing

Indexing averages about 15 ms per learning. Content-hash deduplication skips re-indexing unchanged files, and upsert semantics update an existing entry or insert a new one.
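Content-hash deduplication combined with an upsert can be sketched with Python's built-in `sqlite3` module. The table name, columns, and function names below are hypothetical, and catalog's real schema is not shown in this document. The `ON CONFLICT` clause requires SQLite 3.24 or newer.

```python
import hashlib
import sqlite3


def open_index(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a tiny index database with an assumed schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS learnings ("
        "path TEXT PRIMARY KEY, content_hash TEXT NOT NULL, body TEXT NOT NULL)"
    )
    return conn


def index_file(conn: sqlite3.Connection, path: str, body: str) -> bool:
    """Index one file; return False when the content hash is unchanged."""
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    row = conn.execute(
        "SELECT content_hash FROM learnings WHERE path = ?", (path,)
    ).fetchone()
    if row is not None and row[0] == digest:
        return False  # unchanged content, skip re-indexing
    # Upsert: insert a new row or update the existing one for this path.
    conn.execute(
        "INSERT INTO learnings (path, content_hash, body) VALUES (?, ?, ?) "
        "ON CONFLICT(path) DO UPDATE SET "
        "content_hash = excluded.content_hash, body = excluded.body",
        (path, digest, body),
    )
    conn.commit()
    return True


conn = open_index()
print(index_file(conn, "notes/a.md", "first version"))
print(index_file(conn, "notes/a.md", "first version"))
print(index_file(conn, "notes/a.md", "second version"))
```

The hash check avoids a write for unchanged files, which is where most of the time saving in repeated indexing runs would come from.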

🔍

index-file Command

Index a single learning file: catalog index-file <path>. Supports custom database paths and automatic FTS index updates.

Output Formats

Multiple formats for different use cases

📋

llms.txt

Structured index with H1 → blockquote → sections format. Perfect for AI context windows with clear organization.

catalog --validate

📚

llms-full.txt

Complete concatenated content with clear separators. Full document content for comprehensive AI analysis.

catalog --input docs

🎯

llms-ctx.txt

Context-only without optional sections. Optimized for AI systems with limited context windows.

catalog --optional "drafts/**/*"

🗺️

sitemap.xml

SEO-optimized XML sitemap with metadata-based priorities and change frequencies for search engines.

catalog --sitemap --base-url https://docs.example.com

📊

index.json

Navigation metadata for programmatic access. Comprehensive directory and file metadata with statistics.

catalog --index

🏆 llms.txt Standard Compliance

Full compliance with the llms.txt standard for AI-ready documentation indexing

✓ H1 → blockquote → sections format

✓ Proper section hierarchy validation

✓ Markdown link syntax compliance

✓ Intelligent document ordering

✓ Path-based section generation

✓ Optional content categorization

Installation

Get catalog running in seconds

⚡

Quick Install

# One-line install script
curl -fsSL https://raw.githubusercontent.com/fwdslsh/catalog/main/install.sh | bash

# Start indexing
catalog --validate

Automatically downloads the right binary for your platform

📦

Bun Package

# Install with Bun
bun install -g @fwdslsh/catalog

# Or add to project
bun add @fwdslsh/catalog

Native Bun package for maximum performance

🐳

Docker

# Pull latest image
docker pull fwdslsh/catalog:latest

# Run catalog
docker run --rm fwdslsh/catalog --help

Containerized for consistent environments

Use Cases

Perfect for various documentation workflows

🤖

AI Training Data

Prepare documentation for AI model training or fine-tuning with properly structured and indexed content.

catalog --optional "examples/**/*" --validate

🔍

Knowledge Bases

Create searchable knowledge bases with comprehensive indexing and metadata for internal tools.

catalog --index --sitemap --base-url https://kb.company.com

📖

Documentation Sites

Generate navigation and SEO-optimized sitemaps for documentation websites and static site generators.

catalog --sitemap --sitemap-no-extensions --base-url https://docs.example.com

🔄

CI/CD Pipelines

Automate documentation processing in continuous integration with validation and compliance checking.

catalog --validate --performance-report --exit-on-error

Integration Workflow

Works seamlessly with the fwdslsh ecosystem

🌐

Crawl

Use inform to extract web content

📋

Index

catalog generates llms.txt files

🚀

Deploy

Ready for any platform

# Complete documentation pipeline
inform https://docs.example.com --output-dir docs
catalog --input docs --output build --base-url https://docs.example.com \
  --optional "archive/**/*" --sitemap --validate --index

🧠 AI Integration Ready

catalog is specifically designed for AI workflows:

Context Optimization: llms-ctx.txt for context-limited AI systems
Structured Format: Clean, parseable format for RAG applications
Metadata Extraction: Rich metadata for vector database indexing
Content Validation: Ensures high-quality AI training data

Ready to Index Your Documentation?

Transform your content into AI-ready knowledge bases


📂

Source Code

View on GitHub, report issues, contribute

🔄

Ecosystem Overview

See how all fwdslsh tools work together

📋

llms.txt Standard

Learn about the llms.txt standard