catalog Documentation
Command Line Reference
Complete guide to all catalog command-line options and parameters
Basic Syntax
catalog [OPTIONS]
Generate llms.txt, llms-full.txt, and related files from Markdown/HTML directories.
Core Options
--input, -i <path>
Source directory of Markdown/HTML files (default: current directory)
catalog --input docs --output build
--output, -o <path>
Destination directory for generated files (default: current directory)
catalog --input docs --output build
--base-url <url>
Base URL for generating absolute links in output files
catalog --base-url https://docs.example.com
--silent
Suppress non-error output for automation
catalog --input docs --output build --silent
Content Selection
--include <pattern>
Include files matching glob pattern (can be used multiple times)
catalog --include "*.md" --include "guides/*.html"
--exclude <pattern>
Exclude files matching glob pattern (can be used multiple times)
catalog --exclude "**/*draft*" --exclude "temp/*"
--optional <pattern>
Mark files matching glob pattern as optional (can be used multiple times)
catalog --optional "drafts/**/*" --optional "**/CHANGELOG.md"
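The interaction of the three pattern flags can be sketched in Python. This is only an illustration: the precedence shown (exclude wins over include, and `--optional` only tags files that survive filtering) is an assumption, and Python's `fnmatch` lets `*` match across `/`, which may differ from the glob engine catalog actually uses. The names `matches_any` and `classify` are hypothetical.

```python
from fnmatch import fnmatchcase


def matches_any(path: str, patterns: list[str]) -> bool:
    """Return True if the POSIX-style path matches at least one pattern."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)


def classify(paths, include, exclude, optional):
    """Split paths into (core, optional) lists after include/exclude filtering.

    Assumed precedence (not confirmed by the catalog docs): exclude wins
    over include, and optional patterns only tag files that were kept.
    """
    core, extra = [], []
    for path in paths:
        if include and not matches_any(path, include):
            continue  # not selected by any --include pattern
        if matches_any(path, exclude):
            continue  # dropped by an --exclude pattern
        (extra if matches_any(path, optional) else core).append(path)
    return core, extra


files = [
    "index.md",
    "guides/setup.html",
    "temp/scratch.md",
    "drafts/v2/plan.md",
    "notes/draft-idea.md",
]
core, extra = classify(
    files,
    include=["*.md", "guides/*.html"],
    exclude=["**/*draft*", "temp/*"],
    optional=["drafts/**/*"],
)
print(core)
print(extra)
```

Quoting the patterns on the command line, as the examples above do, keeps the shell from expanding them before catalog sees them.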
Output Generation
--validate
Validate that the generated llms.txt complies with the standard
catalog --input docs --output build --validate
--index
Generate index.json files for directory navigation and metadata
catalog --input docs --output build --index
--sitemap
Generate XML sitemap for search engines (requires --base-url)
catalog --sitemap --base-url https://docs.example.com
--sitemap-no-extensions
Generate sitemap URLs without file extensions for clean URLs
catalog --sitemap --sitemap-no-extensions --base-url https://docs.example.com
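A minimal sketch of what extension-free sitemap URLs could look like, assuming the URL is simply the base URL joined with the file's relative path. The function name `sitemap_url` is hypothetical, and catalog's handling of index files or trailing slashes is not covered here.

```python
import posixpath


def sitemap_url(base_url: str, rel_path: str, strip_extension: bool = False) -> str:
    """Join a base URL and a relative file path into an absolute URL."""
    if strip_extension:
        # "guides/setup.md" -> "guides/setup"
        rel_path, _extension = posixpath.splitext(rel_path)
    return base_url.rstrip("/") + "/" + rel_path.lstrip("/")


print(sitemap_url("https://docs.example.com", "guides/setup.md"))
print(sitemap_url("https://docs.example.com", "guides/setup.md", strip_extension=True))
```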
Complete Example Workflows
AI Training Pipeline
catalog --input docs --output ai-training \
--optional "examples/**/*" \
--optional "appendix/**/*" \
--validate
Creates AI-optimized documentation with essential content prioritized and supplementary material marked as optional.
Documentation Website
catalog --input docs --output build \
--base-url https://docs.example.com \
--sitemap --sitemap-no-extensions \
--index --validate
Complete documentation site preparation with SEO optimization, navigation metadata, and standards compliance.
CI/CD Integration
catalog --input docs --output dist \
--validate \
--silent
Automated documentation processing with validation and silent operation for continuous integration pipelines.
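In a pipeline script, the step typically reduces to running the command and checking its exit code. The sketch below assumes the usual convention that zero means success and anything else means failure; the specific codes catalog returns are not listed here. `run_step` is a hypothetical helper, and the demonstration uses a stand-in command so that it runs without catalog installed.

```python
import subprocess
import sys

# Flags taken from the CI/CD example above.
CATALOG_COMMAND = ["catalog", "--input", "docs", "--output", "dist", "--validate", "--silent"]


def run_step(command: list[str]) -> bool:
    """Run a command and report success based on its exit code (0 = success)."""
    result = subprocess.run(command)
    return result.returncode == 0


# Stand-in commands that exit with a chosen code; swap in CATALOG_COMMAND in a real pipeline.
print(run_step([sys.executable, "-c", "raise SystemExit(0)"]))
print(run_step([sys.executable, "-c", "raise SystemExit(2)"]))
```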
llms.txt Standard Compliance
Understanding the llms.txt format and how catalog checks compliance with it
Complete Standard Implementation
catalog provides full compliance with the llms.txt standard for AI-ready documentation indexing:
H1 → Blockquote → Sections Format
Proper structure with title, description, and organized sections
Section Hierarchy Validation
Ensures correct H2 section organization and ordering
Markdown Link Syntax Compliance
Validates proper link formatting and descriptions
Intelligent Document Ordering
Prioritizes important documentation with smart organization
Path-Based Section Generation
Automatic organization using directory structure
Optional Content Categorization
Separates core and supplementary content appropriately
Standard Format Structure
The llms.txt format follows a specific structure, which catalog's output implements:
Required Format Elements
# Project Title
> Brief project description
## Section Name
- [file.md](file.md) - Optional file description
- [another-file.md](another-file.md) - Another description
## Another Section
- [section/file.md](section/file.md) - Organized by directory
- [section/other.md](section/other.md) - Maintains structure
## Optional
- [drafts/future.md](drafts/future.md) - Supplementary content
- [archive/old.md](archive/old.md) - Historical documentation
Format Rules
- H1 Title: Single project title at the top
- Blockquote Description: Brief project description following the title
- H2 Sections: Organized content sections with meaningful names
- Markdown Links: Proper link syntax with optional descriptions
- Optional Section: Separate section for supplementary content
- Consistent Structure: Maintains organization and readability
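The format rules above can be illustrated with a small renderer. This is a sketch under stated assumptions, not catalog's implementation: `render_llms_txt` is a hypothetical name, the ` - ` description separator follows the sample shown in this document, and placing the Optional section last is assumed from that sample.

```python
def render_llms_txt(title, description, sections):
    """Render an llms.txt index: H1 title, blockquote, then H2 sections.

    `sections` maps a section name to a list of (path, description) pairs.
    """
    lines = [f"# {title}", "", f"> {description}"]
    # Stable sort: every section keeps its order, "Optional" moves to the end.
    ordered = sorted(sections.items(), key=lambda item: item[0] == "Optional")
    for name, entries in ordered:
        lines += ["", f"## {name}", ""]
        for path, text in entries:
            suffix = f" - {text}" if text else ""
            lines.append(f"- [{path}]({path}){suffix}")
    return "\n".join(lines) + "\n"


print(render_llms_txt(
    "Documentation Project",
    "Complete API and user guide documentation",
    {
        "Optional": [("drafts/future.md", "Supplementary content")],
        "Core Documentation": [("index.md", "Project overview and introduction")],
    },
))
```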
Validation Features
catalog includes comprehensive validation to ensure your output meets the standard:
Structure Validation
# Validates proper H1 → blockquote → sections format
catalog --validate
# Example validation output:
✅ H1 title found: "Documentation Project"
✅ Blockquote description found
✅ Proper section hierarchy (H2 sections)
✅ Valid Markdown link syntax
✅ Appropriate content organization
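To make the structural checks concrete, here is a rough Python sketch of the kind of tests involved. It is not catalog's validator: `check_structure` is a hypothetical function, and the rule that an Optional section must come last is an assumption based on the sample format.

```python
def check_structure(text: str) -> list[str]:
    """Return structural problems found in an llms.txt document (empty if none)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    problems = []
    if not lines or not lines[0].startswith("# "):
        problems.append("first line is not an H1 title")
    if len(lines) < 2 or not lines[1].startswith("> "):
        problems.append("no blockquote description after the title")
    sections = [line[3:] for line in lines if line.startswith("## ")]
    if not sections:
        problems.append("no H2 sections")
    # Assumption: when present, the Optional section is expected to come last.
    elif "Optional" in sections and sections[-1] != "Optional":
        problems.append("Optional section is not last")
    return problems


sample = "# Documentation Project\n> Brief description\n## Guides\n- [a.md](a.md)\n"
print(check_structure(sample))
print(check_structure("## Guides\n- [a.md](a.md)\n"))
```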
Link Format Checking
# Ensures all links use proper Markdown syntax
❌ Error: Invalid link format found
Line 15: [file.md] - Missing parentheses
Should be: [file.md](file.md) - Description
✅ Suggestion: Use proper Markdown link syntax
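A link check of this kind can be approximated with a regular expression over list items. The pattern and the `find_bad_links` helper below are illustrative assumptions rather than catalog's actual rules; they accept `- [text](target)` with an optional ` - description` suffix.

```python
import re

# A list item that is a well-formed Markdown link with an optional " - description".
VALID_ITEM = re.compile(r"^- \[[^\]]+\]\([^()\s]+\)(?: - .+)?$")


def find_bad_links(text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) for list items that are not valid links."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("- ") and not VALID_ITEM.match(stripped):
            bad.append((number, stripped))
    return bad


document = "# Title\n> Summary\n## Section\n- [a.md](a.md) - Fine\n- [file.md] - Missing parentheses\n"
for number, line in find_bad_links(document):
    print(f"Line {number}: {line}")
```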
URL Validation
# When using --base-url, validates absolute URLs
catalog --base-url https://docs.example.com --validate
✅ All URLs properly formatted with base URL
✅ No broken or malformed links detected
✅ Consistent URL structure maintained
File Processing and Content Extraction
How catalog intelligently processes different file types and extracts metadata
Supported File Types
Markdown Files (.md, .mdx)
Full support for Markdown with YAML frontmatter extraction and content processing.
HTML Files (.html)
Automatic conversion to Markdown with meta tag extraction and content cleaning.
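As an illustration of meta tag extraction from HTML, the sketch below collects the page title and named meta tags using Python's standard `html.parser`. Which tags catalog reads is not specified here, so the choice of `<title>` and `<meta name=... content=...>` is an assumption, and `MetaExtractor` and `extract_html_meta` are hypothetical names.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect the <title> text and <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name, content = attributes.get("name"), attributes.get("content")
            if name and content is not None:
                self.meta[name.lower()] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.meta["title"] = self.meta.get("title", "") + data.strip()


def extract_html_meta(html: str) -> dict:
    """Parse an HTML string and return the metadata that was found."""
    parser = MetaExtractor()
    parser.feed(html)
    return parser.meta


page = '<html><head><title>Guide</title><meta name="description" content="Quick start"></head></html>'
print(extract_html_meta(page))
```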
Intelligent Document Ordering
catalog organizes documents into a hierarchy using the following rules:
Index/Root Files
Prioritizes index.md, readme.md, home.md files
Important Documentation
Files containing keywords like catalog, tutorial, intro, getting-started
Path-Based Sections
Automatic organization by directory structure (e.g., api/, guides/)
Alphabetical Fallback
Within sections, files are sorted alphabetically for consistent organization
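These ordering rules can be expressed as a sort key. The sketch below is one plausible reading, not catalog's algorithm: it groups by directory first, then ranks index files ahead of keyword matches ahead of everything else, then falls back to alphabetical order. The name `sort_key` and the exact matching rules (case-insensitive, filename stem only) are assumptions.

```python
import posixpath

INDEX_NAMES = {"index", "readme", "home"}
KEYWORDS = ("catalog", "tutorial", "intro", "getting-started")


def sort_key(path: str):
    """Rank a path: index/root files, then keyword matches, then the rest."""
    stem = posixpath.splitext(posixpath.basename(path))[0].lower()
    if stem in INDEX_NAMES:
        rank = 0
    elif any(word in stem for word in KEYWORDS):
        rank = 1
    else:
        rank = 2
    # Directory first keeps files grouped by section; the path is the alphabetical fallback.
    return (posixpath.dirname(path), rank, path.lower())


paths = ["api/endpoints.md", "getting-started.md", "api/authentication.md", "index.md", "changelog.md"]
print(sorted(paths, key=sort_key))
```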
Metadata Extraction
catalog automatically extracts metadata from multiple sources, including YAML frontmatter in Markdown files and meta tags in HTML files.
Content Processing Pipeline
catalog processes content in four stages:
Discovery & Scanning
Recursive directory traversal with pattern matching and security validation
Content Extraction
YAML frontmatter stripping, HTML processing, and metadata extraction
Organization
Intelligent ordering, section generation, and content categorization
Generation
Multiple output formats with validation and optimization
Output Formats and Features
Comprehensive guide to all output formats generated by catalog
llms.txt (Structured Index)
Standard-compliant structured index in H1 → blockquote → sections format, organized to fit AI context windows.
# Documentation Project
> Complete API and user guide documentation
## Core Documentation
- [index.md](index.md) - Project overview and introduction
- [getting-started.md](getting-started.md) - Quick start guide
## API Reference
- [api/authentication.md](api/authentication.md) - Authentication methods
- [api/endpoints.md](api/endpoints.md) - API endpoints reference
## Optional
- [drafts/future-plans.md](drafts/future-plans.md) - Future development plans
llms-full.txt (Complete Content)
Complete concatenated content with clear separators for comprehensive AI analysis and training data preparation.
# Documentation Project
> Complete API and user guide documentation
## index.md
# Welcome to Our Documentation
[Complete content with frontmatter stripped]
---
## getting-started.md
# Getting Started Guide
[Full content continues...]
---
[Content continues for all files...]
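The concatenation shown above can be sketched as follows. This is an approximation under assumptions: frontmatter is taken to be a leading block delimited by `---` lines, each file is introduced by a `## path` heading, and files are separated by `---`. The helper names `strip_frontmatter` and `build_full` are hypothetical.

```python
def strip_frontmatter(text: str) -> str:
    """Drop a leading YAML frontmatter block delimited by '---' lines."""
    lines = text.splitlines()
    if lines and lines[0].strip() == "---":
        for i in range(1, len(lines)):
            if lines[i].strip() == "---":
                return "\n".join(lines[i + 1:]).lstrip("\n")
    return text  # no frontmatter, or the block was never closed


def build_full(title, description, documents):
    """Concatenate (path, content) pairs under '## path' headings."""
    header = f"# {title}\n\n> {description}"
    bodies = [
        f"## {path}\n\n{strip_frontmatter(content).strip()}"
        for path, content in documents
    ]
    return header + "\n\n" + "\n\n---\n\n".join(bodies) + "\n"


print(build_full(
    "Documentation Project",
    "Complete API and user guide documentation",
    [("index.md", "---\ntitle: Welcome\n---\n# Welcome to Our Documentation\n")],
))
```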
llms-ctx.txt (Context-Only)
Structured index without optional sections, optimized for AI systems with limited context windows.
# Documentation Project
> Complete API and user guide documentation
## Core Documentation
- [index.md](index.md) - Project overview and introduction
- [getting-started.md](getting-started.md) - Quick start guide
## API Reference
- [api/authentication.md](api/authentication.md) - Authentication methods
- [api/endpoints.md](api/endpoints.md) - API endpoints reference
# Note: Optional sections excluded for context optimization
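Conceptually, the context-only file is the structured index with the Optional section removed. A minimal sketch of that transformation, assuming the section is identified by an `## Optional` heading (the function name `drop_optional` is hypothetical):

```python
def drop_optional(llms_txt: str) -> str:
    """Return the index without the '## Optional' section and its entries."""
    kept, skipping = [], False
    for line in llms_txt.splitlines():
        if line.startswith("## "):
            # Each H2 heading starts a new section; skip only the Optional one.
            skipping = line[3:].strip().lower() == "optional"
        if not skipping:
            kept.append(line)
    return "\n".join(kept).rstrip() + "\n"


index = "# Docs\n> Summary\n## Core\n- [a.md](a.md)\n## Optional\n- [b.md](b.md)\n"
print(drop_optional(index))
```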
sitemap.xml (SEO Optimization)
XML sitemap with intelligent priority assignment and change frequency detection for search engine optimization.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://docs.example.com/</loc>
<lastmod>2024-01-15T10:30:00Z</lastmod>
<changefreq>weekly</changefreq>
<priority>1.0</priority>
</url>
</urlset>
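For readers who want to produce or post-process a file in this shape, the sketch below serializes entries with Python's standard `xml.etree.ElementTree`. It only writes the values it is given; how catalog assigns priority and change frequency is not modeled, and `build_sitemap` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries) -> str:
    """Serialize dicts with loc, lastmod, changefreq, priority keys as a sitemap."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


xml_text = build_sitemap([
    {"loc": "https://docs.example.com/", "lastmod": "2024-01-15T10:30:00Z",
     "changefreq": "weekly", "priority": "1.0"},
])
print(xml_text)
```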
index.json (Navigation Metadata)
Comprehensive directory and file metadata for programmatic navigation and content management.
{
"directory": ".",
"generated": "2024-01-15T10:30:00Z",
"files": [
{
"name": "index.md",
"path": "index.md",
"size": 1234,
"modified": "2024-01-15T10:30:00Z",
"type": "md",
"isMarkdown": true
}
],
"summary": {
"totalFiles": 5,
"markdownFiles": 3,
"totalSize": 12543
}
}
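The summary block can be derived from the files array. The sketch below recomputes it from a parsed index.json, using the field names from the sample above; `summarize` is a hypothetical helper, useful for example as a consistency check in a downstream script.

```python
import json


def summarize(files: list[dict]) -> dict:
    """Recompute the summary block from the entries in the files array."""
    return {
        "totalFiles": len(files),
        "markdownFiles": sum(1 for entry in files if entry.get("isMarkdown")),
        "totalSize": sum(entry["size"] for entry in files),
    }


index = json.loads("""
{
  "directory": ".",
  "files": [
    {"name": "index.md", "path": "index.md", "size": 1234, "type": "md", "isMarkdown": true},
    {"name": "about.html", "path": "about.html", "size": 766, "type": "html", "isMarkdown": false}
  ]
}
""")
print(summarize(index["files"]))
```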
Enterprise Features
Advanced capabilities for large-scale documentation processing
Security and Validation
Path Traversal Prevention
Blocks ../ sequences and validates all file paths to prevent directory traversal attacks.
Input Sanitization
All user inputs are validated and sanitized to prevent injection attacks and ensure safe processing.
Content Scanning
Detects malicious patterns, suspicious URLs, and potentially harmful content during processing.
File Size Limits
Configurable limits prevent processing of extremely large files that could cause memory issues.
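Path traversal prevention usually comes down to resolving a candidate path and confirming it still lies under the allowed root. The sketch below shows that general technique with Python's `pathlib`; it is not catalog's implementation, and `is_inside` is a hypothetical name.

```python
from pathlib import Path


def is_inside(root: Path, candidate: str) -> bool:
    """True if candidate, resolved against root, stays within root."""
    root = root.resolve()
    target = (root / candidate).resolve()  # collapses any ../ segments
    return target == root or root in target.parents


docs_root = Path("docs")
print(is_inside(docs_root, "guides/setup.md"))
print(is_inside(docs_root, "../secrets.md"))
```

Resolving before comparing also catches traversal hidden in the middle of a path, such as `guides/../../secrets.md`.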
Performance Monitoring
Real-Time Performance Tracking
Performance Report:
Total Time: 147ms
Memory Usage:
Heap Used: 12.45MB
RSS: 89.23MB
Memory Delta:
Heap: +2.1MB
RSS: +5.7MB
Operations:
file_scanning: 23ms
content_processing: 89ms
sitemap_generation: 12ms
files_processed: 42
total_file_size: 2.3MB
- Detailed timing for all major operations
- Memory usage monitoring and optimization
- Processing statistics and bottleneck identification
- Concurrent processing utilities for large document sets
Error Handling and Recovery
Actionable Error Messages
❌ Error in file processing: Permission denied: /protected/file.md
Details:
EACCES: permission denied
Suggestions:
- Check file permissions
- Ensure the directory is not locked by another process
- Try running with appropriate permissions
- Graceful degradation when individual files fail
- Comprehensive error categorization with recovery suggestions
- Standard exit codes for reliable automation
- Detailed logging with security-focused error handling
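Graceful degradation, as described in the first bullet, generally means recording a failure for one file and carrying on with the rest. A minimal sketch of that pattern, with the hypothetical helper `read_all` and a caller-supplied reader function:

```python
def read_all(paths, reader):
    """Apply reader to each path, collecting failures instead of aborting."""
    contents, failures = {}, {}
    for path in paths:
        try:
            contents[path] = reader(path)
        except OSError as error:  # includes PermissionError (EACCES)
            failures[path] = str(error)
    return contents, failures


def demo_reader(path: str) -> str:
    """Stand-in reader that fails for one protected path."""
    if path.startswith("/protected/"):
        raise PermissionError("permission denied")
    return f"contents of {path}"


contents, failures = read_all(["index.md", "/protected/file.md"], demo_reader)
print(contents)
print(failures)
```

A caller could then report the failures and choose an exit code based on whether any occurred.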
Integration Architecture
How catalog fits into enterprise documentation workflows
- Content Sources: Markdown, HTML, Git repos
- inform: Web content extraction
- catalog: AI-ready indexing
- AI Systems: Training, RAG, Context
Integration Benefits
Seamless Workflow
Works with inform for web content extraction and unify for site generation
Multiple Outputs
Single source generates formats for AI training, context windows, and SEO optimization
CI/CD Ready
Standard exit codes and silent operation for automated documentation pipelines
Enterprise Security
Comprehensive security validation and monitoring for production environments