inform Documentation
Command Line Reference
Complete guide to all inform command-line options and parameters
Basic Syntax
inform <URL> [OPTIONS]
Extract content from a website starting at the specified URL.
Essential Options
--output-dir <directory>
Specify the output directory for extracted content. The directory is created if it doesn't exist.
inform https://docs.example.com --output-dir ./content
--max-pages <number>
Cap the number of pages to crawl. Useful for large sites or for testing.
inform https://blog.example.com --max-pages 50
--delay <milliseconds>
Add a delay between requests to avoid overloading target servers.
inform https://api-docs.com --delay 1000
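The essential options above compose in a single invocation; a sketch combining them (the URL and values are illustrative):

```shell
# Crawl up to 100 pages into ./content, waiting 500 ms
# between requests to stay polite to the server.
inform https://docs.example.com \
  --output-dir ./content \
  --max-pages 100 \
  --delay 500
```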
Content Processing Options
--selector <css-selector>
Target specific content using CSS selectors, for example to extract only the main content area of each page.
inform https://news.site --selector "article.main-content"
--include-links
Preserve internal links in the extracted Markdown content.
inform https://wiki.example.com --include-links
--follow-external
Follow links to external domains. Use with caution, and pair with --max-pages to keep the crawl bounded.
inform https://linkfarm.site --follow-external --max-pages 10
--exclude-patterns <patterns>
Exclude URLs matching the specified patterns (comma-separated; glob patterns such as *.pdf are supported, as in the example below).
inform https://docs.site --exclude-patterns "/admin,/login,*.pdf"
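The processing options above can also be combined in one run; a sketch using only flags documented in this reference (the URL, selector, and patterns are illustrative):

```shell
# Extract article bodies only, keep internal links,
# and skip login/admin pages and PDF downloads.
inform https://docs.site \
  --selector "article.main-content" \
  --include-links \
  --exclude-patterns "/admin,/login,*.pdf"
```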
Performance and Rate Limiting
--concurrent <number>
Number of concurrent requests (default: 3). Higher values crawl faster but place more load on the target server.
inform https://fast-site.com --concurrent 10
--timeout <seconds>
Request timeout in seconds. Increase it for slow or unreliable sites.
inform https://slow-site.com --timeout 30
--user-agent <string>
Custom User-Agent string for requests.
inform https://api.site --user-agent "MyBot/1.0"
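A tuned crawl typically sets these performance options together; a sketch (the values and the contact address in the User-Agent are illustrative):

```shell
# Five parallel requests, a generous timeout for slow pages,
# and an identifiable User-Agent so operators can contact you.
inform https://docs.example.com \
  --concurrent 5 \
  --timeout 30 \
  --user-agent "MyBot/1.0 (contact@example.com)"
```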
Advanced Features
Powerful capabilities for complex content extraction workflows
🎯 Content Quality Enhancement
inform automatically cleans and processes content for optimal readability and consistency.
Smart Content Detection
Automatically identifies main content areas, removing navigation, ads, and boilerplate.
Markdown Conversion
Converts HTML to clean, standards-compliant Markdown with proper formatting.
Link Processing
Intelligently handles internal and external links, with options for link preservation.
Image Handling
Downloads images and updates references in Markdown output (optional).
🏗️ Site Structure Analysis
Advanced crawling strategies that understand website architecture.
Sitemap Detection
Automatically discovers and uses XML sitemaps for comprehensive coverage.
Robots.txt Compliance
Respects robots.txt rules and crawl-delay directives.
URL Pattern Recognition
Intelligently follows URL patterns to discover content systematically.
Duplicate Detection
Avoids crawling duplicate content and similar pages.
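These structural analyses run automatically, with no dedicated switches; where they need help, the flags documented above are the lever. A sketch that defers to a site's pacing and manually excludes URL variants that often duplicate canonical pages (the /print and /amp patterns are illustrative):

```shell
# Slow pacing plus manual exclusion of duplicate page variants.
inform https://docs.site \
  --delay 1000 \
  --exclude-patterns "/print,/amp"
```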
Integration and Workflows
Connect inform with other tools and systems for powerful content workflows
fwdslsh Ecosystem
Seamlessly integrates with catalog, unify, and giv for complete documentation workflows from extraction to publication.
# 1. Extract the site to Markdown
inform https://docs.site --output-dir content/
# 2. Index the extracted content
catalog --input content/ --output indexed/
# 3. Build the published site
unify build --input indexed/ --output site/
Performance Optimization
Optimize crawling for large sites with concurrent requests, rate limiting, and intelligent content targeting.
inform https://large-site.com \
--max-pages 1000 \
--concurrent 5 \
--delay 500
Troubleshooting
Handle JavaScript-heavy sites, rate limiting, and authentication with advanced configuration options.
inform https://spa-site.com \
--selector "main, .content" \
--timeout 30
Continue Learning
Explore more inform capabilities and related tools