inform
High-Performance Web Content Crawler
Crawl websites, extract main content, and convert to clean Markdown. Powered by Bun for maximum performance with concurrent processing and zero dependencies.
Built for Speed and Reliability
Modern web crawling without the complexity
Powered by Bun
Significantly faster than Node.js with built-in optimizations. Native DOM parsing and zero-dependency HTML processing for maximum performance.
Concurrent Crawling
Process multiple pages simultaneously with configurable concurrency limits. Intelligent rate limiting and backoff strategies.
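A concurrency-limited worker pool like the one described above can be sketched as follows. This is an illustrative sketch only, not inform's actual implementation; `crawlPool` and its parameters are hypothetical names. A fixed number of lanes pull items from a shared queue, so at most `limit` tasks run at once.

```typescript
// Illustrative sketch of a concurrency-limited crawl pool.
// At most `limit` invocations of `worker` are in flight at any time.
async function crawlPool<T>(
  items: string[],
  limit: number,
  worker: (item: string) => Promise<T>,
): Promise<T[]> {
  const results: T[] = new Array(items.length);
  let next = 0;
  // Each lane claims the next unprocessed index until the queue drains.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i]);
    }
  }
  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, lane),
  );
  return results;
}
```

In a real crawler the worker would fetch and parse a page; here it is left generic so the pooling logic stands on its own.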
Smart Content Extraction
Intelligently identifies main content by removing navigation, ads, and other non-content elements. Preserves structure and formatting.
Clean Markdown Output
Converts HTML to properly formatted Markdown: code examples become fenced code blocks, and heading hierarchy and link structure are preserved.
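To make the conversion concrete, here is a toy sketch of a few HTML-to-Markdown rules of the kind such a converter applies. This is not inform's converter (which handles nesting, attributes, and many more elements); it only illustrates the shape of the transformation.

```typescript
// Toy HTML-to-Markdown rules for illustration only.
function htmlToMarkdown(html: string): string {
  return html
    // <h1>..</h1> through <h6>..</h6> become # .. ###### headings
    .replace(/<h([1-6])>(.*?)<\/h\1>/g, (_m, n, t) => "#".repeat(Number(n)) + " " + t)
    // <a href="...">text</a> becomes [text](...)
    .replace(/<a href="(.*?)">(.*?)<\/a>/g, "[$2]($1)")
    // inline <code> becomes backtick-quoted code
    .replace(/<code>(.*?)<\/code>/g, "`$1`")
    // paragraphs become plain text followed by a newline
    .replace(/<p>(.*?)<\/p>/g, "$1\n");
}
```

A production converter works on a parsed DOM rather than regexes, but the mapping of elements to Markdown constructs is the same idea.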
Structure Preservation
Maintains original folder structure from URLs. /docs/api/ becomes docs/api.md with meaningful filenames based on content.
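The URL-to-file mapping described above can be sketched like this (illustrative only; `urlToPath` is a hypothetical name, and it assumes the hostname is dropped and trailing slashes are ignored):

```typescript
// Map a page URL to an output Markdown path:
// path segments become directories, the last segment becomes a .md file.
function urlToPath(url: string): string {
  const { pathname } = new URL(url);
  const segments = pathname.split("/").filter(Boolean);
  if (segments.length === 0) return "index.md"; // site root
  return segments.join("/") + ".md";
}
```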
Flexible Configuration
Configurable delays, concurrency limits, include/exclude patterns, and output formats. Works with any website structure.
Multiple Content Sources
Crawl from websites, Git repositories, and more
Website Crawling
Crawl any website with automatic link discovery and same-domain restriction
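The same-domain restriction boils down to a hostname comparison when links are discovered. A minimal sketch (not inform's code; `sameDomain` is a hypothetical helper) resolves each link, including relative ones, against the start URL before comparing:

```typescript
// A discovered link is only queued when it resolves to the start URL's host.
function sameDomain(startUrl: string, link: string): boolean {
  return new URL(link, startUrl).hostname === new URL(startUrl).hostname;
}
```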
Git Repository Downloads
Download specific directories from GitHub repositories without cloning the entire repo
Pattern-Based Filtering
Use glob patterns to include or exclude specific content during crawling
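Glob filtering is usually implemented by compiling each pattern to a regular expression. The sketch below supports only `*` (within one path segment) and `**` (across segments); real glob engines, including whatever inform uses internally, handle more syntax:

```typescript
// Minimal glob matcher: `*` matches within a segment, `**` across segments.
function globMatch(pattern: string, path: string): boolean {
  const re = pattern
    .replace(/[.+^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters
    .replace(/\*\*/g, "\u0000")            // temporary placeholder for **
    .replace(/\*/g, "[^/]*")               // * stays inside one segment
    .replace(/\u0000/g, ".*");             // ** crosses segment boundaries
  return new RegExp(`^${re}$`).test(path);
}
```

With this, an include pattern like `docs/**/*.md` selects Markdown files anywhere under `docs/` while `*.md` only matches files at the top level.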
Custom Configuration
Fine-tune performance with custom concurrency limits and request delays
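Request delays pair naturally with the backoff strategy mentioned earlier: wait a base interval, then double it on each failed retry. A hedged sketch (hypothetical `withBackoff` helper, not inform's actual retry logic):

```typescript
// Retry a task with exponential backoff: baseDelayMs, 2x, 4x, ...
async function withBackoff<T>(
  task: () => Promise<T>,
  retries = 3,
  baseDelayMs = 500,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (err) {
      if (attempt >= retries) throw err; // give up after `retries` retries
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
}
```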
Installation
Get inform running in seconds
Quick Install
Automatically downloads the right binary for your platform
Bun Package
Use with Bun's superior performance and built-in features
Docker
Containerized for consistent environments and CI/CD
Performance Comparison
Why inform is the fastest choice for web crawling
| Feature | inform | Puppeteer | Scrapy | wget |
| --- | --- | --- | --- | --- |
| Content Extraction | ✓ Smart | ✓ Full | ✓ Custom | ✗ Raw only |
| Markdown Output | ✓ Built-in | ✗ Manual | ✗ Manual | ✗ None |
| Concurrent Processing | ✓ Native | ✓ Heavy | ✓ Yes | ✗ Sequential |
| Memory Usage | ✓ Low | ✗ High | ✓ Medium | ✓ Low |
| Setup Complexity | ✓ Zero | ✗ Complex | ✗ Complex | ✓ Simple |
Integration Workflow
Perfect for documentation pipelines and content management
Crawl Content
Extract content from websites or repositories
Generate Index
Use catalog to create llms.txt files
Build Site
Create static sites with unify
Deploy
Publish to your hosting platform
```shell
# Complete documentation workflow
inform https://docs.example.com --output-dir docs
catalog --input docs --output build --sitemap --base-url https://docs.example.com
unify build --input build --output dist
giv message  # AI-powered commit for the updates
```
Ready to Start Crawling?
Transform web content into clean, usable Markdown