inform - High-Performance Web Content Crawler


Crawl websites, extract main content, and convert to clean Markdown. Powered by Bun for maximum performance with concurrent processing and zero dependencies.

Example crawl:
https://docs.example.com/             12.3KB
https://docs.example.com/api/auth      8.7KB
https://docs.example.com/guides/setup 15.2KB
https://docs.example.com/tutorials/   crawling...
42 pages crawled · 2.3MB content extracted · 847ms total time · 5 concurrent

Built for Speed and Reliability

Modern web crawling without the complexity

🚀

Powered by Bun

Significantly faster than Node.js with built-in optimizations. Native DOM parsing and zero-dependency HTML processing for maximum performance.

Concurrent Crawling

Process multiple pages simultaneously with configurable concurrency limits. Intelligent rate limiting and backoff strategies.
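Both limits can be tuned per run with the --concurrency and --delay flags shown in the configuration examples below; the values here are illustrative, not defaults:

# Crawl with 5 concurrent requests and a 250ms delay between them (illustrative values)
inform https://docs.example.com --concurrency 5 --delay 250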

🎯

Smart Content Extraction

Intelligently identifies main content by removing navigation, ads, and other non-content elements. Preserves structure and formatting.

📝

Clean Markdown Output

Converts HTML to properly formatted Markdown. Code examples become code blocks, and heading hierarchy and link structure are preserved.

🗂️

Structure Preservation

Maintains original folder structure from URLs. /docs/api/ becomes docs/api.md with meaningful filenames based on content.
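As a rough sketch of the resulting layout (the exact filenames depend on the pages crawled; the tree in the comments is illustrative, not documented output):

# Crawl a docs site into a local directory
inform https://docs.example.com --output-dir docs
# Illustrative result, mirroring the URL structure:
#   docs/index.md
#   docs/api/auth.md
#   docs/guides/setup.md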

🔧

Flexible Configuration

Configurable delays, concurrency limits, include/exclude patterns, and output formats. Works with any website structure.
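A combined invocation, stitching together the flags shown elsewhere on this page (the particular combination is illustrative):

inform https://docs.example.com \
  --include "*.md" --exclude "temp/*" \
  --concurrency 10 --delay 100 \
  --output-dir docs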

Multiple Content Sources

Crawl from websites, Git repositories, and more

🌐

Website Crawling

inform https://docs.example.com

Crawl any website with automatic link discovery and same-domain restriction

📂

Git Repository Downloads

inform github.com/owner/repo/tree/main/docs

Download specific directories from GitHub repositories without cloning the entire repo

🎯

Pattern-Based Filtering

inform site.com --include "*.md" --exclude "temp/*"

Use glob patterns to include or exclude specific content during crawling

🔧

Custom Configuration

inform site.com --concurrency 10 --delay 100

Fine-tune performance with custom concurrency limits and request delays

Installation

Get inform running in seconds

Quick Install

# One-line install script
curl -fsSL https://raw.githubusercontent.com/fwdslsh/inform/main/install.sh | sh

# Start crawling immediately
inform https://docs.example.com

Automatically downloads the right binary for your platform

📦

Bun Package

# Install globally with Bun
bun install -g @fwdslsh/inform

# Or add to project
bun add @fwdslsh/inform

Takes advantage of Bun's performance and built-in features

🐳

Docker

# Pull latest image
docker pull fwdslsh/inform:latest

# Run crawler
docker run --rm fwdslsh/inform https://docs.example.com

Containerized for consistent environments and CI/CD
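Note that the container writes output to its own filesystem, so a bind mount is typically needed to keep the crawled files on the host; the /output path below is an assumption, not a documented default:

# Mount a host directory so the crawled Markdown persists (the /output path is an assumption)
docker run --rm -v "$PWD/docs:/output" fwdslsh/inform https://docs.example.com --output-dir /output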

Performance Comparison

Why inform is the fastest choice for web crawling

3x faster than Node.js crawlers
0 external dependencies
95% content extraction accuracy
5MB binary size
Feature                 inform       Puppeteer    Scrapy       wget
Content Extraction      ✓ Smart      ✓ Full       ✓ Custom     ✗ Raw Only
Markdown Output         ✓ Built-in   ✗ Manual     ✗ Manual     ✗ None
Concurrent Processing   ✓ Native     ✓ Heavy      ✓ Yes        ✗ Sequential
Memory Usage            ✓ Low        ✗ High       ✓ Medium     ✓ Low
Setup Complexity        ✓ Zero       ✗ Complex    ✗ Complex    ✓ Simple

Integration Workflow

Perfect for documentation pipelines and content management

🌐

Crawl Content

Extract content from websites or repositories

📋

Generate Index

Use catalog to create llms.txt files

🏗️

Build Site

Create static sites with unify

🚀

Deploy

Publish to your hosting platform

# Complete documentation workflow
inform https://docs.example.com --output-dir docs
catalog --input docs --output build --sitemap --base-url https://docs.example.com
unify build --input build --output dist
giv message # AI-powered commit message for the updates

Ready to Start Crawling?

Transform web content into clean, usable Markdown