Getting Started with inform - High-Performance Web Content Crawler

From installation to your first content extraction in minutes
Learn how to extract and convert web content to documentation-ready Markdown with inform, the powerful web crawler designed for content teams and developers.

Installation Options

Choose the installation method that works best for your environment

One-Line Install Script

# Automatically detects your platform and installs the binary
curl -fsSL https://raw.githubusercontent.com/fwdslsh/inform/main/install.sh | sh

# Verify installation
inform --version

# Start crawling immediately
inform https://docs.example.com

The install script automatically detects your platform (Linux, macOS, Windows) and downloads the appropriate binary. No dependencies required!

📦 Install with Bun

# Install globally with Bun
bun install -g @fwdslsh/inform

# Or add to project
bun add @fwdslsh/inform

# Run inform
inform --help

Takes advantage of Bun's fast runtime and built-in package management. Requires Bun v1.0.0 or higher.

🐳 Docker Container

# Pull latest image
docker pull fwdslsh/inform:latest

# Run crawler with mounted output directory
docker run --rm -v "$(pwd)/output:/output" fwdslsh/inform https://docs.example.com --output-dir /output

# Or use interactive shell
docker run -it fwdslsh/inform bash

Containerized for consistent environments and CI/CD pipelines. Perfect for automated content extraction workflows.
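
For CI/CD use, the same invocation drops into a small script; a minimal sketch (the URL, page limit, and output path are placeholders, not required values):

#!/usr/bin/env sh
set -eu

# Pull a known-good image and extract into the job workspace
docker pull fwdslsh/inform:latest
docker run --rm -v "$(pwd)/output:/output" \
  fwdslsh/inform https://docs.example.com \
  --output-dir /output --max-pages 50

# Later pipeline steps can commit or publish the Markdown in ./output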

📥 Manual Download

# Download from GitHub Releases
# Visit: https://github.com/fwdslsh/inform/releases

# Linux
wget https://github.com/fwdslsh/inform/releases/latest/download/inform-linux
chmod +x inform-linux
./inform-linux --help

# macOS  
curl -L -o inform-mac https://github.com/fwdslsh/inform/releases/latest/download/inform-mac
chmod +x inform-mac
./inform-mac --help

# Windows
# Download inform-win.exe and run from command prompt

Download pre-built binaries directly. Each release includes binaries for Linux, macOS, and Windows.
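
On Linux or macOS, a common follow-up is to put the binary on your PATH so it runs as plain inform (the destination below is a convention, not a requirement):

# Make the download executable and install it system-wide
chmod +x inform-linux
sudo mv inform-linux /usr/local/bin/inform
inform --version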

Your First Content Extraction

Learn the basics with hands-on examples

🎯 Single Page Extraction

Extract one page to see how inform works

# Extract a single page
inform https://docs.example.com/getting-started

# Output: Creates getting-started.md in current directory

Result: A clean Markdown file containing only the main content, with navigation and ads removed.
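
A quick sanity check is to preview the top of the generated file (assuming the default output name shown above):

# Show the first lines of the extracted Markdown
head -n 20 getting-started.md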

🌐 Full Site Extraction

Crawl an entire documentation site with structure preservation

# Extract up to 50 pages from a documentation site
inform https://docs.example.com \
  --output-dir extracted-docs \
  --max-pages 50 \
  --delay 1000

Result: Complete site structure with organized folders and properly formatted Markdown files.
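
To confirm the preserved structure, list what was written with ordinary shell tools (nothing inform-specific):

# List every Markdown file in the output tree
find extracted-docs -name "*.md" | sort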

🚀 Performance Optimized

High-speed extraction with concurrent processing

# High-performance extraction
inform https://large-site.com \
  --concurrency 5 \
  --delay 500 \
  --max-pages 200

Result: Fast parallel processing that still respects server limits through the configured request delay.
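
If you are tuning --concurrency and --delay, the standard time utility gives a rough benchmark for comparing settings (flags here are just the example values from above):

# Measure wall-clock time for the crawl
time inform https://large-site.com --concurrency 5 --delay 500 --max-pages 200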

🎛️ Content Filtering

Extract only specific content with pattern matching

# Extract only documentation pages
inform https://mixed-site.com \
  --include "*/docs/*" \
  --exclude "*/blog/*" \
  --output-dir docs-only

Result: Precisely filtered content matching your inclusion and exclusion patterns.
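
You can verify the filter afterwards with plain shell tooling; this check is independent of inform itself:

# An empty result means the exclude pattern worked
find docs-only -path "*blog*" -print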

Understanding Output

How inform structures and formats extracted content

📁 File Structure Preservation

Inform maintains the original site's URL structure for easy navigation:

Original URLs

https://docs.example.com/
https://docs.example.com/guide/setup
https://docs.example.com/api/auth
https://docs.example.com/tutorials/basics

Generated Files

extracted-docs/
├── index.md
├── guide/
│   └── setup.md
├── api/
│   └── auth.md
└── tutorials/
    └── basics.md

📝 Markdown Format

Each extracted file includes metadata and clean formatting:

---
title: "Getting Started Guide"
url: "https://docs.example.com/getting-started"
extracted_at: "2024-01-15T10:30:00Z"
---

# Getting Started Guide

Clean content with proper formatting, links, and images preserved.

## Features and Benefits

- Lists are properly formatted
- **Bold** and *italic* text preserved
- [Links](https://example.com) work correctly
- Images ![alt text](image-url) are included

> Blockquotes and code blocks are maintained

```javascript
// Code examples become proper code blocks
function example() {
  return "formatted correctly";
}
```
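
Because every file begins with this predictable frontmatter block, simple shell tooling can read the metadata; a minimal sketch, assuming the layout above and a hypothetical file path:

# Print the source URL recorded in a file's frontmatter
sed -n 's/^url: *"\(.*\)"/\1/p' extracted-docs/getting-started.md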

Common Usage Patterns

Real-world workflows for different use cases

📚 Documentation Migration (5 minutes)

Moving from an old documentation platform to a new one

# Step 1: Extract all content
inform https://old-docs.company.com \
  --output-dir migrated-content \
  --max-pages 200 \
  --delay 2000

# Step 2: Review and organize
ls migrated-content/
# Edit files as needed, organize structure

# Step 3: Import to new platform
# Use with unify for static site generation
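
The three steps collapse naturally into a rerunnable script; a sketch using the same placeholder URL and flags as above:

#!/usr/bin/env sh
set -eu

# Step 1: extract the legacy docs with gentle rate limiting
inform https://old-docs.company.com \
  --output-dir migrated-content \
  --max-pages 200 \
  --delay 2000

# Step 2: write an inventory to review before importing
find migrated-content -name "*.md" | sort > migration-inventory.txt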

🔍 Content Research (3 minutes)

Analyze competitor documentation and content strategies

# Extract competitor docs for analysis
inform https://competitor.com/docs \
  --output-dir research/competitor-analysis \
  --max-pages 30 \
  --delay 3000

# Extract a second copy into a separate folder for your summary notes
inform https://competitor.com/docs \
  --output-dir research/summaries

🗃️ Content Backup (2 minutes)

Preserve important web content for archival purposes

# Create complete backup with date
inform https://important-site.com \
  --output-dir "backups/important-site-$(date +%Y-%m-%d)" \
  --max-pages 100 \
  --delay 1500

# Include images and preserve structure  
inform https://site.com \
  --output-dir backup \
  --max-pages 50
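
For recurring backups, the dated command drops straight into cron (standard crontab syntax; the schedule and URL are examples, and note that % must be escaped in crontab entries):

# Weekly backup, Mondays at 02:00 (add with: crontab -e)
0 2 * * 1 inform https://important-site.com --output-dir "backups/important-site-$(date +\%Y-\%m-\%d)" --max-pages 100 --delay 1500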

Integration with fwdslsh Ecosystem

Combine inform with other tools for powerful workflows

inform

Crawl and extract content

inform https://docs.site.com --output-dir content

catalog

Generate llms.txt indexes

catalog --input content --output indexed

unify

Build static sites

unify build --input indexed --output dist

giv

Generate commit messages

giv message

🔄 Complete Documentation Pipeline

# Extract content from multiple sources
inform https://docs.example.com --output-dir docs
inform https://api.example.com --output-dir api --include "*/reference/*"

# Generate AI-ready indexes  
catalog --input docs --input api --output build --sitemap --base-url https://newdocs.com

# Build beautiful static site
unify build --input build --output dist

# Professional commit with AI
giv message
# "docs: migrate complete documentation with API references"

What's Next?

Continue your inform journey