Getting Started with inform - High-Performance Web Content Crawler

From installation to your first content extraction in minutes
Learn how to extract and convert web content to documentation-ready Markdown with inform, the powerful web crawler designed for content teams and developers.

Installation Options

Choose the installation method that works best for your environment

One-Line Install Script

# Automatically detects your platform and installs the binary
curl -fsSL https://raw.githubusercontent.com/fwdslsh/inform/main/install.sh | sh

# Verify installation
inform --version

# Start crawling immediately
inform https://docs.example.com

The install script automatically detects your platform (Linux, macOS, Windows) and downloads the appropriate binary. No dependencies required!

📦 Install with Bun

# Install globally with Bun
bun install -g @fwdslsh/inform

# Or add to project
bun add @fwdslsh/inform

# Run inform
inform --help

Takes advantage of Bun's fast runtime and built-in package management. Requires Bun v1.0.0 or higher.

🐳 Docker Container

# Pull latest image
docker pull fwdslsh/inform:latest

# Run crawler with mounted output directory
docker run --rm -v "$(pwd)/output:/output" fwdslsh/inform https://docs.example.com --output-dir /output

# Or use interactive shell
docker run -it fwdslsh/inform bash

Containerized for consistent environments and CI/CD pipelines. Perfect for automated content extraction workflows.
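
For CI/CD use, the same invocation drops into a small script; a minimal sketch (the URL, page limit, and output path are placeholders, not required values):

#!/usr/bin/env sh
set -eu

# Pull a known-good image and extract into the job workspace
docker pull fwdslsh/inform:latest
docker run --rm -v "$(pwd)/output:/output" \
  fwdslsh/inform https://docs.example.com \
  --output-dir /output --max-pages 50

# Later pipeline steps can commit or publish the Markdown in ./output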

📥 Manual Download

# Download from GitHub Releases
# Visit: https://github.com/fwdslsh/inform/releases

# Linux
wget https://github.com/fwdslsh/inform/releases/latest/download/inform-linux
chmod +x inform-linux
./inform-linux --help

# macOS  
curl -L -o inform-mac https://github.com/fwdslsh/inform/releases/latest/download/inform-mac
chmod +x inform-mac
./inform-mac --help

# Windows
# Download inform-win.exe and run from command prompt

Download pre-built binaries directly. Each release includes binaries for Linux, macOS, and Windows.
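
On Linux or macOS, a common follow-up is to put the binary on your PATH so it runs as plain inform (the destination below is a convention, not a requirement):

# Make the download executable and install it system-wide
chmod +x inform-linux
sudo mv inform-linux /usr/local/bin/inform
inform --version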

Your First Content Extraction

Learn the basics with hands-on examples

🎯 Single Page Extraction

Extract one page to see how inform works

# Extract a single page
inform https://docs.example.com/getting-started

# Output: Creates getting-started.md in current directory

Result: A clean Markdown file containing only the main content, with navigation and ads removed.
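
A quick sanity check is to preview the top of the generated file (assuming the default output name shown above):

# Show the first lines of the extracted Markdown
head -n 20 getting-started.md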

🌐 Full Site Extraction

Crawl an entire documentation site with structure preservation

# Extract up to 50 pages from a documentation site
inform https://docs.example.com \
  --output-dir extracted-docs \
  --max-pages 50 \
  --delay 1000

Result: Complete site structure with organized folders and properly formatted Markdown files.
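
To confirm the preserved structure, list what was written with ordinary shell tools (nothing inform-specific):

# List every Markdown file in the output tree
find extracted-docs -name "*.md" | sort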

🚀 Performance Optimized

High-speed extraction with concurrent processing

# High-performance extraction
inform https://large-site.com \
  --concurrency 5 \
  --delay 500 \
  --max-pages 200

Result: Fast parallel processing that still respects server limits through the configured request delay.
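
If you are tuning --concurrency and --delay, the standard time utility gives a rough benchmark for comparing settings (flags here are just the example values from above):

# Measure wall-clock time for the crawl
time inform https://large-site.com --concurrency 5 --delay 500 --max-pages 200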

🎛️ Content Filtering

Extract only specific content with pattern matching

# Extract only documentation pages
inform https://mixed-site.com \
  --include "*/docs/*" \
  --exclude "*/blog/*" \
  --output-dir docs-only

Result: Precisely filtered content matching your inclusion and exclusion patterns.
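
You can verify the filter afterwards with plain shell tooling; this check is independent of inform itself:

# An empty result means the exclude pattern worked
find docs-only -path "*blog*" -print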

Understanding Output

How inform structures and formats extracted content

📁 File Structure Preservation

Inform maintains the original site's URL structure for easy navigation:

Original URLs

https://docs.example.com/
https://docs.example.com/guide/setup
https://docs.example.com/api/auth
https://docs.example.com/tutorials/basics

Generated Files

extracted-docs/
├── index.md
├── guide/
│   └── setup.md
├── api/
│   └── auth.md
└── tutorials/
    └── basics.md

📝 Markdown Format

Each extracted file includes metadata and clean formatting:

---
title: "Getting Started Guide"
url: "https://docs.example.com/getting-started"
extracted_at: "2024-01-15T10:30:00Z"
---

# Getting Started Guide

Clean content with proper formatting, links, and images preserved.

## Features and Benefits

- Lists are properly formatted
- **Bold** and *italic* text preserved
- [Links](https://example.com) work correctly
- Images ![alt text](image-url) are included

> Blockquotes and code blocks are maintained

```javascript
// Code examples become proper code blocks
function example() {
  return "formatted correctly";
}
```
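
Because every file begins with this predictable frontmatter block, simple shell tooling can read the metadata; a minimal sketch, assuming the layout above and a hypothetical file path:

# Print the source URL recorded in a file's frontmatter
sed -n 's/^url: *"\(.*\)"/\1/p' extracted-docs/getting-started.md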

Common Usage Patterns

Real-world workflows for different use cases

📚 Documentation Migration (5 minutes)

Moving from an old documentation platform to a new one

# Step 1: Extract all content
inform https://old-docs.company.com \
  --output-dir migrated-content \
  --max-pages 200 \
  --delay 2000

# Step 2: Review and organize
ls migrated-content/
# Edit files as needed, organize structure

# Step 3: Import to new platform
# Use with unify for static site generation
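
The three steps collapse naturally into a rerunnable script; a sketch using the same placeholder URL and flags as above:

#!/usr/bin/env sh
set -eu

# Step 1: extract the legacy docs with gentle rate limiting
inform https://old-docs.company.com \
  --output-dir migrated-content \
  --max-pages 200 \
  --delay 2000

# Step 2: write an inventory to review before importing
find migrated-content -name "*.md" | sort > migration-inventory.txt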

🔍 Content Research (3 minutes)

Analyze competitor documentation and content strategies

# Extract competitor docs for analysis
inform https://competitor.com/docs \
  --output-dir research/competitor-analysis \
  --max-pages 30 \
  --delay 3000

# Extract a second copy into a separate folder for your summary notes
inform https://competitor.com/docs \
  --output-dir research/summaries

🗃️ Content Backup (2 minutes)

Preserve important web content for archival purposes

# Create complete backup with date
inform https://important-site.com \
  --output-dir "backups/important-site-$(date +%Y-%m-%d)" \
  --max-pages 100 \
  --delay 1500

# Include images and preserve structure  
inform https://site.com \
  --output-dir backup \
  --max-pages 50
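
For recurring backups, the dated command drops straight into cron (standard crontab syntax; the schedule and URL are examples, and note that % must be escaped in crontab entries):

# Weekly backup, Mondays at 02:00 (add with: crontab -e)
0 2 * * 1 inform https://important-site.com --output-dir "backups/important-site-$(date +\%Y-\%m-\%d)" --max-pages 100 --delay 1500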

Integration with fwdslsh Ecosystem

Combine inform with other tools for powerful workflows

inform

Crawl and extract content

inform https://docs.site.com --output-dir content

catalog

Generate llms.txt indexes

catalog --input content --output indexed

unify

Build static sites

unify build --input indexed --output dist

giv

Generate commit messages

giv message

🔄 Complete Documentation Pipeline

# Extract content from multiple sources
inform https://docs.example.com --output-dir docs
inform https://api.example.com --output-dir api --include "*/reference/*"

# Generate AI-ready indexes  
catalog --input docs --input api --output build --sitemap --base-url https://newdocs.com

# Build beautiful static site
unify build --input build --output dist

# Professional commit with AI
giv message
# "docs: migrate complete documentation with API references"

What's Next?

Continue your inform journey