inform Examples
Documentation Migration Patterns
Proven workflows for migrating documentation between platforms
Legacy Platform Migration
Intermediate: Complete workflow for migrating documentation from an old platform to a modern static site generator.
Step 1: Analysis and Planning
# Analyze the current site structure
inform https://old-docs.company.com \
--max-pages 5 \
--verbose \
--output-dir analysis
# Review extracted content structure
ls analysis/
head -20 analysis/index.md
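Before committing to a full crawl, a quick summary of the sample output helps confirm the structure is what you expect; a small sketch, assuming GNU find and grep are available:
# Count extracted pages and list the most common top-level headings
find analysis -name "*.md" | wc -l
grep -rh --include='*.md' '^# ' analysis | sort | uniq -c | sort -rn | head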
Step 2: Full Content Extraction
# Extract all documentation with conservative settings
inform https://old-docs.company.com \
--output-dir migrated-docs \
--max-pages 200 \
--delay 2000 \
--concurrency 2 \
--include "*/docs/*" \
--include "*/guide/*" \
--exclude "*/internal/*" \
--exclude "*/admin/*"
Step 3: Content Organization
# Organize content for new platform
mkdir -p new-docs-site/content/{guides,api,tutorials}
# Move content to appropriate sections
mv migrated-docs/guide/* new-docs-site/content/guides/
mv migrated-docs/api/* new-docs-site/content/api/
mv migrated-docs/tutorial/* new-docs-site/content/tutorials/
# Generate index files
catalog --input new-docs-site/content \
--output new-docs-site/indexed \
--sitemap \
--base-url https://docs.company.com
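Before building, a couple of spot checks on the organized content can catch problems early; a minimal sketch (paths follow the layout above, and the legacy domain is the one being migrated):
# Files still missing YAML front matter
grep -rL --include='*.md' '^---' new-docs-site/content | head
# Links that still point at the legacy host
grep -rn --include='*.md' 'old-docs.company.com' new-docs-site/content | head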
Step 4: Build and Deploy
# Build with unify
cd new-docs-site
unify build --input indexed --output dist
# Generate professional commit message
giv message
# "docs: migrate legacy documentation to modern platform"
# Deploy
git add . && git commit -m "$(giv message)"
git push origin main
Expected Results
- Organized content structure preserving original hierarchy
- Clean Markdown files with proper metadata
- Generated sitemap and navigation structure
- Professional deployment with AI-generated commit messages
Multi-Source Consolidation
Advanced: Combine documentation from multiple sources into a unified knowledge base.
Extract from Multiple Sources
# Extract from primary documentation
inform https://docs.mainproduct.com \
--output-dir sources/main-docs \
--max-pages 150 \
--delay 1000
# Extract API documentation
inform https://api.mainproduct.com/docs \
--output-dir sources/api-docs \
--max-pages 50 \
--include "*/reference/*" \
--include "*/endpoints/*"
# Extract community tutorials
inform https://community.mainproduct.com \
--output-dir sources/community \
--max-pages 100 \
--include "*/tutorial/*" \
--include "*/howto/*"
# Extract GitHub repository docs
inform https://github.com/company/product/tree/main/docs \
--output-dir sources/github-docs \
--include "*.md"
Consolidate and Structure
# Create unified structure
mkdir -p consolidated/{core,api,tutorials,community}
# Organize by content type
cp -r sources/main-docs/* consolidated/core/
cp -r sources/api-docs/* consolidated/api/
cp -r sources/community/* consolidated/community/
cp -r sources/github-docs/* consolidated/core/
# Generate comprehensive index
catalog --input consolidated \
--output unified-knowledge-base \
--index \
--sitemap \
--validate \
--base-url https://knowledge.company.com
Create Cross-References
# Generate relationship mapping
find consolidated -name "*.md" | while read -r file; do
echo "Processing: $file"
# Add cross-reference metadata here
# (Custom script to identify related content; see the sketch below)
done
# Build final knowledge base
unify build --input unified-knowledge-base --output kb-site
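The cross-reference step above is left as a placeholder; a minimal sketch of what such a script could do, appending a "Related" list of sibling documents to each file (purely illustrative, not a shipped tool):
find consolidated -name "*.md" | while read -r file; do
  dir=$(dirname "$file")
  # Pick up to five sibling documents as related reading
  related=$(find "$dir" -maxdepth 1 -name "*.md" ! -path "$file" | head -5)
  if [ -n "$related" ]; then
    {
      echo ""
      echo "## Related"
      echo "$related" | sed 's|^|- |'
    } >> "$file"
  fi
done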
Benefits
- Comprehensive documentation coverage
- Consistent formatting across sources
- Unified search and navigation
- Automated cross-referencing
Content Research Patterns
Intelligence gathering and competitive analysis workflows
Competitive Analysis
Intermediate: Multi-Competitor Intelligence
#!/bin/bash
# competitive-research.sh
competitors=(
"https://competitor1.com/docs"
"https://competitor2.com/help"
"https://competitor3.com/guides"
"https://competitor4.com/api"
)
# Extract from each competitor
for url in "${competitors[@]}"; do
domain=$(echo "$url" | sed 's/https:\/\///' | sed 's/\/.*$//')
echo "Analyzing $domain..."
inform "$url" \
--output-dir "research/$domain" \
--max-pages 30 \
--delay 2000 \
--include "*/docs/*" \
--include "*/guide/*" \
--include "*/api/*"
echo "Completed $domain analysis"
sleep 60 # Respectful delay between competitors
done
# Generate comparative analysis
catalog --input research \
--output competitive-analysis \
--index \
--validate
echo "Competitive research complete!"
echo "Review results in: competitive-analysis/"
Analysis and Insights
# Generate content comparison
find research -name "*.md" | head -20 | xargs wc -w > word-counts.txt
# Create feature comparison matrix
# (Custom analysis script)
python analyze-features.py research/ > feature-matrix.csv
# Generate summary report
echo "# Competitive Analysis Report" > analysis-report.md
echo "" >> analysis-report.md
echo "## Content Volume Analysis" >> analysis-report.md
cat word-counts.txt >> analysis-report.md
echo "" >> analysis-report.md
echo "## Feature Coverage Matrix" >> analysis-report.md
cat feature-matrix.csv >> analysis-report.md
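If analyze-features.py does not exist yet, a rough keyword-presence matrix can stand in for it; a sketch using plain grep (the feature keywords and the keyword-matrix.csv filename are illustrative):
features="sso webhooks audit-log sandbox sdk"
echo "competitor,$(echo "$features" | tr ' ' ',')" > keyword-matrix.csv
for dir in research/*/; do
  row=$(basename "$dir")
  for f in $features; do
    if grep -riq "$f" "$dir"; then row="$row,yes"; else row="$row,no"; fi
  done
  echo "$row" >> keyword-matrix.csv
done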
Research Insights
- Content structure and organization comparison
- Feature coverage and documentation depth
- User experience and information architecture
- Messaging and positioning analysis
Industry Knowledge Mining
Advanced: Domain-Specific Content Extraction
# Industry-specific sources
industry_sources=(
"https://techblog.company1.com"
"https://engineering.company2.com"
"https://blog.company3.com"
"https://medium.com/@industry-expert"
)
# Extract industry insights
for source in "${industry_sources[@]}"; do
domain=$(echo "$source" | sed 's/https:\/\///' | sed 's/\/.*$//')
inform "$source" \
--output-dir "industry-insights/$domain" \
--max-pages 50 \
--delay 1500 \
--include "*technical*" \
--include "*engineering*" \
--include "*architecture*" \
--exclude "*job*" \
--exclude "*hiring*"
done
# Process and categorize content
catalog --input industry-insights \
--output processed-insights \
--optional "drafts/**/*" \
--validate
Content Processing and Analysis
# Generate topic clustering
# (Requires additional analysis tools)
python cluster-topics.py processed-insights/ > topic-clusters.json
# Extract technical patterns
grep -r "architecture\|design pattern\|best practice" processed-insights/ > technical-patterns.txt
# Create trending analysis
python analyze-trends.py processed-insights/ > trend-analysis.json
# Generate final report
python generate-industry-report.py \
--topics topic-clusters.json \
--patterns technical-patterns.txt \
--trends trend-analysis.json \
--output industry-knowledge-report.md
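Until the analysis scripts above exist, a rough trend signal can be pulled with grep alone; a sketch counting how many documents mention a few illustrative topics:
for topic in kubernetes serverless observability "event driven" wasm; do
  count=$(grep -rli "$topic" processed-insights/ | wc -l)
  echo "$topic: $count documents"
done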
Integration Workflows
Combining inform with other fwdslsh tools for powerful pipelines
🔄 Complete Documentation Pipeline
End-to-end workflow from web crawling to deployment
Content Extraction
# Extract from multiple documentation sources
inform https://old-docs.example.com \
--output-dir raw-content \
--max-pages 100 \
--delay 1000
inform https://api-docs.example.com \
--output-dir raw-content/api \
--max-pages 50 \
--include "*/reference/*"
Content Indexing
# Generate structured indexes and navigation
catalog --input raw-content \
--output structured-content \
--sitemap \
--index \
--base-url https://new-docs.example.com \
--validate
Site Generation
# Build modern static site with navigation
unify build \
--input structured-content \
--output production-site \
--optimize
Version Control
# Professional commit with AI assistance
git add .
giv message
# "docs: migrate and modernize documentation platform"
git push origin main
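The four stages above can be chained into one script so the pipeline is repeatable end to end; a minimal sketch using the same commands, with set -euo pipefail so a failed stage stops the run:
#!/bin/bash
set -euo pipefail
inform https://old-docs.example.com --output-dir raw-content --max-pages 100 --delay 1000
catalog --input raw-content --output structured-content --sitemap --index \
  --base-url https://new-docs.example.com --validate
unify build --input structured-content --output production-site --optimize
git add . && git commit -m "$(giv message)" && git push origin main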
Pipeline Benefits
- Automated content migration and structuring
- SEO-optimized site generation with navigation
- Professional version control and documentation
- Repeatable and scalable process
🤖 AI-Ready Content Pipeline
Prepare content for AI training and RAG applications
High-Quality Content Extraction
# Extract comprehensive, high-quality content
inform https://comprehensive-docs.example.com \
--output-dir ai-training-content \
--max-pages 500 \
--delay 800 \
--include "*/docs/*" \
--include "*/guide/*" \
--include "*/tutorial/*" \
--include "*/reference/*" \
--exclude "*/blog/*" \
--exclude "*/news/*"
Content Structuring and Validation
# Generate AI-optimized indexes with validation
catalog --input ai-training-content \
--output ai-ready-content \
--validate \
--optional "examples/**/*" \
--optional "advanced/**/*"
# Quality check the content
find ai-ready-content -name "*.md" | xargs wc -w | tail -1
echo "Content validation complete"
AI Integration Preparation
# Create training data sets
mkdir -p ai-datasets/{training,validation,context}
# Split content for different AI purposes
cp ai-ready-content/llms.txt ai-datasets/context/
cp ai-ready-content/llms-full.txt ai-datasets/training/
cp ai-ready-content/llms-ctx.txt ai-datasets/validation/
# Generate metadata for AI systems
echo "AI-ready content prepared with structured indexes"
ls -la ai-datasets/*/
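A rough size check helps confirm each dataset fits the intended context window; a sketch using word count as a proxy (roughly 3 words per 4 tokens is a common rule of thumb, not an exact measure):
for f in ai-datasets/*/*.txt; do
  words=$(wc -w < "$f")
  echo "$f: $words words (~$((words * 4 / 3)) tokens)"
done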
AI-Optimized Features
- Clean, structured content without noise
- Consistent formatting for AI processing
- Hierarchical organization for context
- Quality validation and completeness checks
Specialized Extraction Patterns
Advanced techniques for specific content types and platforms
GitHub Repository Documentation
Beginner: Repository Documentation Extraction
# Extract documentation from GitHub repositories
inform https://github.com/facebook/react/tree/main/docs \
--output-dir react-docs \
--include "*.md" \
--include "*.mdx"
# Extract from multiple open source projects
projects=(
"https://github.com/vuejs/vue/tree/dev/docs"
"https://github.com/angular/angular/tree/main/docs"
"https://github.com/sveltejs/svelte/tree/master/site/content/docs"
)
for project in "${projects[@]}"; do
project_name=$(echo "$project" | sed 's/.*github.com\/\([^\/]*\)\/\([^\/]*\)\/.*/\1-\2/')
inform "$project" \
--output-dir "oss-docs/$project_name" \
--include "*.md" \
--include "*.mdx"
done
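For public repositories, cloning directly is often faster and gentler than crawling GitHub's HTML; a sparse-checkout sketch that fetches only the docs directory (requires git 2.25 or newer):
git clone --depth 1 --filter=blob:none --sparse https://github.com/facebook/react.git
cd react
git sparse-checkout set docs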
Documentation Comparison
# Generate comparative documentation analysis
catalog --input oss-docs \
--output framework-comparison \
--index \
--sitemap \
--base-url https://framework-docs-comparison.dev
# Build comparison site
unify build \
--input framework-comparison \
--output comparison-site
E-commerce Content Mining
Advanced: Product Information Extraction
# Extract product documentation and guides
inform https://help.shopify.com \
--output-dir ecommerce-content/shopify \
--max-pages 200 \
--include "*/manual/*" \
--include "*/themes/*" \
--exclude "*/billing/*"
inform https://docs.woocommerce.com \
--output-dir ecommerce-content/woocommerce \
--max-pages 150 \
--include "*/document/*" \
--include "*/tutorial/*"
# Extract best practices and guides
inform https://ecommerce-platforms.com \
--output-dir ecommerce-content/best-practices \
--max-pages 100 \
--include "*/guide/*" \
--include "*/best-practice/*"
Content Organization
# Organize by platform and topic
mkdir -p ecommerce-knowledge/{guides,tutorials} ecommerce-knowledge/platforms/{shopify,woocommerce}
# Categorize content
cp -r ecommerce-content/shopify/* ecommerce-knowledge/platforms/shopify/
cp -r ecommerce-content/woocommerce/* ecommerce-knowledge/platforms/woocommerce/
cp -r ecommerce-content/best-practices/* ecommerce-knowledge/guides/
# Generate comprehensive e-commerce knowledge base
catalog --input ecommerce-knowledge \
--output ecommerce-kb \
--index \
--sitemap \
--validate \
--base-url https://ecommerce-knowledge.dev
Technical Blog Aggregation
Intermediate: Engineering Blog Extraction
# Extract from major tech company blogs
tech_blogs=(
"https://engineering.fb.com"
"https://blog.google/technology"
"https://eng.uber.com"
"https://medium.engineering"
"https://netflixtechblog.com"
)
for blog in "${tech_blogs[@]}"; do
domain=$(echo "$blog" | sed 's/https:\/\///' | sed 's/\/.*$//' | sed 's/\./-/g')
inform "$blog" \
--output-dir "tech-blogs/$domain" \
--max-pages 50 \
--delay 2000 \
--include "*engineering*" \
--include "*technical*" \
--include "*architecture*" \
--exclude "*job*" \
--exclude "*career*"
done
Content Aggregation and Analysis
# Create unified tech blog archive
catalog --input tech-blogs \
--output tech-insights \
--index \
--validate
# Generate trend analysis
find tech-blogs -name "*.md" -exec grep -l "microservices\|kubernetes\|serverless" {} \; > trending-topics.txt
# Create searchable archive
unify build \
--input tech-insights \
--output tech-blog-archive
echo "Tech blog aggregation complete!"
echo "Archive available in: tech-blog-archive/"
Automation Scripts
Ready-to-use scripts for common inform workflows
📅 Scheduled Content Monitoring
Monitor websites for content changes and updates
#!/bin/bash
# content-monitor.sh - Monitor websites for changes
# Configuration
SITES=(
"https://docs.example.com"
"https://api.example.com/docs"
"https://help.example.com"
)
BACKUP_DIR="/backup/content-monitoring"
DATE=$(date +%Y-%m-%d)
# Create daily backup directory
mkdir -p "$BACKUP_DIR/$DATE"
# Monitor each site
for site in "${SITES[@]}"; do
site_name=$(echo "$site" | sed 's/https:\/\///' | sed 's/\/.*$//' | sed 's/\./-/g')
echo "Monitoring: $site"
# Extract current content
inform "$site" \
--output-dir "$BACKUP_DIR/$DATE/$site_name" \
--max-pages 50 \
--delay 1000
# Compare with previous day if exists
previous_date=$(date -d "yesterday" +%Y-%m-%d)
if [ -d "$BACKUP_DIR/$previous_date/$site_name" ]; then
echo "Comparing with previous extraction..."
diff -r "$BACKUP_DIR/$previous_date/$site_name" "$BACKUP_DIR/$DATE/$site_name" > "$BACKUP_DIR/$DATE/$site_name-changes.txt"
if [ -s "$BACKUP_DIR/$DATE/$site_name-changes.txt" ]; then
echo "Changes detected in $site_name!"
# Send notification (email, Slack, etc.)
# notify-send "Content Changes" "Changes detected in $site_name"
fi
fi
done
echo "Content monitoring complete for $DATE"
🔄 Automated Documentation Sync
Keep local documentation in sync with web sources
#!/bin/bash
# doc-sync.sh - Automated documentation synchronization
# Configuration
SOURCE_URL="https://docs.upstream.com"
LOCAL_DOCS="./docs"
BACKUP_DIR="./docs-backup"
SYNC_LOG="./sync.log"
echo "$(date): Starting documentation sync" >> $SYNC_LOG
# Create backup of current docs
if [ -d "$LOCAL_DOCS" ]; then
echo "Creating backup..."
cp -r "$LOCAL_DOCS" "$BACKUP_DIR-$(date +%Y%m%d-%H%M%S)"
fi
# Extract latest documentation
echo "Extracting latest documentation..."
inform "$SOURCE_URL" \
--output-dir "$LOCAL_DOCS-new" \
--max-pages 200 \
--delay 1000 \
--include "*/docs/*" \
--include "*/guide/*"
# Check if extraction was successful
if [ $? -eq 0 ]; then
echo "Extraction successful, updating local docs..."
# Replace old docs with new
rm -rf "$LOCAL_DOCS"
mv "$LOCAL_DOCS-new" "$LOCAL_DOCS"
# Generate index
catalog --input "$LOCAL_DOCS" \
--output "$LOCAL_DOCS-indexed" \
--sitemap \
--index
# Build site
unify build \
--input "$LOCAL_DOCS-indexed" \
--output "./public"
# Commit changes with giv
git add .
commit_message=$(giv message)
git commit -m "$commit_message"
echo "$(date): Sync completed successfully" >> $SYNC_LOG
else
echo "$(date): Sync failed during extraction" >> $SYNC_LOG
exit 1
fi
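To keep the sync hands-off, the script can be scheduled with cron; an example entry running nightly at 02:00 (the working directory is hypothetical):
0 2 * * * cd /srv/docs-mirror && ./doc-sync.sh >> sync-cron.log 2>&1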
Troubleshooting Examples
Solutions for common inform challenges and edge cases
🚫 Handling Rate-Limited Sites
# Conservative crawling for rate-sensitive sites
inform https://rate-limited-site.com \
--delay 5000 \
--concurrency 1 \
--max-pages 25 \
--verbose
# Multi-session approach for large sites
sessions=(
"*/docs/*"
"*/api/*"
"*/guide/*"
)
for session in "${sessions[@]}"; do
echo "Processing session: $session"
inform https://large-site.com \
--include "$session" \
--output-dir "./content/$(echo $session | sed 's/[^a-zA-Z0-9]//g')" \
--delay 3000 \
--max-pages 50
echo "Waiting 10 minutes before next session..."
sleep 600
done
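If a site still rejects requests at these settings, a retry wrapper with exponential backoff is a reasonable fallback; a sketch reusing the conservative flags above (the retry count and delays are illustrative):
attempt=1
max_attempts=5
backoff=60
until inform https://rate-limited-site.com --delay 5000 --concurrency 1 --max-pages 25; do
  if [ "$attempt" -ge "$max_attempts" ]; then
    echo "Giving up after $max_attempts attempts" >&2
    exit 1
  fi
  echo "Attempt $attempt failed; retrying in ${backoff}s..."
  sleep "$backoff"
  backoff=$((backoff * 2))
  attempt=$((attempt + 1))
done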
🔧 Custom Content Detection
# When standard content detection fails
# Test with smaller sample first
inform https://unusual-site.com/sample-page \
--max-pages 1 \
--verbose
# Review extracted content
head -50 crawled-pages/sample-page.md
# Adjust strategy based on results
# (Future: Custom selector support)
inform https://unusual-site.com \
--output-dir custom-extraction \
--max-pages 20 \
--delay 2000
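When extraction quality is still poor, inspecting the raw HTML can show whether the page wraps its main content in unusual markup; a quick check with curl (the element names are just common candidates):
# Which elements wrap the main content?
curl -s https://unusual-site.com/sample-page | grep -oE '<(main|article|section)[^>]*>' | head
# How script-heavy is the page? (lines containing <script>)
curl -s https://unusual-site.com/sample-page | grep -c '<script'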
📦 Large-Scale Content Processing
# Handling very large sites efficiently
#!/bin/bash
# large-scale-crawl.sh
BASE_URL="https://massive-docs.com"
BATCH_SIZE=50
MAX_BATCHES=20
for ((i=1; i<=MAX_BATCHES; i++)); do
echo "Processing batch $i of $MAX_BATCHES"
inform "$BASE_URL" \
--max-pages $BATCH_SIZE \
--output-dir "batches/batch-$i" \
--delay 1000 \
--concurrency 3
# Process batch immediately
catalog --input "batches/batch-$i" \
--output "processed/batch-$i" \
--validate
echo "Batch $i complete. Waiting 2 minutes..."
sleep 120
done
# Combine all batches
echo "Combining all batches..."
mkdir -p final-output
cp -r processed/batch-*/* final-output/
# Generate final index
catalog --input final-output \
--output complete-site \
--sitemap \
--index \
--base-url https://docs.example.com
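Because every batch starts from the same base URL, batches may overlap; a quick duplicate check before publishing (assumes GNU coreutils):
find final-output -name "*.md" -exec md5sum {} + | sort | uniq -w32 -D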