Real-World Examples

Practical workflows for corpus management

Single Source Sync

# Sync a single documentation site
bun run packages/inform/src/index.ts sync official-docs --print-cmds

# Output:
# > Cloning https://github.com/owner/repo...
# > Running: gather https://docs.example.com --output-dir .fwdslsh/corpus/sources/official-docs/content
# > Running: catalog .fwdslsh/corpus/sources/official-docs/content --output .fwdslsh/corpus/sources/official-docs/catalog

Multiple Sources

Sync multiple documentation sources

# corpus.yml
sources:
  - id: main-docs
    type: http
    url: https://docs.example.com
  - id: api-reference
    type: git
    url: https://github.com/owner/api-docs
  - id: internal-guides
    type: local
    path: ./internal-docs

# Sync all sources
bun run packages/inform/src/index.ts sync --all --mirror
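When one source in a multi-source corpus is flaky, it can help to sync the sources one at a time and report which ones failed. The sketch below wraps the per-source `sync <id>` form shown earlier. The names `sync_each` and `INFORM_CMD` are hypothetical helpers introduced for illustration, and the sketch assumes the CLI exits with a non-zero status when a sync fails.

```shell
# Sketch: sync sources individually so one failure does not hide the others.
# INFORM_CMD is a hypothetical override; it defaults to the command used in this guide.
INFORM_CMD=${INFORM_CMD:-"bun run packages/inform/src/index.ts"}

sync_each() {
  _failed=""
  for _id in "$@"; do
    echo "Syncing ${_id}..."
    # INFORM_CMD is left unquoted on purpose: it holds a command plus its arguments.
    # Assumption: the CLI returns a non-zero exit status when a sync fails.
    if ! $INFORM_CMD sync "$_id"; then
      _failed="${_failed} ${_id}"
    fi
  done
  if [ -n "$_failed" ]; then
    echo "Failed sources:${_failed}" >&2
    return 1
  fi
  echo "All sources synced"
}

# Usage (source ids taken from the corpus.yml above):
# sync_each main-docs api-reference internal-guides
```

Because the function returns non-zero when any source fails, it can be used as a CI step without losing the list of failed ids.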

CI/CD Automation

GitHub Actions workflow for corpus management

# .github/workflows/corpus-sync.yml
name: Corpus Sync
on:
  schedule:
    - cron: '0 0 * * *'
  workflow_dispatch:

jobs:
  sync:
    runs-on: ubuntu-latest
    permissions:
      contents: write  # required for the git push in the last step
    steps:
      - uses: actions/checkout@v4
      
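      - name: Install bun
        run: |
          curl -fsSL https://bun.sh/install | bash
          # The installer only updates shell profiles, so add bun to PATH for later steps
          echo "$HOME/.bun/bin" >> "$GITHUB_PATH"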
      
      - name: Sync corpus
        run: |
          bun run packages/inform/src/index.ts sync --all \
            --timeout 600000 \
            --print-cmds
      
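      - name: Commit manifests
        run: |
          git config user.name "Corpus Bot"
          git config user.email "bot@example.com"
          git add .fwdslsh/corpus/corpus-manifest.yml
          # Commit and push only when the manifest changed; committing with nothing staged fails the job
          git diff --cached --quiet || {
            git commit -m "Update corpus manifests"
            git push
          }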

Troubleshooting Scenarios

Source Timeout

# Increase timeout for slow sources
export FWD_CORPUS_TIMEOUT=600000
bun run packages/inform/src/index.ts sync problematic-source
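
If a fixed timeout is still not enough, one option is to retry with a larger value each time. The sketch below doubles `FWD_CORPUS_TIMEOUT` on every attempt. `sync_with_retry` and `INFORM_CMD` are hypothetical names used for illustration, and the sketch assumes the CLI exits non-zero when a sync times out.

```shell
# Sketch: retry a slow source, doubling FWD_CORPUS_TIMEOUT on each attempt.
# INFORM_CMD is a hypothetical override; it defaults to the command used in this guide.
INFORM_CMD=${INFORM_CMD:-"bun run packages/inform/src/index.ts"}

sync_with_retry() {
  _source=$1
  _timeout=${2:-600000}   # starting timeout, same value as the example above
  _attempts=${3:-3}       # maximum number of attempts
  _n=1
  while [ "$_n" -le "$_attempts" ]; do
    echo "Attempt ${_n}/${_attempts} with FWD_CORPUS_TIMEOUT=${_timeout}"
    # The variable is set for this one command only.
    # Assumption: a timed-out sync returns a non-zero exit status.
    if FWD_CORPUS_TIMEOUT=$_timeout $INFORM_CMD sync "$_source"; then
      return 0
    fi
    _timeout=$((_timeout * 2))
    _n=$((_n + 1))
  done
  echo "Giving up on ${_source} after ${_attempts} attempts" >&2
  return 1
}

# Usage:
# sync_with_retry problematic-source 600000 3
```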

Source Failure

# Check logs for details
jq '.errors' ~/.hyphn/logs/inform-sync.log

# Re-sync failed source with verbose output
bun run packages/inform/src/index.ts sync source-id --verbose