Architecture Deep-Dive¶

This document describes archfit's internal architecture in detail sufficient for Claude Code or a human contributor to make informed changes.

System Overview¶

archfit is a single Go binary that scans a repository and evaluates it against seven architectural principles. The architecture enforces a strict separation between data gathering (collectors) and evaluation (resolvers).

                    cmd/archfit/main.go
                    (explicit wiring)
                          │
              ┌───────────┼───────────┐
              │           │           │
         Collectors    Rule Engine   Renderers
         (read-only)   (pure logic)  (output)
              │           │           │
              └─────┬─────┘           │
                    │                 │
                FactStore         ScanResult
              (read-only view)   (findings, scores, metrics)

Package Dependency Graph¶

cmd/archfit/main.go
├── internal/core/scheduler.go          ← orchestrator
│   ├── internal/collector/fs/          ← filesystem facts
│   ├── internal/collector/git/         ← git history facts
│   ├── internal/collector/schema/      ← JSON Schema detection
│   ├── internal/collector/depgraph/    ← import graph (Go)
│   ├── internal/collector/ast/         ← AST analysis (Go, via go/parser)
│   ├── internal/collector/command/     ← timed command execution
│   ├── internal/rule/                  ← engine + registry
│   └── internal/score/                 ← scoring + metrics
├── internal/adapter/exec/              ← subprocess runner
├── internal/adapter/llm/               ← LLM boundary (3 backends)
├── internal/fix/                       ← remediation engine
│   ├── internal/fix/static/            ← deterministic fixers
│   └── internal/fix/llmfix/            ← LLM-assisted fixers
├── internal/config/                    ← .archfit.yaml
├── internal/policy/                    ← org policy enforcement
├── internal/packman/                   ← pack validation
├── internal/report/                    ← renderers (terminal, json, md, sarif)
├── packs/core/                         ← 24 rules
└── packs/agent-tool/                   ← 3 rules

Critical Boundary: Packs Cannot Import Adapters¶

This is the most important architectural invariant. It is enforced by .go-arch-lint.yaml.

packs/*  ──may import──>  internal/model, internal/rule
packs/*  ──MUST NOT──>    internal/adapter/*, internal/collector/*

If a resolver needs a new kind of data, the solution is always: add a Collector, expose facts through FactStore, consume in the resolver.

Data Flow¶

Scan Flow¶

1. main.go parses flags, builds Registry, loads Config
2. main.go calls core.Scan(ctx, ScanInput{Root, Rules, Runner, Depth})
3. scheduler.go runs collectors:
   a. fs.Collect(root)         → RepoFacts      (always)
   b. git.Collect(ctx, runner) → GitFacts        (when Runner != nil)
   c. schema.Collect(repo)     → SchemaFacts     (always)
   d. depgraph.Collect(repo)   → DepGraphFacts   (when Go source exists)
   e. ast.Collect(repo)        → ASTFacts        (Go files; standard + deep modes)
   f. command.Collect(...)     → CommandFacts     (only --depth=deep)
4. scheduler.go builds factStore from collector outputs
5. rule.Engine.Evaluate(ctx, rules, facts) → EvalResult
   - For each rule: call resolver(ctx, facts) → (findings, metrics, error)
   - Sort findings deterministically
6. scheduler.go computes metrics from facts
7. score.Compute(rules, findings) → Scores
8. Return ScanResult{Findings, Metrics, Scores, ...}
9. main.go renders via report.Render() or report.RenderSARIF()
10. main.go checks --fail-on threshold → exit code

Fix Flow¶

1. main.go runs initial scan (same as above)
2. buildFixEngine() registers all Fixers
3. engine.Fix(ctx, FixInput{Root, RuleIDs, DryRun, Facts, Findings, Scanner})
4. For each targeted finding:
   a. Find registered Fixer for rule ID
   b. fixer.Plan(ctx, finding, facts) → []Change
5. If --dry-run or --plan: return plan, stop
6. Snapshot original file contents
7. Apply changes to disk
8. Re-scan via injected Scanner function
9. Compare: finding gone? No new findings?
   - Yes → report success
   - No  → rollback to snapshots, report failure
10. Log to .archfit-fix-log.json

Key Design Decisions¶

Why explicit registration over auto-discovery¶

buildRegistry() in main.go explicitly calls corepack.Register(reg) and agenttool.Register(reg). There is no reflection, no init() side-effects, no plugin system.

Rationale: P3 (shallow explicitness). An agent reading the code can see exactly which packs are active by reading one function. Auto-discovery via reflection or init() is adversarial to agent comprehension.

Why FactStore is an interface¶

Resolvers receive model.FactStore (interface), not a concrete struct. This enables: - Test fakes without filesystem access - Adding new fact types via interface extension (with ADR) - Clear read-only contract — resolvers cannot mutate facts

Why collectors and resolvers are separated¶

Collectors gather data; resolvers interpret it. This separation: - Enables parallel collector execution - Keeps resolvers pure (testable without I/O) - Enforces P5 (aggregation of dangerous capabilities) on archfit itself - Allows the same facts to be consumed by multiple rules

AST Collector Design¶

internal/collector/ast/ uses go/parser from the standard library (no external dependency). Two modes:

standard (default): parses package-level declarations, exported symbols, and function signatures.
deep (--depth=deep): additionally resolves struct field types and interface method sets.

A per-file size cap of 1 MiB prevents pathological files from stalling the scan. Files exceeding the cap produce a parse_skipped entry in ASTFacts rather than silently dropping.

Go is the only language supported. Other languages are out of scope for the AST collector (see PROJECT.md for roadmap).

Why JSON-in-YAML for config¶

Phase 1 parses .archfit.yaml as JSON. YAML 1.2 is a strict JSON superset, so this works. The benefit: zero external dependencies for config parsing. Full YAML support (anchors, block scalars) deferred to when yaml.v3 is added.

Why three LLM backends behind one interface¶

llm.Client with Explain() is implemented by Gemini (real.go), OpenAI (openai.go), and Claude (anthropic.go). The composition chain is: inner → Budget → Cached. This means: - Budget enforcement is backend-agnostic - Cache hits skip the network regardless of backend - Adding a fourth backend is a single file

Extension Points¶

What to extend	Where	ADR needed?
New rule in existing pack	`packs/<pack>/resolvers/`, register in `pack.go`	No
New rule pack	`packs/<name>/`, register in `main.go:buildRegistry()`	No
New collector	`internal/collector/<topic>/`, wire in `scheduler.go`	Yes (FactStore change)
New metric	`internal/score/metrics.go`, wire in `scheduler.go`	No
New fixer	`internal/fix/static/`, register in `main.go:buildFixEngine()`	No
New LLM backend	`internal/adapter/llm/<backend>.go`	No
New output format	`internal/report/<format>.go`	No
New CLI command	`cmd/archfit/main.go`	No (unless new exit codes)
New FactStore method	`internal/model/model.go`	Yes
JSON output schema change	`schemas/output.schema.json`	Yes if non-additive

File Size Guidelines¶

Per CLAUDE.md and project conventions: - SKILL.md: under 400 lines / 10 KB - Remediation guides: under 100 lines each - PR size: ≤ 500 changed lines, ≤ 5 packages - main.go: the only file that imports from all layers — expected to be large