# LLM Integration (`--with-llm`)
archfit supports opt-in LLM enrichment: it can call Claude, OpenAI, or Gemini to produce finding-specific explanations on top of the static rule docs. This document is the contract between archfit and its users for that feature.
The full design rationale is in ADR 0003.
## TL;DR
```bash
# Any of these API keys enables LLM support:
export ANTHROPIC_API_KEY=sk-...   # Claude (highest priority)
export OPENAI_API_KEY=sk-...      # OpenAI
export GOOGLE_API_KEY=...         # Gemini

./bin/archfit scan --with-llm .                 # at most 5 findings are enriched
./bin/archfit explain --with-llm P1.LOC.001
./bin/archfit check --with-llm P7.MRD.001 .
./bin/archfit fix --with-llm --all .
./bin/archfit scan --with-llm --llm-budget=20 --json . | jq '.findings[].llm_suggestion'
```
Omit `--with-llm` and nothing changes: the base scan path is byte-identical with or without an API key set.
## What it does

When `--with-llm` is set, archfit:

- Runs the normal scan (the scan path does not call the LLM).
- For up to `--llm-budget` findings (default 5), issues one LLM call with:
    - the rule ID, title, rationale, severity, and evidence strength,
    - the finding's path, message, and evidence map,
    - the repo's `project_type` from `.archfit.yaml`.
- Attaches an `llm_suggestion` object to each enriched finding. The terminal renderer prints it below the finding; the JSON renderer emits it as a nested field; the SARIF renderer places it in `results[].properties.llm_suggestion`.
The LLM is instructed to produce ≤200 words in three short sections: why it matters here, concrete fix, and when to suppress.
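For orientation, here is a sketch of how an enriched finding might look in the JSON output. The three keys inside `llm_suggestion` mirror the three prompt sections, but the field names are illustrative assumptions, not a schema guarantee:

```bash
./bin/archfit scan --with-llm --json . | jq '.findings[0].llm_suggestion'
# Hypothetical shape; key names are assumptions:
# {
#   "why_it_matters": "...",
#   "concrete_fix": "...",
#   "when_to_suppress": "..."
# }
```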
## What it does NOT do

- It never fails the scan. If the API is down, the key is rotated, or the call times out, archfit logs a single stderr line per skipped finding and keeps the static remediation. Exit code behavior is identical to a run without `--with-llm` (see the check after this list).
- It never runs without `--with-llm`. The base `archfit scan .` makes zero LLM calls, regardless of whether `GOOGLE_API_KEY` is set.
- It is not deterministic. Golden tests under `testdata/e2e/` explicitly run without the flag. Do not pin LLM output as a golden.
- It does not auto-fix anything. `archfit fix` is a separate feature in Phase 3c. The suggestion is advisory; the agent or human still performs the change.
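A quick way to confirm the failure-isolation guarantee, assuming a repo with at least one finding:

```bash
# Exit codes match with and without enrichment; an unreachable API only
# costs you the suggestions, never the scan result.
./bin/archfit scan . ; echo "base: $?"
./bin/archfit scan --with-llm . ; echo "enriched: $?"   # same code
```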
## Supported Providers

| Provider | Env Var | Default Model | `--llm-backend` |
|---|---|---|---|
| Claude (Anthropic) | `ANTHROPIC_API_KEY` | `claude-sonnet-4-20250514` | `claude` |
| OpenAI | `OPENAI_API_KEY` | `gpt-5.4-mini` | `openai` |
| Google Gemini | `GOOGLE_API_KEY` / `GEMINI_API_KEY` | `gemini-2.5-flash` | `gemini` |

Auto-detection priority: `ANTHROPIC_API_KEY` > `OPENAI_API_KEY` > `GOOGLE_API_KEY`.
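When more than one key is exported, the flag wins over auto-detection. A minimal example, assuming both keys are valid:

```bash
# Both keys present: auto-detection would pick Claude; the flag forces Gemini.
export ANTHROPIC_API_KEY=sk-...
export GOOGLE_API_KEY=...
./bin/archfit scan --with-llm --llm-backend=gemini .
```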
## Configuration

| Setting | Where | Default |
|---|---|---|
| API key | Environment variable (see above) | required; the command exits 4 if missing |
| Model | `LLM_MODEL` env | Provider default (see above) |
| Backend | `--llm-backend` flag | Auto-detected from env |
| Per-run budget | `--llm-budget N` | 5 |
| Per-call timeout | (not configurable) | 30s |

API keys are never written to logs, never read from `.archfit.yaml`, and never embedded in the binary.
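Both the model override and the missing-key behavior can be exercised from the shell; the model string below is simply the Claude default from the table:

```bash
# Pin the model for one run via LLM_MODEL:
LLM_MODEL=claude-sonnet-4-20250514 ./bin/archfit explain --with-llm P1.LOC.001

# With no key exported, the command exits 4 instead of degrading silently:
unset ANTHROPIC_API_KEY OPENAI_API_KEY GOOGLE_API_KEY GEMINI_API_KEY
./bin/archfit explain --with-llm P1.LOC.001 ; echo $?   # 4
```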
## Cost safety

Two layers guard against runaway cost:

- Budget: `--llm-budget N` caps the number of calls per run (verified in the sketch below). The default of 5 covers the typical "1–3 new findings per PR" case without surprises.
- Cache: identical prompts within one run are served from an in-memory cache for free. Useful when `--llm-budget` is large and multiple findings share evidence.
A disk-backed cache and a daily spend cap are planned for Phase 3b.
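The budget cap is easy to verify from the JSON output; the jq filter assumes unenriched findings simply omit (or null out) `llm_suggestion`:

```bash
# At most 2 findings carry a suggestion, however many the scan produces.
./bin/archfit scan --with-llm --llm-budget=2 --json . \
  | jq '[.findings[] | select(.llm_suggestion != null)] | length'   # <= 2
```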
## Data sent to the provider

When you set `--with-llm`, archfit sends the configured provider:

- The rule's ID, title, severity, rationale, and static remediation.
- The finding's path (repo-relative), message, and evidence map.
- The repo's declared `project_type` (if any).
Archfit does not send:
- The repository's source code.
- Environment variables other than the API key (which goes in the Authorization header, not the prompt).
- Git history, commit metadata, or author information.
- The contents of files flagged by a rule. Only the rule's evidence (a small structured map) is transmitted.
If the evidence map contains values totaling more than 8 KiB, the prompt is truncated at that boundary with a `[truncated]` marker.
## When to use it

- CI: use it on PRs with a small budget. The diff between `main.json` and the PR's scan is usually only 1–3 findings, well under the default budget (see the sketch after this list).
- Local development: use it when an unfamiliar rule fires and the static doc does not quite fit your case.
- Triage: do not use it to mass-audit a large repo. Budget-wise, a fresh scan with many findings is better served by fixing the top-severity ones by hand first and re-running.
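A hypothetical PR job along those lines; `main.json` is the baseline scan from the CI bullet above, and the `.findings[].id` field name is an assumption about the JSON schema:

```bash
# Scan the PR, then list finding IDs not present in the main baseline.
./bin/archfit scan --json . > pr.json
comm -13 <(jq -r '.findings[].id' main.json | sort) \
         <(jq -r '.findings[].id' pr.json | sort)
```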
## When NOT to use it

- On proprietary or regulated code where sending evidence to a third party is out of policy.
- When a deterministic, auditable output is required. SARIF's core fields are deterministic without `--with-llm`, but the `llm_suggestion` property is not.
- In CI gates that compare output byte-for-byte. Use `archfit scan` (no LLM) for the gate, then `archfit scan --with-llm` for the PR comment, as sketched after this list.
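A minimal sketch of that split, assuming the gate treats a non-zero exit as failure and `pr-comment.txt` is whatever your CI posts back to the PR:

```bash
# Gate: deterministic, byte-stable, no LLM calls.
./bin/archfit scan . || exit 1

# Comment: advisory enrichment, never compared byte-for-byte.
./bin/archfit scan --with-llm . > pr-comment.txt
```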
## Extension Points

The adapter interface (`internal/adapter/llm/Client`) is provider-agnostic. All three backends (Claude, OpenAI, Gemini) share the same interface, budget, and cache layers. Adding a fourth backend (e.g., local Ollama) requires a single-file implementation. See LLM Integration for developer documentation.
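As a rough sketch of what a fourth backend involves (names and signatures here are assumptions; consult `internal/adapter/llm` for the real interface):

```go
package llm

import "context"

// Client is the provider-agnostic interface every backend implements.
// The budget and cache layers wrap any Client, so a new backend gets
// both for free.
type Client interface {
	// Suggest turns one finding's prompt into an llm_suggestion body.
	// Errors are logged to stderr and swallowed; the scan never fails.
	Suggest(ctx context.Context, prompt string) (string, error)
}

// ollamaClient is a hypothetical local backend satisfying Client.
type ollamaClient struct {
	baseURL string // e.g. http://localhost:11434
	model   string
}

func (c *ollamaClient) Suggest(ctx context.Context, prompt string) (string, error) {
	// POST the prompt to the local Ollama HTTP API and return the
	// completion text. Body omitted; nothing else in archfit changes.
	return "", nil
}
```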