
LLM Integration (--with-llm)

archfit supports opt-in LLM enrichment: it can call Claude, OpenAI, or Gemini to produce finding-specific explanations on top of the static rule docs. This document is the contract between archfit and its users for that feature.

The full design rationale is in ADR 0003.

TL;DR

# Any of these API keys enables LLM support:
export ANTHROPIC_API_KEY=sk-...     # Claude (highest priority)
export OPENAI_API_KEY=sk-...        # OpenAI
export GOOGLE_API_KEY=...           # Gemini

./bin/archfit scan --with-llm .     # at most 5 findings are enriched
./bin/archfit explain --with-llm P1.LOC.001
./bin/archfit check --with-llm P7.MRD.001 .
./bin/archfit fix --with-llm --all .
./bin/archfit scan --with-llm --llm-budget=20 --json . | jq '.findings[].llm_suggestion'

Omit --with-llm and nothing changes — the base scan path is byte-identical with or without an API key set.

What it does

When --with-llm is set, archfit:

  1. Runs the normal scan (the scan path does not call the LLM).
  2. For up to --llm-budget findings (default 5), issues one LLM call to the selected provider with:
       • the rule ID, title, rationale, severity, and evidence strength,
       • the finding's path, message, and evidence map,
       • the repo's project_type from .archfit.yaml.
  3. Attaches an llm_suggestion object to each enriched finding. The terminal renderer prints it below the finding; the JSON renderer emits it as a nested field; the SARIF renderer places it in results[].properties.llm_suggestion.

The LLM is instructed to produce ≤200 words in three short sections: why it matters here, concrete fix, and when to suppress.

What it does NOT do

  • It never fails the scan. If the API is down, the key is rotated, or the call times out, archfit logs a single stderr line per skipped finding and keeps the static remediation. Exit code behavior is identical to a run without --with-llm.
  • It never runs without --with-llm. The base archfit scan . makes zero LLM calls, regardless of whether GOOGLE_API_KEY is set.
  • It is not deterministic. Golden tests under testdata/e2e/ explicitly run without the flag. Do not pin LLM output as a golden.
  • It does not auto-fix anything. archfit fix is a separate feature in Phase 3c. The suggestion is advisory — the agent or human still performs the change.

Supported Providers

Provider             Env Var                           Default Model              --llm-backend
Claude (Anthropic)   ANTHROPIC_API_KEY                 claude-sonnet-4-20250514   claude
OpenAI               OPENAI_API_KEY                    gpt-5.4-mini               openai
Google Gemini        GOOGLE_API_KEY / GEMINI_API_KEY   gemini-2.5-flash           gemini

Auto-detection priority: ANTHROPIC_API_KEY > OPENAI_API_KEY > GOOGLE_API_KEY.

Configuration

Setting            Where                              Default
API key            Environment variable (see above)   required — command exits 4 if missing
Model              LLM_MODEL env                      Provider default (see above)
Backend            --llm-backend flag                 Auto-detected from env
Per-run budget     --llm-budget N                     5
Per-call timeout   (not configurable)                 30s

API keys are never written to logs, never read from .archfit.yaml, and never embedded in the binary.

Cost safety

Two layers guard against runaway cost:

  • Budget: --llm-budget N caps the number of calls per run. The default of 5 covers the typical "1–3 new findings per PR" case without surprises.
  • Cache: identical prompts within one run are served from an in-memory cache for free. Useful when --llm-budget is large and multiple findings share evidence.

A disk-backed cache and a daily spend cap are planned for Phase 3b.

Data sent to the provider

When you set --with-llm, archfit sends the selected provider:

  • The rule's ID, title, severity, rationale, and static remediation.
  • The finding's path (repo-relative), message, and evidence map.
  • The repo's declared project_type (if any).

Archfit does not send:

  • The repository's source code.
  • Environment variables other than the API key (which goes in the Authorization header, not the prompt).
  • Git history, commit metadata, or author information.
  • The contents of files flagged by a rule. Only the rule's evidence (a small structured map) is transmitted.

If the evidence map contains values longer than 8 KiB total, the prompt is truncated at the boundary with a [truncated] marker.

When to use it

  • CI: use it on PRs with a small budget. The diff between main.json and the PR's scan is usually only 1–3 findings, well under the default budget.
  • Local development: use it when an unfamiliar rule fires and the static doc does not quite fit your case.
  • Triage: do not use it to mass-audit a large repo. Budget-wise, a fresh scan with many findings is better served by fixing the top-severity ones by hand first and re-running.

When NOT to use it

  • On proprietary or regulated code where sending evidence to a third party is out of policy.
  • When a deterministic, auditable output is required — SARIF's core fields are deterministic without --with-llm, but the llm_suggestion property is not.
  • In CI gates that compare output byte-for-byte. Use archfit scan (no LLM) for the gate, then archfit scan --with-llm for the PR comment.

Extension Points

The adapter interface (internal/adapter/llm/Client) is provider-agnostic. All three backends (Claude, OpenAI, Gemini) share the same interface, budget, and cache layers. Adding a fourth backend (e.g., local Ollama) requires a single-file implementation. See LLM Integration for developer documentation.