Skip to content

Agent Behavior Observatory

Element 1 of the three strategic elements. Builds after the fitness contract.

Overview

The observatory watches how coding agents actually interact with a repository and produces behavioral metrics that static analysis cannot capture. This is archfit's unique differentiator: no other tool observes the agent as it works.

Key Insight

archfit currently asks "Is this repo shaped for agents?" by inspecting static structure. The observatory answers the transformative question: "When an agent actually works on this repo, what happens?"

Behavioral Metrics

Metric What it measures Principle
agent_context_efficiency Files read vs. files needed for the change P1
agent_retry_rate Failed commands / total commands P4
agent_time_to_first_verify Seconds from first edit to first passing test P4
agent_cross_boundary_reads Reads outside the task's vertical slice P1
agent_dangerous_touches Edits in P5-flagged dangerous areas P5
agent_rollback_frequency Self-reverts / total edits P6

Trace Schema

Traces are JSON Lines (one event per line) or a single JSON array:

{
  "schema_version": "0.1.0",
  "agent": "claude-code",
  "session_id": "abc123",
  "repo_commit": "7563385",
  "events": [
    {"type": "file_read", "path": "internal/model/model.go", "ts": "..."},
    {"type": "file_write", "path": "internal/score/metrics.go", "ts": "..."},
    {"type": "command_run", "command": "make test", "exit_code": 0, "duration_ms": 4200}
  ]
}

Event types: file_read, file_write, command_run, command_fail, tool_call, error, context_load.

Package Structure

internal/observer/
├── trace.go            # Trace, Event, EventType types
├── trace_test.go
├── ingest.go           # Parse trace files (JSON Lines format)
├── ingest_test.go
├── metrics.go          # Behavioral metric computation
├── metrics_test.go
├── hotspot.go          # Cross-reference with static findings
├── hotspot_test.go
└── testdata/           # Sample trace files
schemas/
├── trace.schema.json
└── observe-output.schema.json

Implementation Steps

Step Description Status Effort
1.1 Trace schema and types Not started ~150 lines
1.2 Behavioral metrics computation Not started ~200 lines
1.3 Hotspot analysis Not started ~150 lines
1.4 CLI wiring (archfit observe) Not started ~200 lines

Architecture Rules

  • The observer reads trace files (collected data). It never instruments or modifies the agent.
  • All metric functions are pure (no I/O). They receive parsed traces.
  • Hotspot analysis cross-references traces with static findings — no new I/O.
  • The observer does NOT import from internal/adapter/.
  • ADR required: docs/adr/0007-agent-behavior-observatory.md.

CLI Commands

archfit observe --trace-dir .agent-traces/ .    # analyze traces
archfit observe --report .                       # observatory report
archfit observe --json .                         # JSON output

The command is read-only and informational (exit code always 0).

Hotspot Analysis

A hotspot is a directory prefix where agents struggle: - read_fan_out > median * 2 (agent reads too many files for changes in this area) - retry_count > 2 (agent fails repeatedly in this area)

Hotspots are cross-referenced with static scan findings to produce actionable recommendations.

  • development/fitness-contract.md — contract can consume observatory metrics as soft targets
  • development/metrics-and-scoring.md — behavioral metrics complement static metrics
  • internal/score/metrics.go — static metric computation (behavioral metrics follow the same pattern)