Current State¶

This document bridges the gap between:

PROJECT.md (at the repo root): product vision and long-range architecture
CLAUDE.md (at the repo root): contributor workflow, commands, and conventions
The live implementation under src/agenticapi/

Read this before making large design changes. PROJECT.md is aspirational in several areas. This file is the reality check for what is wired today, what is partial, and where new work should start.

How To Use This With The Other Docs¶

Use PROJECT.md to understand the intended direction of the framework.
Use CLAUDE.md to understand how to work in the repository safely.
Use this file to determine whether a capability is already shipped, partially integrated, or still scaffolding.

The Three Main Execution Styles¶

1. Direct handler execution¶

This is the most mature and most heavily used path.

Triggered when no llm or harness is configured, or when an endpoint uses autonomy_level="manual".
The framework parses the request into Intent or Intent[T], resolves Depends() dependencies, injects built-ins like AgentContext, AgentTasks, UploadedFiles, HtmxHeaders, and AgentStream, then calls the handler directly.
This path powers the non-LLM examples, most response types, HTMX support, file handling, typed responses, and streaming handlers.

Primary files:

src/agenticapi/app.py
src/agenticapi/dependencies/scanner.py
src/agenticapi/dependencies/solver.py
src/agenticapi/interface/intent.py

2. Harnessed code-generation execution¶

This is the classic AgenticAPI path described in PROJECT.md.

Triggered when both llm and harness are configured and the endpoint is not manual.
The app first tries a tool-first path when tools are registered.
If that path is not applicable, CodeGenerator produces Python and HarnessEngine.execute() runs the safety pipeline.

What the stock harness path currently does:

Policy evaluation for code/data/resource/runtime policies
Static AST analysis
Approval workflow checks
Sandbox execution
Post-execution monitors and validators
Audit recording
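
To make the static AST analysis stage concrete, here is a minimal sketch of the kind of check such a stage might run before generated code reaches the sandbox. The deny-lists, the Finding type, and the analyze() function are all hypothetical names for illustration; the real rules live under src/agenticapi/harness/ and may look quite different.

```python
import ast
from dataclasses import dataclass

# Hypothetical deny-lists, for illustration only.
FORBIDDEN_MODULES = {"os", "subprocess", "socket"}
FORBIDDEN_CALLS = {"eval", "exec", "__import__"}


@dataclass(frozen=True)
class Finding:
    line: int
    message: str


def analyze(source: str) -> list[Finding]:
    """Parse generated code and report forbidden imports and calls."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        # Collect the module names this node imports, if any.
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            names = []
        for name in names:
            if name.split(".")[0] in FORBIDDEN_MODULES:
                findings.append(Finding(node.lineno, f"forbidden import: {name}"))
        # Flag direct calls to dangerous builtins such as eval().
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in FORBIDDEN_CALLS:
                findings.append(Finding(node.lineno, f"forbidden call: {node.func.id}"))
    # ast.walk is breadth-first, so sort to report in source order.
    return sorted(findings, key=lambda f: f.line)
```

An empty result would let the pipeline continue to the next stage; any finding would be a reason to deny before execution.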

Primary files:

src/agenticapi/app.py
src/agenticapi/runtime/code_generator.py
src/agenticapi/harness/engine.py
src/agenticapi/harness/policy/*
src/agenticapi/harness/sandbox/*

3. Streaming and replayable handler execution¶

Streaming is now a first-class interface path, not an afterthought.

Handlers can accept AgentStream and emit structured events.
SSE and NDJSON transports are both supported.
Approval can pause a live stream and resume later.
Completed streams can be replayed from the in-memory stream store.
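
The two transports differ only in how one structured event is framed on the wire. The sketch below shows the standard SSE and NDJSON framings for a single event; the event shape (a dict with a "type" key) and the function names are assumptions for illustration, not the shapes AgentStream actually emits.

```python
import json


def to_sse(event: dict) -> str:
    """Frame one event as a Server-Sent Events message."""
    kind = event.get("type", "message")
    # json.dumps escapes newlines, so the payload always fits one data: line.
    data = json.dumps(event, separators=(",", ":"))
    # A blank line terminates each SSE message.
    return f"event: {kind}\ndata: {data}\n\n"


def to_ndjson(event: dict) -> str:
    """Frame one event as a single newline-delimited JSON record."""
    return json.dumps(event, separators=(",", ":")) + "\n"
```

Because both framings carry the same JSON payload, a replay store only needs to keep the events themselves and can re-frame them for whichever transport the client requests.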

Primary files:

src/agenticapi/interface/stream.py
src/agenticapi/interface/stream_store.py
src/agenticapi/interface/approval_registry.py
src/agenticapi/interface/transports/sse.py
src/agenticapi/interface/transports/ndjson.py

Fully Wired Today¶

These areas are implemented end-to-end in the current tree and are safe places to build on:

AgenticApp request lifecycle, auth, sessions, OpenAPI, Swagger/ReDoc, and health/capabilities routes
FastAPI-style dependency injection with caching, nested dependencies, overrides, and async-generator teardown
Direct handler responses including AgentResponse, FileResult, HTMLResult, PlainTextResult, and raw Starlette responses
Typed intent schema extraction at endpoint registration time
Harness safety pipeline for generated code execution
Streaming events, approval pause/resume, and replay routes
Tool registry and @tool decorator
In-memory audit plus persistent SqliteAuditRecorder
Observability substrate: tracing, metrics helpers, propagation helpers, and optional /metrics route via AgenticApp(metrics_url=...)
Extension packaging model, especially extensions/agenticapi-claude-agent-sdk
PromptInjectionPolicy (B5): 10 regex rules, 5 categories, shadow mode, custom patterns
PIIPolicy (B6): 6 detectors, Luhn-validated credit cards, detect/redact/block modes, redact_pii() utility
Pre-LLM text policy invocation (Increment 9): the evaluate_intent_text() hook fires before any LLM call or handler execution and automatically scans intent text through every policy that overrides the hook
Agent memory (C1): MemoryStore protocol, InMemoryMemoryStore, SqliteMemoryStore, scope-based isolation, GDPR forget
Code cache (C5): deterministic code reuse for repeated intents
Eval harness (C6): EvalSet, EvalCase, EvalRunner, 5 built-in judges, YAML loading, agenticapi eval CLI
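
As an illustration of the Luhn-validated credit card detection listed for PIIPolicy above, here is a minimal sketch of the technique. The regex, the placeholder text, and the function names are assumptions; this is not the implementation behind redact_pii().

```python
import re

# Candidate card numbers: 13 to 19 digits, optionally separated by spaces or dashes.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact_cards(text: str, placeholder: str = "[REDACTED_CARD]") -> str:
    """Replace digit runs that look like cards and pass the Luhn check."""
    def replace(match: re.Match) -> str:
        return placeholder if luhn_valid(match.group()) else match.group()

    return CARD_CANDIDATE.sub(replace, text)
```

The checksum step is what keeps order numbers and other long digit strings from being redacted as false positives.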

Present But Not Fully Integrated¶

These are the most important "exists in code, but not fully wired through the stock path" areas.

BudgetPolicy¶

BudgetPolicy is implemented and tested, but the stock AgenticApp plus HarnessEngine path does not automatically wrap every LLM call with:

BudgetPolicy.estimate_and_enforce(...)
BudgetPolicy.record_actual(...)

Important consequences:

Budgeting works in the example (examples/15_budget_policy/app.py) and in custom orchestration, where these methods are called explicitly.
Adding BudgetPolicy(...) to HarnessEngine(policies=[...]) is not, by itself, enough to guarantee stock request-path cost enforcement.
Inside PolicyEvaluator, BudgetPolicy.evaluate() is intentionally a no-op compatibility stub.
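
The explicit integration pattern amounts to bracketing each LLM call with an estimate check before and a cost record after. The sketch below uses a stand-in class to show that shape; BudgetSketch, its constructor arguments, and the method signatures are assumptions and do not reflect the real BudgetPolicy API. See examples/15_budget_policy/app.py for the actual usage.

```python
from dataclasses import dataclass
from typing import Callable


class BudgetExceededSketch(Exception):
    """Raised when an estimated call would exceed the remaining budget."""


@dataclass
class BudgetSketch:
    """Stand-in for a budget policy; not the real BudgetPolicy signature."""

    limit_usd: float
    spent_usd: float = 0.0

    def estimate_and_enforce(self, estimated_usd: float) -> None:
        # Refuse up front if the estimate would overshoot the limit.
        if self.spent_usd + estimated_usd > self.limit_usd:
            raise BudgetExceededSketch(
                f"estimated {estimated_usd:.2f} exceeds remaining "
                f"{self.limit_usd - self.spent_usd:.2f}"
            )

    def record_actual(self, actual_usd: float) -> None:
        # Book the real cost once the provider has reported usage.
        self.spent_usd += actual_usd


def call_llm_with_budget(
    budget: BudgetSketch,
    prompt: str,
    llm_call: Callable[[str], tuple[str, float]],
    estimate_usd: float,
) -> str:
    """Wrap one LLM call: enforce before, record after."""
    budget.estimate_and_enforce(estimate_usd)
    text, actual_usd = llm_call(prompt)  # hypothetical: returns (text, cost)
    budget.record_actual(actual_usd)
    return text
```

Any new path that calls an LLM outside this bracket is unmetered, which is why listing the policy on the harness alone is not sufficient.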

Relevant files:

src/agenticapi/harness/policy/budget_policy.py
examples/15_budget_policy/app.py

Typed intents with provider-native structured output¶

Typed intent schema extraction is wired, but provider-native structured-output enforcement is only fully exercised by MockBackend today.

IntentParser forwards the schema through LLMPrompt.response_schema.
MockBackend honors response_schema and returns deterministic structured payloads.
The built-in Anthropic, OpenAI, and Gemini backends do not yet translate response_schema into each provider's native structured-output API.

This means:

The typed-intent programming model is real.
True provider-side schema enforcement is still partial.
When using real provider backends, validation and fallback behavior matter more than the docs used to imply.
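
When the provider does not enforce the schema, the client has to validate the returned JSON itself and fall back to untyped parameters on failure. A minimal sketch of that pattern follows, using only the standard library. OrderQuery, EXPECTED_TYPES, and parse_typed_payload() are hypothetical names; the real parsing lives in src/agenticapi/interface/intent.py and may differ.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class OrderQuery:
    """Hypothetical typed payload, standing in for the T in Intent[T]."""

    customer_id: int
    status: str


# Expected field types, checked by hand because the provider did not enforce them.
EXPECTED_TYPES = {"customer_id": int, "status": str}


def parse_typed_payload(raw: str) -> tuple[Optional[OrderQuery], dict[str, Any]]:
    """Return (typed payload or None, untyped parameters)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None, {}
    if not isinstance(data, dict):
        return None, {}
    for name, expected in EXPECTED_TYPES.items():
        if not isinstance(data.get(name), expected):
            # Validation failed: keep the raw parameters usable as a fallback.
            return None, data
    typed = OrderQuery(**{name: data[name] for name in EXPECTED_TYPES})
    return typed, data
```

Returning the untyped dict alongside the typed payload mirrors the rule that parameters stay usable even when a typed payload is present.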

Relevant files:

src/agenticapi/interface/intent.py
src/agenticapi/runtime/llm/base.py
src/agenticapi/runtime/llm/mock.py
src/agenticapi/runtime/llm/anthropic.py
src/agenticapi/runtime/llm/openai.py
src/agenticapi/runtime/llm/gemini.py

Native tool calling¶

The framework has a real tool-first execution path in AgenticApp._try_tool_first_path(), but built-in provider support is still partial.

LLMResponse.tool_calls and ToolCall exist.
MockBackend fully supports queued tool-call responses.
AgenticApp can dispatch a single returned tool call straight into HarnessEngine.call_tool().
The built-in Anthropic and OpenAI backends pass prompt.tools through to the provider SDKs, but they do not yet normalize provider responses back into LLMResponse.tool_calls and finish_reason.
The built-in Gemini backend does not currently translate prompt.tools into provider-native tool declarations.

Practical takeaway:

The contract is defined.
The stock tool-first path is production-shaped.
Mock and custom backends exercise it fully today; the built-in provider adapters still need normalization work.
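
The missing normalization step is essentially a mapping from each provider's response shape into one common structure. The sketch below works on plain dicts shaped like the Anthropic Messages and OpenAI Chat Completions responses. ToolCallSketch and the normalized finish-reason values are assumptions; the real ToolCall and LLMResponse fields may differ, and a production adapter would map more stop reasons than shown here.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class ToolCallSketch:
    """Stand-in for a normalized tool call; not the framework's ToolCall."""

    id: str
    name: str
    arguments: dict[str, Any]


def from_anthropic(message: dict[str, Any]) -> tuple[list[ToolCallSketch], str]:
    """Anthropic returns tool calls as `tool_use` content blocks."""
    calls = [
        ToolCallSketch(block["id"], block["name"], block["input"])
        for block in message.get("content", [])
        if block.get("type") == "tool_use"
    ]
    # Simplified mapping: anything other than tool_use is treated as "stop".
    finish = "tool_calls" if message.get("stop_reason") == "tool_use" else "stop"
    return calls, finish


def from_openai(choice: dict[str, Any]) -> tuple[list[ToolCallSketch], str]:
    """OpenAI returns tool calls on the message, with JSON-encoded arguments."""
    raw_calls = choice["message"].get("tool_calls") or []
    calls = [
        ToolCallSketch(
            call["id"],
            call["function"]["name"],
            json.loads(call["function"]["arguments"]),
        )
        for call in raw_calls
    ]
    return calls, choice.get("finish_reason", "stop")
```

Once both adapters return the same structure, the tool-first dispatch does not need to know which provider produced the call.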

Observability auto-instrumentation¶

Observability support exists, but automatic coverage is narrower than some older docs suggested.

Automatic today:

Request count and request duration at the app boundary
Intent-parsing LLM usage in IntentParser
/metrics route registration when metrics_url is configured

Not universally automatic across all paths yet:

Every policy denial
Every budget block
Every tool invocation
Every tool-first or extension-driven LLM interaction
Full cost attribution across all execution modes

Use the record_* helpers explicitly when building new paths.

Areas That Are Mostly Scaffolding Or Early Surface Area¶

These modules exist and are useful, but they are not yet the center of the framework's shipped experience:

agenticapi.application.pipeline
agenticapi.ops
agenticapi.interface.a2a

They are best treated as extension points and early architectural bets, not yet as the most stable core APIs.

AgentMesh (newly shipped)¶

The mesh/ package (AgentMesh, MeshContext) ships in-process multi-agent orchestration. Key characteristics:

@mesh.role(name=...) registers both a role handler and a normal /agent/{name} endpoint.
@mesh.orchestrator(name=..., roles=[...]) registers orchestrator handlers that receive MeshContext.
MeshContext.call(role, payload) performs cycle detection (raises MeshCycleError), budget enforcement (raises BudgetExceeded), and trace propagation (child trace IDs).
In-process only — roles and orchestrators run in the same event loop. Cross-process mesh is a VISION.md Track 1 forward goal.
Budget propagation is per-mesh-call only; integration with BudgetPolicy per-request scopes is future work.
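
The cycle detection described above can be pictured as a call stack carried along each chain of role calls. The following is a simplified sketch of that idea only. MeshSketch, MeshCycleErrorSketch, and the (payload, call) handler signature are hypothetical and are not the AgentMesh or MeshContext API, and budget enforcement and trace propagation are left out.

```python
import asyncio


class MeshCycleErrorSketch(Exception):
    """Raised when a role would be re-entered through a chain of calls."""


class MeshSketch:
    def __init__(self):
        self._roles = {}

    def role(self, name):
        """Register an async handler under a role name."""
        def decorator(fn):
            self._roles[name] = fn
            return fn
        return decorator

    async def call(self, role, payload, _stack=()):
        # A role already on the current call chain means a cycle.
        if role in _stack:
            raise MeshCycleErrorSketch(" -> ".join((*_stack, role)))
        stack = (*_stack, role)

        async def child_call(next_role, next_payload):
            # Child calls inherit the stack, so cycles are caught at any depth.
            return await self.call(next_role, next_payload, stack)

        return await self._roles[role](payload, child_call)


mesh = MeshSketch()


@mesh.role("researcher")
async def researcher(payload, call):
    return f"notes on {payload}"


@mesh.role("writer")
async def writer(payload, call):
    notes = await call("researcher", payload)
    return f"draft from {notes}"


@mesh.role("ping")
async def ping(payload, call):
    return await call("pong", payload)


@mesh.role("pong")
async def pong(payload, call):
    return await call("ping", payload)
```

Running asyncio.run(mesh.call("writer", "mesh")) completes normally, while calling "ping" raises the cycle error as soon as the chain tries to re-enter "ping".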

LLM retry (`runtime/llm/retry.py`)¶

RetryConfig and with_retry() provide async exponential backoff with jitter for transient errors. They are not yet wired into the stock backends, but are available as a building block for custom backends or explicit caller use.
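
For readers unfamiliar with the technique, here is a minimal sketch of async exponential backoff with full jitter. RetrySketchConfig, TransientErrorSketch, and retry_with_backoff() are hypothetical names with assumed parameters; they are not the signatures of RetryConfig or with_retry() in runtime/llm/retry.py.

```python
import asyncio
import random
from dataclasses import dataclass


@dataclass
class RetrySketchConfig:
    max_attempts: int = 3
    base_delay: float = 0.1  # seconds before the first retry (upper bound)
    max_delay: float = 2.0   # cap on any single delay


class TransientErrorSketch(Exception):
    """Stand-in for errors worth retrying, such as rate limits or timeouts."""


async def retry_with_backoff(fn, config, sleep=asyncio.sleep):
    """Call `fn` until it succeeds or attempts run out."""
    for attempt in range(1, config.max_attempts + 1):
        try:
            return await fn()
        except TransientErrorSketch:
            if attempt == config.max_attempts:
                raise  # out of attempts: surface the last error
            # The delay ceiling doubles each attempt, up to max_delay.
            ceiling = min(config.max_delay, config.base_delay * 2 ** (attempt - 1))
            # Full jitter: sleep a random amount between 0 and the ceiling.
            await sleep(random.uniform(0, ceiling))
```

The injectable sleep parameter is a design convenience so tests can record delays without actually waiting.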

Read Order For New Contributors¶

If you need to understand the implementation quickly, read in this order:

src/agenticapi/app.py
src/agenticapi/interface/intent.py
src/agenticapi/dependencies/scanner.py
src/agenticapi/dependencies/solver.py
src/agenticapi/harness/engine.py
src/agenticapi/runtime/llm/base.py
src/agenticapi/interface/stream.py
src/agenticapi/harness/policy/budget_policy.py
src/agenticapi/observability/metrics.py
extensions/agenticapi-claude-agent-sdk/src/agenticapi_claude_agent_sdk/

Compatibility Rules For Ongoing Development¶

Keep the direct handler path working without any LLM configured.
Preserve optional-dependency behavior: missing SDKs must fail lazily and cleanly.
Do not break Depends() overrides or async-generator teardown semantics.
Preserve typed-intent backward compatibility: intent.parameters must remain usable even when intent.payload is typed.
Keep streaming transports backward compatible once emitted event shapes are documented.
Keep the Claude Agent SDK integration as a separate extension package rather than folding it into core.

Scale (Increment 12)¶

141 Python modules, ~26,725 lines of code
1,507 tests (+38 in extensions), 32 examples, 86 __all__ exports
Phase A (control plane): complete
Phase D (DX core): complete (Depends, Intent[T], response_model, @tool, route deps)
Phase E (native function calling): complete — provider-specific tool format translation for Anthropic, OpenAI, Gemini; multi-turn LLMMessage with tool_call_id / tool_calls; integration tests with real APIs
Phase F (streaming): core complete — AgentStream, SSE + NDJSON transports, approval pause/resume, AutonomyPolicy, StreamStore replay
Phase B (safety): partial — B5 PromptInjectionPolicy, B6 PIIPolicy shipped
Trace inspector: shipped — /_trace with search, diff, stats, export
Harness MCP server: shipped — HarnessMCPServer exposing @tool functions with governance
Phase C (agent intelligence): partial -- C1 MemoryStore, C5 CodeCache, C6 EvalHarness shipped
Multi-agent: AgentMesh with @mesh.role / @mesh.orchestrator, MeshContext.call() with cycle detection and budget propagation

Highest-Leverage Next Steps¶

If the goal is to move the codebase closer to the PROJECT.md vision, the best next steps are:

Wire BudgetPolicy into the stock LLM call path instead of leaving it as an explicit integration pattern.
Implement provider-native response_schema handling in the built-in Anthropic, OpenAI, and Gemini backends.
Normalize provider-native tool-call responses into LLMResponse.tool_calls and finish_reason.
Broaden automatic observability coverage for tool-first, streaming, budgeting, and extension-driven execution paths.
Decide which application/, ops/, and a2a/ surfaces are graduating into the stable core and which remain experimental.
