Testing Guide

Current Test Inventory

The live repository currently contains:

112+ test files under tests/
1,507 collected core tests (excluding benchmarks)
32 example apps exercised by the E2E suite
6 real-provider integration tests (run only when a provider API key is set)
38 extension tests for agenticapi-claude-agent-sdk

Directory Structure

tests/
    unit/           Core behavior, regressions, streaming, typed intents, observability
    integration/    Cross-module request and auth flows
    e2e/            Example apps and full request-cycle coverage
    benchmarks/     Performance regression checks

What The Suite Covers

Major coverage areas in the current tree:

AgenticApp request lifecycle and HTTP behavior
intent parsing and typed intents
dependency injection and route-level dependencies
harness policies, sandbox, approval workflow, and audit
observability helpers and propagation
file handling, HTMX, response types, and OpenAPI
tool registration, @tool, and native tool-call data types
streaming events, replay, resume, and autonomy escalation
end-to-end validation of all example apps

Running Tests

# All tests
uv run pytest

# Faster loop (skips the benchmarks)
uv run pytest --ignore=tests/benchmarks

# With coverage
uv run pytest --cov=src/agenticapi --cov-report=term-missing --ignore=tests/benchmarks

# Focused suites
uv run pytest tests/unit -q
uv run pytest tests/integration -q
uv run pytest tests/e2e -v

# Specific areas
uv run pytest tests/unit/test_streaming.py -xvs
uv run pytest tests/unit/test_typed_intents.py -xvs
uv run pytest tests/unit/harness/policy/test_budget_policy.py -xvs
uv run pytest tests/unit/observability/test_metrics.py -xvs

# Benchmarks
uv run pytest tests/benchmarks

# Skip tests that need a real provider API key
uv run pytest -m "not requires_llm"

# Extension tests
uv pip install -e extensions/agenticapi-claude-agent-sdk --no-deps
uv run pytest extensions/agenticapi-claude-agent-sdk/tests
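
The `requires_llm` marker used above can also be enforced automatically, so that key-gated tests skip themselves when no key is present. The following is a minimal sketch of one way a `conftest.py` could do that with pytest's collection hook. It is not taken from the repository: the environment variable names in `PROVIDER_KEY_VARS` and the helper `has_provider_key` are assumptions made for illustration.

```python
import os
from collections.abc import Iterable, Mapping

# Assumption: illustrative variable names only. The variables the real
# provider tests read are defined by the repository, not by this sketch.
PROVIDER_KEY_VARS = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY")


def has_provider_key(
    env: Mapping[str, str], names: Iterable[str] = PROVIDER_KEY_VARS
) -> bool:
    """Return True if at least one provider key is set to a non-blank value."""
    return any(env.get(name, "").strip() for name in names)


def pytest_collection_modifyitems(config, items):
    """pytest hook: skip tests marked `requires_llm` when no key is available."""
    import pytest  # imported lazily so the helper above works without pytest

    if has_provider_key(os.environ):
        return
    skip = pytest.mark.skip(reason="no provider API key in the environment")
    for item in items:
        # Marker names applied to a test appear in item.keywords.
        if "requires_llm" in item.keywords:
            item.add_marker(skip)
```

With a hook like this, a plain `uv run pytest` would report the gated tests as skipped rather than failed on machines without keys, while `-m "not requires_llm"` still deselects them entirely.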

Common Helpers

`AgentTestCase`

Use AgentTestCase when the test needs an app, a handler, and, optionally, a mock LLM or policy configuration.

`mock_llm`

Use mock_llm(...) for deterministic LLM behavior without provider SDKs.
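
To show what deterministic LLM behavior buys a test, here is a generic sketch of the test-double pattern. It does not use or reproduce the `mock_llm` signature: `ScriptedLLM`, its `complete` method, and the `summarize` function are hypothetical stand-ins invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ScriptedLLM:
    """Hypothetical stand-in (not agenticapi's mock_llm): replays canned replies."""

    responses: list[str]
    prompts: list[str] = field(default_factory=list)

    def complete(self, prompt: str) -> str:
        # Record the prompt so the test can assert on what was sent.
        self.prompts.append(prompt)
        if not self.responses:
            raise AssertionError("ScriptedLLM ran out of canned responses")
        return self.responses.pop(0)


def summarize(llm: ScriptedLLM, text: str) -> str:
    """Toy code under test: asks the model for a summary and trims whitespace."""
    return llm.complete(f"Summarize: {text}").strip()


llm = ScriptedLLM(responses=["  two orders shipped  "])
result = summarize(llm, "order log")
```

Because the reply is fixed in advance, the test can assert on both the returned value and the exact prompt that was sent, with no network access and no provider SDK installed.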

`MockSandbox`

Use MockSandbox when the test is about orchestration around sandbox execution rather than the real subprocess runtime.
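
The distinction between testing orchestration and testing the runtime can be sketched with a recording fake. This is a generic illustration, not the `MockSandbox` API: `RecordingSandbox`, `FakeSandboxResult`, and `run_with_retry` are hypothetical names introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class FakeSandboxResult:
    stdout: str
    exit_code: int = 0


@dataclass
class RecordingSandbox:
    """Hypothetical stand-in (not agenticapi's MockSandbox): no subprocess runs."""

    result: FakeSandboxResult
    executed: list[str] = field(default_factory=list)

    def run(self, code: str) -> FakeSandboxResult:
        # Record the call and return the preconfigured result.
        self.executed.append(code)
        return self.result


def run_with_retry(sandbox: RecordingSandbox, code: str, attempts: int = 2) -> FakeSandboxResult:
    """Toy orchestration under test: rerun until exit code 0 or attempts run out."""
    last = sandbox.run(code)
    for _ in range(attempts - 1):
        if last.exit_code == 0:
            break
        last = sandbox.run(code)
    return last


failing = RecordingSandbox(FakeSandboxResult(stdout="", exit_code=1))
outcome = run_with_retry(failing, "print(1)")
```

The test then inspects `failing.executed` to check the retry logic, which is the orchestration, without depending on how a real subprocess behaves.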

E2E Guidance

tests/e2e/test_examples.py protects the public surface area of the framework. When a feature changes user-facing behavior:

update the relevant example
extend or adjust the E2E coverage

Current Hot Paths

Run the focused tests for the affected area when you change any of:

src/agenticapi/app.py
src/agenticapi/interface/intent.py
src/agenticapi/interface/stream.py
src/agenticapi/dependencies/*
src/agenticapi/harness/*
src/agenticapi/runtime/llm/*
src/agenticapi/observability/*
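
One way to make "run focused tests" mechanical is a small lookup from changed paths to pytest targets. The sketch below is illustrative only: the pairing of source paths with test paths in `FOCUSED_TESTS` is an assumption inferred from the file names in this guide, and `focused_targets` is a hypothetical helper, not part of the repository.

```python
from fnmatch import fnmatchcase

# Assumption: this source-to-test pairing is a guess based on file names.
# Adjust it to match the actual layout of the repository.
FOCUSED_TESTS = {
    "src/agenticapi/interface/stream.py": ["tests/unit/test_streaming.py"],
    "src/agenticapi/interface/intent.py": ["tests/unit/test_typed_intents.py"],
    "src/agenticapi/harness/*": ["tests/unit/harness"],
    "src/agenticapi/observability/*": ["tests/unit/observability"],
}

# Used when a changed file matches no pattern above.
FALLBACK = ["tests/unit", "tests/integration"]


def focused_targets(changed_files: list[str]) -> list[str]:
    """Return de-duplicated pytest targets for the changed files, in order."""
    targets: list[str] = []
    for path in changed_files:
        matches = [
            test
            for pattern, tests in FOCUSED_TESTS.items()
            if fnmatchcase(path, pattern)  # "*" also matches across "/"
            for test in tests
        ]
        for target in matches or FALLBACK:
            if target not in targets:
                targets.append(target)
    return targets
```

The returned list can be passed straight to `uv run pytest`, for example after collecting changed paths from `git diff --name-only`.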