Testing Guide¶
Current Test Inventory¶
The live repository currently contains:
- 112+ test files under tests/
- 1,507 collected core tests (excluding benchmarks)
- 32 example apps exercised by the E2E suite
- 6 real-provider integration tests (API key gated)
- 38 extension tests for agenticapi-claude-agent-sdk
Directory Structure¶
tests/
  unit/          Core behavior, regressions, streaming, typed intents, observability
  integration/   Cross-module request and auth flows
  e2e/           Example apps and full request-cycle coverage
  benchmarks/    Performance regression checks
What The Suite Covers¶
Major coverage areas in the current tree:
- AgenticApp request lifecycle and HTTP behavior
- intent parsing and typed intents
- dependency injection and route-level dependencies
- harness policies, sandbox, approval workflow, and audit
- observability helpers and propagation
- file handling, HTMX, response types, and OpenAPI
- tool registration, @tool, and native tool-call data types
- streaming events, replay, resume, and autonomy escalation
- end-to-end validation of all example apps
Running Tests¶
# All tests
uv run pytest
# Faster loop
uv run pytest --ignore=tests/benchmarks
# With coverage
uv run pytest --cov=src/agenticapi --cov-report=term-missing --ignore=tests/benchmarks
# Focused suites
uv run pytest tests/unit -q
uv run pytest tests/integration -q
uv run pytest tests/e2e -v
# Specific areas
uv run pytest tests/unit/test_streaming.py -xvs
uv run pytest tests/unit/test_typed_intents.py -xvs
uv run pytest tests/unit/harness/policy/test_budget_policy.py -xvs
uv run pytest tests/unit/observability/test_metrics.py -xvs
# Benchmarks
uv run pytest tests/benchmarks
# Skip provider-key tests
uv run pytest -m "not requires_llm"
# Extension tests
uv pip install -e extensions/agenticapi-claude-agent-sdk --no-deps
uv run pytest extensions/agenticapi-claude-agent-sdk/tests
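The `-m "not requires_llm"` selection above relies on a pytest marker. How the repository wires that marker is not shown here; the following is a minimal sketch of one way a `conftest.py` could register it and skip marked tests when no provider key is present. The environment variable names and the `provider_key_available` helper are assumptions for illustration.

```python
import os

import pytest

# Assumption: illustrative variable names; use whichever keys your providers read.
PROVIDER_KEY_VARS = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY")


def provider_key_available(environ=os.environ) -> bool:
    """Return True when at least one provider key is set to a non-empty value."""
    return any(environ.get(name) for name in PROVIDER_KEY_VARS)


def pytest_configure(config):
    # Register the marker so `--strict-markers` does not reject it.
    config.addinivalue_line(
        "markers", "requires_llm: test calls a real LLM provider"
    )


def pytest_collection_modifyitems(config, items):
    # With a key present, marked tests run normally.
    if provider_key_available():
        return
    skip = pytest.mark.skip(reason="no provider API key set")
    for item in items:
        if "requires_llm" in item.keywords:
            item.add_marker(skip)
```

With hooks like these, a test decorated with `@pytest.mark.requires_llm` is reported as skipped on machines without a key, and `-m "not requires_llm"` deselects it entirely.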
Common Helpers¶
AgentTestCase¶
Use AgentTestCase when the test needs an app, a handler, and optional mock LLM or policy configuration.
mock_llm¶
Use mock_llm(...) for deterministic LLM behavior without provider SDKs.
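The signature of `mock_llm` is not reproduced in this guide, so the sketch below does not call it. It uses a hypothetical stand-in, `ScriptedLLM`, only to illustrate what deterministic LLM behavior buys a test: canned replies in a fixed order, plus a record of the prompts the code under test sent. The `summarize` function is likewise a toy example, not part of the framework.

```python
from collections import deque


class ScriptedLLM:
    """Hypothetical stand-in: returns canned replies in order, records prompts."""

    def __init__(self, replies):
        self._replies = deque(replies)
        self.prompts = []

    def complete(self, prompt: str) -> str:
        self.prompts.append(prompt)
        if not self._replies:
            raise AssertionError("ScriptedLLM ran out of scripted replies")
        return self._replies.popleft()


def summarize(llm, text: str) -> str:
    """Toy code under test: delegates to whatever LLM object it is given."""
    return llm.complete(f"Summarize: {text}").strip()


def test_summarize_is_deterministic():
    llm = ScriptedLLM(["  two orders shipped  "])
    assert summarize(llm, "order log") == "two orders shipped"
    # The recorded prompt lets the test assert on what was sent, not only on the reply.
    assert llm.prompts == ["Summarize: order log"]
```

Because the reply is fixed, the assertion never depends on network access, provider SDKs, or model drift.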
MockSandbox¶
Use MockSandbox when the test is about orchestration around sandbox execution rather than the real subprocess runtime.
E2E Guidance¶
tests/e2e/test_examples.py protects the public surface area of the framework. When a feature changes user-facing behavior:
- update the relevant example
- extend or adjust the E2E coverage
Current Hot Paths¶
Run focused tests when you change:
- src/agenticapi/app.py
- src/agenticapi/interface/intent.py
- src/agenticapi/interface/stream.py
- src/agenticapi/dependencies/*
- src/agenticapi/harness/*
- src/agenticapi/runtime/llm/*
- src/agenticapi/observability/*
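One way to make "run focused tests" mechanical is a small lookup from changed paths to pytest targets. The sketch below is illustrative: the pairing of source globs to test paths is an assumption based on the test paths named under Running Tests, and `focused_targets` is a hypothetical helper, not something shipped with the repository.

```python
from fnmatch import fnmatchcase

# Assumption: illustrative pairing of source globs to focused test targets.
FOCUSED_TESTS = [
    ("src/agenticapi/interface/stream.py", "tests/unit/test_streaming.py"),
    ("src/agenticapi/interface/intent.py", "tests/unit/test_typed_intents.py"),
    ("src/agenticapi/harness/*", "tests/unit/harness"),
    ("src/agenticapi/observability/*", "tests/unit/observability"),
]


def focused_targets(changed_paths):
    """Map changed source paths to pytest targets, in order and without duplicates."""
    targets = []
    for path in changed_paths:
        for pattern, target in FOCUSED_TESTS:
            # fnmatchcase's `*` also matches `/`, so nested files are covered.
            if fnmatchcase(path, pattern) and target not in targets:
                targets.append(target)
    # Fall back to the whole unit suite when nothing specific matches.
    return targets or ["tests/unit"]


changed = ["src/agenticapi/harness/policy/budget.py"]
print(" ".join(focused_targets(changed)))
```

The printed targets can be passed straight to `uv run pytest`. Paths without a mapping, such as `src/agenticapi/app.py` in this sketch, fall back to the full unit suite.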