Testing Guide
Current Suite Shape
The live tree currently contains:
- 65 test files under tests/
- 1,016 collected core tests
- 20 runnable example apps exercised by the E2E suite
- 6 extension test files and 38 extension tests for agenticapi-claude-agent-sdk
The test inventory changes quickly. Trust the live tree over hard-coded file counts in older docs, which may be stale.
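One way to read the file count from the live tree instead of from a document is a short script. This is a minimal sketch, not part of the repository: the function name count_test_files is hypothetical, and it assumes test files follow pytest's default test_*.py naming, as the modules named in this guide do.

```python
from pathlib import Path


def count_test_files(root: Path) -> int:
    """Count files named test_*.py anywhere under root (recursive)."""
    return sum(1 for path in root.rglob("test_*.py") if path.is_file())
```

Calling count_test_files(Path("tests")) from the repository root gives the current file count. For the number of collected tests rather than files, uv run pytest --collect-only -q prints pytest's own total.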
Directory Layout
tests/
  unit/          Core behavior, regressions, streaming, typed intents, observability
  integration/   Cross-module request and auth flows
  e2e/           Full example-app and request-cycle coverage
  benchmarks/    Performance regression checks
extensions/agenticapi-claude-agent-sdk/tests/
                 Offline extension test suite with stubbed SDK behavior
What The Core Suite Covers
High-signal coverage areas in the current tree:
- app lifecycle and HTTP behavior
- intent parsing and typed intents
- dependency injection and route-level dependencies
- harness policies, sandbox, approval workflow, and audit
- observability helpers and propagation
- file handling, HTMX, custom responses, and OpenAPI
- tool registry, @tool, and native tool-call data types
- streaming events, replay, resume, and autonomy escalation
- end-to-end validation of all example apps
Running Tests
# All core tests
uv run pytest
# Faster local loop
uv run pytest --ignore=tests/benchmarks
# With coverage
uv run pytest --cov=src/agenticapi --cov-report=term-missing --ignore=tests/benchmarks
# Focused directories
uv run pytest tests/unit -q
uv run pytest tests/integration -q
uv run pytest tests/e2e -v
# Specific modules
uv run pytest tests/unit/test_streaming.py -xvs
uv run pytest tests/unit/test_typed_intents.py -xvs
uv run pytest tests/unit/harness/policy/test_budget_policy.py -xvs
# Benchmarks
uv run pytest tests/benchmarks
# Skip tests requiring real provider keys
uv run pytest -m "not requires_llm"
# Extension suite
uv pip install -e extensions/agenticapi-claude-agent-sdk --no-deps
uv run pytest extensions/agenticapi-claude-agent-sdk/tests
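The -m "not requires_llm" filter above only works if tests that need real provider keys carry a requires_llm marker. How this project registers that marker is not shown here; it may live in pyproject.toml instead. As a hedged sketch, one standard pytest approach is a pytest_configure hook in a conftest.py file (the description string is an assumption):

```python
def pytest_configure(config):
    """Register the requires_llm marker so that marker filters select cleanly."""
    # addinivalue_line appends to the "markers" ini option at startup,
    # which is equivalent to listing the marker in the pytest configuration.
    config.addinivalue_line(
        "markers",
        "requires_llm: test needs a real LLM provider key",
    )
```

A test then opts in with the @pytest.mark.requires_llm decorator, and the filter deselects it.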
Common Test Helpers
AgentTestCase
Use AgentTestCase for endpoint-centric tests that need an app, mock LLM responses, or harness policies.
mock_llm
Use mock_llm(...) when you need deterministic LLM behavior without touching provider SDKs.
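The signature of mock_llm is not documented in this guide, so the sketch below does not use it. It is a hypothetical stand-in (the ScriptedLLM class and its complete method are invented for illustration) that shows the underlying idea: canned replies returned in order, with no provider SDK involved.

```python
from collections import deque


class ScriptedLLM:
    """Hypothetical stand-in, not the agenticapi API: replays canned replies."""

    def __init__(self, replies: list[str]) -> None:
        self._replies = deque(replies)
        self.prompts: list[str] = []  # recorded so tests can assert on them

    def complete(self, prompt: str) -> str:
        self.prompts.append(prompt)
        if not self._replies:
            # Fail loudly if the code under test calls more often than scripted.
            raise AssertionError("ScriptedLLM ran out of canned replies")
        return self._replies.popleft()
```

Because the replies are fixed up front, a test can assert on both the output and the exact prompts that were sent.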
MockSandbox
Use MockSandbox when the test should exercise sandbox orchestration without spawning a real subprocess.
Assertion helpers
Use assert_code_safe, assert_policy_enforced, and related helpers when the test is really about policy semantics rather than HTTP behavior.
E2E Expectations
tests/e2e/test_examples.py is important because it protects the public surface area of the framework:
- every example app still imports
- auto-registered routes still behave as documented
- framework features continue to compose in real apps, not just in isolated unit tests
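The first guarantee in the list above, that every example app still imports, can be sketched as an import smoke check. This is not the content of tests/e2e/test_examples.py: the function name import_failures is hypothetical, and how example apps are located on disk is left to the caller.

```python
import importlib.util
from pathlib import Path


def import_failures(app_files: list[Path]) -> dict[str, str]:
    """Import each file as a module; return {file name: error} for failures."""
    failures: dict[str, str] = {}
    for path in app_files:
        spec = importlib.util.spec_from_file_location(path.stem, path)
        if spec is None or spec.loader is None:
            failures[path.name] = "no import spec"
            continue
        module = importlib.util.module_from_spec(spec)
        try:
            spec.loader.exec_module(module)
        except Exception as exc:  # any import-time error counts as a failure
            failures[path.name] = f"{type(exc).__name__}: {exc}"
    return failures
```

A test can then assert that the returned dictionary is empty, which reports every broken example at once rather than stopping at the first.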
When adding a feature that changes documented behavior, update the relevant example and then extend the E2E suite.
Current Hot Spots
If you change any of the following modules, run their focused tests before the full suite:
- src/agenticapi/app.py
- src/agenticapi/interface/intent.py
- src/agenticapi/interface/stream.py
- src/agenticapi/dependencies/*
- src/agenticapi/harness/*
- src/agenticapi/runtime/llm/*
- src/agenticapi/observability/*
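A small check can flag when a change touches one of these hot spots. The sketch below is an assumption about how such a check might look, not a script shipped with the project; the function name touched_hot_spots is hypothetical, and the patterns are copied from the list above.

```python
from fnmatch import fnmatchcase

HOT_SPOTS = [
    "src/agenticapi/app.py",
    "src/agenticapi/interface/intent.py",
    "src/agenticapi/interface/stream.py",
    "src/agenticapi/dependencies/*",
    "src/agenticapi/harness/*",
    "src/agenticapi/runtime/llm/*",
    "src/agenticapi/observability/*",
]


def touched_hot_spots(changed_files: list[str]) -> list[str]:
    """Return the hot-spot patterns matched by at least one changed file."""
    # fnmatchcase treats "*" as matching any characters, including "/",
    # so files in nested subdirectories also match a directory pattern.
    return [
        pattern
        for pattern in HOT_SPOTS
        if any(fnmatchcase(path, pattern) for path in changed_files)
    ]
```

The input could come from git diff --name-only. A non-empty result is the cue to run the focused tests for those areas before the full suite.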
Practical Rule
Do not rely on a single unit test file to validate framework behavior. For user-facing features, keep the coverage stack layered:
- unit tests for the local mechanism
- integration tests for request-path behavior
- example or E2E coverage for public API reality