Testing Guide¶

Current Suite Shape¶

The live tree currently contains:

65 test files under tests/
1,016 collected core tests
20 runnable example apps exercised by the E2E suite
6 extension test files and 38 extension tests for agenticapi-claude-agent-sdk

The test inventory has moved quickly. Prefer the live tree over stale hard-coded file counts in older docs.

Directory Layout¶

tests/
    unit/           Core behavior, regressions, streaming, typed intents, observability
    integration/    Cross-module request and auth flows
    e2e/            Full example-app and request-cycle coverage
    benchmarks/     Performance regression checks

extensions/agenticapi-claude-agent-sdk/tests/
    Offline extension test suite with stubbed SDK behavior

What The Core Suite Covers¶

High-signal coverage areas in the current tree:

app lifecycle and HTTP behavior
intent parsing and typed intents
dependency injection and route-level dependencies
harness policies, sandbox, approval workflow, and audit
observability helpers and propagation
file handling, HTMX, custom responses, and OpenAPI
tool registry, @tool, and native tool-call data types
streaming events, replay, resume, and autonomy escalation
end-to-end validation of all example apps

Running Tests¶

# All core tests
uv run pytest

# Faster local loop
uv run pytest --ignore=tests/benchmarks

# With coverage
uv run pytest --cov=src/agenticapi --cov-report=term-missing --ignore=tests/benchmarks

# Focused directories
uv run pytest tests/unit -q
uv run pytest tests/integration -q
uv run pytest tests/e2e -v

# Specific modules
uv run pytest tests/unit/test_streaming.py -xvs
uv run pytest tests/unit/test_typed_intents.py -xvs
uv run pytest tests/unit/harness/policy/test_budget_policy.py -xvs

# Benchmarks
uv run pytest tests/benchmarks

# Skip tests requiring real provider keys
uv run pytest -m "not requires_llm"

# Extension suite
uv pip install -e extensions/agenticapi-claude-agent-sdk --no-deps
uv run pytest extensions/agenticapi-claude-agent-sdk/tests

Common Test Helpers¶

`AgentTestCase`¶

Use AgentTestCase for endpoint-centric tests that need an app, mock LLM responses, or harness policies.

`mock_llm`¶

Use mock_llm(...) when you need deterministic LLM behavior without touching provider SDKs.

`MockSandbox`¶

Use MockSandbox when the test should exercise sandbox orchestration without spawning a real subprocess.

Assertion helpers¶

Use assert_code_safe, assert_policy_enforced, and related helpers when the test is really about policy semantics rather than HTTP behavior.

E2E Expectations¶

tests/e2e/test_examples.py is important because it protects the public surface area of the framework:

every example app still imports
auto-registered routes still behave as documented
framework features continue to compose in real apps, not just in isolated unit tests

When adding a feature that changes documented behavior, update the relevant example and then extend the E2E suite.

Current Hot Spots¶

If you change any of the following modules, run their focused tests before the full suite:

src/agenticapi/app.py
src/agenticapi/interface/intent.py
src/agenticapi/interface/stream.py
src/agenticapi/dependencies/*
src/agenticapi/harness/*
src/agenticapi/runtime/llm/*
src/agenticapi/observability/*

Practical Rule¶

Do not rely on a single unit test file to validate framework behavior. For user-facing features, keep the coverage stack layered:

unit tests for the local mechanism
integration tests for request-path behavior
example or E2E coverage for public API reality