Testing Utilities¶
mock_llm¶
mock_llm ¶
Context manager that provides a MockBackend with predefined responses.
Yields a MockBackend configured with the given responses. Responses are consumed in FIFO order as generate() is called.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| responses | list[str] | List of response strings to return in order. | required |
Yields:
| Type | Description |
|---|---|
| Generator[MockBackend] | A configured MockBackend instance. |
Example
```python
with mock_llm(responses=["SELECT COUNT(*) FROM orders"]) as backend:
    response = await backend.generate(prompt)
    assert response.content == "SELECT COUNT(*) FROM orders"
```
Source code in src/agenticapi/testing/mocks.py
MockSandbox¶
MockSandbox ¶
Bases: SandboxRuntime
Mock sandbox for unit testing.
Returns predefined results based on pattern matching against the executed code. Raises SandboxViolation if the code contains any of the denied operations.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| allowed_results | dict[str, Any] \| None | Mapping of code substrings to return values. If a key is found in the code, its value is used as the sandbox output. | None |
| denied_operations | list[str] \| None | List of code substrings that trigger a SandboxViolation when found in the code. | None |
Example
```python
sandbox = MockSandbox(
    allowed_results={"SELECT COUNT(*)": [{"count": 42}]},
    denied_operations=["DROP TABLE"],
)
async with sandbox as sb:
    result = await sb.execute("SELECT COUNT(*) FROM orders")
    assert result.return_value == [{"count": 42}]
```
Source code in src/agenticapi/testing/mocks.py
__init__ ¶
```python
__init__(
    *,
    allowed_results: dict[str, Any] | None = None,
    denied_operations: list[str] | None = None,
) -> None
```
Initialize the mock sandbox.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| allowed_results | dict[str, Any] \| None | Mapping of code substrings to return values. | None |
| denied_operations | list[str] \| None | List of code substrings that trigger violations. | None |
Source code in src/agenticapi/testing/mocks.py
execute async ¶
```python
execute(
    code: str,
    tools: Any = None,
    resource_limits: ResourceLimits | None = None,
) -> SandboxResult
```
Execute code against mock rules.
Checks denied operations first, then matches allowed results. Returns a default SandboxResult if no match is found.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| code | str | The Python source code to "execute". | required |
| tools | Any | Ignored in the mock implementation. | None |
| resource_limits | ResourceLimits \| None | Ignored in the mock implementation. | None |
Returns:
| Type | Description |
|---|---|
| SandboxResult | SandboxResult with matched or default output. |
Raises:
| Type | Description |
|---|---|
| SandboxViolation | If the code contains a denied operation. |
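Because denied operations are checked before allowed results, a denied substring wins even if it would also match an allowed entry. A minimal sketch of that behaviour, assuming SandboxViolation is importable from the package and pytest(-asyncio style) tests are used:

```python
import pytest

sandbox = MockSandbox(
    allowed_results={"DROP TABLE": "never returned"},
    denied_operations=["DROP TABLE"],
)

async def test_denied_operation_wins() -> None:
    async with sandbox as sb:
        # The denied-operations check runs first, so this raises even though
        # the same substring also appears in allowed_results.
        with pytest.raises(SandboxViolation):
            await sb.execute("DROP TABLE orders")
```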
Source code in src/agenticapi/testing/mocks.py
__aenter__ async ¶
Assertions¶
assert_code_safe ¶
Assert that code passes static safety analysis.
Runs AST-based static analysis on the provided code and raises AssertionError if any safety violations with severity "error" are found.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| code | str | Python source code to check. | required |
| denied_modules | list[str] \| None | Optional list of denied module names. | None |
Raises:
| Type | Description |
|---|---|
| AssertionError | If the code has safety violations. |
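A short sketch of how this assertion might appear in a test; the import path and the denied "os" module are illustrative assumptions, and pytest is used for the failing case:

```python
import pytest

from agenticapi.testing import assert_code_safe  # assumed import path

def test_safe_code_passes() -> None:
    # Plain computation produces no error-severity violations.
    assert_code_safe("total = sum(range(10))")

def test_denied_module_fails() -> None:
    # Importing an explicitly denied module should raise AssertionError.
    with pytest.raises(AssertionError):
        assert_code_safe("import os\nos.remove('data.db')", denied_modules=["os"])
```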
Source code in src/agenticapi/testing/assertions.py
assert_policy_enforced ¶
Assert that all policies allow the code.
Evaluates the code against each policy. If any policy denies the code, raises AssertionError with violation details.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| code | str | Python source code to evaluate. | required |
| policies | list[Policy] | List of policies to check against. | required |
Raises:
| Type | Description |
|---|---|
| AssertionError | If any policy denies the code. |
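A minimal sketch, assuming the same CodePolicy shown in the create_test_app example below and that both arguments can be passed positionally:

```python
from agenticapi.testing import assert_policy_enforced  # assumed import path

policies = [CodePolicy(denied_modules=["os", "subprocess"])]

# Passes: the generated code touches none of the denied modules.
assert_policy_enforced("rows = run_query('SELECT 1')", policies)
```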
Source code in src/agenticapi/testing/assertions.py
assert_intent_parsed ¶
Assert that a raw intent string parses to the expected action.
Uses the keyword-based parser (no LLM) for deterministic testing.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| raw | str | The raw natural language intent string. | required |
| expected_action | IntentAction | The expected IntentAction after parsing. | required |
Raises:
| Type | Description |
|---|---|
| AssertionError | If the parsed action does not match expected. |
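A sketch of a deterministic parser test; the IntentAction member name (QUERY) is an assumption, not taken from the source:

```python
from agenticapi.testing import assert_intent_parsed  # assumed import path

def test_read_only_intent_parses_as_query() -> None:
    # Keyword-based parsing is used, so no LLM backend is needed and the result is stable.
    assert_intent_parsed("show me last week's orders", IntentAction.QUERY)
```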
Source code in src/agenticapi/testing/assertions.py
Fixtures¶
create_test_app ¶
```python
create_test_app(
    *,
    policies: list[Policy] | None = None,
    llm_responses: list[str] | None = None,
    title: str = "TestApp",
) -> AgenticApp
```
Create an AgenticApp configured for testing.
Builds an app with optional mock LLM backend and harness engine. Useful for integration tests that need a fully wired application.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| policies | list[Policy] \| None | Optional list of policies for the harness engine. | None |
| llm_responses | list[str] \| None | Optional list of LLM response strings. If provided, a MockBackend is created and a HarnessEngine is configured. | None |
| title | str | Application title. | 'TestApp' |
Returns:
| Type | Description |
|---|---|
| AgenticApp | A configured AgenticApp ready for testing. |
Example
```python
app = create_test_app(
    policies=[CodePolicy(denied_modules=["os"])],
    llm_responses=["SELECT COUNT(*) FROM orders"],
)

@app.agent_endpoint(name="test")
async def test_agent(intent, context):
    return {"result": "ok"}

response = await app.process_intent("show orders")
```
Source code in src/agenticapi/testing/fixtures.py
Benchmarks¶
BenchmarkRunner ¶
Lightweight benchmark runner for performance measurement.
Measures execution time of synchronous functions and stores results for subsequent assertion against targets.
Example
```python
runner = BenchmarkRunner()
result = runner.run("intent_parse", fn=parser.parse, iterations=100)
runner.assert_within_target("intent_parse", target_ms=50.0)
```
Source code in src/agenticapi/testing/benchmark.py
__init__ ¶
run ¶
```python
run(
    name: str,
    fn: Callable[..., Any],
    *,
    args: tuple[Any, ...] = (),
    kwargs: dict[str, Any] | None = None,
    iterations: int = 100,
    warmup: int = 5,
) -> BenchmarkResult
```
Run a synchronous benchmark.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Name for this benchmark. | required |
| fn | Callable[..., Any] | The function to benchmark. | required |
| args | tuple[Any, ...] | Positional arguments for fn. | () |
| kwargs | dict[str, Any] \| None | Keyword arguments for fn. | None |
| iterations | int | Number of iterations to measure. | 100 |
| warmup | int | Number of warmup iterations (not measured). | 5 |
Returns:
| Type | Description |
|---|---|
| BenchmarkResult | BenchmarkResult with timing statistics. |
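A short sketch covering args, kwargs, and warmup, which the class-level example above does not exercise; the json.dumps target is only an illustration:

```python
import json

runner = BenchmarkRunner()
result = runner.run(
    "json_dumps",
    fn=json.dumps,
    args=({"orders": list(range(100))},),
    kwargs={"sort_keys": True},
    iterations=500,
    warmup=10,
)
print(f"{result.name}: mean {result.mean_ms:.3f} ms over {result.iterations} iterations")
```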
Source code in src/agenticapi/testing/benchmark.py
assert_within_target ¶
Assert that a benchmark's mean time is within target.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | The benchmark name to check. | required |
| target_ms | float | Maximum allowed mean time in milliseconds. | required |
Raises:
| Type | Description |
|---|---|
| AssertionError | If the mean time exceeds the target. |
| KeyError | If no result exists for the given name. |
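A sketch of guarding against a misspelled benchmark name, assuming pytest is used for the expectation:

```python
import pytest

runner = BenchmarkRunner()
# No benchmark named "never_ran" was recorded, so the lookup fails.
with pytest.raises(KeyError):
    runner.assert_within_target("never_ran", target_ms=10.0)
```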
Source code in src/agenticapi/testing/benchmark.py
BenchmarkResult dataclass ¶
Result of a benchmark run.
Attributes:
| Name | Type | Description |
|---|---|---|
| name | str | Name of the benchmark. |
| iterations | int | Number of iterations run. |
| total_ms | float | Total time in milliseconds. |
| mean_ms | float | Mean time per iteration in milliseconds. |
| min_ms | float | Minimum time in milliseconds. |
| max_ms | float | Maximum time in milliseconds. |
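Because the fields are plain numbers, derived metrics are straightforward to compute; for instance, a rough throughput figure from the documented attributes:

```python
def throughput_ops_per_sec(result: BenchmarkResult) -> float:
    # total_ms covers all measured iterations, so
    # iterations / (total_ms / 1000) gives operations per second.
    return result.iterations / (result.total_ms / 1000.0)
```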