Sandbox & Analysis
SandboxRuntime (Base)
SandboxRuntime
Bases: ABC
Abstract base class for sandbox execution environments.
Provides isolated code execution with resource limits and metrics collection. Implementations must support the async context manager protocol for resource cleanup.
Phase 1: ProcessSandbox (subprocess-based isolation)
Phase 2: ContainerSandbox (container-based isolation)
Source code in src/agenticapi/harness/sandbox/base.py
execute (abstractmethod, async)
Execute code in the sandbox.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `code` | `str` | Python source code to execute. | required |
| `tools` | `Any` | ToolRegistry or similar providing available tools. | required |
| `resource_limits` | `ResourceLimits` | Resource constraints for execution. | required |
Returns:
| Type | Description |
|---|---|
| `SandboxResult` | SandboxResult with output, return value, and metrics. |
Raises:
| Type | Description |
|---|---|
| `SandboxViolation` | If a security violation is detected. |
| `CodeExecutionError` | If the code fails to execute. |
Source code in src/agenticapi/harness/sandbox/base.py
__aenter__ (abstractmethod, async)
__aexit__ (abstractmethod, async)
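As a rough illustration of the contract described above, the sketch below defines stand-in versions of the documented types and a hypothetical `EchoSandbox` that implements `execute`, `__aenter__`, and `__aexit__`. Nothing here is imported from agenticapi: the stand-in classes only mirror the attribute names listed on this page, and their default values and the `EchoSandbox` behaviour are assumptions.

```python
import asyncio
import time
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any


# Stand-ins mirroring the documented attributes; the defaults are assumptions.
@dataclass
class ResourceLimits:
    max_cpu_seconds: float = 30.0
    max_memory_mb: int = 512
    max_execution_time_seconds: float = 60.0


@dataclass
class ResourceMetrics:
    cpu_time_ms: float = 0.0
    memory_peak_mb: float = 0.0
    wall_time_ms: float = 0.0


@dataclass
class SandboxResult:
    output: Any = None
    return_value: Any = None
    metrics: ResourceMetrics = field(default_factory=ResourceMetrics)
    stdout: str = ""
    stderr: str = ""


class SandboxRuntime(ABC):
    """Stand-in for the abstract base: execute plus async context manager."""

    @abstractmethod
    async def execute(self, code: str, tools: Any, resource_limits: ResourceLimits) -> SandboxResult: ...

    @abstractmethod
    async def __aenter__(self) -> "SandboxRuntime": ...

    @abstractmethod
    async def __aexit__(self, exc_type, exc, tb) -> None: ...


class EchoSandbox(SandboxRuntime):
    """Hypothetical runtime: it never runs the code, it only reports its length."""

    def __init__(self) -> None:
        self.open = False

    async def __aenter__(self) -> "EchoSandbox":
        self.open = True  # a real runtime would acquire resources here
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        self.open = False  # cleanup runs even if the body raised

    async def execute(self, code: str, tools: Any, resource_limits: ResourceLimits) -> SandboxResult:
        start = time.perf_counter()
        value = len(code)
        wall_ms = (time.perf_counter() - start) * 1000
        return SandboxResult(
            output=value,
            return_value=value,
            metrics=ResourceMetrics(wall_time_ms=wall_ms),
        )


async def demo() -> tuple[SandboxResult, bool]:
    sandbox = EchoSandbox()
    async with sandbox as sb:
        result = await sb.execute("x = 1", tools=None, resource_limits=ResourceLimits())
    return result, sandbox.open
```

Calling `asyncio.run(demo())` returns the result together with the `open` flag, which is `False` once the `async with` block has exited.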
ProcessSandbox
ProcessSandbox
Bases: SandboxRuntime
Subprocess-based sandbox for executing generated code.
Runs code in a separate Python subprocess with timeout enforcement. Captures stdout/stderr and measures wall-clock execution time.
This is the Phase 1 implementation. It provides process-level isolation but not kernel-level sandboxing. For production multi-tenant use, upgrade to ContainerSandbox (Phase 2).
Example
    async with ProcessSandbox() as sandbox:
        result = await sandbox.execute(
            code="result = 2 + 2",
            tools=None,
            resource_limits=ResourceLimits(max_execution_time_seconds=10),
        )
        assert result.return_value == 4
Source code in src/agenticapi/harness/sandbox/process.py
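To make the description above more concrete, here is a minimal sketch of subprocess execution with a wall-clock timeout, built only on the standard library. It is not the ProcessSandbox implementation: `run_in_subprocess` and the shape of its return value are hypothetical, and it does not capture a return value or enforce CPU or memory limits.

```python
import asyncio
import sys
import time


async def run_in_subprocess(code: str, timeout_seconds: float) -> dict:
    """Hypothetical helper: run `code` in a fresh interpreter with a timeout."""
    start = time.perf_counter()
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", code,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        stdout, stderr = await asyncio.wait_for(
            proc.communicate(), timeout=timeout_seconds
        )
        timed_out = False
    except asyncio.TimeoutError:
        # Kill the child, then drain the pipes so the process is fully reaped.
        proc.kill()
        stdout, stderr = await proc.communicate()
        timed_out = True
    wall_time_ms = (time.perf_counter() - start) * 1000
    return {
        "stdout": stdout.decode(),
        "stderr": stderr.decode(),
        "returncode": proc.returncode,
        "timed_out": timed_out,
        "wall_time_ms": wall_time_ms,
    }
```

In this sketch a timeout is reported through the `timed_out` flag; the documented `execute()` raises `SandboxViolation` in that case instead.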
__init__
Initialize the process sandbox.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `resource_limits` | `ResourceLimits \| None` | Default resource limits. Can be overridden per-execution in the execute() call. | `None` |
Source code in src/agenticapi/harness/sandbox/process.py
execute (async)
execute(
code: str,
tools: Any = None,
resource_limits: ResourceLimits | None = None,
sandbox_data: dict[str, Any] | None = None,
) -> SandboxResult
Execute code in an isolated subprocess.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `code` | `str` | Python source code to execute. | required |
| `tools` | `Any` | ToolRegistry (currently unused in Phase 1). | `None` |
| `resource_limits` | `ResourceLimits \| None` | Resource limits to enforce. Falls back to the default limits if not provided. | `None` |
| `sandbox_data` | `dict[str, Any] \| None` | Optional dict of pre-fetched data to inject into the execution namespace. | `None` |
Returns:
| Type | Description |
|---|---|
| `SandboxResult` | SandboxResult with captured output and metrics. |
Raises:
| Type | Description |
|---|---|
| `CodeExecutionError` | If the code fails to execute. |
| `SandboxViolation` | If execution times out. |
Source code in src/agenticapi/harness/sandbox/process.py
__aenter__ (async)
ResourceLimits
ResourceLimits (dataclass)
Resource limits for sandbox execution.
Attributes:
| Name | Type | Description |
|---|---|---|
| `max_cpu_seconds` | `float` | Maximum CPU time allowed in seconds. |
| `max_memory_mb` | `int` | Maximum memory usage in megabytes. |
| `max_execution_time_seconds` | `float` | Maximum wall-clock time in seconds. |
Source code in src/agenticapi/harness/sandbox/base.py
SandboxResult
SandboxResult (dataclass)
Result of sandbox code execution.
Attributes:
| Name | Type | Description |
|---|---|---|
| `output` | `Any` | The primary output of the executed code. |
| `return_value` | `Any` | The return value of the executed code. |
| `metrics` | `ResourceMetrics` | Resource usage metrics from execution. |
| `stdout` | `str` | Captured standard output. |
| `stderr` | `str` | Captured standard error. |
Source code in src/agenticapi/harness/sandbox/base.py
ResourceMetrics
ResourceMetrics (dataclass)
Metrics collected during sandbox execution.
Attributes:
| Name | Type | Description |
|---|---|---|
| `cpu_time_ms` | `float` | CPU time consumed in milliseconds. |
| `memory_peak_mb` | `float` | Peak memory usage in megabytes. |
| `wall_time_ms` | `float` | Wall-clock time in milliseconds. |
Source code in src/agenticapi/harness/sandbox/base.py
Static Analysis
check_code_safety
check_code_safety(
code: str,
*,
allowed_modules: list[str] | None = None,
denied_modules: list[str] | None = None,
deny_eval_exec: bool = True,
deny_dynamic_import: bool = True,
) -> SafetyResult
Check generated code safety using AST analysis.
Parses the code into an AST and walks all nodes to detect dangerous patterns. Returns a SafetyResult indicating whether the code is safe to execute.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `code` | `str` | Python source code to analyze. | required |
| `allowed_modules` | `list[str] \| None` | Whitelist of allowed modules (if provided, only these modules may be imported). | `None` |
| `denied_modules` | `list[str] \| None` | Blacklist of denied modules. | `None` |
| `deny_eval_exec` | `bool` | Whether to flag eval()/exec() calls as violations. | `True` |
| `deny_dynamic_import` | `bool` | Whether to flag `__import__()` calls as violations. | `True` |
Returns:
| Type | Description |
|---|---|
| `SafetyResult` | SafetyResult with safe=True if no violations, or safe=False with a list of SafetyViolation objects. |
Source code in src/agenticapi/harness/sandbox/static_analysis.py
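The AST walk described above can be sketched with the standard `ast` module. This is an illustration of the technique, not the code behind `check_code_safety`: the `Finding` class, the `scan` function, and the rule identifiers are hypothetical, and the sketch leaves out the `allowed_modules` whitelist.

```python
import ast
from dataclasses import dataclass


@dataclass
class Finding:
    """Hypothetical record, loosely modelled on the SafetyViolation fields."""
    rule: str
    description: str
    line: int
    col: int


def scan(code, denied_modules=(), deny_eval_exec=True, deny_dynamic_import=True):
    """Walk every AST node and collect findings; raises SyntaxError on bad code."""
    findings = []
    for node in ast.walk(ast.parse(code)):
        # Collect module names from both `import x` and `from x import y`.
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            names = []
        for name in names:
            if name.split(".")[0] in denied_modules:
                findings.append(
                    Finding("denied_import", f"import of {name}", node.lineno, node.col_offset)
                )
        # Direct calls by bare name: eval(...), exec(...), __import__(...).
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            called = node.func.id
            if deny_eval_exec and called in ("eval", "exec"):
                findings.append(
                    Finding("eval_exec", f"call to {called}()", node.lineno, node.col_offset)
                )
            if deny_dynamic_import and called == "__import__":
                findings.append(
                    Finding("dynamic_import", "call to __import__()", node.lineno, node.col_offset)
                )
    return findings
```

A check like this only sees names as written in the source, so aliased or indirectly obtained callables (for example `e = eval; e("...")`) pass this sketch unnoticed.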
SafetyResult
SafetyResult (dataclass)
Result of static safety analysis.
Attributes:
| Name | Type | Description |
|---|---|---|
| `safe` | `bool` | Whether the code passed all safety checks. |
| `violations` | `list[SafetyViolation]` | List of violations found (empty if safe). |
Source code in src/agenticapi/harness/sandbox/static_analysis.py
SafetyViolation
SafetyViolation (dataclass)
A single safety violation detected by static analysis.
Attributes:
| Name | Type | Description |
|---|---|---|
| `rule` | `str` | Identifier for the violated rule. |
| `description` | `str` | Human-readable description of the violation. |
| `line` | `int` | Line number where the violation was found. |
| `col` | `int` | Column offset where the violation was found. |
| `severity` | `str` | Severity level ("error" or "warning"). |
Source code in src/agenticapi/harness/sandbox/static_analysis.py
Monitors
ResourceMonitor
Monitors resource usage against configured limits.
Checks that CPU time, memory, and wall time stayed within the configured resource limits.
Example
    monitor = ResourceMonitor(limits=ResourceLimits(max_cpu_seconds=10))
    result = await monitor.on_execution_complete(sandbox_result, code="...")
Source code in src/agenticapi/harness/sandbox/monitors.py
__init__
Initialize with resource limits to check against.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `limits` | `ResourceLimits` | The resource limits to enforce. | required |
on_execution_complete (async)
Check resource usage against limits.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `result` | `SandboxResult` | The sandbox execution result. | required |
| `code` | `str` | The code that was executed. | required |
Returns:
| Type | Description |
|---|---|
| `MonitorResult` | MonitorResult with violations if limits exceeded. |
Source code in src/agenticapi/harness/sandbox/monitors.py
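One detail worth spelling out is the unit mismatch between the documented types: ResourceLimits expresses CPU and wall time in seconds, while ResourceMetrics reports them in milliseconds. The sketch below shows one way such a comparison could be written. `find_limit_violations` is a hypothetical helper that takes plain numbers named after the documented attributes; it is not how ResourceMonitor is known to work internally.

```python
def find_limit_violations(
    *,
    cpu_time_ms: float,
    memory_peak_mb: float,
    wall_time_ms: float,
    max_cpu_seconds: float,
    max_memory_mb: int,
    max_execution_time_seconds: float,
) -> list[str]:
    """Compare metrics with limits, converting second-based limits to ms."""
    violations = []
    if cpu_time_ms > max_cpu_seconds * 1000:
        violations.append(
            f"CPU time {cpu_time_ms:.0f} ms exceeds {max_cpu_seconds} s"
        )
    if memory_peak_mb > max_memory_mb:
        violations.append(
            f"Peak memory {memory_peak_mb:.1f} MB exceeds {max_memory_mb} MB"
        )
    if wall_time_ms > max_execution_time_seconds * 1000:
        violations.append(
            f"Wall time {wall_time_ms:.0f} ms exceeds {max_execution_time_seconds} s"
        )
    return violations
```

For example, 12,000 ms of CPU time against `max_cpu_seconds=10` yields a single CPU violation, while metrics that sit within all three limits yield an empty list.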
OutputSizeMonitor
Monitors output size to prevent memory issues.
Checks that the combined size of stdout, stderr, and return value does not exceed a configurable limit.
Example
    monitor = OutputSizeMonitor(max_output_bytes=1_000_000)
    result = await monitor.on_execution_complete(sandbox_result, code="...")
Source code in src/agenticapi/harness/sandbox/monitors.py
__init__
Initialize with maximum output size.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `max_output_bytes` | `int` | Maximum allowed output size in bytes. | `1000000` |
on_execution_complete (async)
Check output size against limit.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `result` | `SandboxResult` | The sandbox execution result. | required |
| `code` | `str` | The code that was executed. | required |
Returns:
| Type | Description |
|---|---|
| `MonitorResult` | MonitorResult with violations if output too large. |
Source code in src/agenticapi/harness/sandbox/monitors.py
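The page does not say how the size of the return value is measured, so the following is only a sketch under an assumption: the return value is counted by the length of its JSON encoding, falling back to `repr()` for values that cannot be serialized. `output_size_bytes` is a hypothetical helper name.

```python
import json
from typing import Any


def output_size_bytes(stdout: str, stderr: str, return_value: Any) -> int:
    """Combined UTF-8 size of stdout, stderr, and an encoded return value."""
    try:
        # Assumption: measure the return value by its JSON encoding.
        encoded = json.dumps(return_value)
    except (TypeError, ValueError):
        # Fallback for values json cannot handle (sets, custom objects, ...).
        encoded = repr(return_value)
    return sum(len(part.encode("utf-8")) for part in (stdout, stderr, encoded))
```

Counting encoded bytes instead of characters matters for non-ASCII output, where one character can occupy several bytes.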
Validators
OutputTypeValidator
Validates that execution output is JSON-serializable.
Ensures the return value can be safely serialized for API responses.
Example
    validator = OutputTypeValidator()
    result = await validator.validate(sandbox_result, code="...", intent_action="read")
Source code in src/agenticapi/harness/sandbox/validators.py
validate (async)
Check that the return value is JSON-serializable.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `result` | `SandboxResult` | The sandbox execution result. | required |
| `code` | `str` | The code that was executed. | required |
| `intent_action` | `str` | The intent action type. | required |
Returns:
| Type | Description |
|---|---|
| `ValidationResult` | ValidationResult with errors if output cannot be serialized. |
Source code in src/agenticapi/harness/sandbox/validators.py
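A common way to test JSON-serializability is simply to attempt the encoding and catch the failure. The sketch below does that with the standard `json` module. `json_errors` is a hypothetical helper, and passing `allow_nan=False` (which rejects NaN and infinity, since they are not valid JSON) is a choice made for this sketch that the validator may not share.

```python
import json
from typing import Any


def json_errors(value: Any) -> list[str]:
    """Return a list with one error message if `value` cannot be JSON-encoded."""
    try:
        # allow_nan=False makes NaN/Infinity raise ValueError instead of
        # emitting tokens that strict JSON parsers reject.
        json.dumps(value, allow_nan=False)
    except (TypeError, ValueError) as exc:
        return [f"return value is not JSON-serializable: {exc}"]
    return []
```

Here a nested dict of lists and numbers produces no errors, whereas a `set` or `float("nan")` produces one.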
ReadOnlyValidator
Validates that read intents did not produce write-like output.
Checks stderr and stdout for patterns that suggest write operations occurred during what should have been a read-only operation.
Example
    validator = ReadOnlyValidator()
    result = await validator.validate(sandbox_result, code="...", intent_action="read")
Source code in src/agenticapi/harness/sandbox/validators.py
validate (async)
Check for write patterns in read-only operations.
Only validates when intent_action is "read". Other actions are allowed to have write-like output.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `result` | `SandboxResult` | The sandbox execution result. | required |
| `code` | `str` | The code that was executed. | required |
| `intent_action` | `str` | The intent action type. | required |
Returns:
| Type | Description |
|---|---|
| `ValidationResult` | ValidationResult with warnings if write patterns detected. |
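The page does not list which patterns count as write-like, so the sketch below uses a small set of SQL-style patterns purely as an assumption. `WRITE_PATTERNS` and `read_only_warnings` are hypothetical names; only the rule that non-"read" actions are skipped comes from the description above.

```python
import re

# Assumed patterns for illustration; the real validator's list is not documented here.
WRITE_PATTERNS = re.compile(
    r"\b(INSERT\s+INTO|UPDATE\s+\w+\s+SET|DELETE\s+FROM|DROP\s+TABLE)\b",
    re.IGNORECASE,
)


def read_only_warnings(stdout: str, stderr: str, intent_action: str) -> list[str]:
    """Scan captured streams for write-like patterns when the intent is 'read'."""
    if intent_action != "read":
        return []  # other actions are allowed to produce write-like output
    warnings = []
    for stream_name, text in (("stdout", stdout), ("stderr", stderr)):
        for match in WRITE_PATTERNS.finditer(text):
            warnings.append(
                f"{stream_name}: write-like pattern {match.group(0)!r}"
            )
    return warnings
```

Because this is a text heuristic over captured output, a match is reported as a warning and not as proof that a write took place.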