
Sandbox & Analysis

SandboxRuntime (Base)

SandboxRuntime

Bases: ABC

Abstract base class for sandbox execution environments.

Provides isolated code execution with resource limits and metrics collection. Implementations must support async context manager protocol for resource cleanup.

Phase 1: ProcessSandbox (subprocess-based isolation)
Phase 2: ContainerSandbox (container-based isolation)

Source code in src/agenticapi/harness/sandbox/base.py
class SandboxRuntime(ABC):
    """Abstract base class for sandbox execution environments.

    Provides isolated code execution with resource limits and metrics
    collection. Implementations must support async context manager
    protocol for resource cleanup.

    Phase 1: ProcessSandbox (subprocess-based isolation)
    Phase 2: ContainerSandbox (container-based isolation)
    """

    @abstractmethod
    async def execute(
        self,
        code: str,
        tools: Any,
        resource_limits: ResourceLimits,
    ) -> SandboxResult:
        """Execute code in the sandbox.

        Args:
            code: Python source code to execute.
            tools: ToolRegistry or similar providing available tools.
            resource_limits: Resource constraints for execution.

        Returns:
            SandboxResult with output, return value, and metrics.

        Raises:
            SandboxViolation: If a security violation is detected.
            CodeExecutionError: If the code fails to execute.
        """
        ...

    @abstractmethod
    async def __aenter__(self) -> SandboxRuntime:
        """Enter the sandbox context."""
        ...

    @abstractmethod
    async def __aexit__(self, exc_type: Any, exc_val: Any, exc_tb: Any) -> None:
        """Exit the sandbox context and clean up resources."""
        ...

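The contract above can be sketched with a minimal, self-contained stand-in: an ABC with an abstract async `execute` plus async context-manager hooks. `Runtime` and `EchoSandbox` below are illustrative names for this sketch, not part of the library.

```python
import asyncio
from abc import ABC, abstractmethod

# Minimal mirror of the SandboxRuntime shape: abstract async execute(),
# plus __aenter__/__aexit__ so implementations work with "async with".
class Runtime(ABC):
    @abstractmethod
    async def execute(self, code: str) -> str: ...

    async def __aenter__(self) -> "Runtime":
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb) -> None:
        pass  # resource cleanup would go here

# Trivial implementation: reports what it would run instead of running it.
class EchoSandbox(Runtime):
    async def execute(self, code: str) -> str:
        return f"would run {len(code)} bytes"

async def main() -> str:
    async with EchoSandbox() as sandbox:
        return await sandbox.execute("result = 2 + 2")

print(asyncio.run(main()))  # would run 14 bytes
```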
execute abstractmethod async

execute(
    code: str, tools: Any, resource_limits: ResourceLimits
) -> SandboxResult

Execute code in the sandbox.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| code | str | Python source code to execute. | required |
| tools | Any | ToolRegistry or similar providing available tools. | required |
| resource_limits | ResourceLimits | Resource constraints for execution. | required |

Returns:

| Type | Description |
| --- | --- |
| SandboxResult | SandboxResult with output, return value, and metrics. |

Raises:

| Type | Description |
| --- | --- |
| SandboxViolation | If a security violation is detected. |
| CodeExecutionError | If the code fails to execute. |

Source code in src/agenticapi/harness/sandbox/base.py
@abstractmethod
async def execute(
    self,
    code: str,
    tools: Any,
    resource_limits: ResourceLimits,
) -> SandboxResult:
    """Execute code in the sandbox.

    Args:
        code: Python source code to execute.
        tools: ToolRegistry or similar providing available tools.
        resource_limits: Resource constraints for execution.

    Returns:
        SandboxResult with output, return value, and metrics.

    Raises:
        SandboxViolation: If a security violation is detected.
        CodeExecutionError: If the code fails to execute.
    """
    ...

__aenter__ abstractmethod async

__aenter__() -> SandboxRuntime

Enter the sandbox context.

Source code in src/agenticapi/harness/sandbox/base.py
@abstractmethod
async def __aenter__(self) -> SandboxRuntime:
    """Enter the sandbox context."""
    ...

__aexit__ abstractmethod async

__aexit__(exc_type: Any, exc_val: Any, exc_tb: Any) -> None

Exit the sandbox context and clean up resources.

Source code in src/agenticapi/harness/sandbox/base.py
@abstractmethod
async def __aexit__(self, exc_type: Any, exc_val: Any, exc_tb: Any) -> None:
    """Exit the sandbox context and clean up resources."""
    ...

ProcessSandbox

ProcessSandbox

Bases: SandboxRuntime

Subprocess-based sandbox for executing generated code.

Runs code in a separate Python subprocess with timeout enforcement. Captures stdout/stderr and measures wall-clock execution time.

This is the Phase 1 implementation. It provides process-level isolation but not kernel-level sandboxing. For production multi-tenant use, upgrade to ContainerSandbox (Phase 2).

Example

async with ProcessSandbox() as sandbox:
    result = await sandbox.execute(
        code="result = 2 + 2",
        tools=None,
        resource_limits=ResourceLimits(max_execution_time_seconds=10),
    )
    assert result.return_value == 4

Source code in src/agenticapi/harness/sandbox/process.py
class ProcessSandbox(SandboxRuntime):
    """Subprocess-based sandbox for executing generated code.

    Runs code in a separate Python subprocess with timeout enforcement.
    Captures stdout/stderr and measures wall-clock execution time.

    This is the Phase 1 implementation. It provides process-level
    isolation but not kernel-level sandboxing. For production
    multi-tenant use, upgrade to ContainerSandbox (Phase 2).

    Example:
        async with ProcessSandbox() as sandbox:
            result = await sandbox.execute(
                code="result = 2 + 2",
                tools=None,
                resource_limits=ResourceLimits(max_execution_time_seconds=10),
            )
            assert result.return_value == 4
    """

    def __init__(self, *, resource_limits: ResourceLimits | None = None) -> None:
        """Initialize the process sandbox.

        Args:
            resource_limits: Default resource limits. Can be overridden
                per-execution in the execute() call.
        """
        self._default_limits = resource_limits or ResourceLimits()

    async def execute(
        self,
        code: str,
        tools: Any = None,
        resource_limits: ResourceLimits | None = None,
        sandbox_data: dict[str, Any] | None = None,
    ) -> SandboxResult:
        """Execute code in an isolated subprocess.

        Args:
            code: Python source code to execute.
            tools: ToolRegistry (currently unused in Phase 1).
            resource_limits: Resource limits to enforce. Falls back to
                default limits if not provided.
            sandbox_data: Optional dict of pre-fetched data to inject into
                the execution namespace as the ``data`` variable.

        Returns:
            SandboxResult with captured output and metrics.

        Raises:
            CodeExecutionError: If the code fails to execute.
            SandboxViolation: If execution times out.
        """
        limits = resource_limits or self._default_limits
        timeout = limits.max_execution_time_seconds

        # Build the wrapper script (base64-encode user code and data for safe transport)
        code_b64 = base64.b64encode(code.encode("utf-8")).decode("ascii")
        data_json = json.dumps(sandbox_data or {}, default=str)
        data_b64 = base64.b64encode(data_json.encode("utf-8")).decode("ascii")
        wrapper_code = _WRAPPER_TEMPLATE.format(code_b64=code_b64, data_b64=data_b64)

        # Write to a temporary file and execute
        start_time = time.monotonic()
        tmp_path: str | None = None

        try:
            with tempfile.NamedTemporaryFile(mode="w", suffix=".py", delete=False) as tmp:
                tmp_path = tmp.name
                tmp.write(wrapper_code)

            process = await asyncio.create_subprocess_exec(
                "python",
                tmp_path,
                stdout=asyncio.subprocess.PIPE,
                stderr=asyncio.subprocess.PIPE,
            )

            try:
                stdout_bytes, stderr_bytes = await asyncio.wait_for(
                    process.communicate(),
                    timeout=timeout,
                )
            except TimeoutError as exc:
                process.kill()
                await process.wait()
                wall_time = (time.monotonic() - start_time) * 1000
                logger.error("sandbox_timeout", timeout=timeout, wall_time_ms=wall_time)
                raise SandboxViolation(f"Code execution timed out after {timeout} seconds") from exc

        except SandboxViolation:
            raise
        except Exception as exc:
            wall_time = (time.monotonic() - start_time) * 1000
            logger.error("sandbox_execution_failed", error=str(exc), wall_time_ms=wall_time)
            raise CodeExecutionError(f"Sandbox execution failed: {exc}") from exc
        finally:
            # Clean up temp file
            if tmp_path is not None:
                import os

                with contextlib.suppress(OSError):
                    os.unlink(tmp_path)

        wall_time = (time.monotonic() - start_time) * 1000
        stdout_str = stdout_bytes.decode("utf-8", errors="replace")
        stderr_str = stderr_bytes.decode("utf-8", errors="replace")

        # Parse the result from stdout
        output: Any = None
        return_value: Any = None

        if "__SANDBOX_RESULT__" in stdout_str:
            parts = stdout_str.split("__SANDBOX_RESULT__", 1)
            user_stdout = parts[0].rstrip("\n")
            result_json = parts[1].strip()

            try:
                parsed = json.loads(result_json)
                output = parsed.get("output")
                return_value = parsed.get("return_value")
                error = parsed.get("error")

                if error:
                    logger.warning("sandbox_code_error", error=error[:500])
                    raise CodeExecutionError(f"Code execution error:\n{error}")
            except json.JSONDecodeError as json_err:
                logger.warning(
                    "sandbox_result_json_parse_failed",
                    error=str(json_err),
                    result_preview=result_json[:200] if result_json else "",
                )
                user_stdout = stdout_str
        else:
            user_stdout = stdout_str

        # Check return code
        if process.returncode != 0 and output != "error":
            logger.warning(
                "sandbox_nonzero_exit",
                returncode=process.returncode,
                stderr=stderr_str[:500],
            )
            raise CodeExecutionError(f"Subprocess exited with code {process.returncode}: {stderr_str[:500]}")

        metrics = ResourceMetrics(
            cpu_time_ms=0.0,  # Not measurable in basic subprocess mode
            memory_peak_mb=0.0,  # Not measurable in basic subprocess mode
            wall_time_ms=wall_time,
        )

        logger.info("sandbox_execution_complete", wall_time_ms=wall_time, has_return_value=return_value is not None)

        return SandboxResult(
            output=output,
            return_value=return_value,
            metrics=metrics,
            stdout=user_stdout,
            stderr=stderr_str,
        )

    async def __aenter__(self) -> ProcessSandbox:
        """Enter the sandbox context."""
        return self

    async def __aexit__(self, exc_type: Any, exc_val: Any, exc_tb: Any) -> None:
        """Exit the sandbox context."""

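The base64 transport used by execute() can be sketched standalone: encode the user code and data, write a wrapper script that decodes and execs them, run it in a subprocess, and split the result off the `__SANDBOX_RESULT__` sentinel. The wrapper below is a simplified stand-in for the library's `_WRAPPER_TEMPLATE` (it omits error capture), and it uses `sys.executable` where the source above invokes a bare `"python"`.

```python
import asyncio
import base64
import json
import os
import sys
import tempfile

# Simplified wrapper: decode code and data, exec with `data` in scope,
# then emit a sentinel line followed by a JSON payload of `result`.
WRAPPER = """\
import base64, json
code = base64.b64decode("{code_b64}").decode("utf-8")
data = json.loads(base64.b64decode("{data_b64}").decode("utf-8"))
ns = {{"data": data}}
exec(code, ns)
print("__SANDBOX_RESULT__" + json.dumps({{"return_value": ns.get("result")}}))
"""

async def run_sandboxed(code: str, data: dict, timeout: float = 10.0):
    code_b64 = base64.b64encode(code.encode("utf-8")).decode("ascii")
    data_b64 = base64.b64encode(json.dumps(data).encode("utf-8")).decode("ascii")
    wrapper = WRAPPER.format(code_b64=code_b64, data_b64=data_b64)

    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as tmp:
        tmp.write(wrapper)
        path = tmp.name
    try:
        proc = await asyncio.create_subprocess_exec(
            sys.executable, path, stdout=asyncio.subprocess.PIPE
        )
        stdout, _ = await asyncio.wait_for(proc.communicate(), timeout=timeout)
    finally:
        os.unlink(path)

    # Everything before the sentinel is user stdout; after it, the payload.
    user_out, _, payload = stdout.decode("utf-8").partition("__SANDBOX_RESULT__")
    return user_out, json.loads(payload)["return_value"]

out, rv = asyncio.run(run_sandboxed("result = data['x'] * 2", {"x": 21}))
print(rv)  # 42
```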
__init__

__init__(
    *, resource_limits: ResourceLimits | None = None
) -> None

Initialize the process sandbox.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| resource_limits | ResourceLimits \| None | Default resource limits. Can be overridden per-execution in the execute() call. | None |

Source code in src/agenticapi/harness/sandbox/process.py
def __init__(self, *, resource_limits: ResourceLimits | None = None) -> None:
    """Initialize the process sandbox.

    Args:
        resource_limits: Default resource limits. Can be overridden
            per-execution in the execute() call.
    """
    self._default_limits = resource_limits or ResourceLimits()

execute async

execute(
    code: str,
    tools: Any = None,
    resource_limits: ResourceLimits | None = None,
    sandbox_data: dict[str, Any] | None = None,
) -> SandboxResult

Execute code in an isolated subprocess.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| code | str | Python source code to execute. | required |
| tools | Any | ToolRegistry (currently unused in Phase 1). | None |
| resource_limits | ResourceLimits \| None | Resource limits to enforce. Falls back to default limits if not provided. | None |
| sandbox_data | dict[str, Any] \| None | Optional dict of pre-fetched data to inject into the execution namespace as the ``data`` variable. | None |

Returns:

| Type | Description |
| --- | --- |
| SandboxResult | SandboxResult with captured output and metrics. |

Raises:

| Type | Description |
| --- | --- |
| CodeExecutionError | If the code fails to execute. |
| SandboxViolation | If execution times out. |

Source code in src/agenticapi/harness/sandbox/process.py
async def execute(
    self,
    code: str,
    tools: Any = None,
    resource_limits: ResourceLimits | None = None,
    sandbox_data: dict[str, Any] | None = None,
) -> SandboxResult:
    """Execute code in an isolated subprocess.

    Args:
        code: Python source code to execute.
        tools: ToolRegistry (currently unused in Phase 1).
        resource_limits: Resource limits to enforce. Falls back to
            default limits if not provided.
        sandbox_data: Optional dict of pre-fetched data to inject into
            the execution namespace as the ``data`` variable.

    Returns:
        SandboxResult with captured output and metrics.

    Raises:
        CodeExecutionError: If the code fails to execute.
        SandboxViolation: If execution times out.
    """
    limits = resource_limits or self._default_limits
    timeout = limits.max_execution_time_seconds

    # Build the wrapper script (base64-encode user code and data for safe transport)
    code_b64 = base64.b64encode(code.encode("utf-8")).decode("ascii")
    data_json = json.dumps(sandbox_data or {}, default=str)
    data_b64 = base64.b64encode(data_json.encode("utf-8")).decode("ascii")
    wrapper_code = _WRAPPER_TEMPLATE.format(code_b64=code_b64, data_b64=data_b64)

    # Write to a temporary file and execute
    start_time = time.monotonic()
    tmp_path: str | None = None

    try:
        with tempfile.NamedTemporaryFile(mode="w", suffix=".py", delete=False) as tmp:
            tmp_path = tmp.name
            tmp.write(wrapper_code)

        process = await asyncio.create_subprocess_exec(
            "python",
            tmp_path,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )

        try:
            stdout_bytes, stderr_bytes = await asyncio.wait_for(
                process.communicate(),
                timeout=timeout,
            )
        except TimeoutError as exc:
            process.kill()
            await process.wait()
            wall_time = (time.monotonic() - start_time) * 1000
            logger.error("sandbox_timeout", timeout=timeout, wall_time_ms=wall_time)
            raise SandboxViolation(f"Code execution timed out after {timeout} seconds") from exc

    except SandboxViolation:
        raise
    except Exception as exc:
        wall_time = (time.monotonic() - start_time) * 1000
        logger.error("sandbox_execution_failed", error=str(exc), wall_time_ms=wall_time)
        raise CodeExecutionError(f"Sandbox execution failed: {exc}") from exc
    finally:
        # Clean up temp file
        if tmp_path is not None:
            import os

            with contextlib.suppress(OSError):
                os.unlink(tmp_path)

    wall_time = (time.monotonic() - start_time) * 1000
    stdout_str = stdout_bytes.decode("utf-8", errors="replace")
    stderr_str = stderr_bytes.decode("utf-8", errors="replace")

    # Parse the result from stdout
    output: Any = None
    return_value: Any = None

    if "__SANDBOX_RESULT__" in stdout_str:
        parts = stdout_str.split("__SANDBOX_RESULT__", 1)
        user_stdout = parts[0].rstrip("\n")
        result_json = parts[1].strip()

        try:
            parsed = json.loads(result_json)
            output = parsed.get("output")
            return_value = parsed.get("return_value")
            error = parsed.get("error")

            if error:
                logger.warning("sandbox_code_error", error=error[:500])
                raise CodeExecutionError(f"Code execution error:\n{error}")
        except json.JSONDecodeError as json_err:
            logger.warning(
                "sandbox_result_json_parse_failed",
                error=str(json_err),
                result_preview=result_json[:200] if result_json else "",
            )
            user_stdout = stdout_str
    else:
        user_stdout = stdout_str

    # Check return code
    if process.returncode != 0 and output != "error":
        logger.warning(
            "sandbox_nonzero_exit",
            returncode=process.returncode,
            stderr=stderr_str[:500],
        )
        raise CodeExecutionError(f"Subprocess exited with code {process.returncode}: {stderr_str[:500]}")

    metrics = ResourceMetrics(
        cpu_time_ms=0.0,  # Not measurable in basic subprocess mode
        memory_peak_mb=0.0,  # Not measurable in basic subprocess mode
        wall_time_ms=wall_time,
    )

    logger.info("sandbox_execution_complete", wall_time_ms=wall_time, has_return_value=return_value is not None)

    return SandboxResult(
        output=output,
        return_value=return_value,
        metrics=metrics,
        stdout=user_stdout,
        stderr=stderr_str,
    )

__aenter__ async

__aenter__() -> ProcessSandbox

Enter the sandbox context.

Source code in src/agenticapi/harness/sandbox/process.py
async def __aenter__(self) -> ProcessSandbox:
    """Enter the sandbox context."""
    return self

__aexit__ async

__aexit__(exc_type: Any, exc_val: Any, exc_tb: Any) -> None

Exit the sandbox context.

Source code in src/agenticapi/harness/sandbox/process.py
async def __aexit__(self, exc_type: Any, exc_val: Any, exc_tb: Any) -> None:
    """Exit the sandbox context."""

ResourceLimits

ResourceLimits dataclass

Resource limits for sandbox execution.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| max_cpu_seconds | float | Maximum CPU time allowed in seconds. |
| max_memory_mb | int | Maximum memory usage in megabytes. |
| max_execution_time_seconds | float | Maximum wall-clock time in seconds. |

Source code in src/agenticapi/harness/sandbox/base.py
@dataclass(frozen=True, slots=True)
class ResourceLimits:
    """Resource limits for sandbox execution.

    Attributes:
        max_cpu_seconds: Maximum CPU time allowed in seconds.
        max_memory_mb: Maximum memory usage in megabytes.
        max_execution_time_seconds: Maximum wall-clock time in seconds.
    """

    max_cpu_seconds: float = 30.0
    max_memory_mb: int = 512
    max_execution_time_seconds: float = 60.0

SandboxResult

SandboxResult dataclass

Result of sandbox code execution.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| output | Any | The primary output of the executed code. |
| return_value | Any | The return value of the executed code. |
| metrics | ResourceMetrics | Resource usage metrics from execution. |
| stdout | str | Captured standard output. |
| stderr | str | Captured standard error. |

Source code in src/agenticapi/harness/sandbox/base.py
@dataclass(slots=True)
class SandboxResult:
    """Result of sandbox code execution.

    Attributes:
        output: The primary output of the executed code.
        return_value: The return value of the executed code.
        metrics: Resource usage metrics from execution.
        stdout: Captured standard output.
        stderr: Captured standard error.
    """

    output: Any
    return_value: Any
    metrics: ResourceMetrics
    stdout: str = ""
    stderr: str = ""

ResourceMetrics

ResourceMetrics dataclass

Metrics collected during sandbox execution.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| cpu_time_ms | float | CPU time consumed in milliseconds. |
| memory_peak_mb | float | Peak memory usage in megabytes. |
| wall_time_ms | float | Wall-clock time in milliseconds. |

Source code in src/agenticapi/harness/sandbox/base.py
@dataclass(frozen=True, slots=True)
class ResourceMetrics:
    """Metrics collected during sandbox execution.

    Attributes:
        cpu_time_ms: CPU time consumed in milliseconds.
        memory_peak_mb: Peak memory usage in megabytes.
        wall_time_ms: Wall-clock time in milliseconds.
    """

    cpu_time_ms: float
    memory_peak_mb: float
    wall_time_ms: float

Static Analysis

check_code_safety

check_code_safety(
    code: str,
    *,
    allowed_modules: list[str] | None = None,
    denied_modules: list[str] | None = None,
    deny_eval_exec: bool = True,
    deny_dynamic_import: bool = True,
) -> SafetyResult

Check generated code safety using AST analysis.

Parses the code into an AST and walks all nodes to detect dangerous patterns. Returns a SafetyResult indicating whether the code is safe to execute.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| code | str | Python source code to analyze. | required |
| allowed_modules | list[str] \| None | Whitelist of allowed modules (if provided, only these modules may be imported). | None |
| denied_modules | list[str] \| None | Blacklist of denied modules. | None |
| deny_eval_exec | bool | Whether to flag eval()/exec() as violations. | True |
| deny_dynamic_import | bool | Whether to flag __import__() as violations. | True |

Returns:

| Type | Description |
| --- | --- |
| SafetyResult | SafetyResult with safe=True if no violations, or safe=False with a list of SafetyViolation objects. |

Source code in src/agenticapi/harness/sandbox/static_analysis.py
def check_code_safety(
    code: str,
    *,
    allowed_modules: list[str] | None = None,
    denied_modules: list[str] | None = None,
    deny_eval_exec: bool = True,
    deny_dynamic_import: bool = True,
) -> SafetyResult:
    """Check generated code safety using AST analysis.

    Parses the code into an AST and walks all nodes to detect
    dangerous patterns. Returns a SafetyResult indicating whether
    the code is safe to execute.

    Args:
        code: Python source code to analyze.
        allowed_modules: Whitelist of allowed modules (if provided,
            only these modules may be imported).
        denied_modules: Blacklist of denied modules.
        deny_eval_exec: Whether to flag eval()/exec() as violations.
        deny_dynamic_import: Whether to flag __import__() as violations.

    Returns:
        SafetyResult with safe=True if no violations, or safe=False
        with a list of SafetyViolation objects.
    """
    violations: list[SafetyViolation] = []

    try:
        tree = ast.parse(code)
    except SyntaxError as e:
        violations.append(
            SafetyViolation(
                rule="syntax_error",
                description=f"Code has syntax error: {e}",
                line=e.lineno or 0,
                col=e.offset or 0,
                severity="error",
            )
        )
        return SafetyResult(safe=False, violations=violations)

    for node in ast.walk(tree):
        _check_imports(node, violations, allowed_modules=allowed_modules, denied_modules=denied_modules)
        _check_dangerous_calls(node, violations, deny_eval_exec=deny_eval_exec, deny_dynamic_import=deny_dynamic_import)
        _check_dangerous_builtins(node, violations)
        _check_file_io(node, violations)

    has_errors = any(v.severity == "error" for v in violations)
    return SafetyResult(safe=not has_errors, violations=violations)

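The core of the AST walk can be approximated in a few lines. `mini_check` below is a simplified sketch covering only denied imports and eval/exec/__import__ calls (the library's checker also inspects builtins and file I/O), so it is an illustration of the technique, not the implementation.

```python
import ast
from dataclasses import dataclass

@dataclass(frozen=True)
class Violation:
    rule: str
    line: int

def mini_check(code: str, denied_modules=frozenset({"os", "subprocess"})):
    """Walk the AST, flagging denied imports and dynamic-execution calls."""
    violations = []
    tree = ast.parse(code)
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in denied_modules:
                    violations.append(Violation("denied_import", node.lineno))
        elif isinstance(node, ast.ImportFrom) and node.module:
            if node.module.split(".")[0] in denied_modules:
                violations.append(Violation("denied_import", node.lineno))
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in {"eval", "exec", "__import__"}:
                violations.append(Violation("dangerous_call", node.lineno))
    return (len(violations) == 0, violations)

safe, found = mini_check("import os\neval('1+1')")
print(safe, [v.rule for v in found])  # safe is False, two violations
```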
SafetyResult

SafetyResult dataclass

Result of static safety analysis.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| safe | bool | Whether the code passed all safety checks. |
| violations | list[SafetyViolation] | List of violations found (empty if safe). |

Source code in src/agenticapi/harness/sandbox/static_analysis.py
@dataclass(frozen=True, slots=True)
class SafetyResult:
    """Result of static safety analysis.

    Attributes:
        safe: Whether the code passed all safety checks.
        violations: List of violations found (empty if safe).
    """

    safe: bool
    violations: list[SafetyViolation] = field(default_factory=list)

SafetyViolation

SafetyViolation dataclass

A single safety violation detected by static analysis.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| rule | str | Identifier for the violated rule. |
| description | str | Human-readable description of the violation. |
| line | int | Line number where the violation was found. |
| col | int | Column offset where the violation was found. |
| severity | str | Severity level ("error" or "warning"). |

Source code in src/agenticapi/harness/sandbox/static_analysis.py
@dataclass(frozen=True, slots=True)
class SafetyViolation:
    """A single safety violation detected by static analysis.

    Attributes:
        rule: Identifier for the violated rule.
        description: Human-readable description of the violation.
        line: Line number where the violation was found.
        col: Column offset where the violation was found.
        severity: Severity level ("error" or "warning").
    """

    rule: str
    description: str
    line: int
    col: int
    severity: str  # "error" | "warning"

Monitors

ResourceMonitor

Monitors resource usage against configured limits.

Checks that CPU time, memory, and wall time stayed within the configured resource limits.

Example

monitor = ResourceMonitor(limits=ResourceLimits(max_cpu_seconds=10))
result = await monitor.on_execution_complete(sandbox_result, code="...")

Source code in src/agenticapi/harness/sandbox/monitors.py
class ResourceMonitor:
    """Monitors resource usage against configured limits.

    Checks that CPU time, memory, and wall time stayed within
    the configured resource limits.

    Example:
        monitor = ResourceMonitor(limits=ResourceLimits(max_cpu_seconds=10))
        result = await monitor.on_execution_complete(sandbox_result, code="...")
    """

    def __init__(self, *, limits: ResourceLimits) -> None:
        """Initialize with resource limits to check against.

        Args:
            limits: The resource limits to enforce.
        """
        self._limits = limits

    async def on_execution_complete(
        self,
        result: SandboxResult,
        *,
        code: str,
    ) -> MonitorResult:
        """Check resource usage against limits.

        Args:
            result: The sandbox execution result.
            code: The code that was executed.

        Returns:
            MonitorResult with violations if limits exceeded.
        """
        violations: list[str] = []
        warnings: list[str] = []

        metrics = result.metrics

        cpu_limit_ms = self._limits.max_cpu_seconds * 1000
        if metrics.cpu_time_ms > cpu_limit_ms:
            violations.append(f"CPU time {metrics.cpu_time_ms:.1f}ms exceeded limit of {cpu_limit_ms:.1f}ms")
        elif metrics.cpu_time_ms > cpu_limit_ms * 0.8:
            warnings.append(f"CPU time {metrics.cpu_time_ms:.1f}ms approaching limit of {cpu_limit_ms:.1f}ms")

        if metrics.memory_peak_mb > self._limits.max_memory_mb:
            violations.append(f"Memory {metrics.memory_peak_mb:.1f}MB exceeded limit of {self._limits.max_memory_mb}MB")

        wall_limit_ms = self._limits.max_execution_time_seconds * 1000
        if metrics.wall_time_ms > wall_limit_ms:
            violations.append(f"Wall time {metrics.wall_time_ms:.1f}ms exceeded limit of {wall_limit_ms:.1f}ms")

        passed = len(violations) == 0

        if not passed:
            logger.warning(
                "resource_monitor_violation",
                violations=violations,
                cpu_time_ms=metrics.cpu_time_ms,
                memory_peak_mb=metrics.memory_peak_mb,
                wall_time_ms=metrics.wall_time_ms,
            )

        return MonitorResult(passed=passed, warnings=warnings, violations=violations)

__init__

__init__(*, limits: ResourceLimits) -> None

Initialize with resource limits to check against.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| limits | ResourceLimits | The resource limits to enforce. | required |

Source code in src/agenticapi/harness/sandbox/monitors.py
def __init__(self, *, limits: ResourceLimits) -> None:
    """Initialize with resource limits to check against.

    Args:
        limits: The resource limits to enforce.
    """
    self._limits = limits

on_execution_complete async

on_execution_complete(
    result: SandboxResult, *, code: str
) -> MonitorResult

Check resource usage against limits.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| result | SandboxResult | The sandbox execution result. | required |
| code | str | The code that was executed. | required |

Returns:

| Type | Description |
| --- | --- |
| MonitorResult | MonitorResult with violations if limits exceeded. |

Source code in src/agenticapi/harness/sandbox/monitors.py
async def on_execution_complete(
    self,
    result: SandboxResult,
    *,
    code: str,
) -> MonitorResult:
    """Check resource usage against limits.

    Args:
        result: The sandbox execution result.
        code: The code that was executed.

    Returns:
        MonitorResult with violations if limits exceeded.
    """
    violations: list[str] = []
    warnings: list[str] = []

    metrics = result.metrics

    cpu_limit_ms = self._limits.max_cpu_seconds * 1000
    if metrics.cpu_time_ms > cpu_limit_ms:
        violations.append(f"CPU time {metrics.cpu_time_ms:.1f}ms exceeded limit of {cpu_limit_ms:.1f}ms")
    elif metrics.cpu_time_ms > cpu_limit_ms * 0.8:
        warnings.append(f"CPU time {metrics.cpu_time_ms:.1f}ms approaching limit of {cpu_limit_ms:.1f}ms")

    if metrics.memory_peak_mb > self._limits.max_memory_mb:
        violations.append(f"Memory {metrics.memory_peak_mb:.1f}MB exceeded limit of {self._limits.max_memory_mb}MB")

    wall_limit_ms = self._limits.max_execution_time_seconds * 1000
    if metrics.wall_time_ms > wall_limit_ms:
        violations.append(f"Wall time {metrics.wall_time_ms:.1f}ms exceeded limit of {wall_limit_ms:.1f}ms")

    passed = len(violations) == 0

    if not passed:
        logger.warning(
            "resource_monitor_violation",
            violations=violations,
            cpu_time_ms=metrics.cpu_time_ms,
            memory_peak_mb=metrics.memory_peak_mb,
            wall_time_ms=metrics.wall_time_ms,
        )

    return MonitorResult(passed=passed, warnings=warnings, violations=violations)

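The split used above, hard violation past the limit and soft warning past 80% of it, reduces to a small comparison. `classify` is an illustrative helper for this sketch, not a library function.

```python
# Sketch of the monitor threshold logic: values past the limit are
# violations; values past 80% of the limit are warnings; else OK.
def classify(value_ms: float, limit_ms: float) -> str:
    if value_ms > limit_ms:
        return "violation"
    if value_ms > limit_ms * 0.8:
        return "warning"
    return "ok"

print(classify(9_000.0, 10_000.0))   # warning: above the 80% threshold
print(classify(12_000.0, 10_000.0))  # violation: above the limit itself
print(classify(100.0, 10_000.0))     # ok
```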
OutputSizeMonitor

Monitors output size to prevent memory issues.

Checks that the combined size of stdout, stderr, and return value does not exceed a configurable limit.

Example

monitor = OutputSizeMonitor(max_output_bytes=1_000_000)
result = await monitor.on_execution_complete(sandbox_result, code="...")

Source code in src/agenticapi/harness/sandbox/monitors.py
class OutputSizeMonitor:
    """Monitors output size to prevent memory issues.

    Checks that the combined size of stdout, stderr, and return value
    does not exceed a configurable limit.

    Example:
        monitor = OutputSizeMonitor(max_output_bytes=1_000_000)
        result = await monitor.on_execution_complete(sandbox_result, code="...")
    """

    def __init__(self, *, max_output_bytes: int = 1_000_000) -> None:
        """Initialize with maximum output size.

        Args:
            max_output_bytes: Maximum allowed output size in bytes.
        """
        self._max_output_bytes = max_output_bytes

    async def on_execution_complete(
        self,
        result: SandboxResult,
        *,
        code: str,
    ) -> MonitorResult:
        """Check output size against limit.

        Args:
            result: The sandbox execution result.
            code: The code that was executed.

        Returns:
            MonitorResult with violations if output too large.
        """
        total_size = sys.getsizeof(result.stdout) + sys.getsizeof(result.stderr)

        try:
            total_size += len(json.dumps(result.return_value, default=str).encode())
        except (TypeError, ValueError):
            total_size += sys.getsizeof(result.return_value)

        violations: list[str] = []
        warnings: list[str] = []

        if total_size > self._max_output_bytes:
            violations.append(f"Output size {total_size} bytes exceeded limit of {self._max_output_bytes} bytes")
        elif total_size > self._max_output_bytes * 0.8:
            warnings.append(f"Output size {total_size} bytes approaching limit of {self._max_output_bytes} bytes")

        passed = len(violations) == 0

        if not passed:
            logger.warning("output_size_monitor_violation", total_size=total_size)

        return MonitorResult(passed=passed, warnings=warnings, violations=violations)

__init__

__init__(*, max_output_bytes: int = 1000000) -> None

Initialize with maximum output size.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| max_output_bytes | int | Maximum allowed output size in bytes. | 1000000 |
Source code in src/agenticapi/harness/sandbox/monitors.py
def __init__(self, *, max_output_bytes: int = 1_000_000) -> None:
    """Initialize with maximum output size.

    Args:
        max_output_bytes: Maximum allowed output size in bytes.
    """
    self._max_output_bytes = max_output_bytes

on_execution_complete async

on_execution_complete(
    result: SandboxResult, *, code: str
) -> MonitorResult

Check output size against limit.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| result | SandboxResult | The sandbox execution result. | required |
| code | str | The code that was executed. | required |

Returns:

| Type | Description |
| --- | --- |
| MonitorResult | MonitorResult with violations if output too large. |

Source code in src/agenticapi/harness/sandbox/monitors.py
async def on_execution_complete(
    self,
    result: SandboxResult,
    *,
    code: str,
) -> MonitorResult:
    """Check output size against limit.

    Args:
        result: The sandbox execution result.
        code: The code that was executed.

    Returns:
        MonitorResult with violations if output too large.
    """
    total_size = sys.getsizeof(result.stdout) + sys.getsizeof(result.stderr)

    try:
        total_size += len(json.dumps(result.return_value, default=str).encode())
    except (TypeError, ValueError):
        total_size += sys.getsizeof(result.return_value)

    violations: list[str] = []
    warnings: list[str] = []

    if total_size > self._max_output_bytes:
        violations.append(f"Output size {total_size} bytes exceeded limit of {self._max_output_bytes} bytes")
    elif total_size > self._max_output_bytes * 0.8:
        warnings.append(f"Output size {total_size} bytes approaching limit of {self._max_output_bytes} bytes")

    passed = len(violations) == 0

    if not passed:
        logger.warning("output_size_monitor_violation", total_size=total_size)

    return MonitorResult(passed=passed, warnings=warnings, violations=violations)
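
The size check above can be exercised without the harness. The sketch below mirrors the monitor's logic on plain values — `SandboxResult` and `MonitorResult` are replaced by bare arguments and a tuple, so this is an illustrative sketch, not the library API:

```python
import json
import sys

def check_output_size(stdout, stderr, return_value, *, max_output_bytes=1_000_000):
    """Sketch of OutputSizeMonitor.on_execution_complete on plain values."""
    # Python object sizes for the streams, plus the JSON-encoded byte
    # length of the return value (falling back to object size if it
    # cannot be serialized).
    total_size = sys.getsizeof(stdout) + sys.getsizeof(stderr)
    try:
        total_size += len(json.dumps(return_value, default=str).encode())
    except (TypeError, ValueError):
        total_size += sys.getsizeof(return_value)

    violations: list[str] = []
    warnings: list[str] = []
    if total_size > max_output_bytes:
        violations.append(f"Output size {total_size} bytes exceeded limit of {max_output_bytes} bytes")
    elif total_size > max_output_bytes * 0.8:
        warnings.append(f"Output size {total_size} bytes approaching limit of {max_output_bytes} bytes")
    return len(violations) == 0, warnings, violations
```

Note the two-tier response: the warning band starts at 80% of the limit, so callers can react before the hard violation fires.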

Validators

OutputTypeValidator

Validates that execution output is JSON-serializable.

Ensures the return value can be safely serialized for API responses.

Example

validator = OutputTypeValidator()
result = await validator.validate(sandbox_result, code="...", intent_action="read")

Source code in src/agenticapi/harness/sandbox/validators.py
class OutputTypeValidator:
    """Validates that execution output is JSON-serializable.

    Ensures the return value can be safely serialized for API responses.

    Example:
        validator = OutputTypeValidator()
        result = await validator.validate(sandbox_result, code="...", intent_action="read")
    """

    async def validate(
        self,
        result: SandboxResult,
        *,
        code: str,
        intent_action: str,
    ) -> ValidationResult:
        """Check that the return value is JSON-serializable.

        Args:
            result: The sandbox execution result.
            code: The code that was executed.
            intent_action: The intent action type.

        Returns:
            ValidationResult with errors if output cannot be serialized.
        """
        if result.return_value is None:
            return ValidationResult(valid=True)

        try:
            json.dumps(result.return_value, default=str)
            return ValidationResult(valid=True)
        except (TypeError, ValueError, OverflowError) as exc:
            logger.warning(
                "output_type_validation_failed",
                error=str(exc),
            )
            return ValidationResult(
                valid=False,
                errors=[f"Return value is not JSON-serializable: {exc}"],
            )

validate async

validate(
    result: SandboxResult, *, code: str, intent_action: str
) -> ValidationResult

Check that the return value is JSON-serializable.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| result | SandboxResult | The sandbox execution result. | required |
| code | str | The code that was executed. | required |
| intent_action | str | The intent action type. | required |

Returns:

| Type | Description |
| --- | --- |
| ValidationResult | ValidationResult with errors if output cannot be serialized. |

Source code in src/agenticapi/harness/sandbox/validators.py
async def validate(
    self,
    result: SandboxResult,
    *,
    code: str,
    intent_action: str,
) -> ValidationResult:
    """Check that the return value is JSON-serializable.

    Args:
        result: The sandbox execution result.
        code: The code that was executed.
        intent_action: The intent action type.

    Returns:
        ValidationResult with errors if output cannot be serialized.
    """
    if result.return_value is None:
        return ValidationResult(valid=True)

    try:
        json.dumps(result.return_value, default=str)
        return ValidationResult(valid=True)
    except (TypeError, ValueError, OverflowError) as exc:
        logger.warning(
            "output_type_validation_failed",
            error=str(exc),
        )
        return ValidationResult(
            valid=False,
            errors=[f"Return value is not JSON-serializable: {exc}"],
        )

ReadOnlyValidator

Validates that read intents did not produce write-like output.

Checks stderr and stdout for patterns that suggest write operations occurred during what should have been a read-only operation.

Example

validator = ReadOnlyValidator()
result = await validator.validate(sandbox_result, code="...", intent_action="read")

Source code in src/agenticapi/harness/sandbox/validators.py
class ReadOnlyValidator:
    """Validates that read intents did not produce write-like output.

    Checks stderr and stdout for patterns that suggest write operations
    occurred during what should have been a read-only operation.

    Example:
        validator = ReadOnlyValidator()
        result = await validator.validate(sandbox_result, code="...", intent_action="read")
    """

    WRITE_PATTERNS: ClassVar[list[str]] = [
        "INSERT INTO",
        "UPDATE ",
        "DELETE FROM",
        "DROP TABLE",
        "ALTER TABLE",
        "CREATE TABLE",
        "TRUNCATE",
    ]

    async def validate(
        self,
        result: SandboxResult,
        *,
        code: str,
        intent_action: str,
    ) -> ValidationResult:
        """Check for write patterns in read-only operations.

        Only validates when intent_action is "read". Other actions
        are allowed to have write-like output.

        Args:
            result: The sandbox execution result.
            code: The code that was executed.
            intent_action: The intent action type.

        Returns:
            ValidationResult with warnings if write patterns detected.
        """
        if intent_action != "read":
            return ValidationResult(valid=True)

        warnings: list[str] = []
        combined_output = f"{result.stdout} {result.stderr}".upper()

        for pattern in self.WRITE_PATTERNS:
            if pattern in combined_output:
                warnings.append(f"Read-only operation produced write-like output containing '{pattern}'")

        if warnings:
            logger.warning(
                "read_only_validation_warning",
                warnings=warnings,
                intent_action=intent_action,
            )

        return ValidationResult(valid=True, warnings=warnings)

validate async

validate(
    result: SandboxResult, *, code: str, intent_action: str
) -> ValidationResult

Check for write patterns in read-only operations.

Only validates when intent_action is "read". Other actions are allowed to have write-like output.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| result | SandboxResult | The sandbox execution result. | required |
| code | str | The code that was executed. | required |
| intent_action | str | The intent action type. | required |

Returns:

| Type | Description |
| --- | --- |
| ValidationResult | ValidationResult with warnings if write patterns detected. |

Source code in src/agenticapi/harness/sandbox/validators.py
async def validate(
    self,
    result: SandboxResult,
    *,
    code: str,
    intent_action: str,
) -> ValidationResult:
    """Check for write patterns in read-only operations.

    Only validates when intent_action is "read". Other actions
    are allowed to have write-like output.

    Args:
        result: The sandbox execution result.
        code: The code that was executed.
        intent_action: The intent action type.

    Returns:
        ValidationResult with warnings if write patterns detected.
    """
    if intent_action != "read":
        return ValidationResult(valid=True)

    warnings: list[str] = []
    combined_output = f"{result.stdout} {result.stderr}".upper()

    for pattern in self.WRITE_PATTERNS:
        if pattern in combined_output:
            warnings.append(f"Read-only operation produced write-like output containing '{pattern}'")

    if warnings:
        logger.warning(
            "read_only_validation_warning",
            warnings=warnings,
            intent_action=intent_action,
        )

    return ValidationResult(valid=True, warnings=warnings)
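
The pattern scan boils down to an uppercase substring search over the combined streams. A standalone sketch of that core, reusing the same WRITE_PATTERNS list (a plain list of matched patterns stands in for `ValidationResult` here):

```python
WRITE_PATTERNS = [
    "INSERT INTO", "UPDATE ", "DELETE FROM", "DROP TABLE",
    "ALTER TABLE", "CREATE TABLE", "TRUNCATE",
]

def scan_read_only_output(stdout, stderr, intent_action):
    """Sketch of ReadOnlyValidator.validate: only 'read' intents are
    scanned, and matches surface as warnings, never hard failures."""
    if intent_action != "read":
        return []
    combined = f"{stdout} {stderr}".upper()
    return [p for p in WRITE_PATTERNS if p in combined]
```

Two details carry over from the source: uppercasing the output makes the match case-insensitive (`delete from` and `DELETE FROM` are treated alike), and the trailing space in `"UPDATE "` keeps the lone word `UPDATED` from triggering a false match.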