
Sandbox & Analysis

SandboxRuntime (Base)

SandboxRuntime

Bases: ABC

Abstract base class for sandbox execution environments.

Provides isolated code execution with resource limits and metrics collection. Implementations must support async context manager protocol for resource cleanup.

Phase 1: ProcessSandbox (subprocess-based isolation)
Phase 2: ContainerSandbox (container-based isolation)

Source code in src/agenticapi/harness/sandbox/base.py
class SandboxRuntime(ABC):
    """Abstract base class for sandbox execution environments.

    Provides isolated code execution with resource limits and metrics
    collection. Implementations must support async context manager
    protocol for resource cleanup.

    Phase 1: ProcessSandbox (subprocess-based isolation)
    Phase 2: ContainerSandbox (container-based isolation)
    """

    @abstractmethod
    async def execute(
        self,
        code: str,
        tools: Any,
        resource_limits: ResourceLimits,
    ) -> SandboxResult:
        """Execute code in the sandbox.

        Args:
            code: Python source code to execute.
            tools: ToolRegistry or similar providing available tools.
            resource_limits: Resource constraints for execution.

        Returns:
            SandboxResult with output, return value, and metrics.

        Raises:
            SandboxViolation: If a security violation is detected.
            CodeExecutionError: If the code fails to execute.
        """
        ...

    @abstractmethod
    async def __aenter__(self) -> SandboxRuntime:
        """Enter the sandbox context."""
        ...

    @abstractmethod
    async def __aexit__(self, exc_type: Any, exc_val: Any, exc_tb: Any) -> None:
        """Exit the sandbox context and clean up resources."""
        ...

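The contract above can be sketched with a minimal, self-contained stand-in: an ABC with an abstract async `execute` plus async context-manager hooks. `Runtime` and `EchoSandbox` below are illustrative names for this sketch, not part of the library.

```python
import asyncio
from abc import ABC, abstractmethod

# Minimal mirror of the SandboxRuntime shape: abstract async execute(),
# plus __aenter__/__aexit__ so implementations work with "async with".
class Runtime(ABC):
    @abstractmethod
    async def execute(self, code: str) -> str: ...

    async def __aenter__(self) -> "Runtime":
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb) -> None:
        pass  # resource cleanup would go here

# Trivial implementation: reports what it would run instead of running it.
class EchoSandbox(Runtime):
    async def execute(self, code: str) -> str:
        return f"would run {len(code)} bytes"

async def main() -> str:
    async with EchoSandbox() as sandbox:
        return await sandbox.execute("result = 2 + 2")

print(asyncio.run(main()))  # would run 14 bytes
```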
execute abstractmethod async

execute(
    code: str, tools: Any, resource_limits: ResourceLimits
) -> SandboxResult

Execute code in the sandbox.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| code | str | Python source code to execute. | required |
| tools | Any | ToolRegistry or similar providing available tools. | required |
| resource_limits | ResourceLimits | Resource constraints for execution. | required |

Returns:

| Type | Description |
| --- | --- |
| SandboxResult | SandboxResult with output, return value, and metrics. |

Raises:

| Type | Description |
| --- | --- |
| SandboxViolation | If a security violation is detected. |
| CodeExecutionError | If the code fails to execute. |

Source code in src/agenticapi/harness/sandbox/base.py
@abstractmethod
async def execute(
    self,
    code: str,
    tools: Any,
    resource_limits: ResourceLimits,
) -> SandboxResult:
    """Execute code in the sandbox.

    Args:
        code: Python source code to execute.
        tools: ToolRegistry or similar providing available tools.
        resource_limits: Resource constraints for execution.

    Returns:
        SandboxResult with output, return value, and metrics.

    Raises:
        SandboxViolation: If a security violation is detected.
        CodeExecutionError: If the code fails to execute.
    """
    ...

__aenter__ abstractmethod async

__aenter__() -> SandboxRuntime

Enter the sandbox context.

Source code in src/agenticapi/harness/sandbox/base.py
@abstractmethod
async def __aenter__(self) -> SandboxRuntime:
    """Enter the sandbox context."""
    ...

__aexit__ abstractmethod async

__aexit__(exc_type: Any, exc_val: Any, exc_tb: Any) -> None

Exit the sandbox context and clean up resources.

Source code in src/agenticapi/harness/sandbox/base.py
@abstractmethod
async def __aexit__(self, exc_type: Any, exc_val: Any, exc_tb: Any) -> None:
    """Exit the sandbox context and clean up resources."""
    ...

ProcessSandbox

ProcessSandbox

Bases: SandboxRuntime

Subprocess-based sandbox for executing generated code.

Runs code in a separate Python subprocess with timeout enforcement. Captures stdout/stderr and measures wall-clock execution time.

This is the Phase 1 implementation. It provides process-level isolation but not kernel-level sandboxing. For production multi-tenant use, upgrade to ContainerSandbox (Phase 2).

Example

async with ProcessSandbox() as sandbox:
    result = await sandbox.execute(
        code="result = 2 + 2",
        tools=None,
        resource_limits=ResourceLimits(max_execution_time_seconds=10),
    )
    assert result.return_value == 4

Source code in src/agenticapi/harness/sandbox/process.py
class ProcessSandbox(SandboxRuntime):
    """Subprocess-based sandbox for executing generated code.

    Runs code in a separate Python subprocess with timeout enforcement.
    Captures stdout/stderr and measures wall-clock execution time.

    This is the Phase 1 implementation. It provides process-level
    isolation but not kernel-level sandboxing. For production
    multi-tenant use, upgrade to ContainerSandbox (Phase 2).

    Example:
        async with ProcessSandbox() as sandbox:
            result = await sandbox.execute(
                code="result = 2 + 2",
                tools=None,
                resource_limits=ResourceLimits(max_execution_time_seconds=10),
            )
            assert result.return_value == 4
    """

    def __init__(self, *, resource_limits: ResourceLimits | None = None) -> None:
        """Initialize the process sandbox.

        Args:
            resource_limits: Default resource limits. Can be overridden
                per-execution in the execute() call.
        """
        self._default_limits = resource_limits or ResourceLimits()

    async def execute(
        self,
        code: str,
        tools: Any = None,
        resource_limits: ResourceLimits | None = None,
        sandbox_data: dict[str, Any] | None = None,
    ) -> SandboxResult:
        """Execute code in an isolated subprocess.

        Args:
            code: Python source code to execute.
            tools: ToolRegistry (currently unused in Phase 1).
            resource_limits: Resource limits to enforce. Falls back to
                default limits if not provided.
            sandbox_data: Optional dict of pre-fetched data to inject into
                the execution namespace as the ``data`` variable.

        Returns:
            SandboxResult with captured output and metrics.

        Raises:
            CodeExecutionError: If the code fails to execute.
            SandboxViolation: If execution times out.
        """
        limits = resource_limits or self._default_limits
        timeout = limits.max_execution_time_seconds

        # Build the wrapper script (base64-encode user code and data for safe transport)
        code_b64 = base64.b64encode(code.encode("utf-8")).decode("ascii")
        data_json = json.dumps(sandbox_data or {}, default=str)
        data_b64 = base64.b64encode(data_json.encode("utf-8")).decode("ascii")
        wrapper_code = _WRAPPER_TEMPLATE.format(code_b64=code_b64, data_b64=data_b64)

        # Write to a temporary file and execute
        start_time = time.monotonic()
        tmp_path: str | None = None

        try:
            with tempfile.NamedTemporaryFile(mode="w", suffix=".py", delete=False) as tmp:
                tmp_path = tmp.name
                tmp.write(wrapper_code)

            process = await asyncio.create_subprocess_exec(
                "python",
                tmp_path,
                stdout=asyncio.subprocess.PIPE,
                stderr=asyncio.subprocess.PIPE,
            )

            try:
                stdout_bytes, stderr_bytes = await asyncio.wait_for(
                    process.communicate(),
                    timeout=timeout,
                )
            except TimeoutError as exc:
                process.kill()
                await process.wait()
                wall_time = (time.monotonic() - start_time) * 1000
                logger.error("sandbox_timeout", timeout=timeout, wall_time_ms=wall_time)
                raise SandboxViolation(f"Code execution timed out after {timeout} seconds") from exc

        except SandboxViolation:
            raise
        except Exception as exc:
            wall_time = (time.monotonic() - start_time) * 1000
            logger.error("sandbox_execution_failed", error=str(exc), wall_time_ms=wall_time)
            raise CodeExecutionError(f"Sandbox execution failed: {exc}") from exc
        finally:
            # Clean up temp file
            if tmp_path is not None:
                import os

                with contextlib.suppress(OSError):
                    os.unlink(tmp_path)

        wall_time = (time.monotonic() - start_time) * 1000
        stdout_str = stdout_bytes.decode("utf-8", errors="replace")
        stderr_str = stderr_bytes.decode("utf-8", errors="replace")

        # Parse the result from stdout
        output: Any = None
        return_value: Any = None

        if "__SANDBOX_RESULT__" in stdout_str:
            parts = stdout_str.split("__SANDBOX_RESULT__", 1)
            user_stdout = parts[0].rstrip("\n")
            result_json = parts[1].strip()

            try:
                parsed = json.loads(result_json)
                output = parsed.get("output")
                return_value = parsed.get("return_value")
                error = parsed.get("error")

                if error:
                    logger.warning("sandbox_code_error", error=error[:500])
                    raise CodeExecutionError(f"Code execution error:\n{error}")
            except json.JSONDecodeError as json_err:
                logger.warning(
                    "sandbox_result_json_parse_failed",
                    error=str(json_err),
                    result_preview=result_json[:200] if result_json else "",
                )
                user_stdout = stdout_str
        else:
            user_stdout = stdout_str

        # Check return code
        if process.returncode != 0 and output != "error":
            logger.warning(
                "sandbox_nonzero_exit",
                returncode=process.returncode,
                stderr=stderr_str[:500],
            )
            raise CodeExecutionError(f"Subprocess exited with code {process.returncode}: {stderr_str[:500]}")

        metrics = ResourceMetrics(
            cpu_time_ms=0.0,  # Not measurable in basic subprocess mode
            memory_peak_mb=0.0,  # Not measurable in basic subprocess mode
            wall_time_ms=wall_time,
        )

        logger.info("sandbox_execution_complete", wall_time_ms=wall_time, has_return_value=return_value is not None)

        return SandboxResult(
            output=output,
            return_value=return_value,
            metrics=metrics,
            stdout=user_stdout,
            stderr=stderr_str,
        )

    async def __aenter__(self) -> ProcessSandbox:
        """Enter the sandbox context."""
        return self

    async def __aexit__(self, exc_type: Any, exc_val: Any, exc_tb: Any) -> None:
        """Exit the sandbox context."""

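The base64 transport used by execute() can be sketched standalone: encode the user code and data, write a wrapper script that decodes and execs them, run it in a subprocess, and split the result off the `__SANDBOX_RESULT__` sentinel. The wrapper below is a simplified stand-in for the library's `_WRAPPER_TEMPLATE` (it omits error capture), and it uses `sys.executable` where the source above invokes a bare `"python"`.

```python
import asyncio
import base64
import json
import os
import sys
import tempfile

# Simplified wrapper: decode code and data, exec with `data` in scope,
# then emit a sentinel line followed by a JSON payload of `result`.
WRAPPER = """\
import base64, json
code = base64.b64decode("{code_b64}").decode("utf-8")
data = json.loads(base64.b64decode("{data_b64}").decode("utf-8"))
ns = {{"data": data}}
exec(code, ns)
print("__SANDBOX_RESULT__" + json.dumps({{"return_value": ns.get("result")}}))
"""

async def run_sandboxed(code: str, data: dict, timeout: float = 10.0):
    code_b64 = base64.b64encode(code.encode("utf-8")).decode("ascii")
    data_b64 = base64.b64encode(json.dumps(data).encode("utf-8")).decode("ascii")
    wrapper = WRAPPER.format(code_b64=code_b64, data_b64=data_b64)

    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as tmp:
        tmp.write(wrapper)
        path = tmp.name
    try:
        proc = await asyncio.create_subprocess_exec(
            sys.executable, path, stdout=asyncio.subprocess.PIPE
        )
        stdout, _ = await asyncio.wait_for(proc.communicate(), timeout=timeout)
    finally:
        os.unlink(path)

    # Everything before the sentinel is user stdout; after it, the payload.
    user_out, _, payload = stdout.decode("utf-8").partition("__SANDBOX_RESULT__")
    return user_out, json.loads(payload)["return_value"]

out, rv = asyncio.run(run_sandboxed("result = data['x'] * 2", {"x": 21}))
print(rv)  # 42
```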
__init__

__init__(
    *, resource_limits: ResourceLimits | None = None
) -> None

Initialize the process sandbox.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| resource_limits | ResourceLimits \| None | Default resource limits. Can be overridden per-execution in the execute() call. | None |

Source code in src/agenticapi/harness/sandbox/process.py
def __init__(self, *, resource_limits: ResourceLimits | None = None) -> None:
    """Initialize the process sandbox.

    Args:
        resource_limits: Default resource limits. Can be overridden
            per-execution in the execute() call.
    """
    self._default_limits = resource_limits or ResourceLimits()

execute async

execute(
    code: str,
    tools: Any = None,
    resource_limits: ResourceLimits | None = None,
    sandbox_data: dict[str, Any] | None = None,
) -> SandboxResult

Execute code in an isolated subprocess.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| code | str | Python source code to execute. | required |
| tools | Any | ToolRegistry (currently unused in Phase 1). | None |
| resource_limits | ResourceLimits \| None | Resource limits to enforce. Falls back to default limits if not provided. | None |
| sandbox_data | dict[str, Any] \| None | Optional dict of pre-fetched data to inject into the execution namespace as the ``data`` variable. | None |

Returns:

| Type | Description |
| --- | --- |
| SandboxResult | SandboxResult with captured output and metrics. |

Raises:

| Type | Description |
| --- | --- |
| CodeExecutionError | If the code fails to execute. |
| SandboxViolation | If execution times out. |

Source code in src/agenticapi/harness/sandbox/process.py
async def execute(
    self,
    code: str,
    tools: Any = None,
    resource_limits: ResourceLimits | None = None,
    sandbox_data: dict[str, Any] | None = None,
) -> SandboxResult:
    """Execute code in an isolated subprocess.

    Args:
        code: Python source code to execute.
        tools: ToolRegistry (currently unused in Phase 1).
        resource_limits: Resource limits to enforce. Falls back to
            default limits if not provided.
        sandbox_data: Optional dict of pre-fetched data to inject into
            the execution namespace as the ``data`` variable.

    Returns:
        SandboxResult with captured output and metrics.

    Raises:
        CodeExecutionError: If the code fails to execute.
        SandboxViolation: If execution times out.
    """
    limits = resource_limits or self._default_limits
    timeout = limits.max_execution_time_seconds

    # Build the wrapper script (base64-encode user code and data for safe transport)
    code_b64 = base64.b64encode(code.encode("utf-8")).decode("ascii")
    data_json = json.dumps(sandbox_data or {}, default=str)
    data_b64 = base64.b64encode(data_json.encode("utf-8")).decode("ascii")
    wrapper_code = _WRAPPER_TEMPLATE.format(code_b64=code_b64, data_b64=data_b64)

    # Write to a temporary file and execute
    start_time = time.monotonic()
    tmp_path: str | None = None

    try:
        with tempfile.NamedTemporaryFile(mode="w", suffix=".py", delete=False) as tmp:
            tmp_path = tmp.name
            tmp.write(wrapper_code)

        process = await asyncio.create_subprocess_exec(
            "python",
            tmp_path,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )

        try:
            stdout_bytes, stderr_bytes = await asyncio.wait_for(
                process.communicate(),
                timeout=timeout,
            )
        except TimeoutError as exc:
            process.kill()
            await process.wait()
            wall_time = (time.monotonic() - start_time) * 1000
            logger.error("sandbox_timeout", timeout=timeout, wall_time_ms=wall_time)
            raise SandboxViolation(f"Code execution timed out after {timeout} seconds") from exc

    except SandboxViolation:
        raise
    except Exception as exc:
        wall_time = (time.monotonic() - start_time) * 1000
        logger.error("sandbox_execution_failed", error=str(exc), wall_time_ms=wall_time)
        raise CodeExecutionError(f"Sandbox execution failed: {exc}") from exc
    finally:
        # Clean up temp file
        if tmp_path is not None:
            import os

            with contextlib.suppress(OSError):
                os.unlink(tmp_path)

    wall_time = (time.monotonic() - start_time) * 1000
    stdout_str = stdout_bytes.decode("utf-8", errors="replace")
    stderr_str = stderr_bytes.decode("utf-8", errors="replace")

    # Parse the result from stdout
    output: Any = None
    return_value: Any = None

    if "__SANDBOX_RESULT__" in stdout_str:
        parts = stdout_str.split("__SANDBOX_RESULT__", 1)
        user_stdout = parts[0].rstrip("\n")
        result_json = parts[1].strip()

        try:
            parsed = json.loads(result_json)
            output = parsed.get("output")
            return_value = parsed.get("return_value")
            error = parsed.get("error")

            if error:
                logger.warning("sandbox_code_error", error=error[:500])
                raise CodeExecutionError(f"Code execution error:\n{error}")
        except json.JSONDecodeError as json_err:
            logger.warning(
                "sandbox_result_json_parse_failed",
                error=str(json_err),
                result_preview=result_json[:200] if result_json else "",
            )
            user_stdout = stdout_str
    else:
        user_stdout = stdout_str

    # Check return code
    if process.returncode != 0 and output != "error":
        logger.warning(
            "sandbox_nonzero_exit",
            returncode=process.returncode,
            stderr=stderr_str[:500],
        )
        raise CodeExecutionError(f"Subprocess exited with code {process.returncode}: {stderr_str[:500]}")

    metrics = ResourceMetrics(
        cpu_time_ms=0.0,  # Not measurable in basic subprocess mode
        memory_peak_mb=0.0,  # Not measurable in basic subprocess mode
        wall_time_ms=wall_time,
    )

    logger.info("sandbox_execution_complete", wall_time_ms=wall_time, has_return_value=return_value is not None)

    return SandboxResult(
        output=output,
        return_value=return_value,
        metrics=metrics,
        stdout=user_stdout,
        stderr=stderr_str,
    )

__aenter__ async

__aenter__() -> ProcessSandbox

Enter the sandbox context.

Source code in src/agenticapi/harness/sandbox/process.py
async def __aenter__(self) -> ProcessSandbox:
    """Enter the sandbox context."""
    return self

__aexit__ async

__aexit__(exc_type: Any, exc_val: Any, exc_tb: Any) -> None

Exit the sandbox context.

Source code in src/agenticapi/harness/sandbox/process.py
async def __aexit__(self, exc_type: Any, exc_val: Any, exc_tb: Any) -> None:
    """Exit the sandbox context."""

ResourceLimits

ResourceLimits dataclass

Resource limits for sandbox execution.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| max_cpu_seconds | float | Maximum CPU time allowed in seconds. |
| max_memory_mb | int | Maximum memory usage in megabytes. |
| max_execution_time_seconds | float | Maximum wall-clock time in seconds. |

Source code in src/agenticapi/harness/sandbox/base.py
@dataclass(frozen=True, slots=True)
class ResourceLimits:
    """Resource limits for sandbox execution.

    Attributes:
        max_cpu_seconds: Maximum CPU time allowed in seconds.
        max_memory_mb: Maximum memory usage in megabytes.
        max_execution_time_seconds: Maximum wall-clock time in seconds.
    """

    max_cpu_seconds: float = 30.0
    max_memory_mb: int = 512
    max_execution_time_seconds: float = 60.0

SandboxResult

SandboxResult dataclass

Result of sandbox code execution.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| output | Any | The primary output of the executed code. |
| return_value | Any | The return value of the executed code. |
| metrics | ResourceMetrics | Resource usage metrics from execution. |
| stdout | str | Captured standard output. |
| stderr | str | Captured standard error. |

Source code in src/agenticapi/harness/sandbox/base.py
@dataclass(slots=True)
class SandboxResult:
    """Result of sandbox code execution.

    Attributes:
        output: The primary output of the executed code.
        return_value: The return value of the executed code.
        metrics: Resource usage metrics from execution.
        stdout: Captured standard output.
        stderr: Captured standard error.
    """

    output: Any
    return_value: Any
    metrics: ResourceMetrics
    stdout: str = ""
    stderr: str = ""

ResourceMetrics

ResourceMetrics dataclass

Metrics collected during sandbox execution.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| cpu_time_ms | float | CPU time consumed in milliseconds. |
| memory_peak_mb | float | Peak memory usage in megabytes. |
| wall_time_ms | float | Wall-clock time in milliseconds. |

Source code in src/agenticapi/harness/sandbox/base.py
@dataclass(frozen=True, slots=True)
class ResourceMetrics:
    """Metrics collected during sandbox execution.

    Attributes:
        cpu_time_ms: CPU time consumed in milliseconds.
        memory_peak_mb: Peak memory usage in megabytes.
        wall_time_ms: Wall-clock time in milliseconds.
    """

    cpu_time_ms: float
    memory_peak_mb: float
    wall_time_ms: float

Static Analysis

check_code_safety

check_code_safety(
    code: str,
    *,
    allowed_modules: list[str] | None = None,
    denied_modules: list[str] | None = None,
    deny_eval_exec: bool = True,
    deny_dynamic_import: bool = True,
) -> SafetyResult

Check generated code safety using AST analysis.

Parses the code into an AST and walks all nodes to detect dangerous patterns. Returns a SafetyResult indicating whether the code is safe to execute.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| code | str | Python source code to analyze. | required |
| allowed_modules | list[str] \| None | Whitelist of allowed modules (if provided, only these modules may be imported). | None |
| denied_modules | list[str] \| None | Blacklist of denied modules. | None |
| deny_eval_exec | bool | Whether to flag eval()/exec() as violations. | True |
| deny_dynamic_import | bool | Whether to flag __import__() as violations. | True |

Returns:

| Type | Description |
| --- | --- |
| SafetyResult | SafetyResult with safe=True if no violations, or safe=False with a list of SafetyViolation objects. |

Source code in src/agenticapi/harness/sandbox/static_analysis.py
def check_code_safety(
    code: str,
    *,
    allowed_modules: list[str] | None = None,
    denied_modules: list[str] | None = None,
    deny_eval_exec: bool = True,
    deny_dynamic_import: bool = True,
) -> SafetyResult:
    """Check generated code safety using AST analysis.

    Parses the code into an AST and walks all nodes to detect
    dangerous patterns. Returns a SafetyResult indicating whether
    the code is safe to execute.

    Args:
        code: Python source code to analyze.
        allowed_modules: Whitelist of allowed modules (if provided,
            only these modules may be imported).
        denied_modules: Blacklist of denied modules.
        deny_eval_exec: Whether to flag eval()/exec() as violations.
        deny_dynamic_import: Whether to flag __import__() as violations.

    Returns:
        SafetyResult with safe=True if no violations, or safe=False
        with a list of SafetyViolation objects.
    """
    violations: list[SafetyViolation] = []

    try:
        tree = ast.parse(code)
    except SyntaxError as e:
        violations.append(
            SafetyViolation(
                rule="syntax_error",
                description=f"Code has syntax error: {e}",
                line=e.lineno or 0,
                col=e.offset or 0,
                severity="error",
            )
        )
        return SafetyResult(safe=False, violations=violations)

    for node in ast.walk(tree):
        _check_imports(node, violations, allowed_modules=allowed_modules, denied_modules=denied_modules)
        _check_dangerous_calls(node, violations, deny_eval_exec=deny_eval_exec, deny_dynamic_import=deny_dynamic_import)
        _check_dangerous_builtins(node, violations)
        _check_file_io(node, violations)

    has_errors = any(v.severity == "error" for v in violations)
    return SafetyResult(safe=not has_errors, violations=violations)

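The core of the AST walk can be approximated in a few lines. `mini_check` below is a simplified sketch covering only denied imports and eval/exec/__import__ calls (the library's checker also inspects builtins and file I/O), so it is an illustration of the technique, not the implementation.

```python
import ast
from dataclasses import dataclass

@dataclass(frozen=True)
class Violation:
    rule: str
    line: int

def mini_check(code: str, denied_modules=frozenset({"os", "subprocess"})):
    """Walk the AST, flagging denied imports and dynamic-execution calls."""
    violations = []
    tree = ast.parse(code)
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in denied_modules:
                    violations.append(Violation("denied_import", node.lineno))
        elif isinstance(node, ast.ImportFrom) and node.module:
            if node.module.split(".")[0] in denied_modules:
                violations.append(Violation("denied_import", node.lineno))
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in {"eval", "exec", "__import__"}:
                violations.append(Violation("dangerous_call", node.lineno))
    return (len(violations) == 0, violations)

safe, found = mini_check("import os\neval('1+1')")
print(safe, [v.rule for v in found])  # safe is False, two violations
```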
SafetyResult

SafetyResult dataclass

Result of static safety analysis.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| safe | bool | Whether the code passed all safety checks. |
| violations | list[SafetyViolation] | List of violations found (empty if safe). |

Source code in src/agenticapi/harness/sandbox/static_analysis.py
@dataclass(frozen=True, slots=True)
class SafetyResult:
    """Result of static safety analysis.

    Attributes:
        safe: Whether the code passed all safety checks.
        violations: List of violations found (empty if safe).
    """

    safe: bool
    violations: list[SafetyViolation] = field(default_factory=list)

SafetyViolation

SafetyViolation dataclass

A single safety violation detected by static analysis.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| rule | str | Identifier for the violated rule. |
| description | str | Human-readable description of the violation. |
| line | int | Line number where the violation was found. |
| col | int | Column offset where the violation was found. |
| severity | str | Severity level ("error" or "warning"). |

Source code in src/agenticapi/harness/sandbox/static_analysis.py
@dataclass(frozen=True, slots=True)
class SafetyViolation:
    """A single safety violation detected by static analysis.

    Attributes:
        rule: Identifier for the violated rule.
        description: Human-readable description of the violation.
        line: Line number where the violation was found.
        col: Column offset where the violation was found.
        severity: Severity level ("error" or "warning").
    """

    rule: str
    description: str
    line: int
    col: int
    severity: str  # "error" | "warning"

Monitors

ResourceMonitor

Monitors resource usage against configured limits.

Checks that CPU time, memory, and wall time stayed within the configured resource limits.

Example

monitor = ResourceMonitor(limits=ResourceLimits(max_cpu_seconds=10))
result = await monitor.on_execution_complete(sandbox_result, code="...")

Source code in src/agenticapi/harness/sandbox/monitors.py
class ResourceMonitor:
    """Monitors resource usage against configured limits.

    Checks that CPU time, memory, and wall time stayed within
    the configured resource limits.

    Example:
        monitor = ResourceMonitor(limits=ResourceLimits(max_cpu_seconds=10))
        result = await monitor.on_execution_complete(sandbox_result, code="...")
    """

    def __init__(self, *, limits: ResourceLimits) -> None:
        """Initialize with resource limits to check against.

        Args:
            limits: The resource limits to enforce.
        """
        self._limits = limits

    async def on_execution_complete(
        self,
        result: SandboxResult,
        *,
        code: str,
    ) -> MonitorResult:
        """Check resource usage against limits.

        Args:
            result: The sandbox execution result.
            code: The code that was executed.

        Returns:
            MonitorResult with violations if limits exceeded.
        """
        violations: list[str] = []
        warnings: list[str] = []

        metrics = result.metrics

        cpu_limit_ms = self._limits.max_cpu_seconds * 1000
        if metrics.cpu_time_ms > cpu_limit_ms:
            violations.append(f"CPU time {metrics.cpu_time_ms:.1f}ms exceeded limit of {cpu_limit_ms:.1f}ms")
        elif metrics.cpu_time_ms > cpu_limit_ms * 0.8:
            warnings.append(f"CPU time {metrics.cpu_time_ms:.1f}ms approaching limit of {cpu_limit_ms:.1f}ms")

        if metrics.memory_peak_mb > self._limits.max_memory_mb:
            violations.append(f"Memory {metrics.memory_peak_mb:.1f}MB exceeded limit of {self._limits.max_memory_mb}MB")

        wall_limit_ms = self._limits.max_execution_time_seconds * 1000
        if metrics.wall_time_ms > wall_limit_ms:
            violations.append(f"Wall time {metrics.wall_time_ms:.1f}ms exceeded limit of {wall_limit_ms:.1f}ms")

        passed = len(violations) == 0

        if not passed:
            logger.warning(
                "resource_monitor_violation",
                violations=violations,
                cpu_time_ms=metrics.cpu_time_ms,
                memory_peak_mb=metrics.memory_peak_mb,
                wall_time_ms=metrics.wall_time_ms,
            )

        return MonitorResult(passed=passed, warnings=warnings, violations=violations)

__init__

__init__(*, limits: ResourceLimits) -> None

Initialize with resource limits to check against.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| limits | ResourceLimits | The resource limits to enforce. | required |

Source code in src/agenticapi/harness/sandbox/monitors.py
def __init__(self, *, limits: ResourceLimits) -> None:
    """Initialize with resource limits to check against.

    Args:
        limits: The resource limits to enforce.
    """
    self._limits = limits

on_execution_complete async

on_execution_complete(
    result: SandboxResult, *, code: str
) -> MonitorResult

Check resource usage against limits.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| result | SandboxResult | The sandbox execution result. | required |
| code | str | The code that was executed. | required |

Returns:

| Type | Description |
| --- | --- |
| MonitorResult | MonitorResult with violations if limits exceeded. |

Source code in src/agenticapi/harness/sandbox/monitors.py
async def on_execution_complete(
    self,
    result: SandboxResult,
    *,
    code: str,
) -> MonitorResult:
    """Check resource usage against limits.

    Args:
        result: The sandbox execution result.
        code: The code that was executed.

    Returns:
        MonitorResult with violations if limits exceeded.
    """
    violations: list[str] = []
    warnings: list[str] = []

    metrics = result.metrics

    cpu_limit_ms = self._limits.max_cpu_seconds * 1000
    if metrics.cpu_time_ms > cpu_limit_ms:
        violations.append(f"CPU time {metrics.cpu_time_ms:.1f}ms exceeded limit of {cpu_limit_ms:.1f}ms")
    elif metrics.cpu_time_ms > cpu_limit_ms * 0.8:
        warnings.append(f"CPU time {metrics.cpu_time_ms:.1f}ms approaching limit of {cpu_limit_ms:.1f}ms")

    if metrics.memory_peak_mb > self._limits.max_memory_mb:
        violations.append(f"Memory {metrics.memory_peak_mb:.1f}MB exceeded limit of {self._limits.max_memory_mb}MB")

    wall_limit_ms = self._limits.max_execution_time_seconds * 1000
    if metrics.wall_time_ms > wall_limit_ms:
        violations.append(f"Wall time {metrics.wall_time_ms:.1f}ms exceeded limit of {wall_limit_ms:.1f}ms")

    passed = len(violations) == 0

    if not passed:
        logger.warning(
            "resource_monitor_violation",
            violations=violations,
            cpu_time_ms=metrics.cpu_time_ms,
            memory_peak_mb=metrics.memory_peak_mb,
            wall_time_ms=metrics.wall_time_ms,
        )

    return MonitorResult(passed=passed, warnings=warnings, violations=violations)

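The split used above, hard violation past the limit and soft warning past 80% of it, reduces to a small comparison. `classify` is an illustrative helper for this sketch, not a library function.

```python
# Sketch of the monitor threshold logic: values past the limit are
# violations; values past 80% of the limit are warnings; else OK.
def classify(value_ms: float, limit_ms: float) -> str:
    if value_ms > limit_ms:
        return "violation"
    if value_ms > limit_ms * 0.8:
        return "warning"
    return "ok"

print(classify(9_000.0, 10_000.0))   # warning: above the 80% threshold
print(classify(12_000.0, 10_000.0))  # violation: above the limit itself
print(classify(100.0, 10_000.0))     # ok
```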
OutputSizeMonitor

Monitors output size to prevent memory issues.

Checks that the combined size of stdout, stderr, and return value does not exceed a configurable limit.

Example

monitor = OutputSizeMonitor(max_output_bytes=1_000_000)
result = await monitor.on_execution_complete(sandbox_result, code="...")

Source code in src/agenticapi/harness/sandbox/monitors.py
class OutputSizeMonitor:
    """Monitors output size to prevent memory issues.

    Checks that the combined size of stdout, stderr, and return value
    does not exceed a configurable limit.

    Example:
        monitor = OutputSizeMonitor(max_output_bytes=1_000_000)
        result = await monitor.on_execution_complete(sandbox_result, code="...")
    """

    def __init__(self, *, max_output_bytes: int = 1_000_000) -> None:
        """Initialize with maximum output size.

        Args:
            max_output_bytes: Maximum allowed output size in bytes.
        """
        self._max_output_bytes = max_output_bytes

    async def on_execution_complete(
        self,
        result: SandboxResult,
        *,
        code: str,
    ) -> MonitorResult:
        """Check output size against limit.

        Args:
            result: The sandbox execution result.
            code: The code that was executed.

        Returns:
            MonitorResult with violations if output too large.
        """
        total_size = sys.getsizeof(result.stdout) + sys.getsizeof(result.stderr)

        try:
            total_size += len(json.dumps(result.return_value, default=str).encode())
        except (TypeError, ValueError):
            total_size += sys.getsizeof(result.return_value)

        violations: list[str] = []
        warnings: list[str] = []

        if total_size > self._max_output_bytes:
            violations.append(f"Output size {total_size} bytes exceeded limit of {self._max_output_bytes} bytes")
        elif total_size > self._max_output_bytes * 0.8:
            warnings.append(f"Output size {total_size} bytes approaching limit of {self._max_output_bytes} bytes")

        passed = len(violations) == 0

        if not passed:
            logger.warning("output_size_monitor_violation", total_size=total_size)

        return MonitorResult(passed=passed, warnings=warnings, violations=violations)

__init__

__init__(*, max_output_bytes: int = 1000000) -> None

Initialize with maximum output size.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| max_output_bytes | int | Maximum allowed output size in bytes. | 1000000 |
Source code in src/agenticapi/harness/sandbox/monitors.py
def __init__(self, *, max_output_bytes: int = 1_000_000) -> None:
    """Initialize with maximum output size.

    Args:
        max_output_bytes: Maximum allowed output size in bytes.
    """
    self._max_output_bytes = max_output_bytes

on_execution_complete async

on_execution_complete(
    result: SandboxResult, *, code: str
) -> MonitorResult

Check output size against limit.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| result | SandboxResult | The sandbox execution result. | required |
| code | str | The code that was executed. | required |

Returns:

| Type | Description |
| --- | --- |
| MonitorResult | MonitorResult with violations if output too large. |

Source code in src/agenticapi/harness/sandbox/monitors.py
async def on_execution_complete(
    self,
    result: SandboxResult,
    *,
    code: str,
) -> MonitorResult:
    """Check output size against limit.

    Args:
        result: The sandbox execution result.
        code: The code that was executed.

    Returns:
        MonitorResult with violations if output too large.
    """
    total_size = sys.getsizeof(result.stdout) + sys.getsizeof(result.stderr)

    try:
        total_size += len(json.dumps(result.return_value, default=str).encode())
    except (TypeError, ValueError):
        total_size += sys.getsizeof(result.return_value)

    violations: list[str] = []
    warnings: list[str] = []

    if total_size > self._max_output_bytes:
        violations.append(f"Output size {total_size} bytes exceeded limit of {self._max_output_bytes} bytes")
    elif total_size > self._max_output_bytes * 0.8:
        warnings.append(f"Output size {total_size} bytes approaching limit of {self._max_output_bytes} bytes")

    passed = len(violations) == 0

    if not passed:
        logger.warning("output_size_monitor_violation", total_size=total_size)

    return MonitorResult(passed=passed, warnings=warnings, violations=violations)
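
The size check above can be exercised without the harness. The sketch below mirrors the monitor's logic on plain values — `SandboxResult` and `MonitorResult` are replaced by bare arguments and a tuple, so this is an illustrative sketch, not the library API:

```python
import json
import sys

def check_output_size(stdout, stderr, return_value, *, max_output_bytes=1_000_000):
    """Sketch of OutputSizeMonitor.on_execution_complete on plain values."""
    # Python object sizes for the streams, plus the JSON-encoded byte
    # length of the return value (falling back to object size if it
    # cannot be serialized).
    total_size = sys.getsizeof(stdout) + sys.getsizeof(stderr)
    try:
        total_size += len(json.dumps(return_value, default=str).encode())
    except (TypeError, ValueError):
        total_size += sys.getsizeof(return_value)

    violations: list[str] = []
    warnings: list[str] = []
    if total_size > max_output_bytes:
        violations.append(f"Output size {total_size} bytes exceeded limit of {max_output_bytes} bytes")
    elif total_size > max_output_bytes * 0.8:
        warnings.append(f"Output size {total_size} bytes approaching limit of {max_output_bytes} bytes")
    return len(violations) == 0, warnings, violations
```

Note the two-tier response: the warning band starts at 80% of the limit, so callers can react before the hard violation fires.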

Validators

OutputTypeValidator

Validates that execution output is JSON-serializable.

Ensures the return value can be safely serialized for API responses.

Example

validator = OutputTypeValidator()
result = await validator.validate(sandbox_result, code="...", intent_action="read")

Source code in src/agenticapi/harness/sandbox/validators.py
class OutputTypeValidator:
    """Validates that execution output is JSON-serializable.

    Ensures the return value can be safely serialized for API responses.

    Example:
        validator = OutputTypeValidator()
        result = await validator.validate(sandbox_result, code="...", intent_action="read")
    """

    async def validate(
        self,
        result: SandboxResult,
        *,
        code: str,
        intent_action: str,
    ) -> ValidationResult:
        """Check that the return value is JSON-serializable.

        Args:
            result: The sandbox execution result.
            code: The code that was executed.
            intent_action: The intent action type.

        Returns:
            ValidationResult with errors if output cannot be serialized.
        """
        if result.return_value is None:
            return ValidationResult(valid=True)

        try:
            json.dumps(result.return_value, default=str)
            return ValidationResult(valid=True)
        except (TypeError, ValueError, OverflowError) as exc:
            logger.warning(
                "output_type_validation_failed",
                error=str(exc),
            )
            return ValidationResult(
                valid=False,
                errors=[f"Return value is not JSON-serializable: {exc}"],
            )

validate async

validate(
    result: SandboxResult, *, code: str, intent_action: str
) -> ValidationResult

Check that the return value is JSON-serializable.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| result | SandboxResult | The sandbox execution result. | required |
| code | str | The code that was executed. | required |
| intent_action | str | The intent action type. | required |

Returns:

| Type | Description |
| --- | --- |
| ValidationResult | ValidationResult with errors if output cannot be serialized. |

Source code in src/agenticapi/harness/sandbox/validators.py
async def validate(
    self,
    result: SandboxResult,
    *,
    code: str,
    intent_action: str,
) -> ValidationResult:
    """Check that the return value is JSON-serializable.

    Args:
        result: The sandbox execution result.
        code: The code that was executed.
        intent_action: The intent action type.

    Returns:
        ValidationResult with errors if output cannot be serialized.
    """
    if result.return_value is None:
        return ValidationResult(valid=True)

    try:
        json.dumps(result.return_value, default=str)
        return ValidationResult(valid=True)
    except (TypeError, ValueError, OverflowError) as exc:
        logger.warning(
            "output_type_validation_failed",
            error=str(exc),
        )
        return ValidationResult(
            valid=False,
            errors=[f"Return value is not JSON-serializable: {exc}"],
        )

ReadOnlyValidator

Validates that read intents did not produce write-like output.

Checks stderr and stdout for patterns that suggest write operations occurred during what should have been a read-only operation.

Example

validator = ReadOnlyValidator()
result = await validator.validate(sandbox_result, code="...", intent_action="read")

Source code in src/agenticapi/harness/sandbox/validators.py
class ReadOnlyValidator:
    """Validates that read intents did not produce write-like output.

    Checks stderr and stdout for patterns that suggest write operations
    occurred during what should have been a read-only operation.

    Example:
        validator = ReadOnlyValidator()
        result = await validator.validate(sandbox_result, code="...", intent_action="read")
    """

    WRITE_PATTERNS: ClassVar[list[str]] = [
        "INSERT INTO",
        "UPDATE ",
        "DELETE FROM",
        "DROP TABLE",
        "ALTER TABLE",
        "CREATE TABLE",
        "TRUNCATE",
    ]

    async def validate(
        self,
        result: SandboxResult,
        *,
        code: str,
        intent_action: str,
    ) -> ValidationResult:
        """Check for write patterns in read-only operations.

        Only validates when intent_action is "read". Other actions
        are allowed to have write-like output.

        Args:
            result: The sandbox execution result.
            code: The code that was executed.
            intent_action: The intent action type.

        Returns:
            ValidationResult with warnings if write patterns detected.
        """
        if intent_action != "read":
            return ValidationResult(valid=True)

        warnings: list[str] = []
        combined_output = f"{result.stdout} {result.stderr}".upper()

        for pattern in self.WRITE_PATTERNS:
            if pattern in combined_output:
                warnings.append(f"Read-only operation produced write-like output containing '{pattern}'")

        if warnings:
            logger.warning(
                "read_only_validation_warning",
                warnings=warnings,
                intent_action=intent_action,
            )

        return ValidationResult(valid=True, warnings=warnings)

validate async

validate(
    result: SandboxResult, *, code: str, intent_action: str
) -> ValidationResult

Check for write patterns in read-only operations.

Only validates when intent_action is "read". Other actions are allowed to have write-like output.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| result | SandboxResult | The sandbox execution result. | required |
| code | str | The code that was executed. | required |
| intent_action | str | The intent action type. | required |

Returns:

| Type | Description |
| --- | --- |
| ValidationResult | ValidationResult with warnings if write patterns detected. |

Source code in src/agenticapi/harness/sandbox/validators.py
async def validate(
    self,
    result: SandboxResult,
    *,
    code: str,
    intent_action: str,
) -> ValidationResult:
    """Check for write patterns in read-only operations.

    Only validates when intent_action is "read". Other actions
    are allowed to have write-like output.

    Args:
        result: The sandbox execution result.
        code: The code that was executed.
        intent_action: The intent action type.

    Returns:
        ValidationResult with warnings if write patterns detected.
    """
    if intent_action != "read":
        return ValidationResult(valid=True)

    warnings: list[str] = []
    combined_output = f"{result.stdout} {result.stderr}".upper()

    for pattern in self.WRITE_PATTERNS:
        if pattern in combined_output:
            warnings.append(f"Read-only operation produced write-like output containing '{pattern}'")

    if warnings:
        logger.warning(
            "read_only_validation_warning",
            warnings=warnings,
            intent_action=intent_action,
        )

    return ValidationResult(valid=True, warnings=warnings)
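
The pattern scan boils down to an uppercase substring search over the combined streams. A standalone sketch of that core, reusing the same WRITE_PATTERNS list (a plain list of matched patterns stands in for `ValidationResult` here):

```python
WRITE_PATTERNS = [
    "INSERT INTO", "UPDATE ", "DELETE FROM", "DROP TABLE",
    "ALTER TABLE", "CREATE TABLE", "TRUNCATE",
]

def scan_read_only_output(stdout, stderr, intent_action):
    """Sketch of ReadOnlyValidator.validate: only 'read' intents are
    scanned, and matches surface as warnings, never hard failures."""
    if intent_action != "read":
        return []
    combined = f"{stdout} {stderr}".upper()
    return [p for p in WRITE_PATTERNS if p in combined]
```

Two details carry over from the source: uppercasing the output makes the match case-insensitive (`delete from` and `DELETE FROM` are treated alike), and the trailing space in `"UPDATE "` keeps the lone word `UPDATED` from triggering a false match.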