LLM Backends

LLMBackend (Protocol)

LLMBackend

Bases: Protocol

Protocol for LLM backend implementations.

Using Protocol (structural subtyping) so that third-party LLM wrapper libraries can be used without depending on AgenticAPI.

Source code in src/agenticapi/runtime/llm/base.py
@runtime_checkable
class LLMBackend(Protocol):
    """Protocol for LLM backend implementations.

    Using Protocol (structural subtyping) so that third-party LLM wrapper
    libraries can be used without depending on AgenticAPI.
    """

    async def generate(self, prompt: LLMPrompt) -> LLMResponse:
        """Send a prompt and receive a complete response.

        Args:
            prompt: The LLM prompt to process.

        Returns:
            The complete LLM response.
        """
        ...

    async def generate_stream(self, prompt: LLMPrompt) -> AsyncIterator[LLMChunk]:
        """Send a prompt and receive a streaming response.

        Args:
            prompt: The LLM prompt to process.

        Yields:
            Chunks of the response as they are generated.
        """
        ...

    @property
    def model_name(self) -> str:
        """The name of the model being used."""
        ...

model_name property

model_name: str

The name of the model being used.

generate async

generate(prompt: LLMPrompt) -> LLMResponse

Send a prompt and receive a complete response.

Parameters:

Name Type Description Default
prompt LLMPrompt

The LLM prompt to process.

required

Returns:

Type Description
LLMResponse

The complete LLM response.

Source code in src/agenticapi/runtime/llm/base.py
async def generate(self, prompt: LLMPrompt) -> LLMResponse:
    """Send a prompt and receive a complete response.

    Args:
        prompt: The LLM prompt to process.

    Returns:
        The complete LLM response.
    """
    ...

generate_stream async

generate_stream(
    prompt: LLMPrompt,
) -> AsyncIterator[LLMChunk]

Send a prompt and receive a streaming response.

Parameters:

Name Type Description Default
prompt LLMPrompt

The LLM prompt to process.

required

Yields:

Type Description
AsyncIterator[LLMChunk]

Chunks of the response as they are generated.

Source code in src/agenticapi/runtime/llm/base.py
async def generate_stream(self, prompt: LLMPrompt) -> AsyncIterator[LLMChunk]:
    """Send a prompt and receive a streaming response.

    Args:
        prompt: The LLM prompt to process.

    Yields:
        Chunks of the response as they are generated.
    """
    ...
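
Because LLMBackend is a runtime-checkable Protocol, any object that provides generate, generate_stream, and model_name satisfies it structurally; no AgenticAPI base class needs to be inherited. The following is a minimal sketch of a custom backend, assuming the types are importable from agenticapi.runtime.llm.base (the module path shown in the source notes); the EchoBackend class itself is hypothetical and only illustrates the required surface.

from collections.abc import AsyncIterator

from agenticapi.runtime.llm.base import (  # assumed import path
    LLMBackend,
    LLMChunk,
    LLMPrompt,
    LLMResponse,
)


class EchoBackend:
    """Toy backend that echoes the last user message back."""

    @property
    def model_name(self) -> str:
        return "echo-v0"

    async def generate(self, prompt: LLMPrompt) -> LLMResponse:
        last_user = next((m.content for m in reversed(prompt.messages) if m.role == "user"), "")
        return LLMResponse(content=last_user, model=self.model_name)

    async def generate_stream(self, prompt: LLMPrompt) -> AsyncIterator[LLMChunk]:
        response = await self.generate(prompt)
        yield LLMChunk(content=response.content, is_final=False)
        yield LLMChunk(content="", is_final=True)


# Structural check succeeds because the protocol is @runtime_checkable.
assert isinstance(EchoBackend(), LLMBackend)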

Data Classes

LLMPrompt dataclass

A complete prompt to send to an LLM backend.

Attributes:

Name Type Description
system str

The system prompt instructing the LLM's behavior.

messages list[LLMMessage]

The conversation messages.

tools list[dict[str, Any]] | None

Optional tool definitions for function calling.

max_tokens int

Maximum tokens to generate.

temperature float

Sampling temperature (0.0 = deterministic, 1.0 = creative).

response_schema dict[str, Any] | None

Optional JSON Schema (Pydantic-derived) the LLM must conform to. Backends translate this into the provider's native structured-output API (Anthropic tools + tool_choice, OpenAI response_format=json_schema, Gemini response_schema). When None, the model produces free-form text as before.

response_schema_name str | None

Optional descriptive name for the schema, used by some providers as the schema title.

tool_choice str | dict[str, str] | None

Controls how the model selects tools. Accepted values: "auto" (model decides), "required" (must call a tool), "none" (never call a tool), or a dict {"type": "tool", "name": "..."} to force a specific tool. None (default) defers to the provider's default.

Source code in src/agenticapi/runtime/llm/base.py
@dataclass(frozen=True, slots=True)
class LLMPrompt:
    """A complete prompt to send to an LLM backend.

    Attributes:
        system: The system prompt instructing the LLM's behavior.
        messages: The conversation messages.
        tools: Optional tool definitions for function calling.
        max_tokens: Maximum tokens to generate.
        temperature: Sampling temperature (0.0 = deterministic, 1.0 = creative).
        response_schema: Optional JSON Schema (Pydantic-derived) the
            LLM must conform to. Backends translate this into the
            provider's native structured-output API
            (Anthropic ``tools`` + ``tool_choice``, OpenAI
            ``response_format=json_schema``, Gemini ``response_schema``).
            When ``None``, the model produces free-form text as before.
        response_schema_name: Optional descriptive name for the
            schema, used by some providers as the schema title.
        tool_choice: Controls how the model selects tools. Accepted
            values: ``"auto"`` (model decides), ``"required"`` (must
            call a tool), ``"none"`` (never call a tool), or a dict
            ``{"type": "tool", "name": "..."}`` to force a specific
            tool. ``None`` (default) defers to the provider's default.
    """

    system: str
    messages: list[LLMMessage]
    tools: list[dict[str, Any]] | None = None
    max_tokens: int = 4096
    temperature: float = 0.1
    response_schema: dict[str, Any] | None = None
    response_schema_name: str | None = None
    tool_choice: str | dict[str, str] | None = None
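
The fields above compose directly in the constructor. A small illustrative example, assuming the dataclasses are importable from agenticapi.runtime.llm.base; the tool name, schema, and message text are invented for the sketch.

from agenticapi.runtime.llm.base import LLMMessage, LLMPrompt  # assumed import path

# Hypothetical tool definition in the framework's generic format.
weather_tool = {
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

prompt = LLMPrompt(
    system="You are a concise weather assistant.",
    messages=[LLMMessage(role="user", content="What's the weather in Oslo?")],
    tools=[weather_tool],
    tool_choice="auto",  # or "required", "none", or {"type": "tool", "name": "get_weather"}
    max_tokens=512,
    temperature=0.0,
)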

LLMMessage dataclass

A single message in an LLM conversation.

Attributes:

Name Type Description
role str

The role of the message sender ("system", "user", "assistant", or "tool").

content str

The text content of the message.

tool_call_id str | None

Provider-supplied identifier linking a role="tool" result message back to the originating tool call. Required by OpenAI, used by Anthropic for tool_result blocks. None for non-tool messages.

tool_calls list[ToolCall] | None

Tool-call requests that the LLM emitted on a role="assistant" message. Stored so that backends can reconstruct the full multi-turn conversation in the provider's native format (Anthropic tool_use content blocks, OpenAI tool_calls array, Gemini function_call parts). None for non-assistant or text-only assistant messages.

Source code in src/agenticapi/runtime/llm/base.py
@dataclass(frozen=True, slots=True)
class LLMMessage:
    """A single message in an LLM conversation.

    Attributes:
        role: The role of the message sender ("system", "user",
            "assistant", or "tool").
        content: The text content of the message.
        tool_call_id: Provider-supplied identifier linking a ``role="tool"``
            result message back to the originating tool call.  Required by
            OpenAI, used by Anthropic for ``tool_result`` blocks.  ``None``
            for non-tool messages.
        tool_calls: Tool-call requests that the LLM emitted on a
            ``role="assistant"`` message.  Stored so that backends can
            reconstruct the full multi-turn conversation in the
            provider's native format (Anthropic ``tool_use`` content
            blocks, OpenAI ``tool_calls`` array, Gemini
            ``function_call`` parts).  ``None`` for non-assistant or
            text-only assistant messages.
    """

    role: str
    content: str
    tool_call_id: str | None = None
    tool_calls: list[ToolCall] | None = None

LLMMessage carries two optional fields for multi-turn tool conversations:

  • tool_call_id: str | None — on role="tool" messages, links back to the originating tool call. Required by OpenAI, used by Anthropic for tool_result blocks.
  • tool_calls: list[ToolCall] | None — on role="assistant" messages, preserves the full tool call structure so backends can reconstruct provider-native multi-turn formats.

Both fields default to None for backward compatibility.
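
As an illustration of how those fields cooperate across turns (the call id, tool name, and payloads are invented), an assistant message that requested a tool is followed by a role="tool" message whose tool_call_id echoes the call's id:

from agenticapi.runtime.llm.base import LLMMessage, ToolCall  # assumed import path

messages = [
    LLMMessage(role="user", content="What's the weather in Oslo?"),
    # Assistant turn that requested a tool; content may be empty for a pure tool call.
    LLMMessage(
        role="assistant",
        content="",
        tool_calls=[ToolCall(id="call_123", name="get_weather", arguments={"city": "Oslo"})],
    ),
    # Tool result linked back via tool_call_id so each backend can rebuild its
    # native format (Anthropic tool_result blocks, OpenAI tool_calls array, ...).
    LLMMessage(role="tool", content='{"temp_c": 4}', tool_call_id="call_123"),
]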

LLMResponse dataclass

A complete response from an LLM backend.

Attributes:

Name Type Description
content str

The generated text content. Empty string when the response was a pure tool-call (no narrative text).

reasoning str | None

Optional chain-of-thought reasoning (if supported by model).

confidence float

Estimated confidence in the response (0.0-1.0).

usage LLMUsage

Token usage statistics.

model str

The model identifier that generated this response.

tool_calls list[ToolCall]

Phase E3 — native function-call requests from the model. Empty list when the model produced text instead of (or in addition to) calling a tool. Populated by every backend that supports function calling: Anthropic, OpenAI, Gemini, Mock.

finish_reason str | None

Why the model stopped generating. One of "stop", "length", "tool_calls", "content_filter", or backend-specific values. None for backends that don't expose this.

Source code in src/agenticapi/runtime/llm/base.py
@dataclass(frozen=True, slots=True)
class LLMResponse:
    """A complete response from an LLM backend.

    Attributes:
        content: The generated text content. Empty string when the
            response was a pure tool-call (no narrative text).
        reasoning: Optional chain-of-thought reasoning (if supported by model).
        confidence: Estimated confidence in the response (0.0-1.0).
        usage: Token usage statistics.
        model: The model identifier that generated this response.
        tool_calls: Phase E3 — native function-call requests from the
            model. Empty list when the model produced text instead of
            (or in addition to) calling a tool. Populated by every
            backend that supports function calling: Anthropic, OpenAI,
            Gemini, Mock.
        finish_reason: Why the model stopped generating. One of
            ``"stop"``, ``"length"``, ``"tool_calls"``, ``"content_filter"``,
            or backend-specific values. ``None`` for backends that
            don't expose this.
    """

    content: str
    reasoning: str | None = None
    confidence: float = 1.0
    usage: LLMUsage = field(default_factory=lambda: LLMUsage(0, 0))
    model: str = ""
    tool_calls: list[ToolCall] = field(default_factory=list)
    finish_reason: str | None = None

LLMResponse carries two fields that drive native function calling:

  • tool_calls: list[ToolCall] — structured function-call requests returned by the model. Empty for plain text completions.
  • finish_reason: str | None — why generation stopped. Typical values: "stop", "length", "tool_calls", "content_filter". None for backends that don't report it.

All four backends (Anthropic, OpenAI, Gemini, Mock) fully populate these fields. Each real backend parses its provider's native response format into ToolCall objects and maps stop reasons to normalized finish_reason values.
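
A typical consumer branches on tool_calls and finish_reason before touching content. A minimal sketch, assuming any LLMBackend implementation and an already-built LLMPrompt; the dispatch step is left as a placeholder print.

from agenticapi.runtime.llm.base import LLMBackend, LLMPrompt  # assumed import path


async def handle(backend: LLMBackend, prompt: LLMPrompt) -> None:
    response = await backend.generate(prompt)

    if response.tool_calls:
        for call in response.tool_calls:
            # Validate call.arguments and dispatch to the registered tool here.
            print(f"model requested {call.name}({call.arguments}) [id={call.id}]")
    elif response.finish_reason == "length":
        print("response truncated; consider raising max_tokens")
    else:
        print(response.content)

    print(f"tokens: {response.usage.input_tokens} in / {response.usage.output_tokens} out")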

ToolCall dataclass

A single native function-call request from an LLM (Phase E3).

Modern LLM APIs (Anthropic tools/tool_choice, OpenAI tools, Gemini function_declarations) emit structured function-call objects when they want a tool invoked instead of producing free-form Python code. This dataclass is the framework-agnostic representation of one such call.

The LLMBackend protocol promises to populate LLMResponse.tool_calls with one entry per requested invocation. Downstream consumers (the harness's tool-first path in Phase E4) iterate the list, validate the arguments against the registered tool's Pydantic schema, and dispatch to the tool with cost / latency / reliability all dramatically better than going through code generation + sandbox execution.

Attributes:

Name Type Description
id str

Provider-supplied identifier for this call. Echoed back in the tool result so multi-call exchanges stay in sync.

name str

The tool name the model wants to invoke. Resolved against the registered ToolRegistry.

arguments dict[str, Any]

The keyword arguments the model produced for the tool. Always a dict; the framework validates it through the tool's Pydantic input model before dispatching.

Source code in src/agenticapi/runtime/llm/base.py
@dataclass(frozen=True, slots=True)
class ToolCall:
    """A single native function-call request from an LLM (Phase E3).

    Modern LLM APIs (Anthropic ``tools``/``tool_choice``, OpenAI
    ``tools``, Gemini ``function_declarations``) emit structured
    function-call objects when they want a tool invoked instead of
    producing free-form Python code. This dataclass is the
    framework-agnostic representation of one such call.

    The ``LLMBackend`` protocol promises to populate
    :attr:`LLMResponse.tool_calls` with one entry per requested
    invocation. Downstream consumers (the harness's tool-first path
    in Phase E4) iterate the list, validate the arguments against
    the registered tool's Pydantic schema, and dispatch to the tool
    with cost / latency / reliability all dramatically better than
    going through code generation + sandbox execution.

    Attributes:
        id: Provider-supplied identifier for this call. Echoed back
            in the tool result so multi-call exchanges stay in sync.
        name: The tool name the model wants to invoke. Resolved
            against the registered :class:`ToolRegistry`.
        arguments: The keyword arguments the model produced for the
            tool. Always a dict; the framework validates it through
            the tool's Pydantic input model before dispatching.
    """

    id: str
    name: str
    arguments: dict[str, Any]
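
The validation step mentioned above can be sketched in isolation. This assumes Pydantic v2 and an invented WeatherInput model standing in for a registered tool's input schema; it is not the framework's actual dispatch code.

from pydantic import BaseModel, ValidationError

from agenticapi.runtime.llm.base import ToolCall  # assumed import path


class WeatherInput(BaseModel):  # hypothetical input model for a "get_weather" tool
    city: str


call = ToolCall(id="call_123", name="get_weather", arguments={"city": "Oslo"})

try:
    validated = WeatherInput(**call.arguments)
except ValidationError as exc:
    # In a real loop, the error would be fed back to the model as a role="tool" message.
    print(f"invalid arguments for {call.name}: {exc}")
else:
    print(f"dispatching {call.name} with {validated.model_dump()}")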

LLMUsage dataclass

Token usage information from an LLM call.

Attributes:

Name Type Description
input_tokens int

Number of tokens in the prompt.

output_tokens int

Number of tokens in the response.

Source code in src/agenticapi/runtime/llm/base.py
@dataclass(frozen=True, slots=True)
class LLMUsage:
    """Token usage information from an LLM call.

    Attributes:
        input_tokens: Number of tokens in the prompt.
        output_tokens: Number of tokens in the response.
    """

    input_tokens: int
    output_tokens: int

LLMChunk dataclass

A single chunk from a streaming LLM response.

Attributes:

Name Type Description
content str

The text content of this chunk.

is_final bool

Whether this is the last chunk in the stream.

Source code in src/agenticapi/runtime/llm/base.py
@dataclass(frozen=True, slots=True)
class LLMChunk:
    """A single chunk from a streaming LLM response.

    Attributes:
        content: The text content of this chunk.
        is_final: Whether this is the last chunk in the stream.
    """

    content: str
    is_final: bool = False
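
Consuming a stream usually means forwarding each chunk's content until the final chunk arrives. A minimal sketch against any LLMBackend implementation (import path assumed as above):

from agenticapi.runtime.llm.base import LLMBackend, LLMPrompt  # assumed import path


async def stream_to_stdout(backend: LLMBackend, prompt: LLMPrompt) -> None:
    async for chunk in backend.generate_stream(prompt):
        if chunk.is_final:
            break
        print(chunk.content, end="", flush=True)
    print()  # newline once the terminating chunk is seen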

AnthropicBackend

AnthropicBackend

LLM backend using the Anthropic API (Claude models).

Uses anthropic.AsyncAnthropic for async communication with the Anthropic API. Supports both complete and streaming generation, native function calling via tool_use content blocks, and automatic retry on transient errors.

Example

backend = AnthropicBackend(model="claude-sonnet-4-6")
response = await backend.generate(prompt)

Source code in src/agenticapi/runtime/llm/anthropic.py
class AnthropicBackend:
    """LLM backend using the Anthropic API (Claude models).

    Uses anthropic.AsyncAnthropic for async communication with the
    Anthropic API. Supports both complete and streaming generation,
    native function calling via ``tool_use`` content blocks, and
    automatic retry on transient errors.

    Example:
        backend = AnthropicBackend(model="claude-sonnet-4-6")
        response = await backend.generate(prompt)
    """

    def __init__(
        self,
        *,
        model: str = "claude-sonnet-4-6",
        api_key: str | None = None,
        max_tokens: int = 4096,
        timeout: float = 120.0,
        retry: RetryConfig | None = None,
    ) -> None:
        """Initialize the Anthropic backend.

        Args:
            model: The Anthropic model identifier to use.
            api_key: Anthropic API key. Falls back to ANTHROPIC_API_KEY env var.
            max_tokens: Default maximum tokens for generation.
            timeout: API call timeout in seconds.
            retry: Optional retry configuration for transient failures.
        """
        try:
            import anthropic
        except ImportError as exc:
            raise ImportError(
                "The 'anthropic' package is required for AnthropicBackend. Install it with: pip install anthropic"
            ) from exc

        resolved_key = api_key or os.environ.get("ANTHROPIC_API_KEY")
        if not resolved_key:
            raise ValueError("Anthropic API key must be provided via api_key parameter or ANTHROPIC_API_KEY env var")

        self._model = model
        self._max_tokens = max_tokens
        self._timeout = timeout
        self._client = anthropic.AsyncAnthropic(api_key=resolved_key, timeout=timeout)

        if retry is None:
            self._retry = RetryConfig(
                max_retries=3,
                retryable_exceptions=(
                    anthropic.RateLimitError,
                    anthropic.APITimeoutError,
                    anthropic.InternalServerError,
                ),
            )
        else:
            self._retry = retry

    @property
    def model_name(self) -> str:
        """The name of the Anthropic model being used."""
        return self._model

    async def generate(self, prompt: LLMPrompt) -> LLMResponse:
        """Send a prompt to the Anthropic API and return a complete response.

        Args:
            prompt: The LLM prompt to process.

        Returns:
            The complete LLM response with content and usage statistics.

        Raises:
            CodeGenerationError: If the API call fails.
        """
        try:
            kwargs = self._build_request_kwargs(prompt)
            message: Any = await with_retry(self._client.messages.create, self._retry, **kwargs)
            return self._build_response(message)

        except Exception as exc:
            if isinstance(exc, CodeGenerationError):
                raise
            logger.error("anthropic_generate_failed", error=str(exc), model=self._model)
            raise CodeGenerationError(f"Anthropic API call failed: {exc}") from exc

    async def generate_stream(self, prompt: LLMPrompt) -> AsyncIterator[LLMChunk]:
        """Stream a response from the Anthropic API.

        Args:
            prompt: The LLM prompt to process.

        Yields:
            LLMChunk objects as response tokens are generated.

        Raises:
            CodeGenerationError: If the API call fails.
        """
        try:
            kwargs = self._build_request_kwargs(prompt)

            async with self._client.messages.stream(**kwargs) as stream:
                async for text in stream.text_stream:
                    yield LLMChunk(content=text, is_final=False)

            yield LLMChunk(content="", is_final=True)

            logger.info("anthropic_stream_complete", model=self._model)

        except Exception as exc:
            if isinstance(exc, CodeGenerationError):
                raise
            logger.error("anthropic_stream_failed", error=str(exc), model=self._model)
            raise CodeGenerationError(f"Anthropic streaming API call failed: {exc}") from exc

    def _build_request_kwargs(self, prompt: LLMPrompt) -> dict[str, Any]:
        """Build keyword arguments for the Anthropic API call.

        Translates the framework's generic message and tool formats
        into the Anthropic-specific wire format:

        - Tool definitions use ``input_schema`` (not ``parameters``).
        - Assistant messages with ``tool_calls`` become content blocks
          containing ``tool_use`` entries.
        - Tool-result messages (``role="tool"``) become ``user``
          messages with ``tool_result`` content blocks keyed by
          ``tool_use_id``.

        Args:
            prompt: The LLM prompt to convert.

        Returns:
            Dictionary of keyword arguments for messages.create().
        """
        messages: list[dict[str, Any]] = []
        for msg in prompt.messages:
            if msg.role == "system":
                continue
            if msg.role == "assistant" and msg.tool_calls:
                # Anthropic expects tool_use content blocks on assistant
                # messages that requested tool calls.
                blocks: list[dict[str, Any]] = []
                if msg.content:
                    blocks.append({"type": "text", "text": msg.content})
                for tc in msg.tool_calls:
                    blocks.append(
                        {
                            "type": "tool_use",
                            "id": tc.id,
                            "name": tc.name,
                            "input": tc.arguments,
                        }
                    )
                messages.append({"role": "assistant", "content": blocks})
            elif msg.role == "tool" and msg.tool_call_id:
                # Anthropic expects tool results as user messages with
                # tool_result content blocks.
                messages.append(
                    {
                        "role": "user",
                        "content": [
                            {
                                "type": "tool_result",
                                "tool_use_id": msg.tool_call_id,
                                "content": msg.content,
                            }
                        ],
                    }
                )
            else:
                messages.append({"role": msg.role, "content": msg.content})

        kwargs: dict[str, Any] = {
            "model": self._model,
            "max_tokens": prompt.max_tokens or self._max_tokens,
            "system": prompt.system,
            "messages": messages,
            "temperature": prompt.temperature,
        }

        if prompt.tools:
            kwargs["tools"] = [self._normalize_tool(t) for t in prompt.tools]

        if prompt.tool_choice is not None and prompt.tools:
            if isinstance(prompt.tool_choice, dict):
                kwargs["tool_choice"] = prompt.tool_choice
            elif prompt.tool_choice == "auto":
                kwargs["tool_choice"] = {"type": "auto"}
            elif prompt.tool_choice == "required":
                kwargs["tool_choice"] = {"type": "any"}
            elif prompt.tool_choice == "none":
                # Anthropic doesn't have a "none" tool_choice — omit tools.
                kwargs.pop("tools", None)

        return kwargs

    @staticmethod
    def _normalize_tool(t: dict[str, Any]) -> dict[str, Any]:
        """Normalize a tool definition to Anthropic format.

        Accepts the framework's generic format (``"parameters"`` key),
        the Anthropic format (``"input_schema"`` key), or both.
        Always produces ``{"name", "description", "input_schema"}``.
        When both keys are present, ``input_schema`` takes precedence.
        """
        return {
            "name": t["name"],
            "description": t.get("description", ""),
            "input_schema": t.get("input_schema", t.get("parameters", {})),
        }

    def _build_response(self, message: Any) -> LLMResponse:
        """Build an LLMResponse from an Anthropic message object.

        Extracts text content, tool_use blocks, finish_reason, and
        usage statistics.

        Args:
            message: The Anthropic API message object.

        Returns:
            A fully populated LLMResponse.
        """
        text_parts: list[str] = []
        tool_calls: list[ToolCall] = []

        for block in message.content:
            if hasattr(block, "text"):
                text_parts.append(block.text)
            elif getattr(block, "type", None) == "tool_use":
                tool_calls.append(
                    ToolCall(
                        id=block.id,
                        name=block.name,
                        arguments=block.input if isinstance(block.input, dict) else {},
                    )
                )

        finish_reason: str | None = None
        stop_reason = getattr(message, "stop_reason", None)
        if stop_reason == "tool_use":
            finish_reason = "tool_calls"
        elif stop_reason == "end_turn":
            finish_reason = "stop"
        elif stop_reason == "max_tokens":
            finish_reason = "length"
        elif stop_reason is not None:
            finish_reason = str(stop_reason)

        usage = LLMUsage(
            input_tokens=message.usage.input_tokens,
            output_tokens=message.usage.output_tokens,
        )

        logger.info(
            "anthropic_generate_complete",
            model=self._model,
            input_tokens=usage.input_tokens,
            output_tokens=usage.output_tokens,
            tool_calls=len(tool_calls),
            finish_reason=finish_reason,
        )

        return LLMResponse(
            content="".join(text_parts),
            usage=usage,
            model=message.model,
            tool_calls=tool_calls,
            finish_reason=finish_reason,
        )

model_name property

model_name: str

The name of the Anthropic model being used.

__init__

__init__(
    *,
    model: str = "claude-sonnet-4-6",
    api_key: str | None = None,
    max_tokens: int = 4096,
    timeout: float = 120.0,
    retry: RetryConfig | None = None,
) -> None

Initialize the Anthropic backend.

Parameters:

Name Type Description Default
model str

The Anthropic model identifier to use.

'claude-sonnet-4-6'
api_key str | None

Anthropic API key. Falls back to ANTHROPIC_API_KEY env var.

None
max_tokens int

Default maximum tokens for generation.

4096
timeout float

API call timeout in seconds.

120.0
retry RetryConfig | None

Optional retry configuration for transient failures.

None
Source code in src/agenticapi/runtime/llm/anthropic.py
def __init__(
    self,
    *,
    model: str = "claude-sonnet-4-6",
    api_key: str | None = None,
    max_tokens: int = 4096,
    timeout: float = 120.0,
    retry: RetryConfig | None = None,
) -> None:
    """Initialize the Anthropic backend.

    Args:
        model: The Anthropic model identifier to use.
        api_key: Anthropic API key. Falls back to ANTHROPIC_API_KEY env var.
        max_tokens: Default maximum tokens for generation.
        timeout: API call timeout in seconds.
        retry: Optional retry configuration for transient failures.
    """
    try:
        import anthropic
    except ImportError as exc:
        raise ImportError(
            "The 'anthropic' package is required for AnthropicBackend. Install it with: pip install anthropic"
        ) from exc

    resolved_key = api_key or os.environ.get("ANTHROPIC_API_KEY")
    if not resolved_key:
        raise ValueError("Anthropic API key must be provided via api_key parameter or ANTHROPIC_API_KEY env var")

    self._model = model
    self._max_tokens = max_tokens
    self._timeout = timeout
    self._client = anthropic.AsyncAnthropic(api_key=resolved_key, timeout=timeout)

    if retry is None:
        self._retry = RetryConfig(
            max_retries=3,
            retryable_exceptions=(
                anthropic.RateLimitError,
                anthropic.APITimeoutError,
                anthropic.InternalServerError,
            ),
        )
    else:
        self._retry = retry

generate async

generate(prompt: LLMPrompt) -> LLMResponse

Send a prompt to the Anthropic API and return a complete response.

Parameters:

Name Type Description Default
prompt LLMPrompt

The LLM prompt to process.

required

Returns:

Type Description
LLMResponse

The complete LLM response with content and usage statistics.

Raises:

Type Description
CodeGenerationError

If the API call fails.

Source code in src/agenticapi/runtime/llm/anthropic.py
async def generate(self, prompt: LLMPrompt) -> LLMResponse:
    """Send a prompt to the Anthropic API and return a complete response.

    Args:
        prompt: The LLM prompt to process.

    Returns:
        The complete LLM response with content and usage statistics.

    Raises:
        CodeGenerationError: If the API call fails.
    """
    try:
        kwargs = self._build_request_kwargs(prompt)
        message: Any = await with_retry(self._client.messages.create, self._retry, **kwargs)
        return self._build_response(message)

    except Exception as exc:
        if isinstance(exc, CodeGenerationError):
            raise
        logger.error("anthropic_generate_failed", error=str(exc), model=self._model)
        raise CodeGenerationError(f"Anthropic API call failed: {exc}") from exc

generate_stream async

generate_stream(
    prompt: LLMPrompt,
) -> AsyncIterator[LLMChunk]

Stream a response from the Anthropic API.

Parameters:

Name Type Description Default
prompt LLMPrompt

The LLM prompt to process.

required

Yields:

Type Description
AsyncIterator[LLMChunk]

LLMChunk objects as response tokens are generated.

Raises:

Type Description
CodeGenerationError

If the API call fails.

Source code in src/agenticapi/runtime/llm/anthropic.py
async def generate_stream(self, prompt: LLMPrompt) -> AsyncIterator[LLMChunk]:
    """Stream a response from the Anthropic API.

    Args:
        prompt: The LLM prompt to process.

    Yields:
        LLMChunk objects as response tokens are generated.

    Raises:
        CodeGenerationError: If the API call fails.
    """
    try:
        kwargs = self._build_request_kwargs(prompt)

        async with self._client.messages.stream(**kwargs) as stream:
            async for text in stream.text_stream:
                yield LLMChunk(content=text, is_final=False)

        yield LLMChunk(content="", is_final=True)

        logger.info("anthropic_stream_complete", model=self._model)

    except Exception as exc:
        if isinstance(exc, CodeGenerationError):
            raise
        logger.error("anthropic_stream_failed", error=str(exc), model=self._model)
        raise CodeGenerationError(f"Anthropic streaming API call failed: {exc}") from exc

OpenAIBackend

OpenAIBackend

LLM backend using the OpenAI API (GPT models).

Uses openai.AsyncOpenAI for async communication with the OpenAI API. Supports both complete and streaming generation, native function calling, and automatic retry on transient errors.

Example

backend = OpenAIBackend(model="gpt-5.4-mini")
response = await backend.generate(prompt)

Source code in src/agenticapi/runtime/llm/openai.py
class OpenAIBackend:
    """LLM backend using the OpenAI API (GPT models).

    Uses openai.AsyncOpenAI for async communication with the
    OpenAI API. Supports both complete and streaming generation,
    native function calling, and automatic retry on transient errors.

    Example:
        backend = OpenAIBackend(model="gpt-5.4-mini")
        response = await backend.generate(prompt)
    """

    def __init__(
        self,
        *,
        model: str = "gpt-5.4-mini",
        api_key: str | None = None,
        max_tokens: int = 4096,
        timeout: float = 120.0,
        retry: RetryConfig | None = None,
    ) -> None:
        """Initialize the OpenAI backend.

        Args:
            model: The OpenAI model identifier to use.
            api_key: OpenAI API key. Falls back to OPENAI_API_KEY env var.
            max_tokens: Default maximum tokens for generation.
            timeout: API call timeout in seconds.
            retry: Optional retry configuration for transient failures.
        """
        try:
            import openai
        except ImportError as exc:
            raise ImportError(
                "The 'openai' package is required for OpenAIBackend. Install it with: pip install openai"
            ) from exc

        resolved_key = api_key or os.environ.get("OPENAI_API_KEY")
        if not resolved_key:
            raise ValueError("OpenAI API key must be provided via api_key parameter or OPENAI_API_KEY env var")

        self._model = model
        self._max_tokens = max_tokens
        self._timeout = timeout
        self._client = openai.AsyncOpenAI(api_key=resolved_key, timeout=timeout)

        if retry is None:
            self._retry = RetryConfig(
                max_retries=3,
                retryable_exceptions=(
                    openai.RateLimitError,
                    openai.APITimeoutError,
                ),
            )
        else:
            self._retry = retry

    @property
    def model_name(self) -> str:
        """The name of the OpenAI model being used."""
        return self._model

    async def generate(self, prompt: LLMPrompt) -> LLMResponse:
        """Send a prompt to the OpenAI API and return a complete response.

        Args:
            prompt: The LLM prompt to process.

        Returns:
            The complete LLM response with content and usage statistics.

        Raises:
            CodeGenerationError: If the API call fails.
        """
        try:
            kwargs = self._build_request_kwargs(prompt)
            completion: Any = await with_retry(self._client.chat.completions.create, self._retry, **kwargs)
            return self._build_response(completion)

        except Exception as exc:
            if isinstance(exc, CodeGenerationError):
                raise
            logger.error("openai_generate_failed", error=str(exc), model=self._model)
            raise CodeGenerationError(f"OpenAI API call failed: {exc}") from exc

    async def generate_stream(self, prompt: LLMPrompt) -> AsyncIterator[LLMChunk]:
        """Stream a response from the OpenAI API.

        Args:
            prompt: The LLM prompt to process.

        Yields:
            LLMChunk objects as response tokens are generated.

        Raises:
            CodeGenerationError: If the API call fails.
        """
        try:
            kwargs = self._build_request_kwargs(prompt)
            kwargs["stream"] = True

            stream = await self._client.chat.completions.create(**kwargs)

            async for event in stream:
                if event.choices and event.choices[0].delta.content:
                    yield LLMChunk(content=event.choices[0].delta.content, is_final=False)

            yield LLMChunk(content="", is_final=True)

            logger.info("openai_stream_complete", model=self._model)

        except Exception as exc:
            if isinstance(exc, CodeGenerationError):
                raise
            logger.error("openai_stream_failed", error=str(exc), model=self._model)
            raise CodeGenerationError(f"OpenAI streaming API call failed: {exc}") from exc

    def _build_request_kwargs(self, prompt: LLMPrompt) -> dict[str, Any]:
        """Build keyword arguments for the OpenAI API call.

        Translates the framework's generic message and tool formats
        into the OpenAI-specific wire format:

        - Tool definitions are wrapped in ``{"type": "function",
          "function": {...}}``.
        - Assistant messages with ``tool_calls`` include a ``tool_calls``
          array of ``{"id", "type", "function": {"name", "arguments"}}``
          objects.
        - Tool-result messages (``role="tool"``) include ``tool_call_id``.

        Args:
            prompt: The LLM prompt to convert.

        Returns:
            Dictionary of keyword arguments for chat.completions.create().
        """
        messages: list[dict[str, Any]] = [{"role": "developer", "content": prompt.system}]
        for msg in prompt.messages:
            if msg.role == "system":
                continue
            if msg.role == "assistant" and msg.tool_calls:
                messages.append(
                    {
                        "role": "assistant",
                        "content": msg.content or None,
                        "tool_calls": [
                            {
                                "id": tc.id,
                                "type": "function",
                                "function": {
                                    "name": tc.name,
                                    "arguments": json.dumps(tc.arguments),
                                },
                            }
                            for tc in msg.tool_calls
                        ],
                    }
                )
            elif msg.role == "tool" and msg.tool_call_id:
                messages.append(
                    {
                        "role": "tool",
                        "tool_call_id": msg.tool_call_id,
                        "content": msg.content,
                    }
                )
            else:
                messages.append({"role": msg.role, "content": msg.content})

        kwargs: dict[str, Any] = {
            "model": self._model,
            "max_completion_tokens": prompt.max_tokens or self._max_tokens,
            "messages": messages,
            "temperature": prompt.temperature,
        }

        if prompt.tools:
            kwargs["tools"] = [self._normalize_tool(t) for t in prompt.tools]

        if prompt.tool_choice is not None and prompt.tools:
            kwargs["tool_choice"] = prompt.tool_choice

        return kwargs

    @staticmethod
    def _normalize_tool(t: dict[str, Any]) -> dict[str, Any]:
        """Normalize a tool definition to OpenAI format.

        Accepts both the framework's generic format
        (``{"name", "description", "parameters"}``) and the already-
        wrapped OpenAI format (``{"type": "function", "function": {...}}``).
        """
        if t.get("type") == "function" and "function" in t:
            return t
        return {
            "type": "function",
            "function": {
                "name": t["name"],
                "description": t.get("description", ""),
                "parameters": t.get("parameters", t.get("input_schema", {})),
            },
        }

    def _build_response(self, completion: Any) -> LLMResponse:
        """Build an LLMResponse from an OpenAI completion object.

        Extracts text content, tool_calls, finish_reason, and usage.

        Args:
            completion: The OpenAI API completion object.

        Returns:
            A fully populated LLMResponse.
        """
        choice = completion.choices[0]
        content = choice.message.content or ""

        tool_calls: list[ToolCall] = []
        raw_calls = getattr(choice.message, "tool_calls", None)
        if raw_calls:
            for tc in raw_calls:
                try:
                    arguments = json.loads(tc.function.arguments) if tc.function.arguments else {}
                except (json.JSONDecodeError, TypeError):
                    arguments = {}
                tool_calls.append(
                    ToolCall(
                        id=tc.id,
                        name=tc.function.name,
                        arguments=arguments,
                    )
                )

        finish_reason: str | None = None
        raw_reason = getattr(choice, "finish_reason", None)
        if raw_reason == "tool_calls":
            finish_reason = "tool_calls"
        elif raw_reason == "stop":
            finish_reason = "stop"
        elif raw_reason == "length":
            finish_reason = "length"
        elif raw_reason == "content_filter":
            finish_reason = "content_filter"
        elif raw_reason is not None:
            finish_reason = str(raw_reason)

        usage = LLMUsage(
            input_tokens=completion.usage.prompt_tokens if completion.usage else 0,
            output_tokens=completion.usage.completion_tokens if completion.usage else 0,
        )

        logger.info(
            "openai_generate_complete",
            model=self._model,
            input_tokens=usage.input_tokens,
            output_tokens=usage.output_tokens,
            tool_calls=len(tool_calls),
            finish_reason=finish_reason,
        )

        return LLMResponse(
            content=content,
            usage=usage,
            model=completion.model or self._model,
            tool_calls=tool_calls,
            finish_reason=finish_reason,
        )

model_name property

model_name: str

The name of the OpenAI model being used.

__init__

__init__(
    *,
    model: str = "gpt-5.4-mini",
    api_key: str | None = None,
    max_tokens: int = 4096,
    timeout: float = 120.0,
    retry: RetryConfig | None = None,
) -> None

Initialize the OpenAI backend.

Parameters:

Name Type Description Default
model str

The OpenAI model identifier to use.

'gpt-5.4-mini'
api_key str | None

OpenAI API key. Falls back to OPENAI_API_KEY env var.

None
max_tokens int

Default maximum tokens for generation.

4096
timeout float

API call timeout in seconds.

120.0
retry RetryConfig | None

Optional retry configuration for transient failures.

None
Source code in src/agenticapi/runtime/llm/openai.py
def __init__(
    self,
    *,
    model: str = "gpt-5.4-mini",
    api_key: str | None = None,
    max_tokens: int = 4096,
    timeout: float = 120.0,
    retry: RetryConfig | None = None,
) -> None:
    """Initialize the OpenAI backend.

    Args:
        model: The OpenAI model identifier to use.
        api_key: OpenAI API key. Falls back to OPENAI_API_KEY env var.
        max_tokens: Default maximum tokens for generation.
        timeout: API call timeout in seconds.
        retry: Optional retry configuration for transient failures.
    """
    try:
        import openai
    except ImportError as exc:
        raise ImportError(
            "The 'openai' package is required for OpenAIBackend. Install it with: pip install openai"
        ) from exc

    resolved_key = api_key or os.environ.get("OPENAI_API_KEY")
    if not resolved_key:
        raise ValueError("OpenAI API key must be provided via api_key parameter or OPENAI_API_KEY env var")

    self._model = model
    self._max_tokens = max_tokens
    self._timeout = timeout
    self._client = openai.AsyncOpenAI(api_key=resolved_key, timeout=timeout)

    if retry is None:
        self._retry = RetryConfig(
            max_retries=3,
            retryable_exceptions=(
                openai.RateLimitError,
                openai.APITimeoutError,
            ),
        )
    else:
        self._retry = retry

generate async

generate(prompt: LLMPrompt) -> LLMResponse

Send a prompt to the OpenAI API and return a complete response.

Parameters:

Name Type Description Default
prompt LLMPrompt

The LLM prompt to process.

required

Returns:

Type Description
LLMResponse

The complete LLM response with content and usage statistics.

Raises:

Type Description
CodeGenerationError

If the API call fails.

Source code in src/agenticapi/runtime/llm/openai.py
async def generate(self, prompt: LLMPrompt) -> LLMResponse:
    """Send a prompt to the OpenAI API and return a complete response.

    Args:
        prompt: The LLM prompt to process.

    Returns:
        The complete LLM response with content and usage statistics.

    Raises:
        CodeGenerationError: If the API call fails.
    """
    try:
        kwargs = self._build_request_kwargs(prompt)
        completion: Any = await with_retry(self._client.chat.completions.create, self._retry, **kwargs)
        return self._build_response(completion)

    except Exception as exc:
        if isinstance(exc, CodeGenerationError):
            raise
        logger.error("openai_generate_failed", error=str(exc), model=self._model)
        raise CodeGenerationError(f"OpenAI API call failed: {exc}") from exc

generate_stream async

generate_stream(
    prompt: LLMPrompt,
) -> AsyncIterator[LLMChunk]

Stream a response from the OpenAI API.

Parameters:

Name Type Description Default
prompt LLMPrompt

The LLM prompt to process.

required

Yields:

Type Description
AsyncIterator[LLMChunk]

LLMChunk objects as response tokens are generated.

Raises:

Type Description
CodeGenerationError

If the API call fails.

Source code in src/agenticapi/runtime/llm/openai.py
async def generate_stream(self, prompt: LLMPrompt) -> AsyncIterator[LLMChunk]:
    """Stream a response from the OpenAI API.

    Args:
        prompt: The LLM prompt to process.

    Yields:
        LLMChunk objects as response tokens are generated.

    Raises:
        CodeGenerationError: If the API call fails.
    """
    try:
        kwargs = self._build_request_kwargs(prompt)
        kwargs["stream"] = True

        stream = await self._client.chat.completions.create(**kwargs)

        async for event in stream:
            if event.choices and event.choices[0].delta.content:
                yield LLMChunk(content=event.choices[0].delta.content, is_final=False)

        yield LLMChunk(content="", is_final=True)

        logger.info("openai_stream_complete", model=self._model)

    except Exception as exc:
        if isinstance(exc, CodeGenerationError):
            raise
        logger.error("openai_stream_failed", error=str(exc), model=self._model)
        raise CodeGenerationError(f"OpenAI streaming API call failed: {exc}") from exc

GeminiBackend

GeminiBackend

LLM backend using the Google Gemini API.

Uses the google-genai SDK for async communication with the Gemini API. Supports both complete and streaming generation, native function calling, and automatic retry on transient errors.

Example

backend = GeminiBackend(model="gemini-2.5-flash")
response = await backend.generate(prompt)

Source code in src/agenticapi/runtime/llm/gemini.py
class GeminiBackend:
    """LLM backend using the Google Gemini API.

    Uses the google-genai SDK for async communication with the
    Gemini API. Supports both complete and streaming generation,
    native function calling, and automatic retry on transient errors.

    Example:
        backend = GeminiBackend(model="gemini-2.5-flash")
        response = await backend.generate(prompt)
    """

    def __init__(
        self,
        *,
        model: str = "gemini-2.5-flash",
        api_key: str | None = None,
        max_tokens: int = 4096,
        timeout: float = 120.0,
        retry: RetryConfig | None = None,
    ) -> None:
        """Initialize the Gemini backend.

        Args:
            model: The Gemini model identifier to use.
            api_key: Google API key. Falls back to GOOGLE_API_KEY env var.
            max_tokens: Default maximum tokens for generation.
            timeout: API call timeout in seconds.
            retry: Optional retry configuration for transient failures.
        """
        try:
            from google import genai
        except ImportError as exc:
            raise ImportError(
                "The 'google-genai' package is required for GeminiBackend. Install it with: pip install google-genai"
            ) from exc

        resolved_key = api_key or os.environ.get("GOOGLE_API_KEY")
        if not resolved_key:
            raise ValueError("Google API key must be provided via api_key parameter or GOOGLE_API_KEY env var")

        self._model = model
        self._max_tokens = max_tokens
        self._timeout = timeout
        self._client = genai.Client(api_key=resolved_key)

        # Attempt to configure retryable exceptions from google SDK.
        retryable: tuple[type[Exception], ...] = ()
        try:
            from google.api_core.exceptions import ResourceExhausted, ServiceUnavailable  # type: ignore[import-untyped]

            retryable = (ResourceExhausted, ServiceUnavailable)
        except ImportError:
            pass

        self._retry = retry if retry is not None else RetryConfig(max_retries=3, retryable_exceptions=retryable)

    @property
    def model_name(self) -> str:
        """The name of the Gemini model being used."""
        return self._model

    async def generate(self, prompt: LLMPrompt) -> LLMResponse:
        """Send a prompt to the Gemini API and return a complete response.

        Args:
            prompt: The LLM prompt to process.

        Returns:
            The complete LLM response with content and usage statistics.

        Raises:
            CodeGenerationError: If the API call fails.
        """
        try:
            config, contents = self._build_request_params(prompt)
            response: Any = await with_retry(
                self._client.aio.models.generate_content,
                self._retry,
                model=self._model,
                contents=contents,
                config=config,
            )
            return self._build_response(response)

        except Exception as exc:
            if isinstance(exc, CodeGenerationError):
                raise
            logger.error("gemini_generate_failed", error=str(exc), model=self._model)
            raise CodeGenerationError(f"Gemini API call failed: {exc}") from exc

    async def generate_stream(self, prompt: LLMPrompt) -> AsyncIterator[LLMChunk]:
        """Stream a response from the Gemini API.

        Args:
            prompt: The LLM prompt to process.

        Yields:
            LLMChunk objects as response tokens are generated.

        Raises:
            CodeGenerationError: If the API call fails.
        """
        try:
            config, contents = self._build_request_params(prompt)

            stream = await self._client.aio.models.generate_content_stream(
                model=self._model,
                contents=contents,
                config=config,
            )
            async for chunk in stream:
                if chunk.text:
                    yield LLMChunk(content=chunk.text, is_final=False)

            yield LLMChunk(content="", is_final=True)

            logger.info("gemini_stream_complete", model=self._model)

        except Exception as exc:
            if isinstance(exc, CodeGenerationError):
                raise
            logger.error("gemini_stream_failed", error=str(exc), model=self._model)
            raise CodeGenerationError(f"Gemini streaming API call failed: {exc}") from exc

    def _build_request_params(self, prompt: LLMPrompt) -> tuple[Any, list[Any]]:
        """Build request parameters for the Gemini API call.

        Args:
            prompt: The LLM prompt to convert.

        Returns:
            A tuple of (GenerateContentConfig, contents list).
        """
        from google.genai import types

        config_kwargs: dict[str, Any] = {
            "system_instruction": prompt.system,
            "temperature": prompt.temperature,
            "max_output_tokens": prompt.max_tokens or self._max_tokens,
        }

        if prompt.tools:
            config_kwargs["tools"] = self._convert_tools(prompt.tools)

        if prompt.tool_choice is not None and prompt.tools:
            tool_config = self._convert_tool_choice(prompt.tool_choice)
            if tool_config is not None:
                config_kwargs["tool_config"] = tool_config

        config = types.GenerateContentConfig(**config_kwargs)

        contents: list[types.Content] = []
        for msg in prompt.messages:
            if msg.role == "system":
                continue
            if msg.role == "assistant" and msg.tool_calls:
                # Gemini represents tool calls as function_call Parts
                # on a model message.
                parts: list[Any] = []
                if msg.content:
                    parts.append(types.Part(text=msg.content))
                for tc in msg.tool_calls:
                    parts.append(types.Part(function_call=types.FunctionCall(name=tc.name, args=tc.arguments)))
                contents.append(types.Content(role="model", parts=parts))
            elif msg.role == "tool":
                # Gemini expects tool results as user messages with
                # function_response Parts.  The ``name`` field must be
                # the *function name* (e.g. "get_weather"), not the
                # provider-assigned call ID.  Look it up from the
                # preceding assistant message's tool_calls list.
                tool_name = self._resolve_tool_name(prompt.messages, msg)
                contents.append(
                    types.Content(
                        role="user",
                        parts=[
                            types.Part(
                                function_response=types.FunctionResponse(
                                    name=tool_name,
                                    response={"result": msg.content},
                                )
                            )
                        ],
                    )
                )
            else:
                role = "model" if msg.role == "assistant" else "user"
                contents.append(types.Content(role=role, parts=[types.Part(text=msg.content)]))

        return config, contents

    @staticmethod
    def _convert_tools(tools: list[dict[str, Any]]) -> list[Any]:
        """Convert framework tool definitions to Gemini format.

        Args:
            tools: Tool definitions in the framework's generic format.

        Returns:
            A list of Gemini Tool objects with function_declarations.
        """
        from google.genai import types

        declarations: list[types.FunctionDeclaration] = []
        for tool in tools:
            name = tool.get("name", "")
            description = tool.get("description", "")
            parameters = tool.get("parameters") or tool.get("input_schema") or {}

            declarations.append(
                types.FunctionDeclaration(
                    name=name,
                    description=description,
                    parameters=parameters if parameters else None,  # type: ignore[arg-type]
                )
            )

        return [types.Tool(function_declarations=declarations)]

    @staticmethod
    def _convert_tool_choice(tool_choice: str | dict[str, str]) -> Any:
        """Convert framework tool_choice to Gemini tool_config.

        Args:
            tool_choice: The tool_choice value from LLMPrompt.

        Returns:
            A Gemini ToolConfig or None.
        """
        from google.genai import types

        if isinstance(tool_choice, dict):
            # Force a specific tool.
            return types.ToolConfig(
                function_calling_config=types.FunctionCallingConfig(
                    mode="ANY",  # type: ignore[arg-type]
                    allowed_function_names=[tool_choice.get("name", "")],
                )
            )
        if tool_choice == "auto":
            return types.ToolConfig(function_calling_config=types.FunctionCallingConfig(mode="AUTO"))  # type: ignore[arg-type]
        if tool_choice == "required":
            return types.ToolConfig(function_calling_config=types.FunctionCallingConfig(mode="ANY"))  # type: ignore[arg-type]
        if tool_choice == "none":
            return types.ToolConfig(function_calling_config=types.FunctionCallingConfig(mode="NONE"))  # type: ignore[arg-type]
        return None

    @staticmethod
    def _resolve_tool_name(messages: list[LLMMessage], tool_msg: LLMMessage) -> str:
        """Resolve the function name for a tool-result message.

        Gemini's ``FunctionResponse.name`` must be the actual function
        name (e.g. ``"get_weather"``), not the provider-assigned call
        ID.  This helper walks *backward* through the conversation to
        find the assistant message whose ``tool_calls`` list contains a
        ``ToolCall`` with a matching ``id``.

        Falls back to ``tool_call_id`` (which may still be a name in
        some use-cases) or ``"tool_result"`` if no match is found.
        """
        if tool_msg.tool_call_id:
            for prior in reversed(messages):
                if prior.role == "assistant" and prior.tool_calls:
                    for tc in prior.tool_calls:
                        if tc.id == tool_msg.tool_call_id:
                            return tc.name
        return tool_msg.tool_call_id or "tool_result"

    def _build_response(self, response: Any) -> LLMResponse:
        """Build an LLMResponse from a Gemini response object.

        Extracts text content, function_call parts, finish_reason,
        and usage statistics.

        Args:
            response: The Gemini API response object.

        Returns:
            A fully populated LLMResponse.
        """
        text_parts: list[str] = []
        tool_calls: list[ToolCall] = []

        candidates = getattr(response, "candidates", None)
        if isinstance(candidates, list) and candidates:
            parts = getattr(candidates[0].content, "parts", None) or []
            for part in parts:
                if hasattr(part, "text") and part.text:
                    text_parts.append(part.text)
                fc = getattr(part, "function_call", None)
                if fc is not None:
                    args = dict(fc.args) if fc.args else {}
                    tool_calls.append(
                        ToolCall(
                            id=str(uuid.uuid4()),
                            name=fc.name,
                            arguments=args,
                        )
                    )

        # Fallback: if candidates parsing didn't extract text, use
        # the convenience ``response.text`` property.
        if not text_parts and not tool_calls:
            fallback_text = getattr(response, "text", None)
            if fallback_text:
                text_parts.append(fallback_text)

        finish_reason: str | None = None
        if isinstance(candidates, list) and candidates:
            raw_reason = getattr(candidates[0], "finish_reason", None)
            if raw_reason is not None:
                reason_str = str(raw_reason)
                if "STOP" in reason_str:
                    finish_reason = "tool_calls" if tool_calls else "stop"
                elif "MAX_TOKENS" in reason_str:
                    finish_reason = "length"
                elif "SAFETY" in reason_str:
                    finish_reason = "content_filter"
                else:
                    finish_reason = "tool_calls" if tool_calls else "stop"

        usage_meta = getattr(response, "usage_metadata", None)
        usage = LLMUsage(
            input_tokens=(usage_meta.prompt_token_count or 0) if usage_meta else 0,
            output_tokens=(usage_meta.candidates_token_count or 0) if usage_meta else 0,
        )

        logger.info(
            "gemini_generate_complete",
            model=self._model,
            input_tokens=usage.input_tokens,
            output_tokens=usage.output_tokens,
            tool_calls=len(tool_calls),
            finish_reason=finish_reason,
        )

        return LLMResponse(
            content="".join(text_parts),
            usage=usage,
            model=self._model,
            tool_calls=tool_calls,
            finish_reason=finish_reason,
        )
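
To make the mode mapping concrete, a minimal sketch that exercises the helper directly (get_weather is an illustrative tool name; google-genai must be installed, and imports are omitted as in the other examples):

# "auto" -> AUTO, "required" -> ANY, "none" -> NONE
auto_cfg = GeminiBackend._convert_tool_choice("auto")
required_cfg = GeminiBackend._convert_tool_choice("required")
none_cfg = GeminiBackend._convert_tool_choice("none")

# A dict forces one specific tool: mode ANY plus a single-name allow-list.
forced_cfg = GeminiBackend._convert_tool_choice({"type": "tool", "name": "get_weather"})
assert forced_cfg.function_calling_config.allowed_function_names == ["get_weather"]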

model_name property

model_name: str

The name of the Gemini model being used.

__init__

__init__(
    *,
    model: str = "gemini-2.5-flash",
    api_key: str | None = None,
    max_tokens: int = 4096,
    timeout: float = 120.0,
    retry: RetryConfig | None = None,
) -> None

Initialize the Gemini backend.

Parameters:

Name Type Description Default
model str

The Gemini model identifier to use.

'gemini-2.5-flash'
api_key str | None

Google API key. Falls back to GOOGLE_API_KEY env var.

None
max_tokens int

Default maximum tokens for generation.

4096
timeout float

API call timeout in seconds.

120.0
retry RetryConfig | None

Optional retry configuration for transient failures.

None
Source code in src/agenticapi/runtime/llm/gemini.py
def __init__(
    self,
    *,
    model: str = "gemini-2.5-flash",
    api_key: str | None = None,
    max_tokens: int = 4096,
    timeout: float = 120.0,
    retry: RetryConfig | None = None,
) -> None:
    """Initialize the Gemini backend.

    Args:
        model: The Gemini model identifier to use.
        api_key: Google API key. Falls back to GOOGLE_API_KEY env var.
        max_tokens: Default maximum tokens for generation.
        timeout: API call timeout in seconds.
        retry: Optional retry configuration for transient failures.
    """
    try:
        from google import genai
    except ImportError as exc:
        raise ImportError(
            "The 'google-genai' package is required for GeminiBackend. Install it with: pip install google-genai"
        ) from exc

    resolved_key = api_key or os.environ.get("GOOGLE_API_KEY")
    if not resolved_key:
        raise ValueError("Google API key must be provided via api_key parameter or GOOGLE_API_KEY env var")

    self._model = model
    self._max_tokens = max_tokens
    self._timeout = timeout
    self._client = genai.Client(api_key=resolved_key)

    # Attempt to configure retryable exceptions from google SDK.
    retryable: tuple[type[Exception], ...] = ()
    try:
        from google.api_core.exceptions import ResourceExhausted, ServiceUnavailable  # type: ignore[import-untyped]

        retryable = (ResourceExhausted, ServiceUnavailable)
    except ImportError:
        pass

    self._retry = retry if retry is not None else RetryConfig(max_retries=3, retryable_exceptions=retryable)
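
A minimal construction sketch. The API key resolves from GOOGLE_API_KEY when api_key is omitted; the retry policy shown here is illustrative:

from agenticapi.runtime.llm.gemini import GeminiBackend
from agenticapi.runtime.llm.retry import RetryConfig

backend = GeminiBackend(
    model="gemini-2.5-flash",
    max_tokens=2048,
    timeout=60.0,
    retry=RetryConfig(max_retries=5, base_delay_seconds=0.5),
)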

generate async

generate(prompt: LLMPrompt) -> LLMResponse

Send a prompt to the Gemini API and return a complete response.

Parameters:

Name Type Description Default
prompt LLMPrompt

The LLM prompt to process.

required

Returns:

Type Description
LLMResponse

The complete LLM response with content and usage statistics.

Raises:

Type Description
CodeGenerationError

If the API call fails.

Source code in src/agenticapi/runtime/llm/gemini.py
async def generate(self, prompt: LLMPrompt) -> LLMResponse:
    """Send a prompt to the Gemini API and return a complete response.

    Args:
        prompt: The LLM prompt to process.

    Returns:
        The complete LLM response with content and usage statistics.

    Raises:
        CodeGenerationError: If the API call fails.
    """
    try:
        config, contents = self._build_request_params(prompt)
        response: Any = await with_retry(
            self._client.aio.models.generate_content,
            self._retry,
            model=self._model,
            contents=contents,
            config=config,
        )
        return self._build_response(response)

    except Exception as exc:
        if isinstance(exc, CodeGenerationError):
            raise
        logger.error("gemini_generate_failed", error=str(exc), model=self._model)
        raise CodeGenerationError(f"Gemini API call failed: {exc}") from exc

generate_stream async

generate_stream(
    prompt: LLMPrompt,
) -> AsyncIterator[LLMChunk]

Stream a response from the Gemini API.

Parameters:

Name Type Description Default
prompt LLMPrompt

The LLM prompt to process.

required

Yields:

Type Description
AsyncIterator[LLMChunk]

LLMChunk objects as response tokens are generated.

Raises:

Type Description
CodeGenerationError

If the API call fails.

Source code in src/agenticapi/runtime/llm/gemini.py
async def generate_stream(self, prompt: LLMPrompt) -> AsyncIterator[LLMChunk]:
    """Stream a response from the Gemini API.

    Args:
        prompt: The LLM prompt to process.

    Yields:
        LLMChunk objects as response tokens are generated.

    Raises:
        CodeGenerationError: If the API call fails.
    """
    try:
        config, contents = self._build_request_params(prompt)

        stream = await self._client.aio.models.generate_content_stream(
            model=self._model,
            contents=contents,
            config=config,
        )
        async for chunk in stream:
            if chunk.text:
                yield LLMChunk(content=chunk.text, is_final=False)

        yield LLMChunk(content="", is_final=True)

        logger.info("gemini_stream_complete", model=self._model)

    except Exception as exc:
        if isinstance(exc, CodeGenerationError):
            raise
        logger.error("gemini_stream_failed", error=str(exc), model=self._model)
        raise CodeGenerationError(f"Gemini streaming API call failed: {exc}") from exc

MockBackend

MockBackend

A mock LLM backend that returns pre-configured responses.

Responses are returned in FIFO order. Raises CodeGenerationError when all responses have been consumed.

Example

backend = MockBackend(responses=["SELECT COUNT() FROM orders"]) response = await backend.generate(prompt) assert response.content == "SELECT COUNT() FROM orders"

Source code in src/agenticapi/runtime/llm/mock.py
class MockBackend:
    """A mock LLM backend that returns pre-configured responses.

    Responses are returned in FIFO order. Raises CodeGenerationError
    when all responses have been consumed.

    Example:
        backend = MockBackend(responses=["SELECT COUNT(*) FROM orders"])
        response = await backend.generate(prompt)
        assert response.content == "SELECT COUNT(*) FROM orders"
    """

    def __init__(
        self,
        responses: list[str] | None = None,
        *,
        structured_responses: list[dict[str, Any]] | None = None,
        tool_call_responses: list[list[ToolCall]] | None = None,
    ) -> None:
        """Initialize the mock backend.

        Args:
            responses: List of response strings to return in order.
                Used when neither ``LLMPrompt.response_schema`` nor
                ``LLMPrompt.tools`` is set.
            structured_responses: List of pre-built dicts the backend
                returns when the prompt carries a ``response_schema``.
                Each dict is JSON-serialised into ``LLMResponse.content``
                so the consumer can parse it back into a Pydantic model.
                Falls back to a synthesised stub matching the schema's
                ``required`` fields when this list is empty.
            tool_call_responses: Phase E3 — list of pre-built tool-call
                bundles the backend returns when ``LLMPrompt.tools`` is
                set. Each entry is a list of one-or-more
                :class:`ToolCall`s representing what the model would
                emit on a single turn (most calls are length-1; a
                length-2 list represents the model batching two tool
                invocations into one response). Falls back to an empty
                tool-call list (and a synthesised text response) when
                this list is empty so existing tests stay green.
        """
        self._responses: list[str] = list(responses) if responses else []
        self._structured_responses: list[dict[str, Any]] = list(structured_responses) if structured_responses else []
        self._tool_call_responses: list[list[ToolCall]] = list(tool_call_responses) if tool_call_responses else []
        self._call_count: int = 0
        self._prompts: list[LLMPrompt] = []

    @property
    def model_name(self) -> str:
        """The name of the mock model."""
        return "mock"

    @property
    def call_count(self) -> int:
        """Number of generate calls made."""
        return self._call_count

    @property
    def prompts(self) -> list[LLMPrompt]:
        """All prompts that were sent to this backend."""
        return list(self._prompts)

    def add_response(self, response: str) -> None:
        """Add a response to the queue.

        Args:
            response: The response string to add.
        """
        self._responses.append(response)

    def add_structured_response(self, response: dict[str, Any]) -> None:
        """Add a structured (schema-conforming) response to the queue.

        Args:
            response: The dict the backend will return on the next call
                that includes a ``response_schema`` in the prompt.
        """
        self._structured_responses.append(response)

    def add_tool_call_response(self, calls: ToolCall | list[ToolCall]) -> None:
        """Queue a native function-call response for the next tools-enabled call.

        Args:
            calls: Either one :class:`ToolCall` (the common case) or a
                list representing the model batching multiple calls
                into a single response.
        """
        bundle = [calls] if isinstance(calls, ToolCall) else list(calls)
        self._tool_call_responses.append(bundle)

    async def generate(self, prompt: LLMPrompt) -> LLMResponse:
        """Return the next pre-configured response.

        Branch order, in priority:

        1. ``prompt.tools`` set **and** a tool-call response queued →
           return an :class:`LLMResponse` with the queued
           :class:`ToolCall`s and an empty content string. This is
           the Phase E3 native-function-calling path.
        2. ``prompt.response_schema`` set → return a structured
           (JSON) response from the queue or synthesised from the
           schema. This is the D4 typed-intent path.
        3. Otherwise → return the next free-form text response.

        Args:
            prompt: The LLM prompt (recorded for later inspection).

        Returns:
            An LLMResponse with the next pre-configured content.

        Raises:
            CodeGenerationError: If no response is available for the
                requested mode.
        """
        self._prompts.append(prompt)
        self._call_count += 1

        # Phase E3: tools-enabled path. The model "wants to call a
        # function" — return the queued ToolCall bundle. Empty
        # content + finish_reason="tool_calls" mirrors what the real
        # backends emit on this path.
        if prompt.tools and self._tool_call_responses:
            calls = self._tool_call_responses.pop(0)
            return LLMResponse(
                content="",
                usage=LLMUsage(
                    input_tokens=len(prompt.system) // 4,
                    output_tokens=sum(len(json.dumps(c.arguments)) for c in calls) // 4,
                ),
                model="mock",
                tool_calls=calls,
                finish_reason="tool_calls",
            )

        # tool_choice="required" forces a tool call even when none is
        # queued — synthesise a call to the first declared tool.
        if prompt.tools and prompt.tool_choice == "required":
            first_tool = prompt.tools[0]
            synth = ToolCall(
                id="mock_required_0",
                name=first_tool.get("name", "unknown"),
                arguments={},
            )
            return LLMResponse(
                content="",
                usage=LLMUsage(input_tokens=len(prompt.system) // 4, output_tokens=10),
                model="mock",
                tool_calls=[synth],
                finish_reason="tool_calls",
            )

        if prompt.response_schema is not None:
            payload: dict[str, Any]
            if self._structured_responses:
                payload = self._structured_responses.pop(0)
            else:
                payload = _synthesise_from_schema(prompt.response_schema)
            content = json.dumps(payload)
            return LLMResponse(
                content=content,
                usage=LLMUsage(
                    input_tokens=len(prompt.system) // 4,
                    output_tokens=len(content) // 4,
                ),
                model="mock",
                finish_reason="stop",
            )

        if not self._responses:
            raise CodeGenerationError("MockBackend: no more responses available")

        content = self._responses.pop(0)
        return LLMResponse(
            content=content,
            usage=LLMUsage(input_tokens=len(prompt.system) // 4, output_tokens=len(content) // 4),
            model="mock",
        )

    async def generate_stream(self, prompt: LLMPrompt) -> AsyncIterator[LLMChunk]:
        """Stream the next pre-configured response in chunks.

        Splits the response content into word-level chunks for realistic
        streaming simulation.

        Args:
            prompt: The LLM prompt (recorded for later inspection).

        Yields:
            LLMChunk objects, with the final chunk having is_final=True.

        Raises:
            CodeGenerationError: If all responses have been consumed.
        """
        response = await self.generate(prompt)
        words = response.content.split(" ")
        for i, word in enumerate(words):
            is_last = i == len(words) - 1
            chunk_content = word if is_last else word + " "
            yield LLMChunk(content=chunk_content, is_final=is_last)
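
A test-style sketch of the tools-enabled path (inside an async test; the tool definition dict is illustrative, imports omitted as in the other examples):

backend = MockBackend()
backend.add_tool_call_response(
    ToolCall(id="call_1", name="get_weather", arguments={"city": "Oslo"})
)

prompt = LLMPrompt(
    system="You may call tools.",
    messages=[],
    tools=[{"name": "get_weather", "description": "Look up current weather"}],
)
response = await backend.generate(prompt)
assert response.finish_reason == "tool_calls"
assert response.tool_calls[0].name == "get_weather"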

model_name property

model_name: str

The name of the mock model.

call_count property

call_count: int

Number of generate calls made.

prompts property

prompts: list[LLMPrompt]

All prompts that were sent to this backend.

__init__

__init__(
    responses: list[str] | None = None,
    *,
    structured_responses: list[dict[str, Any]]
    | None = None,
    tool_call_responses: list[list[ToolCall]] | None = None,
) -> None

Initialize the mock backend.

Parameters:

Name Type Description Default
responses list[str] | None

List of response strings to return in order. Used when neither LLMPrompt.response_schema nor LLMPrompt.tools is set.

None
structured_responses list[dict[str, Any]] | None

List of pre-built dicts the backend returns when the prompt carries a response_schema. Each dict is JSON-serialised into LLMResponse.content so the consumer can parse it back into a Pydantic model. Falls back to a synthesised stub matching the schema's required fields when this list is empty.

None
tool_call_responses list[list[ToolCall]] | None

Phase E3 — list of pre-built tool-call bundles the backend returns when LLMPrompt.tools is set. Each entry is a list of one or more ToolCall objects representing what the model would emit on a single turn (most calls are length-1; a length-2 list represents the model batching two tool invocations into one response). Falls back to an empty tool-call list (and a synthesised text response) when this list is empty so existing tests stay green.

None
Source code in src/agenticapi/runtime/llm/mock.py
def __init__(
    self,
    responses: list[str] | None = None,
    *,
    structured_responses: list[dict[str, Any]] | None = None,
    tool_call_responses: list[list[ToolCall]] | None = None,
) -> None:
    """Initialize the mock backend.

    Args:
        responses: List of response strings to return in order.
            Used when neither ``LLMPrompt.response_schema`` nor
            ``LLMPrompt.tools`` is set.
        structured_responses: List of pre-built dicts the backend
            returns when the prompt carries a ``response_schema``.
            Each dict is JSON-serialised into ``LLMResponse.content``
            so the consumer can parse it back into a Pydantic model.
            Falls back to a synthesised stub matching the schema's
            ``required`` fields when this list is empty.
        tool_call_responses: Phase E3 — list of pre-built tool-call
            bundles the backend returns when ``LLMPrompt.tools`` is
            set. Each entry is a list of one-or-more
            :class:`ToolCall`s representing what the model would
            emit on a single turn (most calls are length-1; a
            length-2 list represents the model batching two tool
            invocations into one response). Falls back to an empty
            tool-call list (and a synthesised text response) when
            this list is empty so existing tests stay green.
    """
    self._responses: list[str] = list(responses) if responses else []
    self._structured_responses: list[dict[str, Any]] = list(structured_responses) if structured_responses else []
    self._tool_call_responses: list[list[ToolCall]] = list(tool_call_responses) if tool_call_responses else []
    self._call_count: int = 0
    self._prompts: list[LLMPrompt] = []

add_response

add_response(response: str) -> None

Add a response to the queue.

Parameters:

Name Type Description Default
response str

The response string to add.

required
Source code in src/agenticapi/runtime/llm/mock.py
def add_response(self, response: str) -> None:
    """Add a response to the queue.

    Args:
        response: The response string to add.
    """
    self._responses.append(response)

add_structured_response

add_structured_response(response: dict[str, Any]) -> None

Add a structured (schema-conforming) response to the queue.

Parameters:

Name Type Description Default
response dict[str, Any]

The dict the backend will return on the next call that includes a response_schema in the prompt.

required
Source code in src/agenticapi/runtime/llm/mock.py
def add_structured_response(self, response: dict[str, Any]) -> None:
    """Add a structured (schema-conforming) response to the queue.

    Args:
        response: The dict the backend will return on the next call
            that includes a ``response_schema`` in the prompt.
    """
    self._structured_responses.append(response)

add_tool_call_response

add_tool_call_response(
    calls: ToolCall | list[ToolCall],
) -> None

Queue a native function-call response for the next tools-enabled call.

Parameters:

Name Type Description Default
calls ToolCall | list[ToolCall]

Either one ToolCall (the common case) or a list representing the model batching multiple calls into a single response.

required
Source code in src/agenticapi/runtime/llm/mock.py
def add_tool_call_response(self, calls: ToolCall | list[ToolCall]) -> None:
    """Queue a native function-call response for the next tools-enabled call.

    Args:
        calls: Either one :class:`ToolCall` (the common case) or a
            list representing the model batching multiple calls
            into a single response.
    """
    bundle = [calls] if isinstance(calls, ToolCall) else list(calls)
    self._tool_call_responses.append(bundle)

generate async

generate(prompt: LLMPrompt) -> LLMResponse

Return the next pre-configured response.

Branch order, in priority:

  1. prompt.tools set and a tool-call response queued → return an LLMResponse with the queued ToolCall objects and an empty content string. This is the Phase E3 native-function-calling path.
  2. prompt.response_schema set → return a structured (JSON) response from the queue or synthesised from the schema. This is the D4 typed-intent path.
  3. Otherwise → return the next free-form text response.

Parameters:

Name Type Description Default
prompt LLMPrompt

The LLM prompt (recorded for later inspection).

required

Returns:

Type Description
LLMResponse

An LLMResponse with the next pre-configured content.

Raises:

Type Description
CodeGenerationError

If no response is available for the requested mode.

Source code in src/agenticapi/runtime/llm/mock.py
async def generate(self, prompt: LLMPrompt) -> LLMResponse:
    """Return the next pre-configured response.

    Branch order, in priority:

    1. ``prompt.tools`` set **and** a tool-call response queued →
       return an :class:`LLMResponse` with the queued
       :class:`ToolCall`s and an empty content string. This is
       the Phase E3 native-function-calling path.
    2. ``prompt.response_schema`` set → return a structured
       (JSON) response from the queue or synthesised from the
       schema. This is the D4 typed-intent path.
    3. Otherwise → return the next free-form text response.

    Args:
        prompt: The LLM prompt (recorded for later inspection).

    Returns:
        An LLMResponse with the next pre-configured content.

    Raises:
        CodeGenerationError: If no response is available for the
            requested mode.
    """
    self._prompts.append(prompt)
    self._call_count += 1

    # Phase E3: tools-enabled path. The model "wants to call a
    # function" — return the queued ToolCall bundle. Empty
    # content + finish_reason="tool_calls" mirrors what the real
    # backends emit on this path.
    if prompt.tools and self._tool_call_responses:
        calls = self._tool_call_responses.pop(0)
        return LLMResponse(
            content="",
            usage=LLMUsage(
                input_tokens=len(prompt.system) // 4,
                output_tokens=sum(len(json.dumps(c.arguments)) for c in calls) // 4,
            ),
            model="mock",
            tool_calls=calls,
            finish_reason="tool_calls",
        )

    # tool_choice="required" forces a tool call even when none is
    # queued — synthesise a call to the first declared tool.
    if prompt.tools and prompt.tool_choice == "required":
        first_tool = prompt.tools[0]
        synth = ToolCall(
            id="mock_required_0",
            name=first_tool.get("name", "unknown"),
            arguments={},
        )
        return LLMResponse(
            content="",
            usage=LLMUsage(input_tokens=len(prompt.system) // 4, output_tokens=10),
            model="mock",
            tool_calls=[synth],
            finish_reason="tool_calls",
        )

    if prompt.response_schema is not None:
        payload: dict[str, Any]
        if self._structured_responses:
            payload = self._structured_responses.pop(0)
        else:
            payload = _synthesise_from_schema(prompt.response_schema)
        content = json.dumps(payload)
        return LLMResponse(
            content=content,
            usage=LLMUsage(
                input_tokens=len(prompt.system) // 4,
                output_tokens=len(content) // 4,
            ),
            model="mock",
            finish_reason="stop",
        )

    if not self._responses:
        raise CodeGenerationError("MockBackend: no more responses available")

    content = self._responses.pop(0)
    return LLMResponse(
        content=content,
        usage=LLMUsage(input_tokens=len(prompt.system) // 4, output_tokens=len(content) // 4),
        model="mock",
    )
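
A sketch of the structured (response_schema) path; the schema is hand-written here for brevity, though in practice it would typically come from a Pydantic model's model_json_schema():

import json

backend = MockBackend(structured_responses=[{"action": "read", "domain": "order"}])
prompt = LLMPrompt(
    system="Classify the request.",
    messages=[LLMMessage(role="user", content="How many orders today?")],
    response_schema={
        "type": "object",
        "properties": {"action": {"type": "string"}, "domain": {"type": "string"}},
        "required": ["action", "domain"],
    },
)
response = await backend.generate(prompt)
assert json.loads(response.content) == {"action": "read", "domain": "order"}
assert response.finish_reason == "stop"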

generate_stream async

generate_stream(
    prompt: LLMPrompt,
) -> AsyncIterator[LLMChunk]

Stream the next pre-configured response in chunks.

Splits the response content into word-level chunks for realistic streaming simulation.

Parameters:

Name Type Description Default
prompt LLMPrompt

The LLM prompt (recorded for later inspection).

required

Yields:

Type Description
AsyncIterator[LLMChunk]

LLMChunk objects, with the final chunk having is_final=True.

Raises:

Type Description
CodeGenerationError

If all responses have been consumed.

Source code in src/agenticapi/runtime/llm/mock.py
async def generate_stream(self, prompt: LLMPrompt) -> AsyncIterator[LLMChunk]:
    """Stream the next pre-configured response in chunks.

    Splits the response content into word-level chunks for realistic
    streaming simulation.

    Args:
        prompt: The LLM prompt (recorded for later inspection).

    Yields:
        LLMChunk objects, with the final chunk having is_final=True.

    Raises:
        CodeGenerationError: If all responses have been consumed.
    """
    response = await self.generate(prompt)
    words = response.content.split(" ")
    for i, word in enumerate(words):
        is_last = i == len(words) - 1
        chunk_content = word if is_last else word + " "
        yield LLMChunk(content=chunk_content, is_final=is_last)
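
Because the mock splits on spaces, chunk boundaries are predictable; a quick sketch (inside an async function, imports omitted as in the other examples):

prompt = LLMPrompt(system="Say hello.", messages=[])
backend = MockBackend(responses=["hello mock world"])
chunks = [chunk async for chunk in backend.generate_stream(prompt)]
assert [chunk.content for chunk in chunks] == ["hello ", "mock ", "world"]
assert chunks[-1].is_final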

RetryConfig

RetryConfig dataclass

Configuration for LLM call retries.

Attributes:

Name Type Description
max_retries int

Maximum number of retry attempts (0 = no retries).

base_delay_seconds float

Initial delay before the first retry.

max_delay_seconds float

Upper bound on delay between retries.

jitter bool

Whether to add random jitter to the delay.

retryable_exceptions tuple[type[Exception], ...]

Exception types that trigger a retry.

Source code in src/agenticapi/runtime/llm/retry.py
@dataclass(frozen=True, slots=True)
class RetryConfig:
    """Configuration for LLM call retries.

    Attributes:
        max_retries: Maximum number of retry attempts (0 = no retries).
        base_delay_seconds: Initial delay before the first retry.
        max_delay_seconds: Upper bound on delay between retries.
        jitter: Whether to add random jitter to the delay.
        retryable_exceptions: Exception types that trigger a retry.
    """

    max_retries: int = 3
    base_delay_seconds: float = 1.0
    max_delay_seconds: float = 30.0
    jitter: bool = True
    retryable_exceptions: tuple[type[Exception], ...] = field(default_factory=tuple)
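
A construction sketch; the Gemini backend builds a similar default for itself, and the exception types below stand in for whatever your transport raises on transient failures:

retry = RetryConfig(
    max_retries=3,
    base_delay_seconds=1.0,
    max_delay_seconds=30.0,
    jitter=True,
    retryable_exceptions=(TimeoutError, ConnectionError),
)
backend = GeminiBackend(model="gemini-2.5-flash", retry=retry)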

CodeGenerator

CodeGenerator

Generates Python code from intents using an LLM backend.

Uses the LLM to convert natural language intents into executable Python code, scoped to the available tools. The generated code is extracted from the LLM response and returned for harness evaluation.

Example

generator = CodeGenerator(llm=backend, tools=registry)
result = await generator.generate(
    intent_raw="Show me order count",
    intent_action="read",
    intent_domain="order",
    intent_parameters={},
    context=agent_context,
)
print(result.code)

Source code in src/agenticapi/runtime/code_generator.py
class CodeGenerator:
    """Generates Python code from intents using an LLM backend.

    Uses the LLM to convert natural language intents into executable
    Python code, scoped to the available tools. The generated code
    is extracted from the LLM response and returned for harness evaluation.

    Example:
        generator = CodeGenerator(llm=backend, tools=registry)
        result = await generator.generate(
            intent_raw="Show me order count",
            intent_action="read",
            intent_domain="order",
            intent_parameters={},
            context=agent_context,
        )
        print(result.code)
    """

    def __init__(
        self,
        *,
        llm: LLMBackend,
        tools: ToolRegistry | None = None,
    ) -> None:
        """Initialize the code generator.

        Args:
            llm: The LLM backend to use for code generation.
            tools: Optional tool registry defining available tools.
        """
        self._llm = llm
        self._tools = tools or ToolRegistry()

    async def generate(
        self,
        *,
        intent_raw: str,
        intent_action: str,
        intent_domain: str,
        intent_parameters: dict[str, Any],
        context: AgentContext,
        sandbox_data: dict[str, object] | None = None,
    ) -> GeneratedCode:
        """Generate Python code from an intent.

        Builds a prompt from the intent and context, sends it to the LLM,
        and extracts the generated code from the response.

        Args:
            intent_raw: The original natural language request.
            intent_action: The classified action type.
            intent_domain: The domain of the request.
            intent_parameters: Extracted parameters from the intent.
            context: The agent execution context.
            sandbox_data: Pre-fetched tool data to include in the prompt
                so the LLM knows the data schema.

        Returns:
            GeneratedCode containing the extracted code and metadata.

        Raises:
            CodeGenerationError: If code generation or extraction fails.
        """
        tool_definitions = self._tools.get_definitions()
        context_str = context.context_window.build()

        prompt = build_code_generation_prompt(
            intent_raw=intent_raw,
            intent_action=intent_action,
            intent_domain=intent_domain,
            intent_parameters=intent_parameters,
            tool_definitions=tool_definitions,
            context=context_str,
            sandbox_data=sandbox_data,
        )

        logger.info(
            "code_generation_started",
            trace_id=context.trace_id,
            intent_action=intent_action,
            intent_domain=intent_domain,
            tool_count=len(tool_definitions),
        )

        from agenticapi.observability import (
            AgenticAPIAttributes,
            GenAIAttributes,
            SpanNames,
            get_tracer,
        )

        tracer = get_tracer()
        with tracer.start_as_current_span(SpanNames.CODE_GENERATE.value) as gen_span:
            gen_span.set_attribute(AgenticAPIAttributes.INTENT_ACTION.value, intent_action)
            gen_span.set_attribute(AgenticAPIAttributes.INTENT_DOMAIN.value, intent_domain)
            with tracer.start_as_current_span(SpanNames.GEN_AI_CHAT.value) as llm_span:
                llm_span.set_attribute(GenAIAttributes.OPERATION_NAME.value, "code_generate")
                llm_span.set_attribute(GenAIAttributes.REQUEST_MODEL.value, self._llm.model_name)
                llm_span.set_attribute(GenAIAttributes.REQUEST_MAX_TOKENS.value, prompt.max_tokens)
                llm_span.set_attribute(GenAIAttributes.REQUEST_TEMPERATURE.value, prompt.temperature)
                try:
                    response = await self._llm.generate(prompt)
                except CodeGenerationError as exc:
                    llm_span.record_exception(exc)
                    raise
                except Exception as exc:
                    logger.error("code_generation_llm_failed", trace_id=context.trace_id, error=str(exc))
                    llm_span.record_exception(exc)
                    raise CodeGenerationError(f"LLM call failed during code generation: {exc}") from exc

                llm_span.set_attribute(GenAIAttributes.RESPONSE_MODEL.value, response.model)
                llm_span.set_attribute(GenAIAttributes.USAGE_INPUT_TOKENS.value, response.usage.input_tokens)
                llm_span.set_attribute(GenAIAttributes.USAGE_OUTPUT_TOKENS.value, response.usage.output_tokens)

            code = _extract_code(response.content or "")
            if not code.strip():
                logger.error(
                    "code_generation_empty",
                    trace_id=context.trace_id,
                    raw_response=(response.content or "")[:200],
                )
                raise CodeGenerationError("LLM returned empty code")

            gen_span.set_attribute(AgenticAPIAttributes.CODE_LINES.value, code.count("\n") + 1)

            logger.info(
                "code_generation_complete",
                trace_id=context.trace_id,
                code_lines=code.count("\n") + 1,
                input_tokens=response.usage.input_tokens,
                output_tokens=response.usage.output_tokens,
            )

            return GeneratedCode(
                code=code,
                reasoning=response.reasoning,
                confidence=response.confidence,
                usage=response.usage,
            )
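
A test-style sketch pairing CodeGenerator with MockBackend so no real LLM is called. Construction of AgentContext is not shown (assume agent_context comes from your usual test setup), and whether _extract_code expects a fenced block in the response text is an assumption here:

backend = MockBackend(responses=["```python\nresult = len(orders)\n```"])
generator = CodeGenerator(llm=backend, tools=ToolRegistry())

generated = await generator.generate(
    intent_raw="Show me order count",
    intent_action="read",
    intent_domain="order",
    intent_parameters={},
    context=agent_context,
)
assert backend.call_count == 1
assert generated.code  # non-empty code extracted from the mock response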

__init__

__init__(
    *, llm: LLMBackend, tools: ToolRegistry | None = None
) -> None

Initialize the code generator.

Parameters:

Name Type Description Default
llm LLMBackend

The LLM backend to use for code generation.

required
tools ToolRegistry | None

Optional tool registry defining available tools.

None
Source code in src/agenticapi/runtime/code_generator.py
def __init__(
    self,
    *,
    llm: LLMBackend,
    tools: ToolRegistry | None = None,
) -> None:
    """Initialize the code generator.

    Args:
        llm: The LLM backend to use for code generation.
        tools: Optional tool registry defining available tools.
    """
    self._llm = llm
    self._tools = tools or ToolRegistry()

generate async

generate(
    *,
    intent_raw: str,
    intent_action: str,
    intent_domain: str,
    intent_parameters: dict[str, Any],
    context: AgentContext,
    sandbox_data: dict[str, object] | None = None,
) -> GeneratedCode

Generate Python code from an intent.

Builds a prompt from the intent and context, sends it to the LLM, and extracts the generated code from the response.

Parameters:

Name Type Description Default
intent_raw str

The original natural language request.

required
intent_action str

The classified action type.

required
intent_domain str

The domain of the request.

required
intent_parameters dict[str, Any]

Extracted parameters from the intent.

required
context AgentContext

The agent execution context.

required
sandbox_data dict[str, object] | None

Pre-fetched tool data to include in the prompt so the LLM knows the data schema.

None

Returns:

Type Description
GeneratedCode

GeneratedCode containing the extracted code and metadata.

Raises:

Type Description
CodeGenerationError

If code generation or extraction fails.

Source code in src/agenticapi/runtime/code_generator.py
async def generate(
    self,
    *,
    intent_raw: str,
    intent_action: str,
    intent_domain: str,
    intent_parameters: dict[str, Any],
    context: AgentContext,
    sandbox_data: dict[str, object] | None = None,
) -> GeneratedCode:
    """Generate Python code from an intent.

    Builds a prompt from the intent and context, sends it to the LLM,
    and extracts the generated code from the response.

    Args:
        intent_raw: The original natural language request.
        intent_action: The classified action type.
        intent_domain: The domain of the request.
        intent_parameters: Extracted parameters from the intent.
        context: The agent execution context.
        sandbox_data: Pre-fetched tool data to include in the prompt
            so the LLM knows the data schema.

    Returns:
        GeneratedCode containing the extracted code and metadata.

    Raises:
        CodeGenerationError: If code generation or extraction fails.
    """
    tool_definitions = self._tools.get_definitions()
    context_str = context.context_window.build()

    prompt = build_code_generation_prompt(
        intent_raw=intent_raw,
        intent_action=intent_action,
        intent_domain=intent_domain,
        intent_parameters=intent_parameters,
        tool_definitions=tool_definitions,
        context=context_str,
        sandbox_data=sandbox_data,
    )

    logger.info(
        "code_generation_started",
        trace_id=context.trace_id,
        intent_action=intent_action,
        intent_domain=intent_domain,
        tool_count=len(tool_definitions),
    )

    from agenticapi.observability import (
        AgenticAPIAttributes,
        GenAIAttributes,
        SpanNames,
        get_tracer,
    )

    tracer = get_tracer()
    with tracer.start_as_current_span(SpanNames.CODE_GENERATE.value) as gen_span:
        gen_span.set_attribute(AgenticAPIAttributes.INTENT_ACTION.value, intent_action)
        gen_span.set_attribute(AgenticAPIAttributes.INTENT_DOMAIN.value, intent_domain)
        with tracer.start_as_current_span(SpanNames.GEN_AI_CHAT.value) as llm_span:
            llm_span.set_attribute(GenAIAttributes.OPERATION_NAME.value, "code_generate")
            llm_span.set_attribute(GenAIAttributes.REQUEST_MODEL.value, self._llm.model_name)
            llm_span.set_attribute(GenAIAttributes.REQUEST_MAX_TOKENS.value, prompt.max_tokens)
            llm_span.set_attribute(GenAIAttributes.REQUEST_TEMPERATURE.value, prompt.temperature)
            try:
                response = await self._llm.generate(prompt)
            except CodeGenerationError as exc:
                llm_span.record_exception(exc)
                raise
            except Exception as exc:
                logger.error("code_generation_llm_failed", trace_id=context.trace_id, error=str(exc))
                llm_span.record_exception(exc)
                raise CodeGenerationError(f"LLM call failed during code generation: {exc}") from exc

            llm_span.set_attribute(GenAIAttributes.RESPONSE_MODEL.value, response.model)
            llm_span.set_attribute(GenAIAttributes.USAGE_INPUT_TOKENS.value, response.usage.input_tokens)
            llm_span.set_attribute(GenAIAttributes.USAGE_OUTPUT_TOKENS.value, response.usage.output_tokens)

        code = _extract_code(response.content or "")
        if not code.strip():
            logger.error(
                "code_generation_empty",
                trace_id=context.trace_id,
                raw_response=(response.content or "")[:200],
            )
            raise CodeGenerationError("LLM returned empty code")

        gen_span.set_attribute(AgenticAPIAttributes.CODE_LINES.value, code.count("\n") + 1)

        logger.info(
            "code_generation_complete",
            trace_id=context.trace_id,
            code_lines=code.count("\n") + 1,
            input_tokens=response.usage.input_tokens,
            output_tokens=response.usage.output_tokens,
        )

        return GeneratedCode(
            code=code,
            reasoning=response.reasoning,
            confidence=response.confidence,
            usage=response.usage,
        )