LLM Backends

LLMBackend (Protocol)

LLMBackend

Bases: Protocol

Protocol for LLM backend implementations.

Using Protocol (structural subtyping) so that third-party LLM wrapper libraries can be used without depending on AgenticAPI.

Source code in src/agenticapi/runtime/llm/base.py
@runtime_checkable
class LLMBackend(Protocol):
    """Protocol for LLM backend implementations.

    Using Protocol (structural subtyping) so that third-party LLM wrapper
    libraries can be used without depending on AgenticAPI.
    """

    async def generate(self, prompt: LLMPrompt) -> LLMResponse:
        """Send a prompt and receive a complete response.

        Args:
            prompt: The LLM prompt to process.

        Returns:
            The complete LLM response.
        """
        ...

    async def generate_stream(self, prompt: LLMPrompt) -> AsyncIterator[LLMChunk]:
        """Send a prompt and receive a streaming response.

        Args:
            prompt: The LLM prompt to process.

        Yields:
            Chunks of the response as they are generated.
        """
        ...

    @property
    def model_name(self) -> str:
        """The name of the model being used."""
        ...

model_name property

model_name: str

The name of the model being used.

generate async

generate(prompt: LLMPrompt) -> LLMResponse

Send a prompt and receive a complete response.

Parameters:

Name Type Description Default
prompt LLMPrompt

The LLM prompt to process.

required

Returns:

Type Description
LLMResponse

The complete LLM response.

Source code in src/agenticapi/runtime/llm/base.py
async def generate(self, prompt: LLMPrompt) -> LLMResponse:
    """Send a prompt and receive a complete response.

    Args:
        prompt: The LLM prompt to process.

    Returns:
        The complete LLM response.
    """
    ...

generate_stream async

generate_stream(
    prompt: LLMPrompt,
) -> AsyncIterator[LLMChunk]

Send a prompt and receive a streaming response.

Parameters:

Name Type Description Default
prompt LLMPrompt

The LLM prompt to process.

required

Yields:

Type Description
AsyncIterator[LLMChunk]

Chunks of the response as they are generated.

Source code in src/agenticapi/runtime/llm/base.py
async def generate_stream(self, prompt: LLMPrompt) -> AsyncIterator[LLMChunk]:
    """Send a prompt and receive a streaming response.

    Args:
        prompt: The LLM prompt to process.

    Yields:
        Chunks of the response as they are generated.
    """
    ...
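
Because LLMBackend is a runtime-checkable Protocol, any object that provides generate, generate_stream, and model_name satisfies it structurally; no AgenticAPI base class needs to be inherited. The following is a minimal sketch of a custom backend, assuming the types are importable from agenticapi.runtime.llm.base (the module path shown in the source notes); the EchoBackend class itself is hypothetical and only illustrates the required surface.

from collections.abc import AsyncIterator

from agenticapi.runtime.llm.base import (  # assumed import path
    LLMBackend,
    LLMChunk,
    LLMPrompt,
    LLMResponse,
)


class EchoBackend:
    """Toy backend that echoes the last user message back."""

    @property
    def model_name(self) -> str:
        return "echo-v0"

    async def generate(self, prompt: LLMPrompt) -> LLMResponse:
        last_user = next((m.content for m in reversed(prompt.messages) if m.role == "user"), "")
        return LLMResponse(content=last_user, model=self.model_name)

    async def generate_stream(self, prompt: LLMPrompt) -> AsyncIterator[LLMChunk]:
        response = await self.generate(prompt)
        yield LLMChunk(content=response.content, is_final=False)
        yield LLMChunk(content="", is_final=True)


# Structural check succeeds because the protocol is @runtime_checkable.
assert isinstance(EchoBackend(), LLMBackend)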

Data Classes

LLMPrompt dataclass

A complete prompt to send to an LLM backend.

Attributes:

Name Type Description
system str

The system prompt instructing the LLM's behavior.

messages list[LLMMessage]

The conversation messages.

tools list[dict[str, Any]] | None

Optional tool definitions for function calling.

max_tokens int

Maximum tokens to generate.

temperature float

Sampling temperature (0.0 = deterministic, 1.0 = creative).

response_schema dict[str, Any] | None

Optional JSON Schema (Pydantic-derived) the LLM must conform to. Backends translate this into the provider's native structured-output API (Anthropic tools + tool_choice, OpenAI response_format=json_schema, Gemini response_schema). When None, the model produces free-form text as before.

response_schema_name str | None

Optional descriptive name for the schema, used by some providers as the schema title.

tool_choice str | dict[str, str] | None

Controls how the model selects tools. Accepted values: "auto" (model decides), "required" (must call a tool), "none" (never call a tool), or a dict {"type": "tool", "name": "..."} to force a specific tool. None (default) defers to the provider's default.

Source code in src/agenticapi/runtime/llm/base.py
@dataclass(frozen=True, slots=True)
class LLMPrompt:
    """A complete prompt to send to an LLM backend.

    Attributes:
        system: The system prompt instructing the LLM's behavior.
        messages: The conversation messages.
        tools: Optional tool definitions for function calling.
        max_tokens: Maximum tokens to generate.
        temperature: Sampling temperature (0.0 = deterministic, 1.0 = creative).
        response_schema: Optional JSON Schema (Pydantic-derived) the
            LLM must conform to. Backends translate this into the
            provider's native structured-output API
            (Anthropic ``tools`` + ``tool_choice``, OpenAI
            ``response_format=json_schema``, Gemini ``response_schema``).
            When ``None``, the model produces free-form text as before.
        response_schema_name: Optional descriptive name for the
            schema, used by some providers as the schema title.
        tool_choice: Controls how the model selects tools. Accepted
            values: ``"auto"`` (model decides), ``"required"`` (must
            call a tool), ``"none"`` (never call a tool), or a dict
            ``{"type": "tool", "name": "..."}`` to force a specific
            tool. ``None`` (default) defers to the provider's default.
    """

    system: str
    messages: list[LLMMessage]
    tools: list[dict[str, Any]] | None = None
    max_tokens: int = 4096
    temperature: float = 0.1
    response_schema: dict[str, Any] | None = None
    response_schema_name: str | None = None
    tool_choice: str | dict[str, str] | None = None
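
The fields above compose directly in the constructor. A small illustrative example, assuming the dataclasses are importable from agenticapi.runtime.llm.base; the tool name, schema, and message text are invented for the sketch.

from agenticapi.runtime.llm.base import LLMMessage, LLMPrompt  # assumed import path

# Hypothetical tool definition in the framework's generic format.
weather_tool = {
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

prompt = LLMPrompt(
    system="You are a concise weather assistant.",
    messages=[LLMMessage(role="user", content="What's the weather in Oslo?")],
    tools=[weather_tool],
    tool_choice="auto",  # or "required", "none", or {"type": "tool", "name": "get_weather"}
    max_tokens=512,
    temperature=0.0,
)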

LLMMessage dataclass

A single message in an LLM conversation.

Attributes:

Name Type Description
role str

The role of the message sender ("system", "user", "assistant", or "tool").

content str

The text content of the message.

tool_call_id str | None

Provider-supplied identifier linking a role="tool" result message back to the originating tool call. Required by OpenAI, used by Anthropic for tool_result blocks. None for non-tool messages.

tool_calls list[ToolCall] | None

Tool-call requests that the LLM emitted on a role="assistant" message. Stored so that backends can reconstruct the full multi-turn conversation in the provider's native format (Anthropic tool_use content blocks, OpenAI tool_calls array, Gemini function_call parts). None for non-assistant or text-only assistant messages.

Source code in src/agenticapi/runtime/llm/base.py
@dataclass(frozen=True, slots=True)
class LLMMessage:
    """A single message in an LLM conversation.

    Attributes:
        role: The role of the message sender ("system", "user",
            "assistant", or "tool").
        content: The text content of the message.
        tool_call_id: Provider-supplied identifier linking a ``role="tool"``
            result message back to the originating tool call.  Required by
            OpenAI, used by Anthropic for ``tool_result`` blocks.  ``None``
            for non-tool messages.
        tool_calls: Tool-call requests that the LLM emitted on a
            ``role="assistant"`` message.  Stored so that backends can
            reconstruct the full multi-turn conversation in the
            provider's native format (Anthropic ``tool_use`` content
            blocks, OpenAI ``tool_calls`` array, Gemini
            ``function_call`` parts).  ``None`` for non-assistant or
            text-only assistant messages.
    """

    role: str
    content: str
    tool_call_id: str | None = None
    tool_calls: list[ToolCall] | None = None

LLMMessage carries two optional fields for multi-turn tool conversations:

  • tool_call_id: str | None — on role="tool" messages, links back to the originating tool call. Required by OpenAI, used by Anthropic for tool_result blocks.
  • tool_calls: list[ToolCall] | None — on role="assistant" messages, preserves the full tool call structure so backends can reconstruct provider-native multi-turn formats.

Both fields default to None for backward compatibility.
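
As an illustration of how those fields cooperate across turns (the call id, tool name, and payloads are invented), an assistant message that requested a tool is followed by a role="tool" message whose tool_call_id echoes the call's id:

from agenticapi.runtime.llm.base import LLMMessage, ToolCall  # assumed import path

messages = [
    LLMMessage(role="user", content="What's the weather in Oslo?"),
    # Assistant turn that requested a tool; content may be empty for a pure tool call.
    LLMMessage(
        role="assistant",
        content="",
        tool_calls=[ToolCall(id="call_123", name="get_weather", arguments={"city": "Oslo"})],
    ),
    # Tool result linked back via tool_call_id so each backend can rebuild its
    # native format (Anthropic tool_result blocks, OpenAI tool_calls array, ...).
    LLMMessage(role="tool", content='{"temp_c": 4}', tool_call_id="call_123"),
]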

LLMResponse dataclass

A complete response from an LLM backend.

Attributes:

Name Type Description
content str

The generated text content. Empty string when the response was a pure tool-call (no narrative text).

reasoning str | None

Optional chain-of-thought reasoning (if supported by model).

confidence float

Estimated confidence in the response (0.0-1.0).

usage LLMUsage

Token usage statistics.

model str

The model identifier that generated this response.

tool_calls list[ToolCall]

Phase E3 — native function-call requests from the model. Empty list when the model produced text instead of (or in addition to) calling a tool. Populated by every backend that supports function calling: Anthropic, OpenAI, Gemini, Mock.

finish_reason str | None

Why the model stopped generating. One of "stop", "length", "tool_calls", "content_filter", or backend-specific values. None for backends that don't expose this.

Source code in src/agenticapi/runtime/llm/base.py
@dataclass(frozen=True, slots=True)
class LLMResponse:
    """A complete response from an LLM backend.

    Attributes:
        content: The generated text content. Empty string when the
            response was a pure tool-call (no narrative text).
        reasoning: Optional chain-of-thought reasoning (if supported by model).
        confidence: Estimated confidence in the response (0.0-1.0).
        usage: Token usage statistics.
        model: The model identifier that generated this response.
        tool_calls: Phase E3 — native function-call requests from the
            model. Empty list when the model produced text instead of
            (or in addition to) calling a tool. Populated by every
            backend that supports function calling: Anthropic, OpenAI,
            Gemini, Mock.
        finish_reason: Why the model stopped generating. One of
            ``"stop"``, ``"length"``, ``"tool_calls"``, ``"content_filter"``,
            or backend-specific values. ``None`` for backends that
            don't expose this.
    """

    content: str
    reasoning: str | None = None
    confidence: float = 1.0
    usage: LLMUsage = field(default_factory=lambda: LLMUsage(0, 0))
    model: str = ""
    tool_calls: list[ToolCall] = field(default_factory=list)
    finish_reason: str | None = None

LLMResponse carries two fields that drive native function calling:

  • tool_calls: list[ToolCall] — structured function-call requests returned by the model. Empty for plain text completions.
  • finish_reason: str | None — why generation stopped. Typical values: "stop", "length", "tool_calls", "content_filter". None for backends that don't report it.

All four backends (Anthropic, OpenAI, Gemini, Mock) fully populate these fields. Each real backend parses its provider's native response format into ToolCall objects and maps stop reasons to normalized finish_reason values.
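
A typical consumer branches on tool_calls and finish_reason before touching content. A minimal sketch, assuming any LLMBackend implementation and an already-built LLMPrompt; the dispatch step is left as a placeholder print.

from agenticapi.runtime.llm.base import LLMBackend, LLMPrompt  # assumed import path


async def handle(backend: LLMBackend, prompt: LLMPrompt) -> None:
    response = await backend.generate(prompt)

    if response.tool_calls:
        for call in response.tool_calls:
            # Validate call.arguments and dispatch to the registered tool here.
            print(f"model requested {call.name}({call.arguments}) [id={call.id}]")
    elif response.finish_reason == "length":
        print("response truncated; consider raising max_tokens")
    else:
        print(response.content)

    print(f"tokens: {response.usage.input_tokens} in / {response.usage.output_tokens} out")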

ToolCall dataclass

A single native function-call request from an LLM (Phase E3).

Modern LLM APIs (Anthropic tools/tool_choice, OpenAI tools, Gemini function_declarations) emit structured function-call objects when they want a tool invoked instead of producing free-form Python code. This dataclass is the framework-agnostic representation of one such call.

The LLMBackend protocol promises to populate LLMResponse.tool_calls with one entry per requested invocation. Downstream consumers (the harness's tool-first path in Phase E4) iterate the list, validate the arguments against the registered tool's Pydantic schema, and dispatch to the tool with cost / latency / reliability all dramatically better than going through code generation + sandbox execution.

Attributes:

Name Type Description
id str

Provider-supplied identifier for this call. Echoed back in the tool result so multi-call exchanges stay in sync.

name str

The tool name the model wants to invoke. Resolved against the registered ToolRegistry.

arguments dict[str, Any]

The keyword arguments the model produced for the tool. Always a dict; the framework validates it through the tool's Pydantic input model before dispatching.

Source code in src/agenticapi/runtime/llm/base.py
@dataclass(frozen=True, slots=True)
class ToolCall:
    """A single native function-call request from an LLM (Phase E3).

    Modern LLM APIs (Anthropic ``tools``/``tool_choice``, OpenAI
    ``tools``, Gemini ``function_declarations``) emit structured
    function-call objects when they want a tool invoked instead of
    producing free-form Python code. This dataclass is the
    framework-agnostic representation of one such call.

    The ``LLMBackend`` protocol promises to populate
    :attr:`LLMResponse.tool_calls` with one entry per requested
    invocation. Downstream consumers (the harness's tool-first path
    in Phase E4) iterate the list, validate the arguments against
    the registered tool's Pydantic schema, and dispatch to the tool
    with cost / latency / reliability all dramatically better than
    going through code generation + sandbox execution.

    Attributes:
        id: Provider-supplied identifier for this call. Echoed back
            in the tool result so multi-call exchanges stay in sync.
        name: The tool name the model wants to invoke. Resolved
            against the registered :class:`ToolRegistry`.
        arguments: The keyword arguments the model produced for the
            tool. Always a dict; the framework validates it through
            the tool's Pydantic input model before dispatching.
    """

    id: str
    name: str
    arguments: dict[str, Any]
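
The validation step mentioned above can be sketched in isolation. This assumes Pydantic v2 and an invented WeatherInput model standing in for a registered tool's input schema; it is not the framework's actual dispatch code.

from pydantic import BaseModel, ValidationError

from agenticapi.runtime.llm.base import ToolCall  # assumed import path


class WeatherInput(BaseModel):  # hypothetical input model for a "get_weather" tool
    city: str


call = ToolCall(id="call_123", name="get_weather", arguments={"city": "Oslo"})

try:
    validated = WeatherInput(**call.arguments)
except ValidationError as exc:
    # In a real loop, the error would be fed back to the model as a role="tool" message.
    print(f"invalid arguments for {call.name}: {exc}")
else:
    print(f"dispatching {call.name} with {validated.model_dump()}")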

LLMUsage dataclass

Token usage information from an LLM call.

Attributes:

Name Type Description
input_tokens int

Number of tokens in the prompt.

output_tokens int

Number of tokens in the response.

Source code in src/agenticapi/runtime/llm/base.py
@dataclass(frozen=True, slots=True)
class LLMUsage:
    """Token usage information from an LLM call.

    Attributes:
        input_tokens: Number of tokens in the prompt.
        output_tokens: Number of tokens in the response.
    """

    input_tokens: int
    output_tokens: int

LLMChunk dataclass

A single chunk from a streaming LLM response.

Attributes:

Name Type Description
content str

The text content of this chunk.

is_final bool

Whether this is the last chunk in the stream.

Source code in src/agenticapi/runtime/llm/base.py
@dataclass(frozen=True, slots=True)
class LLMChunk:
    """A single chunk from a streaming LLM response.

    Attributes:
        content: The text content of this chunk.
        is_final: Whether this is the last chunk in the stream.
    """

    content: str
    is_final: bool = False
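
Consuming a stream usually means forwarding each chunk's content until the final chunk arrives. A minimal sketch against any LLMBackend implementation (import path assumed as above):

from agenticapi.runtime.llm.base import LLMBackend, LLMPrompt  # assumed import path


async def stream_to_stdout(backend: LLMBackend, prompt: LLMPrompt) -> None:
    async for chunk in backend.generate_stream(prompt):
        if chunk.is_final:
            break
        print(chunk.content, end="", flush=True)
    print()  # newline once the terminating chunk is seen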

AnthropicBackend

AnthropicBackend

LLM backend using the Anthropic API (Claude models).

Uses anthropic.AsyncAnthropic for async communication with the Anthropic API. Supports both complete and streaming generation, native function calling via tool_use content blocks, and automatic retry on transient errors.

Example

backend = AnthropicBackend(model="claude-sonnet-4-6")
response = await backend.generate(prompt)

Source code in src/agenticapi/runtime/llm/anthropic.py
class AnthropicBackend:
    """LLM backend using the Anthropic API (Claude models).

    Uses anthropic.AsyncAnthropic for async communication with the
    Anthropic API. Supports both complete and streaming generation,
    native function calling via ``tool_use`` content blocks, and
    automatic retry on transient errors.

    Example:
        backend = AnthropicBackend(model="claude-sonnet-4-6")
        response = await backend.generate(prompt)
    """

    def __init__(
        self,
        *,
        model: str = "claude-sonnet-4-6",
        api_key: str | None = None,
        max_tokens: int = 4096,
        timeout: float = 120.0,
        retry: RetryConfig | None = None,
    ) -> None:
        """Initialize the Anthropic backend.

        Args:
            model: The Anthropic model identifier to use.
            api_key: Anthropic API key. Falls back to ANTHROPIC_API_KEY env var.
            max_tokens: Default maximum tokens for generation.
            timeout: API call timeout in seconds.
            retry: Optional retry configuration for transient failures.
        """
        try:
            import anthropic
        except ImportError as exc:
            raise ImportError(
                "The 'anthropic' package is required for AnthropicBackend. Install it with: pip install anthropic"
            ) from exc

        resolved_key = api_key or os.environ.get("ANTHROPIC_API_KEY")
        if not resolved_key:
            raise ValueError("Anthropic API key must be provided via api_key parameter or ANTHROPIC_API_KEY env var")

        self._model = model
        self._max_tokens = max_tokens
        self._timeout = timeout
        self._client = anthropic.AsyncAnthropic(api_key=resolved_key, timeout=timeout)

        if retry is None:
            self._retry = RetryConfig(
                max_retries=3,
                retryable_exceptions=(
                    anthropic.RateLimitError,
                    anthropic.APITimeoutError,
                    anthropic.InternalServerError,
                ),
            )
        else:
            self._retry = retry

    @property
    def model_name(self) -> str:
        """The name of the Anthropic model being used."""
        return self._model

    async def generate(self, prompt: LLMPrompt) -> LLMResponse:
        """Send a prompt to the Anthropic API and return a complete response.

        Args:
            prompt: The LLM prompt to process.

        Returns:
            The complete LLM response with content and usage statistics.

        Raises:
            CodeGenerationError: If the API call fails.
        """
        try:
            kwargs = self._build_request_kwargs(prompt)
            message: Any = await with_retry(self._client.messages.create, self._retry, **kwargs)
            return self._build_response(message)

        except Exception as exc:
            if isinstance(exc, CodeGenerationError):
                raise
            logger.error("anthropic_generate_failed", error=str(exc), model=self._model)
            raise CodeGenerationError(f"Anthropic API call failed: {exc}") from exc

    async def generate_stream(self, prompt: LLMPrompt) -> AsyncIterator[LLMChunk]:
        """Stream a response from the Anthropic API.

        Args:
            prompt: The LLM prompt to process.

        Yields:
            LLMChunk objects as response tokens are generated.

        Raises:
            CodeGenerationError: If the API call fails.
        """
        try:
            kwargs = self._build_request_kwargs(prompt)

            async with self._client.messages.stream(**kwargs) as stream:
                async for text in stream.text_stream:
                    yield LLMChunk(content=text, is_final=False)

            yield LLMChunk(content="", is_final=True)

            logger.info("anthropic_stream_complete", model=self._model)

        except Exception as exc:
            if isinstance(exc, CodeGenerationError):
                raise
            logger.error("anthropic_stream_failed", error=str(exc), model=self._model)
            raise CodeGenerationError(f"Anthropic streaming API call failed: {exc}") from exc

    def _build_request_kwargs(self, prompt: LLMPrompt) -> dict[str, Any]:
        """Build keyword arguments for the Anthropic API call.

        Translates the framework's generic message and tool formats
        into the Anthropic-specific wire format:

        - Tool definitions use ``input_schema`` (not ``parameters``).
        - Assistant messages with ``tool_calls`` become content blocks
          containing ``tool_use`` entries.
        - Tool-result messages (``role="tool"``) become ``user``
          messages with ``tool_result`` content blocks keyed by
          ``tool_use_id``.

        Args:
            prompt: The LLM prompt to convert.

        Returns:
            Dictionary of keyword arguments for messages.create().
        """
        messages: list[dict[str, Any]] = []
        for msg in prompt.messages:
            if msg.role == "system":
                continue
            if msg.role == "assistant" and msg.tool_calls:
                # Anthropic expects tool_use content blocks on assistant
                # messages that requested tool calls.
                blocks: list[dict[str, Any]] = []
                if msg.content:
                    blocks.append({"type": "text", "text": msg.content})
                for tc in msg.tool_calls:
                    blocks.append(
                        {
                            "type": "tool_use",
                            "id": tc.id,
                            "name": tc.name,
                            "input": tc.arguments,
                        }
                    )
                messages.append({"role": "assistant", "content": blocks})
            elif msg.role == "tool" and msg.tool_call_id:
                # Anthropic expects tool results as user messages with
                # tool_result content blocks.
                messages.append(
                    {
                        "role": "user",
                        "content": [
                            {
                                "type": "tool_result",
                                "tool_use_id": msg.tool_call_id,
                                "content": msg.content,
                            }
                        ],
                    }
                )
            else:
                messages.append({"role": msg.role, "content": msg.content})

        kwargs: dict[str, Any] = {
            "model": self._model,
            "max_tokens": prompt.max_tokens or self._max_tokens,
            "system": prompt.system,
            "messages": messages,
            "temperature": prompt.temperature,
        }

        if prompt.tools:
            kwargs["tools"] = [self._normalize_tool(t) for t in prompt.tools]

        if prompt.tool_choice is not None and prompt.tools:
            if isinstance(prompt.tool_choice, dict):
                kwargs["tool_choice"] = prompt.tool_choice
            elif prompt.tool_choice == "auto":
                kwargs["tool_choice"] = {"type": "auto"}
            elif prompt.tool_choice == "required":
                kwargs["tool_choice"] = {"type": "any"}
            elif prompt.tool_choice == "none":
                # Anthropic doesn't have a "none" tool_choice — omit tools.
                kwargs.pop("tools", None)

        return kwargs

    @staticmethod
    def _normalize_tool(t: dict[str, Any]) -> dict[str, Any]:
        """Normalize a tool definition to Anthropic format.

        Accepts the framework's generic format (``"parameters"`` key),
        the Anthropic format (``"input_schema"`` key), or both.
        Always produces ``{"name", "description", "input_schema"}``.
        When both keys are present, ``input_schema`` takes precedence.
        """
        return {
            "name": t["name"],
            "description": t.get("description", ""),
            "input_schema": t.get("input_schema", t.get("parameters", {})),
        }

    def _build_response(self, message: Any) -> LLMResponse:
        """Build an LLMResponse from an Anthropic message object.

        Extracts text content, tool_use blocks, finish_reason, and
        usage statistics.

        Args:
            message: The Anthropic API message object.

        Returns:
            A fully populated LLMResponse.
        """
        text_parts: list[str] = []
        tool_calls: list[ToolCall] = []

        for block in message.content:
            if hasattr(block, "text"):
                text_parts.append(block.text)
            elif getattr(block, "type", None) == "tool_use":
                tool_calls.append(
                    ToolCall(
                        id=block.id,
                        name=block.name,
                        arguments=block.input if isinstance(block.input, dict) else {},
                    )
                )

        finish_reason: str | None = None
        stop_reason = getattr(message, "stop_reason", None)
        if stop_reason == "tool_use":
            finish_reason = "tool_calls"
        elif stop_reason == "end_turn":
            finish_reason = "stop"
        elif stop_reason == "max_tokens":
            finish_reason = "length"
        elif stop_reason is not None:
            finish_reason = str(stop_reason)

        usage = LLMUsage(
            input_tokens=message.usage.input_tokens,
            output_tokens=message.usage.output_tokens,
        )

        logger.info(
            "anthropic_generate_complete",
            model=self._model,
            input_tokens=usage.input_tokens,
            output_tokens=usage.output_tokens,
            tool_calls=len(tool_calls),
            finish_reason=finish_reason,
        )

        return LLMResponse(
            content="".join(text_parts),
            usage=usage,
            model=message.model,
            tool_calls=tool_calls,
            finish_reason=finish_reason,
        )

model_name property

model_name: str

The name of the Anthropic model being used.

__init__

__init__(
    *,
    model: str = "claude-sonnet-4-6",
    api_key: str | None = None,
    max_tokens: int = 4096,
    timeout: float = 120.0,
    retry: RetryConfig | None = None,
) -> None

Initialize the Anthropic backend.

Parameters:

Name Type Description Default
model str

The Anthropic model identifier to use.

'claude-sonnet-4-6'
api_key str | None

Anthropic API key. Falls back to ANTHROPIC_API_KEY env var.

None
max_tokens int

Default maximum tokens for generation.

4096
timeout float

API call timeout in seconds.

120.0
retry RetryConfig | None

Optional retry configuration for transient failures.

None
Source code in src/agenticapi/runtime/llm/anthropic.py
def __init__(
    self,
    *,
    model: str = "claude-sonnet-4-6",
    api_key: str | None = None,
    max_tokens: int = 4096,
    timeout: float = 120.0,
    retry: RetryConfig | None = None,
) -> None:
    """Initialize the Anthropic backend.

    Args:
        model: The Anthropic model identifier to use.
        api_key: Anthropic API key. Falls back to ANTHROPIC_API_KEY env var.
        max_tokens: Default maximum tokens for generation.
        timeout: API call timeout in seconds.
        retry: Optional retry configuration for transient failures.
    """
    try:
        import anthropic
    except ImportError as exc:
        raise ImportError(
            "The 'anthropic' package is required for AnthropicBackend. Install it with: pip install anthropic"
        ) from exc

    resolved_key = api_key or os.environ.get("ANTHROPIC_API_KEY")
    if not resolved_key:
        raise ValueError("Anthropic API key must be provided via api_key parameter or ANTHROPIC_API_KEY env var")

    self._model = model
    self._max_tokens = max_tokens
    self._timeout = timeout
    self._client = anthropic.AsyncAnthropic(api_key=resolved_key, timeout=timeout)

    if retry is None:
        self._retry = RetryConfig(
            max_retries=3,
            retryable_exceptions=(
                anthropic.RateLimitError,
                anthropic.APITimeoutError,
                anthropic.InternalServerError,
            ),
        )
    else:
        self._retry = retry

generate async

generate(prompt: LLMPrompt) -> LLMResponse

Send a prompt to the Anthropic API and return a complete response.

Parameters:

Name Type Description Default
prompt LLMPrompt

The LLM prompt to process.

required

Returns:

Type Description
LLMResponse

The complete LLM response with content and usage statistics.

Raises:

Type Description
CodeGenerationError

If the API call fails.

Source code in src/agenticapi/runtime/llm/anthropic.py
async def generate(self, prompt: LLMPrompt) -> LLMResponse:
    """Send a prompt to the Anthropic API and return a complete response.

    Args:
        prompt: The LLM prompt to process.

    Returns:
        The complete LLM response with content and usage statistics.

    Raises:
        CodeGenerationError: If the API call fails.
    """
    try:
        kwargs = self._build_request_kwargs(prompt)
        message: Any = await with_retry(self._client.messages.create, self._retry, **kwargs)
        return self._build_response(message)

    except Exception as exc:
        if isinstance(exc, CodeGenerationError):
            raise
        logger.error("anthropic_generate_failed", error=str(exc), model=self._model)
        raise CodeGenerationError(f"Anthropic API call failed: {exc}") from exc

generate_stream async

generate_stream(
    prompt: LLMPrompt,
) -> AsyncIterator[LLMChunk]

Stream a response from the Anthropic API.

Parameters:

Name Type Description Default
prompt LLMPrompt

The LLM prompt to process.

required

Yields:

Type Description
AsyncIterator[LLMChunk]

LLMChunk objects as response tokens are generated.

Raises:

Type Description
CodeGenerationError

If the API call fails.

Source code in src/agenticapi/runtime/llm/anthropic.py
async def generate_stream(self, prompt: LLMPrompt) -> AsyncIterator[LLMChunk]:
    """Stream a response from the Anthropic API.

    Args:
        prompt: The LLM prompt to process.

    Yields:
        LLMChunk objects as response tokens are generated.

    Raises:
        CodeGenerationError: If the API call fails.
    """
    try:
        kwargs = self._build_request_kwargs(prompt)

        async with self._client.messages.stream(**kwargs) as stream:
            async for text in stream.text_stream:
                yield LLMChunk(content=text, is_final=False)

        yield LLMChunk(content="", is_final=True)

        logger.info("anthropic_stream_complete", model=self._model)

    except Exception as exc:
        if isinstance(exc, CodeGenerationError):
            raise
        logger.error("anthropic_stream_failed", error=str(exc), model=self._model)
        raise CodeGenerationError(f"Anthropic streaming API call failed: {exc}") from exc

OpenAIBackend

OpenAIBackend

LLM backend using the OpenAI API (GPT models).

Uses openai.AsyncOpenAI for async communication with the OpenAI API. Supports both complete and streaming generation, native function calling, and automatic retry on transient errors.

Example

backend = OpenAIBackend(model="gpt-5.4-mini")
response = await backend.generate(prompt)

Source code in src/agenticapi/runtime/llm/openai.py
class OpenAIBackend:
    """LLM backend using the OpenAI API (GPT models).

    Uses openai.AsyncOpenAI for async communication with the
    OpenAI API. Supports both complete and streaming generation,
    native function calling, and automatic retry on transient errors.

    Example:
        backend = OpenAIBackend(model="gpt-5.4-mini")
        response = await backend.generate(prompt)
    """

    def __init__(
        self,
        *,
        model: str = "gpt-5.4-mini",
        api_key: str | None = None,
        max_tokens: int = 4096,
        timeout: float = 120.0,
        retry: RetryConfig | None = None,
    ) -> None:
        """Initialize the OpenAI backend.

        Args:
            model: The OpenAI model identifier to use.
            api_key: OpenAI API key. Falls back to OPENAI_API_KEY env var.
            max_tokens: Default maximum tokens for generation.
            timeout: API call timeout in seconds.
            retry: Optional retry configuration for transient failures.
        """
        try:
            import openai
        except ImportError as exc:
            raise ImportError(
                "The 'openai' package is required for OpenAIBackend. Install it with: pip install openai"
            ) from exc

        resolved_key = api_key or os.environ.get("OPENAI_API_KEY")
        if not resolved_key:
            raise ValueError("OpenAI API key must be provided via api_key parameter or OPENAI_API_KEY env var")

        self._model = model
        self._max_tokens = max_tokens
        self._timeout = timeout
        self._client = openai.AsyncOpenAI(api_key=resolved_key, timeout=timeout)

        if retry is None:
            self._retry = RetryConfig(
                max_retries=3,
                retryable_exceptions=(
                    openai.RateLimitError,
                    openai.APITimeoutError,
                ),
            )
        else:
            self._retry = retry

    @property
    def model_name(self) -> str:
        """The name of the OpenAI model being used."""
        return self._model

    async def generate(self, prompt: LLMPrompt) -> LLMResponse:
        """Send a prompt to the OpenAI API and return a complete response.

        Args:
            prompt: The LLM prompt to process.

        Returns:
            The complete LLM response with content and usage statistics.

        Raises:
            CodeGenerationError: If the API call fails.
        """
        try:
            kwargs = self._build_request_kwargs(prompt)
            completion: Any = await with_retry(self._client.chat.completions.create, self._retry, **kwargs)
            return self._build_response(completion)

        except Exception as exc:
            if isinstance(exc, CodeGenerationError):
                raise
            logger.error("openai_generate_failed", error=str(exc), model=self._model)
            raise CodeGenerationError(f"OpenAI API call failed: {exc}") from exc

    async def generate_stream(self, prompt: LLMPrompt) -> AsyncIterator[LLMChunk]:
        """Stream a response from the OpenAI API.

        Args:
            prompt: The LLM prompt to process.

        Yields:
            LLMChunk objects as response tokens are generated.

        Raises:
            CodeGenerationError: If the API call fails.
        """
        try:
            kwargs = self._build_request_kwargs(prompt)
            kwargs["stream"] = True

            stream = await self._client.chat.completions.create(**kwargs)

            async for event in stream:
                if event.choices and event.choices[0].delta.content:
                    yield LLMChunk(content=event.choices[0].delta.content, is_final=False)

            yield LLMChunk(content="", is_final=True)

            logger.info("openai_stream_complete", model=self._model)

        except Exception as exc:
            if isinstance(exc, CodeGenerationError):
                raise
            logger.error("openai_stream_failed", error=str(exc), model=self._model)
            raise CodeGenerationError(f"OpenAI streaming API call failed: {exc}") from exc

    def _build_request_kwargs(self, prompt: LLMPrompt) -> dict[str, Any]:
        """Build keyword arguments for the OpenAI API call.

        Translates the framework's generic message and tool formats
        into the OpenAI-specific wire format:

        - Tool definitions are wrapped in ``{"type": "function",
          "function": {...}}``.
        - Assistant messages with ``tool_calls`` include a ``tool_calls``
          array of ``{"id", "type", "function": {"name", "arguments"}}``
          objects.
        - Tool-result messages (``role="tool"``) include ``tool_call_id``.

        Args:
            prompt: The LLM prompt to convert.

        Returns:
            Dictionary of keyword arguments for chat.completions.create().
        """
        messages: list[dict[str, Any]] = [{"role": "developer", "content": prompt.system}]
        for msg in prompt.messages:
            if msg.role == "system":
                continue
            if msg.role == "assistant" and msg.tool_calls:
                messages.append(
                    {
                        "role": "assistant",
                        "content": msg.content or None,
                        "tool_calls": [
                            {
                                "id": tc.id,
                                "type": "function",
                                "function": {
                                    "name": tc.name,
                                    "arguments": json.dumps(tc.arguments),
                                },
                            }
                            for tc in msg.tool_calls
                        ],
                    }
                )
            elif msg.role == "tool" and msg.tool_call_id:
                messages.append(
                    {
                        "role": "tool",
                        "tool_call_id": msg.tool_call_id,
                        "content": msg.content,
                    }
                )
            else:
                messages.append({"role": msg.role, "content": msg.content})

        kwargs: dict[str, Any] = {
            "model": self._model,
            "max_completion_tokens": prompt.max_tokens or self._max_tokens,
            "messages": messages,
            "temperature": prompt.temperature,
        }

        if prompt.tools:
            kwargs["tools"] = [self._normalize_tool(t) for t in prompt.tools]

        if prompt.tool_choice is not None and prompt.tools:
            kwargs["tool_choice"] = prompt.tool_choice

        return kwargs

    @staticmethod
    def _normalize_tool(t: dict[str, Any]) -> dict[str, Any]:
        """Normalize a tool definition to OpenAI format.

        Accepts both the framework's generic format
        (``{"name", "description", "parameters"}``) and the already-
        wrapped OpenAI format (``{"type": "function", "function": {...}}``).
        """
        if t.get("type") == "function" and "function" in t:
            return t
        return {
            "type": "function",
            "function": {
                "name": t["name"],
                "description": t.get("description", ""),
                "parameters": t.get("parameters", t.get("input_schema", {})),
            },
        }

    def _build_response(self, completion: Any) -> LLMResponse:
        """Build an LLMResponse from an OpenAI completion object.

        Extracts text content, tool_calls, finish_reason, and usage.

        Args:
            completion: The OpenAI API completion object.

        Returns:
            A fully populated LLMResponse.
        """
        choice = completion.choices[0]
        content = choice.message.content or ""

        tool_calls: list[ToolCall] = []
        raw_calls = getattr(choice.message, "tool_calls", None)
        if raw_calls:
            for tc in raw_calls:
                try:
                    arguments = json.loads(tc.function.arguments) if tc.function.arguments else {}
                except (json.JSONDecodeError, TypeError):
                    arguments = {}
                tool_calls.append(
                    ToolCall(
                        id=tc.id,
                        name=tc.function.name,
                        arguments=arguments,
                    )
                )

        finish_reason: str | None = None
        raw_reason = getattr(choice, "finish_reason", None)
        if raw_reason == "tool_calls":
            finish_reason = "tool_calls"
        elif raw_reason == "stop":
            finish_reason = "stop"
        elif raw_reason == "length":
            finish_reason = "length"
        elif raw_reason == "content_filter":
            finish_reason = "content_filter"
        elif raw_reason is not None:
            finish_reason = str(raw_reason)

        usage = LLMUsage(
            input_tokens=completion.usage.prompt_tokens if completion.usage else 0,
            output_tokens=completion.usage.completion_tokens if completion.usage else 0,
        )

        logger.info(
            "openai_generate_complete",
            model=self._model,
            input_tokens=usage.input_tokens,
            output_tokens=usage.output_tokens,
            tool_calls=len(tool_calls),
            finish_reason=finish_reason,
        )

        return LLMResponse(
            content=content,
            usage=usage,
            model=completion.model or self._model,
            tool_calls=tool_calls,
            finish_reason=finish_reason,
        )

model_name property

model_name: str

The name of the OpenAI model being used.

__init__

__init__(
    *,
    model: str = "gpt-5.4-mini",
    api_key: str | None = None,
    max_tokens: int = 4096,
    timeout: float = 120.0,
    retry: RetryConfig | None = None,
) -> None

Initialize the OpenAI backend.

Parameters:

Name Type Description Default
model str

The OpenAI model identifier to use.

'gpt-5.4-mini'
api_key str | None

OpenAI API key. Falls back to OPENAI_API_KEY env var.

None
max_tokens int

Default maximum tokens for generation.

4096
timeout float

API call timeout in seconds.

120.0
retry RetryConfig | None

Optional retry configuration for transient failures.

None
Source code in src/agenticapi/runtime/llm/openai.py
def __init__(
    self,
    *,
    model: str = "gpt-5.4-mini",
    api_key: str | None = None,
    max_tokens: int = 4096,
    timeout: float = 120.0,
    retry: RetryConfig | None = None,
) -> None:
    """Initialize the OpenAI backend.

    Args:
        model: The OpenAI model identifier to use.
        api_key: OpenAI API key. Falls back to OPENAI_API_KEY env var.
        max_tokens: Default maximum tokens for generation.
        timeout: API call timeout in seconds.
        retry: Optional retry configuration for transient failures.
    """
    try:
        import openai
    except ImportError as exc:
        raise ImportError(
            "The 'openai' package is required for OpenAIBackend. Install it with: pip install openai"
        ) from exc

    resolved_key = api_key or os.environ.get("OPENAI_API_KEY")
    if not resolved_key:
        raise ValueError("OpenAI API key must be provided via api_key parameter or OPENAI_API_KEY env var")

    self._model = model
    self._max_tokens = max_tokens
    self._timeout = timeout
    self._client = openai.AsyncOpenAI(api_key=resolved_key, timeout=timeout)

    if retry is None:
        self._retry = RetryConfig(
            max_retries=3,
            retryable_exceptions=(
                openai.RateLimitError,
                openai.APITimeoutError,
            ),
        )
    else:
        self._retry = retry

generate async

generate(prompt: LLMPrompt) -> LLMResponse

Send a prompt to the OpenAI API and return a complete response.

Parameters:

Name Type Description Default
prompt LLMPrompt

The LLM prompt to process.

required

Returns:

Type Description
LLMResponse

The complete LLM response with content and usage statistics.

Raises:

Type Description
CodeGenerationError

If the API call fails.

Source code in src/agenticapi/runtime/llm/openai.py
async def generate(self, prompt: LLMPrompt) -> LLMResponse:
    """Send a prompt to the OpenAI API and return a complete response.

    Args:
        prompt: The LLM prompt to process.

    Returns:
        The complete LLM response with content and usage statistics.

    Raises:
        CodeGenerationError: If the API call fails.
    """
    try:
        kwargs = self._build_request_kwargs(prompt)
        completion: Any = await with_retry(self._client.chat.completions.create, self._retry, **kwargs)
        return self._build_response(completion)

    except Exception as exc:
        if isinstance(exc, CodeGenerationError):
            raise
        logger.error("openai_generate_failed", error=str(exc), model=self._model)
        raise CodeGenerationError(f"OpenAI API call failed: {exc}") from exc

generate_stream async

generate_stream(
    prompt: LLMPrompt,
) -> AsyncIterator[LLMChunk]

Stream a response from the OpenAI API.

Parameters:

Name Type Description Default
prompt LLMPrompt

The LLM prompt to process.

required

Yields:

Type Description
AsyncIterator[LLMChunk]

LLMChunk objects as response tokens are generated.

Raises:

Type Description
CodeGenerationError

If the API call fails.

Source code in src/agenticapi/runtime/llm/openai.py
async def generate_stream(self, prompt: LLMPrompt) -> AsyncIterator[LLMChunk]:
    """Stream a response from the OpenAI API.

    Args:
        prompt: The LLM prompt to process.

    Yields:
        LLMChunk objects as response tokens are generated.

    Raises:
        CodeGenerationError: If the API call fails.
    """
    try:
        kwargs = self._build_request_kwargs(prompt)
        kwargs["stream"] = True

        stream = await self._client.chat.completions.create(**kwargs)

        async for event in stream:
            if event.choices and event.choices[0].delta.content:
                yield LLMChunk(content=event.choices[0].delta.content, is_final=False)

        yield LLMChunk(content="", is_final=True)

        logger.info("openai_stream_complete", model=self._model)

    except Exception as exc:
        if isinstance(exc, CodeGenerationError):
            raise
        logger.error("openai_stream_failed", error=str(exc), model=self._model)
        raise CodeGenerationError(f"OpenAI streaming API call failed: {exc}") from exc

GeminiBackend

GeminiBackend

LLM backend using the Google Gemini API.

Uses the google-genai SDK for async communication with the Gemini API. Supports both complete and streaming generation, native function calling, and automatic retry on transient errors.

Example

backend = GeminiBackend(model="gemini-2.5-flash")
response = await backend.generate(prompt)

Source code in src/agenticapi/runtime/llm/gemini.py
class GeminiBackend:
    """LLM backend using the Google Gemini API.

    Uses the google-genai SDK for async communication with the
    Gemini API. Supports both complete and streaming generation,
    native function calling, and automatic retry on transient errors.

    Example:
        backend = GeminiBackend(model="gemini-2.5-flash")
        response = await backend.generate(prompt)
    """

    def __init__(
        self,
        *,
        model: str = "gemini-2.5-flash",
        api_key: str | None = None,
        max_tokens: int = 4096,
        timeout: float = 120.0,
        retry: RetryConfig | None = None,
    ) -> None:
        """Initialize the Gemini backend.

        Args:
            model: The Gemini model identifier to use.
            api_key: Google API key. Falls back to GOOGLE_API_KEY env var.
            max_tokens: Default maximum tokens for generation.
            timeout: API call timeout in seconds.
            retry: Optional retry configuration for transient failures.
        """
        try:
            from google import genai
        except ImportError as exc:
            raise ImportError(
                "The 'google-genai' package is required for GeminiBackend. Install it with: pip install google-genai"
            ) from exc

        resolved_key = api_key or os.environ.get("GOOGLE_API_KEY")
        if not resolved_key:
            raise ValueError("Google API key must be provided via api_key parameter or GOOGLE_API_KEY env var")

        self._model = model
        self._max_tokens = max_tokens
        self._timeout = timeout
        self._client = genai.Client(api_key=resolved_key)

        # Attempt to configure retryable exceptions from google SDK.
        retryable: tuple[type[Exception], ...] = ()
        try:
            from google.api_core.exceptions import ResourceExhausted, ServiceUnavailable  # type: ignore[import-untyped]

            retryable = (ResourceExhausted, ServiceUnavailable)
        except ImportError:
            pass

        self._retry = retry if retry is not None else RetryConfig(max_retries=3, retryable_exceptions=retryable)

    @property
    def model_name(self) -> str:
        """The name of the Gemini model being used."""
        return self._model

    async def generate(self, prompt: LLMPrompt) -> LLMResponse:
        """Send a prompt to the Gemini API and return a complete response.

        Args:
            prompt: The LLM prompt to process.

        Returns:
            The complete LLM response with content and usage statistics.

        Raises:
            CodeGenerationError: If the API call fails.
        """
        try:
            config, contents = self._build_request_params(prompt)
            response: Any = await with_retry(
                self._client.aio.models.generate_content,
                self._retry,
                model=self._model,
                contents=contents,
                config=config,
            )
            return self._build_response(response)

        except Exception as exc:
            if isinstance(exc, CodeGenerationError):
                raise
            logger.error("gemini_generate_failed", error=str(exc), model=self._model)
            raise CodeGenerationError(f"Gemini API call failed: {exc}") from exc

    async def generate_stream(self, prompt: LLMPrompt) -> AsyncIterator[LLMChunk]:
        """Stream a response from the Gemini API.

        Args:
            prompt: The LLM prompt to process.

        Yields:
            LLMChunk objects as response tokens are generated.

        Raises:
            CodeGenerationError: If the API call fails.
        """
        try:
            config, contents = self._build_request_params(prompt)

            stream = await self._client.aio.models.generate_content_stream(
                model=self._model,
                contents=contents,
                config=config,
            )
            async for chunk in stream:
                if chunk.text:
                    yield LLMChunk(content=chunk.text, is_final=False)

            yield LLMChunk(content="", is_final=True)

            logger.info("gemini_stream_complete", model=self._model)

        except Exception as exc:
            if isinstance(exc, CodeGenerationError):
                raise
            logger.error("gemini_stream_failed", error=str(exc), model=self._model)
            raise CodeGenerationError(f"Gemini streaming API call failed: {exc}") from exc

    def _build_request_params(self, prompt: LLMPrompt) -> tuple[Any, list[Any]]:
        """Build request parameters for the Gemini API call.

        Args:
            prompt: The LLM prompt to convert.

        Returns:
            A tuple of (GenerateContentConfig, contents list).
        """
        from google.genai import types

        config_kwargs: dict[str, Any] = {
            "system_instruction": prompt.system,
            "temperature": prompt.temperature,
            "max_output_tokens": prompt.max_tokens or self._max_tokens,
        }

        if prompt.tools:
            config_kwargs["tools"] = self._convert_tools(prompt.tools)

        if prompt.tool_choice is not None and prompt.tools:
            tool_config = self._convert_tool_choice(prompt.tool_choice)
            if tool_config is not None:
                config_kwargs["tool_config"] = tool_config

        config = types.GenerateContentConfig(**config_kwargs)

        contents: list[types.Content] = []
        for msg in prompt.messages:
            if msg.role == "system":
                continue
            if msg.role == "assistant" and msg.tool_calls:
                # Gemini represents tool calls as function_call Parts
                # on a model message.
                parts: list[Any] = []
                if msg.content:
                    parts.append(types.Part(text=msg.content))
                for tc in msg.tool_calls:
                    parts.append(types.Part(function_call=types.FunctionCall(name=tc.name, args=tc.arguments)))
                contents.append(types.Content(role="model", parts=parts))
            elif msg.role == "tool":
                # Gemini expects tool results as user messages with
                # function_response Parts.  The ``name`` field must be
                # the *function name* (e.g. "get_weather"), not the
                # provider-assigned call ID.  Look it up from the
                # preceding assistant message's tool_calls list.
                tool_name = self._resolve_tool_name(prompt.messages, msg)
                contents.append(
                    types.Content(
                        role="user",
                        parts=[
                            types.Part(
                                function_response=types.FunctionResponse(
                                    name=tool_name,
                                    response={"result": msg.content},
                                )
                            )
                        ],
                    )
                )
            else:
                role = "model" if msg.role == "assistant" else "user"
                contents.append(types.Content(role=role, parts=[types.Part(text=msg.content)]))

        return config, contents

    @staticmethod
    def _convert_tools(tools: list[dict[str, Any]]) -> list[Any]:
        """Convert framework tool definitions to Gemini format.

        Args:
            tools: Tool definitions in the framework's generic format.

        Returns:
            A list of Gemini Tool objects with function_declarations.
        """
        from google.genai import types

        declarations: list[types.FunctionDeclaration] = []
        for tool in tools:
            name = tool.get("name", "")
            description = tool.get("description", "")
            parameters = tool.get("parameters") or tool.get("input_schema") or {}

            declarations.append(
                types.FunctionDeclaration(
                    name=name,
                    description=description,
                    parameters=parameters if parameters else None,  # type: ignore[arg-type]
                )
            )

        return [types.Tool(function_declarations=declarations)]

    @staticmethod
    def _convert_tool_choice(tool_choice: str | dict[str, str]) -> Any:
        """Convert framework tool_choice to Gemini tool_config.

        Args:
            tool_choice: The tool_choice value from LLMPrompt.

        Returns:
            A Gemini ToolConfig or None.
        """
        from google.genai import types

        if isinstance(tool_choice, dict):
            # Force a specific tool.
            return types.ToolConfig(
                function_calling_config=types.FunctionCallingConfig(
                    mode="ANY",  # type: ignore[arg-type]
                    allowed_function_names=[tool_choice.get("name", "")],
                )
            )
        if tool_choice == "auto":
            return types.ToolConfig(function_calling_config=types.FunctionCallingConfig(mode="AUTO"))  # type: ignore[arg-type]
        if tool_choice == "required":
            return types.ToolConfig(function_calling_config=types.FunctionCallingConfig(mode="ANY"))  # type: ignore[arg-type]
        if tool_choice == "none":
            return types.ToolConfig(function_calling_config=types.FunctionCallingConfig(mode="NONE"))  # type: ignore[arg-type]
        return None

    @staticmethod
    def _resolve_tool_name(messages: list[LLMMessage], tool_msg: LLMMessage) -> str:
        """Resolve the function name for a tool-result message.

        Gemini's ``FunctionResponse.name`` must be the actual function
        name (e.g. ``"get_weather"``), not the provider-assigned call
        ID.  This helper walks *backward* through the conversation to
        find the assistant message whose ``tool_calls`` list contains a
        ``ToolCall`` with a matching ``id``.

        Falls back to ``tool_call_id`` (which may still be a name in
        some use-cases) or ``"tool_result"`` if no match is found.
        """
        if tool_msg.tool_call_id:
            for prior in reversed(messages):
                if prior.role == "assistant" and prior.tool_calls:
                    for tc in prior.tool_calls:
                        if tc.id == tool_msg.tool_call_id:
                            return tc.name
        return tool_msg.tool_call_id or "tool_result"

    def _build_response(self, response: Any) -> LLMResponse:
        """Build an LLMResponse from a Gemini response object.

        Extracts text content, function_call parts, finish_reason,
        and usage statistics.

        Args:
            response: The Gemini API response object.

        Returns:
            A fully populated LLMResponse.
        """
        text_parts: list[str] = []
        tool_calls: list[ToolCall] = []

        candidates = getattr(response, "candidates", None)
        if isinstance(candidates, list) and candidates:
            parts = getattr(candidates[0].content, "parts", None) or []
            for part in parts:
                if hasattr(part, "text") and part.text:
                    text_parts.append(part.text)
                fc = getattr(part, "function_call", None)
                if fc is not None:
                    args = dict(fc.args) if fc.args else {}
                    tool_calls.append(
                        ToolCall(
                            id=str(uuid.uuid4()),
                            name=fc.name,
                            arguments=args,
                        )
                    )

        # Fallback: if candidates parsing didn't extract text, use
        # the convenience ``response.text`` property.
        if not text_parts and not tool_calls:
            fallback_text = getattr(response, "text", None)
            if fallback_text:
                text_parts.append(fallback_text)

        finish_reason: str | None = None
        if isinstance(candidates, list) and candidates:
            raw_reason = getattr(candidates[0], "finish_reason", None)
            if raw_reason is not None:
                reason_str = str(raw_reason)
                if "STOP" in reason_str:
                    finish_reason = "tool_calls" if tool_calls else "stop"
                elif "MAX_TOKENS" in reason_str:
                    finish_reason = "length"
                elif "SAFETY" in reason_str:
                    finish_reason = "content_filter"
                else:
                    finish_reason = "tool_calls" if tool_calls else "stop"

        usage_meta = getattr(response, "usage_metadata", None)
        usage = LLMUsage(
            input_tokens=(usage_meta.prompt_token_count or 0) if usage_meta else 0,
            output_tokens=(usage_meta.candidates_token_count or 0) if usage_meta else 0,
        )

        logger.info(
            "gemini_generate_complete",
            model=self._model,
            input_tokens=usage.input_tokens,
            output_tokens=usage.output_tokens,
            tool_calls=len(tool_calls),
            finish_reason=finish_reason,
        )

        return LLMResponse(
            content="".join(text_parts),
            usage=usage,
            model=self._model,
            tool_calls=tool_calls,
            finish_reason=finish_reason,
        )
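
To make the mode mapping concrete, a minimal sketch that exercises the helper directly (get_weather is an illustrative tool name; google-genai must be installed, and imports are omitted as in the other examples):

# "auto" -> AUTO, "required" -> ANY, "none" -> NONE
auto_cfg = GeminiBackend._convert_tool_choice("auto")
required_cfg = GeminiBackend._convert_tool_choice("required")
none_cfg = GeminiBackend._convert_tool_choice("none")

# A dict forces one specific tool: mode ANY plus a single-name allow-list.
forced_cfg = GeminiBackend._convert_tool_choice({"type": "tool", "name": "get_weather"})
assert forced_cfg.function_calling_config.allowed_function_names == ["get_weather"]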

model_name property

model_name: str

The name of the Gemini model being used.

__init__

__init__(
    *,
    model: str = "gemini-2.5-flash",
    api_key: str | None = None,
    max_tokens: int = 4096,
    timeout: float = 120.0,
    retry: RetryConfig | None = None,
) -> None

Initialize the Gemini backend.

Parameters:

Name Type Description Default
model str

The Gemini model identifier to use.

'gemini-2.5-flash'
api_key str | None

Google API key. Falls back to GOOGLE_API_KEY env var.

None
max_tokens int

Default maximum tokens for generation.

4096
timeout float

API call timeout in seconds.

120.0
retry RetryConfig | None

Optional retry configuration for transient failures.

None
Source code in src/agenticapi/runtime/llm/gemini.py
def __init__(
    self,
    *,
    model: str = "gemini-2.5-flash",
    api_key: str | None = None,
    max_tokens: int = 4096,
    timeout: float = 120.0,
    retry: RetryConfig | None = None,
) -> None:
    """Initialize the Gemini backend.

    Args:
        model: The Gemini model identifier to use.
        api_key: Google API key. Falls back to GOOGLE_API_KEY env var.
        max_tokens: Default maximum tokens for generation.
        timeout: API call timeout in seconds.
        retry: Optional retry configuration for transient failures.
    """
    try:
        from google import genai
    except ImportError as exc:
        raise ImportError(
            "The 'google-genai' package is required for GeminiBackend. Install it with: pip install google-genai"
        ) from exc

    resolved_key = api_key or os.environ.get("GOOGLE_API_KEY")
    if not resolved_key:
        raise ValueError("Google API key must be provided via api_key parameter or GOOGLE_API_KEY env var")

    self._model = model
    self._max_tokens = max_tokens
    self._timeout = timeout
    self._client = genai.Client(api_key=resolved_key)

    # Attempt to configure retryable exceptions from google SDK.
    retryable: tuple[type[Exception], ...] = ()
    try:
        from google.api_core.exceptions import ResourceExhausted, ServiceUnavailable  # type: ignore[import-untyped]

        retryable = (ResourceExhausted, ServiceUnavailable)
    except ImportError:
        pass

    self._retry = retry if retry is not None else RetryConfig(max_retries=3, retryable_exceptions=retryable)
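
A minimal construction sketch. The API key resolves from GOOGLE_API_KEY when api_key is omitted; the retry policy shown here is illustrative:

from agenticapi.runtime.llm.gemini import GeminiBackend
from agenticapi.runtime.llm.retry import RetryConfig

backend = GeminiBackend(
    model="gemini-2.5-flash",
    max_tokens=2048,
    timeout=60.0,
    retry=RetryConfig(max_retries=5, base_delay_seconds=0.5),
)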

generate async

generate(prompt: LLMPrompt) -> LLMResponse

Send a prompt to the Gemini API and return a complete response.

Parameters:

Name Type Description Default
prompt LLMPrompt

The LLM prompt to process.

required

Returns:

Type Description
LLMResponse

The complete LLM response with content and usage statistics.

Raises:

Type Description
CodeGenerationError

If the API call fails.

Source code in src/agenticapi/runtime/llm/gemini.py
async def generate(self, prompt: LLMPrompt) -> LLMResponse:
    """Send a prompt to the Gemini API and return a complete response.

    Args:
        prompt: The LLM prompt to process.

    Returns:
        The complete LLM response with content and usage statistics.

    Raises:
        CodeGenerationError: If the API call fails.
    """
    try:
        config, contents = self._build_request_params(prompt)
        response: Any = await with_retry(
            self._client.aio.models.generate_content,
            self._retry,
            model=self._model,
            contents=contents,
            config=config,
        )
        return self._build_response(response)

    except Exception as exc:
        if isinstance(exc, CodeGenerationError):
            raise
        logger.error("gemini_generate_failed", error=str(exc), model=self._model)
        raise CodeGenerationError(f"Gemini API call failed: {exc}") from exc

generate_stream async

generate_stream(
    prompt: LLMPrompt,
) -> AsyncIterator[LLMChunk]

Stream a response from the Gemini API.

Parameters:

Name Type Description Default
prompt LLMPrompt

The LLM prompt to process.

required

Yields:

Type Description
AsyncIterator[LLMChunk]

LLMChunk objects as response tokens are generated.

Raises:

Type Description
CodeGenerationError

If the API call fails.

Source code in src/agenticapi/runtime/llm/gemini.py
async def generate_stream(self, prompt: LLMPrompt) -> AsyncIterator[LLMChunk]:
    """Stream a response from the Gemini API.

    Args:
        prompt: The LLM prompt to process.

    Yields:
        LLMChunk objects as response tokens are generated.

    Raises:
        CodeGenerationError: If the API call fails.
    """
    try:
        config, contents = self._build_request_params(prompt)

        stream = await self._client.aio.models.generate_content_stream(
            model=self._model,
            contents=contents,
            config=config,
        )
        async for chunk in stream:
            if chunk.text:
                yield LLMChunk(content=chunk.text, is_final=False)

        yield LLMChunk(content="", is_final=True)

        logger.info("gemini_stream_complete", model=self._model)

    except Exception as exc:
        if isinstance(exc, CodeGenerationError):
            raise
        logger.error("gemini_stream_failed", error=str(exc), model=self._model)
        raise CodeGenerationError(f"Gemini streaming API call failed: {exc}") from exc

MockBackend

MockBackend

A mock LLM backend that returns pre-configured responses.

Responses are returned in FIFO order. Raises CodeGenerationError when all responses have been consumed.

Example

backend = MockBackend(responses=["SELECT COUNT() FROM orders"]) response = await backend.generate(prompt) assert response.content == "SELECT COUNT() FROM orders"

Source code in src/agenticapi/runtime/llm/mock.py
class MockBackend:
    """A mock LLM backend that returns pre-configured responses.

    Responses are returned in FIFO order. Raises CodeGenerationError
    when all responses have been consumed.

    Example:
        backend = MockBackend(responses=["SELECT COUNT(*) FROM orders"])
        response = await backend.generate(prompt)
        assert response.content == "SELECT COUNT(*) FROM orders"
    """

    def __init__(
        self,
        responses: list[str] | None = None,
        *,
        structured_responses: list[dict[str, Any]] | None = None,
        tool_call_responses: list[list[ToolCall]] | None = None,
    ) -> None:
        """Initialize the mock backend.

        Args:
            responses: List of response strings to return in order.
                Used when neither ``LLMPrompt.response_schema`` nor
                ``LLMPrompt.tools`` is set.
            structured_responses: List of pre-built dicts the backend
                returns when the prompt carries a ``response_schema``.
                Each dict is JSON-serialised into ``LLMResponse.content``
                so the consumer can parse it back into a Pydantic model.
                Falls back to a synthesised stub matching the schema's
                ``required`` fields when this list is empty.
            tool_call_responses: Phase E3 — list of pre-built tool-call
                bundles the backend returns when ``LLMPrompt.tools`` is
                set. Each entry is a list of one-or-more
                :class:`ToolCall`s representing what the model would
                emit on a single turn (most calls are length-1; a
                length-2 list represents the model batching two tool
                invocations into one response). Falls back to an empty
                tool-call list (and a synthesised text response) when
                this list is empty so existing tests stay green.
        """
        self._responses: list[str] = list(responses) if responses else []
        self._structured_responses: list[dict[str, Any]] = list(structured_responses) if structured_responses else []
        self._tool_call_responses: list[list[ToolCall]] = list(tool_call_responses) if tool_call_responses else []
        self._call_count: int = 0
        self._prompts: list[LLMPrompt] = []

    @property
    def model_name(self) -> str:
        """The name of the mock model."""
        return "mock"

    @property
    def call_count(self) -> int:
        """Number of generate calls made."""
        return self._call_count

    @property
    def prompts(self) -> list[LLMPrompt]:
        """All prompts that were sent to this backend."""
        return list(self._prompts)

    def add_response(self, response: str) -> None:
        """Add a response to the queue.

        Args:
            response: The response string to add.
        """
        self._responses.append(response)

    def add_structured_response(self, response: dict[str, Any]) -> None:
        """Add a structured (schema-conforming) response to the queue.

        Args:
            response: The dict the backend will return on the next call
                that includes a ``response_schema`` in the prompt.
        """
        self._structured_responses.append(response)

    def add_tool_call_response(self, calls: ToolCall | list[ToolCall]) -> None:
        """Queue a native function-call response for the next tools-enabled call.

        Args:
            calls: Either one :class:`ToolCall` (the common case) or a
                list representing the model batching multiple calls
                into a single response.
        """
        bundle = [calls] if isinstance(calls, ToolCall) else list(calls)
        self._tool_call_responses.append(bundle)

    async def generate(self, prompt: LLMPrompt) -> LLMResponse:
        """Return the next pre-configured response.

        Branch order, in priority:

        1. ``prompt.tools`` set **and** a tool-call response queued →
           return an :class:`LLMResponse` with the queued
           :class:`ToolCall`s and an empty content string. This is
           the Phase E3 native-function-calling path.
        2. ``prompt.response_schema`` set → return a structured
           (JSON) response from the queue or synthesised from the
           schema. This is the D4 typed-intent path.
        3. Otherwise → return the next free-form text response.

        Args:
            prompt: The LLM prompt (recorded for later inspection).

        Returns:
            An LLMResponse with the next pre-configured content.

        Raises:
            CodeGenerationError: If no response is available for the
                requested mode.
        """
        self._prompts.append(prompt)
        self._call_count += 1

        # Phase E3: tools-enabled path. The model "wants to call a
        # function" — return the queued ToolCall bundle. Empty
        # content + finish_reason="tool_calls" mirrors what the real
        # backends emit on this path.
        if prompt.tools and self._tool_call_responses:
            calls = self._tool_call_responses.pop(0)
            return LLMResponse(
                content="",
                usage=LLMUsage(
                    input_tokens=len(prompt.system) // 4,
                    output_tokens=sum(len(json.dumps(c.arguments)) for c in calls) // 4,
                ),
                model="mock",
                tool_calls=calls,
                finish_reason="tool_calls",
            )

        # tool_choice="required" forces a tool call even when none is
        # queued — synthesise a call to the first declared tool.
        if prompt.tools and prompt.tool_choice == "required":
            first_tool = prompt.tools[0]
            synth = ToolCall(
                id="mock_required_0",
                name=first_tool.get("name", "unknown"),
                arguments={},
            )
            return LLMResponse(
                content="",
                usage=LLMUsage(input_tokens=len(prompt.system) // 4, output_tokens=10),
                model="mock",
                tool_calls=[synth],
                finish_reason="tool_calls",
            )

        if prompt.response_schema is not None:
            payload: dict[str, Any]
            if self._structured_responses:
                payload = self._structured_responses.pop(0)
            else:
                payload = _synthesise_from_schema(prompt.response_schema)
            content = json.dumps(payload)
            return LLMResponse(
                content=content,
                usage=LLMUsage(
                    input_tokens=len(prompt.system) // 4,
                    output_tokens=len(content) // 4,
                ),
                model="mock",
                finish_reason="stop",
            )

        if not self._responses:
            raise CodeGenerationError("MockBackend: no more responses available")

        content = self._responses.pop(0)
        return LLMResponse(
            content=content,
            usage=LLMUsage(input_tokens=len(prompt.system) // 4, output_tokens=len(content) // 4),
            model="mock",
        )

    async def generate_stream(self, prompt: LLMPrompt) -> AsyncIterator[LLMChunk]:
        """Stream the next pre-configured response in chunks.

        Splits the response content into word-level chunks for realistic
        streaming simulation.

        Args:
            prompt: The LLM prompt (recorded for later inspection).

        Yields:
            LLMChunk objects, with the final chunk having is_final=True.

        Raises:
            CodeGenerationError: If all responses have been consumed.
        """
        response = await self.generate(prompt)
        words = response.content.split(" ")
        for i, word in enumerate(words):
            is_last = i == len(words) - 1
            chunk_content = word if is_last else word + " "
            yield LLMChunk(content=chunk_content, is_final=is_last)
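
A test-style sketch of the tools-enabled path (inside an async test; the tool definition dict is illustrative, imports omitted as in the other examples):

backend = MockBackend()
backend.add_tool_call_response(
    ToolCall(id="call_1", name="get_weather", arguments={"city": "Oslo"})
)

prompt = LLMPrompt(
    system="You may call tools.",
    messages=[],
    tools=[{"name": "get_weather", "description": "Look up current weather"}],
)
response = await backend.generate(prompt)
assert response.finish_reason == "tool_calls"
assert response.tool_calls[0].name == "get_weather"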

model_name property

model_name: str

The name of the mock model.

call_count property

call_count: int

Number of generate calls made.

prompts property

prompts: list[LLMPrompt]

All prompts that were sent to this backend.

__init__

__init__(
    responses: list[str] | None = None,
    *,
    structured_responses: list[dict[str, Any]]
    | None = None,
    tool_call_responses: list[list[ToolCall]] | None = None,
) -> None

Initialize the mock backend.

Parameters:

Name Type Description Default
responses list[str] | None

List of response strings to return in order. Used when neither LLMPrompt.response_schema nor LLMPrompt.tools is set.

None
structured_responses list[dict[str, Any]] | None

List of pre-built dicts the backend returns when the prompt carries a response_schema. Each dict is JSON-serialised into LLMResponse.content so the consumer can parse it back into a Pydantic model. Falls back to a synthesised stub matching the schema's required fields when this list is empty.

None
tool_call_responses list[list[ToolCall]] | None

Phase E3 — list of pre-built tool-call bundles the backend returns when LLMPrompt.tools is set. Each entry is a list of one or more ToolCall objects representing what the model would emit on a single turn (most calls are length-1; a length-2 list represents the model batching two tool invocations into one response). Falls back to an empty tool-call list (and a synthesised text response) when this list is empty so existing tests stay green.

None
Source code in src/agenticapi/runtime/llm/mock.py
def __init__(
    self,
    responses: list[str] | None = None,
    *,
    structured_responses: list[dict[str, Any]] | None = None,
    tool_call_responses: list[list[ToolCall]] | None = None,
) -> None:
    """Initialize the mock backend.

    Args:
        responses: List of response strings to return in order.
            Used when neither ``LLMPrompt.response_schema`` nor
            ``LLMPrompt.tools`` is set.
        structured_responses: List of pre-built dicts the backend
            returns when the prompt carries a ``response_schema``.
            Each dict is JSON-serialised into ``LLMResponse.content``
            so the consumer can parse it back into a Pydantic model.
            Falls back to a synthesised stub matching the schema's
            ``required`` fields when this list is empty.
        tool_call_responses: Phase E3 — list of pre-built tool-call
            bundles the backend returns when ``LLMPrompt.tools`` is
            set. Each entry is a list of one-or-more
            :class:`ToolCall`s representing what the model would
            emit on a single turn (most calls are length-1; a
            length-2 list represents the model batching two tool
            invocations into one response). Falls back to an empty
            tool-call list (and a synthesised text response) when
            this list is empty so existing tests stay green.
    """
    self._responses: list[str] = list(responses) if responses else []
    self._structured_responses: list[dict[str, Any]] = list(structured_responses) if structured_responses else []
    self._tool_call_responses: list[list[ToolCall]] = list(tool_call_responses) if tool_call_responses else []
    self._call_count: int = 0
    self._prompts: list[LLMPrompt] = []

add_response

add_response(response: str) -> None

Add a response to the queue.

Parameters:

Name Type Description Default
response str

The response string to add.

required
Source code in src/agenticapi/runtime/llm/mock.py
def add_response(self, response: str) -> None:
    """Add a response to the queue.

    Args:
        response: The response string to add.
    """
    self._responses.append(response)

add_structured_response

add_structured_response(response: dict[str, Any]) -> None

Add a structured (schema-conforming) response to the queue.

Parameters:

Name Type Description Default
response dict[str, Any]

The dict the backend will return on the next call that includes a response_schema in the prompt.

required
Source code in src/agenticapi/runtime/llm/mock.py
def add_structured_response(self, response: dict[str, Any]) -> None:
    """Add a structured (schema-conforming) response to the queue.

    Args:
        response: The dict the backend will return on the next call
            that includes a ``response_schema`` in the prompt.
    """
    self._structured_responses.append(response)

add_tool_call_response

add_tool_call_response(
    calls: ToolCall | list[ToolCall],
) -> None

Queue a native function-call response for the next tools-enabled call.

Parameters:

Name Type Description Default
calls ToolCall | list[ToolCall]

Either one ToolCall (the common case) or a list representing the model batching multiple calls into a single response.

required
Source code in src/agenticapi/runtime/llm/mock.py
def add_tool_call_response(self, calls: ToolCall | list[ToolCall]) -> None:
    """Queue a native function-call response for the next tools-enabled call.

    Args:
        calls: Either one :class:`ToolCall` (the common case) or a
            list representing the model batching multiple calls
            into a single response.
    """
    bundle = [calls] if isinstance(calls, ToolCall) else list(calls)
    self._tool_call_responses.append(bundle)

generate async

generate(prompt: LLMPrompt) -> LLMResponse

Return the next pre-configured response.

Branch order, in priority:

  1. prompt.tools set and a tool-call response queued → return an LLMResponse with the queued ToolCall objects and an empty content string. This is the Phase E3 native-function-calling path.
  2. prompt.response_schema set → return a structured (JSON) response from the queue or synthesised from the schema. This is the D4 typed-intent path.
  3. Otherwise → return the next free-form text response.

Parameters:

Name Type Description Default
prompt LLMPrompt

The LLM prompt (recorded for later inspection).

required

Returns:

Type Description
LLMResponse

An LLMResponse with the next pre-configured content.

Raises:

Type Description
CodeGenerationError

If no response is available for the requested mode.

Source code in src/agenticapi/runtime/llm/mock.py
async def generate(self, prompt: LLMPrompt) -> LLMResponse:
    """Return the next pre-configured response.

    Branch order, in priority:

    1. ``prompt.tools`` set **and** a tool-call response queued →
       return an :class:`LLMResponse` with the queued
       :class:`ToolCall`s and an empty content string. This is
       the Phase E3 native-function-calling path.
    2. ``prompt.response_schema`` set → return a structured
       (JSON) response from the queue or synthesised from the
       schema. This is the D4 typed-intent path.
    3. Otherwise → return the next free-form text response.

    Args:
        prompt: The LLM prompt (recorded for later inspection).

    Returns:
        An LLMResponse with the next pre-configured content.

    Raises:
        CodeGenerationError: If no response is available for the
            requested mode.
    """
    self._prompts.append(prompt)
    self._call_count += 1

    # Phase E3: tools-enabled path. The model "wants to call a
    # function" — return the queued ToolCall bundle. Empty
    # content + finish_reason="tool_calls" mirrors what the real
    # backends emit on this path.
    if prompt.tools and self._tool_call_responses:
        calls = self._tool_call_responses.pop(0)
        return LLMResponse(
            content="",
            usage=LLMUsage(
                input_tokens=len(prompt.system) // 4,
                output_tokens=sum(len(json.dumps(c.arguments)) for c in calls) // 4,
            ),
            model="mock",
            tool_calls=calls,
            finish_reason="tool_calls",
        )

    # tool_choice="required" forces a tool call even when none is
    # queued — synthesise a call to the first declared tool.
    if prompt.tools and prompt.tool_choice == "required":
        first_tool = prompt.tools[0]
        synth = ToolCall(
            id="mock_required_0",
            name=first_tool.get("name", "unknown"),
            arguments={},
        )
        return LLMResponse(
            content="",
            usage=LLMUsage(input_tokens=len(prompt.system) // 4, output_tokens=10),
            model="mock",
            tool_calls=[synth],
            finish_reason="tool_calls",
        )

    if prompt.response_schema is not None:
        payload: dict[str, Any]
        if self._structured_responses:
            payload = self._structured_responses.pop(0)
        else:
            payload = _synthesise_from_schema(prompt.response_schema)
        content = json.dumps(payload)
        return LLMResponse(
            content=content,
            usage=LLMUsage(
                input_tokens=len(prompt.system) // 4,
                output_tokens=len(content) // 4,
            ),
            model="mock",
            finish_reason="stop",
        )

    if not self._responses:
        raise CodeGenerationError("MockBackend: no more responses available")

    content = self._responses.pop(0)
    return LLMResponse(
        content=content,
        usage=LLMUsage(input_tokens=len(prompt.system) // 4, output_tokens=len(content) // 4),
        model="mock",
    )
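
A sketch of the structured (response_schema) path; the schema is hand-written here for brevity, though in practice it would typically come from a Pydantic model's model_json_schema():

import json

backend = MockBackend(structured_responses=[{"action": "read", "domain": "order"}])
prompt = LLMPrompt(
    system="Classify the request.",
    messages=[LLMMessage(role="user", content="How many orders today?")],
    response_schema={
        "type": "object",
        "properties": {"action": {"type": "string"}, "domain": {"type": "string"}},
        "required": ["action", "domain"],
    },
)
response = await backend.generate(prompt)
assert json.loads(response.content) == {"action": "read", "domain": "order"}
assert response.finish_reason == "stop"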

generate_stream async

generate_stream(
    prompt: LLMPrompt,
) -> AsyncIterator[LLMChunk]

Stream the next pre-configured response in chunks.

Splits the response content into word-level chunks for realistic streaming simulation.

Parameters:

Name Type Description Default
prompt LLMPrompt

The LLM prompt (recorded for later inspection).

required

Yields:

Type Description
AsyncIterator[LLMChunk]

LLMChunk objects, with the final chunk having is_final=True.

Raises:

Type Description
CodeGenerationError

If all responses have been consumed.

Source code in src/agenticapi/runtime/llm/mock.py
async def generate_stream(self, prompt: LLMPrompt) -> AsyncIterator[LLMChunk]:
    """Stream the next pre-configured response in chunks.

    Splits the response content into word-level chunks for realistic
    streaming simulation.

    Args:
        prompt: The LLM prompt (recorded for later inspection).

    Yields:
        LLMChunk objects, with the final chunk having is_final=True.

    Raises:
        CodeGenerationError: If all responses have been consumed.
    """
    response = await self.generate(prompt)
    words = response.content.split(" ")
    for i, word in enumerate(words):
        is_last = i == len(words) - 1
        chunk_content = word if is_last else word + " "
        yield LLMChunk(content=chunk_content, is_final=is_last)
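
Because the mock splits on spaces, chunk boundaries are predictable; a quick sketch (inside an async function, imports omitted as in the other examples):

prompt = LLMPrompt(system="Say hello.", messages=[])
backend = MockBackend(responses=["hello mock world"])
chunks = [chunk async for chunk in backend.generate_stream(prompt)]
assert [chunk.content for chunk in chunks] == ["hello ", "mock ", "world"]
assert chunks[-1].is_final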

RetryConfig

RetryConfig dataclass

Configuration for LLM call retries.

Attributes:

Name Type Description
max_retries int

Maximum number of retry attempts (0 = no retries).

base_delay_seconds float

Initial delay before the first retry.

max_delay_seconds float

Upper bound on delay between retries.

jitter bool

Whether to add random jitter to the delay.

retryable_exceptions tuple[type[Exception], ...]

Exception types that trigger a retry.

Source code in src/agenticapi/runtime/llm/retry.py
@dataclass(frozen=True, slots=True)
class RetryConfig:
    """Configuration for LLM call retries.

    Attributes:
        max_retries: Maximum number of retry attempts (0 = no retries).
        base_delay_seconds: Initial delay before the first retry.
        max_delay_seconds: Upper bound on delay between retries.
        jitter: Whether to add random jitter to the delay.
        retryable_exceptions: Exception types that trigger a retry.
    """

    max_retries: int = 3
    base_delay_seconds: float = 1.0
    max_delay_seconds: float = 30.0
    jitter: bool = True
    retryable_exceptions: tuple[type[Exception], ...] = field(default_factory=tuple)
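
A construction sketch; the Gemini backend builds a similar default for itself, and the exception types below stand in for whatever your transport raises on transient failures:

retry = RetryConfig(
    max_retries=3,
    base_delay_seconds=1.0,
    max_delay_seconds=30.0,
    jitter=True,
    retryable_exceptions=(TimeoutError, ConnectionError),
)
backend = GeminiBackend(model="gemini-2.5-flash", retry=retry)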

CodeGenerator

CodeGenerator

Generates Python code from intents using an LLM backend.

Uses the LLM to convert natural language intents into executable Python code, scoped to the available tools. The generated code is extracted from the LLM response and returned for harness evaluation.

Example

generator = CodeGenerator(llm=backend, tools=registry)
result = await generator.generate(
    intent_raw="Show me order count",
    intent_action="read",
    intent_domain="order",
    intent_parameters={},
    context=agent_context,
)
print(result.code)

Source code in src/agenticapi/runtime/code_generator.py
class CodeGenerator:
    """Generates Python code from intents using an LLM backend.

    Uses the LLM to convert natural language intents into executable
    Python code, scoped to the available tools. The generated code
    is extracted from the LLM response and returned for harness evaluation.

    Example:
        generator = CodeGenerator(llm=backend, tools=registry)
        result = await generator.generate(
            intent_raw="Show me order count",
            intent_action="read",
            intent_domain="order",
            intent_parameters={},
            context=agent_context,
        )
        print(result.code)
    """

    def __init__(
        self,
        *,
        llm: LLMBackend,
        tools: ToolRegistry | None = None,
    ) -> None:
        """Initialize the code generator.

        Args:
            llm: The LLM backend to use for code generation.
            tools: Optional tool registry defining available tools.
        """
        self._llm = llm
        self._tools = tools or ToolRegistry()

    async def generate(
        self,
        *,
        intent_raw: str,
        intent_action: str,
        intent_domain: str,
        intent_parameters: dict[str, Any],
        context: AgentContext,
        sandbox_data: dict[str, object] | None = None,
    ) -> GeneratedCode:
        """Generate Python code from an intent.

        Builds a prompt from the intent and context, sends it to the LLM,
        and extracts the generated code from the response.

        Args:
            intent_raw: The original natural language request.
            intent_action: The classified action type.
            intent_domain: The domain of the request.
            intent_parameters: Extracted parameters from the intent.
            context: The agent execution context.
            sandbox_data: Pre-fetched tool data to include in the prompt
                so the LLM knows the data schema.

        Returns:
            GeneratedCode containing the extracted code and metadata.

        Raises:
            CodeGenerationError: If code generation or extraction fails.
        """
        tool_definitions = self._tools.get_definitions()
        context_str = context.context_window.build()

        prompt = build_code_generation_prompt(
            intent_raw=intent_raw,
            intent_action=intent_action,
            intent_domain=intent_domain,
            intent_parameters=intent_parameters,
            tool_definitions=tool_definitions,
            context=context_str,
            sandbox_data=sandbox_data,
        )

        logger.info(
            "code_generation_started",
            trace_id=context.trace_id,
            intent_action=intent_action,
            intent_domain=intent_domain,
            tool_count=len(tool_definitions),
        )

        from agenticapi.observability import (
            AgenticAPIAttributes,
            GenAIAttributes,
            SpanNames,
            get_tracer,
        )

        tracer = get_tracer()
        with tracer.start_as_current_span(SpanNames.CODE_GENERATE.value) as gen_span:
            gen_span.set_attribute(AgenticAPIAttributes.INTENT_ACTION.value, intent_action)
            gen_span.set_attribute(AgenticAPIAttributes.INTENT_DOMAIN.value, intent_domain)
            with tracer.start_as_current_span(SpanNames.GEN_AI_CHAT.value) as llm_span:
                llm_span.set_attribute(GenAIAttributes.OPERATION_NAME.value, "code_generate")
                llm_span.set_attribute(GenAIAttributes.REQUEST_MODEL.value, self._llm.model_name)
                llm_span.set_attribute(GenAIAttributes.REQUEST_MAX_TOKENS.value, prompt.max_tokens)
                llm_span.set_attribute(GenAIAttributes.REQUEST_TEMPERATURE.value, prompt.temperature)
                try:
                    response = await self._llm.generate(prompt)
                except CodeGenerationError as exc:
                    llm_span.record_exception(exc)
                    raise
                except Exception as exc:
                    logger.error("code_generation_llm_failed", trace_id=context.trace_id, error=str(exc))
                    llm_span.record_exception(exc)
                    raise CodeGenerationError(f"LLM call failed during code generation: {exc}") from exc

                llm_span.set_attribute(GenAIAttributes.RESPONSE_MODEL.value, response.model)
                llm_span.set_attribute(GenAIAttributes.USAGE_INPUT_TOKENS.value, response.usage.input_tokens)
                llm_span.set_attribute(GenAIAttributes.USAGE_OUTPUT_TOKENS.value, response.usage.output_tokens)

            code = _extract_code(response.content or "")
            if not code.strip():
                logger.error(
                    "code_generation_empty",
                    trace_id=context.trace_id,
                    raw_response=(response.content or "")[:200],
                )
                raise CodeGenerationError("LLM returned empty code")

            gen_span.set_attribute(AgenticAPIAttributes.CODE_LINES.value, code.count("\n") + 1)

            logger.info(
                "code_generation_complete",
                trace_id=context.trace_id,
                code_lines=code.count("\n") + 1,
                input_tokens=response.usage.input_tokens,
                output_tokens=response.usage.output_tokens,
            )

            return GeneratedCode(
                code=code,
                reasoning=response.reasoning,
                confidence=response.confidence,
                usage=response.usage,
            )
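
A test-style sketch pairing CodeGenerator with MockBackend so no real LLM is called. Construction of AgentContext is not shown (assume agent_context comes from your usual test setup), and whether _extract_code expects a fenced block in the response text is an assumption here:

backend = MockBackend(responses=["```python\nresult = len(orders)\n```"])
generator = CodeGenerator(llm=backend, tools=ToolRegistry())

generated = await generator.generate(
    intent_raw="Show me order count",
    intent_action="read",
    intent_domain="order",
    intent_parameters={},
    context=agent_context,
)
assert backend.call_count == 1
assert generated.code  # non-empty code extracted from the mock response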

__init__

__init__(
    *, llm: LLMBackend, tools: ToolRegistry | None = None
) -> None

Initialize the code generator.

Parameters:

Name Type Description Default
llm LLMBackend

The LLM backend to use for code generation.

required
tools ToolRegistry | None

Optional tool registry defining available tools.

None
Source code in src/agenticapi/runtime/code_generator.py
def __init__(
    self,
    *,
    llm: LLMBackend,
    tools: ToolRegistry | None = None,
) -> None:
    """Initialize the code generator.

    Args:
        llm: The LLM backend to use for code generation.
        tools: Optional tool registry defining available tools.
    """
    self._llm = llm
    self._tools = tools or ToolRegistry()

generate async

generate(
    *,
    intent_raw: str,
    intent_action: str,
    intent_domain: str,
    intent_parameters: dict[str, Any],
    context: AgentContext,
    sandbox_data: dict[str, object] | None = None,
) -> GeneratedCode

Generate Python code from an intent.

Builds a prompt from the intent and context, sends it to the LLM, and extracts the generated code from the response.

Parameters:

Name Type Description Default
intent_raw str

The original natural language request.

required
intent_action str

The classified action type.

required
intent_domain str

The domain of the request.

required
intent_parameters dict[str, Any]

Extracted parameters from the intent.

required
context AgentContext

The agent execution context.

required
sandbox_data dict[str, object] | None

Pre-fetched tool data to include in the prompt so the LLM knows the data schema.

None

Returns:

Type Description
GeneratedCode

GeneratedCode containing the extracted code and metadata.

Raises:

Type Description
CodeGenerationError

If code generation or extraction fails.

Source code in src/agenticapi/runtime/code_generator.py
async def generate(
    self,
    *,
    intent_raw: str,
    intent_action: str,
    intent_domain: str,
    intent_parameters: dict[str, Any],
    context: AgentContext,
    sandbox_data: dict[str, object] | None = None,
) -> GeneratedCode:
    """Generate Python code from an intent.

    Builds a prompt from the intent and context, sends it to the LLM,
    and extracts the generated code from the response.

    Args:
        intent_raw: The original natural language request.
        intent_action: The classified action type.
        intent_domain: The domain of the request.
        intent_parameters: Extracted parameters from the intent.
        context: The agent execution context.
        sandbox_data: Pre-fetched tool data to include in the prompt
            so the LLM knows the data schema.

    Returns:
        GeneratedCode containing the extracted code and metadata.

    Raises:
        CodeGenerationError: If code generation or extraction fails.
    """
    tool_definitions = self._tools.get_definitions()
    context_str = context.context_window.build()

    prompt = build_code_generation_prompt(
        intent_raw=intent_raw,
        intent_action=intent_action,
        intent_domain=intent_domain,
        intent_parameters=intent_parameters,
        tool_definitions=tool_definitions,
        context=context_str,
        sandbox_data=sandbox_data,
    )

    logger.info(
        "code_generation_started",
        trace_id=context.trace_id,
        intent_action=intent_action,
        intent_domain=intent_domain,
        tool_count=len(tool_definitions),
    )

    from agenticapi.observability import (
        AgenticAPIAttributes,
        GenAIAttributes,
        SpanNames,
        get_tracer,
    )

    tracer = get_tracer()
    with tracer.start_as_current_span(SpanNames.CODE_GENERATE.value) as gen_span:
        gen_span.set_attribute(AgenticAPIAttributes.INTENT_ACTION.value, intent_action)
        gen_span.set_attribute(AgenticAPIAttributes.INTENT_DOMAIN.value, intent_domain)
        with tracer.start_as_current_span(SpanNames.GEN_AI_CHAT.value) as llm_span:
            llm_span.set_attribute(GenAIAttributes.OPERATION_NAME.value, "code_generate")
            llm_span.set_attribute(GenAIAttributes.REQUEST_MODEL.value, self._llm.model_name)
            llm_span.set_attribute(GenAIAttributes.REQUEST_MAX_TOKENS.value, prompt.max_tokens)
            llm_span.set_attribute(GenAIAttributes.REQUEST_TEMPERATURE.value, prompt.temperature)
            try:
                response = await self._llm.generate(prompt)
            except CodeGenerationError as exc:
                llm_span.record_exception(exc)
                raise
            except Exception as exc:
                logger.error("code_generation_llm_failed", trace_id=context.trace_id, error=str(exc))
                llm_span.record_exception(exc)
                raise CodeGenerationError(f"LLM call failed during code generation: {exc}") from exc

            llm_span.set_attribute(GenAIAttributes.RESPONSE_MODEL.value, response.model)
            llm_span.set_attribute(GenAIAttributes.USAGE_INPUT_TOKENS.value, response.usage.input_tokens)
            llm_span.set_attribute(GenAIAttributes.USAGE_OUTPUT_TOKENS.value, response.usage.output_tokens)

        code = _extract_code(response.content or "")
        if not code.strip():
            logger.error(
                "code_generation_empty",
                trace_id=context.trace_id,
                raw_response=(response.content or "")[:200],
            )
            raise CodeGenerationError("LLM returned empty code")

        gen_span.set_attribute(AgenticAPIAttributes.CODE_LINES.value, code.count("\n") + 1)

        logger.info(
            "code_generation_complete",
            trace_id=context.trace_id,
            code_lines=code.count("\n") + 1,
            input_tokens=response.usage.input_tokens,
            output_tokens=response.usage.output_tokens,
        )

        return GeneratedCode(
            code=code,
            reasoning=response.reasoning,
            confidence=response.confidence,
            usage=response.usage,
        )