
# LLM Backends

AgenticAPI supports multiple LLM providers through a pluggable `LLMBackend` protocol. All built-in backends support text generation, streaming, and native function calling; the provider backends also retry transient errors automatically with exponential backoff.

## Built-in Backends

| Backend | Provider | Default Model | Env Variable |
| --- | --- | --- | --- |
| `AnthropicBackend` | Anthropic | `claude-sonnet-4-6` | `ANTHROPIC_API_KEY` |
| `OpenAIBackend` | OpenAI | `gpt-5.4-mini` | `OPENAI_API_KEY` |
| `GeminiBackend` | Google | `gemini-2.5-flash` | `GOOGLE_API_KEY` |
| `MockBackend` | (Testing) | `mock` | -- |

## Capability Matrix

| Backend | Text | Stream | Native Tool Calls | `finish_reason` | `tool_choice` | Retryable Errors |
| --- | --- | --- | --- | --- | --- | --- |
| `AnthropicBackend` | Yes | Yes | Yes | Yes | Yes | `RateLimitError`, `Timeout`, 5xx |
| `OpenAIBackend` | Yes | Yes | Yes | Yes | Yes | `RateLimitError`, `Timeout` |
| `GeminiBackend` | Yes | Yes | Yes | Yes | Yes | `ResourceExhausted`, `Unavailable` |
| `MockBackend` | Yes | Yes | Yes | Yes | Yes | -- |

```python
from agenticapi.runtime.llm import AnthropicBackend, OpenAIBackend, GeminiBackend

# Pick the backend for your provider:
llm = AnthropicBackend(model="claude-sonnet-4-6")
llm = OpenAIBackend(model="gpt-5.4-mini")
llm = GeminiBackend(model="gemini-2.5-flash")
```

## Usage

### Complete Generation

```python
from agenticapi.runtime.llm.base import LLMPrompt, LLMMessage

# `backend` is any of the provider backends constructed above.
response = await backend.generate(LLMPrompt(
    system="You are a helpful assistant.",
    messages=[LLMMessage(role="user", content="Write a SQL query")],
))
print(response.content)
print(response.usage)  # LLMUsage(input_tokens=..., output_tokens=...)
```

### Streaming

```python
async for chunk in backend.generate_stream(prompt):
    print(chunk.content, end="")
```

### Structured Output

`LLMPrompt` supports `response_schema` for typed-intent and structured-output use cases. `MockBackend` fully honors `response_schema` and synthesises schema-conforming JSON. The provider backends do not yet translate `response_schema` into provider-native structured-output APIs, but the typed-intent programming model works end-to-end with `MockBackend`.
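
A minimal sketch with `MockBackend`, assuming `response_schema` accepts a JSON-Schema dict (the exact accepted type may differ):

```python
import json

from agenticapi.runtime.llm.base import LLMMessage, LLMPrompt
from agenticapi.runtime.llm.mock import MockBackend

prompt = LLMPrompt(
    system="Classify the user's intent.",
    messages=[LLMMessage(role="user", content="Cancel my order")],
    response_schema={
        "type": "object",
        "properties": {"intent": {"type": "string"}},
        "required": ["intent"],
    },
)

backend = MockBackend()  # assumes `responses` may be omitted when a schema is set
response = await backend.generate(prompt)
payload = json.loads(response.content)  # synthesised, schema-conforming JSON
assert "intent" in payload
```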

## Native Function Calling

All four backends support native function calling. The LLM receives tool definitions, decides when to call them, and returns structured `ToolCall` objects. AgenticAPI's tool-first execution path (E4) dispatches these calls through the harness without going through the sandbox.

```python
from agenticapi.runtime.llm.base import LLMMessage, LLMPrompt

prompt = LLMPrompt(
    system="You answer questions about orders.",
    messages=[LLMMessage(role="user", content="How many shipped orders today?")],
    tools=[
        {
            "name": "count_orders",
            "description": "Count orders by status.",
            "parameters": {
                "type": "object",
                "properties": {"status": {"type": "string"}},
                "required": ["status"],
            },
        }
    ],
    tool_choice="auto",  # or "required", "none", {"type": "tool", "name": "..."}
)

response = await backend.generate(prompt)

if response.finish_reason == "tool_calls":
    for call in response.tool_calls:
        print(call.id, call.name, call.arguments)
        # `registry` is the harness tool registry mapping names to tools.
        result = await registry.get(call.name).invoke(**call.arguments)
else:
    print(response.content)
```

### tool_choice

Controls how the model selects tools:

| Value | Meaning |
| --- | --- |
| `"auto"` | Model decides whether to call a tool or respond with text |
| `"required"` | Model must call at least one tool |
| `"none"` | Model must not call any tool |
| `{"type": "tool", "name": "count_orders"}` | Force a specific tool |
| `None` | Defer to the provider's default (usually `"auto"`) |

Each backend translates `tool_choice` into its provider's native format (Anthropic uses `{"type": "any"}` for `"required"`, Gemini uses `FunctionCallingConfig(mode="ANY")`, etc.).
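
For example, forcing the `count_orders` tool from the prompt above (a sketch; the backend performs the provider-specific translation internally):

```python
forced = LLMPrompt(
    system="You answer questions about orders.",
    messages=[LLMMessage(role="user", content="How many shipped orders today?")],
    tools=[{
        "name": "count_orders",
        "description": "Count orders by status.",
        "parameters": {
            "type": "object",
            "properties": {"status": {"type": "string"}},
            "required": ["status"],
        },
    }],
    tool_choice={"type": "tool", "name": "count_orders"},  # force this exact tool
)

response = await backend.generate(forced)
assert response.finish_reason == "tool_calls"  # the model cannot answer with text only
```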

### ToolCall

```python
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True, slots=True)
class ToolCall:
    id: str                    # provider-supplied call ID
    name: str                  # tool name to invoke
    arguments: dict[str, Any]  # parsed keyword arguments
```

### LLMResponse.finish_reason

`finish_reason` reports why generation stopped:

| Value | Meaning |
| --- | --- |
| `"stop"` | Natural end of turn |
| `"length"` | `max_tokens` reached |
| `"tool_calls"` | Model requested tool(s); inspect `tool_calls` |
| `"content_filter"` | Provider's safety filter engaged |
| `None` | Backend didn't report a finish reason |

## Retry

All provider backends include automatic retry with exponential backoff for transient provider errors. Each backend ships sensible defaults (3 retries, 1 s base delay, jitter enabled). You can customize via `RetryConfig`:

```python
from agenticapi.runtime.llm import AnthropicBackend
from agenticapi.runtime.llm.retry import RetryConfig

backend = AnthropicBackend(
    retry=RetryConfig(
        max_retries=5,
        base_delay_seconds=0.5,
        max_delay_seconds=60.0,
        jitter=True,
    ),
)
```

`RetryConfig` fields:

| Field | Default | Purpose |
| --- | --- | --- |
| `max_retries` | `3` | Maximum retry attempts (`0` = no retries) |
| `base_delay_seconds` | `1.0` | Initial delay before the first retry |
| `max_delay_seconds` | `30.0` | Upper bound on the delay |
| `jitter` | `True` | Add randomness to prevent thundering herd |
| `retryable_exceptions` | Provider-specific | Exception types that trigger a retry |
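
The delay schedule follows the usual capped exponential-backoff shape. A sketch of the arithmetic (illustrative only; the library's exact formula, including how jitter is applied, is an assumption here):

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0, jitter: bool = True) -> float:
    """Delay before 0-based retry `attempt`: base * 2**attempt, capped at `cap`."""
    delay = min(base * (2 ** attempt), cap)
    if jitter:
        delay *= random.uniform(0.5, 1.5)  # hypothetical jitter window
    return delay

# With the defaults, retries wait roughly 1 s, 2 s, and 4 s.
```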

## Custom Backend

Any class matching the `LLMBackend` protocol works without inheriting from any AgenticAPI base class:

```python
from collections.abc import AsyncIterator

from agenticapi.runtime.llm.base import LLMChunk, LLMPrompt, LLMResponse

class MyCustomBackend:
    async def generate(self, prompt: LLMPrompt) -> LLMResponse: ...
    async def generate_stream(self, prompt: LLMPrompt) -> AsyncIterator[LLMChunk]: ...
    @property
    def model_name(self) -> str: ...
```

To support native function calling, populate `LLMResponse.tool_calls` and `LLMResponse.finish_reason` from your provider's response format.
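
For illustration, a complete toy backend satisfying the protocol (a sketch; it assumes `content` is the only required constructor argument on `LLMResponse` and `LLMChunk`):

```python
from collections.abc import AsyncIterator

from agenticapi.runtime.llm.base import LLMChunk, LLMPrompt, LLMResponse

class EchoBackend:
    """Toy LLMBackend that echoes the last message back."""

    async def generate(self, prompt: LLMPrompt) -> LLMResponse:
        last = prompt.messages[-1].content if prompt.messages else ""
        return LLMResponse(content=f"echo: {last}")

    async def generate_stream(self, prompt: LLMPrompt) -> AsyncIterator[LLMChunk]:
        response = await self.generate(prompt)
        for word in response.content.split():  # stream word by word
            yield LLMChunk(content=word + " ")

    @property
    def model_name(self) -> str:
        return "echo-v0"
```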

## MockBackend for Testing

```python
from agenticapi.runtime.llm.base import ToolCall
from agenticapi.runtime.llm.mock import MockBackend

backend = MockBackend(responses=["SELECT COUNT(*) FROM orders", "result = 42"])
response = await backend.generate(prompt)  # `prompt` is any LLMPrompt (see Usage)
assert response.content == "SELECT COUNT(*) FROM orders"
assert backend.call_count == 1

# Queue a native tool-call response for the next tools-enabled request:
backend.add_tool_call_response([
    ToolCall(id="call_1", name="count_orders", arguments={"status": "shipped"}),
])
```

When `MockBackend.generate()` receives a prompt with tools and a tool-call response is queued, it returns the queued `ToolCall` objects with `finish_reason="tool_calls"`. When no tool-call response is queued, it falls back to the next text response. If `tool_choice="required"`, it synthesises a call to the first declared tool even when no response is queued.
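
A quick check of the `"required"` fallback (assuming `responses` may be left empty):

```python
from agenticapi.runtime.llm.base import LLMMessage, LLMPrompt
from agenticapi.runtime.llm.mock import MockBackend

backend = MockBackend(responses=[])
response = await backend.generate(LLMPrompt(
    system="You answer questions about orders.",
    messages=[LLMMessage(role="user", content="How many shipped orders?")],
    tools=[{
        "name": "count_orders",
        "description": "Count orders by status.",
        "parameters": {"type": "object", "properties": {"status": {"type": "string"}}},
    }],
    tool_choice="required",
))
assert response.finish_reason == "tool_calls"
assert response.tool_calls[0].name == "count_orders"  # synthesised call to the first tool
```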

## Multi-Turn Tool Conversations

For multi-turn tool conversations (e.g. the agentic loop), `LLMMessage` carries two optional fields, `tool_calls` and `tool_call_id`, for provider-native format translation:

```python
from agenticapi.runtime.llm.base import LLMMessage, ToolCall

# Assistant message with tool calls
assistant_msg = LLMMessage(
    role="assistant",
    content="Let me calculate that.",
    tool_calls=[ToolCall(id="call_1", name="calc", arguments={"expr": "7*6"})],
)

# Tool result linked back to the call
tool_msg = LLMMessage(
    role="tool",
    content='{"result": 42}',
    tool_call_id="call_1",
)
```

Each backend translates these into provider-native format:

| Provider | Assistant Tool Calls | Tool Results |
| --- | --- | --- |
| Anthropic | `tool_use` content blocks with `id`, `name`, `input` | user message with `tool_result` block keyed by `tool_use_id` |
| OpenAI | `tool_calls` array with `function` objects (JSON-encoded args) | `tool` role with `tool_call_id` |
| Gemini | `function_call` Parts on the model message | `function_response` Parts on the user message (name resolved to the actual function name) |

The agentic loop (`run_agentic_loop()` and `run_agentic_loop_streaming()`) automatically populates these fields on every iteration.
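
If you drive the conversation by hand instead, the follow-up prompt simply appends both messages to the history (a sketch reusing `assistant_msg` and `tool_msg` from above):

```python
from agenticapi.runtime.llm.base import LLMPrompt

followup = LLMPrompt(
    system="You are a calculator assistant.",
    messages=[
        LLMMessage(role="user", content="What is 7*6?"),
        assistant_msg,  # the model's tool-call turn
        tool_msg,       # the tool result, linked by tool_call_id
    ],
)
final = await backend.generate(followup)
print(final.content)  # e.g. "7 * 6 = 42"
```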

## Integration Testing with Real Providers

Integration tests verify end-to-end tool calling against real APIs:

```bash
# Run when API keys are available (skipped otherwise)
ANTHROPIC_API_KEY=sk-... OPENAI_API_KEY=sk-... GOOGLE_API_KEY=... \
  uv run pytest tests/integration/llm/ -v --timeout=60
```

Each test sends a calculator tool definition, asserts the LLM calls the tool, sends the result back, and asserts the final answer contains "42".
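
The shape of such a test, sketched with hypothetical names (the real tests under tests/integration/llm/ may differ in detail):

```python
import os

import pytest

from agenticapi.runtime.llm import AnthropicBackend
from agenticapi.runtime.llm.base import LLMMessage, LLMPrompt

CALC_TOOL = {
    "name": "calc",
    "description": "Evaluate an arithmetic expression.",
    "parameters": {
        "type": "object",
        "properties": {"expr": {"type": "string"}},
        "required": ["expr"],
    },
}

@pytest.mark.skipif("ANTHROPIC_API_KEY" not in os.environ, reason="no API key")
async def test_tool_round_trip():  # assumes an async-capable pytest plugin
    backend = AnthropicBackend()
    messages = [LLMMessage(role="user", content="What is 7*6?")]
    first = await backend.generate(
        LLMPrompt(system="Use the calc tool.", messages=messages, tools=[CALC_TOOL])
    )
    assert first.finish_reason == "tool_calls"
    call = first.tool_calls[0]

    # Feed the tool result back and ask for the final answer.
    messages += [
        LLMMessage(role="assistant", content=first.content or "", tool_calls=first.tool_calls),
        LLMMessage(role="tool", content='{"result": 42}', tool_call_id=call.id),
    ]
    final = await backend.generate(
        LLMPrompt(system="Use the calc tool.", messages=messages, tools=[CALC_TOOL])
    )
    assert "42" in final.content
```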