Claude Agent SDK Extension — Implementation Plan¶
Note (post-migration). This extension has been merged into the main package as
agentharnessapi[claude-agent-sdk]. Import fromagenticapi.ext.claude_agent_sdk. This document is preserved as historical design rationale.
Status: Shipped (merged into main package as optional extra)
Target version: agentharnessapi[claude-agent-sdk] (was agenticapi-claude-agent-sdk v0.1.0)
Owner: Core team
Author: Implementation plan generated 2026-04-10
1. Goals¶
Build an independently-installable extension package that integrates the Claude Agent SDK into AgenticAPI. The extension turns the SDK's full agentic loop (planning → tool use → reflection → final answer) into a first-class AgenticAPI execution strategy, while preserving AgenticAPI's harness guarantees: policy enforcement, audit trails, approval workflows, and tool registries.
What success looks like¶
A user can:
from agenticapi import AgenticApp, CodePolicy
from agenticapi_claude_agent_sdk import ClaudeAgentRunner, ClaudeAgentSDKBackend
app = AgenticApp(title="claude-sdk-demo")
runner = ClaudeAgentRunner(
system_prompt="You are a helpful coding assistant.",
allowed_tools=["Read", "Glob", "Grep"],
policies=[CodePolicy(denied_modules=["os", "subprocess"])],
)
@app.agent_endpoint(name="assistant", autonomy_level="manual")
async def assistant(intent, context):
return await runner.run(intent=intent, context=context)
…and that single endpoint will run a full Claude Agent SDK session, with
AgenticAPI policies bridged into Claude's permission/hook system, and the
result wrapped into an AgentResponse with execution trace recorded.
Non-goals (this iteration)¶
- Replacing AgenticAPI's
LLMBackendfor the harness's code-generation path. The SDK runs whole agent loops, not single-shot code generation, so wiring it intoHarnessEngine.execute()would force a leaky abstraction. We expose it as a parallel execution path, not a substitute forCodeGenerator. - Multi-turn conversational state via
ClaudeSDKClient. We use the simplerquery()one-shot interface in v0.1; multi-turn comes in v0.2. - Subagent support (
agents=) — listed as a v0.2 follow-up.
2. Why a separate package (not agenticapi[claude-sdk])¶
| Concern | Optional extra | Separate package |
|---|---|---|
| Install size | claude-agent-sdk is ~56 MB; users not using it pay nothing |
Same (only installed if requested) |
| Versioning | Locked to AgenticAPI release cycle | Can iterate independently as the SDK evolves |
| Coupling | Risk of importing SDK types into core | Hard boundary via published API |
| Discoverability | Hidden behind [extras] syntax |
First-class on PyPI |
| Precedent | agenticapi[mcp] (small, stable dep) |
Better fit for large, fast-moving deps |
We pick separate package. The Claude Agent SDK is large and evolving; an independent package lets us follow it without forcing AgenticAPI releases.
Layout in this repo¶
extensions/
agenticapi-claude-agent-sdk/
pyproject.toml # Independent package metadata
README.md
LICENSE # Inherits Apache-2.0
src/agenticapi_claude_agent_sdk/
__init__.py # Public API surface
_imports.py # Lazy-import shim with friendly errors
backend.py # ClaudeAgentSDKBackend (LLMBackend protocol)
runner.py # ClaudeAgentRunner (high-level entry point)
tools.py # Tool bridge (AgenticAPI Tool → SDK MCP tool)
permissions.py # Policy → can_use_tool + hooks bridge
messages.py # SDK message → AgentResponse adapter
options.py # ClaudeAgentOptions builder helpers
exceptions.py # ClaudeAgentSDKError + subclasses
types.py # Re-exported types and small dataclasses
tests/
__init__.py
conftest.py # Stub claude_agent_sdk for offline tests
test_backend.py
test_runner.py
test_tools.py
test_permissions.py
test_messages.py
examples/
01_simple_query.py
02_with_agenticapi_tools.py
03_with_policies.py
README.md
3. Architecture¶
3.1 Component diagram¶
┌────────────────────────────────────────────┐
│ AgenticAPI app code │
└──────────────┬─────────────────────────────┘
│ intent, context
▼
┌────────────────────────────────────────────┐
│ ClaudeAgentRunner │
│ (the high-level orchestrator) │
└──────┬──────┬──────┬──────┬────────────────┘
│ │ │ │
┌──────────────┘ │ │ └──────────────────┐
▼ ▼ ▼ ▼
┌───────────────┐ ┌────────────┐ ┌──────────────┐ ┌────────────────┐
│ Tool bridge │ │ Permission │ │ Options │ │ Message │
│ (AgenticAPI │ │ adapter │ │ builder │ │ adapter │
│ Tool → SDK │ │ (Policies →│ │ │ │ (SDK msgs → │
│ MCP tool) │ │ hooks + │ │ │ │ AgentResponse)│
│ │ │ can_use) │ │ │ │ │
└──────┬────────┘ └──────┬─────┘ └──────┬───────┘ └────────┬───────┘
│ │ │ │
└────────────────────┴──────┬───────┴────────────────────┘
▼
┌────────────────────────────────────────────┐
│ claude_agent_sdk.query(prompt, options) │
└────────────────────────────────────────────┘
3.2 Execution flow¶
runner.run(intent, context)is called.- Runner builds an
ClaudeAgentOptionsvia the options builder that: - merges built-in SDK tools (
allowed_tools) with the AgenticAPI tool bridge, - installs a
can_use_toolcallback derived from AgenticAPI policies, - installs a
PreToolUsehook for static analysis onBash/Write/Edit, - installs a
PostToolUsehook to mirror tool calls into the audit trace, - sets
cwd,env,model,permission_mode,max_turns,system_prompt. - Runner calls
claude_agent_sdk.query(prompt=intent.raw, options=...)and iterates the async stream of messages. - The message adapter collects
AssistantMessagecontent blocks, builds atool_callslist fromToolUseBlocks, captures the finalResultMessage, and produces anAgentResponsewith: result=ResultMessage.result(orstructured_outputif set),reasoning= concatenatedThinkingBlocktext,generated_code= collectedBash/Writetool input snippets,confidence= derived fromis_error/subtype,execution_trace_id= recorded viaHarnessEngine.audit_recorderif a harness is supplied.- Optional: when an
audit_recorderis configured, the runner records a completeExecutionTracefor the SDK session.
3.3 Key design choices and trade-offs¶
| Decision | Rationale |
|---|---|
Use query() (one-shot) not ClaudeSDKClient |
Maps cleanly to one HTTP request → one agent invocation; no session state to manage; trivially cancellable |
| Lazy-import the SDK | Tests and import agenticapi_claude_agent_sdk succeed even when the SDK isn't installed; only ClaudeAgentRunner.run() requires it |
Bridge AgenticAPI policies via can_use_tool AND a PreToolUse hook |
can_use_tool is the documented permission gate, but hooks fire even in bypassPermissions mode and can mutate input — defence in depth |
Wrap AgenticAPI Tool instances as SDK MCP tools via create_sdk_mcp_server |
In-process, no subprocess, no JSON-RPC — minimal overhead and no extra deps |
Run static analysis on Bash/Write/Edit inputs in the hook |
Prevents the SDK from running shell commands that AgenticAPI's CodePolicy.denied_modules would forbid in generated Python |
Don't try to make the SDK a LLMBackend for HarnessEngine |
Single-shot and full-loop are different abstractions. Forcing them to share an interface would muddy both. The runner is a peer, not a replacement |
4. Public API (v0.1)¶
# extensions/agenticapi-claude-agent-sdk/src/agenticapi_claude_agent_sdk/__init__.py
from agenticapi_claude_agent_sdk.backend import ClaudeAgentSDKBackend
from agenticapi_claude_agent_sdk.exceptions import (
ClaudeAgentSDKError,
ClaudeAgentSDKNotInstalledError,
ClaudeAgentSDKRunError,
)
from agenticapi_claude_agent_sdk.messages import AgentSessionResult
from agenticapi_claude_agent_sdk.permissions import HarnessPermissionAdapter
from agenticapi_claude_agent_sdk.runner import ClaudeAgentRunner
from agenticapi_claude_agent_sdk.tools import (
build_sdk_mcp_server_from_registry,
sdk_tool_from_agenticapi_tool,
)
__version__ = "0.1.0"
__all__ = [
"AgentSessionResult",
"ClaudeAgentRunner",
"ClaudeAgentSDKBackend",
"ClaudeAgentSDKError",
"ClaudeAgentSDKNotInstalledError",
"ClaudeAgentSDKRunError",
"HarnessPermissionAdapter",
"__version__",
"build_sdk_mcp_server_from_registry",
"sdk_tool_from_agenticapi_tool",
]
Class signatures¶
class ClaudeAgentRunner:
def __init__(
self,
*,
system_prompt: str | None = None,
model: str | None = None,
allowed_tools: Sequence[str] = (),
disallowed_tools: Sequence[str] = (),
permission_mode: str = "default",
max_turns: int | None = None,
cwd: str | Path | None = None,
env: Mapping[str, str] | None = None,
tool_registry: ToolRegistry | None = None,
policies: Sequence[Policy] = (),
audit_recorder: AuditRecorder | None = None,
approval_workflow: ApprovalWorkflow | None = None,
extra_options: Mapping[str, Any] | None = None,
mcp_server_name: str = "agenticapi",
) -> None: ...
async def run(
self,
*,
intent: Intent,
context: AgentContext,
) -> AgentResponse: ...
async def stream(
self,
*,
intent: Intent,
context: AgentContext,
) -> AsyncIterator[AgentSessionEvent]: ...
class ClaudeAgentSDKBackend:
"""LLMBackend protocol implementation for one-shot text completion.
Wraps query() so the SDK can be used wherever AgenticAPI expects an
LLMBackend (intent parsing, code generation). Does NOT expose the full
agent loop — for that, use ClaudeAgentRunner.
"""
def __init__(
self,
*,
model: str | None = None,
system_prompt: str | None = None,
permission_mode: str = "bypassPermissions", # text-only mode
extra_options: Mapping[str, Any] | None = None,
) -> None: ...
async def generate(self, prompt: LLMPrompt) -> LLMResponse: ...
async def generate_stream(self, prompt: LLMPrompt) -> AsyncIterator[LLMChunk]: ...
@property
def model_name(self) -> str: ...
5. Implementation priorities¶
Implementation is split into phases. Each phase is independently shippable.
Phase A — minimum viable extension (this iteration)¶
- Lazy import shim (
_imports.py) — friendly error if SDK not installed. - Tool bridge (
tools.py) — convertTool→SdkMcpTool, build MCP server. - Permission adapter (
permissions.py) —can_use_tool+PreToolUsehook wired toPolicyEvaluatorandcheck_code_safety. - Message adapter (
messages.py) — collect SDK stream →AgentSessionResult. - Options builder (
options.py) — assembleClaudeAgentOptions. - Runner (
runner.py) — high-levelrun()returningAgentResponse. - LLM backend (
backend.py) — thin wrapper for one-shot use. - Tests — offline tests using a stubbed
claude_agent_sdkmodule. - Example —
01_simple_query.pyand02_with_agenticapi_tools.py. - README — install, usage, links to AgenticAPI docs.
Phase B — follow-up (v0.2)¶
ClaudeSDKClient-backed multi-turnClaudeAgentSessionclass.- Subagent support (
agents=). - Approval workflow integration (raise
ApprovalRequiredfrom a hook). - Streaming
AgentResponsevia StarletteStreamingResponse. - More example apps; e2e test against the real SDK in CI (gated by env var).
Phase C — production polish (v0.3+)¶
- OpenTelemetry spans around each tool call.
- Cost accounting from
ResultMessage.total_cost_usdinto AgenticAPI's audit. - Custom permission UI integration.
- Retry/backoff on
RateLimitEvent.
6. Test strategy¶
The Claude Agent SDK requires the Claude CLI binary and (optionally) network access to Anthropic. We do not make those mandatory for AgenticAPI's CI.
- Unit tests stub
claude_agent_sdkwith a fake module fixture (conftest.py) that emits a deterministic message stream. This covers: ClaudeAgentRunner.run()happy path,- permission denial via
can_use_toolreturningPermissionResultDeny, - hook-based static analysis blocking a
Bashrm -rf /call, - tool bridge converting an AgenticAPI
Tooland invoking it, - error mapping (CLI not found →
ClaudeAgentSDKNotInstalledError). - Integration tests (gated by
RUN_CLAUDE_SDK_INTEGRATION=1andANTHROPIC_API_KEY) run against the real SDK. Skipped by default.
Coverage goal: ≥ 90 % for the extension's src/.
7. Quality gates¶
Mirrors AgenticAPI's CI requirements:
uv run ruff format --check extensions/agenticapi-claude-agent-sdk/
uv run ruff check extensions/agenticapi-claude-agent-sdk/
uv run mypy extensions/agenticapi-claude-agent-sdk/src
uv run pytest extensions/agenticapi-claude-agent-sdk/tests
The extension's pyproject.toml declares agenticapi >=0.1.0 and
claude-agent-sdk >=0.1.58 as runtime dependencies, plus a [dev] extra
mirroring the main repo's tooling versions.
8. Risks and mitigations¶
| Risk | Mitigation |
|---|---|
| Claude Agent SDK API changes | Public types are imported lazily and pinned via >=0.1.58,<0.2; we re-pin per release with a smoke test |
| Users confused about runner vs harness | README has a clear "when to use what" section; the runner accepts an optional audit_recorder so users can still capture traces |
| Policy bridge gaps (SDK has tools we don't model) | Default to deny in can_use_tool for unmodelled tool names when policies are present |
Bash tool bypassing AgenticAPI policies |
The PreToolUse hook on Bash parses the command and rejects denied modules / shell builtins; documented as best-effort, not a kernel sandbox |
| Subprocess invocation overhead | Documented; Phase B will offer a long-lived ClaudeAgentSession reusing one CLI process |
9. Out of scope (and why)¶
- Replacing the harness's code generator with the SDK. Different shapes (loop vs single shot). Forcing them together would invent a third interface that fits neither.
- Container/VM sandboxing of the SDK process. AgenticAPI's
ProcessSandboxisolates its own subprocess; the SDK starts its own CLI process which is out of our isolation boundary. We document this; container isolation is a Phase 2 topic for the AgenticAPI core (harness/sandbox/container.py). - A custom CLI for the extension. The runner is a library, used from a
user's
AgenticApp. No new CLI surface.