Policies¶
Policy (Base)¶
Policy ¶
Bases: BaseModel
Base class for all harness policies.
Subclasses implement evaluate() to check generated code against their specific constraints. Policies are pure computation (sync, no I/O) and must be deterministic for a given input.
Example
class MyPolicy(Policy): max_lines: int = 100
def evaluate(self, *, code: str, **kwargs: Any) -> PolicyResult:
lines = code.count("\n") + 1
if lines > self.max_lines:
return PolicyResult(
allowed=False,
violations=[f"Code has {lines} lines, max is {self.max_lines}"],
policy_name="MyPolicy",
)
return PolicyResult(allowed=True, policy_name="MyPolicy")
Source code in src/agenticapi/harness/policy/base.py
31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 | |
evaluate ¶
evaluate(
*,
code: str,
intent_action: str = "",
intent_domain: str = "",
**kwargs: Any,
) -> PolicyResult
Evaluate generated code against this policy.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
code
|
str
|
The generated Python source code to evaluate. |
required |
intent_action
|
str
|
The classified action type (read, write, etc.). |
''
|
intent_domain
|
str
|
The domain of the request (order, product, etc.). |
''
|
**kwargs
|
Any
|
Additional context for evaluation. |
{}
|
Returns:
| Type | Description |
|---|---|
PolicyResult
|
PolicyResult indicating whether the code is allowed. |
Source code in src/agenticapi/harness/policy/base.py
evaluate_intent_text ¶
evaluate_intent_text(
*,
intent_text: str,
intent_action: str = "",
intent_domain: str = "",
**kwargs: Any,
) -> PolicyResult
Evaluate raw user intent text before it reaches the LLM.
Called by the framework before the LLM fires, so policies can block prompt injection, PII, or other unsafe content at the earliest possible point. Policies whose domain is generated code leave the default allow-everything implementation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
intent_text
|
str
|
The raw natural-language string from the request. |
required |
intent_action
|
str
|
The classified intent action, if available. |
''
|
intent_domain
|
str
|
The classified intent domain, if available. |
''
|
**kwargs
|
Any
|
Additional context for evaluation. |
{}
|
Returns:
| Type | Description |
|---|---|
PolicyResult
|
class: |
Source code in src/agenticapi/harness/policy/base.py
evaluate_tool_call ¶
evaluate_tool_call(
*,
tool_name: str,
arguments: dict[str, Any],
intent_action: str = "",
intent_domain: str = "",
**kwargs: Any,
) -> PolicyResult
Evaluate a direct tool call against this policy (Phase E4).
The harness's tool-first execution path skips code
generation entirely when the LLM returns a structured
function call. In that case there's no generated code to run
through :meth:evaluate; instead, every registered policy is
asked whether the call itself — identified by the tool's
name plus the keyword arguments the model produced — is
allowed.
Subclasses override this hook to enforce constraints at the
tool-call boundary. CodePolicy uses the default
allow-everything behaviour (its domain is AST analysis of
generated code, not tool arguments). DataPolicy uses it
to block DDL tool names (drop_*, truncate_*) and
argument values that match restricted tables.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
tool_name
|
str
|
The name of the tool the model wants to call. |
required |
arguments
|
dict[str, Any]
|
The keyword arguments the model produced for the tool. Always a dict. |
required |
intent_action
|
str
|
The classified intent action. Available for rules that care about read/write/destructive. |
''
|
intent_domain
|
str
|
The classified intent domain. |
''
|
**kwargs
|
Any
|
Additional context for evaluation. |
{}
|
Returns:
| Type | Description |
|---|---|
PolicyResult
|
class: |
PolicyResult
|
every tool call. Subclasses narrow as needed. |
Source code in src/agenticapi/harness/policy/base.py
PolicyResult¶
PolicyResult ¶
Bases: BaseModel
Result of a policy evaluation.
Attributes:
| Name | Type | Description |
|---|---|---|
allowed |
bool
|
Whether the code is allowed under this policy. |
violations |
list[str]
|
List of violation descriptions if not allowed. |
warnings |
list[str]
|
List of non-blocking warnings. |
policy_name |
str
|
Name of the policy that produced this result. |
Source code in src/agenticapi/harness/policy/base.py
CodePolicy¶
CodePolicy ¶
Bases: Policy
Policy that validates generated code against module and pattern restrictions.
Uses AST parsing to detect dangerous patterns such as forbidden imports, eval/exec usage, dynamic imports, and network access.
Attributes:
| Name | Type | Description |
|---|---|---|
allowed_modules |
list[str]
|
Whitelist of allowed modules (empty = no whitelist filtering). |
denied_modules |
list[str]
|
Blacklist of denied modules. |
max_code_lines |
int
|
Maximum number of lines allowed in generated code. |
deny_eval_exec |
bool
|
Whether to deny eval() and exec() calls. |
deny_dynamic_import |
bool
|
Whether to deny import() calls. |
allow_network |
bool
|
Whether to allow network-related modules. |
allowed_hosts |
list[str]
|
Whitelist of allowed hosts (unused in static analysis). |
Source code in src/agenticapi/harness/policy/code_policy.py
39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 | |
evaluate ¶
Evaluate generated code against code restrictions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
code
|
str
|
The generated Python source code. |
required |
**kwargs
|
Any
|
Additional context (ignored). |
{}
|
Returns:
| Type | Description |
|---|---|
PolicyResult
|
PolicyResult with any violations found. |
Source code in src/agenticapi/harness/policy/code_policy.py
DataPolicy¶
DataPolicy ¶
Bases: Policy
Policy that validates SQL and data access patterns in generated code.
Enforces table-level access controls, column restrictions, and DDL prevention through regex-based SQL pattern detection.
Attributes:
| Name | Type | Description |
|---|---|---|
readable_tables |
list[str]
|
Tables allowed for SELECT queries (empty = all allowed). |
writable_tables |
list[str]
|
Tables allowed for INSERT/UPDATE/DELETE (empty = all allowed). |
restricted_columns |
list[str]
|
Column references to deny, e.g. ["users.password_hash"]. |
max_query_duration_ms |
int
|
Maximum allowed query duration hint. |
max_result_rows |
int
|
Maximum result rows hint. |
deny_ddl |
bool
|
Whether to deny DDL statements (DROP, ALTER, CREATE, TRUNCATE). |
Source code in src/agenticapi/harness/policy/data_policy.py
32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 | |
evaluate ¶
Evaluate generated code for data access policy violations.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
code
|
str
|
The generated Python source code containing SQL. |
required |
**kwargs
|
Any
|
Additional context (ignored). |
{}
|
Returns:
| Type | Description |
|---|---|
PolicyResult
|
PolicyResult with any violations found. |
Source code in src/agenticapi/harness/policy/data_policy.py
evaluate_tool_call ¶
evaluate_tool_call(
*,
tool_name: str,
arguments: dict[str, Any],
intent_action: str = "",
intent_domain: str = "",
**kwargs: Any,
) -> PolicyResult
Block destructive tool calls and forbidden tables (Phase E4).
Enforcement layers:
deny_ddl=Trueblocks any tool whose name starts withdrop_,truncate_,alter_, orcreate_tableregardless of arguments. This is the "stop someone from exposing adrop_tabletool" safety net.- When
readable_tablesis set and the call looks like a read (intent_action in {"read","search","aggregate"}or the argument dict contains atablekey), the table name is checked against the whitelist. - When
writable_tablesis set and the argument dict contains atablekey (common for tool shapes likeinsert_row(table=..., row=...)), the table name is checked against the write whitelist. restricted_columnsmatches<table>.<column>in any argument value that's a string (handy for free-form SQL passed as a parameter).
The default shape (no lists configured) is permissive except for the DDL name check, so turning on DataPolicy for a tool-first endpoint does not silently break unrelated tools.
Source code in src/agenticapi/harness/policy/data_policy.py
81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 | |
ResourcePolicy¶
ResourcePolicy ¶
Bases: Policy
Policy that enforces resource limits on generated code.
Primarily stores resource limits for sandbox enforcement, but also performs basic static checks for obviously resource-intensive patterns.
Attributes:
| Name | Type | Description |
|---|---|---|
max_cpu_seconds |
float
|
Maximum CPU time in seconds. |
max_memory_mb |
int
|
Maximum memory usage in megabytes. |
max_execution_time_seconds |
float
|
Maximum wall-clock execution time. |
max_concurrent_operations |
int
|
Maximum concurrent operations. |
max_cost_per_request_usd |
float
|
Maximum estimated cost per request. |
Source code in src/agenticapi/harness/policy/resource_policy.py
evaluate ¶
Evaluate generated code for resource-intensive patterns.
Performs basic static analysis to detect obviously expensive operations like deeply nested loops or very large ranges.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
code
|
str
|
The generated Python source code. |
required |
**kwargs
|
Any
|
Additional context (ignored). |
{}
|
Returns:
| Type | Description |
|---|---|
PolicyResult
|
PolicyResult with any violations or warnings found. |
Source code in src/agenticapi/harness/policy/resource_policy.py
RuntimePolicy¶
RuntimePolicy ¶
Bases: Policy
Dynamic policy evaluation for runtime constraints.
Checks code complexity via AST node count and enforces configurable limits. Future versions will integrate with middleware for rate limiting and authentication.
Example
policy = RuntimePolicy(max_code_complexity=500) result = policy.evaluate(code="x = 1") assert result.allowed is True
Source code in src/agenticapi/harness/policy/runtime_policy.py
evaluate ¶
evaluate(
*,
code: str,
intent_action: str = "",
intent_domain: str = "",
**kwargs: Any,
) -> PolicyResult
Evaluate runtime constraints on the generated code.
Checks: - Code complexity via AST node count - Code length (line count)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
code
|
str
|
The generated Python source code. |
required |
intent_action
|
str
|
The classified action type. |
''
|
intent_domain
|
str
|
The domain of the request. |
''
|
**kwargs
|
Any
|
Additional context. |
{}
|
Returns:
| Type | Description |
|---|---|
PolicyResult
|
PolicyResult indicating whether the code passes runtime checks. |
Source code in src/agenticapi/harness/policy/runtime_policy.py
BudgetPolicy¶
Cost-governance primitive with per-request, per-session, per-user-per-day, and per-endpoint-per-day scopes.
In the current implementation, the real budget logic lives in estimate_and_enforce(...) and record_actual(...). See the guide for the current explicit integration pattern.
See the Cost Budgeting guide for usage patterns.
BudgetPolicy ¶
Bases: Policy
Cost-budget enforcement policy.
Composes with the rest of the harness via :class:PolicyEvaluator,
but its real entry point is :meth:estimate_and_enforce, which
the framework calls before the LLM call fires. The standard
:meth:Policy.evaluate hook is also implemented (returning a
no-op result) so existing harness pipelines that pass code-only
policies through PolicyEvaluator continue to compose.
Example
from agenticapi import BudgetPolicy from agenticapi.harness.policy.pricing import PricingRegistry
pricing = PricingRegistry.default() budget = BudgetPolicy( pricing=pricing, max_per_request_usd=0.50, max_per_session_usd=5.00, max_per_user_per_day_usd=50.00, ) harness = HarnessEngine(policies=[budget, ...])
Source code in src/agenticapi/harness/policy/budget_policy.py
157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 | |
evaluate ¶
evaluate(
*,
code: str,
intent_action: str = "",
intent_domain: str = "",
**kwargs: Any,
) -> PolicyResult
No-op for code-level evaluation.
BudgetPolicy operates on the LLM-call boundary via
:meth:estimate_and_enforce and :meth:record_actual,
not on the generated code itself. This stub keeps it
composable with the existing :class:PolicyEvaluator.
Source code in src/agenticapi/harness/policy/budget_policy.py
estimate_and_enforce ¶
Estimate cost for the upcoming LLM call and enforce all budgets.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
ctx
|
BudgetEvaluationContext
|
Per-request context with the active session/user/endpoint and the model + token estimates. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
The |
CostEstimate
|
class: |
Raises:
| Type | Description |
|---|---|
BudgetExceeded
|
If any configured budget would be breached. |
Source code in src/agenticapi/harness/policy/budget_policy.py
record_actual ¶
record_actual(
ctx: BudgetEvaluationContext,
*,
actual_input_tokens: int,
actual_output_tokens: int,
) -> float
Record the actual spend after the LLM call returns.
The pre-call estimate uses the worst-case max_output_tokens;
the post-call reconciliation replaces that estimate with the
actual usage so the running totals reflect what really happened.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
ctx
|
BudgetEvaluationContext
|
The same context used for the pre-call estimate. |
required |
actual_input_tokens
|
int
|
Real input tokens reported by the LLM. |
required |
actual_output_tokens
|
int
|
Real output tokens reported by the LLM. |
required |
Returns:
| Type | Description |
|---|---|
float
|
The actual cost in USD that was added to the running totals. |
Source code in src/agenticapi/harness/policy/budget_policy.py
current_spend ¶
Return the running spend in USD for a given scope/key.
BudgetEvaluationContext
dataclass
¶
Per-request context that callers pass into :meth:BudgetPolicy.estimate_and_enforce.
Attributes:
| Name | Type | Description |
|---|---|---|
endpoint_name |
str
|
Endpoint receiving the call. |
session_id |
str | None
|
Optional session identifier. |
user_id |
str | None
|
Optional authenticated user identifier. |
model |
str
|
LLM model identifier. |
input_tokens |
int
|
Estimated prompt token count. |
max_output_tokens |
int
|
Cap on output tokens (used for the estimate). |
Source code in src/agenticapi/harness/policy/budget_policy.py
CostEstimate
dataclass
¶
Result of a pre-call cost estimate.
Attributes:
| Name | Type | Description |
|---|---|---|
model |
str
|
Model the estimate was made for. |
estimated_input_tokens |
int
|
Token count fed into the estimate. |
estimated_output_tokens |
int
|
Worst-case output token count. |
estimated_cost_usd |
float
|
Computed worst-case cost. |
Source code in src/agenticapi/harness/policy/budget_policy.py
SpendStore ¶
Bases: Protocol
Protocol for the running-spend tracker.
Implementations may be in-memory, Redis-backed, database-backed, or anything else. The protocol is intentionally tiny so that high-traffic deployments can swap in a sharded backend without touching the policy.
Source code in src/agenticapi/harness/policy/budget_policy.py
InMemorySpendStore ¶
Process-local :class:SpendStore keyed by scope/key/day.
Day-scoped totals key off (scope, key, isoformat(day)) so the
same store cleanly handles per-day budgets without rollover code.
Other scopes ignore the day component.
Source code in src/agenticapi/harness/policy/budget_policy.py
PricingRegistry¶
Per-1k-token pricing table with a factory that ships the April 2026 public-price snapshot. Accepts overrides for custom or fine-tuned models.
PricingRegistry ¶
Mutable registry of model → pricing.
Example
pricing = PricingRegistry.default()
Override with negotiated contract pricing:¶
pricing.set("claude-sonnet-4-6", input_usd_per_1k=2.40, output_usd_per_1k=12.00) cost = pricing.estimate_cost( model="claude-sonnet-4-6", input_tokens=1500, output_tokens=400, )
cost == (1500 * 2.40 + 400 * 12.00) / 1000 == $8.40¶
Source code in src/agenticapi/harness/policy/pricing.py
68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 | |
__init__ ¶
Initialize the registry.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
prices
|
dict[str, ModelPricing] | None
|
Optional initial pricing map. If omitted, the
registry starts empty (use :meth: |
None
|
Source code in src/agenticapi/harness/policy/pricing.py
default
classmethod
¶
set ¶
set(
model: str,
*,
input_usd_per_1k: float,
output_usd_per_1k: float,
cache_read_usd_per_1k: float | None = None,
cache_write_usd_per_1k: float | None = None,
) -> None
Register or override pricing for a single model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
model
|
str
|
Model identifier as reported by the LLM backend. |
required |
input_usd_per_1k
|
float
|
USD per 1 000 input tokens. |
required |
output_usd_per_1k
|
float
|
USD per 1 000 output tokens. |
required |
cache_read_usd_per_1k
|
float | None
|
Optional cache-read price. |
None
|
cache_write_usd_per_1k
|
float | None
|
Optional cache-write price. |
None
|
Source code in src/agenticapi/harness/policy/pricing.py
get ¶
estimate_cost ¶
estimate_cost(
*,
model: str,
input_tokens: int,
output_tokens: int,
cache_read_tokens: int = 0,
cache_write_tokens: int = 0,
) -> float
Estimate USD cost for a single LLM call.
Unknown models cost 0.0 (with a warning) so the framework
degrades gracefully on a fresh model rather than raising — a
production deployment can opt into strict mode by checking
get(model) is None first.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
model
|
str
|
Model identifier. |
required |
input_tokens
|
int
|
Number of prompt tokens. |
required |
output_tokens
|
int
|
Number of completion tokens. |
required |
cache_read_tokens
|
int
|
Optional cache-read token count. |
0
|
cache_write_tokens
|
int
|
Optional cache-write token count. |
0
|
Returns:
| Type | Description |
|---|---|
float
|
Estimated cost in USD. |
Source code in src/agenticapi/harness/policy/pricing.py
known_models ¶
ModelPricing
dataclass
¶
Per-1k-token pricing for a single LLM model.
Attributes:
| Name | Type | Description |
|---|---|---|
input_usd_per_1k |
float
|
Cost per 1 000 prompt (input) tokens. |
output_usd_per_1k |
float
|
Cost per 1 000 completion (output) tokens. |
cache_read_usd_per_1k |
float | None
|
Optional cost per 1 000 cache-read tokens. When None, treated as equal to input_usd_per_1k. |
cache_write_usd_per_1k |
float | None
|
Optional cost per 1 000 cache-write tokens. When None, treated as equal to input_usd_per_1k. |