Playground API
The Playground API provides sandbox evaluation and test case management. All endpoints are prefixed with /api/v1.
Sandbox Evaluation
POST /api/v1/playground/evaluate
Evaluate sample code or a scenario against an inline rule definition. No audit log, no cache, no persistence.
The endpoint accepts two input modes: code (via sample_code) and scenario (via sample_facts). Each mode triggers a different evaluation prompt optimized for that input type.
Code evaluation request:
```json
{
  "rule_statement": "All public functions must have type hints on parameters and return values",
  "rule_modality": "MUST",
  "rule_severity": "MEDIUM",
  "sample_code": "def greet(name):\n return f'Hello, {name}!'"
}
```
Scenario evaluation request:
```json
{
  "rule_statement": "Monthly overtime must not exceed 45 hours without prior 36-agreement filing",
  "rule_modality": "MUST_NOT",
  "rule_severity": "HIGH",
  "sample_facts": {
    "narrative": "Employee John submitted 52 hours of overtime for April 2026. No 36-agreement has been filed.",
    "employee_id": "E001",
    "overtime_hours": "52",
    "month": "2026-04",
    "agreement_filed": "false"
  }
}
```
| Field | Type | Required | Description |
|---|---|---|---|
| rule_statement | string | yes | The natural-language rule to evaluate against |
| rule_modality | string | no | MUST, MUST_NOT, SHOULD, MAY, or INFO (default: MUST) |
| rule_severity | string | no | LOW, MEDIUM, HIGH, or CRITICAL (default: MEDIUM) |
| sample_code | string | no | Code snippet or unified diff (triggers the code evaluation prompt) |
| sample_facts | object | no | Key-value facts with an optional narrative key (triggers the scenario evaluation prompt) |
If both sample_code and sample_facts are provided, sample_code takes precedence. If neither is provided, the evaluation runs against an empty context.
Response:
```json
{
  "verdict": "DENY",
  "confidence": 0.95,
  "reasoning": "The employee's 52 overtime hours exceed the 45-hour limit, and no 36-agreement has been filed.",
  "issue_description": "Overtime of 52 hours exceeds the 45-hour limit without a 36-agreement.",
  "fix_suggestion": "File a 36-agreement before registering overtime above 45 hours, or reduce overtime to 45 hours or fewer.",
  "locations": []
}
```
| Field | Type | Description |
|---|---|---|
| verdict | string | ALLOW, DENY, or NEEDS_CONFIRMATION |
| confidence | float | Model confidence (0.0–1.0) |
| reasoning | string | Explanation of the verdict |
| issue_description | string | What's wrong (empty string if the verdict is ALLOW) |
| fix_suggestion | string or null | Suggested fix if the verdict is DENY or NEEDS_CONFIRMATION |
| locations | array | Code locations relevant to the verdict (typically empty for scenario evaluations) |
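A minimal client sketch for this endpoint, using Python's requests library. The base URL is an assumption for a local deployment; scenario mode works the same way, with sample_facts in place of sample_code.

```python
import requests

BASE_URL = "http://localhost:8000"  # assumed local deployment; adjust as needed

payload = {
    "rule_statement": "All public functions must have type hints on parameters and return values",
    "rule_modality": "MUST",
    "rule_severity": "MEDIUM",
    "sample_code": "def greet(name):\n return f'Hello, {name}!'",
}

resp = requests.post(f"{BASE_URL}/api/v1/playground/evaluate", json=payload)
resp.raise_for_status()
result = resp.json()

# The sample lacks type hints, so a DENY verdict is expected here.
print(result["verdict"], result["confidence"])
if result["verdict"] != "ALLOW":
    print(result["issue_description"])
    print(result["fix_suggestion"])
```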
Test Cases
POST /api/v1/rules/{rule_id}/test-cases
Create a test case for a rule.
Request:
```json
{
  "name": "Missing type hints",
  "sample_input": "def add(a, b):\n return a + b",
  "input_type": "code",
  "expected_verdict": "DENY"
}
```
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | yes | Human-readable label |
| sample_input | string | yes | Code snippet or scenario text |
| input_type | string | no | "code" (default) or "facts"; determines how the test runner evaluates the input |
| expected_verdict | string | yes | ALLOW or DENY |
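A sketch of creating a test case, under the same assumptions as above (local base URL, hypothetical rule id):

```python
import requests

BASE_URL = "http://localhost:8000"  # assumed local deployment
RULE_ID = "rule-123"                # hypothetical rule id

test_case = {
    "name": "Missing type hints",
    "sample_input": "def add(a, b):\n return a + b",
    "input_type": "code",
    "expected_verdict": "DENY",
}

resp = requests.post(f"{BASE_URL}/api/v1/rules/{RULE_ID}/test-cases", json=test_case)
resp.raise_for_status()
print(resp.json())  # assumes the created test case is echoed back
```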
GET /api/v1/rules/{rule_id}/test-cases
List all test cases for a rule. Returns an array of test case objects.
DELETE /api/v1/rules/{rule_id}/test-cases/{test_case_id}
Delete a specific test case.
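Listing and deleting compose naturally. The sketch below assumes each listed object carries the id and name fields shown in the run response; the cleanup filter is purely illustrative.

```python
import requests

BASE_URL = "http://localhost:8000"  # assumed local deployment
RULE_ID = "rule-123"                # hypothetical rule id

# List all test cases for the rule.
cases = requests.get(f"{BASE_URL}/api/v1/rules/{RULE_ID}/test-cases").json()

# Illustrative cleanup: delete cases flagged as drafts by name.
for case in cases:
    if case["name"].startswith("draft:"):
        url = f"{BASE_URL}/api/v1/rules/{RULE_ID}/test-cases/{case['id']}"
        requests.delete(url).raise_for_status()
```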
POST /api/v1/rules/{rule_id}/test-cases/run
Run all test cases for a rule through sandbox evaluation. The input_type on each test case determines whether the input is treated as code or facts. Returns per-case results and an overall pass rate.
Response:
```json
{
  "total": 5,
  "passing": 4,
  "failing": 1,
  "results": [
    {
      "id": "tc-001",
      "name": "Missing type hints",
      "sample_input": "def add(a, b):\n return a + b",
      "input_type": "code",
      "expected_verdict": "DENY",
      "last_result": "DENY",
      "passing": true,
      "last_run_at": "2026-04-29T10:30:00Z"
    },
    {
      "id": "tc-002",
      "name": "Fully typed function",
      "sample_input": "def add(a: int, b: int) -> int:\n return a + b",
      "input_type": "code",
      "expected_verdict": "ALLOW",
      "last_result": "NEEDS_CONFIRMATION",
      "passing": false,
      "last_run_at": "2026-04-29T10:30:01Z"
    }
  ]
}
```
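A sketch that runs the suite and prints a compact report, same assumptions as above:

```python
import requests

BASE_URL = "http://localhost:8000"  # assumed local deployment
RULE_ID = "rule-123"                # hypothetical rule id

resp = requests.post(f"{BASE_URL}/api/v1/rules/{RULE_ID}/test-cases/run")
resp.raise_for_status()
report = resp.json()

print(f"{report['passing']}/{report['total']} passing")
for r in report["results"]:
    status = "PASS" if r["passing"] else "FAIL"
    print(f"{status}  {r['name']}: expected {r['expected_verdict']}, got {r['last_result']}")
```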
POST /api/v1/rules/{rule_id}/test-cases/generate
Generate test cases using Gemini. The LLM analyzes the rule statement and produces synthetic test cases that cover compliant and non-compliant scenarios.
Request:
```json
{
  "count": 6
}
```
Response: an array of generated test case objects (persisted to the database). Each case has a name, sample_input, input_type, and expected_verdict.
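A sketch of triggering generation and summarizing the result, same assumptions as above:

```python
import requests

BASE_URL = "http://localhost:8000"  # assumed local deployment
RULE_ID = "rule-123"                # hypothetical rule id

resp = requests.post(
    f"{BASE_URL}/api/v1/rules/{RULE_ID}/test-cases/generate",
    json={"count": 6},
)
resp.raise_for_status()

# Generated cases are already persisted server-side; just summarize them.
for case in resp.json():
    print(case["expected_verdict"], case["name"])
```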
See Also
- Rule Playground Architecture – design overview
- Evaluate API – production evaluation endpoint