Skip to content

Data Stores

The Rule Repository uses three data stores, each serving a specific role.

PostgreSQL (System of Record)

PostgreSQL 17 holds the canonical data. All writes go through PostgreSQL first. The database contains the following tables:

Table Purpose
rules Rule statements with metadata (modality, severity, scope, tags, effective period, governance fields).
rule_revisions Immutable revision history for every rule change.
rule_relationships Directed relationships between rules (type, source, target).
audit_log Append-only, hash-chained log of all evaluations and extractions. Each row links to the previous via a hash chain column. Updates and deletes are rejected by a database trigger.
documents Uploaded source documents (filename, MIME type, size, storage path).
extractions Results from the extraction pipeline (candidate rules, model ID, review status).
api_keys API key records for authenticated access.
llm_cache Cached LLM responses keyed by hash of inputs, model, and prompt version. Invalidated on rule revision.
enforcement_policies Gateway policies that map webhook events to evaluation rules (event source, type pattern, scope, mode, response actions).
gateway_evaluations Results from gateway webhook evaluations (policy, event, verdict, actions taken).
discovery_scans Rule discovery scan records (status, source types, candidate count).
discovery_candidates Candidate rules proposed by discovery analyzers (pending review).
corrections Human correction feedback entries from the feedback loop.
rule_federations Federation hierarchy nodes (organization, team, project levels).
rule_federation_memberships Rule-to-federation assignments with optional parent override.
rule_test_cases Playground test cases attached to rules (input, expected verdict).
alerts Alerts raised by intelligence workers (dormant, high deny rate, health decline).
rule_health_scores Persisted health score snapshots computed by background workers.
rule_recommendations Automated improvement recommendations for rules.
rule_set_snapshots Immutable versioned captures of the rule corpus.
rule_set_deployments Snapshot-to-environment deployment tracking.
evaluations Persistent per-rule evaluation records for analytics.
draft_rule_proposals Rule proposals auto-drafted from correction clusters (flywheel).
projects Top-level organizational boundary scoping all resources.
proposals Governance proposal lifecycle (create, amend, retire, merge, split, override).
proposal_comments Threaded comments on governance proposals.
notifications User notifications for proposal activity.
agent_profiles Registered agent identities, trust levels, compliance metrics, mastery data.
agent_exception_requests Agent requests for rule exceptions.
agent_negotiations Agent-initiated verdict challenges and negotiations.
governance_sessions Multi-agent governance session tracking.
rule_translations Multilingual rule translations (per-locale statement, rationale, examples).

37 ORM models across 40 Alembic migrations (001–041, skipping 020). Extensions: uuid-ossp and pgcrypto are installed on first start.

Note: The rules table includes maturity_level (experimental/stable/proven), false_positive_count, true_positive_count, classification, subject_kinds, jurisdiction, kind (normative/computational/procedural/definitional/principle), constraints (JSONB for deterministic evaluation), and structured_scope (JSONB with domain/org_unit/subject_type dimensions) columns.

Migrations are managed by Alembic.

Row-Level Security

Classification-based RLS is enabled on rules, evaluations, and audit_log tables. Session variables (app.user_id, app.user_clearance, app.user_departments) must be set before every query via with_user_context(). Classification RLS coexists with tenant-based RLS; both layers must pass for a row to be visible. See ADR 003.

Elasticsearch (Search Index)

Elasticsearch 8.17 provides full-text and vector search over the rule corpus. There is one index:

rules index

The index uses a custom analyzer (rule_analyzer: standard tokenizer with lowercase, stop word, and snowball filters) and the following field mappings:

Field Type Notes
rule_id keyword Matches the PostgreSQL rule ID.
statement text (analyzed) Searchable rule text with a .keyword sub-field.
tags keyword For filtering.
scope keyword For filtering.
modality keyword MUST, MUST_NOT, SHOULD, MAY, INFO.
severity keyword LOW, MEDIUM, HIGH, CRITICAL.
status keyword Rule lifecycle status.
effective_from date Start of effective period.
effective_until date End of effective period.
embedding dense_vector (768 dims, cosine) For semantic similarity search.
rationale text (analyzed) Searchable rationale.
structured_scope object Multi-axis scope (path, dimensions: domain, org_unit, subject_type).
department keyword Owning department.
kind keyword Rule kind (normative, computational, procedural, definitional, principle).
primary_language keyword Primary rule language (en, ja).
applicable_subject_types keyword Subject types this rule applies to.
created_at / updated_at date Timestamps.

The index template is applied by the es-setup container on first start from infra/elasticsearch/rules-index-template.json.

Search modes available:

  • Full-text (BM25): keyword matching on statement and rationale.
  • Vector (kNN): cosine similarity on the 768-dimensional embedding field.
  • Hybrid: combined BM25 + vector scoring.
  • Category: filter-only queries on keyword fields.
  • Context: given a set of facts, find applicable rules.

Neo4j (Relationship Graph)

Neo4j 5 Community stores the directed graph of rule relationships.

Node label

One label: Rule. The id property matches the PostgreSQL rule ID.

Constraints and indexes

Created on first start from infra/neo4j/init.cypher:

  • Uniqueness constraint on Rule.id.
  • Property indexes on Rule.modality, Rule.severity, Rule.status.

Relationship types

Relationship Direction Meaning
REFINES child --> parent A specific rule operationalizes a more abstract one.
OVERRIDES higher --> lower A rule takes precedence over another.
CONFLICTS_WITH bidirectional Two rules that contradict each other.
DEPENDS_ON dependent --> dependency Evaluation requires another rule's verdict.
DERIVES_FROM derived --> source Originates from a higher-level rule (e.g., a law).
SUCCEEDS new --> old A revision that replaces a prior version.
LOCALIZES localized --> canonical A locale-specific version of a canonical rule.

Consistency Model

PostgreSQL is the source of truth for rule existence and metadata. Neo4j is a derived projection of relationships.

  • When a rule or relationship is created or modified through the API, both PostgreSQL and Neo4j are updated in the same service call.
  • If they disagree, PostgreSQL wins.
  • The scripts/reconcile_graph.py script can rebuild Neo4j entirely from PostgreSQL data as a safety net.