We built governed multi-agent teams months before Anthropic announced
Multi-agent governance with cost checkpoints, but orchestration is table stakes for GenAI now.
AI agent governance framework with audit trails, safety checks, and replayable decision making for autonomous systems
Verifiable decision replay for autonomous systems, but execution complexity limits adoption beyond safety-critical domains.
AI/autonomous systems engineers, security teams building agent frameworks
OpenAI's governance work (early research) · Anthropic's Constitutional AI frameworks
The core idea is that every action must pass through a verifiable execution boundary that produces a replayable evidence bundle.
Pipeline:
Gateway → ActionIntent → PolicyDecisionPoint → SafetyGate → Approval Workflow → ExecutorPlugin → ExecutionTrace → ExecutionProofBundle
This allows you to deterministically replay a decision and verify exactly why an action was allowed or denied.
The repo includes a restrained-autonomy demo where an operator vetoes an action and the system produces a deterministic audit stream.
I'm particularly interested in feedback from people working on agent infrastructure, security, and safety systems.
Multi-agent governance with cost checkpoints, but orchestration is table stakes for GenAI now.
First public implementation of DeepMind delegation paper. Tested on Zork with governance that blocked 'attack'.
Intercepts tool calls before execution to block dangerous actions like DB deletes.
Ambitious architecture: persistent hyperdimensional memory (QAIS), deterministic paragraph retrieval (SLA), and a dual-operator Save Protocol that forces human+AI agreement before writes. The practical touches — a React control panel, an AutoHotkey clipboard bridge/counter, and one-command installer — show someone built this to be used, not just theorized. Platform lock (Windows + AHK) and a README heavy on terminology mean it's exciting for niche adopters but not yet plug-and-play for broader audiences.
Fail-closed policy layer blocks LLM tool calls before execution, no LLM in decision path.
Four-tier stack with algebraic verification feels over-engineered compared to K9 Audit.