Signed receipts for agent actions
Ed25519 signed receipts solve AI agent accountability across org boundaries.
Self‑hosted QA for AI agents: evidence packs, regression diffs, CI gates.
It turns a failing agent run into a self-contained, inspectable package: report.html for human review and compare-report.json for automated CI decisions. The evidence manifest with integrity checks, plus the option to apply redaction before artifacts are written, is what makes offline handoff and automated gating usable for teams building agents.
Who it's for: AI/agent engineers and QA and SRE teams who need reproducible incident handoffs and CI gating.
Problem I kept hitting: building agents is fast, but when something breaks, handing off “one failing run” is messy (screenshots, scattered logs, partial configs, access to a tracing UI, accidental secrets/PII in payloads).
What this does: it runs your agent against a case suite and generates a portable evidence pack you can open offline and attach to a GitHub issue or ticket:
report.html (offline viewer)
compare-report.json (machine-readable summary for CI gating: none | require_approval | block)
evidence files referenced via a manifest (so you can verify completeness/integrity)
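To make the pack contents above concrete, here is a minimal sketch of how a CI step might consume one. The schema is an assumption for illustration only: the `manifest.json` filename, its `files` list with `path` and `sha256` fields, the `gate` key in compare-report.json, and the exit-code mapping are all hypothetical and not taken from the tool's actual format. Only the three gate values (none, require_approval, block) come from the description above.

```python
import hashlib
import json
from pathlib import Path

# Assumed layout (hypothetical, not the tool's real schema):
#   compare-report.json -> {"gate": "none" | "require_approval" | "block"}
#   manifest.json       -> {"files": [{"path": "...", "sha256": "..."}]}
EXIT_CODES = {"none": 0, "require_approval": 2, "block": 1}


def verify_manifest(pack_dir: Path) -> list[str]:
    """Return a list of problems: missing files or hash mismatches."""
    manifest = json.loads((pack_dir / "manifest.json").read_text(encoding="utf-8"))
    problems = []
    for entry in manifest["files"]:
        target = pack_dir / entry["path"]
        if not target.is_file():
            problems.append(f"missing: {entry['path']}")
            continue
        digest = hashlib.sha256(target.read_bytes()).hexdigest()
        if digest != entry["sha256"]:
            problems.append(f"hash mismatch: {entry['path']}")
    return problems


def gate_exit_code(pack_dir: Path) -> int:
    """Map the report's gate value to a process exit code for CI."""
    if verify_manifest(pack_dir):
        return 1  # an incomplete or altered pack is treated as a block
    report = json.loads((pack_dir / "compare-report.json").read_text(encoding="utf-8"))
    # Unknown or missing gate values fail closed.
    return EXIT_CODES.get(report.get("gate"), 1)
```

A CI job could call `sys.exit(gate_exit_code(Path("evidence-pack")))` and treat exit code 2 as "pause for a human approval". Checking the manifest before reading the gate means a pack with missing or altered evidence can never pass silently.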
It’s intentionally self-hosted/local-only: no backend, no accounts, nothing leaves your environment unless you export the pack.
Redaction note: in the “production” pipeline, redaction is applied in the runner before artifacts are written (the agent is not required to support a special header). There’s also a strict mode that scans all manifest-referenced files for residual markers as a safety gate.
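The redact-then-write ordering and the strict-mode gate could look roughly like the sketch below. Everything named here is an assumption for illustration: the regex patterns, the `[REDACTED]` placeholder, the `manifest.json` layout, and the function names are hypothetical, and "residual markers" is interpreted as "secret-like patterns that survived redaction", which may differ from what the real strict mode checks.

```python
import json
import re
from pathlib import Path

# Hypothetical patterns; a real deployment would maintain its own list.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{16,}"),        # API-key-like tokens
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),   # email addresses
]
PLACEHOLDER = "[REDACTED]"


def redact(text: str) -> str:
    """Replace every match of every pattern with the placeholder."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub(PLACEHOLDER, text)
    return text


def write_artifact(pack_dir: Path, rel_path: str, text: str) -> None:
    """Redact first, then write, so raw payloads never reach disk."""
    target = pack_dir / rel_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(redact(text), encoding="utf-8")


def strict_scan(pack_dir: Path) -> list[str]:
    """Re-scan every manifest-referenced file; return paths that still match."""
    manifest = json.loads((pack_dir / "manifest.json").read_text(encoding="utf-8"))
    leaks = []
    for entry in manifest["files"]:
        content = (pack_dir / entry["path"]).read_text(encoding="utf-8", errors="replace")
        if any(p.search(content) for p in SECRET_PATTERNS):
            leaks.append(entry["path"])
    return leaks
```

Because redaction happens in the writer rather than in the agent, the agent needs no special support, and the scan acts as a second, independent check: a non-empty result from `strict_scan` would fail the run before the pack is exported.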
I’m not trying to replace tracing/observability tools — this is meant to be the “handoff unit” when sharing a link or granting UI access isn’t viable.
Questions for HN:
If you’ve had to share a single failing run with another engineer/vendor, what was the missing piece that caused the most back-and-forth?
What would you consider “minimum viable contents” vs a “bundle monster”?