GitHub Repository

Self‑hosted QA for AI agents: evidence packs, regression diffs, CI gates.

2 stars · TypeScript

Local "incident bundle" for AI/agent failures (offline report and CI JSON)

by Tanyayvr · Feb 19, 2026 · 1 point · 0 comments

AI Analysis

●● Solid · Niche Gem · Solve My Problem
The Take

Turns failing agent runs into a self-contained, inspectable package: report.html for human review and compare-report.json for automatic CI decisions. The evidence manifest + integrity checks and the option to apply redaction before artifacts are written are smart, practical details that make offline handoff and automated gating actually usable for teams building agents.

Target Audience

AI/agent engineers, QA and SRE teams who need reproducible incident handoffs and CI gating

Post Description

Hi, I built a small local-first CLI toolkit for debugging AI/agent incidents.

Problem I kept hitting: building agents is fast, but when something breaks, handing off “one failing run” is messy (screenshots, scattered logs, partial configs, access to a tracing UI, accidental secrets/PII in payloads).

What this does: run your agent on a case suite and generate a portable evidence pack you can open offline and attach to a GitHub issue/ticket:

report.html (offline viewer)

compare-report.json (machine-readable summary for CI gating: none | require_approval | block)

evidence files referenced via a manifest (so you can verify completeness/integrity)
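
As a rough illustration of the CI side, a gate step could read compare-report.json and turn the gate value into an exit code. This is a minimal sketch, not the project's actual schema: the post names the three gate values but not the field that carries them, so the `gate` field name, the `parseCompareReport` and `exitCodeForGate` helpers, and the exit-code mapping are all assumptions.

```typescript
// The three gate values come from the post; everything else here is hypothetical.
const GATES = ["none", "require_approval", "block"] as const;
type Gate = (typeof GATES)[number];

// Assumed shape: a top-level `gate` field. The real report likely has more fields.
interface CompareReport {
  gate: Gate;
}

function isGate(value: unknown): value is Gate {
  return typeof value === "string" && GATES.some((g) => g === value);
}

// Parse the report text and reject anything that is not a known gate value,
// so a malformed report fails loudly instead of silently passing CI.
function parseCompareReport(json: string): CompareReport {
  const data: unknown = JSON.parse(json);
  if (typeof data !== "object" || data === null) {
    throw new Error("compare report must be a JSON object");
  }
  const gate = (data as Record<string, unknown>).gate;
  if (!isGate(gate)) {
    throw new Error(`unknown gate value: ${String(gate)}`);
  }
  return { gate };
}

// Assumed mapping: 0 passes, 1 fails the job, 2 signals "needs a human".
const EXIT_CODES: Record<Gate, number> = {
  none: 0,
  block: 1,
  require_approval: 2,
};

function exitCodeForGate(gate: Gate): number {
  return EXIT_CODES[gate];
}
```

In a CI job, the script would read the file, call `parseCompareReport`, and exit with `exitCodeForGate(report.gate)`. Keeping `require_approval` distinct from `block` lets the pipeline route it to a manual approval step rather than a hard failure.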

It’s intentionally self-hosted/local-only: no backend, no accounts, nothing leaves your environment unless you export the pack.

Redaction note: in the “production” pipeline, redaction is applied in the runner before artifacts are written (the agent is not required to support a special header). There’s also a strict mode that scans all manifest-referenced files for residual markers as a safety gate.
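
To make the ordering concrete, here is a minimal sketch of "redact in the runner, then scan what the manifest references". It is not the project's code. The patterns, the `[REDACTED]` placeholder, the in-memory `Pack` map, and the helper names are all hypothetical, and it reads "residual markers" as "anything that still matches a redaction pattern", which is an assumption about what strict mode looks for.

```typescript
// Assumed patterns; the post does not list what the real runner redacts.
const SECRET_PATTERNS: RegExp[] = [
  /sk-[A-Za-z0-9]{16,}/g, // API-key-like tokens
  /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, // email addresses
];

const PLACEHOLDER = "[REDACTED]";

// Replace every match of every pattern with the placeholder.
function redact(text: string): string {
  return SECRET_PATTERNS.reduce((acc, p) => acc.replace(p, PLACEHOLDER), text);
}

// In-memory stand-in for the evidence pack: path -> file content.
type Pack = Map<string, string>;

// The runner redacts before the artifact is stored, so the agent itself
// needs no special support. The path is recorded in the manifest.
function writeArtifact(pack: Pack, manifest: string[], path: string, raw: string): void {
  pack.set(path, redact(raw));
  manifest.push(path);
}

// Strict-mode safety gate: return every manifest-referenced path that is
// missing from the pack or still matches a pattern after redaction.
function strictScan(pack: Pack, manifest: string[]): string[] {
  const offenders: string[] = [];
  for (const path of manifest) {
    const content = pack.get(path);
    const leaked =
      content === undefined ||
      // Rebuild without the `g` flag so `test` carries no lastIndex state.
      SECRET_PATTERNS.some((p) => new RegExp(p.source).test(content));
    if (leaked) offenders.push(path);
  }
  return offenders;
}
```

An empty result from `strictScan` means every manifest-referenced file is present and clean under these patterns. A non-empty result would be a reason to refuse to export the pack.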

I’m not trying to replace tracing/observability tools — this is meant to be the “handoff unit” when sharing a link or granting UI access isn’t viable.

Questions for HN:

If you’ve had to share a single failing run with another engineer/vendor, what was the missing piece that caused the most back-and-forth?

What would you consider “minimum viable contents” vs a “bundle monster”?
