Local "incident bundle" for AI/agent failures (offline rep and CI JSON)

Name: Local "incident bundle" for AI/agent failures (offline rep and CI JSON)
Availability: InStock
Author: Tanyayvr

by Tanyayvr·Feb 19, 2026·1 point·0 comments

Visit Project View on HN

AI Analysis

●●SolidNiche GemSolve My Problem

The Take

Turns failing agent runs into a self-contained, inspectable package: report.html for human review and compare-report.json for automatic CI decisions. The evidence manifest + integrity checks and the option to apply redaction before artifacts are written are smart, practical details that make offline handoff and automated gating actually usable for teams building agents.

Post Description

Hi,I built a small local-first CLI toolkit for debugging AI/agent incidents.

Problem I kept hitting: building agents is fast, but when something breaks, handing off “one failing run” is messy (screenshots, scattered logs, partial configs, access to a tracing UI, accidental secrets/PII in payloads).

What this does: run your agent on a case suite and generate a portable evidence pack you can open offline and attach to a GitHub issue/ticket:

report.html (offline viewer)

compare-report.json (machine-readable summary for CI gating: none | require_approval | block)

evidence files referenced via a manifest (so you can verify completeness/integrity)

It’s intentionally self-hosted/local-only: no backend, no accounts, nothing leaves your environment unless you export the pack.

Redaction note: in the “production” pipeline, redaction is applied in the runner before artifacts are written (the agent is not required to support a special header). There’s also a strict mode that scans all manifest-referenced files for residual markers as a safety gate.

I’m not trying to replace tracing/observability tools — this is meant to be the “handoff unit” when sharing a link or granting UI access isn’t viable.

Questions for HN:

If you’ve had to share a single failing run with another engineer/vendor, what was the missing piece that caused the most back-and-forth?

What would you consider “minimum viable contents” vs a “bundle monster”?