GitHub Repository

Diagnose your AI agents in production. Extract policies from prompts, evaluate traces, generate diagnostic reports.

4 stars·TypeScript

Agent-triage – diagnosis of agent failures from production traces

Author: oren1531

by oren1531·Mar 11, 2026·4 points·2 comments


AI Analysis

●● Solid·Solve My Problem·Big Brain

Replays agent traces step-by-step to pinpoint exact failure turns automatically.

Strengths

• Aggregates root causes across conversations to prioritize fixes like missing escalations.
• Extracts behavioral rules directly from system prompts without manual configuration.

Weaknesses

• Competes with built-in eval features in established platforms like LangSmith.

Post Description

I built agent-triage - a CLI that automates diagnosing AI agent failures in production.

I was spending way too much time staring at traces, logs and dashboards trying to figure out why my multi-agent setups kept failing.

You just point it at your traces (LangSmith, Langfuse, OpenTelemetry, or a JSON file). It pulls the system prompts directly from the logs, extracts the behavioral rules, and uses an LLM-as-a-judge to replay each conversation step-by-step.

It flags exactly which turn broke things, which agent caused it, and traces cascading failures across routing, handoffs, and retrieval.
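
To make the step-by-step replay concrete, here is a minimal sketch of the idea. It is not agent-triage's actual code: the `Turn`, `Policy`, `Verdict`, and `Judge` types and the `findFirstFailure` function are hypothetical names, and the judge is a plain stub function standing in for an LLM call.

```typescript
// Hypothetical types for illustration only; not agent-triage's real data model.
interface Turn {
  index: number;
  agent: string; // which agent produced this turn
  content: string;
}

interface Policy {
  id: string;
  description: string;
}

interface Verdict {
  violated: boolean;
  policyId?: string;
}

// A judge sees the conversation so far plus the policies and returns a verdict.
// In a real tool this would be an LLM call; here it is any synchronous function.
type Judge = (history: Turn[], policies: Policy[]) => Verdict;

interface Failure {
  turnIndex: number;
  agent: string;
  policyId: string;
}

// Replay the conversation one turn at a time and stop at the first violation.
function findFirstFailure(turns: Turn[], policies: Policy[], judge: Judge): Failure | null {
  for (let i = 0; i < turns.length; i++) {
    const turn = turns[i];
    if (!turn) continue;
    const verdict = judge(turns.slice(0, i + 1), policies);
    if (verdict.violated) {
      return { turnIndex: turn.index, agent: turn.agent, policyId: verdict.policyId ?? "unknown" };
    }
  }
  return null;
}

// Stub judge (assumption): flags a turn that mentions a refund without escalating.
const stubJudge: Judge = (history) => {
  const last = history[history.length - 1];
  if (!last) return { violated: false };
  const violated = /refund/.test(last.content) && !/escalat/.test(last.content);
  return violated ? { violated: true, policyId: "escalate-refunds" } : { violated: false };
};

// Made-up sample data.
const policies: Policy[] = [
  { id: "escalate-refunds", description: "Escalate refund requests to a human." },
];
const conversation: Turn[] = [
  { index: 0, agent: "router", content: "Routing you to billing." },
  { index: 1, agent: "billing", content: "I have issued your refund." },
  { index: 2, agent: "billing", content: "Anything else?" },
];

const failure = findFirstFailure(conversation, policies, stubJudge);
console.log(failure);
```

Stopping at the first violation is one simple design choice; tracing cascading failures across later turns would need the replay to continue and link follow-on violations back to that turn.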

It then aggregates root causes across all conversations: "24 out of 51 failures are missing escalations." You know exactly what to fix first.
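
The aggregation step can be sketched roughly as below. The `Diagnosis` shape, the root-cause labels, and both function names are assumptions made for illustration; the tool's real report format may differ.

```typescript
// Hypothetical shape of one diagnosed failure; labels are illustrative.
interface Diagnosis {
  conversationId: string;
  rootCause: string;
}

interface RootCauseCount {
  rootCause: string;
  count: number;
}

// Count failures per root cause and sort so the most common cause comes first.
function aggregateRootCauses(diagnoses: Diagnosis[]): RootCauseCount[] {
  const counts: Record<string, number> = {};
  for (const d of diagnoses) {
    counts[d.rootCause] = (counts[d.rootCause] ?? 0) + 1;
  }
  return Object.keys(counts)
    .map((rootCause) => ({ rootCause, count: counts[rootCause] ?? 0 }))
    .sort((a, b) => b.count - a.count);
}

// Build a one-line summary naming the most frequent root cause.
function summarize(diagnoses: Diagnosis[]): string {
  const top = aggregateRootCauses(diagnoses)[0];
  if (!top) return "No failures found.";
  return `${top.count} out of ${diagnoses.length} failures are ${top.rootCause}.`;
}

// Made-up sample data.
const sample: Diagnosis[] = [
  { conversationId: "c1", rootCause: "missing escalations" },
  { conversationId: "c2", rootCause: "wrong routing" },
  { conversationId: "c3", rootCause: "missing escalations" },
];

const summary = summarize(sample);
console.log(summary);
```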

Runs locally. Only LLM API calls leave your machine.

npx agent-triage demo — runs on sample data, uses your own API key (~$0.002/conversation with gpt-4o-mini).

https://github.com/converra/agent-triage

Demo report: https://demo-report-sigma.vercel.app/

Similar Projects

AI/ML·●●● Banger

Meta-agent: self-improving agent harnesses from live traces

Iteratively improves agent harnesses from 67% to 87% on tau-bench using production traces.

Big Brain·Solve My Problem

essamsleiman

1402mo ago

AI/ML·●●● Banger

Galdor – a Go LLM agent framework with built-in tracing and replay

One binary with embedded dashboard beats LangSmith's closed-source SaaS for Go teams.

Big Brain·Slick

yassros16

721d ago

SaaS·●● Solid

Khaga – AI Infrastructure Diagnosis for AWS, GCP, Azure and Kubernetes

Multi-cloud diagnosis in <30s, but infra observability (Datadog, New Relic) already solves this better.

Solve My Problem·Dark Horse

Gowrishankarhq

123mo ago

AI/ML·●● Solid

Signals – finding the most informative agent traces without LLM judges

Heuristic signals triage agent traces 1.52x more efficiently than random sampling.

Big Brain·Niche Gem

sparacha

302mo ago

AI/ML·●● Solid

Putting Git on AI Agents

Git for agent cognition—clever framework, but no working implementation yet.

Big Brain·Wizardry

vichoiglesias

223mo ago

SaaS·● Mid

Trainly – Free 72-hour audit of your AI agent's production traces

Free audit funnel for AI observability when LangSmith and Helicone already do this.

Ship It

kavin_key

621mo ago