GitHub Repository

Diagnose your AI agents in production. Extract policies from prompts, evaluate traces, generate diagnostic reports.

4 stars · TypeScript

Agent-triage – diagnosis of agent failures from production traces

by oren1531·Mar 11, 2026·4 points·2 comments

AI Analysis

●● Solid · Solve My Problem · Big Brain

Replays agent traces step-by-step to pinpoint exact failure turns automatically.

Strengths
  • Aggregates root causes across conversations to prioritize fixes like missing escalations.
  • Extracts behavioral rules directly from system prompts without manual configuration.
Weaknesses
  • Competes with built-in eval features in established platforms like LangSmith.
Target Audience

AI engineers, backend developers building agent systems

Similar To

LangSmith · Langfuse · Arize

Post Description

I built agent-triage - a CLI that automates diagnosing AI agent failures in production.

I was spending way too much time staring at traces, logs and dashboards trying to figure out why my multi-agent setups kept failing.

You just point it at your traces (LangSmith, Langfuse, OpenTelemetry, or a JSON file). It pulls the system prompts directly from the logs, extracts the behavioral rules, and uses an LLM-as-a-judge to replay each conversation step-by-step.
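A minimal sketch of what a step-by-step replay with a judge could look like. This is not agent-triage's actual code: the types `Turn`, `Rule`, `Judge`, and `Finding`, the `replay` function, and the rule id `escalate-on-request` are all hypothetical, and a keyword check stands in for the LLM call so the example runs offline.

```typescript
// Hypothetical types; none of these names come from agent-triage itself.
type Turn = { index: number; agent: string; role: "user" | "assistant"; text: string };
type Rule = { id: string; description: string };
type Verdict = { ruleId: string; passed: boolean };
// A judge sees the transcript up to and including the current turn.
type Judge = (rule: Rule, history: Turn[]) => Verdict;
type Finding = { turn: number; agent: string; ruleId: string };

// Walk the conversation one turn at a time and record, for each rule,
// the first assistant turn at which the judge says the rule was broken.
function replay(turns: Turn[], rules: Rule[], judge: Judge): Finding[] {
  const findings: Finding[] = [];
  const violated = new Set<string>();
  for (let i = 0; i < turns.length; i++) {
    const turn = turns[i];
    if (turn.role !== "assistant") continue; // only agent turns are judged
    const history = turns.slice(0, i + 1);
    for (const rule of rules) {
      if (violated.has(rule.id)) continue; // keep only the first violation
      if (!judge(rule, history).passed) {
        violated.add(rule.id);
        findings.push({ turn: turn.index, agent: turn.agent, ruleId: rule.id });
      }
    }
  }
  return findings;
}

// Stand-in judge (assumption): a keyword check instead of an LLM call.
const keywordJudge: Judge = (rule, history) => {
  const last = history[history.length - 1];
  const askedForHuman = history.some(
    (t) => t.role === "user" && /human|manager/i.test(t.text)
  );
  const escalated = /transfer|escalat/i.test(last.text);
  const passed = rule.id !== "escalate-on-request" || !askedForHuman || escalated;
  return { ruleId: rule.id, passed };
};

const sampleTurns: Turn[] = [
  { index: 0, agent: "user", role: "user", text: "I want to talk to a human." },
  { index: 1, agent: "billing", role: "assistant", text: "I can help with your invoice instead." },
];
const sampleRules: Rule[] = [
  { id: "escalate-on-request", description: "Escalate when the user asks for a human." },
];
const sampleFindings = replay(sampleTurns, sampleRules, keywordJudge);
```

Passing the judge in as a function keeps the replay loop independent of any particular model or provider; in a real setup the judge would presumably be an asynchronous LLM call.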

It flags exactly which turn broke things, which agent caused it, and traces cascading failures across routing, handoffs, and retrieval.
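One way to picture the cascade tracing, as a sketch under a simplifying assumption: the earliest failing step is treated as the root cause and every later failure as downstream of it. The post does not say how agent-triage actually attributes blame, and the names `Stage`, `StepFailure`, `Diagnosis`, and `diagnose` are hypothetical.

```typescript
// Hypothetical types; the stage names mirror the post (routing, handoffs, retrieval).
type Stage = "routing" | "handoff" | "retrieval" | "response";
type StepFailure = { turn: number; agent: string; stage: Stage };
type Diagnosis = { root: StepFailure; cascaded: StepFailure[] };

// Assumption: the earliest failing turn is the root cause, and
// all later failures in the same conversation cascade from it.
function diagnose(failures: StepFailure[]): Diagnosis | null {
  if (failures.length === 0) return null;
  const ordered = [...failures].sort((a, b) => a.turn - b.turn);
  const [root, ...cascaded] = ordered;
  return { root, cascaded };
}

const sampleDiagnosis = diagnose([
  { turn: 4, agent: "answerer", stage: "response" },
  { turn: 2, agent: "router", stage: "routing" },
  { turn: 3, agent: "retriever", stage: "retrieval" },
]);
```

Here the bad answer at turn 4 and the retrieval miss at turn 3 are both reported as consequences of the routing mistake at turn 2, so the router is the agent to fix.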

It then aggregates root causes across all conversations, for example: "24 out of 51 failures are missing escalations." That tells you what to fix first.
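The aggregation step amounts to counting root causes and sorting by frequency. A rough sketch, assuming each failed conversation has already been assigned a single cause label; `RootCause`, `aggregate`, and `summarize` are hypothetical names, not part of the tool.

```typescript
// Hypothetical shape: one root-cause label per failed conversation.
type RootCause = { conversationId: string; cause: string };
type CauseCount = { cause: string; count: number; total: number };

// Count how often each cause occurs, most frequent first.
function aggregate(causes: RootCause[]): CauseCount[] {
  const counts: Record<string, number> = {};
  for (const c of causes) {
    counts[c.cause] = (counts[c.cause] ?? 0) + 1;
  }
  return Object.keys(counts)
    .map((cause) => ({ cause, count: counts[cause], total: causes.length }))
    .sort((a, b) => b.count - a.count);
}

// Render one line in the style of the report quoted in the post.
function summarize(row: CauseCount): string {
  return `${row.count} out of ${row.total} failures are ${row.cause}`;
}

const sampleCauses: RootCause[] = [
  { conversationId: "c1", cause: "missing escalations" },
  { conversationId: "c2", cause: "wrong routing" },
  { conversationId: "c3", cause: "missing escalations" },
];
const topCause = summarize(aggregate(sampleCauses)[0]);
// topCause is "2 out of 3 failures are missing escalations"
```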

Runs locally. Only LLM API calls leave your machine.

Try it with npx agent-triage demo, which runs on sample data and uses your own API key (~$0.002/conversation with gpt-4o-mini).

Repository: https://github.com/converra/agent-triage
Demo report: https://demo-report-sigma.vercel.app/

Similar Projects

AI/ML · ●● Solid

Putting Git on AI Agents

Git for agent cognition—clever framework, but no working implementation yet.

Big Brain · Wizardry
vichoiglesias
223mo ago