Agent-triage – diagnosis of agent failures from production traces
Replays agent traces step-by-step to pinpoint exact failure turns automatically.

Heuristic signals triage agent traces 1.52x more efficiently than random sampling.
LLM application developers, ML engineers building agentic systems
LangSmith · Arize Phoenix · Helicone
Salman, Shuguang and Adil here from Katanemo Labs (a DigitalOcean company).
We wanted to introduce Signals, our latest research on agentic systems. If you've been building agents, you've probably noticed that there are far too many agent traces/trajectories to review one by one, and that using humans or extra LLM calls to inspect all of them gets expensive fast. The paper proposes a lightweight way to compute structured “signals” from live agent interactions so you can surface the trajectories most worth reviewing, without changing the agent’s online behavior. Computing Signals doesn't require a GPU.
Signals are grouped into a simple taxonomy across interaction, execution, and environment patterns, including things like misalignment, stagnation, disengagement, failure, looping, and exhaustion. In an annotation study on τ-bench, signal-based sampling reached an 82% informativeness rate versus 54% for random sampling, which translated to a 1.52x efficiency gain per informative trajectory.
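To make the idea concrete, here is a rough Python sketch of what a single heuristic signal and signal-based prioritization might look like. This is an illustration only, not the implementation from the paper or from Plano: the `looping_signal` rule, its `threshold`, the trajectory dictionary layout, and the sample data are all assumptions made up for this example. The last line just reproduces the arithmetic behind the reported gain (82% versus 54%).

```python
from collections import Counter


def looping_signal(tool_calls, threshold=3):
    """Hypothetical 'looping' heuristic (an assumption, not the paper's rule):
    flag a trajectory when any identical (tool, arguments) call
    occurs at least `threshold` times."""
    counts = Counter(tool_calls)
    return any(n >= threshold for n in counts.values())


def rank_for_review(trajectories):
    """Put flagged trajectories first. sorted() is stable, so the original
    order is preserved within the flagged and unflagged groups."""
    return sorted(trajectories, key=lambda t: not looping_signal(t["tool_calls"]))


def efficiency_gain(signal_rate, random_rate):
    """Ratio of informativeness rates between two sampling strategies."""
    return signal_rate / random_rate


# Made-up trajectories purely for illustration.
trajectories = [
    {"id": "a", "tool_calls": [("search", "refund policy"), ("reply", "done")]},
    {"id": "b", "tool_calls": [("get_order", "42")] * 4},
]

print([t["id"] for t in rank_for_review(trajectories)])  # ['b', 'a']
print(round(efficiency_gain(0.82, 0.54), 2))             # 1.52
```

The point of the sketch is that a check like this is plain counting over the trace, which is why this style of signal can run on CPU alongside the agent without extra LLM calls.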
Paper: arXiv 2604.00356. Project where Signals are already implemented: https://github.com/katanemo/plano
Happy to answer questions on the taxonomy, implementation details, or where this breaks down.
Iteratively improves agent harnesses from 67% to 87% on tau-bench using production traces.
One binary with embedded dashboard beats LangSmith's closed-source SaaS for Go teams.
Qualitative eval workflow for PMs when LangSmith and Arize target ML engineers.
Pre-computed market context cuts token usage for financial AI agents.
Testing framework for AI agents with LLM judges and SQLite result tracking.