
Continual harness optimization

59 stars · Python

Meta-agent: self-improving agent harnesses from live traces

by essamsleiman · Apr 6, 2026 · 14 points · 0 comments

AI Analysis

●●● Banger · Big Brain · Solve My Problem

Iteratively improves agent harnesses from production traces, raising holdout accuracy on tau-bench v3 airline from 67% to 87%.

Strengths
  • LLM judge scores unlabeled traces while proposer writes targeted harness updates
  • Updates only kept if they improve holdout accuracy, preventing regression
  • Supports Claude Agent SDK, with more frameworks coming
Weaknesses
  • Currently supports only Claude Agent SDK
  • Requires both production traces and a labeled holdout set to function
Target Audience

AI agent developers, teams deploying agents to production

Similar To

LangSmith · Braintrust · Arize Phoenix

Post Description

We built meta-agent: an open-source library that automatically and continuously improves agent harnesses from production traces.

Point it at an existing agent, a stream of unlabeled production traces, and a small labeled holdout set.

An LLM judge scores unlabeled production traces as they stream.
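A minimal sketch of streaming judgment, assuming the judge is any callable that maps a trace to a score in [0, 1] and that a score below a threshold marks the trace as failed. In meta-agent the judge is an LLM; here a keyword stub stands in so the example runs offline. `Trace`, `ScoredTrace`, `score_stream`, `keyword_judge`, and the 0.5 threshold are hypothetical names chosen for illustration, not meta-agent's API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class Trace:
    # Hypothetical unlabeled production trace: the user task and the agent transcript.
    task: str
    transcript: str


@dataclass
class ScoredTrace:
    trace: Trace
    score: float   # judge's estimate that the run succeeded, in [0, 1]
    failed: bool   # True when the score falls below the threshold


Judge = Callable[[Trace], float]


def score_stream(traces: Iterable[Trace], judge: Judge,
                 threshold: float = 0.5) -> Iterator[ScoredTrace]:
    """Score traces lazily, one at a time, as they arrive."""
    for trace in traces:
        score = judge(trace)
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"judge returned out-of-range score {score}")
        yield ScoredTrace(trace, score, failed=score < threshold)


def keyword_judge(trace: Trace) -> float:
    # Stand-in for an LLM judge call: treats any mention of "error" as a failure.
    return 0.0 if "error" in trace.transcript.lower() else 1.0


# Usage: only the failed traces are forwarded to the proposer.
stream = [
    Trace("change flight", "Booked new flight AB123."),
    Trace("refund bag fee", "Tool error: refund_bag not found."),
]
failed_tasks = [s.trace.task for s in score_stream(stream, keyword_judge) if s.failed]
```

Because `score_stream` is a generator, it can sit directly on top of a live trace feed without buffering the whole stream.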

A proposer reads failed traces and writes one targeted harness update at a time, such as changes to prompts, hooks, tools, or subagents. The update is kept only if it improves holdout accuracy.
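A minimal sketch of the propose-and-verify loop, assuming the harness is a dict of named components, the proposer returns one single-component edit per call, and holdout accuracy is the fraction of labeled examples the agent answers exactly. An edit is kept only when it strictly improves that accuracy, so the harness cannot regress on the holdout set. The agent and proposer are toy stubs; `holdout_accuracy`, `improve`, `toy_agent`, `make_toy_proposer`, and the other names are hypothetical, not meta-agent's API.

```python
from typing import Callable, Optional

Harness = dict[str, str]                # e.g. {"system_prompt": ..., "hooks": ...}
Agent = Callable[[Harness, str], str]   # (harness, task) -> final answer
Edit = tuple[str, str]                  # (component name, new value)
Proposer = Callable[[Harness, list[str]], Optional[Edit]]


def holdout_accuracy(agent: Agent, harness: Harness,
                     holdout: list[tuple[str, str]]) -> float:
    # Fraction of labeled (task, expected) pairs the agent answers exactly.
    correct = sum(agent(harness, task) == expected for task, expected in holdout)
    return correct / len(holdout)


def improve(agent: Agent, harness: Harness, failed_traces: list[str],
            holdout: list[tuple[str, str]], propose: Proposer,
            max_steps: int = 10) -> tuple[Harness, float]:
    best = dict(harness)  # copy so the caller's harness is never mutated
    best_acc = holdout_accuracy(agent, best, holdout)
    for _ in range(max_steps):
        edit = propose(best, failed_traces)
        if edit is None:                        # proposer has nothing left to try
            break
        component, value = edit
        candidate = {**best, component: value}  # one targeted change at a time
        acc = holdout_accuracy(agent, candidate, holdout)
        if acc > best_acc:                      # keep only strict improvements
            best, best_acc = candidate, acc
    return best, best_acc


def toy_agent(h: Harness, task: str) -> str:
    # Answers the refund question correctly only if told to check fare rules.
    if task == "refund?":
        return "no" if "check fare rules" in h["system_prompt"] else "yes"
    return "ok"


def make_toy_proposer(candidates: list[Edit]) -> Proposer:
    # Replays a fixed list of edits instead of reading failures with an LLM.
    queue = list(candidates)

    def propose(h: Harness, failures: list[str]) -> Optional[Edit]:
        return queue.pop(0) if queue else None

    return propose


# Usage: the first edit does not help and is rejected; the second is kept.
holdout = [("refund?", "no"), ("hello", "ok")]
start = {"system_prompt": "Be helpful."}
proposer = make_toy_proposer([
    ("system_prompt", "Be terse."),
    ("system_prompt", "Be helpful. Always check fare rules."),
])
final, acc = improve(toy_agent, start, ["refund? -> yes"], holdout, proposer)
```

Requiring a strict improvement rejects ties, so edits the holdout set cannot tell apart never accumulate in the harness.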

On tau-bench v3 airline, meta-agent improved holdout accuracy from 67% to 87%.
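For scale, the reported change can be read in absolute or relative terms, assuming the 67% and 87% figures are exact rather than rounded:

$$
\Delta_{\text{abs}} = 87\% - 67\% = 20 \text{ percentage points}
$$

$$
\text{error rate: } 33\% \to 13\%, \qquad \frac{33 - 13}{33} \approx 0.61
$$

That is, roughly 61% of the holdout failures were eliminated.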

We open-sourced meta-agent. It currently supports Claude Agent SDK, with more frameworks coming soon.

Try it here: https://github.com/canvas-org/meta-agent

Similar Projects

AI/ML · ●● Solid

A Technique for Self-Improving Agents

Git worktree isolation lets agents test instruction changes without breaking other sites, a clever approach to regression prevention.

Big Brain · Rabbit Hole
dataviz1000
10 · 2mo ago