GitHub Repository

Continual harness optimization

59 stars · Python

Meta-agent: self-improving agent harnesses from live traces


by essamsleiman · Apr 6, 2026 · 14 points · 0 comments


AI Analysis

●●● Banger · Big Brain · Solve My Problem

Iteratively improves an agent harness from production traces, raising holdout accuracy on tau-bench v3 airline from 67% to 87%.

Strengths

• An LLM judge scores unlabeled traces while a proposer writes targeted harness updates
• Updates are kept only if they improve holdout accuracy, which guards against regressions
• Supports the Claude Agent SDK, with more frameworks coming

Weaknesses

• Currently supports only the Claude Agent SDK
• Requires both production traces and a labeled holdout set to function

Post Description

We built meta-agent: an open-source library that automatically and continuously improves agent harnesses from production traces.

Point it at an existing agent, a stream of unlabeled production traces, and a small labeled holdout set.

An LLM judge scores unlabeled production traces as they stream in.

A proposer reads failed traces and writes one targeted harness update at a time, such as changes to prompts, hooks, tools, or subagents. The update is kept only if it improves holdout accuracy.
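The judge, proposer, and holdout gate described above can be sketched as a single improvement step. This is a minimal illustration, not meta-agent's actual API: `Harness`, `Example`, `holdout_accuracy`, `improve_once`, and the toy agent, judge, and proposer are all hypothetical names, and the harness is reduced to a system prompt for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class Harness:
    """Hypothetical stand-in for an agent harness (reduced to a system prompt)."""
    system_prompt: str


@dataclass(frozen=True)
class Example:
    """One labeled holdout item: a task and its expected answer."""
    task: str
    expected: str


Agent = Callable[[Harness, str], str]                    # runs the agent under a harness
Judge = Callable[[str, str], bool]                       # (task, output) -> looks successful?
Proposer = Callable[[Harness, Sequence[str]], Harness]   # failed tasks -> candidate harness


def holdout_accuracy(agent: Agent, harness: Harness, holdout: Sequence[Example]) -> float:
    """Fraction of labeled holdout examples the agent answers correctly."""
    correct = sum(agent(harness, ex.task) == ex.expected for ex in holdout)
    return correct / len(holdout)


def improve_once(
    agent: Agent,
    judge: Judge,
    proposer: Proposer,
    harness: Harness,
    traces: Sequence[Tuple[str, str]],
    holdout: Sequence[Example],
) -> Harness:
    """Run one judge -> propose -> gate step and return the harness to keep."""
    # 1. The judge flags unlabeled production traces that look like failures.
    failed = [task for task, output in traces if not judge(task, output)]
    if not failed:
        return harness
    # 2. The proposer writes one targeted update based on the failures.
    candidate = proposer(harness, failed)
    # 3. The update is kept only if holdout accuracy strictly improves.
    if holdout_accuracy(agent, candidate, holdout) > holdout_accuracy(agent, harness, holdout):
        return candidate
    return harness


# Toy stand-ins (assumptions for illustration only; a real setup would call an LLM).
def toy_agent(harness: Harness, task: str) -> str:
    if "refund" in task and "refund policy" not in harness.system_prompt:
        return "unknown"
    return "ok"


def toy_judge(task: str, output: str) -> bool:
    return output != "unknown"


def toy_proposer(harness: Harness, failed: Sequence[str]) -> Harness:
    return Harness(harness.system_prompt + " Follow the refund policy.")


base = Harness("You are an airline support agent.")
holdout = [Example("refund a ticket", "ok"), Example("change a seat", "ok")]
traces = [("refund request", "unknown"), ("seat change", "ok")]

better = improve_once(toy_agent, toy_judge, toy_proposer, base, traces, holdout)
unchanged = improve_once(toy_agent, toy_judge, lambda h, failed: h, base, traces, holdout)
```

In this toy run the proposed prompt change lifts holdout accuracy from 0.5 to 1.0, so the candidate is kept, while a proposer that changes nothing leaves the original harness in place. The strict `>` comparison is one possible gating rule; the post says only that an update must improve holdout accuracy.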

On tau-bench v3 airline, meta-agent improved holdout accuracy from 67% to 87%.

We open-sourced meta-agent. It currently supports Claude Agent SDK, with more frameworks coming soon.

Try it here: https://github.com/canvas-org/meta-agent

Similar Projects

AI/ML · ●● Solid

What 1k Harness Experiments Taught Me About Self-Improving Agents

Agents cheated benchmarks by hardcoding task info into the harness configuration.

Big Brain · Rabbit Hole

megadragon9

3018d ago

AI/ML · ●● Solid

Eidentic – TypeScript SDK for AI agents with self-improving memory

Temporal knowledge graph memory and trace-to-test evals beat standard vector RAG.

Solve My Problem · Big Brain

baranozdemir

403d ago

AI/ML · ●● Solid

A Technique for Self-Improving Agents

Git worktree isolation lets agents test instruction changes without breaking other sites, a clever way to prevent regressions.

Big Brain · Rabbit Hole

dataviz1000

102mo ago

AI/ML · ● Mid

Autobrowse – a self-improving harness for learning browser tasks

Another autonomous browser agent, but this one optimizes token usage by learning from failures.

Bold Bet

smpandya

301mo ago

AI/ML · ● Mid

Aigent – A general-purpose AI agent built for self-improvement

Feature-packed AI agent UI, but competing against Claude Code, Cursor, and established agentic platforms.

Crowd Pleaser · Ship It

StefanoC

203mo ago

AI/ML · ●● Solid

Self-improving skills for any coding agent

Team-wide memory pool for agents when most tools stay siloed on one workstation.

Big Brain · Niche Gem

iryna_kondr

301mo ago