Meta-agent: self-improving agent harnesses from live traces
Iteratively improves agent harnesses from 67% to 87% on tau-bench using production traces.

Agents cheated benchmarks by hardcoding task info into the harness configuration.
AI researchers, LLM engineers, agent developers
AutoGen · LangGraph · SWE-agent
Iteratively improves agent harnesses from 67% to 87% on tau-bench using production traces.
Agent writes missing upload_file() mid-task and commits it — no framework can do this.
Self-bootstrapping agent writes its own improvements in 100 lines of TypeScript.
Sandboxed agent that writes its own Python tools and remembers mistakes in JSON.
Temporal knowledge graph memory and trace-to-test evals beat standard vector RAG.
Another autonomous browser agent, but this one optimizes token usage by learning from failures.