
GitHub Repository

69 stars · TypeScript

We built Cobalt, Open source unit testing for AI Agents

by fdefitte · Feb 12, 2026 · 3 points · 1 comment


AI Analysis

●● Solid · Solve My Problem · Ship It

Testing framework for AI agents with LLM judges and SQLite result tracking.

Strengths

• Solves a real pain point: testing AI agents is under-addressed, and using LLM judges for evaluation is a sensible approach.
• Good integrations: Langfuse, LangSmith, Braintrust, and Basalt tracking are built in.
• An MCP server lets AI assistants run experiments, which ties naturally into editor workflows.

Weaknesses

• Crowded space: Braintrust, Humanloop, and LangSmith already offer agent evaluation.
• Early stage: 18 stars, limited adoption signals, and no clear reason to choose it over existing tools.

Category

Developer Tools

Target Audience

AI/LLM engineers, agent developers, teams building AI-powered applications

Similar To

Braintrust · LangSmith · Humanloop

Similar Projects

Developer Tools · ●●● Banger

Cobalt – Unit tests for AI agents, like Jest but for LLMs

Jest for LLMs: CI-native eval that fails builds on quality drops instead of relying on dashboards.

Ship It · Solve My Problem · Big Brain

fdefitte

30 · 3mo ago

Developer Tools · ●●● Banger

CheckAgent – The open-source pytest testing framework for AI agents

pytest-native testing for AI agents with 101 built-in safety attack probes.

Solve My Problem · Slick

xydac

30 · 1mo ago

AI/ML · ●● Solid

GEDD – A Systematic Evidence Driven LLM as a Judge Framework

Qualitative eval workflow for PMs, whereas LangSmith and Arize target ML engineers.

Big Brain · Niche Gem

balasvce2026

20 · 3d ago

Developer Tools · ●●● Banger

Cheddar-bench – unsupervised benchmark for coding agents

Unsupervised bug benchmark that uses agents as both attackers and defenders, with a novel scoring methodology.

Big Brain · Wizardry · Ship It

przadka

90 · 3mo ago

Developer Tools · ●● Solid

Agent-triage – diagnosis of agent failures from production traces

Replays agent traces step-by-step to pinpoint exact failure turns automatically.

Solve My Problem · Big Brain

oren1531

42 · 3mo ago

AI/ML · ●●● Banger

Meta-agent: self-improving agent harnesses from live traces

Iteratively improves agent harnesses from 67% to 87% on tau-bench using production traces.

Big Brain · Solve My Problem

essamsleiman

140 · 2mo ago