Mdarena – Benchmark your Claude.md against your own PRs
Mining your own PRs as benchmarks beats generic SWE-bench tasks for agent config tuning.

68% Git test parity in Rust with transparent cell-by-cell failure breakdowns.
Systems programmers, Git tooling developers, performance-focused engineers
gix · libgit2 · JGit
Smart reverse-dependency tracking beats naive path-matching in large Rust workspaces.
A 263k-configuration search space benchmarked across robot fleets; nothing comparable exists for robotics AI.
Expands the corpus to 16 CVE-anchored scenarios to break ties between models.
First benchmark measuring semantic correctness over text similarity for document parsing.
Beats Valkey on GET and SET benchmarks while guaranteeing memory safety with Rust.