ParseBench – Document parsing benchmark for AI agents

Author: pierre

by pierre·Apr 13, 2026·9 points·5 comments

AI Analysis

●●● Banger · Big Brain · Dark Horse

First benchmark to measure semantic correctness rather than text similarity for document parsing.

AI/ML · ●●● Banger

Frontier models hit 67-75% outcome accuracy but only 25-42% on process compliance.

Big Brain · Bold Bet

shubh-chat

103mo ago

AI/ML · ●●●● Gem

Agents fail completely at rebuilding binaries from scratch without source code.

Big Brain · Bold Bet · Zero to One

lieret

2431mo ago

AI/ML · ●● Solid

LlamaIndex open-sources their parser core, but LlamaParse cloud still handles complex layouts.

Solve My Problem · Ship It

cheesyFish

2013mo ago

AI/ML · ●●● Banger

Tests agents on 700 policy docs and noisy voice calls where AgentBench stops.

Big Brain · Niche Gem

victorbarres

1212mo ago

Unsupervised bug benchmark that uses agents as both attackers and defenders, with a novel scoring methodology.

Big Brain · Wizardry · Ship It

przadka

903mo ago

AI/ML · ●● Solid

Interactive DuckDB-WASM benchmark beats static leaderboards for agentic SQL eval.

Big Brain · Niche Gem

102mo ago