udoc. Dependency-free document extraction in Rust


by newelh · May 20, 2026 · 5 points · 1 comment


AI Analysis

●●● Banger · Wizardry · Big Brain

Pure Rust parsers for legacy Office formats with zero external dependencies.

Strengths

• Native parsers for binary .doc and .xls eliminate the need for headless LibreOffice.
• Streaming JSONL output allows processing multi-gigabyte PDFs without OOM errors.
• A unified document model abstracts away format-specific quirks across all inputs.

Weaknesses

• The Cargo crate is not yet published, so installing currently requires a manual build or uvx.
• OCR and layout detection require manually piping output to external hooks.

Similar Projects

AI/ML · ●●● Banger

LiteParse, a fast open-source document parser for AI agents

Beats PyPDF and MarkItDown on accuracy without needing GPUs or cloud APIs.

Solve My Problem · Slick · Crowd Pleaser

freezed8

1202mo ago

Developer Tools · ●●● Banger

LibreOffice-rs – I built a pure-Rust LibreOffice using autoresearch

Pure-Rust DOCX to PDF converter running 100x faster than LibreOffice with zero C dependencies.

Wizardry · Big Brain · Zero to One

stan_kirdey

10124d ago

Infrastructure · ● Mid

Aegis-DB – Multi-paradigm database in Rust, in production

Six data models in one binary, but no proof of production use or comparison benchmarks.

Bold Bet

AutomataNexus

113mo ago

AI/ML · ●● Solid

DocMason – AI Agent Knowledge Base for local complex office files

Preserves document structure instead of flattening to text like most RAG tools.

Solve My Problem · Bold Bet

Jet_Xu

232mo ago

AI/ML · ●● Solid

ProofPudding – Document Extraction API with Citations (PDF/Docx)

ProofPudding returns extraction results with explicit links back to the exact page and source text, supports native and scanned PDFs plus DOCX/images, and ships Python/TypeScript SDKs — handy for agents that need auditable facts. It’s a pragmatic product (per-extraction pricing and confidence scores are nice), but the market is crowded; I want clarity on underlying models, real-world accuracy numbers, and how it compares to Document AI/Textract in edge cases.

Solve My Problem · Slick

garai

104mo ago

AI/ML · ●● Solid

DocMason – Agent Knowledge Base for local complex office files

Provenance-first RAG beats anonymous text chunks, but Cursor and Continue already own this space.

Big Brain · Niche Gem

Jet_Xu

1102mo ago