Ocrbase.dev – pdf→.md/.json OCR for developers

Author: adammajcher

by adammajcher · Feb 18, 2026 · 3 points · 2 comments


AI Analysis

●● Solid · Solve My Problem · Niche Gem · Slick

94.5% accuracy, self-hostable, open source—beats Textract on cost and accuracy.

Strengths

• Significantly cheaper than Textract/Azure with higher accuracy on OmniDocBench benchmarks
• Open source and self-hostable: a real alternative to proprietary competitors
• Dead-simple API (one-line parsing) with schema-driven structured output

Weaknesses

• OCR/document extraction is a solved problem; dozens of tools exist (Textract, Azure, Anthropic)
• No clear technical differentiation explained: why does it beat the others? The model choice is unclear.

Post Description

https://x.com/adammajcher20/status/2024221836053995944

Similar Projects

Developer Tools · ● Mid

ocrbase – PDF/IMG –>.MD/JSON Model-Agnostic OCR API

Yet another OCR API wrapper when JinaAI and Firecrawl already exist.

Ship It

adammajcher

101mo ago

Productivity · ●● Solid

AutoRename-PDF – Open-source tool that uses AI to rename your PDFs

Offline Ollama + OCR keeps your documents private when cloud APIs won't.

Solve My Problem · Cozy

SPQRK

102mo ago

Developer Tools · ●●● Banger

Smelt – Extract structured data from PDFs and HTML using LLM

The LLM infers the schema once, then Go handles the 10k-row extraction, which avoids token waste.

Big Brain · Solve My Problem

smeltcli

603mo ago

AI/ML · ●● Solid

ProofPudding – Document Extraction API with Citations (PDF/Docx)

ProofPudding returns extraction results with explicit links back to the exact page and source text, supports native and scanned PDFs plus DOCX/images, and ships Python/TypeScript SDKs — handy for agents that need auditable facts. It’s a pragmatic product (per-extraction pricing and confidence scores are nice), but the market is crowded; I want clarity on underlying models, real-world accuracy numbers, and how it compares to Document AI/Textract in edge cases.

Solve My Problem · Slick

garai

104mo ago

SaaS · ●● Solid

AI-Powered Structured Data Extraction from Any Document (93%+ Accuracy)

93% accuracy document extraction, but remove.bg-style competition already exists.

Solve My Problem · Slick

chaitanyavelaga

124mo ago

Data · ● Mid

I used NLP to turn UK planning PDFs into a clean CSV

Useful dataset for UK researchers but it's a Kaggle upload, not a reusable tool.

Niche Gem

david_s_data

132mo ago