TurboOCR up to 1200 pages/s with Paddle and TensorRT (C++/CUDA, FP16)


by pfdomizer·Apr 16, 2026·3 points·0 comments


AI Analysis

●●● Banger · Wizardry · Big Brain

50x faster than PaddleOCR Python with real TensorRT benchmarks on RTX 5090.

Strengths

•270 img/s throughput at a 90.2% F1 score decisively beats Python alternatives.
•C++/TensorRT implementation with gRPC and HTTP APIs for production use.
•Prometheus metrics and Docker images make deployment straightforward.

Weaknesses

•Requires an NVIDIA GPU, which rules out CPU-only and AMD deployments.
•OCR at scale is niche; most users won't need this throughput.

Similar Projects

Infrastructure · ●● Solid

We built an OCR server that can process 270 dense images/s on a 5090

50x faster than PaddleOCR Python with real TensorRT benchmarks.

Wizardry · Niche Gem

pfdomizer

821mo ago

Infrastructure · ●●● Banger

cuSBF – Faster GPU Bloom Filter for Sequence Data

92× faster than CPU Super Bloom with minimizer-based shard selection.

Wizardry · Niche Gem

tdortman

2018d ago

Security · ●● Solid

GPU-accelerated search for Bitcoin keys generated with weak entropy

This reads like a GPU engineer's field notes: one ~3,400-line CUDA file implements a full per-thread crypto pipeline (key gen → EC multiply → SHA-256 → RIPEMD-160) and a two-stage Bloom filter plus binary-search matcher to check ~3,100 targets at ~100M keys per batch. The article digs into concrete low-level choices (LUT layout, memory hierarchy, __ldg reads, atomicCAS reporting, and per-mode keygen strategies), which is rare in public writeups. The downsides are that it is closed-source and that the dual-use and ethical implications should be called out more explicitly.
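
The two-stage matcher mentioned in this summary can be illustrated generically. The project is closed-source, so the sketch below is not its code: the class names, filter size, hash count, and the use of BLAKE2b for bit positions are all assumptions chosen for illustration, and it runs on the CPU in Python instead of CUDA. The idea it shows is that a small Bloom filter rejects almost every candidate cheaply, and only the rare candidates that pass are confirmed by an exact binary search over a sorted target list.

```python
import bisect
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter over byte strings (sizes are illustrative)."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4):
        self.num_bits = num_bits          # must be a multiple of 8
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, item: bytes):
        # Double hashing: derive k bit positions from two 64-bit halves
        # of a single BLAKE2b digest (an assumed choice, not the project's).
        digest = hashlib.blake2b(item, digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def might_contain(self, item: bytes) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos >> 3] & (1 << (pos & 7))
            for pos in self._positions(item)
        )


class TwoStageMatcher:
    """Stage 1: Bloom filter pre-check. Stage 2: exact binary search."""

    def __init__(self, targets):
        self.sorted_targets = sorted(set(targets))
        self.bloom = BloomFilter()
        for target in self.sorted_targets:
            self.bloom.add(target)

    def contains(self, candidate: bytes) -> bool:
        # Most non-members are rejected here without touching the list.
        if not self.bloom.might_contain(candidate):
            return False
        # Bloom filters allow false positives, so confirm exactly.
        i = bisect.bisect_left(self.sorted_targets, candidate)
        return i < len(self.sorted_targets) and self.sorted_targets[i] == candidate


if __name__ == "__main__":
    # Hypothetical 20-byte targets standing in for a target list.
    targets = [hashlib.sha256(str(n).encode()).digest()[:20] for n in range(3100)]
    matcher = TwoStageMatcher(targets)
    print(matcher.contains(targets[42]))
    print(matcher.contains(hashlib.sha256(b"not a target").digest()[:20]))
```

Because the binary search is exact, the Bloom filter's false positives only cost an extra lookup and never produce a wrong match. On a GPU the same split would typically keep the filter in fast memory and leave the sorted list in global memory, though how the project actually lays this out is not stated here.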

Wizardry · Niche Gem

orkblutt

213mo ago

Design · ●● Solid