Bit-packed segmented prime sieve in Rust, 32KB working memory, 0 deps

Name: Bit-packed segmented prime sieve in Rust, 32KB working memory, 0 deps
Author: whisprer

by whisprer·Feb 14, 2026·3 points·1 comment

View on HN

AI Analysis

●●●Banger

The Take

Impressively engineered low-level, bit-packed segmented sieve in Rust — a technical showcase with niche appeal to systems and algorithm enthusiasts.

Post Description

Started with a hyper-compact C++ bit-packed sieve — months of whittling it down to ~9 lines of raw bit manipulation. One bit per odd number, Brian Kernighan's bit trick for extraction, hardware tzcnt intrinsics. Then I ported it to Rust.

The borrow checker had opinions. You can't .filter() over a bit array (immutable borrow) and .for_each() mutate it simultaneously. Fair enough — explicit loops for the sieve phase, iterators for collection. Same assembly output, provably safe. Along the way I caught two assertion bugs the sieve was too correct to trigger (off-by-one on the 10,001st prime; 499,999 is composite, not prime).

The flat sieve hit a wall at ~10M — the bit array exceeds L1 cache (32KB) and every sieving pass thrashes. So I built a segmented version: bootstrap sieving primes via flat sieve up to √n, then process the full range in 32KB L1-sized segments. One buffer, reused, never leaves cache.

Results (single-threaded, 25 iterations, cross-verified against primal crate):

n=500K: ~440µs, 30KB sieve mem n=10M: ~10ms, 32KB sieve mem n=50M: ~52ms, 32KB sieve mem (flat: ~67ms) n=100M: ~137ms, 32KB sieve mem (flat: ~190ms)

Compared to the primal crate (Rust's best-in-class), raw speed is ~2x behind — primal uses wheel-30 factorisation which skips multiples of 2, 3, and 5. This sieve only skips evens. But memory is where it wins: 187x less sieve working memory than primal at n=50M (32KB vs 6MB), and a tighter result vector from prime-counting pre-allocation.

The target use case is embedded/constrained: ESP32 nodes, Raspberry Pi Zeros, distributed timing networks where every KB matters. It's also a clean reference implementation — one file, zero dependencies, compiles with a single rustc invocation.

Single file. No Cargo. No crates. Just:

rustc -C opt-level=3 -C target-cpu=native primer.rs && ./primer

Code, benchmarks, borrow checker writeup, and a full development narrative are all in the repo.

Similar Projects

Developer Tools●●●Banger

TurkishSieve CPU/GPU prime sieve found errors in Nicely's tables

Found bugs in 30-year-old twin prime data; RTX 5090 hits 1.1T candidates/sec.

WizardryDark Horse

bilgisoft

203mo ago

Data●●Solid

Turkish Sieve Engine – Full Prime Statistics Up to 10^14 and V2 Preview

1.13 trillion candidates/sec on RTX 5090 with 6x memory savings over classical sieves.

Niche GemWizardry

bilgisoft

103mo ago

Hardware●●●●Gem

FPGA prime finder – discovered a 1,123-digit Proth prime

4096-bit hardware Proth prover on $200 FPGA finds new prime with deterministic proof.

WizardryZero to OneNiche Gem

sickthecat

503mo ago

Developer Tools●●●Banger

Warp_cache – SIEVE cache in Rust for Python, 25x faster than cachetools

SIEVE cache beats LRU with one-line swap, but only matters if you're bottlenecked on cache.

WizardryBig Brain

tolopalmer

203mo ago

Open Source●●Solid

Sievers a Rust SIEVE filter editor

The repo delivers the two UX pieces you actually need: a visual rule builder plus a raw SIEVE editor and an IMAP connection dialog for direct uploads. It’s pragmatic — cross-compile notes, native build deps, and a tuned release profile (strip, LTO, size opt) show the author cared about distribution and binary size — but it’s solving a narrow, well-known problem rather than inventing one.

Niche GemSolve My Problem

oscarcp

103mo ago

Open Source●●Solid

Luminous – fast image viewer in Rust, SAM 3 and CLIP support

SAM 3 and CLIP plugins bring AI smarts to a Rust image viewer.

Niche GemShip It

jarda_64

304d ago