TurkishSieve CPU/GPU prime sieve found errors in Nicely's tables
Found bugs in 30-year-old twin prime data; RTX 5090 hits 1.1T candidates/sec.
Impressively engineered low-level, bit-packed segmented sieve in Rust — a technical showcase with niche appeal to systems and algorithm enthusiasts.
Systems programmers, Rust developers, performance engineers, algorithm enthusiasts
The borrow checker had opinions. You can't .filter() over a bit array (immutable borrow) and .for_each() mutate it simultaneously. Fair enough — explicit loops for the sieve phase, iterators for collection. Same assembly output, provably safe. Along the way I caught two assertion bugs the sieve was too correct to trigger (off-by-one on the 10,001st prime; 499,999 is composite, not prime).
The flat sieve hit a wall at ~10M — the bit array exceeds L1 cache (32KB) and every sieving pass thrashes. So I built a segmented version: bootstrap sieving primes via flat sieve up to √n, then process the full range in 32KB L1-sized segments. One buffer, reused, never leaves cache.
Results (single-threaded, 25 iterations, cross-verified against primal crate):
n=500K: ~440µs, 30KB sieve mem n=10M: ~10ms, 32KB sieve mem n=50M: ~52ms, 32KB sieve mem (flat: ~67ms) n=100M: ~137ms, 32KB sieve mem (flat: ~190ms)
Compared to the primal crate (Rust's best-in-class), raw speed is ~2x behind — primal uses wheel-30 factorisation which skips multiples of 2, 3, and 5. This sieve only skips evens. But memory is where it wins: 187x less sieve working memory than primal at n=50M (32KB vs 6MB), and a tighter result vector from prime-counting pre-allocation.The target use case is embedded/constrained: ESP32 nodes, Raspberry Pi Zeros, distributed timing networks where every KB matters. It's also a clean reference implementation — one file, zero dependencies, compiles with a single rustc invocation.
Single file. No Cargo. No crates. Just:
rustc -C opt-level=3 -C target-cpu=native primer.rs && ./primer
Code, benchmarks, borrow checker writeup, and a full development narrative are all in the repo.Found bugs in 30-year-old twin prime data; RTX 5090 hits 1.1T candidates/sec.
1.13 trillion candidates/sec on RTX 5090 with 6x memory savings over classical sieves.
4096-bit hardware Proth prover on $200 FPGA finds new prime with deterministic proof.
SIEVE cache beats LRU with one-line swap, but only matters if you're bottlenecked on cache.
The repo delivers the two UX pieces you actually need: a visual rule builder plus a raw SIEVE editor and an IMAP connection dialog for direct uploads. It’s pragmatic — cross-compile notes, native build deps, and a tuned release profile (strip, LTO, size opt) show the author cared about distribution and binary size — but it’s solving a narrow, well-known problem rather than inventing one.
SAM 3 and CLIP plugins bring AI smarts to a Rust image viewer.