GitHub Repository

Schema-aware JSON compression with millisecond lookups: cut transfer and storage costs while enabling exists/pos queries. (Demo + wheels; core is binary-only.)

24 stars · Rust

Searchable compression for JSON/NDJSON (skip ~99% pages; sub-ms lookups


by Tetsuro·Feb 19, 2026·1 point·1 comment


AI Analysis

●●● Banger · Wizardry · Big Brain

Searchable JSON compression at 7.7% with 0.085ms random lookups; skips 99% of pages.

Strengths

•Eliminates the compress-vs-index tradeoff by combining delta encoding, dictionary, and Bloom filters in a single format—genuinely novel compression architecture for JSON.
•Proof-first evaluation model (Demo ZIP + DD Pack with mismatch==0 verification) demonstrates rigor and confidence; reduces friction for enterprise adoption.
•Quantified benchmarks (7.7% vs Zstd 13.7%, p50 latency 0.083ms, 99.4% skip ratio) are reproducible and significantly outperform naive compression for query-heavy workloads.

Weaknesses

•Repositioning as "strategic asset for acquisition/exclusive license" signals the author is shopping it rather than building an open-source community; limits adoption.
•No public benchmarks against ParquetJS, ClickHouse columnar, or other compressed query engines—positioning is vague about where this genuinely outperforms existing infra.

Post Description

Hi HN — I built SEE (Semantic Entropy Encoding), a schema-aware, searchable compression format for JSON/NDJSON.

Goal: Reduce the “data tax” (storage/egress) and the “CPU tax” (decompress/parse) by keeping data searchable while compressed.

What’s different:
- Page-based layout + Bloom filter skipping + a small directory index
- Fast exists/pos/eq-style lookups without full decompression
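
To make the page-skipping idea concrete, here is a toy sketch in Python. It is not SEE's format or API (the core is binary-only): `Bloom`, `build_pages`, and `find_eq` are hypothetical names, zlib stands in for the real codec, and the "directory" is reduced to a list of page start positions. It only illustrates how a per-page Bloom filter lets an eq-style lookup skip pages without decompressing them.

```python
import hashlib
import json
import zlib


class Bloom:
    """Tiny Bloom filter: k hash positions over an m-bit integer bitmap."""

    def __init__(self, m_bits=1024, k=4):
        self.m, self.k, self.bits = m_bits, k, 0

    def _positions(self, token):
        # Derive k positions by salting one hash function with the index i.
        for i in range(self.k):
            digest = hashlib.blake2b(token.encode("utf-8"), salt=bytes([i])).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, token):
        for p in self._positions(token):
            self.bits |= 1 << p

    def might_contain(self, token):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> p) & 1 for p in self._positions(token))


def build_pages(records, page_size=2):
    """Split records into compressed pages, each with its own Bloom filter."""
    pages = []
    for start in range(0, len(records), page_size):
        chunk = records[start:start + page_size]
        bloom = Bloom()
        for rec in chunk:
            for key, value in rec.items():
                bloom.add(f"{key}={json.dumps(value)}")
        blob = zlib.compress(json.dumps(chunk).encode("utf-8"))  # stand-in codec
        pages.append({"start": start, "bloom": bloom, "blob": blob})
    return pages


def find_eq(pages, key, value):
    """Return (positions of matching records, number of pages decompressed)."""
    token = f"{key}={json.dumps(value)}"
    hits, pages_opened = [], 0
    for page in pages:
        if not page["bloom"].might_contain(token):
            continue  # skipped without touching the compressed blob
        pages_opened += 1
        chunk = json.loads(zlib.decompress(page["blob"]))
        for offset, rec in enumerate(chunk):
            if key in rec and rec[key] == value:  # filter out false positives
                hits.append(page["start"] + offset)
    return hits, pages_opened


records = [{"user": f"u{i}", "status": "ok"} for i in range(8)]
pages = build_pages(records)
hits, pages_opened = find_eq(pages, "user", "u5")
print(hits, f"opened {pages_opened} of {len(pages)} pages")
```

Because a Bloom filter can return false positives but never false negatives, every candidate page is still checked after decompression; the skip ratio depends on filter size and how selective the query is.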

Proof-first (no meeting required):
- A 10-minute offline Demo Pack (wheel + sample .see + scripts + OnePager)
- A DD Evidence Pack designed for reviewers (mismatch-zero checks, audit status, verification file list)
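
For readers unfamiliar with the term, a mismatch-zero check can be read as a lossless round-trip test: encode, decode, and count the records that differ. The sketch below assumes that reading and is not taken from the Demo or DD packs; `encode`, `decode`, and `count_mismatches` are hypothetical helpers, and zlib over NDJSON stands in for the SEE codec.

```python
import json
import zlib


def encode(records):
    # Stand-in codec (zlib over NDJSON); NOT the SEE format.
    ndjson = "\n".join(json.dumps(r, sort_keys=True) for r in records)
    return zlib.compress(ndjson.encode("utf-8"))


def decode(blob):
    text = zlib.decompress(blob).decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line]


def count_mismatches(original, decoded):
    # A length difference counts as missing or extra records.
    mismatches = abs(len(original) - len(decoded))
    mismatches += sum(1 for a, b in zip(original, decoded) if a != b)
    return mismatches


records = [{"id": i, "tags": ["a", "b"], "v": i * 0.5} for i in range(100)]
mismatch = count_mismatches(records, decode(encode(records)))
print("mismatch ==", mismatch)
```

A reviewer would run the equivalent check against the shipped sample file and expect the count to be zero.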

I’m exploring either: (A) acquisition (asset purchase), or (B) an exclusive license with a strategic buyer with a clear integration path. Evaluation slots are limited.

Repo + release assets are linked in the comments. Happy to answer technical questions here.