GitHub Repository

Schema-aware JSON compression with millisecond lookups: cut transfer and storage costs while enabling exists/pos queries. (Demo + wheels; core is binary-only)

24 stars · Rust

SEE – searchable JSON compression, smaller than Zstd (on our data)

by Tetsuro·Feb 18, 2026·3 points·1 comment

AI Analysis

●●● Banger · Big Brain · Wizardry

Beats Zstd-19 on size, keeps JSON queryable without external indexes.

Strengths
  • Schema-aware structure encoding produces smaller output than Zstd-19 on the author's GitHub events dataset while keeping random-access query support, so the gain is not limited to byte savings.
  • Rigorous benchmarking with DD Pack reproducibility artifacts and mismatch==0 verification gates, targeting serious enterprise evaluation.
  • The memory-mapped I/O design for the Bloom/skip data structures is non-obvious and sets the project apart from commodity compression tools.
Weaknesses
  • Limited to JSON/NDJSON; general-purpose compression tools already own mindshare in most stacks.
  • NDA eval slots and gated access reduce visibility; needs to demonstrate adoption wins to move from platform curiosity to industry standard.
Target Audience

Infrastructure engineers, data platform teams, observability vendors, storage optimization specialists

Similar To

Zstandard · Parquet · ORC

Post Description

I built SEE (Semantic Entropy Encoding): a page-level JSON/NDJSON format that stays searchable while compressed (exists/pos/eq-style probes), using Bloom+skip + structure-aware encoding.
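To make the "Bloom+skip" idea concrete, here is a minimal Python sketch of a page-level layout where each page carries a small Bloom filter, so an eq-style probe can skip pages without decompressing them. This is not SEE's actual format or API (the core is binary-only). The names `build_pages` and `probe_eq`, the page size, the filter parameters, and the use of zlib as a stand-in for the structure-aware encoder are all assumptions made for illustration.

```python
import hashlib
import json
import zlib

PAGE_SIZE = 2      # records per page (tiny, for illustration only)
BLOOM_BITS = 256   # bits in each page's filter (assumed value)
BLOOM_HASHES = 3   # bit positions set per token (assumed value)

def _bit_positions(token):
    """Derive BLOOM_HASHES bit positions from one SHA-256 digest."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [
        int.from_bytes(digest[4 * i:4 * i + 4], "big") % BLOOM_BITS
        for i in range(BLOOM_HASHES)
    ]

def _token(key, value):
    """Canonical 'key=value' token that goes into the filter."""
    return f"{key}={json.dumps(value)}"

def build_pages(records):
    """Split records into pages of (bloom bitmask, compressed NDJSON)."""
    pages = []
    for start in range(0, len(records), PAGE_SIZE):
        chunk = records[start:start + PAGE_SIZE]
        bloom = 0
        for rec in chunk:
            for key, value in rec.items():
                for pos in _bit_positions(_token(key, value)):
                    bloom |= 1 << pos
        payload = "\n".join(json.dumps(r, sort_keys=True) for r in chunk)
        # zlib is only a placeholder for a real structure-aware encoder.
        pages.append((bloom, zlib.compress(payload.encode("utf-8"))))
    return pages

def probe_eq(pages, key, value):
    """Return (matching records, number of pages actually decompressed)."""
    needed = _bit_positions(_token(key, value))
    hits, opened = [], 0
    for bloom, blob in pages:
        if not all((bloom >> pos) & 1 for pos in needed):
            continue  # filter says "definitely absent": skip this page
        opened += 1   # may be a false positive, so verify after decoding
        for line in zlib.decompress(blob).decode("utf-8").split("\n"):
            rec = json.loads(line)
            if rec.get(key) == value:
                hits.append(rec)
    return hits, opened

records = [
    {"id": 1, "type": "PushEvent"},
    {"id": 2, "type": "WatchEvent"},
    {"id": 3, "type": "ForkEvent"},
    {"id": 4, "type": "PushEvent"},
]
pages = build_pages(records)
push_hits, push_opened = probe_eq(pages, "type", "PushEvent")
watch_hits, watch_opened = probe_eq(pages, "type", "WatchEvent")
```

A Bloom filter can report false positives but never false negatives, so a probe may occasionally decompress a page that holds no match, yet it never misses one. The payoff is the pages it skips entirely, which is the trade-off behind comparing this design with "Zstd + separate index".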

On our GitHub events dataset, SEE ended up smaller than Zstd-19 while still supporting random access queries:
- combined: 40.4MB vs Zstd 71.8MB (raw 524.1MB) → 7.7% of raw
- str: 9.1MB vs Zstd 9.5MB
- int: 31.3MB vs Zstd 62.3MB

Lookup microbench (one column): p50 ~0.085ms.
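The ratios implied by the figures above can be checked with a few lines of Python. The sizes are copied from the post; the variable names and the `pct` helper are illustrative only, and the percentages are derived here rather than reported by the author (apart from the 7.7% figure).

```python
# Sizes in MB, copied from the post.
RAW = 524.1
SEE = {"combined": 40.4, "str": 9.1, "int": 31.3}
ZSTD = {"combined": 71.8, "str": 9.5, "int": 62.3}

def pct(part, whole):
    """Percentage of `whole` taken by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)

see_of_raw = pct(SEE["combined"], RAW)     # 7.7, as stated in the post
zstd_of_raw = pct(ZSTD["combined"], RAW)   # 13.7
see_of_zstd = {name: pct(SEE[name], ZSTD[name]) for name in SEE}
# {'combined': 56.3, 'str': 95.8, 'int': 50.2}
```

Read this way, the str and int figures add up to the combined figure for both tools, and nearly all of the saving over Zstd-19 comes from the int column (about half of Zstd's size), while the str column is only about 4% smaller.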

Repo + release assets are here: https://github.com/kodomonocch1/see_proto

NDA eval request (optional): https://docs.google.com/forms/d/e/1FAIpQLScV2Ti592K3Za2r_WLU...

Happy to answer questions about the design trade-offs and where this beats “Zstd + separate index”.

Similar Projects