Searchable JSON compression (offline demo and DD evidence and BYOD kit)
Searchable JSON compression with page-level access is clever, but it's a pre-revenue tech asset, not a working product.
Schema-aware JSON compression with millisecond lookups — cut transfer/storage while enabling exists /pos queries. (Demo + wheels; core is binary-only)
Beats Zstd-19 on size, keeps JSON queryable without external indexes.
Infrastructure engineers, data platform teams, observability vendors, storage optimization specialists
Zstandard · Parquet · ORC
On our GitHub events dataset, SEE ended up smaller than Zstd-19 while still supporting random access queries: - combined: 40.4MB vs Zstd 71.8MB (raw 524.1MB) → 7.7% of raw - str: 9.1MB vs Zstd 9.5MB - int: 31.3MB vs Zstd 62.3MB Lookup microbench (one column): p50 ~0.085ms.
Repo + release assets are here: https://github.com/kodomonocch1/see_proto
NDA eval request (optional): https://docs.google.com/forms/d/e/1FAIpQLScV2Ti592K3Za2r_WLU...
Happy to answer questions about the design trade-offs and where this beats “Zstd + separate index”.
Searchable JSON compression with page-level access is clever, but it's a pre-revenue tech asset, not a working product.
Schema-aware JSON compression stays searchable; reaches 7.7% vs Zstd's 13.7%.
Searchable JSON compression at 7.7% with 0.085ms random lookups; skips 99% of pages.
Beats zstd by 12% on Ethereum JSON by decoding hex and delta-encoding fields first.
Binary JSON with table reuse, but CBOR and MessagePack already own this space.
32x embedding compression without calibration beats product quantization's training overhead.