I analyzed 227M Medicaid billing records to flag potential fraud
Public fraud detection across 227M Medicaid claims—1,860 flagged providers, zero false positives yet.
Data tools and analytics from Show HN — visualization, ETL pipelines, databases, and data science utilities.
1.3M case analysis proves systemic sentencing bias with rigorous controls; visceral real-time cost counter.
350x faster GPU Bloom filter with academic paper backing the performance claims.
Bills + AI summaries + bias-rated news + politician posts all in one.
PostgreSQL extension for BM25 relevance-ranked full-text search. Postgres OSS licensed.
Native Postgres BM25 with ORDER BY content <@> 'query' operator syntax.
Explore 230 global music charts from 1940–2025 with a Back to the Future UI.
ACORN-1 pushes WHERE filters into HNSW traversal, fixing pgvector's post-filter limitation.
Finally, a history archive that lets you search by meaning instead of just keywords.
Real-time MCP sentiment tracker when everyone's debating if the protocol is dead or thriving.
Infers invisible submarine cable paths using A* pathfinding between ocean-crossing hops.
Recovers metadata for deleted videos across 1.5B indexed entries since 2005.
Git-like branching for columnar data with DuckDB-beating benchmarks from pure JVM.
GPU-accelerated layout renders 7M articles with shortest-path finding between any two.
Agent-guided compilation handles merged cells and multi-level headers LLMs choke on.
207 papers buried in government records, now searchable with full context extraction.
Distributes DuckDB queries across BEAM nodes after Polars maintenance became unsustainable.
Open-source streaming SQL engine written in Rust using Apache Arrow and DataFusion. Supports continuous queries, temporal stream joins, tumbling/session windows, and CDC/Kafka connectors. Lightweight and embeddable, with sub-microsecond latency.
Sub-microsecond streaming SQL via zero-alloc hot path; genuine advancement over SQLite+DataFusion.
Reactive SQL graph in the browser beats Tableau for local prototyping.
3,000 years of literary citation networks visualized with WebGPU; LLM pipeline resolves messy bibliographic data.
7,560 runs proving cheaper models beat expensive ones on production OCR tasks.
Stunning 3D globe tracking 725 AI clusters and 80M H100 equivalents globally.
WebGPU renders millions of galaxies in-browser with real-time density correction filters.
Runs billions of rows locally on your machine, bypassing cloud warehouse costs entirely.
24M spending rows searchable instantly; some data is missing, but it fills a real transparency gap.
Beats utility forecasts on 6 of 7 RTOs using only public EIA data and open models.
Monte Carlo sampling inside a spreadsheet beats collapsing uncertainty into single scalars.
Column-oriented analytics for SQLite, designed for fast local analytical queries with benchmarked speedups of up to 130,000x over standard SQLite scans on selected workloads.
Columnar storage inside SQLite delivers 130,000x speedup on aggregation scans.
BM25 search directly on S3 Parquet files without ingesting data into Elasticsearch.
Local-first hybrid vector database. HNSW + BM25 + fusion search. Single binary, no cloud.
Single-binary Rust DB fusing HNSW and BM25 without cloud dependencies or API keys.
Sample-level versioning for ML pipelines—solves real compute-waste problem with clever metadata design.
A command-line tool to extract plain text from Wikipedia dumps with category and section filtering.
Category-aware Wikipedia text extraction with 20-year maintenance history and parallel M4 speed.
Rasteret is a library that reads GeoTIFFs 20x+ faster than Rasterio/GDAL. Interoperates with TorchGeo, Xarray, DuckDB, and Polars.
Parquet-cached COG metadata cuts satellite image cold starts from minutes to seconds.
Geospatial conflict dashboard beats raw-feed competitors with spatial intel and AI signal curation.
DuckDB-powered desktop app queries 5GB CSVs locally in seconds without cloud uploads.
Bankstatemently Open Benchmark for bank statement parsing. 15 synthetic statements, 40 parsing challenges, automated scoring via API.
First shared benchmark for bank statement parsers when real data is legally off-limits.
Live fuel levels from 4k exposed industrial sensors—basically FlightRadar24 for gas stations.
Scores 16 years of HN comments for AI pessimism, showing a doubling trend since ChatGPT.
Finally proves Rust has overtaken Java in HN hiring with hard numbers, not vibes.
Stitched 330K files into a 12-mile scroll that actually scrolls smoothly in the browser.
Automates regulatory compliance tracking—links proposed rules to final text with diffs and extracted obligations.
Replaces scraping dozens of inconsistent government APIs with one normalized REST endpoint.
Independent CDMO scoring prevents pay-to-play bias in vendor selection.
100+ structural coupling rules enforce causal economics across 195 countries with probabilistic outputs.
A new search paradigm where documents have gravity, queries converge into basins, and multi-signal scoring uses interference instead of linear fusion.
Gravity-based search with interference scoring beats linear fusion by 18.5% NDCG.
Query the full Hacker News archive from Postgres via duckdb_fdw, with zero copies. Stream row groups straight from the Hugging Face Parquet dataset on demand.
Zero-copy Postgres queries against 47M rows using DuckDB FDW and HTTP range requests.
First public NRC regulatory embeddings dataset—37K chunks ready for ChromaDB and Pinecone.
TPC-H 1GB in 2 seconds on iPhone—Arrow Flight SQL running locally.
An experimental embedded SQL engine in C++20. Query Parquet, CSV, JSON, Arrow, Avro, SQLite, and Excel files directly with SQL, in-process. Early-stage.
1.3MB WASM database beats DuckDB 5x on 10M row analytics, runs in Cloudflare Workers.
389 projects