GoldenMatch – Entity resolution with LLM scoring, 97% F1, no Spark
Fellegi-Sunter matching with active learning beats Dedupe.io on complex datasets.
Zero-config entity resolution. The zero-tuning Fellegi-Sunter path beats hand-rolled Splink head-to-head; scales from a CSV to a verified 100M-row dedupe in 9.2 min on Ray. Fuzzy/exact/probabilistic + PPRL + LLM, identity graph. Python + edge-safe TypeScript (optional WASM), SQL-native in Postgres & DuckDB, MCP/REST + dbt/Airflow.
Ray-based dedupe at 100M rows without Spark — that's a real architectural choice.
Data engineers, data scientists
Splink · Dedupe.io · OpenRefine
100M free tokens is generous, but Hugging Face and Replicate already host models.
Spark without Databricks markup, but Kubernetes management is still ops work.
Ray-casting engine brings retro Wolf3D vibes to a browser-based moon trucking sim.
ByteBuddy injects trace context into Spark tasks; sees executor-level details no competitor offers.
TPC-H 1GB in 2 seconds on iPhone—Arrow Flight SQL running locally.