Data · Solid
10k harmonized time-series datasets of African data
7,900+ harmonized African datasets with BibTeX provenance and one-line dataset library loading.
Niche Gem · Big Brain
kossisoroyce
204d ago

Clean Parquet dump of 55M Open Library rows saves weeks of data cleaning.
ML Engineers, Data Scientists
Google Books API · Internet Archive Dumps · Goodreads Datasets
518k Vietnamese legal documents fill a massive gap in Southeast Asian NLP datasets.
47M HN items in Parquet, auto-updating every 5 minutes on Hugging Face.
Pre-cleaned ArXiv metadata in Parquet saves hours of ETL pipeline work.
Cross-platform dataset search with health scores when Kaggle and HF are fragmented.
MCP-native tool lets AI agents fetch and clean datasets without human intervention.