ELDC – Natural language identification, faster than FastText and CLD2

Author: nitotm

by nitotm·Jun 15, 2026·3 points·3 comments


AI Analysis


C implementation beats CLD2 by 2x and FastText by 6x in language detection benchmarks.

Strengths

•Architecture scales efficiently with database size, so more languages and n-grams can be added at relatively low cost.
•Bindings provided for Java, Go, Rust, .NET, PHP, Ruby, and Python from single C codebase.
•Author's first C project, following earlier versions of ELD built in PHP, JavaScript, and Python.

Weaknesses

•Language detection is a largely solved category, with CLD2, CLD3, FastText, and Lingua already established.
•The README notes that the cross-language bindings are not yet 100% validated.

Post Description

I want to introduce ELDC, an efficient language detector, written in C, designed to maximize speed and accuracy within a relatively constrained memory footprint.

ELDC is the latest iteration of the ELD software I made years ago. This version is available as an executable, a library, and a Python package.

This is my first C software, or my first compiled software of any kind for that matter; I previously built ELD in pure PHP, JavaScript, and Python.

Highlights:
- Performance: In my benchmarks, it runs faster than CLD2 and much faster than FastText. I believe the results are reproducible for any workload.
- Accuracy: Within its supported language set, the benchmarks show it to be more accurate than Lingua, CLD3, CLD2, FastText, and others. Accuracy is very benchmark-dependent, so I will make no claim beyond saying that ELDC is highly accurate.
- Languages: It supports 60 languages. Its architecture scales efficiently with database size, so I can add more n-grams or languages with relatively low impact.
- Memory usage: The compiled software is about 26 MB, and it also builds a 32 MB hashtable on load.
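The post doesn't describe ELDC's scoring beyond n-grams and a hashtable built on load. The toy Python sketch below illustrates the general technique of hashtable-based n-gram scoring; it is not ELDC's algorithm, and `NGRAM_WEIGHTS`, `trigrams`, and `detect` are hypothetical names with made-up weights.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical toy table: each character trigram maps to per-language weights.
# Real detectors use far larger tables; these numbers are invented for illustration.
NGRAM_WEIGHTS: dict[str, dict[str, float]] = {
    " th": {"en": 2.0},
    "the": {"en": 3.0},
    "he ": {"en": 1.5},
    " de": {"es": 1.5, "fr": 1.5},
    "el ": {"es": 2.0},
    " le": {"fr": 2.0},
    "les": {"fr": 2.5},
}


def trigrams(text: str) -> list[str]:
    # Pad with spaces so word boundaries become part of the n-grams.
    padded = f" {text.lower()} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def detect(text: str) -> str | None:
    scores: Counter[str] = Counter()
    for gram in trigrams(text):
        # One hashtable lookup per n-gram; unknown n-grams contribute nothing.
        for lang, weight in NGRAM_WEIGHTS.get(gram, {}).items():
            scores[lang] += weight
    if not scores:
        return None  # no known n-grams, so no guess
    return scores.most_common(1)[0][0]
```

In this layout, adding a language only adds entries to the existing per-n-gram weight maps rather than a whole new table, which is one way a database could grow slowly as languages are added; whether ELDC works this way isn't stated in the post.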

Notes:
- Database size: I do have other database sizes (featured in the PHP version), but I went for simplicity and used the optimal size. More sizes could be added.
- Single detection: I optimized for multi-detection. For single detection, a B-tree would offer faster loading and lower memory usage than the current hashtable. I didn't expect single detection to be the most common use case, but it could be optimized for.
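The post doesn't specify ELDC's on-disk format. As a hedged illustration of the single- versus multi-detection trade-off in the notes, the sketch below searches a pre-sorted array in place with Python's `bisect` module (a simpler stand-in for a B-tree, not the same structure) and contrasts it with building a hashtable at load time. `SORTED_NGRAMS`, `NGRAM_LANG`, `lookup_sorted`, and `build_hashtable` are hypothetical, and mapping each n-gram to a single language is a simplification.

```python
from __future__ import annotations

import bisect

# Hypothetical pre-sorted layout: n-grams stored in sorted order with a parallel
# list of language codes. Nothing needs to be rebuilt when the data is loaded.
SORTED_NGRAMS: list[str] = [" de", " le", " th", "el ", "he ", "les", "the"]
NGRAM_LANG: list[str] = ["es", "fr", "en", "es", "en", "fr", "en"]


def lookup_sorted(gram: str) -> str | None:
    # O(log n) per lookup and no load-time construction: suits one-off detection.
    i = bisect.bisect_left(SORTED_NGRAMS, gram)
    if i < len(SORTED_NGRAMS) and SORTED_NGRAMS[i] == gram:
        return NGRAM_LANG[i]
    return None


def build_hashtable() -> dict[str, str]:
    # O(n) up-front work and extra memory, then O(1) average per lookup:
    # the build cost is amortized when many texts are detected after one load.
    return dict(zip(SORTED_NGRAMS, NGRAM_LANG))
```

The break-even point depends on how many lookups follow a single load, which is why a process that detects one text and exits favors the in-place search, while a long-running batch job favors the hashtable.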

I would like to get some feedback; I'm curious to see whether my speed claims hold up in your own tests. :)
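For anyone running their own comparison, here is a minimal, detector-agnostic timing sketch. `texts_per_second` is a hypothetical helper, not part of ELDC; it takes any callable that classifies one string, so no ELDC, CLD2, or FastText API is assumed.

```python
from __future__ import annotations

import time
from collections.abc import Callable, Sequence


def texts_per_second(detect: Callable[[str], object],
                     texts: Sequence[str], repeats: int = 3) -> float:
    """Return the best throughput over `repeats` runs, in texts per second."""
    if not texts:
        raise ValueError("texts must not be empty")
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for text in texts:
            detect(text)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero elapsed time on extremely fast runs.
    return len(texts) / best if best > 0 else float("inf")
```

Usage: `texts_per_second(my_detect, corpus)`, where `my_detect` is a hypothetical wrapper around whichever detector you're comparing. Call it after the detector has finished loading so that one-time costs such as building the hashtable aren't counted, or time the load separately.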