I embedded 685M public texts in 32 minutes (on 8x A100, Rust, TensorRT)
3.6x faster than Hugging Face TEI on the same hardware, with zero Python overhead at runtime.
Fast self-hosted embedding engine for search, RAG, and reindexing workloads on NVIDIA GPUs. Built in Rust + TensorRT for teams that care about scale, cost, and control.
Beats Hugging Face TEI by 3.6x with raw TensorRT and zero Python runtime overhead.
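For a sense of scale, the headline numbers can be turned into average throughput with a few lines of arithmetic. This is a rough sketch, not a benchmark: the function names are made up for illustration, the per-GPU figure assumes the load was spread evenly across the 8 GPUs, and the TEI estimate assumes the 3.6x speedup applies to this same workload end to end.

```rust
/// Average texts per second for `total_texts` processed in `minutes`.
fn texts_per_second(total_texts: f64, minutes: f64) -> f64 {
    total_texts / (minutes * 60.0)
}

/// Average per-GPU throughput, assuming an even split across `gpus`.
fn per_gpu(overall: f64, gpus: f64) -> f64 {
    overall / gpus
}

/// Implied baseline wall-clock time, assuming the speedup holds
/// for the whole run (an assumption, not a measured number).
fn baseline_minutes(minutes: f64, speedup: f64) -> f64 {
    minutes * speedup
}

fn main() {
    let total_texts = 685_000_000.0; // from the headline
    let minutes = 32.0;              // from the headline
    let gpus = 8.0;                  // 8x A100
    let speedup = 3.6;               // claimed speedup over TEI

    let overall = texts_per_second(total_texts, minutes);
    println!("overall: {:.0} texts/s", overall);
    println!("per GPU: {:.0} texts/s", per_gpu(overall, gpus));
    println!(
        "implied TEI time: {:.1} min",
        baseline_minutes(minutes, speedup)
    );
}
```

That works out to roughly 357,000 texts per second overall, or about 44,600 per GPU, and an implied TEI run of a little under two hours for the same job.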
ML engineers building RAG pipelines or vector search indexes
Hugging Face Text Embeddings Inference · NVIDIA Triton Inference Server
Rust port of Qdrant's fastembed when the Python original already works fine.
Rust LSM-Tree engine, but RocksDB and Redb already dominate this space.
Sub-microsecond CAN frame detection with zero heap allocation in 122K lines of Rust.
Sparse matrix graph operations with MCP server integration for AI agents.
32x embedding compression without calibration beats product quantization's training overhead.