GitHub Repository

Fast self-hosted embedding engine for search, RAG, and reindexing workloads on NVIDIA GPUs. Built in Rust + TensorRT for teams that care about scale, cost, and control.

7 stars · Rust

IgniteMS – batch text embeddings at 253K msg/s on 8x A100

by ddayanov · May 20, 2026 · 3 points · 0 comments

AI Analysis

●●● Banger · Wizardry · Solve My Problem

Beats Hugging Face TEI by 3x with raw TensorRT and zero Python runtime overhead.

Strengths
  • Bucketed batching reduces padding waste by grouping texts of similar token lengths.
  • Production proof: sustained 357K msg/s while embedding 685M social media events.
  • Cost efficiency: embedding cost drops to $0.01 per million messages on spot instances.
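
The bucketed batching mentioned in the first strength can be illustrated with a minimal sketch. This is not IgniteMS's actual code: the function names (`bucket_len`, `bucketed_batches`, `padding_bucketed`, `padding_naive`), the fixed bucket width, and the batch-size cap are all assumptions made for illustration. The idea is that texts are grouped by token length rounded up to a bucket boundary, so every batch pads to its bucket length instead of to the longest text that happened to arrive alongside it.

```rust
use std::collections::BTreeMap;

/// Round a token count up to the next multiple of `bucket_width`.
/// Hypothetical helper; assumes `bucket_width > 0`.
fn bucket_len(tokens: usize, bucket_width: usize) -> usize {
    (tokens + bucket_width - 1) / bucket_width * bucket_width
}

/// Group text indices into batches whose members share one padded length.
/// Returns `(padded_len, indices)` pairs; assumes `max_batch > 0`.
fn bucketed_batches(
    token_counts: &[usize],
    bucket_width: usize,
    max_batch: usize,
) -> Vec<(usize, Vec<usize>)> {
    // Map each padded length to the indices of the texts that fall into it.
    let mut buckets: BTreeMap<usize, Vec<usize>> = BTreeMap::new();
    for (i, &n) in token_counts.iter().enumerate() {
        buckets.entry(bucket_len(n, bucket_width)).or_default().push(i);
    }
    // Split each bucket into batches of at most `max_batch` texts.
    let mut batches = Vec::new();
    for (len, idxs) in buckets {
        for chunk in idxs.chunks(max_batch) {
            batches.push((len, chunk.to_vec()));
        }
    }
    batches
}

/// Padding tokens spent when every batch pads to its bucket length.
fn padding_bucketed(token_counts: &[usize], batches: &[(usize, Vec<usize>)]) -> usize {
    batches
        .iter()
        .map(|(len, idxs)| idxs.iter().map(|&i| *len - token_counts[i]).sum::<usize>())
        .sum()
}

/// Padding tokens spent when texts are batched in arrival order and each
/// batch pads to its longest member (the baseline being compared against).
fn padding_naive(token_counts: &[usize], max_batch: usize) -> usize {
    token_counts
        .chunks(max_batch)
        .map(|c| {
            let longest = c.iter().copied().max().unwrap_or(0);
            c.iter().map(|&n| longest - n).sum::<usize>()
        })
        .sum()
}
```

For eight texts with token counts `[5, 120, 7, 118, 6, 125, 8, 121]`, a bucket width of 16, and at most 4 texts per batch, this sketch produces two batches (padded lengths 16 and 128) with 66 padding tokens in total, against 470 padding tokens when the same texts are batched in arrival order. A real engine would also have to weigh bucket granularity against the number of distinct input shapes the GPU runtime must handle, which this sketch ignores.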
Weaknesses
  • Requires NVIDIA GPUs and ONNX export compatibility, excluding CPU or Mac users.
  • Niche utility for batch reindexing rather than real-time inference serving.
Target Audience

ML engineers building RAG pipelines or vector search indexes

Similar To

Hugging Face Text Embeddings Inference · NVIDIA Triton Inference Server

Similar Projects