Diom – Back end primitives (queue, rate limit, etc.) in one Rust binary
Replaces Redis and Kafka with one Rust binary using embedded fjall storage.
Build apps powered by on-device AI
Single Rust library replaces backend servers for LLM + speech in Unity and mobile apps.
Mobile and game developers building offline AI features
MLC LLM · llama.cpp · MediaPipe Tasks
We built Xybrid, a Rust library for running LLM + speech pipelines directly inside your app, no server, no daemon, just one binary.
We started building it while working on a privacy-focused LLM app with Tauri and realized there wasn’t a straightforward way to embed models directly into shipped applications without relying on a separate server process.
Xybrid links into your process like any other library. It supports GGUF / ONNX / CoreML and integrates with Flutter, Swift, Kotlin, Unity, and Tauri, letting you run pipelines like speech → LLM → speech in a single call.
On recent phones, we’re seeing ~20 tok/s on Android and ~40 tok/s on iOS for small (~3B) quantized models (varies by device, backend, and thermals).
The demo that shows it best: a Unity tavern scene where 6 NPCs generate real-time dialogue fully on-device — no API key, no internet, no per-request cost.
Unity demo: https://youtu.be/vSPeTyeow6A Desktop demo (Tauri): https://youtu.be/o83YShqV7O4
GitHub: https://github.com/xybrid-ai/xybrid
It’s still early — there are rough edges, especially around model support and performance tuning. Happy to answer questions about the architecture, backends, or integrations (Flutter, Swift, Kotlin, Unity, Tauri).
Replaces Redis and Kafka with one Rust binary using embedded fjall storage.
Backend-agnostic memory SDK with local embeddings and benchmark claims on BEAM.
Visual DAG for markdown skill dependencies with token limit analytics.
Local LLM + RAG for datasheets beats cloud AI for proprietary firmware.
Apple's neglected on-device AI APIs finally get a real showcase here.
Rust port of Qdrant's fastembed when the Python original already works fine.