UnifyRoute – Self-hosted OpenAI-compatible LLM gateway with failover
Drop-in OpenAI API gateway with failover. LiteLLM does this too, but this one has a dashboard.
Unified management and routing for llama.cpp, MLX and vLLM models with web dashboard.
A multi-backend LLM manager, even though Ollama and LM Studio already handle this.
Developers running self-hosted LLM infrastructure
Ollama · LM Studio · LocalAI
I originally built this because I got tired of constantly SSHing into my server to edit a config just to try out a new model. It's grown a lot since then.
What it does:
Web UI for creating and managing LLM instances from your browser
Full llama.cpp model lifecycle - download from HuggingFace, create preset.ini configs with an in-browser editor, load/unload models via router mode
Automatic idle timeout, LRU eviction, and instance limits
llama.cpp, mlx_lm and vllm backends
OpenAI- and Anthropic-compatible API endpoints (backend-dependent)
Multi-node support for distributing instances across hosts
Inference API keys with per-instance access control
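To make the "idle timeout, LRU eviction, and instance limits" item concrete, here is a minimal Python sketch of how such a policy could work. It is not taken from the project: `InstancePool`, `touch`, and `reap_idle` are hypothetical names, and the caller passes timestamps in explicitly (in practice something like `time.monotonic()`), so the logic stays easy to test.

```python
from collections import OrderedDict


class InstancePool:
    """Hypothetical tracker for loaded model instances.

    Enforces an instance limit with least-recently-used eviction and
    reports instances that have been idle longer than a timeout.
    """

    def __init__(self, max_instances, idle_timeout_s):
        if max_instances < 1:
            raise ValueError("max_instances must be at least 1")
        self.max_instances = max_instances
        self.idle_timeout_s = idle_timeout_s
        # name -> last-use timestamp, ordered from least to most recently used
        self._last_used = OrderedDict()

    def touch(self, name, now):
        """Record a request for `name`; return the names evicted to make room."""
        evicted = []
        if name in self._last_used:
            # Already loaded: mark it as most recently used.
            self._last_used.move_to_end(name)
        else:
            # Not loaded: evict least-recently-used instances until there is room.
            while len(self._last_used) >= self.max_instances:
                oldest, _ = self._last_used.popitem(last=False)
                evicted.append(oldest)
        self._last_used[name] = now
        return evicted

    def reap_idle(self, now):
        """Drop and return every instance idle for at least idle_timeout_s."""
        idle = [n for n, t in self._last_used.items()
                if now - t >= self.idle_timeout_s]
        for n in idle:
            del self._last_used[n]
        return idle

    def loaded(self):
        """Names of loaded instances, least recently used first."""
        return list(self._last_used)
```

A real manager would unload the backend process for each name returned by `touch` or `reap_idle`. The sketch only tracks the bookkeeping.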
Go gateway with circuit breakers, but auth isn't production-ready yet.
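For readers unfamiliar with the pattern, the following is a rough Python sketch of failover guarded by circuit breakers. The gateway itself is described as written in Go, and nothing here reflects its actual code: `CircuitBreaker`, `route`, the `send` callback, and the threshold and cooldown defaults are all assumptions made for illustration.

```python
class CircuitBreaker:
    """Hypothetical breaker: opens after repeated failures, retries after a cooldown."""

    def __init__(self, failure_threshold=3, cooldown_s=30.0):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed

    def allow(self, now):
        """Return True if a request may be sent to this backend."""
        if self.opened_at is None:
            return True
        # Once the cooldown has passed, let a probe request through (half-open).
        return now - self.opened_at >= self.cooldown_s

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self, now):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            # Open (or re-open after a failed probe) and restart the cooldown.
            self.opened_at = now


def route(backends, breakers, send, now):
    """Try backends in priority order, skipping any whose breaker is open.

    `send` is a caller-supplied function that takes a backend name and
    either returns a response or raises on failure.
    """
    for name in backends:
        breaker = breakers[name]
        if not breaker.allow(now):
            continue
        try:
            result = send(name)
        except Exception:
            breaker.record_failure(now)
            continue
        breaker.record_success()
        return name, result
    raise RuntimeError("no healthy backend available")
```

With a failing primary and a healthy fallback, requests fall through to the fallback. Once the primary's breaker opens, it is skipped entirely until the cooldown allows a single probe.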
LiteLLM and OpenRouter already solve multi-provider routing better and have production users.
Zero-trust networking via zrok beats LiteLLM when your GPUs sit behind NAT.
Plug any OpenAI-compatible provider into a single UI, switch models mid-session, and run side-by-side comparisons while tracking usage — everything you'd expect from a multi-model chat client. The design is eye-catching and the web/desktop split suggests a real app, but this is a crowded niche; the product will live or die on stability of provider integrations, context/memory handling, and clear privacy controls.
Granular API key controls and token cost tracking beat basic llama.cpp wrappers.