Nexus Gateway – Reduce LLM API Costs Using Semantic Caching

by Sunnyanand_dev · Mar 5, 2026 · 2 points · 1 comment

AI Analysis

● Mid · Ship It · Solve My Problem

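Caching for LLM APIs already exists (Anthropic prompt caching, LangChain, Miniplex, vLLM), although Anthropic's prompt caching and vLLM's prefix caching reuse exact prompt prefixes rather than matching semantically similar prompts; gateway routing is table stakes.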

Strengths

• Multi-provider BYOK (Bring Your Own Key) reduces vendor lock-in and leaves customers in control of their own provider keys
• Vector-based semantic cache with configurable thresholds is a sound technical approach; the post targets a 40–70% cost reduction
• Sub-millisecond routing overhead and SOC 2 Type II certification signal production-readiness

Weaknesses

• Semantic caching itself is not novel; Anthropic's native prompt caching, vLLM, and smaller competitors (Miniplex, Helicone) already ship related caching features
• 'Universal router' for 200+ models sounds marketing-first; the actual routing logic and failover strategy are undefined in the public docs

Post Description

Hi HN,

I'm building Nexus Gateway, an AI gateway that helps developers reduce LLM API costs.

Problem: Many applications send repeated or semantically similar prompts to LLMs, which leads to unnecessary API calls and higher costs.

Solution: Nexus Gateway uses semantic caching to detect similar prompts and serve cached responses instead of calling the LLM again.
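
For readers unfamiliar with the technique, here is a minimal sketch of how a semantic cache can work in general. It is not Nexus Gateway's implementation. The `embed` function is a toy bag-of-words stand-in for a real embedding model, and `SemanticCache`, `answer`, and the 0.8 threshold are hypothetical names and values chosen for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase bag-of-words counts.

    Assumption: a real gateway would call a learned embedding model here.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Stores (embedding, response) pairs and returns the closest match
    only if its similarity reaches the configured threshold."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.entries = []

    def lookup(self, prompt):
        query = embed(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self.entries:  # linear scan; fine for a sketch
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def store(self, prompt, response):
        self.entries.append((embed(prompt), response))


def answer(prompt, cache, call_llm):
    """Serve from the cache when a similar prompt was seen; otherwise call
    the model (any callable taking a prompt) and remember the result."""
    cached = cache.lookup(prompt)
    if cached is not None:
        return cached
    response = call_llm(prompt)
    cache.store(prompt, response)
    return response
```

The threshold is the main tuning knob: a lower value raises the hit rate and the savings, but also the risk of returning an answer to a prompt that only looks similar. A production system would also replace the linear scan with a vector index.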

Features:
• Semantic caching to reduce repeated API calls
• Multi-model support (OpenAI, Gemini, Llama, Anthropic)
• BYOK support
• PII protection and sovereign AI layer (in progress)

Goal: Reduce LLM costs by 40–70% while improving latency.
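
To put the 40–70% goal in context, the rough arithmetic below relates savings to cache hit rate. It is a back-of-the-envelope sketch, not a figure from the project. It assumes every request costs about the same, and the function names, the per-request price, and the per-lookup cache cost are hypothetical.

```python
def monthly_cost(requests, cost_per_request, hit_rate, cache_cost_per_lookup=0.0):
    """Total spend when a fraction `hit_rate` of requests is served from cache.

    Every request pays the (hypothetical) cache lookup cost; only misses
    pay the provider's per-request cost.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    misses = requests * (1.0 - hit_rate)
    return misses * cost_per_request + requests * cache_cost_per_lookup


def savings_fraction(cost_per_request, hit_rate, cache_cost_per_lookup=0.0):
    """Fraction of the no-cache bill that is saved."""
    baseline = monthly_cost(1, cost_per_request, 0.0)
    cached = monthly_cost(1, cost_per_request, hit_rate, cache_cost_per_lookup)
    return 1.0 - cached / baseline


# Hypothetical workload: 1,000,000 requests at $0.002 each, 50% hit rate.
baseline = monthly_cost(1_000_000, 0.002, hit_rate=0.0)
with_cache = monthly_cost(1_000_000, 0.002, hit_rate=0.5)
print(f"baseline ${baseline:,.2f}, with cache ${with_cache:,.2f}")
```

Under these assumptions, with free cache lookups the saving equals the hit rate, so a 40–70% cost reduction requires 40–70% of traffic to be cache hits. Any embedding or lookup cost pushes the required hit rate higher.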

I’d really appreciate feedback from the community.

Website: https://www.nexus-gateway.org