GitHub Repository

🪂 Latency-aware model cascade for agentic LLM workflows. Auto-switches to a faster model when yours is slow.

4 stars · Python

glide – LLM cascade proxy, auto-switches models before timeout

Author: phanisaimuni116

by phanisaimuni116 · Mar 7, 2026 · 1 point · 1 comment


AI Analysis

●● Solid · Solve My Problem · Niche Gem

TTFT-aware (time-to-first-token) model fallback: avoids timeouts by automatically hedging between Opus, Sonnet, and Haiku.
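A minimal sketch of the hedging idea, assuming each model call is an `asyncio` coroutine. `fake_model` and `hedge` are hypothetical names, not glide's API, and the sketch races complete (simulated) responses where a real proxy would race on the first streamed token.

```python
import asyncio


async def fake_model(name: str, ttft_s: float) -> str:
    """Hypothetical stand-in for a model call: 'answers' after ttft_s seconds."""
    await asyncio.sleep(ttft_s)
    return f"{name}: hello"


async def hedge(*calls):
    """Race several model calls, return the first answer, and cancel the rest."""
    tasks = [asyncio.ensure_future(c) for c in calls]
    try:
        done, _ = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
        # If the winner raised, .result() re-raises; a real proxy would fall back instead.
        return next(iter(done)).result()
    finally:
        for t in tasks:
            if not t.done():
                t.cancel()  # stop the losers so they don't keep consuming quota
        # Let cancellations settle; return_exceptions collects CancelledError instead of raising.
        await asyncio.gather(*tasks, return_exceptions=True)


if __name__ == "__main__":
    winner = asyncio.run(hedge(fake_model("opus", 0.5), fake_model("haiku", 0.05)))
    print(winner)  # haiku: hello
```

Cancelling in a `finally` block means the losing calls are also torn down if the caller itself gives up on the hedged request.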

Strengths

•Proactive p95 routing skips models whose recent p95 TTFT is too high, without waiting for them to time out, which reduces perceived latency.
•Request hedging sends the request to several models in parallel, streams whichever answers first, and cancels the rest.
•Zero-config setup: a single shell command starts the proxy, and clients integrate by pointing a base-URL environment variable at it.
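A minimal sketch of percentile-based routing, assuming the proxy records a TTFT sample per request and walks models in preference order. The class name `P95Router`, the 60-second sample expiry, the 5-sample minimum, the nearest-rank percentile, and the plain model strings are all hypothetical choices for illustration, not glide's actual code.

```python
import time
from collections import deque


class P95Router:
    """Pick the first model (in preference order) whose recent p95 TTFT fits a budget."""

    def __init__(self, models, budget_s, max_age_s=60.0, min_samples=5):
        self.models = list(models)  # preference order, e.g. strongest first
        self.budget_s = budget_s
        self.max_age_s = max_age_s
        self.min_samples = min_samples
        self.samples = {m: deque() for m in self.models}  # (timestamp, ttft_s) pairs

    def record(self, model, ttft_s, now=None):
        now = time.monotonic() if now is None else now
        self.samples[model].append((now, ttft_s))

    def p95(self, model, now=None):
        now = time.monotonic() if now is None else now
        window = self.samples[model]
        while window and now - window[0][0] > self.max_age_s:
            window.popleft()  # forget stale samples so a skipped model can recover
        if len(window) < self.min_samples:
            return None  # not enough data: treat the model as usable
        ttfts = sorted(t for _, t in window)
        k = (95 * len(ttfts) + 99) // 100 - 1  # nearest-rank 95th percentile index
        return ttfts[k]

    def choose(self, now=None):
        for m in self.models:
            p = self.p95(m, now)
            if p is None or p <= self.budget_s:
                return m
        return self.models[-1]  # everything is slow: use the last (fastest) model


if __name__ == "__main__":
    router = P95Router(["opus", "sonnet", "haiku"], budget_s=2.0)
    for t in [1.0, 1.2, 3.5, 4.0, 3.8]:
        router.record("opus", t, now=0.0)
    print(router.choose(now=1.0))   # sonnet (opus p95 is 4.0 s, over the 2.0 s budget)
    print(router.choose(now=61.0))  # opus (its slow samples have expired)
```

A skipped model receives no new samples, so the sketch expires old data and optimistically retries the model; that retry is one place the thrashing noted under Weaknesses can come from.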

Weaknesses

•Limited to Anthropic models plus local Ollama; no support for OpenAI, Gemini, or Grok.
•Routing depends on historical TTFT percentiles, so it may thrash when load is bursty or unpredictable.

Similar Projects

AI/ML · ● Mid

Gptbased – LLM leaderboard that emails you when to switch

Pareto frontier optimization finds cheaper, stronger models when they ship.

Solve My Problem

gptbased

104d ago

Security · ●● Solid

Aegis.rs, the first open source Rust-based LLM security proxy

Zero-code LLM firewall: heuristic checks run in under 1 ms, with an optional semantic layer backed by Groq.

Solve My Problem · Slick

ParzivalHack

224mo ago

AI/ML · ●● Solid

Openfusion - enhanced results from a panel of models

Self-hosted OpenRouter Fusion alternative with tunable judge strategies.

Ship It · Big Brain

shadag

204d ago

Developer Tools · ●● Solid

Preventing runaway LLM agents (enforcement layer)

VERONICA puts an enforcement shim between your agent and the model so you can halt costly spirals before a request hits the provider. It natively exposes hard budget enforcement, circuit breakers, retry containment, and degradation levels. The README and runnable runaway-loop demo make the failure mode concrete, and the API (BudgetEnforcer, RuntimeContext, BudgetExceeded) is small and practical. I'd like to see richer observability and adapter docs for common agent frameworks, but as an enforcement-first primitive this is a clever, useful tool.

Niche Gem · Big Brain

amabito

124mo ago

Developer Tools · ●●● Banger

Isartor – Pure-Rust prompt firewall, deflects 60-95% of LLM traffic

Local semantic caching cuts LLM costs without changing your code.

Solve My Problem · Slick

zippode

312mo ago

Security · ●● Solid

Pseudonymizing sensitive data for LLMs without losing context

Proxy pseudonymizes sensitive data while keeping enough context that LLMs don't hallucinate the way they tend to with regex-based redaction.

Big Brain · Solve My Problem

n00pn00p

482mo ago