GitHub Repository

Multi-model AI orchestration layer with routing, fallback, and caching

3 stars · Python

AgentForge – Multi-LLM Orchestrator in 15KB

by chunktort·Feb 18, 2026·1 point·0 comments

AI Analysis

●● Solid · Dark Horse · Wizardry
The Take

AgentForge packs provider adapters (Claude, GPT‑4, Gemini, Perplexity), token-aware rate limiting, retry/backoff, and a MockLLMClient for tests into a tiny dependency surface; the 15KB footprint and 2 dependencies are the attention-grabbers. The 3‑tier Redis cache and the benchmark claims (large latency and memory wins vs LangChain, 88% cache hit rate) make it a tempting low-overhead alternative, though you should validate provider feature parity and the benchmarks against your own workload.
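For readers unfamiliar with the retry/backoff pattern mentioned above, here is a minimal sketch of retry with exponential backoff and jitter. It is not taken from the AgentForge source: `TransientError`, `retry_with_backoff`, and all parameter names are hypothetical, chosen only for illustration.

```python
import asyncio
import random


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, HTTP 429/5xx)."""


async def retry_with_backoff(call, max_attempts=4, base_delay=0.5,
                             max_delay=8.0, sleep=asyncio.sleep):
    """Await call() until it succeeds or max_attempts is used up."""
    for attempt in range(max_attempts):
        try:
            return await call()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay ceiling doubles each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: sleep a random amount up to the ceiling.
            await sleep(random.uniform(0, ceiling))
```

The `sleep` parameter is injectable so tests can record delays instead of actually waiting.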

Category
Target Audience

Backend/ML engineers and infra devs integrating multiple LLM providers or building production agent systems

Post Description

I built AgentForge, a minimal multi-LLM orchestrator. Total size: ~15KB of Python code.

Why? LangChain added 250ms overhead per request. I needed something simpler.

Performance vs LangChain (1,000 requests):
- Avg latency: 420ms -> 65ms
- Memory/request: 12MB -> 3MB
- Cold start: 2.5s -> 0.3s
- Test time: 45s -> 3s
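To put the before/after figures above on a common scale, the short script below converts each pair into a speedup factor and a percent reduction. The numbers are copied from the post; the `improvement` helper is just an illustrative name.

```python
# (before, after) pairs copied from the post's LangChain comparison.
benchmarks = {
    "avg latency (ms)": (420, 65),
    "memory/request (MB)": (12, 3),
    "cold start (s)": (2.5, 0.3),
    "test time (s)": (45, 3),
}


def improvement(before, after):
    """Return (improvement factor, percent reduction) for one metric."""
    return before / after, 100 * (before - after) / before


for name, (before, after) in benchmarks.items():
    factor, pct = improvement(before, after)
    print(f"{name}: {factor:.1f}x, {pct:.0f}% lower")
```

By this arithmetic the claimed improvements are roughly 6.5x for latency, 4x for memory, 8.3x for cold start, and 15x for test time.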

Size: 15KB + 2 dependencies (httpx, pytest) vs LangChain's 15MB+ and 47 packages.

In production: 89% LLM cost reduction via 3-tier Redis caching (88% cache hit rate, verified benchmarks). 4.3M tool dispatches/sec in the core engine.
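The post does not say what the three cache tiers are, so the following is only a sketch of the general idea, assuming an ordered list of tiers checked fastest first, with hits promoted into the faster tiers. Plain dicts stand in for Redis here so the example runs anywhere; `TieredCache`, `cache_key`, and `get_or_compute` are hypothetical names, not AgentForge's API.

```python
import hashlib
import json


def cache_key(model, prompt):
    """Build a stable key for a (model, prompt) pair."""
    payload = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class TieredCache:
    def __init__(self, tiers):
        self.tiers = tiers  # dict-like stores, ordered fastest to slowest
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, model, prompt, compute):
        key = cache_key(model, prompt)
        for i, tier in enumerate(self.tiers):
            if key in tier:
                value = tier[key]
                # Promote the entry into every faster tier.
                for faster in self.tiers[:i]:
                    faster[key] = value
                self.hits += 1
                return value
        # Miss in all tiers: pay for the LLM call once, then fill every tier.
        self.misses += 1
        value = compute(model, prompt)
        for tier in self.tiers:
            tier[key] = value
        return value

    @property
    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

If cached responses cost nothing, the share of LLM spend avoided equals `hit_rate`, which is why hit rate and cost reduction tend to track each other closely.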

What it does:
1. Multi-agent orchestration -- route tasks to specialized agents with automatic fallbacks
2. Testing built-in -- MockLLMClient lets you assert agent behavior without API keys
3. Production patterns -- circuit breakers, rate limiting, caching included
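As a rough illustration of the first two points, the sketch below shows ordered fallback across providers, exercised with a test double that needs no API keys. The post names MockLLMClient but does not show its interface, so `FakeLLMClient`, `ProviderError`, `complete`, and `route_with_fallback` are all hypothetical stand-ins rather than AgentForge's actual API.

```python
class ProviderError(Exception):
    """Hypothetical error raised when a provider call fails."""


class FakeLLMClient:
    """Test double: returns a canned reply or raises, with no network access."""

    def __init__(self, name, reply=None, fail=False):
        self.name = name
        self.reply = reply
        self.fail = fail
        self.prompts = []  # every prompt seen, for assertions in tests

    def complete(self, prompt):
        self.prompts.append(prompt)
        if self.fail:
            raise ProviderError(f"{self.name} unavailable")
        return self.reply


def route_with_fallback(prompt, clients):
    """Try clients in order; return (client name, reply) from the first success."""
    errors = []
    for client in clients:
        try:
            return client.name, client.complete(prompt)
        except ProviderError as exc:
            errors.append(str(exc))  # remember why this provider was skipped
    raise ProviderError("all providers failed: " + "; ".join(errors))


primary = FakeLLMClient("primary", fail=True)
backup = FakeLLMClient("backup", reply="pong")
print(route_with_fallback("ping", [primary, backup]))
```

Because the fake client records its prompts, a test can also assert that the primary was actually tried before the backup answered.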

Install: pip install agentforge

Live demo: https://ai-orchest-7mnwp9untg7gyyvchzevid.streamlit.app/

Packaged version with docs and deployment guide: https://chunkmaster1.gumroad.com

When to use it: production reliability matters, latency is a concern, you want full test coverage. When not to: prototyping, internal tools, team already on LangChain ecosystem.

Questions welcome.

Similar Projects

AI/ML · ●● Solid

AgentForge – Multi-LLM Orchestrator in 15KB of Python

AgentForge compresses common production patterns (token-aware rate limiting via a token bucket, retry with exponential backoff, prompt templates, and cost tracking) into a tiny async core and lets you flip providers with one parameter. The multi-agent mesh and the ReAct loop are the most interesting engineering bets here, and the repo includes benchmarks and a Streamlit demo. It lives in a crowded space next to LangChain and similar toolkits, though, so real differentiation will come from adoption and edge-case robustness.
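A token-aware token bucket can be sketched as follows, assuming each request spends its estimated LLM token count rather than a flat cost of one. This is an illustration of the general technique only; `TokenBucket`, `try_acquire`, and the parameters are hypothetical and not drawn from the AgentForge code.

```python
import time


class TokenBucket:
    """Token bucket where each request spends its estimated LLM token count."""

    def __init__(self, capacity, refill_per_sec, clock=time.monotonic):
        self.capacity = capacity              # maximum burst, in LLM tokens
        self.refill_per_sec = refill_per_sec  # sustained budget per second
        self.clock = clock                    # injectable for deterministic tests
        self.available = float(capacity)
        self.updated = clock()

    def _refill(self):
        # Credit the budget earned since the last update, capped at capacity.
        now = self.clock()
        elapsed = now - self.updated
        self.available = min(self.capacity,
                             self.available + elapsed * self.refill_per_sec)
        self.updated = now

    def try_acquire(self, tokens):
        """Spend `tokens` if the budget allows; otherwise return False without blocking."""
        self._refill()
        if tokens <= self.available:
            self.available -= tokens
            return True
        return False
```

A caller that gets `False` can queue the request, wait, or route it to another provider, which is where rate limiting and fallback routing meet.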

Niche Gem · Ship It
chunktort
213mo ago