Dragoman – Multi-model routing for Claude Code via sub-agents
Smart key management via 1Password keeps secrets out of Claude's context window.

Routes subagents at the gateway level instead of forcing the main agent to waste tokens on routing decisions.
Developers using Claude Code for coding tasks
LiteLLM · OpenRouter · Portkey
I’m one of the builders of Rayline.
Rayline is a Claude Code-compatible LLM gateway. It intercepts and overrides Claude Code’s internal routing so you can send subagent calls to different models. For example, you can run the main agent on Opus, some subagents on cloud-hosted open models, and other subagents on-device.
We’ve seen others implement routing for Claude Code as tools the agent can invoke. In our experience, that doesn’t work well: the main agent has to spend tokens thinking about and calling the tools, and LLMs are generally a very inefficient way to make routing decisions. Because Rayline is a gateway, users can configure routing decisions deterministically, and can optionally use our ML model to make them instead.
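To make the gateway idea concrete, here is a minimal sketch of deterministic rule-based routing. It is an illustration only, not Rayline's actual config format or code: the `Rule` shape, the `metadata["is_subagent"]` and `metadata["agent_name"]` fields, and the model names are all hypothetical. The point it shows is that the model is chosen by a plain first-match lookup, with no tokens spent by the main agent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """One deterministic routing rule (hypothetical shape)."""
    agent_name: Optional[str]  # None matches any subagent
    target_model: str          # placeholder model identifier


def route(request: dict, rules: list[Rule]) -> dict:
    """Return a copy of `request` with `model` rewritten by the first matching rule.

    Assumes the gateway has already tagged the request with
    metadata["is_subagent"] and metadata["agent_name"] (both hypothetical).
    """
    meta = request.get("metadata", {})
    if not meta.get("is_subagent", False):
        return dict(request)  # main agent: leave the requested model alone
    for rule in rules:
        if rule.agent_name is None or rule.agent_name == meta.get("agent_name"):
            return {**request, "model": rule.target_model}
    return dict(request)  # no rule matched: pass through unchanged


# Example rule table: the first match wins, so the catch-all goes last.
RULES = [
    Rule(agent_name="repo-search", target_model="local-small-model"),
    Rule(agent_name=None, target_model="cloud-open-model"),
]

main_call = {"model": "opus", "metadata": {"is_subagent": False}}
search_call = {"model": "opus",
               "metadata": {"is_subagent": True, "agent_name": "repo-search"}}
other_call = {"model": "opus",
              "metadata": {"is_subagent": True, "agent_name": "ci-poller"}}

for call in (main_call, search_call, other_call):
    print(route(call, RULES)["model"])
```

In this sketch the main agent's request keeps its model, the `repo-search` subagent goes to the on-device placeholder, and every other subagent falls through to the cloud placeholder. How a real gateway identifies a subagent call is left open here.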
We built it after noticing that Claude Code sessions contain a lot of subagent calls that don’t all need the same model. The main agent often benefits from Opus, but many delegated calls have narrow scope: search the repo, summarize context, inspect an error, poll for CI updates, etc. Other routers exist, but we built Rayline so we could keep using Claude Code (no separate harness), route tasks at the subagent level, and route across cloud and on-device models.
The thing we’re exploring is subagent-level routing. The main cost lever in coding agents is usually cached vs. non-cached input, and subagent delegations are a natural point to make routing decisions because you avoid busting the cache. We look at the message-thread context for a delegated call and choose a model for that call. At the task level, Sonnet and Haiku almost always offer less capability per dollar than open models, so the main advantage is better and (much) cheaper subagents (60-90% cheaper in our private beta).
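A rough worked example of the cache argument, as a sketch with made-up numbers: every price and token count below is hypothetical and chosen only to show the shape of the tradeoff, not taken from any provider's price list or from our beta. It assumes prompt caches are per model, so switching models in the middle of a long thread means the whole prefix is billed as uncached input, while a delegated call starts from a small fresh context.

```python
def input_cost(total_tokens: int, cached_tokens: int,
               price_per_mtok: float, cached_price_per_mtok: float) -> float:
    """Input cost of one call, given how many prompt tokens hit the cache.

    Prices are per million tokens, in arbitrary currency units.
    """
    if not 0 <= cached_tokens <= total_tokens:
        raise ValueError("cached_tokens must be between 0 and total_tokens")
    uncached_tokens = total_tokens - cached_tokens
    return (uncached_tokens * price_per_mtok
            + cached_tokens * cached_price_per_mtok) / 1_000_000


# Hypothetical prices per million input tokens.
BIG_PRICE, BIG_CACHED_PRICE = 10.0, 1.0   # large model
SMALL_PRICE = 2.0                         # cheaper model, nothing cached yet

# Next turn of a 100k-token main thread, staying on the large model
# with 99k tokens already cached.
stay = input_cost(100_000, 99_000, BIG_PRICE, BIG_CACHED_PRICE)

# Same turn after switching the main thread to the cheaper model:
# the whole prefix is uncached on the new model.
switch_mid_thread = input_cost(100_000, 0, SMALL_PRICE, SMALL_PRICE)

# A delegated subagent call with a fresh 5k-token context on the cheaper model.
delegate = input_cost(5_000, 0, SMALL_PRICE, SMALL_PRICE)

print(stay, switch_mid_thread, delegate)
```

With these illustrative numbers, switching the main thread to the cheaper model costs more for that turn than staying put, while the delegated call is far cheaper than either. That is why the delegation boundary is the natural place to route.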
The whole world seems to have started talking about model routing in the past two weeks, so apparently others agree it’s a relevant product area.
We’d love to get feedback from the HN community!
Product Algebra routing plus an explicit 'dharma' pipeline (no-self regularization, entropy/mindfulness metrics, compassion and ethos scores) is a strikingly specific approach — it moves beyond cost/capability heuristics into cross-modal interaction scoring and reputation-driven incentives. There's real engineering here (1s perception loop, SQLite memory, Telegram UX, multi-provider SDK support), but the repo reads young and claim-heavy: I want reproducible benchmark artifacts, links from the code to the cited 439-model experiments, and clearer deployment/security guidance before trusting it for critical workloads.
First gateway with native MCP server—connect Claude Code or Cursor in one command.
Blog post masquerading as a product—no code, no demo, no install command.
Semantic caching for LLM APIs exists (Anthropic prompt caching, LangChain, Miniplex, vLLM); gateway routing is table stakes.
Multi-model consensus for code review, but it orchestrates the existing Claude Code team system; table stakes.