Rayline routes Claude Code subagents to on-device and cheaper models

Author: davidvgilmore

by davidvgilmore·Jun 8, 2026·11 points·9 comments


AI Analysis

●●● Banger · Big Brain · Solve My Problem

Routes subagents at the gateway level instead of forcing the main agent to waste tokens on routing decisions.

Strengths

• Gateway architecture avoids the token waste of tool-based routing approaches
• ML model (Arc-1) makes routing decisions deterministically, not via LLM reasoning
• Claims 60-90% cost reduction with specific metrics shown in dashboard

Weaknesses

• Only works with Claude Code, not other agent frameworks
• Enterprise router space is well-funded with established competitors

Post Description

Hi HN,

I’m one of the builders of Rayline.

Rayline is a Claude Code-compatible LLM gateway. It intercepts and overrides Claude Code’s internal routing and lets you route subagent calls to different models instead. For example, you can run the main agent on Opus, some subagents on cloud-hosted open models, and other subagents on-device.

We’ve seen others implement routing for Claude Code as tools the agent can invoke. In our experience, that doesn’t work well: the main agent has to spend tokens thinking about and calling the tools, and LLMs are generally a very inefficient way to make routing decisions. Because Rayline is a gateway, you can configure routing decisions deterministically, and you can optionally use our ML model to make them.
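To make the gateway idea concrete, here is a minimal Python sketch of rule-first routing with an optional learned fallback. It is not Rayline's code: the subagent labels, model names, upstream URLs, and the assumption that the gateway can tell which subagent issued a request are all hypothetical. The only thing taken from a real API is that a Messages-style request body carries `model` and `messages` fields.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Target:
    model: str     # model name sent upstream (hypothetical identifiers)
    upstream: str  # base URL of the provider or local server (hypothetical)


# Hypothetical routing table keyed by a subagent label.
ROUTES = {
    "main": Target("opus", "https://api.anthropic.com"),
    "repo-search": Target("open-model-large", "https://cloud.example.com"),
    "ci-poller": Target("open-model-small", "http://localhost:8080"),
}
DEFAULT = ROUTES["main"]

# A learned router takes the message thread and may return a Target or None.
LearnedRouter = Callable[[list], Optional[Target]]


def choose_target(agent: str, messages: list,
                  learned_router: Optional[LearnedRouter] = None) -> Target:
    """Explicit rules win; the optional learned router is only a fallback."""
    if agent in ROUTES:
        return ROUTES[agent]
    if learned_router is not None:
        picked = learned_router(messages)
        if picked is not None:
            return picked
    return DEFAULT


def rewrite_request(agent: str, body: dict,
                    learned_router: Optional[LearnedRouter] = None):
    """Return (upstream URL, request body with the model field replaced)."""
    target = choose_target(agent, body.get("messages", []), learned_router)
    return target.upstream, {**body, "model": target.model}


if __name__ == "__main__":
    upstream, body = rewrite_request("ci-poller", {"model": "opus", "messages": []})
    print(upstream, body["model"])
```

Because the decision is a dictionary lookup inside the gateway, the main agent spends no tokens on it, which is the contrast with tool-based routing drawn above.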

We built it after noticing that Claude Code sessions contain a lot of subagent calls that don’t all need the same model. The main agent often benefits from Opus, but many delegated calls have narrow scope: search the repo, summarize context, inspect an error, poll for CI updates, etc. Other routers exist, but we built Rayline so that we could keep using Claude Code (no separate harness), route tasks at the subagent level, and route across cloud and on-device.

The thing we’re exploring is subagent-level routing. The main cost lever in coding agents is usually cached vs. non-cached input. Subagent delegations are a natural point to make routing decisions because a delegated call runs in its own context, so sending it to a different model doesn’t bust the main agent’s cache. We look at the message-thread context for a delegated call and choose a model for that call. At a task level, Sonnet and Haiku almost always offer less capability per dollar than open models, so the main advantage is better and (much) cheaper subagents (60-90% cheaper in our private beta).
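The cache argument can be checked with rough arithmetic. The sketch below uses placeholder prices and token counts chosen only for illustration; they are not real provider rates or Rayline measurements. It compares two options: switching models in the middle of the main thread, which forces the long prefix to be re-read uncached, and routing a short, freshly started subagent call to a cheaper model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    uncached: float  # $ per million input tokens without a cache hit (placeholder)
    cached: float    # $ per million input tokens read from cache (placeholder)


def input_cost(tokens: int, rate_per_mtok: float) -> float:
    return tokens * rate_per_mtok / 1_000_000


def main_thread_turn_cost(prefix_tokens: int, new_tokens: int,
                          pricing: Pricing, cache_hit: bool) -> float:
    """Input cost of one main-thread turn: long prefix plus a few new tokens."""
    prefix_rate = pricing.cached if cache_hit else pricing.uncached
    return (input_cost(prefix_tokens, prefix_rate)
            + input_cost(new_tokens, pricing.uncached))


def subagent_call_cost(context_tokens: int, pricing: Pricing) -> float:
    """A delegated call starts fresh, so its input is uncached on any model."""
    return input_cost(context_tokens, pricing.uncached)


# Placeholder price points, not real rates.
PREMIUM = Pricing(uncached=10.0, cached=1.0)
CHEAP = Pricing(uncached=2.0, cached=0.2)

# Main thread: 100k-token prefix, 1k new tokens this turn.
stay = main_thread_turn_cost(100_000, 1_000, PREMIUM, cache_hit=True)
switch = main_thread_turn_cost(100_000, 1_000, CHEAP, cache_hit=False)

# Subagent: 5k-token fresh context.
sub_premium = subagent_call_cost(5_000, PREMIUM)
sub_cheap = subagent_call_cost(5_000, CHEAP)
saving = 1 - sub_cheap / sub_premium

if __name__ == "__main__":
    print(f"main thread, stay on premium (cached): ${stay:.3f}")
    print(f"main thread, switch to cheap (uncached): ${switch:.3f}")
    print(f"subagent input saving on cheap model: {saving:.0%}")
```

Under these assumed numbers, switching mid-thread costs more than staying on the premium model despite the lower per-token price, while the subagent call captures the full price difference. Output tokens and cache-write charges are left out to keep the comparison small.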

The whole world seems to have started talking about model routing in the past two weeks, so apparently others agree it’s a relevant product area.

We’d love to get feedback from the HN community!

Similar Projects

Developer Tools · ●● Solid

Dragoman – Multi-model routing for Claude Code via sub-agents

Smart key management via 1Password keeps secrets out of Claude's context window.

Solve My Problem · Cozy

asakin

10 · 1mo ago

AI/ML · ● Mid

Consciousness Gateway – AI routing with consciousness-first alignment

Product Algebra routing plus an explicit 'dharma' pipeline (no-self regularization, entropy/mindfulness metrics, compassion and ethos scores) is a strikingly specific approach — it moves beyond cost/capability heuristics into cross-modal interaction scoring and reputation-driven incentives. There's real engineering here (1s perception loop, SQLite memory, Telegram UX, multi-provider SDK support), but the repo reads young and claim-heavy: I want reproducible benchmark artifacts, links from the code to the cited 439-model experiments, and clearer deployment/security guidance before trusting it for critical workloads.

Bold Bet · Big Brain

AIconscious

20 · 4mo ago

Infrastructure · ●● Solid