GitHub Repository

Multiplexer for MCP tool calls: parallel execution, batching, caching, and pipelining for any MCP server

8 stars · TypeScript

Callmux – MCP multiplexer that cuts tool call context pollution by ~19x

Author: edimuj

by edimuj · Apr 22, 2026 · 2 points · 0 comments


AI Analysis

●●● Banger · Big Brain · Wizardry

19x context pollution reduction via batching — solves a problem nobody's talking about yet.

Strengths

• Specific math: 7 sequential calls become 1 batched call, eliminating intermediate reasoning tokens.
• Works as a transparent proxy: no code changes to existing MCP servers.
• Adds parallel execution, pipelining, and caching that the original MCP servers lack.

Weaknesses

• Only 1 GitHub star at submission, so it is unproven in production workflows.
• MCP itself is still emerging, limiting immediate audience size.

Post Description

Every tool call an AI agent makes adds tokens to the conversation context. Not just the payload data, but the JSON wrappers, the role markers, and worst of all, the model's intermediate reasoning between calls ("Now I'll fetch the next one..."). These compound: each subsequent call re-processes everything before it, so total input tokens grow quadratically with sequential calls.
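To make the quadratic growth concrete, here is a minimal TypeScript sketch. It assumes every turn re-reads the full context accumulated so far and that each call adds a fixed number of tokens; the function name, the 200-token per-call figure, and the 1,000-token starting context are hypothetical values chosen for illustration, not measurements from the post.

```typescript
// Assumed figure for illustration only: tokens one call leaves in context
// (payload + JSON wrapper + role markers + intermediate reasoning).
const TOKENS_PER_CALL = 200;

/**
 * Total input tokens processed across `calls` sequential tool calls,
 * assuming each turn re-reads everything already in the context.
 */
function totalInputTokens(
  calls: number,
  baseContext: number,
  perCall: number = TOKENS_PER_CALL
): number {
  let context = baseContext;
  let total = 0;
  for (let i = 0; i < calls; i++) {
    total += context;   // the model re-processes the whole context on this turn
    context += perCall; // this call's output and reasoning stay in the context
  }
  return total;
}

// Doubling the number of calls more than doubles the tokens processed.
console.log(totalInputTokens(7, 1000));  // 11200
console.log(totalInputTokens(14, 1000)); // 32200
```

Under these assumptions the total is `calls * baseContext + perCall * calls * (calls - 1) / 2`, which is where the quadratic term comes from.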

I built callmux to fix this. It's an MCP proxy that sits between your agent (Claude, Codex, etc.) and any MCP server, adding parallel execution, batching, pipelining, and caching as meta-tools. Instead of 7 sequential get_issue calls, the agent makes 1 callmux_parallel call. The actual data transferred is identical. What you eliminate is the per-call overhead.
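The general idea of folding several tool calls into one can be sketched as a fan-out over a list of calls that returns a single combined result. This is not callmux's implementation or its `callmux_parallel` schema; the `ToolCall`, `ToolResult`, `CallTool`, `runParallel`, and `fakeGetIssue` names are all hypothetical, and the upstream server is replaced by a stub.

```typescript
// Hypothetical shapes for illustration; the real meta-tool schema may differ.
type ToolCall = { tool: string; args: Record<string, unknown> };
type ToolResult = { tool: string; ok: boolean; value?: unknown; error?: string };
type CallTool = (call: ToolCall) => Promise<unknown>;

/**
 * Runs all calls concurrently and returns one array of results, so the
 * agent sees a single response instead of one turn per call.
 * A failing call is reported in place rather than aborting the batch.
 */
async function runParallel(calls: ToolCall[], callTool: CallTool): Promise<ToolResult[]> {
  return Promise.all(
    calls.map(async (call): Promise<ToolResult> => {
      try {
        return { tool: call.tool, ok: true, value: await callTool(call) };
      } catch (err) {
        return { tool: call.tool, ok: false, error: String(err) };
      }
    })
  );
}

// Stub standing in for an upstream MCP server's get_issue tool.
const fakeGetIssue: CallTool = async (call) => {
  const n = call.args["number"];
  if (typeof n !== "number" || n < 1) {
    throw new Error(`bad issue number: ${String(n)}`);
  }
  return { number: n, title: `Issue #${n}` };
};

const calls: ToolCall[] = [1, 2, 3].map((n) => ({
  tool: "get_issue",
  args: { number: n },
}));

runParallel(calls, fakeGetIssue).then((results) => {
  console.log(JSON.stringify(results, null, 2));
});
```

Results come back in the same order as the input calls, which keeps it easy for the agent to match each result to the request that produced it.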

The math surprised me. For a batch of 7 operations:

Without callmux: ~525 tokens of structural overhead + ~900 tokens of intermediate reasoning = ~1,425 tokens of pollution

With callmux: ~75 tokens total

That's ~19:1 less context pollution from a 7:1 reduction in tool calls
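The arithmetic behind the ~19:1 figure can be checked directly. The token counts below are the approximate figures quoted in the post; the variable names are hypothetical, and the per-call breakdown is derived here by simple division rather than taken from the post.

```typescript
const CALLS = 7;
const STRUCTURAL_OVERHEAD = 525;    // ~tokens of JSON wrappers and role markers (from the post)
const INTERMEDIATE_REASONING = 900; // ~tokens of reasoning between calls (from the post)
const WITH_CALLMUX = 75;            // ~tokens total for one batched call (from the post)

const withoutCallmux = STRUCTURAL_OVERHEAD + INTERMEDIATE_REASONING;
const pollutionRatio = withoutCallmux / WITH_CALLMUX;
const structuralPerCall = STRUCTURAL_OVERHEAD / CALLS;

console.log(withoutCallmux);    // 1425
console.log(pollutionRatio);    // 19
console.log(structuralPerCall); // 75
```

The derived per-call structural overhead (75 tokens) equals the batched total, which is consistent with the batched call paying the wrapper cost once and dropping the intermediate reasoning entirely.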

Prompt caching helps with the cost side of re-reading previous turns, but it doesn't shrink your context window. Every intermediate reasoning turn still sits there taking up space, and compaction still triggers at the same threshold.

In practice, callmux reduces tool calls to about 15% of the original count. But the context savings are larger than that ratio suggests, because you're also eliminating the intermediate reasoning between those calls, which is the biggest source of pollution.

The result: sessions last longer before hitting context limits, and the context window has less noise competing with your actual conversation.

I wrote up the full context math with diagrams here: https://longgamedev.substack.com/p/your-ai-agent-is-re-readi...

Setup is one line:

npx -y callmux -- npx -y @modelcontextprotocol/server-github

Works with Claude Code, Codex, Claude Desktop. Also supports multi-server mode, remote HTTP/SSE servers, and tool filtering.

npm: https://www.npmjs.com/package/callmux

Similar Projects

AI/ML ●●● Banger

Alumnium – SOTA Browsing for Claude Code

SOTA 98.5% WebVoyager score using compressed browser tools that keep context windows clean.

Big Brain · Wizardry

p0deje

60 · 2mo ago

Developer Tools ●● Solid

Unix-style pipeline composition for MCP tool calls

The project implements a sandboxed, server-side 'shell' that pipes MCP tool calls together so agents return only final outputs — a smart way to save tokens and handle datasets too large for LLM context. The repo includes a demo video, tests, and a real shell_engine/mcp_client implementation, but it's a focused infra play for the MCP ecosystem and will matter most to teams building agent platforms rather than general devs.

Niche Gem · Big Brain · Ship It

kantord

30 · 4mo ago

Developer Tools ●●● Banger