Reduce LLM token use by ~30% with this MCP/CLI tool (Claude benchmarked)
Token-efficient code indexing with adaptive callers tracing cuts Claude costs by 34%.
The MCP developer toolkit. Scaffold, lint, test, benchmark, and publish MCP servers.
First linter + benchmark for MCP servers; catches vague schemas before LLMs pick wrong tools.
MCP server developers, AI agent builders
ESLint (linting philosophy) · Anthropic MCP specification validators
AgentDX is a CLI that measures how reliably an LLM can pick and call the tools an MCP server exposes. Two commands:
- `npx agentdx lint` — static analysis of tool descriptions, schemas, and naming. 18 rules, zero config, no API key. Produces a lint score.
- `npx agentdx bench` — sends your tool definitions to an LLM (Anthropic, OpenAI, or Ollama) and evaluates tool selection accuracy, parameter correctness, ambiguity handling, multi-tool orchestration, and error recovery. Produces an Agent DX Score (0-100).
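To make the idea of statically linting tool definitions concrete, here is a minimal sketch of what such checks could look like. The three rules, their names, and the length threshold are hypothetical illustrations and are not taken from AgentDX's actual 18 rules; the `ToolDefinition` shape simply mirrors the name, description, and input schema that an MCP tool carries.

```typescript
// Hypothetical lint rules for illustration only; not AgentDX's real rule set.
interface ToolDefinition {
  name: string;
  description?: string;
  inputSchema?: {
    type?: string;
    properties?: Record<string, { type?: string; description?: string }>;
  };
}

interface LintFinding {
  tool: string;
  rule: string;
  message: string;
}

// Assumed threshold: shorter descriptions are treated as too vague to guide an LLM.
const MIN_DESCRIPTION_LENGTH = 20;
const SNAKE_CASE = /^[a-z][a-z0-9]*(_[a-z0-9]+)*$/;

function lintTools(tools: ToolDefinition[]): LintFinding[] {
  const findings: LintFinding[] = [];
  for (const tool of tools) {
    // Rule 1: the description must be long enough to say what the tool does.
    const description = (tool.description ?? "").trim();
    if (description.length < MIN_DESCRIPTION_LENGTH) {
      findings.push({
        tool: tool.name,
        rule: "description-too-short",
        message: `description has ${description.length} characters`,
      });
    }
    // Rule 2: tool names should follow one consistent convention (snake_case here).
    if (!SNAKE_CASE.test(tool.name)) {
      findings.push({
        tool: tool.name,
        rule: "name-not-snake-case",
        message: `"${tool.name}" is not snake_case`,
      });
    }
    // Rule 3: every parameter in the input schema should be described.
    const properties = tool.inputSchema?.properties ?? {};
    for (const param of Object.keys(properties)) {
      if (!properties[param].description) {
        findings.push({
          tool: tool.name,
          rule: "param-missing-description",
          message: `parameter "${param}" has no description`,
        });
      }
    }
  }
  return findings;
}

// Example: the first tool is vague, the second is well described.
const exampleFindings = lintTools([
  {
    name: "search",
    description: "Search.",
    inputSchema: { type: "object", properties: { q: { type: "string" } } },
  },
  {
    name: "get_weather_forecast",
    description: "Return the hourly forecast for a city over the next 24 hours.",
    inputSchema: {
      type: "object",
      properties: { city: { type: "string", description: "City name, e.g. Berlin" } },
    },
  },
]);
console.log(exampleFindings);
```

In this sketch only the `search` tool produces findings (a short description and an undescribed parameter). Checks like these need no API key because they inspect the definitions alone, which matches the zero-config framing of the lint command.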
AgentDX auto-detects the server entry point, spawns the server, connects to it as an MCP client, and reads the tool definitions over the protocol. The bench command generates its test scenarios automatically from those tool definitions.
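For readers unfamiliar with what "reads tools via the protocol" involves, the sketch below shows the JSON-RPC 2.0 messages an MCP client sends to list a server's tools, and how a `tools/list` response can be parsed. It is an illustration under assumptions, not AgentDX's source: the client name and version are placeholders, the protocol version string is one published MCP revision, and a real client would more likely use an MCP SDK and would spawn the server process rather than work with hand-built strings.

```typescript
// Sketch of the messages an MCP client exchanges to read tool definitions.
// Names such as "example-client" are placeholders, not AgentDX identifiers.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: unknown;
}

interface JsonRpcNotification {
  jsonrpc: "2.0";
  method: string;
  params?: unknown;
}

interface ListedTool {
  name: string;
  description?: string;
  inputSchema: unknown;
}

// Returns the three messages in the order a client sends them. Over the stdio
// transport each message is written as one line of JSON. A real client waits
// for the server's reply to "initialize" before sending the other two.
function buildHandshake(): string[] {
  const initialize: JsonRpcRequest = {
    jsonrpc: "2.0",
    id: 1,
    method: "initialize",
    params: {
      protocolVersion: "2024-11-05", // assumed revision; use whichever the server supports
      capabilities: {},
      clientInfo: { name: "example-client", version: "0.0.0" },
    },
  };
  const initialized: JsonRpcNotification = {
    jsonrpc: "2.0",
    method: "notifications/initialized",
  };
  const listTools: JsonRpcRequest = { jsonrpc: "2.0", id: 2, method: "tools/list" };
  return [initialize, initialized, listTools].map((message) => JSON.stringify(message));
}

// Extracts the tool definitions from one line of server output.
function parseToolsListResponse(line: string): ListedTool[] {
  const message = JSON.parse(line);
  if (message.error) {
    throw new Error(`tools/list failed: ${message.error.message}`);
  }
  const tools = message.result?.tools;
  if (!Array.isArray(tools)) {
    throw new Error("response has no result.tools array");
  }
  return tools as ListedTool[];
}

// Example with a hand-written response standing in for a real server.
const sampleResponse = JSON.stringify({
  jsonrpc: "2.0",
  id: 2,
  result: {
    tools: [
      {
        name: "get_weather_forecast",
        description: "Return the hourly forecast for a city over the next 24 hours.",
        inputSchema: { type: "object", properties: { city: { type: "string" } } },
      },
    ],
  },
});
console.log(buildHandshake().join("\n"));
console.log(parseToolsListResponse(sampleResponse).map((tool) => tool.name));
```

Reading definitions this way means a tool sees exactly what an LLM client would see at runtime, rather than whatever the source code appears to declare.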
Built in TypeScript, MIT licensed. Early alpha — the bench command works but is slow (sequential LLM calls, parallelization is next). Feedback welcome.
Five-LLM consensus catches prompt injection patterns static analysis misses.
Anonymous LLM feedback loop for MCP servers — telemetry without user effort.
Offline schema snapshots keep AI agents from wrecking your production database.
Sandboxed MCP server lets LLMs run Ghidra and Radare2 without blowing up your host.
Offline version-accurate API queries beat fetching docs from remote servers.