GitHub Repository

The "Cloudflare for AI Agents". 7-layer security interceptor, real-time observability dashboard, and automated reliability testing for MCP and AI tool chains. Prevent hallucinations, prompt injection, and destructive tool calls.

14 stars · Python

ToolGuard – Pytest for AI agent tool calls

by Heer_J · Mar 17, 2026 · 1 point · 0 comments

AI Analysis

●●● Banger · Ship It · Solve My Problem · Big Brain

Layer 2 execution testing without LLMs when eval frameworks only test intelligence.

Strengths

• Separates execution reliability from intelligence evals, a genuinely useful framing for agent developers
• Deterministic fuzzing from type hints means no LLM is needed to run the tests themselves
• Reliability score with deploy-block recommendations gives clear go/no-go signals

Weaknesses

• Very early, with only 16 commits and 1 star; needs more real-world validation
• Python-only support limits adoption for teams using other agent frameworks

Post Description

I got tired of my AI agents crashing because the LLM hallucinated a JSON key or passed a string instead of an int. So I built ToolGuard: it fuzzes your Python tool functions with edge cases (nulls, missing fields, type mismatches, 10MB payloads) and gives you a reliability score out of 100.

No LLM needed to run tests. It reads your type hints, generates a Pydantic schema, and deterministically breaks things.
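To illustrate the idea (this is not ToolGuard's code or API), here is a minimal sketch of deterministic fuzzing driven by type hints. It uses only the standard library instead of Pydantic, and every name in it (BAD_VALUES, fuzz_cases, reliability_score, lookup_price) is hypothetical. It assumes that a TypeError or ValueError counts as a controlled rejection and any other exception counts as a crash.

```python
from typing import Any, Callable, Iterator, get_type_hints

# Hypothetical edge-case corpus keyed by annotated type (an assumption,
# not ToolGuard's actual list). Fixed values keep every run identical.
BAD_VALUES: dict[type, list[Any]] = {
    int: [None, "42", 3.5, []],
    str: [None, 123, "", "x" * 10_000],
}


def fuzz_cases(func: Callable, valid_kwargs: dict) -> Iterator[tuple[str, dict]]:
    """Yield (label, kwargs) pairs that each corrupt exactly one argument."""
    hints = get_type_hints(func)
    hints.pop("return", None)
    for name, hint in hints.items():
        # Missing-field case: drop the argument entirely.
        yield f"{name}=<missing>", {k: v for k, v in valid_kwargs.items() if k != name}
        # Wrong-type and boundary cases for this parameter's annotated type.
        for bad in BAD_VALUES.get(hint, [None]):
            yield f"{name}={bad!r:.30}", {**valid_kwargs, name: bad}


def reliability_score(func: Callable, valid_kwargs: dict) -> tuple[float, list[str]]:
    """Return (score out of 100, list of crashing cases)."""
    failures: list[str] = []
    total = 0
    for label, kwargs in fuzz_cases(func, valid_kwargs):
        total += 1
        try:
            func(**kwargs)
        except (TypeError, ValueError):
            pass  # assumed to be a controlled rejection
        except Exception as exc:
            failures.append(f"{label}: {type(exc).__name__}")
    score = 100.0 * (total - len(failures)) / total if total else 100.0
    return score, failures


# Hypothetical tool function with no input validation.
def lookup_price(item_id: int, currency: str) -> float:
    prices = {1: 9.99, 2: 24.50}
    return prices[item_id] * (1.1 if currency.upper() == "EUR" else 1.0)


if __name__ == "__main__":
    score, failures = reliability_score(lookup_price, {"item_id": 1, "currency": "USD"})
    print(f"reliability: {score:.0f}/100")
    for failure in failures:
        print("  crashed on", failure)
```

Note that this sketch only detects uncaught exceptions. A call that silently returns a wrong result still counts as passing, so a real tool would need stricter checks.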

pip install py-toolguard

GitHub: https://github.com/Harshit-J004/toolguard

If you are building complex tool chains, I would be incredibly honored if you checked out the repo. Brutal feedback on the architecture is highly encouraged!

Similar Projects

AI/ML · ●●● Banger

ToolGuard – Pytest for AI agent tool calls

Finally, pytest for AI tool calls when evals only test intelligence.

Solve My Problem · Zero to One

Heer_J

123mo ago

AI/ML · ●●● Banger

Needle: We Distilled Gemini Tool Calling into a 26M Model

Distilled Gemini tool-calling into a 26M model that runs at 1200 tok/s on phones.

Big Brain · Wizardry

HenryNdubuaku

7762111mo ago

Developer Tools · ●● Solid

A lightweight compiler for untrusted AI Agent scripts

Restricted DSL for AI agents wraps existing functions instead of sandboxing entire runtimes.

Big Brain · Ship It

hoansdz

2217d ago

Education · ●● Solid

Let's build Claude Code from scratch (tutorial)

No-framework Python agent build that actually runs, not just theory.

Cozy · Ship It

CohleM

301mo ago

Developer Tools · ●● Solid

PolyMCP – Run MCP Python Tools in WASM via Pyodide

PolyMCP turns Python functions into a single Pyodide WASM bundle so agents can call tools directly in the browser or at the edge — neat and practical. It keeps MCP niceties like input validation, error handling, and orchestration inside the bundle and ships runnable demo HTML to prove the flow. Be realistic about Pyodide trade-offs: bundle size and no native-extension support make this best for lightweight, interactive tools and demos rather than heavy backend workloads.

Wizardry · Niche Gem

justvugg

204mo ago

Developer Tools · ●●● Banger

Evalcraft – cassette-based testing for AI agents (pytest, $0/run)

VCR for LLM calls: eliminates API costs and non-determinism in agent testing.

Solve My Problem · Ship It · Slick

beyhang

103mo ago