Agent Action Guard – AI agent action safety
HarmActionsEval benchmark proves GPT and Claude fail at blocking harmful tool use.
"I'm sorry, Dave. I'm afraid I can't do that." — Lean command guard for AI coding agents.
Structural command parsing beats regex for catching dangerous agent actions.
Developers using AI coding agents like Cursor, Copilot, or Claude Code
Cursor agent safety · Cline permissions · GitHub Copilot hooks
HarmActionsEval benchmark proves GPT and Claude fail at blocking harmful tool use.
Contextual rules beat allow/deny lists—rm -rf __pycache__ is fine, rm ~/.bashrc is nah.
Blocks dangerous AI agent commands like rm -rf before execution in under 2ms.
Automates Lean formalization to catch silent assumptions standard unit tests miss.
Lean 4 proofs for AI code correctness—way more rigorous than unit tests.
Lean formalization forces explicit types that expose silent failures Python allows.