Z3r0 – Multi-agent red team collaboration platform
Docker-sandboxed agent orchestration for red teams joins a crowded automated pentesting space.
LLM-as-Judge red-teaming for system prompts, but Anthropic/OpenAI already ship this internally.
AI engineers and teams building LLM-based agents and chatbots
Anthropic's internal red-teaming suite · OpenAI's moderation API · Pydantic's guardrails tooling
It’s an open-source sandbox that runs an automated barrage of standard exploits against your target LLM to see if it leaks data or ignores core instructions.
How it works under the hood: - The UI is built with Streamlit, backend is FastAPI, and dependency management is handled by `uv`. - You paste your system prompt and hit run. It fires 12 baseline attack vectors (Direct leaks, XSS payloads, Context overflows, etc.) concurrently. - The core mechanic is "LLM-as-a-Judge". It uses a hardcoded `gpt-4.1-mini` with strict alignment rules to systematically evaluate the target's responses. - It supports OpenAI, Anthropic, and a solid list of open-weight models via OpenRouter (including DeepSeek V3/R1, Qwen 2.5, and Llama 3.3).
There is a hosted free version if you want to play with it immediately (I capped it at 15 requests/IP to survive the launch), but the entire tool is open-source and takes 30 seconds to spin up locally with Docker or `uv`.
Repo: https://github.com/BreakMyAgent/breakmyagent-os Live demo: https://breakmyagent.dev
Next on the roadmap: I'm building a dedicated CLI/GitHub Action so teams can drop this into their own CI/CD pipelines to block prompt regressions. I'm also developing a PoC for multi-turn agentic fuzzing and expanding the payload database for complex tool-spoofing.
I’d love to hear your feedback! What other test configurations (besides temperature and response format) do you think are essential for a tool like this? Also open to any feedback on the architecture, the judge prompt, or specific zero-day vectors you'd like to see included in the public database.
Docker-sandboxed agent orchestration for red teams joins a crowded automated pentesting space.
Multi-model ensemble scoring with Python-side arithmetic prevents LLM manipulation during live demos.
Dynamic LLM sandbox analysis detects injections that static pattern matching tools miss.
First automated red teaming for agentic AI at scale—enterprise gap now weaponized.
The UI turns complex attack chains into an immediately scannable graph with per-path metrics (risk score, time-to-compromise, assets/credentials impacted) — great for threat modeling and tabletop drills. Feels more like a very polished BAS visualization than a novel research tool; what I want to know next is where the simulation inputs come from (real telemetry, vulnerability feeds, or canned scenarios).
One-line SDK swap + PR red-teaming with A-F grades—but 'detect-and-block' gateway category is well-funded.