Prompt2pwn – CTF Automated Solver
LLM-powered CTF solver with multi-provider support; weekend results: 13 challenges solved across xAI, Google, and Anthropic.

Live CTF that stress-tests AI guardrails by attacking a real agent; a novel approach to agent security validation.
AI security researchers, red teamers, and developers building AI agent safeguards
HackTheBox · Bugcrowd · Intigriti
Self-play loop fine-tunes a custom bot on its own battle replays.
Deterministic capture + replay for LLM agents is a practical, under-served problem, and this repo actually ships a 'golden run' zip with cold-run verification hashes, which is the kind of evidence chain auditors want. The focus on portable evidence bundles and stress verification suggests useful forensics and load testing of agent logic, but the release page looks early-stage. I'd like to see integrations (tooling for popular agent frameworks), richer docs, and example pipelines before I'd evangelize it.
Agent red-teaming via a UI, but the attack catalog is shallow and the advantage over manual testing is unclear.
Agent-native eval workflow beats LangSmith's manual dashboard setup.
Six AI models review your idea, with the same result as prompting each one manually.