
Open-source scanner finds 97% of AI agent code non-compliant with EU AI Act

by airblackbox·Mar 4, 2026·1 point·1 comment

AI Analysis

●● Solid · Big Brain · Bold Bet

Linter for the EU AI Act: scans agent code against Articles 9–12, 14, and 15; 97% of files fail Article 9.

Strengths
  • Novel framing: compliance as a static analysis problem rather than vague legal interpretation, moving governance work from lawyers to tooling
  • Empirical audit of 5,754 files across 11 projects (341K+ combined GitHub stars) demonstrates real scope and produces actionable scorecards per project
  • Concrete checks (risk management, human oversight, record-keeping) translate regulation into detection rules, not marketing claims
Weaknesses
  • Compliance scanning is inherently a cat-and-mouse game: as projects patch findings, rules require constant updates and legal reinterpretation
  • Tool adoption requires governance buy-in; typical open-source teams won't run mandatory linters for external regulations until enforcement is real
Target Audience

AI governance teams, compliance officers, open-source maintainers scaling AI products

Similar To

Semgrep · Snyk · Checkov

Post Description

I built AIR Blackbox, an open-source static analysis tool that scans Python AI agent code against 6 technical requirements from the EU AI Act (Articles 9, 10, 11, 12, 14, 15). Think of it as a linter for AI governance.

To stress-test the scanner, and to see where the industry actually stands, I ran it against 5,754 Python files across 11 major open-source projects. Combined GitHub stars: 341,000+.

Projects scanned: AutoGPT (170K stars), Microsoft AutoGen (38K), LlamaIndex (37K), Mem0 (24K), Phidata (18K), LiteLLM (15K), GPT-Researcher (14K), Embedchain (9.2K), LangGraph (8.5K), OpenAI Agents SDK (5.2K), CrewAI Examples (2.8K).

Results:

  • Average compliance score: 2.2 out of 6 articles
  • 97% of files fail Article 9 (Risk Management)
  • 89% fail Article 12 (Record-Keeping)
  • 84% fail Article 14 (Human Oversight)
  • Only 23 out of 5,754 files (0.4%) pass all 6 checks
  • Best scoring repo: AutoGPT at 2.9/6. Worst: CrewAI Examples at 1.4/6
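
As a rough illustration of how headline figures like these can be derived from per-file results, here is a minimal sketch. The `summarize` function and the toy data are hypothetical and are not taken from the AIR Blackbox repo; only the final line uses numbers from the post.

```python
from statistics import mean

ARTICLES = (9, 10, 11, 12, 14, 15)

def summarize(results):
    """results: list of sets, each holding the article numbers one file passed."""
    total = len(results)
    return {
        # Mean number of articles passed per file (0 to 6).
        "average_score": mean(len(passed) for passed in results),
        # Percentage of files that fail each article.
        "fail_rate": {
            a: 100 * sum(a not in passed for passed in results) / total
            for a in ARTICLES
        },
        # Number of files that pass every article.
        "all_pass": sum(passed >= set(ARTICLES) for passed in results),
    }

# Toy data (assumption, not real scan output): four files with made-up results.
toy = [{10, 11}, {11, 15}, set(ARTICLES), {9, 10, 11, 15}]
summary = summarize(toy)
print(summary["average_score"])   # 3.5
print(summary["fail_rate"][12])   # 75.0
print(summary["all_pass"])        # 1

# The all-pass share reported in the post: 23 of 5,754 files.
print(round(100 * 23 / 5754, 1))  # 0.4
```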

What the scanner checks (per article):

  • Art. 9: risk classification, access control, risk audit
  • Art. 10: input validation, PII handling, data schemas, provenance
  • Art. 11: logging, documentation, type hints
  • Art. 12: structured logging, audit trail, timestamps, log integrity
  • Art. 14: human review, override mechanism, notifications
  • Art. 15: input sanitization, error handling, testing, rate limiting
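
To make the idea of a per-article sub-check concrete, here is a minimal sketch of a pattern-based detector. The sub-check names come from the list above, but the `RULES` table, the regular expressions, and identifiers such as `require_approval` are illustrative assumptions; the actual AIR Blackbox rules may look quite different.

```python
import re

# Hypothetical rules: article -> {sub-check name: regex}.
# The patterns are illustrative guesses, not the scanner's real rules.
RULES = {
    11: {
        "logging": re.compile(r"^\s*import logging\b", re.M),
        "type hints": re.compile(r"def \w+\(.*\)\s*->"),
    },
    14: {
        "human review": re.compile(r"\b(require_approval|human_review)\b"),
    },
    15: {
        "error handling": re.compile(r"^\s*try:", re.M),
        "rate limiting": re.compile(r"\brate_limit\w*", re.I),
    },
}

def detect(source: str) -> dict[int, list[str]]:
    """Return, per article, the names of sub-checks whose pattern matches."""
    return {
        article: [name for name, pattern in checks.items() if pattern.search(source)]
        for article, checks in RULES.items()
    }

sample = '''
import logging

def run(task: str) -> str:
    try:
        return task.upper()
    except ValueError:
        logging.exception("task failed")
        raise
'''

print(detect(sample))
# {11: ['logging', 'type hints'], 14: [], 15: ['error handling']}
```

Text patterns like these are cheap but brittle. A sturdier variant could walk the syntax tree with Python's `ast` module instead of matching raw source.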

An article "passes" if at least 1 sub-check is detected. This is generous: real compliance requires substantially more.

Caveats I'll save you the trouble of pointing out:

  • This is static analysis. It can't verify runtime behavior.
  • File-level scanning misses cross-file compliance patterns.
  • The pass threshold is intentionally lenient (1-of-N sub-checks).
  • This checks technical requirements, not legal compliance. It's a linter, not a lawyer.
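
The 1-of-N pass rule and its leniency can be sketched as follows. This is a hypothetical scoring function, not the scanner's code; the `min_hits` parameter is an assumption added to show how a stricter threshold would change a file's score.

```python
ARTICLES = (9, 10, 11, 12, 14, 15)

def article_passes(hits: list[str], min_hits: int = 1) -> bool:
    """An article passes when at least `min_hits` sub-checks were detected."""
    return len(hits) >= min_hits

def file_score(detected: dict[int, list[str]], min_hits: int = 1) -> int:
    """Count the articles (0 to 6) that pass for one file."""
    return sum(article_passes(detected.get(a, []), min_hits) for a in ARTICLES)

# Made-up detections for a single file, keyed by article number.
detected = {
    10: ["input validation"],
    11: ["logging", "type hints"],
    15: ["error handling", "testing", "rate limiting"],
}

print(file_score(detected))              # 3 with the lenient 1-of-N rule
print(file_score(detected, min_hits=2))  # 2 once two sub-checks are required
```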

The EU AI Act's requirements for high-risk AI systems, which include the articles checked here, are scheduled to apply from August 2026. The full report, raw data (JSON), and the scanning scripts are all in the repo.

GitHub: https://github.com/air-blackbox/air-blackbox-mcp
Full report: https://github.com/air-blackbox/air-blackbox-mcp/blob/main/b...
Install: pip install air-blackbox-mcp
Demo: https://huggingface.co/spaces/airblackbox/air-blackbox-scann...

Happy to answer questions about the methodology, the scanner internals, or what we're building next (a fine-tuned local LLM for deeper analysis, so your code never leaves your machine).
