Open-source scanner finds 97% of AI agent code non-compliant with the EU AI Act

Author: airblackbox

by airblackbox · Mar 4, 2026 · 1 point · 1 comment

AI Analysis

●● Solid · Big Brain · Bold Bet

Linter for the EU AI Act: scans Python agent code against Articles 9, 10, 11, 12, 14, and 15, and finds that 97% of files fail Article 9.

Strengths

•Novel framing: treats compliance as a static analysis problem rather than vague legal interpretation, moving routine governance checks from lawyers to tooling
•Empirical audit of 5,754 Python files across 11 projects (341K+ combined GitHub stars) demonstrates real scope and produces actionable scorecards per project
•Concrete checks (risk management, human oversight, record-keeping) translate regulation into detection rules, not marketing claims

Weaknesses

•Compliance scanning is inherently a cat-and-mouse game: as projects patch findings, rules require constant updates and legal reinterpretation
•Tool adoption requires governance buy-in; typical open-source teams won't run mandatory linters for external regulations until enforcement is real

Post Description

I built AIR Blackbox, an open-source static analysis tool that scans Python AI agent code against 6 technical requirements from the EU AI Act (Articles 9, 10, 11, 12, 14, 15). Think of it as a linter for AI governance.

To stress-test the scanner, and to see where the industry actually stands, I ran it against 5,754 Python files across 11 major open-source projects. Combined GitHub stars: 341,000+.

Projects scanned:

•AutoGPT (170K stars)
•Microsoft AutoGen (38K)
•LlamaIndex (37K)
•Mem0 (24K)
•Phidata (18K)
•LiteLLM (15K)
•GPT-Researcher (14K)
•Embedchain (9.2K)
•LangGraph (8.5K)
•OpenAI Agents SDK (5.2K)
•CrewAI Examples (2.8K)

Results:

•Average compliance score: 2.2 out of 6 articles
•97% of files fail Article 9 (Risk Management)
•89% fail Article 12 (Record-Keeping)
•84% fail Article 14 (Human Oversight)
•Only 23 out of 5,754 files (0.4%) pass all 6 checks
•Best scoring repo: AutoGPT at 2.9/6. Worst: CrewAI Examples at 1.4/6

What the scanner checks (per article):

•Art. 9: risk classification, access control, risk audit
•Art. 10: input validation, PII handling, data schemas, provenance
•Art. 11: logging, documentation, type hints
•Art. 12: structured logging, audit trail, timestamps, log integrity
•Art. 14: human review, override mechanism, notifications
•Art. 15: input sanitization, error handling, testing, rate limiting
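
To make the idea of a statically detectable sub-check concrete, here is a minimal sketch using Python's standard `ast` module. It is not AIR Blackbox's actual rule set: the function name `detect_subchecks` and the three detection heuristics (a `logging` import, a `try`/`except` block, an annotated function) are assumptions chosen only to illustrate how items like "logging", "error handling", and "type hints" can be spotted without running the code.

```python
import ast


def detect_subchecks(source: str) -> dict[str, bool]:
    """Report which illustrative sub-checks appear in one Python file.

    Hypothetical heuristics for illustration; the real scanner's rules may differ.
    """
    tree = ast.parse(source)
    found = {"logging": False, "error_handling": False, "type_hints": False}
    for node in ast.walk(tree):
        # Logging signal: the file imports the stdlib logging module.
        if isinstance(node, ast.Import):
            if any(alias.name.split(".")[0] == "logging" for alias in node.names):
                found["logging"] = True
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.split(".")[0] == "logging":
                found["logging"] = True
        # Error-handling signal: a try statement with at least one except handler.
        elif isinstance(node, ast.Try):
            if node.handlers:
                found["error_handling"] = True
        # Type-hint signal: any annotated parameter or return type.
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            params = node.args.posonlyargs + node.args.args + node.args.kwonlyargs
            if node.returns is not None or any(p.annotation for p in params):
                found["type_hints"] = True
    return found


sample = '''
import logging

def call_tool(name: str) -> str:
    try:
        return name.upper()
    except ValueError:
        logging.exception("tool failed")
        raise
'''

print(detect_subchecks(sample))
# {'logging': True, 'error_handling': True, 'type_hints': True}
```

Heuristics like these only show that a pattern is present in the source text of one file; they say nothing about whether the logging or error handling is adequate at runtime.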

An article "passes" if at least 1 sub-check is detected. This is generous; real compliance requires substantially more.

Caveats I'll save you the trouble of pointing out:

•This is static analysis. It can't verify runtime behavior.
•File-level scanning misses cross-file compliance patterns.
•The pass threshold is intentionally lenient (1-of-N sub-checks).
•This checks technical requirements, not legal compliance. It's a linter, not a lawyer.
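
The 1-of-N pass rule and the file-level caveat can be sketched in a few lines. This is an illustration under stated assumptions, not the scanner's implementation: the `ARTICLES` mapping mirrors the sub-check list in the post, but the snake_case identifiers, the `score` function, and the two example files with their detections are hypothetical.

```python
# Sub-checks per article, taken from the post; identifier spellings are assumed.
ARTICLES = {
    "Art. 9": ["risk_classification", "access_control", "risk_audit"],
    "Art. 10": ["input_validation", "pii_handling", "data_schemas", "provenance"],
    "Art. 11": ["logging", "documentation", "type_hints"],
    "Art. 12": ["structured_logging", "audit_trail", "timestamps", "log_integrity"],
    "Art. 14": ["human_review", "override_mechanism", "notifications"],
    "Art. 15": ["input_sanitization", "error_handling", "testing", "rate_limiting"],
}


def score(detected: set[str]) -> tuple[int, dict[str, bool]]:
    """Apply the lenient rule: an article passes if any one sub-check is detected."""
    passes = {
        article: any(check in detected for check in checks)
        for article, checks in ARTICLES.items()
    }
    return sum(passes.values()), passes


# Hypothetical detections for two files in the same repository.
files = {
    "agent.py": {"error_handling", "type_hints"},
    "audit.py": {"audit_trail", "timestamps", "human_review"},
}

# File-level scoring: each file is judged in isolation.
per_file = {name: score(found)[0] for name, found in files.items()}

# Repo-level scoring: pool detections across files before applying the rule.
repo_level = score(set().union(*files.values()))[0]

print(per_file)    # {'agent.py': 2, 'audit.py': 2}
print(repo_level)  # 4
```

In this toy example each file scores 2 out of 6 on its own, while the pooled repository scores 4 out of 6, which is the kind of cross-file pattern that file-level scanning cannot see.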

The EU AI Act enforcement deadline is August 2026. The full report, raw data (JSON), and the scanning scripts are all in the repo.

GitHub: https://github.com/air-blackbox/air-blackbox-mcp
Full report: https://github.com/air-blackbox/air-blackbox-mcp/blob/main/b...
Install: pip install air-blackbox-mcp
Demo: https://huggingface.co/spaces/airblackbox/air-blackbox-scann...

Happy to answer questions about the methodology, the scanner internals, or what we're building next: a fine-tuned local LLM for deeper analysis, so your code never leaves your machine.