Back to browse
Security-Risk Patterns in OpenClaw Skills

Security-Risk Patterns in OpenClaw Skills

by dinodrv·Feb 16, 2026·2 points·0 comments

AI Analysis

PassNiche GemWizardry
The Take

It actually looks for the weird stuff that trips up LLM agents — invisible Unicode, bidi overrides, embedded curl|bash one-liners, exfil links — and pairs a static skill scanner with a real-time interception flow that forces human approvals. The CLI-first approach (npx safeclaw start) plus Socket.IO alerts and per-command allow/deny decisions show practical thinking about developer workflows; I want to see model/false-positive metrics and enterprise integration docs next.

Category
Target Audience

Agent/tooling developers, security engineers, platform operators running OpenClaw/ClawHub skills

Post Description

I built a static analysis scanner that checks OpenClaw agent skill definitions.

Here's every category I found on ClawHub.

Hidden Content: HTML comments with instructions, zero-width Unicode characters (U+200B-U+200F, U+2060-2064, U+FEFF), CSS hiding (display:none, opacity:0), and bidirectional text overrides. These are invisible when reading markdown but the LLM processes them.

Prompt Injection: Direct attempts to override agent behavior: "ignore previous instructions", role reassignment ("you are now"), model-specific tokens like [INST] and <|im_start|>, and persona manipulation ("pretend you are").

Shell Execution: Remote code execution vectors: curl|bash, eval(), exec(), npx -y (auto-confirms remote packages), reverse shells via /dev/tcp or nc -e, and one-liners in Python, PHP, Perl, Ruby.

Data Exfiltration: URLs pointing to paste sites (pastebin, transfer.sh), webhook services (ngrok, webhook.site, pipedream), messaging webhooks (Slack, Discord, Telegram bot API), and raw IP addresses.

Embedded Secrets: Hardcoded credentials across 17 types: AWS keys, OpenAI API keys, GitHub/GitLab tokens, Stripe keys, PEM private keys, JWT tokens, database connection strings, SSH private keys, and more.

Sensitive File References: Instructions to access .ssh/, .env, .aws/credentials, /etc/passwd, /etc/shadow, and private key paths.

Memory/Config Poisoning: This one is interesting. Skills that try to write to agent memory files (CLAUDE.md, SOUL.md, MEMORY.md, CODEX.md) or IDE rule files (.cursorrules, .windsurfrules, .clinerules). This creates persistence - the injected instructions survive across sessions.

Supply Chain Risk: External script downloads from raw GitHub URLs, and package install commands (npm install, pip install, gem install, cargo install, go install, brew install). A skill shouldn't be silently installing packages.

Encoded Payloads: Base64 strings over 40 characters, atob()/btoa() calls, Buffer.from(..., 'base64'), hex escape sequences, and String.fromCharCode(). Encoding is used to bypass pattern detection in other scanners.

Image Exfiltration: This is the most complex category with 17 patterns. Markdown images with exfil query params (), variable interpolation in image URLs (), SVG with embedded scripts or foreignObject, 1x1 tracking pixels, CSS-hidden image beacons, steganography tool references, Canvas API manipulation (getImageData, toDataURL), and double extensions (.png.exe).

System Prompt Extraction: Instructions to leak the agent's system prompt: "reveal your system prompt", "repeat the words above", "print everything above", "what are your original instructions".

Argument Injection: Shell metacharacters in tool arguments: command substitution $(), variable expansion ${}, backticks, chained commands (;rm, |bash, &&curl), and GTFOBINS exploitation flags (--exec, --checkpoint-action).

Cross-Tool Chaining: Multi-step attack patterns that combine legitimate tools: read-then-exfiltrate sequences, numbered step-by-step instructions, and direct tool function references (read_file(), execute_command()). Each step looks harmless alone.

Excessive Permissions: Requests for "unrestricted access", "bypass security", "root access", "disable all safety checks", "full system control". A skill definition shouldn't need these.

Suspicious Structure: Content over 10K characters (larger surface area for hiding threats), and imperative instruction density over 30% (lines starting with "you must", "always", "never", "execute", "run").

How it works ? The scanner is stateless. You paste or upload a skill definition, it runs 15 analyzers against the content, and returns findings with severity levels, line numbers, evidence snippets, and OWASP LLM Top 10 references.

No database, no persistence, no network calls. Single request in, results out.

Similar Projects

Security●●●Banger

A security scanner for AI Agent Skills

Docker sandbox execution catches runtime threats static analysis alone misses.

Big BrainBold Bet
mayziem
502mo ago
Security●●Solid

Agentsec – Security scanner for AI agent installations (MCP, OpenClaw)

Bundles CI-friendly scanners that target agent-specific risks: 17 patterned secret detectors, prompt-injection and instruction‑malware heuristics, tool/SSRF and MCP auth checks, plus SARIF/JSON outputs for integration. Findings map to the OWASP Top 10 for Agentic Applications (2026) and it adds 'harden' profiles to apply safer defaults to OpenClaw/MCP installs — practical, focused ops tooling rather than a generic secret-finder.

Niche GemSolve My Problem
debu_sinha_1
233mo ago