OpenClaw skills degrade agent safety
Behavioral safety testing reveals 45 regressions static analysis misses—guardrails provided.
Agents that evolve their own skills. Self-healing multi-agent systems.
Self-healing agents patch prompts automatically via replay validation; beats manual iteration.
AI engineers building multi-agent pipelines, prompt engineers optimizing workflows
Promptfoo · DSPy · LangSmith tracing
Each skill is a structured SKILL.md file. After every run, an LLM judge scores each skill and tags the exact failures. An LLM patcher then generates candidate fixes for only the failing section, and each candidate is replayed on past traces. The winner is promoted; the losers are discarded.
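The judge, patch, replay, promote loop described above can be sketched roughly as follows. This is a minimal illustration, not the evoagents implementation: `Trace`, `mean_score`, `select_winner`, and `toy_judge` are hypothetical names, and the judge is a keyword stub standing in for an LLM call.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Trace:
    """A recorded past run (hypothetical shape: just the input prompt)."""
    prompt: str


# Hypothetical judge signature: (skill_text, trace) -> score in [0, 1].
Judge = Callable[[str, Trace], float]


def mean_score(skill: str, traces: Sequence[Trace], judge: Judge) -> float:
    """Average judge score of one skill version over the replay set."""
    if not traces:
        raise ValueError("replay needs at least one past trace")
    return sum(judge(skill, t) for t in traces) / len(traces)


def select_winner(current: str, candidates: Sequence[str],
                  traces: Sequence[Trace], judge: Judge) -> str:
    """Return the best candidate only if it strictly beats the current skill."""
    best, best_score = current, mean_score(current, traces, judge)
    for cand in candidates:
        score = mean_score(cand, traces, judge)
        if score > best_score:
            best, best_score = cand, score
    return best


def toy_judge(skill: str, trace: Trace) -> float:
    """Stand-in for an LLM judge: rewards skills that ask for citations."""
    return 1.0 if "cite sources" in skill.lower() else 0.0


traces = [Trace("Summarize the report"), Trace("Compare two papers")]
current = "## Output\nWrite a short answer."
candidates = [
    "## Output\nWrite a short answer. Cite sources.",
    "## Output\nWrite a long answer.",
]
winner = select_winner(current, candidates, traces, toy_judge)
print(winner)
```

Requiring a strict improvement means a candidate that merely ties the current version is discarded, so the skill text only changes when the replay data supports it.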
One command: evoagents autofix
Key decisions:
- LLM-as-judge, not regex: constraints are natural language, so evaluation should be too
- Section-level patching: only the broken part gets touched
- Replay gating: no patch ships without proving it improves on real data
- Versioned: every change is a new version, with instant rollback
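Section-level patching and versioned rollback can be illustrated with a short sketch. It assumes a SKILL.md whose sections start with `## ` headings, which the source does not specify; `replace_section` and `VersionedSkill` are hypothetical helpers, not part of the evoagents API.

```python
def replace_section(skill_md: str, heading: str, new_body: str) -> str:
    """Replace the body under one '## heading'; other sections stay untouched."""
    lines = skill_md.splitlines()
    target = f"## {heading}"
    start = next((i for i, ln in enumerate(lines) if ln.strip() == target), None)
    if start is None:
        raise KeyError(f"no section named {heading!r}")
    # The section ends at the next '## ' heading, or at end of file.
    end = next((i for i in range(start + 1, len(lines))
                if lines[i].startswith("## ")), len(lines))
    patched = lines[:start + 1] + new_body.strip("\n").splitlines() + [""] + lines[end:]
    return "\n".join(patched).rstrip("\n") + "\n"


class VersionedSkill:
    """Append-only version history with a movable 'active' pointer."""

    def __init__(self, text: str) -> None:
        self._versions = [text]
        self._active = 0

    @property
    def text(self) -> str:
        return self._versions[self._active]

    def promote(self, new_text: str) -> int:
        """Store a new version and make it active; returns its index."""
        self._versions.append(new_text)
        self._active = len(self._versions) - 1
        return self._active

    def rollback(self) -> int:
        """Step the active pointer back one version, if there is one."""
        if self._active > 0:
            self._active -= 1
        return self._active


skill = ("## Role\nYou are a researcher.\n\n"
         "## Constraints\nBe brief.\n\n"
         "## Output\nA summary.\n")
patched = replace_section(skill, "Constraints", "Be brief. Prefer primary sources.")
store = VersionedSkill(skill)
store.promote(patched)
print(store.text)
```

Because old versions are never overwritten, rolling back is only a pointer move, which is one plausible way to make rollback instant.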
You can steer it: evoagents autofix --guide "prefer primary sources"
pip install evoagents
https://github.com/jatingargiitk/evoagents
Turns pass/fail eval signals into reusable skills without retraining the model.
Bulk-install AI skills across 30+ agents from one terminal UI.
Lightweight A/B testing for SKILL.md files when LangSmith feels too heavy.
Security-scanned SKILL.md marketplace when GitHub repos have no vetting.
Single Go binary with OpenAPI spec beats framework-locked orchestration platforms.