Salacia – The First Runtime OS for Agentic Coding
Fault-localization scaffolding for AI agents; claims 93% top-5 recall, but Cursor/Cline already integrate similar.
Reproducible recon/craft/audit agent pipeline for SWE-bench Verified. Official-graded, codex-attested, GPL-3.0. Run it yourself.
97% on SWE-bench Verified with full artifact transparency, not just a score claim.
AI researchers and SWE-bench skeptics
SWE-agent · OpenDevin · Aider
Fault-localization scaffolding for AI agents; claims 93% top-5 recall, but Cursor/Cline already integrate similar.
Transparent proxy cuts Codex context tokens by 87% via working memory.
Verifies AI agent receipts offline before the audit compliance headache actually starts.
They split responsibilities across isolated agents (engineer, reviewer, manager) that get real shell access and independent filesystems, which makes failures traceable and lets you tune model capacity per role. Hitting 72.2% on SWE-bench Verified with no benchmark-specific tuning is an impressive empirical result — interesting architecture and strong evidence — though the security and long-term reliability of autonomous shell-executing agents remain the big open questions.
Agents fail completely at rebuilding binaries from scratch without source code.
AI agent actually fixes bugs in real VMs, not just prompting. Firecracker isolation + verified PRs.