Sift, a local-first CLI for failures, root causes and next steps
198k tokens down to 129 — local heuristics beat LLM summarization.
Collapses 8KB cargo test output to one line while preserving failure details.
Developers using AI coding agents (Cursor, Continue, local LLMs)
Cursor · Continue · Sourcegraph Cody
Starting to run local inference has highlighted something I've been aware of for longer: just running tests dumps shedloads of text into the context window, where it stays until compaction or a fresh start. For example, a single `cargo test` puts 8KB into the agent's context just to communicate "47 tests passed." The agent reads all of it, learns nothing useful, and the context window fills with noise. It also makes LLM prefill slower and costs more when using per-token APIs.
I created a small program that sits between the command output and the LLM: oo, or double-o ... yes, sad play on words. Double-o, the agent's best friend :)
oo wraps commands and classifies their output:
- Small output (<4KB): passes through unchanged
- Known success pattern: one-line summary (`oo cargo test` → `cargo test (47 passed, 2.1s)`)
- Failure: filtered to actionable errors
- Large unknown output: indexed locally, queryable via `oo recall`
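To make the four tiers concrete, here is a minimal Python sketch of that decision order. It is not oo's actual code (oo is a Rust binary). The `PATTERNS` table, the `Classified` type, the `classify` function, the use of the exit code to detect failure, and the exact regexes are all assumptions made for illustration; only the 4KB threshold and the four outcomes come from the list above.

```python
import re
from dataclasses import dataclass

SMALL_OUTPUT_LIMIT = 4 * 1024  # the "<4KB" pass-through threshold from the list above

# Hypothetical pattern table; only `cargo test` is sketched here.
PATTERNS = {
    "cargo test": {
        # Matches cargo's summary line, e.g.
        # "test result: ok. 47 passed; 0 failed; ...; finished in 2.10s"
        "success": re.compile(
            r"test result: ok\. (?P<passed>\d+) passed;.*finished in (?P<secs>[\d.]+)s"
        ),
        # Assumed notion of an "actionable" line for a failing run.
        "failure_line": re.compile(r"FAILED|panicked|^error"),
    },
}


@dataclass
class Classified:
    kind: str  # "passthrough", "summary", "failure", or "indexed"
    text: str  # what the agent would see


def classify(command: str, output: str, exit_code: int) -> Classified:
    # Tier 1: small output is cheap, so hand it over untouched.
    if len(output.encode("utf-8")) < SMALL_OUTPUT_LIMIT:
        return Classified("passthrough", output)

    pattern = PATTERNS.get(command)

    # Tier 2: known command that succeeded -> one-line summary.
    if pattern and exit_code == 0:
        match = pattern["success"].search(output)
        if match:
            summary = f"{command} ({match['passed']} passed, {match['secs']}s)"
            return Classified("summary", summary)

    # Tier 3: known command that failed -> keep only actionable lines.
    if pattern and exit_code != 0:
        kept = [ln for ln in output.splitlines() if pattern["failure_line"].search(ln)]
        return Classified("failure", "\n".join(kept))

    # Tier 4: large and unrecognized -> stand-in for local indexing.
    return Classified("indexed", f"{len(output)} chars stored for later recall")
```

The ordering (size check first, then success, then failure) follows the order of the list; whether oo checks them in exactly this sequence is a guess.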
It currently ships with 10 built-in patterns (pytest, cargo test, go test, jest, eslint, ruff, cargo build, cargo clippy, go build, tsc), but users can add their own via TOML files or use `oo learn <cmd>` to have an LLM generate one from real command output (currently only with Anthropic models).
No agent modification needed: add "prefix commands with oo" to your system prompt. Single Rust binary, 197 tests, Apache-2.0.
The classification engine uses regex-based pattern matching with per-command failure strategies (tail, head, grep, between). Commands without a matching pattern are automatically categorized (status/content/data/unknown), and the category determines what happens to their output. Content commands like `git diff` always pass through; data commands like `git log` get indexed when large.
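As a rough illustration of the four failure strategies and the category fallback, here is a Python sketch. The function names (`select_lines`, `fallback_action`), their parameters, and the `CATEGORIES` table are hypothetical and not oo's API. The semantics I give `between` (first start match through the next end match) are my reading of the name. The post doesn't say what happens to status commands, so they are left out.

```python
import re


def select_lines(output, strategy, *, count=0, pattern="", start="", end=""):
    """Reduce failing output to a subset of lines using one named strategy."""
    lines = output.splitlines()
    if strategy == "tail":
        # Last `count` lines (guard against lines[-0:], which returns everything).
        return lines[-count:] if count > 0 else []
    if strategy == "head":
        return lines[:count]
    if strategy == "grep":
        rx = re.compile(pattern)
        return [ln for ln in lines if rx.search(ln)]
    if strategy == "between":
        # Keep from the first line matching `start` through the next line matching `end`.
        start_rx, end_rx = re.compile(start), re.compile(end)
        kept, inside = [], False
        for ln in lines:
            if not inside:
                if start_rx.search(ln):
                    inside = True
                    kept.append(ln)
                continue
            kept.append(ln)
            if end_rx.search(ln):
                break
        return kept
    raise ValueError(f"unknown strategy: {strategy}")


# Only the two examples the post names; anything else falls back to "unknown".
CATEGORIES = {"git diff": "content", "git log": "data"}
LARGE = 4 * 1024  # reuses the 4KB threshold as the assumed meaning of "large"


def fallback_action(command, output):
    """Decide what to do with output from a command that has no pattern."""
    category = CATEGORIES.get(command, "unknown")
    if category == "content":
        return "passthrough"  # content commands always pass through
    # data and unknown commands: index only when the output is large
    return "index" if len(output.encode("utf-8")) >= LARGE else "passthrough"
```

For a test runner that prints a failure section near the end, `between` or `tail` would likely keep the most useful part, while `grep` suits linters that emit one error per line.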
Especially noticeable with local models & wall-clock time. Helps with frontier models too ... cleaner context, fewer confused follow-ups.