Breathe-Memory – Associative memory injection for LLMs (not RAG)
Graph-based context compression beats lossy summarization when tokens run out.

A 150M-parameter model replaces LLM calls for evidence extraction at comparable F1 scores.
ML engineers building RAG systems, teams needing cheaper evidence extraction than LLM calls
Zilliz Semantic-Highlight · Provence · MultiSpanQA
LLM cost optimizer, but Anthropic's batch API and local quantization solve this more cheaply.
LLM-as-judge metrics beat guessing chunk sizes, but Ragas and LangSmith already exist.
Structurally verifies LLM judge reasoning instead of paying for a second model check.
Single-file RAG bundle runs entirely in the browser without server setup.
Cache-aware LLM eval with self-hosted model support beats Ragas on flexibility.