GitHub Repository

An open-source retrieval engine implementing FABLE — structured, hierarchy-aware RAG

5 stars · Python

OpenFable – Open-source RAG engine using tree-structured indexes

by alainbrown·Apr 8, 2026·5 points·0 comments

AI Analysis

●● Solid · Big Brain · Niche Gem

LLM-identified discourse boundaries beat fixed-size chunking for complex queries.

Strengths
  • Tree-structured indexes with embeddings at every hierarchy level enable context-aware retrieval.
  • Bi-path retrieval combines LLM reasoning with vector similarity for better accuracy.
  • Adaptive token budget control lets you constrain what gets sent to the generation model.
Weaknesses
  • RAG infrastructure is extremely crowded with LlamaIndex, LangChain, and dozens of alternatives.
  • No benchmark comparisons showing FABLE actually outperforms existing hierarchical RAG approaches.
Target Audience

Developers building RAG pipelines and retrieval systems

Similar To

LlamaIndex · LangChain · RAPTOR

Post Description

Hi HN, I built OpenFable, an open-source retrieval engine that implements the FABLE algorithm (https://arxiv.org/abs/2601.18116) for RAG pipelines. I'm using it in another project and thought that others might benefit.

Most RAG systems chunk documents into flat segments and retrieve by vector similarity. This works for simple lookups but breaks when answers span multiple sections, when relevant content is buried in a subsection, or when you need to control how many tokens you're sending to an LLM. OpenFable takes a different approach: when you ingest a document, it uses an LLM to identify discourse boundaries (not fixed-size windows), then builds a hierarchical tree, root, sections, subsections, leaf chunks, with embeddings at every level. Retrieval combines two paths: 1. LLM-guided path: the LLM reasons about which documents and subtrees are relevant from summaries 2. Vector path: similarity search with structure-aware score propagation through the tree Results from both paths are fused, deduplicated, and trimmed to fit a token budget you specify. You get the most relevant chunks, in document order, within budget. From the FABLE paper: the algorithm matches full-context inference (517K tokens) using only 31K tokens, 94% reduction, while hitting 92% completeness vs. Gemini-2.5-Pro at 91% with the full document. Retrieval only; OpenFable returns ranked chunks, not generated answers. Bring your own LLM for generation. It runs as a Docker stack (FastAPI + PostgreSQL/pgvector) and exposes both a REST API and an MCP server, so LLM agents like Claude Desktop or Cursor can use it directly. Trade-offs I want to be upfront about: - Ingestion is expensive; every document requires multiple LLM calls for chunking and tree construction - Retrieval isn't sub-second, the LLM-guided paths add round-trips - No built-in auth; designed to sit behind a reverse proxy - v0.1.0 — works end to end but the roadmap includes async ingestion, document deletion, and metadata filtering Stack: Python 3.12, FastAPI, SQLAlchemy, pgvector, LiteLLM, fastMCP. Apache 2.0. Happy to answer questions about the algorithm, implementation choices, or benchmarks.
