A tool to create and evaluate document processing pipelines for RAG

by martimchaves · Mar 27, 2026 · 2 points · 0 comments

AI Analysis

●● Solid · Solve My Problem · Slick

LLM-as-judge metrics beat guessing chunk sizes, but Ragas and LangSmith already exist.

Strengths
  • Configurable pipeline stages (OCR → chunking → embedding) for systematic testing
  • Transparent about LLM-as-judge limitations — calls it a compass, not GPS
  • Side-by-side dataset comparison with precision, recall, MRR metrics
Weaknesses
  • RAG evaluation is crowded (Ragas, Arize Phoenix, LangSmith all compete here)
  • API integration details unclear — how do you actually lock and query datasets?
Category
Target Audience

Developers building RAG applications

Similar To

Ragas · LangSmith · Arize Phoenix

Post Description

Hey HN, I built [ragbandit](https://ragbandit.com), a tool to help you evaluate different document processing pipelines for the retrieval stage of your RAG systems.

I was a bit overwhelmed by the number of ways you can process documents to create embeddings for RAG, so I wanted to create a tool to experiment with different OCR models, refinements of the OCR results, chunking methods, and embedding models.
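To make the idea of experimenting across stages concrete, here is a minimal sketch of how the combinations could be enumerated. It is not ragbandit's API: `PipelineConfig`, `enumerate_configs`, and every model and chunker name below are hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class PipelineConfig:
    """One candidate document processing pipeline (hypothetical structure)."""
    ocr_model: str
    refine_ocr: bool
    chunker: str
    embedding_model: str


def enumerate_configs(ocr_models, refine_options, chunkers, embedding_models):
    """Return every combination of the given stage options."""
    return [
        PipelineConfig(*combo)
        for combo in product(ocr_models, refine_options, chunkers, embedding_models)
    ]


# Placeholder option names; a real setup would use actual model identifiers.
configs = enumerate_configs(
    ocr_models=["ocr-a", "ocr-b"],
    refine_options=[False, True],
    chunkers=["fixed-512", "by-heading"],
    embedding_models=["embed-small"],
)

for config in configs:
    print(config)
```

The number of combinations grows multiplicatively with each stage (2 × 2 × 2 × 1 = 8 here), which is why comparing them systematically gets tedious without tooling.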

You can:

- search processed documents in the playground
- evaluate the retrieval results using an LLM-as-judge (not perfect, but can be a useful signal)
- compare different datasets (using aggregate metrics or a side-by-side comparison in the playground)

You can also manually inspect the results of each query, and of each intermediate document processing result.
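For readers unfamiliar with the aggregate retrieval metrics involved, the sketch below computes precision@k, recall@k, and MRR from per-query relevance flags. It assumes each retrieved chunk has already been labeled relevant or not (for example by an LLM judge) and that the total number of relevant chunks per query is known. The function names and sample data are illustrative and do not come from ragbandit.

```python
def precision_at_k(flags, k):
    """Fraction of the top-k retrieved chunks that were judged relevant."""
    if k <= 0:
        return 0.0
    return sum(flags[:k]) / k


def recall_at_k(flags, k, total_relevant):
    """Fraction of all relevant chunks that appear in the top-k results."""
    if total_relevant <= 0:
        return 0.0
    return sum(flags[:k]) / total_relevant


def reciprocal_rank(flags):
    """1 / rank of the first relevant chunk, or 0.0 if none is relevant."""
    for index, is_relevant in enumerate(flags):
        if is_relevant:
            return 1.0 / (index + 1)
    return 0.0


def mean_reciprocal_rank(judgments):
    """Average reciprocal rank over all queries."""
    if not judgments:
        return 0.0
    return sum(reciprocal_rank(flags) for flags in judgments.values()) / len(judgments)


# Illustrative judgments: one list of relevance flags per query, in rank order.
judgments = {
    "q1": [True, False, True],
    "q2": [False, True, False],
}

print(precision_at_k(judgments["q1"], k=2))
print(recall_at_k(judgments["q1"], k=2, total_relevant=2))
print(mean_reciprocal_rank(judgments))
```

Running the same queries against two processed datasets and comparing these numbers gives a rough signal of which pipeline retrieves better, subject to the accuracy of the judge's labels.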

To get a better idea, check out one of the use cases: https://ragbandit.com/use-cases/optimizing-insurance-documen...

To be completely fair, I haven't added that many options for the different stages of the document processing pipeline! There are tons of features that I'd like to add, but I've already spent quite a bit of time on this, so I'd really appreciate it if you could let me know whether this is something you'd find useful or interesting. Would you use something like this?

Tech stack: Postgres (with pgvector), FastAPI, [ragbandit-core](https://github.com/MartimChaves/ragbandit-core) (the document processing core is open source), TypeScript with React, and Celery for background tasks (with Redis as the broker).

It's currently a credits-based subscription with optional top-ups. You can get 1000 credits to try it out (I ask for card info for these 1000 credits as a spam filter).

Thanks, Martim
