Regrada – The CI gate for LLM behavior

Author: matiasmolinolo

by matiasmolinolo · Mar 16, 2026 · 2 points · 0 comments

AI Analysis

Tags: Banger, Solve My Problem, Big Brain, Slick

Zero-code proxy capture beats SDK-based eval tools like LangSmith and Arize.

Strengths

• HTTP proxy interception means zero instrumentation: it works with existing apps from day one.
• The --explain flag diagnoses why behavior shifted, not just that a case failed.
• regrada fuzz mutates inputs to find brittle prompt edge cases before they reach production.

Weaknesses

• The LLM eval space is crowded: LangSmith, Helicone, and Arize already have mindshare.
• The cloud runner option raises data residency questions for sensitive traces.

Post Description

I built Regrada to help me with prompt changes.

Working on LLM-based applications, I kept running into what I consider two big pain points:

1. It's difficult to monitor how a prompt change might break behavior.
2. Testing SDKs are high-friction to integrate and run.

Regrada solves this by intercepting LLM calls and building traces and baselines from them. New runs are compared against those baselines, and CI gates make sure behavior drift doesn't reach prod.
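To make the baseline-and-gate idea concrete, here is a minimal Python sketch. It is not Regrada's implementation: the function names (`drift_report`, `gate`), the dict-of-strings trace format, and the 0.8 similarity threshold are all assumptions made for illustration, and plain string similarity stands in for whatever assertions a real tool would run.

```python
from difflib import SequenceMatcher


def drift_report(baseline, current, min_similarity=0.8):
    """Return the IDs of cases whose current output drifted from the baseline.

    `baseline` and `current` map a case ID to the recorded model output.
    Both the format and the threshold are hypothetical.
    """
    failed = []
    for case_id, expected in baseline.items():
        actual = current.get(case_id)
        if actual is None:
            # A case that stopped producing output counts as drift.
            failed.append(case_id)
            continue
        similarity = SequenceMatcher(None, expected, actual).ratio()
        if similarity < min_similarity:
            failed.append(case_id)
    return failed


def gate(baseline, current, min_similarity=0.8):
    """Return a process exit code: 0 if nothing drifted, 1 otherwise."""
    return 1 if drift_report(baseline, current, min_similarity) else 0
```

In a CI job, the return value of `gate` could be passed to `sys.exit` so that a non-zero code blocks the merge.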

Some cool features we've built:

`--explain`: when a case fails after a model change, an LLM helps you detect why the behavior shifted. Not just "assertion failed" but "the model is now truncating before the conclusion clause." Saves a lot of digging to diagnose.

`regrada fuzz`: runs mutations on your inputs (typos, reorderings, edge cases) to find cases where your prompt is more brittle than you think. Caught a production issue for me before launch.
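As a rough illustration of input mutation, the Python sketch below generates typo and reordering variants of a prompt input. It is an assumption-laden toy, not how `regrada fuzz` works internally: the helpers `typo`, `reorder_sentences`, and `mutate` are hypothetical names, and sentences are split naively on periods.

```python
import random


def typo(text, rng):
    """Swap two adjacent characters to simulate a typing slip."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    chars = list(text)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def reorder_sentences(text, rng):
    """Shuffle the sentences of the input (naive split on '.')."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    if not sentences:
        return text
    rng.shuffle(sentences)
    return ". ".join(sentences) + "."


def mutate(text, n=5, seed=0):
    """Produce n mutated variants; a fixed seed keeps runs reproducible."""
    rng = random.Random(seed)
    mutators = [typo, reorder_sentences]
    return [rng.choice(mutators)(text, rng) for _ in range(n)]
```

Each variant would then be sent through the same prompt, and the outputs checked against the same assertions as the unmutated input. Seeding the generator means a failing variant can be reproduced later.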

You can run it fully locally or connect to the cloud runner.

Still pre-launch, actively looking for teams to try it.

Happy to answer questions about how the assertion layer works, the model-agnostic design, or anything else.

https://www.regrada.com/