EvalsHub: Your AI is failing in production and you don't know it

by neilsharma425·Mar 20, 2026·4 points·1 comment

AI Analysis

●● Solid · Solve My Problem · Slick

Replaces stitching Langfuse and promptfoo together with one unified eval dashboard.

Strengths

• Unifies tracing, red-teaming, and CI/CD gates in a single workflow
• Custom rubric weighting allows tailored quality scoring per use case
• Automated production scoring catches model regressions before they hit deployment
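
To make the rubric-weighting and CI/CD-gate ideas above concrete, here is a minimal sketch in plain Python. It does not use EvalsHub's API, which the post does not describe; `Criterion`, `weighted_score`, and `ci_gate` are hypothetical names chosen for illustration, and per-criterion scores are assumed to lie in [0, 1].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One rubric line (hypothetical type, not part of any real tool)."""
    name: str
    weight: float  # relative importance; weights need not sum to 1


def weighted_score(rubric: list[Criterion], scores: dict[str, float]) -> float:
    """Weighted mean of the per-criterion scores for a single output."""
    total_weight = sum(c.weight for c in rubric)
    if total_weight <= 0:
        raise ValueError("rubric weights must sum to a positive number")
    return sum(c.weight * scores[c.name] for c in rubric) / total_weight


def ci_gate(
    rubric: list[Criterion],
    runs: list[dict[str, float]],
    threshold: float,
) -> bool:
    """Return True when the mean weighted score across runs meets the threshold."""
    if not runs:
        raise ValueError("at least one scored run is required")
    mean = sum(weighted_score(rubric, run) for run in runs) / len(runs)
    return mean >= threshold


# Example rubric: accuracy counts three times as much as tone.
rubric = [Criterion("accuracy", 3.0), Criterion("tone", 1.0)]
runs = [
    {"accuracy": 1.0, "tone": 0.0},
    {"accuracy": 0.5, "tone": 0.5},
]
passed = ci_gate(rubric, runs, threshold=0.6)
```

In a CI job, a script like this would typically exit non-zero when `ci_gate` returns `False`, so a drop in the weighted score blocks the merge.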

Weaknesses

• Crowded market with established players like LangSmith and Arize
• Proprietary SaaS model creates vendor lock-in for eval data

Post Description

I was tired of stitching together Langfuse for tracing, promptfoo for red teaming and evals, and custom scripts for CI/CD. It was a mess, so I built EvalsHub.

EvalsHub does all of it in one place: automatic production scoring, red teaming, prompt versioning, and CI/CD integration. You can go from zero to full eval coverage in 30 minutes.

Would love brutal feedback from anyone shipping AI in production.

evalshub.ai