GitHub Repository

Evaluate structured LLM outputs with precision. Compare model outputs against expected schemas and values — row by row.

4 stars · TypeScript

EvalLens – Open-source tool to evaluate structured LLM outputs

by simonrendon · Apr 6, 2026 · 1 point · 0 comments

AI Analysis

●● Solid · Niche Gem · Ship It

Schema conformance checks beat generic text evals for JSON-heavy LLM pipelines.
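
As a rough illustration of the idea (not EvalLens's actual API), a row-by-row conformance check can report why a structured output failed rather than only pass/fail. The `Failure` categories, the `Row` shape, and the `evaluateRow` helper below are all hypothetical names chosen for this sketch.

```typescript
// Hypothetical failure categories for this sketch; not taken from EvalLens.
type Failure =
  | { kind: "invalid_json" }
  | { kind: "missing_field"; field: string }
  | { kind: "type_mismatch"; field: string; expected: string; actual: string }
  | { kind: "value_mismatch"; field: string };

// One evaluation row: the expected object and the raw model output string.
type Row = { expected: Record<string, unknown>; actual: string };

// typeof lumps null, arrays, and plain objects together, so split them out.
function typeName(v: unknown): string {
  if (v === null) return "null";
  return Array.isArray(v) ? "array" : typeof v;
}

// Returns every failure found in one row; an empty array means the row passed.
function evaluateRow(row: Row): Failure[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(row.actual);
  } catch {
    return [{ kind: "invalid_json" }];
  }
  if (typeName(parsed) !== "object") {
    return [{ kind: "type_mismatch", field: "$", expected: "object", actual: typeName(parsed) }];
  }
  const actual = parsed as Record<string, unknown>;
  const failures: Failure[] = [];
  for (const [field, want] of Object.entries(row.expected)) {
    if (!(field in actual)) {
      failures.push({ kind: "missing_field", field });
    } else if (typeName(actual[field]) !== typeName(want)) {
      failures.push({
        kind: "type_mismatch",
        field,
        expected: typeName(want),
        actual: typeName(actual[field]),
      });
    } else if (JSON.stringify(actual[field]) !== JSON.stringify(want)) {
      // Simplification: nested objects are compared by serialized form,
      // which is sensitive to key order.
      failures.push({ kind: "value_mismatch", field });
    }
  }
  return failures;
}

const rows: Row[] = [
  { expected: { name: "Ada", age: 36 }, actual: '{"name":"Ada","age":36}' },
  { expected: { name: "Ada", age: 36 }, actual: '{"name":"Ada","age":"36"}' },
  { expected: { name: "Ada", age: 36 }, actual: '{"name":"Ada"' },
];
const report: Failure[][] = rows.map(evaluateRow);
```

Here the first row passes, the second is flagged as a type mismatch on `age`, and the third as invalid JSON. A generic text-similarity score would rate all three outputs as close to the expected value.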

Strengths
  • Failure taxonomy explains why structured outputs broke instead of just binary pass/fail.
  • Self-hosted mode generates actuals via API keys without sending data externally.
  • Exports branded PDF reports for sharing regression testing results with stakeholders.
Weaknesses
  • Only 4 GitHub stars, which suggests a very early stage with unproven community traction.
  • Hosted version requires uploading sensitive prompt/output data to external servers.
Category
Target Audience

AI engineers building structured output pipelines

Similar To

LangSmith · Ragas · Arize Phoenix

Similar Projects

AI/ML · ●● Solid

Valohai LLM – Track and compare LLM evaluation results in one dashboard

Streams evals from a tiny Python client into a shared dashboard and lets you run parameter sweeps and compare up to six configurations with radar/bar charts and scorecards, the sort of tooling that stops results from getting lost in notebooks. A useful, pragmatic product for teams that repeatedly evaluate models, but it competes with general observability and experiment trackers (W&B, Neptune) and will need strong integrations and metric flexibility to stand out.

Niche Gem · Solve My Problem
radicain
30 · 4mo ago