Custom datasets for testing AI agents

Author: rishavmitra

by rishavmitra·Mar 18, 2026·3 points·2 comments


AI Analysis

●Mid · Ship It

CSV-based agent testing works, but LangSmith already owns this evaluation workflow.

Strengths

•Upload CSVs with expected outputs to validate agent behavior quickly.
•Generates new test cases from existing data to cover edge cases.

Weaknesses

•Landing page lacks pricing or feature differentiation details.
•Direct competition with established LLMOps platforms like LangSmith.

Post Description

We just shipped a new feature in Zalor: custom datasets for agent testing.

You can now:

•Upload CSVs with real inputs and expected outputs
•Run your agent against those datasets
•Generate new test cases from existing ones to cover edge cases

This makes it easier to test scenarios you were previously testing manually and catch regressions when your agent changes.
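
For readers who want a feel for the workflow, here is a minimal sketch of running an agent against a CSV of inputs and expected outputs. It is not Zalor's API: the column names `input` and `expected_output`, the `run_dataset` helper, and the stand-in `echo_upper_agent` are all assumptions made for illustration, and exact string matching is only the simplest possible comparison.

```python
import csv
import io
from typing import Callable


def run_dataset(agent: Callable[[str], str], csv_text: str) -> list[dict]:
    """Run the agent on every CSV row and return the rows whose output differs.

    Assumes (hypothetically) that the CSV has `input` and `expected_output` columns.
    """
    failures = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        actual = agent(row["input"])
        # Exact match after trimming whitespace; a real evaluator might use
        # fuzzy or model-graded comparison instead.
        if actual.strip() != row["expected_output"].strip():
            failures.append(
                {
                    "input": row["input"],
                    "expected": row["expected_output"],
                    "actual": actual,
                }
            )
    return failures


def echo_upper_agent(prompt: str) -> str:
    """Hypothetical stand-in for a real agent: upper-cases the prompt."""
    return prompt.upper()


# Tiny in-memory dataset; in practice this would be read from an uploaded file.
SAMPLE_CSV = (
    "input,expected_output\n"
    "hello,HELLO\n"
    "refund please,REFUND PLEASE\n"
    "bye,Goodbye\n"
)

failures = run_dataset(echo_upper_agent, SAMPLE_CSV)
print(f"{len(failures)} failing case(s)")
for failure in failures:
    print(failure)
```

Re-running the same dataset after each change to the agent and comparing the failure list is one simple way to spot regressions.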

Demo below. Would love feedback from anyone building agents.

Similar Projects

AI/ML●Mid

Apery – Synthetic Data Generator for AI Agents

Yet another synthetic data tool when Faker and Mockaroo already exist.

Ship It

compuficial

2122d ago

Developer Tools●●●Banger

Focused input cuts LLM output tokens by 63% bench on CC with FastAPI

Dependency-graph filtering cuts output tokens 63%, not just input—Claude stops narrating when focused.

Solve My Problem · Wizardry · Ship It

nicola_alessi

203mo ago

AI/ML●●Solid

Open Agent Spec. Treat AI agents like typed functions not prompt chains

Schema validation catches LLM output mismatches before they break downstream systems.

Big Brain · Niche Gem

andrewvector

212mo ago

Infrastructure●●●Banger

IP ranges for 22 cloud providers in 12 formats, updated daily

One-click firewall rules for 22 providers—no more hunting AWS/Azure/GCP feeds separately.

Solve My Problem · Slick

rezmoss

204mo ago

Developer Tools●●Solid

Npx Claude-traces, visualizer for Claude Code/Agent SDK traces

Runs with one npx command and immediately surfaces a helpful timeline view with token counts, tool I/O panes and subagent nesting — exactly the sort of visibility you want when an agent goes off the rails. Cleverly reads the local ~/.claude/projects traces so setup is trivial, but its usefulness is limited by being Claude-only and local; add search/aggregation or a team-sharing mode and this jumps up a tier.

Niche Gem · Solve My Problem · Slick

hahawhatsgood

204mo ago

AI/ML●●Solid

Vesper – MCP-native tool that automates dataset prep for AI agents

MCP-native tool lets AI agents fetch and clean datasets without human intervention.

Niche Gem · Solve My Problem · Ship It

sultanchek

202mo ago