Custom datasets for testing AI agents

by rishavmitra · Mar 18, 2026 · 3 points · 2 comments

AI Analysis

Mid · Ship It

CSV-based agent testing works but LangSmith already owns this evaluation workflow.

Strengths
  • Upload CSVs with expected outputs to validate agent behavior quickly.
  • Generates new test cases from existing data to cover edge cases.
Weaknesses
  • Landing page lacks pricing or feature differentiation details.
  • Direct competition with established LLMOps platforms like LangSmith.
Target Audience

AI engineers, ML teams

Similar To

LangSmith · Braintrust · Arize Phoenix

Post Description

We just shipped a new feature in Zalor: custom datasets for agent testing.

You can now:
  • Upload CSVs with real inputs and expected outputs
  • Run your agent against those datasets
  • Generate new test cases from existing ones to cover edge cases

This makes it easier to test scenarios you were previously testing manually and catch regressions when your agent changes.
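To make the workflow concrete, here is a minimal sketch of a CSV-driven regression check. It is not Zalor's API: the column names `input` and `expected_output`, the `run_dataset` helper, and the `toy_agent` stand-in are all assumptions made for illustration, and the exact-match comparison is only the simplest possible scoring rule.

```python
import csv
import io
from typing import Callable


def run_dataset(csv_text: str, agent: Callable[[str], str]) -> list[dict]:
    """Run `agent` on every CSV row and record whether its output matches.

    Assumes (hypothetically) that the CSV has `input` and `expected_output` columns.
    """
    results = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        actual = agent(row["input"])
        results.append({
            "input": row["input"],
            "expected": row["expected_output"],
            "actual": actual,
            # Exact match after trimming whitespace; real agent outputs
            # usually need a more forgiving comparison.
            "passed": actual.strip() == row["expected_output"].strip(),
        })
    return results


def toy_agent(prompt: str) -> str:
    """Hypothetical stand-in for a real agent: it just upper-cases the prompt."""
    return prompt.upper()


SAMPLE_CSV = """input,expected_output
hello,HELLO
refund please,REFUND PLEASE
Order 42,order 42
"""

results = run_dataset(SAMPLE_CSV, toy_agent)
failures = [r for r in results if not r["passed"]]
print(f"{len(results) - len(failures)}/{len(results)} passed")
for failure in failures:
    print(f"FAIL: {failure['input']!r} -> {failure['actual']!r}, "
          f"expected {failure['expected']!r}")
```

Re-running the same dataset after each agent change is what surfaces regressions: a row that passed before and fails now points at the change that broke it.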

Demo below. Would love feedback from anyone building agents.

Similar Projects

AI/ML · ●● Solid

Open Agent Spec. Treat AI agents like typed functions not prompt chains

Schema validation catches LLM output mismatches before they break downstream systems.

Big Brain · Niche Gem
andrewvector
212mo ago
Infrastructure · ●●● Banger

IP ranges for 22 cloud providers in 12 formats, updated daily

One-click firewall rules for 22 providers—no more hunting AWS/Azure/GCP feeds separately.

Solve My Problem · Slick
rezmoss
204mo ago
Developer Tools · ●● Solid

Npx Claude-traces, visualizer for Claude Code/Agent SDK traces

Runs with one npx command and immediately surfaces a helpful timeline view with token counts, tool I/O panes and subagent nesting — exactly the sort of visibility you want when an agent goes off the rails. Cleverly reads the local ~/.claude/projects traces so setup is trivial, but its usefulness is limited by being Claude-only and local; add search/aggregation or a team-sharing mode and this jumps up a tier.

Niche Gem · Solve My Problem · Slick
hahawhatsgood
204mo ago