Black-box API bug detection across 7 AI systems
Execution-based scoring with live APIs beats LLM-graded benchmarks, but they evaluated themselves.
Simple-to-use, cross-platform BDD driver for black-box testing
Tests live in README as plain English; clever partial parsing eliminates Gherkin boilerplate overhead.
CLI tool developers, documentation-first teams, technical writers
Gherkin/Cucumber · BATS (Bash Automated Testing System)
Why bbt?
- Zero duplication: Test scenarios live in your READMEs, user guides, or any Markdown file.
- Natural language: Steps like `When I run gcc --version` → `Then the output contains 14.2.0` just work.
- No learning curve: Partial parsing means you write almost-normal English; bbt extracts the test logic.
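To make the partial-parsing idea concrete, here is a minimal Python sketch. It is not bbt's implementation or grammar: the two regular expressions, the `run_scenario` helper, and the assumption that the command and the expected text are wrapped in backticks are all hypothetical and chosen only for illustration. The point is that the parser keys on a few words ("when ... run", "then ... output ... contains") and treats everything else as ordinary prose.

```python
import re
import shlex
import subprocess

# Hypothetical step patterns, not bbt's real grammar. Only a few keywords
# matter; any other words in the sentence are skipped.
RUN_STEP = re.compile(r"\bwhen\b.*?\brun\b\s+`(?P<cmd>[^`]+)`", re.IGNORECASE)
CONTAINS_STEP = re.compile(
    r"\bthen\b.*?\boutput\b.*?\bcontains\b\s+`(?P<text>[^`]+)`", re.IGNORECASE
)


def run_scenario(lines):
    """Execute the steps found in `lines`; return a list of (step, passed)."""
    output = ""
    results = []
    for line in lines:
        run = RUN_STEP.search(line)
        if run:
            # Run the quoted command and remember what it printed.
            proc = subprocess.run(
                shlex.split(run["cmd"]), capture_output=True, text=True
            )
            output = proc.stdout + proc.stderr
            continue
        check = CONTAINS_STEP.search(line)
        if check:
            # Compare against the output of the most recent "run" step.
            results.append((line.strip(), check["text"] in output))
        # Lines matching neither pattern are documentation and are ignored.
    return results
```

Feeding it the lines of a Markdown file would run every recognized step and leave the surrounding documentation untouched, which is the property the "zero duplication" claim relies on.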
Per-span confidence scores let you review uncertain OCR before trusting 200k-page runs.
Auto-generates API tests from OpenAPI specs, even though Schemathesis and Postman already exist.
Single source of truth beats drifting Postman collections, but it is still in early alpha.
Replaces flaky LLM judges with strict Python equality checks for tool arguments.
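As a rough illustration of what a strict equality check on tool arguments can look like, the sketch below compares an expected tool call with the one an agent actually produced. The dictionary layout (`name`, `arguments`) and the helper names `tool_call_matches` and `score` are assumptions made for this example, not the project's actual API.

```python
def tool_call_matches(expected, actual):
    """Return True only if the tool name and all arguments are exactly equal.

    Python's == on dicts ignores key order but requires identical keys and
    equal values, so "Paris" and "paris" count as different arguments.
    """
    return (
        expected["name"] == actual["name"]
        and expected["arguments"] == actual["arguments"]
    )


def score(cases):
    """Fraction of (expected, actual) pairs that match exactly."""
    if not cases:
        return 0.0
    passed = sum(tool_call_matches(e, a) for e, a in cases)
    return passed / len(cases)
```

Unlike an LLM judge, this check is deterministic: the same pair of calls always yields the same verdict, at the cost of rejecting answers that are semantically equivalent but not literally equal.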
Agent testing platform, but the screenshot shows only a login page; there is no actual product demo or proof.