I generated a "stress test" of 200 rare defects from 7 real photos


by jmalevez·Feb 12, 2026·6 points·4 comments


AI Analysis

Solid · Niche Gem · Solve My Problem

Synthetic rare-defect dataset solves real validation gap, but relies on closed Silera tool.

Strengths

• Addresses a genuine pain point: most models fail on rare failure modes they have not seen, and test coverage is too thin to catch it.
• Procedurally generating 200 labeled variants from 7 real samples is a clever data-multiplication technique.
• Pixel-perfect COCO/YOLO labels from the rendering pipeline eliminate manual annotation overhead.

Weaknesses

• The generative process is proprietary (Silera Studio); users can't reproduce it or adapt it to other defects.
• Limited to one domain (broken insulators); it is unclear whether the approach generalizes or whether more datasets will follow.
• No comparative benchmarks show actual recall improvements over standard test sets from other inspection tools.

Post Description

Hello HN,

I work on vision systems for structural inspection. A common pain point is that while we have plenty of "healthy" images, we often lack a reliable "Golden Set" of rare failures (like shattered porcelain) to validate our models before deployment.

You can't trust your model's recall if your test set has only 5 examples of the failure mode.
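To put a rough number on that, here is a minimal sketch that computes a 95% Wilson score interval for recall. The Wilson interval is just one reasonable choice of technique, not something the post prescribes, and the helper name `wilson_interval` is hypothetical.

```python
import math

def wilson_interval(hits: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (here: recall).

    hits = defects the model detected, n = defects in the test set.
    z = 1.96 corresponds to a 95% interval.
    """
    if n == 0:
        raise ValueError("need at least one positive example")
    p = hits / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)

# Same observed recall (100%), very different certainty.
for hits, n in [(5, 5), (200, 200)]:
    low, high = wilson_interval(hits, n)
    print(f"{hits}/{n} detected: 95% interval for recall = [{low:.2f}, {high:.2f}]")
```

With 5 out of 5 detected, the lower bound is about 0.57, so a "perfect" score is still consistent with missing more than a third of real failures. With 200 out of 200, the lower bound rises to about 0.98.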

To fix this, I built a pipeline to generate such datasets. In this example, I took 7 real-world defect samples, extracted their topology and texture, and procedurally generated 200 hard-to-detect variations across different lighting conditions and backgrounds.
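The actual generation pipeline is not described in this post, so the following is not that method. It is only a toy sketch, under made-up assumptions, of the general idea of placing a masked defect patch on a background under varying brightness. It also shows why labels from such a step can be exact: the mask that places the defect yields the bounding box too. The function `composite`, the image sizes, and the pixel values are all hypothetical.

```python
import random

def composite(background, patch, mask, top, left, gain):
    """Paste `patch` where `mask` is 1, scale brightness by `gain`, and
    return the new image plus a COCO-style box [x, y, width, height]."""
    image = [row[:] for row in background]  # copy so the input is untouched
    xs, ys = [], []
    for r, mask_row in enumerate(mask):
        for c, on in enumerate(mask_row):
            if on:
                image[top + r][left + c] = patch[r][c]
                ys.append(top + r)
                xs.append(left + c)
    # Crude global "lighting" change, clipped to the 8-bit range.
    image = [[min(255, int(v * gain)) for v in row] for row in image]
    # The box comes straight from the pasted pixels, so it is exact.
    bbox = [min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1]
    return image, bbox

rng = random.Random(0)                      # fixed seed for repeatability
background = [[120] * 8 for _ in range(6)]  # flat grey image, 8 wide by 6 high
patch = [[30, 30], [30, 30]]                # dark 2x2 "defect" texture
mask = [[1, 0], [1, 1]]                     # L-shaped defect outline

variants = []
for _ in range(3):
    top, left = rng.randint(0, 4), rng.randint(0, 6)  # keep patch in bounds
    variants.append(composite(background, patch, mask, top, left,
                              rng.uniform(0.6, 1.4)))
```

A real pipeline would work on photographs and handle blending, perspective, and realistic lighting; the point here is only that position, appearance, and label come from the same parameters.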

I’m releasing this batch of broken insulators (CC0) specifically to help teams benchmark their model's recall on rare classes:

https://www.silera.ai/blog/free-200-broken-insulators-datase...

- Input: 7 real samples.

- Output: 200 fully labeled evaluation images (COCO/YOLO).

- Use Case: Validation / Test Set (not full training).
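For anyone wanting to use a set like this as a recall benchmark, a minimal sketch of recall at an IoU threshold for one image might look like the code below. It assumes COCO-style boxes `[x, y, width, height]` and uses a simplified greedy matching that ignores confidence scores, so it is not the official COCO evaluation procedure. The helper names `iou_xywh` and `recall_at_iou` and the example boxes are hypothetical.

```python
def iou_xywh(a, b):
    """IoU of two COCO-style boxes [x, y, width, height]."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def recall_at_iou(gt_boxes, pred_boxes, thr=0.5):
    """Fraction of ground-truth boxes matched by a distinct prediction."""
    unused = list(pred_boxes)  # each prediction may match at most one defect
    hits = 0
    for gt in gt_boxes:
        best = max(unused, key=lambda p: iou_xywh(gt, p), default=None)
        if best is not None and iou_xywh(gt, best) >= thr:
            hits += 1
            unused.remove(best)
    return hits / len(gt_boxes) if gt_boxes else 0.0

# Two labeled defects; the model finds one and fires once on background.
gt = [[10, 10, 50, 50], [100, 100, 40, 40]]
pred = [[12, 12, 50, 50], [300, 300, 20, 20]]
print(recall_at_iou(gt, pred))  # 0.5
```

To score a whole test set, sum hits and ground-truth counts across images rather than averaging per-image recall, so images with more defects are weighted correctly.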

How do you guys currently validate recall for "1 in 10,000" edge cases?

Jérôme