StageWright – A performance-focused Playwright reporter with AI

by qagaryparker · Feb 26, 2026 · 1 point · 0 comments

AI Analysis

●● Solid · Slick · Solve My Problem

Flakiness grades plus 2-sigma anomaly detection beat default retry logic, but Playwright Cloud exists.

Strengths

• Flakiness grading across runs (A+–F) is a genuinely useful signal that Playwright's native reporters skip.
• Flamechart step timeline with Nav/Action/API categorization makes bottleneck hunting concrete.
• 2-sigma anomaly detection catches real regressions; trend analytics over raw pass/fail counts is mature thinking.

Weaknesses

• Playwright Cloud, Currents, and TestProject all do test analytics; StageWright is polished but not differentiated.
• AI analysis is a black-box LLM upsell (5K analyses/month paywall); it claims to be 'actionable', but reviewable examples are missing.

Post Description

Hi HN,

I’m the creator of StageWright (and the open-source playwright-smart-reporter).

I’ve been frustrated by the "black box" nature of E2E test failures. Standard reporters tell you that a test failed, but they don't help you understand why it’s failing across 50 different runs or whether its execution time is trending toward a regression.

I built StageWright to treat test results as a performance and stability dataset.

Key Technical Features:

Historical Flakiness Detection: Unlike Playwright's default "retry" logic, we track failures across runs. A test only gets a high "Stability Grade" if it consistently passes over time.

Flamechart Step Timelines: We added a color-coded flamechart for test steps (v1.0.8). It categorizes steps into Navigation, Action, and API, making it easy to see if a 10s test is hanging on a locator or a slow backend response.

2-Sigma Anomaly Detection: The trends view uses moving averages and 2-sigma outlier detection to flag performance regressions that might otherwise go unnoticed.

AI-Powered Failure Clustering: We batch failures and use Claude/GPT-4 to cluster similar errors. Instead of 20 separate failures, you see "1 cluster: TimeoutError on payment-submit-btn."

Virtual Scroll Performance: We optimized the UI with virtual scrolling to handle suites with 500+ tests without the browser freezing—a common issue with the default HTML reporter.

Native Trace & Network Logs: Traces and network waterfalls are embedded directly in the report. No downloading .zip files from CI; they open instantly in an inline viewer.
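
To make the 2-Sigma Anomaly Detection item above more concrete, here is a minimal sketch of a trailing moving average with a 2-sigma threshold. It is not StageWright's actual implementation: the function name `flag_anomalies`, the window size, and the `min_sd_ms` floor are all assumptions chosen for illustration, and it flags only slow-side outliers because the post talks about performance regressions.

```python
from statistics import mean, stdev


def flag_anomalies(durations_ms, window=10, sigmas=2.0, min_sd_ms=1.0):
    """Return indices of runs that are slower than the trailing mean
    by more than `sigmas` standard deviations.

    All parameter names and defaults are illustrative assumptions.
    """
    flagged = []
    for i in range(window, len(durations_ms)):
        # Trailing window: the `window` runs immediately before run i.
        history = durations_ms[i - window:i]
        mu = mean(history)
        # Floor the deviation so a perfectly flat history does not
        # flag a 1 ms wobble as an outlier.
        sd = max(stdev(history), min_sd_ms)
        if durations_ms[i] > mu + sigmas * sd:
            flagged.append(i)
    return flagged


# Twelve stable runs around 1000 ms with one slow run at index 11.
durations = [1000, 1020, 980, 1010, 990, 1005, 995,
             1015, 985, 1000, 1008, 1900, 1002]
print(flag_anomalies(durations))  # [11]
```

Note that the slow run inflates the mean and deviation of the windows that follow it, so a real trends view might exclude already-flagged points from later baselines.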

The Architecture: StageWright is built to be "Playwright-native." It hooks into the reporter API and can run locally (outputting a standalone HTML/JSON history) or via our new Starter/Pro cloud tiers. The Pro tier provides a centralized dashboard for teams, long-term history retention, and cross-project analytics.
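
As a rough illustration of the local mode, the sketch below shows how a pytest-playwright setup could append each outcome to a JSON history and derive a stability grade from it. It uses pytest's real `pytest_runtest_logreport` hook, but everything else is hypothetical: the file name `test-history.json`, the helper names, and the grade cutoffs are assumptions, not StageWright's actual format or thresholds.

```python
import json
from pathlib import Path

HISTORY_PATH = Path("test-history.json")  # hypothetical file name

# Illustrative pass-rate cutoffs; the real grading rules are not published here.
GRADE_CUTOFFS = [(1.0, "A+"), (0.98, "A"), (0.95, "B"), (0.90, "C"), (0.80, "D")]


def load_history(path=HISTORY_PATH):
    """Load the per-test run history, or start empty on the first run."""
    if path.exists():
        return json.loads(path.read_text())
    return {}


def record_outcome(history, test_id, passed, duration_s, keep=50):
    """Append one run for `test_id`, keeping only the most recent `keep` runs."""
    runs = history.setdefault(test_id, [])
    runs.append({"passed": passed, "duration_s": duration_s})
    del runs[:-keep]
    return history


def stability_grade(runs):
    """Map the pass rate over recorded runs to a letter grade."""
    if not runs:
        return None
    rate = sum(1 for r in runs if r["passed"]) / len(runs)
    for cutoff, grade in GRADE_CUTOFFS:
        if rate >= cutoff:
            return grade
    return "F"


def pytest_runtest_logreport(report):
    """pytest hook (place in conftest.py): record the result of each test body."""
    if report.when != "call":
        return  # skip setup/teardown phases
    history = load_history()
    record_outcome(history, report.nodeid, report.passed, report.duration)
    HISTORY_PATH.write_text(json.dumps(history, indent=2))
```

Rewriting the file after every test keeps the sketch short; a real reporter would more likely buffer results and write once at the end of the session. Because the grade is computed over many runs, a test that passes only on retry still drags its pass rate down, which is the distinction the post draws against per-run retry logic.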

I’m currently supporting both Node.js and Python (pytest-playwright) environments.

I’d love to hear what the community thinks—especially regarding how you handle "test debt" in large CI pipelines. I'm here for any questions!