Software Testing Metrics Every QA Lead Should Track

Three things to know before reading:

A defect escape rate above 20% means one in five bugs is reaching users. The IBM Systems Sciences Institute reported that fixing a production bug costs up to 100x more than catching it during design.
Google's engineering team found that 16% of all tests at Google show some flakiness, and 84% of pass-to-fail transitions in its CI system involved a flaky test. If you aren't measuring flaky test rate, you can't tell how far to trust your CI feedback.
The Capgemini World Quality Report found that up to 50% of automation budgets get consumed by script maintenance. Test maintenance ratio tells you whether your QA team is expanding coverage or just keeping the lights on.

Software testing metrics are only useful if they change a decision. Most QA dashboards are packed with numbers that nobody acts on. The metrics that matter fall into four categories: defect detection, coverage, suite health, and speed. Getting even two of these right gives a QA lead enough data to make release calls, allocate engineer time, and show stakeholders where quality actually stands.

The problem is knowing which ones to start with. Teams that track automated regression testing results without connecting them to defect escape rate or flaky test rate are measuring activity, not outcomes. The eight metrics below are organized by the question they answer.

Software testing metrics that tell you if bugs are escaping

These two metrics answer the most basic question in QA: are we finding bugs before users do?

Defect escape rate

Defect escape rate is the percentage of bugs that reach production out of all bugs found (in testing + in production). It's the single most direct measure of whether your testing process is working.

Formula: (Defects found in production / Total defects found) x 100

A team that finds 80 bugs during testing and 20 in production has an escape rate of 20%. That means one in five bugs is getting past QA.

According to the IBM Systems Sciences Institute, fixing a defect found after release costs up to 100x more than fixing it during design. The exact multiplier is debated, but the direction is consistent: later detection means higher cost in engineering time, incident response, and customer impact.

Benchmark: Below 10% is strong. Between 10-20% is average for teams shipping weekly. Above 20% means your testing coverage or test quality has gaps that need attention.
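As a rough sketch, the formula and the benchmark bands above can be expressed in a few lines of Python. The function names are made up for illustration, and placing exactly 10% and exactly 20% in the "average" band is an assumption, since the benchmark does not say which side the boundaries fall on.

```python
def defect_escape_rate(found_in_testing: int, found_in_production: int) -> float:
    """Percentage of all known defects that were first found in production."""
    total = found_in_testing + found_in_production
    if total == 0:
        raise ValueError("no defects recorded; escape rate is undefined")
    return found_in_production * 100 / total


def escape_rate_band(rate: float) -> str:
    """Map a rate onto the benchmark bands (boundary handling is an assumption)."""
    if rate < 10:
        return "strong"
    if rate <= 20:
        return "average"
    return "gaps need attention"


# The worked example from this section: 80 bugs found in testing, 20 in production.
rate = defect_escape_rate(found_in_testing=80, found_in_production=20)
print(f"{rate:.1f}% -> {escape_rate_band(rate)}")
```

The counts are only as good as your defect tracking: a production bug that never gets logged lowers the rate without improving quality.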

What to do when it's high:

Check whether your tests cover the flows where production bugs are actually appearing. Often the gap isn't "not enough tests" but "tests in the wrong places."
Review your smoke testing in the CI/CD pipeline. A weak smoke suite lets obvious regressions through.
Look at the severity distribution. If escaping bugs are mostly cosmetic, that's different from critical crashes reaching users.

Defect removal efficiency (DRE)

DRE is the inverse view. Instead of asking "what percentage escaped," it asks "what percentage did we catch."

Formula: (Defects found before release / Total defects found) x 100

Using the same example: 80 found in testing out of 100 total gives a DRE of 80%.

Benchmark: 85% or higher is the target for teams shipping mobile or web apps on a regular release cycle. The CISQ 2022 report found that poor software quality, driven largely by defects that escape QA, cost the US economy $2.41 trillion. High DRE is the first line of defense against that.
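One way to track DRE per release is sketched below. The release names and the second set of counts are hypothetical; the first release reuses the 80/20 example. Because DRE and defect escape rate share the same denominator, the two always add up to 100%.

```python
def defect_removal_efficiency(found_before_release: int, found_in_production: int) -> float:
    """Percentage of all known defects that were caught before release."""
    total = found_before_release + found_in_production
    if total == 0:
        raise ValueError("no defects recorded; DRE is undefined")
    return found_before_release * 100 / total


TARGET_DRE = 85.0  # the target quoted in this section

# Hypothetical per-release counts: (found before release, found in production).
releases = {"1.4.0": (80, 20), "1.5.0": (45, 5)}

for version, (pre_release, production) in releases.items():
    dre = defect_removal_efficiency(pre_release, production)
    status = "meets target" if dre >= TARGET_DRE else "below target"
    print(f"{version}: DRE {dre:.1f}%, escape rate {100 - dre:.1f}% ({status})")
```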

Software testing metrics that measure coverage

Coverage metrics answer: how much of the product do your tests actually touch?

Test case coverage

This is the percentage of defined requirements or user stories that have at least one test case mapped to them.

Formula: (Requirements with test coverage / Total requirements) x 100

This metric matters because untested requirements are an unknown risk. If your team has 200 user stories and only 120 have tests, those 80 gaps are where production bugs are most likely to show up.

Benchmark: 100% for anything marked as critical. 80% or higher across the board. Below 60% means large parts of the product are shipping without any automated validation.
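If your test management tool can export a mapping from requirements to test cases, the calculation is short. The sketch below assumes such a mapping as a plain dictionary; the story and test IDs are invented for illustration. It also returns the uncovered requirements, since that list is usually more actionable than the percentage.

```python
def requirement_coverage(requirement_tests: dict[str, list[str]]) -> tuple[float, list[str]]:
    """Return (coverage percentage, sorted list of requirements with no tests)."""
    if not requirement_tests:
        raise ValueError("no requirements supplied; coverage is undefined")
    uncovered = sorted(req for req, tests in requirement_tests.items() if not tests)
    covered = len(requirement_tests) - len(uncovered)
    return covered * 100 / len(requirement_tests), uncovered


# Hypothetical export: user story ID -> IDs of the test cases mapped to it.
mapping = {
    "STORY-101": ["TC-1", "TC-2"],
    "STORY-102": ["TC-3"],
    "STORY-103": [],
    "STORY-104": ["TC-4"],
    "STORY-105": [],
}

coverage, gaps = requirement_coverage(mapping)
print(f"Coverage: {coverage:.1f}%, untested: {gaps}")
```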

Automation coverage

Automation coverage is the percentage of your total test cases that are automated versus manual.

Formula: (Automated test cases / Total test cases) x 100

This isn't about automating everything. Some tests (exploratory testing, usability testing, edge cases that change every sprint) should stay manual. But your regression suite, your smoke tests, and your core transaction flows should be automated.

Benchmark: For regression suites, 70-80% automation coverage is a reasonable target. For mobile apps specifically, this number tends to be lower because of the test maintenance overhead that selector-based mobile frameworks create.
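Since the benchmark applies to the regression suite rather than to every test case, it helps to compute the number per suite. The sketch below assumes each test case carries a suite label and an automated flag; the `CaseRecord` type and the sample cases are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CaseRecord:
    name: str
    suite: str        # e.g. "regression", "smoke", "exploratory"
    automated: bool


def automation_coverage(cases: list[CaseRecord], suite: Optional[str] = None) -> float:
    """Percentage of automated cases, optionally restricted to one suite."""
    selected = [c for c in cases if suite is None or c.suite == suite]
    if not selected:
        raise ValueError("no matching test cases; coverage is undefined")
    automated = sum(1 for c in selected if c.automated)
    return automated * 100 / len(selected)


# Hypothetical inventory of test cases.
cases = [
    CaseRecord("login", "regression", True),
    CaseRecord("checkout", "regression", True),
    CaseRecord("refund", "regression", True),
    CaseRecord("profile edit", "regression", False),
    CaseRecord("onboarding usability", "exploratory", False),
]

print(f"Overall: {automation_coverage(cases):.1f}%")
print(f"Regression only: {automation_coverage(cases, suite='regression'):.1f}%")
```

Here the overall figure looks low, but the regression suite alone sits inside the 70-80% target, which is the number the benchmark is about.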

Metrics that show if the test suite is healthy

These are the metrics most teams ignore, and the ones that cost the most time. A test suite with high coverage but high flakiness is worse than a smaller suite that's stable.

Flaky test rate

Flaky test rate is the percentage of tests that produce inconsistent results (pass sometimes, fail sometimes) on the same code.

Formula: (Tests with inconsistent results over N runs / Total tests) x 100

The Google Testing Blog (John Micco) reported that 16% of all tests at Google exhibit some level of flakiness, and 84% of pass-to-fail transitions in their CI system involved a flaky test. Even at Google, with world-class engineering, flakiness is a persistent problem.

Benchmark: Below 3% is healthy. Between 3-8% is manageable with quarantine practices. Above 8% means your CI feedback is unreliable and developers will stop trusting test results.
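A minimal way to apply the formula is to rerun the suite several times against the same commit and flag any test that both passed and failed. The sketch below assumes you can collect that history as a dictionary of pass/fail lists; the test names and results are invented.

```python
def flaky_tests(history: dict[str, list[bool]]) -> list[str]:
    """Tests that both passed and failed across runs of the same commit."""
    return sorted(name for name, results in history.items() if len(set(results)) > 1)


def flaky_test_rate(history: dict[str, list[bool]]) -> float:
    """Percentage of tests with inconsistent results over the recorded runs."""
    if not history:
        raise ValueError("no test history supplied; flaky rate is undefined")
    return len(flaky_tests(history)) * 100 / len(history)


# Hypothetical results from five runs of the same commit (True = pass).
history = {
    "test_login": [True, True, True, True, True],
    "test_checkout": [True, False, True, True, False],   # inconsistent: flaky
    "test_search": [False, False, False, False, False],  # consistent failure: a real bug
    "test_profile": [True, True, True, True, True],
}

print(f"Flaky: {flaky_tests(history)}")
print(f"Flaky test rate: {flaky_test_rate(history):.1f}%")
```

Note that a test failing on every run is not counted as flaky. It is failing consistently, which points to a real defect or a broken test rather than nondeterminism.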

What to do when it's high:

Identify the top 10 flakiest tests by failure count. Fix or quarantine them first.
Check for timing dependencies: tests that fail because an element hasn't loaded yet, or an API call is slow.
Isolate test state. If tests share data or depend on execution order, that's a flake source.

Test maintenance ratio

This is the percentage of total QA time spent maintaining existing tests versus writing new ones or doing exploratory testing.

Formula: (Hours spent on test maintenance / Total QA hours) x 100

The Capgemini World Quality Report has found that up to 50% of automation budgets get consumed by script maintenance. When this number is high, your team looks busy but isn't expanding coverage or catching new bugs.

Benchmark: Below 30% is healthy. Between 30-50% means your suite is creating drag. Above 50% means maintenance is consuming more time than the suite is worth.
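If the team logs time by activity, the ratio can be computed from a weekly or per-sprint breakdown. The sketch below assumes hours are tagged with a "maintenance" category; the category names and numbers are hypothetical, and treating exactly 30% and 50% as "creating drag" is an assumption.

```python
def maintenance_ratio(hours_by_activity: dict[str, float], maintenance_key: str = "maintenance") -> float:
    """Percentage of total QA hours spent maintaining existing tests."""
    total = sum(hours_by_activity.values())
    if total == 0:
        raise ValueError("no hours logged; maintenance ratio is undefined")
    return hours_by_activity.get(maintenance_key, 0) * 100 / total


def maintenance_band(ratio: float) -> str:
    """Map a ratio onto the benchmark bands (boundary handling is an assumption)."""
    if ratio < 30:
        return "healthy"
    if ratio <= 50:
        return "creating drag"
    return "costing more than the suite is worth"


# Hypothetical time log for one QA engineer over a 40-hour week.
week = {"maintenance": 14, "new tests": 16, "exploratory": 10}

ratio = maintenance_ratio(week)
print(f"{ratio:.1f}% -> {maintenance_band(ratio)}")
```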

What to do when it's high:

Audit which tests break most often. The top 10% of high-maintenance tests are usually responsible for 50%+ of repair work.
Consider whether your framework is the problem. Selector-based tools break on every UI change. Vision-based tools (like Drizz) don't.
Delete tests that break repeatedly but have never caught a real bug.

Metrics that measure speed

These tell you whether your testing process is getting faster or slower over time.

Mean time to detect (MTTD)

MTTD is the average time between when a bug is introduced and when your tests catch it.

Formula: Average of (Time bug detected - Time bug introduced) across all bugs in a period

Short MTTD means your CI pipeline and test suite are catching regressions quickly. Long MTTD means bugs sit in the codebase for days or weeks before anyone notices, which makes them harder and more expensive to fix.

Benchmark: For teams running CI on every commit, MTTD should be under 24 hours for critical flows. For teams running nightly builds, under 48 hours.
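The averaging step can be sketched with Python's standard `datetime` module. The hard part in practice is the "time introduced" value; one common approximation, assumed here, is the timestamp of the commit that introduced the bug. The timestamps below are invented.

```python
from datetime import datetime, timedelta


def mean_time_to_detect(bugs: list[tuple[datetime, datetime]]) -> timedelta:
    """Average of (detected - introduced) over a list of (introduced, detected) pairs."""
    if not bugs:
        raise ValueError("no bugs supplied; MTTD is undefined")
    total = timedelta()
    for introduced, detected in bugs:
        if detected < introduced:
            raise ValueError("a bug cannot be detected before it is introduced")
        total += detected - introduced
    return total / len(bugs)


# Hypothetical bugs: (commit time of the offending change, time a test first failed).
bugs = [
    (datetime(2024, 3, 4, 9, 0), datetime(2024, 3, 4, 15, 0)),   # caught after 6 hours
    (datetime(2024, 3, 5, 10, 0), datetime(2024, 3, 6, 16, 0)),  # caught after 30 hours
]

mttd = mean_time_to_detect(bugs)
print(f"MTTD: {mttd.total_seconds() / 3600:.1f} hours")
```

An average can hide a long tail, so it is worth looking at the slowest detections individually as well.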

Test execution time

This is the total time it takes to run your full test suite (or your CI-blocking subset).

Formula: Time from first test start to last test completion in a pipeline run

This matters because slow suites delay releases. If your suite takes 90 minutes to run and your team makes 10 commits a day, developers wait in a queue. That's lost engineering time.

Benchmark: Under 15 minutes for a CI-blocking smoke suite. Under 60 minutes for a full regression run. If your regression takes over 2 hours, you need parallelization or test pruning.
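When tests run in parallel, wall-clock time and summed test time diverge, and the formula above measures the former. The sketch below assumes each test reports its own start and end timestamps; the sample values are invented.

```python
from datetime import datetime, timedelta


def suite_execution_time(test_runs: list[tuple[datetime, datetime]]) -> timedelta:
    """Wall-clock time from the first test start to the last test completion."""
    if not test_runs:
        raise ValueError("no test runs supplied; execution time is undefined")
    first_start = min(start for start, _ in test_runs)
    last_end = max(end for _, end in test_runs)
    return last_end - first_start


# Hypothetical (start, end) timestamps for three tests running partly in parallel.
runs = [
    (datetime(2024, 3, 4, 10, 0), datetime(2024, 3, 4, 10, 6)),
    (datetime(2024, 3, 4, 10, 0), datetime(2024, 3, 4, 10, 9)),
    (datetime(2024, 3, 4, 10, 4), datetime(2024, 3, 4, 10, 12)),
]

wall_clock = suite_execution_time(runs)
summed = sum((end - start for start, end in runs), timedelta())

print(f"Wall clock: {wall_clock}, summed test time: {summed}")
print("Within smoke budget" if wall_clock <= timedelta(minutes=15) else "Over smoke budget")
```

The gap between the two numbers shows how much parallelization is already buying you, and the summed figure is the one that pruning slow tests reduces.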

All eight metrics at a glance

Metric	Formula	Benchmark
Defect escape rate	(Prod defects / Total defects) x 100	Below 10%
Defect removal efficiency	(Pre-release defects / Total defects) x 100	Above 85%
Test case coverage	(Covered requirements / Total requirements) x 100	Above 80%
Automation coverage	(Automated tests / Total tests) x 100	70-80% for regression
Flaky test rate	(Flaky tests / Total tests) x 100	Below 3%
Test maintenance ratio	(Maintenance hours / Total QA hours) x 100	Below 30%
Mean time to detect	Avg (Detection time - Introduction time)	Under 24 hours
Test execution time	Start to finish of pipeline run	Under 15 min (smoke)

How Drizz reporting feeds these software testing metrics

Drizz generates structured reports after every test plan run. Each run produces pass/fail results per test case, step-by-step logs with timestamps, screenshots before and after each step, video recordings, and error categorization. This data feeds directly into the metrics a QA lead needs to track.

For flaky test rate, Drizz auto-retries failed tests once to separate real bugs from flaky behavior. If a test passes on retry, it's flagged, giving you a built-in flake signal without writing your own retry logic.

For test execution time, Drizz Cloud runs tests on real Android and iOS devices in parallel across your test plan. The execution time per run is logged and available in the report, so you can track whether your suite is getting slower over time.

For test maintenance ratio, Drizz's Vision AI removes the biggest maintenance driver: broken selectors. Tests are written in plain English and find elements visually. A typical Drizz test looks like this:

Tap on "Add to Cart"
Scroll down until "Proceed to Pay"
Validate "Order Confirmed" is visible

When the UI changes but the visible elements stay the same, the test still passes. Teams using selector-based tools spend 30-50% of QA time repairing broken locators. Drizz eliminates that category of maintenance entirely.

For defect escape rate, every Drizz test run is tied to a specific app version and device configuration. When a bug is caught in a test plan run, you can trace it to the build, step, and screen state. That makes it straightforward to measure how many defects your suite catches per release versus how many reach production.

FAQ

What are the most useful software testing metrics?

Defect escape rate and flaky test rate. One measures testing effectiveness, the other measures suite reliability.

How many metrics should a QA team track?

Start with two or three. Add more only when you have a specific decision or problem they'd help with.

What's a good defect escape rate?

Below 10% is strong for teams shipping weekly. Above 20% signals gaps in test coverage or test quality.

How do you calculate flaky test rate?

Divide the number of tests with inconsistent pass/fail results by the total number of tests over a set period, then multiply by 100.

Why is test maintenance ratio important?

It tells you whether your QA team is expanding coverage or just keeping existing tests alive.

Can metrics replace exploratory testing?

No. Metrics measure what automated tests catch. Exploratory testing finds bugs that no one thought to automate.

About the Author:

Asad Abrar

Co-founder & CEO, Drizz

Ex-Coinbase PM and IIT Kharagpur grad killing flaky mobile tests by day, and obsessing over F1 lap timings by night.