
Test Metrics, Measures and KPIs

Discussion Document

Title: Test Process & Practice Metrics & Measures

Description: This document discusses the importance of useful test metrics, measures and KPIs and the ways in which they can be captured and presented.

Author(s): Mark Crowther

Version: 1.0

Metrics, Measures, KPIs and Reporting

An important deliverable from the test team is information that conveys the progress of test work and the test status of the project’s development items. This information can include Metrics, Measures and Key Performance Indicators (KPIs), and will be communicated in various ways such as through graphs, charts, dashboards and written reports.

Data of this type is vital for understanding the current testing effort, reviewing how it was conducted previously and considering how to make improvements in the future. However, it’s not only the test manager who will find this information useful.

Various groups and individuals within the organisation require this information in support of their own activities and decision making. As such, any information provided by the test manager should be useful to the intended recipients and be published in a timely, clear and consistent manner.

This document outlines the key Metrics and Measures that inform the test manager about the testing effort, how these feed into departmental and organisational Key Performance Indicators, where the information will be drawn from and how it will be published.

Defining Metrics, Measures and KPIs

A Testing Metric is the term used to describe the direct measurement of a particular attribute of testing practice or of the state of the software under test. The Testing Metrics that the test team produce are concerned solely with test activities and product test status. While this informs the project as a whole, other teams must still produce and publish their own metrics and measures.

- Metrics

Metrics define the individual pieces of data that will be captured as the team perform their test tasks. A metric is a standard unit of measure, for example the severity of a bug, the type of a test case or the time spent on a piece of analysis. Defining and capturing a range of metrics allows them to be measured.

- Measures

A measure is a way to quantify and analyse the metrics that have been collected. A measure is described as a statement or question, and measurement allows comparison between related metrics. Using the metric given above, an example is a comparison between bugs of varying severities. These operational measures can then be used to inform strategic indicators.


- KPIs

Key Performance Indicators (KPIs) are measurements defined to show the state of critical strategic goals or targets the organisation has set for itself. They are calculated by using sets or combinations of measures.
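As a minimal illustration of how these three levels relate, the following Python sketch (using hypothetical bug data and a hypothetical KPI target) rolls raw metrics up into a measure and then into a KPI:

    from collections import Counter

    # Metrics: one raw data point recorded per bug found (hypothetical data).
    bug_severities = ["critical", "major", "major", "minor", "minor", "minor", "trivial"]

    # Measure: quantify the collected metrics, here the distribution of
    # bugs across severities, allowing comparison between them.
    severity_counts = Counter(bug_severities)
    print(severity_counts)

    # KPI: combine measures against a strategic target, e.g. the hypothetical
    # goal "critical and major bugs are under 25% of all bugs found".
    high_severity = severity_counts["critical"] + severity_counts["major"]
    ratio = high_severity / len(bug_severities)
    print(f"High-severity ratio: {ratio:.0%}, target met: {ratio < 0.25}")

The KPI target here is purely illustrative; real targets would come from the organisation’s strategic goals.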

Quantitative and Qualitative Data

Metrics feed into Measures, which in turn feed into KPIs. Metrics are primarily quantitative data and KPIs primarily qualitative.

  • Quantitative data: quantity, amount, proportion, size
  • Qualitative data: characteristic, aspect, contextual meaning

A combination of quantitative and qualitative data should be collected, analysed and communicated. This ensures that statistically reliable information is captured which is then ascribed meaning and contextual relevance. Quantitative data alone means little, as will become apparent in the following section.

Levels of Measurement

While it’s possible to break metrics and measurement down into specific types, we can also think in terms of the broader categories of qualitative or quantitative measurement.

At the highest level, First Order measurements are performed without any complex processes or techniques being applied. An example might be scanning the size of a log file and deciding whether its size suggests sufficient transactions have taken place. This experience-based assessment would provide us with a qualitative measure.

A Second Order measurement would follow if experience alerted us to the fact that the log file was unusually large or small, given what we believed the system was expected to have performed. In this case we’d open the file and review the data it contained. This review of what had actually happened to vary the number of transactions from what we expected would provide us with a quantitative measurement.

First Order and Second Order measurement can be applied in conjunction with the Testing Metrics discussed next. They provide a framework from which to apply our metrics regime on a day-to-day basis.
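A minimal sketch of this two-step approach, assuming a hypothetical plain-text log in which each non-empty line records one transaction:

    import os

    LOG_PATH = "transactions.log"        # hypothetical log file
    NORMAL_SIZE = (50_000, 500_000)      # byte range judged normal from experience

    # First Order: a quick, experience-based check with no complex processing.
    size = os.path.getsize(LOG_PATH)
    looks_normal = NORMAL_SIZE[0] <= size <= NORMAL_SIZE[1]
    print(f"Log is {size} bytes; looks normal: {looks_normal}")  # qualitative

    # Second Order: if the size looks unusual, open the file and count what
    # actually happened, producing a quantitative measurement.
    if not looks_normal:
        with open(LOG_PATH) as log:
            transactions = sum(1 for line in log if line.strip())
        print(f"Transactions actually logged: {transactions}")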

Types of Testing Metrics

Despite the wide range of metrics that could be gathered, most organisations will define just a small number and then re-use them over successive projects. This is essential to ensure consistency of measurement and reporting.

However, the test manager needs to be aware that testing metrics break down into two subsets, namely Test Process and Test State metrics.

Diagram 1: Testing metrics are broken down into two subsets

- Test Process Metrics

These metrics provide information about preparation for testing, test execution and test progress. They don’t provide information about the test state of the product and are primarily of use in measuring progress of the Test Phase activities.

- Test State Metrics

These metrics provide information about the test state of the product and are generated by test execution, the raising of bugs and code fixes or deferment. Using these metrics we can gauge the product’s test state and indicative level of quality, useful for product release decisions.

Capturing Metrics Data

Having defined which metrics are of interest, the test manager should ensure they are clear on how the data will actually be captured and that this can be done with consistency and accuracy.

For example, it’s of no use to state that a metric of interest is ‘bug severity’ if there’s no way, or nowhere, to record this data. Equally, those responsible for recording the data must know how to do so, and knowing why is usually helpful.

The test manager must therefore ensure appropriate tools are in place to capture and store the data and then allow it to be accessed and analysed to deliver the agreed sets of measures.

Tools such as NMQA’s Vienna resolve many data capture and reporting issues by design. Where such tools are not available and the administrative burden is manageable, the test manager can always revert to Excel or similar.
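Where a dedicated tool isn’t in place, even a small script writing to a shared CSV file can enforce a consistent capture format. A sketch, with a hypothetical minimal schema:

    import csv
    import os
    from datetime import date

    FIELDS = ["date", "bug_id", "severity", "functional_area"]  # hypothetical schema

    def record_bug(path, bug_id, severity, functional_area):
        """Append one bug record, writing the header row if the file is new."""
        new_file = not os.path.exists(path) or os.path.getsize(path) == 0
        with open(path, "a", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=FIELDS)
            if new_file:
                writer.writeheader()
            writer.writerow({"date": date.today().isoformat(), "bug_id": bug_id,
                             "severity": severity, "functional_area": functional_area})

    record_bug("bug_metrics.csv", "BUG-101", "major", "checkout")

The schema matters more than the tool: everyone recording data must use the same fields and values for the resulting measures to be consistent.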

Defining and Publishing Chosen Measures

With test process and test state metrics agreed, the test manager will need to define a suitable set of measures. It’s common to define a core set that the organisation broadly agrees it would like to re-use across successive projects.

A project may be delivered using a number of development, testing and project management approaches. The test manager must therefore be mindful of which of the agreed measures to use and which to exclude depending on the needs of the project.

At the Initiation phase of the project the agreed set of measures should be published in some way. This standard set of measures can be stated in the Test Strategy or, if no strategy is being created, the chosen subset can be added to the Test Plan.

- Validity

A common mistake when designing a measurement programme is failing to check the validity of the metrics chosen and the measures derived from them.

It’s essential that:

  • Metrics provide data that is proven to relate directly to the measured attribute
  • Measures provide the meaning intended and are not distorted in some way

The Purpose and Usefulness of Measurement

When selecting metrics and measures it must be clear why these are being selected and designed. The test manager must understand the value they hope to derive from the measurement activities or risk measuring for measuring’s sake.

It’s said that if something can’t be measured it can’t be improved. While this may be true in most cases, it isn’t accurate at all times and it isn’t a reason to measure everything possible. If there is no defined purpose or intended use planned as a response to the measurements being taken, then the act of taking them is by definition useless and should not be done.

When defining the purpose for producing a particular measurement it’s useful to ask what question we hope will be answered. For example, a metric of ‘bug severity’ will provide a data set with a range of bug severities.

Looking at the data set we might ask whether the quantity is high or low, whether the distribution is as expected, or whether bugs are found in equal quantities across all areas of the software. In this way we can decide what question we’re trying to answer and whether we even have a question relevant to this data set.

- Management by Measurement

Managing the organisation, teams and individuals using the outcome of Measures and KPIs is often seen as a fair and robust approach. However, where measures are used to manage, there is a great risk that an individual’s behaviour might change in unexpected and undesirable ways.

Some common examples of this are defining good performance by the number of bugs a tester finds or that a developer fixes each day. A risk here is that the tester will merrily log as many bugs as possible with no regard to quality or importance.

The developer in turn will happily fix as many trivial issues as possible. In both of these cases the individuals will have performed well in relation to the way they were being measured, even if this wasn’t the intended outcome.

It should be apparent to the test manager that measures and KPIs aren’t the complete story about performance of individuals, teams or the organisation. Measures should be used in combination with other forms of information to arrive at informed views on performance.

Test Phase Measures

While there’s no template set of correct measures that can be defined, it should be possible for the test manager to create a core set that is relevant to their organisation.

Relevant measures would be those that provide context and meaning to attributes of testing process and practice which the organisation sees as important to understand.

The importance attributed to the measures would hopefully motivate the organisation to look at how to use them to inform continuous improvement actions.

- Example Test Process Measures: Analysis & Planning

When thinking about performing analysis of functional requirements in order to assess the test planning effort, we could start by asking:

  • How many test requirements are there for each of the various areas of functionality?

With these known we could then ask:

  • How many test cases would be needed to provide effective coverage for all of the test requirements?

Once we arrive at this question it would be natural to ask:

  • How long will it take to write each of the test cases we know are needed?

From these questions we can identify the attributes we want to capture data about and so define the metrics that are relevant to us. These include the number of test requirements, number of functional areas, number of test cases and time to author test cases.

Simple measures that could be derived include those that make comparisons about scope, complexity or effort, such as:

- Number of Test Requirements vs Functional Areas

- Number of Test Cases vs Test Requirements

As the test manager needs to schedule work and report on progress against plan, there would be a need to derive measures that put the identified effort in the context of the time available to deliver it. Measures for this could include the following (a sketch of these calculations appears after the list):

- Total Time Spent on Test Case Authoring vs Estimated Time

- Number of Test Cases Planned vs Ready for Execution
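By way of illustration, a minimal Python sketch deriving these planning measures from hypothetical metric values:

    # Hypothetical planning metrics captured during analysis.
    test_requirements = 120   # test requirements identified
    functional_areas = 8      # functional areas in scope
    test_cases_planned = 300  # test cases needed for effective coverage
    test_cases_ready = 180    # test cases authored and ready for execution
    estimated_hours = 150     # estimated authoring effort
    actual_hours = 95         # authoring time spent so far

    # Derived measures comparing scope, complexity and effort.
    print(f"Test requirements per functional area: {test_requirements / functional_areas:.1f}")
    print(f"Test cases per test requirement: {test_cases_planned / test_requirements:.1f}")
    print(f"Authoring time spent vs estimate: {actual_hours / estimated_hours:.0%}")
    print(f"Test cases ready vs planned: {test_cases_ready / test_cases_planned:.0%}")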

- Example Measures: Test Execution

When the test team complete planning and move on to execution there will be further measures in place to answer questions relevant to this phase. These might include the following (again, a sketch follows the list):

- Number of Test Cases Executed vs Test Cases Planned

- Number of Test Cases Passed, Failed and Blocked

- Total Number of Test Cases Passed by Functional Area

- Total Time Spent on Execution vs Estimated Time
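A similar sketch for the execution measures, again with hypothetical results:

    from collections import Counter

    # Hypothetical execution results: one status per executed test case.
    statuses = ["passed"] * 140 + ["failed"] * 25 + ["blocked"] * 10
    planned = 300

    executed = len(statuses)
    print(f"Executed vs planned: {executed}/{planned} ({executed / planned:.0%})")
    for status, count in Counter(statuses).items():
        print(f"{status.capitalize()}: {count}")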

- Example Test State Measures: Bugs and Fixes

As testing finds bugs and these are passed forward for resolution, it’s important to understand what’s been found in the context of what’s being tested. Measures relating to this can include the following (a sketch of the distribution calculations follows the list):

- Total Number of Bugs Raised per Period and Closed over Time

- Total Number of Bugs Closed vs Total Number of Bugs Re-Opened

- Bug Distribution Totals by Severity per Period

- Bug Distribution Totals by Functional Area by Severity per Period
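These distributions are straightforward to derive once the underlying metrics are captured. A sketch with hypothetical bug records:

    from collections import defaultdict

    # Hypothetical bug records: (period, functional_area, severity, state).
    bugs = [
        ("W1", "checkout", "critical", "closed"),
        ("W1", "search",   "minor",    "closed"),
        ("W2", "checkout", "major",    "open"),
        ("W2", "checkout", "minor",    "reopened"),
        ("W2", "search",   "major",    "open"),
    ]

    # Bug distribution totals by functional area by severity, per period.
    distribution = defaultdict(int)
    for period, area, severity, _state in bugs:
        distribution[(period, area, severity)] += 1
    for key in sorted(distribution):
        print(key, distribution[key])

    # Closed vs re-opened, one signal of fix quality.
    closed = sum(1 for bug in bugs if bug[3] == "closed")
    reopened = sum(1 for bug in bugs if bug[3] == "reopened")
    print(f"Closed: {closed}, Re-opened: {reopened}")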

Burndown Charts (Test Process)

When looking at how to communicate the amount of work to be delivered, such as the authoring of test cases, and whether progress is sufficient to hit a desired delivery date, Burndown Charts are often the simplest approach.

The total amount of work over the available amount of time provides a way to create a Burndown line. As work is completed it can be plotted on the chart. If progress is plotted below the Burndown line the team is ahead of schedule; if plotted above it, they’re behind schedule.
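A minimal sketch of such a chart using matplotlib, with hypothetical work totals; the actual progress here plots above the ideal line, indicating a team behind schedule:

    import matplotlib.pyplot as plt

    total_work = 300   # hypothetical: test cases to author
    days = 10          # working days available

    # Ideal burndown: remaining work falls linearly to zero over the schedule.
    ideal = [total_work * (1 - d / days) for d in range(days + 1)]
    # Hypothetical actual work remaining, recorded at the end of each day so far.
    actual = [300, 280, 250, 230, 205, 170, 150]

    plt.plot(range(days + 1), ideal, "--", label="Burndown line")
    plt.plot(range(len(actual)), actual, marker="o", label="Actual remaining")
    plt.xlabel("Day")
    plt.ylabel("Work remaining")
    plt.legend()
    plt.show()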

Data Comparison Charts (Test State)

It’s possible to create data-driven charts for each of the measures given in the previous section. However, by way of example, what follows are two classic charts related to variants of one measure given in the previous section: “Total Number of Bugs Raised per Period and Closed over Time”.

As the total number of bugs raised each period falls, the product’s test state should be reaching a higher level of quality. Additionally, as the Cumulative Opened Bugs line levels off, this indicates a decreasing bug-find rate. If this remains consistent it may be that further testing is adding little value and the product can be released.

Taking the Cumulative Opened Bugs line from the first chart and comparing it against Resolved (Deferred) Bugs and Closed Bugs, it’s easy to see what work is left to do. If the black band opens up then the test team are raising bugs far quicker than the developers can fix them.

If the dark grey band opens up, bugs are being resolved (deferred) faster than the test team can close them. If the bands are converging then a release is in sight.
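The following sketch re-creates this style of chart with hypothetical cumulative counts; the darker band between Opened and Resolved corresponds to bugs awaiting a fix, the lighter band between Resolved and Closed to fixes awaiting verification:

    import matplotlib.pyplot as plt

    periods = ["W1", "W2", "W3", "W4", "W5"]
    x = range(len(periods))
    # Hypothetical cumulative counts per weekly period.
    opened   = [20, 45, 60, 68, 70]   # levelling off: the bug-find rate is dropping
    resolved = [5, 20, 40, 58, 66]    # fixed or deferred by development
    closed   = [2, 12, 30, 50, 62]    # verified and closed by test

    plt.plot(x, opened, color="black", label="Cumulative Opened")
    plt.plot(x, resolved, color="dimgrey", label="Resolved (incl. Deferred)")
    plt.plot(x, closed, color="silver", label="Closed")
    plt.fill_between(x, resolved, opened, color="black", alpha=0.4)  # awaiting fix
    plt.fill_between(x, closed, resolved, color="grey", alpha=0.4)   # awaiting closure
    plt.xticks(x, periods)
    plt.ylabel("Bugs")
    plt.legend()
    plt.show()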

Reporting on KPIs

Whilst the test manager should be involved in defining sensible, achievable and measurable KPIs, these are often handed down from the organisation and simply require reporting on periodically.

As such, the test manager can draw on the data from the charts and the measures that make them up to report on delivery against KPIs.

It was already mentioned that KPIs should be accompanied by a plan of action that is executed depending on what the KPI report says. For the test manager this should be a natural part of running a testing function, as there is always a focus on continuous improvement of process, practice and quality.

Dashboards and Reporting

As each measure is recorded over the duration of the project, the test manager will collect sets of data for these measures. Tables of data are useful but more commonly they are converted to charts and graphs of various forms.

Collections of charts, often with summary data, are in turn added to an overall display referred to as a dashboard. The dashboard gives the project team access to the information derived from the measures in an accessible form.
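A minimal sketch of assembling such a dashboard as a grid of charts with matplotlib, using hypothetical summary data:

    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(2, 2, figsize=(10, 7))
    fig.suptitle("Test Status Dashboard")

    axes[0, 0].bar(["Passed", "Failed", "Blocked"], [140, 25, 10])
    axes[0, 0].set_title("Execution Results")

    axes[0, 1].bar(["Critical", "Major", "Minor"], [3, 12, 30])
    axes[0, 1].set_title("Open Bugs by Severity")

    axes[1, 0].plot([300, 250, 205, 150], marker="o")
    axes[1, 0].set_title("Test Cases Left to Author")

    axes[1, 1].plot([20, 45, 60, 70], label="Opened")
    axes[1, 1].plot([2, 30, 50, 62], label="Closed")
    axes[1, 1].set_title("Cumulative Bugs")
    axes[1, 1].legend()

    fig.tight_layout()
    plt.show()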

Refer to the “Test Metrics Dashboard Workbook.xls” for further information.

References:

Kaner, Cem and Walter P. Bond. “Software Engineering Metrics: What Do They Measure and How Do We Know?” 10th International Software Metrics Symposium, Chicago, IL, 2004.

Weinberg, Gerald M. Quality Software Management, Volume 2: First-Order Measurement. Dorset House Publishing, New York, 1993.
