Common Testing Pitfalls –
Ways to Prevent and Mitigate Them:
Descriptions, Symptoms, Consequences, Causes, and Recommendations

Donald G. Firesmith

26 July 2013

Acquisition Support Program

Unlimited distribution subject to the copyright.

This document was prepared for the

SEI Administrative Agent
ESC/XPK
5 Eglin Street
Hanscom AFB, MA 01731-2100

The ideas and findings in this report should not be construed as an official DoD position. It is published in the interest of scientific and technical information exchange.

This work is sponsored by the U.S. Department of Defense. The Software Engineering Institute is a federally funded research and development center sponsored by the U.S. Department of Defense.

Copyright 2013 Carnegie Mellon University.

NO WARRANTY

THIS CARNEGIE MELLON UNIVERSITY AND SOFTWARE ENGINEERING INSTITUTE MATERIAL IS FURNISHED ON AN “AS-IS” BASIS. CARNEGIE MELLON UNIVERSITY MAKES NO WARRANTIES OF ANY KIND, EITHER EXPRESSED OR IMPLIED, AS TO ANY MATTER INCLUDING, BUT NOT LIMITED TO, WARRANTY OF FITNESS FOR PURPOSE OR MERCHANTABILITY, EXCLUSIVITY, OR RESULTS OBTAINED FROM USE OF THE MATERIAL. CARNEGIE MELLON UNIVERSITY DOES NOT MAKE ANY WARRANTY OF ANY KIND WITH RESPECT TO FREEDOM FROM PATENT, TRADEMARK, OR COPYRIGHT INFRINGEMENT.

Use of any trademarks in this report is not intended in any way to infringe on the rights of the trademark holder.

Internal use. Permission to reproduce this document and to prepare derivative works from this document for internal use is granted, provided the copyright and “No Warranty” statements are included with all reproductions and derivative works.

External use. This document may be reproduced in its entirety, without modification, and freely distributed in written or electronic form without requesting formal permission. Permission is required for any other external and/or commercial use. Requests for permission should be directed to the Software Engineering Institute at .

This work was created in the performance of Federal Government Contract Number FA8721-05-C-0003 with Carnegie Mellon University for the operation of the Software Engineering Institute, a federally funded research and development center. The Government of the United States has a royalty-free government-purpose license to use, duplicate, or disclose the work, in whole or in part and in any manner, and to have or permit others to do so, for government purposes pursuant to the copyright license under the clause at 252.227-7013.

For information about SEI publications, please visit the library on the SEI website (


Table of Contents

1 Introduction

1.1 What is Testing?

1.2 What is a Defect?

1.3 Why is Testing Critical?

1.4 The Limitations of Testing

1.5 What is a Testing Pitfall?

1.6 Categorizing Pitfalls

1.7 Scope

1.8 Usage

1.9 Pitfall Specifications

2 Testing Pitfalls Overview

2.1 General Testing Pitfalls

2.1.1 Test Planning and Scheduling Pitfalls

2.1.2 Stakeholder Involvement and Commitment Pitfalls

2.1.3 Management-related Testing Pitfalls

2.1.4 Staffing Pitfalls

2.1.5 Test Process Pitfalls

2.1.6 Test Tools and Environments Pitfalls

2.1.7 Test Communication Pitfalls

2.1.8 Requirements-related Testing Pitfalls

2.2 Test-Type-Specific Pitfalls

2.2.1 Unit Testing Pitfalls

2.2.2 Integration Testing Pitfalls

2.2.3 Specialty Engineering Testing Pitfalls

2.2.4 System Testing Pitfalls

2.2.5 System of Systems (SoS) Testing Pitfalls

2.2.6 Regression Testing Pitfalls

3 Testing Pitfalls Detailed Information

3.1 General Testing Pitfalls

3.1.1 Test Planning and Scheduling Pitfalls

3.1.1.1 No Separate Test Plan (GEN-TPS-1)

3.1.1.2 Incomplete Test Planning (GEN-TPS-2)

3.1.1.3 Test Plans Ignored (GEN-TPS-3)

3.1.1.4 Test Case Documents as Test Plans (GEN-TPS-4)

3.1.1.5 Inadequate Test Schedule (GEN-TPS-5)

3.1.1.6 Testing is Postponed (GEN-TPS-6)

3.1.2 Stakeholder Involvement and Commitment Pitfalls

3.1.2.1 Wrong Testing Mindset (GEN-SIC-1)

3.1.2.2 Unrealistic Testing Expectations (GEN-SIC-2)

3.1.2.3 Lack of Stakeholder Commitment (GEN-SIC-3)

3.1.3 Management-related Testing Pitfalls

3.1.3.1 Inadequate Test Resources (GEN-MGMT-1)

3.1.3.2 Inappropriate External Pressures (GEN-MGMT-2)

3.1.3.3 Inadequate Test-related Risk Management (GEN-MGMT-3)

3.1.3.4 Inadequate Test Metrics (GEN-MGMT-4)

3.1.3.5 Inconvenient Test Results Ignored (GEN-MGMT-5)

3.1.3.6 Test Lessons Learned Ignored (GEN-MGMT-6)

3.1.4 Staffing Pitfalls

3.1.4.1 Lack of Independence (GEN-STF-1)

3.1.4.2 Unclear Testing Responsibilities (GEN-STF-2)

3.1.4.3 Inadequate Testing Expertise (GEN-STF-3)

3.1.4.4 Developers Responsible for All Testing (GEN-STF-4)

3.1.4.5 Testers Responsible for All Testing (GEN-STF-5)

3.1.5 Test Process Pitfalls

3.1.5.1 Testing and Engineering Processes Not Integrated (GEN-PRO-1)

3.1.5.2 One-Size-Fits-All Testing (GEN-PRO-2)

3.1.5.3 Inadequate Test Prioritization (GEN-PRO-3)

3.1.5.4 Functionality Testing Overemphasized (GEN-PRO-4)

3.1.5.5 Black-box System Testing Overemphasized (GEN-PRO-5)

3.1.5.6 Black-Box System Testing Underemphasized (GEN-PRO-6)

3.1.5.7 Too Immature for Testing (GEN-PRO-7)

3.1.5.8 Inadequate Evaluations of Test Assets (GEN-PRO-8)

3.1.5.9 Inadequate Maintenance of Test Assets (GEN-PRO-9)

3.1.5.10 Testing as a Phase (GEN-PRO-10)

3.1.5.11 Testers Not Involved Early (GEN-PRO-11)

3.1.5.12 Incomplete Testing (GEN-PRO-12)

3.1.5.13 No Operational Testing (GEN-PRO-13)

3.1.6 Test Tools and Environments Pitfalls

3.1.6.1 Over-reliance on Manual Testing (GEN-TTE-1)

3.1.6.2 Over-reliance on Testing Tools (GEN-TTE-2)

3.1.6.3 Too Many Target Platforms (GEN-TTE-3)

3.1.6.4 Target Platform Difficult to Access (GEN-TTE-4)

3.1.6.5 Insufficient Test Environments (GEN-TTE-5)

3.1.6.6 Poor Fidelity of Test Environments (GEN-TTE-6)

3.1.6.7 Inadequate Test Environment Quality (GEN-TTE-7)

3.1.6.8 Test Assets not Delivered (GEN-TTE-8)

3.1.6.9 Inadequate Test Configuration Management (GEN-TTE-9)

3.1.7 Test Communication Pitfalls

3.1.7.1 Inadequate Architecture/Design Documentation (GEN-COM-1)

3.1.7.2 Inadequate Defect Reports (GEN-COM-2)

3.1.7.3 Inadequate Test Documentation (GEN-COM-3)

3.1.7.4 Source Documents Not Maintained (GEN-COM-4)

3.1.7.5 Inadequate Communication Concerning Testing (GEN-COM-5)

3.1.8 Requirements-related Testing Pitfalls

3.1.8.1 Ambiguous Requirements (GEN-REQ-1)

3.1.8.2 Obsolete Requirements (GEN-REQ-2)

3.1.8.3 Missing Requirements (GEN-REQ-3)

3.1.8.4 Incomplete Requirements (GEN-REQ-4)

3.1.8.5 Incorrect Requirements (GEN-REQ-5)

3.1.8.6 Requirements Churn (GEN-REQ-6)

3.1.8.7 Improperly Derived Requirements (GEN-REQ-7)

3.1.8.8 Verification Methods Not Specified (GEN-REQ-8)

3.1.8.9 Lack of Requirements Trace (GEN-REQ-9)

3.2 Test Type Specific Pitfalls

3.2.1 Unit Testing Pitfalls

3.2.1.1 Testing does not Drive Design and Implementation (TTS-UNT-1)

3.2.1.2 Conflict of Interest (TTS-UNT-2)

3.2.1.3 Unit Testing Considered Unimportant (TTS-UNT-3)

3.2.2 Integration Testing Pitfalls

3.2.2.1 Integration Decreases Testability Ignored (TTS-INT-1)

3.2.2.2 Inadequate Self-Monitoring (TTS-INT-2)

3.2.2.3 Unavailable Components (TTS-INT-3)

3.2.2.4 System Testing as Integration Testing (TTS-INT-4)

3.2.3 Specialty Engineering Testing Pitfalls

3.2.3.1 Inadequate Capacity Testing (TTS-SPC-1)

3.2.3.2 Inadequate Concurrency Testing (TTS-SPC-2)

3.2.3.3 Inadequate Internationalization Testing (TTS-SPC-3)

3.2.3.4 Inadequate Performance Testing (TTS-SPC-4)

3.2.3.5 Inadequate Reliability Testing (TTS-SPC-5)

3.2.3.6 Inadequate Robustness Testing (TTS-SPC-6)

3.2.3.7 Inadequate Safety Testing (TTS-SPC-7)

3.2.3.8 Inadequate Security Testing (TTS-SPC-8)

3.2.3.9 Inadequate Usability Testing (TTS-SPC-9)

3.2.4 System Testing Pitfalls

3.2.4.1 Lack of Test Hooks (TTS-SYS-1)

3.2.4.2 Inadequate Testing of Code Coverage (TTS-SYS-2)

3.2.4.3 Inadequate End-To-End Testing (TTS-SYS-3)

3.2.5 System of Systems (SoS) Testing Pitfalls

3.2.5.1 Inadequate SoS Test Planning (TTS-SoS-1)

3.2.5.2 Unclear SoS Testing Responsibilities (TTS-SoS-2)

3.2.5.3 Inadequate Resources for SoS Testing (TTS-SoS-3)

3.2.5.4 SoS Testing not Properly Scheduled (TTS-SoS-4)

3.2.5.5 Inadequate SoS Requirements (TTS-SoS-5)

3.2.5.6 Inadequate Support from Individual System Projects (TTS-SoS-6)

3.2.5.7 Inadequate Defect Tracking Across Projects (TTS-SoS-7)

3.2.5.8 Finger-Pointing (TTS-SoS-8)

3.2.6 Regression Testing Pitfalls

3.2.6.1 Insufficient Regression Test Automation (TTS-REG-1)

3.2.6.2 Regression Testing not Performed (TTS-REG-2)

3.2.6.3 Inadequate Scope of Regression Testing (TTS-REG-3)

3.2.6.4 Only Low-Level Regression Tests (TTS-REG-4)

3.2.6.5 Test Resources Not Delivered For Maintenance (TTS-REG-5)

3.2.6.6 Only Functional Regression Testing (TTS-REG-6)

4 Conclusion

4.1 Testing Pitfalls

4.2 Common Consequences

4.3 Common Solutions

5 Potential Future Work

6 Acknowledgements

Appendix A: Glossary

Appendix B: Acronyms and Abbreviations

Appendix C: Notes

Appendix D: References

Appendix E: Checklist

Abstract

This special report documents the different types of pitfalls that commonly occur when testing software-reliant systems. These 90 pitfalls are organized into 14 categories. Each pitfall is given a title, a description, a set of characteristic symptoms by which it can be recognized, a set of potential negative consequences that can result if the pitfall occurs, a set of potential causes for the pitfall, and recommendations for avoiding the pitfall or solving it should it occur.

1 Introduction

1.1 What is Testing?

Testing is the activity of executing a system/subsystem/component under specific preconditions (e.g., pretest mode, states, stored data, and external conditions) with specific inputs so that its actual behavior (outputs and postconditions) can be compared with its expected/required behavior.

Testing differs from other verification and validation methods (e.g., analysis, demonstration, and simulation) in that it is a dynamic (as opposed to static) analysis method that involves the actual execution of the thing being tested.
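As a purely illustrative example, the following minimal sketch (hypothetical Python code using the standard unittest module; the Account class and its expected behavior are invented for this illustration and are not part of this report) shows how a single test case establishes a precondition, supplies an input, executes the item under test, and compares the actual output and postcondition against the expected ones:

import unittest

class Account:
    """Hypothetical system under test (SUT): a trivial bank account."""
    def __init__(self, balance=0):
        self.balance = balance  # stored data that forms part of the pretest state

    def withdraw(self, amount):
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount
        return self.balance

class WithdrawTest(unittest.TestCase):
    def test_withdraw_reduces_balance(self):
        account = Account(balance=100)         # establish the specific precondition
        result = account.withdraw(30)          # execute the SUT with a specific input
        self.assertEqual(result, 70)           # compare actual output with expected output
        self.assertEqual(account.balance, 70)  # compare actual postcondition with expected postcondition

if __name__ == "__main__":
    unittest.main()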

Testing has the following goals:

•Primary goals:

enable the system/software under test (SUT) to be improved

—by “breaking” the SUT (i.e., by causing faults and failures)

—to expose its defects

—so that they can be fixed

•Secondary goals:

determine the:

—quality of the SUT

—SUT’s fitness for purpose

—SUT’s readiness for shipping, deployment, and/or operation

1.2 What is a Defect?

A system defect (a.k.a. bug) is a flaw or weakness in the system or one of its components that could cause it to behave in an unintended and unwanted manner and/or to exhibit an unintended and unwanted property. Software defects are related to but different from:

•Errors – human mistakes that cause the defect (e.g., making a programming mistake or inputting incorrect data)

•Faults – incorrect conditions that are system-internal and not directly visible from outside of the system’s boundary (e.g., the system stores incorrect data or is in an incorrect mode or state)

•Failures – events or conditions in which the system visibly behaves incorrectly and/or has incorrect properties (i.e., one or more of its behaviors or properties are different from what its stakeholders can reasonably expect).

Common examples of defects include flaws or weaknesses in the system that:

•cause it to violate specified (or unspecified) requirements

•cause it to be inconsistent with its architecture or design

•result from incorrect or inappropriate architecture, design, and/or implementation decisions

•violate coding standards

•constitute safety or security vulnerabilities (e.g., the use of inherently unsafe language features or a lack of verification of input data)
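To make the relationship among these terms concrete, consider the following minimal sketch (hypothetical Python code; the function, data, and values are invented for this illustration). It traces a single programming error through the resulting defect, fault, and failure:

def average(values):
    # Defect: the divisor should be len(values); this off-by-one flaw was
    # introduced by a programming mistake (an error in the sense defined above).
    return sum(values) / (len(values) - 1)

# Fault: while the program runs, it holds an incorrect internal value
# that is not yet visible from outside the system boundary.
result = average([10, 20, 30])

# Failure: the incorrect behavior becomes visible when the wrong value
# (30.0 instead of the expected 20.0) is output to the user.
print(result)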

1.3 Why is Testing Critical?

A National Institute of Standards and Technology (NIST) report [NIST 2002] states that inadequate testing methods and tools annually cost the U.S. economy between $22.2 billion and $59.5 billion, with roughly half of these costs borne by software developers in the form of extra testing and half by software users in the form of failure avoidance and mitigation efforts. The same study notes that between 25 percent and 90 percent of software development budgets are often spent on testing.

Testing is currently the most important of the standard verification and validation methods used during system/software development and maintenance. This is not because testing is necessarily the most effective and efficient way to verify that the system/software behaves as it should; it is not (see Table 1 below). Rather, it is because far more effort, funding, and schedule are expended on testing than on all other types of verification put together.

According to Capers Jones, most forms of testing only find about 35% of the code defects. [Jones 2013] Similarly, individual programmers, on average, find less than half of the defects in their own software.

For example, Capers Jones has analyzed data from projects that completed during 2013 and produced the following results regarding defect identification effectiveness (average percent of defects found as a function of verification method and defect type):

Verification Method / Requirements Defects / Architecture Defects / Design Defects / Code Defects / Documentation Defects / Total Effectiveness
Rqmts Inspection / 87% / 5% / 10% / 5% / 8.5% / 25.6%
Architecture Inspection / 10% / 85% / 10% / 2.5% / 12% / 14.9%
Design Inspection / 14% / 10% / 87% / 7% / 16% / 37.3%
Code Inspection / 15% / 12.5% / 20% / 85% / 10% / 70.1%
Static Analysis / 2% / 2% / 7% / 87% / 3% / 33.2%
IV&V / 12% / 10% / 23% / 7% / 18% / 16.5%
SQA Review / 17% / 10% / 17% / 12% / 12.4% / 28.1%
Total / 95.2% / 92.7% / 96.1% / 99.1% / 58.8% / 95.0%

Table 1: Average Percent of Defects found as a Function of Static Verification Method and Defect Type

Thus, the use of requirements inspections identifies 87% of requirements defects and 25.6% of all defects in the software and its documentation. Similarly, static analysis of the code identifies 87% of the code defects and 33.2% of all defects. Finally, a project that uses all of these static verification methods will identify 95% of all defects.
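A rough way to see why combining methods raises the total so sharply is to assume, purely as an approximation (the totals published in Table 1 reflect Jones's measured data, not this assumption), that each method finds its stated fraction of defects independently of the other methods. Under that independence assumption, methods with individual effectiveness values p_1, p_2, ..., p_n have a combined effectiveness of approximately

P_{\text{combined}} \approx 1 - \prod_{i=1}^{n} (1 - p_i)

For example, applying this approximation to the code-defect column of Table 1 gives 1 - (1 - 0.05)(1 - 0.025)(1 - 0.07)(1 - 0.85)(1 - 0.87)(1 - 0.07)(1 - 0.12) ≈ 0.986, which is reasonably close to the published value of 99.1%.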

As can be seen in Table 2, the static verification methods are cumulatively more effective than testing at identifying every type of defect except, surprisingly, documentation defects:

Verification Method / Requirements Defects / Architecture Defects / Design Defects / Code Defects / Documentation Defects / Total Effectiveness
Static / 95.2% / 92.7% / 96.1% / 99.1% / 58.8% / 95.0%
Testing / 72.3% / 74.0% / 87.6% / 93.4% / 95.5% / 85.7%
Total / 98.11% / 98.68% / 99.52% / 99.94% / 98.13% / 99.27%

Table 2: Cumulative Defect Effectiveness for Static Verification Methods, Testing, and their Combination

1.4 The Limitations of Testing

In spite of its critical nature, testing has a number of pitfalls that make it far less effective and efficient than it should be. Testing is relatively ineffective in the sense that a significant number of residual defects remain in the completed system when it is placed into operation. Testing is also relatively inefficient in terms of the large amount of effort, funding, and schedule that is currently being spent to find defects.

According to Capers Jones, most types of testing only find about 35% of the software defects. [Jones 2013] This is consistent with the following, more detailed analysis of defect detection rates as a function of test type and test capabilities found in Table 3. [McConnell 2004]

Defect Detection Rates
Test Type / Lowest / Mode / Highest
Unit Test / 15% / 30% / 50%
Component Test / 20% / 30% / 35%
Integration Test / 25% / 35% / 40%
System Test / 25% / 40% / 55%
Regression Test / 15% / 25% / 30%
Low-volume Beta Test / 25% / 35% / 40%
High-volume Beta Test / 60% / 75% / 85%

Table 3: Defect Detection Rate [McConnell 2004]

Table 4 presents comparable worst-case, average, and best-case project defect detection rates, broken out by static verification method and by type of testing:

Static Verification / Worst / Average / Best
Desk Checking / 23% / 25% / 27%
Static Analysis / 0% / 55% / 55%
Inspection / 0% / 0% / 93%
Static Subtotal / 19% / 64% / 98%
Testing / Worst / Average / Best
Unit Test / 28% / 30% / 32%
Function Test / 31% / 33% / 35%
Regression Test / 10% / 12% / 14%
Component Test / 28% / 30% / 32%
Performance Test / 6% / 10% / 14%
System Test / 32% / 34% / 36%
Acceptance Test / 13% / 15% / 17%
Testing Subtotal / 72% / 81% / 87%
Cumulative Total / 81.1% / 95.6% / 99.96%

Table 4: Defect Detection Rate [Capers Jones 2013b]

As can be seen from the previous table, no single type of testing is very effective at uncovering defects, regardless of defect type. Even when all of these testing methods are used on an average project, they identify only about 4 out of 5 of the code defects.

1.5 What is a Testing Pitfall?

A testing pitfall is any situation, action, decision, or mindset that unnecessarily and unexpectedly causes testing to be less effective, less efficient, and less fulfilling to perform. A testing pitfall is a commonly occurring way in which testing goes wrong, bringing unintended negative consequences. Ultimately, a testing pitfall is a test-related source of project risk.

In a sense, the description of a testing pitfall constitutes a testing anti-pattern. However, the term pitfall was specifically chosen to evoke the image of a hidden or not easily identified trap for the unwary or uninitiated. As with any trap, it is better to avoid a testing pitfall than to have to dig oneself and one’s project out of it after having fallen in.

1.6 Categorizing Pitfalls

Many testing pitfalls can occur during the development or maintenance of software-reliant systems and software applications. While no project is likely to be so poorly managed and executed as to experience the majority of these pitfalls, most projects will suffer several of them. Similarly, while these testing pitfalls do not guarantee failure, they definitely pose serious risks that need to be managed.

Based on the author’s more than 30 years of experience developing systems and software and performing numerous independent technical assessments, this technical report documents 90 pitfalls that have been observed to commonly occur during testing. These pitfalls have been categorized as follows:

•General Testing Pitfalls

Test Planning and Scheduling Pitfalls

Stakeholder Involvement and Commitment Pitfalls

Management-related Testing Pitfalls

Staffing Pitfalls

Test Process Pitfalls

Test Tools and Environments Pitfalls

Test Communication Pitfalls

Requirements-related Testing Pitfalls

•Test Type Specific Pitfalls

Unit Testing Pitfalls

Integration Testing Pitfalls

Specialty Engineering Testing Pitfalls

System Testing Pitfalls

System of Systems (SoS) Testing Pitfalls

Regression Testing Pitfalls

1.7 Scope

The scope of this report is testing, which is only one of several methods commonly used to validate that a system meets its stakeholder needs and verify that the system conforms to its specified requirements. Although other such methods (e.g., inspections, demonstrations, reviews, analysis, and simulation) exist and could be documented in a similar manner, they are beyond the scope of this already rather large report.

The pitfalls in this report primarily apply to large and medium-sized projects producing important systems and software applications that require at least a quasi-rigorous testing program and process. The pitfalls do not necessarily apply to very small and simple projects producing relatively trivial systems and programs. Such systems and programs, namely those that (1) will be used only in-house with close collaboration between stakeholders and developers, (2) will not be placed into operation, (3) will be used once and not maintained, and (4) are neither business-, mission-, safety-, nor security-critical, can often be adequately tested in a highly informal and ad hoc manner. Some of the pitfalls apply only or primarily to the testing of systems having significant hardware, and these pitfalls therefore apply little, if at all, to the testing of software-only applications.

1.8 Usage

The information describing each of the commonly occurring testing pitfalls can be used:

•To improve communication regarding commonly occurring testing pitfalls

•As training materials for testers and the stakeholders of testing

•As checklists when:

Developing and reviewing an organizational or project testing process or strategy

Developing and reviewing the:

—Test and Evaluation Master Plan (TEMP), System/Software Test Plan (STP), or Test Strategy Document (TSD)

—The testing sections of related planning documents such as the System Engineering Management Plan (SEMP) and the System/Software Development Plan (SDP)

Evaluating the testing-related parts of contractor proposals

Evaluating test plans, test documents, and test results (quality control)

Evaluating the actual as-performed testing process (quality assurance)[1]
(Notes are identified by number [#] and located in Appendix C: Notes.)

Identifying testing risks and appropriate risk mitigation approaches

•To categorize testing pitfalls for metrics collection, analysis, and reporting

•As an aid to identifying testing areas potentially needing improvement during project post mortems (post-implementation reviews)

Although each of these testing pitfalls has been observed on multiple projects, it is entirely possible that you may encounter testing pitfalls not addressed by this document.

1.9 Pitfall Specifications

Section 2 contains high-level descriptions of the different pitfalls, while the tables in Section 3 document each testing pitfall with the following detailed information: