Software Defect Reduction Top-10 List

Barry Boehm, USC and Victor Basili, U. of Maryland

Recently, a grant from NSF enabled us to establish a national Center for Empirically-Based Software Engineering (CeBASE). The CeBASE objective is to transform software engineering as much as possible from a fad-based practice to an engineering-based practice through derivation, organization, and dissemination of empirical data on software development and evolution phenomenology.

"As much as possible" reflects the fact that software development will always remain a people-intensive and continuously changing field. However, we have found that people in the field have been able to establish objective and quantitative data, relationships, and predictive models which have helped many software developers to avoid predictable pitfalls and improve their ability to predict and control efficient software projects.

To illustrate this, we are devoting this column to an update of one of our previous columns ("Industrial Metrics Top-10 List," Barry Boehm, IEEE Software, September 1987, pp. 84-85), which provided a concise selection of empirical data that many software practitioners found very helpful. Because a major CeBASE focus is on software defect reduction, here is a software defect reduction top-10 list, in rough priority order. More details and references can be found in an expanded Web version of this column on the CeBASE Web site.

1. Finding and fixing a software problem after delivery is often 100 times more expensive than finding and fixing it during the requirements and design phase.

This was also the top-priority item in the 1987 list. As in 1987, "This insight has been a major driver in focusing industrial software practice on thorough requirements analysis and design, on early verification and validation, and on up-front prototyping and simulation to avoid costly downstream fixes."

The only thing we have changed since 1987 is to add the word "often," to reflect additional insights on the relationship. For one, the cost-escalation factor for small, noncritical software systems is more like 5:1 than 100:1, enabling such systems to be developed most efficiently in a less formal, "continuous prototype" mode -- but still with emphasis on getting things right early rather than late. Another is that the cost-escalation factor can be reduced significantly even for large critical systems via good architectural practices. These reduce the cost of most fixes by confining them to small, well-encapsulated modules. An excellent example was the million-line TRW CCPDS-R project described in Appendix D of Walker Royce's Software Project Management: A Unified Framework, Addison-Wesley, 1998, where the cost-escalation factor was only about 2:1.

2. About 40-50% of the effort on current software projects is spent on avoidable rework.

"Avoidable rework" is effort spent fixing difficulties with the software that could have been avoided or discovered earlier and less expensively. This implies that there is such a thing as "unavoidable rework." This fact has been increasingly appreciated with the growing realization that better user-interactive systems result from "emergent" processes (where the requirements emerge from prototyping and other multi-stakeholder shared learning activities) than from "reductionist" processes (where the requirements are stipulated in advance and then reduced to practice via design and coding). We believe that this distinction is essential to a modern theory and practice of software defect reduction. Changes to the definition of a system that make it more cost-effective should not be discouraged by classifying them as defects to be avoided.

Reducing avoidable rework is thus a major source of software productivity improvement. In our behavioral analysis of the effects of software cost drivers on effort for the COCOMO II model (B. Boehm et al., Software Cost Estimation with COCOMO II, Prentice Hall, 2000), most of the effort savings from improving software process maturity, software architectures, and software risk management came from reductions in avoidable rework.

3. About 80% of the avoidable rework comes from 20% of the defects.

For smaller systems, the 80% number may be lower; for very large systems, it may be higher. Two major sources of avoidable rework are hastily specified requirements and nominal-case design and development (where late accommodation of off-nominal requirements causes major architecture, design, and code breakage). If you have a problem-report tracking system that records the effort to fix each defect, it is fairly easy to analyze the data to determine and address additional major sources of rework in your organization, along the lines of the sketch below.
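As a minimal illustration (not part of the original column), the following Python sketch shows the kind of analysis we have in mind. It assumes a hypothetical CSV export from a problem-report tracking system with defect_id, category, and fix_effort_hours columns; the file name and column names are placeholders for whatever your own tracking system provides.

```python
# Hypothetical sketch: Pareto analysis of rework effort by defect category,
# from an assumed CSV export with columns: defect_id, category, fix_effort_hours.
import csv
from collections import defaultdict

def rework_pareto(report_path):
    effort_by_category = defaultdict(float)
    with open(report_path, newline="") as f:
        for row in csv.DictReader(f):
            effort_by_category[row["category"]] += float(row["fix_effort_hours"])

    total = sum(effort_by_category.values())
    cumulative = 0.0
    # Rank categories by the rework effort they consume; the first few entries
    # typically account for most of the avoidable rework (the 80/20 pattern).
    for category, effort in sorted(effort_by_category.items(),
                                   key=lambda item: item[1], reverse=True):
        cumulative += effort
        print(f"{category:30s} {effort:8.1f} h   {100 * cumulative / total:5.1f}% cumulative")

if __name__ == "__main__":
    rework_pareto("defect_reports.csv")  # placeholder file name
```

The top of such a list shows where process changes -- for example, earlier requirements validation or explicit off-nominal design reviews -- would pay off most.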

4. About 80% of the defects come from 20% of the modules, and about half the modules are defect free.

Studies from different environments over many years have been amazingly consistent: between 60% and 90% of the defects come from 20% of the modules, with a median of about 80%. What also appears to be consistent is that all of the defects are contained in about half of the modules, the remainder being defect free. These findings hold for each of the studies cited in the Web version of this column.

Thus, it is worth the effort to identify the characteristics of error-prone modules in a particular environment. Many of the factors that contribute to error-proneness appear to be context dependent, but factors that usually contribute include the level of data coupling and cohesion, size, complexity, and the amount of change to reused code.
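A complementary sketch (again illustrative, with made-up module names and counts) shows how to measure the concentration of defects across modules from the same kind of tracking data:

```python
# Illustrative sketch with made-up numbers: what fraction of the modules
# contains a given share (default 80%) of all recorded defects?
def defect_concentration(defects_per_module, share=0.80):
    counts = sorted(defects_per_module.values(), reverse=True)
    total = sum(counts)
    running, modules_needed = 0, 0
    for count in counts:
        if running >= share * total:
            break
        running += count
        modules_needed += 1
    return modules_needed / len(counts)

# Hypothetical per-module defect counts; in practice these come from mapping
# problem reports back to the modules that were changed to fix them.
counts = {"parser": 42, "scheduler": 31, "ui": 9, "io": 5,
          "config": 2, "logging": 0, "utils": 0, "report": 0}
print(f"{defect_concentration(counts):.0%} of modules hold 80% of the defects")
```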

5. About 90% of the downtime comes from at most 10% of the defects.

Not all faults are equal: some defects have a much greater effect on a system's downtime and reliability than others. An analysis of the software failure history of nine large IBM software products found that about 0.3% of the defects accounted for about 90% of the downtime. Thus risk-based testing, which includes understanding a system's operational profiles and emphasizing testing of its high-risk scenarios, is clearly cost-effective.
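One simple way to act on this observation is to weight test effort by how often a scenario occurs and how costly its failures are. The sketch below is illustrative only: the scenario names, usage probabilities, and impact weights are invented, and a real operational profile would come from field measurement.

```python
# Illustrative sketch: allocate a fixed test budget across usage scenarios in
# proportion to operational-profile probability times relative failure impact,
# so the scenarios most likely to cause costly downtime receive the most testing.
def allocate_tests(scenarios, total_tests):
    weights = {name: prob * impact for name, (prob, impact) in scenarios.items()}
    total_weight = sum(weights.values())
    return {name: round(total_tests * w / total_weight) for name, w in weights.items()}

# (probability of use, relative cost of failure) -- invented values
scenarios = {
    "routine query":        (0.70, 1),
    "end-of-month billing": (0.05, 20),
    "failover/recovery":    (0.01, 100),
    "administration":       (0.24, 2),
}
print(allocate_tests(scenarios, total_tests=500))
```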

6. Peer reviews catch 60% of the defects.

Given that the cost of finding and fixing most defects rises the later we find them in the lifecycle, we are interested in techniques that find defects earlier in the lifecycle. Numerous studies have confirmed that peer reviews are very effective in this regard. The data range from catching 31% to 93% of the defects, with a median of around 60%. Thus the 60% number, which comes from the 1987 column, is still a reasonable estimate.

Factors affecting the percentage of defects caught include the number and type of peer reviews performed, the size and complexity of the system, and the frequency of defects better caught by execution (e.g., concurrency and algorithm defects). Our studies have provided evidence that peer reviews, analysis tools, and testing catch different classes of defects at different points in the development cycle. Further empirical research is needed to help choose the best mixed strategy for defect reduction investments.

7. Perspective-based reviews catch 35% more defects than non-directed reviews.

A scenario-based reading technique (V. R. Basili, "Evolving and Packaging Reading Technologies," Journal of Systems and Software, vol. 38, no. 1, July 1997, pp. 3-12) offers a reviewer a set of formal procedures for defect detection based on varying perspectives. The union of several perspectives into a single inspection offers broad, yet focused coverage of the document being reviewed. The goal is to generate focused techniques aimed at specific defect-detection goals, taking advantage of an organization's existing defect history.

Scenario-based reading techniques have been applied in requirements, object-oriented design, and user interface inspections. Reported improvements in fault detection rate range from 15% to 50%. Further benefits of focused reading techniques are that they facilitate the training of inexperienced personnel, improve communication about the process, and support continual improvement over time.

8. Disciplined personal practices can reduce defect introduction rates by up to 75%.

Several disciplined personal processes have been introduced into practice. These include Harlan Mills' Cleanroom software development process and Watts Humphrey's Personal Software Process (PSP). Data from both support the concept that personal discipline can greatly reduce the introduction of defects into software products. Data from the use of Cleanroom at NASA have shown failure rates during test reduced by 25% to 75%. Cleanroom use also reduced rework effort: only 5% of fixes took more than an hour, as opposed to the baseline of more than 60% of fixes taking over an hour.

PSP's strong focus on root-cause analysis of an individual's software defects and overruns, and on developing personal checklists and practices to avoid future recurrence, has a significant effect on personal defect rates. Reductions of 10:1 are common between exercises 1 and 10 of the PSP training course.

Effects at the project level are more scattered. They depend on such factors as the organization's existing software maturity level and the willingness of people and organizations to operate within a highly structured software culture. When PSP is coupled with the strongly compatible Team Software Process (TSP), defect reduction rates can reach factors of 10 or higher for organizations operating at modest maturity levels, but less if organizations already have highly mature processes. The June 2000 special issue of CrossTalk, "Keeping Time with PSP and TSP," has a good set of relevant discussions, including experience showing that adding PSP and TSP to a CMM Level 5 organization reduced acceptance-test defects by about 50% overall, and by about 75% for high-priority defects.

9. All other things being equal, it costs 50% more per source instruction to develop high-dependability software products than to develop low-dependability software products. However, the investment is more than worth it if significant operations and maintenance costs are involved.

The analysis of 161 project data points for the COCOMO II model referenced above resulted in an added cost of 53% for its "Required Reliability" factor, while normalizing for the effects of 22 other factors. Does this mean that Philip Crosby's landmark book, Quality Is Free (Mentor, 1980), had it all wrong? Maybe for some low-criticality, short-lifetime software, but not for the most important cases.

First, in the COCOMO II maintenance model, low-dependability software is about 50% per instruction more expensive to maintain than to develop, while high-dependability software is about 15% less expensive to maintain than to develop. For a typical life cycle cost distribution of 30% development and 70% maintenance, low-dependability software becomes about the same in cost per instruction as high-dependability software (again, assuming all other factors are equal).
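An illustrative round-number calculation using the figures above (a 53% development premium for high dependability, maintenance costs of +50% and -15% relative to development, and a 30/70 development-to-maintenance split, so maintenance effort is roughly 2.3 times development effort) shows why the life-cycle costs per instruction come out nearly equal:

```latex
% Life-cycle cost per source instruction, normalized so that low-dependability
% development cost = 1.00 (illustrative round numbers, not COCOMO II output).
\begin{align*}
\text{Low dependability:}  \quad & 1.00 + \tfrac{0.70}{0.30}\,(1.00 \times 1.50) \approx 1.00 + 3.50 = 4.50 \\
\text{High dependability:} \quad & 1.53 + \tfrac{0.70}{0.30}\,(1.53 \times 0.85) \approx 1.53 + 3.03 = 4.56
\end{align*}
```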

Second, in the COCOMO II-related quality model, high-dependability software removes about 4 times as many defects as average-dependability software, which in turn removes about 4 times as many defects as low-dependability software. Thus, if the operational cost of software defects (due to lost worker time, lost sales, recalls, added customer service costs, litigation costs, loss of repeat business, etc.) is roughly equal to life-cycle software development and maintenance costs for average-dependability software, the increased defect rate of low-dependability software will make its ownership costs roughly three times higher than the ownership costs of high-dependability software.

10. About 40-50% of user programs have nontrivial defects.

A landmark 1987 study in this area found that 44% of 27 spreadsheet programs produced by experienced spreadsheet developers had nontrivial defects: mostly errors in spreadsheet formulas. The developers were quite confident that their spreadsheets were accurate. Subsequent laboratory experiments have reported defective-spreadsheet rates between 35% and 90%. Analyses of operational spreadsheets have reported defect rates between 21% and 26%; the lower rates are probably due to corrections already made during operation.

Increasingly, user programs are escalating from spreadsheets to Web/Internet scripting languages capable of sending agents into cyberspace to make deals for you. And there will be many more "sorcerer's apprentice" user-programmers with tremendous power to create high-risk defects and little training or expertise in how to avoid or detect them. One of our studies for the COCOMO II book (page 6) estimated that there would be 55 million user-programmers in the U.S. by the year 2005. If active Web-page developers are counted as user-programmers, this prediction is basically on track.

Thus, another challenge for the creators of Web-programming facilities is to provide user-programmers with the equivalent of seat belts and air bags, plus safe-driving aids and rules of the road. This is one of several software engineering research challenges identified by a National Science Foundation study, "Gaining Intellectual Control of Software Development," which we recently summarized in Computer (May 2000, pp. 27-33).

There is a great need to refine and expand this top-10 list and related empirical research on defect reduction.

Clearly, much of the data reported above does not account for interactions among the variables. For example, you might like to know: "If I invest in peer reviewing, Cleanroom, and PSP, am I paying for the same defects to be removed three times? Will this enable me to avoid doing (some) testing?" Further empirical research in defect reduction is needed to answer questions like these.

We hope to involve the software community in a process of expanding the top-10 defect reduction list and other currently available data into a continually evolving, open-source, Web-accessible handbook of empirical results on software defect reduction strategies. We also plan to initiate counterpart handbooks for COTS-based systems and other future software areas. We would welcome your participation in this effort; please see the CeBASE Web site for further information and ways of participating.

Box. Summary: Software Defect Reduction Top-10 List
1. Finding and fixing a software problem after delivery is often 100 times more expensive than finding and fixing it during the requirements and design phase.
2. About 40-50% of the effort on current software projects is spent on avoidable rework.
3. About 80% of the avoidable rework comes from 20% of the defects.
4. About 80% of the defects come from 20% of the modules, and about half the modules are defect free.
5. About 90% of the downtime comes from at most 10% of the defects.
6. Peer reviews catch 60% of the defects.
7. Perspective-based reviews catch 35% more defects than non-directed reviews.
8. Disciplined personal practices can reduce defect introduction rates by up to 75%.
9. All other things being equal, it costs 50% more per source instruction to develop high-dependability software products than to develop low-dependability software products. However, the investment is more than worth it if significant operations and maintenance costs are involved.
10. About 40-50% of user programs have nontrivial defects.