Improved Requirements Engineering Based on Defect Analysis

Otto Vinter

Brüel & Kjær Sound & Vibration Measurement A/S, DK-2850 Naerum, Denmark


Søren Lauesen, Jan Pries-Heje

Copenhagen Business School, DK-2000 Frederiksberg, Denmark


Abstract

The basis for this paper is a thorough analysis of error reports from actual development projects. The requirements related error reports have been studied in detail with the aim of finding effective prevention techniques and trying them out in practice. The paper covers some of the analysis results, a set of effective prevention techniques, and practical experiences from using some of these techniques on real-life development projects.

1. Introduction

At SQW’96 and QWE’97 Brüel & Kjær reported the experiences of a software process improvement (SPI) project where we demonstrated that the introduction of static and dynamic analysis in our software development process had a significant impact on the quality of our products.

The basis for this project was a thorough analysis of error reports from previous projects, which showed the need to perform a more systematic unit test of our products. However, the analysis also showed that the majority of bugs stemmed from requirements related issues.

We have now conducted another SPI project where we have analysed the requirements related bugs in order to find and introduce effective prevention techniques in our requirements engineering process.

The analysis of the requirements related bugs led to a set of techniques that would have been effective on the analysed projects. Some of these techniques were selected for experimentation (validation) on a real-life project. This project is now complete, and the product released. We have analysed the error reports from this project and the practical experiences and major impacts of the techniques used will be reported.

2. Analysis of Requirements Issues

There is no generally accepted way of writing a requirements specification. Recommendations like the IEEE Guide [1] and Davis [3] are definitely helpful, but most developers have great trouble following them. What should be included and what should not? How can you formulate functional requirements without committing to a specific user interface? How can you formulate "what" without describing "how"?

Through our analysis of error reports in a previous SPI project aimed at improving the efficiency of our testing process [4][5], we found that requirements related issues are the major cause of bugs. We therefore decided to conduct this SPI project [6], aimed specifically at this type of bug, in order to find and introduce effective prevention techniques for requirements problems in our development process.

In both projects we have classified bugs according to a taxonomy described by Boris Beizer [2]. For the current SPI project, however, we have limited the study to those bugs which can be related to requirements issues. We found that requirements related bugs represented 51% of all the bugs analysed.

Furthermore, we found that the distribution of requirements issues is not what the literature leads one to expect. Usability issues dominate (64%). Problems with understanding and co-operating with 3rd party software packages, and with circumventing their errors, are also very frequent (28%). Functionality issues, which we (and others) originally thought were the major requirements problem, represent only a smaller part (22%). Other issues account for 13%. These figures add up to more than 100% because one bug may involve more than one issue.

Usability errors also seem to be rather easy to correct even when found late in the development process, e.g. in or after the integration phase. Problems with 3rd party products, however, are generally very costly to correct/circumvent.

This has had an impact on our methodology, tools and training. They had to focus much more on usability problems and on early verification and validation techniques, rather than on the correctness and completeness of requirements documents.

3. Potential Prevention Techniques

When we classified the bugs, we tried to imagine what could have prevented each bug. We started out with a list of known techniques and added to it when no technique seemed to be able to prevent the bug in question. Later when we discussed hit-rates for each technique, we improved and specified each technique further.

Many well-known techniques were considered but dropped, because we could see no use for them in relation to the actual bugs. Initially, for instance, we thought that argument-based techniques could be useful, but the error reports did not show a need for them. Others, like formal (mathematical) specifications, seemed of little value. Techniques based on focused early experiments with different kinds of prototypes seemed much better suited to catching real-life bugs.

Many of the proposed techniques are “common sense” techniques that are moved up from the design phase to the requirements phase and formalised. When they are used in this context they will ensure the right quality of the product.

The result was a detailed list of some 40 prevention techniques grouped under the following major subjects:

- 1xx Demand analysis (including scenarios)

- 2xx Usability techniques (including prototypes)

- 3xx Validation and testing of external software

- 4xx Requirements tracing

- 5xx Risk analysis

- 6xx Improved specification techniques (e.g. formal/mathematical)

- 7xx Checking techniques (including formal inspections)

- 8xx Other techniques (including performance specifications)

4. Determining the Optimum Set of Techniques

Each error report was then assigned an estimated hit-rate for each technique, and the estimated effectiveness of each technique was calculated. We also assigned a benefit for preventing each error report, so that we were able to calculate the cost/benefit ratio of each technique, and then select the optimum set of techniques to be employed in our real-life experiment on a baseline project.
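As an illustration of this bookkeeping, the sketch below shows how a per-bug hit-rate matrix, an estimated benefit per prevented bug, and a cost per technique combine into expected effectiveness, a benefit/cost ratio, and a net saving for each technique. The bug identifiers, probabilities, hours and technique costs are hypothetical examples, not our actual data.

    # Illustrative sketch only: bug ids, hit-rates, benefits and costs are invented.
    # hit_rates[t][b] = estimated probability that technique t would have caught bug b.
    hit_rates = {
        "220 screen mockup usability test": {"bug-01": 0.8, "bug-02": 0.0, "bug-03": 0.5},
        "301 external software stress test": {"bug-01": 0.0, "bug-02": 0.9, "bug-03": 0.0},
    }
    benefits = {"bug-01": 40, "bug-02": 120, "bug-03": 10}    # hours saved if the bug is prevented
    costs = {"220 screen mockup usability test": 60,          # hours to apply the technique
             "301 external software stress test": 45}

    for technique, per_bug in hit_rates.items():
        expected_hits = sum(per_bug.values())                 # expected number of bugs prevented
        expected_benefit = sum(p * benefits[bug] for bug, p in per_bug.items())
        net_saving = expected_benefit - costs[technique]
        print(f"{technique}: expected hits {expected_hits:.1f}, "
              f"benefit/cost {expected_benefit / costs[technique]:.2f}, "
              f"net saving {net_saving:.0f} hours")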

The results are shown in the figure below for the top seven techniques with respect to savings. The hit-rates are shown as a percentage of the total number of bugs in the project. The savings are shown as a percentage of the total development effort.

We have chosen to show only the top scorers with respect to savings. Other techniques had hit-rates comparable to the ones shown above, but with lower savings (even negative) because of high costs.

When more than one technique is used at the same time, one must be aware that combining two techniques does not simply add their hit-rates: the first technique "filters" away some problems, leaving fewer problems for the second technique to detect. This also affects the savings. In general it is better to combine techniques that find different kinds of problems. We have applied the principle of dynamic programming to calculate these combined hit-rates and savings, and have found the best combinations with respect to savings.
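The sketch below, again with hypothetical numbers, shows this filtering rule: per bug, a set of techniques prevents it with probability one minus the product of the individual miss probabilities, so hit-rates never simply add. Our calculation used dynamic programming; for the handful of techniques shown here, the plain exhaustive search over subsets finds the same best combination.

    # Hypothetical data; combined hit-rates "filter" rather than add: a bug is
    # only left for the next technique if all previous ones missed it.
    from itertools import combinations

    hit_rates = {
        "101 scenarios":          {"bug-01": 0.3, "bug-02": 0.2, "bug-03": 0.4},
        "220 screen mockup test": {"bug-01": 0.8, "bug-02": 0.0, "bug-03": 0.5},
        "301 stress test":        {"bug-01": 0.0, "bug-02": 0.9, "bug-03": 0.0},
    }
    benefits = {"bug-01": 40, "bug-02": 120, "bug-03": 10}    # hours saved per prevented bug
    costs = {"101 scenarios": 30, "220 screen mockup test": 60, "301 stress test": 45}

    def combined_saving(techniques):
        saving = -sum(costs[t] for t in techniques)
        for bug, value in benefits.items():
            missed_by_all = 1.0
            for t in techniques:
                missed_by_all *= 1.0 - hit_rates[t][bug]      # the filtering effect
            saving += (1.0 - missed_by_all) * value           # combined hit, not a sum of hits
        return saving

    # Exhaustive search over all non-empty subsets; dynamic programming is only
    # needed when the number of candidate techniques gets large.
    best = max((subset for r in range(1, len(hit_rates) + 1)
                for subset in combinations(hit_rates, r)),
               key=combined_saving)
    print("best combination:", best, "net saving:", round(combined_saving(best), 1), "hours")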

The combination of the four best techniques above on the analysed project would have resulted in a combined hit-rate of 19% of all error reports (37% of the requirements related ones), and a combined saving of 6% of the total development effort, which would have been approximately 1 month on the 18-month schedule of the project.

5. The Requirements Engineering Methodology

The results of the analysis were presented to the members of a new development project. Based on the detailed list of techniques and hit-rates/savings, the team took part in the final decision on the techniques of the methodology.

The techniques of the methodology are:

  • Requirements Elicitation and Validation

• Scenarios (101)

- Relate demands to use situations. Describe the essential tasks in each scenario.

• Navigational Prototype Usability Test, Daily Tasks (220)

- Check that the users are able to use the system for daily tasks, based on a navigational prototype of the user interface.

  • Verification of the Requirements Specification

• Let Product Expert Review Screens (280)

- Let a product expert check screens for deviations from earlier product styles.

• External Software Stress Test (301)

- Test that the external software fulfills the expectations in the requirements, with special emphasis on extreme cases.

• Orthogonality Check (721)

- Check the requirements specification to see whether an operation or feature can be applied whenever it is useful.

• Performance Specifications (820)

- Check the requirements specification to see whether it contains performance goals for the requirements.

The analysis of error reports had not found technique 101 (Scenarios) to be effective in itself. However, since usability was such an important issue, we needed a technique to define the tasks that users should perform during the usability tests. We therefore included 101 as one of the techniques in the methodology, also because other sources indicated that scenarios are quite effective at improving developer understanding of the domain.

Originally the team focused on technique 230 (Functional Prototype Usability Test, Daily Tasks) as the choice of prototype for usability tests, because they were worried that they would not get enough out of a paper mockup (210) or a navigational prototype (220). On the other hand we were worried that a functional prototype would not be available for usability tests until too late to allow for the changes to requirements that would be uncovered by these tests.

What actually happened was that after the use situations (scenarios) had been described, the team could not wait for a functional prototype to be developed. In only two weeks they developed a first prototype with navigational facilities (screen mockup). Both Visual Basic and the Bookmark feature in Word 6 were used to develop further prototypes. These navigational prototypes were immediately subjected to usability tests, and the number of issues found and corrected convinced the team that a navigational prototype would be sufficient.

Thus, the technique included in our methodology is not 230 (Functional Prototype Usability Test, Daily Tasks) but another of the analysed techniques, 220 (Screen Mockup Usability Test, Daily Tasks), even though this technique was estimated to have only half the hit-rate and much lower savings than technique 230.

6. Experiences with the Techniques During the Experiment

The development team picked up the scenario and usability test techniques with great enthusiasm. In interviews, statements like the following were heard:

Scenarios (101):

“I am in fact deeply surprised. The scenarios made it possible for us to see the flow through the system in a very concrete way.”

“In the beginning of the project I was quite sceptical. I thought it would take too long. But now I think we get a much more lively and exciting requirements specification as a result of the scenarios. It will also make it much easier to make a prototype.”

“It has been an exciting experience to use scenarios. Once you had the scenarios, the requirements popped up by themselves.”

Screen Mockup Usability Test, Daily Tasks (220):

“It only took a week to develop the original prototype in Visual Basic, and the modifications from the first set of tests to the next were performed overnight in the hotel room.”

“The closer we got to the real users, the clearer became the actual tasks that they performed.”

“We got more information out of the tests than we are able to incorporate in the product. We found features that we had never thought about, as well as features that were irrelevant to the users.”

The other techniques of the methodology were never applied by the team in practice. Introducing so many new techniques at the same time on a project turned out to be too ambitious.

The requirements engineering process took longer than expected, but the specification and design phases were reduced, so the resulting delay was not considered critical. The rest of the development process was conducted in the usual fashion, following the normal procedures for software development at Brüel & Kjær. The baseline project was completed in December 1997 and the product was released.

7. Results of the Experiment

We have analysed the bugs from the completed project in the same manner as in the original analysis reported in chapters 2-4. We compare the results to another project previously developed by the same team, on the same platform, and under similar circumstances. The actual number of person-months on the two projects agrees to within 10%.

The major difference between the two projects is that the project used for experimentation was expected to be very user interface intensive. Actually it contains almost 4 times as many new screens as the project we compare it to.

The application it was intended to support had previously resulted in a lot of “shelfware” products, both from B&K and from our competitors, because the customers were unable to grasp the intricacy of the measurement standard to be followed. The usability techniques therefore seemed especially well suited for this experiment.

7.1. Effect on Requirements Related Issues

We have found an overall reduction in error reports of 27% from the previous generation of the product to the product developed on the baseline project. The reduction in the number of requirements related error reports was 11%. According to our analysis, the techniques actually used (101 and 220) were estimated to achieve a combined hit-rate of 8% of all error reports and 15% of the requirements related ones.

When we study the distribution of requirements issues according to quality factors, we see a slight increase in usability issues (5%), whereas other requirements issues (functionality etc.) have been reduced by 36%. The immediate reaction to this is that the usability techniques employed have not reduced usability issues.

However, the impact of usability techniques is closely linked to the complexity of the user interface. The baseline project had almost 4 times as many new screens as the previous project we compare it to, all of comparable complexity. If we adjust for this difference, we actually have achieved a 72% reduction in usability issues per new screen, which is quite extraordinary.

Furthermore, the baseline project spent only 33% more person-months to deliver almost 4 times as many new screens of comparable complexity. This almost threefold difference in productivity can be explained by the design and development of the user interface becoming a stable process once the navigational prototype (screen mockup) had been validated in usability tests. In the previous project the new screens were constantly subject to change right up to the end of the project.
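A rough back-of-the-envelope version of these two normalisations, taking 3.75 as a stand-in for "almost 4 times" and using the 5% and 33% figures quoted above (the exact counts are not reproduced here):

    # Back-of-the-envelope check; 3.75 stands in for "almost 4 times", the other
    # ratios are the percentages quoted in the text.
    screens_ratio = 3.75      # new screens: baseline project vs. previous project
    usability_ratio = 1.05    # usability issues rose by about 5% in absolute numbers
    effort_ratio = 1.33       # about 33% more person-months were spent

    per_screen = usability_ratio / screens_ratio              # ~0.28
    print(f"usability issues per new screen: {1 - per_screen:.0%} reduction")

    productivity = screens_ratio / effort_ratio               # ~2.8
    print(f"new screens per person-month: ~{productivity:.1f}x the previous project")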

Finally, we have analysed the error reports from the baseline project to study hit-rates and savings in order to find further techniques that could have been employed with effect on the remaining bugs. We have found that none of the usability test techniques on prototypes are any longer among the top 7 candidates with respect to savings.

This shows that the usability test techniques have been effective in preventing requirements related bugs, and that using a navigational prototype (screen mockup) instead of a functional prototype seems adequate to prevent this type of bug. This is important because a navigational prototype is cheaper to build than a functional prototype and can be produced much earlier in the development life-cycle.

Furthermore, the requirements verification techniques of our methodology are still at the top of the list with respect to savings, ranked relative to each other as follows: 280 (Let Product Expert Review Screens), 820 (Performance Specifications), and 721 (Orthogonality Check). This shows that these verification techniques could have been an important supplement to the validation techniques actually used.

The baseline project did not use external software, so technique 301 (External Software Stress Test) has not been validated. Our original analysis showed that this technique would have been very effective in preventing problems that are difficult to fix on projects with new external software.

7.2. Other Effects

It is also surprising that not only did we experience a reduction in bugs related to requirements issues, we found an even higher reduction (37%) in the other bug categories. We have been very puzzled by this unexpected result. We have thought of several causes that might have influenced it:

  1. The primary effect of the used techniques
  2. Derived effects of the used techniques
  3. Focus on a team improves their productivity no matter what else is changed.
  4. Random effects
  5. A change in the experimenters’ evaluation of the reports
  6. Differences in the team/culture/domain/project

Since we are comparing the results with another project previously developed by the same team, within the same domain, and under similar circumstances, we can eliminate cause 6.

Two out of the three persons on the evaluation team (cause 5) have taken part in all the analyses and comparisons of error reports. The analysis reported in chapters 2-4 took place two years ago, but we have had to revisit some of the original error reports during the present comparison of results, and we found reasonable agreement with our previous analysis.

We cannot completely rule out random effects (cause 4). However, the observed differences are significant at standard confidence levels, so the reduction cannot be attributed to random effects alone.

Nor can we rule out the Hawthorne effect (cause 3): merely focusing on a team improves its productivity no matter what else is changed. However, statements from the developers suggest that the primary and derived effects of the techniques (causes 1 and 2) are the main reasons for the reduction in error reports.

The derived effect on other types of bugs than the requirements related ones can be explained by the fact that most of the developers gained a deep understanding of the domain in which the product was going to be used, from describing use situations (scenarios) and from taking part in the usability tests.

This invariably leads to reduced uncertainty and indecision among the developers on what features to include and how they should be implemented and work. As one developer said during an interview: