Fundamentals of Measurement

Introduction

In all aspects of our day-to-day lives, things are measured. Whether it be the volume on the television, the size of a file, or the temperature of a cup of tea, we use an enormous variety of measurements to help us understand the world around us. By defining a relative scale, almost anything can be measured, and this is demonstrated frequently, often in unexpected ways (shampoo advertisements claim an increase in hair's radiance with surprising precision).

It comes as no surprise, then, that in our goal of perfect software we seek to measure the quality of systems being developed. But just how do you measure the quality of a software system? In fact, how is a measurement made at all?

Beginning to Measure

Before we attempt to devise a method of measuring a subject, we must first look at more general terms of measurement. Consider the size of objects in a room. It is easy to say that one object is larger than another or, conversely, that one is smaller. This is comparison.

Comparison is the basis for all measurement, and is in itself the most basic form of measurement. When looking at a set of objects, comparative measures can be taken, often with ease, between objects. However, the problem with such comparison is that it requires at least two objects, one as a frame of reference against which to measure the other. When you take away the frame of reference, you cannot say whether the remaining object is larger, or smaller, than anything else. Alongside this is the issue of small changes: if two objects are very similar then it is difficult to rank them accurately against each other.

Similarly basic is counting. We can quantify any collection of objects by counting them. The drawback of this, however, is that it reveals little, if anything, about the objects being counted. Consider counting the number of chairs in a house. The only piece of information this provides is the number of individual chair objects; there is no way, for instance, to divine the number of people these chairs could seat (a two-person sofa could be considered a chair), or the amount of space these chairs take up.

It is this need to quantify that brings units of measurement into account.

Units

In any measurement it is important to define a unit in which the subject can be measured; simply defining one quantity as greater than another is not useful. A unit can be used to define a quantity on a numerical scale, and gives an easily comparable value.

A unit must conform to certain rules in order for it to be of any use, most of which come down to consistency.

The first of these is the unit rule. This states that some arbitrary amount should be attributed to a numerical value of one. Take, for instance, the metre, a unit of measurement of distance or length: one metre is the quantity of length given the value one, and all other lengths are expressed relative to it.

The second rule is the additive rule. This means that if one object has a greater value of a quantity than another, then the first object's value should be expressible as a multiple of the second's. In simpler terms, consider two time intervals: half an hour and ten minutes. We know that half an hour is longer than ten minutes, and we know that half an hour is three times ten minutes. This does not, of course, require integer multiples; half an hour is one and a half times twenty minutes, so the rule still holds.

The third rule to consider when defining a measurement is the equality rule. This rule requires that two quantities deemed of equal value must be equivalent in that value. A perfect example of this would be measuring mass. If two objects are both considered to have a mass of one kilogramme, then when placed on opposite sides of a perfect set of scales the scales should balance.

Finally, the zero rule needs to be considered. This exists so that measurement of any quantity has a basis point against which a measurement can be made. This can be confusing, since zero would imply a lack of the quantity being measured, but it is also possible to enter the realms of negative values. If we consider the previous example of mass, a mass of zero implies a complete lack of mass, and therefore non-existence of the object. A negative mass implies antimatter; however, that is beyond the scope of this document. For a more sensible explanation we can consider temperature scales. The Celsius scale has its zero value defined at the freezing point of water at standard atmospheric pressure; however, we know that this is not the lowest physically possible temperature, and hence the scale extends down to approximately -273.15 Celsius (absolute zero).

So, What Is Software Quality?

·  Speed? - Time critical software needs to have a very fast response time.

·  Usability? - Software features should be found in common locations (Print is in the File menu, for example).

·  Accuracy? - A missile guidance system needs to be accurate to within metres.

·  Reliability? - An operation should produce the same result every time.

·  Correctness? - Can the software be proven correct?

In fact, software quality is all of these things! The relevance of each is dependent on the users and the environment in which the software is used. For instance, a home user of a word processor would look for a simple and intuitive user interface (something Microsoft have failed to provide for many years), whereas speed is not such a concern (saving a large file could take a few seconds; this is not a problem). However, an investment bank would be looking for a fast, accurate system for an environment in which milliseconds can make the difference.

How Is It Measured?

In simple terms, the quality of a piece of software can be measured in a few different ways. The volume of code and the number of errors present in the code are used to find the error density of the program. However, it is also important to note the percentage of errors removed from the code during testing and debugging, as well as the number of errors that come to light after release of the product. The development time is also worth considering. If the development process overruns, this can either cause or highlight problems with respect to budget and manpower. If development is wildly faster than expected, it is likely that something has been missed in the requirements stage, or the estimation technique needs rethinking.
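As a sketch, the error density mentioned above might be computed as errors per thousand lines of code; the function name and figures here are illustrative, not a standard definition:

```python
def error_density(total_errors: int, lines_of_code: int) -> float:
    """Errors per thousand lines of code (errors/KLOC)."""
    kloc = lines_of_code / 1000
    return total_errors / kloc

# e.g. 45 errors found in a 30,000-line program
print(error_density(45, 30_000))  # 1.5 errors per KLOC
```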

Unfortunately, not all of the elements mentioned above are measurable in any meaningful way. Usability, for example, is an entirely subjective affair, and what one user might find to be an easy-to-use piece of software, another may find completely incomprehensible. With respect to proving correctness, once a system starts to become reasonably large it becomes incredibly time consuming to prove its correctness.

Software Volume Metrics

The first step in measuring the complexity of a program is to measure the volume, based on the idea that the volume of code needed to create the program is directly proportional to the complexity of the program.

KLOC – Thousand Lines of Code

Arguably the simplest method of measuring complexity of a piece of software, this is simply defined as how many thousand lines of source code a program is compiled from. This, of course, suffers greatly in reliability thanks to the myriad of different programming styles. For instance, one person may use the Java style of function enclosure, that is, the opening curly brace on the same line as the function definition, whereas another might use the C++ style, that is, a new line for the opening curly brace. While this might not seem like much, in a large piece of software containing hundreds, or even thousands, of functions this could potentially make a difference of a few thousand lines of code. Hence, this is not a particularly reliable method of measuring software complexity. It is also worth noting that this is a retrospective measure, not for use in estimating a project's complexity during the design stage.
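The brace-style problem can be demonstrated in miniature: two functionally identical definitions differ in raw line count purely because of where the opening brace sits (the `loc` helper is an illustrative naive counter, not a standard tool):

```python
# Java style: opening brace on the definition line
java_style = """\
int add(int a, int b) {
    return a + b;
}
"""

# C++ style: opening brace on its own line
cpp_style = """\
int add(int a, int b)
{
    return a + b;
}
"""

def loc(source: str) -> int:
    """Naive line count, as a crude KLOC-style measure would use."""
    return len(source.splitlines())

print(loc(java_style))  # 3
print(loc(cpp_style))   # 4
```

Multiplied across thousands of functions, that one-line difference per function accounts for the "few thousand lines" discrepancy described above.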

Function Point Analysis

A more comprehensive method of determining a program's complexity is given in the Function Points scheme. Function point analysis is based on the functional user requirements, that is, the requirements which map to functions the system performs for its users, for instance, saving a file. These functional user requirements are categorised into one of five types of function. Once this is complete, each requirement is then assigned a number of “crude”, or unadjusted, function points, depending on whether the function is of simple, average, or high complexity.

Once the crude function points have been assigned, the total is then multiplied by an adjustment factor, known as the relative complexity adjustment factor. While this method is relatively simple to use and can be prepared at the design stage, the results can be somewhat subjective, and results can be different depending on function point counting methods.
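The arithmetic can be sketched as follows; the counts are hypothetical, the weights are the commonly cited average-complexity values used purely for illustration, and the adjustment factor is assumed:

```python
# Hypothetical (count, weight) per function type; weights shown are
# the commonly cited average-complexity values, for illustration only.
functions = {
    "external inputs":     (4, 4),
    "external outputs":    (3, 5),
    "external inquiries":  (2, 4),
    "internal files":      (1, 10),
    "external interfaces": (1, 7),
}

unadjusted = sum(count * weight for count, weight in functions.values())

rcaf = 1.05  # relative complexity adjustment factor (assumed value)
adjusted = unadjusted * rcaf

print(unadjusted)            # 56 crude function points
print(round(adjusted, 2))    # 58.8 adjusted function points
```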

Error Volume Metrics

The number of errors in a piece of software is one way of measuring the quality of that software. However, errors often go undetected, and a piece of software can be released with undiscovered errors. Code goes through several stages of bug finding before a program is released, and the number of undiscovered errors remaining in the release can be estimated from this.

Errors can be split into two main categories, code errors and design errors. Code errors are errors in the implementation of the system, and are found in the source code. Design errors are errors in the design of the system.

Errors can be used in a variety of formulae to describe the quality of the software, but ultimately it comes down to the number of errors in the program. Errors are assigned a severity, and weighted accordingly. How errors are weighted is really up to the developing house, as is classification of severity, and this makes measuring the error density of a system difficult. This is still useful, however, since the average severity of errors can be used.
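Because severity classification and weighting are left to the developing house, any example is necessarily hypothetical; one possible weighting and the resulting average severity might look like this:

```python
# Hypothetical severity weights, chosen by the developing house.
weights = {"low": 1, "medium": 3, "high": 9}

# Severities of the errors found in a (hypothetical) system.
errors = ["low", "low", "medium", "high"]

total_weight = sum(weights[s] for s in errors)
average_severity = total_weight / len(errors)

print(average_severity)  # 3.5
```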

The quantity of errors, however, is a simple unit which can be used for calculating error detection and removal. By comparing the number of errors found and fixed before release with the total number of errors found before and after release, a measure of effectiveness of error detection and removal can be calculated in the form of a percentage.
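This before/after comparison reduces to a single percentage; a minimal sketch, assuming "total errors" means those found and fixed before release plus those discovered after release:

```python
def removal_effectiveness(fixed_before_release: int,
                          found_after_release: int) -> float:
    """Percentage of all known errors removed before release."""
    total = fixed_before_release + found_after_release
    return 100 * fixed_before_release / total

# e.g. 90 errors fixed before release, 10 more reported afterwards
print(removal_effectiveness(90, 10))  # 90.0 percent
```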

The Software Process Timetable

During, and even after, development it is important to know whether there have been delays in the development of the system and this is found by analysis of the software process timetable.

In order for this to make any sense, development of the system needs to be divided into sections, categorised as milestones. Having divided the system thus, a timetable can be created to estimate how long each milestone should take to attain, and how long the entire development process should take. The timetable observance can then be calculated by dividing the number of milestones that were completed on time by the total number of milestones. If any milestones were not completed on time, the average delay for milestone completion can be easily calculated.
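Both figures fall out of a simple tally over the milestone records; the data here is hypothetical:

```python
# Hypothetical milestone records: (completed on time?, delay in days)
milestones = [(True, 0), (True, 0), (False, 5), (False, 3), (True, 0)]

on_time = sum(1 for ok, _ in milestones if ok)
observance = on_time / len(milestones)

delays = [d for ok, d in milestones if not ok]
average_delay = sum(delays) / len(delays) if delays else 0

print(observance)     # 0.6 (60% of milestones met on time)
print(average_delay)  # 4.0 days average delay on late milestones
```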

While the classification of what constitutes a milestone can be somewhat subjective, it is reasonable to say that the end of each phase of the project lifecycle can be considered a milestone, as can the end of each iteration of an iterative methodology. Whether or not any other milestones are considered is usually up to the project manager.

Finally

Measurement is a very powerful tool, and essential in quantifying anything. In defining a measurement we simply need to comply with four simple rules, and this tool becomes available wherever it is needed. Through measuring we strive to gain an understanding of our world and everything it contains. It is this understanding that allows us to improve software quality in areas that might not have been distinguishable without it.
