Software Project Cost Estimating

University of Alaska at Anchorage
Russell Frith
4/12/2011

Abstract

Software engineering cost estimation is the process of predicting the effort required to develop a software system. Cost estimation techniques involve distinctive steps, tools, algorithms, and assumptions. Many estimation models have been developed since the 1980s because of the dynamic nature of software engineering practices. Despite the evolution of new cost estimation techniques, fundamental economic principles underlie the overall structure of the software engineering life cycle and its primary refinements of prototyping, incremental development, and advancement. This paper provides a general overview of software cost estimation methods, including recent advances in the field. Many of the models rely on a software project size estimate as input, and this paper provides details for common size metrics. The primary economic driver of the software life-cycle structure is the significantly increasing cost of making software changes or fixing software problems as a function of the development phase in which the change or fix is made. Software cost estimation models are classified into two major categories: algorithmic and non-algorithmic. Each has its own strengths and weaknesses with regard to implementing modifications to software projects. A key factor in selecting a cost estimation model is the accuracy of its estimates, which can be highly problematic.

1. Introduction

In recent years, software has become the most expensive component of computer system projects. The cost of software development lies mostly in human effort, so most estimation methods focus on this aspect and give estimates in terms of person-months. If one considers economics as the study of how people make decisions in resource-limited situations, then macroeconomics is the branch that studies such decisions on a national or global scale. Macroeconomic decisions are influenced by tax rates, interest rates, foreign policy, and trade policy. Microeconomics, by contrast, studies how people make decisions in resource-limited situations on a more personal scale; it treats decisions that individuals and organizations make on such issues as how much insurance to buy, which word-processing or software development systems to procure, or what prices to charge for products and services.

Software engineering is an exercise in microeconomics in that it deals with limited resources. There is never enough time or money to include all the essential features software vendors would like to put into their products. Even with cheap hardware, storage, memory, and networks, software projects must always operate within limited computing and network resources. Consequently, accurate software cost estimates are critical to both developers and customers. Those estimates can be used for generating requests for proposals, contract negotiations, scheduling, monitoring, and control. Underestimating software engineering costs could result in management approving proposed systems that exceed budget allocations, in underdeveloped functions with poor quality, or in a failure to complete a project on time. Conversely, overestimating costs may result in too many resources being committed to a project or, during contract bidding, in losing a contract and the associated jobs.

Accurate cost estimation is important because:

It can help to classify and prioritize development projects with respect to an overall business plan,
It can be used to assess the impact of changes and support replanning,
It can be used to determine what resources to commit to the project and how well those resources will be used,
Projects can be easier to manage and control when resources are better matched to real needs, and
Customers expect actual development costs to be in line with estimated costs.

A software cost estimate typically comprises three fundamental estimates: effort in person-months, project duration, and cost. Most cost estimation models attempt to generate an effort estimate, which is then converted into a project duration timeline and cost. Although the relation between effort and cost may be non-linear, effort is measured in person-months of programmers, analysts, and project managers, so an effort estimate can be converted to a dollar cost figure by calculating the average salary per unit time of the staff involved and multiplying that number by the estimated effort.
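The effort-to-cost conversion described above can be sketched as follows. This is a minimal illustration that assumes a single blended average monthly salary for all staff; the figures are hypothetical, and a real estimate would typically use role-specific rates plus overhead and non-labor costs.

```python
def effort_to_cost(effort_person_months: float, avg_monthly_salary: float) -> float:
    """Convert an effort estimate (person-months) to a dollar cost.

    Assumes one blended average salary per person-month (a hypothetical
    simplification); overhead, role mix, and non-labor costs are ignored.
    """
    if effort_person_months < 0 or avg_monthly_salary < 0:
        raise ValueError("effort and salary must be non-negative")
    return effort_person_months * avg_monthly_salary


if __name__ == "__main__":
    # Hypothetical inputs: 24 person-months at $9,000 per person-month.
    print(effort_to_cost(24, 9_000))  # 216000
```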

In constructing a software cost estimate, three basic questions arise:

Which software cost estimation model should be used?
Which software size measurement should be used – lines of code (LOC), function points (FP), or feature points?
What is a good estimate?

A widely used cost estimation method is expert judgment. Using this technique, project managers rely on experience and prevailing industry norms as a basis for developing cost estimates. However, estimates based on expert judgment can be error-prone:

The approach is not repeatable and the means of deriving an estimate is subjective.
The pool of experienced estimators of new software projects is very small.
In general, the relationship between cost and system size is not linear. Costs tend to grow faster than linearly with size, which confines reliable expert judgment estimates to new projects whose anticipated sizes are similar to those of past projects.
Budget alterations by management aimed at avoiding cost overruns make experience and data from previous projects questionable.
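To illustrate why extrapolating from past projects is risky when costs grow faster than size, the sketch below assumes a power-law effort model E = a·S^b with b > 1. The coefficients a = 3.6 and b = 1.20 are the Basic COCOMO embedded-mode values, used here purely for illustration; they are not calibrated to any particular organization.

```python
def power_law_effort(ksloc: float, a: float = 3.6, b: float = 1.20) -> float:
    """Effort in person-months for a size in thousands of source lines.

    The default a and b are Basic COCOMO embedded-mode coefficients,
    used only to illustrate a superlinear (b > 1) size-effort relation.
    """
    return a * ksloc ** b


if __name__ == "__main__":
    small = power_law_effort(10)
    large = power_law_effort(100)  # ten times the size
    # With b = 1.20 the effort ratio is 10 ** 1.2 (roughly 15.8), not 10.
    print(f"10 KSLOC: {small:.1f} PM, 100 KSLOC: {large:.1f} PM, "
          f"ratio {large / small:.1f}")
```

An estimator who scales a 10 KSLOC project linearly to 100 KSLOC would, under this assumed model, underestimate effort by roughly a third.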

There are alternatives to expert judgment; some are theoretical and of limited use, while others have more pragmatic value. They are presented in the software engineering cost estimation section.

In the last four decades, many quantitative software cost estimation models have been developed, ranging from empirical models such as Boehm’s COCOMO models [4] to analytical models such as those in [7, 22, 23]. An empirical model uses data from previous projects to evaluate the current project, deriving its basic formulae from analysis of the particular database available. Analytical models, by contrast, use formulae based on global assumptions, such as the rate at which developers solve problems and the number of problems available.
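How an empirical model derives its formula from a project database can be sketched by fitting E = a·S^b with ordinary least squares on log-transformed data. The power-law form is an assumption chosen for illustration, and the project records below are hypothetical, generated from a = 3.0 and b = 1.10 so that the fit can be checked; a real calibration would use an organization's own project history.

```python
import math


def fit_power_law(sizes, efforts):
    """Fit effort = a * size ** b by least squares on ln(effort) vs ln(size).

    Returns (a, b). Requires at least two distinct positive sizes.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in efforts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


if __name__ == "__main__":
    # Hypothetical database: sizes in KSLOC, efforts in person-months,
    # generated exactly from a = 3.0 and b = 1.10 for demonstration.
    sizes = [5, 12, 30, 60, 110]
    efforts = [3.0 * s ** 1.10 for s in sizes]
    a, b = fit_power_law(sizes, efforts)
    print(f"a = {a:.2f}, b = {b:.2f}")  # a = 3.00, b = 1.10
```

With real, noisy data the fitted coefficients would carry uncertainty, which is one reason the properties listed below stress a database of relevant project experience.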

A well-constructed software cost estimate should have the following properties [24]:

It is conceived and supported by the project manager and the development team.
It is accepted by all stakeholders as realizable.
It is based on a well-defined software cost model with a credible basis.
It is based on a database of relevant project experience (similar processes, similar technologies, similar environments, similar people and similar requirements).
It is defined in enough detail so that its key risk areas are understood and the probability of success is objectively assessed.

Hindrances to developing a reliable software engineering cost estimate include the following:

Lack of an historical database of cost measurement,
The many interrelated factors in software development that affect effort and productivity, and whose relationships are not well understood,
Lack of trained estimators with the necessary expertise, and
Little penalty is often associated with a poor estimate.

2. Process of Software Engineering Estimation

Throughout the software life cycle, there are many decision situations involving limited resources in which software engineering techniques provide useful assistance. See Figure II in the appendix for elements of a computer programming project cycle. To provide a feel for the nature of these economic decision issues, an example is given below for each of the major phases in the software life cycle. In addition, refer to Figure III in the appendix for the loopback nature of computer programming process steps.

Feasibility Phase: How much should one invest in information system analyses (user questionnaires and interviews, current-system analysis, workload characterizations, simulations, scenarios, prototypes) in order to obtain convergence on an appropriate definition and concept of operation for the system to be implemented?
Plans and Requirements Phase: How rigorously should requirements be specified? How much should be invested in requirements validation activities (automated completeness, consistency, traceability checks, analytic models, simulations, prototypes) before proceeding to design and develop a software system?
Product Design Phase: Should developers organize software to make it possible to use a complex piece of existing software which generally but not completely meets requirements?
Programming Phase: Given a choice among three data storage and retrieval schemes that are primarily execution-time-efficient, storage-efficient, and easy to modify, respectively, which of these should be implemented?
Integration and Test Phase: How much testing and formal verification should be performed on a product before releasing it to users?
Maintenance Phase: Given an extensive list of suggested product improvements, which ones should be implemented first?
Phaseout: Given an aging, hard-to-modify software product, should it be replaced with a new product, should it be restructured, or should it be left alone?

Software cost engineering estimation typically involves a top-down planning approach in which the cost estimate is used to derive a project plan. Typical steps in a planning process include:

The project manager develops a characterization of the overall functionality, size, process, environment, people, and quality required for the project.
A macro-level estimate of the total effort and schedule is developed using a software cost estimation model.
The project manager partitions the effort estimate into a top-level work breakdown structure. In addition, the schedule is partitioned into major milestone dates and a staffing profile is configured.
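The partitioning step can be sketched as allocating a macro-level effort estimate across top-level work breakdown structure elements. The element names and percentages below are hypothetical placeholders, not values from any published model; an organization would substitute its own historical distribution.

```python
def partition_effort(total_pm: float, shares: dict) -> dict:
    """Split a total effort estimate (person-months) by fractional shares.

    `shares` maps a hypothetical WBS element name to its fraction of the
    total; the fractions must sum to 1.
    """
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return {name: total_pm * frac for name, frac in shares.items()}


if __name__ == "__main__":
    # Hypothetical distribution of a 100 person-month estimate.
    wbs = partition_effort(100, {
        "requirements": 0.10,
        "design": 0.20,
        "code and unit test": 0.40,
        "integration and test": 0.30,
    })
    for element, pm in wbs.items():
        print(f"{element}: {pm:.1f} PM")
```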

The actual cost estimation process involves seven steps [4]:

establish cost-estimating objectives;
generate a project plan for required data and resources;
pin down software requirements;
work out as much detail about the software system as feasible;
use several independent cost estimation techniques to capitalize on their combined strengths;
compare different estimates and iterate the estimation process; and
once the project has started, monitor its actual cost and progress, and feed the results back to project management.

Regardless of which estimation model is selected, consumers of the model must pay attention to the following to get the best results:

Since some models generate effort estimates for the full software life cycle while others exclude effort for the requirements stage, checking the coverage of an estimate is essential.
Model calibration and assumptions should be decided beforehand.
The sensitivity of the estimates to different model parameters should be analyzed.
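A simple one-at-a-time sensitivity analysis can be sketched by raising each model parameter by a fixed percentage and recording the change in the estimate. The model form E = a·S^b·M (with a single effort multiplier M) and the baseline values are assumptions chosen for illustration, not a specific published model.

```python
def effort(size_ksloc: float, a: float, b: float, multiplier: float) -> float:
    """Hypothetical effort model: a * size**b * multiplier (person-months)."""
    return a * size_ksloc ** b * multiplier


def sensitivity(baseline: dict, delta: float = 0.10) -> dict:
    """Percent change in effort when each parameter alone is raised by `delta`."""
    base = effort(**baseline)
    result = {}
    for name, value in baseline.items():
        perturbed = dict(baseline, **{name: value * (1 + delta)})
        result[name] = 100 * (effort(**perturbed) - base) / base
    return result


if __name__ == "__main__":
    # Hypothetical baseline values for illustration only.
    baseline = {"size_ksloc": 50, "a": 3.0, "b": 1.12, "multiplier": 1.0}
    for name, pct in sensitivity(baseline).items():
        print(f"+10% {name}: {pct:+.1f}% effort")
```

In this example a 10% change in the exponent b moves the estimate far more (about 55%) than a 10% change in size (about 11%), which shows why calibration of exponents deserves particular care.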

The microeconomics field provides a number of techniques for dealing with software life-cycle decision issues such as the ones mentioned earlier in this section. Standard optimization techniques can be used when one can find a single quantity, such as rupees or dollars, to serve as a “universal solvent” into which all decision variables can be converted. Or, if nonmonetary objectives can be expressed as constraints (system availability must be 98%, throughput must be 150 transactions per second), then standard constrained optimization techniques can be used. If cash flows occur at different times, then present-value techniques can be used to normalize them to a common point in time.
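The present-value normalization mentioned above can be sketched by discounting each cash flow by (1 + r)^t. The discount rate and cash flows below are hypothetical, and end-of-period discounting with one period per year is assumed.

```python
def present_value(cash_flows, rate: float) -> float:
    """Discount a sequence of cash flows to time 0.

    cash_flows[t] is the amount received (positive) or spent (negative)
    at the end of period t, with t = 0 meaning "now" (undiscounted).
    """
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


if __name__ == "__main__":
    # Hypothetical: spend $100K now, then receive $40K per year for 3 years.
    flows = [-100_000, 40_000, 40_000, 40_000]
    print(round(present_value(flows, 0.08), 2))
```

A positive result means the option is worth more than it costs once the timing of the cash flows is accounted for at the assumed rate.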

Inherent in the process of software engineering estimation is the use of software engineering economics analysis techniques. One such technique compares costs and benefits. An example involves provisioning a cell phone service for which there are two options.

Option A: Accept an available operating system that requires $80K in software costs, but will achieve a peak performance of 120 transactions per second using five $10K minicomputer processors, because of high multiprocessor overhead factors.
Option B: Build a new operating system that would be more efficient and would support a higher peak throughput, but would require $180K in software costs.

In general, software engineering decision problems are even more complex than the choice between Options A and B suggests; such options will differ on several important criteria, such as robustness, ease of tuning, ease of change, and functional capability. If these criteria are quantifiable, then a figure of merit can be defined to support a comparative analysis of one option over another. If some of the criteria are unquantifiable (user goodwill, programmer morale, etc.), then techniques for comparing unquantifiable criteria need to be used.
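When the criteria are quantifiable, one common figure of merit is a weighted sum of criterion scores. The weighted-sum form is one choice among several, and the weights and 0-10 scores for Options A and B below are entirely hypothetical; they only show the mechanics.

```python
def figure_of_merit(scores: dict, weights: dict) -> float:
    """Weighted sum of criterion scores; weights must sum to 1."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same criteria")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[c] * scores[c] for c in weights)


if __name__ == "__main__":
    # Hypothetical weights and 0-10 scores, for illustration only.
    weights = {"cost": 0.4, "throughput": 0.3, "ease of change": 0.2, "robustness": 0.1}
    option_a = {"cost": 8, "throughput": 5, "ease of change": 6, "robustness": 7}
    option_b = {"cost": 4, "throughput": 9, "ease of change": 7, "robustness": 5}
    for label, scores in (("A", option_a), ("B", option_b)):
        print(label, round(figure_of_merit(scores, weights), 2))
```

The ranking depends entirely on the chosen weights, so in practice the weights themselves should be agreed on by the stakeholders before the scores are compared.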

In software engineering, decision issues are generally complex and involve analyzing risk, uncertainty, and the value of information. The main economic analysis techniques available to resolve complex decisions include the following:

Techniques for decision making under complete uncertainty, such as the maximax rule, the maximin rule and the Laplace rule [19]. These techniques are generally inadequate for practical software engineering decisions.
Expected-value techniques, in which one estimates the probability of occurrence of each outcome (e.g., successful development of a new operating system) and computes the expected payoff of each option: EV = Prob(success)*Payoff(successful OS) + Prob(failure)*Payoff(unsuccessful OS). These techniques are better than decision making under complete uncertainty, but they still involve a great deal of risk if the actual Prob(failure) is considerably higher than its estimate.
Techniques in which one reduces uncertainty by buying information. For example, prototyping is a way of buying information to reduce uncertainty about the likely success or failure of a multiprocessor operating system; by developing a rapid prototype of its high-risk elements, one can get a clearer picture of the likelihood of successfully developing the full operating system.
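The first two families of techniques listed above can be sketched on a small payoff table. The options, states of nature, payoffs, and probabilities below are hypothetical. The maximax, maximin, and Laplace rules ignore probabilities, while the expected-value rule weights each payoff by an estimated probability, as in the EV formula above.

```python
def maximax(payoffs: dict) -> str:
    """Choose the option whose best-case payoff is highest (optimistic)."""
    return max(payoffs, key=lambda o: max(payoffs[o].values()))


def maximin(payoffs: dict) -> str:
    """Choose the option whose worst-case payoff is highest (pessimistic)."""
    return max(payoffs, key=lambda o: min(payoffs[o].values()))


def laplace(payoffs: dict) -> str:
    """Treat all states as equally likely and maximize the average payoff."""
    return max(payoffs, key=lambda o: sum(payoffs[o].values()) / len(payoffs[o]))


def expected_value(row: dict, probs: dict) -> float:
    """EV = sum over states of Prob(state) * Payoff(option, state)."""
    return sum(probs[s] * row[s] for s in row)


if __name__ == "__main__":
    # Hypothetical payoffs ($K) for two options under two states of nature.
    payoffs = {
        "A: use existing OS": {"OS succeeds": 50, "OS fails": 50},
        "B: build new OS": {"OS succeeds": 250, "OS fails": -100},
    }
    probs = {"OS succeeds": 0.3, "OS fails": 0.7}  # hypothetical estimates
    print("maximax:", maximax(payoffs))
    print("maximin:", maximin(payoffs))
    print("Laplace:", laplace(payoffs))
    for option, row in payoffs.items():
        print(option, "EV =", round(expected_value(row, probs), 2))
```

With these numbers, maximax and Laplace favor building the new OS, while maximin and the expected-value rule favor the existing one, which shows how much the choice of rule and the probability estimates drive the decision.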

Information-buying often tends to be the most valuable aid for software engineering decisions. The question of how much information-buying is enough can be answered via statistical decision-theoretic techniques using Bayes’ Law, which provides calculations for the expected payoff from a software project as a function of the level of investment in a prototype. In practice, the use of Bayes’ Law involves the estimation of a number of conditional probabilities that are not easy to estimate accurately. However, the Bayes’ Law approach can be translated into a number of value-of-information guidelines, or conditions under which it makes good sense to invest in more information before committing to a particular course of action.
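The Bayes’ Law calculation can be sketched for a go/no-go decision on building a new operating system. A prototype returns a “favorable” or “unfavorable” signal whose reliability is given by two conditional probabilities; the prior, the reliabilities, the payoffs, and the prototype cost below are all hypothetical. The value of the prototype is the expected payoff when deciding after seeing its result, minus the expected payoff when deciding now.

```python
def best_payoff(p_success: float, win: float, loss: float, fallback: float) -> float:
    """Payoff of the better choice: build (risky) or fall back (sure payoff)."""
    build = p_success * win + (1 - p_success) * loss
    return max(build, fallback)


def value_of_prototype(prior: float, p_fav_given_s: float, p_fav_given_f: float,
                       win: float, loss: float, fallback: float) -> float:
    """Expected gain from deciding after a prototype result (before its cost).

    Assumes 0 < P(favorable) < 1 so both posteriors are defined.
    """
    decide_now = best_payoff(prior, win, loss, fallback)
    # Total probability of a favorable prototype result.
    p_fav = p_fav_given_s * prior + p_fav_given_f * (1 - prior)
    # Bayes' Law: posterior probability of success given each result.
    post_fav = p_fav_given_s * prior / p_fav
    post_unfav = (1 - p_fav_given_s) * prior / (1 - p_fav)
    decide_later = (p_fav * best_payoff(post_fav, win, loss, fallback)
                    + (1 - p_fav) * best_payoff(post_unfav, win, loss, fallback))
    return decide_later - decide_now


if __name__ == "__main__":
    # Hypothetical inputs ($K): 30% prior chance the new OS succeeds; the
    # prototype is favorable 90% of the time on success, 20% on failure.
    gain = value_of_prototype(prior=0.3, p_fav_given_s=0.9, p_fav_given_f=0.2,
                              win=250, loss=-100, fallback=50)
    prototype_cost = 10  # hypothetical
    print(f"value of information: {gain:.1f}K, net of cost: {gain - prototype_cost:.1f}K")
```

If the prototype's signal is no better than chance (p_fav_given_s equal to p_fav_given_f), the posteriors equal the prior and the value drops to zero, which mirrors Condition 3 in the list that follows.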

Condition 1: There exist attractive alternatives whose payoffs vary greatly depending on some critical states of nature. If not, engineers can commit themselves to one of the attractive alternatives with no risk of significant loss.

Condition 2: The critical states of nature have an appreciable probability of occurring. If not, engineers can again commit without major risk. For situations with extremely high variations in payoff, the appreciable probability level is lower than in situations with smaller variations in payoff.

Condition 3: The investigations have a high probability of accurately identifying the occurrence of the critical states of nature. If not, the investigations will not do much to reduce the risk of loss incurred by making the wrong decision.

Condition 4: The required cost and schedule of the investigations do not overly curtail their net value. It does one little good to obtain results which cost more than those results can save for us, or which arrive too late to help make a decision.

Condition 5: There exist significant side benefits derived from performing the investigations. Again, one may be able to justify an investigation solely on the basis of its value in training, team-building, customer relations, or design validation.

3. Software Engineering Project Sizing

During the 1950s and the 1960s, relatively little progress was made in software cost estimation, while the frequency and magnitude of software cost overruns were becoming critical to many large systems employing computers. In 1964, the U.S. Air Force contracted with System Development Corporation for a landmark project in software cost estimation. The project collected 104 attributes of 169 software projects and subjected them to extensive statistical analysis. One result was the 1965 SDC cost model, which was the best possible statistical 13-parameter linear estimation model for the sample data:

$$
\begin{aligned}
\text{MM (man-months)} = {} & -33.63 + 9.15\,(\text{Lack of Requirements}) + 10.73\,(\text{Stability of Design}) \\
& + 0.51\,(\text{Percent Math Instructions}) + 0.46\,(\text{Percent Storage/Retrieval Instructions}) \\
& + 0.40\,(\text{Number of Subprograms}) + 7.28\,(\text{Programming Language}) \\
& - 21.45\,(\text{Business Application}) + 13.53\,(\text{Stand-Alone Program}) \\
& + 12.35\,(\text{First Program on Computer}) + 58.82\,(\text{Concurrent Hardware Development}) \\
& + 30.61\,(\text{Random Access Device Used}) + 29.55\,(\text{Difference Host, Target Hardware}) \\
& + 0.54\,(\text{Number of Personnel Trips}) - 25.20\,(\text{Developed by Military Organization})
\end{aligned}
$$

When applied to its database of 169 projects, this model [5] produced a mean estimate of 40 MM and a standard deviation of 62 MM; it is not a very accurate predictor. The model is also counterintuitive: a project with all zero values for the variables is estimated at -33 MM, and changing the language from a higher-order language to assembly adds 7 MM, independent of project size. One can conclude that there were too many nonlinear aspects of software development for a linear cost-estimation model to work.
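The counterintuitive behavior noted above can be checked directly by coding the SDC equation with the coefficients given in the equation above. The variable names are shortened for readability, unspecified variables default to zero, and no attempt is made here to encode the original rating scales of each variable.

```python
# Coefficients of the 1965 SDC linear model, as given in the equation above.
SDC_INTERCEPT = -33.63
SDC_COEFFICIENTS = {
    "lack_of_requirements": 9.15,
    "stability_of_design": 10.73,
    "percent_math_instructions": 0.51,
    "percent_storage_retrieval_instructions": 0.46,
    "number_of_subprograms": 0.40,
    "programming_language": 7.28,
    "business_application": -21.45,
    "stand_alone_program": 13.53,
    "first_program_on_computer": 12.35,
    "concurrent_hardware_development": 58.82,
    "random_access_device_used": 30.61,
    "difference_host_target_hardware": 29.55,
    "number_of_personnel_trips": 0.54,
    "developed_by_military_organization": -25.20,
}


def sdc_man_months(**inputs: float) -> float:
    """Evaluate the SDC model; unspecified variables default to zero."""
    unknown = set(inputs) - set(SDC_COEFFICIENTS)
    if unknown:
        raise ValueError(f"unknown variables: {sorted(unknown)}")
    return SDC_INTERCEPT + sum(
        coef * inputs.get(name, 0.0) for name, coef in SDC_COEFFICIENTS.items()
    )


if __name__ == "__main__":
    print(sdc_man_months())  # all variables zero: a negative estimate
    # Switching the language variable adds a fixed 7.28 MM regardless of size.
    print(sdc_man_months(programming_language=1) - sdc_man_months())
```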

Today, software size is the most important factor affecting software cost. Five fundamental software size metrics are used in practice, the two most common being the “Lines of Code” and “Function Point” metrics. The Lines of Code (LOC) metric [9] is the number of lines of delivered source code for the software, and it is programming-language dependent. Most models relate this measurement to software cost, but the exact LOC can only be obtained after the project has been completed, so size must itself be estimated beforehand. This makes estimating project costs substantially more difficult.
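Because LOC counts depend on the counting convention as well as the language, a sketch of one simple convention may help: count non-blank lines that are not comment-only. This convention and the `#` comment prefix are assumptions for illustration; block comments, continuation lines, and generated code, which real counting standards must address, are ignored here.

```python
def count_loc(source: str, comment_prefix: str = "#") -> int:
    """Count non-blank lines that are not comment-only (one simple convention)."""
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(comment_prefix):
            count += 1
    return count


if __name__ == "__main__":
    sample = """
# compute a total
total = 0

for x in range(10):
    total += x  # accumulate
print(total)
"""
    print(count_loc(sample))  # 4
```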

One method for estimating code size is to use expert judgment together with a technique called PERT (Program Evaluation and Review Technique) [2]. The model is based on three possible code sizes: $S_l$, the lowest possible size; $S_h$, the highest possible size; and $S_m$, the most likely size. An estimate of the code size $S$ may be computed as

$$S = \frac{S_l + 4S_m + S_h}{6}.$$

This formula is applied to individual modular code components, and the resulting estimates can be summed across components to obtain the total size.
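The PERT size estimate can be sketched per component from its lowest, most likely, and highest sizes, then summed. The component names and size ranges are hypothetical. The per-component standard deviation (highest minus lowest, divided by 6) is the usual PERT companion formula rather than something stated in the text above, and combining component deviations as a root-sum-of-squares assumes the components' size errors are independent.

```python
import math


def pert_size(low: float, likely: float, high: float) -> tuple:
    """Return (expected size, standard deviation) using the PERT beta approximation."""
    if not low <= likely <= high:
        raise ValueError("require low <= likely <= high")
    expected = (low + 4 * likely + high) / 6
    std_dev = (high - low) / 6
    return expected, std_dev


if __name__ == "__main__":
    # Hypothetical components: (lowest, most likely, highest) size in LOC.
    components = {
        "user interface": (2_000, 3_000, 6_000),
        "database layer": (1_500, 2_000, 2_500),
        "report generator": (800, 1_200, 2_200),
    }
    total = 0.0
    variance = 0.0
    for name, (low, likely, high) in components.items():
        expected, sd = pert_size(low, likely, high)
        total += expected
        variance += sd ** 2  # assumes independent component estimates
        print(f"{name}: {expected:.0f} LOC")
    print(f"total: {total:.0f} LOC (sd about {math.sqrt(variance):.0f})")
```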