INTEGRATION OF SOFTWARE COST ESTIMATES ACROSS COCOMO, SEER-SEM, AND PRICE-S MODELS
W. Thomas Harwick, Engineering Specialist
Northrop Grumman Corporation, Integrated Systems
El Segundo, California
October 18, 2004
Abstract
Some of the classical challenges in using a single software cost model include determining software project size, normalizing size to equivalent new source lines of code, selecting appropriate environmental factors, and calculating and communicating the cost results to the software customer.
When different software cost models are used by different suppliers, understanding how one model translates to another complicates the task of integrating the software estimates and communicating them to the buyer.
This paper will attempt to make model integration easier by showing the basic economic properties for each of these models:
- COCOMO
- SEER-SEM
- PRICE-S.
Economic properties to be explored will include economies of scale, productivity (cost) drivers, process maturity, integration complexity, and schedule compression. The productivity (cost) drivers to be examined include staff, product complexity, market, development tools, process maturity, and schedule compression. The method used will be the “ceteris paribus” tool from the field of economics. This is a way to examine each important parameter one at a time, while holding the other parameters constant about a specified baseline.
The results of the “ceteris paribus” analysis will be used to show the differences between the software cost models. (The SEER-SEM and PRICE-S models are not as open as COCOMO.)
A notional example (baseline) will be generated that includes sensor software using SEER-SEM, air vehicle software using PRICE-S, and mission control station software using COCOMO. A project cost risk result will be integrated via a summary model that captures normalized sizing data and standardized productivity data, as determined from each model’s economic properties. A consistency check will also be performed across these three models using COCOMO.
Lastly, this paper will also endeavor to address the impact of the Unified Modeling Language (UML) and Object Oriented Analysis (OOA) on software design productivity.
Introduction
Integrators of software development activity face a challenge when integrating software cost estimates that often come from the different software cost models used by major subcontractors. This paper will attempt to make model integration easier.
This will be accomplished through the following steps. First, each of the cost drivers from all three models will be classified. Second, the economic properties of each of the software development cost models will be shown. Third, a baseline example will be developed for a notional air vehicle, sensor manager, and control station. Lastly, the cost models will be integrated using the model sizing definitions and economic properties. The cost/risk will be calculated using the Monte Carlo risk simulation method with the sizing probability distribution and the productivity probability distribution.
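As a preview of that last step, the sketch below shows the general shape of such a Monte Carlo cost-risk simulation. It is a minimal, notional illustration in Python: the triangular size and productivity distributions and their low/mode/high parameters are assumptions chosen for the example, not values drawn from the models or programs discussed in this paper.

    import random

    # Notional Monte Carlo cost-risk sketch: each trial draws a size and a
    # productivity from (assumed) triangular distributions and multiplies
    # them into an hours estimate; percentiles summarize the cost risk.
    TRIALS = 10_000
    results = []
    for _ in range(TRIALS):
        size_sloc = random.triangular(40_000, 65_000, 50_000)    # low, high, mode
        hours_per_sloc = random.triangular(1.5, 4.0, 2.3)        # low, high, mode
        results.append(size_sloc * hours_per_sloc)

    results.sort()
    print("50th percentile hours:", round(results[int(0.50 * TRIALS)]))
    print("80th percentile hours:", round(results[int(0.80 * TRIALS)]))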
Overview of the Software Cost Models
A first step is the identification of independent variables that “explain” changes in cost. These will be called “cost drivers” throughout this paper. In mathematical form:
Cost = f(x1,x2,x3,...).
Each of the “x” terms is called a cost driver. The underlying factors that explain the cost drivers themselves are called “root cost drivers.” In terms of the models, root cost drivers form the basis for the knob settings used in selecting the cost driver values.
It is assumed that the reader has access to the detailed cost driver definitions for each of these models. [1] A full list of cost drivers can be seen by reviewing the COCOMO II, SEER-SEM, and PRICE-S cost models. A partial list, grouped by category, is included below.
Team & Process / COCOMO II / SEER-SEM / PRICE-S
Team / ACAP, PCAP, APEX, PLEX, LTEX / Analyst Capabilities; Analyst Experience; Programmer Capabilities; Programmers Language Experience / INTEGI
Process / TOOL, PCON, PVOL / Development Method - KnowledgeBase; Practices and Methods Experience / CPLX1
Market & Customer / COCOMO II / SEER-SEM / PRICE-S
Market / RUSE, SITE, SECU * / Requirements Definition Formality; Development Method, Application, and Acquisition Method - KnowledgeBases; Multiple Site Development / CPLXM, CPLX1
Schedule / SCED / Required Schedule; Start Date / DSTART, PEND
Reliability / RELY
Certification Requirements / DOCU / Development Standard - KnowledgeBase / Standard
Complexity / COCOMO II / SEER-SEM / PRICE-S
Complexity / CPLX, DATA, STOR / Application - KnowledgeBase / UTIL, APPL
Operating Environment & Technology / TIME / Platform; Development Method - KnowledgeBase / PLTFM
Sizing & Exponent / COCOMO II / SEER-SEM / PRICE-S
Size / Raw KSLOC / New Lines of Code; Pre-existing SLOC / Raw SLOC
Normalized size / Equivalent new KSLOC / New Lines of Code / NEWD, NEWC, SLOC
Exponent / PREC, FLEX, RESL, TEAM, PMAT (or SEI rating) / - / -
Scope / COCOMO II / SEER-SEM / PRICE-S
Hardware dev. in parallel with software / Not in the default model / Hardware Integration Level / CPLX2
System Integration / Not in the default model / Programs Currently Integrating / INTEGE
Table 1: Cost Drivers by Software Model
Model Sensitivity and Baseline Productivity Range
A Computer Software Component (CSC) of 50,000 source lines of code (SLOC) is used to iterate each of the models through its major cost drivers for the sensitivity analysis.
Each of the models is analyzed using the Ceteris Paribus method practiced in economics. The Ceteris Paribus method involves varying the cost driver under consideration and noting the change in the cost estimate, while holding all other cost drivers constant. The cost sensitivities are recorded for each major cost driver. [2] This yields the data to plot the ranked histograms shown below. A mathematically oriented person can think of this as taking a “partial derivative.”
Next, we take the top several cost drivers and their derived ranges (local variation from the ceteris paribus baseline), and calculate the hours/SLOC range about the model baseline. This yields high productivity, baseline productivity, and low productivity hours/SLOC scenarios for each model.
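To make the procedure concrete, the sketch below applies the ceteris paribus idea to a generic multiplicative cost model: one driver at a time is moved to its low and high settings while the others are held at the baseline. The driver names, multiplier ranges, and the 225 person-month nominal effort are hypothetical placeholders, not settings from any of the three models.

    # Ceteris paribus sensitivity sketch for a generic multiplicative model.
    # Driver names and multiplier ranges below are hypothetical placeholders.
    BASELINE = {"staff": 1.00, "complexity": 1.34, "continuity": 1.00}
    LOCAL_RANGE = {"staff": (0.75, 1.35),
                   "complexity": (1.17, 1.74),
                   "continuity": (0.90, 1.12)}

    def effort(drivers, nominal_pm=225.0):
        pm = nominal_pm
        for value in drivers.values():
            pm *= value                     # multiplicative cost drivers
        return pm

    base = effort(BASELINE)
    for name, (low, high) in LOCAL_RANGE.items():
        trial = dict(BASELINE)
        trial[name] = low
        best = effort(trial)                # all other drivers held constant
        trial[name] = high
        worst = effort(trial)
        print(f"{name}: {best / base:.2f}x to {worst / base:.2f}x of baseline effort")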
Figure 1: Local Cost Driver Sensitivity - COCOMO II Model
The COCOMO baseline cost driver settings are listed next to the bar plots. The local sensitivity ranges are shown in the chart above. For the COCOMO “local” sensitivity analysis, the settings were moved +/- one setting from the COCOMO baseline. Thus, the COCOMO settings will differ somewhat from the (global) “Software productivity range” shown on the cover of Boehm’s “Software Cost Estimation with COCOMO II”.
These Environmental Factors were derived about the “Baseline” settings listed in the left-hand column. The impact factors are +/- one COCOMO setting. [3]
In COCOMO, the Environmental Factors are multipliers.
The COCOMO hours/SLOC data is summarized in the table below.
The major Environmental Factor cost drivers, in this example, are:
- Staff Capability (ACAP + PCAP) = 1.94
- Complexity (CPLX) = 1.90
- Personnel Continuity (PCON) = 1.68. [4]
Cost Driver (group) for COCOMO II / High Productivity / Baseline Productivity / Low Productivity
Staff Capability / ACAP=”VHigh”, PCAP=”High” / ACAP=”High”, PCAP=”Nom” / ACAP=”Nom”, PCAP=”Low”
Complexity / CPLX=”High” / CPLX=”VHigh” / CPLX=”XHigh”
Personnel Continuity / ”High” / ”Nom” / ”Low”
EAF / 1.95 / 3.32 / 5.79
Person Months (excludes security, HW & SW integration, and system integration) / 439 / 748 / 1,305
Hours/SLOC / 1.33 / 2.27 / 3.97
Table 2: Hours per Source Lines of Code - COCOMO
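For reference, the hours/SLOC row follows from the person-month row assuming the COCOMO II convention of 152 labor hours per person-month: 748 PM × 152 hours/PM ÷ 50,000 SLOC ≈ 2.27 hours/SLOC, and likewise 439 PM → 1.33 and 1,305 PM → 3.97.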
The Effort Adjustment Factor (EAF) in COCOMO II is the multiplicative result of the 17 Environmental Factor (effort multiplier) cost drivers, and it scales the nominal effort. In symbols:
EAF = EM1 × EM2 × ... × EM17
Effort (person-months) = A × (KSLOC)^E × EAF
where A is a calibration constant and E is the scale-factor exponent discussed later under economies of scale.
SEER-SEM Sensitivity analysis:
For the SEER-SEM “local” sensitivity analysis, the settings were moved +/- two settings from the SEER-SEM baseline. (There are about 9 settings available in SEER-SEM, versus only 5 discrete settings available in COCOMO.)
Figure 2: Local Cost Driver Sensitivity – SEER-SEM Model
The major SEER-SEM cost drivers, in this example, are:
- Security Requirements = 1.75
- Staff Capability (Analyst & Programmer) = 1.47
- Volatility (Requirements, Modern Practices) = 1.32.
The hours/SLOC data for SEER-SEM is summarized in the table below:
Cost Driver (group) for SEER-SEM / High Productivity / Baseline Productivity / Low Productivity
Security Requirements / “Nom-” / “Nom+” / “Hi”
Staff Capability / “Hi” / “Nom+” / “Nom”
Volatility / “Nom+” / “Hi” / “VH-”
Hours / 75,200 / 103,900 / 182,700
Hours/SLOC / 1.50 / 2.08 / 3.65
Table 3: Hours per Source Lines of Code – SEER-SEM
PRICE-S Sensitivity analysis:
For the PRICE-S “local” sensitivity analysis, the settings were moved +/- one setting from the PRICE-S baseline.
Figure 3: Local Cost Driver Sensitivity – PRICE-S Model
The major PRICE-S cost drivers, in this example, are:
- Operating Environment (PLTFM) and Productivity (PROFAC) = 1.75
- Staff Capability (INTEGI, INTEGE - crew; team familiarity with product line (CPLX1)) = 1.55
- Complexity (INTEGI, INTEGE integration complexity).
Cost Driver (group) for PRICE-S / High Productivity / Baseline Productivity / Low Productivity
Operating Environment / PLTFM=1.7, PROFAC=7.0 / PLTFM=1.8, PROFAC=6.5 / PLTFM=1.9, PROFAC=5.5
Staff Capability / INTEGI=0.5, CPLX1 (crew = -0.2) / INTEGI=0.7, CPLX1 (crew = -0.1) / INTEGI=1.0, CPLX1 (crew = 0.0)
Hours / 141,056 / 220,400 / 330,904
Hours/SLOC / 2.82 / 4.41 / 6.62
Table 4: Hours per Source Lines of Code – PRICE-S
The hours/SLOC data for PRICE-S is summarized in the table above.
Inter-Model Comparison of Cost Drivers
The inter-model comparison will include both sizing and the other cost drivers. We start with the sizing and scope question, beginning with a brief introduction to sizing for each of the models.
Software Sizing by Model:
The COCOMO II model uses equivalent new source lines of code (SLOC) for sizing the software effort. Re-used code is reduced to equivalent new code by determining the ratio of effort relative to a new line of code for design, code and unit test, and integration and test phases. These relative efforts are multiplied by the respective phase weights, and then summed, to yield an equivalent new SLOC count for the reused code.
COCOMO II adds two further parameters that apply to re-used code: Software Understanding (SU) and Assessment and Assimilation (AA). [5] The resulting equivalent new re-used code is added to the actual new code to yield the total equivalent new SLOC. There are also special development cases that are handled differently, such as auto-generated code. [6]
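To make the phase-weighted conversion concrete, the sketch below follows the published COCOMO II reuse model as the author understands it; the adapted-code size and the DM/CM/IM/AA/SU/UNFM values in the example are purely illustrative.

    # Sketch of the COCOMO II equivalent-new-SLOC conversion for reused code.
    # DM, CM, IM = percent of design, code, and integration modified;
    # AA = Assessment & Assimilation, SU = Software Understanding,
    # UNFM = programmer unfamiliarity.  Example values are illustrative only.
    def equivalent_sloc(adapted_sloc, dm, cm, im, aa=4, su=30, unfm=0.4):
        aaf = 0.4 * dm + 0.3 * cm + 0.3 * im     # phase-weighted adaptation factor
        if aaf <= 50:
            aam = (aa + aaf * (1 + 0.02 * su * unfm)) / 100.0
        else:
            aam = (aa + aaf + su * unfm) / 100.0
        return adapted_sloc * aam

    # 20,000 reused SLOC with 20% redesign, 30% recode, 40% re-integration
    # comes out to roughly 8,000 equivalent new SLOC under these assumptions.
    print(round(equivalent_sloc(20_000, dm=20, cm=30, im=40)))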
The SEER-SEM model has three major categories for determining the overall SLOC count: (1) New Lines of Code, (2) Pre-existing lines of code (not designed for reuse), (3) Pre-exists, Designed-for-reuse.
It appears from using the model that the calculation for the “Pre-existing” lines (not designed for reuse) is basically a linear (multiplicative and additive) combination of the Redesign, Re-implementation, and Retest effort ratios, each multiplied by its respective phase weight. The result is then multiplied by the Pre-existing SLOC, yielding the equivalent size of the pre-existing code that was not designed for reuse.
The “Pre-exists, Designed-for-reuse” category would be calculated in a similar fashion: the phase ratios are multiplied by the respective phase weights, and the result is then multiplied by the Pre-existing SLOC that was designed for reuse.
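A comparable sketch of this SEER-SEM-style conversion for pre-existing code, as read from the model’s behavior above: the redesign, re-implementation, and retest ratios are weighted by phase weights and applied to the pre-existing SLOC. The 0.4/0.3/0.3 weights and the example percentages are placeholders, not SEER-SEM calibration values.

    # Sketch of a phase-weighted conversion for pre-existing code (SEER-SEM style).
    # The weights and example percentages are placeholders, not SEER-SEM values.
    def effective_preexisting_sloc(preexisting_sloc, redesign, reimpl, retest,
                                   weights=(0.4, 0.3, 0.3)):
        w_design, w_impl, w_test = weights
        rework_fraction = w_design * redesign + w_impl * reimpl + w_test * retest
        return preexisting_sloc * rework_fraction

    # 30,000 pre-existing SLOC with 25% redesign, 15% re-implementation, and
    # 50% retest yields roughly 8,850 effective SLOC under these placeholders.
    print(round(effective_preexisting_sloc(30_000, 0.25, 0.15, 0.50)))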
The PRICE-S model includes ratios for the percent of the design that is new and the percent of the SLOC that is newly coded. There is also a “Language” parameter that normalizes SLOC size between languages and a “FRAC” factor that normalizes out comment lines.
Scope of the Models
It is also critical to understand the scope of the software cost estimate made by each model. We discuss the scope of each model briefly. The major point is to investigate the model differences when comparing estimates: what may seem to be large differences in productivity rates can be due to differences in the assumptions within each of the cost models.
COCOMO
There are four major phases in COCOMO I and COCOMO II. They retain the assumption of a “waterfall” software development approach. These four phases are:
- Plans and Requirements
- Product Design
- Programming
- Integration and test of the software product.
COCOMO does not include hardware and software integration. The author does not believe that it includes software developed in parallel with hardware; this view is based on the differences between the software project sizes cited in the COCOMO I database (prior to 1981) and the sizes of the larger software projects of today. [7]
The “Plans & Requirements” phase includes plans for required functions, interfaces, and performance. These requirements are used to define the capabilities of the software product.
The “Product Design” phase includes defining the hardware/software architecture, control structure and data structure for the product. Test plans are also created in this phase.
In the Programming phase, software components are created per the design and the interface and performance requirements.
For the Integration & Test phase, the software components are brought together to construct a properly functioning software product. The product is composed of loosely coupled modules. The requirements are used to determine the usefulness of the delivered software product.
The SEER-SEM and PRICE-S phase outputs were determined by running each model with the Baseline of 50,000 SLOC. The percentages are listed in the table below.
Phase (% of effort) / COCOMO I (Embedded Mode) [8] / COCOMO II [9] / SEER-SEM / PRICE-S
Plans & Requirements / 7 / 7 / 4 / 13
Product Design / 16 / 17 / 6 / 13
Programming (detailed design & code/unit test) / 53 / 58 / 60 / 30
Integration & Test / 25 / 25 / (part of Programming) / 22
HW and SW Integration / 0 / 0 / (part of System Integration) / 11
System Integration / 0 / 0 / 30 / 11
Table 5: Percent Phase Effort by Model
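As a notional illustration of how Table 5 can be used to compare scope: the SEER-SEM baseline of roughly 103,900 hours includes about 30 percent for system integration, which COCOMO excludes, so a COCOMO-comparable figure would be on the order of 103,900 × 0.70 ≈ 72,700 hours, or about 1.45 hours/SLOC. The exact adjustment depends on how each model allocates the excluded phases.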
Inter-Model Comparisons – Environmental Factors
The tables below show the approximate impact factors (ratios) that apply to each cost driver for the following categories:
- Team & Process
- Market
- Complexity
- Scope (between models)
Some additional cost drivers that are not present in the COCOMO model have been labeled scope (beyond the default model) parameters. These parameters include (1) hardware developed in parallel with software, and (2) system integration of large software code sections. Both of these factors are beyond the scope of COCOMO I and likely COCOMO II.
Most of the software projects shown in Boehm’s Software Engineering Economics are not large enough to include system integration across larger CSCs. (Since the author has not seen the database for COCOMO II, this observation may only hold for COCOMO I.)
Team & Process / COCOMO II / SEER-SEM / PRICE-S
Team / ACAP (1.41) / Analyst Capabilities (1.20) / INTEGI (crew) (1.58)
PCAP (1.31) / Programmer Capabilities (1.18) / PROFAC (1.35)
APEX (1.23) / Analyst Experience (1.21) / CPLX1 - Product familiarity (1.25)
PLEX (1.20)
LTEX (1.20) / Programmers Language Experience (1.05)
Process & Tools / PVOL (1.99) / Development Method - KnowledgeBase / CPLX1 (1.10)
PCON (1.24) / Process & Reqmts. Volatility (1.32) / CPLX1 - Reqmts. Volatility (1.13)
TOOL (1.21) / Practices and Methods Experience / CPLX1-Tools (1.23)
Table 6: Baseline - Team & Process Cost Factors
Market & Customer / COCOMO II / SEER-SEM / PRICE-S
Market / RUSE (1.31)
SITE (1.22) / - / CPLXM (1.2)
SECU (1.10) / Security Requirements (1.75)
Schedule / SCED (1.14) / Required Schedule; Start Date / DSTART, PEND
Reliability / RELY (1.15) / Requirements Definition Formality; (part of Platform, Application - Kbase) / (part of PLTFM)
Certification requirements / DOCU (1.22) / Development Standard - Kbase
Table 7: Baseline - Market & Customer, Safety Cost Factors
Complexity / COCOMO II / SEER-SEM / PRICE-S
Complexity / CPLX (1.30) / Application (Kbase) / APPL
DATA (1.28) / - / UTIL
STOR (1.05) / - / INTEGI (timing, coupling) (1.55)
Operating Environment & Technology / TIME (1.26) / Platform; Development Method - KnowledgeBase / PLTFM (1.25)
Table 8: Baseline – Complexity & Operating Environment Cost Factors
Scope between Models / COCOMO II / SEER-SEM / PRICE-S
Hardware developed in parallel w/ software / Not included - added (1.11) / Hardware Integration Level (1.09) / CPLX2 (1.20)
System Integration / Not included - added (1.12) / Programs Currently Integrating (1.04) / INTEGE (1.13)
Table 9: Cost Model Normalization Factors
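As a notional example of applying these normalization factors: to put the COCOMO baseline on roughly the same scope footing as the other two models, the COCOMO estimate would be multiplied by the added factors for hardware developed in parallel (1.11) and system integration (1.12), i.e., 748 PM × 1.11 × 1.12 ≈ 930 PM, or about 2.27 × 1.11 × 1.12 ≈ 2.82 hours/SLOC.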
Economic Properties of These Cost Models
Cost models can be evaluated for their economic properties. This allows for another method to compare cost model properties. The categories we shall use are (1) diseconomies/economies of scale, (2) impact of schedule (rate of development), (3) productivity associated with the non-exponent cost drivers.
Diseconomies/Economies of Scale (Size)
In this section, we ask what happens to cost as the size of the product is doubled or quadrupled. This addresses the development economics of the size of product or proposed product.
The exponent of the sizing parameter determines a model’s economies of scale. In software development, economies of scale have not existed until very recently; historically, there have only been diseconomies of scale, which means that the exponents are greater than one.
The COCOMO II model allows for the possibility of economies of scale if the 5 scale factors have a Very High or Extra High rating. [10] The COCOMO exponent has a value range from 0.91 to approximately 1.26. [11]
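As a worked illustration: with an exponent of 1.10, doubling a project from 50 KSLOC to 100 KSLOC increases effort by a factor of 2^1.10 ≈ 2.14, i.e., roughly 7 percent more effort per SLOC for each doubling in size. With an exponent of 0.95, the same doubling increases effort by only 2^0.95 ≈ 1.93, a modest economy of scale.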
Figure 4: Economies of Scale for Software Cost Models
Economies of scale (an exponent with respect to project size of less than one) may be possible. This would require preliminary and detailed design strategies that have a lot of code commonality. This could be done, in principle, by making use of parent-child classes (and corresponding objects) as well as the “inheritance” properties of classes (and corresponding objects). Some types of applications may lend themselves to economies of scale in design, and perhaps also in code implementation. It is noted that the Unified Modeling Language (UML) can be used to architect systems at the analysis step. UML can also be used to generate class skeletons and basic structure. It remains to be seen what the impact will be upon software design productivity and decreases in system design defects. [12], [13]