21st International Symposium on Military Operational Research September 2nd, 2004
An Introduction to the Development, Application, Uses and Limitations of
STAMPER: Systematic Task Analysis for Measuring Performance and Evaluating Risk
Eugenia Kalantzis
Operations Research Analyst
Director General Land Combat Development
Fort Frontenac, PO Box 17000 Station Forces
Kingston, Ontario, Canada K7K 7B4
(613) 541-5010 x 2469
Abstract
A central activity within the Canadian Army's combat development process is a series of seminar wargames used to evaluate experimental force structures and capabilities. In support of this activity, the Director General Land Combat Development (DGLCD) operational research team was tasked with providing analytical support and guidance to the seminar process and with developing a robust methodology to measure the performance of competing systems.
Working closely with the client, the team developed a methodology based on the evaluation of a system’s performance of a predetermined list of tasks. This methodology is referred to as STAMPER, or Systematic Task Analysis for Measuring Performance and Evaluating Risk. The methodology involved the development of a set of task lists, as well as software-supported tools for the measurement of performance and risk, and for the computation of overall measures of performance for each of the five operational functions of Command, Sense, Act, Shield and Sustain. These measures of performance are based on a simple model that, although useful in providing indications of general trends, must be interpreted with extreme caution.
This paper describes the steps involved in the development and application of the STAMPER process. In particular, it explains the crucial role of the client-analyst collaboration, it describes the simple models used to assign measures of risk and to calculate overall performance measures, and it lists the strengths and limitations of these models along with examples of possible misuse and misinterpretation of the results.
BACKGROUND
In June 2003, the Director of Army Doctrine (DAD) was tasked with the development of a force employment concept for the Interim Army. In support of this initiative, DAD directed the design and execution of a series of wargames to assess the evolving concepts. Warfighting was selected as the focus of the initial seminar wargames; however, in due course, the seminar series will expand to include peace support, non-combatant evacuation and emergency domestic operations.
The first seminar wargame, Force Employment Wargame 0401, was held in February 2004 and examined the performance of a Main Contingency Force brigade group and battle group. The Canadian force structure for the first seminar was based on equipment available at the time, including Leopard tanks and M109 medium guns. The aim of the first wargame was to establish a baseline of performance against which the impact of proposed organizational changes may be assessed, with the intent of furthering the force employment concept for the Interim Army.
The second seminar wargame, Force Employment Wargame 0402, was held in May 2004 and was modelled on the baseline seminar wargame. The purpose of this wargame was to evaluate the performance of a proposed Interim Army force structure and to compare these results with those obtained in the baseline wargame. To support this aim, the scenarios and the Red Force remained essentially unchanged; however, the Blue Force structure was modified to include weapons systems and equipment that are programmed to be in place for the Interim Army. Of particular note was the removal of tanks and M109s and the insertion of the Mobile Gun System, the TOW missile system on a LAV chassis and the Multi-Mission Effects Vehicle Version 1, all part of the direct fire system. A mobile artillery vehicle using a 105 mm gun mounted on the bed of a variant of the Mobile Support Vehicle System was also introduced.
SPONSOR OBJECTIVES
The ongoing objectives of the sponsor are to assess the impact of changes in structures, equipment and capabilities that will come into effect during the Interim Army timeframe, with a view to further refining the force employment concept in preparation for field trials at the Canadian Manoeuvre Training Centre and eventual incorporation into doctrine, including Tactics, Techniques and Procedures (TTPs) and Standard Operating Procedures (SOPs).
Insights and judgments resulting from these seminar wargames are intended to guide follow-on seminar wargame iterations, to further the Interim Army Force Employment Concept development process, and to prioritize operational research activities by identifying spin-off issues that would be most effectively dealt with using more traditional computer-assisted wargaming techniques.
SEMINAR WARGAME SCENARIOS
The scenario used for the wargame series was based on the Department of National Defence’s Force Planning Scenario 11, but in a time frame situated 10 years after the original conflict. The scenario included an emphasis on urban operations and the ‘three block war’. Six vignettes were wargamed - three by the brigade group and three by the battle group. On Day 1, conducted in open terrain, the Canadian brigade group and battle group were tasked with capturing objectives and destroying enemy forces that were established in a fairly conventional defensive position. On Day 2, conducted in urban terrain, operations involved seizing key nodes within the city core. On Day 3, set three days after the cessation of formal hostilities, operations included a mission in mountainous terrain, concurrent with stability operations over a large sector.
OPERATIONAL RESEARCH TEAM ROLES AND OBJECTIVES
The operational research (OR) team was responsible for defining the problem and scope of the exercise, selecting and designing the appropriate methodologies and criteria for investigating the issues, balancing the methodologies against constraints of time and resources, implementing the data collection plan, and collecting, extracting and presenting the results. In addition to these traditional OR responsibilities, the specific objectives for this exercise were to develop a comprehensive and robust methodology to collect quantitative and qualitative measures of performance, and to design a process by which performance and risk may be evaluated quantitatively and compared across vignettes and seminar wargames. This latter objective included the design of a final product that was both simple and easy to interpret by a diverse audience within the Canadian Army.
DESIGN OF THE DATA COLLECTION METHODOLOGY
The importance of the client-analyst collaboration was evident in the first stage of the project, i.e. the design of the data collection methodology. The design of an appropriate methodology required a thorough understanding of the sponsor’s objectives and requirements, as well as an understanding of the constraints imposed by the seminar wargame process itself. Given the very tight timelines, the design of this methodology was performed in parallel with the sponsor’s design of the seminar wargame process. Daily interaction and close collaboration were essential to ensure the two activities converged to form a well-knit process.
In the design of the data collection methodology, the following points were of particular importance:
- The nature of a seminar wargame is such that much of its complexity is hidden in the verbal interplay. Essentially, every discussion potentially contains an argument related to strengths or limitations buried within it; if not actively recorded, these points risk being lost.
- Particularly in the case of the Interim Army, there is imperfect or incomplete knowledge of capabilities, equipment, and structures. As such, any model must not require input that is more detailed than what is available. Stated simply, the model should not be over-designed.
- The transient nature of the wargaming participants, particularly the combat developers who make up the core team, requires that the methodology be simple and quick for new players to adopt.
- Participants are classified as members of the core team, subject matter experts, or observers. Different mechanisms must be put into place to record and classify observations from each of these groups.
- The task loading for the core team, i.e. the combat developers, and the time constraints imposed by the seminar wargame process necessitate a data collection scheme that is relatively quick and easy.
In consideration of the aforementioned points, a two-tier approach was taken in the development of the analysis methodology to address both the qualitative nature of the seminar wargame format and the requirement for quantitative results. First, the qualitative data collection process involved the conduct of formal judgment and insight sessions, the submission of observation sheets by all participants, and the compilation of strengths, weaknesses and issues matrices by the combat developers for each of the five operational functions. Qualitative data was collected and categorized in an Access database designed to allow the analyst to selectively filter the observations, and to produce reports that could be channelled to an appropriate member of the staff for further action. Qualitative data collection will not be discussed further in this report. Quantitative data analysis was conducted using the STAMPER methodology, as described herein.
STAMPER – Systematic Task Analysis for Measuring Performance and Evaluating Risk
The purpose of the STAMPER methodology was to provide a systematic framework to measure performance and identify risk factors, and to compare variations in system performance between different vignettes, as well as from one seminar wargame iteration to the next. This was done with a series of survey instruments used to elicit the subjective judgements of a team of assessors in the evaluation of the system’s performance of essential tasks. In addition, these task lists provided a framework for discussion during the seminar wargame sessions, as each combat developer assessed the performance of the tasks under his operational function.
In the selection of the models to calculate overall performance and to assess risk, the use of a simple model was deemed most appropriate due to the incomplete knowledge of the innovative and often exploratory concepts and equipment introduced during the course of this seminar wargame series. The expected model fidelity was matched to the question at hand, and the model was designed to provide a level of detail that was equal to and supported by the level of fidelity of the inputs available to the model itself.
The development of the STAMPER methodology began three months prior to the baseline seminar wargame, and consisted of the following exercises: the development of the DGLCD Task Lists, the creation of the survey instruments and automated tools, and the development of a measurement methodology to identify risk factors and to assign appropriate scores to individual tasks and to the overall operational functions.
DGLCD Task List
During the task list development stage, a comprehensive task list analysis exercise was performed by the DAD combat developers and their staff. The objective of this exercise was to produce a complete list of tasks under the responsibility of the Army, divided into the five operational functions of Command, Sense, Act, Shield and Sustain. These are referred to as the DGLCD Task Lists (DTLs).
The DTLs draw their origins from recognized task lists such as the Canadian Joint Task List, the subsequently developed Canadian Army Task List, the Canadian Brigade Battle Task Standards, as well as other informally developed task lists. For each operational function, Level 1, Level 2, and Level 3 tasks were identified, with each level depicting finer granularity than the previous. Figure 1 depicts a sample of the Command task list breakdown.
Figure 1 Sample of the Command Task List Breakdown
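To make this hierarchy concrete, the fragment below is a minimal sketch of how a DTL might be represented for analysis purposes; the task names are hypothetical placeholders rather than actual DTL entries, and Python is used purely for illustration.

```python
# Minimal sketch of the DTL hierarchy: an operational function broken down
# into Level 1, Level 2 and Level 3 tasks of increasingly fine granularity.
# Task names are hypothetical placeholders, not actual DTL entries.
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    level: int                           # 1, 2 or 3
    subtasks: list = field(default_factory=list)

# Hypothetical fragment under the Command operational function.
command_tasks = [
    Task("Exercise command", level=1, subtasks=[
        Task("Plan operations", level=2, subtasks=[
            Task("Develop courses of action", level=3),
            Task("Issue orders", level=3),
        ]),
    ]),
]
```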
Survey Instruments
The DTLs form the foundation of the survey instruments used to facilitate the elicitation of expert opinion, and to measure performance and evaluate risk factors observed during each of the seminar wargames. In the completion of the survey instruments, assessors were asked to evaluate tasks along two dimensions: performance level, and impact/importance of the task on the completion of the higher-level task. Table 1 presents the two questions that appeared on the survey instruments, as well as the response options available to the participants. Additionally, the table specifies the category of tasks applicable to each question. Of particular note is that participants were required to score the performance of Level 3 tasks only. The performance scores of Level 2 and Level 1 tasks were automatically calculated from the performance scores of Level 3 tasks in combination with the impact scores of Level 3, Level 2 and Level 1 tasks.
Table 1 Survey Instrument Questions and Response Options
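To make the two-dimensional scoring concrete, the sketch below shows how a single assessor's responses might be recorded; the rating values and task names are illustrative assumptions only, and the actual questions and response options are those listed in Table 1.

```python
# Hypothetical record of one assessor's survey responses. Only Level 3 tasks
# receive a performance rating (Question 1); all levels receive an
# impact/importance rating (Question 2). Values are placeholders.
level3_responses = {
    # Level 3 task: {"performance": Question 1 rating, "impact": Question 2 rating}
    "Develop courses of action": {"performance": 3, "impact": 3},
    "Issue orders":              {"performance": 2, "impact": 2},
}

higher_level_impacts = {
    # Level 2 and Level 1 tasks: impact rating only; their performance
    # scores are rolled up from their Level 3 sub-tasks.
    "Plan operations": 3,      # Level 2
    "Exercise command": 3,     # Level 1
}
```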
Calculating Performance Scores and Assigning Risk
For each of the Level 2 and Level 1 tasks, as well as the overall operational functions, a measure of performance was calculated based on Question 1 and Question 2 results for Level 3 tasks, in combination with Question 2 results for Level 2 and Level 1 tasks.
The calculation of performance is based on an inner-product rule in which each task is attributed points as a function of performance and impact. Points are attributed as per the Point Allocation Matrix in Figure 2. Essentially, the measure of performance of a Level 2 task is the weighted average of the performance scores assigned to the Level 3 tasks belonging to it. Weights were assigned to tasks based on the survey responses to Question 2 on impact; tasks estimated to be of a higher importance were assigned a higher weight. As such,
Performance score of a Level 2 task = (Sum of points for Level 3 tasks belonging to the Level 2 task) / (Sum of weights for Level 3 tasks belonging to the Level 2 task)
In a similar fashion, the measure of performance of a Level 1 task is the weighted average of the performance of the Level 2 tasks belonging to it. Finally, the measure of performance of an operational function is the weighted average of the performance of the Level 1 tasks belonging to it. However, in the case of the roll-up for Level 1 tasks and for the overall operational function performance scores, the points assigned to each sub-task are calculated as the product of the sub-task performance score, as calculated in the previous step, and the appropriate weight assigned as a function of the response to Question 2 on impact.
Figure 2 Point Allocation Matrix
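As an illustration of this weighted-average roll-up, the sketch below applies the rule to hypothetical ratings and weights; the actual points for Level 3 tasks are taken from the Point Allocation Matrix in Figure 2, and the weights are those derived from the Question 2 impact responses.

```python
# Minimal sketch of the weighted-average roll-up. Here a sub-task's points
# are taken as the product of its performance score and its impact weight;
# in STAMPER the Level 3 points come from the Point Allocation Matrix.

def roll_up(scores, weights):
    """Weighted average of sub-task performance scores."""
    points = [s * w for s, w in zip(scores, weights)]
    return sum(points) / sum(weights)

# Hypothetical Level 2 task with three Level 3 sub-tasks.
level2_score = roll_up(scores=[3, 4, 2], weights=[2, 3, 1])          # ~3.33

# The same rule rolls Level 2 scores up to Level 1, and Level 1 scores up
# to an overall operational function score.
level1_score = roll_up(scores=[level2_score, 2.5], weights=[3, 2])   # 3.0
```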
For each of the Level 3 tasks, a measure of risk was obtained using the results of Question 1 and Question 2. A risk indicator was assigned to each task following the Risk Assessment Matrix in Figure 3. The lower the score, the higher the risk associated with that task. Low scores, colour-coded in red and yellow, identify tasks that would endanger mission success, whereas high scores, colour-coded in green and blue, identify tasks that would contribute to mission success.
The assigned risk indicators were not rolled up as in the analysis of the overall performance measures. This was done to ensure that visibility of high-risk tasks was maintained throughout the exercise, and that this information was not obscured when results were merged to obtain measures of performance at higher levels. Instead, these scores would be compared with those collected in future iterations, and changes would be examined at this level of granularity.
Figure 3 Risk Assessment Matrix
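The sketch below illustrates how such an indicator might be assigned; the matrix values and colour bands are placeholders and not the actual Figure 3 values, which are not reproduced here.

```python
# Hypothetical stand-in for the Risk Assessment Matrix: each
# (performance, impact) pair maps to a score and colour, with lower scores
# indicating higher risk. Red/yellow flag tasks that would endanger mission
# success; green/blue flag tasks that would contribute to it.

# Rows: performance rating (1 = poor, 3 = strong).
# Columns: impact rating (1 = minor, 3 = critical). Illustrative values only.
RISK_MATRIX = {
    (1, 3): (1, "red"),    (1, 2): (2, "red"),    (1, 1): (3, "yellow"),
    (2, 3): (4, "yellow"), (2, 2): (5, "green"),  (2, 1): (6, "green"),
    (3, 3): (7, "blue"),   (3, 2): (8, "blue"),   (3, 1): (9, "blue"),
}

def risk_indicator(performance, impact):
    """Look up the (score, colour) risk indicator for a Level 3 task."""
    return RISK_MATRIX[(performance, impact)]

# A task performed poorly (1) but judged critical to its parent task (3)
# receives the lowest score and is flagged red.
print(risk_indicator(performance=1, impact=3))    # (1, 'red')
```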
Automated Tool to Calculate Performance, Risk and Deltas
In preparation for the seminar wargame, an Excel-based analysis tool was designed to automatically assemble responses from participants, to assign a measure of risk as a function of performance and impact scores, to calculate a score of overall performance for Level 2 and Level 1 tasks, and to roll up these scores into an overall measure of performance at the operational function level. In addition, following the second seminar wargame, the tool was updated to automatically display the change in performance and impact/importance ratings, as well as the change in calculated performance and risk scores. This analysis was performed for each of the six vignettes, and for each of the five operational functions. Figures 4 and 5 depict snapshots of the tool as it is used to automatically calculate Level 2 and Level 1 scores, and to display changes in performance scores from one seminar wargame to the next, respectively.
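As a simple illustration of the delta display, the sketch below compares hypothetical task scores between the baseline and second wargames; the values shown are not actual results, and the tool itself was implemented in Excel rather than code.

```python
# Hypothetical per-task performance scores from the two seminar wargames.
baseline_scores = {"Plan operations": 3.3, "Issue orders": 4.0}   # Wargame 0401
interim_scores  = {"Plan operations": 2.8, "Issue orders": 4.2}   # Wargame 0402

# Change in score from the baseline to the Interim Army structure.
deltas = {task: round(interim_scores[task] - baseline_scores[task], 2)
          for task in baseline_scores}
print(deltas)   # {'Plan operations': -0.5, 'Issue orders': 0.2}
```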
Strengths
Among others, the use of the STAMPER methodology presented the following benefits:
- The development of the process itself required a close collaboration between the sponsor and the analyst. This collaboration engaged the participants fully, and instilled a sense of ownership in the process that was essential to the success of the exercise.
- The process provided a framework for a structured evaluation of performance across a wide and complete range of tasks.
- The quantitative results complemented the results extracted from the qualitative data collection.
- As a visualization tool, the methodology was successful in providing a quick identification of performance and risk results, as well as changes in these measures from one iteration to the next.
- The automated tool ensured the availability of near real-time results. These results were then available at end-of-day judgments and insights sessions, to be used as required.