Generation of Human Reliability Data for the Air Traffic Industry

Barry Kirwan, EUROCONTROL / Brian Hickling, EUROCONTROL
Eric Perrin, EUROCONTROL / Huw Gibson, Birmingham University / Ed Smith, DNV Consulting

1Copyright © #### by ASME

SUMMARY

Air Traffic Management (ATM) deals with the safe and efficient passage of aircraft across national and international airspace. In Europe, ATM, as with other industries, must now comply with formalized risk assessment procedures, for example those embodied in EUROCONTROL Safety Regulatory Requirements (ESARR) 4 [1]. In order to demonstrate that systems are acceptably safe, a Safety Assessment Methodology (SAM) has been proposed by EUROCONTROL [2] and is being applied by many countries in Europe. Safety cases for existing or new systems, or significant system changes, can utilize fault and event tree (or equivalent) approaches to model and quantify risk as is done in other industries such as nuclear power, chemical, process, and petrochemical. However, there has been little emphasis to date on Human Reliability Assessment (HRA) in the world of ATM, although it is recognized that the high degree of safety evident in this industry is mainly due to the human element (in particular the air traffic controller). The context of this paper therefore concerns the feasibility of HRA in air traffic risk assessments. As a first step towards HRA in ATM, this paper focuses on the degree to which quantitative human error data can be generated to substantiate or calibrate an ATM HRA approach.

Two separate exercises are reported. The first concerns collection of human error data from a real-time simulation involving air traffic controllers and pilots. This study focused on communication errors between controllers and pilots. The second relates to a formal expert judgment study using direct numerical estimation (also called Absolute Probability Judgment) and Paired Comparisons protocols to elicit and structure the controller and pilot expertise.

The results showed that stable HEPs can be provided from real time simulations, at least with respect to communications activities, and to a lesser extent from expert judgment approaches. These results suggest that the approach of HRA can be adapted to ATM safety case methodologies and frameworks. An example of a recent developing air traffic safety case which has utilized the HRA approach is briefly discussed.

The conclusion is that HRA is feasible, but that more data do need to be collected, since ATM dynamics and safety scenario timings, as well as its operational culture and performance shaping factors, are different to other industries where HRA application is ‘the norm’.

INTRODUCTION

Human Reliability Assessment (HRA) has been around for some time, notably since the Three Mile Island nuclear power plant accident in 1979, when the Technique for Human Error Rate Prediction (THERP) [3] became the predominant technique in use. Since that time, the usage of HRA has spread to other process-control-related industries (petrochemical; chemical and process). Recently it has also spread to the transportation sector, notably the rail industry, and the medical and air traffic management domains are increasingly focusing on the management and assessment of human error [4].

HRA has several main attributes [3, 5], principally determining what can go wrong (human error identification), determining the risk significance of errors (or correct performance: the kernel of this function being human error quantification), and identifying how to mitigate human-related risks or assure safe human performance (error reduction). HRA can be seen as an engineering function (belonging principally to the sub-discipline of Reliability Engineering) or as a Human Factors function. In truth it belongs to both domains, and normally people carrying out HRA are ‘hybrid’ practitioners with a mixture of reliability engineering and Human Factors/psychology expertise. HRA is however usually most clearly associated with its second main function, namely the quantification of human error likelihood (i.e. how frequently will a particular error actually occur?). This likelihood is usually expressed as a probability, called a Human Error Probability (HEP), which is effectively the probability of an error per demand. In theory, and in practice, HRA rests upon a fundamental premise that HEPs can be quantified, i.e. that they exist as stable quantitative values.

The quantitative expression for an HEP is straightforward and is shown below, based on the idea that a HEP can be measured by observation[1]:

HEP = number of errors observed / number of opportunities for error
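
As an illustrative sketch only (it is not part of the original study), the ratio above can be computed directly from observed counts. The function names `hep` and `wilson_interval` are hypothetical. The Wilson score interval is an added assumption about how the uncertainty of an observed HEP might be expressed; the paper itself reports point estimates only. The counts used are the controller slip figures reported in Table 5.

```python
import math


def hep(errors: int, opportunities: int) -> float:
    """Observed human error probability: errors per opportunity for error."""
    if opportunities <= 0:
        raise ValueError("opportunities must be positive")
    if not 0 <= errors <= opportunities:
        raise ValueError("errors must lie between 0 and opportunities")
    return errors / opportunities


def wilson_interval(errors: int, opportunities: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 is roughly 95%).

    Assumption: each opportunity is treated as an independent trial.
    """
    p = hep(errors, opportunities)
    n = opportunities
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Controller slips from Table 5: 30 slips in 1705 communication elements.
p = hep(30, 1705)
low, high = wilson_interval(30, 1705)
print(f"HEP = {p:.4f}, approx. 95% interval = ({low:.4f}, {high:.4f})")
```

An interval of this kind gives a rough sense of how much an observed HEP could move with the number of opportunities sampled, which bears on the question of whether HEPs are stable.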

Whenever HRA is being applied to a new industry, it must map onto the risk or safety management framework and indeed the organizational ‘culture’ of that industry. Where such an industry already has a quantitative approach to risk (e.g. nuclear power), HRA may ‘find its niche’ quickly. Conversely, where risk is managed more qualitatively, it may encounter resistance and incredulity from those who need to be convinced in order for it to be utilized effectively. Thus, HRA in the medical domain at the moment is mainly being used qualitatively (for error identification and reduction purposes). But in ATM, the whole industry in Europe has been undergoing a paradigm shift in safety assessment, moving from a more ‘implicit’ and qualitative approach to a formalized methodological framework with quantitatively stated target levels of safety that safety cases need to reach. This leads to formal use of techniques such as fault and event trees, and in some cases dynamic risk assessment approaches. The former approaches in particular will need HEPs if human error plays a significant role in ATM safety. In fact, human performance dominates safe system performance in ATM (controllers literally control where aircraft go, in real time, using radar and radio-telephony), and so should be a major component in any safety case. Nevertheless there is some skepticism in the industry as to whether HRA is meaningful, and in particular, whether HEPs are a stable entity. Since ATM is significantly different to industries where HRA is accepted (ATM is very dynamic and its human elements operate in a much faster timeframe than, say, nuclear power), the argument that ‘HRA works and is accepted elsewhere’ is not sufficient.

As a preliminary step to developing an HRA methodology for ATM, it was decided to explore whether HRA’s basic premise, that HEPs can be quantified and remain stable, also holds in ATM. However, ATM’s culture is more judgment-based than scientifically based (again, unlike nuclear power, for example). Many of those in key management positions today were once controllers themselves and would be more convinced by controllers saying that HRA’s estimates were reasonable than by reading a scientific or academic report. Therefore a two-pronged approach was taken to explore the applicability of HRA to ATM. The first was to determine if HEPs could be collected in realistic (high fidelity) real-time simulations of ATM operations[2]. The second approach was to carry out expert judgment procedures to quantify HEPs for an ongoing ATM safety case, using controllers and pilots as subjects. The hypotheses, framed here as questions, were simple and capable of being rejected (i.e. scientifically they are ‘falsifiable’):

  • Can robust HEPs be elicited for ATM safety related tasks?
  • Are HEPs as may be used in safety cases credible to controllers and pilots?

If the answer to either is ‘no’, then this bodes badly for HRA in this work domain, and perhaps other approaches to manage human related risk must be investigated. If the answer to both is ‘yes’, then it means HRA in ATM is a feasible proposal. It does not of course mean the road to full acceptance and implementation will be easy, or even acceptable, but that is beyond the scope of this paper, which seeks only to establish feasibility at this early stage of HRA in ATM. The remainder of this paper therefore presents the abridged results of the two studies, showing indeed that the premise for HRA in ATM appears to be supported. References to the full studies are given. A final concluding section outlines a way forward for developing HRA as an effective approach in ATM.

Study 1: HEP data collection in real-time simulation (Co-Space) [6]

The primary principle in air traffic management is to keep aircraft separate, by certain minimum distances both vertically and horizontally. This is currently generally the controllers’ task when dealing with civil/commercial traffic. This task can lead to high workload in certain high density areas, as different ‘streams’ of aircraft are approaching busy airports. An option is therefore to allow the crew of one aircraft some degree of autonomy in separating their aircraft from the one in front, for example, via the use of specialized cockpit equipment. In the EUROCONTROL Co-Space project, therefore, a new allocation of spacing tasks between controller and flight crew is envisaged as one possible option to improve air traffic management. It relies on a set of new “spacing” instructions, whereby the flight crew can be tasked by the controller to maintain a given spacing to a target aircraft. The motivation is neither to “transfer problems” nor to “give more freedom” to flight crew, but really to identify a more effective task distribution beneficial to all parties, without modifying responsibility for separation provision. In Co-Space the airborne spacing assumes availability of airborne Automatic Dependent Surveillance-Broadcast (ADS-B) along with cockpit automation (Airborne Separation Assistance System; ASAS). ASAS is a set of new ATC instructions, to allow, under the right conditions, the delegation of separation from ATC to pilots. No significant change to ground systems is initially required. These procedures and systems are under development in the Co-Space project, and in parallel a number of extensive real-time simulations (RTS) are being conducted to evaluate the adequacy of the resulting system performance. These RTS are carried out to assess the usability and usefulness of time-based spacing instructions in the Terminal Maneuvering Area (TMA) under very high traffic conditions, with and without the use of spacing instructions.

In the pursuit of HRA feasibility, and also because one day the airborne separation assurance concept may be the subject of its own quantified safety case, it was agreed by the project team that HEP data collection could be attempted during the simulation.

The Real Time Simulation (RTS) involved Approach controllers from Gatwick, Orly and Roma. They employed two generic approach sectors derived from an existing environment (Paris Terminal Maneuvering Area or TMA). Each air traffic control ‘sector’ (an area of airspace) fed into a single-landing-runway airport and was controlled by a unique Approach position manned with an executive and a planning controller. The role of each executive was to integrate two flows onto the final approach, and to transfer them to Tower controllers. Seven air traffic controller (ATCO) positions, including the TMA one, were used for each one-hour simulated session. The traffic was pre-sequenced when entering each approach sector via two initial approach ‘fixes’. The traffic followed standard trajectories. No departure traffic and no ‘stacks’ (vertical holding points, as used in some airports) were simulated. The RTS utilized paper (rather than the new electronic) strips and a separate arrival manager tool. Controllers talked to pilots using standard radio-telephony headsets.

Results

During the simulation, a total of 613 communication ‘transactions’ between controllers and pilots were analyzed, which contained 3,411 communication elements, and a number of errors. Tables 1 and 2 show the types of error made and by whom (controller or pilot), and how the errors were recovered (Table 2). This typology is in the context of air traffic operations. Table 3 shows the types of error that occurred in more general terms; such information is useful when trying to determine how to improve human performance. Table 3 shows in particular that simple numerical errors are common. This is seen in practice when two aircraft having a similar call sign occur in the same controller’s airspace in real life, leading to what is called a ‘call sign confusion’ error. Such an error can lead to a loss of safe standard separation distance between aircraft if the controller gives the right message to the wrong aircraft. A number of airlines today work hard with the ATM community to try to prevent similar call signs occurring in the same sector of airspace, to reduce the frequency of this type of error and so avoid its potential consequences.

Table 4 shows errors with and without ASAS. Chi-square analysis was undertaken to identify if there were any significant differences between sessions where ASAS was used and those where ASAS was not used. No significant differences were identified. An equivalent analysis was also undertaken to identify if transactions which contained an ASAS instruction were more susceptible to communication errors than those which did not. Again, the analysis did not identify any significant differences between the two cases. This suggests that ASAS usage does not affect the likelihood of communication errors. Table 5 shows errors by controllers and pilots (in this case ‘pseudo-pilots’, who are nevertheless actual qualified pilots, but working at a computer simulator workstation rather than in a cockpit or cockpit simulator). The error rates are in this case strikingly similar even though the task environment and training are very different.

Table 1. Errors during the ASAS Real Time Simulation

Error type / Controller / Pseudo-pilot / Total / Percentage
Slip / 30 / 31 / 61 / 52%
No read back / 0 / 4 / 4 / 3%
No response / 1 / 1 / 2 / 2%
Contradict previous instruction / 2 / 0 / 2 / 2%
Query / 1 / 10 / 11 / 9%
Context required / 4 / 8 / 12 / 9%
Use of non-English language / 1 / 1 / 2 / 2%
Change of plan / 17 / 0 / 17 / 14%
Break / 2 / 0 / 2 / 2%
Station calling / 3 / 0 / 3 / 3%
Expedite / 2 / 0 / 2 / 2%
Total / 63 / 55 / 118 / 100%

Table 2. How errors were recovered

Recovery
Error type / None / Self / Other / Later / Not Identified / Total
Slip / 8 / 44 / 5 / 4 / 61
No read back / 1 / 2 / 1 / 4
No response / 2 / 2
Contradict previous instruction / 2 / 2
Query / 11 / 11
Context required / 2 / 10 / 12
Use of non-English language / 2 / 2
Change of plan / 17 / 17
Break / 2 / 2
Station calling / 3 / 3
Expedite / 2 / 2
Total / 9 / 81 / 13 / 5 / 10 / 118

Table 3. Details of error nature

Slip type / Frequency / Percentage
Incorrect numeric element within a numeric (e.g. 516 for 515) / 41 / 67%
Whole numeric substituted for another (e.g. say 567 for 123) / 1 / 2%
Numeric omission (e.g. say 1233 for 12335) / 1 / 2%
Phonetic alphabet / 3 / 5%
Company identifier (e.g. Britannia for Ryanair) / 5 / 8%
Pilot read back of controller use of 'please' / 1 / 2%
Repetition of phrases or call signs / 4 / 7%
Errors in words/sequences in a standard phrase (e.g. ‘its er two nine er flight level two nine zero’) / 5 / 8%
Total / 61 / 100%

Table 4. Communication errors and use of ASAS

Measure / Session used ASAS / Session did not use ASAS
All errors / 71 / 47
Slips / 39 / 22
Number of elements / 1921 / 1490
Likelihood of error / 0.0370 / 0.0315
Likelihood of slips / 0.0203 / 0.0148
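
The chi-square comparison of sessions with and without ASAS can be sketched from the Table 4 counts as follows. This is a minimal illustration under stated assumptions, not the authors' actual analysis: it assumes that each communication element is one opportunity for error, uses a Pearson chi-square test on a 2x2 table without continuity correction, and the function name `chi_square_2x2` is hypothetical. The source does not state which counts or which test variant were used.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Counts from Table 4; elements are assumed to be the opportunities for error.
elements_asas, elements_no_asas = 1921, 1490
results = {}
for label, with_asas, without_asas in [("All errors", 71, 47), ("Slips", 39, 22)]:
    results[label] = chi_square_2x2(
        with_asas, elements_asas - with_asas,
        without_asas, elements_no_asas - without_asas,
    )
    stat, p = results[label]
    print(f"{label}: chi-square = {stat:.2f}, p = {p:.2f}")
```

Under these assumptions both statistics fall below 3.841, the 5% critical value for one degree of freedom, which agrees with the reported absence of a significant difference.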

Table 5. Controller versus pseudo-pilot slip rates

Measure / Controllers / Pseudo-pilots
Slips / 30 / 31
Number of elements / 1705 / 1706
Likelihood of slips / 0.018 / 0.018

How do the data compare with data collected in the field?

Table 6 presents equivalent data from this study as the final column against data from actual studies for different UK and US airspace types (i.e. studies which have measured human error rates in the field). The data do provide very similar human error probabilities to the other studies. This suggests that the communication performance in the trial is similar to that experienced during live Air Traffic Control.

This study as a whole found that HEPs can be collected, and that they appeared to exhibit the property of stability. As a key finding, for example, the likelihood of communication slips (e.g. saying 5 4 6 when 5 6 4 was intended) was shown to be constant across a range of conditions. Differences were not identified between error rates with/without ASAS, between ASAS and other communications, between different instruction types, between different controllers or between controllers and pseudo-pilots. These results would therefore support the required premise for HRA, at least for the task of communication, which is itself a safety critical one in the ATM industry.

A separate study [7] presents an unrecovered readback error rate for ATC communication of 0.006, drawn from a number of field studies. The Co-Space simulator data provide an unrecovered readback error rate of 0.003. While low cell counts prohibit statistical comparisons, these data are certainly in the same 'ball park' and a tentative conclusion is that the performance in the simulation is comparable with data collected in the field.

Study 2: HEP data generation using expert judgment: The GBAS CAT-1 Safety Case

GBAS: Ground-Based Augmentation System

CAT-I/II/III operations at European airports are presently supported by Instrument Landing Systems (ILS). The continued use of ILS-based operations as long as operationally acceptable and economically beneficial is promoted by the European Strategy for the planning of All Weather Operations (AWO). However, in the ECAC (European Civil Aviation Conference) region, the forecast traffic increase will create major operational constraints at all airports, in particular in Low Visibility Conditions (LVC) with the decreased capacity of runways. Consequently, the technical limitations of ILS such as Very High Frequency (VHF) interference, multipath effects due to, for example, new building works at and around airports, and ILS channel limitations will be a major constraint to its continued use. Within this context GBAS is expected to maintain existing all weather operations capability at CAT-I/II/III airports. GBAS CAT-I (ILS look-alike operations) is seen as a necessary step in order to extend its use to the more stringent operations of CAT-II/III precision approach and landing. Initial implementation of GBAS could be achieved in ECAC as early as 2008.

A safety case is therefore being prepared for GBAS to see if it can be implemented. Within this developing safety case a number of potentially critical human errors were identified in the associated fault and event trees. No real-time simulation for GBAS has yet occurred, and its use is different from ILS. Furthermore, since ILS was implemented a long time ago, prior to the current safety paradigm, there was no safety case for ILS with which to compare identified human errors. Consequently, since there were no prior identified HEPs and none available from the real world (GBAS is not yet implemented) and few relevant ones from ILS operation, it was decided to attempt to use expert judgment approaches to quantify some of the key HEPs for the GBAS safety assessment.