Safety Informing Design
Barry Kirwan
Eurocontrol Experimental Centre, Bois des Bordes, BP 15, Brétigny-sur-Orge, F-91222 CEDEX, France.
Abstract
Evidence suggests that the roots of many accidents lie in the early stages of system design. This paper gives an account of one organisation’s attempt to consider safety at the concept exploration or research stage, very early in the design process. The industry concerned is Air Traffic Management (ATM), and the work involves developing new concepts to make ATM more effective (e.g. handling more traffic in line with increasing public demand) while maintaining or improving safety. At this early stage the safety work is more qualitative than quantitative, focusing in particular on hazard and human error assessment and on gaining safety insights from real-time simulations. Examples of the detailed safety approach are given. The safety management framework and the organisational commitment needed to sustain and channel these activities and their results are also discussed.
Keywords: Safety, safety assessment, human error, air traffic management, safety management
1. Introduction
1.1 The ATM Context
Air Traffic Management (ATM) involves air traffic controllers in control towers and air traffic control centres controlling civil air traffic through a sophisticated network of airspace, using radar and voice communication as the primary media to know where the aircraft are, to help them get to where they are going, and to keep the individual aircraft apart.
The fact that ATM does this so well has earned it a reputation as a ‘high reliability’ operation, with very few accidents. The accident rate in European aviation due to civil air traffic management is approximately 1.6 × 10⁻⁸ fatal accidents per aircraft flight hour, which equates to 0.6 accidents per year at current traffic levels. As a direct contributor, ATM accounts for only about 4% of all civil aviation accidents; most such accidents are caused by a host of other events outside the current remit of air traffic management. ATM is therefore very safe compared with other industries and other modes of transport. However, to put this into perspective, there have been three fatal accidents with an ATM causal contribution in Europe since 2000: a runway collision at Paris CDG killing one person (flight crew), a runway collision at Milan Linate Airport killing 116 people (ANSV, 2001), and a mid-air collision over Germany (Lake Constance) in Swiss-managed airspace killing 74 people (BFU, 2004).
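The two figures quoted above are linked by traffic exposure: expected accidents per year equal the accident rate per flight hour multiplied by the flight hours flown per year. The short Python sketch below back-calculates the exposure implied by 0.6 accidents per year at 1.6 × 10⁻⁸ per flight hour, which works out at about 37.5 million flight hours per year. The function names are illustrative, and the exposure is derived purely from the two quoted numbers rather than from any independent traffic statistic.

```python
# Figures quoted in the text; the exposure is back-calculated from them,
# not taken from an independent traffic statistic.
RATE_PER_FLIGHT_HOUR = 1.6e-8  # fatal ATM-related accidents per flight hour
ACCIDENTS_PER_YEAR = 0.6       # stated equivalent at current traffic levels


def expected_accidents(rate_per_hour: float, flight_hours: float) -> float:
    """Expected accidents per year = rate per flight hour x annual flight hours."""
    return rate_per_hour * flight_hours


def implied_flight_hours(rate_per_hour: float, accidents_per_year: float) -> float:
    """Invert the relationship above to recover the annual exposure."""
    return accidents_per_year / rate_per_hour


hours = implied_flight_hours(RATE_PER_FLIGHT_HOUR, ACCIDENTS_PER_YEAR)
print(f"Implied exposure: {hours / 1e6:.1f} million flight hours per year")
```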
Figure 1 shows that for some time ATM-induced accidents were rare, but these three recent accidents give cause for concern, either because a trend may be emerging (it is too early to say statistically), or because three such accidents in a short space of time are simply unacceptable. These accidents have indeed triggered significant additional safety efforts aimed at reinforcing the safety of the current European ATM system (Eurocontrol, 2003). There is therefore no room for complacency in ATM safety, especially since the number of passengers and aircraft flying is increasing and is set to double in the next 10-15 years. If the accident rate per flight hour stayed constant, doubling the traffic would roughly double the number of accidents per year; ATM as an industry must therefore work hard just to maintain its current level of safety, and even harder to improve on it.
Figure 1 – European ATM accident occurrences
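The point about traffic growth can be made concrete with a simple linear exposure model. The sketch below assumes that doubling traffic doubles the annual flight hours and that the accident rate per flight hour stays at the quoted 1.6 × 10⁻⁸. Under those assumptions the expected number of ATM-related accidents rises from 0.6 to 1.2 per year, and holding it at 0.6 would require halving the rate to 0.8 × 10⁻⁸ per flight hour. The linear model is a simplification chosen for illustration and ignores any additional risk that higher traffic density might bring.

```python
RATE_PER_FLIGHT_HOUR = 1.6e-8               # current rate quoted in the text
CURRENT_HOURS = 0.6 / RATE_PER_FLIGHT_HOUR  # exposure implied by 0.6 accidents/year
TRAFFIC_GROWTH = 2.0                        # traffic "set to double" (assumed to double flight hours)


def annual_accidents(rate_per_hour: float, flight_hours: float) -> float:
    # Linear exposure model (assumption): accidents scale with flight hours.
    return rate_per_hour * flight_hours


def rate_to_hold_annual_count(current_rate: float, growth: float) -> float:
    # Rate needed so that (growth x today's hours) still yields today's annual count.
    return current_rate / growth


future_same_rate = annual_accidents(RATE_PER_FLIGHT_HOUR, CURRENT_HOURS * TRAFFIC_GROWTH)
target_rate = rate_to_hold_annual_count(RATE_PER_FLIGHT_HOUR, TRAFFIC_GROWTH)
print(f"Same rate, doubled traffic: {future_same_rate:.2f} accidents/year")
print(f"Rate needed to stay at 0.6/year: {target_rate:.2e} per flight hour")
```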
ATM in Europe is also undergoing fundamental changes, partly to adapt to increasing capacity demands and partly to remove the in-built hindrances to an already complex and near-saturated system. New designs and approaches have therefore been conceived to enhance ATM system performance. Some of these aim to support the controller, the human responsible for maintaining a safe, orderly and efficient flow of traffic, whether in a control tower or an en route air traffic control centre, via new sensors, procedures, displays, warning devices, and other tools that support efficient communication and control of traffic. Other approaches concern how the airspace is organised and managed, in particular moving away from traditional national boundaries to ‘functional airspace blocks’ based instead on predominant traffic flows and routes. Rather than having forty or so different countries, each with its own style of procedures, ways of working traffic, and national rules and regulations, the aim is a ‘Single Sky’ for Europe, with more collaborative approaches and harmonised air traffic control practices and procedures. The whole ATM system in Europe is therefore undergoing a paradigm shift, with the key period for its transformation being 2012-2017, when many new tools, procedures, and practices will be implemented. Such a change offers both challenges and opportunities for safety, and of course for the designers of this new ATM system paradigm.
1.2 The Design Process & Safety
1.2.1 Eurocontrol[1]
Eurocontrol was established some time ago by a European convention, reflecting a realisation that the different member states needed a body working in their common interest. Eurocontrol is therefore non-profit making and does not produce ATM ‘products’: it is not a manufacturer. Its mission is the development and harmonisation of the European ATM system. Hence it aims to show the way, to help the different European States towards mutually beneficial consensus, and to promote best practice in ATM. Safety lies at the core of this mission, since ATM is centrally concerned with the travelling public.
Eurocontrol’s headquarters is in Brussels; it also has its own air traffic control centre in Maastricht, a controller training centre in Luxembourg, and a research centre in Brétigny-sur-Orge, south of Paris, where real-time simulations or ‘experiments’ are carried out on future ATM concepts.
1.2.2 Eurocontrol and Design
At the Eurocontrol Experimental Centre (EEC), the research arm of the Eurocontrol Agency, a number of future concepts for ATM are being explored, both for the mid-term (2012 – 2017) and beyond (e.g. 2025). Typically, concepts are explored together with interested European Member States as primary stakeholders, until such concepts are either discarded or stabilised for further development and ultimate deployment. The further development is carried out by the main body of Eurocontrol, whilst final development and deployment is passed over to the stakeholders in industry. This handover from the EEC to Eurocontrol in general, and from Eurocontrol to its stakeholders, is represented in figure 2.
In practice, however, the picture is not so ‘clean’: some concepts stay at the EEC for a long time, reaching a relatively advanced stage of maturity before being considered ready for development into industrial systems or tools that help controllers do their job better. Tools are sometimes even developed to the stage where they can be tested in real ATM centres in Europe under closely controlled conditions (usually called ‘shadow-mode’ trials). Even so, such tools are handed over to other stakeholders at a certain point, who carry out formal requirements engineering, software development and so on, to produce true products that could be licensed for use in such ATM centres. Consequently, although the safety approaches described in this paper are aimed at early concept design, some of the examples herein may appear to go well beyond the ‘concept’ stage. This state of affairs is not unique to the ATM industry, but may contrast with others such as nuclear power and offshore petrochemicals, where the transition from concept to detailed design is likely to be more black-and-white, with clear handover points.
New ATM concepts are developed and designed at the EEC, whether these are ideas on how to improve the capacity of the whole ATM system via better procedures, or tools to help the controller handle more traffic effectively, or warning tools to support safety. Once developed, these new concepts are tested either in small-scale simulations with real controllers to gain early feedback and improve the concept (figure 3), or in large-scale simulations to test a more mature concept with a larger and more representative sample of controllers from various member states.
Figure 2: EEC design and safety assessment (key: ANSP = Air Navigation Service Providers; EEC = Eurocontrol Experimental Centre; FHA = Functional Hazard Assessment; H = Human; HW = Hardware; PISC = Pre-Implementation Safety Case; POSC = Post-Operational Safety Case; PSSA = Preliminary System Safety Assessment; STD = Standardisation; SW = Software)
Figure 3 – Small-scale simulation investigating a new ATM concept at the EEC
The people developing these new concepts come from a range of backgrounds, including a number of controllers from different national Air Navigation Service Providers (ANSPs) in many European countries and, relevant to this paper, a small team of safety assessors.
1.2.3 ATM & Safety Assurance
ATM safety assessment is mandated in European member states by ESARR 4 (Eurocontrol Safety Regulatory Requirement 4) on risk assessment (Eurocontrol 2003b). This states that no new system or significant change may cause the agreed target level of safety (TLS) for ATM in Europe, currently set at 1.55 × 10⁻⁸ accidents per flight hour, to be exceeded. Compliance with this TLS is demonstrated via the three-stage safety assessment approach shown in figure 4.
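In its simplest form, the TLS criterion is a comparison between an estimated accident rate and 1.55 × 10⁻⁸ per flight hour. The Python sketch below assumes a hypothetical additive risk model in which each hazard contributes its frequency multiplied by the conditional probability that it leads to an accident; the hazard names and numbers are invented for illustration and are not taken from any Eurocontrol assessment. Comparing a single change against the whole TLS is also a simplification, since a real assessment must account for the contribution of the rest of the ATM system.

```python
from dataclasses import dataclass

TLS_PER_FLIGHT_HOUR = 1.55e-8  # ESARR 4 target level of safety quoted in the text


@dataclass
class Hazard:
    name: str
    frequency_per_flight_hour: float  # how often the hazard occurs (hypothetical)
    p_accident_given_hazard: float    # chance it escalates to an accident (hypothetical)

    def accident_rate(self) -> float:
        return self.frequency_per_flight_hour * self.p_accident_given_hazard


def total_accident_rate(hazards: list[Hazard]) -> float:
    # Additive model (assumption): contributions are summed, ignoring overlaps.
    return sum(h.accident_rate() for h in hazards)


def meets_tls(hazards: list[Hazard], tls: float = TLS_PER_FLIGHT_HOUR) -> bool:
    return total_accident_rate(hazards) <= tls


# Hypothetical hazards for a new controller tool.
example = [
    Hazard("tool displays wrong advisory", 1e-5, 1e-4),
    Hazard("controller over-relies on tool", 1e-4, 5e-5),
]
print(f"Estimated rate: {total_accident_rate(example):.2e}, within TLS: {meets_tls(example)}")
```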
The first of these stages, Functional Hazard Assessment (FHA), considers the potential hazards of the new design. The second stage, Preliminary System Safety Assessment (PSSA), entails a detailed qualitative and quantitative demonstration that the proposed system is tolerably safe (i.e. within the TLS). In particular, this stage identifies the safety requirements for the proposed system architecture, specifying in effect what will keep the system safe and the required integrity of the system’s safety attributes and properties, including its human elements. The third stage, System Safety Assessment (SSA), ensures that the system as implemented achieves tolerable safety, i.e. still complies with the safety objectives, requirements and integrity levels, and is therefore safe to go live. This includes implementing, verifying and monitoring risk mitigation measures, and demonstrating that the level of risk has been reduced as low as reasonably practicable (ALARP), i.e. that every risk-reduction measure has been taken unless its cost would be grossly disproportionate to the reduction in risk it achieves. The ultimate deliverable is the safety case itself, in which the service provider assembles and documents the evidence justifying the adequacy of the safety provisions at its facilities.
Figure 4: Three-Step Safety Assessment Process (ESARR 4 Safety Assessment Methodology [SAM])
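The ALARP criterion described above amounts to a cost-benefit test with a built-in bias towards safety: a measure must be implemented unless its cost is grossly disproportionate to the risk reduction it buys. The sketch below expresses this with a disproportion factor. The factor, the monetary value assigned to an avoided accident, the assessment period and both example measures are hypothetical parameters, since the text does not specify how gross disproportion is quantified.

```python
from dataclasses import dataclass

# Hypothetical parameters: the text does not say how "grossly disproportionate"
# is quantified, so these values are illustrative only.
VALUE_PER_ACCIDENT_AVOIDED = 1e9  # monetary units per accident avoided
DISPROPORTION_FACTOR = 3.0        # cost may exceed benefit by up to this factor
ASSESSMENT_YEARS = 20             # period over which the risk reduction accrues


@dataclass
class Measure:
    name: str
    cost: float                        # implementation cost, same monetary units
    accidents_avoided_per_year: float  # expected risk reduction


def benefit(m: Measure) -> float:
    """Monetised risk reduction over the assessment period."""
    return m.accidents_avoided_per_year * ASSESSMENT_YEARS * VALUE_PER_ACCIDENT_AVOIDED


def required_under_alarp(m: Measure) -> bool:
    """Implement the measure unless its cost is grossly disproportionate to its benefit."""
    return m.cost <= DISPROPORTION_FACTOR * benefit(m)


measures = [
    Measure("measure A", cost=2e6, accidents_avoided_per_year=1e-3),
    Measure("measure B", cost=5e8, accidents_avoided_per_year=1e-3),
]
for m in measures:
    print(m.name, "required" if required_under_alarp(m) else "grossly disproportionate")
```

The test deliberately lets a measure's cost exceed its benefit (up to the disproportion factor) before the measure can be rejected, which is what distinguishes ALARP from a plain cost-benefit comparison.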
In practice, this explicit and pro-active safety assurance approach is fairly new to ATM and is still being refined. At the time of writing, all three safety assessment steps have been applied in full to only a few ATM systems within Eurocontrol, although a number of future system elements have passed the FHA and PSSA stages, and many more are entering the overall assessment process, whether the assessments are carried out by Eurocontrol or by member states. Additionally, because the approaches are relatively new to ATM, the actual means of compliance (the techniques applied to achieve the three steps) are varied and still, to an extent, maturing. This is because when ESARR 4 was developed there was no off-the-shelf assessment system and associated toolkit ready to be used for ATM, although considerable information already existed. Also, although ATM has borrowed approaches from industries where, for example, Probabilistic Safety Assessment (PSA) is more commonplace and mature (e.g. the nuclear and oil and gas industries), ATM has some notable differences which mean that such borrowing is insufficient. Five of these intrinsic differences are noteworthy, both for the designer and for the safety assessor:
- There is a high reliance on humans: both the controller, who makes critical decisions every minute (critical in that, if they are wrong, an incident may well occur), and the pilot of the aircraft being controlled.
- The situation is live and highly dynamic, and events can escalate rapidly, with a shift from normal operations to a serious event and risk of accident often within minutes or even seconds, rather than tens of minutes or hours as in many other industries.
- ATM is a truly open system, literally a global one, so that problems in one area can impinge on other areas and other transport sectors. For example, an event causing an airport to close, particularly a large airport, will cause significant perturbation to a very large segment of the airspace, increase the load on other airports, and increase demand in other transport domains (principally rail and road) that must attempt to take up the load of people travelling. The system elements are therefore highly interactive and coupled to other elements, both within the system and beyond its intended boundaries of operation and responsibility. Since aviation system designers do not (and cannot) attempt to construct the whole system simultaneously at a detailed level, reductionism (breaking the overall system down into components) is generally applied. However, this requires the subsystem interactions to be well understood and the possible operational and safety consequences of such interactions to be determined. Design and safety assessment therefore have to work in this increasingly complex system environment.
- There is no easy way to shut down the system quickly. There is no equivalent of an emergency shutdown function, since the aircraft have to land for the system to reach an intrinsically safe shutdown state.
- Many of the current developments within Eurocontrol cut across traditional divisions within aviation: they affect airworthiness, route spacing standards, ground systems, space infrastructure, procedure design, etc. In particular, this exposes projects to the full and varied set of target levels of safety.
Nevertheless, ATM has borrowed significantly from other industries, and its safety technique toolbox looks similar to theirs, except for its extensive use of real-time simulations when evaluating new concepts such as new airspace designs and controller tools.
1.2.4 Safety Assurance & Concept Design: Safety Opportunism, Design Enhancement
Concept exploration and associated research are typically the province of the EEC (although member states and other research centres do of course explore concepts independently of Eurocontrol). Once concepts are mature and have been taken on as part of the developing programme of work for implementation, they are subject to a safety assessment process as dictated by the appropriate safety regulatory authority (e.g. compliant with ESARR 4), as would be expected in any mature industry. However, before such maturity, while concepts are still unstable and being explored, there is still an opportunity to consider safety, even at this very early stage. Yet traditionally this has not been done in any formal way, either in ATM or in other industries with a more mature and explicit safety assessment process.
The papers by Kinnersley & Roelen and by Taylor in this special issue have shown that accidents often have their roots in the design process, and this appears to hold across a range of different industries. Furthermore, it is clear that the roots of accidents sometimes lie in the early design stages. But there are several problems with trying to address safety in the concept exploration or concept research phases:
- There is often little detail on the procedures or controller working practices proposed for the concept. This amounts to the lack of a mature operational concept, one detailed enough for safety hypotheses (e.g. ‘what would happen if…?’) to be asked, let alone answered with anything other than ‘well, it depends how we operate it’.
- The specific research concept being developed (e.g. a new tool for the controller or a new means of controller-pilot communication) will interface with other parts and dimensions of the system (e.g. other tools, airspace design and procedural constraints) that may not yet be known or developed. Research often explores new system elements or element replacements rather than entire system architectures, and work on integration into the full system concept comes later. This makes it difficult, to say the least, to consider hazards arising from interactions with other parts of the system, and to determine the impact of the new system element on the overall ATM target level of safety.
- Safety assessment of new concepts requires incorporating expert judgement where data are not available or not representative, which means identifying hazards that may never have occurred in the past. Moreover, experience has shown that it can be difficult to find experts with sufficient ATM expertise who are also able to relate to a risk model.
- People in aviation have a legendary ‘can-do’ attitude, which contributes to the success of the industry, but some may have difficulty admitting that something ‘can’t’ or ‘should not’ be done, or that the margin has been cut too fine. This can apply to controllers involved in the design process as domain experts.
- The people developing new concepts are trying to find better ways of optimising the system, and do not necessarily want to be constrained at an early stage by burdensome safety assessment procedures and processes. Indeed, too much safety stringency at an early stage could kill off some promising concepts whose safety issues might have been resolved later, allowing the benefits of the proposed system change to be realised.
The question is therefore whether there is value in addressing safety at the early concept research stage. Despite the problems above, the potential advantages are significant: