Automation is not the solution that eliminates the ill-defined "human error". Incident investigation provides the context needed to understand errors. A case study on TCAS is presented. Presented at the UPM summer courses.
Automation to overcome human error:
true or illusion? A Case Study: TCAS Alerts
Towards higher levels of automation in ATM. HALA! Summer School, Cursos de Verano, Universidad Politécnica de Madrid, La Granja, July 2011
Guest Lecturer: José L. Garcia-Chico, CRIDA
Automation of socio-technical systems
What's human error?
Taxonomies of human error
A case study: OEs with TCAS interjection
Accident models
Take home messages
References and readings
! Automation (= technology)
  ! Any mechanical or electronic replacement of human labor (either physical or mental)
  ! Sensing environmental variables
  ! Data processing and decision making (by computers)
  ! Action, either on the environment or by communicating information
! Aviation systems are socio-technical systems
  ! Computers/machines
  ! Software
  ! Humans
  ! Procedures & organizational processes
Use of automation is pervasive in safety-critical domains
Why automate? The most commonly given reasons:
• Improve system performance: fast and consistent (reliable); multichannel data sensing; fatigue free
• Reduce human error: fast and accurate (variability ↓↓); power of massive data processing; free of emotions and stress
• Reduce costs: fast; cheap computing available
• Enhance human abilities: offload skill-based tasks (easy and repetitive); compensate for human limitations (e.g., memory, speed, variability)
• The cool factor: available computing power; technological imperative (cutting edge)
(Hollnagel, 2004)
What's human error?
Some definitions of human error… all referring to expectations and context:
Error will be taken as a generic term to encompass all those occasions in which a planned sequence of mental or physical activities fails to achieve its intended outcome, and when these failures cannot be attributed to the intervention of some chance agency. (Reason, 1990)
An inappropriate action, or intention to act, given a goal and the context in which one is trying to reach that goal. (Ramon, 1995)
Actions by human operators can fail to achieve their goal in two different ways: the actions can go as planned, but the plan can be inadequate, or the plan can be satisfactory, but the performance can still be deficient. (Hollnagel, 1993)
What is important to know about human error?
"To err is human…" (Cicero, 1st century BC)
“…to understand the reasons why humans err is science” (Hollnagel, 1993)
! Erroneous acts are inevitable
  ! It is in our nature
  ! They can happen to anyone, at any time, in multiple contexts
  ! Some are preventable: design effort towards error-tolerant and error-recovery systems
! Human error is what happens
  ! It is the "what", but not the "why": the whole system needs to be understood
  ! Not all errors have disastrous consequences; errors and accidents are only remotely related
  ! It takes many factors acting jointly to lead a system to failure
! Human error cannot be the aim of an investigation
  ! It cannot be used as a blaming tool against the operator at the sharp end (Fitts' and Chapanis' early work on design error)
  ! Studying human error may increase understanding of the system: a source of lessons learnt
  ! Identification of errors is influenced by many factors (e.g., investigator biases, external pressures)
Taxonomies of human error
Cognitive science helps to classify errors
• Definition of taxonomic systems
• Interpretation of underlying psychological mechanisms
• Operator focused
Three major taxonomies
• Based on schema activation theory (Norman, 1981)
• Based on cognitive control (SRK theory; Rasmussen, 1986)
• Based on generic error modelling (Reason, 1990)
Norman (1981): classification of errors based on schema activation
! Schemas (Neisser, 1976)
  ! Schemas are sensory-motor knowledge structures stored in memory and used to guide behavior: efficient and low-energy
  ! A hierarchy of schemas is triggered when particular conditions are satisfied
! Mental models & information guide behavior
  ! Knowledge of the world directs behavior and the search for information
  ! Pieces of information are sampled in order to act
  ! Information serves to update the internal cognitive schemas
Norman's (1981) classification of errors is well suited to describing skilled behavior
Errors in the formation of intention
• Misinterpretation of the situation → wrong schema activation
• e.g., mode error
Errors from faulty activation of schemas
• Similar trigger conditions → wrong schema activation
• e.g., similar sequences of actions; external, data-driven wrong activation
Errors from faulty triggering of schemas
• Activation too early or too late
• e.g., timing in execution
Rasmussen (1986): behaviour accounted for in terms of experience, skill and familiarity
Characteristics of skill-, rule- and knowledge-based (SRK) behavior
Rasmussen (1986): errors as a function of experience, skill and familiarity
Skill-based errors
• Attentional failure to monitor progress
• Forgetfulness
• Misrecognition of events (perceptual)
Rule-based errors
• Misapplication of good rules (tendency to apply rules after pattern matching)
• Application of bad rules (use of inadequate shortcuts)
Knowledge-based errors
• Lack of knowledge and high memory load
• Incomplete or incorrect mental models of the problem
• Confirmation bias (people tend to seek information that confirms the chosen course of action and to avoid tests that would disconfirm the choice)
Reason (1990): generic error modelling
Slips (skill-based level): actions not in accord with your intention, i.e. a good plan but poor execution
Lapses (skill-based level): failure to carry out any action at all, tied to a failure of memory
Mistakes: the action goes as planned, but the plan itself is wrong. This is a planning failure: errors of judgment, inference, and the like
Error distribution according to Reason
! Humans are prone to slips & lapses with familiar tasks
  ! 61% of errors are skill-based
  ! Increased skill does not guarantee error-free performance, just different types of error
! Humans are prone to mistakes when tasks become difficult
  ! 28% of errors are rule-based
  ! 11% of errors are knowledge-based, requiring novel reasoning from first principles
Approximate figures obtained by averaging three studies (Reason, 1990)
Humans are error prone, but… is that all?
! Operators (at the sharp end) are not responsible for system disasters merely because they are the last and most visible link
! Distinction between:
  ! Active errors: errors associated with the performance of front-line operators, i.e. pilots, air traffic controllers, control-room crews, etc.
  ! Latent errors: related to activities removed in time and space from the direct control interface, i.e. designers, managers, maintenance, supervisors
(Reason, 1997)
Accident models
Treatment of human error depends on the accident model in use and on the analyst's biases
In searching for a cause, investigators tend to:
! focus on the sequence of events in the accident (accident model)
! explain why operators missed the actions that would have prevented the accident (hindsight)
! stop at the most unreliable and least understood part (e.g., the human)
! be guided by confirmation bias
(Hollnagel, 2004; Dekker, 2006)
The typical (biased) investigation flow: assume the system is basically safe and that the major contributor is the human → analyze where humans were involved → find one error (or performance variability) and assign it as the cause.
Three main types of accident models
Sequential model (simple linear), e.g. Domino Theory (Heinrich, 1931)
• Decomposable into parts (probability of parts: fault tree)
• Chain of events (domino effect)
• Humans are just another link in the chain
• Root cause
Epidemiological model (complex linear), e.g. the Swiss Cheese model (Reason, 1997)
• Decomposable into parts
• Latent failures (management and/or design)
• Pathogens activated by other factors/errors
• Degradation of barriers/defenses
Systemic model, e.g. FRAM (Hollnagel, 2000)
• Decomposition does not work for socio-technical systems → emergent properties
• Socio-technical systems are non-linear
• Accidents result from unexpected combinations (resonance) of normal performance variability
• Safety requires a constant ability to anticipate future events, avoiding resonance
(Hollnagel, 2004; Dekker, 2006)
Systemic view of socio-technical systems highlights humans as assets
Systems are too complex
• Not all situations can be predicted and specified
• Accidents are due to unexpected combinations of performance (coincidence rather than chain)
• Safety is built through a constant ability to anticipate future events (monitoring and damping)
Performance variability
• Humans add variability; it is the essence of the success of systems (adaptability to the unknown), but of their failure as well (expectations built into the design)
Learning from past accidents and incidents
! A great source of lessons to be learnt… not of facts to blame
! Careful considerations to keep in mind:
  ! Most people involved in accidents are neither stupid nor reckless; they do what makes sense to them at the time
  ! Be aware of possible influencing situational factors
  ! Be aware of concurrences of factors (variability of performance)
  ! Be aware of the hindsight bias of the retrospective analyst
Hindsight bias: possession of outcome knowledge profoundly influences the way we analyze and judge past events. It can impose on the observer a deterministic logic about the unfolding events that the individual at the time of the incident would not have had.
A case study: OEs with TCAS interjection
Mid-air collision at Überlingen
! Überlingen (2002): a B757 and a Tu-154 collided
  ! German airspace, under Zurich control; 71 people were killed
! Only one controller was in charge of two positions during a night shift
  ! Two separate displays
  ! Telephone and STCA under maintenance
! The ATC clearance opposed what TCAS indicated, and one pilot followed the clearance
  ! ATC detected the conflict between the two aircraft late and instructed the Tu-154 to descend
  ! The TCAS units on board the Tu-154 and the B757 instructed the pilots to climb and to descend, respectively
  ! The Tu-154 pilot opted to obey the controller and began a descent to FL 350, where it collided with the B757, which had followed its own TCAS advisory to descend
Motivation of the case study (operational errors co-occurring with TCAS RAs)
• Classify operational errors and contextual factors in ATC, looking for trends and consistency in the classification
• Assumption: classification of errors provides understanding of system performance and organizational context → better understanding of the possible limits of work variability
• Caution: the concept of cause, as raised by Dekker (2004), is not addressed
• Focus on one circumstance associated with OEs: the presence of a TCAS RA
• TCAS is an effective safety system, but…
  ! It might disrupt the controller's SA (Brooker, 2004; Wickens et al., 1998), amplified by the fact that the aircraft changes FL (the number in the datablock)
  ! It might create inconsistent pilot and controller responses (Rome et al., 2006; Wickens et al., 1998)
• Goal: understand the procedural and informational context of OEs co-occurring with TCAS RAs
TCAS: expected behavior
! For TCAS to work as designed, an immediate and correct crew response to TCAS advisories is essential
! Regulation of TCAS: operational procedures and practices (FAA AC 120-55B)
Pilots:
! Should follow the TCAS RA, unless doing so would jeopardize the safety of the operation
! During an RA, must not maneuver contrary to the RA based solely upon ATC instructions
! Must report any deviation from an ATC clearance as soon as practicable after responding to the RA, and resume the previous clearance after "clear of conflict"
Controllers:
! Will not knowingly issue instructions that are contrary to the RA guidance when they are aware that a TCAS maneuver is in progress (an illustrative encoding of this rule follows below)
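As an illustration only, and not material from the original presentation, the sketch below encodes the controller-side rule above as a simple check: given the vertical sense of an active RA and the sense of a proposed vertical clearance, it flags clearances that would knowingly contradict the RA. All names and types are hypothetical.

```python
from enum import Enum

class VerticalSense(Enum):
    CLIMB = 1
    DESCEND = -1

def clearance_conflicts_with_ra(ra_sense: VerticalSense,
                                clearance_sense: VerticalSense,
                                ra_known_to_controller: bool) -> bool:
    """Return True if a vertical clearance would knowingly contradict an active RA.

    Mirrors the guidance quoted above (FAA AC 120-55B): controllers should not
    knowingly issue instructions contrary to an RA in progress. Clearances in the
    same sense as the RA, or issued before the controller is aware of the RA,
    are not flagged by this simplified check.
    """
    if not ra_known_to_controller:
        return False  # the rule applies only once the controller knows an RA is in progress
    return clearance_sense != ra_sense

# Example: the RA commands a climb, and the controller (aware of the RA) issues a descend clearance
print(clearance_conflicts_with_ra(VerticalSense.CLIMB, VerticalSense.DESCEND, True))  # True
```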
TCAS events timeline ("desired")… a hint that humans have to adapt to technology
Adapted from: Brooker, P. (2004). Thinking about downlink of resolution advisories from airborne collision avoidance systems. Human Factors and Aerospace Safety, 4 (1), 49-65.
[Timeline figure, in outline] RA → pilot follows the RA and deviates from clearance (ATC SA is impaired and there are chances of receiving an ATC clearance in opposition to the RA) → pilot notifies the deviation → ATC becomes aware of the deviation (the controller provides traffic information if workload permits and is not responsible for providing separation) → clear of conflict → pilot notifies the return to clearance.
Methods
! Exploratory study: mapping relationships in the data
  ! Tentative results, pending larger studies
! Analysis of errors based on preliminary and final air traffic controller reports (FAA Operational Error Detection Program)
! Comprising two studies/datasets (a toy sketch of how such reports can be tallied follows below):
  ! Taxonomic study: classification of initial OE incident reports (Jan-Jun 2004: 480 OE reports)
    ! Classification of OEs based on the FAA investigation
    ! Relevance of coordination, training, proximity, time on position
  ! Case study: OEs with a TCAS RA present, based on final reports (Jan-Jun 2004 & 2005: 62 reports)
    ! Use of the same classification
    ! Characterization of the TCAS RA events and of the human response
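Purely illustrative, and not the study's actual coding scheme or data: a minimal data structure for tallying OE reports by facility type and TCAS RA presence. All field names and values are hypothetical.

```python
from dataclasses import dataclass
from collections import Counter

@dataclass
class OEReport:
    """One operational error report (hypothetical fields, not the FAA schema)."""
    facility: str            # e.g. "Terminal Radar" or "ARTCC"
    tcas_ra_present: bool    # was a TCAS RA reported during the event?
    time_on_position_min: int

# Toy dataset, NOT the study's data
reports = [
    OEReport("ARTCC", True, 42),
    OEReport("Terminal Radar", False, 15),
    OEReport("ARTCC", False, 75),
]

by_facility = Counter(r.facility for r in reports)   # reports per facility type
with_ra = sum(r.tcas_ra_present for r in reports)    # reports co-occurring with an RA
print(by_facility)
print(f"{with_ra} of {len(reports)} reports co-occur with a TCAS RA")
```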
Results, Study 1: Taxonomic study

                  OE reports        Error classifications
Terminal Radar    162 (33.96%)      250 (30.9%)
ARTCC             318 (66.04%)      560 (69.1%)
TOTAL             480               810

(A single report may yield more than one classified error, hence 810 classifications from 480 reports.)
No effect of operator experience on proximity
• No statistical support for the claim that proximity is greater when a developmental controller (i.e., a trainee) is involved
Error severity and frequency by time on shift
• No statistically significant differences were found in the distribution of frequencies across 60-minute time-on-position bins (chi-square: χ²(11, N = 388) = 6.575, p = 0.832), so it cannot be claimed that errors are more likely after a change of shift
• No evidence that errors were more severe in the first minutes after taking over control (χ²(10, N = 373) = 7.27, p = 0.700)
(An illustrative chi-square computation follows below.)
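A minimal sketch, not the study's data or code, of how such a frequency test can be run: hypothetical OE counts per 60-minute time-on-shift bin are tested against a uniform expectation with scipy's chi-square goodness-of-fit test.

```python
import numpy as np
from scipy.stats import chisquare

# Hypothetical OE counts per 60-minute time-on-position bin (NOT the study's data)
observed = np.array([38, 35, 31, 36, 29, 33, 34, 30, 32, 37, 28, 25])

# Goodness-of-fit against a uniform distribution over the bins:
# H0 = errors are equally likely in every time-on-shift bin.
expected = np.full(len(observed), observed.sum() / len(observed))
chi2, p = chisquare(observed, f_exp=expected)

print(f"chi2({len(observed) - 1}, N={observed.sum()}) = {chi2:.3f}, p = {p:.3f}")
# A large p-value (as in the reported chi2(11, N=388) = 6.575, p = 0.832) gives no
# ground to reject H0, i.e. no evidence that errors cluster in particular bins.
```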
OEs co-occurring with TCAS RAs

                  Jan-Jun 2004      Jan-Jun 2005
Terminal Radar    8 (30.8%)         34 (43.6%)
ARTCC             18 (69.2%)        44 (56.4%)
TOTAL             26                78
Controller communication in TCAS situations varies from what is "expected"
Controllers' commands in TCAS situations were given in the vertical plane
ATC vertical commands after the RA and the flight-deck report
Deviations from "expected" controller behavior
(The corresponding charts are not reproduced here.)
Legend: "Incomplete" = the pilot's message is missing, or it lacks the callsign or the TCAS direction, or it arrives with excessive delay. "Before" and "after" refer to the controller's action relative to the TCAS RA event. "Traffic", "heading" and "altitude" mean that the ATCO gave traffic information, a heading change, or an altitude change.
Highlights of the chain of events during TCAS RA encounters in the OE reports
! Controllers issued clearances in the vertical plane after a TCAS RA in 13 situations (21%)
! Controllers received incomplete information in 26 situations (43.5%) and no information in 3 (5%): opportunities for wrong decisions
! Controllers issued vertical clearances after a TCAS RA and an incomplete pilot report in 12 situations (19.4%)
  ! An altitude clearance opposite to the RA was issued in 3 reports (4.8%)
  ! Pilot reports all came late, after the TCAS RA and the controller's clearance
! The data suggest that an opposite clearance is more likely when the controller receives incomplete pilot information
Conclusions of the study
! Value of a systematic characterization of errors
  ! OE classification would allow prioritization of actions
  ! It helps in understanding system behavior (together with the humans in it)
! Contextual factors:
  ! Potentially a staffing/organizational issue (planner controller)
! Error reports concurrent with a TCAS RA:
  ! OEs show patterns similar to the full dataset
  ! Pilot-controller behavior is not consistent (deficient information/actions) → variability of performance
  ! Incomplete or late information increases the chances of vertical clearances incompatible with the RA direction
Potential actions being considered
! Increase training that recreates TCAS RA situations
  ! Under stress, abnormal events trigger the more familiar responses (i.e., issuing a vertical clearance). The traditional solution.
! Revisit downlinking of RAs (more automation)
  ! Not an obvious solution; it has important implications:
    ! It may draw too much controller attention
    ! The TCAS RA is not the most relevant information; the pilot's deviation from clearance is
    ! Controller responsibility and liability implications
! Aircraft following RAs without pilot intervention (more automation)
  ! Not an obvious solution either: potential for mode errors
Take home messages
Automation is not bad per se, but potential issues with automation may lead to…
! A changed role for the operators in the system: new opportunities for error
! Sources of error that are distributed or changed, but not eliminated
  ! Increased operator demands (faster and more complex system, workload, memory)
  ! Reliance on capabilities humans are not good at (e.g., monitoring without being in the loop)
  ! De-skilling, because the opportunities to practice decrease
! Sometimes ill-adapted designs
  ! They require new skills and more knowledge from the operator
  ! They remove the operator, so unexpected situations (e.g., adapting procedures) become difficult or impossible to cope with
  ! The operator is skillful at judging when and how to adapt performance and procedures
Readings and references (1)
! Besnard, D., Greathead, D., & Baxter, G. (2004). When mental models go wrong: Co-occurrences in dynamic, critical systems. International Journal of Human-Computer Studies, 60, 117-128.
! BFU (2004). Überlingen mid-air collision. Investigation Report AX001-1.2/02. Braunschweig, Germany: German Federal Bureau of Aircraft Accidents Investigation.
! Brooker, P. (2004). Thinking about downlink of resolution advisories from airborne collision avoidance systems. Human Factors and Aerospace Safety, 4 (1), 49-65.
! Brooker, P. (2005). STCA, TCAS, airproxes and collision risk. The Journal of Navigation, 58, 389-404.
! Dekker, S. W. A. (2002). Reconstructing human contributions to accidents: The new view on error and performance. Journal of Safety Research, 33, 371-385.
! Dekker, S. (2006). The field guide to understanding human error. Brookfield, VT: Ashgate.
! Endsley, M., & Rodgers, M. (1997). Distribution of attention, situation awareness, and workload in a passive air traffic control task: Implications for operational errors and automation. DOT/FAA/AM-97/13.
! Eurocontrol (2003). Review of ACAS RA downlink: An assessment of the technical feasibility and operational usefulness of providing ACAS RA awareness on CWP. Brussels, Belgium.
! FAA (2000). Introduction to TCAS II, version 7. Washington, DC: US Department of Transportation.
! Garcia-Chico, J. L. (2006). A human factors analysis of operational errors in ATC: The TCAS case study. Master's thesis, San Jose State University, CA.
Readings and references (2)
! Hollnagel, E. (1993). The phenotype of erroneous actions. International Journal of Man-Machine Studies, 39, 1-32.
! Hollnagel, E. (2004). Barriers and accident prevention. Brookfield, VT: Ashgate.
! Neisser, U. (1976). Cognition and reality: Principles and implications of cognitive psychology. San Francisco: Freeman.
! Norman, D. A. (1981). Categorization of action slips. Psychological Review, 88 (1), 1-15.
! Nunes, A., & Laursen, T. (2004). Identifying the factors that led to the Ueberlingen mid-air collision: Implications for overall system safety. Proceedings of the 48th Annual Meeting of the Human Factors and Ergonomics Society (pp. 20-24). New Orleans, LA.
! Parasuraman, R., & Riley, V. (1997). Humans and automation: Use, misuse, disuse, abuse. Human Factors, 39 (2), 230-253.
! Parasuraman, R., Sheridan, T. B., & Wickens, C. D. (2000). A model for types and levels of human interaction with automation. IEEE Transactions on Systems, Man, and Cybernetics, Part A, 30 (3), 286-297.
! Pounds, J., & Ferrante, A. S. (2005). FAA strategies for reducing operational error causal factors. In B. Kirwan, M. Rodgers, & D. Schafer (Eds.), Human factors impact in air traffic management (pp. 89-105). Aldershot, UK: Ashgate.
! Pritchett, A. R. (2001). Reviewing the role of cockpit alerting systems. Human Factors and Aerospace Safety, 1 (1), 5-38.
! Rasmussen, J. (1982). Human errors: A taxonomy for describing human malfunction in industrial installations. Journal of Occupational Accidents, 4, 311-333.
! Rasmussen, J. (1986). Information processing and human-machine interaction. Amsterdam: North-Holland.
Readings and references (3)
! Reason, J. T. (1990). Human error. Cambridge, UK: Cambridge University Press.
! Reason, J. T. (1997). Managing the risks of organizational accidents. Aldershot, England: Ashgate Publishing Company.
! Shorrock, S. T., & Kirwan, B. (1998). The development of TRACEr: A technique for the retrospective analysis of cognitive errors in ATM. 2nd Conference on Engineering Psychology and Cognitive Ergonomics (pp. 28-30), Oxford.
! Wickens, C. D., & Hollands, J. G. (2000). Engineering psychology and human performance (3rd ed.). Upper Saddle River, NJ: Prentice Hall.
! Wickens, C. D., Mavor, A., Parasuraman, R., & McGee, J. (1998). The future of air traffic control: Human operators and automation. Washington, DC: National Academy Press.
! Wiegmann, D. A., & Shappell, S. A. (2003). A human error approach to aviation accident analysis: The human factors analysis and classification system. Aldershot, UK: Ashgate.
! Woods, D. D., & Cook, R. I. (2002). Nine steps to move forward from error. Cognition, Technology, and Work, 4, 137-144.
Centro de Referencia I+D+i ATM