
HUMAN FACTORS, 1995, 37(1), 65-84

Measurement of Situation Awareness in Dynamic Systems

MICA R. ENDSLEY,1 Texas Tech University, Lubbock, Texas

Methodologies for the empirical measurement of situation awareness are reviewed, including a discussion of the advantages and disadvantages of each method and the potential limitations of the measures from a theoretical and practical viewpoint. Two studies are presented that investigate questions of validity and intrusiveness regarding a query-based technique. This technique requires that a simulation of the operational tasks be momentarily interrupted in order to query operators on their situation awareness. The results of the two studies indicate that the query technique is not intrusive on normal subject behavior during the trial and does not suffer from limitations of human memory, which provides an indication of empirical validity. The results of other validity studies regarding the technique are discussed, along with recommendations for its use in measuring situation awareness in varied settings.

INTRODUCTION

In the preceding paper in this issue (Endsley, 1995), I presented a definition and theory of situation awareness (SA), outlining its role in dynamic decision making and establishing a model that depicts the relevant factors and underlying mechanisms affecting the process of achieving SA. In this paper I explore the measurement of operator SA within particular system contexts.

A measure of SA is valuable in the engineering design cycle because it assures that prospective designs provide operators with this critical commodity. For this reason I review various methods that have been proposed or used for measuring SA and address the veracity of each technique in comparison with standard criteria. By presenting experiments evaluating the Situation Awareness Global Assessment Technique (SAGAT), I will establish its validity for measuring SA. A brief example of the use of the technique will be presented, which demonstrates the type of data the technique generates and its contribution in a display design evaluation effort. In addition, considerations for the use of the technique and its applicability within various system domains will be discussed.

SA is an understanding of the state of the environment (including relevant parameters of the system). It provides the primary basis for subsequent decision making and performance in the operation of complex, dynamic systems. SA is formally defined as a person's "perception of the elements of the environment within a volume of time and space, the comprehension of their meaning and the projection of their status in the near future" (Endsley, 1988, p. 97). At the lowest level of SA, a person needs to perceive relevant information (Level 1 SA). Integrating various pieces of data in conjunction with operator goals provides an understanding of the meaning of that information, forming Level 2 SA. Based on this understanding, future events and system states can be predicted (Level 3), allowing for timely and effective decision making. Several processing mechanisms have been hypothesized to be related to SA, including attention and working memory limitations, attention distribution, current goals, mental models, schemata, and automaticity (Endsley, 1995, this issue). In addition to characteristics of individuals, the design of a system and the operator interface can affect SA.

1 Requests for reprints should be sent to Mica R. Endsley, Department of Industrial Engineering, Texas Tech University, Lubbock, TX 79409.

© 1995, Human Factors and Ergonomics Society. All rights reserved.

The enhancement of SA has become a major design goal for developers of operator interfaces, automation concepts, and training programs in a variety of fields. To evaluate the degree to which new technologies or design concepts actually improve (or degrade) operator SA, it is necessary to systematically evaluate them based on a measure of SA, which can determine those ideas that have merit and those that have unforeseen negative consequences. Only in this way can a design concept's utility be established in accordance with the design goal.

In addition to evaluating design concepts, a measure of SA is also useful for evaluating the effect of training techniques on SA; conducting studies to empirically examine factors that may affect SA, such as individual abilities and skills, or the effectiveness of different processes and strategies for acquiring SA; and investigating the nature of the SA construct itself. (This type of quantitative measurement, that is, evaluating an operator's level of SA in light of concepts under experimental evaluation, differs significantly from research efforts that seek to directly assess the processes operators use in acquiring SA, as discussed by Adams, Tenney, and Pew [1995, this issue].)

Evaluation of potential design concepts routinely takes place within the context of rapid prototyping or part-task simulation. Several different methods for measuring SA can be considered during this type of testing. In addition, many of these techniques could be used to evaluate the SA of operators working with actual systems. For the most part, efforts at measuring SA thus far have been concentrated in the aircraft environment; however, most techniques are also applicable to other domains.

To establish the validity and reliability of an SA measurement technique, it is necessary to establish that the metric (a) measures the construct it claims to measure and is not a reflection of other processes, (b) provides the required insight in the form of sensitivity and diagnosticity, and (c) does not substantially alter the construct in the process, which would provide biased data and altered behavior. In addition, it would be useful to establish the existence of a relationship between the measure and other constructs, as would be predicted by theory. In this case the measure of SA should be predictive of performance and sensitive to manipulations in workload and attention. Against these criteria, several potential measurement techniques will be explored, along with an assessment of the advantages and disadvantages of each.

Physiological Techniques

P300 and other electroencephalographic measurements have shown promise for determining whether information is registered cognitively. Although these techniques allow researchers to determine whether elements in the environment are perceived and processed by subjects, they cannot determine how much information remains in memory, whether the information is registered correctly, or what comprehension the subject has of those elements. For the same reasons, eye-tracking devices appear to fall short in their ability to measure SA as a state of knowledge. Furthermore, these devices do not tell which elements in the periphery of the subject's vision are observed or if the subject has even processed what was seen. (The devices may be useful for exploring the processes operators use in achieving SA, however, as was recently investigated by Smolensky [1993].) Known physiological techniques, though providing useful data for other purposes, are not very promising for the measurement of SA as a state of knowledge.

Performance Measures

In general, performance measures have the advantage of being objective and are usually nonintrusive. Computers for conducting system simulations can be programmed to record specified performance data automatically, making the required data relatively easy to collect. However, several limitations exist in using performance data to infer SA.

Global measures. Global measures of performance suffer from diagnosticity and sensitivity problems. If overall operator/system performance is the only criterion used for evaluating design concepts, important system differences can be masked, despite its being a useful bottom-line measure. Performance measures give only the end result of a long string of cognitive processes, providing little information about why poor performance may have occurred in a given situation (assuming it can be detected reliably during testing at all). Poor performance could occur from a lack of information, poor sampling strategies, improper integration or projection, heavy workload, poor decision making, or action errors, among other factors, many of which are not SA related. (For a discussion of the distinction among SA, decision making, and performance, see the previous Endsley paper in this issue.)

As stated earlier, overall system performance is often masked by other factors. In a tactical aircraft environment, for instance, much of pilot performance is by nature highly variable and subject to the influence of many other factors besides SA. A new system may provide the pilot with better SA, but in evaluation testing this fact can be easily masked by excessive workloads, the intentional use of varied tactics, or poor decision making if overall mission performance is used as the only dependent measure. It would be desirable, therefore, to measure SA more directly.

External task measures. One type of performance measure that has been suggested involves artificially changing certain information or removing certain pieces of information from operator displays and then measuring the amount of time required for the operator to react to this event (Sarter and Woods, 1991). Aside from the fact that such a manipulation is heavily intrusive, requiring the subject to undertake tasks involved with discovering what happened to the changed or missing data while attempting to maintain satisfactory performance on other tasks, this technique may provide highly misleading results. It assumes that an operator will act in a certain way, when in fact operators often employ compensating schemes to function under such circumstances. For instance, if a displayed aircraft suddenly disappears, the operator may assume that equipment has malfunctioned, the aircraft was destroyed or landed, or the aircraft's emitting equipment was turned off, rendering it more difficult to detect (a not infrequent occurrence). In any case, the operator may choose to ignore the disappearance, assuming it will come back on the next several sweeps of the radar; worry about it but not say anything; or put off dealing with the disappearance until other tasks are complete. Any of these actions will yield highly misleading results for the experimenter who expects SA to be reflected by the operator's overt behavior.

This measurement technique not only has invalid assumptions, but it alters the subject's ongoing tasks, possibly affecting attention and thus SA itself. Anytime one artificially alters the realism of the simulation, it can fundamentally change the way the operator conceptualizes the underlying information (see Manktelow and Jones, 1987), thus altering both SA and decision making. In addition, such a manipulation would certainly interfere with any concurrent workload or performance measurement undertaken during testing.

Imbedded task measures. Some information about SA can be determined from examining performance on specific operator subtasks that are of interest. For example, when evaluating an altitude display, deviations from prescribed altitude levels or time to reach a certain altitude can be measured. This type of detailed performance measure can provide some inferences regarding the amount of SA that a display reveals about a specific parameter. Such measures will be more meaningful than global performance measures and will not suffer from the same problems of intrusiveness as external task measures. Although selecting finite task measures for evaluating certain kinds of systems may be easy, determining appropriate measures for other systems may be more difficult. An expert system, for instance, may influence many factors in a global, not readily predicted manner.

The major limitation of the imbedded task measure approach stems from the interactive nature of SA subcomponents. A system that provides SA on one element may simultaneously reduce SA on another, unmeasured element. In addition, it is easy for subjects to have biased attention on the issue under evaluation in a particular study (e.g., altitude) if they figure out the purpose of the study. Therefore, because improved SA on some elements may result in decreased SA on others, relying solely on performance measures of specific parameters can yield misleading results.

Researchers need to know how much SA operators have when taxed with multiple, competing demands on their attention during system operations. To this end, a global measure of SA that simultaneously depicts SA across many elements of interest is desirable. To improve SA, designers need to be able to evaluate the entire impact of design concepts on operator SA.

Subjective Techniques

Self-rating. One simple technique is to ask operators to subjectively rate their own SA (e.g., on a 1-to-10 scale). Researchers in the Advanced Medium-Range Air-to-Air Missile Operation Utility Evaluation Test (AMRAAM OUE) study (McDonnell Douglas Aircraft Corporation, 1982) used this method. Pilot (and overall flight) SA was subjectively rated by the participants and a trained observer. The main advantages of subjective estimation techniques are low cost and ease of use. In general, however, the subjective self-rating of SA has several limitations.

If the ratings are collected during a simulation trial, the operators' ability to estimate their own SA will be limited because they do not know what is really happening in the environment (they have only their perceptions of that reality). Operators may know when they do not have a clue as to what is going on but will probably not know if their knowledge is incomplete or inaccurate.

If operators are asked to subjectively evaluate SA in a posttrial debriefing session, the rating may also be highly tainted by the outcome of the trial. When performance is favorable, whether through good SA or good luck, an operator will most likely report good SA, and vice versa. In a reevaluation of the AMRAAM OUE study, Venturino, Hamilton, and Dvorchak (1990) found that posttrial subjective SA ratings were highly correlated with performance, supporting this premise. In addition, when ratings are gathered after the mission, operators will probably be inclined to rationalize and overgeneralize about their SA, as has been shown when information about mental processes is collected after the fact (Nisbett and Wilson, 1977). Thus detailed information will be lost or misconstrued.

What do such subjective estimates actually measure? I would speculate that self-ratings of SA most likely convey a measure of subjects' confidence levels regarding that SA, that is, how comfortable they feel about their SA. Subjects who know they have incomplete knowledge or understanding would subjectively rate their SA as low. Subjects whose knowledge may not be any greater but who do not subjectively have the same concerns about how much is not known would rate their SA higher. In other words, ignorance may be bliss.

Several efforts have been made to develop more rigorous subjective measures of SA. Taylor (1990) developed the Situation Awareness Rating Technique (SART), which allows operators to rate a system design based on the amount of demand on attentional resources, supply of attentional resources, and understanding of the situation provided. As such, it considers operators' perceived workload (supply and demand on attentional resources) in addition to their perceived understanding of the situation. Although SART has been shown to be correlated with performance measures (Selcon and Taylor, 1990), it is unclear whether this is attributable to the workload or the understanding components.

In a new application, the Subjective Workload Dominance (SWORD) metric (Vidulich, 1989) has been applied as a subjective rating tool for SA (Hughes, Hassoun, and Ward, 1990). SWORD allows subjects to make pairwise comparative ratings of competing design concepts along a continuum that expresses the degree to which one concept entails less workload than the other. The resultant preferences are then combined using the analytic hierarchy process technique to provide a linear ordering of the design concepts. In the Hughes et al. study, SWORD was modified to allow pilots to rate the degree to which presented display concepts provided SA instead of workload. Not surprisingly, the display that was subjectively rated as providing the best SA using SWORD was the display for which pilots expressed a strong preference. It is difficult to ascertain whether subjective preference led to higher SWORD ratings of SA or vice versa. Further research is needed to determine the locus of such ratings.
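
As a rough illustration of how pairwise comparative ratings can be combined into a linear ordering, the sketch below derives priority weights from a reciprocal comparison matrix using row geometric means, a common approximation to the principal-eigenvector weights of the analytic hierarchy process. The function name, the display labels, and the judgment values are invented for illustration; they are not data or procedures taken from Vidulich (1989) or Hughes et al. (1990).

```python
import math

def ahp_weights(matrix):
    """Approximate AHP priority weights via row geometric means.

    matrix[i][j] > 1 means concept i was judged to dominate concept j.
    The matrix is assumed to be reciprocal: matrix[j][i] == 1 / matrix[i][j].
    """
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]

# Hypothetical judgments for three display concepts (A, B, C).
names = ["A", "B", "C"]
judgments = [
    [1.0, 2.0, 4.0],   # A compared with A, B, C
    [0.5, 1.0, 2.0],   # B compared with A, B, C
    [0.25, 0.5, 1.0],  # C compared with A, B, C
]

weights = ahp_weights(judgments)
# Sort concepts from most to least preferred to obtain the linear ordering.
ranking = sorted(zip(names, weights), key=lambda pair: pair[1], reverse=True)
```

The weights sum to 1, so they can be read as each concept's relative share of the rated quality. A full analytic hierarchy process analysis would also check the consistency of the judgments, which this sketch omits.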

Observer-rating. A second type of subjective rating involves using independent, knowledgeable observers to rate the quality of a subject's SA. A trained observer might have more information than the operator about what is really happening in a simulation (through perfect knowledge gleaned from the simulation computer). The observer would have limited knowledge, however, about what the operator's concept of the situation is. The only information about the operator's perception of the situation would come from operator actions and imbedded or elicited verbalizations by the operator (e.g., from voice transmissions during the course of the task or from verbal protocols explicitly requested by the experimenter). Although this knowledge can be useful diagnostically by, for example, detecting overt errors in SA (stated misperceptions or lack of knowledge), it does not provide a complete representation of that knowledge. The operator may, and in all likelihood will, store information internally that is not verbalized. For instance, a pilot may discuss efforts to ascertain the identity of a certain aircraft, but knowledge about ownship system status, heading, other aircraft, and so on might not be mentioned at all. An outside observer has no way of knowing whether the pilot is aware of these variables but is not discussing them because they are not of immediate concern, or whether the pilot has succumbed to attentional narrowing and is unaware of their status. As such, the rating of SA by outside observers is also limited.

A variation on this theme is to use a confederate who acts as an associate to the operator (e.g., another crew member or air traffic controller in the case of a commercial aircraft) and who requests certain information from the operator to encourage further verbalization, as has been suggested by Sarter and Woods (1991). In addition to this technique succumbing to the same limitations encumbering observer ratings, it may also serve to alter SA by artificially directing the subject's attention to certain parameters. Because the distribution of the subject's attention across the elements in the environment largely determines SA, this method probably does not provide an unbiased assessment of operator SA.

Questionnaires

Questionnaires allow for detailed information about subject SA to be collected on an element-by-element basis that can be evaluated against reality, thus providing an objective assessment of operator SA. This type of assessment is a more direct measure of SA (i.e., it taps into, rather than infers, the operator's perceptions) and does not require subjects or observers to make judgments about situational knowledge on the basis of incomplete information, as subjective assessments do. In a review of such questionnaires, Herrmann (1984) concluded that when perceptions can be evaluated on the basis of objective knowledge, this method has been found to have good validity. Several methods of administration are possible.

Posttest. A detailed questionnaire can be administered after the completion of each simulated trial. This allows ample time for subjects to respond to a lengthy and detailed list of questions about their SA during the trial, providing needed information about subject perceptions. Unfortunately, people are not good at reporting detailed information about past mental events, even recent ones; there is a tendency to overgeneralize and overrationalize. Recall is stilted by the amount of time and intervening events that occur between the activities of interest and the administration of the questionnaire (Nisbett and Wilson, 1977). Earlier misperceptions can be quickly forgotten as the real picture unfolds during the course of events. Therefore, a posttest questionnaire will reliably capture the subject's SA only at the very end of the trial.

On-line. One way of overcoming this deficiency is to ask operators about their SA while they are carrying out their simulated tasks. Unfortunately, this too has several drawbacks. First of all, in many situations the subject will be under very heavy workload, precluding the answering of additional questions. Such questions would constitute a form of ongoing secondary task loading that may alter performance on the main task of operating the system. Furthermore, the questions asked could cue the subject to attend to the requested information on the displays, thus altering the operator's true SA. An assessment of time to answer as an indicator of SA is also faulty, as subjects may employ various time-sharing strategies between the dual tasks of operating the system and answering the questions. Overall, this method will be highly intrusive on the primary task of system operation.

Freeze technique. To overcome the limitations of reporting on SA after the fact, several researchers have used a technique in which the simulation is frozen at randomly selected times and subjects are queried as to their perceptions of the situation at the time (Endsley, 1987, 1988, 1989a; Fracker, 1990; Marshak, Kuperman, Ramsey, and Wilson, 1987). With this technique the system displays are blanked and the simulation is suspended while subjects quickly answer questions about their current perceptions of the situation. Thus SA data can be collected immediately, which reduces the problems incurred when collecting data after the fact but does not incur the problems of on-line questioning. Subjects' perceptions are then compared with the real situation according to the simulation computer database to provide an objective measure of SA.
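
The comparison of subject answers with the simulation database can be pictured with a minimal sketch such as the one below. It assumes that each query has a recorded true value at the moment of the freeze and that numeric answers are accepted within a tolerance band. The function name, the query names, the tolerance values, and the data are all hypothetical; the text above does not specify the scoring rules actually used.

```python
def score_responses(responses, ground_truth, tolerances):
    """Mark each answered query correct (True) or incorrect (False).

    A numeric answer counts as correct when it lies within that query's
    tolerance of the simulation value; all other answers must match exactly.
    """
    scores = {}
    for query, answer in responses.items():
        truth = ground_truth[query]
        if query in tolerances:
            scores[query] = abs(answer - truth) <= tolerances[query]
        else:
            scores[query] = answer == truth
    return scores

# Hypothetical simulation state and subject answers at one freeze.
ground_truth = {"altitude_ft": 32000, "airspeed_kt": 450, "target_type": "fighter"}
responses = {"altitude_ft": 30000, "airspeed_kt": 380, "target_type": "fighter"}
tolerances = {"altitude_ft": 5000, "airspeed_kt": 50}  # assumed acceptance bands

scores = score_responses(responses, ground_truth, tolerances)
```

Scoring each query separately keeps the element-by-element detail that distinguishes this kind of measure from a single overall rating.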

In a study evaluating competing aircraft display concepts, Marshak et al. (1987) queried subjects about navigation, threats, and topography using this technique. The resulting answers were converted to an absolute percentage error score for each question, allowing scores across displays to be compared along these dimensions. Fracker (1990), in another aircraft study, requested information on aircraft location and identity for specifically indicated aircraft in the simulation. Mogford and Tansley (1991) collected data relevant to an air traffic control task. In each study, a measure of SA on only selected parameters was obtained.
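
An absolute percentage error score of the kind mentioned above can be sketched as follows. The formula used here is the conventional definition (the absolute difference between reported and actual values, as a percentage of the actual value); the exact computation used by Marshak et al. (1987) is not given in the text, and the display labels and values are invented.

```python
def absolute_percentage_error(reported, actual):
    """Return |reported - actual| as a percentage of |actual|."""
    if actual == 0:
        raise ValueError("percentage error is undefined when the actual value is 0")
    return 100.0 * abs(reported - actual) / abs(actual)

def mean_error(pairs):
    """Average the absolute percentage error over (reported, actual) pairs."""
    errors = [absolute_percentage_error(r, a) for r, a in pairs]
    return sum(errors) / len(errors)

# Hypothetical (reported, actual) answers to one question under two displays.
answers_by_display = {
    "display_1": [(18, 20), (45, 50)],
    "display_2": [(15, 20), (40, 50)],
}
mean_error_by_display = {
    name: mean_error(pairs) for name, pairs in answers_by_display.items()
}
```

Because the score is expressed relative to the true value, questions measured in different units can be placed on a common scale before displays are compared.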

The Situation Awareness Global Assessment Technique is a global tool developed to assess SA across all of its elements based on a comprehensive assessment of operator SA requirements (Endsley, 1987, 1988, 1990a). As a global measure, SAGAT includes queries about all SA requirements, including Level 1, 2, and 3 components, and considers system functioning and status and relevant features of the external environment.

Computerized versions of SAGAT have been developed for air-to-air tactical aircraft (Endsley, 1990c) and advanced bomber aircraft (Endsley, 1989a). These versions allow queries to be administered rapidly and SA data to be collected for analysis. The SAGAT tool features an easy-to-use query format that was designed to be as compatible as possible with subject knowledge representations. An example of a SAGAT query is shown in Figure 1. SAGAT's basic methodology is generic and applicable to other types of systems once a delineation of SA requirements has been made in order to construct the queries.

Figure 1. Sample SAGAT query. [Query screen titled "DESIGNATE AIRCRAFT'S ALTITUDE," with a vertical scale labeled ALTITUDE (FEET) marked from 0 K to 70 K in 10 K steps.]

This global approach has an advantage over probes that cover only a limited number of SA items in that subjects cannot prepare for the queries in advance because queries could address nearly every aspect of the situation to which subjects would normally attend. Therefore, the chance of subjects' attention being biased toward specific items is minimized. In addition, the global approach overcomes the problems incurred when collecting data after the fact and minimizes subject SA bias resulting from secondary task loading or artificially cuing the subject's attention. It also provides a direct measure of SA that can be objectively collected and objectively evaluated. Finally, the use of random sampling (both of stop times and query presentation) provides unbiased estimates of SA, thus allowing SA scores to be easily compared statistically across trials, subjects, and systems. The primary disadvantage of this technique involves the temporary halt in the simulation.

I conducted two studies to investigate whether or not this imposes undue intrusiveness and to determine the best method for implementing this technique. In the first study, I wanted to specifically determine how long after a freeze in the simulation SA information could be obtained. In the second study, I investigated whether temporarily freezing the simulation would result in any change in subject behavior, thus specifically evaluating potential intrusiveness.
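
The random sampling of stop times and query presentation that the technique relies on might be set up along the lines of the sketch below. The trial length, the earliest permitted stop time, the query names, and the function name are all assumptions made for illustration; none of them is specified in the text.

```python
import random

def plan_stop(trial_length_s, queries, rng, earliest_s=60.0):
    """Pick one random freeze time and a random query order for a trial.

    The freeze time is drawn uniformly between earliest_s and trial_length_s,
    and the queries are shuffled so the subject cannot anticipate their order.
    """
    freeze_time = rng.uniform(earliest_s, trial_length_s)
    order = list(queries)  # copy so the caller's list is left untouched
    rng.shuffle(order)
    return freeze_time, order

rng = random.Random(0)  # seeded only so that the sketch is repeatable
queries = ["own altitude", "own heading", "target location", "target type"]
freeze_time, order = plan_stop(600.0, queries, rng)
```

Drawing both the stop time and the query order at random is what prevents subjects from preparing for a particular question at a particular moment.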

EXPERIMENT 1

In determining whether SA information is reportable via the SAGAT methodology, several possibilities must be considered. First, data may be processed by subjects in short-term memory (STM), never reaching long-term memory (LTM). If a sequential information-processing model is used, then it is possible that information might enter into STM and never be stored in LTM, where it would be available for retrieval during questioning. In this case, information would not be available during any SAGAT query sessions that exceeded the STM storage limitations (approximately 30 s with no rehearsal).

There is a good deal of evidence, however, that STM does not precede LTM but constitutes an activated subset of LTM (Cowan, 1988; Morton, 1969; Norman, 1968). According to this type of model, information proceeds directly from sensory memory to LTM, which is necessary for pattern recognition and coding. Only those portions of the environment that are salient are then highlighted in STM (either through focalized attention or automatic activation). This type of model would predict that SA information that has been perceived and/or further processed by the operator would exist in LTM stores and thus be available for recall during SAGAT querying that exceeds 30 s.

Second, data may be processed in a highly automated fashion and thus not be in the subject's awareness. Expert behavior can function in an automated processing/action sequence in some cases. Several authors have found that even when effortful processing is not used, however, the information is retained in LTM and is capable of affecting subject responses (Jacoby and Dallas, 1981; Kellog, 1980; Tulving, 1985). The type of questions used in SAGAT (providing cued recall and categorical or scalar responses) should be receptive to retrieval of this type of information. In addition, a review of the literature indicates that even under these circumstances, the product of the automatic processes (as a state of knowledge) is available, even though the processes themselves may be difficult to verbalize (Endsley, 1995, this issue).

Third, the information may be in LTM but not easily recalled by the subjects. Evidence suggests that when effortful processing and awareness are used during the storage process, recall is enhanced (Cowan, 1988). SA, composed of highly relevant, attended to, and processed information, should be most receptive to recall. In addition, the SAGAT battery, which requires categorical or scalar responses, is a cued, as opposed to total, recall task, thus aiding retrieval. Under conditions of SAGAT testing, subjects are aware that they may be asked to report their SA at any time. This, too, may aid in the storage and retrieval process. Because the SAGAT battery is administered immediately after the freeze in the simulation, no time for memory decay or competing event interference is allowed. Thus conditions should be optimized for the retrieval of SA information. Although it cannot be said conclusively that all of a subject's SA can be reflected in this manner, the vast majority should be reportable via SAGAT.

To further investigate this matter, I conducted a study to specifically determine how long after a freeze in the simulation SA information could be obtained. I expected that if collection of SA data via this technique was memory limited, this would be evidenced by an increase in errors on SAGAT queries that occurred approximately 30 to 60 s after the freeze time because of short-term memory restrictions. (Although this would not preclude use of the technique, it would limit its use by restricting the number of questions that could be asked at each stop.)

Procedure

A set of air-to-air engagements was conducted in a high-fidelity aircraft simulation facility. A fighter sweep mission with a force ratio of two (blue team) versus four (red team) was used for the trials. The objective of the blue team was to penetrate red territory, maximizing the kills of red fighters while maintaining a high degree of survivability. The red team was directed to fly around their assigned combat air patrol (CAP) points until a blue target was detected in red airspace. The red team was then allowed to leave its CAP point to defend against the blue team. In all cases, specific tactics were at the discretion of the individual pilot teams.

A total of 15 trials were completed by two teams of six subjects. At a random point in each trial, the simulator was frozen and SAGAT data were immediately collected from all six participants. At a given stop, all of the queries were presented once in random order. As this order was different at each stop, each query was presented a variety of times after the stop across subjects and trials. Therefore, performance accuracy on each question could be evaluated as a function of the amount of time elapsed prior to its presentation. After all subjects had completed the SAGAT battery at a given stop, a new trial was begun.
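The way a randomized query order spreads each query over different delays can be sketched in a few lines. This is only an illustration: the function name `presentation_delays`, the query labels, and the per-query answer times are hypothetical, since the paper does not report how long individual queries took.

```python
import random


def presentation_delays(queries, answer_time_s, rng):
    """Shuffle the queries for one stop and return (order, delay per query).

    answer_time_s maps each query to the seconds a subject spent on it
    (hypothetical values; the paper does not report per-query times).
    """
    order = list(queries)
    rng.shuffle(order)  # a fresh random order at every stop
    delays = {}
    elapsed = 0.0
    for query in order:
        delays[query] = elapsed          # seconds since the freeze when shown
        elapsed += answer_time_s[query]  # later queries wait this much longer
    return order, delays


if __name__ == "__main__":
    queries = ["own heading", "fuel level", "altitude"]
    times = {"own heading": 12.0, "fuel level": 8.0, "altitude": 10.0}
    order, delays = presentation_delays(queries, times, random.Random(1995))
    for query in order:
        print(f"{query}: shown {delays[query]:.0f} s after the freeze")
```

Repeating this over stops and subjects gives each query a range of delays, which is what allows accuracy to be examined against elapsed time.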

Prior to conducting the study, all subjects were trained on the use of the simulator, the displays, aircraft handling qualities, and SAGAT. In addition to three instructional training sessions on using SAGAT, each subject participated in 18 practice trials in which SAGAT was administered. (Most subjects also had received a substantial amount of training in the simulator in the past.) Thus subjects were well trained prior to testing.

Facilities. A high-fidelity, real-time simulation facility was used for testing. Aircraft control systems, avionics systems, weapons systems, crew stations, and air vehicle performance were modeled to provide realistic aircraft control characteristics. A Gould mainframe computer controlled the simulations and drove Silicon Graphics-generated high-resolution color graphics displays. This test used six pilot stations, each including a simulated head-up display, a tactical situation display, radar, and system controls operated by a touch screen or stick and throttle control switches. A realistic stick and throttle provided primary flight control.

Subjects. Twelve experienced former military fighter pilots participated in the test. The mean subject age was 48.16 years (range of 32 to 68). Subjects had an average of 3310 h (range 1500-6500) and an average of 15.5 years (range 7-26) of military flight experience. Of the 12 subjects, 7 had combat experience.

Results

Each of the subjects' answers to the SAGAT queries was compared with actual values as collected by the simulator computer at the time of each stop. Included in the SAGAT battery at the time of the test were 26 queries. Of those, 11 could not be evaluated because the appropriate simulator data were not available, and 5 could be evaluated only by subjective means. The 10 remaining query answers concerned ownship heading, ownship location, aircraft heading, aircraft detections, aircraft airspeed, aircraft weapon selection, aircraft Gs, aircraft fuel level, aircraft weapon quantity, and aircraft altitude.

An error score (SAGAT query answer minus actual value) was computed for each response. Absolute error scores for each query (across all subjects and trials) were plotted against the amount of time elapsed between the stop in the simulation and the query, and a regression was calculated. F tests for the regressions and Pearson's r² for eight of the queries are presented in Table 1. Two of the queries provided a categorical response; therefore, chi-square tests were performed and are presented in Table 2.
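The regression of absolute error on elapsed time can be illustrated with a minimal sketch. It assumes an ordinary least-squares fit with one predictor, which the text implies but does not spell out; the function names are hypothetical, and the p value is omitted because it would require an F distribution from a statistics library. For a single predictor, r² and F are tied by r² = F / (F + df), where df is the residual degrees of freedom, and the helper `r2_from_f` uses that identity to cross-check a reported row.

```python
import math


def regress_error_on_delay(delays_s, abs_errors):
    """Least-squares fit of absolute error on time since the freeze.

    Returns slope, r squared, the F statistic, and residual df (n - 2).
    """
    n = len(delays_s)
    if n != len(abs_errors) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(delays_s) / n
    mean_y = sum(abs_errors) / n
    sxx = sum((x - mean_x) ** 2 for x in delays_s)
    syy = sum((y - mean_y) ** 2 for y in abs_errors)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(delays_s, abs_errors))
    if sxx == 0 or syy == 0:
        raise ValueError("no variation in delays or errors")
    r2 = (sxy * sxy) / (sxx * syy)
    df_resid = n - 2
    # F for a one-predictor regression; infinite if the fit is perfect.
    f_stat = math.inf if r2 >= 1.0 else r2 / (1.0 - r2) * df_resid
    return {"slope": sxy / sxx, "r2": r2, "F": f_stat, "df": df_resid}


def r2_from_f(f_stat, df_resid):
    """Recover r squared from F and residual df (one predictor only)."""
    return f_stat / (f_stat + df_resid)


if __name__ == "__main__":
    # Cross-check with the own-location row of Table 1: F = 0.940, df = 1, 19.
    print(round(r2_from_f(0.940, 19), 3))
```

A flat regression, as reported for all queries, corresponds to a slope near zero and an r² near zero, meaning that elapsed time explains almost none of the variation in error.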

None of the regressions or chi-square tests computed for each of the 10 queries was significant (or even close to significant), indicating that subjects were neither more nor less prone to response error as the amount of time between the simulator freeze and the presentation of the query increased. A plot of the regression for altitude error, shown in Figure 2, reveals little or no increase in error over time. (Plots of the regressions for the other variables appeared quite similar.)

Discussion

Based on these data, it would appear that subjects under these conditions were able to provide information on their SA about a particular situation for up to 5 or 6 min. The fact that all 10 of the queries produced flat regressions lends extra weight to this conclusion.

Two explanations can be offered for these findings. First, this study investigated subjects who were expert in particular tasks during a realistic simulation of those tasks. The information subjects were asked to report was important to performance of those tasks. Most laboratory studies that predict fairly rapid decay times (approximately 30 s for short-term memory) typically employ stimuli that have little or no inherent meaning to the subject (nonsense words or pictures). The storage and utilization of relevant information may be quite different from that of irrelevant information (Chase and Simon, 1973).

Second, the results indicate that the SA information was obtainable from long-term memory stores. If schemata or other mechanisms are used to organize SA information (as opposed to working memory processes only), then that information will be resident in long-term memory. Many of the 10 items analyzed can be considered Level 1 SA components. The fact that this lower-level information was resident in LTM indicates that either the inputs to higher-level processing were retained as well as the outputs, or the Level 1 components were retained as important pieces of pilot SA and are significant components of LTM schemata (e.g., target altitude itself is important to know and not just its implications). Both of these explanations may be correct. These findings generally support the predictions of a processing model in which information passes into LTM storage before being highlighted in STM.

As a caveat, note that subjects were actively working with their SA knowledge by answering the SAGAT queries for the entire period that the simulation was stopped. No intervening period of waiting or any competing activity was introduced prior to administering any SAGAT query. Subject knowledge of SA information may be interfered with if time delays or other activities (particularly continued operational tasks) are imposed before SAGAT is administered. The major implication of these results is that, under these conditions, SA data are readily obtainable through SAGAT for a considerable period after a stop in the simulation (up to 5 or 6 min).

TABLE 1

Tests on Regressions of Time of Query Presentation on Query Accuracy

Query              df        F        p        r²
Own heading        1, 19     0.021    0.887    0.001
Own location       1, 19     0.940    0.334    0.047
Aircraft heading   1, 95     0.040    0.834    0.000
G level            1, 90     0.500    0.480    0.006
Fuel level         1, 95     1.160    0.284    0.012
Weapon quantity    1, 92     0.001    0.981    0.000
Altitude           1, 96     0.002    0.965    0.000
Detection          1, 524    0.041    0.840    0.000

TABLE 2

Chi-Square Tests of Time Category of Query Presentation on Query Accuracy

Query              df            χ²      p
Weapon selection   5 (N = 97)    0.16    > 0.995
Airspeed           5 (N = 98)    0.13    > 0.995

Figure 2. Altitude error by time until query presentation (horizontal axis: time in seconds).

EXPERIMENT 2

A second study was initiated to address the issue of possible intrusiveness. It is necessary to determine whether temporarily freezing the simulation results in any change in subject behavior. If subject behavior were altered by such a freeze, certain limitations would be indicated. (One might wish to not resume the trial after the freeze, for instance, but start a new trial instead.) The possibility of intrusiveness was tested by evaluating the effect of stopping the simulator on subsequent subject performance.

Procedure

A set of air-to-air engagements was conducted using a fighter sweep mission with a force ratio of two (blue team) versus four (red team). The training, instructions, and pilot mission objectives were identical to those used in Experiment 1. In this study, however, the trial was resumed after a freeze following a specified period for collecting SAGAT data and was continued until specified criteria for completion of the mission were met. Subjects completed as many queries as they could during each stop. The queries were presented in random order.

Five teams of six subjects completed a full test matrix. The independent variables were duration of the stops (1/2, 1, or 2 min) and the frequency of stops (1, 2, or 3 times during the trial). Each team participated twice in each of these nine conditions. (In any given trial, multiple stops were of the same duration.) Each team also completed six trials in which no stops occurred, as a control condition. Therefore, a total of 30 trials were conducted for each duration of stop condition, and a total of 30 trials were conducted for each frequency of stop condition. These conditions were compared with 30 trials in which no stops occurred. Conditions were administered in random order, blocked by team. Pilot performance was collected as the dependent measure.

Facilities. This study used the same piloted, multiple-engagement simulator as in Experiment 1.

Subjects. Twenty-five experienced former military fighter pilots participated in this test. (Four of the subjects participated on more than one team.) The mean subject age was 45.16 years (range of 32 to 68). Subjects had an average of 3582 h (range of 975 to 7045) and an average of 16.9 years (range of 6 to 27) of military flight experience. Of the 25 subjects, 14 had combat experience.

Results

Pilot performance under each of the conditions was analyzed. Pilot performance measures included the number of blue team kills (red team losses) and the number of blue team losses (red team kills). Chi-square tests were performed on the number of blue team kills, χ²(4, N = 120) = 5.973, p > 0.05, and blue team losses, χ²(2, N = 120) = 0.05, p > 0.05, comparing between trials in which there were stops to collect SAGAT data and those in which there were no stops, as depicted in Figures 3 and 4. There was no significant difference in pilot performance on either measure.

Figure 3. Blue kills: Trials with stops versus trials without stops.

Figure 4. Blue losses: Trials with stops versus trials without stops.

Analysis of variance was used to evaluate the effect of number of stops and duration of stops on each of the two performance measures: blue team kills and blue team losses. The number of stops during the trial had no significant effect on either pilot performance measure, F(3,116) = 1.73, p > 0.05, β = 0.55, and F(3,116) = 0.20, p > 0.05, β > 0.60, respectively, as shown in Figures 5 and 6. The duration of the stop also did not significantly affect either performance measure, F(3,116) = 2.16, p > 0.05, β = 0.40, and F(3,116) = 0.77, p > 0.05, β > 0.60, respectively, as depicted in Figures 7 and 8. In viewing the data, no linear trend appears to be present in either case that would indicate a progressively worse (albeit nonsignificant) effect of increasing number or duration of stops. This indicates that stops to collect SAGAT data (as many as 3 for up to 2 min in duration) did not have a significant effect on subject performance.

Figure 5. Blue kills by number of stops in trial.

Figure 6. Blue losses by number of stops in trial.

Figure 8. Blue losses by duration of stops in trial (seconds).
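The comparison of trials with and without stops can be sketched as a chi-square test of independence on a table of counts. The design counts below follow from the stated test matrix (five teams, nine stop conditions flown twice, six control trials per team), but the kill counts are invented purely for illustration, and the function name is hypothetical. The critical values are the standard 0.05 values for 4 and 2 degrees of freedom.

```python
# Design counts taken from the stated test matrix.
TEAMS = 5
STOP_TRIALS = 3 * 3 * 2 * TEAMS   # 3 durations x 3 frequencies x 2 repeats
CONTROL_TRIALS = 6 * TEAMS        # trials with no stops
TOTAL_TRIALS = STOP_TRIALS + CONTROL_TRIALS  # matches N = 120 in the tests

# Standard chi-square critical values at the 0.05 level.
CRITICAL_05 = {2: 5.991, 4: 9.488}


def chi_square_independence(table):
    """Pearson chi-square statistic and df for a rows x columns count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected == 0:
                raise ValueError("empty row or column in the table")
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


if __name__ == "__main__":
    # HYPOTHETICAL counts of trials ending with 0, 1, 2, 3, or 4 blue kills.
    with_stops = [10, 28, 30, 16, 6]   # sums to 90 stop trials
    no_stops = [4, 9, 10, 5, 2]        # sums to 30 control trials
    stat, df = chi_square_independence([with_stops, no_stops])
    print(f"chi-square({df}) = {stat:.3f}, critical = {CRITICAL_05[df]}")
```

Under these assumptions, the reported statistic for kills, 5.973 with 4 degrees of freedom, falls below 9.488, and the statistic for losses, 0.05 with 2 degrees of freedom, falls below 5.991, which is consistent with both results being reported as nonsignificant.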

Discussion

The lack of a significant influence of this procedure on performance probably rests on the fact that relevant schemata are actively used by subjects during the entire freeze period. Under these conditions, the subject's SA does not have a chance to decay before the simulation is resumed (as was indicated by Experiment 1). Thus SA is fairly intact when the simulation continues, allowing subjects to proceed with their tasks where they left off. Subjectively, subjects did fairly well with this procedure and were able to readily pick up the simulation at the point at which they left off at the time of the freeze, sometimes with the same sentence they had started before the stop. On many occasions subjects could not even remember if they had been stopped to collect SA data during the trial, also indicating a certain lack of intrusiveness.

Figure 7. Blue kills by duration of stops in trial (seconds).

Summary

The use of a temporary freeze in the simulation to collect SA data is supported by Experiments 1 and 2. Subjects were able to report their SA using this technique for as long as 5 or 6 min without apparent memory decay, and the freeze did not appear to be intrusive on subject performance in the simulation, allaying several concerns about the technique. Although it is always difficult to establish no effect of one variable on another (i.e., prove the null hypothesis), it is reassuring that the finding of no intrusion on performance has been repeated in numerous other studies in which SAGAT was used to evaluate human-machine interface concepts (Bolstad and Endsley, 1990; Endsley, 1989b; Northrop, 1988) if the timing of the stop was unpredictable to the subjects (Endsley, 1988). This finding was also repeated in one study in which pilots flew in a simulation against computer-controlled enemy aircraft (Endsley, 1990d), helping to rule out the possibility that the finding of no effect on performance in the present study could have been the result of pilots on both sides of the simulation being equally influenced.

SAGAT has thus far proven to meet the stated criteria for measurement. In addition to the present studies establishing a level of empirical validity, SAGAT has been shown to have predictive validity (Endsley, 1990b) and content validity (Endsley, 1990d). The method is not without some costs, however, as a detailed analysis of SA requirements is required in order to develop the battery of queries to be administered. On the positive side, this analysis can also be extremely useful for guiding design efforts.

EXAMPLE

To proffer an example of the type of SA information provided by SAGAT, I will present a study that evaluates the use of a three-dimensional display for providing radar information to fighter pilots. The study was conducted in the same high-fidelity, real-time simulator as in the prior two experiments. During the study, each of six experienced pilot subjects flew a fighter sweep mission against a field of four digitally controlled enemy aircraft. Each subject completed five trials in each of five display conditions. The initial starting conditions and order of presentation of the display conditions were randomized. Each trial was halted to collect SAGAT data at a randomly selected time between 2.5 and 5 min into each trial. Immediately following the collection of SAGAT data, the Subjective Workload Assessment Technique (SWAT) was also administered. In this study, the trial was not resumed after the collection of SAGAT and SWAT data.

The five display conditions were constructed to evaluate the effect of a three-dimensional (3D) display concept. The 3D display provided a grid representing the ground, above which aircraft targets detected by the radar were presented by a small triangle pointing in the direction of the aircraft's heading and connected to the grid by a line corresponding to the aircraft's altitude. The plane the subject was flying was presented in the center of this display so that each target's location and altitude were spatially represented relative to the subject's ownship. Four rotations of the grid (0, 30, 45, and 60 deg) were investigated (each providing a different perspective on the scene), as shown in Figure 9. At 0-deg rotation, the grid portrayed a God's-eye view of the scene, thus depicting a traditional two-dimensional display. In addition, these four views were compared with a two-display configuration in which a God's-eye view display was supplemented with a second, profile display that pictorially presented altitude. In all conditions, altitude of the aircraft was also presented numerically next to the aircraft symbols.

Following the trials, the collected SAGAT and SWAT data were analyzed. Although the final SAGAT battery for fighter aircraft has 40 queries (Endsley, 1990c), only 23 were administered at the time of the test (as the battery was still in development). The results of 4 pertinent queries are presented here. The subject's reported knowledge of the range, azimuth, altitude, and heading of the enemy targets was compared with the actual values of these elements at the time of the freeze in the simulation. The subject's perceptions were evaluated as correct or incorrect (using tolerance values that had been determined to be operationally relevant).

The percentage correct across subjects for each condition on each of these four queries is shown in Figure 10. As the scored data are binomial, an arcsine transformation was applied and an ANOVA performed. The results showed that the display type had a significant effect on pilot SA of the aircraft's range, F(4,288) = 2.575, p < 0.05, azimuth, F(4,278) = 4.125, p < 0.05, and heading, F(4,139) = 3.040, p < 0.05, but not altitude, F(4,94) = 1.83, p > 0.05. The 3D display, when rotated at 45 or 60 deg, provided significantly lower SA on target range than when at 0 or 30 deg and significantly lower SA of target heading than of any of the other three conditions. A rotation of 60 deg also provided significantly lower SA of target azimuth than did the other four conditions. Although the ANOVA for target altitude was not significant, there was an opposite trend for the highly rotated 3D display to provide more SA of target altitude than did rotations of 0 and 30 deg. The SWAT data supported these findings, with significantly higher ratings for workload in the 45- and 60-deg rotations than in the 0- or 30-deg rotations or the two-dimensional display.

Figure 9. Two-dimensional display and three-dimensional display at four levels of rotation. Panels: (a) 2D display; (b) 3D display, 0° rotation; (c) 3D display, 30° rotation; (d) 3D display, 45° rotation; (e) 3D display, 60° rotation.

A major point to be gleaned from this example is the importance of examining the impact of a given design across an operator's SA requirements. The proposed 3D display was designed to provide pilots with better information about aircraft altitude, and it may have done so, although the trend was not significant. However, this came at the expense of SA on location, azimuth, and heading. If a study had been performed that merely examined subjects' knowledge of aircraft altitude, this important information would have been lost. The SAGAT data also provided important diagnostic information that was useful in understanding the exact effect the design concept had on pilot SA in order to make future iterations.

Figure 10. Situation awareness of target range, heading, azimuth, and altitude in each display condition (2-D, 3-D 0°, 3-D 30°, 3-D 45°, 3-D 60°). Panels: (a) knowledge of target range; (b) knowledge of target heading; (c) knowledge of target azimuth; (d) knowledge of target altitude.

IMPLEMENTATION RECOMMENDATIONS

Several recommendations for SAGAT administration can be made based on previous experience in using the procedure.

Training

An explanation of the SAGAT procedures and detailed instructions for answering each query should be provided to subjects before testing. Several training trials should be conducted in which the simulator is halted frequently to allow subjects ample opportunity to practice responding to the SAGAT queries. Usually three to five samplings are adequate for a subject to become comfortable with the procedure and to clear up any uncertainties in how to answer the queries.

Test Design

SAGAT requires no special test considerations. The same principles of experimental design and administration apply to SAGAT as to any other dependent measure. Measures of subject performance and workload may be collected concurrently with SAGAT, as no ill effect from the insertion of breaks has been shown. To be cautious, however, if performance measures are to be collected simultaneously with SAGAT data, half of the trials should be conducted without any breaks for SAGAT in order to provide a check for this contingency.

Procedures

Subjects should be instructed to attend to their tasks as they normally would, with the SAGAT queries considered as secondary. No displays or other visual aids should be visible while subjects are answering the queries. If subjects do not know or are uncertain about the answer to a given query, they should be encouraged to make their best guess. There is no penalty for guessing, allowing for consideration of the default values or other wisdom gained from experience that subjects normally use in decision making. If subjects do not feel comfortable enough to make a guess, they may go on to the next question. Talking or sharing of information among subjects should not be permitted. If multiple subjects are involved in the same simulation, all subjects should be queried simultaneously and the simulation resumed for all subjects at the same time.

Which Queries to Use

Random selection. As it may be impossible to query subjects about all of their SA requirements in a given stop because of time constraints, a portion of the SA queries may be randomly selected and asked each time. A random sampling provides consistency and statistical validity, thus allowing SA scores to be easily compared across trials, subjects, systems, and scenarios.

Because of attentional narrowing or lack of information, certain questions may seem unimportant to a subject at the time of a given stop. It is important to stress that subjects should attempt to answer all queries anyway. This is because (a) even though they think it unimportant, the information may have at least secondary importance; (b) they may not be aware of information that makes a question very important (e.g., the location of a pop-up aircraft); and (c) if only questions of the highest priority were asked, subjects might be inadvertently provided with artificial cues about the situation that will direct their attention when the simulation is resumed. Therefore, a random selection from a constant set of queries is recommended at each stop.

Experimenter controlled. In certain tests it may be desirable to have some queries omitted because of limitations of the simulation or characteristics of the scenarios. For instance, if the simulation does not incorporate aircraft malfunctions, the query related to this issue may be omitted. In addition, with particular test designs it may be desirable to ensure that certain queries are presented every time. When this occurs, subjects should also be queried on a random sampling from all SA requirements and not only on those related to a specific area of interest to the evaluation being conducted. This is because of the ability of subjects to shift attention to the information on which they know they will be tested. What may appear to be an improvement in SA in one area may be merely a shift of attention from another area. When the SAGAT queries cover all of the SA requirements, no such artificial cuing can occur.
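One way to combine the two selection rules is to draw a random sample from the constant query set at each stop and add any queries the experimenter wants presented every time. The sketch below is only an illustration of that idea; the function name `select_queries`, its parameters, and the query labels are hypothetical.

```python
import random


def select_queries(pool, n_random, rng, always_ask=()):
    """Pick the queries for one stop.

    pool       -- the constant set of queries covering all SA requirements
    n_random   -- how many additional queries to draw at random
    always_ask -- queries the experimenter wants presented at every stop
    """
    missing = [q for q in always_ask if q not in pool]
    if missing:
        raise ValueError(f"always_ask queries not in pool: {missing}")
    remaining = [q for q in pool if q not in always_ask]
    # Sample without replacement so that no query is repeated within a stop.
    sampled = rng.sample(remaining, min(n_random, len(remaining)))
    selected = list(always_ask) + sampled
    rng.shuffle(selected)  # present the fixed queries in random positions too
    return selected


if __name__ == "__main__":
    pool = ["own heading", "own location", "aircraft heading", "altitude",
            "fuel level", "weapon quantity", "airspeed", "G level"]
    rng = random.Random(7)
    for stop in range(1, 4):
        print(stop, select_queries(pool, 3, rng, always_ask=["altitude"]))
```

Because the random portion is drawn from every query outside the fixed set, subjects cannot tell from the queries alone which area the evaluation is focused on.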

When to Collect SAGAT Data

It is recommended that the timing of each freeze for SAGAT administration be randomly determined and unpredictable so that subjects cannot prepare for queries in advance. If the freeze occurrence is covertly associated with the occurrence of specific events or at specific times across trials, prior studies have shown that the subjects will be able to figure this out (Endsley, 1988), allowing them to prepare for queries or actually improve SA through the artificiality of the freeze cues. An informal rule has been to ensure that no freezes occur earlier than 3 min into a trial, to allow subjects to build up a picture of the situation, and that no two freezes occur within 1 min of each other.
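The informal timing rule can be turned into a small sampling routine. The sketch below draws uniformly distributed freeze times and rejects any draw that violates the rule; the uniform distribution, the rejection approach, and the function name are assumptions made for illustration rather than a description of the procedure actually used. The gap is measured between freeze onsets in trial time.

```python
import random


def draw_freeze_times(n_stops, trial_len_s, rng,
                      earliest_s=180.0, min_gap_s=60.0, max_tries=1000):
    """Return sorted random freeze times (seconds into the trial).

    No freeze is earlier than earliest_s (3 min by default) and no two
    freezes are closer together than min_gap_s (1 min by default).
    """
    window = trial_len_s - earliest_s
    if n_stops < 1 or window < (n_stops - 1) * min_gap_s:
        raise ValueError("constraints cannot be met in this trial length")
    for _ in range(max_tries):
        times = sorted(rng.uniform(earliest_s, trial_len_s)
                       for _ in range(n_stops))
        # Accept only if every pair of neighbouring freezes is far enough apart.
        if all(b - a >= min_gap_s for a, b in zip(times, times[1:])):
            return times
    raise ValueError("no valid schedule found; relax the constraints")


if __name__ == "__main__":
    # Three stops in a 15-min trial, as in the longest Experiment 2 condition.
    schedule = draw_freeze_times(3, 15 * 60.0, random.Random(42))
    print([round(t) for t in schedule])
```

Drawing a new schedule for every trial keeps the stops unpredictable, which is the property the recommendation is meant to protect.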

The result of this approach is that the activities occurring at the time of the stops will be randomly selected. Some stops may occur during very important activities that are of interest to the experimenter, others when no critical activities are occurring. This gives a good sampling of the subject's SA in a variety of situations. During analysis, the experimenter may want to stratify the data to take these variations into account.

How Much SAGAT Data to Collect

The number of trials necessary will depend on the variability present in the dependent variables being collected and the number of data samples taken during a trial. This will vary with different subjects and designs, but between 30 and 60 samplings per SA query with each design option have previously been adequate in a within-subjects test design.
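As a rough planning aid, the 30 to 60 samplings guideline can be converted into a trial count. The arithmetic below is a sketch under stated assumptions: every stop yields one sample per asked query, and `fraction_asked` (a hypothetical parameter) is the share of stops at which a given query is expected to be presented when queries are randomly selected. It does not account for variability in the data, which the text names as the real determinant.

```python
import math


def trials_needed(target_samples, stops_per_trial, fraction_asked=1.0):
    """Trials per design option needed to reach a target sample count.

    target_samples  -- desired samplings per SA query (30 to 60 in the text)
    stops_per_trial -- number of SAGAT freezes in each trial
    fraction_asked  -- expected share of stops at which the query is asked
    """
    samples_per_trial = stops_per_trial * fraction_asked
    if samples_per_trial <= 0:
        raise ValueError("each trial must yield some samples")
    return math.ceil(target_samples / samples_per_trial)


if __name__ == "__main__":
    for target in (30, 60):
        print(target, trials_needed(target, stops_per_trial=3,
                                    fraction_asked=0.5))
```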

Multiple SAGAT stops may be taken within each trial. There is no known limit to the number of times the simulator can be frozen during a given trial. In Experiment 2, no ill effects were found from as many as three stops during a 15-min trial. In general, it is recommended that a stop last until a certain amount of time has elapsed, and then the trial is resumed, regardless of how many questions have been answered. Stops as long as 2 min in duration were used with no undue difficulty or effect on subsequent performance. In Experiment 1, stops as long as 5 min were shown to allow subjects access to SA information without memory decay.

Data Collection

The simulator computer should be programmed to collect objective data corresponding to the queries at the time of each freeze. Considering that some queries will pertain to higher-level SA requirements that may be unavailable in the computer, an expert judgment of the correct answer may be made by an experienced observer who is privy to all information, reflecting the SA of a person with perfect knowledge. A comparison of subjects' perceptions of the situation (as input into SAGAT) with the actual status of each variable (as collected per the simulator computer and expert judgment) results in an objective measure of subject SA. Questions asked of the subject but not answered should be considered incorrect. No evaluation should be made of questions not asked during a given stop.

It is recommended that answers to each query be scored as correct or incorrect based on whether or not they fall into an acceptable tolerance band around the actual value. For example, it may be acceptable for a subject to be 10 miles per hour off of actual ground speed. This method of scoring poses less difficulty than dealing with absolute error (see Marshak et al., 1987). A tabulation of the frequency of correctness can then be made within each test condition for each SA element. Because data scored as correct or incorrect are binomial, the conditions for analysis of variance are violated. A correction factor, Y' = arcsine(Y), can be applied to binomial data, which allows analysis of variance to be used. In addition, a chi-square, Cochran's Q, or binomial t test (depending on the test design) can be used to evaluate the statistical significance of differences in SA between test conditions.
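The scoring rules can be sketched as follows. The tolerance values and function names are hypothetical. One point is an explicit assumption: the text writes the correction as Y' = arcsine(Y), whereas this sketch applies the commonly used arcsine square-root form to the proportion correct, which may or may not be the exact variant intended. Circular quantities such as heading would need wrap-around handling that is not shown.

```python
import math


def score_response(answer, actual, tolerance):
    """Return True if an asked query was answered within the tolerance band.

    answer is None when the query was asked but left unanswered, which the
    text says should be counted as incorrect.
    """
    if answer is None:
        return False
    return abs(answer - actual) <= tolerance


def proportion_correct(scores):
    """Fraction correct over the queries that were actually asked.

    Queries never asked at a stop are simply left out of `scores`.
    """
    if not scores:
        raise ValueError("no asked queries to evaluate")
    return sum(scores) / len(scores)


def arcsine_transform(p):
    """Arcsine square-root transform of a proportion (assumed variant)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a proportion must lie between 0 and 1")
    return math.asin(math.sqrt(p))


if __name__ == "__main__":
    # HYPOTHETICAL ground-speed answers (mph) scored with a 10-mph band.
    actual = 455.0
    answers = [450.0, 470.0, None, 462.0]
    scores = [score_response(a, actual, tolerance=10.0) for a in answers]
    p = proportion_correct(scores)
    print(scores, p, round(arcsine_transform(p), 3))
```

The transformed proportions, one per subject and condition, are the values that would then be entered into the analysis of variance.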

Limitations and Applicability for Use

SAGAT has primarily been used within the confines of high-fidelity and medium-fidelity part-task simulations. This provides experimenter control over freezes and data collection without any danger to the subject or processes involved in the domain. It may be possible to use the technique during actual task performance if multiple operators are present to ensure safety. For example, it might be possible to verbally query one pilot in flight while another assumes flight control. Such an endeavor should be undertaken with extreme caution, however, and may not be appropriate for certain domains.

A recent effort (Sheehy, Davey, Fiegel, and Guo, 1993) employed an adaptation of this technique by making videotapes of an ongoing situation in a nuclear power plant control room. These tapes were then replayed to naive subjects, with freezes for SAGAT queries employed. It is not known how different the SA of subjects passively viewing a situation may be from subjects actually engaged in task performance; however, this approach may yield some useful data.

Most known uses of SAGAT have involved fighter aircraft simulations. In general, it can be employed in any domain in which a reasonable simulation of task performance exists and an analysis of SA requirements has been made in order to develop the queries. SAGAT queries have been developed for advanced bomber aircraft (Endsley, 1990a) and recently for en route air traffic control (Endsley and Rodgers, 1994). Several researchers have begun to use the technique in evaluating operator SA in nuclear control room studies (Hogg, Torralba, and Volden, 1993; Sheehy et al., 1993). The technique has also been employed in a simulated control task to study adaptive automation (Carmody and Gluckman, 1993). Several new efforts are currently under way to employ the technique in medical decision making (E. Swirsky, personal communication, March 1994) and helicopters (C. Prince, personal communication, November 1993). Potentially, it could also be used in studies involving automobile driving, supervisory control of manufacturing systems, teleoperations, and operation of other types of dynamic systems.

CONCLUSION

The use of SA metrics in evaluating the efficiency of design concepts, training programs, and other potential interventions is critical. Without careful evaluation, it will be difficult to make much progress in understanding the factors that influence SA, hindering human factors and engineering efforts. Several techniques have been reviewed here, each with several advantages and disadvantages. The use of one technique, SAGAT, was explored extensively, and two studies were presented that investigated the validity of the measure. It has to date proven a useful and viable technique for measuring SA in numerous contexts. Rigorous validity testing is needed for other techniques proposed for SA measurement. Unless the veracity of these techniques is established, interpreting their results will be hazardous.

In addition to establishing validity of measures of SA, more research is needed regarding the task of SA measurement itself. For instance, as pointed out by Pew (1991), no criteria exist at this time that establish the level of SA required for successful performance. Does an operator need to have SA that is 100% perfect (in both completeness and accuracy), or is some lesser amount sufficient for good performance? This is a complex issue. I would assert that SA can be seen as a factor that increases the probability of good performance but that it does not necessarily guarantee it, as other factors also come into effect (e.g., decision making, workload, performance execution, system capabilities, and SA of others in some cases). How much SA one needs, therefore, becomes a matter of how much probability of error one is willing to accept. Perhaps such criteria should more properly and usefully be established as the level of SA needed on each subcomponent at different periods in time. Some guidelines will eventually need to be specified if there is a desire to certify new system designs on the basis of SA.

Overall, the beginnings of the field of SA measurement have been established, allowing researchers in the area of SA to proceed from mainly speculation and anecdotal information to solid empirical findings. The tools for conducting further basic research on the SA construct and developing better system designs are available, allowing needed research on SA in a variety of arenas.

ACKNOWLEDGMENTS

This research was sponsored by the Northrop Corporation. I thank the many people at Northrop, both past and present, who contributed to this effort. In particular, I thank Vic Vizcarra for his assistance in conducting the studies, Paul Cayley and Gerry Armstrong for their programming efforts, and Mike Majors and Del Jacobs for their support in developing an SA research program. In addition, I am indebted to Mark Chignell, who provided helpful support throughout the development of the methodology. I also thank Mark Rodgers, Tom English, Dick Gilson, and several anonymous reviewers for their comments on an earlier version of this paper.


REFERENCES

Bolstad, C. A., and Endsley, M. R. (1990). Single versus dual scale range display investigation (NOR DOC 90-90). Hawthorne, CA: Northrop Corporation.

Carmody, M. A., and Gluckman, J. P. (1993). Task-specific effects of automation and automation failure on performance, workload, and situational awareness. In R. S. Jensen and D. Neumeister (Eds.), Proceedings of the Seventh International Symposium on Aviation Psychology (pp. 167-171). Columbus, OH: Ohio State University, Department of Aviation.

Chase, W. G., and Simon, H. A. (1973). Perceptions in chess. Cognitive Psychology, 4, 55-81.

Cowan, N. (1988). Evolving conceptions of memory storage, selective attention, and their mutual constraints within the human information-processing system. Psychological Bulletin, 104, 163-191.

Endsley, M. R. (1987). SAGAT: A methodology for the measurement of situation awareness (NOR DOC 87-83). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1988). Situation awareness global assessment technique (SAGAT). In Proceedings of the National Aerospace and Electronics Conference (NAECON) (pp. 789-795). New York: IEEE.

Endsley, M. R. (1989a). Final report: Situation awareness in an advanced strategic mission (NOR DOC 89-32). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1989b). Tactical simulation 3 test report: Addendum 1 situation awareness evaluations (81203033R). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1990a). A methodology for the objective measurement of situation awareness. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 1/1-1/9). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Endsley, M. R. (1990b). Predictive utility of an objective measure of situation awareness. In Proceedings of the Human Factors Society 34th Annual Meeting (pp. 41-45). Santa Monica, CA: Human Factors and Ergonomics Society.

Endsley, M. R. (1990c). Situation awareness global assessment technique (SAGAT): Air-to-air tactical version user guide (NOR DOC 89-58, rev. A). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1990d). Situation awareness in dynamic human decision making: Theory and measurement. Unpublished doctoral dissertation, University of Southern California, Los Angeles, CA.

Endsley, M. R., and Rodgers, M. D. (1994). Situation awareness global assessment technique (SAGAT): En route air traffic control version user's guide (Draft). Lubbock: Texas Tech University.

Fracker, M. L. (1990). Attention gradients in situation awareness. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 6/1-6/10). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Herrmann, D. J. (1984). Questionnaires about memory. In J. E. Harris and P. E. Morris (Eds.), Everyday memory, action, and absent-mindedness (pp. 133-151). London: Academic.

Hogg, D. N., Torralba, B., and Volden, F. S. (1993). A situation awareness methodology for the evaluation of process control systems: Studies of feasibility and the implication of use (1993-03-05). Storefjell, Norway: OECD Halden Reactor Project.


Hughes, E. R., Hassoun, J. A., and Ward, F. (1990). Advanced tactical fighter (ATF) simulation support: Final report (CSEF-TR-ATF-90-91). Wright-Patterson Air Force Base, OH: Air Force Systems Command, Aeronautical Systems Division.

Jacoby, L. L., and Dallas, M. (1981). On the relationship between autobiographical memory and perceptual learning. Journal of Experimental Psychology: General, 110, 306-340.

Kellog, R. T. (1980). Is conscious attention necessary for long-term storage? Journal of Experimental Psychology: Human Learning and Memory, 6, 379-390.

Manktelow, K., and Jones, J. (1987). Principles from the psychology of thinking and mental models. In M. M. Gardiner and B. Christie (Eds.), Applying cognitive psychology to user-interface design (pp. 83-117). Chichester, England: Wiley.

Marshak, W. P., Kuperman, G., Ramsey, E. G., and Wilson, D. (1987). Situational awareness in map displays. In Proceedings of the Human Factors Society 31st Annual Meeting (pp. 533-535). Santa Monica, CA: Human Factors and Ergonomics Society.

McDonnell Douglas Aircraft Corporation. (1982). Advanced medium-range air-to-air missile operational utility evaluation (AMRAAM OUE) test final report. St. Louis, MO: Author.

Mogford, R. H., and Tansley, B. W. (1991). The importance of the air traffic controller's mental model. Presented at the Human Factors Society of Canada Annual Meeting, Canada.

Morton, J. (1969). Interaction of information in word recognition. Psychological Review, 76, 165-178.

Nisbett, R. E., and Wilson, T. D. (1977). Telling more than we can know: Verbal reports on mental processes. Psychological Review, 84, 231-259.

Norman, D. A. (1968). Towards a theory of memory and attention. Psychological Review, 75, 522-536.

Northrop Corporation. (1988). Tactical simulation 2 test report: Addendum 1 situation awareness test results. Hawthorne, CA: Author.

Pew, R. W. (1991). Defining and measuring situation awareness in the commercial aircraft cockpit. In Proceedings of the Conference on Challenges in Aviation Human Factors: The National Plan (pp. 30-31). Washington, DC: American Institute of Aeronautics and Astronautics.

Sarter, N. B., and Woods, D. D. (1991). Situation awareness: A critical but ill-defined phenomenon. International Journal of Aviation Psychology, 1, 45-57.

Selcon, S. J., and Taylor, R. M. (1990). Evaluation of the situational awareness rating technique (SART) as a tool for aircrew systems design. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 5/1-5/8). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Sheehy, E. J., Davey, E. C., Fiegel, T. T., and Guo, K. Q. (1993, April). Usability benchmark for CANDU annunciation - lessons learned. Presented at the ANS Topical Meeting on Nuclear Plant Instrumentation, Control, and Man-Machine Interface Technology, Oak Ridge, TN.

Smolensky, M. W. (1993). Toward the physiological measurement of situation awareness: The case for eye movement measurements. In Proceedings of the Human Factors and Ergonomics Society 37th Annual Meeting (p. 41). Santa Monica, CA: Human Factors and Ergonomics Society.

Taylor, R. M. (1990). Situational awareness rating technique (SART): The development of a tool for aircrew systems design. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 3/1-3/17). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Tulving, E. (1985). How many memory systems are there? American Psychologist, 40, 385-398.

Venturino, M., Hamilton, W. L., and Dvorchak, S. R. (1990). Performance-based measures of merit for tactical situation awareness. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 4/1-4/5). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.


Vidulich, M. A. (1989). The use of judgment matrices in subjective workload assessment: The subjective workload dominance (SWORD) technique. In Proceedings of the Human Factors Society 33rd Annual Meeting (pp. 1406-1410). Santa Monica, CA: Human Factors and Ergonomics Society.

Date received: January 26, 1993
Date accepted: October 4, 1994


this understanding, future events and system states can be predicted (Level 3), allowing for timely and effective decision making. Several processing mechanisms have been hypothesized to be related to SA, including attention and working memory limitations, attention distribution, current goals, mental models, schemata, and automaticity (Endsley, 1995, this issue). In addition to characteristics of individuals, the design of a system and the operator interface can affect SA.

The enhancement of SA has become a major design goal for developers of operator interfaces, automation concepts, and training programs in a variety of fields. To evaluate the degree to which new technologies or design concepts actually improve (or degrade) operator SA, it is necessary to systematically evaluate them based on a measure of SA, which can determine those ideas that have merit and those that have unforeseen negative consequences. Only in this way can a design concept's utility be established in accordance with the design goal.

In addition to evaluating design concepts, a measure of SA is also useful for evaluating the effect of training techniques on SA; conducting studies to empirically examine factors that may affect SA, such as individual abilities and skills, or the effectiveness of different processes and strategies for acquiring SA; and investigating the nature of the SA construct itself. (This type of quantitative measurement, that is, evaluating an operator's level of SA in light of concepts under experimental evaluation, differs significantly from research efforts that seek to directly assess the processes operators use in acquiring SA, as discussed by Adams, Tenney, and Pew [1995, this issue].)

Evaluation of potential design concepts routinely takes place within the context of rapid prototyping or part-task simulation. Several different methods for measuring SA can be considered during this type of testing. In addition, many of these techniques could be used to evaluate the SA of operators working with actual systems. For the most part, efforts at measuring SA thus far have been concentrated in the aircraft environment; however, most techniques are also applicable to other domains.

To establish the validity and reliability of an SA measurement technique, it is necessary to establish that the metric (a) measures the construct it claims to measure and is not a reflection of other processes, (b) provides the required insight in the form of sensitivity and diagnosticity, and (c) does not substantially alter the construct in the process, which would provide biased data and altered behavior. In addition, it would be useful to establish the existence of a relationship between the measure and other constructs, as would be predicted by theory. In this case, the measure of SA should be predictive of performance and sensitive to manipulations in workload and attention. Against these criteria, several potential measurement techniques will be explored, along with an assessment of the advantages and disadvantages of each.

Physiological Techniques

P300 and other electroencephalographic measurements have shown promise for determining whether information is registered cognitively. Although these techniques allow researchers to determine whether elements in the environment are perceived and processed by subjects, they cannot determine how much information remains in memory, whether the information is registered correctly, or what comprehension the subject has of those elements. For the same reasons, eye-tracking devices appear to fall short in their ability to measure SA as a state of knowledge. Furthermore, these devices do not tell which elements in the periphery of the subject's vision are observed or if the subject has even processed what was seen. (The devices may be useful for exploring the processes operators use in achieving SA, however, as was recently investigated by Smolensky [1993].) Known physiological techniques, though providing useful data for other purposes, are not very promising for the measurement of SA as a state of knowledge.

Performance Measures

In general, performance measures have the advantage of being objective and are usually nonintrusive. Computers for conducting system simulations can be programmed to record specified performance data automatically, making the required data relatively easy to collect. However, several limitations exist in using performance data to infer SA.

Global measures. Global measures of performance suffer from diagnosticity and sensitivity problems. If overall operator/system performance is the only criterion used for evaluating design concepts, important system differences can be masked, despite its being a useful bottom-line measure. Performance measures give only the end result of a long string of cognitive processes, providing little information about why poor performance may have occurred in a given situation (assuming it can be detected reliably during testing at all). Poor performance could occur from a lack of information, poor sampling strategies, improper integration or projection, heavy workload, poor decision making, or action errors, among other factors, many of which are not SA related. (For a discussion of the distinction among SA, decision making, and performance, see the previous Endsley paper in this issue.)

As stated earlier, overall system performance is often masked by other factors. In a tactical aircraft environment, for instance, much of pilot performance is by nature highly variable and subject to the influence of many other factors besides SA. A new system may provide the pilot with better SA, but in evaluation testing this fact can be easily masked by excessive workloads, the intentional use of varied tactics, or poor decision making if overall mission performance is used as the only dependent measure. It would be desirable, therefore, to measure SA more directly.

External task measures. One type of performance measure that has been suggested involves artificially changing certain information or removing certain pieces of information from operator displays and then measuring the amount of time required for the operator to react to this event (Sarter and Woods, 1991). Aside from the fact that such a manipulation is heavily intrusive, requiring the subject to undertake tasks involved with discovering what happened to the changed or missing data while attempting to maintain satisfactory performance on other tasks, this technique may provide highly misleading results. It assumes that an operator will act in a certain way when, in fact, operators often employ compensating schemes to function under such circumstances. For instance, if a displayed aircraft suddenly disappears, the operator may assume that equipment has malfunctioned, the aircraft was destroyed or landed, or the aircraft's emitting equipment was turned off, rendering it more difficult to detect (a not infrequent occurrence). In any case, the operator may choose to ignore the disappearance, assuming it will come back on the next several sweeps of the radar; worry about it but not say anything; or put off dealing with the disappearance until other tasks are complete. Any of these actions will yield highly misleading results for the experimenter, who expects SA to be reflected by the operator's overt behavior.

This measurement technique not only has invalid assumptions, but it alters the subject's ongoing tasks, possibly affecting attention and thus SA itself. Anytime one artificially alters the realism of the simulation, it can fundamentally change the way the operator conceptualizes the underlying information (see Manktelow and Jones, 1987), thus altering both SA and decision making. In addition, such a manipulation would certainly interfere with any concurrent workload or performance measurement undertaken during testing.

Imbedded task measures. Some information about SA can be determined from examining performance on specific operator subtasks that are of interest. For example, when evaluating an altitude display, deviations from prescribed altitude levels or time to reach a certain altitude can be measured. This type of detailed performance measure can provide some inferences regarding the amount of SA that a display reveals about a specific parameter. Such measures will be more meaningful than global performance measures and will not suffer from the same problems of intrusiveness as external task measures. Although selecting finite task measures for evaluating certain kinds of systems may be easy, determining appropriate measures for other systems may be more difficult. An expert system, for instance, may influence many factors in a global, not readily predicted manner.

The major limitation of the imbedded task measure approach stems from the interactive nature of SA subcomponents. A system that provides SA on one element may simultaneously reduce SA on another, unmeasured element. In addition, it is easy for subjects to have biased attention on the issue under evaluation in a particular study (e.g., altitude) if they figure out the purpose of the study. Therefore, because improved SA on some elements may result in decreased SA on others, relying solely on performance measures of specific parameters can yield misleading results.

Researchers need to know how much SA operators have when taxed with multiple, competing demands on their attention during system operations. To this end, a global measure of SA that simultaneously depicts SA across many elements of interest is desirable. To improve SA, designers need to be able to evaluate the entire impact of design concepts on operator SA.

Subjective Techniques

Self-rating. One simple technique is to ask operators to subjectively rate their own SA (e.g., on a 1-to-10 scale). Researchers in the Advanced Medium-Range Air-to-Air Missile Operational Utility Evaluation Test (AMRAAM OUE) study (McDonnell Douglas Aircraft Corporation, 1982) used this method. Pilot (and overall flight) SA was subjectively rated by the participants and a trained observer. The main advantages of subjective estimation techniques are low cost and ease of use. In general, however, the subjective self-rating of SA has several limitations.

If the ratings are collected during a simulation trial, the operators' ability to estimate their own SA will be limited because they do not know what is really happening in the environment (they have only their perceptions of that reality). Operators may know when they do not have a clue as to what is going on but will probably not know if their knowledge is incomplete or inaccurate.

If operators are asked to subjectively evaluate SA in a posttrial debriefing session, the rating may also be highly tainted by the outcome of the trial. When performance is favorable, whether through good SA or good luck, an operator will most likely report good SA, and vice versa. In a reevaluation of the AMRAAM OUE study, Venturino, Hamilton, and Dvorchak (1990) found that posttrial subjective SA ratings were highly correlated with performance, supporting this premise. In addition, when ratings are gathered after the mission, operators will probably be inclined to rationalize and overgeneralize about their SA, as has been shown when information about mental processes is collected after the fact (Nisbett and Wilson, 1977). Thus detailed information will be lost or misconstrued.

What do such subjective estimates actually measure? I would speculate that self-ratings of SA most likely convey a measure of subjects' confidence levels regarding that SA; that is, how comfortable they feel about their SA. Subjects who know they have incomplete knowledge or understanding would subjectively rate their SA as low. Subjects whose knowledge may not be any greater, but who do not subjectively have the same concerns about how much is not known, would rate their SA higher. In other words, ignorance may be bliss.

Several efforts have been made to develop more rigorous subjective measures of SA. Taylor (1990) developed the Situation Awareness Rating Technique (SART), which allows operators to rate a system design based on the amount of demand on attentional resources, supply of attentional resources, and understanding of the situation provided. As such, it considers operators' perceived workload (supply and demand on attentional resources) in addition to their perceived understanding of the situation. Although SART has been shown to be correlated with performance measures (Selcon and Taylor, 1990), it is unclear whether this is attributable to the workload or the understanding components.

In a new application, the Subjective Workload Dominance (SWORD) metric (Vidulich, 1989) has been applied as a subjective rating tool for SA (Hughes, Hassoun, and Ward, 1990). SWORD allows subjects to make pairwise comparative ratings of competing design concepts along a continuum that expresses the degree to which one concept entails less workload than the other. The resultant preferences are then combined using the analytic hierarchy process technique to provide a linear ordering of the design concepts. In the Hughes et al. study, SWORD was modified to allow pilots to rate the degree to which presented display concepts provided SA instead of workload. Not surprisingly, the display that was subjectively rated as providing the best SA using SWORD was the display for which pilots expressed a strong preference. It is difficult to ascertain whether subjective preference led to higher SWORD ratings of SA, or vice versa. Further research is needed to determine the locus of such ratings.
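The step that turns pairwise comparative ratings into a linear ordering can be illustrated with a short sketch. It assumes a reciprocal judgment matrix and uses the row geometric-mean approximation to the analytic hierarchy process priority vector; the function name, the three display concepts, and the judgment values are hypothetical and are not taken from Vidulich (1989) or Hughes et al. (1990).

```python
import math


def ahp_priorities(judgments):
    """Approximate AHP priority weights from a reciprocal pairwise matrix.

    judgments[i][j] > 1 means concept i was judged to provide more SA than
    concept j by that factor; judgments[j][i] is expected to be its reciprocal.
    The row geometric mean is an assumed approximation, not the paper's
    stated procedure.
    """
    n = len(judgments)
    geo_means = [math.prod(row) ** (1.0 / n) for row in judgments]
    total = sum(geo_means)
    return [g / total for g in geo_means]


# Hypothetical judgments for three display concepts (index 0, 1, 2).
matrix = [
    [1.0, 3.0, 5.0],
    [1.0 / 3.0, 1.0, 2.0],
    [1.0 / 5.0, 1.0 / 2.0, 1.0],
]
weights = ahp_priorities(matrix)

# Linear ordering of the concepts, best rated first.
ranking = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
```

The weights sum to 1, so they can be read as each concept's share of the overall subjective preference; they say nothing about whether that preference reflects SA or some other attribute of the display.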

Observer-rating. A second type of subjective rating involves using independent, knowledgeable observers to rate the quality of a subject's SA. A trained observer might have more information than the operator about what is really happening in a simulation (through perfect knowledge gleaned from the simulation computer). The observer would have limited knowledge, however, about what the operator's concept of the situation is. The only information about the operator's perception of the situation would come from operator actions and imbedded or elicited verbalizations by the operator (e.g., from voice transmissions during the course of the task or from verbal protocols explicitly requested by the experimenter). Although this knowledge can be useful diagnostically by, for example, detecting overt errors in SA (stated misperceptions or lack of knowledge), it does not provide a complete representation of that knowledge. The operator may (and in all likelihood will) store information internally that is not verbalized. For instance, a pilot may discuss efforts to ascertain the identity of a certain aircraft, but knowledge about ownship system status, heading, other aircraft, and so on might not be mentioned at all. An outside observer has no way of knowing whether the pilot is aware of these variables but is not discussing them because they are not of immediate concern or whether the pilot has succumbed to attentional narrowing and is unaware of their status. As such, the rating of SA by outside observers is also limited.

A variation on this theme is to use a confederate who acts as an associate to the operator (e.g., another crew member or air traffic controller in the case of a commercial aircraft) and who requests certain information from the operator to encourage further verbalization, as has been suggested by Sarter and Woods (1991). In addition to this technique succumbing to the same limitations encumbering observer ratings, it may also serve to alter SA by artificially directing the subject's attention to certain parameters. Because the distribution of the subject's attention across the elements in the environment largely determines SA, this method probably does not provide an unbiased assessment of operator SA.

Questionnaires

Questionnaires allow for detailed information about subject SA to be collected on an element-by-element basis that can be evaluated against reality, thus providing an objective assessment of operator SA. This type of assessment is a more direct measure of SA (i.e., it taps into, rather than infers, the operator's perceptions) and does not require subjects or observers to make judgments about situational knowledge on the basis of incomplete information, as subjective assessments do. In a review of such questionnaires, Herrmann (1984) concluded that when perceptions can be evaluated on the basis of objective knowledge, this method has been found to have good validity. Several methods of administration are possible.

Posttest. A detailed questionnaire can be administered after the completion of each simulated trial. This allows ample time for subjects to respond to a lengthy and detailed list of questions about their SA during the trial, providing needed information about subject perceptions. Unfortunately, people are not good at reporting detailed information about past mental events, even recent ones; there is a tendency to overgeneralize and overrationalize. Recall is stilted by the amount of time and intervening events that occur between the activities of interest and the administration of the questionnaire (Nisbett and Wilson, 1977). Earlier misperceptions can be quickly forgotten as the real picture unfolds during the course of events. Therefore, a posttest questionnaire will reliably capture the subject's SA only at the very end of the trial.

On-line. One way of overcoming this deficiency is to ask operators about their SA while they are carrying out their simulated tasks. Unfortunately, this too has several drawbacks. First of all, in many situations the subject will be under very heavy workload, precluding the answering of additional questions. Such questions would constitute a form of ongoing secondary task loading that may alter performance on the main task of operating the system. Furthermore, the questions asked could cue the subject to attend to the requested information on the displays, thus altering the operator's true SA. An assessment of time to answer as an indicator of SA is also faulty, as subjects may employ various time-sharing strategies between the dual tasks of operating the system and answering the questions. Overall, this method will be highly intrusive on the primary task of system operation.

Freeze technique. To overcome the limitations of reporting on SA after the fact, several researchers have used a technique in which the simulation is frozen at randomly selected times and subjects are queried as to their perceptions of the situation at the time (Endsley, 1987, 1988, 1989a; Fracker, 1990; Marshak, Kuperman, Ramsey, and Wilson, 1987). With this technique, the system displays are blanked and the simulation is suspended while subjects quickly answer questions about their current perceptions of the situation. Thus SA data can be collected immediately, which reduces the problems incurred when collecting data after the fact but does not incur the problems of on-line questioning. Subjects' perceptions are then compared with the real situation according to the simulation computer database to provide an objective measure of SA.
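The comparison of reported perceptions with the simulation database can be sketched as follows. This is only an illustration under stated assumptions: the tolerance band, the function names, and the altitude values are hypothetical, and the paper does not specify how close a response must be to count as correct.

```python
def score_query(response, truth, tolerance):
    """Return True if a scalar response lies within +/- tolerance of the
    value recorded by the simulation at the moment of the freeze.
    The tolerance band is an assumed scoring rule."""
    return abs(response - truth) <= tolerance


def percent_correct(responses, truths, tolerance):
    """Percentage of queries answered within tolerance at one freeze."""
    scored = [score_query(r, t, tolerance) for r, t in zip(responses, truths)]
    return 100.0 * sum(scored) / len(scored)


# Hypothetical altitude reports (feet) for three aircraft at one freeze,
# paired with the values logged by the simulation computer.
reported = [20000, 35000, 12000]
actual = [21000, 30000, 12500]
sa_score = percent_correct(reported, actual, tolerance=2000)
```

Because the reference values come from the simulation rather than from the subject or an observer, the resulting score does not depend on anyone's judgment of how well the situation was understood.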

In a study evaluating competing aircraft display concepts, Marshak et al. (1987) queried subjects about navigation, threats, and topography using this technique. The resulting answers were converted to an absolute percentage error score for each question, allowing scores across displays to be compared along these dimensions. Fracker (1990), in another aircraft study, requested information on aircraft location and identity for specifically indicated aircraft in the simulation. Mogford and Tansley (1991) collected data relevant to an air traffic control task. In each study, a measure of SA on only selected parameters was obtained.
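An absolute percentage error score of the kind described above might be computed as in the following sketch. The formula shown is the conventional definition and is assumed here; the record layout, display labels, and numbers are hypothetical and do not reproduce the scoring actually used by Marshak et al. (1987).

```python
from collections import defaultdict


def absolute_percentage_error(response, truth):
    """Conventional absolute percentage error: 100 * |response - truth| / |truth|."""
    if truth == 0:
        raise ValueError("percentage error is undefined for a true value of 0")
    return 100.0 * abs(response - truth) / abs(truth)


def mean_error_by_display(records):
    """Average the error score over all answers collected under each display.

    records: iterable of (display, response, truth) tuples.
    """
    errors = defaultdict(list)
    for display, response, truth in records:
        errors[display].append(absolute_percentage_error(response, truth))
    return {display: sum(vals) / len(vals) for display, vals in errors.items()}


# Hypothetical answers to one query under two competing displays.
records = [("A", 90, 100), ("A", 120, 100), ("B", 50, 100)]
summary = mean_error_by_display(records)
```

Expressing every answer on the same percentage scale is what allows queries with different units to be compared across displays.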

The Situation Awareness Global Assessment Technique is a global tool developed to assess SA across all of its elements based on a comprehensive assessment of operator SA requirements (Endsley, 1987, 1988, 1990a). As a global measure, SAGAT includes queries about all SA requirements, including Level 1, 2, and 3 components, and considers system functioning and status and relevant features of the external environment.

Computerized versions of SAGAT have been developed for air-to-air tactical aircraft (Endsley, 1990c) and advanced bomber aircraft (Endsley, 1989a). These versions allow queries to be administered rapidly and SA data to be collected for analysis. The SAGAT tool features an easy-to-use query format that was designed to be as compatible as possible with subject knowledge representations. An example of a SAGAT query is shown in Figure 1. SAGAT's basic methodology is generic and applicable to other types of systems once a delineation of SA requirements has been made in order to construct the queries.


Figure 1. Sample SAGAT query ("Designate aircraft's altitude"), answered on an altitude scale in feet running from 0 K to 70 K.

This global approach has an advantage over probes that cover only a limited number of SA items in that subjects cannot prepare for the queries in advance because queries could address nearly every aspect of the situation to which subjects would normally attend. Therefore, the chance of subjects' attention being biased toward specific items is minimized. In addition, the global approach overcomes the problems incurred when collecting data after the fact and minimizes subject SA bias resulting from secondary task loading or artificially cuing the subject's attention. It also provides a direct measure of SA that can be objectively collected and objectively evaluated. Finally, the use of random sampling (both of stop times and query presentation) provides unbiased estimates of SA, thus allowing SA scores to be easily compared statistically across trials, subjects, and systems. The primary disadvantage of this technique involves the temporary halt in the simulation.
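The two kinds of random sampling mentioned above, of stop times and of query presentation, can be sketched as follows. The function name, the trial length, and the query labels are hypothetical; the sketch assumes a uniformly distributed stop time and a fresh random query order at every stop, which is one plausible reading of the procedure rather than a description of the SAGAT software.

```python
import random


def plan_stop(trial_length_s, queries, rng):
    """Pick a random freeze time and a random presentation order.

    trial_length_s: assumed maximum trial duration in seconds.
    queries: list of query identifiers, each presented once per stop.
    rng: a random.Random instance, passed in so plans are reproducible.
    """
    stop_time_s = rng.uniform(0.0, trial_length_s)
    order = rng.sample(queries, k=len(queries))  # shuffled copy, no repeats
    return stop_time_s, order


# Hypothetical query battery and a seeded generator for repeatability.
queries = ["altitude", "heading", "threat_type", "fuel_state"]
rng = random.Random(1995)
stop_time_s, order = plan_stop(600.0, queries, rng)
```

Because neither the moment of the freeze nor the next question can be anticipated, subjects have no basis for attending selectively to the items that will be probed.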

I conducted two studies to investigate whether or not this imposes undue intrusiveness and to determine the best method for implementing this technique. In the first study I wanted to specifically determine how long after a freeze in the simulation SA information could be obtained. In the second study I investigated whether temporarily freezing the simulation would result in any change in subject behavior, thus specifically evaluating potential intrusiveness.

EXPERIMENT 1

In determining whether SA information is reportable via the SAGAT methodology, several possibilities must be considered. First, data may be processed by subjects in short-term memory (STM), never reaching long-term memory (LTM). If a sequential information-processing model is used, then it is possible that information might enter into STM and never be stored in LTM, where it would be available for retrieval during questioning. In this case, information would not be available during any SAGAT query sessions that exceeded the STM storage limitations (approximately 30 s with no rehearsal).

There is a good deal of evidence, however, that STM does not precede LTM but constitutes an activated subset of LTM (Cowan, 1988; Morton, 1969; Norman, 1968). According to this type of model, information proceeds directly from sensory memory to LTM, which is necessary for pattern recognition and coding. Only those portions of the environment that are salient are then highlighted in STM (either through focalized attention or automatic activation). This type of model would predict that SA information that has been perceived and/or further processed by the operator would exist in LTM stores and thus be available for recall during SAGAT querying that exceeds 30 s.

Second, data may be processed in a highly automated fashion and thus not be in the subject's awareness. Expert behavior can function in an automated processing/action sequence in some cases. Several authors have found that even when effortful processing is not used, however, the information is retained in LTM and is capable of affecting subject responses (Jacoby and Dallas, 1981; Kellog, 1980; Tulving, 1985). The type of questions used in SAGAT, providing cued recall and categorical or scalar responses, should be receptive to retrieval of this type of information. In addition, a review of the literature indicates that even under these circumstances, the product of the automatic processes (as a state of knowledge) is available, even though the processes themselves may be difficult to verbalize (Endsley, 1995, this issue).

Third, the information may be in LTM but not easily recalled by the subjects. Evidence suggests that when effortful processing and awareness are used during the storage process, recall is enhanced (Cowan, 1988). SA, composed of highly relevant, attended to, and processed information, should be most receptive to recall. In addition, the SAGAT battery, which requires categorical or scalar responses, is a cued, as opposed to total, recall task, thus aiding retrieval. Under conditions of SAGAT testing, subjects are aware that they may be asked to report their SA at any time. This, too, may aid in the storage and retrieval process. Because the SAGAT battery is administered immediately after the freeze in the simulation, no time for memory decay or competing event interference is allowed. Thus conditions should be optimized for the retrieval of SA information. Although it cannot be said conclusively that all of a subject's SA can be reflected in this manner, the vast majority should be reportable via SAGAT.

To further investigate this matter, I conducted a study to specifically determine how long after a freeze in the simulation SA information could be obtained. I expected that if collection of SA data via this technique was memory limited, this would be evidenced by an increase in errors on SAGAT queries that occurred approximately 30 to 60 s after the freeze time because of short-term memory restrictions. (Although this would not preclude use of the technique, it would limit its use by restricting the number of questions that could be asked at each stop.)

Procedure

A set of air-to-air engagements was conducted in a high-fidelity aircraft simulation facility. A fighter sweep mission with a force ratio of two (blue team) versus four (red team) was used for the trials. The objective of the blue team was to penetrate red territory, maximizing the kills of red fighters while maintaining a high degree of survivability. The red team was directed to fly around their assigned combat air patrol (CAP) points until a blue target was detected in red airspace. The red team was then allowed to leave its CAP point to defend against the blue team. In all cases, specific tactics were at the discretion of the individual pilot teams.

A total of 15 trials were completed by two teams of six subjects. At a random point in each trial, the simulator was frozen and SAGAT data were immediately collected from all six participants. At a given stop, all of the queries were presented once in random order. As this order was different at each stop, each query was presented a variety of times after the stop (across subjects and trials). Therefore, performance accuracy on each question could be evaluated as a function of the amount of time elapsed prior to its presentation. After all subjects had completed the SAGAT battery at a given stop, a new trial was begun.
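How a fresh random order at each stop spreads every query over different delays can be sketched in a few lines of Python. This is only an illustration, not the software used in the study: the function name `presentation_schedule`, the query names, and the per-query answering times are all hypothetical.

```python
import random


def presentation_schedule(queries, answer_seconds, rng):
    """Return [(query, seconds_elapsed_since_freeze), ...] for one stop.

    queries: names of the queries in the battery.
    answer_seconds: hypothetical mapping of query name to the time
        (in seconds) a subject spends answering it.
    rng: a random.Random instance, so an order can be reproduced.
    """
    order = list(queries)
    rng.shuffle(order)  # a new random order is drawn at every stop
    schedule = []
    elapsed = 0.0
    for query in order:
        # Delay between the freeze and the moment this query is shown.
        schedule.append((query, elapsed))
        elapsed += answer_seconds[query]
    return schedule


if __name__ == "__main__":
    battery = ["own heading", "altitude", "fuel level"]  # hypothetical subset
    seconds = {"own heading": 12.0, "altitude": 9.0, "fuel level": 15.0}
    for stop in range(2):
        print(presentation_schedule(battery, seconds, random.Random(stop)))
```

Pooling the (query, delay) pairs across stops, subjects, and trials gives, for each query, a set of responses observed at many different delays, which is what allows accuracy to be examined as a function of elapsed time.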

Prior to conducting the study, all subjects were trained on the use of the simulator, the displays, aircraft handling qualities, and SAGAT. In addition to three instructional training sessions on using SAGAT, each subject participated in 18 practice trials in which SAGAT was administered. (Most subjects also had received a substantial amount of training in the simulator in the past.) Thus subjects were well trained prior to testing.

Facilities. A high-fidelity, real-time simulation facility was used for testing. Aircraft control systems, avionics systems, weapons systems, crew stations, and air vehicle performance were modeled to provide realistic aircraft control characteristics. A Gould mainframe computer controlled the simulations and drove Silicon Graphics-generated high-resolution color graphics displays. This test used six pilot stations, each including a simulated head-up display, a tactical situation display, radar, and system controls operated by a touch screen or stick and throttle control switches. A realistic stick and throttle provided primary flight control.

Subjects. Twelve experienced former military fighter pilots participated in the test. The mean subject age was 48.16 years (range of 32 to 68). Subjects had an average of 3,310 h (range 1,500-6,500) and an average of 15.5 years (range 7-26) of military flight experience. Of the 12 subjects, 7 had combat experience.

Results

Each of the subjects' answers to the SAGAT queries was compared with actual values, as collected by the simulator computer at the time of each stop. Included in the SAGAT battery at the time of the test were 26 queries. Of those, 11 could not be evaluated because the appropriate simulator data were not available, and 5 could be evaluated only by subjective means. The 10 remaining query answers concerned ownship heading, ownship location, aircraft heading, aircraft detections, aircraft airspeed, aircraft weapon selection, aircraft Gs, aircraft fuel level, aircraft weapon quantity, and aircraft altitude.

An error score (SAGAT query answer minus actual value) was computed for each response. Absolute error scores for each query (across all subjects and trials) were plotted against the amount of time elapsed between the stop in the simulation and the query, and a regression was calculated. F tests for the regressions and Pearson's r for eight of the queries are presented in Table 1. Two of the queries provided a categorical response; therefore, chi-square tests were performed and are presented in Table 2.
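The regression analysis described above can be sketched as follows. This is a minimal illustration in plain Python that assumes an ordinary least-squares line of absolute error on elapsed time; the function name `regress_error_on_delay` and the sample numbers are hypothetical and are not data from the study.

```python
def regress_error_on_delay(delays, abs_errors):
    """Least-squares regression of absolute error on query delay.

    Returns (slope, r_squared, f_statistic, df_error). The F test for
    the regression has 1 and df_error = n - 2 degrees of freedom.
    """
    n = len(delays)
    if n < 3 or n != len(abs_errors):
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(delays) / n
    mean_y = sum(abs_errors) / n
    sxx = sum((x - mean_x) ** 2 for x in delays)
    syy = sum((y - mean_y) ** 2 for y in abs_errors)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(delays, abs_errors))
    if sxx == 0 or syy == 0:
        raise ValueError("delays and errors must both vary")
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy)
    df_error = n - 2
    if r_squared >= 1.0:
        f_statistic = float("inf")  # perfect fit, no residual variance
    else:
        f_statistic = r_squared / (1.0 - r_squared) * df_error
    return slope, r_squared, f_statistic, df_error


if __name__ == "__main__":
    # Hypothetical delays (s) and absolute altitude errors (ft).
    delays = [5.0, 40.0, 90.0, 150.0, 220.0, 300.0]
    errors = [1200.0, 800.0, 1500.0, 900.0, 1100.0, 1000.0]
    print(regress_error_on_delay(delays, errors))
```

A slope near zero with a small, nonsignificant F is the pattern that would indicate no growth in error as the delay after the freeze increases.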

None of the regressions or chi-square tests computed for each of the 10 queries was significant (or even close to significant), indicating that subjects were neither more nor less prone to response error as the amount of time between the simulator freeze and the presentation of the query increased. A plot of the regression for altitude error, shown in Figure 2, reveals little or no increase in error over time. (Plots of the regressions for the other variables appeared quite similar.)

Discussion

Based on these data, it would appear that subjects under these conditions were able to provide information on their SA about a particular situation for up to 5 or 6 min. The fact that all 10 of the queries produced flat regressions lends extra weight to this conclusion.

Two explanations can be offered for these findings. First, this study investigated subjects who were expert in particular tasks during a realistic simulation of those tasks. The information subjects were asked to report was important to performance of those tasks. Most laboratory studies that predict fairly rapid decay times (approximately 30 s for short-term memory) typically employ the use of stimuli that have little or no inherent meaning to the subject (nonsense words or pictures). The storage and utilization of relevant information may be quite different from that of irrelevant information (Chase and Simon, 1973).

Second, the results indicate that the SA information was obtainable from long-term memory stores. If schemata or other mechanisms are used to organize SA information (as opposed to working memory processes only), then that information will be resident in long-term memory. Many of the 10 items analyzed can be considered Level 1 SA components. The fact that this lower-level information was resident in LTM indicates that either the inputs to higher-level processing were retained as well as the outputs or the Level 1 components were retained as important pieces of pilot SA and are significant components of LTM schemata (e.g., target altitude itself is important to know and not just its implications). Both of these explanations may be correct. These findings generally support the predictions of a processing model in which information passes into LTM storage before being highlighted in STM.

As a caveat, note that subjects were actively working with their SA knowledge by answering the SAGAT queries for the entire period that the simulation was stopped. No intervening period of waiting or any competing activity was introduced prior to administering any SAGAT query. Subject knowledge of SA information may be interfered with if time delays or other activities (particularly continued operational tasks) are imposed before SAGAT is administered. The major implication of these results is that under these conditions, SA data are readily obtainable through SAGAT for a considerable period after a stop in the simulation (up to 5 or 6 min).

TABLE 1

Tests on Regressions of Time of Query Presentation on Query Accuracy

Query              df        F       p       r²
Own heading        1, 19     0.021   0.887   0.001
Own location       1, 19     0.940   0.334   0.047
Aircraft heading   1, 95     0.040   0.834   0.000
G level            1, 90     0.500   0.480   0.006
Fuel level         1, 95     1.160   0.284   0.012
Weapon quantity    1, 92     0.001   0.981   0.000
Altitude           1, 96     0.002   0.965   0.000
Detection          1, 524    0.041   0.840   0.000

TABLE 2

Chi-Square Tests of Time Category of Query Presentation on Query Accuracy

Query              χ²      df           p
Weapon selection   0.16    5 (N = 97)   >0.995
Airspeed           0.13    5 (N = 98)   >0.995

Figure 2. Altitude error by time until query presentation. (Horizontal axis: time in seconds.)

EXPERIMENT 2

A second study was initiated to address the issue of possible intrusiveness. It is necessary to determine whether temporarily freezing the simulation results in any change in subject behavior. If subject behavior were altered by such a freeze, certain limitations would be indicated. (One might wish to not resume the trial after the freeze, for instance, but start a new trial instead.) The possibility of intrusiveness was tested by evaluating the effect of stopping the simulator on subsequent subject performance.

Procedure

A set of air-to-air engagements was conducted of a fighter sweep mission with a force ratio of two (blue team) versus four (red team). The training, instructions, and pilot mission objectives were identical to those used in Experiment 1. In this study, however, the trial was resumed after a freeze following a specified period for collecting SAGAT data and was continued until specified criteria for completion of the mission were met. Subjects completed as many queries as they could during each stop. The queries were presented in random order.

Five teams of six subjects completed a full test matrix. The independent variables were duration of the stops (1/2, 1, or 2 min) and the frequency of stops (1, 2, or 3 times during the trial). Each team participated twice in each of these nine conditions. (In any given trial, multiple stops were of the same duration.) Each team also completed six trials in which no stops occurred, as a control condition. Therefore, a total of 30 trials were conducted for each duration of stop condition, and a total of 30 trials were conducted for each frequency of stop condition. These conditions were compared with 30 trials in which no stops occurred. Conditions were administered in random order, blocked by team. Pilot performance was collected as the dependent measure.

Facilities. This study used the same piloted, multiple-engagement simulator as in Experiment 1.

Subjects. Twenty-five experienced former military fighter pilots participated in this test. (Four of the subjects participated on more than one team.) The mean subject age was 45.16 years (range of 32 to 68). Subjects had an average of 3,582 h (range of 975 to 7,045) and an average of 16.9 years (range of 6 to 27) of military flight experience. Of the 25 subjects, 14 had combat experience.

Results

Pilot performance under each of the conditions was analyzed. Pilot performance measures included the number of blue team kills (red team losses) and the number of blue team losses (red team kills). Chi-square tests were performed on the number of blue team kills, χ²(4, N = 120) = 5.973, p > .05, and blue team losses, χ²(2, N = 120) = 0.05, p > .05, comparing between trials in which there were stops to collect SAGAT data and those in which there were no stops, as depicted in Figures 3 and 4. There was no significant difference in pilot performance on either measure.

Analysis of variance was used to evaluate the effect of number of stops and duration of stops on each of the two performance measures: blue team kills and blue team losses. The number of stops during the trial had no significant effect on either pilot performance measure, F(3, 116) = 1.73, p > .05, β = .55, and F(3, 116) = 0.20, p > .05, β > .60, respectively, as shown in Figures 5 and 6. The duration of the stop also did not significantly affect either performance measure, F(3, 116) = 2.16, p > .05, β = .40, and F(3, 116) = 0.77, p > .05, β > .60, respectively, as depicted in Figures 7 and 8. In viewing the data, no linear trend appears to be present in either case that would indicate a progressively worse (albeit nonsignificant) effect of increasing number or duration of stops. This indicates that stops to collect SAGAT data (as many as 3 for up to 2 min in duration) did not have a significant effect on subject performance.

Figure 3. Blue kills: Trials with stops versus trials without stops.

Figure 4. Blue losses: Trials with stops versus trials without stops.

Figure 5. Blue kills by number of stops in trial.

Figure 6. Blue losses by number of stops in trial.

Figure 7. Blue kills by duration of stops in trial.

Figure 8. Blue losses by duration of stops in trial.

Discussion

The lack of a significant influence of this procedure on performance probably rests on the fact that relevant schemata are actively used by subjects during the entire freeze period. Under these conditions, the subjects' SA does not have a chance to decay before the simulation is resumed (as was indicated by Experiment 1). Thus SA is fairly intact when the simulation continues, allowing subjects to proceed with their tasks where they left off. Subjectively, subjects did fairly well with this procedure and were able to readily pick up the simulation at the point at which they left off at the time of the freeze, sometimes with the same sentence they had started before the stop. On many occasions, subjects could not even remember if they had been stopped to collect SA data during the trial, also indicating a certain lack of intrusiveness.

Summary

The use of a temporary freeze in the simulation to collect SA data is supported by Experiments 1 and 2. Subjects were able to report their SA using this technique for as long as 5 or 6 min without apparent memory decay, and the freeze did not appear to be intrusive on subject performance in the simulation, allaying several concerns about the technique. Although it is always difficult to establish no effect of one variable on another (i.e., prove the null hypothesis), it is reassuring that the finding of no intrusion on performance has been repeated in numerous other studies in which SAGAT was used to evaluate human-machine interface concepts (Bolstad and Endsley, 1990; Endsley, 1989b; Northrop, 1988) if the timing of the stop was unpredictable to the subjects (Endsley, 1988). This finding was also repeated in one study in which pilots flew in a simulation against computer-controlled enemy aircraft (Endsley, 1990d), helping to rule out the possibility that the finding of no effect on performance in the present study could have been the result of pilots on both sides of the simulation being equally influenced.

SAGAT has thus far proven to meet the stated criteria for measurement. In addition to the present studies establishing a level of empirical validity, SAGAT has been shown to have predictive validity (Endsley, 1990b) and content validity (Endsley, 1990d). The method is not without some costs, however, as a detailed analysis of SA requirements is required in order to develop the battery of queries to be administered. On the positive side, this analysis can also be extremely useful for guiding design efforts.

EXAMPLE

To proffer an example of the type of SA information provided by SAGAT, I will present a study that evaluates the use of a three-dimensional display for providing radar information to fighter pilots. The study was conducted in the same high-fidelity, real-time simulator as in the prior two experiments. During the study each of six experienced pilot subjects flew a fighter sweep mission against a field of four digitally controlled enemy aircraft. Each subject completed five trials in each of five display conditions. The initial starting conditions and order of presentation of the display conditions were randomized. Each trial was halted to collect SAGAT data at a randomly selected time between 2.5 and 5 min into each trial. Immediately following the collection of SAGAT data, the Subjective Workload Assessment Technique (SWAT) was also administered. In this study the trial was not resumed after the collection of SAGAT and SWAT data.

The five display conditions were constructed to evaluate the effect of a three-dimensional (3D) display concept. The 3D display provided a grid representing the ground, above which aircraft targets detected by the radar were presented by a small triangle pointing in the direction of the aircraft's heading and connected to the grid by a line corresponding to the aircraft's altitude. The plane the subject was flying was presented in the center of this display so that each target's location and altitude were spatially represented relative to the subject's ownship. Four rotations of the grid (0, 30, 45, and 60 deg) were investigated (each providing a different perspective on the scene), as shown in Figure 9. At 0-deg rotation, the grid portrayed a God's-eye view of the scene, thus depicting a traditional two-dimensional display. In addition, these four views were compared with a two-display configuration in which a God's-eye view display was supplemented with a second profile display that pictorially presented altitude. In all conditions, altitude of the aircraft was also presented numerically next to the aircraft symbols.

Following the trials, the collected SAGAT and SWAT data were analyzed. Although the final SAGAT battery for fighter aircraft has 40 queries (Endsley, 1990c), only 23 were administered at the time of the test (as the battery was still in development). The results of 4 pertinent queries are presented here. The subjects' reported knowledge of the range, azimuth, altitude, and heading of the enemy targets was compared with the actual values of these elements at the time of the freeze in the simulation. The subjects' perceptions were evaluated as correct or incorrect (using tolerance values that had been determined to be operationally relevant).

The percentage correct across subjects for each condition on each of these four queries is shown in Figure 10. As the scored data are binomial, an arcsine transformation was applied and an ANOVA performed. The results showed that the display type had a significant effect on pilot SA of the aircraft's range, F(4, 288) = 2.575, p < .05, azimuth, F(4, 278) = 4.125, p < .05, and heading, F(4, 139) = 3.040, p < .05, but not altitude, F(4, 94) = 1.83, p > .05. The 3D display, when rotated at 45 or 60 deg, provided significantly lower SA on target range than when at 0 or 30 deg and significantly lower SA of target heading than of any of the other three conditions. A rotation of 60 deg also provided significantly lower SA of target azimuth than did the other four conditions. Although the ANOVA for target altitude was not significant, there was an opposite trend for the highly rotated 3D display to provide more SA of target altitude than did rotations of 0 and 30 deg. The SWAT data supported these findings, with significantly higher ratings for workload in the 45- and 60-deg rotations than in the 0- or 30-deg rotations or the two-dimensional display.

Figure 9. Two-dimensional display and three-dimensional display at four levels of rotation. Panels: (a) 2D display; (b) 3D display, 0° rotation; (c) 3D display, 30° rotation; (d) 3D display, 45° rotation; (e) 3D display, 60° rotation.

A major point to be gleaned from this example is the importance of examining the impact of a given design across an operator's SA requirements. The proposed 3D display was designed to provide pilots with better information about aircraft altitude, and it may have done so, although the trend was not significant. However, this came at the expense of SA on location, azimuth, and heading. If a study had been performed that merely examined subjects' knowledge of aircraft altitude, this important information would have been lost. The SAGAT data also provided important diagnostic information that was useful in understanding the exact effect the design concept had on pilot SA in order to make future iterations.

Figure 10. Situation awareness of target range, heading, azimuth, and altitude in each display condition. Panels: (a) Knowledge of Target Range; (b) Knowledge of Target Heading; (c) Knowledge of Target Azimuth; (d) Knowledge of Target Altitude. Each panel plots percentage correct (0 to 100) for the 2-D display and the 3-D display at 0°, 30°, 45°, and 60° rotation.

IMPLEMENTATION RECOMMENDATIONS

Several recommendations for SAGAT administration can be made based on previous experience in using the procedure.

Training

An explanation of the SAGAT procedures and detailed instructions for answering each query should be provided to subjects before testing. Several training trials should be conducted in which the simulator is halted frequently to allow subjects ample opportunity to practice responding to the SAGAT queries. Usually three to five samplings are adequate for a subject to become comfortable with the procedure and to clear up any uncertainties in how to answer the queries.

Test Design

SAGAT requires no special test considerations. The same principles of experimental design and administration apply to SAGAT as to any other dependent measure. Measures of subject performance and workload may be collected concurrently with SAGAT, as no ill effect from the insertion of breaks has been shown. To be cautious, however, if performance measures are to be collected simultaneously with SAGAT data, half of the trials should be conducted without any breaks for SAGAT in order to provide a check for this contingency.

Procedures

Subjects should be instructed to attend to their tasks as they normally would, with the SAGAT queries considered as secondary. No displays or other visual aids should be visible while subjects are answering the queries. If subjects do not know or are uncertain about the answer to a given query, they should be encouraged to make their best guess. There is no penalty for guessing, allowing for consideration of the default values or other wisdom gained from experience that subjects normally use in decision making. If subjects do not feel comfortable enough to make a guess, they may go on to the next question. Talking or sharing of information among subjects should not be permitted. If multiple subjects are involved in the same simulation, all subjects should be queried simultaneously and the simulation resumed for all subjects at the same time.

Which Queries to Use

Random selection. As it may be impossible to query subjects about all of their SA requirements in a given stop because of time constraints, a portion of the SA queries may be randomly selected and asked each time. A random sampling provides consistency and statistical validity, thus allowing SA scores to be easily compared across trials, subjects, systems, and scenarios.

Because of attentional narrowing or lack of information, certain questions may seem unimportant to a subject at the time of a given stop. It is important to stress that subjects should attempt to answer all queries anyway. This is because (a) even though they think it unimportant, the information may have at least secondary importance; (b) they may not be aware of information that makes a question very important (e.g., the location of a pop-up aircraft); and (c) if only questions of the highest priority were asked, subjects might be inadvertently provided with artificial cues about the situation that will direct their attention when the simulation is resumed. Therefore, a random selection from a constant set of queries is recommended at each stop.

Experimenter controlled. In certain tests it may be desirable to have some queries omitted because of limitations of the simulation or characteristics of the scenarios. For instance, if the simulation does not incorporate aircraft malfunctions, the query related to this issue may be omitted. In addition, with particular test designs it may be desirable to ensure that certain queries are presented every time. When this occurs, subjects should also be queried on a random sampling from all SA requirements and not only on those related to a specific area of interest to the evaluation being conducted. This is because of the ability of subjects to shift attention to the information on which they know they will be tested. What may appear to be an improvement in SA in one area may be merely a shift of attention from another area. When the SAGAT queries cover all of the SA requirements, no such artificial cuing can occur.
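The two selection policies above (a random draw from a constant battery, optionally combined with queries the experimenter wants presented every time) can be sketched as follows. The function name `select_queries`, its parameters, and the query names are hypothetical; this is an illustration of the recommendation, not the original SAGAT software.

```python
import random


def select_queries(battery, n_random, rng, always_ask=()):
    """Choose the queries to present at one stop.

    battery: the constant, full set of SA queries (list of names),
        already stripped of any queries the simulation cannot support.
    n_random: how many queries to draw at random for this stop.
    rng: a random.Random instance.
    always_ask: queries the experimenter wants presented every time.
    Returns the chosen queries in a random presentation order.
    """
    always = [q for q in battery if q in always_ask]
    pool = [q for q in battery if q not in always_ask]
    # The random part is drawn from the rest of the battery, so the
    # stop still samples across the SA requirements.
    drawn = rng.sample(pool, min(n_random, len(pool)))
    chosen = always + drawn
    rng.shuffle(chosen)  # fixed queries do not always appear first
    return chosen


if __name__ == "__main__":
    battery = ["own heading", "own location", "altitude",
               "fuel level", "weapon quantity", "airspeed"]
    print(select_queries(battery, 3, random.Random(7),
                         always_ask=("altitude",)))
```

Shuffling the combined list is a design choice of this sketch: it keeps the mandatory queries from standing out by position, in the same spirit as avoiding artificial cues.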

When to Collect SAGAT Data

It is recommended that the timing of each freeze for SAGAT administration be randomly determined and unpredictable so that subjects cannot prepare for queries in advance. If the freeze occurrence is covertly associated with the occurrence of specific events or at specific times across trials, prior studies have shown that the subjects will be able to figure this out (Endsley, 1988), allowing them to prepare for queries or actually improve SA through the artificiality of the freeze cues. An informal rule has been to ensure that no freezes occur earlier than 3 min into a trial, to allow subjects to build up a picture of the situation, and that no two freezes occur within 1 min of each other.

The result of this approach is that the activities occurring at the time of the stops will be randomly selected. Some stops may occur during very important activities that are of interest to the experimenter, others when no critical activities are occurring. This gives a good sampling of the subjects' SA in a variety of situations. During analysis the experimenter may want to stratify the data to take these variations into account.
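One way to draw freeze times that follow the informal rule (no freeze before 3 min, no two freezes within 1 min of each other) is simple rejection sampling, sketched below. The function name `draw_freeze_times` and its parameters are hypothetical, and the sketch assumes times are measured on the simulation clock, which does not advance while the simulator is frozen.

```python
import random


def draw_freeze_times(n_stops, trial_minutes, rng,
                      earliest=3.0, min_gap=1.0, max_tries=10000):
    """Return sorted random freeze times (minutes into the trial).

    Times are drawn uniformly between `earliest` and `trial_minutes`
    and redrawn until every pair is at least `min_gap` apart.
    """
    if earliest + (n_stops - 1) * min_gap > trial_minutes:
        raise ValueError("constraints cannot be met in a trial this short")
    for _ in range(max_tries):
        times = sorted(rng.uniform(earliest, trial_minutes)
                       for _ in range(n_stops))
        if all(b - a >= min_gap for a, b in zip(times, times[1:])):
            return times
    raise RuntimeError("no valid schedule found; relax the constraints")


if __name__ == "__main__":
    # Three stops in a 15-min trial, as in the largest condition
    # of Experiment 2.
    print(draw_freeze_times(3, 15.0, random.Random(42)))
```

Because the times are drawn independently of scenario events, the activity under way at each stop is itself a random sample, which is the property the recommendation relies on.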

How Much SAGAT Data to Collect

The number of trials necessary will depend on the variability present in the dependent variables being collected and the number of data samples taken during a trial. This will vary with different subjects and designs, but between 30 and 60 samplings per SA query with each design option have previously been adequate in a within-subjects test design.
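As a rough planning aid, the arithmetic behind this guideline can be written out. The helper below is a hypothetical sketch: it assumes each query has the same chance (`fraction_asked`) of being presented at any stop and counts trials summed over all subjects for one design option. It says nothing about variability, which the text notes also matters.

```python
import math


def trials_needed(target_samples, stops_per_trial, fraction_asked=1.0):
    """Trials per design option needed to expect `target_samples`
    samplings of a single query.

    fraction_asked: assumed share of the battery presented at each
        stop (1.0 if every query is asked at every stop).
    """
    if stops_per_trial < 1 or not 0 < fraction_asked <= 1:
        raise ValueError("need at least one stop and 0 < fraction <= 1")
    expected_per_trial = stops_per_trial * fraction_asked
    return math.ceil(target_samples / expected_per_trial)


if __name__ == "__main__":
    print(trials_needed(30, 3, 0.5))  # 30 samplings, 3 stops, half asked
    print(trials_needed(60, 1))       # 60 samplings, 1 stop, all asked
```

For comparison, the display study described earlier (six subjects, five trials per display condition, one stop per trial) provides at most 30 samplings per query in each condition, the lower end of the range given above.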

Multiple SAGAT stops may be taken within each trial. There is no known limit to the number of times the simulator can be frozen during a given trial. In Experiment 2, no ill effects were found from as many as three stops during a 15-min trial. In general, it is recommended that a stop last until a certain amount of time has elapsed and then the trial is resumed, regardless of how many questions have been answered. Stops as long as 2 min in duration were used with no undue difficulty or effect on subsequent performance. In Experiment 1, stops as long as 5 min were shown to allow subjects access to SA information without memory decay.

Data Collection

The simulator computer should be programmed to collect objective data corresponding to the queries at the time of each freeze. Considering that some queries will pertain to higher-level SA requirements that may be unavailable in the computer, an expert judgment of the correct answer may be made by an experienced observer who is privy to all information, reflecting the SA of a person with perfect knowledge. A comparison of subjects' perceptions of the situation (as input into SAGAT) with the actual status of each variable (as collected per the simulator computer and expert judgment) results in an objective measure of subject SA. Questions asked of the subject but not answered should be considered incorrect. No evaluation should be made of questions not asked during a given stop.

It is recommended that answers to each query be scored as correct or incorrect based on whether or not it falls into an acceptable tolerance band around the actual value. For example, it may be acceptable for a subject to be 10 miles per hour off of actual ground speed. This method of scoring poses less difficulty than dealing with absolute error (see Marshak et al., 1987). A tabulation of the frequency of correctness can then be made within each test condition for each SA element. Because data scored as correct or incorrect are binomial, the conditions for analysis of variance are violated. A correction factor, Y' = arcsine(Y), can be applied to binomial data, which allows analysis of variance to be used. In addition, a chi-square, Cochran's Q, or binomial t test (depending on the test design) can be used to evaluate the statistical significance of differences in SA between test conditions.
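The scoring rules in this section can be sketched as follows. The names `NOT_ASKED`, `score_response`, `proportion_correct`, and `arcsine_transform` are hypothetical. For the arcsine correction, the sketch assumes the common variance-stabilizing form arcsin(sqrt(p)) applied to a proportion correct; the text itself gives only Y' = arcsine(Y), so the square root is an assumption of this example.

```python
import math

NOT_ASKED = object()  # hypothetical marker: query not presented at this stop


def score_response(answer, actual, tolerance):
    """Score one numeric response against the simulator's value.

    Returns True or False for queries that were asked, and None for
    queries that were not asked (these are left out of the analysis).
    An asked but unanswered query (answer is None) counts as incorrect.
    """
    if answer is NOT_ASKED:
        return None
    if answer is None:
        return False
    return abs(answer - actual) <= tolerance


def proportion_correct(scores):
    """Proportion correct over evaluated responses (None is skipped)."""
    evaluated = [s for s in scores if s is not None]
    if not evaluated:
        raise ValueError("no evaluated responses")
    return sum(evaluated) / len(evaluated)


def arcsine_transform(p):
    """Assumed variance-stabilizing form: arcsin(sqrt(p)), in radians."""
    return math.asin(math.sqrt(p))


if __name__ == "__main__":
    # Ground speed scored with a tolerance of 10 miles per hour.
    scores = [
        score_response(455, 450, 10),        # within the band
        score_response(480, 450, 10),        # outside the band
        score_response(None, 450, 10),       # asked, not answered
        score_response(NOT_ASKED, 450, 10),  # not asked, excluded
    ]
    p = proportion_correct(scores)
    print(p, arcsine_transform(p))
```

The transformed proportions, one per subject and condition, would then be the values entered into the analysis of variance.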

Limitations and Applicability for Use

SAGAT has primarily been used within the confines of high-fidelity and medium-fidelity part-task simulations. This provides experimenter control over freezes and data collection without any danger to the subject or processes involved in the domain. It may be possible to use the technique during actual task performance if multiple operators are present to ensure safety. For example, it might be possible to verbally query one pilot in flight while another assumes flight control. Such an endeavor should be undertaken with extreme caution, however, and may not be appropriate for certain domains.

A recent effort (Sheehy, Davey, Fiegel, and Guo, 1993) employed an adaptation of this technique by making videotapes of an ongoing situation in a nuclear power plant control room. These tapes were then replayed to naive subjects with freezes for SAGAT queries employed. It is not known how different the SA of subjects passively viewing a situation may be from subjects actually engaged in task performance; however, this approach may yield some useful data.

Most known uses of SAGAT have involved fighter aircraft simulations. In general, it can be employed in any domain in which a reasonable simulation of task performance exists and an analysis of SA requirements has been made in order to develop the queries. SAGAT queries have been developed for advanced bomber aircraft (Endsley, 1990a) and recently for en route air traffic control (Endsley and Rodgers, 1994). Several researchers have begun to use the technique in evaluating operator SA in nuclear control room studies (Hogg, Torralba, and Volden, 1993; Sheehy et al., 1993). The technique has also been employed in a simulated control task to study adaptive automation (Carmody and Gluckman, 1993). Several new efforts are currently under way to employ the technique in medical decision making (E. Swirsky, personal communication, March 1994) and helicopters (C. Prince, personal communication, November 1993). Potentially it could also be used in studies involving automobile driving, supervisory control of manufacturing systems, teleoperations, and operation of other types of dynamic systems.

CONCLUSION

The use of SA metrics in evaluating the efficiency of design concepts, training programs, and other potential interventions is critical. Without careful evaluation, it will be difficult to make much progress in understanding the factors that influence SA, hindering human factors and engineering efforts. Several techniques have been reviewed here, each with several advantages and disadvantages. The use of one technique, SAGAT, was explored extensively, and two studies were presented that investigated the validity of the measure. It has to date proven a useful and viable technique for measuring SA in numerous contexts. Rigorous validity testing is needed for other techniques proposed for SA measurement. Unless the veracity of these techniques is established, interpreting their results will be hazardous.

In addition to establishing validity of measures of SA, more research is needed regarding the task of SA measurement itself. For instance, as pointed out by Pew (1991), no criteria exist at this time that establish the level of SA required for successful performance. Does an operator need to have SA that is 100% perfect (in both completeness and accuracy), or is some lesser amount sufficient for good performance? This is a complex issue. I would assert that SA can be seen as a factor that increases the probability of good performance but that it does not necessarily guarantee it, as other factors also come into effect (e.g., decision making, workload, performance execution, system capabilities, and SA of others in some cases). How much SA one needs, therefore, becomes a matter of how much probability of error one is willing to accept. Perhaps such criteria should more properly and usefully be established as the level of SA needed on each subcomponent at different periods in time. Some guidelines will eventually need to be specified if there is a desire to certify new system designs on the basis of SA.

Overall, the beginnings of the field of SA measurement have been established, allowing researchers in the area of SA to proceed from mainly speculation and anecdotal information to solid empirical findings. The tools for conducting further basic research on the SA construct and developing better system designs are available, allowing needed research on SA in a variety of arenas.

ACKNOWLEDGMENTS

This research was sponsored by the Northrop Corporation. I thank the many people at Northrop, both past and present, who contributed to this effort. In particular, I thank Vic Vizcarra for his assistance in conducting the studies, Paul Cayley and Gerry Armstrong for their programming efforts, and Mike Majors and Del Jacobs for their support in developing an SA research program. In addition, I am indebted to Mark Chignell, who provided helpful support throughout the development of the methodology. I also thank Mark Rodgers, Tom English, Dick Gilson, and several anonymous reviewers for their comments on an earlier version of this paper.


REFERENCES

Bolstad, C. A., and Endsley, M. R. (1990). Single versus dual scale range display investigation (NOR DOC 90-90). Hawthorne, CA: Northrop Corporation.

Carmody, M. A., and Gluckman, J. P. (1993). Task-specific effects of automation and automation failure on performance, workload, and situational awareness. In R. S. Jensen and D. Neumeister (Eds.), Proceedings of the Seventh International Symposium on Aviation Psychology (pp. 167-171). Columbus, OH: Ohio State University, Department of Aviation.

Chase, W. G., and Simon, H. A. (1973). Perceptions in chess. Cognitive Psychology, 4, 55-81.

Cowan, N. (1988). Evolving conceptions of memory storage, selective attention, and their mutual constraints within the human information-processing system. Psychological Bulletin, 104, 163-191.

Endsley, M. R. (1987). SAGAT: A methodology for the measurement of situation awareness (NOR DOC 87-83). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1988). Situation awareness global assessment technique (SAGAT). In Proceedings of the National Aerospace and Electronics Conference (NAECON) (pp. 789-795). New York: IEEE.

Endsley, M. R. (1989a). Final report: Situation awareness in an advanced strategic mission (NOR DOC 89-32). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1989b). Tactical simulation 3 test report: Addendum I, situation awareness evaluations (81203033R). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1990a). A methodology for the objective measurement of situation awareness. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 1/1-1/9). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Endsley, M. R. (1990b). Predictive utility of an objective measure of situation awareness. In Proceedings of the Human Factors Society 34th Annual Meeting (pp. 41-45). Santa Monica, CA: Human Factors and Ergonomics Society.

Endsley, M. R. (1990c). Situation awareness global assessment technique (SAGAT): Air-to-air tactical version user guide (NOR DOC 89-58, rev. A). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1990d). Situation awareness in dynamic human decision making: Theory and measurement. Unpublished doctoral dissertation, University of Southern California, Los Angeles, CA.

Endsley, M. R., and Rodgers, M. D. (1994). Situation awareness global assessment technique (SAGAT): En route air traffic control version user's guide (Draft). Lubbock: Texas Tech University.

Fracker, M. L. (1990). Attention gradients in situation awareness. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 6/1-6/10). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Herrmann, D. J. (1984). Questionnaires about memory. In J. E. Harris and P. E. Morris (Eds.), Everyday memory, action, and absent-mindedness (pp. 133-151). London: Academic.

Hogg, D. N., Torralba, B., and Volden, F. S. (1993). A situation awareness methodology for the evaluation of process control systems: Studies of feasibility and the implication of use (1993-03-05). Storefjell, Norway: OECD Halden Reactor Project.


Hughes, E. R., Hassoun, J. A., and Ward, F. (1990). Advanced tactical fighter (ATF) simulation support: Final report (CSEF-TR-ATF-90-91). Wright-Patterson Air Force Base, OH: Air Force Systems Command, Aeronautical Systems Division.

Jacoby, L. L., and Dallas, M. (1981). On the relationship between autobiographical memory and perceptual learning. Journal of Experimental Psychology: General, 110, 306-340.

Kellog, R. T. (1980). Is conscious attention necessary for long-term storage? Journal of Experimental Psychology: Human Learning and Memory, 6, 379-390.

Manktelow, K., and Jones, J. (1987). Principles from the psychology of thinking and mental models. In M. M. Gardiner and B. Christie (Eds.), Applying cognitive psychology to user-interface design (pp. 83-117). Chichester, England: Wiley.

Marshak, W. P., Kuperman, G., Ramsey, E. G., and Wilson, D. (1987). Situational awareness in map displays. In Proceedings of the Human Factors Society 31st Annual Meeting (pp. 533-535). Santa Monica, CA: Human Factors and Ergonomics Society.

McDonnell Douglas Aircraft Corporation. (1982). Advanced medium-range air-to-air missile operational utility evaluation (AMRAAM OUE) test final report. St. Louis, MO: Author.

Mogford, R. H., and Tansley, B. W. (1991). The importance of the air traffic controller's mental model. Presented at the Human Factors Society of Canada Annual Meeting, Canada.

Morton, J. (1969). Interaction of information in word recognition. Psychological Review, 76, 165-178.

Nisbett, R. E., and Wilson, T. D. (1977). Telling more than we can know: Verbal reports on mental processes. Psychological Review, 84, 231-259.

Norman, D. A. (1968). Towards a theory of memory and attention. Psychological Review, 75, 522-536.

Northrop Corporation. (1988). Tactical simulation 2 test report: Addendum I, situation awareness test results. Hawthorne, CA: Author.

Pew, R. W. (1991). Defining and measuring situation awareness in the commercial aircraft cockpit. In Proceedings of the Conference on Challenges in Aviation Human Factors: The National Plan (pp. 30-31). Washington, DC: American Institute of Aeronautics and Astronautics.

Sarter, N. B., and Woods, D. D. (1991). Situation awareness: A critical but ill-defined phenomenon. International Journal of Aviation Psychology, 1, 45-57.

Selcon, S. J., and Taylor, R. M. (1990). Evaluation of the situational awareness rating technique (SART) as a tool for aircrew systems design. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 5/1-5/8). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Sheehy, E. J., Davey, E. C., Fiegel, T. T., and Guo, K. Q. (1993, April). Usability benchmark for CANDU annunciation: Lessons learned. Presented at the ANS Topical Meeting on Nuclear Plant Instrumentation, Control, and Man-Machine Interface Technology, Oak Ridge, TN.

Smolensky, M. W. (1993). Toward the physiological measurement of situation awareness: The case for eye movement measurements. In Proceedings of the Human Factors and Ergonomics Society 37th Annual Meeting (p. 41). Santa Monica, CA: Human Factors and Ergonomics Society.

Taylor, R. M. (1990). Situational awareness rating technique (SART): The development of a tool for aircrew systems design. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 3/1-3/17). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Tulving, E. (1985). How many memory systems are there? American Psychologist, 40, 385-398.

Venturino, M., Hamilton, W. L., and Dvorchak, S. R. (1990). Performance-based measures of merit for tactical situation awareness. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 4/1-4/5). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.


Vidulich, M. A. (1989). The use of judgment matrices in subjective workload assessment: The subjective workload dominance (SWORD) technique. In Proceedings of the Human Factors Society 33rd Annual Meeting (pp. 1406-1410). Santa Monica, CA: Human Factors and Ergonomics Society.

Date received: January 26, 1993
Date accepted: October 4, 1994


nonintrusive. Computers for conducting system simulations can be programmed to record specified performance data automatically, making the required data relatively easy to collect. However, several limitations exist in using performance data to infer SA.

Global measures. Global measures of performance suffer from diagnosticity and sensitivity problems. If overall operator/system performance is the only criterion used for evaluating design concepts, important system differences can be masked, despite its being a useful bottom-line measure. Performance measures give only the end result of a long string of cognitive processes, providing little information about why poor performance may have occurred in a given situation (assuming it can be detected reliably during testing at all). Poor performance could occur from a lack of information, poor sampling strategies, improper integration or projection, heavy workload, poor decision making, or action errors, among other factors, many of which are not SA related. (For a discussion of the distinction among SA, decision making, and performance, see the previous Endsley paper in this issue.)

As stated earlier, overall system performance is often masked by other factors. In a tactical aircraft environment, for instance, much of pilot performance is by nature highly variable and subject to the influence of many other factors besides SA. A new system may provide the pilot with better SA, but in evaluation testing this fact can be easily masked by excessive workloads, the intentional use of varied tactics, or poor decision making if overall mission performance is used as the only dependent measure. It would be desirable, therefore, to measure SA more directly.

External task measures. One type of performance measure that has been suggested involves artificially changing certain information or removing certain pieces of information from operator displays and then measuring the amount of time required for the operator to react to this event (Sarter and Woods, 1991). Aside from the fact that such a manipulation is heavily intrusive, requiring the subject to undertake tasks involved with discovering what happened to the changed or missing data while attempting to maintain satisfactory performance on other tasks, this technique may provide highly misleading results. It assumes that an operator will act in a certain way when, in fact, operators often employ compensating schemes to function under such circumstances. For instance, if a displayed aircraft suddenly disappears, the operator may assume that equipment has malfunctioned, the aircraft was destroyed or landed, or the aircraft's emitting equipment was turned off, rendering it more difficult to detect (a not infrequent occurrence). In any case, the operator may choose to ignore the disappearance, assuming it will come back on the next several sweeps of the radar; worry about it but not say anything; or put off dealing with the disappearance until other tasks are complete. Any of these actions will yield highly misleading results for the experimenter who expects SA to be reflected by the operator's overt behavior.

This measurement technique not only has invalid assumptions, but it alters the subject's ongoing tasks, possibly affecting attention and thus SA itself. Anytime one artificially alters the realism of the simulation, it can fundamentally change the way the operator conceptualizes the underlying information (see Manktelow and Jones, 1987), thus altering both SA and decision making. In addition, such a manipulation would certainly interfere with any concurrent workload or performance measurement undertaken during testing.

Imbedded task measures. Some information about SA can be determined from examining performance on specific operator subtasks that are of interest. For example, when evaluating an altitude display, deviations from prescribed altitude levels or time to reach a certain altitude can be measured. This type of detailed performance measure can provide some inferences regarding the amount of SA that a display reveals about a specific parameter. Such measures will be more meaningful than global performance measures and will not suffer from the same problems of intrusiveness as external task measures. Although selecting finite task measures for evaluating certain kinds of systems may be easy, determining appropriate measures for other systems may be more difficult. An expert system, for instance, may influence many factors in a global, not readily predicted manner.

The major limitation of the imbedded task measure approach stems from the interactive nature of SA subcomponents. A system that provides SA on one element may simultaneously reduce SA on another, unmeasured element. In addition, it is easy for subjects to have biased attention on the issue under evaluation in a particular study (e.g., altitude) if they figure out the purpose of the study. Therefore, because improved SA on some elements may result in decreased SA on others, relying solely on performance measures of specific parameters can yield misleading results.

Researchers need to know how much SA operators have when taxed with multiple, competing demands on their attention during system operations. To this end, a global measure of SA that simultaneously depicts SA across many elements of interest is desirable. To improve SA, designers need to be able to evaluate the entire impact of design concepts on operator SA.

Subjective Techniques

Self-rating. One simple technique is to ask operators to subjectively rate their own SA (e.g., on a 1-to-10 scale). Researchers in the Advanced Medium-Range Air-to-Air Missile Operational Utility Evaluation Test (AMRAAM OUE) study (McDonnell Douglas Aircraft Corporation, 1982) used this method. Pilot (and overall flight) SA was subjectively rated by the participants and a trained observer. The main advantages of subjective estimation techniques are low cost and ease of use. In general, however, the subjective self-rating of SA has several limitations.

If the ratings are collected during a simulation trial, the operators' ability to estimate their own SA will be limited because they do not know what is really happening in the environment (they have only their perceptions of that reality). Operators may know when they do not have a clue as to what is going on but will probably not know if their knowledge is incomplete or inaccurate.

If operators are asked to subjectively evaluate SA in a posttrial debriefing session, the rating may also be highly tainted by the outcome of the trial. When performance is favorable, whether through good SA or good luck, an operator will most likely report good SA, and vice versa. In a reevaluation of the AMRAAM OUE study, Venturino, Hamilton, and Dvorchak (1990) found that posttrial subjective SA ratings were highly correlated with performance, supporting this premise. In addition, when ratings are gathered after the mission, operators will probably be inclined to rationalize and overgeneralize about their SA, as has been shown when information about mental processes is collected after the fact (Nisbett and Wilson, 1977). Thus detailed information will be lost or misconstrued.

What do such subjective estimates actually measure? I would speculate that self-ratings of SA most likely convey a measure of subjects' confidence levels regarding that SA; that is, how comfortable they feel about their SA. Subjects who know they have incomplete knowledge or understanding would subjectively rate their SA as low. Subjects whose knowledge may not be any greater but who do not subjectively have the same concerns about how much is not known would rate their SA higher. In other words, ignorance may be bliss.

Several efforts have been made to develop more rigorous subjective measures of SA. Taylor (1990) developed the Situation Awareness Rating Technique (SART), which allows operators to rate a system design based on the amount of demand on attentional resources, supply of attentional resources, and understanding of the situation provided. As such, it considers operators' perceived workload (supply and demand on attentional resources) in addition to their perceived understanding of the situation. Although SART has been shown to be correlated with performance measures (Selcon and Taylor, 1990), it is unclear whether this is attributable to the workload or the understanding components.

In a new application, the Subjective Workload Dominance (SWORD) metric (Vidulich, 1989) has been applied as a subjective rating tool for SA (Hughes, Hassoun, and Ward, 1990). SWORD allows subjects to make pairwise comparative ratings of competing design concepts along a continuum that expresses the degree to which one concept entails less workload than the other. The resultant preferences are then combined using the analytic hierarchy process technique to provide a linear ordering of the design concepts. In the Hughes et al. study, SWORD was modified to allow pilots to rate the degree to which presented display concepts provided SA instead of workload. Not surprisingly, the display that was subjectively rated as providing the best SA using SWORD was the display for which pilots expressed a strong preference. It is difficult to ascertain whether subjective preference led to higher SWORD ratings of SA or vice versa. Further research is needed to determine the locus of such ratings.
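As a rough illustration of how pairwise preferences can be combined into a linear ordering, the sketch below approximates analytic hierarchy process weights with the row geometric mean, a common shortcut for the principal-eigenvector calculation. The three display concepts, the judgment values, and the function name `ahp_weights` are hypothetical; they are not taken from the Hughes et al. study, whose computation is not described here.

```python
import math

def ahp_weights(matrix):
    """Approximate AHP priority weights using the row geometric mean.

    matrix[i][j] expresses how strongly concept i is preferred to concept j;
    matrix[j][i] is assumed to hold the reciprocal judgment.
    """
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]

# Hypothetical pairwise judgments for display concepts A, B, and C.
judgments = [
    [1.0, 3.0, 5.0],      # A compared with A, B, C
    [1 / 3, 1.0, 2.0],    # B compared with A, B, C
    [1 / 5, 1 / 2, 1.0],  # C compared with A, B, C
]

weights = ahp_weights(judgments)
# Indices of the concepts, ordered from most to least preferred.
ranking = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
```

The weights sum to 1, so they can be read as relative shares of preference; the ordering alone says nothing about whether preference or perceived SA drove the judgments.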

Observer-rating. A second type of subjective rating involves using independent, knowledgeable observers to rate the quality of a subject's SA. A trained observer might have more information than the operator about what is really happening in a simulation (through perfect knowledge gleaned from the simulation computer). The observer would have limited knowledge, however, about what the operator's concept of the situation is. The only information about the operator's perception of the situation would come from operator actions and imbedded or elicited verbalizations by the operator (e.g., from voice transmissions during the course of the task or from verbal protocols explicitly requested by the experimenter). Although this knowledge can be useful diagnostically by, for example, detecting overt errors in SA (stated misperceptions or lack of knowledge), it does not provide a complete representation of that knowledge. The operator may (and in all likelihood will) store information internally that is not verbalized. For instance, a pilot may discuss efforts to ascertain the identity of a certain aircraft, but knowledge about ownship system status, heading, other aircraft, and so on might not be mentioned at all. An outside observer has no way of knowing whether the pilot is aware of these variables but is not discussing them because they are not of immediate concern, or whether the pilot has succumbed to attentional narrowing and is unaware of their status. As such, the rating of SA by outside observers is also limited.

A variation on this theme is to use a confederate who acts as an associate to the operator (e.g., another crew member or air traffic controller in the case of a commercial aircraft) and who requests certain information from the operator to encourage further verbalization, as has been suggested by Sarter and Woods (1991). In addition to this technique succumbing to the same limitations encumbering observer ratings, it may also serve to alter SA by artificially directing the subject's attention to certain parameters. Because the distribution of the subject's attention across the elements in the environment largely determines SA, this method probably does not provide an unbiased assessment of operator SA.

Questionnaires

Questionnaires allow for detailed information about subject SA to be collected on an element-by-element basis that can be evaluated against reality, thus providing an objective assessment of operator SA. This type of assessment is a more direct measure of SA (i.e., it taps into, rather than infers, the operator's perceptions) and does not require subjects or observers to make judgments about situational knowledge on the basis of incomplete information, as subjective assessments do. In a review of such questionnaires, Herrmann (1984) concluded that when perceptions can be evaluated on the basis of objective knowledge, this method has been found to have good validity. Several methods of administration are possible.

Posttest. A detailed questionnaire can be administered after the completion of each simulated trial. This allows ample time for subjects to respond to a lengthy and detailed list of questions about their SA during the trial, providing needed information about subject perceptions. Unfortunately, people are not good at reporting detailed information about past mental events, even recent ones; there is a tendency to overgeneralize and overrationalize. Recall is stilted by the amount of time and intervening events that occur between the activities of interest and the administration of the questionnaire (Nisbett and Wilson, 1977). Earlier misperceptions can be quickly forgotten as the real picture unfolds during the course of events. Therefore, a posttest questionnaire will reliably capture the subject's SA only at the very end of the trial.

On-line. One way of overcoming this deficiency is to ask operators about their SA while they are carrying out their simulated tasks. Unfortunately, this too has several drawbacks. First of all, in many situations the subject will be under very heavy workload, precluding the answering of additional questions. Such questions would constitute a form of ongoing secondary task loading that may alter performance on the main task of operating the system. Furthermore, the questions asked could cue the subject to attend to the requested information on the displays, thus altering the operator's true SA. An assessment of time to answer as an indicator of SA is also faulty, as subjects may employ various time-sharing strategies between the dual tasks of operating the system and answering the questions. Overall, this method will be highly intrusive on the primary task of system operation.

Freeze technique. To overcome the limitations of reporting on SA after the fact, several researchers have used a technique in which the simulation is frozen at randomly selected times and subjects are queried as to their perceptions of the situation at the time (Endsley, 1987, 1988, 1989a; Fracker, 1990; Marshak, Kuperman, Ramsey, and Wilson, 1987). With this technique, the system displays are blanked and the simulation is suspended while subjects quickly answer questions about their current perceptions of the situation. Thus SA data can be collected immediately, which reduces the problems incurred when collecting data after the fact but does not incur the problems of on-line questioning. Subjects' perceptions are then compared with the real situation according to the simulation computer database to provide an objective measure of SA.

In a study evaluating competing aircraft display concepts, Marshak et al. (1987) queried subjects about navigation, threats, and topography using this technique. The resulting answers were converted to an absolute percentage error score for each question, allowing scores across displays to be compared along these dimensions. Fracker (1990), in another aircraft study, requested information on aircraft location and identity for specifically indicated aircraft in the simulation. Mogford and Tansley (1991) collected data relevant to an air traffic control task. In each study, a measure of SA on only selected parameters was obtained.
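An absolute percentage error score of the kind described above might be computed as in the following minimal sketch. The normalization by the true value is an assumption, since the scoring formula used by Marshak et al. is not given here, and the function name and all response values are hypothetical.

```python
def absolute_percentage_error(answer, actual):
    """Return |answer - actual| as a percentage of |actual|.

    Assumption: errors are normalized by the true value; the source does
    not state which normalization was used.
    """
    if actual == 0:
        raise ValueError("actual value must be nonzero to normalize")
    return 100.0 * abs(answer - actual) / abs(actual)

# Hypothetical (answer, actual) pairs for one query under two display concepts.
display_a = [absolute_percentage_error(9000, 10000),
             absolute_percentage_error(21000, 20000)]
display_b = [absolute_percentage_error(7000, 10000),
             absolute_percentage_error(26000, 20000)]

# Mean score per display, so the two concepts can be compared on this query.
mean_a = sum(display_a) / len(display_a)
mean_b = sum(display_b) / len(display_b)
```

Because the score is unit-free, queries measured on different scales (for example, altitude and range) can be placed side by side when comparing displays.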

The Situation Awareness Global Assessment Technique is a global tool developed to assess SA across all of its elements based on a comprehensive assessment of operator SA requirements (Endsley, 1987, 1988, 1990a). As a global measure, SAGAT includes queries about all SA requirements, including Level 1, 2, and 3 components, and considers system functioning and status and relevant features of the external environment.

Computerized versions of SAGAT have been developed for air-to-air tactical aircraft (Endsley, 1990c) and advanced bomber aircraft (Endsley, 1989a). These versions allow queries to be administered rapidly and SA data to be collected for analysis. The SAGAT tool features an easy-to-use query format that was designed to be as compatible as possible with subject knowledge representations. An example of a SAGAT query is shown in Figure 1. SAGAT's basic methodology is generic and applicable to other types of systems once a delineation of SA requirements has been made in order to construct the queries.


Figure 1. Sample SAGAT query. (The query screen is titled "Designate aircraft's altitude" and shows a vertical altitude scale in feet, marked in 10 K increments up to 70 K.)

This global approach has an advantage over probes that cover only a limited number of SA items in that subjects cannot prepare for the queries in advance, because queries could address nearly every aspect of the situation to which subjects would normally attend. Therefore, the chance of subjects' attention being biased toward specific items is minimized. In addition, the global approach overcomes the problems incurred when collecting data after the fact and minimizes subject SA bias resulting from secondary task loading or artificially cuing the subject's attention. It also provides a direct measure of SA that can be objectively collected and objectively evaluated. Finally, the use of random sampling (both of stop times and query presentation) provides unbiased estimates of SA, thus allowing SA scores to be easily compared statistically across trials, subjects, and systems. The primary disadvantage of this technique involves the temporary halt in the simulation.

I conducted two studies to investigate whether or not this imposes undue intrusiveness and to determine the best method for implementing this technique. In the first study I wanted to specifically determine how long after a freeze in the simulation SA information could be obtained. In the second study I investigated whether temporarily freezing the simulation would result in any change in subject behavior, thus specifically evaluating potential intrusiveness.

EXPERIMENT 1

In determining whether SA information is reportable via the SAGAT methodology, several possibilities must be considered. First, data may be processed by subjects in short-term memory (STM), never reaching long-term memory (LTM). If a sequential information-processing model is used, then it is possible that information might enter into STM and never be stored in LTM, where it would be available for retrieval during questioning. In this case, information would not be available during any SAGAT query sessions that exceeded the STM storage limitations (approximately 30 s with no rehearsal).

There is a good deal of evidence, however, that STM does not precede LTM but constitutes an activated subset of LTM (Cowan, 1988; Morton, 1969; Norman, 1968). According to this type of model, information proceeds directly from sensory memory to LTM, which is necessary for pattern recognition and coding. Only those portions of the environment that are salient are then highlighted in STM (either through focalized attention or automatic activation). This type of model would predict that SA information that has been perceived and/or further processed by the operator would exist in LTM stores and thus be available for recall during SAGAT querying that exceeds 30 s.

Second, data may be processed in a highly automated fashion and thus not be in the subject's awareness. Expert behavior can function in an automated processing/action sequence in some cases. Several authors have found that even when effortful processing is not used, however, the information is retained in LTM and is capable of affecting subject responses (Jacoby and Dallas, 1981; Kellog, 1980; Tulving, 1985). The type of questions used in SAGAT (providing cued recall and categorical or scalar responses) should be receptive to retrieval of this type of information. In addition, a review of the literature indicates that even under these circumstances, the product of the automatic processes (as a state of knowledge) is available, even though the processes themselves may be difficult to verbalize (Endsley, 1995, this issue).

Third, the information may be in LTM but not easily recalled by the subjects. Evidence suggests that when effortful processing and awareness are used during the storage process, recall is enhanced (Cowan, 1988). SA, composed of highly relevant, attended to, and processed information, should be most receptive to recall. In addition, the SAGAT battery, which requires categorical or scalar responses, is a cued, as opposed to total, recall task, thus aiding retrieval. Under conditions of SAGAT testing, subjects are aware that they may be asked to report their SA at any time. This, too, may aid in the storage and retrieval process. Because the SAGAT battery is administered immediately after the freeze in the simulation, no time for memory decay or competing event interference is allowed. Thus conditions should be optimized for the retrieval of SA information. Although it cannot be said conclusively that all of a subject's SA can be reflected in this manner, the vast majority should be reportable via SAGAT.

To further investigate this matter, I conducted a study to specifically determine how long after a freeze in the simulation SA information could be obtained. I expected that if collection of SA data via this technique was memory limited, this would be evidenced by an increase in errors on SAGAT queries that occurred approximately 30 to 60 s after the freeze time because of short-term memory restrictions. (Although this would not preclude use of the technique, it would limit its use by restricting the number of questions that could be asked at each stop.)

Procedure

A set of air-to-air engagements was conducted in a high-fidelity aircraft simulation facility. A fighter sweep mission with a force ratio of two (blue team) versus four (red team) was used for the trials. The objective of the blue team was to penetrate red territory, maximizing the kills of red fighters while maintaining a high degree of survivability. The red team was directed to fly around their assigned combat air patrol (CAP) points until a blue target was detected in red airspace. The red team was then allowed to leave its CAP point to defend against the blue team. In all cases, specific tactics were at the discretion of the individual pilot teams.

A total of 15 trials were completed by two teams of six subjects. At a random point in each trial, the simulator was frozen and SAGAT data were immediately collected from all six participants. At a given stop, all of the queries were presented once in random order. As this order was different at each stop, each query was presented a variety of times after the stop across subjects and trials. Therefore, performance accuracy on each question could be evaluated as a function of the amount of time elapsed prior to its presentation. After all subjects had completed the SAGAT battery at a given stop, a new trial was begun.
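The sampling scheme just described, a random freeze point plus a fresh random query order at every stop, can be sketched as follows. This is an illustration only: the function name, the query labels, the 600 s trial length, and the fixed 20 s allotted to each query are hypothetical simplifications (in the study, elapsed time depended on how long subjects took to answer).

```python
import random

def schedule_stop(trial_length_s, queries, seconds_per_query, rng):
    """Pick a random freeze time and a random query order for one stop.

    Returns the freeze time and a list of (query, elapsed_s) pairs, where
    elapsed_s is the time after the freeze at which the query is presented.
    Assumption: every query takes the same fixed time to answer.
    """
    freeze_time = rng.uniform(0.0, trial_length_s)
    order = list(queries)
    rng.shuffle(order)  # a new order at each stop
    presentation = [(q, i * seconds_per_query) for i, q in enumerate(order)]
    return freeze_time, presentation

# Hypothetical three-query battery and trial length.
battery = ["altitude", "heading", "airspeed"]
rng = random.Random(0)  # seeded only so the sketch is repeatable
freeze_time, presentation = schedule_stop(600.0, battery, 20.0, rng)
```

Because the order is reshuffled at every stop, each query ends up paired with a range of elapsed times across stops, which is what allows accuracy to be examined as a function of time since the freeze.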

Prior to conducting the study, all subjects were trained on the use of the simulator, the displays, aircraft handling qualities, and SAGAT. In addition to three instructional training sessions on using SAGAT, each subject participated in 18 practice trials in which SAGAT was administered. (Most subjects also had received a substantial amount of training in the simulator in the past.) Thus subjects were well trained prior to testing.

Facilities. A high-fidelity, real-time simulation facility was used for testing. Aircraft control systems, avionics systems, weapons systems, crew stations, and air vehicle performance were modeled to provide realistic aircraft control characteristics. A Gould mainframe computer controlled the simulations and drove Silicon Graphics-generated high-resolution color graphics displays. This test used six pilot stations, each including a simulated head-up display, a tactical situation display, radar, and system controls operated by a touch screen or stick and throttle control switches. A realistic stick and throttle provided primary flight control.

Subjects. Twelve experienced former military fighter pilots participated in the test. The mean subject age was 48.16 years (range of 32 to 68). Subjects had an average of 3310 h (range 1500-6500) and an average of 15.5 years (range 7-26) of military flight experience. Of the 12 subjects, 7 had combat experience.

Results

Each of the subjects' answers to the SAGAT queries was compared with actual values, as collected by the simulator computer at the time of each stop. Included in the SAGAT battery at the time of the test were 26 queries. Of those, 11 could not be evaluated because the appropriate simulator data were not available, and 5 could be evaluated only by subjective means. The 10 remaining query answers concerned ownship heading, ownship location, aircraft heading, aircraft detections, aircraft airspeed, aircraft weapon selection, aircraft Gs, aircraft fuel level, aircraft weapon quantity, and aircraft altitude.

An error score (SAGAT query answer minus actual value) was computed for each response. Absolute error scores for each query (across all subjects and trials) were plotted against the amount of time elapsed between the stop in the simulation and the query, and a regression was calculated. F tests for the regressions and Pearson's r for eight of the queries are presented in Table 1. Two of the queries provided a categorical response; therefore, chi-square tests were performed and are presented in Table 2.

None of the regressions or chi-square tests computed for each of the 10 queries was significant (or even close to significant), indicating that subjects were neither more nor less prone to response error as the amount of time between the simulator freeze and the presentation of the query increased. A plot of the regression for altitude error, shown in Figure 2, reveals little or no increase in error over time. (Plots of the regressions for the other variables appeared quite similar.)
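The regression of absolute error on elapsed time described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the analysis code used in the study; the function names and the demonstration data are hypothetical. For a regression with a single predictor, F and r² are linked by r² = F / (F + df), where df is the residual degrees of freedom, which offers a quick consistency check on tabled values.

```python
from statistics import mean


def regress_error_on_delay(delays_s, abs_errors):
    """Least-squares fit of absolute error on query delay (one predictor).

    Returns the slope, r squared, the F statistic, and its degrees of freedom.
    """
    n = len(delays_s)
    if n < 3 or n != len(abs_errors):
        raise ValueError("need at least 3 paired observations")
    mx, my = mean(delays_s), mean(abs_errors)
    sxx = sum((x - mx) ** 2 for x in delays_s)
    syy = sum((y - my) ** 2 for y in abs_errors)
    sxy = sum((x - mx) * (y - my) for x, y in zip(delays_s, abs_errors))
    if sxx == 0 or syy == 0:
        raise ValueError("delays and errors must both vary")
    r2 = sxy ** 2 / (sxx * syy)
    df_resid = n - 2
    # F for a one-predictor regression has (1, n - 2) degrees of freedom.
    f_stat = float("inf") if r2 == 1 else r2 * df_resid / (1 - r2)
    return {"slope": sxy / sxx, "r2": r2, "F": f_stat, "df": (1, df_resid)}


def r2_from_f(f_stat, df_resid):
    """Recover r squared from a one-predictor regression F statistic."""
    return f_stat / (f_stat + df_resid)


# Hypothetical data: delay in seconds and absolute altitude error in feet.
demo = regress_error_on_delay(
    [10, 45, 90, 150, 210, 300],
    [1200, 900, 1500, 1000, 1300, 1100],
)
print(demo["df"], round(demo["F"], 3), round(demo["r2"], 3))
```

A flat regression, as reported for all queries, corresponds to a slope near zero, a small F, and an r² near zero.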

Discussion

Based on these data, it would appear that subjects under these conditions were able to provide information on their SA about a particular situation for up to 5 or 6 min. The fact that all 10 of the queries produced flat regressions lends extra weight to this conclusion.

Two explanations can be offered for these findings. First, this study investigated subjects who were expert in particular tasks during a realistic simulation of those tasks. The information subjects were asked to report was important to performance of those tasks. Most laboratory studies that predict fairly rapid decay times (approximately 30 s for short-term memory) typically employ the use of stimuli that have little or no inherent meaning to the subject (nonsense words or pictures). The storage and utilization of relevant information may be quite different from that of irrelevant information (Chase and Simon, 1973).

Second, the results indicate that the SA information was obtainable from long-term memory stores. If schemata or other mechanisms are used to organize SA information (as opposed to working memory processes only), then that information will be resident in long-term memory. Many of the 10 items analyzed can be considered Level 1 SA components. The fact that this lower-level information was resident in LTM indicates that either the inputs to higher-level processing were retained as well as the outputs, or the Level 1 components were retained as important pieces of pilot SA and are significant components of LTM schemata (e.g., target altitude itself is important to know and not just its implications). Both of these explanations may be correct. These findings generally support the predictions of a processing model in which information passes into LTM storage before being highlighted in STM.

As a caveat, note that subjects were actively working with their SA knowledge by answering the SAGAT queries for the entire period that the simulation was stopped. No intervening period of waiting or any competing activity was introduced prior to administering any SAGAT query. Subject knowledge of SA information may be interfered with if time delays or other activities (particularly continued operational tasks) are imposed before SAGAT is administered. The major implication of these results is that, under these conditions, SA data are readily obtainable through SAGAT for a considerable period after a stop in the simulation (up to 5 or 6 min).

TABLE 1

Tests on Regressions of Time of Query Presentation on Query Accuracy

Query              df       F      p      r²
Own heading        1, 19    0.021  0.887  0.001
Own location       1, 19    0.940  0.334  0.047
Aircraft heading   1, 95    0.040  0.834  0.000
G level            1, 90    0.500  0.480  0.006
Fuel level         1, 95    1.160  0.284  0.012
Weapon quantity    1, 92    0.001  0.981  0.000
Altitude           1, 96    0.002  0.965  0.000
Detection          1, 524   0.041  0.840  0.000

TABLE 2

Chi-Square Tests of Time Category of Query Presentation on Query Accuracy

Query              χ²     df           p
Weapon selection   0.16   5 (N = 97)   >0.995
Airspeed           0.13   5 (N = 98)   >0.995

Figure 2. Altitude error by time until query presentation (horizontal axis: time in seconds).

EXPERIMENT 2

A second study was initiated to address the issue of possible intrusiveness. It is necessary to determine whether temporarily freezing the simulation results in any change in subject behavior. If subject behavior were altered by such a freeze, certain limitations would be indicated. (One might wish to not resume the trial after the freeze, for instance, but start a new trial instead.) The possibility of intrusiveness was tested by evaluating the effect of stopping the simulator on subsequent subject performance.

Procedure

A set of air-to-air engagements was conducted using a fighter sweep mission with a force ratio of two (blue team) versus four (red team). The training, instructions, and pilot mission objectives were identical to those used in Experiment 1. In this study, however, the trial was resumed after a freeze, following a specified period for collecting SAGAT data, and was continued until specified criteria for completion of the mission were met. Subjects completed as many queries as they could during each stop. The queries were presented in random order.

Five teams of six subjects completed a full test matrix. The independent variables were duration of the stops (1/2, 1, or 2 min) and the frequency of stops (1, 2, or 3 times during the trial). Each team participated twice in each of these nine conditions. (In any given trial, multiple stops were of the same duration.) Each team also completed six trials in which no stops occurred, as a control condition. Therefore, a total of 30 trials were conducted for each duration of stop condition, and a total of 30 trials were conducted for each frequency of stop condition. These conditions were compared with 30 trials in which no stops occurred. Conditions were administered in random order, blocked by team. Pilot performance was collected as the dependent measure.

Facilities. This study used the same piloted, multiple-engagement simulator as in Experiment 1.

Subjects. Twenty-five experienced former military fighter pilots participated in this test. (Four of the subjects participated on more than one team.) The mean subject age was 45.16 years (range of 32 to 68). Subjects had an average of 3582 h (range of 975 to 7045) and an average of 16.9 years (range of 6 to 27) of military flight experience. Of the 25 subjects, 14 had combat experience.

Results

Pilot performance under each of the conditions was analyzed. Pilot performance measures included the number of blue team kills (red team losses) and the number of blue team losses (red team kills). Chi-square tests were performed on the number of blue team kills, χ²(4, N = 120) = 5.973, p > 0.05, and blue team losses, χ²(2, N = 120) = 0.05, p > 0.05, comparing trials in which there were stops to collect SAGAT data with those in which there were no stops, as depicted in Figures 3 and 4. There was no significant difference in pilot performance on either measure.

Figure 3. Blue kills: Trials with stops versus trials without stops (horizontal axis: number of kills).

Figure 4. Blue losses: Trials with stops versus trials without stops (horizontal axis: number of losses).

Analysis of variance was used to evaluate the effect of number of stops and duration of stops on each of the two performance measures: blue team kills and blue team losses. The number of stops during the trial had no significant effect on either pilot performance measure, F(3, 116) = 1.73, p > 0.05, β = 0.55, and F(3, 116) = 0.20, p > 0.05, β > 0.60, respectively, as shown in Figures 5 and 6. The duration of the stop also did not significantly affect either performance measure, F(3, 116) = 2.16, p > 0.05, β = 0.40, and F(3, 116) = 0.77, p > 0.05, β > 0.60, respectively, as depicted in Figures 7 and 8. In viewing the data, no linear trend appears to be present in either case that would indicate a progressively worse (albeit nonsignificant) effect of increasing number or duration of stops. This indicates that stops to collect SAGAT data (as many as 3 for up to 2 min in duration) did not have a significant effect on subject performance.

Figure 5. Blue kills by number of stops in trial (horizontal axis: number of stops, 0 to 3).

Figure 6. Blue losses by number of stops in trial (horizontal axis: number of stops, 0 to 3).

Figure 7. Blue kills by duration of stops in trial (horizontal axis: duration of stops in seconds: 0, 30, 60, 120).

Figure 8. Blue losses by duration of stops in trial (horizontal axis: duration of stops in seconds: 0, 30, 60, 120).

Discussion

The lack of a significant influence of this procedure on performance probably rests on the fact that relevant schemata are actively used by subjects during the entire freeze period. Under these conditions, the subjects' SA does not have a chance to decay before the simulation is resumed (as was indicated by Experiment 1). Thus, SA is fairly intact when the simulation continues, allowing subjects to proceed with their tasks where they left off. Subjectively, subjects did fairly well with this procedure and were able to readily pick up the simulation at the point at which they left off at the time of the freeze, sometimes with the same sentence they had started before the stop. On many occasions subjects could not even remember if they had been stopped to collect SA data during the trial, also indicating a certain lack of intrusiveness.
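The trial counts reported for Experiment 2 (30 trials per stop duration, 30 per stop frequency, 30 control trials, and N = 120 overall) follow directly from the test matrix. The short sketch below enumerates that matrix to make the arithmetic explicit; the constant and function names are hypothetical, and the dictionary layout is only one convenient way to represent a trial.

```python
from collections import Counter
from itertools import product

DURATIONS_S = (30, 60, 120)   # stops of 1/2, 1, and 2 min
STOP_COUNTS = (1, 2, 3)       # number of stops within a trial
TEAMS = 5
REPETITIONS = 2               # each team ran every condition twice
CONTROL_TRIALS_PER_TEAM = 6   # trials with no stops


def build_test_matrix():
    """Enumerate every trial in the Experiment 2 design as a dict."""
    trials = []
    for team in range(1, TEAMS + 1):
        for duration, stops in product(DURATIONS_S, STOP_COUNTS):
            for _ in range(REPETITIONS):
                trials.append({"team": team, "duration_s": duration, "stops": stops})
        for _ in range(CONTROL_TRIALS_PER_TEAM):
            trials.append({"team": team, "duration_s": 0, "stops": 0})
    return trials


trials = build_test_matrix()
by_duration = Counter(t["duration_s"] for t in trials)
by_stops = Counter(t["stops"] for t in trials)
print(len(trials), dict(by_duration), dict(by_stops))
```

In an actual study the order of these trials would then be randomized within each team, as described in the procedure.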

Summary

The use of a temporary freeze in the simulation to collect SA data is supported by Experiments 1 and 2. Subjects were able to report their SA using this technique for as long as 5 or 6 min without apparent memory decay, and the freeze did not appear to be intrusive on subject performance in the simulation, allaying several concerns about the technique. Although it is always difficult to establish no effect of one variable on another (i.e., prove the null hypothesis), it is reassuring that the finding of no intrusion on performance has been repeated in numerous other studies in which SAGAT was used to evaluate human-machine interface concepts (Bolstad and Endsley, 1990; Endsley, 1989b; Northrop, 1988), if the timing of the stop was unpredictable to the subjects (Endsley, 1988). This finding was also repeated in one study in which pilots flew in a simulation against computer-controlled enemy aircraft (Endsley, 1990d), helping to rule out the possibility that the finding of no effect on performance in the present study could have been the result of pilots on both sides of the simulation being equally influenced.

SAGAT has thus far proven to meet the stated criteria for measurement. In addition to the present studies establishing a level of empirical validity, SAGAT has been shown to have predictive validity (Endsley, 1990b) and content validity (Endsley, 1990d). The method is not without some costs, however, as a detailed analysis of SA requirements is required in order to develop the battery of queries to be administered. On the positive side, this analysis can also be extremely useful for guiding design efforts.

EXAMPLE

To proffer an example of the type of SA information provided by SAGAT, I will present a study that evaluates the use of a three-dimensional display for providing radar information to fighter pilots. The study was conducted in the same high-fidelity, real-time simulator as in the prior two experiments. During the study, each of six experienced pilot subjects flew a fighter sweep mission against a field of four digitally controlled enemy aircraft. Each subject completed five trials in each of five display conditions. The initial starting conditions and order of presentation of the display conditions were randomized. Each trial was halted to collect SAGAT data at a randomly selected time between 2.5 and 5 min into each trial. Immediately following the collection of SAGAT data, the Subjective Workload Assessment Technique (SWAT) was also administered. In this study the trial was not resumed after the collection of SAGAT and SWAT data.

The five display conditions were constructed to evaluate the effect of a three-dimensional (3D) display concept. The 3D display provided a grid representing the ground, above which aircraft targets detected by the radar were presented by a small triangle pointing in the direction of the aircraft's heading and connected to the grid by a line corresponding to the aircraft's altitude. The plane the subject was flying was presented in the center of this display so that each target's location and altitude were spatially represented relative to the subject's ownship. Four rotations of the grid (0, 30, 45, and 60 deg) were investigated (each providing a different perspective on the scene), as shown in Figure 9. At 0-deg rotation, the grid portrayed a God's-eye view of the scene, thus depicting a traditional two-dimensional display. In addition, these four views were compared with a two-display configuration in which a God's-eye view display was supplemented with a second profile display that pictorially presented altitude. In all conditions, altitude of the aircraft was also presented numerically next to the aircraft symbols.

Following the trials, the collected SAGAT and SWAT data were analyzed. Although the final SAGAT battery for fighter aircraft has 40 queries (Endsley, 1990c), only 23 were administered at the time of the test (as the battery was still in development). The results of 4 pertinent queries are presented here. The subjects' reported knowledge of the range, azimuth, altitude, and heading of the enemy targets was compared with the actual values of these elements at the time of the freeze in the simulation. The subjects' perceptions were evaluated as correct or incorrect (using tolerance values that had been determined to be operationally relevant).

The percentage correct across subjects for each condition on each of these four queries is shown in Figure 10. As the scored data are binomial, an arcsine transformation was applied and an ANOVA performed. The results showed that the display type had a significant effect on pilot SA of the aircraft's range, F(4, 288) = 2.575, p < 0.05, azimuth, F(4, 278) = 4.125, p < 0.05, and heading, F(4, 139) = 3.040, p < 0.05, but not altitude, F(4, 94) = 1.83, p > 0.05. The 3D display, when rotated at 45 or 60 deg, provided significantly lower SA on target range than when at 0 or 30 deg and significantly lower SA of target heading than of any of the other three conditions. A rotation of 60 deg also provided significantly lower SA of target azimuth than did the other four conditions. Although the ANOVA for target altitude was not significant, there was an opposite trend for the highly rotated 3D display to provide more SA of target altitude than did rotations of 0 and 30 deg. The SWAT data supported these findings, with significantly higher ratings for workload in the 45- and 60-deg rotations than in the 0- or 30-deg rotations or the two-dimensional display.

Figure 9. Two-dimensional display and three-dimensional display at four levels of rotation. Panels: (a) 2D display; (b) 3D display, 0° rotation; (c) 3D display, 30° rotation; (d) 3D display, 45° rotation; (e) 3D display, 60° rotation.

Figure 10. Situation awareness of target range, heading, azimuth, and altitude in each display condition. Panels: (a) knowledge of target range; (b) knowledge of target heading; (c) knowledge of target azimuth; (d) knowledge of target altitude. Each panel plots percentage correct by display condition (2-D; 3-D at 0°, 30°, 45°, and 60°).

A major point to be gleaned from this example is the importance of examining the impact of a given design across an operator's SA requirements. The proposed 3D display was designed to provide pilots with better information about aircraft altitude, and it may have done so, although the trend was not significant. However, this came at the expense of SA on location, azimuth, and heading. If a study had been performed that merely examined subjects' knowledge of aircraft altitude, this important information would have been lost. The SAGAT data also provided important diagnostic information that was useful in understanding the exact effect the design concept had on pilot SA in order to make future iterations.

IMPLEMENTATION RECOMMENDATIONS

Several recommendations for SAGAT administration can be made based on previous experience in using the procedure.

Training

An explanation of the SAGAT procedures and detailed instructions for answering each query should be provided to subjects before testing. Several training trials should be conducted in which the simulator is halted frequently to allow subjects ample opportunity to practice responding to the SAGAT queries. Usually three to five samplings are adequate for a subject to become comfortable with the procedure and to clear up any uncertainties in how to answer the queries.

Test Design

SAGAT requires no special test considerations. The same principles of experimental design and administration apply to SAGAT as to any other dependent measure. Measures of subject performance and workload may be collected concurrently with SAGAT, as no ill effect from the insertion of breaks has been shown. To be cautious, however, if performance measures are to be collected simultaneously with SAGAT data, half of the trials should be conducted without any breaks for SAGAT in order to provide a check for this contingency.

Procedures

Subjects should be instructed to attend to their tasks as they normally would, with the SAGAT queries considered as secondary. No displays or other visual aids should be visible while subjects are answering the queries. If subjects do not know or are uncertain about the answer to a given query, they should be encouraged to make their best guess. There is no penalty for guessing, allowing for consideration of the default values or other wisdom gained from experience that subjects normally use in decision making. If subjects do not feel comfortable enough to make a guess, they may go on to the next question. Talking or sharing of information among subjects should not be permitted. If multiple subjects are involved in the same simulation, all subjects should be queried simultaneously and the simulation resumed for all subjects at the same time.

Which Queries to Use

Random selection. As it may be impossible to query subjects about all of their SA requirements in a given stop because of time constraints, a portion of the SA queries may be randomly selected and asked each time. A random sampling provides consistency and statistical validity, thus allowing SA scores to be easily compared across trials, subjects, systems, and scenarios.

Because of attentional narrowing or lack of information, certain questions may seem unimportant to a subject at the time of a given stop. It is important to stress that subjects should attempt to answer all queries anyway. This is because (a) even though they think it unimportant, the information may have at least secondary importance; (b) they may not be aware of information that makes a question very important (e.g., the location of a pop-up aircraft); and (c) if only questions of the highest priority were asked, subjects might be inadvertently provided with artificial cues about the situation that will direct their attention when the simulation is resumed. Therefore, a random selection from a constant set of queries is recommended at each stop.

Experimenter controlled. In certain tests it may be desirable to have some queries omitted because of limitations of the simulation or characteristics of the scenarios. For instance, if the simulation does not incorporate aircraft malfunctions, the query related to this issue may be omitted. In addition, with particular test designs it may be desirable to ensure that certain queries are presented every time. When this occurs, subjects should also be queried on a random sampling from all SA requirements and not only on those related to a specific area of interest to the evaluation being conducted. This is because of the ability of subjects to shift attention to the information on which they know they will be tested. What may appear to be an improvement in SA in one area may be merely a shift of attention from another area. When the SAGAT queries cover all of the SA requirements, no such artificial cuing can occur.
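The two selection rules above (a random draw from a constant query set at each stop, optionally combined with a few queries the experimenter wants presented every time) can be sketched as follows. This is an illustrative sketch only; the function name, its parameters, and the example query labels are hypothetical and are not taken from any SAGAT implementation.

```python
import random


def select_queries(query_pool, n_per_stop, rng, always_ask=()):
    """Choose the queries for one stop.

    Queries in `always_ask` are presented at every stop; the remaining
    slots are filled by a random sample from the rest of the constant
    pool, and the final presentation order is shuffled.
    """
    remaining = [q for q in query_pool if q not in always_ask]
    n_random = max(0, min(n_per_stop - len(always_ask), len(remaining)))
    chosen = list(always_ask) + rng.sample(remaining, n_random)
    rng.shuffle(chosen)
    return chosen


# Hypothetical query labels, loosely based on the items named in Experiment 1.
POOL = [
    "ownship heading", "ownship location", "aircraft heading",
    "aircraft detections", "aircraft airspeed", "weapon selection",
    "aircraft Gs", "fuel level", "weapon quantity", "aircraft altitude",
]
rng = random.Random(1995)  # seeded only so the example is repeatable
stop_queries = select_queries(POOL, 4, rng, always_ask=("aircraft altitude",))
print(stop_queries)
```

Because the random portion is drawn from the whole pool rather than from the area of interest alone, subjects cannot tell which information the evaluation is focused on.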

When to Collect SAGAT Data

It is recommended that the timing of each freeze for SAGAT administration be randomly determined and unpredictable so that subjects cannot prepare for queries in advance. If the freeze occurrence is covertly associated with the occurrence of specific events, or at specific times across trials, prior studies have shown that the subjects will be able to figure this out (Endsley, 1988), allowing them to prepare for queries or actually improve SA through the artificiality of the freeze cues. An informal rule has been to ensure that no freezes occur earlier than 3 min into a trial, to allow subjects to build up a picture of the situation, and that no two freezes occur within 1 min of each other.

The result of this approach is that the activities occurring at the time of the stops will be randomly selected. Some stops may occur during very important activities that are of interest to the experimenter, others when no critical activities are occurring. This gives a good sampling of the subjects' SA in a variety of situations. During analysis the experimenter may want to stratify the data to take these variations into account.
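One way to generate freeze times that respect the informal rule above (no freeze earlier than 3 min into a trial, and no two freezes within 1 min of each other) is simple rejection sampling. The sketch below is an assumption about how such a schedule could be drawn, not the procedure used in the studies; the function name and parameters are hypothetical, and the 900-s trial length in the example reflects the 15-min trial mentioned for Experiment 2.

```python
import random


def draw_freeze_times(n_stops, trial_len_s, rng,
                      earliest_s=180.0, min_gap_s=60.0, max_tries=10_000):
    """Draw sorted random freeze times (in seconds) for one trial.

    Candidate schedules are sampled uniformly between `earliest_s` and
    `trial_len_s` and rejected until every pair of consecutive freezes
    is at least `min_gap_s` apart.
    """
    if trial_len_s <= earliest_s:
        raise ValueError("trial is too short for the earliest allowed freeze")
    for _ in range(max_tries):
        times = sorted(rng.uniform(earliest_s, trial_len_s) for _ in range(n_stops))
        if all(later - earlier >= min_gap_s
               for earlier, later in zip(times, times[1:])):
            return times
    raise RuntimeError("could not place the stops; relax the constraints")


rng = random.Random(7)  # seeded only so the example is repeatable
freeze_times = draw_freeze_times(3, 900.0, rng)
print([round(t) for t in freeze_times])
```

Drawing a fresh schedule for every trial keeps the freezes unpredictable and unassociated with specific events or times across trials.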

How Much SAGAT Data to Collect

The number of trials necessary will depend on the variability present in the dependent variables being collected and the number of data samples taken during a trial. This will vary with different subjects and designs, but between 30 and 60 samplings per SA query with each design option have previously been adequate in a within-subjects test design.
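As a rough planning aid, the number of trials per design option needed to reach a target number of samplings per query can be estimated from the expected samplings per trial. The sketch below assumes (this is not stated in the text) that each stop presents a fixed number of queries drawn uniformly at random from a constant pool, so that a given query is expected to be sampled subjects × stops × (queries per stop / pool size) times per trial. The function name and the example numbers are hypothetical.

```python
def trials_needed(target_samplings, subjects_per_trial, stops_per_trial,
                  queries_per_stop, pool_size):
    """Estimate trials per design option for a target samplings-per-query.

    Uses integer ceiling division so no floating-point rounding is involved.
    """
    if min(target_samplings, subjects_per_trial, stops_per_trial,
           queries_per_stop, pool_size) <= 0:
        raise ValueError("all arguments must be positive")
    if queries_per_stop > pool_size:
        raise ValueError("cannot ask more queries than the pool contains")
    numerator = target_samplings * pool_size
    denominator = subjects_per_trial * stops_per_trial * queries_per_stop
    return -(-numerator // denominator)  # ceiling of numerator / denominator


# Hypothetical plan: one subject, three stops per trial, 10 of 40 queries per stop.
print(trials_needed(30, 1, 3, 10, 40), trials_needed(60, 1, 3, 10, 40))
```

This is only an expected-value estimate; the variability of the dependent measures still determines whether 30 or 60 samplings is the more appropriate target.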

Multiple SAGAT stops may be taken within each trial. There is no known limit to the number of times the simulator can be frozen during a given trial. In Experiment 2, no ill effects were found from as many as three stops during a 15-min trial. In general, it is recommended that a stop last until a certain amount of time has elapsed, and then the trial is resumed, regardless of how many questions have been answered. Stops as long as 2 min in duration were used with no undue difficulty or effect on subsequent performance. In Experiment 1, stops as long as 5 min were shown to allow subjects access to SA information without memory decay.

Data Collection

The simulator computer should be programmed to collect objective data corresponding to the queries at the time of each freeze. Considering that some queries will pertain to higher-level SA requirements that may be unavailable in the computer, an expert judgment of the correct answer may be made by an experienced observer who is privy to all information, reflecting the SA of a person with perfect knowledge. A comparison of subjects' perceptions of the situation (as input into SAGAT) with the actual status of each variable (as collected per the simulator computer and expert judgment) results in an objective measure of subject SA. Questions asked of the subject but not answered should be considered incorrect. No evaluation should be made of questions not asked during a given stop.

It is recommended that answers to each query be scored as correct or incorrect based on whether or not they fall into an acceptable tolerance band around the actual value. For example, it may be acceptable for a subject to be 10 miles per hour off of actual ground speed. This method of scoring poses less difficulty than dealing with absolute error (see Marshak et al., 1987). A tabulation of the frequency of correctness can then be made within each test condition for each SA element. Because data scored as correct or incorrect are binomial, the conditions for analysis of variance are violated. A correction factor, Y' = arcsine(Y), can be applied to binomial data, which allows analysis of variance to be used. In addition, a chi-square, Cochran's Q, or binomial t test (depending on the test design) can be used to evaluate the statistical significance of differences in SA between test conditions.
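The scoring rules in this section (tolerance-band scoring, unanswered queries counted as incorrect, unasked queries excluded, and an arcsine correction before analysis of variance) can be sketched as follows. The names below are hypothetical and the code is illustrative rather than a reference implementation. The transform is written as stated in the text, Y' = arcsine(Y); many statistics texts instead apply the arcsine to the square root of the proportion, so that variant is offered as an option and labeled as an assumption.

```python
import math

NOT_ASKED = object()  # sentinel: the query was not presented at this stop


def score_response(answer, actual, tolerance):
    """Score one numeric response against the simulator-recorded value.

    Returns True or False for a query that was asked, or None when the
    query was not asked and therefore should not be evaluated.
    """
    if answer is NOT_ASKED:
        return None
    if answer is None:  # asked but left unanswered: counted as incorrect
        return False
    return abs(answer - actual) <= tolerance


def proportion_correct(scores):
    """Proportion correct over evaluated responses (None entries are skipped)."""
    scored = [s for s in scores if s is not None]
    return sum(scored) / len(scored) if scored else None


def arcsine_transform(p, use_sqrt=False):
    """Arcsine correction for a proportion p in [0, 1].

    use_sqrt=False follows the form given in the text; use_sqrt=True is the
    common textbook variant asin(sqrt(p)) (an assumption, not from the text).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    return math.asin(math.sqrt(p)) if use_sqrt else math.asin(p)


# Hypothetical ground-speed responses (mph), scored with a 10-mph tolerance.
scores = [
    score_response(452, 460, 10),        # within the band
    score_response(475, 460, 10),        # outside the band
    score_response(None, 460, 10),       # asked, not answered
    score_response(NOT_ASKED, 460, 10),  # not asked at this stop
]
p = proportion_correct(scores)
print(scores, p, arcsine_transform(p))
```

Keeping "not asked" distinct from "not answered" is what allows the first to be dropped from the tabulation while the second is counted against the subject.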

Limitations and Applicability for Use

SAGAT has primarily been used within the confines of high-fidelity and medium-fidelity part-task simulations. This provides experimenter control over freezes and data collection without any danger to the subject or processes involved in the domain. It may be possible to use the technique during actual task performance if multiple operators are present to ensure safety. For example, it might be possible to verbally query one pilot in flight while another assumes flight control. Such an endeavor should be undertaken with extreme caution, however, and may not be appropriate for certain domains.

A recent effort (Sheehy, Davey, Fiegel, and Guo, 1993) employed an adaptation of this technique by making videotapes of an ongoing situation in a nuclear power plant control room. These tapes were then replayed to naive subjects, with freezes for SAGAT queries employed. It is not known how different the SA of subjects passively viewing a situation may be from subjects actually engaged in task performance; however, this approach may yield some useful data.

Most known uses of SAGAT have involved fighter aircraft simulations. In general, it can be employed in any domain in which a reasonable simulation of task performance exists and an analysis of SA requirements has been made in order to develop the queries. SAGAT queries have been developed for advanced bomber aircraft (Endsley, 1990a) and recently for en route air traffic control (Endsley and Rodgers, 1994). Several researchers have begun to use the technique in evaluating operator SA in nuclear control room studies (Hogg, Torralba, and Volden, 1993; Sheehy et al., 1993). The technique has also been employed in a simulated control task to study adaptive automation (Carmody and Gluckman, 1993). Several new efforts are currently under way to employ the technique in medical decision making (E. Swirsky, personal communication, March 1994) and helicopters (C. Prince, personal communication, November 1993). Potentially it could also be used in studies involving automobile driving, supervisory control of manufacturing systems, teleoperations, and operation of other types of dynamic systems.

CONCLUSION

The use of SA metrics in evaluating the efficiency of design concepts, training programs, and other potential interventions is critical. Without careful evaluation it will be difficult to make much progress in understanding the factors that influence SA, hindering human factors and engineering efforts. Several techniques have been reviewed here, each with several advantages and disadvantages. The use of one technique, SAGAT, was explored extensively, and two studies were presented that investigated the validity of the measure. It has to date proven a useful and viable technique for measuring SA in numerous contexts. Rigorous validity testing is needed for other techniques proposed for SA measurement. Unless the veracity of these techniques is established, interpreting their results will be hazardous.

In addition to establishing validity of measures of SA, more research is needed regarding the task of SA measurement itself. For instance, as pointed out by Pew (1991), no criteria exist at this time that establish the level of SA required for successful performance. Does an operator need to have SA that is 100% perfect (in both completeness and accuracy), or is some lesser amount sufficient for good performance? This is a complex issue. I would assert that SA can be seen as a factor that increases the probability of good performance but that it does not necessarily guarantee it, as other factors also come into effect (e.g., decision making, workload, performance execution, system capabilities, and SA of others in some cases). How much SA one needs, therefore, becomes a matter of how much probability of error one is willing to accept. Perhaps such criteria should more properly and usefully be established as the level of SA needed on each subcomponent at different periods in time. Some guidelines will eventually need to be specified if there is a desire to certify new system designs on the basis of SA.

Overall, the beginnings of the field of SA measurement have been established, allowing researchers in the area of SA to proceed from mainly speculation and anecdotal information to solid empirical findings. The tools for conducting further basic research on the SA construct and developing better system designs are available, allowing needed research on SA in a variety of arenas.

ACKNOWLEDGMENTS

This research was sponsored by the Northrop Corporation. I thank the many people at Northrop, both past and present, who contributed to this effort. In particular, I thank Vic Vizcarra for his assistance in conducting the studies, Paul Cayley and Gerry Armstrong for their programming efforts, and Mike Majors and Del Jacobs for their support in developing an SA research program. In addition, I am indebted to Mark Chignell, who provided helpful support throughout the development of the methodology. I also thank Mark Rodgers, Tom English, Dick Gilson, and several anonymous reviewers for their comments on an earlier version of this paper.


REFERENCES

Bolstad, C. A., and Endsley, M. R. (1990). Single versus dual scale range display investigation (NOR DOC 90-90). Hawthorne, CA: Northrop Corporation.

Carmody, M. A., and Gluckman, J. P. (1993). Task-specific effects of automation and automation failure on performance, workload and situational awareness. In R. S. Jensen and D. Neumeister (Eds.), Proceedings of the Seventh International Symposium on Aviation Psychology (pp. 167-171). Columbus, OH: Ohio State University, Department of Aviation.

Chase, W. G., and Simon, H. A. (1973). Perceptions in chess. Cognitive Psychology, 4, 55-81.

Cowan, N. (1988). Evolving conceptions of memory storage, selective attention and their mutual constraints within the human information-processing system. Psychological Bulletin, 104, 163-191.

Endsley, M. R. (1987). SAGAT: A methodology for the measurement of situation awareness (NOR DOC 87-83). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1988). Situation awareness global assessment technique (SAGAT). In Proceedings of the National Aerospace and Electronics Conference (NAECON) (pp. 789-795). New York: IEEE.

Endsley, M. R. (1989a). Final report: Situation awareness in an advanced strategic mission (NOR DOC 89-32). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1989b). Tactical simulation 3 test report: Addendum 1 situation awareness evaluations (81203033R). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1990a). A methodology for the objective measurement of situation awareness. In Situational awareness in aerospace operations (AGARD-CP-478; pp. 1/1-1/9). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Endsley, M. R. (1990b). Predictive utility of an objective measure of situation awareness. In Proceedings of the Human Factors Society 34th Annual Meeting (pp. 41-45). Santa Monica, CA: Human Factors and Ergonomics Society.

Endsley, M. R. (1990c). Situation awareness global assessment technique (SAGAT): Air-to-air tactical version user guide (NOR DOC 89-58, rev. A). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1990d). Situation awareness in dynamic human decision making: Theory and measurement. Unpublished doctoral dissertation, University of Southern California, Los Angeles, CA.

Endsley, M. R., and Rodgers, M. D. (1994). Situation awareness global assessment technique (SAGAT): En route air traffic control version user's guide (Draft). Lubbock: Texas Tech University.

Fracker, M. L. (1990). Attention gradients in situation awareness. In Situational awareness in aerospace operations (AGARD-CP-478; pp. 6/1-6/10). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Herrmann, D. J. (1984). Questionnaires about memory. In J. E. Harris and P. E. Morris (Eds.), Everyday memory, action and absent-mindedness (pp. 133-151). London: Academic.

Hogg, D. N., Torralba, B., and Volden, F. S. (1993). A situation awareness methodology for the evaluation of process control systems: Studies of feasibility and the implication of use (1993-03-05). Storefjell, Norway: OECD Halden Reactor Project.

Hughes, E. R., Hassoun, J. A., and Ward, F. (1990). Advanced tactical fighter (ATF) simulation support: Final report (CSEF-TR-ATF-90-91). Wright-Patterson Air Force Base, OH: Air Force Systems Command, Aeronautical Systems Division.

Jacoby, L. L., and Dallas, M. (1981). On the relationship between autobiographical memory and perceptual learning. Journal of Experimental Psychology: General, 110, 306-340.

Kellog, R. T. (1980). Is conscious attention necessary for long-term storage? Journal of Experimental Psychology: Human Learning and Memory, 6, 379-390.

Manktelow, K., and Jones, J. (1987). Principles from the psychology of thinking and mental models. In M. M. Gardiner and B. Christie (Eds.), Applying cognitive psychology to user-interface design (pp. 83-117). Chichester, England: Wiley.

Marshak, W. P., Kuperman, G., Ramsey, E. G., and Wilson, D. (1987). Situational awareness in map displays. In Proceedings of the Human Factors Society 31st Annual Meeting (pp. 533-535). Santa Monica, CA: Human Factors and Ergonomics Society.

McDonnell Douglas Aircraft Corporation. (1982). Advanced medium-range air-to-air missile operational utility evaluation (AMRAAM OUE) test final report. St. Louis, MO: Author.

Mogford, R. H., and Tansley, B. W. (1991). The importance of the air traffic controller's mental model. Presented at the Human Factors Society of Canada Annual Meeting, Canada.

Morton, J. (1969). Interaction of information in word recognition. Psychological Review, 76, 165-178.

Nisbett, R. E., and Wilson, T. D. (1977). Telling more than we can know: Verbal reports on mental processes. Psychological Review, 84, 231-259.

Norman, D. A. (1968). Towards a theory of memory and attention. Psychological Review, 75, 522-536.

Northrop Corporation. (1988). Tactical simulation 2 test report: Addendum 1 situation awareness test results. Hawthorne, CA: Author.

Pew, R. W. (1991). Defining and measuring situation awareness in the commercial aircraft cockpit. In Proceedings of the Conference on Challenges in Aviation Human Factors: The National Plan (pp. 30-31). Washington, DC: American Institute of Aeronautics and Astronautics.

Sarter, N. B., and Woods, D. D. (1991). Situation awareness: A critical but ill-defined phenomenon. International Journal of Aviation Psychology, 1, 45-57.

Selcon, S. J., and Taylor, R. M. (1990). Evaluation of the situational awareness rating technique (SART) as a tool for aircrew systems design. In Situational awareness in aerospace operations (AGARD-CP-478; pp. 5/1-5/8). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Sheehy, E. J., Davey, E. C., Fiegel, T. T., and Guo, K. Q. (1993, April). Usability benchmark for CANDU annunciation-lessons learned. Presented at the ANS Topical Meeting on Nuclear Plant Instrumentation, Control and Man-Machine Interface Technology, Oak Ridge, TN.

Smolensky, M. W. (1993). Toward the physiological measurement of situation awareness: The case for eye movement measurements. In Proceedings of the Human Factors and Ergonomics Society 37th Annual Meeting (p. 41). Santa Monica, CA: Human Factors and Ergonomics Society.

Taylor, R. M. (1990). Situational awareness rating technique (SART): The development of a tool for aircrew systems design. In Situational awareness in aerospace operations (AGARD-CP-478; pp. 3/1-3/17). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Tulving, E. (1985). How many memory systems are there? American Psychologist, 40, 385-398.

Venturino, M., Hamilton, W. L., and Dvorchak, S. R. (1990). Performance-based measures of merit for tactical situation awareness. In Situational awareness in aerospace operations (AGARD-CP-478; pp. 4/1-4/5). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Vidulich, M. A. (1989). The use of judgment matrices in subjective workload assessment: The subjective workload dominance (SWORD) technique. In Proceedings of the Human Factors Society 33rd Annual Meeting (pp. 1406-1410). Santa Monica, CA: Human Factors and Ergonomics Society.

Date received: January 26, 1993
Date accepted: October 4, 1994

same problems of intrusiveness as external task measures. Although selecting finite task measures for evaluating certain kinds of systems may be easy, determining appropriate measures for other systems may be more difficult. An expert system, for instance, may influence many factors in a global, not readily predicted manner.

The major limitation of the imbedded task measure approach stems from the interactive nature of SA subcomponents. A system that provides SA on one element may simultaneously reduce SA on another, unmeasured element. In addition, it is easy for subjects to have biased attention on the issue under evaluation in a particular study (e.g., altitude) if they figure out the purpose of the study. Therefore, because improved SA on some elements may result in decreased SA on others, relying solely on performance measures of specific parameters can yield misleading results.

Researchers need to know how much SA operators have when taxed with multiple, competing demands on their attention during system operations. To this end, a global measure of SA that simultaneously depicts SA across many elements of interest is desirable. To improve SA, designers need to be able to evaluate the entire impact of design concepts on operator SA.

Subjective Techniques

Self-rating. One simple technique is to ask operators to subjectively rate their own SA (e.g., on a 1-to-10 scale). Researchers in the Advanced Medium-Range Air-to-Air Missile Operational Utility Evaluation Test (AMRAAM OUE) study (McDonnell Douglas Aircraft Corporation, 1982) used this method. Pilot (and overall flight) SA was subjectively rated by the participants and a trained observer. The main advantages of subjective estimation techniques are low cost and ease of use. In general, however, the subjective self-rating of SA has several limitations.

If the ratings are collected during a simulation trial, the operators' ability to estimate their own SA will be limited because they do not know what is really happening in the environment (they have only their perceptions of that reality). Operators may know when they do not have a clue as to what is going on but will probably not know if their knowledge is incomplete or inaccurate.

If operators are asked to subjectively evaluate SA in a posttrial debriefing session, the rating may also be highly tainted by the outcome of the trial. When performance is favorable, whether through good SA or good luck, an operator will most likely report good SA, and vice versa. In a reevaluation of the AMRAAM OUE study, Venturino, Hamilton, and Dvorchak (1990) found that posttrial subjective SA ratings were highly correlated with performance, supporting this premise. In addition, when ratings are gathered after the mission, operators will probably be inclined to rationalize and overgeneralize about their SA, as has been shown when information about mental processes is collected after the fact (Nisbett and Wilson, 1977). Thus detailed information will be lost or misconstrued.

What do such subjective estimates actually measure? I would speculate that self-ratings of SA most likely convey a measure of subjects' confidence levels regarding that SA, that is, how comfortable they feel about their SA. Subjects who know they have incomplete knowledge or understanding would subjectively rate their SA as low. Subjects whose knowledge may not be any greater, but who do not subjectively have the same concerns about how much is not known, would rate their SA higher. In other words, ignorance may be bliss.

Several efforts have been made to develop more rigorous subjective measures of SA. Taylor (1990) developed the Situation Awareness Rating Technique (SART), which allows operators to rate a system design based on the amount of demand on attentional resources, supply of attentional resources, and understanding of the situation provided. As such, it considers operators' perceived workload (supply and demand on attentional resources) in addition to their perceived understanding of the situation. Although SART has been shown to be correlated with performance measures (Selcon and Taylor, 1990), it is unclear whether this is attributable to the workload or the understanding components.

In a new application, the Subjective Workload Dominance (SWORD) metric (Vidulich, 1989) has been applied as a subjective rating tool for SA (Hughes, Hassoun, and Ward, 1990). SWORD allows subjects to make pairwise comparative ratings of competing design concepts along a continuum that expresses the degree to which one concept entails less workload than the other. The resultant preferences are then combined using the analytic hierarchy process technique to provide a linear ordering of the design concepts. In the Hughes et al. study, SWORD was modified to allow pilots to rate the degree to which presented display concepts provided SA instead of workload. Not surprisingly, the display that was subjectively rated as providing the best SA using SWORD was the display for which pilots expressed a strong preference. It is difficult to ascertain whether subjective preference led to higher SWORD ratings of SA or vice versa. Further research is needed to determine the locus of such ratings.
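The combination of pairwise judgments into a linear ordering can be illustrated with a minimal sketch. It assumes a reciprocal pairwise-comparison matrix and uses the row geometric mean, one common approximation to analytic hierarchy process priorities. The function name `ahp_priorities`, the three display concepts, and the judgment values are hypothetical and are not taken from the Hughes et al. study.

```python
import math

def ahp_priorities(matrix):
    """Return normalized priority weights for a reciprocal pairwise matrix.

    Uses the row geometric mean, an approximation to the principal
    eigenvector used in the analytic hierarchy process (assumed here).
    """
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]

# Hypothetical judgments for display concepts A, B, and C.
# Entry [i][j] > 1 means concept i was rated as providing more SA than j;
# entry [j][i] holds the reciprocal.
judgments = [
    [1.0,       3.0,       5.0],
    [1.0 / 3.0, 1.0,       2.0],
    [1.0 / 5.0, 1.0 / 2.0, 1.0],
]

weights = ahp_priorities(judgments)
# Sort concepts from highest to lowest weight to obtain the linear ordering.
ranking = sorted(zip("ABC", weights), key=lambda pair: pair[1], reverse=True)
print(ranking)
```

The weights express only relative standing among the concepts compared; as noted above, they cannot show whether preference drove the SA ratings or the reverse.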

Observer-rating. A second type of subjective rating involves using independent, knowledgeable observers to rate the quality of a subject's SA. A trained observer might have more information than the operator about what is really happening in a simulation (through perfect knowledge gleaned from the simulation computer). The observer would have limited knowledge, however, about what the operator's concept of the situation is. The only information about the operator's perception of the situation would come from operator actions and imbedded or elicited verbalizations by the operator (e.g., from voice transmissions during the course of the task or from verbal protocols explicitly requested by the experimenter). Although this knowledge can be useful diagnostically by, for example, detecting overt errors in SA (stated misperceptions or lack of knowledge), it does not provide a complete representation of that knowledge. The operator may (and in all likelihood will) store information internally that is not verbalized. For instance, a pilot may discuss efforts to ascertain the identity of a certain aircraft, but knowledge about ownship system status, heading, other aircraft, and so on might not be mentioned at all. An outside observer has no way of knowing whether the pilot is aware of these variables but is not discussing them because they are not of immediate concern, or whether the pilot has succumbed to attentional narrowing and is unaware of their status. As such, the rating of SA by outside observers is also limited.

A variation on this theme is to use a confederate who acts as an associate to the operator (e.g., another crew member or air traffic controller in the case of a commercial aircraft) and who requests certain information from the operator to encourage further verbalization, as has been suggested by Sarter and Woods (1991). In addition to this technique succumbing to the same limitations encumbering observer ratings, it may also serve to alter SA by artificially directing the subject's attention to certain parameters. Because the distribution of the subject's attention across the elements in the environment largely determines SA, this method probably does not provide an unbiased assessment of operator SA.

Questionnaires

Questionnaires allow for detailed information about subject SA to be collected on an element-by-element basis that can be evaluated against reality, thus providing an objective assessment of operator SA. This type of assessment is a more direct measure of SA (i.e., it taps into, rather than infers, the operator's perceptions) and does not require subjects or observers to make judgments about situational knowledge on the basis of incomplete information, as subjective assessments do. In a review of such questionnaires, Herrmann (1984) concluded that when perceptions can be evaluated on the basis of objective knowledge, this method has been found to have good validity. Several methods of administration are possible.

Posttest. A detailed questionnaire can be administered after the completion of each simulated trial. This allows ample time for subjects to respond to a lengthy and detailed list of questions about their SA during the trial, providing needed information about subject perceptions. Unfortunately, people are not good at reporting detailed information about past mental events, even recent ones; there is a tendency to overgeneralize and overrationalize. Recall is stilted by the amount of time and intervening events that occur between the activities of interest and the administration of the questionnaire (Nisbett and Wilson, 1977). Earlier misperceptions can be quickly forgotten as the real picture unfolds during the course of events. Therefore, a posttest questionnaire will reliably capture the subject's SA only at the very end of the trial.

On-line. One way of overcoming this deficiency is to ask operators about their SA while they are carrying out their simulated tasks. Unfortunately, this too has several drawbacks. First of all, in many situations the subject will be under very heavy workload, precluding the answering of additional questions. Such questions would constitute a form of ongoing secondary task loading that may alter performance on the main task of operating the system. Furthermore, the questions asked could cue the subject to attend to the requested information on the displays, thus altering the operator's true SA. An assessment of time to answer as an indicator of SA is also faulty, as subjects may employ various time-sharing strategies between the dual tasks of operating the system and answering the questions. Overall, this method will be highly intrusive on the primary task of system operation.

Freeze technique. To overcome the limitations of reporting on SA after the fact, several researchers have used a technique in which the simulation is frozen at randomly selected times and subjects are queried as to their perceptions of the situation at the time (Endsley, 1987, 1988, 1989a; Fracker, 1990; Marshak, Kuperman, Ramsey, and Wilson, 1987). With this technique the system displays are blanked and the simulation is suspended while subjects quickly answer questions about their current perceptions of the situation. Thus SA data can be collected immediately, which reduces the problems incurred when collecting data after the fact but does not incur the problems of on-line questioning. Subjects' perceptions are then compared with the real situation according to the simulation computer database to provide an objective measure of SA.

In a study evaluating competing aircraft display concepts, Marshak et al. (1987) queried subjects about navigation, threats, and topography using this technique. The resulting answers were converted to an absolute percentage error score for each question, allowing scores across displays to be compared along these dimensions. Fracker (1990), in another aircraft study, requested information on aircraft location and identity for specifically indicated aircraft in the simulation. Mogford and Tansley (1991) collected data relevant to an air traffic control task. In each study a measure of SA on only selected parameters was obtained.
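As an illustration only, one plausible form of an absolute percentage error score is sketched below. The normalization by the actual value, the function names, the display labels, and the sample altitude responses are all assumptions; the exact computation used by Marshak et al. (1987) is not given here.

```python
def absolute_percentage_error(answer, actual):
    """Absolute error of a query answer as a percentage of the true value."""
    if actual == 0:
        raise ValueError("percentage error is undefined when the actual value is zero")
    return abs(answer - actual) / abs(actual) * 100.0

def mean_error_by_display(responses):
    """Average the percentage error per display concept.

    `responses` is a list of (display, answer, actual) tuples, where
    `actual` comes from the simulation computer at the freeze.
    """
    errors = {}
    for display, answer, actual in responses:
        errors.setdefault(display, []).append(
            absolute_percentage_error(answer, actual)
        )
    return {display: sum(vals) / len(vals) for display, vals in errors.items()}

# Hypothetical altitude answers (feet) under two display concepts.
responses = [
    ("display_a", 18000, 20000),
    ("display_a", 33000, 30000),
    ("display_b", 25000, 20000),
]
scores = mean_error_by_display(responses)
print(scores)
```

Scoring against the simulation database in this way is what makes the measure objective: the comparison does not depend on the judgment of the subject or of an observer.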

The Situation Awareness Global Assessment Technique is a global tool developed to assess SA across all of its elements based on a comprehensive assessment of operator SA requirements (Endsley, 1987, 1988, 1990a). As a global measure, SAGAT includes queries about all SA requirements, including Level 1, 2, and 3 components, and considers system functioning and status and relevant features of the external environment.

Computerized versions of SAGAT have been developed for air-to-air tactical aircraft (Endsley, 1990c) and advanced bomber aircraft (Endsley, 1989a). These versions allow queries to be administered rapidly and SA data to be collected for analysis. The SAGAT tool features an easy-to-use query format that was designed to be as compatible as possible with subject knowledge representations. An example of a SAGAT query is shown in Figure 1. SAGAT's basic methodology is generic and applicable to other types of systems once a delineation of SA requirements has been made in order to construct the queries.

[Figure 1 placeholder: query screen titled "DESIGNATE AIRCRAFT'S ALTITUDE" with a vertical scale labeled ALTITUDE (FEET), marked from 0 K to 70 K in 10 K steps.]

Figure 1. Sample SAGAT query.

This global approach has an advantage over probes that cover only a limited number of SA items in that subjects cannot prepare for the queries in advance, because queries could address nearly every aspect of the situation to which subjects would normally attend. Therefore, the chance of subjects' attention being biased toward specific items is minimized. In addition, the global approach overcomes the problems incurred when collecting data after the fact and minimizes subject SA bias resulting from secondary task loading or artificially cuing the subject's attention. It also provides a direct measure of SA that can be objectively collected and objectively evaluated. Finally, the use of random sampling (both of stop times and query presentation) provides unbiased estimates of SA, thus allowing SA scores to be easily compared statistically across trials, subjects, and systems. The primary disadvantage of this technique involves the temporary halt in the simulation.
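The random sampling of stop times and query presentation described above might be sketched as follows. This is a minimal illustration, not the SAGAT software: the function `plan_freezes`, the `earliest_s` parameter, the trial length, and the query names are hypothetical.

```python
import random

def plan_freezes(trial_length_s, n_stops, queries, earliest_s=60, seed=None):
    """Pick random freeze times and an independent random query order per freeze."""
    rng = random.Random(seed)  # a seed makes the plan reproducible
    # Sample distinct whole-second stop times without replacement, then sort.
    stop_times = sorted(rng.sample(range(earliest_s, trial_length_s), n_stops))
    # Sampling the full list returns a shuffled copy and leaves `queries` intact.
    return [(t, rng.sample(queries, len(queries))) for t in stop_times]

QUERIES = ["ownship heading", "aircraft altitude", "fuel level", "weapon selection"]
plan = plan_freezes(trial_length_s=600, n_stops=3, queries=QUERIES, seed=1)
for stop_time, order in plan:
    print(stop_time, order)
```

Because neither the stop time nor the next query is predictable, subjects have no basis for attending selectively to particular items before a freeze.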

I conducted two studies to investigate whether or not this imposes undue intrusiveness and to determine the best method for implementing this technique. In the first study I wanted to specifically determine how long after a freeze in the simulation SA information could be obtained. In the second study I investigated whether temporarily freezing the simulation would result in any change in subject behavior, thus specifically evaluating potential intrusiveness.

EXPERIMENT 1

In determining whether SA information is reportable via the SAGAT methodology, several possibilities must be considered. First, data may be processed by subjects in short-term memory (STM), never reaching long-term memory (LTM). If a sequential information-processing model is used, then it is possible that information might enter into STM and never be stored in LTM, where it would be available for retrieval during questioning. In this case, information would not be available during any SAGAT query sessions that exceeded the STM storage limitations (approximately 30 s with no rehearsal).

There is a good deal of evidence, however, that STM does not precede LTM but constitutes an activated subset of LTM (Cowan, 1988; Morton, 1969; Norman, 1968). According to this type of model, information proceeds directly from sensory memory to LTM, which is necessary for pattern recognition and coding. Only those portions of the environment that are salient are then highlighted in STM (either through focalized attention or automatic activation). This type of model would predict that SA information that has been perceived and/or further processed by the operator would exist in LTM stores and thus be available for recall during SAGAT querying that exceeds 30 s.

Second, data may be processed in a highly automated fashion and thus not be in the subject's awareness. Expert behavior can function in an automated processing-action sequence in some cases. Several authors have found that even when effortful processing is not used, however, the information is retained in LTM and is capable of affecting subject responses (Jacoby and Dallas, 1981; Kellog, 1980; Tulving, 1985). The type of questions used in SAGAT (providing cued-recall and categorical or scalar responses) should be receptive to retrieval of this type of information. In addition, a review of the literature indicates that even under these circumstances, the product of the automatic processes (as a state of knowledge) is available, even though the processes themselves may be difficult to verbalize (Endsley, 1995, this issue).

Third, the information may be in LTM but not easily recalled by the subjects. Evidence suggests that when effortful processing and awareness are used during the storage process, recall is enhanced (Cowan, 1988). SA, composed of highly relevant, attended to, and processed information, should be most receptive to recall. In addition, the SAGAT battery, which requires categorical or scalar responses, is a cued, as opposed to total, recall task, thus aiding retrieval. Under conditions of SAGAT testing, subjects are aware that they may be asked to report their SA at any time. This, too, may aid in the storage and retrieval process. Because the SAGAT battery is administered immediately after the freeze in the simulation, no time for memory decay or competing event interference is allowed. Thus conditions should be optimized for the retrieval of SA information. Although it cannot be said conclusively that all of a subject's SA can be reflected in this manner, the vast majority should be reportable via SAGAT.

To further investigate this matter, I conducted a study to specifically determine how long after a freeze in the simulation SA information could be obtained. I expected that if collection of SA data via this technique was memory limited, this would be evidenced by an increase in errors on SAGAT queries that occurred approximately 30 to 60 s after the freeze time because of short-term memory restrictions. (Although this would not preclude use of the technique, it would limit its use by restricting the number of questions that could be asked at each stop.)

Procedure

A set of air-to-air engagements was conducted in a high-fidelity aircraft simulation facility. A fighter sweep mission with a force ratio of two (blue team) versus four (red team) was used for the trials. The objective of the blue team was to penetrate red territory, maximizing the kills of red fighters while maintaining a high degree of survivability. The red team was directed to fly around their assigned combat air patrol (CAP) points until a blue target was detected in red airspace. The red team was then allowed to leave its CAP point to defend against the blue team. In all cases, specific tactics were at the discretion of the individual pilot teams.

A total of 15 trials were completed by two teams of six subjects. At a random point in each trial, the simulator was frozen and SAGAT data were immediately collected from all six participants. At a given stop, all of the queries were presented once in random order. As this order was different at each stop, each query was presented a variety of times after the stop, across subjects and trials. Therefore, performance accuracy on each question could be evaluated as a function of the amount of time elapsed prior to its presentation. After all subjects had completed the SAGAT battery at a given stop, a new trial was begun.
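The elapsed time prior to each query's presentation follows from the presentation order and the time spent on the preceding queries. A minimal sketch, assuming each query appears as soon as the previous one is answered; the function name and the response durations are hypothetical.

```python
from itertools import accumulate

def presentation_delays(ordered_responses):
    """Map each query to the seconds elapsed since the freeze when it appeared.

    `ordered_responses` is a list of (query, seconds_spent_answering)
    in the order the queries were presented at one stop.
    """
    names = [name for name, _ in ordered_responses]
    durations = [seconds for _, seconds in ordered_responses]
    # The first query appears at 0 s; each later query appears once all
    # earlier queries have been answered (running total of durations).
    starts = [0.0] + list(accumulate(durations))[:-1]
    return dict(zip(names, starts))

# Hypothetical stop: three queries in the random order drawn for this stop.
stop = [("aircraft altitude", 12.0), ("ownship heading", 8.5), ("fuel level", 10.0)]
delays = presentation_delays(stop)
print(delays)
```

Since the order is redrawn at every stop, the same query accumulates a spread of delays across subjects and trials, which is what allows accuracy to be examined as a function of elapsed time.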

Prior to conducting the study, all subjects were trained on the use of the simulator, the displays, aircraft handling qualities, and SAGAT. In addition to three instructional training sessions on using SAGAT, each subject participated in 18 practice trials in which SAGAT was administered. (Most subjects also had received a substantial amount of training in the simulator in the past.) Thus subjects were well trained prior to testing.

Facilities. A high-fidelity, real-time simulation facility was used for testing. Aircraft control systems, avionics systems, weapons systems, crew stations, and air vehicle performance were modeled to provide realistic aircraft control characteristics. A Gould mainframe computer controlled the simulations and drove Silicon Graphics-generated high-resolution color graphics displays. This test used six pilot stations, each including a simulated head-up display, a tactical situation display, radar, and system controls operated by a touch screen or stick and throttle control switches. A realistic stick and throttle provided primary flight control.

Subjects. Twelve experienced former military fighter pilots participated in the test. The mean subject age was 48.16 years (range of 32 to 68). Subjects had an average of 3,310 h (range of 1,500 to 6,500) and an average of 15.5 years (range of 7 to 26) of military flight experience. Of the 12 subjects, 7 had combat experience.

Results

Each of the subjects' answers to the SAGAT queries was compared with actual values as collected by the simulator computer at the time of each stop. Included in the SAGAT battery at the time of the test were 26 queries. Of those, 11 could not be evaluated because the appropriate simulator data were not available, and 5 could be evaluated only by subjective means. The 10 remaining query answers concerned ownship heading, ownship location, aircraft heading, aircraft detections, aircraft airspeed, aircraft weapon selection, aircraft Gs, aircraft fuel level, aircraft weapon quantity, and aircraft altitude.

An error score (SAGAT query answer minus actual value) was computed for each response. Absolute error scores for each query (across all subjects and trials) were plotted against the amount of time elapsed between the stop in the simulation and the query, and a regression was calculated. F tests for the regressions and Pearson's r for eight of the queries are presented in Table 1. Two of the queries provided a categorical response; therefore, chi-square tests were performed and are presented in Table 2.
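The regression analysis described above can be sketched as an ordinary least-squares fit of absolute error on elapsed time, with the F statistic for the slope. The function name and the data below are hypothetical and serve only to show the computation; obtaining a p value would additionally require the F distribution, which is omitted here.

```python
def regress_abs_error_on_time(times, errors):
    """Simple linear regression of |error| on elapsed time.

    Returns slope, intercept, the F statistic on (1, n - 2) degrees of
    freedom, and r squared.
    """
    n = len(times)
    if n < 3:
        raise ValueError("need at least three observations")
    abs_err = [abs(e) for e in errors]  # error = query answer minus actual value
    mean_t = sum(times) / n
    mean_e = sum(abs_err) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (e - mean_e) for t, e in zip(times, abs_err))
    slope = sxy / sxx
    intercept = mean_e - slope * mean_t
    ss_total = sum((e - mean_e) ** 2 for e in abs_err)
    ss_regression = slope * sxy           # variation explained by elapsed time
    ss_residual = ss_total - ss_regression
    df_error = n - 2
    f_stat = ss_regression / (ss_residual / df_error)
    return {
        "slope": slope,
        "intercept": intercept,
        "df_error": df_error,
        "F": f_stat,
        "r2": ss_regression / ss_total,
    }

# Hypothetical altitude responses: seconds since freeze and signed error in feet.
times = [15, 45, 90, 150, 210, 300]
errors = [1500, -2200, 1800, -1600, 2400, -1900]
result = regress_abs_error_on_time(times, errors)
print(result)
```

A memory-limited technique would show up as a reliably positive slope; a flat regression yields an F near zero and an r squared near zero.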

None of the regressions or chi-square tests computed for each of the 10 queries was significant (or even close to significant), indicating that subjects were neither more nor less prone to response error as the amount of time between the simulator freeze and the presentation of the query increased. A plot of the regression for altitude error, shown in Figure 2, reveals little or no increase in error over time. (Plots of the regressions for the other variables appeared quite similar.)

Discussion

Based on these data, it would appear that subjects under these conditions were able to provide information on their SA about a particular situation for up to 5 or 6 min. The fact that all 10 of the queries produced flat regressions lends extra weight to this conclusion.
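The flatness of the regressions can be checked against Table 1 using the identity for a single-predictor regression, r² = F / (F + df_error). The sketch below assumes the Table 1 entries are read with their decimal points restored (for example, df = 1, 19; F = 0.021; r² = 0.001); that reading is an assumption about the printed table, and the variable names are hypothetical.

```python
def r_squared_from_f(f_stat, df_error):
    """r squared implied by an F test on (1, df_error) degrees of freedom."""
    return f_stat / (f_stat + df_error)

# (query, error df, F, reported r squared), as read from Table 1.
TABLE_1 = [
    ("Own heading",      19, 0.021, 0.001),
    ("Own location",     19, 0.940, 0.047),
    ("Aircraft heading", 95, 0.040, 0.000),
    ("G level",          90, 0.500, 0.006),
    ("Fuel level",       95, 1.160, 0.012),
    ("Weapon quantity",  92, 0.001, 0.000),
    ("Altitude",         96, 0.002, 0.000),
    ("Detection",       524, 0.041, 0.000),
]

# Largest disagreement between the implied and the reported r squared.
max_gap = max(abs(r_squared_from_f(f, df) - r2) for _, df, f, r2 in TABLE_1)
largest_r2 = max(r2 for _, _, _, r2 in TABLE_1)
print(max_gap, largest_r2)
```

Under this reading the reported values agree with the identity to within rounding, and elapsed time accounts for less than 5% of the variance in error for every query listed.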

Two explanations can be offered for thesefindings First this study investigated subjectswho were expert in particular tasks during a re-alistic simulation of those tasks The informa-tion subjects were asked to report was impor-tant to performance of those tasks Mostlaboratory studies that predict fairly rapid de-cay times (approximately 30 s for short-termmemory) typically employ the use of stimulithat have little or no inherent meaning tothe subject (nonsense words or pictures) The

TABLE 1

Tests on Regressions of Time of Query Presentation onQuery Accuracy

Query df F P r2

Own heading 119 0021 0887 0001Own location 119 0940 0334 0047Aircraft heading 195 0040 0834 0000G level 190 0500 0480 0006Fuel level 195 1160 0284 0Q12Weapon quantity 192 0001 0981 0000Altitude 196 0002 0965 0000Detection 1524 0041 0840 0000

74-March 1995 HUMAN FACTORS

TABLE 2

50000 r--------------------

Chi-Square Tests of Time Category of Query Presenta-tion on Query Accuracy

Time (Seconds)Figure 2 Altitude error by time until query presentation

EXPERIMENT 2

simulation was stopped No intervening periodof waiting or any competing activity was intro-duced prior to administering any SAGAT querySubject knowledge of SA information may beinterfered with if time delays or other activities(particularly continued operational tasks) areimposed before SAGAT is administered The ma-jor implication of these results is that underthese conditions SA data are readily obtainablethrough SAGAT for a considerable period after astop in the simulation (up to 5 or 6 min)

A second study was initiated to address theissue of possible intrusiveness It is necessary todetermine whether temporarily freezing thesimulation results in any change in subject be-havior If subject behavior were altered by sucha freeze certain limitations would be indicated(One might wish to not resume the trial after thefreeze for instance but start a new trial in-stead) The possibility of intrusiveness wastested by evaluating the effect of stopping thesimulator on subsequent subject performance

Procedure

A set of air-to-air engagements was conductedof a fighter sweep mission with a force ratio oftwo (blue team) versus four (red team) Thetraining instructions and pilot mission objec-tives were identical to those used in Experiment1 In this study however the trial was resumedafter a freeze following a specified period for col-lecting SAGAT data and was continued untilspecified criteria for completion of the missionwere met Subjects completed as many queriesas they could during each stop The queries werepresented in random order

Five teams of six subjects completed a full testmatrix The independent variables were dura-tion of the stops (12 1 or 2 min) and the fre-quency of stops (12 or 3 times during the trial) Each team participated twice in each of thesenine conditions (In any given trial multiplestops were of the same duration) Each teamalso completed six trials in which no stops oc-curred as a control condition Therefore a total

400

P

gt0995gt0995

300

016013

200

bull 1 bullbullbullbullmiddot1

df

5 (N = 97)5 (N = 98)

100

bull - - bull t bullbullbullbullbull o

o

Query

Weapon selectionAirspeed

storage and utilization of relevant informationmay be quite different from that of irrelevantinformation (Chase and Simon 1973)

Second the results indicate that the SA infor-mation was obtainable from long-term memorystores If schemata or other mechanisms areused to organize SA information (as opposed toworking memory processes only) then that in-formation will be resident in long-term memoryMany of the 10 items analyzed can be consideredLevell SA components The fact that this lower-level information was resident in LTM indicatesthat either the inputs to higher-level processingwere retained as well as the outputs or the Level1 components were retained as important piecesof pilot SA and are significant components ofLTM schemata (eg target altitude itself is im-portant to know and not just its implications)Both of these explanations may be correctThese findings generally support the predictionsof a processing model in which informationpasses into LTM storage before being high-lighted in STM

As a caveat note that subjects were activelyworking with their SA knowledge by answeringthe SAGAT queries for the entire period that the

bullbullZ 40000~o 30000w-8 20000Jbullbull 10000C(

SITUATION AWARENESS MEASUREMENT March 1995-75

Results

Figure 5. Blue kills by number of stops in trial (x-axis: number of stops, 0 to 3).

Figure 4. Blue losses: trials with stops versus trials without stops (x-axis: number of losses, 0 to 2).

Analysis of variance was used to evaluate the effect of number of stops and duration of stops on each of the two performance measures: blue team kills and blue team losses. The number of stops during the trial had no significant effect on either pilot performance measure, F(3, 116) = 1.73, p > 0.05, β = 0.55, and F(3, 116) = 0.20, p > 0.05, β > 0.60, respectively, as shown in Figures 5 and 6. The duration of the stop also did not significantly affect either performance measure, F(3, 116) = 2.16, p > 0.05, β = 0.40, and F(3, 116) = 0.77, p > 0.05, β > 0.60, respectively, as depicted in Figures 7 and 8. In viewing the data, no linear trend appears to be present in


of 30 trials were conducted for each duration of stop condition, and a total of 30 trials were conducted for each frequency of stop condition. These conditions were compared with 30 trials in which no stops occurred. Conditions were administered in random order, blocked by team. Pilot performance was collected as the dependent measure.
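The trial counts stated here can be checked by enumerating the test matrix. The following Python sketch is illustrative only; the constant and function names are assumptions introduced for this example and are not part of the original study materials.

```python
from collections import Counter
from itertools import product

# Design values taken from the text; all names below are hypothetical.
DURATIONS_S = (30, 60, 120)     # stop durations: 1/2, 1, and 2 min
STOP_COUNTS = (1, 2, 3)         # number of stops per trial
TEAMS = 5
REPS_PER_CONDITION = 2          # each team ran every condition twice
CONTROL_TRIALS_PER_TEAM = 6     # trials with no stops


def build_trials():
    """Enumerate every trial as a (team, n_stops, duration_s) tuple."""
    trials = []
    for team in range(1, TEAMS + 1):
        for duration, n_stops in product(DURATIONS_S, STOP_COUNTS):
            for _ in range(REPS_PER_CONDITION):
                trials.append((team, n_stops, duration))
        for _ in range(CONTROL_TRIALS_PER_TEAM):
            trials.append((team, 0, 0))  # control: no stops, no duration
    return trials


trials = build_trials()
per_duration = Counter(d for _, n, d in trials if n > 0)
per_stop_count = Counter(n for _, n, _ in trials)

print("total trials:", len(trials))
print("trials per stop duration:", dict(per_duration))
print("trials per number of stops:", dict(per_stop_count))
```

Under these assumptions the enumeration gives 30 trials for each duration, 30 for each frequency, 30 control trials, and 120 trials overall, which agrees with the N = 120 reported for the chi-square tests.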

Facilities. This study used the same piloted, multiple-engagement simulator as in Experiment 1.

Subjects. Twenty-five experienced former military fighter pilots participated in this test. (Four of the subjects participated on more than one team.) The mean subject age was 45.16 years (range of 32 to 68). Subjects had an average of 3582 h (range of 975 to 7045) and an average of 16.9 years (range of 6 to 27) of military flight experience. Of the 25 subjects, 14 had combat experience.

Figure 3. Blue kills: trials with stops versus trials without stops (x-axis: number of kills).

Pilot performance under each of the conditions was analyzed. Pilot performance measures included the number of blue team kills (red team losses) and the number of blue team losses (red team kills). Chi-square tests were performed on the number of blue team kills, χ²(4, N = 120) = 5.973, p > 0.05, and blue team losses, χ²(2, N = 120) = 0.05, p > 0.05, comparing trials in which there were stops to collect SAGAT data with those in which there were no stops, as depicted in Figures 3 and 4. There was no significant difference in pilot performance on either measure.
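A comparison of this kind can be sketched as a Pearson chi-square test on a contingency table of trial counts. The Python example below is a minimal sketch and not the analysis code used in the study; the counts are hypothetical, and the function names are assumptions. The closed-form tail probability used here holds only for even degrees of freedom, which covers the df of 2 and 4 reported above.

```python
import math


def chi_square_statistic(table):
    """Return (statistic, df) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence; assumes no empty row or column.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def upper_tail_even_df(x, df):
    """Upper-tail chi-square probability, valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2.0
    terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * terms


# Hypothetical counts of trials with 0, 1, or 2 blue losses (not the study's data).
with_stops = [44, 31, 15]       # 90 trials containing SAGAT stops
without_stops = [16, 9, 5]      # 30 control trials
stat, df = chi_square_statistic([with_stops, without_stops])
p_value = upper_tail_even_df(stat, df)
print(f"chi2({df}, N = 120) = {stat:.3f}, p = {p_value:.3f}")
```

Applied to the reported statistic for blue team kills, χ²(4) = 5.973, the same tail function gives a probability of roughly 0.20, which is consistent with the reported p > 0.05.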

Figure 6. Blue losses by number of stops in trial (x-axis: number of stops, 0 to 3).

Figure 8. Blue losses by duration of stops in trial (x-axis: duration of stops in seconds: 0, 30, 60, 120).

either case that would indicate a progressively worse (albeit nonsignificant) effect of increasing number or duration of stops. This indicates that stops to collect SAGAT data (as many as 3 for up to 2 min in duration) did not have a significant effect on subject performance.

Discussion

The lack of a significant influence of this procedure on performance probably rests on the fact that relevant schemata are actively used by subjects during the entire freeze period. Under these conditions, the subjects' SA does not have a chance to decay before the simulation is resumed (as was indicated by Experiment 1). Thus SA is fairly intact when the simulation continues, allowing subjects to proceed with their tasks where they left off. Subjectively, subjects did fairly well with this procedure and were able to readily pick up the simulation at the point at which they left off at the time of the freeze, sometimes with the same sentence they had started before the stop. On many occasions subjects could not even remember if they had been stopped to collect SA data during the trial, also indicating a certain lack of intrusiveness.

Figure 7. Blue kills by duration of stops in trial (x-axis: duration of stops in seconds: 0, 30, 60, 120).

Summary

The use of a temporary freeze in the simulation to collect SA data is supported by Experiments 1 and 2. Subjects were able to report their SA using this technique for as long as 5 or 6 min without apparent memory decay, and the freeze did not appear to be intrusive on subject performance in the simulation, allaying several concerns about the technique. Although it is always difficult to establish no effect of one variable on another (i.e., prove the null hypothesis), it is reassuring that the finding of no intrusion on performance has been repeated in numerous other studies in which SAGAT was used to evaluate human-machine interface concepts (Bolstad and Endsley, 1990; Endsley, 1989b; Northrop, 1988) if the timing of the stop was unpredictable to the subjects (Endsley, 1988). This finding was also repeated in one study in which pilots flew in a simulation against computer-controlled enemy aircraft (Endsley, 1990d), helping to rule out the possibility that the finding of no effect on performance in the present study could have been the result of pilots on both sides of the simulation being equally influenced.

SAGAT has thus far proven to meet the stated criteria for measurement. In addition to the present studies establishing a level of empirical validity, SAGAT has been shown to have predictive validity (Endsley, 1990b) and content validity (Endsley, 1990d). The method is not without some costs, however, as a detailed analysis of SA requirements is required in order to develop the battery of queries to be administered. On the positive side, this analysis can also be extremely useful for guiding design efforts.

EXAMPLE

To proffer an example of the type of SA information provided by SAGAT, I will present a study that evaluates the use of a three-dimensional display for providing radar information to fighter pilots. The study was conducted in the same high-fidelity, real-time simulator as in the prior two experiments. During the study each of six experienced pilot subjects flew a fighter sweep mission against a field of four digitally controlled enemy aircraft. Each subject completed five trials in each of five display conditions. The initial starting conditions and order of presentation of the display conditions were randomized. Each trial was halted to collect SAGAT data at a randomly selected time between 2.5 and 5 min into each trial. Immediately following the collection of SAGAT data, the Subjective Workload Assessment Technique (SWAT) was also administered. In this study the trial was not resumed after the collection of SAGAT and SWAT data.

The five display conditions were constructed to evaluate the effect of a three-dimensional (3D) display concept. The 3D display provided a grid representing the ground, above which aircraft targets detected by the radar were presented by a small triangle pointing in the direction of the aircraft's heading and connected to the grid by a line corresponding to the aircraft's altitude. The plane the subject was flying was presented in the center of this display so that each target's location and altitude were spatially represented relative to the subject's ownship. Four rotations of the grid (0, 30, 45, and 60 deg) were investigated (each providing a different perspective on the scene), as shown in Figure 9. At 0-deg rotation, the grid portrayed a God's-eye view of the scene, thus depicting a traditional two-dimensional display. In addition, these four views were compared with a two-display configuration in which a God's-eye view display was supplemented with a second profile display that pictorially presented altitude. In all conditions, altitude of the aircraft was also presented numerically next to the aircraft symbols.

Following the trials, the collected SAGAT and SWAT data were analyzed. Although the final SAGAT battery for fighter aircraft has 40 queries (Endsley, 1990c), only 23 were administered at the time of the test (as the battery was still in development). The results of 4 pertinent queries are presented here. The subjects' reported knowledge of the range, azimuth, altitude, and heading of the enemy targets was compared with the actual values of these elements at the time of the freeze in the simulation. The subjects' perceptions were evaluated as correct or incorrect (using tolerance values that had been determined to be operationally relevant).

The percentage correct across subjects for each condition on each of these four queries is shown in Figure 10. As the scored data are binomial, an arcsine transformation was applied and an ANOVA performed. The results showed that the display type had a significant effect on pilot SA of the aircraft's range, F(4, 288) = 2.575, p < 0.05; azimuth, F(4, 278) = 4.125, p < 0.05; and heading, F(4, 139) = 3.040, p < 0.05; but not altitude, F(4, 94) = 1.83, p > 0.05. The 3D display, when rotated at 45 or 60 deg, provided significantly lower SA on target range than when at 0 or 30 deg and significantly lower SA of target heading than of any of the other three conditions. A rotation of 60 deg also provided significantly lower SA of target azimuth than did the other four conditions. Although the ANOVA for target altitude was not significant, there was an opposite trend for the highly rotated 3D display to provide more SA of target altitude than did rotations of 0 and 30 deg. The SWAT data supported these findings, with significantly higher ratings for workload in the 45- and 60-deg rotations than in the 0- or 30-deg rotations or the two-dimensional display.

Figure 9. Two-dimensional display and three-dimensional display at four levels of rotation. Panels: (a) 2D display; (b) 3D display, 0° rotation; (c) 3D display, 30° rotation; (d) 3D display, 45° rotation; (e) 3D display, 60° rotation.

A major point to be gleaned from this example is the importance of examining the impact of a given design across an operator's SA requirements. The proposed 3D display was designed to provide pilots with better information about aircraft altitude, and it may have done so, although the trend was not significant. However, this came at the expense of SA on location, azimuth, and heading. If a study had been performed that merely examined subjects' knowledge of aircraft altitude, this important information would have been lost. The SAGAT data also provided important diagnostic information that was useful in understanding the exact effect the design concept had on pilot SA in order to make future iterations.

Figure 10. Situation awareness of target range, heading, azimuth, and altitude in each display condition. Panels: (a) knowledge of target range; (b) knowledge of target heading; (c) knowledge of target azimuth; (d) knowledge of target altitude. Display conditions on the x-axis: 2-D; 3-D at 0°, 30°, 45°, and 60°.

IMPLEMENTATION RECOMMENDATIONS

Several recommendations for SAGAT administration can be made based on previous experience in using the procedure.

Training

An explanation of the SAGAT procedures and detailed instructions for answering each query should be provided to subjects before testing. Several training trials should be conducted in which the simulator is halted frequently to allow subjects ample opportunity to practice responding to the SAGAT queries. Usually three to five samplings are adequate for a subject to become comfortable with the procedure and to clear up any uncertainties in how to answer the queries.

Test Design

SAGAT requires no special test considerations. The same principles of experimental design and administration apply to SAGAT as to any other dependent measure. Measures of subject performance and workload may be collected concurrently with SAGAT, as no ill effect from the insertion of breaks has been shown. To be cautious, however, if performance measures are to be collected simultaneously with SAGAT data, half of the trials should be conducted without any breaks for SAGAT in order to provide a check for this contingency.

Procedures

Subjects should be instructed to attend to their tasks as they normally would, with the SAGAT queries considered as secondary. No displays or other visual aids should be visible while subjects are answering the queries. If subjects do not know or are uncertain about the answer to a given query, they should be encouraged to make their best guess. There is no penalty for guessing, allowing for consideration of the default values or other wisdom gained from experience that subjects normally use in decision making. If subjects do not feel comfortable enough to make a guess, they may go on to the next question. Talking or sharing of information among subjects should not be permitted. If multiple subjects are involved in the same simulation, all subjects should be queried simultaneously and the simulation resumed for all subjects at the same time.

Which Queries to Use

Random selection. As it may be impossible to query subjects about all of their SA requirements in a given stop because of time constraints, a portion of the SA queries may be randomly selected and asked each time. A random sampling provides consistency and statistical validity, thus allowing SA scores to be easily compared across trials, subjects, systems, and scenarios.

Because of attentional narrowing or lack of information, certain questions may seem unimportant to a subject at the time of a given stop. It is important to stress that subjects should attempt to answer all queries anyway. This is because (a) even though they think it unimportant, the information may have at least secondary importance; (b) they may not be aware of information that makes a question very important (e.g., the location of a pop-up aircraft); and (c) if only questions of the highest priority were asked, subjects might be inadvertently provided with artificial cues about the situation that will direct their attention when the simulation is resumed. Therefore, a random selection from a constant set of queries is recommended at each stop.

Experimenter controlled. In certain tests it may be desirable to have some queries omitted because of limitations of the simulation or characteristics of the scenarios. For instance, if the simulation does not incorporate aircraft malfunctions, the query related to this issue may be omitted. In addition, with particular test designs it may be desirable to ensure that certain queries are presented every time. When this occurs, subjects should also be queried on a random sampling from all SA requirements and not only on those related to a specific area of interest to the evaluation being conducted. This is because of the ability of subjects to shift attention to the information on which they know they will be tested. What may appear to be an improvement in SA in one area may be merely a shift of attention from another area. When the SAGAT queries cover all of the SA requirements, no such artificial cuing can occur.
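The two selection policies described above (a random sample from a constant query set, optionally combined with queries that the experimenter wants presented at every stop) can be sketched as follows. This is an illustrative Python sketch, not the SAGAT software; the function name, its parameters, and the query labels are hypothetical.

```python
import random


def select_queries(query_pool, n_random, always_ask=(), rng=None):
    """Pick the queries for one stop.

    query_pool: the constant set of queries supported by the simulation
                (queries the simulation cannot support are simply left out).
    n_random:   how many additional queries to sample at random.
    always_ask: queries the experimenter wants presented at every stop.
    """
    rng = rng or random.Random()
    mandatory = list(always_ask)
    unknown = [q for q in mandatory if q not in query_pool]
    if unknown:
        raise ValueError(f"not in the query pool: {unknown}")
    remaining = [q for q in query_pool if q not in mandatory]
    sampled = rng.sample(remaining, min(n_random, len(remaining)))
    chosen = mandatory + sampled
    rng.shuffle(chosen)  # present in random order so position gives no cue
    return chosen


# Hypothetical query labels; "aircraft malfunctions" is omitted because the
# imagined simulation does not model malfunctions.
QUERY_POOL = [
    "target range", "target azimuth", "target altitude",
    "target heading", "weapon selection", "airspeed",
]
selected = select_queries(
    QUERY_POOL, n_random=3, always_ask=["target altitude"],
    rng=random.Random(7),
)
print(selected)
```

Sampling the remainder from the whole pool, rather than only from queries related to the display under evaluation, follows the recommendation that subjects should not be able to predict which information will be tested.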

When to Collect SAGAT Data

It is recommended that the timing of each freeze for SAGAT administration be randomly determined and unpredictable so that subjects cannot prepare for queries in advance. If the freeze is covertly associated with specific events or occurs at specific times across trials, prior studies have shown that the subjects will be able to figure this out (Endsley, 1988), allowing them to prepare for queries or actually improve SA through the artificiality of the freeze cues. An informal rule has been to ensure that no freezes occur earlier than 3 min into a trial, to allow subjects to build up a picture of the situation, and that no two freezes occur within 1 min of each other.
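The informal timing rule can be sketched as a constrained random draw. The Python example below is a minimal sketch under stated assumptions: times are measured in simulation time (excluding the duration of the freezes themselves), the 15-min trial length is borrowed from Experiment 2, and the function and parameter names are hypothetical. Rejection sampling is used here for simplicity; it is not claimed to be the method used in the studies.

```python
import random


def draw_freeze_times(n_stops, trial_len_s=900.0, earliest_s=180.0,
                      min_gap_s=60.0, rng=None, max_tries=10_000):
    """Draw sorted freeze times (seconds of simulation time) at random.

    No freeze occurs before `earliest_s`, and consecutive freezes are at
    least `min_gap_s` apart.
    """
    rng = rng or random.Random()
    if n_stops < 0:
        raise ValueError("n_stops must be non-negative")
    if n_stops and earliest_s + (n_stops - 1) * min_gap_s > trial_len_s:
        raise ValueError("constraints cannot be met within the trial length")
    for _ in range(max_tries):
        times = sorted(rng.uniform(earliest_s, trial_len_s)
                       for _ in range(n_stops))
        # Accept the draw only if every pair of neighbours is far enough apart.
        if all(b - a >= min_gap_s for a, b in zip(times, times[1:])):
            return times
    raise RuntimeError("no valid schedule found; relax the constraints")


freeze_times = draw_freeze_times(3, rng=random.Random(42))
print([round(t) for t in freeze_times])
```

Because every accepted schedule is drawn uniformly from the allowed window, subjects cannot infer the freeze time from events in the scenario or from the elapsed time in earlier trials.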

The result of this approach is that the activities occurring at the time of the stops will be randomly selected. Some stops may occur during very important activities that are of interest to the experimenter, others when no critical activities are occurring. This gives a good sampling of the subjects' SA in a variety of situations. During analysis, the experimenter may want to stratify the data to take these variations into account.

How Much SAGAT Data to Collect

The number of trials necessary will depend on the variability present in the dependent variables being collected and the number of data samples taken during a trial. This will vary with different subjects and designs, but between 30 and 60 samplings per SA query with each design option have previously been adequate in a within-subjects test design.

Multiple SAGAT stops may be taken within each trial. There is no known limit to the number of times the simulator can be frozen during a given trial. In Experiment 2, no ill effects were found from as many as three stops during a 15-min trial. In general, it is recommended that a stop last until a certain amount of time has elapsed, and then the trial is resumed, regardless of how many questions have been answered. Stops as long as 2 min in duration were used with no undue difficulty or effect on subsequent performance. In Experiment 1, stops as long as 5 min were shown to allow subjects access to SA information without memory decay.

Data Collection

The simulator computer should be programmed to collect objective data corresponding to the queries at the time of each freeze. Considering that some queries will pertain to higher-level SA requirements that may be unavailable in the computer, an expert judgment of the correct answer may be made by an experienced observer who is privy to all information, reflecting the SA of a person with perfect knowledge. A comparison of subjects' perceptions of the situation (as input into SAGAT) with the actual status of each variable (as collected per the simulator computer and expert judgment) results in an objective measure of subject SA. Questions asked of the subject but not answered should be considered incorrect. No evaluation should be made of questions not asked during a given stop.

It is recommended that answers to each query be scored as correct or incorrect based on whether or not it falls into an acceptable tolerance band around the actual value. For example, it may be acceptable for a subject to be 10 miles per hour off of actual ground speed. This method of scoring poses less difficulty than dealing with absolute error (see Marshak et al., 1987). A tabulation of the frequency of correctness can then be made within each test condition for each SA element. Because data scored as correct or incorrect are binomial, the conditions for analysis of variance are violated. A correction factor, Y' = arcsine(Y), can be applied to binomial data, which allows analysis of variance to be used. In addition, a chi-square, Cochran's Q, or binomial t test (depending on the test design) can be used to evaluate the statistical significance of differences in SA between test conditions.
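The scoring rules in this section can be sketched in a few lines of Python. The sketch is illustrative only: the function names and the sample records are hypothetical. The text gives the correction as arcsine(Y); the sketch defaults to the square-root form, arcsin(sqrt(p)), which is the variant commonly described for proportions, and that default is an assumption rather than something stated in the text. The literal form remains available through a flag.

```python
import math


def score_response(asked, reported, actual, tolerance):
    """Score one query: 1 = correct, 0 = incorrect, None = not evaluated."""
    if not asked:
        return None              # not asked at this stop: excluded
    if reported is None:
        return 0                 # asked but unanswered: counted incorrect
    return 1 if abs(reported - actual) <= tolerance else 0


def proportion_correct(scores):
    """Proportion correct over the queries that were actually evaluated."""
    evaluated = [s for s in scores if s is not None]
    if not evaluated:
        raise ValueError("no evaluated responses")
    return sum(evaluated) / len(evaluated)


def arcsine_transform(p, use_sqrt=True):
    """Arcsine correction for a proportion p in [0, 1] (result in radians)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    return math.asin(math.sqrt(p)) if use_sqrt else math.asin(p)


# Hypothetical ground-speed records: (asked, reported, actual), 10 mph tolerance.
records = [
    (True, 455.0, 450.0),    # within the tolerance band
    (True, 470.0, 450.0),    # outside the tolerance band
    (True, None, 450.0),     # asked but not answered
    (False, None, 450.0),    # not asked at this stop
]
scores = [score_response(a, r, v, tolerance=10.0) for a, r, v in records]
p_correct = proportion_correct(scores)
print(scores, round(p_correct, 3), round(arcsine_transform(p_correct), 3))
```

The transformed proportions, computed per condition and SA element, would then be the values submitted to the analysis of variance.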

Limitations and Applicability for Use

SAGAT has primarily been used within the confines of high-fidelity and medium-fidelity part-task simulations. This provides experimenter control over freezes and data collection without any danger to the subject or processes involved in the domain. It may be possible to use the technique during actual task performance if multiple operators are present to ensure safety. For example, it might be possible to verbally query one pilot in flight while another assumes flight control. Such an endeavor should be undertaken with extreme caution, however, and may not be appropriate for certain domains.

A recent effort (Sheehy, Davey, Fiegel, and Guo, 1993) employed an adaptation of this technique by making videotapes of an ongoing situation in a nuclear power plant control room. These tapes were then replayed to naive subjects, with freezes for SAGAT queries employed. It is not known how different the SA of subjects passively viewing a situation may be from subjects actually engaged in task performance; however, this approach may yield some useful data.

Most known uses of SAGAT have involved fighter aircraft simulations. In general, it can be employed in any domain in which a reasonable simulation of task performance exists and an analysis of SA requirements has been made in order to develop the queries. SAGAT queries have been developed for advanced bomber aircraft (Endsley, 1990a) and recently for en route air traffic control (Endsley and Rodgers, 1994). Several researchers have begun to use the technique in evaluating operator SA in nuclear control room studies (Hogg, Torralba, and Volden, 1993; Sheehy et al., 1993). The technique has also been employed in a simulated control task to study adaptive automation (Carmody and Gluckman, 1993). Several new efforts are currently under way to employ the technique in medical decision making (E. Swirsky, personal communication, March 1994) and helicopters (C. Prince, personal communication, November 1993). Potentially it could also be used in studies involving automobile driving, supervisory control of manufacturing systems, teleoperations, and operation of other types of dynamic systems.

CONCLUSION

The use of SA metrics in evaluating the efficiency of design concepts, training programs, and other potential interventions is critical. Without careful evaluation, it will be difficult to make much progress in understanding the factors that influence SA, hindering human factors and engineering efforts. Several techniques have been reviewed here, each with several advantages and disadvantages. The use of one technique, SAGAT, was explored extensively, and two studies were presented that investigated the validity of the measure. It has to date proven a useful and viable technique for measuring SA in numerous contexts. Rigorous validity testing is needed for other techniques proposed for SA measurement. Unless the veracity of these techniques is established, interpreting their results will be hazardous.

In addition to establishing validity of measures of SA, more research is needed regarding the task of SA measurement itself. For instance, as pointed out by Pew (1991), no criteria exist at this time that establish the level of SA required for successful performance. Does an operator need to have SA that is 100% perfect (in both completeness and accuracy), or is some lesser amount sufficient for good performance? This is a complex issue. I would assert that SA can be seen as a factor that increases the probability of good performance but that it does not necessarily guarantee it, as other factors also come into effect (e.g., decision making, workload, performance execution, system capabilities, and SA of others in some cases). How much SA one needs, therefore, becomes a matter of how much probability of error one is willing to accept. Perhaps such criteria should more properly and usefully be established as the level of SA needed on each subcomponent at different periods in time. Some guidelines will eventually need to be specified if there is a desire to certify new system designs on the basis of SA.

Overall, the beginnings of the field of SA measurement have been established, allowing researchers in the area of SA to proceed from mainly speculation and anecdotal information to solid empirical findings. The tools for conducting further basic research on the SA construct and developing better system designs are available, allowing needed research on SA in a variety of arenas.

ACKNOWLEDGMENTS

This research was sponsored by the Northrop Corporation. I thank the many people at Northrop, both past and present, who contributed to this effort. In particular, I thank Vic Vizcarra for his assistance in conducting the studies, Paul Cayley and Gerry Armstrong for their programming efforts, and Mike Majors and Del Jacobs for their support in developing an SA research program. In addition, I am indebted to Mark Chignell, who provided helpful support throughout the development of the methodology. I also thank Mark Rodgers, Tom English, Dick Gilson, and several anonymous reviewers for their comments on an earlier version of this paper.


REFERENCES

Bolstad, C. A., and Endsley, M. R. (1990). Single versus dual scale range display investigation (NOR DOC 90-90). Hawthorne, CA: Northrop Corporation.

Carmody, M. A., and Gluckman, J. P. (1993). Task-specific effects of automation and automation failure on performance, workload, and situational awareness. In R. S. Jensen and D. Neumeister (Eds.), Proceedings of the Seventh International Symposium on Aviation Psychology (pp. 167-171). Columbus, OH: Ohio State University, Department of Aviation.

Chase, W. G., and Simon, H. A. (1973). Perceptions in chess. Cognitive Psychology, 4, 55-81.

Cowan, N. (1988). Evolving conceptions of memory storage, selective attention, and their mutual constraints within the human information-processing system. Psychological Bulletin, 104, 163-191.

Endsley, M. R. (1987). SAGAT: A methodology for the measurement of situation awareness (NOR DOC 87-83). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1988). Situation awareness global assessment technique (SAGAT). In Proceedings of the National Aerospace and Electronics Conference (NAECON) (pp. 789-795). New York: IEEE.

Endsley, M. R. (1989a). Final report: Situation awareness in an advanced strategic mission (NOR DOC 89-32). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1989b). Tactical simulation 3 test report: Addendum I, situation awareness evaluations (81203033R). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1990a). A methodology for the objective measurement of situation awareness. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 1/1-1/9). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Endsley, M. R. (1990b). Predictive utility of an objective measure of situation awareness. In Proceedings of the Human Factors Society 34th Annual Meeting (pp. 41-45). Santa Monica, CA: Human Factors and Ergonomics Society.

Endsley, M. R. (1990c). Situation awareness global assessment technique (SAGAT): Air-to-air tactical version user guide (NOR DOC 89-58, rev. A). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1990d). Situation awareness in dynamic human decision making: Theory and measurement. Unpublished doctoral dissertation, University of Southern California, Los Angeles, CA.

Endsley, M. R., and Rodgers, M. D. (1994). Situation awareness global assessment technique (SAGAT): En route air traffic control version user's guide (Draft). Lubbock: Texas Tech University.

Fracker, M. L. (1990). Attention gradients in situation awareness. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 6/1-6/10). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Herrmann, D. J. (1984). Questionnaires about memory. In J. E. Harris and P. E. Morris (Eds.), Everyday memory, action, and absent-mindedness (pp. 133-151). London: Academic.

Hogg, D. N., Torralba, B., and Volden, F. S. (1993). A situation awareness methodology for the evaluation of process control systems: Studies of feasibility and the implication of use (1993-03-05). Storefjell, Norway: OECD Halden Reactor Project.

Hughes, E. R., Hassoun, J. A., and Ward, F. (1990). Advanced tactical fighter (ATF) simulation support: Final report (CSEF-TR-ATF-90-91). Wright-Patterson Air Force Base, OH: Air Force Systems Command, Aeronautical Systems Division.

Jacoby, L. L., and Dallas, M. (1981). On the relationship between autobiographical memory and perceptual learning. Journal of Experimental Psychology: General, 110, 306-340.

Kellog, R. T. (1980). Is conscious attention necessary for long-term storage? Journal of Experimental Psychology: Human Learning and Memory, 6, 379-390.

Manktelow, K., and Jones, J. (1987). Principles from the psychology of thinking and mental models. In M. M. Gardiner and B. Christie (Eds.), Applying cognitive psychology to user-interface design (pp. 83-117). Chichester, England: Wiley.

Marshak, W. P., Kuperman, G., Ramsey, E. G., and Wilson, D. (1987). Situational awareness in map displays. In Proceedings of the Human Factors Society 31st Annual Meeting (pp. 533-535). Santa Monica, CA: Human Factors and Ergonomics Society.

McDonnell Douglas Aircraft Corporation. (1982). Advanced medium-range air-to-air missile operational utility evaluation (AMRAAM OUE) test final report. St. Louis, MO: Author.

Mogford, R. H., and Tansley, B. W. (1991). The importance of the air traffic controller's mental model. Presented at the Human Factors Society of Canada Annual Meeting, Canada.

Morton, J. (1969). Interaction of information in word recognition. Psychological Review, 76, 165-178.

Nisbett, R. E., and Wilson, T. D. (1977). Telling more than we can know: Verbal reports on mental processes. Psychological Review, 84, 231-259.

Norman, D. A. (1968). Towards a theory of memory and attention. Psychological Review, 75, 522-536.

Northrop Corporation. (1988). Tactical simulation 2 test report: Addendum I, situation awareness test results. Hawthorne, CA: Author.

Pew, R. W. (1991). Defining and measuring situation awareness in the commercial aircraft cockpit. In Proceedings of the Conference on Challenges in Aviation Human Factors: The National Plan (pp. 30-31). Washington, DC: American Institute of Aeronautics and Astronautics.

Sarter, N. B., and Woods, D. D. (1991). Situation awareness: A critical but ill-defined phenomenon. International Journal of Aviation Psychology, 1, 45-57.

Selcon, S. J., and Taylor, R. M. (1990). Evaluation of the situational awareness rating technique (SART) as a tool for aircrew systems design. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 5/1-5/8). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Sheehy, E. J., Davey, E. C., Fiegel, T. T., and Guo, K. Q. (1993, April). Usability benchmark for CANDU annunciation-lessons learned. Presented at the ANS Topical Meeting on Nuclear Plant Instrumentation, Control, and Man-Machine Interface Technology, Oak Ridge, TN.

Smolensky, M. W. (1993). Toward the physiological measurement of situation awareness: The case for eye movement measurements. In Proceedings of the Human Factors and Ergonomics Society 37th Annual Meeting (p. 41). Santa Monica, CA: Human Factors and Ergonomics Society.

Taylor, R. M. (1990). Situational awareness rating technique (SART): The development of a tool for aircrew systems design. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 3/1-3/17). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Tulving, E. (1985). How many memory systems are there? American Psychologist, 40, 385-398.

Venturino, M., Hamilton, W. L., and Dvorchak, S. R. (1990). Performance-based measures of merit for tactical situation awareness. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 4/1-4/5). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Vidulich, M. A. (1989). The use of judgment matrices in subjective workload assessment: The subjective workload dominance (SWORD) technique. In Proceedings of the Human Factors Society 33rd Annual Meeting (pp. 1406-1410). Santa Monica, CA: Human Factors and Ergonomics Society.

Date received: January 26, 1993
Date accepted: October 4, 1994


1990), it is unclear whether this is attributable to the workload or the understanding components.

In a new application, the Subjective Workload Dominance (SWORD) metric (Vidulich, 1989) has been applied as a subjective rating tool for SA (Hughes, Hassoun, and Ward, 1990). SWORD allows subjects to make pairwise comparative ratings of competing design concepts along a continuum that expresses the degree to which one concept entails less workload than the other. The resultant preferences are then combined using the analytic hierarchy process technique to provide a linear ordering of the design concepts. In the Hughes et al. study, SWORD was modified to allow pilots to rate the degree to which presented display concepts provided SA instead of workload. Not surprisingly, the display that was subjectively rated as providing the best SA using SWORD was the display for which pilots expressed a strong preference. It is difficult to ascertain whether subjective preference led to higher SWORD ratings of SA or vice versa. Further research is needed to determine the locus of such ratings.
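The step from pairwise ratings to a linear ordering can be illustrated with a small judgment matrix. The Python sketch below uses the row geometric mean, a common approximation to the analytic hierarchy process priority vector; it is not claimed to be the exact computation used by Vidulich (1989) or Hughes et al. (1990). The display labels, the ratings, and the function name are hypothetical.

```python
import math


def ahp_weights(matrix):
    """Approximate AHP priorities from a reciprocal judgment matrix.

    matrix[i][j] expresses how strongly concept i dominates concept j;
    matrix[j][i] must be its reciprocal, and the diagonal must be 1.
    """
    n = len(matrix)
    for i in range(n):
        if len(matrix[i]) != n:
            raise ValueError("matrix must be square")
        for j in range(n):
            if matrix[i][j] <= 0:
                raise ValueError("ratings must be positive")
            if not math.isclose(matrix[i][j] * matrix[j][i], 1.0, rel_tol=1e-9):
                raise ValueError("matrix must be reciprocal")
    # Row geometric means, normalized so that the weights sum to 1.
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


# Hypothetical ratings for three display concepts A, B, and C:
# A over B = 3, A over C = 5, B over C = 2.
displays = ["A", "B", "C"]
judgments = [
    [1.0,     3.0,     5.0],
    [1.0 / 3, 1.0,     2.0],
    [1.0 / 5, 1.0 / 2, 1.0],
]
weights = ahp_weights(judgments)
ranking = sorted(zip(displays, weights), key=lambda pair: pair[1], reverse=True)
print(ranking)
```

The resulting weights order the concepts from most to least preferred, which is the kind of linear ordering the paragraph describes; whether that ordering reflects SA or general preference is the open question raised in the text.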

Observer-rating. A second type of subjective rating involves using independent, knowledgeable observers to rate the quality of a subject's SA. A trained observer might have more information than the operator about what is really happening in a simulation (through perfect knowledge gleaned from the simulation computer). The observer would have limited knowledge, however, about what the operator's concept of the situation is. The only information about the operator's perception of the situation would come from operator actions and imbedded or elicited verbalizations by the operator (e.g., from voice transmissions during the course of the task or from verbal protocols explicitly requested by the experimenter). Although this knowledge can be useful diagnostically by, for example, detecting overt errors in SA (stated misperceptions or lack of knowledge), it does not provide a complete representation of that knowledge. The operator may, and in all likelihood will, store information internally that is not verbalized. For instance, a pilot may discuss efforts to ascertain the identity of a certain aircraft, but knowledge about ownship system status, heading, other aircraft, and so on might not be mentioned at all. An outside observer has no way of knowing whether the pilot is aware of these variables but is not discussing them because they are not of immediate concern, or whether the pilot has succumbed to attentional narrowing and is unaware of their status. As such, the rating of SA by outside observers is also limited.

A variation on this theme is to use a confederate who acts as an associate to the operator (e.g., another crew member or air traffic controller in the case of a commercial aircraft) and who requests certain information from the operator to encourage further verbalization, as has been suggested by Sarter and Woods (1991). In addition to succumbing to the same limitations that encumber observer ratings, this technique may also alter SA by artificially directing the subject's attention to certain parameters. Because the distribution of the subject's attention across the elements in the environment largely determines SA, this method probably does not provide an unbiased assessment of operator SA.

Questionnaires

Questionnaires allow for detailed information about subject SA to be collected on an element-by-element basis that can be evaluated against reality, thus providing an objective assessment of operator SA. This type of assessment is a more direct measure of SA (i.e., it taps into, rather than infers, the operator's perceptions) and does not require subjects or observers to make judgments about situational knowledge on the basis of incomplete information, as subjective assessments do. In a review of such questionnaires, Herrmann (1984) concluded that when perceptions can be evaluated on the basis of objective knowledge, this method has been found to have good validity. Several methods of administration are possible.

Posttest. A detailed questionnaire can be administered after the completion of each simulated trial. This allows ample time for subjects to respond to a lengthy and detailed list of questions about their SA during the trial, providing needed information about subject perceptions. Unfortunately, people are not good at reporting detailed information about past mental events, even recent ones; there is a tendency to overgeneralize and overrationalize. Recall is stilted by the amount of time and intervening events that occur between the activities of interest and the administration of the questionnaire (Nisbett and Wilson, 1977). Earlier misperceptions can be quickly forgotten as the real picture unfolds during the course of events. Therefore, a posttest questionnaire will reliably capture the subject's SA only at the very end of the trial.

On-line. One way of overcoming this deficiency is to ask operators about their SA while they are carrying out their simulated tasks. Unfortunately, this too has several drawbacks. First of all, in many situations the subject will be under very heavy workload, precluding the answering of additional questions. Such questions would constitute a form of ongoing secondary task loading that may alter performance on the main task of operating the system. Furthermore, the questions asked could cue the subject to attend to the requested information on the displays, thus altering the operator's true SA. An assessment of time to answer as an indicator of SA is also faulty, as subjects may employ various time-sharing strategies between the dual tasks of operating the system and answering the questions. Overall, this method will be highly intrusive on the primary task of system operation.

Freeze technique. To overcome the limitations of reporting on SA after the fact, several researchers have used a technique in which the simulation is frozen at randomly selected times and subjects are queried as to their perceptions of the situation at the time (Endsley, 1987, 1988, 1989a; Fracker, 1990; Marshak, Kuperman, Ramsey, and Wilson, 1987). With this technique, the system displays are blanked and the simulation is suspended while subjects quickly answer questions about their current perceptions of the situation. Thus SA data can be collected immediately, which reduces the problems incurred when collecting data after the fact but does not incur the problems of on-line questioning. Subjects' perceptions are then compared with the real situation according to the simulation computer database to provide an objective measure of SA.

In a study evaluating competing aircraft display concepts, Marshak et al. (1987) queried subjects about navigation, threats, and topography using this technique. The resulting answers were converted to an absolute percentage error score for each question, allowing scores across displays to be compared along these dimensions. Fracker (1990), in another aircraft study, requested information on aircraft location and identity for specifically indicated aircraft in the simulation. Mogford and Tansley (1991) collected data relevant to an air traffic control task. In each study, a measure of SA on only selected parameters was obtained.
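The text does not give the exact formula used by Marshak et al. for the absolute percentage error score. One plausible reading, offered here only as an assumption, is the absolute difference between the reported and actual values expressed as a percentage of the actual value. The function name and the sample values are hypothetical.

```python
def absolute_percentage_error(reported, actual):
    """Absolute error as a percentage of the actual value (assumed formula)."""
    if actual == 0:
        raise ValueError("percentage error is undefined when the actual value is 0")
    return abs(reported - actual) / abs(actual) * 100.0

# Hypothetical example: a subject reports 18,000 ft for a target at 20,000 ft.
score = absolute_percentage_error(18000.0, 20000.0)
```

Expressing every answer on a common percentage scale is what would make scores on different questions and displays comparable.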

The Situation Awareness Global Assessment Technique is a global tool developed to assess SA across all of its elements based on a comprehensive assessment of operator SA requirements (Endsley, 1987, 1988, 1990a). As a global measure, SAGAT includes queries about all SA requirements, including Level 1, 2, and 3 components, and considers system functioning and status and relevant features of the external environment.

Computerized versions of SAGAT have been developed for air-to-air tactical aircraft (Endsley, 1990c) and advanced bomber aircraft (Endsley, 1989a). These versions allow queries to be administered rapidly and SA data to be collected for analysis. The SAGAT tool features an easy-to-use query format that was designed to be as compatible as possible with subject knowledge representations. An example of a SAGAT query is shown in Figure 1. SAGAT's basic methodology is generic and applicable to other types of systems once a delineation of SA requirements has been made in order to construct the queries.

Figure 1. Sample SAGAT query. [Query screen titled "Designate aircraft's altitude," with an altitude scale in feet marked in 10 K increments up to 70 K.]

This global approach has an advantage over probes that cover only a limited number of SA items in that subjects cannot prepare for the queries in advance, because queries could address nearly every aspect of the situation to which subjects would normally attend. Therefore, the chance of subjects' attention being biased toward specific items is minimized. In addition, the global approach overcomes the problems incurred when collecting data after the fact and minimizes subject SA bias resulting from secondary task loading or artificially cuing the subject's attention. It also provides a direct measure of SA that can be objectively collected and objectively evaluated. Finally, the use of random sampling (both of stop times and query presentation) provides unbiased estimates of SA, thus allowing SA scores to be easily compared statistically across trials, subjects, and systems. The primary disadvantage of this technique involves the temporary halt in the simulation.

I conducted two studies to investigate whether or not this imposes undue intrusiveness and to determine the best method for implementing this technique. In the first study, I wanted to specifically determine how long after a freeze in the simulation SA information could be obtained. In the second study, I investigated whether temporarily freezing the simulation would result in any change in subject behavior, thus specifically evaluating potential intrusiveness.

EXPERIMENT 1

In determining whether SA information is reportable via the SAGAT methodology, several possibilities must be considered. First, data may be processed by subjects in short-term memory (STM), never reaching long-term memory (LTM). If a sequential information-processing model is used, then it is possible that information might enter into STM and never be stored in LTM, where it would be available for retrieval during questioning. In this case, information would not be available during any SAGAT query sessions that exceeded the STM storage limitations (approximately 30 s with no rehearsal).

There is a good deal of evidence, however, that STM does not precede LTM but constitutes an activated subset of LTM (Cowan, 1988; Morton, 1969; Norman, 1968). According to this type of model, information proceeds directly from sensory memory to LTM, which is necessary for pattern recognition and coding. Only those portions of the environment that are salient are then highlighted in STM (either through focalized attention or automatic activation). This type of model would predict that SA information that has been perceived and/or further processed by the operator would exist in LTM stores and thus be available for recall during SAGAT querying that exceeds 30 s.

Second, data may be processed in a highly automated fashion and thus not be in the subject's awareness. Expert behavior can function in an automated processing/action sequence in some cases. Several authors have found that even when effortful processing is not used, however, the information is retained in LTM and is capable of affecting subject responses (Jacoby and Dallas, 1981; Kellog, 1980; Tulving, 1985). The type of questions used in SAGAT (providing cued-recall and categorical or scalar responses) should be receptive to retrieval of this type of information. In addition, a review of the literature indicates that even under these circumstances, the product of the automatic processes (as a state of knowledge) is available, even though the processes themselves may be difficult to verbalize (Endsley, 1995, this issue).

Third, the information may be in LTM but not easily recalled by the subjects. Evidence suggests that when effortful processing and awareness are used during the storage process, recall is enhanced (Cowan, 1988). SA, composed of highly relevant, attended to, and processed information, should be most receptive to recall. In addition, the SAGAT battery, which requires categorical or scalar responses, is a cued, as opposed to total, recall task, thus aiding retrieval. Under conditions of SAGAT testing, subjects are aware that they may be asked to report their SA at any time. This, too, may aid in the storage and retrieval process. Because the SAGAT battery is administered immediately after the freeze in the simulation, no time for memory decay or competing event interference is allowed. Thus conditions should be optimized for the retrieval of SA information. Although it cannot be said conclusively that all of a subject's SA can be reflected in this manner, the vast majority should be reportable via SAGAT.

To further investigate this matter, I conducted a study to specifically determine how long after a freeze in the simulation SA information could be obtained. I expected that if collection of SA data via this technique was memory limited, this would be evidenced by an increase in errors on SAGAT queries that occurred approximately 30 to 60 s after the freeze time because of short-term memory restrictions. (Although this would not preclude use of the technique, it would limit its use by restricting the number of questions that could be asked at each stop.)

Procedure

A set of air-to-air engagements was conducted in a high-fidelity aircraft simulation facility. A fighter sweep mission with a force ratio of two (blue team) versus four (red team) was used for the trials. The objective of the blue team was to penetrate red territory, maximizing the kills of red fighters while maintaining a high degree of survivability. The red team was directed to fly around their assigned combat air patrol (CAP) points until a blue target was detected in red airspace. The red team was then allowed to leave its CAP point to defend against the blue team. In all cases, specific tactics were at the discretion of the individual pilot teams.

A total of 15 trials were completed by two teams of six subjects. At a random point in each trial, the simulator was frozen and SAGAT data were immediately collected from all six participants. At a given stop, all of the queries were presented once in random order. As this order was different at each stop, each query was presented at a variety of times after the stop across subjects and trials. Therefore, performance accuracy on each question could be evaluated as a function of the amount of time elapsed prior to its presentation. After all subjects had completed the SAGAT battery at a given stop, a new trial was begun.
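The way random ordering spreads each query across elapsed times can be sketched as follows. This is only an illustration of the idea, not the software used in the study: the query names and answer durations are hypothetical, and the elapsed time for a query is taken to be the sum of the time spent on the queries presented before it.

```python
import random

def present_queries(queries, answer_times_s, rng):
    """Shuffle the battery for one stop and record when each query appears.

    answer_times_s[k] is the (hypothetical) time spent on the k-th query
    presented. Returns a list of (query, seconds elapsed since the freeze).
    """
    order = list(queries)
    rng.shuffle(order)
    elapsed = 0.0
    schedule = []
    for query, duration in zip(order, answer_times_s):
        schedule.append((query, elapsed))
        elapsed += duration
    return schedule

rng = random.Random(7)  # seeded only so the sketch is repeatable
battery = ["altitude", "heading", "fuel"]
schedule = present_queries(battery, [20.0, 35.0, 15.0], rng)
```

Pooling such schedules over stops, subjects, and trials yields, for each query, a set of (elapsed time, answer) pairs from which accuracy can be examined as a function of delay.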

Prior to conducting the study, all subjects were trained on the use of the simulator, the displays, aircraft handling qualities, and SAGAT. In addition to three instructional training sessions on using SAGAT, each subject participated in 18 practice trials in which SAGAT was administered. (Most subjects also had received a substantial amount of training in the simulator in the past.) Thus subjects were well trained prior to testing.

Facilities. A high-fidelity, real-time simulation facility was used for testing. Aircraft control systems, avionics systems, weapons systems, crew stations, and air vehicle performance were modeled to provide realistic aircraft control characteristics. A Gould mainframe computer controlled the simulations and drove Silicon Graphics-generated high-resolution color graphics displays. This test used six pilot stations, each including a simulated head-up display, a tactical situation display, radar, and system controls operated by a touch screen or stick and throttle control switches. A realistic stick and throttle provided primary flight control.

Subjects. Twelve experienced former military fighter pilots participated in the test. The mean subject age was 48.16 years (range of 32 to 68). Subjects had an average of 3,310 h (range of 1,500 to 6,500) and an average of 15.5 years (range of 7 to 26) of military flight experience. Of the 12 subjects, 7 had combat experience.

Results

Each of the subjects' answers to the SAGAT queries was compared with actual values, as collected by the simulator computer at the time of each stop. Included in the SAGAT battery at the time of the test were 26 queries. Of those, 11 could not be evaluated because the appropriate simulator data were not available, and 5 could be evaluated only by subjective means. The 10 remaining query answers concerned ownship heading, ownship location, aircraft heading, aircraft detections, aircraft airspeed, aircraft weapon selection, aircraft Gs, aircraft fuel level, aircraft weapon quantity, and aircraft altitude.

An error score (SAGAT query answer minus actual value) was computed for each response. Absolute error scores for each query (across all subjects and trials) were plotted against the amount of time elapsed between the stop in the simulation and the query, and a regression was calculated. F tests for the regressions and Pearson's r for eight of the queries are presented in Table 1. Two of the queries provided a categorical response; therefore, chi-square tests were performed and are presented in Table 2.

None of the regressions or chi-square tests computed for each of the 10 queries was significant (or even close to significant), indicating that subjects were neither more nor less prone to response error as the amount of time between the simulator freeze and the presentation of the query increased. A plot of the regression for altitude error, shown in Figure 2, reveals little or no increase in error over time. (Plots of the regressions for the other variables appeared quite similar.)
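The analysis described above, a regression of absolute error on elapsed time with an F test, can be sketched in a few lines. This is an ordinary least-squares illustration under the assumption of a single predictor; the data points are hypothetical and are not values from the study.

```python
def regress_error_on_time(times_s, abs_errors):
    """Simple linear regression of absolute error on elapsed time.

    Returns (slope, F statistic, error degrees of freedom, r squared).
    The F statistic has 1 and n - 2 degrees of freedom.
    """
    n = len(times_s)
    mean_t = sum(times_s) / n
    mean_e = sum(abs_errors) / n
    sxx = sum((t - mean_t) ** 2 for t in times_s)
    sxy = sum((t - mean_t) * (e - mean_e) for t, e in zip(times_s, abs_errors))
    syy = sum((e - mean_e) ** 2 for e in abs_errors)
    slope = sxy / sxx
    ss_regression = slope * sxy          # variation explained by elapsed time
    ss_residual = syy - ss_regression    # variation left unexplained
    df_error = n - 2
    f_stat = ss_regression / (ss_residual / df_error)
    r_squared = ss_regression / syy
    return slope, f_stat, df_error, r_squared

# Hypothetical altitude errors (ft) at several delays (s) after the freeze.
times_s = [10.0, 60.0, 120.0, 180.0, 240.0, 300.0]
abs_errors = [1200.0, 800.0, 1500.0, 900.0, 1300.0, 1000.0]
slope, f_stat, df_error, r_squared = regress_error_on_time(times_s, abs_errors)
```

A memory-limited measure would show up as a positive slope with a large F; a flat regression gives an F near zero and an r squared near zero.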

Discussion

Based on these data, it would appear that subjects under these conditions were able to provide information on their SA about a particular situation for up to 5 or 6 min. The fact that all 10 of the queries produced flat regressions lends extra weight to this conclusion.
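How flat the regressions are can be checked against the values in Table 1. For a regression with one predictor, r squared equals F / (F + error df). The sketch below applies that identity, reading the Table 1 entries as df = 1 and the error df shown, with F and r squared as decimal fractions; that reading of the table is an assumption, as is the rounding tolerance.

```python
def r_squared_from_f(f_stat, df_error):
    """r squared implied by an F(1, df_error) test on a one-predictor regression."""
    return f_stat / (f_stat + df_error)

# (query, error df, F, reported r squared), as read from Table 1.
table_1 = [
    ("Own heading", 19, 0.021, 0.001),
    ("Own location", 19, 0.940, 0.047),
    ("Aircraft heading", 95, 0.040, 0.000),
    ("G level", 90, 0.500, 0.006),
    ("Fuel level", 95, 1.160, 0.012),
    ("Weapon quantity", 92, 0.001, 0.000),
    ("Altitude", 96, 0.002, 0.000),
    ("Detection", 524, 0.041, 0.000),
]

# Queries whose reported r squared differs from the implied value by more
# than an assumed rounding tolerance of 0.001.
mismatches = [
    query
    for query, df_error, f_stat, reported in table_1
    if abs(r_squared_from_f(f_stat, df_error) - reported) > 0.001
]
```

Under this reading, elapsed time accounts for less than 5% of the variance in error for every query, and for most queries well under 1%.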

Two explanations can be offered for these findings. First, this study investigated subjects who were expert in particular tasks during a realistic simulation of those tasks. The information subjects were asked to report was important to performance of those tasks. Most laboratory studies that predict fairly rapid decay times (approximately 30 s for short-term memory) typically employ the use of stimuli that have little or no inherent meaning to the subject (nonsense words or pictures). The storage and utilization of relevant information may be quite different from that of irrelevant information (Chase and Simon, 1973).

TABLE 1

Tests on Regressions of Time of Query Presentation on Query Accuracy

Query              df       F       p       r2
Own heading        1, 19    0.021   0.887   0.001
Own location       1, 19    0.940   0.334   0.047
Aircraft heading   1, 95    0.040   0.834   0.000
G level            1, 90    0.500   0.480   0.006
Fuel level         1, 95    1.160   0.284   0.012
Weapon quantity    1, 92    0.001   0.981   0.000
Altitude           1, 96    0.002   0.965   0.000
Detection          1, 524   0.041   0.840   0.000

Second, the results indicate that the SA information was obtainable from long-term memory stores. If schemata or other mechanisms are used to organize SA information (as opposed to working memory processes only), then that information will be resident in long-term memory. Many of the 10 items analyzed can be considered Level 1 SA components. The fact that this lower-level information was resident in LTM indicates that either the inputs to higher-level processing were retained as well as the outputs, or the Level 1 components were retained as important pieces of pilot SA and are significant components of LTM schemata (e.g., target altitude itself is important to know and not just its implications). Both of these explanations may be correct. These findings generally support the predictions of a processing model in which information passes into LTM storage before being highlighted in STM.

As a caveat, note that subjects were actively working with their SA knowledge by answering the SAGAT queries for the entire period that the simulation was stopped. No intervening period of waiting or any competing activity was introduced prior to administering any SAGAT query. Subject knowledge of SA information may be interfered with if time delays or other activities (particularly continued operational tasks) are imposed before SAGAT is administered. The major implication of these results is that, under these conditions, SA data are readily obtainable through SAGAT for a considerable period after a stop in the simulation (up to 5 or 6 min).

TABLE 2

Chi-Square Tests of Time Category of Query Presentation on Query Accuracy

Query              χ2      df           p
Weapon selection   0.16    5 (N = 97)   >0.995
Airspeed           0.13    5 (N = 98)   >0.995

Figure 2. Altitude error by time until query presentation. [Plot of the regression; x-axis: time (seconds), 0 to 400; y-axis: altitude error, 0 to 50,000.]

EXPERIMENT 2

A second study was initiated to address the issue of possible intrusiveness. It is necessary to determine whether temporarily freezing the simulation results in any change in subject behavior. If subject behavior were altered by such a freeze, certain limitations would be indicated. (One might wish to not resume the trial after the freeze, for instance, but start a new trial instead.) The possibility of intrusiveness was tested by evaluating the effect of stopping the simulator on subsequent subject performance.

Procedure

A set of air-to-air engagements was conducted of a fighter sweep mission with a force ratio of two (blue team) versus four (red team). The training, instructions, and pilot mission objectives were identical to those used in Experiment 1. In this study, however, the trial was resumed after a freeze following a specified period for collecting SAGAT data and was continued until specified criteria for completion of the mission were met. Subjects completed as many queries as they could during each stop. The queries were presented in random order.

Five teams of six subjects completed a full test matrix. The independent variables were duration of the stops (1/2, 1, or 2 min) and the frequency of stops (1, 2, or 3 times during the trial). Each team participated twice in each of these nine conditions. (In any given trial, multiple stops were of the same duration.) Each team also completed six trials in which no stops occurred, as a control condition. Therefore, a total of 30 trials were conducted for each duration of stop condition and a total of 30 trials were conducted for each frequency of stop condition. These conditions were compared with 30 trials in which no stops occurred. Conditions were administered in random order, blocked by team. Pilot performance was collected as the dependent measure.

Facilities. This study used the same piloted, multiple-engagement simulator as in Experiment 1.

Subjects. Twenty-five experienced former military fighter pilots participated in this test. (Four of the subjects participated on more than one team.) The mean subject age was 45.16 years (range of 32 to 68). Subjects had an average of 3,582 h (range of 975 to 7,045) and an average of 16.9 years (range of 6 to 27) of military flight experience. Of the 25 subjects, 14 had combat experience.

Results

Pilot performance under each of the conditions was analyzed. Pilot performance measures included the number of blue team kills (red team losses) and the number of blue team losses (red team kills). Chi-square tests were performed on the number of blue team kills, χ2(4, N = 120) = 5.973, p > 0.05, and blue team losses, χ2(2, N = 120) = 0.05, p > 0.05, comparing between trials in which there were stops to collect SAGAT data and those in which there were no stops, as depicted in Figures 3 and 4. There was no significant difference in pilot performance on either measure.

Figure 3. Blue kills: Trials with stops versus trials without stops. [x-axis: number of kills.]

Figure 4. Blue losses: Trials with stops versus trials without stops. [x-axis: number of losses, 0 to 2.]

Analysis of variance was used to evaluate the effect of number of stops and duration of stops on each of the two performance measures: blue team kills and blue team losses. The number of stops during the trial had no significant effect on either pilot performance measure, F(3, 116) = 1.73, p > 0.05, β = 0.55, and F(3, 116) = 0.20, p > 0.05, β > 0.60, respectively, as shown in Figures 5 and 6. The duration of the stop also did not significantly affect either performance measure, F(3, 116) = 2.16, p > 0.05, β = 0.40, and F(3, 116) = 0.77, p > 0.05, β > 0.60, respectively, as depicted in Figures 7 and 8. In viewing the data, no linear trend appears to be present in either case that would indicate a progressively worse (albeit nonsignificant) effect of increasing number or duration of stops. This indicates that stops to collect SAGAT data (as many as 3 for up to 2 min in duration) did not have a significant effect on subject performance.

Figure 5. Blue kills by number of stops in trial. [x-axis: number of stops, 0 to 3.]

Figure 6. Blue losses by number of stops in trial. [x-axis: number of stops, 0 to 3.]

Figure 7. Blue kills by duration of stops in trial. [x-axis: duration of stops (seconds): 0, 30, 60, 120.]

Figure 8. Blue losses by duration of stops in trial. [x-axis: duration of stops (seconds): 0, 30, 60, 120.]

Discussion

The lack of a significant influence of this procedure on performance probably rests on the fact that relevant schemata are actively used by subjects during the entire freeze period. Under these conditions, the subject's SA does not have a chance to decay before the simulation is resumed (as was indicated by Experiment 1). Thus SA is fairly intact when the simulation continues, allowing subjects to proceed with their tasks where they left off. Subjectively, subjects did fairly well with this procedure and were able to readily pick up the simulation at the point at which they left off at the time of the freeze, sometimes with the same sentence they had started before the stop. On many occasions, subjects could not even remember if they had been stopped to collect SA data during the trial, also indicating a certain lack of intrusiveness.
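The trial counts given in the Experiment 2 procedure can be checked by enumerating the test matrix. The sketch below uses only the figures stated in the text (five teams, two repetitions of each of nine stop conditions, six no-stop trials per team); the variable names are illustrative.

```python
from itertools import product

durations_min = [0.5, 1, 2]   # duration of each stop
stops_per_trial = [1, 2, 3]   # frequency of stops within a trial
teams = 5
repetitions = 2               # each team flew every condition twice
control_trials_per_team = 6   # trials with no stops

conditions = list(product(durations_min, stops_per_trial))
trials_with_stops = teams * repetitions * len(conditions)
trials_per_duration = teams * repetitions * len(stops_per_trial)
trials_per_frequency = teams * repetitions * len(durations_min)
control_trials = teams * control_trials_per_team
total_trials = trials_with_stops + control_trials
```

The total of 120 trials matches the N = 120 reported for the chi-square tests, and four groups (no stops plus three levels) of 30 trials each are consistent with the F(3, 116) degrees of freedom.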

Summary

The use of a temporary freeze in the simulation to collect SA data is supported by Experiments 1 and 2. Subjects were able to report their SA using this technique for as long as 5 or 6 min without apparent memory decay, and the freeze did not appear to be intrusive on subject performance in the simulation, allaying several concerns about the technique. Although it is always difficult to establish no effect of one variable on another (i.e., prove the null hypothesis), it is reassuring that the finding of no intrusion on performance has been repeated in numerous other studies in which SAGAT was used to evaluate human-machine interface concepts (Bolstad and Endsley, 1990; Endsley, 1989b; Northrop, 1988) if the timing of the stop was unpredictable to the subjects (Endsley, 1988). This finding was also repeated in one study in which pilots flew in a simulation against computer-controlled enemy aircraft (Endsley, 1990d), helping to rule out the possibility that the finding of no effect on performance in the present study could have been the result of pilots on both sides of the simulation being equally influenced.

SAGAT has thus far proven to meet the stated criteria for measurement. In addition to the present studies establishing a level of empirical validity, SAGAT has been shown to have predictive validity (Endsley, 1990b) and content validity (Endsley, 1990d). The method is not without some costs, however, as a detailed analysis of SA requirements is required in order to develop the battery of queries to be administered. On the positive side, this analysis can also be extremely useful for guiding design efforts.

EXAMPLE

To proffer an example of the type of SA information provided by SAGAT, I will present a study that evaluates the use of a three-dimensional display for providing radar information to fighter pilots. The study was conducted in the same high-fidelity, real-time simulator as in the prior two experiments. During the study, each of six experienced pilot subjects flew a fighter sweep mission against a field of four digitally controlled enemy aircraft. Each subject completed five trials in each of five display conditions. The initial starting conditions and order of presentation of the display conditions were randomized. Each trial was halted to collect SAGAT data at a randomly selected time between 2.5 and 5 min into each trial. Immediately following the collection of SAGAT data, the Subjective Workload Assessment Technique (SWAT) was also administered. In this study, the trial was not resumed after the collection of SAGAT and SWAT data.

The five display conditions were constructed to evaluate the effect of a three-dimensional (3D) display concept. The 3D display provided a grid representing the ground, above which aircraft targets detected by the radar were presented by a small triangle pointing in the direction of the aircraft's heading and connected to the grid by a line corresponding to the aircraft's altitude. The plane the subject was flying was presented in the center of this display so that each target's location and altitude were spatially represented relative to the subject's ownship. Four rotations of the grid (0, 30, 45, and 60 deg) were investigated (each providing a different perspective on the scene), as shown in Figure 9. At 0-deg rotation, the grid portrayed a God's-eye view of the scene, thus depicting a traditional two-dimensional display. In addition, these four views were compared with a two-display configuration in which a God's-eye view display was supplemented with a second profile display that pictorially presented altitude. In all conditions, altitude of the aircraft was also presented numerically next to the aircraft symbols.

Following the trials, the collected SAGAT and SWAT data were analyzed. Although the final SAGAT battery for fighter aircraft has 40 queries (Endsley, 1990c), only 23 were administered at the time of the test (as the battery was still in development). The results of 4 pertinent queries are presented here. The subjects' reported knowledge of the range, azimuth, altitude, and heading of the enemy targets was compared with the actual values of these elements at the time of the freeze in the simulation. The subjects' perceptions were evaluated as correct or incorrect (using tolerance values that had been determined to be operationally relevant).
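Scoring an answer as correct or incorrect against a tolerance can be sketched as follows. The tolerance values used in the study are not given in the text, so the numbers below are purely hypothetical; the wrap-around handling for angular quantities such as heading and azimuth is likewise an assumption about how such answers would reasonably be compared.

```python
def within_tolerance(reported, actual, tolerance, circular=False):
    """Return True if the reported value is within tolerance of the actual value.

    For angular quantities in degrees, set circular=True so that 350 and 10
    are treated as 20 deg apart rather than 340.
    """
    diff = abs(reported - actual)
    if circular:
        diff = diff % 360.0
        diff = min(diff, 360.0 - diff)
    return diff <= tolerance

def percent_correct(scored):
    """Percentage of True entries in a list of correct/incorrect scores."""
    return 100.0 * sum(scored) / len(scored)

# Hypothetical answers and tolerances (not values from the study).
scores = [
    within_tolerance(350.0, 10.0, 30.0, circular=True),  # heading, deg
    within_tolerance(18000.0, 20000.0, 1000.0),          # altitude, ft
    within_tolerance(19500.0, 20000.0, 1000.0),          # altitude, ft
    within_tolerance(42.0, 40.0, 5.0),                   # range, arbitrary units
]
pct = percent_correct(scores)
```

Scoring each answer as a binary outcome is what makes the resulting data binomial, which bears on the choice of analysis for the percentages correct.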

The percentage correct across subjects for each condition on each of these four queries is shown in Figure 10. As the scored data are binomial, an arcsine transformation was applied and an ANOVA performed. The results showed that the display type had a significant effect on pilot SA of the aircraft's range, F(4, 288) = 2.575, p < 0.05; azimuth, F(4, 278) = 4.125, p < 0.05; and heading, F(4, 139) = 3.040, p < 0.05; but not altitude, F(4, 94) = 1.83, p > 0.05. The 3D display, when rotated at 45 or 60 deg, provided significantly lower SA on target range than when at 0 or 30 deg and significantly lower SA of target heading than did any of the other three conditions. A rotation of 60 deg also provided significantly lower SA of target azimuth than did the other four conditions. Although the ANOVA for target altitude was not significant, there was an opposite trend for the highly rotated 3D display to provide more SA of target altitude than did rotations of 0 and 30 deg. The SWAT data supported these findings, with significantly higher ratings for workload in the 45- and 60-deg rotations than in the 0- or 30-deg rotations or the two-dimensional display.

Figure 9. Two-dimensional display and three-dimensional display at four levels of rotation. [Panels: (a) 2D display; (b) 3D display, 0-deg rotation; (c) 3D display, 30-deg rotation; (d) 3D display, 45-deg rotation; (e) 3D display, 60-deg rotation.]
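The arcsine transformation mentioned above is commonly applied to proportions before an ANOVA because the variance of a binomial proportion depends on its mean. The text does not say which variant was used; the sketch below assumes the plain form, the arcsine of the square root of the proportion, and the sample proportions are hypothetical.

```python
import math

def arcsine_transform(proportion):
    """Arcsine square-root transform of a proportion in [0, 1], in radians."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must lie between 0 and 1")
    return math.asin(math.sqrt(proportion))

# Hypothetical proportions correct for one query in five display conditions.
proportions = [0.80, 0.78, 0.75, 0.60, 0.55]
transformed = [arcsine_transform(p) for p in proportions]
```

The transform stretches values near 0 and 1 relative to those near 0.5, which makes the variance of the transformed scores roughly independent of the underlying proportion.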

A major point to be gleaned from this example is the importance of examining the impact of a given design across an operator's SA requirements. The proposed 3D display was designed to provide pilots with better information about aircraft altitude, and it may have done so, although the trend was not significant. However, this came at the expense of SA on location, azimuth, and heading. If a study had been performed that merely examined subjects' knowledge of aircraft altitude, this important information would have been lost. The SAGAT data also provided important diagnostic information that was useful in understanding the exact effect the design concept had on pilot SA in order to make future iterations.

Figure 10. Situation awareness of target range, heading, azimuth, and altitude in each display condition. [Panels: (a) knowledge of target range; (b) knowledge of target heading; (c) knowledge of target azimuth; (d) knowledge of target altitude. Each panel plots percentage correct (0 to 100) for the 2-D display and the 3-D display at 0, 30, 45, and 60 deg.]

IMPLEMENTATION RECOMMENDATIONS

Several recommendations for SAGAT administration can be made based on previous experience in using the procedure.

Training

An explanation of the SAGAT procedures and detailed instructions for answering each query should be provided to subjects before testing. Several training trials should be conducted in which the simulator is halted frequently to allow subjects ample opportunity to practice responding to the SAGAT queries. Usually three to five samplings are adequate for a subject to become comfortable with the procedure and to clear up any uncertainties in how to answer the queries.

Test Design

SAGAT requires no special test considerations. The same principles of experimental design and administration apply to SAGAT as to any other dependent measure. Measures of subject performance and workload may be collected concurrently with SAGAT, as no ill effect from the insertion of breaks has been shown. To be cautious, however, if performance measures are to be collected simultaneously with SAGAT data, half of the trials should be conducted without any breaks for SAGAT in order to provide a check for this contingency.

Procedures

Subjects should be instructed to attend to their tasks as they normally would, with the SAGAT queries considered as secondary. No displays or other visual aids should be visible while subjects are answering the queries. If subjects do not know or are uncertain about the answer to a given query, they should be encouraged to make their best guess. There is no penalty for guessing, allowing for consideration of the default values or other wisdom gained from experience that subjects normally use in decision making. If subjects do not feel comfortable enough to make a guess, they may go on to the next question. Talking or sharing of information among subjects should not be permitted. If multiple subjects are involved in the same simulation, all subjects should be queried simultaneously and the simulation resumed for all subjects at the same time.

Which Queries to Use

Random selection. As it may be impossible to query subjects about all of their SA requirements in a given stop because of time constraints, a portion of the SA queries may be randomly selected and asked each time. A random sampling provides consistency and statistical validity, thus allowing SA scores to be easily compared across trials, subjects, systems, and scenarios.

Because of attentional narrowing or lack of information, certain questions may seem unimportant to a subject at the time of a given stop. It is important to stress that subjects should attempt to answer all queries anyway. This is because (a) even though they think it unimportant, the information may have at least secondary importance; (b) they may not be aware of information that makes a question very important (e.g., the location of a pop-up aircraft); and (c) if only questions of the highest priority were asked, subjects might be inadvertently provided with artificial cues about the situation that will direct their attention when the simulation is resumed. Therefore, a random selection from a constant set of queries is recommended at each stop.

Experimenter controlled. In certain tests it may be desirable to have some queries omitted because of limitations of the simulation or characteristics of the scenarios. For instance, if the simulation does not incorporate aircraft malfunctions, the query related to this issue may be omitted. In addition, with particular test designs it may be desirable to ensure that certain queries are presented every time. When this occurs, subjects should also be queried on a random sampling from all SA requirements and not only on those related to a specific area of interest to the evaluation being conducted. This is because of the ability of subjects to shift attention to the information on which they know they will be tested. What may appear to be an improvement in SA in one area may be merely a shift of attention from another area. When the SAGAT queries cover all of the SA requirements, no such artificial cuing can occur.
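The two selection rules above (a random sample from a constant query set, optionally combined with queries the experimenter wants presented at every stop) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `select_queries`, its parameters, and the query labels in the usage note are assumptions made for the example and are not part of the SAGAT software.

```python
import random


def select_queries(all_queries, always_ask, n_random, rng=None):
    """Choose the queries for one stop (hypothetical helper).

    all_queries: the constant set of queries covering the SA requirements.
    always_ask:  queries the experimenter wants presented at every stop.
    n_random:    how many additional queries to sample at random.
    """
    rng = rng or random.Random()
    unknown = set(always_ask) - set(all_queries)
    if unknown:
        raise ValueError(f"always_ask contains unknown queries: {sorted(unknown)}")
    # Sample the remainder from ALL other requirements, not just one area
    # of interest, so subjects cannot learn what they will be tested on.
    remaining = [q for q in all_queries if q not in always_ask]
    sampled = rng.sample(remaining, min(n_random, len(remaining)))
    selected = list(always_ask) + sampled
    rng.shuffle(selected)  # randomize presentation order as well
    return selected
```

For example, `select_queries(["ownship heading", "aircraft altitude", "fuel level", "weapon quantity", "G level"], ["aircraft altitude"], 2)` returns three queries in random order, always including the altitude query.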

When to Collect SAGAT Data

It is recommended that the timing of each freeze for SAGAT administration be randomly determined and unpredictable so that subjects cannot prepare for queries in advance. If the freeze occurrence is covertly associated with the occurrence of specific events or at specific times across trials, prior studies have shown that the subjects will be able to figure this out (Endsley, 1988), allowing them to prepare for queries or actually improve SA through the artificiality of the freeze cues. An informal rule has been to ensure that no freezes occur earlier than 3 min into a trial, to allow subjects to build up a picture of the situation, and that no two freezes occur within 1 min of each other.
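As an illustration of the informal timing rule, the following Python sketch draws random freeze times that fall no earlier than 3 min into a trial and are at least 1 min apart. The function name, its parameters, and the use of simple rejection sampling are assumptions made for this example; the text does not say how freeze times were actually generated.

```python
import random


def sample_freeze_times(trial_length_s, n_freezes, earliest_s=180.0,
                        min_gap_s=60.0, rng=None, max_tries=10_000):
    """Return sorted random freeze times in seconds (hypothetical helper).

    Uses rejection sampling: draw candidate times uniformly in the allowed
    window and keep the first draw whose freezes are all far enough apart.
    """
    rng = rng or random.Random()
    window = trial_length_s - earliest_s
    if window <= 0 or (n_freezes - 1) * min_gap_s > window:
        raise ValueError("constraints cannot be satisfied for this trial length")
    for _ in range(max_tries):
        times = sorted(rng.uniform(earliest_s, trial_length_s)
                       for _ in range(n_freezes))
        if all(b - a >= min_gap_s for a, b in zip(times, times[1:])):
            return times
    raise ValueError("could not place freezes under the given constraints")
```

Calling `sample_freeze_times(900, 3)` gives three freeze times for a 15-min trial. Because the times are drawn independently of scenario events, subjects have no cue from which to anticipate a stop.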

The result of this approach is that the activities occurring at the time of the stops will be randomly selected. Some stops may occur during very important activities that are of interest to the experimenter, others when no critical activities are occurring. This gives a good sampling of the subjects' SA in a variety of situations. During analysis the experimenter may want to stratify the data to take these variations into account.

How Much SAGAT Data to Collect

The number of trials necessary will depend on the variability present in the dependent variables being collected and the number of data samples taken during a trial. This will vary with different subjects and designs, but between 30 and 60 samplings per SA query with each design option have previously been adequate in a within-subjects test design.

Multiple SAGAT stops may be taken within each trial. There is no known limit to the number of times the simulator can be frozen during a given trial. In Experiment 2 no ill effects were found from as many as three stops during a 15-min trial. In general, it is recommended that a stop last until a certain amount of time has elapsed, and then the trial is resumed, regardless of how many questions have been answered. Stops as long as 2 min in duration were used with no undue difficulty or effect on subsequent performance. In Experiment 1 stops as long as 5 min were shown to allow subjects access to SA information without memory decay.

Data Collection

The simulator computer should be programmed to collect objective data corresponding to the queries at the time of each freeze. Considering that some queries will pertain to higher-level SA requirements that may be unavailable in the computer, an expert judgment of the correct answer may be made by an experienced observer who is privy to all information, reflecting the SA of a person with perfect knowledge. A comparison of subjects' perceptions of the situation (as input into SAGAT) with the actual status of each variable (as collected per the simulator computer and expert judgment) results in an objective measure of subject SA. Questions asked of the subject but not answered should be considered incorrect. No evaluation should be made of questions not asked during a given stop.

It is recommended that answers to each query be scored as correct or incorrect based on whether or not it falls into an acceptable tolerance band around the actual value. For example, it may be acceptable for a subject to be 10 miles per hour off of actual ground speed. This method of scoring poses less difficulty than dealing with absolute error (see Marshak et al., 1987). A tabulation of the frequency of correctness can then be made within each test condition for each SA element. Because data scored as correct or incorrect are binomial, the conditions for analysis of variance are violated. A correction factor, Y' = arcsine(Y), can be applied to binomial data, which allows analysis of variance to be used. In addition, a chi-square, Cochran's Q, or binomial t test (depending on the test design) can be used to evaluate the statistical significance of differences in SA between test conditions.
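The scoring rules above can be sketched as follows. The function names are hypothetical and the sample numbers in the usage note are invented purely for illustration. The transform follows the arcsine correction as it is written in the text; some statistics texts instead apply the arcsine to the square root of the proportion, so which form a given analysis should use is left as an open assumption here.

```python
import math


def score_response(answer, actual, tolerance):
    """Score one asked query as 1 (correct) or 0 (incorrect).

    A query that was asked but not answered (answer is None) is scored as
    incorrect. Queries that were never asked at a stop should simply not be
    passed in, so they are left out of the tabulation.
    """
    if answer is None:
        return 0
    return 1 if abs(answer - actual) <= tolerance else 0


def proportion_correct(scores):
    """Frequency of correctness for one SA element in one test condition."""
    return sum(scores) / len(scores)


def arcsine_transform(y):
    """Arcsine correction for a proportion y in [0, 1], as given in the text."""
    return math.asin(y)
```

For instance, with a tolerance of 10 on ground speed, the illustrative responses 455, 430, None, and 462 against actual values 450, 450, 450, and 470 score 1, 0, 0, and 1, giving a proportion correct of 0.5 before transformation.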

Limitations and Applicability for Use

SAGAT has primarily been used within the confines of high-fidelity and medium-fidelity part-task simulations. This provides experimenter control over freezes and data collection without any danger to the subject or processes involved in the domain. It may be possible to use the technique during actual task performance if multiple operators are present to ensure safety. For example, it might be possible to verbally query one pilot in flight while another assumes flight control. Such an endeavor should be undertaken with extreme caution, however, and may not be appropriate for certain domains.

A recent effort (Sheehy, Davey, Fiegel, and Guo, 1993) employed an adaptation of this technique by making videotapes of an ongoing situation in a nuclear power plant control room. These tapes were then replayed to naive subjects with freezes for SAGAT queries employed. It is not known how different the SA of subjects passively viewing a situation may be from subjects actually engaged in task performance; however, this approach may yield some useful data.

Most known uses of SAGAT have involved fighter aircraft simulations. In general, it can be employed in any domain in which a reasonable simulation of task performance exists and an analysis of SA requirements has been made in order to develop the queries. SAGAT queries have been developed for advanced bomber aircraft (Endsley, 1990a) and recently for en route air traffic control (Endsley and Rodgers, 1994). Several researchers have begun to use the technique in evaluating operator SA in nuclear control room studies (Hogg, Torralba, and Volden, 1993; Sheehy et al., 1993). The technique has also been employed in a simulated control task to study adaptive automation (Carmody and Gluckman, 1993). Several new efforts are currently under way to employ the technique in medical decision making (E. Swirsky, personal communication, March 1994) and helicopters (C. Prince, personal communication, November 1993). Potentially it could also be used in studies involving automobile driving, supervisory control of manufacturing systems, teleoperations, and operation of other types of dynamic systems.

CONCLUSION

The use of SA metrics in evaluating the efficiency of design concepts, training programs, and other potential interventions is critical. Without careful evaluation, it will be difficult to make much progress in understanding the factors that influence SA, hindering human factors and engineering efforts. Several techniques have been reviewed here, each with several advantages and disadvantages. The use of one technique, SAGAT, was explored extensively, and two studies were presented that investigated the validity of the measure. It has to date proven a useful and viable technique for measuring SA in numerous contexts. Rigorous validity testing is needed for other techniques proposed for SA measurement. Unless the veracity of these techniques is established, interpreting their results will be hazardous.

In addition to establishing validity of measures of SA, more research is needed regarding the task of SA measurement itself. For instance, as pointed out by Pew (1991), no criteria exist at this time that establish the level of SA required for successful performance. Does an operator need to have SA that is 100% perfect (in both completeness and accuracy), or is some lesser amount sufficient for good performance? This is a complex issue. I would assert that SA can be seen as a factor that increases the probability of good performance but that it does not necessarily guarantee it, as other factors also come into effect (e.g., decision making, workload, performance execution, system capabilities, and SA of others in some cases). How much SA one needs, therefore, becomes a matter of how much probability of error one is willing to accept. Perhaps such criteria should more properly and usefully be established as the level of SA needed on each subcomponent at different periods in time. Some guidelines will eventually need to be specified if there is a desire to certify new system designs on the basis of SA.

Overall, the beginnings of the field of SA measurement have been established, allowing researchers in the area of SA to proceed from mainly speculation and anecdotal information to solid empirical findings. The tools for conducting further basic research on the SA construct and developing better system designs are available, allowing needed research on SA in a variety of arenas.

ACKNOWLEDGMENTS

This research was sponsored by the Northrop Corporation. I thank the many people at Northrop, both past and present, who contributed to this effort. In particular, I thank Vic Vizcarra for his assistance in conducting the studies, Paul Cayley and Gerry Armstrong for their programming efforts, and Mike Majors and Del Jacobs for their support in developing an SA research program. In addition, I am indebted to Mark Chignell, who provided helpful support throughout the development of the methodology. I also thank Mark Rodgers, Tom English, Dick Gilson, and several anonymous reviewers for their comments on an earlier version of this paper.


REFERENCES

Bolstad, C. A., and Endsley, M. R. (1990). Single versus dual scale range display investigation (NOR DOC 90-90). Hawthorne, CA: Northrop Corporation.

Carmody, M. A., and Gluckman, J. P. (1993). Task-specific effects of automation and automation failure on performance, workload, and situational awareness. In R. S. Jensen and D. Neumeister (Eds.), Proceedings of the Seventh International Symposium on Aviation Psychology (pp. 167-171). Columbus, OH: Ohio State University, Department of Aviation.

Chase, W. G., and Simon, H. A. (1973). Perceptions in chess. Cognitive Psychology, 4, 55-81.

Cowan, N. (1988). Evolving conceptions of memory storage, selective attention, and their mutual constraints within the human information-processing system. Psychological Bulletin, 104, 163-191.

Endsley, M. R. (1987). SAGAT: A methodology for the measurement of situation awareness (NOR DOC 87-83). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1988). Situation awareness global assessment technique (SAGAT). In Proceedings of the National Aerospace and Electronics Conference (NAECON) (pp. 789-795). New York: IEEE.

Endsley, M. R. (1989a). Final report: Situation awareness in an advanced strategic mission (NOR DOC 89-32). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1989b). Tactical simulation 3 test report: Addendum 1 situation awareness evaluations (81203033R). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1990a). A methodology for the objective measurement of situation awareness. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 1/1-1/9). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Endsley, M. R. (1990b). Predictive utility of an objective measure of situation awareness. In Proceedings of the Human Factors Society 34th Annual Meeting (pp. 41-45). Santa Monica, CA: Human Factors and Ergonomics Society.

Endsley, M. R. (1990c). Situation awareness global assessment technique (SAGAT): Air-to-air tactical version user guide (NOR DOC 89-58, rev. A). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1990d). Situation awareness in dynamic human decision making: Theory and measurement. Unpublished doctoral dissertation, University of Southern California, Los Angeles, CA.

Endsley, M. R., and Rodgers, M. D. (1994). Situation awareness global assessment technique (SAGAT): En route air traffic control version user's guide (Draft). Lubbock: Texas Tech University.

Fracker, M. L. (1990). Attention gradients in situation awareness. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 6/1-6/10). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Herrmann, D. J. (1984). Questionnaires about memory. In J. E. Harris and P. E. Morris (Eds.), Everyday memory, action, and absent-mindedness (pp. 133-151). London: Academic.

Hogg, D. N., Torralba, B., and Volden, F. S. (1993). A situation awareness methodology for the evaluation of process control systems: Studies of feasibility and the implication of use (1993-03-05). Storefjell, Norway: OECD Halden Reactor Project.


Hughes, E. R., Hassoun, J. A., and Ward, F. (1990). Advanced tactical fighter (ATF) simulation support: Final report (CSEF-TR-ATF-90-91). Wright-Patterson Air Force Base, OH: Air Force Systems Command, Aeronautical Systems Division.

Jacoby, L. L., and Dallas, M. (1981). On the relationship between autobiographical memory and perceptual learning. Journal of Experimental Psychology: General, 110, 306-340.

Kellog, R. T. (1980). Is conscious attention necessary for long-term storage? Journal of Experimental Psychology: Human Learning and Memory, 6, 379-390.

Manktelow, K., and Jones, J. (1987). Principles from the psychology of thinking and mental models. In M. M. Gardiner and B. Christie (Eds.), Applying cognitive psychology to user-interface design (pp. 83-117). Chichester, England: Wiley.

Marshak, W. P., Kuperman, G., Ramsey, E. G., and Wilson, D. (1987). Situational awareness in map displays. In Proceedings of the Human Factors Society 31st Annual Meeting (pp. 533-535). Santa Monica, CA: Human Factors and Ergonomics Society.

McDonnell Douglas Aircraft Corporation. (1982). Advanced medium-range air-to-air missile operational utility evaluation (AMRAAM OUE) test final report. St. Louis, MO: Author.

Mogford, R. H., and Tansley, B. W. (1991). The importance of the air traffic controller's mental model. Presented at the Human Factors Society of Canada Annual Meeting, Canada.

Morton, J. (1969). Interaction of information in word recognition. Psychological Review, 76, 165-178.

Nisbett, R. E., and Wilson, T. D. (1977). Telling more than we can know: Verbal reports on mental processes. Psychological Review, 84, 231-259.

Norman, D. A. (1968). Towards a theory of memory and attention. Psychological Review, 75, 522-536.

Northrop Corporation. (1988). Tactical simulation 2 test report: Addendum 1 situation awareness test results. Hawthorne, CA: Author.

Pew, R. W. (1991). Defining and measuring situation awareness in the commercial aircraft cockpit. In Proceedings of the Conference on Challenges in Aviation Human Factors: The National Plan (pp. 30-31). Washington, DC: American Institute of Aeronautics and Astronautics.

Sarter, N. B., and Woods, D. D. (1991). Situation awareness: A critical but ill-defined phenomenon. International Journal of Aviation Psychology, 1, 45-57.

Selcon, S. J., and Taylor, R. M. (1990). Evaluation of the situational awareness rating technique (SART) as a tool for aircrew systems design. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 5/1-5/8). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Sheehy, E. J., Davey, E. C., Fiegel, T. T., and Guo, K. Q. (1993, April). Usability benchmark for CANDU annunciation - lessons learned. Presented at the ANS Topical Meeting on Nuclear Plant Instrumentation, Control, and Man-Machine Interface Technology, Oak Ridge, TN.

Smolensky, M. W. (1993). Toward the physiological measurement of situation awareness: The case for eye movement measurements. In Proceedings of the Human Factors and Ergonomics Society 37th Annual Meeting (p. 41). Santa Monica, CA: Human Factors and Ergonomics Society.

Taylor, R. M. (1990). Situational awareness rating technique (SART): The development of a tool for aircrew systems design. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 3/1-3/17). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Tulving, E. (1985). How many memory systems are there? American Psychologist, 40, 385-398.

Venturino, M., Hamilton, W. L., and Dvorchak, S. R. (1990). Performance-based measures of merit for tactical situation awareness. In Situation awareness in aerospace operations (AGARD-CP-478, pp. 4/1-4/5). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Vidulich, M. A. (1989). The use of judgment matrices in subjective workload assessment: The subjective workload dominance (SWORD) technique. In Proceedings of the Human Factors Society 33rd Annual Meeting (pp. 1406-1410). Santa Monica, CA: Human Factors and Ergonomics Society.

Date received: January 26, 1993
Date accepted: October 4, 1994


administered after the completion of each simulated trial. This allows ample time for subjects to respond to a lengthy and detailed list of questions about their SA during the trial, providing needed information about subject perceptions. Unfortunately, people are not good at reporting detailed information about past mental events, even recent ones; there is a tendency to overgeneralize and overrationalize. Recall is stilted by the amount of time and intervening events that occur between the activities of interest and the administration of the questionnaire (Nisbett and Wilson, 1977). Earlier misperceptions can be quickly forgotten as the real picture unfolds during the course of events. Therefore, a posttest questionnaire will reliably capture the subject's SA only at the very end of the trial.

On-line. One way of overcoming this deficiency is to ask operators about their SA while they are carrying out their simulated tasks. Unfortunately, this too has several drawbacks. First of all, in many situations the subject will be under very heavy workload, precluding the answering of additional questions. Such questions would constitute a form of ongoing secondary task loading that may alter performance on the main task of operating the system. Furthermore, the questions asked could cue the subject to attend to the requested information on the displays, thus altering the operator's true SA. An assessment of time to answer as an indicator of SA is also faulty, as subjects may employ various time-sharing strategies between the dual tasks of operating the system and answering the questions. Overall, this method will be highly intrusive on the primary task of system operation.

Freeze technique. To overcome the limitations of reporting on SA after the fact, several researchers have used a technique in which the simulation is frozen at randomly selected times and subjects are queried as to their perceptions of the situation at the time (Endsley, 1987, 1988, 1989a; Fracker, 1990; Marshak, Kuperman, Ramsey, and Wilson, 1987). With this technique the system displays are blanked and the simulation is suspended while subjects quickly answer questions about their current perceptions of the situation. Thus, SA data can be collected immediately, which reduces the problems incurred when collecting data after the fact but does not incur the problems of on-line questioning. Subjects' perceptions are then compared with the real situation according to the simulation computer database to provide an objective measure of SA.

In a study evaluating competing aircraft display concepts, Marshak et al. (1987) queried subjects about navigation, threats, and topography using this technique. The resulting answers were converted to an absolute percentage error score for each question, allowing scores across displays to be compared along these dimensions. Fracker (1990), in another aircraft study, requested information on aircraft location and identity for specifically indicated aircraft in the simulation. Mogford and Tansley (1991) collected data relevant to an air traffic control task. In each study a measure of SA on only selected parameters was obtained.

The Situation Awareness Global Assessment Technique is a global tool developed to assess SA across all of its elements based on a comprehensive assessment of operator SA requirements (Endsley, 1987, 1988, 1990a). As a global measure, SAGAT includes queries about all SA requirements, including Level 1, 2, and 3 components, and considers system functioning and status and relevant features of the external environment.

Computerized versions of SAGAT have been developed for air-to-air tactical aircraft (Endsley, 1990c) and advanced bomber aircraft (Endsley, 1989a). These versions allow queries to be administered rapidly and SA data to be collected for analysis. The SAGAT tool features an easy-to-use query format that was designed to be as compatible as possible with subject knowledge representations. An example of a SAGAT query is shown in Figure 1. SAGAT's basic methodology is generic and applicable to other types of systems once a delineation of SA requirements has been made in order to construct the queries.


[Figure 1: query screen titled "Designate aircraft's altitude" with a vertical altitude scale in feet from 0 K to 70 K.]

Figure 1. Sample SAGAT query.

This global approach has an advantage over probes that cover only a limited number of SA items in that subjects cannot prepare for the queries in advance because queries could address nearly every aspect of the situation to which subjects would normally attend. Therefore, the chance of subjects' attention being biased toward specific items is minimized. In addition, the global approach overcomes the problems incurred when collecting data after the fact and minimizes subject SA bias resulting from secondary task loading or artificially cuing the subject's attention. It also provides a direct measure of SA that can be objectively collected and objectively evaluated. Finally, the use of random sampling (both of stop times and query presentation) provides unbiased estimates of SA, thus allowing SA scores to be easily compared statistically across trials, subjects, and systems. The primary disadvantage of this technique involves the temporary halt in the simulation.

I conducted two studies to investigate whether or not this imposes undue intrusiveness and to determine the best method for implementing this technique. In the first study I wanted to specifically determine how long after a freeze in the simulation SA information could be obtained. In the second study I investigated whether temporarily freezing the simulation would result in any change in subject behavior, thus specifically evaluating potential intrusiveness.

EXPERIMENT 1

In determining whether SA information is reportable via the SAGAT methodology, several possibilities must be considered. First, data may be processed by subjects in short-term memory (STM), never reaching long-term memory (LTM). If a sequential information-processing model is used, then it is possible that information might enter into STM and never be stored in LTM, where it would be available for retrieval during questioning. In this case, information would not be available during any SAGAT query sessions that exceeded the STM storage limitations (approximately 30 s with no rehearsal).

There is a good deal of evidence, however, that STM does not precede LTM but constitutes an activated subset of LTM (Cowan, 1988; Morton, 1969; Norman, 1968). According to this type of model, information proceeds directly from sensory memory to LTM, which is necessary for pattern recognition and coding. Only those portions of the environment that are salient are then highlighted in STM (either through focalized attention or automatic activation). This type of model would predict that SA information that has been perceived and/or further processed by the operator would exist in LTM stores and thus be available for recall during SAGAT querying that exceeds 30 s.

Second, data may be processed in a highly automated fashion and thus not be in the subject's awareness. Expert behavior can function in an automated processing-action sequence in some cases. Several authors have found that even when effortful processing is not used, however, the information is retained in LTM and is capable of affecting subject responses (Jacoby and Dallas, 1981; Kellog, 1980; Tulving, 1985). The type of questions used in SAGAT, providing cued-recall and categorical or scalar responses, should be receptive to retrieval of this type of information. In addition, a review of the literature indicates that even under these circumstances, the product of the automatic processes (as a state of knowledge) is available, even though the processes themselves may be difficult to verbalize (Endsley, 1995, this issue).

Third, the information may be in LTM but not easily recalled by the subjects. Evidence suggests that when effortful processing and awareness are used during the storage process, recall is enhanced (Cowan, 1988). SA, composed of highly relevant, attended to, and processed information, should be most receptive to recall. In addition, the SAGAT battery, which requires categorical or scalar responses, is a cued, as opposed to total, recall task, thus aiding retrieval. Under conditions of SAGAT testing, subjects are aware that they may be asked to report their SA at any time. This too may aid in the storage and retrieval process. Because the SAGAT battery is administered immediately after the freeze in the simulation, no time for memory decay or competing event interference is allowed. Thus, conditions should be optimized for the retrieval of SA information. Although it cannot be said conclusively that all of a subject's SA can be reflected in this manner, the vast majority should be reportable via SAGAT.

To further investigate this matter, I conducted a study to specifically determine how long after a freeze in the simulation SA information could be obtained. I expected that if collection of SA data via this technique was memory limited, this would be evidenced by an increase in errors on SAGAT queries that occurred approximately 30 to 60 s after the freeze time because of short-term memory restrictions. (Although this would not preclude use of the technique, it would limit its use by restricting the number of questions that could be asked at each stop.)

Procedure

A set of air-to-air engagements was conducted in a high-fidelity aircraft simulation facility. A fighter sweep mission with a force ratio of two (blue team) versus four (red team) was used for the trials. The objective of the blue team was to penetrate red territory, maximizing the kills of red fighters while maintaining a high degree of survivability. The red team was directed to fly around their assigned combat air patrol (CAP) points until a blue target was detected in red airspace. The red team was then allowed to leave its CAP point to defend against the blue team. In all cases, specific tactics were at the discretion of the individual pilot teams.

A total of 15 trials were completed by two teams of six subjects. At a random point in each trial, the simulator was frozen and SAGAT data were immediately collected from all six participants. At a given stop, all of the queries were presented once in random order. As this order was different at each stop, each query was presented a variety of times after the stop across subjects and trials. Therefore, performance accuracy on each question could be evaluated as a function of the amount of time elapsed prior to its presentation. After all subjects had completed the SAGAT battery at a given stop, a new trial was begun.

Prior to conducting the study, all subjects were trained on the use of the simulator, the displays, aircraft handling qualities, and SAGAT. In addition to three instructional training sessions on using SAGAT, each subject participated in 18 practice trials in which SAGAT was administered. (Most subjects also had received a substantial amount of training in the simulator in the past.) Thus, subjects were well trained prior to testing.

Facilities. A high-fidelity, real-time simulation facility was used for testing. Aircraft control systems, avionics systems, weapons systems, crew stations, and air vehicle performance were modeled to provide realistic aircraft control characteristics. A Gould mainframe computer controlled the simulations and drove Silicon Graphics-generated high-resolution color graphics displays. This test used six pilot stations, each including a simulated head-up display, a tactical situation display, radar, and system controls operated by a touch screen or stick and throttle control switches. A realistic stick and throttle provided primary flight control.

Subjects. Twelve experienced former military fighter pilots participated in the test. The mean subject age was 48.16 years (range of 32 to 68). Subjects had an average of 3310 h (range 1500-6500) and an average of 15.5 years (range 7-26) of military flight experience. Of the 12 subjects, 7 had combat experience.

Results

Each of the subjects' answers to the SAGAT queries was compared with actual values as collected by the simulator computer at the time of each stop. Included in the SAGAT battery at the time of the test were 26 queries. Of those, 11 could not be evaluated because the appropriate simulator data were not available, and 5 could be evaluated only by subjective means. The 10 remaining query answers concerned ownship heading, ownship location, aircraft heading, aircraft detections, aircraft airspeed, aircraft weapon selection, aircraft Gs, aircraft fuel level, aircraft weapon quantity, and aircraft altitude.

An error score (SAGAT query answer minus actual value) was computed for each response. Absolute error scores for each query (across all subjects and trials) were plotted against the amount of time elapsed between the stop in the simulation and the query, and a regression was calculated. F tests for the regressions and Pearson's r for eight of the queries are presented in Table 1. Two of the queries provided a categorical response; therefore, chi-square tests were performed and are presented in Table 2.

None of the regressions or chi-square tests computed for each of the 10 queries was significant (or even close to significant), indicating that subjects were neither more nor less prone to response error as the amount of time between the simulator freeze and the presentation of the query increased. A plot of the regression for altitude error, shown in Figure 2, reveals little or no increase in error over time. (Plots of the regressions for the other variables appeared quite similar.)
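The analysis described above, regressing absolute error on the time elapsed between the freeze and the query, can be sketched in plain Python as a simple least-squares fit with its F statistic. This is not the code used in the study; the function names and the returned dictionary are assumptions for illustration, and obtaining the p value from F would need a statistics library, which is not shown.

```python
def regress_error_on_time(times_s, abs_errors):
    """Least-squares regression of absolute error on elapsed time.

    Returns the slope, intercept, r squared, the F statistic for the
    regression, and its degrees of freedom (1, n - 2).
    """
    n = len(times_s)
    if n < 3 or n != len(abs_errors):
        raise ValueError("need at least 3 paired observations")
    mean_t = sum(times_s) / n
    mean_e = sum(abs_errors) / n
    sxx = sum((t - mean_t) ** 2 for t in times_s)
    syy = sum((e - mean_e) ** 2 for e in abs_errors)
    sxy = sum((t - mean_t) * (e - mean_e) for t, e in zip(times_s, abs_errors))
    if sxx == 0 or syy == 0:
        raise ValueError("times and errors must both vary")
    slope = sxy / sxx
    r2 = sxy ** 2 / (sxx * syy)
    df_resid = n - 2
    f_stat = float("inf") if r2 >= 1 else r2 * df_resid / (1 - r2)
    return {"slope": slope, "intercept": mean_e - slope * mean_t,
            "r2": r2, "F": f_stat, "df": (1, df_resid)}


def r2_from_f(f_stat, df_resid):
    """For one predictor, r squared follows from F: r2 = F / (F + df_resid)."""
    return f_stat / (f_stat + df_resid)
```

A flat regression shows up as a slope near zero, a small F, and an r squared near zero. The identity in `r2_from_f` also gives a quick consistency check on reported values: reading the Own location row of Table 1 as F = 0.940 on 1 and 19 degrees of freedom, `r2_from_f(0.940, 19)` is about 0.047.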

Discussion

Based on these data, it would appear that subjects under these conditions were able to provide information on their SA about a particular situation for up to 5 or 6 min. The fact that all 10 of the queries produced flat regressions lends extra weight to this conclusion.

Two explanations can be offered for these findings. First, this study investigated subjects who were expert in particular tasks during a realistic simulation of those tasks. The information subjects were asked to report was important to performance of those tasks. Most laboratory studies that predict fairly rapid decay times (approximately 30 s for short-term memory) typically employ the use of stimuli that have little or no inherent meaning to the subject (nonsense words or pictures). The

TABLE 1

Tests on Regressions of Time of Query Presentation on Query Accuracy

Query              df        F       p       r2
Own heading        1, 19     0.021   0.887   0.001
Own location       1, 19     0.940   0.334   0.047
Aircraft heading   1, 95     0.040   0.834   0.000
G level            1, 90     0.500   0.480   0.006
Fuel level         1, 95     1.160   0.284   0.012
Weapon quantity    1, 92     0.001   0.981   0.000
Altitude           1, 96     0.002   0.965   0.000
Detection          1, 524    0.041   0.840   0.000


TABLE 2

Chi-Square Tests of Time Category of Query Presentation on Query Accuracy

Figure 2. Altitude error by time until query presentation (x-axis: time in seconds).

EXPERIMENT 2

simulation was stopped. No intervening period of waiting or any competing activity was introduced prior to administering any SAGAT query. Subject knowledge of SA information may be interfered with if time delays or other activities (particularly continued operational tasks) are imposed before SAGAT is administered. The major implication of these results is that, under these conditions, SA data are readily obtainable through SAGAT for a considerable period after a stop in the simulation (up to 5 or 6 min).

A second study was initiated to address the issue of possible intrusiveness. It is necessary to determine whether temporarily freezing the simulation results in any change in subject behavior. If subject behavior were altered by such a freeze, certain limitations would be indicated. (One might wish to not resume the trial after the freeze, for instance, but start a new trial instead.) The possibility of intrusiveness was tested by evaluating the effect of stopping the simulator on subsequent subject performance.

Procedure

A set of air-to-air engagements was conducted of a fighter sweep mission with a force ratio of two (blue team) versus four (red team). The training instructions and pilot mission objectives were identical to those used in Experiment 1. In this study, however, the trial was resumed after a freeze, following a specified period for collecting SAGAT data, and was continued until specified criteria for completion of the mission were met. Subjects completed as many queries as they could during each stop. The queries were presented in random order.

Five teams of six subjects completed a full test matrix. The independent variables were duration of the stops (1/2, 1, or 2 min) and the frequency of stops (1, 2, or 3 times during the trial). Each team participated twice in each of these nine conditions. (In any given trial, multiple stops were of the same duration.) Each team also completed six trials in which no stops occurred, as a control condition. Therefore, a total

Query              df            Chi-square   p
Weapon selection   5 (N = 97)    0.16         >0.995
Airspeed           5 (N = 98)    0.13         >0.995

storage and utilization of relevant information may be quite different from that of irrelevant information (Chase and Simon, 1973).

Second, the results indicate that the SA information was obtainable from long-term memory stores. If schemata or other mechanisms are used to organize SA information (as opposed to working memory processes only), then that information will be resident in long-term memory. Many of the 10 items analyzed can be considered Level 1 SA components. The fact that this lower-level information was resident in LTM indicates that either the inputs to higher-level processing were retained as well as the outputs, or the Level 1 components were retained as important pieces of pilot SA and are significant components of LTM schemata (e.g., target altitude itself is important to know and not just its implications). Both of these explanations may be correct. These findings generally support the predictions of a processing model in which information passes into LTM storage before being highlighted in STM.

As a caveat, note that subjects were actively working with their SA knowledge by answering the SAGAT queries for the entire period that the



Results

Figure 5. Blue kills by number of stops in trial (x-axis: number of stops).

Figure 4. Blue losses: Trials with stops versus trials without stops (x-axis: number of losses).

Analysis of variance was used to evaluate the effect of number of stops and duration of stops on each of the two performance measures: blue team kills and blue team losses. The number of stops during the trial had no significant effect on either pilot performance measure, F(3, 116) = 1.73, p > 0.05, β = 0.55, and F(3, 116) = 0.20, p > 0.05, β > 0.60, respectively, as shown in Figures 5 and 6. The duration of the stop also did not significantly affect either performance measure, F(3, 116) = 2.16, p > 0.05, β = 0.40, and F(3, 116) = 0.77, p > 0.05, β > 0.60, respectively, as depicted in Figures 7 and 8. In viewing the data, no linear trend appears to be present in


of 30 trials were conducted for each duration of stop condition, and a total of 30 trials were conducted for each frequency of stop condition. These conditions were compared with 30 trials in which no stops occurred. Conditions were administered in random order, blocked by team. Pilot performance was collected as the dependent measure.
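The trial counts in this test matrix can be checked with a short enumeration. The sketch below is illustrative only; the constant and function names are hypothetical, and the control trials are coded as zero stops of zero duration purely for bookkeeping.

```python
from collections import Counter
from itertools import product

DURATIONS_MIN = (0.5, 1, 2)    # duration of each stop, in minutes
STOP_COUNTS = (1, 2, 3)        # number of stops within a trial
TEAMS = 5
REPS_PER_CONDITION = 2         # each team ran every stop condition twice
CONTROL_TRIALS_PER_TEAM = 6    # trials with no stops

def build_trials():
    """Enumerate every trial in the design as a small record."""
    trials = []
    for team in range(1, TEAMS + 1):
        for duration, stops in product(DURATIONS_MIN, STOP_COUNTS):
            for _ in range(REPS_PER_CONDITION):
                trials.append({"team": team, "duration_min": duration, "stops": stops})
        for _ in range(CONTROL_TRIALS_PER_TEAM):
            trials.append({"team": team, "duration_min": 0, "stops": 0})
    return trials

trials = build_trials()
per_duration = Counter(t["duration_min"] for t in trials)
per_stops = Counter(t["stops"] for t in trials)
print(len(trials))      # 120 trials in total
print(per_duration)     # 30 trials at each duration, plus 30 control trials
print(per_stops)        # 30 trials at each stop count, plus 30 control trials
```

The total of 120 trials agrees with the N = 120 reported for the chi-square tests, and four levels per factor across 120 trials give the (3, 116) degrees of freedom reported for the F tests.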

Facilities. This study used the same piloted, multiple-engagement simulator as in Experiment 1.

Subjects. Twenty-five experienced former military fighter pilots participated in this test. (Four of the subjects participated on more than one team.) The mean subject age was 45.16 years (range of 32 to 68). Subjects had an average of 3582 h (range of 975 to 7045) and an average of 16.9 years (range of 6 to 27) of military flight experience. Of the 25 subjects, 14 had combat experience.

Figure 3. Blue kills: Trials with stops versus trials without stops (x-axis: number of kills).

Pilot performance under each of the conditions was analyzed. Pilot performance measures included the number of blue team kills (red team losses) and the number of blue team losses (red team kills). Chi-square tests were performed on the number of blue team kills, χ²(4, N = 120) = 5.973, p > 0.05, and blue team losses, χ²(2, N = 120) = 0.05, p > 0.05, comparing between trials in which there were stops to collect SAGAT data and those in which there were no stops, as depicted in Figures 3 and 4. There was no significant difference in pilot performance on either measure.
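For readers who want to see how such a comparison is computed, the sketch below calculates a Pearson chi-square statistic for a contingency table of trial counts. The function name `chi_square` and the counts are hypothetical; the trial-level counts from the study are not reproduced here.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column factors.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df

# Hypothetical counts of trials ending with 0, 1, or 2 blue losses.
# Row 1: the 90 trials with stops; row 2: the 30 trials without stops.
losses = [
    [44, 31, 15],
    [15, 10, 5],
]
stat, df = chi_square(losses)
print(f"chi-square({df}, N = 120) = {stat:.3f}")
```

With two groups and three loss categories the test has 2 degrees of freedom, matching the blue-losses test above; five kill categories would give the 4 degrees of freedom of the blue-kills test. The sketch assumes every row and column total is nonzero.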


Figure 6. Blue losses by number of stops in trial (x-axis: number of stops).

Figure 8. Blue losses by duration of stops in trial (x-axis: duration of stops in seconds: 0, 30, 60, 120).

either case that would indicate a progressively worse (albeit nonsignificant) effect of increasing number or duration of stops. This indicates that stops to collect SAGAT data (as many as 3 for up to 2 min in duration) did not have a significant effect on subject performance.

Discussion

The lack of a significant influence of this procedure on performance probably rests on the

Figure 7. Blue kills by duration of stops in trial (x-axis: duration of stops in seconds: 0, 30, 60, 120).

fact that relevant schemata are actively used by subjects during the entire freeze period. Under these conditions, the subjects' SA does not have a chance to decay before the simulation is resumed (as was indicated by Experiment 1). Thus, SA is fairly intact when the simulation continues, allowing subjects to proceed with their tasks where they left off. Subjectively, subjects did fairly well with this procedure and were able to readily pick up the simulation at the point at which they left off at the time of the freeze, sometimes with the same sentence they had started before the stop. On many occasions subjects could not even remember if they had been stopped to collect SA data during the trial, also indicating a certain lack of intrusiveness.

Summary

The use of a temporary freeze in the simulation to collect SA data is supported by Experiments 1 and 2. Subjects were able to report their SA using this technique for as long as 5 or 6 min without apparent memory decay, and the freeze did not appear to be intrusive on subject performance in the simulation, allaying several concerns about the technique. Although it is always difficult to establish no effect of one variable on another (i.e., prove the null hypothesis), it is reassuring that the finding of no intrusion on performance has been repeated in numerous other studies in which SAGAT was used to evaluate human-machine interface concepts (Bolstad and Endsley, 1990; Endsley, 1989b; Northrop, 1988), if the timing of the stop was unpredictable to the subjects (Endsley, 1988). This finding was also repeated in one study in which pilots flew in a simulation against computer-controlled enemy aircraft (Endsley, 1990d), helping to rule out the possibility that the finding of no effect on performance in the present study could have been the result of pilots on both sides of the simulation being equally influenced.

SAGAT has thus far proven to meet the stated criteria for measurement. In addition to the present studies establishing a level of empirical validity, SAGAT has been shown to have predictive validity (Endsley, 1990b) and content validity (Endsley, 1990d). The method is not without some costs, however, as a detailed analysis of SA requirements is required in order to develop the battery of queries to be administered. On the positive side, this analysis can also be extremely useful for guiding design efforts.

EXAMPLE

To proffer an example of the type of SA information provided by SAGAT, I will present a study that evaluates the use of a three-dimensional display for providing radar information to fighter pilots. The study was conducted in the same high-fidelity, real-time simulator as in the prior two experiments. During the study, each of six experienced pilot subjects flew a fighter sweep mission against a field of four digitally controlled enemy aircraft. Each subject completed five trials in each of five display conditions. The initial starting conditions and order of presentation of the display conditions were randomized. Each trial was halted to collect SAGAT data at a randomly selected time between 2.5 and 5 min into each trial. Immediately following the collection of SAGAT data, the Subjective Workload Assessment Technique (SWAT) was also administered. In this study the trial was not resumed after the collection of SAGAT and SWAT data.

The five display conditions were constructed to evaluate the effect of a three-dimensional (3D) display concept. The 3D display provided a grid representing the ground, above which aircraft targets detected by the radar were presented by a small triangle pointing in the direction of the aircraft's heading and connected to the grid by a line corresponding to the aircraft's altitude. The plane the subject was flying was presented in the center of this display so that each target's location and altitude were spatially represented relative to the subject's ownship. Four rotations of the grid (0, 30, 45, and 60 deg) were investigated (each providing a different perspective on the scene), as shown in Figure 9. At 0-deg rotation, the grid portrayed a God's-eye view of the scene, thus depicting a traditional two-dimensional display. In addition, these four views were compared with a two-display configuration in which a God's-eye view display was supplemented with a second profile display that pictorially presented altitude. In all conditions, altitude of the aircraft was also presented numerically next to the aircraft symbols.

Following the trials, the collected SAGAT and SWAT data were analyzed. Although the final SAGAT battery for fighter aircraft has 40 queries (Endsley, 1990c), only 23 were administered at the time of the test (as the battery was still in development). The results of 4 pertinent queries are presented here. The subjects' reported knowledge of the range, azimuth, altitude, and heading of the enemy targets was compared with the actual values of these elements at the time of the freeze in the simulation. The subjects' perceptions were evaluated as correct or incorrect (using tolerance values that had been determined to be operationally relevant).

The percentage correct across subjects for each condition on each of these four queries is shown in Figure 10. As the scored data are binomial, an arcsine transformation was applied and an ANOVA performed. The results showed that the display type had a significant effect on pilot SA of the aircraft's range, F(4, 288) = 2.575, p < 0.05, azimuth, F(4, 278) = 4.125, p < 0.05, and heading, F(4, 139) = 3.040, p < 0.05, but not altitude, F(4, 94) = 1.83, p > 0.05. The 3D display, when rotated at 45 or 60 deg, provided significantly lower SA on target range than when at 0 or 30 deg and significantly lower SA of target heading than of any of the other three conditions. A rotation of 60 deg also provided significantly lower SA of target azimuth than did the other four conditions. Although the ANOVA for target altitude was not significant, there was an opposite trend for the highly rotated 3D display to provide more SA of target altitude than did rotations of 0 and 30 deg. The SWAT data supported these findings, with significantly higher ratings for workload in the 45- and 60-deg rotations than in the 0- or 30-deg rotations or the two-dimensional display.

Figure 9. Two-dimensional display and three-dimensional display at four levels of rotation (panels: 2D display; 3D display at 0-, 30-, 45-, and 60-deg rotation).

A major point to be gleaned from this example is the importance of examining the impact of a given design across an operator's SA requirements. The proposed 3D display was designed to provide pilots with better information about aircraft altitude, and it may have done so, although the trend was not significant. However, this came at the expense of SA on location, azimuth, and heading. If a study had been performed that merely examined subjects' knowledge of aircraft altitude, this important information would have been lost. The SAGAT data also provided important diagnostic information that was useful in understanding the exact effect the design concept had on pilot SA in order to make future iterations.

Figure 10. Situation awareness of target range, heading, azimuth, and altitude in each display condition. Panels: (a) knowledge of target range, (b) knowledge of target heading, (c) knowledge of target azimuth, (d) knowledge of target altitude (y-axis: percentage correct; x-axis: display condition: 2-D, 3-D at 0, 30, 45, and 60 deg).

IMPLEMENTATION RECOMMENDATIONS

Several recommendations for SAGAT administration can be made based on previous experience in using the procedure.

Training

An explanation of the SAGAT procedures and detailed instructions for answering each query should be provided to subjects before testing. Several training trials should be conducted in which the simulator is halted frequently to allow subjects ample opportunity to practice responding to the SAGAT queries. Usually three to five samplings are adequate for a subject to become comfortable with the procedure and to clear up any uncertainties in how to answer the queries.

Test Design

SAGAT requires no special test considerations. The same principles of experimental design and administration apply to SAGAT as to any other dependent measure. Measures of subject performance and workload may be collected concurrently with SAGAT, as no ill effect from the insertion of breaks has been shown. To be cautious, however, if performance measures are to be collected simultaneously with SAGAT data, half of the trials should be conducted without any breaks for SAGAT in order to provide a check for this contingency.

Procedures

Subjects should be instructed to attend to their tasks as they normally would, with the SAGAT queries considered as secondary. No displays or other visual aids should be visible while subjects are answering the queries. If subjects do not know or are uncertain about the answer to a given query, they should be encouraged to make their best guess. There is no penalty for guessing, allowing for consideration of the default values or other wisdom gained from experience that subjects normally use in decision making. If subjects do not feel comfortable enough to make a guess, they may go on to the next question. Talking or sharing of information among subjects should not be permitted. If multiple subjects are involved in the same simulation, all subjects should be queried simultaneously and the simulation resumed for all subjects at the same time.

Which Queries to Use

Random selection. As it may be impossible to query subjects about all of their SA requirements in a given stop because of time constraints, a portion of the SA queries may be randomly selected and asked each time. A random sampling provides consistency and statistical validity, thus allowing SA scores to be easily compared across trials, subjects, systems, and scenarios.

Because of attentional narrowing or lack of information, certain questions may seem unimportant to a subject at the time of a given stop. It is important to stress that subjects should attempt to answer all queries anyway. This is because (a) even though they think it unimportant, the information may have at least secondary importance; (b) they may not be aware of information that makes a question very important (e.g., the location of a pop-up aircraft); and (c) if only questions of the highest priority were asked, subjects might be inadvertently provided with artificial cues about the situation that will direct their attention when the simulation is resumed. Therefore, a random selection from a constant set of queries is recommended at each stop.

Experimenter controlled. In certain tests it may be desirable to have some queries omitted because of limitations of the simulation or characteristics of the scenarios. For instance, if the simulation does not incorporate aircraft malfunctions, the query related to this issue may be omitted. In addition, with particular test designs it may be desirable to ensure that certain queries are presented every time. When this occurs, subjects should also be queried on a random sampling from all SA requirements and not only on those related to a specific area of interest to the evaluation being conducted. This is because of the ability of subjects to shift attention to the information on which they know they will be tested. What may appear to be an improvement in SA in one area may be merely a shift of attention from another area. When the SAGAT queries cover all of the SA requirements, no such artificial cuing can occur.
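The two selection strategies above (random sampling from a constant set, optionally combined with queries the experimenter wants asked at every stop) might be implemented along the following lines. This is a sketch under assumptions: `select_queries`, its parameters, and the query names are hypothetical and are not part of any published SAGAT software.

```python
import random

def select_queries(query_pool, n_per_stop, always_ask=(), rng=None):
    """Pick the queries for one stop.

    query_pool: the constant set of queries covering all SA requirements.
    always_ask: queries the experimenter wants presented at every stop.
    The remaining slots are filled by random sampling from the rest of the
    pool, and the final order is shuffled.
    """
    rng = rng or random.Random()
    always = list(always_ask)
    unknown = [q for q in always if q not in query_pool]
    if unknown:
        raise ValueError(f"not in query pool: {unknown}")
    remaining = [q for q in query_pool if q not in always]
    n_random = max(0, min(n_per_stop - len(always), len(remaining)))
    chosen = always + rng.sample(remaining, n_random)
    rng.shuffle(chosen)
    return chosen

# Hypothetical query names for illustration only.
pool = ["own heading", "own altitude", "fuel level", "target range",
        "target azimuth", "target heading", "target altitude", "weapon quantity"]
print(select_queries(pool, 4, always_ask=["target altitude"], rng=random.Random(7)))
```

Sampling the random portion from the whole pool, rather than only from queries related to the display under evaluation, follows the recommendation that subjects should not be able to infer which information they will be tested on.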

When to Collect SAGAT Data

It is recommended that the timing of each freeze for SAGAT administration be randomly determined and unpredictable so that subjects cannot prepare for queries in advance. If the freeze occurrence is covertly associated with the occurrence of specific events or at specific times across trials, prior studies have shown that the subjects will be able to figure this out (Endsley, 1988), allowing them to prepare for queries or actually improve SA through the artificiality of the freeze cues. An informal rule has been to ensure that no freezes occur earlier than 3 min into a trial, to allow subjects to build up a picture of the situation, and that no two freezes occur within 1 min of each other.

The result of this approach is that the activities occurring at the time of the stops will be randomly selected. Some stops may occur during very important activities that are of interest to the experimenter, others when no critical activities are occurring. This gives a good sampling of the subjects' SA in a variety of situations. During analysis, the experimenter may want to stratify the data to take these variations into account.
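One way to generate freeze times that respect the informal rule above is simple rejection sampling, sketched below. The function `schedule_freezes` and its defaults are assumptions made for illustration; times are taken to be seconds of simulation time, excluding the time spent in freezes.

```python
import random

def schedule_freezes(trial_len_s, n_freezes, earliest_s=180, min_gap_s=60,
                     rng=None, max_tries=10000):
    """Draw random freeze times for one trial.

    No freeze occurs before earliest_s (3 min by default), and no two
    freezes fall within min_gap_s (1 min by default) of each other.
    Candidate schedules are drawn uniformly and rejected until one
    satisfies the spacing rule.
    """
    if trial_len_s <= earliest_s:
        raise ValueError("trial is too short for any freeze")
    rng = rng or random.Random()
    for _ in range(max_tries):
        times = sorted(rng.uniform(earliest_s, trial_len_s) for _ in range(n_freezes))
        if all(later - earlier >= min_gap_s for earlier, later in zip(times, times[1:])):
            return times
    raise ValueError("could not place freezes under the given constraints")

# Example: three freezes in a 15-min (900 s) trial.
print([round(t) for t in schedule_freezes(900, 3, rng=random.Random(42))])
```

Because the schedule is drawn independently for every trial, subjects cannot learn a fixed freeze time across trials, which is the property the recommendation is intended to protect.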

How Much SAGAT Data to Collect

The number of trials necessary will depend on the variability present in the dependent variables being collected and the number of data samples taken during a trial. This will vary with different subjects and designs, but between 30 and 60 samplings per SA query with each design option have previously been adequate in a within-subjects test design.

Multiple SAGAT stops may be taken within each trial. There is no known limit to the number of times the simulator can be frozen during a given trial. In Experiment 2, no ill effects were found from as many as three stops during a 15-min trial. In general, it is recommended that a stop last until a certain amount of time has elapsed, and then the trial is resumed, regardless of how many questions have been answered. Stops as long as 2 min in duration were used with no undue difficulty or effect on subsequent performance. In Experiment 1, stops as long as 5 min were shown to allow subjects access to SA information without memory decay.

Data Collection

The simulator computer should be programmed to collect objective data corresponding to the queries at the time of each freeze. Considering that some queries will pertain to higher-level SA requirements that may be unavailable in the computer, an expert judgment of the correct answer may be made by an experienced observer who is privy to all information, reflecting the SA of a person with perfect knowledge. A comparison of subjects' perceptions of the situation (as input into SAGAT) with the actual status of each variable (as collected per the simulator computer and expert judgment) results in an objective measure of subject SA. Questions asked of the subject but not answered should be considered incorrect. No evaluation should be made of questions not asked during a given stop.

It is recommended that answers to each query be scored as correct or incorrect based on whether or not it falls into an acceptable tolerance band around the actual value. For example, it may be acceptable for a subject to be 10 miles per hour off of actual ground speed. This method of scoring poses less difficulty than dealing with absolute error (see Marshak et al., 1987). A tabulation of the frequency of correctness can then be made within each test condition for each SA element. Because data scored as correct or incorrect are binomial, the conditions for analysis of variance are violated. A correction factor, Y' = arcsine(Y), can be applied to binomial data, which allows analysis of variance to be used. In addition, a chi-square, Cochran's Q, or binomial t test (depending on the test design) can be used to evaluate the statistical significance of differences in SA between test conditions.
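The scoring rules above can be sketched in a few lines. The function names and data are hypothetical. One assumption should be flagged: the text gives the correction only as an arcsine of Y, and the sketch uses the common variance-stabilizing form arcsin(sqrt(p)) applied to a proportion correct, which may differ from the exact form the author used. The simple tolerance check also does not handle circular quantities such as heading.

```python
import math

def score_response(reported, actual, tolerance):
    """Return 1 if the response falls inside the tolerance band, else 0.

    A query that was asked but not answered (reported is None) is scored
    incorrect. Queries that were never asked should simply not be passed in.
    """
    if reported is None:
        return 0
    return 1 if abs(reported - actual) <= tolerance else 0

def proportion_correct(scores):
    """Proportion of correct responses within one test condition."""
    return sum(scores) / len(scores)

def arcsine_transform(p):
    """Assumed form of the correction: arcsin(sqrt(p)) for a proportion p."""
    return math.asin(math.sqrt(p))

# Hypothetical ground-speed responses (mph) against an actual value of 450 mph,
# scored with a tolerance of 10 mph. None marks an asked-but-unanswered query.
responses = [455, None, 470, 441]
scores = [score_response(r, 450, 10) for r in responses]
p = proportion_correct(scores)
print(scores, p, round(arcsine_transform(p), 3))
```

The transformed proportions, one per subject and condition, can then be submitted to an analysis of variance, while the raw correct/incorrect counts remain available for a chi-square or Cochran's Q test.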

Limitations and Applicability for Use

SAGAT has primarily been used within the confines of high-fidelity and medium-fidelity part-task simulations. This provides experimenter control over freezes and data collection without any danger to the subject or processes involved in the domain. It may be possible to use the technique during actual task performance if multiple operators are present to ensure safety. For example, it might be possible to verbally query one pilot in flight while another assumes flight control. Such an endeavor should be undertaken with extreme caution, however, and may not be appropriate for certain domains.

A recent effort (Sheehy, Davey, Fiegel, and Guo, 1993) employed an adaptation of this technique by making videotapes of an ongoing situation in a nuclear power plant control room. These tapes were then replayed to naive subjects with freezes for SAGAT queries employed. It is not known how different the SA of subjects passively viewing a situation may be from subjects actually engaged in task performance; however, this approach may yield some useful data.

Most known uses of SAGAT have involved fighter aircraft simulations. In general, it can be employed in any domain in which a reasonable simulation of task performance exists and an analysis of SA requirements has been made in order to develop the queries. SAGAT queries have been developed for advanced bomber aircraft (Endsley, 1990a) and recently for en route air traffic control (Endsley and Rodgers, 1994). Several researchers have begun to use the technique in evaluating operator SA in nuclear control room studies (Hogg, Torralba, and Volden, 1993; Sheehy et al., 1993). The technique has also been employed in a simulated control task to study adaptive automation (Carmody and Gluckman, 1993). Several new efforts are currently under way to employ the technique in medical decision making (E. Swirsky, personal communication, March 1994) and helicopters (C. Prince, personal communication, November 1993). Potentially it could also be used in studies involving automobile driving, supervisory control of manufacturing systems, teleoperations, and operation of other types of dynamic systems.

CONCLUSION

The use of SA metrics in evaluating the efficiency of design concepts, training programs, and other potential interventions is critical. Without careful evaluation, it will be difficult to make much progress in understanding the factors that influence SA, hindering human factors and engineering efforts. Several techniques have been reviewed here, each with several advantages and disadvantages. The use of one technique, SAGAT, was explored extensively, and two studies were presented that investigated the validity of the measure. It has to date proven a useful and viable technique for measuring SA in numerous contexts. Rigorous validity testing is needed for other techniques proposed for SA measurement. Unless the veracity of these techniques is established, interpreting their results will be hazardous.

In addition to establishing validity of measures of SA, more research is needed regarding the task of SA measurement itself. For instance, as pointed out by Pew (1991), no criteria exist at this time that establish the level of SA required for successful performance. Does an operator need to have SA that is 100% perfect (in both completeness and accuracy), or is some lesser amount sufficient for good performance? This is a complex issue. I would assert that SA can be seen as a factor that increases the probability of good performance but that it does not necessarily guarantee it, as other factors also come into effect (e.g., decision making, workload, performance execution, system capabilities, and SA of others in some cases). How much SA one needs, therefore, becomes a matter of how much probability of error one is willing to accept. Perhaps such criteria should more properly and usefully be established as the level of SA needed on each subcomponent at different periods in time. Some guidelines will eventually need to be specified if there is a desire to certify new system designs on the basis of SA.

Overall, the beginnings of the field of SA measurement have been established, allowing researchers in the area of SA to proceed from mainly speculation and anecdotal information to solid empirical findings. The tools for conducting further basic research on the SA construct and developing better system designs are available, allowing needed research on SA in a variety of arenas.

ACKNOWLEDGMENTS

This research was sponsored by the Northrop Corporation. I thank the many people at Northrop, both past and present, who contributed to this effort. In particular, I thank Vic Vizcarra for his assistance in conducting the studies, Paul Cayley and Gerry Armstrong for their programming efforts, and Mike Majors and Del Jacobs for their support in developing an SA research program. In addition, I am indebted to Mark Chignell, who provided helpful support throughout the development of the methodology. I also thank Mark Rodgers, Tom English, Dick Gilson, and several anonymous reviewers for their comments on an earlier version of this paper.


REFERENCES

Bolstad, C. A., and Endsley, M. R. (1990). Single versus dual scale range display investigation (NOR DOC 90-90). Hawthorne, CA: Northrop Corporation.

Carmody, M. A., and Gluckman, J. P. (1993). Task-specific effects of automation and automation failure on performance, workload and situational awareness. In R. S. Jensen and D. Neumeister (Eds.), Proceedings of the Seventh International Symposium on Aviation Psychology (pp. 167-171). Columbus, OH: Ohio State University, Department of Aviation.

Chase, W. G., and Simon, H. A. (1973). Perception in chess. Cognitive Psychology, 4, 55-81.

Cowan, N. (1988). Evolving conceptions of memory storage, selective attention and their mutual constraints within the human information-processing system. Psychological Bulletin, 104, 163-191.

Endsley, M. R. (1987). SAGAT: A methodology for the measurement of situation awareness (NOR DOC 87-83). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1988). Situation awareness global assessment technique (SAGAT). In Proceedings of the National Aerospace and Electronics Conference (NAECON) (pp. 789-795). New York: IEEE.

Endsley, M. R. (1989a). Final report: Situation awareness in an advanced strategic mission (NOR DOC 89-32). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1989b). Tactical simulation 3 test report: Addendum 1 situation awareness evaluations (81203033R). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1990a). A methodology for the objective measurement of situation awareness. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 1/1-1/9). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Endsley, M. R. (1990b). Predictive utility of an objective measure of situation awareness. In Proceedings of the Human Factors Society 34th Annual Meeting (pp. 41-45). Santa Monica, CA: Human Factors and Ergonomics Society.

Endsley, M. R. (1990c). Situation awareness global assessment technique (SAGAT): Air-to-air tactical version user guide (NOR DOC 89-58, rev. A). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1990d). Situation awareness in dynamic human decision making: Theory and measurement. Unpublished doctoral dissertation, University of Southern California, Los Angeles, CA.

Endsley, M. R., and Rodgers, M. D. (1994). Situation awareness global assessment technique (SAGAT): En route air traffic control version user's guide (Draft). Lubbock: Texas Tech University.

Fracker, M. L. (1990). Attention gradients in situation awareness. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 6/1-6/10). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Herrmann, D. J. (1984). Questionnaires about memory. In J. E. Harris and P. E. Morris (Eds.), Everyday memory, action and absent-mindedness (pp. 133-151). London: Academic.

Hogg, D. N., Torralba, B., and Volden, F. S. (1993). A situation awareness methodology for the evaluation of process control systems: Studies of feasibility and the implication of use (1993-03-05). Storefjell, Norway: OECD Halden Reactor Project.


Hughes, E. R., Hassoun, J. A., and Ward, F. (1990). Advanced tactical fighter (ATF) simulation support: Final report (CSEF-TR-ATF-90-91). Wright-Patterson Air Force Base, OH: Air Force Systems Command, Aeronautical Systems Division.

Jacoby, L. L., and Dallas, M. (1981). On the relationship between autobiographical memory and perceptual learning. Journal of Experimental Psychology: General, 110, 306-340.

Kellog, R. T. (1980). Is conscious attention necessary for long-term storage? Journal of Experimental Psychology: Human Learning and Memory, 6, 379-390.

Manktelow, K., and Jones, J. (1987). Principles from the psychology of thinking and mental models. In M. M. Gardiner and B. Christie (Eds.), Applying cognitive psychology to user-interface design (pp. 83-117). Chichester, England: Wiley.

Marshak, W. P., Kuperman, G., Ramsey, E. G., and Wilson, D. (1987). Situational awareness in map displays. In Proceedings of the Human Factors Society 31st Annual Meeting (pp. 533-535). Santa Monica, CA: Human Factors and Ergonomics Society.

McDonnell Douglas Aircraft Corporation. (1982). Advanced medium-range air-to-air missile operational utility evaluation (AMRAAM OUE) test final report. St. Louis, MO: Author.

Mogford, R. H., and Tansley, B. W. (1991). The importance of the air traffic controller's mental model. Presented at the Human Factors Society of Canada Annual Meeting, Canada.

Morton J (1969) Interaction of information in word recogni-tion Psychological Review 76165-178

Nisbett R E and Wilson T D (1977) Telling more than wecan know Verbal reports on mental processes Psycholog-ical Review 84 231-259

Norman D A (1968) Towards a theory of memory and atten-tion Psychological Review 75 522-536

Northrop Corporation (1988) Tactical simulation 2 test reportAddendum I situation awareness test results HawthorneCA Author

Pew R W (1991) Defining and measuring situation aware-ness in the commercial aircraft cockpit In Proceedings ofthe Conference on Challenges in Aviation Human FactorsThe National Plan (pp 30-31) Washington DC AmericanInstitute of Aeronautics and Astronautics

Sarter N B and Woods D D (1991) Situation awareness Acritical but ill-defined phenomenon International Journalof Aviation Psychology I 45-57

Selcon S J and Taylor R M (1990) Evaluation of the situ-ational awareness rating technique (SART) as a tool foraircrew systems design In Situational awareness in aero-space operations (AGARD-CP-478 pp 51-58) Neuilly-Sur-Seine France NATO-Advisory Group for AerospaceResearch and Development

Sheehy E J Davey E C Fiegel T T and Guo K Q (1993April) Usability benchmark for CANDU annunciation-lessons learned Presented at the ANS Topical Meeting onNuclear Plant Instrumentation Control and Man-MachineInterface Technology Oak Ridge TN

Smolensky M W (1993) Toward the physiological measure-ment of situation awareness The case for eye movementmeasurements In Proceedings of the Human Factors andErgonomics Society 37th Annual Meeting (p 41) SantaMonica CA Human Factors and Ergonomics Society

Taylor, R. M. (1990). Situational awareness rating technique (SART): The development of a tool for aircrew systems design. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 3/1-3/17). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Tulving, E. (1985). How many memory systems are there? American Psychologist, 40, 385-398.

Venturino, M., Hamilton, W. L., and Dvorchak, S. R. (1990). Performance-based measures of merit for tactical situation awareness. In Situation awareness in aerospace operations (AGARD-CP-478, pp. 4/1-4/5). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Vidulich, M. A. (1989). The use of judgment matrices in subjective workload assessment: The subjective workload dominance (SWORD) technique. In Proceedings of the Human Factors Society 33rd Annual Meeting (pp. 1406-1410). Santa Monica, CA: Human Factors and Ergonomics Society.

Date received: January 26, 1993
Date accepted: October 4, 1994



[Figure 1 image: query screen titled "DESIGNATE AIRCRAFT'S ALTITUDE" with a vertical response scale labeled ALTITUDE (FEET), marked in 10 K steps up to 70 K.]

Figure 1. Sample SAGAT query.

This global approach has an advantage over probes that cover only a limited number of SA items, in that subjects cannot prepare for the queries in advance because queries could address nearly every aspect of the situation to which subjects would normally attend. Therefore, the chance of subjects' attention being biased toward specific items is minimized. In addition, the global approach overcomes the problems incurred when collecting data after the fact and minimizes subject SA bias resulting from secondary task loading or artificially cuing the subject's attention. It also provides a direct measure of SA that can be objectively collected and objectively evaluated. Finally, the use of random sampling (both of stop times and query presentation) provides unbiased estimates of SA, thus allowing SA scores to be easily compared statistically across trials, subjects, and systems. The primary disadvantage of this technique involves the temporary halt in the simulation.

I conducted two studies to investigate whether or not this imposes undue intrusiveness and to determine the best method for implementing this technique. In the first study I wanted to specifically determine how long after a freeze in the simulation SA information could be obtained. In the second study I investigated whether temporarily freezing the simulation would result in any change in subject behavior, thus specifically evaluating potential intrusiveness.

EXPERIMENT 1

In determining whether SA information is reportable via the SAGAT methodology, several possibilities must be considered. First, data may be processed by subjects in short-term memory (STM), never reaching long-term memory (LTM). If a sequential information-processing model is used, then it is possible that information might enter into STM and never be stored in LTM, where it would be available for retrieval during questioning. In this case, information would not be available during any SAGAT query sessions that exceeded the STM storage limitations (approximately 30 s with no rehearsal).

There is a good deal of evidence, however, that STM does not precede LTM but constitutes an activated subset of LTM (Cowan, 1988; Morton, 1969; Norman, 1968). According to this type of model, information proceeds directly from sensory memory to LTM, which is necessary for pattern recognition and coding. Only those portions of the environment that are salient are then highlighted in STM (either through focalized attention or automatic activation). This type of model would predict that SA information that has been perceived and/or further processed by the operator would exist in LTM stores and thus be available for recall during SAGAT querying that exceeds 30 s.

Second, data may be processed in a highly automated fashion and thus not be in the subject's awareness. Expert behavior can function in an automated processing/action sequence in some cases. Several authors have found that even when effortful processing is not used, however, the information is retained in LTM and is capable of affecting subject responses (Jacoby and Dallas, 1981; Kellog, 1980; Tulving, 1985). The type of questions used in SAGAT (providing cued recall and categorical or scalar responses) should be receptive to retrieval of this type of information. In addition, a review of the literature indicates that even under these circumstances, the product of the automatic processes (as a state of knowledge) is available, even though the processes themselves may be difficult to verbalize (Endsley, 1995, this issue).

Third, the information may be in LTM but not easily recalled by the subjects. Evidence suggests that when effortful processing and awareness are used during the storage process, recall is enhanced (Cowan, 1988). SA, composed of highly relevant, attended to, and processed information, should be most receptive to recall. In addition, the SAGAT battery, which requires categorical or scalar responses, is a cued, as opposed to total, recall task, thus aiding retrieval. Under conditions of SAGAT testing, subjects are aware that they may be asked to report their SA at any time. This, too, may aid in the storage and retrieval process. Because the SAGAT battery is administered immediately after the freeze in the simulation, no time for memory decay or competing event interference is allowed. Thus, conditions should be optimized for the retrieval of SA information. Although it cannot be said conclusively that all of a subject's SA can be reflected in this manner, the vast majority should be reportable via SAGAT.

To further investigate this matter, I conducted a study to specifically determine how long after a freeze in the simulation SA information could be obtained. I expected that if collection of SA data via this technique was memory limited, this would be evidenced by an increase in errors on SAGAT queries that occurred approximately 30 to 60 s after the freeze time because of short-term memory restrictions. (Although this would not preclude use of the technique, it would limit its use by restricting the number of questions that could be asked at each stop.)

Procedure

A set of air-to-air engagements was conducted in a high-fidelity aircraft simulation facility. A fighter sweep mission with a force ratio of two (blue team) versus four (red team) was used for the trials. The objective of the blue team was to penetrate red territory, maximizing the kills of red fighters while maintaining a high degree of survivability. The red team was directed to fly around their assigned combat air patrol (CAP) points until a blue target was detected in red airspace. The red team was then allowed to leave its CAP point to defend against the blue team. In all cases, specific tactics were at the discretion of the individual pilot teams.

A total of 15 trials were completed by two teams of six subjects. At a random point in each trial, the simulator was frozen and SAGAT data were immediately collected from all six participants. At a given stop, all of the queries were presented once in random order. As this order was different at each stop, each query was presented a variety of times after the stop across subjects and trials. Therefore, performance accuracy on each question could be evaluated as a function of the amount of time elapsed prior to its presentation. After all subjects had completed the SAGAT battery at a given stop, a new trial was begun.
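The bookkeeping this procedure implies can be sketched as follows. This is an illustration only, not the software used in the study: the function name `administer_stop`, the `ask` callback, and the injectable `clock` are all hypothetical. The sketch shuffles the battery at each stop and records how long after the freeze each query was presented, which is the quantity the later regression needs.

```python
import random
import time


def administer_stop(queries, ask, rng, clock=time.monotonic):
    """Present every query once in random order and log when each was shown.

    `ask` is a hypothetical callback that displays one query and returns the
    subject's answer; `clock` returns seconds and can be replaced in tests.
    """
    order = list(queries)
    rng.shuffle(order)         # a fresh random order at every stop
    freeze_time = clock()      # moment the simulation was frozen
    records = []
    for query in order:
        elapsed = clock() - freeze_time  # delay between freeze and this query
        answer = ask(query)
        records.append({"query": query, "elapsed_s": elapsed, "answer": answer})
    return records
```

Because the order differs at every stop, pooling the records across subjects and trials gives each query a spread of presentation delays.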

Prior to conducting the study, all subjects were trained on the use of the simulator, the displays, aircraft handling qualities, and SAGAT. In addition to three instructional training sessions on using SAGAT, each subject participated in 18 practice trials in which SAGAT was administered. (Most subjects also had received a substantial amount of training in the simulator in the past.) Thus, subjects were well trained prior to testing.

Facilities. A high-fidelity, real-time simulation facility was used for testing. Aircraft control systems, avionics systems, weapons systems, crew stations, and air vehicle performance were modeled to provide realistic aircraft control characteristics. A Gould mainframe computer controlled the simulations and drove Silicon Graphics-generated high-resolution color graphics displays. This test used six pilot stations, each including a simulated head-up display, a tactical situation display, radar, and system controls operated by a touch screen or stick and throttle control switches. A realistic stick and throttle provided primary flight control.

Subjects. Twelve experienced former military fighter pilots participated in the test. The mean subject age was 48.16 years (range of 32 to 68). Subjects had an average of 3310 h (range 1500-6500) and an average of 15.5 years (range 7-26) of military flight experience. Of the 12 subjects, 7 had combat experience.

Results

Each of the subjects' answers to the SAGAT queries was compared with actual values as collected by the simulator computer at the time of each stop. Included in the SAGAT battery at the time of the test were 26 queries. Of those, 11 could not be evaluated because the appropriate simulator data were not available, and 5 could be evaluated only by subjective means. The 10 remaining query answers concerned ownship heading, ownship location, aircraft heading, aircraft detections, aircraft airspeed, aircraft weapon selection, aircraft Gs, aircraft fuel level, aircraft weapon quantity, and aircraft altitude.

An error score (SAGAT query answer minus actual value) was computed for each response. Absolute error scores for each query (across all subjects and trials) were plotted against the amount of time elapsed between the stop in the simulation and the query, and a regression was calculated. F tests for the regressions and Pearson's r for eight of the queries are presented in Table 1. Two of the queries provided a categorical response; therefore, chi-square tests were performed and are presented in Table 2.

None of the regressions or chi-square tests computed for each of the 10 queries was significant (or even close to significant), indicating that subjects were neither more nor less prone to response error as the amount of time between the simulator freeze and the presentation of the query increased. A plot of the regression for altitude error, shown in Figure 2, reveals little or no increase in error over time. (Plots of the regressions for the other variables appeared quite similar.)
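The regression analysis described above can be sketched in a few lines. This is a generic ordinary least-squares fit written for illustration; the function names are hypothetical and nothing here reproduces the study's data or software. The helper `r2_from_f` uses the standard identity for a one-predictor regression, r² = F / (F + error df). Reading the Own location row of Table 1 as F = 0.940 on 1 and 19 degrees of freedom, that identity returns the tabled r² of 0.047.

```python
def regress_error_on_delay(delays_s, abs_errors):
    """Least-squares fit of absolute error on elapsed time since the freeze.

    Returns the slope, intercept, r squared, F statistic, and its degrees of freedom.
    """
    n = len(delays_s)
    if n < 3 or n != len(abs_errors):
        raise ValueError("need at least three paired observations")
    mean_x = sum(delays_s) / n
    mean_y = sum(abs_errors) / n
    sxx = sum((x - mean_x) ** 2 for x in delays_s)
    syy = sum((y - mean_y) ** 2 for y in abs_errors)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(delays_s, abs_errors))
    if sxx == 0:
        raise ValueError("all delays are identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = 0.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    df_error = n - 2
    # F for a single predictor: explained over unexplained variance, scaled by error df.
    f_stat = float("inf") if r2 == 1.0 else r2 / (1.0 - r2) * df_error
    return {"slope": slope, "intercept": intercept, "r2": r2, "F": f_stat, "df": (1, df_error)}


def r2_from_f(f_stat, df_error):
    """For a one-predictor regression, r^2 = F / (F + error df)."""
    return f_stat / (f_stat + df_error)
```

A memory-limited measure would show a positive slope and a sizable F; a flat regression shows a slope near zero and an r² near zero.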

Discussion

Based on these data, it would appear that subjects under these conditions were able to provide information on their SA about a particular situation for up to 5 or 6 min. The fact that all 10 of the queries produced flat regressions lends extra weight to this conclusion.

Two explanations can be offered for these findings. First, this study investigated subjects who were expert in particular tasks during a realistic simulation of those tasks. The information subjects were asked to report was important to performance of those tasks. Most laboratory studies that predict fairly rapid decay times (approximately 30 s for short-term memory) typically employ stimuli that have little or no inherent meaning to the subject (nonsense words or pictures). The storage and utilization of relevant information may be quite different from that of irrelevant information (Chase and Simon, 1973).

Second, the results indicate that the SA information was obtainable from long-term memory stores. If schemata or other mechanisms are used to organize SA information (as opposed to working memory processes only), then that information will be resident in long-term memory. Many of the 10 items analyzed can be considered Level 1 SA components. The fact that this lower-level information was resident in LTM indicates that either the inputs to higher-level processing were retained as well as the outputs, or the Level 1 components were retained as important pieces of pilot SA and are significant components of LTM schemata (e.g., target altitude itself is important to know, and not just its implications). Both of these explanations may be correct. These findings generally support the predictions of a processing model in which information passes into LTM storage before being highlighted in STM.

As a caveat, note that subjects were actively working with their SA knowledge by answering the SAGAT queries for the entire period that the simulation was stopped. No intervening period of waiting or any competing activity was introduced prior to administering any SAGAT query. Subject knowledge of SA information may be interfered with if time delays or other activities (particularly continued operational tasks) are imposed before SAGAT is administered. The major implication of these results is that, under these conditions, SA data are readily obtainable through SAGAT for a considerable period after a stop in the simulation (up to 5 or 6 min).

TABLE 1

Tests on Regressions of Time of Query Presentation on Query Accuracy

Query             df       F      p      r²
Own heading       1, 19    0.021  0.887  0.001
Own location      1, 19    0.940  0.334  0.047
Aircraft heading  1, 95    0.040  0.834  0.000
G level           1, 90    0.500  0.480  0.006
Fuel level        1, 95    1.160  0.284  0.012
Weapon quantity   1, 92    0.001  0.981  0.000
Altitude          1, 96    0.002  0.965  0.000
Detection         1, 524   0.041  0.840  0.000

TABLE 2

Chi-Square Tests of Time Category of Query Presentation on Query Accuracy

Query             df           χ²    p
Weapon selection  5 (N = 97)   0.16  >0.995
Airspeed          5 (N = 98)   0.13  >0.995

[Figure 2 plot: altitude error (vertical axis, 0 to 50,000) against time in seconds (horizontal axis, 0 to 400).]

Figure 2. Altitude error by time until query presentation.

EXPERIMENT 2

A second study was initiated to address the issue of possible intrusiveness. It is necessary to determine whether temporarily freezing the simulation results in any change in subject behavior. If subject behavior were altered by such a freeze, certain limitations would be indicated. (One might wish to not resume the trial after the freeze, for instance, but start a new trial instead.) The possibility of intrusiveness was tested by evaluating the effect of stopping the simulator on subsequent subject performance.

Procedure

A set of air-to-air engagements was conducted of a fighter sweep mission with a force ratio of two (blue team) versus four (red team). The training, instructions, and pilot mission objectives were identical to those used in Experiment 1. In this study, however, the trial was resumed after a freeze following a specified period for collecting SAGAT data and was continued until specified criteria for completion of the mission were met. Subjects completed as many queries as they could during each stop. The queries were presented in random order.

Five teams of six subjects completed a full test matrix. The independent variables were duration of the stops (1/2, 1, or 2 min) and the frequency of stops (1, 2, or 3 times during the trial). Each team participated twice in each of these nine conditions. (In any given trial, multiple stops were of the same duration.) Each team also completed six trials in which no stops occurred, as a control condition. Therefore, a total

of 30 trials were conducted for each duration of stop condition, and a total of 30 trials were conducted for each frequency of stop condition. These conditions were compared with 30 trials in which no stops occurred. Conditions were administered in random order, blocked by team. Pilot performance was collected as the dependent measure.

Facilities. This study used the same piloted, multiple-engagement simulator as in Experiment 1.

Subjects. Twenty-five experienced former military fighter pilots participated in this test. (Four of the subjects participated on more than one team.) The mean subject age was 45.16 years (range of 32 to 68). Subjects had an average of 3582 h (range of 975 to 7045) and an average of 16.9 years (range of 6 to 27) of military flight experience. Of the 25 subjects, 14 had combat experience.

Results

Pilot performance under each of the conditions was analyzed. Pilot performance measures included the number of blue team kills (red team losses) and the number of blue team losses (red team kills). Chi-square tests were performed on the number of blue team kills, χ²(4, N = 120) = 5.973, p > .05, and blue team losses, χ²(2, N = 120) = 0.05, p > .05, comparing between trials in which there were stops to collect SAGAT data and those in which there were no stops, as depicted in Figures 3 and 4. There was no significant difference in pilot performance on either measure.

[Figure 3 plot: horizontal axis, Number of Kills (0, 1, 2).]

Figure 3. Blue kills: Trials with stops versus trials without stops.

[Figure 4 plot: horizontal axis, Number of Losses (0, 1, 2).]

Figure 4. Blue losses: Trials with stops versus trials without stops.

[Figure 5 plot: horizontal axis, Number of Stops (0, 1, 2, 3).]

Figure 5. Blue kills by number of stops in trial.

Analysis of variance was used to evaluate the effect of number of stops and duration of stops on each of the two performance measures: blue team kills and blue team losses. The number of stops during the trial had no significant effect on either pilot performance measure, F(3,116) = 1.73, p > .05, β = .55, and F(3,116) = 0.20, p > .05, β > .60, respectively, as shown in Figures 5 and 6. The duration of the stop also did not significantly affect either performance measure, F(3,116) = 2.16, p > .05, β = .40, and F(3,116) = 0.77, p > .05, β > .60, respectively, as depicted in Figures 7 and 8. In viewing the data, no linear trend appears to be present in

either case that would indicate a progressively worse (albeit nonsignificant) effect of increasing number or duration of stops. This indicates that stops to collect SAGAT data (as many as 3 for up to 2 min in duration) did not have a significant effect on subject performance.

[Figure 6 plot: horizontal axis, Number of Stops (0, 1, 2, 3).]

Figure 6. Blue losses by number of stops in trial.

[Figure 7 plot: horizontal axis, Duration of Stops in seconds (0, 30, 60, 120).]

Figure 7. Blue kills by duration of stops in trial.

[Figure 8 plot: horizontal axis, Duration of Stops in seconds (0, 30, 60, 120).]

Figure 8. Blue losses by duration of stops in trial.

Discussion

The lack of a significant influence of this procedure on performance probably rests on the fact that relevant schemata are actively used by subjects during the entire freeze period. Under these conditions the subject's SA does not have a chance to decay before the simulation is resumed (as was indicated by Experiment 1). Thus SA is fairly intact when the simulation continues, allowing subjects to proceed with their tasks where they left off. Subjectively, subjects did fairly well with this procedure and were able to readily pick up the simulation at the point at which they left off at the time of the freeze, sometimes with the same sentence they had started before the stop. On many occasions subjects could not even remember if they had been stopped to collect SA data during the trial, also indicating a certain lack of intrusiveness.

Summary

The use of a temporary freeze in the simulation to collect SA data is supported by Experiments 1 and 2. Subjects were able to report their SA using this technique for as long as 5 or 6 min without apparent memory decay, and the freeze did not appear to be intrusive on subject performance in the simulation, allaying several concerns about the technique. Although it is always difficult to establish no effect of one variable on another (i.e., prove the null hypothesis), it is reassuring that the finding of no intrusion on performance has been repeated in numerous other studies in which SAGAT was used to evaluate human-machine interface concepts (Bolstad and Endsley, 1990; Endsley, 1989b; Northrop, 1988), if the timing of the stop was unpredictable to the subjects (Endsley, 1988). This finding was also repeated in one study in which pilots flew in a simulation against computer-controlled enemy aircraft (Endsley, 1990d), helping to rule out the possibility that the finding of no effect on performance in the present study could have been the result of pilots on both sides of the simulation being equally influenced.

SAGAT has thus far proven to meet the stated criteria for measurement. In addition to the present studies establishing a level of empirical validity, SAGAT has been shown to have predictive validity (Endsley, 1990b) and content validity (Endsley, 1990d). The method is not without some costs, however, as a detailed analysis of SA requirements is required in order to develop the battery of queries to be administered. On the positive side, this analysis can also be extremely useful for guiding design efforts.

EXAMPLE

To proffer an example of the type of SA information provided by SAGAT, I will present a study that evaluates the use of a three-dimensional display for providing radar information to fighter pilots. The study was conducted in the same high-fidelity, real-time simulator as in the prior two experiments. During the study, each of six experienced pilot subjects flew a fighter sweep mission against a field of four digitally controlled enemy aircraft. Each subject completed five trials in each of five display conditions. The initial starting conditions and order of presentation of the display conditions were randomized. Each trial was halted to collect SAGAT data at a randomly selected time between 2.5 and 5 min into each trial. Immediately following the collection of SAGAT data, the Subjective Workload Assessment Technique (SWAT) was also administered. In this study, the trial was not resumed after the collection of SAGAT and SWAT data.

The five display conditions were constructed to evaluate the effect of a three-dimensional (3D) display concept. The 3D display provided a grid representing the ground, above which aircraft targets detected by the radar were presented by a small triangle pointing in the direction of the aircraft's heading and connected to the grid by a line corresponding to the aircraft's altitude. The plane the subject was flying was presented in the center of this display so that each target's location and altitude were spatially represented relative to the subject's ownship. Four rotations of the grid (0, 30, 45, and 60 deg) were investigated (each providing a different perspective on the scene), as shown in Figure 9. At 0-deg rotation, the grid portrayed a God's-eye view of the scene, thus depicting a traditional two-dimensional display. In addition, these four views were compared with a two-display configuration in which a God's-eye view display was supplemented with a second profile display that pictorially presented altitude. In all conditions, altitude of the aircraft was also presented numerically next to the aircraft symbols.

Following the trials, the collected SAGAT and SWAT data were analyzed. Although the final SAGAT battery for fighter aircraft has 40 queries (Endsley, 1990c), only 23 were administered at the time of the test (as the battery was still in development). The results of 4 pertinent queries are presented here. The subject's reported knowledge of the range, azimuth, altitude, and heading of the enemy targets was compared with the actual values of these elements at the time of the freeze in the simulation. The subject's perceptions were evaluated as correct or incorrect (using tolerance values that had been determined to be operationally relevant).
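Scoring against a tolerance band can be sketched as below. The paper does not list the tolerance values, so the numbers in the usage note are placeholders, and the function names are hypothetical. The wrap-around handling for headings and azimuths is an assumption about how angular queries would sensibly be scored, not a description of the original scoring software.

```python
def angular_error(reported_deg, actual_deg):
    """Smallest absolute difference between two compass angles, in degrees."""
    diff = abs(reported_deg - actual_deg) % 360.0
    return min(diff, 360.0 - diff)


def is_correct(reported, actual, tolerance, circular=False):
    """Score one response as correct when it lies within the tolerance band."""
    error = angular_error(reported, actual) if circular else abs(reported - actual)
    return error <= tolerance


def percent_correct(scores):
    """Percentage of True entries in a list of correct/incorrect scores."""
    return 100.0 * sum(scores) / len(scores)
```

For example, with a placeholder tolerance of 1,000 ft, `is_correct(10500, 10000, 1000)` scores an altitude report as correct, and with a placeholder tolerance of 15 deg, `is_correct(355, 5, 15, circular=True)` scores a heading report as correct because the two angles are only 10 deg apart.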

[Figure 9 image panels: (a) 2D display; (b) 3D display, 0° rotation; (c) 3D display, 30° rotation; (d) 3D display, 45° rotation; (e) 3D display, 60° rotation.]

Figure 9. Two-dimensional display and three-dimensional display at four levels of rotation.

The percentage correct across subjects for each condition on each of these four queries is shown in Figure 10. As the scored data are binomial, an arcsine transformation was applied and an ANOVA performed. The results showed that the display type had a significant effect on pilot SA of the aircraft's range, F(4,288) = 2.575, p < .05, azimuth, F(4,278) = 4.125, p < .05, and heading, F(4,139) = 3.040, p < .05, but not altitude, F(4,94) = 1.83, p > .05. The 3D display, when rotated at 45 or 60 deg, provided significantly lower SA on target range than when at 0 or 30 deg and significantly lower SA of target heading than of any of the other three conditions. A rotation of 60 deg also provided significantly lower SA of target azimuth than did the other four conditions. Although the ANOVA for target altitude was not significant, there was an opposite trend for the highly rotated 3D display to provide more SA of target altitude than did rotations of 0 and 30 deg. The SWAT data supported these findings, with significantly higher ratings for workload in the 45- and 60-deg rotations than in the 0- or 30-deg rotations or the two-dimensional display.
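The arcsine transformation mentioned above is a standard variance-stabilizing step for proportions before an ANOVA. A minimal sketch follows, with the caveat that the paper does not say which variant was applied; the form asin(sqrt(p)) is assumed here (some texts use twice that value), and the function names are hypothetical.

```python
import math


def arcsine_transform(proportion):
    """Variance-stabilizing transform for a proportion: asin(sqrt(p)), in radians."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must lie in [0, 1]")
    return math.asin(math.sqrt(proportion))


def transform_cells(correct_counts, totals):
    """Turn per-cell correct/total counts into transformed proportions for an ANOVA."""
    return [arcsine_transform(c / t) for c, t in zip(correct_counts, totals)]
```

The transformed values, rather than the raw percentages, would then be passed to whatever ANOVA routine is in use.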

[Figure 10 plot panels, each showing percentage correct (0 to 100) for the 2-D, 3-D 0°, 3-D 30°, 3-D 45°, and 3-D 60° conditions: (a) Knowledge of Target Range; (b) Knowledge of Target Heading; (c) Knowledge of Target Azimuth; (d) Knowledge of Target Altitude.]

Figure 10. Situation awareness of target range, heading, azimuth, and altitude in each display condition.

A major point to be gleaned from this example is the importance of examining the impact of a given design across an operator's SA requirements. The proposed 3D display was designed to provide pilots with better information about aircraft altitude, and it may have done so, although the trend was not significant. However, this came at the expense of SA on location, azimuth, and heading. If a study had been performed that merely examined subjects' knowledge of aircraft altitude, this important information would have been lost. The SAGAT data also provided important diagnostic information that was useful in understanding the exact effect the design concept had on pilot SA in order to make future iterations.

IMPLEMENTATION RECOMMENDATIONS

Several recommendations for SAGAT administration can be made based on previous experience in using the procedure.

Training

An explanation of the SAGAT procedures and detailed instructions for answering each query should be provided to subjects before testing. Several training trials should be conducted in which the simulator is halted frequently to allow subjects ample opportunity to practice responding to the SAGAT queries. Usually three to five samplings are adequate for a subject to become comfortable with the procedure and to clear up any uncertainties in how to answer the queries.

Test Design

SAGAT requires no special test considerations. The same principles of experimental design and administration apply to SAGAT as to any other dependent measure. Measures of subject performance and workload may be collected concurrently with SAGAT, as no ill effect from the insertion of breaks has been shown. To be cautious, however, if performance measures are to be collected simultaneously with SAGAT data, half of the trials should be conducted without any breaks for SAGAT in order to provide a check for this contingency.

Procedures

Subjects should be instructed to attend to their tasks as they normally would, with the SAGAT queries considered as secondary. No displays or other visual aids should be visible while subjects are answering the queries. If subjects do not know or are uncertain about the answer to a given query, they should be encouraged to make their best guess. There is no penalty for guessing, allowing for consideration of the default values or other wisdom gained from experience that subjects normally use in decision making. If subjects do not feel comfortable enough to make a guess, they may go on to the next question. Talking or sharing of information among subjects should not be permitted. If multiple subjects are involved in the same simulation, all subjects should be queried simultaneously and the simulation resumed for all subjects at the same time.

Which Queries to Use

Random selection. As it may be impossible to query subjects about all of their SA requirements in a given stop because of time constraints, a portion of the SA queries may be randomly selected and asked each time. A random sampling provides consistency and statistical validity, thus allowing SA scores to be easily compared across trials, subjects, systems, and scenarios.

Because of attentional narrowing or lack of information, certain questions may seem unimportant to a subject at the time of a given stop. It is important to stress that subjects should attempt to answer all queries anyway. This is because (a) even though they think it unimportant, the information may have at least secondary importance; (b) they may not be aware of information that makes a question very important (e.g., the location of a pop-up aircraft); and (c) if only questions of the highest priority were asked, subjects might be inadvertently provided with artificial cues about the situation that will direct their attention when the simulation is resumed. Therefore, a random selection from a constant set of queries is recommended at each stop.

Experimenter controlled. In certain tests it may be desirable to have some queries omitted because of limitations of the simulation or characteristics of the scenarios. For instance, if the simulation does not incorporate aircraft malfunctions, the query related to this issue may be omitted. In addition, with particular test designs it may be desirable to ensure that certain queries are presented every time. When this occurs, subjects should also be queried on a random sampling from all SA requirements and not only on those related to a specific area of interest to the evaluation being conducted. This is because of the ability of subjects to shift attention to the information on which they know they will be tested. What may appear to be an improvement in SA in one area may be merely a shift of attention from another area. When the SAGAT queries cover all of the SA requirements, no such artificial cuing can occur.
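The two selection rules in this section can be combined in one small routine. The sketch below is illustrative only: `select_queries` and its parameters are hypothetical names, and how many queries fit into a stop is left to the experimenter. Queries the experimenter wants at every stop are passed in `always`; the remaining slots are drawn at random from the rest of the constant battery, so the fixed queries are always accompanied by a random sample of the other SA requirements.

```python
import random


def select_queries(battery, n_random, rng, always=()):
    """Pick the queries for one stop from a constant battery.

    `always` lists queries presented at every stop; the other `n_random`
    queries are sampled without replacement from the rest of the battery.
    """
    pool = [q for q in battery if q not in always]
    if n_random > len(pool):
        raise ValueError("not enough queries left in the battery")
    chosen = list(always) + rng.sample(pool, n_random)
    rng.shuffle(chosen)  # randomize presentation order as well
    return chosen
```

Queries that the simulation cannot support (for example, a malfunction query when no malfunctions are modeled) would simply be left out of `battery` for the whole test.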

When to Collect SAGAT Data

It is recommended that the timing of each freeze for SAGAT administration be randomly determined and unpredictable so that subjects cannot prepare for queries in advance. If the freeze occurrence is covertly associated with the occurrence of specific events, or at specific times across trials, prior studies have shown that the subjects will be able to figure this out (Endsley, 1988), allowing them to prepare for queries or actually improve SA through the artificiality of the freeze cues. An informal rule has been to ensure that no freezes occur earlier than 3 min into a trial, to allow subjects to build up a picture of the situation, and that no two freezes occur within 1 min of each other.
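The informal timing rule above lends itself to a short sketch. The Python below draws freeze times by simple rejection sampling; it is one possible way to satisfy the rule, not a procedure taken from the studies. The function name `draw_freeze_times` and its parameters are hypothetical, and times are assumed to be measured in seconds of simulation time.

```python
import random


def draw_freeze_times(trial_length_s, n_freezes, earliest_s=180.0,
                      min_gap_s=60.0, rng=None, max_tries=10_000):
    """Draw random freeze times for one trial (hypothetical helper).

    Candidate times are drawn uniformly between earliest_s and
    trial_length_s; a draw is rejected if any two freezes are closer
    together than min_gap_s.
    """
    rng = rng or random.Random()
    for _ in range(max_tries):
        times = sorted(rng.uniform(earliest_s, trial_length_s)
                       for _ in range(n_freezes))
        # Accept only if every pair of consecutive freezes is far enough apart.
        if all(later - earlier >= min_gap_s
               for earlier, later in zip(times, times[1:])):
            return times
    raise ValueError("could not place the freezes; relax the constraints")
```

Drawing a fresh set of times for every trial keeps the freezes from being tied to a fixed clock time across trials, which is the predictability the text warns against.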

The result of this approach is that the activities occurring at the time of the stops will be randomly selected. Some stops may occur during very important activities that are of interest to the experimenter, others when no critical activities are occurring. This gives a good sampling of the subjects' SA in a variety of situations. During analysis, the experimenter may want to stratify the data to take these variations into account.

How Much SAGAT Data to Collect

The number of trials necessary will depend on the variability present in the dependent variables being collected and the number of data samples taken during a trial. This will vary with different subjects and designs, but between 30 and 60 samplings per SA query with each design option have previously been adequate in a within-subjects test design.

Multiple SAGAT stops may be taken within each trial. There is no known limit to the number of times the simulator can be frozen during a given trial. In Experiment 2, no ill effects were found from as many as three stops during a 15-min trial. In general, it is recommended that a stop last until a certain amount of time has elapsed, and then the trial is resumed, regardless of how many questions have been answered. Stops as long as 2 min in duration were used with no undue difficulty or effect on subsequent performance. In Experiment 1, stops as long as 5 min were shown to allow subjects access to SA information without memory decay.

Data Collection

The simulator computer should be programmed to collect objective data corresponding to the queries at the time of each freeze. Considering that some queries will pertain to higher-level SA requirements that may be unavailable in the computer, an expert judgment of the correct answer may be made by an experienced observer who is privy to all information, reflecting the SA of a person with perfect knowledge. A comparison of subjects' perceptions of the situation (as input into SAGAT) with the actual status of each variable (as collected per the simulator computer and expert judgment) results in an objective measure of subject SA. Questions asked of the subject but not answered should be considered incorrect. No evaluation should be made of questions not asked during a given stop.

It is recommended that answers to each query be scored as correct or incorrect based on whether or not they fall into an acceptable tolerance band around the actual value. For example, it may be acceptable for a subject to be 10 miles per hour off of actual ground speed. This method of scoring poses less difficulty than dealing with absolute error (see Marshak et al., 1987). A tabulation of the frequency of correctness can then be made within each test condition for each SA element. Because data scored as correct or incorrect are binomial, the conditions for analysis of variance are violated. A correction factor, Y' = arcsine(Y), can be applied to binomial data, which allows analysis of variance to be used. In addition, a chi-square, Cochran's Q, or binomial t test (depending on the test design) can be used to evaluate the statistical significance of differences in SA between test conditions.
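The scoring rules above can be illustrated with a short Python sketch. It is a minimal example under stated assumptions rather than the scoring code used in the studies: the function names are hypothetical, an unanswered query is represented by `None`, and the transform shown is the common arcsine-square-root form applied to a proportion correct, which is an assumption about how the arcsine correction named in the text would be applied in practice.

```python
import math


def score_response(answer, actual, tolerance):
    """Score one asked query as correct (1) or incorrect (0).

    A query that was asked but not answered (answer is None) counts as
    incorrect. Queries that were never asked should simply not be passed
    in, since no evaluation is made of them.
    """
    if answer is None:
        return 0
    return 1 if abs(answer - actual) <= tolerance else 0


def proportion_correct(scores):
    """Proportion of correct responses among the queries that were asked."""
    return sum(scores) / len(scores)


def arcsine_transform(p):
    """Arcsine-square-root transform of a proportion p in [0, 1].

    Assumption: this is the usual variance-stabilizing form; the text
    itself names only an arcsine correction.
    """
    return math.asin(math.sqrt(p))
```

With a tolerance of 10 mph, a reported ground speed of 455 against an actual value of 450 would score as correct, whereas 470 would not; the transformed proportions, not the raw 0/1 scores, would then be entered into the analysis of variance.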

Limitations and Applicability for Use

SAGAT has primarily been used within the confines of high-fidelity and medium-fidelity part-task simulations. This provides experimenter control over freezes and data collection without any danger to the subject or processes involved in the domain. It may be possible to use the technique during actual task performance if multiple operators are present to ensure safety. For example, it might be possible to verbally query one pilot in flight while another assumes flight control. Such an endeavor should be undertaken with extreme caution, however, and may not be appropriate for certain domains.

A recent effort (Sheehy, Davey, Fiegel, and Guo, 1993) employed an adaptation of this technique by making videotapes of an ongoing situation in a nuclear power plant control room. These tapes were then replayed to naive subjects with freezes for SAGAT queries employed. It is not known how different the SA of subjects passively viewing a situation may be from subjects actually engaged in task performance; however, this approach may yield some useful data.

Most known uses of SAGAT have involved fighter aircraft simulations. In general, it can be employed in any domain in which a reasonable simulation of task performance exists and an analysis of SA requirements has been made in order to develop the queries. SAGAT queries have been developed for advanced bomber aircraft (Endsley, 1990a) and recently for en route air traffic control (Endsley and Rodgers, 1994). Several researchers have begun to use the technique in evaluating operator SA in nuclear control room studies (Hogg, Torralba, and Volden, 1993; Sheehy et al., 1993). The technique has also been employed in a simulated control task to study adaptive automation (Carmody and Gluckman, 1993). Several new efforts are currently under way to employ the technique in medical decision making (E. Swirsky, personal communication, March 1994) and helicopters (C. Prince, personal communication, November 1993). Potentially, it could also be used in studies involving automobile driving, supervisory control of manufacturing systems, teleoperations, and operation of other types of dynamic systems.

CONCLUSION

The use of SA metrics in evaluating the efficiency of design concepts, training programs, and other potential interventions is critical. Without careful evaluation, it will be difficult to make much progress in understanding the factors that influence SA, hindering human factors and engineering efforts. Several techniques have been reviewed here, each with several advantages and disadvantages. The use of one technique, SAGAT, was explored extensively, and two studies were presented that investigated the validity of the measure. It has to date proven a useful and viable technique for measuring SA in numerous contexts. Rigorous validity testing is needed for other techniques proposed for SA measurement. Unless the veracity of these techniques is established, interpreting their results will be hazardous.

In addition to establishing validity of measures of SA, more research is needed regarding the task of SA measurement itself. For instance, as pointed out by Pew (1991), no criteria exist at this time that establish the level of SA required for successful performance. Does an operator need to have SA that is 100% perfect (in both completeness and accuracy), or is some lesser amount sufficient for good performance? This is a complex issue. I would assert that SA can be seen as a factor that increases the probability of good performance but that it does not necessarily guarantee it, as other factors also come into effect (e.g., decision making, workload, performance execution, system capabilities, and SA of others in some cases). How much SA one needs, therefore, becomes a matter of how much probability of error one is willing to accept. Perhaps such criteria should more properly and usefully be established as the level of SA needed on each subcomponent at different periods in time. Some guidelines will eventually need to be specified if there is a desire to certify new system designs on the basis of SA.

Overall, the beginnings of the field of SA measurement have been established, allowing researchers in the area of SA to proceed from mainly speculation and anecdotal information to solid empirical findings. The tools for conducting further basic research on the SA construct and developing better system designs are available, allowing needed research on SA in a variety of arenas.

ACKNOWLEDGMENTS

This research was sponsored by the Northrop Corporation. I thank the many people at Northrop, both past and present, who contributed to this effort. In particular, I thank Vic Vizcarra for his assistance in conducting the studies, Paul Cayley and Gerry Armstrong for their programming efforts, and Mike Majors and Del Jacobs for their support in developing an SA research program. In addition, I am indebted to Mark Chignell, who provided helpful support throughout the development of the methodology. I also thank Mark Rodgers, Tom English, Dick Gilson, and several anonymous reviewers for their comments on an earlier version of this paper.


REFERENCES

Bolstad, C. A., and Endsley, M. R. (1990). Single versus dual scale range display investigation (NOR DOC 90-90). Hawthorne, CA: Northrop Corporation.

Carmody, M. A., and Gluckman, J. P. (1993). Task-specific effects of automation and automation failure on performance, workload, and situational awareness. In R. S. Jensen and D. Neumeister (Eds.), Proceedings of the Seventh International Symposium on Aviation Psychology (pp. 167-171). Columbus, OH: Ohio State University, Department of Aviation.

Chase, W. G., and Simon, H. A. (1973). Perceptions in chess. Cognitive Psychology, 4, 55-81.

Cowan, N. (1988). Evolving conceptions of memory storage, selective attention, and their mutual constraints within the human information-processing system. Psychological Bulletin, 104, 163-191.

Endsley, M. R. (1987). SAGAT: A methodology for the measurement of situation awareness (NOR DOC 87-83). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1988). Situation awareness global assessment technique (SAGAT). In Proceedings of the National Aerospace and Electronics Conference (NAECON) (pp. 789-795). New York: IEEE.

Endsley, M. R. (1989a). Final report: Situation awareness in an advanced strategic mission (NOR DOC 89-32). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1989b). Tactical simulation 3 test report: Addendum I, situation awareness evaluations (81203033R). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1990a). A methodology for the objective measurement of situation awareness. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 1/1-1/9). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Endsley, M. R. (1990b). Predictive utility of an objective measure of situation awareness. In Proceedings of the Human Factors Society 34th Annual Meeting (pp. 41-45). Santa Monica, CA: Human Factors and Ergonomics Society.

Endsley, M. R. (1990c). Situation awareness global assessment technique (SAGAT): Air-to-air tactical version user guide (NOR DOC 89-58, rev. A). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1990d). Situation awareness in dynamic human decision making: Theory and measurement. Unpublished doctoral dissertation, University of Southern California, Los Angeles, CA.

Endsley, M. R., and Rodgers, M. D. (1994). Situation awareness global assessment technique (SAGAT): En route air traffic control version user's guide (Draft). Lubbock: Texas Tech University.

Fracker, M. L. (1990). Attention gradients in situation awareness. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 6/1-6/10). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Herrmann, D. J. (1984). Questionnaires about memory. In J. E. Harris and P. E. Morris (Eds.), Everyday memory, action, and absent-mindedness (pp. 133-151). London: Academic.

Hogg, D. N., Torralba, B., and Volden, F. S. (1993). A situation awareness methodology for the evaluation of process control systems: Studies of feasibility and the implication of use (1993-03-05). Storefjell, Norway: OECD Halden Reactor Project.


Hughes, E. R., Hassoun, J. A., and Ward, F. (1990). Advanced tactical fighter (ATF) simulation support: Final report (CSEF-TR-ATF-90-91). Wright-Patterson Air Force Base, OH: Air Force Systems Command, Aeronautical Systems Division.

Jacoby, L. L., and Dallas, M. (1981). On the relationship between autobiographical memory and perceptual learning. Journal of Experimental Psychology: General, 110, 306-340.

Kellog, R. T. (1980). Is conscious attention necessary for long-term storage? Journal of Experimental Psychology: Human Learning and Memory, 6, 379-390.

Manktelow, K., and Jones, J. (1987). Principles from the psychology of thinking and mental models. In M. M. Gardiner and B. Christie (Eds.), Applying cognitive psychology to user-interface design (pp. 83-117). Chichester, England: Wiley.

Marshak, W. P., Kuperman, G., Ramsey, E. G., and Wilson, D. (1987). Situational awareness in map displays. In Proceedings of the Human Factors Society 31st Annual Meeting (pp. 533-535). Santa Monica, CA: Human Factors and Ergonomics Society.

McDonnell Douglas Aircraft Corporation. (1982). Advanced medium-range air-to-air missile operational utility evaluation (AMRAAM OUE) test final report. St. Louis, MO: Author.

Mogford, R. H., and Tansley, B. W. (1991). The importance of the air traffic controller's mental model. Presented at the Human Factors Society of Canada Annual Meeting, Canada.

Morton, J. (1969). Interaction of information in word recognition. Psychological Review, 76, 165-178.

Nisbett, R. E., and Wilson, T. D. (1977). Telling more than we can know: Verbal reports on mental processes. Psychological Review, 84, 231-259.

Norman, D. A. (1968). Towards a theory of memory and attention. Psychological Review, 75, 522-536.

Northrop Corporation. (1988). Tactical simulation 2 test report: Addendum I, situation awareness test results. Hawthorne, CA: Author.

Pew, R. W. (1991). Defining and measuring situation awareness in the commercial aircraft cockpit. In Proceedings of the Conference on Challenges in Aviation Human Factors: The National Plan (pp. 30-31). Washington, DC: American Institute of Aeronautics and Astronautics.

Sarter, N. B., and Woods, D. D. (1991). Situation awareness: A critical but ill-defined phenomenon. International Journal of Aviation Psychology, 1, 45-57.

Selcon, S. J., and Taylor, R. M. (1990). Evaluation of the situational awareness rating technique (SART) as a tool for aircrew systems design. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 5/1-5/8). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Sheehy, E. J., Davey, E. C., Fiegel, T. T., and Guo, K. Q. (1993, April). Usability benchmark for CANDU annunciation-lessons learned. Presented at the ANS Topical Meeting on Nuclear Plant Instrumentation, Control, and Man-Machine Interface Technology, Oak Ridge, TN.

Smolensky, M. W. (1993). Toward the physiological measurement of situation awareness: The case for eye movement measurements. In Proceedings of the Human Factors and Ergonomics Society 37th Annual Meeting (p. 41). Santa Monica, CA: Human Factors and Ergonomics Society.

Taylor, R. M. (1990). Situational awareness rating technique (SART): The development of a tool for aircrew systems design. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 3/1-3/17). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Tulving, E. (1985). How many memory systems are there? American Psychologist, 40, 385-398.

Venturino, M., Hamilton, W. L., and Dvorchak, S. R. (1990). Performance-based measures of merit for tactical situation awareness. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 4/1-4/5). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Vidulich, M. A. (1989). The use of judgment matrices in subjective workload assessment: The subjective workload dominance (SWORD) technique. In Proceedings of the Human Factors Society 33rd Annual Meeting (pp. 1406-1410). Santa Monica, CA: Human Factors and Ergonomics Society.

Date received: January 26, 1993. Date accepted: October 4, 1994.



attention or automatic activation). This type of model would predict that SA information that has been perceived and/or further processed by the operator would exist in LTM stores and thus be available for recall during SAGAT querying that exceeds 30 s.

Second, data may be processed in a highly automated fashion and thus not be in the subject's awareness. Expert behavior can function in an automated processing-action sequence in some cases. Several authors have found that even when effortful processing is not used, however, the information is retained in LTM and is capable of affecting subject responses (Jacoby and Dallas, 1981; Kellog, 1980; Tulving, 1985). The type of questions used in SAGAT (providing cued-recall and categorical or scalar responses) should be receptive to retrieval of this type of information. In addition, a review of the literature indicates that even under these circumstances, the product of the automatic processes (as a state of knowledge) is available, even though the processes themselves may be difficult to verbalize (Endsley, 1995, this issue).

Third, the information may be in LTM but not easily recalled by the subjects. Evidence suggests that when effortful processing and awareness are used during the storage process, recall is enhanced (Cowan, 1988). SA composed of highly relevant, attended to, and processed information should be most receptive to recall. In addition, the SAGAT battery, which requires categorical or scalar responses, is a cued, as opposed to total, recall task, thus aiding retrieval. Under conditions of SAGAT testing, subjects are aware that they may be asked to report their SA at any time. This, too, may aid in the storage and retrieval process. Because the SAGAT battery is administered immediately after the freeze in the simulation, no time for memory decay or competing event interference is allowed. Thus conditions should be optimized for the retrieval of SA information. Although it cannot be said conclusively that all of a subject's SA can be reflected in this manner, the vast majority should be reportable via SAGAT.

To further investigate this matter, I conducted a study to specifically determine how long after a freeze in the simulation SA information could be obtained. I expected that if collection of SA data via this technique was memory limited, this would be evidenced by an increase in errors on SAGAT queries that occurred approximately 30 to 60 s after the freeze time because of short-term memory restrictions. (Although this would not preclude use of the technique, it would limit its use by restricting the number of questions that could be asked at each stop.)

Procedure

A set of air-to-air engagements was conducted in a high-fidelity aircraft simulation facility. A fighter sweep mission with a force ratio of two (blue team) versus four (red team) was used for the trials. The objective of the blue team was to penetrate red territory, maximizing the kills of red fighters while maintaining a high degree of survivability. The red team was directed to fly around their assigned combat air patrol (CAP) points until a blue target was detected in red airspace. The red team was then allowed to leave its CAP point to defend against the blue team. In all cases, specific tactics were at the discretion of the individual pilot teams.

A total of 15 trials were completed by two teams of six subjects. At a random point in each trial, the simulator was frozen and SAGAT data were immediately collected from all six participants. At a given stop, all of the queries were presented once in random order. As this order was different at each stop, each query was presented a variety of times after the stop across subjects and trials. Therefore, performance accuracy on each question could be evaluated as a function of the amount of time elapsed prior to its presentation. After all subjects had completed the SAGAT battery at a given stop, a new trial was begun.

Prior to conducting the study, all subjects were trained on the use of the simulator, the displays, aircraft handling qualities, and SAGAT. In addition to three instructional training sessions on using SAGAT, each subject participated in 18 practice trials in which SAGAT was administered. (Most subjects also had received a substantial amount of training in the simulator in the past.) Thus subjects were well trained prior to testing.

Facilities. A high-fidelity, real-time simulation facility was used for testing. Aircraft control systems, avionics systems, weapons systems, crew stations, and air vehicle performance were modeled to provide realistic aircraft control characteristics. A Gould mainframe computer controlled the simulations and drove Silicon Graphics-generated high-resolution color graphics displays. This test used six pilot stations, each including a simulated head-up display, a tactical situation display, radar, and system controls operated by a touch screen or stick and throttle control switches. A realistic stick and throttle provided primary flight control.

Subjects. Twelve experienced former military fighter pilots participated in the test. The mean subject age was 48.16 years (range of 32 to 68). Subjects had an average of 3310 h (range 1500-6500) and an average of 15.5 years (range 7-26) of military flight experience. Of the 12 subjects, 7 had combat experience.

Results

Each of the subjects' answers to the SAGAT queries was compared with actual values as collected by the simulator computer at the time of each stop. Included in the SAGAT battery at the time of the test were 26 queries. Of those, 11 could not be evaluated because the appropriate simulator data were not available, and 5 could be evaluated only by subjective means. The 10 remaining query answers concerned ownship heading, ownship location, aircraft heading, aircraft detections, aircraft airspeed, aircraft weapon selection, aircraft Gs, aircraft fuel level, aircraft weapon quantity, and aircraft altitude.

An error score (SAGAT query answer minus actual value) was computed for each response. Absolute error scores for each query (across all subjects and trials) were plotted against the amount of time elapsed between the stop in the simulation and the query, and a regression was calculated. F tests for the regressions and Pearson's r for eight of the queries are presented in Table 1. Two of the queries provided a categorical response; therefore, chi-square tests were performed and are presented in Table 2.

None of the regressions or chi-square tests computed for each of the 10 queries was significant (or even close to significant), indicating that subjects were neither more nor less prone to response error as the amount of time between the simulator freeze and the presentation of the query increased. A plot of the regression for altitude error, shown in Figure 2, reveals little or no increase in error over time. (Plots of the regressions for the other variables appeared quite similar.)
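The per-query analysis described above (absolute error regressed on the time elapsed since the freeze) can be sketched with ordinary least squares in plain Python. This is an illustrative reconstruction, not the analysis code used in the study: the function name `regress_error_on_delay` and the returned keys are hypothetical, and the numbers in the usage note are made up.

```python
def regress_error_on_delay(delays_s, abs_errors):
    """Simple linear regression of absolute error on elapsed time.

    Returns the slope, intercept, r squared, the F statistic for the
    regression, and its degrees of freedom (1, n - 2).
    """
    n = len(delays_s)
    mean_x = sum(delays_s) / n
    mean_y = sum(abs_errors) / n

    # Sums of squares and cross products about the means.
    sxx = sum((x - mean_x) ** 2 for x in delays_s)
    syy = sum((y - mean_y) ** 2 for y in abs_errors)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(delays_s, abs_errors))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    ss_regression = slope * sxy          # variation explained by the line
    ss_residual = syy - ss_regression    # variation left unexplained

    r_squared = ss_regression / syy if syy else 0.0
    if ss_residual > 0:
        f_stat = ss_regression / (ss_residual / (n - 2))
    else:
        f_stat = float("inf")            # perfect fit, no residual variance

    return {"slope": slope, "intercept": intercept,
            "r2": r_squared, "F": f_stat, "df": (1, n - 2)}
```

A call such as `regress_error_on_delay([0, 60, 120, 180, 240], [100, 300, 200, 300, 100])` uses invented values in which error does not trend with delay; a slope and F statistic near zero correspond to the flat regressions reported here.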

Discussion

Based on these data, it would appear that subjects under these conditions were able to provide information on their SA about a particular situation for up to 5 or 6 min. The fact that all 10 of the queries produced flat regressions lends extra weight to this conclusion.

Two explanations can be offered for these findings. First, this study investigated subjects who were expert in particular tasks during a realistic simulation of those tasks. The information subjects were asked to report was important to performance of those tasks. Most laboratory studies that predict fairly rapid decay times (approximately 30 s for short-term memory) typically employ the use of stimuli that have little or no inherent meaning to the subject (nonsense words or pictures). The storage and utilization of relevant information may be quite different from that of irrelevant information (Chase and Simon, 1973).

Second, the results indicate that the SA information was obtainable from long-term memory stores. If schemata or other mechanisms are used to organize SA information (as opposed to working memory processes only), then that information will be resident in long-term memory. Many of the 10 items analyzed can be considered Level 1 SA components. The fact that this lower-level information was resident in LTM indicates that either the inputs to higher-level processing were retained as well as the outputs, or the Level 1 components were retained as important pieces of pilot SA and are significant components of LTM schemata (e.g., target altitude itself is important to know and not just its implications). Both of these explanations may be correct. These findings generally support the predictions of a processing model in which information passes into LTM storage before being highlighted in STM.

As a caveat, note that subjects were actively working with their SA knowledge by answering the SAGAT queries for the entire period that the simulation was stopped. No intervening period of waiting or any competing activity was introduced prior to administering any SAGAT query. Subject knowledge of SA information may be interfered with if time delays or other activities (particularly continued operational tasks) are imposed before SAGAT is administered. The major implication of these results is that under these conditions, SA data are readily obtainable through SAGAT for a considerable period after a stop in the simulation (up to 5 or 6 min).

TABLE 1

Tests on Regressions of Time of Query Presentation on Query Accuracy

Query              df        F        p        r²
Own heading        1, 19     0.021    0.887    0.001
Own location       1, 19     0.940    0.334    0.047
Aircraft heading   1, 95     0.040    0.834    0.000
G level            1, 90     0.500    0.480    0.006
Fuel level         1, 95     1.160    0.284    0.012
Weapon quantity    1, 92     0.001    0.981    0.000
Altitude           1, 96     0.002    0.965    0.000
Detection          1, 524    0.041    0.840    0.000

TABLE 2

Chi-Square Tests of Time Category of Query Presentation on Query Accuracy

Query              df            χ²      p
Weapon selection   5 (N = 97)    0.16    > 0.995
Airspeed           5 (N = 98)    0.13    > 0.995

Figure 2. Altitude error by time until query presentation (x-axis: time in seconds).

EXPERIMENT 2

A second study was initiated to address the issue of possible intrusiveness. It is necessary to determine whether temporarily freezing the simulation results in any change in subject behavior. If subject behavior were altered by such a freeze, certain limitations would be indicated. (One might wish to not resume the trial after the freeze, for instance, but start a new trial instead.) The possibility of intrusiveness was tested by evaluating the effect of stopping the simulator on subsequent subject performance.

Procedure

A set of air-to-air engagements was conducted using a fighter sweep mission with a force ratio of two (blue team) versus four (red team). The training, instructions, and pilot mission objectives were identical to those used in Experiment 1. In this study, however, the trial was resumed after a freeze following a specified period for collecting SAGAT data and was continued until specified criteria for completion of the mission were met. Subjects completed as many queries as they could during each stop. The queries were presented in random order.

Five teams of six subjects completed a full test matrix. The independent variables were duration of the stops (1/2, 1, or 2 min) and the frequency of stops (1, 2, or 3 times during the trial). Each team participated twice in each of these nine conditions. (In any given trial, multiple stops were of the same duration.) Each team also completed six trials in which no stops occurred, as a control condition. Therefore, a total of 30 trials were conducted for each duration of stop condition, and a total of 30 trials were conducted for each frequency of stop condition. These conditions were compared with 30 trials in which no stops occurred. Conditions were administered in random order, blocked by team. Pilot performance was collected as the dependent measure.

Facilities. This study used the same piloted, multiple-engagement simulator as in Experiment 1.

Subjects. Twenty-five experienced former military fighter pilots participated in this test. (Four of the subjects participated on more than one team.) The mean subject age was 45.16 years (range of 32 to 68). Subjects had an average of 3582 h (range of 975 to 7045) and an average of 16.9 years (range of 6 to 27) of military flight experience. Of the 25 subjects, 14 had combat experience.

Results

Pilot performance under each of the conditions was analyzed. Pilot performance measures included the number of blue team kills (red team losses) and the number of blue team losses (red team kills). Chi-square tests were performed on the number of blue team kills, χ²(4, N = 120) = 5.973, p > 0.05, and blue team losses, χ²(2, N = 120) = 0.05, p > 0.05, comparing between trials in which there were stops to collect SAGAT data and those in which there were no stops, as depicted in Figures 3 and 4. There was no significant difference in pilot performance on either measure.

Figure 3. Blue kills: Trials with stops versus trials without stops (x-axis: number of kills).

Figure 4. Blue losses: Trials with stops versus trials without stops (x-axis: number of losses).

Analysis of variance was used to evaluate the effect of number of stops and duration of stops on each of the two performance measures: blue team kills and blue team losses. The number of stops during the trial had no significant effect on either pilot performance measure, F(3,116) = 1.73, p > 0.05, β = 0.55, and F(3,116) = 0.20, p > 0.05, β > 0.60, respectively, as shown in Figures 5 and 6. The duration of the stop also did not significantly affect either performance measure, F(3,116) = 2.16, p > 0.05, β = 0.40, and F(3,116) = 0.77, p > 0.05, β > 0.60, respectively, as depicted in Figures 7 and 8. In viewing the data, no linear trend appears to be present in either case that would indicate a progressively worse (albeit nonsignificant) effect of increasing number or duration of stops. This indicates that stops to collect SAGAT data (as many as 3 for up to 2 min in duration) did not have a significant effect on subject performance.

Figure 5. Blue kills by number of stops in trial (x-axis: number of stops, 0 to 3).

Figure 6. Blue losses by number of stops in trial (x-axis: number of stops, 0 to 3).

Figure 8. Blue losses by duration of stops in trial (x-axis: duration of stops in seconds: 0, 30, 60, 120).
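The comparison of trials with and without stops reported above is a chi-square test on a contingency table of outcome counts. A minimal Python sketch of that statistic follows. The function name `chi_square_statistic` is hypothetical, and any counts passed to it in an example are invented for illustration, because the per-category counts behind the reported tests are not given in the text.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a table of observed counts.

    table is a list of rows (for example, one row for trials with stops
    and one for trials without), each row listing the number of trials
    in each outcome category (for example 0, 1, or 2 losses).
    Returns the statistic and its degrees of freedom.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the outcome were independent of the condition.
            expected = row_totals[i] * col_totals[j] / total
            if expected == 0:
                raise ValueError("empty row or column in the table")
            statistic += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, df
```

With two conditions and three outcome categories the function returns 2 degrees of freedom, matching the form of the blue-losses test; the statistic would then be compared against a chi-square critical value (5.99 at the 0.05 level for 2 degrees of freedom).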

Discussion

The lack of a significant influence of this procedure on performance probably rests on the fact that relevant schemata are actively used by subjects during the entire freeze period. Under these conditions, the subjects' SA does not have a chance to decay before the simulation is resumed (as was indicated by Experiment 1). Thus SA is fairly intact when the simulation continues, allowing subjects to proceed with their tasks where they left off. Subjectively, subjects did fairly well with this procedure and were able to readily pick up the simulation at the point at which they left off at the time of the freeze, sometimes with the same sentence they had started before the stop. On many occasions, subjects could not even remember if they had been stopped to collect SA data during the trial, also indicating a certain lack of intrusiveness.

Figure 7. Blue kills by duration of stops in trial (x-axis: duration of stops in seconds: 0, 30, 60, 120).

Summary

The use of a temporary freeze in the simulation to collect SA data is supported by Experiments 1 and 2. Subjects were able to report their SA using this technique for as long as 5 or 6 min without apparent memory decay, and the freeze did not appear to be intrusive on subject performance in the simulation, allaying several concerns about the technique. Although it is always difficult to establish no effect of one variable on another (i.e., prove the null hypothesis), it is reassuring that the finding of no intrusion on performance has been repeated in numerous other studies in which SAGAT was used to evaluate human-machine interface concepts (Bolstad and Endsley, 1990; Endsley, 1989b; Northrop, 1988) if the timing of the stop was unpredictable to the subjects (Endsley, 1988). This finding was also repeated in one study in which pilots flew in a simulation against computer-controlled enemy aircraft (Endsley, 1990d), helping to rule out the possibility that the finding of no effect on performance in the present study could have been the result of pilots on both sides of the simulation being equally influenced.

SAGAT has thus far proven to meet the stated criteria for measurement. In addition to the present studies establishing a level of empirical validity, SAGAT has been shown to have predictive validity (Endsley, 1990b) and content validity (Endsley, 1990d). The method is not without some costs, however, as a detailed analysis of SA requirements is required in order to develop the battery of queries to be administered. On the positive side, this analysis can also be extremely useful for guiding design efforts.

EXAMPLE

To proffer an example of the type of SA information provided by SAGAT, I will present a study that evaluates the use of a three-dimensional display for providing radar information to fighter pilots. The study was conducted in the same high-fidelity, real-time simulator as in the prior two experiments. During the study, each of six experienced pilot subjects flew a fighter sweep mission against a field of four digitally controlled enemy aircraft. Each subject completed five trials in each of five display conditions. The initial starting conditions and order of presentation of the display conditions were randomized. Each trial was halted to collect SAGAT data at a randomly selected time between 2.5 and 5 min into each trial. Immediately following the collection of SAGAT data, the Subjective Workload Assessment Technique (SWAT) was also administered. In this study, the trial was not resumed after the collection of SAGAT and SWAT data.

The five display conditions were constructed to evaluate the effect of a three-dimensional (3D) display concept. The 3D display provided a grid representing the ground, above which aircraft targets detected by the radar were presented by a small triangle pointing in the direction of the aircraft's heading and connected to the grid by a line corresponding to the aircraft's altitude. The plane the subject was flying was presented in the center of this display so that each target's location and altitude were spatially represented relative to the subject's ownship. Four rotations of the grid (0, 30, 45, and 60 deg) were investigated (each providing a different perspective on the scene), as shown in Figure 9. At 0-deg rotation, the grid portrayed a God's-eye view of the scene, thus depicting a traditional two-dimensional display. In addition, these four views were compared with a two-display configuration in which a God's-eye view display was supplemented with a second profile display that pictorially presented altitude. In all conditions, altitude of the aircraft was also presented numerically next to the aircraft symbols.

Following the trials, the collected SAGAT and SWAT data were analyzed. Although the final SAGAT battery for fighter aircraft has 40 queries (Endsley, 1990c), only 23 were administered at the time of the test (as the battery was still in development). The results of 4 pertinent queries are presented here. The subjects' reported knowledge of the range, azimuth, altitude, and heading of the enemy targets was compared with the actual values of these elements at the time of the freeze in the simulation. The subjects' perceptions were evaluated as correct or incorrect (using tolerance values that had been determined to be operationally relevant).

The percentage correct across subjects for each condition on each of these four queries is shown in Figure 10. As the scored data are binomial, an arcsine transformation was applied and an ANOVA performed. The results showed that the display type had a significant effect on pilot SA of the aircraft's range, F(4, 288) = 2.575, p < 0.05; azimuth, F(4, 278) = 4.125, p < 0.05; and heading, F(4, 139) = 3.040, p < 0.05; but not altitude, F(4, 94) = 1.83, p > 0.05. The 3D display, when rotated at 45 or 60 deg, provided significantly lower SA on target range than when at 0 or 30 deg and significantly lower SA of target heading than of any of the other three conditions. A rotation of 60 deg also provided significantly lower SA of target azimuth than did the other four conditions. Although the ANOVA for target altitude was not significant, there was an opposite trend for the highly rotated 3D display to provide more SA of target altitude than did rotations of 0 and 30 deg. The SWAT data supported these findings, with significantly higher ratings for workload in the 45- and 60-deg rotations than in the 0- or 30-deg rotations or the two-dimensional display.

Figure 9. Two-dimensional display and three-dimensional display at four levels of rotation: (a) 2D display; (b) 3D display, 0-deg rotation; (c) 3D display, 30-deg rotation; (d) 3D display, 45-deg rotation; (e) 3D display, 60-deg rotation.

A major point to be gleaned from this example is the importance of examining the impact of a given design across an operator's SA requirements. The proposed 3D display was designed to provide pilots with better information about aircraft altitude, and it may have done so, although the trend was not significant. However, this came at the expense of SA on location, azimuth, and heading. If a study had been performed that merely examined subjects' knowledge of aircraft altitude, this important information would have been lost. The SAGAT data also provided important diagnostic information that was useful in understanding the exact effect the design concept had on pilot SA in order to make future iterations.

Figure 10. Situation awareness of target range, heading, azimuth, and altitude in each display condition (percentage correct for the 2-D display and the 3-D display at 0, 30, 45, and 60 deg): (a) knowledge of target range; (b) knowledge of target heading; (c) knowledge of target azimuth; (d) knowledge of target altitude.

IMPLEMENTATION RECOMMENDATIONS

Several recommendations for SAGAT administration can be made based on previous experience in using the procedure.

Training

An explanation of the SAGAT procedures and detailed instructions for answering each query should be provided to subjects before testing. Several training trials should be conducted in which the simulator is halted frequently to allow subjects ample opportunity to practice responding to the SAGAT queries. Usually three to five samplings are adequate for a subject to become comfortable with the procedure and to clear up any uncertainties in how to answer the queries.

Test Design

SAGAT requires no special test considerations. The same principles of experimental design and administration apply to SAGAT as to any other dependent measure. Measures of subject performance and workload may be collected concurrently with SAGAT, as no ill effect from the insertion of breaks has been shown. To be cautious, however, if performance measures are to be collected simultaneously with SAGAT data, half of the trials should be conducted without any breaks for SAGAT in order to provide a check for this contingency.

Procedures

Subjects should be instructed to attend to their tasks as they normally would, with the SAGAT queries considered as secondary. No displays or other visual aids should be visible while subjects are answering the queries. If subjects do not know or are uncertain about the answer to a given query, they should be encouraged to make their best guess. There is no penalty for guessing, allowing for consideration of the default values or other wisdom gained from experience that subjects normally use in decision making. If subjects do not feel comfortable enough to make a guess, they may go on to the next question. Talking or sharing of information among subjects should not be permitted. If multiple subjects are involved in the same simulation, all subjects should be queried simultaneously and the simulation resumed for all subjects at the same time.

Which Queries to Use

Random selection. As it may be impossible to query subjects about all of their SA requirements in a given stop because of time constraints, a portion of the SA queries may be randomly selected and asked each time. A random sampling provides consistency and statistical validity, thus allowing SA scores to be easily compared across trials, subjects, systems, and scenarios.

Because of attentional narrowing or lack of information, certain questions may seem unimportant to a subject at the time of a given stop. It is important to stress that subjects should attempt to answer all queries anyway. This is because (a) even though they think it unimportant, the information may have at least secondary importance; (b) they may not be aware of information that makes a question very important (e.g., the location of a pop-up aircraft); and (c) if only questions of the highest priority were asked, subjects might be inadvertently provided with artificial cues about the situation that will direct their attention when the simulation is resumed. Therefore, a random selection from a constant set of queries is recommended at each stop.

Experimenter controlled. In certain tests it may be desirable to have some queries omitted because of limitations of the simulation or characteristics of the scenarios. For instance, if the simulation does not incorporate aircraft malfunctions, the query related to this issue may be omitted. In addition, with particular test designs it may be desirable to ensure that certain queries are presented every time. When this occurs, subjects should also be queried on a random sampling from all SA requirements and not only on those related to a specific area of interest to the evaluation being conducted. This is because of the ability of subjects to shift attention to the information on which they know they will be tested. What may appear to be an improvement in SA in one area may be merely a shift of attention from another area. When the SAGAT queries cover all of the SA requirements, no such artificial cuing can occur.
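The selection rules in this subsection (a random sample from a constant query set, optional omissions, and optional always-asked queries that are still accompanied by a random sample from the remaining requirements) can be illustrated with a short Python sketch. This is only one possible reading of the recommendations; the function name, its parameters, and the query labels in the usage lines are hypothetical and are not part of SAGAT.

```python
import random


def select_queries(all_queries, n_per_stop, always_ask=(), omitted=(), rng=None):
    """Pick the queries to present at one stop (illustrative sketch only).

    all_queries : the constant battery of SA queries
    n_per_stop  : how many queries time allows at this stop (assumed known)
    always_ask  : queries the experimenter wants presented every time
    omitted     : queries dropped because the simulation cannot support them
    """
    rng = rng or random.Random()
    omitted = set(omitted)
    required = [q for q in always_ask if q not in omitted]
    # Remaining pool: everything that is neither omitted nor already required.
    pool = [q for q in all_queries if q not in omitted and q not in required]
    n_random = max(0, min(n_per_stop - len(required), len(pool)))
    # Random sample from ALL remaining requirements, not only one area of interest.
    chosen = required + rng.sample(pool, n_random)
    rng.shuffle(chosen)  # present in random order
    return chosen


# Hypothetical battery and settings, for illustration only.
battery = ["range", "azimuth", "altitude", "heading", "fuel", "malfunctions"]
stop_queries = select_queries(
    battery, n_per_stop=4, always_ask=["altitude"], omitted=["malfunctions"]
)
```

Drawing the random portion from the whole remaining battery, rather than from the area under evaluation, is what guards against the attention-shifting problem described above.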

When to Collect SAGAT Data

It is recommended that the timing of each freeze for SAGAT administration be randomly determined and unpredictable so that subjects cannot prepare for queries in advance. If the freeze occurrence is covertly associated with the occurrence of specific events or at specific times across trials, prior studies have shown that the subjects will be able to figure this out (Endsley, 1988), allowing them to prepare for queries or actually improve SA through the artificiality of the freeze cues. An informal rule has been to ensure that no freezes occur earlier than 3 min into a trial, to allow subjects to build up a picture of the situation, and that no two freezes occur within 1 min of each other.
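As a rough illustration of the informal timing rule just described (no freeze before 3 min, and no two freezes within 1 min of each other), freeze times could be drawn by simple rejection sampling. The Python sketch below is an assumption about how one might implement this; the function name, the parameter names, and the 15-min trial length in the usage line are hypothetical.

```python
import random


def draw_freeze_times(trial_len_s, n_freezes, earliest_s=180.0, min_gap_s=60.0,
                      rng=None, max_tries=10_000):
    """Return sorted random freeze times in seconds (illustrative sketch).

    Candidate schedules are drawn uniformly and rejected until every pair of
    consecutive freezes is at least min_gap_s apart.
    """
    rng = rng or random.Random()
    if earliest_s + (n_freezes - 1) * min_gap_s > trial_len_s:
        raise ValueError("constraints cannot be satisfied for this trial length")
    for _ in range(max_tries):
        times = sorted(rng.uniform(earliest_s, trial_len_s) for _ in range(n_freezes))
        if all(later - earlier >= min_gap_s for earlier, later in zip(times, times[1:])):
            return times
    raise RuntimeError("no valid schedule found; relax the constraints")


# Example: three freezes in a hypothetical 15-min (900-s) trial.
schedule = draw_freeze_times(900.0, 3)
```

Because the times are drawn independently of scenario events, the activities under way at each stop are themselves a random sample, which is the property the recommendation relies on.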

The result of this approach is that the activities occurring at the time of the stops will be randomly selected. Some stops may occur during very important activities that are of interest to the experimenter, others when no critical activities are occurring. This gives a good sampling of the subjects' SA in a variety of situations. During analysis the experimenter may want to stratify the data to take these variations into account.

How Much SAGAT Data to Collect

The number of trials necessary will depend on the variability present in the dependent variables being collected and the number of data samples taken during a trial. This will vary with different subjects and designs, but between 30 and 60 samplings per SA query with each design option have previously been adequate in a within-subjects test design.
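The 30-to-60 guideline can be checked against a planned design with simple arithmetic. The Python sketch below is illustrative only: the function names and the fraction_asked parameter are assumptions, and the calculation presumes that samplings are simply counted across subjects, trials, and stops. The first usage line applies it to the display example reported earlier (six subjects, five trials per display condition, one stop per trial), assuming every query was asked at every stop.

```python
def samplings_per_query(n_subjects, trials_per_subject, stops_per_trial,
                        fraction_asked=1.0):
    """Expected number of samplings of one query within one design option.

    fraction_asked is the (assumed) share of stops at which a given query
    is actually presented when queries are randomly sampled.
    """
    return n_subjects * trials_per_subject * stops_per_trial * fraction_asked


def meets_minimum(n_samplings, minimum=30):
    """Compare against the lower end of the 30-to-60 range quoted above."""
    return n_samplings >= minimum


# Display example: 6 subjects x 5 trials x 1 stop, every query asked.
example = samplings_per_query(6, 5, 1)            # 30.0
# Hypothetical design: 3 stops per trial, each query asked at half the stops.
planned = samplings_per_query(6, 5, 3, 0.5)       # 45.0
```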

Multiple SAGAT stops may be taken within each trial. There is no known limit to the number of times the simulator can be frozen during a given trial. In Experiment 2, no ill effects were found from as many as three stops during a 15-min trial. In general, it is recommended that a stop last until a certain amount of time has elapsed, and then the trial is resumed, regardless of how many questions have been answered. Stops as long as 2 min in duration were used with no undue difficulty or effect on subsequent performance. In Experiment 1, stops as long as 5 min were shown to allow subjects access to SA information without memory decay.

Data Collection

The simulator computer should be programmed to collect objective data corresponding to the queries at the time of each freeze. Considering that some queries will pertain to higher-level SA requirements that may be unavailable in the computer, an expert judgment of the correct answer may be made by an experienced observer who is privy to all information, reflecting the SA of a person with perfect knowledge. A comparison of subjects' perceptions of the situation (as input into SAGAT) with the actual status of each variable (as collected per the simulator computer and expert judgment) results in an objective measure of subject SA. Questions asked of the subject but not answered should be considered incorrect. No evaluation should be made of questions not asked during a given stop.

It is recommended that answers to each query be scored as correct or incorrect based on whether or not it falls into an acceptable tolerance band around the actual value. For example, it may be acceptable for a subject to be 10 miles per hour off of actual ground speed. This method of scoring poses less difficulty than dealing with absolute error (see Marshak et al., 1987). A tabulation of the frequency of correctness can then be made within each test condition for each SA element. Because data scored as correct or incorrect are binomial, the conditions for analysis of variance are violated. A correction factor, Y' = arcsine(Y), can be applied to binomial data, which allows analysis of variance to be used. In addition, a chi-square, Cochran's Q, or binomial t test (depending on the test design) can be used to evaluate the statistical significance of differences in SA between test conditions.
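The scoring rules above (tolerance-band scoring, unanswered queries counted as incorrect, unasked queries excluded, and the arcsine correction) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the numeric values in the usage lines are made up, and the transform follows the correction factor exactly as written in the text. A commonly used variant in the statistical literature takes the arcsine of the square root of the proportion; that variant is not shown here.

```python
import math


def score_query(asked, answer, actual, tolerance):
    """Score one query at one stop.

    Returns None if the query was not asked (no evaluation is made),
    0 if it was asked but not answered or fell outside the tolerance band,
    and 1 if the answer is within +/- tolerance of the actual value.
    """
    if not asked:
        return None
    if answer is None:
        return 0
    return 1 if abs(answer - actual) <= tolerance else 0


def proportion_correct(scores):
    """Proportion correct over the stops at which the query was asked."""
    kept = [s for s in scores if s is not None]
    return sum(kept) / len(kept) if kept else None


def arcsine_transform(y):
    """Correction factor as stated in the text: Y' = arcsine(Y), in radians."""
    return math.asin(y)


# Made-up ground-speed answers (mph) with a 10-mph tolerance band.
scores = [
    score_query(True, 250, 245, 10),    # within band -> 1
    score_query(True, 260, 245, 10),    # outside band -> 0
    score_query(True, None, 245, 10),   # asked, not answered -> 0
    score_query(False, None, 245, 10),  # not asked -> excluded
]
y = proportion_correct(scores)
y_prime = arcsine_transform(y)
```

The transformed values, rather than the raw proportions, would then be entered into the analysis of variance.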

Limitations and Applicability for Use

SAGAT has primarily been used within the confines of high-fidelity and medium-fidelity part-task simulations. This provides experimenter control over freezes and data collection without any danger to the subject or processes involved in the domain. It may be possible to use the technique during actual task performance if multiple operators are present to ensure safety. For example, it might be possible to verbally query one pilot in flight while another assumes flight control. Such an endeavor should be undertaken with extreme caution, however, and may not be appropriate for certain domains.

A recent effort (Sheehy, Davey, Fiegel, and Guo, 1993) employed an adaptation of this technique by making videotapes of an ongoing situation in a nuclear power plant control room. These tapes were then replayed to naive subjects with freezes for SAGAT queries employed. It is not known how different the SA of subjects passively viewing a situation may be from subjects actually engaged in task performance; however, this approach may yield some useful data.

Most known uses of SAGAT have involved fighter aircraft simulations. In general, it can be employed in any domain in which a reasonable simulation of task performance exists and an analysis of SA requirements has been made in order to develop the queries. SAGAT queries have been developed for advanced bomber aircraft (Endsley, 1990a) and recently for en route air traffic control (Endsley and Rodgers, 1994). Several researchers have begun to use the technique in evaluating operator SA in nuclear control room studies (Hogg, Torralba, and Volden, 1993; Sheehy et al., 1993). The technique has also been employed in a simulated control task to study adaptive automation (Carmody and Gluckman, 1993). Several new efforts are currently under way to employ the technique in medical decision making (E. Swirsky, personal communication, March 1994) and helicopters (C. Prince, personal communication, November 1993). Potentially it could also be used in studies involving automobile driving, supervisory control of manufacturing systems, teleoperations, and operation of other types of dynamic systems.

CONCLUSION

The use of SA metrics in evaluating the efficiency of design concepts, training programs, and other potential interventions is critical. Without careful evaluation, it will be difficult to make much progress in understanding the factors that influence SA, hindering human factors and engineering efforts. Several techniques have been reviewed here, each with several advantages and disadvantages. The use of one technique, SAGAT, was explored extensively, and two studies were presented that investigated the validity of the measure. It has to date proven a useful and viable technique for measuring SA in numerous contexts. Rigorous validity testing is needed for other techniques proposed for SA measurement. Unless the veracity of these techniques is established, interpreting their results will be hazardous.

In addition to establishing validity of measures of SA, more research is needed regarding the task of SA measurement itself. For instance, as pointed out by Pew (1991), no criteria exist at this time that establish the level of SA required for successful performance. Does an operator need to have SA that is 100% perfect (in both completeness and accuracy), or is some lesser amount sufficient for good performance? This is a complex issue. I would assert that SA can be seen as a factor that increases the probability of good performance but that it does not necessarily guarantee it, as other factors also come into effect (e.g., decision making, workload, performance execution, system capabilities, and SA of others in some cases). How much SA one needs, therefore, becomes a matter of how much probability of error one is willing to accept. Perhaps such criteria should more properly and usefully be established as the level of SA needed on each subcomponent at different periods in time. Some guidelines will eventually need to be specified if there is a desire to certify new system designs on the basis of SA.

Overall, the beginnings of the field of SA measurement have been established, allowing researchers in the area of SA to proceed from mainly speculation and anecdotal information to solid empirical findings. The tools for conducting further basic research on the SA construct and developing better system designs are available, allowing needed research on SA in a variety of arenas.

ACKNOWLEDGMENTS

This research was sponsored by the Northrop Corporation. I thank the many people at Northrop, both past and present, who contributed to this effort. In particular, I thank Vic Vizcarra for his assistance in conducting the studies, Paul Cayley and Gerry Armstrong for their programming efforts, and Mike Majors and Del Jacobs for their support in developing an SA research program. In addition, I am indebted to Mark Chignell, who provided helpful support throughout the development of the methodology. I also thank Mark Rodgers, Tom English, Dick Gilson, and several anonymous reviewers for their comments on an earlier version of this paper.


REFERENCES

Bolstad, C. A., and Endsley, M. R. (1990). Single versus dual scale range display investigation (NOR DOC 90-90). Hawthorne, CA: Northrop Corporation.

Carmody, M. A., and Gluckman, J. P. (1993). Task-specific effects of automation and automation failure on performance, workload, and situational awareness. In R. S. Jensen and D. Neumeister (Eds.), Proceedings of the Seventh International Symposium on Aviation Psychology (pp. 167-171). Columbus, OH: Ohio State University, Department of Aviation.

Chase, W. G., and Simon, H. A. (1973). Perceptions in chess. Cognitive Psychology, 4, 55-81.

Cowan, N. (1988). Evolving conceptions of memory storage, selective attention, and their mutual constraints within the human information-processing system. Psychological Bulletin, 104, 163-191.

Endsley, M. R. (1987). SAGAT: A methodology for the measurement of situation awareness (NOR DOC 87-83). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1988). Situation awareness global assessment technique (SAGAT). In Proceedings of the National Aerospace and Electronics Conference (NAECON) (pp. 789-795). New York: IEEE.

Endsley, M. R. (1989a). Final report: Situation awareness in an advanced strategic mission (NOR DOC 89-32). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1989b). Tactical simulation 3 test report: Addendum I situation awareness evaluations (81203033R). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1990a). A methodology for the objective measurement of situation awareness. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 1/1-1/9). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Endsley, M. R. (1990b). Predictive utility of an objective measure of situation awareness. In Proceedings of the Human Factors Society 34th Annual Meeting (pp. 41-45). Santa Monica, CA: Human Factors and Ergonomics Society.

Endsley, M. R. (1990c). Situation awareness global assessment technique (SAGAT): Air-to-air tactical version user guide (NOR DOC 89-58, rev. A). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1990d). Situation awareness in dynamic human decision making: Theory and measurement. Unpublished doctoral dissertation, University of Southern California, Los Angeles, CA.

Endsley, M. R., and Rodgers, M. D. (1994). Situation awareness global assessment technique (SAGAT): En route air traffic control version user's guide (Draft). Lubbock: Texas Tech University.

Fracker, M. L. (1990). Attention gradients in situation awareness. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 6/1-6/10). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Herrmann, D. J. (1984). Questionnaires about memory. In J. E. Harris and P. E. Morris (Eds.), Everyday memory, action, and absent-mindedness (pp. 133-151). London: Academic.

Hogg, D. N., Torralba, B., and Volden, F. S. (1993). A situation awareness methodology for the evaluation of process control systems: Studies of feasibility and the implication of use (1993-03-05). Storefjell, Norway: OECD Halden Reactor Project.

Hughes, E. R., Hassoun, J. A., and Ward, F. (1990). Advanced tactical fighter (ATF) simulation support: Final report (CSEF-TR-ATF-90-91). Wright-Patterson Air Force Base, OH: Air Force Systems Command, Aeronautical Systems Division.

Jacoby, L. L., and Dallas, M. (1981). On the relationship between autobiographical memory and perceptual learning. Journal of Experimental Psychology: General, 110, 306-340.

Kellog, R. T. (1980). Is conscious attention necessary for long-term storage? Journal of Experimental Psychology: Human Learning and Memory, 6, 379-390.

Manktelow, K., and Jones, J. (1987). Principles from the psychology of thinking and mental models. In M. M. Gardiner and B. Christie (Eds.), Applying cognitive psychology to user-interface design (pp. 83-117). Chichester, England: Wiley.

Marshak, W. P., Kuperman, G., Ramsey, E. G., and Wilson, D. (1987). Situational awareness in map displays. In Proceedings of the Human Factors Society 31st Annual Meeting (pp. 533-535). Santa Monica, CA: Human Factors and Ergonomics Society.

McDonnell Douglas Aircraft Corporation. (1982). Advanced medium-range air-to-air missile operational utility evaluation (AMRAAM OUE) test final report. St. Louis, MO: Author.

Mogford, R. H., and Tansley, B. W. (1991). The importance of the air traffic controller's mental model. Presented at the Human Factors Society of Canada Annual Meeting, Canada.

Morton, J. (1969). Interaction of information in word recognition. Psychological Review, 76, 165-178.

Nisbett, R. E., and Wilson, T. D. (1977). Telling more than we can know: Verbal reports on mental processes. Psychological Review, 84, 231-259.

Norman, D. A. (1968). Towards a theory of memory and attention. Psychological Review, 75, 522-536.

Northrop Corporation. (1988). Tactical simulation 2 test report: Addendum I situation awareness test results. Hawthorne, CA: Author.

Pew, R. W. (1991). Defining and measuring situation awareness in the commercial aircraft cockpit. In Proceedings of the Conference on Challenges in Aviation Human Factors: The National Plan (pp. 30-31). Washington, DC: American Institute of Aeronautics and Astronautics.

Sarter, N. B., and Woods, D. D. (1991). Situation awareness: A critical but ill-defined phenomenon. International Journal of Aviation Psychology, 1, 45-57.

Selcon, S. J., and Taylor, R. M. (1990). Evaluation of the situational awareness rating technique (SART) as a tool for aircrew systems design. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 5/1-5/8). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Sheehy, E. J., Davey, E. C., Fiegel, T. T., and Guo, K. Q. (1993, April). Usability benchmark for CANDU annunciation-lessons learned. Presented at the ANS Topical Meeting on Nuclear Plant Instrumentation, Control, and Man-Machine Interface Technology, Oak Ridge, TN.

Smolensky, M. W. (1993). Toward the physiological measurement of situation awareness: The case for eye movement measurements. In Proceedings of the Human Factors and Ergonomics Society 37th Annual Meeting (p. 41). Santa Monica, CA: Human Factors and Ergonomics Society.

Taylor, R. M. (1990). Situational awareness rating technique (SART): The development of a tool for aircrew systems design. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 3/1-3/17). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Tulving, E. (1985). How many memory systems are there? American Psychologist, 40, 385-398.

Venturino, M., Hamilton, W. L., and Dvorchak, S. R. (1990). Performance-based measures of merit for tactical situation awareness. In Situation awareness in aerospace operations (AGARD-CP-478, pp. 4/1-4/5). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Vidulich, M. A. (1989). The use of judgment matrices in subjective workload assessment: The subjective workload dominance (SWORD) technique. In Proceedings of the Human Factors Society 33rd Annual Meeting (pp. 1406-1410). Santa Monica, CA: Human Factors and Ergonomics Society.

Date received: January 26, 1993
Date accepted: October 4, 1994


was administered. (Most subjects also had received a substantial amount of training in the simulator in the past.) Thus, subjects were well trained prior to testing.

Facilities. A high-fidelity, real-time simulation facility was used for testing. Aircraft control systems, avionics systems, weapons systems, crew stations, and air vehicle performance were modeled to provide realistic aircraft control characteristics. A Gould mainframe computer controlled the simulations and drove Silicon Graphics-generated high-resolution color graphics displays. This test used six pilot stations, each including a simulated head-up display, a tactical situation display, radar, and system controls operated by a touch screen or stick and throttle control switches. A realistic stick and throttle provided primary flight control.

Subjects. Twelve experienced former military fighter pilots participated in the test. The mean subject age was 48.16 years (range of 32 to 68). Subjects had an average of 3310 h (range 1500-6500) and an average of 15.5 years (range 7-26) of military flight experience. Of the 12 subjects, 7 had combat experience.

Results

Each of the subjects' answers to the SAGAT queries was compared with actual values, as collected by the simulator computer at the time of each stop. Included in the SAGAT battery at the time of the test were 26 queries. Of those, 11 could not be evaluated because the appropriate simulator data were not available, and 5 could be evaluated only by subjective means. The 10 remaining query answers concerned ownship heading, ownship location, aircraft heading, aircraft detections, aircraft airspeed, aircraft weapon selection, aircraft Gs, aircraft fuel level, aircraft weapon quantity, and aircraft altitude.

An error score (SAGAT query answer minus actual value) was computed for each response. Absolute error scores for each query (across all subjects and trials) were plotted against the amount of time elapsed between the stop in the simulation and the query, and a regression was calculated. F tests for the regressions and Pearson's r for eight of the queries are presented in Table 1. Two of the queries provided a categorical response; therefore, chi-square tests were performed and are presented in Table 2.

None of the regressions or chi-square tests computed for each of the 10 queries was significant (or even close to significant), indicating that subjects were neither more nor less prone to response error as the amount of time between the simulator freeze and the presentation of the query increased. A plot of the regression for altitude error, shown in Figure 2, reveals little or no increase in error over time. (Plots of the regressions for the other variables appeared quite similar.)
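The analysis described above (absolute error regressed on the delay between the freeze and the query, with an F test and r squared for each query) can be sketched with ordinary least squares. The Python below is a minimal illustration, not the analysis code used in the study; the function name is hypothetical and the data in the usage lines are made up solely to show a flat regression.

```python
def regress_error_on_delay(delays_s, abs_errors):
    """Simple linear regression of absolute error on query delay.

    Returns the slope, r squared, the F statistic for the regression,
    and its degrees of freedom (1, n - 2).
    """
    n = len(delays_s)
    if n != len(abs_errors) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(delays_s) / n
    mean_y = sum(abs_errors) / n
    sxx = sum((x - mean_x) ** 2 for x in delays_s)
    syy = sum((y - mean_y) ** 2 for y in abs_errors)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(delays_s, abs_errors))
    if sxx == 0 or syy == 0:
        raise ValueError("both variables must vary")
    r2 = (sxy * sxy) / (sxx * syy)
    df_resid = n - 2
    # F for a one-predictor regression: explained vs. residual variance.
    f_stat = float("inf") if r2 >= 1.0 else r2 / (1.0 - r2) * df_resid
    return {"slope": sxy / sxx, "r2": r2, "F": f_stat, "df": (1, df_resid)}


# Made-up data: error does not change with delay, so slope and F are 0.
delays = [0, 60, 120, 180, 240]          # seconds from freeze to query
errors = [100, 300, 100, 300, 100]       # absolute error, arbitrary units
flat = regress_error_on_delay(delays, errors)
```

A slope near zero with a small F, as in this toy case, is the pattern that would correspond to no growth in response error with delay.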

Discussion

Based on these data, it would appear that subjects under these conditions were able to provide information on their SA about a particular situation for up to 5 or 6 min. The fact that all 10 of the queries produced flat regressions lends extra weight to this conclusion.

Two explanations can be offered for thesefindings First this study investigated subjectswho were expert in particular tasks during a re-alistic simulation of those tasks The informa-tion subjects were asked to report was impor-tant to performance of those tasks Mostlaboratory studies that predict fairly rapid de-cay times (approximately 30 s for short-termmemory) typically employ the use of stimulithat have little or no inherent meaning tothe subject (nonsense words or pictures) The

TABLE 1

Tests on Regressions of Time of Query Presentation onQuery Accuracy

Query df F P r2

Own heading 119 0021 0887 0001Own location 119 0940 0334 0047Aircraft heading 195 0040 0834 0000G level 190 0500 0480 0006Fuel level 195 1160 0284 0Q12Weapon quantity 192 0001 0981 0000Altitude 196 0002 0965 0000Detection 1524 0041 0840 0000

74-March 1995 HUMAN FACTORS

TABLE 2

50000 r--------------------

Chi-Square Tests of Time Category of Query Presenta-tion on Query Accuracy

Time (Seconds)Figure 2 Altitude error by time until query presentation

EXPERIMENT 2

simulation was stopped No intervening periodof waiting or any competing activity was intro-duced prior to administering any SAGAT querySubject knowledge of SA information may beinterfered with if time delays or other activities(particularly continued operational tasks) areimposed before SAGAT is administered The ma-jor implication of these results is that underthese conditions SA data are readily obtainablethrough SAGAT for a considerable period after astop in the simulation (up to 5 or 6 min)

A second study was initiated to address theissue of possible intrusiveness It is necessary todetermine whether temporarily freezing thesimulation results in any change in subject be-havior If subject behavior were altered by sucha freeze certain limitations would be indicated(One might wish to not resume the trial after thefreeze for instance but start a new trial in-stead) The possibility of intrusiveness wastested by evaluating the effect of stopping thesimulator on subsequent subject performance

Procedure

A set of air-to-air engagements was conductedof a fighter sweep mission with a force ratio oftwo (blue team) versus four (red team) Thetraining instructions and pilot mission objec-tives were identical to those used in Experiment1 In this study however the trial was resumedafter a freeze following a specified period for col-lecting SAGAT data and was continued untilspecified criteria for completion of the missionwere met Subjects completed as many queriesas they could during each stop The queries werepresented in random order

Five teams of six subjects completed a full testmatrix The independent variables were dura-tion of the stops (12 1 or 2 min) and the fre-quency of stops (12 or 3 times during the trial) Each team participated twice in each of thesenine conditions (In any given trial multiplestops were of the same duration) Each teamalso completed six trials in which no stops oc-curred as a control condition Therefore a total

Query               χ²      df            p
Weapon selection    0.16    5 (N = 97)    > 0.995
Airspeed            0.13    5 (N = 98)    > 0.995

storage and utilization of relevant information may be quite different from that of irrelevant information (Chase and Simon, 1973).

Second, the results indicate that the SA information was obtainable from long-term memory stores. If schemata or other mechanisms are used to organize SA information (as opposed to working memory processes only), then that information will be resident in long-term memory. Many of the 10 items analyzed can be considered Level 1 SA components. The fact that this lower-level information was resident in LTM indicates that either the inputs to higher-level processing were retained as well as the outputs, or the Level 1 components were retained as important pieces of pilot SA and are significant components of LTM schemata (e.g., target altitude itself is important to know, and not just its implications). Both of these explanations may be correct. These findings generally support the predictions of a processing model in which information passes into LTM storage before being highlighted in STM.

As a caveat, note that subjects were actively working with their SA knowledge by answering the SAGAT queries for the entire period that the


Results

Figure 5. Blue kills by number of stops in trial (horizontal axis: number of stops, 0 to 3).

Figure 4. Blue losses: trials with stops versus trials without stops (horizontal axis: number of losses, 0 to 2).

Analysis of variance was used to evaluate the effect of number of stops and duration of stops on each of the two performance measures: blue team kills and blue team losses. The number of stops during the trial had no significant effect on either pilot performance measure, F(3, 116) = 1.73, p > 0.05, β = 0.55, and F(3, 116) = 0.20, p > 0.05, β > 0.60, respectively, as shown in Figures 5 and 6. The duration of the stop also did not significantly affect either performance measure, F(3, 116) = 2.16, p > 0.05, β = 0.40, and F(3, 116) = 0.77, p > 0.05, β > 0.60, respectively, as depicted in Figures 7 and 8. In viewing the data, no linear trend appears to be present in


of 30 trials were conducted for each duration-of-stop condition, and a total of 30 trials were conducted for each frequency-of-stop condition. These conditions were compared with 30 trials in which no stops occurred. Conditions were administered in random order, blocked by team. Pilot performance was collected as the dependent measure.
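
The trial counts reported here follow directly from the design. As a minimal sketch, the test matrix can be enumerated as shown below. The variable names are illustrative and not part of the study materials, and the stop durations are assumed to be 0.5, 1, and 2 min (the 30-, 60-, and 120-s levels shown on the duration figures).

```python
from itertools import product

TEAMS = 5
DURATIONS_MIN = (0.5, 1, 2)      # assumed stop durations, in minutes
STOP_COUNTS = (1, 2, 3)          # number of stops within a trial
REPS_PER_CONDITION = 2           # each team flew every condition twice
CONTROL_TRIALS_PER_TEAM = 6      # trials with no stops

# Every duration crossed with every stop count gives the nine conditions.
conditions = list(product(DURATIONS_MIN, STOP_COUNTS))

# One record per trial that contained stops.
stop_trials = [
    (team, duration, stops)
    for team in range(TEAMS)
    for duration, stops in conditions
    for _ in range(REPS_PER_CONDITION)
]
control_trials = TEAMS * CONTROL_TRIALS_PER_TEAM

# Marginal counts: trials per duration level and per stop-count level.
per_duration = {d: sum(1 for _, dur, _ in stop_trials if dur == d) for d in DURATIONS_MIN}
per_stop_count = {s: sum(1 for _, _, n in stop_trials if n == s) for s in STOP_COUNTS}
total_trials = len(stop_trials) + control_trials

print(len(conditions), per_duration, per_stop_count, control_trials, total_trials)
```

Under these assumptions each duration level and each stop-count level receives 30 trials, the control condition receives 30 trials, and the total of 120 trials matches the N reported for the chi-square tests.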

Facilities. This study used the same piloted, multiple-engagement simulator as in Experiment 1.

Subjects. Twenty-five experienced former military fighter pilots participated in this test. (Four of the subjects participated on more than one team.) The mean subject age was 45.16 years (range of 32 to 68). Subjects had an average of 3582 h (range of 975 to 7045) and an average of 16.9 years (range of 6 to 27) of military flight experience. Of the 25 subjects, 14 had combat experience.

Figure 3. Blue kills: trials with stops versus trials without stops (horizontal axis: number of kills).

Pilot performance under each of the conditions was analyzed. Pilot performance measures included the number of blue team kills (red team losses) and the number of blue team losses (red team kills). Chi-square tests were performed on the number of blue team kills, χ²(4, N = 120) = 5.973, p > 0.05, and blue team losses, χ²(2, N = 120) = 0.05, p > 0.05, comparing trials in which there were stops to collect SAGAT data with those in which there were no stops, as depicted in Figures 3 and 4. There was no significant difference in pilot performance on either measure.
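
As a hedged check on these two tests, the upper-tail probability of a chi-square statistic has a closed form when the degrees of freedom are even, so the reported non-significance can be verified without a statistics library. The function name is illustrative, and the statistic values assume the decimal placement 5.973 (4 df) and 0.05 (2 df).

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with even df.

    For df = 2k the survival function is exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i          # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total

p_kills = chi2_sf_even_df(5.973, 4)    # blue team kills
p_losses = chi2_sf_even_df(0.05, 2)    # blue team losses
print(round(p_kills, 3), round(p_losses, 3))
```

Both probabilities are well above 0.05 under these assumptions, which is consistent with the conclusion that stopping the simulation did not change the distribution of kills or losses.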

Figure 6. Blue losses by number of stops in trial (horizontal axis: number of stops, 0 to 3).

Figure 8. Blue losses by duration of stops in trial (horizontal axis: duration of stops in seconds; 0, 30, 60, 120).

either case that would indicate a progressively worse (albeit nonsignificant) effect of increasing number or duration of stops. This indicates that stops to collect SAGAT data (as many as 3 for up to 2 min in duration) did not have a significant effect on subject performance.

Discussion

The lack of a significant influence of this procedure on performance probably rests on the fact that relevant schemata are actively used by subjects during the entire freeze period. Under these conditions, the subjects' SA does not have a chance to decay before the simulation is resumed (as was indicated by Experiment 1). Thus SA is fairly intact when the simulation continues, allowing subjects to proceed with their tasks where they left off. Subjectively, subjects did fairly well with this procedure and were able to readily pick up the simulation at the point at which they left off at the time of the freeze, sometimes with the same sentence they had started before the stop. On many occasions subjects could not even remember if they had been stopped to collect SA data during the trial, also indicating a certain lack of intrusiveness.

Figure 7. Blue kills by duration of stops in trial (horizontal axis: duration of stops in seconds; 0, 30, 60, 120).

Summary

The use of a temporary freeze in the simulation to collect SA data is supported by Experiments 1 and 2. Subjects were able to report their SA using this technique for as long as 5 or 6 min without apparent memory decay, and the freeze did not appear to be intrusive on subject performance in the simulation, allaying several concerns about the technique. Although it is always difficult to establish no effect of one variable on another (i.e., prove the null hypothesis), it is reassuring that the finding of no intrusion on performance has been repeated in numerous other studies in which SAGAT was used to evaluate human-machine interface concepts (Bolstad and Endsley, 1990; Endsley, 1989b; Northrop, 1988), if the timing of the stop was unpredictable to the subjects (Endsley, 1988). This finding was also repeated in one study in which pilots flew in a simulation against computer-controlled enemy aircraft (Endsley, 1990d), helping to rule out the possibility that the finding of no effect on performance in the present study could have been the result of pilots on both sides of the simulation being equally influenced.

SAGAT has thus far proven to meet the stated criteria for measurement. In addition to the present studies establishing a level of empirical validity, SAGAT has been shown to have predictive validity (Endsley, 1990b) and content validity (Endsley, 1990d). The method is not without some costs, however, as a detailed analysis of SA requirements is required in order to develop the battery of queries to be administered. On the positive side, this analysis can also be extremely useful for guiding design efforts.

EXAMPLE

To proffer an example of the type of SA information provided by SAGAT, I will present a study that evaluates the use of a three-dimensional display for providing radar information to fighter pilots. The study was conducted in the same high-fidelity, real-time simulator as in the prior two experiments. During the study, each of six experienced pilot subjects flew a fighter sweep mission against a field of four digitally controlled enemy aircraft. Each subject completed five trials in each of five display conditions. The initial starting conditions and order of presentation of the display conditions were randomized. Each trial was halted to collect SAGAT data at a randomly selected time between 2.5 and 5 min into each trial. Immediately following the collection of SAGAT data, the Subjective Workload Assessment Technique (SWAT) was also administered. In this study the trial was not resumed after the collection of SAGAT and SWAT data.

The five display conditions were constructed to evaluate the effect of a three-dimensional (3D) display concept. The 3D display provided a grid representing the ground, above which aircraft targets detected by the radar were presented by a small triangle pointing in the direction of the aircraft's heading and connected to the grid by a line corresponding to the aircraft's altitude. The plane the subject was flying was presented in the center of this display so that each target's location and altitude were spatially represented relative to the subject's ownship. Four rotations of the grid (0, 30, 45, and 60 deg) were investigated (each providing a different perspective on the scene), as shown in Figure 9. At 0-deg rotation, the grid portrayed a God's-eye view of the scene, thus depicting a traditional two-dimensional display. In addition, these four views were compared with a two-display configuration in which a God's-eye view display was supplemented with a second profile display that pictorially presented altitude. In all conditions, altitude of the aircraft was also presented numerically next to the aircraft symbols.

Following the trials, the collected SAGAT and SWAT data were analyzed. Although the final SAGAT battery for fighter aircraft has 40 queries (Endsley, 1990c), only 23 were administered at the time of the test (as the battery was still in development). The results of 4 pertinent queries are presented here. The subjects' reported knowledge of the range, azimuth, altitude, and heading of the enemy targets was compared with the actual values of these elements at the time of the freeze in the simulation. The subjects' perceptions were evaluated as correct or incorrect (using tolerance values that had been determined to be operationally relevant).

The percentage correct across subjects for each condition on each of these four queries is shown in Figure 10. As the scored data are binomial, an arcsine transformation was applied and an ANOVA performed. The results showed that the display type had a significant effect on pilot SA of the aircraft's range, F(4, 288) = 2.575, p < 0.05, azimuth, F(4, 278) = 4.125, p < 0.05, and heading, F(4, 139) = 3.040, p < 0.05, but not altitude, F(4, 94) = 1.83, p > 0.05. The 3D display, when rotated at 45 or 60 deg, provided significantly lower SA on target range than when at 0 or 30 deg and significantly lower SA of target heading than any of the other three conditions. A rotation of 60 deg also provided significantly lower SA of target azimuth than did the other four conditions. Although the ANOVA for target altitude was not significant, there was an opposite trend for the highly rotated 3D display to provide more SA of target altitude than did rotations of 0 and 30 deg. The SWAT data supported these findings, with significantly higher ratings for workload in the 45- and 60-deg rotations than in the 0- or 30-deg rotations or the two-dimensional display.

Figure 9. Two-dimensional display and three-dimensional display at four levels of rotation. Panels: (a) 2D display; (b) 3D display, 0° rotation; (c) 3D display, 30° rotation; (d) 3D display, 45° rotation; (e) 3D display, 60° rotation.

A major point to be gleaned from this example is the importance of examining the impact of a given design across an operator's SA requirements. The proposed 3D display was designed to provide pilots with better information about aircraft altitude, and it may have done so, although the trend was not significant. However, this came at the expense of SA on location, azimuth, and heading. If a study had been performed that merely examined subjects' knowledge of aircraft altitude, this important information would have been lost. The SAGAT data also provided important diagnostic information that was useful in understanding the exact effect the design concept had on pilot SA in order to make future iterations.

Figure 10. Situation awareness of target range, heading, azimuth, and altitude in each display condition. Panels: (a) knowledge of target range; (b) knowledge of target heading; (c) knowledge of target azimuth; (d) knowledge of target altitude. Horizontal axis: display condition (2-D; 3-D at 0°, 30°, 45°, and 60°); vertical axis: percentage correct, 0 to 100.

IMPLEMENTATION RECOMMENDATIONS

Several recommendations for SAGAT administration can be made based on previous experience in using the procedure.

Training

An explanation of the SAGAT procedures and detailed instructions for answering each query should be provided to subjects before testing. Several training trials should be conducted in which the simulator is halted frequently to allow subjects ample opportunity to practice responding to the SAGAT queries. Usually three to five samplings are adequate for a subject to become comfortable with the procedure and to clear up any uncertainties in how to answer the queries.

Test Design

SAGAT requires no special test considerations. The same principles of experimental design and administration apply to SAGAT as to any other dependent measure. Measures of subject performance and workload may be collected concurrently with SAGAT, as no ill effect from the insertion of breaks has been shown. To be cautious, however, if performance measures are to be collected simultaneously with SAGAT data, half of the trials should be conducted without any breaks for SAGAT in order to provide a check for this contingency.

Procedures

Subjects should be instructed to attend to their tasks as they normally would, with the SAGAT queries considered as secondary. No displays or other visual aids should be visible while subjects are answering the queries. If subjects do not know or are uncertain about the answer to a given query, they should be encouraged to make their best guess. There is no penalty for guessing, allowing for consideration of the default values or other wisdom gained from experience that subjects normally use in decision making. If subjects do not feel comfortable enough to make a guess, they may go on to the next question. Talking or sharing of information among subjects should not be permitted. If multiple subjects are involved in the same simulation, all subjects should be queried simultaneously and the simulation resumed for all subjects at the same time.

Which Queries to Use

Random selection. As it may be impossible to query subjects about all of their SA requirements in a given stop because of time constraints, a portion of the SA queries may be randomly selected and asked each time. A random sampling provides consistency and statistical validity, thus allowing SA scores to be easily compared across trials, subjects, systems, and scenarios.

Because of attentional narrowing or lack of information, certain questions may seem unimportant to a subject at the time of a given stop. It is important to stress that subjects should attempt to answer all queries anyway. This is because (a) even though they think it unimportant, the information may have at least secondary importance; (b) they may not be aware of information that makes a question very important (e.g., the location of a pop-up aircraft); and (c) if only questions of the highest priority were asked, subjects might be inadvertently provided with artificial cues about the situation that will direct their attention when the simulation is resumed. Therefore, a random selection from a constant set of queries is recommended at each stop.

Experimenter controlled. In certain tests it may be desirable to have some queries omitted because of limitations of the simulation or characteristics of the scenarios. For instance, if the simulation does not incorporate aircraft malfunctions, the query related to this issue may be omitted. In addition, with particular test designs it may be desirable to ensure that certain queries are presented every time. When this occurs, subjects should also be queried on a random sampling from all SA requirements and not only on those related to a specific area of interest to the evaluation being conducted. This is because of the ability of subjects to shift attention to the information on which they know they will be tested. What may appear to be an improvement in SA in one area may be merely a shift of attention from another area. When the SAGAT queries cover all of the SA requirements, no such artificial cuing can occur.
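
The two selection policies described in this section can be combined in one routine. The following is a minimal sketch and not the SAGAT software itself: the function name, its parameters, and the short query list are hypothetical, chosen only to illustrate drawing a random sample from a constant query set while allowing the experimenter to omit some queries and to force others at every stop.

```python
import random

def select_queries(all_queries, n_random, always_ask=(), omitted=(), rng=None):
    """Choose the queries for one stop.

    Queries in `omitted` are never asked, queries in `always_ask` are asked
    at every stop, and `n_random` further queries are sampled at random from
    the remaining set so that all SA requirements stay eligible.
    """
    rng = rng if rng is not None else random.Random()
    omitted = set(omitted)
    mandatory = [q for q in always_ask if q not in omitted]
    pool = [q for q in all_queries if q not in omitted and q not in mandatory]
    sampled = rng.sample(pool, min(n_random, len(pool)))
    selected = mandatory + sampled
    rng.shuffle(selected)          # present the chosen queries in random order
    return selected

# Hypothetical query set; the simulation here is assumed to lack malfunctions.
QUERIES = ["range", "azimuth", "altitude", "heading",
           "weapon selection", "airspeed", "aircraft malfunctions"]
example = select_queries(QUERIES, n_random=3, always_ask=["altitude"],
                         omitted=["aircraft malfunctions"], rng=random.Random(0))
print(example)
```

Passing a seeded `random.Random` instance makes a test session reproducible, while the default unseeded generator keeps the selection unpredictable to subjects.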

When to Collect SAGAT Data

It is recommended that the timing of each freeze for SAGAT administration be randomly determined and unpredictable so that subjects cannot prepare for queries in advance. If the freeze occurrence is covertly associated with the occurrence of specific events or at specific times across trials, prior studies have shown that the subjects will be able to figure this out (Endsley, 1988), allowing them to prepare for queries or actually improve SA through the artificiality of the freeze cues. An informal rule has been to ensure that no freezes occur earlier than 3 min into a trial, to allow subjects to build up a picture of the situation, and that no two freezes occur within 1 min of each other.
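
One way to generate freeze times that satisfy the informal rule is rejection sampling: draw candidate times uniformly and keep the first set that respects both constraints. This is a sketch under stated assumptions. The function name and parameters are hypothetical, the 15-min trial length is borrowed from Experiment 2, and uniform sampling is an assumption, since the text requires only that timing be random and unpredictable.

```python
import random

def sample_freeze_times(n_stops, trial_len_min, earliest_min=3.0,
                        min_gap_min=1.0, rng=None, max_tries=10_000):
    """Return sorted freeze times (minutes into the trial).

    No freeze falls before `earliest_min`, and consecutive freezes are at
    least `min_gap_min` apart. Raises ValueError if no valid set is found.
    """
    rng = rng if rng is not None else random.Random()
    for _ in range(max_tries):
        times = sorted(rng.uniform(earliest_min, trial_len_min)
                       for _ in range(n_stops))
        if all(b - a >= min_gap_min for a, b in zip(times, times[1:])):
            return times
    raise ValueError("could not place the requested stops under these constraints")

freeze_times = sample_freeze_times(3, trial_len_min=15.0, rng=random.Random(1))
print([round(t, 2) for t in freeze_times])
```

Because each accepted set is drawn independently for every trial, freezes are not tied to specific events or clock times across trials.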

The result of this approach is that the activities occurring at the time of the stops will be randomly selected. Some stops may occur during very important activities that are of interest to the experimenter, others when no critical activities are occurring. This gives a good sampling of the subjects' SA in a variety of situations. During analysis the experimenter may want to stratify the data to take these variations into account.

How Much SAGAT Data to Collect

The number of trials necessary will depend on the variability present in the dependent variables being collected and the number of data samples taken during a trial. This will vary with different subjects and designs, but between 30 and 60 samplings per SA query with each design option have previously been adequate in a within-subjects test design.

Multiple SAGAT stops may be taken within each trial. There is no known limit to the number of times the simulator can be frozen during a given trial. In Experiment 2, no ill effects were found from as many as three stops during a 15-min trial. In general, it is recommended that a stop last until a certain amount of time has elapsed, and then the trial is resumed, regardless of how many questions have been answered. Stops as long as 2 min in duration were used with no undue difficulty or effect on subsequent performance. In Experiment 1, stops as long as 5 min were shown to allow subjects access to SA information without memory decay.

Data Collection

The simulator computer should be programmed to collect objective data corresponding to the queries at the time of each freeze. Considering that some queries will pertain to higher-level SA requirements that may be unavailable in the computer, an expert judgment of the correct answer may be made by an experienced observer who is privy to all information, reflecting the SA of a person with perfect knowledge. A comparison of subjects' perceptions of the situation (as input into SAGAT) with the actual status of each variable (as collected per the simulator computer and expert judgment) results in an objective measure of subject SA. Questions asked of the subject but not answered should be considered incorrect. No evaluation should be made of questions not asked during a given stop.

It is recommended that answers to each query be scored as correct or incorrect based on whether or not each falls within an acceptable tolerance band around the actual value. For example, it may be acceptable for a subject to be 10 miles per hour off of actual ground speed. This method of scoring poses less difficulty than dealing with absolute error (see Marshak et al., 1987). A tabulation of the frequency of correctness can then be made within each test condition for each SA element. Because data scored as correct or incorrect are binomial, the conditions for analysis of variance are violated. A correction factor, Y' = arcsine(Y), can be applied to binomial data, which allows analysis of variance to be used. In addition, a chi-square, Cochran's Q, or binomial t test (depending on the test design) can be used to evaluate the statistical significance of differences in SA between test conditions.
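
The scoring rule can be sketched in a few lines. The function names and the ground-speed data below are hypothetical and serve only to illustrate tolerance-band scoring, the treatment of unanswered queries as incorrect, and the transformation as printed in the text. A widely used form of this transformation takes the arcsine of the square root of the proportion; the sketch follows the formula as stated here.

```python
import math

def score_response(reported, actual, tolerance):
    """Score 1 if the reported value lies within +/- tolerance of the actual
    value, else 0. A query that was asked but not answered (None) scores 0."""
    if reported is None:
        return 0
    return 1 if abs(reported - actual) <= tolerance else 0

def proportion_correct(scores):
    """Fraction of asked queries scored as correct."""
    return sum(scores) / len(scores)

def arcsine_transform(y):
    """Transformation as stated in the text, Y' = arcsine(Y), in radians."""
    return math.asin(y)

# Hypothetical (reported, actual) ground speeds in mph for one test condition.
# Queries that were never asked are simply left out of the list.
responses = [(452, 450), (430, 450), (None, 450), (458, 450)]
scores = [score_response(r, a, tolerance=10) for r, a in responses]
p = proportion_correct(scores)
print(scores, p, arcsine_transform(p))
```

Tabulating `scores` per query and per test condition gives the frequencies of correctness to which the chi-square or ANOVA procedures named above would be applied.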

Limitations and Applicability for Use

SAGAT has primarily been used within the confines of high-fidelity and medium-fidelity part-task simulations. This provides experimenter control over freezes and data collection without any danger to the subject or processes involved in the domain. It may be possible to use the technique during actual task performance if multiple operators are present to ensure safety. For example, it might be possible to verbally query one pilot in flight while another assumes flight control. Such an endeavor should be undertaken with extreme caution, however, and may not be appropriate for certain domains.

A recent effort (Sheehy, Davey, Fiegel, and Guo, 1993) employed an adaptation of this technique by making videotapes of an ongoing situation in a nuclear power plant control room. These tapes were then replayed to naive subjects, with freezes for SAGAT queries employed. It is not known how different the SA of subjects passively viewing a situation may be from subjects actually engaged in task performance; however, this approach may yield some useful data.

Most known uses of SAGAT have involved fighter aircraft simulations. In general, it can be employed in any domain in which a reasonable simulation of task performance exists and an analysis of SA requirements has been made in order to develop the queries. SAGAT queries have been developed for advanced bomber aircraft (Endsley, 1990a) and recently for en route air traffic control (Endsley and Rodgers, 1994). Several researchers have begun to use the technique in evaluating operator SA in nuclear control room studies (Hogg, Torralba, and Volden, 1993; Sheehy et al., 1993). The technique has also been employed in a simulated control task to study adaptive automation (Carmody and Gluckman, 1993). Several new efforts are currently under way to employ the technique in medical decision making (E. Swirsky, personal communication, March 1994) and helicopters (C. Prince, personal communication, November 1993). Potentially it could also be used in studies involving automobile driving, supervisory control of manufacturing systems, teleoperations, and operation of other types of dynamic systems.

CONCLUSION

The use of SA metrics in evaluating the efficiency of design concepts, training programs, and other potential interventions is critical. Without careful evaluation, it will be difficult to make much progress in understanding the factors that influence SA, hindering human factors and engineering efforts. Several techniques have been reviewed here, each with several advantages and disadvantages. The use of one technique, SAGAT, was explored extensively, and two studies were presented that investigated the validity of the measure. It has to date proven a useful and viable technique for measuring SA in numerous contexts. Rigorous validity testing is needed for other techniques proposed for SA measurement. Unless the veracity of these techniques is established, interpreting their results will be hazardous.

In addition to establishing validity of measures of SA, more research is needed regarding the task of SA measurement itself. For instance, as pointed out by Pew (1991), no criteria exist at this time that establish the level of SA required for successful performance. Does an operator need to have SA that is 100% perfect (in both completeness and accuracy), or is some lesser amount sufficient for good performance? This is a complex issue. I would assert that SA can be seen as a factor that increases the probability of good performance but that it does not necessarily guarantee it, as other factors also come into effect (e.g., decision making, workload, performance execution, system capabilities, and SA of others in some cases). How much SA one needs, therefore, becomes a matter of how much probability of error one is willing to accept. Perhaps such criteria should more properly and usefully be established as the level of SA needed on each subcomponent at different periods in time. Some guidelines will eventually need to be specified if there is a desire to certify new system designs on the basis of SA.

Overall, the beginnings of the field of SA measurement have been established, allowing researchers in the area of SA to proceed from mainly speculation and anecdotal information to solid empirical findings. The tools for conducting further basic research on the SA construct and developing better system designs are available, allowing needed research on SA in a variety of arenas.

ACKNOWLEDGMENTS

This research was sponsored by the Northrop Corporation. I thank the many people at Northrop, both past and present, who contributed to this effort. In particular, I thank Vic Vizcarra for his assistance in conducting the studies, Paul Cayley and Gerry Armstrong for their programming efforts, and Mike Majors and Del Jacobs for their support in developing an SA research program. In addition, I am indebted to Mark Chignell, who provided helpful support throughout the development of the methodology. I also thank Mark Rodgers, Tom English, Dick Gilson, and several anonymous reviewers for their comments on an earlier version of this paper.


REFERENCES

Bolstad, C. A., and Endsley, M. R. (1990). Single versus dual scale range display investigation (NOR DOC 90-90). Hawthorne, CA: Northrop Corporation.

Carmody, M. A., and Gluckman, J. P. (1993). Task-specific effects of automation and automation failure on performance, workload, and situational awareness. In R. S. Jensen and D. Neumeister (Eds.), Proceedings of the Seventh International Symposium on Aviation Psychology (pp. 167-171). Columbus, OH: Ohio State University, Department of Aviation.

Chase, W. G., and Simon, H. A. (1973). Perceptions in chess. Cognitive Psychology, 4, 55-81.

Cowan, N. (1988). Evolving conceptions of memory storage, selective attention, and their mutual constraints within the human information-processing system. Psychological Bulletin, 104, 163-191.

Endsley, M. R. (1987). SAGAT: A methodology for the measurement of situation awareness (NOR DOC 87-83). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1988). Situation awareness global assessment technique (SAGAT). In Proceedings of the National Aerospace and Electronics Conference (NAECON) (pp. 789-795). New York: IEEE.

Endsley, M. R. (1989a). Final report: Situation awareness in an advanced strategic mission (NOR DOC 89-32). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1989b). Tactical simulation 3 test report: Addendum I situation awareness evaluations (81203033R). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1990a). A methodology for the objective measurement of situation awareness. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 1/1-1/9). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Endsley, M. R. (1990b). Predictive utility of an objective measure of situation awareness. In Proceedings of the Human Factors Society 34th Annual Meeting (pp. 41-45). Santa Monica, CA: Human Factors and Ergonomics Society.

Endsley, M. R. (1990c). Situation awareness global assessment technique (SAGAT): Air-to-air tactical version user guide (NOR DOC 89-58, rev. A). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1990d). Situation awareness in dynamic human decision making: Theory and measurement. Unpublished doctoral dissertation, University of Southern California, Los Angeles, CA.

Endsley, M. R., and Rodgers, M. D. (1994). Situation awareness global assessment technique (SAGAT): En route air traffic control version user's guide (Draft). Lubbock: Texas Tech University.

Fracker, M. L. (1990). Attention gradients in situation awareness. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 6/1-6/10). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Herrmann, D. J. (1984). Questionnaires about memory. In J. E. Harris and P. E. Morris (Eds.), Everyday memory, action, and absent-mindedness (pp. 133-151). London: Academic.

Hogg, D. N., Torralba, B., and Volden, F. S. (1993). A situation awareness methodology for the evaluation of process control systems: Studies of feasibility and the implication of use (1993-03-05). Storefjell, Norway: OECD Halden Reactor Project.


Hughes, E. R., Hassoun, J. A., and Ward, F. (1990). Advanced tactical fighter (ATF) simulation support: Final report (CSEF-TR-ATF-90-91). Wright-Patterson Air Force Base, OH: Air Force Systems Command, Aeronautical Systems Division.

Jacoby, L. L., and Dallas, M. (1981). On the relationship between autobiographical memory and perceptual learning. Journal of Experimental Psychology: General, 110, 306-340.

Kellog, R. T. (1980). Is conscious attention necessary for long-term storage? Journal of Experimental Psychology: Human Learning and Memory, 6, 379-390.

Manktelow, K., and Jones, J. (1987). Principles from the psychology of thinking and mental models. In M. M. Gardiner and B. Christie (Eds.), Applying cognitive psychology to user-interface design (pp. 83-117). Chichester, England: Wiley.

Marshak, W. P., Kuperman, G., Ramsey, E. G., and Wilson, D. (1987). Situational awareness in map displays. In Proceedings of the Human Factors Society 31st Annual Meeting (pp. 533-535). Santa Monica, CA: Human Factors and Ergonomics Society.

McDonnell Douglas Aircraft Corporation. (1982). Advanced medium-range air-to-air missile operational utility evaluation (AMRAAM OUE) test final report. St. Louis, MO: Author.

Mogford, R. H., and Tansley, B. W. (1991). The importance of the air traffic controller's mental model. Presented at the Human Factors Society of Canada Annual Meeting, Canada.

Morton, J. (1969). Interaction of information in word recognition. Psychological Review, 76, 165-178.

Nisbett, R. E., and Wilson, T. D. (1977). Telling more than we can know: Verbal reports on mental processes. Psychological Review, 84, 231-259.

Norman, D. A. (1968). Towards a theory of memory and attention. Psychological Review, 75, 522-536.

Northrop Corporation. (1988). Tactical simulation 2 test report: Addendum I situation awareness test results. Hawthorne, CA: Author.

Pew, R. W. (1991). Defining and measuring situation awareness in the commercial aircraft cockpit. In Proceedings of the Conference on Challenges in Aviation Human Factors: The National Plan (pp. 30-31). Washington, DC: American Institute of Aeronautics and Astronautics.

Sarter, N. B., and Woods, D. D. (1991). Situation awareness: A critical but ill-defined phenomenon. International Journal of Aviation Psychology, 1, 45-57.

Selcon, S. J., and Taylor, R. M. (1990). Evaluation of the situational awareness rating technique (SART) as a tool for aircrew systems design. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 5/1-5/8). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Sheehy, E. J., Davey, E. C., Fiegel, T. T., and Guo, K. Q. (1993, April). Usability benchmark for CANDU annunciation-lessons learned. Presented at the ANS Topical Meeting on Nuclear Plant Instrumentation, Control, and Man-Machine Interface Technology, Oak Ridge, TN.

Smolensky, M. W. (1993). Toward the physiological measurement of situation awareness: The case for eye movement measurements. In Proceedings of the Human Factors and Ergonomics Society 37th Annual Meeting (p. 41). Santa Monica, CA: Human Factors and Ergonomics Society.

Taylor, R. M. (1990). Situational awareness rating technique (SART): The development of a tool for aircrew systems design. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 3/1-3/17). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Tulving, E. (1985). How many memory systems are there? American Psychologist, 40, 385-398.

Venturino, M., Hamilton, W. L., and Dvorchak, S. R. (1990). Performance-based measures of merit for tactical situation awareness. In Situation awareness in aerospace operations (AGARD-CP-478, pp. 4/1-4/5). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.


Vidulich, M. A. (1989). The use of judgment matrices in subjective workload assessment: The subjective workload dominance (SWORD) technique. In Proceedings of the Human Factors Society 33rd Annual Meeting (pp. 1406-1410). Santa Monica, CA: Human Factors and Ergonomics Society.

Date received: January 26, 1993
Date accepted: October 4, 1994


storage and utilization of relevant information may be quite different from that of irrelevant information (Chase and Simon, 1973).

Second, the results indicate that the SA information was obtainable from long-term memory stores. If schemata or other mechanisms are used to organize SA information (as opposed to working memory processes only), then that information will be resident in long-term memory. Many of the 10 items analyzed can be considered Level 1 SA components. The fact that this lower-level information was resident in LTM indicates that either the inputs to higher-level processing were retained as well as the outputs, or the Level 1 components were retained as important pieces of pilot SA and are significant components of LTM schemata (e.g., target altitude itself is important to know and not just its implications). Both of these explanations may be correct. These findings generally support the predictions of a processing model in which information passes into LTM storage before being highlighted in STM.

As a caveat, note that subjects were actively working with their SA knowledge by answering the SAGAT queries for the entire period that the simulation was stopped. No intervening period of waiting or any competing activity was introduced prior to administering any SAGAT query. Subject knowledge of SA information may be interfered with if time delays or other activities (particularly continued operational tasks) are imposed before SAGAT is administered. The major implication of these results is that, under these conditions, SA data are readily obtainable through SAGAT for a considerable period after a stop in the simulation (up to 5 or 6 min).

TABLE 2

Chi-Square Tests of Time Category of Query Presentation on Query Accuracy

Query               df            χ²      p
Weapon selection    5 (N = 97)    0.16    > 0.995
Airspeed            5 (N = 98)    0.13    > 0.995

Figure 2. Altitude error by time until query presentation (x axis: time in seconds).

EXPERIMENT 2

A second study was initiated to address the issue of possible intrusiveness. It is necessary to determine whether temporarily freezing the simulation results in any change in subject behavior. If subject behavior were altered by such a freeze, certain limitations would be indicated. (One might wish to not resume the trial after the freeze, for instance, but start a new trial instead.) The possibility of intrusiveness was tested by evaluating the effect of stopping the simulator on subsequent subject performance.

Procedure

A set of air-to-air engagements was conducted of a fighter sweep mission with a force ratio of two (blue team) versus four (red team). The training instructions and pilot mission objectives were identical to those used in Experiment 1. In this study, however, the trial was resumed after a freeze following a specified period for collecting SAGAT data and was continued until specified criteria for completion of the mission were met. Subjects completed as many queries as they could during each stop. The queries were presented in random order.

Five teams of six subjects completed a full test matrix. The independent variables were duration of the stops (1/2, 1, or 2 min) and the frequency of stops (1, 2, or 3 times during the trial). Each team participated twice in each of these nine conditions. (In any given trial, multiple stops were of the same duration.) Each team also completed six trials in which no stops occurred as a control condition. Therefore, a total of 30 trials were conducted for each duration of stop condition, and a total of 30 trials were conducted for each frequency of stop condition. These conditions were compared with 30 trials in which no stops occurred. Conditions were administered in random order, blocked by team. Pilot performance was collected as the dependent measure.

Facilities. This study used the same piloted, multiple-engagement simulator as in Experiment 1.

Subjects. Twenty-five experienced former military fighter pilots participated in this test. (Four of the subjects participated on more than one team.) The mean subject age was 45.16 years (range of 32 to 68). Subjects had an average of 3582 h (range of 975 to 7045) and an average of 16.9 years (range of 6 to 27) of military flight experience. Of the 25 subjects, 14 had combat experience.

Results

Pilot performance under each of the conditions was analyzed. Pilot performance measures included the number of blue team kills (red team losses) and the number of blue team losses (red team kills). Chi-square tests were performed on the number of blue team kills, χ²(4, N = 120) = 5.973, p > 0.05, and blue team losses, χ²(2, N = 120) = 0.05, p > 0.05, comparing between trials in which there were stops to collect SAGAT data and those in which there were no stops, as depicted in Figures 3 and 4. There was no significant difference in pilot performance on either measure.

Figure 3. Blue kills: Trials with stops versus trials without stops (x axis: number of kills).

Figure 4. Blue losses: Trials with stops versus trials without stops (x axis: number of losses).

Analysis of variance was used to evaluate the effect of number of stops and duration of stops on each of the two performance measures: blue team kills and blue team losses. The number of stops during the trial had no significant effect on either pilot performance measure, F(3, 116) = 1.73, p > 0.05, β = 0.55, and F(3, 116) = 0.20, p > 0.05, β > 0.60, respectively, as shown in Figures 5 and 6. The duration of the stop also did not significantly affect either performance measure, F(3, 116) = 2.16, p > 0.05, β = 0.40, and F(3, 116) = 0.77, p > 0.05, β > 0.60, respectively, as depicted in Figures 7 and 8. In viewing the data, no linear trend appears to be present in either case that would indicate a progressively worse (albeit nonsignificant) effect of increasing number or duration of stops. This indicates that stops to collect SAGAT data (as many as 3 for up to 2 min in duration) did not have a significant effect on subject performance.

Figure 5. Blue kills by number of stops in trial (x axis: number of stops, 0 to 3).

Figure 6. Blue losses by number of stops in trial (x axis: number of stops, 0 to 3).

Figure 7. Blue kills by duration of stops in trial (x axis: duration of stops, 0, 30, 60, and 120 seconds).

Figure 8. Blue losses by duration of stops in trial (x axis: duration of stops, 0, 30, 60, and 120 seconds).

Discussion

The lack of a significant influence of this procedure on performance probably rests on the fact that relevant schemata are actively used by subjects during the entire freeze period. Under these conditions, the subjects' SA does not have a chance to decay before the simulation is resumed (as was indicated by Experiment 1). Thus SA is fairly intact when the simulation continues, allowing subjects to proceed with their tasks where they left off. Subjectively, subjects did fairly well with this procedure and were able to readily pick up the simulation at the point at which they left off at the time of the freeze, sometimes with the same sentence they had started before the stop. On many occasions subjects could not even remember if they had been stopped to collect SA data during the trial, also indicating a certain lack of intrusiveness.

Summary

The use of a temporary freeze in the simulation to collect SA data is supported by Experiments 1 and 2. Subjects were able to report their SA using this technique for as long as 5 or 6 min without apparent memory decay, and the freeze did not appear to be intrusive on subject performance in the simulation, allaying several concerns about the technique. Although it is always difficult to establish no effect of one variable on another (i.e., prove the null hypothesis), it is reassuring that the finding of no intrusion on performance has been repeated in numerous other studies in which SAGAT was used to evaluate human-machine interface concepts (Bolstad and Endsley, 1990; Endsley, 1989b; Northrop, 1988) if the timing of the stop was unpredictable to the subjects (Endsley, 1988). This finding was also repeated in one study in which pilots flew in a simulation against computer-controlled enemy aircraft (Endsley, 1990d), helping to rule out the possibility that the finding of no effect on performance in the present study could have been the result of pilots on both sides of the simulation being equally influenced.

SAGAT has thus far proven to meet the stated criteria for measurement. In addition to the present studies establishing a level of empirical validity, SAGAT has been shown to have predictive validity (Endsley, 1990b) and content validity (Endsley, 1990d). The method is not without some costs, however, as a detailed analysis of SA requirements is required in order to develop the battery of queries to be administered. On the positive side, this analysis can also be extremely useful for guiding design efforts.

EXAMPLE

To proffer an example of the type of SA information provided by SAGAT, I will present a study that evaluates the use of a three-dimensional display for providing radar information to fighter pilots. The study was conducted in the same high-fidelity, real-time simulator as in the prior two experiments. During the study, each of six experienced pilot subjects flew a fighter sweep mission against a field of four digitally controlled enemy aircraft. Each subject completed five trials in each of five display conditions. The initial starting conditions and order of presentation of the display conditions were randomized. Each trial was halted to collect SAGAT data at a randomly selected time between 2.5 and 5 min into each trial. Immediately following the collection of SAGAT data, the Subjective Workload Assessment Technique (SWAT) was also administered. In this study the trial was not resumed after the collection of SAGAT and SWAT data.

The five display conditions were constructed to evaluate the effect of a three-dimensional (3D) display concept. The 3D display provided a grid representing the ground, above which aircraft targets detected by the radar were presented by a small triangle pointing in the direction of the aircraft's heading and connected to the grid by a line corresponding to the aircraft's altitude. The plane the subject was flying was presented in the center of this display so that each target's location and altitude were spatially represented relative to the subject's ownship. Four rotations of the grid (0, 30, 45, and 60 deg) were investigated (each providing a different perspective on the scene), as shown in Figure 9. At 0-deg rotation the grid portrayed a God's-eye view of the scene, thus depicting a traditional two-dimensional display. In addition, these four views were compared with a two-display configuration in which a God's-eye view display was supplemented with a second profile display that pictorially presented altitude. In all conditions, altitude of the aircraft was also presented numerically next to the aircraft symbols.

Figure 9. Two-dimensional display and three-dimensional display at four levels of rotation. Panels: (a) 2D display; (b) 3D display, 0-deg rotation; (c) 3D display, 30-deg rotation; (d) 3D display, 45-deg rotation; (e) 3D display, 60-deg rotation.

Following the trials, the collected SAGAT and SWAT data were analyzed. Although the final SAGAT battery for fighter aircraft has 40 queries (Endsley, 1990c), only 23 were administered at the time of the test (as the battery was still in development). The results of 4 pertinent queries are presented here. The subjects' reported knowledge of the range, azimuth, altitude, and heading of the enemy targets was compared with the actual values of these elements at the time of the freeze in the simulation. The subjects' perceptions were evaluated as correct or incorrect (using tolerance values that had been determined to be operationally relevant).

The percentage correct across subjects for each condition on each of these four queries is shown in Figure 10. As the scored data are binomial, an arcsine transformation was applied and an ANOVA performed. The results showed that the display type had a significant effect on pilot SA of the aircraft's range, F(4, 288) = 2.575, p < 0.05; azimuth, F(4, 278) = 4.125, p < 0.05; and heading, F(4, 139) = 3.040, p < 0.05; but not altitude, F(4, 94) = 1.83, p > 0.05. The 3D display, when rotated at 45 or 60 deg, provided significantly lower SA on target range than when at 0 or 30 deg and significantly lower SA of target heading than of any of the other three conditions. A rotation of 60 deg also provided significantly lower SA of target azimuth than did the other four conditions. Although the ANOVA for target altitude was not significant, there was an opposite trend for the highly rotated 3D display to provide more SA of target altitude than did rotations of 0 and 30 deg. The SWAT data supported these findings, with significantly higher ratings for workload in the 45- and 60-deg rotations than in the 0- or 30-deg rotations or the two-dimensional display.

Figure 10. Situation awareness of target range, heading, azimuth, and altitude in each display condition. Panels: (a) knowledge of target range; (b) knowledge of target heading; (c) knowledge of target azimuth; (d) knowledge of target altitude. Each panel plots percentage correct for the 2-D display and the 3-D display at 0, 30, 45, and 60 deg.

A major point to be gleaned from this example is the importance of examining the impact of a given design across an operator's SA requirements. The proposed 3D display was designed to provide pilots with better information about aircraft altitude, and it may have done so, although the trend was not significant. However, this came at the expense of SA on location, azimuth, and heading. If a study had been performed that merely examined subjects' knowledge of aircraft altitude, this important information would have been lost. The SAGAT data also provided important diagnostic information that was useful in understanding the exact effect the design concept had on pilot SA in order to make future iterations.

IMPLEMENTATION RECOMMENDATIONS

Several recommendations for SAGAT administration can be made based on previous experience in using the procedure.

Training

An explanation of the SAGAT procedures and detailed instructions for answering each query should be provided to subjects before testing. Several training trials should be conducted in which the simulator is halted frequently to allow subjects ample opportunity to practice responding to the SAGAT queries. Usually three to five samplings are adequate for a subject to become comfortable with the procedure and to clear up any uncertainties in how to answer the queries.

Test Design

SAGAT requires no special test considerations. The same principles of experimental design and administration apply to SAGAT as to any other dependent measure. Measures of subject performance and workload may be collected concurrently with SAGAT, as no ill effect from the insertion of breaks has been shown. To be cautious, however, if performance measures are to be collected simultaneously with SAGAT data, half of the trials should be conducted without any breaks for SAGAT in order to provide a check for this contingency.

Procedures

Subjects should be instructed to attend to their tasks as they normally would, with the SAGAT queries considered as secondary. No displays or other visual aids should be visible while subjects are answering the queries. If subjects do not know or are uncertain about the answer to a given query, they should be encouraged to make their best guess. There is no penalty for guessing, allowing for consideration of the default values or other wisdom gained from experience that subjects normally use in decision making. If subjects do not feel comfortable enough to make a guess, they may go on to the next question. Talking or sharing of information among subjects should not be permitted. If multiple subjects are involved in the same simulation, all subjects should be queried simultaneously and the simulation resumed for all subjects at the same time.

Which Queries to Use

Random selection. As it may be impossible to query subjects about all of their SA requirements in a given stop because of time constraints, a portion of the SA queries may be randomly selected and asked each time. A random sampling provides consistency and statistical validity, thus allowing SA scores to be easily compared across trials, subjects, systems, and scenarios.

Because of attentional narrowing or lack of information, certain questions may seem unimportant to a subject at the time of a given stop. It is important to stress that subjects should attempt to answer all queries anyway. This is because (a) even though they think it unimportant, the information may have at least secondary importance; (b) they may not be aware of information that makes a question very important (e.g., the location of a pop-up aircraft); and (c) if only questions of the highest priority were asked, subjects might be inadvertently provided with artificial cues about the situation that will direct their attention when the simulation is resumed. Therefore, a random selection from a constant set of queries is recommended at each stop.

Experimenter controlled. In certain tests it may be desirable to have some queries omitted because of limitations of the simulation or characteristics of the scenarios. For instance, if the simulation does not incorporate aircraft malfunctions, the query related to this issue may be omitted. In addition, with particular test designs it may be desirable to ensure that certain queries are presented every time. When this occurs, subjects should also be queried on a random sampling from all SA requirements and not only on those related to a specific area of interest to the evaluation being conducted. This is because of the ability of subjects to shift attention to the information on which they know they will be tested. What may appear to be an improvement in SA in one area may be merely a shift of attention from another area. When the SAGAT queries cover all of the SA requirements, no such artificial cuing can occur.
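The selection rules described above (a random draw from a constant battery, optional omissions, and optional always-presented queries) can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `select_queries`, its parameters, and the query names are hypothetical and are not part of any SAGAT software.

```python
import random

def select_queries(battery, n_per_stop, always_ask=(), omitted=(), rng=None):
    """Choose the queries to present at one freeze (illustrative sketch).

    battery    -- the constant set of SA queries (names are hypothetical)
    n_per_stop -- how many queries fit in the time allowed for the stop
    always_ask -- queries the experimenter wants presented at every stop
    omitted    -- queries the simulation cannot support
    """
    rng = rng or random.Random()
    omitted = set(omitted)
    fixed = [q for q in always_ask if q not in omitted]
    # The random portion is drawn from the whole remaining battery, not
    # only from queries related to the area under evaluation.
    pool = [q for q in battery if q not in omitted and q not in fixed]
    n_random = max(0, min(n_per_stop - len(fixed), len(pool)))
    chosen = fixed + rng.sample(pool, n_random)
    rng.shuffle(chosen)  # presentation order is randomized as well
    return chosen

battery = ["range", "azimuth", "altitude", "heading", "airspeed", "malfunctions"]
stop_queries = select_queries(battery, 4, always_ask=["altitude"],
                              omitted=["malfunctions"], rng=random.Random(1))
```

Passing a seeded `random.Random` makes a test session reproducible while keeping the selection unpredictable to subjects.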

When to Collect SAGAT Data

It is recommended that the timing of each freeze for SAGAT administration be randomly determined and unpredictable so that subjects cannot prepare for queries in advance. If the freeze occurrence is covertly associated with the occurrence of specific events or at specific times across trials, prior studies have shown that the subjects will be able to figure this out (Endsley, 1988), allowing them to prepare for queries or actually improve SA through the artificiality of the freeze cues. An informal rule has been to ensure that no freezes occur earlier than 3 min into a trial, to allow subjects to build up a picture of the situation, and that no two freezes occur within 1 min of each other.
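One way to generate freeze times that satisfy the informal rule above is sketched below. It assumes times are measured in seconds on the simulation clock and that the trial length is known in advance; the function name `freeze_times` and its parameters are hypothetical. The sketch draws sorted random offsets within the available slack and then spreads them out by the minimum gap, so every valid schedule is possible and none is predictable.

```python
import random

def freeze_times(n_stops, trial_len_s, earliest_s=180.0, min_gap_s=60.0, rng=None):
    """Return n_stops random freeze times (seconds into the trial).

    Assumed constraints, taken from the informal rule in the text:
    no freeze before earliest_s, and no two freezes closer than min_gap_s.
    """
    rng = rng or random.Random()
    if n_stops < 1:
        raise ValueError("need at least one stop")
    # Time left over once the mandatory gaps between stops are reserved.
    slack = (trial_len_s - earliest_s) - (n_stops - 1) * min_gap_s
    if slack < 0:
        raise ValueError("trial too short for the requested number of stops")
    offsets = sorted(rng.uniform(0.0, slack) for _ in range(n_stops))
    # Adding i * min_gap_s to the i-th sorted offset guarantees the spacing.
    return [earliest_s + off + i * min_gap_s for i, off in enumerate(offsets)]

# Example: three stops in a 15-min trial.
times = freeze_times(3, 15 * 60, rng=random.Random(42))
```

Because the times are drawn independently for each trial, the activities under way at each stop are sampled at random, which is the property the recommendation relies on.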

The result of this approach is that the activities occurring at the time of the stops will be randomly selected. Some stops may occur during very important activities that are of interest to the experimenter, others when no critical activities are occurring. This gives a good sampling of the subjects' SA in a variety of situations. During analysis, the experimenter may want to stratify the data to take these variations into account.

How Much SAGAT Data to Collect

The number of trials necessary will depend on the variability present in the dependent variables being collected and the number of data samples taken during a trial. This will vary with different subjects and designs, but between 30 and 60 samplings per SA query with each design option have previously been adequate in a within-subjects test design.
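As a rough planning aid, the 30-to-60-samplings guideline can be turned into a trial count. The arithmetic below is a back-of-the-envelope sketch, not a procedure from the studies: it assumes queries are drawn at random with equal probability from the battery at each stop, and the function name `trials_needed` and its parameters are hypothetical.

```python
import math

def trials_needed(target_samples, stops_per_trial, queries_per_stop, battery_size):
    """Estimate trials per design option so that each query is expected to be
    sampled target_samples times (assumes equal-probability random selection)."""
    p_asked = min(1.0, queries_per_stop / battery_size)  # chance a query is asked at a stop
    expected_per_trial = stops_per_trial * p_asked
    return math.ceil(target_samples / expected_per_trial)

# Hypothetical plan: 3 stops per trial, 10 of 40 queries asked at each stop,
# aiming for 30 samplings of every query.
n_trials = trials_needed(30, 3, 10, 40)
```

The estimate is an expectation only; queries that subjects skip, or variability in the dependent measures, may call for more trials.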

Multiple SAGAT stops may be taken within each trial. There is no known limit to the number of times the simulator can be frozen during a given trial. In Experiment 2, no ill effects were found from as many as three stops during a 15-min trial. In general, it is recommended that a stop last until a certain amount of time has elapsed, and then the trial is resumed, regardless of how many questions have been answered. Stops as long as 2 min in duration were used with no undue difficulty or effect on subsequent performance. In Experiment 1, stops as long as 5 min were shown to allow subjects access to SA information without memory decay.

Data Collection

The simulator computer should be programmed to collect objective data corresponding to the queries at the time of each freeze. Considering that some queries will pertain to higher-level SA requirements that may be unavailable in the computer, an expert judgment of the correct answer may be made by an experienced observer who is privy to all information, reflecting the SA of a person with perfect knowledge. A comparison of subjects' perceptions of the situation (as input into SAGAT) with the actual status of each variable (as collected per the simulator computer and expert judgment) results in an objective measure of subject SA. Questions asked of the subject but not answered should be considered incorrect. No evaluation should be made of questions not asked during a given stop.
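The bookkeeping rules in this paragraph (asked but unanswered counts as incorrect; not asked is not evaluated) can be sketched as follows. The data layout, the function names `score_stop` and `percent_correct`, and the default exact-match comparison are all assumptions made for illustration.

```python
def score_stop(asked, answers, truth, is_correct=lambda given, actual: given == actual):
    """Score one freeze.

    asked   -- queries presented at this stop
    answers -- dict of query -> subject's answer (answered queries only)
    truth   -- dict of query -> actual value from the simulator or expert judgment
    Queries that were not asked do not appear in the result at all.
    """
    scores = {}
    for query in asked:
        if query not in answers:
            scores[query] = False  # asked but not answered: scored incorrect
        else:
            scores[query] = bool(is_correct(answers[query], truth[query]))
    return scores

def percent_correct(stop_scores, query):
    """Percentage correct for one query over the stops at which it was asked."""
    marks = [s[query] for s in stop_scores if query in s]
    return 100.0 * sum(marks) / len(marks) if marks else None

truth = {"weapon": "missile", "altitude": 20000, "heading": 90}
first = score_stop(["weapon", "altitude"], {"weapon": "missile"}, truth)
second = score_stop(["weapon", "heading"], {"weapon": "gun", "heading": 90}, truth)
```

Here `first` marks the unanswered altitude query as incorrect, while heading is simply absent from `first` because it was not asked at that stop.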

It is recommended that the answer to each query be scored as correct or incorrect based on whether or not it falls into an acceptable tolerance band around the actual value. For example, it may be acceptable for a subject to be 10 miles per hour off of actual ground speed. This method of scoring poses less difficulty than dealing with absolute error (see Marshak et al., 1987). A tabulation of the frequency of correctness can then be made within each test condition for each SA element. Because data scored as correct or incorrect are binomial, the conditions for analysis of variance are violated. A correction factor, Y' = arcsine(Y), can be applied to binomial data, which allows analysis of variance to be used. In addition, a chi-square, Cochran's Q, or binomial t test (depending on the test design) can be used to evaluate the statistical significance of differences in SA between test conditions.
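A minimal sketch of these scoring and analysis steps is given below, assuming numeric answers and hypothetical function names. The transformation follows the formula as stated in the text, Y' = arcsine(Y); the `use_sqrt` option is an added assumption that gives the square-root form found in many statistics texts. The chi-square function returns only the statistic and degrees of freedom for a table of correct and incorrect counts, and assumes no row or column total is zero.

```python
import math

def within_tolerance(answer, actual, tolerance):
    """Correct if the answer lies in the band actual +/- tolerance."""
    return abs(answer - actual) <= tolerance

def arcsine_transform(proportion, use_sqrt=False):
    """Y' = arcsine(Y) as stated in the text; use_sqrt=True applies
    arcsine(sqrt(Y)) instead (an assumption, not from the source)."""
    value = math.sqrt(proportion) if use_sqrt else proportion
    return math.asin(value)

def chi_square(table):
    """Pearson chi-square statistic and df for a table of counts,
    e.g. rows = test conditions, columns = (correct, incorrect)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df

# Hypothetical example: ground speed reported as 452 when it was 460,
# scored with a tolerance of 10 miles per hour.
ok = within_tolerance(452, 460, 10)
# Hypothetical counts of (correct, incorrect) answers in two display conditions.
stat, df = chi_square([[20, 10], [10, 20]])
```

The statistic would then be compared against a chi-square table at the chosen significance level; which test is appropriate depends on the test design, as noted above.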

Limitations and Applicability for Use

SAGAT has primarily been used within the confines of high-fidelity and medium-fidelity part-task simulations. This provides experimenter control over freezes and data collection without any danger to the subject or processes involved in the domain. It may be possible to use the technique during actual task performance if multiple operators are present to ensure safety. For example, it might be possible to verbally query one pilot in flight while another assumes flight control. Such an endeavor should be undertaken with extreme caution, however, and may not be appropriate for certain domains.

A recent effort (Sheehy, Davey, Fiegel, and Guo, 1993) employed an adaptation of this technique by making videotapes of an ongoing situation in a nuclear power plant control room. These tapes were then replayed to naive subjects, with freezes for SAGAT queries employed. It is not known how different the SA of subjects passively viewing a situation may be from subjects actually engaged in task performance; however, this approach may yield some useful data.

Most known uses of SAGAT have involved fighter aircraft simulations. In general, it can be employed in any domain in which a reasonable simulation of task performance exists and an analysis of SA requirements has been made in order to develop the queries. SAGAT queries have been developed for advanced bomber aircraft (Endsley, 1990a) and recently for en route air traffic control (Endsley and Rodgers, 1994). Several researchers have begun to use the technique in evaluating operator SA in nuclear control room studies (Hogg, Torralba, and Volden, 1993; Sheehy et al., 1993). The technique has also been employed in a simulated control task to study adaptive automation (Carmody and Gluckman, 1993). Several new efforts are currently under way to employ the technique in medical decision making (E. Swirsky, personal communication, March 1994) and helicopters (C. Prince, personal communication, November 1993). Potentially it could also be used in studies involving automobile driving, supervisory control of manufacturing systems, teleoperations, and operation of other types of dynamic systems.

CONCLUSION

The use of SA metrics in evaluating the efficiency of design concepts, training programs, and other potential interventions is critical. Without careful evaluation, it will be difficult to make much progress in understanding the factors that influence SA, hindering human factors and engineering efforts. Several techniques have been reviewed here, each with several advantages and disadvantages. The use of one technique, SAGAT, was explored extensively, and two studies were presented that investigated the validity of the measure. It has to date proven a useful and viable technique for measuring SA in numerous contexts. Rigorous validity testing is needed for other techniques proposed for SA measurement. Unless the veracity of these techniques is established, interpreting their results will be hazardous.

In addition to establishing validity of measures of SA, more research is needed regarding the task of SA measurement itself. For instance, as pointed out by Pew (1991), no criteria exist at this time that establish the level of SA required for successful performance. Does an operator need to have SA that is 100% perfect (in both completeness and accuracy), or is some lesser amount sufficient for good performance? This is a complex issue. I would assert that SA can be seen as a factor that increases the probability of good performance but that it does not necessarily guarantee it, as other factors also come into effect (e.g., decision making, workload, performance execution, system capabilities, and SA of others in some cases). How much SA one needs, therefore, becomes a matter of how much probability of error one is willing to accept. Perhaps such criteria should more properly and usefully be established as the level of SA needed on each subcomponent at different periods in time. Some guidelines will eventually need to be specified if there is a desire to certify new system designs on the basis of SA.

Overall, the beginnings of the field of SA measurement have been established, allowing researchers in the area of SA to proceed from mainly speculation and anecdotal information to solid empirical findings. The tools for conducting further basic research on the SA construct and developing better system designs are available, allowing needed research on SA in a variety of arenas.

ACKNOWLEDGMENTS

This research was sponsored by the Northrop Corporation. I thank the many people at Northrop, both past and present, who contributed to this effort. In particular, I thank Vic Vizcarra for his assistance in conducting the studies, Paul Cayley and Gerry Armstrong for their programming efforts, and Mike Majors and Del Jacobs for their support in developing an SA research program. In addition, I am indebted to Mark Chignell, who provided helpful support throughout the development of the methodology. I also thank Mark Rodgers, Tom English, Dick Gilson, and several anonymous reviewers for their comments on an earlier version of this paper.


REFERENCES

Bolstad, C. A., and Endsley, M. R. (1990). Single versus dual scale range display investigation (NOR DOC 90-90). Hawthorne, CA: Northrop Corporation.

Carmody, M. A., and Gluckman, J. P. (1993). Task-specific effects of automation and automation failure on performance, workload, and situational awareness. In R. S. Jensen and D. Neumeister (Eds.), Proceedings of the Seventh International Symposium on Aviation Psychology (pp. 167-171). Columbus, OH: Ohio State University, Department of Aviation.

Chase, W. G., and Simon, H. A. (1973). Perceptions in chess. Cognitive Psychology, 4, 55-81.

Cowan, N. (1988). Evolving conceptions of memory storage, selective attention, and their mutual constraints within the human information-processing system. Psychological Bulletin, 104, 163-191.

Endsley, M. R. (1987). SAGAT: A methodology for the measurement of situation awareness (NOR DOC 87-83). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1988). Situation awareness global assessment technique (SAGAT). In Proceedings of the National Aerospace and Electronics Conference (NAECON) (pp. 789-795). New York: IEEE.

Endsley, M. R. (1989a). Final report: Situation awareness in an advanced strategic mission (NOR DOC 89-32). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1989b). Tactical simulation 3 test report: Addendum 1, situation awareness evaluations (81203033R). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1990a). A methodology for the objective measurement of situation awareness. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 11-19). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Endsley, M. R. (1990b). Predictive utility of an objective measure of situation awareness. In Proceedings of the Human Factors Society 34th Annual Meeting (pp. 41-45). Santa Monica, CA: Human Factors and Ergonomics Society.

Endsley, M. R. (1990c). Situation awareness global assessment technique (SAGAT): Air-to-air tactical version user guide (NOR DOC 89-58, rev. A). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1990d). Situation awareness in dynamic human decision making: Theory and measurement. Unpublished doctoral dissertation, University of Southern California, Los Angeles, CA.

Endsley, M. R., and Rodgers, M. D. (1994). Situation awareness global assessment technique (SAGAT): En route air traffic control version user's guide (Draft). Lubbock: Texas Tech University.

Fracker, M. L. (1990). Attention gradients in situation awareness. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 61-610). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Herrmann, D. J. (1984). Questionnaires about memory. In J. E. Harris and P. E. Morris (Eds.), Everyday memory, action, and absent-mindedness (pp. 133-151). London: Academic.

Hogg, D. N., Torralba, B., and Volden, F. S. (1993). A situation awareness methodology for the evaluation of process control systems: Studies of feasibility and the implication of use (1993-03-05). Storefjell, Norway: OECD Halden Reactor Project.

Hughes, E. R., Hassoun, J. A., and Ward, F. (1990). Advanced tactical fighter (ATF) simulation support: Final report (CSEF-TR-ATF-90-91). Wright-Patterson Air Force Base, OH: Air Force Systems Command, Aeronautical Systems Division.

Jacoby, L. L., and Dallas, M. (1981). On the relationship between autobiographical memory and perceptual learning. Journal of Experimental Psychology: General, 110, 306-340.

Kellog, R. T. (1980). Is conscious attention necessary for long-term storage? Journal of Experimental Psychology: Human Learning and Memory, 6, 379-390.

Manktelow, K., and Jones, J. (1987). Principles from the psychology of thinking and mental models. In M. M. Gardiner and B. Christie (Eds.), Applying cognitive psychology to user-interface design (pp. 83-117). Chichester, England: Wiley.

Marshak, W. P., Kuperman, G., Ramsey, E. G., and Wilson, D. (1987). Situational awareness in map displays. In Proceedings of the Human Factors Society 31st Annual Meeting (pp. 533-535). Santa Monica, CA: Human Factors and Ergonomics Society.

McDonnell Douglas Aircraft Corporation. (1982). Advanced medium-range air-to-air missile operational utility evaluation (AMRAAM OUE) test final report. St. Louis, MO: Author.

Mogford, R. H., and Tansley, B. W. (1991). The importance of the air traffic controller's mental model. Presented at the Human Factors Society of Canada Annual Meeting, Canada.

Morton, J. (1969). Interaction of information in word recognition. Psychological Review, 76, 165-178.

Nisbett, R. E., and Wilson, T. D. (1977). Telling more than we can know: Verbal reports on mental processes. Psychological Review, 84, 231-259.

Norman, D. A. (1968). Towards a theory of memory and attention. Psychological Review, 75, 522-536.

Northrop Corporation. (1988). Tactical simulation 2 test report: Addendum 1, situation awareness test results. Hawthorne, CA: Author.

Pew R W (1991) Defining and measuring situation aware-ness in the commercial aircraft cockpit In Proceedings ofthe Conference on Challenges in Aviation Human FactorsThe National Plan (pp 30-31) Washington DC AmericanInstitute of Aeronautics and Astronautics

Sarter N B and Woods D D (1991) Situation awareness Acritical but ill-defined phenomenon International Journalof Aviation Psychology I 45-57

Selcon S J and Taylor R M (1990) Evaluation of the situ-ational awareness rating technique (SART) as a tool foraircrew systems design In Situational awareness in aero-space operations (AGARD-CP-478 pp 51-58) Neuilly-Sur-Seine France NATO-Advisory Group for AerospaceResearch and Development

Sheehy E J Davey E C Fiegel T T and Guo K Q (1993April) Usability benchmark for CANDU annunciation-lessons learned Presented at the ANS Topical Meeting onNuclear Plant Instrumentation Control and Man-MachineInterface Technology Oak Ridge TN

Smolensky M W (1993) Toward the physiological measure-ment of situation awareness The case for eye movementmeasurements In Proceedings of the Human Factors andErgonomics Society 37th Annual Meeting (p 41) SantaMonica CA Human Factors and Ergonomics Society

Taylor R M (1990) Situational awareness rating technique(SART) The development of a tool for aircrew systemsdesign In Situational awareness in aerospace operations(AGARD-CP-478 pp 31-317) Neuilly-Sur-Seine France

84-March 1995

NATO-Advisory Group for Aerospace Research and De-velopment

Tulving E (1985) How many memory systems are thereAmerican Psychologist 40 385-398

Venturino M Hamilton W 1 and Dvorchak S R (1990)Performance-based measures of merit for tactical situationawareness In Situation awareness in aerospace operations(AGARD-CP-478 pp 41-45) Neuilly-Sur-Seine FranceNATO-Advisory Group for Aerospace Research and Demiddotvelopment

HUMAN FACTORS

Vidulich M A (1989) The use of judgment matrices in sub-jective workload assessment The subjective workloaddominance (SWORD) technique In Proceedings of the Hu-man Factors Society 33rd Annual Meeting (pp 1406-1410)Santa Monica CA Human Factors and Ergonomics Soci-ety

Date received January 26 1993Date accepted October 4 1994



of 30 trials were conducted for each duration of stop condition, and a total of 30 trials were conducted for each frequency of stop condition. These conditions were compared with 30 trials in which no stops occurred. Conditions were administered in random order, blocked by team. Pilot performance was collected as the dependent measure.

Facilities. This study used the same piloted, multiple-engagement simulator as in Experiment 1.

Subjects. Twenty-five experienced former military fighter pilots participated in this test. (Four of the subjects participated on more than one team.) The mean subject age was 45.16 years (range of 32 to 68). Subjects had an average of 3582 h (range of 975 to 7045) and an average of 16.9 years (range of 6 to 27) of military flight experience. Of the 25 subjects, 14 had combat experience.

Results

Pilot performance under each of the conditions was analyzed. Pilot performance measures included the number of blue team kills (red team losses) and the number of blue team losses (red team kills). Chi-square tests were performed on the number of blue team kills, χ²(4, N = 120) = 5.973, p > 0.05, and blue team losses, χ²(2, N = 120) = 0.05, p > 0.05, comparing between trials in which there were stops to collect SAGAT data and those in which there were no stops, as depicted in Figures 3 and 4. There was no significant difference in pilot performance on either measure.

[Figure 3. Blue kills: Trials with stops versus trials without stops. Horizontal axis: Number of Kills.]

[Figure 4. Blue losses: Trials with stops versus trials without stops. Horizontal axis: Number of Losses.]

Analysis of variance was used to evaluate the effect of number of stops and duration of stops on each of the two performance measures: blue team kills and blue team losses. The number of stops during the trial had no significant effect on either pilot performance measure, F(3, 116) = 1.73, p > 0.05, β = 0.55, and F(3, 116) = 0.20, p > 0.05, β > 0.60, respectively, as shown in Figures 5 and 6. The duration of the stop also did not significantly affect either performance measure, F(3, 116) = 2.16, p > 0.05, β = 0.40, and F(3, 116) = 0.77, p > 0.05, β > 0.60, respectively, as depicted in Figures 7 and 8. In viewing the data, no linear trend appears to be present in either case that would indicate a progressively worse (albeit nonsignificant) effect of increasing number or duration of stops. This indicates that stops to collect SAGAT data (as many as 3 for up to 2 min in duration) did not have a significant effect on subject performance.

[Figure 5. Blue kills by number of stops in trial. Horizontal axis: Number of Stops (0, 1, 2, 3).]

[Figure 6. Blue losses by number of stops in trial. Horizontal axis: Number of Stops (0, 1, 2, 3).]

[Figure 8. Blue losses by duration of stops in trial. Horizontal axis: Duration of Stops in seconds (0, 30, 60, 120).]
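As a rough illustration of the chi-square comparison between trials with and without stops, the following sketch computes a Pearson chi-square statistic for a contingency table of trial counts. The function name and the example counts are hypothetical and are not the data from this experiment; the sketch also assumes that no row or column total is zero.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts.

    `table` is a list of rows, e.g. one row per condition (stops, no stops)
    and one column per outcome (0, 1, 2 losses). Assumes nonzero marginals.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of condition and outcome.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical counts: 90 trials with stops, 30 without, by number of losses.
example_table = [
    [40, 35, 15],  # trials with stops: 0, 1, 2 losses
    [14, 11, 5],   # trials without stops: 0, 1, 2 losses
]
statistic, dof = chi_square_statistic(example_table)
# With 2 degrees of freedom the 0.05 critical value is 5.991; a statistic
# below it gives no evidence of a difference between the two kinds of trial.
no_evidence_of_difference = statistic < 5.991
```

A table with two rows and three outcome columns has 2 degrees of freedom, matching the test reported for blue team losses; a five-column table of kills would give 4.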

Discussion

The lack of a significant influence of this procedure on performance probably rests on the fact that relevant schemata are actively used by subjects during the entire freeze period. Under these conditions the subjects' SA does not have a chance to decay before the simulation is resumed (as was indicated by Experiment 1). Thus, SA is fairly intact when the simulation continues, allowing subjects to proceed with their tasks where they left off. Subjectively, subjects did fairly well with this procedure and were able to readily pick up the simulation at the point at which they left off at the time of the freeze, sometimes with the same sentence they had started before the stop. On many occasions subjects could not even remember if they had been stopped to collect SA data during the trial, also indicating a certain lack of intrusiveness.

[Figure 7. Blue kills by duration of stops in trial. Horizontal axis: Duration of Stops in seconds (0, 30, 60, 120).]

Summary

The use of a temporary freeze in the simulation to collect SA data is supported by Experiments 1 and 2. Subjects were able to report their SA using this technique for as long as 5 or 6 min without apparent memory decay, and the freeze did not appear to be intrusive on subject performance in the simulation, allaying several concerns about the technique. Although it is always difficult to establish no effect of one variable on another (i.e., prove the null hypothesis), it is reassuring that the finding of no intrusion on performance has been repeated in numerous other studies in which SAGAT was used to evaluate human-machine interface concepts (Bolstad and Endsley, 1990; Endsley, 1989b; Northrop, 1988) if the timing of the stop was unpredictable to the subjects (Endsley, 1988). This finding was also repeated in one study in which pilots flew in a simulation against computer-controlled enemy aircraft (Endsley, 1990d), helping to rule out the possibility that the finding of no effect on performance in the present study could have been the result of pilots on both sides of the simulation being equally influenced.

SAGAT has thus far proven to meet the stated criteria for measurement. In addition to the present studies establishing a level of empirical validity, SAGAT has been shown to have predictive validity (Endsley, 1990b) and content validity (Endsley, 1990d). The method is not without some costs, however, as a detailed analysis of SA requirements is required in order to develop the battery of queries to be administered. On the positive side, this analysis can also be extremely useful for guiding design efforts.

EXAMPLE

To proffer an example of the type of SA information provided by SAGAT, I will present a study that evaluates the use of a three-dimensional display for providing radar information to fighter pilots. The study was conducted in the same high-fidelity, real-time simulator as in the prior two experiments. During the study each of six experienced pilot subjects flew a fighter sweep mission against a field of four digitally controlled enemy aircraft. Each subject completed five trials in each of five display conditions. The initial starting conditions and order of presentation of the display conditions were randomized. Each trial was halted to collect SAGAT data at a randomly selected time between 2.5 and 5 min into each trial. Immediately following the collection of SAGAT data, the Subjective Workload Assessment Technique (SWAT) was also administered. In this study the trial was not resumed after the collection of SAGAT and SWAT data.

The five display conditions were constructed to evaluate the effect of a three-dimensional (3D) display concept. The 3D display provided a grid representing the ground, above which aircraft targets detected by the radar were presented by a small triangle pointing in the direction of the aircraft's heading and connected to the grid by a line corresponding to the aircraft's altitude. The plane the subject was flying was presented in the center of this display so that each target's location and altitude were spatially represented relative to the subject's ownship. Four rotations of the grid (0, 30, 45, and 60 deg) were investigated (each providing a different perspective on the scene), as shown in Figure 9. At 0-deg rotation, the grid portrayed a God's-eye view of the scene, thus depicting a traditional two-dimensional display. In addition, these four views were compared with a two-display configuration in which a God's-eye view display was supplemented with a second profile display that pictorially presented altitude. In all conditions, altitude of the aircraft was also presented numerically next to the aircraft symbols.

Following the trials, the collected SAGAT and SWAT data were analyzed. Although the final SAGAT battery for fighter aircraft has 40 queries (Endsley, 1990c), only 23 were administered at the time of the test (as the battery was still in development). The results of 4 pertinent queries are presented here. The subjects' reported knowledge of the range, azimuth, altitude, and heading of the enemy targets was compared with the actual values of these elements at the time of the freeze in the simulation. The subjects' perceptions were evaluated as correct or incorrect (using tolerance values that had been determined to be operationally relevant).

[Figure 9. Two-dimensional display and three-dimensional display at four levels of rotation. Panels: 2D display; 3D display at 0-deg, 30-deg, 45-deg, and 60-deg rotation.]

The percentage correct across subjects for each condition on each of these four queries is shown in Figure 10. As the scored data are binomial, an arcsine transformation was applied and an ANOVA performed. The results showed that the display type had a significant effect on pilot SA of the aircraft's range, F(4, 288) = 2.575, p < 0.05, azimuth, F(4, 278) = 4.125, p < 0.05, and heading, F(4, 139) = 3.040, p < 0.05, but not altitude, F(4, 94) = 1.83, p > 0.05. The 3D display, when rotated at 45 or 60 deg, provided significantly lower SA on target range than when at 0 or 30 deg and significantly lower SA of target heading than of any of the other three conditions. A rotation of 60 deg also provided significantly lower SA of target azimuth than did the other four conditions. Although the ANOVA for target altitude was not significant, there was an opposite trend for the highly rotated 3D display to provide more SA of target altitude than did rotations of 0 and 30 deg. The SWAT data supported these findings, with significantly higher ratings for workload in the 45- and 60-deg rotations than in the 0- or 30-deg rotations or the two-dimensional display.

[Figure 10. Situation awareness of target range, heading, azimuth, and altitude in each display condition. Panels: (a) Knowledge of Target Range, (b) Knowledge of Target Heading, (c) Knowledge of Target Azimuth, (d) Knowledge of Target Altitude. Each panel plots percentage correct for the 2-D display and the 3-D display at 0, 30, 45, and 60 deg.]

A major point to be gleaned from this example is the importance of examining the impact of a given design across an operator's SA requirements. The proposed 3D display was designed to provide pilots with better information about aircraft altitude, and it may have done so, although the trend was not significant. However, this came at the expense of SA on location, azimuth, and heading. If a study had been performed that merely examined subjects' knowledge of aircraft altitude, this important information would have been lost. The SAGAT data also provided important diagnostic information that was useful in understanding the exact effect the design concept had on pilot SA in order to make future iterations.

IMPLEMENTATION RECOMMENDATIONS

Several recommendations for SAGAT administration can be made based on previous experience in using the procedure.

Training

An explanation of the SAGAT procedures and detailed instructions for answering each query should be provided to subjects before testing. Several training trials should be conducted in which the simulator is halted frequently to allow subjects ample opportunity to practice responding to the SAGAT queries. Usually three to five samplings are adequate for a subject to become comfortable with the procedure and to clear up any uncertainties in how to answer the queries.

Test Design

SAGAT requires no special test considerations. The same principles of experimental design and administration apply to SAGAT as to any other dependent measure. Measures of subject performance and workload may be collected concurrently with SAGAT, as no ill effect from the insertion of breaks has been shown. To be cautious, however, if performance measures are to be collected simultaneously with SAGAT data, half of the trials should be conducted without any breaks for SAGAT in order to provide a check for this contingency.

Procedures

Subjects should be instructed to attend to their tasks as they normally would, with the SAGAT queries considered as secondary. No displays or other visual aids should be visible while subjects are answering the queries. If subjects do not know or are uncertain about the answer to a given query, they should be encouraged to make their best guess. There is no penalty for guessing, allowing for consideration of the default values or other wisdom gained from experience that subjects normally use in decision making. If subjects do not feel comfortable enough to make a guess, they may go on to the next question. Talking or sharing of information among subjects should not be permitted. If multiple subjects are involved in the same simulation, all subjects should be queried simultaneously and the simulation resumed for all subjects at the same time.

Which Queries to Use

Random selection. As it may be impossible to query subjects about all of their SA requirements in a given stop because of time constraints, a portion of the SA queries may be randomly selected and asked each time. A random sampling provides consistency and statistical validity, thus allowing SA scores to be easily compared across trials, subjects, systems, and scenarios.

Because of attentional narrowing or lack of information, certain questions may seem unimportant to a subject at the time of a given stop. It is important to stress that subjects should attempt to answer all queries anyway. This is because (a) even though they think it unimportant, the information may have at least secondary importance; (b) they may not be aware of information that makes a question very important (e.g., the location of a pop-up aircraft); and (c) if only questions of the highest priority were asked, subjects might be inadvertently provided with artificial cues about the situation that will direct their attention when the simulation is resumed. Therefore, a random selection from a constant set of queries is recommended at each stop.

Experimenter controlled. In certain tests it may be desirable to have some queries omitted because of limitations of the simulation or characteristics of the scenarios. For instance, if the simulation does not incorporate aircraft malfunctions, the query related to this issue may be omitted. In addition, with particular test designs it may be desirable to ensure that certain queries are presented every time. When this occurs, subjects should also be queried on a random sampling from all SA requirements and not only on those related to a specific area of interest to the evaluation being conducted. This is because of the ability of subjects to shift attention to the information on which they know they will be tested. What may appear to be an improvement in SA in one area may be merely a shift of attention from another area. When the SAGAT queries cover all of the SA requirements, no such artificial cuing can occur.
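The selection rules in this section can be sketched in a few lines. The query names, the function name, and its parameters below are hypothetical and serve only as an illustration; an actual query set would come from an SA requirements analysis for the domain. The sketch drops queries the simulation cannot support, always includes any experimenter-mandated queries, and fills the remainder of the stop with a random sample from the rest of the constant set.

```python
import random

# Hypothetical constant set of queries (illustrative names only).
QUERY_POOL = [
    "ownship altitude",
    "ownship heading",
    "target range",
    "target azimuth",
    "target altitude",
    "target heading",
    "fuel state",
    "aircraft malfunctions",
]


def select_queries(pool, n_per_stop, always_ask=(), omit=(), rng=None):
    """Choose the queries for one stop.

    Queries in `omit` are never asked, queries in `always_ask` are asked at
    every stop, and the rest of the stop is a random sample of what remains.
    """
    if rng is None:
        rng = random.Random()
    usable = [q for q in pool if q not in omit]
    fixed = [q for q in usable if q in always_ask]
    remaining = [q for q in usable if q not in always_ask]
    n_random = max(0, min(n_per_stop - len(fixed), len(remaining)))
    return fixed + rng.sample(remaining, n_random)


# Example: the simulation has no malfunctions, and target range is of
# special interest, so it is asked at every stop alongside a random sample.
queries = select_queries(
    QUERY_POOL,
    n_per_stop=4,
    always_ask=("target range",),
    omit=("aircraft malfunctions",),
)
```

Sampling the remainder from the whole usable set, rather than only from queries near the area of interest, is what keeps subjects from learning which information will be tested.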

When to Collect SAGAT Data

It is recommended that the timing of each freeze for SAGAT administration be randomly determined and unpredictable so that subjects cannot prepare for queries in advance. If the freeze occurrence is covertly associated with the occurrence of specific events or at specific times across trials, prior studies have shown that the subjects will be able to figure this out (Endsley, 1988), allowing them to prepare for queries or actually improve SA through the artificiality of the freeze cues. An informal rule has been to ensure that no freezes occur earlier than 3 min into a trial, to allow subjects to build up a picture of the situation, and that no two freezes occur within 1 min of each other.

The result of this approach is that the activities occurring at the time of the stops will be randomly selected. Some stops may occur during very important activities that are of interest to the experimenter, others when no critical activities are occurring. This gives a good sampling of the subjects' SA in a variety of situations. During analysis the experimenter may want to stratify the data to take these variations into account.
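A minimal sketch of the informal timing rule follows, assuming freeze times are drawn uniformly over the allowable part of the trial and redrawn until the spacing constraint holds. The function name, parameter names, and the use of rejection sampling are illustrative assumptions; the procedure described here requires only that freezes be random, unpredictable, no earlier than 3 min into the trial, and at least 1 min apart.

```python
import random


def draw_freeze_times(n_stops, trial_len_s, earliest_s=180.0, min_gap_s=60.0,
                      rng=None, max_tries=10000):
    """Draw sorted freeze times (seconds into the trial) at random.

    No freeze occurs before `earliest_s`, and consecutive freezes are at
    least `min_gap_s` apart. Candidate schedules are redrawn until one
    satisfies the spacing constraint.
    """
    if rng is None:
        rng = random.Random()
    if n_stops == 0:
        return []
    if earliest_s + (n_stops - 1) * min_gap_s > trial_len_s:
        raise ValueError("trial is too short for the requested stops")
    for _ in range(max_tries):
        times = sorted(rng.uniform(earliest_s, trial_len_s) for _ in range(n_stops))
        if all(later - earlier >= min_gap_s for earlier, later in zip(times, times[1:])):
            return times
    raise RuntimeError("no valid schedule found; relax the constraints")


# Example: three stops in a 15-min (900-s) trial.
schedule = draw_freeze_times(3, 900.0)
```

Because the times are drawn independently of scenario events, the activities under way at each stop are themselves a random sample, which is what permits stratifying the data by activity afterward.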

How Much SAGAT Data to Collect

The number of trials necessary will depend on the variability present in the dependent variables being collected and the number of data samples taken during a trial. This will vary with different subjects and designs, but between 30 and 60 samplings per SA query with each design option have previously been adequate in a within-subjects test design.

Multiple SAGAT stops may be taken within each trial. There is no known limit to the number of times the simulator can be frozen during a given trial. In Experiment 2 no ill effects were found from as many as three stops during a 15-min trial. In general, it is recommended that a stop last until a certain amount of time has elapsed, and then the trial is resumed, regardless of how many questions have been answered. Stops as long as 2 min in duration were used with no undue difficulty or effect on subsequent performance. In Experiment 1 stops as long as 5 min were shown to allow subjects access to SA information without memory decay.

Data Collection

The simulator computer should be programmed to collect objective data corresponding to the queries at the time of each freeze. Considering that some queries will pertain to higher-level SA requirements that may be unavailable in the computer, an expert judgment of the correct answer may be made by an experienced observer who is privy to all information, reflecting the SA of a person with perfect knowledge. A comparison of subjects' perceptions of the situation (as input into SAGAT) with the actual status of each variable (as collected per the simulator computer and expert judgment) results in an objective measure of subject SA. Questions asked of the subject but not answered should be considered incorrect. No evaluation should be made of questions not asked during a given stop.

It is recommended that answers to each query be scored as correct or incorrect based on whether or not it falls into an acceptable tolerance band around the actual value. For example, it may be acceptable for a subject to be 10 miles per hour off of actual ground speed. This method of scoring poses less difficulty than dealing with absolute error (see Marshak et al., 1987). A tabulation of the frequency of correctness can then be made within each test condition for each SA element. Because data scored as correct or incorrect are binomial, the conditions for analysis of variance are violated. A correction factor, Y' = arcsine(Y), can be applied to binomial data, which allows analysis of variance to be used. In addition, a chi-square, Cochran's Q, or binomial t test (depending on the test design) can be used to evaluate the statistical significance of differences in SA between test conditions.
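The scoring rules above can be sketched as follows. The function names and the example values are hypothetical. One assumption should be stated plainly: the text gives the correction as Y' = arcsine(Y), whereas the sketch uses the widely used arcsine-square-root form for proportions, asin(sqrt(p)), which may or may not be the exact form intended. The sketch also treats every quantity as linear, so a circular quantity such as heading would need wraparound handling that is not shown.

```python
import math


def score_response(reported, actual, tolerance):
    """Score one asked query as correct (True) or incorrect (False).

    A query that was asked but left unanswered (reported is None) counts as
    incorrect. Queries that were never asked are not passed in at all.
    """
    if reported is None:
        return False
    return abs(reported - actual) <= tolerance


def proportion_correct(asked):
    """Proportion correct over the (reported, actual, tolerance) triples asked.

    Returns None when nothing was asked, so that no evaluation is made.
    """
    if not asked:
        return None
    n_correct = sum(score_response(r, a, t) for r, a, t in asked)
    return n_correct / len(asked)


def arcsine_sqrt(p):
    """Arcsine-square-root transform of a proportion (assumed form)."""
    return math.asin(math.sqrt(p))


# Hypothetical ground-speed query with a 10-mph tolerance band.
asked_queries = [
    (455.0, 450.0, 10.0),  # within the band: correct
    (None, 450.0, 10.0),   # asked but unanswered: incorrect
    (470.0, 450.0, 10.0),  # outside the band: incorrect
    (448.0, 450.0, 10.0),  # within the band: correct
]
p = proportion_correct(asked_queries)
transformed = arcsine_sqrt(p)
```

The transformed proportions, one per subject and condition, would then be the values submitted to the analysis of variance.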

Limitations and Applicability for Use

SAGAT has primarily been used within the confines of high-fidelity and medium-fidelity part-task simulations. This provides experimenter control over freezes and data collection without any danger to the subject or processes involved in the domain. It may be possible to use the technique during actual task performance if multiple operators are present to ensure safety. For example, it might be possible to verbally query one pilot in flight while another assumes flight control. Such an endeavor should be undertaken with extreme caution, however, and may not be appropriate for certain domains.

A recent effort (Sheehy, Davey, Fiegel, and Guo, 1993) employed an adaptation of this technique by making videotapes of an ongoing situation in a nuclear power plant control room. These tapes were then replayed to naive subjects with freezes for SAGAT queries employed. It is not known how different the SA of subjects passively viewing a situation may be from subjects actually engaged in task performance; however, this approach may yield some useful data.

Most known uses of SAGAT have involved fighter aircraft simulations. In general, it can be employed in any domain in which a reasonable simulation of task performance exists and an analysis of SA requirements has been made in order to develop the queries. SAGAT queries have been developed for advanced bomber aircraft (Endsley, 1990a) and recently for en route air traffic control (Endsley and Rodgers, 1994). Several researchers have begun to use the technique in evaluating operator SA in nuclear control room studies (Hogg, Torralba, and Volden, 1993; Sheehy et al., 1993). The technique has also been employed in a simulated control task to study adaptive automation (Carmody and Gluckman, 1993). Several new efforts are currently under way to employ the technique in medical decision making (E. Swirsky, personal communication, March 1994) and helicopters (C. Prince, personal communication, November 1993). Potentially it could also be used in studies involving automobile driving, supervisory control of manufacturing systems, teleoperations, and operation of other types of dynamic systems.

CONCLUSION

The use of SA metrics in evaluating the efficiency of design concepts, training programs, and other potential interventions is critical. Without careful evaluation, it will be difficult to make much progress in understanding the factors that influence SA, hindering human factors and engineering efforts. Several techniques have been reviewed here, each with several advantages and disadvantages. The use of one technique, SAGAT, was explored extensively, and two studies were presented that investigated the validity of the measure. It has to date proven a useful and viable technique for measuring SA in numerous contexts. Rigorous validity testing is needed for other techniques proposed for SA measurement. Unless the veracity of these techniques is established, interpreting their results will be hazardous.

In addition to establishing validity of measures of SA, more research is needed regarding the task of SA measurement itself. For instance, as pointed out by Pew (1991), no criteria exist at this time that establish the level of SA required for successful performance. Does an operator need to have SA that is 100% perfect (in both completeness and accuracy), or is some lesser amount sufficient for good performance? This is a complex issue. I would assert that SA can be seen as a factor that increases the probability of good performance but that it does not necessarily guarantee it, as other factors also come into effect (e.g., decision making, workload, performance execution, system capabilities, and SA of others in some cases). How much SA one needs, therefore, becomes a matter of how much probability of error one is willing to accept. Perhaps such criteria should more properly and usefully be established as the level of SA needed on each subcomponent at different periods in time. Some guidelines will eventually need to be specified if there is a desire to certify new system designs on the basis of SA.

Overall, the beginnings of the field of SA measurement have been established, allowing researchers in the area of SA to proceed from mainly speculation and anecdotal information to solid empirical findings. The tools for conducting further basic research on the SA construct and developing better system designs are available, allowing needed research on SA in a variety of arenas.

ACKNOWLEDGMENTS

This research was sponsored by the Northrop Corporation. I thank the many people at Northrop, both past and present, who contributed to this effort. In particular, I thank Vic Vizcarra for his assistance in conducting the studies, Paul Cayley and Gerry Armstrong for their programming efforts, and Mike Majors and Del Jacobs for their support in developing an SA research program. In addition, I am indebted to Mark Chignell, who provided helpful support throughout the development of the methodology. I also thank Mark Rodgers, Tom English, Dick Gilson, and several anonymous reviewers for their comments on an earlier version of this paper.


REFERENCES

Bolstad, C. A., and Endsley, M. R. (1990). Single versus dual scale range display investigation (NOR DOC 90-90). Hawthorne, CA: Northrop Corporation.

Carmody, M. A., and Gluckman, J. P. (1993). Task-specific effects of automation and automation failure on performance, workload, and situational awareness. In R. S. Jensen and D. Neumeister (Eds.), Proceedings of the Seventh International Symposium on Aviation Psychology (pp. 167-171). Columbus, OH: Ohio State University, Department of Aviation.

Chase, W. G., and Simon, H. A. (1973). Perceptions in chess. Cognitive Psychology, 4, 55-81.

Cowan, N. (1988). Evolving conceptions of memory storage, selective attention, and their mutual constraints within the human information-processing system. Psychological Bulletin, 104, 163-191.

Endsley, M. R. (1987). SAGAT: A methodology for the measurement of situation awareness (NOR DOC 87-83). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1988). Situation awareness global assessment technique (SAGAT). In Proceedings of the National Aerospace and Electronics Conference (NAECON) (pp. 789-795). New York: IEEE.

Endsley, M. R. (1989a). Final report: Situation awareness in an advanced strategic mission (NOR DOC 89-32). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1989b). Tactical simulation 3 test report: Addendum 1, situation awareness evaluations (81203033R). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1990a). A methodology for the objective measurement of situation awareness. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 1/1-1/9). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Endsley, M. R. (1990b). Predictive utility of an objective measure of situation awareness. In Proceedings of the Human Factors Society 34th Annual Meeting (pp. 41-45). Santa Monica, CA: Human Factors and Ergonomics Society.

Endsley, M. R. (1990c). Situation awareness global assessment technique (SAGAT): Air-to-air tactical version user guide (NOR DOC 89-58, rev. A). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1990d). Situation awareness in dynamic human decision making: Theory and measurement. Unpublished doctoral dissertation, University of Southern California, Los Angeles, CA.

Endsley, M. R., and Rodgers, M. D. (1994). Situation awareness global assessment technique (SAGAT): En route air traffic control version user's guide (Draft). Lubbock: Texas Tech University.

Fracker, M. L. (1990). Attention gradients in situation awareness. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 6/1-6/10). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Herrmann, D. J. (1984). Questionnaires about memory. In J. E. Harris and P. E. Morris (Eds.), Everyday memory, action, and absent-mindedness (pp. 133-151). London: Academic.

Hogg, D. N., Torralba, B., and Volden, F. S. (1993). A situation awareness methodology for the evaluation of process control systems: Studies of feasibility and the implication of use (1993-03-05). Storefjell, Norway: OECD Halden Reactor Project.

Hughes, E. R., Hassoun, J. A., and Ward, F. (1990). Advanced tactical fighter (ATF) simulation support: Final report (CSEF-TR-ATF-90-91). Wright-Patterson Air Force Base, OH: Air Force Systems Command, Aeronautical Systems Division.

Jacoby, L. L., and Dallas, M. (1981). On the relationship between autobiographical memory and perceptual learning. Journal of Experimental Psychology: General, 110, 306-340.

Kellog, R. T. (1980). Is conscious attention necessary for long-term storage? Journal of Experimental Psychology: Human Learning and Memory, 6, 379-390.

Manktelow, K., and Jones, J. (1987). Principles from the psychology of thinking and mental models. In M. M. Gardiner and B. Christie (Eds.), Applying cognitive psychology to user-interface design (pp. 83-117). Chichester, England: Wiley.

Marshak, W. P., Kuperman, G., Ramsey, E. G., and Wilson, D. (1987). Situational awareness in map displays. In Proceedings of the Human Factors Society 31st Annual Meeting (pp. 533-535). Santa Monica, CA: Human Factors and Ergonomics Society.

McDonnell Douglas Aircraft Corporation. (1982). Advanced medium-range air-to-air missile operational utility evaluation (AMRAAM OUE) test final report. St. Louis, MO: Author.

Mogford, R. H., and Tansley, B. W. (1991). The importance of the air traffic controller's mental model. Presented at the Human Factors Society of Canada Annual Meeting, Canada.

Morton, J. (1969). Interaction of information in word recognition. Psychological Review, 76, 165-178.

Nisbett, R. E., and Wilson, T. D. (1977). Telling more than we can know: Verbal reports on mental processes. Psychological Review, 84, 231-259.

Norman, D. A. (1968). Towards a theory of memory and attention. Psychological Review, 75, 522-536.

Northrop Corporation. (1988). Tactical simulation 2 test report: Addendum 1, situation awareness test results. Hawthorne, CA: Author.

Pew, R. W. (1991). Defining and measuring situation awareness in the commercial aircraft cockpit. In Proceedings of the Conference on Challenges in Aviation Human Factors: The National Plan (pp. 30-31). Washington, DC: American Institute of Aeronautics and Astronautics.

Sarter, N. B., and Woods, D. D. (1991). Situation awareness: A critical but ill-defined phenomenon. International Journal of Aviation Psychology, 1, 45-57.

Selcon, S. J., and Taylor, R. M. (1990). Evaluation of the situational awareness rating technique (SART) as a tool for aircrew systems design. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 5/1-5/8). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Sheehy, E. J., Davey, E. C., Fiegel, T. T., and Guo, K. Q. (1993, April). Usability benchmark for CANDU annunciation - lessons learned. Presented at the ANS Topical Meeting on Nuclear Plant Instrumentation, Control, and Man-Machine Interface Technology, Oak Ridge, TN.

Smolensky, M. W. (1993). Toward the physiological measurement of situation awareness: The case for eye movement measurements. In Proceedings of the Human Factors and Ergonomics Society 37th Annual Meeting (p. 41). Santa Monica, CA: Human Factors and Ergonomics Society.

Taylor R M (1990) Situational awareness rating technique(SART) The development of a tool for aircrew systemsdesign In Situational awareness in aerospace operations(AGARD-CP-478 pp 31-317) Neuilly-Sur-Seine France

84-March 1995

NATO-Advisory Group for Aerospace Research and De-velopment

Tulving E (1985) How many memory systems are thereAmerican Psychologist 40 385-398

Venturino M Hamilton W 1 and Dvorchak S R (1990)Performance-based measures of merit for tactical situationawareness In Situation awareness in aerospace operations(AGARD-CP-478 pp 41-45) Neuilly-Sur-Seine FranceNATO-Advisory Group for Aerospace Research and Demiddotvelopment

HUMAN FACTORS

Vidulich M A (1989) The use of judgment matrices in sub-jective workload assessment The subjective workloaddominance (SWORD) technique In Proceedings of the Hu-man Factors Society 33rd Annual Meeting (pp 1406-1410)Santa Monica CA Human Factors and Ergonomics Soci-ety

Date received January 26 1993Date accepted October 4 1994

Figure 6. Blue losses by number of stops in trial (x-axis: number of stops, 0 to 3).

Figure 8. Blue losses by duration of stops in trial (x-axis: duration of stops, 0, 30, 60, or 120 s).

either case that would indicate a progressively worse (albeit nonsignificant) effect of increasing number or duration of stops. This indicates that stops to collect SAGAT data (as many as 3 for up to 2 min in duration) did not have a significant effect on subject performance.

Discussion

The lack of a significant influence of this procedure on performance probably rests on the fact that relevant schemata are actively used by subjects during the entire freeze period. Under these conditions, the subjects' SA does not have a chance to decay before the simulation is resumed (as was indicated by Experiment 1). Thus SA is fairly intact when the simulation continues, allowing subjects to proceed with their tasks where they left off. Subjectively, subjects did fairly well with this procedure and were able to readily pick up the simulation at the point at which they left off at the time of the freeze, sometimes with the same sentence they had started before the stop. On many occasions subjects could not even remember if they had been stopped to collect SA data during the trial, also indicating a certain lack of intrusiveness.

Figure 7. Blue kills by duration of stops in trial (x-axis: duration of stops, 0, 30, 60, or 120 s).

Summary

The use of a temporary freeze in the simulation to collect SA data is supported by Experiments 1 and 2. Subjects were able to report their SA using this technique for as long as 5 or 6 min without apparent memory decay, and the freeze did not appear to be intrusive on subject performance in the simulation, allaying several concerns about the technique. Although it is always difficult to establish no effect of one variable on another (i.e., prove the null hypothesis), it is reassuring that the finding of no intrusion on performance has been repeated in numerous other studies in which SAGAT was used to evaluate human-machine interface concepts (Bolstad and Endsley, 1990; Endsley, 1989b; Northrop, 1988) if the timing of the stop was unpredictable to the subjects (Endsley, 1988). This finding was also repeated in one study in which pilots flew in a simulation against computer-controlled enemy aircraft (Endsley, 1990d), helping to rule out the possibility that the finding of no effect on performance in the present study could have been the result of pilots on both sides of the simulation being equally influenced.

SAGAT has thus far proven to meet the stated criteria for measurement. In addition to the present studies establishing a level of empirical validity, SAGAT has been shown to have predictive validity (Endsley, 1990b) and content validity (Endsley, 1990d). The method is not without some costs, however, as a detailed analysis of SA requirements is required in order to develop the battery of queries to be administered. On the positive side, this analysis can also be extremely useful for guiding design efforts.

EXAMPLE

To proffer an example of the type of SA information provided by SAGAT, I will present a study that evaluates the use of a three-dimensional display for providing radar information to fighter pilots. The study was conducted in the same high-fidelity, real-time simulator as in the prior two experiments. During the study each of six experienced pilot subjects flew a fighter sweep mission against a field of four digitally controlled enemy aircraft. Each subject completed five trials in each of five display conditions. The initial starting conditions and order of presentation of the display conditions were randomized. Each trial was halted to collect SAGAT data at a randomly selected time between 2.5 and 5 min into each trial. Immediately following the collection of SAGAT data, the Subjective Workload Assessment Technique (SWAT) was also administered. In this study the trial was not resumed after the collection of SAGAT and SWAT data.

The five display conditions were constructed to evaluate the effect of a three-dimensional (3D) display concept. The 3D display provided a grid representing the ground, above which aircraft targets detected by the radar were presented by a small triangle pointing in the direction of the aircraft's heading and connected to the grid by a line corresponding to the aircraft's altitude. The plane the subject was flying was presented in the center of this display so that each target's location and altitude were spatially represented relative to the subject's ownship. Four rotations of the grid (0, 30, 45, and 60 deg) were investigated (each providing a different perspective on the scene), as shown in Figure 9. At 0-deg rotation the grid portrayed a God's-eye view of the scene, thus depicting a traditional two-dimensional display. In addition, these four views were compared with a two-display configuration in which a God's-eye view display was supplemented with a second profile display that pictorially presented altitude. In all conditions, altitude of the aircraft was also presented numerically next to the aircraft symbols.

Following the trials, the collected SAGAT and SWAT data were analyzed. Although the final SAGAT battery for fighter aircraft has 40 queries (Endsley, 1990c), only 23 were administered at the time of the test (as the battery was still in development). The results of 4 pertinent queries are presented here. The subjects' reported knowledge of the range, azimuth, altitude, and heading of the enemy targets was compared with the actual values of these elements at the time of the freeze in the simulation. The subjects' perceptions were evaluated as correct or incorrect (using tolerance values that had been determined to be operationally relevant).
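The correct/incorrect evaluation described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the tolerance values, the `TOLERANCES` table, and the function names are hypothetical (the paper does not report the operationally relevant tolerances that were used), and the wraparound handling for azimuth and heading is an assumption about how angular answers might be compared.

```python
# Hypothetical tolerance bands; the paper does not publish the
# operationally relevant values that were actually used.
TOLERANCES = {
    "range_nm": 5.0,        # nautical miles
    "azimuth_deg": 10.0,    # degrees, circular quantity
    "altitude_ft": 2000.0,  # feet
    "heading_deg": 20.0,    # degrees, circular quantity
}
CIRCULAR = {"azimuth_deg", "heading_deg"}


def angular_error(reported, actual):
    """Smallest absolute difference between two angles, in degrees."""
    diff = abs(reported - actual) % 360.0
    return min(diff, 360.0 - diff)


def is_correct(element, reported, actual):
    """Score one reported value as correct (True) or incorrect (False)."""
    if element in CIRCULAR:
        error = angular_error(reported, actual)
    else:
        error = abs(reported - actual)
    return error <= TOLERANCES[element]
```

With these assumed tolerances, a reported heading of 355 deg against an actual heading of 5 deg is treated as a 10-deg error rather than a 350-deg error, so it would be scored as correct.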

The percentage correct across subjects for each condition on each of these four queries is shown in Figure 10. As the scored data are binomial, an arcsine transformation was applied and an ANOVA performed. The results showed that the display type had a significant effect on pilot SA of the aircraft's range, F(4, 288) = 2.575, p < 0.05; azimuth, F(4, 278) = 4.125, p < 0.05; and heading, F(4, 139) = 3.040, p < 0.05; but not altitude, F(4, 94) = 1.83, p > 0.05. The 3D display, when rotated at 45 or 60 deg, provided significantly lower SA on target range than when at 0 or 30 deg and significantly lower SA of target heading than of any of the other three conditions. A rotation of 60 deg also provided significantly lower SA of target azimuth than did the other four conditions. Although the ANOVA for target altitude was not significant, there was an opposite trend for the highly rotated 3D display to provide more SA of target altitude than did rotations of 0 and 30 deg. The SWAT data supported these findings, with significantly higher ratings for workload in the 45- and 60-deg rotations than in the 0- or 30-deg rotations or the two-dimensional display.

Figure 9. Two-dimensional display and three-dimensional display at four levels of rotation. Panels: (a) 2D display; (b) 3D display, 0-deg rotation; (c) 3D display, 30-deg rotation; (d) 3D display, 45-deg rotation; (e) 3D display, 60-deg rotation.

A major point to be gleaned from this example is the importance of examining the impact of a given design across an operator's SA requirements. The proposed 3D display was designed to provide pilots with better information about aircraft altitude, and it may have done so, although the trend was not significant. However, this came at the expense of SA on location, azimuth, and heading. If a study had been performed that merely examined subjects' knowledge of aircraft altitude, this important information would have been lost. The SAGAT data also provided important diagnostic information that was useful in understanding the exact effect the design concept had on pilot SA in order to make future iterations.

Figure 10. Situation awareness of target range, heading, azimuth, and altitude in each display condition. Panels: (a) knowledge of target range; (b) knowledge of target heading; (c) knowledge of target azimuth; (d) knowledge of target altitude. Each panel compares the 2-D display with the 3-D display at 0-, 30-, 45-, and 60-deg rotation on a 0 to 100 scale.

IMPLEMENTATION RECOMMENDATIONS

Several recommendations for SAGAT administration can be made based on previous experience in using the procedure.

Training

An explanation of the SAGAT procedures and detailed instructions for answering each query should be provided to subjects before testing. Several training trials should be conducted in which the simulator is halted frequently to allow subjects ample opportunity to practice responding to the SAGAT queries. Usually three to five samplings are adequate for a subject to become comfortable with the procedure and to clear up any uncertainties in how to answer the queries.

Test Design

SAGAT requires no special test considerations. The same principles of experimental design and administration apply to SAGAT as to any other dependent measure. Measures of subject performance and workload may be collected concurrently with SAGAT, as no ill effect from the insertion of breaks has been shown. To be cautious, however, if performance measures are to be collected simultaneously with SAGAT data, half of the trials should be conducted without any breaks for SAGAT in order to provide a check for this contingency.

Procedures

Subjects should be instructed to attend to their tasks as they normally would, with the SAGAT queries considered as secondary. No displays or other visual aids should be visible while subjects are answering the queries. If subjects do not know or are uncertain about the answer to a given query, they should be encouraged to make their best guess. There is no penalty for guessing, allowing for consideration of the default values or other wisdom gained from experience that subjects normally use in decision making. If subjects do not feel comfortable enough to make a guess, they may go on to the next question. Talking or sharing of information among subjects should not be permitted. If multiple subjects are involved in the same simulation, all subjects should be queried simultaneously and the simulation resumed for all subjects at the same time.

Which Queries to Use

Random selection. As it may be impossible to query subjects about all of their SA requirements in a given stop because of time constraints, a portion of the SA queries may be randomly selected and asked each time. A random sampling provides consistency and statistical validity, thus allowing SA scores to be easily compared across trials, subjects, systems, and scenarios.

Because of attentional narrowing or lack of information, certain questions may seem unimportant to a subject at the time of a given stop. It is important to stress that subjects should attempt to answer all queries anyway. This is because (a) even though they think it unimportant, the information may have at least secondary importance; (b) they may not be aware of information that makes a question very important (e.g., the location of a pop-up aircraft); and (c) if only questions of the highest priority were asked, subjects might be inadvertently provided with artificial cues about the situation that will direct their attention when the simulation is resumed. Therefore, a random selection from a constant set of queries is recommended at each stop.

Experimenter controlled. In certain tests it may be desirable to have some queries omitted because of limitations of the simulation or characteristics of the scenarios. For instance, if the simulation does not incorporate aircraft malfunctions, the query related to this issue may be omitted. In addition, with particular test designs it may be desirable to ensure that certain queries are presented every time. When this occurs, subjects should also be queried on a random sampling from all SA requirements and not only on those related to a specific area of interest to the evaluation being conducted. This is because of the ability of subjects to shift attention to the information on which they know they will be tested. What may appear to be an improvement in SA in one area may be merely a shift of attention from another area. When the SAGAT queries cover all of the SA requirements, no such artificial cuing can occur.
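The two selection modes above (a random draw from a constant query set, optionally with some queries omitted and some always presented) might be combined along the following lines. This is a minimal sketch, not the SAGAT software: the function name `select_queries`, its parameters, and the query identifiers in the usage note are all hypothetical.

```python
import random


def select_queries(all_queries, n_random, always=(), omitted=(), rng=random):
    """Choose the queries to present at one stop.

    all_queries: the constant battery covering all SA requirements.
    n_random:    how many additional queries to draw at random.
    always:      queries the experimenter wants presented every time.
    omitted:     queries the simulation or scenario cannot support.
    """
    omitted = set(omitted)
    mandatory = [q for q in always if q not in omitted]
    # The random draw comes from the whole remaining battery, not only
    # from the area of interest, so subjects cannot predict the content.
    pool = [q for q in all_queries if q not in omitted and q not in mandatory]
    sampled = rng.sample(pool, min(n_random, len(pool)))
    return mandatory + sampled
```

For example, `select_queries(battery, 6, always=["target_altitude"], omitted=["malfunctions"])` would return the always-asked altitude query followed by six queries drawn at random from the rest of a hypothetical `battery` list, never including the omitted malfunction query.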

When to Collect SAGAT Data

It is recommended that the timing of each freeze for SAGAT administration be randomly determined and unpredictable so that subjects cannot prepare for queries in advance. If the freeze occurrence is covertly associated with the occurrence of specific events or at specific times across trials, prior studies have shown that the subjects will be able to figure this out (Endsley, 1988), allowing them to prepare for queries or actually improve SA through the artificiality of the freeze cues. An informal rule has been to ensure that no freezes occur earlier than 3 min into a trial, to allow subjects to build up a picture of the situation, and that no two freezes occur within 1 min of each other.
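One way to generate freeze times that respect the informal rule above (random, none earlier than 3 min, no two within 1 min of each other) is sketched below. The function `schedule_freezes` and its parameter names are hypothetical, and the sampling scheme is an assumption rather than the procedure used in the studies: it draws sorted random offsets within the available slack and then spreads them by the minimum gap, which guarantees the spacing without retries.

```python
import random


def schedule_freezes(trial_min, n_stops, earliest_min=3.0, min_gap_min=1.0,
                     rng=random):
    """Return sorted random freeze times (minutes into the trial)."""
    if n_stops <= 0:
        return []
    # Time left over once the mandatory gaps between stops are reserved.
    slack = (trial_min - earliest_min) - (n_stops - 1) * min_gap_min
    if slack < 0:
        raise ValueError("trial too short for the requested number of stops")
    offsets = sorted(rng.uniform(0.0, slack) for _ in range(n_stops))
    # Shifting the i-th offset by i gaps enforces the minimum separation.
    return [earliest_min + off + i * min_gap_min
            for i, off in enumerate(offsets)]
```

Calling `schedule_freezes(15.0, 3)` would give three freeze times for a 15-min trial, the trial length and stop count reported for Experiment 2. A fresh schedule should be drawn for every trial so that the timing stays unpredictable across trials.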

The result of this approach is that the activities occurring at the time of the stops will be randomly selected. Some stops may occur during very important activities that are of interest to the experimenter, others when no critical activities are occurring. This gives a good sampling of the subjects' SA in a variety of situations. During analysis the experimenter may want to stratify the data to take these variations into account.

How Much SAGAT Data to Collect

The number of trials necessary will depend on the variability present in the dependent variables being collected and the number of data samples taken during a trial. This will vary with different subjects and designs, but between 30 and 60 samplings per SA query with each design option have previously been adequate in a within-subjects test design.
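As a back-of-the-envelope aid for the 30 to 60 samplings guideline, the arithmetic can be sketched as below. The helper names are hypothetical, and two assumptions are built in: samplings are pooled across subjects within a design option, and `fraction_asked` stands for the share of stops at which a given query happens to be selected.

```python
import math


def samples_per_query(subjects, trials_per_condition, stops_per_trial,
                      fraction_asked=1.0):
    """Expected samplings of one query within one design option."""
    return subjects * trials_per_condition * stops_per_trial * fraction_asked


def trials_needed(target_samples, subjects, stops_per_trial,
                  fraction_asked=1.0):
    """Trials per subject per design option needed to reach the target."""
    per_trial = subjects * stops_per_trial * fraction_asked
    return math.ceil(target_samples / per_trial)
```

Under these assumptions, the display study described in the Example section (six pilots, five trials per display condition, one stop per trial) gives `samples_per_query(6, 5, 1)`, that is, 30 samplings per query per condition if every query was asked at every stop. With three stops per trial and each query asked at half of the stops, `trials_needed(60, 6, 3, 0.5)` comes to 7 trials per subject.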

Multiple SAGAT stops may be taken within each trial. There is no known limit to the number of times the simulator can be frozen during a given trial. In Experiment 2 no ill effects were found from as many as three stops during a 15-min trial. In general, it is recommended that a stop last until a certain amount of time has elapsed and then the trial is resumed, regardless of how many questions have been answered. Stops as long as 2 min in duration were used with no undue difficulty or effect on subsequent performance. In Experiment 1 stops as long as 5 min were shown to allow subjects access to SA information without memory decay.

Data Collection

The simulator computer should be programmed to collect objective data corresponding to the queries at the time of each freeze. Considering that some queries will pertain to higher-level SA requirements that may be unavailable in the computer, an expert judgment of the correct answer may be made by an experienced observer who is privy to all information, reflecting the SA of a person with perfect knowledge. A comparison of subjects' perceptions of the situation (as input into SAGAT) with the actual status of each variable (as collected per the simulator computer and expert judgment) results in an objective measure of subject SA. Questions asked of the subject but not answered should be considered incorrect. No evaluation should be made of questions not asked during a given stop.
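The bookkeeping rules in the last two sentences (asked but unanswered counts as incorrect; not asked is simply not evaluated) can be illustrated with a short sketch. The function names, the dictionary layout, and the numeric tolerances in the usage note are hypothetical; the ground-truth values would come from the simulator computer or an expert observer, as described above.

```python
def score_stop(asked, answers, truth, tolerances):
    """Score the queries asked at one stop.

    asked:      query names presented at this stop.
    answers:    {query: reported value}; a missing or None entry means
                the subject did not answer.
    truth:      {query: actual value at the time of the freeze}.
    tolerances: {query: acceptable absolute deviation}.
    Queries that were not asked do not appear in the result at all.
    """
    scores = {}
    for query in asked:
        reported = answers.get(query)
        if reported is None:
            scores[query] = False  # asked but not answered: incorrect
        else:
            scores[query] = abs(reported - truth[query]) <= tolerances[query]
    return scores


def percent_correct(stop_scores):
    """Percentage correct per query over a list of score_stop results."""
    asked_count, correct_count = {}, {}
    for scores in stop_scores:
        for query, ok in scores.items():
            asked_count[query] = asked_count.get(query, 0) + 1
            correct_count[query] = correct_count.get(query, 0) + (1 if ok else 0)
    return {q: 100.0 * correct_count[q] / asked_count[q] for q in asked_count}
```

For instance, if a subject is asked about ground speed and altitude at one stop but answers only the speed query, the altitude query is scored incorrect for that stop, whereas a heading query that was never presented contributes nothing to the heading percentage.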

It is recommended that answers to each query be scored as correct or incorrect based on whether or not they fall into an acceptable tolerance band around the actual value. For example, it may be acceptable for a subject to be 10 miles per hour off of actual ground speed. This method of scoring poses less difficulty than dealing with absolute error (see Marshak et al., 1987). A tabulation of the frequency of correctness can then be made within each test condition for each SA element. Because data scored as correct or incorrect are binomial, the conditions for analysis of variance are violated. A correction factor, Y' = arcsine(Y), can be applied to binomial data, which allows analysis of variance to be used. In addition, a chi-square, Cochran's Q, or binomial t test (depending on the test design) can be used to evaluate the statistical significance of differences in SA between test conditions.
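A minimal sketch of the transformation and of one of the named tests follows. Two points are assumptions rather than statements from the paper: the transform is written in the commonly used variance-stabilising form, the arcsine of the square root of the proportion, whereas the text gives only Y' = arcsine(Y); and the chi-square shown is the plain Pearson statistic for a 2 x 2 table of independent observations, which may not suit a repeated-measures design. The function names are hypothetical.

```python
import math


def arcsine_transform(proportion):
    """Arcsine square-root transform of a proportion in [0, 1] (radians)."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must lie between 0 and 1")
    return math.asin(math.sqrt(proportion))


def chi_square_2x2(correct_a, incorrect_a, correct_b, incorrect_b):
    """Pearson chi-square (1 df, no continuity correction) comparing
    correct/incorrect counts between two test conditions."""
    a, b, c, d = correct_a, incorrect_a, correct_b, incorrect_b
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denominator
```

The transformed proportions (for example, one value per subject per condition) would then be passed to whatever ANOVA routine is in use, and the chi-square value compared against the critical value for one degree of freedom.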

Limitations and Applicability for Use

SAGAT has primarily been used within the confines of high-fidelity and medium-fidelity part-task simulations. This provides experimenter control over freezes and data collection without any danger to the subject or processes involved in the domain. It may be possible to use the technique during actual task performance if multiple operators are present to ensure safety. For example, it might be possible to verbally query one pilot in flight while another assumes flight control. Such an endeavor should be undertaken with extreme caution, however, and may not be appropriate for certain domains.

A recent effort (Sheehy, Davey, Fiegel, and Guo, 1993) employed an adaptation of this technique by making videotapes of an ongoing situation in a nuclear power plant control room. These tapes were then replayed to naive subjects with freezes for SAGAT queries employed. It is not known how different the SA of subjects passively viewing a situation may be from subjects actually engaged in task performance; however, this approach may yield some useful data.

Most known uses of SAGAT have involved fighter aircraft simulations. In general, it can be employed in any domain in which a reasonable simulation of task performance exists and an analysis of SA requirements has been made in order to develop the queries. SAGAT queries have been developed for advanced bomber aircraft (Endsley, 1990a) and recently for en route air traffic control (Endsley and Rodgers, 1994). Several researchers have begun to use the technique in evaluating operator SA in nuclear control room studies (Hogg, Torralba, and Volden, 1993; Sheehy et al., 1993). The technique has also been employed in a simulated control task to study adaptive automation (Carmody and Gluckman, 1993). Several new efforts are currently under way to employ the technique in medical decision making (E. Swirsky, personal communication, March 1994) and helicopters (C. Prince, personal communication, November 1993). Potentially it could also be used in studies involving automobile driving, supervisory control of manufacturing systems, teleoperations, and operation of other types of dynamic systems.

CONCLUSION

The use of SA metrics in evaluating the efficiency of design concepts, training programs, and other potential interventions is critical. Without careful evaluation, it will be difficult to make much progress in understanding the factors that influence SA, hindering human factors and engineering efforts. Several techniques have been reviewed here, each with several advantages and disadvantages. The use of one technique, SAGAT, was explored extensively, and two studies were presented that investigated the validity of the measure. It has to date proven a useful and viable technique for measuring SA in numerous contexts. Rigorous validity testing is needed for other techniques proposed for SA measurement. Unless the veracity of these techniques is established, interpreting their results will be hazardous.

In addition to establishing validity of measures of SA, more research is needed regarding the task of SA measurement itself. For instance, as pointed out by Pew (1991), no criteria exist at this time that establish the level of SA required for successful performance. Does an operator need to have SA that is 100% perfect (in both completeness and accuracy), or is some lesser amount sufficient for good performance? This is a complex issue. I would assert that SA can be seen as a factor that increases the probability of good performance but that it does not necessarily guarantee it, as other factors also come into effect (e.g., decision making, workload, performance execution, system capabilities, and SA of others in some cases). How much SA one needs, therefore, becomes a matter of how much probability of error one is willing to accept. Perhaps such criteria should more properly and usefully be established as the level of SA needed on each subcomponent at different periods in time. Some guidelines will eventually need to be specified if there is a desire to certify new system designs on the basis of SA.

Overall, the beginnings of the field of SA measurement have been established, allowing researchers in the area of SA to proceed from mainly speculation and anecdotal information to solid empirical findings. The tools for conducting further basic research on the SA construct and developing better system designs are available, allowing needed research on SA in a variety of arenas.

ACKNOWLEDGMENTS

This research was sponsored by the Northrop Corporation. I thank the many people at Northrop, both past and present, who contributed to this effort. In particular, I thank Vic Vizcarra for his assistance in conducting the studies, Paul Cayley and Gerry Armstrong for their programming efforts, and Mike Majors and Del Jacobs for their support in developing an SA research program. In addition, I am indebted to Mark Chignell, who provided helpful support throughout the development of the methodology. I also thank Mark Rodgers, Tom English, Dick Gilson, and several anonymous reviewers for their comments on an earlier version of this paper.

REFERENCES

Bolstad, C. A., and Endsley, M. R. (1990). Single versus dual scale range display investigation (NOR DOC 90-90). Hawthorne, CA: Northrop Corporation.

Carmody, M. A., and Gluckman, J. P. (1993). Task-specific effects of automation and automation failure on performance, workload, and situational awareness. In R. S. Jensen and D. Neumeister (Eds.), Proceedings of the Seventh International Symposium on Aviation Psychology (pp. 167-171). Columbus, OH: Ohio State University, Department of Aviation.

Chase, W. G., and Simon, H. A. (1973). Perceptions in chess. Cognitive Psychology, 4, 55-81.

Cowan, N. (1988). Evolving conceptions of memory storage, selective attention, and their mutual constraints within the human information-processing system. Psychological Bulletin, 104, 163-191.

Endsley, M. R. (1987). SAGAT: A methodology for the measurement of situation awareness (NOR DOC 87-83). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1988). Situation awareness global assessment technique (SAGAT). In Proceedings of the National Aerospace and Electronics Conference (NAECON) (pp. 789-795). New York: IEEE.

Endsley, M. R. (1989a). Final report: Situation awareness in an advanced strategic mission (NOR DOC 89-32). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1989b). Tactical simulation 3 test report: Addendum 1 situation awareness evaluations (81203033R). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1990a). A methodology for the objective measurement of situation awareness. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 1/1-1/9). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Endsley, M. R. (1990b). Predictive utility of an objective measure of situation awareness. In Proceedings of the Human Factors Society 34th Annual Meeting (pp. 41-45). Santa Monica, CA: Human Factors and Ergonomics Society.

Endsley, M. R. (1990c). Situation awareness global assessment technique (SAGAT): Air-to-air tactical version user guide (NOR DOC 89-58, rev. A). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1990d). Situation awareness in dynamic human decision making: Theory and measurement. Unpublished doctoral dissertation, University of Southern California, Los Angeles, CA.

Endsley, M. R., and Rodgers, M. D. (1994). Situation awareness global assessment technique (SAGAT): En route air traffic control version user's guide (Draft). Lubbock: Texas Tech University.

Fracker, M. L. (1990). Attention gradients in situation awareness. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 6/1-6/10). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Herrmann, D. J. (1984). Questionnaires about memory. In J. E. Harris and P. E. Morris (Eds.), Everyday memory, action, and absent-mindedness (pp. 133-151). London: Academic.

Hogg, D. N., Torralba, B., and Volden, F. S. (1993). A situation awareness methodology for the evaluation of process control systems: Studies of feasibility and the implication of use (1993-03-05). Storefjell, Norway: OECD Halden Reactor Project.

Hughes, E. R., Hassoun, J. A., and Ward, F. (1990). Advanced tactical fighter (ATF) simulation support: Final report (CSEF-TR-ATF-90-91). Wright-Patterson Air Force Base, OH: Air Force Systems Command, Aeronautical Systems Division.

Jacoby, L. L., and Dallas, M. (1981). On the relationship between autobiographical memory and perceptual learning. Journal of Experimental Psychology: General, 110, 306-340.

Kellogg, R. T. (1980). Is conscious attention necessary for long-term storage? Journal of Experimental Psychology: Human Learning and Memory, 6, 379-390.

Manktelow, K., and Jones, J. (1987). Principles from the psychology of thinking and mental models. In M. M. Gardiner and B. Christie (Eds.), Applying cognitive psychology to user-interface design (pp. 83-117). Chichester, England: Wiley.

Marshak, W. P., Kuperman, G., Ramsey, E. G., and Wilson, D. (1987). Situational awareness in map displays. In Proceedings of the Human Factors Society 31st Annual Meeting (pp. 533-535). Santa Monica, CA: Human Factors and Ergonomics Society.

McDonnell Douglas Aircraft Corporation. (1982). Advanced medium-range air-to-air missile operational utility evaluation (AMRAAM OUE) test final report. St. Louis, MO: Author.

Mogford, R. H., and Tansley, B. W. (1991). The importance of the air traffic controller's mental model. Presented at the Human Factors Society of Canada Annual Meeting, Canada.

Morton, J. (1969). Interaction of information in word recognition. Psychological Review, 76, 165-178.

Nisbett, R. E., and Wilson, T. D. (1977). Telling more than we can know: Verbal reports on mental processes. Psychological Review, 84, 231-259.

Norman, D. A. (1968). Towards a theory of memory and attention. Psychological Review, 75, 522-536.

Northrop Corporation. (1988). Tactical simulation 2 test report: Addendum 1 situation awareness test results. Hawthorne, CA: Author.

Pew, R. W. (1991). Defining and measuring situation awareness in the commercial aircraft cockpit. In Proceedings of the Conference on Challenges in Aviation Human Factors: The National Plan (pp. 30-31). Washington, DC: American Institute of Aeronautics and Astronautics.

Sarter, N. B., and Woods, D. D. (1991). Situation awareness: A critical but ill-defined phenomenon. International Journal of Aviation Psychology, 1, 45-57.

Selcon, S. J., and Taylor, R. M. (1990). Evaluation of the situational awareness rating technique (SART) as a tool for aircrew systems design. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 5/1-5/8). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Sheehy, E. J., Davey, E. C., Fiegel, T. T., and Guo, K. Q. (1993, April). Usability benchmark for CANDU annunciation-lessons learned. Presented at the ANS Topical Meeting on Nuclear Plant Instrumentation, Control, and Man-Machine Interface Technology, Oak Ridge, TN.

Smolensky, M. W. (1993). Toward the physiological measurement of situation awareness: The case for eye movement measurements. In Proceedings of the Human Factors and Ergonomics Society 37th Annual Meeting (p. 41). Santa Monica, CA: Human Factors and Ergonomics Society.

Taylor, R. M. (1990). Situational awareness rating technique (SART): The development of a tool for aircrew systems design. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 3/1-3/17). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Tulving, E. (1985). How many memory systems are there? American Psychologist, 40, 385-398.

Venturino, M., Hamilton, W. L., and Dvorchak, S. R. (1990). Performance-based measures of merit for tactical situation awareness. In Situation awareness in aerospace operations (AGARD-CP-478, pp. 4/1-4/5). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Vidulich, M. A. (1989). The use of judgment matrices in subjective workload assessment: The subjective workload dominance (SWORD) technique. In Proceedings of the Human Factors Society 33rd Annual Meeting (pp. 1406-1410). Santa Monica, CA: Human Factors and Ergonomics Society.

Date received: January 26, 1993
Date accepted: October 4, 1994

SITUATION AWARENESS MEASUREMENT

reassuring that the finding of no intrusion onperformance has been repeated in numerousother studies in which SAGAT was used toevaluate human-machine interface concepts(Bolstad and Endsley 1990 Endsley 1989bNorthrop 1988) if the timing of the stop wasunpredictable to the subjects (Endsley 1988)This finding was also repeated in one study inwhich pilots flew in a simulation against com-puter-controlled enemy aircraft (Endsley1990d) helping to rule out the possibility thatthe finding of no effect on performance in thepresent study could have been the result of pi-lots on both sides of the simulation beingequally influenced

SAGAThas thus far proven to meet the statedcriteria for measurement In addition to thepresent studies establishing a level of empiricalvalidity SAGAThas been shown to have predic-tive validity (Endsley 1990b) and content valid-ity (Endsley 1990d) The method is not withoutsome costs however as a detailed analysis of SArequirements is required in order to develop thebattery of queries to be administered On thepositive side this analysis can also be extremelyuseful for guiding design efforts

EXAMPLE

To proffer an example of the type of SA infor-mation provided by SAGAT I will presenta study that evaluates the use of a three-dimensional display for providing radar infor-mation to fighter pilots The study was con-ducted in the same high-fidelity real-timesimulator as in the prior two experiments Dur-ing the study each of six experienced pilot sub-jects flew a fighter sweep mission against a fieldof four digitally controlled enemy aircraft Eachsubject completed five trials in each of five dis-play conditions The initial starting conditionsand order of presentation of the display condi-tions were randomized Each trial was halted tocollect SAGATdata at a randomly selected timebetween 25 and 5 min into each trial Immedi-ately following the collection of SAGATdata theSubjective Workload Assessment Technique

March 1995-77

(SWAT)was also administered In this study thetrial was not resumed after the collection ofSAGAT and SWAT data

The five display conditions were constructed to evaluate the effect of a three-dimensional (3D) display concept. The 3D display provided a grid representing the ground, above which aircraft targets detected by the radar were presented by a small triangle pointing in the direction of the aircraft's heading and connected to the grid by a line corresponding to the aircraft's altitude. The plane the subject was flying was presented in the center of this display so that each target's location and altitude were spatially represented relative to the subject's ownship. Four rotations of the grid (0, 30, 45, and 60 deg) were investigated (each providing a different perspective on the scene), as shown in Figure 9. At 0-deg rotation, the grid portrayed a God's-eye view of the scene, thus depicting a traditional two-dimensional display. In addition, these four views were compared with a two-display configuration in which a God's-eye view display was supplemented with a second, profile display that pictorially presented altitude. In all conditions, altitude of the aircraft was also presented numerically next to the aircraft symbols.

Following the trials, the collected SAGAT and SWAT data were analyzed. Although the final SAGAT battery for fighter aircraft has 40 queries (Endsley, 1990c), only 23 were administered at the time of the test (as the battery was still in development). The results of 4 pertinent queries are presented here. The subjects' reported knowledge of the range, azimuth, altitude, and heading of the enemy targets was compared with the actual values of these elements at the time of the freeze in the simulation. The subjects' perceptions were evaluated as correct or incorrect (using tolerance values that had been determined to be operationally relevant).

The percentage correct across subjects for each condition on each of these four queries is shown in Figure 10. As the scored data are binomial, an arcsine transformation was applied and an ANOVA performed. The results showed that the display type had a significant effect on pilot SA of the aircraft's range, F(4, 288) = 2.575, p < .05; azimuth, F(4, 278) = 4.125, p < .05; and heading, F(4, 139) = 3.040, p < .05; but not altitude, F(4, 94) = 1.83, p > .05. The 3D display, when rotated at 45 or 60 deg, provided significantly lower SA on target range than when at 0 or 30 deg, and significantly lower SA of target heading than of any of the other three conditions. A rotation of 60 deg also provided significantly lower SA of target azimuth than did the other four conditions. Although the ANOVA for target altitude was not significant, there was an opposite trend for the highly rotated 3D display to provide more SA of target altitude than did rotations of 0 and 30 deg. The SWAT data supported these findings, with significantly higher ratings for workload in the 45- and 60-deg rotations than in the 0- or 30-deg rotations or the two-dimensional display.

Figure 9. Two-dimensional display and three-dimensional display at four levels of rotation. Panels: (a) 2D display; (b) 3D display, 0-deg rotation; (c) 3D display, 30-deg rotation; (d) 3D display, 45-deg rotation; (e) 3D display, 60-deg rotation.

A major point to be gleaned from this example is the importance of examining the impact of a given design across an operator's SA requirements. The proposed 3D display was designed to provide pilots with better information about aircraft altitude, and it may have done so, although the trend was not significant. However, this came at the expense of SA on location, azimuth, and heading. If a study had been performed that merely examined subjects' knowledge of aircraft altitude, this important information would have been lost. The SAGAT data also provided important diagnostic information that was useful in understanding the exact effect the design concept had on pilot SA in order to make future iterations.

Figure 10. Situation awareness of target range, heading, azimuth, and altitude in each display condition. Panels: (a) knowledge of target range; (b) knowledge of target heading; (c) knowledge of target azimuth; (d) knowledge of target altitude. Each panel plots percentage correct for the 2-D display and for the 3-D display at 0, 30, 45, and 60 deg of rotation.

IMPLEMENTATION RECOMMENDATIONS

Several recommendations for SAGAT administration can be made based on previous experience in using the procedure.

Training

An explanation of the SAGAT procedures and detailed instructions for answering each query should be provided to subjects before testing. Several training trials should be conducted in which the simulator is halted frequently to allow subjects ample opportunity to practice responding to the SAGAT queries. Usually three to five samplings are adequate for a subject to become comfortable with the procedure and to clear up any uncertainties in how to answer the queries.

Test Design

SAGAT requires no special test considerations. The same principles of experimental design and administration apply to SAGAT as to any other dependent measure. Measures of subject performance and workload may be collected concurrently with SAGAT, as no ill effect from the insertion of breaks has been shown. To be cautious, however, if performance measures are to be collected simultaneously with SAGAT data, half of the trials should be conducted without any breaks for SAGAT in order to provide a check for this contingency.
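The precaution of running half of the trials without SAGAT breaks can be sketched as a simple random assignment. This is only an illustration under stated assumptions: the function name `assign_freeze_trials` and its parameters are hypothetical, and the source does not specify how the two halves should be chosen.

```python
import random


def assign_freeze_trials(n_trials, rng=None):
    """Return a list of booleans, one per trial, True if the trial has SAGAT freezes.

    Assumption (not from the source): the split is made by shuffling trial
    indices, so roughly half of the trials act as a no-freeze check on
    performance. With an odd number of trials, the extra trial gets no freeze.
    """
    if rng is None:
        rng = random.Random()
    order = list(range(n_trials))
    rng.shuffle(order)
    with_freeze = set(order[: n_trials // 2])
    return [i in with_freeze for i in range(n_trials)]


# Example: ten trials, five of which will contain SAGAT freezes.
plan = assign_freeze_trials(10, rng=random.Random(1))
```

Performance measures from the two groups of trials can then be compared to check whether the freezes had any effect.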

Procedures

Subjects should be instructed to attend to their tasks as they normally would, with the SAGAT queries considered as secondary. No displays or other visual aids should be visible while subjects are answering the queries. If subjects do not know or are uncertain about the answer to a given query, they should be encouraged to make their best guess. There is no penalty for guessing, allowing for consideration of the default values or other wisdom gained from experience that subjects normally use in decision making. If subjects do not feel comfortable enough to make a guess, they may go on to the next question. Talking or sharing of information among subjects should not be permitted. If multiple subjects are involved in the same simulation, all subjects should be queried simultaneously and the simulation resumed for all subjects at the same time.

Which Queries to Use

Random selection. As it may be impossible to query subjects about all of their SA requirements in a given stop because of time constraints, a portion of the SA queries may be randomly selected and asked each time. A random sampling provides consistency and statistical validity, thus allowing SA scores to be easily compared across trials, subjects, systems, and scenarios.

Because of attentional narrowing or lack of information, certain questions may seem unimportant to a subject at the time of a given stop. It is important to stress that subjects should attempt to answer all queries anyway. This is because (a) even though they think it unimportant, the information may have at least secondary importance; (b) they may not be aware of information that makes a question very important (e.g., the location of a pop-up aircraft); and (c) if only questions of the highest priority were asked, subjects might be inadvertently provided with artificial cues about the situation that will direct their attention when the simulation is resumed. Therefore, a random selection from a constant set of queries is recommended at each stop.

Experimenter controlled. In certain tests it may be desirable to have some queries omitted because of limitations of the simulation or characteristics of the scenarios. For instance, if the simulation does not incorporate aircraft malfunctions, the query related to this issue may be omitted. In addition, with particular test designs it may be desirable to ensure that certain queries are presented every time. When this occurs, subjects should also be queried on a random sampling from all SA requirements and not only on those related to a specific area of interest to the evaluation being conducted. This is because of the ability of subjects to shift attention to the information on which they know they will be tested. What may appear to be an improvement in SA in one area may be merely a shift of attention from another area. When the SAGAT queries cover all of the SA requirements, no such artificial cuing can occur.
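The two selection schemes described above (a random draw from a constant set, optionally with some queries omitted and some always presented) can be sketched as follows. The function `select_queries`, its parameters, and the query names are hypothetical and serve only as an illustration; they are not part of the SAGAT battery.

```python
import random


def select_queries(all_queries, n_per_stop, always=(), omitted=(), rng=None):
    """Pick the queries to present at one stop.

    all_queries: the constant set of queries covering all SA requirements.
    always:      queries the experimenter wants presented at every stop.
    omitted:     queries dropped for the whole test (e.g., unsupported by the simulation).
    The remaining slots are filled by a random sample from the rest of the
    constant set, so subjects cannot tell which area is of special interest.
    If `always` alone exceeds n_per_stop, all of `always` is still returned.
    """
    if rng is None:
        rng = random.Random()
    fixed = [q for q in always if q not in omitted]
    pool = [q for q in all_queries if q not in omitted and q not in fixed]
    n_random = min(max(0, n_per_stop - len(fixed)), len(pool))
    chosen = fixed + rng.sample(pool, n_random)
    rng.shuffle(chosen)  # avoid always presenting the fixed queries first
    return chosen


# Hypothetical query names, for illustration only.
queries = ["target_range", "target_azimuth", "target_altitude",
           "target_heading", "own_fuel", "aircraft_malfunctions"]
stop_queries = select_queries(queries, 4,
                              always=["target_altitude"],
                              omitted=["aircraft_malfunctions"],
                              rng=random.Random(7))
```

Passing a seeded `random.Random` makes a session's query order reproducible for later analysis; omitting it gives a fresh draw at every stop.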

When to Collect SAGAT Data

It is recommended that the timing of each freeze for SAGAT administration be randomly determined and unpredictable so that subjects cannot prepare for queries in advance. If the freeze occurrence is covertly associated with the occurrence of specific events or at specific times across trials, prior studies have shown that the subjects will be able to figure this out (Endsley, 1988), allowing them to prepare for queries or actually improve SA through the artificiality of the freeze cues. An informal rule has been to ensure that no freezes occur earlier than 3 min into a trial, to allow subjects to build up a picture of the situation, and that no two freezes occur within 1 min of each other.
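One way to draw freeze times that respect the informal rule above is sketched below. The function `freeze_times` and its parameters are hypothetical, and the sampling scheme is an assumption rather than the procedure used in the studies: the required gaps are set aside first, the remaining slack is sampled uniformly, and the gaps are then added back.

```python
import random


def freeze_times(trial_min, n_stops, earliest_min=3.0, min_gap_min=1.0, rng=None):
    """Return n_stops random freeze times (minutes into the trial), sorted.

    No freeze occurs before earliest_min, and consecutive freezes are at
    least min_gap_min apart. Raises ValueError if the trial is too short.
    """
    if rng is None:
        rng = random.Random()
    if n_stops < 1:
        raise ValueError("n_stops must be at least 1")
    # Time left over once the mandatory gaps have been reserved.
    slack = trial_min - earliest_min - (n_stops - 1) * min_gap_min
    if slack < 0:
        raise ValueError("trial too short for the requested number of stops")
    offsets = sorted(rng.uniform(0.0, slack) for _ in range(n_stops))
    # Shifting the i-th offset by i gaps guarantees the minimum separation.
    return [earliest_min + off + i * min_gap_min for i, off in enumerate(offsets)]


# Example: three freezes in a 15-min trial.
times = freeze_times(15.0, 3, rng=random.Random(0))
```

Because the times are drawn independently of scenario events, subjects have no cue that would let them anticipate a freeze.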

The result of this approach is that the activities occurring at the time of the stops will be randomly selected. Some stops may occur during very important activities that are of interest to the experimenter, others when no critical activities are occurring. This gives a good sampling of the subjects' SA in a variety of situations. During analysis, the experimenter may want to stratify the data to take these variations into account.
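Stratifying the scored data by the activity under way at each stop can be sketched as a simple grouped tabulation. The stratum labels and the function name `percent_correct_by_stratum` are hypothetical; the source does not prescribe how the strata should be defined.

```python
from collections import defaultdict


def percent_correct_by_stratum(records):
    """Tabulate percentage correct per stratum.

    records: iterable of (stratum_label, correct) pairs, one per scored query,
    where stratum_label describes the activity at the time of the stop.
    """
    counts = defaultdict(lambda: [0, 0])  # stratum -> [n_correct, n_total]
    for stratum, correct in records:
        counts[stratum][0] += int(correct)
        counts[stratum][1] += 1
    return {s: 100.0 * c / n for s, (c, n) in counts.items()}


# Hypothetical labels assigned by the experimenter after the fact.
records = [("engagement", True), ("engagement", False),
           ("cruise", True), ("cruise", True)]
summary = percent_correct_by_stratum(records)
```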

How Much SAGAT Data to Collect

The number of trials necessary will depend on the variability present in the dependent variables being collected and the number of data samples taken during a trial. This will vary with different subjects and designs, but between 30 and 60 samplings per SA query with each design option have previously been adequate in a within-subjects test design.
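As a rough planning aid, the number of trials per design option can be estimated from the target number of samplings. The sketch below assumes, hypothetically, that each stop presents a fixed fraction of the query set chosen at random, so that the expected number of samplings per query is trials × stops per trial × fraction asked. The function name and parameters are illustrative only.

```python
import math


def trials_needed(target_samplings, stops_per_trial, fraction_asked=1.0):
    """Trials per design option needed to reach the target samplings per query.

    fraction_asked is the assumed probability that a given query is presented
    at a given stop (1.0 if every query is asked at every stop). The result is
    an expected-value estimate, pooled over all subjects.
    """
    if stops_per_trial <= 0 or not 0.0 < fraction_asked <= 1.0:
        raise ValueError("stops_per_trial must be > 0 and 0 < fraction_asked <= 1")
    expected_per_trial = stops_per_trial * fraction_asked
    return math.ceil(target_samplings / expected_per_trial)


# 30 samplings per query, three stops per trial, half of the queries per stop.
n_low = trials_needed(30, 3, fraction_asked=0.5)   # 20 trials
# 60 samplings per query, one stop per trial, every query asked.
n_high = trials_needed(60, 1)                      # 60 trials
```

For comparison, the example study above (six pilots, five trials per display condition, one stop per trial) corresponds to 30 stops per display condition.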

Multiple SAGAT stops may be taken within each trial. There is no known limit to the number of times the simulator can be frozen during a given trial. In Experiment 2, no ill effects were found from as many as three stops during a 15-min trial. In general, it is recommended that a stop last until a certain amount of time has elapsed, and then the trial is resumed, regardless of how many questions have been answered. Stops as long as 2 min in duration were used with no undue difficulty or effect on subsequent performance. In Experiment 1, stops as long as 5 min were shown to allow subjects access to SA information without memory decay.

Data Collection

The simulator computer should be programmed to collect objective data corresponding to the queries at the time of each freeze. Considering that some queries will pertain to higher-level SA requirements that may be unavailable in the computer, an expert judgment of the correct answer may be made by an experienced observer who is privy to all information, reflecting the SA of a person with perfect knowledge. A comparison of subjects' perceptions of the situation (as input into SAGAT) with the actual status of each variable (as collected per the simulator computer and expert judgment) results in an objective measure of subject SA. Questions asked of the subject but not answered should be considered incorrect. No evaluation should be made of questions not asked during a given stop.
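The treatment of answered, unanswered, and unasked queries at a single stop can be sketched as follows. The function `score_stop`, the data layout, and the comparison rule passed in by the caller are all hypothetical; the sketch only illustrates the bookkeeping described above.

```python
from typing import Callable, Dict, List, Optional


def score_stop(
    asked: List[str],
    responses: Dict[str, Optional[float]],
    truth: Dict[str, float],
    is_correct: Callable[[str, float, float], bool],
) -> Dict[str, bool]:
    """Score one freeze and return {query: correct?} for the asked queries only.

    asked:      queries presented at this stop.
    responses:  subject's answers; None (or a missing key) means not answered.
    truth:      actual values from the simulator or the expert observer.
    is_correct: caller-supplied rule comparing an answer with the actual value.
    """
    scores = {}
    for query in asked:
        answer = responses.get(query)
        if answer is None:
            scores[query] = False  # asked but not answered: counted as incorrect
        else:
            scores[query] = is_correct(query, answer, truth[query])
    return scores  # queries never asked are absent, so they are not evaluated


# Illustrative values only.
scores = score_stop(
    asked=["range", "altitude"],
    responses={"range": 21.0, "altitude": None},
    truth={"range": 20.0, "altitude": 15000.0, "heading": 90.0},
    is_correct=lambda query, answer, actual: abs(answer - actual) <= 2.0,
)
```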

It is recommended that answers to each query be scored as correct or incorrect based on whether or not they fall into an acceptable tolerance band around the actual value. For example, it may be acceptable for a subject to be 10 miles per hour off of actual ground speed. This method of scoring poses less difficulty than dealing with absolute error (see Marshak et al., 1987). A tabulation of the frequency of correctness can then be made within each test condition for each SA element. Because data scored as correct or incorrect are binomial, the conditions for analysis of variance are violated. A correction factor, Y' = arcsine(Y), can be applied to binomial data, which allows analysis of variance to be used. In addition, a chi-square, Cochran's Q, or binomial t test (depending on the test design) can be used to evaluate the statistical significance of differences in SA between test conditions.
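Tolerance-band scoring and the arcsine correction can be sketched as below. The function names are hypothetical. The default transform follows the text literally, Y' = arcsine(Y); the `use_sqrt` option applies arcsine(sqrt(Y)), a variant often used for proportions, and is an assumption that does not come from the text.

```python
import math


def within_tolerance(reported, actual, tolerance):
    """True if the reported value lies inside the band actual +/- tolerance."""
    return abs(reported - actual) <= tolerance


def proportion_correct(scores):
    """Fraction of correct responses in a list of booleans."""
    return sum(scores) / len(scores)


def arcsine_transform(y, use_sqrt=False):
    """Apply Y' = arcsine(Y), or arcsine(sqrt(Y)) if use_sqrt is True (radians)."""
    if not 0.0 <= y <= 1.0:
        raise ValueError("y must be a proportion between 0 and 1")
    return math.asin(math.sqrt(y) if use_sqrt else y)


# Ground speed reports scored against an actual value of 455 with a
# 10 mph tolerance (illustrative numbers only).
reports = [452.0, 470.0, 445.0]
scores = [within_tolerance(r, 455.0, 10.0) for r in reports]
y = proportion_correct(scores)
y_prime = arcsine_transform(y)
```

The transformed values, one per subject and condition, would then be submitted to the analysis of variance in place of the raw proportions.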

Limitations and Applicability for Use

SAGAT has primarily been used within the confines of high-fidelity and medium-fidelity part-task simulations. This provides experimenter control over freezes and data collection without any danger to the subject or processes involved in the domain. It may be possible to use the technique during actual task performance if multiple operators are present to ensure safety. For example, it might be possible to verbally query one pilot in flight while another assumes flight control. Such an endeavor should be undertaken with extreme caution, however, and may not be appropriate for certain domains.

A recent effort (Sheehy, Davey, Fiegel, and Guo, 1993) employed an adaptation of this technique by making videotapes of an ongoing situation in a nuclear power plant control room. These tapes were then replayed to naive subjects, with freezes for SAGAT queries employed. It is not known how different the SA of subjects passively viewing a situation may be from subjects actually engaged in task performance; however, this approach may yield some useful data.

Most known uses of SAGAT have involved fighter aircraft simulations. In general, it can be employed in any domain in which a reasonable simulation of task performance exists and an analysis of SA requirements has been made in order to develop the queries. SAGAT queries have been developed for advanced bomber aircraft (Endsley, 1990a) and recently for en route air traffic control (Endsley and Rodgers, 1994). Several researchers have begun to use the technique in evaluating operator SA in nuclear control room studies (Hogg, Torralba, and Volden, 1993; Sheehy et al., 1993). The technique has also been employed in a simulated control task to study adaptive automation (Carmody and Gluckman, 1993). Several new efforts are currently under way to employ the technique in medical decision making (E. Swirsky, personal communication, March 1994) and helicopters (C. Prince, personal communication, November 1993). Potentially it could also be used in studies involving automobile driving, supervisory control of manufacturing systems, teleoperations, and operation of other types of dynamic systems.

CONCLUSION

The use of SA metrics in evaluating the efficiency of design concepts, training programs, and other potential interventions is critical. Without careful evaluation, it will be difficult to make much progress in understanding the factors that influence SA, hindering human factors and engineering efforts. Several techniques have been reviewed here, each with several advantages and disadvantages. The use of one technique, SAGAT, was explored extensively, and two studies were presented that investigated the validity of the measure. It has to date proven a useful and viable technique for measuring SA in numerous contexts. Rigorous validity testing is needed for other techniques proposed for SA measurement. Unless the veracity of these techniques is established, interpreting their results will be hazardous.

In addition to establishing validity of measures of SA, more research is needed regarding the task of SA measurement itself. For instance, as pointed out by Pew (1991), no criteria exist at this time that establish the level of SA required for successful performance. Does an operator need to have SA that is 100% perfect (in both completeness and accuracy), or is some lesser amount sufficient for good performance? This is a complex issue. I would assert that SA can be seen as a factor that increases the probability of good performance but that it does not necessarily guarantee it, as other factors also come into effect (e.g., decision making, workload, performance execution, system capabilities, and SA of others in some cases). How much SA one needs, therefore, becomes a matter of how much probability of error one is willing to accept. Perhaps such criteria should more properly and usefully be established as the level of SA needed on each subcomponent at different periods in time. Some guidelines will eventually need to be specified if there is a desire to certify new system designs on the basis of SA.

Overall, the beginnings of the field of SA measurement have been established, allowing researchers in the area of SA to proceed from mainly speculation and anecdotal information to solid empirical findings. The tools for conducting further basic research on the SA construct and developing better system designs are available, allowing needed research on SA in a variety of arenas.

ACKNOWLEDGMENTS

This research was sponsored by the Northrop Corporation. I thank the many people at Northrop, both past and present, who contributed to this effort. In particular, I thank Vic Vizcarra for his assistance in conducting the studies, Paul Cayley and Gerry Armstrong for their programming efforts, and Mike Majors and Del Jacobs for their support in developing an SA research program. In addition, I am indebted to Mark Chignell, who provided helpful support throughout the development of the methodology. I also thank Mark Rodgers, Tom English, Dick Gilson, and several anonymous reviewers for their comments on an earlier version of this paper.


REFERENCES

Bolstad, C. A., and Endsley, M. R. (1990). Single versus dual scale range display investigation (NOR DOC 90-90). Hawthorne, CA: Northrop Corporation.

Carmody, M. A., and Gluckman, J. P. (1993). Task-specific effects of automation and automation failure on performance, workload, and situational awareness. In R. S. Jensen and D. Neumeister (Eds.), Proceedings of the Seventh International Symposium on Aviation Psychology (pp. 167-171). Columbus, OH: Ohio State University, Department of Aviation.

Chase, W. G., and Simon, H. A. (1973). Perceptions in chess. Cognitive Psychology, 4, 55-81.

Cowan, N. (1988). Evolving conceptions of memory storage, selective attention, and their mutual constraints within the human information-processing system. Psychological Bulletin, 104, 163-191.

Endsley, M. R. (1987). SAGAT: A methodology for the measurement of situation awareness (NOR DOC 87-83). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1988). Situation awareness global assessment technique (SAGAT). In Proceedings of the National Aerospace and Electronics Conference (NAECON) (pp. 789-795). New York: IEEE.

Endsley, M. R. (1989a). Final report: Situation awareness in an advanced strategic mission (NOR DOC 89-32). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1989b). Tactical simulation 3 test report: Addendum I, situation awareness evaluations (81203033R). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1990a). A methodology for the objective measurement of situation awareness. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 1/1-1/9). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Endsley, M. R. (1990b). Predictive utility of an objective measure of situation awareness. In Proceedings of the Human Factors Society 34th Annual Meeting (pp. 41-45). Santa Monica, CA: Human Factors and Ergonomics Society.

Endsley, M. R. (1990c). Situation awareness global assessment technique (SAGAT): Air-to-air tactical version user guide (NOR DOC 89-58, rev. A). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1990d). Situation awareness in dynamic human decision making: Theory and measurement. Unpublished doctoral dissertation, University of Southern California, Los Angeles, CA.

Endsley, M. R., and Rodgers, M. D. (1994). Situation awareness global assessment technique (SAGAT): En route air traffic control version user's guide (Draft). Lubbock: Texas Tech University.

Fracker, M. L. (1990). Attention gradients in situation awareness. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 6/1-6/10). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Herrmann, D. J. (1984). Questionnaires about memory. In J. E. Harris and P. E. Morris (Eds.), Everyday memory, action, and absent-mindedness (pp. 133-151). London: Academic.

Hogg, D. N., Torralba, B., and Volden, F. S. (1993). A situation awareness methodology for the evaluation of process control systems: Studies of feasibility and the implication of use (1993-03-05). Storefjell, Norway: OECD Halden Reactor Project.


Hughes, E. R., Hassoun, J. A., and Ward, F. (1990). Advanced tactical fighter (ATF) simulation support: Final report (CSEF-TR-ATF-90-91). Wright-Patterson Air Force Base, OH: Air Force Systems Command, Aeronautical Systems Division.

Jacoby, L. L., and Dallas, M. (1981). On the relationship between autobiographical memory and perceptual learning. Journal of Experimental Psychology: General, 110, 306-340.

Kellog, R. T. (1980). Is conscious attention necessary for long-term storage? Journal of Experimental Psychology: Human Learning and Memory, 6, 379-390.

Manktelow, K., and Jones, J. (1987). Principles from the psychology of thinking and mental models. In M. M. Gardiner and B. Christie (Eds.), Applying cognitive psychology to user-interface design (pp. 83-117). Chichester, England: Wiley.

Marshak, W. P., Kuperman, G., Ramsey, E. G., and Wilson, D. (1987). Situational awareness in map displays. In Proceedings of the Human Factors Society 31st Annual Meeting (pp. 533-535). Santa Monica, CA: Human Factors and Ergonomics Society.

McDonnell Douglas Aircraft Corporation. (1982). Advanced medium-range air-to-air missile operational utility evaluation (AMRAAM OUE) test final report. St. Louis, MO: Author.

Mogford, R. H., and Tansley, B. W. (1991). The importance of the air traffic controller's mental model. Presented at the Human Factors Society of Canada Annual Meeting, Canada.

Morton, J. (1969). Interaction of information in word recognition. Psychological Review, 76, 165-178.

Nisbett, R. E., and Wilson, T. D. (1977). Telling more than we can know: Verbal reports on mental processes. Psychological Review, 84, 231-259.

Norman, D. A. (1968). Towards a theory of memory and attention. Psychological Review, 75, 522-536.

Northrop Corporation. (1988). Tactical simulation 2 test report: Addendum I, situation awareness test results. Hawthorne, CA: Author.

Pew, R. W. (1991). Defining and measuring situation awareness in the commercial aircraft cockpit. In Proceedings of the Conference on Challenges in Aviation Human Factors: The National Plan (pp. 30-31). Washington, DC: American Institute of Aeronautics and Astronautics.

Sarter, N. B., and Woods, D. D. (1991). Situation awareness: A critical but ill-defined phenomenon. International Journal of Aviation Psychology, 1, 45-57.

Selcon, S. J., and Taylor, R. M. (1990). Evaluation of the situational awareness rating technique (SART) as a tool for aircrew systems design. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 5/1-5/8). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Sheehy, E. J., Davey, E. C., Fiegel, T. T., and Guo, K. Q. (1993, April). Usability benchmark for CANDU annunciation: Lessons learned. Presented at the ANS Topical Meeting on Nuclear Plant Instrumentation, Control, and Man-Machine Interface Technology, Oak Ridge, TN.

Smolensky, M. W. (1993). Toward the physiological measurement of situation awareness: The case for eye movement measurements. In Proceedings of the Human Factors and Ergonomics Society 37th Annual Meeting (p. 41). Santa Monica, CA: Human Factors and Ergonomics Society.

Taylor, R. M. (1990). Situational awareness rating technique (SART): The development of a tool for aircrew systems design. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 3/1-3/17). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Tulving, E. (1985). How many memory systems are there? American Psychologist, 40, 385-398.

Venturino, M., Hamilton, W. L., and Dvorchak, S. R. (1990). Performance-based measures of merit for tactical situation awareness. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 4/1-4/5). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.


Vidulich, M. A. (1989). The use of judgment matrices in subjective workload assessment: The subjective workload dominance (SWORD) technique. In Proceedings of the Human Factors Society 33rd Annual Meeting (pp. 1406-1410). Santa Monica, CA: Human Factors and Ergonomics Society.

Date received: January 26, 1993
Date accepted: October 4, 1994

Page 14: 1:8BDA:>:?C @; 5=CD8C=@? +E8A:?:BB =? -F?8>=9 5FBC:>B729A71/Literature/SA_M/Endsley_1995.pdf · 1:8BDA:>:?C @; 5=CD8C=@? +E8A:?:BB =? -F?8>=9 5FBC:>B ... 33*$

78-March 1995 HUMAN FACTORS

(a) 20 Display

~

nbull

lb)3D Display

00 Rotation

(d)3D Display

4sO Rotation

(e)3D Display

300 Rotation

(e)3D Display

600 Rotation

Figure 9 Two-dimensional display and three-dimensional display at four levels of rota-tion

SA of the aircrafts range F(4288) = 2575 P lt005 azimuth F(4278) = 4125 p lt 005 andheading F(4139) = 3040 P lt 005 but not al-titude F(494) = 183p gt 005 The 3D displaywhen rotated at 45 or 60 deg provided signifi-cantly lower SA on target range than when at 0or 30 deg and significantly lower SA of targetheading than of any of the other three condi-tions A rotation of 60 deg also provided signifi-cantly lower SA of target azimuth than did theother four conditions Although the ANOVAfortarget altitude was not significant there was an

opposite trend for the highly rotated 3D displayto provide more SA of target altitude than didrotations of 0 and 30 deg The SWAT data sup-ported these findings with significantly higherratings for workload in the 45- and 60-deg rota-tions than in the 0- or 30-deg rotations or thetwo-dimensional display

Amajor point to be gleaned from this exampleis the importance of examining the impact of agiven design across an operators SA require-ments The proposed 3D display was designed toprovide pilots with better information about air-

SITUATION AWARENESS MEASUREMENT March 1995-79

2-D 3-D 3-0 3-0 3-000 300 450 600

al 10it 90E 80Ql

2 70~ 60bull 50~lt 400 301 20Ql

e 10Q)

0- 0 2-D 3-0 3-0 3-0 3-00deg 30deg 45deg 600

(a) Knowledge Of Target Range (b) Knowledge Of Target Heading

1 0 100Q) 90~ 90 E

80 E 80cGlGl 2 702 70

~ 60 ~ 60e 50 e 50lt 40 ( 400 30 0 30E 20 E 20Ql Ql

f 10 f 10Ql Ql

0- 0 0- 0 2-D 3-0 3-0 3-D 3-Dmiddot2-0 3-D 3-D 3-D 3-00deg 30deg 45deg 60deg 0deg 30deg 45deg 60deg

(e) Knowledge Of Target Azimuth (d) Knowledge Of Target AltitudeFigure 10 Situation awareness of target range heading alimuth and altitude in each display condition

craft altitude and it may have done so althoughthe trend was not significant However thiscame at the expense of SA on location azimuthand heading If a study had been performed thatmerely examined subjects knowledge of aircraftaltitude this important information would havebeen lost The SAGAT data also provided impor-tant diagnostic information that was useful inunderstanding the exact effect the design con-cept had on pilot SA in order to make futureiterations

IMPLEMENTATION RECOMMENDATIONS

Several recommendations for SAGAT admin-istration can be made based on previous experi-ence in using the procedure

TrainingAn explanation of the SAGAT procedures and

detailed instructions for answering each queryshould be provided to subjects before testingSeveral training trials should be conducted inwhich the simulator is halted frequently to al-low subjects ample opportunity to practice re-sponding to the SAGAT queries Usually three tofive samplings are adequate for a subject to be-come comfortable with the procedure and toclear up any uncertainties in how to answer thequeries

Test DesignSAGAT requires no special test consider-

ations The same principles of experimentaldesign and administration apply to SAGAT as to

SO-March 1995

any other dependent measure Measures of sub-ject performance and workload may be collectedconcurrently with SAGAT as no ill effect fromthe insertion of breaks has been shown To becautious however if performance measures areto be collected simultaneously with SAGATdata half of the trials should be conducted with-out any breaks for SAGAT in order to provide acheck for this contingency

Procedures

Subjects should be instructed to attend totheir tasks as they normally would with theSAGATqueries considered as secondary No dis-plays or other visual aids should be visible whilesubjects are answering the queries If subjectsdo not know or are uncertain about the answerto a given query they should be encouraged tomake their best guess There is no penalty forguessing allowing for consideration of the de-fault values or other wisdom gained from expe-rience that subjects normally use in decisionmaking If subjects do not feel comfortableenough to make a guess they may go on to thenext question Talking or sharing of informationamong subjects should not be permitted If mul-tiple subjects are involved in the same simula-tion all subjects should be queried simulta-neously and the simulation resumed for allsubjects at the same time

Which Queries to Use

Random selection As it may be impossible toquery subjects about all of their SA require-ments in a given stop because of time con-straints a portion of the SA queries may be ran-domly selected and asked each time A randomsampling provides consistency and statisticalvalidity thus allowing SA scores to be easilycompared across trials subjects systems andscenarios

Because of attentional narrowing or lack ofinformation certain questions may seem unim-portant to a subject at the time of a given stop Itis important to stress that subjects should at-tempt to answer all queries anyway This is be-cause (a) even though they think it unimportant

HUMAN FACTORS

the information may have at least secondary im-portance (b) they may not be aware of informa-tion that makes a question very important (egthe location of a pop-up aircraft) and (c) if onlyquestions of the highest priority were askedsubjects might be inadvertently provided withartificial cues about the situation that will di-rect their attention when the simulation is re-sumed Therefore a random selection from aconstant set of queries is recommended ateach stop

Experimenter controlled In certain tests it maybe desirable to have some queries omitted be-cause of limitations of the simulation or char-acteristics of the scenarios For instance if thesimulation does not incorporate aircraft mal-functions the query related to this issue may beomitted In addition with particular test de-signs it may be desirable to ensure that certainqueries are presented every time When this oc-curs subjects should also be queried on a ran-dom sampling from all SA requirements and notonly on those related to a specific area of interestto the evaluation being conducted This is be-cause of the ability of subjects to shift attentionto the information on which they know they willbe tested What may appear to be an improve-ment in SA in one area may be merely a shift ofattention from another area When the SAGATqueries cover all of the SA requirements no suchartificial cuing can occur

When to Collect SAGAT Data

It is recommended that the timing of eachfreeze for SAGAr administration be randomlydetermined and unpredictable so that subjectscannot prepare for queries in advance If thefreeze occurrence is covertly associated with theoccurrence of specific events or at specific timesacross trials prior studies have shown that thesubjects will be able to figure this out (Endsley1988) allowing them to prepare for queries oractually improve SA through the artificiality ofthe freeze cues An informal rule has been to en-sure that no freezes occur earlier than 3 min intoa trial to allow subjects to build up a picture of

SITUATION AWARENESS MEASUREMENT

the situation and that no two freezes occurwithin 1 min of each other

The result of this approach is that the activi-ties occurring at the time of the stops will berandomly selected Some stops may occur dur-ing very important activities that are of interestto the experimenter others when no critical ac-tivities are occurring This gives a good sam-pling of the subjects SA in a variety of situa-tions During analysis the experimenter maywant to stratify the data to take these variationsinto account

How Much SAGAT Data to Collect

The number of trials necessary will depend on the variability present in the dependent variables being collected and the number of data samples taken during a trial. This will vary with different subjects and designs, but between 30 and 60 samplings per SA query with each design option have previously been adequate in a within-subjects test design.
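The relationship between samplings per query and the number of trials can be made concrete with simple arithmetic. The sketch below assumes that every trial contains the same number of stops and that a given query is asked at each stop with a fixed probability; the function name trials_needed and the parameter p_query_asked are hypothetical.

```python
import math


def trials_needed(target_samples, stops_per_trial, p_query_asked=1.0):
    """Trials per design option needed to reach a target number of
    samplings of one query, under the assumptions stated above."""
    expected_per_trial = stops_per_trial * p_query_asked
    return math.ceil(target_samples / expected_per_trial)


if __name__ == "__main__":
    print(trials_needed(30, stops_per_trial=3))
    print(trials_needed(60, stops_per_trial=3, p_query_asked=0.5))
```

Under these assumptions, a target of 30 samplings with three stops per trial and every query asked at every stop corresponds to 10 trials per design option. Asking each query at only half of the stops doubles that figure.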

Multiple SAGAT stops may be taken within each trial. There is no known limit to the number of times the simulator can be frozen during a given trial. In Experiment 2 no ill effects were found from as many as three stops during a 15-min trial. In general, it is recommended that a stop last until a certain amount of time has elapsed, and then the trial is resumed, regardless of how many questions have been answered. Stops as long as 2 min in duration were used with no undue difficulty or effect on subsequent performance. In Experiment 1 stops as long as 5 min were shown to allow subjects access to SA information without memory decay.

Data Collection

The simulator computer should be programmed to collect objective data corresponding to the queries at the time of each freeze. Considering that some queries will pertain to higher-level SA requirements that may be unavailable in the computer, an expert judgment of the correct answer may be made by an experienced observer who is privy to all information, reflecting the SA of a person with perfect knowledge. A comparison of subjects' perceptions of the situation (as input into SAGAT) with the actual status of each variable (as collected per the simulator computer and expert judgment) results in an objective measure of subject SA. Questions asked of the subject but not answered should be considered incorrect. No evaluation should be made of questions not asked during a given stop.

It is recommended that answers to each query be scored as correct or incorrect based on whether or not it falls into an acceptable tolerance band around the actual value. For example, it may be acceptable for a subject to be 10 miles per hour off of actual ground speed. This method of scoring poses less difficulty than dealing with absolute error (see Marshak et al., 1987). A tabulation of the frequency of correctness can then be made within each test condition for each SA element. Because data scored as correct or incorrect are binomial, the conditions for analysis of variance are violated. A correction factor, Y' = arcsine(Y), can be applied to binomial data, which allows analysis of variance to be used. In addition, a chi-square, Cochran's Q, or binomial t test (depending on the test design) can be used to evaluate the statistical significance of differences in SA between test conditions.
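A minimal sketch of tolerance-band scoring and of an arcsine transform is given below. The names score_stop and arcsine_transform, the dictionary layout, and the example values are hypothetical. The text gives the correction as the arcsine of Y. The sketch instead applies the arcsine to the square root of the proportion correct, a form commonly used for proportions; this is an assumption about the intended transform, not a statement of the original procedure.

```python
import math


def score_stop(responses, actual, tolerance):
    """Score the queries asked at one freeze.

    responses: query name -> subject's answer, or None if the query was
               asked but left unanswered.
    actual:    query name -> true value recorded at the freeze.
    tolerance: query name -> half-width of the acceptable band.
    Queries that were not asked are absent from `responses` and get no score.
    """
    scores = {}
    for query, answer in responses.items():
        if answer is None:
            scores[query] = False  # asked but unanswered counts as incorrect
        else:
            scores[query] = abs(answer - actual[query]) <= tolerance[query]
    return scores


def arcsine_transform(proportion):
    """Arcsine square-root transform of a proportion correct, in radians."""
    return math.asin(math.sqrt(proportion))


if __name__ == "__main__":
    actual = {"ground_speed": 450.0, "altitude": 12000.0, "range": 20.0}
    tolerance = {"ground_speed": 10.0, "altitude": 500.0, "range": 2.0}
    # "range" was not asked at this stop, so it does not appear in the result.
    print(score_stop({"ground_speed": 458.0, "altitude": None}, actual, tolerance))
    print(arcsine_transform(0.75))
```

The absolute-difference check suits linear quantities such as speed or altitude; angular quantities such as heading would need wraparound handling, which this sketch omits.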

Limitations and Applicability for Use

SAGAT has primarily been used within the confines of high-fidelity and medium-fidelity part-task simulations. This provides experimenter control over freezes and data collection without any danger to the subject or processes involved in the domain. It may be possible to use the technique during actual task performance if multiple operators are present to ensure safety. For example, it might be possible to verbally query one pilot in flight while another assumes flight control. Such an endeavor should be undertaken with extreme caution, however, and may not be appropriate for certain domains.

A recent effort (Sheehy, Davey, Fiegel, and Guo, 1993) employed an adaptation of this technique by making videotapes of an ongoing situation in a nuclear power plant control room. These tapes were then replayed to naive subjects with freezes for SAGAT queries employed. It is not known how different the SA of subjects passively viewing a situation may be from subjects actually engaged in task performance; however, this approach may yield some useful data.

Most known uses of SAGAT have involved fighter aircraft simulations. In general, it can be employed in any domain in which a reasonable simulation of task performance exists and an analysis of SA requirements has been made in order to develop the queries. SAGAT queries have been developed for advanced bomber aircraft (Endsley, 1990a) and recently for en route air traffic control (Endsley and Rodgers, 1994). Several researchers have begun to use the technique in evaluating operator SA in nuclear control room studies (Hogg, Torralba, and Volden, 1993; Sheehy et al., 1993). The technique has also been employed in a simulated control task to study adaptive automation (Carmody and Gluckman, 1993). Several new efforts are currently under way to employ the technique in medical decision making (E. Swirsky, personal communication, March 1994) and helicopters (C. Prince, personal communication, November 1993). Potentially it could also be used in studies involving automobile driving, supervisory control of manufacturing systems, teleoperations, and operation of other types of dynamic systems.

CONCLUSION

The use of SA metrics in evaluating the efficiency of design concepts, training programs, and other potential interventions is critical. Without careful evaluation it will be difficult to make much progress in understanding the factors that influence SA, hindering human factors and engineering efforts. Several techniques have been reviewed here, each with several advantages and disadvantages. The use of one technique, SAGAT, was explored extensively, and two studies were presented that investigated the validity of the measure. It has to date proven a useful and viable technique for measuring SA in numerous contexts. Rigorous validity testing is needed for other techniques proposed for SA measurement. Unless the veracity of these techniques is established, interpreting their results will be hazardous.

In addition to establishing validity of measures of SA, more research is needed regarding the task of SA measurement itself. For instance, as pointed out by Pew (1991), no criteria exist at this time that establish the level of SA required for successful performance. Does an operator need to have SA that is 100% perfect (in both completeness and accuracy), or is some lesser amount sufficient for good performance? This is a complex issue. I would assert that SA can be seen as a factor that increases the probability of good performance but that it does not necessarily guarantee it, as other factors also come into effect (e.g., decision making, workload, performance execution, system capabilities, and SA of others in some cases). How much SA one needs, therefore, becomes a matter of how much probability of error one is willing to accept. Perhaps such criteria should more properly and usefully be established as the level of SA needed on each subcomponent at different periods in time. Some guidelines will eventually need to be specified if there is a desire to certify new system designs on the basis of SA.

Overall, the beginnings of the field of SA measurement have been established, allowing researchers in the area of SA to proceed from mainly speculation and anecdotal information to solid empirical findings. The tools for conducting further basic research on the SA construct and developing better system designs are available, allowing needed research on SA in a variety of arenas.

ACKNOWLEDGMENTS

This research was sponsored by the Northrop Corporation. I thank the many people at Northrop, both past and present, who contributed to this effort. In particular, I thank Vic Vizcarra for his assistance in conducting the studies, Paul Cayley and Gerry Armstrong for their programming efforts, and Mike Majors and Del Jacobs for their support in developing an SA research program. In addition, I am indebted to Mark Chignell, who provided helpful support throughout the development of the methodology. I also thank Mark Rodgers, Tom English, Dick Gilson, and several anonymous reviewers for their comments on an earlier version of this paper.


REFERENCES

Bolstad, C. A., and Endsley, M. R. (1990). Single versus dual scale range display investigation (NOR DOC 90-90). Hawthorne, CA: Northrop Corporation.

Carmody, M. A., and Gluckman, J. P. (1993). Task-specific effects of automation and automation failure on performance, workload, and situational awareness. In R. S. Jensen and D. Neumeister (Eds.), Proceedings of the Seventh International Symposium on Aviation Psychology (pp. 167-171). Columbus, OH: Ohio State University, Department of Aviation.

Chase, W. G., and Simon, H. A. (1973). Perceptions in chess. Cognitive Psychology, 4, 55-81.

Cowan, N. (1988). Evolving conceptions of memory storage, selective attention, and their mutual constraints within the human information-processing system. Psychological Bulletin, 104, 163-191.

Endsley, M. R. (1987). SAGAT: A methodology for the measurement of situation awareness (NOR DOC 87-83). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1988). Situation awareness global assessment technique (SAGAT). In Proceedings of the National Aerospace and Electronics Conference (NAECON) (pp. 789-795). New York: IEEE.

Endsley, M. R. (1989a). Final report: Situation awareness in an advanced strategic mission (NOR DOC 89-32). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1989b). Tactical simulation 3 test report: Addendum I, situation awareness evaluations (81203033R). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1990a). A methodology for the objective measurement of situation awareness. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 1/1-1/9). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Endsley, M. R. (1990b). Predictive utility of an objective measure of situation awareness. In Proceedings of the Human Factors Society 34th Annual Meeting (pp. 41-45). Santa Monica, CA: Human Factors and Ergonomics Society.

Endsley, M. R. (1990c). Situation awareness global assessment technique (SAGAT): Air-to-air tactical version user guide (NOR DOC 89-58, rev. A). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1990d). Situation awareness in dynamic human decision making: Theory and measurement. Unpublished doctoral dissertation, University of Southern California, Los Angeles, CA.

Endsley, M. R., and Rodgers, M. D. (1994). Situation awareness global assessment technique (SAGAT): En route air traffic control version user's guide (Draft). Lubbock: Texas Tech University.

Fracker, M. L. (1990). Attention gradients in situation awareness. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 6/1-6/10). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Herrmann, D. J. (1984). Questionnaires about memory. In J. E. Harris and P. E. Morris (Eds.), Everyday memory, action, and absent-mindedness (pp. 133-151). London: Academic.

Hogg, D. N., Torralba, B., and Volden, F. S. (1993). A situation awareness methodology for the evaluation of process control systems: Studies of feasibility and the implication of use (1993-03-05). Storefjell, Norway: OECD Halden Reactor Project.


Hughes, E. R., Hassoun, J. A., and Ward, F. (1990). Advanced tactical fighter (ATF) simulation support: Final report (CSEF-TR-ATF-90-91). Wright-Patterson Air Force Base, OH: Air Force Systems Command, Aeronautical Systems Division.

Jacoby, L. L., and Dallas, M. (1981). On the relationship between autobiographical memory and perceptual learning. Journal of Experimental Psychology: General, 110, 306-340.

Kellog, R. T. (1980). Is conscious attention necessary for long-term storage? Journal of Experimental Psychology: Human Learning and Memory, 6, 379-390.

Manktelow, K., and Jones, J. (1987). Principles from the psychology of thinking and mental models. In M. M. Gardiner and B. Christie (Eds.), Applying cognitive psychology to user-interface design (pp. 83-117). Chichester, England: Wiley.

Marshak, W. P., Kuperman, G., Ramsey, E. G., and Wilson, D. (1987). Situational awareness in map displays. In Proceedings of the Human Factors Society 31st Annual Meeting (pp. 533-535). Santa Monica, CA: Human Factors and Ergonomics Society.

McDonnell Douglas Aircraft Corporation. (1982). Advanced medium-range air-to-air missile operational utility evaluation (AMRAAM OUE) test final report. St. Louis, MO: Author.

Mogford, R. H., and Tansley, B. W. (1991). The importance of the air traffic controller's mental model. Presented at the Human Factors Society of Canada Annual Meeting, Canada.

Morton, J. (1969). Interaction of information in word recognition. Psychological Review, 76, 165-178.

Nisbett, R. E., and Wilson, T. D. (1977). Telling more than we can know: Verbal reports on mental processes. Psychological Review, 84, 231-259.

Norman, D. A. (1968). Towards a theory of memory and attention. Psychological Review, 75, 522-536.

Northrop Corporation. (1988). Tactical simulation 2 test report: Addendum I, situation awareness test results. Hawthorne, CA: Author.

Pew, R. W. (1991). Defining and measuring situation awareness in the commercial aircraft cockpit. In Proceedings of the Conference on Challenges in Aviation Human Factors: The National Plan (pp. 30-31). Washington, DC: American Institute of Aeronautics and Astronautics.

Sarter, N. B., and Woods, D. D. (1991). Situation awareness: A critical but ill-defined phenomenon. International Journal of Aviation Psychology, 1, 45-57.

Selcon, S. J., and Taylor, R. M. (1990). Evaluation of the situational awareness rating technique (SART) as a tool for aircrew systems design. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 5/1-5/8). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Sheehy, E. J., Davey, E. C., Fiegel, T. T., and Guo, K. Q. (1993, April). Usability benchmark for CANDU annunciation-lessons learned. Presented at the ANS Topical Meeting on Nuclear Plant Instrumentation, Control, and Man-Machine Interface Technology, Oak Ridge, TN.

Smolensky, M. W. (1993). Toward the physiological measurement of situation awareness: The case for eye movement measurements. In Proceedings of the Human Factors and Ergonomics Society 37th Annual Meeting (p. 41). Santa Monica, CA: Human Factors and Ergonomics Society.

Taylor, R. M. (1990). Situational awareness rating technique (SART): The development of a tool for aircrew systems design. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 3/1-3/17). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Tulving, E. (1985). How many memory systems are there? American Psychologist, 40, 385-398.

Venturino, M., Hamilton, W. L., and Dvorchak, S. R. (1990). Performance-based measures of merit for tactical situation awareness. In Situation awareness in aerospace operations (AGARD-CP-478, pp. 4/1-4/5). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Vidulich, M. A. (1989). The use of judgment matrices in subjective workload assessment: The subjective workload dominance (SWORD) technique. In Proceedings of the Human Factors Society 33rd Annual Meeting (pp. 1406-1410). Santa Monica, CA: Human Factors and Ergonomics Society.

Date received: January 26, 1993
Date accepted: October 4, 1994


[Figure 10. Situation awareness of target range, heading, azimuth, and altitude in each display condition. Panels: (a) Knowledge of Target Range, (b) Knowledge of Target Heading, (c) Knowledge of Target Azimuth, (d) Knowledge of Target Altitude. Horizontal axis: display condition (2-D; 3-D at 0°, 30°, 45°, and 60°). Vertical axis: 0 to 100.]

craft altitude, and it may have done so, although the trend was not significant. However, this came at the expense of SA on location, azimuth, and heading. If a study had been performed that merely examined subjects' knowledge of aircraft altitude, this important information would have been lost. The SAGAT data also provided important diagnostic information that was useful in understanding the exact effect the design concept had on pilot SA in order to make future iterations.

IMPLEMENTATION RECOMMENDATIONS

Several recommendations for SAGAT administration can be made based on previous experience in using the procedure.

Training

An explanation of the SAGAT procedures and detailed instructions for answering each query should be provided to subjects before testing. Several training trials should be conducted in which the simulator is halted frequently to allow subjects ample opportunity to practice responding to the SAGAT queries. Usually three to five samplings are adequate for a subject to become comfortable with the procedure and to clear up any uncertainties in how to answer the queries.

Test Design

SAGAT requires no special test considerations. The same principles of experimental design and administration apply to SAGAT as to any other dependent measure. Measures of subject performance and workload may be collected concurrently with SAGAT, as no ill effect from the insertion of breaks has been shown. To be cautious, however, if performance measures are to be collected simultaneously with SAGAT data, half of the trials should be conducted without any breaks for SAGAT in order to provide a check for this contingency.

Procedures

Subjects should be instructed to attend to their tasks as they normally would, with the SAGAT queries considered as secondary. No displays or other visual aids should be visible while subjects are answering the queries. If subjects do not know or are uncertain about the answer to a given query, they should be encouraged to make their best guess. There is no penalty for guessing, allowing for consideration of the default values or other wisdom gained from experience that subjects normally use in decision making. If subjects do not feel comfortable enough to make a guess, they may go on to the next question. Talking or sharing of information among subjects should not be permitted. If multiple subjects are involved in the same simulation, all subjects should be queried simultaneously and the simulation resumed for all subjects at the same time.

Which Queries to Use

Random selection. As it may be impossible to query subjects about all of their SA requirements in a given stop because of time constraints, a portion of the SA queries may be randomly selected and asked each time. A random sampling provides consistency and statistical validity, thus allowing SA scores to be easily compared across trials, subjects, systems, and scenarios.

Because of attentional narrowing or lack of information, certain questions may seem unimportant to a subject at the time of a given stop. It is important to stress that subjects should attempt to answer all queries anyway. This is because (a) even though they think it unimportant, the information may have at least secondary importance; (b) they may not be aware of information that makes a question very important (e.g., the location of a pop-up aircraft); and (c) if only questions of the highest priority were asked, subjects might be inadvertently provided with artificial cues about the situation that will direct their attention when the simulation is resumed. Therefore, a random selection from a constant set of queries is recommended at each stop.


Sheehy E J Davey E C Fiegel T T and Guo K Q (1993April) Usability benchmark for CANDU annunciation-lessons learned Presented at the ANS Topical Meeting onNuclear Plant Instrumentation Control and Man-MachineInterface Technology Oak Ridge TN

Smolensky M W (1993) Toward the physiological measure-ment of situation awareness The case for eye movementmeasurements In Proceedings of the Human Factors andErgonomics Society 37th Annual Meeting (p 41) SantaMonica CA Human Factors and Ergonomics Society

Taylor R M (1990) Situational awareness rating technique(SART) The development of a tool for aircrew systemsdesign In Situational awareness in aerospace operations(AGARD-CP-478 pp 31-317) Neuilly-Sur-Seine France

84-March 1995

NATO-Advisory Group for Aerospace Research and De-velopment

Tulving E (1985) How many memory systems are thereAmerican Psychologist 40 385-398

Venturino M Hamilton W 1 and Dvorchak S R (1990)Performance-based measures of merit for tactical situationawareness In Situation awareness in aerospace operations(AGARD-CP-478 pp 41-45) Neuilly-Sur-Seine FranceNATO-Advisory Group for Aerospace Research and Demiddotvelopment

HUMAN FACTORS

Vidulich M A (1989) The use of judgment matrices in sub-jective workload assessment The subjective workloaddominance (SWORD) technique In Proceedings of the Hu-man Factors Society 33rd Annual Meeting (pp 1406-1410)Santa Monica CA Human Factors and Ergonomics Soci-ety

Date received January 26 1993Date accepted October 4 1994


any other dependent measure. Measures of subject performance and workload may be collected concurrently with SAGAT, as no ill effect from the insertion of breaks has been shown. To be cautious, however, if performance measures are to be collected simultaneously with SAGAT data, half of the trials should be conducted without any breaks for SAGAT in order to provide a check for this contingency.

Procedures

Subjects should be instructed to attend to their tasks as they normally would, with the SAGAT queries considered as secondary. No displays or other visual aids should be visible while subjects are answering the queries. If subjects do not know or are uncertain about the answer to a given query, they should be encouraged to make their best guess. There is no penalty for guessing, allowing for consideration of the default values or other wisdom gained from experience that subjects normally use in decision making. If subjects do not feel comfortable enough to make a guess, they may go on to the next question. Talking or sharing of information among subjects should not be permitted. If multiple subjects are involved in the same simulation, all subjects should be queried simultaneously and the simulation resumed for all subjects at the same time.

Which Queries to Use

Random selection. As it may be impossible to query subjects about all of their SA requirements in a given stop because of time constraints, a portion of the SA queries may be randomly selected and asked each time. A random sampling provides consistency and statistical validity, thus allowing SA scores to be easily compared across trials, subjects, systems, and scenarios.

Because of attentional narrowing or lack of information, certain questions may seem unimportant to a subject at the time of a given stop. It is important to stress that subjects should attempt to answer all queries anyway. This is because (a) even though they think it unimportant, the information may have at least secondary importance; (b) they may not be aware of information that makes a question very important (e.g., the location of a pop-up aircraft); and (c) if only questions of the highest priority were asked, subjects might be inadvertently provided with artificial cues about the situation that will direct their attention when the simulation is resumed. Therefore, a random selection from a constant set of queries is recommended at each stop.

Experimenter controlled. In certain tests it may be desirable to have some queries omitted because of limitations of the simulation or characteristics of the scenarios. For instance, if the simulation does not incorporate aircraft malfunctions, the query related to this issue may be omitted. In addition, with particular test designs it may be desirable to ensure that certain queries are presented every time. When this occurs, subjects should also be queried on a random sampling from all SA requirements and not only on those related to a specific area of interest to the evaluation being conducted. This is because of the ability of subjects to shift attention to the information on which they know they will be tested. What may appear to be an improvement in SA in one area may be merely a shift of attention from another area. When the SAGAT queries cover all of the SA requirements, no such artificial cuing can occur.
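As an illustration only, the query-selection rule described in this section might be sketched as follows. The function name `select_queries`, its parameters, and the example query labels are hypothetical and do not come from any SAGAT implementation; the sketch assumes that queries the experimenter wants presented every time are supplemented by a random sample from the remainder of a constant query pool.

```python
import random

def select_queries(query_pool, n_per_stop, always_ask=(), rng=None):
    """Choose the queries for one stop (hypothetical helper).

    query_pool : constant list of all SA queries usable in this simulation
    n_per_stop : total number of queries to present at the stop
    always_ask : queries the experimenter wants presented every time
    rng        : optional random.Random instance, for reproducibility
    """
    rng = rng or random.Random()
    # Fixed queries are never drawn twice, so remove them from the random pool.
    remaining = [q for q in query_pool if q not in always_ask]
    n_random = max(0, min(n_per_stop - len(always_ask), len(remaining)))
    # The random part is sampled from ALL remaining requirements, not only
    # from those related to the area of interest.
    return list(always_ask) + rng.sample(remaining, n_random)

pool = ["altitude", "heading", "airspeed", "threat_location", "fuel", "weapon_status"]
picked = select_queries(pool, 4, always_ask=("fuel",), rng=random.Random(0))
```

Queries that the simulation cannot support (for instance, malfunction queries when malfunctions are not simulated) would simply be left out of `query_pool`.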

When to Collect SAGAT Data

It is recommended that the timing of each freeze for SAGAT administration be randomly determined and unpredictable so that subjects cannot prepare for queries in advance. If the freeze occurrence is covertly associated with the occurrence of specific events or at specific times across trials, prior studies have shown that the subjects will be able to figure this out (Endsley, 1988), allowing them to prepare for queries or actually improve SA through the artificiality of the freeze cues. An informal rule has been to ensure that no freezes occur earlier than 3 min into a trial, to allow subjects to build up a picture of the situation, and that no two freezes occur within 1 min of each other.
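The informal timing rule can be sketched in code under some stated assumptions: times are measured in seconds of simulation time (the clock does not advance during a freeze), the number of freezes per trial is fixed in advance, and the name `freeze_times` and its parameters are hypothetical. The sketch draws sorted random offsets and then spaces them out, so the 3-min and 1-min constraints hold by construction.

```python
import random

def freeze_times(trial_len_s, n_freezes, earliest_s=180.0, min_gap_s=60.0, rng=None):
    """Draw random freeze times (seconds of simulation time) for one trial.

    No freeze occurs before `earliest_s`, and consecutive freezes are at
    least `min_gap_s` apart. Hypothetical helper, not part of SAGAT itself.
    """
    rng = rng or random.Random()
    # Time left over once the mandatory gaps have been reserved.
    slack = trial_len_s - earliest_s - (n_freezes - 1) * min_gap_s
    if n_freezes < 1 or slack < 0:
        raise ValueError("trial too short for the requested number of freezes")
    offsets = sorted(rng.uniform(0.0, slack) for _ in range(n_freezes))
    # Shifting the i-th sorted offset by i gaps guarantees the minimum spacing.
    return [earliest_s + off + i * min_gap_s for i, off in enumerate(offsets)]

# Example: three freezes in a 15-min (900-s) trial.
times = freeze_times(900.0, 3, rng=random.Random(1))
```

Because the offsets are drawn independently for every trial, the freezes are not tied to specific events or to fixed clock times across trials.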

The result of this approach is that the activities occurring at the time of the stops will be randomly selected. Some stops may occur during very important activities that are of interest to the experimenter, others when no critical activities are occurring. This gives a good sampling of the subjects' SA in a variety of situations. During analysis the experimenter may want to stratify the data to take these variations into account.

How Much SAGAT Data to Collect

The number of trials necessary will depend on the variability present in the dependent variables being collected and the number of data samples taken during a trial. This will vary with different subjects and designs, but between 30 and 60 samplings per SA query with each design option have previously been adequate in a within-subjects test design.
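A rough planning calculation may help here. It rests on assumptions that are not stated in the text: each stop presents a fixed fraction of the query pool chosen at random, so the expected number of samplings per query in one trial is the number of stops multiplied by that fraction. The function name `trials_needed` and its parameters are hypothetical, and the result is only an expected-value estimate for one design option.

```python
import math

def trials_needed(target_samples, stops_per_trial, fraction_asked):
    """Estimate trials per design option so that each query is expected to
    be sampled `target_samples` times (hypothetical planning aid).

    fraction_asked : share of the query pool presented at each stop (0-1]
    """
    if not 0.0 < fraction_asked <= 1.0:
        raise ValueError("fraction_asked must be in (0, 1]")
    expected_per_trial = stops_per_trial * fraction_asked
    return math.ceil(target_samples / expected_per_trial)

# 30 samplings per query, three stops per trial, half the queries asked per stop.
n_trials = trials_needed(30, 3, 0.5)
```

With three stops per trial and half of the queries asked at each stop, reaching the lower bound of 30 samplings per query would take about 20 trials (summed over subjects) for each design option.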

Multiple SAGAT stops may be taken within each trial. There is no known limit to the number of times the simulator can be frozen during a given trial. In Experiment 2 no ill effects were found from as many as three stops during a 15-min trial. In general, it is recommended that a stop last until a certain amount of time has elapsed, and then the trial is resumed, regardless of how many questions have been answered. Stops as long as 2 min in duration were used with no undue difficulty or effect on subsequent performance. In Experiment 1 stops as long as 5 min were shown to allow subjects access to SA information without memory decay.

Data Collection

The simulator computer should be programmed to collect objective data corresponding to the queries at the time of each freeze. Considering that some queries will pertain to higher-level SA requirements that may be unavailable in the computer, an expert judgment of the correct answer may be made by an experienced observer who is privy to all information, reflecting the SA of a person with perfect knowledge. A comparison of subjects' perceptions of the situation (as input into SAGAT) with the actual status of each variable (as collected per the simulator computer and expert judgment) results in an objective measure of subject SA. Questions asked of the subject but not answered should be considered incorrect. No evaluation should be made of questions not asked during a given stop.
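The bookkeeping rule in this paragraph (asked but unanswered counts as incorrect; not asked is not evaluated) might be sketched as follows. The function `score_stop`, the `is_correct` callback, and the example values are hypothetical; how an answer is compared with the simulator or expert ground truth is left to the caller.

```python
def score_stop(asked, responses, is_correct):
    """Score the queries presented at one stop (hypothetical helper).

    asked      : query identifiers presented at this stop
    responses  : dict mapping query -> subject's answer; a missing key or
                 None means the query was asked but not answered
    is_correct : callable(query, answer) -> bool, comparing an answer with
                 the ground truth from the simulator or an expert observer
    Returns a dict with an entry ONLY for queries that were asked.
    """
    scores = {}
    for query in asked:
        answer = responses.get(query)
        # Asked but unanswered counts as incorrect.
        scores[query] = False if answer is None else bool(is_correct(query, answer))
    return scores

truth = {"altitude": 12000, "heading": 270, "fuel": 40}
result = score_stop(
    asked=["altitude", "heading"],
    responses={"altitude": 12000},  # heading was asked but left blank
    is_correct=lambda q, a: truth[q] == a,
)
```

Here `fuel` was not asked at this stop, so it does not appear in `result` at all and contributes nothing to the tabulation.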

It is recommended that the answer to each query be scored as correct or incorrect based on whether or not it falls into an acceptable tolerance band around the actual value. For example, it may be acceptable for a subject to be 10 miles per hour off of actual ground speed. This method of scoring poses less difficulty than dealing with absolute error (see Marshak et al., 1987). A tabulation of the frequency of correctness can then be made within each test condition for each SA element. Because data scored as correct or incorrect are binomial, the conditions for analysis of variance are violated. A correction factor, $Y' = \arcsin(Y)$, can be applied to binomial data, which allows analysis of variance to be used. In addition, a chi-square, Cochran's Q, or binomial t test (depending on the test design) can be used to evaluate the statistical significance of differences in SA between test conditions.
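A minimal sketch of tolerance-band scoring and the arcsine correction follows. The function names and example numbers are hypothetical. The transform is implemented as the text states it (the arcsine of the proportion correct); the optional `use_sqrt` flag is an added assumption that gives the commonly used variant taking the arcsine of the square root of the proportion.

```python
import math

def within_tolerance(answer, actual, tolerance):
    """True if the answer lies inside the band actual +/- tolerance."""
    return abs(answer - actual) <= tolerance

def proportion_correct(flags):
    """Proportion of correct (True) scores for one SA element and condition."""
    flags = list(flags)
    return sum(flags) / len(flags)

def arcsine_transform(p, use_sqrt=False):
    """Arcsine correction for a proportion p in [0, 1], in radians.

    use_sqrt=False follows the text: Y' = arcsine(Y).
    use_sqrt=True is an assumed variant: Y' = arcsine(sqrt(Y)).
    """
    return math.asin(math.sqrt(p) if use_sqrt else p)

# Ground-speed answers scored against an actual value of 450 with a +/-10 band.
answers = [455, 470, 441, 450]
flags = [within_tolerance(a, 450, 10) for a in answers]
p = proportion_correct(flags)
y_prime = arcsine_transform(p)
```

The transformed values, one per subject and condition, would then be entered into the analysis of variance in place of the raw proportions.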

Limitations and Applicability for Use

SAGAT has primarily been used within the confines of high-fidelity and medium-fidelity part-task simulations. This provides experimenter control over freezes and data collection without any danger to the subject or processes involved in the domain. It may be possible to use the technique during actual task performance if multiple operators are present to ensure safety. For example, it might be possible to verbally query one pilot in flight while another assumes flight control. Such an endeavor should be undertaken with extreme caution, however, and may not be appropriate for certain domains.

A recent effort (Sheehy, Davey, Fiegel, and Guo, 1993) employed an adaptation of this technique by making videotapes of an ongoing situation in a nuclear power plant control room. These tapes were then replayed to naive subjects with freezes for SAGAT queries employed. It is not known how different the SA of subjects passively viewing a situation may be from subjects actually engaged in task performance; however, this approach may yield some useful data.

Most known uses of SAGAT have involved fighter aircraft simulations. In general, it can be employed in any domain in which a reasonable simulation of task performance exists and an analysis of SA requirements has been made in order to develop the queries. SAGAT queries have been developed for advanced bomber aircraft (Endsley, 1990a) and recently for en route air traffic control (Endsley and Rodgers, 1994). Several researchers have begun to use the technique in evaluating operator SA in nuclear control room studies (Hogg, Torralba, and Volden, 1993; Sheehy et al., 1993). The technique has also been employed in a simulated control task to study adaptive automation (Carmody and Gluckman, 1993). Several new efforts are currently under way to employ the technique in medical decision making (E. Swirsky, personal communication, March 1994) and helicopters (C. Prince, personal communication, November 1993). Potentially it could also be used in studies involving automobile driving, supervisory control of manufacturing systems, teleoperations, and operation of other types of dynamic systems.

CONCLUSION

The use of SA metrics in evaluating the efficiency of design concepts, training programs, and other potential interventions is critical. Without careful evaluation, it will be difficult to make much progress in understanding the factors that influence SA, hindering human factors and engineering efforts. Several techniques have been reviewed here, each with several advantages and disadvantages. The use of one technique, SAGAT, was explored extensively, and two studies were presented that investigated the validity of the measure. It has to date proven a useful and viable technique for measuring SA in numerous contexts. Rigorous validity testing is needed for other techniques proposed for SA measurement. Unless the veracity of these techniques is established, interpreting their results will be hazardous.

In addition to establishing validity of measures of SA, more research is needed regarding the task of SA measurement itself. For instance, as pointed out by Pew (1991), no criteria exist at this time that establish the level of SA required for successful performance. Does an operator need to have SA that is 100% perfect (in both completeness and accuracy), or is some lesser amount sufficient for good performance? This is a complex issue. I would assert that SA can be seen as a factor that increases the probability of good performance but that it does not necessarily guarantee it, as other factors also come into effect (e.g., decision making, workload, performance execution, system capabilities, and SA of others in some cases). How much SA one needs, therefore, becomes a matter of how much probability of error one is willing to accept. Perhaps such criteria should more properly and usefully be established as the level of SA needed on each subcomponent at different periods in time. Some guidelines will eventually need to be specified if there is a desire to certify new system designs on the basis of SA.

Overall, the beginnings of the field of SA measurement have been established, allowing researchers in the area of SA to proceed from mainly speculation and anecdotal information to solid empirical findings. The tools for conducting further basic research on the SA construct and developing better system designs are available, allowing needed research on SA in a variety of arenas.

ACKNOWLEDGMENTS

This research was sponsored by the Northrop Corporation. I thank the many people at Northrop, both past and present, who contributed to this effort. In particular, I thank Vic Vizcarra for his assistance in conducting the studies, Paul Cayley and Gerry Armstrong for their programming efforts, and Mike Majors and Del Jacobs for their support in developing an SA research program. In addition, I am indebted to Mark Chignell, who provided helpful support throughout the development of the methodology. I also thank Mark Rodgers, Tom English, Dick Gilson, and several anonymous reviewers for their comments on an earlier version of this paper.


REFERENCES

Bolstad, C. A., and Endsley, M. R. (1990). Single versus dual scale range display investigation (NOR DOC 90-90). Hawthorne, CA: Northrop Corporation.

Carmody, M. A., and Gluckman, J. P. (1993). Task-specific effects of automation and automation failure on performance, workload, and situational awareness. In R. S. Jensen and D. Neumeister (Eds.), Proceedings of the Seventh International Symposium on Aviation Psychology (pp. 167-171). Columbus, OH: Ohio State University, Department of Aviation.

Chase, W. G., and Simon, H. A. (1973). Perceptions in chess. Cognitive Psychology, 4, 55-81.

Cowan, N. (1988). Evolving conceptions of memory storage, selective attention, and their mutual constraints within the human information-processing system. Psychological Bulletin, 104, 163-191.

Endsley, M. R. (1987). SAGAT: A methodology for the measurement of situation awareness (NOR DOC 87-83). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1988). Situation awareness global assessment technique (SAGAT). In Proceedings of the National Aerospace and Electronics Conference (NAECON) (pp. 789-795). New York: IEEE.

Endsley, M. R. (1989a). Final report: Situation awareness in an advanced strategic mission (NOR DOC 89-32). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1989b). Tactical simulation 3 test report: Addendum 1 situation awareness evaluations (81203033R). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1990a). A methodology for the objective measurement of situation awareness. In Situational awareness in aerospace operations (AGARD-CP-478; pp. 1/1-1/9). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Endsley, M. R. (1990b). Predictive utility of an objective measure of situation awareness. In Proceedings of the Human Factors Society 34th Annual Meeting (pp. 41-45). Santa Monica, CA: Human Factors and Ergonomics Society.

Endsley, M. R. (1990c). Situation awareness global assessment technique (SAGAT): Air-to-air tactical version user guide (NOR DOC 89-58, rev. A). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1990d). Situation awareness in dynamic human decision making: Theory and measurement. Unpublished doctoral dissertation, University of Southern California, Los Angeles, CA.

Endsley, M. R., and Rodgers, M. D. (1994). Situation awareness global assessment technique (SAGAT): En route air traffic control version user's guide (Draft). Lubbock: Texas Tech University.

Fracker, M. L. (1990). Attention gradients in situation awareness. In Situational awareness in aerospace operations (AGARD-CP-478; pp. 6/1-6/10). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Herrmann, D. J. (1984). Questionnaires about memory. In J. E. Harris and P. E. Morris (Eds.), Everyday memory, action, and absent-mindedness (pp. 133-151). London: Academic.

Hogg, D. N., Torralba, B., and Volden, F. S. (1993). A situation awareness methodology for the evaluation of process control systems: Studies of feasibility and the implication of use (1993-03-05). Storefjell, Norway: OECD Halden Reactor Project.

Hughes, E. R., Hassoun, J. A., and Ward, F. (1990). Advanced tactical fighter (ATF) simulation support: Final report (CSEF-TR-ATF-90-91). Wright-Patterson Air Force Base, OH: Air Force Systems Command, Aeronautical Systems Division.

Jacoby, L. L., and Dallas, M. (1981). On the relationship between autobiographical memory and perceptual learning. Journal of Experimental Psychology: General, 110, 306-340.

Kellog, R. T. (1980). Is conscious attention necessary for long-term storage? Journal of Experimental Psychology: Human Learning and Memory, 6, 379-390.

Manktelow, K., and Jones, J. (1987). Principles from the psychology of thinking and mental models. In M. M. Gardiner and B. Christie (Eds.), Applying cognitive psychology to user-interface design (pp. 83-117). Chichester, England: Wiley.

Marshak, W. P., Kuperman, G., Ramsey, E. G., and Wilson, D. (1987). Situational awareness in map displays. In Proceedings of the Human Factors Society 31st Annual Meeting (pp. 533-535). Santa Monica, CA: Human Factors and Ergonomics Society.

McDonnell Douglas Aircraft Corporation. (1982). Advanced medium-range air-to-air missile operational utility evaluation (AMRAAM OUE) test final report. St. Louis, MO: Author.

Mogford, R. H., and Tansley, B. W. (1991). The importance of the air traffic controller's mental model. Presented at the Human Factors Society of Canada Annual Meeting, Canada.

Morton, J. (1969). Interaction of information in word recognition. Psychological Review, 76, 165-178.

Nisbett, R. E., and Wilson, T. D. (1977). Telling more than we can know: Verbal reports on mental processes. Psychological Review, 84, 231-259.

Norman, D. A. (1968). Towards a theory of memory and attention. Psychological Review, 75, 522-536.

Northrop Corporation. (1988). Tactical simulation 2 test report: Addendum 1 situation awareness test results. Hawthorne, CA: Author.

Pew, R. W. (1991). Defining and measuring situation awareness in the commercial aircraft cockpit. In Proceedings of the Conference on Challenges in Aviation Human Factors: The National Plan (pp. 30-31). Washington, DC: American Institute of Aeronautics and Astronautics.

Sarter, N. B., and Woods, D. D. (1991). Situation awareness: A critical but ill-defined phenomenon. International Journal of Aviation Psychology, 1, 45-57.

Selcon, S. J., and Taylor, R. M. (1990). Evaluation of the situational awareness rating technique (SART) as a tool for aircrew systems design. In Situational awareness in aerospace operations (AGARD-CP-478; pp. 5/1-5/8). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Sheehy, E. J., Davey, E. C., Fiegel, T. T., and Guo, K. Q. (1993, April). Usability benchmark for CANDU annunciation-lessons learned. Presented at the ANS Topical Meeting on Nuclear Plant Instrumentation, Control, and Man-Machine Interface Technology, Oak Ridge, TN.

Smolensky, M. W. (1993). Toward the physiological measurement of situation awareness: The case for eye movement measurements. In Proceedings of the Human Factors and Ergonomics Society 37th Annual Meeting (p. 41). Santa Monica, CA: Human Factors and Ergonomics Society.

Taylor, R. M. (1990). Situational awareness rating technique (SART): The development of a tool for aircrew systems design. In Situational awareness in aerospace operations (AGARD-CP-478; pp. 3/1-3/17). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Tulving, E. (1985). How many memory systems are there? American Psychologist, 40, 385-398.

Venturino, M., Hamilton, W. L., and Dvorchak, S. R. (1990). Performance-based measures of merit for tactical situation awareness. In Situational awareness in aerospace operations (AGARD-CP-478; pp. 4/1-4/5). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Vidulich, M. A. (1989). The use of judgment matrices in subjective workload assessment: The subjective workload dominance (SWORD) technique. In Proceedings of the Human Factors Society 33rd Annual Meeting (pp. 1406-1410). Santa Monica, CA: Human Factors and Ergonomics Society.

Date received: January 26, 1993
Date accepted: October 4, 1994

Page 17: 1:8BDA:>:?C @; 5=CD8C=@? +E8A:?:BB =? -F?8>=9 5FBC:>B729A71/Literature/SA_M/Endsley_1995.pdf · 1:8BDA:>:?C @; 5=CD8C=@? +E8A:?:BB =? -F?8>=9 5FBC:>B ... 33*$

SITUATION AWARENESS MEASUREMENT

the situation and that no two freezes occurwithin 1 min of each other

The result of this approach is that the activi-ties occurring at the time of the stops will berandomly selected Some stops may occur dur-ing very important activities that are of interestto the experimenter others when no critical ac-tivities are occurring This gives a good sam-pling of the subjects SA in a variety of situa-tions During analysis the experimenter maywant to stratify the data to take these variationsinto account

How Much SAGAT Data to Collect

The number of trials necessary will depend onthe variability present in the dependent vari-ables being collected and the number of datasamples taken during a trial This will vary withdifferent subjects and designs but between 30and 60 samplings per SA query with each designoption have previously been adequate in awithin-subjects test design

Multiple SAGAT stops may be taken withineach trial There is no known limit to the num-ber of times the simulator can be frozen duringa given trial In Experiment 2 no ill effects werefound from as many as three stops during a 15-min trial In general it is recommended that astop last until a certain amount of time haselapsed and then the trial is resumed regard-less of how many questions have been answeredStops as long as 2 min in duration were usedwith no undue difficulty or effect on subsequentperformance In Experiment 1 stops as long as 5min were shown to allow subjects access to SAinformation without memory decay

Data Collection

The simulator computer should be pro-grammed to collect objective data correspond-ing to the queries at the time of each freeze Con-sidering that some queries will pertain tohigher-level SA requirements that may be un-available in the computer an expert judgmentof the correct answer may be made by an expe-rienced observer who is privy to all informationreflecting the SA of a person with perfect knowl-

March 1995-81

edge A comparison of subjects perceptions ofthe situation (as input into SAGAT)with the ac-tual status of each variable (as collected per thesimulator computer and expert judgment) re-sults in an objective measure of subject SAQuestions asked of the subject but not answeredshould be considered incorrect No evaluationshould be made of questions not asked during agiven stop

It is recommended that answers to each querybe scored as correct or incorrect based onwhether or not it falls into an acceptable toler-ance band around the actual value For exampleit may be acceptable for a subject to be 10 milesper hour off of actual ground speed This methodof scoring poses less difficulty than dealing withabsolute error (see Marshak et aI 1987) A tab-ulation of the frequency of correctness can thenbe made within each test condition for each SAelement Because data scored as correct or in-correct are binomial the conditions for analysisof variance are violated A correction factor Y= arcsine(Y) can be applied to binomial datawhich allows analysis of variance to be used Inaddition a chi-square Cochrans Q or binomialt test (depending on the test design) can be usedto evaluate the statistical significance of differ-ences in SA between test conditions

Limitations and Applicability for Use

SAGAT has primarily been used within theconfines of high-fidelity and medium-fidelitypart-task simulations This provides experi-menter control over freezes and data collectionwithout any danger to the subject or processesinvolved in the domain It may be possible to usethe technique during actual task performance ifmultiple operators are present to ensure safetyFor example it might be possible to verballyquery one pilot in flight while another assumesflight control Such an endeavor should be un-dertaken with extreme caution however andmay not be appropriate for certain domains

A recent effort (Sheehy Davey Fiegel andGuo 1993) employed an adaptation of this tech-nique by making videotapes of an ongoingsituation in a nuclear power plant control room

82-March 1995

These tapes were then replayed to naive subjectswith freezes for SAGAT queries employed It isnot known how different the SA of subjects pas-sively viewing a situation may be from subjectsactually engaged in task performance howeverthis approach may yield some useful data

Most known uses of SAGAT have involvedfighter aircraft simulations In general it can beemployed in any domain in which a reasonablesimulation of task performance exists and ananalysis of SA requirements has been made inorder to develop the queries SAGAT querieshave been developed for advanced bomber air-craft (Endsley 1990a) and recently for en routeair traffic control (Endsley and Rodgers 1994)Several researchers have begun to use the tech-nique in evaluating operator SA in nuclear con-trol room studies (Hogg Torralba and Volden1993 Sheehy et aI 1993) The technique hasalso been employed in a simulated control taskto study adaptive automation (Carmody andGluckman 1993) Several new efforts are cur-rently under way to employ the technique inmedical decision making (E Swirsky personalcommunication March 1994) and helicopters(C Prince personal communication November1993) Potentially it could also be used in studiesinvolving automobile driving supervisory con-trol of manufacturing systems teleoperationsand operation of other types of dynamicsystems

CONCLUSION

The use of SA metrics in evaluating the effi-ciency of design concepts training programsand other potential interventions is criticalWithout careful evaluation it will be difficult tomake much progress in understanding the fac-tors that influence SA hindering human factorsand engineering efforts Several techniques havebeen reviewed here each with several advan-tages and disadvantages The use of one tech-nique SAGAT was explored extensively andtwo studies were presented that investigated thevalidity of the measure It has to date proven auseful and viable technique for measuring SA innumerous contexts Rigorous validity testing is

HUMAN FACTORS

needed for other techniques proposed for SAmeasurement Unless the veracity of these tech-niques is established interpreting their resultswill be hazardous

In addition to establishing validity of mea-sures of SA more research is needed regardingthe task of SA measurement itself For instanceas pointed out by Pew (1991) no criteria exist atthis time that establish the level of SA requiredfor successful performance Does an operatorneed to have SA that is 100 perfect (in bothcompleteness and accuracy) or is some lesseramount sufficient for good performance This isa complex issue I would assert that SA can beseen as a factor that increases the probability ofgood performance but that it does not necessar-ily guarantee it as other factors also come intoeffect (eg decision making workload perfor-mance execution system capabilities and SA ofothers in some cases) How much SA one needstherefore becomes a matter of how much prob-ability of error one is willing to accept Perhapssuch criteria should more properly and use-fully be established as the level of SA needed oneach subcomponent at different periods in timeSome guidelines will eventually need to be spec-ified if there is a desire to certify new systemdesigns on the basis of SA

Overall the beginnings of the field of SA mea-surement have been established allowing re-searchers in the area of SA to proceed frommainly speculation and anecdotal informationto solid empirical findings The tools for con-ducting further basic research on the SA con-struct and developing better system designs areavailable allowing needed research on SA in avariety of arenas

ACKNOWLEDGMENTS

This research was sponsored by the Northrop Corporation Ithank the many people at Northrop both past and presentwho contributed to this effort In particular I thank Vic Viz-carra for his assistance in conducting the studies Paul Cayleyand Gerry Armstrong for their programming efforts and MikeMajors and Del Jacobs for their support in developing an SAresearch program In addition I am indebted to MarkChignell who provided helpful support throughout the devel-opment of the methodology I also thank Mark Rodgers TornEnglish Dick Gilson and several anonymous reviewers fortheir comments on an earlier version of this paper

SITUATION AWARENESS MEASUREMENT

REFERENCES

Bolstad, C. A., and Endsley, M. R. (1990). Single versus dual scale range display investigation (NOR DOC 90-90). Hawthorne, CA: Northrop Corporation.

Carmody, M. A., and Gluckman, J. P. (1993). Task-specific effects of automation and automation failure on performance, workload, and situational awareness. In R. S. Jensen and D. Neumeister (Eds.), Proceedings of the Seventh International Symposium on Aviation Psychology (pp. 167-171). Columbus, OH: Ohio State University, Department of Aviation.

Chase, W. G., and Simon, H. A. (1973). Perception in chess. Cognitive Psychology, 4, 55-81.

Cowan, N. (1988). Evolving conceptions of memory storage, selective attention, and their mutual constraints within the human information-processing system. Psychological Bulletin, 104, 163-191.

Endsley, M. R. (1987). SAGAT: A methodology for the measurement of situation awareness (NOR DOC 87-83). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1988). Situation awareness global assessment technique (SAGAT). In Proceedings of the National Aerospace and Electronics Conference (NAECON) (pp. 789-795). New York: IEEE.

Endsley, M. R. (1989a). Final report: Situation awareness in an advanced strategic mission (NOR DOC 89-32). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1989b). Tactical simulation 3 test report: Addendum 1 situation awareness evaluations (81203033R). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1990a). A methodology for the objective measurement of situation awareness. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 1/1-1/9). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Endsley, M. R. (1990b). Predictive utility of an objective measure of situation awareness. In Proceedings of the Human Factors Society 34th Annual Meeting (pp. 41-45). Santa Monica, CA: Human Factors and Ergonomics Society.

Endsley, M. R. (1990c). Situation awareness global assessment technique (SAGAT): Air-to-air tactical version user guide (NOR DOC 89-58, rev. A). Hawthorne, CA: Northrop Corp.

Endsley, M. R. (1990d). Situation awareness in dynamic human decision making: Theory and measurement. Unpublished doctoral dissertation, University of Southern California, Los Angeles, CA.

Endsley, M. R., and Rodgers, M. D. (1994). Situation awareness global assessment technique (SAGAT): En route air traffic control version user's guide (Draft). Lubbock: Texas Tech University.

Fracker, M. L. (1990). Attention gradients in situation awareness. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 6/1-6/10). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Herrmann, D. J. (1984). Questionnaires about memory. In J. E. Harris and P. E. Morris (Eds.), Everyday memory, action, and absent-mindedness (pp. 133-151). London: Academic.

Hogg, D. N., Torralba, B., and Volden, F. S. (1993). A situation awareness methodology for the evaluation of process control systems: Studies of feasibility and the implication of use (1993-03-05). Storefjell, Norway: OECD Halden Reactor Project.

Hughes, E. R., Hassoun, J. A., and Ward, F. (1990). Advanced tactical fighter (ATF) simulation support: Final report (CSEF-TR-ATF-90-91). Wright-Patterson Air Force Base, OH: Air Force Systems Command, Aeronautical Systems Division.

Jacoby, L. L., and Dallas, M. (1981). On the relationship between autobiographical memory and perceptual learning. Journal of Experimental Psychology: General, 110, 306-340.

Kellog, R. T. (1980). Is conscious attention necessary for long-term storage? Journal of Experimental Psychology: Human Learning and Memory, 6, 379-390.

Manktelow, K., and Jones, J. (1987). Principles from the psychology of thinking and mental models. In M. M. Gardiner and B. Christie (Eds.), Applying cognitive psychology to user-interface design (pp. 83-117). Chichester, England: Wiley.

Marshak, W. P., Kuperman, G., Ramsey, E. G., and Wilson, D. (1987). Situational awareness in map displays. In Proceedings of the Human Factors Society 31st Annual Meeting (pp. 533-535). Santa Monica, CA: Human Factors and Ergonomics Society.

McDonnell Douglas Aircraft Corporation. (1982). Advanced medium-range air-to-air missile operational utility evaluation (AMRAAM OUE) test final report. St. Louis, MO: Author.

Mogford, R. H., and Tansley, B. W. (1991). The importance of the air traffic controller's mental model. Presented at the Human Factors Society of Canada Annual Meeting, Canada.

Morton, J. (1969). Interaction of information in word recognition. Psychological Review, 76, 165-178.

Nisbett, R. E., and Wilson, T. D. (1977). Telling more than we can know: Verbal reports on mental processes. Psychological Review, 84, 231-259.

Norman, D. A. (1968). Towards a theory of memory and attention. Psychological Review, 75, 522-536.

Northrop Corporation. (1988). Tactical simulation 2 test report: Addendum 1 situation awareness test results. Hawthorne, CA: Author.

Pew, R. W. (1991). Defining and measuring situation awareness in the commercial aircraft cockpit. In Proceedings of the Conference on Challenges in Aviation Human Factors: The National Plan (pp. 30-31). Washington, DC: American Institute of Aeronautics and Astronautics.

Sarter, N. B., and Woods, D. D. (1991). Situation awareness: A critical but ill-defined phenomenon. International Journal of Aviation Psychology, 1, 45-57.

Selcon, S. J., and Taylor, R. M. (1990). Evaluation of the situational awareness rating technique (SART) as a tool for aircrew systems design. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 5/1-5/8). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Sheehy, E. J., Davey, E. C., Fiegel, T. T., and Guo, K. Q. (1993, April). Usability benchmark for CANDU annunciation-lessons learned. Presented at the ANS Topical Meeting on Nuclear Plant Instrumentation, Control, and Man-Machine Interface Technology, Oak Ridge, TN.

Smolensky, M. W. (1993). Toward the physiological measurement of situation awareness: The case for eye movement measurements. In Proceedings of the Human Factors and Ergonomics Society 37th Annual Meeting (p. 41). Santa Monica, CA: Human Factors and Ergonomics Society.

Taylor, R. M. (1990). Situational awareness rating technique (SART): The development of a tool for aircrew systems design. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 3/1-3/17). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Tulving, E. (1985). How many memory systems are there? American Psychologist, 40, 385-398.

Venturino, M., Hamilton, W. L., and Dvorchak, S. R. (1990). Performance-based measures of merit for tactical situation awareness. In Situational awareness in aerospace operations (AGARD-CP-478, pp. 4/1-4/5). Neuilly-Sur-Seine, France: NATO-Advisory Group for Aerospace Research and Development.

Vidulich, M. A. (1989). The use of judgment matrices in subjective workload assessment: The subjective workload dominance (SWORD) technique. In Proceedings of the Human Factors Society 33rd Annual Meeting (pp. 1406-1410). Santa Monica, CA: Human Factors and Ergonomics Society.

Date received: January 26, 1993
Date accepted: October 4, 1994

These tapes were then replayed to naive subjects, with freezes for SAGAT queries employed. It is not known how different the SA of subjects passively viewing a situation may be from subjects actually engaged in task performance; however, this approach may yield some useful data.

Most known uses of SAGAT have involved fighter aircraft simulations. In general, it can be employed in any domain in which a reasonable simulation of task performance exists and an analysis of SA requirements has been made in order to develop the queries. SAGAT queries have been developed for advanced bomber aircraft (Endsley, 1990a) and recently for en route air traffic control (Endsley and Rodgers, 1994). Several researchers have begun to use the technique in evaluating operator SA in nuclear control room studies (Hogg, Torralba, and Volden, 1993; Sheehy et al., 1993). The technique has also been employed in a simulated control task to study adaptive automation (Carmody and Gluckman, 1993). Several new efforts are currently under way to employ the technique in medical decision making (E. Swirsky, personal communication, March 1994) and helicopters (C. Prince, personal communication, November 1993). Potentially, it could also be used in studies involving automobile driving, supervisory control of manufacturing systems, teleoperations, and operation of other types of dynamic systems.

CONCLUSION

The use of SA metrics in evaluating the efficiency of design concepts, training programs, and other potential interventions is critical. Without careful evaluation, it will be difficult to make much progress in understanding the factors that influence SA, hindering human factors and engineering efforts. Several techniques have been reviewed here, each with several advantages and disadvantages. The use of one technique, SAGAT, was explored extensively, and two studies were presented that investigated the validity of the measure. It has, to date, proven a useful and viable technique for measuring SA in numerous contexts. Rigorous validity testing is needed for other techniques proposed for SA measurement. Unless the veracity of these techniques is established, interpreting their results will be hazardous.
