17
THE INTERNATIONAL JOURNAL OF AVIATION PSYCHOLOGY, 23(2), 113–129 Copyright © 2013 Taylor & Francis Group, LLC ISSN: 1050-8414 print / 1532-7108 online DOI: 10.1080/10508414.2011.582455 Airport Security Screener Competency: A Cross-Sectional and Longitudinal Analysis Tobias Halbherr, 1 Adrian Schwaninger, 1,2 Glen R. Budgell, 3 and Alan Wales 1 1 Department of Informatics, University of Zurich, Switzerland 2 School of Applied Psychology, University of Applied Sciences and Arts Northwestern Switzerland, Switzerland 3 Canadian Air Transport Security Authority, Ottawa, Ontario, Canada The performance of 5,717 aviation security screeners in detecting prohibited items in x-ray images of passenger bags was analyzed over a period of 4 years. The measure used in this study was detection performance on the X-Ray Competency Assessment Test (X-Ray CAT) and performance on this measure was obtained on an annual basis. Between tests, screeners performed varying amounts of training in the task of prohibited item detection using an adaptive computer-based training system called X-Ray Tutor (XRT). For both XRT and X-Ray CAT, prohibited items are catego- rized into guns, knives, improvised explosive devices (IEDs), and other prohibited items. Cross-sectional and longitudinal analyses of the training and test data were performed. Both types of analysis revealed a strong improvement of X-Ray CAT performance as a result of XRT training for all 4 item categories. The results of the study indicate that screener competency in detecting prohibited items in x-ray images of passenger bags can be greatly improved through routine XRT training. Today, in aviation security worldwide, x-ray screening of passenger bags is the key safeguard in identifying prohibited items in passengers’ baggage. Despite great improvements in technical equipment, threat item detection still depends on human operators who interpret the x-ray images. Most important, the final Correspondence should be sent to Adrian Schwaninger, School of Applied Psychology, University of Applied Sciences and Arts Northwestern Switzerland, Institute Human in Complex Systems, Riggenbachstr. 16, 4600 Olten, Switzerland. E-mail: [email protected] Downloaded by [Fachhochschule Nordwestschweiz], [Adrian Schwaninger] at 08:49 08 April 2013

Airport Security Screener Competency: A Cross … (X-Ray CAT) and performance on ... of human perception and the cognitive system (Palmer, 1999). Its importance is ... We expect to

  • Upload
    letram

  • View
    216

  • Download
    4

Embed Size (px)

Citation preview

Page 1: Airport Security Screener Competency: A Cross … (X-Ray CAT) and performance on ... of human perception and the cognitive system (Palmer, 1999). Its importance is ... We expect to

THE INTERNATIONAL JOURNAL OF AVIATION PSYCHOLOGY, 23(2), 113–129Copyright © 2013 Taylor & Francis Group, LLCISSN: 1050-8414 print / 1532-7108 onlineDOI: 10.1080/10508414.2011.582455

Airport Security Screener Competency:A Cross-Sectional andLongitudinal Analysis

Tobias Halbherr,1 Adrian Schwaninger,1,2 Glen R. Budgell,3

and Alan Wales1

1Department of Informatics, University of Zurich, Switzerland2School of Applied Psychology, University of Applied Sciences and Arts

Northwestern Switzerland, Switzerland3Canadian Air Transport Security Authority, Ottawa, Ontario, Canada

The performance of 5,717 aviation security screeners in detecting prohibited items inx-ray images of passenger bags was analyzed over a period of 4 years. The measureused in this study was detection performance on the X-Ray Competency AssessmentTest (X-Ray CAT) and performance on this measure was obtained on an annualbasis. Between tests, screeners performed varying amounts of training in the task ofprohibited item detection using an adaptive computer-based training system calledX-Ray Tutor (XRT). For both XRT and X-Ray CAT, prohibited items are catego-rized into guns, knives, improvised explosive devices (IEDs), and other prohibiteditems. Cross-sectional and longitudinal analyses of the training and test data wereperformed. Both types of analysis revealed a strong improvement of X-Ray CATperformance as a result of XRT training for all 4 item categories. The results ofthe study indicate that screener competency in detecting prohibited items in x-rayimages of passenger bags can be greatly improved through routine XRT training.

Today, in aviation security worldwide, x-ray screening of passenger bags is thekey safeguard in identifying prohibited items in passengers’ baggage. Despitegreat improvements in technical equipment, threat item detection still dependson human operators who interpret the x-ray images. Most important, the final

Correspondence should be sent to Adrian Schwaninger, School of Applied Psychology, Universityof Applied Sciences and Arts Northwestern Switzerland, Institute Human in Complex Systems,Riggenbachstr. 16, 4600 Olten, Switzerland. E-mail: [email protected]

Dow

nloa

ded

by [

Fach

hoch

schu

le N

ordw

ests

chw

eiz]

, [A

dria

n Sc

hwan

inge

r] a

t 08:

49 0

8 A

pril

2013

Page 2: Airport Security Screener Competency: A Cross … (X-Ray CAT) and performance on ... of human perception and the cognitive system (Palmer, 1999). Its importance is ... We expect to

114 HALBHERR ET AL.

decision on whether a passenger bag can enter an airplane is always made bya human operator (an airport security officer or screener). Therefore, abilitiesand aptitudes as well as initial and recurrent training of screeners are of utmostimportance in aviation security (Schwaninger, 2005; Schwaninger, Hardmeier, &Hofer, 2005; Schwaninger, Hofer, & Wetter 2007; Koller, Hardmeier, Michel, &Schwaninger, 2008; Schwaninger, Bolfing, Halbherr, Helman, Belyavin, & Hay,2008; Schwaninger, 2009).

OBJECT RECOGNITION AND THE LUGGAGE SCREENING TASK

Screeners who are operating an x-ray machine are essentially performing anobject recognition and visual search task. Object recognition is a key functionof human perception and the cognitive system (Palmer, 1999). Its importance isperhaps best illustrated by its early emergence in the evolution of our brain. Forhumans, every day object recognition—although actually a tremendously com-plex task—is effortless and almost 100% reliable (Kosslyn, 1994). Recognizinga chair as a chair happens virtually instantaneously, with no perceived effort onour part. Hardly ever do we mistakenly take a chair for something else—a table ora banana, for example. We are able to recognize a chair correctly from differentviewpoints, even though the retinal image might completely change depending onthe viewpoint (Tarr & Bülthoff, 1998). Furthermore, we are capable of correctlyidentifying an object as a chair, even if we have never before seen a chair like theone in question, let alone that specific chair: We are able to generalize. Confrontedwith objects that are novel to us, not yet fitting any of our categories, we arecapable of building new categories and quickly learn to correctly identify objectsbelonging to the new category: Human object recognition is adaptable, we learn.If objects are partially concealed or distorted, most of the time we are still capableof correctly identifying them: Human object recognition is noise tolerant. In thelast decade, progress in cognitive neuroscience, artificial intelligence, computervision, and automated object recognition has been tremendous and promising(e.g., Gauthier, Tarr, & Bub, 2009; Curio, Bülthoff, & Giese, 2010; Pfeiffer &Bongard, 2007; Osaka, Rentschler, & Biederman, 2007). However, recognizingobjects in x-ray images remains a task that cannot be solved by computers, sohuman screeners remain an essential component in airport security screening.

As previously discussed, people perform object recognition tasks with greatease, speed, and accuracy in everyday life. However, research has shown that indi-viduals perform poorly in the task of recognizing prohibited items in x-ray imagesof passenger bags if they are unfamiliar with this task (e.g., Fiore, Scielzo, Jentsch,& Howard, 2006; Koller et al., 2008). Why does the ease of everyday object recog-nition not carry over to the task of x-ray luggage screening? We would like tohighlight two important reasons for the difficulty of the object recognition task

Dow

nloa

ded

by [

Fach

hoch

schu

le N

ordw

ests

chw

eiz]

, [A

dria

n Sc

hwan

inge

r] a

t 08:

49 0

8 A

pril

2013

Page 3: Airport Security Screener Competency: A Cross … (X-Ray CAT) and performance on ... of human perception and the cognitive system (Palmer, 1999). Its importance is ... We expect to

AIRPORT SECURITY SCREENER COMPETENCY 115

in x-ray luggage screening: unfamiliarity with the task and so-called image-basedfactors.

X-RAY IMAGE INTERPRETATION TRAINING: OVERCOMINGUNFAMILIARITY

X-ray images represent objects in a fashion that is very different from how weperceive them in everyday life. In x-ray images, object opacity depends on mate-rial density rather than translucence. As a consequence interior components ofobjects that are naturally opaque become visible. Coloring of objects is a functionof object density and thus completely unrelated to the natural colors of objects.Because many objects are transparent in x-ray images and because passenger bagscan be quite crammed, there is a lot of clutter and superposition to deal with andthere are few depth queues. Furthermore, due to transparency, two superimposedobjects will coalesce, rather than one concealing the other. As a result of all theseproperties, objects in x-ray images can look very different to what we are accus-tomed to. To make matters worse, we might be completely unfamiliar with threatitems because we do not encounter them in our day-to-day lives, and other threatobjects might look very similar to harmless objects (Schwaninger, 2005).

X-ray images are in many ways substantially different from how we naturallyperceive objects. It should thus come as no surprise that the x-ray luggage screen-ing task is very difficult for people who are unfamiliar with it. It has been shown,however, that specific object recognition skills can be improved through training(e.g., Gauthier, Williams, Tarr, & Tanaka, 1998). Thus, we should also be able tosignificantly improve luggage screening skills through adequate training.

Screeners participating in this study trained with the X-Ray Tutor (XRT)computer-based training system (Schwaninger, 2004a). Several studies havefound large training effects for XRT on detection performance in both an on-the-job setting with high ecological validity (Schwaninger, Hofer, & Wetter, 2007)as well as in standardized computer-based tests with high construct validity andexperimental control (Michel, de Ruiter, Hogervorst, Koller, & Schwaninger,2007; Bolfing, Halbherr, & Schwaninger, 2008; Koller et al., 2008; Schwaningeret al., 2008). Comparison of on-the-job performance measurement and perfor-mance in computer-based tests has shown that the results are consistent witheach other (Schwaninger et al., 2007). Koller et al. (2007, 2008) and Madhavanand Gonzalez (2009) demonstrated that computer-based training leads to transfereffects on novel stimuli. Brown and Madhavan (2008) demonstrated that transfereffects are larger with high threat-present rates and heterogeneous stimuli. XRTuses a high threat-present to no threat-present ratio of 1:1. XRT uses a large imagelibrary of bags and threat items to ensure good transfer effects. Finally, XRT putsa large emphasis on image-based factors (which we discuss later), varying imagefactor difficulties systematically to allow screeners to adapt to their challenge.

Dow

nloa

ded

by [

Fach

hoch

schu

le N

ordw

ests

chw

eiz]

, [A

dria

n Sc

hwan

inge

r] a

t 08:

49 0

8 A

pril

2013

Page 4: Airport Security Screener Competency: A Cross … (X-Ray CAT) and performance on ... of human perception and the cognitive system (Palmer, 1999). Its importance is ... We expect to

116 HALBHERR ET AL.

Bolfing et al. (2008) were able to explain between 59% (knives) and 73% (impro-vised explosive devices [IEDs]) of variance in individual differences in detectionperformance with the amount of training performed.

IMAGE-BASED FACTORS MODERATE DETECTION PERFORMANCE

Aside from unfamiliarity with the task, there is another important reason forthe difficulty of the x-ray screening task: so-called image-based factors. Image-based factors are specific properties of the x-ray image that influence detectionperformance. Three important image-based factors for the x-ray luggage screen-ing task have been identified: view difficulty, superposition, and bag complexity(Schwaninger et al., 2005). Bag complexity is divided into opacity and clut-ter (Schwaninger et al., 2008). View difficulty makes threat detection difficultbecause objects might seem unfamiliar in unusual views or they might resem-ble a harmless object. Superposition makes detection difficult because parts ofan object are concealed by or coalesce with other objects. Finally, opacity andclutter make the task difficult, because they add noise to an image. Bolfing et al.(2008) were able to explain almost 70% of variance in item difficulty with thementioned image-based factors, demonstrating their great influence on detectionperformance.

OTHER PERFORMANCE-RELEVANT FACTORS

Besides training and image-based factors, several other performance-relevant fac-tors for the luggage screening task have been identified and discussed. Whatpart does visual search play in the luggage screening task and how does itinfluence performance? Visual search tasks depend on target saliency and aretime-sensitive (Beck, 1982; Treisman, 1986). Research has shown that trainingnot only improves detection performance, but also reaction times. Average reac-tion times reported ranged from less than 4 sec for trained screeners judgingbags with a threat item present, to over 6 sec for screeners judging harmlessbags. Improvement through training suffers only slightly when x-ray images arepresented for 4 sec rather than 8 sec in training trials (Schwaninger & Hofer,2004; Schwaninger et al., 2007). In other words, on-the-job time constraints forscreening bags should not impede detection performance, as trained screenersare capable of performing the task reliably in a short time. McCarley, Kramer,Wickens, Vidoni, and Boot (2004) found detection performance improvementsthrough training to be based on improvements in the object recognition task ratherthan the visual search task. Research into effects of baggage size on detectionperformance showed no substantial effects (Schwaninger et al., 2008). Finally,reaction time improvements through training have been shown to be largely due

Dow

nloa

ded

by [

Fach

hoch

schu

le N

ordw

ests

chw

eiz]

, [A

dria

n Sc

hwan

inge

r] a

t 08:

49 0

8 A

pril

2013

Page 5: Airport Security Screener Competency: A Cross … (X-Ray CAT) and performance on ... of human perception and the cognitive system (Palmer, 1999). Its importance is ... We expect to

AIRPORT SECURITY SCREENER COMPETENCY 117

to improvements in nonsearch time (object recognition and decision time), ratherthan search time (Koller et al., 2009).

Age and gender have also been found to be performance-relevant factors.Riegelnig and Schwaninger (2006) found women, on average, to be more riskaverse in their responses (i.e., more answers “threat item present”) than menwhile detection performance (sensitivity) was similar between men and women.Reaction time increases and detection performance decreases with age (Kolleret al., 2009; Schwaninger, Riegelnig, Hardmeier, & Martin, 2010). However,analysis of training data from luggage screeners has shown that the latter handi-cap is (over)compensated through (voluntary) increases of training time with age(Bolfing et al., 2008).

MOTIVATION

There exists to date a considerable amount of research on computer-based train-ing and the luggage screening task, providing ample evidence for the impact oftraining on detection performance (see previous section). However, published dataare often cross-sectional or from a single airport only, longitudinal studies presentdata that usually span no more than 1 year, data might be collected from indi-viduals who do not work as x-ray security screeners, and training might onlytake place as part of the experiment, rather than being part of the screeners’on-the-job duties. In this study we present on-the-job training data and luggagescreening competency assessment data, spanning over more than 4 years, of morethan 5,000 luggage screeners, from more than 70 airports, in a country that putin place national policies requiring luggage screeners to perform computer-basedtraining regularly. We analyze these data to determine how the introduction ofthese policies translated into global detection performance improvements acrossthe country. To our knowledge no comparable data set has ever been publishedbefore.

Screener x-ray image interpretation competency in the luggage screening taskis measured with the computer-based test X-Ray Competency Assessment Test(X-Ray CAT, Koller & Schwaninger, 2006), so as to achieve high construct valid-ity. Loose standards are applied to screener data for inclusion in the study, so asto ensure high representativeness of results. We expect to confirm existing find-ings on effects of computer-based training on the luggage screening task. Mostnotably:

1. The introduction of training policies will lead to global increases in x-rayimage interpretation competency.

2. X-ray image interpretation competency will improve as a function ofcomputer-based training.

Dow

nloa

ded

by [

Fach

hoch

schu

le N

ordw

ests

chw

eiz]

, [A

dria

n Sc

hwan

inge

r] a

t 08:

49 0

8 A

pril

2013

Page 6: Airport Security Screener Competency: A Cross … (X-Ray CAT) and performance on ... of human perception and the cognitive system (Palmer, 1999). Its importance is ... We expect to

118 HALBHERR ET AL.

3. Effect sizes of training on x-ray image interpretation competency will belarge.

4. Characteristic data patterns will be mirrored (see, e.g., Bolfing et al., 2008;Koller et al., 2008), particularly:● Training effects will be largest for IEDs. IEDs will be the most difficult

threat items prior to training and the most reliably detected threat itemsafter a sufficient amount of training.

● Training effects will be smallest for knives.

We have chosen the additional specific predictions because these patterns haveproven to be very robust and can be found in numerous publications. The pre-sumable reason for the comparatively largest training effects for IEDs is twofold.First, IEDs are particularly difficult to detect for novices because they are themost unfamiliar threat item category:1 They are (almost) never seen at securitycheckpoints or in everyday life. Second, they are comparatively easy to detectfor trained screeners, because they are largely viewpoint independent. The smalltraining effect for knives is explainable by the fact that knives can be difficult todetect when rotated.

METHOD

This is a study relying on applied data. We examined all training and compe-tency assessment data between 2005 and 2008 of luggage screeners conductingcabin baggage (or pre-board) screening from more than 70 airports in a singlecountry.

Participants

Data were obtained from a total of 6,865 screeners from more than 70 airportslocated in a country that has in place national regulations and policies govern-ing the aviation security screening process. By these regulations, screeners arerequired to do a minimum of 20 min of computer-based training per week toimprove their performance on the luggage screening task. On taking up theirjob and before beginning with training, an initial X-Ray CAT measurement ofscreener detection performance is taken (baseline test). X-Ray CAT measurementsare then repeated on an annual basis. In this study we present an analysis of thesevaluable data.

1Threat Image Projection (TIP), a technology with which virtual threat items are occasionallyprojected into x-ray images of passenger bags at the security checkpoints, will lessen this unfamiliaritysomewhat. However, at the launch of this study, TIP systems were not yet operational in the countryin question.

Dow

nloa

ded

by [

Fach

hoch

schu

le N

ordw

ests

chw

eiz]

, [A

dria

n Sc

hwan

inge

r] a

t 08:

49 0

8 A

pril

2013

Page 7: Airport Security Screener Competency: A Cross … (X-Ray CAT) and performance on ... of human perception and the cognitive system (Palmer, 1999). Its importance is ... We expect to

AIRPORT SECURITY SCREENER COMPETENCY 119

Our data set includes 2,174 screeners that had complete data for a comparativeanalysis between X-Ray CAT performance in the 2008 testing and a prior baselinetest. For the screeners not included, either baseline test data or 2008 test data werenot available, or the 2008 test was their baseline test, because they entered the jobin 2008 or because XRT training at their airport was installed in 2008. Theseresults are discussed as global results.

This group of 2,174 screeners was further broken up into subgroups, accordingto the total number of training hours they had performed (i.e., 10–20 hr of train-ing, 20–30, 30–40, etc.), for a cross-sectional analysis to determine whether thenumber of training hours on XRT would predict X-Ray CAT performance in 2008.

XRT has been operational in the country for several years. Based on these data,a within-subjects longitudinal analysis was performed on 5,717 screeners. Ourdata span from 2005 to 2008. Within these years, 5,717 screeners had performeda baseline test, 3,286 a second test after 1 year, 1,167 a third after 2 years, and 136a fourth after 3 years. Decreasing sample sizes are due to job turnover and the factthat training infrastructure was not installed at all airports simultaneously.

Finally, a longitudinal analysis of global detection performance per calendaryear is performed.

Eligibility Criteria

For both the comparative and cross-sectional analyses screeners must have metthe following eligibility criteria to be included:

● Completed at least one X-Ray CAT and the baseline X-Ray CAT test wasadministered prior to 2008 (N = 2,174).

● Completed no more than 100 hr of training to be included in the cross-sectional analysis (N = 2,113). There were too few screeners with morethan 100 hr of training for reliable results.

The X-Ray Competency Assessment Test

The X-Ray CAT was utilized to measure x-ray image interpretation competency inx-ray screening (Koller & Schwaninger, 2006). It consists of a total of 256 items,of which 128 items are images of bags that contain a prohibited item (signal-noise[SN] trials), and 128 images of bags that do not contain a prohibited item (noise[N] trials). The 1:1 ratio of SN to N trials makes possible a precise measurementof detection performance with a minimal amount of trials. Threat items belongto one of four threat item categories (guns, knives, IEDs, and other) with eightitems of each category appearing in the test. Each item is presented in an easy anddifficult view, and each item is embedded into its bag with a low level and with a

Dow

nloa

ded

by [

Fach

hoch

schu

le N

ordw

ests

chw

eiz]

, [A

dria

n Sc

hwan

inge

r] a

t 08:

49 0

8 A

pril

2013

Page 8: Airport Security Screener Competency: A Cross … (X-Ray CAT) and performance on ... of human perception and the cognitive system (Palmer, 1999). Its importance is ... We expect to

120 HALBHERR ET AL.

high level of superposition. Screeners must choose between OK and NOT OK toindicate whether the x-ray image does or does not contain a threat item.

By default, the first measurement of screener x-ray threat image interpretationcompetency with X-Ray CAT takes place before XRT training commences.We refer to this initial test of x-ray image interpretation competency as thebaseline test. Recurrent X-Ray CAT testing is conducted as a measure oftraining ability progression and occurs approximately annually—depending onairport administration. X-Ray CAT performance should be viewed as beingrepresentative of the x-ray image interpretation competency of real-life screeners,which is a prerequisite for good performance at the checkpoint (among otherfactors such as, e.g., motivation and attention). For further information on X-RayCAT, please refer to Koller and Schwaninger (2006).

Dependent and Independent Variables

Detection performance is usually calculated based on hits (correctly identifyinga threat item) and false alarms (incorrectly stating a threat item is present). Hitrate alone does not provide a useful measurement for detection performance: In acomputer-based test a screener can achieve a high hit rate by simply responding tomost bags with NOT OK. Therefore, to achieve a meaningful estimate of detectionperformance, the false alarm rate must also be taken into account. The tendency tofavor OK or NOT OK responses is called bias or criterion. As mentioned earlier,a useful measure for detection performance must take bias into account. There arevarious performance measures in use that do this, in this study the measure A’ hasbeen used (Pollack & Norman, 1964).

A’ = 1 represents a perfect detection performance, and A’ = .5 representschance performance where hit and false alarm rates are equal. A’ is a boundedvariable, ranging from chance performance at A’ = .5 to a perfect performancewith A’ = 1; thus A’ has the advantage of relative ease of interpretation. Withregard to competing detection statistics, Hofer and Schwaninger (2004), in anempirical study, showed that A’ scores are nevertheless highly intercorrelated withother measures of detection performance such as, for example, d’.

Due to security and confidentiality reasons, absolute A’ values cannot bereported. To make possible the comparison of data across graphs, all graphs dis-play the same range of A’ values. To provide meaningful results, we presentrelative comparisons between conditions and effect sizes.

RESULTS

Global Results

X-Ray CAT A’ scores for baseline and 2008 test measurements and total num-ber of individual training hours were compared. Table 1 gives an overview of the

Dow

nloa

ded

by [

Fach

hoch

schu

le N

ordw

ests

chw

eiz]

, [A

dria

n Sc

hwan

inge

r] a

t 08:

49 0

8 A

pril

2013

Page 9: Airport Security Screener Competency: A Cross … (X-Ray CAT) and performance on ... of human perception and the cognitive system (Palmer, 1999). Its importance is ... We expect to

AIRPORT SECURITY SCREENER COMPETENCY 121

TABLE 1Overview of Measures for Performance Improvement

Itemt-Test

Significance Cohen’s dCorrelation

With Training

CorrelationWith

Baseline

All p < .001 1.74 .45∗∗∗ .27∗∗Guns p < .001 1.06 .39∗∗∗ .22∗∗Knives p < .001 1.03 .38∗∗∗ .22∗∗Improvised explosive

devicesp < .001 1.64 .41∗∗∗ .19∗∗

Other p < .001 1.46 .42∗∗∗ .16∗∗

Note. P-values of t-test comparisons between the baseline and 2008 X-Ray CAT A’ scores are inthe first column, the corresponding effect sizes (Cohens’s d) are in the second column. Correlationsbetween 2008 X-Ray CAT A’ scores and logarithmically transformed total training hours since base-line measurement are in the third column. Correlations between baseline and 2008 X-Ray CAT A’scores are in the fourth column. ∗∗∗correlation significant with p < .001, ∗∗correlation significant withp < .01.

FIGURE 1 Distribution plot of baseline and 2008 X-Ray CAT detection performances.Means and standard deviation bars are shown for overall group results.

most important statistics relating detection performance with training. Figure 1compares the scores of the baseline and 2008 X-Ray CAT measurements andgives an overall impression of the distributional shift that occurs with training.

Dow

nloa

ded

by [

Fach

hoch

schu

le N

ordw

ests

chw

eiz]

, [A

dria

n Sc

hwan

inge

r] a

t 08:

49 0

8 A

pril

2013

Page 10: Airport Security Screener Competency: A Cross … (X-Ray CAT) and performance on ... of human perception and the cognitive system (Palmer, 1999). Its importance is ... We expect to

122 HALBHERR ET AL.

Guns Knives IEDs Other

De

te

ctio

n P

erfo

rm

an

ce

A'

Threat Type

Baseline 2008

FIGURE 2 Means and standard deviations of baseline and 2008 X-Ray CAT detectionperformances per threat item category (color figure available online).

Note. IEDs = improvised explosive devices.

The performance distribution is shifted toward optimum performance. The dif-ference between the two measurements is highly significant with p < .001,t(2,173) = 81.08, and the effect size is d = 1.74. According to Cohen (1998),effect sizes d > 0.8 constitute strong effects.

Figure 2 dissociates detection performance by threat type. Detection perfor-mance was significantly improved from baseline to 2008 A’ scores for all fourthreat item types. The largest improvement was evidenced for IEDs. Knivesshowed the smallest improvement. All differences between baseline and 2008 A’scores were highly significant with p < .001, with t(2,173) = 49.46, t(2,173) =48.10, t(2,173) = 76.64, and t(2,173) = 68.21 for guns, knives, IEDs, and other,respectively. Effect sizes (Cohen’s d) are considerably smaller for guns (d = 1.06)and knives (d = 1.03) than for IEDs (d = 1.64) and other (d = 1.46). This indicatesthat improvement of detection performance through training is larger for IEDs andother than for guns and knives. Bolfing et al. (2008) showed that in comparisonwith IEDs and other, detection performance for guns and especially knives is moredependent on image-based factors than human factors (i.e., training).

Cross-Sectional Results

Figure 4 shows the between-participant improvements that occur with increasinglevels of training. The average and standard deviation of the baseline X-Ray CATmeasurement of the 2,174 screeners represents performance with zero hours ofXRT training. The remaining screener subsamples were formed according to thetotal number of hours the screeners had trained by the time of the 2008 X-Ray

Dow

nloa

ded

by [

Fach

hoch

schu

le N

ordw

ests

chw

eiz]

, [A

dria

n Sc

hwan

inge

r] a

t 08:

49 0

8 A

pril

2013

Page 11: Airport Security Screener Competency: A Cross … (X-Ray CAT) and performance on ... of human perception and the cognitive system (Palmer, 1999). Its importance is ... We expect to

AIRPORT SECURITY SCREENER COMPETENCY 123

baseline 10 20 30 40 50 60 70 80 90 100

Det

ecti

on P

erfo

rman

ce A

'

Training Hours

FIGURE 3 Means and standard deviations of X-Ray CAT detection performances based ontraining hour subsamples.

CAT measurement. Training hour subsample sizes range from 33 (90–100 hr) to538 screeners (10–20 hr). Subsample sizes for more than 100 hr of training weresmaller than 30 and are not discussed. A law of diminishing returns is observed,with large initial gains in A’ with training hours showing up until around 50 hr,after which point minimal X-Ray CAT A’ gains accrue with further training. Alogarithmic transformation can be applied to create a linear relationship betweentraining hours and CAT performance. The resultant correlation is highly signifi-cant (r = .45, p < .001) and indicates good predictive power of training for X-RayCAT performance.

When split by threat item type, Figure 4 shows a similar relationship to the onefound in Figure 2. A stepwise increase in detection ability for each threat itemwas found, even up to 40 hr of training (refer to Figure 4). Correlations between(logarithmic) training hours and detection performance A’ by threat item typemirror the findings of the effect size analysis earlier: Correlations between totaltraining hours and A’ are smaller for guns (r = .39) and knives (r = .38) than forIEDs (r = .41) and other prohibited items (r = .42). The stronger improvement ofdetection performance through training for IEDs and other is perhaps best illus-trated by the following observation: In Figure 4 we can see that after only 20 hr oftraining, detection performance for IEDs is higher than for any other threat itemcategory—despite the fact that for untrained screeners IEDs are the most difficultitems to detect.

Dow

nloa

ded

by [

Fach

hoch

schu

le N

ordw

ests

chw

eiz]

, [A

dria

n Sc

hwan

inge

r] a

t 08:

49 0

8 A

pril

2013

Page 12: Airport Security Screener Competency: A Cross … (X-Ray CAT) and performance on ... of human perception and the cognitive system (Palmer, 1999). Its importance is ... We expect to

124 HALBHERR ET AL.

guns knives IEDs other

Det

ecti

on P

erfo

rman

ce A

'

Threat Type

Baseline10hrs20hrs30hrs40hrs

FIGURE 4 Means and standard deviations of X-Ray CAT detection performance per threatitem category based on training hour subsamples.

Note. IEDs = improvised explosive devices.

Within-Subjects Longitudinal Results

Consecutive X-Ray CAT test measurements (A’) and average amount of screenertraining hours between those tests are shown in Figure 5. There is a clearcorrespondence between training between tests and the outcome performancewhen viewed within subjects. Like the cross-sectional analysis, the biggest effectscan be seen at the beginning of training, between the baseline and second tests,with decreasing effects on subsequent tests. Only after 3 years of training doperformance increases level out.

Longitudinal Analysis of Global Results

Figure 6 illustrates how training policy changes take effect with time. From 2005to 2006 no noticeable improvement in global detection performance (A’) can beseen. This is due to the fact that training infrastructure needed to be installed andthus by 2006 most screeners had only just started with their training. In 2007 and2008, however, increasing gains in average training hours and global detectionperformance can be seen, as infrastructure has been installed across the countryand enough time for training has passed for policy changes to start taking effect.

DISCUSSION

All four predictions have been confirmed. The introduction of training policies hasled to global increases in X-ray image interpretation competency. Competency

Dow

nloa

ded

by [

Fach

hoch

schu

le N

ordw

ests

chw

eiz]

, [A

dria

n Sc

hwan

inge

r] a

t 08:

49 0

8 A

pril

2013

Page 13: Airport Security Screener Competency: A Cross … (X-Ray CAT) and performance on ... of human perception and the cognitive system (Palmer, 1999). Its importance is ... We expect to

AIRPORT SECURITY SCREENER COMPETENCY 125

0

20

40

60

80

100

120

140

160

Baseline Test 2 Test 3 Test 4

Tra

inin

g H

ou

rs

De

te

ctio

n P

erfo

rm

an

ce

A'

FIGURE 5 Means and standard deviations of Competency Assessment Test (CAT) detectionperformances in screeners’ first through fifth test measurements. The lighter bars representtraining hours, and the higher and darker bars show the CAT performance.

2005 2006 2007 2008

Tra

inin

g H

ou

rs

De

te

ctio

n P

erfo

rm

an

ce

A'

FIGURE 6 Means and standard deviations of X-Ray CAT detection performances in 2005through 2008. The lighter bars represent training hours, and the higher and darker bars show theCAT performance. Due to reasons of confidentiality, specific training hours cannot be reported.

improved as a function of computer-based training. Effect sizes of training onX-ray image interpretation competency were large, and training effects werelargest for IEDs—the most unfamiliar threat items—and smallest for knives—the most difficult items when rotated. Trained screeners outperform untrainedscreeners by a large margin. In fact, as illustrated by Figure 1, only the weak-est trained screeners perform worse than the best untrained screeners. When theresults are examined at an organizational level, there are clear advantages of sys-tematic and continuous training for screeners. Figures 3 and 4 demonstrate clearly

Dow

nloa

ded

by [

Fach

hoch

schu

le N

ordw

ests

chw

eiz]

, [A

dria

n Sc

hwan

inge

r] a

t 08:

49 0

8 A

pril

2013

Page 14: Airport Security Screener Competency: A Cross … (X-Ray CAT) and performance on ... of human perception and the cognitive system (Palmer, 1999). Its importance is ... We expect to

126 HALBHERR ET AL.

that, according to X-Ray CAT measurements, those screeners who have under-taken at least 40 hr of training outperform those who have trained less. The biggesttraining gains are seen incrementally up to 40 hr of training.

Figure 2 shows that screeners substantially improve their performance inall threat categories compared to their baseline performance. Taken together,the cross-sectional and longitudinal results cancel out the argument that theseimprovements could be due to becoming familiar with the task. In fact, Kolleret al. (2008) showed that X-Ray CAT improvements indeed are not attributableto task-specific exposure and consequent performance capabilities by includ-ing a control group that took X-Ray CATs but no training. Furthermore, at theairports included in this study, X-Ray CATs are performed at approximately1-year intervals, so it is unlikely that memorization of the stimuli could haveoccurred.

In an applied study of this nature there are both advantages and disadvantages.Although a certain loss of experimental control does occur, the benefits outweighthe detractions by being able to observe performance of actual screeners in a largedata set. Data have been tracked over a 5-year period, from the point before train-ing began, monitoring improvement from novice ability through to various levelsof training expertise. The fact that 1,000 screeners had completed three X-RayCAT tests attests to the power and reliability of these inferences.

Further studies need to be conducted to determine if the effects of training onscreener competency found in this study generalize to screener performance on thejob. Job performance for screeners—and in most jobs—is dependent on more thanknowledge and skills. Such factors as motivation, attitude, level of alertness, anddistractions can have a significant impact on screeners’ ability to detect prohibiteditems on the x-ray.

APPLICATIONS

The results of this study confirm findings of previous research for a large data setwith high ecological validity. The study is proof positive that the large effects ofcomputer-based training on screener competency found in previous, more exper-imental studies will indeed come into effect “on the ground,” if correspondingregulations are put in place and enforced. Indeed, in light of the huge margin bywhich trained screeners outperform untrained screeners, and the comparativelyshort amount of training time required to achieve this improvement, one mightcome to the conclusion that it would be grossly negligent not to put in placesuch regulations! Furthermore, the comparatively poor performance of untrainedscreeners and the large effect of computer-based training on detection perfor-mance raises the question of how resources should be allocated to best improveaviation security screening procedures in the future. Several new security imaging

Dow

nloa

ded

by [

Fach

hoch

schu

le N

ordw

ests

chw

eiz]

, [A

dria

n Sc

hwan

inge

r] a

t 08:

49 0

8 A

pril

2013

Page 15: Airport Security Screener Competency: A Cross … (X-Ray CAT) and performance on ... of human perception and the cognitive system (Palmer, 1999). Its importance is ... We expect to

AIRPORT SECURITY SCREENER COMPETENCY 127

technologies are in development or entering the market, such as backscatter x-raysand millimeter wave scanners (body scanners), multiview x-ray machines or 3Dx-ray machines. The acquisition of any of these technologies is an expensiveinvestment, to be justified by substantial improvements in security. However, theresults presented here suggest that the potential of security imaging technologycan only be fully harvested if its operators are properly trained. At worst, thereplacement of existing security imaging technology with new technology—butwithout proper training environments for its operators—could, in effect, lead to adecline in security through screening procedures. Thus, the development of propertraining environments for any new screening technology should be regarded ascritically important.

ACKNOWLEDGMENTS

We would like to thank Mathias Neukom for his work in double checking the dataassembly process. This research was financially supported by the Canadian AirTransport Security Authority.

REFERENCES

Beck, J. (1982). Textural segmentation. In Beck, J. (Ed.), Organization and representation inperception (pp. 285–317). Hillsdale, NJ: Erlbaum.

Bolfing, A., Halbherr, T., & Schwaninger, A. (2008). How image based factors and human factorscontribute to threat detection performance in x-ray aviation security screening. HCI and Usabilityfor Education and Work, Lecture Notes in Computer Science, 5298, 419–438.

Brown, J. R., & Madhavan, P. (2008). Effects of training base rates and transfer similarity: A compari-son of two search scenarios. Human Factors and Ergonomics Society Annual Meeting Proceedings,52(4), 333–337.

Cohen, J. (1998). Statistical power analysis for the behavioral sciences. Hillsdale, NJ: Erlbaum.Curio, C., Bülthoff, H. H., & Giese, M. A. (2010). Dynamic faces: Insights from experiments and

computation. Cambridge, MA: MIT Press.Fiore, S. M., Scielzo, S., Jentsch, F., & Howard, M. L. (2006). Understanding performance and cogni-

tive efficiency when training for x-ray security screening. Human Factors and Ergonomics SocietyAnnual Meeting Proceedings, Training, 2610–2614.

Gauthier, I., Tarr, M. J., & Bub, D. (2009). Perceptual expertise: Bridging brain and behavior. NewYork, NY: Oxford University Press.

Gauthier, I., Williams, P., Tarr, M. J., & Tanaka, J. (1998). Training “greeble” experts: A frameworkfor studying expert object recognition processes. Vision Research, 38, 2401–2428.

Hofer, F., & Schwaninger, A. (2004). Reliable and valid measures of threat detection performance inx-ray screening. IEEE ICCST Proceedings, 38, 303–308.

Koller, S. M., Drury, C. G., & Schwaninger, A. (2009). Change of search time and non-search time inx-ray baggage screening due to training. Ergonomics, 52, 644–656.

Koller, S. M., Hardmeier, D., Michel, S., & Schwaninger, A. (2007). Investigating training and transfereffects resulting from recurrent CBT of x-ray image interpretation. In D. S. McNamara & J. G.

Dow

nloa

ded

by [

Fach

hoch

schu

le N

ordw

ests

chw

eiz]

, [A

dria

n Sc

hwan

inge

r] a

t 08:

49 0

8 A

pril

2013

Page 16: Airport Security Screener Competency: A Cross … (X-Ray CAT) and performance on ... of human perception and the cognitive system (Palmer, 1999). Its importance is ... We expect to

128 HALBHERR ET AL.

Trafton (Eds.), Proceedings of the 29th annual Cognitive Science Society (pp. 1181–1186). Austin,TX: Cognitive Science Society.

Koller, S., Hardmeier, D., Michel, S., & Schwaninger, A. (2008). Investigating training, transfer,and viewpoint effects resulting from recurrent CBT of x-ray image interpretation. Journal ofTransportation Security 1, 81–106.

Koller, S., & Schwaninger, A. (2006). Assessing X-ray image interpretation competency of air-port security screeners. In Garot, J.-M., & Tosic, V. (Eds.), Proceedings of the 2nd InternationalConference on Research in Air Transportation, ICRAT 2006 (pp. 399–402). Belgrade, Serbia:ICRAT Publications.

Kosslyn, S. M. (1994). Image and brain. Cambridge, MA: MIT Press.Madhavan, P., & Gonzalez, C. (2009). The relationship between stimulus-response mappings and the

detection of novel stimuli in a simulated luggage screening task. Theoretical Issues in ErgonomicsScience, 11, 461–473.

McCarley, J. S., Kramer, A. F., Wickens, C. D., Vidoni, E. D., & Boot, W. R. (2004). Visual skills inairport screening. Psychological Science, 15, 302–306.

Michel, S., de Ruiter, J., Hogervorst, M., Koller, S., & Schwaninger, A. (2007). Investigating trainingeffects resulting from recurrent CBT of x-ray image interpretation. IEEE ICCST Proceedings, 41,201–206.

Osaka, N., Rentschler, I., & Biederman, I. (2007). Object recognition, attention, and action. Tokyo,Japan: Springer.

Palmer, S. E. (1999). Vision science. Cambridge, MA: MIT Press.Pfeiffer, R., & Bongard, J. (2007). How the body shapes the way we think. Cambridge, MA: MIT

Press.Pollack, I., & Norman, D. A. (1964). A non-parametric analysis of recognition experiments.

Psychonomic Science, 1, 125–126.Riegelnig, J., & Schwaninger, A. (2006). The influence of age and gender on detection performance

and the criterion in x-ray screening. In Garot, J.-M., & Tosic, V. (Eds.), Proceedings of the 2ndInternational Conference on Research in Air Transportation, ICRAT 2006 (pp. 403–407). Belgrade,Serbia: ICRAT Publications.

Schwaninger, A., & Hofer, F. (2004). Evaluation of CBT for increasing threat detection performancein X-ray screening. In K. Morgan & M. J. Spector (Eds.), The internet society 2004, advances inlearning, commerce and security (pp. 147–156). Wessex: WIT Press.

Schwaninger, A. (2004a, February). Computer based training: A powerful tool to the enhancement ofhuman factors. Aviation Security International, 31–36.

Schwaninger, A. (2004b, November). Increasing efficiency in airport security screening. Paperpresented at AVSEC World 2004, Vancouver, BC, Canada.

Schwaninger, A. (2005). Increasing efficiency in airport security screening. WIT Transactions on theBuilt Environment, 82, 407–416.

Schwaninger, A., Hardmeier, D., & Hofer, F. (2005). Aviation security screeners visual abilities &visual knowledge measurement. IEEE Aerospace and Electronic Systems, 20(6), 29–35.

Schwaninger, A., Bolfing, A., Halbherr, T., Helman, S., Belyavin, A., & Hay, L. (2008). The impactof image based factors and training on threat detection performance in X-ray screening. In Young,D., & Saunders-Hodge, S. (Eds.), Proceedings of the 3rd International Conference on Research inAir Transportation, ICRAT 2008 (pp. 317–324). Fairfax, VA: ICRAT Publications.

Schwaninger, A. (2009). Why do airport security screeners sometimes fail in covert tests? IEEE ICCSTProceedings, 43, 41–45.

Schwaninger, A., Hofer, F., & Wetter, O. E. (2007, October). Adaptive computer-based train-ing increases on the job performance of x-ray screeners. Paper presented at the 41st CarnahanConference on Security Technology, Ottawa, ON, Canada.

Dow

nloa

ded

by [

Fach

hoch

schu

le N

ordw

ests

chw

eiz]

, [A

dria

n Sc

hwan

inge

r] a

t 08:

49 0

8 A

pril

2013

Page 17: Airport Security Screener Competency: A Cross … (X-Ray CAT) and performance on ... of human perception and the cognitive system (Palmer, 1999). Its importance is ... We expect to

AIRPORT SECURITY SCREENER COMPETENCY 129

Schwaninger, A., Hardmeier, D., Riegelnig, J., & Martin, M. (2010). Use it and still lose it? Theinfluence of age and job experience on detection performance in x-ray. GeroPsych: The Journal ofGerontopsychology and Geriatric Psychiatry, 23(3), 169–175.

Tarr, M. J., & Bülthoff, H. H. (1998). Image-based object recognition in man, monkey and machine.Cambridge, MA: MIT Press.

Treisman, A. (1986). Features and objects in visual processing. Scientific American, 255, 114B–125B.

Manuscript first received: October 2009

Dow

nloa

ded

by [

Fach

hoch

schu

le N

ordw

ests

chw

eiz]

, [A

dria

n Sc

hwan

inge

r] a

t 08:

49 0

8 A

pril

2013