IEEE TRANSACTIONS ON NEURAL SYSTEMS AND REHABILITATION ENGINEERING, VOL. ?, NO. ?, JUNE 2006 1

Cortically–coupled Computer Vision for Rapid Image Search
Adam D. Gerson, Lucas C. Parra, and Paul Sajda∗

Abstract— We describe a real-time EEG-based brain-computer interface (BCI) system for triaging imagery presented using rapid serial visual presentation (RSVP). A target image in a sequence of non-target distractor images elicits in the EEG a stereotypical spatio-temporal response, which can be detected. A pattern classifier uses this response to re-prioritize the image sequence, placing detected targets in the front of an image stack. We use single-trial analysis based on linear discrimination to recover spatial components that reflect differences in EEG activity evoked by target vs. non-target images. We find an optimal set of spatial weights for 59 EEG sensors within a sliding 50 ms time window. Using this simple classifier allows us to process EEG in real-time. The detection accuracy across five subjects is on average 92%, i.e. in a sequence of 2500 images, resorting images based on detector output results in 92% of target images being moved from a random position in the sequence to one of the first 250 images (the first 10% of the sequence). The approach leverages the highly robust and invariant object recognition capabilities of the human visual system, using single-trial EEG analysis to efficiently detect neural signatures correlated with the recognition event.

Index Terms— electroencephalography, brain–computer interface, cortically–coupled computer vision, rapid serial visual presentation, image triage.

I. INTRODUCTION

THE human visual system is exquisitely adept at parsing a scene and recognizing objects at a glance. Such ability for rapid processing of visual information is even more impressive in light of the fact that neurons are relatively slow processing elements: compared to digital computers, individual transistors can switch 10^6 times faster than a neuron can spike. Rapid recognition is not only critical for our survival but is also necessary given the architectural constraints of our visual pathways.

Computer vision systems have proven to be less effective, in general, than human image analysts. An important problem, in disciplines ranging from radiology to satellite reconnaissance, is optimizing image throughput. However, there are often only a limited number of human analysts available to exploit the enormous amount of imagery being collected. Given that the “human visual processor” is the most general-purpose and robust vision system ever “constructed”, how can we optimally utilize the available human visual processors for searching through the immense amount of imagery?

Adam D. Gerson, [email protected], and Paul Sajda∗, [email protected], are with the Department of Biomedical Engineering, Columbia University, New York, NY 10027.

Lucas C. Parra, [email protected], is with the Department of Biomedical Engineering, City College of New York, New York, NY 10031.

∗To whom correspondence should be addressed.

In this paper we describe an EEG system capable of detecting neural signatures of visual recognition events evoked during rapid serial visual presentation (RSVP). The system is capable of using these signatures to triage sequences of images, reordering them so that target images are placed near the beginning of the sequence. We term our system “cortically–coupled computer vision” since we leverage the robust recognition capabilities of the human visual system (e.g. invariance to pose, lighting, scale, etc.), and use a non-invasive cortical interface (e.g. EEG) to intercept signatures of recognition events: the visual processor performs perception and recognition, and the EEG interface detects the result (e.g. decision) of that processing.

II. MATERIALS AND METHODS

A. Participants

Five participants (3 females, age 21 years) from the Columbia University community participated. All had normal or corrected-to-normal vision and reported no history of neurological problems. Informed consent was obtained from all participants in accordance with the guidelines and approval of the Columbia University Institutional Review Board.

B. Data acquisition

EEG data were acquired in an electrostatically shielded room (ETS-Lindgren, Glendale Heights, IL) using a Sensorium EPA-6 Electrophysiological Amplifier (Charlotte, VT) from 59 Ag/AgCl scalp electrodes mounted in a standard electrode cap (Electro-Cap, Eaton, OH) at locations based on the International 10-20 system, and from three periocular electrodes placed below the left eye and at the left and right outer canthi. All channels were referenced to the left mastoid, with input impedance < 15 kΩ and a chin ground. Data were sampled at 1000 Hz with an analog pass band of 0.01–300 Hz using 12 dB/octave high-pass and 8th-order elliptic low-pass filters. Subsequently, a software-based 0.5 Hz high-pass filter was used to remove DC drifts, and 60 Hz and 120 Hz (harmonic) notch filters were applied to minimize line-noise artifacts. These filters were designed to be linear-phase to prevent delay distortions. Motor-response and stimulus events recorded on separate channels were delayed to match the latencies introduced by the digital filtering of the EEG.

C. Behavioral paradigm

During the RSVP triage task, participants were presented with a continuous sequence of natural scenes.

Fig. 1. Example RSVP triage trial. A fixation cross lasting two seconds is followed by a sequence of 100 images. Each sequence contains two target images. These targets can appear at any position within the sequence. Following the image sequence, a 10 x 10 matrix is displayed showing the sequence of images. Target images are outlined. After pressing the space-bar, this matrix is sorted according to the EEG. Ideally the target images move to the first two positions after resorting based on EEG. Participants press the space-bar again to bring up a summary slide showing the position of the target images before and after triage. The next trial begins after the subject presses the space-bar.

Participants completed two blocks of 50 sequences with a brief rest period lasting no more than five minutes between blocks. Each sequence consisted of 100 images and contained two target images with one or more people in a natural scene. These target images could appear at any position within each 100-image sequence. The remaining natural scenes, without a person, are referred to as distractor images. Each image was presented for 100 ms. As illustrated in Figure 1, following the image sequence a series of self–paced feedback slides was presented indicating the position of target images within the sequence before and after EEG–based triage. A fixation cross was displayed for 2 seconds between trials.

During the second block, participants were instructed to quickly press the left button of a generic 3-button mouse with their right index finger as soon as they recognized target images. They were instructed to press the button twice, as quickly as possible, if one target image immediately followed the other. Participants did not respond with a button press during the first block.

D. Stimuli

Images were selected randomly with replacement from a queue of 251 non-target and 33 target gray-scale images. Images were obtained with a Kodak DCS420 digital camera with a 28 mm camera lens (Rochester, NY) [1]. Images were deblurred, and their size was reduced from 1536x1024 pixels to 640x426 pixels to decrease the processing requirements of the stimulus computer and ensure precise timing of stimulus presentation. The images were nonlinearly transformed via gamma correction to match their mean luminance. The fixation-cross display had the same mean luminance as the images. A Dell Precision 530 Workstation (Round Rock, TX) with an nVidia Quadro4 900XGL graphics card (Santa Clara, CA) and E-Prime software (Psychological Software Tools, Pittsburgh, PA) controlled stimulus display. An LCD projector (InFocus LP130, Wilsonville, OR) projected stimuli through an RF-shielded window onto a front-projection screen. Stimuli subtended 33°±3° × 25°±3° of visual field. Target images were visually inspected to ensure that target objects did not comprise more than 25% of the area (fraction of pixels) in the scene.

E. On–line artifact reduction

Immediately prior to the RSVP task, participants completed an eye motion calibration experiment during which they were instructed to blink repeatedly upon the appearance of a white-on-black fixation cross and then make several horizontal and vertical eye movements according to the position of a fixation cross subtending 1° × 1° of the visual field. Horizontal eye movements subtended 33°±3° and vertical eye movements subtended 25°±3°. The timing of these visual cues was recorded simultaneously with the EEG. This enabled determination of linear components associated with eye blinks and eye movements, which were subsequently projected out of the EEG recorded during the RSVP triage task. This procedure is described in detail in [2].
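Projecting artifact components out of the EEG is a linear operation. The sketch below is not the authors' implementation (that is described in [2]); it only illustrates the general idea, assuming the calibration recording has already yielded the spatial directions of the eye artifacts (the function name and array shapes are our own):

```python
import numpy as np

def remove_eye_components(eeg, eye_dirs):
    """Project linear eye-artifact components out of multichannel EEG.

    eeg      : (channels, samples) EEG recorded during the task.
    eye_dirs : (channels, k) matrix whose columns span the spatial
               directions of blinks / eye movements, estimated from the
               calibration recording.
    Returns the EEG with those spatial directions projected out.
    """
    # Orthonormalize the artifact directions, then apply the
    # complementary projector P = I - U U^T to every sample.
    u, _ = np.linalg.qr(eye_dirs)
    proj = np.eye(eeg.shape[0]) - u @ u.T
    return proj @ eeg
```

Any activity lying in the artifact subspace is removed exactly; activity orthogonal to it passes through unchanged.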

F. Real–time spatial integration for image classification

The RSVP task is effectively an oddball task, eliciting a P3 response, which has been used extensively to develop brain-computer interfaces for communication [3]. In order to classify EEG on–line, we use a Fisher linear discriminator [4], [5] to estimate a spatial weighting vector w_{\tau,\delta} which maximally discriminates between sensor-array signals for two conditions, c \in \{0, 1\}, where c = 1 represents a target trial and c = 0 a non-target trial. This weighting vector is specific to a training window starting at a relative onset time \tau, with a duration of \delta. The spatial weighting vector generates a component^1

  y_c(t) = w_{\tau,\delta}^T x_c(t) + b_{\tau,\delta}    (1)

that best separates the EEG signals by making y_0(t) < y_1(t) for as many samples in the training window as possible. The result is a discriminating component specific to target-recognition activity, while minimizing activity correlated with both task conditions, such as early visual processing.

Following each trial, the mean \mu_c and covariance matrix \Sigma_c are updated for the condition associated with the trial. The spatial weighting vector w_{\tau,\delta} and bias b_{\tau,\delta} are updated as

  w_{\tau,\delta} = \Sigma^{\#}_{\mathrm{pool}} \mu_1 - \Sigma^{\#}_{\mathrm{pool}} \mu_0    (2)

  b_{\tau,\delta} = \frac{1}{2} \left[ \mu_0^T \Sigma^{\#}_{\mathrm{pool}} \mu_0 - \mu_1^T \Sigma^{\#}_{\mathrm{pool}} \mu_1 \right]    (3)

where the pooled covariance \Sigma_{\mathrm{pool}} = (N_0 \Sigma_0 + N_1 \Sigma_1)/(N_0 + N_1), and N_0 and N_1 are the number of samples acquired for the non–target and

^1 We use the term “component” instead of “source” to make it clear that this is a projection of all the activity correlated with the underlying source.


target conditions. We use the pseudo-inverse \Sigma^{\#}_{\mathrm{pool}} of \Sigma_{\mathrm{pool}} to ensure the stability of our on–line algorithm, particularly at the beginning of training, when only a few samples have been observed and \Sigma is rank deficient.
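A minimal sketch of this incremental training scheme, Equations (2) and (3), might look as follows. This is an illustration, not the authors' code: the class name, the running-sum bookkeeping, and the use of NumPy's pseudo-inverse are our assumptions.

```python
import numpy as np

class OnlineFisherDiscriminator:
    """Incremental Fisher linear discriminator for one training window.

    Running class means and covariances are updated after each trial;
    the weights use the pseudo-inverse of the pooled covariance so the
    classifier stays stable while the covariance is still rank deficient.
    """

    def __init__(self, n_channels):
        self.n = [0, 0]                      # trials per class (N0, N1)
        self.sum_x = [np.zeros(n_channels), np.zeros(n_channels)]
        self.sum_xx = [np.zeros((n_channels, n_channels)) for _ in range(2)]

    def update(self, x, c):
        """x: (channels,) EEG averaged over the training window; c: 0 or 1."""
        self.n[c] += 1
        self.sum_x[c] += x
        self.sum_xx[c] += np.outer(x, x)

    def weights(self):
        mu = [self.sum_x[c] / max(self.n[c], 1) for c in range(2)]
        cov = [self.sum_xx[c] / max(self.n[c], 1) - np.outer(mu[c], mu[c])
               for c in range(2)]
        pooled = (self.n[0] * cov[0] + self.n[1] * cov[1]) / max(sum(self.n), 1)
        pinv = np.linalg.pinv(pooled)        # Sigma_pool^# in Eq. (2)
        w = pinv @ (mu[1] - mu[0])           # Eq. (2)
        b = 0.5 * (mu[0] @ pinv @ mu[0] - mu[1] @ pinv @ mu[1])  # Eq. (3)
        return w, b
```

After training, y = w @ x + b is positive for trials resembling the target class and negative otherwise.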

Given our linear model, determination of the sensor projections of the discriminating activity is straightforward, namely

  a = \frac{\langle x(t), y(t) \rangle}{\langle y(t), y(t) \rangle}    (4)

where \langle \cdot, \cdot \rangle denotes an inner product. Equation 4 describes the projection a of the discriminating component y(t) that explains most of the activity x(t). A strong projection indicates low attenuation; the intensity of the sensor projections a therefore indicates the proximity of the component to the sensors.
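Equation (4) is a per-channel least-squares regression of the data onto the component. A direct sketch (function name and shapes assumed):

```python
import numpy as np

def sensor_projection(x, y):
    """Forward projection a = <x, y> / <y, y> of Equation (4).

    x : (channels, samples) EEG in the training windows.
    y : (samples,) discriminating component y(t) = w^T x(t) + b.
    Returns a (channels,) scalp pattern: how strongly the component
    appears at each sensor, in sensor units.
    """
    return (x @ y) / (y @ y)
```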

During each experimental condition (with and without motor response), 5000 images were presented to the subject in sequences of 100 images. EEG evoked by the first 2500 images (50 targets, 2450 non-targets) was used to train the classifier. During the experimental sessions, a training window between 400–500 ms following stimulus onset was used to extract training data. This training window was selected based on the timing of the P3 for this task. Weights were updated adaptively with each trial during the training period. These weights were frozen at the end of the training period and blindly applied to the subsequent testing dataset (images 2501–5000).

1) Multiple classifiers: After the experiment, multiple classifiers with different training window onsets, \tau, were used to boost triage performance. The training window onsets ranged from 0 to 900 ms in steps of 50 ms. The duration of the training windows, \delta, was 50 ms. Once these classifiers were trained, the optimal weighting of their outputs was found using logistic regression to discriminate between target and non–target images,

  z_c = \omega^T \psi_c + \beta,    (5)

making z_0 < z_1 for as many trials as possible, where \psi = [\, \overline{y(t)}_{\tau_1} \; \overline{y(t)}_{\tau_2} \; \cdots \; \overline{y(t)}_{\tau_J} \,]^T and the sample average

  \overline{y(t)}_{\tau_j} = \frac{1}{\delta} \sum_{t=\tau_j}^{\tau_j+\delta-1} y(t)    (6)

is taken over the corresponding training windows with onsets \tau_1 = 0 ms, \tau_2 = 50 ms, \ldots, \tau_J = 900 ms. Note that while the structure of our classifier resembles a multi-layer perceptron, rather than adapting the coefficients of all classifiers simultaneously, the classifiers were trained separately to avoid over-training and improve generalization. Again, only EEG data evoked by the first 2500 images was used to train the classifiers and then find the inter–classifier weights. These weights were then applied to the testing data set evoked by the second set of 2500 images (images 2501–5000).
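The two stages above can be sketched as follows. This is an illustrative reimplementation under our own assumptions: the paper does not specify its logistic solver, so plain gradient descent stands in here, and all function names and array shapes are ours.

```python
import numpy as np

def window_features(y, onsets, delta):
    """Eq. (6): average each window-specific classifier's output over
    its own training window to build the feature vector psi for one trial.

    y      : (J, samples) outputs of the J window-specific classifiers.
    onsets : window onsets tau_j, in samples.
    delta  : window duration, in samples.
    """
    return np.array([y[j, t:t + delta].mean() for j, t in enumerate(onsets)])

def fit_logistic(psi, labels, lr=0.1, iters=2000):
    """Fit the inter-classifier weights omega and bias beta of Eq. (5)
    by gradient descent on the logistic loss (a stand-in for any solver).

    psi    : (trials, J) feature vectors.
    labels : (trials,) 0/1 condition labels.
    """
    w = np.zeros(psi.shape[1])
    b = 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(psi @ w + b)))   # predicted P(target)
        grad = p - labels                           # logistic-loss gradient
        w -= lr * psi.T @ grad / len(labels)
        b -= lr * grad.mean()
    return w, b
```

Because each window's classifier is trained separately, the logistic stage only has J coefficients to fit, which is what keeps the combination from over-training.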

As with the sensor projection a, we can find projections of the discriminating activity,

  \alpha = \frac{\langle \psi, z \rangle}{\langle z, z \rangle}    (7)

The projection \alpha describes the relative strength of the temporal discriminating component over time.

G. Image triage

After each image sequence (2 targets and 98 non–targets), image matrices (10 x 10) were presented to the subject to show the position of each image within the sequence (Figure 1). Target images were outlined in red. The image sequence was then resorted based on the output of a single classifier with one training window, \overline{y(t)} (\tau = 400 ms, \delta = 100 ms). Ideally, the two target images are triaged to the first two positions in the re-sorted image sequence.

Following the experiment, all image sequences were concatenated to create training and testing sequences that each contain 2500 images (50 targets and 2450 non–targets). These image sequences are re–sorted according to the output of our classifier with multiple training windows, z, for the EEG evoked by every image.
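The re-sorting itself is just a descending sort on the classifier output, one score per image (function name assumed):

```python
import numpy as np

def triage_order(scores):
    """Re-sort an image sequence so that the images with the highest
    classifier output come first.

    scores : (n_images,) classifier outputs z, one per image.
    Returns the image indices in triaged (descending-score) order.
    """
    return np.argsort(-np.asarray(scores, dtype=float))
```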

1) Button–based image triage: For comparison, sequences were triaged based on the button response. Images were resorted according to

  p(\mathrm{target}|RT) = \frac{p(RT|\mathrm{target})\, p(\mathrm{target})}{p(RT|\mathrm{target})\, p(\mathrm{target}) + p(RT|\mathrm{non\text{-}target})\, p(\mathrm{non\text{-}target})}    (8)

where RT is the onset of a button response that occurs within one second of image onset; p(target|RT) = 0 when no response occurred within one second of image onset. The priors are p(target) = 0.02 and p(non–target) = 0.98. p(RT|target) is a Gaussian distribution with mean and variance determined from the response times in the training sequences. Since more than one response is likely to follow a target image if the two target images are presented within one second of each other, for training sequences response times were assigned to target images based on the position of the target image within the sequence. In other words, if the target appeared first in the sequence and two button responses occurred within one second of this target's onset, the first response was assigned to that target image and the second response was assigned to the second target image. For testing sequences, if two or more responses occur within one second of the onset of any image, the response with the greatest p(target|RT) is assigned to the image. p(RT|non–target) is a mixture of 13 Gaussians, each with the same variance as that used for p(RT|target) and with means assigned by shifting the mean of p(RT|target) from 600 ms in the past to 700 ms in the future in increments of 100 ms, excluding the actual mean of p(RT|target). This mixture contains a sufficient number of Gaussians that it is consistent within the one-second interval following image onset. p(RT|non–target) was designed to model responses occurring within one second of the onset of a non–target image that is presented within one second prior to or following a target image.
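Equation (8) with the 13-Gaussian non-target mixture can be sketched directly from the description above (function name, units in seconds, and the default prior are our assumptions):

```python
import numpy as np

def p_target_given_rt(rt, mu, sigma, p_target=0.02):
    """Eq. (8): posterior that an image was a target, given the reaction
    time rt (seconds from image onset) of a button press.

    mu, sigma : mean and s.d. of target reaction times from training.
    The non-target likelihood is a 13-Gaussian mixture: the target
    Gaussian shifted from -600 ms to +700 ms in 100 ms steps,
    excluding the unshifted mean.
    """
    if rt is None or not (0.0 <= rt <= 1.0):
        return 0.0   # no response within one second of image onset

    def gauss(x, m):
        return np.exp(-0.5 * ((x - m) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

    like_target = gauss(rt, mu)
    shifts = [s / 1000 for s in range(-600, 701, 100) if s != 0]
    like_nontarget = np.mean([gauss(rt, mu + s) for s in shifts])
    num = like_target * p_target
    return num / (num + like_nontarget * (1 - p_target))
```

A press near the typical target reaction time raises the posterior; a press that better matches one of the shifted mixture components is attributed to a neighboring non-target image.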

III. RESULTS

A. Task performance

In the RSVP task, five participants correctly responded to 92 ± 5% and 90 ± 10% of targets during training and testing sequences, respectively. Response times for correctly identified targets across the five subjects had a mean of 421 ± 91 ms and 426 ± 71 ms for training and testing sequences, respectively. Behavioral statistics for the five subjects are shown in Table I. Mean and standard deviation of response times for training and testing sequences are listed with the corresponding percentage of correctly identified targets. The response bias [6], [7], c, is also listed for testing sequences:

  c = \frac{\Phi^{-1}(H) + \Phi^{-1}(F)}{2}    (9)

where H and F are the hit rate and the false-alarm rate, respectively, and \Phi^{-1} is the inverse of the normal cumulative distribution function, used to convert probabilities into z-scores. Negative values of c indicate a bias toward not responding to images.
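Equation (9) needs only the inverse normal CDF, available in the Python standard library (the function name here is ours):

```python
from statistics import NormalDist

def response_bias(hit_rate, fa_rate):
    """Eq. (9): response bias c = (z(H) + z(F)) / 2, with z the inverse
    normal CDF. Under this sign convention, negative values indicate a
    bias toward not responding."""
    z = NormalDist().inv_cdf
    return (z(hit_rate) + z(fa_rate)) / 2.0
```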

B. Discriminating components

Following the experiment, 19 classifiers were trained using training windows with onsets ranging from 0 to 900 ms in increments of 50 ms. The duration of each training window was 50 ms. Figure 2 shows the mean scalp projections of the discriminating components from these classifiers across the five participants. Scalp projections a of the discriminating components were normalized prior to averaging across subjects. Components are shown for both the motor and non–motor tasks.

Bilateral occipital activity seen about 150 ms following stimulus onset is followed by strong negatively correlated activity over frontal electrodes. This frontal activity occurs slightly later for the motor condition (250 ms vs. 200 ms) and is followed by positively correlated activity over parietal electrodes about 350 ms after stimulus onset. This parietal activity spreads over central electrodes until about 500 ms for both motor and non–motor tasks. This frontal–parietal progression of activity is consistent with models of the P3 complex indicating that the anterior cingulate, inferior–temporal lobe, and hippocampus are primarily responsible for generation of the observed potentials [2], [8]–[11]. The scalp maps diverge for the motor and non–motor tasks from 550 ms through 700 ms. The motor condition shows sustained positively correlated activity over frontal and central electrodes during this period, while the non–motor condition elicits only negatively correlated activity bilaterally over occipital electrodes. Subsequent components show positively correlated activity over frontal electrodes for both motor and non–motor conditions. Curiously, activity specific to the motor condition occurs after the response. While it is possible that this activity may be associated with a somatosensory response, given the symmetry of this bilateral response this seems unlikely. Figure 2 also shows the time course of the multiple classifier system, \alpha. A strong temporal projection indicates strong correlation in time. This time course is stronger for the earlier components of the non–motor condition vs. the motor condition.

C. Triage performance

Triage results for one subject (subject 2) are shown in Figure 3. Figure 3(a) shows the number of targets as a function of the number of distractor images, both before and after triage based on button press and EEG.

Fig. 2. Group results over five subjects. The scalp maps on the left show the sensor projections, a, of discriminating activity y(t) averaged over all subjects for the non–motor (left columns) and motor (right columns) experimental tasks. The numbers to the left of each scalp-map pair correspond to the onset, relative to stimulus presentation, of the temporal window selected to train each classifier, in milliseconds. The duration of each classifier window is 50 ms. Note that the scalp maps for motor and non–motor conditions begin to diverge about 550 ms following stimulus onset. The plot on the right shows the projected time course, \alpha, of discriminating activity for the non–motor (dash–dot curve) and motor (dotted curve) tasks.

The area under the curve generated by plotting the fraction of targets as a function of the fraction of distractor images presented is used to quantify triage performance. Triage performance of the multiple-classifier system for the five subjects is listed in Table I. This area is 0.50 for all unsorted image sequences, since target images are randomly distributed throughout the sequences. Ideal triage performance results in an area of 1.00. There is no significant difference in performance between button–based and EEG–based triage (0.93 ± 0.06, 0.92 ± 0.03, p = 0.69, N = 5). Interestingly, there is also no significant difference in performance between EEG–based triage for the motor and no-motor-response conditions (0.92 ± 0.03, 0.91 ± 0.02, p = 0.81, N = 5).
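This area metric can be computed directly from the triaged ordering via the rank-sum identity: each (target, distractor) pair in which the target comes first contributes equally to the area. A sketch (function name assumed):

```python
import numpy as np

def triage_area(is_target_sorted):
    """Area under the fraction-of-targets vs. fraction-of-distractors
    curve for a re-sorted image sequence.

    is_target_sorted : boolean sequence in post-triage order.
    Returns 0.5 in expectation for a random ordering and 1.0 when every
    target precedes every distractor.
    """
    is_target_sorted = np.asarray(is_target_sorted, dtype=bool)
    n_t = is_target_sorted.sum()
    n_d = (~is_target_sorted).sum()
    # For each target, count the distractors that come after it; each
    # such (target, distractor) pair contributes 1/(n_t * n_d) to the area.
    distractors_seen = np.cumsum(~is_target_sorted)
    later_distractors = n_d - distractors_seen[is_target_sorted]
    return later_distractors.sum() / (n_t * n_d)
```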

Figures 3(b)–3(f) are rasters showing the position of the target images (black squares) and non–target images (white squares) in the concatenated image sequence. Based on these rasters and the EEG and button–based triage performance for the five subjects listed in Table I, it is clear that both the EEG and button–based triage systems are capable of a high level of performance. The button–based triage performance begins to fail, however, when subjects do not consistently respond to target stimuli or response times exceed one second. Subject 2, for instance, correctly responded to only 74% of targets during the testing session, and had a stronger bias than the other subjects not to respond to images, as reflected by the response bias measure c listed in Table I. In fact, this subject did not respond to 12 of 50 target images, and the response time for 1 target image exceeded one second. Excessively late responses cannot effectively be classified using our Bayesian method, since it is not clear whether these button presses were in response to the target image or a subsequent non–target image. The EEG response evoked by images with either no response or a late response is, however, still consistent with the EEG evoked by target images with predictable response times. The EEG–based triage system is therefore capable of detecting the recognition of these target images and subsequently resorting them appropriately. For this reason, we exploit the information provided by both EEG and button press, using another perceptron to boost triage performance. This approach is effective in increasing triage performance for subjects that


either did not respond or had a delayed motor response to a significant number of target images (e.g. subjects 1 and 2).

IV. CONCLUSION

We have demonstrated a real–time system for triaging sequences of images based on EEG signatures extracted using linear classifiers. While there were no significant differences in triage performance based on EEG or button press, button–based triage performance was superior for subjects that correctly responded to a high percentage of target images. However, for the two subjects that responded to fewer images correctly (subjects 1 and 2), the EEG–based triage system offered better performance. We find that integrating information from EEG and behavioral response offers the best strategy. Using this method we can capture images that lack an overt behavioral response and would otherwise have been missed by a button–based triage system.

Our on–line image triage system was initially designed to classify EEG based on a single training window (\tau = 400 ms, \delta = 100 ms). We realized once the experiments were completed that the performance of our EEG–based triage system is vastly improved by the multiple-classifier system, and we plan to use these methods in future BCI experiments. We do not present results from the single classifier but rather note the improvement in performance using the multiple-classifier system. It is clear from the performance improvement that the evoked response contains information that supplements the P3 for classification. Data were presented sequentially to the multiple-classifier system in order to simulate on–line training conditions. This classifier was implemented for use on–line.

Following each image sequence, feedback was presented showing the position of target images before and after triage. The influence of feedback on behavioral performance and evoked EEG is unknown, since we did not conduct a control experiment without feedback. This certainly warrants further study. It would be of interest, for instance, to determine whether feedback evokes activity in reward centers such as the anterior cingulate.

It is not clear why some subjects consistently respond to more target images correctly. With this in mind, we are investigating the impact of target salience within the scene (e.g. clutter, camouflage, and noise), image presentation rate, and user expertise (e.g. novice vs. expert) on both behavioral and neural responses. We expect that the density of target images within each sequence will have a significant impact on evoked EEG activity due to attentional-blink effects [12], [13] and the strong dependence of the P300 response on target probability and target–to–target interval (TTI) [14]. Since the strength of this response is also modulated by the level of surprise, we also expect contextual cues to affect the EEG evoked by target images. For this reason we are also exploring the effects of spatio-temporal correlation between consecutive video frames.

ACKNOWLEDGMENTS

This work was funded by the Defense Advanced Research Projects Agency (DARPA), the DoD Multidisciplinary University Research Initiative (MURI) administered by the Office of Naval Research (N00014-01-1-0625), and by the NIH (EB004730).

REFERENCES

[1] J. van Hateren and A. van der Schaaf, “Independent component filters of natural images compared with simple cells in primary visual cortex,” Proceedings of the Royal Society of London B, vol. 265, pp. 359–366, 1998.

[2] A. Gerson, L. Parra, and P. Sajda, “Cortical origins of response time variability during rapid discrimination of visual objects,” NeuroImage, in press.

[3] E. Donchin, K. M. Spencer, and R. Wijesinghe, “The mental prosthesis: assessing the speed of a P300-based brain-computer interface,” IEEE Transactions on Rehabilitation Engineering, vol. 8, no. 2, pp. 174–179, June 2000.

[4] R. Duda, P. Hart, and D. Stork, Pattern Classification. New York: John Wiley & Sons, Inc., 2001.

[5] L. C. Parra, C. D. Spence, A. Gerson, and P. Sajda, “Recipes for the linear analysis of EEG,” NeuroImage, in press.

[6] J. G. Snodgrass and J. Corwin, “Pragmatics of measuring recognition memory: Applications to dementia and amnesia,” Journal of Experimental Psychology: General, vol. 117, pp. 34–50, 1988.

[7] D. M. Green and J. A. Swets, Signal Detection Theory and Psychophysics. New York: John Wiley & Sons, Inc., 1966.

[8] D. Friedman, “Cognition and aging: A highly selective overview of event-related potential (ERP) data,” Journal of Clinical and Experimental Neuropsychology, vol. 25, no. 5, pp. 702–720, 2003.

[9] J. Polich, Ed., Detection of Change: Event-Related Potential and fMRI Findings. Boston: Kluwer Academic Publishers, 2003.

[10] S. Makeig, A. Delorme, M. Westerfield, J. Townsend, E. Courchesne, and T. Sejnowski, “Electroencephalographic brain dynamics following visual targets requiring manual responses,” PLoS Biology, vol. 2, no. 6, pp. 747–762, June 2004.

[11] S. Makeig, M. Westerfield, T.-P. Jung, J. Covington, J. Townsend, T. Sejnowski, and E. Courchesne, “Independent components of the late positive response complex in a visual spatial attention task,” Journal of Neuroscience, vol. 19, pp. 2665–2680, 1999.

[12] K. K. Evans and A. Treisman, “Perception of objects in natural scenes: Is it really attention free?” Journal of Experimental Psychology: Human Perception and Performance, vol. 31, no. 6, pp. 1476–1492, 2005.

[13] C. Kranczioch, S. Debener, and A. K. Engel, “Event-related potential correlates of the attentional blink phenomenon,” Cognitive Brain Research, vol. 17, pp. 177–187, 2003.

[14] C. J. Gonsalvez and J. Polich, “P300 amplitude is determined by target-to-target interval,” Psychophysiology, vol. 39, pp. 388–396, 2002.



[Fig. 3: (a) triage performance for Subject 2, plotting Targets Presented vs. Distractors Presented for the pre-triage, EEG (no motor), EEG (motor), Button, and EEG (motor) and Button conditions; (b)–(f) position rasters for the original sequence, EEG (no motor), EEG (motor), Button, and EEG (motor) and Button triaged sequences.]

Fig. 3. Triage performance for subject 2 using multiple classifiers. (a) Number of target images presented as a function of the number of distractor images presented. An ideal triage system would place all 50 (100%) target images before all 2450 distractor images. The light gray curve shows the original sequence. Button-based triage is shown by the dashed curve. The dash–dot curve shows EEG-based triage during the experiment without motor response. The dotted curve shows EEG-based triage during the experiment with motor response, and the thick black curve shows triage based on EEG (motor) and the button response. (b–f) Rasters showing the positions of non-target (white squares) and target (black squares) images within (b) the original image sequence, (c) the EEG (no motor)–based triage sequence, (d) the EEG (motor)–based triage sequence, (e) the button–based triage sequence, and (f) the combined EEG (motor) & button–based triage sequence. The first and last images in each sequence are shown by the squares in the upper left and lower right of each raster, respectively.
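The curves in Fig. 3(a) can be built directly from a triaged sequence by walking the images in order and counting targets seen as a function of distractors seen. A minimal sketch (function and variable names are illustrative):

```python
# Sketch of the Fig. 3(a) construction: cumulative targets presented
# vs. distractors presented, in triaged presentation order.

def triage_curve(is_target):
    """is_target: booleans in (triaged) presentation order.
    Returns one (distractors_seen, targets_seen) pair per image."""
    curve, targets, distractors = [], 0, 0
    for t in is_target:
        if t:
            targets += 1
        else:
            distractors += 1
        curve.append((distractors, targets))
    return curve

# Toy sequence: a perfect triage puts both targets before all distractors,
# so the curve rises to its maximum before any distractor is counted.
seq = [True, True, False, False, False]
print(triage_curve(seq))
```

An ideal system's curve reaches the full target count at zero distractors; the original (untriaged) sequence rises roughly linearly instead.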

TABLE I
TRIAGE PERFORMANCE AND BEHAVIORAL RESULTS

Subject | EEG (no motor) | EEG (motor) | Button | EEG (motor) and Button | RT (training) (ms) | RT (testing) (ms) | % Correct (training) | % Correct (testing) | Response bias c (testing)
1       | 0.92           | 0.91        | 0.87   | 0.94                   | 418 ± 133          | 413 ± 101         | 88                   | 86                  | -0.8
2       | 0.94           | 0.96        | 0.86   | 0.97                   | 412 ± 64           | 450 ± 64          | 94                   | 74                  | -1.2
3       | 0.90           | 0.87        | 0.96   | 0.96                   | 445 ± 79           | 423 ± 59          | 86                   | 94                  | -0.7
4       | 0.91           | 0.92        | 0.98   | 0.98                   | 433 ± 74           | 445 ± 59          | 98                   | 98                  | -0.5
5       | 0.91           | 0.93        | 0.98   | 0.98                   | 398 ± 86           | 402 ± 58          | 96                   | 96                  | -0.9
Group   | 0.91 ± 0.02    | 0.92 ± 0.03 | 0.93 ± 0.06 | 0.97 ± 0.02       | 421 ± 91           | 426 ± 71          | 92 ± 5               | 90 ± 10             | -0.8 ± 0.3
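The response bias c reported in the last column is the standard signal detection measure [7], c = -(z(H) + z(F)) / 2, where H and F are the hit and false-alarm rates and z is the inverse normal CDF. A hedged sketch (the hit and false-alarm rates below are illustrative, not the subjects' actual values):

```python
# Signal-detection response bias c = -(z(H) + z(F)) / 2.
# Negative c indicates a liberal criterion, as in Table I.
from statistics import NormalDist

def response_bias(hit_rate, fa_rate):
    z = NormalDist().inv_cdf  # inverse standard normal CDF
    return -(z(hit_rate) + z(fa_rate)) / 2.0

# Illustrative rates: a high hit rate with non-negligible false alarms
# yields c < 0, consistent with the negative biases in the table.
print(round(response_bias(0.96, 0.10), 2))
```

An unbiased observer (z(H) = -z(F)) gives c = 0; the uniformly negative values in Table I indicate all subjects adopted a liberal response criterion.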