14
Short temporal asynchrony disrupts visual object recognition Jedediah M. Singer $ Department of Ophthalmology, Boston Children’s Hospital, Harvard Medical School, Boston, MA, USA Gabriel Kreiman # $ Department of Ophthalmology, Boston Children’s Hospital, Harvard Medical School, Boston, MA, USA Center for Brain Science, Harvard University, Cambridge, MA, USA Swartz Center for Theoretical Neuroscience, Harvard University, Cambridge, MA, USA Humans can recognize objects and scenes in a small fraction of a second. The cascade of signals underlying rapid recognition might be disrupted by temporally jittering different parts of complex objects. Here we investigated the time course over which shape information can be integrated to allow for recognition of complex objects. We presented fragments of object images in an asynchronous fashion and behaviorally evaluated categorization performance. We observed that visual recognition was significantly disrupted by asynchronies of approximately 30 ms, suggesting that spatiotemporal integration begins to break down with even small deviations from simultaneity. However, moderate temporal asynchrony did not completely obliterate recognition; in fact, integration of visual shape information persisted even with an asynchrony of 100 ms. We describe the data with a concise model based on the dynamic reduction of uncertainty about what image was presented. These results emphasize the importance of timing in visual processing and provide strong constraints for the development of dynamical models of visual shape recognition. Introduction Humans and other primates can recognize complex objects and scenes in a glimpse. Psychophysics data suggest rapid processing of visual information within approximately 150 ms of stimulus presentation (Kirchner & Thorpe, 2006; Potter & Levy, 1969). The ability to recognize complex shapes is instantiated by the cascade of processes along the ventral visual stream (Connor, Brincat, & Pasupathy, 2007; Logothetis & Sheinberg, 1996; Serre et al., 2007; Tanaka, 1996). Consistent with the behavioral measures, single neuron recordings in macaque inferior temporal cortex (Hung, Kreiman, Poggio, & DiCarlo, 2005; Richmond, Opti- can, & Spitzer, 1990; Rolls, 1991), electroencephalo- graphic signals from the human scalp (Johnson & Olshausen, 2003; Thorpe, Fize, & Marlot, 1996), and intracranial field potentials from the human occipital and inferior temporal cortex (Liu, Agam, Madsen, & Kreiman, 2009) have revealed image-specific responses as early as 100–150 ms after stimulus onset. While these neurophysiological and behavioral observations sug- gest that information can quickly propagate through the visual system, it is not clear what proportion of human performance this initial wave of activity can account for; recognition in the natural world may require significant temporal integration. Given that there are at minimum approximately eight synapses between photoreceptors in the retina and high-level visual neurons in the inferior temporal cortex, several investigators have argued that, to a first approximation, rapid recognition can be reasonably described by a mostly bottom-up hierarchy of trans- formations along the ventral visual stream (Deco & Rolls, 2004; DiCarlo, Zoccolan, & Rust, 2012; Fu- kushima, 1980; Riesenhuber & Poggio, 2000; Rolls, 1991; Serre et al., 2007; vanRullen & Thorpe, 2002). These so-called feed-forward architectures represent a major simplification of the complex organization of neocortex, which includes ubiquitous top-down signals and horizontal connections in addition to bottom-up synapses (Douglas & Martin, 2004; Felleman & Van Essen, 1991). Independently of the relative contribu- tions of bottom-up and top-down signals, the short latencies observed in the human and macaque inferior temporal cortex impose a maximum limit of 10 to 20 ms on the amount of processing that can take place at Citation: Singer, J. M., & Kreiman, G. (2014). Short temporal asynchrony disrupts visual object recognition. Journal of Vision, 14(5):7, 1–14, http://www.journalofvision.org/content/14/5/7, doi:10.1167/14.5.7. Journal of Vision (2014) 14(5):7, 1–14 1 http://www.journalofvision.org/content/14/5/7 doi: 10.1167/14.5.7 ISSN 1534-7362 Ó 2014 ARVO Received November 15, 2013; published May 12, 2014

Short temporal asynchrony disrupts visual object recognition · Short temporal asynchrony disrupts visual object recognition Jedediah M. Singer $ Department of Ophthalmology, Boston

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Short temporal asynchrony disrupts visual object recognition · Short temporal asynchrony disrupts visual object recognition Jedediah M. Singer $ Department of Ophthalmology, Boston

Short temporal asynchrony disrupts visual object recognition

Jedediah M Singer $Department of Ophthalmology Boston Childrenrsquos Hospital

Harvard Medical School Boston MA USA

Gabriel Kreiman $

Department of Ophthalmology Boston Childrenrsquos HospitalHarvard Medical School Boston MA USA

Center for Brain Science Harvard UniversityCambridge MA USA

Swartz Center for Theoretical NeuroscienceHarvard University Cambridge MA USA

Humans can recognize objects and scenes in a smallfraction of a second The cascade of signals underlyingrapid recognition might be disrupted by temporallyjittering different parts of complex objects Here weinvestigated the time course over which shapeinformation can be integrated to allow for recognition ofcomplex objects We presented fragments of objectimages in an asynchronous fashion and behaviorallyevaluated categorization performance We observed thatvisual recognition was significantly disrupted byasynchronies of approximately 30 ms suggesting thatspatiotemporal integration begins to break down witheven small deviations from simultaneity Howevermoderate temporal asynchrony did not completelyobliterate recognition in fact integration of visual shapeinformation persisted even with an asynchrony of 100ms We describe the data with a concise model based onthe dynamic reduction of uncertainty about what imagewas presented These results emphasize the importanceof timing in visual processing and provide strongconstraints for the development of dynamical models ofvisual shape recognition

Introduction

Humans and other primates can recognize complexobjects and scenes in a glimpse Psychophysics datasuggest rapid processing of visual information withinapproximately 150 ms of stimulus presentation(Kirchner amp Thorpe 2006 Potter amp Levy 1969) Theability to recognize complex shapes is instantiated bythe cascade of processes along the ventral visual stream(Connor Brincat amp Pasupathy 2007 Logothetis ampSheinberg 1996 Serre et al 2007 Tanaka 1996)Consistent with the behavioral measures single neuron

recordings in macaque inferior temporal cortex (HungKreiman Poggio amp DiCarlo 2005 Richmond Opti-can amp Spitzer 1990 Rolls 1991) electroencephalo-graphic signals from the human scalp (Johnson ampOlshausen 2003 Thorpe Fize amp Marlot 1996) andintracranial field potentials from the human occipitaland inferior temporal cortex (Liu Agam Madsen ampKreiman 2009) have revealed image-specific responsesas early as 100ndash150 ms after stimulus onset While theseneurophysiological and behavioral observations sug-gest that information can quickly propagate throughthe visual system it is not clear what proportion ofhuman performance this initial wave of activity canaccount for recognition in the natural world mayrequire significant temporal integration

Given that there are at minimum approximatelyeight synapses between photoreceptors in the retina andhigh-level visual neurons in the inferior temporalcortex several investigators have argued that to a firstapproximation rapid recognition can be reasonablydescribed by a mostly bottom-up hierarchy of trans-formations along the ventral visual stream (Deco ampRolls 2004 DiCarlo Zoccolan amp Rust 2012 Fu-kushima 1980 Riesenhuber amp Poggio 2000 Rolls1991 Serre et al 2007 vanRullen amp Thorpe 2002)These so-called feed-forward architectures represent amajor simplification of the complex organization ofneocortex which includes ubiquitous top-down signalsand horizontal connections in addition to bottom-upsynapses (Douglas amp Martin 2004 Felleman amp VanEssen 1991) Independently of the relative contribu-tions of bottom-up and top-down signals the shortlatencies observed in the human and macaque inferiortemporal cortex impose a maximum limit of 10 to 20ms on the amount of processing that can take place at

Citation Singer J M amp Kreiman G (2014) Short temporal asynchrony disrupts visual object recognition Journal of Vision14(5)7 1ndash14 httpwwwjournalofvisionorgcontent1457 doi1011671457

Journal of Vision (2014) 14(5)7 1ndash14 1httpwwwjournalofvisionorgcontent1457

doi 10 1167 14 5 7 ISSN 1534-7362 2014 ARVOReceived November 15 2013 published May 12 2014

each stage before the first bits of information are passedon to the next stage

Despite the rapid progression of this initial wave ofinformation response durations can extend from a fewtens of ms in V1 (Ringach Hawken amp Shapley 2003)to approximately 70 ms in MT (Bair amp Movshon 2004)to 100 ms or more in inferior temporal cortex (DeBaene Premereur amp Vogels 2007) Spatiotemporalintegration is critical for the definition of receptivefields in early visual areas as well as for motiondetection signals along the dorsal stream The contri-butions of spatiotemporal integration along the ventralstream in regions involved in high-level shape recogni-tion such as inferior temporal cortex are less clearlyunderstood The accumulation of visual shape infor-mation over time may improve recognition perfor-mance beyond what could be achieved by independentprocessing of sequential lsquolsquosnapshotsrsquorsquo

Psychophysics studies have shown that subjects canhold a representation of basic visual information suchas letters (Sperling 1960) or flashing arrays of squaresor dots (Brockmole Wang amp Irwin 2002 Hogben ampDi Lollo 1974) in lsquolsquoiconic memoryrsquorsquo for a short timeafter presentation The recognition of a whole objectcan be facilitated when presented in close temporalcontiguity and spatial register with one of its parts(Sanocki 2001) demonstrating that varying the dy-namics with which object information is presented caninfluence perception Prior studies of temporal inte-gration in terms of interference between parts ofdifferent faces (Anaki Boyd amp Moscovitch 2007Cheung Richler Phillips amp Gauthier 2011 Singer ampSheinberg 2006) emotion recognition (Schyns Petroamp Smith 2007) and extraction of low-level shapefeatures (Aspell Wattam-Bell amp Braddick 2006Clifford Holcombe amp Pearson 2004) have foundrelevant time scales in the range of several tens to a fewhundred milliseconds Longer integration windows upto several seconds have been reported in studies ofmotion discrimination (Burr amp Santoro 2001) andbiological motion discrimination (Neri Morrone ampBurr 1998)

There is thus a range of temporal integrationwindows that could play a role in recognition ofcomplex objects Observations of rapid processingsuggest that even small disruptions to simultaneitymight carry large consequences while observations oflong temporal receptive fields and integration orpersistence of low-level visual stimuli suggest thatobject recognition might be robust to temporalasynchrony Here we used psychophysics experimentsin which images were broken into asynchronouslypresented parts (Figure 1) to evaluate the time courseover which asynchronous information can be integrat-ed together and lead to the recognition of complexobjects

Methods

All procedures were carried out with subjectsrsquoinformed consent and in accordance with protocolsapproved by the Boston Childrenrsquos Hospital Institu-tional Review Board

Apparatus

Stimuli were presented on a Sony Multiscan G52021-in cathode-ray tube monitor (Sony CorporationTokyo Japan) running at a 170 Hz refresh rate and872 middot 654 pixel resolution The experiment was run onan Apple MacBook Pro computer (Apple ComputerCupertino CA) running MATLAB software (Math-Works Natick MA) with the Psychophysics Toolboxand Eyelink Toolbox extensions (Brainard 1997Cornelissen Peters amp Palmer 2002 Pelli 1997) For37 of the 50 subjects across all experiments weobtained reliable measurements of subjectsrsquo eye move-ments using the EyeLink 1000 system (using infraredcorneal reflection and pupil location with nine-pointcalibration) running in remote mode (SR ResearchMississauga Ontario) Subjects were seated in a dimlylit windowless room approximately 53 cm from the eyetracking camera and approximately 71 cm from thedisplay monitor Subjects performed a four-alternativeforced choice task (described below) and indicated theirresponses using a Logitech Cordless RumblePad 2game controller (Logitech Fremont CA) Trials inwhich the requested times for stimulus onset weremissed by more than one screen refresh (59 ms) werediscarded In separate tests we used a photodiode toindependently verify the accuracy of the imagepresentation times

Stimulus presentation Experiment 1

Sixteen subjects participated in Experiment 1Images were drawn randomly with replacement from alibrary of 428 grayscale silhouettes of animals peopleplants and vehicles with each category equallyrepresented These images were obtained from severalfreely accessible sources on the Internet resized tooccupy most of a 256 middot 256 pixel square (whichsometimes involved clipping of object edges) super-imposed on a noise background and adjusted usingPhotoshop (Adobe San Jose CA) to have flat intensityhistograms The power spectrum of each image wasthen calculated using a Fast Fourier Transform (FFT)along with the average power spectrum across allimages The final set of images was generated by foreach image taking the inverse FFT of the population

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 2

average power spectrum with that imagersquos respectiveunique phase spectrum This resulted in a set of imageswith consistent power at each spatial frequency andbackgrounds filled with cloud-like noise Exampleimages are shown in Figure 1B (the line drawings onthe second row were only used in Experiment 2) all theimages used in these experiments can be obtained fromhttpklabtchharvardeduresourcessinger_asynchronyhtml

Each of 16 subjects (13 female age 19ndash35 mean age25) completed 50 blocks of 46 trials each The sequenceof events in each trial is illustrated in Figure 1A Theimage selected for each trial was drawn with equalprobability from one of the four categories Each trialbegan with a 14-pixel (058) fixation cross presented for

500 ms In those subjects for which eye tracking wasavailable (14 of 16 subjects) we required 500 ms offixation within 38 of the center of the cross Followingfixation a 256 middot 256 pixel (938 middot 938) square offlickering (170 Hz) noise appeared in the center of thescreen This noise was generated by randomizing thephase of the mean power spectrum of the library ofimages The purpose of the noise was to eliminate thepossible effects of apparent motion between twosuccessive fragments (eg Cavanagh Holcombe ampChou 2008) We considered a 4 middot 4 square tessellationof each image edge alpha masks for these squares wereblurred with a Gaussian blur of radius 4 pixels usingPhotoshop We defined a lsquolsquofragmentrsquorsquo as 3 out of the 16squares (not necessarily spatially contiguous) randomly

Figure 1 Schematic of the organization of the experiments A After 500 ms fixation between one and four fragments each consisting

of 316 of a source image (upper left) were briefly shown embedded in a stream of flickering 170 Hz phase-scrambled visual noise

Fragments lasted 118 ms and were presented with a stimulus onset asynchrony (SOA constant within a given trial) that took 1 of 10

possible values from 0 ms (synchronous presentation) to 2941 ms (Methods) The source image the number of fragments and the

SOA value were randomly chosen in each trial Noise continued for 500 ms after the onset of the final fragment after which subjects

were presented with a four-alternative forced choice categorization task B Examples of other images used in the experiments

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 3

chosen in each trial Depending on the configuration ofthe three squares each fragment spanned a minimumof 478 and a maximum of 938 Spatial integration of allof the information in four fragments required inte-grating over 938 Each fragment was present for 118ms (two monitor refreshes) Trials containing onethree or four fragments were randomly interleavedWhen more than one fragment was used in a trial theycontained nonoverlapping parts of the source imageFragments were sequentially presented with the firstfragment appearing 500 ms after the noise beganSuccessive fragments were presented with a stimulusonset asynchrony (SOA) selected at random from thefollowing list 0 ms (synchronous presentation) 118 ms(one fragmentrsquos offset coincided with the subsequentfragmentrsquos onset) 177 ms 294 ms 471 ms 706 ms100 ms 1471 ms 2059 ms 2941 ms The SOA valuewas constant within a trial and the different SOAs wererandomly interleaved across trials (Movie 1) We alsoincluded randomly interleaved catch trials in which 1516 of an image was shown for 588 ms The catch trialsserved to ensure that subjects were engaged in the taskand understood the instructions subjects who scoredbelow 80 on catch trials or below 13 overall wereexcluded (criteria established a priori all subjectspassed in Experiment 1)

The flickering noise continued while the fragmentswere displayed and for 500 ms after the last fragmentappeared at which point it was replaced by a blankgray screen containing the text lsquolsquoPlease indicate whichkind of picture was hidden in the cloudsrsquorsquo and fouroptions were shown corresponding to the four game-pad buttons used for lsquolsquoAnimalsrsquorsquo lsquolsquoPeoplersquorsquo lsquolsquoVehiclesrsquorsquoand lsquolsquoPlantsrsquorsquo This choice screen remained until thesubject pressed a button Performance is reported asmean 6 SEM throughout the manuscript A high-contrast mask (one of the noise images selected atrandom with its contrast maximized) then appearedfor one monitor refresh followed by the fixation crossfor the next trial Trials in which the subject lookedaway from the stimulus or blinked were excluded fromanalyses No feedback was given except betweenblocks when the overall performance for that blockwas displayed

Stimulus presentation Experiment 2 variant A

Experiment 2A addressed two questions First canthe uncertainty reduction model accurately describe thesimplest possible asynchronous trials containing onlytwo fragments Second while the performance curvesfrom Experiment 1 appeared to be near asymptote by2941 ms SOA do the results hold at very long SOAImages were drawn from a library of 364 line drawingsand silhouettes which partially overlapped with that

used in Experiment 1 Aside from the different imageset and the additional conditions the experimentparameters were identical to those in Experiment 1Twenty-one new subjects (13 female age 19ndash51 meanage 27) participated in this experiment We excludedfrom analyses one subject due to low performance oncatch trials ( 08 correct) and three subjects due to lowoverall performance ( 13 correct) this exclusion didnot significantly change the overall results In thisvariant we had eye tracking available for 10 of the 17subjects whose performance exceeded threshold (theeye tracker was unavailable for five subjects andcalibration was unreliable for two subjects) Catchtrials were included in six subjects There were 36blocks of 36 or 37 trials depending on whether catchtrials were included

Stimulus presentation Experiment 2 variant B

In Experiment 1 longer SOAs and more fragmentsboth led to longer trials These differences might haveinfluenced the observed results (eg forgetting infor-mation over the course of a second or more may lead tolower performance in longer trials) To control for thispossibility we fixed the total trial duration in a secondvariant experiment Moreover during the approxi-mately 2-hr duration of each run of Experiment 1 each116 of each image was presented an average of 34times To ensure that this repetition did not lead to rotememorization or long-term learning effects this secondvariant also included a larger image library so that

Figure 2 Mean performance across all 16 subjects in

Experiment 1 in trials with zero SOA Error bars indicate

standard errors of the mean (SEM) All presentations lasted for

118 ms except the catch trials which lasted 588 ms The

dashed line indicates chance performance

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 4

each 116 of each image was presented to each subjectan average of 05 times There were 36 blocks of 34trials SOA values were drawn from the same list as inExperiment 1 Two-fragment trials were also presentedbut not included in the reported analyses to make theresults more directly comparable to those in Experi-ment 1 Adding the data from the two-fragment trialsdid not change the conclusions One of the two keydifferences in this variant is that the stream of flickeringnoise in each trial lasted for 18824 ms (for all SOAvalues and number of fragments)mdashlong enough toaccommodate a trial with four fragments at the longestasynchrony used (2941 ms) followed by 500 ms ofnoise Trials with shorter asynchronies or fewerfragments concluded with a longer period of uninfor-mative noise In this way the time interval between trialonset and behavioral response was identical in all trialsThe second key difference in this variant is that imageswere drawn at random from a library of 1200 linedrawings and silhouettes a superset of those used inExperiments 1 and 2A This means that each 116 ofeach image had an approximately 50 chance of beingpresented to a given subject Eighteen new subjects (13female age 18ndash35 mean age 24) participated inExperiment 2B one subject was excluded from analysesdue to low performance on catch trials ( 08 correct)We had reliable eye tracking in 13 of the 17 subjectswhose performance exceeded threshold

Data analyses

Diagnostic image parts

Even though we took several precautions to makeimages homogeneous in terms of basic low-levelproperties (see above) some images were easier tocategorize than others and parts of some images weremore informative than others If part of an image isreliably diagnostic subjects may be able to perform thetask based only on that part regardless of any otherinformation presented This would confound the studyof temporal integration there might be no contributionfrom the other fragments presented and thus SOAwould be irrelevant To mitigate this effect weinvestigated how informative each part of each imagewas Each square j (j frac14 1 16 for each image) wasassigned a lsquolsquodiagnosticityrsquorsquo score DSj defined as

DSj frac14

Xt

cethtTHORN sethtTHORN pethtTHORNaethtTHORN

Xt

sethtTHORN pethtTHORNaethtTHORN

eth1THORN

Here t ranges over all trials for all subjects andacross all experiments in which square j was present Ifthe category reported in a given trial t was correct c(t)

is 1 otherwise c(t) is 0 The overall performance of thesubject who performed trial t is p(t) and that subjectrsquosoverall performance on trials with the same number offragments and the same asynchrony as trial t is given bya(t) Finally s(t) is the number of 116 squares of theimage present overall in trial t (that is three times thenumber of fragments in trial t) This score indicateshow reliably each particular 116 square led to correctclassification across all subjects We excluded trials inwhich any of the 116 squares shown had a diagnos-ticity score greater than 08 This excluded an averageof 337 6 03 of trials from Experiment 1 and172 6 03 and 231 6 03 from Experiments2A and 2B respectively Including these trials does notchange the conclusions reported here The uncertaintyreduction modelrsquos window of integration for individualsubjects (see below) when including all trials changedby only 13 6 3 Including trials containingdiagnostic squares does however raise performancedisproportionally at longer asynchronies due to thegreater number of trials in which a single fragment issufficient for categorization In decreasing the dynamicrange of performance the variances of the model fitsare increased

Probability summation

Assuming independent responses to each fragmentone may predict performance in trials with multiplefragments from the performance observed in single-fragment trials y1 We assume that the subject actuallyknew the correct image category in some fraction i ofthe trials and guessed at chance in the rest so that y1frac14i thorn (1 ndash i)4 (y1 frac14 14 if i frac14 0 and y1 frac14 1 if i frac14 1) Thechance that the subject fails to discern the categoryfrom a trial with n fragments is (1 i)n and the finalperformance predicted by probability summation isthen

1 eth1 iTHORNn thorn eth1 iTHORNn

4frac14 1 3eth1 iTHORNn

4 eth2THORN

Note that even for small values of i this expressioncan be appreciably higher than chance (025) Forexample a single fragment performance of y1frac14 030 isobtained from i frac14 00667 and this leads to an overallperformance of 043 when nfrac14 4 fragments We cannotassume independence however because multiple frag-ments from a single image contain correlated infor-mation and so this independent probability summationprediction overestimates expected performance Ob-serving the opposite pattern ie exceeding theperformance predicted by independent probabilitysummation is thus a particularly strong indicator ofsynergistic spatiotemporal integration of informationfrom multiple fragments

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 5

Results

We evaluated the sensitivity to temporal imprecisionduring visual object recognition by quantifying cate-gorization of asynchronously presented object frag-ments (Methods Figure 1)

Experiment 1

Overall performance in the task was above chance(chancefrac14 025) for all subjects numbers of fragmentsand SOA values Averaged across all SOA values andnumbers of fragments performance ranged from 037to 077 (060 6 003 mean 6 SEM) Performanceincreased as the number of fragments shown increased(Figure 2) All 16 subjectsrsquo performance was better forthe four-fragment condition than the three-fragmentcondition significantly so for 13 subjects (chi-squared

test p 005 compare triangles versus squares inFigure 3) Catch trial performance was essentially atceiling (097 6 001) Single-fragment performance wasslightly above chance levels (033 6 001) Overallperformance varied slightly but significantly by cate-gory (chi-squared test p 108 animals 059 people060 plants 054 vehicles 057) In sum all subjectswere able to correctly perform the task in spite of therapid presentations and fragmented objects

If subjects performed the task by independentlyevaluating each fragment the performance in single-fragment trials would be predictive of overall perfor-mance in trials with multiple fragments via independentprobability summation (Methods) Different fragmentsfrom a given image contain redundant informationwhich would bias an independent probability summa-tion prediction towards higher values Yet performanceat all SOAs was higher than that predicted byprobability summation dropping at the longest SOA tovalues consistent with probability summation (at SOA

Figure 3 Individual subjectsrsquo performance in Experiment 1 Error bars indicate Clopper-Pearson 95 confidence intervals The x axis

indicates SOA and the y axis indicates the average fraction correct performance for the three-fragment condition (triangles) four-

fragment condition (squares) and single fragment condition () Chance performance is 025 and the performance in catch trials was

097 6 001 The dashed lines show the fits from the uncertainty reduction model (Text) The mean value of the integration window

indicated by the uncertainty reduction mode is indicated by a vertical dotted line

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 6

frac14 2941 ms 046 6 003 observed versus 044 6 004predicted and 050 6 003 observed versus 049 6 004predicted for three- and four-fragment conditionsrespectively Figure 4) Hence independent probabilitysummation is not a good model to describe overallperformance in this task That performance at all butthe longest asynchronies exceeded the independentprobability summation prediction despite the redun-dancy of the fragments within each trial suggests visualintegration which we quantify below

Categorization performance showed a strong de-crease with increasing values of SOA both for three-fragment trials and four-fragment trials (Figures 3ndash4)Performance declined with increasing temporal asyn-chrony for all 16 subjects (Figure 3) This change inperformance was significant for 11 and 14 subjects inthree-fragment and four-fragment trials respectively(chi-squared test p 005) The performance decre-ment between simultaneous presentations and asyn-chronous trials first reached significance at an SOA of294 ms (sign test pfrac14 005) with a difference of 23 6

14 (Figure 4) While a slight improvement inperformance was apparent in the mean values at 118ms SOA relative to simultaneous presentations it wasnot statistically significant (sign test p frac14 060) Theseobservations show that temporal imprecision of even 30ms disrupts recognition performance

To quantify performance (y) as a function of SOA(denoted A below) we started by performing a least-squares fit of an exponential curve to the data fromeach subject for each number of fragments

yethATHORN frac14 Bthorn R 2A=s eth3THORNThe three parameters were the time constant s the

baseline value Bfrac14 y(lsquo) and the range in performanceRfrac14 y(0) y(lsquo) where y(0) represents the performanceat 0 ms SOA and y(lsquo) denotes the asymptoticperformance at infinite SOA We adjusted these threeparameters (B R s) to minimize the squared errorbetween y(A) and the observed data y(A) Theexponential curves provided a reasonable fit to the data(root-mean-squared error RMSE of 0058 6 0004and 0045 6 0003 for three-fragment and four-fragment data respectively) As a measure of thewindow of integration we considered s the SOA atwhich the exponential curve achieved its half heightThe mean value of s across 16 subjects was 120 6 24ms and 104 6 19 ms in three-fragment and four-fragment trials respectively Thus shape integration atspatial scales of at least approximately 58 remainsevident over a time scale on the order of 100 ms

The exponential fits described the shapes of each ofthe response curves and experimental parametersseparately We sought a unifying model that couldcapture all the different experimental conditions andcould be realized by the firing rates of neurons Asdiscussed above probability summation failed inassuming that multiple fragments contribute to per-ception independently To include the interdependenceamong different fragments and to account for theincreasing number of possible two-fragment interac-tions as fragment count increases we developed amodel based on the notion that pairs of fragmentsinteract to reduce uncertainty about the image as anexponential function of their SOA In this model wedescribed a subjectrsquos observed performance in terms ofan underlying uncertainty u which is reduced byinteractions between presented fragments As abovewe assume that the subject actually knew the correctimage category in some fraction i of the trials andguessed at chance in the rest so that y(A)frac14 ithorn [(1 ndash i)4] The empirical uncertainty is then 1ndashi the chancethat the subject did not know the image category Weassume that a second fragment reduces the underlyinguncertainty u by some factor f(A) that depends on theasynchrony A with which it is presented relative to thefirst A third fragment reduces uncertainty by anotherfactor of f(A) and also by a factor of f(2A) because it ispresented A ms after the second fragment and 2A msafter the first fragment We assume independence sothe total uncertainty with three fragments is u f2(A)f(2A) Similarly the total uncertainty in a four-fragment trial is u f3(A) f2(2A) f(3A) If we denotethe total uncertainty with a particular number offragments as ursquo the predicted performance is then 1u0 thorn (u04) Note that u is conceptually related toperformance in single fragment trials but is not directlymathematically relatable The value of u also incorpo-

Figure 4 Summary of results from Experiment 1 Performance is

averaged across nfrac14 16 subjects (individual performance shown

in Figure 3) Conventions are the same as in Figure 3 horizontal

dashed lines indicate the performance level predicted by

probability summation based on single-fragment performance

Error bars show SEM across subjects

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 7

rates the fact that different fragments from a singleimage are not independent but that this dependency isfactored out in the consideration of pairs of fragmentsvia f(A)

For each subject we used their empirical perfor-mance to calculate the empirical uncertainty 1ndashi at eachSOA and each number of fragments We then foundthe least-squares best fits for the uncertainty reductionfunction f(A) and the underlying uncertainty u that wasmodified by f(A) We considered for f(A) exponentialfunctions of the form f(A)frac14 B0 R02As0 Here B0 frac14f(lsquo) is the asymptote which determines what might becalled lsquolsquocognitiversquorsquo or lsquolsquomemory-basedrsquorsquo integration theability to combine information from multiple frag-ments at an arbitrarily long SOA R0frac14 f(lsquo) f(0) is therange of uncertainty reduction spanned by f whichdescribes the benefit of presenting visual informationsimultaneously rather than with very large SOAFinally s0 is the time constant which describes theSOA at the uncertainty reduction functionrsquos half-height Given the exponential nature of this modelthere is some small amount of integration that persistsat arbitrarily long asynchronies decreasing asymptot-ically towards zero Along with the underlying uncer-tainty u this makes four total parameters to describe asubjectrsquos performance across all SOAs

The uncertainty reduction model fit the data quitewell root-mean-square errors (RMSE) for the model(0061 6 0004 and 0046 6 0002 for three-fragmentand four-fragment trials respectively) were not signif-icantly higher than the corresponding RMSE values forexponential fits to the raw data discussed above (00586 0004 and 0045 6 0003 respectively Wilcoxonrank sum test comparing the two models pfrac14 042 and061 respectively) even though the exponential fits tothe raw data have two more free parameters Giventhose extra parameters the corrected Akaike Infor-mation Criterion (Akaike 1974 Burnham amp Anderson2002) is lower for the uncertainty reduction model in 15out of 16 subjects on average the likelihood of theuncertainty reduction model is 374 6 051 times higherthan that of the exponential fits

We described the window of integration with theuncertainty reduction model exponentialrsquos half heightparameter s0 When we fit the model to the data fromindividual subjects we obtained a mean s0of 97 6 9 ms(Figure 3) In Figure 4 we also made use of the sameframework considering all subjects within the experi-ment together Rather than fitting to each subjectrsquosperformance separately we fit one uncertainty reduc-tion model to all subjectsrsquo data The values of the bestfit parameters for the uncertainty reduction model wereufrac14 043 6 002 B0frac14 099 6 001 R0frac14 015 6 002 ands0 frac14 109 ms R2 values between the data and thepredictions for the population uncertainty reductionmodel were 092 for three-fragment trials and 097 for

four-fragment trials This model succinctly and accu-rately described a subjectrsquos performance across allconditions based solely on the reduction of uncertaintyas a function of SOA The model used fewerparameters to explain a subjectrsquos performance than theset of exponential fits it could generalize acrossdifferent numbers of fragments and it suggested anunderlying psychological interpretation

We further considered a model based on signaldetection theory which assumes a probabilistic frame-work to describe each subjectrsquos performance in eachtrial Previous analyses of psychophysics data (Swets1961) have shown that high-threshold models like theuncertainty reduction model just described sometimesdo not capture subjectsrsquo behavior as well as modelsbased on signal detection theory (SDT) We thereforeconstructed a SDT model to test whether it would givedifferent results from the uncertainty reduction modelWe used a Markov chain Monte Carlo (MCMC)method for calculating d0 (a SDT measure of sensitivityto stimulus identity) with bias in four-alternative forcedchoice tasks (DeCarlo 2012) Models were fit usingJAGS (httpsourceforgenetprojectsmcmc-jags)with 10000 iterations of burn-in followed by 20000iterations to approximate d0 distributions Havingcalculated d0 for each subject each asynchrony andeach number of fragments we proceeded to fit a four-parameter model to each subjectrsquos data Let g be ascaled base-two exponential function with parameter s 00

and scale factor R 00 g(A)frac14R 002As 00 Then g describesthe increase in sensitivity brought about by having twofragments with an asynchrony of A The thirdparameter is v 00 the d0 value in single-fragment trials(constrained to lie within the 95 confidence intervalestimated from the MCMC fit) Finally we assumedthat adding additional fragments to a trial might notaid sensitivity in a perfectly efficient fashion (becausefragments contain correlated information) and wedescribed this inefficiency with a multiplicative factorc 00 that progressively reduced the contributions ofsuccessive fragments The model predicts d0 for nfragments with asynchrony A as follows

d0 frac14Xnifrac141

v 00 c 00i1 thorn ethi 1THORN g

A ethn ithorn 1THORN

eth4THORNWe fit the four parameters of this model using a

least-squares procedure and compared the time con-stant values (s 00) obtained with the SDT model againstthose obtained from the uncertainty reduction model(s0) The values of the best fits for the SDT model wereR 00 frac14 021 6 002 v 00 frac14 042 6 006 c 00 frac14 047 6 009and a cross-subject mean s 00 of 101 6 8 ms Given thatthe SDT results were similar to those of the uncertaintyreduction model and the higher computational re-

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 8

quirements for the SDT model we used only theuncertainty reduction model for the remainder of ouranalyses

As noted above performance differed slightlybetween categories To evaluate whether such inho-mogeneity contributed to the observed integration timeconstants we repeated the analyses after matchingperformance across categories by random subsamplingUnder these performance-matched conditions themean s0 was 95 6 9 ms and the population s0 was 105ms We also analyzed each category and each subjectseparately after matching performance The mean s0 forthe four categories were 125 6 23 ms 144 6 33 ms 896 21 ms and 131 6 26 ms respectively for animalspeople plants and vehicles Note that these fits werenoisier due to having fewer data While subjects coulduse different features to recognize images belonging todifferent categories the results were similar acrosscategories

Experiment 2

We extended the first experiment in two variantsaimed at evaluating two-fragment images and very longSOAs (Experiment 2A) and constant trial lengths andno fragment repetition (Experiment 2B) These exper-iment variants are described under Methods

In Experiment 2A the window of integration s0 was101 6 14 ms when two-fragment trials were included inthe fit and 91 6 13 ms when they were not included(Figure 5A) The other model parameters were ufrac14 0336 002 B0 frac14 0990 6 0003 and R0 frac14 0069 6 0006These values were not significantly different from eachother or from the results in Experiment 1 (Wilcoxonrank sum test pfrac14 055 comparing these two fits pfrac14099 and 068 respectively comparing Experiment 1 tothese results with and without two-fragment trials)Furthermore the model fit using only three- and four-fragment trials was a good fit for the two-fragment data(RMSE of 0091 6 0006 using only three- and four-fragment trials versus 0081 6 0004 using all trialsWilcoxon rank sum test pfrac14 020) This demonstratesthat the uncertainty reduction model was able togeneralize to fragment counts that were not used to fitthe model While performance at 2941 ms was slightlybetter than performance at 7059 ms (differing by 346 17) this difference was not significant (sign test pfrac14 016) RMSE and s0 values were unaffected byincluding the data from 7059 ms trials (Wilcoxon ranksum tests pfrac14 097 and 092 respectively) R2 valueswere 030 075 and 078 for two- three- and four-fragment trials respectively The two-fragment R2

value was low because the plot of performance againstasynchrony was almost flat The results of thisexperimental variant suggest that performance at long

SOAs in Experiment 1 was close to asymptote and thatthe uncertainty reduction model can extrapolate totwo-fragment trials

The overall performance in Experiment 2B (as wellas in the previous variant) was lower than inExperiment 1 (cf Figure 5 vs Figure 4) due to theinclusion of more difficult line drawings in the imagesets (Figure 5B) Considering only the trials thatincluded the same types of images across experiments(silhouette images) raised performance to levels com-parable to the ones in Experiment 1 without signif-icantly changing s0 R2 values for the uncertaintyreduction model were 064 and 081 for three- and four-fragment trials respectively The integration windowfrom the uncertainty reduction model in Experiment 2B

Figure 5 Mean performance across all subjects in Experiment 2

A Variant A (n frac14 17 subjects) which added a two-fragment

condition (circles) and data at 7059 ms SOA B Variant B (n frac1417 subjects) which controlled for influence of trial duration and

stimulus repetition (see Results and Methods) Conventions are

otherwise as in Figure 4

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 9

(s0 frac14 123 6 20 ms) was indistinguishable from thatobtained in Experiment 1 (Wilcoxon rank sum test pfrac14079) The other model parameters were ufrac14 030 6002 B0 frac14 0990 6 0003 and R0 frac14 0064 6 0009 Thesimilar s0 values suggest that the results of Experiment1 cannot be ascribed to differences in trial duration orslight familiarity with the images shown

Discussion

We measured the performance of 50 subjects indifferent variants of an experiment to evaluate theeffect of presenting object fragments with varyingdegrees of asynchrony By applying this externallyinduced temporal jitter we characterized the impor-tance of temporal precision for visual object recogni-tion

Subjects might in principle have made educatedguesses based on the content of individual fragmentsor combined such guesses independently across multi-ple fragments Instead we found that performance wassignificantly higher than predicted by independentprobability summation indicating that informationfrom different fragments was synergistically integratedRecognition performance decreased with increasingSOA and was well characterized by a simple modelbased on the reduction of uncertainty as a function ofthe temporal proximity of the presented image parts Inprinciple such a model could be realized by neuronswhose firing rate increases with the arrival of each newburst of information and then relaxes more slowlytowards baseline Neurons showing such firing rateincreases upon transient stimulus onset and decaydynamics over hundreds of milliseconds have beendescribed in recordings in the lateral intraparietalcortex during a motion discrimination task (Huk ampShadlen 2005)

While we focused on the uncertainty reductionmodel we also fit the data from Experiment 1 with amodel based on signal detection theory These twomodels reflect different paradigms of psychologicalprocessing In the former each additional piece ofinformation (provided by each successive fragment)multiplicatively reduces the subjectrsquos uncertainty aboutwhat was shown increasing the probability of giving acorrect answer In the latter each additional fragmentadds to the distance between the actual category andthe other categories in an implicit psychological spaceagain increasing the probability of discriminating thetrue category from the others There are importantdifferences between these paradigms (Swets 1961)though either could be realized via the integrativeneurons hypothesized above The uncertainty reductionmodel might be implemented by a set of downstream

neurons that act as thresholds on the evidenceaccumulated by the integrative neurons the firstdownstream neuron to cross threshold could report theidentity of the viewed image Support for one oranother image would manifest as increased firing ratesin integrative neurons feeding into the associateddownstream neurons increasing its chance of being thefirst to cross threshold The signal detection theorymight instead reflect a set of downstream neuronswhose activity reflected which integrative neuron wasmost active in a winner-take-all fashion The integra-tive neurons would then essentially be counting votesUltimately given that the two models yielded indis-tinguishable estimates of the window of temporalintegration it is difficult to draw any conclusions aboutwhich paradigm is more likely to be correct based onthe current data

The results demonstrated that asynchronies of a fewtens of milliseconds disrupted categorization perfor-mance This disruption is consistent with a body ofneurophysiological observations and computationalmodels of object recognition that are based on a rapidcascade of processing events along the ventral visualstream (DiCarlo et al 2012 Fukushima 1980 Hung etal 2005 Johnson amp Olshausen 2003 Kirchner ampThorpe 2006 Liu et al 2009 Potter amp Levy 1969Richmond et al 1990 Riesenhuber amp Poggio 2000Rolls 1991 Serre et al 2007 Thorpe et al 1996vanRullen amp Thorpe 2002) The response latenciesacross the ventral stream consistently increase by about10 to 20 ms at each stage from the retina to thethalamus to a cascade of cortical areas from primaryvisual cortex to inferior temporal cortex (Maunsell1987 Schmolesky et al 1998) The rapidity with whichvisual shape information progresses through the ventralvisual cortex and is converted into behavioral orperceptual output is consistent with the currentobservation that object recognition begins to sufferimpairment with even minor disruptions of stimulustiming

In addition to the impairment in shape recognitionthrough spatial integration due to temporal asynchro-ny timing judgments can be influenced by spatial cuesSpatial grouping cues that encourage the bindingtogether of separate parts can interfere with simulta-neity judgments at a time scale similar to that reportedhere (Cheadle et al 2008) Spatial judgments areinfluenced by temporal constraints and temporaljudgments are influenced by spatial constraints

While temporal asynchrony disrupted recognitionwe observed that performance was well above asymp-totic levels even at SOAs of approximately 100 msIntegration at these SOAs cannot be ascribed to trialduration eye movements remembering the stimulifrom previous trials or forgetting the fragments shownwithin a trial (Figure 5) Performance values at SOAs

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 10

of approximately 100 ms are consistent with theextended durations of responses in inferior temporalcortex (De Baene et al 2007) These extended responsedurations might instantiate a buffer allowing theintegration of visual shape information over time Theability to integrate visual information over brief spanscould underlie many critical visual functions Forexample percepts of camouflaged or partially occludedobjects could be assembled over time as the objectsmoved relative to their background or occluders(Nishida 2004) and information could be integratedacross saccades (Irwin Yantis amp Jonides 1983Jonides Irwin amp Yantis 1982 OrsquoRegan amp Levy-Schoen 1983)

The observed integration of form information over 100 ms is unlikely to reflect working memoryprocesses as the temporal scales involved in workingmemory span several seconds (Vogel Woodman ampLuck 2006) There is however sufficient time betweenstimulus presentation and the response period that the(integrated) image or category information could beheld in working memory The window of temporalintegration reported here has been observed in otherdomains and with other experimental paradigms(Caudek Domini amp Di Luca 2002 Forget Buiatti ampDehaene 2009 Nishida 2004) In particular perfor-mance that exceeds probability summation at shortSOAs is consistent with the temporal scales involved iniconic memory (Coltheart 1980 Di Lollo 1977Eriksen amp Collins 1968 Hogben amp Di Lollo 1974Sperling 1960) Prior work has shown that informationabout arrays of letters (Sperling 1960) or flashingsquares (Hogben amp Di Lollo 1974) remains accessiblefor a short time after presentation It is possible that theintegration we observed at short SOAs in the objectrecognition system arises from mechanisms similar tothose underlying the previously reported persistence ofsimple visual lsquolsquoiconsrsquorsquo though iconic memory has beenshown to be disrupted by masks (Di Lollo Clark ampHogben 1988) Visual persistence as instantiated bythe dynamics of retinal cells is generally terminated bya mask it is more likely that we are measuring neuralor informational persistence at a higher level of thevisual system (Coltheart 1980) While the dynamicnoise in the stimulus sequence could partly mask visualpersistence it certainly does not completely obliterateit Hence the integration constants reported hereconstitute an upper bound and it is conceivable thatvisual information could decay even more rapidly thanreported here

Two additional phenomena share similar character-istics with the dynamic rate of decay reported hereIntegration masking in which irrelevant informationpresented immediately before or after the presentationof a target inhibits perception of that target by addingnoise to it exhibits a similar time scale to that described

here and may reflect a similar process (Enns amp DiLollo 2000) Priming (in which a stimulus facilitatesthe perception or naming of a later stimulus) might alsoplay a part in these observations Priming would likelyincrease performance at longer SOA positive effects ofpriming typically are small or nonexistent at SOA closeto zero and increase with increasing SOA (La HeijDirkx amp Kramer 1990 Scharlau amp Neumann 2003)

A series of elegant studies has shown that detect-ability and perception can be influenced by the exacttime at which a stimulus is presented with respect toongoing endogenous oscillations (Rohenkohl amp Nobre2011 Van Dijk Schoffelen Oostenveld amp Jensen2008) In the context of the task presented here it isconceivable that the initial dynamic noise could resetsuch endogenous oscillations and that certain presen-tation times and SOAs could lead to enhancedrecognition by virtue of their specific phase The limitednumber and uneven sampling of SOAs precludes asystematic investigation of these effects here but it willbe interesting in future studies to examine thephysiological signals underlying recognition of asyn-chronously presented objects

Several studies have shown interactions betweenasynchronous parts of faces (Anaki et al 2007 Cheunget al 2011 Singer amp Sheinberg 2006) those findingsare better described in terms of higher level phenomenathan object recognition Complex images have alsobeen used in a study of subjective simultaneity (Loftusamp Hanna 1989) in which subjects were asked to ratehow complete or integrated images appeared to bewhen divided into two parts presented with varyingdurations and asynchronies Here we expand upon thisbody of previous work by showing that complex shapeinformation can be combined across both space andtime to enable object recognition Our results wereconsistent across a battery of controls designed toaccount for eye movements stimulus familiarity andmemory on longer time scales

Conclusions

Integration across image fragments begins to breakdown when as little as 30 ms separates the fragmentsand decays with a time constant of approximately 100ms The current characterization of the time coursewith which visual object recognition breaks down as afunction of temporal asynchrony provides informationcritical to building computer vision systems thatoperate beyond static snapshots to modeling visualobject recognition to designing experiments that probethe performance over time of the human visual systemand to the construction of theories of perception in adynamic world

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 11

Keywords object recognition temporal integrationfragmented images temporal sensitivity visual dynamics

Acknowledgments

The authors thank Neal Dach and Joanna Li forassistance with stimulus preparation and data collec-tion and John Maunsell Thomas Miconi and HanlinTang for their insightful comments on the manuscriptThis work was supported by NSF grants 0954570 andCCF-1231216 (GK)

Commercial relationships noneCorresponding author Jedediah Miller SingerEmail jedediahsingerchildrensharvardeduAddress Department of Ophthalmology BostonChildrenrsquos Hospital Harvard Medical School BostonMA USA

References

Akaike H A (1974) A new look at the statisticalmodel identification IEEE Transactions on Auto-matic Control 19 716ndash723

Anaki D Boyd J amp Moscovitch M (2007)Temporal integration in face perception Evidenceof configural processing of temporally separatedface parts Journal of Experimental PsychologyHuman Perception and Performance 33(1) 1ndash19doi 2007-01135-001 [pii] 1010370096-15233311

Aspell J E Wattam-Bell J amp Braddick O (2006)Interaction of spatial and temporal integration inglobal form processing Vision Research 46(18)2834ndash2841 doi S0042-6989(06)00118-0 [pii] 101016jvisres200602018

Bair W amp Movshon J A (2004) Adaptive temporalintegration of motion in direction-selective neuronsin macaque visual cortex Journal of Neuroscience24(33) 7305ndash7323 doi101523JNEUROSCI0554-042004 24339305 [pii]

Brainard D H (1997) The Psychophysics ToolboxSpatial Vision 10 433ndash436

Brockmole J R Wang R F amp Irwin D E (2002)Temporal integration between visual images andvisual percepts Journal of Experimental Psycholo-gy Human Perception and Performance 28(2) 315ndash334

Burnham K P amp Anderson D (2002) Modelselection and multi-model inference (2nd ed) NewYork Springer

Burr D C amp Santoro L (2001) Temporal integra-tion of optic flow measured by contrast andcoherence thresholds Vision Research 41(15)1891ndash1899 doi S0042-6989(01)00072-4 [pii]

Caudek C Domini F amp Di Luca M (2002) Short-term temporal recruitment in structure frommotion Vision Research 42 10

Cavanagh P Holcombe A O amp Chou W (2008)Mobile computation Spatiotemporal integration ofthe properties of objects in motion Journal ofVision 8(12)1 1ndash23 httpwwwjournalofvisionorgcontent8121 doi1011678121 [PubMed][Article]

Cheadle S Bauer F Parton A Muller H BonnehY S amp Usher M (2008) Spatial structure affectstemporal judgments Evidence for a synchronybinding code Journal of Vision 8(7)12 1ndash12httpjournalofvisionorgcontent8712 doi1011678712 [PubMed] [Article]

Cheung O S Richler J J Phillips W S ampGauthier I (2011) Does temporal integration offace parts reflect holistic processing PsychonomicBulletin and Review 18(3) 476ndash483 doi103758s13423-011-0051-7

Clifford C W Holcombe A O amp Pearson J (2004)Rapid global form binding with loss of associatedcolors Journal of Vision 4(12)8 1090ndash1101 httpwwwjournalofvisionorgcontent4128 doi1011674128 [PubMed] [Article]

Coltheart M (1980) Iconic memory and visiblepersistence Attention Perception amp Psychophysics27(3) 183ndash228 doi103758bf03204258

Connor C E Brincat S L amp Pasupathy A (2007)Transformation of shape information in the ventralpathway Current Opinion in Neurobiology 17(2)140ndash147

Cornelissen F W Peters E M amp Palmer J (2002)The Eyelink Toolbox Eye tracking with MATLABand the Psychophysics Toolbox Behavioral Re-search Methods Instruments and Computers 34 4

De Baene W Premereur E amp Vogels R (2007)Properties of shape tuning of macaque inferiortemporal neurons examined using rapid serialvisual presentation Journal of Neurophysiology97(4) 2900ndash2916 doi 007412006 [pii] 101152jn007412006

DeCarlo L T (2012) On a signal detection approachto m-alternative forced choice with bias withmaximum likelihood and bayesian approaches toestimation Journal of Mathematical Psychology56(3) 196ndash207

Deco G amp Rolls E T (2004) Computational

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 12

neuroscience of vision Oxford UK Oxford Uni-versity Press

Di Lollo V (1977) Temporal characteristics of iconicmemory Nature 267(5608) 241ndash243

Di Lollo V Clark C D amp Hogben J H (1988)Separating visible persistence from retinal afterim-ages Perception and Psychophysics 44(4) 363ndash368

DiCarlo J J Zoccolan D amp Rust N C (2012)How does the brain solve visual object recognitionNeuron 73(3) 415ndash434 doi S0896-6273(12)00092-X [pii] 101016jneuron201201010

Douglas R J amp Martin K A (2004) Neuronalcircuits of the neocortex Annual Review of Neuro-science 27 419ndash451

Enns J T amp Di Lollo V (2000) Whatrsquos new in visualmasking Trends in Cognitive Sciences 4(9) 345ndash352

Eriksen C W amp Collins J F (1968) Sensory tracesversus the psychological moment in the temporalorganization of form Journal of ExperimentalPsychology 77(3) 376ndash382

Felleman D J amp Van Essen D C (1991) Distributedhierarchical processing in the primate cerebralcortex Cerebral Cortex 1 1ndash47

Forget J Buiatti M amp Dehaene S (2009) Temporalintegration in visual word recognition Journal ofCognitive Neuroscience 22(5) 14

Fukushima K (1980) Neocognitron A self organizingneural network model for a mechanism of patternrecognition unaffected by shift in position Biolog-ical Cybernetics 36(4) 193ndash202

Hogben J H amp Di Lollo V (1974) Perceptualintegration and perceptual segregation of briefvisual stimuli Vision Research 14(11) 1059ndash1069

Huk A C amp Shadlen M N (2005) Neural activity inmacaque parietal cortex reflects temporal integra-tion of visual motion signals during perceptualdecision making The Journal of Neuroscience25(45) 16

Hung C P Kreiman G Poggio T amp DiCarlo J J(2005) Fast read-out of object identity frommacaque inferior temporal cortex Science 310863ndash866

Irwin D E Yantis S amp Jonides J (1983) Evidenceagainst visual integration across saccadic eyemovements Perception and Psychophysics 34(1)49ndash57

Johnson J S amp Olshausen B A (2003) Timecourseof neural signatures of object recognition Journalof Vision 3(7)4 499ndash512 httpwwwjournalofvisionorgcontent374 doi101167374 [PubMed] [Article]

Jonides J Irwin D E amp Yantis S (1982)Integrating visual information from successivefixations Science 215(4529) 192ndash194

Kirchner H amp Thorpe S J (2006) Ultra-rapid objectdetection with saccadic eye movements Visualprocessing speed revisited Vision Research 46(11)1762ndash1776 doi S0042-6989(05)00511-0 [pii] 101016jvisres200510002

La Heij W Dirkx J amp Kramer P (1990)Categorical interference and associative priming inpicture naming British Journal of Psychology81(4) 511ndash525

Liu H Agam Y Madsen J R amp Kreiman G(2009) Timing timing timing Fast decoding ofobject information from intracranial field poten-tials in human visual cortex Neuron 62(2) 281ndash290

Loftus G R amp Hanna A M (1989) The phenom-enology of spatial integration Data and modelsCognitive Psychology 21(3) 363ndash397 doi0010-0285(89)90013-3 [pii]

Logothetis N K amp Sheinberg D L (1996) Visualobject recognition Annual Review of Neuroscience19 577ndash621

Maunsell J H R (1987) Physiological evidence fortwo visual subsystems In L Vaina (Ed) Mattersof intelligence (pp 59ndash87) Dordrecht The Neth-erlands Reidel Press

Neri P Morrone M C amp Burr D C (1998) Seeingbiological motion Nature 395(6705) 894ndash896 doi10103827661

Nishida S (2004) Motion-based analysis of spatialpatterns by the human visual system CurrentBiology 14 830ndash839

OrsquoRegan J K amp Levy-Schoen A (1983) Integratingvisual information from successive fixations Doestrans-saccadic fusion exist Vision Research 23(8)765ndash768

Pasupathy A amp Connor C E (1999) Responses tocontour features in macaque area V4 Journal ofNeurophysiology 82(5) 2490ndash2502

Pelli D G (1997) The VideoToolbox software forvisual psychophysics Transforming numbers intomovies Spatial Vision 10 437ndash442

Potter M C amp Levy E (1969) Recognition memoryfor a rapid sequence of pictures Journal ofExperimental Psychology 81(1) 10ndash15

Richmond B J Optican L M amp Spitzer H (1990)Temporal encoding of two-dimensional patterns bysingle units in primate primary visual cortex IStimulus-response relations Journal of Neurophys-iology 64(2) 351ndash369

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 13

Riesenhuber M amp Poggio T (2000) Models of objectrecognition Nature Neuroscience 3(Suppl) 1199ndash1204

Ringach D L Hawken M J amp Shapley R (2003)Dynamics of orientation tuning in macaque V1The role of global and tuned suppression Journalof Neurophysiology 90(1) 342ndash352 doi101152jn010182002 010182002 [pii]

Rohenkohl G amp Nobre A C (2011) Alphaoscillations related to anticipatory attention followtemporal expectations The Journal of Neurosci-ence 31(40) 14076ndash14084

Rolls E T (1991) Neural organization of highervisual functions Current Opinion in Neurobiology1 274ndash278

Sanocki T (2001) Interaction of scale and time duringobject identification Journal of Experimental Psy-chology Human Perception and Performance 27(2)290ndash302

Scharlau I amp Neumann O (2003) Temporalparameters and time course of perceptual latencypriming Acta Psychologica 113(2) 185ndash203

Schmolesky M T Wang Y C Hanes D PThompson K G Leutgeb S Schall J D Leventhal A G (1998) Signal timing across themacaque visual system Journal of Neurophysiology79(6) 3272ndash3278

Schyns P G Petro L S amp Smith M L (2007)Dynamics of visual information integration in thebrain for categorizing facial expressions CurrentBiology 17(18) 1580ndash1585 doi S0960-9822(07)01904-5 [pii] 101016jcub200708048

Serre T Kreiman G Kouh M Cadieu CKnoblich U amp Poggio T (2007) A quantitativetheory of immediate visual recognition Progress inBrain Research 165C 33ndash56

Singer J M amp Sheinberg D L (2006) Holisticprocessing unites face parts across time VisionResearch 46(11) 1838ndash1847 doi S0042-6989(05)00563-8 [pii] 101016jvisres200511005

Sperling G (1960) The information available in briefvisual presentations Washington DC AmericanPsychological Association

Swets J A (1961) Is there a sensory thresholdScience 134(3473) 168ndash177

Tanaka K (1996) Inferotemporal cortex and objectvision Annual Review of Neuroscience 19 109ndash139

Thorpe S Fize D amp Marlot C (1996) Speed ofprocessing in the human visual system Nature 381520ndash522

Van Dijk H Schoffelen J M Oostenveld R ampJensen O (2008) Prestimulus oscillatory activity inthe alpha band predicts visual discriminationability The Journal of Neuroscience 28(8) 1816ndash1823

vanRullen R amp Thorpe S J (2002) Surfing a spikewave down the ventral stream Vision Research 422593ndash2615

Vogel E K Woodman G F amp Luck S J (2006)The time course of consolidation in visual workingmemory Journal of Experimental PsychologyHuman Perception and Performance 32(6) 1436ndash1451 doi1010370096-15233261436

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 14

  • Introduction
  • Methods
  • f01
  • f02
  • e01
  • e02
  • Results
  • f03
  • e03
  • f04
  • e04
  • f05
  • Discussion
  • Conclusions
  • Akaike1
  • Anaki1
  • Aspell1
  • Bair1
  • Brainard1
  • Brockmole1
  • Burnham1
  • Burr1
  • Caudek1
  • Cavanagh1
  • Cheadle1
  • Cheung1
  • Clifford1
  • Coltheart1
  • Connor1
  • Cornelissen1
  • DeBaene1
  • DeCarlo1
  • Deco1
  • DiLollo1
  • DiLollo2
  • DiCarlo2
  • Douglas1
  • Enns1
  • Eriksen1
  • Felleman1
  • Forget1
  • Fukushima1
  • Hogben1
  • Huk1
  • Hung1
  • Irwin1
  • Johnson1
  • Jonides1
  • Kirchner1
  • LaHeij1
  • Liu1
  • Loftus1
  • Logothetis1
  • Maunsell1
  • Neri1
  • Nishida1
  • ORegan1
  • Pasupathy1
  • Pelli1
  • Potter1
  • Richmond1
  • Riesenhuber1
  • Ringach1
  • Rohenkohl1
  • Rolls1
  • Sanocki1
  • Scharlau1
  • Schmolesky1
  • Schyns1
  • Serre1
  • Singer1
  • Sperling1
  • Swets1
  • Tanaka1
  • Thorpe1
  • VanDijk1
  • vanRullen1
  • Vogel1
Page 2: Short temporal asynchrony disrupts visual object recognition · Short temporal asynchrony disrupts visual object recognition Jedediah M. Singer $ Department of Ophthalmology, Boston

each stage before the first bits of information are passedon to the next stage

Despite the rapid progression of this initial wave ofinformation response durations can extend from a fewtens of ms in V1 (Ringach Hawken amp Shapley 2003)to approximately 70 ms in MT (Bair amp Movshon 2004)to 100 ms or more in inferior temporal cortex (DeBaene Premereur amp Vogels 2007) Spatiotemporalintegration is critical for the definition of receptivefields in early visual areas as well as for motiondetection signals along the dorsal stream The contri-butions of spatiotemporal integration along the ventralstream in regions involved in high-level shape recogni-tion such as inferior temporal cortex are less clearlyunderstood The accumulation of visual shape infor-mation over time may improve recognition perfor-mance beyond what could be achieved by independentprocessing of sequential lsquolsquosnapshotsrsquorsquo

Psychophysics studies have shown that subjects canhold a representation of basic visual information suchas letters (Sperling 1960) or flashing arrays of squaresor dots (Brockmole Wang amp Irwin 2002 Hogben ampDi Lollo 1974) in lsquolsquoiconic memoryrsquorsquo for a short timeafter presentation The recognition of a whole objectcan be facilitated when presented in close temporalcontiguity and spatial register with one of its parts(Sanocki 2001) demonstrating that varying the dy-namics with which object information is presented caninfluence perception Prior studies of temporal inte-gration in terms of interference between parts ofdifferent faces (Anaki Boyd amp Moscovitch 2007Cheung Richler Phillips amp Gauthier 2011 Singer ampSheinberg 2006) emotion recognition (Schyns Petroamp Smith 2007) and extraction of low-level shapefeatures (Aspell Wattam-Bell amp Braddick 2006Clifford Holcombe amp Pearson 2004) have foundrelevant time scales in the range of several tens to a fewhundred milliseconds Longer integration windows upto several seconds have been reported in studies ofmotion discrimination (Burr amp Santoro 2001) andbiological motion discrimination (Neri Morrone ampBurr 1998)

There is thus a range of temporal integrationwindows that could play a role in recognition ofcomplex objects Observations of rapid processingsuggest that even small disruptions to simultaneitymight carry large consequences while observations oflong temporal receptive fields and integration orpersistence of low-level visual stimuli suggest thatobject recognition might be robust to temporalasynchrony Here we used psychophysics experimentsin which images were broken into asynchronouslypresented parts (Figure 1) to evaluate the time courseover which asynchronous information can be integrat-ed together and lead to the recognition of complexobjects

Methods

All procedures were carried out with subjectsrsquoinformed consent and in accordance with protocolsapproved by the Boston Childrenrsquos Hospital Institu-tional Review Board

Apparatus

Stimuli were presented on a Sony Multiscan G52021-in cathode-ray tube monitor (Sony CorporationTokyo Japan) running at a 170 Hz refresh rate and872 middot 654 pixel resolution The experiment was run onan Apple MacBook Pro computer (Apple ComputerCupertino CA) running MATLAB software (Math-Works Natick MA) with the Psychophysics Toolboxand Eyelink Toolbox extensions (Brainard 1997Cornelissen Peters amp Palmer 2002 Pelli 1997) For37 of the 50 subjects across all experiments weobtained reliable measurements of subjectsrsquo eye move-ments using the EyeLink 1000 system (using infraredcorneal reflection and pupil location with nine-pointcalibration) running in remote mode (SR ResearchMississauga Ontario) Subjects were seated in a dimlylit windowless room approximately 53 cm from the eyetracking camera and approximately 71 cm from thedisplay monitor Subjects performed a four-alternativeforced choice task (described below) and indicated theirresponses using a Logitech Cordless RumblePad 2game controller (Logitech Fremont CA) Trials inwhich the requested times for stimulus onset weremissed by more than one screen refresh (59 ms) werediscarded In separate tests we used a photodiode toindependently verify the accuracy of the imagepresentation times

Stimulus presentation Experiment 1

Sixteen subjects participated in Experiment 1Images were drawn randomly with replacement from alibrary of 428 grayscale silhouettes of animals peopleplants and vehicles with each category equallyrepresented These images were obtained from severalfreely accessible sources on the Internet resized tooccupy most of a 256 middot 256 pixel square (whichsometimes involved clipping of object edges) super-imposed on a noise background and adjusted usingPhotoshop (Adobe San Jose CA) to have flat intensityhistograms The power spectrum of each image wasthen calculated using a Fast Fourier Transform (FFT)along with the average power spectrum across allimages The final set of images was generated by foreach image taking the inverse FFT of the population

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 2

average power spectrum with that imagersquos respectiveunique phase spectrum This resulted in a set of imageswith consistent power at each spatial frequency andbackgrounds filled with cloud-like noise Exampleimages are shown in Figure 1B (the line drawings onthe second row were only used in Experiment 2) all theimages used in these experiments can be obtained fromhttpklabtchharvardeduresourcessinger_asynchronyhtml

Each of 16 subjects (13 female age 19ndash35 mean age25) completed 50 blocks of 46 trials each The sequenceof events in each trial is illustrated in Figure 1A Theimage selected for each trial was drawn with equalprobability from one of the four categories Each trialbegan with a 14-pixel (058) fixation cross presented for

500 ms In those subjects for which eye tracking wasavailable (14 of 16 subjects) we required 500 ms offixation within 38 of the center of the cross Followingfixation a 256 middot 256 pixel (938 middot 938) square offlickering (170 Hz) noise appeared in the center of thescreen This noise was generated by randomizing thephase of the mean power spectrum of the library ofimages The purpose of the noise was to eliminate thepossible effects of apparent motion between twosuccessive fragments (eg Cavanagh Holcombe ampChou 2008) We considered a 4 middot 4 square tessellationof each image edge alpha masks for these squares wereblurred with a Gaussian blur of radius 4 pixels usingPhotoshop We defined a lsquolsquofragmentrsquorsquo as 3 out of the 16squares (not necessarily spatially contiguous) randomly

Figure 1 Schematic of the organization of the experiments A After 500 ms fixation between one and four fragments each consisting

of 316 of a source image (upper left) were briefly shown embedded in a stream of flickering 170 Hz phase-scrambled visual noise

Fragments lasted 118 ms and were presented with a stimulus onset asynchrony (SOA constant within a given trial) that took 1 of 10

possible values from 0 ms (synchronous presentation) to 2941 ms (Methods) The source image the number of fragments and the

SOA value were randomly chosen in each trial Noise continued for 500 ms after the onset of the final fragment after which subjects

were presented with a four-alternative forced choice categorization task B Examples of other images used in the experiments

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 3

chosen in each trial Depending on the configuration ofthe three squares each fragment spanned a minimumof 478 and a maximum of 938 Spatial integration of allof the information in four fragments required inte-grating over 938 Each fragment was present for 118ms (two monitor refreshes) Trials containing onethree or four fragments were randomly interleavedWhen more than one fragment was used in a trial theycontained nonoverlapping parts of the source imageFragments were sequentially presented with the firstfragment appearing 500 ms after the noise beganSuccessive fragments were presented with a stimulusonset asynchrony (SOA) selected at random from thefollowing list 0 ms (synchronous presentation) 118 ms(one fragmentrsquos offset coincided with the subsequentfragmentrsquos onset) 177 ms 294 ms 471 ms 706 ms100 ms 1471 ms 2059 ms 2941 ms The SOA valuewas constant within a trial and the different SOAs wererandomly interleaved across trials (Movie 1) We alsoincluded randomly interleaved catch trials in which 1516 of an image was shown for 588 ms The catch trialsserved to ensure that subjects were engaged in the taskand understood the instructions subjects who scoredbelow 80 on catch trials or below 13 overall wereexcluded (criteria established a priori all subjectspassed in Experiment 1)

The flickering noise continued while the fragmentswere displayed and for 500 ms after the last fragmentappeared at which point it was replaced by a blankgray screen containing the text lsquolsquoPlease indicate whichkind of picture was hidden in the cloudsrsquorsquo and fouroptions were shown corresponding to the four game-pad buttons used for lsquolsquoAnimalsrsquorsquo lsquolsquoPeoplersquorsquo lsquolsquoVehiclesrsquorsquoand lsquolsquoPlantsrsquorsquo This choice screen remained until thesubject pressed a button Performance is reported asmean 6 SEM throughout the manuscript A high-contrast mask (one of the noise images selected atrandom with its contrast maximized) then appearedfor one monitor refresh followed by the fixation crossfor the next trial Trials in which the subject lookedaway from the stimulus or blinked were excluded fromanalyses No feedback was given except betweenblocks when the overall performance for that blockwas displayed

Stimulus presentation Experiment 2 variant A

Experiment 2A addressed two questions First canthe uncertainty reduction model accurately describe thesimplest possible asynchronous trials containing onlytwo fragments Second while the performance curvesfrom Experiment 1 appeared to be near asymptote by2941 ms SOA do the results hold at very long SOAImages were drawn from a library of 364 line drawingsand silhouettes which partially overlapped with that

used in Experiment 1 Aside from the different imageset and the additional conditions the experimentparameters were identical to those in Experiment 1Twenty-one new subjects (13 female age 19ndash51 meanage 27) participated in this experiment We excludedfrom analyses one subject due to low performance oncatch trials ( 08 correct) and three subjects due to lowoverall performance ( 13 correct) this exclusion didnot significantly change the overall results In thisvariant we had eye tracking available for 10 of the 17subjects whose performance exceeded threshold (theeye tracker was unavailable for five subjects andcalibration was unreliable for two subjects) Catchtrials were included in six subjects There were 36blocks of 36 or 37 trials depending on whether catchtrials were included

Stimulus presentation Experiment 2 variant B

In Experiment 1 longer SOAs and more fragmentsboth led to longer trials These differences might haveinfluenced the observed results (eg forgetting infor-mation over the course of a second or more may lead tolower performance in longer trials) To control for thispossibility we fixed the total trial duration in a secondvariant experiment Moreover during the approxi-mately 2-hr duration of each run of Experiment 1 each116 of each image was presented an average of 34times To ensure that this repetition did not lead to rotememorization or long-term learning effects this secondvariant also included a larger image library so that

Figure 2 Mean performance across all 16 subjects in

Experiment 1 in trials with zero SOA Error bars indicate

standard errors of the mean (SEM) All presentations lasted for

118 ms except the catch trials which lasted 588 ms The

dashed line indicates chance performance

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 4

each 116 of each image was presented to each subjectan average of 05 times There were 36 blocks of 34trials SOA values were drawn from the same list as inExperiment 1 Two-fragment trials were also presentedbut not included in the reported analyses to make theresults more directly comparable to those in Experi-ment 1 Adding the data from the two-fragment trialsdid not change the conclusions One of the two keydifferences in this variant is that the stream of flickeringnoise in each trial lasted for 18824 ms (for all SOAvalues and number of fragments)mdashlong enough toaccommodate a trial with four fragments at the longestasynchrony used (2941 ms) followed by 500 ms ofnoise Trials with shorter asynchronies or fewerfragments concluded with a longer period of uninfor-mative noise In this way the time interval between trialonset and behavioral response was identical in all trialsThe second key difference in this variant is that imageswere drawn at random from a library of 1200 linedrawings and silhouettes a superset of those used inExperiments 1 and 2A This means that each 116 ofeach image had an approximately 50 chance of beingpresented to a given subject Eighteen new subjects (13female age 18ndash35 mean age 24) participated inExperiment 2B one subject was excluded from analysesdue to low performance on catch trials ( 08 correct)We had reliable eye tracking in 13 of the 17 subjectswhose performance exceeded threshold

Data analyses

Diagnostic image parts

Even though we took several precautions to makeimages homogeneous in terms of basic low-levelproperties (see above) some images were easier tocategorize than others and parts of some images weremore informative than others If part of an image isreliably diagnostic subjects may be able to perform thetask based only on that part regardless of any otherinformation presented This would confound the studyof temporal integration there might be no contributionfrom the other fragments presented and thus SOAwould be irrelevant To mitigate this effect weinvestigated how informative each part of each imagewas Each square j (j frac14 1 16 for each image) wasassigned a lsquolsquodiagnosticityrsquorsquo score DSj defined as

DSj frac14

Xt

cethtTHORN sethtTHORN pethtTHORNaethtTHORN

Xt

sethtTHORN pethtTHORNaethtTHORN

eth1THORN

Here t ranges over all trials for all subjects andacross all experiments in which square j was present Ifthe category reported in a given trial t was correct c(t)

is 1 otherwise c(t) is 0 The overall performance of thesubject who performed trial t is p(t) and that subjectrsquosoverall performance on trials with the same number offragments and the same asynchrony as trial t is given bya(t) Finally s(t) is the number of 116 squares of theimage present overall in trial t (that is three times thenumber of fragments in trial t) This score indicateshow reliably each particular 116 square led to correctclassification across all subjects We excluded trials inwhich any of the 116 squares shown had a diagnos-ticity score greater than 08 This excluded an averageof 337 6 03 of trials from Experiment 1 and172 6 03 and 231 6 03 from Experiments2A and 2B respectively Including these trials does notchange the conclusions reported here The uncertaintyreduction modelrsquos window of integration for individualsubjects (see below) when including all trials changedby only 13 6 3 Including trials containingdiagnostic squares does however raise performancedisproportionally at longer asynchronies due to thegreater number of trials in which a single fragment issufficient for categorization In decreasing the dynamicrange of performance the variances of the model fitsare increased

Probability summation

Assuming independent responses to each fragmentone may predict performance in trials with multiplefragments from the performance observed in single-fragment trials y1 We assume that the subject actuallyknew the correct image category in some fraction i ofthe trials and guessed at chance in the rest so that y1frac14i thorn (1 ndash i)4 (y1 frac14 14 if i frac14 0 and y1 frac14 1 if i frac14 1) Thechance that the subject fails to discern the categoryfrom a trial with n fragments is (1 i)n and the finalperformance predicted by probability summation isthen

1 eth1 iTHORNn thorn eth1 iTHORNn

4frac14 1 3eth1 iTHORNn

4 eth2THORN

Note that even for small values of i this expressioncan be appreciably higher than chance (025) Forexample a single fragment performance of y1frac14 030 isobtained from i frac14 00667 and this leads to an overallperformance of 043 when nfrac14 4 fragments We cannotassume independence however because multiple frag-ments from a single image contain correlated infor-mation and so this independent probability summationprediction overestimates expected performance Ob-serving the opposite pattern ie exceeding theperformance predicted by independent probabilitysummation is thus a particularly strong indicator ofsynergistic spatiotemporal integration of informationfrom multiple fragments

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 5

Results

We evaluated the sensitivity to temporal imprecisionduring visual object recognition by quantifying cate-gorization of asynchronously presented object frag-ments (Methods Figure 1)

Experiment 1

Overall performance in the task was above chance(chancefrac14 025) for all subjects numbers of fragmentsand SOA values Averaged across all SOA values andnumbers of fragments performance ranged from 037to 077 (060 6 003 mean 6 SEM) Performanceincreased as the number of fragments shown increased(Figure 2) All 16 subjectsrsquo performance was better forthe four-fragment condition than the three-fragmentcondition significantly so for 13 subjects (chi-squared

test p 005 compare triangles versus squares inFigure 3) Catch trial performance was essentially atceiling (097 6 001) Single-fragment performance wasslightly above chance levels (033 6 001) Overallperformance varied slightly but significantly by cate-gory (chi-squared test p 108 animals 059 people060 plants 054 vehicles 057) In sum all subjectswere able to correctly perform the task in spite of therapid presentations and fragmented objects

If subjects performed the task by independentlyevaluating each fragment the performance in single-fragment trials would be predictive of overall perfor-mance in trials with multiple fragments via independentprobability summation (Methods) Different fragmentsfrom a given image contain redundant informationwhich would bias an independent probability summa-tion prediction towards higher values Yet performanceat all SOAs was higher than that predicted byprobability summation dropping at the longest SOA tovalues consistent with probability summation (at SOA

Figure 3 Individual subjectsrsquo performance in Experiment 1 Error bars indicate Clopper-Pearson 95 confidence intervals The x axis

indicates SOA and the y axis indicates the average fraction correct performance for the three-fragment condition (triangles) four-

fragment condition (squares) and single fragment condition () Chance performance is 025 and the performance in catch trials was

097 6 001 The dashed lines show the fits from the uncertainty reduction model (Text) The mean value of the integration window

indicated by the uncertainty reduction mode is indicated by a vertical dotted line

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 6

frac14 2941 ms 046 6 003 observed versus 044 6 004predicted and 050 6 003 observed versus 049 6 004predicted for three- and four-fragment conditionsrespectively Figure 4) Hence independent probabilitysummation is not a good model to describe overallperformance in this task That performance at all butthe longest asynchronies exceeded the independentprobability summation prediction despite the redun-dancy of the fragments within each trial suggests visualintegration which we quantify below

Categorization performance showed a strong de-crease with increasing values of SOA both for three-fragment trials and four-fragment trials (Figures 3ndash4)Performance declined with increasing temporal asyn-chrony for all 16 subjects (Figure 3) This change inperformance was significant for 11 and 14 subjects inthree-fragment and four-fragment trials respectively(chi-squared test p 005) The performance decre-ment between simultaneous presentations and asyn-chronous trials first reached significance at an SOA of294 ms (sign test pfrac14 005) with a difference of 23 6

14 (Figure 4) While a slight improvement inperformance was apparent in the mean values at 118ms SOA relative to simultaneous presentations it wasnot statistically significant (sign test p frac14 060) Theseobservations show that temporal imprecision of even 30ms disrupts recognition performance

To quantify performance (y) as a function of SOA(denoted A below) we started by performing a least-squares fit of an exponential curve to the data fromeach subject for each number of fragments

yethATHORN frac14 Bthorn R 2A=s eth3THORNThe three parameters were the time constant s the

baseline value Bfrac14 y(lsquo) and the range in performanceRfrac14 y(0) y(lsquo) where y(0) represents the performanceat 0 ms SOA and y(lsquo) denotes the asymptoticperformance at infinite SOA We adjusted these threeparameters (B R s) to minimize the squared errorbetween y(A) and the observed data y(A) Theexponential curves provided a reasonable fit to the data(root-mean-squared error RMSE of 0058 6 0004and 0045 6 0003 for three-fragment and four-fragment data respectively) As a measure of thewindow of integration we considered s the SOA atwhich the exponential curve achieved its half heightThe mean value of s across 16 subjects was 120 6 24ms and 104 6 19 ms in three-fragment and four-fragment trials respectively Thus shape integration atspatial scales of at least approximately 58 remainsevident over a time scale on the order of 100 ms

The exponential fits described the shapes of each ofthe response curves and experimental parametersseparately We sought a unifying model that couldcapture all the different experimental conditions andcould be realized by the firing rates of neurons Asdiscussed above probability summation failed inassuming that multiple fragments contribute to per-ception independently To include the interdependenceamong different fragments and to account for theincreasing number of possible two-fragment interac-tions as fragment count increases we developed amodel based on the notion that pairs of fragmentsinteract to reduce uncertainty about the image as anexponential function of their SOA In this model wedescribed a subjectrsquos observed performance in terms ofan underlying uncertainty u which is reduced byinteractions between presented fragments As abovewe assume that the subject actually knew the correctimage category in some fraction i of the trials andguessed at chance in the rest so that y(A)frac14 ithorn [(1 ndash i)4] The empirical uncertainty is then 1ndashi the chancethat the subject did not know the image category Weassume that a second fragment reduces the underlyinguncertainty u by some factor f(A) that depends on theasynchrony A with which it is presented relative to thefirst A third fragment reduces uncertainty by anotherfactor of f(A) and also by a factor of f(2A) because it ispresented A ms after the second fragment and 2A msafter the first fragment We assume independence sothe total uncertainty with three fragments is u f2(A)f(2A) Similarly the total uncertainty in a four-fragment trial is u f3(A) f2(2A) f(3A) If we denotethe total uncertainty with a particular number offragments as ursquo the predicted performance is then 1u0 thorn (u04) Note that u is conceptually related toperformance in single fragment trials but is not directlymathematically relatable The value of u also incorpo-

Figure 4 Summary of results from Experiment 1 Performance is

averaged across nfrac14 16 subjects (individual performance shown

in Figure 3) Conventions are the same as in Figure 3 horizontal

dashed lines indicate the performance level predicted by

probability summation based on single-fragment performance

Error bars show SEM across subjects

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 7

rates the fact that different fragments from a singleimage are not independent but that this dependency isfactored out in the consideration of pairs of fragmentsvia f(A)

For each subject we used their empirical perfor-mance to calculate the empirical uncertainty 1ndashi at eachSOA and each number of fragments We then foundthe least-squares best fits for the uncertainty reductionfunction f(A) and the underlying uncertainty u that wasmodified by f(A) We considered for f(A) exponentialfunctions of the form f(A)frac14 B0 R02As0 Here B0 frac14f(lsquo) is the asymptote which determines what might becalled lsquolsquocognitiversquorsquo or lsquolsquomemory-basedrsquorsquo integration theability to combine information from multiple frag-ments at an arbitrarily long SOA R0frac14 f(lsquo) f(0) is therange of uncertainty reduction spanned by f whichdescribes the benefit of presenting visual informationsimultaneously rather than with very large SOAFinally s0 is the time constant which describes theSOA at the uncertainty reduction functionrsquos half-height Given the exponential nature of this modelthere is some small amount of integration that persistsat arbitrarily long asynchronies decreasing asymptot-ically towards zero Along with the underlying uncer-tainty u this makes four total parameters to describe asubjectrsquos performance across all SOAs

The uncertainty reduction model fit the data quitewell root-mean-square errors (RMSE) for the model(0061 6 0004 and 0046 6 0002 for three-fragmentand four-fragment trials respectively) were not signif-icantly higher than the corresponding RMSE values forexponential fits to the raw data discussed above (00586 0004 and 0045 6 0003 respectively Wilcoxonrank sum test comparing the two models pfrac14 042 and061 respectively) even though the exponential fits tothe raw data have two more free parameters Giventhose extra parameters the corrected Akaike Infor-mation Criterion (Akaike 1974 Burnham amp Anderson2002) is lower for the uncertainty reduction model in 15out of 16 subjects on average the likelihood of theuncertainty reduction model is 374 6 051 times higherthan that of the exponential fits

We described the window of integration with theuncertainty reduction model exponentialrsquos half heightparameter s0 When we fit the model to the data fromindividual subjects we obtained a mean s0of 97 6 9 ms(Figure 3) In Figure 4 we also made use of the sameframework considering all subjects within the experi-ment together Rather than fitting to each subjectrsquosperformance separately we fit one uncertainty reduc-tion model to all subjectsrsquo data The values of the bestfit parameters for the uncertainty reduction model wereufrac14 043 6 002 B0frac14 099 6 001 R0frac14 015 6 002 ands0 frac14 109 ms R2 values between the data and thepredictions for the population uncertainty reductionmodel were 092 for three-fragment trials and 097 for

four-fragment trials This model succinctly and accu-rately described a subjectrsquos performance across allconditions based solely on the reduction of uncertaintyas a function of SOA The model used fewerparameters to explain a subjectrsquos performance than theset of exponential fits it could generalize acrossdifferent numbers of fragments and it suggested anunderlying psychological interpretation

We further considered a model based on signaldetection theory which assumes a probabilistic frame-work to describe each subjectrsquos performance in eachtrial Previous analyses of psychophysics data (Swets1961) have shown that high-threshold models like theuncertainty reduction model just described sometimesdo not capture subjectsrsquo behavior as well as modelsbased on signal detection theory (SDT) We thereforeconstructed a SDT model to test whether it would givedifferent results from the uncertainty reduction modelWe used a Markov chain Monte Carlo (MCMC)method for calculating d0 (a SDT measure of sensitivityto stimulus identity) with bias in four-alternative forcedchoice tasks (DeCarlo 2012) Models were fit usingJAGS (httpsourceforgenetprojectsmcmc-jags)with 10000 iterations of burn-in followed by 20000iterations to approximate d0 distributions Havingcalculated d0 for each subject each asynchrony andeach number of fragments we proceeded to fit a four-parameter model to each subjectrsquos data Let g be ascaled base-two exponential function with parameter s 00

and scale factor R 00 g(A)frac14R 002As 00 Then g describesthe increase in sensitivity brought about by having twofragments with an asynchrony of A The thirdparameter is v 00 the d0 value in single-fragment trials(constrained to lie within the 95 confidence intervalestimated from the MCMC fit) Finally we assumedthat adding additional fragments to a trial might notaid sensitivity in a perfectly efficient fashion (becausefragments contain correlated information) and wedescribed this inefficiency with a multiplicative factorc 00 that progressively reduced the contributions ofsuccessive fragments The model predicts d0 for nfragments with asynchrony A as follows

d0 frac14Xnifrac141

v 00 c 00i1 thorn ethi 1THORN g

A ethn ithorn 1THORN

eth4THORNWe fit the four parameters of this model using a

least-squares procedure and compared the time con-stant values (s 00) obtained with the SDT model againstthose obtained from the uncertainty reduction model(s0) The values of the best fits for the SDT model wereR 00 frac14 021 6 002 v 00 frac14 042 6 006 c 00 frac14 047 6 009and a cross-subject mean s 00 of 101 6 8 ms Given thatthe SDT results were similar to those of the uncertaintyreduction model and the higher computational re-

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 8

quirements for the SDT model we used only theuncertainty reduction model for the remainder of ouranalyses

As noted above performance differed slightlybetween categories To evaluate whether such inho-mogeneity contributed to the observed integration timeconstants we repeated the analyses after matchingperformance across categories by random subsamplingUnder these performance-matched conditions themean s0 was 95 6 9 ms and the population s0 was 105ms We also analyzed each category and each subjectseparately after matching performance The mean s0 forthe four categories were 125 6 23 ms 144 6 33 ms 896 21 ms and 131 6 26 ms respectively for animalspeople plants and vehicles Note that these fits werenoisier due to having fewer data While subjects coulduse different features to recognize images belonging todifferent categories the results were similar acrosscategories

Experiment 2

We extended the first experiment in two variantsaimed at evaluating two-fragment images and very longSOAs (Experiment 2A) and constant trial lengths andno fragment repetition (Experiment 2B) These exper-iment variants are described under Methods

In Experiment 2A the window of integration s0 was101 6 14 ms when two-fragment trials were included inthe fit and 91 6 13 ms when they were not included(Figure 5A) The other model parameters were ufrac14 0336 002 B0 frac14 0990 6 0003 and R0 frac14 0069 6 0006These values were not significantly different from eachother or from the results in Experiment 1 (Wilcoxonrank sum test pfrac14 055 comparing these two fits pfrac14099 and 068 respectively comparing Experiment 1 tothese results with and without two-fragment trials)Furthermore the model fit using only three- and four-fragment trials was a good fit for the two-fragment data(RMSE of 0091 6 0006 using only three- and four-fragment trials versus 0081 6 0004 using all trialsWilcoxon rank sum test pfrac14 020) This demonstratesthat the uncertainty reduction model was able togeneralize to fragment counts that were not used to fitthe model While performance at 2941 ms was slightlybetter than performance at 7059 ms (differing by 346 17) this difference was not significant (sign test pfrac14 016) RMSE and s0 values were unaffected byincluding the data from 7059 ms trials (Wilcoxon ranksum tests pfrac14 097 and 092 respectively) R2 valueswere 030 075 and 078 for two- three- and four-fragment trials respectively The two-fragment R2

value was low because the plot of performance againstasynchrony was almost flat The results of thisexperimental variant suggest that performance at long

SOAs in Experiment 1 was close to asymptote and thatthe uncertainty reduction model can extrapolate totwo-fragment trials

The overall performance in Experiment 2B (as wellas in the previous variant) was lower than inExperiment 1 (cf Figure 5 vs Figure 4) due to theinclusion of more difficult line drawings in the imagesets (Figure 5B) Considering only the trials thatincluded the same types of images across experiments(silhouette images) raised performance to levels com-parable to the ones in Experiment 1 without signif-icantly changing s0 R2 values for the uncertaintyreduction model were 064 and 081 for three- and four-fragment trials respectively The integration windowfrom the uncertainty reduction model in Experiment 2B

Figure 5 Mean performance across all subjects in Experiment 2

A Variant A (n frac14 17 subjects) which added a two-fragment

condition (circles) and data at 7059 ms SOA B Variant B (n frac1417 subjects) which controlled for influence of trial duration and

stimulus repetition (see Results and Methods) Conventions are

otherwise as in Figure 4

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 9

(s0 frac14 123 6 20 ms) was indistinguishable from thatobtained in Experiment 1 (Wilcoxon rank sum test pfrac14079) The other model parameters were ufrac14 030 6002 B0 frac14 0990 6 0003 and R0 frac14 0064 6 0009 Thesimilar s0 values suggest that the results of Experiment1 cannot be ascribed to differences in trial duration orslight familiarity with the images shown

Discussion

We measured the performance of 50 subjects indifferent variants of an experiment to evaluate theeffect of presenting object fragments with varyingdegrees of asynchrony By applying this externallyinduced temporal jitter we characterized the impor-tance of temporal precision for visual object recogni-tion

Subjects might in principle have made educatedguesses based on the content of individual fragmentsor combined such guesses independently across multi-ple fragments Instead we found that performance wassignificantly higher than predicted by independentprobability summation indicating that informationfrom different fragments was synergistically integratedRecognition performance decreased with increasingSOA and was well characterized by a simple modelbased on the reduction of uncertainty as a function ofthe temporal proximity of the presented image parts Inprinciple such a model could be realized by neuronswhose firing rate increases with the arrival of each newburst of information and then relaxes more slowlytowards baseline Neurons showing such firing rateincreases upon transient stimulus onset and decaydynamics over hundreds of milliseconds have beendescribed in recordings in the lateral intraparietalcortex during a motion discrimination task (Huk ampShadlen 2005)

While we focused on the uncertainty reductionmodel we also fit the data from Experiment 1 with amodel based on signal detection theory These twomodels reflect different paradigms of psychologicalprocessing In the former each additional piece ofinformation (provided by each successive fragment)multiplicatively reduces the subjectrsquos uncertainty aboutwhat was shown increasing the probability of giving acorrect answer In the latter each additional fragmentadds to the distance between the actual category andthe other categories in an implicit psychological spaceagain increasing the probability of discriminating thetrue category from the others There are importantdifferences between these paradigms (Swets 1961)though either could be realized via the integrativeneurons hypothesized above The uncertainty reductionmodel might be implemented by a set of downstream

neurons that act as thresholds on the evidenceaccumulated by the integrative neurons the firstdownstream neuron to cross threshold could report theidentity of the viewed image Support for one oranother image would manifest as increased firing ratesin integrative neurons feeding into the associateddownstream neurons increasing its chance of being thefirst to cross threshold The signal detection theorymight instead reflect a set of downstream neuronswhose activity reflected which integrative neuron wasmost active in a winner-take-all fashion The integra-tive neurons would then essentially be counting votesUltimately given that the two models yielded indis-tinguishable estimates of the window of temporalintegration it is difficult to draw any conclusions aboutwhich paradigm is more likely to be correct based onthe current data

The results demonstrated that asynchronies of a fewtens of milliseconds disrupted categorization perfor-mance This disruption is consistent with a body ofneurophysiological observations and computationalmodels of object recognition that are based on a rapidcascade of processing events along the ventral visualstream (DiCarlo et al 2012 Fukushima 1980 Hung etal 2005 Johnson amp Olshausen 2003 Kirchner ampThorpe 2006 Liu et al 2009 Potter amp Levy 1969Richmond et al 1990 Riesenhuber amp Poggio 2000Rolls 1991 Serre et al 2007 Thorpe et al 1996vanRullen amp Thorpe 2002) The response latenciesacross the ventral stream consistently increase by about10 to 20 ms at each stage from the retina to thethalamus to a cascade of cortical areas from primaryvisual cortex to inferior temporal cortex (Maunsell1987 Schmolesky et al 1998) The rapidity with whichvisual shape information progresses through the ventralvisual cortex and is converted into behavioral orperceptual output is consistent with the currentobservation that object recognition begins to sufferimpairment with even minor disruptions of stimulustiming

In addition to the impairment in shape recognitionthrough spatial integration due to temporal asynchro-ny timing judgments can be influenced by spatial cuesSpatial grouping cues that encourage the bindingtogether of separate parts can interfere with simulta-neity judgments at a time scale similar to that reportedhere (Cheadle et al 2008) Spatial judgments areinfluenced by temporal constraints and temporaljudgments are influenced by spatial constraints

While temporal asynchrony disrupted recognitionwe observed that performance was well above asymp-totic levels even at SOAs of approximately 100 msIntegration at these SOAs cannot be ascribed to trialduration eye movements remembering the stimulifrom previous trials or forgetting the fragments shownwithin a trial (Figure 5) Performance values at SOAs

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 10

of approximately 100 ms are consistent with theextended durations of responses in inferior temporalcortex (De Baene et al 2007) These extended responsedurations might instantiate a buffer allowing theintegration of visual shape information over time Theability to integrate visual information over brief spanscould underlie many critical visual functions Forexample percepts of camouflaged or partially occludedobjects could be assembled over time as the objectsmoved relative to their background or occluders(Nishida 2004) and information could be integratedacross saccades (Irwin Yantis amp Jonides 1983Jonides Irwin amp Yantis 1982 OrsquoRegan amp Levy-Schoen 1983)

The observed integration of form information over 100 ms is unlikely to reflect working memoryprocesses as the temporal scales involved in workingmemory span several seconds (Vogel Woodman ampLuck 2006) There is however sufficient time betweenstimulus presentation and the response period that the(integrated) image or category information could beheld in working memory The window of temporalintegration reported here has been observed in otherdomains and with other experimental paradigms(Caudek Domini amp Di Luca 2002 Forget Buiatti ampDehaene 2009 Nishida 2004) In particular perfor-mance that exceeds probability summation at shortSOAs is consistent with the temporal scales involved iniconic memory (Coltheart 1980 Di Lollo 1977Eriksen amp Collins 1968 Hogben amp Di Lollo 1974Sperling 1960) Prior work has shown that informationabout arrays of letters (Sperling 1960) or flashingsquares (Hogben amp Di Lollo 1974) remains accessiblefor a short time after presentation It is possible that theintegration we observed at short SOAs in the objectrecognition system arises from mechanisms similar tothose underlying the previously reported persistence ofsimple visual lsquolsquoiconsrsquorsquo though iconic memory has beenshown to be disrupted by masks (Di Lollo Clark ampHogben 1988) Visual persistence as instantiated bythe dynamics of retinal cells is generally terminated bya mask it is more likely that we are measuring neuralor informational persistence at a higher level of thevisual system (Coltheart 1980) While the dynamicnoise in the stimulus sequence could partly mask visualpersistence it certainly does not completely obliterateit Hence the integration constants reported hereconstitute an upper bound and it is conceivable thatvisual information could decay even more rapidly thanreported here

Two additional phenomena share similar character-istics with the dynamic rate of decay reported hereIntegration masking in which irrelevant informationpresented immediately before or after the presentationof a target inhibits perception of that target by addingnoise to it exhibits a similar time scale to that described

here and may reflect a similar process (Enns amp DiLollo 2000) Priming (in which a stimulus facilitatesthe perception or naming of a later stimulus) might alsoplay a part in these observations Priming would likelyincrease performance at longer SOA positive effects ofpriming typically are small or nonexistent at SOA closeto zero and increase with increasing SOA (La HeijDirkx amp Kramer 1990 Scharlau amp Neumann 2003)

A series of elegant studies has shown that detect-ability and perception can be influenced by the exacttime at which a stimulus is presented with respect toongoing endogenous oscillations (Rohenkohl amp Nobre2011 Van Dijk Schoffelen Oostenveld amp Jensen2008) In the context of the task presented here it isconceivable that the initial dynamic noise could resetsuch endogenous oscillations and that certain presen-tation times and SOAs could lead to enhancedrecognition by virtue of their specific phase The limitednumber and uneven sampling of SOAs precludes asystematic investigation of these effects here but it willbe interesting in future studies to examine thephysiological signals underlying recognition of asyn-chronously presented objects

Several studies have shown interactions betweenasynchronous parts of faces (Anaki et al 2007 Cheunget al 2011 Singer amp Sheinberg 2006) those findingsare better described in terms of higher level phenomenathan object recognition Complex images have alsobeen used in a study of subjective simultaneity (Loftusamp Hanna 1989) in which subjects were asked to ratehow complete or integrated images appeared to bewhen divided into two parts presented with varyingdurations and asynchronies Here we expand upon thisbody of previous work by showing that complex shapeinformation can be combined across both space andtime to enable object recognition Our results wereconsistent across a battery of controls designed toaccount for eye movements stimulus familiarity andmemory on longer time scales

Conclusions

Integration across image fragments begins to breakdown when as little as 30 ms separates the fragmentsand decays with a time constant of approximately 100ms The current characterization of the time coursewith which visual object recognition breaks down as afunction of temporal asynchrony provides informationcritical to building computer vision systems thatoperate beyond static snapshots to modeling visualobject recognition to designing experiments that probethe performance over time of the human visual systemand to the construction of theories of perception in adynamic world

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 11

Keywords object recognition temporal integrationfragmented images temporal sensitivity visual dynamics

Acknowledgments

The authors thank Neal Dach and Joanna Li forassistance with stimulus preparation and data collec-tion and John Maunsell Thomas Miconi and HanlinTang for their insightful comments on the manuscriptThis work was supported by NSF grants 0954570 andCCF-1231216 (GK)

Commercial relationships noneCorresponding author Jedediah Miller SingerEmail jedediahsingerchildrensharvardeduAddress Department of Ophthalmology BostonChildrenrsquos Hospital Harvard Medical School BostonMA USA

References

Akaike H A (1974) A new look at the statisticalmodel identification IEEE Transactions on Auto-matic Control 19 716ndash723

Anaki D Boyd J amp Moscovitch M (2007)Temporal integration in face perception Evidenceof configural processing of temporally separatedface parts Journal of Experimental PsychologyHuman Perception and Performance 33(1) 1ndash19doi 2007-01135-001 [pii] 1010370096-15233311

Aspell J E Wattam-Bell J amp Braddick O (2006)Interaction of spatial and temporal integration inglobal form processing Vision Research 46(18)2834ndash2841 doi S0042-6989(06)00118-0 [pii] 101016jvisres200602018

Bair W amp Movshon J A (2004) Adaptive temporalintegration of motion in direction-selective neuronsin macaque visual cortex Journal of Neuroscience24(33) 7305ndash7323 doi101523JNEUROSCI0554-042004 24339305 [pii]

Brainard D H (1997) The Psychophysics ToolboxSpatial Vision 10 433ndash436

Brockmole J R Wang R F amp Irwin D E (2002)Temporal integration between visual images andvisual percepts Journal of Experimental Psycholo-gy Human Perception and Performance 28(2) 315ndash334

Burnham K P amp Anderson D (2002) Modelselection and multi-model inference (2nd ed) NewYork Springer

Burr D C amp Santoro L (2001) Temporal integra-tion of optic flow measured by contrast andcoherence thresholds Vision Research 41(15)1891ndash1899 doi S0042-6989(01)00072-4 [pii]

Caudek C Domini F amp Di Luca M (2002) Short-term temporal recruitment in structure frommotion Vision Research 42 10

Cavanagh P Holcombe A O amp Chou W (2008)Mobile computation Spatiotemporal integration ofthe properties of objects in motion Journal ofVision 8(12)1 1ndash23 httpwwwjournalofvisionorgcontent8121 doi1011678121 [PubMed][Article]

Cheadle S Bauer F Parton A Muller H BonnehY S amp Usher M (2008) Spatial structure affectstemporal judgments Evidence for a synchronybinding code Journal of Vision 8(7)12 1ndash12httpjournalofvisionorgcontent8712 doi1011678712 [PubMed] [Article]

Cheung O S Richler J J Phillips W S ampGauthier I (2011) Does temporal integration offace parts reflect holistic processing PsychonomicBulletin and Review 18(3) 476ndash483 doi103758s13423-011-0051-7

Clifford C W Holcombe A O amp Pearson J (2004)Rapid global form binding with loss of associatedcolors Journal of Vision 4(12)8 1090ndash1101 httpwwwjournalofvisionorgcontent4128 doi1011674128 [PubMed] [Article]

Coltheart M (1980) Iconic memory and visiblepersistence Attention Perception amp Psychophysics27(3) 183ndash228 doi103758bf03204258

Connor C E Brincat S L amp Pasupathy A (2007)Transformation of shape information in the ventralpathway Current Opinion in Neurobiology 17(2)140ndash147

Cornelissen F W Peters E M amp Palmer J (2002)The Eyelink Toolbox Eye tracking with MATLABand the Psychophysics Toolbox Behavioral Re-search Methods Instruments and Computers 34 4

De Baene W Premereur E amp Vogels R (2007)Properties of shape tuning of macaque inferiortemporal neurons examined using rapid serialvisual presentation Journal of Neurophysiology97(4) 2900ndash2916 doi 007412006 [pii] 101152jn007412006

DeCarlo L T (2012) On a signal detection approachto m-alternative forced choice with bias withmaximum likelihood and bayesian approaches toestimation Journal of Mathematical Psychology56(3) 196ndash207

Deco G amp Rolls E T (2004) Computational

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 12

neuroscience of vision Oxford UK Oxford Uni-versity Press

Di Lollo V (1977) Temporal characteristics of iconicmemory Nature 267(5608) 241ndash243

Di Lollo V Clark C D amp Hogben J H (1988)Separating visible persistence from retinal afterim-ages Perception and Psychophysics 44(4) 363ndash368

DiCarlo J J Zoccolan D amp Rust N C (2012)How does the brain solve visual object recognitionNeuron 73(3) 415ndash434 doi S0896-6273(12)00092-X [pii] 101016jneuron201201010

Douglas R J amp Martin K A (2004) Neuronalcircuits of the neocortex Annual Review of Neuro-science 27 419ndash451

Enns J T amp Di Lollo V (2000) Whatrsquos new in visualmasking Trends in Cognitive Sciences 4(9) 345ndash352

Eriksen C W amp Collins J F (1968) Sensory tracesversus the psychological moment in the temporalorganization of form Journal of ExperimentalPsychology 77(3) 376ndash382

Felleman D J amp Van Essen D C (1991) Distributedhierarchical processing in the primate cerebralcortex Cerebral Cortex 1 1ndash47

Forget J Buiatti M amp Dehaene S (2009) Temporalintegration in visual word recognition Journal ofCognitive Neuroscience 22(5) 14

Fukushima K (1980) Neocognitron A self organizingneural network model for a mechanism of patternrecognition unaffected by shift in position Biolog-ical Cybernetics 36(4) 193ndash202

Hogben J H amp Di Lollo V (1974) Perceptualintegration and perceptual segregation of briefvisual stimuli Vision Research 14(11) 1059ndash1069

Huk A C amp Shadlen M N (2005) Neural activity inmacaque parietal cortex reflects temporal integra-tion of visual motion signals during perceptualdecision making The Journal of Neuroscience25(45) 16

Hung C P Kreiman G Poggio T amp DiCarlo J J(2005) Fast read-out of object identity frommacaque inferior temporal cortex Science 310863ndash866

Irwin D E Yantis S amp Jonides J (1983) Evidenceagainst visual integration across saccadic eyemovements Perception and Psychophysics 34(1)49ndash57

Johnson J S amp Olshausen B A (2003) Timecourseof neural signatures of object recognition Journalof Vision 3(7)4 499ndash512 httpwwwjournalofvisionorgcontent374 doi101167374 [PubMed] [Article]

Jonides J Irwin D E amp Yantis S (1982)Integrating visual information from successivefixations Science 215(4529) 192ndash194

Kirchner H amp Thorpe S J (2006) Ultra-rapid objectdetection with saccadic eye movements Visualprocessing speed revisited Vision Research 46(11)1762ndash1776 doi S0042-6989(05)00511-0 [pii] 101016jvisres200510002

La Heij W Dirkx J amp Kramer P (1990)Categorical interference and associative priming inpicture naming British Journal of Psychology81(4) 511ndash525

Liu H Agam Y Madsen J R amp Kreiman G(2009) Timing timing timing Fast decoding ofobject information from intracranial field poten-tials in human visual cortex Neuron 62(2) 281ndash290

Loftus G R amp Hanna A M (1989) The phenom-enology of spatial integration Data and modelsCognitive Psychology 21(3) 363ndash397 doi0010-0285(89)90013-3 [pii]

Logothetis N K amp Sheinberg D L (1996) Visualobject recognition Annual Review of Neuroscience19 577ndash621

Maunsell J H R (1987) Physiological evidence fortwo visual subsystems In L Vaina (Ed) Mattersof intelligence (pp 59ndash87) Dordrecht The Neth-erlands Reidel Press

Neri P Morrone M C amp Burr D C (1998) Seeingbiological motion Nature 395(6705) 894ndash896 doi10103827661

Nishida S (2004) Motion-based analysis of spatialpatterns by the human visual system CurrentBiology 14 830ndash839

OrsquoRegan J K amp Levy-Schoen A (1983) Integratingvisual information from successive fixations Doestrans-saccadic fusion exist Vision Research 23(8)765ndash768

Pasupathy A amp Connor C E (1999) Responses tocontour features in macaque area V4 Journal ofNeurophysiology 82(5) 2490ndash2502

Pelli D G (1997) The VideoToolbox software forvisual psychophysics Transforming numbers intomovies Spatial Vision 10 437ndash442

Potter M C amp Levy E (1969) Recognition memoryfor a rapid sequence of pictures Journal ofExperimental Psychology 81(1) 10ndash15

Richmond B J Optican L M amp Spitzer H (1990)Temporal encoding of two-dimensional patterns bysingle units in primate primary visual cortex IStimulus-response relations Journal of Neurophys-iology 64(2) 351ndash369

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 13

Riesenhuber M amp Poggio T (2000) Models of objectrecognition Nature Neuroscience 3(Suppl) 1199ndash1204

Ringach D L Hawken M J amp Shapley R (2003)Dynamics of orientation tuning in macaque V1The role of global and tuned suppression Journalof Neurophysiology 90(1) 342ndash352 doi101152jn010182002 010182002 [pii]

Rohenkohl G amp Nobre A C (2011) Alphaoscillations related to anticipatory attention followtemporal expectations The Journal of Neurosci-ence 31(40) 14076ndash14084

Rolls E T (1991) Neural organization of highervisual functions Current Opinion in Neurobiology1 274ndash278

Sanocki T (2001) Interaction of scale and time duringobject identification Journal of Experimental Psy-chology Human Perception and Performance 27(2)290ndash302

Scharlau I amp Neumann O (2003) Temporalparameters and time course of perceptual latencypriming Acta Psychologica 113(2) 185ndash203

Schmolesky M T Wang Y C Hanes D PThompson K G Leutgeb S Schall J D Leventhal A G (1998) Signal timing across themacaque visual system Journal of Neurophysiology79(6) 3272ndash3278

Schyns P G Petro L S amp Smith M L (2007)Dynamics of visual information integration in thebrain for categorizing facial expressions CurrentBiology 17(18) 1580ndash1585 doi S0960-9822(07)01904-5 [pii] 101016jcub200708048

Serre T Kreiman G Kouh M Cadieu CKnoblich U amp Poggio T (2007) A quantitativetheory of immediate visual recognition Progress inBrain Research 165C 33ndash56

Singer J M amp Sheinberg D L (2006) Holisticprocessing unites face parts across time VisionResearch 46(11) 1838ndash1847 doi S0042-6989(05)00563-8 [pii] 101016jvisres200511005

Sperling G (1960) The information available in briefvisual presentations Washington DC AmericanPsychological Association

Swets J A (1961) Is there a sensory thresholdScience 134(3473) 168ndash177

Tanaka K (1996) Inferotemporal cortex and objectvision Annual Review of Neuroscience 19 109ndash139

Thorpe S Fize D amp Marlot C (1996) Speed ofprocessing in the human visual system Nature 381520ndash522

Van Dijk H Schoffelen J M Oostenveld R ampJensen O (2008) Prestimulus oscillatory activity inthe alpha band predicts visual discriminationability The Journal of Neuroscience 28(8) 1816ndash1823

vanRullen R amp Thorpe S J (2002) Surfing a spikewave down the ventral stream Vision Research 422593ndash2615

Vogel E K Woodman G F amp Luck S J (2006)The time course of consolidation in visual workingmemory Journal of Experimental PsychologyHuman Perception and Performance 32(6) 1436ndash1451 doi1010370096-15233261436

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 14

  • Introduction
  • Methods
  • f01
  • f02
  • e01
  • e02
  • Results
  • f03
  • e03
  • f04
  • e04
  • f05
  • Discussion
  • Conclusions
  • Akaike1
  • Anaki1
  • Aspell1
  • Bair1
  • Brainard1
  • Brockmole1
  • Burnham1
  • Burr1
  • Caudek1
  • Cavanagh1
  • Cheadle1
  • Cheung1
  • Clifford1
  • Coltheart1
  • Connor1
  • Cornelissen1
  • DeBaene1
  • DeCarlo1
  • Deco1
  • DiLollo1
  • DiLollo2
  • DiCarlo2
  • Douglas1
  • Enns1
  • Eriksen1
  • Felleman1
  • Forget1
  • Fukushima1
  • Hogben1
  • Huk1
  • Hung1
  • Irwin1
  • Johnson1
  • Jonides1
  • Kirchner1
  • LaHeij1
  • Liu1
  • Loftus1
  • Logothetis1
  • Maunsell1
  • Neri1
  • Nishida1
  • ORegan1
  • Pasupathy1
  • Pelli1
  • Potter1
  • Richmond1
  • Riesenhuber1
  • Ringach1
  • Rohenkohl1
  • Rolls1
  • Sanocki1
  • Scharlau1
  • Schmolesky1
  • Schyns1
  • Serre1
  • Singer1
  • Sperling1
  • Swets1
  • Tanaka1
  • Thorpe1
  • VanDijk1
  • vanRullen1
  • Vogel1
Page 3: Short temporal asynchrony disrupts visual object recognition · Short temporal asynchrony disrupts visual object recognition Jedediah M. Singer $ Department of Ophthalmology, Boston

average power spectrum with that imagersquos respectiveunique phase spectrum This resulted in a set of imageswith consistent power at each spatial frequency andbackgrounds filled with cloud-like noise Exampleimages are shown in Figure 1B (the line drawings onthe second row were only used in Experiment 2) all theimages used in these experiments can be obtained fromhttpklabtchharvardeduresourcessinger_asynchronyhtml

Each of 16 subjects (13 female age 19ndash35 mean age25) completed 50 blocks of 46 trials each The sequenceof events in each trial is illustrated in Figure 1A Theimage selected for each trial was drawn with equalprobability from one of the four categories Each trialbegan with a 14-pixel (058) fixation cross presented for

500 ms In those subjects for which eye tracking wasavailable (14 of 16 subjects) we required 500 ms offixation within 38 of the center of the cross Followingfixation a 256 middot 256 pixel (938 middot 938) square offlickering (170 Hz) noise appeared in the center of thescreen This noise was generated by randomizing thephase of the mean power spectrum of the library ofimages The purpose of the noise was to eliminate thepossible effects of apparent motion between twosuccessive fragments (eg Cavanagh Holcombe ampChou 2008) We considered a 4 middot 4 square tessellationof each image edge alpha masks for these squares wereblurred with a Gaussian blur of radius 4 pixels usingPhotoshop We defined a lsquolsquofragmentrsquorsquo as 3 out of the 16squares (not necessarily spatially contiguous) randomly

Figure 1 Schematic of the organization of the experiments A After 500 ms fixation between one and four fragments each consisting

of 316 of a source image (upper left) were briefly shown embedded in a stream of flickering 170 Hz phase-scrambled visual noise

Fragments lasted 118 ms and were presented with a stimulus onset asynchrony (SOA constant within a given trial) that took 1 of 10

possible values from 0 ms (synchronous presentation) to 2941 ms (Methods) The source image the number of fragments and the

SOA value were randomly chosen in each trial Noise continued for 500 ms after the onset of the final fragment after which subjects

were presented with a four-alternative forced choice categorization task B Examples of other images used in the experiments

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 3

chosen in each trial Depending on the configuration ofthe three squares each fragment spanned a minimumof 478 and a maximum of 938 Spatial integration of allof the information in four fragments required inte-grating over 938 Each fragment was present for 118ms (two monitor refreshes) Trials containing onethree or four fragments were randomly interleavedWhen more than one fragment was used in a trial theycontained nonoverlapping parts of the source imageFragments were sequentially presented with the firstfragment appearing 500 ms after the noise beganSuccessive fragments were presented with a stimulusonset asynchrony (SOA) selected at random from thefollowing list 0 ms (synchronous presentation) 118 ms(one fragmentrsquos offset coincided with the subsequentfragmentrsquos onset) 177 ms 294 ms 471 ms 706 ms100 ms 1471 ms 2059 ms 2941 ms The SOA valuewas constant within a trial and the different SOAs wererandomly interleaved across trials (Movie 1) We alsoincluded randomly interleaved catch trials in which 1516 of an image was shown for 588 ms The catch trialsserved to ensure that subjects were engaged in the taskand understood the instructions subjects who scoredbelow 80 on catch trials or below 13 overall wereexcluded (criteria established a priori all subjectspassed in Experiment 1)

The flickering noise continued while the fragmentswere displayed and for 500 ms after the last fragmentappeared at which point it was replaced by a blankgray screen containing the text lsquolsquoPlease indicate whichkind of picture was hidden in the cloudsrsquorsquo and fouroptions were shown corresponding to the four game-pad buttons used for lsquolsquoAnimalsrsquorsquo lsquolsquoPeoplersquorsquo lsquolsquoVehiclesrsquorsquoand lsquolsquoPlantsrsquorsquo This choice screen remained until thesubject pressed a button Performance is reported asmean 6 SEM throughout the manuscript A high-contrast mask (one of the noise images selected atrandom with its contrast maximized) then appearedfor one monitor refresh followed by the fixation crossfor the next trial Trials in which the subject lookedaway from the stimulus or blinked were excluded fromanalyses No feedback was given except betweenblocks when the overall performance for that blockwas displayed

Stimulus presentation Experiment 2 variant A

Experiment 2A addressed two questions First canthe uncertainty reduction model accurately describe thesimplest possible asynchronous trials containing onlytwo fragments Second while the performance curvesfrom Experiment 1 appeared to be near asymptote by2941 ms SOA do the results hold at very long SOAImages were drawn from a library of 364 line drawingsand silhouettes which partially overlapped with that

used in Experiment 1 Aside from the different imageset and the additional conditions the experimentparameters were identical to those in Experiment 1Twenty-one new subjects (13 female age 19ndash51 meanage 27) participated in this experiment We excludedfrom analyses one subject due to low performance oncatch trials ( 08 correct) and three subjects due to lowoverall performance ( 13 correct) this exclusion didnot significantly change the overall results In thisvariant we had eye tracking available for 10 of the 17subjects whose performance exceeded threshold (theeye tracker was unavailable for five subjects andcalibration was unreliable for two subjects) Catchtrials were included in six subjects There were 36blocks of 36 or 37 trials depending on whether catchtrials were included

Stimulus presentation Experiment 2 variant B

In Experiment 1 longer SOAs and more fragmentsboth led to longer trials These differences might haveinfluenced the observed results (eg forgetting infor-mation over the course of a second or more may lead tolower performance in longer trials) To control for thispossibility we fixed the total trial duration in a secondvariant experiment Moreover during the approxi-mately 2-hr duration of each run of Experiment 1 each116 of each image was presented an average of 34times To ensure that this repetition did not lead to rotememorization or long-term learning effects this secondvariant also included a larger image library so that

Figure 2 Mean performance across all 16 subjects in

Experiment 1 in trials with zero SOA Error bars indicate

standard errors of the mean (SEM) All presentations lasted for

118 ms except the catch trials which lasted 588 ms The

dashed line indicates chance performance

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 4

each 116 of each image was presented to each subjectan average of 05 times There were 36 blocks of 34trials SOA values were drawn from the same list as inExperiment 1 Two-fragment trials were also presentedbut not included in the reported analyses to make theresults more directly comparable to those in Experi-ment 1 Adding the data from the two-fragment trialsdid not change the conclusions One of the two keydifferences in this variant is that the stream of flickeringnoise in each trial lasted for 18824 ms (for all SOAvalues and number of fragments)mdashlong enough toaccommodate a trial with four fragments at the longestasynchrony used (2941 ms) followed by 500 ms ofnoise Trials with shorter asynchronies or fewerfragments concluded with a longer period of uninfor-mative noise In this way the time interval between trialonset and behavioral response was identical in all trialsThe second key difference in this variant is that imageswere drawn at random from a library of 1200 linedrawings and silhouettes a superset of those used inExperiments 1 and 2A This means that each 116 ofeach image had an approximately 50 chance of beingpresented to a given subject Eighteen new subjects (13female age 18ndash35 mean age 24) participated inExperiment 2B one subject was excluded from analysesdue to low performance on catch trials ( 08 correct)We had reliable eye tracking in 13 of the 17 subjectswhose performance exceeded threshold

Data analyses

Diagnostic image parts

Even though we took several precautions to makeimages homogeneous in terms of basic low-levelproperties (see above) some images were easier tocategorize than others and parts of some images weremore informative than others If part of an image isreliably diagnostic subjects may be able to perform thetask based only on that part regardless of any otherinformation presented This would confound the studyof temporal integration there might be no contributionfrom the other fragments presented and thus SOAwould be irrelevant To mitigate this effect weinvestigated how informative each part of each imagewas Each square j (j frac14 1 16 for each image) wasassigned a lsquolsquodiagnosticityrsquorsquo score DSj defined as

DSj frac14

Xt

cethtTHORN sethtTHORN pethtTHORNaethtTHORN

Xt

sethtTHORN pethtTHORNaethtTHORN

eth1THORN

Here t ranges over all trials for all subjects andacross all experiments in which square j was present Ifthe category reported in a given trial t was correct c(t)

is 1 otherwise c(t) is 0 The overall performance of thesubject who performed trial t is p(t) and that subjectrsquosoverall performance on trials with the same number offragments and the same asynchrony as trial t is given bya(t) Finally s(t) is the number of 116 squares of theimage present overall in trial t (that is three times thenumber of fragments in trial t) This score indicateshow reliably each particular 116 square led to correctclassification across all subjects We excluded trials inwhich any of the 116 squares shown had a diagnos-ticity score greater than 08 This excluded an averageof 337 6 03 of trials from Experiment 1 and172 6 03 and 231 6 03 from Experiments2A and 2B respectively Including these trials does notchange the conclusions reported here The uncertaintyreduction modelrsquos window of integration for individualsubjects (see below) when including all trials changedby only 13 6 3 Including trials containingdiagnostic squares does however raise performancedisproportionally at longer asynchronies due to thegreater number of trials in which a single fragment issufficient for categorization In decreasing the dynamicrange of performance the variances of the model fitsare increased

Probability summation

Assuming independent responses to each fragmentone may predict performance in trials with multiplefragments from the performance observed in single-fragment trials y1 We assume that the subject actuallyknew the correct image category in some fraction i ofthe trials and guessed at chance in the rest so that y1frac14i thorn (1 ndash i)4 (y1 frac14 14 if i frac14 0 and y1 frac14 1 if i frac14 1) Thechance that the subject fails to discern the categoryfrom a trial with n fragments is (1 i)n and the finalperformance predicted by probability summation isthen

1 eth1 iTHORNn thorn eth1 iTHORNn

4frac14 1 3eth1 iTHORNn

4 eth2THORN

Note that even for small values of i this expressioncan be appreciably higher than chance (025) Forexample a single fragment performance of y1frac14 030 isobtained from i frac14 00667 and this leads to an overallperformance of 043 when nfrac14 4 fragments We cannotassume independence however because multiple frag-ments from a single image contain correlated infor-mation and so this independent probability summationprediction overestimates expected performance Ob-serving the opposite pattern ie exceeding theperformance predicted by independent probabilitysummation is thus a particularly strong indicator ofsynergistic spatiotemporal integration of informationfrom multiple fragments

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 5

Results

We evaluated the sensitivity to temporal imprecisionduring visual object recognition by quantifying cate-gorization of asynchronously presented object frag-ments (Methods Figure 1)

Experiment 1

Overall performance in the task was above chance(chancefrac14 025) for all subjects numbers of fragmentsand SOA values Averaged across all SOA values andnumbers of fragments performance ranged from 037to 077 (060 6 003 mean 6 SEM) Performanceincreased as the number of fragments shown increased(Figure 2) All 16 subjectsrsquo performance was better forthe four-fragment condition than the three-fragmentcondition significantly so for 13 subjects (chi-squared

test p 005 compare triangles versus squares inFigure 3) Catch trial performance was essentially atceiling (097 6 001) Single-fragment performance wasslightly above chance levels (033 6 001) Overallperformance varied slightly but significantly by cate-gory (chi-squared test p 108 animals 059 people060 plants 054 vehicles 057) In sum all subjectswere able to correctly perform the task in spite of therapid presentations and fragmented objects

If subjects performed the task by independentlyevaluating each fragment the performance in single-fragment trials would be predictive of overall perfor-mance in trials with multiple fragments via independentprobability summation (Methods) Different fragmentsfrom a given image contain redundant informationwhich would bias an independent probability summa-tion prediction towards higher values Yet performanceat all SOAs was higher than that predicted byprobability summation dropping at the longest SOA tovalues consistent with probability summation (at SOA

Figure 3 Individual subjectsrsquo performance in Experiment 1 Error bars indicate Clopper-Pearson 95 confidence intervals The x axis

indicates SOA and the y axis indicates the average fraction correct performance for the three-fragment condition (triangles) four-

fragment condition (squares) and single fragment condition () Chance performance is 025 and the performance in catch trials was

097 6 001 The dashed lines show the fits from the uncertainty reduction model (Text) The mean value of the integration window

indicated by the uncertainty reduction mode is indicated by a vertical dotted line

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 6

frac14 2941 ms 046 6 003 observed versus 044 6 004predicted and 050 6 003 observed versus 049 6 004predicted for three- and four-fragment conditionsrespectively Figure 4) Hence independent probabilitysummation is not a good model to describe overallperformance in this task That performance at all butthe longest asynchronies exceeded the independentprobability summation prediction despite the redun-dancy of the fragments within each trial suggests visualintegration which we quantify below

Categorization performance showed a strong de-crease with increasing values of SOA both for three-fragment trials and four-fragment trials (Figures 3ndash4)Performance declined with increasing temporal asyn-chrony for all 16 subjects (Figure 3) This change inperformance was significant for 11 and 14 subjects inthree-fragment and four-fragment trials respectively(chi-squared test p 005) The performance decre-ment between simultaneous presentations and asyn-chronous trials first reached significance at an SOA of294 ms (sign test pfrac14 005) with a difference of 23 6

14 (Figure 4) While a slight improvement inperformance was apparent in the mean values at 118ms SOA relative to simultaneous presentations it wasnot statistically significant (sign test p frac14 060) Theseobservations show that temporal imprecision of even 30ms disrupts recognition performance

To quantify performance (y) as a function of SOA(denoted A below) we started by performing a least-squares fit of an exponential curve to the data fromeach subject for each number of fragments

yethATHORN frac14 Bthorn R 2A=s eth3THORNThe three parameters were the time constant s the

baseline value Bfrac14 y(lsquo) and the range in performanceRfrac14 y(0) y(lsquo) where y(0) represents the performanceat 0 ms SOA and y(lsquo) denotes the asymptoticperformance at infinite SOA We adjusted these threeparameters (B R s) to minimize the squared errorbetween y(A) and the observed data y(A) Theexponential curves provided a reasonable fit to the data(root-mean-squared error RMSE of 0058 6 0004and 0045 6 0003 for three-fragment and four-fragment data respectively) As a measure of thewindow of integration we considered s the SOA atwhich the exponential curve achieved its half heightThe mean value of s across 16 subjects was 120 6 24ms and 104 6 19 ms in three-fragment and four-fragment trials respectively Thus shape integration atspatial scales of at least approximately 58 remainsevident over a time scale on the order of 100 ms

The exponential fits described the shapes of each ofthe response curves and experimental parametersseparately We sought a unifying model that couldcapture all the different experimental conditions andcould be realized by the firing rates of neurons Asdiscussed above probability summation failed inassuming that multiple fragments contribute to per-ception independently To include the interdependenceamong different fragments and to account for theincreasing number of possible two-fragment interac-tions as fragment count increases we developed amodel based on the notion that pairs of fragmentsinteract to reduce uncertainty about the image as anexponential function of their SOA In this model wedescribed a subjectrsquos observed performance in terms ofan underlying uncertainty u which is reduced byinteractions between presented fragments As abovewe assume that the subject actually knew the correctimage category in some fraction i of the trials andguessed at chance in the rest so that y(A)frac14 ithorn [(1 ndash i)4] The empirical uncertainty is then 1ndashi the chancethat the subject did not know the image category Weassume that a second fragment reduces the underlyinguncertainty u by some factor f(A) that depends on theasynchrony A with which it is presented relative to thefirst A third fragment reduces uncertainty by anotherfactor of f(A) and also by a factor of f(2A) because it ispresented A ms after the second fragment and 2A msafter the first fragment We assume independence sothe total uncertainty with three fragments is u f2(A)f(2A) Similarly the total uncertainty in a four-fragment trial is u f3(A) f2(2A) f(3A) If we denotethe total uncertainty with a particular number offragments as ursquo the predicted performance is then 1u0 thorn (u04) Note that u is conceptually related toperformance in single fragment trials but is not directlymathematically relatable The value of u also incorpo-

Figure 4 Summary of results from Experiment 1 Performance is

averaged across nfrac14 16 subjects (individual performance shown

in Figure 3) Conventions are the same as in Figure 3 horizontal

dashed lines indicate the performance level predicted by

probability summation based on single-fragment performance

Error bars show SEM across subjects

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 7

rates the fact that different fragments from a singleimage are not independent but that this dependency isfactored out in the consideration of pairs of fragmentsvia f(A)

For each subject we used their empirical perfor-mance to calculate the empirical uncertainty 1ndashi at eachSOA and each number of fragments We then foundthe least-squares best fits for the uncertainty reductionfunction f(A) and the underlying uncertainty u that wasmodified by f(A) We considered for f(A) exponentialfunctions of the form f(A)frac14 B0 R02As0 Here B0 frac14f(lsquo) is the asymptote which determines what might becalled lsquolsquocognitiversquorsquo or lsquolsquomemory-basedrsquorsquo integration theability to combine information from multiple frag-ments at an arbitrarily long SOA R0frac14 f(lsquo) f(0) is therange of uncertainty reduction spanned by f whichdescribes the benefit of presenting visual informationsimultaneously rather than with very large SOAFinally s0 is the time constant which describes theSOA at the uncertainty reduction functionrsquos half-height Given the exponential nature of this modelthere is some small amount of integration that persistsat arbitrarily long asynchronies decreasing asymptot-ically towards zero Along with the underlying uncer-tainty u this makes four total parameters to describe asubjectrsquos performance across all SOAs

The uncertainty reduction model fit the data quitewell root-mean-square errors (RMSE) for the model(0061 6 0004 and 0046 6 0002 for three-fragmentand four-fragment trials respectively) were not signif-icantly higher than the corresponding RMSE values forexponential fits to the raw data discussed above (00586 0004 and 0045 6 0003 respectively Wilcoxonrank sum test comparing the two models pfrac14 042 and061 respectively) even though the exponential fits tothe raw data have two more free parameters Giventhose extra parameters the corrected Akaike Infor-mation Criterion (Akaike 1974 Burnham amp Anderson2002) is lower for the uncertainty reduction model in 15out of 16 subjects on average the likelihood of theuncertainty reduction model is 374 6 051 times higherthan that of the exponential fits

We described the window of integration with theuncertainty reduction model exponentialrsquos half heightparameter s0 When we fit the model to the data fromindividual subjects we obtained a mean s0of 97 6 9 ms(Figure 3) In Figure 4 we also made use of the sameframework considering all subjects within the experi-ment together Rather than fitting to each subjectrsquosperformance separately we fit one uncertainty reduc-tion model to all subjectsrsquo data The values of the bestfit parameters for the uncertainty reduction model wereufrac14 043 6 002 B0frac14 099 6 001 R0frac14 015 6 002 ands0 frac14 109 ms R2 values between the data and thepredictions for the population uncertainty reductionmodel were 092 for three-fragment trials and 097 for

four-fragment trials This model succinctly and accu-rately described a subjectrsquos performance across allconditions based solely on the reduction of uncertaintyas a function of SOA The model used fewerparameters to explain a subjectrsquos performance than theset of exponential fits it could generalize acrossdifferent numbers of fragments and it suggested anunderlying psychological interpretation

We further considered a model based on signaldetection theory which assumes a probabilistic frame-work to describe each subjectrsquos performance in eachtrial Previous analyses of psychophysics data (Swets1961) have shown that high-threshold models like theuncertainty reduction model just described sometimesdo not capture subjectsrsquo behavior as well as modelsbased on signal detection theory (SDT) We thereforeconstructed a SDT model to test whether it would givedifferent results from the uncertainty reduction modelWe used a Markov chain Monte Carlo (MCMC)method for calculating d0 (a SDT measure of sensitivityto stimulus identity) with bias in four-alternative forcedchoice tasks (DeCarlo 2012) Models were fit usingJAGS (httpsourceforgenetprojectsmcmc-jags)with 10000 iterations of burn-in followed by 20000iterations to approximate d0 distributions Havingcalculated d0 for each subject each asynchrony andeach number of fragments we proceeded to fit a four-parameter model to each subjectrsquos data Let g be ascaled base-two exponential function with parameter s 00

and scale factor R 00 g(A)frac14R 002As 00 Then g describesthe increase in sensitivity brought about by having twofragments with an asynchrony of A The thirdparameter is v 00 the d0 value in single-fragment trials(constrained to lie within the 95 confidence intervalestimated from the MCMC fit) Finally we assumedthat adding additional fragments to a trial might notaid sensitivity in a perfectly efficient fashion (becausefragments contain correlated information) and wedescribed this inefficiency with a multiplicative factorc 00 that progressively reduced the contributions ofsuccessive fragments The model predicts d0 for nfragments with asynchrony A as follows

d0 frac14Xnifrac141

v 00 c 00i1 thorn ethi 1THORN g

A ethn ithorn 1THORN

eth4THORNWe fit the four parameters of this model using a

least-squares procedure and compared the time con-stant values (s 00) obtained with the SDT model againstthose obtained from the uncertainty reduction model(s0) The values of the best fits for the SDT model wereR 00 frac14 021 6 002 v 00 frac14 042 6 006 c 00 frac14 047 6 009and a cross-subject mean s 00 of 101 6 8 ms Given thatthe SDT results were similar to those of the uncertaintyreduction model and the higher computational re-

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 8

quirements for the SDT model we used only theuncertainty reduction model for the remainder of ouranalyses

As noted above performance differed slightlybetween categories To evaluate whether such inho-mogeneity contributed to the observed integration timeconstants we repeated the analyses after matchingperformance across categories by random subsamplingUnder these performance-matched conditions themean s0 was 95 6 9 ms and the population s0 was 105ms We also analyzed each category and each subjectseparately after matching performance The mean s0 forthe four categories were 125 6 23 ms 144 6 33 ms 896 21 ms and 131 6 26 ms respectively for animalspeople plants and vehicles Note that these fits werenoisier due to having fewer data While subjects coulduse different features to recognize images belonging todifferent categories the results were similar acrosscategories

Experiment 2

We extended the first experiment in two variantsaimed at evaluating two-fragment images and very longSOAs (Experiment 2A) and constant trial lengths andno fragment repetition (Experiment 2B) These exper-iment variants are described under Methods

In Experiment 2A the window of integration s0 was101 6 14 ms when two-fragment trials were included inthe fit and 91 6 13 ms when they were not included(Figure 5A) The other model parameters were ufrac14 0336 002 B0 frac14 0990 6 0003 and R0 frac14 0069 6 0006These values were not significantly different from eachother or from the results in Experiment 1 (Wilcoxonrank sum test pfrac14 055 comparing these two fits pfrac14099 and 068 respectively comparing Experiment 1 tothese results with and without two-fragment trials)Furthermore the model fit using only three- and four-fragment trials was a good fit for the two-fragment data(RMSE of 0091 6 0006 using only three- and four-fragment trials versus 0081 6 0004 using all trialsWilcoxon rank sum test pfrac14 020) This demonstratesthat the uncertainty reduction model was able togeneralize to fragment counts that were not used to fitthe model While performance at 2941 ms was slightlybetter than performance at 7059 ms (differing by 346 17) this difference was not significant (sign test pfrac14 016) RMSE and s0 values were unaffected byincluding the data from 7059 ms trials (Wilcoxon ranksum tests pfrac14 097 and 092 respectively) R2 valueswere 030 075 and 078 for two- three- and four-fragment trials respectively The two-fragment R2

value was low because the plot of performance againstasynchrony was almost flat The results of thisexperimental variant suggest that performance at long

SOAs in Experiment 1 was close to asymptote and thatthe uncertainty reduction model can extrapolate totwo-fragment trials

The overall performance in Experiment 2B (as wellas in the previous variant) was lower than inExperiment 1 (cf Figure 5 vs Figure 4) due to theinclusion of more difficult line drawings in the imagesets (Figure 5B) Considering only the trials thatincluded the same types of images across experiments(silhouette images) raised performance to levels com-parable to the ones in Experiment 1 without signif-icantly changing s0 R2 values for the uncertaintyreduction model were 064 and 081 for three- and four-fragment trials respectively The integration windowfrom the uncertainty reduction model in Experiment 2B

Figure 5 Mean performance across all subjects in Experiment 2

A Variant A (n frac14 17 subjects) which added a two-fragment

condition (circles) and data at 7059 ms SOA B Variant B (n frac1417 subjects) which controlled for influence of trial duration and

stimulus repetition (see Results and Methods) Conventions are

otherwise as in Figure 4

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 9

(s0 frac14 123 6 20 ms) was indistinguishable from thatobtained in Experiment 1 (Wilcoxon rank sum test pfrac14079) The other model parameters were ufrac14 030 6002 B0 frac14 0990 6 0003 and R0 frac14 0064 6 0009 Thesimilar s0 values suggest that the results of Experiment1 cannot be ascribed to differences in trial duration orslight familiarity with the images shown

Discussion

We measured the performance of 50 subjects indifferent variants of an experiment to evaluate theeffect of presenting object fragments with varyingdegrees of asynchrony By applying this externallyinduced temporal jitter we characterized the impor-tance of temporal precision for visual object recogni-tion

Subjects might in principle have made educatedguesses based on the content of individual fragmentsor combined such guesses independently across multi-ple fragments Instead we found that performance wassignificantly higher than predicted by independentprobability summation indicating that informationfrom different fragments was synergistically integratedRecognition performance decreased with increasingSOA and was well characterized by a simple modelbased on the reduction of uncertainty as a function ofthe temporal proximity of the presented image parts Inprinciple such a model could be realized by neuronswhose firing rate increases with the arrival of each newburst of information and then relaxes more slowlytowards baseline Neurons showing such firing rateincreases upon transient stimulus onset and decaydynamics over hundreds of milliseconds have beendescribed in recordings in the lateral intraparietalcortex during a motion discrimination task (Huk ampShadlen 2005)

While we focused on the uncertainty reductionmodel we also fit the data from Experiment 1 with amodel based on signal detection theory These twomodels reflect different paradigms of psychologicalprocessing In the former each additional piece ofinformation (provided by each successive fragment)multiplicatively reduces the subjectrsquos uncertainty aboutwhat was shown increasing the probability of giving acorrect answer In the latter each additional fragmentadds to the distance between the actual category andthe other categories in an implicit psychological spaceagain increasing the probability of discriminating thetrue category from the others There are importantdifferences between these paradigms (Swets 1961)though either could be realized via the integrativeneurons hypothesized above The uncertainty reductionmodel might be implemented by a set of downstream

neurons that act as thresholds on the evidenceaccumulated by the integrative neurons the firstdownstream neuron to cross threshold could report theidentity of the viewed image Support for one oranother image would manifest as increased firing ratesin integrative neurons feeding into the associateddownstream neurons increasing its chance of being thefirst to cross threshold The signal detection theorymight instead reflect a set of downstream neuronswhose activity reflected which integrative neuron wasmost active in a winner-take-all fashion The integra-tive neurons would then essentially be counting votesUltimately given that the two models yielded indis-tinguishable estimates of the window of temporalintegration it is difficult to draw any conclusions aboutwhich paradigm is more likely to be correct based onthe current data

The results demonstrated that asynchronies of a fewtens of milliseconds disrupted categorization perfor-mance This disruption is consistent with a body ofneurophysiological observations and computationalmodels of object recognition that are based on a rapidcascade of processing events along the ventral visualstream (DiCarlo et al 2012 Fukushima 1980 Hung etal 2005 Johnson amp Olshausen 2003 Kirchner ampThorpe 2006 Liu et al 2009 Potter amp Levy 1969Richmond et al 1990 Riesenhuber amp Poggio 2000Rolls 1991 Serre et al 2007 Thorpe et al 1996vanRullen amp Thorpe 2002) The response latenciesacross the ventral stream consistently increase by about10 to 20 ms at each stage from the retina to thethalamus to a cascade of cortical areas from primaryvisual cortex to inferior temporal cortex (Maunsell1987 Schmolesky et al 1998) The rapidity with whichvisual shape information progresses through the ventralvisual cortex and is converted into behavioral orperceptual output is consistent with the currentobservation that object recognition begins to sufferimpairment with even minor disruptions of stimulustiming

In addition to the impairment in shape recognitionthrough spatial integration due to temporal asynchro-ny timing judgments can be influenced by spatial cuesSpatial grouping cues that encourage the bindingtogether of separate parts can interfere with simulta-neity judgments at a time scale similar to that reportedhere (Cheadle et al 2008) Spatial judgments areinfluenced by temporal constraints and temporaljudgments are influenced by spatial constraints

While temporal asynchrony disrupted recognitionwe observed that performance was well above asymp-totic levels even at SOAs of approximately 100 msIntegration at these SOAs cannot be ascribed to trialduration eye movements remembering the stimulifrom previous trials or forgetting the fragments shownwithin a trial (Figure 5) Performance values at SOAs

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 10

of approximately 100 ms are consistent with theextended durations of responses in inferior temporalcortex (De Baene et al 2007) These extended responsedurations might instantiate a buffer allowing theintegration of visual shape information over time Theability to integrate visual information over brief spanscould underlie many critical visual functions Forexample percepts of camouflaged or partially occludedobjects could be assembled over time as the objectsmoved relative to their background or occluders(Nishida 2004) and information could be integratedacross saccades (Irwin Yantis amp Jonides 1983Jonides Irwin amp Yantis 1982 OrsquoRegan amp Levy-Schoen 1983)

The observed integration of form information over 100 ms is unlikely to reflect working memoryprocesses as the temporal scales involved in workingmemory span several seconds (Vogel Woodman ampLuck 2006) There is however sufficient time betweenstimulus presentation and the response period that the(integrated) image or category information could beheld in working memory The window of temporalintegration reported here has been observed in otherdomains and with other experimental paradigms(Caudek Domini amp Di Luca 2002 Forget Buiatti ampDehaene 2009 Nishida 2004) In particular perfor-mance that exceeds probability summation at shortSOAs is consistent with the temporal scales involved iniconic memory (Coltheart 1980 Di Lollo 1977Eriksen amp Collins 1968 Hogben amp Di Lollo 1974Sperling 1960) Prior work has shown that informationabout arrays of letters (Sperling 1960) or flashingsquares (Hogben amp Di Lollo 1974) remains accessiblefor a short time after presentation It is possible that theintegration we observed at short SOAs in the objectrecognition system arises from mechanisms similar tothose underlying the previously reported persistence ofsimple visual lsquolsquoiconsrsquorsquo though iconic memory has beenshown to be disrupted by masks (Di Lollo Clark ampHogben 1988) Visual persistence as instantiated bythe dynamics of retinal cells is generally terminated bya mask it is more likely that we are measuring neuralor informational persistence at a higher level of thevisual system (Coltheart 1980) While the dynamicnoise in the stimulus sequence could partly mask visualpersistence it certainly does not completely obliterateit Hence the integration constants reported hereconstitute an upper bound and it is conceivable thatvisual information could decay even more rapidly thanreported here

Two additional phenomena share similar character-istics with the dynamic rate of decay reported hereIntegration masking in which irrelevant informationpresented immediately before or after the presentationof a target inhibits perception of that target by addingnoise to it exhibits a similar time scale to that described

here and may reflect a similar process (Enns amp DiLollo 2000) Priming (in which a stimulus facilitatesthe perception or naming of a later stimulus) might alsoplay a part in these observations Priming would likelyincrease performance at longer SOA positive effects ofpriming typically are small or nonexistent at SOA closeto zero and increase with increasing SOA (La HeijDirkx amp Kramer 1990 Scharlau amp Neumann 2003)

A series of elegant studies has shown that detect-ability and perception can be influenced by the exacttime at which a stimulus is presented with respect toongoing endogenous oscillations (Rohenkohl amp Nobre2011 Van Dijk Schoffelen Oostenveld amp Jensen2008) In the context of the task presented here it isconceivable that the initial dynamic noise could resetsuch endogenous oscillations and that certain presen-tation times and SOAs could lead to enhancedrecognition by virtue of their specific phase The limitednumber and uneven sampling of SOAs precludes asystematic investigation of these effects here but it willbe interesting in future studies to examine thephysiological signals underlying recognition of asyn-chronously presented objects

Several studies have shown interactions betweenasynchronous parts of faces (Anaki et al 2007 Cheunget al 2011 Singer amp Sheinberg 2006) those findingsare better described in terms of higher level phenomenathan object recognition Complex images have alsobeen used in a study of subjective simultaneity (Loftusamp Hanna 1989) in which subjects were asked to ratehow complete or integrated images appeared to bewhen divided into two parts presented with varyingdurations and asynchronies Here we expand upon thisbody of previous work by showing that complex shapeinformation can be combined across both space andtime to enable object recognition Our results wereconsistent across a battery of controls designed toaccount for eye movements stimulus familiarity andmemory on longer time scales

Conclusions

Integration across image fragments begins to breakdown when as little as 30 ms separates the fragmentsand decays with a time constant of approximately 100ms The current characterization of the time coursewith which visual object recognition breaks down as afunction of temporal asynchrony provides informationcritical to building computer vision systems thatoperate beyond static snapshots to modeling visualobject recognition to designing experiments that probethe performance over time of the human visual systemand to the construction of theories of perception in adynamic world

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 11

Keywords object recognition temporal integrationfragmented images temporal sensitivity visual dynamics

Acknowledgments

The authors thank Neal Dach and Joanna Li forassistance with stimulus preparation and data collec-tion and John Maunsell Thomas Miconi and HanlinTang for their insightful comments on the manuscriptThis work was supported by NSF grants 0954570 andCCF-1231216 (GK)

Commercial relationships noneCorresponding author Jedediah Miller SingerEmail jedediahsingerchildrensharvardeduAddress Department of Ophthalmology BostonChildrenrsquos Hospital Harvard Medical School BostonMA USA

References

Akaike H A (1974) A new look at the statisticalmodel identification IEEE Transactions on Auto-matic Control 19 716ndash723

Anaki D Boyd J amp Moscovitch M (2007)Temporal integration in face perception Evidenceof configural processing of temporally separatedface parts Journal of Experimental PsychologyHuman Perception and Performance 33(1) 1ndash19doi 2007-01135-001 [pii] 1010370096-15233311

Aspell J E Wattam-Bell J amp Braddick O (2006)Interaction of spatial and temporal integration inglobal form processing Vision Research 46(18)2834ndash2841 doi S0042-6989(06)00118-0 [pii] 101016jvisres200602018

Bair W amp Movshon J A (2004) Adaptive temporalintegration of motion in direction-selective neuronsin macaque visual cortex Journal of Neuroscience24(33) 7305ndash7323 doi101523JNEUROSCI0554-042004 24339305 [pii]

Brainard D H (1997) The Psychophysics ToolboxSpatial Vision 10 433ndash436

Brockmole J R Wang R F amp Irwin D E (2002)Temporal integration between visual images andvisual percepts Journal of Experimental Psycholo-gy Human Perception and Performance 28(2) 315ndash334

Burnham K P amp Anderson D (2002) Modelselection and multi-model inference (2nd ed) NewYork Springer

Burr D C amp Santoro L (2001) Temporal integra-tion of optic flow measured by contrast andcoherence thresholds Vision Research 41(15)1891ndash1899 doi S0042-6989(01)00072-4 [pii]

Caudek C Domini F amp Di Luca M (2002) Short-term temporal recruitment in structure frommotion Vision Research 42 10

Cavanagh P Holcombe A O amp Chou W (2008)Mobile computation Spatiotemporal integration ofthe properties of objects in motion Journal ofVision 8(12)1 1ndash23 httpwwwjournalofvisionorgcontent8121 doi1011678121 [PubMed][Article]

Cheadle S Bauer F Parton A Muller H BonnehY S amp Usher M (2008) Spatial structure affectstemporal judgments Evidence for a synchronybinding code Journal of Vision 8(7)12 1ndash12httpjournalofvisionorgcontent8712 doi1011678712 [PubMed] [Article]

Cheung O S Richler J J Phillips W S ampGauthier I (2011) Does temporal integration offace parts reflect holistic processing PsychonomicBulletin and Review 18(3) 476ndash483 doi103758s13423-011-0051-7

Clifford C W Holcombe A O amp Pearson J (2004)Rapid global form binding with loss of associatedcolors Journal of Vision 4(12)8 1090ndash1101 httpwwwjournalofvisionorgcontent4128 doi1011674128 [PubMed] [Article]

Coltheart M (1980) Iconic memory and visiblepersistence Attention Perception amp Psychophysics27(3) 183ndash228 doi103758bf03204258

Connor C E Brincat S L amp Pasupathy A (2007)Transformation of shape information in the ventralpathway Current Opinion in Neurobiology 17(2)140ndash147

Cornelissen F W Peters E M amp Palmer J (2002)The Eyelink Toolbox Eye tracking with MATLABand the Psychophysics Toolbox Behavioral Re-search Methods Instruments and Computers 34 4

De Baene W Premereur E amp Vogels R (2007)Properties of shape tuning of macaque inferiortemporal neurons examined using rapid serialvisual presentation Journal of Neurophysiology97(4) 2900ndash2916 doi 007412006 [pii] 101152jn007412006

DeCarlo L T (2012) On a signal detection approachto m-alternative forced choice with bias withmaximum likelihood and bayesian approaches toestimation Journal of Mathematical Psychology56(3) 196ndash207

Deco G amp Rolls E T (2004) Computational

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 12

neuroscience of vision Oxford UK Oxford Uni-versity Press

Di Lollo V (1977) Temporal characteristics of iconicmemory Nature 267(5608) 241ndash243

Di Lollo V Clark C D amp Hogben J H (1988)Separating visible persistence from retinal afterim-ages Perception and Psychophysics 44(4) 363ndash368

DiCarlo J J Zoccolan D amp Rust N C (2012)How does the brain solve visual object recognitionNeuron 73(3) 415ndash434 doi S0896-6273(12)00092-X [pii] 101016jneuron201201010

Douglas R J amp Martin K A (2004) Neuronalcircuits of the neocortex Annual Review of Neuro-science 27 419ndash451

Enns J T amp Di Lollo V (2000) Whatrsquos new in visualmasking Trends in Cognitive Sciences 4(9) 345ndash352

Eriksen C W amp Collins J F (1968) Sensory tracesversus the psychological moment in the temporalorganization of form Journal of ExperimentalPsychology 77(3) 376ndash382

Felleman D J amp Van Essen D C (1991) Distributedhierarchical processing in the primate cerebralcortex Cerebral Cortex 1 1ndash47

Forget J Buiatti M amp Dehaene S (2009) Temporalintegration in visual word recognition Journal ofCognitive Neuroscience 22(5) 14

Fukushima K (1980) Neocognitron A self organizingneural network model for a mechanism of patternrecognition unaffected by shift in position Biolog-ical Cybernetics 36(4) 193ndash202

Hogben J H amp Di Lollo V (1974) Perceptualintegration and perceptual segregation of briefvisual stimuli Vision Research 14(11) 1059ndash1069

Huk A C amp Shadlen M N (2005) Neural activity inmacaque parietal cortex reflects temporal integra-tion of visual motion signals during perceptualdecision making The Journal of Neuroscience25(45) 16

Hung C P Kreiman G Poggio T amp DiCarlo J J(2005) Fast read-out of object identity frommacaque inferior temporal cortex Science 310863ndash866

Irwin D E Yantis S amp Jonides J (1983) Evidenceagainst visual integration across saccadic eyemovements Perception and Psychophysics 34(1)49ndash57

Johnson J S amp Olshausen B A (2003) Timecourseof neural signatures of object recognition Journalof Vision 3(7)4 499ndash512 httpwwwjournalofvisionorgcontent374 doi101167374 [PubMed] [Article]

Jonides J Irwin D E amp Yantis S (1982)Integrating visual information from successivefixations Science 215(4529) 192ndash194

Kirchner H amp Thorpe S J (2006) Ultra-rapid objectdetection with saccadic eye movements Visualprocessing speed revisited Vision Research 46(11)1762ndash1776 doi S0042-6989(05)00511-0 [pii] 101016jvisres200510002

La Heij W Dirkx J amp Kramer P (1990)Categorical interference and associative priming inpicture naming British Journal of Psychology81(4) 511ndash525

Liu H Agam Y Madsen J R amp Kreiman G(2009) Timing timing timing Fast decoding ofobject information from intracranial field poten-tials in human visual cortex Neuron 62(2) 281ndash290

Loftus G R amp Hanna A M (1989) The phenom-enology of spatial integration Data and modelsCognitive Psychology 21(3) 363ndash397 doi0010-0285(89)90013-3 [pii]

Logothetis N K amp Sheinberg D L (1996) Visualobject recognition Annual Review of Neuroscience19 577ndash621

Maunsell J H R (1987) Physiological evidence fortwo visual subsystems In L Vaina (Ed) Mattersof intelligence (pp 59ndash87) Dordrecht The Neth-erlands Reidel Press

Neri P Morrone M C amp Burr D C (1998) Seeingbiological motion Nature 395(6705) 894ndash896 doi10103827661

Nishida S (2004) Motion-based analysis of spatialpatterns by the human visual system CurrentBiology 14 830ndash839

OrsquoRegan J K amp Levy-Schoen A (1983) Integratingvisual information from successive fixations Doestrans-saccadic fusion exist Vision Research 23(8)765ndash768

Pasupathy A amp Connor C E (1999) Responses tocontour features in macaque area V4 Journal ofNeurophysiology 82(5) 2490ndash2502

Pelli D G (1997) The VideoToolbox software forvisual psychophysics Transforming numbers intomovies Spatial Vision 10 437ndash442

Potter M C amp Levy E (1969) Recognition memoryfor a rapid sequence of pictures Journal ofExperimental Psychology 81(1) 10ndash15

Richmond B J Optican L M amp Spitzer H (1990)Temporal encoding of two-dimensional patterns bysingle units in primate primary visual cortex IStimulus-response relations Journal of Neurophys-iology 64(2) 351ndash369

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 13

Riesenhuber M amp Poggio T (2000) Models of objectrecognition Nature Neuroscience 3(Suppl) 1199ndash1204

Ringach D L Hawken M J amp Shapley R (2003)Dynamics of orientation tuning in macaque V1The role of global and tuned suppression Journalof Neurophysiology 90(1) 342ndash352 doi101152jn010182002 010182002 [pii]

Rohenkohl G amp Nobre A C (2011) Alphaoscillations related to anticipatory attention followtemporal expectations The Journal of Neurosci-ence 31(40) 14076ndash14084

Rolls E T (1991) Neural organization of highervisual functions Current Opinion in Neurobiology1 274ndash278

Sanocki T (2001) Interaction of scale and time duringobject identification Journal of Experimental Psy-chology Human Perception and Performance 27(2)290ndash302

Scharlau I amp Neumann O (2003) Temporalparameters and time course of perceptual latencypriming Acta Psychologica 113(2) 185ndash203

Schmolesky M T Wang Y C Hanes D PThompson K G Leutgeb S Schall J D Leventhal A G (1998) Signal timing across themacaque visual system Journal of Neurophysiology79(6) 3272ndash3278

Schyns P G Petro L S amp Smith M L (2007)Dynamics of visual information integration in thebrain for categorizing facial expressions CurrentBiology 17(18) 1580ndash1585 doi S0960-9822(07)01904-5 [pii] 101016jcub200708048

Serre T Kreiman G Kouh M Cadieu CKnoblich U amp Poggio T (2007) A quantitativetheory of immediate visual recognition Progress inBrain Research 165C 33ndash56

Singer J M amp Sheinberg D L (2006) Holisticprocessing unites face parts across time VisionResearch 46(11) 1838ndash1847 doi S0042-6989(05)00563-8 [pii] 101016jvisres200511005

Sperling G (1960) The information available in briefvisual presentations Washington DC AmericanPsychological Association

Swets J A (1961) Is there a sensory thresholdScience 134(3473) 168ndash177

Tanaka K (1996) Inferotemporal cortex and objectvision Annual Review of Neuroscience 19 109ndash139

Thorpe S Fize D amp Marlot C (1996) Speed ofprocessing in the human visual system Nature 381520ndash522

Van Dijk H Schoffelen J M Oostenveld R ampJensen O (2008) Prestimulus oscillatory activity inthe alpha band predicts visual discriminationability The Journal of Neuroscience 28(8) 1816ndash1823

vanRullen R amp Thorpe S J (2002) Surfing a spikewave down the ventral stream Vision Research 422593ndash2615

Vogel E K Woodman G F amp Luck S J (2006)The time course of consolidation in visual workingmemory Journal of Experimental PsychologyHuman Perception and Performance 32(6) 1436ndash1451 doi1010370096-15233261436

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 14

  • Introduction
  • Methods
  • f01
  • f02
  • e01
  • e02
  • Results
  • f03
  • e03
  • f04
  • e04
  • f05
  • Discussion
  • Conclusions
  • Akaike1
  • Anaki1
  • Aspell1
  • Bair1
  • Brainard1
  • Brockmole1
  • Burnham1
  • Burr1
  • Caudek1
  • Cavanagh1
  • Cheadle1
  • Cheung1
  • Clifford1
  • Coltheart1
  • Connor1
  • Cornelissen1
  • DeBaene1
  • DeCarlo1
  • Deco1
  • DiLollo1
  • DiLollo2
  • DiCarlo2
  • Douglas1
  • Enns1
  • Eriksen1
  • Felleman1
  • Forget1
  • Fukushima1
  • Hogben1
  • Huk1
  • Hung1
  • Irwin1
  • Johnson1
  • Jonides1
  • Kirchner1
  • LaHeij1
  • Liu1
  • Loftus1
  • Logothetis1
  • Maunsell1
  • Neri1
  • Nishida1
  • ORegan1
  • Pasupathy1
  • Pelli1
  • Potter1
  • Richmond1
  • Riesenhuber1
  • Ringach1
  • Rohenkohl1
  • Rolls1
  • Sanocki1
  • Scharlau1
  • Schmolesky1
  • Schyns1
  • Serre1
  • Singer1
  • Sperling1
  • Swets1
  • Tanaka1
  • Thorpe1
  • VanDijk1
  • vanRullen1
  • Vogel1
Page 4: Short temporal asynchrony disrupts visual object recognition · Short temporal asynchrony disrupts visual object recognition Jedediah M. Singer $ Department of Ophthalmology, Boston

chosen in each trial Depending on the configuration ofthe three squares each fragment spanned a minimumof 478 and a maximum of 938 Spatial integration of allof the information in four fragments required inte-grating over 938 Each fragment was present for 118ms (two monitor refreshes) Trials containing onethree or four fragments were randomly interleavedWhen more than one fragment was used in a trial theycontained nonoverlapping parts of the source imageFragments were sequentially presented with the firstfragment appearing 500 ms after the noise beganSuccessive fragments were presented with a stimulusonset asynchrony (SOA) selected at random from thefollowing list 0 ms (synchronous presentation) 118 ms(one fragmentrsquos offset coincided with the subsequentfragmentrsquos onset) 177 ms 294 ms 471 ms 706 ms100 ms 1471 ms 2059 ms 2941 ms The SOA valuewas constant within a trial and the different SOAs wererandomly interleaved across trials (Movie 1) We alsoincluded randomly interleaved catch trials in which 1516 of an image was shown for 588 ms The catch trialsserved to ensure that subjects were engaged in the taskand understood the instructions subjects who scoredbelow 80 on catch trials or below 13 overall wereexcluded (criteria established a priori all subjectspassed in Experiment 1)

The flickering noise continued while the fragmentswere displayed and for 500 ms after the last fragmentappeared at which point it was replaced by a blankgray screen containing the text lsquolsquoPlease indicate whichkind of picture was hidden in the cloudsrsquorsquo and fouroptions were shown corresponding to the four game-pad buttons used for lsquolsquoAnimalsrsquorsquo lsquolsquoPeoplersquorsquo lsquolsquoVehiclesrsquorsquoand lsquolsquoPlantsrsquorsquo This choice screen remained until thesubject pressed a button Performance is reported asmean 6 SEM throughout the manuscript A high-contrast mask (one of the noise images selected atrandom with its contrast maximized) then appearedfor one monitor refresh followed by the fixation crossfor the next trial Trials in which the subject lookedaway from the stimulus or blinked were excluded fromanalyses No feedback was given except betweenblocks when the overall performance for that blockwas displayed

Stimulus presentation Experiment 2 variant A

Experiment 2A addressed two questions First canthe uncertainty reduction model accurately describe thesimplest possible asynchronous trials containing onlytwo fragments Second while the performance curvesfrom Experiment 1 appeared to be near asymptote by2941 ms SOA do the results hold at very long SOAImages were drawn from a library of 364 line drawingsand silhouettes which partially overlapped with that

used in Experiment 1 Aside from the different imageset and the additional conditions the experimentparameters were identical to those in Experiment 1Twenty-one new subjects (13 female age 19ndash51 meanage 27) participated in this experiment We excludedfrom analyses one subject due to low performance oncatch trials ( 08 correct) and three subjects due to lowoverall performance ( 13 correct) this exclusion didnot significantly change the overall results In thisvariant we had eye tracking available for 10 of the 17subjects whose performance exceeded threshold (theeye tracker was unavailable for five subjects andcalibration was unreliable for two subjects) Catchtrials were included in six subjects There were 36blocks of 36 or 37 trials depending on whether catchtrials were included

Stimulus presentation Experiment 2 variant B

In Experiment 1 longer SOAs and more fragmentsboth led to longer trials These differences might haveinfluenced the observed results (eg forgetting infor-mation over the course of a second or more may lead tolower performance in longer trials) To control for thispossibility we fixed the total trial duration in a secondvariant experiment Moreover during the approxi-mately 2-hr duration of each run of Experiment 1 each116 of each image was presented an average of 34times To ensure that this repetition did not lead to rotememorization or long-term learning effects this secondvariant also included a larger image library so that

Figure 2 Mean performance across all 16 subjects in

Experiment 1 in trials with zero SOA Error bars indicate

standard errors of the mean (SEM) All presentations lasted for

118 ms except the catch trials which lasted 588 ms The

dashed line indicates chance performance

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 4

each 116 of each image was presented to each subjectan average of 05 times There were 36 blocks of 34trials SOA values were drawn from the same list as inExperiment 1 Two-fragment trials were also presentedbut not included in the reported analyses to make theresults more directly comparable to those in Experi-ment 1 Adding the data from the two-fragment trialsdid not change the conclusions One of the two keydifferences in this variant is that the stream of flickeringnoise in each trial lasted for 18824 ms (for all SOAvalues and number of fragments)mdashlong enough toaccommodate a trial with four fragments at the longestasynchrony used (2941 ms) followed by 500 ms ofnoise Trials with shorter asynchronies or fewerfragments concluded with a longer period of uninfor-mative noise In this way the time interval between trialonset and behavioral response was identical in all trialsThe second key difference in this variant is that imageswere drawn at random from a library of 1200 linedrawings and silhouettes a superset of those used inExperiments 1 and 2A This means that each 116 ofeach image had an approximately 50 chance of beingpresented to a given subject Eighteen new subjects (13female age 18ndash35 mean age 24) participated inExperiment 2B one subject was excluded from analysesdue to low performance on catch trials ( 08 correct)We had reliable eye tracking in 13 of the 17 subjectswhose performance exceeded threshold

Data analyses

Diagnostic image parts

Even though we took several precautions to makeimages homogeneous in terms of basic low-levelproperties (see above) some images were easier tocategorize than others and parts of some images weremore informative than others If part of an image isreliably diagnostic subjects may be able to perform thetask based only on that part regardless of any otherinformation presented This would confound the studyof temporal integration there might be no contributionfrom the other fragments presented and thus SOAwould be irrelevant To mitigate this effect weinvestigated how informative each part of each imagewas Each square j (j frac14 1 16 for each image) wasassigned a lsquolsquodiagnosticityrsquorsquo score DSj defined as

DSj frac14

Xt

cethtTHORN sethtTHORN pethtTHORNaethtTHORN

Xt

sethtTHORN pethtTHORNaethtTHORN

eth1THORN

Here t ranges over all trials for all subjects andacross all experiments in which square j was present Ifthe category reported in a given trial t was correct c(t)

is 1 otherwise c(t) is 0 The overall performance of thesubject who performed trial t is p(t) and that subjectrsquosoverall performance on trials with the same number offragments and the same asynchrony as trial t is given bya(t) Finally s(t) is the number of 116 squares of theimage present overall in trial t (that is three times thenumber of fragments in trial t) This score indicateshow reliably each particular 116 square led to correctclassification across all subjects We excluded trials inwhich any of the 116 squares shown had a diagnos-ticity score greater than 08 This excluded an averageof 337 6 03 of trials from Experiment 1 and172 6 03 and 231 6 03 from Experiments2A and 2B respectively Including these trials does notchange the conclusions reported here The uncertaintyreduction modelrsquos window of integration for individualsubjects (see below) when including all trials changedby only 13 6 3 Including trials containingdiagnostic squares does however raise performancedisproportionally at longer asynchronies due to thegreater number of trials in which a single fragment issufficient for categorization In decreasing the dynamicrange of performance the variances of the model fitsare increased

Probability summation

Assuming independent responses to each fragmentone may predict performance in trials with multiplefragments from the performance observed in single-fragment trials y1 We assume that the subject actuallyknew the correct image category in some fraction i ofthe trials and guessed at chance in the rest so that y1frac14i thorn (1 ndash i)4 (y1 frac14 14 if i frac14 0 and y1 frac14 1 if i frac14 1) Thechance that the subject fails to discern the categoryfrom a trial with n fragments is (1 i)n and the finalperformance predicted by probability summation isthen

1 eth1 iTHORNn thorn eth1 iTHORNn

4frac14 1 3eth1 iTHORNn

4 eth2THORN

Note that even for small values of i this expressioncan be appreciably higher than chance (025) Forexample a single fragment performance of y1frac14 030 isobtained from i frac14 00667 and this leads to an overallperformance of 043 when nfrac14 4 fragments We cannotassume independence however because multiple frag-ments from a single image contain correlated infor-mation and so this independent probability summationprediction overestimates expected performance Ob-serving the opposite pattern ie exceeding theperformance predicted by independent probabilitysummation is thus a particularly strong indicator ofsynergistic spatiotemporal integration of informationfrom multiple fragments

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 5

Results

We evaluated the sensitivity to temporal imprecisionduring visual object recognition by quantifying cate-gorization of asynchronously presented object frag-ments (Methods Figure 1)

Experiment 1

Overall performance in the task was above chance(chancefrac14 025) for all subjects numbers of fragmentsand SOA values Averaged across all SOA values andnumbers of fragments performance ranged from 037to 077 (060 6 003 mean 6 SEM) Performanceincreased as the number of fragments shown increased(Figure 2) All 16 subjectsrsquo performance was better forthe four-fragment condition than the three-fragmentcondition significantly so for 13 subjects (chi-squared

test p 005 compare triangles versus squares inFigure 3) Catch trial performance was essentially atceiling (097 6 001) Single-fragment performance wasslightly above chance levels (033 6 001) Overallperformance varied slightly but significantly by cate-gory (chi-squared test p 108 animals 059 people060 plants 054 vehicles 057) In sum all subjectswere able to correctly perform the task in spite of therapid presentations and fragmented objects

If subjects performed the task by independentlyevaluating each fragment the performance in single-fragment trials would be predictive of overall perfor-mance in trials with multiple fragments via independentprobability summation (Methods) Different fragmentsfrom a given image contain redundant informationwhich would bias an independent probability summa-tion prediction towards higher values Yet performanceat all SOAs was higher than that predicted byprobability summation dropping at the longest SOA tovalues consistent with probability summation (at SOA

Figure 3 Individual subjectsrsquo performance in Experiment 1 Error bars indicate Clopper-Pearson 95 confidence intervals The x axis

indicates SOA and the y axis indicates the average fraction correct performance for the three-fragment condition (triangles) four-

fragment condition (squares) and single fragment condition () Chance performance is 025 and the performance in catch trials was

097 6 001 The dashed lines show the fits from the uncertainty reduction model (Text) The mean value of the integration window

indicated by the uncertainty reduction mode is indicated by a vertical dotted line

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 6

frac14 2941 ms 046 6 003 observed versus 044 6 004predicted and 050 6 003 observed versus 049 6 004predicted for three- and four-fragment conditionsrespectively Figure 4) Hence independent probabilitysummation is not a good model to describe overallperformance in this task That performance at all butthe longest asynchronies exceeded the independentprobability summation prediction despite the redun-dancy of the fragments within each trial suggests visualintegration which we quantify below

Categorization performance showed a strong de-crease with increasing values of SOA both for three-fragment trials and four-fragment trials (Figures 3ndash4)Performance declined with increasing temporal asyn-chrony for all 16 subjects (Figure 3) This change inperformance was significant for 11 and 14 subjects inthree-fragment and four-fragment trials respectively(chi-squared test p 005) The performance decre-ment between simultaneous presentations and asyn-chronous trials first reached significance at an SOA of294 ms (sign test pfrac14 005) with a difference of 23 6

14 (Figure 4) While a slight improvement inperformance was apparent in the mean values at 118ms SOA relative to simultaneous presentations it wasnot statistically significant (sign test p frac14 060) Theseobservations show that temporal imprecision of even 30ms disrupts recognition performance

To quantify performance (y) as a function of SOA(denoted A below) we started by performing a least-squares fit of an exponential curve to the data fromeach subject for each number of fragments

yethATHORN frac14 Bthorn R 2A=s eth3THORNThe three parameters were the time constant s the

baseline value Bfrac14 y(lsquo) and the range in performanceRfrac14 y(0) y(lsquo) where y(0) represents the performanceat 0 ms SOA and y(lsquo) denotes the asymptoticperformance at infinite SOA We adjusted these threeparameters (B R s) to minimize the squared errorbetween y(A) and the observed data y(A) Theexponential curves provided a reasonable fit to the data(root-mean-squared error RMSE of 0058 6 0004and 0045 6 0003 for three-fragment and four-fragment data respectively) As a measure of thewindow of integration we considered s the SOA atwhich the exponential curve achieved its half heightThe mean value of s across 16 subjects was 120 6 24ms and 104 6 19 ms in three-fragment and four-fragment trials respectively Thus shape integration atspatial scales of at least approximately 58 remainsevident over a time scale on the order of 100 ms

The exponential fits described the shapes of each ofthe response curves and experimental parametersseparately We sought a unifying model that couldcapture all the different experimental conditions andcould be realized by the firing rates of neurons Asdiscussed above probability summation failed inassuming that multiple fragments contribute to per-ception independently To include the interdependenceamong different fragments and to account for theincreasing number of possible two-fragment interac-tions as fragment count increases we developed amodel based on the notion that pairs of fragmentsinteract to reduce uncertainty about the image as anexponential function of their SOA In this model wedescribed a subjectrsquos observed performance in terms ofan underlying uncertainty u which is reduced byinteractions between presented fragments As abovewe assume that the subject actually knew the correctimage category in some fraction i of the trials andguessed at chance in the rest so that y(A)frac14 ithorn [(1 ndash i)4] The empirical uncertainty is then 1ndashi the chancethat the subject did not know the image category Weassume that a second fragment reduces the underlyinguncertainty u by some factor f(A) that depends on theasynchrony A with which it is presented relative to thefirst A third fragment reduces uncertainty by anotherfactor of f(A) and also by a factor of f(2A) because it ispresented A ms after the second fragment and 2A msafter the first fragment We assume independence sothe total uncertainty with three fragments is u f2(A)f(2A) Similarly the total uncertainty in a four-fragment trial is u f3(A) f2(2A) f(3A) If we denotethe total uncertainty with a particular number offragments as ursquo the predicted performance is then 1u0 thorn (u04) Note that u is conceptually related toperformance in single fragment trials but is not directlymathematically relatable The value of u also incorpo-

Figure 4 Summary of results from Experiment 1 Performance is

averaged across nfrac14 16 subjects (individual performance shown

in Figure 3) Conventions are the same as in Figure 3 horizontal

dashed lines indicate the performance level predicted by

probability summation based on single-fragment performance

Error bars show SEM across subjects

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 7

rates the fact that different fragments from a singleimage are not independent but that this dependency isfactored out in the consideration of pairs of fragmentsvia f(A)

For each subject we used their empirical perfor-mance to calculate the empirical uncertainty 1ndashi at eachSOA and each number of fragments We then foundthe least-squares best fits for the uncertainty reductionfunction f(A) and the underlying uncertainty u that wasmodified by f(A) We considered for f(A) exponentialfunctions of the form f(A)frac14 B0 R02As0 Here B0 frac14f(lsquo) is the asymptote which determines what might becalled lsquolsquocognitiversquorsquo or lsquolsquomemory-basedrsquorsquo integration theability to combine information from multiple frag-ments at an arbitrarily long SOA R0frac14 f(lsquo) f(0) is therange of uncertainty reduction spanned by f whichdescribes the benefit of presenting visual informationsimultaneously rather than with very large SOAFinally s0 is the time constant which describes theSOA at the uncertainty reduction functionrsquos half-height Given the exponential nature of this modelthere is some small amount of integration that persistsat arbitrarily long asynchronies decreasing asymptot-ically towards zero Along with the underlying uncer-tainty u this makes four total parameters to describe asubjectrsquos performance across all SOAs

The uncertainty reduction model fit the data quitewell root-mean-square errors (RMSE) for the model(0061 6 0004 and 0046 6 0002 for three-fragmentand four-fragment trials respectively) were not signif-icantly higher than the corresponding RMSE values forexponential fits to the raw data discussed above (00586 0004 and 0045 6 0003 respectively Wilcoxonrank sum test comparing the two models pfrac14 042 and061 respectively) even though the exponential fits tothe raw data have two more free parameters Giventhose extra parameters the corrected Akaike Infor-mation Criterion (Akaike 1974 Burnham amp Anderson2002) is lower for the uncertainty reduction model in 15out of 16 subjects on average the likelihood of theuncertainty reduction model is 374 6 051 times higherthan that of the exponential fits

We described the window of integration with theuncertainty reduction model exponentialrsquos half heightparameter s0 When we fit the model to the data fromindividual subjects we obtained a mean s0of 97 6 9 ms(Figure 3) In Figure 4 we also made use of the sameframework considering all subjects within the experi-ment together Rather than fitting to each subjectrsquosperformance separately we fit one uncertainty reduc-tion model to all subjectsrsquo data The values of the bestfit parameters for the uncertainty reduction model wereufrac14 043 6 002 B0frac14 099 6 001 R0frac14 015 6 002 ands0 frac14 109 ms R2 values between the data and thepredictions for the population uncertainty reductionmodel were 092 for three-fragment trials and 097 for

four-fragment trials This model succinctly and accu-rately described a subjectrsquos performance across allconditions based solely on the reduction of uncertaintyas a function of SOA The model used fewerparameters to explain a subjectrsquos performance than theset of exponential fits it could generalize acrossdifferent numbers of fragments and it suggested anunderlying psychological interpretation

We further considered a model based on signaldetection theory which assumes a probabilistic frame-work to describe each subjectrsquos performance in eachtrial Previous analyses of psychophysics data (Swets1961) have shown that high-threshold models like theuncertainty reduction model just described sometimesdo not capture subjectsrsquo behavior as well as modelsbased on signal detection theory (SDT) We thereforeconstructed a SDT model to test whether it would givedifferent results from the uncertainty reduction modelWe used a Markov chain Monte Carlo (MCMC)method for calculating d0 (a SDT measure of sensitivityto stimulus identity) with bias in four-alternative forcedchoice tasks (DeCarlo 2012) Models were fit usingJAGS (httpsourceforgenetprojectsmcmc-jags)with 10000 iterations of burn-in followed by 20000iterations to approximate d0 distributions Havingcalculated d0 for each subject each asynchrony andeach number of fragments we proceeded to fit a four-parameter model to each subjectrsquos data Let g be ascaled base-two exponential function with parameter s 00

and scale factor R 00 g(A)frac14R 002As 00 Then g describesthe increase in sensitivity brought about by having twofragments with an asynchrony of A The thirdparameter is v 00 the d0 value in single-fragment trials(constrained to lie within the 95 confidence intervalestimated from the MCMC fit) Finally we assumedthat adding additional fragments to a trial might notaid sensitivity in a perfectly efficient fashion (becausefragments contain correlated information) and wedescribed this inefficiency with a multiplicative factorc 00 that progressively reduced the contributions ofsuccessive fragments The model predicts d0 for nfragments with asynchrony A as follows

d0 frac14Xnifrac141

v 00 c 00i1 thorn ethi 1THORN g

A ethn ithorn 1THORN

eth4THORNWe fit the four parameters of this model using a

least-squares procedure and compared the time con-stant values (s 00) obtained with the SDT model againstthose obtained from the uncertainty reduction model(s0) The values of the best fits for the SDT model wereR 00 frac14 021 6 002 v 00 frac14 042 6 006 c 00 frac14 047 6 009and a cross-subject mean s 00 of 101 6 8 ms Given thatthe SDT results were similar to those of the uncertaintyreduction model and the higher computational re-

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 8

quirements for the SDT model we used only theuncertainty reduction model for the remainder of ouranalyses

As noted above performance differed slightlybetween categories To evaluate whether such inho-mogeneity contributed to the observed integration timeconstants we repeated the analyses after matchingperformance across categories by random subsamplingUnder these performance-matched conditions themean s0 was 95 6 9 ms and the population s0 was 105ms We also analyzed each category and each subjectseparately after matching performance The mean s0 forthe four categories were 125 6 23 ms 144 6 33 ms 896 21 ms and 131 6 26 ms respectively for animalspeople plants and vehicles Note that these fits werenoisier due to having fewer data While subjects coulduse different features to recognize images belonging todifferent categories the results were similar acrosscategories

Experiment 2

We extended the first experiment in two variantsaimed at evaluating two-fragment images and very longSOAs (Experiment 2A) and constant trial lengths andno fragment repetition (Experiment 2B) These exper-iment variants are described under Methods

In Experiment 2A the window of integration s0 was101 6 14 ms when two-fragment trials were included inthe fit and 91 6 13 ms when they were not included(Figure 5A) The other model parameters were ufrac14 0336 002 B0 frac14 0990 6 0003 and R0 frac14 0069 6 0006These values were not significantly different from eachother or from the results in Experiment 1 (Wilcoxonrank sum test pfrac14 055 comparing these two fits pfrac14099 and 068 respectively comparing Experiment 1 tothese results with and without two-fragment trials)Furthermore the model fit using only three- and four-fragment trials was a good fit for the two-fragment data(RMSE of 0091 6 0006 using only three- and four-fragment trials versus 0081 6 0004 using all trialsWilcoxon rank sum test pfrac14 020) This demonstratesthat the uncertainty reduction model was able togeneralize to fragment counts that were not used to fitthe model While performance at 2941 ms was slightlybetter than performance at 7059 ms (differing by 346 17) this difference was not significant (sign test pfrac14 016) RMSE and s0 values were unaffected byincluding the data from 7059 ms trials (Wilcoxon ranksum tests pfrac14 097 and 092 respectively) R2 valueswere 030 075 and 078 for two- three- and four-fragment trials respectively The two-fragment R2

value was low because the plot of performance againstasynchrony was almost flat The results of thisexperimental variant suggest that performance at long

SOAs in Experiment 1 was close to asymptote and thatthe uncertainty reduction model can extrapolate totwo-fragment trials

The overall performance in Experiment 2B (as wellas in the previous variant) was lower than inExperiment 1 (cf Figure 5 vs Figure 4) due to theinclusion of more difficult line drawings in the imagesets (Figure 5B) Considering only the trials thatincluded the same types of images across experiments(silhouette images) raised performance to levels com-parable to the ones in Experiment 1 without signif-icantly changing s0 R2 values for the uncertaintyreduction model were 064 and 081 for three- and four-fragment trials respectively The integration windowfrom the uncertainty reduction model in Experiment 2B

Figure 5 Mean performance across all subjects in Experiment 2

A Variant A (n frac14 17 subjects) which added a two-fragment

condition (circles) and data at 7059 ms SOA B Variant B (n frac1417 subjects) which controlled for influence of trial duration and

stimulus repetition (see Results and Methods) Conventions are

otherwise as in Figure 4

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 9

(s0 frac14 123 6 20 ms) was indistinguishable from thatobtained in Experiment 1 (Wilcoxon rank sum test pfrac14079) The other model parameters were ufrac14 030 6002 B0 frac14 0990 6 0003 and R0 frac14 0064 6 0009 Thesimilar s0 values suggest that the results of Experiment1 cannot be ascribed to differences in trial duration orslight familiarity with the images shown

Discussion

We measured the performance of 50 subjects indifferent variants of an experiment to evaluate theeffect of presenting object fragments with varyingdegrees of asynchrony By applying this externallyinduced temporal jitter we characterized the impor-tance of temporal precision for visual object recogni-tion

Subjects might in principle have made educatedguesses based on the content of individual fragmentsor combined such guesses independently across multi-ple fragments Instead we found that performance wassignificantly higher than predicted by independentprobability summation indicating that informationfrom different fragments was synergistically integratedRecognition performance decreased with increasingSOA and was well characterized by a simple modelbased on the reduction of uncertainty as a function ofthe temporal proximity of the presented image parts Inprinciple such a model could be realized by neuronswhose firing rate increases with the arrival of each newburst of information and then relaxes more slowlytowards baseline Neurons showing such firing rateincreases upon transient stimulus onset and decaydynamics over hundreds of milliseconds have beendescribed in recordings in the lateral intraparietalcortex during a motion discrimination task (Huk ampShadlen 2005)

While we focused on the uncertainty reductionmodel we also fit the data from Experiment 1 with amodel based on signal detection theory These twomodels reflect different paradigms of psychologicalprocessing In the former each additional piece ofinformation (provided by each successive fragment)multiplicatively reduces the subjectrsquos uncertainty aboutwhat was shown increasing the probability of giving acorrect answer In the latter each additional fragmentadds to the distance between the actual category andthe other categories in an implicit psychological spaceagain increasing the probability of discriminating thetrue category from the others There are importantdifferences between these paradigms (Swets 1961)though either could be realized via the integrativeneurons hypothesized above The uncertainty reductionmodel might be implemented by a set of downstream

neurons that act as thresholds on the evidenceaccumulated by the integrative neurons the firstdownstream neuron to cross threshold could report theidentity of the viewed image Support for one oranother image would manifest as increased firing ratesin integrative neurons feeding into the associateddownstream neurons increasing its chance of being thefirst to cross threshold The signal detection theorymight instead reflect a set of downstream neuronswhose activity reflected which integrative neuron wasmost active in a winner-take-all fashion The integra-tive neurons would then essentially be counting votesUltimately given that the two models yielded indis-tinguishable estimates of the window of temporalintegration it is difficult to draw any conclusions aboutwhich paradigm is more likely to be correct based onthe current data

The results demonstrated that asynchronies of a fewtens of milliseconds disrupted categorization perfor-mance This disruption is consistent with a body ofneurophysiological observations and computationalmodels of object recognition that are based on a rapidcascade of processing events along the ventral visualstream (DiCarlo et al 2012 Fukushima 1980 Hung etal 2005 Johnson amp Olshausen 2003 Kirchner ampThorpe 2006 Liu et al 2009 Potter amp Levy 1969Richmond et al 1990 Riesenhuber amp Poggio 2000Rolls 1991 Serre et al 2007 Thorpe et al 1996vanRullen amp Thorpe 2002) The response latenciesacross the ventral stream consistently increase by about10 to 20 ms at each stage from the retina to thethalamus to a cascade of cortical areas from primaryvisual cortex to inferior temporal cortex (Maunsell1987 Schmolesky et al 1998) The rapidity with whichvisual shape information progresses through the ventralvisual cortex and is converted into behavioral orperceptual output is consistent with the currentobservation that object recognition begins to sufferimpairment with even minor disruptions of stimulustiming

In addition to the impairment in shape recognitionthrough spatial integration due to temporal asynchro-ny timing judgments can be influenced by spatial cuesSpatial grouping cues that encourage the bindingtogether of separate parts can interfere with simulta-neity judgments at a time scale similar to that reportedhere (Cheadle et al 2008) Spatial judgments areinfluenced by temporal constraints and temporaljudgments are influenced by spatial constraints

While temporal asynchrony disrupted recognitionwe observed that performance was well above asymp-totic levels even at SOAs of approximately 100 msIntegration at these SOAs cannot be ascribed to trialduration eye movements remembering the stimulifrom previous trials or forgetting the fragments shownwithin a trial (Figure 5) Performance values at SOAs

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 10

of approximately 100 ms are consistent with theextended durations of responses in inferior temporalcortex (De Baene et al 2007) These extended responsedurations might instantiate a buffer allowing theintegration of visual shape information over time Theability to integrate visual information over brief spanscould underlie many critical visual functions Forexample percepts of camouflaged or partially occludedobjects could be assembled over time as the objectsmoved relative to their background or occluders(Nishida 2004) and information could be integratedacross saccades (Irwin Yantis amp Jonides 1983Jonides Irwin amp Yantis 1982 OrsquoRegan amp Levy-Schoen 1983)

The observed integration of form information over 100 ms is unlikely to reflect working memoryprocesses as the temporal scales involved in workingmemory span several seconds (Vogel Woodman ampLuck 2006) There is however sufficient time betweenstimulus presentation and the response period that the(integrated) image or category information could beheld in working memory The window of temporalintegration reported here has been observed in otherdomains and with other experimental paradigms(Caudek Domini amp Di Luca 2002 Forget Buiatti ampDehaene 2009 Nishida 2004) In particular perfor-mance that exceeds probability summation at shortSOAs is consistent with the temporal scales involved iniconic memory (Coltheart 1980 Di Lollo 1977Eriksen amp Collins 1968 Hogben amp Di Lollo 1974Sperling 1960) Prior work has shown that informationabout arrays of letters (Sperling 1960) or flashingsquares (Hogben amp Di Lollo 1974) remains accessiblefor a short time after presentation It is possible that theintegration we observed at short SOAs in the objectrecognition system arises from mechanisms similar tothose underlying the previously reported persistence ofsimple visual lsquolsquoiconsrsquorsquo though iconic memory has beenshown to be disrupted by masks (Di Lollo Clark ampHogben 1988) Visual persistence as instantiated bythe dynamics of retinal cells is generally terminated bya mask it is more likely that we are measuring neuralor informational persistence at a higher level of thevisual system (Coltheart 1980) While the dynamicnoise in the stimulus sequence could partly mask visualpersistence it certainly does not completely obliterateit Hence the integration constants reported hereconstitute an upper bound and it is conceivable thatvisual information could decay even more rapidly thanreported here

Two additional phenomena share similar character-istics with the dynamic rate of decay reported hereIntegration masking in which irrelevant informationpresented immediately before or after the presentationof a target inhibits perception of that target by addingnoise to it exhibits a similar time scale to that described

here and may reflect a similar process (Enns amp DiLollo 2000) Priming (in which a stimulus facilitatesthe perception or naming of a later stimulus) might alsoplay a part in these observations Priming would likelyincrease performance at longer SOA positive effects ofpriming typically are small or nonexistent at SOA closeto zero and increase with increasing SOA (La HeijDirkx amp Kramer 1990 Scharlau amp Neumann 2003)

A series of elegant studies has shown that detect-ability and perception can be influenced by the exacttime at which a stimulus is presented with respect toongoing endogenous oscillations (Rohenkohl amp Nobre2011 Van Dijk Schoffelen Oostenveld amp Jensen2008) In the context of the task presented here it isconceivable that the initial dynamic noise could resetsuch endogenous oscillations and that certain presen-tation times and SOAs could lead to enhancedrecognition by virtue of their specific phase The limitednumber and uneven sampling of SOAs precludes asystematic investigation of these effects here but it willbe interesting in future studies to examine thephysiological signals underlying recognition of asyn-chronously presented objects

Several studies have shown interactions betweenasynchronous parts of faces (Anaki et al 2007 Cheunget al 2011 Singer amp Sheinberg 2006) those findingsare better described in terms of higher level phenomenathan object recognition Complex images have alsobeen used in a study of subjective simultaneity (Loftusamp Hanna 1989) in which subjects were asked to ratehow complete or integrated images appeared to bewhen divided into two parts presented with varyingdurations and asynchronies Here we expand upon thisbody of previous work by showing that complex shapeinformation can be combined across both space andtime to enable object recognition Our results wereconsistent across a battery of controls designed toaccount for eye movements stimulus familiarity andmemory on longer time scales

Conclusions

Integration across image fragments begins to breakdown when as little as 30 ms separates the fragmentsand decays with a time constant of approximately 100ms The current characterization of the time coursewith which visual object recognition breaks down as afunction of temporal asynchrony provides informationcritical to building computer vision systems thatoperate beyond static snapshots to modeling visualobject recognition to designing experiments that probethe performance over time of the human visual systemand to the construction of theories of perception in adynamic world

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 11

Keywords object recognition temporal integrationfragmented images temporal sensitivity visual dynamics

Acknowledgments

The authors thank Neal Dach and Joanna Li forassistance with stimulus preparation and data collec-tion and John Maunsell Thomas Miconi and HanlinTang for their insightful comments on the manuscriptThis work was supported by NSF grants 0954570 andCCF-1231216 (GK)

Commercial relationships noneCorresponding author Jedediah Miller SingerEmail jedediahsingerchildrensharvardeduAddress Department of Ophthalmology BostonChildrenrsquos Hospital Harvard Medical School BostonMA USA

References

Akaike H A (1974) A new look at the statisticalmodel identification IEEE Transactions on Auto-matic Control 19 716ndash723

Anaki D Boyd J amp Moscovitch M (2007)Temporal integration in face perception Evidenceof configural processing of temporally separatedface parts Journal of Experimental PsychologyHuman Perception and Performance 33(1) 1ndash19doi 2007-01135-001 [pii] 1010370096-15233311

Aspell J E Wattam-Bell J amp Braddick O (2006)Interaction of spatial and temporal integration inglobal form processing Vision Research 46(18)2834ndash2841 doi S0042-6989(06)00118-0 [pii] 101016jvisres200602018

Bair W amp Movshon J A (2004) Adaptive temporalintegration of motion in direction-selective neuronsin macaque visual cortex Journal of Neuroscience24(33) 7305ndash7323 doi101523JNEUROSCI0554-042004 24339305 [pii]

Brainard D H (1997) The Psychophysics ToolboxSpatial Vision 10 433ndash436

Brockmole J R Wang R F amp Irwin D E (2002)Temporal integration between visual images andvisual percepts Journal of Experimental Psycholo-gy Human Perception and Performance 28(2) 315ndash334

Burnham K P amp Anderson D (2002) Modelselection and multi-model inference (2nd ed) NewYork Springer

Burr D C amp Santoro L (2001) Temporal integra-tion of optic flow measured by contrast andcoherence thresholds Vision Research 41(15)1891ndash1899 doi S0042-6989(01)00072-4 [pii]

Caudek C Domini F amp Di Luca M (2002) Short-term temporal recruitment in structure frommotion Vision Research 42 10

Cavanagh P Holcombe A O amp Chou W (2008)Mobile computation Spatiotemporal integration ofthe properties of objects in motion Journal ofVision 8(12)1 1ndash23 httpwwwjournalofvisionorgcontent8121 doi1011678121 [PubMed][Article]

Cheadle S Bauer F Parton A Muller H BonnehY S amp Usher M (2008) Spatial structure affectstemporal judgments Evidence for a synchronybinding code Journal of Vision 8(7)12 1ndash12httpjournalofvisionorgcontent8712 doi1011678712 [PubMed] [Article]

Cheung O S Richler J J Phillips W S ampGauthier I (2011) Does temporal integration offace parts reflect holistic processing PsychonomicBulletin and Review 18(3) 476ndash483 doi103758s13423-011-0051-7

Clifford C W Holcombe A O amp Pearson J (2004)Rapid global form binding with loss of associatedcolors Journal of Vision 4(12)8 1090ndash1101 httpwwwjournalofvisionorgcontent4128 doi1011674128 [PubMed] [Article]

Coltheart M (1980) Iconic memory and visiblepersistence Attention Perception amp Psychophysics27(3) 183ndash228 doi103758bf03204258

Connor C E Brincat S L amp Pasupathy A (2007)Transformation of shape information in the ventralpathway Current Opinion in Neurobiology 17(2)140ndash147

Cornelissen F W Peters E M amp Palmer J (2002)The Eyelink Toolbox Eye tracking with MATLABand the Psychophysics Toolbox Behavioral Re-search Methods Instruments and Computers 34 4

De Baene W Premereur E amp Vogels R (2007)Properties of shape tuning of macaque inferiortemporal neurons examined using rapid serialvisual presentation Journal of Neurophysiology97(4) 2900ndash2916 doi 007412006 [pii] 101152jn007412006

DeCarlo L T (2012) On a signal detection approachto m-alternative forced choice with bias withmaximum likelihood and bayesian approaches toestimation Journal of Mathematical Psychology56(3) 196ndash207

Deco G amp Rolls E T (2004) Computational

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 12

neuroscience of vision Oxford UK Oxford Uni-versity Press

Di Lollo V (1977) Temporal characteristics of iconicmemory Nature 267(5608) 241ndash243

Di Lollo V Clark C D amp Hogben J H (1988)Separating visible persistence from retinal afterim-ages Perception and Psychophysics 44(4) 363ndash368

DiCarlo J J Zoccolan D amp Rust N C (2012)How does the brain solve visual object recognitionNeuron 73(3) 415ndash434 doi S0896-6273(12)00092-X [pii] 101016jneuron201201010

Douglas R J amp Martin K A (2004) Neuronalcircuits of the neocortex Annual Review of Neuro-science 27 419ndash451

Enns J T amp Di Lollo V (2000) Whatrsquos new in visualmasking Trends in Cognitive Sciences 4(9) 345ndash352

Eriksen C W amp Collins J F (1968) Sensory tracesversus the psychological moment in the temporalorganization of form Journal of ExperimentalPsychology 77(3) 376ndash382

Felleman D J amp Van Essen D C (1991) Distributedhierarchical processing in the primate cerebralcortex Cerebral Cortex 1 1ndash47

Forget J Buiatti M amp Dehaene S (2009) Temporalintegration in visual word recognition Journal ofCognitive Neuroscience 22(5) 14

Fukushima K (1980) Neocognitron A self organizingneural network model for a mechanism of patternrecognition unaffected by shift in position Biolog-ical Cybernetics 36(4) 193ndash202

Hogben J H amp Di Lollo V (1974) Perceptualintegration and perceptual segregation of briefvisual stimuli Vision Research 14(11) 1059ndash1069

Huk A C amp Shadlen M N (2005) Neural activity inmacaque parietal cortex reflects temporal integra-tion of visual motion signals during perceptualdecision making The Journal of Neuroscience25(45) 16

Hung C P Kreiman G Poggio T amp DiCarlo J J(2005) Fast read-out of object identity frommacaque inferior temporal cortex Science 310863ndash866

Irwin D E Yantis S amp Jonides J (1983) Evidenceagainst visual integration across saccadic eyemovements Perception and Psychophysics 34(1)49ndash57

Johnson J S amp Olshausen B A (2003) Timecourseof neural signatures of object recognition Journalof Vision 3(7)4 499ndash512 httpwwwjournalofvisionorgcontent374 doi101167374 [PubMed] [Article]

Jonides J Irwin D E amp Yantis S (1982)Integrating visual information from successivefixations Science 215(4529) 192ndash194

Kirchner H amp Thorpe S J (2006) Ultra-rapid objectdetection with saccadic eye movements Visualprocessing speed revisited Vision Research 46(11)1762ndash1776 doi S0042-6989(05)00511-0 [pii] 101016jvisres200510002

La Heij W Dirkx J amp Kramer P (1990)Categorical interference and associative priming inpicture naming British Journal of Psychology81(4) 511ndash525

Liu H Agam Y Madsen J R amp Kreiman G(2009) Timing timing timing Fast decoding ofobject information from intracranial field poten-tials in human visual cortex Neuron 62(2) 281ndash290

Loftus G R amp Hanna A M (1989) The phenom-enology of spatial integration Data and modelsCognitive Psychology 21(3) 363ndash397 doi0010-0285(89)90013-3 [pii]

Logothetis N K amp Sheinberg D L (1996) Visualobject recognition Annual Review of Neuroscience19 577ndash621

Maunsell J H R (1987) Physiological evidence fortwo visual subsystems In L Vaina (Ed) Mattersof intelligence (pp 59ndash87) Dordrecht The Neth-erlands Reidel Press

Neri P Morrone M C amp Burr D C (1998) Seeingbiological motion Nature 395(6705) 894ndash896 doi10103827661

Nishida S (2004) Motion-based analysis of spatialpatterns by the human visual system CurrentBiology 14 830ndash839

OrsquoRegan J K amp Levy-Schoen A (1983) Integratingvisual information from successive fixations Doestrans-saccadic fusion exist Vision Research 23(8)765ndash768

Pasupathy A amp Connor C E (1999) Responses tocontour features in macaque area V4 Journal ofNeurophysiology 82(5) 2490ndash2502

Pelli D G (1997) The VideoToolbox software forvisual psychophysics Transforming numbers intomovies Spatial Vision 10 437ndash442

Potter M C amp Levy E (1969) Recognition memoryfor a rapid sequence of pictures Journal ofExperimental Psychology 81(1) 10ndash15

Richmond B J Optican L M amp Spitzer H (1990)Temporal encoding of two-dimensional patterns bysingle units in primate primary visual cortex IStimulus-response relations Journal of Neurophys-iology 64(2) 351ndash369

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 13

Riesenhuber M amp Poggio T (2000) Models of objectrecognition Nature Neuroscience 3(Suppl) 1199ndash1204

Ringach D L Hawken M J amp Shapley R (2003)Dynamics of orientation tuning in macaque V1The role of global and tuned suppression Journalof Neurophysiology 90(1) 342ndash352 doi101152jn010182002 010182002 [pii]

Rohenkohl G amp Nobre A C (2011) Alphaoscillations related to anticipatory attention followtemporal expectations The Journal of Neurosci-ence 31(40) 14076ndash14084

Rolls E T (1991) Neural organization of highervisual functions Current Opinion in Neurobiology1 274ndash278

Sanocki T (2001) Interaction of scale and time duringobject identification Journal of Experimental Psy-chology Human Perception and Performance 27(2)290ndash302

Scharlau I amp Neumann O (2003) Temporalparameters and time course of perceptual latencypriming Acta Psychologica 113(2) 185ndash203

Schmolesky M T Wang Y C Hanes D PThompson K G Leutgeb S Schall J D Leventhal A G (1998) Signal timing across themacaque visual system Journal of Neurophysiology79(6) 3272ndash3278

Schyns P G Petro L S amp Smith M L (2007)Dynamics of visual information integration in thebrain for categorizing facial expressions CurrentBiology 17(18) 1580ndash1585 doi S0960-9822(07)01904-5 [pii] 101016jcub200708048

Serre T Kreiman G Kouh M Cadieu CKnoblich U amp Poggio T (2007) A quantitativetheory of immediate visual recognition Progress inBrain Research 165C 33ndash56

Singer J M amp Sheinberg D L (2006) Holisticprocessing unites face parts across time VisionResearch 46(11) 1838ndash1847 doi S0042-6989(05)00563-8 [pii] 101016jvisres200511005

Sperling G (1960) The information available in briefvisual presentations Washington DC AmericanPsychological Association

Swets J A (1961) Is there a sensory thresholdScience 134(3473) 168ndash177

Tanaka K (1996) Inferotemporal cortex and objectvision Annual Review of Neuroscience 19 109ndash139

Thorpe S Fize D amp Marlot C (1996) Speed ofprocessing in the human visual system Nature 381520ndash522

Van Dijk H Schoffelen J M Oostenveld R ampJensen O (2008) Prestimulus oscillatory activity inthe alpha band predicts visual discriminationability The Journal of Neuroscience 28(8) 1816ndash1823

vanRullen R amp Thorpe S J (2002) Surfing a spikewave down the ventral stream Vision Research 422593ndash2615

Vogel E K Woodman G F amp Luck S J (2006)The time course of consolidation in visual workingmemory Journal of Experimental PsychologyHuman Perception and Performance 32(6) 1436ndash1451 doi1010370096-15233261436

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 14

  • Introduction
  • Methods
  • f01
  • f02
  • e01
  • e02
  • Results
  • f03
  • e03
  • f04
  • e04
  • f05
  • Discussion
  • Conclusions
  • Akaike1
  • Anaki1
  • Aspell1
  • Bair1
  • Brainard1
  • Brockmole1
  • Burnham1
  • Burr1
  • Caudek1
  • Cavanagh1
  • Cheadle1
  • Cheung1
  • Clifford1
  • Coltheart1
  • Connor1
  • Cornelissen1
  • DeBaene1
  • DeCarlo1
  • Deco1
  • DiLollo1
  • DiLollo2
  • DiCarlo2
  • Douglas1
  • Enns1
  • Eriksen1
  • Felleman1
  • Forget1
  • Fukushima1
  • Hogben1
  • Huk1
  • Hung1
  • Irwin1
  • Johnson1
  • Jonides1
  • Kirchner1
  • LaHeij1
  • Liu1
  • Loftus1
  • Logothetis1
  • Maunsell1
  • Neri1
  • Nishida1
  • ORegan1
  • Pasupathy1
  • Pelli1
  • Potter1
  • Richmond1
  • Riesenhuber1
  • Ringach1
  • Rohenkohl1
  • Rolls1
  • Sanocki1
  • Scharlau1
  • Schmolesky1
  • Schyns1
  • Serre1
  • Singer1
  • Sperling1
  • Swets1
  • Tanaka1
  • Thorpe1
  • VanDijk1
  • vanRullen1
  • Vogel1
Page 5: Short temporal asynchrony disrupts visual object recognition · Short temporal asynchrony disrupts visual object recognition Jedediah M. Singer $ Department of Ophthalmology, Boston

each 116 of each image was presented to each subjectan average of 05 times There were 36 blocks of 34trials SOA values were drawn from the same list as inExperiment 1 Two-fragment trials were also presentedbut not included in the reported analyses to make theresults more directly comparable to those in Experi-ment 1 Adding the data from the two-fragment trialsdid not change the conclusions One of the two keydifferences in this variant is that the stream of flickeringnoise in each trial lasted for 18824 ms (for all SOAvalues and number of fragments)mdashlong enough toaccommodate a trial with four fragments at the longestasynchrony used (2941 ms) followed by 500 ms ofnoise Trials with shorter asynchronies or fewerfragments concluded with a longer period of uninfor-mative noise In this way the time interval between trialonset and behavioral response was identical in all trialsThe second key difference in this variant is that imageswere drawn at random from a library of 1200 linedrawings and silhouettes a superset of those used inExperiments 1 and 2A This means that each 116 ofeach image had an approximately 50 chance of beingpresented to a given subject Eighteen new subjects (13female age 18ndash35 mean age 24) participated inExperiment 2B one subject was excluded from analysesdue to low performance on catch trials ( 08 correct)We had reliable eye tracking in 13 of the 17 subjectswhose performance exceeded threshold

Data analyses

Diagnostic image parts

Even though we took several precautions to makeimages homogeneous in terms of basic low-levelproperties (see above) some images were easier tocategorize than others and parts of some images weremore informative than others If part of an image isreliably diagnostic subjects may be able to perform thetask based only on that part regardless of any otherinformation presented This would confound the studyof temporal integration there might be no contributionfrom the other fragments presented and thus SOAwould be irrelevant To mitigate this effect weinvestigated how informative each part of each imagewas Each square j (j frac14 1 16 for each image) wasassigned a lsquolsquodiagnosticityrsquorsquo score DSj defined as

DSj frac14

Xt

cethtTHORN sethtTHORN pethtTHORNaethtTHORN

Xt

sethtTHORN pethtTHORNaethtTHORN

eth1THORN

Here t ranges over all trials for all subjects andacross all experiments in which square j was present Ifthe category reported in a given trial t was correct c(t)

is 1 otherwise c(t) is 0 The overall performance of thesubject who performed trial t is p(t) and that subjectrsquosoverall performance on trials with the same number offragments and the same asynchrony as trial t is given bya(t) Finally s(t) is the number of 116 squares of theimage present overall in trial t (that is three times thenumber of fragments in trial t) This score indicateshow reliably each particular 116 square led to correctclassification across all subjects We excluded trials inwhich any of the 116 squares shown had a diagnos-ticity score greater than 08 This excluded an averageof 337 6 03 of trials from Experiment 1 and172 6 03 and 231 6 03 from Experiments2A and 2B respectively Including these trials does notchange the conclusions reported here The uncertaintyreduction modelrsquos window of integration for individualsubjects (see below) when including all trials changedby only 13 6 3 Including trials containingdiagnostic squares does however raise performancedisproportionally at longer asynchronies due to thegreater number of trials in which a single fragment issufficient for categorization In decreasing the dynamicrange of performance the variances of the model fitsare increased

Probability summation

Assuming independent responses to each fragmentone may predict performance in trials with multiplefragments from the performance observed in single-fragment trials y1 We assume that the subject actuallyknew the correct image category in some fraction i ofthe trials and guessed at chance in the rest so that y1frac14i thorn (1 ndash i)4 (y1 frac14 14 if i frac14 0 and y1 frac14 1 if i frac14 1) Thechance that the subject fails to discern the categoryfrom a trial with n fragments is (1 i)n and the finalperformance predicted by probability summation isthen

1 eth1 iTHORNn thorn eth1 iTHORNn

4frac14 1 3eth1 iTHORNn

4 eth2THORN

Note that even for small values of i this expressioncan be appreciably higher than chance (025) Forexample a single fragment performance of y1frac14 030 isobtained from i frac14 00667 and this leads to an overallperformance of 043 when nfrac14 4 fragments We cannotassume independence however because multiple frag-ments from a single image contain correlated infor-mation and so this independent probability summationprediction overestimates expected performance Ob-serving the opposite pattern ie exceeding theperformance predicted by independent probabilitysummation is thus a particularly strong indicator ofsynergistic spatiotemporal integration of informationfrom multiple fragments

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 5

Results

We evaluated the sensitivity to temporal imprecisionduring visual object recognition by quantifying cate-gorization of asynchronously presented object frag-ments (Methods Figure 1)

Experiment 1

Overall performance in the task was above chance(chancefrac14 025) for all subjects numbers of fragmentsand SOA values Averaged across all SOA values andnumbers of fragments performance ranged from 037to 077 (060 6 003 mean 6 SEM) Performanceincreased as the number of fragments shown increased(Figure 2) All 16 subjectsrsquo performance was better forthe four-fragment condition than the three-fragmentcondition significantly so for 13 subjects (chi-squared

test p 005 compare triangles versus squares inFigure 3) Catch trial performance was essentially atceiling (097 6 001) Single-fragment performance wasslightly above chance levels (033 6 001) Overallperformance varied slightly but significantly by cate-gory (chi-squared test p 108 animals 059 people060 plants 054 vehicles 057) In sum all subjectswere able to correctly perform the task in spite of therapid presentations and fragmented objects

If subjects performed the task by independentlyevaluating each fragment the performance in single-fragment trials would be predictive of overall perfor-mance in trials with multiple fragments via independentprobability summation (Methods) Different fragmentsfrom a given image contain redundant informationwhich would bias an independent probability summa-tion prediction towards higher values Yet performanceat all SOAs was higher than that predicted byprobability summation dropping at the longest SOA tovalues consistent with probability summation (at SOA

Figure 3 Individual subjectsrsquo performance in Experiment 1 Error bars indicate Clopper-Pearson 95 confidence intervals The x axis

indicates SOA and the y axis indicates the average fraction correct performance for the three-fragment condition (triangles) four-

fragment condition (squares) and single fragment condition () Chance performance is 025 and the performance in catch trials was

097 6 001 The dashed lines show the fits from the uncertainty reduction model (Text) The mean value of the integration window

indicated by the uncertainty reduction mode is indicated by a vertical dotted line

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 6

frac14 2941 ms 046 6 003 observed versus 044 6 004predicted and 050 6 003 observed versus 049 6 004predicted for three- and four-fragment conditionsrespectively Figure 4) Hence independent probabilitysummation is not a good model to describe overallperformance in this task That performance at all butthe longest asynchronies exceeded the independentprobability summation prediction despite the redun-dancy of the fragments within each trial suggests visualintegration which we quantify below

Categorization performance showed a strong de-crease with increasing values of SOA both for three-fragment trials and four-fragment trials (Figures 3ndash4)Performance declined with increasing temporal asyn-chrony for all 16 subjects (Figure 3) This change inperformance was significant for 11 and 14 subjects inthree-fragment and four-fragment trials respectively(chi-squared test p 005) The performance decre-ment between simultaneous presentations and asyn-chronous trials first reached significance at an SOA of294 ms (sign test pfrac14 005) with a difference of 23 6

14 (Figure 4) While a slight improvement inperformance was apparent in the mean values at 118ms SOA relative to simultaneous presentations it wasnot statistically significant (sign test p frac14 060) Theseobservations show that temporal imprecision of even 30ms disrupts recognition performance

To quantify performance (y) as a function of SOA(denoted A below) we started by performing a least-squares fit of an exponential curve to the data fromeach subject for each number of fragments

yethATHORN frac14 Bthorn R 2A=s eth3THORNThe three parameters were the time constant s the

baseline value Bfrac14 y(lsquo) and the range in performanceRfrac14 y(0) y(lsquo) where y(0) represents the performanceat 0 ms SOA and y(lsquo) denotes the asymptoticperformance at infinite SOA We adjusted these threeparameters (B R s) to minimize the squared errorbetween y(A) and the observed data y(A) Theexponential curves provided a reasonable fit to the data(root-mean-squared error RMSE of 0058 6 0004and 0045 6 0003 for three-fragment and four-fragment data respectively) As a measure of thewindow of integration we considered s the SOA atwhich the exponential curve achieved its half heightThe mean value of s across 16 subjects was 120 6 24ms and 104 6 19 ms in three-fragment and four-fragment trials respectively Thus shape integration atspatial scales of at least approximately 58 remainsevident over a time scale on the order of 100 ms

The exponential fits described the shapes of each ofthe response curves and experimental parametersseparately We sought a unifying model that couldcapture all the different experimental conditions andcould be realized by the firing rates of neurons Asdiscussed above probability summation failed inassuming that multiple fragments contribute to per-ception independently To include the interdependenceamong different fragments and to account for theincreasing number of possible two-fragment interac-tions as fragment count increases we developed amodel based on the notion that pairs of fragmentsinteract to reduce uncertainty about the image as anexponential function of their SOA In this model wedescribed a subjectrsquos observed performance in terms ofan underlying uncertainty u which is reduced byinteractions between presented fragments As abovewe assume that the subject actually knew the correctimage category in some fraction i of the trials andguessed at chance in the rest so that y(A)frac14 ithorn [(1 ndash i)4] The empirical uncertainty is then 1ndashi the chancethat the subject did not know the image category Weassume that a second fragment reduces the underlyinguncertainty u by some factor f(A) that depends on theasynchrony A with which it is presented relative to thefirst A third fragment reduces uncertainty by anotherfactor of f(A) and also by a factor of f(2A) because it ispresented A ms after the second fragment and 2A msafter the first fragment We assume independence sothe total uncertainty with three fragments is u f2(A)f(2A) Similarly the total uncertainty in a four-fragment trial is u f3(A) f2(2A) f(3A) If we denotethe total uncertainty with a particular number offragments as ursquo the predicted performance is then 1u0 thorn (u04) Note that u is conceptually related toperformance in single fragment trials but is not directlymathematically relatable The value of u also incorpo-

Figure 4 Summary of results from Experiment 1 Performance is

averaged across nfrac14 16 subjects (individual performance shown

in Figure 3) Conventions are the same as in Figure 3 horizontal

dashed lines indicate the performance level predicted by

probability summation based on single-fragment performance

Error bars show SEM across subjects

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 7

rates the fact that different fragments from a singleimage are not independent but that this dependency isfactored out in the consideration of pairs of fragmentsvia f(A)

For each subject we used their empirical perfor-mance to calculate the empirical uncertainty 1ndashi at eachSOA and each number of fragments We then foundthe least-squares best fits for the uncertainty reductionfunction f(A) and the underlying uncertainty u that wasmodified by f(A) We considered for f(A) exponentialfunctions of the form f(A)frac14 B0 R02As0 Here B0 frac14f(lsquo) is the asymptote which determines what might becalled lsquolsquocognitiversquorsquo or lsquolsquomemory-basedrsquorsquo integration theability to combine information from multiple frag-ments at an arbitrarily long SOA R0frac14 f(lsquo) f(0) is therange of uncertainty reduction spanned by f whichdescribes the benefit of presenting visual informationsimultaneously rather than with very large SOAFinally s0 is the time constant which describes theSOA at the uncertainty reduction functionrsquos half-height Given the exponential nature of this modelthere is some small amount of integration that persistsat arbitrarily long asynchronies decreasing asymptot-ically towards zero Along with the underlying uncer-tainty u this makes four total parameters to describe asubjectrsquos performance across all SOAs

The uncertainty reduction model fit the data quitewell root-mean-square errors (RMSE) for the model(0061 6 0004 and 0046 6 0002 for three-fragmentand four-fragment trials respectively) were not signif-icantly higher than the corresponding RMSE values forexponential fits to the raw data discussed above (00586 0004 and 0045 6 0003 respectively Wilcoxonrank sum test comparing the two models pfrac14 042 and061 respectively) even though the exponential fits tothe raw data have two more free parameters Giventhose extra parameters the corrected Akaike Infor-mation Criterion (Akaike 1974 Burnham amp Anderson2002) is lower for the uncertainty reduction model in 15out of 16 subjects on average the likelihood of theuncertainty reduction model is 374 6 051 times higherthan that of the exponential fits

We described the window of integration with theuncertainty reduction model exponentialrsquos half heightparameter s0 When we fit the model to the data fromindividual subjects we obtained a mean s0of 97 6 9 ms(Figure 3) In Figure 4 we also made use of the sameframework considering all subjects within the experi-ment together Rather than fitting to each subjectrsquosperformance separately we fit one uncertainty reduc-tion model to all subjectsrsquo data The values of the bestfit parameters for the uncertainty reduction model wereufrac14 043 6 002 B0frac14 099 6 001 R0frac14 015 6 002 ands0 frac14 109 ms R2 values between the data and thepredictions for the population uncertainty reductionmodel were 092 for three-fragment trials and 097 for

four-fragment trials This model succinctly and accu-rately described a subjectrsquos performance across allconditions based solely on the reduction of uncertaintyas a function of SOA The model used fewerparameters to explain a subjectrsquos performance than theset of exponential fits it could generalize acrossdifferent numbers of fragments and it suggested anunderlying psychological interpretation

We further considered a model based on signaldetection theory which assumes a probabilistic frame-work to describe each subjectrsquos performance in eachtrial Previous analyses of psychophysics data (Swets1961) have shown that high-threshold models like theuncertainty reduction model just described sometimesdo not capture subjectsrsquo behavior as well as modelsbased on signal detection theory (SDT) We thereforeconstructed a SDT model to test whether it would givedifferent results from the uncertainty reduction modelWe used a Markov chain Monte Carlo (MCMC)method for calculating d0 (a SDT measure of sensitivityto stimulus identity) with bias in four-alternative forcedchoice tasks (DeCarlo 2012) Models were fit usingJAGS (httpsourceforgenetprojectsmcmc-jags)with 10000 iterations of burn-in followed by 20000iterations to approximate d0 distributions Havingcalculated d0 for each subject each asynchrony andeach number of fragments we proceeded to fit a four-parameter model to each subjectrsquos data Let g be ascaled base-two exponential function with parameter s 00

and scale factor R 00 g(A)frac14R 002As 00 Then g describesthe increase in sensitivity brought about by having twofragments with an asynchrony of A The thirdparameter is v 00 the d0 value in single-fragment trials(constrained to lie within the 95 confidence intervalestimated from the MCMC fit) Finally we assumedthat adding additional fragments to a trial might notaid sensitivity in a perfectly efficient fashion (becausefragments contain correlated information) and wedescribed this inefficiency with a multiplicative factorc 00 that progressively reduced the contributions ofsuccessive fragments The model predicts d0 for nfragments with asynchrony A as follows

d0 frac14Xnifrac141

v 00 c 00i1 thorn ethi 1THORN g

A ethn ithorn 1THORN

eth4THORNWe fit the four parameters of this model using a

least-squares procedure and compared the time con-stant values (s 00) obtained with the SDT model againstthose obtained from the uncertainty reduction model(s0) The values of the best fits for the SDT model wereR 00 frac14 021 6 002 v 00 frac14 042 6 006 c 00 frac14 047 6 009and a cross-subject mean s 00 of 101 6 8 ms Given thatthe SDT results were similar to those of the uncertaintyreduction model and the higher computational re-

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 8

quirements for the SDT model we used only theuncertainty reduction model for the remainder of ouranalyses

As noted above performance differed slightlybetween categories To evaluate whether such inho-mogeneity contributed to the observed integration timeconstants we repeated the analyses after matchingperformance across categories by random subsamplingUnder these performance-matched conditions themean s0 was 95 6 9 ms and the population s0 was 105ms We also analyzed each category and each subjectseparately after matching performance The mean s0 forthe four categories were 125 6 23 ms 144 6 33 ms 896 21 ms and 131 6 26 ms respectively for animalspeople plants and vehicles Note that these fits werenoisier due to having fewer data While subjects coulduse different features to recognize images belonging todifferent categories the results were similar acrosscategories

Experiment 2

We extended the first experiment in two variantsaimed at evaluating two-fragment images and very longSOAs (Experiment 2A) and constant trial lengths andno fragment repetition (Experiment 2B) These exper-iment variants are described under Methods

In Experiment 2A the window of integration s0 was101 6 14 ms when two-fragment trials were included inthe fit and 91 6 13 ms when they were not included(Figure 5A) The other model parameters were ufrac14 0336 002 B0 frac14 0990 6 0003 and R0 frac14 0069 6 0006These values were not significantly different from eachother or from the results in Experiment 1 (Wilcoxonrank sum test pfrac14 055 comparing these two fits pfrac14099 and 068 respectively comparing Experiment 1 tothese results with and without two-fragment trials)Furthermore the model fit using only three- and four-fragment trials was a good fit for the two-fragment data(RMSE of 0091 6 0006 using only three- and four-fragment trials versus 0081 6 0004 using all trialsWilcoxon rank sum test pfrac14 020) This demonstratesthat the uncertainty reduction model was able togeneralize to fragment counts that were not used to fitthe model While performance at 2941 ms was slightlybetter than performance at 7059 ms (differing by 346 17) this difference was not significant (sign test pfrac14 016) RMSE and s0 values were unaffected byincluding the data from 7059 ms trials (Wilcoxon ranksum tests pfrac14 097 and 092 respectively) R2 valueswere 030 075 and 078 for two- three- and four-fragment trials respectively The two-fragment R2

value was low because the plot of performance againstasynchrony was almost flat The results of thisexperimental variant suggest that performance at long

SOAs in Experiment 1 was close to asymptote and thatthe uncertainty reduction model can extrapolate totwo-fragment trials

The overall performance in Experiment 2B (as wellas in the previous variant) was lower than inExperiment 1 (cf Figure 5 vs Figure 4) due to theinclusion of more difficult line drawings in the imagesets (Figure 5B) Considering only the trials thatincluded the same types of images across experiments(silhouette images) raised performance to levels com-parable to the ones in Experiment 1 without signif-icantly changing s0 R2 values for the uncertaintyreduction model were 064 and 081 for three- and four-fragment trials respectively The integration windowfrom the uncertainty reduction model in Experiment 2B

Figure 5 Mean performance across all subjects in Experiment 2

A Variant A (n frac14 17 subjects) which added a two-fragment

condition (circles) and data at 7059 ms SOA B Variant B (n frac1417 subjects) which controlled for influence of trial duration and

stimulus repetition (see Results and Methods) Conventions are

otherwise as in Figure 4

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 9

(s0 frac14 123 6 20 ms) was indistinguishable from thatobtained in Experiment 1 (Wilcoxon rank sum test pfrac14079) The other model parameters were ufrac14 030 6002 B0 frac14 0990 6 0003 and R0 frac14 0064 6 0009 Thesimilar s0 values suggest that the results of Experiment1 cannot be ascribed to differences in trial duration orslight familiarity with the images shown

Discussion

We measured the performance of 50 subjects indifferent variants of an experiment to evaluate theeffect of presenting object fragments with varyingdegrees of asynchrony By applying this externallyinduced temporal jitter we characterized the impor-tance of temporal precision for visual object recogni-tion

Subjects might in principle have made educatedguesses based on the content of individual fragmentsor combined such guesses independently across multi-ple fragments Instead we found that performance wassignificantly higher than predicted by independentprobability summation indicating that informationfrom different fragments was synergistically integratedRecognition performance decreased with increasingSOA and was well characterized by a simple modelbased on the reduction of uncertainty as a function ofthe temporal proximity of the presented image parts Inprinciple such a model could be realized by neuronswhose firing rate increases with the arrival of each newburst of information and then relaxes more slowlytowards baseline Neurons showing such firing rateincreases upon transient stimulus onset and decaydynamics over hundreds of milliseconds have beendescribed in recordings in the lateral intraparietalcortex during a motion discrimination task (Huk ampShadlen 2005)

While we focused on the uncertainty reductionmodel we also fit the data from Experiment 1 with amodel based on signal detection theory These twomodels reflect different paradigms of psychologicalprocessing In the former each additional piece ofinformation (provided by each successive fragment)multiplicatively reduces the subjectrsquos uncertainty aboutwhat was shown increasing the probability of giving acorrect answer In the latter each additional fragmentadds to the distance between the actual category andthe other categories in an implicit psychological spaceagain increasing the probability of discriminating thetrue category from the others There are importantdifferences between these paradigms (Swets 1961)though either could be realized via the integrativeneurons hypothesized above The uncertainty reductionmodel might be implemented by a set of downstream

neurons that act as thresholds on the evidenceaccumulated by the integrative neurons the firstdownstream neuron to cross threshold could report theidentity of the viewed image Support for one oranother image would manifest as increased firing ratesin integrative neurons feeding into the associateddownstream neurons increasing its chance of being thefirst to cross threshold The signal detection theorymight instead reflect a set of downstream neuronswhose activity reflected which integrative neuron wasmost active in a winner-take-all fashion The integra-tive neurons would then essentially be counting votesUltimately given that the two models yielded indis-tinguishable estimates of the window of temporalintegration it is difficult to draw any conclusions aboutwhich paradigm is more likely to be correct based onthe current data

The results demonstrated that asynchronies of a fewtens of milliseconds disrupted categorization perfor-mance This disruption is consistent with a body ofneurophysiological observations and computationalmodels of object recognition that are based on a rapidcascade of processing events along the ventral visualstream (DiCarlo et al 2012 Fukushima 1980 Hung etal 2005 Johnson amp Olshausen 2003 Kirchner ampThorpe 2006 Liu et al 2009 Potter amp Levy 1969Richmond et al 1990 Riesenhuber amp Poggio 2000Rolls 1991 Serre et al 2007 Thorpe et al 1996vanRullen amp Thorpe 2002) The response latenciesacross the ventral stream consistently increase by about10 to 20 ms at each stage from the retina to thethalamus to a cascade of cortical areas from primaryvisual cortex to inferior temporal cortex (Maunsell1987 Schmolesky et al 1998) The rapidity with whichvisual shape information progresses through the ventralvisual cortex and is converted into behavioral orperceptual output is consistent with the currentobservation that object recognition begins to sufferimpairment with even minor disruptions of stimulustiming

In addition to the impairment in shape recognitionthrough spatial integration due to temporal asynchro-ny timing judgments can be influenced by spatial cuesSpatial grouping cues that encourage the bindingtogether of separate parts can interfere with simulta-neity judgments at a time scale similar to that reportedhere (Cheadle et al 2008) Spatial judgments areinfluenced by temporal constraints and temporaljudgments are influenced by spatial constraints

While temporal asynchrony disrupted recognitionwe observed that performance was well above asymp-totic levels even at SOAs of approximately 100 msIntegration at these SOAs cannot be ascribed to trialduration eye movements remembering the stimulifrom previous trials or forgetting the fragments shownwithin a trial (Figure 5) Performance values at SOAs

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 10

of approximately 100 ms are consistent with theextended durations of responses in inferior temporalcortex (De Baene et al 2007) These extended responsedurations might instantiate a buffer allowing theintegration of visual shape information over time Theability to integrate visual information over brief spanscould underlie many critical visual functions Forexample percepts of camouflaged or partially occludedobjects could be assembled over time as the objectsmoved relative to their background or occluders(Nishida 2004) and information could be integratedacross saccades (Irwin Yantis amp Jonides 1983Jonides Irwin amp Yantis 1982 OrsquoRegan amp Levy-Schoen 1983)

The observed integration of form information over 100 ms is unlikely to reflect working memoryprocesses as the temporal scales involved in workingmemory span several seconds (Vogel Woodman ampLuck 2006) There is however sufficient time betweenstimulus presentation and the response period that the(integrated) image or category information could beheld in working memory The window of temporalintegration reported here has been observed in otherdomains and with other experimental paradigms(Caudek Domini amp Di Luca 2002 Forget Buiatti ampDehaene 2009 Nishida 2004) In particular perfor-mance that exceeds probability summation at shortSOAs is consistent with the temporal scales involved iniconic memory (Coltheart 1980 Di Lollo 1977Eriksen amp Collins 1968 Hogben amp Di Lollo 1974Sperling 1960) Prior work has shown that informationabout arrays of letters (Sperling 1960) or flashingsquares (Hogben amp Di Lollo 1974) remains accessiblefor a short time after presentation It is possible that theintegration we observed at short SOAs in the objectrecognition system arises from mechanisms similar tothose underlying the previously reported persistence ofsimple visual lsquolsquoiconsrsquorsquo though iconic memory has beenshown to be disrupted by masks (Di Lollo Clark ampHogben 1988) Visual persistence as instantiated bythe dynamics of retinal cells is generally terminated bya mask it is more likely that we are measuring neuralor informational persistence at a higher level of thevisual system (Coltheart 1980) While the dynamicnoise in the stimulus sequence could partly mask visualpersistence it certainly does not completely obliterateit Hence the integration constants reported hereconstitute an upper bound and it is conceivable thatvisual information could decay even more rapidly thanreported here

Two additional phenomena share similar character-istics with the dynamic rate of decay reported hereIntegration masking in which irrelevant informationpresented immediately before or after the presentationof a target inhibits perception of that target by addingnoise to it exhibits a similar time scale to that described

here and may reflect a similar process (Enns amp DiLollo 2000) Priming (in which a stimulus facilitatesthe perception or naming of a later stimulus) might alsoplay a part in these observations Priming would likelyincrease performance at longer SOA positive effects ofpriming typically are small or nonexistent at SOA closeto zero and increase with increasing SOA (La HeijDirkx amp Kramer 1990 Scharlau amp Neumann 2003)

A series of elegant studies has shown that detect-ability and perception can be influenced by the exacttime at which a stimulus is presented with respect toongoing endogenous oscillations (Rohenkohl amp Nobre2011 Van Dijk Schoffelen Oostenveld amp Jensen2008) In the context of the task presented here it isconceivable that the initial dynamic noise could resetsuch endogenous oscillations and that certain presen-tation times and SOAs could lead to enhancedrecognition by virtue of their specific phase The limitednumber and uneven sampling of SOAs precludes asystematic investigation of these effects here but it willbe interesting in future studies to examine thephysiological signals underlying recognition of asyn-chronously presented objects

Several studies have shown interactions betweenasynchronous parts of faces (Anaki et al 2007 Cheunget al 2011 Singer amp Sheinberg 2006) those findingsare better described in terms of higher level phenomenathan object recognition Complex images have alsobeen used in a study of subjective simultaneity (Loftusamp Hanna 1989) in which subjects were asked to ratehow complete or integrated images appeared to bewhen divided into two parts presented with varyingdurations and asynchronies Here we expand upon thisbody of previous work by showing that complex shapeinformation can be combined across both space andtime to enable object recognition Our results wereconsistent across a battery of controls designed toaccount for eye movements stimulus familiarity andmemory on longer time scales

Conclusions

Integration across image fragments begins to breakdown when as little as 30 ms separates the fragmentsand decays with a time constant of approximately 100ms The current characterization of the time coursewith which visual object recognition breaks down as afunction of temporal asynchrony provides informationcritical to building computer vision systems thatoperate beyond static snapshots to modeling visualobject recognition to designing experiments that probethe performance over time of the human visual systemand to the construction of theories of perception in adynamic world

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 11

Keywords object recognition temporal integrationfragmented images temporal sensitivity visual dynamics

Acknowledgments

The authors thank Neal Dach and Joanna Li forassistance with stimulus preparation and data collec-tion and John Maunsell Thomas Miconi and HanlinTang for their insightful comments on the manuscriptThis work was supported by NSF grants 0954570 andCCF-1231216 (GK)

Commercial relationships noneCorresponding author Jedediah Miller SingerEmail jedediahsingerchildrensharvardeduAddress Department of Ophthalmology BostonChildrenrsquos Hospital Harvard Medical School BostonMA USA

References

Akaike H A (1974) A new look at the statisticalmodel identification IEEE Transactions on Auto-matic Control 19 716ndash723

Anaki D Boyd J amp Moscovitch M (2007)Temporal integration in face perception Evidenceof configural processing of temporally separatedface parts Journal of Experimental PsychologyHuman Perception and Performance 33(1) 1ndash19doi 2007-01135-001 [pii] 1010370096-15233311

Aspell J E Wattam-Bell J amp Braddick O (2006)Interaction of spatial and temporal integration inglobal form processing Vision Research 46(18)2834ndash2841 doi S0042-6989(06)00118-0 [pii] 101016jvisres200602018

Bair W amp Movshon J A (2004) Adaptive temporalintegration of motion in direction-selective neuronsin macaque visual cortex Journal of Neuroscience24(33) 7305ndash7323 doi101523JNEUROSCI0554-042004 24339305 [pii]

Brainard D H (1997) The Psychophysics ToolboxSpatial Vision 10 433ndash436

Brockmole J R Wang R F amp Irwin D E (2002)Temporal integration between visual images andvisual percepts Journal of Experimental Psycholo-gy Human Perception and Performance 28(2) 315ndash334

Burnham K P amp Anderson D (2002) Modelselection and multi-model inference (2nd ed) NewYork Springer

Burr D C amp Santoro L (2001) Temporal integra-tion of optic flow measured by contrast andcoherence thresholds Vision Research 41(15)1891ndash1899 doi S0042-6989(01)00072-4 [pii]

Caudek C Domini F amp Di Luca M (2002) Short-term temporal recruitment in structure frommotion Vision Research 42 10

Cavanagh P Holcombe A O amp Chou W (2008)Mobile computation Spatiotemporal integration ofthe properties of objects in motion Journal ofVision 8(12)1 1ndash23 httpwwwjournalofvisionorgcontent8121 doi1011678121 [PubMed][Article]

Cheadle S Bauer F Parton A Muller H BonnehY S amp Usher M (2008) Spatial structure affectstemporal judgments Evidence for a synchronybinding code Journal of Vision 8(7)12 1ndash12httpjournalofvisionorgcontent8712 doi1011678712 [PubMed] [Article]

Cheung O S Richler J J Phillips W S ampGauthier I (2011) Does temporal integration offace parts reflect holistic processing PsychonomicBulletin and Review 18(3) 476ndash483 doi103758s13423-011-0051-7

Clifford C W Holcombe A O amp Pearson J (2004)Rapid global form binding with loss of associatedcolors Journal of Vision 4(12)8 1090ndash1101 httpwwwjournalofvisionorgcontent4128 doi1011674128 [PubMed] [Article]

Coltheart M (1980) Iconic memory and visiblepersistence Attention Perception amp Psychophysics27(3) 183ndash228 doi103758bf03204258

Connor C E Brincat S L amp Pasupathy A (2007)Transformation of shape information in the ventralpathway Current Opinion in Neurobiology 17(2)140ndash147

Cornelissen F W Peters E M amp Palmer J (2002)The Eyelink Toolbox Eye tracking with MATLABand the Psychophysics Toolbox Behavioral Re-search Methods Instruments and Computers 34 4

De Baene W Premereur E amp Vogels R (2007)Properties of shape tuning of macaque inferiortemporal neurons examined using rapid serialvisual presentation Journal of Neurophysiology97(4) 2900ndash2916 doi 007412006 [pii] 101152jn007412006

DeCarlo L T (2012) On a signal detection approachto m-alternative forced choice with bias withmaximum likelihood and bayesian approaches toestimation Journal of Mathematical Psychology56(3) 196ndash207

Deco G amp Rolls E T (2004) Computational

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 12

neuroscience of vision Oxford UK Oxford Uni-versity Press

Di Lollo V (1977) Temporal characteristics of iconicmemory Nature 267(5608) 241ndash243

Di Lollo V Clark C D amp Hogben J H (1988)Separating visible persistence from retinal afterim-ages Perception and Psychophysics 44(4) 363ndash368

DiCarlo J J Zoccolan D amp Rust N C (2012)How does the brain solve visual object recognitionNeuron 73(3) 415ndash434 doi S0896-6273(12)00092-X [pii] 101016jneuron201201010

Douglas R J amp Martin K A (2004) Neuronalcircuits of the neocortex Annual Review of Neuro-science 27 419ndash451

Enns J T amp Di Lollo V (2000) Whatrsquos new in visualmasking Trends in Cognitive Sciences 4(9) 345ndash352

Eriksen C W amp Collins J F (1968) Sensory tracesversus the psychological moment in the temporalorganization of form Journal of ExperimentalPsychology 77(3) 376ndash382

Felleman D J amp Van Essen D C (1991) Distributedhierarchical processing in the primate cerebralcortex Cerebral Cortex 1 1ndash47

Forget J Buiatti M amp Dehaene S (2009) Temporalintegration in visual word recognition Journal ofCognitive Neuroscience 22(5) 14

Fukushima K (1980) Neocognitron A self organizingneural network model for a mechanism of patternrecognition unaffected by shift in position Biolog-ical Cybernetics 36(4) 193ndash202

Hogben J H amp Di Lollo V (1974) Perceptualintegration and perceptual segregation of briefvisual stimuli Vision Research 14(11) 1059ndash1069

Huk A C amp Shadlen M N (2005) Neural activity inmacaque parietal cortex reflects temporal integra-tion of visual motion signals during perceptualdecision making The Journal of Neuroscience25(45) 16

Hung C P Kreiman G Poggio T amp DiCarlo J J(2005) Fast read-out of object identity frommacaque inferior temporal cortex Science 310863ndash866

Irwin D E Yantis S amp Jonides J (1983) Evidenceagainst visual integration across saccadic eyemovements Perception and Psychophysics 34(1)49ndash57

Johnson J S amp Olshausen B A (2003) Timecourseof neural signatures of object recognition Journalof Vision 3(7)4 499ndash512 httpwwwjournalofvisionorgcontent374 doi101167374 [PubMed] [Article]

Jonides J Irwin D E amp Yantis S (1982)Integrating visual information from successivefixations Science 215(4529) 192ndash194

Kirchner H amp Thorpe S J (2006) Ultra-rapid objectdetection with saccadic eye movements Visualprocessing speed revisited Vision Research 46(11)1762ndash1776 doi S0042-6989(05)00511-0 [pii] 101016jvisres200510002

La Heij W Dirkx J amp Kramer P (1990)Categorical interference and associative priming inpicture naming British Journal of Psychology81(4) 511ndash525

Liu H Agam Y Madsen J R amp Kreiman G(2009) Timing timing timing Fast decoding ofobject information from intracranial field poten-tials in human visual cortex Neuron 62(2) 281ndash290

Loftus G R amp Hanna A M (1989) The phenom-enology of spatial integration Data and modelsCognitive Psychology 21(3) 363ndash397 doi0010-0285(89)90013-3 [pii]

Logothetis N K amp Sheinberg D L (1996) Visualobject recognition Annual Review of Neuroscience19 577ndash621

Maunsell J H R (1987) Physiological evidence fortwo visual subsystems In L Vaina (Ed) Mattersof intelligence (pp 59ndash87) Dordrecht The Neth-erlands Reidel Press

Neri P Morrone M C amp Burr D C (1998) Seeingbiological motion Nature 395(6705) 894ndash896 doi10103827661

Nishida S (2004) Motion-based analysis of spatialpatterns by the human visual system CurrentBiology 14 830ndash839

OrsquoRegan J K amp Levy-Schoen A (1983) Integratingvisual information from successive fixations Doestrans-saccadic fusion exist Vision Research 23(8)765ndash768

Pasupathy A amp Connor C E (1999) Responses tocontour features in macaque area V4 Journal ofNeurophysiology 82(5) 2490ndash2502

Pelli D G (1997) The VideoToolbox software forvisual psychophysics Transforming numbers intomovies Spatial Vision 10 437ndash442

Potter M C amp Levy E (1969) Recognition memoryfor a rapid sequence of pictures Journal ofExperimental Psychology 81(1) 10ndash15

Richmond B J Optican L M amp Spitzer H (1990)Temporal encoding of two-dimensional patterns bysingle units in primate primary visual cortex IStimulus-response relations Journal of Neurophys-iology 64(2) 351ndash369

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 13

Riesenhuber M amp Poggio T (2000) Models of objectrecognition Nature Neuroscience 3(Suppl) 1199ndash1204

Ringach D L Hawken M J amp Shapley R (2003)Dynamics of orientation tuning in macaque V1The role of global and tuned suppression Journalof Neurophysiology 90(1) 342ndash352 doi101152jn010182002 010182002 [pii]

Rohenkohl G amp Nobre A C (2011) Alphaoscillations related to anticipatory attention followtemporal expectations The Journal of Neurosci-ence 31(40) 14076ndash14084

Rolls E T (1991) Neural organization of highervisual functions Current Opinion in Neurobiology1 274ndash278

Sanocki T (2001) Interaction of scale and time duringobject identification Journal of Experimental Psy-chology Human Perception and Performance 27(2)290ndash302

Scharlau I amp Neumann O (2003) Temporalparameters and time course of perceptual latencypriming Acta Psychologica 113(2) 185ndash203

Schmolesky M T Wang Y C Hanes D PThompson K G Leutgeb S Schall J D Leventhal A G (1998) Signal timing across themacaque visual system Journal of Neurophysiology79(6) 3272ndash3278

Schyns P G Petro L S amp Smith M L (2007)Dynamics of visual information integration in thebrain for categorizing facial expressions CurrentBiology 17(18) 1580ndash1585 doi S0960-9822(07)01904-5 [pii] 101016jcub200708048

Serre T Kreiman G Kouh M Cadieu CKnoblich U amp Poggio T (2007) A quantitativetheory of immediate visual recognition Progress inBrain Research 165C 33ndash56

Singer J M amp Sheinberg D L (2006) Holisticprocessing unites face parts across time VisionResearch 46(11) 1838ndash1847 doi S0042-6989(05)00563-8 [pii] 101016jvisres200511005

Sperling G (1960) The information available in briefvisual presentations Washington DC AmericanPsychological Association

Swets J A (1961) Is there a sensory thresholdScience 134(3473) 168ndash177

Tanaka K (1996) Inferotemporal cortex and objectvision Annual Review of Neuroscience 19 109ndash139

Thorpe S Fize D amp Marlot C (1996) Speed ofprocessing in the human visual system Nature 381520ndash522

Van Dijk H Schoffelen J M Oostenveld R ampJensen O (2008) Prestimulus oscillatory activity inthe alpha band predicts visual discriminationability The Journal of Neuroscience 28(8) 1816ndash1823

vanRullen R amp Thorpe S J (2002) Surfing a spikewave down the ventral stream Vision Research 422593ndash2615

Vogel E K Woodman G F amp Luck S J (2006)The time course of consolidation in visual workingmemory Journal of Experimental PsychologyHuman Perception and Performance 32(6) 1436ndash1451 doi1010370096-15233261436

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 14

  • Introduction
  • Methods
  • f01
  • f02
  • e01
  • e02
  • Results
  • f03
  • e03
  • f04
  • e04
  • f05
  • Discussion
  • Conclusions
  • Akaike1
  • Anaki1
  • Aspell1
  • Bair1
  • Brainard1
  • Brockmole1
  • Burnham1
  • Burr1
  • Caudek1
  • Cavanagh1
  • Cheadle1
  • Cheung1
  • Clifford1
  • Coltheart1
  • Connor1
  • Cornelissen1
  • DeBaene1
  • DeCarlo1
  • Deco1
  • DiLollo1
  • DiLollo2
  • DiCarlo2
  • Douglas1
  • Enns1
  • Eriksen1
  • Felleman1
  • Forget1
  • Fukushima1
  • Hogben1
  • Huk1
  • Hung1
  • Irwin1
  • Johnson1
  • Jonides1
  • Kirchner1
  • LaHeij1
  • Liu1
  • Loftus1
  • Logothetis1
  • Maunsell1
  • Neri1
  • Nishida1
  • ORegan1
  • Pasupathy1
  • Pelli1
  • Potter1
  • Richmond1
  • Riesenhuber1
  • Ringach1
  • Rohenkohl1
  • Rolls1
  • Sanocki1
  • Scharlau1
  • Schmolesky1
  • Schyns1
  • Serre1
  • Singer1
  • Sperling1
  • Swets1
  • Tanaka1
  • Thorpe1
  • VanDijk1
  • vanRullen1
  • Vogel1
Page 6: Short temporal asynchrony disrupts visual object recognition · Short temporal asynchrony disrupts visual object recognition Jedediah M. Singer $ Department of Ophthalmology, Boston

Results

We evaluated the sensitivity to temporal imprecisionduring visual object recognition by quantifying cate-gorization of asynchronously presented object frag-ments (Methods Figure 1)

Experiment 1

Overall performance in the task was above chance(chancefrac14 025) for all subjects numbers of fragmentsand SOA values Averaged across all SOA values andnumbers of fragments performance ranged from 037to 077 (060 6 003 mean 6 SEM) Performanceincreased as the number of fragments shown increased(Figure 2) All 16 subjectsrsquo performance was better forthe four-fragment condition than the three-fragmentcondition significantly so for 13 subjects (chi-squared

test p 005 compare triangles versus squares inFigure 3) Catch trial performance was essentially atceiling (097 6 001) Single-fragment performance wasslightly above chance levels (033 6 001) Overallperformance varied slightly but significantly by cate-gory (chi-squared test p 108 animals 059 people060 plants 054 vehicles 057) In sum all subjectswere able to correctly perform the task in spite of therapid presentations and fragmented objects

If subjects performed the task by independentlyevaluating each fragment the performance in single-fragment trials would be predictive of overall perfor-mance in trials with multiple fragments via independentprobability summation (Methods) Different fragmentsfrom a given image contain redundant informationwhich would bias an independent probability summa-tion prediction towards higher values Yet performanceat all SOAs was higher than that predicted byprobability summation dropping at the longest SOA tovalues consistent with probability summation (at SOA

Figure 3 Individual subjectsrsquo performance in Experiment 1 Error bars indicate Clopper-Pearson 95 confidence intervals The x axis

indicates SOA and the y axis indicates the average fraction correct performance for the three-fragment condition (triangles) four-

fragment condition (squares) and single fragment condition () Chance performance is 025 and the performance in catch trials was

097 6 001 The dashed lines show the fits from the uncertainty reduction model (Text) The mean value of the integration window

indicated by the uncertainty reduction mode is indicated by a vertical dotted line

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 6

frac14 2941 ms 046 6 003 observed versus 044 6 004predicted and 050 6 003 observed versus 049 6 004predicted for three- and four-fragment conditionsrespectively Figure 4) Hence independent probabilitysummation is not a good model to describe overallperformance in this task That performance at all butthe longest asynchronies exceeded the independentprobability summation prediction despite the redun-dancy of the fragments within each trial suggests visualintegration which we quantify below

Categorization performance showed a strong de-crease with increasing values of SOA both for three-fragment trials and four-fragment trials (Figures 3ndash4)Performance declined with increasing temporal asyn-chrony for all 16 subjects (Figure 3) This change inperformance was significant for 11 and 14 subjects inthree-fragment and four-fragment trials respectively(chi-squared test p 005) The performance decre-ment between simultaneous presentations and asyn-chronous trials first reached significance at an SOA of294 ms (sign test pfrac14 005) with a difference of 23 6

14 (Figure 4) While a slight improvement inperformance was apparent in the mean values at 118ms SOA relative to simultaneous presentations it wasnot statistically significant (sign test p frac14 060) Theseobservations show that temporal imprecision of even 30ms disrupts recognition performance

To quantify performance (y) as a function of SOA(denoted A below) we started by performing a least-squares fit of an exponential curve to the data fromeach subject for each number of fragments

yethATHORN frac14 Bthorn R 2A=s eth3THORNThe three parameters were the time constant s the

baseline value Bfrac14 y(lsquo) and the range in performanceRfrac14 y(0) y(lsquo) where y(0) represents the performanceat 0 ms SOA and y(lsquo) denotes the asymptoticperformance at infinite SOA We adjusted these threeparameters (B R s) to minimize the squared errorbetween y(A) and the observed data y(A) Theexponential curves provided a reasonable fit to the data(root-mean-squared error RMSE of 0058 6 0004and 0045 6 0003 for three-fragment and four-fragment data respectively) As a measure of thewindow of integration we considered s the SOA atwhich the exponential curve achieved its half heightThe mean value of s across 16 subjects was 120 6 24ms and 104 6 19 ms in three-fragment and four-fragment trials respectively Thus shape integration atspatial scales of at least approximately 58 remainsevident over a time scale on the order of 100 ms

The exponential fits described the shapes of each ofthe response curves and experimental parametersseparately We sought a unifying model that couldcapture all the different experimental conditions andcould be realized by the firing rates of neurons Asdiscussed above probability summation failed inassuming that multiple fragments contribute to per-ception independently To include the interdependenceamong different fragments and to account for theincreasing number of possible two-fragment interac-tions as fragment count increases we developed amodel based on the notion that pairs of fragmentsinteract to reduce uncertainty about the image as anexponential function of their SOA In this model wedescribed a subjectrsquos observed performance in terms ofan underlying uncertainty u which is reduced byinteractions between presented fragments As abovewe assume that the subject actually knew the correctimage category in some fraction i of the trials andguessed at chance in the rest so that y(A)frac14 ithorn [(1 ndash i)4] The empirical uncertainty is then 1ndashi the chancethat the subject did not know the image category Weassume that a second fragment reduces the underlyinguncertainty u by some factor f(A) that depends on theasynchrony A with which it is presented relative to thefirst A third fragment reduces uncertainty by anotherfactor of f(A) and also by a factor of f(2A) because it ispresented A ms after the second fragment and 2A msafter the first fragment We assume independence sothe total uncertainty with three fragments is u f2(A)f(2A) Similarly the total uncertainty in a four-fragment trial is u f3(A) f2(2A) f(3A) If we denotethe total uncertainty with a particular number offragments as ursquo the predicted performance is then 1u0 thorn (u04) Note that u is conceptually related toperformance in single fragment trials but is not directlymathematically relatable The value of u also incorpo-

Figure 4 Summary of results from Experiment 1 Performance is

averaged across nfrac14 16 subjects (individual performance shown

in Figure 3) Conventions are the same as in Figure 3 horizontal

dashed lines indicate the performance level predicted by

probability summation based on single-fragment performance

Error bars show SEM across subjects

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 7

rates the fact that different fragments from a singleimage are not independent but that this dependency isfactored out in the consideration of pairs of fragmentsvia f(A)

For each subject we used their empirical perfor-mance to calculate the empirical uncertainty 1ndashi at eachSOA and each number of fragments We then foundthe least-squares best fits for the uncertainty reductionfunction f(A) and the underlying uncertainty u that wasmodified by f(A) We considered for f(A) exponentialfunctions of the form f(A)frac14 B0 R02As0 Here B0 frac14f(lsquo) is the asymptote which determines what might becalled lsquolsquocognitiversquorsquo or lsquolsquomemory-basedrsquorsquo integration theability to combine information from multiple frag-ments at an arbitrarily long SOA R0frac14 f(lsquo) f(0) is therange of uncertainty reduction spanned by f whichdescribes the benefit of presenting visual informationsimultaneously rather than with very large SOAFinally s0 is the time constant which describes theSOA at the uncertainty reduction functionrsquos half-height Given the exponential nature of this modelthere is some small amount of integration that persistsat arbitrarily long asynchronies decreasing asymptot-ically towards zero Along with the underlying uncer-tainty u this makes four total parameters to describe asubjectrsquos performance across all SOAs

The uncertainty reduction model fit the data quitewell root-mean-square errors (RMSE) for the model(0061 6 0004 and 0046 6 0002 for three-fragmentand four-fragment trials respectively) were not signif-icantly higher than the corresponding RMSE values forexponential fits to the raw data discussed above (00586 0004 and 0045 6 0003 respectively Wilcoxonrank sum test comparing the two models pfrac14 042 and061 respectively) even though the exponential fits tothe raw data have two more free parameters Giventhose extra parameters the corrected Akaike Infor-mation Criterion (Akaike 1974 Burnham amp Anderson2002) is lower for the uncertainty reduction model in 15out of 16 subjects on average the likelihood of theuncertainty reduction model is 374 6 051 times higherthan that of the exponential fits

We described the window of integration with theuncertainty reduction model exponentialrsquos half heightparameter s0 When we fit the model to the data fromindividual subjects we obtained a mean s0of 97 6 9 ms(Figure 3) In Figure 4 we also made use of the sameframework considering all subjects within the experi-ment together Rather than fitting to each subjectrsquosperformance separately we fit one uncertainty reduc-tion model to all subjectsrsquo data The values of the bestfit parameters for the uncertainty reduction model wereufrac14 043 6 002 B0frac14 099 6 001 R0frac14 015 6 002 ands0 frac14 109 ms R2 values between the data and thepredictions for the population uncertainty reductionmodel were 092 for three-fragment trials and 097 for

four-fragment trials This model succinctly and accu-rately described a subjectrsquos performance across allconditions based solely on the reduction of uncertaintyas a function of SOA The model used fewerparameters to explain a subjectrsquos performance than theset of exponential fits it could generalize acrossdifferent numbers of fragments and it suggested anunderlying psychological interpretation

We further considered a model based on signaldetection theory which assumes a probabilistic frame-work to describe each subjectrsquos performance in eachtrial Previous analyses of psychophysics data (Swets1961) have shown that high-threshold models like theuncertainty reduction model just described sometimesdo not capture subjectsrsquo behavior as well as modelsbased on signal detection theory (SDT) We thereforeconstructed a SDT model to test whether it would givedifferent results from the uncertainty reduction modelWe used a Markov chain Monte Carlo (MCMC)method for calculating d0 (a SDT measure of sensitivityto stimulus identity) with bias in four-alternative forcedchoice tasks (DeCarlo 2012) Models were fit usingJAGS (httpsourceforgenetprojectsmcmc-jags)with 10000 iterations of burn-in followed by 20000iterations to approximate d0 distributions Havingcalculated d0 for each subject each asynchrony andeach number of fragments we proceeded to fit a four-parameter model to each subjectrsquos data Let g be ascaled base-two exponential function with parameter s 00

and scale factor R 00 g(A)frac14R 002As 00 Then g describesthe increase in sensitivity brought about by having twofragments with an asynchrony of A The thirdparameter is v 00 the d0 value in single-fragment trials(constrained to lie within the 95 confidence intervalestimated from the MCMC fit) Finally we assumedthat adding additional fragments to a trial might notaid sensitivity in a perfectly efficient fashion (becausefragments contain correlated information) and wedescribed this inefficiency with a multiplicative factorc 00 that progressively reduced the contributions ofsuccessive fragments The model predicts d0 for nfragments with asynchrony A as follows

d0 frac14Xnifrac141

v 00 c 00i1 thorn ethi 1THORN g

A ethn ithorn 1THORN

eth4THORNWe fit the four parameters of this model using a

least-squares procedure and compared the time con-stant values (s 00) obtained with the SDT model againstthose obtained from the uncertainty reduction model(s0) The values of the best fits for the SDT model wereR 00 frac14 021 6 002 v 00 frac14 042 6 006 c 00 frac14 047 6 009and a cross-subject mean s 00 of 101 6 8 ms Given thatthe SDT results were similar to those of the uncertaintyreduction model and the higher computational re-

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 8

quirements for the SDT model we used only theuncertainty reduction model for the remainder of ouranalyses

As noted above performance differed slightlybetween categories To evaluate whether such inho-mogeneity contributed to the observed integration timeconstants we repeated the analyses after matchingperformance across categories by random subsamplingUnder these performance-matched conditions themean s0 was 95 6 9 ms and the population s0 was 105ms We also analyzed each category and each subjectseparately after matching performance The mean s0 forthe four categories were 125 6 23 ms 144 6 33 ms 896 21 ms and 131 6 26 ms respectively for animalspeople plants and vehicles Note that these fits werenoisier due to having fewer data While subjects coulduse different features to recognize images belonging todifferent categories the results were similar acrosscategories

Experiment 2

We extended the first experiment in two variantsaimed at evaluating two-fragment images and very longSOAs (Experiment 2A) and constant trial lengths andno fragment repetition (Experiment 2B) These exper-iment variants are described under Methods

In Experiment 2A the window of integration s0 was101 6 14 ms when two-fragment trials were included inthe fit and 91 6 13 ms when they were not included(Figure 5A) The other model parameters were ufrac14 0336 002 B0 frac14 0990 6 0003 and R0 frac14 0069 6 0006These values were not significantly different from eachother or from the results in Experiment 1 (Wilcoxonrank sum test pfrac14 055 comparing these two fits pfrac14099 and 068 respectively comparing Experiment 1 tothese results with and without two-fragment trials)Furthermore the model fit using only three- and four-fragment trials was a good fit for the two-fragment data(RMSE of 0091 6 0006 using only three- and four-fragment trials versus 0081 6 0004 using all trialsWilcoxon rank sum test pfrac14 020) This demonstratesthat the uncertainty reduction model was able togeneralize to fragment counts that were not used to fitthe model While performance at 2941 ms was slightlybetter than performance at 7059 ms (differing by 346 17) this difference was not significant (sign test pfrac14 016) RMSE and s0 values were unaffected byincluding the data from 7059 ms trials (Wilcoxon ranksum tests pfrac14 097 and 092 respectively) R2 valueswere 030 075 and 078 for two- three- and four-fragment trials respectively The two-fragment R2

value was low because the plot of performance againstasynchrony was almost flat The results of thisexperimental variant suggest that performance at long

SOAs in Experiment 1 was close to asymptote and thatthe uncertainty reduction model can extrapolate totwo-fragment trials

The overall performance in Experiment 2B (as wellas in the previous variant) was lower than inExperiment 1 (cf Figure 5 vs Figure 4) due to theinclusion of more difficult line drawings in the imagesets (Figure 5B) Considering only the trials thatincluded the same types of images across experiments(silhouette images) raised performance to levels com-parable to the ones in Experiment 1 without signif-icantly changing s0 R2 values for the uncertaintyreduction model were 064 and 081 for three- and four-fragment trials respectively The integration windowfrom the uncertainty reduction model in Experiment 2B

Figure 5 Mean performance across all subjects in Experiment 2

A Variant A (n frac14 17 subjects) which added a two-fragment

condition (circles) and data at 7059 ms SOA B Variant B (n frac1417 subjects) which controlled for influence of trial duration and

stimulus repetition (see Results and Methods) Conventions are

otherwise as in Figure 4

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 9

(s0 frac14 123 6 20 ms) was indistinguishable from thatobtained in Experiment 1 (Wilcoxon rank sum test pfrac14079) The other model parameters were ufrac14 030 6002 B0 frac14 0990 6 0003 and R0 frac14 0064 6 0009 Thesimilar s0 values suggest that the results of Experiment1 cannot be ascribed to differences in trial duration orslight familiarity with the images shown

Discussion

We measured the performance of 50 subjects indifferent variants of an experiment to evaluate theeffect of presenting object fragments with varyingdegrees of asynchrony By applying this externallyinduced temporal jitter we characterized the impor-tance of temporal precision for visual object recogni-tion

Subjects might in principle have made educatedguesses based on the content of individual fragmentsor combined such guesses independently across multi-ple fragments Instead we found that performance wassignificantly higher than predicted by independentprobability summation indicating that informationfrom different fragments was synergistically integratedRecognition performance decreased with increasingSOA and was well characterized by a simple modelbased on the reduction of uncertainty as a function ofthe temporal proximity of the presented image parts Inprinciple such a model could be realized by neuronswhose firing rate increases with the arrival of each newburst of information and then relaxes more slowlytowards baseline Neurons showing such firing rateincreases upon transient stimulus onset and decaydynamics over hundreds of milliseconds have beendescribed in recordings in the lateral intraparietalcortex during a motion discrimination task (Huk ampShadlen 2005)

While we focused on the uncertainty reductionmodel we also fit the data from Experiment 1 with amodel based on signal detection theory These twomodels reflect different paradigms of psychologicalprocessing In the former each additional piece ofinformation (provided by each successive fragment)multiplicatively reduces the subjectrsquos uncertainty aboutwhat was shown increasing the probability of giving acorrect answer In the latter each additional fragmentadds to the distance between the actual category andthe other categories in an implicit psychological spaceagain increasing the probability of discriminating thetrue category from the others There are importantdifferences between these paradigms (Swets 1961)though either could be realized via the integrativeneurons hypothesized above The uncertainty reductionmodel might be implemented by a set of downstream

neurons that act as thresholds on the evidenceaccumulated by the integrative neurons the firstdownstream neuron to cross threshold could report theidentity of the viewed image Support for one oranother image would manifest as increased firing ratesin integrative neurons feeding into the associateddownstream neurons increasing its chance of being thefirst to cross threshold The signal detection theorymight instead reflect a set of downstream neuronswhose activity reflected which integrative neuron wasmost active in a winner-take-all fashion The integra-tive neurons would then essentially be counting votesUltimately given that the two models yielded indis-tinguishable estimates of the window of temporalintegration it is difficult to draw any conclusions aboutwhich paradigm is more likely to be correct based onthe current data

The results demonstrated that asynchronies of a fewtens of milliseconds disrupted categorization perfor-mance This disruption is consistent with a body ofneurophysiological observations and computationalmodels of object recognition that are based on a rapidcascade of processing events along the ventral visualstream (DiCarlo et al 2012 Fukushima 1980 Hung etal 2005 Johnson amp Olshausen 2003 Kirchner ampThorpe 2006 Liu et al 2009 Potter amp Levy 1969Richmond et al 1990 Riesenhuber amp Poggio 2000Rolls 1991 Serre et al 2007 Thorpe et al 1996vanRullen amp Thorpe 2002) The response latenciesacross the ventral stream consistently increase by about10 to 20 ms at each stage from the retina to thethalamus to a cascade of cortical areas from primaryvisual cortex to inferior temporal cortex (Maunsell1987 Schmolesky et al 1998) The rapidity with whichvisual shape information progresses through the ventralvisual cortex and is converted into behavioral orperceptual output is consistent with the currentobservation that object recognition begins to sufferimpairment with even minor disruptions of stimulustiming

In addition to the impairment in shape recognitionthrough spatial integration due to temporal asynchro-ny timing judgments can be influenced by spatial cuesSpatial grouping cues that encourage the bindingtogether of separate parts can interfere with simulta-neity judgments at a time scale similar to that reportedhere (Cheadle et al 2008) Spatial judgments areinfluenced by temporal constraints and temporaljudgments are influenced by spatial constraints

While temporal asynchrony disrupted recognitionwe observed that performance was well above asymp-totic levels even at SOAs of approximately 100 msIntegration at these SOAs cannot be ascribed to trialduration eye movements remembering the stimulifrom previous trials or forgetting the fragments shownwithin a trial (Figure 5) Performance values at SOAs

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 10

of approximately 100 ms are consistent with theextended durations of responses in inferior temporalcortex (De Baene et al 2007) These extended responsedurations might instantiate a buffer allowing theintegration of visual shape information over time Theability to integrate visual information over brief spanscould underlie many critical visual functions Forexample percepts of camouflaged or partially occludedobjects could be assembled over time as the objectsmoved relative to their background or occluders(Nishida 2004) and information could be integratedacross saccades (Irwin Yantis amp Jonides 1983Jonides Irwin amp Yantis 1982 OrsquoRegan amp Levy-Schoen 1983)

The observed integration of form information over 100 ms is unlikely to reflect working memoryprocesses as the temporal scales involved in workingmemory span several seconds (Vogel Woodman ampLuck 2006) There is however sufficient time betweenstimulus presentation and the response period that the(integrated) image or category information could beheld in working memory The window of temporalintegration reported here has been observed in otherdomains and with other experimental paradigms(Caudek Domini amp Di Luca 2002 Forget Buiatti ampDehaene 2009 Nishida 2004) In particular perfor-mance that exceeds probability summation at shortSOAs is consistent with the temporal scales involved iniconic memory (Coltheart 1980 Di Lollo 1977Eriksen amp Collins 1968 Hogben amp Di Lollo 1974Sperling 1960) Prior work has shown that informationabout arrays of letters (Sperling 1960) or flashingsquares (Hogben amp Di Lollo 1974) remains accessiblefor a short time after presentation It is possible that theintegration we observed at short SOAs in the objectrecognition system arises from mechanisms similar tothose underlying the previously reported persistence ofsimple visual lsquolsquoiconsrsquorsquo though iconic memory has beenshown to be disrupted by masks (Di Lollo Clark ampHogben 1988) Visual persistence as instantiated bythe dynamics of retinal cells is generally terminated bya mask it is more likely that we are measuring neuralor informational persistence at a higher level of thevisual system (Coltheart 1980) While the dynamicnoise in the stimulus sequence could partly mask visualpersistence it certainly does not completely obliterateit Hence the integration constants reported hereconstitute an upper bound and it is conceivable thatvisual information could decay even more rapidly thanreported here

Two additional phenomena share similar character-istics with the dynamic rate of decay reported hereIntegration masking in which irrelevant informationpresented immediately before or after the presentationof a target inhibits perception of that target by addingnoise to it exhibits a similar time scale to that described

here and may reflect a similar process (Enns amp DiLollo 2000) Priming (in which a stimulus facilitatesthe perception or naming of a later stimulus) might alsoplay a part in these observations Priming would likelyincrease performance at longer SOA positive effects ofpriming typically are small or nonexistent at SOA closeto zero and increase with increasing SOA (La HeijDirkx amp Kramer 1990 Scharlau amp Neumann 2003)

A series of elegant studies has shown that detect-ability and perception can be influenced by the exacttime at which a stimulus is presented with respect toongoing endogenous oscillations (Rohenkohl amp Nobre2011 Van Dijk Schoffelen Oostenveld amp Jensen2008) In the context of the task presented here it isconceivable that the initial dynamic noise could resetsuch endogenous oscillations and that certain presen-tation times and SOAs could lead to enhancedrecognition by virtue of their specific phase The limitednumber and uneven sampling of SOAs precludes asystematic investigation of these effects here but it willbe interesting in future studies to examine thephysiological signals underlying recognition of asyn-chronously presented objects

Several studies have shown interactions betweenasynchronous parts of faces (Anaki et al 2007 Cheunget al 2011 Singer amp Sheinberg 2006) those findingsare better described in terms of higher level phenomenathan object recognition Complex images have alsobeen used in a study of subjective simultaneity (Loftusamp Hanna 1989) in which subjects were asked to ratehow complete or integrated images appeared to bewhen divided into two parts presented with varyingdurations and asynchronies Here we expand upon thisbody of previous work by showing that complex shapeinformation can be combined across both space andtime to enable object recognition Our results wereconsistent across a battery of controls designed toaccount for eye movements stimulus familiarity andmemory on longer time scales

Conclusions

Integration across image fragments begins to breakdown when as little as 30 ms separates the fragmentsand decays with a time constant of approximately 100ms The current characterization of the time coursewith which visual object recognition breaks down as afunction of temporal asynchrony provides informationcritical to building computer vision systems thatoperate beyond static snapshots to modeling visualobject recognition to designing experiments that probethe performance over time of the human visual systemand to the construction of theories of perception in adynamic world

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 11

Keywords object recognition temporal integrationfragmented images temporal sensitivity visual dynamics

Acknowledgments

The authors thank Neal Dach and Joanna Li forassistance with stimulus preparation and data collec-tion and John Maunsell Thomas Miconi and HanlinTang for their insightful comments on the manuscriptThis work was supported by NSF grants 0954570 andCCF-1231216 (GK)

Commercial relationships noneCorresponding author Jedediah Miller SingerEmail jedediahsingerchildrensharvardeduAddress Department of Ophthalmology BostonChildrenrsquos Hospital Harvard Medical School BostonMA USA

References

Akaike H A (1974) A new look at the statisticalmodel identification IEEE Transactions on Auto-matic Control 19 716ndash723

Anaki D Boyd J amp Moscovitch M (2007)Temporal integration in face perception Evidenceof configural processing of temporally separatedface parts Journal of Experimental PsychologyHuman Perception and Performance 33(1) 1ndash19doi 2007-01135-001 [pii] 1010370096-15233311

Aspell J E Wattam-Bell J amp Braddick O (2006)Interaction of spatial and temporal integration inglobal form processing Vision Research 46(18)2834ndash2841 doi S0042-6989(06)00118-0 [pii] 101016jvisres200602018

Bair W amp Movshon J A (2004) Adaptive temporalintegration of motion in direction-selective neuronsin macaque visual cortex Journal of Neuroscience24(33) 7305ndash7323 doi101523JNEUROSCI0554-042004 24339305 [pii]

Brainard D H (1997) The Psychophysics ToolboxSpatial Vision 10 433ndash436

Brockmole J R Wang R F amp Irwin D E (2002)Temporal integration between visual images andvisual percepts Journal of Experimental Psycholo-gy Human Perception and Performance 28(2) 315ndash334

Burnham K P amp Anderson D (2002) Modelselection and multi-model inference (2nd ed) NewYork Springer

Burr D C amp Santoro L (2001) Temporal integra-tion of optic flow measured by contrast andcoherence thresholds Vision Research 41(15)1891ndash1899 doi S0042-6989(01)00072-4 [pii]

Caudek C Domini F amp Di Luca M (2002) Short-term temporal recruitment in structure frommotion Vision Research 42 10

Cavanagh P Holcombe A O amp Chou W (2008)Mobile computation Spatiotemporal integration ofthe properties of objects in motion Journal ofVision 8(12)1 1ndash23 httpwwwjournalofvisionorgcontent8121 doi1011678121 [PubMed][Article]

Cheadle S Bauer F Parton A Muller H BonnehY S amp Usher M (2008) Spatial structure affectstemporal judgments Evidence for a synchronybinding code Journal of Vision 8(7)12 1ndash12httpjournalofvisionorgcontent8712 doi1011678712 [PubMed] [Article]

Cheung O S Richler J J Phillips W S ampGauthier I (2011) Does temporal integration offace parts reflect holistic processing PsychonomicBulletin and Review 18(3) 476ndash483 doi103758s13423-011-0051-7

Clifford C W Holcombe A O amp Pearson J (2004)Rapid global form binding with loss of associatedcolors Journal of Vision 4(12)8 1090ndash1101 httpwwwjournalofvisionorgcontent4128 doi1011674128 [PubMed] [Article]

Coltheart M (1980) Iconic memory and visiblepersistence Attention Perception amp Psychophysics27(3) 183ndash228 doi103758bf03204258

Connor C E Brincat S L amp Pasupathy A (2007)Transformation of shape information in the ventralpathway Current Opinion in Neurobiology 17(2)140ndash147

Cornelissen F W Peters E M amp Palmer J (2002)The Eyelink Toolbox Eye tracking with MATLABand the Psychophysics Toolbox Behavioral Re-search Methods Instruments and Computers 34 4

De Baene W Premereur E amp Vogels R (2007)Properties of shape tuning of macaque inferiortemporal neurons examined using rapid serialvisual presentation Journal of Neurophysiology97(4) 2900ndash2916 doi 007412006 [pii] 101152jn007412006

DeCarlo L T (2012) On a signal detection approachto m-alternative forced choice with bias withmaximum likelihood and bayesian approaches toestimation Journal of Mathematical Psychology56(3) 196ndash207

Deco G amp Rolls E T (2004) Computational

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 12

neuroscience of vision Oxford UK Oxford Uni-versity Press

Di Lollo V (1977) Temporal characteristics of iconicmemory Nature 267(5608) 241ndash243

Di Lollo V Clark C D amp Hogben J H (1988)Separating visible persistence from retinal afterim-ages Perception and Psychophysics 44(4) 363ndash368

DiCarlo J J Zoccolan D amp Rust N C (2012)How does the brain solve visual object recognitionNeuron 73(3) 415ndash434 doi S0896-6273(12)00092-X [pii] 101016jneuron201201010

Douglas R J amp Martin K A (2004) Neuronalcircuits of the neocortex Annual Review of Neuro-science 27 419ndash451

Enns J T amp Di Lollo V (2000) Whatrsquos new in visualmasking Trends in Cognitive Sciences 4(9) 345ndash352

Eriksen C W amp Collins J F (1968) Sensory tracesversus the psychological moment in the temporalorganization of form Journal of ExperimentalPsychology 77(3) 376ndash382

Felleman D J amp Van Essen D C (1991) Distributedhierarchical processing in the primate cerebralcortex Cerebral Cortex 1 1ndash47

Forget J Buiatti M amp Dehaene S (2009) Temporalintegration in visual word recognition Journal ofCognitive Neuroscience 22(5) 14

Fukushima K (1980) Neocognitron A self organizingneural network model for a mechanism of patternrecognition unaffected by shift in position Biolog-ical Cybernetics 36(4) 193ndash202

Hogben J H amp Di Lollo V (1974) Perceptualintegration and perceptual segregation of briefvisual stimuli Vision Research 14(11) 1059ndash1069

Huk A C amp Shadlen M N (2005) Neural activity inmacaque parietal cortex reflects temporal integra-tion of visual motion signals during perceptualdecision making The Journal of Neuroscience25(45) 16

Hung C P Kreiman G Poggio T amp DiCarlo J J(2005) Fast read-out of object identity frommacaque inferior temporal cortex Science 310863ndash866

Irwin D E Yantis S amp Jonides J (1983) Evidenceagainst visual integration across saccadic eyemovements Perception and Psychophysics 34(1)49ndash57

Johnson J S amp Olshausen B A (2003) Timecourseof neural signatures of object recognition Journalof Vision 3(7)4 499ndash512 httpwwwjournalofvisionorgcontent374 doi101167374 [PubMed] [Article]

Jonides J Irwin D E amp Yantis S (1982)Integrating visual information from successivefixations Science 215(4529) 192ndash194

Kirchner H amp Thorpe S J (2006) Ultra-rapid objectdetection with saccadic eye movements Visualprocessing speed revisited Vision Research 46(11)1762ndash1776 doi S0042-6989(05)00511-0 [pii] 101016jvisres200510002

La Heij W Dirkx J amp Kramer P (1990)Categorical interference and associative priming inpicture naming British Journal of Psychology81(4) 511ndash525

Liu H Agam Y Madsen J R amp Kreiman G(2009) Timing timing timing Fast decoding ofobject information from intracranial field poten-tials in human visual cortex Neuron 62(2) 281ndash290

Loftus G R amp Hanna A M (1989) The phenom-enology of spatial integration Data and modelsCognitive Psychology 21(3) 363ndash397 doi0010-0285(89)90013-3 [pii]

Logothetis N K amp Sheinberg D L (1996) Visualobject recognition Annual Review of Neuroscience19 577ndash621

Maunsell J H R (1987) Physiological evidence fortwo visual subsystems In L Vaina (Ed) Mattersof intelligence (pp 59ndash87) Dordrecht The Neth-erlands Reidel Press

Neri P Morrone M C amp Burr D C (1998) Seeingbiological motion Nature 395(6705) 894ndash896 doi10103827661

Nishida S (2004) Motion-based analysis of spatialpatterns by the human visual system CurrentBiology 14 830ndash839

OrsquoRegan J K amp Levy-Schoen A (1983) Integratingvisual information from successive fixations Doestrans-saccadic fusion exist Vision Research 23(8)765ndash768

Pasupathy A amp Connor C E (1999) Responses tocontour features in macaque area V4 Journal ofNeurophysiology 82(5) 2490ndash2502

Pelli D G (1997) The VideoToolbox software forvisual psychophysics Transforming numbers intomovies Spatial Vision 10 437ndash442

Potter M C amp Levy E (1969) Recognition memoryfor a rapid sequence of pictures Journal ofExperimental Psychology 81(1) 10ndash15

Richmond B J Optican L M amp Spitzer H (1990)Temporal encoding of two-dimensional patterns bysingle units in primate primary visual cortex IStimulus-response relations Journal of Neurophys-iology 64(2) 351ndash369

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 13

Riesenhuber M amp Poggio T (2000) Models of objectrecognition Nature Neuroscience 3(Suppl) 1199ndash1204

Ringach D L Hawken M J amp Shapley R (2003)Dynamics of orientation tuning in macaque V1The role of global and tuned suppression Journalof Neurophysiology 90(1) 342ndash352 doi101152jn010182002 010182002 [pii]

Rohenkohl G amp Nobre A C (2011) Alphaoscillations related to anticipatory attention followtemporal expectations The Journal of Neurosci-ence 31(40) 14076ndash14084

Rolls E T (1991) Neural organization of highervisual functions Current Opinion in Neurobiology1 274ndash278

Sanocki T (2001) Interaction of scale and time duringobject identification Journal of Experimental Psy-chology Human Perception and Performance 27(2)290ndash302

Scharlau I amp Neumann O (2003) Temporalparameters and time course of perceptual latencypriming Acta Psychologica 113(2) 185ndash203

Schmolesky M T Wang Y C Hanes D PThompson K G Leutgeb S Schall J D Leventhal A G (1998) Signal timing across themacaque visual system Journal of Neurophysiology79(6) 3272ndash3278

Schyns P G Petro L S amp Smith M L (2007)Dynamics of visual information integration in thebrain for categorizing facial expressions CurrentBiology 17(18) 1580ndash1585 doi S0960-9822(07)01904-5 [pii] 101016jcub200708048

Serre T Kreiman G Kouh M Cadieu CKnoblich U amp Poggio T (2007) A quantitativetheory of immediate visual recognition Progress inBrain Research 165C 33ndash56

Singer J M amp Sheinberg D L (2006) Holisticprocessing unites face parts across time VisionResearch 46(11) 1838ndash1847 doi S0042-6989(05)00563-8 [pii] 101016jvisres200511005

Sperling G (1960) The information available in briefvisual presentations Washington DC AmericanPsychological Association

Swets J A (1961) Is there a sensory thresholdScience 134(3473) 168ndash177

Tanaka K (1996) Inferotemporal cortex and objectvision Annual Review of Neuroscience 19 109ndash139

Thorpe S Fize D amp Marlot C (1996) Speed ofprocessing in the human visual system Nature 381520ndash522

Van Dijk H Schoffelen J M Oostenveld R ampJensen O (2008) Prestimulus oscillatory activity inthe alpha band predicts visual discriminationability The Journal of Neuroscience 28(8) 1816ndash1823

vanRullen R amp Thorpe S J (2002) Surfing a spikewave down the ventral stream Vision Research 422593ndash2615

Vogel E K Woodman G F amp Luck S J (2006)The time course of consolidation in visual workingmemory Journal of Experimental PsychologyHuman Perception and Performance 32(6) 1436ndash1451 doi1010370096-15233261436

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 14

  • Introduction
  • Methods
  • f01
  • f02
  • e01
  • e02
  • Results
  • f03
  • e03
  • f04
  • e04
  • f05
  • Discussion
  • Conclusions
  • Akaike1
  • Anaki1
  • Aspell1
  • Bair1
  • Brainard1
  • Brockmole1
  • Burnham1
  • Burr1
  • Caudek1
  • Cavanagh1
  • Cheadle1
  • Cheung1
  • Clifford1
  • Coltheart1
  • Connor1
  • Cornelissen1
  • DeBaene1
  • DeCarlo1
  • Deco1
  • DiLollo1
  • DiLollo2
  • DiCarlo2
  • Douglas1
  • Enns1
  • Eriksen1
  • Felleman1
  • Forget1
  • Fukushima1
  • Hogben1
  • Huk1
  • Hung1
  • Irwin1
  • Johnson1
  • Jonides1
  • Kirchner1
  • LaHeij1
  • Liu1
  • Loftus1
  • Logothetis1
  • Maunsell1
  • Neri1
  • Nishida1
  • ORegan1
  • Pasupathy1
  • Pelli1
  • Potter1
  • Richmond1
  • Riesenhuber1
  • Ringach1
  • Rohenkohl1
  • Rolls1
  • Sanocki1
  • Scharlau1
  • Schmolesky1
  • Schyns1
  • Serre1
  • Singer1
  • Sperling1
  • Swets1
  • Tanaka1
  • Thorpe1
  • VanDijk1
  • vanRullen1
  • Vogel1
Page 7: Short temporal asynchrony disrupts visual object recognition · Short temporal asynchrony disrupts visual object recognition Jedediah M. Singer $ Department of Ophthalmology, Boston

frac14 2941 ms 046 6 003 observed versus 044 6 004predicted and 050 6 003 observed versus 049 6 004predicted for three- and four-fragment conditionsrespectively Figure 4) Hence independent probabilitysummation is not a good model to describe overallperformance in this task That performance at all butthe longest asynchronies exceeded the independentprobability summation prediction despite the redun-dancy of the fragments within each trial suggests visualintegration which we quantify below

Categorization performance showed a strong de-crease with increasing values of SOA both for three-fragment trials and four-fragment trials (Figures 3ndash4)Performance declined with increasing temporal asyn-chrony for all 16 subjects (Figure 3) This change inperformance was significant for 11 and 14 subjects inthree-fragment and four-fragment trials respectively(chi-squared test p 005) The performance decre-ment between simultaneous presentations and asyn-chronous trials first reached significance at an SOA of294 ms (sign test pfrac14 005) with a difference of 23 6

14 (Figure 4) While a slight improvement inperformance was apparent in the mean values at 118ms SOA relative to simultaneous presentations it wasnot statistically significant (sign test p frac14 060) Theseobservations show that temporal imprecision of even 30ms disrupts recognition performance

To quantify performance (y) as a function of SOA(denoted A below) we started by performing a least-squares fit of an exponential curve to the data fromeach subject for each number of fragments

yethATHORN frac14 Bthorn R 2A=s eth3THORNThe three parameters were the time constant s the

baseline value Bfrac14 y(lsquo) and the range in performanceRfrac14 y(0) y(lsquo) where y(0) represents the performanceat 0 ms SOA and y(lsquo) denotes the asymptoticperformance at infinite SOA We adjusted these threeparameters (B R s) to minimize the squared errorbetween y(A) and the observed data y(A) Theexponential curves provided a reasonable fit to the data(root-mean-squared error RMSE of 0058 6 0004and 0045 6 0003 for three-fragment and four-fragment data respectively) As a measure of thewindow of integration we considered s the SOA atwhich the exponential curve achieved its half heightThe mean value of s across 16 subjects was 120 6 24ms and 104 6 19 ms in three-fragment and four-fragment trials respectively Thus shape integration atspatial scales of at least approximately 58 remainsevident over a time scale on the order of 100 ms

The exponential fits described the shapes of each ofthe response curves and experimental parametersseparately We sought a unifying model that couldcapture all the different experimental conditions andcould be realized by the firing rates of neurons Asdiscussed above probability summation failed inassuming that multiple fragments contribute to per-ception independently To include the interdependenceamong different fragments and to account for theincreasing number of possible two-fragment interac-tions as fragment count increases we developed amodel based on the notion that pairs of fragmentsinteract to reduce uncertainty about the image as anexponential function of their SOA In this model wedescribed a subjectrsquos observed performance in terms ofan underlying uncertainty u which is reduced byinteractions between presented fragments As abovewe assume that the subject actually knew the correctimage category in some fraction i of the trials andguessed at chance in the rest so that y(A)frac14 ithorn [(1 ndash i)4] The empirical uncertainty is then 1ndashi the chancethat the subject did not know the image category Weassume that a second fragment reduces the underlyinguncertainty u by some factor f(A) that depends on theasynchrony A with which it is presented relative to thefirst A third fragment reduces uncertainty by anotherfactor of f(A) and also by a factor of f(2A) because it ispresented A ms after the second fragment and 2A msafter the first fragment We assume independence sothe total uncertainty with three fragments is u f2(A)f(2A) Similarly the total uncertainty in a four-fragment trial is u f3(A) f2(2A) f(3A) If we denotethe total uncertainty with a particular number offragments as ursquo the predicted performance is then 1u0 thorn (u04) Note that u is conceptually related toperformance in single fragment trials but is not directlymathematically relatable The value of u also incorpo-

Figure 4 Summary of results from Experiment 1 Performance is

averaged across nfrac14 16 subjects (individual performance shown

in Figure 3) Conventions are the same as in Figure 3 horizontal

dashed lines indicate the performance level predicted by

probability summation based on single-fragment performance

Error bars show SEM across subjects

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 7

rates the fact that different fragments from a singleimage are not independent but that this dependency isfactored out in the consideration of pairs of fragmentsvia f(A)

For each subject we used their empirical perfor-mance to calculate the empirical uncertainty 1ndashi at eachSOA and each number of fragments We then foundthe least-squares best fits for the uncertainty reductionfunction f(A) and the underlying uncertainty u that wasmodified by f(A) We considered for f(A) exponentialfunctions of the form f(A)frac14 B0 R02As0 Here B0 frac14f(lsquo) is the asymptote which determines what might becalled lsquolsquocognitiversquorsquo or lsquolsquomemory-basedrsquorsquo integration theability to combine information from multiple frag-ments at an arbitrarily long SOA R0frac14 f(lsquo) f(0) is therange of uncertainty reduction spanned by f whichdescribes the benefit of presenting visual informationsimultaneously rather than with very large SOAFinally s0 is the time constant which describes theSOA at the uncertainty reduction functionrsquos half-height Given the exponential nature of this modelthere is some small amount of integration that persistsat arbitrarily long asynchronies decreasing asymptot-ically towards zero Along with the underlying uncer-tainty u this makes four total parameters to describe asubjectrsquos performance across all SOAs

The uncertainty reduction model fit the data quitewell root-mean-square errors (RMSE) for the model(0061 6 0004 and 0046 6 0002 for three-fragmentand four-fragment trials respectively) were not signif-icantly higher than the corresponding RMSE values forexponential fits to the raw data discussed above (00586 0004 and 0045 6 0003 respectively Wilcoxonrank sum test comparing the two models pfrac14 042 and061 respectively) even though the exponential fits tothe raw data have two more free parameters Giventhose extra parameters the corrected Akaike Infor-mation Criterion (Akaike 1974 Burnham amp Anderson2002) is lower for the uncertainty reduction model in 15out of 16 subjects on average the likelihood of theuncertainty reduction model is 374 6 051 times higherthan that of the exponential fits

We described the window of integration with theuncertainty reduction model exponentialrsquos half heightparameter s0 When we fit the model to the data fromindividual subjects we obtained a mean s0of 97 6 9 ms(Figure 3) In Figure 4 we also made use of the sameframework considering all subjects within the experi-ment together Rather than fitting to each subjectrsquosperformance separately we fit one uncertainty reduc-tion model to all subjectsrsquo data The values of the bestfit parameters for the uncertainty reduction model wereufrac14 043 6 002 B0frac14 099 6 001 R0frac14 015 6 002 ands0 frac14 109 ms R2 values between the data and thepredictions for the population uncertainty reductionmodel were 092 for three-fragment trials and 097 for

four-fragment trials This model succinctly and accu-rately described a subjectrsquos performance across allconditions based solely on the reduction of uncertaintyas a function of SOA The model used fewerparameters to explain a subjectrsquos performance than theset of exponential fits it could generalize acrossdifferent numbers of fragments and it suggested anunderlying psychological interpretation

We further considered a model based on signaldetection theory which assumes a probabilistic frame-work to describe each subjectrsquos performance in eachtrial Previous analyses of psychophysics data (Swets1961) have shown that high-threshold models like theuncertainty reduction model just described sometimesdo not capture subjectsrsquo behavior as well as modelsbased on signal detection theory (SDT) We thereforeconstructed a SDT model to test whether it would givedifferent results from the uncertainty reduction modelWe used a Markov chain Monte Carlo (MCMC)method for calculating d0 (a SDT measure of sensitivityto stimulus identity) with bias in four-alternative forcedchoice tasks (DeCarlo 2012) Models were fit usingJAGS (httpsourceforgenetprojectsmcmc-jags)with 10000 iterations of burn-in followed by 20000iterations to approximate d0 distributions Havingcalculated d0 for each subject each asynchrony andeach number of fragments we proceeded to fit a four-parameter model to each subjectrsquos data Let g be ascaled base-two exponential function with parameter s 00

and scale factor R 00 g(A)frac14R 002As 00 Then g describesthe increase in sensitivity brought about by having twofragments with an asynchrony of A The thirdparameter is v 00 the d0 value in single-fragment trials(constrained to lie within the 95 confidence intervalestimated from the MCMC fit) Finally we assumedthat adding additional fragments to a trial might notaid sensitivity in a perfectly efficient fashion (becausefragments contain correlated information) and wedescribed this inefficiency with a multiplicative factorc 00 that progressively reduced the contributions ofsuccessive fragments The model predicts d0 for nfragments with asynchrony A as follows

d0 frac14Xnifrac141

v 00 c 00i1 thorn ethi 1THORN g

A ethn ithorn 1THORN

eth4THORNWe fit the four parameters of this model using a

least-squares procedure and compared the time con-stant values (s 00) obtained with the SDT model againstthose obtained from the uncertainty reduction model(s0) The values of the best fits for the SDT model wereR 00 frac14 021 6 002 v 00 frac14 042 6 006 c 00 frac14 047 6 009and a cross-subject mean s 00 of 101 6 8 ms Given thatthe SDT results were similar to those of the uncertaintyreduction model and the higher computational re-

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 8

quirements for the SDT model we used only theuncertainty reduction model for the remainder of ouranalyses

As noted above performance differed slightlybetween categories To evaluate whether such inho-mogeneity contributed to the observed integration timeconstants we repeated the analyses after matchingperformance across categories by random subsamplingUnder these performance-matched conditions themean s0 was 95 6 9 ms and the population s0 was 105ms We also analyzed each category and each subjectseparately after matching performance The mean s0 forthe four categories were 125 6 23 ms 144 6 33 ms 896 21 ms and 131 6 26 ms respectively for animalspeople plants and vehicles Note that these fits werenoisier due to having fewer data While subjects coulduse different features to recognize images belonging todifferent categories the results were similar acrosscategories

Experiment 2

We extended the first experiment in two variantsaimed at evaluating two-fragment images and very longSOAs (Experiment 2A) and constant trial lengths andno fragment repetition (Experiment 2B) These exper-iment variants are described under Methods

In Experiment 2A the window of integration s0 was101 6 14 ms when two-fragment trials were included inthe fit and 91 6 13 ms when they were not included(Figure 5A) The other model parameters were ufrac14 0336 002 B0 frac14 0990 6 0003 and R0 frac14 0069 6 0006These values were not significantly different from eachother or from the results in Experiment 1 (Wilcoxonrank sum test pfrac14 055 comparing these two fits pfrac14099 and 068 respectively comparing Experiment 1 tothese results with and without two-fragment trials)Furthermore the model fit using only three- and four-fragment trials was a good fit for the two-fragment data(RMSE of 0091 6 0006 using only three- and four-fragment trials versus 0081 6 0004 using all trialsWilcoxon rank sum test pfrac14 020) This demonstratesthat the uncertainty reduction model was able togeneralize to fragment counts that were not used to fitthe model While performance at 2941 ms was slightlybetter than performance at 7059 ms (differing by 346 17) this difference was not significant (sign test pfrac14 016) RMSE and s0 values were unaffected byincluding the data from 7059 ms trials (Wilcoxon ranksum tests pfrac14 097 and 092 respectively) R2 valueswere 030 075 and 078 for two- three- and four-fragment trials respectively The two-fragment R2

value was low because the plot of performance againstasynchrony was almost flat The results of thisexperimental variant suggest that performance at long

SOAs in Experiment 1 was close to asymptote and thatthe uncertainty reduction model can extrapolate totwo-fragment trials

The overall performance in Experiment 2B (as wellas in the previous variant) was lower than inExperiment 1 (cf Figure 5 vs Figure 4) due to theinclusion of more difficult line drawings in the imagesets (Figure 5B) Considering only the trials thatincluded the same types of images across experiments(silhouette images) raised performance to levels com-parable to the ones in Experiment 1 without signif-icantly changing s0 R2 values for the uncertaintyreduction model were 064 and 081 for three- and four-fragment trials respectively The integration windowfrom the uncertainty reduction model in Experiment 2B

Figure 5 Mean performance across all subjects in Experiment 2

A Variant A (n frac14 17 subjects) which added a two-fragment

condition (circles) and data at 7059 ms SOA B Variant B (n frac1417 subjects) which controlled for influence of trial duration and

stimulus repetition (see Results and Methods) Conventions are

otherwise as in Figure 4

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 9

(s0 frac14 123 6 20 ms) was indistinguishable from thatobtained in Experiment 1 (Wilcoxon rank sum test pfrac14079) The other model parameters were ufrac14 030 6002 B0 frac14 0990 6 0003 and R0 frac14 0064 6 0009 Thesimilar s0 values suggest that the results of Experiment1 cannot be ascribed to differences in trial duration orslight familiarity with the images shown

Discussion

We measured the performance of 50 subjects indifferent variants of an experiment to evaluate theeffect of presenting object fragments with varyingdegrees of asynchrony By applying this externallyinduced temporal jitter we characterized the impor-tance of temporal precision for visual object recogni-tion

Subjects might in principle have made educatedguesses based on the content of individual fragmentsor combined such guesses independently across multi-ple fragments Instead we found that performance wassignificantly higher than predicted by independentprobability summation indicating that informationfrom different fragments was synergistically integratedRecognition performance decreased with increasingSOA and was well characterized by a simple modelbased on the reduction of uncertainty as a function ofthe temporal proximity of the presented image parts Inprinciple such a model could be realized by neuronswhose firing rate increases with the arrival of each newburst of information and then relaxes more slowlytowards baseline Neurons showing such firing rateincreases upon transient stimulus onset and decaydynamics over hundreds of milliseconds have beendescribed in recordings in the lateral intraparietalcortex during a motion discrimination task (Huk ampShadlen 2005)

While we focused on the uncertainty reductionmodel we also fit the data from Experiment 1 with amodel based on signal detection theory These twomodels reflect different paradigms of psychologicalprocessing In the former each additional piece ofinformation (provided by each successive fragment)multiplicatively reduces the subjectrsquos uncertainty aboutwhat was shown increasing the probability of giving acorrect answer In the latter each additional fragmentadds to the distance between the actual category andthe other categories in an implicit psychological spaceagain increasing the probability of discriminating thetrue category from the others There are importantdifferences between these paradigms (Swets 1961)though either could be realized via the integrativeneurons hypothesized above The uncertainty reductionmodel might be implemented by a set of downstream

neurons that act as thresholds on the evidenceaccumulated by the integrative neurons the firstdownstream neuron to cross threshold could report theidentity of the viewed image Support for one oranother image would manifest as increased firing ratesin integrative neurons feeding into the associateddownstream neurons increasing its chance of being thefirst to cross threshold The signal detection theorymight instead reflect a set of downstream neuronswhose activity reflected which integrative neuron wasmost active in a winner-take-all fashion The integra-tive neurons would then essentially be counting votesUltimately given that the two models yielded indis-tinguishable estimates of the window of temporalintegration it is difficult to draw any conclusions aboutwhich paradigm is more likely to be correct based onthe current data

The results demonstrated that asynchronies of a fewtens of milliseconds disrupted categorization perfor-mance This disruption is consistent with a body ofneurophysiological observations and computationalmodels of object recognition that are based on a rapidcascade of processing events along the ventral visualstream (DiCarlo et al 2012 Fukushima 1980 Hung etal 2005 Johnson amp Olshausen 2003 Kirchner ampThorpe 2006 Liu et al 2009 Potter amp Levy 1969Richmond et al 1990 Riesenhuber amp Poggio 2000Rolls 1991 Serre et al 2007 Thorpe et al 1996vanRullen amp Thorpe 2002) The response latenciesacross the ventral stream consistently increase by about10 to 20 ms at each stage from the retina to thethalamus to a cascade of cortical areas from primaryvisual cortex to inferior temporal cortex (Maunsell1987 Schmolesky et al 1998) The rapidity with whichvisual shape information progresses through the ventralvisual cortex and is converted into behavioral orperceptual output is consistent with the currentobservation that object recognition begins to sufferimpairment with even minor disruptions of stimulustiming

In addition to the impairment in shape recognitionthrough spatial integration due to temporal asynchro-ny timing judgments can be influenced by spatial cuesSpatial grouping cues that encourage the bindingtogether of separate parts can interfere with simulta-neity judgments at a time scale similar to that reportedhere (Cheadle et al 2008) Spatial judgments areinfluenced by temporal constraints and temporaljudgments are influenced by spatial constraints

While temporal asynchrony disrupted recognitionwe observed that performance was well above asymp-totic levels even at SOAs of approximately 100 msIntegration at these SOAs cannot be ascribed to trialduration eye movements remembering the stimulifrom previous trials or forgetting the fragments shownwithin a trial (Figure 5) Performance values at SOAs

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 10

of approximately 100 ms are consistent with theextended durations of responses in inferior temporalcortex (De Baene et al 2007) These extended responsedurations might instantiate a buffer allowing theintegration of visual shape information over time Theability to integrate visual information over brief spanscould underlie many critical visual functions Forexample percepts of camouflaged or partially occludedobjects could be assembled over time as the objectsmoved relative to their background or occluders(Nishida 2004) and information could be integratedacross saccades (Irwin Yantis amp Jonides 1983Jonides Irwin amp Yantis 1982 OrsquoRegan amp Levy-Schoen 1983)

The observed integration of form information over 100 ms is unlikely to reflect working memoryprocesses as the temporal scales involved in workingmemory span several seconds (Vogel Woodman ampLuck 2006) There is however sufficient time betweenstimulus presentation and the response period that the(integrated) image or category information could beheld in working memory The window of temporalintegration reported here has been observed in otherdomains and with other experimental paradigms(Caudek Domini amp Di Luca 2002 Forget Buiatti ampDehaene 2009 Nishida 2004) In particular perfor-mance that exceeds probability summation at shortSOAs is consistent with the temporal scales involved iniconic memory (Coltheart 1980 Di Lollo 1977Eriksen amp Collins 1968 Hogben amp Di Lollo 1974Sperling 1960) Prior work has shown that informationabout arrays of letters (Sperling 1960) or flashingsquares (Hogben amp Di Lollo 1974) remains accessiblefor a short time after presentation It is possible that theintegration we observed at short SOAs in the objectrecognition system arises from mechanisms similar tothose underlying the previously reported persistence ofsimple visual lsquolsquoiconsrsquorsquo though iconic memory has beenshown to be disrupted by masks (Di Lollo Clark ampHogben 1988) Visual persistence as instantiated bythe dynamics of retinal cells is generally terminated bya mask it is more likely that we are measuring neuralor informational persistence at a higher level of thevisual system (Coltheart 1980) While the dynamicnoise in the stimulus sequence could partly mask visualpersistence it certainly does not completely obliterateit Hence the integration constants reported hereconstitute an upper bound and it is conceivable thatvisual information could decay even more rapidly thanreported here

Two additional phenomena share similar character-istics with the dynamic rate of decay reported hereIntegration masking in which irrelevant informationpresented immediately before or after the presentationof a target inhibits perception of that target by addingnoise to it exhibits a similar time scale to that described

here and may reflect a similar process (Enns amp DiLollo 2000) Priming (in which a stimulus facilitatesthe perception or naming of a later stimulus) might alsoplay a part in these observations Priming would likelyincrease performance at longer SOA positive effects ofpriming typically are small or nonexistent at SOA closeto zero and increase with increasing SOA (La HeijDirkx amp Kramer 1990 Scharlau amp Neumann 2003)

A series of elegant studies has shown that detect-ability and perception can be influenced by the exacttime at which a stimulus is presented with respect toongoing endogenous oscillations (Rohenkohl amp Nobre2011 Van Dijk Schoffelen Oostenveld amp Jensen2008) In the context of the task presented here it isconceivable that the initial dynamic noise could resetsuch endogenous oscillations and that certain presen-tation times and SOAs could lead to enhancedrecognition by virtue of their specific phase The limitednumber and uneven sampling of SOAs precludes asystematic investigation of these effects here but it willbe interesting in future studies to examine thephysiological signals underlying recognition of asyn-chronously presented objects

Several studies have shown interactions betweenasynchronous parts of faces (Anaki et al 2007 Cheunget al 2011 Singer amp Sheinberg 2006) those findingsare better described in terms of higher level phenomenathan object recognition Complex images have alsobeen used in a study of subjective simultaneity (Loftusamp Hanna 1989) in which subjects were asked to ratehow complete or integrated images appeared to bewhen divided into two parts presented with varyingdurations and asynchronies Here we expand upon thisbody of previous work by showing that complex shapeinformation can be combined across both space andtime to enable object recognition Our results wereconsistent across a battery of controls designed toaccount for eye movements stimulus familiarity andmemory on longer time scales

Conclusions

Integration across image fragments begins to breakdown when as little as 30 ms separates the fragmentsand decays with a time constant of approximately 100ms The current characterization of the time coursewith which visual object recognition breaks down as afunction of temporal asynchrony provides informationcritical to building computer vision systems thatoperate beyond static snapshots to modeling visualobject recognition to designing experiments that probethe performance over time of the human visual systemand to the construction of theories of perception in adynamic world

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 11

Keywords object recognition temporal integrationfragmented images temporal sensitivity visual dynamics

Acknowledgments

The authors thank Neal Dach and Joanna Li forassistance with stimulus preparation and data collec-tion and John Maunsell Thomas Miconi and HanlinTang for their insightful comments on the manuscriptThis work was supported by NSF grants 0954570 andCCF-1231216 (GK)

Commercial relationships noneCorresponding author Jedediah Miller SingerEmail jedediahsingerchildrensharvardeduAddress Department of Ophthalmology BostonChildrenrsquos Hospital Harvard Medical School BostonMA USA

References

Akaike H A (1974) A new look at the statisticalmodel identification IEEE Transactions on Auto-matic Control 19 716ndash723

Anaki D Boyd J amp Moscovitch M (2007)Temporal integration in face perception Evidenceof configural processing of temporally separatedface parts Journal of Experimental PsychologyHuman Perception and Performance 33(1) 1ndash19doi 2007-01135-001 [pii] 1010370096-15233311

Aspell J E Wattam-Bell J amp Braddick O (2006)Interaction of spatial and temporal integration inglobal form processing Vision Research 46(18)2834ndash2841 doi S0042-6989(06)00118-0 [pii] 101016jvisres200602018

Bair W amp Movshon J A (2004) Adaptive temporalintegration of motion in direction-selective neuronsin macaque visual cortex Journal of Neuroscience24(33) 7305ndash7323 doi101523JNEUROSCI0554-042004 24339305 [pii]

Brainard D H (1997) The Psychophysics ToolboxSpatial Vision 10 433ndash436

Brockmole J R Wang R F amp Irwin D E (2002)Temporal integration between visual images andvisual percepts Journal of Experimental Psycholo-gy Human Perception and Performance 28(2) 315ndash334

Burnham K P amp Anderson D (2002) Modelselection and multi-model inference (2nd ed) NewYork Springer

Burr D C amp Santoro L (2001) Temporal integra-tion of optic flow measured by contrast andcoherence thresholds Vision Research 41(15)1891ndash1899 doi S0042-6989(01)00072-4 [pii]

Caudek C Domini F amp Di Luca M (2002) Short-term temporal recruitment in structure frommotion Vision Research 42 10

Cavanagh P Holcombe A O amp Chou W (2008)Mobile computation Spatiotemporal integration ofthe properties of objects in motion Journal ofVision 8(12)1 1ndash23 httpwwwjournalofvisionorgcontent8121 doi1011678121 [PubMed][Article]

Cheadle S Bauer F Parton A Muller H BonnehY S amp Usher M (2008) Spatial structure affectstemporal judgments Evidence for a synchronybinding code Journal of Vision 8(7)12 1ndash12httpjournalofvisionorgcontent8712 doi1011678712 [PubMed] [Article]

Cheung O S Richler J J Phillips W S ampGauthier I (2011) Does temporal integration offace parts reflect holistic processing PsychonomicBulletin and Review 18(3) 476ndash483 doi103758s13423-011-0051-7

Clifford C W Holcombe A O amp Pearson J (2004)Rapid global form binding with loss of associatedcolors Journal of Vision 4(12)8 1090ndash1101 httpwwwjournalofvisionorgcontent4128 doi1011674128 [PubMed] [Article]

Coltheart M (1980) Iconic memory and visiblepersistence Attention Perception amp Psychophysics27(3) 183ndash228 doi103758bf03204258

Connor C E Brincat S L amp Pasupathy A (2007)Transformation of shape information in the ventralpathway Current Opinion in Neurobiology 17(2)140ndash147

Cornelissen F W Peters E M amp Palmer J (2002)The Eyelink Toolbox Eye tracking with MATLABand the Psychophysics Toolbox Behavioral Re-search Methods Instruments and Computers 34 4

De Baene W Premereur E amp Vogels R (2007)Properties of shape tuning of macaque inferiortemporal neurons examined using rapid serialvisual presentation Journal of Neurophysiology97(4) 2900ndash2916 doi 007412006 [pii] 101152jn007412006

DeCarlo L T (2012) On a signal detection approachto m-alternative forced choice with bias withmaximum likelihood and bayesian approaches toestimation Journal of Mathematical Psychology56(3) 196ndash207

Deco G amp Rolls E T (2004) Computational

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 12

neuroscience of vision Oxford UK Oxford Uni-versity Press

Di Lollo V (1977) Temporal characteristics of iconicmemory Nature 267(5608) 241ndash243

Di Lollo V Clark C D amp Hogben J H (1988)Separating visible persistence from retinal afterim-ages Perception and Psychophysics 44(4) 363ndash368

DiCarlo J J Zoccolan D amp Rust N C (2012)How does the brain solve visual object recognitionNeuron 73(3) 415ndash434 doi S0896-6273(12)00092-X [pii] 101016jneuron201201010

Douglas R J amp Martin K A (2004) Neuronalcircuits of the neocortex Annual Review of Neuro-science 27 419ndash451

Enns J T amp Di Lollo V (2000) Whatrsquos new in visualmasking Trends in Cognitive Sciences 4(9) 345ndash352

Eriksen C W amp Collins J F (1968) Sensory tracesversus the psychological moment in the temporalorganization of form Journal of ExperimentalPsychology 77(3) 376ndash382

Felleman D J amp Van Essen D C (1991) Distributedhierarchical processing in the primate cerebralcortex Cerebral Cortex 1 1ndash47

Forget J Buiatti M amp Dehaene S (2009) Temporalintegration in visual word recognition Journal ofCognitive Neuroscience 22(5) 14

Fukushima K (1980) Neocognitron A self organizingneural network model for a mechanism of patternrecognition unaffected by shift in position Biolog-ical Cybernetics 36(4) 193ndash202

Hogben J H amp Di Lollo V (1974) Perceptualintegration and perceptual segregation of briefvisual stimuli Vision Research 14(11) 1059ndash1069

Huk A C amp Shadlen M N (2005) Neural activity inmacaque parietal cortex reflects temporal integra-tion of visual motion signals during perceptualdecision making The Journal of Neuroscience25(45) 16

Hung C P Kreiman G Poggio T amp DiCarlo J J(2005) Fast read-out of object identity frommacaque inferior temporal cortex Science 310863ndash866

Irwin D E Yantis S amp Jonides J (1983) Evidenceagainst visual integration across saccadic eyemovements Perception and Psychophysics 34(1)49ndash57

Johnson J S amp Olshausen B A (2003) Timecourseof neural signatures of object recognition Journalof Vision 3(7)4 499ndash512 httpwwwjournalofvisionorgcontent374 doi101167374 [PubMed] [Article]

Jonides J Irwin D E amp Yantis S (1982)Integrating visual information from successivefixations Science 215(4529) 192ndash194

Kirchner H amp Thorpe S J (2006) Ultra-rapid objectdetection with saccadic eye movements Visualprocessing speed revisited Vision Research 46(11)1762ndash1776 doi S0042-6989(05)00511-0 [pii] 101016jvisres200510002

La Heij W Dirkx J amp Kramer P (1990)Categorical interference and associative priming inpicture naming British Journal of Psychology81(4) 511ndash525

Liu H Agam Y Madsen J R amp Kreiman G(2009) Timing timing timing Fast decoding ofobject information from intracranial field poten-tials in human visual cortex Neuron 62(2) 281ndash290

Loftus G R amp Hanna A M (1989) The phenom-enology of spatial integration Data and modelsCognitive Psychology 21(3) 363ndash397 doi0010-0285(89)90013-3 [pii]

Logothetis N K amp Sheinberg D L (1996) Visualobject recognition Annual Review of Neuroscience19 577ndash621

Maunsell J H R (1987) Physiological evidence fortwo visual subsystems In L Vaina (Ed) Mattersof intelligence (pp 59ndash87) Dordrecht The Neth-erlands Reidel Press

Neri P Morrone M C amp Burr D C (1998) Seeingbiological motion Nature 395(6705) 894ndash896 doi10103827661

Nishida S (2004) Motion-based analysis of spatialpatterns by the human visual system CurrentBiology 14 830ndash839

OrsquoRegan J K amp Levy-Schoen A (1983) Integratingvisual information from successive fixations Doestrans-saccadic fusion exist Vision Research 23(8)765ndash768

Pasupathy A amp Connor C E (1999) Responses tocontour features in macaque area V4 Journal ofNeurophysiology 82(5) 2490ndash2502

Pelli D G (1997) The VideoToolbox software forvisual psychophysics Transforming numbers intomovies Spatial Vision 10 437ndash442

Potter M C amp Levy E (1969) Recognition memoryfor a rapid sequence of pictures Journal ofExperimental Psychology 81(1) 10ndash15

Richmond B J Optican L M amp Spitzer H (1990)Temporal encoding of two-dimensional patterns bysingle units in primate primary visual cortex IStimulus-response relations Journal of Neurophys-iology 64(2) 351ndash369

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 13

Riesenhuber M amp Poggio T (2000) Models of objectrecognition Nature Neuroscience 3(Suppl) 1199ndash1204

Ringach D L Hawken M J amp Shapley R (2003)Dynamics of orientation tuning in macaque V1The role of global and tuned suppression Journalof Neurophysiology 90(1) 342ndash352 doi101152jn010182002 010182002 [pii]

Rohenkohl G amp Nobre A C (2011) Alphaoscillations related to anticipatory attention followtemporal expectations The Journal of Neurosci-ence 31(40) 14076ndash14084

Rolls E T (1991) Neural organization of highervisual functions Current Opinion in Neurobiology1 274ndash278

Sanocki T (2001) Interaction of scale and time duringobject identification Journal of Experimental Psy-chology Human Perception and Performance 27(2)290ndash302

Scharlau I amp Neumann O (2003) Temporalparameters and time course of perceptual latencypriming Acta Psychologica 113(2) 185ndash203

Schmolesky M T Wang Y C Hanes D PThompson K G Leutgeb S Schall J D Leventhal A G (1998) Signal timing across themacaque visual system Journal of Neurophysiology79(6) 3272ndash3278

Schyns P G Petro L S amp Smith M L (2007)Dynamics of visual information integration in thebrain for categorizing facial expressions CurrentBiology 17(18) 1580ndash1585 doi S0960-9822(07)01904-5 [pii] 101016jcub200708048

Serre T Kreiman G Kouh M Cadieu CKnoblich U amp Poggio T (2007) A quantitativetheory of immediate visual recognition Progress inBrain Research 165C 33ndash56

Singer J M amp Sheinberg D L (2006) Holisticprocessing unites face parts across time VisionResearch 46(11) 1838ndash1847 doi S0042-6989(05)00563-8 [pii] 101016jvisres200511005

Sperling G (1960) The information available in briefvisual presentations Washington DC AmericanPsychological Association

Swets J A (1961) Is there a sensory thresholdScience 134(3473) 168ndash177

Tanaka K (1996) Inferotemporal cortex and objectvision Annual Review of Neuroscience 19 109ndash139

Thorpe S Fize D amp Marlot C (1996) Speed ofprocessing in the human visual system Nature 381520ndash522

Van Dijk H Schoffelen J M Oostenveld R ampJensen O (2008) Prestimulus oscillatory activity inthe alpha band predicts visual discriminationability The Journal of Neuroscience 28(8) 1816ndash1823

vanRullen R amp Thorpe S J (2002) Surfing a spikewave down the ventral stream Vision Research 422593ndash2615

Vogel E K Woodman G F amp Luck S J (2006)The time course of consolidation in visual workingmemory Journal of Experimental PsychologyHuman Perception and Performance 32(6) 1436ndash1451 doi1010370096-15233261436

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 14

  • Introduction
  • Methods
  • f01
  • f02
  • e01
  • e02
  • Results
  • f03
  • e03
  • f04
  • e04
  • f05
  • Discussion
  • Conclusions
  • Akaike1
  • Anaki1
  • Aspell1
  • Bair1
  • Brainard1
  • Brockmole1
  • Burnham1
  • Burr1
  • Caudek1
  • Cavanagh1
  • Cheadle1
  • Cheung1
  • Clifford1
  • Coltheart1
  • Connor1
  • Cornelissen1
  • DeBaene1
  • DeCarlo1
  • Deco1
  • DiLollo1
  • DiLollo2
  • DiCarlo2
  • Douglas1
  • Enns1
  • Eriksen1
  • Felleman1
  • Forget1
  • Fukushima1
  • Hogben1
  • Huk1
  • Hung1
  • Irwin1
  • Johnson1
  • Jonides1
  • Kirchner1
  • LaHeij1
  • Liu1
  • Loftus1
  • Logothetis1
  • Maunsell1
  • Neri1
  • Nishida1
  • ORegan1
  • Pasupathy1
  • Pelli1
  • Potter1
  • Richmond1
  • Riesenhuber1
  • Ringach1
  • Rohenkohl1
  • Rolls1
  • Sanocki1
  • Scharlau1
  • Schmolesky1
  • Schyns1
  • Serre1
  • Singer1
  • Sperling1
  • Swets1
  • Tanaka1
  • Thorpe1
  • VanDijk1
  • vanRullen1
  • Vogel1
Page 8: Short temporal asynchrony disrupts visual object recognition · Short temporal asynchrony disrupts visual object recognition Jedediah M. Singer $ Department of Ophthalmology, Boston

rates the fact that different fragments from a singleimage are not independent but that this dependency isfactored out in the consideration of pairs of fragmentsvia f(A)

For each subject we used their empirical perfor-mance to calculate the empirical uncertainty 1ndashi at eachSOA and each number of fragments We then foundthe least-squares best fits for the uncertainty reductionfunction f(A) and the underlying uncertainty u that wasmodified by f(A) We considered for f(A) exponentialfunctions of the form f(A)frac14 B0 R02As0 Here B0 frac14f(lsquo) is the asymptote which determines what might becalled lsquolsquocognitiversquorsquo or lsquolsquomemory-basedrsquorsquo integration theability to combine information from multiple frag-ments at an arbitrarily long SOA R0frac14 f(lsquo) f(0) is therange of uncertainty reduction spanned by f whichdescribes the benefit of presenting visual informationsimultaneously rather than with very large SOAFinally s0 is the time constant which describes theSOA at the uncertainty reduction functionrsquos half-height Given the exponential nature of this modelthere is some small amount of integration that persistsat arbitrarily long asynchronies decreasing asymptot-ically towards zero Along with the underlying uncer-tainty u this makes four total parameters to describe asubjectrsquos performance across all SOAs

The uncertainty reduction model fit the data quitewell root-mean-square errors (RMSE) for the model(0061 6 0004 and 0046 6 0002 for three-fragmentand four-fragment trials respectively) were not signif-icantly higher than the corresponding RMSE values forexponential fits to the raw data discussed above (00586 0004 and 0045 6 0003 respectively Wilcoxonrank sum test comparing the two models pfrac14 042 and061 respectively) even though the exponential fits tothe raw data have two more free parameters Giventhose extra parameters the corrected Akaike Infor-mation Criterion (Akaike 1974 Burnham amp Anderson2002) is lower for the uncertainty reduction model in 15out of 16 subjects on average the likelihood of theuncertainty reduction model is 374 6 051 times higherthan that of the exponential fits

We described the window of integration with theuncertainty reduction model exponentialrsquos half heightparameter s0 When we fit the model to the data fromindividual subjects we obtained a mean s0of 97 6 9 ms(Figure 3) In Figure 4 we also made use of the sameframework considering all subjects within the experi-ment together Rather than fitting to each subjectrsquosperformance separately we fit one uncertainty reduc-tion model to all subjectsrsquo data The values of the bestfit parameters for the uncertainty reduction model wereufrac14 043 6 002 B0frac14 099 6 001 R0frac14 015 6 002 ands0 frac14 109 ms R2 values between the data and thepredictions for the population uncertainty reductionmodel were 092 for three-fragment trials and 097 for

four-fragment trials This model succinctly and accu-rately described a subjectrsquos performance across allconditions based solely on the reduction of uncertaintyas a function of SOA The model used fewerparameters to explain a subjectrsquos performance than theset of exponential fits it could generalize acrossdifferent numbers of fragments and it suggested anunderlying psychological interpretation

We further considered a model based on signaldetection theory which assumes a probabilistic frame-work to describe each subjectrsquos performance in eachtrial Previous analyses of psychophysics data (Swets1961) have shown that high-threshold models like theuncertainty reduction model just described sometimesdo not capture subjectsrsquo behavior as well as modelsbased on signal detection theory (SDT) We thereforeconstructed a SDT model to test whether it would givedifferent results from the uncertainty reduction modelWe used a Markov chain Monte Carlo (MCMC)method for calculating d0 (a SDT measure of sensitivityto stimulus identity) with bias in four-alternative forcedchoice tasks (DeCarlo 2012) Models were fit usingJAGS (httpsourceforgenetprojectsmcmc-jags)with 10000 iterations of burn-in followed by 20000iterations to approximate d0 distributions Havingcalculated d0 for each subject each asynchrony andeach number of fragments we proceeded to fit a four-parameter model to each subjectrsquos data Let g be ascaled base-two exponential function with parameter s 00

and scale factor R 00 g(A)frac14R 002As 00 Then g describesthe increase in sensitivity brought about by having twofragments with an asynchrony of A The thirdparameter is v 00 the d0 value in single-fragment trials(constrained to lie within the 95 confidence intervalestimated from the MCMC fit) Finally we assumedthat adding additional fragments to a trial might notaid sensitivity in a perfectly efficient fashion (becausefragments contain correlated information) and wedescribed this inefficiency with a multiplicative factorc 00 that progressively reduced the contributions ofsuccessive fragments The model predicts d0 for nfragments with asynchrony A as follows

d0 frac14Xnifrac141

v 00 c 00i1 thorn ethi 1THORN g

A ethn ithorn 1THORN

eth4THORNWe fit the four parameters of this model using a

least-squares procedure and compared the time con-stant values (s 00) obtained with the SDT model againstthose obtained from the uncertainty reduction model(s0) The values of the best fits for the SDT model wereR 00 frac14 021 6 002 v 00 frac14 042 6 006 c 00 frac14 047 6 009and a cross-subject mean s 00 of 101 6 8 ms Given thatthe SDT results were similar to those of the uncertaintyreduction model and the higher computational re-

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 8

quirements for the SDT model we used only theuncertainty reduction model for the remainder of ouranalyses

As noted above performance differed slightlybetween categories To evaluate whether such inho-mogeneity contributed to the observed integration timeconstants we repeated the analyses after matchingperformance across categories by random subsamplingUnder these performance-matched conditions themean s0 was 95 6 9 ms and the population s0 was 105ms We also analyzed each category and each subjectseparately after matching performance The mean s0 forthe four categories were 125 6 23 ms 144 6 33 ms 896 21 ms and 131 6 26 ms respectively for animalspeople plants and vehicles Note that these fits werenoisier due to having fewer data While subjects coulduse different features to recognize images belonging todifferent categories the results were similar acrosscategories

Experiment 2

We extended the first experiment in two variantsaimed at evaluating two-fragment images and very longSOAs (Experiment 2A) and constant trial lengths andno fragment repetition (Experiment 2B) These exper-iment variants are described under Methods

In Experiment 2A the window of integration s0 was101 6 14 ms when two-fragment trials were included inthe fit and 91 6 13 ms when they were not included(Figure 5A) The other model parameters were ufrac14 0336 002 B0 frac14 0990 6 0003 and R0 frac14 0069 6 0006These values were not significantly different from eachother or from the results in Experiment 1 (Wilcoxonrank sum test pfrac14 055 comparing these two fits pfrac14099 and 068 respectively comparing Experiment 1 tothese results with and without two-fragment trials)Furthermore the model fit using only three- and four-fragment trials was a good fit for the two-fragment data(RMSE of 0091 6 0006 using only three- and four-fragment trials versus 0081 6 0004 using all trialsWilcoxon rank sum test pfrac14 020) This demonstratesthat the uncertainty reduction model was able togeneralize to fragment counts that were not used to fitthe model While performance at 2941 ms was slightlybetter than performance at 7059 ms (differing by 346 17) this difference was not significant (sign test pfrac14 016) RMSE and s0 values were unaffected byincluding the data from 7059 ms trials (Wilcoxon ranksum tests pfrac14 097 and 092 respectively) R2 valueswere 030 075 and 078 for two- three- and four-fragment trials respectively The two-fragment R2

value was low because the plot of performance againstasynchrony was almost flat The results of thisexperimental variant suggest that performance at long

SOAs in Experiment 1 was close to asymptote and thatthe uncertainty reduction model can extrapolate totwo-fragment trials

The overall performance in Experiment 2B (as wellas in the previous variant) was lower than inExperiment 1 (cf Figure 5 vs Figure 4) due to theinclusion of more difficult line drawings in the imagesets (Figure 5B) Considering only the trials thatincluded the same types of images across experiments(silhouette images) raised performance to levels com-parable to the ones in Experiment 1 without signif-icantly changing s0 R2 values for the uncertaintyreduction model were 064 and 081 for three- and four-fragment trials respectively The integration windowfrom the uncertainty reduction model in Experiment 2B

Figure 5 Mean performance across all subjects in Experiment 2

A Variant A (n frac14 17 subjects) which added a two-fragment

condition (circles) and data at 7059 ms SOA B Variant B (n frac1417 subjects) which controlled for influence of trial duration and

stimulus repetition (see Results and Methods) Conventions are

otherwise as in Figure 4

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 9

(s0 frac14 123 6 20 ms) was indistinguishable from thatobtained in Experiment 1 (Wilcoxon rank sum test pfrac14079) The other model parameters were ufrac14 030 6002 B0 frac14 0990 6 0003 and R0 frac14 0064 6 0009 Thesimilar s0 values suggest that the results of Experiment1 cannot be ascribed to differences in trial duration orslight familiarity with the images shown

Discussion

We measured the performance of 50 subjects indifferent variants of an experiment to evaluate theeffect of presenting object fragments with varyingdegrees of asynchrony By applying this externallyinduced temporal jitter we characterized the impor-tance of temporal precision for visual object recogni-tion

Subjects might in principle have made educatedguesses based on the content of individual fragmentsor combined such guesses independently across multi-ple fragments Instead we found that performance wassignificantly higher than predicted by independentprobability summation indicating that informationfrom different fragments was synergistically integratedRecognition performance decreased with increasingSOA and was well characterized by a simple modelbased on the reduction of uncertainty as a function ofthe temporal proximity of the presented image parts Inprinciple such a model could be realized by neuronswhose firing rate increases with the arrival of each newburst of information and then relaxes more slowlytowards baseline Neurons showing such firing rateincreases upon transient stimulus onset and decaydynamics over hundreds of milliseconds have beendescribed in recordings in the lateral intraparietalcortex during a motion discrimination task (Huk ampShadlen 2005)

While we focused on the uncertainty reductionmodel we also fit the data from Experiment 1 with amodel based on signal detection theory These twomodels reflect different paradigms of psychologicalprocessing In the former each additional piece ofinformation (provided by each successive fragment)multiplicatively reduces the subjectrsquos uncertainty aboutwhat was shown increasing the probability of giving acorrect answer In the latter each additional fragmentadds to the distance between the actual category andthe other categories in an implicit psychological spaceagain increasing the probability of discriminating thetrue category from the others There are importantdifferences between these paradigms (Swets 1961)though either could be realized via the integrativeneurons hypothesized above The uncertainty reductionmodel might be implemented by a set of downstream

neurons that act as thresholds on the evidenceaccumulated by the integrative neurons the firstdownstream neuron to cross threshold could report theidentity of the viewed image Support for one oranother image would manifest as increased firing ratesin integrative neurons feeding into the associateddownstream neurons increasing its chance of being thefirst to cross threshold The signal detection theorymight instead reflect a set of downstream neuronswhose activity reflected which integrative neuron wasmost active in a winner-take-all fashion The integra-tive neurons would then essentially be counting votesUltimately given that the two models yielded indis-tinguishable estimates of the window of temporalintegration it is difficult to draw any conclusions aboutwhich paradigm is more likely to be correct based onthe current data

The results demonstrated that asynchronies of a fewtens of milliseconds disrupted categorization perfor-mance This disruption is consistent with a body ofneurophysiological observations and computationalmodels of object recognition that are based on a rapidcascade of processing events along the ventral visualstream (DiCarlo et al 2012 Fukushima 1980 Hung etal 2005 Johnson amp Olshausen 2003 Kirchner ampThorpe 2006 Liu et al 2009 Potter amp Levy 1969Richmond et al 1990 Riesenhuber amp Poggio 2000Rolls 1991 Serre et al 2007 Thorpe et al 1996vanRullen amp Thorpe 2002) The response latenciesacross the ventral stream consistently increase by about10 to 20 ms at each stage from the retina to thethalamus to a cascade of cortical areas from primaryvisual cortex to inferior temporal cortex (Maunsell1987 Schmolesky et al 1998) The rapidity with whichvisual shape information progresses through the ventralvisual cortex and is converted into behavioral orperceptual output is consistent with the currentobservation that object recognition begins to sufferimpairment with even minor disruptions of stimulustiming

In addition to the impairment in shape recognitionthrough spatial integration due to temporal asynchro-ny timing judgments can be influenced by spatial cuesSpatial grouping cues that encourage the bindingtogether of separate parts can interfere with simulta-neity judgments at a time scale similar to that reportedhere (Cheadle et al 2008) Spatial judgments areinfluenced by temporal constraints and temporaljudgments are influenced by spatial constraints

While temporal asynchrony disrupted recognitionwe observed that performance was well above asymp-totic levels even at SOAs of approximately 100 msIntegration at these SOAs cannot be ascribed to trialduration eye movements remembering the stimulifrom previous trials or forgetting the fragments shownwithin a trial (Figure 5) Performance values at SOAs

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 10

of approximately 100 ms are consistent with theextended durations of responses in inferior temporalcortex (De Baene et al 2007) These extended responsedurations might instantiate a buffer allowing theintegration of visual shape information over time Theability to integrate visual information over brief spanscould underlie many critical visual functions Forexample percepts of camouflaged or partially occludedobjects could be assembled over time as the objectsmoved relative to their background or occluders(Nishida 2004) and information could be integratedacross saccades (Irwin Yantis amp Jonides 1983Jonides Irwin amp Yantis 1982 OrsquoRegan amp Levy-Schoen 1983)

The observed integration of form information over 100 ms is unlikely to reflect working memoryprocesses as the temporal scales involved in workingmemory span several seconds (Vogel Woodman ampLuck 2006) There is however sufficient time betweenstimulus presentation and the response period that the(integrated) image or category information could beheld in working memory The window of temporalintegration reported here has been observed in otherdomains and with other experimental paradigms(Caudek Domini amp Di Luca 2002 Forget Buiatti ampDehaene 2009 Nishida 2004) In particular perfor-mance that exceeds probability summation at shortSOAs is consistent with the temporal scales involved iniconic memory (Coltheart 1980 Di Lollo 1977Eriksen amp Collins 1968 Hogben amp Di Lollo 1974Sperling 1960) Prior work has shown that informationabout arrays of letters (Sperling 1960) or flashingsquares (Hogben amp Di Lollo 1974) remains accessiblefor a short time after presentation It is possible that theintegration we observed at short SOAs in the objectrecognition system arises from mechanisms similar tothose underlying the previously reported persistence ofsimple visual lsquolsquoiconsrsquorsquo though iconic memory has beenshown to be disrupted by masks (Di Lollo Clark ampHogben 1988) Visual persistence as instantiated bythe dynamics of retinal cells is generally terminated bya mask it is more likely that we are measuring neuralor informational persistence at a higher level of thevisual system (Coltheart 1980) While the dynamicnoise in the stimulus sequence could partly mask visualpersistence it certainly does not completely obliterateit Hence the integration constants reported hereconstitute an upper bound and it is conceivable thatvisual information could decay even more rapidly thanreported here

Two additional phenomena share similar character-istics with the dynamic rate of decay reported hereIntegration masking in which irrelevant informationpresented immediately before or after the presentationof a target inhibits perception of that target by addingnoise to it exhibits a similar time scale to that described

here and may reflect a similar process (Enns amp DiLollo 2000) Priming (in which a stimulus facilitatesthe perception or naming of a later stimulus) might alsoplay a part in these observations Priming would likelyincrease performance at longer SOA positive effects ofpriming typically are small or nonexistent at SOA closeto zero and increase with increasing SOA (La HeijDirkx amp Kramer 1990 Scharlau amp Neumann 2003)

A series of elegant studies has shown that detect-ability and perception can be influenced by the exacttime at which a stimulus is presented with respect toongoing endogenous oscillations (Rohenkohl amp Nobre2011 Van Dijk Schoffelen Oostenveld amp Jensen2008) In the context of the task presented here it isconceivable that the initial dynamic noise could resetsuch endogenous oscillations and that certain presen-tation times and SOAs could lead to enhancedrecognition by virtue of their specific phase The limitednumber and uneven sampling of SOAs precludes asystematic investigation of these effects here but it willbe interesting in future studies to examine thephysiological signals underlying recognition of asyn-chronously presented objects

Several studies have shown interactions betweenasynchronous parts of faces (Anaki et al 2007 Cheunget al 2011 Singer amp Sheinberg 2006) those findingsare better described in terms of higher level phenomenathan object recognition Complex images have alsobeen used in a study of subjective simultaneity (Loftusamp Hanna 1989) in which subjects were asked to ratehow complete or integrated images appeared to bewhen divided into two parts presented with varyingdurations and asynchronies Here we expand upon thisbody of previous work by showing that complex shapeinformation can be combined across both space andtime to enable object recognition Our results wereconsistent across a battery of controls designed toaccount for eye movements stimulus familiarity andmemory on longer time scales

Conclusions

Integration across image fragments begins to breakdown when as little as 30 ms separates the fragmentsand decays with a time constant of approximately 100ms The current characterization of the time coursewith which visual object recognition breaks down as afunction of temporal asynchrony provides informationcritical to building computer vision systems thatoperate beyond static snapshots to modeling visualobject recognition to designing experiments that probethe performance over time of the human visual systemand to the construction of theories of perception in adynamic world

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 11

Keywords object recognition temporal integrationfragmented images temporal sensitivity visual dynamics

Acknowledgments

The authors thank Neal Dach and Joanna Li forassistance with stimulus preparation and data collec-tion and John Maunsell Thomas Miconi and HanlinTang for their insightful comments on the manuscriptThis work was supported by NSF grants 0954570 andCCF-1231216 (GK)

Commercial relationships noneCorresponding author Jedediah Miller SingerEmail jedediahsingerchildrensharvardeduAddress Department of Ophthalmology BostonChildrenrsquos Hospital Harvard Medical School BostonMA USA

References

Akaike H A (1974) A new look at the statisticalmodel identification IEEE Transactions on Auto-matic Control 19 716ndash723

Anaki D Boyd J amp Moscovitch M (2007)Temporal integration in face perception Evidenceof configural processing of temporally separatedface parts Journal of Experimental PsychologyHuman Perception and Performance 33(1) 1ndash19doi 2007-01135-001 [pii] 1010370096-15233311

Aspell J E Wattam-Bell J amp Braddick O (2006)Interaction of spatial and temporal integration inglobal form processing Vision Research 46(18)2834ndash2841 doi S0042-6989(06)00118-0 [pii] 101016jvisres200602018

Bair W amp Movshon J A (2004) Adaptive temporalintegration of motion in direction-selective neuronsin macaque visual cortex Journal of Neuroscience24(33) 7305ndash7323 doi101523JNEUROSCI0554-042004 24339305 [pii]

Brainard D H (1997) The Psychophysics ToolboxSpatial Vision 10 433ndash436

Brockmole J R Wang R F amp Irwin D E (2002)Temporal integration between visual images andvisual percepts Journal of Experimental Psycholo-gy Human Perception and Performance 28(2) 315ndash334

Burnham K P amp Anderson D (2002) Modelselection and multi-model inference (2nd ed) NewYork Springer

Burr D C amp Santoro L (2001) Temporal integra-tion of optic flow measured by contrast andcoherence thresholds Vision Research 41(15)1891ndash1899 doi S0042-6989(01)00072-4 [pii]

Caudek C Domini F amp Di Luca M (2002) Short-term temporal recruitment in structure frommotion Vision Research 42 10

Cavanagh P Holcombe A O amp Chou W (2008)Mobile computation Spatiotemporal integration ofthe properties of objects in motion Journal ofVision 8(12)1 1ndash23 httpwwwjournalofvisionorgcontent8121 doi1011678121 [PubMed][Article]

Cheadle S Bauer F Parton A Muller H BonnehY S amp Usher M (2008) Spatial structure affectstemporal judgments Evidence for a synchronybinding code Journal of Vision 8(7)12 1ndash12httpjournalofvisionorgcontent8712 doi1011678712 [PubMed] [Article]

Cheung O S Richler J J Phillips W S ampGauthier I (2011) Does temporal integration offace parts reflect holistic processing PsychonomicBulletin and Review 18(3) 476ndash483 doi103758s13423-011-0051-7

Clifford C W Holcombe A O amp Pearson J (2004)Rapid global form binding with loss of associatedcolors Journal of Vision 4(12)8 1090ndash1101 httpwwwjournalofvisionorgcontent4128 doi1011674128 [PubMed] [Article]

Coltheart M (1980) Iconic memory and visiblepersistence Attention Perception amp Psychophysics27(3) 183ndash228 doi103758bf03204258

Connor C E Brincat S L amp Pasupathy A (2007)Transformation of shape information in the ventralpathway Current Opinion in Neurobiology 17(2)140ndash147

Cornelissen F W Peters E M amp Palmer J (2002)The Eyelink Toolbox Eye tracking with MATLABand the Psychophysics Toolbox Behavioral Re-search Methods Instruments and Computers 34 4

De Baene W Premereur E amp Vogels R (2007)Properties of shape tuning of macaque inferiortemporal neurons examined using rapid serialvisual presentation Journal of Neurophysiology97(4) 2900ndash2916 doi 007412006 [pii] 101152jn007412006

DeCarlo L T (2012) On a signal detection approachto m-alternative forced choice with bias withmaximum likelihood and bayesian approaches toestimation Journal of Mathematical Psychology56(3) 196ndash207

Deco G amp Rolls E T (2004) Computational

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 12

neuroscience of vision Oxford UK Oxford Uni-versity Press

Di Lollo V (1977) Temporal characteristics of iconicmemory Nature 267(5608) 241ndash243

Di Lollo V Clark C D amp Hogben J H (1988)Separating visible persistence from retinal afterim-ages Perception and Psychophysics 44(4) 363ndash368

DiCarlo J J Zoccolan D amp Rust N C (2012)How does the brain solve visual object recognitionNeuron 73(3) 415ndash434 doi S0896-6273(12)00092-X [pii] 101016jneuron201201010

Douglas R J amp Martin K A (2004) Neuronalcircuits of the neocortex Annual Review of Neuro-science 27 419ndash451

Enns J T amp Di Lollo V (2000) Whatrsquos new in visualmasking Trends in Cognitive Sciences 4(9) 345ndash352

Eriksen C W amp Collins J F (1968) Sensory tracesversus the psychological moment in the temporalorganization of form Journal of ExperimentalPsychology 77(3) 376ndash382

Felleman D J amp Van Essen D C (1991) Distributedhierarchical processing in the primate cerebralcortex Cerebral Cortex 1 1ndash47

Forget J Buiatti M amp Dehaene S (2009) Temporalintegration in visual word recognition Journal ofCognitive Neuroscience 22(5) 14

Fukushima K (1980) Neocognitron A self organizingneural network model for a mechanism of patternrecognition unaffected by shift in position Biolog-ical Cybernetics 36(4) 193ndash202

Hogben J H amp Di Lollo V (1974) Perceptualintegration and perceptual segregation of briefvisual stimuli Vision Research 14(11) 1059ndash1069

Huk A C amp Shadlen M N (2005) Neural activity inmacaque parietal cortex reflects temporal integra-tion of visual motion signals during perceptualdecision making The Journal of Neuroscience25(45) 16

Hung C P Kreiman G Poggio T amp DiCarlo J J(2005) Fast read-out of object identity frommacaque inferior temporal cortex Science 310863ndash866

Irwin D E Yantis S amp Jonides J (1983) Evidenceagainst visual integration across saccadic eyemovements Perception and Psychophysics 34(1)49ndash57

Johnson J S amp Olshausen B A (2003) Timecourseof neural signatures of object recognition Journalof Vision 3(7)4 499ndash512 httpwwwjournalofvisionorgcontent374 doi101167374 [PubMed] [Article]

Jonides J Irwin D E amp Yantis S (1982)Integrating visual information from successivefixations Science 215(4529) 192ndash194

Kirchner H amp Thorpe S J (2006) Ultra-rapid objectdetection with saccadic eye movements Visualprocessing speed revisited Vision Research 46(11)1762ndash1776 doi S0042-6989(05)00511-0 [pii] 101016jvisres200510002

La Heij W Dirkx J amp Kramer P (1990)Categorical interference and associative priming inpicture naming British Journal of Psychology81(4) 511ndash525

Liu H Agam Y Madsen J R amp Kreiman G(2009) Timing timing timing Fast decoding ofobject information from intracranial field poten-tials in human visual cortex Neuron 62(2) 281ndash290

Loftus G R amp Hanna A M (1989) The phenom-enology of spatial integration Data and modelsCognitive Psychology 21(3) 363ndash397 doi0010-0285(89)90013-3 [pii]

Logothetis N K amp Sheinberg D L (1996) Visualobject recognition Annual Review of Neuroscience19 577ndash621

Maunsell J H R (1987) Physiological evidence fortwo visual subsystems In L Vaina (Ed) Mattersof intelligence (pp 59ndash87) Dordrecht The Neth-erlands Reidel Press

Neri P Morrone M C amp Burr D C (1998) Seeingbiological motion Nature 395(6705) 894ndash896 doi10103827661

Nishida S (2004) Motion-based analysis of spatialpatterns by the human visual system CurrentBiology 14 830ndash839

OrsquoRegan J K amp Levy-Schoen A (1983) Integratingvisual information from successive fixations Doestrans-saccadic fusion exist Vision Research 23(8)765ndash768

Pasupathy A amp Connor C E (1999) Responses tocontour features in macaque area V4 Journal ofNeurophysiology 82(5) 2490ndash2502

Pelli D G (1997) The VideoToolbox software forvisual psychophysics Transforming numbers intomovies Spatial Vision 10 437ndash442

Potter M C amp Levy E (1969) Recognition memoryfor a rapid sequence of pictures Journal ofExperimental Psychology 81(1) 10ndash15

Richmond B J Optican L M amp Spitzer H (1990)Temporal encoding of two-dimensional patterns bysingle units in primate primary visual cortex IStimulus-response relations Journal of Neurophys-iology 64(2) 351ndash369

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 13

Riesenhuber M amp Poggio T (2000) Models of objectrecognition Nature Neuroscience 3(Suppl) 1199ndash1204

Ringach D L Hawken M J amp Shapley R (2003)Dynamics of orientation tuning in macaque V1The role of global and tuned suppression Journalof Neurophysiology 90(1) 342ndash352 doi101152jn010182002 010182002 [pii]

Rohenkohl G amp Nobre A C (2011) Alphaoscillations related to anticipatory attention followtemporal expectations The Journal of Neurosci-ence 31(40) 14076ndash14084

Rolls E T (1991) Neural organization of highervisual functions Current Opinion in Neurobiology1 274ndash278

Sanocki T (2001) Interaction of scale and time duringobject identification Journal of Experimental Psy-chology Human Perception and Performance 27(2)290ndash302

Scharlau I amp Neumann O (2003) Temporalparameters and time course of perceptual latencypriming Acta Psychologica 113(2) 185ndash203

Schmolesky M T Wang Y C Hanes D PThompson K G Leutgeb S Schall J D Leventhal A G (1998) Signal timing across themacaque visual system Journal of Neurophysiology79(6) 3272ndash3278

Schyns P G Petro L S amp Smith M L (2007)Dynamics of visual information integration in thebrain for categorizing facial expressions CurrentBiology 17(18) 1580ndash1585 doi S0960-9822(07)01904-5 [pii] 101016jcub200708048

Serre T Kreiman G Kouh M Cadieu CKnoblich U amp Poggio T (2007) A quantitativetheory of immediate visual recognition Progress inBrain Research 165C 33ndash56

Singer J M amp Sheinberg D L (2006) Holisticprocessing unites face parts across time VisionResearch 46(11) 1838ndash1847 doi S0042-6989(05)00563-8 [pii] 101016jvisres200511005

Sperling G (1960) The information available in briefvisual presentations Washington DC AmericanPsychological Association

Swets J A (1961) Is there a sensory thresholdScience 134(3473) 168ndash177

Tanaka K (1996) Inferotemporal cortex and objectvision Annual Review of Neuroscience 19 109ndash139

Thorpe S Fize D amp Marlot C (1996) Speed ofprocessing in the human visual system Nature 381520ndash522

Van Dijk H Schoffelen J M Oostenveld R ampJensen O (2008) Prestimulus oscillatory activity inthe alpha band predicts visual discriminationability The Journal of Neuroscience 28(8) 1816ndash1823

vanRullen R amp Thorpe S J (2002) Surfing a spikewave down the ventral stream Vision Research 422593ndash2615

Vogel E K Woodman G F amp Luck S J (2006)The time course of consolidation in visual workingmemory Journal of Experimental PsychologyHuman Perception and Performance 32(6) 1436ndash1451 doi1010370096-15233261436

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 14

  • Introduction
  • Methods
  • f01
  • f02
  • e01
  • e02
  • Results
  • f03
  • e03
  • f04
  • e04
  • f05
  • Discussion
  • Conclusions
  • Akaike1
  • Anaki1
  • Aspell1
  • Bair1
  • Brainard1
  • Brockmole1
  • Burnham1
  • Burr1
  • Caudek1
  • Cavanagh1
  • Cheadle1
  • Cheung1
  • Clifford1
  • Coltheart1
  • Connor1
  • Cornelissen1
  • DeBaene1
  • DeCarlo1
  • Deco1
  • DiLollo1
  • DiLollo2
  • DiCarlo2
  • Douglas1
  • Enns1
  • Eriksen1
  • Felleman1
  • Forget1
  • Fukushima1
  • Hogben1
  • Huk1
  • Hung1
  • Irwin1
  • Johnson1
  • Jonides1
  • Kirchner1
  • LaHeij1
  • Liu1
  • Loftus1
  • Logothetis1
  • Maunsell1
  • Neri1
  • Nishida1
  • ORegan1
  • Pasupathy1
  • Pelli1
  • Potter1
  • Richmond1
  • Riesenhuber1
  • Ringach1
  • Rohenkohl1
  • Rolls1
  • Sanocki1
  • Scharlau1
  • Schmolesky1
  • Schyns1
  • Serre1
  • Singer1
  • Sperling1
  • Swets1
  • Tanaka1
  • Thorpe1
  • VanDijk1
  • vanRullen1
  • Vogel1
Page 9: Short temporal asynchrony disrupts visual object recognition · Short temporal asynchrony disrupts visual object recognition Jedediah M. Singer $ Department of Ophthalmology, Boston

quirements for the SDT model we used only theuncertainty reduction model for the remainder of ouranalyses

As noted above performance differed slightlybetween categories To evaluate whether such inho-mogeneity contributed to the observed integration timeconstants we repeated the analyses after matchingperformance across categories by random subsamplingUnder these performance-matched conditions themean s0 was 95 6 9 ms and the population s0 was 105ms We also analyzed each category and each subjectseparately after matching performance The mean s0 forthe four categories were 125 6 23 ms 144 6 33 ms 896 21 ms and 131 6 26 ms respectively for animalspeople plants and vehicles Note that these fits werenoisier due to having fewer data While subjects coulduse different features to recognize images belonging todifferent categories the results were similar acrosscategories

Experiment 2

We extended the first experiment in two variantsaimed at evaluating two-fragment images and very longSOAs (Experiment 2A) and constant trial lengths andno fragment repetition (Experiment 2B) These exper-iment variants are described under Methods

In Experiment 2A the window of integration s0 was101 6 14 ms when two-fragment trials were included inthe fit and 91 6 13 ms when they were not included(Figure 5A) The other model parameters were ufrac14 0336 002 B0 frac14 0990 6 0003 and R0 frac14 0069 6 0006These values were not significantly different from eachother or from the results in Experiment 1 (Wilcoxonrank sum test pfrac14 055 comparing these two fits pfrac14099 and 068 respectively comparing Experiment 1 tothese results with and without two-fragment trials)Furthermore the model fit using only three- and four-fragment trials was a good fit for the two-fragment data(RMSE of 0091 6 0006 using only three- and four-fragment trials versus 0081 6 0004 using all trialsWilcoxon rank sum test pfrac14 020) This demonstratesthat the uncertainty reduction model was able togeneralize to fragment counts that were not used to fitthe model While performance at 2941 ms was slightlybetter than performance at 7059 ms (differing by 346 17) this difference was not significant (sign test pfrac14 016) RMSE and s0 values were unaffected byincluding the data from 7059 ms trials (Wilcoxon ranksum tests pfrac14 097 and 092 respectively) R2 valueswere 030 075 and 078 for two- three- and four-fragment trials respectively The two-fragment R2

value was low because the plot of performance againstasynchrony was almost flat The results of thisexperimental variant suggest that performance at long

SOAs in Experiment 1 was close to asymptote and thatthe uncertainty reduction model can extrapolate totwo-fragment trials

The overall performance in Experiment 2B (as wellas in the previous variant) was lower than inExperiment 1 (cf Figure 5 vs Figure 4) due to theinclusion of more difficult line drawings in the imagesets (Figure 5B) Considering only the trials thatincluded the same types of images across experiments(silhouette images) raised performance to levels com-parable to the ones in Experiment 1 without signif-icantly changing s0 R2 values for the uncertaintyreduction model were 064 and 081 for three- and four-fragment trials respectively The integration windowfrom the uncertainty reduction model in Experiment 2B

Figure 5 Mean performance across all subjects in Experiment 2

A Variant A (n frac14 17 subjects) which added a two-fragment

condition (circles) and data at 7059 ms SOA B Variant B (n frac1417 subjects) which controlled for influence of trial duration and

stimulus repetition (see Results and Methods) Conventions are

otherwise as in Figure 4

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 9

(s0 frac14 123 6 20 ms) was indistinguishable from thatobtained in Experiment 1 (Wilcoxon rank sum test pfrac14079) The other model parameters were ufrac14 030 6002 B0 frac14 0990 6 0003 and R0 frac14 0064 6 0009 Thesimilar s0 values suggest that the results of Experiment1 cannot be ascribed to differences in trial duration orslight familiarity with the images shown

Discussion

We measured the performance of 50 subjects indifferent variants of an experiment to evaluate theeffect of presenting object fragments with varyingdegrees of asynchrony By applying this externallyinduced temporal jitter we characterized the impor-tance of temporal precision for visual object recogni-tion

Subjects might in principle have made educatedguesses based on the content of individual fragmentsor combined such guesses independently across multi-ple fragments Instead we found that performance wassignificantly higher than predicted by independentprobability summation indicating that informationfrom different fragments was synergistically integratedRecognition performance decreased with increasingSOA and was well characterized by a simple modelbased on the reduction of uncertainty as a function ofthe temporal proximity of the presented image parts Inprinciple such a model could be realized by neuronswhose firing rate increases with the arrival of each newburst of information and then relaxes more slowlytowards baseline Neurons showing such firing rateincreases upon transient stimulus onset and decaydynamics over hundreds of milliseconds have beendescribed in recordings in the lateral intraparietalcortex during a motion discrimination task (Huk ampShadlen 2005)

While we focused on the uncertainty reductionmodel we also fit the data from Experiment 1 with amodel based on signal detection theory These twomodels reflect different paradigms of psychologicalprocessing In the former each additional piece ofinformation (provided by each successive fragment)multiplicatively reduces the subjectrsquos uncertainty aboutwhat was shown increasing the probability of giving acorrect answer In the latter each additional fragmentadds to the distance between the actual category andthe other categories in an implicit psychological spaceagain increasing the probability of discriminating thetrue category from the others There are importantdifferences between these paradigms (Swets 1961)though either could be realized via the integrativeneurons hypothesized above The uncertainty reductionmodel might be implemented by a set of downstream

neurons that act as thresholds on the evidenceaccumulated by the integrative neurons the firstdownstream neuron to cross threshold could report theidentity of the viewed image Support for one oranother image would manifest as increased firing ratesin integrative neurons feeding into the associateddownstream neurons increasing its chance of being thefirst to cross threshold The signal detection theorymight instead reflect a set of downstream neuronswhose activity reflected which integrative neuron wasmost active in a winner-take-all fashion The integra-tive neurons would then essentially be counting votesUltimately given that the two models yielded indis-tinguishable estimates of the window of temporalintegration it is difficult to draw any conclusions aboutwhich paradigm is more likely to be correct based onthe current data

The results demonstrated that asynchronies of a fewtens of milliseconds disrupted categorization perfor-mance This disruption is consistent with a body ofneurophysiological observations and computationalmodels of object recognition that are based on a rapidcascade of processing events along the ventral visualstream (DiCarlo et al 2012 Fukushima 1980 Hung etal 2005 Johnson amp Olshausen 2003 Kirchner ampThorpe 2006 Liu et al 2009 Potter amp Levy 1969Richmond et al 1990 Riesenhuber amp Poggio 2000Rolls 1991 Serre et al 2007 Thorpe et al 1996vanRullen amp Thorpe 2002) The response latenciesacross the ventral stream consistently increase by about10 to 20 ms at each stage from the retina to thethalamus to a cascade of cortical areas from primaryvisual cortex to inferior temporal cortex (Maunsell1987 Schmolesky et al 1998) The rapidity with whichvisual shape information progresses through the ventralvisual cortex and is converted into behavioral orperceptual output is consistent with the currentobservation that object recognition begins to sufferimpairment with even minor disruptions of stimulustiming

In addition to the impairment in shape recognitionthrough spatial integration due to temporal asynchro-ny timing judgments can be influenced by spatial cuesSpatial grouping cues that encourage the bindingtogether of separate parts can interfere with simulta-neity judgments at a time scale similar to that reportedhere (Cheadle et al 2008) Spatial judgments areinfluenced by temporal constraints and temporaljudgments are influenced by spatial constraints

While temporal asynchrony disrupted recognitionwe observed that performance was well above asymp-totic levels even at SOAs of approximately 100 msIntegration at these SOAs cannot be ascribed to trialduration eye movements remembering the stimulifrom previous trials or forgetting the fragments shownwithin a trial (Figure 5) Performance values at SOAs

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 10

of approximately 100 ms are consistent with theextended durations of responses in inferior temporalcortex (De Baene et al 2007) These extended responsedurations might instantiate a buffer allowing theintegration of visual shape information over time Theability to integrate visual information over brief spanscould underlie many critical visual functions Forexample percepts of camouflaged or partially occludedobjects could be assembled over time as the objectsmoved relative to their background or occluders(Nishida 2004) and information could be integratedacross saccades (Irwin Yantis amp Jonides 1983Jonides Irwin amp Yantis 1982 OrsquoRegan amp Levy-Schoen 1983)

The observed integration of form information over 100 ms is unlikely to reflect working memoryprocesses as the temporal scales involved in workingmemory span several seconds (Vogel Woodman ampLuck 2006) There is however sufficient time betweenstimulus presentation and the response period that the(integrated) image or category information could beheld in working memory The window of temporalintegration reported here has been observed in otherdomains and with other experimental paradigms(Caudek Domini amp Di Luca 2002 Forget Buiatti ampDehaene 2009 Nishida 2004) In particular perfor-mance that exceeds probability summation at shortSOAs is consistent with the temporal scales involved iniconic memory (Coltheart 1980 Di Lollo 1977Eriksen amp Collins 1968 Hogben amp Di Lollo 1974Sperling 1960) Prior work has shown that informationabout arrays of letters (Sperling 1960) or flashingsquares (Hogben amp Di Lollo 1974) remains accessiblefor a short time after presentation It is possible that theintegration we observed at short SOAs in the objectrecognition system arises from mechanisms similar tothose underlying the previously reported persistence ofsimple visual lsquolsquoiconsrsquorsquo though iconic memory has beenshown to be disrupted by masks (Di Lollo Clark ampHogben 1988) Visual persistence as instantiated bythe dynamics of retinal cells is generally terminated bya mask it is more likely that we are measuring neuralor informational persistence at a higher level of thevisual system (Coltheart 1980) While the dynamicnoise in the stimulus sequence could partly mask visualpersistence it certainly does not completely obliterateit Hence the integration constants reported hereconstitute an upper bound and it is conceivable thatvisual information could decay even more rapidly thanreported here

Two additional phenomena share similar character-istics with the dynamic rate of decay reported hereIntegration masking in which irrelevant informationpresented immediately before or after the presentationof a target inhibits perception of that target by addingnoise to it exhibits a similar time scale to that described

here and may reflect a similar process (Enns amp DiLollo 2000) Priming (in which a stimulus facilitatesthe perception or naming of a later stimulus) might alsoplay a part in these observations Priming would likelyincrease performance at longer SOA positive effects ofpriming typically are small or nonexistent at SOA closeto zero and increase with increasing SOA (La HeijDirkx amp Kramer 1990 Scharlau amp Neumann 2003)

A series of elegant studies has shown that detect-ability and perception can be influenced by the exacttime at which a stimulus is presented with respect toongoing endogenous oscillations (Rohenkohl amp Nobre2011 Van Dijk Schoffelen Oostenveld amp Jensen2008) In the context of the task presented here it isconceivable that the initial dynamic noise could resetsuch endogenous oscillations and that certain presen-tation times and SOAs could lead to enhancedrecognition by virtue of their specific phase The limitednumber and uneven sampling of SOAs precludes asystematic investigation of these effects here but it willbe interesting in future studies to examine thephysiological signals underlying recognition of asyn-chronously presented objects

Several studies have shown interactions betweenasynchronous parts of faces (Anaki et al 2007 Cheunget al 2011 Singer amp Sheinberg 2006) those findingsare better described in terms of higher level phenomenathan object recognition Complex images have alsobeen used in a study of subjective simultaneity (Loftusamp Hanna 1989) in which subjects were asked to ratehow complete or integrated images appeared to bewhen divided into two parts presented with varyingdurations and asynchronies Here we expand upon thisbody of previous work by showing that complex shapeinformation can be combined across both space andtime to enable object recognition Our results wereconsistent across a battery of controls designed toaccount for eye movements stimulus familiarity andmemory on longer time scales

Conclusions

Integration across image fragments begins to breakdown when as little as 30 ms separates the fragmentsand decays with a time constant of approximately 100ms The current characterization of the time coursewith which visual object recognition breaks down as afunction of temporal asynchrony provides informationcritical to building computer vision systems thatoperate beyond static snapshots to modeling visualobject recognition to designing experiments that probethe performance over time of the human visual systemand to the construction of theories of perception in adynamic world

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 11

Keywords object recognition temporal integrationfragmented images temporal sensitivity visual dynamics

Acknowledgments

The authors thank Neal Dach and Joanna Li forassistance with stimulus preparation and data collec-tion and John Maunsell Thomas Miconi and HanlinTang for their insightful comments on the manuscriptThis work was supported by NSF grants 0954570 andCCF-1231216 (GK)

Commercial relationships noneCorresponding author Jedediah Miller SingerEmail jedediahsingerchildrensharvardeduAddress Department of Ophthalmology BostonChildrenrsquos Hospital Harvard Medical School BostonMA USA

References

Akaike H A (1974) A new look at the statisticalmodel identification IEEE Transactions on Auto-matic Control 19 716ndash723

Anaki D Boyd J amp Moscovitch M (2007)Temporal integration in face perception Evidenceof configural processing of temporally separatedface parts Journal of Experimental PsychologyHuman Perception and Performance 33(1) 1ndash19doi 2007-01135-001 [pii] 1010370096-15233311

Aspell J E Wattam-Bell J amp Braddick O (2006)Interaction of spatial and temporal integration inglobal form processing Vision Research 46(18)2834ndash2841 doi S0042-6989(06)00118-0 [pii] 101016jvisres200602018

Bair W amp Movshon J A (2004) Adaptive temporalintegration of motion in direction-selective neuronsin macaque visual cortex Journal of Neuroscience24(33) 7305ndash7323 doi101523JNEUROSCI0554-042004 24339305 [pii]

Brainard D H (1997) The Psychophysics ToolboxSpatial Vision 10 433ndash436

Brockmole J R Wang R F amp Irwin D E (2002)Temporal integration between visual images andvisual percepts Journal of Experimental Psycholo-gy Human Perception and Performance 28(2) 315ndash334

Burnham K P amp Anderson D (2002) Modelselection and multi-model inference (2nd ed) NewYork Springer

Burr D C amp Santoro L (2001) Temporal integra-tion of optic flow measured by contrast andcoherence thresholds Vision Research 41(15)1891ndash1899 doi S0042-6989(01)00072-4 [pii]

Caudek C Domini F amp Di Luca M (2002) Short-term temporal recruitment in structure frommotion Vision Research 42 10

Cavanagh P Holcombe A O amp Chou W (2008)Mobile computation Spatiotemporal integration ofthe properties of objects in motion Journal ofVision 8(12)1 1ndash23 httpwwwjournalofvisionorgcontent8121 doi1011678121 [PubMed][Article]

Cheadle S Bauer F Parton A Muller H BonnehY S amp Usher M (2008) Spatial structure affectstemporal judgments Evidence for a synchronybinding code Journal of Vision 8(7)12 1ndash12httpjournalofvisionorgcontent8712 doi1011678712 [PubMed] [Article]

Cheung O S Richler J J Phillips W S ampGauthier I (2011) Does temporal integration offace parts reflect holistic processing PsychonomicBulletin and Review 18(3) 476ndash483 doi103758s13423-011-0051-7

Clifford C W Holcombe A O amp Pearson J (2004)Rapid global form binding with loss of associatedcolors Journal of Vision 4(12)8 1090ndash1101 httpwwwjournalofvisionorgcontent4128 doi1011674128 [PubMed] [Article]

Coltheart M (1980) Iconic memory and visiblepersistence Attention Perception amp Psychophysics27(3) 183ndash228 doi103758bf03204258

Connor C E Brincat S L amp Pasupathy A (2007)Transformation of shape information in the ventralpathway Current Opinion in Neurobiology 17(2)140ndash147

Cornelissen F W Peters E M amp Palmer J (2002)The Eyelink Toolbox Eye tracking with MATLABand the Psychophysics Toolbox Behavioral Re-search Methods Instruments and Computers 34 4

De Baene W Premereur E amp Vogels R (2007)Properties of shape tuning of macaque inferiortemporal neurons examined using rapid serialvisual presentation Journal of Neurophysiology97(4) 2900ndash2916 doi 007412006 [pii] 101152jn007412006

DeCarlo L T (2012) On a signal detection approachto m-alternative forced choice with bias withmaximum likelihood and bayesian approaches toestimation Journal of Mathematical Psychology56(3) 196ndash207

Deco G amp Rolls E T (2004) Computational

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 12

neuroscience of vision Oxford UK Oxford Uni-versity Press

Di Lollo V (1977) Temporal characteristics of iconicmemory Nature 267(5608) 241ndash243

Di Lollo V Clark C D amp Hogben J H (1988)Separating visible persistence from retinal afterim-ages Perception and Psychophysics 44(4) 363ndash368

DiCarlo J J Zoccolan D amp Rust N C (2012)How does the brain solve visual object recognitionNeuron 73(3) 415ndash434 doi S0896-6273(12)00092-X [pii] 101016jneuron201201010

Douglas R J amp Martin K A (2004) Neuronalcircuits of the neocortex Annual Review of Neuro-science 27 419ndash451

Enns J T amp Di Lollo V (2000) Whatrsquos new in visualmasking Trends in Cognitive Sciences 4(9) 345ndash352

Eriksen C W amp Collins J F (1968) Sensory tracesversus the psychological moment in the temporalorganization of form Journal of ExperimentalPsychology 77(3) 376ndash382

Felleman D J amp Van Essen D C (1991) Distributedhierarchical processing in the primate cerebralcortex Cerebral Cortex 1 1ndash47

Forget J Buiatti M amp Dehaene S (2009) Temporalintegration in visual word recognition Journal ofCognitive Neuroscience 22(5) 14

Fukushima K (1980) Neocognitron A self organizingneural network model for a mechanism of patternrecognition unaffected by shift in position Biolog-ical Cybernetics 36(4) 193ndash202

Hogben J H amp Di Lollo V (1974) Perceptualintegration and perceptual segregation of briefvisual stimuli Vision Research 14(11) 1059ndash1069

Huk A C amp Shadlen M N (2005) Neural activity inmacaque parietal cortex reflects temporal integra-tion of visual motion signals during perceptualdecision making The Journal of Neuroscience25(45) 16

Hung C P Kreiman G Poggio T amp DiCarlo J J(2005) Fast read-out of object identity frommacaque inferior temporal cortex Science 310863ndash866

Irwin D E Yantis S amp Jonides J (1983) Evidenceagainst visual integration across saccadic eyemovements Perception and Psychophysics 34(1)49ndash57

Johnson J S amp Olshausen B A (2003) Timecourseof neural signatures of object recognition Journalof Vision 3(7)4 499ndash512 httpwwwjournalofvisionorgcontent374 doi101167374 [PubMed] [Article]

Jonides J Irwin D E amp Yantis S (1982)Integrating visual information from successivefixations Science 215(4529) 192ndash194

Kirchner H amp Thorpe S J (2006) Ultra-rapid objectdetection with saccadic eye movements Visualprocessing speed revisited Vision Research 46(11)1762ndash1776 doi S0042-6989(05)00511-0 [pii] 101016jvisres200510002

La Heij W Dirkx J amp Kramer P (1990)Categorical interference and associative priming inpicture naming British Journal of Psychology81(4) 511ndash525

Liu H Agam Y Madsen J R amp Kreiman G(2009) Timing timing timing Fast decoding ofobject information from intracranial field poten-tials in human visual cortex Neuron 62(2) 281ndash290

Loftus G R amp Hanna A M (1989) The phenom-enology of spatial integration Data and modelsCognitive Psychology 21(3) 363ndash397 doi0010-0285(89)90013-3 [pii]

Logothetis N K amp Sheinberg D L (1996) Visualobject recognition Annual Review of Neuroscience19 577ndash621

Maunsell J H R (1987) Physiological evidence fortwo visual subsystems In L Vaina (Ed) Mattersof intelligence (pp 59ndash87) Dordrecht The Neth-erlands Reidel Press

Neri P Morrone M C amp Burr D C (1998) Seeingbiological motion Nature 395(6705) 894ndash896 doi10103827661

Nishida S (2004) Motion-based analysis of spatialpatterns by the human visual system CurrentBiology 14 830ndash839

OrsquoRegan J K amp Levy-Schoen A (1983) Integratingvisual information from successive fixations Doestrans-saccadic fusion exist Vision Research 23(8)765ndash768

Pasupathy A amp Connor C E (1999) Responses tocontour features in macaque area V4 Journal ofNeurophysiology 82(5) 2490ndash2502

Pelli D G (1997) The VideoToolbox software forvisual psychophysics Transforming numbers intomovies Spatial Vision 10 437ndash442

Potter M C amp Levy E (1969) Recognition memoryfor a rapid sequence of pictures Journal ofExperimental Psychology 81(1) 10ndash15

Richmond B J Optican L M amp Spitzer H (1990)Temporal encoding of two-dimensional patterns bysingle units in primate primary visual cortex IStimulus-response relations Journal of Neurophys-iology 64(2) 351ndash369

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 13

Riesenhuber M amp Poggio T (2000) Models of objectrecognition Nature Neuroscience 3(Suppl) 1199ndash1204

Ringach D L Hawken M J amp Shapley R (2003)Dynamics of orientation tuning in macaque V1The role of global and tuned suppression Journalof Neurophysiology 90(1) 342ndash352 doi101152jn010182002 010182002 [pii]

Rohenkohl G amp Nobre A C (2011) Alphaoscillations related to anticipatory attention followtemporal expectations The Journal of Neurosci-ence 31(40) 14076ndash14084

Rolls E T (1991) Neural organization of highervisual functions Current Opinion in Neurobiology1 274ndash278

Sanocki T (2001) Interaction of scale and time duringobject identification Journal of Experimental Psy-chology Human Perception and Performance 27(2)290ndash302

Scharlau I amp Neumann O (2003) Temporalparameters and time course of perceptual latencypriming Acta Psychologica 113(2) 185ndash203

Schmolesky M T Wang Y C Hanes D PThompson K G Leutgeb S Schall J D Leventhal A G (1998) Signal timing across themacaque visual system Journal of Neurophysiology79(6) 3272ndash3278

Schyns P G Petro L S amp Smith M L (2007)Dynamics of visual information integration in thebrain for categorizing facial expressions CurrentBiology 17(18) 1580ndash1585 doi S0960-9822(07)01904-5 [pii] 101016jcub200708048

Serre T Kreiman G Kouh M Cadieu CKnoblich U amp Poggio T (2007) A quantitativetheory of immediate visual recognition Progress inBrain Research 165C 33ndash56

Singer J M amp Sheinberg D L (2006) Holisticprocessing unites face parts across time VisionResearch 46(11) 1838ndash1847 doi S0042-6989(05)00563-8 [pii] 101016jvisres200511005

Sperling G (1960) The information available in briefvisual presentations Washington DC AmericanPsychological Association

Swets J A (1961) Is there a sensory thresholdScience 134(3473) 168ndash177

Tanaka K (1996) Inferotemporal cortex and objectvision Annual Review of Neuroscience 19 109ndash139

Thorpe S Fize D amp Marlot C (1996) Speed ofprocessing in the human visual system Nature 381520ndash522

Van Dijk H Schoffelen J M Oostenveld R ampJensen O (2008) Prestimulus oscillatory activity inthe alpha band predicts visual discriminationability The Journal of Neuroscience 28(8) 1816ndash1823

vanRullen R amp Thorpe S J (2002) Surfing a spikewave down the ventral stream Vision Research 422593ndash2615

Vogel E K Woodman G F amp Luck S J (2006)The time course of consolidation in visual workingmemory Journal of Experimental PsychologyHuman Perception and Performance 32(6) 1436ndash1451 doi1010370096-15233261436

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 14

  • Introduction
  • Methods
  • f01
  • f02
  • e01
  • e02
  • Results
  • f03
  • e03
  • f04
  • e04
  • f05
  • Discussion
  • Conclusions
  • Akaike1
  • Anaki1
  • Aspell1
  • Bair1
  • Brainard1
  • Brockmole1
  • Burnham1
  • Burr1
  • Caudek1
  • Cavanagh1
  • Cheadle1
  • Cheung1
  • Clifford1
  • Coltheart1
  • Connor1
  • Cornelissen1
  • DeBaene1
  • DeCarlo1
  • Deco1
  • DiLollo1
  • DiLollo2
  • DiCarlo2
  • Douglas1
  • Enns1
  • Eriksen1
  • Felleman1
  • Forget1
  • Fukushima1
  • Hogben1
  • Huk1
  • Hung1
  • Irwin1
  • Johnson1
  • Jonides1
  • Kirchner1
  • LaHeij1
  • Liu1
  • Loftus1
  • Logothetis1
  • Maunsell1
  • Neri1
  • Nishida1
  • ORegan1
  • Pasupathy1
  • Pelli1
  • Potter1
  • Richmond1
  • Riesenhuber1
  • Ringach1
  • Rohenkohl1
  • Rolls1
  • Sanocki1
  • Scharlau1
  • Schmolesky1
  • Schyns1
  • Serre1
  • Singer1
  • Sperling1
  • Swets1
  • Tanaka1
  • Thorpe1
  • VanDijk1
  • vanRullen1
  • Vogel1
Page 10: Short temporal asynchrony disrupts visual object recognition · Short temporal asynchrony disrupts visual object recognition Jedediah M. Singer $ Department of Ophthalmology, Boston

(s0 frac14 123 6 20 ms) was indistinguishable from thatobtained in Experiment 1 (Wilcoxon rank sum test pfrac14079) The other model parameters were ufrac14 030 6002 B0 frac14 0990 6 0003 and R0 frac14 0064 6 0009 Thesimilar s0 values suggest that the results of Experiment1 cannot be ascribed to differences in trial duration orslight familiarity with the images shown

Discussion

We measured the performance of 50 subjects indifferent variants of an experiment to evaluate theeffect of presenting object fragments with varyingdegrees of asynchrony By applying this externallyinduced temporal jitter we characterized the impor-tance of temporal precision for visual object recogni-tion

Subjects might in principle have made educatedguesses based on the content of individual fragmentsor combined such guesses independently across multi-ple fragments Instead we found that performance wassignificantly higher than predicted by independentprobability summation indicating that informationfrom different fragments was synergistically integratedRecognition performance decreased with increasingSOA and was well characterized by a simple modelbased on the reduction of uncertainty as a function ofthe temporal proximity of the presented image parts Inprinciple such a model could be realized by neuronswhose firing rate increases with the arrival of each newburst of information and then relaxes more slowlytowards baseline Neurons showing such firing rateincreases upon transient stimulus onset and decaydynamics over hundreds of milliseconds have beendescribed in recordings in the lateral intraparietalcortex during a motion discrimination task (Huk ampShadlen 2005)

While we focused on the uncertainty reductionmodel we also fit the data from Experiment 1 with amodel based on signal detection theory These twomodels reflect different paradigms of psychologicalprocessing In the former each additional piece ofinformation (provided by each successive fragment)multiplicatively reduces the subjectrsquos uncertainty aboutwhat was shown increasing the probability of giving acorrect answer In the latter each additional fragmentadds to the distance between the actual category andthe other categories in an implicit psychological spaceagain increasing the probability of discriminating thetrue category from the others There are importantdifferences between these paradigms (Swets 1961)though either could be realized via the integrativeneurons hypothesized above The uncertainty reductionmodel might be implemented by a set of downstream

neurons that act as thresholds on the evidenceaccumulated by the integrative neurons the firstdownstream neuron to cross threshold could report theidentity of the viewed image Support for one oranother image would manifest as increased firing ratesin integrative neurons feeding into the associateddownstream neurons increasing its chance of being thefirst to cross threshold The signal detection theorymight instead reflect a set of downstream neuronswhose activity reflected which integrative neuron wasmost active in a winner-take-all fashion The integra-tive neurons would then essentially be counting votesUltimately given that the two models yielded indis-tinguishable estimates of the window of temporalintegration it is difficult to draw any conclusions aboutwhich paradigm is more likely to be correct based onthe current data

The results demonstrated that asynchronies of a fewtens of milliseconds disrupted categorization perfor-mance This disruption is consistent with a body ofneurophysiological observations and computationalmodels of object recognition that are based on a rapidcascade of processing events along the ventral visualstream (DiCarlo et al 2012 Fukushima 1980 Hung etal 2005 Johnson amp Olshausen 2003 Kirchner ampThorpe 2006 Liu et al 2009 Potter amp Levy 1969Richmond et al 1990 Riesenhuber amp Poggio 2000Rolls 1991 Serre et al 2007 Thorpe et al 1996vanRullen amp Thorpe 2002) The response latenciesacross the ventral stream consistently increase by about10 to 20 ms at each stage from the retina to thethalamus to a cascade of cortical areas from primaryvisual cortex to inferior temporal cortex (Maunsell1987 Schmolesky et al 1998) The rapidity with whichvisual shape information progresses through the ventralvisual cortex and is converted into behavioral orperceptual output is consistent with the currentobservation that object recognition begins to sufferimpairment with even minor disruptions of stimulustiming

In addition to the impairment in shape recognitionthrough spatial integration due to temporal asynchro-ny timing judgments can be influenced by spatial cuesSpatial grouping cues that encourage the bindingtogether of separate parts can interfere with simulta-neity judgments at a time scale similar to that reportedhere (Cheadle et al 2008) Spatial judgments areinfluenced by temporal constraints and temporaljudgments are influenced by spatial constraints

While temporal asynchrony disrupted recognitionwe observed that performance was well above asymp-totic levels even at SOAs of approximately 100 msIntegration at these SOAs cannot be ascribed to trialduration eye movements remembering the stimulifrom previous trials or forgetting the fragments shownwithin a trial (Figure 5) Performance values at SOAs

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 10

of approximately 100 ms are consistent with theextended durations of responses in inferior temporalcortex (De Baene et al 2007) These extended responsedurations might instantiate a buffer allowing theintegration of visual shape information over time Theability to integrate visual information over brief spanscould underlie many critical visual functions Forexample percepts of camouflaged or partially occludedobjects could be assembled over time as the objectsmoved relative to their background or occluders(Nishida 2004) and information could be integratedacross saccades (Irwin Yantis amp Jonides 1983Jonides Irwin amp Yantis 1982 OrsquoRegan amp Levy-Schoen 1983)

The observed integration of form information over 100 ms is unlikely to reflect working memoryprocesses as the temporal scales involved in workingmemory span several seconds (Vogel Woodman ampLuck 2006) There is however sufficient time betweenstimulus presentation and the response period that the(integrated) image or category information could beheld in working memory The window of temporalintegration reported here has been observed in otherdomains and with other experimental paradigms(Caudek Domini amp Di Luca 2002 Forget Buiatti ampDehaene 2009 Nishida 2004) In particular perfor-mance that exceeds probability summation at shortSOAs is consistent with the temporal scales involved iniconic memory (Coltheart 1980 Di Lollo 1977Eriksen amp Collins 1968 Hogben amp Di Lollo 1974Sperling 1960) Prior work has shown that informationabout arrays of letters (Sperling 1960) or flashingsquares (Hogben amp Di Lollo 1974) remains accessiblefor a short time after presentation It is possible that theintegration we observed at short SOAs in the objectrecognition system arises from mechanisms similar tothose underlying the previously reported persistence ofsimple visual lsquolsquoiconsrsquorsquo though iconic memory has beenshown to be disrupted by masks (Di Lollo Clark ampHogben 1988) Visual persistence as instantiated bythe dynamics of retinal cells is generally terminated bya mask it is more likely that we are measuring neuralor informational persistence at a higher level of thevisual system (Coltheart 1980) While the dynamicnoise in the stimulus sequence could partly mask visualpersistence it certainly does not completely obliterateit Hence the integration constants reported hereconstitute an upper bound and it is conceivable thatvisual information could decay even more rapidly thanreported here

Two additional phenomena share similar character-istics with the dynamic rate of decay reported hereIntegration masking in which irrelevant informationpresented immediately before or after the presentationof a target inhibits perception of that target by addingnoise to it exhibits a similar time scale to that described

here and may reflect a similar process (Enns amp DiLollo 2000) Priming (in which a stimulus facilitatesthe perception or naming of a later stimulus) might alsoplay a part in these observations Priming would likelyincrease performance at longer SOA positive effects ofpriming typically are small or nonexistent at SOA closeto zero and increase with increasing SOA (La HeijDirkx amp Kramer 1990 Scharlau amp Neumann 2003)

A series of elegant studies has shown that detect-ability and perception can be influenced by the exacttime at which a stimulus is presented with respect toongoing endogenous oscillations (Rohenkohl amp Nobre2011 Van Dijk Schoffelen Oostenveld amp Jensen2008) In the context of the task presented here it isconceivable that the initial dynamic noise could resetsuch endogenous oscillations and that certain presen-tation times and SOAs could lead to enhancedrecognition by virtue of their specific phase The limitednumber and uneven sampling of SOAs precludes asystematic investigation of these effects here but it willbe interesting in future studies to examine thephysiological signals underlying recognition of asyn-chronously presented objects

Several studies have shown interactions betweenasynchronous parts of faces (Anaki et al 2007 Cheunget al 2011 Singer amp Sheinberg 2006) those findingsare better described in terms of higher level phenomenathan object recognition Complex images have alsobeen used in a study of subjective simultaneity (Loftusamp Hanna 1989) in which subjects were asked to ratehow complete or integrated images appeared to bewhen divided into two parts presented with varyingdurations and asynchronies Here we expand upon thisbody of previous work by showing that complex shapeinformation can be combined across both space andtime to enable object recognition Our results wereconsistent across a battery of controls designed toaccount for eye movements stimulus familiarity andmemory on longer time scales

Conclusions

Integration across image fragments begins to breakdown when as little as 30 ms separates the fragmentsand decays with a time constant of approximately 100ms The current characterization of the time coursewith which visual object recognition breaks down as afunction of temporal asynchrony provides informationcritical to building computer vision systems thatoperate beyond static snapshots to modeling visualobject recognition to designing experiments that probethe performance over time of the human visual systemand to the construction of theories of perception in adynamic world

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 11

Keywords object recognition temporal integrationfragmented images temporal sensitivity visual dynamics

Acknowledgments

The authors thank Neal Dach and Joanna Li forassistance with stimulus preparation and data collec-tion and John Maunsell Thomas Miconi and HanlinTang for their insightful comments on the manuscriptThis work was supported by NSF grants 0954570 andCCF-1231216 (GK)

Commercial relationships noneCorresponding author Jedediah Miller SingerEmail jedediahsingerchildrensharvardeduAddress Department of Ophthalmology BostonChildrenrsquos Hospital Harvard Medical School BostonMA USA

References

Akaike H A (1974) A new look at the statisticalmodel identification IEEE Transactions on Auto-matic Control 19 716ndash723

Anaki D Boyd J amp Moscovitch M (2007)Temporal integration in face perception Evidenceof configural processing of temporally separatedface parts Journal of Experimental PsychologyHuman Perception and Performance 33(1) 1ndash19doi 2007-01135-001 [pii] 1010370096-15233311

Aspell J E Wattam-Bell J amp Braddick O (2006)Interaction of spatial and temporal integration inglobal form processing Vision Research 46(18)2834ndash2841 doi S0042-6989(06)00118-0 [pii] 101016jvisres200602018

Bair W amp Movshon J A (2004) Adaptive temporalintegration of motion in direction-selective neuronsin macaque visual cortex Journal of Neuroscience24(33) 7305ndash7323 doi101523JNEUROSCI0554-042004 24339305 [pii]

Brainard D H (1997) The Psychophysics ToolboxSpatial Vision 10 433ndash436

Brockmole J R Wang R F amp Irwin D E (2002)Temporal integration between visual images andvisual percepts Journal of Experimental Psycholo-gy Human Perception and Performance 28(2) 315ndash334

Burnham K P amp Anderson D (2002) Modelselection and multi-model inference (2nd ed) NewYork Springer

Burr D C amp Santoro L (2001) Temporal integra-tion of optic flow measured by contrast andcoherence thresholds Vision Research 41(15)1891ndash1899 doi S0042-6989(01)00072-4 [pii]

Caudek C Domini F amp Di Luca M (2002) Short-term temporal recruitment in structure frommotion Vision Research 42 10

Cavanagh P Holcombe A O amp Chou W (2008)Mobile computation Spatiotemporal integration ofthe properties of objects in motion Journal ofVision 8(12)1 1ndash23 httpwwwjournalofvisionorgcontent8121 doi1011678121 [PubMed][Article]

Cheadle S Bauer F Parton A Muller H BonnehY S amp Usher M (2008) Spatial structure affectstemporal judgments Evidence for a synchronybinding code Journal of Vision 8(7)12 1ndash12httpjournalofvisionorgcontent8712 doi1011678712 [PubMed] [Article]

Cheung O S Richler J J Phillips W S ampGauthier I (2011) Does temporal integration offace parts reflect holistic processing PsychonomicBulletin and Review 18(3) 476ndash483 doi103758s13423-011-0051-7

Clifford C W Holcombe A O amp Pearson J (2004)Rapid global form binding with loss of associatedcolors Journal of Vision 4(12)8 1090ndash1101 httpwwwjournalofvisionorgcontent4128 doi1011674128 [PubMed] [Article]

Coltheart M (1980) Iconic memory and visiblepersistence Attention Perception amp Psychophysics27(3) 183ndash228 doi103758bf03204258

Connor C E Brincat S L amp Pasupathy A (2007)Transformation of shape information in the ventralpathway Current Opinion in Neurobiology 17(2)140ndash147

Cornelissen F W Peters E M amp Palmer J (2002)The Eyelink Toolbox Eye tracking with MATLABand the Psychophysics Toolbox Behavioral Re-search Methods Instruments and Computers 34 4

De Baene W Premereur E amp Vogels R (2007)Properties of shape tuning of macaque inferiortemporal neurons examined using rapid serialvisual presentation Journal of Neurophysiology97(4) 2900ndash2916 doi 007412006 [pii] 101152jn007412006

DeCarlo L T (2012) On a signal detection approachto m-alternative forced choice with bias withmaximum likelihood and bayesian approaches toestimation Journal of Mathematical Psychology56(3) 196ndash207

Deco G amp Rolls E T (2004) Computational

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 12

neuroscience of vision Oxford UK Oxford Uni-versity Press

Di Lollo V (1977) Temporal characteristics of iconicmemory Nature 267(5608) 241ndash243

Di Lollo V Clark C D amp Hogben J H (1988)Separating visible persistence from retinal afterim-ages Perception and Psychophysics 44(4) 363ndash368

DiCarlo J J Zoccolan D amp Rust N C (2012)How does the brain solve visual object recognitionNeuron 73(3) 415ndash434 doi S0896-6273(12)00092-X [pii] 101016jneuron201201010

Douglas R J amp Martin K A (2004) Neuronalcircuits of the neocortex Annual Review of Neuro-science 27 419ndash451

Enns J T amp Di Lollo V (2000) Whatrsquos new in visualmasking Trends in Cognitive Sciences 4(9) 345ndash352

Eriksen C W amp Collins J F (1968) Sensory tracesversus the psychological moment in the temporalorganization of form Journal of ExperimentalPsychology 77(3) 376ndash382

Felleman D J amp Van Essen D C (1991) Distributedhierarchical processing in the primate cerebralcortex Cerebral Cortex 1 1ndash47

Forget J Buiatti M amp Dehaene S (2009) Temporalintegration in visual word recognition Journal ofCognitive Neuroscience 22(5) 14

Fukushima K (1980) Neocognitron A self organizingneural network model for a mechanism of patternrecognition unaffected by shift in position Biolog-ical Cybernetics 36(4) 193ndash202

Hogben J H amp Di Lollo V (1974) Perceptualintegration and perceptual segregation of briefvisual stimuli Vision Research 14(11) 1059ndash1069

Huk A C amp Shadlen M N (2005) Neural activity inmacaque parietal cortex reflects temporal integra-tion of visual motion signals during perceptualdecision making The Journal of Neuroscience25(45) 16

Hung C P Kreiman G Poggio T amp DiCarlo J J(2005) Fast read-out of object identity frommacaque inferior temporal cortex Science 310863ndash866

Irwin D E Yantis S amp Jonides J (1983) Evidenceagainst visual integration across saccadic eyemovements Perception and Psychophysics 34(1)49ndash57

Johnson J S amp Olshausen B A (2003) Timecourseof neural signatures of object recognition Journalof Vision 3(7)4 499ndash512 httpwwwjournalofvisionorgcontent374 doi101167374 [PubMed] [Article]

Jonides J Irwin D E amp Yantis S (1982)Integrating visual information from successivefixations Science 215(4529) 192ndash194

Kirchner H amp Thorpe S J (2006) Ultra-rapid objectdetection with saccadic eye movements Visualprocessing speed revisited Vision Research 46(11)1762ndash1776 doi S0042-6989(05)00511-0 [pii] 101016jvisres200510002

La Heij W Dirkx J amp Kramer P (1990)Categorical interference and associative priming inpicture naming British Journal of Psychology81(4) 511ndash525

Liu H Agam Y Madsen J R amp Kreiman G(2009) Timing timing timing Fast decoding ofobject information from intracranial field poten-tials in human visual cortex Neuron 62(2) 281ndash290

Loftus G R amp Hanna A M (1989) The phenom-enology of spatial integration Data and modelsCognitive Psychology 21(3) 363ndash397 doi0010-0285(89)90013-3 [pii]

Logothetis N K amp Sheinberg D L (1996) Visualobject recognition Annual Review of Neuroscience19 577ndash621

Maunsell J H R (1987) Physiological evidence fortwo visual subsystems In L Vaina (Ed) Mattersof intelligence (pp 59ndash87) Dordrecht The Neth-erlands Reidel Press

Neri P Morrone M C amp Burr D C (1998) Seeingbiological motion Nature 395(6705) 894ndash896 doi10103827661

Nishida S (2004) Motion-based analysis of spatialpatterns by the human visual system CurrentBiology 14 830ndash839

OrsquoRegan J K amp Levy-Schoen A (1983) Integratingvisual information from successive fixations Doestrans-saccadic fusion exist Vision Research 23(8)765ndash768

Pasupathy A amp Connor C E (1999) Responses tocontour features in macaque area V4 Journal ofNeurophysiology 82(5) 2490ndash2502

Pelli D G (1997) The VideoToolbox software forvisual psychophysics Transforming numbers intomovies Spatial Vision 10 437ndash442

Potter M C amp Levy E (1969) Recognition memoryfor a rapid sequence of pictures Journal ofExperimental Psychology 81(1) 10ndash15

Richmond B J Optican L M amp Spitzer H (1990)Temporal encoding of two-dimensional patterns bysingle units in primate primary visual cortex IStimulus-response relations Journal of Neurophys-iology 64(2) 351ndash369

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 13

Riesenhuber M amp Poggio T (2000) Models of objectrecognition Nature Neuroscience 3(Suppl) 1199ndash1204

Ringach D L Hawken M J amp Shapley R (2003)Dynamics of orientation tuning in macaque V1The role of global and tuned suppression Journalof Neurophysiology 90(1) 342ndash352 doi101152jn010182002 010182002 [pii]

Rohenkohl G amp Nobre A C (2011) Alphaoscillations related to anticipatory attention followtemporal expectations The Journal of Neurosci-ence 31(40) 14076ndash14084

Rolls E T (1991) Neural organization of highervisual functions Current Opinion in Neurobiology1 274ndash278

Sanocki T (2001) Interaction of scale and time duringobject identification Journal of Experimental Psy-chology Human Perception and Performance 27(2)290ndash302

Scharlau I amp Neumann O (2003) Temporalparameters and time course of perceptual latencypriming Acta Psychologica 113(2) 185ndash203

Schmolesky M T Wang Y C Hanes D PThompson K G Leutgeb S Schall J D Leventhal A G (1998) Signal timing across themacaque visual system Journal of Neurophysiology79(6) 3272ndash3278

Schyns P G Petro L S amp Smith M L (2007)Dynamics of visual information integration in thebrain for categorizing facial expressions CurrentBiology 17(18) 1580ndash1585 doi S0960-9822(07)01904-5 [pii] 101016jcub200708048

Serre T Kreiman G Kouh M Cadieu CKnoblich U amp Poggio T (2007) A quantitativetheory of immediate visual recognition Progress inBrain Research 165C 33ndash56

Singer J M amp Sheinberg D L (2006) Holisticprocessing unites face parts across time VisionResearch 46(11) 1838ndash1847 doi S0042-6989(05)00563-8 [pii] 101016jvisres200511005

Sperling G (1960) The information available in briefvisual presentations Washington DC AmericanPsychological Association

Swets J A (1961) Is there a sensory thresholdScience 134(3473) 168ndash177

Tanaka K (1996) Inferotemporal cortex and objectvision Annual Review of Neuroscience 19 109ndash139

Thorpe S Fize D amp Marlot C (1996) Speed ofprocessing in the human visual system Nature 381520ndash522

Van Dijk H Schoffelen J M Oostenveld R ampJensen O (2008) Prestimulus oscillatory activity inthe alpha band predicts visual discriminationability The Journal of Neuroscience 28(8) 1816ndash1823

vanRullen R amp Thorpe S J (2002) Surfing a spikewave down the ventral stream Vision Research 422593ndash2615

Vogel E K Woodman G F amp Luck S J (2006)The time course of consolidation in visual workingmemory Journal of Experimental PsychologyHuman Perception and Performance 32(6) 1436ndash1451 doi1010370096-15233261436

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 14

  • Introduction
  • Methods
  • f01
  • f02
  • e01
  • e02
  • Results
  • f03
  • e03
  • f04
  • e04
  • f05
  • Discussion
  • Conclusions
  • Akaike1
  • Anaki1
  • Aspell1
  • Bair1
  • Brainard1
  • Brockmole1
  • Burnham1
  • Burr1
  • Caudek1
  • Cavanagh1
  • Cheadle1
  • Cheung1
  • Clifford1
  • Coltheart1
  • Connor1
  • Cornelissen1
  • DeBaene1
  • DeCarlo1
  • Deco1
  • DiLollo1
  • DiLollo2
  • DiCarlo2
  • Douglas1
  • Enns1
  • Eriksen1
  • Felleman1
  • Forget1
  • Fukushima1
  • Hogben1
  • Huk1
  • Hung1
  • Irwin1
  • Johnson1
  • Jonides1
  • Kirchner1
  • LaHeij1
  • Liu1
  • Loftus1
  • Logothetis1
  • Maunsell1
  • Neri1
  • Nishida1
  • ORegan1
  • Pasupathy1
  • Pelli1
  • Potter1
  • Richmond1
  • Riesenhuber1
  • Ringach1
  • Rohenkohl1
  • Rolls1
  • Sanocki1
  • Scharlau1
  • Schmolesky1
  • Schyns1
  • Serre1
  • Singer1
  • Sperling1
  • Swets1
  • Tanaka1
  • Thorpe1
  • VanDijk1
  • vanRullen1
  • Vogel1
Page 11: Short temporal asynchrony disrupts visual object recognition · Short temporal asynchrony disrupts visual object recognition Jedediah M. Singer $ Department of Ophthalmology, Boston

of approximately 100 ms are consistent with theextended durations of responses in inferior temporalcortex (De Baene et al 2007) These extended responsedurations might instantiate a buffer allowing theintegration of visual shape information over time Theability to integrate visual information over brief spanscould underlie many critical visual functions Forexample percepts of camouflaged or partially occludedobjects could be assembled over time as the objectsmoved relative to their background or occluders(Nishida 2004) and information could be integratedacross saccades (Irwin Yantis amp Jonides 1983Jonides Irwin amp Yantis 1982 OrsquoRegan amp Levy-Schoen 1983)

The observed integration of form information over 100 ms is unlikely to reflect working memoryprocesses as the temporal scales involved in workingmemory span several seconds (Vogel Woodman ampLuck 2006) There is however sufficient time betweenstimulus presentation and the response period that the(integrated) image or category information could beheld in working memory The window of temporalintegration reported here has been observed in otherdomains and with other experimental paradigms(Caudek Domini amp Di Luca 2002 Forget Buiatti ampDehaene 2009 Nishida 2004) In particular perfor-mance that exceeds probability summation at shortSOAs is consistent with the temporal scales involved iniconic memory (Coltheart 1980 Di Lollo 1977Eriksen amp Collins 1968 Hogben amp Di Lollo 1974Sperling 1960) Prior work has shown that informationabout arrays of letters (Sperling 1960) or flashingsquares (Hogben amp Di Lollo 1974) remains accessiblefor a short time after presentation It is possible that theintegration we observed at short SOAs in the objectrecognition system arises from mechanisms similar tothose underlying the previously reported persistence ofsimple visual lsquolsquoiconsrsquorsquo though iconic memory has beenshown to be disrupted by masks (Di Lollo Clark ampHogben 1988) Visual persistence as instantiated bythe dynamics of retinal cells is generally terminated bya mask it is more likely that we are measuring neuralor informational persistence at a higher level of thevisual system (Coltheart 1980) While the dynamicnoise in the stimulus sequence could partly mask visualpersistence it certainly does not completely obliterateit Hence the integration constants reported hereconstitute an upper bound and it is conceivable thatvisual information could decay even more rapidly thanreported here

Two additional phenomena share similar character-istics with the dynamic rate of decay reported hereIntegration masking in which irrelevant informationpresented immediately before or after the presentationof a target inhibits perception of that target by addingnoise to it exhibits a similar time scale to that described

here and may reflect a similar process (Enns amp DiLollo 2000) Priming (in which a stimulus facilitatesthe perception or naming of a later stimulus) might alsoplay a part in these observations Priming would likelyincrease performance at longer SOA positive effects ofpriming typically are small or nonexistent at SOA closeto zero and increase with increasing SOA (La HeijDirkx amp Kramer 1990 Scharlau amp Neumann 2003)

A series of elegant studies has shown that detect-ability and perception can be influenced by the exacttime at which a stimulus is presented with respect toongoing endogenous oscillations (Rohenkohl amp Nobre2011 Van Dijk Schoffelen Oostenveld amp Jensen2008) In the context of the task presented here it isconceivable that the initial dynamic noise could resetsuch endogenous oscillations and that certain presen-tation times and SOAs could lead to enhancedrecognition by virtue of their specific phase The limitednumber and uneven sampling of SOAs precludes asystematic investigation of these effects here but it willbe interesting in future studies to examine thephysiological signals underlying recognition of asyn-chronously presented objects

Several studies have shown interactions betweenasynchronous parts of faces (Anaki et al 2007 Cheunget al 2011 Singer amp Sheinberg 2006) those findingsare better described in terms of higher level phenomenathan object recognition Complex images have alsobeen used in a study of subjective simultaneity (Loftusamp Hanna 1989) in which subjects were asked to ratehow complete or integrated images appeared to bewhen divided into two parts presented with varyingdurations and asynchronies Here we expand upon thisbody of previous work by showing that complex shapeinformation can be combined across both space andtime to enable object recognition Our results wereconsistent across a battery of controls designed toaccount for eye movements stimulus familiarity andmemory on longer time scales

Conclusions

Integration across image fragments begins to breakdown when as little as 30 ms separates the fragmentsand decays with a time constant of approximately 100ms The current characterization of the time coursewith which visual object recognition breaks down as afunction of temporal asynchrony provides informationcritical to building computer vision systems thatoperate beyond static snapshots to modeling visualobject recognition to designing experiments that probethe performance over time of the human visual systemand to the construction of theories of perception in adynamic world

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 11

Keywords object recognition temporal integrationfragmented images temporal sensitivity visual dynamics

Acknowledgments

The authors thank Neal Dach and Joanna Li forassistance with stimulus preparation and data collec-tion and John Maunsell Thomas Miconi and HanlinTang for their insightful comments on the manuscriptThis work was supported by NSF grants 0954570 andCCF-1231216 (GK)

Commercial relationships noneCorresponding author Jedediah Miller SingerEmail jedediahsingerchildrensharvardeduAddress Department of Ophthalmology BostonChildrenrsquos Hospital Harvard Medical School BostonMA USA

References

Akaike H A (1974) A new look at the statisticalmodel identification IEEE Transactions on Auto-matic Control 19 716ndash723

Anaki D Boyd J amp Moscovitch M (2007)Temporal integration in face perception Evidenceof configural processing of temporally separatedface parts Journal of Experimental PsychologyHuman Perception and Performance 33(1) 1ndash19doi 2007-01135-001 [pii] 1010370096-15233311

Aspell J E Wattam-Bell J amp Braddick O (2006)Interaction of spatial and temporal integration inglobal form processing Vision Research 46(18)2834ndash2841 doi S0042-6989(06)00118-0 [pii] 101016jvisres200602018

Bair W amp Movshon J A (2004) Adaptive temporalintegration of motion in direction-selective neuronsin macaque visual cortex Journal of Neuroscience24(33) 7305ndash7323 doi101523JNEUROSCI0554-042004 24339305 [pii]

Brainard D H (1997) The Psychophysics ToolboxSpatial Vision 10 433ndash436

Brockmole J R Wang R F amp Irwin D E (2002)Temporal integration between visual images andvisual percepts Journal of Experimental Psycholo-gy Human Perception and Performance 28(2) 315ndash334

Burnham K P amp Anderson D (2002) Modelselection and multi-model inference (2nd ed) NewYork Springer

Burr D C amp Santoro L (2001) Temporal integra-tion of optic flow measured by contrast andcoherence thresholds Vision Research 41(15)1891ndash1899 doi S0042-6989(01)00072-4 [pii]

Caudek C Domini F amp Di Luca M (2002) Short-term temporal recruitment in structure frommotion Vision Research 42 10

Cavanagh P Holcombe A O amp Chou W (2008)Mobile computation Spatiotemporal integration ofthe properties of objects in motion Journal ofVision 8(12)1 1ndash23 httpwwwjournalofvisionorgcontent8121 doi1011678121 [PubMed][Article]

Cheadle S Bauer F Parton A Muller H BonnehY S amp Usher M (2008) Spatial structure affectstemporal judgments Evidence for a synchronybinding code Journal of Vision 8(7)12 1ndash12httpjournalofvisionorgcontent8712 doi1011678712 [PubMed] [Article]

Cheung O S Richler J J Phillips W S ampGauthier I (2011) Does temporal integration offace parts reflect holistic processing PsychonomicBulletin and Review 18(3) 476ndash483 doi103758s13423-011-0051-7

Clifford C W Holcombe A O amp Pearson J (2004)Rapid global form binding with loss of associatedcolors Journal of Vision 4(12)8 1090ndash1101 httpwwwjournalofvisionorgcontent4128 doi1011674128 [PubMed] [Article]

Coltheart M (1980) Iconic memory and visiblepersistence Attention Perception amp Psychophysics27(3) 183ndash228 doi103758bf03204258

Connor C E Brincat S L amp Pasupathy A (2007)Transformation of shape information in the ventralpathway Current Opinion in Neurobiology 17(2)140ndash147

Cornelissen F W Peters E M amp Palmer J (2002)The Eyelink Toolbox Eye tracking with MATLABand the Psychophysics Toolbox Behavioral Re-search Methods Instruments and Computers 34 4

De Baene W Premereur E amp Vogels R (2007)Properties of shape tuning of macaque inferiortemporal neurons examined using rapid serialvisual presentation Journal of Neurophysiology97(4) 2900ndash2916 doi 007412006 [pii] 101152jn007412006

DeCarlo L T (2012) On a signal detection approachto m-alternative forced choice with bias withmaximum likelihood and bayesian approaches toestimation Journal of Mathematical Psychology56(3) 196ndash207

Deco G amp Rolls E T (2004) Computational

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 12

neuroscience of vision Oxford UK Oxford Uni-versity Press

Di Lollo V (1977) Temporal characteristics of iconicmemory Nature 267(5608) 241ndash243

Di Lollo V Clark C D amp Hogben J H (1988)Separating visible persistence from retinal afterim-ages Perception and Psychophysics 44(4) 363ndash368

DiCarlo J J Zoccolan D amp Rust N C (2012)How does the brain solve visual object recognitionNeuron 73(3) 415ndash434 doi S0896-6273(12)00092-X [pii] 101016jneuron201201010

Douglas R J amp Martin K A (2004) Neuronalcircuits of the neocortex Annual Review of Neuro-science 27 419ndash451

Enns J T amp Di Lollo V (2000) Whatrsquos new in visualmasking Trends in Cognitive Sciences 4(9) 345ndash352

Eriksen C W amp Collins J F (1968) Sensory tracesversus the psychological moment in the temporalorganization of form Journal of ExperimentalPsychology 77(3) 376ndash382

Felleman D J amp Van Essen D C (1991) Distributedhierarchical processing in the primate cerebralcortex Cerebral Cortex 1 1ndash47

Forget J Buiatti M amp Dehaene S (2009) Temporalintegration in visual word recognition Journal ofCognitive Neuroscience 22(5) 14

Fukushima K (1980) Neocognitron A self organizingneural network model for a mechanism of patternrecognition unaffected by shift in position Biolog-ical Cybernetics 36(4) 193ndash202

Hogben J H amp Di Lollo V (1974) Perceptualintegration and perceptual segregation of briefvisual stimuli Vision Research 14(11) 1059ndash1069

Huk A C amp Shadlen M N (2005) Neural activity inmacaque parietal cortex reflects temporal integra-tion of visual motion signals during perceptualdecision making The Journal of Neuroscience25(45) 16

Hung C P Kreiman G Poggio T amp DiCarlo J J(2005) Fast read-out of object identity frommacaque inferior temporal cortex Science 310863ndash866

Irwin D E Yantis S amp Jonides J (1983) Evidenceagainst visual integration across saccadic eyemovements Perception and Psychophysics 34(1)49ndash57

Johnson J S amp Olshausen B A (2003) Timecourseof neural signatures of object recognition Journalof Vision 3(7)4 499ndash512 httpwwwjournalofvisionorgcontent374 doi101167374 [PubMed] [Article]

Jonides J Irwin D E amp Yantis S (1982)Integrating visual information from successivefixations Science 215(4529) 192ndash194

Kirchner H amp Thorpe S J (2006) Ultra-rapid objectdetection with saccadic eye movements Visualprocessing speed revisited Vision Research 46(11)1762ndash1776 doi S0042-6989(05)00511-0 [pii] 101016jvisres200510002

La Heij W Dirkx J amp Kramer P (1990)Categorical interference and associative priming inpicture naming British Journal of Psychology81(4) 511ndash525

Liu H Agam Y Madsen J R amp Kreiman G(2009) Timing timing timing Fast decoding ofobject information from intracranial field poten-tials in human visual cortex Neuron 62(2) 281ndash290

Loftus G R amp Hanna A M (1989) The phenom-enology of spatial integration Data and modelsCognitive Psychology 21(3) 363ndash397 doi0010-0285(89)90013-3 [pii]

Logothetis N K amp Sheinberg D L (1996) Visualobject recognition Annual Review of Neuroscience19 577ndash621

Maunsell J H R (1987) Physiological evidence fortwo visual subsystems In L Vaina (Ed) Mattersof intelligence (pp 59ndash87) Dordrecht The Neth-erlands Reidel Press

Neri P Morrone M C amp Burr D C (1998) Seeingbiological motion Nature 395(6705) 894ndash896 doi10103827661

Nishida S (2004) Motion-based analysis of spatialpatterns by the human visual system CurrentBiology 14 830ndash839

OrsquoRegan J K amp Levy-Schoen A (1983) Integratingvisual information from successive fixations Doestrans-saccadic fusion exist Vision Research 23(8)765ndash768

Pasupathy A amp Connor C E (1999) Responses tocontour features in macaque area V4 Journal ofNeurophysiology 82(5) 2490ndash2502

Pelli D G (1997) The VideoToolbox software forvisual psychophysics Transforming numbers intomovies Spatial Vision 10 437ndash442

Potter M C amp Levy E (1969) Recognition memoryfor a rapid sequence of pictures Journal ofExperimental Psychology 81(1) 10ndash15

Richmond B J Optican L M amp Spitzer H (1990)Temporal encoding of two-dimensional patterns bysingle units in primate primary visual cortex IStimulus-response relations Journal of Neurophys-iology 64(2) 351ndash369

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 13

Riesenhuber M amp Poggio T (2000) Models of objectrecognition Nature Neuroscience 3(Suppl) 1199ndash1204

Ringach D L Hawken M J amp Shapley R (2003)Dynamics of orientation tuning in macaque V1The role of global and tuned suppression Journalof Neurophysiology 90(1) 342ndash352 doi101152jn010182002 010182002 [pii]

Rohenkohl G amp Nobre A C (2011) Alphaoscillations related to anticipatory attention followtemporal expectations The Journal of Neurosci-ence 31(40) 14076ndash14084

Rolls E T (1991) Neural organization of highervisual functions Current Opinion in Neurobiology1 274ndash278

Sanocki T (2001) Interaction of scale and time duringobject identification Journal of Experimental Psy-chology Human Perception and Performance 27(2)290ndash302

Scharlau I amp Neumann O (2003) Temporalparameters and time course of perceptual latencypriming Acta Psychologica 113(2) 185ndash203

Schmolesky M T Wang Y C Hanes D PThompson K G Leutgeb S Schall J D Leventhal A G (1998) Signal timing across themacaque visual system Journal of Neurophysiology79(6) 3272ndash3278

Schyns P G Petro L S amp Smith M L (2007)Dynamics of visual information integration in thebrain for categorizing facial expressions CurrentBiology 17(18) 1580ndash1585 doi S0960-9822(07)01904-5 [pii] 101016jcub200708048

Serre T Kreiman G Kouh M Cadieu CKnoblich U amp Poggio T (2007) A quantitativetheory of immediate visual recognition Progress inBrain Research 165C 33ndash56

Singer J M amp Sheinberg D L (2006) Holisticprocessing unites face parts across time VisionResearch 46(11) 1838ndash1847 doi S0042-6989(05)00563-8 [pii] 101016jvisres200511005

Sperling G (1960) The information available in briefvisual presentations Washington DC AmericanPsychological Association

Swets J A (1961) Is there a sensory thresholdScience 134(3473) 168ndash177

Tanaka K (1996) Inferotemporal cortex and objectvision Annual Review of Neuroscience 19 109ndash139

Thorpe S Fize D amp Marlot C (1996) Speed ofprocessing in the human visual system Nature 381520ndash522

Van Dijk H Schoffelen J M Oostenveld R ampJensen O (2008) Prestimulus oscillatory activity inthe alpha band predicts visual discriminationability The Journal of Neuroscience 28(8) 1816ndash1823

vanRullen R amp Thorpe S J (2002) Surfing a spikewave down the ventral stream Vision Research 422593ndash2615

Vogel E K Woodman G F amp Luck S J (2006)The time course of consolidation in visual workingmemory Journal of Experimental PsychologyHuman Perception and Performance 32(6) 1436ndash1451 doi1010370096-15233261436

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 14

  • Introduction
  • Methods
  • f01
  • f02
  • e01
  • e02
  • Results
  • f03
  • e03
  • f04
  • e04
  • f05
  • Discussion
  • Conclusions
  • Akaike1
  • Anaki1
  • Aspell1
  • Bair1
  • Brainard1
  • Brockmole1
  • Burnham1
  • Burr1
  • Caudek1
  • Cavanagh1
  • Cheadle1
  • Cheung1
  • Clifford1
  • Coltheart1
  • Connor1
  • Cornelissen1
  • DeBaene1
  • DeCarlo1
  • Deco1
  • DiLollo1
  • DiLollo2
  • DiCarlo2
  • Douglas1
  • Enns1
  • Eriksen1
  • Felleman1
  • Forget1
  • Fukushima1
  • Hogben1
  • Huk1
  • Hung1
  • Irwin1
  • Johnson1
  • Jonides1
  • Kirchner1
  • LaHeij1
  • Liu1
  • Loftus1
  • Logothetis1
  • Maunsell1
  • Neri1
  • Nishida1
  • ORegan1
  • Pasupathy1
  • Pelli1
  • Potter1
  • Richmond1
  • Riesenhuber1
  • Ringach1
  • Rohenkohl1
  • Rolls1
  • Sanocki1
  • Scharlau1
  • Schmolesky1
  • Schyns1
  • Serre1
  • Singer1
  • Sperling1
  • Swets1
  • Tanaka1
  • Thorpe1
  • VanDijk1
  • vanRullen1
  • Vogel1
Page 12: Short temporal asynchrony disrupts visual object recognition · Short temporal asynchrony disrupts visual object recognition Jedediah M. Singer $ Department of Ophthalmology, Boston

Keywords object recognition temporal integrationfragmented images temporal sensitivity visual dynamics

Acknowledgments

The authors thank Neal Dach and Joanna Li forassistance with stimulus preparation and data collec-tion and John Maunsell Thomas Miconi and HanlinTang for their insightful comments on the manuscriptThis work was supported by NSF grants 0954570 andCCF-1231216 (GK)

Commercial relationships noneCorresponding author Jedediah Miller SingerEmail jedediahsingerchildrensharvardeduAddress Department of Ophthalmology BostonChildrenrsquos Hospital Harvard Medical School BostonMA USA

References

Akaike H A (1974) A new look at the statisticalmodel identification IEEE Transactions on Auto-matic Control 19 716ndash723

Anaki D Boyd J amp Moscovitch M (2007)Temporal integration in face perception Evidenceof configural processing of temporally separatedface parts Journal of Experimental PsychologyHuman Perception and Performance 33(1) 1ndash19doi 2007-01135-001 [pii] 1010370096-15233311

Aspell J E Wattam-Bell J amp Braddick O (2006)Interaction of spatial and temporal integration inglobal form processing Vision Research 46(18)2834ndash2841 doi S0042-6989(06)00118-0 [pii] 101016jvisres200602018

Bair W amp Movshon J A (2004) Adaptive temporalintegration of motion in direction-selective neuronsin macaque visual cortex Journal of Neuroscience24(33) 7305ndash7323 doi101523JNEUROSCI0554-042004 24339305 [pii]

Brainard D H (1997) The Psychophysics ToolboxSpatial Vision 10 433ndash436

Brockmole J R Wang R F amp Irwin D E (2002)Temporal integration between visual images andvisual percepts Journal of Experimental Psycholo-gy Human Perception and Performance 28(2) 315ndash334

Burnham K P amp Anderson D (2002) Modelselection and multi-model inference (2nd ed) NewYork Springer

Burr D C amp Santoro L (2001) Temporal integra-tion of optic flow measured by contrast andcoherence thresholds Vision Research 41(15)1891ndash1899 doi S0042-6989(01)00072-4 [pii]

Caudek C Domini F amp Di Luca M (2002) Short-term temporal recruitment in structure frommotion Vision Research 42 10

Cavanagh P Holcombe A O amp Chou W (2008)Mobile computation Spatiotemporal integration ofthe properties of objects in motion Journal ofVision 8(12)1 1ndash23 httpwwwjournalofvisionorgcontent8121 doi1011678121 [PubMed][Article]

Cheadle S Bauer F Parton A Muller H BonnehY S amp Usher M (2008) Spatial structure affectstemporal judgments Evidence for a synchronybinding code Journal of Vision 8(7)12 1ndash12httpjournalofvisionorgcontent8712 doi1011678712 [PubMed] [Article]

Cheung O S Richler J J Phillips W S ampGauthier I (2011) Does temporal integration offace parts reflect holistic processing PsychonomicBulletin and Review 18(3) 476ndash483 doi103758s13423-011-0051-7

Clifford C W Holcombe A O amp Pearson J (2004)Rapid global form binding with loss of associatedcolors Journal of Vision 4(12)8 1090ndash1101 httpwwwjournalofvisionorgcontent4128 doi1011674128 [PubMed] [Article]

Coltheart M (1980) Iconic memory and visiblepersistence Attention Perception amp Psychophysics27(3) 183ndash228 doi103758bf03204258

Connor C E Brincat S L amp Pasupathy A (2007)Transformation of shape information in the ventralpathway Current Opinion in Neurobiology 17(2)140ndash147

Cornelissen F W Peters E M amp Palmer J (2002)The Eyelink Toolbox Eye tracking with MATLABand the Psychophysics Toolbox Behavioral Re-search Methods Instruments and Computers 34 4

De Baene W Premereur E amp Vogels R (2007)Properties of shape tuning of macaque inferiortemporal neurons examined using rapid serialvisual presentation Journal of Neurophysiology97(4) 2900ndash2916 doi 007412006 [pii] 101152jn007412006

DeCarlo L T (2012) On a signal detection approachto m-alternative forced choice with bias withmaximum likelihood and bayesian approaches toestimation Journal of Mathematical Psychology56(3) 196ndash207

Deco G amp Rolls E T (2004) Computational

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 12

neuroscience of vision Oxford UK Oxford Uni-versity Press

Di Lollo V (1977) Temporal characteristics of iconicmemory Nature 267(5608) 241ndash243

Di Lollo V Clark C D amp Hogben J H (1988)Separating visible persistence from retinal afterim-ages Perception and Psychophysics 44(4) 363ndash368

DiCarlo J J Zoccolan D amp Rust N C (2012)How does the brain solve visual object recognitionNeuron 73(3) 415ndash434 doi S0896-6273(12)00092-X [pii] 101016jneuron201201010

Douglas R J amp Martin K A (2004) Neuronalcircuits of the neocortex Annual Review of Neuro-science 27 419ndash451

Enns J T amp Di Lollo V (2000) Whatrsquos new in visualmasking Trends in Cognitive Sciences 4(9) 345ndash352

Eriksen C W amp Collins J F (1968) Sensory tracesversus the psychological moment in the temporalorganization of form Journal of ExperimentalPsychology 77(3) 376ndash382

Felleman D J amp Van Essen D C (1991) Distributedhierarchical processing in the primate cerebralcortex Cerebral Cortex 1 1ndash47

Forget J Buiatti M amp Dehaene S (2009) Temporalintegration in visual word recognition Journal ofCognitive Neuroscience 22(5) 14

Fukushima K (1980) Neocognitron A self organizingneural network model for a mechanism of patternrecognition unaffected by shift in position Biolog-ical Cybernetics 36(4) 193ndash202

Hogben J H amp Di Lollo V (1974) Perceptualintegration and perceptual segregation of briefvisual stimuli Vision Research 14(11) 1059ndash1069

Huk A C amp Shadlen M N (2005) Neural activity inmacaque parietal cortex reflects temporal integra-tion of visual motion signals during perceptualdecision making The Journal of Neuroscience25(45) 16

Hung C P Kreiman G Poggio T amp DiCarlo J J(2005) Fast read-out of object identity frommacaque inferior temporal cortex Science 310863ndash866

Irwin D E Yantis S amp Jonides J (1983) Evidenceagainst visual integration across saccadic eyemovements Perception and Psychophysics 34(1)49ndash57

Johnson J S amp Olshausen B A (2003) Timecourseof neural signatures of object recognition Journalof Vision 3(7)4 499ndash512 httpwwwjournalofvisionorgcontent374 doi101167374 [PubMed] [Article]

Jonides J Irwin D E amp Yantis S (1982)Integrating visual information from successivefixations Science 215(4529) 192ndash194

Kirchner H amp Thorpe S J (2006) Ultra-rapid objectdetection with saccadic eye movements Visualprocessing speed revisited Vision Research 46(11)1762ndash1776 doi S0042-6989(05)00511-0 [pii] 101016jvisres200510002

La Heij W Dirkx J amp Kramer P (1990)Categorical interference and associative priming inpicture naming British Journal of Psychology81(4) 511ndash525

Liu H Agam Y Madsen J R amp Kreiman G(2009) Timing timing timing Fast decoding ofobject information from intracranial field poten-tials in human visual cortex Neuron 62(2) 281ndash290

Loftus G R amp Hanna A M (1989) The phenom-enology of spatial integration Data and modelsCognitive Psychology 21(3) 363ndash397 doi0010-0285(89)90013-3 [pii]

Logothetis N K amp Sheinberg D L (1996) Visualobject recognition Annual Review of Neuroscience19 577ndash621

Maunsell J H R (1987) Physiological evidence fortwo visual subsystems In L Vaina (Ed) Mattersof intelligence (pp 59ndash87) Dordrecht The Neth-erlands Reidel Press

Neri P Morrone M C amp Burr D C (1998) Seeingbiological motion Nature 395(6705) 894ndash896 doi10103827661

Nishida S (2004) Motion-based analysis of spatialpatterns by the human visual system CurrentBiology 14 830ndash839

OrsquoRegan J K amp Levy-Schoen A (1983) Integratingvisual information from successive fixations Doestrans-saccadic fusion exist Vision Research 23(8)765ndash768

Pasupathy A amp Connor C E (1999) Responses tocontour features in macaque area V4 Journal ofNeurophysiology 82(5) 2490ndash2502

Pelli D G (1997) The VideoToolbox software forvisual psychophysics Transforming numbers intomovies Spatial Vision 10 437ndash442

Potter M C amp Levy E (1969) Recognition memoryfor a rapid sequence of pictures Journal ofExperimental Psychology 81(1) 10ndash15

Richmond B J Optican L M amp Spitzer H (1990)Temporal encoding of two-dimensional patterns bysingle units in primate primary visual cortex IStimulus-response relations Journal of Neurophys-iology 64(2) 351ndash369

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 13

Riesenhuber M amp Poggio T (2000) Models of objectrecognition Nature Neuroscience 3(Suppl) 1199ndash1204

Ringach D L Hawken M J amp Shapley R (2003)Dynamics of orientation tuning in macaque V1The role of global and tuned suppression Journalof Neurophysiology 90(1) 342ndash352 doi101152jn010182002 010182002 [pii]

Rohenkohl G amp Nobre A C (2011) Alphaoscillations related to anticipatory attention followtemporal expectations The Journal of Neurosci-ence 31(40) 14076ndash14084

Rolls E T (1991) Neural organization of highervisual functions Current Opinion in Neurobiology1 274ndash278

Sanocki T (2001) Interaction of scale and time duringobject identification Journal of Experimental Psy-chology Human Perception and Performance 27(2)290ndash302

Scharlau I amp Neumann O (2003) Temporalparameters and time course of perceptual latencypriming Acta Psychologica 113(2) 185ndash203

Schmolesky M T Wang Y C Hanes D PThompson K G Leutgeb S Schall J D Leventhal A G (1998) Signal timing across themacaque visual system Journal of Neurophysiology79(6) 3272ndash3278

Schyns P G Petro L S amp Smith M L (2007)Dynamics of visual information integration in thebrain for categorizing facial expressions CurrentBiology 17(18) 1580ndash1585 doi S0960-9822(07)01904-5 [pii] 101016jcub200708048

Serre T Kreiman G Kouh M Cadieu CKnoblich U amp Poggio T (2007) A quantitativetheory of immediate visual recognition Progress inBrain Research 165C 33ndash56

Singer J M amp Sheinberg D L (2006) Holisticprocessing unites face parts across time VisionResearch 46(11) 1838ndash1847 doi S0042-6989(05)00563-8 [pii] 101016jvisres200511005

Sperling G (1960) The information available in briefvisual presentations Washington DC AmericanPsychological Association

Swets J A (1961) Is there a sensory thresholdScience 134(3473) 168ndash177

Tanaka K (1996) Inferotemporal cortex and objectvision Annual Review of Neuroscience 19 109ndash139

Thorpe S Fize D amp Marlot C (1996) Speed ofprocessing in the human visual system Nature 381520ndash522

Van Dijk H Schoffelen J M Oostenveld R ampJensen O (2008) Prestimulus oscillatory activity inthe alpha band predicts visual discriminationability The Journal of Neuroscience 28(8) 1816ndash1823

vanRullen R amp Thorpe S J (2002) Surfing a spikewave down the ventral stream Vision Research 422593ndash2615

Vogel E K Woodman G F amp Luck S J (2006)The time course of consolidation in visual workingmemory Journal of Experimental PsychologyHuman Perception and Performance 32(6) 1436ndash1451 doi1010370096-15233261436

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 14

  • Introduction
  • Methods
  • f01
  • f02
  • e01
  • e02
  • Results
  • f03
  • e03
  • f04
  • e04
  • f05
  • Discussion
  • Conclusions
  • Akaike1
  • Anaki1
  • Aspell1
  • Bair1
  • Brainard1
  • Brockmole1
  • Burnham1
  • Burr1
  • Caudek1
  • Cavanagh1
  • Cheadle1
  • Cheung1
  • Clifford1
  • Coltheart1
  • Connor1
  • Cornelissen1
  • DeBaene1
  • DeCarlo1
  • Deco1
  • DiLollo1
  • DiLollo2
  • DiCarlo2
  • Douglas1
  • Enns1
  • Eriksen1
  • Felleman1
  • Forget1
  • Fukushima1
  • Hogben1
  • Huk1
  • Hung1
  • Irwin1
  • Johnson1
  • Jonides1
  • Kirchner1
  • LaHeij1
  • Liu1
  • Loftus1
  • Logothetis1
  • Maunsell1
  • Neri1
  • Nishida1
  • ORegan1
  • Pasupathy1
  • Pelli1
  • Potter1
  • Richmond1
  • Riesenhuber1
  • Ringach1
  • Rohenkohl1
  • Rolls1
  • Sanocki1
  • Scharlau1
  • Schmolesky1
  • Schyns1
  • Serre1
  • Singer1
  • Sperling1
  • Swets1
  • Tanaka1
  • Thorpe1
  • VanDijk1
  • vanRullen1
  • Vogel1
Page 13: Short temporal asynchrony disrupts visual object recognition · Short temporal asynchrony disrupts visual object recognition Jedediah M. Singer $ Department of Ophthalmology, Boston

neuroscience of vision Oxford UK Oxford Uni-versity Press

Di Lollo V (1977) Temporal characteristics of iconicmemory Nature 267(5608) 241ndash243

Di Lollo V Clark C D amp Hogben J H (1988)Separating visible persistence from retinal afterim-ages Perception and Psychophysics 44(4) 363ndash368

DiCarlo J J Zoccolan D amp Rust N C (2012)How does the brain solve visual object recognitionNeuron 73(3) 415ndash434 doi S0896-6273(12)00092-X [pii] 101016jneuron201201010

Douglas R J amp Martin K A (2004) Neuronalcircuits of the neocortex Annual Review of Neuro-science 27 419ndash451

Enns J T amp Di Lollo V (2000) Whatrsquos new in visualmasking Trends in Cognitive Sciences 4(9) 345ndash352

Eriksen C W amp Collins J F (1968) Sensory tracesversus the psychological moment in the temporalorganization of form Journal of ExperimentalPsychology 77(3) 376ndash382

Felleman D J amp Van Essen D C (1991) Distributedhierarchical processing in the primate cerebralcortex Cerebral Cortex 1 1ndash47

Forget J Buiatti M amp Dehaene S (2009) Temporalintegration in visual word recognition Journal ofCognitive Neuroscience 22(5) 14

Fukushima K (1980) Neocognitron A self organizingneural network model for a mechanism of patternrecognition unaffected by shift in position Biolog-ical Cybernetics 36(4) 193ndash202

Hogben J H amp Di Lollo V (1974) Perceptualintegration and perceptual segregation of briefvisual stimuli Vision Research 14(11) 1059ndash1069

Huk A C amp Shadlen M N (2005) Neural activity inmacaque parietal cortex reflects temporal integra-tion of visual motion signals during perceptualdecision making The Journal of Neuroscience25(45) 16

Hung C P Kreiman G Poggio T amp DiCarlo J J(2005) Fast read-out of object identity frommacaque inferior temporal cortex Science 310863ndash866

Irwin D E Yantis S amp Jonides J (1983) Evidenceagainst visual integration across saccadic eyemovements Perception and Psychophysics 34(1)49ndash57

Johnson J S amp Olshausen B A (2003) Timecourseof neural signatures of object recognition Journalof Vision 3(7)4 499ndash512 httpwwwjournalofvisionorgcontent374 doi101167374 [PubMed] [Article]

Jonides J Irwin D E amp Yantis S (1982)Integrating visual information from successivefixations Science 215(4529) 192ndash194

Kirchner H amp Thorpe S J (2006) Ultra-rapid objectdetection with saccadic eye movements Visualprocessing speed revisited Vision Research 46(11)1762ndash1776 doi S0042-6989(05)00511-0 [pii] 101016jvisres200510002

La Heij W Dirkx J amp Kramer P (1990)Categorical interference and associative priming inpicture naming British Journal of Psychology81(4) 511ndash525

Liu H Agam Y Madsen J R amp Kreiman G(2009) Timing timing timing Fast decoding ofobject information from intracranial field poten-tials in human visual cortex Neuron 62(2) 281ndash290

Loftus G R amp Hanna A M (1989) The phenom-enology of spatial integration Data and modelsCognitive Psychology 21(3) 363ndash397 doi0010-0285(89)90013-3 [pii]

Logothetis N K amp Sheinberg D L (1996) Visualobject recognition Annual Review of Neuroscience19 577ndash621

Maunsell J H R (1987) Physiological evidence fortwo visual subsystems In L Vaina (Ed) Mattersof intelligence (pp 59ndash87) Dordrecht The Neth-erlands Reidel Press

Neri P Morrone M C amp Burr D C (1998) Seeingbiological motion Nature 395(6705) 894ndash896 doi10103827661

Nishida S (2004) Motion-based analysis of spatialpatterns by the human visual system CurrentBiology 14 830ndash839

OrsquoRegan J K amp Levy-Schoen A (1983) Integratingvisual information from successive fixations Doestrans-saccadic fusion exist Vision Research 23(8)765ndash768

Pasupathy A amp Connor C E (1999) Responses tocontour features in macaque area V4 Journal ofNeurophysiology 82(5) 2490ndash2502

Pelli D G (1997) The VideoToolbox software forvisual psychophysics Transforming numbers intomovies Spatial Vision 10 437ndash442

Potter M C amp Levy E (1969) Recognition memoryfor a rapid sequence of pictures Journal ofExperimental Psychology 81(1) 10ndash15

Richmond B J Optican L M amp Spitzer H (1990)Temporal encoding of two-dimensional patterns bysingle units in primate primary visual cortex IStimulus-response relations Journal of Neurophys-iology 64(2) 351ndash369

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 13

Riesenhuber M amp Poggio T (2000) Models of objectrecognition Nature Neuroscience 3(Suppl) 1199ndash1204

Ringach D L Hawken M J amp Shapley R (2003)Dynamics of orientation tuning in macaque V1The role of global and tuned suppression Journalof Neurophysiology 90(1) 342ndash352 doi101152jn010182002 010182002 [pii]

Rohenkohl G amp Nobre A C (2011) Alphaoscillations related to anticipatory attention followtemporal expectations The Journal of Neurosci-ence 31(40) 14076ndash14084

Rolls E T (1991) Neural organization of highervisual functions Current Opinion in Neurobiology1 274ndash278

Sanocki T (2001) Interaction of scale and time duringobject identification Journal of Experimental Psy-chology Human Perception and Performance 27(2)290ndash302

Scharlau I amp Neumann O (2003) Temporalparameters and time course of perceptual latencypriming Acta Psychologica 113(2) 185ndash203

Schmolesky M T Wang Y C Hanes D PThompson K G Leutgeb S Schall J D Leventhal A G (1998) Signal timing across themacaque visual system Journal of Neurophysiology79(6) 3272ndash3278

Schyns P G Petro L S amp Smith M L (2007)Dynamics of visual information integration in thebrain for categorizing facial expressions CurrentBiology 17(18) 1580ndash1585 doi S0960-9822(07)01904-5 [pii] 101016jcub200708048

Serre T Kreiman G Kouh M Cadieu CKnoblich U amp Poggio T (2007) A quantitativetheory of immediate visual recognition Progress inBrain Research 165C 33ndash56

Singer J M amp Sheinberg D L (2006) Holisticprocessing unites face parts across time VisionResearch 46(11) 1838ndash1847 doi S0042-6989(05)00563-8 [pii] 101016jvisres200511005

Sperling G (1960) The information available in briefvisual presentations Washington DC AmericanPsychological Association

Swets J A (1961) Is there a sensory thresholdScience 134(3473) 168ndash177

Tanaka K (1996) Inferotemporal cortex and objectvision Annual Review of Neuroscience 19 109ndash139

Thorpe S Fize D amp Marlot C (1996) Speed ofprocessing in the human visual system Nature 381520ndash522

Van Dijk H Schoffelen J M Oostenveld R ampJensen O (2008) Prestimulus oscillatory activity inthe alpha band predicts visual discriminationability The Journal of Neuroscience 28(8) 1816ndash1823

vanRullen R amp Thorpe S J (2002) Surfing a spikewave down the ventral stream Vision Research 422593ndash2615

Vogel E K Woodman G F amp Luck S J (2006)The time course of consolidation in visual workingmemory Journal of Experimental PsychologyHuman Perception and Performance 32(6) 1436ndash1451 doi1010370096-15233261436

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 14

  • Introduction
  • Methods
  • f01
  • f02
  • e01
  • e02
  • Results
  • f03
  • e03
  • f04
  • e04
  • f05
  • Discussion
  • Conclusions
  • Akaike1
  • Anaki1
  • Aspell1
  • Bair1
  • Brainard1
  • Brockmole1
  • Burnham1
  • Burr1
  • Caudek1
  • Cavanagh1
  • Cheadle1
  • Cheung1
  • Clifford1
  • Coltheart1
  • Connor1
  • Cornelissen1
  • DeBaene1
  • DeCarlo1
  • Deco1
  • DiLollo1
  • DiLollo2
  • DiCarlo2
  • Douglas1
  • Enns1
  • Eriksen1
  • Felleman1
  • Forget1
  • Fukushima1
  • Hogben1
  • Huk1
  • Hung1
  • Irwin1
  • Johnson1
  • Jonides1
  • Kirchner1
  • LaHeij1
  • Liu1
  • Loftus1
  • Logothetis1
  • Maunsell1
  • Neri1
  • Nishida1
  • ORegan1
  • Pasupathy1
  • Pelli1
  • Potter1
  • Richmond1
  • Riesenhuber1
  • Ringach1
  • Rohenkohl1
  • Rolls1
  • Sanocki1
  • Scharlau1
  • Schmolesky1
  • Schyns1
  • Serre1
  • Singer1
  • Sperling1
  • Swets1
  • Tanaka1
  • Thorpe1
  • VanDijk1
  • vanRullen1
  • Vogel1
Page 14: Short temporal asynchrony disrupts visual object recognition · Short temporal asynchrony disrupts visual object recognition Jedediah M. Singer $ Department of Ophthalmology, Boston

Riesenhuber M amp Poggio T (2000) Models of objectrecognition Nature Neuroscience 3(Suppl) 1199ndash1204

Ringach D L Hawken M J amp Shapley R (2003)Dynamics of orientation tuning in macaque V1The role of global and tuned suppression Journalof Neurophysiology 90(1) 342ndash352 doi101152jn010182002 010182002 [pii]

Rohenkohl G amp Nobre A C (2011) Alphaoscillations related to anticipatory attention followtemporal expectations The Journal of Neurosci-ence 31(40) 14076ndash14084

Rolls E T (1991) Neural organization of highervisual functions Current Opinion in Neurobiology1 274ndash278

Sanocki T (2001) Interaction of scale and time duringobject identification Journal of Experimental Psy-chology Human Perception and Performance 27(2)290ndash302

Scharlau I amp Neumann O (2003) Temporalparameters and time course of perceptual latencypriming Acta Psychologica 113(2) 185ndash203

Schmolesky M T Wang Y C Hanes D PThompson K G Leutgeb S Schall J D Leventhal A G (1998) Signal timing across themacaque visual system Journal of Neurophysiology79(6) 3272ndash3278

Schyns P G Petro L S amp Smith M L (2007)Dynamics of visual information integration in thebrain for categorizing facial expressions CurrentBiology 17(18) 1580ndash1585 doi S0960-9822(07)01904-5 [pii] 101016jcub200708048

Serre T Kreiman G Kouh M Cadieu CKnoblich U amp Poggio T (2007) A quantitativetheory of immediate visual recognition Progress inBrain Research 165C 33ndash56

Singer J M amp Sheinberg D L (2006) Holisticprocessing unites face parts across time VisionResearch 46(11) 1838ndash1847 doi S0042-6989(05)00563-8 [pii] 101016jvisres200511005

Sperling G (1960) The information available in briefvisual presentations Washington DC AmericanPsychological Association

Swets J A (1961) Is there a sensory thresholdScience 134(3473) 168ndash177

Tanaka K (1996) Inferotemporal cortex and objectvision Annual Review of Neuroscience 19 109ndash139

Thorpe S Fize D amp Marlot C (1996) Speed ofprocessing in the human visual system Nature 381520ndash522

Van Dijk H Schoffelen J M Oostenveld R ampJensen O (2008) Prestimulus oscillatory activity inthe alpha band predicts visual discriminationability The Journal of Neuroscience 28(8) 1816ndash1823

vanRullen R amp Thorpe S J (2002) Surfing a spikewave down the ventral stream Vision Research 422593ndash2615

Vogel E K Woodman G F amp Luck S J (2006)The time course of consolidation in visual workingmemory Journal of Experimental PsychologyHuman Perception and Performance 32(6) 1436ndash1451 doi1010370096-15233261436

Journal of Vision (2014) 14(5)7 1ndash14 Singer amp Kreiman 14

  • Introduction
  • Methods
  • f01
  • f02
  • e01
  • e02
  • Results
  • f03
  • e03
  • f04
  • e04
  • f05
  • Discussion
  • Conclusions
  • Akaike1
  • Anaki1
  • Aspell1
  • Bair1
  • Brainard1
  • Brockmole1
  • Burnham1
  • Burr1
  • Caudek1
  • Cavanagh1
  • Cheadle1
  • Cheung1
  • Clifford1
  • Coltheart1
  • Connor1
  • Cornelissen1
  • DeBaene1
  • DeCarlo1
  • Deco1
  • DiLollo1
  • DiLollo2
  • DiCarlo2
  • Douglas1
  • Enns1
  • Eriksen1
  • Felleman1
  • Forget1
  • Fukushima1
  • Hogben1
  • Huk1
  • Hung1
  • Irwin1
  • Johnson1
  • Jonides1
  • Kirchner1
  • LaHeij1
  • Liu1
  • Loftus1
  • Logothetis1
  • Maunsell1
  • Neri1
  • Nishida1
  • ORegan1
  • Pasupathy1
  • Pelli1
  • Potter1
  • Richmond1
  • Riesenhuber1
  • Ringach1
  • Rohenkohl1
  • Rolls1
  • Sanocki1
  • Scharlau1
  • Schmolesky1
  • Schyns1
  • Serre1
  • Singer1
  • Sperling1
  • Swets1
  • Tanaka1
  • Thorpe1
  • VanDijk1
  • vanRullen1
  • Vogel1