LETTER Communicated by Laurence Abbott

Synergy in a Neural Code

Naama Brenner
NEC Research Institute, Princeton, NJ 08540, U.S.A.

Steven P. Strong
Institute for Advanced Study, Princeton, NJ 08544, and NEC Research Institute, Princeton, NJ 08540, U.S.A.

Roland Koberle
Instituto de Física de São Carlos, Universidade de São Paulo, 13560-970 São Carlos, SP, Brasil, and NEC Research Institute, Princeton, NJ 08540, U.S.A.

William Bialek
Rob R. de Ruyter van Steveninck
NEC Research Institute, Princeton, NJ 08540, U.S.A.

We show that the information carried by compound events in neural spike trains (patterns of spikes across time or across a population of cells) can be measured, independent of assumptions about what these patterns might represent. By comparing the information carried by a compound pattern with the information carried independently by its parts, we directly measure the synergy among these parts. We illustrate the use of these methods by applying them to experiments on the motion-sensitive neuron H1 of the fly's visual system, where we confirm that two spikes close together in time carry far more than twice the information carried by a single spike. We analyze the sources of this synergy and provide evidence that pairs of spikes close together in time may be especially important patterns in the code of H1.

1 Introduction

Throughout the nervous system, information is encoded in sequences of identical action potentials or spikes. The representation of sense data by these spike trains has been studied for 70 years (Adrian, 1928), but there remain many open questions about the structure of this code. A full understanding of the code requires that we identify its elementary symbols and characterize the messages that these symbols represent. Many different possible elementary symbols have been considered, implicitly or explicitly, in previous work on neural coding. These include the numbers of spikes in time windows of fixed size and individual spikes themselves.

Neural Computation 12, 1531-1552 (2000) © 2000 Massachusetts Institute of Technology


In cells that produce bursts of action potentials, these bursts might be special symbols that convey information in addition to that carried by single spikes. Yet another possibility is that patterns of spikes, across time in one cell or across a population of cells, can have a special significance, a possibility that has received renewed attention as techniques emerge for recording the activity of many neurons simultaneously.

In many methods of analysis, questions about the symbolic structure of the code are mixed with questions about what the symbols represent. Thus, in trying to characterize the feature selectivity of neurons, one often makes the a priori choice to measure the neural response as the spike count or rate in a fixed window. Conversely, in trying to assess the significance of synchronous spikes from pairs of neurons, or bursts of spikes from a single neuron, one might search for a correlation between these events and some particular stimulus features. In each case, conclusions about one aspect of the code are limited by assumptions about another aspect. Here we show that questions about the symbolic structure of the neural code can be separated out and answered in an information-theoretic framework, using data from suitably designed experiments. This framework allows us to address directly the significance of spike patterns or other compound spiking events: How much information is carried by a compound event? Is there redundancy or synergy among the individual spikes? Are particular patterns of spikes especially informative?

Methods to assess the significance of spike patterns in the neural code share a common intuitive basis:

Patterns of spikes can play a role in representing stimuli if and only if the occurrence of patterns is linked to stimulus variations.

The patterns have a special role only if this correlation between sensory signals and patterns is not decomposable into separate correlations between the signals and the pieces of the pattern, such as the individual spikes.

We believe that these statements are not controversial. Difficulties arise when we try to quantify this intuitive picture: What is the correct measure of correlation? How much correlation is significant? Can we make statements independent of models and elaborate null hypotheses?

The central claim of this article is that many of these difficulties can be resolved using ideas from information theory. Shannon (1948) proved that entropy and information provide the only measures of variability and correlation that are consistent with simple and plausible requirements. Further, while it may be unclear how to interpret, for example, a 20% increase in correlation between spike trains, an extra bit of information carried by patterns of spikes means precisely that these patterns provide a factor-of-two increase in the ability of the system to distinguish among different sensory inputs.


In this work, we show that there is a direct method of measuring the information (in bits) carried by particular patterns of spikes under given stimulus conditions, independent of models for the stimulus features that these patterns might represent. In particular, we can compare the information conveyed by spike patterns with the information conveyed by the individual spikes that make up the pattern, and determine quantitatively whether the whole is more or less than the sum of its parts.

While this method allows us to compute unambiguously how much information is conveyed by the patterns, it does not tell us what particular message these patterns convey. Making the distinction between two issues, the symbolic structure of the code and the transformation between inputs and outputs, we address only the first of these two. Constructing a quantitative measure for the significance of compound patterns is an essential first step in understanding anything beyond the single-spike approximation, and it is especially crucial when complex multineuron codes are considered. Such a quantitative measure will be useful for the next stage of modeling the encoding algorithm, in particular as a control for the validity of models. This is a subject of ongoing work and will not be discussed here.

2 Information Conveyed by Compound Patterns

In the framework of information theory (Shannon, 1948), signals are generated by a source with a fixed probability distribution and encoded into messages by a channel. The coding is in general probabilistic, and the joint distribution of signals and coded messages determines all quantities of interest; in particular, the information transmitted by the channel about the source is an average over this joint distribution. In studying a sensory system, the signals generated by the source are the stimuli presented to the animal, and the messages in the communication channel are sequences of spikes in a neuron or a population of neurons. Both the stimuli and the spike trains are random variables, and they convey information mutually because they are correlated. The problem of quantifying this information has been discussed from several points of view (MacKay & McCulloch, 1952; Optican & Richmond, 1987; Rieke, Warland, de Ruyter van Steveninck, & Bialek, 1997; Strong, Koberle, de Ruyter van Steveninck, & Bialek, 1998). Here we address the question of how much information is carried by particular "events," or combinations of action potentials.

2.1 Background. A discrete event E in the spike train is defined as a specific combination of spikes. Examples are a single spike, a pair of spikes separated by a given time, spikes from two neurons that occur in synchrony, and so on. Information is carried by the occurrence of events at particular times and not others, implying that they are correlated with some stimulus features and not with others.


Our task is to express the information conveyed on average by an event E in terms of quantities that are easily measured experimentally.¹

In experiments, as in nature, the animal is exposed to stimuli at each instant of time. We can describe this sensory input by a function s(t'), which may have many components to parameterize the time dependence of multiple stimulus features. In general, the information gained about s(t') by observing a set of neural responses is

I = \sum_{\rm responses} \int Ds(t')\, P[s(t')\ \&\ {\rm response}] \log_2 \left( \frac{P[s(t')\ \&\ {\rm response}]}{P[s(t')]\, P[{\rm response}]} \right),   (2.1)

where information is measured in bits. It is useful to note that this mutual information can be rewritten in two complementary forms,

I = -\int Ds(t')\, P[s(t')] \log_2 P[s(t')] + \sum_{\rm responses} P[{\rm response}] \int Ds(t')\, P[s(t')\,|\,{\rm response}] \log_2 P[s(t')\,|\,{\rm response}]
  = S[P(s)] - \langle S[P(s\,|\,{\rm response})] \rangle_{\rm response},   (2.2)

or

I = -\sum_{\rm responses} P({\rm response}) \log_2 P({\rm response}) + \int Ds(t')\, P[s(t')] \sum_{\rm responses} P[{\rm response}\,|\,s(t')] \log_2 P[{\rm response}\,|\,s(t')]
  = S[P({\rm response})] - \langle S[P({\rm response}\,|\,s)] \rangle_s,   (2.3)

where S denotes the entropy of a distribution; by ⟨· · ·⟩_s we mean an average over all possible values of the sensory stimulus, weighted by their probabilities of occurrence, and similarly for ⟨· · ·⟩_responses. In the first form, equation 2.2, we focus on what the responses are telling us about the sensory stimulus (de Ruyter van Steveninck & Bialek, 1988): different responses point more or less reliably to different signals, and our average uncertainty about the sensory signal is reduced by observing the neural response. In the second form, equation 2.3, we focus on the variability and reproducibility of the neural response (de Ruyter van Steveninck, Lewen, Strong, Koberle, & Bialek, 1997; Strong et al., 1998).

¹ In our formulation, an event E is a random variable that can have several outcomes; for example, it can occur at different times in the experiment. The information conveyed by the event about the stimulus is an average over the joint distribution of stimuli and event outcomes. One can also associate an information measure with individual occurrences of events (DeWeese & Meister, 1998). The average of this measure over the possible outcomes is the mutual information as defined by Shannon (1948) and used in this article.


The range of possible responses provides the system with a capacity to transmit information, and the variability that remains when the sensory stimulus is specified constitutes noise; the difference between the capacity and the noise is the information.

We would like to apply this second form to the case where the neural response is a particular type of event. When we observe an event E, information is carried by the fact that it occurs at some particular time t_E. The range of possible responses is then the range of times 0 < t_E < T in our observation window. Alternatively, when we observe the response in a particular small time bin of size Δt, information is carried by the fact that the event E either occurs or does not. The range of possible responses then includes just two possibilities. Both of these points of view have an arbitrary element: the choice of bin size Δt and the window size T. Characterizing the properties of the system, as opposed to our observation power, requires taking the limit of high time resolution (Δt → 0) and long observation times (T → ∞). As will be shown in the next section, in this limit the two points of view give the same answer for the information carried by an event.

2.2 Information and Event Rates. A crucial role is played by the event rate r_E(t), the probability per unit time that an event of type E occurs at time t, given the stimulus history s(t'). Empirical construction of the event rate r_E(t) requires repetition of the same stimulus history many times, so that a histogram can be formed (see Figure 1). For the case where events are single spikes, this is the familiar time-dependent firing rate or poststimulus time histogram (see Figure 1c); the generalization to other types of events is illustrated by Figures 1d and 1e. Intuitively, a uniform event rate implies that no information is transmitted, whereas the presence of sharply defined features in the event rate implies that much information is transmitted by these events (see, for example, the discussion by Vaadia et al., 1995). We now formalize this intuition and show how the average information carried by a single event is related quantitatively to the time-dependent event rate.

Let us take the first point of view about the neural response variable, in which the range of responses is described by the possible arrival times t of the event.² What is the probability of finding the event at a particular time t? Before we know the stimulus history s(t'), all we can say is that the event can occur anywhere in our experimental window of size T, so that the probability is uniform, P(response) = P(t) = 1/T, with an entropy of S[P(t)] = log_2 T. Once we know the stimulus, we also know the event rate r_E(t), and so our uncertainty about the occurrence time of the event is reduced. Events will occur preferentially at times where the event rate is large, so the probability distribution should be proportional to r_E(t), with proper normalization, P(response | s) = P(t | s) = r_E(t)/(T r̄_E).

² We assume that the experiment is long enough so that a typical event is certain to occur at some time.


Figure 1: Generalized event rates in the stimulus-conditional response ensemble. A time-dependent visual stimulus is shown to the fly (a), with the time axis defined to be zero at the beginning of the stimulus. This stimulus runs for 10 s and is repeatedly presented 360 times. The responses of the H1 neuron to 60 repetitions are shown as a raster (b), in which each dot represents a single spike. From these responses, time-dependent event rates r_E(t) are estimated: (c) the firing rate (poststimulus time histogram), (d) the rate for spike pairs with interspike time τ = 3 ± 1 ms, and (e) for pairs with τ = 17 ± 1 ms. These rates allow us to compute directly the information transmitted by the events, using equation 2.5.


Then the conditional entropy is

S[P(t|s)] = -\int_0^T dt\, P(t|s) \log_2 P(t|s) = -\frac{1}{T}\int_0^T dt\, \frac{r_E(t)}{\bar{r}_E} \log_2\!\left(\frac{r_E(t)}{\bar{r}_E T}\right).   (2.4)

In principle, one should average this quantity over the distribution of stimuli s(t'); however, if the time T is long enough and the stimulus history sufficiently rich, the ensemble average and the time average are equivalent. The validity of this assumption can be checked experimentally (see Figure 2c). The reduction in entropy is then the gain in information, so

I(E; s) = S[P(t)] - S[P(t|s)] = \frac{1}{T}\int_0^T dt\, \frac{r_E(t)}{\bar{r}_E} \log_2\!\left(\frac{r_E(t)}{\bar{r}_E}\right),   (2.5)

or, equivalently,

I(E; s) = \left\langle \frac{r_E(t)}{\bar{r}_E} \log_2\!\left(\frac{r_E(t)}{\bar{r}_E}\right) \right\rangle_s,   (2.6)

where here the average is over all possible values of s, weighted by their probabilities P(s).
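As an illustration of how equation 2.5 can be evaluated in practice, the following is a minimal numerical sketch of the computation from binned event counts accumulated over repeated presentations of the same stimulus; the function name and interface are illustrative assumptions, not the code used in the original analysis.

    import numpy as np

    def info_per_event(event_counts):
        """Information per event, in bits: a direct discretization of equation 2.5.

        event_counts: 1D array with the number of events of type E observed in
        each time bin, summed over repeated presentations of the same stimulus.
        Only the ratio r_E(t) / r_E_bar enters, so bin size and trial count cancel.
        """
        n = np.asarray(event_counts, dtype=float)
        x = n / n.mean()                      # r_E(t) / time-averaged event rate
        pos = x > 0                           # empty bins contribute x log2 x -> 0
        return np.sum(x[pos] * np.log2(x[pos])) / x.size

For single spikes, the input is simply the poststimulus time histogram (equation 2.9 below); for a compound event, it is the histogram of occurrence times of that event.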

In the second view, the neural response is a binary random variable, σ_E ∈ {0, 1}, marking the occurrence or nonoccurrence of an event of type E in a small time bin of size Δt. Suppose for simplicity that the stimulus takes on a finite set of values s with probabilities P(s). These in turn induce the event E with probability p_E(s) = P(σ_E = 1 | s) = r_E(s)Δt, with an average probability for the occurrence of the event \bar{p}_E = \sum_s P(s)\, r_E(s)\,Δt = \bar{r}_E\,Δt. The information is the difference between the prior entropy and the conditional entropy, I(E; s) = S(s) - ⟨S(s | σ_E)⟩, where the conditional entropy is an average over the two possible values of σ_E. The conditional probabilities are found from Bayes' rule,

P(s\,|\,\sigma_E = 1) = \frac{P(s)\, p_E(s)}{\bar{p}_E}, \qquad P(s\,|\,\sigma_E = 0) = \frac{P(s)\,(1 - p_E(s))}{1 - \bar{p}_E},   (2.7)

and with these one finds the information,

I(E; s) = -\sum_s P(s) \log_2 P(s) + \sum_{\sigma_E = 0,1} P(\sigma_E) \sum_s P(s\,|\,\sigma_E) \log_2 P(s\,|\,\sigma_E)
        = \sum_s P(s) \left[ p_E(s) \log_2\!\left(\frac{p_E(s)}{\bar{p}_E}\right) + (1 - p_E(s)) \log_2\!\left(\frac{1 - p_E(s)}{1 - \bar{p}_E}\right) \right].   (2.8)


Taking the limit Δt → 0, consistent with the requirement that the event can occur at most once, one finds the average information conveyed in a small time bin; dividing by the average probability of an event, one obtains equation 2.6 as the information per event.
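For completeness, here is a brief sketch of that limit (a reasoning step added here, not spelled out in the text), writing p_E(s) = r_E(s)Δt and keeping terms to first order in Δt:

(1 - p_E(s)) \log_2 \frac{1 - p_E(s)}{1 - \bar{p}_E} = \frac{\bar{p}_E - p_E(s)}{\ln 2} + O(\Delta t^2),

and the sum of this contribution over s, weighted by P(s), vanishes by the definition of \bar{p}_E. The remaining term gives

I(E; s) \approx \sum_s P(s)\, p_E(s) \log_2 \frac{p_E(s)}{\bar{p}_E} = \bar{p}_E \left\langle \frac{r_E(s)}{\bar{r}_E} \log_2 \frac{r_E(s)}{\bar{r}_E} \right\rangle_s,

so dividing by the average event probability \bar{p}_E recovers equation 2.6.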

Equation 2.6 and its time-averaged form, equation 2.5, are exact formulas that are valid in any situation where a rich stimulus history can be presented repeatedly.


They enable the evaluation of the information for arbitrarily complex events, independent of assumptions about the encoding algorithm. The information is an average over the joint distribution of stimuli and responses, but rather than constructing this joint distribution explicitly and then averaging, equation 2.5 expresses the information as a direct empirical average: by estimating the function r_E(t) as a histogram, we are sampling the distribution of the responses given a stimulus, whereas by integrating over time we are sampling the distribution of stimuli. This formula is general and can be applied to different systems under different experimental conditions. The numerical result (measured in bits) will of course depend on the neural system as well as on the properties of the stimulus distribution. The error bars on the measurement of the information are affected by the finiteness of the data; for example, sampling must be sufficient to construct the rates r_E(t) reliably. These and other practical issues in using equation 2.5 are illustrated in detail in Figure 2.
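As one concrete illustration of these practical checks, the information can be computed at several bin sizes and extrapolated linearly to Δt → 0, mirroring the procedure summarized in Figure 2a; the sketch below is illustrative only (it reuses info_per_event from the sketch in section 2.2, and the particular bin sizes are assumptions).

    import numpy as np

    def info_vs_binsize(spike_times_per_trial, t_max, bin_sizes):
        """Single-spike information (equation 2.9) evaluated at several bin sizes."""
        estimates = []
        for dt in bin_sizes:
            edges = np.arange(0.0, t_max + dt, dt)
            counts = np.zeros(len(edges) - 1)
            for trial in spike_times_per_trial:       # pool spikes over repeats
                counts += np.histogram(trial, bins=edges)[0]
            estimates.append(info_per_event(counts))  # sketch from section 2.2
        return np.array(estimates)

    # Linear extrapolation of I(Delta t) to Delta t -> 0, as in Figure 2a:
    # bin_sizes = np.array([0.002, 0.004, 0.006, 0.008, 0.010])   # seconds, illustrative
    # slope, intercept = np.polyfit(bin_sizes, info_vs_binsize(raster, 10.0, bin_sizes), 1)
    # intercept approximates the information per spike at infinite time resolution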

2.3 The Special Case of Single Spikes. Let us consider in more detail the simple case where events are single spikes. The average information conveyed by a single spike becomes an integral over the time-dependent spike rate r(t),

I(1\ {\rm spike}; s) = \frac{1}{T}\int_0^T dt\, \frac{r(t)}{\bar{r}} \log_2\!\left(\frac{r(t)}{\bar{r}}\right).   (2.9)

It makes sense that the information carried by single spikes should be related to the spike rate, since this rate as a function of time gives a complete description of the "one-body" statistics of the spike train, in the same way that the single-particle density describes the one-body statistics of a gas or liquid.

Figure 2: Finite size effects in the estimation of the information conveyed by single spikes. (a) Information as a function of the bin size Δt used for computing the time-dependent rate r(t), from all 360 repetitions (circles) and from 100 of the repetitions (crosses). A linear extrapolation to the limit Δt → 0 is shown for the case where all repetitions were used (solid line). (b) Information as a function of the inverse number of repetitions N, for a fixed bin size Δt = 2 ms. (c) Systematic errors due to finite duration of the repeated stimulus s(t). The full 10-second length of the stimulus was subdivided into segments of duration T. Using equation 2.9, the information was calculated for each segment and plotted as a function of 1/T (circles). If the stimulus is sufficiently long that an ensemble average is well approximated by a time average, the convergence of the information to a stable value as T → ∞ should be observable. Further, the standard deviation of the information measured from different segments of length T (error bars) should decrease as a square-root law, σ ∝ 1/√T (dashed line). These data provide an empirical verification that, for the distribution used in this experiment, the repeated stimulus time is long enough to approximate ergodic sampling.


This is true independent of any models or assumptions, and we suggest the use of equations 2.9 and 2.5 to test directly, in any given experimental situation, whether many-body effects increase or reduce information transmission.

We note that considerable attention has been given to the problem of making accurate statistical models for spike trains. In particular, the simplest model, the modulated Poisson process, is often used as a standard against which real neurons are compared. It is tempting to think that Poisson behavior is equivalent to independent transmission of information by successive spikes, but this is not the case (see below). Of course, no real neuron is precisely Poisson, and it is not clear which deviations from Poisson behavior are significant. Rather than trying to fit statistical models to the data and then trying to understand the significance of spike train statistics for neural coding, we see that it is possible to quantify directly the information carried by single spikes and compound patterns. Thus, without reference to statistical models, we can determine whether successive spikes carry independent, redundant, or synergistic information.

Several previous works have noted the relation between spike rate and information, and the formula in equation 2.9 has an interesting history. For a spike train generated by a modulated Poisson process, equation 2.9 provides an upper bound on information transmission (per spike) by the spike train as a whole (Bialek, 1990). Even in this simple case, spikes do not necessarily carry independent information; slow modulations in the firing rate can cause them to be redundant. In studying the coding of location by cells in the rat hippocampus, Skaggs, McNaughton, Gothard, and Markus (1993) assumed that successive spikes carried independent information and that the spike rate was determined by the instantaneous location. They obtained equation 2.9, with the time average replaced by an average over locations. DeWeese (1996) showed that the rate of information transmission by a spike train could be expanded in a series of integrals over correlation functions, where successive terms would be small if the number of spikes per correlation time were small; the leading term, which would be exact if spikes were uncorrelated, is equation 2.9. Panzeri, Biella, Rolls, Skaggs, and Treves (1996) show that equation 2.9, multiplied by the mean spike rate to give an information rate (bits per second), is the correct information rate if we count spikes in very brief segments of the neural response, which is equivalent to asking for the information carried by single spikes. For further discussion of the relation to previous work, see appendix A.

A crucial point here is the generalization to equation 2.5: this result applies to the information content of any point events in the neural response (pairs of spikes with a certain separation, coincident spikes from two cells, and so on), not just single spikes.


In the analysis of experiments, we will emphasize the use of this formula as an exact result for the information content of single events, rather than as an approximate result for the spike train as a whole, and this approach will enable us to address questions concerning the structure of the code and the role played by various types of events.

3 Experiments in the Fly Visual System

In this section we use our formalism to analyze experiments on the movement-sensitive cell H1 in the visual system of the blowfly Calliphora vicina. We address the issue of the information conveyed by pairs of spikes in this neuron, as compared to the information conveyed independently by single spikes. The quantitative results of this section (information content in bits, effects of synergy and redundancy among spikes) are specific to this system and to the stimulus conditions used. The theoretical approach, however, is valid generally and can be applied similarly to other experimental systems to find out the significance of various patterns in single cells or in a population.

3.1 Synergy Between Spikes. In the experiment (see appendix B for details), the horizontal motion across the visual field is the input sensory stimulus s(t), drawn from a probability distribution P[s(t)]. The spike train recorded from H1 is the neural response. Figure 1a shows a segment of the stimulus presented to the fly, and Figure 1b illustrates the response to many repeated presentations of this segment. The histogram of spike times across the ensemble of repetitions provides an estimate of the spike rate r(t) (see Figure 1c), and equation 2.5 gives the information carried by a single spike, I(1 spike; s) = 1.53 ± 0.05 bits. Figure 2 illustrates the details of how the formula was used, with an emphasis on the effects of the finiteness of the data. In this experiment, a stimulus of length T = 10 s was repeated 360 times. As seen from Figure 2, the calculation converges to a stable result within our finite data set.

If each spike were to convey information independently, then with the mean spike rate r̄ = 37 spikes per second, the total information rate would be R_info = 56 bits per second. We used the variability and reproducibility of continuous segments in the neural response (de Ruyter van Steveninck et al., 1997; Strong et al., 1998) to estimate the total information rate in the spike train in this experiment and found that R_info = 75 bits per second. Thus, the information conveyed by the spike train as a whole is larger than the sum of contributions from individual spikes, indicating cooperative information transmission by patterns of spikes in time. This synergy among spikes motivates the search for especially informative patterns in the spike train.

We consider compound events that consist of two spikes separated by a time τ, with no constraints on what happens between them. Figure 1 shows segments of the event rate r_τ(t) for τ = 3 (±1) ms (see Figure 1d) and for τ = 17 (±1) ms (see Figure 1e).
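The following is a minimal sketch of how such a pair-event rate can be estimated from a spike raster and fed into equation 2.5; the function names, the simple histogram estimator, and the convention of marking the event at the time of its first spike are illustrative assumptions, not the procedure used in the original analysis.

    import numpy as np

    def pair_event_counts(spike_times_per_trial, tau, tol, t_max, dt):
        """Binned counts of pair events: a spike at time t followed by another
        spike at t + tau (within +/- tol), pooled over repeated trials."""
        n_bins = int(np.ceil(t_max / dt))
        counts = np.zeros(n_bins)
        for spikes in spike_times_per_trial:
            spikes = np.sort(np.asarray(spikes, dtype=float))
            for t in spikes:
                later = spikes[spikes > t]
                if np.any(np.abs(later - t - tau) <= tol):
                    idx = int(t // dt)
                    if idx < n_bins:
                        counts[idx] += 1      # event marked at its first spike
        return counts

    # Information per pair event, reusing info_per_event() from section 2.2:
    # I_pair = info_per_event(pair_event_counts(raster, tau=0.003, tol=0.001,
    #                                           t_max=10.0, dt=0.002))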


Figure 3: Information about the signal transmitted by pairs of spikes, computed from equation 2.5, as a function of the time separation between the two spikes. The dotted line shows the information that would be transmitted by the two spikes independently (twice the single-spike information).

The information carried by spike pairs as a function of the interspike time τ, computed from equation 2.5, is shown in Figure 3. For large τ, spikes contribute independent information, as expected. This independence is established within approximately 30 to 40 ms, comparable to the behavioral response times of the fly (Land & Collett, 1974). There is a mild redundancy (about 10-20%) at intermediate separations, and a very large synergy (up to about 130%) at small τ.

Related results were obtained using the correlation of spike patterns with stimulus features (de Ruyter van Steveninck & Bialek, 1988). There, the information carried by spike patterns was estimated from the distribution of stimuli given each pattern, thus constructing a statistical model of what the patterns "stand for" (see the details in appendix A). Since the time-dependent stimulus is in general of high dimensionality, its distribution cannot be sampled directly, and some approximations must be made: de Ruyter van Steveninck and Bialek (1988) made the approximation that patterns of a few spikes encode projections of the stimulus onto low-dimensional subspaces, and further that the conditional distributions P[s(t)|E] are gaussian. The information obtained with these approximations provides a lower bound to the true information carried by the patterns, as estimated directly with the methods presented here.


3.2 Origins of Synergy. Synergy means, quite literally, that two spikes together tell us more than two spikes separately. Synergistic coding is often discussed for populations of cells, where extra information is conveyed by patterns of coincident spikes from several neurons (Abeles, Bergmann, Margalit, & Vaadia, 1993; Hopfield, 1995; Meister, 1995; Singer & Gray, 1995). Here we see direct evidence for extra information in pairs of spikes across time. The mathematical framework for describing these effects is the same, and a natural question is: What are the conditions for synergistic coding?

The average synergy Syn[E_1, E_2; s] between two events, E_1 and E_2, is the difference between the information about the stimulus s conveyed by the pair and the information conveyed by the two events independently,

Syn[E_1, E_2; s] = I[E_1, E_2; s] - (I[E_1; s] + I[E_2; s]).   (3.1)

We can rewrite the synergy as

Syn[E_1, E_2; s] = I[E_1; E_2\,|\,s] - I[E_1; E_2].   (3.2)

The first term is the mutual information between the events, computed across an ensemble of repeated presentations of the same stimulus history. It describes the gain in information due to the locking of the compound event (E_1, E_2) to particular stimulus features. If events E_1 and E_2 are correlated individually with the stimulus but not with one another, this term will be zero, and these events cannot be synergistic on average. The second term is the mutual information between events when the stimulus is not constrained or, equivalently, the predictability of event E_2 from E_1. This predictability limits the capacity of E_2 to carry information beyond that already conveyed by E_1. Synergistic coding (Syn > 0) thus requires that the mutual information among the spikes is increased by specifying the stimulus, which makes precise the intuitive idea of "stimulus-dependent correlations" (Abeles et al., 1993; Hopfield, 1995; Meister, 1995; Singer & Gray, 1995).
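As a toy numerical check of this decomposition (added here for illustration; the helper functions and the random joint distribution are assumptions), equation 3.2 can be verified against the definition in equation 3.1 for any discrete joint distribution P(s, E_1, E_2):

    import numpy as np

    def mi(p_xy):
        """Mutual information (bits) of a joint distribution given as a 2D array."""
        p_xy = p_xy / p_xy.sum()
        px, py = p_xy.sum(1, keepdims=True), p_xy.sum(0, keepdims=True)
        mask = p_xy > 0
        return float(np.sum(p_xy[mask] * np.log2(p_xy[mask] / (px * py)[mask])))

    def synergy(p):
        """Equation 3.1 for a joint P(s, e1, e2) of shape (n_stimuli, 2, 2)."""
        p = p / p.sum()
        n_s = p.shape[0]
        return mi(p.reshape(n_s, 4)) - mi(p.sum(axis=2)) - mi(p.sum(axis=1))

    def synergy_decomposed(p):
        """Equation 3.2: I[E1; E2 | s] - I[E1; E2]."""
        p = p / p.sum()
        i_cond = sum(p[s].sum() * mi(p[s]) for s in range(p.shape[0]))
        return i_cond - mi(p.sum(axis=0))

    rng = np.random.default_rng(0)
    p = rng.random((4, 2, 2))                 # arbitrary joint distribution
    print(synergy(p), synergy_decomposed(p))  # the two numbers agree

The agreement is an exact identity (it follows from the chain rule for entropies), so the check holds for any joint distribution, not just this random example.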

Returning to our experimental example, we identify the events E_1 and E_2 as the arrivals of two spikes and consider the synergy as a function of the time τ between them. In terms of event rates, we compute the information carried by a pair of spikes separated by a time τ, equation 2.5, as well as the information carried by two individual spikes. The difference between these two quantities is the synergy between two spikes, which can be written as

Syn(\tau) = -\log_2\!\left(\frac{\bar{r}_\tau}{\bar{r}^2}\right) + \frac{1}{T}\int_0^T dt\, \frac{r_\tau(t)}{\bar{r}_\tau} \log_2\!\left[\frac{r_\tau(t)}{r(t)\, r(t-\tau)}\right] + \frac{1}{T}\int_0^T dt \left[\frac{r_\tau(t)}{\bar{r}_\tau} + \frac{r_\tau(t+\tau)}{\bar{r}_\tau} - \frac{2 r(t)}{\bar{r}}\right] \log_2[r(t)].   (3.3)

The first term in this equation is the logarithm of the normalized correlation function and hence measures the rarity of spike pairs with separation τ; the average of this term over t is the mutual information between events (the second term in equation 3.2).


The second term is related to the local correlation function and measures the extent to which the stimulus modulates the likelihood of spike pairs. The average of this term over t gives the mutual information conditional on knowledge of the stimulus (the first term in equation 3.2). The average of the third term over t is zero, and numerical evaluation of this term from the data shows that it is negligible at most values of τ.

We thus find that the synergy between spikes is approximately a sum of two terms whose averages over t are the terms in equation 3.2. A spike pair with a separation τ then has two types of contributions to the extra information it carries: the two spikes can be correlated conditional on the stimulus, or the pair could be a rare and thus surprising event. The rarity of brief pairs is related to neural refractoriness, but this effect alone is insufficient to enhance information transmission; the rare events must also be related reliably to the stimulus. In fact, conditional on the stimulus, the spikes in rare pairs are strongly correlated with each other, and this is visible in Figure 1a: from trial to trial, adjacent spikes jitter together as if connected by a stiff spring. To quantify this effect, we find for each spike in one trial the closest spike in successive trials and measure the variance of the arrival times of these spikes. Similarly, we measure the variance of the interspike times. Figure 4a shows the ratio of the interspike time variance to the sum of the arrival time variances of the spikes that make up the pair. For large separations this ratio is unity, as expected if spikes are locked independently to the stimulus, but as the two spikes come closer it falls below one-quarter.
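The variance-ratio measurement can be summarized in a few lines; the sketch below is a simplified version (the nearest-spike matching rule and the function names are assumptions, not the exact procedure of the original analysis).

    import numpy as np

    def match_nearest(reference_time, spike_times_per_trial):
        """For one reference spike time, pick the closest spike in every trial."""
        return np.array([np.asarray(trial)[np.argmin(np.abs(np.asarray(trial) - reference_time))]
                         for trial in spike_times_per_trial])

    def jitter_ratio(t1, t2):
        """var(interspike time) / [var(t1) + var(t2)] across repeated trials."""
        t1, t2 = np.asarray(t1, dtype=float), np.asarray(t2, dtype=float)
        return np.var(t2 - t1) / (np.var(t1) + np.var(t2))

    # For a pair observed at times (t_a, t_b) in a reference trial:
    # ratio = jitter_ratio(match_nearest(t_a, raster), match_nearest(t_b, raster))
    # ratio is near 1 for independently locked spikes and well below 1 for brief pairs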

Both the conditional correlation among the members of the pair (see Figure 4a) and the relative synergy (see Figure 4b) depend strongly on the interspike separation. This dependence is nearly invariant to changes in image contrast, although the spike rate and other statistical properties are strongly affected by such changes. Brief spike pairs seem to retain their identity as specially informative symbols over a range of input ensembles. If particular temporal patterns are especially informative, then we would lose information if we failed to distinguish among different patterns. Thus there are two notions of time resolution for spike pairs: the time resolution with which the interspike time is defined and the absolute time resolution with which the event is marked. Figure 5 shows that for small interspike times, the information is much more sensitive to changes in the interspike time resolution (open symbols) than to the absolute time resolution (filled symbols). This is related to the slope in Figure 2: in regions where the slope is large, events should be finely distinguished in order to retain the information.

3.3 Implications of Synergy. The importance of spike timing in the neural code has been under debate for some time now. We believe that some issues in this debate can be clarified using a direct information-theoretic approach.


Figure 4: (a) Ratio between the variance of the interspike time and the sum of variances of the two spike times. Variances are measured across repeated presentations of the same stimulus, as explained in the text. This ratio is plotted as a function of the interspike time τ for two experiments with different image contrast. (b) Extra information conveyed cooperatively by pairs of spikes, expressed as a fraction of the information conveyed by the two spikes independently. While the single-spike information varies with contrast (1.5 bits per spike for C = 0.1, compared to 1.3 bits per spike for C = 1), the fractional synergy is almost contrast independent.

Following MacKay and McCulloch (1952), we know that marking spike arrival times with higher resolution provides an increased capacity for information transmission. The work of Strong et al. (1998) shows that for the fly's H1 neuron, the increased capacity associated with spike timing indeed is used with nearly constant efficiency down to millisecond resolution. This efficiency can be the result of a tight locking of individual spikes to a rapidly varying stimulus, and it could also be the result of temporal patterns providing information beyond rapid rate modulations.


Figure 5: Information conveyed by spike pairs as a function of time resolution. An event (a pair of spikes) can be described by two times: the separation between spikes (relative time) and the occurrence time of the event with respect to the stimulus (absolute time). The information carried by the pair depends on the time resolution in these two dimensions, both specified by the bin size Δt along the abscissa. Open symbols are measurements of the information for a fixed 2 ms resolution of absolute time and a variable resolution Δt of relative time. Closed symbols correspond to a fixed 2 ms resolution of relative time and a variable resolution Δt of absolute time. For short intervals, the sensitivity to coarsening of the relative time resolution is much greater than to coarsening of the absolute time resolution. In contrast, sensitivity to relative and absolute time resolutions is the same for the longer, nonsynergistic interspike separations.

The analysis given here shows that for H1, pairs of spikes can provide much more information than two individual spikes, and information transmission is much more sensitive to the relative timing of spikes than to their absolute timing. The synergy is an inherent property of particular spike pairs, as it persists when averaged over all occurrences of the pair in the experiment.

One may ask about the implications of synergy for behavior. Flies can respond to changes in a motion stimulus over a time scale of approximately 30 to 40 ms (Land & Collett, 1974). We have found that the spike train of H1 as a whole conveyed about 20 bits per second more than the information conveyed independently by single spikes (see Section 3.1). This implies that over a behavioral response time, synergistic effects provide about 0.8 bit of additional information, equivalent to almost a factor of two in resolving power for distinguishing different trajectories of motion across the visual field.


4 Summary

Information theory allows us to measure synergy and redundancy among spikes, independent of the rules for translating between spikes and stimuli. In particular, this approach tests directly the idea that patterns of spikes are special events in the code, carrying more information than expected by adding the contributions from individual spikes. It is of practical importance that the formulas rely on low-order statistical measures of the neural response and hence do not require enormous data sets to reach meaningful conclusions. The method is of general validity and is applicable to patterns of spikes across a population of neurons as well as across time. Specific results (existence of synergy or redundancy, number of bits per spike, etc.) depend on the neural system as well as on the stimulus ensemble used in each experiment.

In our experiments on the fly visual system, we found that an event composed of a pair of spikes can carry far more than the information carried independently by its parts. Two spikes that occur in rapid succession appear to be special symbols that have an integrity beyond the locking of individual spikes to the stimulus. This is analogous to the encoding of sounds in written English: each of the compound symbols th, sh, and ch stands for sounds that are not decomposable into sounds represented by each of the constituent letters. For spike pairs to act effectively as special symbols, mechanisms for "reading" them must exist at subsequent levels of processing. Synaptic transmission is sensitive to interspike times in the 2-20 ms range (Magleby, 1987), and it is natural to suggest that synaptic mechanisms on this time scale play a role in such reading. Recent work on the mammalian visual system (Usrey, Reppas, & Reid, 1998) provides direct evidence that pairs of spikes close together in time can be especially efficient in driving postsynaptic neurons.

Appendix A Relation to Previous Work

Patterns of spikes and their relation to sensory stimuli have been quantified in the past using correlation functions. The event rates that we have defined here, which are connected directly to the information carried by patterns of spikes through equation 2.5, are just properly normalized correlation functions. The event rate for pairs of spikes from two separate neurons is related to the joint poststimulus time histogram (JPSTH) defined by Aertsen, Gerstein, Habib, and Palm (1989; Vaadia et al., 1995). Making this connection explicit is also an opportunity to see how the present formalism applies to events defined across two cells.

Consider two cells, A and B, generating spikes at times {t_i^A} and {t_i^B}, respectively. It will be useful to think of the spike trains as sums of unit impulses at the spike times,

\rho_A(t) = \sum_i \delta(t - t_i^A),   (A.1)

\rho_B(t) = \sum_i \delta(t - t_i^B).   (A.2)

Then the time-dependent spike rates for the two cells are

r_A(t) = \langle \rho_A(t) \rangle_{\rm trials},   (A.3)

r_B(t) = \langle \rho_B(t) \rangle_{\rm trials},   (A.4)

where ⟨· · ·⟩_trials denotes an average over multiple trials in which the same time-dependent stimulus s(t') or the same motor behavior is observed. These spike rates are the probabilities per unit time for the occurrence of a single spike in either cell A or cell B, also called the PSTH. We can define the probability per unit time for a spike in cell A to occur at time t and a spike in cell B to occur at time t', and this will be the joint PSTH,

{\rm JPSTH}_{AB}(t, t') = \langle \rho_A(t)\, \rho_B(t') \rangle_{\rm trials}.   (A.5)

Alternatively, we can consider an event E defined by a spike in cell A at time t and a spike in cell B at time t - τ, with the relative time τ measured with a precision of Δt. Then the rate of these events is

r_E(t) = \int_{-\Delta t/2}^{\Delta t/2} dt'\, {\rm JPSTH}_{AB}(t, t - \tau + t')   (A.6)
       \approx \Delta t\, {\rm JPSTH}_{AB}(t, t - \tau),   (A.7)

where the last approximation is valid if our time resolution is sufficiently high. Applying our general formula for the information carried by single events, equation 2.5, the information carried by pairs of spikes from two cells can be written as an integral over diagonal "strips" of the JPSTH matrix,

I(E; s) = \frac{1}{T}\int_0^T dt\, \frac{{\rm JPSTH}_{AB}(t, t-\tau)}{\langle {\rm JPSTH}_{AB}(t, t-\tau) \rangle_t} \log_2\!\left[\frac{{\rm JPSTH}_{AB}(t, t-\tau)}{\langle {\rm JPSTH}_{AB}(t, t-\tau) \rangle_t}\right],   (A.8)

where ⟨JPSTH_{AB}(t, t-τ)⟩_t is an average of the JPSTH over time; this average is equivalent to the standard correlation function between the two spike trains.
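A compact sketch of this two-cell computation from binned spike trains is given below; the binary-binning estimator of the JPSTH diagonal strip and the function name are assumptions for illustration, not the authors' procedure.

    import numpy as np

    def two_cell_pair_info(binary_a, binary_b, lag_bins):
        """Information (bits) carried by the event 'spike in A at t, spike in B
        at t - tau', a discretization of equation A.8 with tau = lag_bins * dt.

        binary_a, binary_b: arrays of shape (n_trials, n_bins) of 0/1 spike counts.
        """
        a = np.asarray(binary_a, dtype=float)
        b = np.asarray(binary_b, dtype=float)
        if lag_bins > 0:                       # align B at t - tau with A at t
            a, b = a[:, lag_bins:], b[:, :-lag_bins]
        strip = (a * b).mean(axis=0)           # diagonal strip of the JPSTH
        x = strip / strip.mean()               # normalize by its time average
        pos = x > 0
        return np.sum(x[pos] * np.log2(x[pos])) / x.size

For synchronous spikes (τ = 0), pass lag_bins = 0; comparing the result with the single-cell values from equation 2.9 gives the synergy of coincident firing discussed below.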

The discussion by Vaadia et al. (1995) emphasizes that modulations of the JPSTH along the diagonal strips allow correlated firing events to convey information about sensory signals or behavioral states, and this information is quantified by equation A.8.


The information carried by the individual cells is related to the corresponding integrals over spike rates, equation 2.9. The difference between the information conveyed by the compound spiking events E and the information conveyed by spikes in the two cells independently is precisely the synergy between the two cells at the given time lag τ. For τ = 0, it is the synergy, or extra information, conveyed by synchronous firing of the two cells.

We would like to connect this approach with previous work that focused on how events reduce our uncertainty about the stimulus (de Ruyter van Steveninck & Bialek, 1988). Before we observe the neural response, all we know is that stimuli are chosen from a distribution P[s(t')]. When we observe an event E at time t_E, this should tell us something about the stimulus in the neighborhood of this time, and this knowledge is described by the conditional distribution P[s(t') | t_E]. If we go back to the definition of the mutual information between responses and stimuli, we can write the average information conveyed by one event in terms of this conditional distribution,

I(E; s) = \int Ds(t') \int dt_E\, P[s(t'), t_E] \log_2\!\left(\frac{P[s(t'), t_E]}{P[s(t')]\, P[t_E]}\right)
        = \int dt_E\, P[t_E] \int Ds(t')\, P[s(t')\,|\,t_E] \log_2\!\left(\frac{P[s(t')\,|\,t_E]}{P[s(t')]}\right).   (A.9)

If the system is stationary, then the coding should be invariant under time translations,

P[s(t')\,|\,t_E] = P[s(t' + \Delta t')\,|\,t_E + \Delta t'].   (A.10)

This invariance means that the integral over stimuli in equation A.9 is independent of the event arrival time t_E, so we can simplify our expression for the information carried by a single event,

I(E; s) = \int dt_E\, P[t_E] \int Ds(t')\, P[s(t')\,|\,t_E] \log_2\!\left(\frac{P[s(t')\,|\,t_E]}{P[s(t')]}\right)
        = \int Ds(t')\, P[s(t')\,|\,t_E] \log_2\!\left(\frac{P[s(t')\,|\,t_E]}{P[s(t')]}\right).   (A.11)

This formula was used by de Ruyter van Steveninck and Bialek (1988). To connect with our work here, we express the information in equation A.11 as an average over the stimulus,

I(E; s) = \left\langle \frac{P[s(t')\,|\,t_E]}{P[s(t')]} \log_2\!\left(\frac{P[s(t')\,|\,t_E]}{P[s(t')]}\right) \right\rangle_s.   (A.12)

Using Bayes' rule,

\frac{P[s(t')\,|\,t_E]}{P[s(t')]} = \frac{P[t_E\,|\,s(t')]}{P[t_E]} = \frac{r_E(t_E)}{\bar{r}_E},   (A.13)


where the last equality follows because the distribution of event arrival times is proportional to the event rate, as defined above. Substituting this back into equation A.11, one finds equation 2.6.

Appendix B Experimental Setup

In the experiment we used a female blowfly, which was a first-generation offspring of a wild fly caught outside. The fly was put inside a plastic tube and immobilized with wax, with the head protruding out. The proboscis was left free so that the fly could be fed regularly with some sugar water. A small hole was cut in the back of the head, close to the midline on the right side. Through this hole, a tungsten electrode was advanced into the lobula plate. This area, which is several layers back from the compound eye, includes a group of large motion-detector neurons with wide receptive fields and strong direction selectivity. We recorded spikes extracellularly from one of these, the contralateral H1 neuron (Franceschini, Riehle, & le Nestour, 1989; Hausen & Egelhaaf, 1989). The electrode was positioned such that spikes from H1 could be discriminated reliably and converted into pulses by a simple threshold discriminator. The pulses were fed into a CED 1401 interface (Cambridge Electronic Design), which digitized the spikes at 10 μs resolution. To keep exact synchrony over the duration of the experiment, the spike timing clock was derived from the same internal CED 1401 clock that defined the frame times of the visual stimulus.

The stimulus was a rigidly moving bar pattern, displayed on a Tektronix 608 high-brightness display. The radiance at average intensity Ī was about 180 mW/(m² × sr), which amounts to about 5 × 10⁴ effectively transduced photons per photoreceptor per second (Dubs, Laughlin, & Srinivasan, 1984). The bars were oriented vertically, with intensities chosen at random to be Ī(1 ± C), where C is the contrast. The distance between the fly and the screen was adjusted so that the angular subtense of a bar equaled the horizontal interommatidial angle in the stimulated part of the compound eye. This setting was found by determining the eye's spatial Nyquist frequency through the reverse reaction (Götz, 1964) in the response of the H1 neuron. For this fly, the horizontal interommatidial angle was 1.45 degrees, and the distance to the screen was 105 mm. The fly viewed the display through a round 80 mm diameter diaphragm, showing approximately 30 bars. From this we estimate the number of stimulated ommatidia in the eye's hexagonal raster to be about 612.

Frames of the stimulus pattern were refreshed every 2 ms, and with each new frame the pattern was displayed at a new position. This resulted in an apparent horizontal motion of the bar pattern, which is suitable to excite the H1 neuron. The pattern position was defined by a pseudorandom sequence, simulating independent random displacements at each time step, uniformly distributed between -0.47 degrees and +0.47 degrees (equivalent to -0.32 to +0.32 omm, horizontal ommatidial spacings).


This corresponds to a diffusion constant of 18.1 (°)² per second, or 8.6 omm² per second. The sequence of pseudorandom numbers contained a repeating part and a nonrepeating part, each 10 seconds long, with the same statistical parameters. Thus, in each 20-second cycle, the fly saw a 10-second movie that it had seen 20 seconds before, followed by a 10-second movie that was generated independently.
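As a consistency check (added here; the one-dimensional random-walk convention ⟨x²⟩ = 2Dt is our assumption), the quoted diffusion constant follows from the step statistics: a displacement uniform on [-0.47°, +0.47°] every Δt = 2 ms has variance δ²/3 per step, so

D = \frac{(0.47^\circ)^2 / 3}{2 \times 0.002\ \mathrm{s}} \approx 18\ (^\circ)^2/\mathrm{s}, \qquad \frac{(0.32\ \mathrm{omm})^2 / 3}{2 \times 0.002\ \mathrm{s}} \approx 8.5\ \mathrm{omm}^2/\mathrm{s},

in agreement with the values above.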

Acknowledgments

We thank G. Lewen and A. Schweitzer for their help with the experiments and N. Tishby for many helpful discussions. Work at the IAS was supported in part by DOE grant DE-FG02-90ER40542, and work at the IFSC was supported by the Brazilian agencies FAPESP and CNPq.

References

Abeles, M., Bergmann, H., Margalit, E., & Vaadia, E. (1993). Spatiotemporal firing patterns in the frontal cortex of behaving monkeys. J. Neurophysiol., 70, 1629-1638.

Adrian, E. (1928). The Basis of Sensation. London: Christophers.

Aertsen, A., Gerstein, G., Habib, M., & Palm, G. (1989). Dynamics of neuronal firing correlation: Modulation of "effective connectivity." J. Neurophysiol., 61(5), 900-917.

Bialek, W. (1990). Theoretical physics meets experimental neurobiology. In E. Jen (Ed.), 1989 Lectures in Complex Systems (pp. 513-595). Menlo Park, CA: Addison-Wesley.

de Ruyter van Steveninck, R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Coding and information transfer in short spike sequences. Proc. R. Soc. Lond. Ser. B, 234, 379.

de Ruyter van Steveninck, R. R., Lewen, G., Strong, S., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805.

DeWeese, M. (1996). Optimization principles for the neural code. Network, 7, 325-331.

DeWeese, M., & Meister, M. (1998). How to measure information gained from one symbol. APS Abstracts.

Dubs, A., Laughlin, S., & Srinivasan, M. (1984). Single photon signals in fly photoreceptors and first order interneurones at behavioural threshold. J. Physiol., 317, 317-334.

Franceschini, N., Riehle, A., & le Nestour, A. (1989). Directionally selective motion detection by insect neurons. In D. Stavenga & R. Hardie (Eds.), Facets of Vision (p. 360). Berlin: Springer-Verlag.

Götz, K. (1964). Optomotorische Untersuchung des visuellen Systems einiger Augenmutanten der Fruchtfliege Drosophila. Kybernetik, 2, 77-92.

Hausen, K., & Egelhaaf, M. (1989). Neural mechanisms of visual course control in insects. In D. Stavenga & R. Hardie (Eds.), Facets of Vision (p. 391). Springer-Verlag.

Hopfield, J. J. (1995). Pattern recognition computation using action potential timing for stimulus representation. Nature, 376, 33-36.

Land, M. F., & Collett, T. S. (1974). Chasing behavior of houseflies (Fannia canicularis): A description and analysis. J. Comp. Physiol., 89, 331-357.

MacKay, D., & McCulloch, W. S. (1952). The limiting information capacity of a neuronal link. Bull. Math. Biophys., 14, 127-135.

Magleby, K. (1987). Short-term synaptic plasticity. In G. M. Edelman, W. E. Gall, & W. M. Cowan (Eds.), Synaptic Function (pp. 21-56). New York: Wiley.

Meister, M. (1995). Multineuronal codes in retinal signaling. Proc. Nat. Acad. Sci. (USA), 93, 609-614.

Optican, L. M., & Richmond, B. J. (1987). Temporal encoding of two-dimensional patterns by single units in primate inferior temporal cortex. III. Information theoretic analysis. J. Neurophys., 57, 162-178.

Panzeri, S., Biella, G., Rolls, E. T., Skaggs, W. E., & Treves, A. (1996). Speed, noise, information and the graded nature of neuronal responses. Network, 7, 365-370.

Rieke, F., Warland, D., de Ruyter van Steveninck, R., & Bialek, W. (1997). Spikes: Exploring the Neural Code. Cambridge, MA: MIT Press.

Shannon, C. E. (1948). A mathematical theory of communication. Bell Sys. Tech. J., 27, 379-423, 623-656.

Singer, W., & Gray, C. M. (1995). Visual feature integration and the temporal correlation hypothesis. Ann. Rev. Neurosci., 18, 555-586.

Skaggs, W. E., McNaughton, B. L., Gothard, K. M., & Markus, E. J. (1993). An information-theoretic approach to deciphering the hippocampal code. In S. J. Hanson, J. D. Cowan, & C. L. Giles (Eds.), Advances in Neural Information Processing Systems, 5 (pp. 1030-1037). San Mateo, CA: Morgan Kaufmann.

Strong, S. P., Koberle, R., de Ruyter van Steveninck, R. R., & Bialek, W. (1998). Entropy and information in neural spike trains. Phys. Rev. Lett., 80, 197-200.

Usrey, W. M., Reppas, J. B., & Reid, R. C. (1998). Paired-spike interactions and synaptic efficiency of retinal inputs to the thalamus. Nature, 395, 384.

Vaadia, E., Haalman, I., Abeles, M., Bergman, H., Prut, Y., Slovin, H., & Aertsen, A. (1995). Dynamics of neural interactions in monkey cortex in relation to behavioural events. Nature, 373, 515-518.

Received February 22, 1999; accepted May 20, 1999.

Page 2: Synergy in a Neural Code - Princeton Universitywbialek/our_papers/brenner+al_00b.pdfSynergy in a Neural Code 1533 of measuring the information (in bits) carried by particular patterns

1532 Brenner Strong Koberle Bialek and de Ruyter van Steveninck

produce bursts of action potentials these bursts might be special symbolsthat convey information in addition to that carried by single spikes Yet an-other possibility is that patterns of spikesmdashacross time in one cell or acrossa population of cellsmdashcan have a special signicance a possibility that hasreceived renewed attention as techniques emerge for recording the activityof many neurons simultaneously

In many methods of analysis questions about the symbolic structure ofthe code are mixed with questions about what the symbols represent Thusin trying to characterize the feature selectivity of neurons one often makesthe a priori choice to measure the neural response as the spike count orrate in a xed window Conversely in trying to assess the signicance ofsynchronous spikes from pairs of neurons or bursts of spikes from a sin-gle neuron one might search for a correlation between these events andsome particular stimulus features In each case conclusions about one as-pect of the code are limited by assumptions about another aspect Here weshow that questions about the symbolic structure of the neural code can beseparated out and answered in an information-theoretic framework usingdata from suitably designed experiments This framework allows us to ad-dress directly the signicance of spike patterns or other compound spikingevents How much information is carried by a compound event Is there re-dundancy or synergy among the individual spikes Are particular patternsof spikes especially informative

Methods to assess the signicance of spike patterns in the neural codeshare a common intuitive basis

Patterns of spikes can play a role in representing stimuli if and only ifthe occurrence of patterns is linked to stimulus variations

The patterns have a special role only if this correlation between sensorysignals and patterns is not decomposable into separate correlationsbetween the signals and the pieces of the pattern such as the individualspikes

We believe that these statements are not controversial Difculties arisewhen we try to quantify this intuitive picture What is the correct measure ofcorrelation How much correlation is signicant Can we make statementsindependent of models and elaborate null hypotheses

The central claim of this article is that many of these difculties canbe resolved using ideas from information theory Shannon (1948) provedthat entropy and information provide the only measures of variability andcorrelation that are consistent with simple and plausible requirements Fur-ther while it may be unclear how to interpret for example a 20 increasein correlation between spike trains an extra bit of information carried bypatterns of spikes means precisely that these patterns provide a factor-of-two increase in the ability of the system to distinguish among differ-ent sensory inputs In this work we show that there is a direct method

Synergy in a Neural Code 1533

of measuring the information (in bits) carried by particular patterns ofspikes under given stimulus conditions independent of models for thestimulus features that these patterns might represent In particular we cancompare the information conveyed by spike patterns with the informationconveyed by the individual spikes that make up the pattern and deter-mine quantitatively whether the whole is more or less than the sum of itsparts

While this method allows us to compute unambiguously how much in-formation is conveyed by the patterns it does not tell us what particularmessage these patterns convey Making the distinction between two issuesthe symbolic structure of the code and the transformation between inputsand outputs we address only the rst of these two Constructing a quanti-tative measure for the signicance of compound patterns is an essential rststep in understanding anything beyond the single spike approximation andit is especially crucial when complex multi-neuron codes are consideredSuch a quantitative measure will be useful for the next stage of modelingthe encoding algorithm in particular as a control for the validity of modelsThis is a subject of ongoing work and will not be discussed here

2 Information Conveyed by Compound Patterns

In the framework of information theory (Shannon, 1948), signals are generated by a source with a fixed probability distribution and encoded into messages by a channel. The coding is in general probabilistic, and the joint distribution of signals and coded messages determines all quantities of interest; in particular, the information transmitted by the channel about the source is an average over this joint distribution. In studying a sensory system, the signals generated by the source are the stimuli presented to the animal, and the messages in the communication channel are sequences of spikes in a neuron or a population of neurons. Both the stimuli and the spike trains are random variables, and they convey information mutually because they are correlated. The problem of quantifying this information has been discussed from several points of view (MacKay & McCulloch, 1952; Optican & Richmond, 1987; Rieke, Warland, de Ruyter van Steveninck, & Bialek, 1997; Strong, Koberle, de Ruyter van Steveninck, & Bialek, 1998). Here we address the question of how much information is carried by particular "events," or combinations of action potentials.

2.1 Background. A discrete event E in the spike train is defined as a specific combination of spikes. Examples are a single spike, a pair of spikes separated by a given time, spikes from two neurons that occur in synchrony, and so on. Information is carried by the occurrence of events at particular times and not others, implying that they are correlated with some stimulus features and not with others. Our task is to express the information conveyed on average by an event E in terms of quantities that are easily measured experimentally.¹

In experiments, as in nature, the animal is exposed to stimuli at each instant of time. We can describe this sensory input by a function s(t′), which may have many components to parameterize the time dependence of multiple stimulus features. In general, the information gained about s(t′) by observing a set of neural responses is

I = \sum_{\rm responses} \int Ds(t') \, P[s(t') \,\&\, {\rm response}] \, \log_2 \left( \frac{P[s(t') \,\&\, {\rm response}]}{P[s(t')] \, P[{\rm response}]} \right),   (2.1)

where information is measured in bits. It is useful to note that this mutual information can be rewritten in two complementary forms:

I = - \int Ds(t') \, P[s(t')] \log_2 P[s(t')]
    + \sum_{\rm responses} P[{\rm response}] \int Ds(t') \, P[s(t')\,|\,{\rm response}] \log_2 P[s(t')\,|\,{\rm response}]
  = S[P(s)] - \langle S[P(s\,|\,{\rm response})] \rangle_{\rm response},   (2.2)

or

I = - \sum_{\rm responses} P({\rm response}) \log_2 P({\rm response})
    + \int Ds(t') \, P[s(t')] \sum_{\rm responses} P[{\rm response}\,|\,s(t')] \log_2 P[{\rm response}\,|\,s(t')]
  = S[P({\rm response})] - \langle S[P({\rm response}\,|\,s)] \rangle_s,   (2.3)

where S denotes the entropy of a distribution; by ⟨···⟩_s we mean an average over all possible values of the sensory stimulus, weighted by their probabilities of occurrence, and similarly for ⟨···⟩_responses. In the first form, equation 2.2, we focus on what the responses are telling us about the sensory stimulus (de Ruyter van Steveninck & Bialek, 1988): different responses point more or less reliably to different signals, and our average uncertainty about the sensory signal is reduced by observing the neural response. In the second form, equation 2.3, we focus on the variability and reproducibility of the neural response (de Ruyter van Steveninck, Lewen, Strong, Koberle, & Bialek, 1997; Strong et al., 1998). The range of possible responses provides the system with a capacity to transmit information, and the variability that remains when the sensory stimulus is specified constitutes noise; the difference between the capacity and the noise is the information.

¹ In our formulation, an event E is a random variable that can have several outcomes; for example, it can occur at different times in the experiment. The information conveyed by the event about the stimulus is an average over the joint distribution of stimuli and event outcomes. One can also associate an information measure with individual occurrences of events (DeWeese & Meister, 1998). The average of this measure over the possible outcomes is the mutual information as defined by Shannon (1948) and used in this article.

We would like to apply this second form to the case where the neural response is a particular type of event. When we observe an event E, information is carried by the fact that it occurs at some particular time t_E. The range of possible responses is then the range of times 0 < t_E < T in our observation window. Alternatively, when we observe the response in a particular small time bin of size Δt, information is carried by the fact that the event E either occurs or does not. The range of possible responses then includes just two possibilities. Both of these points of view have an arbitrary element: the choice of bin size Δt and the window size T. Characterizing the properties of the system, as opposed to our observation power, requires taking the limit of high time resolution (Δt → 0) and long observation times (T → ∞). As will be shown in the next section, in this limit the two points of view give the same answer for the information carried by an event.

2.2 Information and Event Rates. A crucial role is played by the event rate r_E(t), the probability per unit time that an event of type E occurs at time t, given the stimulus history s(t′). Empirical construction of the event rate r_E(t) requires repetition of the same stimulus history many times, so that a histogram can be formed (see Figure 1). For the case where events are single spikes, this is the familiar time-dependent firing rate, or poststimulus time histogram (see Figure 1c); the generalization to other types of events is illustrated by Figures 1d and 1e. Intuitively, a uniform event rate implies that no information is transmitted, whereas the presence of sharply defined features in the event rate implies that much information is transmitted by these events (see, for example, the discussion by Vaadia et al., 1995). We now formalize this intuition and show how the average information carried by a single event is related quantitatively to the time-dependent event rate.

Figure 1: Generalized event rates in the stimulus-conditional response ensemble. A time-dependent visual stimulus is shown to the fly (a), with the time axis defined to be zero at the beginning of the stimulus. This stimulus runs for 10 s and is repeatedly presented 360 times. The responses of the H1 neuron to 60 repetitions are shown as a raster (b), in which each dot represents a single spike. From these responses, time-dependent event rates r_E(t) are estimated: (c) the firing rate (poststimulus time histogram), (d) the rate for spike pairs with interspike time τ = 3 ± 1 ms, and (e) for pairs with τ = 17 ± 1 ms. These rates allow us to compute directly the information transmitted by the events, using equation 2.5.

² We assume that the experiment is long enough so that a typical event is certain to occur at some time.

Let us take the first point of view about the neural response variable, in which the range of responses is described by the possible arrival times t of the event.² What is the probability of finding the event at a particular time t? Before we know the stimulus history s(t′), all we can say is that the event can occur anywhere in our experimental window of size T, so that the probability is uniform, P(response) = P(t) = 1/T, with an entropy of S[P(t)] = log₂ T. Once we know the stimulus, we also know the event rate r_E(t), and so our uncertainty about the occurrence time of the event is reduced. Events will occur preferentially at times where the event rate is large, so the probability distribution should be proportional to r_E(t), with proper normalization, P(response|s) = P(t|s) = r_E(t)/(T \bar{r}_E). Then the conditional entropy is

S[P(t|s)] = - \int_0^T dt \, P(t|s) \log_2 P(t|s)
          = - \frac{1}{T} \int_0^T dt \, \frac{r_E(t)}{\bar{r}_E} \log_2 \left( \frac{r_E(t)}{\bar{r}_E T} \right).   (2.4)

In principle, one should average this quantity over the distribution of stimuli s(t′); however, if the time T is long enough and the stimulus history sufficiently rich, the ensemble average and the time average are equivalent. The validity of this assumption can be checked experimentally (see Figure 2c). The reduction in entropy is then the gain in information, so

I(E; s) = S[P(t)] - S[P(t|s)]
        = \frac{1}{T} \int_0^T dt \, \frac{r_E(t)}{\bar{r}_E} \log_2 \left( \frac{r_E(t)}{\bar{r}_E} \right),   (2.5)

or equivalently

I(E; s) = \left\langle \frac{r_E(t)}{\bar{r}_E} \log_2 \left( \frac{r_E(t)}{\bar{r}_E} \right) \right\rangle_s,   (2.6)

where here the average is over all possible values of s, weighted by their probabilities P(s).

In the second view, the neural response is a binary random variable, s_E ∈ {0, 1}, marking the occurrence or nonoccurrence of an event of type E in a small time bin of size Δt. Suppose for simplicity that the stimulus takes on a finite set of values s, with probabilities P(s). These in turn induce the event E with probability p_E(s) = P(s_E = 1|s) = r_E(s)Δt, with an average probability for the occurrence of the event \bar{p}_E = Σ_s P(s) r_E(s)Δt = \bar{r}_E Δt. The information is the difference between the prior entropy and the conditional entropy, I(E; s) = S(s) − ⟨S(s|s_E)⟩, where the conditional entropy is an average over the two possible values of s_E. The conditional probabilities are found from Bayes' rule,

P(s|s_E = 1) = \frac{P(s) \, p_E(s)}{\bar{p}_E}, \qquad
P(s|s_E = 0) = \frac{P(s)\,(1 - p_E(s))}{1 - \bar{p}_E},   (2.7)

and with these one finds the information:

I(E; s) = - \sum_s P(s) \log_2 P(s) + \sum_{s_E = 0, 1} P(s_E) \sum_s P(s|s_E) \log_2 P(s|s_E)
        = \sum_s P(s) \left[ p_E(s) \log_2 \left( \frac{p_E(s)}{\bar{p}_E} \right) + (1 - p_E(s)) \log_2 \left( \frac{1 - p_E(s)}{1 - \bar{p}_E} \right) \right].   (2.8)


Taking the limit Δt → 0, consistent with the requirement that the event can occur at most once, one finds the average information conveyed in a small time bin; dividing by the average probability of an event, one obtains equation 2.6 as the information per event.
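To spell out the intermediate algebra (our addition, not part of the original text), one can expand equation 2.8 to first order in Δt, using p_E(s) = r_E(s)Δt:

p_E(s) \log_2 \frac{p_E(s)}{\bar{p}_E} = r_E(s)\,\Delta t \, \log_2 \frac{r_E(s)}{\bar{r}_E}, \qquad
(1 - p_E(s)) \log_2 \frac{1 - p_E(s)}{1 - \bar{p}_E} = \frac{\bar{p}_E - p_E(s)}{\ln 2} + O(\Delta t^2).

Summing over s with weights P(s), the second contributions cancel because \sum_s P(s)\, p_E(s) = \bar{p}_E, so

I(E; s) \approx \Delta t \sum_s P(s) \, r_E(s) \log_2 \frac{r_E(s)}{\bar{r}_E},

and dividing by \bar{p}_E = \bar{r}_E \Delta t indeed gives equation 2.6.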

Equation 2.6 and its time-averaged form, equation 2.5, are exact formulas that are valid in any situation where a rich stimulus history can be presented repeatedly. They enable the evaluation of the information for arbitrarily complex events, independent of assumptions about the encoding algorithm. The information is an average over the joint distribution of stimuli and responses. But rather than constructing this joint distribution explicitly and then averaging, equation 2.5 expresses the information as a direct empirical average: by estimating the function r_E(t) as a histogram, we are sampling the distribution of the responses given a stimulus, whereas by integrating over time we are sampling the distribution of stimuli. This formula is general and can be applied for different systems under different experimental conditions. The numerical result (measured in bits) will of course depend on the neural system as well as on the properties of the stimulus distribution. The error bars on the measurement of the information are affected by the finiteness of the data; for example, sampling must be sufficient to construct the rates r_E(t) reliably. These and other practical issues in using equation 2.5 are illustrated in detail in Figure 2.
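To make the discrete-time use of equation 2.5 concrete, here is a minimal sketch of how the event information might be estimated from a raster of event times recorded over repeated presentations. The function and variable names (estimate_event_information, event_times_per_trial, and so on) are illustrative assumptions, not a description of the original analysis code, and the finite-size extrapolations of Figure 2 are omitted.

```python
import numpy as np

def estimate_event_information(event_times_per_trial, T, dt):
    """Estimate I(E; s) in bits per event via the discretized form of
    equation 2.5: (1/T) * sum_t dt * (r_E(t)/rbar) * log2(r_E(t)/rbar).

    event_times_per_trial : list of 1-D arrays, event times (s) for each
        repetition of the same stimulus history.
    T  : duration of the repeated stimulus (s).
    dt : bin size (s) used to build the event-rate histogram r_E(t).
    """
    n_trials = len(event_times_per_trial)
    edges = np.arange(0.0, T + dt, dt)
    counts = np.zeros(len(edges) - 1)
    for times in event_times_per_trial:
        counts += np.histogram(times, bins=edges)[0]
    # Time-dependent event rate and its time average.
    r = counts / (n_trials * dt)
    rbar = r.mean()
    # Bins where r = 0 contribute nothing to the integral (x log x -> 0).
    nz = r > 0
    return np.sum((r[nz] / rbar) * np.log2(r[nz] / rbar)) * dt / T

# Example with synthetic data: events loosely locked to 'features' at known times.
rng = np.random.default_rng(0)
feature_times = rng.uniform(0.0, 10.0, size=50)
trials = [np.sort(feature_times + 0.002 * rng.standard_normal(feature_times.size))
          for _ in range(360)]
print(estimate_event_information(trials, T=10.0, dt=0.002))
```

In practice, as Figure 2 emphasizes, one would repeat this calculation for several bin sizes and trial counts and extrapolate to Δt → 0 and infinite data.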

2.3 The Special Case of Single Spikes. Let us consider in more detail the simple case where events are single spikes. The average information conveyed by a single spike becomes an integral over the time-dependent spike rate r(t):

I(1 {\rm spike}; s) = \frac{1}{T} \int_0^T dt \, \frac{r(t)}{\bar{r}} \log_2 \left( \frac{r(t)}{\bar{r}} \right).   (2.9)

It makes sense that the information carried by single spikes should be related to the spike rate, since this rate as a function of time gives a complete description of the "one-body" statistics of the spike train, in the same way that the single particle density describes the one-body statistics of a gas or liquid. This is true independent of any models or assumptions, and we suggest the use of equations 2.9 and 2.5 to test directly, in any given experimental situation, whether many-body effects increase or reduce information transmission.

Figure 2: Facing page. Finite size effects in the estimation of the information conveyed by single spikes. (a) Information as a function of the bin size Δt used for computing the time-dependent rate r(t), from all 360 repetitions (circles) and from 100 of the repetitions (crosses). A linear extrapolation to the limit Δt → 0 is shown for the case where all repetitions were used (solid line). (b) Information as a function of the inverse number of repetitions N, for a fixed bin size Δt = 2 ms. (c) Systematic errors due to finite duration of the repeated stimulus s(t). The full 10-second length of the stimulus was subdivided into segments of duration T. Using equation 2.9, the information was calculated for each segment and plotted as a function of 1/T (circles). If the stimulus is sufficiently long that an ensemble average is well approximated by a time average, the convergence of the information to a stable value as T → ∞ should be observable. Further, the standard deviation of the information measured from different segments of length T (error bars) should decrease as a square root law, σ ∝ 1/√T (dashed line). These data provide an empirical verification that, for the distribution used in this experiment, the repeated stimulus time is long enough to approximate ergodic sampling.

We note that considerable attention has been given to the problem of making accurate statistical models for spike trains. In particular, the simplest model, the modulated Poisson process, is often used as a standard against which real neurons are compared. It is tempting to think that Poisson behavior is equivalent to independent transmission of information by successive spikes, but this is not the case (see below). Of course, no real neuron is precisely Poisson, and it is not clear which deviations from Poisson behavior are significant. Rather than trying to fit statistical models to the data and then to try to understand the significance of spike train statistics for neural coding, we see that it is possible to quantify directly the information carried by single spikes and compound patterns. Thus, without reference to statistical models, we can determine whether successive spikes carry independent, redundant, or synergistic information.

Several previous works have noted the relation between spike rate and information, and the formula in equation 2.9 has an interesting history. For a spike train generated by a modulated Poisson process, equation 2.9 provides an upper bound on information transmission (per spike) by the spike train as a whole (Bialek, 1990). Even in this simple case, spikes do not necessarily carry independent information; slow modulations in the firing rate can cause them to be redundant. In studying the coding of location by cells in the rat hippocampus, Skaggs, McNaughton, Gothard, and Markus (1993) assumed that successive spikes carried independent information and that the spike rate was determined by the instantaneous location. They obtained equation 2.9, with the time average replaced by an average over locations. DeWeese (1996) showed that the rate of information transmission by a spike train could be expanded in a series of integrals over correlation functions, where successive terms would be small if the number of spikes per correlation time were small; the leading term, which would be exact if spikes were uncorrelated, is equation 2.9. Panzeri, Biella, Rolls, Skaggs, and Treves (1996) show that equation 2.9, multiplied by the mean spike rate to give an information rate (bits per second), is the correct information rate if we count spikes in very brief segments of the neural response, which is equivalent to asking for the information carried by single spikes. For further discussion of the relation to previous work, see appendix A.

A crucial point here is the generalization to equation 2.5, and this result applies to the information content of any point events in the neural response (pairs of spikes with a certain separation, coincident spikes from two cells, and so on), not just single spikes. In the analysis of experiments, we will emphasize the use of this formula as an exact result for the information content of single events, rather than an approximate result for the spike train as a whole, and this approach will enable us to address questions concerning the structure of the code and the role played by various types of events.

3 Experiments in the Fly Visual System

In this section we use our formalism to analyze experiments on the movement-sensitive cell H1 in the visual system of the blowfly Calliphora vicina. We address the issue of the information conveyed by pairs of spikes in this neuron, as compared to the information conveyed independently by single spikes. The quantitative results of this section (information content in bits, effects of synergy and redundancy among spikes) are specific to this system and to the stimulus conditions used. The theoretical approach, however, is valid generally and can be applied similarly to other experimental systems to find out the significance of various patterns in single cells or in a population.

3.1 Synergy Between Spikes. In the experiment (see appendix B for details), the horizontal motion across the visual field is the input sensory stimulus s(t), drawn from a probability distribution P[s(t)]. The spike train recorded from H1 is the neural response. Figure 1a shows a segment of the stimulus presented to the fly, and Figure 1b illustrates the response to many repeated presentations of this segment. The histogram of spike times across the ensemble of repetitions provides an estimate of the spike rate r(t) (see Figure 1c), and equation 2.5 gives the information carried by a single spike, I(1 spike; s) = 1.53 ± 0.05 bits. Figure 2 illustrates the details of how the formula was used, with an emphasis on the effects of finiteness of the data. In this experiment, a stimulus of length T = 10 s was repeated 360 times. As seen from Figure 2, the calculation converges to a stable result within our finite data set.

If each spike were to convey information independently, then with the mean spike rate \bar{r} = 37 spikes per second, the total information rate would be R_info = 56 bits per second. We used the variability and reproducibility of continuous segments in the neural response (de Ruyter van Steveninck et al., 1997; Strong et al., 1998) to estimate the total information rate in the spike train in this experiment and found that R_info = 75 bits per second. Thus, the information conveyed by the spike train as a whole is larger than the sum of contributions from individual spikes, indicating cooperative information transmission by patterns of spikes in time. This synergy among spikes motivates the search for especially informative patterns in the spike train.

We consider compound events that consist of two spikes separated by a time τ, with no constraints on what happens between them. Figure 1 shows segments of the event rate r_τ(t) for τ = 3 (±1) ms (see Figure 1d) and for τ = 17 (±1) ms (see Figure 1e). The information carried by spike pairs as a function of the interspike time τ, computed from equation 2.5, is shown in Figure 3. For large τ, spikes contribute independent information, as expected. This independence is established within approximately 30 to 40 ms, comparable to the behavioral response times of the fly (Land & Collett, 1974). There is a mild redundancy (about 10-20%) at intermediate separations, and a very large synergy (up to about 130%) at small τ.

Figure 3: Information about the signal transmitted by pairs of spikes, computed from equation 2.5, as a function of the time separation between the two spikes. The dotted line shows the information that would be transmitted by the two spikes independently (twice the single-spike information).
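As an illustration of how the pair analysis can be organized, the sketch below builds pair-event times from rastered spike trains and reuses the estimate_event_information function from the earlier sketch. The names (pair_event_times, trials) and the ±1 ms tolerance are assumptions for the example, not a description of the original analysis code.

```python
import numpy as np

def pair_event_times(spike_times, tau, half_width):
    """Return times of compound events: a spike at time t preceded by another
    spike at t - tau (within +/- half_width), with no constraint on what
    happens in between.  The event is marked by the time of the later spike."""
    t = np.asarray(spike_times)
    dt_matrix = t[:, None] - t[None, :]           # all pairwise separations
    hits = np.abs(dt_matrix - tau) <= half_width  # separations close to tau
    return t[np.any(hits, axis=1)]

def pair_information_vs_tau(trials, taus, T, dt, half_width=0.001):
    """Information carried by spike pairs, per pair, for each separation tau
    (the quantity plotted in Figure 3), using equation 2.5.
    Relies on estimate_event_information() defined in the earlier sketch."""
    info = []
    for tau in taus:
        events = [pair_event_times(tr, tau, half_width) for tr in trials]
        info.append(estimate_event_information(events, T, dt))
    return np.array(info)

# Usage sketch: 'trials' is a list of 360 spike-time arrays for the repeated
# 10 s stimulus; compare the result to 2 * (single-spike information).
# taus = np.arange(0.002, 0.040, 0.002)
# info_pairs = pair_information_vs_tau(trials, taus, T=10.0, dt=0.002)
```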

Related results were obtained using the correlation of spike patterns with stimulus features (de Ruyter van Steveninck & Bialek, 1988). There, the information carried by spike patterns was estimated from the distribution of stimuli given each pattern, thus constructing a statistical model of what the patterns "stand for" (see the details in appendix A). Since the time-dependent stimulus is in general of high dimensionality, its distribution cannot be sampled directly, and some approximations must be made: de Ruyter van Steveninck and Bialek (1988) made the approximation that patterns of a few spikes encode projections of the stimulus onto low-dimensional subspaces, and further that the conditional distributions P[s(t)|E] are gaussian. The information obtained with these approximations provides a lower bound to the true information carried by the patterns, as estimated directly with the methods presented here.


3.2 Origins of Synergy. Synergy means quite literally that two spikes together tell us more than two spikes separately. Synergistic coding is often discussed for populations of cells, where extra information is conveyed by patterns of coincident spikes from several neurons (Abeles, Bergmann, Margalit, & Vaadia, 1993; Hopfield, 1995; Meister, 1995; Singer & Gray, 1995). Here we see direct evidence for extra information in pairs of spikes across time. The mathematical framework for describing these effects is the same, and a natural question is: What are the conditions for synergistic coding?

The average synergy Syn[E_1, E_2; s] between two events, E_1 and E_2, is the difference between the information about the stimulus s conveyed by the pair and the information conveyed by the two events independently:

{\rm Syn}[E_1, E_2; s] = I[E_1, E_2; s] - \left( I[E_1; s] + I[E_2; s] \right).   (3.1)

We can rewrite the synergy as

{\rm Syn}[E_1, E_2; s] = I[E_1; E_2 \,|\, s] - I[E_1; E_2].   (3.2)

The first term is the mutual information between the events, computed across an ensemble of repeated presentations of the same stimulus history. It describes the gain in information due to the locking of the compound event (E_1, E_2) to particular stimulus features. If events E_1 and E_2 are correlated individually with the stimulus but not with one another, this term will be zero, and these events cannot be synergistic on average. The second term is the mutual information between events when the stimulus is not constrained, or equivalently the predictability of event E_2 from E_1. This predictability limits the capacity of E_2 to carry information beyond that already conveyed by E_1. Synergistic coding (Syn > 0) thus requires that the mutual information among the spikes is increased by specifying the stimulus, which makes precise the intuitive idea of "stimulus-dependent correlations" (Abeles et al., 1993; Hopfield, 1995; Meister, 1995; Singer & Gray, 1995).
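To make the two terms of equation 3.2 concrete in a frozen-stimulus design like the one used here, one might estimate them from binned binary event occurrences across trials, treating the time bin of the repeated stimulus as the conditioning variable. The sketch below is our own illustration under that assumption (function names included); it uses plug-in probabilities and ignores finite-sampling bias.

```python
import numpy as np

def mutual_information_binary(x, y):
    """Mutual information (bits) between two binary arrays of equal length."""
    x = np.asarray(x, dtype=int)
    y = np.asarray(y, dtype=int)
    info = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = np.mean((x == a) & (y == b))
            p_a, p_b = np.mean(x == a), np.mean(y == b)
            if p_ab > 0:
                info += p_ab * np.log2(p_ab / (p_a * p_b))
    return info

def synergy_terms(e1, e2):
    """e1, e2: boolean arrays of shape (n_trials, n_bins) marking occurrences
    of events E1 and E2 in each time bin of the repeated stimulus.
    Returns (I[E1;E2|s], I[E1;E2]) in bits, where conditioning on the stimulus
    is implemented as conditioning on the time bin of the frozen stimulus."""
    n_trials, n_bins = e1.shape
    # Conditional term: mutual information within each bin, averaged over bins.
    cond = np.mean([mutual_information_binary(e1[:, t], e2[:, t])
                    for t in range(n_bins)])
    # Unconditional term: pool all bins and trials together.
    uncond = mutual_information_binary(e1.ravel(), e2.ravel())
    return cond, uncond

# Syn = cond - uncond (equation 3.2); positive values mean that specifying the
# stimulus increases the correlation between the two events.
```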

Returning to our experimental example, we identify the events E_1 and E_2 as the arrivals of two spikes, and consider the synergy as a function of the time τ between them. In terms of event rates, we compute the information carried by a pair of spikes separated by a time τ, equation 2.5, as well as the information carried by two individual spikes. The difference between these two quantities is the synergy between two spikes, which can be written as

{\rm Syn}(\tau) = - \log_2 \left( \frac{\bar{r}_\tau}{\bar{r}^2} \right)
  + \frac{1}{T} \int_0^T dt \, \frac{r_\tau(t)}{\bar{r}_\tau} \log_2 \left[ \frac{r_\tau(t)}{r(t)\, r(t - \tau)} \right]
  + \frac{1}{T} \int_0^T dt \, \left[ \frac{r_\tau(t)}{\bar{r}_\tau} + \frac{r_\tau(t + \tau)}{\bar{r}_\tau} - \frac{2 r(t)}{\bar{r}} \right] \log_2 [r(t)].   (3.3)

The first term in this equation is the logarithm of the normalized correlation function and hence measures the rarity of spike pairs with separation τ; the average of this term over τ is the mutual information between events (the second term in equation 3.2). The second term is related to the local correlation function and measures the extent to which the stimulus modulates the likelihood of spike pairs. The average of this term over τ gives the mutual information conditional on knowledge of the stimulus (the first term in equation 3.2). The average of the third term over τ is zero, and numerical evaluation of this term from the data shows that it is negligible at most values of τ.

We thus find that the synergy between spikes is approximately a sum of two terms, whose averages over τ are the terms in equation 3.2. A spike pair with a separation τ then has two types of contributions to the extra information it carries: the two spikes can be correlated conditional on the stimulus, or the pair could be a rare and thus surprising event. The rarity of brief pairs is related to neural refractoriness, but this effect alone is insufficient to enhance information transmission; the rare events must also be related reliably to the stimulus. In fact, conditional on the stimulus, the spikes in rare pairs are strongly correlated with each other, and this is visible in Figure 1a: from trial to trial, adjacent spikes jitter together as if connected by a stiff spring. To quantify this effect, we find for each spike in one trial the closest spike in successive trials and measure the variance of the arrival times of these spikes. Similarly, we measure the variance of the interspike times. Figure 4a shows the ratio of the interspike time variance to the sum of the arrival time variances of the spikes that make up the pair. For large separations, this ratio is unity, as expected if spikes are locked independently to the stimulus, but as the two spikes come closer, it falls below one-quarter.
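A minimal sketch of this variance-ratio measurement (the quantity in Figure 4a) might look like the following. The pairing of each spike with its nearest neighbor in other trials follows the verbal description above; the function names, the ±1 ms tolerance, and the choice of a single reference trial are our own simplifications.

```python
import numpy as np

def jitter_variance_ratio(trials, tau, half_width=0.001):
    """For spike pairs with separation ~tau in a reference trial, compare the
    variance of the interspike time to the sum of the variances of the two
    arrival times, measured across repeated presentations of the stimulus."""
    ref = np.asarray(trials[0])
    others = [np.asarray(t) for t in trials[1:]]

    def nearest(spike, trial):
        return trial[np.argmin(np.abs(trial - spike))]

    var_first, var_second, var_isi = [], [], []
    for t1 in ref:
        # Second spike of the pair in the reference trial, if any.
        later = ref[(ref > t1) & (np.abs(ref - t1 - tau) <= half_width)]
        if later.size == 0:
            continue
        t2 = later[0]
        # Track the corresponding spikes across the other trials.
        a = np.array([nearest(t1, tr) for tr in others])
        b = np.array([nearest(t2, tr) for tr in others])
        var_first.append(np.var(a))
        var_second.append(np.var(b))
        var_isi.append(np.var(b - a))
    # Ratio plotted in Figure 4a: near 1 for independently locked spikes,
    # well below 1/4 for tightly coupled pairs at small tau.
    return np.sum(var_isi) / (np.sum(var_first) + np.sum(var_second))
```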

Both the conditional correlation among the members of the pair (see Figure 4a) and the relative synergy (see Figure 4b) depend strongly on the interspike separation. This dependence is nearly invariant to changes in image contrast, although the spike rate and other statistical properties are strongly affected by such changes. Brief spike pairs seem to retain their identity as specially informative symbols over a range of input ensembles. If particular temporal patterns are especially informative, then we would lose information if we failed to distinguish among different patterns. Thus, there are two notions of time resolution for spike pairs: the time resolution with which the interspike time is defined, and the absolute time resolution with which the event is marked. Figure 5 shows that for small interspike times, the information is much more sensitive to changes in the interspike time resolution (open symbols) than to the absolute time resolution (filled symbols). This is related to the slope in Figure 2: in regions where the slope is large, events should be finely distinguished in order to retain the information.

Figure 4: (a) Ratio between the variance of interspike time and the sum of variances of the two spike times. Variances are measured across repeated presentations of the same stimulus, as explained in the text. This ratio is plotted as a function of the interspike time τ, for two experiments with different image contrast. (b) Extra information conveyed cooperatively by pairs of spikes, expressed as a fraction of the information conveyed by the two spikes independently. While the single-spike information varies with contrast (1.5 bits per spike for C = 0.1, compared to 1.3 bits per spike for C = 1), the fractional synergy is almost contrast independent.

Figure 5: Information conveyed by spike pairs as a function of time resolution. An event, a pair of spikes, can be described by two times: the separation between spikes (relative time) and the occurrence time of the event with respect to the stimulus (absolute time). The information carried by the pair depends on the time resolution in these two dimensions, both specified by the bin size Δt along the abscissa. Open symbols are measurements of the information for a fixed 2 ms resolution of absolute time and a variable resolution Δt of relative time. Closed symbols correspond to a fixed 2 ms resolution of relative time and a variable resolution Δt of absolute time. For short intervals, the sensitivity to coarsening of the relative time resolution is much greater than to coarsening of the absolute time resolution. In contrast, sensitivity to relative and absolute time resolutions is the same for the longer, nonsynergistic interspike separations.

3.3 Implications of Synergy. The importance of spike timing in the neural code has been under debate for some time now. We believe that some issues in this debate can be clarified using a direct information-theoretic approach. Following MacKay and McCulloch (1952), we know that marking spike arrival times with higher resolution provides an increased capacity for information transmission. The work of Strong et al. (1998) shows that for the fly's H1 neuron, the increased capacity associated with spike timing indeed is used with nearly constant efficiency down to millisecond resolution. This efficiency can be the result of a tight locking of individual spikes to a rapidly varying stimulus, and it could also be the result of temporal patterns providing information beyond rapid rate modulations. The analysis given here shows that for H1, pairs of spikes can provide much more information than two individual spikes, and information transmission is much more sensitive to the relative timing of spikes than to their absolute timing. The synergy is an inherent property of particular spike pairs, as it persists when averaged over all occurrences of the pair in the experiment.

One may ask about the implication of synergy for behavior. Flies can respond to changes in a motion stimulus over a time scale of approximately 30 to 40 ms (Land & Collett, 1974). We have found that the spike train of H1 as a whole conveyed about 20 bits per second more than the information conveyed independently by single spikes (see Section 3.1). This implies that over a behavioral response time, synergistic effects provide about 0.8 bit of additional information, equivalent to almost a factor of two in resolving power for distinguishing different trajectories of motion across the visual field.
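For the record, the arithmetic behind this estimate (our addition, using the numbers quoted in section 3.1) is simply

\Delta I \approx (75 - 56)\ {\rm bits/s} \times 0.04\ {\rm s} \approx 0.8\ {\rm bit}, \qquad 2^{0.8} \approx 1.7,

that is, nearly a doubling of the number of distinguishable motion trajectories within one behavioral response time.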


4 Summary

Information theory allows us to measure synergy and redundancy among spikes, independent of the rules for translating between spikes and stimuli. In particular, this approach tests directly the idea that patterns of spikes are special events in the code, carrying more information than expected by adding the contributions from individual spikes. It is of practical importance that the formulas rely on low-order statistical measures of the neural response, and hence do not require enormous data sets to reach meaningful conclusions. The method is of general validity and is applicable to patterns of spikes across a population of neurons, as well as across time. Specific results (existence of synergy or redundancy, number of bits per spike, etc.) depend on the neural system as well as on the stimulus ensemble used in each experiment.

In our experiments on the fly visual system, we found that an event composed of a pair of spikes can carry far more than the information carried independently by its parts. Two spikes that occur in rapid succession appear to be special symbols that have an integrity beyond the locking of individual spikes to the stimulus. This is analogous to the encoding of sounds in written English: each of the compound symbols th, sh, and ch stands for sounds that are not decomposable into sounds represented by each of the constituent letters. For spike pairs to act effectively as special symbols, mechanisms for "reading" them must exist at subsequent levels of processing. Synaptic transmission is sensitive to interspike times in the 2-20 ms range (Magelby, 1987), and it is natural to suggest that synaptic mechanisms on this timescale play a role in such reading. Recent work on the mammalian visual system (Usrey, Reppas, & Reid, 1998) provides direct evidence that pairs of spikes close together in time can be especially efficient in driving postsynaptic neurons.

Appendix A Relation to Previous Work

Patterns of spikes and their relation to sensory stimuli have been quantified in the past using correlation functions. The event rates that we have defined here, which are connected directly to the information carried by patterns of spikes through equation 2.5, are just properly normalized correlation functions. The event rate for pairs of spikes from two separate neurons is related to the joint poststimulus time histogram (PSTH) defined by Aertsen, Gerstein, Habib, and Palm (1989; Vaadia et al., 1995). Making this connection explicit is also an opportunity to see how the present formalism applies to events defined across two cells.

Consider two cells, A and B, generating spikes at times {t_i^A} and {t_i^B}, respectively. It will be useful to think of the spike trains as sums of unit impulses at the spike times:

\rho_A(t) = \sum_i \delta(t - t_i^A),   (A.1)

\rho_B(t) = \sum_i \delta(t - t_i^B).   (A.2)

Then the time-dependent spike rates for the two cells are

r_A(t) = \langle \rho_A(t) \rangle_{\rm trials},   (A.3)

r_B(t) = \langle \rho_B(t) \rangle_{\rm trials},   (A.4)

where ⟨···⟩_trials denotes an average over multiple trials in which the same time-dependent stimulus s(t′) is presented or the same motor behavior is observed. These spike rates are the probabilities per unit time for the occurrence of a single spike in either cell A or cell B, also called the PSTH. We can define the probability per unit time for a spike in cell A to occur at time t and a spike in cell B to occur at time t′, and this will be the joint PSTH,

{\rm JPSTH}_{AB}(t, t') = \langle \rho_A(t)\, \rho_B(t') \rangle_{\rm trials}.   (A.5)

Alternatively, we can consider an event E defined by a spike in cell A at time t and a spike in cell B at time t − τ, with the relative time τ measured with a precision of Δt. Then the rate of these events is

r_E(t) = \int_{-\Delta t/2}^{\Delta t/2} dt' \, {\rm JPSTH}_{AB}(t, t - \tau + t')   (A.6)

       \approx \Delta t \, {\rm JPSTH}_{AB}(t, t - \tau),   (A.7)

where the last approximation is valid if our time resolution is sufficiently high. Applying our general formula for the information carried by single events, equation 2.5, the information carried by pairs of spikes from two cells can be written as an integral over diagonal "strips" of the JPSTH matrix:

I(E; s) = \frac{1}{T} \int_0^T dt \, \frac{{\rm JPSTH}_{AB}(t, t - \tau)}{\langle {\rm JPSTH}_{AB}(t, t - \tau) \rangle_t} \log_2 \left[ \frac{{\rm JPSTH}_{AB}(t, t - \tau)}{\langle {\rm JPSTH}_{AB}(t, t - \tau) \rangle_t} \right],   (A.8)

where ⟨JPSTH_AB(t, t − τ)⟩_t is an average of the JPSTH over time; this average is equivalent to the standard correlation function between the two spike trains.
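As a sketch of how equation A.8 might be evaluated from data, the following code estimates the diagonal strip of the JPSTH from binned spike trains of two cells and applies the same plug-in formula. The function name, the binarization of counts, and the wrap-around handling of the lag are our own simplifications, not part of the original analysis.

```python
import numpy as np

def two_cell_event_information(spikes_a, spikes_b, T, dt, lag_bins=0):
    """Information (bits per event) carried by the compound event 'a spike in
    cell A at time t and a spike in cell B at time t - lag_bins*dt', using the
    diagonal strip of the JPSTH and equation A.8.
    spikes_a, spikes_b : lists of spike-time arrays, one per repeated trial."""
    n_trials = len(spikes_a)
    edges = np.arange(0.0, T + dt, dt)
    n_bins = len(edges) - 1
    a = np.zeros((n_trials, n_bins))
    b = np.zeros((n_trials, n_bins))
    for k in range(n_trials):
        a[k] = np.histogram(spikes_a[k], bins=edges)[0].clip(max=1)
        b[k] = np.histogram(spikes_b[k], bins=edges)[0].clip(max=1)
    # Coincidence rate along the strip t_B = t_A - lag (wraps at the edges).
    b_shifted = np.roll(b, lag_bins, axis=1)
    strip = np.mean(a * b_shifted, axis=0) / dt**2   # ~ JPSTH_AB(t, t - tau)
    mean_strip = strip.mean()
    nz = strip > 0
    return np.sum((strip[nz] / mean_strip) *
                  np.log2(strip[nz] / mean_strip)) * dt / T
```

With lag_bins = 0, this is the information conveyed by synchronous firing of the two cells, as discussed below.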

The discussion by Vaadia et al. (1995) emphasizes that modulations of the JPSTH along the diagonal strips allow correlated firing events to convey information about sensory signals or behavioral states, and this information is quantified by equation A.8. The information carried by the individual cells is related to the corresponding integrals over spike rates, equation 2.9. The difference between the information conveyed by the compound spiking events E and the information conveyed by spikes in the two cells independently is precisely the synergy between the two cells at the given time lag τ. For τ = 0, it is the synergy, or extra information, conveyed by synchronous firing of the two cells.

We would like to connect this approach with previous work that focused on how events reduce our uncertainty about the stimulus (de Ruyter van Steveninck & Bialek, 1988). Before we observe the neural response, all we know is that stimuli are chosen from a distribution P[s(t′)]. When we observe an event E at time t_E, this should tell us something about the stimulus in the neighborhood of this time, and this knowledge is described by the conditional distribution P[s(t′)|t_E]. If we go back to the definition of the mutual information between responses and stimuli, we can write the average information conveyed by one event in terms of this conditional distribution:

I(E; s) = \int Ds(t') \int dt_E \, P[s(t'), t_E] \log_2 \left( \frac{P[s(t'), t_E]}{P[s(t')]\, P[t_E]} \right)
        = \int dt_E \, P[t_E] \int Ds(t') \, P[s(t')\,|\,t_E] \log_2 \left( \frac{P[s(t')\,|\,t_E]}{P[s(t')]} \right).   (A.9)

If the system is stationary, then the coding should be invariant under time translations:

P[s(t')\,|\,t_E] = P[s(t' + \Delta t')\,|\,t_E + \Delta t'].   (A.10)

This invariance means that the integral over stimuli in equation A.9 is independent of the event arrival time t_E, so we can simplify our expression for the information carried by a single event:

I(E; s) = \int dt_E \, P[t_E] \int Ds(t') \, P[s(t')\,|\,t_E] \log_2 \left( \frac{P[s(t')\,|\,t_E]}{P[s(t')]} \right)
        = \int Ds(t') \, P[s(t')\,|\,t_E] \log_2 \left( \frac{P[s(t')\,|\,t_E]}{P[s(t')]} \right).   (A.11)

This formula was used by de Ruyter van Steveninck and Bialek (1988). To connect with our work here, we express the information in equation A.11 as an average over the stimulus:

I(E; s) = \left\langle \frac{P[s(t')\,|\,t_E]}{P[s(t')]} \log_2 \left( \frac{P[s(t')\,|\,t_E]}{P[s(t')]} \right) \right\rangle_s.   (A.12)

Using Bayes' rule,

\frac{P[s(t')\,|\,t_E]}{P[s(t')]} = \frac{P[t_E\,|\,s(t')]}{P[t_E]} = \frac{r_E(t_E)}{\bar{r}_E},   (A.13)


where the last step follows because the distribution of event arrival times is proportional to the event rate, as defined above. Substituting this back into equation A.11, one finds equation 2.6.

Appendix B Experimental Setup

In the experiment we used a female blowfly, which was a first-generation offspring of a wild fly caught outside. The fly was put inside a plastic tube and immobilized with wax, with the head protruding out. The proboscis was left free, so that the fly could be fed regularly with some sugar water. A small hole was cut in the back of the head, close to the midline on the right side. Through this hole, a tungsten electrode was advanced into the lobula plate. This area, which is several layers back from the compound eye, includes a group of large motion-detector neurons with wide receptive fields and strong direction selectivity. We recorded spikes extracellularly from one of these, the contralateral H1 neuron (Franceschini, Riehle, & le Nestour, 1989; Hausen & Egelhaaf, 1989). The electrode was positioned such that spikes from H1 could be discriminated reliably and converted into pulses by a simple threshold discriminator. The pulses were fed into a CED 1401 interface (Cambridge Electronic Design), which digitized the spikes at 10 μs resolution. To keep exact synchrony over the duration of the experiment, the spike timing clock was derived from the same internal CED 1401 clock that defined the frame times of the visual stimulus.

The stimulus was a rigidly moving bar pattern, displayed on a Tektronix 608 high-brightness display. The radiance at average intensity Ī was about 180 mW/(m² sr), which amounts to about 5 × 10⁴ effectively transduced photons per photoreceptor per second (Dubs, Laughlin, & Srinivasan, 1984). The bars were oriented vertically, with intensities chosen at random to be Ī(1 ± C), where C is the contrast. The distance between the fly and the screen was adjusted so that the angular subtense of a bar equaled the horizontal interommatidial angle in the stimulated part of the compound eye. This setting was found by determining the eye's spatial Nyquist frequency through the reverse reaction (Gotz, 1964) in the response of the H1 neuron. For this fly, the horizontal interommatidial angle was 1.45 degrees, and the distance to the screen 105 mm. The fly viewed the display through a round 80 mm diameter diaphragm, showing approximately 30 bars. From this we estimate the number of stimulated ommatidia in the eye's hexagonal raster to be about 612.

Frames of the stimulus pattern were refreshed every 2 ms, and with each new frame the pattern was displayed at a new position. This resulted in an apparent horizontal motion of the bar pattern, which is suitable to excite the H1 neuron. The pattern position was defined by a pseudorandom sequence, simulating independent random displacements at each time step, uniformly distributed between −0.47 degrees and +0.47 degrees (equivalent to −0.32 to +0.32 omm, horizontal ommatidial spacings). This corresponds to a diffusion constant of 18.1 (°)² per second, or 8.6 omm² per second. The sequence of pseudorandom numbers contained a repeating part and a nonrepeating part, each 10 seconds long, with the same statistical parameters. Thus, in each 20 second cycle, the fly saw a 10 second movie that it had seen 20 seconds before, followed by a 10 second movie that was generated independently.
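The stimulus statistics described here are easy to reproduce in simulation. The sketch below generates a position trace with the stated frame period and displacement distribution and checks the implied diffusion constant; the function names are our own, and the repeated/novel segment structure is only indicated schematically.

```python
import numpy as np

def make_stimulus(rng, duration=10.0, frame_dt=0.002, max_step_deg=0.47):
    """Random-walk bar-pattern position: independent displacements every
    frame, uniform on [-max_step_deg, +max_step_deg] degrees."""
    n_frames = int(round(duration / frame_dt))
    steps = rng.uniform(-max_step_deg, max_step_deg, size=n_frames)
    return np.cumsum(steps)

rng = np.random.default_rng(1)
repeated = make_stimulus(rng)        # 10 s segment shown on every cycle
novel = make_stimulus(rng)           # stands in for the independently generated 10 s segment
one_cycle = np.concatenate([repeated, novel])   # one 20 s stimulus cycle

# Implied diffusion constant: <step^2>/(2*frame_dt). Uniform steps on [-a, a]
# have variance a^2/3, so D = a^2/(6*frame_dt), roughly 18 (deg)^2/s here.
a = 0.47
print(a**2 / (6 * 0.002))
```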

Acknowledgments

We thank G. Lewen and A. Schweitzer for their help with the experiments, and N. Tishby for many helpful discussions. Work at the IAS was supported in part by DOE grant DE-FG02-90ER40542, and work at the IFSC was supported by the Brazilian agencies FAPESP and CNPq.

References

Abeles, M., Bergmann, H., Margalit, E., & Vaadia, E. (1993). Spatiotemporal firing patterns in the frontal cortex of behaving monkeys. J. Neurophysiol., 70, 1629-1638.
Adrian, E. (1928). The basis of sensation. London: Christophers.
Aertsen, A., Gerstein, G., Habib, M., & Palm, G. (1989). Dynamics of neuronal firing correlation: modulation of "effective connectivity." J. Neurophysiol., 61(5), 900-917.
Bialek, W. (1990). Theoretical physics meets experimental neurobiology. In E. Jen (Ed.), 1989 Lectures in Complex Systems (pp. 513-595). Menlo Park, CA: Addison-Wesley.
de Ruyter van Steveninck, R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: coding and information transfer in short spike sequences. Proc. R. Soc. Lond. Ser. B, 234, 379.
de Ruyter van Steveninck, R. R., Lewen, G., Strong, S., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 275.
DeWeese, M. (1996). Optimization principles for the neural code. Network, 7, 325-331.
DeWeese, M., & Meister, M. (1998). How to measure information gained from one symbol. APS Abstracts.
Dubs, A., Laughlin, S., & Srinivasan, M. (1984). Single photon signals in fly photoreceptors and first order interneurones at behavioural threshold. J. Physiol., 317, 317-334.
Franceschini, N., Riehle, A., & le Nestour, A. (1989). Directionally selective motion detection by insect neurons. In D. Stavenga & R. Hardie (Eds.), Facets of Vision (p. 360). Berlin: Springer-Verlag.
Gotz, K. (1964). Optomotorische Untersuchung des visuellen Systems einiger Augenmutanten der Fruchtfliege Drosophila. Kybernetik, 2, 77-92.
Hausen, K., & Egelhaaf, M. (1989). Neural mechanisms of visual course control in insects. In D. Stavenga & R. Hardie (Eds.), Facets of Vision (p. 391). Berlin: Springer-Verlag.
Hopfield, J. J. (1995). Pattern recognition computation using action potential timing for stimulus representation. Nature, 376, 33-36.
Land, M. F., & Collett, T. S. (1974). Chasing behavior of houseflies (Fannia canicularis): A description and analysis. J. Comp. Physiol., 89, 331-357.
MacKay, D., & McCulloch, W. S. (1952). The limiting information capacity of a neuronal link. Bull. Math. Biophys., 14, 127-135.
Magelby, K. (1987). Short-term synaptic plasticity. In G. M. Edelman, V. E. Gall, & K. M. Cowan (Eds.), Synaptic function (pp. 21-56). New York: Wiley.
Meister, M. (1995). Multineuronal codes in retinal signaling. Proc. Nat. Acad. Sci. (USA), 93, 609-614.
Optican, L. M., & Richmond, B. J. (1987). Temporal encoding of two-dimensional patterns by single units in primate inferior temporal cortex. III. Information theoretic analysis. J. Neurophys., 57, 162-178.
Panzeri, S., Biella, G., Rolls, E. T., Skaggs, W. E., & Treves, A. (1996). Speed, noise, information and the graded nature of neuronal responses. Network, 7, 365-370.
Rieke, F., Warland, D., de Ruyter van Steveninck, R., & Bialek, W. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.
Shannon, C. E. (1948). A mathematical theory of communication. Bell Sys. Tech. J., 27, 379-423, 623-656.
Singer, W., & Gray, C. M. (1995). Visual feature integration and the temporal correlation hypothesis. Ann. Rev. Neurosci., 555-586.
Skaggs, W. E., McNaughton, B. L., Gothard, K. M., & Markus, E. J. (1993). An information-theoretic approach to deciphering the hippocampal code. In S. J. Hanson, J. D. Cowan, & C. L. Giles (Eds.), Advances in neural information processing 5 (pp. 1030-1037). San Mateo, CA: Morgan Kaufmann.
Strong, S. P., Koberle, R., de Ruyter van Steveninck, R. R., & Bialek, W. (1998). Entropy and information in neural spike trains. Phys. Rev. Lett., 80, 197-200.
Usrey, W. M., Reppas, J. B., & Reid, R. C. (1998). Paired-spike interactions and synaptic efficiency of retinal inputs to the thalamus. Nature, 395, 384.
Vaadia, E., Haalman, I., Abeles, M., Bergman, H., Prut, Y., Slovin, H., & Aertsen, A. (1995). Dynamics of neural interactions in monkey cortex in relation to behavioural events. Nature, 373, 515-518.

Received February 22, 1999; accepted May 20, 1999.

Page 3: Synergy in a Neural Code - Princeton Universitywbialek/our_papers/brenner+al_00b.pdfSynergy in a Neural Code 1533 of measuring the information (in bits) carried by particular patterns

Synergy in a Neural Code 1533

of measuring the information (in bits) carried by particular patterns ofspikes under given stimulus conditions independent of models for thestimulus features that these patterns might represent In particular we cancompare the information conveyed by spike patterns with the informationconveyed by the individual spikes that make up the pattern and deter-mine quantitatively whether the whole is more or less than the sum of itsparts

While this method allows us to compute unambiguously how much in-formation is conveyed by the patterns it does not tell us what particularmessage these patterns convey Making the distinction between two issuesthe symbolic structure of the code and the transformation between inputsand outputs we address only the rst of these two Constructing a quanti-tative measure for the signicance of compound patterns is an essential rststep in understanding anything beyond the single spike approximation andit is especially crucial when complex multi-neuron codes are consideredSuch a quantitative measure will be useful for the next stage of modelingthe encoding algorithm in particular as a control for the validity of modelsThis is a subject of ongoing work and will not be discussed here

2 Information Conveyed by Compound Patterns

In the framework of information theory (Shannon 1948) signals are gener-ated by a source with a xed probability distribution and encoded intomessages by a channel The coding is in general probabilistic and thejoint distribution of signals and coded messages determines all quanti-ties of interest in particular the information transmitted by the channelabout the source is an average over this joint distribution In studying asensory system the signals generated by the source are the stimuli pre-sented to the animal and the messages in the communication channelare sequences of spikes in a neuron or a population of neurons Both thestimuli and the spike trains are random variables and they convey in-formation mutually because they are correlated The problem of quan-tifying this information has been discussed from several points of view(MacKay amp McCulloch 1952 Optican amp Richmond 1987 Rieke Warlandde Ruyter van Steveninck amp Bialek 1997 Strong Koberle de Ruyter vanSteveninck amp Bialek 1998) Here we address the question of how muchinformation is carried by particular ldquoeventsrdquo or combinations of actionpotentials

21 Background A discrete event E in the spike train is dened as aspecic combination of spikes Examples are a single spike a pair of spikesseparated by a given time spikes from two neurons that occur in synchronyand so on Information is carried by the occurrence of events at particulartimes and not others implying that they are correlated with some stim-ulus features and not with others Our task is to express the information

1534 Brenner Strong Koberle Bialek and de Ruyter van Steveninck

conveyed on average by an event E in terms of quantities that are easilymeasured experimentally1

In experiments as in nature the animal is exposed to stimuli at eachinstant of time We can describe this sensory input by a function s(t0 ) whichmay have many components to parameterize the time dependence of mul-tiple stimulus features In general the information gained about s(t0 ) byobserving a set of neural responses is

I DX

responses

ZDs(t0 )P[s(t0 ) amp response] log2

iexclP[s(t0 ) amp response]P[s(t0 )]P[response]

cent (21)

where information is measured in bits It is useful to note that this mutualinformation can be rewritten in two complementary forms

I D iexclZ

Ds(t0 )P[s(t0 )] log2 P[s(t0 )]

CX

responsesP[response]

ZDs(t0 )P[s(t0 ) |response] log2 P[s(t0 )|response]

D S[P(s)] iexcl hS[P(s|response)]iresponse (22)

or

I D iexclX

responsesP(response) log2 P(response)

CZ

Ds(t0 )P[s(t0 )]X

responsesP[response|s(t0)] log2 P[response|s(t0 )]

D S[P(response)] iexcl hS[P(response |s)]is (23)

where S denotes the entropy of a distribution by hcent cent centis we mean an av-erage over all possible values of the sensory stimulus weighted by theirprobabilities of occurrence and similarly for hcent cent centiresponses In the rst formequation 22 we focus on what the responses are telling us about the sen-sory stimulus (de Ruyter van Steveninck amp Bialek 1988) different responsespoint more or less reliably to different signals and our average uncertaintyabout the sensory signal is reduced by observing the neural response In thesecond form equation 23 we focus on the variability and reproducibilityof the neural response (de Ruyter van Steveninck Lewen Strong Koberle

1 In our formulation an event E is a random variable that can have several outcomesfor example it can occur at different times in the experiment The information conveyed bythe event about the stimulus is an average over the joint distribution of stimuli and eventoutcomes One can also associate an information measure with individual occurrences ofevents (DeWeese amp Meister 1998)The averageof this measure over the possible outcomesis the mutual information as dened by Shannon (1948) and used in this article

Synergy in a Neural Code 1535

amp Bialek 1997 Strong et al 1998) The range of possible responses providesthe system with a capacity to transmit information and the variability thatremains when the sensory stimulus is specied constitutes noise the differ-ence between the capacity and the noise is the information

We would like to apply this second form to the case where the neuralresponse is a particular type of event When we observe an event E in-formation is carried by the fact that it occurs at some particular time tEThe range of possible responses is then the range of times 0 lt tE lt T inour observation window Alternatively when we observe the response in aparticular small time bin of size D t information is carried by the fact thatthe event E either occurs or does not The range of possible responses thenincludes just two possibilities Both of these points of view have an arbitraryelement the choice of bin size D t and the window size T Characterizingthe properties of the system as opposed to our observation power requirestaking the limit of high time resolution (D t 0) and long observation times(T 1) As will be shown in the next section in this limit the two pointsof view give the same answer for the information carried by an event

22 Information and Event Rates A crucial role is played by the eventrate rE(t) the probability per unit time that an event of type E occurs attime t given the stimulus history s(t0 ) Empirical construction of the eventrate rE(t) requires repetition of the same stimulus history many times sothat a histogram can be formed (see Figure 1) For the case where events aresingle spikes this is the familiar time-dependent ring rate or poststimulustime histogram (see Figure 1c) the generalization to other types of eventsis illustrated by Figures 1d and 1e Intuitively a uniform event rate impliesthat no information is transmitted whereas the presence of sharply denedfeatures in the event rate implies that much information is transmitted bythese events (see for example the discussion by Vaadia et al 1995) We nowformalize this intuition and show how the average information carried bya single event is related quantitatively to the time-dependent event rate

Let us take the rst point of view about the neural response variable inwhich the range of responses is described by the possible arrival times tof the event 2 What is the probability of nding the event at a particulartime t Before we know the stimulus history s(t0 ) all we can say is that theevent can occur anywhere in our experimental window of size T so thatthe probability is uniform P(response) D P(t) D 1 T with an entropy ofS[P(t)] D log2 T Once we know the stimulus we also know the event raterE(t) and so our uncertainty about the occurrence time of the event is re-duced Events will occur preferentially at times where the event rate is largeso the probability distribution should be proportional to rE(t) with proper

2 We assume that the experiment is long enough so that a typical event is certain tooccur at some time

1536 Brenner Strong Koberle Bialek and de Ruyter van Steveninck

Figure 1 Generalized event rates in the stimulus-conditional response ensem-ble A time-dependent visual stimulus is shown to the y (a) with the timeaxis dened to be zero at the beginning of the stimulus This stimulus runs for10 s and is repeatedly presented 360 times The responses of the H1 neuron to60 repetitions are shown as a raster (b) in which each dot represents a singlespike From these responses time-dependent event rates rE(t) are estimated(c) the ring rate (poststimulus time histogram) (d) the rate for spike pairs withinterspike time t D 3 sect 1 ms and (e) for pairs with t D 17 sect 1 ms (e) These ratesallow us to compute directly the information transmitted by the events usingequation 25

Synergy in a Neural Code 1537

normalization P(response |s) D P(t|s) D rE(t) (TNrE) Then the conditionalentropy is

S[P(t|s)] D iexclZ T

0dt P(t|s) log2 P(t|s)

D iexcl1T

Z T

0dt

rE(t)NrE

log2

iexclrE(t)NrET

cent (24)

In principleone should average this quantity over the distribution of stimulis(t0 ) however if the time T is long enough and the stimulus history suf-ciently rich the ensemble average and the time average are equivalent Thevalidity of this assumption can be checked experimentally (see Figure 2c)The reduction in entropy is then the gain in information so

I(EI s) D S[P(t)] iexcl S[P(t|s)]

D1T

Z T

0dt

iexclrE(t)NrE

centlog2

iexclrE(t)NrE

cent (25)

or equivalently

I(EI s) Dfrac12 iexclrE(t)

NrE

centlog2

iexclrE(t)NrE

centfrac34

s (26)

where here the average is over all possible value of s weighted by theirprobabilities P(s)

In the second view the neural response is a binary random variablesE 2 f0 1g marking the occurrence or nonoccurrence of an event of type E ina small time bin of size D t Suppose for simplicity that the stimulus takes ona nite set of values s with probabilities P(s) These in turn induce the event Ewith probability pE(s) D P(sE D 1 |s) D rE(s)D t with an average probabilityfor the occurrenceof the event NpE D P

s P(s) rE(s)D t D NrED t The informationis the difference between the prior entropy and the conditional entropyI(EI s) D S(s) iexcl hS(s |sE)i where the conditional entropy is an average overthe two possible values of sE The conditional probabilities are found fromBayesrsquo rule

P(s |sE D 1) DP(s)pE(s)

NpE

P(s |sE D 0) DP(s)(1 iexcl pE(s))

(1 iexcl NpE) (27)

and with these one nds the information

I(EI s) D iexclX

sP(s) log2 P(s) C

X

sED01P(s|sE) log2 P(s |sE)

DX

sP(s)

micropE(s) log2

iexclpE(s)

NpE

centC (1 iexcl pE(s)) log2

iexcl1 iexcl pE(s)1 iexcl NpE

centpara (28)

1538 Brenner Strong Koberle Bialek and de Ruyter van Steveninck

Taking the limit D t 0 consistent with the requirement that the eventcan occur at most once one nds the average information conveyed in asmall time bin dividing by the average probability of an event one obtainsequation 26 as the information per event

Equation 26 and its time-averaged form equation 25 are exact formulasthat are valid in any situation where a rich stimulus history can be presented

Synergy in a Neural Code 1539

repeatedly It enables the evaluation of the information for arbitrarily com-plex events independent of assumptions about the encoding algorithmThe information is an average over the joint distribution of stimuli and re-sponses But rather than constructing this joint distribution explicitly andthen averaging equation 25 expresses the information as a direct empiricalaverage by estimating the function rE(t) as a histogram we are sampling thedistribution of the responses given a stimulus whereas by integrating overtime we are sampling the distribution of stimuli This formula is generaland can be applied for different systems under different experimental con-ditions The numerical result (measured in bits) will of course depend onthe neural system as well as on the properties of the stimulus distributionThe error bars on the measurement of the information are affected by theniteness of the data for example sampling must be sufcient to constructthe rates rE(t) reliably These and other practical issues in using equation 25are illustrated in detail in Figure 2

23 The Special Case of Single Spikes Let us consider in more detailthe simple case where events are single spikes The average informationconveyed by a single spike becomes an integral over the time-dependentspike rate r(t)

$$I(1\ \mathrm{spike}; s) = \frac{1}{T} \int_0^T dt\, \frac{r(t)}{\bar{r}} \log_2\!\left(\frac{r(t)}{\bar{r}}\right). \tag{2.9}$$

It makes sense that the information carried by single spikes should be related to the spike rate, since this rate as a function of time gives a complete description of the "one-body" statistics of the spike train, in the same way that the single-particle density describes the one-body statistics of a gas or liquid. This is true independent of any models or assumptions, and we suggest the use of equations 2.9 and 2.5 to test directly in any given experimental situation whether many-body effects increase or reduce information transmission.

Figure 2: Finite size effects in the estimation of the information conveyed by single spikes. (a) Information as a function of the bin size Δt used for computing the time-dependent rate r(t), from all 360 repetitions (circles) and from 100 of the repetitions (crosses). A linear extrapolation to the limit Δt → 0 is shown for the case where all repetitions were used (solid line). (b) Information as a function of the inverse number of repetitions N, for a fixed bin size Δt = 2 ms. (c) Systematic errors due to finite duration of the repeated stimulus s(t). The full 10-second length of the stimulus was subdivided into segments of duration T. Using equation 2.9, the information was calculated for each segment and plotted as a function of 1/T (circles). If the stimulus is sufficiently long that an ensemble average is well approximated by a time average, the convergence of the information to a stable value as T → ∞ should be observable. Further, the standard deviation of the information measured from different segments of length T (error bars) should decrease as a square root law, σ ∝ 1/√T (dashed line). These data provide an empirical verification that, for the distribution used in this experiment, the repeated stimulus time is long enough to approximate ergodic sampling.

We note that considerable attention has been given to the problem of making accurate statistical models for spike trains. In particular, the simplest model, the modulated Poisson process, is often used as a standard against which real neurons are compared. It is tempting to think that Poisson behavior is equivalent to independent transmission of information by successive spikes, but this is not the case (see below). Of course, no real neuron is precisely Poisson, and it is not clear which deviations from Poisson behavior are significant. Rather than trying to fit statistical models to the data and then trying to understand the significance of spike train statistics for neural coding, we see that it is possible to quantify directly the information carried by single spikes and compound patterns. Thus, without reference to statistical models, we can determine whether successive spikes carry independent, redundant, or synergistic information.

Several previous works have noted the relation between spike rate and information, and the formula in equation 2.9 has an interesting history. For a spike train generated by a modulated Poisson process, equation 2.9 provides an upper bound on information transmission (per spike) by the spike train as a whole (Bialek, 1990). Even in this simple case, spikes do not necessarily carry independent information; slow modulations in the firing rate can cause them to be redundant. In studying the coding of location by cells in the rat hippocampus, Skaggs, McNaughton, Gothard, and Markus (1993) assumed that successive spikes carried independent information and that the spike rate was determined by the instantaneous location. They obtained equation 2.9, with the time average replaced by an average over locations. DeWeese (1996) showed that the rate of information transmission by a spike train could be expanded in a series of integrals over correlation functions, where successive terms would be small if the number of spikes per correlation time were small; the leading term, which would be exact if spikes were uncorrelated, is equation 2.9. Panzeri, Biella, Rolls, Skaggs, and Treves (1996) show that equation 2.9, multiplied by the mean spike rate to give an information rate (bits per second), is the correct information rate if we count spikes in very brief segments of the neural response, which is equivalent to asking for the information carried by single spikes. For further discussion of the relation to previous work, see appendix A.

A crucial point here is the generalization to equation 2.5: this result applies to the information content of any point events in the neural response (pairs of spikes with a certain separation, coincident spikes from two cells, and so on), not just single spikes. In the analysis of experiments, we will emphasize the use of this formula as an exact result for the information content of single events, rather than an approximate result for the spike train as a whole, and this approach will enable us to address questions concerning the structure of the code and the role played by various types of events.

3 Experiments in the Fly Visual System

In this section we use our formalism to analyze experiments on the movement-sensitive cell H1 in the visual system of the blowfly Calliphora vicina. We address the issue of the information conveyed by pairs of spikes in this neuron, as compared to the information conveyed independently by single spikes. The quantitative results of this section (information content in bits, effects of synergy and redundancy among spikes) are specific to this system and to the stimulus conditions used. The theoretical approach, however, is valid generally and can be applied similarly to other experimental systems, to find out the significance of various patterns in single cells or in a population.

3.1 Synergy Between Spikes. In the experiment (see appendix B for details), the horizontal motion across the visual field is the input sensory stimulus s(t), drawn from a probability distribution P[s(t)]. The spike train recorded from H1 is the neural response. Figure 1a shows a segment of the stimulus presented to the fly, and Figure 1b illustrates the response to many repeated presentations of this segment. The histogram of spike times across the ensemble of repetitions provides an estimate of the spike rate r(t) (see Figure 1c), and equation 2.5 gives the information carried by a single spike, I(1 spike; s) = 1.53 ± 0.05 bits. Figure 2 illustrates the details of how the formula was used, with an emphasis on the effects of finiteness of the data. In this experiment, a stimulus of length T = 10 s was repeated 360 times. As seen from Figure 2, the calculation converges to a stable result within our finite data set.

If each spike were to convey information independently, then with the mean spike rate r̄ = 37 spikes per second, the total information rate would be R_info = 56 bits per second. We used the variability and reproducibility of continuous segments in the neural response (de Ruyter van Steveninck et al., 1997; Strong et al., 1998) to estimate the total information rate in the spike train in this experiment and found that R_info = 75 bits per second. Thus the information conveyed by the spike train as a whole is larger than the sum of contributions from individual spikes, indicating cooperative information transmission by patterns of spikes in time. This synergy among spikes motivates the search for especially informative patterns in the spike train.

We consider compound events that consist of two spikes separated by a time τ, with no constraints on what happens between them. Figure 1 shows segments of the event rate r_τ(t) for τ = 3 (±1) ms (see Figure 1d) and for τ = 17 (±1) ms (see Figure 1e). The information carried by spike pairs as a function of the interspike time τ, computed from equation 2.5, is shown in Figure 3. For large τ, spikes contribute independent information, as expected. This independence is established within approximately 30 to 40 ms, comparable to the behavioral response times of the fly (Land & Collett, 1974). There is a mild redundancy (about 10-20%) at intermediate separations, and a very large synergy (up to about 130%) at small τ.

Figure 3: Information about the signal transmitted by pairs of spikes, computed from equation 2.5, as a function of the time separation between the two spikes. The dotted line shows the information that would be transmitted by the two spikes independently (twice the single-spike information).
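To make the pair analysis concrete, here is a minimal sketch that builds the compound events "two spikes separated by τ" from each repetition and feeds them to the info_per_event routine sketched in section 2.2; the tolerance handling and the convention of marking each event by the later spike are our choices for illustration, not a prescription from the experiment:

    import numpy as np

    def pair_event_times(spike_times, tau, tol):
        # Times of compound events: two spikes separated by tau (within +/- tol),
        # with no constraint on what happens between them. Each event is marked
        # by the time of the later spike of the pair.
        t = np.asarray(spike_times)
        sep = t[None, :] - t[:, None]               # sep[i, j] = t[j] - t[i]
        i, j = np.where(np.abs(sep - tau) <= tol)   # ordered pairs with separation ~ tau
        return np.unique(t[j])

    def pair_information(trials, tau, tol, T, dt):
        # Information carried by spike pairs with separation tau, via equation 2.5.
        events = [pair_event_times(tr, tau, tol) for tr in trials]
        return info_per_event(events, T, dt)

    # Synergy relative to two independent spikes (compare with Figure 3):
    # syn = pair_information(trials, tau, tol, T, dt) - 2.0 * info_per_event(trials, T, dt)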

Related results were obtained using the correlation of spike patterns with stimulus features (de Ruyter van Steveninck & Bialek, 1988). There the information carried by spike patterns was estimated from the distribution of stimuli given each pattern, thus constructing a statistical model of what the patterns "stand for" (see the details in appendix A). Since the time-dependent stimulus is in general of high dimensionality, its distribution cannot be sampled directly, and some approximations must be made. de Ruyter van Steveninck and Bialek (1988) made the approximation that patterns of a few spikes encode projections of the stimulus onto low-dimensional subspaces, and further that the conditional distributions P[s(t)|E] are gaussian. The information obtained with these approximations provides a lower bound to the true information carried by the patterns, as estimated directly with the methods presented here.


3.2 Origins of Synergy. Synergy means quite literally that two spikes together tell us more than two spikes separately. Synergistic coding is often discussed for populations of cells, where extra information is conveyed by patterns of coincident spikes from several neurons (Abeles, Bergmann, Margalit, & Vaadia, 1993; Hopfield, 1995; Meister, 1995; Singer & Gray, 1995). Here we see direct evidence for extra information in pairs of spikes across time. The mathematical framework for describing these effects is the same, and a natural question is: What are the conditions for synergistic coding?

The average synergy, Syn[E1, E2; s], between two events E1 and E2 is the difference between the information about the stimulus s conveyed by the pair and the information conveyed by the two events independently:

$$\mathrm{Syn}[E_1, E_2; s] = I[E_1, E_2; s] - \left( I[E_1; s] + I[E_2; s] \right). \tag{3.1}$$

We can rewrite the synergy as

$$\mathrm{Syn}[E_1, E_2; s] = I[E_1; E_2 \,|\, s] - I[E_1; E_2]. \tag{3.2}$$

The first term is the mutual information between the events, computed across an ensemble of repeated presentations of the same stimulus history. It describes the gain in information due to the locking of the compound event (E1, E2) to particular stimulus features. If events E1 and E2 are correlated individually with the stimulus but not with one another, this term will be zero, and these events cannot be synergistic on average. The second term is the mutual information between events when the stimulus is not constrained, or equivalently, the predictability of event E2 from E1. This predictability limits the capacity of E2 to carry information beyond that already conveyed by E1. Synergistic coding (Syn > 0) thus requires that the mutual information among the spikes is increased by specifying the stimulus, which makes precise the intuitive idea of "stimulus-dependent correlations" (Abeles et al., 1993; Hopfield, 1995; Meister, 1995; Singer & Gray, 1995).
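The equivalence of equations 3.1 and 3.2 follows from standard entropy identities; writing H for entropy, a brief sketch is

$$I[E_1, E_2; s] = H[E_1, E_2] - H[E_1, E_2 \,|\, s],$$

$$H[E_1, E_2] = H[E_1] + H[E_2] - I[E_1; E_2], \qquad H[E_1, E_2 \,|\, s] = H[E_1 | s] + H[E_2 | s] - I[E_1; E_2 \,|\, s],$$

so that subtracting I[E1; s] + I[E2; s] from I[E1, E2; s] leaves exactly I[E1; E2|s] − I[E1; E2].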

Returning to our experimental example, we identify the events E1 and E2 as the arrivals of two spikes and consider the synergy as a function of the time τ between them. In terms of event rates, we compute the information carried by a pair of spikes separated by a time τ (equation 2.5), as well as the information carried by two individual spikes. The difference between these two quantities is the synergy between two spikes, which can be written as

$$\mathrm{Syn}(\tau) = -\log_2\!\left(\frac{\bar{r}_\tau}{\bar{r}^2}\right) + \frac{1}{T} \int_0^T dt\, \frac{r_\tau(t)}{\bar{r}_\tau} \log_2\!\left[\frac{r_\tau(t)}{r(t)\, r(t-\tau)}\right] + \frac{1}{T} \int_0^T dt \left[\frac{r_\tau(t)}{\bar{r}_\tau} + \frac{r_\tau(t+\tau)}{\bar{r}_\tau} - \frac{2 r(t)}{\bar{r}}\right] \log_2[r(t)]. \tag{3.3}$$

The first term in this equation is the logarithm of the normalized correlation function and hence measures the rarity of spike pairs with separation τ; the average of this term over τ is the mutual information between events (the second term in equation 3.2). The second term is related to the local correlation function and measures the extent to which the stimulus modulates the likelihood of spike pairs. The average of this term over τ gives the mutual information conditional on knowledge of the stimulus (the first term in equation 3.2). The average of the third term over τ is zero, and numerical evaluation of this term from the data shows that it is negligible at most values of τ.

We thus find that the synergy between spikes is approximately a sum of two terms whose averages over τ are the terms in equation 3.2. A spike pair with a separation τ then has two types of contributions to the extra information it carries: the two spikes can be correlated conditional on the stimulus, or the pair could be a rare and thus surprising event. The rarity of brief pairs is related to neural refractoriness, but this effect alone is insufficient to enhance information transmission; the rare events must also be related reliably to the stimulus. In fact, conditional on the stimulus, the spikes in rare pairs are strongly correlated with each other, and this is visible in Figure 1a: from trial to trial, adjacent spikes jitter together as if connected by a stiff spring. To quantify this effect, we find for each spike in one trial the closest spike in successive trials and measure the variance of the arrival times of these spikes. Similarly, we measure the variance of the interspike times. Figure 4a shows the ratio of the interspike time variance to the sum of the arrival time variances of the spikes that make up the pair. For large separations, this ratio is unity, as expected if spikes are locked independently to the stimulus, but as the two spikes come closer, it falls below one-quarter.
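A minimal sketch of this jitter measure (the rule of pairing each spike with its nearest neighbor on the other trials is our reading of the procedure described above; function and variable names are illustrative):

    import numpy as np

    def jitter_ratio(trials, tau, tol):
        # Ratio of the interspike-time variance to the summed arrival-time
        # variances, for spike pairs with separation ~ tau, tracked across trials.
        # Assumes every trial contains at least one spike.
        ref = np.asarray(trials[0])
        others = [np.asarray(tr) for tr in trials[1:]]
        sep = ref[None, :] - ref[:, None]
        i, j = np.where(np.abs(sep - tau) <= tol)    # pairs found in the first trial
        if len(i) == 0:
            return np.nan
        t1, t2 = [], []
        for a, b in zip(ref[i], ref[j]):
            t1.append([a] + [tr[np.argmin(np.abs(tr - a))] for tr in others])
            t2.append([b] + [tr[np.argmin(np.abs(tr - b))] for tr in others])
        t1, t2 = np.array(t1), np.array(t2)          # shape (n_pairs, n_trials)
        var_isi = np.var(t2 - t1, axis=1)            # variance of the interspike time
        var_sum = np.var(t1, axis=1) + np.var(t2, axis=1)
        return float(np.mean(var_isi / var_sum))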

Both the conditional correlation among the members of the pair (see Figure 4a) and the relative synergy (see Figure 4b) depend strongly on the interspike separation. This dependence is nearly invariant to changes in image contrast, although the spike rate and other statistical properties are strongly affected by such changes. Brief spike pairs seem to retain their identity as specially informative symbols over a range of input ensembles. If particular temporal patterns are especially informative, then we would lose information if we failed to distinguish among different patterns. Thus there are two notions of time resolution for spike pairs: the time resolution with which the interspike time is defined, and the absolute time resolution with which the event is marked. Figure 5 shows that for small interspike times, the information is much more sensitive to changes in the interspike time resolution (open symbols) than to the absolute time resolution (filled symbols). This is related to the slope in Figure 2: in regions where the slope is large, events should be finely distinguished in order to retain the information.

3.3 Implications of Synergy. The importance of spike timing in the neural code has been under debate for some time now. We believe that some issues in this debate can be clarified using a direct information-theoretic approach. Following MacKay and McCulloch (1952), we know that marking spike arrival times with higher resolution provides an increased capacity for information transmission. The work of Strong et al. (1998) shows that for the fly's H1 neuron, the increased capacity associated with spike timing indeed is used with nearly constant efficiency down to millisecond resolution. This efficiency can be the result of a tight locking of individual spikes to a rapidly varying stimulus, and it could also be the result of temporal patterns providing information beyond rapid rate modulations. The analysis given here shows that for H1, pairs of spikes can provide much more information than two individual spikes, and information transmission is much more sensitive to the relative timing of spikes than to their absolute timing. The synergy is an inherent property of particular spike pairs, as it persists when averaged over all occurrences of the pair in the experiment.

Figure 4: (a) Ratio between the variance of interspike time and the sum of variances of the two spike times. Variances are measured across repeated presentations of the same stimulus, as explained in the text. This ratio is plotted as a function of the interspike time τ for two experiments with different image contrast. (b) Extra information conveyed cooperatively by pairs of spikes, expressed as a fraction of the information conveyed by the two spikes independently. While the single-spike information varies with contrast (1.5 bits per spike for C = 0.1, compared to 1.3 bits per spike for C = 1), the fractional synergy is almost contrast independent.

Figure 5: Information conveyed by spike pairs as a function of time resolution. An event (a pair of spikes) can be described by two times: the separation between spikes (relative time) and the occurrence time of the event with respect to the stimulus (absolute time). The information carried by the pair depends on the time resolution in these two dimensions, both specified by the bin size Δt along the abscissa. Open symbols are measurements of the information for a fixed 2 ms resolution of absolute time and a variable resolution Δt of relative time. Closed symbols correspond to a fixed 2 ms resolution of relative time and a variable resolution Δt of absolute time. For short intervals, the sensitivity to coarsening of the relative time resolution is much greater than to coarsening of the absolute time resolution. In contrast, sensitivity to relative and absolute time resolutions is the same for the longer, nonsynergistic interspike separations.

One may ask about the implications of synergy for behavior. Flies can respond to changes in a motion stimulus over a time scale of approximately 30 to 40 ms (Land & Collett, 1974). We have found that the spike train of H1 as a whole conveyed about 20 bits per second more than the information conveyed independently by single spikes (see Section 3.1). This implies that over a behavioral response time, synergistic effects provide about 0.8 bit of additional information, equivalent (since 2^0.8 ≈ 1.7) to almost a factor of two in resolving power for distinguishing different trajectories of motion across the visual field.


4 Summary

Information theory allows us to measure synergy and redundancy among spikes, independent of the rules for translating between spikes and stimuli. In particular, this approach tests directly the idea that patterns of spikes are special events in the code, carrying more information than expected by adding the contributions from individual spikes. It is of practical importance that the formulas rely on low-order statistical measures of the neural response and hence do not require enormous data sets to reach meaningful conclusions. The method is of general validity and is applicable to patterns of spikes across a population of neurons as well as across time. Specific results (existence of synergy or redundancy, number of bits per spike, etc.) depend on the neural system as well as on the stimulus ensemble used in each experiment.

In our experiments on the fly visual system, we found that an event composed of a pair of spikes can carry far more than the information carried independently by its parts. Two spikes that occur in rapid succession appear to be special symbols that have an integrity beyond the locking of individual spikes to the stimulus. This is analogous to the encoding of sounds in written English: each of the compound symbols th, sh, and ch stands for sounds that are not decomposable into sounds represented by each of the constituent letters. For spike pairs to act effectively as special symbols, mechanisms for "reading" them must exist at subsequent levels of processing. Synaptic transmission is sensitive to interspike times in the 2-20 ms range (Magelby, 1987), and it is natural to suggest that synaptic mechanisms on this timescale play a role in such reading. Recent work on the mammalian visual system (Usrey, Reppas, & Reid, 1998) provides direct evidence that pairs of spikes close together in time can be especially efficient in driving postsynaptic neurons.

Appendix A Relation to Previous Work

Patterns of spikes and their relation to sensory stimuli have been quantified in the past using correlation functions. The event rates that we have defined here, which are connected directly to the information carried by patterns of spikes through equation 2.5, are just properly normalized correlation functions. The event rate for pairs of spikes from two separate neurons is related to the joint poststimulus time histogram (PSTH) defined by Aertsen, Gerstein, Habib, and Palm (1989; Vaadia et al., 1995). Making this connection explicit is also an opportunity to see how the present formalism applies to events defined across two cells.

Consider two cells, A and B, generating spikes at times {t_i^A} and {t_i^B}, respectively. It will be useful to think of the spike trains as sums of unit impulses at the spike times:

$$\rho_A(t) = \sum_i \delta(t - t_i^A), \tag{A.1}$$

$$\rho_B(t) = \sum_i \delta(t - t_i^B). \tag{A.2}$$

Then the time-dependent spike rates for the two cells are

$$r_A(t) = \langle \rho_A(t) \rangle_{\mathrm{trials}}, \tag{A.3}$$

$$r_B(t) = \langle \rho_B(t) \rangle_{\mathrm{trials}}, \tag{A.4}$$

where ⟨···⟩_trials denotes an average over multiple trials in which the same time-dependent stimulus s(t') or the same motor behavior is observed. These spike rates are the probabilities per unit time for the occurrence of a single spike in either cell A or cell B, also called the PSTH. We can define the probability per unit time for a spike in cell A to occur at time t and a spike in cell B to occur at time t', and this will be the joint PSTH:

$$\mathrm{JPSTH}_{AB}(t, t') = \langle \rho_A(t)\, \rho_B(t') \rangle_{\mathrm{trials}}. \tag{A.5}$$

Alternatively, we can consider an event E defined by a spike in cell A at time t and a spike in cell B at time t − τ, with the relative time τ measured with a precision of Δt. Then the rate of these events is

$$r_E(t) = \int_{-\Delta t/2}^{\Delta t/2} dt'\, \mathrm{JPSTH}_{AB}(t, t - \tau + t') \tag{A.6}$$

$$\approx \Delta t\, \mathrm{JPSTH}_{AB}(t, t - \tau), \tag{A.7}$$

where the last approximation is valid if our time resolution is sufficiently high. Applying our general formula for the information carried by single events, equation 2.5, the information carried by pairs of spikes from two cells can be written as an integral over diagonal "strips" of the JPSTH matrix:

$$I(E; s) = \frac{1}{T} \int_0^T dt\, \frac{\mathrm{JPSTH}_{AB}(t, t-\tau)}{\langle \mathrm{JPSTH}_{AB}(t, t-\tau) \rangle_t} \log_2\!\left[\frac{\mathrm{JPSTH}_{AB}(t, t-\tau)}{\langle \mathrm{JPSTH}_{AB}(t, t-\tau) \rangle_t}\right], \tag{A.8}$$

where ⟨JPSTH_AB(t, t−τ)⟩_t is an average of the JPSTH over time; this average is equivalent to the standard correlation function between the two spike trains.
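As an illustration of this two-cell case, here is a sketch (binning, normalization, and the treatment of edges are our simplifications) in which the JPSTH is estimated as a trial-averaged product of binned spike trains and equation A.8 is evaluated along the diagonal at lag τ:

    import numpy as np

    def jpsth_diagonal_info(trials_A, trials_B, tau_bins, T, dt):
        # Information carried by the event "spike in A at time t and spike in B
        # at time t - tau", from the diagonal strip of the JPSTH (equation A.8).
        n_bins = int(round(T / dt))
        edges = np.linspace(0.0, T, n_bins + 1)
        diag = np.zeros(n_bins)
        for a, b in zip(trials_A, trials_B):
            rho_a = np.histogram(a, bins=edges)[0] / dt   # binned spike train, cell A
            rho_b = np.histogram(b, bins=edges)[0] / dt   # binned spike train, cell B
            # product of A at bin i with B at bin i - tau_bins (wrap-around at edges ignored)
            diag += rho_a * np.roll(rho_b, tau_bins)
        diag /= len(trials_A)                             # JPSTH_AB(t, t - tau) on the diagonal
        mean_diag = diag.mean()
        if mean_diag == 0.0:
            return 0.0
        x = diag / mean_diag
        nz = x > 0
        return np.sum(x[nz] * np.log2(x[nz])) / n_bins    # equation A.8 as a sum over bins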

The discussion by Vaadia et al. (1995) emphasizes that modulations of the JPSTH along the diagonal strips allow correlated firing events to convey information about sensory signals or behavioral states, and this information is quantified by equation A.8. The information carried by the individual cells is related to the corresponding integrals over spike rates, equation 2.9. The difference between the information conveyed by the compound spiking events E and the information conveyed by spikes in the two cells independently is precisely the synergy between the two cells at the given time lag τ. For τ = 0, it is the synergy (or extra information) conveyed by synchronous firing of the two cells.

We would like to connect this approach with previous work that focused on how events reduce our uncertainty about the stimulus (de Ruyter van Steveninck & Bialek, 1988). Before we observe the neural response, all we know is that stimuli are chosen from a distribution P[s(t')]. When we observe an event E at time t_E, this should tell us something about the stimulus in the neighborhood of this time, and this knowledge is described by the conditional distribution P[s(t')|t_E]. If we go back to the definition of the mutual information between responses and stimuli, we can write the average information conveyed by one event in terms of this conditional distribution:

$$I(E; s) = \int \mathcal{D}s(t') \int dt_E\, P[s(t'), t_E] \log_2\!\left(\frac{P[s(t'), t_E]}{P[s(t')]\, P[t_E]}\right)$$

$$= \int dt_E\, P[t_E] \int \mathcal{D}s(t')\, P[s(t') \,|\, t_E] \log_2\!\left(\frac{P[s(t') \,|\, t_E]}{P[s(t')]}\right). \tag{A.9}$$

If the system is stationary, then the coding should be invariant under time translations:

$$P[s(t') \,|\, t_E] = P[s(t' + \Delta t') \,|\, t_E + \Delta t']. \tag{A.10}$$

This invariance means that the integral over stimuli in equation A.9 is independent of the event arrival time t_E, so we can simplify our expression for the information carried by a single event:

$$I(E; s) = \int dt_E\, P[t_E] \int \mathcal{D}s(t')\, P[s(t') \,|\, t_E] \log_2\!\left(\frac{P[s(t') \,|\, t_E]}{P[s(t')]}\right) = \int \mathcal{D}s(t')\, P[s(t') \,|\, t_E] \log_2\!\left(\frac{P[s(t') \,|\, t_E]}{P[s(t')]}\right). \tag{A.11}$$

This formula was used by de Ruyter van Steveninck and Bialek (1988). To connect with our work here, we express the information in equation A.11 as an average over the stimulus:

$$I(E; s) = \left\langle \frac{P[s(t') \,|\, t_E]}{P[s(t')]} \log_2\!\left(\frac{P[s(t') \,|\, t_E]}{P[s(t')]}\right) \right\rangle_s. \tag{A.12}$$

Using Bayes' rule,

$$\frac{P[s(t') \,|\, t_E]}{P[s(t')]} = \frac{P[t_E \,|\, s(t')]}{P[t_E]} = \frac{r_E(t_E)}{\bar{r}_E}, \tag{A.13}$$

where the last term is a result of the distribution of event arrival times being proportional to the event rates, as defined above. Substituting this back into equation A.11, one finds equation 2.6.

Appendix B Experimental Setup

In the experiment we used a female blowfly, which was a first-generation offspring of a wild fly caught outside. The fly was put inside a plastic tube and immobilized with wax, with the head protruding out. The proboscis was left free so that the fly could be fed regularly with some sugar water. A small hole was cut in the back of the head, close to the midline on the right side. Through this hole a tungsten electrode was advanced into the lobula plate. This area, which is several layers back from the compound eye, includes a group of large motion-detector neurons with wide receptive fields and strong direction selectivity. We recorded spikes extracellularly from one of these, the contralateral H1 neuron (Franceschini, Riehle, & le Nestour, 1989; Hausen & Egelhaaf, 1989). The electrode was positioned such that spikes from H1 could be discriminated reliably and converted into pulses by a simple threshold discriminator. The pulses were fed into a CED 1401 interface (Cambridge Electronic Design), which digitized the spikes at 10 μs resolution. To keep exact synchrony over the duration of the experiment, the spike timing clock was derived from the same internal CED 1401 clock that defined the frame times of the visual stimulus.

The stimulus was a rigidly moving bar pattern, displayed on a Tektronix 608 high-brightness display. The radiance at average intensity Ī was about 180 mW/(m² × sr), which amounts to about 5 × 10⁴ effectively transduced photons per photoreceptor per second (Dubs, Laughlin, & Srinivasan, 1984). The bars were oriented vertically, with intensities chosen at random to be Ī(1 ± C), where C is the contrast. The distance between the fly and the screen was adjusted so that the angular subtense of a bar equaled the horizontal interommatidial angle in the stimulated part of the compound eye. This setting was found by determining the eye's spatial Nyquist frequency through the reverse reaction (Gotz, 1964) in the response of the H1 neuron. For this fly, the horizontal interommatidial angle was 1.45 degrees, and the distance to the screen 105 mm. The fly viewed the display through a round 80 mm diameter diaphragm, showing approximately 30 bars. From this we estimate the number of stimulated ommatidia in the eye's hexagonal raster to be about 612.

Frames of the stimulus pattern were refreshed every 2 ms, and with each new frame the pattern was displayed at a new position. This resulted in an apparent horizontal motion of the bar pattern, which is suitable to excite the H1 neuron. The pattern position was defined by a pseudorandom sequence, simulating independent random displacements at each time step, uniformly distributed between −0.47 degrees and +0.47 degrees (equivalent to −0.32 to +0.32 omm, horizontal ommatidial spacings). This corresponds to a diffusion constant of 18.1 (°)² per second, or 8.6 omm² per second. The sequence of pseudorandom numbers contained a repeating part and a nonrepeating part, each 10 seconds long, with the same statistical parameters. Thus in each 20 second cycle, the fly saw a 10 second movie that it had seen 20 seconds before, followed by a 10 second movie that was generated independently.

Acknowledgments

We thank G. Lewen and A. Schweitzer for their help with the experiments and N. Tishby for many helpful discussions. Work at the IAS was supported in part by DOE grant DE-FG02-90ER40542, and work at the IFSC was supported by the Brazilian agencies FAPESP and CNPq.

References

Abeles, M., Bergmann, H., Margalit, E., & Vaadia, E. (1993). Spatiotemporal firing patterns in the frontal cortex of behaving monkeys. J. Neurophysiol., 70, 1629-1638.

Adrian, E. (1928). The basis of sensation. London: Christophers.

Aertsen, A., Gerstein, G., Habib, M., & Palm, G. (1989). Dynamics of neuronal firing correlation: Modulation of "effective connectivity." J. Neurophysiol., 61(5), 900-917.

Bialek, W. (1990). Theoretical physics meets experimental neurobiology. In E. Jen (Ed.), 1989 Lectures in Complex Systems (pp. 513-595). Menlo Park, CA: Addison-Wesley.

de Ruyter van Steveninck, R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Coding and information transfer in short spike sequences. Proc. R. Soc. Lond. Ser. B, 234, 379.

de Ruyter van Steveninck, R. R., Lewen, G., Strong, S., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805.

DeWeese, M. (1996). Optimization principles for the neural code. Network, 7, 325-331.

DeWeese, M., & Meister, M. (1998). How to measure information gained from one symbol. APS Abstracts.

Dubs, A., Laughlin, S., & Srinivasan, M. (1984). Single photon signals in fly photoreceptors and first order interneurones at behavioural threshold. J. Physiol., 317, 317-334.

Franceschini, N., Riehle, A., & le Nestour, A. (1989). Directionally selective motion detection by insect neurons. In D. Stavenga & R. Hardie (Eds.), Facets of Vision (p. 360). Berlin: Springer-Verlag.

Gotz, K. (1964). Optomotorische Untersuchung des Visuellen Systems einiger Augenmutanten der Fruchtfliege Drosophila. Kybernetik, 2, 77-92.

Hausen, K., & Egelhaaf, M. (1989). Neural mechanisms of visual course control in insects. In D. Stavenga & R. Hardie (Eds.), Facets of Vision (p. 391). Berlin: Springer-Verlag.

Hopfield, J. J. (1995). Pattern recognition computation using action potential timing for stimulus representation. Nature, 376, 33-36.

Land, M. F., & Collett, T. S. (1974). Chasing behavior of houseflies (Fannia canicularis): A description and analysis. J. Comp. Physiol., 89, 331-357.

MacKay, D., & McCulloch, W. S. (1952). The limiting information capacity of a neuronal link. Bull. Math. Biophys., 14, 127-135.

Magelby, K. (1987). Short-term synaptic plasticity. In G. M. Edelman, V. E. Gall, & K. M. Cowan (Eds.), Synaptic function (pp. 21-56). New York: Wiley.

Meister, M. (1995). Multineuronal codes in retinal signaling. Proc. Nat. Acad. Sci. (USA), 93, 609-614.

Optican, L. M., & Richmond, B. J. (1987). Temporal encoding of two-dimensional patterns by single units in primate inferior temporal cortex. III. Information theoretic analysis. J. Neurophys., 57, 162-178.

Panzeri, S., Biella, G., Rolls, E. T., Skaggs, W. E., & Treves, A. (1996). Speed, noise, information and the graded nature of neuronal responses. Network, 7, 365-370.

Rieke, F., Warland, D., de Ruyter van Steveninck, R., & Bialek, W. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.

Shannon, C. E. (1948). A mathematical theory of communication. Bell Sys. Tech. J., 27, 379-423, 623-656.

Singer, W., & Gray, C. M. (1995). Visual feature integration and the temporal correlation hypothesis. Ann. Rev. Neurosci., 18, 555-586.

Skaggs, W. E., McNaughton, B. L., Gothard, K. M., & Markus, E. J. (1993). An information-theoretic approach to deciphering the hippocampal code. In S. J. Hanson, J. D. Cowan, & C. L. Giles (Eds.), Advances in neural information processing 5 (pp. 1030-1037). San Mateo, CA: Morgan Kaufmann.

Strong, S. P., Koberle, R., de Ruyter van Steveninck, R. R., & Bialek, W. (1998). Entropy and information in neural spike trains. Phys. Rev. Lett., 80, 197-200.

Usrey, W. M., Reppas, J. B., & Reid, R. C. (1998). Paired-spike interactions and synaptic efficiency of retinal inputs to the thalamus. Nature, 395, 384.

Vaadia, E., Haalman, I., Abeles, M., Bergman, H., Prut, Y., Slovin, H., & Aertsen, A. (1995). Dynamics of neural interactions in monkey cortex in relation to behavioural events. Nature, 373, 515-518.

Received February 22, 1999; accepted May 20, 1999.

Page 4: Synergy in a Neural Code - Princeton Universitywbialek/our_papers/brenner+al_00b.pdfSynergy in a Neural Code 1533 of measuring the information (in bits) carried by particular patterns

1534 Brenner Strong Koberle Bialek and de Ruyter van Steveninck

conveyed on average by an event E in terms of quantities that are easilymeasured experimentally1

In experiments as in nature the animal is exposed to stimuli at eachinstant of time We can describe this sensory input by a function s(t0 ) whichmay have many components to parameterize the time dependence of mul-tiple stimulus features In general the information gained about s(t0 ) byobserving a set of neural responses is

I DX

responses

ZDs(t0 )P[s(t0 ) amp response] log2

iexclP[s(t0 ) amp response]P[s(t0 )]P[response]

cent (21)

where information is measured in bits It is useful to note that this mutualinformation can be rewritten in two complementary forms

I D iexclZ

Ds(t0 )P[s(t0 )] log2 P[s(t0 )]

CX

responsesP[response]

ZDs(t0 )P[s(t0 ) |response] log2 P[s(t0 )|response]

D S[P(s)] iexcl hS[P(s|response)]iresponse (22)

or

I D iexclX

responsesP(response) log2 P(response)

CZ

Ds(t0 )P[s(t0 )]X

responsesP[response|s(t0)] log2 P[response|s(t0 )]

D S[P(response)] iexcl hS[P(response |s)]is (23)

where S denotes the entropy of a distribution by hcent cent centis we mean an av-erage over all possible values of the sensory stimulus weighted by theirprobabilities of occurrence and similarly for hcent cent centiresponses In the rst formequation 22 we focus on what the responses are telling us about the sen-sory stimulus (de Ruyter van Steveninck amp Bialek 1988) different responsespoint more or less reliably to different signals and our average uncertaintyabout the sensory signal is reduced by observing the neural response In thesecond form equation 23 we focus on the variability and reproducibilityof the neural response (de Ruyter van Steveninck Lewen Strong Koberle

1 In our formulation an event E is a random variable that can have several outcomesfor example it can occur at different times in the experiment The information conveyed bythe event about the stimulus is an average over the joint distribution of stimuli and eventoutcomes One can also associate an information measure with individual occurrences ofevents (DeWeese amp Meister 1998)The averageof this measure over the possible outcomesis the mutual information as dened by Shannon (1948) and used in this article

Synergy in a Neural Code 1535

amp Bialek 1997 Strong et al 1998) The range of possible responses providesthe system with a capacity to transmit information and the variability thatremains when the sensory stimulus is specied constitutes noise the differ-ence between the capacity and the noise is the information

We would like to apply this second form to the case where the neuralresponse is a particular type of event When we observe an event E in-formation is carried by the fact that it occurs at some particular time tEThe range of possible responses is then the range of times 0 lt tE lt T inour observation window Alternatively when we observe the response in aparticular small time bin of size D t information is carried by the fact thatthe event E either occurs or does not The range of possible responses thenincludes just two possibilities Both of these points of view have an arbitraryelement the choice of bin size D t and the window size T Characterizingthe properties of the system as opposed to our observation power requirestaking the limit of high time resolution (D t 0) and long observation times(T 1) As will be shown in the next section in this limit the two pointsof view give the same answer for the information carried by an event

22 Information and Event Rates A crucial role is played by the eventrate rE(t) the probability per unit time that an event of type E occurs attime t given the stimulus history s(t0 ) Empirical construction of the eventrate rE(t) requires repetition of the same stimulus history many times sothat a histogram can be formed (see Figure 1) For the case where events aresingle spikes this is the familiar time-dependent ring rate or poststimulustime histogram (see Figure 1c) the generalization to other types of eventsis illustrated by Figures 1d and 1e Intuitively a uniform event rate impliesthat no information is transmitted whereas the presence of sharply denedfeatures in the event rate implies that much information is transmitted bythese events (see for example the discussion by Vaadia et al 1995) We nowformalize this intuition and show how the average information carried bya single event is related quantitatively to the time-dependent event rate

Let us take the rst point of view about the neural response variable inwhich the range of responses is described by the possible arrival times tof the event 2 What is the probability of nding the event at a particulartime t Before we know the stimulus history s(t0 ) all we can say is that theevent can occur anywhere in our experimental window of size T so thatthe probability is uniform P(response) D P(t) D 1 T with an entropy ofS[P(t)] D log2 T Once we know the stimulus we also know the event raterE(t) and so our uncertainty about the occurrence time of the event is re-duced Events will occur preferentially at times where the event rate is largeso the probability distribution should be proportional to rE(t) with proper

2 We assume that the experiment is long enough so that a typical event is certain tooccur at some time

1536 Brenner Strong Koberle Bialek and de Ruyter van Steveninck

Figure 1 Generalized event rates in the stimulus-conditional response ensem-ble A time-dependent visual stimulus is shown to the y (a) with the timeaxis dened to be zero at the beginning of the stimulus This stimulus runs for10 s and is repeatedly presented 360 times The responses of the H1 neuron to60 repetitions are shown as a raster (b) in which each dot represents a singlespike From these responses time-dependent event rates rE(t) are estimated(c) the ring rate (poststimulus time histogram) (d) the rate for spike pairs withinterspike time t D 3 sect 1 ms and (e) for pairs with t D 17 sect 1 ms (e) These ratesallow us to compute directly the information transmitted by the events usingequation 25

Synergy in a Neural Code 1537

normalization P(response |s) D P(t|s) D rE(t) (TNrE) Then the conditionalentropy is

S[P(t|s)] D iexclZ T

0dt P(t|s) log2 P(t|s)

D iexcl1T

Z T

0dt

rE(t)NrE

log2

iexclrE(t)NrET

cent (24)

In principleone should average this quantity over the distribution of stimulis(t0 ) however if the time T is long enough and the stimulus history suf-ciently rich the ensemble average and the time average are equivalent Thevalidity of this assumption can be checked experimentally (see Figure 2c)The reduction in entropy is then the gain in information so

I(EI s) D S[P(t)] iexcl S[P(t|s)]

D1T

Z T

0dt

iexclrE(t)NrE

centlog2

iexclrE(t)NrE

cent (25)

or equivalently

I(EI s) Dfrac12 iexclrE(t)

NrE

centlog2

iexclrE(t)NrE

centfrac34

s (26)

where here the average is over all possible value of s weighted by theirprobabilities P(s)

In the second view the neural response is a binary random variablesE 2 f0 1g marking the occurrence or nonoccurrence of an event of type E ina small time bin of size D t Suppose for simplicity that the stimulus takes ona nite set of values s with probabilities P(s) These in turn induce the event Ewith probability pE(s) D P(sE D 1 |s) D rE(s)D t with an average probabilityfor the occurrenceof the event NpE D P

s P(s) rE(s)D t D NrED t The informationis the difference between the prior entropy and the conditional entropyI(EI s) D S(s) iexcl hS(s |sE)i where the conditional entropy is an average overthe two possible values of sE The conditional probabilities are found fromBayesrsquo rule

P(s |sE D 1) DP(s)pE(s)

NpE

P(s |sE D 0) DP(s)(1 iexcl pE(s))

(1 iexcl NpE) (27)

and with these one nds the information

I(EI s) D iexclX

sP(s) log2 P(s) C

X

sED01P(s|sE) log2 P(s |sE)

DX

sP(s)

micropE(s) log2

iexclpE(s)

NpE

centC (1 iexcl pE(s)) log2

iexcl1 iexcl pE(s)1 iexcl NpE

centpara (28)

1538 Brenner Strong Koberle Bialek and de Ruyter van Steveninck

Taking the limit D t 0 consistent with the requirement that the eventcan occur at most once one nds the average information conveyed in asmall time bin dividing by the average probability of an event one obtainsequation 26 as the information per event

Equation 26 and its time-averaged form equation 25 are exact formulasthat are valid in any situation where a rich stimulus history can be presented

Synergy in a Neural Code 1539

repeatedly It enables the evaluation of the information for arbitrarily com-plex events independent of assumptions about the encoding algorithmThe information is an average over the joint distribution of stimuli and re-sponses But rather than constructing this joint distribution explicitly andthen averaging equation 25 expresses the information as a direct empiricalaverage by estimating the function rE(t) as a histogram we are sampling thedistribution of the responses given a stimulus whereas by integrating overtime we are sampling the distribution of stimuli This formula is generaland can be applied for different systems under different experimental con-ditions The numerical result (measured in bits) will of course depend onthe neural system as well as on the properties of the stimulus distributionThe error bars on the measurement of the information are affected by theniteness of the data for example sampling must be sufcient to constructthe rates rE(t) reliably These and other practical issues in using equation 25are illustrated in detail in Figure 2

23 The Special Case of Single Spikes Let us consider in more detailthe simple case where events are single spikes The average informationconveyed by a single spike becomes an integral over the time-dependentspike rate r(t)

I(1 spikeI s) D1T

Z T

0dt

iexclr(t)Nrcent

log2

iexclr(t)Nrcent

(29)

It makes sense that the information carried by single spikes should be re-lated to the spike rate since this rate as a function of time gives a complete

Figure 2 Facing page Finite size effects in the estimation of the informationconveyed by single spikes (a) Information as a function of the bin size D t usedfor computing the time-dependent rate r(t) from all 360 repetitions (circles) andfrom 100 of the repetitions (crosses) A linear extrapolation to the limit D t 0 isshown for the case where all repetitions were used (solid line) (b) Informationas a function of the inverse number of repetitions N for a xed bin size D t D2 ms (c) Systematic errors due to nite duration of the repeated stimulus s(t)The full 10-second length of the stimulus was subdivided into segments ofduration T Using equation 29 the information was calculated for each segmentand plotted as a function of 1 T (circles) If the stimulus is sufciently long thatan ensemble average is well approximated by a time average the convergenceof the information to a stable value as T 1 should be observable Furtherthe standard deviation of the information measured from different segments oflength T (error bars) should decrease as a square root law s 1

pT (dashed

line) These data provide an empirical verication that for the distribution usedin this experiment the repeated stimulus time is long enough to approximateergodic sampling

1540 Brenner Strong Koberle Bialek and de Ruyter van Steveninck

description of the ldquoone-bodyrdquo statistics of the spike train in the same waythat the single particle density describes the one-body statistics of a gasor liquid This is true independent of any models or assumptions and wesuggest the use of equations 29 and 25 to test directly in any given experi-mental situation whether many body effects increase or reduce informationtransmission

We note that considerable attention has been given to the problem ofmaking accurate statistical models for spike trains In particular the sim-plest modelmdashthe modulated Poisson processmdashis often used as a standardagainst which real neurons are compared It is tempting to think that Pois-son behavior is equivalent to independent transmission of information bysuccessive spikes but this is not the case (see below) Of course no real neu-ron is precisely Poisson and it is not clear which deviations from Poissonbehavior are signicant Rather than trying to t statistical models to thedata and then to try to understand the signicance of spike train statistics forneural coding we see that it is possible to quantify directly the informationcarried by single spikes and compound patterns Thus without referenceto statistical models we can determine whether successive spikes carry in-dependent redundant or synergistic information

Several previous works have noted the relation between spike rate andinformation and the formula in equation 29 has an interesting historyFor a spike train generated by a modulated Poisson process equation 29provides an upper bound on information transmission (per spike) by thespike train as a whole (Bialek 1990) Even in this simple case spikes do notnecessarily carry independent information slow modulations in the ringrate can cause them to be redundant In studying the coding of location bycells in the rat hippocampus Skaggs McNaughton Gothard and Markus(1993) assumed that successive spikes carried independent informationand that the spike rate was determined by the instantaneous location Theyobtained equation 29 with the time average replaced by an average overlocations DeWeese (1996) showed that the rate of information transmissionby a spike train could be expanded in a series of integrals over correlationfunctions where successive terms would be small if the number of spikesper correlation time were small the leading term which would be exact ifspikes were uncorrelated is equation 29 Panzeri Biella Rolls Skaggs andTreves (1996) show that equation 29 multiplied by the mean spike rate togive an information rate (bits per second) is the correct information rateif we count spikes in very brief segments of the neural response which isequivalent to asking for the information carried by single spikes For furtherdiscussion of the relation to previous work see appendix A

A crucial point here is the generalization to equation 25 and this re-sult applies to the information content of any point events in the neuralresponsemdashpairs of spikes with a certain separation coincident spikes fromtwo cells and so onmdashnot just single spikes In the analysis of experimentswe will emphasize the use of this formula as an exact result for the infor-

Synergy in a Neural Code 1541

mation content of single events rather than an approximate result for thespike train as a whole and this approach will enable us to address questionsconcerning the structure of the code and the role played by various typesof events

3 Experiments in the Fly Visual System

In this section we use our formalism to analyze experiments on the move-ment-sensitive cell H1 in the visual system of the blowy Calliphora vicinaWe address the issue of the information conveyed by pairs of spikes in thisneuron as compared to the information conveyed independently by sin-gle spikes The quantitative results of this sectionmdashinformation content inbits effects of synergy and redundancy among spikesmdashare specic to thissystem and to the stimulus conditions used The theoretical approach how-ever is valid generally and can be applied similarly to other experimentalsystems to nd out the signicance of various patterns in single cells or ina population

31 Synergy Between Spikes In the experiment (see appendix B fordetails) the horizontal motion across the visual eld is the input sensorystimulus s(t) drawn from a probability distribution P[s(t)] The spike trainrecorded from H1 is the neural response Figure 1a shows a segment of thestimulus presented to the y and Figure 1b illustrates the response to manyrepeated presentations of this segment The histogram of spike times acrossthe ensemble of repetitions provides an estimate of the spike rate r(t) (seeFigure 1c) and equation 25 gives the information carried by a single spikeI(1 spikeI s) D 153 sect 005 bits Figure 2 illustrates the details of how theformula was used with an emphasis on the effects of niteness of the dataIn this experiment a stimulus of length T D 10 sec was repeated 360 timesAs seen from Figure 2 the calculation converges to a stable result withinour nite data set

If each spike were to convey information independently then with themean spike rate Nr D 37 spikes per second the total information rate wouldbe Rinfo D 56 bits per second We used the variability and reproducibilityof continuous segments in the neural response (de Ruyter van Stevenincket al 1997 Strong et al 1998) to estimate the total information rate in thespike train in this experiment and found that Rinfo D 75 bits per secondThus the information conveyed by the spike train as a whole is larger thanthe sum of contributions from individual spikes indicating cooperativeinformation transmission by patterns of spikes in time This synergy amongspikes motivates the search for especially informative patterns in the spiketrain

We consider compound events that consist of two spikes separated bya time t with no constraints on what happens between them Figure 1shows segments of the event rate rt (t) for t D 3 (sect1) ms (see Figure 1d)

1542 Brenner Strong Koberle Bialek and de Ruyter van Steveninck

Figure 3 Information about the signal transmitted by pairs of spikes computedfrom equation 25 as a function of the time separation between the two spikesThe dotted line shows the information that would be transmitted by the twospikes independently (twice the single-spike information)

and for t D 17 (sect1) ms (see Figure 1e) The information carried by spikepairs as a function of the interspike time t computed from equation 25 isshown in Figure 3 For large t spikes contribute independent informationas expected This independence is established within approximately 30 to40 ms comparable to the behavioral response times of the y (Land ampCollett 1974) There is a mild redundancy (about 10ndash20) at intermediateseparations and a very large synergy (up to about 130) at small t

Related results were obtained using the correlation of spike patterns withstimulus features (de Ruyter van Steveninck amp Bialek 1988) There the infor-mation carried by spikepatterns was estimated from the distribution of stim-uli given each pattern thus constructing a statistical model of what the pat-terns ldquostand forrdquo (see the details in appendix A) Since the time-dependentstimulus is in general of high dimensionality its distribution cannot besampled directly and some approximations must be made de Ruyter vanSteveninck and Bialek (1988) made the approximation that patterns of a fewspikes encode projections of the stimulus onto low-dimensional subspacesand further that the conditional distributions P[s(t)|E] are gaussian The in-formation obtained with these approximations provides a lower bound tothe true information carried by the patterns as estimated directly with themethods presented here

Synergy in a Neural Code 1543

32 Origins of Synergy Synergy means quite literally that two spikestogether tell us more than two spikes separately Synergistic coding is oftendiscussed for populations of cells where extra information is conveyedby patterns of coincident spikes from several neurons (Abeles BergmannMargalit amp Vaadia 1993 Hopeld 1995 Meister 1995 Singer amp Gray 1995)Here we see direct evidence for extra information in pairs of spikes acrosstime The mathematical framework for describing these effects is the sameand a natural question is What are the conditions for synergistic coding

The average synergy Syn[E1 E2I s] between two events E1 and E2 is thedifference between the information about the stimulus s conveyed by thepair and the information conveyed by the two events independently

Syn[E1 E2I s] D I[E1 E2I s] iexcl (I[E1I s] C I[E2I s]) (31)

We can rewrite the synergy as

Syn[E1 E2I s] D I[E1I E2 |s] iexcl I[E1I E2] (32)

The rst term is the mutual information between the events computed acrossan ensemble of repeated presentations of the same stimulus history It de-scribes the gain in information due to the locking of compound event (E1 E2)to particular stimulus features If events E1 and E2 are correlated individ-ually with the stimulus but not with one another this term will be zeroand these events cannot be synergistic on average The second term is themutual information between events when the stimulus is not constrainedor equivalently the predictability of event E2 from E1 This predictabilitylimits the capacity of E2 to carry information beyond that already conveyedby E1 Synergistic coding (Syn gt 0) thus requires that the mutual informa-tion among the spikes is increased by specifying the stimulus which makesprecise the intuitive idea of ldquostimulus-dependent correlationsrdquo (Abeles etal 1993 Hopeld 1995 Meister 1995 Singer amp Gray 1995)

Returning to our experimental example we identify the events E1 and E2as the arrivals of two spikes and consider the synergy as a function of thetime t between them In terms of event rates we compute the informationcarried by a pair of spikes separated by a time t equation 25 as well as theinformation carried by two individual spikes The difference between thesetwo quantities is the synergy between two spikes which can be written as

Syn(t ) D iexcl log2

iexclNrt

Nr2

centC

1T

Z T

0dt

rt (t)Nrt

log2

micro rt (t)r(t)r(t iexcl t )

para

C1T

Z T

0dt

micro rt (t)

Nrt

Crt (t C t )

Nrt

iexcl 2r(t)

Nr

paralog2[r(t)] (33)

The rst term in this equation is the logarithm of the normalized correlationfunction and hence measures the rarity of spike pairs with separation t

1544 Brenner Strong Koberle Bialek and de Ruyter van Steveninck

the average of this term over t is the mutual information between events(the second term in equation 32) The second term is related to the localcorrelation function and measures the extent to which the stimulus mod-ulates the likelihood of spike pairs The average of this term over t givesthe mutual information conditional on knowledge of the stimulus (the rstterm in equation 32) The average of the third term over t is zero and nu-merical evaluation of this term from the data shows that it is negligible atmost values of t

We thus nd that the synergy between spikes is approximately a sumof two terms whose averages over t are the terms in equation 32 A spikepair with a separation t then has two types of contributions to the extrainformation it carries the two spikes can be correlated conditional on thestimulus or the pair could be a rare and thus surprising event The rarityof brief pairs is related to neural refractoriness but this effect alone is in-sufcient to enhance information transmission the rare events must alsobe related reliably to the stimulus In fact conditional on the stimulus thespikes in rare pairs are strongly correlated with each other and this is visiblein Figure 1a from trial to trial adjacent spikes jitter together as if connectedby a stiff spring To quantify this effect we nd for each spike in one trialthe closest spike in successive trials and measure the variance of the arrivaltimes of these spikes Similarly we measure the variance of the interspiketimes Figure 4a shows the ratio of the interspike time variance to the sum ofthe arrival time variances of the spikes that make up the pair For large sep-arations this ratio is unity as expected if spikes are locked independentlyto the stimulus but as the two spikes come closer it falls below one-quarter

Both the conditional correlation among the members of the pair (see Fig-ure 4a) and the relative synergy (see Figure 4b) depend strongly on the inter-spike separation This dependence is nearly invariant to changes in imagecontrast although the spike rate and other statistical properties are stronglyaffected by such changes Brief spike pairs seem to retain their identity asspecially informative symbols over a range of input ensembles If particulartemporal patterns are especially informative then we would lose informa-tion if we failed to distinguish among different patterns Thus there are twonotions of time resolution for spike pairs the time resolution with which theinterspike time is dened and the absolute time resolution with which theevent is marked Figure 5 shows that for small interspike times the infor-mation is much more sensitive to changes in the interspike time resolution(open symbols) than to the absolute time resolution (lled symbols) This isrelated to the slope in Figure 2 in regions where the slope is large eventsshould be nely distinguished in order to retain the information

3.3 Implications of Synergy. The importance of spike timing in the neural code has been under debate for some time now. We believe that some issues in this debate can be clarified using a direct information-theoretic approach.


Figure 4: (a) Ratio between the variance of the interspike time and the sum of variances of the two spike times. Variances are measured across repeated presentations of the same stimulus, as explained in the text. This ratio is plotted as a function of the interspike time τ for two experiments with different image contrast. (b) Extra information conveyed cooperatively by pairs of spikes, expressed as a fraction of the information conveyed by the two spikes independently. While the single-spike information varies with contrast (1.5 bits per spike for C = 0.1, compared to 1.3 bits per spike for C = 1), the fractional synergy is almost contrast independent.

Following MacKay and McCulloch (1952), we know that marking spike arrival times with higher resolution provides an increased capacity for information transmission. The work of Strong et al. (1998) shows that for the fly's H1 neuron, the increased capacity associated with spike timing indeed is used with nearly constant efficiency down to millisecond resolution. This efficiency can be the result of a tight locking of individual spikes to a rapidly varying stimulus, and it could also be the result of temporal patterns providing information beyond rapid rate modulations.


Figure 5: Information conveyed by spike pairs as a function of time resolution. An event (a pair of spikes) can be described by two times: the separation between spikes (relative time) and the occurrence time of the event with respect to the stimulus (absolute time). The information carried by the pair depends on the time resolution in these two dimensions, both specified by the bin size Δt along the abscissa. Open symbols are measurements of the information for a fixed 2 ms resolution of absolute time and a variable resolution Δt of relative time. Closed symbols correspond to a fixed 2 ms resolution of relative time and a variable resolution Δt of absolute time. For short intervals, the sensitivity to coarsening of the relative time resolution is much greater than to coarsening of the absolute time resolution. In contrast, sensitivity to relative and absolute time resolutions is the same for the longer, nonsynergistic interspike separations.

The analysis given here shows that for H1, pairs of spikes can provide much more information than two individual spikes, and information transmission is much more sensitive to the relative timing of spikes than to their absolute timing. The synergy is an inherent property of particular spike pairs, as it persists when averaged over all occurrences of the pair in the experiment.

One may ask about the implications of synergy for behavior. Flies can respond to changes in a motion stimulus over a time scale of approximately 30 to 40 ms (Land & Collett, 1974). We have found that the spike train of H1 as a whole conveyed about 20 bits per second more than the information conveyed independently by single spikes (see Section 3.1). This implies that over a behavioral response time, synergistic effects provide about 0.8 bit of additional information, equivalent to almost a factor of two in resolving power for distinguishing different trajectories of motion across the visual field.
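For concreteness, the arithmetic behind this estimate uses only the numbers quoted above:

\Delta I \approx (20~\mathrm{bits/s}) \times (0.04~\mathrm{s}) \approx 0.8~\mathrm{bits}, \qquad 2^{0.8} \approx 1.7,

so over one behavioral response time, the synergistic contribution nearly doubles the number of distinguishable motion trajectories.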


4 Summary

Information theory allows us to measure synergy and redundancy among spikes, independent of the rules for translating between spikes and stimuli. In particular, this approach tests directly the idea that patterns of spikes are special events in the code, carrying more information than expected by adding the contributions from individual spikes. It is of practical importance that the formulas rely on low-order statistical measures of the neural response and hence do not require enormous data sets to reach meaningful conclusions. The method is of general validity and is applicable to patterns of spikes across a population of neurons as well as across time. Specific results (existence of synergy or redundancy, number of bits per spike, etc.) depend on the neural system as well as on the stimulus ensemble used in each experiment.

In our experiments on the fly visual system, we found that an event composed of a pair of spikes can carry far more than the information carried independently by its parts. Two spikes that occur in rapid succession appear to be special symbols that have an integrity beyond the locking of individual spikes to the stimulus. This is analogous to the encoding of sounds in written English: each of the compound symbols th, sh, and ch stands for sounds that are not decomposable into sounds represented by each of the constituent letters. For spike pairs to act effectively as special symbols, mechanisms for "reading" them must exist at subsequent levels of processing. Synaptic transmission is sensitive to interspike times in the 2–20 ms range (Magleby, 1987), and it is natural to suggest that synaptic mechanisms on this timescale play a role in such reading. Recent work on the mammalian visual system (Usrey, Reppas, & Reid, 1998) provides direct evidence that pairs of spikes close together in time can be especially efficient in driving postsynaptic neurons.

Appendix A Relation to Previous Work

Patterns of spikes and their relation to sensory stimuli have been quantified in the past using correlation functions. The event rates that we have defined here, which are connected directly to the information carried by patterns of spikes through equation 2.5, are just properly normalized correlation functions. The event rate for pairs of spikes from two separate neurons is related to the joint poststimulus time histogram (PSTH) defined by Aertsen, Gerstein, Habib, and Palm (1989; Vaadia et al., 1995). Making this connection explicit is also an opportunity to see how the present formalism applies to events defined across two cells.

Consider two cells, A and B, generating spikes at times {t_i^A} and {t_i^B}, respectively. It will be useful to think of the spike trains as sums of unit impulses at the spike times:

\rho_A(t) = \sum_i \delta(t - t_i^A),   (A.1)

\rho_B(t) = \sum_i \delta(t - t_i^B).   (A.2)

Then the time-dependent spike rates for the two cells are

r_A(t) = \langle \rho_A(t) \rangle_{\mathrm{trials}},   (A.3)

r_B(t) = \langle \rho_B(t) \rangle_{\mathrm{trials}},   (A.4)

where ⟨···⟩_trials denotes an average over multiple trials in which the same time-dependent stimulus s(t′) is presented or the same motor behavior is observed. These spike rates are the probabilities per unit time for the occurrence of a single spike in either cell A or cell B, also called the PSTH. We can define the probability per unit time for a spike in cell A to occur at time t and a spike in cell B to occur at time t′, and this will be the joint PSTH:

\mathrm{JPSTH}_{AB}(t, t') = \langle \rho_A(t)\, \rho_B(t') \rangle_{\mathrm{trials}}.   (A.5)

Alternatively, we can consider an event E defined by a spike in cell A at time t and a spike in cell B at time t − τ, with the relative time τ measured with a precision of Δt. Then the rate of these events is

r_E(t) = \int_{-\Delta t/2}^{\Delta t/2} dt' \, \mathrm{JPSTH}_{AB}(t, t - \tau + t')   (A.6)

\approx \Delta t \, \mathrm{JPSTH}_{AB}(t, t - \tau),   (A.7)

where the last approximation is valid if our time resolution is sufficiently high. Applying our general formula for the information carried by single events, equation 2.5, the information carried by pairs of spikes from two cells can be written as an integral over diagonal "strips" of the JPSTH matrix:

I(E; s) = \frac{1}{T} \int_0^T dt \, \frac{\mathrm{JPSTH}_{AB}(t, t-\tau)}{\langle \mathrm{JPSTH}_{AB}(t, t-\tau) \rangle_t} \log_2\!\left[ \frac{\mathrm{JPSTH}_{AB}(t, t-\tau)}{\langle \mathrm{JPSTH}_{AB}(t, t-\tau) \rangle_t} \right],   (A.8)

where ⟨JPSTH_AB(t, t − τ)⟩_t is an average of the JPSTH over time; this average is equivalent to the standard correlation function between the two spike trains.
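A minimal sketch of this construction is given below. It assumes NumPy and hypothetical lists `trials_a` and `trials_b` of trial-aligned spike-time arrays; the binning and normalization choices are illustrative, not those of the original analysis.

```python
import numpy as np

def binned_rates(trials, T, dt):
    """Turn lists of spike times into a (trials x bins) array of unit-impulse trains."""
    nbins = int(round(T / dt))
    out = np.zeros((len(trials), nbins))
    for k, spikes in enumerate(trials):
        idx = np.clip((np.asarray(spikes) / dt).astype(int), 0, nbins - 1)
        np.add.at(out[k], idx, 1)
    return out / dt                          # approximate rho(t), in spikes/s

def jpsth(trials_a, trials_b, T, dt):
    """Equation A.5: JPSTH_AB(t, t') = <rho_A(t) rho_B(t')>, averaged over trials."""
    ra = binned_rates(trials_a, T, dt)
    rb = binned_rates(trials_b, T, dt)
    return np.einsum('kt,ku->tu', ra, rb) / ra.shape[0]

def pair_info_from_jpsth(J, lag_bins):
    """Equation A.8: information in 'spike in A at t, spike in B at t - tau',
    computed from the diagonal strip of the JPSTH at lag tau = lag_bins * dt."""
    strip = np.diagonal(J, offset=-lag_bins)  # JPSTH_AB(t, t - tau)
    mean = strip.mean()
    if mean == 0:
        return 0.0
    out = np.zeros_like(strip, dtype=float)
    nz = strip > 0
    out[nz] = (strip[nz] / mean) * np.log2(strip[nz] / mean)
    return out.mean()
```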

The discussion by Vaadia et al. (1995) emphasizes that modulations of the JPSTH along the diagonal strips allow correlated firing events to convey information about sensory signals or behavioral states, and this information is quantified by equation A.8. The information carried by the individual cells is related to the corresponding integrals over spike rates, equation 2.9. The difference between the information conveyed by the compound spiking events E and the information conveyed by spikes in the two cells independently is precisely the synergy between the two cells at the given time lag τ. For τ = 0, it is the synergy, or extra information, conveyed by synchronous firing of the two cells.

We would like to connect this approach with previous work that focused on how events reduce our uncertainty about the stimulus (de Ruyter van Steveninck & Bialek, 1988). Before we observe the neural response, all we know is that stimuli are chosen from a distribution P[s(t′)]. When we observe an event E at time t_E, this should tell us something about the stimulus in the neighborhood of this time, and this knowledge is described by the conditional distribution P[s(t′)|t_E]. If we go back to the definition of the mutual information between responses and stimuli, we can write the average information conveyed by one event in terms of this conditional distribution:

I(E; s) = \int Ds(t') \int dt_E \, P[s(t'), t_E] \log_2\!\left( \frac{P[s(t'), t_E]}{P[s(t')]\, P[t_E]} \right)
        = \int dt_E \, P[t_E] \int Ds(t') \, P[s(t')|t_E] \log_2\!\left( \frac{P[s(t')|t_E]}{P[s(t')]} \right).   (A.9)

If the system is stationary, then the coding should be invariant under time translations:

P[s(t')|t_E] = P[s(t' + \Delta t')|t_E + \Delta t'].   (A.10)

This invariance means that the integral over stimuli in equation A.9 is independent of the event arrival time t_E, so we can simplify our expression for the information carried by a single event:

I(E; s) = \int dt_E \, P[t_E] \int Ds(t') \, P[s(t')|t_E] \log_2\!\left( \frac{P[s(t')|t_E]}{P[s(t')]} \right)
        = \int Ds(t') \, P[s(t')|t_E] \log_2\!\left( \frac{P[s(t')|t_E]}{P[s(t')]} \right).   (A.11)

This formula was used by de Ruyter van Steveninck and Bialek (1988). To connect with our work here, we express the information in equation A.11 as an average over the stimulus:

I(E; s) = \left\langle \frac{P[s(t')|t_E]}{P[s(t')]} \log_2\!\left( \frac{P[s(t')|t_E]}{P[s(t')]} \right) \right\rangle_s.   (A.12)

Using Bayes' rule,

\frac{P[s(t')|t_E]}{P[s(t')]} = \frac{P[t_E|s(t')]}{P[t_E]} = \frac{r_E(t_E)}{\bar{r}_E},   (A.13)

where the last term is a result of the distributions of event arrival times being proportional to the event rates, as defined above. Substituting this back into equation A.11, one finds equation 2.6.

Appendix B Experimental Setup

In the experiment, we used a female blowfly, which was a first-generation offspring of a wild fly caught outside. The fly was put inside a plastic tube and immobilized with wax, with the head protruding out. The proboscis was left free so that the fly could be fed regularly with some sugar water. A small hole was cut in the back of the head, close to the midline on the right side. Through this hole, a tungsten electrode was advanced into the lobula plate. This area, which is several layers back from the compound eye, includes a group of large motion-detector neurons with wide receptive fields and strong direction selectivity. We recorded spikes extracellularly from one of these, the contralateral H1 neuron (Franceschini, Riehle, & le Nestour, 1989; Hausen & Egelhaaf, 1989). The electrode was positioned such that spikes from H1 could be discriminated reliably and converted into pulses by a simple threshold discriminator. The pulses were fed into a CED 1401 interface (Cambridge Electronic Design), which digitized the spikes at 10 μs resolution. To keep exact synchrony over the duration of the experiment, the spike timing clock was derived from the same internal CED 1401 clock that defined the frame times of the visual stimulus.

The stimulus was a rigidly moving bar pattern, displayed on a Tektronix 608 high-brightness display. The radiance at average intensity Ī was about 180 mW/(m² sr), which amounts to about 5 × 10⁴ effectively transduced photons per photoreceptor per second (Dubs, Laughlin, & Srinivasan, 1984). The bars were oriented vertically, with intensities chosen at random to be Ī(1 ± C), where C is the contrast. The distance between the fly and the screen was adjusted so that the angular subtense of a bar equaled the horizontal interommatidial angle in the stimulated part of the compound eye. This setting was found by determining the eye's spatial Nyquist frequency through the reverse reaction (Götz, 1964) in the response of the H1 neuron. For this fly, the horizontal interommatidial angle was 1.45 degrees, and the distance to the screen 105 mm. The fly viewed the display through a round 80 mm diameter diaphragm, showing approximately 30 bars. From this we estimate the number of stimulated ommatidia in the eye's hexagonal raster to be about 612.

Frames of the stimulus pattern were refreshed every 2 ms, and with each new frame the pattern was displayed at a new position. This resulted in an apparent horizontal motion of the bar pattern, which is suitable to excite the H1 neuron. The pattern position was defined by a pseudorandom sequence, simulating independent random displacements at each time step, uniformly distributed between −0.47 degrees and +0.47 degrees (equivalent to −0.32 to +0.32 omm, horizontal ommatidial spacings). This corresponds to a diffusion constant of 18.1 (°)² per second, or 8.6 omm² per second. The sequence of pseudorandom numbers contained a repeating part and a nonrepeating part, each 10 seconds long, with the same statistical parameters. Thus in each 20 second cycle, the fly saw a 10 second movie that it had seen 20 seconds before, followed by a 10 second movie that was generated independently.
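As an illustration, the stimulus statistics described above can be reproduced in a few lines. This is a sketch: the parameter names are ours, and the random number generator is of course not the pseudorandom sequence used in the experiment.

```python
import numpy as np

rng = np.random.default_rng(0)      # stand-in for the experiment's pseudorandom sequence
frame_dt = 2e-3                     # new pattern position every 2 ms
step_max = 0.47                     # displacements uniform in [-0.47, +0.47] degrees
n_frames = 5000                     # 10 s of stimulus

steps = rng.uniform(-step_max, step_max, size=n_frames)
position = np.cumsum(steps)         # apparent horizontal position of the pattern, degrees

# Effective diffusion constant D, defined through <x^2> = 2 D t:
step_var = step_max ** 2 / 3.0      # variance of one uniform step
D = step_var / (2.0 * frame_dt)     # ~18.4 deg^2/s, close to the 18.1 quoted above
```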

Acknowledgments

We thank G. Lewen and A. Schweitzer for their help with the experiments and N. Tishby for many helpful discussions. Work at the IAS was supported in part by DOE grant DE–FG02–90ER40542, and work at the IFSC was supported by the Brazilian agencies FAPESP and CNPq.

References

Abeles, M., Bergmann, H., Margalit, E., & Vaadia, E. (1993). Spatiotemporal firing patterns in the frontal cortex of behaving monkeys. J. Neurophysiol., 70, 1629–1638.
Adrian, E. (1928). The basis of sensation. London: Christophers.
Aertsen, A., Gerstein, G., Habib, M., & Palm, G. (1989). Dynamics of neuronal firing correlation: Modulation of "effective connectivity." J. Neurophysiol., 61(5), 900–917.
Bialek, W. (1990). Theoretical physics meets experimental neurobiology. In E. Jen (Ed.), 1989 Lectures in Complex Systems (pp. 513–595). Menlo Park, CA: Addison-Wesley.
de Ruyter van Steveninck, R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Coding and information transfer in short spike sequences. Proc. R. Soc. Lond. Ser. B, 234, 379.
de Ruyter van Steveninck, R. R., Lewen, G., Strong, S., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805.
DeWeese, M. (1996). Optimization principles for the neural code. Network, 7, 325–331.
DeWeese, M., & Meister, M. (1998). How to measure information gained from one symbol. APS Abstracts.
Dubs, A., Laughlin, S., & Srinivasan, M. (1984). Single photon signals in fly photoreceptors and first order interneurones at behavioural threshold. J. Physiol., 317, 317–334.
Franceschini, N., Riehle, A., & le Nestour, A. (1989). Directionally selective motion detection by insect neurons. In D. Stavenga & R. Hardie (Eds.), Facets of Vision (p. 360). Berlin: Springer-Verlag.
Götz, K. (1964). Optomotorische Untersuchung des visuellen Systems einiger Augenmutanten der Fruchtfliege Drosophila. Kybernetik, 2, 77–92.
Hausen, K., & Egelhaaf, M. (1989). Neural mechanisms of visual course control in insects. In D. Stavenga & R. Hardie (Eds.), Facets of Vision (p. 391). Berlin: Springer-Verlag.
Hopfield, J. J. (1995). Pattern recognition computation using action potential timing for stimulus representation. Nature, 376, 33–36.
Land, M. F., & Collett, T. S. (1974). Chasing behavior of houseflies (Fannia canicularis): A description and analysis. J. Comp. Physiol., 89, 331–357.
MacKay, D., & McCulloch, W. S. (1952). The limiting information capacity of a neuronal link. Bull. Math. Biophys., 14, 127–135.
Magleby, K. (1987). Short-term synaptic plasticity. In G. M. Edelman, W. E. Gall, & W. M. Cowan (Eds.), Synaptic function (pp. 21–56). New York: Wiley.
Meister, M. (1995). Multineuronal codes in retinal signaling. Proc. Nat. Acad. Sci. (USA), 93, 609–614.
Optican, L. M., & Richmond, B. J. (1987). Temporal encoding of two-dimensional patterns by single units in primate inferior temporal cortex. III. Information theoretic analysis. J. Neurophys., 57, 162–178.
Panzeri, S., Biella, G., Rolls, E. T., Skaggs, W. E., & Treves, A. (1996). Speed, noise, information and the graded nature of neuronal responses. Network, 7, 365–370.
Rieke, F., Warland, D., de Ruyter van Steveninck, R., & Bialek, W. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.
Shannon, C. E. (1948). A mathematical theory of communication. Bell Sys. Tech. J., 27, 379–423, 623–656.
Singer, W., & Gray, C. M. (1995). Visual feature integration and the temporal correlation hypothesis. Ann. Rev. Neurosci., 18, 555–586.
Skaggs, W. E., McNaughton, B. L., Gothard, K. M., & Markus, E. J. (1993). An information-theoretic approach to deciphering the hippocampal code. In S. J. Hanson, J. D. Cowan, & C. L. Giles (Eds.), Advances in neural information processing systems, 5 (pp. 1030–1037). San Mateo, CA: Morgan Kaufmann.
Strong, S. P., Koberle, R., de Ruyter van Steveninck, R. R., & Bialek, W. (1998). Entropy and information in neural spike trains. Phys. Rev. Lett., 80, 197–200.
Usrey, W. M., Reppas, J. B., & Reid, R. C. (1998). Paired-spike interactions and synaptic efficiency of retinal inputs to the thalamus. Nature, 395, 384.
Vaadia, E., Haalman, I., Abeles, M., Bergman, H., Prut, Y., Slovin, H., & Aertsen, A. (1995). Dynamics of neural interactions in monkey cortex in relation to behavioural events. Nature, 373, 515–518.

Received February 22, 1999; accepted May 20, 1999.


Returning to our experimental example we identify the events E1 and E2as the arrivals of two spikes and consider the synergy as a function of thetime t between them In terms of event rates we compute the informationcarried by a pair of spikes separated by a time t equation 25 as well as theinformation carried by two individual spikes The difference between thesetwo quantities is the synergy between two spikes which can be written as

Syn(t ) D iexcl log2

iexclNrt

Nr2

centC

1T

Z T

0dt

rt (t)Nrt

log2

micro rt (t)r(t)r(t iexcl t )

para

C1T

Z T

0dt

micro rt (t)

Nrt

Crt (t C t )

Nrt

iexcl 2r(t)

Nr

paralog2[r(t)] (33)

The rst term in this equation is the logarithm of the normalized correlationfunction and hence measures the rarity of spike pairs with separation t

1544 Brenner Strong Koberle Bialek and de Ruyter van Steveninck

the average of this term over t is the mutual information between events(the second term in equation 32) The second term is related to the localcorrelation function and measures the extent to which the stimulus mod-ulates the likelihood of spike pairs The average of this term over t givesthe mutual information conditional on knowledge of the stimulus (the rstterm in equation 32) The average of the third term over t is zero and nu-merical evaluation of this term from the data shows that it is negligible atmost values of t

We thus nd that the synergy between spikes is approximately a sumof two terms whose averages over t are the terms in equation 32 A spikepair with a separation t then has two types of contributions to the extrainformation it carries the two spikes can be correlated conditional on thestimulus or the pair could be a rare and thus surprising event The rarityof brief pairs is related to neural refractoriness but this effect alone is in-sufcient to enhance information transmission the rare events must alsobe related reliably to the stimulus In fact conditional on the stimulus thespikes in rare pairs are strongly correlated with each other and this is visiblein Figure 1a from trial to trial adjacent spikes jitter together as if connectedby a stiff spring To quantify this effect we nd for each spike in one trialthe closest spike in successive trials and measure the variance of the arrivaltimes of these spikes Similarly we measure the variance of the interspiketimes Figure 4a shows the ratio of the interspike time variance to the sum ofthe arrival time variances of the spikes that make up the pair For large sep-arations this ratio is unity as expected if spikes are locked independentlyto the stimulus but as the two spikes come closer it falls below one-quarter

Both the conditional correlation among the members of the pair (see Fig-ure 4a) and the relative synergy (see Figure 4b) depend strongly on the inter-spike separation This dependence is nearly invariant to changes in imagecontrast although the spike rate and other statistical properties are stronglyaffected by such changes Brief spike pairs seem to retain their identity asspecially informative symbols over a range of input ensembles If particulartemporal patterns are especially informative then we would lose informa-tion if we failed to distinguish among different patterns Thus there are twonotions of time resolution for spike pairs the time resolution with which theinterspike time is dened and the absolute time resolution with which theevent is marked Figure 5 shows that for small interspike times the infor-mation is much more sensitive to changes in the interspike time resolution(open symbols) than to the absolute time resolution (lled symbols) This isrelated to the slope in Figure 2 in regions where the slope is large eventsshould be nely distinguished in order to retain the information

33 Implications of Synergy The importance of spike timing in the neu-ral code has been under debate for some time now We believe that someissues in this debate can be claried using a direct information-theoretic ap-proach Following MacKay and McCulloch (1952) we know that marking

Synergy in a Neural Code 1545

Figure 4 (a) Ratio between the variance of interspike time and the sum of vari-ances of the two spike times Variances are measured across repeated presenta-tions of same stimulus as explained in the text This ratio is plotted as a func-tion of the interspike time t for two experiments with different image contrast(b) Extra information conveyed cooperatively by pairs of spikes expressed as afraction of the information conveyed by the two spikes independently While thesingle-spike information varies with contrast (15 bits per spike for C D 01 com-pared to 13 bits per spike for C D 1) the fractional synergy is almost contrastindependent

spike arrival times with higher resolution provides an increased capacityfor information transmission The work of Strong et al (1998) shows thatfor the yrsquos H1 neuron the increased capacity associated with spike timingindeed is used with nearly constant efciency down to millisecond resolu-tion This efciency can be the result of a tight locking of individual spikesto a rapidly varying stimulus and it could also be the result of temporalpatterns providing information beyond rapid rate modulations The anal-

1546 Brenner Strong Koberle Bialek and de Ruyter van Steveninck

Figure 5 Information conveyed by spike pairs as a function of time resolutionAn eventmdashpair of spikesmdashcan be described by two times the separation be-tween spikes (relative time) and the occurrence time of the event with respectto the stimulus (absolute time) The information carried by the pair dependson the time resolution in these two dimensions both specied by the bin sizeD t along the abscissa Open symbols are measurements of the information fora xed 2 ms resolution of absolute time and a variable resolution D t of relativetime Closed symbols correspond to a xed 2 ms resolution of relative time anda variable resolution D t of absolute time For short intervals the sensitivity tocoarsening of the relative time resolution is much greater than to coarsening ofthe absolute time resolution In contrast sensitivity to relative and absolute timeresolutions is the same for the longer nonsynergistic interspike separations

ysis given here shows that for H1 pairs of spikes can provide much moreinformation than two individual spikes and information transmission ismuch more sensitive to the relative timing of spikes than to their absolutetiming The synergy is an inherent property of particular spike pairs as itpersists when averaged over all occurrences of the pair in the experiment

One may ask about the implication of synergy to behavior Flies canrespond to changes in a motion stimulus over a time scale of approximately30 to 40 ms (Land amp Collett 1974) We have found that the spike train of H1as a whole conveyed about 20 bits per second more than the informationconveyed independently by single spikes (see Section 31) This implies thatover a behavioral response time synergistic effects provide about 08 bitof additional information equivalent to almost a factor of two in resolvingpower for distinguishing different trajectories of motion across the visualeld

Synergy in a Neural Code 1547

4 Summary

Information theory allows us to measure synergy and redundancy amongspikes independent of the rules for translating between spikes and stimuliIn particular this approach tests directly the idea that patterns of spikesare special events in the code carrying more information than expected byadding the contributions from individual spikes It is of practical impor-tance that the formulas rely on low-order statistical measures of the neuralresponse and hence do not require enormous data sets to reach meaningfulconclusions The method is of general validity and is applicable to patternsof spikes across a population of neurons as well as across time Specicresults (existence of synergy or redundancy number of bits per spike etc)depend on the neural system as well as on the stimulus ensemble used ineach experiment

In our experiments on the y visual system we found that an eventcomposed of a pair of spikes can carry far more than the information carriedindependently by its parts Two spikes that occur in rapid succession appearto be special symbols that have an integrity beyond the locking of individualspikes to the stimulus This is analogous to the encoding of sounds in writtenEnglish each of the compound symbols th sh and ch stands for sounds thatare not decomposable into sounds represented by each of the constituentletters For spike pairs to act effectively as special symbols mechanismsfor ldquoreadingrdquo them must exist at subsequent levels of processing Synaptictransmission is sensitive to interspike times in the 2ndash20 ms range (Magelby1987) and it is natural to suggest that synaptic mechanisms on this timescaleplay a role in such reading Recent work on the mammalian visual system(Usrey Reppas amp Reid 1998) provides direct evidence that pairs of spikesclose together in time can be especially efcient in driving postsynapticneurons

Appendix A Relation to Previous Work

Patterns of spikes and their relation to sensory stimuli have been quantiedin the past using correlation functions The event rates that we have denedhere which are connected directly to the information carried by patternsof spikes through equation 25 are just properly normalized correlationfunctions The event rate for pairs of spikes from two separate neurons isrelated to the joint poststimulus time histogram (PSTH)dened by AertsenGerstein Habib and Palm (1989 Vaadia et al 1995) Making this connectionexplicit is also an opportunity to see how the present formalism applies toevents dened across two cells

Consider two cells A and B generating spikes at times ftAi g and ftB

i grespectively It will be useful to think of the spike trains as sums of unit

1548 Brenner Strong Koberle Bialek and de Ruyter van Steveninck

impulses at the spike times

rA(t) D

X

id(t iexcl tA

i ) (A1)

rB(t) D

X

id(t iexcl tB

i ) (A2)

Then the time-dependent spike rates for the two cells are

rA(t) D hr A(t)itrials (A3)

rB(t) D hr B(t)itrials (A4)

where hcent cent centitrials denotes an average over multiple trials in which the sametime-dependent stimulus s(t0 ) or the same motor behavior observed Thesespike rates are the probabilities per unit time for the occurrence of a singlespike in either cell A or cell B also called the PSTH We can dene theprobability per unit time for a spike in cell A to occur at time t and a spikein cell B to occur at time t0 and this will be the joint PSTH

JPSTHAB(t t0 ) D hr A(t)r B(t0 )itrials (A5)

Alternatively we can consider an event E dened by a spike in cell A attime t and a spike in cell B at time t iexcl t with the relative time t measuredwith a precision of D t Then the rate of these events is

rE(t) DZ D t 2

iexclD t 2dt0 JPSTHAB(t t iexcl t C t0 ) (A6)

frac14 D tJPSTHAB(t t iexcl t ) (A7)

where the last approximation is valid if our time resolution is sufcientlyhigh Applying our general formula for the information carried by singleevents equation 25 the information carried by pairs of spikes from two cellscan be written as an integral over diagonal ldquostripsrdquo of the JPSTH matrix

I(EIs) D1T

Z T

0dt

JPSTHAB(t t iexcl t )

hJPSTHAB(t t iexcl t )itlog2

JPSTHAB(t t iexcl t )

hJPSTHAB(t t iexcl t )it

(A8)

where hJPSTHAB(t tiexclt )it is an average of the JPSTH over time this averageis equivalent to the standard correlation function between the two spiketrains

The discussion by Vaadia et al (1995) emphasizes that modulations ofthe JPSTH along the diagonal strips allow correlated ring events to conveyinformation about sensory signals or behavioral states and this information

Synergy in a Neural Code 1549

is quantied by equation A8 The information carried by the individualcells is related to the corresponding integrals over spike rates equation 29The difference between the the information conveyed by the compoundspiking events E and the information conveyed by spikes in the two cellsindependently is precisely the synergy between the two cells at the giventime lag t For t D 0 it is the synergymdashor extra informationmdashconveyed bysynchronous ring of the two cells

We would like to connect this approach with previous work that focusedon how events reduce our uncertainty about the stimulus (de Ruyter vanSteveninck amp Bialek 1988) Before we observe the neural response all weknow is that stimuli are chosen from a distribution P[s(t0 )] When we ob-serve an event E at time tE this should tell us something about the stimulusin the neighborhood of this time and this knowledge is described by theconditional distribution P[s(t0 )|tE] If we go back to the denition of the mu-tual information between responses and stimuli we can write the averageinformation conveyed by one event in terms of this conditional distribution

I(EI s) DZ

Ds(t0 )Z

dtEP[s(t0 ) tE] log2

iexcl P[s(t0 ) tE]P[s(t0 )]P[tE]

cent

DZ

dtEP[tE]Z

Ds(t0 )P[s(t0 ) |tE] log2

iexclP[s(t0 ) |tE]P[s(t0 )]

cent (A9)

If the system is stationary then the coding should be invariant under timetranslations

P[s(t0 ) |tE] D P[s(t0 C D t0 ) |tE C D t0] (A10)

This invariance means that the integral over stimuli in equation A9 is inde-pendent of the event arrival time tE so we can simplify our expression forthe information carried by a single event

I(EI s) DZ

dtEP[tE]Z

Ds(t0 )P[s(t0 ) |tE] log2

iexclP[s(t0 ) |tE]P[s(t0 )]

cent

DZ

Ds(t0 )P[s(t0 ) |tE] log2

iexclP[s(t0 ) |tE]P[s(t0 )]

cent (A11)

This formula was used by de Ruyter van Steveninck and Bialek (1988) Toconnect with our work here we express the information in equation A11as an average over the stimulus

I(EI s) Dfrac12 iexclP[s(t0) |tE]

P[s(t0 )]

centlog2

iexclP[s(t0 )|tE]P[s(t0 )]

centfrac34

s (A12)

Using Bayesrsquo rule

P[s(t0 ) |tE]P[s(t0 )]

DP[tE |s(t0 )]

P[tE]D

rE(tE)

NrE (A13)

1550 Brenner Strong Koberle Bialek and de Ruyter van Steveninck

where the last term is a result of the distributions of event arrival times beingproportional to the event rates as dened above Substituting this back toequation A11 one nds equation 26

Appendix B Experimental Setup

In the experiment we used a female blowy which was a rst-generationoffspring of a wild y caught outside The y was put inside a plastic tubeand immobilized with wax with the head protruding out The probosciswas left free so that the y could be fed regularly with some sugar waterA small hole was cut in the back of the head close to the midline on theright side Through this hole a tungsten electrode was advanced into thelobula plate This area which is several layers back from the compound eyeincludes a group of largemotion-detector neurons with wide receptive eldsand strong direction selectivity We recorded spikes extracellularly from oneof these the contralateral H1 neuron Franceschini Riehle amp le Nestour1989 Hausen amp Egelhaaf 1989) The electrode was positioned such thatspikes from H1 could be discriminated reliably and converted into pulsesby a simple threshold discriminator The pulses were fed into a CED 1401interface (Cambridge Electronic Design) which digitized the spikes at 10 m sresolution To keep exact synchrony over the duration of the experimentthe spike timing clock was derived from the same internal CED 1401 clockthat dened the frame times of the visual stimulus

The stimulus was a rigidly moving bar pattern displayed on a Tektronix608 high-brightness display The radiance at average intensity NI was about180 mW(m2 pound sr) which amounts to about 5 pound104 effectively transducedphotons per photoreceptor per second (Dubs Laughlin amp Srinivasan 1984)The bars were oriented vertically with intensities chosen at random to beNI(1 sect C) where C is the contrast The distance between the y and the screenwas adjusted so that the angular subtense of a bar equaled the horizontalinterommatidial angle in the stimulated part of the compound eye This set-ting was found by determining the eyersquos spatial Nyquist frequency throughthe reverse reaction (Gotz 1964) in the response of the H1 neuron For thisy the horizontal interommatidial angle was 145 degrees and the distanceto the screen 105 mm The y viewed the display through a round 80 mmdiameter diaphragm showing approximately 30 bars From this we esti-mate the number of stimulated ommatidia in the eyersquos hexagonal raster tobe about 612

Frames of the stimulus pattern were refreshed every 2 ms and with eachnew frame the pattern was displayed at a new position This resulted in anapparent horizontal motion of the bar pattern which is suitable to excite theH1 neuron The pattern position was dened by a pseudorandom sequencesimulating independent random displacements at each time step uniformlydistributed between iexcl047 degrees and C047 degrees (equivalent to iexcl032to C032 omm horizontal ommatidial spacings) This corresponds to a dif-

Synergy in a Neural Code 1551

fusion constant of 181(plusmn)2 per second or 86 omm2 per second The sequenceof pseudorandom numbers contained a repeating part and a nonrepeatingpart each 10 seconds long with the same statistical parameters Thus ineach 20 second cycle the y saw a 10 second movie that it had seen 20seconds before followed by a 10 second movie that was generated inde-pendently

Acknowledgments

We thank G Lewen and A Schweitzer for their help with the experimentsand N Tishby for many helpful discussions Work at the IAS was supportedin part by DOE grant DEndashFG02ndash90ER40542 and work at the IFSC was sup-ported by the Brazilian agencies FAPESP and CNPq

References

Abeles, M., Bergmann, H., Margalit, E., & Vaadia, E. (1993). Spatiotemporal firing patterns in the frontal cortex of behaving monkeys. J. Neurophysiol., 70, 1629–1638.
Adrian, E. (1928). The basis of sensation. London: Christophers.
Aertsen, A., Gerstein, G., Habib, M., & Palm, G. (1989). Dynamics of neuronal firing correlation: modulation of "effective connectivity." J. Neurophysiol., 61(5), 900–917.
Bialek, W. (1990). Theoretical physics meets experimental neurobiology. In E. Jen (Ed.), 1989 Lectures in Complex Systems (pp. 513–595). Menlo Park, CA: Addison-Wesley.
de Ruyter van Steveninck, R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: coding and information transfer in short spike sequences. Proc. R. Soc. Lond. Ser. B, 234, 379.
de Ruyter van Steveninck, R. R., Lewen, G., Strong, S., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805.
DeWeese, M. (1996). Optimization principles for the neural code. Network, 7, 325–331.
DeWeese, M., & Meister, M. (1998). How to measure information gained from one symbol. APS Abstracts.
Dubs, A., Laughlin, S., & Srinivasan, M. (1984). Single photon signals in fly photoreceptors and first order interneurones at behavioural threshold. J. Physiol., 317, 317–334.
Franceschini, N., Riehle, A., & le Nestour, A. (1989). Directionally selective motion detection by insect neurons. In D. Stavenga & R. Hardie (Eds.), Facets of Vision (p. 360). Berlin: Springer-Verlag.
Gotz, K. (1964). Optomotorische Untersuchung des visuellen Systems einiger Augenmutanten der Fruchtfliege Drosophila. Kybernetik, 2, 77–92.
Hausen, K., & Egelhaaf, M. (1989). Neural mechanisms of visual course control in insects. In D. Stavenga & R. Hardie (Eds.), Facets of Vision (p. 391). Berlin: Springer-Verlag.
Hopfield, J. J. (1995). Pattern recognition computation using action potential timing for stimulus representation. Nature, 376, 33–36.
Land, M. F., & Collett, T. S. (1974). Chasing behavior of houseflies (Fannia canicularis): A description and analysis. J. Comp. Physiol., 89, 331–357.
MacKay, D., & McCulloch, W. S. (1952). The limiting information capacity of a neuronal link. Bull. Math. Biophys., 14, 127–135.
Magelby, K. (1987). Short-term synaptic plasticity. In G. M. Edelman, V. E. Gall, & K. M. Cowan (Eds.), Synaptic function (pp. 21–56). New York: Wiley.
Meister, M. (1995). Multineuronal codes in retinal signaling. Proc. Nat. Acad. Sci. (USA), 93, 609–614.
Optican, L. M., & Richmond, B. J. (1987). Temporal encoding of two-dimensional patterns by single units in primate inferior temporal cortex. III. Information theoretic analysis. J. Neurophys., 57, 162–178.
Panzeri, S., Biella, G., Rolls, E. T., Skaggs, W. E., & Treves, A. (1996). Speed, noise, information and the graded nature of neuronal responses. Network, 7, 365–370.
Rieke, F., Warland, D., de Ruyter van Steveninck, R., & Bialek, W. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.
Shannon, C. E. (1948). A mathematical theory of communication. Bell Sys. Tech. J., 27, 379–423, 623–656.
Singer, W., & Gray, C. M. (1995). Visual feature integration and the temporal correlation hypothesis. Ann. Rev. Neurosci., 555–586.
Skaggs, W. E., McNaughton, B. L., Gothard, K. M., & Markus, E. J. (1993). An information-theoretic approach to deciphering the hippocampal code. In S. J. Hanson, J. D. Cowan, & C. L. Giles (Eds.), Advances in neural information processing 5 (pp. 1030–1037). San Mateo, CA: Morgan Kaufmann.
Strong, S. P., Koberle, R., de Ruyter van Steveninck, R. R., & Bialek, W. (1998). Entropy and information in neural spike trains. Phys. Rev. Lett., 80, 197–200.
Usrey, W. M., Reppas, J. B., & Reid, R. C. (1998). Paired-spike interactions and synaptic efficiency of retinal inputs to the thalamus. Nature, 395, 384.
Vaadia, E., Haalman, I., Abeles, M., Bergman, H., Prut, Y., Slovin, H., & Aertsen, A. (1995). Dynamics of neural interactions in monkey cortex in relation to behavioural events. Nature, 373, 515–518.

Received February 22, 1999; accepted May 20, 1999.



Figure 1: Generalized event rates in the stimulus-conditional response ensemble. A time-dependent visual stimulus is shown to the fly (a), with the time axis defined to be zero at the beginning of the stimulus. This stimulus runs for 10 s and is repeatedly presented 360 times. The responses of the H1 neuron to 60 repetitions are shown as a raster (b), in which each dot represents a single spike. From these responses, time-dependent event rates r_E(t) are estimated: (c) the firing rate (poststimulus time histogram), (d) the rate for spike pairs with interspike time τ = 3 ± 1 ms, and (e) the rate for pairs with τ = 17 ± 1 ms. These rates allow us to compute directly the information transmitted by the events, using equation 2.5.


normalization P(response|s) = P(t|s) = r_E(t)/(T r̄_E). Then the conditional entropy is

S[P(t|s)] = -\int_0^T dt\, P(t|s)\,\log_2 P(t|s)
          = -\frac{1}{T}\int_0^T dt\,\frac{r_E(t)}{\bar r_E}\,\log_2\!\left[\frac{r_E(t)}{\bar r_E\,T}\right].   (2.4)

In principle, one should average this quantity over the distribution of stimuli s(t'); however, if the time T is long enough and the stimulus history sufficiently rich, the ensemble average and the time average are equivalent. The validity of this assumption can be checked experimentally (see Figure 2c). The reduction in entropy is then the gain in information, so

I(E; s) = S[P(t)] - S[P(t|s)]
        = \frac{1}{T}\int_0^T dt\,\frac{r_E(t)}{\bar r_E}\,\log_2\!\left[\frac{r_E(t)}{\bar r_E}\right],   (2.5)

or equivalently

I(E; s) = \left\langle \frac{r_E(t)}{\bar r_E}\,\log_2\!\left[\frac{r_E(t)}{\bar r_E}\right] \right\rangle_s,   (2.6)

where here the average is over all possible values of s, weighted by their probabilities P(s).

In the second view, the neural response is a binary random variable, s_E ∈ {0, 1}, marking the occurrence or nonoccurrence of an event of type E in a small time bin of size Δt. Suppose for simplicity that the stimulus takes on a finite set of values s, with probabilities P(s). These in turn induce the event E with probability p_E(s) = P(s_E = 1 | s) = r_E(s)Δt, with an average probability for the occurrence of the event p̄_E = Σ_s P(s) r_E(s)Δt = r̄_E Δt. The information is the difference between the prior entropy and the conditional entropy, I(E; s) = S(s) − ⟨S(s | s_E)⟩, where the conditional entropy is an average over the two possible values of s_E. The conditional probabilities are found from Bayes' rule,

P(s\,|\,s_E = 1) = \frac{P(s)\,p_E(s)}{\bar p_E}, \qquad
P(s\,|\,s_E = 0) = \frac{P(s)\,\big[1 - p_E(s)\big]}{1 - \bar p_E},   (2.7)

and with these one finds the information

I(E; s) = -\sum_s P(s)\,\log_2 P(s) + \left\langle \sum_s P(s\,|\,s_E)\,\log_2 P(s\,|\,s_E) \right\rangle_{s_E = 0,1}
        = \sum_s P(s)\left[ p_E(s)\,\log_2\!\left(\frac{p_E(s)}{\bar p_E}\right)
          + \big(1 - p_E(s)\big)\,\log_2\!\left(\frac{1 - p_E(s)}{1 - \bar p_E}\right) \right].   (2.8)


Taking the limit Δt → 0, consistent with the requirement that the event can occur at most once, one finds the average information conveyed in a small time bin; dividing by the average probability of an event, one obtains equation 2.6 as the information per event.
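For completeness, here is a brief sketch of that limit in our own notation (the step is not spelled out in the text): substituting p_E(s) = r_E(s)Δt into equation 2.8 and expanding to first order in Δt,

p_E(s)\,\log_2\frac{p_E(s)}{\bar p_E} = r_E(s)\,\Delta t\,\log_2\frac{r_E(s)}{\bar r_E},
\qquad
\big[1 - p_E(s)\big]\,\log_2\frac{1 - p_E(s)}{1 - \bar p_E} = \big[\bar r_E - r_E(s)\big]\,\Delta t\,\log_2 e + O(\Delta t^2).

The second term averages to zero over P(s), so the information per bin is Δt Σ_s P(s) r_E(s) log_2[r_E(s)/r̄_E] + O(Δt²); dividing by p̄_E = r̄_E Δt and letting Δt → 0 recovers equation 2.6.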

Equation 2.6 and its time-averaged form, equation 2.5, are exact formulas that are valid in any situation where a rich stimulus history can be presented


repeatedly. It enables the evaluation of the information for arbitrarily complex events, independent of assumptions about the encoding algorithm. The information is an average over the joint distribution of stimuli and responses. But rather than constructing this joint distribution explicitly and then averaging, equation 2.5 expresses the information as a direct empirical average: by estimating the function r_E(t) as a histogram, we are sampling the distribution of the responses given a stimulus, whereas by integrating over time we are sampling the distribution of stimuli. This formula is general and can be applied for different systems under different experimental conditions. The numerical result (measured in bits) will of course depend on the neural system as well as on the properties of the stimulus distribution. The error bars on the measurement of the information are affected by the finiteness of the data; for example, sampling must be sufficient to construct the rates r_E(t) reliably. These and other practical issues in using equation 2.5 are illustrated in detail in Figure 2.
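To make this empirical average concrete, here is a minimal Python sketch of the estimator (function and variable names are ours, not from the original): it bins event times pooled over repetitions into the rate r_E(t) and evaluates the time average of (r_E/r̄_E) log₂(r_E/r̄_E).

import numpy as np

def info_per_event(event_times, n_trials, T, dt=0.002):
    """Information per event in bits, via the time-average form of equation 2.5.

    event_times : event times (s) pooled over all repetitions of the stimulus
    n_trials    : number of repetitions
    T           : duration of the repeated stimulus (s)
    dt          : bin size used to estimate the event rate r_E(t)
    """
    edges = np.arange(0.0, T + dt, dt)
    counts, _ = np.histogram(event_times, bins=edges)
    r = counts / (n_trials * dt)     # time-dependent event rate r_E(t)
    ratio = r / r.mean()             # r_E(t) / r_bar
    terms = np.zeros_like(ratio)
    nonzero = ratio > 0              # empty bins contribute x*log(x) -> 0
    terms[nonzero] = ratio[nonzero] * np.log2(ratio[nonzero])
    return terms.mean()              # the (1/T) integral over t becomes a mean over bins

For single spikes, event_times is simply the pooled spike times; for the compound events considered below, it is the times at which the pattern occurs.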

2.3 The Special Case of Single Spikes. Let us consider in more detail the simple case where events are single spikes. The average information conveyed by a single spike becomes an integral over the time-dependent spike rate r(t):

I(1\ \text{spike}; s) = \frac{1}{T}\int_0^T dt\,\frac{r(t)}{\bar r}\,\log_2\!\left[\frac{r(t)}{\bar r}\right].   (2.9)

It makes sense that the information carried by single spikes should be related to the spike rate, since this rate as a function of time gives a complete description of the "one-body" statistics of the spike train, in the same way that the single particle density describes the one-body statistics of a gas or liquid.

Figure 2 (facing page): Finite size effects in the estimation of the information conveyed by single spikes. (a) Information as a function of the bin size Δt used for computing the time-dependent rate r(t), from all 360 repetitions (circles) and from 100 of the repetitions (crosses). A linear extrapolation to the limit Δt → 0 is shown for the case where all repetitions were used (solid line). (b) Information as a function of the inverse number of repetitions N, for a fixed bin size Δt = 2 ms. (c) Systematic errors due to finite duration of the repeated stimulus s(t). The full 10-second length of the stimulus was subdivided into segments of duration T. Using equation 2.9, the information was calculated for each segment and plotted as a function of 1/T (circles). If the stimulus is sufficiently long that an ensemble average is well approximated by a time average, the convergence of the information to a stable value as T → ∞ should be observable. Further, the standard deviation of the information measured from different segments of length T (error bars) should decrease as a square root law, σ ∝ 1/√T (dashed line). These data provide an empirical verification that, for the distribution used in this experiment, the repeated stimulus time is long enough to approximate ergodic sampling.
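A sketch of the extrapolation procedure in panels (a) and (b), under the same assumptions as the estimator above (our own helper; it reuses info_per_event): estimate the information at several bin sizes and extrapolate linearly to Δt → 0.

def extrapolated_info(event_times, n_trials, T, dts=(0.002, 0.004, 0.006, 0.008)):
    """Estimate the information at several bin sizes and extrapolate linearly to dt -> 0,
    mirroring the straight-line extrapolation of Figure 2a; uses info_per_event() above."""
    import numpy as np
    dts = np.asarray(dts)
    vals = np.array([info_per_event(event_times, n_trials, T, dt) for dt in dts])
    slope, intercept = np.polyfit(dts, vals, 1)   # straight-line fit info(dt) = slope*dt + intercept
    return intercept                              # extrapolated value at dt = 0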


This is true independent of any models or assumptions, and we suggest the use of equations 2.9 and 2.5 to test directly, in any given experimental situation, whether many-body effects increase or reduce information transmission.

We note that considerable attention has been given to the problem of making accurate statistical models for spike trains. In particular, the simplest model, the modulated Poisson process, is often used as a standard against which real neurons are compared. It is tempting to think that Poisson behavior is equivalent to independent transmission of information by successive spikes, but this is not the case (see below). Of course, no real neuron is precisely Poisson, and it is not clear which deviations from Poisson behavior are significant. Rather than trying to fit statistical models to the data and then trying to understand the significance of spike train statistics for neural coding, we see that it is possible to quantify directly the information carried by single spikes and compound patterns. Thus, without reference to statistical models, we can determine whether successive spikes carry independent, redundant, or synergistic information.

Several previous works have noted the relation between spike rate and information, and the formula in equation 2.9 has an interesting history. For a spike train generated by a modulated Poisson process, equation 2.9 provides an upper bound on information transmission (per spike) by the spike train as a whole (Bialek, 1990). Even in this simple case, spikes do not necessarily carry independent information: slow modulations in the firing rate can cause them to be redundant. In studying the coding of location by cells in the rat hippocampus, Skaggs, McNaughton, Gothard, and Markus (1993) assumed that successive spikes carried independent information and that the spike rate was determined by the instantaneous location. They obtained equation 2.9, with the time average replaced by an average over locations. DeWeese (1996) showed that the rate of information transmission by a spike train could be expanded in a series of integrals over correlation functions, where successive terms would be small if the number of spikes per correlation time were small; the leading term, which would be exact if spikes were uncorrelated, is equation 2.9. Panzeri, Biella, Rolls, Skaggs, and Treves (1996) show that equation 2.9, multiplied by the mean spike rate to give an information rate (bits per second), is the correct information rate if we count spikes in very brief segments of the neural response, which is equivalent to asking for the information carried by single spikes. For further discussion of the relation to previous work, see appendix A.

A crucial point here is the generalization to equation 2.5, and this result applies to the information content of any point events in the neural response (pairs of spikes with a certain separation, coincident spikes from two cells, and so on), not just single spikes. In the analysis of experiments, we will emphasize the use of this formula as an exact result for the information content of single events, rather than an approximate result for the spike train as a whole, and this approach will enable us to address questions concerning the structure of the code and the role played by various types of events.

3 Experiments in the Fly Visual System

In this section we use our formalism to analyze experiments on the movement-sensitive cell H1 in the visual system of the blowfly Calliphora vicina. We address the issue of the information conveyed by pairs of spikes in this neuron, as compared to the information conveyed independently by single spikes. The quantitative results of this section (information content in bits, effects of synergy and redundancy among spikes) are specific to this system and to the stimulus conditions used. The theoretical approach, however, is valid generally and can be applied similarly to other experimental systems, to find out the significance of various patterns in single cells or in a population.

3.1 Synergy Between Spikes. In the experiment (see appendix B for details), the horizontal motion across the visual field is the input sensory stimulus s(t), drawn from a probability distribution P[s(t)]. The spike train recorded from H1 is the neural response. Figure 1a shows a segment of the stimulus presented to the fly, and Figure 1b illustrates the response to many repeated presentations of this segment. The histogram of spike times across the ensemble of repetitions provides an estimate of the spike rate r(t) (see Figure 1c), and equation 2.5 gives the information carried by a single spike, I(1 spike; s) = 1.53 ± 0.05 bits. Figure 2 illustrates the details of how the formula was used, with an emphasis on the effects of finiteness of the data. In this experiment, a stimulus of length T = 10 s was repeated 360 times. As seen from Figure 2, the calculation converges to a stable result within our finite data set.

If each spike were to convey information independently, then with the mean spike rate r̄ = 37 spikes per second, the total information rate would be R_info = 56 bits per second. We used the variability and reproducibility of continuous segments in the neural response (de Ruyter van Steveninck et al., 1997; Strong et al., 1998) to estimate the total information rate in the spike train in this experiment, and found that R_info = 75 bits per second. Thus the information conveyed by the spike train as a whole is larger than the sum of contributions from individual spikes, indicating cooperative information transmission by patterns of spikes in time. This synergy among spikes motivates the search for especially informative patterns in the spike train.
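The arithmetic behind these two rates, as a quick check (the numbers are taken from the text; the 40 ms window anticipates the behavioral estimate of Section 3.3):

mean_rate = 37.0         # spikes per second
info_per_spike = 1.53    # bits per spike
total_rate = 75.0        # bits per second for the whole spike train

independent_rate = mean_rate * info_per_spike   # ~56.6 bits/s if spikes were independent
excess = total_rate - independent_rate          # ~18 bits/s carried cooperatively
print(independent_rate, excess, excess * 0.040) # ~0.7-0.8 bit per 40 ms behavioral window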

We consider compound events that consist of two spikes separated by a time τ, with no constraints on what happens between them. Figure 1 shows segments of the event rate r_τ(t) for τ = 3 (±1) ms (see Figure 1d) and for τ = 17 (±1) ms (see Figure 1e).


Figure 3: Information about the signal transmitted by pairs of spikes, computed from equation 2.5, as a function of the time separation between the two spikes. The dotted line shows the information that would be transmitted by the two spikes independently (twice the single-spike information).

The information carried by spike pairs as a function of the interspike time τ, computed from equation 2.5, is shown in Figure 3. For large τ, spikes contribute independent information, as expected. This independence is established within approximately 30 to 40 ms, comparable to the behavioral response times of the fly (Land & Collett, 1974). There is a mild redundancy (about 10–20%) at intermediate separations, and a very large synergy (up to about 130%) at small τ.
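A minimal sketch of how such pair events can be extracted from repeated-trial spike times (our own helper; the trial data and tolerance are placeholders). Each occurrence of a pair with separation τ, within tolerance, is marked by the time of its second spike, and the resulting event times can be fed to info_per_event() above.

import numpy as np

def pair_event_times(trials, tau, tol=0.001):
    """Times of two-spike events with separation tau (s), within +/- tol, pooled over trials.

    trials : list of 1-D arrays of spike times, one array per repetition
    Each qualifying pair is marked by the time of its later spike; intervening
    spikes are allowed, as in the definition of the compound event above.
    """
    events = []
    for spikes in trials:
        spikes = np.sort(spikes)
        for i in range(len(spikes)):
            separations = spikes[i] - spikes[:i]
            if np.any(np.abs(separations - tau) <= tol):
                events.append(spikes[i])
    return np.array(events)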

Related results were obtained using the correlation of spike patterns with stimulus features (de Ruyter van Steveninck & Bialek, 1988). There the information carried by spike patterns was estimated from the distribution of stimuli given each pattern, thus constructing a statistical model of what the patterns "stand for" (see the details in appendix A). Since the time-dependent stimulus is in general of high dimensionality, its distribution cannot be sampled directly, and some approximations must be made: de Ruyter van Steveninck and Bialek (1988) made the approximation that patterns of a few spikes encode projections of the stimulus onto low-dimensional subspaces, and further that the conditional distributions P[s(t)|E] are gaussian. The information obtained with these approximations provides a lower bound to the true information carried by the patterns, as estimated directly with the methods presented here.


3.2 Origins of Synergy. Synergy means, quite literally, that two spikes together tell us more than two spikes separately. Synergistic coding is often discussed for populations of cells, where extra information is conveyed by patterns of coincident spikes from several neurons (Abeles, Bergmann, Margalit, & Vaadia, 1993; Hopfield, 1995; Meister, 1995; Singer & Gray, 1995). Here we see direct evidence for extra information in pairs of spikes across time. The mathematical framework for describing these effects is the same, and a natural question is: What are the conditions for synergistic coding?

The average synergy Syn[E1, E2; s] between two events E1 and E2 is the difference between the information about the stimulus s conveyed by the pair and the information conveyed by the two events independently:

\text{Syn}[E_1, E_2; s] = I[E_1, E_2; s] - \big( I[E_1; s] + I[E_2; s] \big).   (3.1)

We can rewrite the synergy as

\text{Syn}[E_1, E_2; s] = I[E_1; E_2\,|\,s] - I[E_1; E_2].   (3.2)

The first term is the mutual information between the events, computed across an ensemble of repeated presentations of the same stimulus history. It describes the gain in information due to the locking of the compound event (E1, E2) to particular stimulus features. If events E1 and E2 are correlated individually with the stimulus but not with one another, this term will be zero, and these events cannot be synergistic on average. The second term is the mutual information between events when the stimulus is not constrained, or, equivalently, the predictability of event E2 from E1. This predictability limits the capacity of E2 to carry information beyond that already conveyed by E1. Synergistic coding (Syn > 0) thus requires that the mutual information among the spikes is increased by specifying the stimulus, which makes precise the intuitive idea of "stimulus-dependent correlations" (Abeles et al., 1993; Hopfield, 1995; Meister, 1995; Singer & Gray, 1995).
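In terms of the estimators sketched earlier, the synergy of equation 3.1 can be evaluated directly from measured event rates; here is a minimal sketch specialized to the two-spike events considered below (our own function names, reusing info_per_event and pair_event_times):

import numpy as np

def pair_synergy(trials, tau, T, dt=0.002):
    """Synergy (bits) of a two-spike event with separation tau, following equation 3.1:
    the pair information minus twice the single-spike information."""
    single = info_per_event(np.concatenate(trials), len(trials), T, dt)
    pair = info_per_event(pair_event_times(trials, tau), len(trials), T, dt)
    return pair - 2.0 * single

The subtracted term, twice the single-spike information, is exactly the dotted line of Figure 3.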

Returning to our experimental example, we identify the events E1 and E2 as the arrivals of two spikes, and consider the synergy as a function of the time τ between them. In terms of event rates, we compute the information carried by a pair of spikes separated by a time τ (equation 2.5), as well as the information carried by two individual spikes. The difference between these two quantities is the synergy between two spikes, which can be written as

\text{Syn}(\tau) = -\log_2\!\left(\frac{\bar r_\tau}{\bar r^2}\right)
  + \frac{1}{T}\int_0^T dt\,\frac{r_\tau(t)}{\bar r_\tau}\,\log_2\!\left[\frac{r_\tau(t)}{r(t)\,r(t-\tau)}\right]
  + \frac{1}{T}\int_0^T dt\,\left[\frac{r_\tau(t)}{\bar r_\tau} + \frac{r_\tau(t+\tau)}{\bar r_\tau} - \frac{2\,r(t)}{\bar r}\right]\log_2[r(t)].   (3.3)

The first term in this equation is the logarithm of the normalized correlation function, and hence measures the rarity of spike pairs with separation τ; the average of this term over τ is the mutual information between events (the second term in equation 3.2). The second term is related to the local correlation function and measures the extent to which the stimulus modulates the likelihood of spike pairs. The average of this term over τ gives the mutual information conditional on knowledge of the stimulus (the first term in equation 3.2). The average of the third term over τ is zero, and numerical evaluation of this term from the data shows that it is negligible at most values of τ.

We thus find that the synergy between spikes is approximately a sum of two terms whose averages over τ are the terms in equation 3.2. A spike pair with a separation τ then has two types of contributions to the extra information it carries: the two spikes can be correlated conditional on the stimulus, or the pair could be a rare and thus surprising event. The rarity of brief pairs is related to neural refractoriness, but this effect alone is insufficient to enhance information transmission: the rare events must also be related reliably to the stimulus. In fact, conditional on the stimulus, the spikes in rare pairs are strongly correlated with each other, and this is visible in Figure 1a: from trial to trial, adjacent spikes jitter together as if connected by a stiff spring. To quantify this effect, we find for each spike in one trial the closest spike in successive trials, and measure the variance of the arrival times of these spikes. Similarly, we measure the variance of the interspike times. Figure 4a shows the ratio of the interspike time variance to the sum of the arrival time variances of the spikes that make up the pair. For large separations this ratio is unity, as expected if spikes are locked independently to the stimulus, but as the two spikes come closer it falls below one-quarter.
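A sketch of this jitter analysis (our own implementation of the matching rule described above; names are hypothetical): for a reference pair in one trial, find the closest matching spikes in the other trials, then compare the variance of the interspike time with the summed variances of the two arrival times.

import numpy as np

def jitter_ratio(trials, t1, t2):
    """Var(interspike time) / [Var(first arrival) + Var(second arrival)] across trials.

    trials : list of 1-D arrays of spike times from repeated presentations
    t1, t2 : arrival times (s) of the two spikes of a reference pair in one trial
    In each other trial we take the spike closest to t1 and the spike closest to t2.
    """
    a1, a2 = [], []
    for spikes in trials:
        if len(spikes) == 0:
            continue
        a1.append(spikes[np.argmin(np.abs(spikes - t1))])
        a2.append(spikes[np.argmin(np.abs(spikes - t2))])
    a1, a2 = np.asarray(a1), np.asarray(a2)
    return np.var(a2 - a1) / (np.var(a1) + np.var(a2))

Ratios near 1 indicate independent locking of the two spikes to the stimulus; values well below 1, as found for brief pairs, indicate that the pair jitters together.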

Both the conditional correlation among the members of the pair (see Figure 4a) and the relative synergy (see Figure 4b) depend strongly on the interspike separation. This dependence is nearly invariant to changes in image contrast, although the spike rate and other statistical properties are strongly affected by such changes. Brief spike pairs seem to retain their identity as specially informative symbols over a range of input ensembles. If particular temporal patterns are especially informative, then we would lose information if we failed to distinguish among different patterns. Thus there are two notions of time resolution for spike pairs: the time resolution with which the interspike time is defined, and the absolute time resolution with which the event is marked. Figure 5 shows that for small interspike times the information is much more sensitive to changes in the interspike time resolution (open symbols) than to the absolute time resolution (filled symbols). This is related to the slope in Figure 2: in regions where the slope is large, events should be finely distinguished in order to retain the information.

3.3 Implications of Synergy. The importance of spike timing in the neural code has been under debate for some time now. We believe that some issues in this debate can be clarified using a direct information-theoretic approach. Following MacKay and McCulloch (1952), we know that marking spike arrival times with higher resolution provides an increased capacity for information transmission.


Figure 4: (a) Ratio between the variance of interspike time and the sum of variances of the two spike times. Variances are measured across repeated presentations of the same stimulus, as explained in the text. This ratio is plotted as a function of the interspike time τ for two experiments with different image contrast. (b) Extra information conveyed cooperatively by pairs of spikes, expressed as a fraction of the information conveyed by the two spikes independently. While the single-spike information varies with contrast (1.5 bits per spike for C = 0.1, compared to 1.3 bits per spike for C = 1), the fractional synergy is almost contrast independent.

The work of Strong et al. (1998) shows that for the fly's H1 neuron, the increased capacity associated with spike timing indeed is used with nearly constant efficiency down to millisecond resolution. This efficiency can be the result of a tight locking of individual spikes to a rapidly varying stimulus, and it could also be the result of temporal patterns providing information beyond rapid rate modulations.


Figure 5: Information conveyed by spike pairs as a function of time resolution. An event (a pair of spikes) can be described by two times: the separation between spikes (relative time) and the occurrence time of the event with respect to the stimulus (absolute time). The information carried by the pair depends on the time resolution in these two dimensions, both specified by the bin size Δt along the abscissa. Open symbols are measurements of the information for a fixed 2 ms resolution of absolute time and a variable resolution Δt of relative time. Closed symbols correspond to a fixed 2 ms resolution of relative time and a variable resolution Δt of absolute time. For short intervals, the sensitivity to coarsening of the relative time resolution is much greater than to coarsening of the absolute time resolution. In contrast, sensitivity to relative and absolute time resolutions is the same for the longer, nonsynergistic interspike separations.

The analysis given here shows that for H1, pairs of spikes can provide much more information than two individual spikes, and information transmission is much more sensitive to the relative timing of spikes than to their absolute timing. The synergy is an inherent property of particular spike pairs, as it persists when averaged over all occurrences of the pair in the experiment.

One may ask about the implication of synergy for behavior. Flies can respond to changes in a motion stimulus over a time scale of approximately 30 to 40 ms (Land & Collett, 1974). We have found that the spike train of H1 as a whole conveyed about 20 bits per second more than the information conveyed independently by single spikes (see Section 3.1). This implies that over a behavioral response time, synergistic effects provide about 0.8 bit of additional information, equivalent to almost a factor of two in resolving power for distinguishing different trajectories of motion across the visual field.


4 Summary

Information theory allows us to measure synergy and redundancy among spikes, independent of the rules for translating between spikes and stimuli. In particular, this approach tests directly the idea that patterns of spikes are special events in the code, carrying more information than expected by adding the contributions from individual spikes. It is of practical importance that the formulas rely on low-order statistical measures of the neural response, and hence do not require enormous data sets to reach meaningful conclusions. The method is of general validity and is applicable to patterns of spikes across a population of neurons as well as across time. Specific results (existence of synergy or redundancy, number of bits per spike, etc.) depend on the neural system as well as on the stimulus ensemble used in each experiment.

In our experiments on the fly visual system, we found that an event composed of a pair of spikes can carry far more than the information carried independently by its parts. Two spikes that occur in rapid succession appear to be special symbols that have an integrity beyond the locking of individual spikes to the stimulus. This is analogous to the encoding of sounds in written English: each of the compound symbols th, sh, and ch stands for sounds that are not decomposable into sounds represented by each of the constituent letters. For spike pairs to act effectively as special symbols, mechanisms for "reading" them must exist at subsequent levels of processing. Synaptic transmission is sensitive to interspike times in the 2–20 ms range (Magelby, 1987), and it is natural to suggest that synaptic mechanisms on this timescale play a role in such reading. Recent work on the mammalian visual system (Usrey, Reppas, & Reid, 1998) provides direct evidence that pairs of spikes close together in time can be especially efficient in driving postsynaptic neurons.

Appendix A Relation to Previous Work

Patterns of spikes and their relation to sensory stimuli have been quantified in the past using correlation functions. The event rates that we have defined here, which are connected directly to the information carried by patterns of spikes through equation 2.5, are just properly normalized correlation functions. The event rate for pairs of spikes from two separate neurons is related to the joint poststimulus time histogram (PSTH) defined by Aertsen, Gerstein, Habib, and Palm (1989; Vaadia et al., 1995). Making this connection explicit is also an opportunity to see how the present formalism applies to events defined across two cells.

Consider two cells, A and B, generating spikes at times {t_i^A} and {t_i^B}, respectively. It will be useful to think of the spike trains as sums of unit impulses at the spike times:

\rho_A(t) = \sum_i \delta(t - t_i^A),   (A.1)

\rho_B(t) = \sum_i \delta(t - t_i^B).   (A.2)

Then the time-dependent spike rates for the two cells are

r_A(t) = \langle \rho_A(t) \rangle_{\text{trials}},   (A.3)

r_B(t) = \langle \rho_B(t) \rangle_{\text{trials}},   (A.4)

where ⟨· · ·⟩_trials denotes an average over multiple trials in which the same time-dependent stimulus s(t') is presented or the same motor behavior is observed. These spike rates are the probabilities per unit time for the occurrence of a single spike in either cell A or cell B, also called the PSTH. We can define the probability per unit time for a spike in cell A to occur at time t and a spike in cell B to occur at time t', and this will be the joint PSTH,

\text{JPSTH}_{AB}(t, t') = \langle \rho_A(t)\,\rho_B(t') \rangle_{\text{trials}}.   (A.5)

Alternatively, we can consider an event E defined by a spike in cell A at time t and a spike in cell B at time t − τ, with the relative time τ measured to a precision of Δt. Then the rate of these events is

r_E(t) = \int_{-\Delta t/2}^{\Delta t/2} dt'\,\text{JPSTH}_{AB}(t,\, t - \tau + t')   (A.6)
       \approx \Delta t\,\text{JPSTH}_{AB}(t,\, t - \tau),   (A.7)

where the last approximation is valid if our time resolution is sufficiently high. Applying our general formula for the information carried by single events, equation 2.5, the information carried by pairs of spikes from two cells can be written as an integral over diagonal "strips" of the JPSTH matrix:

I(E; s) = \frac{1}{T}\int_0^T dt\,
  \frac{\text{JPSTH}_{AB}(t, t-\tau)}{\langle \text{JPSTH}_{AB}(t, t-\tau) \rangle_t}\,
  \log_2\!\left[\frac{\text{JPSTH}_{AB}(t, t-\tau)}{\langle \text{JPSTH}_{AB}(t, t-\tau) \rangle_t}\right],   (A.8)

where ⟨JPSTH_AB(t, t − τ)⟩_t is an average of the JPSTH over time; this average is equivalent to the standard correlation function between the two spike trains.
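A sketch of this two-cell calculation (our own code, assuming spike times from repeated trials for cells A and B and the binning conventions used earlier): build the JPSTH, take the diagonal strip at lag τ, and apply equation A.8, which has the same form as equation 2.5 with the strip playing the role of r_E(t).

import numpy as np

def pair_info_from_jpsth(trials_a, trials_b, tau, T, dt=0.002):
    """Information (bits) carried by A-B spike pairs at lag tau, in the spirit of equation A.8."""
    edges = np.arange(0.0, T + dt, dt)
    n_bins, n_trials = len(edges) - 1, len(trials_a)
    jpsth = np.zeros((n_bins, n_bins))
    for sa, sb in zip(trials_a, trials_b):
        ca, _ = np.histogram(sa, bins=edges)
        cb, _ = np.histogram(sb, bins=edges)
        jpsth += np.outer(ca, cb)
    jpsth /= n_trials * dt * dt                    # trial-averaged joint rate

    lag = int(round(tau / dt))
    strip = np.array([jpsth[i, i - lag] for i in range(lag, n_bins)])  # diagonal strip at lag tau
    ratio = strip / strip.mean()
    terms = np.zeros_like(ratio)
    nonzero = ratio > 0
    terms[nonzero] = ratio[nonzero] * np.log2(ratio[nonzero])
    return terms.mean()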

The discussion by Vaadia et al. (1995) emphasizes that modulations of the JPSTH along the diagonal strips allow correlated firing events to convey information about sensory signals or behavioral states, and this information is quantified by equation A.8. The information carried by the individual cells is related to the corresponding integrals over spike rates, equation 2.9. The difference between the information conveyed by the compound spiking events E and the information conveyed by spikes in the two cells independently is precisely the synergy between the two cells at the given time lag τ. For τ = 0, it is the synergy, or extra information, conveyed by synchronous firing of the two cells.

We would like to connect this approach with previous work that focused on how events reduce our uncertainty about the stimulus (de Ruyter van Steveninck & Bialek, 1988). Before we observe the neural response, all we know is that stimuli are chosen from a distribution P[s(t')]. When we observe an event E at time t_E, this should tell us something about the stimulus in the neighborhood of this time, and this knowledge is described by the conditional distribution P[s(t')|t_E]. If we go back to the definition of the mutual information between responses and stimuli, we can write the average information conveyed by one event in terms of this conditional distribution:

I(E; s) = \int \mathcal{D}s(t') \int dt_E\, P[s(t'), t_E]\,\log_2\!\left(\frac{P[s(t'), t_E]}{P[s(t')]\,P[t_E]}\right)
        = \int dt_E\, P[t_E] \int \mathcal{D}s(t')\, P[s(t')\,|\,t_E]\,\log_2\!\left(\frac{P[s(t')\,|\,t_E]}{P[s(t')]}\right).   (A.9)

If the system is stationary, then the coding should be invariant under time translations:

P[s(t')\,|\,t_E] = P[s(t' + \Delta t')\,|\,t_E + \Delta t'].   (A.10)

This invariance means that the integral over stimuli in equation A.9 is independent of the event arrival time t_E, so we can simplify our expression for the information carried by a single event:

I(E; s) = \int dt_E\, P[t_E] \int \mathcal{D}s(t')\, P[s(t')\,|\,t_E]\,\log_2\!\left(\frac{P[s(t')\,|\,t_E]}{P[s(t')]}\right)
        = \int \mathcal{D}s(t')\, P[s(t')\,|\,t_E]\,\log_2\!\left(\frac{P[s(t')\,|\,t_E]}{P[s(t')]}\right).   (A.11)

This formula was used by de Ruyter van Steveninck and Bialek (1988). To connect with our work here, we express the information in equation A.11 as an average over the stimulus,

I(E; s) = \left\langle \frac{P[s(t')\,|\,t_E]}{P[s(t')]}\,\log_2\!\left(\frac{P[s(t')\,|\,t_E]}{P[s(t')]}\right) \right\rangle_s.   (A.12)

Using Bayes' rule,

\frac{P[s(t')\,|\,t_E]}{P[s(t')]} = \frac{P[t_E\,|\,s(t')]}{P[t_E]} = \frac{r_E(t_E)}{\bar r_E},   (A.13)


where the last equality results from the distribution of event arrival times being proportional to the event rates, as defined above. Substituting this back into equation A.11, one finds equation 2.6.

Appendix B Experimental Setup

In the experiment we used a female blowfly, which was a first-generation offspring of a wild fly caught outside. The fly was put inside a plastic tube and immobilized with wax, with the head protruding out. The proboscis was left free, so that the fly could be fed regularly with some sugar water. A small hole was cut in the back of the head, close to the midline on the right side. Through this hole, a tungsten electrode was advanced into the lobula plate. This area, which is several layers back from the compound eye, includes a group of large motion-detector neurons with wide receptive fields and strong direction selectivity. We recorded spikes extracellularly from one of these, the contralateral H1 neuron (Franceschini, Riehle, & le Nestour, 1989; Hausen & Egelhaaf, 1989). The electrode was positioned such that spikes from H1 could be discriminated reliably and converted into pulses by a simple threshold discriminator. The pulses were fed into a CED 1401 interface (Cambridge Electronic Design), which digitized the spikes at 10 μs resolution. To keep exact synchrony over the duration of the experiment, the spike timing clock was derived from the same internal CED 1401 clock that defined the frame times of the visual stimulus.

The stimulus was a rigidly moving bar pattern, displayed on a Tektronix 608 high-brightness display. The radiance at average intensity Ī was about 180 mW/(m² × sr), which amounts to about 5 × 10⁴ effectively transduced photons per photoreceptor per second (Dubs, Laughlin, & Srinivasan, 1984). The bars were oriented vertically, with intensities chosen at random to be Ī(1 ± C), where C is the contrast. The distance between the fly and the screen was adjusted so that the angular subtense of a bar equaled the horizontal interommatidial angle in the stimulated part of the compound eye. This setting was found by determining the eye's spatial Nyquist frequency through the reverse reaction (Gotz, 1964) in the response of the H1 neuron. For this fly, the horizontal interommatidial angle was 1.45 degrees, and the distance to the screen 105 mm. The fly viewed the display through a round 80 mm diameter diaphragm, showing approximately 30 bars. From this we estimate the number of stimulated ommatidia in the eye's hexagonal raster to be about 612.

Frames of the stimulus pattern were refreshed every 2 ms and with eachnew frame the pattern was displayed at a new position This resulted in anapparent horizontal motion of the bar pattern which is suitable to excite theH1 neuron The pattern position was dened by a pseudorandom sequencesimulating independent random displacements at each time step uniformlydistributed between iexcl047 degrees and C047 degrees (equivalent to iexcl032to C032 omm horizontal ommatidial spacings) This corresponds to a dif-

Synergy in a Neural Code 1551

fusion constant of 181(plusmn)2 per second or 86 omm2 per second The sequenceof pseudorandom numbers contained a repeating part and a nonrepeatingpart each 10 seconds long with the same statistical parameters Thus ineach 20 second cycle the y saw a 10 second movie that it had seen 20seconds before followed by a 10 second movie that was generated inde-pendently

Acknowledgments

We thank G Lewen and A Schweitzer for their help with the experimentsand N Tishby for many helpful discussions Work at the IAS was supportedin part by DOE grant DEndashFG02ndash90ER40542 and work at the IFSC was sup-ported by the Brazilian agencies FAPESP and CNPq

References

Abeles M Bergmann H Margalit E amp Vaadia E (1993) Spatiotemporalring patterns in the frontal cortex of behaving monkeys J Neurophysiol 701629ndash1638

Adrian E (1928) The basis of sensation London ChristophersAertsen A Gerstein G HabibM amp Palm G (1989)Dynamics of neuronal r-

ing correlation modulation of ldquoeffective connectivityrdquo J Neurophysiol 61(5)900ndash917

Bialek W (1990) Theoretical physics meets experimental neurobiology In EJen (Ed) 1989 Lectures in Complex Systems (pp 513ndash595) Menlo Park CAAddison-Wesley

de Ruyter van Steveninck R amp Bialek W (1988) Real-time performance ofa movement-sensitive neuron in the blowy visual system coding and in-formation transfer in short spike sequences Proc R Soc Lond Ser B 234379

de Ruyter van Steveninck R R Lewen G Strong S Koberle R amp BialekW (1997)Reproducibility and variability in neural spike trains Science 275275

DeWeese M (1996) Optimization principles for the neural code Network 7325ndash331

DeWeese M amp Meister M (1998) How to measure information gained fromone symbol APS Abstracts

Dubs A Laughlin S amp Srinivasan M (1984)Single photon signals in y pho-toreceptors and rst order interneurones at behavioural threshold J Physiol317 317ndash334

Franceschini N Riehle A amp le Nestour A (1989) Directionally selective mo-tion detection by insect neurons In D Stavenga amp R Hardie (Eds) Facets ofVision page 360 Berlin Springer-Verlag

Gotz K (1964) Optomotorische Untersuchung des Visuellen Systems einigerAugenmutanten der Fruchtiege Drosophila Kybernetik 2 77ndash92

1552 Brenner Strong Koberle Bialek and de Ruyter van Steveninck

Hausen K amp Egelhaaf M (1989)Neural mechanisms of visual course control ininsects In D Stavenga amp R Hardie (Eds)Facets of Vision page 391 Springer-Verlag

Hopeld J J (1995)Pattern recognitin computation using action potential tim-ing for stimulus representation Nature 376 33ndash36

Land M F amp Collett T S (1974)Chasing behavior of houseies (Fannia canic-ulaus) A description and analysis J Comp Physiol 89 331ndash357

MacKay D amp McCulloch W S (1952) The limiting information capacity of aneuronal link Bull Math Biophys 14 127ndash135

Magelby K (1987) Short-term synaptic plasticity In G M Edelman V E Gallamp K M Cowan (Eds) Synaptic function (pp 21ndash56) New York Wiley

Meister M (1995)Multineuronal codes in retinal signaling Proc Nat Acad Sci(USA) 93 609ndash614

Optican L M amp Richmond B J (1987)Temporal encodingof two-dimensionalpatterns by single units in primate inferior temporal cortex III Informationtheoretic analysis J Neurophys 57 162ndash178

Panzeri S Biella G Rolls E T Skaggs W E amp Treves A (1996) Speednoise information and the graded nature of neuronal responses Network 7365ndash370

Rieke F Warland D de Ruyter van Steveninck R amp Bialek W (1997)SpikesExploring the neural code Cambridge MA MIT Press

Shannon C E (1948) A mathematical theory of communication Bell Sys TechJ 27 379ndash423 623ndash656

Singer W amp Gray C M (1995) Visual feature integration and the temporalcorrelation hypothesis Ann Rev Neurosci 555ndash586

Skaggs W E McNaughton B L Gothard K M amp Markus E J (1993) Aninformation-theoretic approach to deciphering the hippocampal code InS J Hanson J D Cowan amp C L Giles (Eds) Advances in neural informa-tion processing 5 (pp 1030ndash1037) San Mateo CA Morgan Kaufmann

Strong S P Koberle R de Ruyter van Steveninck R R amp Bialek W (1998)Entropy and information in neural spike trains Phys Rev Lett 80 197ndash200

Usrey W M Reppas J B amp Reid R C (1998) Paired-spike interactions andsynaptic efciency of retinal inputs to the thalamus Nature 395 384

Vaadia E Haalman I Abeles M Bergman H Prut Y Slovin H amp AertsenA (1995) Dynamics of neural interactions in monkey cortex in relation tobehavioural events Nature 373 515ndash518

Received February 22 1999 accepted May 20 1999

Page 7: Synergy in a Neural Code - Princeton Universitywbialek/our_papers/brenner+al_00b.pdfSynergy in a Neural Code 1533 of measuring the information (in bits) carried by particular patterns

Synergy in a Neural Code 1537

normalization P(response |s) D P(t|s) D rE(t) (TNrE) Then the conditionalentropy is

S[P(t|s)] D iexclZ T

0dt P(t|s) log2 P(t|s)

D iexcl1T

Z T

0dt

rE(t)NrE

log2

iexclrE(t)NrET

cent (24)

In principleone should average this quantity over the distribution of stimulis(t0 ) however if the time T is long enough and the stimulus history suf-ciently rich the ensemble average and the time average are equivalent Thevalidity of this assumption can be checked experimentally (see Figure 2c)The reduction in entropy is then the gain in information so

I(EI s) D S[P(t)] iexcl S[P(t|s)]

D1T

Z T

0dt

iexclrE(t)NrE

centlog2

iexclrE(t)NrE

cent (25)

or equivalently

I(EI s) Dfrac12 iexclrE(t)

NrE

centlog2

iexclrE(t)NrE

centfrac34

s (26)

where here the average is over all possible value of s weighted by theirprobabilities P(s)

In the second view the neural response is a binary random variablesE 2 f0 1g marking the occurrence or nonoccurrence of an event of type E ina small time bin of size D t Suppose for simplicity that the stimulus takes ona nite set of values s with probabilities P(s) These in turn induce the event Ewith probability pE(s) D P(sE D 1 |s) D rE(s)D t with an average probabilityfor the occurrenceof the event NpE D P

s P(s) rE(s)D t D NrED t The informationis the difference between the prior entropy and the conditional entropyI(EI s) D S(s) iexcl hS(s |sE)i where the conditional entropy is an average overthe two possible values of sE The conditional probabilities are found fromBayesrsquo rule

P(s |sE D 1) DP(s)pE(s)

NpE

P(s |sE D 0) DP(s)(1 iexcl pE(s))

(1 iexcl NpE) (27)

and with these one nds the information

I(EI s) D iexclX

sP(s) log2 P(s) C

X

sED01P(s|sE) log2 P(s |sE)

DX

sP(s)

micropE(s) log2

iexclpE(s)

NpE

centC (1 iexcl pE(s)) log2

iexcl1 iexcl pE(s)1 iexcl NpE

centpara (28)

1538 Brenner Strong Koberle Bialek and de Ruyter van Steveninck

Taking the limit D t 0 consistent with the requirement that the eventcan occur at most once one nds the average information conveyed in asmall time bin dividing by the average probability of an event one obtainsequation 26 as the information per event

Equation 26 and its time-averaged form equation 25 are exact formulasthat are valid in any situation where a rich stimulus history can be presented

Synergy in a Neural Code 1539

repeatedly It enables the evaluation of the information for arbitrarily com-plex events independent of assumptions about the encoding algorithmThe information is an average over the joint distribution of stimuli and re-sponses But rather than constructing this joint distribution explicitly andthen averaging equation 25 expresses the information as a direct empiricalaverage by estimating the function rE(t) as a histogram we are sampling thedistribution of the responses given a stimulus whereas by integrating overtime we are sampling the distribution of stimuli This formula is generaland can be applied for different systems under different experimental con-ditions The numerical result (measured in bits) will of course depend onthe neural system as well as on the properties of the stimulus distributionThe error bars on the measurement of the information are affected by theniteness of the data for example sampling must be sufcient to constructthe rates rE(t) reliably These and other practical issues in using equation 25are illustrated in detail in Figure 2

23 The Special Case of Single Spikes Let us consider in more detailthe simple case where events are single spikes The average informationconveyed by a single spike becomes an integral over the time-dependentspike rate r(t)

I(1 spikeI s) D1T

Z T

0dt

iexclr(t)Nrcent

log2

iexclr(t)Nrcent

(29)

It makes sense that the information carried by single spikes should be re-lated to the spike rate since this rate as a function of time gives a complete

Figure 2 Facing page Finite size effects in the estimation of the informationconveyed by single spikes (a) Information as a function of the bin size D t usedfor computing the time-dependent rate r(t) from all 360 repetitions (circles) andfrom 100 of the repetitions (crosses) A linear extrapolation to the limit D t 0 isshown for the case where all repetitions were used (solid line) (b) Informationas a function of the inverse number of repetitions N for a xed bin size D t D2 ms (c) Systematic errors due to nite duration of the repeated stimulus s(t)The full 10-second length of the stimulus was subdivided into segments ofduration T Using equation 29 the information was calculated for each segmentand plotted as a function of 1 T (circles) If the stimulus is sufciently long thatan ensemble average is well approximated by a time average the convergenceof the information to a stable value as T 1 should be observable Furtherthe standard deviation of the information measured from different segments oflength T (error bars) should decrease as a square root law s 1

pT (dashed

line) These data provide an empirical verication that for the distribution usedin this experiment the repeated stimulus time is long enough to approximateergodic sampling

1540 Brenner Strong Koberle Bialek and de Ruyter van Steveninck

description of the ldquoone-bodyrdquo statistics of the spike train in the same waythat the single particle density describes the one-body statistics of a gasor liquid This is true independent of any models or assumptions and wesuggest the use of equations 29 and 25 to test directly in any given experi-mental situation whether many body effects increase or reduce informationtransmission

We note that considerable attention has been given to the problem ofmaking accurate statistical models for spike trains In particular the sim-plest modelmdashthe modulated Poisson processmdashis often used as a standardagainst which real neurons are compared It is tempting to think that Pois-son behavior is equivalent to independent transmission of information bysuccessive spikes but this is not the case (see below) Of course no real neu-ron is precisely Poisson and it is not clear which deviations from Poissonbehavior are signicant Rather than trying to t statistical models to thedata and then to try to understand the signicance of spike train statistics forneural coding we see that it is possible to quantify directly the informationcarried by single spikes and compound patterns Thus without referenceto statistical models we can determine whether successive spikes carry in-dependent redundant or synergistic information

Several previous works have noted the relation between spike rate andinformation and the formula in equation 29 has an interesting historyFor a spike train generated by a modulated Poisson process equation 29provides an upper bound on information transmission (per spike) by thespike train as a whole (Bialek 1990) Even in this simple case spikes do notnecessarily carry independent information slow modulations in the ringrate can cause them to be redundant In studying the coding of location bycells in the rat hippocampus Skaggs McNaughton Gothard and Markus(1993) assumed that successive spikes carried independent informationand that the spike rate was determined by the instantaneous location Theyobtained equation 29 with the time average replaced by an average overlocations DeWeese (1996) showed that the rate of information transmissionby a spike train could be expanded in a series of integrals over correlationfunctions where successive terms would be small if the number of spikesper correlation time were small the leading term which would be exact ifspikes were uncorrelated is equation 29 Panzeri Biella Rolls Skaggs andTreves (1996) show that equation 29 multiplied by the mean spike rate togive an information rate (bits per second) is the correct information rateif we count spikes in very brief segments of the neural response which isequivalent to asking for the information carried by single spikes For furtherdiscussion of the relation to previous work see appendix A

A crucial point here is the generalization to equation 25 and this re-sult applies to the information content of any point events in the neuralresponsemdashpairs of spikes with a certain separation coincident spikes fromtwo cells and so onmdashnot just single spikes In the analysis of experimentswe will emphasize the use of this formula as an exact result for the infor-

Synergy in a Neural Code 1541

mation content of single events rather than an approximate result for thespike train as a whole and this approach will enable us to address questionsconcerning the structure of the code and the role played by various typesof events

3 Experiments in the Fly Visual System

In this section we use our formalism to analyze experiments on the move-ment-sensitive cell H1 in the visual system of the blowy Calliphora vicinaWe address the issue of the information conveyed by pairs of spikes in thisneuron as compared to the information conveyed independently by sin-gle spikes The quantitative results of this sectionmdashinformation content inbits effects of synergy and redundancy among spikesmdashare specic to thissystem and to the stimulus conditions used The theoretical approach how-ever is valid generally and can be applied similarly to other experimentalsystems to nd out the signicance of various patterns in single cells or ina population

31 Synergy Between Spikes In the experiment (see appendix B fordetails) the horizontal motion across the visual eld is the input sensorystimulus s(t) drawn from a probability distribution P[s(t)] The spike trainrecorded from H1 is the neural response Figure 1a shows a segment of thestimulus presented to the y and Figure 1b illustrates the response to manyrepeated presentations of this segment The histogram of spike times acrossthe ensemble of repetitions provides an estimate of the spike rate r(t) (seeFigure 1c) and equation 25 gives the information carried by a single spikeI(1 spikeI s) D 153 sect 005 bits Figure 2 illustrates the details of how theformula was used with an emphasis on the effects of niteness of the dataIn this experiment a stimulus of length T D 10 sec was repeated 360 timesAs seen from Figure 2 the calculation converges to a stable result withinour nite data set

If each spike were to convey information independently then with themean spike rate Nr D 37 spikes per second the total information rate wouldbe Rinfo D 56 bits per second We used the variability and reproducibilityof continuous segments in the neural response (de Ruyter van Stevenincket al 1997 Strong et al 1998) to estimate the total information rate in thespike train in this experiment and found that Rinfo D 75 bits per secondThus the information conveyed by the spike train as a whole is larger thanthe sum of contributions from individual spikes indicating cooperativeinformation transmission by patterns of spikes in time This synergy amongspikes motivates the search for especially informative patterns in the spiketrain

We consider compound events that consist of two spikes separated bya time t with no constraints on what happens between them Figure 1shows segments of the event rate rt (t) for t D 3 (sect1) ms (see Figure 1d)

1542 Brenner Strong Koberle Bialek and de Ruyter van Steveninck

Figure 3 Information about the signal transmitted by pairs of spikes computedfrom equation 25 as a function of the time separation between the two spikesThe dotted line shows the information that would be transmitted by the twospikes independently (twice the single-spike information)

and for t D 17 (sect1) ms (see Figure 1e) The information carried by spikepairs as a function of the interspike time t computed from equation 25 isshown in Figure 3 For large t spikes contribute independent informationas expected This independence is established within approximately 30 to40 ms comparable to the behavioral response times of the y (Land ampCollett 1974) There is a mild redundancy (about 10ndash20) at intermediateseparations and a very large synergy (up to about 130) at small t

Related results were obtained using the correlation of spike patterns withstimulus features (de Ruyter van Steveninck amp Bialek 1988) There the infor-mation carried by spikepatterns was estimated from the distribution of stim-uli given each pattern thus constructing a statistical model of what the pat-terns ldquostand forrdquo (see the details in appendix A) Since the time-dependentstimulus is in general of high dimensionality its distribution cannot besampled directly and some approximations must be made de Ruyter vanSteveninck and Bialek (1988) made the approximation that patterns of a fewspikes encode projections of the stimulus onto low-dimensional subspacesand further that the conditional distributions P[s(t)|E] are gaussian The in-formation obtained with these approximations provides a lower bound tothe true information carried by the patterns as estimated directly with themethods presented here

Synergy in a Neural Code 1543

32 Origins of Synergy Synergy means quite literally that two spikestogether tell us more than two spikes separately Synergistic coding is oftendiscussed for populations of cells where extra information is conveyedby patterns of coincident spikes from several neurons (Abeles BergmannMargalit amp Vaadia 1993 Hopeld 1995 Meister 1995 Singer amp Gray 1995)Here we see direct evidence for extra information in pairs of spikes acrosstime The mathematical framework for describing these effects is the sameand a natural question is What are the conditions for synergistic coding

The average synergy Syn[E1 E2I s] between two events E1 and E2 is thedifference between the information about the stimulus s conveyed by thepair and the information conveyed by the two events independently

Syn[E1 E2I s] D I[E1 E2I s] iexcl (I[E1I s] C I[E2I s]) (31)

We can rewrite the synergy as

Syn[E1 E2I s] D I[E1I E2 |s] iexcl I[E1I E2] (32)

The rst term is the mutual information between the events computed acrossan ensemble of repeated presentations of the same stimulus history It de-scribes the gain in information due to the locking of compound event (E1 E2)to particular stimulus features If events E1 and E2 are correlated individ-ually with the stimulus but not with one another this term will be zeroand these events cannot be synergistic on average The second term is themutual information between events when the stimulus is not constrainedor equivalently the predictability of event E2 from E1 This predictabilitylimits the capacity of E2 to carry information beyond that already conveyedby E1 Synergistic coding (Syn gt 0) thus requires that the mutual informa-tion among the spikes is increased by specifying the stimulus which makesprecise the intuitive idea of ldquostimulus-dependent correlationsrdquo (Abeles etal 1993 Hopeld 1995 Meister 1995 Singer amp Gray 1995)

Returning to our experimental example we identify the events E1 and E2as the arrivals of two spikes and consider the synergy as a function of thetime t between them In terms of event rates we compute the informationcarried by a pair of spikes separated by a time t equation 25 as well as theinformation carried by two individual spikes The difference between thesetwo quantities is the synergy between two spikes which can be written as

Syn(t ) D iexcl log2

iexclNrt

Nr2

centC

1T

Z T

0dt

rt (t)Nrt

log2

micro rt (t)r(t)r(t iexcl t )

para

C1T

Z T

0dt

micro rt (t)

Nrt

Crt (t C t )

Nrt

iexcl 2r(t)

Nr

paralog2[r(t)] (33)

The rst term in this equation is the logarithm of the normalized correlationfunction and hence measures the rarity of spike pairs with separation t

1544 Brenner Strong Koberle Bialek and de Ruyter van Steveninck

the average of this term over t is the mutual information between events(the second term in equation 32) The second term is related to the localcorrelation function and measures the extent to which the stimulus mod-ulates the likelihood of spike pairs The average of this term over t givesthe mutual information conditional on knowledge of the stimulus (the rstterm in equation 32) The average of the third term over t is zero and nu-merical evaluation of this term from the data shows that it is negligible atmost values of t

We thus nd that the synergy between spikes is approximately a sumof two terms whose averages over t are the terms in equation 32 A spikepair with a separation t then has two types of contributions to the extrainformation it carries the two spikes can be correlated conditional on thestimulus or the pair could be a rare and thus surprising event The rarityof brief pairs is related to neural refractoriness but this effect alone is in-sufcient to enhance information transmission the rare events must alsobe related reliably to the stimulus In fact conditional on the stimulus thespikes in rare pairs are strongly correlated with each other and this is visiblein Figure 1a from trial to trial adjacent spikes jitter together as if connectedby a stiff spring To quantify this effect we nd for each spike in one trialthe closest spike in successive trials and measure the variance of the arrivaltimes of these spikes Similarly we measure the variance of the interspiketimes Figure 4a shows the ratio of the interspike time variance to the sum ofthe arrival time variances of the spikes that make up the pair For large sep-arations this ratio is unity as expected if spikes are locked independentlyto the stimulus but as the two spikes come closer it falls below one-quarter

Both the conditional correlation among the members of the pair (see Fig-ure 4a) and the relative synergy (see Figure 4b) depend strongly on the inter-spike separation This dependence is nearly invariant to changes in imagecontrast although the spike rate and other statistical properties are stronglyaffected by such changes Brief spike pairs seem to retain their identity asspecially informative symbols over a range of input ensembles If particulartemporal patterns are especially informative then we would lose informa-tion if we failed to distinguish among different patterns Thus there are twonotions of time resolution for spike pairs the time resolution with which theinterspike time is dened and the absolute time resolution with which theevent is marked Figure 5 shows that for small interspike times the infor-mation is much more sensitive to changes in the interspike time resolution(open symbols) than to the absolute time resolution (lled symbols) This isrelated to the slope in Figure 2 in regions where the slope is large eventsshould be nely distinguished in order to retain the information

3.3 Implications of Synergy. The importance of spike timing in the neural code has been under debate for some time now. We believe that some issues in this debate can be clarified using a direct information-theoretic approach.


Figure 4: (a) Ratio between the variance of interspike time and the sum of variances of the two spike times. Variances are measured across repeated presentations of the same stimulus, as explained in the text. This ratio is plotted as a function of the interspike time τ for two experiments with different image contrast. (b) Extra information conveyed cooperatively by pairs of spikes, expressed as a fraction of the information conveyed by the two spikes independently. While the single-spike information varies with contrast (1.5 bits per spike for C = 0.1 compared to 1.3 bits per spike for C = 1), the fractional synergy is almost contrast independent.

Following MacKay and McCulloch (1952), we know that marking spike arrival times with higher resolution provides an increased capacity for information transmission. The work of Strong et al. (1998) shows that for the fly's H1 neuron, the increased capacity associated with spike timing indeed is used with nearly constant efficiency down to millisecond resolution. This efficiency can be the result of a tight locking of individual spikes to a rapidly varying stimulus, and it could also be the result of temporal patterns providing information beyond rapid rate modulations.


Figure 5: Information conveyed by spike pairs as a function of time resolution. An event (a pair of spikes) can be described by two times: the separation between spikes (relative time) and the occurrence time of the event with respect to the stimulus (absolute time). The information carried by the pair depends on the time resolution in these two dimensions, both specified by the bin size Δt along the abscissa. Open symbols are measurements of the information for a fixed 2 ms resolution of absolute time and a variable resolution Δt of relative time. Closed symbols correspond to a fixed 2 ms resolution of relative time and a variable resolution Δt of absolute time. For short intervals, the sensitivity to coarsening of the relative time resolution is much greater than to coarsening of the absolute time resolution. In contrast, sensitivity to relative and absolute time resolutions is the same for the longer, nonsynergistic interspike separations.

The analysis given here shows that for H1, pairs of spikes can provide much more information than two individual spikes, and information transmission is much more sensitive to the relative timing of spikes than to their absolute timing. The synergy is an inherent property of particular spike pairs, as it persists when averaged over all occurrences of the pair in the experiment.

One may ask about the implications of synergy for behavior. Flies can respond to changes in a motion stimulus over a time scale of approximately 30 to 40 ms (Land & Collett, 1974). We have found that the spike train of H1 as a whole conveyed about 20 bits per second more than the information conveyed independently by single spikes (see section 3.1). This implies that over a behavioral response time, synergistic effects provide about 0.8 bit of additional information, equivalent to almost a factor of two in resolving power for distinguishing different trajectories of motion across the visual field.
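
For completeness, the arithmetic behind the last sentence can be spelled out (our restatement, taking a 40 ms behavioral window):

20\ \mathrm{bits/s} \times 0.04\ \mathrm{s} = 0.8\ \mathrm{bit}, \qquad 2^{0.8} \approx 1.7,

so the number of motion trajectories that can be distinguished within one behavioral response time grows by almost a factor of two.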


4 Summary

Information theory allows us to measure synergy and redundancy among spikes, independent of the rules for translating between spikes and stimuli. In particular, this approach tests directly the idea that patterns of spikes are special events in the code, carrying more information than expected by adding the contributions from individual spikes. It is of practical importance that the formulas rely on low-order statistical measures of the neural response and hence do not require enormous data sets to reach meaningful conclusions. The method is of general validity and is applicable to patterns of spikes across a population of neurons as well as across time. Specific results (existence of synergy or redundancy, number of bits per spike, etc.) depend on the neural system as well as on the stimulus ensemble used in each experiment.

In our experiments on the fly visual system, we found that an event composed of a pair of spikes can carry far more than the information carried independently by its parts. Two spikes that occur in rapid succession appear to be special symbols that have an integrity beyond the locking of individual spikes to the stimulus. This is analogous to the encoding of sounds in written English: each of the compound symbols th, sh, and ch stands for sounds that are not decomposable into sounds represented by each of the constituent letters. For spike pairs to act effectively as special symbols, mechanisms for "reading" them must exist at subsequent levels of processing. Synaptic transmission is sensitive to interspike times in the 2–20 ms range (Magelby, 1987), and it is natural to suggest that synaptic mechanisms on this time scale play a role in such reading. Recent work on the mammalian visual system (Usrey, Reppas, & Reid, 1998) provides direct evidence that pairs of spikes close together in time can be especially efficient in driving postsynaptic neurons.

Appendix A Relation to Previous Work

Patterns of spikes and their relation to sensory stimuli have been quantified in the past using correlation functions. The event rates that we have defined here, which are connected directly to the information carried by patterns of spikes through equation 2.5, are just properly normalized correlation functions. The event rate for pairs of spikes from two separate neurons is related to the joint poststimulus time histogram (PSTH) defined by Aertsen, Gerstein, Habib, and Palm (1989; Vaadia et al., 1995). Making this connection explicit is also an opportunity to see how the present formalism applies to events defined across two cells.

Consider two cells, A and B, generating spikes at times {t_i^A} and {t_i^B}, respectively. It will be useful to think of the spike trains as sums of unit


impulses at the spike times:

\rho_A(t) = \sum_i \delta(t - t_i^A), \qquad (A.1)

\rho_B(t) = \sum_i \delta(t - t_i^B). \qquad (A.2)

Then the time-dependent spike rates for the two cells are

r_A(t) = \langle \rho_A(t) \rangle_{\mathrm{trials}}, \qquad (A.3)

r_B(t) = \langle \rho_B(t) \rangle_{\mathrm{trials}}, \qquad (A.4)

where ⟨···⟩_trials denotes an average over multiple trials in which the same time-dependent stimulus s(t′) is presented or the same motor behavior is observed. These spike rates are the probabilities per unit time for the occurrence of a single spike in either cell A or cell B, also called the PSTH. We can define the probability per unit time for a spike in cell A to occur at time t and a spike in cell B to occur at time t′, and this will be the joint PSTH:

\mathrm{JPSTH}_{AB}(t, t') = \langle \rho_A(t)\, \rho_B(t') \rangle_{\mathrm{trials}}. \qquad (A.5)

Alternatively, we can consider an event E defined by a spike in cell A at time t and a spike in cell B at time t − τ, with the relative time τ measured with a precision of Δt. Then the rate of these events is

r_E(t) = \int_{-\Delta t/2}^{\Delta t/2} dt'\; \mathrm{JPSTH}_{AB}(t,\, t - \tau + t') \qquad (A.6)

\approx \Delta t\; \mathrm{JPSTH}_{AB}(t,\, t - \tau), \qquad (A.7)

where the last approximation is valid if our time resolution is sufficiently high. Applying our general formula for the information carried by single events, equation 2.5, the information carried by pairs of spikes from two cells can be written as an integral over diagonal "strips" of the JPSTH matrix:

I(E; s) = \frac{1}{T}\int_0^T \! dt\; \frac{\mathrm{JPSTH}_{AB}(t,\, t-\tau)}{\langle \mathrm{JPSTH}_{AB}(t,\, t-\tau) \rangle_t}\,
\log_2\!\left[\frac{\mathrm{JPSTH}_{AB}(t,\, t-\tau)}{\langle \mathrm{JPSTH}_{AB}(t,\, t-\tau) \rangle_t}\right], \qquad (A.8)

where ⟨JPSTH_AB(t, t−τ)⟩_t is an average of the JPSTH over time; this average is equivalent to the standard correlation function between the two spike trains.
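
As a concrete illustration of equations A.5 through A.8, the sketch below builds the JPSTH of two simultaneously recorded cells from their spike times and evaluates the information carried by the compound event "a spike in A at t and a spike in B at t − τ". It is our own construction under stated assumptions (bin size, array layout, and function names are choices made for the example), not code from the original study.

```python
import numpy as np

def jpsth(trials_a, trials_b, T, dt):
    """JPSTH_AB(t, t'): trial-averaged product of the two binned spike trains,
    as a probability density per unit time squared (equation A.5)."""
    nbins = int(round(T / dt))
    edges = np.linspace(0.0, T, nbins + 1)
    acc = np.zeros((nbins, nbins))
    for sa, sb in zip(trials_a, trials_b):
        ca, _ = np.histogram(sa, bins=edges)
        cb, _ = np.histogram(sb, bins=edges)
        acc += np.outer(ca, cb)
    return acc / (len(trials_a) * dt * dt)

def pair_info_two_cells(trials_a, trials_b, T, dt, tau):
    """Equation A.8: information (bits) carried by the event 'spike in A at t,
    spike in B at t - tau', read off one diagonal strip of the JPSTH."""
    J = jpsth(trials_a, trials_b, T, dt)
    lag = int(round(tau / dt))
    strip = np.diagonal(J, offset=-lag)        # JPSTH_AB(t, t - tau)
    x = strip[strip > 0] / strip.mean()
    return np.sum(x * np.log2(x)) / strip.size
```

Setting tau = 0 gives the information carried by synchronous firing; subtracting the two single-cell informations (equation 2.9 applied to each cell's PSTH) then gives the synergy between the cells at that lag.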

The discussion by Vaadia et al. (1995) emphasizes that modulations of the JPSTH along the diagonal strips allow correlated firing events to convey information about sensory signals or behavioral states, and this information


is quantified by equation A.8. The information carried by the individual cells is related to the corresponding integrals over spike rates, equation 2.9. The difference between the information conveyed by the compound spiking events E and the information conveyed by spikes in the two cells independently is precisely the synergy between the two cells at the given time lag τ. For τ = 0, it is the synergy (or extra information) conveyed by synchronous firing of the two cells.

We would like to connect this approach with previous work that focused on how events reduce our uncertainty about the stimulus (de Ruyter van Steveninck & Bialek, 1988). Before we observe the neural response, all we know is that stimuli are chosen from a distribution P[s(t′)]. When we observe an event E at time t_E, this should tell us something about the stimulus in the neighborhood of this time, and this knowledge is described by the conditional distribution P[s(t′)|t_E]. If we go back to the definition of the mutual information between responses and stimuli, we can write the average information conveyed by one event in terms of this conditional distribution:

I(E; s) = \int \! Ds(t') \int \! dt_E\; P[s(t'), t_E]\, \log_2\!\left(\frac{P[s(t'), t_E]}{P[s(t')]\, P[t_E]}\right)
        = \int \! dt_E\, P[t_E] \int \! Ds(t')\; P[s(t')\,|\,t_E]\, \log_2\!\left(\frac{P[s(t')\,|\,t_E]}{P[s(t')]}\right). \qquad (A.9)

If the system is stationary, then the coding should be invariant under time translations:

P[s(t')\,|\,t_E] = P[s(t' + \Delta t')\,|\,t_E + \Delta t']. \qquad (A.10)

This invariance means that the integral over stimuli in equation A.9 is independent of the event arrival time t_E, so we can simplify our expression for the information carried by a single event:

I(E; s) = \int \! dt_E\, P[t_E] \int \! Ds(t')\; P[s(t')\,|\,t_E]\, \log_2\!\left(\frac{P[s(t')\,|\,t_E]}{P[s(t')]}\right)
        = \int \! Ds(t')\; P[s(t')\,|\,t_E]\, \log_2\!\left(\frac{P[s(t')\,|\,t_E]}{P[s(t')]}\right). \qquad (A.11)

This formula was used by de Ruyter van Steveninck and Bialek (1988). To connect with our work here, we express the information in equation A.11 as an average over the stimulus:

I(E; s) = \left\langle \frac{P[s(t')\,|\,t_E]}{P[s(t')]}\, \log_2\!\left(\frac{P[s(t')\,|\,t_E]}{P[s(t')]}\right) \right\rangle_s. \qquad (A.12)

Using Bayes' rule,

\frac{P[s(t')\,|\,t_E]}{P[s(t')]} = \frac{P[t_E\,|\,s(t')]}{P[t_E]} = \frac{r_E(t_E)}{\bar{r}_E}, \qquad (A.13)


where the last equality results from the distributions of event arrival times being proportional to the event rates, as defined above. Substituting this back into equation A.11, one finds equation 2.6.
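
To make the last step explicit (our restatement of the substitution, using the form of the single-event information introduced in section 2): inserting equation A.13 into equation A.12 replaces the ratio of probabilities by the normalized event rate,

I(E; s) = \left\langle \frac{r_E(t_E)}{\bar{r}_E}\, \log_2\!\left(\frac{r_E(t_E)}{\bar{r}_E}\right) \right\rangle_s ,

which, up to notation, is the single-event formula quoted as equation 2.6; replacing the ensemble average by a time average over the experiment then yields the time-averaged form, equation 2.5.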

Appendix B Experimental Setup

In the experiment, we used a female blowfly, which was a first-generation offspring of a wild fly caught outside. The fly was put inside a plastic tube and immobilized with wax, with the head protruding out. The proboscis was left free so that the fly could be fed regularly with some sugar water. A small hole was cut in the back of the head, close to the midline on the right side. Through this hole, a tungsten electrode was advanced into the lobula plate. This area, which is several layers back from the compound eye, includes a group of large motion-detector neurons with wide receptive fields and strong direction selectivity. We recorded spikes extracellularly from one of these, the contralateral H1 neuron (Franceschini, Riehle, & le Nestour, 1989; Hausen & Egelhaaf, 1989). The electrode was positioned such that spikes from H1 could be discriminated reliably and converted into pulses by a simple threshold discriminator. The pulses were fed into a CED 1401 interface (Cambridge Electronic Design), which digitized the spikes at 10 μs resolution. To keep exact synchrony over the duration of the experiment, the spike timing clock was derived from the same internal CED 1401 clock that defined the frame times of the visual stimulus.

The stimulus was a rigidly moving bar pattern, displayed on a Tektronix 608 high-brightness display. The radiance at average intensity Ī was about 180 mW/(m^2 · sr), which amounts to about 5 × 10^4 effectively transduced photons per photoreceptor per second (Dubs, Laughlin, & Srinivasan, 1984). The bars were oriented vertically, with intensities chosen at random to be Ī(1 ± C), where C is the contrast. The distance between the fly and the screen was adjusted so that the angular subtense of a bar equaled the horizontal interommatidial angle in the stimulated part of the compound eye. This setting was found by determining the eye's spatial Nyquist frequency through the reverse reaction (Gotz, 1964) in the response of the H1 neuron. For this fly, the horizontal interommatidial angle was 1.45 degrees, and the distance to the screen was 105 mm. The fly viewed the display through a round 80 mm diameter diaphragm, showing approximately 30 bars. From this we estimate the number of stimulated ommatidia in the eye's hexagonal raster to be about 612.

Frames of the stimulus pattern were refreshed every 2 ms, and with each new frame the pattern was displayed at a new position. This resulted in an apparent horizontal motion of the bar pattern, which is suitable to excite the H1 neuron. The pattern position was defined by a pseudorandom sequence, simulating independent random displacements at each time step, uniformly distributed between −0.47 degrees and +0.47 degrees (equivalent to −0.32 to +0.32 horizontal ommatidial spacings).


This corresponds to a diffusion constant of 18.1 (deg)^2 per second, or 8.6 omm^2 per second. The sequence of pseudorandom numbers contained a repeating part and a nonrepeating part, each 10 seconds long, with the same statistical parameters. Thus, in each 20 second cycle, the fly saw a 10 second movie that it had seen 20 seconds before, followed by a 10 second movie that was generated independently.
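
The quoted diffusion constant follows from the step statistics; a short check (our own, assuming the convention <x^2> = 2 D t and the nominal step size given above):

```python
# Random walk: uniform displacement in [-0.47, +0.47] degrees every 2 ms.
step = 0.47                              # degrees, half-width of the uniform step
var_per_step = (2 * step) ** 2 / 12      # variance of a uniform distribution, deg^2
D = var_per_step * (1 / 0.002) / 2       # <x^2> = 2 D t  ->  D in deg^2 per second
print(D)                                 # ~18.4, close to the quoted 18.1 deg^2/s
print(D / 1.45 ** 2)                     # ~8.7 omm^2/s (1 ommatidial spacing = 1.45 deg)
```

The small difference from the quoted value presumably reflects rounding of the actual step size; the order of magnitude is the point.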

Acknowledgments

We thank G. Lewen and A. Schweitzer for their help with the experiments and N. Tishby for many helpful discussions. Work at the IAS was supported in part by DOE grant DE–FG02–90ER40542, and work at the IFSC was supported by the Brazilian agencies FAPESP and CNPq.

References

Abeles, M., Bergmann, H., Margalit, E., & Vaadia, E. (1993). Spatiotemporal firing patterns in the frontal cortex of behaving monkeys. J. Neurophysiol., 70, 1629–1638.
Adrian, E. (1928). The basis of sensation. London: Christophers.
Aertsen, A., Gerstein, G., Habib, M., & Palm, G. (1989). Dynamics of neuronal firing correlation: Modulation of "effective connectivity." J. Neurophysiol., 61(5), 900–917.
Bialek, W. (1990). Theoretical physics meets experimental neurobiology. In E. Jen (Ed.), 1989 Lectures in Complex Systems (pp. 513–595). Menlo Park, CA: Addison-Wesley.
de Ruyter van Steveninck, R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Coding and information transfer in short spike sequences. Proc. R. Soc. Lond. Ser. B, 234, 379.
de Ruyter van Steveninck, R. R., Lewen, G., Strong, S., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805.
DeWeese, M. (1996). Optimization principles for the neural code. Network, 7, 325–331.
DeWeese, M., & Meister, M. (1998). How to measure information gained from one symbol. APS Abstracts.
Dubs, A., Laughlin, S., & Srinivasan, M. (1984). Single photon signals in fly photoreceptors and first order interneurones at behavioural threshold. J. Physiol., 317, 317–334.
Franceschini, N., Riehle, A., & le Nestour, A. (1989). Directionally selective motion detection by insect neurons. In D. Stavenga & R. Hardie (Eds.), Facets of Vision (p. 360). Berlin: Springer-Verlag.
Gotz, K. (1964). Optomotorische Untersuchung des Visuellen Systems einiger Augenmutanten der Fruchtfliege Drosophila. Kybernetik, 2, 77–92.


Hausen, K., & Egelhaaf, M. (1989). Neural mechanisms of visual course control in insects. In D. Stavenga & R. Hardie (Eds.), Facets of Vision (p. 391). Berlin: Springer-Verlag.
Hopfield, J. J. (1995). Pattern recognition computation using action potential timing for stimulus representation. Nature, 376, 33–36.
Land, M. F., & Collett, T. S. (1974). Chasing behavior of houseflies (Fannia canicularis): A description and analysis. J. Comp. Physiol., 89, 331–357.
MacKay, D., & McCulloch, W. S. (1952). The limiting information capacity of a neuronal link. Bull. Math. Biophys., 14, 127–135.
Magelby, K. (1987). Short-term synaptic plasticity. In G. M. Edelman, V. E. Gall, & K. M. Cowan (Eds.), Synaptic function (pp. 21–56). New York: Wiley.
Meister, M. (1995). Multineuronal codes in retinal signaling. Proc. Nat. Acad. Sci. (USA), 93, 609–614.
Optican, L. M., & Richmond, B. J. (1987). Temporal encoding of two-dimensional patterns by single units in primate inferior temporal cortex. III. Information theoretic analysis. J. Neurophys., 57, 162–178.
Panzeri, S., Biella, G., Rolls, E. T., Skaggs, W. E., & Treves, A. (1996). Speed, noise, information and the graded nature of neuronal responses. Network, 7, 365–370.
Rieke, F., Warland, D., de Ruyter van Steveninck, R., & Bialek, W. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.
Shannon, C. E. (1948). A mathematical theory of communication. Bell Sys. Tech. J., 27, 379–423, 623–656.
Singer, W., & Gray, C. M. (1995). Visual feature integration and the temporal correlation hypothesis. Ann. Rev. Neurosci., 555–586.
Skaggs, W. E., McNaughton, B. L., Gothard, K. M., & Markus, E. J. (1993). An information-theoretic approach to deciphering the hippocampal code. In S. J. Hanson, J. D. Cowan, & C. L. Giles (Eds.), Advances in neural information processing 5 (pp. 1030–1037). San Mateo, CA: Morgan Kaufmann.
Strong, S. P., Koberle, R., de Ruyter van Steveninck, R. R., & Bialek, W. (1998). Entropy and information in neural spike trains. Phys. Rev. Lett., 80, 197–200.
Usrey, W. M., Reppas, J. B., & Reid, R. C. (1998). Paired-spike interactions and synaptic efficiency of retinal inputs to the thalamus. Nature, 395, 384.
Vaadia, E., Haalman, I., Abeles, M., Bergman, H., Prut, Y., Slovin, H., & Aertsen, A. (1995). Dynamics of neural interactions in monkey cortex in relation to behavioural events. Nature, 373, 515–518.

Received February 22, 1999; accepted May 20, 1999.

Page 9: Synergy in a Neural Code - Princeton Universitywbialek/our_papers/brenner+al_00b.pdfSynergy in a Neural Code 1533 of measuring the information (in bits) carried by particular patterns

Synergy in a Neural Code 1539

repeatedly It enables the evaluation of the information for arbitrarily com-plex events independent of assumptions about the encoding algorithmThe information is an average over the joint distribution of stimuli and re-sponses But rather than constructing this joint distribution explicitly andthen averaging equation 25 expresses the information as a direct empiricalaverage by estimating the function rE(t) as a histogram we are sampling thedistribution of the responses given a stimulus whereas by integrating overtime we are sampling the distribution of stimuli This formula is generaland can be applied for different systems under different experimental con-ditions The numerical result (measured in bits) will of course depend onthe neural system as well as on the properties of the stimulus distributionThe error bars on the measurement of the information are affected by theniteness of the data for example sampling must be sufcient to constructthe rates rE(t) reliably These and other practical issues in using equation 25are illustrated in detail in Figure 2

23 The Special Case of Single Spikes Let us consider in more detailthe simple case where events are single spikes The average informationconveyed by a single spike becomes an integral over the time-dependentspike rate r(t)

I(1 spikeI s) D1T

Z T

0dt

iexclr(t)Nrcent

log2

iexclr(t)Nrcent

(29)

It makes sense that the information carried by single spikes should be re-lated to the spike rate since this rate as a function of time gives a complete

Figure 2 Facing page Finite size effects in the estimation of the informationconveyed by single spikes (a) Information as a function of the bin size D t usedfor computing the time-dependent rate r(t) from all 360 repetitions (circles) andfrom 100 of the repetitions (crosses) A linear extrapolation to the limit D t 0 isshown for the case where all repetitions were used (solid line) (b) Informationas a function of the inverse number of repetitions N for a xed bin size D t D2 ms (c) Systematic errors due to nite duration of the repeated stimulus s(t)The full 10-second length of the stimulus was subdivided into segments ofduration T Using equation 29 the information was calculated for each segmentand plotted as a function of 1 T (circles) If the stimulus is sufciently long thatan ensemble average is well approximated by a time average the convergenceof the information to a stable value as T 1 should be observable Furtherthe standard deviation of the information measured from different segments oflength T (error bars) should decrease as a square root law s 1

pT (dashed

line) These data provide an empirical verication that for the distribution usedin this experiment the repeated stimulus time is long enough to approximateergodic sampling

1540 Brenner Strong Koberle Bialek and de Ruyter van Steveninck

description of the ldquoone-bodyrdquo statistics of the spike train in the same waythat the single particle density describes the one-body statistics of a gasor liquid This is true independent of any models or assumptions and wesuggest the use of equations 29 and 25 to test directly in any given experi-mental situation whether many body effects increase or reduce informationtransmission

We note that considerable attention has been given to the problem ofmaking accurate statistical models for spike trains In particular the sim-plest modelmdashthe modulated Poisson processmdashis often used as a standardagainst which real neurons are compared It is tempting to think that Pois-son behavior is equivalent to independent transmission of information bysuccessive spikes but this is not the case (see below) Of course no real neu-ron is precisely Poisson and it is not clear which deviations from Poissonbehavior are signicant Rather than trying to t statistical models to thedata and then to try to understand the signicance of spike train statistics forneural coding we see that it is possible to quantify directly the informationcarried by single spikes and compound patterns Thus without referenceto statistical models we can determine whether successive spikes carry in-dependent redundant or synergistic information

Several previous works have noted the relation between spike rate and information, and the formula in equation 2.9 has an interesting history. For a spike train generated by a modulated Poisson process, equation 2.9 provides an upper bound on information transmission (per spike) by the spike train as a whole (Bialek, 1990). Even in this simple case, spikes do not necessarily carry independent information; slow modulations in the firing rate can cause them to be redundant. In studying the coding of location by cells in the rat hippocampus, Skaggs, McNaughton, Gothard, and Markus (1993) assumed that successive spikes carried independent information and that the spike rate was determined by the instantaneous location. They obtained equation 2.9, with the time average replaced by an average over locations. DeWeese (1996) showed that the rate of information transmission by a spike train could be expanded in a series of integrals over correlation functions, where successive terms would be small if the number of spikes per correlation time were small; the leading term, which would be exact if spikes were uncorrelated, is equation 2.9. Panzeri, Biella, Rolls, Skaggs, and Treves (1996) show that equation 2.9, multiplied by the mean spike rate to give an information rate (bits per second), is the correct information rate if we count spikes in very brief segments of the neural response, which is equivalent to asking for the information carried by single spikes. For further discussion of the relation to previous work, see appendix A.

A crucial point here is the generalization to equation 2.5: this result applies to the information content of any point events in the neural response (pairs of spikes with a certain separation, coincident spikes from two cells, and so on), not just single spikes. In the analysis of experiments, we will emphasize the use of this formula as an exact result for the information content of single events, rather than an approximate result for the spike train as a whole, and this approach will enable us to address questions concerning the structure of the code and the role played by various types of events.

3 Experiments in the Fly Visual System

In this section we use our formalism to analyze experiments on the movement-sensitive cell H1 in the visual system of the blowfly Calliphora vicina. We address the issue of the information conveyed by pairs of spikes in this neuron, as compared to the information conveyed independently by single spikes. The quantitative results of this section (information content in bits, effects of synergy and redundancy among spikes) are specific to this system and to the stimulus conditions used. The theoretical approach, however, is valid generally and can be applied similarly to other experimental systems, to find out the significance of various patterns in single cells or in a population.

3.1 Synergy Between Spikes. In the experiment (see appendix B for details), the horizontal motion across the visual field is the input sensory stimulus s(t), drawn from a probability distribution P[s(t)]. The spike train recorded from H1 is the neural response. Figure 1a shows a segment of the stimulus presented to the fly, and Figure 1b illustrates the response to many repeated presentations of this segment. The histogram of spike times across the ensemble of repetitions provides an estimate of the spike rate r(t) (see Figure 1c), and equation 2.5 gives the information carried by a single spike, I(1 spike; s) = 1.53 ± 0.05 bits. Figure 2 illustrates the details of how the formula was used, with an emphasis on the effects of the finiteness of the data. In this experiment, a stimulus of length T = 10 s was repeated 360 times. As seen from Figure 2, the calculation converges to a stable result within our finite data set.

If each spike were to convey information independently, then with the mean spike rate r̄ = 37 spikes per second, the total information rate would be R_info = 56 bits per second. We used the variability and reproducibility of continuous segments in the neural response (de Ruyter van Steveninck et al., 1997; Strong et al., 1998) to estimate the total information rate in the spike train in this experiment and found that R_info = 75 bits per second. Thus, the information conveyed by the spike train as a whole is larger than the sum of contributions from individual spikes, indicating cooperative information transmission by patterns of spikes in time. This synergy among spikes motivates the search for especially informative patterns in the spike train.
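
Spelled out, the comparison behind this statement is

\bar r \times I(1\,\mathrm{spike}; s) = 37\ \mathrm{spikes/s} \times 1.53\ \mathrm{bits/spike} \approx 56\ \mathrm{bits/s}, \qquad 75 - 56 \approx 20\ \mathrm{bits/s},

so roughly 20 bits per second must be carried cooperatively by patterns of spikes rather than by spikes taken one at a time.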

We consider compound events that consist of two spikes separated by a time τ, with no constraints on what happens between them. Figure 1 shows segments of the event rate r_τ(t), for τ = 3 (±1) ms (see Figure 1d) and for τ = 17 (±1) ms (see Figure 1e). The information carried by spike pairs as a function of the interspike time τ, computed from equation 2.5, is shown in Figure 3. For large τ, spikes contribute independent information, as expected. This independence is established within approximately 30 to 40 ms, comparable to the behavioral response times of the fly (Land & Collett, 1974). There is a mild redundancy (about 10–20%) at intermediate separations and a very large synergy (up to about 130%) at small τ.

Figure 3: Information about the signal transmitted by pairs of spikes, computed from equation 2.5, as a function of the time separation between the two spikes. The dotted line shows the information that would be transmitted by the two spikes independently (twice the single-spike information).
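
In code, and continuing the earlier sketch (event_information and the per-trial spike times trials are reused from there; the event-marking convention, stamping each pair at the time of its later spike, is our assumption), such pair events can be collected and fed through the same information estimate:

    import numpy as np

    def pair_event_times(spike_times, tau, dtau):
        """Times of compound events: two spikes separated by tau (within
        +/- dtau/2), with no constraint on intervening spikes.  Each event
        is stamped at the time of the later spike of the pair."""
        s = np.asarray(spike_times)
        events = []
        for i, t2 in enumerate(s):
            separations = t2 - s[:i]
            if np.any(np.abs(separations - tau) <= dtau / 2):
                events.append(t2)
        return np.array(events)

    # Pair information as a function of the separation tau (compare Figure 3).
    for tau in (0.003, 0.017, 0.050):
        pair_trials = [pair_event_times(s, tau, 0.002) for s in trials]
        print(tau, event_information(pair_trials, T, 0.002), "bits per pair")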

Related results were obtained using the correlation of spike patterns with stimulus features (de Ruyter van Steveninck & Bialek, 1988). There the information carried by spike patterns was estimated from the distribution of stimuli given each pattern, thus constructing a statistical model of what the patterns "stand for" (see the details in appendix A). Since the time-dependent stimulus is in general of high dimensionality, its distribution cannot be sampled directly, and some approximations must be made: de Ruyter van Steveninck and Bialek (1988) made the approximation that patterns of a few spikes encode projections of the stimulus onto low-dimensional subspaces, and further that the conditional distributions P[s(t)|E] are gaussian. The information obtained with these approximations provides a lower bound to the true information carried by the patterns, as estimated directly with the methods presented here.


3.2 Origins of Synergy. Synergy means, quite literally, that two spikes together tell us more than two spikes separately. Synergistic coding is often discussed for populations of cells, where extra information is conveyed by patterns of coincident spikes from several neurons (Abeles, Bergmann, Margalit, & Vaadia, 1993; Hopfield, 1995; Meister, 1995; Singer & Gray, 1995). Here we see direct evidence for extra information in pairs of spikes across time. The mathematical framework for describing these effects is the same, and a natural question is: What are the conditions for synergistic coding?

The average synergy Syn[E_1, E_2; s] between two events E_1 and E_2 is the difference between the information about the stimulus s conveyed by the pair and the information conveyed by the two events independently:

\mathrm{Syn}[E_1, E_2; s] = I[E_1, E_2; s] - \big( I[E_1; s] + I[E_2; s] \big). \qquad (3.1)

We can rewrite the synergy as

\mathrm{Syn}[E_1, E_2; s] = I[E_1; E_2 \mid s] - I[E_1; E_2]. \qquad (3.2)

The first term is the mutual information between the events, computed across an ensemble of repeated presentations of the same stimulus history. It describes the gain in information due to the locking of the compound event (E_1, E_2) to particular stimulus features. If events E_1 and E_2 are correlated individually with the stimulus but not with one another, this term will be zero, and these events cannot be synergistic on average. The second term is the mutual information between events when the stimulus is not constrained, or, equivalently, the predictability of event E_2 from E_1. This predictability limits the capacity of E_2 to carry information beyond that already conveyed by E_1. Synergistic coding (Syn > 0) thus requires that the mutual information among the spikes is increased by specifying the stimulus, which makes precise the intuitive idea of "stimulus-dependent correlations" (Abeles et al., 1993; Hopfield, 1995; Meister, 1995; Singer & Gray, 1995).
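
The identity behind equation 3.2 can be checked numerically on any discrete joint distribution. The sketch below is a toy construction of ours (not data from the experiment): it draws a random joint distribution P(E_1, E_2, s) over binary events and a handful of stimulus states and confirms that the two expressions for the synergy agree.

    import numpy as np

    rng = np.random.default_rng(1)
    p = rng.random((2, 2, 4))          # joint distribution over (E1, E2, s)
    p /= p.sum()

    def H(q):
        q = q[q > 0]
        return -np.sum(q * np.log2(q))

    def I(pxy):
        """Mutual information of a two-dimensional joint distribution."""
        return H(pxy.sum(0)) + H(pxy.sum(1)) - H(pxy)

    # Left-hand side: I[E1,E2; s] - (I[E1; s] + I[E2; s]).
    syn = I(p.reshape(4, -1)) - I(p.sum(axis=1)) - I(p.sum(axis=0))

    # Right-hand side: I[E1; E2 | s] - I[E1; E2].
    p_s = p.sum(axis=(0, 1))
    I_cond = sum(p_s[k] * I(p[:, :, k] / p_s[k]) for k in range(p.shape[2]))
    I_uncond = I(p.sum(axis=2))
    print(np.allclose(syn, I_cond - I_uncond))       # True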

Returning to our experimental example, we identify the events E_1 and E_2 as the arrivals of two spikes and consider the synergy as a function of the time τ between them. In terms of event rates, we compute the information carried by a pair of spikes separated by a time τ, equation 2.5, as well as the information carried by two individual spikes. The difference between these two quantities is the synergy between two spikes, which can be written as

\mathrm{Syn}(\tau) = -\log_2\!\left( \frac{\bar r_\tau}{\bar r^2} \right)
+ \frac{1}{T}\int_0^T dt\, \frac{r_\tau(t)}{\bar r_\tau}\, \log_2\!\left[ \frac{r_\tau(t)}{r(t)\, r(t-\tau)} \right]
+ \frac{1}{T}\int_0^T dt\, \left[ \frac{r_\tau(t)}{\bar r_\tau} + \frac{r_\tau(t+\tau)}{\bar r_\tau} - \frac{2 r(t)}{\bar r} \right] \log_2[r(t)]. \qquad (3.3)

The first term in this equation is the logarithm of the normalized correlation function, and hence measures the rarity of spike pairs with separation τ; the average of this term over τ is the mutual information between events (the second term in equation 3.2). The second term is related to the local correlation function and measures the extent to which the stimulus modulates the likelihood of spike pairs. The average of this term over τ gives the mutual information conditional on knowledge of the stimulus (the first term in equation 3.2). The average of the third term over τ is zero, and numerical evaluation of this term from the data shows that it is negligible at most values of τ.
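
As a sketch of how these terms can be evaluated in practice, assume that r(t) and r_τ(t) have been estimated on a common time grid of step dt (for instance, with the histogram estimators above); the function and variable names are ours, and for simplicity the code assumes the rates are strictly positive over the evaluated range.

    import numpy as np

    def synergy_terms(r, r_tau, k):
        """The three terms of equation 3.3 for tau = k grid steps.
        r, r_tau: arrays holding r(t) and r_tau(t) on the same time grid."""
        rbar, rtbar = r.mean(), r_tau.mean()
        term1 = -np.log2(rtbar / rbar**2)
        i = np.arange(k, len(r) - k)             # keep t - tau and t + tau in range
        term2 = np.mean((r_tau[i] / rtbar) *
                        np.log2(r_tau[i] / (r[i] * r[i - k])))
        term3 = np.mean((r_tau[i] / rtbar + r_tau[i + k] / rtbar - 2 * r[i] / rbar) *
                        np.log2(r[i]))
        # Note: only the sum of the three terms is independent of the units
        # chosen for the rates, as in the text.
        return term1, term2, term3               # Syn(tau) is their sum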

We thus find that the synergy between spikes is approximately a sum of two terms whose averages over τ are the terms in equation 3.2. A spike pair with a separation τ then has two types of contributions to the extra information it carries: the two spikes can be correlated conditional on the stimulus, or the pair could be a rare and thus surprising event. The rarity of brief pairs is related to neural refractoriness, but this effect alone is insufficient to enhance information transmission; the rare events must also be related reliably to the stimulus. In fact, conditional on the stimulus, the spikes in rare pairs are strongly correlated with each other, and this is visible in Figure 1a: from trial to trial, adjacent spikes jitter together as if connected by a stiff spring. To quantify this effect, we find for each spike in one trial the closest spike in successive trials and measure the variance of the arrival times of these spikes. Similarly, we measure the variance of the interspike times. Figure 4a shows the ratio of the interspike time variance to the sum of the arrival time variances of the spikes that make up the pair. For large separations this ratio is unity, as expected if spikes are locked independently to the stimulus, but as the two spikes come closer it falls below one-quarter.
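
A minimal sketch of this variance comparison, assuming the matching step has already been done so that the two spikes of a given pair have been identified in each trial (the function name and data layout are ours):

    import numpy as np

    def jitter_variance_ratio(matched_pairs):
        """matched_pairs: array of shape (n_trials, 2) holding the arrival
        times of the first and second spike of the same pair in each trial.
        Returns the ratio plotted in Figure 4a: the variance of the interspike
        time divided by the sum of the two arrival-time variances."""
        pairs = np.asarray(matched_pairs)
        var_interval = np.var(pairs[:, 1] - pairs[:, 0])
        var_arrivals = np.var(pairs[:, 0]) + np.var(pairs[:, 1])
        return var_interval / var_arrivals

A ratio near 1 means the two spikes jitter independently from trial to trial; a ratio well below 1, as observed for brief pairs, means the pair moves as a unit while its internal interval stays nearly fixed.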

Both the conditional correlation among the members of the pair (see Figure 4a) and the relative synergy (see Figure 4b) depend strongly on the interspike separation. This dependence is nearly invariant to changes in image contrast, although the spike rate and other statistical properties are strongly affected by such changes. Brief spike pairs seem to retain their identity as specially informative symbols over a range of input ensembles. If particular temporal patterns are especially informative, then we would lose information if we failed to distinguish among different patterns. Thus there are two notions of time resolution for spike pairs: the time resolution with which the interspike time is defined and the absolute time resolution with which the event is marked. Figure 5 shows that for small interspike times, the information is much more sensitive to changes in the interspike time resolution (open symbols) than to the absolute time resolution (filled symbols). This is related to the slope in Figure 2: in regions where the slope is large, events should be finely distinguished in order to retain the information.

Figure 4: (a) Ratio between the variance of the interspike time and the sum of variances of the two spike times. Variances are measured across repeated presentations of the same stimulus, as explained in the text. This ratio is plotted as a function of the interspike time τ for two experiments with different image contrast. (b) Extra information conveyed cooperatively by pairs of spikes, expressed as a fraction of the information conveyed by the two spikes independently. While the single-spike information varies with contrast (1.5 bits per spike for C = 0.1, compared to 1.3 bits per spike for C = 1), the fractional synergy is almost contrast independent.

Figure 5: Information conveyed by spike pairs as a function of time resolution. An event (a pair of spikes) can be described by two times: the separation between spikes (relative time) and the occurrence time of the event with respect to the stimulus (absolute time). The information carried by the pair depends on the time resolution in these two dimensions, both specified by the bin size Δt along the abscissa. Open symbols are measurements of the information for a fixed 2 ms resolution of absolute time and a variable resolution Δt of relative time. Closed symbols correspond to a fixed 2 ms resolution of relative time and a variable resolution Δt of absolute time. For short intervals, the sensitivity to coarsening of the relative time resolution is much greater than to coarsening of the absolute time resolution. In contrast, sensitivity to relative and absolute time resolutions is the same for the longer, nonsynergistic interspike separations.

3.3 Implications of Synergy. The importance of spike timing in the neural code has been under debate for some time now. We believe that some issues in this debate can be clarified using a direct information-theoretic approach. Following MacKay and McCulloch (1952), we know that marking spike arrival times with higher resolution provides an increased capacity for information transmission. The work of Strong et al. (1998) shows that for the fly's H1 neuron, the increased capacity associated with spike timing indeed is used with nearly constant efficiency down to millisecond resolution. This efficiency can be the result of a tight locking of individual spikes to a rapidly varying stimulus, and it could also be the result of temporal patterns providing information beyond rapid rate modulations. The analysis given here shows that for H1, pairs of spikes can provide much more information than two individual spikes, and information transmission is much more sensitive to the relative timing of spikes than to their absolute timing. The synergy is an inherent property of particular spike pairs, as it persists when averaged over all occurrences of the pair in the experiment.

One may ask about the implications of synergy for behavior. Flies can respond to changes in a motion stimulus over a time scale of approximately 30 to 40 ms (Land & Collett, 1974). We have found that the spike train of H1 as a whole conveyed about 20 bits per second more than the information conveyed independently by single spikes (see Section 3.1). This implies that over a behavioral response time, synergistic effects provide about 0.8 bit of additional information, equivalent to almost a factor of two in resolving power for distinguishing different trajectories of motion across the visual field.
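
The arithmetic behind this estimate is

20\ \mathrm{bits/s} \times 0.04\ \mathrm{s} \approx 0.8\ \mathrm{bit}, \qquad 2^{0.8} \approx 1.7,

so within a single behavioral response time, the synergistic contribution nearly doubles the number of distinguishable motion trajectories.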


4 Summary

Information theory allows us to measure synergy and redundancy among spikes, independent of the rules for translating between spikes and stimuli. In particular, this approach tests directly the idea that patterns of spikes are special events in the code, carrying more information than expected by adding the contributions from individual spikes. It is of practical importance that the formulas rely on low-order statistical measures of the neural response and hence do not require enormous data sets to reach meaningful conclusions. The method is of general validity and is applicable to patterns of spikes across a population of neurons as well as across time. Specific results (existence of synergy or redundancy, number of bits per spike, etc.) depend on the neural system as well as on the stimulus ensemble used in each experiment.

In our experiments on the fly visual system, we found that an event composed of a pair of spikes can carry far more than the information carried independently by its parts. Two spikes that occur in rapid succession appear to be special symbols that have an integrity beyond the locking of individual spikes to the stimulus. This is analogous to the encoding of sounds in written English: each of the compound symbols th, sh, and ch stands for sounds that are not decomposable into sounds represented by each of the constituent letters. For spike pairs to act effectively as special symbols, mechanisms for "reading" them must exist at subsequent levels of processing. Synaptic transmission is sensitive to interspike times in the 2–20 ms range (Magleby, 1987), and it is natural to suggest that synaptic mechanisms on this timescale play a role in such reading. Recent work on the mammalian visual system (Usrey, Reppas, & Reid, 1998) provides direct evidence that pairs of spikes close together in time can be especially efficient in driving postsynaptic neurons.

Appendix A: Relation to Previous Work

Patterns of spikes and their relation to sensory stimuli have been quantified in the past using correlation functions. The event rates that we have defined here, which are connected directly to the information carried by patterns of spikes through equation 2.5, are just properly normalized correlation functions. The event rate for pairs of spikes from two separate neurons is related to the joint poststimulus time histogram (PSTH) defined by Aertsen, Gerstein, Habib, and Palm (1989; Vaadia et al., 1995). Making this connection explicit is also an opportunity to see how the present formalism applies to events defined across two cells.

Consider two cells, A and B, generating spikes at times {t_i^A} and {t_i^B}, respectively. It will be useful to think of the spike trains as sums of unit impulses at the spike times:

\rho_A(t) = \sum_i \delta(t - t_i^A), \qquad (A.1)

\rho_B(t) = \sum_i \delta(t - t_i^B). \qquad (A.2)

Then the time-dependent spike rates for the two cells are

r_A(t) = \langle \rho_A(t) \rangle_{\mathrm{trials}}, \qquad (A.3)

r_B(t) = \langle \rho_B(t) \rangle_{\mathrm{trials}}, \qquad (A.4)

where \langle \cdots \rangle_{\mathrm{trials}} denotes an average over multiple trials in which the same time-dependent stimulus s(t') is presented or the same motor behavior is observed. These spike rates are the probabilities per unit time for the occurrence of a single spike in either cell A or cell B, also called the PSTH. We can define the probability per unit time for a spike in cell A to occur at time t and a spike in cell B to occur at time t', and this will be the joint PSTH,

\mathrm{JPSTH}_{AB}(t, t') = \langle \rho_A(t)\, \rho_B(t') \rangle_{\mathrm{trials}}. \qquad (A.5)

Alternatively, we can consider an event E defined by a spike in cell A at time t and a spike in cell B at time t − τ, with the relative time τ measured with a precision of Δt. Then the rate of these events is

r_E(t) = \int_{-\Delta t/2}^{\Delta t/2} dt'\, \mathrm{JPSTH}_{AB}(t, t - \tau + t') \qquad (A.6)

\approx \Delta t\, \mathrm{JPSTH}_{AB}(t, t - \tau), \qquad (A.7)

where the last approximation is valid if our time resolution is sufficiently high. Applying our general formula for the information carried by single events, equation 2.5, the information carried by pairs of spikes from two cells can be written as an integral over diagonal "strips" of the JPSTH matrix:

I(E; s) = \frac{1}{T}\int_0^T dt\, \frac{\mathrm{JPSTH}_{AB}(t, t-\tau)}{\langle \mathrm{JPSTH}_{AB}(t, t-\tau) \rangle_t}\, \log_2\!\left[ \frac{\mathrm{JPSTH}_{AB}(t, t-\tau)}{\langle \mathrm{JPSTH}_{AB}(t, t-\tau) \rangle_t} \right], \qquad (A.8)

where \langle \mathrm{JPSTH}_{AB}(t, t-\tau) \rangle_t is an average of the JPSTH over time; this average is equivalent to the standard correlation function between the two spike trains.
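
A sketch of this two-cell computation, under the same conventions as the earlier code (our own function names; spike times are given per trial for each cell):

    import numpy as np

    def jpsth(trials_A, trials_B, T, dt):
        """Joint PSTH of two cells, equation A.5: trial average of the outer
        product of the binned spike trains, in units of (events per second)^2."""
        edges = np.arange(0.0, T + dt, dt)
        acc = np.zeros((len(edges) - 1, len(edges) - 1))
        for a, b in zip(trials_A, trials_B):
            rho_a = np.histogram(a, bins=edges)[0] / dt
            rho_b = np.histogram(b, bins=edges)[0] / dt
            acc += np.outer(rho_a, rho_b)
        return acc / len(trials_A)

    def pair_information_from_jpsth(J, lag_bins):
        """Equation A.8: information carried by the event 'spike in A at t,
        spike in B at t - tau', read off one diagonal strip of the JPSTH."""
        strip = np.diagonal(J, offset=-lag_bins)        # J[t, t - tau]
        ratio = strip / strip.mean()
        out = np.zeros_like(ratio)
        nz = ratio > 0
        out[nz] = ratio[nz] * np.log2(ratio[nz])
        return out.mean()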

The discussion by Vaadia et al. (1995) emphasizes that modulations of the JPSTH along the diagonal strips allow correlated firing events to convey information about sensory signals or behavioral states, and this information is quantified by equation A.8. The information carried by the individual cells is related to the corresponding integrals over spike rates, equation 2.9. The difference between the information conveyed by the compound spiking events E and the information conveyed by spikes in the two cells independently is precisely the synergy between the two cells at the given time lag τ. For τ = 0, it is the synergy, or extra information, conveyed by synchronous firing of the two cells.

We would like to connect this approach with previous work that focused on how events reduce our uncertainty about the stimulus (de Ruyter van Steveninck & Bialek, 1988). Before we observe the neural response, all we know is that stimuli are chosen from a distribution P[s(t')]. When we observe an event E at time t_E, this should tell us something about the stimulus in the neighborhood of this time, and this knowledge is described by the conditional distribution P[s(t')|t_E]. If we go back to the definition of the mutual information between responses and stimuli, we can write the average information conveyed by one event in terms of this conditional distribution:

I(E; s) = \int Ds(t') \int dt_E\, P[s(t'), t_E]\, \log_2\!\left( \frac{P[s(t'), t_E]}{P[s(t')]\, P[t_E]} \right)
        = \int dt_E\, P[t_E] \int Ds(t')\, P[s(t') \mid t_E]\, \log_2\!\left( \frac{P[s(t') \mid t_E]}{P[s(t')]} \right). \qquad (A.9)

If the system is stationary, then the coding should be invariant under time translations:

P[s(t') \mid t_E] = P[s(t' + \Delta t') \mid t_E + \Delta t']. \qquad (A.10)

This invariance means that the integral over stimuli in equation A.9 is independent of the event arrival time t_E, so we can simplify our expression for the information carried by a single event:

I(E; s) = \int dt_E\, P[t_E] \int Ds(t')\, P[s(t') \mid t_E]\, \log_2\!\left( \frac{P[s(t') \mid t_E]}{P[s(t')]} \right)
        = \int Ds(t')\, P[s(t') \mid t_E]\, \log_2\!\left( \frac{P[s(t') \mid t_E]}{P[s(t')]} \right). \qquad (A.11)

This formula was used by de Ruyter van Steveninck and Bialek (1988). To connect with our work here, we express the information in equation A.11 as an average over the stimulus,

I(E; s) = \left\langle \frac{P[s(t') \mid t_E]}{P[s(t')]}\, \log_2\!\left( \frac{P[s(t') \mid t_E]}{P[s(t')]} \right) \right\rangle_s. \qquad (A.12)

Using Bayes' rule,

\frac{P[s(t') \mid t_E]}{P[s(t')]} = \frac{P[t_E \mid s(t')]}{P[t_E]} = \frac{r_E(t_E)}{\bar r_E}, \qquad (A.13)


where the last equality is a result of the distribution of event arrival times being proportional to the event rate, as defined above. Substituting this back into equation A.11, one finds equation 2.6.

Appendix B: Experimental Setup

In the experiment we used a female blowfly, a first-generation offspring of a wild fly caught outside. The fly was put inside a plastic tube and immobilized with wax, with the head protruding out. The proboscis was left free, so that the fly could be fed regularly with some sugar water. A small hole was cut in the back of the head, close to the midline on the right side. Through this hole, a tungsten electrode was advanced into the lobula plate. This area, which is several layers back from the compound eye, includes a group of large motion-detector neurons with wide receptive fields and strong direction selectivity. We recorded spikes extracellularly from one of these, the contralateral H1 neuron (Franceschini, Riehle, & le Nestour, 1989; Hausen & Egelhaaf, 1989). The electrode was positioned such that spikes from H1 could be discriminated reliably and converted into pulses by a simple threshold discriminator. The pulses were fed into a CED 1401 interface (Cambridge Electronic Design), which digitized the spikes at 10 μs resolution. To keep exact synchrony over the duration of the experiment, the spike timing clock was derived from the same internal CED 1401 clock that defined the frame times of the visual stimulus.

The stimulus was a rigidly moving bar pattern, displayed on a Tektronix 608 high-brightness display. The radiance at average intensity Ī was about 180 mW/(m² × sr), which amounts to about 5 × 10⁴ effectively transduced photons per photoreceptor per second (Dubs, Laughlin, & Srinivasan, 1984). The bars were oriented vertically, with intensities chosen at random to be Ī(1 ± C), where C is the contrast. The distance between the fly and the screen was adjusted so that the angular subtense of a bar equaled the horizontal interommatidial angle in the stimulated part of the compound eye. This setting was found by determining the eye's spatial Nyquist frequency through the reverse reaction (Götz, 1964) in the response of the H1 neuron. For this fly, the horizontal interommatidial angle was 1.45 degrees, and the distance to the screen was 105 mm. The fly viewed the display through a round 80 mm diameter diaphragm, showing approximately 30 bars. From this we estimate the number of stimulated ommatidia in the eye's hexagonal raster to be about 612.

Frames of the stimulus pattern were refreshed every 2 ms, and with each new frame the pattern was displayed at a new position. This resulted in an apparent horizontal motion of the bar pattern, which is suitable to excite the H1 neuron. The pattern position was defined by a pseudorandom sequence, simulating independent random displacements at each time step, uniformly distributed between −0.47 degrees and +0.47 degrees (equivalent to −0.32 to +0.32 omm, horizontal ommatidial spacings). This corresponds to a diffusion constant of 18.1 (°)² per second, or 8.6 omm² per second. The sequence of pseudorandom numbers contained a repeating part and a nonrepeating part, each 10 seconds long, with the same statistical parameters. Thus, in each 20 second cycle, the fly saw a 10 second movie that it had seen 20 seconds before, followed by a 10 second movie that was generated independently.
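
A sketch of this stimulus sequence (our reconstruction for illustration; the variable names and random seed are assumptions) together with a check of the quoted diffusion constant:

    import numpy as np

    rng = np.random.default_rng(2)
    dt_frame = 0.002                          # one new frame every 2 ms
    duration = 10.0                           # length of the repeated segment, s
    steps = rng.uniform(-0.47, 0.47, size=int(duration / dt_frame))   # degrees
    position = np.cumsum(steps)               # bar-pattern position, degrees

    # For a random walk <x^2> = 2 D t, so D = var(step) / (2 * dt_frame);
    # this should come out near the 18 (deg)^2 per second quoted above.
    D = steps.var() / (2 * dt_frame)
    print(D)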

Acknowledgments

We thank G. Lewen and A. Schweitzer for their help with the experiments and N. Tishby for many helpful discussions. Work at the IAS was supported in part by DOE grant DE–FG02–90ER40542, and work at the IFSC was supported by the Brazilian agencies FAPESP and CNPq.

References

Abeles, M., Bergmann, H., Margalit, E., & Vaadia, E. (1993). Spatiotemporal firing patterns in the frontal cortex of behaving monkeys. J. Neurophysiol., 70, 1629–1638.

Adrian, E. (1928). The basis of sensation. London: Christophers.

Aertsen, A., Gerstein, G., Habib, M., & Palm, G. (1989). Dynamics of neuronal firing correlation: Modulation of "effective connectivity." J. Neurophysiol., 61(5), 900–917.

Bialek, W. (1990). Theoretical physics meets experimental neurobiology. In E. Jen (Ed.), 1989 Lectures in Complex Systems (pp. 513–595). Menlo Park, CA: Addison-Wesley.

de Ruyter van Steveninck, R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Coding and information transfer in short spike sequences. Proc. R. Soc. Lond. Ser. B, 234, 379.

de Ruyter van Steveninck, R. R., Lewen, G., Strong, S., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805.

DeWeese, M. (1996). Optimization principles for the neural code. Network, 7, 325–331.

DeWeese, M., & Meister, M. (1998). How to measure information gained from one symbol. APS Abstracts.

Dubs, A., Laughlin, S., & Srinivasan, M. (1984). Single photon signals in fly photoreceptors and first order interneurones at behavioural threshold. J. Physiol., 317, 317–334.

Franceschini, N., Riehle, A., & le Nestour, A. (1989). Directionally selective motion detection by insect neurons. In D. Stavenga & R. Hardie (Eds.), Facets of Vision (p. 360). Berlin: Springer-Verlag.

Götz, K. (1964). Optomotorische Untersuchung des visuellen Systems einiger Augenmutanten der Fruchtfliege Drosophila. Kybernetik, 2, 77–92.

Hausen, K., & Egelhaaf, M. (1989). Neural mechanisms of visual course control in insects. In D. Stavenga & R. Hardie (Eds.), Facets of Vision (p. 391). Berlin: Springer-Verlag.

Hopfield, J. J. (1995). Pattern recognition computation using action potential timing for stimulus representation. Nature, 376, 33–36.

Land, M. F., & Collett, T. S. (1974). Chasing behavior of houseflies (Fannia canicularis): A description and analysis. J. Comp. Physiol., 89, 331–357.

MacKay, D., & McCulloch, W. S. (1952). The limiting information capacity of a neuronal link. Bull. Math. Biophys., 14, 127–135.

Magleby, K. (1987). Short-term synaptic plasticity. In G. M. Edelman, V. E. Gall, & K. M. Cowan (Eds.), Synaptic function (pp. 21–56). New York: Wiley.

Meister, M. (1995). Multineuronal codes in retinal signaling. Proc. Nat. Acad. Sci. (USA), 93, 609–614.

Optican, L. M., & Richmond, B. J. (1987). Temporal encoding of two-dimensional patterns by single units in primate inferior temporal cortex. III. Information theoretic analysis. J. Neurophys., 57, 162–178.

Panzeri, S., Biella, G., Rolls, E. T., Skaggs, W. E., & Treves, A. (1996). Speed, noise, information and the graded nature of neuronal responses. Network, 7, 365–370.

Rieke, F., Warland, D., de Ruyter van Steveninck, R., & Bialek, W. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.

Shannon, C. E. (1948). A mathematical theory of communication. Bell Sys. Tech. J., 27, 379–423, 623–656.

Singer, W., & Gray, C. M. (1995). Visual feature integration and the temporal correlation hypothesis. Ann. Rev. Neurosci., 555–586.

Skaggs, W. E., McNaughton, B. L., Gothard, K. M., & Markus, E. J. (1993). An information-theoretic approach to deciphering the hippocampal code. In S. J. Hanson, J. D. Cowan, & C. L. Giles (Eds.), Advances in neural information processing 5 (pp. 1030–1037). San Mateo, CA: Morgan Kaufmann.

Strong, S. P., Koberle, R., de Ruyter van Steveninck, R. R., & Bialek, W. (1998). Entropy and information in neural spike trains. Phys. Rev. Lett., 80, 197–200.

Usrey, W. M., Reppas, J. B., & Reid, R. C. (1998). Paired-spike interactions and synaptic efficiency of retinal inputs to the thalamus. Nature, 395, 384.

Vaadia, E., Haalman, I., Abeles, M., Bergman, H., Prut, Y., Slovin, H., & Aertsen, A. (1995). Dynamics of neural interactions in monkey cortex in relation to behavioural events. Nature, 373, 515–518.

Received February 22, 1999; accepted May 20, 1999.

Page 10: Synergy in a Neural Code - Princeton Universitywbialek/our_papers/brenner+al_00b.pdfSynergy in a Neural Code 1533 of measuring the information (in bits) carried by particular patterns

1540 Brenner Strong Koberle Bialek and de Ruyter van Steveninck

description of the ldquoone-bodyrdquo statistics of the spike train in the same waythat the single particle density describes the one-body statistics of a gasor liquid This is true independent of any models or assumptions and wesuggest the use of equations 29 and 25 to test directly in any given experi-mental situation whether many body effects increase or reduce informationtransmission

We note that considerable attention has been given to the problem ofmaking accurate statistical models for spike trains In particular the sim-plest modelmdashthe modulated Poisson processmdashis often used as a standardagainst which real neurons are compared It is tempting to think that Pois-son behavior is equivalent to independent transmission of information bysuccessive spikes but this is not the case (see below) Of course no real neu-ron is precisely Poisson and it is not clear which deviations from Poissonbehavior are signicant Rather than trying to t statistical models to thedata and then to try to understand the signicance of spike train statistics forneural coding we see that it is possible to quantify directly the informationcarried by single spikes and compound patterns Thus without referenceto statistical models we can determine whether successive spikes carry in-dependent redundant or synergistic information

Several previous works have noted the relation between spike rate andinformation and the formula in equation 29 has an interesting historyFor a spike train generated by a modulated Poisson process equation 29provides an upper bound on information transmission (per spike) by thespike train as a whole (Bialek 1990) Even in this simple case spikes do notnecessarily carry independent information slow modulations in the ringrate can cause them to be redundant In studying the coding of location bycells in the rat hippocampus Skaggs McNaughton Gothard and Markus(1993) assumed that successive spikes carried independent informationand that the spike rate was determined by the instantaneous location Theyobtained equation 29 with the time average replaced by an average overlocations DeWeese (1996) showed that the rate of information transmissionby a spike train could be expanded in a series of integrals over correlationfunctions where successive terms would be small if the number of spikesper correlation time were small the leading term which would be exact ifspikes were uncorrelated is equation 29 Panzeri Biella Rolls Skaggs andTreves (1996) show that equation 29 multiplied by the mean spike rate togive an information rate (bits per second) is the correct information rateif we count spikes in very brief segments of the neural response which isequivalent to asking for the information carried by single spikes For furtherdiscussion of the relation to previous work see appendix A

A crucial point here is the generalization to equation 25 and this re-sult applies to the information content of any point events in the neuralresponsemdashpairs of spikes with a certain separation coincident spikes fromtwo cells and so onmdashnot just single spikes In the analysis of experimentswe will emphasize the use of this formula as an exact result for the infor-

Synergy in a Neural Code 1541

mation content of single events rather than an approximate result for thespike train as a whole and this approach will enable us to address questionsconcerning the structure of the code and the role played by various typesof events

3 Experiments in the Fly Visual System

In this section we use our formalism to analyze experiments on the move-ment-sensitive cell H1 in the visual system of the blowy Calliphora vicinaWe address the issue of the information conveyed by pairs of spikes in thisneuron as compared to the information conveyed independently by sin-gle spikes The quantitative results of this sectionmdashinformation content inbits effects of synergy and redundancy among spikesmdashare specic to thissystem and to the stimulus conditions used The theoretical approach how-ever is valid generally and can be applied similarly to other experimentalsystems to nd out the signicance of various patterns in single cells or ina population

31 Synergy Between Spikes In the experiment (see appendix B fordetails) the horizontal motion across the visual eld is the input sensorystimulus s(t) drawn from a probability distribution P[s(t)] The spike trainrecorded from H1 is the neural response Figure 1a shows a segment of thestimulus presented to the y and Figure 1b illustrates the response to manyrepeated presentations of this segment The histogram of spike times acrossthe ensemble of repetitions provides an estimate of the spike rate r(t) (seeFigure 1c) and equation 25 gives the information carried by a single spikeI(1 spikeI s) D 153 sect 005 bits Figure 2 illustrates the details of how theformula was used with an emphasis on the effects of niteness of the dataIn this experiment a stimulus of length T D 10 sec was repeated 360 timesAs seen from Figure 2 the calculation converges to a stable result withinour nite data set

If each spike were to convey information independently then with themean spike rate Nr D 37 spikes per second the total information rate wouldbe Rinfo D 56 bits per second We used the variability and reproducibilityof continuous segments in the neural response (de Ruyter van Stevenincket al 1997 Strong et al 1998) to estimate the total information rate in thespike train in this experiment and found that Rinfo D 75 bits per secondThus the information conveyed by the spike train as a whole is larger thanthe sum of contributions from individual spikes indicating cooperativeinformation transmission by patterns of spikes in time This synergy amongspikes motivates the search for especially informative patterns in the spiketrain

We consider compound events that consist of two spikes separated bya time t with no constraints on what happens between them Figure 1shows segments of the event rate rt (t) for t D 3 (sect1) ms (see Figure 1d)

1542 Brenner Strong Koberle Bialek and de Ruyter van Steveninck

Figure 3 Information about the signal transmitted by pairs of spikes computedfrom equation 25 as a function of the time separation between the two spikesThe dotted line shows the information that would be transmitted by the twospikes independently (twice the single-spike information)

and for t D 17 (sect1) ms (see Figure 1e) The information carried by spikepairs as a function of the interspike time t computed from equation 25 isshown in Figure 3 For large t spikes contribute independent informationas expected This independence is established within approximately 30 to40 ms comparable to the behavioral response times of the y (Land ampCollett 1974) There is a mild redundancy (about 10ndash20) at intermediateseparations and a very large synergy (up to about 130) at small t

Related results were obtained using the correlation of spike patterns withstimulus features (de Ruyter van Steveninck amp Bialek 1988) There the infor-mation carried by spikepatterns was estimated from the distribution of stim-uli given each pattern thus constructing a statistical model of what the pat-terns ldquostand forrdquo (see the details in appendix A) Since the time-dependentstimulus is in general of high dimensionality its distribution cannot besampled directly and some approximations must be made de Ruyter vanSteveninck and Bialek (1988) made the approximation that patterns of a fewspikes encode projections of the stimulus onto low-dimensional subspacesand further that the conditional distributions P[s(t)|E] are gaussian The in-formation obtained with these approximations provides a lower bound tothe true information carried by the patterns as estimated directly with themethods presented here

Synergy in a Neural Code 1543

32 Origins of Synergy Synergy means quite literally that two spikestogether tell us more than two spikes separately Synergistic coding is oftendiscussed for populations of cells where extra information is conveyedby patterns of coincident spikes from several neurons (Abeles BergmannMargalit amp Vaadia 1993 Hopeld 1995 Meister 1995 Singer amp Gray 1995)Here we see direct evidence for extra information in pairs of spikes acrosstime The mathematical framework for describing these effects is the sameand a natural question is What are the conditions for synergistic coding

The average synergy Syn[E1 E2I s] between two events E1 and E2 is thedifference between the information about the stimulus s conveyed by thepair and the information conveyed by the two events independently

Syn[E1 E2I s] D I[E1 E2I s] iexcl (I[E1I s] C I[E2I s]) (31)

We can rewrite the synergy as

Syn[E1 E2I s] D I[E1I E2 |s] iexcl I[E1I E2] (32)

The rst term is the mutual information between the events computed acrossan ensemble of repeated presentations of the same stimulus history It de-scribes the gain in information due to the locking of compound event (E1 E2)to particular stimulus features If events E1 and E2 are correlated individ-ually with the stimulus but not with one another this term will be zeroand these events cannot be synergistic on average The second term is themutual information between events when the stimulus is not constrainedor equivalently the predictability of event E2 from E1 This predictabilitylimits the capacity of E2 to carry information beyond that already conveyedby E1 Synergistic coding (Syn gt 0) thus requires that the mutual informa-tion among the spikes is increased by specifying the stimulus which makesprecise the intuitive idea of ldquostimulus-dependent correlationsrdquo (Abeles etal 1993 Hopeld 1995 Meister 1995 Singer amp Gray 1995)

Returning to our experimental example we identify the events E1 and E2as the arrivals of two spikes and consider the synergy as a function of thetime t between them In terms of event rates we compute the informationcarried by a pair of spikes separated by a time t equation 25 as well as theinformation carried by two individual spikes The difference between thesetwo quantities is the synergy between two spikes which can be written as

Syn(t ) D iexcl log2

iexclNrt

Nr2

centC

1T

Z T

0dt

rt (t)Nrt

log2

micro rt (t)r(t)r(t iexcl t )

para

C1T

Z T

0dt

micro rt (t)

Nrt

Crt (t C t )

Nrt

iexcl 2r(t)

Nr

paralog2[r(t)] (33)

The rst term in this equation is the logarithm of the normalized correlationfunction and hence measures the rarity of spike pairs with separation t

1544 Brenner Strong Koberle Bialek and de Ruyter van Steveninck

the average of this term over t is the mutual information between events(the second term in equation 32) The second term is related to the localcorrelation function and measures the extent to which the stimulus mod-ulates the likelihood of spike pairs The average of this term over t givesthe mutual information conditional on knowledge of the stimulus (the rstterm in equation 32) The average of the third term over t is zero and nu-merical evaluation of this term from the data shows that it is negligible atmost values of t

We thus nd that the synergy between spikes is approximately a sumof two terms whose averages over t are the terms in equation 32 A spikepair with a separation t then has two types of contributions to the extrainformation it carries the two spikes can be correlated conditional on thestimulus or the pair could be a rare and thus surprising event The rarityof brief pairs is related to neural refractoriness but this effect alone is in-sufcient to enhance information transmission the rare events must alsobe related reliably to the stimulus In fact conditional on the stimulus thespikes in rare pairs are strongly correlated with each other and this is visiblein Figure 1a from trial to trial adjacent spikes jitter together as if connectedby a stiff spring To quantify this effect we nd for each spike in one trialthe closest spike in successive trials and measure the variance of the arrivaltimes of these spikes Similarly we measure the variance of the interspiketimes Figure 4a shows the ratio of the interspike time variance to the sum ofthe arrival time variances of the spikes that make up the pair For large sep-arations this ratio is unity as expected if spikes are locked independentlyto the stimulus but as the two spikes come closer it falls below one-quarter

Both the conditional correlation among the members of the pair (see Fig-ure 4a) and the relative synergy (see Figure 4b) depend strongly on the inter-spike separation This dependence is nearly invariant to changes in imagecontrast although the spike rate and other statistical properties are stronglyaffected by such changes Brief spike pairs seem to retain their identity asspecially informative symbols over a range of input ensembles If particulartemporal patterns are especially informative then we would lose informa-tion if we failed to distinguish among different patterns Thus there are twonotions of time resolution for spike pairs the time resolution with which theinterspike time is dened and the absolute time resolution with which theevent is marked Figure 5 shows that for small interspike times the infor-mation is much more sensitive to changes in the interspike time resolution(open symbols) than to the absolute time resolution (lled symbols) This isrelated to the slope in Figure 2 in regions where the slope is large eventsshould be nely distinguished in order to retain the information

33 Implications of Synergy The importance of spike timing in the neu-ral code has been under debate for some time now We believe that someissues in this debate can be claried using a direct information-theoretic ap-proach Following MacKay and McCulloch (1952) we know that marking

Synergy in a Neural Code 1545

Figure 4 (a) Ratio between the variance of interspike time and the sum of vari-ances of the two spike times Variances are measured across repeated presenta-tions of same stimulus as explained in the text This ratio is plotted as a func-tion of the interspike time t for two experiments with different image contrast(b) Extra information conveyed cooperatively by pairs of spikes expressed as afraction of the information conveyed by the two spikes independently While thesingle-spike information varies with contrast (15 bits per spike for C D 01 com-pared to 13 bits per spike for C D 1) the fractional synergy is almost contrastindependent

spike arrival times with higher resolution provides an increased capacityfor information transmission The work of Strong et al (1998) shows thatfor the yrsquos H1 neuron the increased capacity associated with spike timingindeed is used with nearly constant efciency down to millisecond resolu-tion This efciency can be the result of a tight locking of individual spikesto a rapidly varying stimulus and it could also be the result of temporalpatterns providing information beyond rapid rate modulations The anal-

1546 Brenner Strong Koberle Bialek and de Ruyter van Steveninck

Figure 5 Information conveyed by spike pairs as a function of time resolutionAn eventmdashpair of spikesmdashcan be described by two times the separation be-tween spikes (relative time) and the occurrence time of the event with respectto the stimulus (absolute time) The information carried by the pair dependson the time resolution in these two dimensions both specied by the bin sizeD t along the abscissa Open symbols are measurements of the information fora xed 2 ms resolution of absolute time and a variable resolution D t of relativetime Closed symbols correspond to a xed 2 ms resolution of relative time anda variable resolution D t of absolute time For short intervals the sensitivity tocoarsening of the relative time resolution is much greater than to coarsening ofthe absolute time resolution In contrast sensitivity to relative and absolute timeresolutions is the same for the longer nonsynergistic interspike separations

ysis given here shows that for H1 pairs of spikes can provide much moreinformation than two individual spikes and information transmission ismuch more sensitive to the relative timing of spikes than to their absolutetiming The synergy is an inherent property of particular spike pairs as itpersists when averaged over all occurrences of the pair in the experiment

One may ask about the implication of synergy to behavior Flies canrespond to changes in a motion stimulus over a time scale of approximately30 to 40 ms (Land amp Collett 1974) We have found that the spike train of H1as a whole conveyed about 20 bits per second more than the informationconveyed independently by single spikes (see Section 31) This implies thatover a behavioral response time synergistic effects provide about 08 bitof additional information equivalent to almost a factor of two in resolvingpower for distinguishing different trajectories of motion across the visualeld

Synergy in a Neural Code 1547

4 Summary

Information theory allows us to measure synergy and redundancy amongspikes independent of the rules for translating between spikes and stimuliIn particular this approach tests directly the idea that patterns of spikesare special events in the code carrying more information than expected byadding the contributions from individual spikes It is of practical impor-tance that the formulas rely on low-order statistical measures of the neuralresponse and hence do not require enormous data sets to reach meaningfulconclusions The method is of general validity and is applicable to patternsof spikes across a population of neurons as well as across time Specicresults (existence of synergy or redundancy number of bits per spike etc)depend on the neural system as well as on the stimulus ensemble used ineach experiment

In our experiments on the y visual system we found that an eventcomposed of a pair of spikes can carry far more than the information carriedindependently by its parts Two spikes that occur in rapid succession appearto be special symbols that have an integrity beyond the locking of individualspikes to the stimulus This is analogous to the encoding of sounds in writtenEnglish each of the compound symbols th sh and ch stands for sounds thatare not decomposable into sounds represented by each of the constituentletters For spike pairs to act effectively as special symbols mechanismsfor ldquoreadingrdquo them must exist at subsequent levels of processing Synaptictransmission is sensitive to interspike times in the 2ndash20 ms range (Magelby1987) and it is natural to suggest that synaptic mechanisms on this timescaleplay a role in such reading Recent work on the mammalian visual system(Usrey Reppas amp Reid 1998) provides direct evidence that pairs of spikesclose together in time can be especially efcient in driving postsynapticneurons

Appendix A Relation to Previous Work

Patterns of spikes and their relation to sensory stimuli have been quantiedin the past using correlation functions The event rates that we have denedhere which are connected directly to the information carried by patternsof spikes through equation 25 are just properly normalized correlationfunctions The event rate for pairs of spikes from two separate neurons isrelated to the joint poststimulus time histogram (PSTH)dened by AertsenGerstein Habib and Palm (1989 Vaadia et al 1995) Making this connectionexplicit is also an opportunity to see how the present formalism applies toevents dened across two cells

Consider two cells A and B generating spikes at times ftAi g and ftB

i grespectively It will be useful to think of the spike trains as sums of unit

1548 Brenner Strong Koberle Bialek and de Ruyter van Steveninck

impulses at the spike times

rA(t) D

X

id(t iexcl tA

i ) (A1)

rB(t) D

X

id(t iexcl tB

i ) (A2)

Then the time-dependent spike rates for the two cells are

rA(t) D hr A(t)itrials (A3)

rB(t) D hr B(t)itrials (A4)

where hcent cent centitrials denotes an average over multiple trials in which the sametime-dependent stimulus s(t0 ) or the same motor behavior observed Thesespike rates are the probabilities per unit time for the occurrence of a singlespike in either cell A or cell B also called the PSTH We can dene theprobability per unit time for a spike in cell A to occur at time t and a spikein cell B to occur at time t0 and this will be the joint PSTH

JPSTHAB(t t0 ) D hr A(t)r B(t0 )itrials (A5)

Alternatively we can consider an event E dened by a spike in cell A attime t and a spike in cell B at time t iexcl t with the relative time t measuredwith a precision of D t Then the rate of these events is

rE(t) DZ D t 2

iexclD t 2dt0 JPSTHAB(t t iexcl t C t0 ) (A6)

frac14 D tJPSTHAB(t t iexcl t ) (A7)

where the last approximation is valid if our time resolution is sufcientlyhigh Applying our general formula for the information carried by singleevents equation 25 the information carried by pairs of spikes from two cellscan be written as an integral over diagonal ldquostripsrdquo of the JPSTH matrix

I(EIs) D1T

Z T

0dt

JPSTHAB(t t iexcl t )

hJPSTHAB(t t iexcl t )itlog2

JPSTHAB(t t iexcl t )

hJPSTHAB(t t iexcl t )it

(A8)

where hJPSTHAB(t tiexclt )it is an average of the JPSTH over time this averageis equivalent to the standard correlation function between the two spiketrains

The discussion by Vaadia et al (1995) emphasizes that modulations ofthe JPSTH along the diagonal strips allow correlated ring events to conveyinformation about sensory signals or behavioral states and this information

Synergy in a Neural Code 1549

is quantied by equation A8 The information carried by the individualcells is related to the corresponding integrals over spike rates equation 29The difference between the the information conveyed by the compoundspiking events E and the information conveyed by spikes in the two cellsindependently is precisely the synergy between the two cells at the giventime lag t For t D 0 it is the synergymdashor extra informationmdashconveyed bysynchronous ring of the two cells

We would like to connect this approach with previous work that focusedon how events reduce our uncertainty about the stimulus (de Ruyter vanSteveninck amp Bialek 1988) Before we observe the neural response all weknow is that stimuli are chosen from a distribution P[s(t0 )] When we ob-serve an event E at time tE this should tell us something about the stimulusin the neighborhood of this time and this knowledge is described by theconditional distribution P[s(t0 )|tE] If we go back to the denition of the mu-tual information between responses and stimuli we can write the averageinformation conveyed by one event in terms of this conditional distribution

I(EI s) DZ

Ds(t0 )Z

dtEP[s(t0 ) tE] log2

iexcl P[s(t0 ) tE]P[s(t0 )]P[tE]

cent

DZ

dtEP[tE]Z

Ds(t0 )P[s(t0 ) |tE] log2

iexclP[s(t0 ) |tE]P[s(t0 )]

cent (A9)

If the system is stationary then the coding should be invariant under timetranslations

P[s(t0 ) |tE] D P[s(t0 C D t0 ) |tE C D t0] (A10)

This invariance means that the integral over stimuli in equation A9 is inde-pendent of the event arrival time tE so we can simplify our expression forthe information carried by a single event

I(EI s) DZ

dtEP[tE]Z

Ds(t0 )P[s(t0 ) |tE] log2

iexclP[s(t0 ) |tE]P[s(t0 )]

cent

DZ

Ds(t0 )P[s(t0 ) |tE] log2

iexclP[s(t0 ) |tE]P[s(t0 )]

cent (A11)

This formula was used by de Ruyter van Steveninck and Bialek (1988) Toconnect with our work here we express the information in equation A11as an average over the stimulus

I(EI s) Dfrac12 iexclP[s(t0) |tE]

P[s(t0 )]

centlog2

iexclP[s(t0 )|tE]P[s(t0 )]

centfrac34

s (A12)

Using Bayesrsquo rule

P[s(t0 ) |tE]P[s(t0 )]

DP[tE |s(t0 )]

P[tE]D

rE(tE)

NrE (A13)

1550 Brenner Strong Koberle Bialek and de Ruyter van Steveninck

where the last term is a result of the distributions of event arrival times beingproportional to the event rates as dened above Substituting this back toequation A11 one nds equation 26

Appendix B Experimental Setup

In the experiment we used a female blowy which was a rst-generationoffspring of a wild y caught outside The y was put inside a plastic tubeand immobilized with wax with the head protruding out The probosciswas left free so that the y could be fed regularly with some sugar waterA small hole was cut in the back of the head close to the midline on theright side Through this hole a tungsten electrode was advanced into thelobula plate This area which is several layers back from the compound eyeincludes a group of largemotion-detector neurons with wide receptive eldsand strong direction selectivity We recorded spikes extracellularly from oneof these the contralateral H1 neuron Franceschini Riehle amp le Nestour1989 Hausen amp Egelhaaf 1989) The electrode was positioned such thatspikes from H1 could be discriminated reliably and converted into pulsesby a simple threshold discriminator The pulses were fed into a CED 1401interface (Cambridge Electronic Design) which digitized the spikes at 10 m sresolution To keep exact synchrony over the duration of the experimentthe spike timing clock was derived from the same internal CED 1401 clockthat dened the frame times of the visual stimulus

The stimulus was a rigidly moving bar pattern displayed on a Tektronix608 high-brightness display The radiance at average intensity NI was about180 mW(m2 pound sr) which amounts to about 5 pound104 effectively transducedphotons per photoreceptor per second (Dubs Laughlin amp Srinivasan 1984)The bars were oriented vertically with intensities chosen at random to beNI(1 sect C) where C is the contrast The distance between the y and the screenwas adjusted so that the angular subtense of a bar equaled the horizontalinterommatidial angle in the stimulated part of the compound eye This set-ting was found by determining the eyersquos spatial Nyquist frequency throughthe reverse reaction (Gotz 1964) in the response of the H1 neuron For thisy the horizontal interommatidial angle was 145 degrees and the distanceto the screen 105 mm The y viewed the display through a round 80 mmdiameter diaphragm showing approximately 30 bars From this we esti-mate the number of stimulated ommatidia in the eyersquos hexagonal raster tobe about 612

Frames of the stimulus pattern were refreshed every 2 ms and with eachnew frame the pattern was displayed at a new position This resulted in anapparent horizontal motion of the bar pattern which is suitable to excite theH1 neuron The pattern position was dened by a pseudorandom sequencesimulating independent random displacements at each time step uniformlydistributed between iexcl047 degrees and C047 degrees (equivalent to iexcl032to C032 omm horizontal ommatidial spacings) This corresponds to a dif-

Synergy in a Neural Code 1551

fusion constant of 181(plusmn)2 per second or 86 omm2 per second The sequenceof pseudorandom numbers contained a repeating part and a nonrepeatingpart each 10 seconds long with the same statistical parameters Thus ineach 20 second cycle the y saw a 10 second movie that it had seen 20seconds before followed by a 10 second movie that was generated inde-pendently

Acknowledgments

We thank G. Lewen and A. Schweitzer for their help with the experiments and N. Tishby for many helpful discussions. Work at the IAS was supported in part by DOE grant DE–FG02–90ER40542, and work at the IFSC was supported by the Brazilian agencies FAPESP and CNPq.

References

Abeles, M., Bergmann, H., Margalit, E., & Vaadia, E. (1993). Spatiotemporal firing patterns in the frontal cortex of behaving monkeys. J. Neurophysiol., 70, 1629–1638.

Adrian, E. (1928). The basis of sensation. London: Christophers.

Aertsen, A., Gerstein, G., Habib, M., & Palm, G. (1989). Dynamics of neuronal firing correlation: Modulation of "effective connectivity." J. Neurophysiol., 61(5), 900–917.

Bialek, W. (1990). Theoretical physics meets experimental neurobiology. In E. Jen (Ed.), 1989 Lectures in Complex Systems (pp. 513–595). Menlo Park, CA: Addison-Wesley.

de Ruyter van Steveninck, R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Coding and information transfer in short spike sequences. Proc. R. Soc. Lond. Ser. B, 234, 379.

de Ruyter van Steveninck, R. R., Lewen, G., Strong, S., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805.

DeWeese, M. (1996). Optimization principles for the neural code. Network, 7, 325–331.

DeWeese, M., & Meister, M. (1998). How to measure information gained from one symbol. APS Abstracts.

Dubs, A., Laughlin, S., & Srinivasan, M. (1984). Single photon signals in fly photoreceptors and first order interneurones at behavioural threshold. J. Physiol., 317, 317–334.

Franceschini, N., Riehle, A., & le Nestour, A. (1989). Directionally selective motion detection by insect neurons. In D. Stavenga & R. Hardie (Eds.), Facets of vision (p. 360). Berlin: Springer-Verlag.

Gotz, K. (1964). Optomotorische Untersuchung des visuellen Systems einiger Augenmutanten der Fruchtfliege Drosophila. Kybernetik, 2, 77–92.


Hausen, K., & Egelhaaf, M. (1989). Neural mechanisms of visual course control in insects. In D. Stavenga & R. Hardie (Eds.), Facets of vision (p. 391). Berlin: Springer-Verlag.

Hopfield, J. J. (1995). Pattern recognition computation using action potential timing for stimulus representation. Nature, 376, 33–36.

Land, M. F., & Collett, T. S. (1974). Chasing behavior of houseflies (Fannia canicularis): A description and analysis. J. Comp. Physiol., 89, 331–357.

MacKay, D., & McCulloch, W. S. (1952). The limiting information capacity of a neuronal link. Bull. Math. Biophys., 14, 127–135.

Magelby, K. (1987). Short-term synaptic plasticity. In G. M. Edelman, V. E. Gall, & K. M. Cowan (Eds.), Synaptic function (pp. 21–56). New York: Wiley.

Meister, M. (1995). Multineuronal codes in retinal signaling. Proc. Nat. Acad. Sci. (USA), 93, 609–614.

Optican, L. M., & Richmond, B. J. (1987). Temporal encoding of two-dimensional patterns by single units in primate inferior temporal cortex. III. Information theoretic analysis. J. Neurophys., 57, 162–178.

Panzeri, S., Biella, G., Rolls, E. T., Skaggs, W. E., & Treves, A. (1996). Speed, noise, information and the graded nature of neuronal responses. Network, 7, 365–370.

Rieke, F., Warland, D., de Ruyter van Steveninck, R., & Bialek, W. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.

Shannon, C. E. (1948). A mathematical theory of communication. Bell Sys. Tech. J., 27, 379–423, 623–656.

Singer, W., & Gray, C. M. (1995). Visual feature integration and the temporal correlation hypothesis. Ann. Rev. Neurosci., 555–586.

Skaggs, W. E., McNaughton, B. L., Gothard, K. M., & Markus, E. J. (1993). An information-theoretic approach to deciphering the hippocampal code. In S. J. Hanson, J. D. Cowan, & C. L. Giles (Eds.), Advances in neural information processing, 5 (pp. 1030–1037). San Mateo, CA: Morgan Kaufmann.

Strong, S. P., Koberle, R., de Ruyter van Steveninck, R. R., & Bialek, W. (1998). Entropy and information in neural spike trains. Phys. Rev. Lett., 80, 197–200.

Usrey, W. M., Reppas, J. B., & Reid, R. C. (1998). Paired-spike interactions and synaptic efficiency of retinal inputs to the thalamus. Nature, 395, 384.

Vaadia, E., Haalman, I., Abeles, M., Bergman, H., Prut, Y., Slovin, H., & Aertsen, A. (1995). Dynamics of neural interactions in monkey cortex in relation to behavioural events. Nature, 373, 515–518.

Received February 22, 1999; accepted May 20, 1999.



This formula was used by de Ruyter van Steveninck and Bialek (1988) Toconnect with our work here we express the information in equation A11as an average over the stimulus

I(EI s) Dfrac12 iexclP[s(t0) |tE]

P[s(t0 )]

centlog2

iexclP[s(t0 )|tE]P[s(t0 )]

centfrac34

s (A12)

Using Bayesrsquo rule

P[s(t0 ) |tE]P[s(t0 )]

DP[tE |s(t0 )]

P[tE]D

rE(tE)

NrE (A13)

1550 Brenner Strong Koberle Bialek and de Ruyter van Steveninck

where the last term is a result of the distributions of event arrival times beingproportional to the event rates as dened above Substituting this back toequation A11 one nds equation 26

Appendix B Experimental Setup

In the experiment we used a female blowy which was a rst-generationoffspring of a wild y caught outside The y was put inside a plastic tubeand immobilized with wax with the head protruding out The probosciswas left free so that the y could be fed regularly with some sugar waterA small hole was cut in the back of the head close to the midline on theright side Through this hole a tungsten electrode was advanced into thelobula plate This area which is several layers back from the compound eyeincludes a group of largemotion-detector neurons with wide receptive eldsand strong direction selectivity We recorded spikes extracellularly from oneof these the contralateral H1 neuron Franceschini Riehle amp le Nestour1989 Hausen amp Egelhaaf 1989) The electrode was positioned such thatspikes from H1 could be discriminated reliably and converted into pulsesby a simple threshold discriminator The pulses were fed into a CED 1401interface (Cambridge Electronic Design) which digitized the spikes at 10 m sresolution To keep exact synchrony over the duration of the experimentthe spike timing clock was derived from the same internal CED 1401 clockthat dened the frame times of the visual stimulus

The stimulus was a rigidly moving bar pattern displayed on a Tektronix608 high-brightness display The radiance at average intensity NI was about180 mW(m2 pound sr) which amounts to about 5 pound104 effectively transducedphotons per photoreceptor per second (Dubs Laughlin amp Srinivasan 1984)The bars were oriented vertically with intensities chosen at random to beNI(1 sect C) where C is the contrast The distance between the y and the screenwas adjusted so that the angular subtense of a bar equaled the horizontalinterommatidial angle in the stimulated part of the compound eye This set-ting was found by determining the eyersquos spatial Nyquist frequency throughthe reverse reaction (Gotz 1964) in the response of the H1 neuron For thisy the horizontal interommatidial angle was 145 degrees and the distanceto the screen 105 mm The y viewed the display through a round 80 mmdiameter diaphragm showing approximately 30 bars From this we esti-mate the number of stimulated ommatidia in the eyersquos hexagonal raster tobe about 612

Frames of the stimulus pattern were refreshed every 2 ms and with eachnew frame the pattern was displayed at a new position This resulted in anapparent horizontal motion of the bar pattern which is suitable to excite theH1 neuron The pattern position was dened by a pseudorandom sequencesimulating independent random displacements at each time step uniformlydistributed between iexcl047 degrees and C047 degrees (equivalent to iexcl032to C032 omm horizontal ommatidial spacings) This corresponds to a dif-

Synergy in a Neural Code 1551

fusion constant of 181(plusmn)2 per second or 86 omm2 per second The sequenceof pseudorandom numbers contained a repeating part and a nonrepeatingpart each 10 seconds long with the same statistical parameters Thus ineach 20 second cycle the y saw a 10 second movie that it had seen 20seconds before followed by a 10 second movie that was generated inde-pendently

Acknowledgments

We thank G Lewen and A Schweitzer for their help with the experimentsand N Tishby for many helpful discussions Work at the IAS was supportedin part by DOE grant DEndashFG02ndash90ER40542 and work at the IFSC was sup-ported by the Brazilian agencies FAPESP and CNPq

References

Abeles M Bergmann H Margalit E amp Vaadia E (1993) Spatiotemporalring patterns in the frontal cortex of behaving monkeys J Neurophysiol 701629ndash1638

Adrian E (1928) The basis of sensation London ChristophersAertsen A Gerstein G HabibM amp Palm G (1989)Dynamics of neuronal r-

ing correlation modulation of ldquoeffective connectivityrdquo J Neurophysiol 61(5)900ndash917

Bialek W (1990) Theoretical physics meets experimental neurobiology In EJen (Ed) 1989 Lectures in Complex Systems (pp 513ndash595) Menlo Park CAAddison-Wesley

de Ruyter van Steveninck R amp Bialek W (1988) Real-time performance ofa movement-sensitive neuron in the blowy visual system coding and in-formation transfer in short spike sequences Proc R Soc Lond Ser B 234379

de Ruyter van Steveninck R R Lewen G Strong S Koberle R amp BialekW (1997)Reproducibility and variability in neural spike trains Science 275275

DeWeese M (1996) Optimization principles for the neural code Network 7325ndash331

DeWeese M amp Meister M (1998) How to measure information gained fromone symbol APS Abstracts

Dubs A Laughlin S amp Srinivasan M (1984)Single photon signals in y pho-toreceptors and rst order interneurones at behavioural threshold J Physiol317 317ndash334

Franceschini N Riehle A amp le Nestour A (1989) Directionally selective mo-tion detection by insect neurons In D Stavenga amp R Hardie (Eds) Facets ofVision page 360 Berlin Springer-Verlag

Gotz K (1964) Optomotorische Untersuchung des Visuellen Systems einigerAugenmutanten der Fruchtiege Drosophila Kybernetik 2 77ndash92

1552 Brenner Strong Koberle Bialek and de Ruyter van Steveninck

Hausen K amp Egelhaaf M (1989)Neural mechanisms of visual course control ininsects In D Stavenga amp R Hardie (Eds)Facets of Vision page 391 Springer-Verlag

Hopeld J J (1995)Pattern recognitin computation using action potential tim-ing for stimulus representation Nature 376 33ndash36

Land M F amp Collett T S (1974)Chasing behavior of houseies (Fannia canic-ulaus) A description and analysis J Comp Physiol 89 331ndash357

MacKay D amp McCulloch W S (1952) The limiting information capacity of aneuronal link Bull Math Biophys 14 127ndash135

Magelby K (1987) Short-term synaptic plasticity In G M Edelman V E Gallamp K M Cowan (Eds) Synaptic function (pp 21ndash56) New York Wiley

Meister M (1995)Multineuronal codes in retinal signaling Proc Nat Acad Sci(USA) 93 609ndash614

Optican L M amp Richmond B J (1987)Temporal encodingof two-dimensionalpatterns by single units in primate inferior temporal cortex III Informationtheoretic analysis J Neurophys 57 162ndash178

Panzeri S Biella G Rolls E T Skaggs W E amp Treves A (1996) Speednoise information and the graded nature of neuronal responses Network 7365ndash370

Rieke F Warland D de Ruyter van Steveninck R amp Bialek W (1997)SpikesExploring the neural code Cambridge MA MIT Press

Shannon C E (1948) A mathematical theory of communication Bell Sys TechJ 27 379ndash423 623ndash656

Singer W amp Gray C M (1995) Visual feature integration and the temporalcorrelation hypothesis Ann Rev Neurosci 555ndash586

Skaggs W E McNaughton B L Gothard K M amp Markus E J (1993) Aninformation-theoretic approach to deciphering the hippocampal code InS J Hanson J D Cowan amp C L Giles (Eds) Advances in neural informa-tion processing 5 (pp 1030ndash1037) San Mateo CA Morgan Kaufmann

Strong S P Koberle R de Ruyter van Steveninck R R amp Bialek W (1998)Entropy and information in neural spike trains Phys Rev Lett 80 197ndash200

Usrey W M Reppas J B amp Reid R C (1998) Paired-spike interactions andsynaptic efciency of retinal inputs to the thalamus Nature 395 384

Vaadia E Haalman I Abeles M Bergman H Prut Y Slovin H amp AertsenA (1995) Dynamics of neural interactions in monkey cortex in relation tobehavioural events Nature 373 515ndash518

Received February 22 1999 accepted May 20 1999

Page 12: Synergy in a Neural Code - Princeton Universitywbialek/our_papers/brenner+al_00b.pdfSynergy in a Neural Code 1533 of measuring the information (in bits) carried by particular patterns

1542 Brenner Strong Koberle Bialek and de Ruyter van Steveninck

Figure 3 Information about the signal transmitted by pairs of spikes computedfrom equation 25 as a function of the time separation between the two spikesThe dotted line shows the information that would be transmitted by the twospikes independently (twice the single-spike information)

and for t D 17 (sect1) ms (see Figure 1e) The information carried by spikepairs as a function of the interspike time t computed from equation 25 isshown in Figure 3 For large t spikes contribute independent informationas expected This independence is established within approximately 30 to40 ms comparable to the behavioral response times of the y (Land ampCollett 1974) There is a mild redundancy (about 10ndash20) at intermediateseparations and a very large synergy (up to about 130) at small t

Related results were obtained using the correlation of spike patterns withstimulus features (de Ruyter van Steveninck amp Bialek 1988) There the infor-mation carried by spikepatterns was estimated from the distribution of stim-uli given each pattern thus constructing a statistical model of what the pat-terns ldquostand forrdquo (see the details in appendix A) Since the time-dependentstimulus is in general of high dimensionality its distribution cannot besampled directly and some approximations must be made de Ruyter vanSteveninck and Bialek (1988) made the approximation that patterns of a fewspikes encode projections of the stimulus onto low-dimensional subspacesand further that the conditional distributions P[s(t)|E] are gaussian The in-formation obtained with these approximations provides a lower bound tothe true information carried by the patterns as estimated directly with themethods presented here

Synergy in a Neural Code 1543

32 Origins of Synergy Synergy means quite literally that two spikestogether tell us more than two spikes separately Synergistic coding is oftendiscussed for populations of cells where extra information is conveyedby patterns of coincident spikes from several neurons (Abeles BergmannMargalit amp Vaadia 1993 Hopeld 1995 Meister 1995 Singer amp Gray 1995)Here we see direct evidence for extra information in pairs of spikes acrosstime The mathematical framework for describing these effects is the sameand a natural question is What are the conditions for synergistic coding

The average synergy Syn[E1 E2I s] between two events E1 and E2 is thedifference between the information about the stimulus s conveyed by thepair and the information conveyed by the two events independently

Syn[E1 E2I s] D I[E1 E2I s] iexcl (I[E1I s] C I[E2I s]) (31)

We can rewrite the synergy as

Syn[E1 E2I s] D I[E1I E2 |s] iexcl I[E1I E2] (32)

The rst term is the mutual information between the events computed acrossan ensemble of repeated presentations of the same stimulus history It de-scribes the gain in information due to the locking of compound event (E1 E2)to particular stimulus features If events E1 and E2 are correlated individ-ually with the stimulus but not with one another this term will be zeroand these events cannot be synergistic on average The second term is themutual information between events when the stimulus is not constrainedor equivalently the predictability of event E2 from E1 This predictabilitylimits the capacity of E2 to carry information beyond that already conveyedby E1 Synergistic coding (Syn gt 0) thus requires that the mutual informa-tion among the spikes is increased by specifying the stimulus which makesprecise the intuitive idea of ldquostimulus-dependent correlationsrdquo (Abeles etal 1993 Hopeld 1995 Meister 1995 Singer amp Gray 1995)

Returning to our experimental example we identify the events E1 and E2as the arrivals of two spikes and consider the synergy as a function of thetime t between them In terms of event rates we compute the informationcarried by a pair of spikes separated by a time t equation 25 as well as theinformation carried by two individual spikes The difference between thesetwo quantities is the synergy between two spikes which can be written as

Syn(t ) D iexcl log2

iexclNrt

Nr2

centC

1T

Z T

0dt

rt (t)Nrt

log2

micro rt (t)r(t)r(t iexcl t )

para

C1T

Z T

0dt

micro rt (t)

Nrt

Crt (t C t )

Nrt

iexcl 2r(t)

Nr

paralog2[r(t)] (33)

The rst term in this equation is the logarithm of the normalized correlationfunction and hence measures the rarity of spike pairs with separation t

1544 Brenner Strong Koberle Bialek and de Ruyter van Steveninck

the average of this term over t is the mutual information between events(the second term in equation 32) The second term is related to the localcorrelation function and measures the extent to which the stimulus mod-ulates the likelihood of spike pairs The average of this term over t givesthe mutual information conditional on knowledge of the stimulus (the rstterm in equation 32) The average of the third term over t is zero and nu-merical evaluation of this term from the data shows that it is negligible atmost values of t

We thus nd that the synergy between spikes is approximately a sumof two terms whose averages over t are the terms in equation 32 A spikepair with a separation t then has two types of contributions to the extrainformation it carries the two spikes can be correlated conditional on thestimulus or the pair could be a rare and thus surprising event The rarityof brief pairs is related to neural refractoriness but this effect alone is in-sufcient to enhance information transmission the rare events must alsobe related reliably to the stimulus In fact conditional on the stimulus thespikes in rare pairs are strongly correlated with each other and this is visiblein Figure 1a from trial to trial adjacent spikes jitter together as if connectedby a stiff spring To quantify this effect we nd for each spike in one trialthe closest spike in successive trials and measure the variance of the arrivaltimes of these spikes Similarly we measure the variance of the interspiketimes Figure 4a shows the ratio of the interspike time variance to the sum ofthe arrival time variances of the spikes that make up the pair For large sep-arations this ratio is unity as expected if spikes are locked independentlyto the stimulus but as the two spikes come closer it falls below one-quarter

Both the conditional correlation among the members of the pair (see Fig-ure 4a) and the relative synergy (see Figure 4b) depend strongly on the inter-spike separation This dependence is nearly invariant to changes in imagecontrast although the spike rate and other statistical properties are stronglyaffected by such changes Brief spike pairs seem to retain their identity asspecially informative symbols over a range of input ensembles If particulartemporal patterns are especially informative then we would lose informa-tion if we failed to distinguish among different patterns Thus there are twonotions of time resolution for spike pairs the time resolution with which theinterspike time is dened and the absolute time resolution with which theevent is marked Figure 5 shows that for small interspike times the infor-mation is much more sensitive to changes in the interspike time resolution(open symbols) than to the absolute time resolution (lled symbols) This isrelated to the slope in Figure 2 in regions where the slope is large eventsshould be nely distinguished in order to retain the information

33 Implications of Synergy The importance of spike timing in the neu-ral code has been under debate for some time now We believe that someissues in this debate can be claried using a direct information-theoretic ap-proach Following MacKay and McCulloch (1952) we know that marking

Synergy in a Neural Code 1545

Figure 4 (a) Ratio between the variance of interspike time and the sum of vari-ances of the two spike times Variances are measured across repeated presenta-tions of same stimulus as explained in the text This ratio is plotted as a func-tion of the interspike time t for two experiments with different image contrast(b) Extra information conveyed cooperatively by pairs of spikes expressed as afraction of the information conveyed by the two spikes independently While thesingle-spike information varies with contrast (15 bits per spike for C D 01 com-pared to 13 bits per spike for C D 1) the fractional synergy is almost contrastindependent

spike arrival times with higher resolution provides an increased capacityfor information transmission The work of Strong et al (1998) shows thatfor the yrsquos H1 neuron the increased capacity associated with spike timingindeed is used with nearly constant efciency down to millisecond resolu-tion This efciency can be the result of a tight locking of individual spikesto a rapidly varying stimulus and it could also be the result of temporalpatterns providing information beyond rapid rate modulations The anal-

1546 Brenner Strong Koberle Bialek and de Ruyter van Steveninck

Figure 5 Information conveyed by spike pairs as a function of time resolutionAn eventmdashpair of spikesmdashcan be described by two times the separation be-tween spikes (relative time) and the occurrence time of the event with respectto the stimulus (absolute time) The information carried by the pair dependson the time resolution in these two dimensions both specied by the bin sizeD t along the abscissa Open symbols are measurements of the information fora xed 2 ms resolution of absolute time and a variable resolution D t of relativetime Closed symbols correspond to a xed 2 ms resolution of relative time anda variable resolution D t of absolute time For short intervals the sensitivity tocoarsening of the relative time resolution is much greater than to coarsening ofthe absolute time resolution In contrast sensitivity to relative and absolute timeresolutions is the same for the longer nonsynergistic interspike separations

ysis given here shows that for H1 pairs of spikes can provide much moreinformation than two individual spikes and information transmission ismuch more sensitive to the relative timing of spikes than to their absolutetiming The synergy is an inherent property of particular spike pairs as itpersists when averaged over all occurrences of the pair in the experiment

One may ask about the implication of synergy to behavior Flies canrespond to changes in a motion stimulus over a time scale of approximately30 to 40 ms (Land amp Collett 1974) We have found that the spike train of H1as a whole conveyed about 20 bits per second more than the informationconveyed independently by single spikes (see Section 31) This implies thatover a behavioral response time synergistic effects provide about 08 bitof additional information equivalent to almost a factor of two in resolvingpower for distinguishing different trajectories of motion across the visualeld

Synergy in a Neural Code 1547

4 Summary

Information theory allows us to measure synergy and redundancy amongspikes independent of the rules for translating between spikes and stimuliIn particular this approach tests directly the idea that patterns of spikesare special events in the code carrying more information than expected byadding the contributions from individual spikes It is of practical impor-tance that the formulas rely on low-order statistical measures of the neuralresponse and hence do not require enormous data sets to reach meaningfulconclusions The method is of general validity and is applicable to patternsof spikes across a population of neurons as well as across time Specicresults (existence of synergy or redundancy number of bits per spike etc)depend on the neural system as well as on the stimulus ensemble used ineach experiment

In our experiments on the y visual system we found that an eventcomposed of a pair of spikes can carry far more than the information carriedindependently by its parts Two spikes that occur in rapid succession appearto be special symbols that have an integrity beyond the locking of individualspikes to the stimulus This is analogous to the encoding of sounds in writtenEnglish each of the compound symbols th sh and ch stands for sounds thatare not decomposable into sounds represented by each of the constituentletters For spike pairs to act effectively as special symbols mechanismsfor ldquoreadingrdquo them must exist at subsequent levels of processing Synaptictransmission is sensitive to interspike times in the 2ndash20 ms range (Magelby1987) and it is natural to suggest that synaptic mechanisms on this timescaleplay a role in such reading Recent work on the mammalian visual system(Usrey Reppas amp Reid 1998) provides direct evidence that pairs of spikesclose together in time can be especially efcient in driving postsynapticneurons

Appendix A Relation to Previous Work

Patterns of spikes and their relation to sensory stimuli have been quantiedin the past using correlation functions The event rates that we have denedhere which are connected directly to the information carried by patternsof spikes through equation 25 are just properly normalized correlationfunctions The event rate for pairs of spikes from two separate neurons isrelated to the joint poststimulus time histogram (PSTH)dened by AertsenGerstein Habib and Palm (1989 Vaadia et al 1995) Making this connectionexplicit is also an opportunity to see how the present formalism applies toevents dened across two cells

Consider two cells A and B generating spikes at times ftAi g and ftB

i grespectively It will be useful to think of the spike trains as sums of unit

1548 Brenner Strong Koberle Bialek and de Ruyter van Steveninck

impulses at the spike times

rA(t) D

X

id(t iexcl tA

i ) (A1)

rB(t) D

X

id(t iexcl tB

i ) (A2)

Then the time-dependent spike rates for the two cells are

rA(t) D hr A(t)itrials (A3)

rB(t) D hr B(t)itrials (A4)

where hcent cent centitrials denotes an average over multiple trials in which the sametime-dependent stimulus s(t0 ) or the same motor behavior observed Thesespike rates are the probabilities per unit time for the occurrence of a singlespike in either cell A or cell B also called the PSTH We can dene theprobability per unit time for a spike in cell A to occur at time t and a spikein cell B to occur at time t0 and this will be the joint PSTH

JPSTHAB(t t0 ) D hr A(t)r B(t0 )itrials (A5)

Alternatively we can consider an event E dened by a spike in cell A attime t and a spike in cell B at time t iexcl t with the relative time t measuredwith a precision of D t Then the rate of these events is

rE(t) DZ D t 2

iexclD t 2dt0 JPSTHAB(t t iexcl t C t0 ) (A6)

frac14 D tJPSTHAB(t t iexcl t ) (A7)

where the last approximation is valid if our time resolution is sufcientlyhigh Applying our general formula for the information carried by singleevents equation 25 the information carried by pairs of spikes from two cellscan be written as an integral over diagonal ldquostripsrdquo of the JPSTH matrix

I(EIs) D1T

Z T

0dt

JPSTHAB(t t iexcl t )

hJPSTHAB(t t iexcl t )itlog2

JPSTHAB(t t iexcl t )

hJPSTHAB(t t iexcl t )it

(A8)

where hJPSTHAB(t tiexclt )it is an average of the JPSTH over time this averageis equivalent to the standard correlation function between the two spiketrains

The discussion by Vaadia et al (1995) emphasizes that modulations ofthe JPSTH along the diagonal strips allow correlated ring events to conveyinformation about sensory signals or behavioral states and this information

Synergy in a Neural Code 1549

is quantied by equation A8 The information carried by the individualcells is related to the corresponding integrals over spike rates equation 29The difference between the the information conveyed by the compoundspiking events E and the information conveyed by spikes in the two cellsindependently is precisely the synergy between the two cells at the giventime lag t For t D 0 it is the synergymdashor extra informationmdashconveyed bysynchronous ring of the two cells

We would like to connect this approach with previous work that focusedon how events reduce our uncertainty about the stimulus (de Ruyter vanSteveninck amp Bialek 1988) Before we observe the neural response all weknow is that stimuli are chosen from a distribution P[s(t0 )] When we ob-serve an event E at time tE this should tell us something about the stimulusin the neighborhood of this time and this knowledge is described by theconditional distribution P[s(t0 )|tE] If we go back to the denition of the mu-tual information between responses and stimuli we can write the averageinformation conveyed by one event in terms of this conditional distribution

I(EI s) DZ

Ds(t0 )Z

dtEP[s(t0 ) tE] log2

iexcl P[s(t0 ) tE]P[s(t0 )]P[tE]

cent

DZ

dtEP[tE]Z

Ds(t0 )P[s(t0 ) |tE] log2

iexclP[s(t0 ) |tE]P[s(t0 )]

cent (A9)

If the system is stationary then the coding should be invariant under timetranslations

P[s(t0 ) |tE] D P[s(t0 C D t0 ) |tE C D t0] (A10)

This invariance means that the integral over stimuli in equation A9 is inde-pendent of the event arrival time tE so we can simplify our expression forthe information carried by a single event

I(EI s) DZ

dtEP[tE]Z

Ds(t0 )P[s(t0 ) |tE] log2

iexclP[s(t0 ) |tE]P[s(t0 )]

cent

DZ

Ds(t0 )P[s(t0 ) |tE] log2

iexclP[s(t0 ) |tE]P[s(t0 )]

cent (A11)

This formula was used by de Ruyter van Steveninck and Bialek (1988) Toconnect with our work here we express the information in equation A11as an average over the stimulus

I(EI s) Dfrac12 iexclP[s(t0) |tE]

P[s(t0 )]

centlog2

iexclP[s(t0 )|tE]P[s(t0 )]

centfrac34

s (A12)

Using Bayesrsquo rule

P[s(t0 ) |tE]P[s(t0 )]

DP[tE |s(t0 )]

P[tE]D

rE(tE)

NrE (A13)

1550 Brenner Strong Koberle Bialek and de Ruyter van Steveninck

where the last term is a result of the distributions of event arrival times beingproportional to the event rates as dened above Substituting this back toequation A11 one nds equation 26

Appendix B Experimental Setup

In the experiment we used a female blowy which was a rst-generationoffspring of a wild y caught outside The y was put inside a plastic tubeand immobilized with wax with the head protruding out The probosciswas left free so that the y could be fed regularly with some sugar waterA small hole was cut in the back of the head close to the midline on theright side Through this hole a tungsten electrode was advanced into thelobula plate This area which is several layers back from the compound eyeincludes a group of largemotion-detector neurons with wide receptive eldsand strong direction selectivity We recorded spikes extracellularly from oneof these the contralateral H1 neuron Franceschini Riehle amp le Nestour1989 Hausen amp Egelhaaf 1989) The electrode was positioned such thatspikes from H1 could be discriminated reliably and converted into pulsesby a simple threshold discriminator The pulses were fed into a CED 1401interface (Cambridge Electronic Design) which digitized the spikes at 10 m sresolution To keep exact synchrony over the duration of the experimentthe spike timing clock was derived from the same internal CED 1401 clockthat dened the frame times of the visual stimulus

The stimulus was a rigidly moving bar pattern displayed on a Tektronix608 high-brightness display The radiance at average intensity NI was about180 mW(m2 pound sr) which amounts to about 5 pound104 effectively transducedphotons per photoreceptor per second (Dubs Laughlin amp Srinivasan 1984)The bars were oriented vertically with intensities chosen at random to beNI(1 sect C) where C is the contrast The distance between the y and the screenwas adjusted so that the angular subtense of a bar equaled the horizontalinterommatidial angle in the stimulated part of the compound eye This set-ting was found by determining the eyersquos spatial Nyquist frequency throughthe reverse reaction (Gotz 1964) in the response of the H1 neuron For thisy the horizontal interommatidial angle was 145 degrees and the distanceto the screen 105 mm The y viewed the display through a round 80 mmdiameter diaphragm showing approximately 30 bars From this we esti-mate the number of stimulated ommatidia in the eyersquos hexagonal raster tobe about 612

Frames of the stimulus pattern were refreshed every 2 ms and with eachnew frame the pattern was displayed at a new position This resulted in anapparent horizontal motion of the bar pattern which is suitable to excite theH1 neuron The pattern position was dened by a pseudorandom sequencesimulating independent random displacements at each time step uniformlydistributed between iexcl047 degrees and C047 degrees (equivalent to iexcl032to C032 omm horizontal ommatidial spacings) This corresponds to a dif-

Synergy in a Neural Code 1551

fusion constant of 181(plusmn)2 per second or 86 omm2 per second The sequenceof pseudorandom numbers contained a repeating part and a nonrepeatingpart each 10 seconds long with the same statistical parameters Thus ineach 20 second cycle the y saw a 10 second movie that it had seen 20seconds before followed by a 10 second movie that was generated inde-pendently


3.2 Origins of Synergy. Synergy means quite literally that two spikes together tell us more than two spikes separately. Synergistic coding is often discussed for populations of cells, where extra information is conveyed by patterns of coincident spikes from several neurons (Abeles, Bergmann, Margalit, & Vaadia, 1993; Hopfield, 1995; Meister, 1995; Singer & Gray, 1995). Here we see direct evidence for extra information in pairs of spikes across time. The mathematical framework for describing these effects is the same, and a natural question is: What are the conditions for synergistic coding?

The average synergy Syn[E1, E2; s] between two events E1 and E2 is the difference between the information about the stimulus s conveyed by the pair and the information conveyed by the two events independently:

Syn[E_1, E_2; s] = I[E_1, E_2; s] - (I[E_1; s] + I[E_2; s]).   (3.1)

We can rewrite the synergy as

Syn[E_1, E_2; s] = I[E_1; E_2 | s] - I[E_1; E_2].   (3.2)

The first term is the mutual information between the events, computed across an ensemble of repeated presentations of the same stimulus history. It describes the gain in information due to the locking of the compound event (E1, E2) to particular stimulus features. If events E1 and E2 are correlated individually with the stimulus but not with one another, this term will be zero, and these events cannot be synergistic on average. The second term is the mutual information between events when the stimulus is not constrained, or equivalently, the predictability of event E2 from E1. This predictability limits the capacity of E2 to carry information beyond that already conveyed by E1. Synergistic coding (Syn > 0) thus requires that the mutual information among the spikes is increased by specifying the stimulus, which makes precise the intuitive idea of "stimulus-dependent correlations" (Abeles et al., 1993; Hopfield, 1995; Meister, 1995; Singer & Gray, 1995).
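
Equations 3.1 and 3.2 are two ways of writing the same quantity, and the identity is easy to verify numerically for any discrete joint distribution of two events and a stimulus. The following sketch (our illustration, not part of the original analysis; the toy distribution and all variable names are assumptions) computes both forms from one probability table and checks that they agree.

import numpy as np

def mutual_info(p_xy):
    # I(X; Y) in bits from a joint probability table p_xy
    px = p_xy.sum(axis=1, keepdims=True)
    py = p_xy.sum(axis=0, keepdims=True)
    nz = p_xy > 0
    return float(np.sum(p_xy[nz] * np.log2(p_xy[nz] / (px * py)[nz])))

rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(2 * 2 * 5)).reshape(2, 2, 5)   # toy P(e1, e2, s)

# equation 3.1: Syn = I[(E1, E2); s] - I[E1; s] - I[E2; s]
syn_31 = (mutual_info(p.reshape(4, 5))        # pair treated as one compound symbol
          - mutual_info(p.sum(axis=1))        # P(e1, s)
          - mutual_info(p.sum(axis=0)))       # P(e2, s)

# equation 3.2: Syn = I[E1; E2 | s] - I[E1; E2]
p_s = p.sum(axis=(0, 1))
i_cond = sum(p_s[k] * mutual_info(p[:, :, k] / p_s[k]) for k in range(p.shape[2]))
syn_32 = i_cond - mutual_info(p.sum(axis=2))  # P(e1, e2)

assert np.isclose(syn_31, syn_32)

With synthetic distributions like this one, the synergy can take either sign; the analysis in the text asks what its sign and size are for real spike pairs.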

Returning to our experimental example, we identify the events E1 and E2 as the arrivals of two spikes and consider the synergy as a function of the time τ between them. In terms of event rates, we compute the information carried by a pair of spikes separated by a time τ (equation 2.5), as well as the information carried by two individual spikes. The difference between these two quantities is the synergy between two spikes, which can be written as

Syn(\tau) = -\log_2\!\left[ \frac{\bar{r}_\tau}{\bar{r}^2} \right]
+ \frac{1}{T} \int_0^T dt\, \frac{r_\tau(t)}{\bar{r}_\tau} \log_2\!\left[ \frac{r_\tau(t)}{r(t)\, r(t-\tau)} \right]
+ \frac{1}{T} \int_0^T dt \left[ \frac{r_\tau(t)}{\bar{r}_\tau} + \frac{r_\tau(t+\tau)}{\bar{r}_\tau} - \frac{2 r(t)}{\bar{r}} \right] \log_2[r(t)].   (3.3)

The first term in this equation is the logarithm of the normalized correlation function and hence measures the rarity of spike pairs with separation τ; the average of this term over τ is the mutual information between events (the second term in equation 3.2). The second term is related to the local correlation function and measures the extent to which the stimulus modulates the likelihood of spike pairs. The average of this term over τ gives the mutual information conditional on knowledge of the stimulus (the first term in equation 3.2). The average of the third term over τ is zero, and numerical evaluation of this term from the data shows that it is negligible at most values of τ.
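
As a concrete illustration of how these quantities can be estimated from data, the sketch below (our construction; the binary raster, the binning, and the variable names are assumptions, not details taken from the original analysis) computes the synergy of a spike pair as the difference described above: the information carried by the pair event minus twice the single-spike information, which equation 3.3 re-expresses in terms of rates.

import numpy as np

def event_info(prob, prob_mean):
    # time average of (r / rbar) * log2(r / rbar), in bits;
    # the overall scale cancels, so per-bin probabilities can stand in for rates
    nz = prob > 0
    x = prob[nz] / prob_mean
    return float(np.sum(x * np.log2(x)) / prob.size)

def pair_synergy(raster, k):
    # raster: (trials x time bins) binary array of responses to a repeated stimulus
    # k: interspike separation in bins (k >= 1)
    p_spike = raster.mean(axis=0)                            # spike probability per bin
    p_pair = (raster[:, k:] * raster[:, :-k]).mean(axis=0)   # pair probability per bin
    return (event_info(p_pair, p_pair.mean())
            - 2.0 * event_info(p_spike, p_spike.mean()))

With 2 ms bins, pair_synergy(raster, 3) would estimate the synergy for spikes roughly 6 ms apart; scanning k traces out curves of the kind summarized in Figure 4b.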

We thus find that the synergy between spikes is approximately a sum of two terms whose averages over τ are the terms in equation 3.2. A spike pair with a separation τ then has two types of contributions to the extra information it carries: the two spikes can be correlated conditional on the stimulus, or the pair could be a rare and thus surprising event. The rarity of brief pairs is related to neural refractoriness, but this effect alone is insufficient to enhance information transmission; the rare events must also be related reliably to the stimulus. In fact, conditional on the stimulus, the spikes in rare pairs are strongly correlated with each other, and this is visible in Figure 1a: from trial to trial, adjacent spikes jitter together as if connected by a stiff spring. To quantify this effect, we find for each spike in one trial the closest spike in successive trials and measure the variance of the arrival times of these spikes. Similarly, we measure the variance of the interspike times. Figure 4a shows the ratio of the interspike time variance to the sum of the arrival time variances of the spikes that make up the pair. For large separations this ratio is unity, as expected if spikes are locked independently to the stimulus, but as the two spikes come closer it falls below one-quarter.
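
The variance ratio plotted in Figure 4a can be obtained with a procedure along these lines (our sketch; the pairing rule is described only verbally in the text, so the nearest-spike matching below is an assumption about implementation details).

import numpy as np

def jitter_ratio(t1_ref, t2_ref, other_trials):
    # t1_ref, t2_ref: times of one adjacent spike pair in a reference trial
    # other_trials: list of sorted spike-time arrays from repeated presentations
    t1s, t2s = [], []
    for train in other_trials:
        train = np.asarray(train)
        if train.size == 0:
            continue
        t1s.append(train[np.argmin(np.abs(train - t1_ref))])   # closest spike to t1
        t2s.append(train[np.argmin(np.abs(train - t2_ref))])   # closest spike to t2
    t1s, t2s = np.array(t1s), np.array(t2s)
    # ratio of interspike-time variance to the summed arrival-time variances
    return float(np.var(t2s - t1s) / (np.var(t1s) + np.var(t2s)))

If the two spikes lock to the stimulus independently, the covariance between their arrival times vanishes and the ratio is one; spikes that jitter together give a ratio well below one, as observed for small separations.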

Both the conditional correlation among the members of the pair (see Figure 4a) and the relative synergy (see Figure 4b) depend strongly on the interspike separation. This dependence is nearly invariant to changes in image contrast, although the spike rate and other statistical properties are strongly affected by such changes. Brief spike pairs seem to retain their identity as specially informative symbols over a range of input ensembles. If particular temporal patterns are especially informative, then we would lose information if we failed to distinguish among different patterns. Thus there are two notions of time resolution for spike pairs: the time resolution with which the interspike time is defined, and the absolute time resolution with which the event is marked. Figure 5 shows that for small interspike times, the information is much more sensitive to changes in the interspike time resolution (open symbols) than to the absolute time resolution (filled symbols). This is related to the slope in Figure 2: in regions where the slope is large, events should be finely distinguished in order to retain the information.

3.3 Implications of Synergy. The importance of spike timing in the neural code has been under debate for some time now. We believe that some issues in this debate can be clarified using a direct information-theoretic approach.


Figure 4: (a) Ratio between the variance of the interspike time and the sum of variances of the two spike times. Variances are measured across repeated presentations of the same stimulus, as explained in the text. This ratio is plotted as a function of the interspike time τ for two experiments with different image contrast. (b) Extra information conveyed cooperatively by pairs of spikes, expressed as a fraction of the information conveyed by the two spikes independently. While the single-spike information varies with contrast (1.5 bits per spike for C = 0.1, compared to 1.3 bits per spike for C = 1), the fractional synergy is almost contrast independent.

Following MacKay and McCulloch (1952), we know that marking spike arrival times with higher resolution provides an increased capacity for information transmission. The work of Strong et al. (1998) shows that for the fly's H1 neuron, the increased capacity associated with spike timing indeed is used with nearly constant efficiency down to millisecond resolution. This efficiency can be the result of a tight locking of individual spikes to a rapidly varying stimulus, and it could also be the result of temporal patterns providing information beyond rapid rate modulations.


Figure 5: Information conveyed by spike pairs as a function of time resolution. An event (a pair of spikes) can be described by two times: the separation between spikes (relative time) and the occurrence time of the event with respect to the stimulus (absolute time). The information carried by the pair depends on the time resolution in these two dimensions, both specified by the bin size Δt along the abscissa. Open symbols are measurements of the information for a fixed 2 ms resolution of absolute time and a variable resolution Δt of relative time. Closed symbols correspond to a fixed 2 ms resolution of relative time and a variable resolution Δt of absolute time. For short intervals, the sensitivity to coarsening of the relative time resolution is much greater than to coarsening of the absolute time resolution. In contrast, sensitivity to relative and absolute time resolutions is the same for the longer, nonsynergistic interspike separations.

The analysis given here shows that for H1, pairs of spikes can provide much more information than two individual spikes, and information transmission is much more sensitive to the relative timing of spikes than to their absolute timing. The synergy is an inherent property of particular spike pairs, as it persists when averaged over all occurrences of the pair in the experiment.

One may ask about the implication of synergy for behavior. Flies can respond to changes in a motion stimulus over a time scale of approximately 30 to 40 ms (Land & Collett, 1974). We have found that the spike train of H1 as a whole conveyed about 20 bits per second more than the information conveyed independently by single spikes (see Section 3.1). This implies that over a behavioral response time, synergistic effects provide about 0.8 bit of additional information, equivalent to almost a factor of two in resolving power for distinguishing different trajectories of motion across the visual field.
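
The numbers combine as follows (a back-of-the-envelope restatement of the estimate above):

20\ \text{bits/s} \times 0.04\ \text{s} \approx 0.8\ \text{bits}, \qquad 2^{0.8} \approx 1.7,

so within one behavioral response time the synergistic contribution is worth nearly a doubling of the number of distinguishable motion trajectories.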


4 Summary

Information theory allows us to measure synergy and redundancy among spikes, independent of the rules for translating between spikes and stimuli. In particular, this approach tests directly the idea that patterns of spikes are special events in the code, carrying more information than expected by adding the contributions from individual spikes. It is of practical importance that the formulas rely on low-order statistical measures of the neural response and hence do not require enormous data sets to reach meaningful conclusions. The method is of general validity and is applicable to patterns of spikes across a population of neurons as well as across time. Specific results (existence of synergy or redundancy, number of bits per spike, etc.) depend on the neural system as well as on the stimulus ensemble used in each experiment.

In our experiments on the fly visual system, we found that an event composed of a pair of spikes can carry far more than the information carried independently by its parts. Two spikes that occur in rapid succession appear to be special symbols that have an integrity beyond the locking of individual spikes to the stimulus. This is analogous to the encoding of sounds in written English: each of the compound symbols th, sh, and ch stands for sounds that are not decomposable into sounds represented by each of the constituent letters. For spike pairs to act effectively as special symbols, mechanisms for "reading" them must exist at subsequent levels of processing. Synaptic transmission is sensitive to interspike times in the 2–20 ms range (Magelby, 1987), and it is natural to suggest that synaptic mechanisms on this time scale play a role in such reading. Recent work on the mammalian visual system (Usrey, Reppas, & Reid, 1998) provides direct evidence that pairs of spikes close together in time can be especially efficient in driving postsynaptic neurons.

Appendix A Relation to Previous Work

Patterns of spikes and their relation to sensory stimuli have been quantified in the past using correlation functions. The event rates that we have defined here, which are connected directly to the information carried by patterns of spikes through equation 2.5, are just properly normalized correlation functions. The event rate for pairs of spikes from two separate neurons is related to the joint poststimulus time histogram (JPSTH) defined by Aertsen, Gerstein, Habib, and Palm (1989; Vaadia et al., 1995). Making this connection explicit is also an opportunity to see how the present formalism applies to events defined across two cells.

Consider two cells, A and B, generating spikes at times {t_i^A} and {t_i^B}, respectively. It will be useful to think of the spike trains as sums of unit impulses at the spike times:

\rho_A(t) = \sum_i \delta(t - t_i^A),   (A.1)

\rho_B(t) = \sum_i \delta(t - t_i^B).   (A.2)

Then the time-dependent spike rates for the two cells are

r_A(t) = \langle \rho_A(t) \rangle_{\rm trials},   (A.3)

r_B(t) = \langle \rho_B(t) \rangle_{\rm trials},   (A.4)

where \langle\cdots\rangle_{\rm trials} denotes an average over multiple trials in which the same time-dependent stimulus s(t') is presented or the same motor behavior is observed. These spike rates are the probabilities per unit time for the occurrence of a single spike in either cell A or cell B, also called the PSTH. We can define the probability per unit time for a spike in cell A to occur at time t and a spike in cell B to occur at time t', and this will be the joint PSTH,

{\rm JPSTH}_{AB}(t, t') = \langle \rho_A(t)\, \rho_B(t') \rangle_{\rm trials}.   (A.5)

Alternatively, we can consider an event E defined by a spike in cell A at time t and a spike in cell B at time t − τ, with the relative time τ measured to a precision of Δt. Then the rate of these events is

r_E(t) = \int_{-\Delta t/2}^{\Delta t/2} dt'\, {\rm JPSTH}_{AB}(t, t - \tau + t')   (A.6)

\approx \Delta t\, {\rm JPSTH}_{AB}(t, t - \tau),   (A.7)

where the last approximation is valid if our time resolution is sufficiently high. Applying our general formula for the information carried by single events (equation 2.5), the information carried by pairs of spikes from two cells can be written as an integral over diagonal "strips" of the JPSTH matrix:

I(E; s) = \frac{1}{T} \int_0^T dt\, \frac{{\rm JPSTH}_{AB}(t, t-\tau)}{\langle {\rm JPSTH}_{AB}(t, t-\tau) \rangle_t} \, \log_2\!\left[ \frac{{\rm JPSTH}_{AB}(t, t-\tau)}{\langle {\rm JPSTH}_{AB}(t, t-\tau) \rangle_t} \right],   (A.8)

where \langle {\rm JPSTH}_{AB}(t, t-\tau) \rangle_t is an average of the JPSTH over time; this average is equivalent to the standard correlation function between the two spike trains.

The discussion by Vaadia et al. (1995) emphasizes that modulations of the JPSTH along the diagonal strips allow correlated firing events to convey information about sensory signals or behavioral states, and this information is quantified by equation A.8. The information carried by the individual cells is related to the corresponding integrals over spike rates (equation 2.9). The difference between the information conveyed by the compound spiking events E and the information conveyed by spikes in the two cells independently is precisely the synergy between the two cells at the given time lag τ. For τ = 0 it is the synergy, or extra information, conveyed by synchronous firing of the two cells.
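
Equations A.5 through A.8 translate directly into array operations on binned spike trains. The sketch below (our illustration; the binary rasters, bin sizes, and variable names are assumptions) builds the JPSTH from two simultaneously recorded cells and evaluates the information carried by the compound event at a given lag by averaging along the corresponding diagonal strip.

import numpy as np

def jpsth(raster_a, raster_b):
    # raster_a, raster_b: (trials x time bins) binary arrays for cells A and B
    # returns JPSTH_AB(t, t') = <rho_A(t) rho_B(t')>_trials as per-bin probabilities
    return raster_a.T @ raster_b / raster_a.shape[0]

def strip_information(jp, lag_bins):
    # information (bits) in the event "spike in A at t and spike in B at t - lag"
    # (equation A.8): average over the diagonal strip, normalized by its time
    # average, so the result does not depend on the units used for the rates
    strip = np.diagonal(jp, offset=-lag_bins)   # JPSTH_AB(t, t - lag) for valid t
    nz = strip > 0
    x = strip[nz] / strip.mean()
    return float(np.sum(x * np.log2(x)) / strip.size)

For example, strip_information(jpsth(ra, rb), 0) gives the information conveyed by synchronous spikes of the two cells; comparing it with the single-cell values obtained from the PSTHs measures the synergy at zero lag.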

We would like to connect this approach with previous work that focused on how events reduce our uncertainty about the stimulus (de Ruyter van Steveninck & Bialek, 1988). Before we observe the neural response, all we know is that stimuli are chosen from a distribution P[s(t')]. When we observe an event E at time t_E, this should tell us something about the stimulus in the neighborhood of this time, and this knowledge is described by the conditional distribution P[s(t')|t_E]. If we go back to the definition of the mutual information between responses and stimuli, we can write the average information conveyed by one event in terms of this conditional distribution:

I(E; s) = \int Ds(t') \int dt_E\, P[s(t'), t_E]\, \log_2\!\left( \frac{P[s(t'), t_E]}{P[s(t')]\, P[t_E]} \right)

= \int dt_E\, P[t_E] \int Ds(t')\, P[s(t')|t_E]\, \log_2\!\left( \frac{P[s(t')|t_E]}{P[s(t')]} \right).   (A.9)

If the system is stationary, then the coding should be invariant under time translations:

P[s(t')|t_E] = P[s(t' + \Delta t')|t_E + \Delta t'].   (A.10)

This invariance means that the integral over stimuli in equation A.9 is independent of the event arrival time t_E, so we can simplify our expression for the information carried by a single event:

I(E; s) = \int dt_E\, P[t_E] \int Ds(t')\, P[s(t')|t_E]\, \log_2\!\left( \frac{P[s(t')|t_E]}{P[s(t')]} \right)

= \int Ds(t')\, P[s(t')|t_E]\, \log_2\!\left( \frac{P[s(t')|t_E]}{P[s(t')]} \right).   (A.11)

This formula was used by de Ruyter van Steveninck and Bialek (1988). To connect with our work here, we express the information in equation A.11 as an average over the stimulus:

I(E; s) = \left\langle \frac{P[s(t')|t_E]}{P[s(t')]}\, \log_2\!\left( \frac{P[s(t')|t_E]}{P[s(t')]} \right) \right\rangle_s.   (A.12)

Using Bayes' rule,

\frac{P[s(t')|t_E]}{P[s(t')]} = \frac{P[t_E|s(t')]}{P[t_E]} = \frac{r_E(t_E)}{\bar{r}_E},   (A.13)


where the last term is a result of the distributions of event arrival times being proportional to the event rates, as defined above. Substituting this back into equation A.11, one finds equation 2.6.
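
Carrying out that substitution explicitly (our restatement; we assume equation 2.6 has this time-averaged form, which is what the algebra here yields, since P[t_E] = \bar{r}_E / (\bar{r}_E T) per unit time over the duration T):

I(E; s) = \frac{1}{T} \int_0^T dt\, \frac{r_E(t)}{\bar{r}_E}\, \log_2\!\left[ \frac{r_E(t)}{\bar{r}_E} \right].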

Appendix B Experimental Setup

In the experiment we used a female blowfly, which was a first-generation offspring of a wild fly caught outside. The fly was put inside a plastic tube and immobilized with wax, with the head protruding out. The proboscis was left free so that the fly could be fed regularly with some sugar water. A small hole was cut in the back of the head, close to the midline on the right side. Through this hole, a tungsten electrode was advanced into the lobula plate. This area, which is several layers back from the compound eye, includes a group of large motion-detector neurons with wide receptive fields and strong direction selectivity. We recorded spikes extracellularly from one of these, the contralateral H1 neuron (Franceschini, Riehle, & le Nestour, 1989; Hausen & Egelhaaf, 1989). The electrode was positioned such that spikes from H1 could be discriminated reliably and converted into pulses by a simple threshold discriminator. The pulses were fed into a CED 1401 interface (Cambridge Electronic Design), which digitized the spikes at 10 μs resolution. To keep exact synchrony over the duration of the experiment, the spike timing clock was derived from the same internal CED 1401 clock that defined the frame times of the visual stimulus.

The stimulus was a rigidly moving bar pattern, displayed on a Tektronix 608 high-brightness display. The radiance at average intensity Ī was about 180 mW/(m² sr), which amounts to about 5 × 10⁴ effectively transduced photons per photoreceptor per second (Dubs, Laughlin, & Srinivasan, 1984). The bars were oriented vertically, with intensities chosen at random to be Ī(1 ± C), where C is the contrast. The distance between the fly and the screen was adjusted so that the angular subtense of a bar equaled the horizontal interommatidial angle in the stimulated part of the compound eye. This setting was found by determining the eye's spatial Nyquist frequency through the reverse reaction (Gotz, 1964) in the response of the H1 neuron. For this fly, the horizontal interommatidial angle was 1.45 degrees, and the distance to the screen 105 mm. The fly viewed the display through a round 80 mm diameter diaphragm, showing approximately 30 bars. From this we estimate the number of stimulated ommatidia in the eye's hexagonal raster to be about 612.

Frames of the stimulus pattern were refreshed every 2 ms, and with each new frame the pattern was displayed at a new position. This resulted in an apparent horizontal motion of the bar pattern, which is suitable to excite the H1 neuron. The pattern position was defined by a pseudorandom sequence simulating independent random displacements at each time step, uniformly distributed between −0.47 degrees and +0.47 degrees (equivalent to −0.32 to +0.32 omm, horizontal ommatidial spacings). This corresponds to a diffusion constant of 18.1 (°)² per second, or 8.6 omm² per second. The sequence of pseudorandom numbers contained a repeating part and a nonrepeating part, each 10 seconds long, with the same statistical parameters. Thus in each 20 second cycle the fly saw a 10 second movie that it had seen 20 seconds before, followed by a 10 second movie that was generated independently.
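
As a consistency check on these numbers (our arithmetic, not from the original text): a displacement drawn uniformly from [−0.47°, +0.47°] every Δt = 2 ms has variance (0.94°)²/12 ≈ 0.074 (°)², so the diffusion constant is

D = \frac{\langle \delta\theta^2 \rangle}{2\,\Delta t} \approx \frac{0.074\ (^\circ)^2}{0.004\ \text{s}} \approx 18\ (^\circ)^2/\text{s},

and the same calculation with steps of ±0.32 omm gives roughly 8.5 omm² per second, consistent with the values quoted above.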

Acknowledgments

We thank G. Lewen and A. Schweitzer for their help with the experiments and N. Tishby for many helpful discussions. Work at the IAS was supported in part by DOE grant DE-FG02-90ER40542, and work at the IFSC was supported by the Brazilian agencies FAPESP and CNPq.

References

Abeles, M., Bergmann, H., Margalit, E., & Vaadia, E. (1993). Spatiotemporal firing patterns in the frontal cortex of behaving monkeys. J. Neurophysiol., 70, 1629–1638.

Adrian, E. (1928). The basis of sensation. London: Christophers.

Aertsen, A., Gerstein, G., Habib, M., & Palm, G. (1989). Dynamics of neuronal firing correlation: Modulation of "effective connectivity." J. Neurophysiol., 61(5), 900–917.

Bialek, W. (1990). Theoretical physics meets experimental neurobiology. In E. Jen (Ed.), 1989 Lectures in Complex Systems (pp. 513–595). Menlo Park, CA: Addison-Wesley.

de Ruyter van Steveninck, R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Coding and information transfer in short spike sequences. Proc. R. Soc. Lond. Ser. B, 234, 379.

de Ruyter van Steveninck, R. R., Lewen, G., Strong, S., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805–1808.

DeWeese, M. (1996). Optimization principles for the neural code. Network, 7, 325–331.

DeWeese, M., & Meister, M. (1998). How to measure information gained from one symbol. APS Abstracts.

Dubs, A., Laughlin, S., & Srinivasan, M. (1984). Single photon signals in fly photoreceptors and first order interneurones at behavioural threshold. J. Physiol., 317, 317–334.

Franceschini, N., Riehle, A., & le Nestour, A. (1989). Directionally selective motion detection by insect neurons. In D. Stavenga & R. Hardie (Eds.), Facets of Vision (p. 360). Berlin: Springer-Verlag.

Gotz, K. (1964). Optomotorische Untersuchung des visuellen Systems einiger Augenmutanten der Fruchtfliege Drosophila. Kybernetik, 2, 77–92.

Hausen, K., & Egelhaaf, M. (1989). Neural mechanisms of visual course control in insects. In D. Stavenga & R. Hardie (Eds.), Facets of Vision (p. 391). Berlin: Springer-Verlag.

Hopfield, J. J. (1995). Pattern recognition computation using action potential timing for stimulus representation. Nature, 376, 33–36.

Land, M. F., & Collett, T. S. (1974). Chasing behavior of houseflies (Fannia canicularis): A description and analysis. J. Comp. Physiol., 89, 331–357.

MacKay, D., & McCulloch, W. S. (1952). The limiting information capacity of a neuronal link. Bull. Math. Biophys., 14, 127–135.

Magelby, K. (1987). Short-term synaptic plasticity. In G. M. Edelman, V. E. Gall, & K. M. Cowan (Eds.), Synaptic function (pp. 21–56). New York: Wiley.

Meister, M. (1995). Multineuronal codes in retinal signaling. Proc. Nat. Acad. Sci. (USA), 93, 609–614.

Optican, L. M., & Richmond, B. J. (1987). Temporal encoding of two-dimensional patterns by single units in primate inferior temporal cortex. III. Information theoretic analysis. J. Neurophys., 57, 162–178.

Panzeri, S., Biella, G., Rolls, E. T., Skaggs, W. E., & Treves, A. (1996). Speed, noise, information and the graded nature of neuronal responses. Network, 7, 365–370.

Rieke, F., Warland, D., de Ruyter van Steveninck, R., & Bialek, W. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.

Shannon, C. E. (1948). A mathematical theory of communication. Bell Sys. Tech. J., 27, 379–423, 623–656.

Singer, W., & Gray, C. M. (1995). Visual feature integration and the temporal correlation hypothesis. Ann. Rev. Neurosci., 555–586.

Skaggs, W. E., McNaughton, B. L., Gothard, K. M., & Markus, E. J. (1993). An information-theoretic approach to deciphering the hippocampal code. In S. J. Hanson, J. D. Cowan, & C. L. Giles (Eds.), Advances in neural information processing, 5 (pp. 1030–1037). San Mateo, CA: Morgan Kaufmann.

Strong, S. P., Koberle, R., de Ruyter van Steveninck, R. R., & Bialek, W. (1998). Entropy and information in neural spike trains. Phys. Rev. Lett., 80, 197–200.

Usrey, W. M., Reppas, J. B., & Reid, R. C. (1998). Paired-spike interactions and synaptic efficiency of retinal inputs to the thalamus. Nature, 395, 384.

Vaadia, E., Haalman, I., Abeles, M., Bergman, H., Prut, Y., Slovin, H., & Aertsen, A. (1995). Dynamics of neural interactions in monkey cortex in relation to behavioural events. Nature, 373, 515–518.

Received February 22, 1999; accepted May 20, 1999.

Page 14: Synergy in a Neural Code - Princeton Universitywbialek/our_papers/brenner+al_00b.pdfSynergy in a Neural Code 1533 of measuring the information (in bits) carried by particular patterns

1544 Brenner Strong Koberle Bialek and de Ruyter van Steveninck

the average of this term over t is the mutual information between events(the second term in equation 32) The second term is related to the localcorrelation function and measures the extent to which the stimulus mod-ulates the likelihood of spike pairs The average of this term over t givesthe mutual information conditional on knowledge of the stimulus (the rstterm in equation 32) The average of the third term over t is zero and nu-merical evaluation of this term from the data shows that it is negligible atmost values of t

We thus nd that the synergy between spikes is approximately a sumof two terms whose averages over t are the terms in equation 32 A spikepair with a separation t then has two types of contributions to the extrainformation it carries the two spikes can be correlated conditional on thestimulus or the pair could be a rare and thus surprising event The rarityof brief pairs is related to neural refractoriness but this effect alone is in-sufcient to enhance information transmission the rare events must alsobe related reliably to the stimulus In fact conditional on the stimulus thespikes in rare pairs are strongly correlated with each other and this is visiblein Figure 1a from trial to trial adjacent spikes jitter together as if connectedby a stiff spring To quantify this effect we nd for each spike in one trialthe closest spike in successive trials and measure the variance of the arrivaltimes of these spikes Similarly we measure the variance of the interspiketimes Figure 4a shows the ratio of the interspike time variance to the sum ofthe arrival time variances of the spikes that make up the pair For large sep-arations this ratio is unity as expected if spikes are locked independentlyto the stimulus but as the two spikes come closer it falls below one-quarter

Both the conditional correlation among the members of the pair (see Fig-ure 4a) and the relative synergy (see Figure 4b) depend strongly on the inter-spike separation This dependence is nearly invariant to changes in imagecontrast although the spike rate and other statistical properties are stronglyaffected by such changes Brief spike pairs seem to retain their identity asspecially informative symbols over a range of input ensembles If particulartemporal patterns are especially informative then we would lose informa-tion if we failed to distinguish among different patterns Thus there are twonotions of time resolution for spike pairs the time resolution with which theinterspike time is dened and the absolute time resolution with which theevent is marked Figure 5 shows that for small interspike times the infor-mation is much more sensitive to changes in the interspike time resolution(open symbols) than to the absolute time resolution (lled symbols) This isrelated to the slope in Figure 2 in regions where the slope is large eventsshould be nely distinguished in order to retain the information

33 Implications of Synergy The importance of spike timing in the neu-ral code has been under debate for some time now We believe that someissues in this debate can be claried using a direct information-theoretic ap-proach Following MacKay and McCulloch (1952) we know that marking

Synergy in a Neural Code 1545

Figure 4 (a) Ratio between the variance of interspike time and the sum of vari-ances of the two spike times Variances are measured across repeated presenta-tions of same stimulus as explained in the text This ratio is plotted as a func-tion of the interspike time t for two experiments with different image contrast(b) Extra information conveyed cooperatively by pairs of spikes expressed as afraction of the information conveyed by the two spikes independently While thesingle-spike information varies with contrast (15 bits per spike for C D 01 com-pared to 13 bits per spike for C D 1) the fractional synergy is almost contrastindependent

spike arrival times with higher resolution provides an increased capacityfor information transmission The work of Strong et al (1998) shows thatfor the yrsquos H1 neuron the increased capacity associated with spike timingindeed is used with nearly constant efciency down to millisecond resolu-tion This efciency can be the result of a tight locking of individual spikesto a rapidly varying stimulus and it could also be the result of temporalpatterns providing information beyond rapid rate modulations The anal-

1546 Brenner Strong Koberle Bialek and de Ruyter van Steveninck

Figure 5 Information conveyed by spike pairs as a function of time resolutionAn eventmdashpair of spikesmdashcan be described by two times the separation be-tween spikes (relative time) and the occurrence time of the event with respectto the stimulus (absolute time) The information carried by the pair dependson the time resolution in these two dimensions both specied by the bin sizeD t along the abscissa Open symbols are measurements of the information fora xed 2 ms resolution of absolute time and a variable resolution D t of relativetime Closed symbols correspond to a xed 2 ms resolution of relative time anda variable resolution D t of absolute time For short intervals the sensitivity tocoarsening of the relative time resolution is much greater than to coarsening ofthe absolute time resolution In contrast sensitivity to relative and absolute timeresolutions is the same for the longer nonsynergistic interspike separations

ysis given here shows that for H1 pairs of spikes can provide much moreinformation than two individual spikes and information transmission ismuch more sensitive to the relative timing of spikes than to their absolutetiming The synergy is an inherent property of particular spike pairs as itpersists when averaged over all occurrences of the pair in the experiment

One may ask about the implication of synergy to behavior Flies canrespond to changes in a motion stimulus over a time scale of approximately30 to 40 ms (Land amp Collett 1974) We have found that the spike train of H1as a whole conveyed about 20 bits per second more than the informationconveyed independently by single spikes (see Section 31) This implies thatover a behavioral response time synergistic effects provide about 08 bitof additional information equivalent to almost a factor of two in resolvingpower for distinguishing different trajectories of motion across the visualeld

Synergy in a Neural Code 1547

4 Summary

Information theory allows us to measure synergy and redundancy amongspikes independent of the rules for translating between spikes and stimuliIn particular this approach tests directly the idea that patterns of spikesare special events in the code carrying more information than expected byadding the contributions from individual spikes It is of practical impor-tance that the formulas rely on low-order statistical measures of the neuralresponse and hence do not require enormous data sets to reach meaningfulconclusions The method is of general validity and is applicable to patternsof spikes across a population of neurons as well as across time Specicresults (existence of synergy or redundancy number of bits per spike etc)depend on the neural system as well as on the stimulus ensemble used ineach experiment

In our experiments on the y visual system we found that an eventcomposed of a pair of spikes can carry far more than the information carriedindependently by its parts Two spikes that occur in rapid succession appearto be special symbols that have an integrity beyond the locking of individualspikes to the stimulus This is analogous to the encoding of sounds in writtenEnglish each of the compound symbols th sh and ch stands for sounds thatare not decomposable into sounds represented by each of the constituentletters For spike pairs to act effectively as special symbols mechanismsfor ldquoreadingrdquo them must exist at subsequent levels of processing Synaptictransmission is sensitive to interspike times in the 2ndash20 ms range (Magelby1987) and it is natural to suggest that synaptic mechanisms on this timescaleplay a role in such reading Recent work on the mammalian visual system(Usrey Reppas amp Reid 1998) provides direct evidence that pairs of spikesclose together in time can be especially efcient in driving postsynapticneurons

Appendix A Relation to Previous Work

Patterns of spikes and their relation to sensory stimuli have been quantiedin the past using correlation functions The event rates that we have denedhere which are connected directly to the information carried by patternsof spikes through equation 25 are just properly normalized correlationfunctions The event rate for pairs of spikes from two separate neurons isrelated to the joint poststimulus time histogram (PSTH)dened by AertsenGerstein Habib and Palm (1989 Vaadia et al 1995) Making this connectionexplicit is also an opportunity to see how the present formalism applies toevents dened across two cells

Consider two cells A and B generating spikes at times ftAi g and ftB

i grespectively It will be useful to think of the spike trains as sums of unit

1548 Brenner Strong Koberle Bialek and de Ruyter van Steveninck

impulses at the spike times

rA(t) D

X

id(t iexcl tA

i ) (A1)

rB(t) D

X

id(t iexcl tB

i ) (A2)

Then the time-dependent spike rates for the two cells are

rA(t) D hr A(t)itrials (A3)

rB(t) D hr B(t)itrials (A4)

where hcent cent centitrials denotes an average over multiple trials in which the sametime-dependent stimulus s(t0 ) or the same motor behavior observed Thesespike rates are the probabilities per unit time for the occurrence of a singlespike in either cell A or cell B also called the PSTH We can dene theprobability per unit time for a spike in cell A to occur at time t and a spikein cell B to occur at time t0 and this will be the joint PSTH

JPSTHAB(t t0 ) D hr A(t)r B(t0 )itrials (A5)

Alternatively we can consider an event E dened by a spike in cell A attime t and a spike in cell B at time t iexcl t with the relative time t measuredwith a precision of D t Then the rate of these events is

rE(t) DZ D t 2

iexclD t 2dt0 JPSTHAB(t t iexcl t C t0 ) (A6)

frac14 D tJPSTHAB(t t iexcl t ) (A7)

where the last approximation is valid if our time resolution is sufcientlyhigh Applying our general formula for the information carried by singleevents equation 25 the information carried by pairs of spikes from two cellscan be written as an integral over diagonal ldquostripsrdquo of the JPSTH matrix

I(EIs) D1T

Z T

0dt

JPSTHAB(t t iexcl t )

hJPSTHAB(t t iexcl t )itlog2

JPSTHAB(t t iexcl t )

hJPSTHAB(t t iexcl t )it

(A8)

where hJPSTHAB(t tiexclt )it is an average of the JPSTH over time this averageis equivalent to the standard correlation function between the two spiketrains

The discussion by Vaadia et al (1995) emphasizes that modulations ofthe JPSTH along the diagonal strips allow correlated ring events to conveyinformation about sensory signals or behavioral states and this information

Synergy in a Neural Code 1549

is quantied by equation A8 The information carried by the individualcells is related to the corresponding integrals over spike rates equation 29The difference between the the information conveyed by the compoundspiking events E and the information conveyed by spikes in the two cellsindependently is precisely the synergy between the two cells at the giventime lag t For t D 0 it is the synergymdashor extra informationmdashconveyed bysynchronous ring of the two cells

We would like to connect this approach with previous work that focusedon how events reduce our uncertainty about the stimulus (de Ruyter vanSteveninck amp Bialek 1988) Before we observe the neural response all weknow is that stimuli are chosen from a distribution P[s(t0 )] When we ob-serve an event E at time tE this should tell us something about the stimulusin the neighborhood of this time and this knowledge is described by theconditional distribution P[s(t0 )|tE] If we go back to the denition of the mu-tual information between responses and stimuli we can write the averageinformation conveyed by one event in terms of this conditional distribution

I(EI s) DZ

Ds(t0 )Z

dtEP[s(t0 ) tE] log2

iexcl P[s(t0 ) tE]P[s(t0 )]P[tE]

cent

DZ

dtEP[tE]Z

Ds(t0 )P[s(t0 ) |tE] log2

iexclP[s(t0 ) |tE]P[s(t0 )]

cent (A9)

If the system is stationary then the coding should be invariant under timetranslations

P[s(t0 ) |tE] D P[s(t0 C D t0 ) |tE C D t0] (A10)

This invariance means that the integral over stimuli in equation A9 is inde-pendent of the event arrival time tE so we can simplify our expression forthe information carried by a single event

I(EI s) DZ

dtEP[tE]Z

Ds(t0 )P[s(t0 ) |tE] log2

iexclP[s(t0 ) |tE]P[s(t0 )]

cent

DZ

Ds(t0 )P[s(t0 ) |tE] log2

iexclP[s(t0 ) |tE]P[s(t0 )]

cent (A11)

This formula was used by de Ruyter van Steveninck and Bialek (1988) Toconnect with our work here we express the information in equation A11as an average over the stimulus

I(EI s) Dfrac12 iexclP[s(t0) |tE]

P[s(t0 )]

centlog2

iexclP[s(t0 )|tE]P[s(t0 )]

centfrac34

s (A12)

Using Bayesrsquo rule

P[s(t0 ) |tE]P[s(t0 )]

DP[tE |s(t0 )]

P[tE]D

rE(tE)

NrE (A13)

1550 Brenner Strong Koberle Bialek and de Ruyter van Steveninck

where the last term is a result of the distributions of event arrival times beingproportional to the event rates as dened above Substituting this back toequation A11 one nds equation 26

Appendix B Experimental Setup

In the experiment we used a female blowy which was a rst-generationoffspring of a wild y caught outside The y was put inside a plastic tubeand immobilized with wax with the head protruding out The probosciswas left free so that the y could be fed regularly with some sugar waterA small hole was cut in the back of the head close to the midline on theright side Through this hole a tungsten electrode was advanced into thelobula plate This area which is several layers back from the compound eyeincludes a group of largemotion-detector neurons with wide receptive eldsand strong direction selectivity We recorded spikes extracellularly from oneof these the contralateral H1 neuron Franceschini Riehle amp le Nestour1989 Hausen amp Egelhaaf 1989) The electrode was positioned such thatspikes from H1 could be discriminated reliably and converted into pulsesby a simple threshold discriminator The pulses were fed into a CED 1401interface (Cambridge Electronic Design) which digitized the spikes at 10 m sresolution To keep exact synchrony over the duration of the experimentthe spike timing clock was derived from the same internal CED 1401 clockthat dened the frame times of the visual stimulus

The stimulus was a rigidly moving bar pattern displayed on a Tektronix608 high-brightness display The radiance at average intensity NI was about180 mW(m2 pound sr) which amounts to about 5 pound104 effectively transducedphotons per photoreceptor per second (Dubs Laughlin amp Srinivasan 1984)The bars were oriented vertically with intensities chosen at random to beNI(1 sect C) where C is the contrast The distance between the y and the screenwas adjusted so that the angular subtense of a bar equaled the horizontalinterommatidial angle in the stimulated part of the compound eye This set-ting was found by determining the eyersquos spatial Nyquist frequency throughthe reverse reaction (Gotz 1964) in the response of the H1 neuron For thisy the horizontal interommatidial angle was 145 degrees and the distanceto the screen 105 mm The y viewed the display through a round 80 mmdiameter diaphragm showing approximately 30 bars From this we esti-mate the number of stimulated ommatidia in the eyersquos hexagonal raster tobe about 612

Frames of the stimulus pattern were refreshed every 2 ms and with eachnew frame the pattern was displayed at a new position This resulted in anapparent horizontal motion of the bar pattern which is suitable to excite theH1 neuron The pattern position was dened by a pseudorandom sequencesimulating independent random displacements at each time step uniformlydistributed between iexcl047 degrees and C047 degrees (equivalent to iexcl032to C032 omm horizontal ommatidial spacings) This corresponds to a dif-

Synergy in a Neural Code 1551

fusion constant of 181(plusmn)2 per second or 86 omm2 per second The sequenceof pseudorandom numbers contained a repeating part and a nonrepeatingpart each 10 seconds long with the same statistical parameters Thus ineach 20 second cycle the y saw a 10 second movie that it had seen 20seconds before followed by a 10 second movie that was generated inde-pendently

Acknowledgments

We thank G Lewen and A Schweitzer for their help with the experimentsand N Tishby for many helpful discussions Work at the IAS was supportedin part by DOE grant DEndashFG02ndash90ER40542 and work at the IFSC was sup-ported by the Brazilian agencies FAPESP and CNPq

References

Abeles M Bergmann H Margalit E amp Vaadia E (1993) Spatiotemporalring patterns in the frontal cortex of behaving monkeys J Neurophysiol 701629ndash1638

Adrian E (1928) The basis of sensation London ChristophersAertsen A Gerstein G HabibM amp Palm G (1989)Dynamics of neuronal r-

ing correlation modulation of ldquoeffective connectivityrdquo J Neurophysiol 61(5)900ndash917

Bialek W (1990) Theoretical physics meets experimental neurobiology In EJen (Ed) 1989 Lectures in Complex Systems (pp 513ndash595) Menlo Park CAAddison-Wesley

de Ruyter van Steveninck R amp Bialek W (1988) Real-time performance ofa movement-sensitive neuron in the blowy visual system coding and in-formation transfer in short spike sequences Proc R Soc Lond Ser B 234379

de Ruyter van Steveninck R R Lewen G Strong S Koberle R amp BialekW (1997)Reproducibility and variability in neural spike trains Science 275275

DeWeese M (1996) Optimization principles for the neural code Network 7325ndash331

DeWeese M amp Meister M (1998) How to measure information gained fromone symbol APS Abstracts

Dubs A Laughlin S amp Srinivasan M (1984)Single photon signals in y pho-toreceptors and rst order interneurones at behavioural threshold J Physiol317 317ndash334

Franceschini N Riehle A amp le Nestour A (1989) Directionally selective mo-tion detection by insect neurons In D Stavenga amp R Hardie (Eds) Facets ofVision page 360 Berlin Springer-Verlag

Gotz K (1964) Optomotorische Untersuchung des Visuellen Systems einigerAugenmutanten der Fruchtiege Drosophila Kybernetik 2 77ndash92

1552 Brenner Strong Koberle Bialek and de Ruyter van Steveninck

Hausen K amp Egelhaaf M (1989)Neural mechanisms of visual course control ininsects In D Stavenga amp R Hardie (Eds)Facets of Vision page 391 Springer-Verlag

Hopeld J J (1995)Pattern recognitin computation using action potential tim-ing for stimulus representation Nature 376 33ndash36

Land M F amp Collett T S (1974)Chasing behavior of houseies (Fannia canic-ulaus) A description and analysis J Comp Physiol 89 331ndash357

MacKay D amp McCulloch W S (1952) The limiting information capacity of aneuronal link Bull Math Biophys 14 127ndash135

Magelby K (1987) Short-term synaptic plasticity In G M Edelman V E Gallamp K M Cowan (Eds) Synaptic function (pp 21ndash56) New York Wiley

Meister M (1995)Multineuronal codes in retinal signaling Proc Nat Acad Sci(USA) 93 609ndash614

Optican L M amp Richmond B J (1987)Temporal encodingof two-dimensionalpatterns by single units in primate inferior temporal cortex III Informationtheoretic analysis J Neurophys 57 162ndash178

Panzeri S Biella G Rolls E T Skaggs W E amp Treves A (1996) Speednoise information and the graded nature of neuronal responses Network 7365ndash370

Rieke F Warland D de Ruyter van Steveninck R amp Bialek W (1997)SpikesExploring the neural code Cambridge MA MIT Press

Shannon C E (1948) A mathematical theory of communication Bell Sys TechJ 27 379ndash423 623ndash656

Singer W amp Gray C M (1995) Visual feature integration and the temporalcorrelation hypothesis Ann Rev Neurosci 555ndash586

Skaggs W E McNaughton B L Gothard K M amp Markus E J (1993) Aninformation-theoretic approach to deciphering the hippocampal code InS J Hanson J D Cowan amp C L Giles (Eds) Advances in neural informa-tion processing 5 (pp 1030ndash1037) San Mateo CA Morgan Kaufmann

Strong S P Koberle R de Ruyter van Steveninck R R amp Bialek W (1998)Entropy and information in neural spike trains Phys Rev Lett 80 197ndash200

Usrey W M Reppas J B amp Reid R C (1998) Paired-spike interactions andsynaptic efciency of retinal inputs to the thalamus Nature 395 384

Vaadia E Haalman I Abeles M Bergman H Prut Y Slovin H amp AertsenA (1995) Dynamics of neural interactions in monkey cortex in relation tobehavioural events Nature 373 515ndash518

Received February 22 1999 accepted May 20 1999

Page 15: Synergy in a Neural Code - Princeton Universitywbialek/our_papers/brenner+al_00b.pdfSynergy in a Neural Code 1533 of measuring the information (in bits) carried by particular patterns

Synergy in a Neural Code 1545

Figure 4 (a) Ratio between the variance of interspike time and the sum of vari-ances of the two spike times Variances are measured across repeated presenta-tions of same stimulus as explained in the text This ratio is plotted as a func-tion of the interspike time t for two experiments with different image contrast(b) Extra information conveyed cooperatively by pairs of spikes expressed as afraction of the information conveyed by the two spikes independently While thesingle-spike information varies with contrast (15 bits per spike for C D 01 com-pared to 13 bits per spike for C D 1) the fractional synergy is almost contrastindependent

spike arrival times with higher resolution provides an increased capacityfor information transmission The work of Strong et al (1998) shows thatfor the yrsquos H1 neuron the increased capacity associated with spike timingindeed is used with nearly constant efciency down to millisecond resolu-tion This efciency can be the result of a tight locking of individual spikesto a rapidly varying stimulus and it could also be the result of temporalpatterns providing information beyond rapid rate modulations The anal-

1546 Brenner Strong Koberle Bialek and de Ruyter van Steveninck

Figure 5 Information conveyed by spike pairs as a function of time resolutionAn eventmdashpair of spikesmdashcan be described by two times the separation be-tween spikes (relative time) and the occurrence time of the event with respectto the stimulus (absolute time) The information carried by the pair dependson the time resolution in these two dimensions both specied by the bin sizeD t along the abscissa Open symbols are measurements of the information fora xed 2 ms resolution of absolute time and a variable resolution D t of relativetime Closed symbols correspond to a xed 2 ms resolution of relative time anda variable resolution D t of absolute time For short intervals the sensitivity tocoarsening of the relative time resolution is much greater than to coarsening ofthe absolute time resolution In contrast sensitivity to relative and absolute timeresolutions is the same for the longer nonsynergistic interspike separations

ysis given here shows that for H1 pairs of spikes can provide much moreinformation than two individual spikes and information transmission ismuch more sensitive to the relative timing of spikes than to their absolutetiming The synergy is an inherent property of particular spike pairs as itpersists when averaged over all occurrences of the pair in the experiment

One may ask about the implication of synergy to behavior Flies canrespond to changes in a motion stimulus over a time scale of approximately30 to 40 ms (Land amp Collett 1974) We have found that the spike train of H1as a whole conveyed about 20 bits per second more than the informationconveyed independently by single spikes (see Section 31) This implies thatover a behavioral response time synergistic effects provide about 08 bitof additional information equivalent to almost a factor of two in resolvingpower for distinguishing different trajectories of motion across the visualeld

Synergy in a Neural Code 1547

4 Summary

Information theory allows us to measure synergy and redundancy amongspikes independent of the rules for translating between spikes and stimuliIn particular this approach tests directly the idea that patterns of spikesare special events in the code carrying more information than expected byadding the contributions from individual spikes It is of practical impor-tance that the formulas rely on low-order statistical measures of the neuralresponse and hence do not require enormous data sets to reach meaningfulconclusions The method is of general validity and is applicable to patternsof spikes across a population of neurons as well as across time Specicresults (existence of synergy or redundancy number of bits per spike etc)depend on the neural system as well as on the stimulus ensemble used ineach experiment

In our experiments on the y visual system we found that an eventcomposed of a pair of spikes can carry far more than the information carriedindependently by its parts Two spikes that occur in rapid succession appearto be special symbols that have an integrity beyond the locking of individualspikes to the stimulus This is analogous to the encoding of sounds in writtenEnglish each of the compound symbols th sh and ch stands for sounds thatare not decomposable into sounds represented by each of the constituentletters For spike pairs to act effectively as special symbols mechanismsfor ldquoreadingrdquo them must exist at subsequent levels of processing Synaptictransmission is sensitive to interspike times in the 2ndash20 ms range (Magelby1987) and it is natural to suggest that synaptic mechanisms on this timescaleplay a role in such reading Recent work on the mammalian visual system(Usrey Reppas amp Reid 1998) provides direct evidence that pairs of spikesclose together in time can be especially efcient in driving postsynapticneurons

Appendix A Relation to Previous Work

Patterns of spikes and their relation to sensory stimuli have been quantiedin the past using correlation functions The event rates that we have denedhere which are connected directly to the information carried by patternsof spikes through equation 25 are just properly normalized correlationfunctions The event rate for pairs of spikes from two separate neurons isrelated to the joint poststimulus time histogram (PSTH)dened by AertsenGerstein Habib and Palm (1989 Vaadia et al 1995) Making this connectionexplicit is also an opportunity to see how the present formalism applies toevents dened across two cells

Consider two cells A and B generating spikes at times ftAi g and ftB

i grespectively It will be useful to think of the spike trains as sums of unit

1548 Brenner Strong Koberle Bialek and de Ruyter van Steveninck

impulses at the spike times

rA(t) D

X

id(t iexcl tA

i ) (A1)

rB(t) D

X

id(t iexcl tB

i ) (A2)

Then the time-dependent spike rates for the two cells are

rA(t) D hr A(t)itrials (A3)

rB(t) D hr B(t)itrials (A4)

where hcent cent centitrials denotes an average over multiple trials in which the sametime-dependent stimulus s(t0 ) or the same motor behavior observed Thesespike rates are the probabilities per unit time for the occurrence of a singlespike in either cell A or cell B also called the PSTH We can dene theprobability per unit time for a spike in cell A to occur at time t and a spikein cell B to occur at time t0 and this will be the joint PSTH

JPSTHAB(t t0 ) D hr A(t)r B(t0 )itrials (A5)

Alternatively we can consider an event E dened by a spike in cell A attime t and a spike in cell B at time t iexcl t with the relative time t measuredwith a precision of D t Then the rate of these events is

rE(t) DZ D t 2

iexclD t 2dt0 JPSTHAB(t t iexcl t C t0 ) (A6)

frac14 D tJPSTHAB(t t iexcl t ) (A7)

where the last approximation is valid if our time resolution is sufcientlyhigh Applying our general formula for the information carried by singleevents equation 25 the information carried by pairs of spikes from two cellscan be written as an integral over diagonal ldquostripsrdquo of the JPSTH matrix

I(EIs) D1T

Z T

0dt

JPSTHAB(t t iexcl t )

hJPSTHAB(t t iexcl t )itlog2

JPSTHAB(t t iexcl t )

hJPSTHAB(t t iexcl t )it

(A8)

where hJPSTHAB(t tiexclt )it is an average of the JPSTH over time this averageis equivalent to the standard correlation function between the two spiketrains

The discussion by Vaadia et al (1995) emphasizes that modulations ofthe JPSTH along the diagonal strips allow correlated ring events to conveyinformation about sensory signals or behavioral states and this information

Synergy in a Neural Code 1549

is quantied by equation A8 The information carried by the individualcells is related to the corresponding integrals over spike rates equation 29The difference between the the information conveyed by the compoundspiking events E and the information conveyed by spikes in the two cellsindependently is precisely the synergy between the two cells at the giventime lag t For t D 0 it is the synergymdashor extra informationmdashconveyed bysynchronous ring of the two cells

We would like to connect this approach with previous work that focusedon how events reduce our uncertainty about the stimulus (de Ruyter vanSteveninck amp Bialek 1988) Before we observe the neural response all weknow is that stimuli are chosen from a distribution P[s(t0 )] When we ob-serve an event E at time tE this should tell us something about the stimulusin the neighborhood of this time and this knowledge is described by theconditional distribution P[s(t0 )|tE] If we go back to the denition of the mu-tual information between responses and stimuli we can write the averageinformation conveyed by one event in terms of this conditional distribution

I(EI s) DZ

Ds(t0 )Z

dtEP[s(t0 ) tE] log2

iexcl P[s(t0 ) tE]P[s(t0 )]P[tE]

cent

DZ

dtEP[tE]Z

Ds(t0 )P[s(t0 ) |tE] log2

iexclP[s(t0 ) |tE]P[s(t0 )]

cent (A9)

If the system is stationary then the coding should be invariant under timetranslations

P[s(t0 ) |tE] D P[s(t0 C D t0 ) |tE C D t0] (A10)

This invariance means that the integral over stimuli in equation A9 is inde-pendent of the event arrival time tE so we can simplify our expression forthe information carried by a single event

I(EI s) DZ

dtEP[tE]Z

Ds(t0 )P[s(t0 ) |tE] log2

iexclP[s(t0 ) |tE]P[s(t0 )]

cent

DZ

Ds(t0 )P[s(t0 ) |tE] log2

iexclP[s(t0 ) |tE]P[s(t0 )]

cent (A11)

This formula was used by de Ruyter van Steveninck and Bialek (1988) Toconnect with our work here we express the information in equation A11as an average over the stimulus

I(EI s) Dfrac12 iexclP[s(t0) |tE]

P[s(t0 )]

centlog2

iexclP[s(t0 )|tE]P[s(t0 )]

centfrac34

s (A12)

Using Bayesrsquo rule

P[s(t0 ) |tE]P[s(t0 )]

DP[tE |s(t0 )]

P[tE]D

rE(tE)

NrE (A13)

1550 Brenner Strong Koberle Bialek and de Ruyter van Steveninck

where the last term is a result of the distributions of event arrival times beingproportional to the event rates as dened above Substituting this back toequation A11 one nds equation 26

Appendix B Experimental Setup

In the experiment we used a female blowy which was a rst-generationoffspring of a wild y caught outside The y was put inside a plastic tubeand immobilized with wax with the head protruding out The probosciswas left free so that the y could be fed regularly with some sugar waterA small hole was cut in the back of the head close to the midline on theright side Through this hole a tungsten electrode was advanced into thelobula plate This area which is several layers back from the compound eyeincludes a group of largemotion-detector neurons with wide receptive eldsand strong direction selectivity We recorded spikes extracellularly from oneof these the contralateral H1 neuron Franceschini Riehle amp le Nestour1989 Hausen amp Egelhaaf 1989) The electrode was positioned such thatspikes from H1 could be discriminated reliably and converted into pulsesby a simple threshold discriminator The pulses were fed into a CED 1401interface (Cambridge Electronic Design) which digitized the spikes at 10 m sresolution To keep exact synchrony over the duration of the experimentthe spike timing clock was derived from the same internal CED 1401 clockthat dened the frame times of the visual stimulus

The stimulus was a rigidly moving bar pattern, displayed on a Tektronix 608 high-brightness display. The radiance at average intensity Ī was about 180 mW/(m² · sr), which amounts to about 5 × 10⁴ effectively transduced photons per photoreceptor per second (Dubs, Laughlin, & Srinivasan, 1984). The bars were oriented vertically, with intensities chosen at random to be Ī(1 ± C), where C is the contrast. The distance between the fly and the screen was adjusted so that the angular subtense of a bar equaled the horizontal interommatidial angle in the stimulated part of the compound eye. This setting was found by determining the eye's spatial Nyquist frequency through the reverse reaction (Götz, 1964) in the response of the H1 neuron. For this fly, the horizontal interommatidial angle was 1.45 degrees, and the distance to the screen 105 mm. The fly viewed the display through a round 80 mm diameter diaphragm, showing approximately 30 bars. From this we estimate the number of stimulated ommatidia in the eye's hexagonal raster to be about 612.
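As a rough check of this geometry (a back-of-the-envelope sketch under a flat-screen approximation, using only the distance, interommatidial angle, and aperture quoted above):

import numpy as np

distance_mm = 105.0       # fly-to-screen distance
bar_subtense_deg = 1.45   # set equal to the horizontal interommatidial angle
aperture_mm = 80.0        # diameter of the circular diaphragm

bar_width_mm = distance_mm * np.tan(np.radians(bar_subtense_deg))
print(f"bar width on screen ~ {bar_width_mm:.2f} mm")                      # ~2.7 mm
print(f"bars visible in the aperture ~ {aperture_mm / bar_width_mm:.0f}")  # ~30, as stated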

Frames of the stimulus pattern were refreshed every 2 ms, and with each new frame the pattern was displayed at a new position. This resulted in an apparent horizontal motion of the bar pattern, which is suitable to excite the H1 neuron. The pattern position was defined by a pseudorandom sequence simulating independent random displacements at each time step, uniformly distributed between −0.47 degrees and +0.47 degrees (equivalent to −0.32 to +0.32 horizontal ommatidial spacings). This corresponds to a diffusion constant of 18.1 (°)² per second, or 8.6 omm² per second. The sequence of pseudorandom numbers contained a repeating part and a nonrepeating part, each 10 seconds long, with the same statistical parameters. Thus in each 20 second cycle the fly saw a 10 second movie that it had seen 20 seconds before, followed by a 10 second movie that was generated independently.
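A minimal sketch of the stimulus statistics (an illustration, not the pseudorandom generator actually used): independent displacements drawn uniformly from ±0.47 degrees every 2 ms reproduce a diffusion constant close to the value quoted above, with the convention <x²> = 2Dt.

import numpy as np

rng = np.random.default_rng(0)

frame_dt = 2e-3      # s; a new pattern position every 2 ms
step_max = 0.47      # degrees; maximum displacement per frame
n_frames = 5000      # one 10 s segment

steps = rng.uniform(-step_max, step_max, size=n_frames)   # independent displacements
position = np.cumsum(steps)                               # apparent horizontal pattern position

D = steps.var() / (2 * frame_dt)                          # diffusion constant in deg^2 / s
print(f"D ~ {D:.1f} deg^2/s")                             # ~18 deg^2/s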

Acknowledgments

We thank G. Lewen and A. Schweitzer for their help with the experiments and N. Tishby for many helpful discussions. Work at the IAS was supported in part by DOE grant DE–FG02–90ER40542, and work at the IFSC was supported by the Brazilian agencies FAPESP and CNPq.

References

Abeles, M., Bergmann, H., Margalit, E., & Vaadia, E. (1993). Spatiotemporal firing patterns in the frontal cortex of behaving monkeys. J. Neurophysiol., 70, 1629–1638.
Adrian, E. (1928). The basis of sensation. London: Christophers.
Aertsen, A., Gerstein, G., Habib, M., & Palm, G. (1989). Dynamics of neuronal firing correlation: Modulation of "effective connectivity." J. Neurophysiol., 61(5), 900–917.
Bialek, W. (1990). Theoretical physics meets experimental neurobiology. In E. Jen (Ed.), 1989 Lectures in Complex Systems (pp. 513–595). Menlo Park, CA: Addison-Wesley.
de Ruyter van Steveninck, R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Coding and information transfer in short spike sequences. Proc. R. Soc. Lond. Ser. B, 234, 379.
de Ruyter van Steveninck, R. R., Lewen, G., Strong, S., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805.
DeWeese, M. (1996). Optimization principles for the neural code. Network, 7, 325–331.
DeWeese, M., & Meister, M. (1998). How to measure information gained from one symbol. APS Abstracts.
Dubs, A., Laughlin, S., & Srinivasan, M. (1984). Single photon signals in fly photoreceptors and first order interneurones at behavioural threshold. J. Physiol., 317, 317–334.
Franceschini, N., Riehle, A., & le Nestour, A. (1989). Directionally selective motion detection by insect neurons. In D. Stavenga & R. Hardie (Eds.), Facets of Vision (p. 360). Berlin: Springer-Verlag.
Götz, K. (1964). Optomotorische Untersuchung des visuellen Systems einiger Augenmutanten der Fruchtfliege Drosophila. Kybernetik, 2, 77–92.
Hausen, K., & Egelhaaf, M. (1989). Neural mechanisms of visual course control in insects. In D. Stavenga & R. Hardie (Eds.), Facets of Vision (p. 391). Berlin: Springer-Verlag.
Hopfield, J. J. (1995). Pattern recognition computation using action potential timing for stimulus representation. Nature, 376, 33–36.
Land, M. F., & Collett, T. S. (1974). Chasing behavior of houseflies (Fannia canicularis): A description and analysis. J. Comp. Physiol., 89, 331–357.
MacKay, D., & McCulloch, W. S. (1952). The limiting information capacity of a neuronal link. Bull. Math. Biophys., 14, 127–135.
Magelby, K. (1987). Short-term synaptic plasticity. In G. M. Edelman, V. E. Gall, & K. M. Cowan (Eds.), Synaptic function (pp. 21–56). New York: Wiley.
Meister, M. (1995). Multineuronal codes in retinal signaling. Proc. Nat. Acad. Sci. (USA), 93, 609–614.
Optican, L. M., & Richmond, B. J. (1987). Temporal encoding of two-dimensional patterns by single units in primate inferior temporal cortex. III. Information theoretic analysis. J. Neurophysiol., 57, 162–178.
Panzeri, S., Biella, G., Rolls, E. T., Skaggs, W. E., & Treves, A. (1996). Speed, noise, information and the graded nature of neuronal responses. Network, 7, 365–370.
Rieke, F., Warland, D., de Ruyter van Steveninck, R., & Bialek, W. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.
Shannon, C. E. (1948). A mathematical theory of communication. Bell Sys. Tech. J., 27, 379–423, 623–656.
Singer, W., & Gray, C. M. (1995). Visual feature integration and the temporal correlation hypothesis. Ann. Rev. Neurosci., 555–586.
Skaggs, W. E., McNaughton, B. L., Gothard, K. M., & Markus, E. J. (1993). An information-theoretic approach to deciphering the hippocampal code. In S. J. Hanson, J. D. Cowan, & C. L. Giles (Eds.), Advances in neural information processing 5 (pp. 1030–1037). San Mateo, CA: Morgan Kaufmann.
Strong, S. P., Koberle, R., de Ruyter van Steveninck, R. R., & Bialek, W. (1998). Entropy and information in neural spike trains. Phys. Rev. Lett., 80, 197–200.
Usrey, W. M., Reppas, J. B., & Reid, R. C. (1998). Paired-spike interactions and synaptic efficiency of retinal inputs to the thalamus. Nature, 395, 384.
Vaadia, E., Haalman, I., Abeles, M., Bergman, H., Prut, Y., Slovin, H., & Aertsen, A. (1995). Dynamics of neural interactions in monkey cortex in relation to behavioural events. Nature, 373, 515–518.

Received February 22, 1999; accepted May 20, 1999.


Synergy in a Neural Code 1549

is quantied by equation A8 The information carried by the individualcells is related to the corresponding integrals over spike rates equation 29The difference between the the information conveyed by the compoundspiking events E and the information conveyed by spikes in the two cellsindependently is precisely the synergy between the two cells at the giventime lag t For t D 0 it is the synergymdashor extra informationmdashconveyed bysynchronous ring of the two cells

We would like to connect this approach with previous work that focusedon how events reduce our uncertainty about the stimulus (de Ruyter vanSteveninck amp Bialek 1988) Before we observe the neural response all weknow is that stimuli are chosen from a distribution P[s(t0 )] When we ob-serve an event E at time tE this should tell us something about the stimulusin the neighborhood of this time and this knowledge is described by theconditional distribution P[s(t0 )|tE] If we go back to the denition of the mu-tual information between responses and stimuli we can write the averageinformation conveyed by one event in terms of this conditional distribution

I(EI s) DZ

Ds(t0 )Z

dtEP[s(t0 ) tE] log2

iexcl P[s(t0 ) tE]P[s(t0 )]P[tE]

cent

DZ

dtEP[tE]Z

Ds(t0 )P[s(t0 ) |tE] log2

iexclP[s(t0 ) |tE]P[s(t0 )]

cent (A9)

If the system is stationary then the coding should be invariant under timetranslations

P[s(t0 ) |tE] D P[s(t0 C D t0 ) |tE C D t0] (A10)

This invariance means that the integral over stimuli in equation A9 is inde-pendent of the event arrival time tE so we can simplify our expression forthe information carried by a single event

I(EI s) DZ

dtEP[tE]Z

Ds(t0 )P[s(t0 ) |tE] log2

iexclP[s(t0 ) |tE]P[s(t0 )]

cent

DZ

Ds(t0 )P[s(t0 ) |tE] log2

iexclP[s(t0 ) |tE]P[s(t0 )]

cent (A11)

This formula was used by de Ruyter van Steveninck and Bialek (1988) Toconnect with our work here we express the information in equation A11as an average over the stimulus

I(EI s) Dfrac12 iexclP[s(t0) |tE]

P[s(t0 )]

centlog2

iexclP[s(t0 )|tE]P[s(t0 )]

centfrac34

s (A12)

Using Bayesrsquo rule

P[s(t0 ) |tE]P[s(t0 )]

DP[tE |s(t0 )]

P[tE]D

rE(tE)

NrE (A13)

1550 Brenner Strong Koberle Bialek and de Ruyter van Steveninck

where the last term is a result of the distributions of event arrival times beingproportional to the event rates as dened above Substituting this back toequation A11 one nds equation 26

Appendix B Experimental Setup

In the experiment we used a female blowy which was a rst-generationoffspring of a wild y caught outside The y was put inside a plastic tubeand immobilized with wax with the head protruding out The probosciswas left free so that the y could be fed regularly with some sugar waterA small hole was cut in the back of the head close to the midline on theright side Through this hole a tungsten electrode was advanced into thelobula plate This area which is several layers back from the compound eyeincludes a group of largemotion-detector neurons with wide receptive eldsand strong direction selectivity We recorded spikes extracellularly from oneof these the contralateral H1 neuron Franceschini Riehle amp le Nestour1989 Hausen amp Egelhaaf 1989) The electrode was positioned such thatspikes from H1 could be discriminated reliably and converted into pulsesby a simple threshold discriminator The pulses were fed into a CED 1401interface (Cambridge Electronic Design) which digitized the spikes at 10 m sresolution To keep exact synchrony over the duration of the experimentthe spike timing clock was derived from the same internal CED 1401 clockthat dened the frame times of the visual stimulus

The stimulus was a rigidly moving bar pattern displayed on a Tektronix608 high-brightness display The radiance at average intensity NI was about180 mW(m2 pound sr) which amounts to about 5 pound104 effectively transducedphotons per photoreceptor per second (Dubs Laughlin amp Srinivasan 1984)The bars were oriented vertically with intensities chosen at random to beNI(1 sect C) where C is the contrast The distance between the y and the screenwas adjusted so that the angular subtense of a bar equaled the horizontalinterommatidial angle in the stimulated part of the compound eye This set-ting was found by determining the eyersquos spatial Nyquist frequency throughthe reverse reaction (Gotz 1964) in the response of the H1 neuron For thisy the horizontal interommatidial angle was 145 degrees and the distanceto the screen 105 mm The y viewed the display through a round 80 mmdiameter diaphragm showing approximately 30 bars From this we esti-mate the number of stimulated ommatidia in the eyersquos hexagonal raster tobe about 612

Frames of the stimulus pattern were refreshed every 2 ms and with eachnew frame the pattern was displayed at a new position This resulted in anapparent horizontal motion of the bar pattern which is suitable to excite theH1 neuron The pattern position was dened by a pseudorandom sequencesimulating independent random displacements at each time step uniformlydistributed between iexcl047 degrees and C047 degrees (equivalent to iexcl032to C032 omm horizontal ommatidial spacings) This corresponds to a dif-

Synergy in a Neural Code 1551

fusion constant of 181(plusmn)2 per second or 86 omm2 per second The sequenceof pseudorandom numbers contained a repeating part and a nonrepeatingpart each 10 seconds long with the same statistical parameters Thus ineach 20 second cycle the y saw a 10 second movie that it had seen 20seconds before followed by a 10 second movie that was generated inde-pendently

Acknowledgments

We thank G Lewen and A Schweitzer for their help with the experimentsand N Tishby for many helpful discussions Work at the IAS was supportedin part by DOE grant DEndashFG02ndash90ER40542 and work at the IFSC was sup-ported by the Brazilian agencies FAPESP and CNPq

References

Abeles M Bergmann H Margalit E amp Vaadia E (1993) Spatiotemporalring patterns in the frontal cortex of behaving monkeys J Neurophysiol 701629ndash1638

Adrian E (1928) The basis of sensation London ChristophersAertsen A Gerstein G HabibM amp Palm G (1989)Dynamics of neuronal r-

ing correlation modulation of ldquoeffective connectivityrdquo J Neurophysiol 61(5)900ndash917

Bialek W (1990) Theoretical physics meets experimental neurobiology In EJen (Ed) 1989 Lectures in Complex Systems (pp 513ndash595) Menlo Park CAAddison-Wesley

de Ruyter van Steveninck R amp Bialek W (1988) Real-time performance ofa movement-sensitive neuron in the blowy visual system coding and in-formation transfer in short spike sequences Proc R Soc Lond Ser B 234379

de Ruyter van Steveninck R R Lewen G Strong S Koberle R amp BialekW (1997)Reproducibility and variability in neural spike trains Science 275275

DeWeese M (1996) Optimization principles for the neural code Network 7325ndash331

DeWeese M amp Meister M (1998) How to measure information gained fromone symbol APS Abstracts

Dubs A Laughlin S amp Srinivasan M (1984)Single photon signals in y pho-toreceptors and rst order interneurones at behavioural threshold J Physiol317 317ndash334

Franceschini N Riehle A amp le Nestour A (1989) Directionally selective mo-tion detection by insect neurons In D Stavenga amp R Hardie (Eds) Facets ofVision page 360 Berlin Springer-Verlag

Gotz K (1964) Optomotorische Untersuchung des Visuellen Systems einigerAugenmutanten der Fruchtiege Drosophila Kybernetik 2 77ndash92

1552 Brenner Strong Koberle Bialek and de Ruyter van Steveninck

Hausen K amp Egelhaaf M (1989)Neural mechanisms of visual course control ininsects In D Stavenga amp R Hardie (Eds)Facets of Vision page 391 Springer-Verlag

Hopeld J J (1995)Pattern recognitin computation using action potential tim-ing for stimulus representation Nature 376 33ndash36

Land M F amp Collett T S (1974)Chasing behavior of houseies (Fannia canic-ulaus) A description and analysis J Comp Physiol 89 331ndash357

MacKay D amp McCulloch W S (1952) The limiting information capacity of aneuronal link Bull Math Biophys 14 127ndash135

Magelby K (1987) Short-term synaptic plasticity In G M Edelman V E Gallamp K M Cowan (Eds) Synaptic function (pp 21ndash56) New York Wiley

Meister M (1995)Multineuronal codes in retinal signaling Proc Nat Acad Sci(USA) 93 609ndash614

Optican L M amp Richmond B J (1987)Temporal encodingof two-dimensionalpatterns by single units in primate inferior temporal cortex III Informationtheoretic analysis J Neurophys 57 162ndash178

Panzeri S Biella G Rolls E T Skaggs W E amp Treves A (1996) Speednoise information and the graded nature of neuronal responses Network 7365ndash370

Rieke F Warland D de Ruyter van Steveninck R amp Bialek W (1997)SpikesExploring the neural code Cambridge MA MIT Press

Shannon C E (1948) A mathematical theory of communication Bell Sys TechJ 27 379ndash423 623ndash656

Singer W amp Gray C M (1995) Visual feature integration and the temporalcorrelation hypothesis Ann Rev Neurosci 555ndash586

Skaggs W E McNaughton B L Gothard K M amp Markus E J (1993) Aninformation-theoretic approach to deciphering the hippocampal code InS J Hanson J D Cowan amp C L Giles (Eds) Advances in neural informa-tion processing 5 (pp 1030ndash1037) San Mateo CA Morgan Kaufmann

Strong S P Koberle R de Ruyter van Steveninck R R amp Bialek W (1998)Entropy and information in neural spike trains Phys Rev Lett 80 197ndash200

Usrey W M Reppas J B amp Reid R C (1998) Paired-spike interactions andsynaptic efciency of retinal inputs to the thalamus Nature 395 384

Vaadia E Haalman I Abeles M Bergman H Prut Y Slovin H amp AertsenA (1995) Dynamics of neural interactions in monkey cortex in relation tobehavioural events Nature 373 515ndash518

Received February 22 1999 accepted May 20 1999

Page 17: Synergy in a Neural Code - Princeton Universitywbialek/our_papers/brenner+al_00b.pdfSynergy in a Neural Code 1533 of measuring the information (in bits) carried by particular patterns

Synergy in a Neural Code 1547

4 Summary

Information theory allows us to measure synergy and redundancy amongspikes independent of the rules for translating between spikes and stimuliIn particular this approach tests directly the idea that patterns of spikesare special events in the code carrying more information than expected byadding the contributions from individual spikes It is of practical impor-tance that the formulas rely on low-order statistical measures of the neuralresponse and hence do not require enormous data sets to reach meaningfulconclusions The method is of general validity and is applicable to patternsof spikes across a population of neurons as well as across time Specicresults (existence of synergy or redundancy number of bits per spike etc)depend on the neural system as well as on the stimulus ensemble used ineach experiment

In our experiments on the y visual system we found that an eventcomposed of a pair of spikes can carry far more than the information carriedindependently by its parts Two spikes that occur in rapid succession appearto be special symbols that have an integrity beyond the locking of individualspikes to the stimulus This is analogous to the encoding of sounds in writtenEnglish each of the compound symbols th sh and ch stands for sounds thatare not decomposable into sounds represented by each of the constituentletters For spike pairs to act effectively as special symbols mechanismsfor ldquoreadingrdquo them must exist at subsequent levels of processing Synaptictransmission is sensitive to interspike times in the 2ndash20 ms range (Magelby1987) and it is natural to suggest that synaptic mechanisms on this timescaleplay a role in such reading Recent work on the mammalian visual system(Usrey Reppas amp Reid 1998) provides direct evidence that pairs of spikesclose together in time can be especially efcient in driving postsynapticneurons

Appendix A Relation to Previous Work

Patterns of spikes and their relation to sensory stimuli have been quantiedin the past using correlation functions The event rates that we have denedhere which are connected directly to the information carried by patternsof spikes through equation 25 are just properly normalized correlationfunctions The event rate for pairs of spikes from two separate neurons isrelated to the joint poststimulus time histogram (PSTH)dened by AertsenGerstein Habib and Palm (1989 Vaadia et al 1995) Making this connectionexplicit is also an opportunity to see how the present formalism applies toevents dened across two cells

Consider two cells A and B generating spikes at times ftAi g and ftB

i grespectively It will be useful to think of the spike trains as sums of unit

1548 Brenner Strong Koberle Bialek and de Ruyter van Steveninck

impulses at the spike times

rA(t) D

X

id(t iexcl tA

i ) (A1)

rB(t) D

X

id(t iexcl tB

i ) (A2)

Then the time-dependent spike rates for the two cells are

rA(t) D hr A(t)itrials (A3)

rB(t) D hr B(t)itrials (A4)

where hcent cent centitrials denotes an average over multiple trials in which the sametime-dependent stimulus s(t0 ) or the same motor behavior observed Thesespike rates are the probabilities per unit time for the occurrence of a singlespike in either cell A or cell B also called the PSTH We can dene theprobability per unit time for a spike in cell A to occur at time t and a spikein cell B to occur at time t0 and this will be the joint PSTH

JPSTHAB(t t0 ) D hr A(t)r B(t0 )itrials (A5)

Alternatively we can consider an event E dened by a spike in cell A attime t and a spike in cell B at time t iexcl t with the relative time t measuredwith a precision of D t Then the rate of these events is

rE(t) DZ D t 2

iexclD t 2dt0 JPSTHAB(t t iexcl t C t0 ) (A6)

frac14 D tJPSTHAB(t t iexcl t ) (A7)

where the last approximation is valid if our time resolution is sufcientlyhigh Applying our general formula for the information carried by singleevents equation 25 the information carried by pairs of spikes from two cellscan be written as an integral over diagonal ldquostripsrdquo of the JPSTH matrix

I(EIs) D1T

Z T

0dt

JPSTHAB(t t iexcl t )

hJPSTHAB(t t iexcl t )itlog2

JPSTHAB(t t iexcl t )

hJPSTHAB(t t iexcl t )it

(A8)

where hJPSTHAB(t tiexclt )it is an average of the JPSTH over time this averageis equivalent to the standard correlation function between the two spiketrains

The discussion by Vaadia et al (1995) emphasizes that modulations ofthe JPSTH along the diagonal strips allow correlated ring events to conveyinformation about sensory signals or behavioral states and this information

Synergy in a Neural Code 1549

is quantied by equation A8 The information carried by the individualcells is related to the corresponding integrals over spike rates equation 29The difference between the the information conveyed by the compoundspiking events E and the information conveyed by spikes in the two cellsindependently is precisely the synergy between the two cells at the giventime lag t For t D 0 it is the synergymdashor extra informationmdashconveyed bysynchronous ring of the two cells

We would like to connect this approach with previous work that focusedon how events reduce our uncertainty about the stimulus (de Ruyter vanSteveninck amp Bialek 1988) Before we observe the neural response all weknow is that stimuli are chosen from a distribution P[s(t0 )] When we ob-serve an event E at time tE this should tell us something about the stimulusin the neighborhood of this time and this knowledge is described by theconditional distribution P[s(t0 )|tE] If we go back to the denition of the mu-tual information between responses and stimuli we can write the averageinformation conveyed by one event in terms of this conditional distribution

I(EI s) DZ

Ds(t0 )Z

dtEP[s(t0 ) tE] log2

iexcl P[s(t0 ) tE]P[s(t0 )]P[tE]

cent

DZ

dtEP[tE]Z

Ds(t0 )P[s(t0 ) |tE] log2

iexclP[s(t0 ) |tE]P[s(t0 )]

cent (A9)

If the system is stationary then the coding should be invariant under timetranslations

P[s(t0 ) |tE] D P[s(t0 C D t0 ) |tE C D t0] (A10)

This invariance means that the integral over stimuli in equation A9 is inde-pendent of the event arrival time tE so we can simplify our expression forthe information carried by a single event

I(EI s) DZ

dtEP[tE]Z

Ds(t0 )P[s(t0 ) |tE] log2

iexclP[s(t0 ) |tE]P[s(t0 )]

cent

DZ

Ds(t0 )P[s(t0 ) |tE] log2

iexclP[s(t0 ) |tE]P[s(t0 )]

cent (A11)

This formula was used by de Ruyter van Steveninck and Bialek (1988) Toconnect with our work here we express the information in equation A11as an average over the stimulus

I(EI s) Dfrac12 iexclP[s(t0) |tE]

P[s(t0 )]

centlog2

iexclP[s(t0 )|tE]P[s(t0 )]

centfrac34

s (A12)

Using Bayesrsquo rule

P[s(t0 ) |tE]P[s(t0 )]

DP[tE |s(t0 )]

P[tE]D

rE(tE)

NrE (A13)

1550 Brenner Strong Koberle Bialek and de Ruyter van Steveninck

where the last term is a result of the distributions of event arrival times beingproportional to the event rates as dened above Substituting this back toequation A11 one nds equation 26

Appendix B Experimental Setup

In the experiment we used a female blowy which was a rst-generationoffspring of a wild y caught outside The y was put inside a plastic tubeand immobilized with wax with the head protruding out The probosciswas left free so that the y could be fed regularly with some sugar waterA small hole was cut in the back of the head close to the midline on theright side Through this hole a tungsten electrode was advanced into thelobula plate This area which is several layers back from the compound eyeincludes a group of largemotion-detector neurons with wide receptive eldsand strong direction selectivity We recorded spikes extracellularly from oneof these the contralateral H1 neuron Franceschini Riehle amp le Nestour1989 Hausen amp Egelhaaf 1989) The electrode was positioned such thatspikes from H1 could be discriminated reliably and converted into pulsesby a simple threshold discriminator The pulses were fed into a CED 1401interface (Cambridge Electronic Design) which digitized the spikes at 10 m sresolution To keep exact synchrony over the duration of the experimentthe spike timing clock was derived from the same internal CED 1401 clockthat dened the frame times of the visual stimulus

The stimulus was a rigidly moving bar pattern displayed on a Tektronix608 high-brightness display The radiance at average intensity NI was about180 mW(m2 pound sr) which amounts to about 5 pound104 effectively transducedphotons per photoreceptor per second (Dubs Laughlin amp Srinivasan 1984)The bars were oriented vertically with intensities chosen at random to beNI(1 sect C) where C is the contrast The distance between the y and the screenwas adjusted so that the angular subtense of a bar equaled the horizontalinterommatidial angle in the stimulated part of the compound eye This set-ting was found by determining the eyersquos spatial Nyquist frequency throughthe reverse reaction (Gotz 1964) in the response of the H1 neuron For thisy the horizontal interommatidial angle was 145 degrees and the distanceto the screen 105 mm The y viewed the display through a round 80 mmdiameter diaphragm showing approximately 30 bars From this we esti-mate the number of stimulated ommatidia in the eyersquos hexagonal raster tobe about 612

Frames of the stimulus pattern were refreshed every 2 ms and with eachnew frame the pattern was displayed at a new position This resulted in anapparent horizontal motion of the bar pattern which is suitable to excite theH1 neuron The pattern position was dened by a pseudorandom sequencesimulating independent random displacements at each time step uniformlydistributed between iexcl047 degrees and C047 degrees (equivalent to iexcl032to C032 omm horizontal ommatidial spacings) This corresponds to a dif-

Synergy in a Neural Code 1551

fusion constant of 181(plusmn)2 per second or 86 omm2 per second The sequenceof pseudorandom numbers contained a repeating part and a nonrepeatingpart each 10 seconds long with the same statistical parameters Thus ineach 20 second cycle the y saw a 10 second movie that it had seen 20seconds before followed by a 10 second movie that was generated inde-pendently

Acknowledgments

We thank G Lewen and A Schweitzer for their help with the experimentsand N Tishby for many helpful discussions Work at the IAS was supportedin part by DOE grant DEndashFG02ndash90ER40542 and work at the IFSC was sup-ported by the Brazilian agencies FAPESP and CNPq

References

Abeles M Bergmann H Margalit E amp Vaadia E (1993) Spatiotemporalring patterns in the frontal cortex of behaving monkeys J Neurophysiol 701629ndash1638

Adrian E (1928) The basis of sensation London ChristophersAertsen A Gerstein G HabibM amp Palm G (1989)Dynamics of neuronal r-

ing correlation modulation of ldquoeffective connectivityrdquo J Neurophysiol 61(5)900ndash917

Bialek W (1990) Theoretical physics meets experimental neurobiology In EJen (Ed) 1989 Lectures in Complex Systems (pp 513ndash595) Menlo Park CAAddison-Wesley

de Ruyter van Steveninck R amp Bialek W (1988) Real-time performance ofa movement-sensitive neuron in the blowy visual system coding and in-formation transfer in short spike sequences Proc R Soc Lond Ser B 234379

de Ruyter van Steveninck R R Lewen G Strong S Koberle R amp BialekW (1997)Reproducibility and variability in neural spike trains Science 275275

DeWeese M (1996) Optimization principles for the neural code Network 7325ndash331

DeWeese M amp Meister M (1998) How to measure information gained fromone symbol APS Abstracts

Dubs A Laughlin S amp Srinivasan M (1984)Single photon signals in y pho-toreceptors and rst order interneurones at behavioural threshold J Physiol317 317ndash334

Franceschini N Riehle A amp le Nestour A (1989) Directionally selective mo-tion detection by insect neurons In D Stavenga amp R Hardie (Eds) Facets ofVision page 360 Berlin Springer-Verlag

Gotz K (1964) Optomotorische Untersuchung des Visuellen Systems einigerAugenmutanten der Fruchtiege Drosophila Kybernetik 2 77ndash92

1552 Brenner Strong Koberle Bialek and de Ruyter van Steveninck

Hausen K amp Egelhaaf M (1989)Neural mechanisms of visual course control ininsects In D Stavenga amp R Hardie (Eds)Facets of Vision page 391 Springer-Verlag

Hopeld J J (1995)Pattern recognitin computation using action potential tim-ing for stimulus representation Nature 376 33ndash36

Land M F amp Collett T S (1974)Chasing behavior of houseies (Fannia canic-ulaus) A description and analysis J Comp Physiol 89 331ndash357

MacKay D amp McCulloch W S (1952) The limiting information capacity of aneuronal link Bull Math Biophys 14 127ndash135

Magelby K (1987) Short-term synaptic plasticity In G M Edelman V E Gallamp K M Cowan (Eds) Synaptic function (pp 21ndash56) New York Wiley

Meister M (1995)Multineuronal codes in retinal signaling Proc Nat Acad Sci(USA) 93 609ndash614

Optican L M amp Richmond B J (1987)Temporal encodingof two-dimensionalpatterns by single units in primate inferior temporal cortex III Informationtheoretic analysis J Neurophys 57 162ndash178

Panzeri S Biella G Rolls E T Skaggs W E amp Treves A (1996) Speednoise information and the graded nature of neuronal responses Network 7365ndash370

Rieke F Warland D de Ruyter van Steveninck R amp Bialek W (1997)SpikesExploring the neural code Cambridge MA MIT Press

Shannon C E (1948) A mathematical theory of communication Bell Sys TechJ 27 379ndash423 623ndash656

Singer W amp Gray C M (1995) Visual feature integration and the temporalcorrelation hypothesis Ann Rev Neurosci 555ndash586

Skaggs W E McNaughton B L Gothard K M amp Markus E J (1993) Aninformation-theoretic approach to deciphering the hippocampal code InS J Hanson J D Cowan amp C L Giles (Eds) Advances in neural informa-tion processing 5 (pp 1030ndash1037) San Mateo CA Morgan Kaufmann

Strong S P Koberle R de Ruyter van Steveninck R R amp Bialek W (1998)Entropy and information in neural spike trains Phys Rev Lett 80 197ndash200

Usrey W M Reppas J B amp Reid R C (1998) Paired-spike interactions andsynaptic efciency of retinal inputs to the thalamus Nature 395 384

Vaadia E Haalman I Abeles M Bergman H Prut Y Slovin H amp AertsenA (1995) Dynamics of neural interactions in monkey cortex in relation tobehavioural events Nature 373 515ndash518

Received February 22 1999 accepted May 20 1999

Page 18: Synergy in a Neural Code - Princeton Universitywbialek/our_papers/brenner+al_00b.pdfSynergy in a Neural Code 1533 of measuring the information (in bits) carried by particular patterns

1548 Brenner Strong Koberle Bialek and de Ruyter van Steveninck

impulses at the spike times

rA(t) D

X

id(t iexcl tA

i ) (A1)

rB(t) D

X

id(t iexcl tB

i ) (A2)

Then the time-dependent spike rates for the two cells are

rA(t) D hr A(t)itrials (A3)

rB(t) D hr B(t)itrials (A4)

where hcent cent centitrials denotes an average over multiple trials in which the sametime-dependent stimulus s(t0 ) or the same motor behavior observed Thesespike rates are the probabilities per unit time for the occurrence of a singlespike in either cell A or cell B also called the PSTH We can dene theprobability per unit time for a spike in cell A to occur at time t and a spikein cell B to occur at time t0 and this will be the joint PSTH

JPSTHAB(t t0 ) D hr A(t)r B(t0 )itrials (A5)

Alternatively we can consider an event E dened by a spike in cell A attime t and a spike in cell B at time t iexcl t with the relative time t measuredwith a precision of D t Then the rate of these events is

rE(t) DZ D t 2

iexclD t 2dt0 JPSTHAB(t t iexcl t C t0 ) (A6)

frac14 D tJPSTHAB(t t iexcl t ) (A7)

where the last approximation is valid if our time resolution is sufcientlyhigh Applying our general formula for the information carried by singleevents equation 25 the information carried by pairs of spikes from two cellscan be written as an integral over diagonal ldquostripsrdquo of the JPSTH matrix

I(EIs) D1T

Z T

0dt

JPSTHAB(t t iexcl t )

hJPSTHAB(t t iexcl t )itlog2

JPSTHAB(t t iexcl t )

hJPSTHAB(t t iexcl t )it

(A8)

where hJPSTHAB(t tiexclt )it is an average of the JPSTH over time this averageis equivalent to the standard correlation function between the two spiketrains

The discussion by Vaadia et al (1995) emphasizes that modulations ofthe JPSTH along the diagonal strips allow correlated ring events to conveyinformation about sensory signals or behavioral states and this information

Synergy in a Neural Code 1549

is quantied by equation A8 The information carried by the individualcells is related to the corresponding integrals over spike rates equation 29The difference between the the information conveyed by the compoundspiking events E and the information conveyed by spikes in the two cellsindependently is precisely the synergy between the two cells at the giventime lag t For t D 0 it is the synergymdashor extra informationmdashconveyed bysynchronous ring of the two cells

We would like to connect this approach with previous work that focusedon how events reduce our uncertainty about the stimulus (de Ruyter vanSteveninck amp Bialek 1988) Before we observe the neural response all weknow is that stimuli are chosen from a distribution P[s(t0 )] When we ob-serve an event E at time tE this should tell us something about the stimulusin the neighborhood of this time and this knowledge is described by theconditional distribution P[s(t0 )|tE] If we go back to the denition of the mu-tual information between responses and stimuli we can write the averageinformation conveyed by one event in terms of this conditional distribution

I(EI s) DZ

Ds(t0 )Z

dtEP[s(t0 ) tE] log2

iexcl P[s(t0 ) tE]P[s(t0 )]P[tE]

cent

DZ

dtEP[tE]Z

Ds(t0 )P[s(t0 ) |tE] log2

iexclP[s(t0 ) |tE]P[s(t0 )]

cent (A9)

If the system is stationary then the coding should be invariant under timetranslations

P[s(t0 ) |tE] D P[s(t0 C D t0 ) |tE C D t0] (A10)

This invariance means that the integral over stimuli in equation A9 is inde-pendent of the event arrival time tE so we can simplify our expression forthe information carried by a single event

I(EI s) DZ

dtEP[tE]Z

Ds(t0 )P[s(t0 ) |tE] log2

iexclP[s(t0 ) |tE]P[s(t0 )]

cent

DZ

Ds(t0 )P[s(t0 ) |tE] log2

iexclP[s(t0 ) |tE]P[s(t0 )]

cent (A11)

This formula was used by de Ruyter van Steveninck and Bialek (1988) Toconnect with our work here we express the information in equation A11as an average over the stimulus

I(EI s) Dfrac12 iexclP[s(t0) |tE]

P[s(t0 )]

centlog2

iexclP[s(t0 )|tE]P[s(t0 )]

centfrac34

s (A12)

Using Bayesrsquo rule

P[s(t0 ) |tE]P[s(t0 )]

DP[tE |s(t0 )]

P[tE]D

rE(tE)

NrE (A13)

1550 Brenner Strong Koberle Bialek and de Ruyter van Steveninck

where the last term is a result of the distributions of event arrival times beingproportional to the event rates as dened above Substituting this back toequation A11 one nds equation 26

Appendix B Experimental Setup

In the experiment we used a female blowy which was a rst-generationoffspring of a wild y caught outside The y was put inside a plastic tubeand immobilized with wax with the head protruding out The probosciswas left free so that the y could be fed regularly with some sugar waterA small hole was cut in the back of the head close to the midline on theright side Through this hole a tungsten electrode was advanced into thelobula plate This area which is several layers back from the compound eyeincludes a group of largemotion-detector neurons with wide receptive eldsand strong direction selectivity We recorded spikes extracellularly from oneof these the contralateral H1 neuron Franceschini Riehle amp le Nestour1989 Hausen amp Egelhaaf 1989) The electrode was positioned such thatspikes from H1 could be discriminated reliably and converted into pulsesby a simple threshold discriminator The pulses were fed into a CED 1401interface (Cambridge Electronic Design) which digitized the spikes at 10 m sresolution To keep exact synchrony over the duration of the experimentthe spike timing clock was derived from the same internal CED 1401 clockthat dened the frame times of the visual stimulus

The stimulus was a rigidly moving bar pattern displayed on a Tektronix608 high-brightness display The radiance at average intensity NI was about180 mW(m2 pound sr) which amounts to about 5 pound104 effectively transducedphotons per photoreceptor per second (Dubs Laughlin amp Srinivasan 1984)The bars were oriented vertically with intensities chosen at random to beNI(1 sect C) where C is the contrast The distance between the y and the screenwas adjusted so that the angular subtense of a bar equaled the horizontalinterommatidial angle in the stimulated part of the compound eye This set-ting was found by determining the eyersquos spatial Nyquist frequency throughthe reverse reaction (Gotz 1964) in the response of the H1 neuron For thisy the horizontal interommatidial angle was 145 degrees and the distanceto the screen 105 mm The y viewed the display through a round 80 mmdiameter diaphragm showing approximately 30 bars From this we esti-mate the number of stimulated ommatidia in the eyersquos hexagonal raster tobe about 612

Frames of the stimulus pattern were refreshed every 2 ms and with eachnew frame the pattern was displayed at a new position This resulted in anapparent horizontal motion of the bar pattern which is suitable to excite theH1 neuron The pattern position was dened by a pseudorandom sequencesimulating independent random displacements at each time step uniformlydistributed between iexcl047 degrees and C047 degrees (equivalent to iexcl032to C032 omm horizontal ommatidial spacings) This corresponds to a dif-

Synergy in a Neural Code 1551

fusion constant of 181(plusmn)2 per second or 86 omm2 per second The sequenceof pseudorandom numbers contained a repeating part and a nonrepeatingpart each 10 seconds long with the same statistical parameters Thus ineach 20 second cycle the y saw a 10 second movie that it had seen 20seconds before followed by a 10 second movie that was generated inde-pendently

Acknowledgments

We thank G Lewen and A Schweitzer for their help with the experimentsand N Tishby for many helpful discussions Work at the IAS was supportedin part by DOE grant DEndashFG02ndash90ER40542 and work at the IFSC was sup-ported by the Brazilian agencies FAPESP and CNPq

References

Abeles M Bergmann H Margalit E amp Vaadia E (1993) Spatiotemporalring patterns in the frontal cortex of behaving monkeys J Neurophysiol 701629ndash1638

Adrian E (1928) The basis of sensation London ChristophersAertsen A Gerstein G HabibM amp Palm G (1989)Dynamics of neuronal r-

ing correlation modulation of ldquoeffective connectivityrdquo J Neurophysiol 61(5)900ndash917

Bialek W (1990) Theoretical physics meets experimental neurobiology In EJen (Ed) 1989 Lectures in Complex Systems (pp 513ndash595) Menlo Park CAAddison-Wesley

de Ruyter van Steveninck R amp Bialek W (1988) Real-time performance ofa movement-sensitive neuron in the blowy visual system coding and in-formation transfer in short spike sequences Proc R Soc Lond Ser B 234379

de Ruyter van Steveninck R R Lewen G Strong S Koberle R amp BialekW (1997)Reproducibility and variability in neural spike trains Science 275275

DeWeese M (1996) Optimization principles for the neural code Network 7325ndash331

DeWeese M amp Meister M (1998) How to measure information gained fromone symbol APS Abstracts

Dubs A Laughlin S amp Srinivasan M (1984)Single photon signals in y pho-toreceptors and rst order interneurones at behavioural threshold J Physiol317 317ndash334

Franceschini N Riehle A amp le Nestour A (1989) Directionally selective mo-tion detection by insect neurons In D Stavenga amp R Hardie (Eds) Facets ofVision page 360 Berlin Springer-Verlag

Gotz K (1964) Optomotorische Untersuchung des Visuellen Systems einigerAugenmutanten der Fruchtiege Drosophila Kybernetik 2 77ndash92

1552 Brenner Strong Koberle Bialek and de Ruyter van Steveninck

Hausen K amp Egelhaaf M (1989)Neural mechanisms of visual course control ininsects In D Stavenga amp R Hardie (Eds)Facets of Vision page 391 Springer-Verlag

Hopeld J J (1995)Pattern recognitin computation using action potential tim-ing for stimulus representation Nature 376 33ndash36

Land M F amp Collett T S (1974)Chasing behavior of houseies (Fannia canic-ulaus) A description and analysis J Comp Physiol 89 331ndash357

MacKay D amp McCulloch W S (1952) The limiting information capacity of aneuronal link Bull Math Biophys 14 127ndash135

Magelby K (1987) Short-term synaptic plasticity In G M Edelman V E Gallamp K M Cowan (Eds) Synaptic function (pp 21ndash56) New York Wiley

Meister M (1995)Multineuronal codes in retinal signaling Proc Nat Acad Sci(USA) 93 609ndash614

Optican L M amp Richmond B J (1987)Temporal encodingof two-dimensionalpatterns by single units in primate inferior temporal cortex III Informationtheoretic analysis J Neurophys 57 162ndash178

Panzeri S Biella G Rolls E T Skaggs W E amp Treves A (1996) Speednoise information and the graded nature of neuronal responses Network 7365ndash370

Rieke F Warland D de Ruyter van Steveninck R amp Bialek W (1997)SpikesExploring the neural code Cambridge MA MIT Press

Shannon C E (1948) A mathematical theory of communication Bell Sys TechJ 27 379ndash423 623ndash656

Singer W amp Gray C M (1995) Visual feature integration and the temporalcorrelation hypothesis Ann Rev Neurosci 555ndash586

Skaggs W E McNaughton B L Gothard K M amp Markus E J (1993) Aninformation-theoretic approach to deciphering the hippocampal code InS J Hanson J D Cowan amp C L Giles (Eds) Advances in neural informa-tion processing 5 (pp 1030ndash1037) San Mateo CA Morgan Kaufmann

Strong S P Koberle R de Ruyter van Steveninck R R amp Bialek W (1998)Entropy and information in neural spike trains Phys Rev Lett 80 197ndash200

Usrey W M Reppas J B amp Reid R C (1998) Paired-spike interactions andsynaptic efciency of retinal inputs to the thalamus Nature 395 384

Vaadia E Haalman I Abeles M Bergman H Prut Y Slovin H amp AertsenA (1995) Dynamics of neural interactions in monkey cortex in relation tobehavioural events Nature 373 515ndash518

Received February 22 1999 accepted May 20 1999

Page 19: Synergy in a Neural Code - Princeton Universitywbialek/our_papers/brenner+al_00b.pdfSynergy in a Neural Code 1533 of measuring the information (in bits) carried by particular patterns

Synergy in a Neural Code 1549

is quantied by equation A8 The information carried by the individualcells is related to the corresponding integrals over spike rates equation 29The difference between the the information conveyed by the compoundspiking events E and the information conveyed by spikes in the two cellsindependently is precisely the synergy between the two cells at the giventime lag t For t D 0 it is the synergymdashor extra informationmdashconveyed bysynchronous ring of the two cells

We would like to connect this approach with previous work that focusedon how events reduce our uncertainty about the stimulus (de Ruyter vanSteveninck amp Bialek 1988) Before we observe the neural response all weknow is that stimuli are chosen from a distribution P[s(t0 )] When we ob-serve an event E at time tE this should tell us something about the stimulusin the neighborhood of this time and this knowledge is described by theconditional distribution P[s(t0 )|tE] If we go back to the denition of the mu-tual information between responses and stimuli we can write the averageinformation conveyed by one event in terms of this conditional distribution

I(EI s) DZ

Ds(t0 )Z

dtEP[s(t0 ) tE] log2

iexcl P[s(t0 ) tE]P[s(t0 )]P[tE]

cent

DZ

dtEP[tE]Z

Ds(t0 )P[s(t0 ) |tE] log2

iexclP[s(t0 ) |tE]P[s(t0 )]

cent (A9)

If the system is stationary then the coding should be invariant under timetranslations

P[s(t0 ) |tE] D P[s(t0 C D t0 ) |tE C D t0] (A10)

This invariance means that the integral over stimuli in equation A9 is inde-pendent of the event arrival time tE so we can simplify our expression forthe information carried by a single event

I(EI s) DZ

dtEP[tE]Z

Ds(t0 )P[s(t0 ) |tE] log2

iexclP[s(t0 ) |tE]P[s(t0 )]

cent

DZ

Ds(t0 )P[s(t0 ) |tE] log2

iexclP[s(t0 ) |tE]P[s(t0 )]

cent (A11)

This formula was used by de Ruyter van Steveninck and Bialek (1988) Toconnect with our work here we express the information in equation A11as an average over the stimulus

I(EI s) Dfrac12 iexclP[s(t0) |tE]

P[s(t0 )]

centlog2

iexclP[s(t0 )|tE]P[s(t0 )]

centfrac34

s (A12)

Using Bayesrsquo rule

P[s(t0 ) |tE]P[s(t0 )]

DP[tE |s(t0 )]

P[tE]D

rE(tE)

NrE (A13)

1550 Brenner Strong Koberle Bialek and de Ruyter van Steveninck

where the last term is a result of the distributions of event arrival times beingproportional to the event rates as dened above Substituting this back toequation A11 one nds equation 26

Appendix B Experimental Setup

In the experiment we used a female blowy which was a rst-generationoffspring of a wild y caught outside The y was put inside a plastic tubeand immobilized with wax with the head protruding out The probosciswas left free so that the y could be fed regularly with some sugar waterA small hole was cut in the back of the head close to the midline on theright side Through this hole a tungsten electrode was advanced into thelobula plate This area which is several layers back from the compound eyeincludes a group of largemotion-detector neurons with wide receptive eldsand strong direction selectivity We recorded spikes extracellularly from oneof these the contralateral H1 neuron Franceschini Riehle amp le Nestour1989 Hausen amp Egelhaaf 1989) The electrode was positioned such thatspikes from H1 could be discriminated reliably and converted into pulsesby a simple threshold discriminator The pulses were fed into a CED 1401interface (Cambridge Electronic Design) which digitized the spikes at 10 m sresolution To keep exact synchrony over the duration of the experimentthe spike timing clock was derived from the same internal CED 1401 clockthat dened the frame times of the visual stimulus

The stimulus was a rigidly moving bar pattern displayed on a Tektronix608 high-brightness display The radiance at average intensity NI was about180 mW(m2 pound sr) which amounts to about 5 pound104 effectively transducedphotons per photoreceptor per second (Dubs Laughlin amp Srinivasan 1984)The bars were oriented vertically with intensities chosen at random to beNI(1 sect C) where C is the contrast The distance between the y and the screenwas adjusted so that the angular subtense of a bar equaled the horizontalinterommatidial angle in the stimulated part of the compound eye This set-ting was found by determining the eyersquos spatial Nyquist frequency throughthe reverse reaction (Gotz 1964) in the response of the H1 neuron For thisy the horizontal interommatidial angle was 145 degrees and the distanceto the screen 105 mm The y viewed the display through a round 80 mmdiameter diaphragm showing approximately 30 bars From this we esti-mate the number of stimulated ommatidia in the eyersquos hexagonal raster tobe about 612

Frames of the stimulus pattern were refreshed every 2 ms and with eachnew frame the pattern was displayed at a new position This resulted in anapparent horizontal motion of the bar pattern which is suitable to excite theH1 neuron The pattern position was dened by a pseudorandom sequencesimulating independent random displacements at each time step uniformlydistributed between iexcl047 degrees and C047 degrees (equivalent to iexcl032to C032 omm horizontal ommatidial spacings) This corresponds to a dif-

Synergy in a Neural Code 1551

fusion constant of 181(plusmn)2 per second or 86 omm2 per second The sequenceof pseudorandom numbers contained a repeating part and a nonrepeatingpart each 10 seconds long with the same statistical parameters Thus ineach 20 second cycle the y saw a 10 second movie that it had seen 20seconds before followed by a 10 second movie that was generated inde-pendently

Acknowledgments

We thank G Lewen and A Schweitzer for their help with the experimentsand N Tishby for many helpful discussions Work at the IAS was supportedin part by DOE grant DEndashFG02ndash90ER40542 and work at the IFSC was sup-ported by the Brazilian agencies FAPESP and CNPq

References

Abeles, M., Bergmann, H., Margalit, E., & Vaadia, E. (1993). Spatiotemporal firing patterns in the frontal cortex of behaving monkeys. J. Neurophysiol., 70, 1629–1638.

Adrian, E. (1928). The basis of sensation. London: Christophers.

Aertsen, A., Gerstein, G., Habib, M., & Palm, G. (1989). Dynamics of neuronal firing correlation: Modulation of "effective connectivity." J. Neurophysiol., 61(5), 900–917.

Bialek, W. (1990). Theoretical physics meets experimental neurobiology. In E. Jen (Ed.), 1989 Lectures in Complex Systems (pp. 513–595). Menlo Park, CA: Addison-Wesley.

de Ruyter van Steveninck, R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Coding and information transfer in short spike sequences. Proc. R. Soc. Lond. Ser. B, 234, 379.

de Ruyter van Steveninck, R. R., Lewen, G., Strong, S., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805.

DeWeese, M. (1996). Optimization principles for the neural code. Network, 7, 325–331.

DeWeese, M., & Meister, M. (1998). How to measure information gained from one symbol. APS Abstracts.

Dubs, A., Laughlin, S., & Srinivasan, M. (1984). Single photon signals in fly photoreceptors and first order interneurones at behavioural threshold. J. Physiol., 317, 317–334.

Franceschini, N., Riehle, A., & le Nestour, A. (1989). Directionally selective motion detection by insect neurons. In D. Stavenga & R. Hardie (Eds.), Facets of Vision (p. 360). Berlin: Springer-Verlag.

Götz, K. (1964). Optomotorische Untersuchung des visuellen Systems einiger Augenmutanten der Fruchtfliege Drosophila. Kybernetik, 2, 77–92.

Hausen, K., & Egelhaaf, M. (1989). Neural mechanisms of visual course control in insects. In D. Stavenga & R. Hardie (Eds.), Facets of Vision (p. 391). Berlin: Springer-Verlag.

Hopfield, J. J. (1995). Pattern recognition computation using action potential timing for stimulus representation. Nature, 376, 33–36.

Land, M. F., & Collett, T. S. (1974). Chasing behavior of houseflies (Fannia canicularis): A description and analysis. J. Comp. Physiol., 89, 331–357.

MacKay, D., & McCulloch, W. S. (1952). The limiting information capacity of a neuronal link. Bull. Math. Biophys., 14, 127–135.

Magleby, K. (1987). Short-term synaptic plasticity. In G. M. Edelman, W. E. Gall, & W. M. Cowan (Eds.), Synaptic function (pp. 21–56). New York: Wiley.

Meister, M. (1995). Multineuronal codes in retinal signaling. Proc. Nat. Acad. Sci. (USA), 93, 609–614.

Optican, L. M., & Richmond, B. J. (1987). Temporal encoding of two-dimensional patterns by single units in primate inferior temporal cortex. III. Information theoretic analysis. J. Neurophys., 57, 162–178.

Panzeri, S., Biella, G., Rolls, E. T., Skaggs, W. E., & Treves, A. (1996). Speed, noise, information and the graded nature of neuronal responses. Network, 7, 365–370.

Rieke, F., Warland, D., de Ruyter van Steveninck, R., & Bialek, W. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.

Shannon, C. E. (1948). A mathematical theory of communication. Bell Sys. Tech. J., 27, 379–423, 623–656.

Singer, W., & Gray, C. M. (1995). Visual feature integration and the temporal correlation hypothesis. Ann. Rev. Neurosci., 18, 555–586.

Skaggs, W. E., McNaughton, B. L., Gothard, K. M., & Markus, E. J. (1993). An information-theoretic approach to deciphering the hippocampal code. In S. J. Hanson, J. D. Cowan, & C. L. Giles (Eds.), Advances in neural information processing systems, 5 (pp. 1030–1037). San Mateo, CA: Morgan Kaufmann.

Strong, S. P., Koberle, R., de Ruyter van Steveninck, R. R., & Bialek, W. (1998). Entropy and information in neural spike trains. Phys. Rev. Lett., 80, 197–200.

Usrey, W. M., Reppas, J. B., & Reid, R. C. (1998). Paired-spike interactions and synaptic efficiency of retinal inputs to the thalamus. Nature, 395, 384.

Vaadia, E., Haalman, I., Abeles, M., Bergman, H., Prut, Y., Slovin, H., & Aertsen, A. (1995). Dynamics of neural interactions in monkey cortex in relation to behavioural events. Nature, 373, 515–518.

Received February 22, 1999; accepted May 20, 1999.
