Decoding Poisson Spike Trains by Gaussian Filtering - SNLsidney/Papers/LehkyGaussianSmoothing2010.pdf · LETTER Communicated by Paul Tiesinga Decoding Poisson Spike Trains by Gaussian

LETTER Communicated by Paul Tiesinga

Decoding Poisson Spike Trains by Gaussian Filtering

Sidney R. [email protected] Neuroscience Lab, Salk Institute, La Jolla, CA 92037, U.S.A.

The temporal waveform of neural activity is commonly estimated bylow-pass filtering spike train data through convolution with a gaussiankernel. However, the criteria for selecting the gaussian width σ are notwell understood. Given an ensemble of Poisson spike trains generatedby an instantaneous firing rate function λ(t), the problem was to recoveran optimal estimate of λ(t) by gaussian filtering. We provide equationsdescribing the optimal value of σ using an error minimization criterionand examine how the optimal σ varies within a parameter space definingthe statistics of inhomogeneous Poisson spike trains. The process wasstudied both analytically and through simulations. The rate functionsλ(t) were randomly generated, with the three parameters defining spikestatistics being the mean of λ(t), the variance of λ(t), and the exponentα of the Fourier amplitude spectrum 1/f α of λ(t). The value of σopt fol-lowed a power law as a function of the pooled mean interspike interval I ,σopt = aI b, where a was inversely related to the coefficient of variation CV

of λ(t), and b was inversely related to the Fourier spectrum exponent α.Besides applications for data analysis, optimal recovery of an analog sig-nal waveform λ(t) from spike trains may also be useful in understandingneural signal processing in vivo.

1 Introduction

Spike trains act as carriers by which information is transmitted from neuronto neuron. They typically form the raw data from perceptual, cognitive, andmotor experiments using microelectrode techniques. Given a spike train,we would like to decode it by extracting some parameter that expressesits functional content in a more explicit form. Although what aspect of thespike train is the relevant information carrier is a matter of debate, we shallconsider here estimation of the instantaneous spike rate.

We treat spike trains as being generated by an inhomogeneous Poissonprocess, in which the spike rate varies as a function of time in accord withan instantaneous firing rate function λ(t). An example firing rate function isshown in Figure 1, together with a set of Poisson spike trains generated bythat same rate function. The problem is to invert this process and estimateλ(t) given a sample of spike trains. If one normalizes the area under λ(t) to

Neural Computation 22, 1245–1271 (2010) C© 2009 Massachusetts Institute of Technology

1246 S. Lehky

0 400 800

4

8

12

16

time (ms)

tria

l

b.

0 400 8000

20

40

60

time (ms)

spik

es/

sec

a.

Figure 1: (a) Example of a spike rate function λ(t). (b) Set of 16 Poisson spiketrains generated by λ(t).

one, it becomes a probability density function for the occurrence of a spikeat time t. Hence, instantaneous firing rates and probabilistic spike timingsare the same thing aside from a multiplicative constant. Here we discussthe problem using the firing rate formalism.

The temporal waveform of the instantaneous firing rate obviously con-veys more information than average firing rate calculated over an extendedperiod (Rieke, Warland, de Ruyter van Steveninck, & Bialek, 1997), withthe amount of information dependent on the autocorrelation of λ(t) (or,equivalently, on the frequency bandwidth of λ(t)). The extent to which thisadditional information is actually used is uncertain. Possibly the overalltemporal pattern of the response may be functionally significant, or it maybe just some particular aspect of the temporal pattern, such as neural la-tency or peak response time. In any case, when analyzing spike train data,one would often like to have an accurate estimate of the response at a highertemporal resolution than provided by the average response. Once the tem-poral aspect of the neural response is more accurately characterized, onemay be in a better position to interpret that response in terms of the sensorystimuli or the behavioral activity producing it.

A common approach for estimating the time course of neural responsesfrom spike trains is to construct a peristimulus time histogram (PSTH). Arelated but superior method that avoids the binning artifacts of histogramsis kernel smoothing (Bowman & Azzalini, 1997; Silverman, 1986; Wand& Jones, 1995), in which the spike train is convolved with some kernelfunction. For a gaussian kernel, for example, this has the effect of replacingeach spike (idealized as an instantaneous pulse) by a gaussian curve havingunit area. The set of gaussians is then summed to produce the estimate ofλ(t) (see Figure 2). This kernel intensity estimate (instantaneous firing rate)becomes a kernel density estimate (probabilistic spike timing) when thearea under the summed curve is normalized.

Kernel smoothing is the approach used in this study. Kernel smoothingalso goes under the name of the Parzen window method, after its origina-tor (Parzen, 1962). As this technique involves interpolating a continuous

Decoding Poisson Spike Trains by Gaussian Filtering 1247

0 400 8000

10

20

30

40

time (ms)

spik

es/

sec

σ=30 ms

Figure 2: Recovering an estimated spike rate function λ(t) by gaussian smooth-ing. Each spike is convolved with a gaussian kernel (dotted lines) and the resultssummed to produce the estimate λ(t).

function between the events of a point process (spikes), it is an example ofradial basis function interpolation when using a symmetric kernel such asa gaussian function.

A problem with applying kernel smoothing is choosing the kernel widthσ (analogous to choosing bin size when plotting PSTHs), as the shape ofthe estimated λ(t) will strongly depend on this parameter (see Figure 3).The primary concern in this study is to provide on objective basis for se-lecting an optimal kernel width that minimizes the error between actualand estimated λ(t). Although automatic bandwidth selection algorithmsexist (Jones, Marron, & Sheather, 1996), they are general-purpose statisti-cal methods without any domain-specific knowledge built into them. Ahope here is that by studying how various neurophysiological parame-ters affect the selection of bandwidth for spike train smoothing, a special-purpose model can be constructed that does better than the nonspecificmethods.

A notable previous theoretical treatment of this problem was by Bialek,Rieke, de Ruyter van Steveninck, and Warland (1991), studying movement-sensitive neurons in the blowfly visual system. However, their approach toestimating the smoothing kernel is valid only for the special case in whichthere is a high correlation between the temporal waveform of the sensorystimulus and the temporal waveform of the sensory neuron response. Suchwould occur in a linear system or one with soft nonlinearities (saturationeffects). For mammalian visual processing, for instance, that would limitapplicability of those methods to relatively peripheral structures. As onemoves centrally through multiple cortical areas (over 30 have been identi-fied in the monkey visual system; Felleman & Van Essen, 1991), the wave-form of the neural response becomes decorrelated from that of the stimulusthrough the influence of multiple layers of nonlinear recurrent interactions.

1248 S. Lehky

0 400 800time (ms)

estimatedactual

σ=256

d.

0 400 8000

20

40

60

time (ms)

spik

es/

sec

σ=64

estimatedactual

c.

estimatedactual

σ=16

b.

0

20

40

60

spik

es/

sec

σ=4

estimatedactual

a.

Figure 3: The spike rate function λ(t) estimated from a set of spike trains de-pends on the width σ of the gaussian smoothing kernel used. Shown are theactual λ(t) (dotted line), as well as the estimated λ(t) generated by four differentvalues of σ . Each estimated λ(t) was produced by convolving 100 spike trains,generated by the same λ(t), with a gaussian kernel of the specified σ . The resultsfor all trials were then averaged.

In inferotemporal cortex, a visual stimulus displayed temporally as an ex-tended rectangular pulse typically elicits a complex multipeaked neuralresponse. Furthermore, in central neurons, the temporal aspect of the neu-ral response may not even in principle be coding the temporal aspect of thestimulus. For example, it has been suggested that the temporal waveformsof monkey inferotemporal neurons are encoding information about stim-ulus shape (Richmond, Optican, Podell, & Spitzer, 1987). Thus, for thesereasons, the cross-correlational approach used by Bialek et al. (1991) is in-adequate in the general case. In this study, we used a different procedurethat attempts to estimate the rate function using information intrinsic tothe neural signal, without knowledge of inputs from the external environ-ment. Related theoretical treatments of spike train decoding include Augustand Levy (1996), Goldberg and Andreou (2007), and Paulin and Hoffman(2001).


2 Methods

Our approach was to examine the problem from an analytic perspective aswell as to conduct simulations. For the simulations, Poisson spike trainswere generated from a known λ(t), and then a family of estimates λ(t|σ )was recovered from those spikes by gaussian smoothing, using a range ofvalues for the kernel width σ . A global error measure ET (σ ) was calculateddependent on the difference between actual and estimated λ(t) for differentvalues of σ . The minimum of the ET (σ ) curve indicated the optimal σ . Theshape of the ET (σ ) curve (and therefore the optimal σ ) depended on thestochastic characteristics of λ(t), and the nature of that dependence is themajor concern here. (See the appendix for a list of symbols.)

We held kernel shape fixed in the form of a gaussian curve and in-vestigated only effects of kernel width, as kernel smoothing estimates aregenerally more sensitive to changes in kernel size than shape (Bowman &Azzalini, 1997). By using a gaussian kernel, we were performing acausalfiltering, thus making these results more directly applicable to data analysisrather than a biophysical account of neural processing, though the generalconceptual framework may transfer into in vivo situations.

Details of generating the Poisson spike train from λ(t) have been dis-cussed elsewhere (Lehky, 2004). The spike train was represented as a setof Dirac δ functions occurring at times ti . Therefore, the shape and dura-tion of biological spikes were abstracted away. For purposes of numericalsimulations, time was discretized to 1 msec bins. That is, each 1 msec bincontained either a δ function pulse or a zero (in practice, mostly zeros withan occasional pulse). This is similar to the convention in systems neuro-science data files of representing spike trains as strings of ones and zeros, aone being recorded for each spike when the voltage of the spike first entersa particular voltage window. A 3 ms refractory period was included in thePoisson generation process, so that variability of the spike count in a trialwas slightly lower than for a pure Poisson process (i.e., the Fano factor wasless than one).

The mathematical form of the spike train representation was

S(t | ti ) =∑

ti

δ(t − ti ) (2.1)

where δ(t − ti ) represented a spike occurring at time ti , and the range oft was 0 to 1000 ms, the duration of spike trains in the simulations. Thegaussian smoothing kernel was

K (t | ti , σ ) = 1

σ√

2πe

−(t−ti )2

2σ2 , (2.2)

1250 S. Lehky

and the estimate of the instantaneous firing rate given by the convolutionof the spike train by the smoothing kernel:

λ(t | ti , σ ) = S ∗ K =∑

ti

K (t | ti , σ ) (2.3)

(taking advantage of the sifting property of the δ impulse).To deal with edge effects during the convolution, we used spike trains

with reflecting boundaries. Appended to both ends of the spike trains weretime-reflected copies of the train, forming an extended spike train. That wasequivalent to reflecting the tail of the kernel back into the spike train periodif it fell beyond that period. If the kernel was so broad that it went beyondeven the extended spike train, the kernel tail was truncated, and the areaunder the truncated kernel was renormalized to one.

Estimating the rate function by convolving the spike train with a smooth-ing kernel is equivalent to low-pass-filtering the spike train. The transferfunction of the gaussian kernel in the frequency domain is

H(ω | σ ) = e− 12 (σω/1000)2

, (2.4)

where ω is radial frequency and σ is in milliseconds. The bandwidth (half-power) of the gaussian filtering is

ωBW = 1000√

log(2)σ

. (2.5)

Thus, there is an inverse relation between the width of the kernel in thetime and frequency domains. Broad kernels in the time domain correspondto narrow-width low-pass filters.

The rate function λ(t) was characterized by three parameters: (1) themean rate,

〈λ(t)〉t = 1tD

∫ tD

0λ(t) dt, (2.6)

where 〈〉t is the mean value operator with respect to time and tD is spiketrain duration; (2) the variance Var(λ(t)); and (3) an exponent α defining apower law Fourier amplitude spectrum for λ(t), which was proportional to1/ f α .

The rate function was computed by generating gaussian white noise fol-lowed by low-pass filtering in the frequency domain with the appropriatepower law 1/ f α . The spectrum was truncated to zero for frequencies above100 Hz for biophysical reasons. The function was then returned to the timedomain by an inverse Fourier transform. At this point, the function was


0 400 800time (ms)

α=3.0c. black noise

0 400 800time (ms)

α=2.0b. brown noise

0 400 80010

15

20

25

30

time (ms)

spik

es/s

ec

α=1.0a. pink noise

Figure 4: Example rate functions λ(t) with different Fourier amplitude spectra.The Fourier spectra were proportional to 1/ f α , with the values of α indicated ineach panel. Besides the Fourier spectral exponent α, the other two parametersdefining the randomly generated λ(t) were their mean and variance.

linearly scaled to give the desired mean and variance. This procedure pro-duced rate functions λ(t) that were random functions in the form of colorednoise, with α = 1 being pink noise, α = 2 being brown noise, and α = 3sometimes called black noise (Schroeder, 1991). Some example rate func-tions are shown in Figure 4 for different values of α. In addition to a smallerhigh-frequency contribution, larger values of α lead to a broader autocor-relation function and can be considered “burstier.” Intracellular recordingsshow that neuronal firing rates can show rapid fluctuations similar to thoseseen in these rate functions, with the firing rates mirroring noisy currentinputs to the cell (Carandini, Mechler, Leonard, & Movshon, 1996; Nowak,Sanchez-Vivez, & McCormick, 1997).

A simulation “experiment” consisted of p trials, each trial using an iden-tical rate function λ(t) but with a different random spike train derived fromthat rate function (see Figure 1). This is analogous to the way the samestimulus is presented multiple times in actual experiments. The value ofp in simulations ranged from 1 to 256. The estimated rate function for anexperiment, ελ(t), was calculated by averaging the estimated rate functionsfor individual trials:

ελ(t) = 1p

p∑k=1

λk(t). (2.7)

We shall be concerned exclusively with rate estimates based on entire exper-iments, ελ(t), not individual trials, λk(t) (except for the special case p = 1).

Each experiment was replicated m times with the same rate function λ(t)to form a set called an experiment replication. Each experiment replica-tion, in turn, was repeated n times, each with a different λ(t), to form an

1252 S. Lehky

experiment group. The n exemplars of λ(t) within each group were ran-domly generated using the same triplet of parameters (mean, variance, andspectral exponent α). Thus, for each experimental group, we were interestedin estimation errors associated with the statistical characteristics underly-ing λ(t), not with individual examples of λ(t). Each experimental groupwas therefore associated with one point in the statistical parameter spacedefining λ(t), and the parameter space was sampled with simulation datafrom a set of experimental groups.

The total error function was the mean integrated square error (MISE)between actual and estimated rate functions:

ET (σ ) = 1n

n∑i=1

⎛⎝ 1

m

m∑j=1

(∫ tD

t=0(λi (t) − ελi j (t | σ ))2 dt

)⎞⎠, (2.8)

where tD was the duration of the spike train. Taking advantage of theergodicity of λ(t), total error in practice was calculated across time ratherthan across an ensemble at a point in time.

3 Results

Plots of example total error curves ET (σ ) are shown in Figure 5, with adifferent value of mean rate 〈λ(t)〉t in each panel. The curves generally havea dipper shape, first declining and then either increasing or remaining at aplateau. The error curves within each panel change shape as the number oftrials per experiment p is increased. The initial descending portion of thecurve shifts downward proportionally to p. The ascending portion of thecurve shifts downward to an asymptotic limit as p → ∞.

Total error can be expressed as the sum of components related to varianceerror and bias squared error (Silverman, 1986):

ET (σ ) = EV(σ ) + EB(σ ). (3.1)

EV(σ ) is the integrated variance error,

EV(σ ) = 1n

n∑i=1

⎛⎝ 1

m

m∑j=1

(∫ tD

t=0Var(ελi j (t | σ )) dt

)⎞⎠, (3.2)

averaged over m instantiations of spike trains from the same λ(t), as wellas n exemplars of different λ(t) functions randomly generated using thesame triplet of parameter values. Variance here refers to variance across an


100

101

102

103

100

102

104

erro

r

64 spikes/sec

c.

1 trial 4 trials 16 trials 64 trials256 trials

100

102

104

erro

r

16 spikes/sec

b.


100

102

104

erro

r

4 spikes/sec

a.


Figure 5: Error between actual rate function λ(t) and estimated rate function λ(t)as a function of the width σ of the gaussian smoothing kernel used. Error wasdefined as mean integrated square error (MISE), with units of (spikes/sec)2.Curves were generated through simulations. Each panel shows error curvesfor a different mean firing rate of λ(t). Within individual panels, each curvecorresponds to a different number of trials per experiment p, with the curvereflecting estimation error after pooling the p replications. In all cases, varianceof λ(t) was equal to 0.1 times the mean firing rate, with Fourier exponent α = 2.Dotted lines are asymptotic bias error curves as p → ∞.

ensemble of ελ(t) at each point in time. EB(σ ) is the integrated bias squarederror,

EB(σ ) = 1n

n∑i=1

⎛⎜⎝∫ tD

t=0

⎛⎝λi (t) − 1

m

m∑j=1

ελi j (t | σ )

⎞⎠

2

dt

⎞⎟⎠, (3.3)

1254 S. Lehky

100

102

104

erro

r

4 spikes/sec16 trials

total errorvariance errorbias error

a.

100

102

104

erro

r



b.

100

101

102

103

100

102

104

erro

r



c.

Figure 6: Decomposition of total error curves (as in Figure 5) into a varianceerror component and a bias error component. Each panel shows curves corre-sponding to a different mean firing rate of λ(t). In all cases, variance of λ(t) wasequal to 0.1 times the mean firing rate, with Fourier exponent α = 2 and thenumber of replication trials per experiment p = 16. The empty circle symboltoward the right of each curve is the error if λ(t) is estimated as the mean firingrate over trial duration (effectively setting σ → ∞).

averaged in the same way, again based on ensemble calculations. For con-venience, we shall refer to EV(σ ) and EB(σ ) by the shorthand expressions“variance error” and “bias error,” keeping in mind that we are actuallyreferring to integrated errors. Example ET (σ ) curves decomposed into theirEV(σ ) and EB(σ ) components are shown in Figure 6. The variance error isassociated with the descending portion of the ET (σ ) curve to the left, while


the bias error is associated with the flat or ascending portion of the curveto the right.

An analytic expression for EV(σ ) can be derived. Consider a spike trainof duration tD with a single spike at time ti that has been smoothed by agaussian kernel with width σ . The smoothed spike train (estimated ratefunction λ(t)) is then given by

λ(t | σ ) = 1

σ√

2πe

−(t−ti )2

2σ2 t = [0, tD]. (3.4)

For the EV(σ ) curve, we want to know the integrated Var(λ(t | σ )) as afunction of σ . Note we are taking the variance of gaussian curve valueswith respect to the vertical axis. A standard formula for variance is

Var(λ) = 〈λ2〉t − 〈λ〉2t (3.5)

(taking advantage of the ergodicity of λ(t) to go from ensemble averagesto time averages). For σ small relative to tD in equation 3.4, 〈λ2〉t 〈λ〉2

t .Therefore,

Var(λ) ≈ 〈λ2〉t. (3.6)

For an experiment (see equation 2.7) averaging data from p one-spike trialswith the spike at random ti for each trial, the resulting ελ(t) is the sum ofp gaussian curves, each gaussian being attenuated in height by a factor of1/p (variance of each gaussian attenuated by a factor of 1/p2). That gives

Var(ελ) ≈ p(

1p2 〈λ2〉t

)= 1

p〈λ2〉t, (3.7)

assuming that the firing rate and kernel width are both small enough suchthat the p gaussians do not substantially overlap. Inserting this expressioninto equation 3.2,

EV(σ ) = 1n

n∑i=1

⎛⎝ 1

m

m∑j=1

(1p

∫ tD

t=0

⟨λ2

i j

⟩t dt

)⎞⎠. (3.8)

Developing an expression for 〈λ2〉t inside the integral, we square equation3.4:

λ(t | σ )2 = 12πσ 2 e− (t−ti )2

σ2 . (3.9)

1256 S. Lehky

The value of 〈λ2〉t , or mean value of λ(t | σ )2, is the area under λ(t | σ )2

divided by the length of the time period tD:

〈λ(t | σ )2〉t = 12π tD

∫ tD

t=0

1σ 2 e− (t−ti )2

σ2 dt. (3.10)

Equation 3.10 in turn is equal to

〈λ(t | σ )2〉t = erf( tD−ti

σ

) + erf( ti

σ

)4√

π tDσ≈ 1

2√

π tDσ(3.11)

as the error function erf approaches one for σ small relative to tD − ti andti . If we have a spike train with s spikes instead of one, then equation 3.11becomes

〈λ(t | σ )2〉t ≈ s2√

π tDσ, (3.12)

assuming the spike rate is low enough that convolution kernels for thespikes do not substantially overlap. Substituting equation 3.12 into equation3.8 and moving constant terms out of the integral,

EV(σ ) =(

12√

π p1σ

)1n

n∑i=1

⎛⎝ 1

m

m∑j=1

((si j

tD

) ∫ tD

t=0dt

)⎞⎠

=(

tD

2√

π p1σ

)1n

n∑i=1

⎛⎝ 1

m

m∑j=1

(si j

tD

)⎞⎠. (3.13)

The expected value of si j/tD (spikes per unit time) is equal to the meanof the rate function 〈λ(t)〉t , giving us the final result for the variance errorcurve:

EV(σ ) = tD〈λ(t)〉t

2√

π p1σ

. (3.14)

The variance error proportionality constant is

βV = tD〈λ(t)〉t

2√

π p. (3.15)

The inverse relation between variance error and σ seen in equation 3.14is an idealization that is perturbed by two factors in real situations: edgeeffects from a finite duration spike train and overlap between kernels as the


Table 1: Fitted Parameter Values for the Empirical Variance Error Curve,Equation 3.16, for Different Mean Firing Rates 〈λ(t)〉t .

〈λ(t)〉t η0 η1 η2 η3 η4

4 −0.082 −0.534 −0.428 0.130 −0.0126 −0.051 −0.665 −0.338 0.108 −0.0118 −0.057 −0.658 −0.342 0.109 −0.011

16 −0.050 −0.717 −0.310 0.102 −0.01032 0.023 −1.102 −0.051 0.041 −0.00664 0.110 −1.652 0.307 −0.044 0.001

spike rate increases. The idealization holds for σ small relative to the spiketrain duration tD, and for low spike rates such that σ is small relative to themean interspike interval. In a log-log plot, the ideal EV(σ ) is a straight linewith a slope of −1. In the EV(σ ) plots of Figure 6, we see deviations fromthis ideal for large σ . An empirical expression for the actual EV(σ ) is givenby

Ev(σ ) = βV exp

(4∑

i=0

ηi log(σ )i

), (3.16)

with values of ηi in Table 1 fitted from simulations by nonlinear regression.This is a polynomial in log-log space. Deviations from the ideal in theempirical EV(σ ) curve are specific for the particular boundary conditionsin effect.

The right side of the total error curve ET (σ ), which is dominated bybias error, approaches a limit as the number of trials per experimentp → ∞ (dashed lines in Figure 5). That limit is the asymptotic bias error,given by

EasyB(σ ) = 1n

n∑i=1

(∫ tD

t=0(λi (t) − λi (t) ∗ K (t | σ ))2 dt

), (3.17)

where K (t | σ ) is the smoothing kernel with width σ . Essentially EasyB(σ ) isthe integrated bias squared error between λ(t) and a smoothed version ofitself. The equation for asymptotic bias is analogous to that of the regularbias EB(σ ), except that smoothing is performed directly on λ(t) and not onthe spike trains generated from λ(t). As there is no stochastic mapping fromλ(t) to spike trains in this case, the averaging over m experiment replicationscan be omitted.

EasyB(σ ) curves have a sigmoidal form whose shape depends entirelyon Fourier spectral exponent α of λ(t), and not on 〈λ(t)〉t or Var(λ(t)). Plotsof EasyB(σ ) for different values of α are given in Figure 7, normalized on a

1258 S. Lehky

Table 2: Fitted Parameter Values for the Asymptotic Bias Error Curve,Equation 3.18, for Different Values of the Fourier Spectral Exponent α.

α ν1 ν2 ν3 ν4 ν5

1.0 0.75 4.11 1.55 2.33 3.091.5 0.64 5.20 0.86 1.19 22.062.0 0.69 5.46 0.71 1.45 38.742.5 0.44 5.80 0.63 1.67 94.903.0 0.43 5.84 0.62 2.13 88.39

0 200 400 600 800 10000

0.2

0.4

0.6

0.8

1.0

α=1.0α=1.5α=2.0α=2.5α=3.0

no

rma

lize

d e

rro

r

Figure 7: Asymptotic bias error curves, showing how curve shape changes asa function of Fourier spectral exponent α. These curves have been normalizedto a height of one. The asymptotic bias error curves indicate the limit of biaserror as the number of trials per experiment p → ∞ (see the dotted curves inFigure 5).

scale of zero to one. The EasyB(σ ) curves are empirically approximated bythe following equation:

EasyB(σ ) = βasyB

[ν1

ν3√

2π

∫ σ

x=0

1x

e−(ln x−ν2)2

2ν23 dx

+ 1 − ν1

νν45 �(ν4)

∫ σ

x=0xν4−1e− x

ν5 dx

], (3.18)

where � is the gamma function. The parameters ν1,...,5 for the curves, foundby nonlinear regression of equation 3.18 on the simulations, are given inTable 2. This equation is the weighed sum of the equations for the log-normal cumulative distribution function (cdf) and the gamma cdf. Thosecdfs have no probabilistic interpretation here but were empirically cho-sen for their ability to fit the sigmoid EasyB(σ ) curves that arose throughsimulation.


The proportionality constant βasyB in equation 3.18 can be derived analyt-ically. The value of βasyB is limσ→∞ EasyB(σ ). As the width of the gaussiankernel approaches infinity, convolving λ(t) with the kernel produces themean of λ(t):

limσ→∞ [λ(t) ∗ K (t | σ )] = 〈λ(t)〉t. (3.19)

This converts equation 3.17 to

limσ→∞ E(σ ) = βasyB = 1

n

n∑i=1

( ∫ tD

t=0(λi (t) − 〈λi (t)〉t)2 dt

). (3.20)

Multiplying equation 3.20 by tD/tD = 1,

βasyB = 1n

n∑i=1

(tD

(1tD

∫ tD

t=0(λi (t) − 〈λi (t)〉t)2 dt

)). (3.21)

The term in the inner parentheses is the mean of (λi (t) − 〈λi (t)〉)2, so

βasyB = 1n

n∑i=1

(tD〈(λi (t) − 〈λi (t)〉t)2〉t). (3.22)

From the definition of variance, equation 3.22 becomes

βasyB = 1n

n∑i=1

(tD Var (λi (t))). (3.23)

As all λi (t) are defined to have identical variance, we get the final result:

βasyB = tD Var(λ(t)). (3.24)

Empirically, we shall treat the bias error curve as having the same shapeas the asymptotic bias error curve but translated upward. This amounts toreplacing in equation 3.18 the proportionality constant βasyB by βB :

E B(σ ) = βB

[β1

β3√

2π

∫ σ

x=−∞

1x

e−(ln x−β2)2

2β23 dx

+ 1 − β1

ββ45 �(β4)

∫ σ

x=−∞xβ4−1e

xβ5 dx

]. (3.25)

1260 S. Lehky

An analytic expression can be derived for βB , which is equal tolimσ→∞ EB(σ ). As σ → ∞, the variance error EB(σ ) goes to zero (see equa-tion 3.14), so that by equation 3.1, limσ→∞ EB(σ ) = limσ→∞ ET (σ ). From theexpression for ET (σ ) in equation 2.8,

βB = limσ→∞ EB(σ ) = lim

σ→∞ ET (σ )

= limσ→∞

⎛⎝ 1

n

n∑i=1

⎛⎝ 1

m

m∑j=1

( ∫ tD

t=0(λi (t) − ελi j (t | σ ))2 dt

)⎞⎠

⎞⎠ . (3.26)

As the width of the gaussian kernel approaches infinity, convolving thespike train S(t | ti ) with the kernel K (t | σ ) produces the mean of S(t | ti ),which is equal to the number of spikes on the kth trial sk divided by thespike train duration tD, or the mean rate rk for trial k:

λ(t | ti , σ ) = S ∗ K = 〈S(t | ti )〉 = sk

tD= rk . (3.27)

Taking εri j as the mean of rk over p trials in the i, j th experiment convertsequation 3.26 to

βB = 1n

n∑i=1

⎛⎝ 1

m

m∑j=1

( ∫ tD

t=0(λi (t) − εri j )2dt

)⎞⎠. (3.28)

Adding the term 〈λi (t)〉t − 〈λi (t)〉t = 0 into equation 3.28 and rearranginggives

βB = 1n

n∑i=1

(1m

m∑j=1

( ∫ tD

t=0[(λi (t) − 〈λi (t)〉t) − (εri j − 〈λi (t)〉t)]2 dt

)).

(3.29)

Expanding the square within the integral,

βB = 1n

n∑i=1

(1m

m∑j=1

( ∫ tD

t=0(λi (t) − 〈λi (t)〉t)2 dt − 2

∫ tD

t=0(λi (t) − 〈λi (t)〉t)

× (εri j − 〈λi (t)〉t) dt +∫ tD

t=0(εri j − 〈λi (t)〉t)2 dt

)). (3.30)


The εri j − 〈λi (t)〉t term in the middle integral is a constant, and the λi (t) −〈λi (t)〉t term integrates to zero, so the middle integral disappears, leaving

βB = 1n

n∑i=1

(1m

m∑j=1

( ∫ tD

t=0(λi (t) − 〈λi (t)〉t)2 dt

+∫ tD


)). (3.31)

The first integral in equation 3.31 is βasyB (see equations 3.20 and 3.24), so:

βB =βasyB + 1n

n∑i=1

⎛⎝ 1

m

m∑j=1

( ∫ tD


)⎞⎠

= tDVar(λ(t)) + 1n

n∑i=1

⎛⎝ 1

m

m∑j=1

( ∫ tD


)⎞⎠. (3.32)

Both εri j and 〈λi (t)〉t are constants as a function of time, so they may betaken outside the integral, producing

βB = tDVar(λ(t)) + tD1n

n∑i=1

⎛⎝ 1

m

m∑j=1

(εri j − 〈λi (t)〉t)2

⎞⎠. (3.33)

As the mean rate 〈λi (t)〉t is defined as identical for all i , the expected valueof 〈εr〉t = 〈λi (t)〉t , where 〈εr〉t is the observed mean firing rate over m · nexperiments. Therefore:

βB = tD Var(λ(t)) + tD1n

n∑i=1

⎛⎝ 1

m

m∑j=1

(εri j − 〈εr〉t)2

⎞⎠

= tD Var(λ(t)) + tD〈Var(εr )〉t

= tD Var(λ(t)) + tD Var(εr ), (3.34)

where Var(εr ) is the variance of the mean firing rate among replicated exper-iments. Since Var(εr ) results from averaging over p trials per experiment,in terms of a single-trial r , equation 3.34 becomes

βB = tD Var(λ(t)) + tDVar(r )

p, (3.35)

where Var(r ) is the variance of the mean rate between individual trials.

1262 S. Lehky

Var(r ) in the above expression can be reexpressed in terms of mean rate〈r〉t using the Fano factor. The Fano factor with respect to the number ofspikes per trial s is

Fs = Var(s)〈s〉t

. (3.36)

For a kth-order gamma process, Fs = 1/k, with Poisson processes being aspecial case with k = 1. It should be emphasized that this is the Fano factorfor the underlying stochastic spiking process and not the rate function λ(t).In other words, it is the Fano factor for a spike train when the spike rate isconstant over the entire trial. Going to the Fano factor for spike count (Fr )from the Fano factor for spike rate (Fs) gives

Fr = Var(r )〈r〉t

= Var(s/tD)〈s/tD〉t

=1t2D

Var(s)1tD

〈s〉t= 1

tD

Var(s)〈s〉t

= Fs

tD. (3.37)

From equation 3.37,

Var(r ) = Fr 〈r〉t = Fs

tD〈r〉t. (3.38)

Substituting equation 3.38 into equation 3.35 produces

βB = tD Var(λ(t)) + Fs

p〈r〉t. (3.39)

The expected value of the mean firing rate 〈r〉t estimated over m · n · p trialsis equal to the actual mean firing rate 〈λ(t)〉t , giving the final result:

βB = tDVar(λ(t)) + Fs

p〈λ(t)〉t. (3.40)

From an examination of equations 3.24 and 3.40, as trials per experimentp → ∞, βB → βasyB , consistent with the simulations shown in Figure 5.

At this point we introduce the concept of the spike density of an exper-iment. As described earlier, each experiment consists of p trials, with alltrials having an identical rate function λ(t) with mean 〈λ(t)〉t . The expectedspike density δ for the experiment is

δ = p · 〈λ(t)〉t. (3.41)


Spike density is associated with an experiment as a whole, whereas spikerate is associated with individual trials. The inverse of spike density is thepooled mean interspike interval I for an experiment:

I = 1δ. (3.42)

Given that bit of nomenclature, we can now turn to the issue of the optimalvalue of σ , which minimizes the ET (σ ) curve.

Under certain conditions, changing mean rate 〈λ(t)〉t translates the to-tal error curve ET (σ ) vertically while leaving its shape invariant under alogarithmic scale. The significance of maintaining the shape invariance ofET (σ ) is that the value of σ that minimizes ET (σ ), σopt , remains constant. Forerror curve shape invariance to happen, the ratio of the variance error andbias error proportionality constants, βB/βV , must be constant as a functionof 〈λ(t)〉t . Using the definition of spike density δ (see equation 3.41), theproportionality constants βV (see equation 3.15) and βB (see equation 3.40)become

βV = tD

2√

π

〈λ(t)〉2t

δ(3.43)

βB = tD Var(λ(t)) + Fn〈λ(t)〉2

t

δ. (3.44)

Their ratio is

βB

βV= 2

√π

(δ Var(λ(t))

〈λ(t)〉2t

+ Fn

tD

)

= 2√

π

(δ

(Std(λ(t))〈λ(t)〉t

)2

+ Fn

tD

). (3.45)

The term within the inner bracket, the ratio of the standard deviation to themean of λ(t), is the coefficient of variation of λ(t):

CV(λ) = Std(λ(t))〈λ(t)〉t

. (3.46)

By substitution, equation 3.45 becomes

βB

βV= 2

√π

(δC2

V + Fn

tD

). (3.47)

1264 S. Lehky

100

101

102

103

100

102

104

err

or

4 spikes/sec x 256 trials16 spikes/sec x 64 trials64 spikes/sec x 16 trials

Figure 8: A set of error curves produced when the following three parameterswere held constant: the spike density δ of the data set, the coefficient of variationCV , and the Fourier exponent α of the rate function. This shows that the shapesof error curves are nearly invariant when these parameters are held constant,and consequently have their minima σopt at approximately the same values.Curves are vertically shifted from each other as a function of mean firing rate.Spike density is the mean of the firing rate function 〈λ(t)〉t times the numberof trials per experiment p. For all three curves, spike density was fixed at1024 spikes/sec. The coefficient of variation CV of λ(t) was set to 0.1 for thesecurves, and the Fourier exponent α was 2.

Thus, a condition for shape invariance of the ET (σ ) curve at different meanfiring rates is for the spike density δ (or, equivalently, its inverse I ) tobe constant as well as for CV of λ(t) to be constant. In addition to thesetwo parameters, which affect the shape of ET (σ ) by vertically shifting therelative positions of the underlying bias and variance error componentswithout changing the shapes of the component curves themselves, thereis a third parameter (not included in equation 3.47) that affects ET (σ ) bychanging the shape of the EB(σ ) component. That third parameter is theFourier spectral exponent α of λ(t), which should also be constant in orderto hold the shape of ET (σ ) constant. These three variables controlling theshape of the ET (σ ) curve (δ, CV , and α) are all parameters describing thespike rate function λ(t). In addition, from equation 3.47, we see that theFano factor Fn of the spike generation process (whether it is a Poissonprocess, gamma process, or something else) also affects the ET (σ ) curveshape, independent of the characteristics of λ(t), although it will be shownbelow that this parameter has a relatively small effect.

Shape invariance of ET (σ ) is illustrated in Figure 8, where the mean rate〈λ(t)〉t is varied while δ, CV , and α are held constant (using a Poisson spike-generation process). The shape invariance is approximate due to deviationsfrom the ideal of both the EV(σ ) and EB(σ ) components of ET (σ ) at higherspike rates. By determining the parameters that control the shape of the


100

101

102

optim

al s

igm

a (

ms)

C v=0.05C v=0.10C v=0.20

α=2.0

b.

pooled mean interspike interval (ms)

optim

al s

igm

a (

ms)

10-2

10-1

100

101

α=1.0α=2.0α=3.0

C v=0.10

100

101

102

a.

Figure 9: The optimal value of the gaussian width σopt plotted as a functionof the pooled mean interspike time I . The pooled mean interspike time wascalculated after combining multiple replication trials in an experiment andis the reciprocal of spike density. (a) Three σopt versus I curves generated bychanging the coefficient of variation CV of the rate function λ(t). (b) Threeσopt versus I curves generated by changing the Fourier spectral exponent α

of the rate function λ(t). In both panels, the curves show a discontinuity toσopt → ∞ at the point when the intermediate dip in the error curves disappears,and instead they monotonically decline to some asymptotic value.

total error curve ET (σ ), we therefore know the parameters that potentiallyaffect the value of σ where the curve minimum is located, which is σopt .Thus, we have identified three parameters that may affect σopt : which are δ

(or I ), CV , and α.The value of σopt from simulation data is plotted in Figure 9 as a function

of the pooled mean interspike interval I . The straight lines in the log-logplot indicate a power law relationship:

σopt = a I b . (3.48)

1266 S. Lehky

Table 3: Functional Relationships Underlying the Power Law σopt = a I b Gov-erning the Optimal Smoothing Width, Equation 3.48, for First-Order (Poissonprocess), Second-Order, and Third-Order Gamma Processes.

Order a b

1 6.41C−1.17v 0.87α−0.57

2 5.54C−1.24v 0.84α−0.57

3 4.52C−1.16v 0.88α−0.51

Figure 9a illustrates that a is primarily a function of CV and is inverselyproportional to it, although the Fourier spectral exponent α also affectsit to a lesser extent. Figure 9b illustrates that b primarily is a function ofα and is inversely proportional to it, although affected to a lesser extentby CV .

Formulas for a and b in equation 3.48 were estimated through nonlinearfitting of the simulation data and are given in Table 3. That includes resultsfor a Poisson process (first-order gamma process), as well as second- andthird-order gamma processes.

The results in Table 3 show that different renewal processes lead tosimilar functions for σopt , suggesting that the characteristics of the spikegeneration process are of much less importance for setting σopt than thestatistics of the rate function λ(t). That is not surprising when one considersthat none of the analytic expressions derived here is dependent on thenature of the spike generation process, with the exception of the Fano factorin the equation for βB—equation 3.39.

If σopt is indeed largely independent of the spike generation process, thenit would be expected that the power law estimates provided here wouldalso be valid if there was a deterministic relationship between spike tim-ing and membrane voltage (Mainen & Sejnowski, 1995; Tiesinga, Fellous,& Sejnowski, 2008) rather than a stochastic one. In any case, even with adeterministic spike generation process, responses of a neuron to repeatedtrials of the same stimulus would remain effectively stochastic due to varia-tions in the dynamical state of the large-scale network in which the neuronis embedded (Arieli, Sterkin, Grinvald, & Aertsen, 1996).

Although the power law in equation 3.48 is an approximation, evensubstantial uncertainties in estimating σopt will generally not lead ET (σ )values far from the minimum, because ET (σ ) curves are very broad andshallow (see Figure 5). A noteworthy aspect of equation 3.48 is that σopt canbe either greater than or less than the pooled mean interspike interval I ,depending on parameter values.

Equation 3.48 breaks down for small values of βB/βV , which, for exam-ple, would occur for data with small spike densities. For βB/βV > 0.012(using the ideal equation for βV , equation 3.15), the intermediate minima


100

101

102

10-2

10-1

100

101

102

α=2

C v=0.10

error curve min

pooled mean interspike interval (ms)

optim

al s

igm

a (

ms)

Figure 10: Comparison of optimal gaussian width σopt determined from theminima of total error curves (taken from Figure 9a) with σopt determined by theSheather-Jones algorithm.

seen in the curves of Figure 8 disappear, so that the curves monotonicallydecline to some asymptote. (Examples of monotonically declining curvescan be seen in Figure 5.) That leads to a discontinuity from the relationshipbetween σopt and I , causing σopt to go to infinity abruptly. When the em-pirical equation for βV is used (see equation 3.16), the breakdown is moregradual and occurs for approximately βB/βV > 0.005. These discontinu-ities are indicated in Figure 9, occurring when the pooled mean interspikeinterval becomes sufficiently large.

The estimates of σopt in Figure 9 are based on simulations that gener-ated entire error curves as a function of σ and then found the minima ofthose curves. However, there exist algorithms for estimating the optimalwidth of the smoothing kernel directly from a single data sample, of whichthe Sheather-Jones algorithm (Sheather & Jones, 1991) is currently amongthe most widely used. A comparison of our simulation results with theoutput of the Sheather-Jones algorithm is shown in Figure 10. The compar-ison indicates that the Sheather-Jones algorithm underestimates σopt , andthe underestimation become more pronounced as the pooled mean inter-spike interval of the data increases (spike density decreases). Moreover, theSheather-Jones algorithm fails to pick up the discontinuity at very low spikedensities where σopt → ∞, at which point the best estimate of the temporalwaveform of λ(t) is a constant equal to its mean 〈λ(t)〉t .

4 Conclusion

It has been generally understood in the past that σopt is directly related tomean interspike interval (or modifying that concept here, the pooled mean

1268 S. Lehky

interspike spike interval of a multitrial experiment). Refining that, we showhere a power law relationship between σopt and the interspike intervalI , and highlight two additional parameters affecting σopt : the coefficientof variation CV and the Fourier spectral exponent α of the rate functionλ(t). Of the three parameters, I and CV can be estimated from the spiketrain data set itself. The third one, α, would need auxiliary data to assigntypical values, possibly based on the Fourier spectra of somal membranevoltages obtained using the whole-cell patch clamping technique (Ferster& Jagadeesh, 1992).

A limitation of this analysis is that it treats λ(t) as if it were statisticallyhomogeneous (i.e., the statistical parameters underlying the rate functiondo not vary as a function of time). In reality, λ(t) may change its positionin parameter space if the functional state of the neuron (and the networkin which it is embedded) changes, as might occur for changed perceptualinput, cognitive state, or motor output of the organism. An approach todealing with this would be to select the kernel width σ adaptively so that itis no longer a constant but becomes σ (t). That requires that trials be brokeninto multiple time periods and σ (t) chosen based on local conditions withineach period. Implicit in the use of low CV values in this study (ranging from0.05 to 0.2) was the assumption of a local analysis. An entire trial in whichthe firing rate ranged from spontaneous activity to a vigorous responsewould typically have a much higher CV .

Although the emphasis here has been on accurately recovering thetemporal waveform of a neural response for data analysis purposes, thesame conceptual framework might be transferable to issues in neural sig-nal transmission in vivo. The overall transfer function between a spiketrain and the postsynaptic dendritic voltage waveform, lumping togethermultiple elements of the synaptic transmission process, can be treated asa low-pass filter or, equivalently, convolution of the spike train by somesmoothing kernel (neglecting active membrane properties, if any, withinthe dendrite). That is analogous to the process we have been studyinghere. Within this framework, one question to ask is, How accurately isthe somatic voltage waveform of the presynaptic cell recovered in thedendritic voltage waveform? A related question is, What are the char-acteristics of the optimal synaptic transfer function (smoothing kernel)that minimizes error in communicating analog signals between neuronsthrough an intervening spike train (granting that analog signals aris-ing from nearby synapses may interact nonlinearly within the dendritictree)?

By use of both simulations and mathematical analysis, we have gained agreater understanding of how to optimally recover the analog rate functionunderlying an ensemble of spike trains using kernel smoothing. This haspractical applications for data analysis of spike train data and may also berelevant to understanding neural signal processing in vivo.


Appendix: Symbols Used

σ Width of gaussian smoothing kernelσopt Optimal smoothing widthλ(t) Instantaneous spike rate as a function of timeλ(t, σ ) Estimated spike rate function using kernel width σ

ελ(t, σ ) Estimated spike rate function for an experiment ε, averagingover multiple trials

f Frequency (Hz)α Exponent of the power law describing the Fourier amplitude

spectrum of λ(t)ET (σ ) Mean integrated square error (total error) between actual and

estimated spike rate functions, as a function of kernel width σ

EV(σ ) Integrated variance error component of total errorEB(σ ) Integrated bias squared error component of total errorEasyB(σ ) Integrated asymptotic bias squared errorS(t | ti ) Spike train with spikes occurring at times tis Spike count for a spike trainrk Mean spike rate for trial k of an experimentK (t | σ ) Smoothing kernel in the time domainH(ω | σ ) Smoothing kernel in the frequency domaint TimetD Spike train duration (or trial duration) in secondsp Number of trials per experiment, using the same λ(t) in each

trialm Number of experiment replications, using the same λ(t) for each

replicationn Number of experiment groups, using different λ(t) for each

groupβV Integrated variance error proportionality constantηi Parameters for the empirical variance error functionβB Integrated bias error proportionality constantβasyB Integrated asymptotic bias error proportionality constantνi Parameters for the empirical asymptotic bias error functionFs Fano factor of spike generation process with respect to spike

countFr Fano factor of spike generation process with respect to spike

rateδ Pooled spike density in spikes/sec, adding (not averaging) data

from multiple trials in an experimentI Pooled mean interspike interval, equal to 1/δ

CV Coefficient of variation of λ(t)〈〉t Mean value operator with respect to timeVar ( ) VarianceStd ( ) Standard deviation

1270 S. Lehky

References

Arieli, A., Sterkin, A., Grinvald, A., & Aertsen, A. (1996). Dynamics of ongoingactivity: Explanation of the large variability in evoked cortical responses. Science,273, 1868–1871.

August, D. A., & Levy, W. B. (1996). A simple spike train decoder inspired by thesampling theorem. Neural Computation, 8, 67–84.

Bialek, W., Rieke, F., de Ruyter van Steveninck, R. R., & Warland, D. (1991). Readinga neural code. Science, 252, 1854–1857.

Bowman, A. W., & Azzalini, A. (1997). Applied smoothing techniques for data analysis.Oxford: Oxford University Press.

Carandini, M., Mechler, F., Leonard, C. S., & Movshon, J. A. (1996). Spike trainencoding by regular-spiking cells of the visual cortex. Journal of Neurophysiology,76, 3425–3441.

Felleman, D. J., & Van Essen, D. C. (1991). Distributed hierarchical processing in theprimate cerebral cortex. Cerebral Cortex, 1, 1–47.

Ferster, D., & Jagadeesh, B. (1992). EPSP-IPSP interactions in cat visual cortex studiedwith in vivo whole-cell patch recording. Journal of Neuroscience, 12, 1262–1274.

Goldberg, D. H., & Andreou, A. G. (2007). Distortion of neural signals by spikecoding. Neural Computation, 19, 2797–2839.

Jones, M. C., Marron, J. S., & Sheather, S. J. (1996). A brief survey of bandwidthselection for density estimation. Journal of the American Statistical Society, 91, 401–407.

Lehky, S. R. (2004). Bayesian estimation of stimulus responses in Poisson spike trains.Neural Computation, 16, 1325–1343.

Mainen, Z. F., & Sejnowski, T. J. (1995). Reliability of spike timing in neocorticalneurons. Science, 268, 1503–1506.

Nowak, L. G., Sanchez-Vivez, M. V., & McCormick, D. A. (1997). Influence of lowand high frequency inputs on spike timing in visual cortical neurons. CerebralCortex, 7, 487–501.

Parzen, E. (1962). On estimation of a probability density function and mode. Annalsof Mathematical Statistics, 33, 1065–1076.

Paulin, M. G., & Hoffman, L. F. (2001). Optimal firing rate estimation. Neural Net-works, 14, 877–881.

Richmond, B. J., Optican, L. M., Podell, M., & Spitzer, H. (1987). Temporal encodingof two-dimensional patterns by single units in primate inferior temporal cortex.I. Response characteristics. Journal of Neurophysiology, 57, 132–146.

Rieke, F., Warland, D., de Ruyter van Steveninck, R., & Bialek, W. (1997). Spikes:Exploring the neural code. Cambridge, MA: MIT Press.

Schroeder, M. (1991). Fractals, chaos, and power laws: Minutes from an infinite universe.New York: Freeman.

Sheather, S. J., & Jones, M. C. (1991). A reliable data-based bandwidth selectionmethod for kernel density estimation. Journal of the Royal Statistical Society B, 53,683–690.

Silverman, B. W. (1986). Density estimation for statistics and data analysis. Boca Raton,FL: Chapman & Hall/CRC Press.


Tiesinga, P., Fellous, J. M., & Sejnowski, T. J. (2008). Regulation of spike timing invisual cortical circuits. Nature Reviews Neuroscience, 9, 97–107.

Wand, M. P., & Jones, M. C. (1995). Kernel smoothing. Boca Raton, FL: Chapman &Hall/CRC Press.

Received July 22, 2008; accepted September 14, 2009.

Documents

Decoding Poisson Spike Trains by Gaussian Filtering - SNLsidney/Papers/LehkyGaussianSmoothing2010.pdf · LETTER Communicated by Paul Tiesinga Decoding Poisson Spike Trains by Gaussian