13
Cortical Tracking of Surprisal during Continuous Speech Comprehension Hugo Weissbart 1 , Katerina D. Kandylaki 1,2 , and Tobias Reichenbach 1 Abstract Speech comprehension requires rapid online processing of a continuous acoustic signal to extract structure and meaning. Previous studies on sentence comprehension have found neural correlates of the predictability of a word given its context, as well as of the precision of such a prediction. However, they have fo- cused on single sentences and on particular words in those sen- tences. Moreover, they compared neural responses to words with low and high predictability, as well as with low and high precision. However, in speech comprehension, a listener hears many suc- cessive words whose predictability and precision vary over a large range. Here, we show that cortical activity in different frequency bands tracks word surprisal in continuous natural speech and that this tracking is modulated by precision. We obtain these results through quantifying surprisal and precision from naturalistic speech using a deep neural network and through relating these speech features to EEG responses of human volunteers acquired during auditory story comprehension. We find significant cortical tracking of surprisal at low frequencies, including the delta band as well as in the higher frequency beta and gamma bands, and observe that the tracking is modulated by the precision. Our results pave the way to further investigate the neurobiology of natural speech comprehension. INTRODUCTION To understand spoken language, a listener must rapidly process information that unfolds over several timescales, including the duration of syllables at around 150 msec, words of about 300 msec, and phrases of 1 sec (Giraud & Poeppel, 2012). Recent studies have shown that corti- cal activity in the delta, theta, and gamma frequency bands tracks acoustic features of speech such as the speech envelope as well as phonemic features (Ding et al., 2018; Di Liberto, OSullivan, & Lalor, 2015; Ding & Simon, 2014; Zion Golumbic et al., 2013; Lakatos, Chen, OConnell, Mills, & Schroeder, 2007). This cortical tracking of speech features has accordingly been pro- posed to reflect neural mechanisms of speech process- ing, for instance, an online segmentation of speech into acoustic speech tokens such as phonemes that occur on the timescale of a few hundreds of milliseconds (Hyafil, Fontolan, Kabdebon, Gutkin, & Giraud, 2015; Giraud & Poeppel, 2012). The processing of higher level linguistic information in speech may employ cortical tracking as well. Recent find- ings showed that cortical activity in the delta and theta frequency bands synchronized to sequential cues such as the rhythm of phrases and sentences in continuous speech (Keitel, Gross, & Kayser, 2018; Ding, Melloni, Zhang, Tian, & Poeppel, 2016), to hierarchical cues such as context-free grammar structure (Brennan & Hale, 2019), as well as to the semantic dissimilarity between successive words (Broderick, Anderson, Di Liberto, Crosse, & Lalor, 2018). An important property of word sequences is that they can allow the prediction of an upcoming word, resulting in a word expectation. The degree to which a word can be predicted is referred to as precision and reflects the certainty with which a neural population generates its prediction. Predictions and precision are both closely re- lated to putative implementations of predictive process- ing (Heilbron & Chait, 2018; Kanai, Komura, Shipp, & Friston, 2015; Feldman & Friston, 2010). Behavioral stud- ies have indeed corroborated that the brain makes pre- dictions about upcoming speech segments: Words can be better distinguished from noise when transition prob- abilities between words are high rather than low (Miller, Heise, & Lichten, 1951), and a highly expected word can be perceived as heard even when obscured by noise (Miller & Isard, 1963). Neurophysiological research on ERPs elicited by a word in a sentence has shown that the brain response to a word reflects the word expectancy through modula- tion of the N400 response (Kutas & Hillyard, 1984). Although this response has not been found to be fur- ther modulated by the precision of the prediction (Federmeier, Wlotko, De Ochoa-Dewald, & Kutas, 2007), precision can influence the neural power in the alpha and theta bands (Rommers, Dickson, Norton, Wlotko, & Federmeier, 2017). The power in the beta fre- quency band has been found to be reduced by semantic 1 Imperial College London, 2 Maastricht University © 2019 Massachusetts Institute of Technology Journal of Cognitive Neuroscience 32:1, pp. 155166 https://doi.org/10.1162/jocn_a_01467

Cortical Tracking of Surprisal during Continuous Speech

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Cortical Tracking of Surprisal during Continuous Speech

Cortical Tracking of Surprisal duringContinuous Speech Comprehension

Hugo Weissbart1 Katerina D Kandylaki12 and Tobias Reichenbach1

Abstract

Speech comprehension requires rapid online processing of acontinuous acoustic signal to extract structure and meaningPrevious studies on sentence comprehension have found neuralcorrelates of the predictability of a word given its context as wellas of the precision of such a prediction However they have fo-cused on single sentences and on particular words in those sen-tences Moreover they compared neural responses to words withlow and high predictability as well as with low and high precisionHowever in speech comprehension a listener hears many suc-cessive words whose predictability and precision vary over a largerange Here we show that cortical activity in different frequency

bands tracks word surprisal in continuous natural speech and thatthis tracking is modulated by precision We obtain these resultsthrough quantifying surprisal and precision from naturalisticspeech using a deep neural network and through relating thesespeech features to EEG responses of human volunteers acquiredduring auditory story comprehension We find significant corticaltracking of surprisal at low frequencies including the delta bandas well as in the higher frequency beta and gamma bands andobserve that the tracking is modulated by the precision Ourresults pave the way to further investigate the neurobiology ofnatural speech comprehension

INTRODUCTION

To understand spoken language a listener must rapidlyprocess information that unfolds over several timescalesincluding the duration of syllables at around 150 msecwords of about 300 msec and phrases of 1 sec (Giraudamp Poeppel 2012) Recent studies have shown that corti-cal activity in the delta theta and gamma frequencybands tracks acoustic features of speech such as thespeech envelope as well as phonemic features (Dinget al 2018 Di Liberto OrsquoSullivan amp Lalor 2015 Dingamp Simon 2014 Zion Golumbic et al 2013 LakatosChen OrsquoConnell Mills amp Schroeder 2007) This corticaltracking of speech features has accordingly been pro-posed to reflect neural mechanisms of speech process-ing for instance an online segmentation of speech intoacoustic speech tokens such as phonemes that occur onthe timescale of a few hundreds of milliseconds (HyafilFontolan Kabdebon Gutkin amp Giraud 2015 Giraud ampPoeppel 2012)The processing of higher level linguistic information in

speech may employ cortical tracking as well Recent find-ings showed that cortical activity in the delta and thetafrequency bands synchronized to sequential cues suchas the rhythm of phrases and sentences in continuousspeech (Keitel Gross amp Kayser 2018 Ding MelloniZhang Tian amp Poeppel 2016) to hierarchical cues suchas context-free grammar structure (Brennan amp Hale

2019) as well as to the semantic dissimilarity betweensuccessive words (Broderick Anderson Di LibertoCrosse amp Lalor 2018)

An important property of word sequences is that theycan allow the prediction of an upcoming word resultingin a word expectation The degree to which a word canbe predicted is referred to as precision and reflects thecertainty with which a neural population generates itsprediction Predictions and precision are both closely re-lated to putative implementations of predictive process-ing (Heilbron amp Chait 2018 Kanai Komura Shipp ampFriston 2015 Feldman amp Friston 2010) Behavioral stud-ies have indeed corroborated that the brain makes pre-dictions about upcoming speech segments Words canbe better distinguished from noise when transition prob-abilities between words are high rather than low (MillerHeise amp Lichten 1951) and a highly expected word canbe perceived as heard even when obscured by noise(Miller amp Isard 1963)

Neurophysiological research on ERPs elicited by aword in a sentence has shown that the brain responseto a word reflects the word expectancy through modula-tion of the N400 response (Kutas amp Hillyard 1984)Although this response has not been found to be fur-ther modulated by the precision of the prediction(Federmeier Wlotko De Ochoa-Dewald amp Kutas2007) precision can influence the neural power in thealpha and theta bands (Rommers Dickson NortonWlotko amp Federmeier 2017) The power in the beta fre-quency band has been found to be reduced by semantic1Imperial College London 2Maastricht University

copy 2019 Massachusetts Institute of Technology Journal of Cognitive Neuroscience 321 pp 155ndash166httpsdoiorg101162jocn_a_01467

and syntactic violations and may therefore relate to wordexpectation as well (Kielar Meltzer Moreno Alain ampBialystok 2014 Bastiaansen Magyari amp Hagoort 2010Davidson amp Indefrey 2007) Gamma power has been ob-served to increase when a word is highly predictable butnot when its predictability is low (Molinaro Barraza ampCarreiras 2013 Wang Zhu amp Bastiaansen 2012)

However these prior studies on neural correlates ofword expectancy and precision have focused on specificwords in single sentences contrasting words with highand low expectancy as well as with high and low precisionBut natural speech often consists of many sentences andthe expectancy and the corresponding precision of suc-cessive words take a range of values that do not fall in onlytwo classes of ldquohighrdquo and ldquolowrdquo It therefore remains un-clear how neural responses to word expectancy and pre-cision correlate with this graded variability

Furthermore assessing the cortical responses to thelinguistic features of successive words in naturalisticstories allows to quantify the cortical tracking of these fea-tures A recent investigation on word predictability and hi-erarchical structure in naturalistic speech used such anapproach to show cortical tracking of word surprisal butdid not investigate an influence of precision and did notinvestigate power modulation in higher frequency bands(Brennan amp Hale 2019 Frank amp Willems 2017)

Here we therefore set out to investigate cortical track-ing including through power modulation in higher fre-quency bands of word surprisal and the precision ofword prediction in naturalistic stories The surprisal of aword denotes the log-transformed conditional probabilityof a word based on the preceding context The surprisalhas been argued to relate to processing load (Levy 2008)and predicts reading time (Frank Otten Galli ampVigliocco 2015 Smith amp Levy 2013) Precision is the in-verse of the entropy of the conditional probability distri-bution over a close vocabulary set We quantified wordsurprisal and precision from naturalistic stories using lan-guage modeling as estimated by a recurrent neural net-work and then related the obtained word features toEEG responses of volunteers who listened to the stories

METHODS

Participants

Thirteen participants (aged 25 plusmn 3 years six women) par-ticipated in the experiment The volunteers were allright-handed native English speakers They had no his-tory of hearing or neurological impairment All participantsprovided written informed consent The experimentalprocedures were approved by the Imperial CollegeResearch Ethics Committee

Experimental Design

We used naturalistic speech narratives in the participantsrsquonative language (English) The experiment consisted of one

session in which we measured EEG responses to the shortstories Gilrayrsquos Flower Pot and My Brother Henry by J MBarrie as well as An Undergraduatersquos Aunt by F Anstey(Patten 1910) The stimuli were sourced from the publicdomain librivoxorg and were spoken by a male voice Thecorresponding text was obtained from Project Gutenberg(wwwgutenbergorgebooks32846) The audio materialwas presented in 15 parts each of which were 26 plusmn043 min long The total length of the stories was 40 minAfter each part of a story participants answered compre-hension questions about what they just heard These ques-tions were presented as multiple-choice questions on amonitor Participants were asked 30 questions in total

Language Modeling

We used computational linguistics methods to quantifylinguistic features in the stories Specifically we em-ployed statistical language modeling to compute wordfrequency entropy and suprisal from the text of thestoriesWord frequency is a property of each individual word

out of context which was computed from GoogleN-grams by using only the unigram values This word fea-ture is an estimate of the unconditional probability of theoccurrence of a word w P(w) We use the negative log-arithm of this probability such that all our information-theoretical word features are expressed in the same unitBoth entropy and surprisal follow from conditional

probabilities of a particular word given the precedingwords We denote by P(wm|w1 hellip wmminus1) the condition-al probability of themth word in the sequence wm giventhe previous m minus 1 words w1 w2 hellip wmminus1 Taking thenegative logarithm of this probability yields the ldquosurpris-alrdquo value S(wm) for that word

S wmeth THORN frac14 minus log P wm w1 hellip wmminus1j THORNeth THORNeth (1)

The surprisal also referred to as self-information or in-formation content quantifies the information gain thatan upcoming word generates with respect to the priorsequence of words It can be related to how unexpecteda word is given the previous words in the sentenceInasmuch as surprisal informs about expected wordsprecision relates to the confidence about the predictionsmade (Koelsch Vuust amp Friston 2018) A high precisiontranslates into a high confidence about a word expecta-tion meaning that the word is predictableThe entropy E(m) of the prediction of the mth word

wm that is the uncertainty for predicting the word wm

from the context (w1 hellip wmminus1) is given by the sumof the conditional probabilities for each possible wordwk weighted by the logarithm of this probability Inother words the entropy is the expected surprisal

E meth THORN frac14X

wk

p wk w1 hellip wmminus1j THORN log P wk w1 hellipwmminus1j THORNeth frac12eth

(2)

156 Journal of Cognitive Neuroscience Volume 32 Number 1

The precision of the mth word wm follows as the inverseof entropy 1E(m) We note that the precision of the mthword is not a function of that word itself but of theprobability distribution of the words at that positionThe conditional probabilities for the different words in

the sequence given the preceding words were com-puted through a recurrent neural network languagemodel (Graves 2013 Bengio Ducharme Vincent ampJauvin 2003) The network had a hidden layer with recur-rent connections to encode previous input Such net-works are particularly useful for processing sequencesand have previously been successfully applied to languagemodeling (Graves 2013 Bengio et al 2003) In particulara recurrent neural network can capture long-term depen-dencies of variable length by encoding preceding wordsthrough its recurrent connection into the state of the hid-den neurons This is enabled by a careful balance betweenshort- and long-term memory and means that there is inprinciple no limit on the number of preceding words thatsuch a network can take into account (Pascanu Mikolov ampBengio 2013) This contrasts withN-gram languagemodelsfor instance that are limited to a context window ofNminus 1words (Brown Desouza Mercer Pietra amp Lai 1992)The network was implemented using the feature-

augmented recurrent neural network language modelingtoolkit (Mikolov Kombrink Burget Černockyacute ampKhudanpur 2011) To decrease the computational timerequired for training this toolbox assigns words to clas-ses and factorizes the output layer into a part that de-scribes the probability of each class given the previouswords as well as another part that describes the proba-bility of each word within a class given the previouswords This factorization yields a significant decrease intraining time at a small cost to accuracy importantlythe network still computes the probability of individualwords following the previous words (Mikolov et al2011) We employed 300 classes As an embedding layerwe used the pretrained global vectors for word represen-tation trained on the Wikipedia 2014 and the Gigaword5 data sets (Pennington Socher amp Manning 2014)The recurrent layer encompassed 350 hidden unitsThe source code was customized to compute the entropyof each word a feat that the original code did not allowThe neural network was then trained on the text8 data setthat consists of 100 MB of data from Wikipedia (Mahoney2011) using back propagation through time truncated tofive words with a starting learning rate of 01 The datawere cleaned to remove punctuation html tags capitali-zation and numbers before training Because the networkcan only train well on words that appear frequentlyenough in the training data to allow meaningful trainingwe limited the vocabulary to the 35000 most commonwords in the training data set The remaining words weremapped to an ldquounknownrdquo token Infrequent words in thestories such as compound nouns used for style that ap-peared repeatedly throughout the stories did thereforenot obscure the results

The output of the recurrent neural network was ob-tained from a softmax function and could therefore beinterpreted as the probability distribution for an upcom-ing word given the preceding words in the input se-quence The network was therefore trained to predictthe next word that is to compute an output that wasas close as possible to a probability distribution thatwas one for the actual upcoming word and zero for allremaining ones The trained network was then run onthe stories that the participants heard Precision and sur-prisal of each word were determined from the networkrsquoscomputed probability distribution at the correspondingword through Equations (1) and (2)

Speech Features

To relate surprisal and entropy to the EEG data we con-structed a time series for each linguistic feature We firstaligned each word of the speech to the acoustic signalthrough forced alignment using the Prosodylab-Alignersoftware (Gorman Howell amp Wagner 2011) We therebyobtained the time at which each word began To con-struct features for surprisal and for precision that werealigned with the speech stimuli we assigned each ofthe time points where a new word started a spike of amagnitude that corresponded to the surprisal and preci-sion of that word (Figure 1A) A similar procedure hasbeen employed recently for assessing neural responsesto the semantic dissimilarity of consecutive words(Broderick et al 2018)

Because surprisal and precision are high-level linguisticfeatures of speech we sought to ascertain that any puta-tive cortical tracking of them could not be explained bylower level features To this end we added three low-level speech features First cortical activity can trackthe onset of words which can partly be based on changesin the acoustics at word boundaries and partly result fromthe brainrsquos parsing of the acoustic signal to form discretelinguistic units (Brodbeck Presacco amp Simon 2018Ding amp Simon 2014) To account for this onset responsewe constructed a word onset feature as a series of spikeseach of which had unit amplitude and was located at theonset of a word Second we computed the word positionwithin a sentence The latter can be correlated with pre-cision as the entropy tends to decrease across wordswithin the sentence The word position feature thereforeserved as a control to ensure that the neural response toprecision is distinct from any incremental processing oc-curring throughout a sentence Third the frequency of aword in a given language outside its context is a linguis-tic feature that acts as a prior probability for computingthe probability of a word in a sequence (Brodbeck et al2018) Word frequency can also interfere with surprisalLess frequent words may indeed often bemore surprisingTo capture the share of the neural response that could beexplained away by word frequency we included the latteras a third linguistic feature This feature was computed by

Weissbart Kandylaki and Reichenbach 157

scaling the amplitude of the spike at each word onset bythe negative logarithmof the frequency of the correspond-ing word The logarithm was used such that word fre-quency and surprisal were expressed in the same units

Finally to investigate a possible modulating effect thatprecision may have on surprisal we added an interactionterm ldquoSurprisal times Precisionrdquo This was computed bymultiplying precision values with surprisal such that theinteraction feature effectively stands as a confidence-weighted version of surprisal

In summary we computed five speech features oneacoustic feature word onset and four linguistic featuresword position in its sentence word frequency precisionand surprisal To those we added the interaction term be-tween surprisal and precision Each feature was a time se-ries of spikes with each spike being located at the onsetof a word The amplitude of the spike was constant for theword onset feature For each other feature it was scaledto the corresponding value for each respective linguisticfeature All values of the different linguistic features werestandardized to have unit variance and zero mean

EEG Acquisition and Preprocessing

We recorded brain activity using 64 active electrodes(actiCAP BrainProducts) and a multichannel EEG ampli-fier (actiCHamp BrainProducts) The presented soundwas recorded simultaneously through an acoustic

adapter (Acoustical Stimulator Adapter and StimTrakBrainProducts) and was used for aligning the EEG record-ings to the audio signals Both the EEG and the audio datawere acquired at a sampling rate of 1 kHz The left earlobe was used as a reference for the EEGThe EEG data were processed by first applying an anti-

aliasing filter (Kaiser window finite impulse response[FIR] filter cutoff minus6 dB at 125 Hz transition bandwidth50 Hz order 130) and by downsampling the data to250 Hz to reduce the computation time of subsequentoperations A high-pass filter (Hanning window sincType I linear phase FIR filter cutoffminus6 dB at 03 Hz tran-sition bandwidth 015 Hz order 5168) was then appliedto every channel to remove nonstationary trends such asslow drifts and offsets Bad channels were identifiedusing the procedure clean_rawdata from the EEGLABplugin ASR (Artifact Subspace Reconstruction) they werethen removed and interpolated with spherical interpola-tion All channels were then referenced to the channelaverage We subsequently ran an independent compo-nent analysis (ICA) decomposition and removed artifactsfrom eye blink eyes movement as well as muscle motionby visual inspection of the ICA components The cleaneddata were low-pass filtered (Hamming window linearphase FIR filter cutoff minus6 dB at 62 Hz transition band-width 10 Hz order 138) and further down-sampled to125 Hz The filtered EEG data therefore contained thebroad frequency range from 03 to 62 Hz

Figure 1 Experimental overview (A) We employ continuous speech narratives and utilize speech processing as well as language modeling to extractacoustic and linguistic features namely word onset word frequency precision and surprisal (B) The participantrsquos neural activity is recordedthrough EEG while they listen to the stories (C) We extract TRFs for each of the four speech features through computing a linear model thatestimates the EEG recordings from the speech features

158 Journal of Cognitive Neuroscience Volume 32 Number 1

We computed temporal response functions (TRFs) fromEEG data in several frequency bands The TRFs followedfrom a linear forward model that expressed the EEG signalat each electrode as a linear combination of the speechfeatures shifted by different latencies (Broderick et al2018 Ding amp Simon 2012) We used FIR Type I filtersdesigned with the synced windowed method and em-ploying a hamming window We filtered the EEG data inseveral frequency bands of interest delta band (low-passfilter cutoff at 45 Hz filter order 132) theta band (band-pass filter cutoff frequencies at 4 Hz and 8 Hz order 206)alpha band (band-pass filter cutoff frequencies at 8 and12 Hz order 206) beta band (band-pass filter cutoff20 Hz and 30 Hz order 82) and gamma band (cutoff at30 and 60 Hz order 164) For every frequency band otherthan delta we computed the power modulation by takingthe absolute value of the Hilbert transform of the bandpassed data and further band-pass filtered it between 05and 20 Hz (filter order 824) to remove the DC offset andhigher frequencies that do not occur in the speech features

EEG Data Analysis

To relate the speech features to the EEG data we used alinear spatiotemporal forward model that reconstructedthe EEG recordings from the acoustic feature and the lin-guistic features shifted by different delays (Figure 1)Such an approach has recently been used successfullyfor assessing the cortical tracking of the speech envelopephonemic information as well as semantic dissimilarity ofwords in speech (Broderick et al 2018 Di Liberto et al2015 Ding amp Simon 2012) The coefficients resultingfrom this regression constitute the TRFs that inform onthe brainrsquos response to each feature at different latenciesIn particular the forward model sought to express the

preprocessed EEG recordings xi tneth THORNf gNjfrac141 of the N = 64channels at each time instance tn through the time series

yj tnminusτketh THORN Fjfrac141 of the F = 6 speech features word onset

word frequency word position word precision wordsurprisal and the product of surprisal and entropyshifted by T different delays τkf gTkfrac141

xi tneth THORN frac14X6

jfrac141

XT

kfrac141

βij τketh THORNyj tnminusτketh THORN (3)

We hereby considered equally spaced delays τkf gTkfrac141that ranged from minus400 to 1100 msec At the samplingrate of 125 Hz this yielded a number of T = 188 lagsThe obtained estimate for the EEG channel i is denotedby xi The coefficient βij(τk) is the TRF for the ith EEGchannel and speech feature j at the latency τk The pre-processed EEG recording xi tneth THORNf gNifrac141 was either the EEGsignal in the delta band or the power of the EEG signal inthe higher frequency bands We computed the TRFs foreach participant separately leading to a set of TRFs on

which we could apply group-level statistical analysis asdescribed below We then also computed the populationaverage of the participant-specific TRFs the populationaverages are shown in the figures

The different speech features that we employed werepartly correlated The largest correlation emerged be-tween surprisal and the interaction term ldquoSurprisal timesPrecisionrdquo at a value of 61 We wondered if these corre-lations would hinder the EEG analysis and in particular ifthey would obscure the neural responses to the individ-ual speech features through the linear regression analy-sis an issue known as multicollinearity (Chatterjee ampHadi 2015 Kumar 1975) A high multicollinearity be-tween features could result in higher variance or leakagebetween the coefficient βij(τk) However the FrischndashWaughndashLovell theorem from econometrics states that lin-ear regression based on correlated features yields thesame results as when the features are first orthogonal-ized that is decorrelated (Lovell 2008 Frisch ampWaugh 1933) In addition in our implementation ofthe multiple linear regression we used a singular valuedecomposition of the design matrix of time-lagged fea-tures resulting in transformed features that were mutu-ally uncorrelated (Klema amp Laub 1980) The correlationof the features was therefore not problematic The onlyissue that multicolinearity can cause is significantlyincreased variance for each βij(τk) estimate which typi-cally emerges when the variance inflation factor is above5 For our speech features we obtained variance inflationfactors between 122 and 225 indicating that increasednoise due to correlated features is not an issue

As an additional control that our TRFs did not containleakage from responses to different features we devel-oped a null model that was employed to assess the sta-tistical significance of the actual TRFs (see below) Thenull model was constructed such that a potential leakagebetween features would appear similarly both in theactual model and in the null model and therefore wouldnot result in statistically significant results It follows thatany statistically significant part in the TRFs that we ob-tained did not result from leakage between the features

Statistical Significance

To determine the statistical significance of the estimatedTRFs we determined chance-level TRFs as a null modelThe chance-level TRFs were computed by constructingunrelated speech features and by relating these to theEEG recordings in the same way as for the computationof the actual TRFs To establish chance-level linguisticTRFs only the linguistic information of interest containedin the spike amplitude of the speech features but not theacoustic information in the spike timing needed to be un-related to the EEG We therefore constructed unrelatedspeech features by keeping the timing of the spikes iden-tical to those in the true model The speech feature thatdescribed word onsets was therefore not altered

Weissbart Kandylaki and Reichenbach 159

However we changed the amplitude of the spikes for theother linguistic speech features by taking their valuesfrom an unrelated story that is a story that was notaligned with the EEG data To obtain a large number ofnull models we considered permutations of our 15 storyparts Through permutating entire story parts and not theorder of individual words the statistical relationship be-tween the linguistic features of successive words was con-served Because we kept the timing of the spikes in thenull model as in the actual stories the obtained nullmodel could only be used to determine the significanceof the neural responses to the linguistic features but notfor those to the acoustic word onset

The actual TRFs were then analyzed for statistical signif-icance through comparison to 1000 null models The com-parison was obtained from a permutation test togetherwith cluster-based correction for multiple comparison(Oostenveld Fries Maris amp Schoffelen 2011) where onlyclusters of at least four electrodes were kept Specificallywe used the function spatio_temporal_cluster_test fromthe MNE python library The statistic for each model coef-ficient at each electrode and each lag was computed usingthe empirical distribution formed by values from the nullmodels setting the threshold at the 99th percentile of thenull distribution The cluster-level p values were computedand we considered only clusters with a p value greater than0510WeherebyusedtheBonferronicorrectiontoaccountfor the 10 different tests that reflected the different fre-quency bands and the different linguistic features

Data Availability

The EEG data from all participants together with the cor-responding speech features are available on figsharecom (httpsdoiorg106084m9figshare9033983v1)An exemplary script for computing TRFs can be obtainedfrom figshare as well (httpsdoiorg106084m9figshare9034481v1)

RESULTS

Behavioral Assessment

We first assessed to what degree the participants under-stood the stories through asking them comprehensionquestions These questions were answered with an aver-age of 96 accuracy evidencing that the volunteers con-sistently understood the speech and paid attention

Cortical Tracking of Acoustic and LinguisticSpeech Features

The cortical tracking of the speech features can be foundin different frequency bands First because all four fea-tures relate to words the frequency range of the featuresis similar to the rate of words in speech The latter isabout 1ndash4 Hz and corresponds to the delta frequency

range Cortical activity at low frequencies including thedelta frequency band can therefore be evoked by orentrain to the rhythm set by the acoustic and linguisticword features Second the amplitude of the neural activ-ity in higher frequency bands can be modulated by thespeech features This may in particular occur for thetheta band (4ndash8 Hz) the alpha band (8ndash12 Hz) the betafrequency band (20ndash30 Hz) and the gamma frequencyband (30ndash100 Hz) the power of which can be modulatedby prediction in sentence comprehension (Wang et al2012 Weiss amp Mueller 2012 Bastiaansen et al 2010Bastiaansen amp Hagoort 2006)We started by quantifying the neural tracking of the

word features at low frequencies We found neural re-sponses to word frequency between delays of 300 and610 msec (Figure 2) The topographic plots of the re-sponses show large differences between the temporalscalp areas on the one hand and the parietal and occipitalareas on the other handImportantly we found significant responses to the

word surprisal around a delay of 450 msec (Figure 2)These responses emerged predominantly in the EEGchannels on the temporal and occipital scalp areas andwere lateralized on the left hemisphere Precision wastracked by cortical activity at delays of around 100 msecand around 500 msec Moreover we observed a signifi-cant neural response to the interaction of surprisal andprecision at an earlier latency of around 400 msec andat a longer latency of around 1000 msecWe also computed the modulation of the power in the

theta band the alpha band the beta band as well as thegamma band by the acoustic and linguistic features(Figures 4 and 5) Although the power in the alpha bandwas not significantly related to the linguistic features thepower in the theta band was shaped by word frequencyat delays of around 300 msec and around 1000 msec(Figure 3) Furthermore the power in the theta bandwas significantly decreased by precision at delays ofabout 700 msecThe power in the beta band correlated positively with

surprisal at delays of around 700 and 1000 msec (Figure 4)At the latter delay the influence of surprisal was strongestat the left temporal channels Moreover the power in thebeta band was modulated by precision at a delay of about700 msec with the main contributions coming from theoccipital channelsThe power in the gamma band was increased by words

with higher surprisal at long latency of around 1000 msecmainly for the left temporal channels (Figure 5) The in-teraction of surprisal and precision shaped the gammapower as well at the early delay of about 0 msec

DISCUSSION

We have shown that cortical activity tracks the surprisal ofwords in speech comprehension Such cortical tracking

160 Journal of Cognitive Neuroscience Volume 32 Number 1

has emerged at low frequencies that is within the deltaband that encompasses a similar frequency range as therate of words in speech Importantly we found that theneural activity in the faster theta beta and gamma fre-quency bands tracks surprisal as well These frequencybands have previously been suggested to be involved inthe bottomndashup and topndashdown propagation of predic-tions and prediction errors (Lewis amp Bastiaansen 2015)We have further demonstrated that the cortical track-

ing of word surprisal is modulated by precision Theinteraction between surprisal and precision leads to

responses both in the slow delta band as well as in thepower of the faster gamma band In particular word pre-dictions that are made with high precision but then leadto large surprisal cause an increased gamma power atzero lag However as opposed to a previous study onERPs we did not observe a significant effect in the thetaor alpha bands (Rommers et al 2017) This differencemay be due to our use of naturalistic stimuli and the inclu-sion of all words in the analysis whereas the previous studyused specialized sentences with final words that had eitherhigh or low surprisal and either high or low precision

Figure 2 TRFs for acousticand linguistic speech featuresThe TRFs for each electrode areshown in bold at time instanceswhere they are significantcompared with a null modelthat is based on shuffled dataEEG channels that yield asignificant response withina particular range of delayshighlighted in gray areindicated in white in thetopographic plots (A) Theresponses to the word onsetappear as insignificant due tothe construction of the nullmodel (B C) We obtainsignificant neural responsesto word frequency as well assurprisal for delays around400 msec (D) Significant neuralresponses to precision arisearound delays of 100 msec aswell as around 500 msec(E) The interaction betweensurprisal and precision leads toa neural response at a delay of400 msec as well as at a longdelay of 1000 msec

Weissbart Kandylaki and Reichenbach 161

The cortical tracking of surprisal may indicate predic-tive processing by the brain Predictive processing is aframework for perception in which it is assumed thatthe brain infers hypotheses about a sensory input by gen-erating predictions of its neural representations and thatthe hypotheses are constantly updated as new sensoryinformation becomes available (Kanai et al 2015Bendixen SanMiguel amp Schroumlger 2012 Friston 2010Friston amp Kiebel 2009) In particular the surprisal of aword reflects a prediction error a key quantity in theframework of predictive coding (Friston 2010) Howeverthe expectancy of a word based on previous words also

correlates with the plausibility of a word in a particular con-text (Nieuwland et al 2019 DeLong Quante amp Kutas2014) Further studies are therefore required to disentan-gle neural correlates of actual word prediction from thosethat do not require predictive processing such as wordplausibilityThe surprisal of a word can reflect both its semantic as

well as syntactic information and previous investigationsinto the neurobiological mechanisms of languagecomprehension have manipulated both independently(Henderson Choi Lowder amp Ferreira 2016 HumphriesBinder Medler amp Liebenthal 2006) In contrast our

Figure 3 Neural responsesin the theta frequency band (A)Word frequency is positivelycorrelated to theta power at adelay of 300 msec and isnegatively correlated at a delayof 1000 msec (B) Words thatcan be predicted with higherprecision lead to an increasedtheta power at 150 msec and adecreased theta power at alatency of 700 msec

Figure 4 Neural responses inthe beta frequency band (A)There are significant neuralresponses to surprisalemerging at delays of 700 and1000 msec (B) Precision causesan increased power in the betaband activity around a delay of700 msec

162 Journal of Cognitive Neuroscience Volume 32 Number 1

approach has taken a naturalistic and holistic approach tosurprisal we employed natural speech without manipula-tions combined with statistical learning of a rich varietyof natural language cues through a recurrent neural net-work Because the neural network infers both syntacticrules as well as semantic information from the training ofthe speech material the reported neural response to wordsurprisal can reflect both semantic as well as syntactic infor-mation (Collobert et al 2011)It is instructive to compare the reported neural re-

sponses to surprisal to the well-characterized event-related responses that can be elicited by violations ofsemantics syntax or morphology in sentences In partic-ular semantic violations can cause the N400 response anegativity at 200ndash500 msec at the central and parietalscalp area (Kutas amp Federmeier 2011 Kutas amp Hillyard1980) Syntactic anomalies due to ungrammaticality ortemporary misanalysis elicit the P600 a broad positivepotential that is located at the posterior scalp area andarises around 600 msec after the anomaly (Hagoortamp Brown 2000 Friederici Pfeifer amp Hahne 1993)More specific syntactic anomalies can lead to negativepotentials that occur anteriorly and that can be left later-alized either occurring at 300ndash500 msec ((L)AN) orearlier at 125ndash150 msec (ELAN Steinhauer amp Drury2012 Friederici 2002 Van Den Brink Brown ampHagoort 2001 Roumlsler Pechmann Streb Roumlder ampHennighausen 1998)These ERPs do presumably not reflect the activation of

single static neural sources but rather waves of neuralactivity that propagate in time across different brainareas (Kutas amp Federmeier 2011 Tse et al 2007 MaessHerrmann Hahne Nakamura amp Friederici 2006) In thecase of the N400 for instance this wave of activity startsat about 250 msec in the left superior temporal gyrus

and then propagates to the left temporal lobe by 365 msecas well as to both frontal lobes by 500 msec (Van Petten ampLuka 2006 Halgren et al 2002 Helenius SalmelinService amp Connolly 1998) A recent theory suggests thatthis wave of activity reflects reverberating activity withinthe inferior middle and superior temporal gyri thatcorresponds to the activation of lexical informationthe formation of context and the unification of an up-coming word with the context (Baggio amp Hagoort 2011)

The spatiotemporal characteristics of the responses tosurprisal that we have measured here share certain sim-ilarities with these ERPs In particular we have foundneural responses to surprisal at latencies between 300and 600 msec These responses show a central-parietalnegativity that is reminiscent of the N400 Howeverother features of the neural responses that we describehere appear distinct from these ERPs The neural re-sponse to surprisal in the delta band at the latency of600 msec does for instance not display the posteriorpositivity of the P600 Moreover we have identified lateresponses around 700 and 1000 msec We have alsoshown that neural responses to surprisal arise in variousfrequency bands beyond the delta band that matters forthe ERPs However a further comparison of the neuralresponse to surprisal to the related ERPs is hindered bythe lack of spatial resolution offered by EEG recordingsFuture neuroimaging studies using intracranial record-ings or magnetoencephalography (MEG) may localizethe sources of the neural response to surprisal thatwe have measured here and quantify potential sharedsources with the ERPs

The difference of the cortical tracking of surprisal tothe well-known neural correlates of semantic syntacticor morphological anomalies and in particular the late re-sponses at a delay of around 1 sec may come as a result

Figure 5 Tracking of surprisalby gamma band activity (A) Thegamma activity is decreased ataround 1000 msec mostly inthe left temporal and frontalscalp areas (B) The interactionbetween precision and surprisalleads to a modulation of thegamma power at the latency ofaround 0 msec This modulationoccurs predominantly for lefttemporal and frontal channelsas well

Weissbart Kandylaki and Reichenbach 163

of our use of natural speech that differs from the artifi-cially constructed and tightly controlled stimuli used tomeasure ERPs First in our experiment the participants en-countered no violations of semantics syntax and morphol-ogy but instead heard naturalistic speech within which thewords occurred in context Second our stimuli did not con-tain artificial manipulations of word surprisal or precisionInstead of altering the stimuli we focused on quantifyingsurprisal and precision as they varied naturally in the pre-sented stories Third we assessed the responses to surprisaland precision at each word in the story and hence for wordsin every sentence position rather than for words at a partic-ular position within each sentence Because we accountedfor word position through a corresponding control featurewe avoided the possibility of sentence position having aneffect on the results (Bastiaansen et al 2010) Fourthwe did not employ isolated sentences but continuousstories so that information of integration occurred overtimescales exceeding a few seconds

Although our EEG recordings showed the corticaltracking of surprisal in different frequency bands theydid not allow us to precisely localize the sources of theactivity in the cortex Pairing EEG with fMRI or employingMEG may allow to add spatial information to the tempo-ral tracking that we have assessed here A recent fMRIstudy for instance found that the left inferior temporalsulcus the bilateral posterior superior temporal gyri andthe right amygdala responded to surprisal during naturallanguage comprehension whereas the left ventral pre-motor cortex and the left inferior parietal lobule re-sponded to entropy (Willems Frank Nijhof Hagoortamp van den Bosch 2015) Another recent MEG measure-ment of the brainrsquos natural speech processing found thatentropy and surprisal play a role in the assembly of pho-nemes into words and involve brain areas such as coreauditory cortex and the STS (Brodbeck et al 2018)Combining the temporal precision of EEG with the spa-tial precision of fMRI or harnessing the ability of MEG tolocate neural sources temporally and spatially will allowto further clarify the spatiotemporal mechanisms of nat-ural language comprehension in the brain

In summary we showed that neural responses to wordsurprisal can be measured from EEG responses to natu-ralistic stories Our results demonstrate that both theslow delta band as well as the power in higher frequencybands in particular the beta and gamma bands areshaped by surprisal Moreover we also showed that theneural response to surprisal is modulated by the preci-sion of a prediction In particular predictions made withhigh precision which lead to high surprisal modulategamma power in the left temporal and frontal scalp areasIn addition we also demonstrated that neural activity inthe delta theta and beta frequency bands is shaped bythe precision of word prediction directly These re-sponses arise at different latencies and at different scalpareas suggesting a rich spatiotemporal dynamics of neu-ral activity related to word prediction

Acknowledgments

This research was supported by Wellcome Trust grant 108295Z15Z by EPSRC grants EPM0267281 and EPR0326021 as wellas in part by the National Science Foundation under grant noNSF PHY-1125915

Reprint requests should be sent to Tobias ReichenbachDepartment of Bioengineering and Centre for NeurotechnologyImperial College London South Kensington Campus SW7 2AZLondon United Kingdom or via e-mail reichenbachimperialacuk

REFERENCES

Baggio G amp Hagoort P (2011) The balance betweenmemory and unification in semantics A dynamic accountof the N400 Language and Cognitive Processes 261338ndash1367

Bastiaansen M amp Hagoort P (2006) Oscillatory neuronaldynamics during language comprehension Progress in BrainResearch 159 179ndash196

Bastiaansen M Magyari L amp Hagoort P (2010) Syntacticunification operations are reflected in oscillatory dynamicsduring on-line sentence comprehension Journal ofCognitive Neuroscience 22 1333ndash1347

Bendixen A SanMiguel I amp Schroumlger E (2012) Earlyelectrophysiological indicators for predictive processing inaudition A review International Journal ofPsychophysiology 83 120ndash131

Bengio Y Ducharme R Vincent P amp Jauvin C (2003) Aneural probabilistic language model Journal of MachineLearning Research 3 1137ndash1155

Brennan J R amp Hale J T (2019) Hierarchical structure guidesrapid linguistic predictions during naturalistic listening PLoSOne 14 e0207741

Brodbeck C Presacco A amp Simon J Z (2018) Neural sourcedynamics of brain responses to continuous stimuli Speechprocessing from acoustics to comprehension Neuroimage172 162ndash174

Broderick M P Anderson A J Di Liberto G M Crosse M Jamp Lalor E C (2018) Electrophysiological correlates ofsemantic dissimilarity reflect the comprehension of naturalnarrative speech Current Biology 28 803ndash809

Brown P F Desouza P V Mercer R L Pietra V J D amp LaiJ C (1992) Class-based n-gram models of natural languageComputational Linguistics 18 467ndash479

Chatterjee S amp Hadi A S (2015) Regression analysis byexample Hoboken NJ Wiley

Collobert R Weston J Bottou L Karlen M KavukcuogluK amp Kuksa P (2011) Natural language processing (almost)from scratch Journal of Machine Learning Research 122493ndash2537

Davidson D J amp Indefrey P (2007) An inverse relationbetween event-related and timendashfrequency violationresponses in sentence processing Brain Reseach 115881ndash92

DeLong K A Quante L amp Kutas M (2014) Predictabilityplausibility and two late ERP positivities during writtensentence comprehension Neuropsychologia 61 150ndash162

Di Liberto G M OrsquoSullivan J A amp Lalor E C (2015) Low-frequency cortical entrainment to speech reflects phoneme-level processing Current Biology 25 2457ndash2465

Ding N Melloni L Zhang H Tian X amp Poeppel D (2016)Cortical tracking of hierarchical linguistic structures inconnected speech Nature Neuroscience 19 158ndash164

Ding N Pan X Luo C Su N Zhang W amp Zhang J (2018)Attention is required for knowledge-based sequential

164 Journal of Cognitive Neuroscience Volume 32 Number 1

grouping Insights from the integration of syllables intowords Journal of Neuroscience 38 1178ndash1188

Ding N amp Simon J Z (2012) Emergence of neural encodingof auditory objects while listening to competing speakersProceedings of the National Academy of Sciences USA109 11854ndash11859

Ding N amp Simon J Z (2014) Cortical entrainment tocontinuous speech Functional roles and interpretationsFrontiers in Human Neuroscience 8 311

Federmeier K D Wlotko E W De Ochoa-Dewald E ampKutas M (2007) Multiple effects of sentential constraint onword processing Brain Research 1146 75ndash84

Feldman H amp Friston K (2010) Attention uncertainty andfree-energy Frontiers in Human Neuroscience 4 215

Frank S L Otten L J Galli G amp Vigliocco G (2015) TheERP response to the amount of information conveyed bywords in sentences Brain and Language 140 1ndash11

Frank S L amp Willems R M (2017) Word predictability andsemantic similarity show distinct patterns of brain activityduring language comprehension Language Cognition andNeuroscience 32 1192ndash1203

Friederici A D (2002) Towards a neural basis of auditorysentence processing Trends in Cognitive Sciences 678ndash84

Friederici A D Pfeifer E amp Hahne A (1993) Event-relatedbrain potentials during natural speech processing Effects ofsemantic morphological and syntactic violations CognitiveBrain Research 1 183ndash192

Frisch R amp Waugh F V (1933) Partial time regressionsas compared with individual trends Econometrica 1387ndash401

Friston K (2010) The free-energy principle A unified braintheory Nature Reviews Neuroscience 11 127ndash138

Friston K amp Kiebel S (2009) Predictive coding under thefree-energy principle Philosophical Transactions of theRoyal Society of London Series B Biological Sciences364 121ndash1221

Giraud A-L amp Poeppel D (2012) Cortical oscillations andspeech processing Emerging computational principles andoperations Nature Neuroscience 15 511ndash517

Gorman K Howell J amp Wagner M (2011) Prosodylab-alignerA tool for forced alignment of laboratory speech Journal ofthe Canadian Acoustical Association 39 192ndash193

Graves A (2013) Generating sequences with recurrent neuralnetworks arXiv preprint arXiv13080850

Hagoort P amp Brown C M (2000) ERP effects of listening tospeech compared to reading The P600SPS to syntacticviolations in spoken sentences and rapid serial visualpresentation Neuropsychologia 38 1531ndash1549

Halgren E Dhond R P Christensen N Van Petten CMarinkovic K Lewine J D et al (2002) N400-likemagnetoencephalography responses modulated by semanticcontext word frequency and lexical class in sentencesNeuroimage 17 1101ndash1116

Heilbron M amp Chait M (2018) Great expectations Is thereevidence for predictive coding in auditory cortexNeuroscience 389 54ndash73

Helenius P Salmelin R Service E amp Connolly J F (1998)Distinct time courses of word and context comprehension inthe left temporal cortex Brain 121 1133ndash1142

Henderson J M Choi W Lowder M W amp Ferreira F(2016) Language structure in the brain A fixation-relatedfMRI study of syntactic surprisal in reading Neuroimage132 293ndash300

Humphries C Binder J R Medler D A amp Liebenthal E(2006) Syntactic and semantic modulation of neural activityduring auditory sentence comprehension Journal ofCognitive Neuroscience 18 665ndash679

Hyafil A Fontolan L Kabdebon C Gutkin B amp Giraud A-L(2015) Speech encoding by coupled cortical theta and gammaoscillations eLife 4 e06213

Kanai R Komura Y Shipp S amp Friston K (2015) Cerebralhierarchies Predictive processing precision and the pulvinarPhilophical Transancations of the Royal Society of LondonSeries B Biological Science 370 20140169

Keitel A Gross J amp Kayser C (2018) Perceptually relevantspeech tracking in auditory and motor cortex reflects distinctlinguistic features PLoS Biology 16 e2004473

Kielar A Meltzer J A Moreno S Alain C amp Bialystok E(2014) Oscillatory responses to semantic and syntacticviolations Journal of Cognitive Neuroscience 26 2840ndash2862

Klema V amp Laub A (1980) The singular value decompositionIts computation and some applications IEEE Transactionson Automatic Control 25 164ndash176

Koelsch S Vuust P amp Friston K (2018) Predictive processesand the peculiar case of music Trends in Cognitive Sciences23 63ndash77

Kumar T K (1975) Multicollinearity in regression analysisReview of Economics and Statistics 57 365ndash366

Kutas M amp Federmeier K D (2011) Thirty years andcounting Finding meaning in the N400 component of theevent-related brain potential (ERP) Annual Review ofPsychology 62 621ndash647

Kutas M amp Hillyard S A (1980) Reading senseless sentencesBrain potentials reflect semantic incongruity Science 207203ndash205

Kutas M amp Hillyard S A (1984) Brain potentials duringreading reflect word expectancy and semantic associationNature 307 161ndash163

Lakatos P Chen C M OrsquoConnell M N Mills A amp SchroederC E (2007) Neuronal oscillations and multisensoryinteraction in primary auditory cortex Neuron 53 279ndash292

Levy R (2008) Expectation-based syntactic comprehensionCognition 106 1126ndash1177

Lewis A G amp Bastiaansen M (2015) A predictive codingframework for rapid neural dynamics during sentence-levellanguage comprehension Cortex 68 155ndash168

Lovell M C (2008) A simple proof of the FWL theoremJournal of Economic Education 39 88ndash91

Maess B Herrmann C S Hahne A Nakamura A amp FriedericiA D (2006) Localizing the distributed language networkresponsible for the N400 measured by MEG during auditorysentence processing Brain Research 1096 163ndash172

Mahoney M (2011) About the test data Retrieved frommattmahoneynetdctextdatahtml

Mikolov T Kombrink S Burget L Černockyacute J amp Khudanpur S(2011) Extensions of recurrent neural network language modelPaper presented at the 2011 IEEE International Conference onAcoustics Speech and Signal Processing (ICASSP)

Miller G A Heise G A amp Lichten W (1951) Theintelligibility of speech as a function of the context of thetest materials Journal of Experimental Psychology 41329ndash335

Miller G A amp Isard S (1963) Some perceptual consequencesof linguistic rules Journal of Verbal Learning and VerbalBehavior 2 217ndash228

Molinaro N Barraza P amp Carreiras M (2013) Long-rangeneural synchronization supports fast and efficient readingEEG correlates of processing expected words in sentencesNeuroimage 72 120ndash132

Nieuwland M Barr D Bartolozzi F Busch-Moreno SDonaldson D Ferguson H J et al (2019) Dissociableeffects of prediction and integration during languagecomprehension Evidence from a large-scale study usingbrain potentials httpswwwbiorxivorgcontent101101267815v4

Weissbart Kandylaki and Reichenbach 165

Oostenveld R Fries P Maris E amp Schoffelen J-M (2011)FieldTrip Open source software for advanced analysis ofMEG EEG and invasive electrophysiological dataComputational Intelligence and Neuroscience 2011156869

Pascanu R Mikolov T amp Bengio Y (2013) On the difficultyof training recurrent neural networks Paper presented at the30th International Conference on International Conferenceon Machine Learning Atlanta GA

Patten W (1910) International short stories (Vol 2) AuroraIL PF Collier amp Son

Pennington J Socher R amp Manning C (2014) Glove Globalvectors for word representation Paper presented at theConference on Empirical Methods in Natural LanguageProcessing (EMNLP) Doha Qatar

Rommers J Dickson D S Norton J J Wlotko E W ampFedermeier K D (2017) Alpha and theta band dynamicsrelated to sentential constraint and word expectancyLanguage Cognition and Neuroscience 32 576ndash589

Roumlsler F Pechmann T Streb J Roumlder B amp HennighausenE (1998) Parsing of sentences in a language with varyingword order Word-by-word variations of processing demandsare revealed by event-related brain potentials Journal ofMemory and Language 38 150ndash176

Smith N J amp Levy R (2013) The effect of wordpredictability on reading time is logarithmic Cognition128 302ndash319

Steinhauer K amp Drury J E (2012) On the early left-anteriornegativity (ELAN) in syntax studies Brain and Language120 135ndash162

Tse C-Y Lee C-L Sullivan J Garnsey S M Dell G SFabiani M et al (2007) Imaging cortical dynamics oflanguage processing with the event-related optical signalProceedings of the National Academy of Sciences USA104 17157ndash17162

Van Den Brink D Brown C M amp Hagoort P (2001)Electrophysiological evidence for early contextual influencesduring spoken-word recognition N200 versus N400 effectsJournal of Cognitive Neuroscience 13 967ndash985

Van Petten C amp Luka B J (2006) Neural localization of semanticcontext effects in electromagnetic and hemodynamic studiesBrain and Language 97 279ndash293

Wang L Jensen O Van den Brink D Weder N SchoffelenJ M Magyari L et al (2012) Beta oscillations relate to theN400m during language comprehension Human BrainMapping 33 2898ndash2912

Wang L Zhu Z amp Bastiaansen M (2012) Integration orpredictability A further specification of the functional role ofgamma oscillations in language comprehension Frontiers inPsychology 3 187

Weiss S amp Mueller H M (2012) ldquoToo many betas do not spoilthe brothrdquo The role of beta brain oscillations in languageprocessing Frontiers in Psychology 3 201

Willems R M Frank S L Nijhof A D Hagoort P amp van denBosch A (2015) Prediction during natural languagecomprehension Cerebral Cortex 26 2506ndash2516

Zion Golumbic E M Ding N Bickel S Lakatos P SchevonC A McKhann G M et al (2013) Mechanisms underlyingselective neuronal tracking of attended speech at a ldquococktailpartyrdquo Neuron 77 980ndash991

166 Journal of Cognitive Neuroscience Volume 32 Number 1

This article has been cited by

1 Eleonora J Beier Suphasiree Chantavarin Gwendolyn Rehrig Fernanda Ferreira Lee M Miller 2021 Cortical Tracking ofSpeech Toward Collaboration between the Fields of Signal and Sentence Processing Journal of Cognitive Neuroscience 334574-593 [Abstract] [Full Text] [PDF] [PDF Plus]

2 Mikolaj Kegler Tobias Reichenbach 2021 Modelling the effects of transcranial alternating current stimulation on the neuralencoding of speech in noise NeuroImage 224 117427 [Crossref]

3 Christian Brodbeck Jonathan Z Simon 2020 Continuous speech processing Current Opinion in Physiology 18 25-31 [Crossref]4 Katerina D Kandylaki Sonja A Kotz 2020 Distinct cortical rhythms in speech and language processing and some more a

commentary on Meyer Sun amp Martin (2019) Language Cognition and Neuroscience 359 1124-1128 [Crossref]5 Mahmoud Keshavarzi Mikolaj Kegler Shabnam Kadir Tobias Reichenbach 2020 Transcranial alternating current stimulation

in the theta band but not in the delta band modulates the comprehension of naturalistic speech in noise NeuroImage 210 116557[Crossref]

Page 2: Cortical Tracking of Surprisal during Continuous Speech

and syntactic violations and may therefore relate to wordexpectation as well (Kielar Meltzer Moreno Alain ampBialystok 2014 Bastiaansen Magyari amp Hagoort 2010Davidson amp Indefrey 2007) Gamma power has been ob-served to increase when a word is highly predictable butnot when its predictability is low (Molinaro Barraza ampCarreiras 2013 Wang Zhu amp Bastiaansen 2012)

However these prior studies on neural correlates ofword expectancy and precision have focused on specificwords in single sentences contrasting words with highand low expectancy as well as with high and low precisionBut natural speech often consists of many sentences andthe expectancy and the corresponding precision of suc-cessive words take a range of values that do not fall in onlytwo classes of ldquohighrdquo and ldquolowrdquo It therefore remains un-clear how neural responses to word expectancy and pre-cision correlate with this graded variability

Furthermore assessing the cortical responses to thelinguistic features of successive words in naturalisticstories allows to quantify the cortical tracking of these fea-tures A recent investigation on word predictability and hi-erarchical structure in naturalistic speech used such anapproach to show cortical tracking of word surprisal butdid not investigate an influence of precision and did notinvestigate power modulation in higher frequency bands(Brennan amp Hale 2019 Frank amp Willems 2017)

Here we therefore set out to investigate cortical track-ing including through power modulation in higher fre-quency bands of word surprisal and the precision ofword prediction in naturalistic stories The surprisal of aword denotes the log-transformed conditional probabilityof a word based on the preceding context The surprisalhas been argued to relate to processing load (Levy 2008)and predicts reading time (Frank Otten Galli ampVigliocco 2015 Smith amp Levy 2013) Precision is the in-verse of the entropy of the conditional probability distri-bution over a close vocabulary set We quantified wordsurprisal and precision from naturalistic stories using lan-guage modeling as estimated by a recurrent neural net-work and then related the obtained word features toEEG responses of volunteers who listened to the stories

METHODS

Participants

Thirteen participants (aged 25 plusmn 3 years six women) par-ticipated in the experiment The volunteers were allright-handed native English speakers They had no his-tory of hearing or neurological impairment All participantsprovided written informed consent The experimentalprocedures were approved by the Imperial CollegeResearch Ethics Committee

Experimental Design

We used naturalistic speech narratives in the participantsrsquonative language (English) The experiment consisted of one

session in which we measured EEG responses to the shortstories Gilrayrsquos Flower Pot and My Brother Henry by J MBarrie as well as An Undergraduatersquos Aunt by F Anstey(Patten 1910) The stimuli were sourced from the publicdomain librivoxorg and were spoken by a male voice Thecorresponding text was obtained from Project Gutenberg(wwwgutenbergorgebooks32846) The audio materialwas presented in 15 parts each of which were 26 plusmn043 min long The total length of the stories was 40 minAfter each part of a story participants answered compre-hension questions about what they just heard These ques-tions were presented as multiple-choice questions on amonitor Participants were asked 30 questions in total

Language Modeling

We used computational linguistics methods to quantifylinguistic features in the stories Specifically we em-ployed statistical language modeling to compute wordfrequency entropy and suprisal from the text of thestoriesWord frequency is a property of each individual word

out of context which was computed from GoogleN-grams by using only the unigram values This word fea-ture is an estimate of the unconditional probability of theoccurrence of a word w P(w) We use the negative log-arithm of this probability such that all our information-theoretical word features are expressed in the same unitBoth entropy and surprisal follow from conditional

probabilities of a particular word given the precedingwords We denote by P(wm|w1 hellip wmminus1) the condition-al probability of themth word in the sequence wm giventhe previous m minus 1 words w1 w2 hellip wmminus1 Taking thenegative logarithm of this probability yields the ldquosurpris-alrdquo value S(wm) for that word

S wmeth THORN frac14 minus log P wm w1 hellip wmminus1j THORNeth THORNeth (1)

The surprisal also referred to as self-information or in-formation content quantifies the information gain thatan upcoming word generates with respect to the priorsequence of words It can be related to how unexpecteda word is given the previous words in the sentenceInasmuch as surprisal informs about expected wordsprecision relates to the confidence about the predictionsmade (Koelsch Vuust amp Friston 2018) A high precisiontranslates into a high confidence about a word expecta-tion meaning that the word is predictableThe entropy E(m) of the prediction of the mth word

wm that is the uncertainty for predicting the word wm

from the context (w1 hellip wmminus1) is given by the sumof the conditional probabilities for each possible wordwk weighted by the logarithm of this probability Inother words the entropy is the expected surprisal

E meth THORN frac14X

wk

p wk w1 hellip wmminus1j THORN log P wk w1 hellipwmminus1j THORNeth frac12eth

(2)

156 Journal of Cognitive Neuroscience Volume 32 Number 1

The precision of the mth word wm follows as the inverseof entropy 1E(m) We note that the precision of the mthword is not a function of that word itself but of theprobability distribution of the words at that positionThe conditional probabilities for the different words in

the sequence given the preceding words were com-puted through a recurrent neural network languagemodel (Graves 2013 Bengio Ducharme Vincent ampJauvin 2003) The network had a hidden layer with recur-rent connections to encode previous input Such net-works are particularly useful for processing sequencesand have previously been successfully applied to languagemodeling (Graves 2013 Bengio et al 2003) In particulara recurrent neural network can capture long-term depen-dencies of variable length by encoding preceding wordsthrough its recurrent connection into the state of the hid-den neurons This is enabled by a careful balance betweenshort- and long-term memory and means that there is inprinciple no limit on the number of preceding words thatsuch a network can take into account (Pascanu Mikolov ampBengio 2013) This contrasts withN-gram languagemodelsfor instance that are limited to a context window ofNminus 1words (Brown Desouza Mercer Pietra amp Lai 1992)The network was implemented using the feature-

augmented recurrent neural network language modelingtoolkit (Mikolov Kombrink Burget Černockyacute ampKhudanpur 2011) To decrease the computational timerequired for training this toolbox assigns words to clas-ses and factorizes the output layer into a part that de-scribes the probability of each class given the previouswords as well as another part that describes the proba-bility of each word within a class given the previouswords This factorization yields a significant decrease intraining time at a small cost to accuracy importantlythe network still computes the probability of individualwords following the previous words (Mikolov et al2011) We employed 300 classes As an embedding layerwe used the pretrained global vectors for word represen-tation trained on the Wikipedia 2014 and the Gigaword5 data sets (Pennington Socher amp Manning 2014)The recurrent layer encompassed 350 hidden unitsThe source code was customized to compute the entropyof each word a feat that the original code did not allowThe neural network was then trained on the text8 data setthat consists of 100 MB of data from Wikipedia (Mahoney2011) using back propagation through time truncated tofive words with a starting learning rate of 01 The datawere cleaned to remove punctuation html tags capitali-zation and numbers before training Because the networkcan only train well on words that appear frequentlyenough in the training data to allow meaningful trainingwe limited the vocabulary to the 35000 most commonwords in the training data set The remaining words weremapped to an ldquounknownrdquo token Infrequent words in thestories such as compound nouns used for style that ap-peared repeatedly throughout the stories did thereforenot obscure the results

The output of the recurrent neural network was ob-tained from a softmax function and could therefore beinterpreted as the probability distribution for an upcom-ing word given the preceding words in the input se-quence The network was therefore trained to predictthe next word that is to compute an output that wasas close as possible to a probability distribution thatwas one for the actual upcoming word and zero for allremaining ones The trained network was then run onthe stories that the participants heard Precision and sur-prisal of each word were determined from the networkrsquoscomputed probability distribution at the correspondingword through Equations (1) and (2)

Speech Features

To relate surprisal and entropy to the EEG data we con-structed a time series for each linguistic feature We firstaligned each word of the speech to the acoustic signalthrough forced alignment using the Prosodylab-Alignersoftware (Gorman Howell amp Wagner 2011) We therebyobtained the time at which each word began To con-struct features for surprisal and for precision that werealigned with the speech stimuli we assigned each ofthe time points where a new word started a spike of amagnitude that corresponded to the surprisal and preci-sion of that word (Figure 1A) A similar procedure hasbeen employed recently for assessing neural responsesto the semantic dissimilarity of consecutive words(Broderick et al 2018)

Because surprisal and precision are high-level linguisticfeatures of speech we sought to ascertain that any puta-tive cortical tracking of them could not be explained bylower level features To this end we added three low-level speech features First cortical activity can trackthe onset of words which can partly be based on changesin the acoustics at word boundaries and partly result fromthe brainrsquos parsing of the acoustic signal to form discretelinguistic units (Brodbeck Presacco amp Simon 2018Ding amp Simon 2014) To account for this onset responsewe constructed a word onset feature as a series of spikeseach of which had unit amplitude and was located at theonset of a word Second we computed the word positionwithin a sentence The latter can be correlated with pre-cision as the entropy tends to decrease across wordswithin the sentence The word position feature thereforeserved as a control to ensure that the neural response toprecision is distinct from any incremental processing oc-curring throughout a sentence Third the frequency of aword in a given language outside its context is a linguis-tic feature that acts as a prior probability for computingthe probability of a word in a sequence (Brodbeck et al2018) Word frequency can also interfere with surprisalLess frequent words may indeed often bemore surprisingTo capture the share of the neural response that could beexplained away by word frequency we included the latteras a third linguistic feature This feature was computed by

Weissbart Kandylaki and Reichenbach 157

scaling the amplitude of the spike at each word onset bythe negative logarithmof the frequency of the correspond-ing word The logarithm was used such that word fre-quency and surprisal were expressed in the same units

Finally to investigate a possible modulating effect thatprecision may have on surprisal we added an interactionterm ldquoSurprisal times Precisionrdquo This was computed bymultiplying precision values with surprisal such that theinteraction feature effectively stands as a confidence-weighted version of surprisal

In summary we computed five speech features oneacoustic feature word onset and four linguistic featuresword position in its sentence word frequency precisionand surprisal To those we added the interaction term be-tween surprisal and precision Each feature was a time se-ries of spikes with each spike being located at the onsetof a word The amplitude of the spike was constant for theword onset feature For each other feature it was scaledto the corresponding value for each respective linguisticfeature All values of the different linguistic features werestandardized to have unit variance and zero mean

EEG Acquisition and Preprocessing

We recorded brain activity using 64 active electrodes(actiCAP BrainProducts) and a multichannel EEG ampli-fier (actiCHamp BrainProducts) The presented soundwas recorded simultaneously through an acoustic

adapter (Acoustical Stimulator Adapter and StimTrakBrainProducts) and was used for aligning the EEG record-ings to the audio signals Both the EEG and the audio datawere acquired at a sampling rate of 1 kHz The left earlobe was used as a reference for the EEGThe EEG data were processed by first applying an anti-

aliasing filter (Kaiser window finite impulse response[FIR] filter cutoff minus6 dB at 125 Hz transition bandwidth50 Hz order 130) and by downsampling the data to250 Hz to reduce the computation time of subsequentoperations A high-pass filter (Hanning window sincType I linear phase FIR filter cutoffminus6 dB at 03 Hz tran-sition bandwidth 015 Hz order 5168) was then appliedto every channel to remove nonstationary trends such asslow drifts and offsets Bad channels were identifiedusing the procedure clean_rawdata from the EEGLABplugin ASR (Artifact Subspace Reconstruction) they werethen removed and interpolated with spherical interpola-tion All channels were then referenced to the channelaverage We subsequently ran an independent compo-nent analysis (ICA) decomposition and removed artifactsfrom eye blink eyes movement as well as muscle motionby visual inspection of the ICA components The cleaneddata were low-pass filtered (Hamming window linearphase FIR filter cutoff minus6 dB at 62 Hz transition band-width 10 Hz order 138) and further down-sampled to125 Hz The filtered EEG data therefore contained thebroad frequency range from 03 to 62 Hz

Figure 1 Experimental overview (A) We employ continuous speech narratives and utilize speech processing as well as language modeling to extractacoustic and linguistic features namely word onset word frequency precision and surprisal (B) The participantrsquos neural activity is recordedthrough EEG while they listen to the stories (C) We extract TRFs for each of the four speech features through computing a linear model thatestimates the EEG recordings from the speech features

158 Journal of Cognitive Neuroscience Volume 32 Number 1

We computed temporal response functions (TRFs) fromEEG data in several frequency bands The TRFs followedfrom a linear forward model that expressed the EEG signalat each electrode as a linear combination of the speechfeatures shifted by different latencies (Broderick et al2018 Ding amp Simon 2012) We used FIR Type I filtersdesigned with the synced windowed method and em-ploying a hamming window We filtered the EEG data inseveral frequency bands of interest delta band (low-passfilter cutoff at 45 Hz filter order 132) theta band (band-pass filter cutoff frequencies at 4 Hz and 8 Hz order 206)alpha band (band-pass filter cutoff frequencies at 8 and12 Hz order 206) beta band (band-pass filter cutoff20 Hz and 30 Hz order 82) and gamma band (cutoff at30 and 60 Hz order 164) For every frequency band otherthan delta we computed the power modulation by takingthe absolute value of the Hilbert transform of the bandpassed data and further band-pass filtered it between 05and 20 Hz (filter order 824) to remove the DC offset andhigher frequencies that do not occur in the speech features

EEG Data Analysis

To relate the speech features to the EEG data we used alinear spatiotemporal forward model that reconstructedthe EEG recordings from the acoustic feature and the lin-guistic features shifted by different delays (Figure 1)Such an approach has recently been used successfullyfor assessing the cortical tracking of the speech envelopephonemic information as well as semantic dissimilarity ofwords in speech (Broderick et al 2018 Di Liberto et al2015 Ding amp Simon 2012) The coefficients resultingfrom this regression constitute the TRFs that inform onthe brainrsquos response to each feature at different latenciesIn particular the forward model sought to express the

preprocessed EEG recordings xi tneth THORNf gNjfrac141 of the N = 64channels at each time instance tn through the time series

yj tnminusτketh THORN Fjfrac141 of the F = 6 speech features word onset

word frequency word position word precision wordsurprisal and the product of surprisal and entropyshifted by T different delays τkf gTkfrac141

xi tneth THORN frac14X6

jfrac141

XT

kfrac141

βij τketh THORNyj tnminusτketh THORN (3)

We hereby considered equally spaced delays τkf gTkfrac141that ranged from minus400 to 1100 msec At the samplingrate of 125 Hz this yielded a number of T = 188 lagsThe obtained estimate for the EEG channel i is denotedby xi The coefficient βij(τk) is the TRF for the ith EEGchannel and speech feature j at the latency τk The pre-processed EEG recording xi tneth THORNf gNifrac141 was either the EEGsignal in the delta band or the power of the EEG signal inthe higher frequency bands We computed the TRFs foreach participant separately leading to a set of TRFs on

which we could apply group-level statistical analysis asdescribed below We then also computed the populationaverage of the participant-specific TRFs the populationaverages are shown in the figures

The different speech features that we employed werepartly correlated The largest correlation emerged be-tween surprisal and the interaction term ldquoSurprisal timesPrecisionrdquo at a value of 61 We wondered if these corre-lations would hinder the EEG analysis and in particular ifthey would obscure the neural responses to the individ-ual speech features through the linear regression analy-sis an issue known as multicollinearity (Chatterjee ampHadi 2015 Kumar 1975) A high multicollinearity be-tween features could result in higher variance or leakagebetween the coefficient βij(τk) However the FrischndashWaughndashLovell theorem from econometrics states that lin-ear regression based on correlated features yields thesame results as when the features are first orthogonal-ized that is decorrelated (Lovell 2008 Frisch ampWaugh 1933) In addition in our implementation ofthe multiple linear regression we used a singular valuedecomposition of the design matrix of time-lagged fea-tures resulting in transformed features that were mutu-ally uncorrelated (Klema amp Laub 1980) The correlationof the features was therefore not problematic The onlyissue that multicolinearity can cause is significantlyincreased variance for each βij(τk) estimate which typi-cally emerges when the variance inflation factor is above5 For our speech features we obtained variance inflationfactors between 122 and 225 indicating that increasednoise due to correlated features is not an issue

As an additional control that our TRFs did not containleakage from responses to different features we devel-oped a null model that was employed to assess the sta-tistical significance of the actual TRFs (see below) Thenull model was constructed such that a potential leakagebetween features would appear similarly both in theactual model and in the null model and therefore wouldnot result in statistically significant results It follows thatany statistically significant part in the TRFs that we ob-tained did not result from leakage between the features

Statistical Significance

To determine the statistical significance of the estimatedTRFs we determined chance-level TRFs as a null modelThe chance-level TRFs were computed by constructingunrelated speech features and by relating these to theEEG recordings in the same way as for the computationof the actual TRFs To establish chance-level linguisticTRFs only the linguistic information of interest containedin the spike amplitude of the speech features but not theacoustic information in the spike timing needed to be un-related to the EEG We therefore constructed unrelatedspeech features by keeping the timing of the spikes iden-tical to those in the true model The speech feature thatdescribed word onsets was therefore not altered

Weissbart Kandylaki and Reichenbach 159

However we changed the amplitude of the spikes for theother linguistic speech features by taking their valuesfrom an unrelated story that is a story that was notaligned with the EEG data To obtain a large number ofnull models we considered permutations of our 15 storyparts Through permutating entire story parts and not theorder of individual words the statistical relationship be-tween the linguistic features of successive words was con-served Because we kept the timing of the spikes in thenull model as in the actual stories the obtained nullmodel could only be used to determine the significanceof the neural responses to the linguistic features but notfor those to the acoustic word onset

The actual TRFs were then analyzed for statistical signif-icance through comparison to 1000 null models The com-parison was obtained from a permutation test togetherwith cluster-based correction for multiple comparison(Oostenveld Fries Maris amp Schoffelen 2011) where onlyclusters of at least four electrodes were kept Specificallywe used the function spatio_temporal_cluster_test fromthe MNE python library The statistic for each model coef-ficient at each electrode and each lag was computed usingthe empirical distribution formed by values from the nullmodels setting the threshold at the 99th percentile of thenull distribution The cluster-level p values were computedand we considered only clusters with a p value greater than0510WeherebyusedtheBonferronicorrectiontoaccountfor the 10 different tests that reflected the different fre-quency bands and the different linguistic features

Data Availability

The EEG data from all participants together with the cor-responding speech features are available on figsharecom (httpsdoiorg106084m9figshare9033983v1)An exemplary script for computing TRFs can be obtainedfrom figshare as well (httpsdoiorg106084m9figshare9034481v1)

RESULTS

Behavioral Assessment

We first assessed to what degree the participants under-stood the stories through asking them comprehensionquestions These questions were answered with an aver-age of 96 accuracy evidencing that the volunteers con-sistently understood the speech and paid attention

Cortical Tracking of Acoustic and LinguisticSpeech Features

The cortical tracking of the speech features can be foundin different frequency bands First because all four fea-tures relate to words the frequency range of the featuresis similar to the rate of words in speech The latter isabout 1ndash4 Hz and corresponds to the delta frequency

range Cortical activity at low frequencies including thedelta frequency band can therefore be evoked by orentrain to the rhythm set by the acoustic and linguisticword features Second the amplitude of the neural activ-ity in higher frequency bands can be modulated by thespeech features This may in particular occur for thetheta band (4ndash8 Hz) the alpha band (8ndash12 Hz) the betafrequency band (20ndash30 Hz) and the gamma frequencyband (30ndash100 Hz) the power of which can be modulatedby prediction in sentence comprehension (Wang et al2012 Weiss amp Mueller 2012 Bastiaansen et al 2010Bastiaansen amp Hagoort 2006)We started by quantifying the neural tracking of the

word features at low frequencies We found neural re-sponses to word frequency between delays of 300 and610 msec (Figure 2) The topographic plots of the re-sponses show large differences between the temporalscalp areas on the one hand and the parietal and occipitalareas on the other handImportantly we found significant responses to the

word surprisal around a delay of 450 msec (Figure 2)These responses emerged predominantly in the EEGchannels on the temporal and occipital scalp areas andwere lateralized on the left hemisphere Precision wastracked by cortical activity at delays of around 100 msecand around 500 msec Moreover we observed a signifi-cant neural response to the interaction of surprisal andprecision at an earlier latency of around 400 msec andat a longer latency of around 1000 msecWe also computed the modulation of the power in the

theta band the alpha band the beta band as well as thegamma band by the acoustic and linguistic features(Figures 4 and 5) Although the power in the alpha bandwas not significantly related to the linguistic features thepower in the theta band was shaped by word frequencyat delays of around 300 msec and around 1000 msec(Figure 3) Furthermore the power in the theta bandwas significantly decreased by precision at delays ofabout 700 msecThe power in the beta band correlated positively with

surprisal at delays of around 700 and 1000 msec (Figure 4)At the latter delay the influence of surprisal was strongestat the left temporal channels Moreover the power in thebeta band was modulated by precision at a delay of about700 msec with the main contributions coming from theoccipital channelsThe power in the gamma band was increased by words

with higher surprisal at long latency of around 1000 msecmainly for the left temporal channels (Figure 5) The in-teraction of surprisal and precision shaped the gammapower as well at the early delay of about 0 msec

DISCUSSION

We have shown that cortical activity tracks the surprisal ofwords in speech comprehension Such cortical tracking

160 Journal of Cognitive Neuroscience Volume 32 Number 1

has emerged at low frequencies that is within the deltaband that encompasses a similar frequency range as therate of words in speech Importantly we found that theneural activity in the faster theta beta and gamma fre-quency bands tracks surprisal as well These frequencybands have previously been suggested to be involved inthe bottomndashup and topndashdown propagation of predic-tions and prediction errors (Lewis amp Bastiaansen 2015)We have further demonstrated that the cortical track-

ing of word surprisal is modulated by precision Theinteraction between surprisal and precision leads to

responses both in the slow delta band as well as in thepower of the faster gamma band In particular word pre-dictions that are made with high precision but then leadto large surprisal cause an increased gamma power atzero lag However as opposed to a previous study onERPs we did not observe a significant effect in the thetaor alpha bands (Rommers et al 2017) This differencemay be due to our use of naturalistic stimuli and the inclu-sion of all words in the analysis whereas the previous studyused specialized sentences with final words that had eitherhigh or low surprisal and either high or low precision

Figure 2 TRFs for acousticand linguistic speech featuresThe TRFs for each electrode areshown in bold at time instanceswhere they are significantcompared with a null modelthat is based on shuffled dataEEG channels that yield asignificant response withina particular range of delayshighlighted in gray areindicated in white in thetopographic plots (A) Theresponses to the word onsetappear as insignificant due tothe construction of the nullmodel (B C) We obtainsignificant neural responsesto word frequency as well assurprisal for delays around400 msec (D) Significant neuralresponses to precision arisearound delays of 100 msec aswell as around 500 msec(E) The interaction betweensurprisal and precision leads toa neural response at a delay of400 msec as well as at a longdelay of 1000 msec

Weissbart Kandylaki and Reichenbach 161

The cortical tracking of surprisal may indicate predic-tive processing by the brain Predictive processing is aframework for perception in which it is assumed thatthe brain infers hypotheses about a sensory input by gen-erating predictions of its neural representations and thatthe hypotheses are constantly updated as new sensoryinformation becomes available (Kanai et al 2015Bendixen SanMiguel amp Schroumlger 2012 Friston 2010Friston amp Kiebel 2009) In particular the surprisal of aword reflects a prediction error a key quantity in theframework of predictive coding (Friston 2010) Howeverthe expectancy of a word based on previous words also

correlates with the plausibility of a word in a particular con-text (Nieuwland et al 2019 DeLong Quante amp Kutas2014) Further studies are therefore required to disentan-gle neural correlates of actual word prediction from thosethat do not require predictive processing such as wordplausibilityThe surprisal of a word can reflect both its semantic as

well as syntactic information and previous investigationsinto the neurobiological mechanisms of languagecomprehension have manipulated both independently(Henderson Choi Lowder amp Ferreira 2016 HumphriesBinder Medler amp Liebenthal 2006) In contrast our

Figure 3 Neural responsesin the theta frequency band (A)Word frequency is positivelycorrelated to theta power at adelay of 300 msec and isnegatively correlated at a delayof 1000 msec (B) Words thatcan be predicted with higherprecision lead to an increasedtheta power at 150 msec and adecreased theta power at alatency of 700 msec

Figure 4 Neural responses inthe beta frequency band (A)There are significant neuralresponses to surprisalemerging at delays of 700 and1000 msec (B) Precision causesan increased power in the betaband activity around a delay of700 msec

162 Journal of Cognitive Neuroscience Volume 32 Number 1

approach has taken a naturalistic and holistic approach tosurprisal we employed natural speech without manipula-tions combined with statistical learning of a rich varietyof natural language cues through a recurrent neural net-work Because the neural network infers both syntacticrules as well as semantic information from the training ofthe speech material the reported neural response to wordsurprisal can reflect both semantic as well as syntactic infor-mation (Collobert et al 2011)It is instructive to compare the reported neural re-

sponses to surprisal to the well-characterized event-related responses that can be elicited by violations ofsemantics syntax or morphology in sentences In partic-ular semantic violations can cause the N400 response anegativity at 200ndash500 msec at the central and parietalscalp area (Kutas amp Federmeier 2011 Kutas amp Hillyard1980) Syntactic anomalies due to ungrammaticality ortemporary misanalysis elicit the P600 a broad positivepotential that is located at the posterior scalp area andarises around 600 msec after the anomaly (Hagoortamp Brown 2000 Friederici Pfeifer amp Hahne 1993)More specific syntactic anomalies can lead to negativepotentials that occur anteriorly and that can be left later-alized either occurring at 300ndash500 msec ((L)AN) orearlier at 125ndash150 msec (ELAN Steinhauer amp Drury2012 Friederici 2002 Van Den Brink Brown ampHagoort 2001 Roumlsler Pechmann Streb Roumlder ampHennighausen 1998)These ERPs do presumably not reflect the activation of

single static neural sources but rather waves of neuralactivity that propagate in time across different brainareas (Kutas amp Federmeier 2011 Tse et al 2007 MaessHerrmann Hahne Nakamura amp Friederici 2006) In thecase of the N400 for instance this wave of activity startsat about 250 msec in the left superior temporal gyrus

and then propagates to the left temporal lobe by 365 msecas well as to both frontal lobes by 500 msec (Van Petten ampLuka 2006 Halgren et al 2002 Helenius SalmelinService amp Connolly 1998) A recent theory suggests thatthis wave of activity reflects reverberating activity withinthe inferior middle and superior temporal gyri thatcorresponds to the activation of lexical informationthe formation of context and the unification of an up-coming word with the context (Baggio amp Hagoort 2011)

The spatiotemporal characteristics of the responses tosurprisal that we have measured here share certain sim-ilarities with these ERPs In particular we have foundneural responses to surprisal at latencies between 300and 600 msec These responses show a central-parietalnegativity that is reminiscent of the N400 Howeverother features of the neural responses that we describehere appear distinct from these ERPs The neural re-sponse to surprisal in the delta band at the latency of600 msec does for instance not display the posteriorpositivity of the P600 Moreover we have identified lateresponses around 700 and 1000 msec We have alsoshown that neural responses to surprisal arise in variousfrequency bands beyond the delta band that matters forthe ERPs However a further comparison of the neuralresponse to surprisal to the related ERPs is hindered bythe lack of spatial resolution offered by EEG recordingsFuture neuroimaging studies using intracranial record-ings or magnetoencephalography (MEG) may localizethe sources of the neural response to surprisal thatwe have measured here and quantify potential sharedsources with the ERPs

The difference of the cortical tracking of surprisal tothe well-known neural correlates of semantic syntacticor morphological anomalies and in particular the late re-sponses at a delay of around 1 sec may come as a result

Figure 5 Tracking of surprisalby gamma band activity (A) Thegamma activity is decreased ataround 1000 msec mostly inthe left temporal and frontalscalp areas (B) The interactionbetween precision and surprisalleads to a modulation of thegamma power at the latency ofaround 0 msec This modulationoccurs predominantly for lefttemporal and frontal channelsas well

Weissbart Kandylaki and Reichenbach 163

of our use of natural speech that differs from the artifi-cially constructed and tightly controlled stimuli used tomeasure ERPs First in our experiment the participants en-countered no violations of semantics syntax and morphol-ogy but instead heard naturalistic speech within which thewords occurred in context Second our stimuli did not con-tain artificial manipulations of word surprisal or precisionInstead of altering the stimuli we focused on quantifyingsurprisal and precision as they varied naturally in the pre-sented stories Third we assessed the responses to surprisaland precision at each word in the story and hence for wordsin every sentence position rather than for words at a partic-ular position within each sentence Because we accountedfor word position through a corresponding control featurewe avoided the possibility of sentence position having aneffect on the results (Bastiaansen et al 2010) Fourthwe did not employ isolated sentences but continuousstories so that information of integration occurred overtimescales exceeding a few seconds

Although our EEG recordings showed the corticaltracking of surprisal in different frequency bands theydid not allow us to precisely localize the sources of theactivity in the cortex Pairing EEG with fMRI or employingMEG may allow to add spatial information to the tempo-ral tracking that we have assessed here A recent fMRIstudy for instance found that the left inferior temporalsulcus the bilateral posterior superior temporal gyri andthe right amygdala responded to surprisal during naturallanguage comprehension whereas the left ventral pre-motor cortex and the left inferior parietal lobule re-sponded to entropy (Willems Frank Nijhof Hagoortamp van den Bosch 2015) Another recent MEG measure-ment of the brainrsquos natural speech processing found thatentropy and surprisal play a role in the assembly of pho-nemes into words and involve brain areas such as coreauditory cortex and the STS (Brodbeck et al 2018)Combining the temporal precision of EEG with the spa-tial precision of fMRI or harnessing the ability of MEG tolocate neural sources temporally and spatially will allowto further clarify the spatiotemporal mechanisms of nat-ural language comprehension in the brain

In summary we showed that neural responses to wordsurprisal can be measured from EEG responses to natu-ralistic stories Our results demonstrate that both theslow delta band as well as the power in higher frequencybands in particular the beta and gamma bands areshaped by surprisal Moreover we also showed that theneural response to surprisal is modulated by the preci-sion of a prediction In particular predictions made withhigh precision which lead to high surprisal modulategamma power in the left temporal and frontal scalp areasIn addition we also demonstrated that neural activity inthe delta theta and beta frequency bands is shaped bythe precision of word prediction directly These re-sponses arise at different latencies and at different scalpareas suggesting a rich spatiotemporal dynamics of neu-ral activity related to word prediction

Acknowledgments

This research was supported by Wellcome Trust grant 108295Z15Z by EPSRC grants EPM0267281 and EPR0326021 as wellas in part by the National Science Foundation under grant noNSF PHY-1125915

Reprint requests should be sent to Tobias ReichenbachDepartment of Bioengineering and Centre for NeurotechnologyImperial College London South Kensington Campus SW7 2AZLondon United Kingdom or via e-mail reichenbachimperialacuk

REFERENCES

Baggio G amp Hagoort P (2011) The balance betweenmemory and unification in semantics A dynamic accountof the N400 Language and Cognitive Processes 261338ndash1367

Bastiaansen M amp Hagoort P (2006) Oscillatory neuronaldynamics during language comprehension Progress in BrainResearch 159 179ndash196

Bastiaansen M Magyari L amp Hagoort P (2010) Syntacticunification operations are reflected in oscillatory dynamicsduring on-line sentence comprehension Journal ofCognitive Neuroscience 22 1333ndash1347

Bendixen A SanMiguel I amp Schroumlger E (2012) Earlyelectrophysiological indicators for predictive processing inaudition A review International Journal ofPsychophysiology 83 120ndash131

Bengio Y Ducharme R Vincent P amp Jauvin C (2003) Aneural probabilistic language model Journal of MachineLearning Research 3 1137ndash1155

Brennan J R amp Hale J T (2019) Hierarchical structure guidesrapid linguistic predictions during naturalistic listening PLoSOne 14 e0207741

Brodbeck C Presacco A amp Simon J Z (2018) Neural sourcedynamics of brain responses to continuous stimuli Speechprocessing from acoustics to comprehension Neuroimage172 162ndash174

Broderick M P Anderson A J Di Liberto G M Crosse M Jamp Lalor E C (2018) Electrophysiological correlates ofsemantic dissimilarity reflect the comprehension of naturalnarrative speech Current Biology 28 803ndash809

Brown P F Desouza P V Mercer R L Pietra V J D amp LaiJ C (1992) Class-based n-gram models of natural languageComputational Linguistics 18 467ndash479

Chatterjee S amp Hadi A S (2015) Regression analysis byexample Hoboken NJ Wiley

Collobert R Weston J Bottou L Karlen M KavukcuogluK amp Kuksa P (2011) Natural language processing (almost)from scratch Journal of Machine Learning Research 122493ndash2537

Davidson D J amp Indefrey P (2007) An inverse relationbetween event-related and timendashfrequency violationresponses in sentence processing Brain Reseach 115881ndash92

DeLong K A Quante L amp Kutas M (2014) Predictabilityplausibility and two late ERP positivities during writtensentence comprehension Neuropsychologia 61 150ndash162

Di Liberto G M OrsquoSullivan J A amp Lalor E C (2015) Low-frequency cortical entrainment to speech reflects phoneme-level processing Current Biology 25 2457ndash2465

Ding N Melloni L Zhang H Tian X amp Poeppel D (2016)Cortical tracking of hierarchical linguistic structures inconnected speech Nature Neuroscience 19 158ndash164

Ding N Pan X Luo C Su N Zhang W amp Zhang J (2018)Attention is required for knowledge-based sequential

164 Journal of Cognitive Neuroscience Volume 32 Number 1

grouping Insights from the integration of syllables intowords Journal of Neuroscience 38 1178ndash1188

Ding N amp Simon J Z (2012) Emergence of neural encodingof auditory objects while listening to competing speakersProceedings of the National Academy of Sciences USA109 11854ndash11859

Ding N amp Simon J Z (2014) Cortical entrainment tocontinuous speech Functional roles and interpretationsFrontiers in Human Neuroscience 8 311

Federmeier K D Wlotko E W De Ochoa-Dewald E ampKutas M (2007) Multiple effects of sentential constraint onword processing Brain Research 1146 75ndash84

Feldman H amp Friston K (2010) Attention uncertainty andfree-energy Frontiers in Human Neuroscience 4 215

Frank S L Otten L J Galli G amp Vigliocco G (2015) TheERP response to the amount of information conveyed bywords in sentences Brain and Language 140 1ndash11

Frank S L amp Willems R M (2017) Word predictability andsemantic similarity show distinct patterns of brain activityduring language comprehension Language Cognition andNeuroscience 32 1192ndash1203

Friederici A D (2002) Towards a neural basis of auditorysentence processing Trends in Cognitive Sciences 678ndash84

Friederici A D Pfeifer E amp Hahne A (1993) Event-relatedbrain potentials during natural speech processing Effects ofsemantic morphological and syntactic violations CognitiveBrain Research 1 183ndash192

Frisch R amp Waugh F V (1933) Partial time regressionsas compared with individual trends Econometrica 1387ndash401

Friston K (2010) The free-energy principle A unified braintheory Nature Reviews Neuroscience 11 127ndash138

Friston K amp Kiebel S (2009) Predictive coding under thefree-energy principle Philosophical Transactions of theRoyal Society of London Series B Biological Sciences364 121ndash1221

Giraud A-L amp Poeppel D (2012) Cortical oscillations andspeech processing Emerging computational principles andoperations Nature Neuroscience 15 511ndash517

Gorman K Howell J amp Wagner M (2011) Prosodylab-alignerA tool for forced alignment of laboratory speech Journal ofthe Canadian Acoustical Association 39 192ndash193

Graves A (2013) Generating sequences with recurrent neuralnetworks arXiv preprint arXiv13080850

Hagoort P amp Brown C M (2000) ERP effects of listening tospeech compared to reading The P600SPS to syntacticviolations in spoken sentences and rapid serial visualpresentation Neuropsychologia 38 1531ndash1549

Halgren E Dhond R P Christensen N Van Petten CMarinkovic K Lewine J D et al (2002) N400-likemagnetoencephalography responses modulated by semanticcontext word frequency and lexical class in sentencesNeuroimage 17 1101ndash1116

Heilbron M amp Chait M (2018) Great expectations Is thereevidence for predictive coding in auditory cortexNeuroscience 389 54ndash73

Helenius P Salmelin R Service E amp Connolly J F (1998)Distinct time courses of word and context comprehension inthe left temporal cortex Brain 121 1133ndash1142

Henderson J M Choi W Lowder M W amp Ferreira F(2016) Language structure in the brain A fixation-relatedfMRI study of syntactic surprisal in reading Neuroimage132 293ndash300

Humphries C Binder J R Medler D A amp Liebenthal E(2006) Syntactic and semantic modulation of neural activityduring auditory sentence comprehension Journal ofCognitive Neuroscience 18 665ndash679

Hyafil A Fontolan L Kabdebon C Gutkin B amp Giraud A-L(2015) Speech encoding by coupled cortical theta and gammaoscillations eLife 4 e06213

Kanai R Komura Y Shipp S amp Friston K (2015) Cerebralhierarchies Predictive processing precision and the pulvinarPhilophical Transancations of the Royal Society of LondonSeries B Biological Science 370 20140169

Keitel A Gross J amp Kayser C (2018) Perceptually relevantspeech tracking in auditory and motor cortex reflects distinctlinguistic features PLoS Biology 16 e2004473

Kielar A Meltzer J A Moreno S Alain C amp Bialystok E(2014) Oscillatory responses to semantic and syntacticviolations Journal of Cognitive Neuroscience 26 2840ndash2862

Klema V amp Laub A (1980) The singular value decompositionIts computation and some applications IEEE Transactionson Automatic Control 25 164ndash176

Koelsch S Vuust P amp Friston K (2018) Predictive processesand the peculiar case of music Trends in Cognitive Sciences23 63ndash77

Kumar T K (1975) Multicollinearity in regression analysisReview of Economics and Statistics 57 365ndash366

Kutas M amp Federmeier K D (2011) Thirty years andcounting Finding meaning in the N400 component of theevent-related brain potential (ERP) Annual Review ofPsychology 62 621ndash647

Kutas M amp Hillyard S A (1980) Reading senseless sentencesBrain potentials reflect semantic incongruity Science 207203ndash205

Kutas M amp Hillyard S A (1984) Brain potentials duringreading reflect word expectancy and semantic associationNature 307 161ndash163

Lakatos P Chen C M OrsquoConnell M N Mills A amp SchroederC E (2007) Neuronal oscillations and multisensoryinteraction in primary auditory cortex Neuron 53 279ndash292

Levy R (2008) Expectation-based syntactic comprehensionCognition 106 1126ndash1177

Lewis A G amp Bastiaansen M (2015) A predictive codingframework for rapid neural dynamics during sentence-levellanguage comprehension Cortex 68 155ndash168

Lovell M C (2008) A simple proof of the FWL theoremJournal of Economic Education 39 88ndash91

Maess B Herrmann C S Hahne A Nakamura A amp FriedericiA D (2006) Localizing the distributed language networkresponsible for the N400 measured by MEG during auditorysentence processing Brain Research 1096 163ndash172

Mahoney M (2011) About the test data Retrieved frommattmahoneynetdctextdatahtml

Mikolov T Kombrink S Burget L Černockyacute J amp Khudanpur S(2011) Extensions of recurrent neural network language modelPaper presented at the 2011 IEEE International Conference onAcoustics Speech and Signal Processing (ICASSP)

Miller G A Heise G A amp Lichten W (1951) Theintelligibility of speech as a function of the context of thetest materials Journal of Experimental Psychology 41329ndash335

Miller G A amp Isard S (1963) Some perceptual consequencesof linguistic rules Journal of Verbal Learning and VerbalBehavior 2 217ndash228

Molinaro N Barraza P amp Carreiras M (2013) Long-rangeneural synchronization supports fast and efficient readingEEG correlates of processing expected words in sentencesNeuroimage 72 120ndash132

Nieuwland M Barr D Bartolozzi F Busch-Moreno SDonaldson D Ferguson H J et al (2019) Dissociableeffects of prediction and integration during languagecomprehension Evidence from a large-scale study usingbrain potentials httpswwwbiorxivorgcontent101101267815v4

Weissbart Kandylaki and Reichenbach 165

Oostenveld R Fries P Maris E amp Schoffelen J-M (2011)FieldTrip Open source software for advanced analysis ofMEG EEG and invasive electrophysiological dataComputational Intelligence and Neuroscience 2011156869

Pascanu R Mikolov T amp Bengio Y (2013) On the difficultyof training recurrent neural networks Paper presented at the30th International Conference on International Conferenceon Machine Learning Atlanta GA

Patten W (1910) International short stories (Vol 2) AuroraIL PF Collier amp Son

Pennington J Socher R amp Manning C (2014) Glove Globalvectors for word representation Paper presented at theConference on Empirical Methods in Natural LanguageProcessing (EMNLP) Doha Qatar

Rommers J Dickson D S Norton J J Wlotko E W ampFedermeier K D (2017) Alpha and theta band dynamicsrelated to sentential constraint and word expectancyLanguage Cognition and Neuroscience 32 576ndash589

Roumlsler F Pechmann T Streb J Roumlder B amp HennighausenE (1998) Parsing of sentences in a language with varyingword order Word-by-word variations of processing demandsare revealed by event-related brain potentials Journal ofMemory and Language 38 150ndash176

Smith N J amp Levy R (2013) The effect of wordpredictability on reading time is logarithmic Cognition128 302ndash319

Steinhauer K amp Drury J E (2012) On the early left-anteriornegativity (ELAN) in syntax studies Brain and Language120 135ndash162

Tse C-Y Lee C-L Sullivan J Garnsey S M Dell G SFabiani M et al (2007) Imaging cortical dynamics oflanguage processing with the event-related optical signalProceedings of the National Academy of Sciences USA104 17157ndash17162

Van Den Brink D Brown C M amp Hagoort P (2001)Electrophysiological evidence for early contextual influencesduring spoken-word recognition N200 versus N400 effectsJournal of Cognitive Neuroscience 13 967ndash985

Van Petten C amp Luka B J (2006) Neural localization of semanticcontext effects in electromagnetic and hemodynamic studiesBrain and Language 97 279ndash293

Wang L Jensen O Van den Brink D Weder N SchoffelenJ M Magyari L et al (2012) Beta oscillations relate to theN400m during language comprehension Human BrainMapping 33 2898ndash2912

Wang L Zhu Z amp Bastiaansen M (2012) Integration orpredictability A further specification of the functional role ofgamma oscillations in language comprehension Frontiers inPsychology 3 187

Weiss S amp Mueller H M (2012) ldquoToo many betas do not spoilthe brothrdquo The role of beta brain oscillations in languageprocessing Frontiers in Psychology 3 201

Willems R M Frank S L Nijhof A D Hagoort P amp van denBosch A (2015) Prediction during natural languagecomprehension Cerebral Cortex 26 2506ndash2516

Zion Golumbic E M Ding N Bickel S Lakatos P SchevonC A McKhann G M et al (2013) Mechanisms underlyingselective neuronal tracking of attended speech at a ldquococktailpartyrdquo Neuron 77 980ndash991

166 Journal of Cognitive Neuroscience Volume 32 Number 1

This article has been cited by

1 Eleonora J Beier Suphasiree Chantavarin Gwendolyn Rehrig Fernanda Ferreira Lee M Miller 2021 Cortical Tracking ofSpeech Toward Collaboration between the Fields of Signal and Sentence Processing Journal of Cognitive Neuroscience 334574-593 [Abstract] [Full Text] [PDF] [PDF Plus]

2 Mikolaj Kegler Tobias Reichenbach 2021 Modelling the effects of transcranial alternating current stimulation on the neuralencoding of speech in noise NeuroImage 224 117427 [Crossref]

3 Christian Brodbeck Jonathan Z Simon 2020 Continuous speech processing Current Opinion in Physiology 18 25-31 [Crossref]4 Katerina D Kandylaki Sonja A Kotz 2020 Distinct cortical rhythms in speech and language processing and some more a

commentary on Meyer Sun amp Martin (2019) Language Cognition and Neuroscience 359 1124-1128 [Crossref]5 Mahmoud Keshavarzi Mikolaj Kegler Shabnam Kadir Tobias Reichenbach 2020 Transcranial alternating current stimulation

in the theta band but not in the delta band modulates the comprehension of naturalistic speech in noise NeuroImage 210 116557[Crossref]

Page 3: Cortical Tracking of Surprisal during Continuous Speech

The precision of the mth word wm follows as the inverseof entropy 1E(m) We note that the precision of the mthword is not a function of that word itself but of theprobability distribution of the words at that positionThe conditional probabilities for the different words in

the sequence given the preceding words were com-puted through a recurrent neural network languagemodel (Graves 2013 Bengio Ducharme Vincent ampJauvin 2003) The network had a hidden layer with recur-rent connections to encode previous input Such net-works are particularly useful for processing sequencesand have previously been successfully applied to languagemodeling (Graves 2013 Bengio et al 2003) In particulara recurrent neural network can capture long-term depen-dencies of variable length by encoding preceding wordsthrough its recurrent connection into the state of the hid-den neurons This is enabled by a careful balance betweenshort- and long-term memory and means that there is inprinciple no limit on the number of preceding words thatsuch a network can take into account (Pascanu Mikolov ampBengio 2013) This contrasts withN-gram languagemodelsfor instance that are limited to a context window ofNminus 1words (Brown Desouza Mercer Pietra amp Lai 1992)The network was implemented using the feature-

augmented recurrent neural network language modelingtoolkit (Mikolov Kombrink Burget Černockyacute ampKhudanpur 2011) To decrease the computational timerequired for training this toolbox assigns words to clas-ses and factorizes the output layer into a part that de-scribes the probability of each class given the previouswords as well as another part that describes the proba-bility of each word within a class given the previouswords This factorization yields a significant decrease intraining time at a small cost to accuracy importantlythe network still computes the probability of individualwords following the previous words (Mikolov et al2011) We employed 300 classes As an embedding layerwe used the pretrained global vectors for word represen-tation trained on the Wikipedia 2014 and the Gigaword5 data sets (Pennington Socher amp Manning 2014)The recurrent layer encompassed 350 hidden unitsThe source code was customized to compute the entropyof each word a feat that the original code did not allowThe neural network was then trained on the text8 data setthat consists of 100 MB of data from Wikipedia (Mahoney2011) using back propagation through time truncated tofive words with a starting learning rate of 01 The datawere cleaned to remove punctuation html tags capitali-zation and numbers before training Because the networkcan only train well on words that appear frequentlyenough in the training data to allow meaningful trainingwe limited the vocabulary to the 35000 most commonwords in the training data set The remaining words weremapped to an ldquounknownrdquo token Infrequent words in thestories such as compound nouns used for style that ap-peared repeatedly throughout the stories did thereforenot obscure the results

The output of the recurrent neural network was ob-tained from a softmax function and could therefore beinterpreted as the probability distribution for an upcom-ing word given the preceding words in the input se-quence The network was therefore trained to predictthe next word that is to compute an output that wasas close as possible to a probability distribution thatwas one for the actual upcoming word and zero for allremaining ones The trained network was then run onthe stories that the participants heard Precision and sur-prisal of each word were determined from the networkrsquoscomputed probability distribution at the correspondingword through Equations (1) and (2)

Speech Features

To relate surprisal and entropy to the EEG data we con-structed a time series for each linguistic feature We firstaligned each word of the speech to the acoustic signalthrough forced alignment using the Prosodylab-Alignersoftware (Gorman Howell amp Wagner 2011) We therebyobtained the time at which each word began To con-struct features for surprisal and for precision that werealigned with the speech stimuli we assigned each ofthe time points where a new word started a spike of amagnitude that corresponded to the surprisal and preci-sion of that word (Figure 1A) A similar procedure hasbeen employed recently for assessing neural responsesto the semantic dissimilarity of consecutive words(Broderick et al 2018)

Because surprisal and precision are high-level linguisticfeatures of speech we sought to ascertain that any puta-tive cortical tracking of them could not be explained bylower level features To this end we added three low-level speech features First cortical activity can trackthe onset of words which can partly be based on changesin the acoustics at word boundaries and partly result fromthe brainrsquos parsing of the acoustic signal to form discretelinguistic units (Brodbeck Presacco amp Simon 2018Ding amp Simon 2014) To account for this onset responsewe constructed a word onset feature as a series of spikeseach of which had unit amplitude and was located at theonset of a word Second we computed the word positionwithin a sentence The latter can be correlated with pre-cision as the entropy tends to decrease across wordswithin the sentence The word position feature thereforeserved as a control to ensure that the neural response toprecision is distinct from any incremental processing oc-curring throughout a sentence Third the frequency of aword in a given language outside its context is a linguis-tic feature that acts as a prior probability for computingthe probability of a word in a sequence (Brodbeck et al2018) Word frequency can also interfere with surprisalLess frequent words may indeed often bemore surprisingTo capture the share of the neural response that could beexplained away by word frequency we included the latteras a third linguistic feature This feature was computed by

Weissbart Kandylaki and Reichenbach 157

scaling the amplitude of the spike at each word onset bythe negative logarithmof the frequency of the correspond-ing word The logarithm was used such that word fre-quency and surprisal were expressed in the same units

Finally to investigate a possible modulating effect thatprecision may have on surprisal we added an interactionterm ldquoSurprisal times Precisionrdquo This was computed bymultiplying precision values with surprisal such that theinteraction feature effectively stands as a confidence-weighted version of surprisal

In summary we computed five speech features oneacoustic feature word onset and four linguistic featuresword position in its sentence word frequency precisionand surprisal To those we added the interaction term be-tween surprisal and precision Each feature was a time se-ries of spikes with each spike being located at the onsetof a word The amplitude of the spike was constant for theword onset feature For each other feature it was scaledto the corresponding value for each respective linguisticfeature All values of the different linguistic features werestandardized to have unit variance and zero mean

EEG Acquisition and Preprocessing

We recorded brain activity using 64 active electrodes(actiCAP BrainProducts) and a multichannel EEG ampli-fier (actiCHamp BrainProducts) The presented soundwas recorded simultaneously through an acoustic

adapter (Acoustical Stimulator Adapter and StimTrakBrainProducts) and was used for aligning the EEG record-ings to the audio signals Both the EEG and the audio datawere acquired at a sampling rate of 1 kHz The left earlobe was used as a reference for the EEGThe EEG data were processed by first applying an anti-

aliasing filter (Kaiser window finite impulse response[FIR] filter cutoff minus6 dB at 125 Hz transition bandwidth50 Hz order 130) and by downsampling the data to250 Hz to reduce the computation time of subsequentoperations A high-pass filter (Hanning window sincType I linear phase FIR filter cutoffminus6 dB at 03 Hz tran-sition bandwidth 015 Hz order 5168) was then appliedto every channel to remove nonstationary trends such asslow drifts and offsets Bad channels were identifiedusing the procedure clean_rawdata from the EEGLABplugin ASR (Artifact Subspace Reconstruction) they werethen removed and interpolated with spherical interpola-tion All channels were then referenced to the channelaverage We subsequently ran an independent compo-nent analysis (ICA) decomposition and removed artifactsfrom eye blink eyes movement as well as muscle motionby visual inspection of the ICA components The cleaneddata were low-pass filtered (Hamming window linearphase FIR filter cutoff minus6 dB at 62 Hz transition band-width 10 Hz order 138) and further down-sampled to125 Hz The filtered EEG data therefore contained thebroad frequency range from 03 to 62 Hz

Figure 1 Experimental overview (A) We employ continuous speech narratives and utilize speech processing as well as language modeling to extractacoustic and linguistic features namely word onset word frequency precision and surprisal (B) The participantrsquos neural activity is recordedthrough EEG while they listen to the stories (C) We extract TRFs for each of the four speech features through computing a linear model thatestimates the EEG recordings from the speech features

158 Journal of Cognitive Neuroscience Volume 32 Number 1

We computed temporal response functions (TRFs) fromEEG data in several frequency bands The TRFs followedfrom a linear forward model that expressed the EEG signalat each electrode as a linear combination of the speechfeatures shifted by different latencies (Broderick et al2018 Ding amp Simon 2012) We used FIR Type I filtersdesigned with the synced windowed method and em-ploying a hamming window We filtered the EEG data inseveral frequency bands of interest delta band (low-passfilter cutoff at 45 Hz filter order 132) theta band (band-pass filter cutoff frequencies at 4 Hz and 8 Hz order 206)alpha band (band-pass filter cutoff frequencies at 8 and12 Hz order 206) beta band (band-pass filter cutoff20 Hz and 30 Hz order 82) and gamma band (cutoff at30 and 60 Hz order 164) For every frequency band otherthan delta we computed the power modulation by takingthe absolute value of the Hilbert transform of the bandpassed data and further band-pass filtered it between 05and 20 Hz (filter order 824) to remove the DC offset andhigher frequencies that do not occur in the speech features

EEG Data Analysis

To relate the speech features to the EEG data we used alinear spatiotemporal forward model that reconstructedthe EEG recordings from the acoustic feature and the lin-guistic features shifted by different delays (Figure 1)Such an approach has recently been used successfullyfor assessing the cortical tracking of the speech envelopephonemic information as well as semantic dissimilarity ofwords in speech (Broderick et al 2018 Di Liberto et al2015 Ding amp Simon 2012) The coefficients resultingfrom this regression constitute the TRFs that inform onthe brainrsquos response to each feature at different latenciesIn particular the forward model sought to express the

preprocessed EEG recordings xi tneth THORNf gNjfrac141 of the N = 64channels at each time instance tn through the time series

yj tnminusτketh THORN Fjfrac141 of the F = 6 speech features word onset

word frequency word position word precision wordsurprisal and the product of surprisal and entropyshifted by T different delays τkf gTkfrac141

xi tneth THORN frac14X6

jfrac141

XT

kfrac141

βij τketh THORNyj tnminusτketh THORN (3)

We hereby considered equally spaced delays τkf gTkfrac141that ranged from minus400 to 1100 msec At the samplingrate of 125 Hz this yielded a number of T = 188 lagsThe obtained estimate for the EEG channel i is denotedby xi The coefficient βij(τk) is the TRF for the ith EEGchannel and speech feature j at the latency τk The pre-processed EEG recording xi tneth THORNf gNifrac141 was either the EEGsignal in the delta band or the power of the EEG signal inthe higher frequency bands We computed the TRFs foreach participant separately leading to a set of TRFs on

which we could apply group-level statistical analysis asdescribed below We then also computed the populationaverage of the participant-specific TRFs the populationaverages are shown in the figures

The different speech features that we employed werepartly correlated The largest correlation emerged be-tween surprisal and the interaction term ldquoSurprisal timesPrecisionrdquo at a value of 61 We wondered if these corre-lations would hinder the EEG analysis and in particular ifthey would obscure the neural responses to the individ-ual speech features through the linear regression analy-sis an issue known as multicollinearity (Chatterjee ampHadi 2015 Kumar 1975) A high multicollinearity be-tween features could result in higher variance or leakagebetween the coefficient βij(τk) However the FrischndashWaughndashLovell theorem from econometrics states that lin-ear regression based on correlated features yields thesame results as when the features are first orthogonal-ized that is decorrelated (Lovell 2008 Frisch ampWaugh 1933) In addition in our implementation ofthe multiple linear regression we used a singular valuedecomposition of the design matrix of time-lagged fea-tures resulting in transformed features that were mutu-ally uncorrelated (Klema amp Laub 1980) The correlationof the features was therefore not problematic The onlyissue that multicolinearity can cause is significantlyincreased variance for each βij(τk) estimate which typi-cally emerges when the variance inflation factor is above5 For our speech features we obtained variance inflationfactors between 122 and 225 indicating that increasednoise due to correlated features is not an issue

As an additional control that our TRFs did not containleakage from responses to different features we devel-oped a null model that was employed to assess the sta-tistical significance of the actual TRFs (see below) Thenull model was constructed such that a potential leakagebetween features would appear similarly both in theactual model and in the null model and therefore wouldnot result in statistically significant results It follows thatany statistically significant part in the TRFs that we ob-tained did not result from leakage between the features

Statistical Significance

To determine the statistical significance of the estimatedTRFs we determined chance-level TRFs as a null modelThe chance-level TRFs were computed by constructingunrelated speech features and by relating these to theEEG recordings in the same way as for the computationof the actual TRFs To establish chance-level linguisticTRFs only the linguistic information of interest containedin the spike amplitude of the speech features but not theacoustic information in the spike timing needed to be un-related to the EEG We therefore constructed unrelatedspeech features by keeping the timing of the spikes iden-tical to those in the true model The speech feature thatdescribed word onsets was therefore not altered

Weissbart Kandylaki and Reichenbach 159

However we changed the amplitude of the spikes for theother linguistic speech features by taking their valuesfrom an unrelated story that is a story that was notaligned with the EEG data To obtain a large number ofnull models we considered permutations of our 15 storyparts Through permutating entire story parts and not theorder of individual words the statistical relationship be-tween the linguistic features of successive words was con-served Because we kept the timing of the spikes in thenull model as in the actual stories the obtained nullmodel could only be used to determine the significanceof the neural responses to the linguistic features but notfor those to the acoustic word onset

The actual TRFs were then analyzed for statistical signif-icance through comparison to 1000 null models The com-parison was obtained from a permutation test togetherwith cluster-based correction for multiple comparison(Oostenveld Fries Maris amp Schoffelen 2011) where onlyclusters of at least four electrodes were kept Specificallywe used the function spatio_temporal_cluster_test fromthe MNE python library The statistic for each model coef-ficient at each electrode and each lag was computed usingthe empirical distribution formed by values from the nullmodels setting the threshold at the 99th percentile of thenull distribution The cluster-level p values were computedand we considered only clusters with a p value greater than0510WeherebyusedtheBonferronicorrectiontoaccountfor the 10 different tests that reflected the different fre-quency bands and the different linguistic features

Data Availability

The EEG data from all participants together with the cor-responding speech features are available on figsharecom (httpsdoiorg106084m9figshare9033983v1)An exemplary script for computing TRFs can be obtainedfrom figshare as well (httpsdoiorg106084m9figshare9034481v1)

RESULTS

Behavioral Assessment

We first assessed to what degree the participants under-stood the stories through asking them comprehensionquestions These questions were answered with an aver-age of 96 accuracy evidencing that the volunteers con-sistently understood the speech and paid attention

Cortical Tracking of Acoustic and LinguisticSpeech Features

The cortical tracking of the speech features can be foundin different frequency bands First because all four fea-tures relate to words the frequency range of the featuresis similar to the rate of words in speech The latter isabout 1ndash4 Hz and corresponds to the delta frequency

range Cortical activity at low frequencies including thedelta frequency band can therefore be evoked by orentrain to the rhythm set by the acoustic and linguisticword features Second the amplitude of the neural activ-ity in higher frequency bands can be modulated by thespeech features This may in particular occur for thetheta band (4ndash8 Hz) the alpha band (8ndash12 Hz) the betafrequency band (20ndash30 Hz) and the gamma frequencyband (30ndash100 Hz) the power of which can be modulatedby prediction in sentence comprehension (Wang et al2012 Weiss amp Mueller 2012 Bastiaansen et al 2010Bastiaansen amp Hagoort 2006)We started by quantifying the neural tracking of the

word features at low frequencies We found neural re-sponses to word frequency between delays of 300 and610 msec (Figure 2) The topographic plots of the re-sponses show large differences between the temporalscalp areas on the one hand and the parietal and occipitalareas on the other handImportantly we found significant responses to the

word surprisal around a delay of 450 msec (Figure 2)These responses emerged predominantly in the EEGchannels on the temporal and occipital scalp areas andwere lateralized on the left hemisphere Precision wastracked by cortical activity at delays of around 100 msecand around 500 msec Moreover we observed a signifi-cant neural response to the interaction of surprisal andprecision at an earlier latency of around 400 msec andat a longer latency of around 1000 msecWe also computed the modulation of the power in the

theta band the alpha band the beta band as well as thegamma band by the acoustic and linguistic features(Figures 4 and 5) Although the power in the alpha bandwas not significantly related to the linguistic features thepower in the theta band was shaped by word frequencyat delays of around 300 msec and around 1000 msec(Figure 3) Furthermore the power in the theta bandwas significantly decreased by precision at delays ofabout 700 msecThe power in the beta band correlated positively with

surprisal at delays of around 700 and 1000 msec (Figure 4)At the latter delay the influence of surprisal was strongestat the left temporal channels Moreover the power in thebeta band was modulated by precision at a delay of about700 msec with the main contributions coming from theoccipital channelsThe power in the gamma band was increased by words

with higher surprisal at long latency of around 1000 msecmainly for the left temporal channels (Figure 5) The in-teraction of surprisal and precision shaped the gammapower as well at the early delay of about 0 msec

DISCUSSION

We have shown that cortical activity tracks the surprisal ofwords in speech comprehension Such cortical tracking

160 Journal of Cognitive Neuroscience Volume 32 Number 1

has emerged at low frequencies that is within the deltaband that encompasses a similar frequency range as therate of words in speech Importantly we found that theneural activity in the faster theta beta and gamma fre-quency bands tracks surprisal as well These frequencybands have previously been suggested to be involved inthe bottomndashup and topndashdown propagation of predic-tions and prediction errors (Lewis amp Bastiaansen 2015)We have further demonstrated that the cortical track-

ing of word surprisal is modulated by precision Theinteraction between surprisal and precision leads to

responses both in the slow delta band as well as in thepower of the faster gamma band In particular word pre-dictions that are made with high precision but then leadto large surprisal cause an increased gamma power atzero lag However as opposed to a previous study onERPs we did not observe a significant effect in the thetaor alpha bands (Rommers et al 2017) This differencemay be due to our use of naturalistic stimuli and the inclu-sion of all words in the analysis whereas the previous studyused specialized sentences with final words that had eitherhigh or low surprisal and either high or low precision

Figure 2 TRFs for acousticand linguistic speech featuresThe TRFs for each electrode areshown in bold at time instanceswhere they are significantcompared with a null modelthat is based on shuffled dataEEG channels that yield asignificant response withina particular range of delayshighlighted in gray areindicated in white in thetopographic plots (A) Theresponses to the word onsetappear as insignificant due tothe construction of the nullmodel (B C) We obtainsignificant neural responsesto word frequency as well assurprisal for delays around400 msec (D) Significant neuralresponses to precision arisearound delays of 100 msec aswell as around 500 msec(E) The interaction betweensurprisal and precision leads toa neural response at a delay of400 msec as well as at a longdelay of 1000 msec

Weissbart Kandylaki and Reichenbach 161

The cortical tracking of surprisal may indicate predic-tive processing by the brain Predictive processing is aframework for perception in which it is assumed thatthe brain infers hypotheses about a sensory input by gen-erating predictions of its neural representations and thatthe hypotheses are constantly updated as new sensoryinformation becomes available (Kanai et al 2015Bendixen SanMiguel amp Schroumlger 2012 Friston 2010Friston amp Kiebel 2009) In particular the surprisal of aword reflects a prediction error a key quantity in theframework of predictive coding (Friston 2010) Howeverthe expectancy of a word based on previous words also

correlates with the plausibility of a word in a particular con-text (Nieuwland et al 2019 DeLong Quante amp Kutas2014) Further studies are therefore required to disentan-gle neural correlates of actual word prediction from thosethat do not require predictive processing such as wordplausibilityThe surprisal of a word can reflect both its semantic as

well as syntactic information and previous investigationsinto the neurobiological mechanisms of languagecomprehension have manipulated both independently(Henderson Choi Lowder amp Ferreira 2016 HumphriesBinder Medler amp Liebenthal 2006) In contrast our

Figure 3 Neural responsesin the theta frequency band (A)Word frequency is positivelycorrelated to theta power at adelay of 300 msec and isnegatively correlated at a delayof 1000 msec (B) Words thatcan be predicted with higherprecision lead to an increasedtheta power at 150 msec and adecreased theta power at alatency of 700 msec

Figure 4 Neural responses inthe beta frequency band (A)There are significant neuralresponses to surprisalemerging at delays of 700 and1000 msec (B) Precision causesan increased power in the betaband activity around a delay of700 msec

162 Journal of Cognitive Neuroscience Volume 32 Number 1

approach has taken a naturalistic and holistic approach tosurprisal we employed natural speech without manipula-tions combined with statistical learning of a rich varietyof natural language cues through a recurrent neural net-work Because the neural network infers both syntacticrules as well as semantic information from the training ofthe speech material the reported neural response to wordsurprisal can reflect both semantic as well as syntactic infor-mation (Collobert et al 2011)It is instructive to compare the reported neural re-

sponses to surprisal to the well-characterized event-related responses that can be elicited by violations ofsemantics syntax or morphology in sentences In partic-ular semantic violations can cause the N400 response anegativity at 200ndash500 msec at the central and parietalscalp area (Kutas amp Federmeier 2011 Kutas amp Hillyard1980) Syntactic anomalies due to ungrammaticality ortemporary misanalysis elicit the P600 a broad positivepotential that is located at the posterior scalp area andarises around 600 msec after the anomaly (Hagoortamp Brown 2000 Friederici Pfeifer amp Hahne 1993)More specific syntactic anomalies can lead to negativepotentials that occur anteriorly and that can be left later-alized either occurring at 300ndash500 msec ((L)AN) orearlier at 125ndash150 msec (ELAN Steinhauer amp Drury2012 Friederici 2002 Van Den Brink Brown ampHagoort 2001 Roumlsler Pechmann Streb Roumlder ampHennighausen 1998)These ERPs do presumably not reflect the activation of

single static neural sources but rather waves of neuralactivity that propagate in time across different brainareas (Kutas amp Federmeier 2011 Tse et al 2007 MaessHerrmann Hahne Nakamura amp Friederici 2006) In thecase of the N400 for instance this wave of activity startsat about 250 msec in the left superior temporal gyrus

and then propagates to the left temporal lobe by 365 msecas well as to both frontal lobes by 500 msec (Van Petten ampLuka 2006 Halgren et al 2002 Helenius SalmelinService amp Connolly 1998) A recent theory suggests thatthis wave of activity reflects reverberating activity withinthe inferior middle and superior temporal gyri thatcorresponds to the activation of lexical informationthe formation of context and the unification of an up-coming word with the context (Baggio amp Hagoort 2011)

The spatiotemporal characteristics of the responses tosurprisal that we have measured here share certain sim-ilarities with these ERPs In particular we have foundneural responses to surprisal at latencies between 300and 600 msec These responses show a central-parietalnegativity that is reminiscent of the N400 Howeverother features of the neural responses that we describehere appear distinct from these ERPs The neural re-sponse to surprisal in the delta band at the latency of600 msec does for instance not display the posteriorpositivity of the P600 Moreover we have identified lateresponses around 700 and 1000 msec We have alsoshown that neural responses to surprisal arise in variousfrequency bands beyond the delta band that matters forthe ERPs However a further comparison of the neuralresponse to surprisal to the related ERPs is hindered bythe lack of spatial resolution offered by EEG recordingsFuture neuroimaging studies using intracranial record-ings or magnetoencephalography (MEG) may localizethe sources of the neural response to surprisal thatwe have measured here and quantify potential sharedsources with the ERPs

The difference of the cortical tracking of surprisal tothe well-known neural correlates of semantic syntacticor morphological anomalies and in particular the late re-sponses at a delay of around 1 sec may come as a result

Figure 5 Tracking of surprisalby gamma band activity (A) Thegamma activity is decreased ataround 1000 msec mostly inthe left temporal and frontalscalp areas (B) The interactionbetween precision and surprisalleads to a modulation of thegamma power at the latency ofaround 0 msec This modulationoccurs predominantly for lefttemporal and frontal channelsas well

Weissbart Kandylaki and Reichenbach 163

of our use of natural speech that differs from the artifi-cially constructed and tightly controlled stimuli used tomeasure ERPs First in our experiment the participants en-countered no violations of semantics syntax and morphol-ogy but instead heard naturalistic speech within which thewords occurred in context Second our stimuli did not con-tain artificial manipulations of word surprisal or precisionInstead of altering the stimuli we focused on quantifyingsurprisal and precision as they varied naturally in the pre-sented stories Third we assessed the responses to surprisaland precision at each word in the story and hence for wordsin every sentence position rather than for words at a partic-ular position within each sentence Because we accountedfor word position through a corresponding control featurewe avoided the possibility of sentence position having aneffect on the results (Bastiaansen et al 2010) Fourthwe did not employ isolated sentences but continuousstories so that information of integration occurred overtimescales exceeding a few seconds

Although our EEG recordings showed the corticaltracking of surprisal in different frequency bands theydid not allow us to precisely localize the sources of theactivity in the cortex Pairing EEG with fMRI or employingMEG may allow to add spatial information to the tempo-ral tracking that we have assessed here A recent fMRIstudy for instance found that the left inferior temporalsulcus the bilateral posterior superior temporal gyri andthe right amygdala responded to surprisal during naturallanguage comprehension whereas the left ventral pre-motor cortex and the left inferior parietal lobule re-sponded to entropy (Willems Frank Nijhof Hagoortamp van den Bosch 2015) Another recent MEG measure-ment of the brainrsquos natural speech processing found thatentropy and surprisal play a role in the assembly of pho-nemes into words and involve brain areas such as coreauditory cortex and the STS (Brodbeck et al 2018)Combining the temporal precision of EEG with the spa-tial precision of fMRI or harnessing the ability of MEG tolocate neural sources temporally and spatially will allowto further clarify the spatiotemporal mechanisms of nat-ural language comprehension in the brain

In summary we showed that neural responses to wordsurprisal can be measured from EEG responses to natu-ralistic stories Our results demonstrate that both theslow delta band as well as the power in higher frequencybands in particular the beta and gamma bands areshaped by surprisal Moreover we also showed that theneural response to surprisal is modulated by the preci-sion of a prediction In particular predictions made withhigh precision which lead to high surprisal modulategamma power in the left temporal and frontal scalp areasIn addition we also demonstrated that neural activity inthe delta theta and beta frequency bands is shaped bythe precision of word prediction directly These re-sponses arise at different latencies and at different scalpareas suggesting a rich spatiotemporal dynamics of neu-ral activity related to word prediction

Acknowledgments

This research was supported by Wellcome Trust grant 108295Z15Z by EPSRC grants EPM0267281 and EPR0326021 as wellas in part by the National Science Foundation under grant noNSF PHY-1125915

Reprint requests should be sent to Tobias ReichenbachDepartment of Bioengineering and Centre for NeurotechnologyImperial College London South Kensington Campus SW7 2AZLondon United Kingdom or via e-mail reichenbachimperialacuk

REFERENCES

Baggio G amp Hagoort P (2011) The balance betweenmemory and unification in semantics A dynamic accountof the N400 Language and Cognitive Processes 261338ndash1367

Bastiaansen M amp Hagoort P (2006) Oscillatory neuronaldynamics during language comprehension Progress in BrainResearch 159 179ndash196

Bastiaansen M Magyari L amp Hagoort P (2010) Syntacticunification operations are reflected in oscillatory dynamicsduring on-line sentence comprehension Journal ofCognitive Neuroscience 22 1333ndash1347

Bendixen A SanMiguel I amp Schroumlger E (2012) Earlyelectrophysiological indicators for predictive processing inaudition A review International Journal ofPsychophysiology 83 120ndash131

Bengio Y Ducharme R Vincent P amp Jauvin C (2003) Aneural probabilistic language model Journal of MachineLearning Research 3 1137ndash1155

Brennan J R amp Hale J T (2019) Hierarchical structure guidesrapid linguistic predictions during naturalistic listening PLoSOne 14 e0207741

Brodbeck C Presacco A amp Simon J Z (2018) Neural sourcedynamics of brain responses to continuous stimuli Speechprocessing from acoustics to comprehension Neuroimage172 162ndash174

Broderick M P Anderson A J Di Liberto G M Crosse M Jamp Lalor E C (2018) Electrophysiological correlates ofsemantic dissimilarity reflect the comprehension of naturalnarrative speech Current Biology 28 803ndash809

Brown P F Desouza P V Mercer R L Pietra V J D amp LaiJ C (1992) Class-based n-gram models of natural languageComputational Linguistics 18 467ndash479

Chatterjee S amp Hadi A S (2015) Regression analysis byexample Hoboken NJ Wiley

Collobert R Weston J Bottou L Karlen M KavukcuogluK amp Kuksa P (2011) Natural language processing (almost)from scratch Journal of Machine Learning Research 122493ndash2537

Davidson D J amp Indefrey P (2007) An inverse relationbetween event-related and timendashfrequency violationresponses in sentence processing Brain Reseach 115881ndash92

DeLong K A Quante L amp Kutas M (2014) Predictabilityplausibility and two late ERP positivities during writtensentence comprehension Neuropsychologia 61 150ndash162

Di Liberto G M OrsquoSullivan J A amp Lalor E C (2015) Low-frequency cortical entrainment to speech reflects phoneme-level processing Current Biology 25 2457ndash2465

Ding N Melloni L Zhang H Tian X amp Poeppel D (2016)Cortical tracking of hierarchical linguistic structures inconnected speech Nature Neuroscience 19 158ndash164

Ding N Pan X Luo C Su N Zhang W amp Zhang J (2018)Attention is required for knowledge-based sequential

164 Journal of Cognitive Neuroscience Volume 32 Number 1

grouping Insights from the integration of syllables intowords Journal of Neuroscience 38 1178ndash1188

Ding N amp Simon J Z (2012) Emergence of neural encodingof auditory objects while listening to competing speakersProceedings of the National Academy of Sciences USA109 11854ndash11859

Ding N amp Simon J Z (2014) Cortical entrainment tocontinuous speech Functional roles and interpretationsFrontiers in Human Neuroscience 8 311

Federmeier K D Wlotko E W De Ochoa-Dewald E ampKutas M (2007) Multiple effects of sentential constraint onword processing Brain Research 1146 75ndash84

Feldman H amp Friston K (2010) Attention uncertainty andfree-energy Frontiers in Human Neuroscience 4 215

Frank S L Otten L J Galli G amp Vigliocco G (2015) TheERP response to the amount of information conveyed bywords in sentences Brain and Language 140 1ndash11

Frank S L amp Willems R M (2017) Word predictability andsemantic similarity show distinct patterns of brain activityduring language comprehension Language Cognition andNeuroscience 32 1192ndash1203

Friederici A D (2002) Towards a neural basis of auditorysentence processing Trends in Cognitive Sciences 678ndash84

Friederici A D Pfeifer E amp Hahne A (1993) Event-relatedbrain potentials during natural speech processing Effects ofsemantic morphological and syntactic violations CognitiveBrain Research 1 183ndash192

Frisch R amp Waugh F V (1933) Partial time regressionsas compared with individual trends Econometrica 1387ndash401

Friston K (2010) The free-energy principle A unified braintheory Nature Reviews Neuroscience 11 127ndash138

Friston K amp Kiebel S (2009) Predictive coding under thefree-energy principle Philosophical Transactions of theRoyal Society of London Series B Biological Sciences364 121ndash1221

Giraud A-L amp Poeppel D (2012) Cortical oscillations andspeech processing Emerging computational principles andoperations Nature Neuroscience 15 511ndash517

Gorman K Howell J amp Wagner M (2011) Prosodylab-alignerA tool for forced alignment of laboratory speech Journal ofthe Canadian Acoustical Association 39 192ndash193

Graves A (2013) Generating sequences with recurrent neuralnetworks arXiv preprint arXiv13080850

Hagoort P amp Brown C M (2000) ERP effects of listening tospeech compared to reading The P600SPS to syntacticviolations in spoken sentences and rapid serial visualpresentation Neuropsychologia 38 1531ndash1549

Halgren E Dhond R P Christensen N Van Petten CMarinkovic K Lewine J D et al (2002) N400-likemagnetoencephalography responses modulated by semanticcontext word frequency and lexical class in sentencesNeuroimage 17 1101ndash1116

Heilbron M amp Chait M (2018) Great expectations Is thereevidence for predictive coding in auditory cortexNeuroscience 389 54ndash73

Helenius P Salmelin R Service E amp Connolly J F (1998)Distinct time courses of word and context comprehension inthe left temporal cortex Brain 121 1133ndash1142

Henderson J M Choi W Lowder M W amp Ferreira F(2016) Language structure in the brain A fixation-relatedfMRI study of syntactic surprisal in reading Neuroimage132 293ndash300

Humphries C Binder J R Medler D A amp Liebenthal E(2006) Syntactic and semantic modulation of neural activityduring auditory sentence comprehension Journal ofCognitive Neuroscience 18 665ndash679

Hyafil A Fontolan L Kabdebon C Gutkin B amp Giraud A-L(2015) Speech encoding by coupled cortical theta and gammaoscillations eLife 4 e06213

Kanai R Komura Y Shipp S amp Friston K (2015) Cerebralhierarchies Predictive processing precision and the pulvinarPhilophical Transancations of the Royal Society of LondonSeries B Biological Science 370 20140169

Keitel A Gross J amp Kayser C (2018) Perceptually relevantspeech tracking in auditory and motor cortex reflects distinctlinguistic features PLoS Biology 16 e2004473

Kielar A Meltzer J A Moreno S Alain C amp Bialystok E(2014) Oscillatory responses to semantic and syntacticviolations Journal of Cognitive Neuroscience 26 2840ndash2862

Klema V amp Laub A (1980) The singular value decompositionIts computation and some applications IEEE Transactionson Automatic Control 25 164ndash176

Koelsch S Vuust P amp Friston K (2018) Predictive processesand the peculiar case of music Trends in Cognitive Sciences23 63ndash77

Kumar T K (1975) Multicollinearity in regression analysisReview of Economics and Statistics 57 365ndash366

Kutas M amp Federmeier K D (2011) Thirty years andcounting Finding meaning in the N400 component of theevent-related brain potential (ERP) Annual Review ofPsychology 62 621ndash647

Kutas M amp Hillyard S A (1980) Reading senseless sentencesBrain potentials reflect semantic incongruity Science 207203ndash205

Kutas M amp Hillyard S A (1984) Brain potentials duringreading reflect word expectancy and semantic associationNature 307 161ndash163

Lakatos P Chen C M OrsquoConnell M N Mills A amp SchroederC E (2007) Neuronal oscillations and multisensoryinteraction in primary auditory cortex Neuron 53 279ndash292

Levy R (2008) Expectation-based syntactic comprehensionCognition 106 1126ndash1177

Lewis A G amp Bastiaansen M (2015) A predictive codingframework for rapid neural dynamics during sentence-levellanguage comprehension Cortex 68 155ndash168

Lovell M C (2008) A simple proof of the FWL theoremJournal of Economic Education 39 88ndash91

Maess B Herrmann C S Hahne A Nakamura A amp FriedericiA D (2006) Localizing the distributed language networkresponsible for the N400 measured by MEG during auditorysentence processing Brain Research 1096 163ndash172

Mahoney M (2011) About the test data Retrieved frommattmahoneynetdctextdatahtml

Mikolov T Kombrink S Burget L Černockyacute J amp Khudanpur S(2011) Extensions of recurrent neural network language modelPaper presented at the 2011 IEEE International Conference onAcoustics Speech and Signal Processing (ICASSP)

Miller G A Heise G A amp Lichten W (1951) Theintelligibility of speech as a function of the context of thetest materials Journal of Experimental Psychology 41329ndash335

Miller G A amp Isard S (1963) Some perceptual consequencesof linguistic rules Journal of Verbal Learning and VerbalBehavior 2 217ndash228

Molinaro N Barraza P amp Carreiras M (2013) Long-rangeneural synchronization supports fast and efficient readingEEG correlates of processing expected words in sentencesNeuroimage 72 120ndash132

Nieuwland M Barr D Bartolozzi F Busch-Moreno SDonaldson D Ferguson H J et al (2019) Dissociableeffects of prediction and integration during languagecomprehension Evidence from a large-scale study usingbrain potentials httpswwwbiorxivorgcontent101101267815v4

Weissbart Kandylaki and Reichenbach 165

Oostenveld R Fries P Maris E amp Schoffelen J-M (2011)FieldTrip Open source software for advanced analysis ofMEG EEG and invasive electrophysiological dataComputational Intelligence and Neuroscience 2011156869

Pascanu R Mikolov T amp Bengio Y (2013) On the difficultyof training recurrent neural networks Paper presented at the30th International Conference on International Conferenceon Machine Learning Atlanta GA

Patten W (1910) International short stories (Vol 2) AuroraIL PF Collier amp Son

Pennington J Socher R amp Manning C (2014) Glove Globalvectors for word representation Paper presented at theConference on Empirical Methods in Natural LanguageProcessing (EMNLP) Doha Qatar

Rommers J Dickson D S Norton J J Wlotko E W ampFedermeier K D (2017) Alpha and theta band dynamicsrelated to sentential constraint and word expectancyLanguage Cognition and Neuroscience 32 576ndash589

Roumlsler F Pechmann T Streb J Roumlder B amp HennighausenE (1998) Parsing of sentences in a language with varyingword order Word-by-word variations of processing demandsare revealed by event-related brain potentials Journal ofMemory and Language 38 150ndash176

Smith N J amp Levy R (2013) The effect of wordpredictability on reading time is logarithmic Cognition128 302ndash319

Steinhauer K amp Drury J E (2012) On the early left-anteriornegativity (ELAN) in syntax studies Brain and Language120 135ndash162

Tse C-Y Lee C-L Sullivan J Garnsey S M Dell G SFabiani M et al (2007) Imaging cortical dynamics oflanguage processing with the event-related optical signalProceedings of the National Academy of Sciences USA104 17157ndash17162

Van Den Brink D Brown C M amp Hagoort P (2001)Electrophysiological evidence for early contextual influencesduring spoken-word recognition N200 versus N400 effectsJournal of Cognitive Neuroscience 13 967ndash985

Van Petten C amp Luka B J (2006) Neural localization of semanticcontext effects in electromagnetic and hemodynamic studiesBrain and Language 97 279ndash293

Wang L Jensen O Van den Brink D Weder N SchoffelenJ M Magyari L et al (2012) Beta oscillations relate to theN400m during language comprehension Human BrainMapping 33 2898ndash2912

Wang L Zhu Z amp Bastiaansen M (2012) Integration orpredictability A further specification of the functional role ofgamma oscillations in language comprehension Frontiers inPsychology 3 187

Weiss S amp Mueller H M (2012) ldquoToo many betas do not spoilthe brothrdquo The role of beta brain oscillations in languageprocessing Frontiers in Psychology 3 201

Willems R M Frank S L Nijhof A D Hagoort P amp van denBosch A (2015) Prediction during natural languagecomprehension Cerebral Cortex 26 2506ndash2516

Zion Golumbic E M Ding N Bickel S Lakatos P SchevonC A McKhann G M et al (2013) Mechanisms underlyingselective neuronal tracking of attended speech at a ldquococktailpartyrdquo Neuron 77 980ndash991

166 Journal of Cognitive Neuroscience Volume 32 Number 1

This article has been cited by

1 Eleonora J Beier Suphasiree Chantavarin Gwendolyn Rehrig Fernanda Ferreira Lee M Miller 2021 Cortical Tracking ofSpeech Toward Collaboration between the Fields of Signal and Sentence Processing Journal of Cognitive Neuroscience 334574-593 [Abstract] [Full Text] [PDF] [PDF Plus]

2 Mikolaj Kegler Tobias Reichenbach 2021 Modelling the effects of transcranial alternating current stimulation on the neuralencoding of speech in noise NeuroImage 224 117427 [Crossref]

3 Christian Brodbeck Jonathan Z Simon 2020 Continuous speech processing Current Opinion in Physiology 18 25-31 [Crossref]4 Katerina D Kandylaki Sonja A Kotz 2020 Distinct cortical rhythms in speech and language processing and some more a

commentary on Meyer Sun amp Martin (2019) Language Cognition and Neuroscience 359 1124-1128 [Crossref]5 Mahmoud Keshavarzi Mikolaj Kegler Shabnam Kadir Tobias Reichenbach 2020 Transcranial alternating current stimulation

in the theta band but not in the delta band modulates the comprehension of naturalistic speech in noise NeuroImage 210 116557[Crossref]

Page 4: Cortical Tracking of Surprisal during Continuous Speech

scaling the amplitude of the spike at each word onset bythe negative logarithmof the frequency of the correspond-ing word The logarithm was used such that word fre-quency and surprisal were expressed in the same units

Finally to investigate a possible modulating effect thatprecision may have on surprisal we added an interactionterm ldquoSurprisal times Precisionrdquo This was computed bymultiplying precision values with surprisal such that theinteraction feature effectively stands as a confidence-weighted version of surprisal

In summary we computed five speech features oneacoustic feature word onset and four linguistic featuresword position in its sentence word frequency precisionand surprisal To those we added the interaction term be-tween surprisal and precision Each feature was a time se-ries of spikes with each spike being located at the onsetof a word The amplitude of the spike was constant for theword onset feature For each other feature it was scaledto the corresponding value for each respective linguisticfeature All values of the different linguistic features werestandardized to have unit variance and zero mean

EEG Acquisition and Preprocessing

We recorded brain activity using 64 active electrodes(actiCAP BrainProducts) and a multichannel EEG ampli-fier (actiCHamp BrainProducts) The presented soundwas recorded simultaneously through an acoustic

adapter (Acoustical Stimulator Adapter and StimTrakBrainProducts) and was used for aligning the EEG record-ings to the audio signals Both the EEG and the audio datawere acquired at a sampling rate of 1 kHz The left earlobe was used as a reference for the EEGThe EEG data were processed by first applying an anti-

aliasing filter (Kaiser window finite impulse response[FIR] filter cutoff minus6 dB at 125 Hz transition bandwidth50 Hz order 130) and by downsampling the data to250 Hz to reduce the computation time of subsequentoperations A high-pass filter (Hanning window sincType I linear phase FIR filter cutoffminus6 dB at 03 Hz tran-sition bandwidth 015 Hz order 5168) was then appliedto every channel to remove nonstationary trends such asslow drifts and offsets Bad channels were identifiedusing the procedure clean_rawdata from the EEGLABplugin ASR (Artifact Subspace Reconstruction) they werethen removed and interpolated with spherical interpola-tion All channels were then referenced to the channelaverage We subsequently ran an independent compo-nent analysis (ICA) decomposition and removed artifactsfrom eye blink eyes movement as well as muscle motionby visual inspection of the ICA components The cleaneddata were low-pass filtered (Hamming window linearphase FIR filter cutoff minus6 dB at 62 Hz transition band-width 10 Hz order 138) and further down-sampled to125 Hz The filtered EEG data therefore contained thebroad frequency range from 03 to 62 Hz

Figure 1 Experimental overview (A) We employ continuous speech narratives and utilize speech processing as well as language modeling to extractacoustic and linguistic features namely word onset word frequency precision and surprisal (B) The participantrsquos neural activity is recordedthrough EEG while they listen to the stories (C) We extract TRFs for each of the four speech features through computing a linear model thatestimates the EEG recordings from the speech features

158 Journal of Cognitive Neuroscience Volume 32 Number 1

We computed temporal response functions (TRFs) fromEEG data in several frequency bands The TRFs followedfrom a linear forward model that expressed the EEG signalat each electrode as a linear combination of the speechfeatures shifted by different latencies (Broderick et al2018 Ding amp Simon 2012) We used FIR Type I filtersdesigned with the synced windowed method and em-ploying a hamming window We filtered the EEG data inseveral frequency bands of interest delta band (low-passfilter cutoff at 45 Hz filter order 132) theta band (band-pass filter cutoff frequencies at 4 Hz and 8 Hz order 206)alpha band (band-pass filter cutoff frequencies at 8 and12 Hz order 206) beta band (band-pass filter cutoff20 Hz and 30 Hz order 82) and gamma band (cutoff at30 and 60 Hz order 164) For every frequency band otherthan delta we computed the power modulation by takingthe absolute value of the Hilbert transform of the bandpassed data and further band-pass filtered it between 05and 20 Hz (filter order 824) to remove the DC offset andhigher frequencies that do not occur in the speech features

EEG Data Analysis

To relate the speech features to the EEG data we used alinear spatiotemporal forward model that reconstructedthe EEG recordings from the acoustic feature and the lin-guistic features shifted by different delays (Figure 1)Such an approach has recently been used successfullyfor assessing the cortical tracking of the speech envelopephonemic information as well as semantic dissimilarity ofwords in speech (Broderick et al 2018 Di Liberto et al2015 Ding amp Simon 2012) The coefficients resultingfrom this regression constitute the TRFs that inform onthe brainrsquos response to each feature at different latenciesIn particular the forward model sought to express the

preprocessed EEG recordings xi tneth THORNf gNjfrac141 of the N = 64channels at each time instance tn through the time series

yj tnminusτketh THORN Fjfrac141 of the F = 6 speech features word onset

word frequency word position word precision wordsurprisal and the product of surprisal and entropyshifted by T different delays τkf gTkfrac141

xi tneth THORN frac14X6

jfrac141

XT

kfrac141

βij τketh THORNyj tnminusτketh THORN (3)

We hereby considered equally spaced delays τkf gTkfrac141that ranged from minus400 to 1100 msec At the samplingrate of 125 Hz this yielded a number of T = 188 lagsThe obtained estimate for the EEG channel i is denotedby xi The coefficient βij(τk) is the TRF for the ith EEGchannel and speech feature j at the latency τk The pre-processed EEG recording xi tneth THORNf gNifrac141 was either the EEGsignal in the delta band or the power of the EEG signal inthe higher frequency bands We computed the TRFs foreach participant separately leading to a set of TRFs on

which we could apply group-level statistical analysis asdescribed below We then also computed the populationaverage of the participant-specific TRFs the populationaverages are shown in the figures

The different speech features that we employed werepartly correlated The largest correlation emerged be-tween surprisal and the interaction term ldquoSurprisal timesPrecisionrdquo at a value of 61 We wondered if these corre-lations would hinder the EEG analysis and in particular ifthey would obscure the neural responses to the individ-ual speech features through the linear regression analy-sis an issue known as multicollinearity (Chatterjee ampHadi 2015 Kumar 1975) A high multicollinearity be-tween features could result in higher variance or leakagebetween the coefficient βij(τk) However the FrischndashWaughndashLovell theorem from econometrics states that lin-ear regression based on correlated features yields thesame results as when the features are first orthogonal-ized that is decorrelated (Lovell 2008 Frisch ampWaugh 1933) In addition in our implementation ofthe multiple linear regression we used a singular valuedecomposition of the design matrix of time-lagged fea-tures resulting in transformed features that were mutu-ally uncorrelated (Klema amp Laub 1980) The correlationof the features was therefore not problematic The onlyissue that multicolinearity can cause is significantlyincreased variance for each βij(τk) estimate which typi-cally emerges when the variance inflation factor is above5 For our speech features we obtained variance inflationfactors between 122 and 225 indicating that increasednoise due to correlated features is not an issue

As an additional control that our TRFs did not containleakage from responses to different features we devel-oped a null model that was employed to assess the sta-tistical significance of the actual TRFs (see below) Thenull model was constructed such that a potential leakagebetween features would appear similarly both in theactual model and in the null model and therefore wouldnot result in statistically significant results It follows thatany statistically significant part in the TRFs that we ob-tained did not result from leakage between the features

Statistical Significance

To determine the statistical significance of the estimatedTRFs we determined chance-level TRFs as a null modelThe chance-level TRFs were computed by constructingunrelated speech features and by relating these to theEEG recordings in the same way as for the computationof the actual TRFs To establish chance-level linguisticTRFs only the linguistic information of interest containedin the spike amplitude of the speech features but not theacoustic information in the spike timing needed to be un-related to the EEG We therefore constructed unrelatedspeech features by keeping the timing of the spikes iden-tical to those in the true model The speech feature thatdescribed word onsets was therefore not altered

Weissbart Kandylaki and Reichenbach 159

However we changed the amplitude of the spikes for theother linguistic speech features by taking their valuesfrom an unrelated story that is a story that was notaligned with the EEG data To obtain a large number ofnull models we considered permutations of our 15 storyparts Through permutating entire story parts and not theorder of individual words the statistical relationship be-tween the linguistic features of successive words was con-served Because we kept the timing of the spikes in thenull model as in the actual stories the obtained nullmodel could only be used to determine the significanceof the neural responses to the linguistic features but notfor those to the acoustic word onset

The actual TRFs were then analyzed for statistical signif-icance through comparison to 1000 null models The com-parison was obtained from a permutation test togetherwith cluster-based correction for multiple comparison(Oostenveld Fries Maris amp Schoffelen 2011) where onlyclusters of at least four electrodes were kept Specificallywe used the function spatio_temporal_cluster_test fromthe MNE python library The statistic for each model coef-ficient at each electrode and each lag was computed usingthe empirical distribution formed by values from the nullmodels setting the threshold at the 99th percentile of thenull distribution The cluster-level p values were computedand we considered only clusters with a p value greater than0510WeherebyusedtheBonferronicorrectiontoaccountfor the 10 different tests that reflected the different fre-quency bands and the different linguistic features

Data Availability

The EEG data from all participants together with the cor-responding speech features are available on figsharecom (httpsdoiorg106084m9figshare9033983v1)An exemplary script for computing TRFs can be obtainedfrom figshare as well (httpsdoiorg106084m9figshare9034481v1)

RESULTS

Behavioral Assessment

We first assessed to what degree the participants under-stood the stories through asking them comprehensionquestions These questions were answered with an aver-age of 96 accuracy evidencing that the volunteers con-sistently understood the speech and paid attention

Cortical Tracking of Acoustic and LinguisticSpeech Features

The cortical tracking of the speech features can be foundin different frequency bands First because all four fea-tures relate to words the frequency range of the featuresis similar to the rate of words in speech The latter isabout 1ndash4 Hz and corresponds to the delta frequency

range Cortical activity at low frequencies including thedelta frequency band can therefore be evoked by orentrain to the rhythm set by the acoustic and linguisticword features Second the amplitude of the neural activ-ity in higher frequency bands can be modulated by thespeech features This may in particular occur for thetheta band (4ndash8 Hz) the alpha band (8ndash12 Hz) the betafrequency band (20ndash30 Hz) and the gamma frequencyband (30ndash100 Hz) the power of which can be modulatedby prediction in sentence comprehension (Wang et al2012 Weiss amp Mueller 2012 Bastiaansen et al 2010Bastiaansen amp Hagoort 2006)We started by quantifying the neural tracking of the

word features at low frequencies We found neural re-sponses to word frequency between delays of 300 and610 msec (Figure 2) The topographic plots of the re-sponses show large differences between the temporalscalp areas on the one hand and the parietal and occipitalareas on the other handImportantly we found significant responses to the

word surprisal around a delay of 450 msec (Figure 2)These responses emerged predominantly in the EEGchannels on the temporal and occipital scalp areas andwere lateralized on the left hemisphere Precision wastracked by cortical activity at delays of around 100 msecand around 500 msec Moreover we observed a signifi-cant neural response to the interaction of surprisal andprecision at an earlier latency of around 400 msec andat a longer latency of around 1000 msecWe also computed the modulation of the power in the

theta band the alpha band the beta band as well as thegamma band by the acoustic and linguistic features(Figures 4 and 5) Although the power in the alpha bandwas not significantly related to the linguistic features thepower in the theta band was shaped by word frequencyat delays of around 300 msec and around 1000 msec(Figure 3) Furthermore the power in the theta bandwas significantly decreased by precision at delays ofabout 700 msecThe power in the beta band correlated positively with

surprisal at delays of around 700 and 1000 msec (Figure 4)At the latter delay the influence of surprisal was strongestat the left temporal channels Moreover the power in thebeta band was modulated by precision at a delay of about700 msec with the main contributions coming from theoccipital channelsThe power in the gamma band was increased by words

with higher surprisal at long latency of around 1000 msecmainly for the left temporal channels (Figure 5) The in-teraction of surprisal and precision shaped the gammapower as well at the early delay of about 0 msec

DISCUSSION

We have shown that cortical activity tracks the surprisal ofwords in speech comprehension Such cortical tracking

160 Journal of Cognitive Neuroscience Volume 32 Number 1

has emerged at low frequencies that is within the deltaband that encompasses a similar frequency range as therate of words in speech Importantly we found that theneural activity in the faster theta beta and gamma fre-quency bands tracks surprisal as well These frequencybands have previously been suggested to be involved inthe bottomndashup and topndashdown propagation of predic-tions and prediction errors (Lewis amp Bastiaansen 2015)We have further demonstrated that the cortical track-

ing of word surprisal is modulated by precision Theinteraction between surprisal and precision leads to

responses both in the slow delta band as well as in thepower of the faster gamma band In particular word pre-dictions that are made with high precision but then leadto large surprisal cause an increased gamma power atzero lag However as opposed to a previous study onERPs we did not observe a significant effect in the thetaor alpha bands (Rommers et al 2017) This differencemay be due to our use of naturalistic stimuli and the inclu-sion of all words in the analysis whereas the previous studyused specialized sentences with final words that had eitherhigh or low surprisal and either high or low precision

Figure 2 TRFs for acousticand linguistic speech featuresThe TRFs for each electrode areshown in bold at time instanceswhere they are significantcompared with a null modelthat is based on shuffled dataEEG channels that yield asignificant response withina particular range of delayshighlighted in gray areindicated in white in thetopographic plots (A) Theresponses to the word onsetappear as insignificant due tothe construction of the nullmodel (B C) We obtainsignificant neural responsesto word frequency as well assurprisal for delays around400 msec (D) Significant neuralresponses to precision arisearound delays of 100 msec aswell as around 500 msec(E) The interaction betweensurprisal and precision leads toa neural response at a delay of400 msec as well as at a longdelay of 1000 msec

Weissbart Kandylaki and Reichenbach 161

The cortical tracking of surprisal may indicate predic-tive processing by the brain Predictive processing is aframework for perception in which it is assumed thatthe brain infers hypotheses about a sensory input by gen-erating predictions of its neural representations and thatthe hypotheses are constantly updated as new sensoryinformation becomes available (Kanai et al 2015Bendixen SanMiguel amp Schroumlger 2012 Friston 2010Friston amp Kiebel 2009) In particular the surprisal of aword reflects a prediction error a key quantity in theframework of predictive coding (Friston 2010) Howeverthe expectancy of a word based on previous words also

correlates with the plausibility of a word in a particular con-text (Nieuwland et al 2019 DeLong Quante amp Kutas2014) Further studies are therefore required to disentan-gle neural correlates of actual word prediction from thosethat do not require predictive processing such as wordplausibilityThe surprisal of a word can reflect both its semantic as

well as syntactic information and previous investigationsinto the neurobiological mechanisms of languagecomprehension have manipulated both independently(Henderson Choi Lowder amp Ferreira 2016 HumphriesBinder Medler amp Liebenthal 2006) In contrast our

Figure 3 Neural responsesin the theta frequency band (A)Word frequency is positivelycorrelated to theta power at adelay of 300 msec and isnegatively correlated at a delayof 1000 msec (B) Words thatcan be predicted with higherprecision lead to an increasedtheta power at 150 msec and adecreased theta power at alatency of 700 msec

Figure 4 Neural responses inthe beta frequency band (A)There are significant neuralresponses to surprisalemerging at delays of 700 and1000 msec (B) Precision causesan increased power in the betaband activity around a delay of700 msec

162 Journal of Cognitive Neuroscience Volume 32 Number 1

approach has taken a naturalistic and holistic approach tosurprisal we employed natural speech without manipula-tions combined with statistical learning of a rich varietyof natural language cues through a recurrent neural net-work Because the neural network infers both syntacticrules as well as semantic information from the training ofthe speech material the reported neural response to wordsurprisal can reflect both semantic as well as syntactic infor-mation (Collobert et al 2011)It is instructive to compare the reported neural re-

sponses to surprisal to the well-characterized event-related responses that can be elicited by violations ofsemantics syntax or morphology in sentences In partic-ular semantic violations can cause the N400 response anegativity at 200ndash500 msec at the central and parietalscalp area (Kutas amp Federmeier 2011 Kutas amp Hillyard1980) Syntactic anomalies due to ungrammaticality ortemporary misanalysis elicit the P600 a broad positivepotential that is located at the posterior scalp area andarises around 600 msec after the anomaly (Hagoortamp Brown 2000 Friederici Pfeifer amp Hahne 1993)More specific syntactic anomalies can lead to negativepotentials that occur anteriorly and that can be left later-alized either occurring at 300ndash500 msec ((L)AN) orearlier at 125ndash150 msec (ELAN Steinhauer amp Drury2012 Friederici 2002 Van Den Brink Brown ampHagoort 2001 Roumlsler Pechmann Streb Roumlder ampHennighausen 1998)These ERPs do presumably not reflect the activation of

single static neural sources but rather waves of neuralactivity that propagate in time across different brainareas (Kutas amp Federmeier 2011 Tse et al 2007 MaessHerrmann Hahne Nakamura amp Friederici 2006) In thecase of the N400 for instance this wave of activity startsat about 250 msec in the left superior temporal gyrus

and then propagates to the left temporal lobe by 365 msecas well as to both frontal lobes by 500 msec (Van Petten ampLuka 2006 Halgren et al 2002 Helenius SalmelinService amp Connolly 1998) A recent theory suggests thatthis wave of activity reflects reverberating activity withinthe inferior middle and superior temporal gyri thatcorresponds to the activation of lexical informationthe formation of context and the unification of an up-coming word with the context (Baggio amp Hagoort 2011)

The spatiotemporal characteristics of the responses tosurprisal that we have measured here share certain sim-ilarities with these ERPs In particular we have foundneural responses to surprisal at latencies between 300and 600 msec These responses show a central-parietalnegativity that is reminiscent of the N400 Howeverother features of the neural responses that we describehere appear distinct from these ERPs The neural re-sponse to surprisal in the delta band at the latency of600 msec does for instance not display the posteriorpositivity of the P600 Moreover we have identified lateresponses around 700 and 1000 msec We have alsoshown that neural responses to surprisal arise in variousfrequency bands beyond the delta band that matters forthe ERPs However a further comparison of the neuralresponse to surprisal to the related ERPs is hindered bythe lack of spatial resolution offered by EEG recordingsFuture neuroimaging studies using intracranial record-ings or magnetoencephalography (MEG) may localizethe sources of the neural response to surprisal thatwe have measured here and quantify potential sharedsources with the ERPs

The difference of the cortical tracking of surprisal tothe well-known neural correlates of semantic syntacticor morphological anomalies and in particular the late re-sponses at a delay of around 1 sec may come as a result

Figure 5 Tracking of surprisalby gamma band activity (A) Thegamma activity is decreased ataround 1000 msec mostly inthe left temporal and frontalscalp areas (B) The interactionbetween precision and surprisalleads to a modulation of thegamma power at the latency ofaround 0 msec This modulationoccurs predominantly for lefttemporal and frontal channelsas well

Weissbart Kandylaki and Reichenbach 163

of our use of natural speech that differs from the artifi-cially constructed and tightly controlled stimuli used tomeasure ERPs First in our experiment the participants en-countered no violations of semantics syntax and morphol-ogy but instead heard naturalistic speech within which thewords occurred in context Second our stimuli did not con-tain artificial manipulations of word surprisal or precisionInstead of altering the stimuli we focused on quantifyingsurprisal and precision as they varied naturally in the pre-sented stories Third we assessed the responses to surprisaland precision at each word in the story and hence for wordsin every sentence position rather than for words at a partic-ular position within each sentence Because we accountedfor word position through a corresponding control featurewe avoided the possibility of sentence position having aneffect on the results (Bastiaansen et al 2010) Fourthwe did not employ isolated sentences but continuousstories so that information of integration occurred overtimescales exceeding a few seconds

Although our EEG recordings showed the corticaltracking of surprisal in different frequency bands theydid not allow us to precisely localize the sources of theactivity in the cortex Pairing EEG with fMRI or employingMEG may allow to add spatial information to the tempo-ral tracking that we have assessed here A recent fMRIstudy for instance found that the left inferior temporalsulcus the bilateral posterior superior temporal gyri andthe right amygdala responded to surprisal during naturallanguage comprehension whereas the left ventral pre-motor cortex and the left inferior parietal lobule re-sponded to entropy (Willems Frank Nijhof Hagoortamp van den Bosch 2015) Another recent MEG measure-ment of the brainrsquos natural speech processing found thatentropy and surprisal play a role in the assembly of pho-nemes into words and involve brain areas such as coreauditory cortex and the STS (Brodbeck et al 2018)Combining the temporal precision of EEG with the spa-tial precision of fMRI or harnessing the ability of MEG tolocate neural sources temporally and spatially will allowto further clarify the spatiotemporal mechanisms of nat-ural language comprehension in the brain

In summary we showed that neural responses to wordsurprisal can be measured from EEG responses to natu-ralistic stories Our results demonstrate that both theslow delta band as well as the power in higher frequencybands in particular the beta and gamma bands areshaped by surprisal Moreover we also showed that theneural response to surprisal is modulated by the preci-sion of a prediction In particular predictions made withhigh precision which lead to high surprisal modulategamma power in the left temporal and frontal scalp areasIn addition we also demonstrated that neural activity inthe delta theta and beta frequency bands is shaped bythe precision of word prediction directly These re-sponses arise at different latencies and at different scalpareas suggesting a rich spatiotemporal dynamics of neu-ral activity related to word prediction

Acknowledgments

This research was supported by Wellcome Trust grant 108295Z15Z by EPSRC grants EPM0267281 and EPR0326021 as wellas in part by the National Science Foundation under grant noNSF PHY-1125915

Reprint requests should be sent to Tobias ReichenbachDepartment of Bioengineering and Centre for NeurotechnologyImperial College London South Kensington Campus SW7 2AZLondon United Kingdom or via e-mail reichenbachimperialacuk

REFERENCES

Baggio G amp Hagoort P (2011) The balance betweenmemory and unification in semantics A dynamic accountof the N400 Language and Cognitive Processes 261338ndash1367

Bastiaansen M amp Hagoort P (2006) Oscillatory neuronaldynamics during language comprehension Progress in BrainResearch 159 179ndash196

Bastiaansen M Magyari L amp Hagoort P (2010) Syntacticunification operations are reflected in oscillatory dynamicsduring on-line sentence comprehension Journal ofCognitive Neuroscience 22 1333ndash1347

Bendixen A SanMiguel I amp Schroumlger E (2012) Earlyelectrophysiological indicators for predictive processing inaudition A review International Journal ofPsychophysiology 83 120ndash131

Bengio Y Ducharme R Vincent P amp Jauvin C (2003) Aneural probabilistic language model Journal of MachineLearning Research 3 1137ndash1155

Brennan J R amp Hale J T (2019) Hierarchical structure guidesrapid linguistic predictions during naturalistic listening PLoSOne 14 e0207741

Brodbeck C Presacco A amp Simon J Z (2018) Neural sourcedynamics of brain responses to continuous stimuli Speechprocessing from acoustics to comprehension Neuroimage172 162ndash174

Broderick M P Anderson A J Di Liberto G M Crosse M Jamp Lalor E C (2018) Electrophysiological correlates ofsemantic dissimilarity reflect the comprehension of naturalnarrative speech Current Biology 28 803ndash809

Brown P F Desouza P V Mercer R L Pietra V J D amp LaiJ C (1992) Class-based n-gram models of natural languageComputational Linguistics 18 467ndash479

Chatterjee S amp Hadi A S (2015) Regression analysis byexample Hoboken NJ Wiley

Collobert R Weston J Bottou L Karlen M KavukcuogluK amp Kuksa P (2011) Natural language processing (almost)from scratch Journal of Machine Learning Research 122493ndash2537

Davidson D J amp Indefrey P (2007) An inverse relationbetween event-related and timendashfrequency violationresponses in sentence processing Brain Reseach 115881ndash92

DeLong K A Quante L amp Kutas M (2014) Predictabilityplausibility and two late ERP positivities during writtensentence comprehension Neuropsychologia 61 150ndash162

Di Liberto G M OrsquoSullivan J A amp Lalor E C (2015) Low-frequency cortical entrainment to speech reflects phoneme-level processing Current Biology 25 2457ndash2465

Ding N Melloni L Zhang H Tian X amp Poeppel D (2016)Cortical tracking of hierarchical linguistic structures inconnected speech Nature Neuroscience 19 158ndash164

Ding N Pan X Luo C Su N Zhang W amp Zhang J (2018)Attention is required for knowledge-based sequential

164 Journal of Cognitive Neuroscience Volume 32 Number 1

grouping Insights from the integration of syllables intowords Journal of Neuroscience 38 1178ndash1188

Ding N amp Simon J Z (2012) Emergence of neural encodingof auditory objects while listening to competing speakersProceedings of the National Academy of Sciences USA109 11854ndash11859

Ding N amp Simon J Z (2014) Cortical entrainment tocontinuous speech Functional roles and interpretationsFrontiers in Human Neuroscience 8 311

Federmeier K D Wlotko E W De Ochoa-Dewald E ampKutas M (2007) Multiple effects of sentential constraint onword processing Brain Research 1146 75ndash84

Feldman H amp Friston K (2010) Attention uncertainty andfree-energy Frontiers in Human Neuroscience 4 215

Frank S L Otten L J Galli G amp Vigliocco G (2015) TheERP response to the amount of information conveyed bywords in sentences Brain and Language 140 1ndash11

Frank S L amp Willems R M (2017) Word predictability andsemantic similarity show distinct patterns of brain activityduring language comprehension Language Cognition andNeuroscience 32 1192ndash1203

Friederici A D (2002) Towards a neural basis of auditorysentence processing Trends in Cognitive Sciences 678ndash84

Friederici A D Pfeifer E amp Hahne A (1993) Event-relatedbrain potentials during natural speech processing Effects ofsemantic morphological and syntactic violations CognitiveBrain Research 1 183ndash192

Frisch R amp Waugh F V (1933) Partial time regressionsas compared with individual trends Econometrica 1387ndash401

Friston K (2010) The free-energy principle A unified braintheory Nature Reviews Neuroscience 11 127ndash138

Friston K amp Kiebel S (2009) Predictive coding under thefree-energy principle Philosophical Transactions of theRoyal Society of London Series B Biological Sciences364 121ndash1221

Giraud A-L amp Poeppel D (2012) Cortical oscillations andspeech processing Emerging computational principles andoperations Nature Neuroscience 15 511ndash517

Gorman K Howell J amp Wagner M (2011) Prosodylab-alignerA tool for forced alignment of laboratory speech Journal ofthe Canadian Acoustical Association 39 192ndash193

Graves A (2013) Generating sequences with recurrent neuralnetworks arXiv preprint arXiv13080850

Hagoort P amp Brown C M (2000) ERP effects of listening tospeech compared to reading The P600SPS to syntacticviolations in spoken sentences and rapid serial visualpresentation Neuropsychologia 38 1531ndash1549

Halgren E Dhond R P Christensen N Van Petten CMarinkovic K Lewine J D et al (2002) N400-likemagnetoencephalography responses modulated by semanticcontext word frequency and lexical class in sentencesNeuroimage 17 1101ndash1116

Heilbron M amp Chait M (2018) Great expectations Is thereevidence for predictive coding in auditory cortexNeuroscience 389 54ndash73

Helenius P Salmelin R Service E amp Connolly J F (1998)Distinct time courses of word and context comprehension inthe left temporal cortex Brain 121 1133ndash1142

Henderson J M Choi W Lowder M W amp Ferreira F(2016) Language structure in the brain A fixation-relatedfMRI study of syntactic surprisal in reading Neuroimage132 293ndash300

Humphries C Binder J R Medler D A amp Liebenthal E(2006) Syntactic and semantic modulation of neural activityduring auditory sentence comprehension Journal ofCognitive Neuroscience 18 665ndash679

Hyafil A Fontolan L Kabdebon C Gutkin B amp Giraud A-L(2015) Speech encoding by coupled cortical theta and gammaoscillations eLife 4 e06213

Kanai R Komura Y Shipp S amp Friston K (2015) Cerebralhierarchies Predictive processing precision and the pulvinarPhilophical Transancations of the Royal Society of LondonSeries B Biological Science 370 20140169

Keitel A Gross J amp Kayser C (2018) Perceptually relevantspeech tracking in auditory and motor cortex reflects distinctlinguistic features PLoS Biology 16 e2004473

Kielar A Meltzer J A Moreno S Alain C amp Bialystok E(2014) Oscillatory responses to semantic and syntacticviolations Journal of Cognitive Neuroscience 26 2840ndash2862

Klema V amp Laub A (1980) The singular value decompositionIts computation and some applications IEEE Transactionson Automatic Control 25 164ndash176

Koelsch S Vuust P amp Friston K (2018) Predictive processesand the peculiar case of music Trends in Cognitive Sciences23 63ndash77

Kumar T K (1975) Multicollinearity in regression analysisReview of Economics and Statistics 57 365ndash366

Kutas M amp Federmeier K D (2011) Thirty years andcounting Finding meaning in the N400 component of theevent-related brain potential (ERP) Annual Review ofPsychology 62 621ndash647

Kutas M amp Hillyard S A (1980) Reading senseless sentencesBrain potentials reflect semantic incongruity Science 207203ndash205

Kutas M amp Hillyard S A (1984) Brain potentials duringreading reflect word expectancy and semantic associationNature 307 161ndash163

Lakatos P Chen C M OrsquoConnell M N Mills A amp SchroederC E (2007) Neuronal oscillations and multisensoryinteraction in primary auditory cortex Neuron 53 279ndash292

Levy R (2008) Expectation-based syntactic comprehensionCognition 106 1126ndash1177

Lewis A G amp Bastiaansen M (2015) A predictive codingframework for rapid neural dynamics during sentence-levellanguage comprehension Cortex 68 155ndash168

Lovell M C (2008) A simple proof of the FWL theoremJournal of Economic Education 39 88ndash91

Maess B Herrmann C S Hahne A Nakamura A amp FriedericiA D (2006) Localizing the distributed language networkresponsible for the N400 measured by MEG during auditorysentence processing Brain Research 1096 163ndash172

Mahoney M (2011) About the test data Retrieved frommattmahoneynetdctextdatahtml

Mikolov T Kombrink S Burget L Černockyacute J amp Khudanpur S(2011) Extensions of recurrent neural network language modelPaper presented at the 2011 IEEE International Conference onAcoustics Speech and Signal Processing (ICASSP)

Miller G A Heise G A amp Lichten W (1951) Theintelligibility of speech as a function of the context of thetest materials Journal of Experimental Psychology 41329ndash335

Miller G A amp Isard S (1963) Some perceptual consequencesof linguistic rules Journal of Verbal Learning and VerbalBehavior 2 217ndash228

Molinaro N Barraza P amp Carreiras M (2013) Long-rangeneural synchronization supports fast and efficient readingEEG correlates of processing expected words in sentencesNeuroimage 72 120ndash132

Nieuwland M Barr D Bartolozzi F Busch-Moreno SDonaldson D Ferguson H J et al (2019) Dissociableeffects of prediction and integration during languagecomprehension Evidence from a large-scale study usingbrain potentials httpswwwbiorxivorgcontent101101267815v4

Weissbart Kandylaki and Reichenbach 165

Oostenveld R Fries P Maris E amp Schoffelen J-M (2011)FieldTrip Open source software for advanced analysis ofMEG EEG and invasive electrophysiological dataComputational Intelligence and Neuroscience 2011156869

Pascanu R Mikolov T amp Bengio Y (2013) On the difficultyof training recurrent neural networks Paper presented at the30th International Conference on International Conferenceon Machine Learning Atlanta GA

Patten W (1910) International short stories (Vol 2) AuroraIL PF Collier amp Son

Pennington J Socher R amp Manning C (2014) Glove Globalvectors for word representation Paper presented at theConference on Empirical Methods in Natural LanguageProcessing (EMNLP) Doha Qatar

Rommers J Dickson D S Norton J J Wlotko E W ampFedermeier K D (2017) Alpha and theta band dynamicsrelated to sentential constraint and word expectancyLanguage Cognition and Neuroscience 32 576ndash589

Roumlsler F Pechmann T Streb J Roumlder B amp HennighausenE (1998) Parsing of sentences in a language with varyingword order Word-by-word variations of processing demandsare revealed by event-related brain potentials Journal ofMemory and Language 38 150ndash176

Smith N J amp Levy R (2013) The effect of wordpredictability on reading time is logarithmic Cognition128 302ndash319

Steinhauer K amp Drury J E (2012) On the early left-anteriornegativity (ELAN) in syntax studies Brain and Language120 135ndash162

Tse C-Y Lee C-L Sullivan J Garnsey S M Dell G SFabiani M et al (2007) Imaging cortical dynamics oflanguage processing with the event-related optical signalProceedings of the National Academy of Sciences USA104 17157ndash17162

Van Den Brink D Brown C M amp Hagoort P (2001)Electrophysiological evidence for early contextual influencesduring spoken-word recognition N200 versus N400 effectsJournal of Cognitive Neuroscience 13 967ndash985

Van Petten C amp Luka B J (2006) Neural localization of semanticcontext effects in electromagnetic and hemodynamic studiesBrain and Language 97 279ndash293

Wang L Jensen O Van den Brink D Weder N SchoffelenJ M Magyari L et al (2012) Beta oscillations relate to theN400m during language comprehension Human BrainMapping 33 2898ndash2912

Wang L Zhu Z amp Bastiaansen M (2012) Integration orpredictability A further specification of the functional role ofgamma oscillations in language comprehension Frontiers inPsychology 3 187

Weiss S amp Mueller H M (2012) ldquoToo many betas do not spoilthe brothrdquo The role of beta brain oscillations in languageprocessing Frontiers in Psychology 3 201

Willems R M Frank S L Nijhof A D Hagoort P amp van denBosch A (2015) Prediction during natural languagecomprehension Cerebral Cortex 26 2506ndash2516

Zion Golumbic E M Ding N Bickel S Lakatos P SchevonC A McKhann G M et al (2013) Mechanisms underlyingselective neuronal tracking of attended speech at a ldquococktailpartyrdquo Neuron 77 980ndash991

166 Journal of Cognitive Neuroscience Volume 32 Number 1

This article has been cited by

1 Eleonora J Beier Suphasiree Chantavarin Gwendolyn Rehrig Fernanda Ferreira Lee M Miller 2021 Cortical Tracking ofSpeech Toward Collaboration between the Fields of Signal and Sentence Processing Journal of Cognitive Neuroscience 334574-593 [Abstract] [Full Text] [PDF] [PDF Plus]

2 Mikolaj Kegler Tobias Reichenbach 2021 Modelling the effects of transcranial alternating current stimulation on the neuralencoding of speech in noise NeuroImage 224 117427 [Crossref]

3 Christian Brodbeck Jonathan Z Simon 2020 Continuous speech processing Current Opinion in Physiology 18 25-31 [Crossref]4 Katerina D Kandylaki Sonja A Kotz 2020 Distinct cortical rhythms in speech and language processing and some more a

commentary on Meyer Sun amp Martin (2019) Language Cognition and Neuroscience 359 1124-1128 [Crossref]5 Mahmoud Keshavarzi Mikolaj Kegler Shabnam Kadir Tobias Reichenbach 2020 Transcranial alternating current stimulation

in the theta band but not in the delta band modulates the comprehension of naturalistic speech in noise NeuroImage 210 116557[Crossref]

Page 5: Cortical Tracking of Surprisal during Continuous Speech

We computed temporal response functions (TRFs) fromEEG data in several frequency bands The TRFs followedfrom a linear forward model that expressed the EEG signalat each electrode as a linear combination of the speechfeatures shifted by different latencies (Broderick et al2018 Ding amp Simon 2012) We used FIR Type I filtersdesigned with the synced windowed method and em-ploying a hamming window We filtered the EEG data inseveral frequency bands of interest delta band (low-passfilter cutoff at 45 Hz filter order 132) theta band (band-pass filter cutoff frequencies at 4 Hz and 8 Hz order 206)alpha band (band-pass filter cutoff frequencies at 8 and12 Hz order 206) beta band (band-pass filter cutoff20 Hz and 30 Hz order 82) and gamma band (cutoff at30 and 60 Hz order 164) For every frequency band otherthan delta we computed the power modulation by takingthe absolute value of the Hilbert transform of the bandpassed data and further band-pass filtered it between 05and 20 Hz (filter order 824) to remove the DC offset andhigher frequencies that do not occur in the speech features

EEG Data Analysis

To relate the speech features to the EEG data we used alinear spatiotemporal forward model that reconstructedthe EEG recordings from the acoustic feature and the lin-guistic features shifted by different delays (Figure 1)Such an approach has recently been used successfullyfor assessing the cortical tracking of the speech envelopephonemic information as well as semantic dissimilarity ofwords in speech (Broderick et al 2018 Di Liberto et al2015 Ding amp Simon 2012) The coefficients resultingfrom this regression constitute the TRFs that inform onthe brainrsquos response to each feature at different latenciesIn particular the forward model sought to express the

preprocessed EEG recordings xi tneth THORNf gNjfrac141 of the N = 64channels at each time instance tn through the time series

yj tnminusτketh THORN Fjfrac141 of the F = 6 speech features word onset

word frequency word position word precision wordsurprisal and the product of surprisal and entropyshifted by T different delays τkf gTkfrac141

xi tneth THORN frac14X6

jfrac141

XT

kfrac141

βij τketh THORNyj tnminusτketh THORN (3)

We hereby considered equally spaced delays τkf gTkfrac141that ranged from minus400 to 1100 msec At the samplingrate of 125 Hz this yielded a number of T = 188 lagsThe obtained estimate for the EEG channel i is denotedby xi The coefficient βij(τk) is the TRF for the ith EEGchannel and speech feature j at the latency τk The pre-processed EEG recording xi tneth THORNf gNifrac141 was either the EEGsignal in the delta band or the power of the EEG signal inthe higher frequency bands We computed the TRFs foreach participant separately leading to a set of TRFs on

which we could apply group-level statistical analysis asdescribed below We then also computed the populationaverage of the participant-specific TRFs the populationaverages are shown in the figures

The different speech features that we employed werepartly correlated The largest correlation emerged be-tween surprisal and the interaction term ldquoSurprisal timesPrecisionrdquo at a value of 61 We wondered if these corre-lations would hinder the EEG analysis and in particular ifthey would obscure the neural responses to the individ-ual speech features through the linear regression analy-sis an issue known as multicollinearity (Chatterjee ampHadi 2015 Kumar 1975) A high multicollinearity be-tween features could result in higher variance or leakagebetween the coefficient βij(τk) However the FrischndashWaughndashLovell theorem from econometrics states that lin-ear regression based on correlated features yields thesame results as when the features are first orthogonal-ized that is decorrelated (Lovell 2008 Frisch ampWaugh 1933) In addition in our implementation ofthe multiple linear regression we used a singular valuedecomposition of the design matrix of time-lagged fea-tures resulting in transformed features that were mutu-ally uncorrelated (Klema amp Laub 1980) The correlationof the features was therefore not problematic The onlyissue that multicolinearity can cause is significantlyincreased variance for each βij(τk) estimate which typi-cally emerges when the variance inflation factor is above5 For our speech features we obtained variance inflationfactors between 122 and 225 indicating that increasednoise due to correlated features is not an issue

As an additional control that our TRFs did not containleakage from responses to different features we devel-oped a null model that was employed to assess the sta-tistical significance of the actual TRFs (see below) Thenull model was constructed such that a potential leakagebetween features would appear similarly both in theactual model and in the null model and therefore wouldnot result in statistically significant results It follows thatany statistically significant part in the TRFs that we ob-tained did not result from leakage between the features

Statistical Significance

To determine the statistical significance of the estimatedTRFs we determined chance-level TRFs as a null modelThe chance-level TRFs were computed by constructingunrelated speech features and by relating these to theEEG recordings in the same way as for the computationof the actual TRFs To establish chance-level linguisticTRFs only the linguistic information of interest containedin the spike amplitude of the speech features but not theacoustic information in the spike timing needed to be un-related to the EEG We therefore constructed unrelatedspeech features by keeping the timing of the spikes iden-tical to those in the true model The speech feature thatdescribed word onsets was therefore not altered

Weissbart Kandylaki and Reichenbach 159

However we changed the amplitude of the spikes for theother linguistic speech features by taking their valuesfrom an unrelated story that is a story that was notaligned with the EEG data To obtain a large number ofnull models we considered permutations of our 15 storyparts Through permutating entire story parts and not theorder of individual words the statistical relationship be-tween the linguistic features of successive words was con-served Because we kept the timing of the spikes in thenull model as in the actual stories the obtained nullmodel could only be used to determine the significanceof the neural responses to the linguistic features but notfor those to the acoustic word onset

The actual TRFs were then analyzed for statistical signif-icance through comparison to 1000 null models The com-parison was obtained from a permutation test togetherwith cluster-based correction for multiple comparison(Oostenveld Fries Maris amp Schoffelen 2011) where onlyclusters of at least four electrodes were kept Specificallywe used the function spatio_temporal_cluster_test fromthe MNE python library The statistic for each model coef-ficient at each electrode and each lag was computed usingthe empirical distribution formed by values from the nullmodels setting the threshold at the 99th percentile of thenull distribution The cluster-level p values were computedand we considered only clusters with a p value greater than0510WeherebyusedtheBonferronicorrectiontoaccountfor the 10 different tests that reflected the different fre-quency bands and the different linguistic features

Data Availability

The EEG data from all participants together with the cor-responding speech features are available on figsharecom (httpsdoiorg106084m9figshare9033983v1)An exemplary script for computing TRFs can be obtainedfrom figshare as well (httpsdoiorg106084m9figshare9034481v1)

RESULTS

Behavioral Assessment

We first assessed to what degree the participants under-stood the stories through asking them comprehensionquestions These questions were answered with an aver-age of 96 accuracy evidencing that the volunteers con-sistently understood the speech and paid attention

Cortical Tracking of Acoustic and LinguisticSpeech Features

The cortical tracking of the speech features can be foundin different frequency bands First because all four fea-tures relate to words the frequency range of the featuresis similar to the rate of words in speech The latter isabout 1ndash4 Hz and corresponds to the delta frequency

range Cortical activity at low frequencies including thedelta frequency band can therefore be evoked by orentrain to the rhythm set by the acoustic and linguisticword features Second the amplitude of the neural activ-ity in higher frequency bands can be modulated by thespeech features This may in particular occur for thetheta band (4ndash8 Hz) the alpha band (8ndash12 Hz) the betafrequency band (20ndash30 Hz) and the gamma frequencyband (30ndash100 Hz) the power of which can be modulatedby prediction in sentence comprehension (Wang et al2012 Weiss amp Mueller 2012 Bastiaansen et al 2010Bastiaansen amp Hagoort 2006)We started by quantifying the neural tracking of the

word features at low frequencies We found neural re-sponses to word frequency between delays of 300 and610 msec (Figure 2) The topographic plots of the re-sponses show large differences between the temporalscalp areas on the one hand and the parietal and occipitalareas on the other handImportantly we found significant responses to the

word surprisal around a delay of 450 msec (Figure 2)These responses emerged predominantly in the EEGchannels on the temporal and occipital scalp areas andwere lateralized on the left hemisphere Precision wastracked by cortical activity at delays of around 100 msecand around 500 msec Moreover we observed a signifi-cant neural response to the interaction of surprisal andprecision at an earlier latency of around 400 msec andat a longer latency of around 1000 msecWe also computed the modulation of the power in the

theta band the alpha band the beta band as well as thegamma band by the acoustic and linguistic features(Figures 4 and 5) Although the power in the alpha bandwas not significantly related to the linguistic features thepower in the theta band was shaped by word frequencyat delays of around 300 msec and around 1000 msec(Figure 3) Furthermore the power in the theta bandwas significantly decreased by precision at delays ofabout 700 msecThe power in the beta band correlated positively with

surprisal at delays of around 700 and 1000 msec (Figure 4)At the latter delay the influence of surprisal was strongestat the left temporal channels Moreover the power in thebeta band was modulated by precision at a delay of about700 msec with the main contributions coming from theoccipital channelsThe power in the gamma band was increased by words

with higher surprisal at long latency of around 1000 msecmainly for the left temporal channels (Figure 5) The in-teraction of surprisal and precision shaped the gammapower as well at the early delay of about 0 msec

DISCUSSION

We have shown that cortical activity tracks the surprisal ofwords in speech comprehension Such cortical tracking

160 Journal of Cognitive Neuroscience Volume 32 Number 1

has emerged at low frequencies that is within the deltaband that encompasses a similar frequency range as therate of words in speech Importantly we found that theneural activity in the faster theta beta and gamma fre-quency bands tracks surprisal as well These frequencybands have previously been suggested to be involved inthe bottomndashup and topndashdown propagation of predic-tions and prediction errors (Lewis amp Bastiaansen 2015)We have further demonstrated that the cortical track-

ing of word surprisal is modulated by precision Theinteraction between surprisal and precision leads to

responses both in the slow delta band as well as in thepower of the faster gamma band In particular word pre-dictions that are made with high precision but then leadto large surprisal cause an increased gamma power atzero lag However as opposed to a previous study onERPs we did not observe a significant effect in the thetaor alpha bands (Rommers et al 2017) This differencemay be due to our use of naturalistic stimuli and the inclu-sion of all words in the analysis whereas the previous studyused specialized sentences with final words that had eitherhigh or low surprisal and either high or low precision

Figure 2 TRFs for acousticand linguistic speech featuresThe TRFs for each electrode areshown in bold at time instanceswhere they are significantcompared with a null modelthat is based on shuffled dataEEG channels that yield asignificant response withina particular range of delayshighlighted in gray areindicated in white in thetopographic plots (A) Theresponses to the word onsetappear as insignificant due tothe construction of the nullmodel (B C) We obtainsignificant neural responsesto word frequency as well assurprisal for delays around400 msec (D) Significant neuralresponses to precision arisearound delays of 100 msec aswell as around 500 msec(E) The interaction betweensurprisal and precision leads toa neural response at a delay of400 msec as well as at a longdelay of 1000 msec

Weissbart Kandylaki and Reichenbach 161

The cortical tracking of surprisal may indicate predic-tive processing by the brain Predictive processing is aframework for perception in which it is assumed thatthe brain infers hypotheses about a sensory input by gen-erating predictions of its neural representations and thatthe hypotheses are constantly updated as new sensoryinformation becomes available (Kanai et al 2015Bendixen SanMiguel amp Schroumlger 2012 Friston 2010Friston amp Kiebel 2009) In particular the surprisal of aword reflects a prediction error a key quantity in theframework of predictive coding (Friston 2010) Howeverthe expectancy of a word based on previous words also

correlates with the plausibility of a word in a particular con-text (Nieuwland et al 2019 DeLong Quante amp Kutas2014) Further studies are therefore required to disentan-gle neural correlates of actual word prediction from thosethat do not require predictive processing such as wordplausibilityThe surprisal of a word can reflect both its semantic as

well as syntactic information and previous investigationsinto the neurobiological mechanisms of languagecomprehension have manipulated both independently(Henderson Choi Lowder amp Ferreira 2016 HumphriesBinder Medler amp Liebenthal 2006) In contrast our

Figure 3 Neural responsesin the theta frequency band (A)Word frequency is positivelycorrelated to theta power at adelay of 300 msec and isnegatively correlated at a delayof 1000 msec (B) Words thatcan be predicted with higherprecision lead to an increasedtheta power at 150 msec and adecreased theta power at alatency of 700 msec

Figure 4 Neural responses inthe beta frequency band (A)There are significant neuralresponses to surprisalemerging at delays of 700 and1000 msec (B) Precision causesan increased power in the betaband activity around a delay of700 msec

162 Journal of Cognitive Neuroscience Volume 32 Number 1

approach has taken a naturalistic and holistic approach tosurprisal we employed natural speech without manipula-tions combined with statistical learning of a rich varietyof natural language cues through a recurrent neural net-work Because the neural network infers both syntacticrules as well as semantic information from the training ofthe speech material the reported neural response to wordsurprisal can reflect both semantic as well as syntactic infor-mation (Collobert et al 2011)It is instructive to compare the reported neural re-

sponses to surprisal to the well-characterized event-related responses that can be elicited by violations ofsemantics syntax or morphology in sentences In partic-ular semantic violations can cause the N400 response anegativity at 200ndash500 msec at the central and parietalscalp area (Kutas amp Federmeier 2011 Kutas amp Hillyard1980) Syntactic anomalies due to ungrammaticality ortemporary misanalysis elicit the P600 a broad positivepotential that is located at the posterior scalp area andarises around 600 msec after the anomaly (Hagoortamp Brown 2000 Friederici Pfeifer amp Hahne 1993)More specific syntactic anomalies can lead to negativepotentials that occur anteriorly and that can be left later-alized either occurring at 300ndash500 msec ((L)AN) orearlier at 125ndash150 msec (ELAN Steinhauer amp Drury2012 Friederici 2002 Van Den Brink Brown ampHagoort 2001 Roumlsler Pechmann Streb Roumlder ampHennighausen 1998)These ERPs do presumably not reflect the activation of

single static neural sources but rather waves of neuralactivity that propagate in time across different brainareas (Kutas amp Federmeier 2011 Tse et al 2007 MaessHerrmann Hahne Nakamura amp Friederici 2006) In thecase of the N400 for instance this wave of activity startsat about 250 msec in the left superior temporal gyrus

and then propagates to the left temporal lobe by 365 msecas well as to both frontal lobes by 500 msec (Van Petten ampLuka 2006 Halgren et al 2002 Helenius SalmelinService amp Connolly 1998) A recent theory suggests thatthis wave of activity reflects reverberating activity withinthe inferior middle and superior temporal gyri thatcorresponds to the activation of lexical informationthe formation of context and the unification of an up-coming word with the context (Baggio amp Hagoort 2011)

The spatiotemporal characteristics of the responses tosurprisal that we have measured here share certain sim-ilarities with these ERPs In particular we have foundneural responses to surprisal at latencies between 300and 600 msec These responses show a central-parietalnegativity that is reminiscent of the N400 Howeverother features of the neural responses that we describehere appear distinct from these ERPs The neural re-sponse to surprisal in the delta band at the latency of600 msec does for instance not display the posteriorpositivity of the P600 Moreover we have identified lateresponses around 700 and 1000 msec We have alsoshown that neural responses to surprisal arise in variousfrequency bands beyond the delta band that matters forthe ERPs However a further comparison of the neuralresponse to surprisal to the related ERPs is hindered bythe lack of spatial resolution offered by EEG recordingsFuture neuroimaging studies using intracranial record-ings or magnetoencephalography (MEG) may localizethe sources of the neural response to surprisal thatwe have measured here and quantify potential sharedsources with the ERPs

The difference of the cortical tracking of surprisal tothe well-known neural correlates of semantic syntacticor morphological anomalies and in particular the late re-sponses at a delay of around 1 sec may come as a result

Figure 5 Tracking of surprisalby gamma band activity (A) Thegamma activity is decreased ataround 1000 msec mostly inthe left temporal and frontalscalp areas (B) The interactionbetween precision and surprisalleads to a modulation of thegamma power at the latency ofaround 0 msec This modulationoccurs predominantly for lefttemporal and frontal channelsas well

Weissbart Kandylaki and Reichenbach 163

of our use of natural speech that differs from the artifi-cially constructed and tightly controlled stimuli used tomeasure ERPs First in our experiment the participants en-countered no violations of semantics syntax and morphol-ogy but instead heard naturalistic speech within which thewords occurred in context Second our stimuli did not con-tain artificial manipulations of word surprisal or precisionInstead of altering the stimuli we focused on quantifyingsurprisal and precision as they varied naturally in the pre-sented stories Third we assessed the responses to surprisaland precision at each word in the story and hence for wordsin every sentence position rather than for words at a partic-ular position within each sentence Because we accountedfor word position through a corresponding control featurewe avoided the possibility of sentence position having aneffect on the results (Bastiaansen et al 2010) Fourthwe did not employ isolated sentences but continuousstories so that information of integration occurred overtimescales exceeding a few seconds

Although our EEG recordings showed the corticaltracking of surprisal in different frequency bands theydid not allow us to precisely localize the sources of theactivity in the cortex Pairing EEG with fMRI or employingMEG may allow to add spatial information to the tempo-ral tracking that we have assessed here A recent fMRIstudy for instance found that the left inferior temporalsulcus the bilateral posterior superior temporal gyri andthe right amygdala responded to surprisal during naturallanguage comprehension whereas the left ventral pre-motor cortex and the left inferior parietal lobule re-sponded to entropy (Willems Frank Nijhof Hagoortamp van den Bosch 2015) Another recent MEG measure-ment of the brainrsquos natural speech processing found thatentropy and surprisal play a role in the assembly of pho-nemes into words and involve brain areas such as coreauditory cortex and the STS (Brodbeck et al 2018)Combining the temporal precision of EEG with the spa-tial precision of fMRI or harnessing the ability of MEG tolocate neural sources temporally and spatially will allowto further clarify the spatiotemporal mechanisms of nat-ural language comprehension in the brain

In summary we showed that neural responses to wordsurprisal can be measured from EEG responses to natu-ralistic stories Our results demonstrate that both theslow delta band as well as the power in higher frequencybands in particular the beta and gamma bands areshaped by surprisal Moreover we also showed that theneural response to surprisal is modulated by the preci-sion of a prediction In particular predictions made withhigh precision which lead to high surprisal modulategamma power in the left temporal and frontal scalp areasIn addition we also demonstrated that neural activity inthe delta theta and beta frequency bands is shaped bythe precision of word prediction directly These re-sponses arise at different latencies and at different scalpareas suggesting a rich spatiotemporal dynamics of neu-ral activity related to word prediction

Acknowledgments

This research was supported by Wellcome Trust grant 108295Z15Z by EPSRC grants EPM0267281 and EPR0326021 as wellas in part by the National Science Foundation under grant noNSF PHY-1125915

Reprint requests should be sent to Tobias ReichenbachDepartment of Bioengineering and Centre for NeurotechnologyImperial College London South Kensington Campus SW7 2AZLondon United Kingdom or via e-mail reichenbachimperialacuk

REFERENCES

Baggio G amp Hagoort P (2011) The balance betweenmemory and unification in semantics A dynamic accountof the N400 Language and Cognitive Processes 261338ndash1367

Bastiaansen M amp Hagoort P (2006) Oscillatory neuronaldynamics during language comprehension Progress in BrainResearch 159 179ndash196

Bastiaansen M Magyari L amp Hagoort P (2010) Syntacticunification operations are reflected in oscillatory dynamicsduring on-line sentence comprehension Journal ofCognitive Neuroscience 22 1333ndash1347

Bendixen A SanMiguel I amp Schroumlger E (2012) Earlyelectrophysiological indicators for predictive processing inaudition A review International Journal ofPsychophysiology 83 120ndash131

Bengio Y Ducharme R Vincent P amp Jauvin C (2003) Aneural probabilistic language model Journal of MachineLearning Research 3 1137ndash1155

Brennan J R amp Hale J T (2019) Hierarchical structure guidesrapid linguistic predictions during naturalistic listening PLoSOne 14 e0207741

Brodbeck C Presacco A amp Simon J Z (2018) Neural sourcedynamics of brain responses to continuous stimuli Speechprocessing from acoustics to comprehension Neuroimage172 162ndash174

Broderick M P Anderson A J Di Liberto G M Crosse M Jamp Lalor E C (2018) Electrophysiological correlates ofsemantic dissimilarity reflect the comprehension of naturalnarrative speech Current Biology 28 803ndash809

Brown P F Desouza P V Mercer R L Pietra V J D amp LaiJ C (1992) Class-based n-gram models of natural languageComputational Linguistics 18 467ndash479

Chatterjee S amp Hadi A S (2015) Regression analysis byexample Hoboken NJ Wiley

Collobert R Weston J Bottou L Karlen M KavukcuogluK amp Kuksa P (2011) Natural language processing (almost)from scratch Journal of Machine Learning Research 122493ndash2537

Davidson D J amp Indefrey P (2007) An inverse relationbetween event-related and timendashfrequency violationresponses in sentence processing Brain Reseach 115881ndash92

DeLong K A Quante L amp Kutas M (2014) Predictabilityplausibility and two late ERP positivities during writtensentence comprehension Neuropsychologia 61 150ndash162

Di Liberto G M OrsquoSullivan J A amp Lalor E C (2015) Low-frequency cortical entrainment to speech reflects phoneme-level processing Current Biology 25 2457ndash2465

Ding N Melloni L Zhang H Tian X amp Poeppel D (2016)Cortical tracking of hierarchical linguistic structures inconnected speech Nature Neuroscience 19 158ndash164

Ding N Pan X Luo C Su N Zhang W amp Zhang J (2018)Attention is required for knowledge-based sequential

164 Journal of Cognitive Neuroscience Volume 32 Number 1

grouping Insights from the integration of syllables intowords Journal of Neuroscience 38 1178ndash1188

Ding N amp Simon J Z (2012) Emergence of neural encodingof auditory objects while listening to competing speakersProceedings of the National Academy of Sciences USA109 11854ndash11859

Ding N amp Simon J Z (2014) Cortical entrainment tocontinuous speech Functional roles and interpretationsFrontiers in Human Neuroscience 8 311

Federmeier K D Wlotko E W De Ochoa-Dewald E ampKutas M (2007) Multiple effects of sentential constraint onword processing Brain Research 1146 75ndash84

Feldman H amp Friston K (2010) Attention uncertainty andfree-energy Frontiers in Human Neuroscience 4 215

Frank S L Otten L J Galli G amp Vigliocco G (2015) TheERP response to the amount of information conveyed bywords in sentences Brain and Language 140 1ndash11

Frank S L amp Willems R M (2017) Word predictability andsemantic similarity show distinct patterns of brain activityduring language comprehension Language Cognition andNeuroscience 32 1192ndash1203

Friederici A D (2002) Towards a neural basis of auditorysentence processing Trends in Cognitive Sciences 678ndash84

Friederici A D Pfeifer E amp Hahne A (1993) Event-relatedbrain potentials during natural speech processing Effects ofsemantic morphological and syntactic violations CognitiveBrain Research 1 183ndash192

Frisch R amp Waugh F V (1933) Partial time regressionsas compared with individual trends Econometrica 1387ndash401

Friston K (2010) The free-energy principle A unified braintheory Nature Reviews Neuroscience 11 127ndash138

Friston K amp Kiebel S (2009) Predictive coding under thefree-energy principle Philosophical Transactions of theRoyal Society of London Series B Biological Sciences364 121ndash1221

Giraud A-L amp Poeppel D (2012) Cortical oscillations andspeech processing Emerging computational principles andoperations Nature Neuroscience 15 511ndash517

Gorman K Howell J amp Wagner M (2011) Prosodylab-alignerA tool for forced alignment of laboratory speech Journal ofthe Canadian Acoustical Association 39 192ndash193

Graves A (2013) Generating sequences with recurrent neuralnetworks arXiv preprint arXiv13080850

Hagoort P amp Brown C M (2000) ERP effects of listening tospeech compared to reading The P600SPS to syntacticviolations in spoken sentences and rapid serial visualpresentation Neuropsychologia 38 1531ndash1549

Halgren E Dhond R P Christensen N Van Petten CMarinkovic K Lewine J D et al (2002) N400-likemagnetoencephalography responses modulated by semanticcontext word frequency and lexical class in sentencesNeuroimage 17 1101ndash1116

Heilbron M amp Chait M (2018) Great expectations Is thereevidence for predictive coding in auditory cortexNeuroscience 389 54ndash73

Helenius P Salmelin R Service E amp Connolly J F (1998)Distinct time courses of word and context comprehension inthe left temporal cortex Brain 121 1133ndash1142

Henderson J M Choi W Lowder M W amp Ferreira F(2016) Language structure in the brain A fixation-relatedfMRI study of syntactic surprisal in reading Neuroimage132 293ndash300

Humphries C Binder J R Medler D A amp Liebenthal E(2006) Syntactic and semantic modulation of neural activityduring auditory sentence comprehension Journal ofCognitive Neuroscience 18 665ndash679

Hyafil A Fontolan L Kabdebon C Gutkin B amp Giraud A-L(2015) Speech encoding by coupled cortical theta and gammaoscillations eLife 4 e06213

Kanai R Komura Y Shipp S amp Friston K (2015) Cerebralhierarchies Predictive processing precision and the pulvinarPhilophical Transancations of the Royal Society of LondonSeries B Biological Science 370 20140169

Keitel A Gross J amp Kayser C (2018) Perceptually relevantspeech tracking in auditory and motor cortex reflects distinctlinguistic features PLoS Biology 16 e2004473

Kielar A Meltzer J A Moreno S Alain C amp Bialystok E(2014) Oscillatory responses to semantic and syntacticviolations Journal of Cognitive Neuroscience 26 2840ndash2862

Klema V amp Laub A (1980) The singular value decompositionIts computation and some applications IEEE Transactionson Automatic Control 25 164ndash176

Koelsch S Vuust P amp Friston K (2018) Predictive processesand the peculiar case of music Trends in Cognitive Sciences23 63ndash77

Kumar T K (1975) Multicollinearity in regression analysisReview of Economics and Statistics 57 365ndash366

Kutas M amp Federmeier K D (2011) Thirty years andcounting Finding meaning in the N400 component of theevent-related brain potential (ERP) Annual Review ofPsychology 62 621ndash647

Kutas M amp Hillyard S A (1980) Reading senseless sentencesBrain potentials reflect semantic incongruity Science 207203ndash205

Kutas M amp Hillyard S A (1984) Brain potentials duringreading reflect word expectancy and semantic associationNature 307 161ndash163

Lakatos P Chen C M OrsquoConnell M N Mills A amp SchroederC E (2007) Neuronal oscillations and multisensoryinteraction in primary auditory cortex Neuron 53 279ndash292

Levy R (2008) Expectation-based syntactic comprehensionCognition 106 1126ndash1177

Lewis A G amp Bastiaansen M (2015) A predictive codingframework for rapid neural dynamics during sentence-levellanguage comprehension Cortex 68 155ndash168

Lovell M C (2008) A simple proof of the FWL theoremJournal of Economic Education 39 88ndash91

Maess B Herrmann C S Hahne A Nakamura A amp FriedericiA D (2006) Localizing the distributed language networkresponsible for the N400 measured by MEG during auditorysentence processing Brain Research 1096 163ndash172

Mahoney M (2011) About the test data Retrieved frommattmahoneynetdctextdatahtml

Mikolov T Kombrink S Burget L Černockyacute J amp Khudanpur S(2011) Extensions of recurrent neural network language modelPaper presented at the 2011 IEEE International Conference onAcoustics Speech and Signal Processing (ICASSP)

Miller G A Heise G A amp Lichten W (1951) Theintelligibility of speech as a function of the context of thetest materials Journal of Experimental Psychology 41329ndash335

Miller G A amp Isard S (1963) Some perceptual consequencesof linguistic rules Journal of Verbal Learning and VerbalBehavior 2 217ndash228

Molinaro N Barraza P amp Carreiras M (2013) Long-rangeneural synchronization supports fast and efficient readingEEG correlates of processing expected words in sentencesNeuroimage 72 120ndash132

Nieuwland M Barr D Bartolozzi F Busch-Moreno SDonaldson D Ferguson H J et al (2019) Dissociableeffects of prediction and integration during languagecomprehension Evidence from a large-scale study usingbrain potentials httpswwwbiorxivorgcontent101101267815v4

Weissbart Kandylaki and Reichenbach 165

Oostenveld R Fries P Maris E amp Schoffelen J-M (2011)FieldTrip Open source software for advanced analysis ofMEG EEG and invasive electrophysiological dataComputational Intelligence and Neuroscience 2011156869

Pascanu R Mikolov T amp Bengio Y (2013) On the difficultyof training recurrent neural networks Paper presented at the30th International Conference on International Conferenceon Machine Learning Atlanta GA

Patten W (1910) International short stories (Vol 2) AuroraIL PF Collier amp Son

Pennington J Socher R amp Manning C (2014) Glove Globalvectors for word representation Paper presented at theConference on Empirical Methods in Natural LanguageProcessing (EMNLP) Doha Qatar

Rommers J Dickson D S Norton J J Wlotko E W ampFedermeier K D (2017) Alpha and theta band dynamicsrelated to sentential constraint and word expectancyLanguage Cognition and Neuroscience 32 576ndash589

Roumlsler F Pechmann T Streb J Roumlder B amp HennighausenE (1998) Parsing of sentences in a language with varyingword order Word-by-word variations of processing demandsare revealed by event-related brain potentials Journal ofMemory and Language 38 150ndash176

Smith N J amp Levy R (2013) The effect of wordpredictability on reading time is logarithmic Cognition128 302ndash319

Steinhauer K amp Drury J E (2012) On the early left-anteriornegativity (ELAN) in syntax studies Brain and Language120 135ndash162

Tse C-Y Lee C-L Sullivan J Garnsey S M Dell G SFabiani M et al (2007) Imaging cortical dynamics oflanguage processing with the event-related optical signalProceedings of the National Academy of Sciences USA104 17157ndash17162

Van Den Brink D Brown C M amp Hagoort P (2001)Electrophysiological evidence for early contextual influencesduring spoken-word recognition N200 versus N400 effectsJournal of Cognitive Neuroscience 13 967ndash985

Van Petten C amp Luka B J (2006) Neural localization of semanticcontext effects in electromagnetic and hemodynamic studiesBrain and Language 97 279ndash293

Wang L Jensen O Van den Brink D Weder N SchoffelenJ M Magyari L et al (2012) Beta oscillations relate to theN400m during language comprehension Human BrainMapping 33 2898ndash2912

Wang L Zhu Z amp Bastiaansen M (2012) Integration orpredictability A further specification of the functional role ofgamma oscillations in language comprehension Frontiers inPsychology 3 187

Weiss S amp Mueller H M (2012) ldquoToo many betas do not spoilthe brothrdquo The role of beta brain oscillations in languageprocessing Frontiers in Psychology 3 201

Willems R M Frank S L Nijhof A D Hagoort P amp van denBosch A (2015) Prediction during natural languagecomprehension Cerebral Cortex 26 2506ndash2516

Zion Golumbic E M Ding N Bickel S Lakatos P SchevonC A McKhann G M et al (2013) Mechanisms underlyingselective neuronal tracking of attended speech at a ldquococktailpartyrdquo Neuron 77 980ndash991

166 Journal of Cognitive Neuroscience Volume 32 Number 1

This article has been cited by

1 Eleonora J Beier Suphasiree Chantavarin Gwendolyn Rehrig Fernanda Ferreira Lee M Miller 2021 Cortical Tracking ofSpeech Toward Collaboration between the Fields of Signal and Sentence Processing Journal of Cognitive Neuroscience 334574-593 [Abstract] [Full Text] [PDF] [PDF Plus]

2 Mikolaj Kegler Tobias Reichenbach 2021 Modelling the effects of transcranial alternating current stimulation on the neuralencoding of speech in noise NeuroImage 224 117427 [Crossref]

3 Christian Brodbeck Jonathan Z Simon 2020 Continuous speech processing Current Opinion in Physiology 18 25-31 [Crossref]4 Katerina D Kandylaki Sonja A Kotz 2020 Distinct cortical rhythms in speech and language processing and some more a

commentary on Meyer Sun amp Martin (2019) Language Cognition and Neuroscience 359 1124-1128 [Crossref]5 Mahmoud Keshavarzi Mikolaj Kegler Shabnam Kadir Tobias Reichenbach 2020 Transcranial alternating current stimulation

in the theta band but not in the delta band modulates the comprehension of naturalistic speech in noise NeuroImage 210 116557[Crossref]

Page 6: Cortical Tracking of Surprisal during Continuous Speech

However we changed the amplitude of the spikes for theother linguistic speech features by taking their valuesfrom an unrelated story that is a story that was notaligned with the EEG data To obtain a large number ofnull models we considered permutations of our 15 storyparts Through permutating entire story parts and not theorder of individual words the statistical relationship be-tween the linguistic features of successive words was con-served Because we kept the timing of the spikes in thenull model as in the actual stories the obtained nullmodel could only be used to determine the significanceof the neural responses to the linguistic features but notfor those to the acoustic word onset

The actual TRFs were then analyzed for statistical signif-icance through comparison to 1000 null models The com-parison was obtained from a permutation test togetherwith cluster-based correction for multiple comparison(Oostenveld Fries Maris amp Schoffelen 2011) where onlyclusters of at least four electrodes were kept Specificallywe used the function spatio_temporal_cluster_test fromthe MNE python library The statistic for each model coef-ficient at each electrode and each lag was computed usingthe empirical distribution formed by values from the nullmodels setting the threshold at the 99th percentile of thenull distribution The cluster-level p values were computedand we considered only clusters with a p value greater than0510WeherebyusedtheBonferronicorrectiontoaccountfor the 10 different tests that reflected the different fre-quency bands and the different linguistic features

Data Availability

The EEG data from all participants together with the cor-responding speech features are available on figsharecom (httpsdoiorg106084m9figshare9033983v1)An exemplary script for computing TRFs can be obtainedfrom figshare as well (httpsdoiorg106084m9figshare9034481v1)

RESULTS

Behavioral Assessment

We first assessed to what degree the participants under-stood the stories through asking them comprehensionquestions These questions were answered with an aver-age of 96 accuracy evidencing that the volunteers con-sistently understood the speech and paid attention

Cortical Tracking of Acoustic and LinguisticSpeech Features

The cortical tracking of the speech features can be foundin different frequency bands First because all four fea-tures relate to words the frequency range of the featuresis similar to the rate of words in speech The latter isabout 1ndash4 Hz and corresponds to the delta frequency

range Cortical activity at low frequencies including thedelta frequency band can therefore be evoked by orentrain to the rhythm set by the acoustic and linguisticword features Second the amplitude of the neural activ-ity in higher frequency bands can be modulated by thespeech features This may in particular occur for thetheta band (4ndash8 Hz) the alpha band (8ndash12 Hz) the betafrequency band (20ndash30 Hz) and the gamma frequencyband (30ndash100 Hz) the power of which can be modulatedby prediction in sentence comprehension (Wang et al2012 Weiss amp Mueller 2012 Bastiaansen et al 2010Bastiaansen amp Hagoort 2006)We started by quantifying the neural tracking of the

word features at low frequencies We found neural re-sponses to word frequency between delays of 300 and610 msec (Figure 2) The topographic plots of the re-sponses show large differences between the temporalscalp areas on the one hand and the parietal and occipitalareas on the other handImportantly we found significant responses to the

word surprisal around a delay of 450 msec (Figure 2)These responses emerged predominantly in the EEGchannels on the temporal and occipital scalp areas andwere lateralized on the left hemisphere Precision wastracked by cortical activity at delays of around 100 msecand around 500 msec Moreover we observed a signifi-cant neural response to the interaction of surprisal andprecision at an earlier latency of around 400 msec andat a longer latency of around 1000 msecWe also computed the modulation of the power in the

theta band the alpha band the beta band as well as thegamma band by the acoustic and linguistic features(Figures 4 and 5) Although the power in the alpha bandwas not significantly related to the linguistic features thepower in the theta band was shaped by word frequencyat delays of around 300 msec and around 1000 msec(Figure 3) Furthermore the power in the theta bandwas significantly decreased by precision at delays ofabout 700 msecThe power in the beta band correlated positively with

surprisal at delays of around 700 and 1000 msec (Figure 4)At the latter delay the influence of surprisal was strongestat the left temporal channels Moreover the power in thebeta band was modulated by precision at a delay of about700 msec with the main contributions coming from theoccipital channelsThe power in the gamma band was increased by words

with higher surprisal at long latency of around 1000 msecmainly for the left temporal channels (Figure 5) The in-teraction of surprisal and precision shaped the gammapower as well at the early delay of about 0 msec

DISCUSSION

We have shown that cortical activity tracks the surprisal ofwords in speech comprehension Such cortical tracking

160 Journal of Cognitive Neuroscience Volume 32 Number 1

has emerged at low frequencies that is within the deltaband that encompasses a similar frequency range as therate of words in speech Importantly we found that theneural activity in the faster theta beta and gamma fre-quency bands tracks surprisal as well These frequencybands have previously been suggested to be involved inthe bottomndashup and topndashdown propagation of predic-tions and prediction errors (Lewis amp Bastiaansen 2015)We have further demonstrated that the cortical track-

ing of word surprisal is modulated by precision Theinteraction between surprisal and precision leads to

responses both in the slow delta band as well as in thepower of the faster gamma band In particular word pre-dictions that are made with high precision but then leadto large surprisal cause an increased gamma power atzero lag However as opposed to a previous study onERPs we did not observe a significant effect in the thetaor alpha bands (Rommers et al 2017) This differencemay be due to our use of naturalistic stimuli and the inclu-sion of all words in the analysis whereas the previous studyused specialized sentences with final words that had eitherhigh or low surprisal and either high or low precision

Figure 2 TRFs for acousticand linguistic speech featuresThe TRFs for each electrode areshown in bold at time instanceswhere they are significantcompared with a null modelthat is based on shuffled dataEEG channels that yield asignificant response withina particular range of delayshighlighted in gray areindicated in white in thetopographic plots (A) Theresponses to the word onsetappear as insignificant due tothe construction of the nullmodel (B C) We obtainsignificant neural responsesto word frequency as well assurprisal for delays around400 msec (D) Significant neuralresponses to precision arisearound delays of 100 msec aswell as around 500 msec(E) The interaction betweensurprisal and precision leads toa neural response at a delay of400 msec as well as at a longdelay of 1000 msec

Weissbart Kandylaki and Reichenbach 161

The cortical tracking of surprisal may indicate predic-tive processing by the brain Predictive processing is aframework for perception in which it is assumed thatthe brain infers hypotheses about a sensory input by gen-erating predictions of its neural representations and thatthe hypotheses are constantly updated as new sensoryinformation becomes available (Kanai et al 2015Bendixen SanMiguel amp Schroumlger 2012 Friston 2010Friston amp Kiebel 2009) In particular the surprisal of aword reflects a prediction error a key quantity in theframework of predictive coding (Friston 2010) Howeverthe expectancy of a word based on previous words also

correlates with the plausibility of a word in a particular con-text (Nieuwland et al 2019 DeLong Quante amp Kutas2014) Further studies are therefore required to disentan-gle neural correlates of actual word prediction from thosethat do not require predictive processing such as wordplausibilityThe surprisal of a word can reflect both its semantic as

well as syntactic information and previous investigationsinto the neurobiological mechanisms of languagecomprehension have manipulated both independently(Henderson Choi Lowder amp Ferreira 2016 HumphriesBinder Medler amp Liebenthal 2006) In contrast our

Figure 3 Neural responsesin the theta frequency band (A)Word frequency is positivelycorrelated to theta power at adelay of 300 msec and isnegatively correlated at a delayof 1000 msec (B) Words thatcan be predicted with higherprecision lead to an increasedtheta power at 150 msec and adecreased theta power at alatency of 700 msec

Figure 4 Neural responses inthe beta frequency band (A)There are significant neuralresponses to surprisalemerging at delays of 700 and1000 msec (B) Precision causesan increased power in the betaband activity around a delay of700 msec

162 Journal of Cognitive Neuroscience Volume 32 Number 1

approach has taken a naturalistic and holistic approach tosurprisal we employed natural speech without manipula-tions combined with statistical learning of a rich varietyof natural language cues through a recurrent neural net-work Because the neural network infers both syntacticrules as well as semantic information from the training ofthe speech material the reported neural response to wordsurprisal can reflect both semantic as well as syntactic infor-mation (Collobert et al 2011)It is instructive to compare the reported neural re-

sponses to surprisal to the well-characterized event-related responses that can be elicited by violations ofsemantics syntax or morphology in sentences In partic-ular semantic violations can cause the N400 response anegativity at 200ndash500 msec at the central and parietalscalp area (Kutas amp Federmeier 2011 Kutas amp Hillyard1980) Syntactic anomalies due to ungrammaticality ortemporary misanalysis elicit the P600 a broad positivepotential that is located at the posterior scalp area andarises around 600 msec after the anomaly (Hagoortamp Brown 2000 Friederici Pfeifer amp Hahne 1993)More specific syntactic anomalies can lead to negativepotentials that occur anteriorly and that can be left later-alized either occurring at 300ndash500 msec ((L)AN) orearlier at 125ndash150 msec (ELAN Steinhauer amp Drury2012 Friederici 2002 Van Den Brink Brown ampHagoort 2001 Roumlsler Pechmann Streb Roumlder ampHennighausen 1998)These ERPs do presumably not reflect the activation of

single static neural sources but rather waves of neuralactivity that propagate in time across different brainareas (Kutas amp Federmeier 2011 Tse et al 2007 MaessHerrmann Hahne Nakamura amp Friederici 2006) In thecase of the N400 for instance this wave of activity startsat about 250 msec in the left superior temporal gyrus

and then propagates to the left temporal lobe by 365 msecas well as to both frontal lobes by 500 msec (Van Petten ampLuka 2006 Halgren et al 2002 Helenius SalmelinService amp Connolly 1998) A recent theory suggests thatthis wave of activity reflects reverberating activity withinthe inferior middle and superior temporal gyri thatcorresponds to the activation of lexical informationthe formation of context and the unification of an up-coming word with the context (Baggio amp Hagoort 2011)

The spatiotemporal characteristics of the responses tosurprisal that we have measured here share certain sim-ilarities with these ERPs In particular we have foundneural responses to surprisal at latencies between 300and 600 msec These responses show a central-parietalnegativity that is reminiscent of the N400 Howeverother features of the neural responses that we describehere appear distinct from these ERPs The neural re-sponse to surprisal in the delta band at the latency of600 msec does for instance not display the posteriorpositivity of the P600 Moreover we have identified lateresponses around 700 and 1000 msec We have alsoshown that neural responses to surprisal arise in variousfrequency bands beyond the delta band that matters forthe ERPs However a further comparison of the neuralresponse to surprisal to the related ERPs is hindered bythe lack of spatial resolution offered by EEG recordingsFuture neuroimaging studies using intracranial record-ings or magnetoencephalography (MEG) may localizethe sources of the neural response to surprisal thatwe have measured here and quantify potential sharedsources with the ERPs

The difference of the cortical tracking of surprisal tothe well-known neural correlates of semantic syntacticor morphological anomalies and in particular the late re-sponses at a delay of around 1 sec may come as a result

Figure 5 Tracking of surprisalby gamma band activity (A) Thegamma activity is decreased ataround 1000 msec mostly inthe left temporal and frontalscalp areas (B) The interactionbetween precision and surprisalleads to a modulation of thegamma power at the latency ofaround 0 msec This modulationoccurs predominantly for lefttemporal and frontal channelsas well

Weissbart Kandylaki and Reichenbach 163

of our use of natural speech that differs from the artifi-cially constructed and tightly controlled stimuli used tomeasure ERPs First in our experiment the participants en-countered no violations of semantics syntax and morphol-ogy but instead heard naturalistic speech within which thewords occurred in context Second our stimuli did not con-tain artificial manipulations of word surprisal or precisionInstead of altering the stimuli we focused on quantifyingsurprisal and precision as they varied naturally in the pre-sented stories Third we assessed the responses to surprisaland precision at each word in the story and hence for wordsin every sentence position rather than for words at a partic-ular position within each sentence Because we accountedfor word position through a corresponding control featurewe avoided the possibility of sentence position having aneffect on the results (Bastiaansen et al 2010) Fourthwe did not employ isolated sentences but continuousstories so that information of integration occurred overtimescales exceeding a few seconds

Although our EEG recordings showed the corticaltracking of surprisal in different frequency bands theydid not allow us to precisely localize the sources of theactivity in the cortex Pairing EEG with fMRI or employingMEG may allow to add spatial information to the tempo-ral tracking that we have assessed here A recent fMRIstudy for instance found that the left inferior temporalsulcus the bilateral posterior superior temporal gyri andthe right amygdala responded to surprisal during naturallanguage comprehension whereas the left ventral pre-motor cortex and the left inferior parietal lobule re-sponded to entropy (Willems Frank Nijhof Hagoortamp van den Bosch 2015) Another recent MEG measure-ment of the brainrsquos natural speech processing found thatentropy and surprisal play a role in the assembly of pho-nemes into words and involve brain areas such as coreauditory cortex and the STS (Brodbeck et al 2018)Combining the temporal precision of EEG with the spa-tial precision of fMRI or harnessing the ability of MEG tolocate neural sources temporally and spatially will allowto further clarify the spatiotemporal mechanisms of nat-ural language comprehension in the brain

In summary we showed that neural responses to wordsurprisal can be measured from EEG responses to natu-ralistic stories Our results demonstrate that both theslow delta band as well as the power in higher frequencybands in particular the beta and gamma bands areshaped by surprisal Moreover we also showed that theneural response to surprisal is modulated by the preci-sion of a prediction In particular predictions made withhigh precision which lead to high surprisal modulategamma power in the left temporal and frontal scalp areasIn addition we also demonstrated that neural activity inthe delta theta and beta frequency bands is shaped bythe precision of word prediction directly These re-sponses arise at different latencies and at different scalpareas suggesting a rich spatiotemporal dynamics of neu-ral activity related to word prediction

Acknowledgments

This research was supported by Wellcome Trust grant 108295Z15Z by EPSRC grants EPM0267281 and EPR0326021 as wellas in part by the National Science Foundation under grant noNSF PHY-1125915

Reprint requests should be sent to Tobias ReichenbachDepartment of Bioengineering and Centre for NeurotechnologyImperial College London South Kensington Campus SW7 2AZLondon United Kingdom or via e-mail reichenbachimperialacuk

REFERENCES

Baggio G amp Hagoort P (2011) The balance betweenmemory and unification in semantics A dynamic accountof the N400 Language and Cognitive Processes 261338ndash1367

Bastiaansen M amp Hagoort P (2006) Oscillatory neuronaldynamics during language comprehension Progress in BrainResearch 159 179ndash196

Bastiaansen M Magyari L amp Hagoort P (2010) Syntacticunification operations are reflected in oscillatory dynamicsduring on-line sentence comprehension Journal ofCognitive Neuroscience 22 1333ndash1347

Bendixen A SanMiguel I amp Schroumlger E (2012) Earlyelectrophysiological indicators for predictive processing inaudition A review International Journal ofPsychophysiology 83 120ndash131

Bengio Y Ducharme R Vincent P amp Jauvin C (2003) Aneural probabilistic language model Journal of MachineLearning Research 3 1137ndash1155

Brennan J R amp Hale J T (2019) Hierarchical structure guidesrapid linguistic predictions during naturalistic listening PLoSOne 14 e0207741

Brodbeck C Presacco A amp Simon J Z (2018) Neural sourcedynamics of brain responses to continuous stimuli Speechprocessing from acoustics to comprehension Neuroimage172 162ndash174

Broderick M P Anderson A J Di Liberto G M Crosse M Jamp Lalor E C (2018) Electrophysiological correlates ofsemantic dissimilarity reflect the comprehension of naturalnarrative speech Current Biology 28 803ndash809

Brown P F Desouza P V Mercer R L Pietra V J D amp LaiJ C (1992) Class-based n-gram models of natural languageComputational Linguistics 18 467ndash479

Chatterjee S amp Hadi A S (2015) Regression analysis byexample Hoboken NJ Wiley

Collobert R Weston J Bottou L Karlen M KavukcuogluK amp Kuksa P (2011) Natural language processing (almost)from scratch Journal of Machine Learning Research 122493ndash2537

Davidson D J amp Indefrey P (2007) An inverse relationbetween event-related and timendashfrequency violationresponses in sentence processing Brain Reseach 115881ndash92

DeLong K A Quante L amp Kutas M (2014) Predictabilityplausibility and two late ERP positivities during writtensentence comprehension Neuropsychologia 61 150ndash162

Di Liberto G M OrsquoSullivan J A amp Lalor E C (2015) Low-frequency cortical entrainment to speech reflects phoneme-level processing Current Biology 25 2457ndash2465

Ding N Melloni L Zhang H Tian X amp Poeppel D (2016)Cortical tracking of hierarchical linguistic structures inconnected speech Nature Neuroscience 19 158ndash164

Ding N Pan X Luo C Su N Zhang W amp Zhang J (2018)Attention is required for knowledge-based sequential

164 Journal of Cognitive Neuroscience Volume 32 Number 1

grouping Insights from the integration of syllables intowords Journal of Neuroscience 38 1178ndash1188

Ding N amp Simon J Z (2012) Emergence of neural encodingof auditory objects while listening to competing speakersProceedings of the National Academy of Sciences USA109 11854ndash11859

Ding N amp Simon J Z (2014) Cortical entrainment tocontinuous speech Functional roles and interpretationsFrontiers in Human Neuroscience 8 311

Federmeier K D Wlotko E W De Ochoa-Dewald E ampKutas M (2007) Multiple effects of sentential constraint onword processing Brain Research 1146 75ndash84

Feldman H amp Friston K (2010) Attention uncertainty andfree-energy Frontiers in Human Neuroscience 4 215

Frank S L Otten L J Galli G amp Vigliocco G (2015) TheERP response to the amount of information conveyed bywords in sentences Brain and Language 140 1ndash11

Frank S L amp Willems R M (2017) Word predictability andsemantic similarity show distinct patterns of brain activityduring language comprehension Language Cognition andNeuroscience 32 1192ndash1203

Friederici A D (2002) Towards a neural basis of auditorysentence processing Trends in Cognitive Sciences 678ndash84

Friederici A D Pfeifer E amp Hahne A (1993) Event-relatedbrain potentials during natural speech processing Effects ofsemantic morphological and syntactic violations CognitiveBrain Research 1 183ndash192

Frisch R amp Waugh F V (1933) Partial time regressionsas compared with individual trends Econometrica 1387ndash401

Friston K (2010) The free-energy principle A unified braintheory Nature Reviews Neuroscience 11 127ndash138

Friston K amp Kiebel S (2009) Predictive coding under thefree-energy principle Philosophical Transactions of theRoyal Society of London Series B Biological Sciences364 121ndash1221

Giraud A-L amp Poeppel D (2012) Cortical oscillations andspeech processing Emerging computational principles andoperations Nature Neuroscience 15 511ndash517

Gorman K Howell J amp Wagner M (2011) Prosodylab-alignerA tool for forced alignment of laboratory speech Journal ofthe Canadian Acoustical Association 39 192ndash193

Graves A (2013) Generating sequences with recurrent neuralnetworks arXiv preprint arXiv13080850

Hagoort P amp Brown C M (2000) ERP effects of listening tospeech compared to reading The P600SPS to syntacticviolations in spoken sentences and rapid serial visualpresentation Neuropsychologia 38 1531ndash1549

Halgren E Dhond R P Christensen N Van Petten CMarinkovic K Lewine J D et al (2002) N400-likemagnetoencephalography responses modulated by semanticcontext word frequency and lexical class in sentencesNeuroimage 17 1101ndash1116

Heilbron M amp Chait M (2018) Great expectations Is thereevidence for predictive coding in auditory cortexNeuroscience 389 54ndash73

Helenius P Salmelin R Service E amp Connolly J F (1998)Distinct time courses of word and context comprehension inthe left temporal cortex Brain 121 1133ndash1142

Henderson J M Choi W Lowder M W amp Ferreira F(2016) Language structure in the brain A fixation-relatedfMRI study of syntactic surprisal in reading Neuroimage132 293ndash300

Humphries C Binder J R Medler D A amp Liebenthal E(2006) Syntactic and semantic modulation of neural activityduring auditory sentence comprehension Journal ofCognitive Neuroscience 18 665ndash679

Hyafil A Fontolan L Kabdebon C Gutkin B amp Giraud A-L(2015) Speech encoding by coupled cortical theta and gammaoscillations eLife 4 e06213

Kanai R Komura Y Shipp S amp Friston K (2015) Cerebralhierarchies Predictive processing precision and the pulvinarPhilophical Transancations of the Royal Society of LondonSeries B Biological Science 370 20140169

Keitel A Gross J amp Kayser C (2018) Perceptually relevantspeech tracking in auditory and motor cortex reflects distinctlinguistic features PLoS Biology 16 e2004473

Kielar A Meltzer J A Moreno S Alain C amp Bialystok E(2014) Oscillatory responses to semantic and syntacticviolations Journal of Cognitive Neuroscience 26 2840ndash2862

Klema V amp Laub A (1980) The singular value decompositionIts computation and some applications IEEE Transactionson Automatic Control 25 164ndash176

Koelsch S Vuust P amp Friston K (2018) Predictive processesand the peculiar case of music Trends in Cognitive Sciences23 63ndash77

Kumar T K (1975) Multicollinearity in regression analysisReview of Economics and Statistics 57 365ndash366

Kutas M amp Federmeier K D (2011) Thirty years andcounting Finding meaning in the N400 component of theevent-related brain potential (ERP) Annual Review ofPsychology 62 621ndash647

Kutas M amp Hillyard S A (1980) Reading senseless sentencesBrain potentials reflect semantic incongruity Science 207203ndash205

Kutas M amp Hillyard S A (1984) Brain potentials duringreading reflect word expectancy and semantic associationNature 307 161ndash163

Lakatos P Chen C M OrsquoConnell M N Mills A amp SchroederC E (2007) Neuronal oscillations and multisensoryinteraction in primary auditory cortex Neuron 53 279ndash292

Levy R (2008) Expectation-based syntactic comprehensionCognition 106 1126ndash1177

Lewis A G amp Bastiaansen M (2015) A predictive codingframework for rapid neural dynamics during sentence-levellanguage comprehension Cortex 68 155ndash168

Lovell M C (2008) A simple proof of the FWL theoremJournal of Economic Education 39 88ndash91

Maess B Herrmann C S Hahne A Nakamura A amp FriedericiA D (2006) Localizing the distributed language networkresponsible for the N400 measured by MEG during auditorysentence processing Brain Research 1096 163ndash172

Mahoney M (2011) About the test data Retrieved frommattmahoneynetdctextdatahtml

Mikolov T Kombrink S Burget L Černockyacute J amp Khudanpur S(2011) Extensions of recurrent neural network language modelPaper presented at the 2011 IEEE International Conference onAcoustics Speech and Signal Processing (ICASSP)

Miller G A Heise G A amp Lichten W (1951) Theintelligibility of speech as a function of the context of thetest materials Journal of Experimental Psychology 41329ndash335

Miller G A amp Isard S (1963) Some perceptual consequencesof linguistic rules Journal of Verbal Learning and VerbalBehavior 2 217ndash228

Molinaro N Barraza P amp Carreiras M (2013) Long-rangeneural synchronization supports fast and efficient readingEEG correlates of processing expected words in sentencesNeuroimage 72 120ndash132

Nieuwland M Barr D Bartolozzi F Busch-Moreno SDonaldson D Ferguson H J et al (2019) Dissociableeffects of prediction and integration during languagecomprehension Evidence from a large-scale study usingbrain potentials httpswwwbiorxivorgcontent101101267815v4

Weissbart Kandylaki and Reichenbach 165

Oostenveld R Fries P Maris E amp Schoffelen J-M (2011)FieldTrip Open source software for advanced analysis ofMEG EEG and invasive electrophysiological dataComputational Intelligence and Neuroscience 2011156869

Pascanu R Mikolov T amp Bengio Y (2013) On the difficultyof training recurrent neural networks Paper presented at the30th International Conference on International Conferenceon Machine Learning Atlanta GA

Patten W (1910) International short stories (Vol 2) AuroraIL PF Collier amp Son

Pennington J Socher R amp Manning C (2014) Glove Globalvectors for word representation Paper presented at theConference on Empirical Methods in Natural LanguageProcessing (EMNLP) Doha Qatar

Rommers J Dickson D S Norton J J Wlotko E W ampFedermeier K D (2017) Alpha and theta band dynamicsrelated to sentential constraint and word expectancyLanguage Cognition and Neuroscience 32 576ndash589

Roumlsler F Pechmann T Streb J Roumlder B amp HennighausenE (1998) Parsing of sentences in a language with varyingword order Word-by-word variations of processing demandsare revealed by event-related brain potentials Journal ofMemory and Language 38 150ndash176

Smith N J amp Levy R (2013) The effect of wordpredictability on reading time is logarithmic Cognition128 302ndash319

Steinhauer K amp Drury J E (2012) On the early left-anteriornegativity (ELAN) in syntax studies Brain and Language120 135ndash162

Tse C-Y Lee C-L Sullivan J Garnsey S M Dell G SFabiani M et al (2007) Imaging cortical dynamics oflanguage processing with the event-related optical signalProceedings of the National Academy of Sciences USA104 17157ndash17162

Van Den Brink D Brown C M amp Hagoort P (2001)Electrophysiological evidence for early contextual influencesduring spoken-word recognition N200 versus N400 effectsJournal of Cognitive Neuroscience 13 967ndash985

Van Petten C amp Luka B J (2006) Neural localization of semanticcontext effects in electromagnetic and hemodynamic studiesBrain and Language 97 279ndash293

Wang L Jensen O Van den Brink D Weder N SchoffelenJ M Magyari L et al (2012) Beta oscillations relate to theN400m during language comprehension Human BrainMapping 33 2898ndash2912

Wang L Zhu Z amp Bastiaansen M (2012) Integration orpredictability A further specification of the functional role ofgamma oscillations in language comprehension Frontiers inPsychology 3 187

Weiss S amp Mueller H M (2012) ldquoToo many betas do not spoilthe brothrdquo The role of beta brain oscillations in languageprocessing Frontiers in Psychology 3 201

Willems R M Frank S L Nijhof A D Hagoort P amp van denBosch A (2015) Prediction during natural languagecomprehension Cerebral Cortex 26 2506ndash2516

Zion Golumbic E M Ding N Bickel S Lakatos P SchevonC A McKhann G M et al (2013) Mechanisms underlyingselective neuronal tracking of attended speech at a ldquococktailpartyrdquo Neuron 77 980ndash991

166 Journal of Cognitive Neuroscience Volume 32 Number 1

This article has been cited by

1 Eleonora J Beier Suphasiree Chantavarin Gwendolyn Rehrig Fernanda Ferreira Lee M Miller 2021 Cortical Tracking ofSpeech Toward Collaboration between the Fields of Signal and Sentence Processing Journal of Cognitive Neuroscience 334574-593 [Abstract] [Full Text] [PDF] [PDF Plus]

2 Mikolaj Kegler Tobias Reichenbach 2021 Modelling the effects of transcranial alternating current stimulation on the neuralencoding of speech in noise NeuroImage 224 117427 [Crossref]

3 Christian Brodbeck Jonathan Z Simon 2020 Continuous speech processing Current Opinion in Physiology 18 25-31 [Crossref]4 Katerina D Kandylaki Sonja A Kotz 2020 Distinct cortical rhythms in speech and language processing and some more a

commentary on Meyer Sun amp Martin (2019) Language Cognition and Neuroscience 359 1124-1128 [Crossref]5 Mahmoud Keshavarzi Mikolaj Kegler Shabnam Kadir Tobias Reichenbach 2020 Transcranial alternating current stimulation

in the theta band but not in the delta band modulates the comprehension of naturalistic speech in noise NeuroImage 210 116557[Crossref]

Page 7: Cortical Tracking of Surprisal during Continuous Speech

has emerged at low frequencies that is within the deltaband that encompasses a similar frequency range as therate of words in speech Importantly we found that theneural activity in the faster theta beta and gamma fre-quency bands tracks surprisal as well These frequencybands have previously been suggested to be involved inthe bottomndashup and topndashdown propagation of predic-tions and prediction errors (Lewis amp Bastiaansen 2015)We have further demonstrated that the cortical track-

ing of word surprisal is modulated by precision Theinteraction between surprisal and precision leads to

responses both in the slow delta band as well as in thepower of the faster gamma band In particular word pre-dictions that are made with high precision but then leadto large surprisal cause an increased gamma power atzero lag However as opposed to a previous study onERPs we did not observe a significant effect in the thetaor alpha bands (Rommers et al 2017) This differencemay be due to our use of naturalistic stimuli and the inclu-sion of all words in the analysis whereas the previous studyused specialized sentences with final words that had eitherhigh or low surprisal and either high or low precision

Figure 2 TRFs for acousticand linguistic speech featuresThe TRFs for each electrode areshown in bold at time instanceswhere they are significantcompared with a null modelthat is based on shuffled dataEEG channels that yield asignificant response withina particular range of delayshighlighted in gray areindicated in white in thetopographic plots (A) Theresponses to the word onsetappear as insignificant due tothe construction of the nullmodel (B C) We obtainsignificant neural responsesto word frequency as well assurprisal for delays around400 msec (D) Significant neuralresponses to precision arisearound delays of 100 msec aswell as around 500 msec(E) The interaction betweensurprisal and precision leads toa neural response at a delay of400 msec as well as at a longdelay of 1000 msec

Weissbart Kandylaki and Reichenbach 161

The cortical tracking of surprisal may indicate predic-tive processing by the brain Predictive processing is aframework for perception in which it is assumed thatthe brain infers hypotheses about a sensory input by gen-erating predictions of its neural representations and thatthe hypotheses are constantly updated as new sensoryinformation becomes available (Kanai et al 2015Bendixen SanMiguel amp Schroumlger 2012 Friston 2010Friston amp Kiebel 2009) In particular the surprisal of aword reflects a prediction error a key quantity in theframework of predictive coding (Friston 2010) Howeverthe expectancy of a word based on previous words also

correlates with the plausibility of a word in a particular con-text (Nieuwland et al 2019 DeLong Quante amp Kutas2014) Further studies are therefore required to disentan-gle neural correlates of actual word prediction from thosethat do not require predictive processing such as wordplausibilityThe surprisal of a word can reflect both its semantic as

well as syntactic information and previous investigationsinto the neurobiological mechanisms of languagecomprehension have manipulated both independently(Henderson Choi Lowder amp Ferreira 2016 HumphriesBinder Medler amp Liebenthal 2006) In contrast our

Figure 3 Neural responsesin the theta frequency band (A)Word frequency is positivelycorrelated to theta power at adelay of 300 msec and isnegatively correlated at a delayof 1000 msec (B) Words thatcan be predicted with higherprecision lead to an increasedtheta power at 150 msec and adecreased theta power at alatency of 700 msec

Figure 4 Neural responses inthe beta frequency band (A)There are significant neuralresponses to surprisalemerging at delays of 700 and1000 msec (B) Precision causesan increased power in the betaband activity around a delay of700 msec

162 Journal of Cognitive Neuroscience Volume 32 Number 1

approach has taken a naturalistic and holistic approach tosurprisal we employed natural speech without manipula-tions combined with statistical learning of a rich varietyof natural language cues through a recurrent neural net-work Because the neural network infers both syntacticrules as well as semantic information from the training ofthe speech material the reported neural response to wordsurprisal can reflect both semantic as well as syntactic infor-mation (Collobert et al 2011)It is instructive to compare the reported neural re-

sponses to surprisal to the well-characterized event-related responses that can be elicited by violations ofsemantics syntax or morphology in sentences In partic-ular semantic violations can cause the N400 response anegativity at 200ndash500 msec at the central and parietalscalp area (Kutas amp Federmeier 2011 Kutas amp Hillyard1980) Syntactic anomalies due to ungrammaticality ortemporary misanalysis elicit the P600 a broad positivepotential that is located at the posterior scalp area andarises around 600 msec after the anomaly (Hagoortamp Brown 2000 Friederici Pfeifer amp Hahne 1993)More specific syntactic anomalies can lead to negativepotentials that occur anteriorly and that can be left later-alized either occurring at 300ndash500 msec ((L)AN) orearlier at 125ndash150 msec (ELAN Steinhauer amp Drury2012 Friederici 2002 Van Den Brink Brown ampHagoort 2001 Roumlsler Pechmann Streb Roumlder ampHennighausen 1998)These ERPs do presumably not reflect the activation of

single static neural sources but rather waves of neuralactivity that propagate in time across different brainareas (Kutas amp Federmeier 2011 Tse et al 2007 MaessHerrmann Hahne Nakamura amp Friederici 2006) In thecase of the N400 for instance this wave of activity startsat about 250 msec in the left superior temporal gyrus

and then propagates to the left temporal lobe by 365 msecas well as to both frontal lobes by 500 msec (Van Petten ampLuka 2006 Halgren et al 2002 Helenius SalmelinService amp Connolly 1998) A recent theory suggests thatthis wave of activity reflects reverberating activity withinthe inferior middle and superior temporal gyri thatcorresponds to the activation of lexical informationthe formation of context and the unification of an up-coming word with the context (Baggio amp Hagoort 2011)

The spatiotemporal characteristics of the responses tosurprisal that we have measured here share certain sim-ilarities with these ERPs In particular we have foundneural responses to surprisal at latencies between 300and 600 msec These responses show a central-parietalnegativity that is reminiscent of the N400 Howeverother features of the neural responses that we describehere appear distinct from these ERPs The neural re-sponse to surprisal in the delta band at the latency of600 msec does for instance not display the posteriorpositivity of the P600 Moreover we have identified lateresponses around 700 and 1000 msec We have alsoshown that neural responses to surprisal arise in variousfrequency bands beyond the delta band that matters forthe ERPs However a further comparison of the neuralresponse to surprisal to the related ERPs is hindered bythe lack of spatial resolution offered by EEG recordingsFuture neuroimaging studies using intracranial record-ings or magnetoencephalography (MEG) may localizethe sources of the neural response to surprisal thatwe have measured here and quantify potential sharedsources with the ERPs

The difference of the cortical tracking of surprisal tothe well-known neural correlates of semantic syntacticor morphological anomalies and in particular the late re-sponses at a delay of around 1 sec may come as a result

Figure 5 Tracking of surprisalby gamma band activity (A) Thegamma activity is decreased ataround 1000 msec mostly inthe left temporal and frontalscalp areas (B) The interactionbetween precision and surprisalleads to a modulation of thegamma power at the latency ofaround 0 msec This modulationoccurs predominantly for lefttemporal and frontal channelsas well

Weissbart Kandylaki and Reichenbach 163

of our use of natural speech that differs from the artifi-cially constructed and tightly controlled stimuli used tomeasure ERPs First in our experiment the participants en-countered no violations of semantics syntax and morphol-ogy but instead heard naturalistic speech within which thewords occurred in context Second our stimuli did not con-tain artificial manipulations of word surprisal or precisionInstead of altering the stimuli we focused on quantifyingsurprisal and precision as they varied naturally in the pre-sented stories Third we assessed the responses to surprisaland precision at each word in the story and hence for wordsin every sentence position rather than for words at a partic-ular position within each sentence Because we accountedfor word position through a corresponding control featurewe avoided the possibility of sentence position having aneffect on the results (Bastiaansen et al 2010) Fourthwe did not employ isolated sentences but continuousstories so that information of integration occurred overtimescales exceeding a few seconds

Although our EEG recordings showed the corticaltracking of surprisal in different frequency bands theydid not allow us to precisely localize the sources of theactivity in the cortex Pairing EEG with fMRI or employingMEG may allow to add spatial information to the tempo-ral tracking that we have assessed here A recent fMRIstudy for instance found that the left inferior temporalsulcus the bilateral posterior superior temporal gyri andthe right amygdala responded to surprisal during naturallanguage comprehension whereas the left ventral pre-motor cortex and the left inferior parietal lobule re-sponded to entropy (Willems Frank Nijhof Hagoortamp van den Bosch 2015) Another recent MEG measure-ment of the brainrsquos natural speech processing found thatentropy and surprisal play a role in the assembly of pho-nemes into words and involve brain areas such as coreauditory cortex and the STS (Brodbeck et al 2018)Combining the temporal precision of EEG with the spa-tial precision of fMRI or harnessing the ability of MEG tolocate neural sources temporally and spatially will allowto further clarify the spatiotemporal mechanisms of nat-ural language comprehension in the brain

In summary we showed that neural responses to wordsurprisal can be measured from EEG responses to natu-ralistic stories Our results demonstrate that both theslow delta band as well as the power in higher frequencybands in particular the beta and gamma bands areshaped by surprisal Moreover we also showed that theneural response to surprisal is modulated by the preci-sion of a prediction In particular predictions made withhigh precision which lead to high surprisal modulategamma power in the left temporal and frontal scalp areasIn addition we also demonstrated that neural activity inthe delta theta and beta frequency bands is shaped bythe precision of word prediction directly These re-sponses arise at different latencies and at different scalpareas suggesting a rich spatiotemporal dynamics of neu-ral activity related to word prediction

Acknowledgments

This research was supported by Wellcome Trust grant 108295Z15Z by EPSRC grants EPM0267281 and EPR0326021 as wellas in part by the National Science Foundation under grant noNSF PHY-1125915

Reprint requests should be sent to Tobias ReichenbachDepartment of Bioengineering and Centre for NeurotechnologyImperial College London South Kensington Campus SW7 2AZLondon United Kingdom or via e-mail reichenbachimperialacuk

REFERENCES

Baggio G amp Hagoort P (2011) The balance betweenmemory and unification in semantics A dynamic accountof the N400 Language and Cognitive Processes 261338ndash1367

Bastiaansen M amp Hagoort P (2006) Oscillatory neuronaldynamics during language comprehension Progress in BrainResearch 159 179ndash196

Bastiaansen M Magyari L amp Hagoort P (2010) Syntacticunification operations are reflected in oscillatory dynamicsduring on-line sentence comprehension Journal ofCognitive Neuroscience 22 1333ndash1347

Bendixen A SanMiguel I amp Schroumlger E (2012) Earlyelectrophysiological indicators for predictive processing inaudition A review International Journal ofPsychophysiology 83 120ndash131

Bengio Y Ducharme R Vincent P amp Jauvin C (2003) Aneural probabilistic language model Journal of MachineLearning Research 3 1137ndash1155

Brennan J R amp Hale J T (2019) Hierarchical structure guidesrapid linguistic predictions during naturalistic listening PLoSOne 14 e0207741

Brodbeck C Presacco A amp Simon J Z (2018) Neural sourcedynamics of brain responses to continuous stimuli Speechprocessing from acoustics to comprehension Neuroimage172 162ndash174

Broderick M P Anderson A J Di Liberto G M Crosse M Jamp Lalor E C (2018) Electrophysiological correlates ofsemantic dissimilarity reflect the comprehension of naturalnarrative speech Current Biology 28 803ndash809

Brown P F Desouza P V Mercer R L Pietra V J D amp LaiJ C (1992) Class-based n-gram models of natural languageComputational Linguistics 18 467ndash479

Chatterjee S amp Hadi A S (2015) Regression analysis byexample Hoboken NJ Wiley

Collobert R Weston J Bottou L Karlen M KavukcuogluK amp Kuksa P (2011) Natural language processing (almost)from scratch Journal of Machine Learning Research 122493ndash2537

Davidson D J amp Indefrey P (2007) An inverse relationbetween event-related and timendashfrequency violationresponses in sentence processing Brain Reseach 115881ndash92

DeLong K A Quante L amp Kutas M (2014) Predictabilityplausibility and two late ERP positivities during writtensentence comprehension Neuropsychologia 61 150ndash162

Di Liberto G M OrsquoSullivan J A amp Lalor E C (2015) Low-frequency cortical entrainment to speech reflects phoneme-level processing Current Biology 25 2457ndash2465

Ding N Melloni L Zhang H Tian X amp Poeppel D (2016)Cortical tracking of hierarchical linguistic structures inconnected speech Nature Neuroscience 19 158ndash164

Ding N Pan X Luo C Su N Zhang W amp Zhang J (2018)Attention is required for knowledge-based sequential

164 Journal of Cognitive Neuroscience Volume 32 Number 1

grouping Insights from the integration of syllables intowords Journal of Neuroscience 38 1178ndash1188

Ding N amp Simon J Z (2012) Emergence of neural encodingof auditory objects while listening to competing speakersProceedings of the National Academy of Sciences USA109 11854ndash11859

Ding N amp Simon J Z (2014) Cortical entrainment tocontinuous speech Functional roles and interpretationsFrontiers in Human Neuroscience 8 311

Federmeier K D Wlotko E W De Ochoa-Dewald E ampKutas M (2007) Multiple effects of sentential constraint onword processing Brain Research 1146 75ndash84

Feldman H amp Friston K (2010) Attention uncertainty andfree-energy Frontiers in Human Neuroscience 4 215

Frank S L Otten L J Galli G amp Vigliocco G (2015) TheERP response to the amount of information conveyed bywords in sentences Brain and Language 140 1ndash11

Frank S L amp Willems R M (2017) Word predictability andsemantic similarity show distinct patterns of brain activityduring language comprehension Language Cognition andNeuroscience 32 1192ndash1203

Friederici A D (2002) Towards a neural basis of auditorysentence processing Trends in Cognitive Sciences 678ndash84

Friederici A D Pfeifer E amp Hahne A (1993) Event-relatedbrain potentials during natural speech processing Effects ofsemantic morphological and syntactic violations CognitiveBrain Research 1 183ndash192

Frisch R amp Waugh F V (1933) Partial time regressionsas compared with individual trends Econometrica 1387ndash401

Friston K (2010) The free-energy principle A unified braintheory Nature Reviews Neuroscience 11 127ndash138

Friston K amp Kiebel S (2009) Predictive coding under thefree-energy principle Philosophical Transactions of theRoyal Society of London Series B Biological Sciences364 121ndash1221

Giraud A-L amp Poeppel D (2012) Cortical oscillations andspeech processing Emerging computational principles andoperations Nature Neuroscience 15 511ndash517

Gorman K Howell J amp Wagner M (2011) Prosodylab-alignerA tool for forced alignment of laboratory speech Journal ofthe Canadian Acoustical Association 39 192ndash193

Graves A (2013) Generating sequences with recurrent neuralnetworks arXiv preprint arXiv13080850

Hagoort P amp Brown C M (2000) ERP effects of listening tospeech compared to reading The P600SPS to syntacticviolations in spoken sentences and rapid serial visualpresentation Neuropsychologia 38 1531ndash1549

Halgren E Dhond R P Christensen N Van Petten CMarinkovic K Lewine J D et al (2002) N400-likemagnetoencephalography responses modulated by semanticcontext word frequency and lexical class in sentencesNeuroimage 17 1101ndash1116

Heilbron M amp Chait M (2018) Great expectations Is thereevidence for predictive coding in auditory cortexNeuroscience 389 54ndash73

Helenius P Salmelin R Service E amp Connolly J F (1998)Distinct time courses of word and context comprehension inthe left temporal cortex Brain 121 1133ndash1142

Henderson J M Choi W Lowder M W amp Ferreira F(2016) Language structure in the brain A fixation-relatedfMRI study of syntactic surprisal in reading Neuroimage132 293ndash300

Humphries C Binder J R Medler D A amp Liebenthal E(2006) Syntactic and semantic modulation of neural activityduring auditory sentence comprehension Journal ofCognitive Neuroscience 18 665ndash679

Hyafil A Fontolan L Kabdebon C Gutkin B amp Giraud A-L(2015) Speech encoding by coupled cortical theta and gammaoscillations eLife 4 e06213

Kanai R Komura Y Shipp S amp Friston K (2015) Cerebralhierarchies Predictive processing precision and the pulvinarPhilophical Transancations of the Royal Society of LondonSeries B Biological Science 370 20140169

Keitel A Gross J amp Kayser C (2018) Perceptually relevantspeech tracking in auditory and motor cortex reflects distinctlinguistic features PLoS Biology 16 e2004473

Kielar A Meltzer J A Moreno S Alain C amp Bialystok E(2014) Oscillatory responses to semantic and syntacticviolations Journal of Cognitive Neuroscience 26 2840ndash2862

Klema V amp Laub A (1980) The singular value decompositionIts computation and some applications IEEE Transactionson Automatic Control 25 164ndash176

Koelsch S Vuust P amp Friston K (2018) Predictive processesand the peculiar case of music Trends in Cognitive Sciences23 63ndash77

Kumar T K (1975) Multicollinearity in regression analysisReview of Economics and Statistics 57 365ndash366

Kutas M amp Federmeier K D (2011) Thirty years andcounting Finding meaning in the N400 component of theevent-related brain potential (ERP) Annual Review ofPsychology 62 621ndash647

Kutas M amp Hillyard S A (1980) Reading senseless sentencesBrain potentials reflect semantic incongruity Science 207203ndash205

Kutas M amp Hillyard S A (1984) Brain potentials duringreading reflect word expectancy and semantic associationNature 307 161ndash163

Lakatos P Chen C M OrsquoConnell M N Mills A amp SchroederC E (2007) Neuronal oscillations and multisensoryinteraction in primary auditory cortex Neuron 53 279ndash292

Levy R (2008) Expectation-based syntactic comprehensionCognition 106 1126ndash1177

Lewis A G amp Bastiaansen M (2015) A predictive codingframework for rapid neural dynamics during sentence-levellanguage comprehension Cortex 68 155ndash168

Lovell M C (2008) A simple proof of the FWL theoremJournal of Economic Education 39 88ndash91

Maess B Herrmann C S Hahne A Nakamura A amp FriedericiA D (2006) Localizing the distributed language networkresponsible for the N400 measured by MEG during auditorysentence processing Brain Research 1096 163ndash172

Mahoney M (2011) About the test data Retrieved frommattmahoneynetdctextdatahtml

Mikolov T Kombrink S Burget L Černockyacute J amp Khudanpur S(2011) Extensions of recurrent neural network language modelPaper presented at the 2011 IEEE International Conference onAcoustics Speech and Signal Processing (ICASSP)

Miller G A Heise G A amp Lichten W (1951) Theintelligibility of speech as a function of the context of thetest materials Journal of Experimental Psychology 41329ndash335

Miller G A amp Isard S (1963) Some perceptual consequencesof linguistic rules Journal of Verbal Learning and VerbalBehavior 2 217ndash228

Molinaro N Barraza P amp Carreiras M (2013) Long-rangeneural synchronization supports fast and efficient readingEEG correlates of processing expected words in sentencesNeuroimage 72 120ndash132

Nieuwland M Barr D Bartolozzi F Busch-Moreno SDonaldson D Ferguson H J et al (2019) Dissociableeffects of prediction and integration during languagecomprehension Evidence from a large-scale study usingbrain potentials httpswwwbiorxivorgcontent101101267815v4

Weissbart Kandylaki and Reichenbach 165

Oostenveld R Fries P Maris E amp Schoffelen J-M (2011)FieldTrip Open source software for advanced analysis ofMEG EEG and invasive electrophysiological dataComputational Intelligence and Neuroscience 2011156869

Pascanu R Mikolov T amp Bengio Y (2013) On the difficultyof training recurrent neural networks Paper presented at the30th International Conference on International Conferenceon Machine Learning Atlanta GA

Patten W (1910) International short stories (Vol 2) AuroraIL PF Collier amp Son

Pennington J Socher R amp Manning C (2014) Glove Globalvectors for word representation Paper presented at theConference on Empirical Methods in Natural LanguageProcessing (EMNLP) Doha Qatar

Rommers J Dickson D S Norton J J Wlotko E W ampFedermeier K D (2017) Alpha and theta band dynamicsrelated to sentential constraint and word expectancyLanguage Cognition and Neuroscience 32 576ndash589

Roumlsler F Pechmann T Streb J Roumlder B amp HennighausenE (1998) Parsing of sentences in a language with varyingword order Word-by-word variations of processing demandsare revealed by event-related brain potentials Journal ofMemory and Language 38 150ndash176

Smith N J amp Levy R (2013) The effect of wordpredictability on reading time is logarithmic Cognition128 302ndash319

Steinhauer K amp Drury J E (2012) On the early left-anteriornegativity (ELAN) in syntax studies Brain and Language120 135ndash162

Tse C-Y Lee C-L Sullivan J Garnsey S M Dell G SFabiani M et al (2007) Imaging cortical dynamics oflanguage processing with the event-related optical signalProceedings of the National Academy of Sciences USA104 17157ndash17162

Van Den Brink D Brown C M amp Hagoort P (2001)Electrophysiological evidence for early contextual influencesduring spoken-word recognition N200 versus N400 effectsJournal of Cognitive Neuroscience 13 967ndash985

Van Petten C amp Luka B J (2006) Neural localization of semanticcontext effects in electromagnetic and hemodynamic studiesBrain and Language 97 279ndash293

Wang L Jensen O Van den Brink D Weder N SchoffelenJ M Magyari L et al (2012) Beta oscillations relate to theN400m during language comprehension Human BrainMapping 33 2898ndash2912

Wang L Zhu Z amp Bastiaansen M (2012) Integration orpredictability A further specification of the functional role ofgamma oscillations in language comprehension Frontiers inPsychology 3 187

Weiss S amp Mueller H M (2012) ldquoToo many betas do not spoilthe brothrdquo The role of beta brain oscillations in languageprocessing Frontiers in Psychology 3 201

Willems R M Frank S L Nijhof A D Hagoort P amp van denBosch A (2015) Prediction during natural languagecomprehension Cerebral Cortex 26 2506ndash2516

Zion Golumbic E M Ding N Bickel S Lakatos P SchevonC A McKhann G M et al (2013) Mechanisms underlyingselective neuronal tracking of attended speech at a ldquococktailpartyrdquo Neuron 77 980ndash991

166 Journal of Cognitive Neuroscience Volume 32 Number 1

This article has been cited by

1 Eleonora J Beier Suphasiree Chantavarin Gwendolyn Rehrig Fernanda Ferreira Lee M Miller 2021 Cortical Tracking ofSpeech Toward Collaboration between the Fields of Signal and Sentence Processing Journal of Cognitive Neuroscience 334574-593 [Abstract] [Full Text] [PDF] [PDF Plus]

2 Mikolaj Kegler Tobias Reichenbach 2021 Modelling the effects of transcranial alternating current stimulation on the neuralencoding of speech in noise NeuroImage 224 117427 [Crossref]

3 Christian Brodbeck Jonathan Z Simon 2020 Continuous speech processing Current Opinion in Physiology 18 25-31 [Crossref]4 Katerina D Kandylaki Sonja A Kotz 2020 Distinct cortical rhythms in speech and language processing and some more a

commentary on Meyer Sun amp Martin (2019) Language Cognition and Neuroscience 359 1124-1128 [Crossref]5 Mahmoud Keshavarzi Mikolaj Kegler Shabnam Kadir Tobias Reichenbach 2020 Transcranial alternating current stimulation

in the theta band but not in the delta band modulates the comprehension of naturalistic speech in noise NeuroImage 210 116557[Crossref]

Page 8: Cortical Tracking of Surprisal during Continuous Speech

The cortical tracking of surprisal may indicate predic-tive processing by the brain Predictive processing is aframework for perception in which it is assumed thatthe brain infers hypotheses about a sensory input by gen-erating predictions of its neural representations and thatthe hypotheses are constantly updated as new sensoryinformation becomes available (Kanai et al 2015Bendixen SanMiguel amp Schroumlger 2012 Friston 2010Friston amp Kiebel 2009) In particular the surprisal of aword reflects a prediction error a key quantity in theframework of predictive coding (Friston 2010) Howeverthe expectancy of a word based on previous words also

correlates with the plausibility of a word in a particular con-text (Nieuwland et al 2019 DeLong Quante amp Kutas2014) Further studies are therefore required to disentan-gle neural correlates of actual word prediction from thosethat do not require predictive processing such as wordplausibilityThe surprisal of a word can reflect both its semantic as

well as syntactic information and previous investigationsinto the neurobiological mechanisms of languagecomprehension have manipulated both independently(Henderson Choi Lowder amp Ferreira 2016 HumphriesBinder Medler amp Liebenthal 2006) In contrast our

Figure 3 Neural responsesin the theta frequency band (A)Word frequency is positivelycorrelated to theta power at adelay of 300 msec and isnegatively correlated at a delayof 1000 msec (B) Words thatcan be predicted with higherprecision lead to an increasedtheta power at 150 msec and adecreased theta power at alatency of 700 msec

Figure 4 Neural responses inthe beta frequency band (A)There are significant neuralresponses to surprisalemerging at delays of 700 and1000 msec (B) Precision causesan increased power in the betaband activity around a delay of700 msec

162 Journal of Cognitive Neuroscience Volume 32 Number 1

approach has taken a naturalistic and holistic approach tosurprisal we employed natural speech without manipula-tions combined with statistical learning of a rich varietyof natural language cues through a recurrent neural net-work Because the neural network infers both syntacticrules as well as semantic information from the training ofthe speech material the reported neural response to wordsurprisal can reflect both semantic as well as syntactic infor-mation (Collobert et al 2011)It is instructive to compare the reported neural re-

sponses to surprisal to the well-characterized event-related responses that can be elicited by violations ofsemantics syntax or morphology in sentences In partic-ular semantic violations can cause the N400 response anegativity at 200ndash500 msec at the central and parietalscalp area (Kutas amp Federmeier 2011 Kutas amp Hillyard1980) Syntactic anomalies due to ungrammaticality ortemporary misanalysis elicit the P600 a broad positivepotential that is located at the posterior scalp area andarises around 600 msec after the anomaly (Hagoortamp Brown 2000 Friederici Pfeifer amp Hahne 1993)More specific syntactic anomalies can lead to negativepotentials that occur anteriorly and that can be left later-alized either occurring at 300ndash500 msec ((L)AN) orearlier at 125ndash150 msec (ELAN Steinhauer amp Drury2012 Friederici 2002 Van Den Brink Brown ampHagoort 2001 Roumlsler Pechmann Streb Roumlder ampHennighausen 1998)These ERPs do presumably not reflect the activation of

single static neural sources but rather waves of neuralactivity that propagate in time across different brainareas (Kutas amp Federmeier 2011 Tse et al 2007 MaessHerrmann Hahne Nakamura amp Friederici 2006) In thecase of the N400 for instance this wave of activity startsat about 250 msec in the left superior temporal gyrus

and then propagates to the left temporal lobe by 365 msecas well as to both frontal lobes by 500 msec (Van Petten ampLuka 2006 Halgren et al 2002 Helenius SalmelinService amp Connolly 1998) A recent theory suggests thatthis wave of activity reflects reverberating activity withinthe inferior middle and superior temporal gyri thatcorresponds to the activation of lexical informationthe formation of context and the unification of an up-coming word with the context (Baggio amp Hagoort 2011)

The spatiotemporal characteristics of the responses tosurprisal that we have measured here share certain sim-ilarities with these ERPs In particular we have foundneural responses to surprisal at latencies between 300and 600 msec These responses show a central-parietalnegativity that is reminiscent of the N400 Howeverother features of the neural responses that we describehere appear distinct from these ERPs The neural re-sponse to surprisal in the delta band at the latency of600 msec does for instance not display the posteriorpositivity of the P600 Moreover we have identified lateresponses around 700 and 1000 msec We have alsoshown that neural responses to surprisal arise in variousfrequency bands beyond the delta band that matters forthe ERPs However a further comparison of the neuralresponse to surprisal to the related ERPs is hindered bythe lack of spatial resolution offered by EEG recordingsFuture neuroimaging studies using intracranial record-ings or magnetoencephalography (MEG) may localizethe sources of the neural response to surprisal thatwe have measured here and quantify potential sharedsources with the ERPs

The difference of the cortical tracking of surprisal tothe well-known neural correlates of semantic syntacticor morphological anomalies and in particular the late re-sponses at a delay of around 1 sec may come as a result

Figure 5 Tracking of surprisalby gamma band activity (A) Thegamma activity is decreased ataround 1000 msec mostly inthe left temporal and frontalscalp areas (B) The interactionbetween precision and surprisalleads to a modulation of thegamma power at the latency ofaround 0 msec This modulationoccurs predominantly for lefttemporal and frontal channelsas well

Weissbart Kandylaki and Reichenbach 163

of our use of natural speech that differs from the artifi-cially constructed and tightly controlled stimuli used tomeasure ERPs First in our experiment the participants en-countered no violations of semantics syntax and morphol-ogy but instead heard naturalistic speech within which thewords occurred in context Second our stimuli did not con-tain artificial manipulations of word surprisal or precisionInstead of altering the stimuli we focused on quantifyingsurprisal and precision as they varied naturally in the pre-sented stories Third we assessed the responses to surprisaland precision at each word in the story and hence for wordsin every sentence position rather than for words at a partic-ular position within each sentence Because we accountedfor word position through a corresponding control featurewe avoided the possibility of sentence position having aneffect on the results (Bastiaansen et al 2010) Fourthwe did not employ isolated sentences but continuousstories so that information of integration occurred overtimescales exceeding a few seconds

Although our EEG recordings showed the corticaltracking of surprisal in different frequency bands theydid not allow us to precisely localize the sources of theactivity in the cortex Pairing EEG with fMRI or employingMEG may allow to add spatial information to the tempo-ral tracking that we have assessed here A recent fMRIstudy for instance found that the left inferior temporalsulcus the bilateral posterior superior temporal gyri andthe right amygdala responded to surprisal during naturallanguage comprehension whereas the left ventral pre-motor cortex and the left inferior parietal lobule re-sponded to entropy (Willems Frank Nijhof Hagoortamp van den Bosch 2015) Another recent MEG measure-ment of the brainrsquos natural speech processing found thatentropy and surprisal play a role in the assembly of pho-nemes into words and involve brain areas such as coreauditory cortex and the STS (Brodbeck et al 2018)Combining the temporal precision of EEG with the spa-tial precision of fMRI or harnessing the ability of MEG tolocate neural sources temporally and spatially will allowto further clarify the spatiotemporal mechanisms of nat-ural language comprehension in the brain

In summary we showed that neural responses to wordsurprisal can be measured from EEG responses to natu-ralistic stories Our results demonstrate that both theslow delta band as well as the power in higher frequencybands in particular the beta and gamma bands areshaped by surprisal Moreover we also showed that theneural response to surprisal is modulated by the preci-sion of a prediction In particular predictions made withhigh precision which lead to high surprisal modulategamma power in the left temporal and frontal scalp areasIn addition we also demonstrated that neural activity inthe delta theta and beta frequency bands is shaped bythe precision of word prediction directly These re-sponses arise at different latencies and at different scalpareas suggesting a rich spatiotemporal dynamics of neu-ral activity related to word prediction

Acknowledgments

This research was supported by Wellcome Trust grant 108295Z15Z by EPSRC grants EPM0267281 and EPR0326021 as wellas in part by the National Science Foundation under grant noNSF PHY-1125915

Reprint requests should be sent to Tobias ReichenbachDepartment of Bioengineering and Centre for NeurotechnologyImperial College London South Kensington Campus SW7 2AZLondon United Kingdom or via e-mail reichenbachimperialacuk

REFERENCES

Baggio G amp Hagoort P (2011) The balance betweenmemory and unification in semantics A dynamic accountof the N400 Language and Cognitive Processes 261338ndash1367

Bastiaansen M amp Hagoort P (2006) Oscillatory neuronaldynamics during language comprehension Progress in BrainResearch 159 179ndash196

Bastiaansen M Magyari L amp Hagoort P (2010) Syntacticunification operations are reflected in oscillatory dynamicsduring on-line sentence comprehension Journal ofCognitive Neuroscience 22 1333ndash1347

Bendixen A SanMiguel I amp Schroumlger E (2012) Earlyelectrophysiological indicators for predictive processing inaudition A review International Journal ofPsychophysiology 83 120ndash131

Bengio Y Ducharme R Vincent P amp Jauvin C (2003) Aneural probabilistic language model Journal of MachineLearning Research 3 1137ndash1155

Brennan J R amp Hale J T (2019) Hierarchical structure guidesrapid linguistic predictions during naturalistic listening PLoSOne 14 e0207741

Brodbeck C Presacco A amp Simon J Z (2018) Neural sourcedynamics of brain responses to continuous stimuli Speechprocessing from acoustics to comprehension Neuroimage172 162ndash174

Broderick M P Anderson A J Di Liberto G M Crosse M Jamp Lalor E C (2018) Electrophysiological correlates ofsemantic dissimilarity reflect the comprehension of naturalnarrative speech Current Biology 28 803ndash809

Brown P F Desouza P V Mercer R L Pietra V J D amp LaiJ C (1992) Class-based n-gram models of natural languageComputational Linguistics 18 467ndash479

Chatterjee S amp Hadi A S (2015) Regression analysis byexample Hoboken NJ Wiley

Collobert R Weston J Bottou L Karlen M KavukcuogluK amp Kuksa P (2011) Natural language processing (almost)from scratch Journal of Machine Learning Research 122493ndash2537

Davidson D J amp Indefrey P (2007) An inverse relationbetween event-related and timendashfrequency violationresponses in sentence processing Brain Reseach 115881ndash92

DeLong K A Quante L amp Kutas M (2014) Predictabilityplausibility and two late ERP positivities during writtensentence comprehension Neuropsychologia 61 150ndash162

Di Liberto G M OrsquoSullivan J A amp Lalor E C (2015) Low-frequency cortical entrainment to speech reflects phoneme-level processing Current Biology 25 2457ndash2465

Ding N Melloni L Zhang H Tian X amp Poeppel D (2016)Cortical tracking of hierarchical linguistic structures inconnected speech Nature Neuroscience 19 158ndash164

Ding N Pan X Luo C Su N Zhang W amp Zhang J (2018)Attention is required for knowledge-based sequential

164 Journal of Cognitive Neuroscience Volume 32 Number 1

grouping Insights from the integration of syllables intowords Journal of Neuroscience 38 1178ndash1188

Ding N amp Simon J Z (2012) Emergence of neural encodingof auditory objects while listening to competing speakersProceedings of the National Academy of Sciences USA109 11854ndash11859

Ding N amp Simon J Z (2014) Cortical entrainment tocontinuous speech Functional roles and interpretationsFrontiers in Human Neuroscience 8 311

Federmeier K D Wlotko E W De Ochoa-Dewald E ampKutas M (2007) Multiple effects of sentential constraint onword processing Brain Research 1146 75ndash84

Feldman H amp Friston K (2010) Attention uncertainty andfree-energy Frontiers in Human Neuroscience 4 215

Frank S L Otten L J Galli G amp Vigliocco G (2015) TheERP response to the amount of information conveyed bywords in sentences Brain and Language 140 1ndash11

Frank S L amp Willems R M (2017) Word predictability andsemantic similarity show distinct patterns of brain activityduring language comprehension Language Cognition andNeuroscience 32 1192ndash1203

Friederici A D (2002) Towards a neural basis of auditorysentence processing Trends in Cognitive Sciences 678ndash84

Friederici A D Pfeifer E amp Hahne A (1993) Event-relatedbrain potentials during natural speech processing Effects ofsemantic morphological and syntactic violations CognitiveBrain Research 1 183ndash192

Frisch R amp Waugh F V (1933) Partial time regressionsas compared with individual trends Econometrica 1387ndash401

Friston K (2010) The free-energy principle A unified braintheory Nature Reviews Neuroscience 11 127ndash138

Friston K amp Kiebel S (2009) Predictive coding under thefree-energy principle Philosophical Transactions of theRoyal Society of London Series B Biological Sciences364 121ndash1221

Giraud A-L amp Poeppel D (2012) Cortical oscillations andspeech processing Emerging computational principles andoperations Nature Neuroscience 15 511ndash517

Gorman K Howell J amp Wagner M (2011) Prosodylab-alignerA tool for forced alignment of laboratory speech Journal ofthe Canadian Acoustical Association 39 192ndash193

Graves A (2013) Generating sequences with recurrent neuralnetworks arXiv preprint arXiv13080850

Hagoort P amp Brown C M (2000) ERP effects of listening tospeech compared to reading The P600SPS to syntacticviolations in spoken sentences and rapid serial visualpresentation Neuropsychologia 38 1531ndash1549

Halgren E Dhond R P Christensen N Van Petten CMarinkovic K Lewine J D et al (2002) N400-likemagnetoencephalography responses modulated by semanticcontext word frequency and lexical class in sentencesNeuroimage 17 1101ndash1116

Heilbron M amp Chait M (2018) Great expectations Is thereevidence for predictive coding in auditory cortexNeuroscience 389 54ndash73

Helenius P Salmelin R Service E amp Connolly J F (1998)Distinct time courses of word and context comprehension inthe left temporal cortex Brain 121 1133ndash1142

Henderson J M Choi W Lowder M W amp Ferreira F(2016) Language structure in the brain A fixation-relatedfMRI study of syntactic surprisal in reading Neuroimage132 293ndash300

Humphries C Binder J R Medler D A amp Liebenthal E(2006) Syntactic and semantic modulation of neural activityduring auditory sentence comprehension Journal ofCognitive Neuroscience 18 665ndash679

Hyafil A Fontolan L Kabdebon C Gutkin B amp Giraud A-L(2015) Speech encoding by coupled cortical theta and gammaoscillations eLife 4 e06213

Kanai R Komura Y Shipp S amp Friston K (2015) Cerebralhierarchies Predictive processing precision and the pulvinarPhilophical Transancations of the Royal Society of LondonSeries B Biological Science 370 20140169

Keitel A Gross J amp Kayser C (2018) Perceptually relevantspeech tracking in auditory and motor cortex reflects distinctlinguistic features PLoS Biology 16 e2004473

Kielar A Meltzer J A Moreno S Alain C amp Bialystok E(2014) Oscillatory responses to semantic and syntacticviolations Journal of Cognitive Neuroscience 26 2840ndash2862

Klema V amp Laub A (1980) The singular value decompositionIts computation and some applications IEEE Transactionson Automatic Control 25 164ndash176

Koelsch S Vuust P amp Friston K (2018) Predictive processesand the peculiar case of music Trends in Cognitive Sciences23 63ndash77

Kumar T K (1975) Multicollinearity in regression analysisReview of Economics and Statistics 57 365ndash366

Kutas M amp Federmeier K D (2011) Thirty years andcounting Finding meaning in the N400 component of theevent-related brain potential (ERP) Annual Review ofPsychology 62 621ndash647

Kutas M amp Hillyard S A (1980) Reading senseless sentencesBrain potentials reflect semantic incongruity Science 207203ndash205

Kutas M amp Hillyard S A (1984) Brain potentials duringreading reflect word expectancy and semantic associationNature 307 161ndash163

Lakatos P Chen C M OrsquoConnell M N Mills A amp SchroederC E (2007) Neuronal oscillations and multisensoryinteraction in primary auditory cortex Neuron 53 279ndash292

Levy R (2008) Expectation-based syntactic comprehensionCognition 106 1126ndash1177

Lewis A G amp Bastiaansen M (2015) A predictive codingframework for rapid neural dynamics during sentence-levellanguage comprehension Cortex 68 155ndash168

Lovell M C (2008) A simple proof of the FWL theoremJournal of Economic Education 39 88ndash91

Maess B Herrmann C S Hahne A Nakamura A amp FriedericiA D (2006) Localizing the distributed language networkresponsible for the N400 measured by MEG during auditorysentence processing Brain Research 1096 163ndash172

Mahoney M (2011) About the test data Retrieved frommattmahoneynetdctextdatahtml

Mikolov T Kombrink S Burget L Černockyacute J amp Khudanpur S(2011) Extensions of recurrent neural network language modelPaper presented at the 2011 IEEE International Conference onAcoustics Speech and Signal Processing (ICASSP)

Miller G A Heise G A amp Lichten W (1951) Theintelligibility of speech as a function of the context of thetest materials Journal of Experimental Psychology 41329ndash335

Miller G A amp Isard S (1963) Some perceptual consequencesof linguistic rules Journal of Verbal Learning and VerbalBehavior 2 217ndash228

Molinaro N Barraza P amp Carreiras M (2013) Long-rangeneural synchronization supports fast and efficient readingEEG correlates of processing expected words in sentencesNeuroimage 72 120ndash132

Nieuwland M Barr D Bartolozzi F Busch-Moreno SDonaldson D Ferguson H J et al (2019) Dissociableeffects of prediction and integration during languagecomprehension Evidence from a large-scale study usingbrain potentials httpswwwbiorxivorgcontent101101267815v4

Weissbart Kandylaki and Reichenbach 165

Oostenveld R Fries P Maris E amp Schoffelen J-M (2011)FieldTrip Open source software for advanced analysis ofMEG EEG and invasive electrophysiological dataComputational Intelligence and Neuroscience 2011156869

Pascanu R Mikolov T amp Bengio Y (2013) On the difficultyof training recurrent neural networks Paper presented at the30th International Conference on International Conferenceon Machine Learning Atlanta GA

Patten W (1910) International short stories (Vol 2) AuroraIL PF Collier amp Son

Pennington J Socher R amp Manning C (2014) Glove Globalvectors for word representation Paper presented at theConference on Empirical Methods in Natural LanguageProcessing (EMNLP) Doha Qatar

Rommers J Dickson D S Norton J J Wlotko E W ampFedermeier K D (2017) Alpha and theta band dynamicsrelated to sentential constraint and word expectancyLanguage Cognition and Neuroscience 32 576ndash589

Roumlsler F Pechmann T Streb J Roumlder B amp HennighausenE (1998) Parsing of sentences in a language with varyingword order Word-by-word variations of processing demandsare revealed by event-related brain potentials Journal ofMemory and Language 38 150ndash176

Smith N J amp Levy R (2013) The effect of wordpredictability on reading time is logarithmic Cognition128 302ndash319

Steinhauer K amp Drury J E (2012) On the early left-anteriornegativity (ELAN) in syntax studies Brain and Language120 135ndash162

Tse C-Y Lee C-L Sullivan J Garnsey S M Dell G SFabiani M et al (2007) Imaging cortical dynamics oflanguage processing with the event-related optical signalProceedings of the National Academy of Sciences USA104 17157ndash17162

Van Den Brink D Brown C M amp Hagoort P (2001)Electrophysiological evidence for early contextual influencesduring spoken-word recognition N200 versus N400 effectsJournal of Cognitive Neuroscience 13 967ndash985

Van Petten C amp Luka B J (2006) Neural localization of semanticcontext effects in electromagnetic and hemodynamic studiesBrain and Language 97 279ndash293

Wang L Jensen O Van den Brink D Weder N SchoffelenJ M Magyari L et al (2012) Beta oscillations relate to theN400m during language comprehension Human BrainMapping 33 2898ndash2912

Wang L Zhu Z amp Bastiaansen M (2012) Integration orpredictability A further specification of the functional role ofgamma oscillations in language comprehension Frontiers inPsychology 3 187

Weiss S amp Mueller H M (2012) ldquoToo many betas do not spoilthe brothrdquo The role of beta brain oscillations in languageprocessing Frontiers in Psychology 3 201

Willems R M Frank S L Nijhof A D Hagoort P amp van denBosch A (2015) Prediction during natural languagecomprehension Cerebral Cortex 26 2506ndash2516

Zion Golumbic E M Ding N Bickel S Lakatos P SchevonC A McKhann G M et al (2013) Mechanisms underlyingselective neuronal tracking of attended speech at a ldquococktailpartyrdquo Neuron 77 980ndash991

166 Journal of Cognitive Neuroscience Volume 32 Number 1

This article has been cited by

1 Eleonora J Beier Suphasiree Chantavarin Gwendolyn Rehrig Fernanda Ferreira Lee M Miller 2021 Cortical Tracking ofSpeech Toward Collaboration between the Fields of Signal and Sentence Processing Journal of Cognitive Neuroscience 334574-593 [Abstract] [Full Text] [PDF] [PDF Plus]

2 Mikolaj Kegler Tobias Reichenbach 2021 Modelling the effects of transcranial alternating current stimulation on the neuralencoding of speech in noise NeuroImage 224 117427 [Crossref]

3 Christian Brodbeck Jonathan Z Simon 2020 Continuous speech processing Current Opinion in Physiology 18 25-31 [Crossref]4 Katerina D Kandylaki Sonja A Kotz 2020 Distinct cortical rhythms in speech and language processing and some more a

commentary on Meyer Sun amp Martin (2019) Language Cognition and Neuroscience 359 1124-1128 [Crossref]5 Mahmoud Keshavarzi Mikolaj Kegler Shabnam Kadir Tobias Reichenbach 2020 Transcranial alternating current stimulation

in the theta band but not in the delta band modulates the comprehension of naturalistic speech in noise NeuroImage 210 116557[Crossref]

Page 9: Cortical Tracking of Surprisal during Continuous Speech

approach has taken a naturalistic and holistic approach tosurprisal we employed natural speech without manipula-tions combined with statistical learning of a rich varietyof natural language cues through a recurrent neural net-work Because the neural network infers both syntacticrules as well as semantic information from the training ofthe speech material the reported neural response to wordsurprisal can reflect both semantic as well as syntactic infor-mation (Collobert et al 2011)It is instructive to compare the reported neural re-

sponses to surprisal to the well-characterized event-related responses that can be elicited by violations ofsemantics syntax or morphology in sentences In partic-ular semantic violations can cause the N400 response anegativity at 200ndash500 msec at the central and parietalscalp area (Kutas amp Federmeier 2011 Kutas amp Hillyard1980) Syntactic anomalies due to ungrammaticality ortemporary misanalysis elicit the P600 a broad positivepotential that is located at the posterior scalp area andarises around 600 msec after the anomaly (Hagoortamp Brown 2000 Friederici Pfeifer amp Hahne 1993)More specific syntactic anomalies can lead to negativepotentials that occur anteriorly and that can be left later-alized either occurring at 300ndash500 msec ((L)AN) orearlier at 125ndash150 msec (ELAN Steinhauer amp Drury2012 Friederici 2002 Van Den Brink Brown ampHagoort 2001 Roumlsler Pechmann Streb Roumlder ampHennighausen 1998)These ERPs do presumably not reflect the activation of

single static neural sources but rather waves of neuralactivity that propagate in time across different brainareas (Kutas amp Federmeier 2011 Tse et al 2007 MaessHerrmann Hahne Nakamura amp Friederici 2006) In thecase of the N400 for instance this wave of activity startsat about 250 msec in the left superior temporal gyrus

and then propagates to the left temporal lobe by 365 msecas well as to both frontal lobes by 500 msec (Van Petten ampLuka 2006 Halgren et al 2002 Helenius SalmelinService amp Connolly 1998) A recent theory suggests thatthis wave of activity reflects reverberating activity withinthe inferior middle and superior temporal gyri thatcorresponds to the activation of lexical informationthe formation of context and the unification of an up-coming word with the context (Baggio amp Hagoort 2011)

The spatiotemporal characteristics of the responses tosurprisal that we have measured here share certain sim-ilarities with these ERPs In particular we have foundneural responses to surprisal at latencies between 300and 600 msec These responses show a central-parietalnegativity that is reminiscent of the N400 Howeverother features of the neural responses that we describehere appear distinct from these ERPs The neural re-sponse to surprisal in the delta band at the latency of600 msec does for instance not display the posteriorpositivity of the P600 Moreover we have identified lateresponses around 700 and 1000 msec We have alsoshown that neural responses to surprisal arise in variousfrequency bands beyond the delta band that matters forthe ERPs However a further comparison of the neuralresponse to surprisal to the related ERPs is hindered bythe lack of spatial resolution offered by EEG recordingsFuture neuroimaging studies using intracranial record-ings or magnetoencephalography (MEG) may localizethe sources of the neural response to surprisal thatwe have measured here and quantify potential sharedsources with the ERPs

The difference of the cortical tracking of surprisal tothe well-known neural correlates of semantic syntacticor morphological anomalies and in particular the late re-sponses at a delay of around 1 sec may come as a result

Figure 5 Tracking of surprisalby gamma band activity (A) Thegamma activity is decreased ataround 1000 msec mostly inthe left temporal and frontalscalp areas (B) The interactionbetween precision and surprisalleads to a modulation of thegamma power at the latency ofaround 0 msec This modulationoccurs predominantly for lefttemporal and frontal channelsas well

Weissbart Kandylaki and Reichenbach 163

of our use of natural speech that differs from the artifi-cially constructed and tightly controlled stimuli used tomeasure ERPs First in our experiment the participants en-countered no violations of semantics syntax and morphol-ogy but instead heard naturalistic speech within which thewords occurred in context Second our stimuli did not con-tain artificial manipulations of word surprisal or precisionInstead of altering the stimuli we focused on quantifyingsurprisal and precision as they varied naturally in the pre-sented stories Third we assessed the responses to surprisaland precision at each word in the story and hence for wordsin every sentence position rather than for words at a partic-ular position within each sentence Because we accountedfor word position through a corresponding control featurewe avoided the possibility of sentence position having aneffect on the results (Bastiaansen et al 2010) Fourthwe did not employ isolated sentences but continuousstories so that information of integration occurred overtimescales exceeding a few seconds

Although our EEG recordings showed the corticaltracking of surprisal in different frequency bands theydid not allow us to precisely localize the sources of theactivity in the cortex Pairing EEG with fMRI or employingMEG may allow to add spatial information to the tempo-ral tracking that we have assessed here A recent fMRIstudy for instance found that the left inferior temporalsulcus the bilateral posterior superior temporal gyri andthe right amygdala responded to surprisal during naturallanguage comprehension whereas the left ventral pre-motor cortex and the left inferior parietal lobule re-sponded to entropy (Willems Frank Nijhof Hagoortamp van den Bosch 2015) Another recent MEG measure-ment of the brainrsquos natural speech processing found thatentropy and surprisal play a role in the assembly of pho-nemes into words and involve brain areas such as coreauditory cortex and the STS (Brodbeck et al 2018)Combining the temporal precision of EEG with the spa-tial precision of fMRI or harnessing the ability of MEG tolocate neural sources temporally and spatially will allowto further clarify the spatiotemporal mechanisms of nat-ural language comprehension in the brain

In summary we showed that neural responses to wordsurprisal can be measured from EEG responses to natu-ralistic stories Our results demonstrate that both theslow delta band as well as the power in higher frequencybands in particular the beta and gamma bands areshaped by surprisal Moreover we also showed that theneural response to surprisal is modulated by the preci-sion of a prediction In particular predictions made withhigh precision which lead to high surprisal modulategamma power in the left temporal and frontal scalp areasIn addition we also demonstrated that neural activity inthe delta theta and beta frequency bands is shaped bythe precision of word prediction directly These re-sponses arise at different latencies and at different scalpareas suggesting a rich spatiotemporal dynamics of neu-ral activity related to word prediction

Acknowledgments

This research was supported by Wellcome Trust grant 108295Z15Z by EPSRC grants EPM0267281 and EPR0326021 as wellas in part by the National Science Foundation under grant noNSF PHY-1125915

Reprint requests should be sent to Tobias ReichenbachDepartment of Bioengineering and Centre for NeurotechnologyImperial College London South Kensington Campus SW7 2AZLondon United Kingdom or via e-mail reichenbachimperialacuk

REFERENCES

Baggio G amp Hagoort P (2011) The balance betweenmemory and unification in semantics A dynamic accountof the N400 Language and Cognitive Processes 261338ndash1367

Bastiaansen M amp Hagoort P (2006) Oscillatory neuronaldynamics during language comprehension Progress in BrainResearch 159 179ndash196

Bastiaansen M Magyari L amp Hagoort P (2010) Syntacticunification operations are reflected in oscillatory dynamicsduring on-line sentence comprehension Journal ofCognitive Neuroscience 22 1333ndash1347

Bendixen A SanMiguel I amp Schroumlger E (2012) Earlyelectrophysiological indicators for predictive processing inaudition A review International Journal ofPsychophysiology 83 120ndash131

Bengio Y Ducharme R Vincent P amp Jauvin C (2003) Aneural probabilistic language model Journal of MachineLearning Research 3 1137ndash1155

Brennan J R amp Hale J T (2019) Hierarchical structure guidesrapid linguistic predictions during naturalistic listening PLoSOne 14 e0207741

Brodbeck C Presacco A amp Simon J Z (2018) Neural sourcedynamics of brain responses to continuous stimuli Speechprocessing from acoustics to comprehension Neuroimage172 162ndash174

Broderick M P Anderson A J Di Liberto G M Crosse M Jamp Lalor E C (2018) Electrophysiological correlates ofsemantic dissimilarity reflect the comprehension of naturalnarrative speech Current Biology 28 803ndash809

Brown P F Desouza P V Mercer R L Pietra V J D amp LaiJ C (1992) Class-based n-gram models of natural languageComputational Linguistics 18 467ndash479

Chatterjee S amp Hadi A S (2015) Regression analysis byexample Hoboken NJ Wiley

Collobert R Weston J Bottou L Karlen M KavukcuogluK amp Kuksa P (2011) Natural language processing (almost)from scratch Journal of Machine Learning Research 122493ndash2537

Davidson D J amp Indefrey P (2007) An inverse relationbetween event-related and timendashfrequency violationresponses in sentence processing Brain Reseach 115881ndash92

DeLong K A Quante L amp Kutas M (2014) Predictabilityplausibility and two late ERP positivities during writtensentence comprehension Neuropsychologia 61 150ndash162

Di Liberto G M OrsquoSullivan J A amp Lalor E C (2015) Low-frequency cortical entrainment to speech reflects phoneme-level processing Current Biology 25 2457ndash2465

Ding N Melloni L Zhang H Tian X amp Poeppel D (2016)Cortical tracking of hierarchical linguistic structures inconnected speech Nature Neuroscience 19 158ndash164

Ding N Pan X Luo C Su N Zhang W amp Zhang J (2018)Attention is required for knowledge-based sequential

164 Journal of Cognitive Neuroscience Volume 32 Number 1

grouping Insights from the integration of syllables intowords Journal of Neuroscience 38 1178ndash1188

Ding N amp Simon J Z (2012) Emergence of neural encodingof auditory objects while listening to competing speakersProceedings of the National Academy of Sciences USA109 11854ndash11859

Ding N amp Simon J Z (2014) Cortical entrainment tocontinuous speech Functional roles and interpretationsFrontiers in Human Neuroscience 8 311

Federmeier K D Wlotko E W De Ochoa-Dewald E ampKutas M (2007) Multiple effects of sentential constraint onword processing Brain Research 1146 75ndash84

Feldman H amp Friston K (2010) Attention uncertainty andfree-energy Frontiers in Human Neuroscience 4 215

Frank S L Otten L J Galli G amp Vigliocco G (2015) TheERP response to the amount of information conveyed bywords in sentences Brain and Language 140 1ndash11

Frank S L amp Willems R M (2017) Word predictability andsemantic similarity show distinct patterns of brain activityduring language comprehension Language Cognition andNeuroscience 32 1192ndash1203

Friederici A D (2002) Towards a neural basis of auditorysentence processing Trends in Cognitive Sciences 678ndash84

Friederici A D Pfeifer E amp Hahne A (1993) Event-relatedbrain potentials during natural speech processing Effects ofsemantic morphological and syntactic violations CognitiveBrain Research 1 183ndash192

Frisch R amp Waugh F V (1933) Partial time regressionsas compared with individual trends Econometrica 1387ndash401

Friston K (2010) The free-energy principle A unified braintheory Nature Reviews Neuroscience 11 127ndash138

Friston K amp Kiebel S (2009) Predictive coding under thefree-energy principle Philosophical Transactions of theRoyal Society of London Series B Biological Sciences364 121ndash1221

Giraud A-L amp Poeppel D (2012) Cortical oscillations andspeech processing Emerging computational principles andoperations Nature Neuroscience 15 511ndash517

Gorman K Howell J amp Wagner M (2011) Prosodylab-alignerA tool for forced alignment of laboratory speech Journal ofthe Canadian Acoustical Association 39 192ndash193

Graves A (2013) Generating sequences with recurrent neuralnetworks arXiv preprint arXiv13080850

Hagoort P amp Brown C M (2000) ERP effects of listening tospeech compared to reading The P600SPS to syntacticviolations in spoken sentences and rapid serial visualpresentation Neuropsychologia 38 1531ndash1549

Halgren E Dhond R P Christensen N Van Petten CMarinkovic K Lewine J D et al (2002) N400-likemagnetoencephalography responses modulated by semanticcontext word frequency and lexical class in sentencesNeuroimage 17 1101ndash1116

Heilbron M amp Chait M (2018) Great expectations Is thereevidence for predictive coding in auditory cortexNeuroscience 389 54ndash73

Helenius P Salmelin R Service E amp Connolly J F (1998)Distinct time courses of word and context comprehension inthe left temporal cortex Brain 121 1133ndash1142

Henderson J M Choi W Lowder M W amp Ferreira F(2016) Language structure in the brain A fixation-relatedfMRI study of syntactic surprisal in reading Neuroimage132 293ndash300

Humphries C Binder J R Medler D A amp Liebenthal E(2006) Syntactic and semantic modulation of neural activityduring auditory sentence comprehension Journal ofCognitive Neuroscience 18 665ndash679

Hyafil A Fontolan L Kabdebon C Gutkin B amp Giraud A-L(2015) Speech encoding by coupled cortical theta and gammaoscillations eLife 4 e06213

Kanai R Komura Y Shipp S amp Friston K (2015) Cerebralhierarchies Predictive processing precision and the pulvinarPhilophical Transancations of the Royal Society of LondonSeries B Biological Science 370 20140169

Keitel A Gross J amp Kayser C (2018) Perceptually relevantspeech tracking in auditory and motor cortex reflects distinctlinguistic features PLoS Biology 16 e2004473

Kielar A Meltzer J A Moreno S Alain C amp Bialystok E(2014) Oscillatory responses to semantic and syntacticviolations Journal of Cognitive Neuroscience 26 2840ndash2862

Klema V amp Laub A (1980) The singular value decompositionIts computation and some applications IEEE Transactionson Automatic Control 25 164ndash176

Koelsch S Vuust P amp Friston K (2018) Predictive processesand the peculiar case of music Trends in Cognitive Sciences23 63ndash77

Kumar T K (1975) Multicollinearity in regression analysisReview of Economics and Statistics 57 365ndash366

Kutas M amp Federmeier K D (2011) Thirty years andcounting Finding meaning in the N400 component of theevent-related brain potential (ERP) Annual Review ofPsychology 62 621ndash647

Kutas M amp Hillyard S A (1980) Reading senseless sentencesBrain potentials reflect semantic incongruity Science 207203ndash205

Kutas M amp Hillyard S A (1984) Brain potentials duringreading reflect word expectancy and semantic associationNature 307 161ndash163

Lakatos P Chen C M OrsquoConnell M N Mills A amp SchroederC E (2007) Neuronal oscillations and multisensoryinteraction in primary auditory cortex Neuron 53 279ndash292

Levy R (2008) Expectation-based syntactic comprehensionCognition 106 1126ndash1177

Lewis A G amp Bastiaansen M (2015) A predictive codingframework for rapid neural dynamics during sentence-levellanguage comprehension Cortex 68 155ndash168

Lovell M C (2008) A simple proof of the FWL theoremJournal of Economic Education 39 88ndash91

Maess B Herrmann C S Hahne A Nakamura A amp FriedericiA D (2006) Localizing the distributed language networkresponsible for the N400 measured by MEG during auditorysentence processing Brain Research 1096 163ndash172

Mahoney M (2011) About the test data Retrieved frommattmahoneynetdctextdatahtml

Mikolov T Kombrink S Burget L Černockyacute J amp Khudanpur S(2011) Extensions of recurrent neural network language modelPaper presented at the 2011 IEEE International Conference onAcoustics Speech and Signal Processing (ICASSP)

Miller G A Heise G A amp Lichten W (1951) Theintelligibility of speech as a function of the context of thetest materials Journal of Experimental Psychology 41329ndash335

Miller G A amp Isard S (1963) Some perceptual consequencesof linguistic rules Journal of Verbal Learning and VerbalBehavior 2 217ndash228

Molinaro N Barraza P amp Carreiras M (2013) Long-rangeneural synchronization supports fast and efficient readingEEG correlates of processing expected words in sentencesNeuroimage 72 120ndash132

Nieuwland M Barr D Bartolozzi F Busch-Moreno SDonaldson D Ferguson H J et al (2019) Dissociableeffects of prediction and integration during languagecomprehension Evidence from a large-scale study usingbrain potentials httpswwwbiorxivorgcontent101101267815v4

Weissbart Kandylaki and Reichenbach 165

Oostenveld R Fries P Maris E amp Schoffelen J-M (2011)FieldTrip Open source software for advanced analysis ofMEG EEG and invasive electrophysiological dataComputational Intelligence and Neuroscience 2011156869

Pascanu R Mikolov T amp Bengio Y (2013) On the difficultyof training recurrent neural networks Paper presented at the30th International Conference on International Conferenceon Machine Learning Atlanta GA

Patten W (1910) International short stories (Vol 2) AuroraIL PF Collier amp Son

Pennington J Socher R amp Manning C (2014) Glove Globalvectors for word representation Paper presented at theConference on Empirical Methods in Natural LanguageProcessing (EMNLP) Doha Qatar

Rommers J Dickson D S Norton J J Wlotko E W ampFedermeier K D (2017) Alpha and theta band dynamicsrelated to sentential constraint and word expectancyLanguage Cognition and Neuroscience 32 576ndash589

Roumlsler F Pechmann T Streb J Roumlder B amp HennighausenE (1998) Parsing of sentences in a language with varyingword order Word-by-word variations of processing demandsare revealed by event-related brain potentials Journal ofMemory and Language 38 150ndash176

Smith N J amp Levy R (2013) The effect of wordpredictability on reading time is logarithmic Cognition128 302ndash319

Steinhauer K amp Drury J E (2012) On the early left-anteriornegativity (ELAN) in syntax studies Brain and Language120 135ndash162

Tse C-Y Lee C-L Sullivan J Garnsey S M Dell G SFabiani M et al (2007) Imaging cortical dynamics oflanguage processing with the event-related optical signalProceedings of the National Academy of Sciences USA104 17157ndash17162

Van Den Brink D Brown C M amp Hagoort P (2001)Electrophysiological evidence for early contextual influencesduring spoken-word recognition N200 versus N400 effectsJournal of Cognitive Neuroscience 13 967ndash985

Van Petten C amp Luka B J (2006) Neural localization of semanticcontext effects in electromagnetic and hemodynamic studiesBrain and Language 97 279ndash293

Wang L Jensen O Van den Brink D Weder N SchoffelenJ M Magyari L et al (2012) Beta oscillations relate to theN400m during language comprehension Human BrainMapping 33 2898ndash2912

Wang L Zhu Z amp Bastiaansen M (2012) Integration orpredictability A further specification of the functional role ofgamma oscillations in language comprehension Frontiers inPsychology 3 187

Weiss S amp Mueller H M (2012) ldquoToo many betas do not spoilthe brothrdquo The role of beta brain oscillations in languageprocessing Frontiers in Psychology 3 201

Willems R M Frank S L Nijhof A D Hagoort P amp van denBosch A (2015) Prediction during natural languagecomprehension Cerebral Cortex 26 2506ndash2516

Zion Golumbic E M Ding N Bickel S Lakatos P SchevonC A McKhann G M et al (2013) Mechanisms underlyingselective neuronal tracking of attended speech at a ldquococktailpartyrdquo Neuron 77 980ndash991

166 Journal of Cognitive Neuroscience Volume 32 Number 1

This article has been cited by

1 Eleonora J Beier Suphasiree Chantavarin Gwendolyn Rehrig Fernanda Ferreira Lee M Miller 2021 Cortical Tracking ofSpeech Toward Collaboration between the Fields of Signal and Sentence Processing Journal of Cognitive Neuroscience 334574-593 [Abstract] [Full Text] [PDF] [PDF Plus]

2 Mikolaj Kegler Tobias Reichenbach 2021 Modelling the effects of transcranial alternating current stimulation on the neuralencoding of speech in noise NeuroImage 224 117427 [Crossref]

3 Christian Brodbeck Jonathan Z Simon 2020 Continuous speech processing Current Opinion in Physiology 18 25-31 [Crossref]4 Katerina D Kandylaki Sonja A Kotz 2020 Distinct cortical rhythms in speech and language processing and some more a

commentary on Meyer Sun amp Martin (2019) Language Cognition and Neuroscience 359 1124-1128 [Crossref]5 Mahmoud Keshavarzi Mikolaj Kegler Shabnam Kadir Tobias Reichenbach 2020 Transcranial alternating current stimulation

in the theta band but not in the delta band modulates the comprehension of naturalistic speech in noise NeuroImage 210 116557[Crossref]

Page 10: Cortical Tracking of Surprisal during Continuous Speech

of our use of natural speech that differs from the artifi-cially constructed and tightly controlled stimuli used tomeasure ERPs First in our experiment the participants en-countered no violations of semantics syntax and morphol-ogy but instead heard naturalistic speech within which thewords occurred in context Second our stimuli did not con-tain artificial manipulations of word surprisal or precisionInstead of altering the stimuli we focused on quantifyingsurprisal and precision as they varied naturally in the pre-sented stories Third we assessed the responses to surprisaland precision at each word in the story and hence for wordsin every sentence position rather than for words at a partic-ular position within each sentence Because we accountedfor word position through a corresponding control featurewe avoided the possibility of sentence position having aneffect on the results (Bastiaansen et al 2010) Fourthwe did not employ isolated sentences but continuousstories so that information of integration occurred overtimescales exceeding a few seconds

Although our EEG recordings showed the corticaltracking of surprisal in different frequency bands theydid not allow us to precisely localize the sources of theactivity in the cortex Pairing EEG with fMRI or employingMEG may allow to add spatial information to the tempo-ral tracking that we have assessed here A recent fMRIstudy for instance found that the left inferior temporalsulcus the bilateral posterior superior temporal gyri andthe right amygdala responded to surprisal during naturallanguage comprehension whereas the left ventral pre-motor cortex and the left inferior parietal lobule re-sponded to entropy (Willems Frank Nijhof Hagoortamp van den Bosch 2015) Another recent MEG measure-ment of the brainrsquos natural speech processing found thatentropy and surprisal play a role in the assembly of pho-nemes into words and involve brain areas such as coreauditory cortex and the STS (Brodbeck et al 2018)Combining the temporal precision of EEG with the spa-tial precision of fMRI or harnessing the ability of MEG tolocate neural sources temporally and spatially will allowto further clarify the spatiotemporal mechanisms of nat-ural language comprehension in the brain

In summary we showed that neural responses to wordsurprisal can be measured from EEG responses to natu-ralistic stories Our results demonstrate that both theslow delta band as well as the power in higher frequencybands in particular the beta and gamma bands areshaped by surprisal Moreover we also showed that theneural response to surprisal is modulated by the preci-sion of a prediction In particular predictions made withhigh precision which lead to high surprisal modulategamma power in the left temporal and frontal scalp areasIn addition we also demonstrated that neural activity inthe delta theta and beta frequency bands is shaped bythe precision of word prediction directly These re-sponses arise at different latencies and at different scalpareas suggesting a rich spatiotemporal dynamics of neu-ral activity related to word prediction

Acknowledgments

This research was supported by Wellcome Trust grant 108295Z15Z by EPSRC grants EPM0267281 and EPR0326021 as wellas in part by the National Science Foundation under grant noNSF PHY-1125915

Reprint requests should be sent to Tobias ReichenbachDepartment of Bioengineering and Centre for NeurotechnologyImperial College London South Kensington Campus SW7 2AZLondon United Kingdom or via e-mail reichenbachimperialacuk

REFERENCES

Baggio G amp Hagoort P (2011) The balance betweenmemory and unification in semantics A dynamic accountof the N400 Language and Cognitive Processes 261338ndash1367

Bastiaansen M amp Hagoort P (2006) Oscillatory neuronaldynamics during language comprehension Progress in BrainResearch 159 179ndash196

Bastiaansen M Magyari L amp Hagoort P (2010) Syntacticunification operations are reflected in oscillatory dynamicsduring on-line sentence comprehension Journal ofCognitive Neuroscience 22 1333ndash1347

Bendixen A SanMiguel I amp Schroumlger E (2012) Earlyelectrophysiological indicators for predictive processing inaudition A review International Journal ofPsychophysiology 83 120ndash131

Bengio Y Ducharme R Vincent P amp Jauvin C (2003) Aneural probabilistic language model Journal of MachineLearning Research 3 1137ndash1155

Brennan J R amp Hale J T (2019) Hierarchical structure guidesrapid linguistic predictions during naturalistic listening PLoSOne 14 e0207741

Brodbeck C Presacco A amp Simon J Z (2018) Neural sourcedynamics of brain responses to continuous stimuli Speechprocessing from acoustics to comprehension Neuroimage172 162ndash174

Broderick M P Anderson A J Di Liberto G M Crosse M Jamp Lalor E C (2018) Electrophysiological correlates ofsemantic dissimilarity reflect the comprehension of naturalnarrative speech Current Biology 28 803ndash809

Brown P F Desouza P V Mercer R L Pietra V J D amp LaiJ C (1992) Class-based n-gram models of natural languageComputational Linguistics 18 467ndash479

Chatterjee S amp Hadi A S (2015) Regression analysis byexample Hoboken NJ Wiley

Collobert R Weston J Bottou L Karlen M KavukcuogluK amp Kuksa P (2011) Natural language processing (almost)from scratch Journal of Machine Learning Research 122493ndash2537

Davidson D J amp Indefrey P (2007) An inverse relationbetween event-related and timendashfrequency violationresponses in sentence processing Brain Reseach 115881ndash92

DeLong K A Quante L amp Kutas M (2014) Predictabilityplausibility and two late ERP positivities during writtensentence comprehension Neuropsychologia 61 150ndash162

Di Liberto G M OrsquoSullivan J A amp Lalor E C (2015) Low-frequency cortical entrainment to speech reflects phoneme-level processing Current Biology 25 2457ndash2465

Ding N Melloni L Zhang H Tian X amp Poeppel D (2016)Cortical tracking of hierarchical linguistic structures inconnected speech Nature Neuroscience 19 158ndash164

Ding N Pan X Luo C Su N Zhang W amp Zhang J (2018)Attention is required for knowledge-based sequential

164 Journal of Cognitive Neuroscience Volume 32 Number 1

grouping Insights from the integration of syllables intowords Journal of Neuroscience 38 1178ndash1188

Ding N amp Simon J Z (2012) Emergence of neural encodingof auditory objects while listening to competing speakersProceedings of the National Academy of Sciences USA109 11854ndash11859

Ding N amp Simon J Z (2014) Cortical entrainment tocontinuous speech Functional roles and interpretationsFrontiers in Human Neuroscience 8 311

Federmeier K D Wlotko E W De Ochoa-Dewald E ampKutas M (2007) Multiple effects of sentential constraint onword processing Brain Research 1146 75ndash84

Feldman H amp Friston K (2010) Attention uncertainty andfree-energy Frontiers in Human Neuroscience 4 215

Frank S L Otten L J Galli G amp Vigliocco G (2015) TheERP response to the amount of information conveyed bywords in sentences Brain and Language 140 1ndash11

Frank S L amp Willems R M (2017) Word predictability andsemantic similarity show distinct patterns of brain activityduring language comprehension Language Cognition andNeuroscience 32 1192ndash1203

Friederici A D (2002) Towards a neural basis of auditorysentence processing Trends in Cognitive Sciences 678ndash84

Friederici A D Pfeifer E amp Hahne A (1993) Event-relatedbrain potentials during natural speech processing Effects ofsemantic morphological and syntactic violations CognitiveBrain Research 1 183ndash192

Frisch R amp Waugh F V (1933) Partial time regressionsas compared with individual trends Econometrica 1387ndash401

Friston K (2010) The free-energy principle A unified braintheory Nature Reviews Neuroscience 11 127ndash138

Friston K amp Kiebel S (2009) Predictive coding under thefree-energy principle Philosophical Transactions of theRoyal Society of London Series B Biological Sciences364 121ndash1221

Giraud A-L amp Poeppel D (2012) Cortical oscillations andspeech processing Emerging computational principles andoperations Nature Neuroscience 15 511ndash517

Gorman K Howell J amp Wagner M (2011) Prosodylab-alignerA tool for forced alignment of laboratory speech Journal ofthe Canadian Acoustical Association 39 192ndash193

Graves A (2013) Generating sequences with recurrent neuralnetworks arXiv preprint arXiv13080850

Hagoort P amp Brown C M (2000) ERP effects of listening tospeech compared to reading The P600SPS to syntacticviolations in spoken sentences and rapid serial visualpresentation Neuropsychologia 38 1531ndash1549

Halgren E Dhond R P Christensen N Van Petten CMarinkovic K Lewine J D et al (2002) N400-likemagnetoencephalography responses modulated by semanticcontext word frequency and lexical class in sentencesNeuroimage 17 1101ndash1116

Heilbron M amp Chait M (2018) Great expectations Is thereevidence for predictive coding in auditory cortexNeuroscience 389 54ndash73

Helenius P Salmelin R Service E amp Connolly J F (1998)Distinct time courses of word and context comprehension inthe left temporal cortex Brain 121 1133ndash1142

Henderson J M Choi W Lowder M W amp Ferreira F(2016) Language structure in the brain A fixation-relatedfMRI study of syntactic surprisal in reading Neuroimage132 293ndash300

Humphries C Binder J R Medler D A amp Liebenthal E(2006) Syntactic and semantic modulation of neural activityduring auditory sentence comprehension Journal ofCognitive Neuroscience 18 665ndash679

Hyafil A Fontolan L Kabdebon C Gutkin B amp Giraud A-L(2015) Speech encoding by coupled cortical theta and gammaoscillations eLife 4 e06213

Kanai R Komura Y Shipp S amp Friston K (2015) Cerebralhierarchies Predictive processing precision and the pulvinarPhilophical Transancations of the Royal Society of LondonSeries B Biological Science 370 20140169

Keitel A Gross J amp Kayser C (2018) Perceptually relevantspeech tracking in auditory and motor cortex reflects distinctlinguistic features PLoS Biology 16 e2004473

Kielar A Meltzer J A Moreno S Alain C amp Bialystok E(2014) Oscillatory responses to semantic and syntacticviolations Journal of Cognitive Neuroscience 26 2840ndash2862

Klema V amp Laub A (1980) The singular value decompositionIts computation and some applications IEEE Transactionson Automatic Control 25 164ndash176

Koelsch S Vuust P amp Friston K (2018) Predictive processesand the peculiar case of music Trends in Cognitive Sciences23 63ndash77

Kumar T K (1975) Multicollinearity in regression analysisReview of Economics and Statistics 57 365ndash366

Kutas M amp Federmeier K D (2011) Thirty years andcounting Finding meaning in the N400 component of theevent-related brain potential (ERP) Annual Review ofPsychology 62 621ndash647

Kutas M amp Hillyard S A (1980) Reading senseless sentencesBrain potentials reflect semantic incongruity Science 207203ndash205

Kutas M amp Hillyard S A (1984) Brain potentials duringreading reflect word expectancy and semantic associationNature 307 161ndash163

Lakatos P Chen C M OrsquoConnell M N Mills A amp SchroederC E (2007) Neuronal oscillations and multisensoryinteraction in primary auditory cortex Neuron 53 279ndash292

Levy R (2008) Expectation-based syntactic comprehensionCognition 106 1126ndash1177

Lewis A G amp Bastiaansen M (2015) A predictive codingframework for rapid neural dynamics during sentence-levellanguage comprehension Cortex 68 155ndash168

Lovell M C (2008) A simple proof of the FWL theoremJournal of Economic Education 39 88ndash91

Maess B Herrmann C S Hahne A Nakamura A amp FriedericiA D (2006) Localizing the distributed language networkresponsible for the N400 measured by MEG during auditorysentence processing Brain Research 1096 163ndash172

Mahoney M (2011) About the test data Retrieved frommattmahoneynetdctextdatahtml

Mikolov T Kombrink S Burget L Černockyacute J amp Khudanpur S(2011) Extensions of recurrent neural network language modelPaper presented at the 2011 IEEE International Conference onAcoustics Speech and Signal Processing (ICASSP)

Miller G A Heise G A amp Lichten W (1951) Theintelligibility of speech as a function of the context of thetest materials Journal of Experimental Psychology 41329ndash335

Miller G A amp Isard S (1963) Some perceptual consequencesof linguistic rules Journal of Verbal Learning and VerbalBehavior 2 217ndash228

Molinaro N Barraza P amp Carreiras M (2013) Long-rangeneural synchronization supports fast and efficient readingEEG correlates of processing expected words in sentencesNeuroimage 72 120ndash132

Nieuwland M Barr D Bartolozzi F Busch-Moreno SDonaldson D Ferguson H J et al (2019) Dissociableeffects of prediction and integration during languagecomprehension Evidence from a large-scale study usingbrain potentials httpswwwbiorxivorgcontent101101267815v4

Weissbart Kandylaki and Reichenbach 165

Oostenveld R Fries P Maris E amp Schoffelen J-M (2011)FieldTrip Open source software for advanced analysis ofMEG EEG and invasive electrophysiological dataComputational Intelligence and Neuroscience 2011156869

Pascanu R Mikolov T amp Bengio Y (2013) On the difficultyof training recurrent neural networks Paper presented at the30th International Conference on International Conferenceon Machine Learning Atlanta GA

Patten W (1910) International short stories (Vol 2) AuroraIL PF Collier amp Son

Pennington J Socher R amp Manning C (2014) Glove Globalvectors for word representation Paper presented at theConference on Empirical Methods in Natural LanguageProcessing (EMNLP) Doha Qatar

Rommers J Dickson D S Norton J J Wlotko E W ampFedermeier K D (2017) Alpha and theta band dynamicsrelated to sentential constraint and word expectancyLanguage Cognition and Neuroscience 32 576ndash589

Roumlsler F Pechmann T Streb J Roumlder B amp HennighausenE (1998) Parsing of sentences in a language with varyingword order Word-by-word variations of processing demandsare revealed by event-related brain potentials Journal ofMemory and Language 38 150ndash176

Smith N J amp Levy R (2013) The effect of wordpredictability on reading time is logarithmic Cognition128 302ndash319

Steinhauer K amp Drury J E (2012) On the early left-anteriornegativity (ELAN) in syntax studies Brain and Language120 135ndash162

Tse C-Y Lee C-L Sullivan J Garnsey S M Dell G SFabiani M et al (2007) Imaging cortical dynamics oflanguage processing with the event-related optical signalProceedings of the National Academy of Sciences USA104 17157ndash17162

Van Den Brink D Brown C M amp Hagoort P (2001)Electrophysiological evidence for early contextual influencesduring spoken-word recognition N200 versus N400 effectsJournal of Cognitive Neuroscience 13 967ndash985

Van Petten C amp Luka B J (2006) Neural localization of semanticcontext effects in electromagnetic and hemodynamic studiesBrain and Language 97 279ndash293

Wang L Jensen O Van den Brink D Weder N SchoffelenJ M Magyari L et al (2012) Beta oscillations relate to theN400m during language comprehension Human BrainMapping 33 2898ndash2912

Wang L Zhu Z amp Bastiaansen M (2012) Integration orpredictability A further specification of the functional role ofgamma oscillations in language comprehension Frontiers inPsychology 3 187

Weiss S amp Mueller H M (2012) ldquoToo many betas do not spoilthe brothrdquo The role of beta brain oscillations in languageprocessing Frontiers in Psychology 3 201

Willems R M Frank S L Nijhof A D Hagoort P amp van denBosch A (2015) Prediction during natural languagecomprehension Cerebral Cortex 26 2506ndash2516

Zion Golumbic E M Ding N Bickel S Lakatos P SchevonC A McKhann G M et al (2013) Mechanisms underlyingselective neuronal tracking of attended speech at a ldquococktailpartyrdquo Neuron 77 980ndash991

166 Journal of Cognitive Neuroscience Volume 32 Number 1

This article has been cited by

1 Eleonora J Beier Suphasiree Chantavarin Gwendolyn Rehrig Fernanda Ferreira Lee M Miller 2021 Cortical Tracking ofSpeech Toward Collaboration between the Fields of Signal and Sentence Processing Journal of Cognitive Neuroscience 334574-593 [Abstract] [Full Text] [PDF] [PDF Plus]

2 Mikolaj Kegler Tobias Reichenbach 2021 Modelling the effects of transcranial alternating current stimulation on the neuralencoding of speech in noise NeuroImage 224 117427 [Crossref]

3 Christian Brodbeck Jonathan Z Simon 2020 Continuous speech processing Current Opinion in Physiology 18 25-31 [Crossref]4 Katerina D Kandylaki Sonja A Kotz 2020 Distinct cortical rhythms in speech and language processing and some more a

commentary on Meyer Sun amp Martin (2019) Language Cognition and Neuroscience 359 1124-1128 [Crossref]5 Mahmoud Keshavarzi Mikolaj Kegler Shabnam Kadir Tobias Reichenbach 2020 Transcranial alternating current stimulation

in the theta band but not in the delta band modulates the comprehension of naturalistic speech in noise NeuroImage 210 116557[Crossref]

Page 11: Cortical Tracking of Surprisal during Continuous Speech

grouping Insights from the integration of syllables intowords Journal of Neuroscience 38 1178ndash1188

Ding N amp Simon J Z (2012) Emergence of neural encodingof auditory objects while listening to competing speakersProceedings of the National Academy of Sciences USA109 11854ndash11859

Ding N amp Simon J Z (2014) Cortical entrainment tocontinuous speech Functional roles and interpretationsFrontiers in Human Neuroscience 8 311

Federmeier K D Wlotko E W De Ochoa-Dewald E ampKutas M (2007) Multiple effects of sentential constraint onword processing Brain Research 1146 75ndash84

Feldman H amp Friston K (2010) Attention uncertainty andfree-energy Frontiers in Human Neuroscience 4 215

Frank S L Otten L J Galli G amp Vigliocco G (2015) TheERP response to the amount of information conveyed bywords in sentences Brain and Language 140 1ndash11

Frank S L amp Willems R M (2017) Word predictability andsemantic similarity show distinct patterns of brain activityduring language comprehension Language Cognition andNeuroscience 32 1192ndash1203

Friederici A D (2002) Towards a neural basis of auditorysentence processing Trends in Cognitive Sciences 678ndash84

Friederici A D Pfeifer E amp Hahne A (1993) Event-relatedbrain potentials during natural speech processing Effects ofsemantic morphological and syntactic violations CognitiveBrain Research 1 183ndash192

Frisch R amp Waugh F V (1933) Partial time regressionsas compared with individual trends Econometrica 1387ndash401

Friston K (2010) The free-energy principle A unified braintheory Nature Reviews Neuroscience 11 127ndash138

Friston K amp Kiebel S (2009) Predictive coding under thefree-energy principle Philosophical Transactions of theRoyal Society of London Series B Biological Sciences364 121ndash1221

Giraud A-L amp Poeppel D (2012) Cortical oscillations andspeech processing Emerging computational principles andoperations Nature Neuroscience 15 511ndash517

Gorman K Howell J amp Wagner M (2011) Prosodylab-alignerA tool for forced alignment of laboratory speech Journal ofthe Canadian Acoustical Association 39 192ndash193

Graves A (2013) Generating sequences with recurrent neuralnetworks arXiv preprint arXiv13080850

Hagoort P amp Brown C M (2000) ERP effects of listening tospeech compared to reading The P600SPS to syntacticviolations in spoken sentences and rapid serial visualpresentation Neuropsychologia 38 1531ndash1549

Halgren E Dhond R P Christensen N Van Petten CMarinkovic K Lewine J D et al (2002) N400-likemagnetoencephalography responses modulated by semanticcontext word frequency and lexical class in sentencesNeuroimage 17 1101ndash1116

Heilbron M amp Chait M (2018) Great expectations Is thereevidence for predictive coding in auditory cortexNeuroscience 389 54ndash73

Helenius P Salmelin R Service E amp Connolly J F (1998)Distinct time courses of word and context comprehension inthe left temporal cortex Brain 121 1133ndash1142

Henderson J M Choi W Lowder M W amp Ferreira F(2016) Language structure in the brain A fixation-relatedfMRI study of syntactic surprisal in reading Neuroimage132 293ndash300

Humphries C Binder J R Medler D A amp Liebenthal E(2006) Syntactic and semantic modulation of neural activityduring auditory sentence comprehension Journal ofCognitive Neuroscience 18 665ndash679

Hyafil A Fontolan L Kabdebon C Gutkin B amp Giraud A-L(2015) Speech encoding by coupled cortical theta and gammaoscillations eLife 4 e06213

Kanai R Komura Y Shipp S amp Friston K (2015) Cerebralhierarchies Predictive processing precision and the pulvinarPhilophical Transancations of the Royal Society of LondonSeries B Biological Science 370 20140169

Keitel A Gross J amp Kayser C (2018) Perceptually relevantspeech tracking in auditory and motor cortex reflects distinctlinguistic features PLoS Biology 16 e2004473

Kielar A Meltzer J A Moreno S Alain C amp Bialystok E(2014) Oscillatory responses to semantic and syntacticviolations Journal of Cognitive Neuroscience 26 2840ndash2862

Klema V amp Laub A (1980) The singular value decompositionIts computation and some applications IEEE Transactionson Automatic Control 25 164ndash176

Koelsch S Vuust P amp Friston K (2018) Predictive processesand the peculiar case of music Trends in Cognitive Sciences23 63ndash77

Kumar T K (1975) Multicollinearity in regression analysisReview of Economics and Statistics 57 365ndash366

Kutas M amp Federmeier K D (2011) Thirty years andcounting Finding meaning in the N400 component of theevent-related brain potential (ERP) Annual Review ofPsychology 62 621ndash647

Kutas M amp Hillyard S A (1980) Reading senseless sentencesBrain potentials reflect semantic incongruity Science 207203ndash205

Kutas M amp Hillyard S A (1984) Brain potentials duringreading reflect word expectancy and semantic associationNature 307 161ndash163

Lakatos P Chen C M OrsquoConnell M N Mills A amp SchroederC E (2007) Neuronal oscillations and multisensoryinteraction in primary auditory cortex Neuron 53 279ndash292

Levy R (2008) Expectation-based syntactic comprehensionCognition 106 1126ndash1177

Lewis A G amp Bastiaansen M (2015) A predictive codingframework for rapid neural dynamics during sentence-levellanguage comprehension Cortex 68 155ndash168

Lovell M C (2008) A simple proof of the FWL theoremJournal of Economic Education 39 88ndash91

Maess B Herrmann C S Hahne A Nakamura A amp FriedericiA D (2006) Localizing the distributed language networkresponsible for the N400 measured by MEG during auditorysentence processing Brain Research 1096 163ndash172

Mahoney M (2011) About the test data Retrieved frommattmahoneynetdctextdatahtml

Mikolov T Kombrink S Burget L Černockyacute J amp Khudanpur S(2011) Extensions of recurrent neural network language modelPaper presented at the 2011 IEEE International Conference onAcoustics Speech and Signal Processing (ICASSP)

Miller G A Heise G A amp Lichten W (1951) Theintelligibility of speech as a function of the context of thetest materials Journal of Experimental Psychology 41329ndash335

Miller G A amp Isard S (1963) Some perceptual consequencesof linguistic rules Journal of Verbal Learning and VerbalBehavior 2 217ndash228

Molinaro N Barraza P amp Carreiras M (2013) Long-rangeneural synchronization supports fast and efficient readingEEG correlates of processing expected words in sentencesNeuroimage 72 120ndash132

Nieuwland M Barr D Bartolozzi F Busch-Moreno SDonaldson D Ferguson H J et al (2019) Dissociableeffects of prediction and integration during languagecomprehension Evidence from a large-scale study usingbrain potentials httpswwwbiorxivorgcontent101101267815v4

Weissbart Kandylaki and Reichenbach 165

Oostenveld R Fries P Maris E amp Schoffelen J-M (2011)FieldTrip Open source software for advanced analysis ofMEG EEG and invasive electrophysiological dataComputational Intelligence and Neuroscience 2011156869

Pascanu R Mikolov T amp Bengio Y (2013) On the difficultyof training recurrent neural networks Paper presented at the30th International Conference on International Conferenceon Machine Learning Atlanta GA

Patten W (1910) International short stories (Vol 2) AuroraIL PF Collier amp Son

Pennington J Socher R amp Manning C (2014) Glove Globalvectors for word representation Paper presented at theConference on Empirical Methods in Natural LanguageProcessing (EMNLP) Doha Qatar

Rommers J Dickson D S Norton J J Wlotko E W ampFedermeier K D (2017) Alpha and theta band dynamicsrelated to sentential constraint and word expectancyLanguage Cognition and Neuroscience 32 576ndash589

Roumlsler F Pechmann T Streb J Roumlder B amp HennighausenE (1998) Parsing of sentences in a language with varyingword order Word-by-word variations of processing demandsare revealed by event-related brain potentials Journal ofMemory and Language 38 150ndash176

Smith N J amp Levy R (2013) The effect of wordpredictability on reading time is logarithmic Cognition128 302ndash319

Steinhauer K amp Drury J E (2012) On the early left-anteriornegativity (ELAN) in syntax studies Brain and Language120 135ndash162

Tse C-Y Lee C-L Sullivan J Garnsey S M Dell G SFabiani M et al (2007) Imaging cortical dynamics oflanguage processing with the event-related optical signalProceedings of the National Academy of Sciences USA104 17157ndash17162

Van Den Brink D Brown C M amp Hagoort P (2001)Electrophysiological evidence for early contextual influencesduring spoken-word recognition N200 versus N400 effectsJournal of Cognitive Neuroscience 13 967ndash985

Van Petten C amp Luka B J (2006) Neural localization of semanticcontext effects in electromagnetic and hemodynamic studiesBrain and Language 97 279ndash293

Wang L Jensen O Van den Brink D Weder N SchoffelenJ M Magyari L et al (2012) Beta oscillations relate to theN400m during language comprehension Human BrainMapping 33 2898ndash2912

Wang L Zhu Z amp Bastiaansen M (2012) Integration orpredictability A further specification of the functional role ofgamma oscillations in language comprehension Frontiers inPsychology 3 187

Weiss S amp Mueller H M (2012) ldquoToo many betas do not spoilthe brothrdquo The role of beta brain oscillations in languageprocessing Frontiers in Psychology 3 201

Willems R M Frank S L Nijhof A D Hagoort P amp van denBosch A (2015) Prediction during natural languagecomprehension Cerebral Cortex 26 2506ndash2516

Zion Golumbic E M Ding N Bickel S Lakatos P SchevonC A McKhann G M et al (2013) Mechanisms underlyingselective neuronal tracking of attended speech at a ldquococktailpartyrdquo Neuron 77 980ndash991

166 Journal of Cognitive Neuroscience Volume 32 Number 1

This article has been cited by

1 Eleonora J Beier Suphasiree Chantavarin Gwendolyn Rehrig Fernanda Ferreira Lee M Miller 2021 Cortical Tracking ofSpeech Toward Collaboration between the Fields of Signal and Sentence Processing Journal of Cognitive Neuroscience 334574-593 [Abstract] [Full Text] [PDF] [PDF Plus]

2 Mikolaj Kegler Tobias Reichenbach 2021 Modelling the effects of transcranial alternating current stimulation on the neuralencoding of speech in noise NeuroImage 224 117427 [Crossref]

3 Christian Brodbeck Jonathan Z Simon 2020 Continuous speech processing Current Opinion in Physiology 18 25-31 [Crossref]4 Katerina D Kandylaki Sonja A Kotz 2020 Distinct cortical rhythms in speech and language processing and some more a

commentary on Meyer Sun amp Martin (2019) Language Cognition and Neuroscience 359 1124-1128 [Crossref]5 Mahmoud Keshavarzi Mikolaj Kegler Shabnam Kadir Tobias Reichenbach 2020 Transcranial alternating current stimulation

in the theta band but not in the delta band modulates the comprehension of naturalistic speech in noise NeuroImage 210 116557[Crossref]

Page 12: Cortical Tracking of Surprisal during Continuous Speech

Oostenveld R Fries P Maris E amp Schoffelen J-M (2011)FieldTrip Open source software for advanced analysis ofMEG EEG and invasive electrophysiological dataComputational Intelligence and Neuroscience 2011156869

Pascanu R Mikolov T amp Bengio Y (2013) On the difficultyof training recurrent neural networks Paper presented at the30th International Conference on International Conferenceon Machine Learning Atlanta GA

Patten W (1910) International short stories (Vol 2) AuroraIL PF Collier amp Son

Pennington J Socher R amp Manning C (2014) Glove Globalvectors for word representation Paper presented at theConference on Empirical Methods in Natural LanguageProcessing (EMNLP) Doha Qatar

Rommers J Dickson D S Norton J J Wlotko E W ampFedermeier K D (2017) Alpha and theta band dynamicsrelated to sentential constraint and word expectancyLanguage Cognition and Neuroscience 32 576ndash589

Roumlsler F Pechmann T Streb J Roumlder B amp HennighausenE (1998) Parsing of sentences in a language with varyingword order Word-by-word variations of processing demandsare revealed by event-related brain potentials Journal ofMemory and Language 38 150ndash176

Smith N J amp Levy R (2013) The effect of wordpredictability on reading time is logarithmic Cognition128 302ndash319

Steinhauer K amp Drury J E (2012) On the early left-anteriornegativity (ELAN) in syntax studies Brain and Language120 135ndash162

Tse C-Y Lee C-L Sullivan J Garnsey S M Dell G SFabiani M et al (2007) Imaging cortical dynamics oflanguage processing with the event-related optical signalProceedings of the National Academy of Sciences USA104 17157ndash17162

Van Den Brink D Brown C M amp Hagoort P (2001)Electrophysiological evidence for early contextual influencesduring spoken-word recognition N200 versus N400 effectsJournal of Cognitive Neuroscience 13 967ndash985

Van Petten C amp Luka B J (2006) Neural localization of semanticcontext effects in electromagnetic and hemodynamic studiesBrain and Language 97 279ndash293

Wang L Jensen O Van den Brink D Weder N SchoffelenJ M Magyari L et al (2012) Beta oscillations relate to theN400m during language comprehension Human BrainMapping 33 2898ndash2912

Wang L Zhu Z amp Bastiaansen M (2012) Integration orpredictability A further specification of the functional role ofgamma oscillations in language comprehension Frontiers inPsychology 3 187

Weiss S amp Mueller H M (2012) ldquoToo many betas do not spoilthe brothrdquo The role of beta brain oscillations in languageprocessing Frontiers in Psychology 3 201

Willems R M Frank S L Nijhof A D Hagoort P amp van denBosch A (2015) Prediction during natural languagecomprehension Cerebral Cortex 26 2506ndash2516

Zion Golumbic E M Ding N Bickel S Lakatos P SchevonC A McKhann G M et al (2013) Mechanisms underlyingselective neuronal tracking of attended speech at a ldquococktailpartyrdquo Neuron 77 980ndash991

166 Journal of Cognitive Neuroscience Volume 32 Number 1

This article has been cited by

1 Eleonora J Beier Suphasiree Chantavarin Gwendolyn Rehrig Fernanda Ferreira Lee M Miller 2021 Cortical Tracking ofSpeech Toward Collaboration between the Fields of Signal and Sentence Processing Journal of Cognitive Neuroscience 334574-593 [Abstract] [Full Text] [PDF] [PDF Plus]

2 Mikolaj Kegler Tobias Reichenbach 2021 Modelling the effects of transcranial alternating current stimulation on the neuralencoding of speech in noise NeuroImage 224 117427 [Crossref]

3 Christian Brodbeck Jonathan Z Simon 2020 Continuous speech processing Current Opinion in Physiology 18 25-31 [Crossref]4 Katerina D Kandylaki Sonja A Kotz 2020 Distinct cortical rhythms in speech and language processing and some more a

commentary on Meyer Sun amp Martin (2019) Language Cognition and Neuroscience 359 1124-1128 [Crossref]5 Mahmoud Keshavarzi Mikolaj Kegler Shabnam Kadir Tobias Reichenbach 2020 Transcranial alternating current stimulation

in the theta band but not in the delta band modulates the comprehension of naturalistic speech in noise NeuroImage 210 116557[Crossref]

Page 13: Cortical Tracking of Surprisal during Continuous Speech

This article has been cited by

1 Eleonora J Beier Suphasiree Chantavarin Gwendolyn Rehrig Fernanda Ferreira Lee M Miller 2021 Cortical Tracking ofSpeech Toward Collaboration between the Fields of Signal and Sentence Processing Journal of Cognitive Neuroscience 334574-593 [Abstract] [Full Text] [PDF] [PDF Plus]

2 Mikolaj Kegler Tobias Reichenbach 2021 Modelling the effects of transcranial alternating current stimulation on the neuralencoding of speech in noise NeuroImage 224 117427 [Crossref]

3 Christian Brodbeck Jonathan Z Simon 2020 Continuous speech processing Current Opinion in Physiology 18 25-31 [Crossref]4 Katerina D Kandylaki Sonja A Kotz 2020 Distinct cortical rhythms in speech and language processing and some more a

commentary on Meyer Sun amp Martin (2019) Language Cognition and Neuroscience 359 1124-1128 [Crossref]5 Mahmoud Keshavarzi Mikolaj Kegler Shabnam Kadir Tobias Reichenbach 2020 Transcranial alternating current stimulation

in the theta band but not in the delta band modulates the comprehension of naturalistic speech in noise NeuroImage 210 116557[Crossref]