Decoding Speech with ECoG – Computational Challenges Chris Holdgraf Helen Wills Neuroscience Institute, UC Berkeley


  • Slide 1
  • Slide 2
  • Challenge in neuroscience Neuroscience is a very broad field. It covers everything from gene expression, to a single neuron firing, to activity across the whole brain in humans. As such, one must have a wide range of knowledge and a diverse set of techniques. This often makes it hard to have the best domain-specific knowledge.
  • Slide 3
  • Mapping the world onto the brain The trick is to fit some function that links brain activity with the outside world. However, we also want it to be a function that is scientifically meaningful.
  • Slide 4
  • Neuroscience/Psychology and computation Historically, there has been a focus on tightly- controlled experiments and simple questions. Advances in imaging and electrophysiological methods have increased the quality and quantity of data.
  • Slide 5
  • Electrocorticography: a blend of temporal and spatial resolution ECoG involves the application of electrodes directly to the surface of the brain. This avoids many of the problems with EEG, while retaining the rich temporal precision of the signal.
  • Slide 6
  • Complex and noisy data requires careful methods ECoG is only possible in patients with some sort of pathology. Moreover, recording time is short. Data-driven methods: bad data in = bad models out.
  • Slide 7
  • Merging ECoG and Computational Methods Might be possible to leverage the spatial precision of ECoG to decode the nature of this processing.
  • Slide 8
  • Challenge 1: GLMs in Neuroscience
  • Slide 9
  • Computational Challenge #1 How to fit a model that is both interpretable and a good fit for the electrode's response. The parameter space grows more complex as more hypotheses are considered. Oftentimes, this is paired with a limited dataset, especially in ECoG. Regularization and feature selection become very important.
  • Slide 10
  • Want it simple? Use a GLM! Linear models allow us to predict some output with a model that is both interpretable and (relatively) easy to fit.
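A minimal sketch of the GLM idea, using synthetic data and an identity link (all variable names and values here are illustrative):

```python
import numpy as np

# Hypothetical example: predict one electrode's response as a linear
# combination of stimulus features (a GLM with identity link).
rng = np.random.default_rng(0)

n_samples, n_features = 200, 5
X = rng.normal(size=(n_samples, n_features))     # stimulus features
true_w = np.array([1.5, -2.0, 0.0, 0.5, 3.0])    # "ground truth" weights
y = X @ true_w + rng.normal(scale=0.1, size=n_samples)  # noisy response

# Ordinary least squares: w = argmin ||Xw - y||^2
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# The fitted weights are directly interpretable: each tells us how
# strongly that stimulus feature drives the electrode's response.
print(np.round(w_hat, 2))
```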
  • Slide 11
  • One problem with this However, the brain assuredly does not vary linearly in response to inputs from the outside world.
  • Slide 12
  • Basis functions Instead, we can decompose an input stimulus as a combination of basis functions. Basically, this entails a non-linear transformation of your stimulus, so that fitting linear models to brain activity makes more sense.
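A sketch of the basis-function trick, assuming a 1-D stimulus and Gaussian bumps as the (illustrative) basis set: the model stays linear in the weights even though the fit is nonlinear in the raw stimulus.

```python
import numpy as np

rng = np.random.default_rng(1)

def gaussian_basis(s, centers, width=0.5):
    # Each column is one basis function evaluated at every stimulus value.
    return np.exp(-(s[:, None] - centers[None, :]) ** 2 / (2 * width ** 2))

s = rng.uniform(-3, 3, size=300)                   # raw stimulus values
y = np.sin(s) + rng.normal(scale=0.05, size=300)   # nonlinear "neural" response

centers = np.linspace(-3, 3, 10)
Phi = gaussian_basis(s, centers)                   # design matrix in basis space
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)        # linear fit in that space

pred = Phi @ w
rmse = np.sqrt(np.mean((pred - y) ** 2))
print("RMSE:", round(float(rmse), 3))
```

A plain linear fit on the raw stimulus could never capture the sinusoidal response; in the expanded space it reduces to ordinary least squares.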
  • Slide 13
  • Exploring the brain through basis functions (example stimulus categories: dog, hat, car, man)
  • Slide 14
  • Fitting weights with gradient descent We can find the values for these weights by following the typical least-squares regression approach. Early stopping must be tuned carefully in order to regularize. Variants: full gradient descent, coordinate gradient descent, threshold gradient descent.
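The full-gradient-descent variant with early stopping might look like the following sketch (synthetic data; the learning rate and patience are illustrative choices, not values from the talk):

```python
import numpy as np

rng = np.random.default_rng(2)

X = rng.normal(size=(300, 20))
true_w = np.zeros(20)
true_w[:3] = [2.0, -1.0, 0.5]                 # mostly irrelevant features
y = X @ true_w + rng.normal(scale=0.5, size=300)

X_tr, y_tr, X_val, y_val = X[:200], y[:200], X[200:], y[200:]

w = np.zeros(20)
lr = 1e-3
best_w, best_val = w.copy(), np.inf
patience, bad_steps = 20, 0

for step in range(5000):
    grad = X_tr.T @ (X_tr @ w - y_tr)         # gradient of 0.5 * ||Xw - y||^2
    w -= lr * grad
    val = np.mean((X_val @ w - y_val) ** 2)
    if val < best_val:
        best_val, best_w, bad_steps = val, w.copy(), 0
    else:
        bad_steps += 1
        if bad_steps >= patience:             # stop once validation error rises
            break

print("validation MSE:", round(float(best_val), 3))
```

Stopping on held-out error keeps the weights from drifting toward an overfit solution, which is the regularizing effect mentioned above.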
  • Slide 15
  • An application of the GLM for neural decoding
  • Slide 16
  • Neural Decoding If you can map stimuli onto brain activity, then you could also map brain activity onto stimuli. Same approach, but now our inputs are values from the electrodes, and the output is sound. Implications in Neural Prostheses and Brain Computer Interfaces Speech Decoding
  • Slide 17
  • Decoding with a linear model [Diagram: Reconstructed Spectrogram = Decoding Model × High Gamma Neural Signal, compared against the Original Spectrogram]
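A toy version of this decoding setup, with synthetic "high gamma" features and one linear map per spectrogram frequency bin (all dimensions and signals invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)

n_time, n_elec, n_freq = 500, 16, 8
neural = rng.normal(size=(n_time, n_elec))        # high-gamma features
mixing = rng.normal(size=(n_elec, n_freq))        # unknown "encoding"
spec = neural @ mixing + rng.normal(scale=0.1, size=(n_time, n_freq))

# Decoding model: spec ≈ neural @ W, solved by least squares per freq bin.
W, *_ = np.linalg.lstsq(neural, spec, rcond=None)
recon = neural @ W

# Correlation between original and reconstructed spectrogram, per freq bin.
corrs = [np.corrcoef(spec[:, f], recon[:, f])[0, 1] for f in range(n_freq)]
print("mean reconstruction correlation:", round(float(np.mean(corrs)), 3))
```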
  • Slide 18
  • Decoding Listened Speech from High Gamma (60–200 Hz); Pasley et al., PLoS Biology, 2012
  • Slide 19
  • Speech Reconstruction from ECoG
  • Slide 20
  • Challenge 2: From model output to language
  • Slide 21
  • Challenge #2 Turn a noisy, variable spectrogram reconstruction into linguistic output. Simpler methods are often not powerful enough to account for these small variations. How to take advantage of temporal correlations between words / phonemes? How to accomplish this without a ton of data?
  • Slide 22
  • How to classify this output? Example words: Town, Doubt, Property, Pencil
  • Slide 23
  • From model output to language Borrow ideas from the speech recognition literature. Currently using Dynamic Time Warping to match output spectrograms to words.
  • Slide 24
  • Dynamic Time Warping Compute a dissimilarity matrix between every pair of elements. Find the optimal path that minimizes the overall accumulated distance. This effectively warps and realigns the two signals.
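The three steps above can be sketched directly (a minimal textbook DTW on 1-D signals, not the talk's actual implementation):

```python
import numpy as np

def dtw_distance(a, b):
    # Step 1: dissimilarity matrix between every pair of elements.
    n, m = len(a), len(b)
    dist = np.abs(a[:, None] - b[None, :])
    # Step 2: accumulate cost; each cell extends its cheapest predecessor.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(acc[i - 1, j],
                                                 acc[i, j - 1],
                                                 acc[i - 1, j - 1])
    # Step 3: the corner holds the minimal accumulated distance.
    return acc[n, m]

# A signal scores far lower against a time-warped copy of itself than
# against an unrelated signal, even though neither matches sample-by-sample.
t = np.linspace(0, 2 * np.pi, 50)
sig = np.sin(t)
warped = np.sin(t ** 1.2 / (2 * np.pi) ** 0.2)   # same shape, warped in time
other = np.cos(3 * t)
print(dtw_distance(sig, warped) < dtw_distance(sig, other))
```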
  • Slide 25
  • Current output workflow: Reconstructed Spectrogram → DTW
  • Slide 26
  • Where to go from here?
  • Slide 27
  • Improving the decoder fit Clever methods for dealing with finite and noisy datasets. Finding better features (basis functions), including interactions between features. Fitting more complicated models: nonlinear models are useful for engineering, but require much more data.
  • Slide 28
  • Turning output into reconstructed language Leverage the spectro-temporal statistics of language. Focus on classification rather than arbitrary decoding. Example phonemes: /ch/ /ks/ /w/ /g/
  • Slide 29
  • The Big Data Angle Right now, the field of ECoG is in a bit of a transition period. There is excitement around using computational methods, but many labs (including my own) don't have the infrastructure and culture to tackle big data problems. That said, we do have the potential to collect increasingly large datasets, once we know what to do with them.
  • Slide 30
  • The Long-Term Goal Create a modeling framework that allows us to use ECoG to decode linguistic information.
  • Slide 31
  • Fellow Decoders Special thanks to Frederic Theunissen and co., Jack Gallant and co., the STRFLab Team, Stéphanie, Brian, Gerv, Eddie, Peter
  • Slide 32
  • Linguistic features for model output Hidden Markov Models allow us to model spectrogram output as a function of hidden states. They capture the probabilistic nature of spectrogram output for a given word, as well as the temporal correlations between components of that word. Example phonemes: /ch/ /ks/ /w/ /g/
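A minimal sketch of scoring a (discretized) observation sequence under a left-to-right HMM with the forward algorithm; the states, transition matrix, and emission probabilities are all invented for illustration:

```python
import numpy as np

def forward_log_likelihood(obs, start, trans, emit):
    # obs: sequence of observation-symbol indices.
    # alpha[i] = P(observations so far, hidden state = i)
    alpha = start * emit[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ trans) * emit[:, o]
    return np.log(alpha.sum())

# 3 hidden states (phoneme-like segments), 2 observation symbols.
start = np.array([1.0, 0.0, 0.0])
trans = np.array([[0.8, 0.2, 0.0],     # left-to-right transitions only
                  [0.0, 0.8, 0.2],
                  [0.0, 0.0, 1.0]])
emit = np.array([[0.9, 0.1],           # state 0 mostly emits symbol 0
                 [0.1, 0.9],           # state 1 mostly emits symbol 1
                 [0.9, 0.1]])          # state 2 mostly emits symbol 0

typical = [0, 0, 1, 1, 0, 0]           # matches the state progression
atypical = [1, 1, 0, 0, 1, 1]          # reversed pattern
ll_typ = forward_log_likelihood(typical, start, trans, emit)
ll_atyp = forward_log_likelihood(atypical, start, trans, emit)
print(ll_typ > ll_atyp)
```

The word whose HMM assigns the highest likelihood to the reconstructed spectrogram would be the classifier's output; the transition structure is what captures the temporal correlations mentioned above.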
  • Slide 33
  • Designing stimulus sets Data collection opportunities are very rare; we're lucky if we get 2 subjects per month. Need to be clever about how we design our behavioral tasks. Stimuli must be rich, and ideally could be used to answer many different questions. Need to think about what kind of stimuli we need in order to achieve the goal we prioritize, e.g., classification vs. regression.
  • Slide 34
  • The Big Data Angle We have access to some ECoG recordings of a patient simply sitting in a room with a microphone placed nearby. These are often >24 hours long, and include a wide range of sounds and speech. Being able to parse through this data might allow us to fit increasingly complicated models, and vastly improve the speech recognition approach.