Page 1:

Learning to distinguish cognitive subprocesses based on fMRI

Tom M. Mitchell
Center for Automated Learning and Discovery
Carnegie Mellon University

Collaborators: Luis Barrios, Rebecca Hutchinson, Marcel Just, Francisco Pereira, Jay Pujara, John Ramish, Indra Rustandi

Page 2:

Can we distinguish brief cognitive processes using fMRI?

Does the subject find the sentence ambiguous or not?

Page 3:

Can we classify/track multiple overlapping processes?

[Figure: trial timeline (Read sentence, View picture, Decide whether consistent) aligned with the observed fMRI signal and the observed button press.]

Page 4:

Mental Algebra Task

[Anderson, Qin, & Sohn, 2002]

[Figure: example equation-solving stimulus from the mental algebra task.]

Page 5:

[Anderson, Qin, & Sohn, 2002]

Activity Predicted by ACT-R Model

Typical ACT-R rule:

IF “_ op a = b”

THEN “ _ = <b <inv op> a>”

Page 6:

[Anderson, Qin, & Sohn, 2002]

Page 7:

Outline

• Training classifiers for short cognitive processes
  – Examples
  – Classifier learning algorithms
  – Feature selection
  – Training across multiple subjects

• Simultaneously classifying multiple overlapping processes
  – Linear model and classification
  – Hidden processes and EM

Page 8:

Training “Virtual Sensors” of Cognitive Processes

Train classifiers of the form: fMRI(t, t+δ) → CognitiveProcess (sketched below)

e.g., fMRI(t, t+δ) → {ReadSentence, ViewPicture}

• Fixed set of cognitive processes

• Fixed time interval [t, t+δ]
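As a rough illustration of this setup (my construction, not code from the talk), the sketch below assembles such training examples from a preprocessed scan: each example is the window fMRI(t, t+δ) flattened into one feature vector and paired with its cognitive-process label. The array shapes, onsets, and labels are hypothetical stand-ins.

```python
import numpy as np

# Hypothetical preprocessed scan: (n_timepoints, n_voxels).
fmri = np.random.randn(400, 5000)
trial_onsets = [20, 60, 100, 140]                 # image index where each trial starts
trial_labels = ["ReadSentence", "ViewPicture", "ViewPicture", "ReadSentence"]
window_len = 16                                   # e.g., 8 s of images at TR = 500 ms

# Each training example is the window fMRI(t, t+delta), flattened to voxels-x-time features.
X = np.stack([fmri[t:t + window_len].ravel() for t in trial_onsets])
y = np.array(trial_labels)
print(X.shape)                                    # (n_trials, window_len * n_voxels)
```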

Page 9:

Study 1: Pictures and Sentences

• Subject answers whether sentence describes picture by pressing button.

• 13 subjects, TR=500msec

[Trial timeline: at t=0, view picture (or read sentence) for 4 sec; at t=8 sec, read sentence (or view picture) and press the button; then fixation/rest.]

Data from [Keller et al., 2001]

Page 10:

It is not true that the star is above the plus.

Page 11:
Page 12:

[Picture stimulus: a plus sign above a star.]

Page 13:


Page 14:

• Learn fMRI(t, t+8) → {Picture, Sentence}, for t = 0, 8

[Trial timeline as above: picture (or sentence) at t=0, sentence (or picture) plus button press at t=8 sec, then rest; each 8-second window is classified as "picture or sentence?"]

Difficulties:

• only 8 seconds of very noisy data

• overlapping hemodynamic responses

• additional cognitive processes occurring simultaneously

Page 15:

Learning task formulation:

• Learn fMRI(t, …, t+8) → {Picture, Sentence} (a code sketch of this setup follows below)
  – 40 trials (40 pictures and 40 sentences)
  – fMRI(t, …, t+8) = voxels x time (~32,000 features)
  – Train a separate classifier for each of 13 subjects
  – Evaluate cross-validated prediction accuracy

• Learning algorithms:
  – Gaussian Naïve Bayes
  – Linear Support Vector Machine (SVM)
  – k-Nearest Neighbor
  – Artificial Neural Networks

• Feature selection/abstraction
  – Select a subset of voxels (by signal, by anatomy)
  – Select a subinterval of time
  – Summarize by averaging voxel activities over space, time
  – …
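A minimal sketch of this formulation using scikit-learn rather than the authors' code; the random X and y below are stand-ins for the flattened voxel-by-time windows (~32,000 features in the real data) and their Picture/Sentence labels, and the classifier settings are illustrative defaults.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import LinearSVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Stand-in for 40 picture + 40 sentence trials (real windows have ~32,000 features).
X = np.random.randn(80, 2000)
y = np.array(["Picture", "Sentence"] * 40)

for name, clf in [("GNB", GaussianNB()),
                  ("linear SVM", LinearSVC(C=1.0)),
                  ("5-NN", KNeighborsClassifier(n_neighbors=5))]:
    acc = cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()
    print(f"{name}: leave-one-out accuracy = {acc:.2f}")
```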

Page 16:

Learning a Gaussian Naïve Bayes (GNB) classifier for <f1, …, fn> → C

For each class value ci:

1. Estimate the class prior P(C = ci)

2. For each feature fj, estimate P(fj | C = ci), modeling the distribution for each ci, fj as a Gaussian N(μji, σji)

Applying the GNB classifier to a new instance <f1, …, fn>: choose the class ci that maximizes P(ci) ∏j P(fj | ci)

[Figure: naïve Bayes graphical model, with class node C and feature nodes f1, f2, …, fn as its children.]
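The estimation and classification steps above fit in a few lines. Here is a minimal from-scratch GNB sketch (not the authors' implementation); the small variance floor is my addition for numerical stability.

```python
import numpy as np

class GaussianNaiveBayes:
    """Minimal GNB: class priors P(c) plus one Gaussian P(f_j | c) per feature and class."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.priors_ = np.array([np.mean(y == c) for c in self.classes_])
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.vars_ = np.array([X[y == c].var(axis=0) + 1e-6 for c in self.classes_])
        return self

    def predict(self, X):
        # Score each class by log P(c) + sum_j log N(f_j; mu_jc, var_jc), take the argmax.
        scores = []
        for prior, mu, var in zip(self.priors_, self.means_, self.vars_):
            ll = -0.5 * (np.log(2 * np.pi * var) + (X - mu) ** 2 / var).sum(axis=1)
            scores.append(np.log(prior) + ll)
        return self.classes_[np.argmax(np.column_stack(scores), axis=1)]
```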

Page 17:

Support Vector Machines [Vapnik et al. 1992]

• Method for learning classifiers corresponding to a linear decision surface in a high-dimensional space

• Chooses the maximum-margin decision surface

• Useful in many high-dimensional domains
  – Text classification
  – Character recognition
  – Microarray analysis

Page 18:

Support Vector Machines (SVM)

Page 19:

Linear SVM

Page 20:
Page 21:

Non-linear Support Vector Machines

• Based on applying kernel functions to data points

  – Equivalent to projecting the data into a higher-dimensional space, then finding a linear decision surface

  – Select kernel complexity (H) to minimize 'structural risk':

    true error rate ≤ error on training data + variance term (which grows with kernel complexity H and shrinks with the number of training examples m)
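Structural risk minimization is rarely applied literally in practice; a common stand-in (my sketch, not the talk's procedure) is to choose the kernel and its complexity parameters by cross-validated error, which trades training error against capacity in a similar spirit.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=50, random_state=0)

# Cross-validated error stands in for the bound's trade-off between
# training error and the capacity (complexity) term.
grid = GridSearchCV(SVC(),
                    param_grid={"kernel": ["linear", "rbf"],
                                "C": [0.1, 1, 10],
                                "gamma": ["scale", 0.01]},
                    cv=5)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 2))
```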

Page 22:

Generative vs. Discriminative Classifiers

Goal: learn f: X → C, or equivalently P(C | X)

Discriminative classifier:

• Learn P(C | X) directly

Generative classifier:

• Learn P(X | C) and P(C)

• Classify using Bayes rule: P(C | X) ∝ P(X | C) P(C)

Page 23:

Generative vs. Discriminative Classifiers

• What they estimate: discriminative learns P(C | data); generative learns P(data | C)

• Examples: discriminative: SVMs, artificial neural nets; generative: Naïve Bayes, Bayesian networks

• Robustness to modeling errors: discriminative typically more robust; generative less robust

• Criterion for estimating parameters: discriminative minimizes classification error; generative maximizes data likelihood

Page 24:

GNB vs. Logistic regression [Ng, Jordan NIPS03]

Gaussian naïve Bayes

• Model P(X|C) as a class-conditional Gaussian

• Decision surface: hyperplane

• Learning converges in O(log(n)) examples, where n is number of data attributes

Logistic regression

• Model P(C|X) as a logistic function

• Decision surface: hyperplane

• Learning converges in O(n) examples

• Asymptotic error less than or equal to that of GNB
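One quick way to see the O(log n) vs. O(n) contrast is to compare the two classifiers as the number of training examples grows. The sketch below does this on synthetic data (my construction, with arbitrary dataset parameters): GNB typically approaches its asymptote with fewer examples, while logistic regression needs more data but can end up at a lower error.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=2000, n_features=100, n_informative=20,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

# Accuracy on a fixed test set as the training-set size m grows.
for m in [20, 50, 200, 1000]:
    gnb = GaussianNB().fit(X_tr[:m], y_tr[:m])
    lr = LogisticRegression(max_iter=2000).fit(X_tr[:m], y_tr[:m])
    print(m, round(gnb.score(X_te, y_te), 2), round(lr.score(X_te, y_te), 2))
```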

Page 25:

Accuracy of Trained Pict/Sent Classifier

• Results (leave-one-out cross-validation)
  – Guessing: 50% accuracy
  – SVM: 91% mean accuracy (single-subject accuracies ranged from 75% to 98%)
  – GNB: 84% mean accuracy
  – Feature selection step important for both
    • ~10,000 voxels x 16 time samples = 160,000 features
    • Selected only 240 voxels x 16 time samples

Page 26:

Can We Train Subject-Independent Classifiers?

Page 27:

Training Cross-Subject Classifiers for Picture/Sentence

• Approach 1: define "supervoxels" based on anatomically defined brain regions
  – Abstract to seven brain-region supervoxels
  – Each supervoxel spans hundreds to thousands of voxels

• Train on n-1 subjects, test on the nth subject

• Result: 75% prediction accuracy on subjects outside the training set (a sketch follows below)
  – Compared to 91% average single-subject accuracy
  – Significantly better than 50% guessing accuracy

[Wang, Hutchinson, Mitchell. NIPS03]
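A sketch of this cross-subject recipe under my own assumptions about the data layout (not the code from Wang et al.): each subject contributes windows of shape (trials, timepoints, voxels) plus a per-voxel anatomical ROI labeling; the seven ROIs are averaged into supervoxels so that all subjects share one feature space, and a classifier trained on n-1 subjects is tested on the held-out subject. All names here are hypothetical.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def to_supervoxels(windows, roi_labels, n_rois=7):
    """Average voxel activity within each anatomical ROI at every time point."""
    # windows: (n_trials, n_timepoints, n_voxels); roi_labels: (n_voxels,) in 0..n_rois-1
    rois = [windows[:, :, roi_labels == r].mean(axis=2) for r in range(n_rois)]
    return np.stack(rois, axis=2).reshape(len(windows), -1)

def leave_one_subject_out(subjects):
    """subjects: list of (windows, labels, roi_labels) triples, one per subject."""
    accs = []
    for i, (Xw, y, rois) in enumerate(subjects):
        pooled = [(to_supervoxels(Xw2, r2), y2)
                  for j, (Xw2, y2, r2) in enumerate(subjects) if j != i]
        X_tr = np.vstack([x for x, _ in pooled])
        y_tr = np.concatenate([lab for _, lab in pooled])
        clf = GaussianNB().fit(X_tr, y_tr)
        accs.append(clf.score(to_supervoxels(Xw, rois), y))
    return float(np.mean(accs))
```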

Page 28:

Study 2: Semantic Word Categories

Word categories:
• Fish
• Trees
• Vegetables
• Tools
• Dwellings
• Building parts

[Francisco Pereira]

Experimental setup:
• Block design
• Two blocks per category
• Each block begins by presenting the category name, then 20 words
• Subject indicates whether each word fits the category

Page 29:

Learning task formulation

• Learn fMRI(t, …, t+32) → WordCategory
  – fMRI(t, …, t+32) represented by the mean fMRI image
  – Train on presentation 1, test on presentation 2 (and vice versa)

• Learning algorithm:
  – 1-Nearest Neighbor, based on spatial correlation [after Haxby] (sketched below)

• Feature selection/abstraction
  – Select the most 'object-selective' voxels, based on multiple regression on boxcars convolved with a gamma function
  – 300 voxels in ventral temporal cortex produced the greatest accuracy
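A sketch of the 1-nearest-neighbor-by-spatial-correlation classifier described above (my code, following the slide's description rather than Haxby's or the authors' implementation): each category is represented by its mean image from one presentation, and a test mean image gets the label of the most correlated training image.

```python
import numpy as np

def correlation_1nn(train_imgs, train_labels, test_imgs):
    """1-NN on mean fMRI images, using spatial (Pearson) correlation as similarity."""
    def zscore(a):
        a = np.asarray(a, dtype=float)
        return (a - a.mean(axis=1, keepdims=True)) / a.std(axis=1, keepdims=True)
    tr, te = zscore(train_imgs), zscore(test_imgs)
    sim = te @ tr.T / tr.shape[1]          # test-by-train matrix of correlations
    return np.asarray(train_labels)[sim.argmax(axis=1)]

# e.g., train on the block-1 mean image per category, test on the block-2 means:
# predicted = correlation_1nn(block1_means, category_names, block2_means)
```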

Page 30:

Results predicting word semantic category

Mean pairwise prediction accuracy averaged over 8 subjects:

• Ventral temporal: 77% (low: 57%, high: 88%)
• Parietal: 70%
• Frontal: 67%

Random guess: 50%

Page 31:

Mean Activation per Voxel for Word Categories

[Figure: P(fMRI | WordCategory): mean activation per voxel for Tools, Dwellings, and Vegetables, one horizontal slice through ventral temporal cortex.]

[Pereira, et al., 2004]

Page 32:

[Figure: single-voxel classification accuracies for a Gaussian naïve Bayes classifier (yellow and red are most predictive), for Subject 1, Subject 2, and Subject 3. Images from the three subjects show similar regions with highly informative voxels.]

Page 33:

Single-voxel GNB classification error vs. p value from T-statistic

• N = 10^6, P < 0.0001, Error = 0.51
• N = 10^3, P < 0.0001, Error = 0.01

Cross-validated prediction error is an unbiased estimate of the Bayes-optimal error: the area under the intersection of the two class-conditional distributions.
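The contrast in these numbers can be reproduced with a small simulation (entirely my construction, with made-up effect sizes): a huge sample makes a negligible class difference 'highly significant' while cross-validated classification error stays at chance, and a modest sample with a real difference gives both a tiny p-value and a low error.

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

def p_value_and_cv_error(n, effect):
    a = rng.normal(0.0, 1.0, n)            # single-voxel activity, class A
    b = rng.normal(effect, 1.0, n)         # class B, mean shifted by `effect`
    p = ttest_ind(a, b).pvalue
    X = np.concatenate([a, b])[:, None]
    y = np.array([0] * n + [1] * n)
    err = 1 - cross_val_score(GaussianNB(), X, y, cv=10).mean()
    return p, round(err, 2)

print(p_value_and_cv_error(n=10**6, effect=0.006))   # tiny effect, huge N: p < 1e-4, error ~0.5
print(p_value_and_cv_error(n=10**3, effect=5.0))     # big effect, small N: p < 1e-4, error ~0.0
```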

Page 34:

Question:

Do different people’s brains ‘encode’ semantic categories

using the same spatial patterns?

No.

But, there are cross-subject regularities in “distances” between categories, as measured by classifier error rates.

Page 35:

Six-Category Study: Pairwise Classification Errors (ventral temporal cortex)

         Fish    Vegetables  Tools   Dwellings  Trees   Bldg Parts
Subj1    .20     .55 *       .20     .15        .15     .05 *
Subj2    .10 *   .55 *       .35     .20        .10 *   .30
Subj3    .20     .35 *       .15 *   .20        .20     .20
Subj4    .15     .45 *       .15     .15        .25     .05 *
Subj5    .60 *   .55         .25     .20        .15 *   .15 *
Subj6    .20     .25         .00 *   .30 *      .30 *   .05
Subj7    .15     .55 *       .15     .25        .15     .05 *
Mean     .23     .46         .18     .21        .19     .12

* marks a subject's worst or best category (the original slide distinguished worst from best, but that distinction was lost in extraction)

Page 36:

LDA classification of semantic categories of photographs.

[Carlson, et al., J. Cog. Neurosci, 2003]

Page 37:

Cox & Savoy, Neuroimage 2003

Trained SVM and LDA classifiers for semantic photo categories.

Classifiers applied to the same subject a week later were equally accurate.

Page 38:

Lessons Learned

Yes, one can train machine learning classifiers to distinguish a variety of cognitive processes:
– Comprehend picture vs. sentence
– Read ambiguous vs. unambiguous sentence
– Read noun vs. verb
– Read nouns about "tools" vs. "building parts"

Failures too:
– True vs. false sentences
– Negative vs. affirmative sentences

Page 39:

Which Machine Learning Method Works Best?

• GNB and SVM tend to outperform kNN
• Feature selection is important

[Figure: average per-subject classification error for each study, with ("Yes") and without ("No") feature selection.]

Page 40:

Which Feature Selection Works Best?

Wish to learn F: <x1, x2, …, xn> → {A, B}

• Conventional wisdom: pick the features xi that best distinguish between classes A and B
  – E.g., sort the xi by mutual information, choose the top n

• Surprise: an alternative strategy worked much better

Page 41:

The learning setting

[Figure: distributions of voxel activity for Class A, Class B, and Rest/Fixation, illustrating voxel discriminability between the two classes and between each class and fixation.]

Page 42:

GNB Classifier Errors: Feature Selection

GNB classification error, by feature selection method (rows) and fMRI study (columns):

                              Word        Nouns      Syntactic   Picture
                              Categories  vs. Verbs  Ambiguity   Sentence
  All features                .10         .36        .43         .29
  Discriminate target classes .10         .36        .34         .26
  Active                      .08         .34        .25         .16
  ROI Active                  .09         .31        .27         .18
  ROI Active Average          NA          .23        .27         .21

Page 43:

"Zero Signal" learning setting

Goal: learn f: X → Y, or P(Y | X)

Given:

1. Training examples <Xi, Yi> where Xi = Si + Ni, with signal Si ~ P(S | Y = Yi) and noise Ni ~ Pnoise

2. Observed noise with zero signal: Z = N0, N0 ~ Pnoise (e.g., fixation data)

[Figure: Class 1 observations X1 = S1 + N1, Class 2 observations X2 = S2 + N2, and zero-signal (fixation) observations Z = N0.]

Select features based on discrim(X1, X2) or discrim(Z, Xi)?

Page 44:

“Zero Signal” learning setting

Conjecture: feature selection using discrim(Z,Xi) will improve relative to discrim(X1,X2) (see the simulation sketch after this list) as:

• # of features increases

• # of training examples decreases

• signal/noise ratio decreases

• fraction of relevant features decreases
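A toy simulation of the conjecture (entirely my construction; the feature counts, signal strength, and fraction of relevant features are arbitrary): relevant voxels are 'active' relative to fixation but only weakly class-dependent, and with few training examples the activity-based ranking discrim(Z, X) tends to recover them more reliably than the between-class ranking discrim(X1, X2).

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(1)
n_feat, n_rel, n_train, n_test, snr = 2000, 40, 30, 400, 0.5

def make_data(n):
    y = rng.integers(0, 2, n)
    X = rng.normal(0.0, 1.0, (n, n_feat))                   # noise in every voxel
    X[:, :n_rel] += snr                                      # relevant voxels are "active"...
    X[:, :n_rel // 2] += snr * 0.5 * (2 * y - 1)[:, None]    # ...and half also discriminate
    return X, y

Xtr, ytr = make_data(n_train)
Xte, yte = make_data(n_test)
Z = rng.normal(0.0, 1.0, (200, n_feat))                      # fixation data: zero signal

k = 50
by_classes = np.argsort(-np.abs(Xtr[ytr == 0].mean(0) - Xtr[ytr == 1].mean(0)))[:k]
by_activity = np.argsort(-np.abs(Xtr.mean(0) - Z.mean(0)))[:k]

for name, idx in [("discrim(X1,X2)", by_classes), ("discrim(Z,X)", by_activity)]:
    acc = GaussianNB().fit(Xtr[:, idx], ytr).score(Xte[:, idx], yte)
    print(name, round(acc, 2))
```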

Page 45:

2. Can we classify/track multiple overlapping processes?

[Figure: input stimuli (?) and trial timeline (Read sentence, View picture, Decide whether consistent) aligned with the observed fMRI signal and the observed button press.]

Page 46:

Bayes-net-related state-space models: HMMs, DBNs, etc., e.g., [Ghahramani, 2001]

[Figure: dynamic Bayes net in which hidden cognitive subprocesses / state variables generate the observed fMRI.]

see [Hojen-Sorensen et al., NIPS99]

Page 47:

Hidden Process Model: each process is defined by:

– ProcessID: e.g., <comprehend sentence>
– Maximum HDR duration: R
– Emission distribution: [ W(v,t) ], the expected response at voxel v at time t after the process starts

Interpretation Z of the data: a set of process instances
– Desire the maximum-likelihood interpretation { <ProcessIDi, StartTimei> }
– where the data likelihood P(fMRI | Z) treats the observed signal as the sum of the active processes' responses W, each shifted to its instance's start time, plus Gaussian noise

Generative model for classifying overlapping hidden processes

[with Rebecca Hutchinson]
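A sketch of how an interpretation Z could be scored under such a model, in my own simplified notation (one response array of length R per process, i.i.d. Gaussian noise with a single sigma): the predicted signal is the sum of each instance's response W shifted to its start time, and the likelihood is Gaussian around that prediction.

```python
import numpy as np

def hpm_log_likelihood(Y, instances, W, sigma=1.0):
    """Y: observed data, shape (T, V).
    instances: interpretation Z as a list of (process_id, start_time) pairs.
    W: dict mapping process_id -> response array of shape (R, V).
    Returns log P(Y | Z) under an i.i.d. Gaussian noise model."""
    T, V = Y.shape
    pred = np.zeros((T, V))
    for pid, start in instances:
        resp = W[pid]
        end = min(T, start + len(resp))
        pred[start:end] += resp[: end - start]      # overlapping responses simply add
    resid = Y - pred
    return (-0.5 * np.sum(resid ** 2) / sigma ** 2
            - Y.size * np.log(sigma * np.sqrt(2.0 * np.pi)))
```

Classification (next slide) then amounts to evaluating this score for each candidate set of process labels, and, when start times are unknown, for each candidate start time in S, and taking the argmax.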

Page 48:

Classifying Processes with HPMs

Start times known: choose the process labels whose interpretation Z maximizes the data likelihood P(fMRI | Z)

Start times unknown: also maximize over a set of candidate start times S

Page 49:

GNB classifier is a special case of HPM classifier

[Figure: the same trial timeline (picture/sentence at t=0, sentence/picture and button press at t=8 sec, rest at 16 sec), with the two "picture or sentence?" windows shown once for the GNB formulation and once for the HPM formulation.]

Page 50:

Learning HPMs

• Known start times: least-squares regression, e.g., see Dale [Human Brain Mapping, 1999] (sketched below)

• Unknown start times: EM algorithm. Repeat:
  – E step: estimate P(S | Y, W) over the candidate start times S
  – M step: W' ← argmax_W E_{S ~ P(S | Y, W)} [ log P(Y, S | W) ]

• Currently the M step is implemented with gradient ascent
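For the known-start-times case, here is a standard deconvolution-style sketch (my code, in the spirit of Dale's least-squares approach rather than the authors' implementation): build a design matrix with one shifted indicator column per process and lag, then solve for the per-voxel responses by ordinary least squares.

```python
import numpy as np

def estimate_responses_ols(Y, onsets, R):
    """Least-squares estimates of per-process responses with known start times.
    Y: (T, V) data; onsets: dict process_id -> list of start indices; R: response length."""
    T = Y.shape[0]
    pids = sorted(onsets)
    X = np.zeros((T, len(pids) * R))
    for p, pid in enumerate(pids):
        for start in onsets[pid]:                   # one indicator per process and lag
            rows = np.arange(start, min(start + R, T))
            X[rows, p * R + (rows - start)] = 1.0
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)       # shape (n_processes * R, V)
    return {pid: W[p * R:(p + 1) * R] for p, pid in enumerate(pids)}
```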

Page 51:

OLS learns 2 processes, overlapping in time, 1 voxel, zero noise, start times known, 10 trials.

Estimated responses (essentially exact):
• process 1: -0, 0.25, 0.5, 0.75, 1, 0.75, 0.5, 0.25, 3.5108e-17
• process 2: -4.7535e-17, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5

[Figure: observed data, reconstructed data, learned process 1, learned process 2.]

[Indra Rustandi]

Page 52:

OLS learns 2 processes, overlapping in time, 1 voxel, noise 0.2, start times known, 10 trials.

Estimated responses:
• process 1: 0.0054956, 0.32446, 0.48847, 0.83317, 0.99872, 0.86555, 0.55624, 0.23633, -0.050592
• process 2: -0.017376, 0.36435, 0.36134, 0.4856, 0.60143, 0.46168, 0.54137, 0.47466, 0.52419

[Figure: observed data, reconstructed data, learned process 1, learned process 2.]

[Indra Rustandi]

Page 53:

Phase II, words every 3 seconds. Mean LFEF, subject 08179.

Estimate the Noun and Verb impulse responses.

[Figure: Verb impulse response estimated from the overlapping-stimuli data above, compared with the Verb impulse response "ground truth" estimated from non-overlapping stimuli.]

[Indra Rustandi]

Page 54:

Can we classify/track multiple overlapping processes?

[Figure: trial timeline (Read sentence, View picture, Decide whether consistent) aligned with the observed fMRI signal and the observed button press.]

Page 55:

Learned HPM with 3 processes (S, P, D), and R = 13 sec (TR = 500 msec).

[Figure: observed and reconstructed data for trials containing the S (sentence), P (picture), and hypothesized D (decide) processes, together with the learned response models for S, P, and D. The D start time is picked to be trialStart + 18.]

Page 56:

Initial results: HPMs on PictSent

• EM chooses start time = 18 for the hidden D process

• Classification accuracy on held-out PS/SP trials: 15/20 = 0.75

• Held-out classification accuracy is the same for the 2-process (P, S) and 3-process (P, S, D) models

• Data likelihood on held-out data is slightly better for the 3-process (P, S, D) model

Page 57:

Further reading

• Carlson, T., et al. J. Cognitive Neuroscience, 2003.

• Cox, D.D. and R.L. Savoy. Functional magnetic resonance imaging (fMRI) "brain reading": detecting and classifying distributed patterns of fMRI activity in human visual cortex. NeuroImage 19:261-270, 2003.

• Kjems, U., L. Hansen, J. Anderson, S. Frutiger, S. Muley, J. Sidtis, D. Rottenberg, and S.C. Strother. The quantitative evaluation of functional neuroimaging experiments: mutual information learning curves. NeuroImage 15:772-786, 2002.

• Mitchell, T.M., R. Hutchinson, M. Just, R.S. Niculescu, F. Pereira, and X. Wang. Classifying Instantaneous Cognitive States from fMRI Data. Proceedings of the 2003 American Medical Informatics Association Annual Symposium, Washington D.C., November 2003.

• Mitchell, T.M., R. Hutchinson, R.S. Niculescu, F. Pereira, X. Wang, M. Just, and S. Newman. Learning to Decode Cognitive States from Brain Images. Machine Learning, 2004.

• Strother, S.C., J. Anderson, L. Hansen, U. Kjems, R. Kustra, J. Sidtis, S. Frutiger, S. Muley, S. LaConte, and D. Rottenberg. The quantitative evaluation of functional neuroimaging experiments: The NPAIRS data analysis framework. NeuroImage 15:747-771, 2002.

• Wang, X., R. Hutchinson, and T.M. Mitchell. Training fMRI Classifiers to Detect Cognitive States across Multiple Human Subjects. Proceedings of the 2003 Conference on Neural Information Processing Systems, Vancouver, December 2003.