Cosmo-not: a brief look at methods of analysis in functional MRI and in diffusion tensor imaging (DTI)
Paul Taylor
AIMS, UMDNJ
Cosmology seminar, Nov. 2012
Outline
• FMRI and DTI described (briefly)
• Granger Causality
• PCA
• ICA
– Individual, group, covariance networks
• Jackknifing/bootstrapping
The brain in brief (large scales)
• Has many parts, and is complex: blood vessels, neurons, aqueous tissue (GM, WM, CSF)
• Activity (examples): hydrodynamics, electrical impulses, chemical signalling
• How do different parts/areas work together?
A) Observe various parts acting together in unison during some activities (functional relation -> fMRI)
B) Follow structural connections, esp. due to WM tracts, which affect random motion in fluid/aqueous tissue (-> DTI, DSI, et al.)
Example: resting state networks (functional, GM)
GM ROIs in networks: spatially distinct regions working in concert (Biswal et al., 2010, PNAS)
Basic fMRI
• General topic of functional MRI:
– Segment the brain into ‘functional networks’ for various tasks
– Motor, auditory, vision, memory, executive control, etc.
– Quantify, track changes, compare populations (HC vs. disorder)
• Try to study which regions have ‘active’ neurons
– Modalities for measuring metabolism directly include PET scans
• With fMRI, use an indirect measure of blood oxygenation
MRI vs. fMRI
• MRI: one image; fMRI: a time series of images
• fMRI Blood Oxygenation Level Dependent (BOLD) signal: an indirect measure of neural activity
• neural activity ↑ -> blood flow ↑ -> oxyhemoglobin ↑ -> T2* ↑ -> MR signal ↑
(Figure: Mxy signal decay curves for task vs. control T2*; the difference ΔS = S_task - S_control is largest near the optimum TE.)
(Sources: FMRIB Brief Introduction to fMRI; Jorge Jovicich)
Basic fMRI
Step 1: Person is told to perform a task, maybe tapping fingers, in a time-varying pattern: e.g., alternating OFF/ON blocks of 30 s each.
Step 2: We measure a signal from each brain voxel over time (e.g., a slice of time series).
The signal is basically a local increase in oxygenation: the idea is that neurons which are active are hungrier, and demand an increase in food (oxygen).
Step 3: We compare brain output signals to the stimulus/input signal, looking for strong similarity (correlation).
First Functional Images
Source: Kwong et al., 1992
Step 4: Map out regions of significant correlation (yellow/red) and anti-correlation (blue), which we take to be involved in the specific task given (to some degree); these areas are then taken to be ‘functionally’ related networks.
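Steps 2-4 above can be sketched numerically. This is a toy illustration (the boxcar design, noise levels, and function name are invented for the example, not from the talk):

```python
import numpy as np

def correlation_map(data, stimulus):
    """Correlate each voxel's time series with the stimulus regressor.

    data:     (n_voxels, n_timepoints) array of voxel time series
    stimulus: (n_timepoints,) input signal, e.g. a 30 s ON/OFF boxcar
    Returns the Pearson r per voxel.
    """
    d = data - data.mean(axis=1, keepdims=True)
    s = stimulus - stimulus.mean()
    num = d @ s
    den = np.sqrt((d ** 2).sum(axis=1) * (s ** 2).sum())
    return num / den

# Toy data: 30 s OFF / 30 s ON blocks (1 s sampling), one responsive
# voxel and one pure-noise voxel
rng = np.random.default_rng(0)
boxcar = np.tile(np.r_[np.zeros(30), np.ones(30)], 3)
active = 2.0 * boxcar + rng.normal(0, 0.5, boxcar.size)
quiet = rng.normal(0, 0.5, boxcar.size)
r = correlation_map(np.vstack([active, quiet]), boxcar)
```

Thresholding `r` (after a significance test) would give the yellow/red map of Step 4.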
Basic fMRI
• Have several types of tasks:
– Again: motor, auditory, vision, memory, executive control, etc.
– Could investigate network by network…
• Or: it has been noticed that correlations among network ROIs exist even during rest
– Subset of functional MRI called resting state fMRI (rs-fMRI)
– First noticed by Biswal et al. (1995)
– Main rs-fMRI signals exist in the 0.01-0.1 Hz range
– Offers a way to study several networks at once
Basic rs-fMRI
e.g., Functional Connectome Project resting state networks (Biswal et al., 2010)
Granger Causality
• Issue to address: want to find relations between time series: does one affect another directly? Using time-lagged relations, can try to infer ‘causality’ (Granger 1969). (NB: be careful what one means by ‘causal’ here…)
• Model a measured time series x(t) as potentially autoregressive (first sum) and with time-lagged contributions of another time series y(t):
x(t) = c1 + Σ_{j=1..p} a_j x(t-j) + Σ_{j=1..p} b_j y(t-j) + u(t)
– u(t) are errors/‘noise’ terms, and c1 is the baseline
Granger Causality
• Calculation:
– The residual sum of squares from the restricted model (own lags only), RSS_r,
– is compared with that of the full model (own lags plus y’s lags), RSS_f,
– and put into an F-test:
F = [(RSS_r - RSS_f)/p] / [RSS_f/(T - 2p - 1)]
– (T = number of time points, p the lag order)
– Model order determined with the Akaike Info. Criterion or Bayesian Info. Criterion (AIC and BIC, respectively)
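The restricted-vs-full comparison can be sketched in a few lines. This is a toy illustration, not the fMRI pipeline; the function name and the simulated coupling are invented:

```python
import numpy as np

def granger_f(x, y, p=2):
    """F-statistic testing whether lags of y improve prediction of x."""
    T = len(x)
    n = T - p                                   # usable rows after lagging
    ones = np.ones(n)
    xlags = [x[p - k:T - k] for k in range(1, p + 1)]
    ylags = [y[p - k:T - k] for k in range(1, p + 1)]
    target = x[p:]
    Xr = np.column_stack([ones] + xlags)            # restricted: own lags
    Xf = np.column_stack([ones] + xlags + ylags)    # full: add y's lags
    rss = lambda X: np.sum(
        (target - X @ np.linalg.lstsq(X, target, rcond=None)[0]) ** 2)
    rss_r, rss_f = rss(Xr), rss(Xf)
    return ((rss_r - rss_f) / p) / (rss_f / (n - 2 * p - 1))

# Simulate: x is driven by past y, but y is pure noise
rng = np.random.default_rng(1)
T = 500
y = rng.normal(size=T)
x = np.zeros(T)
for t in range(1, T):
    x[t] = 0.5 * x[t - 1] + 0.8 * y[t - 1] + 0.3 * rng.normal()

f_yx = granger_f(x, y)   # y -> x: should be large
f_xy = granger_f(y, x)   # x -> y: should be near chance (F ~ 1)
```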
Granger Causality
• Results, for example, in directed graphs (Rypma et al. 2006)
PCA
• Principal Component Analysis (PCA): can treat an FMRI dataset (3 spatial + 1 time dimensions) as a 2D matrix (voxels x time).
– Then, want to decompose it into spatial maps (~functional networks) with associated time series
– Goal of finding components which explain the most of the variance of the dataset
– Essentially an ‘eigen’-problem; use SVD to find eigenmodes, with associated eigenvalues determining relative variance explained
PCA
• To calculate from a (centred) dataset M with N columns:
– Make the correlation matrix:
• C = M M^T / (N-1)
– Calculate eigenvectors E_i and eigenvalues λ_i from C; the i-th principal component is:
• PC_i = E_i [λ_i]^(1/2)
• For FMRI, this can yield a spatial/temporal decomposition of the dataset, with eigenvectors showing principal spatial maps (and associated time series), and the relative contribution of each component to total variance
PCA
• Graphic example: finding directions of maximum variance for 2 sources (example from web)
• (go to PCA reconstruction example in action from http://www.fil.ion.ucl.ac.uk/~wpenny/mbi/)
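The calculation above can be sketched via SVD (equivalent to the eigen-decomposition of C). A minimal sketch; the toy ‘voxels x time’ data are invented:

```python
import numpy as np

def pca(M):
    """PCA of a (voxels x time) matrix M via SVD.

    Columns of U are the spatial eigenmaps E_i; rows of Vt give the
    associated time courses; s**2/(N-1) are the eigenvalues lambda_i
    of C = Mc Mc^T / (N-1), i.e. the variances explained.
    """
    N = M.shape[1]
    Mc = M - M.mean(axis=1, keepdims=True)      # centre each voxel's series
    U, s, Vt = np.linalg.svd(Mc, full_matrices=False)
    var = s ** 2 / (N - 1)
    return U, var, Vt

# Toy data: 100 'voxels', 50 'time points', one dominant component
rng = np.random.default_rng(2)
t = np.sin(np.linspace(0, 6 * np.pi, 50))
spatial = rng.normal(size=100)
M = 5 * np.outer(spatial, t) + rng.normal(0, 0.5, (100, 50))
U, var, Vt = pca(M)
frac = var[0] / var.sum()    # fraction of variance in the first PC
```

Using SVD directly avoids forming the (possibly huge) voxels-by-voxels matrix C.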
ICA
• Independent component analysis (ICA) (McKeown et al. 1998; Calhoun et al. 2002) is a method for decomposing a ‘mixed’ MRI signal into separate, statistically independent components. (McKeown et al. 1998)
(NB: ICA ~ the well-known ‘blind source separation’ or ‘cocktail party’ problems)
ICA
• ICA in brief (for an excellent discussion, see Hyvarinen & Oja 2000):
– ICA is basically undoing the Central Limit Theorem
• CLT: a sum of independent random variables tends toward Gaussianity
• Therefore, to decompose the mixture, find components with maximal non-Gaussianity
– Several methods exist, essentially based on which function is powering the decomposition (i.e., by what quantity non-Gaussianity is measured): kurtosis, negentropy, pseudo-negentropy, mutual information, max. likelihood/infomax (the latter used by McKeown et al. 1998 in fMRI)
• NB: can’t determine the ‘energy’/variances or the order of the ICs, due to the ambiguity of the matrix decomposition (too much freedom to rescale columns or permute the matrix).
– i.e.: the relative importance/magnitude of components is not known.
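The cocktail-party idea can be sketched with scikit-learn’s FastICA (a fixed-point non-Gaussianity maximizer; this assumes scikit-learn is available, and the sources and mixing matrix below are invented for illustration):

```python
import numpy as np
from sklearn.decomposition import FastICA

# Two independent, non-Gaussian sources ...
rng = np.random.default_rng(3)
t = np.linspace(0, 8, 2000)
s1 = np.sign(np.sin(3 * t))          # square wave
s2 = ((2 * t) % 1) - 0.5             # sawtooth
S = np.c_[s1, s2]

# ... linearly mixed into two observed channels
A = np.array([[1.0, 0.5],
              [0.4, 1.0]])           # unknown mixing matrix
X = S @ A.T

# Recover the sources; note order and scale of the ICs are arbitrary,
# exactly the ambiguity mentioned above
ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)
```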
ICA
• Simple/standard representation of the matrix decomposition for ICA of an individual dataset:
(voxels x time) data = (voxels x #ICs) x (#ICs x time)
– The i-th column of the first factor is the spatial map (IC) of the i-th component; the i-th row of the second factor is the time series of the i-th component
• Have to choose the number of ICs, often based on ‘knowledge’ of the system, or on preliminary PCA (variance explained)
ICA
• Can do group ICA, with assumptions of some similarity across a group, to yield a ‘group level’ spatial map
– Very similar to individual spatial ICA, based on concatenating the datasets along time
– Schematically: (voxels x subjects-and-time) concatenated data (Subject 1, Subject 2, Subject 3, …) = (voxels x #ICs) group spatial maps x (#ICs x subjects-and-time) per-subject time series of each component
ICA
• Group ICA example (visual paradigm) (Calhoun et al. 2009)
ICA
(images: Calhoun et al. 2009)
• GLM decomposition (~correlation to a modelled/known time course)
vs. ICA decomposition (unknown components; ‘data driven’, assumption of independent sources)
• PCA decomposition (orthogonal directions of max variance; 2nd order statistics)
vs. ICA decomposition (directions of max independence; higher order statistics)
Dual Regression
• ICA is useful for finding an individual’s (independent) spatial/temporal maps, and also for the ICs which are represented across a group.
– Dual regression (Beckmann et al. 2009) is a method for taking a group IC and finding its associated, subject-specific IC.
Dual Regression
Steps:
• 1) ICA decomposition -> ‘group’ time courses and ‘group’ spatial maps, the independent components (ICs)
• 2) Use the group ICs as regressors per individual in a GLM -> the time series associated with each spatial map, for that subject
• 3) GLM regression with those time courses per individual -> each subject’s spatial map of that IC
(graphics from ~Beckmann et al. 2009)
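Stages 2 and 3 can be sketched as two least-squares regressions via pseudoinverse (a minimal sketch for one subject; the function name, shapes, and toy data are mine):

```python
import numpy as np

def dual_regression(Y, group_maps):
    """Two-stage dual regression for one subject (a sketch).

    Y:          (time x voxels) subject dataset
    group_maps: (n_ics x voxels) group-level spatial ICs
    Stage 1 regresses the group maps onto the data -> subject time courses.
    Stage 2 regresses those time courses back      -> subject spatial maps.
    """
    T = Y @ np.linalg.pinv(group_maps)   # stage 1: (time x n_ics)
    S = np.linalg.pinv(T) @ Y            # stage 2: (n_ics x voxels)
    return T, S

# Toy check: build data from known maps/time courses, then recover them
rng = np.random.default_rng(4)
n_t, n_v, n_ic = 120, 300, 3
true_T = rng.normal(size=(n_t, n_ic))
true_S = rng.normal(size=(n_ic, n_v))
Y = true_T @ true_S + 0.1 * rng.normal(size=(n_t, n_v))
Tc, S = dual_regression(Y, true_S)   # use the true maps as 'group' maps
```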
Covariance networks (in brief)
• Group level analysis tool
• Take a single property across the whole brain
– That property has different values across the brain (per subject) and across subjects (per voxel)
• Find voxels/regions (-> network) in which that property changes similarly (-> covariance) as one goes from subject to subject (-> subject series)
(Figure: ICA for BOLD series and FCNs: standard BOLD analysis vs. subject series analysis)
• Networks reflect shared information or a single influence at a basic/organizational level (discussed further below).
Covariance networks (in brief)
• Can be used with many different parameters, e.g.:
– Mechelli et al. (2005): GMV
– He et al. (2007): cortical thickness
– Xu et al. (2009): GMV
– Zielinski et al. (2010): GMV
– Bergfield et al. (2010): GMV
– Zhang et al. (2011): ALFF
– Taylor et al. (2012): ALFF, fALFF, H, rs-fMRI mean and std, GMV
– Di et al. (2012): FDG-PET
Analysis: making subject series
• A) Start with a group of M subjects (for example, an fMRI dataset)
• B) Calculate a voxelwise parameter, P, producing a 3D dataset per subject
• C) Concatenate the 3D datasets of the whole group (in MNI space) to form a 4D ‘subject series’
– Analogous to a standard ‘time series’, but now each voxel has M values of P
– Instead of the i-th ‘time point’, now have the i-th subject
• NB: for all analyses, the order of subjects is arbitrary and has no effect
• Can perform the usual ‘time series’ analyses (correlation, ICA, etc.) on subject series
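The subject-series idea can be sketched with a seed-correlation toy example (numpy; the group size, ‘voxels’, and shared-influence structure below are invented):

```python
import numpy as np

def seed_covariance_map(subject_series, seed):
    """Correlate a seed voxel's subject series with every voxel's.

    subject_series: (n_subjects, n_voxels) -- one value of the parameter P
                    per subject per voxel (the 4D set flattened over space)
    seed:           index of the seed voxel
    Returns the Pearson r of each voxel's series with the seed's.
    """
    Z = subject_series - subject_series.mean(axis=0)
    Z /= Z.std(axis=0)                       # z-score each voxel's series
    return (Z * Z[:, [seed]]).mean(axis=0)   # mean of z-products = r

# Toy group: 40 subjects, 6 voxels; voxels 0-2 share a common influence
rng = np.random.default_rng(5)
common = rng.normal(size=(40, 1))
P = rng.normal(0, 0.3, size=(40, 6))
P[:, :3] += common                  # 'network' voxels track a shared factor
r = seed_covariance_map(P, seed=0)  # high r for voxels 1-2, low for 3-5
```

Note that permuting the subjects permutes the rows of `subject_series` identically everywhere, so `r` is unchanged, matching the NB above.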
Interpreting subject series covariance
Ex.: Consider 3 ROIs (X, Y and Z) in subjects with GMV data. Say the values of ROIs X and Y correlate strongly across subjects, but neither correlates with Z.
--> X and Y form a ‘GMV covariance network’
Then, knowing the X-values and one Y-value (since X and Y can have different baselines/scales) can lead us to informed guesses about the remaining Y-values, but nothing can be said about the Z-values.
-> ROIs X and Y carry information about each other even across different subjects, while having little/none about Z.
-> X and Y must have some mutual/common influence, which Z may not.
Interpreting covariance networks
• Analyzing: similarity of brain structure across subjects.
• Null hypothesis: local brain structure is due to local control, (mainly) independent of other regions.
– -> would observe little/no correlation of ‘subject series’ non-locally
• Alt. hypothesis: can have (one or many) extended/multi-region influences controlling localities as a general feature
– -> can observe consistent patterns of properties as correlation of subject series ‘non-locally’
– -> the observed network and property are closely related
– -> one network would have one organizing influence across itself
– [-> perhaps independent networks with separate influences might have low/no correlation; related networks perhaps have some correlation]
Switching gears…
• Statistical resampling: methods for estimating confidence intervals for estimates
• Several kinds; two common ones in fMRI are jackknifing and bootstrapping (see, e.g., Efron 1982).
• Can use with fMRI, and also with DTI (e.g., for noisy ellipsoid estimates: confidence in fit parameters)
Jackknifing
• Basically, take M acquisitions (e.g., M = 12)
• Randomly select MJ < M of them (e.g., MJ = 9) to use to calculate the quantity of interest
– standard nonlinear fits: [D11 D22 D33 D12 D13 D23] (the ellipsoid is defined by the 6 parameters of a quadratic surface)
• Repeatedly subsample a large number of times (~10^3-10^4)
• Analyze the distribution of values for the estimator (mean) and confidence interval
– sort/percentiles (not so efficient)
– if Gaussian, e.g., µ ± 2σ (simple)
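A sketch of this subsampling scheme for a generic estimator (toy 1D data; in the real case the estimator would be the nonlinear fit of the 6 tensor parameters per subsample; names are mine):

```python
import numpy as np

def jackknife_ci(data, estimator, m_sub, n_rep=2000, seed=0):
    """Delete-d style jackknife: estimate from many random subsamples.

    data:      1D array of M acquisitions
    estimator: function mapping a sample to a scalar (e.g. np.mean)
    m_sub:     subsample size MJ < M
    Returns (mean of estimates, (2.5th, 97.5th) percentiles).
    """
    rng = np.random.default_rng(seed)
    M = len(data)
    est = np.array([estimator(data[rng.choice(M, m_sub, replace=False)])
                    for _ in range(n_rep)])
    lo, hi = np.percentile(est, [2.5, 97.5])
    return est.mean(), (lo, hi)

# Toy: 12 noisy 'acquisitions' of a quantity whose true value is 5
rng = np.random.default_rng(6)
data = 5 + rng.normal(0, 1, 12)
mean_est, (lo, hi) = jackknife_ci(data, np.mean, m_sub=9)
```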
Jackknifing
• Example results, M = 32 gradients: distributions quite Gaussian; Gaussianity and σ increase with decreasing MJ; µ changes little
• M = 12 gradients: not too bad even with smaller M; but could use min/max from the distributions for percentiles (don’t need to sort)
Bootstrapping
• Similar principle to jackknifing, but needs multiple copies of the dataset (e.g., four complete sets A, B, C, D, each with M = 12).
Bootstrapping
• Make an estimate from 12 measures, but with each measure randomly selected from one of the sets (A, B, C, D).
Bootstrapping
• Then select another random (complete) set, build a distribution, etc.
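A sketch of this multi-set scheme (toy data; in the real case the estimator would be the nonlinear tensor fit, and the names here are mine):

```python
import numpy as np

def bootstrap_from_repeats(repeats, estimator, n_rep=2000, seed=0):
    """Bootstrap over repeated acquisition sets.

    repeats: (n_sets, M) array -- e.g. 4 complete sets (A-D) of M = 12
             measures. Each bootstrap sample keeps all M measures but
             randomly picks which set each one is drawn from; the
             estimator is applied to every such sample.
    Returns (mean of estimates, (2.5th, 97.5th) percentiles).
    """
    rng = np.random.default_rng(seed)
    n_sets, M = repeats.shape
    est = np.array([
        estimator(repeats[rng.integers(n_sets, size=M), np.arange(M)])
        for _ in range(n_rep)])
    return est.mean(), tuple(np.percentile(est, [2.5, 97.5]))

# Toy: 4 noisy repeats of 12 measures around a true value of 5
rng = np.random.default_rng(7)
repeats = 5 + rng.normal(0, 1, (4, 12))
mean_est, (lo, hi) = bootstrap_from_repeats(repeats, np.mean)
```

Unlike the jackknife subsample, each bootstrap sample here is a complete set of M measures, so the fit never sees a reduced design.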
Summary
• There is a wide array of methods applicable to MRI analysis
– Many of them involve statistics and are therefore not always believable at face value.
– The applicability of the assumptions of the underlying mathematics to the real situation is always key.
– Often, in MRI, we are concerned with a ‘network’ view of regions working together to do certain tasks.
• Therefore, we are interested in grouping regions together per task (as with PCA/ICA)
– New approaches are now starting to look at the temporal variance of networks (using, e.g., sliding window or wavelet decompositions).
– Methods of preprocessing (noise filtering, motion correction, MRI-field imperfections) should also be considered as part of the methodology.