Talk given at Gipsa-lab on using machine learning to learn, from fMRI, brain patterns and regions related to behavior. The talk focuses on the signal and inverse-problem aspects of the question, as well as on the software.
Brain reading: Compressive sensing, fMRI, and statistical learning in Python
Gael Varoquaux
INRIA/Parietal
1 Brain reading: predictive models
2 Sparse recovery with correlated designs
3 Having an impact: software
G Varoquaux 2
1 Brain reading: predictive models
Functional brain imaging: study of human cognition
1 Brain imaging
fMRI data: > 50 000 voxels, stimuli

Standard analysis
Detect voxels that correlate to the stimuli
1 Brain reading
Predicting the object category viewed [Haxby 2001, Distributed and Overlapping Representations of Faces and Objects in Ventral Temporal Cortex]

Supervised learning task
Find combinations of voxels to predict the stimuli
Multi-variate statistics
1 Linear model for fMRI
sign(X w + e) = y
Design matrix × coefficients = target

Problem size: p > 50 000, n ∼ 100 per category
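This forward model can be simulated in a few lines. A hypothetical toy sketch: the dimensions are scaled down from the real p > 50 000, and the sparse weight vector is an assumption for illustration.

```python
import numpy as np

rng = np.random.RandomState(0)
n, p = 100, 500              # ~100 samples per category; real fMRI has p > 50 000
X = rng.randn(n, p)          # design matrix: one row per scan, one column per voxel
w = np.zeros(p)
w[:10] = 1.0                 # only a few voxels carry signal
e = 0.1 * rng.randn(n)       # acquisition noise
y = np.sign(X @ w + e)       # binary target, as in sign(X w + e) = y
```

With p far larger than n, recovering w from (X, y) is exactly the ill-posed inverse problem of the next slides.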
1 Estimation: statistical learning
Inverse problem: minimize an error term:
ŵ = argmin_w ℓ(y − X w)
Ill-posed: X is not full rank
Inject a prior: regularize
ŵ = argmin_w ℓ(y − X w) + p(w)
Example: Lasso = sparse regression
ŵ = argmin_w ‖y − X w‖₂² + ℓ1(w),  with ℓ1(w) = Σ_i |w_i|
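The Lasso above runs in a few lines with scikit-learn. A minimal sketch on synthetic data; alpha = 0.1 is an arbitrary choice, not a tuned value:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
n, p = 100, 500
X = rng.randn(n, p)
w_true = np.zeros(p)
w_true[:5] = 2.0                          # sparse ground truth
y = X @ w_true + 0.1 * rng.randn(n)

# scikit-learn's scaling: argmin_w ||y - X w||_2^2 / (2 n) + alpha * ||w||_1
lasso = Lasso(alpha=0.1).fit(X, y)
n_nonzero = int(np.sum(lasso.coef_ != 0))
```

The ℓ1 penalty sets most coefficients exactly to zero, which is what makes the estimate sparse.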
1 TV-penalization to promote regions
Neuroscientists think in terms of brain regions [Haxby 2001]

Total-variation penalization
Impose sparsity on the gradient of the image:
p(w) = ℓ1(∇w)
[Michel TMI 2011]
1 Prediction with logistic regression - TV
ŵ = argmin_w ℓ(y − X w) + p(w)
ℓ: least-squares or logistic loss; p: TV

Optimization: proximal gradient (FISTA)
- gradient descent on ℓ (smooth term)
- projections on TV

Prediction performance (explained variance):
Feature screening + SVC: 0.77
Sparse regression: 0.78
Total variation: 0.84
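The FISTA scheme can be sketched as follows. This is a hypothetical minimal implementation for the ℓ1 penalty, whose proximal operator is simple soft-thresholding; for TV the proximal step has no closed form and needs an inner solver, which is omitted here:

```python
import numpy as np

def soft_threshold(v, t):
    # proximal operator of t * ||.||_1 (a TV prox would replace this step)
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fista(X, y, alpha, n_iter=200):
    # minimize 0.5 * ||y - X w||^2 + alpha * ||w||_1 by accelerated proximal gradient
    L = np.linalg.norm(X, 2) ** 2            # Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    z, t = w.copy(), 1.0
    for _ in range(n_iter):
        grad = X.T @ (X @ z - y)             # gradient of the smooth term at z
        w_new = soft_threshold(z - grad / L, alpha / L)
        t_new = (1 + np.sqrt(1 + 4 * t ** 2)) / 2
        z = w_new + (t - 1) / t_new * (w_new - w)   # momentum step
        w, t = w_new, t_new
    return w

rng = np.random.RandomState(0)
X = rng.randn(50, 20)
w_true = np.zeros(20)
w_true[:3] = 1.0
w_hat = fista(X, y=X @ w_true, alpha=0.1)
```

The accelerated momentum step is what distinguishes FISTA from plain ISTA and gives the O(1/k²) convergence rate.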
1 Standard analysis or predictive modeling?
Predicting the object category viewed [Haxby 2001, Distributed and Overlapping Representations of Faces and Objects in Ventral Temporal Cortex]

Take-home message: brain regions, not prediction
1 Standard analysis or predictive modeling?
Recovery rather than prediction
1 Good prediction ≠ good recovery
Simulations with known ground truth:

Lasso: prediction 0.78, recovery 0.429
SVM: prediction 0.71, recovery 0.486

Need a method suited for recovery
1 Brain mapping: a statistical perspective
[figure: design matrix × coefficients = target]
Small-sample linear model estimation, random correlated design
Problem size: p > 50 000, n ∼ 100 per category
1 Brain mapping: a statistical perspective
Estimation strategy
Standard approach: univariate statistics
Multiple-comparisons problem ⇒ statistical power ∝ 1/p
We want sub-linear sample complexity
⇒ non-rotationally-invariant estimators, e.g. ℓ1 penalization
[Ng 2004, Feature selection, ℓ1 vs. ℓ2 regularization, and rotational invariance]
1 Brain mapping as a sparse recovery task
Recovering k non-zero coefficients (brain regions)
n_min ∼ 2 k log p
Restricted-isometry-like property: the design matrix is well-conditioned on sub-matrices of size > k [Candes 2006, Tropp 2004]
Mutual incoherence [Wainwright 2009]: relevant features S and irrelevant ones S̄ are not too correlated
Violated by spatial correlations in our design
1 Brain mapping as a sparse recovery task
lasso: 23 non-zeros
1 Randomized sparsity [Meinshausen and Buhlmann 2010, Bach 2008]
Perturb the design matrix:
- subsample the data
- randomly rescale the features
+ run a sparse estimator
Keep features that are often selected
⇒ Good recovery without mutual incoherence, but an RIP-like condition remains
Cannot recover large correlated groups: for m correlated features, the selection frequency is divided by m
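The randomized-sparsity loop can be sketched as follows. A hypothetical minimal version on synthetic data: subsampling fractions, rescaling range, and alpha are illustrative choices, not the tuned values of [Meinshausen and Buhlmann 2010]:

```python
import numpy as np
from sklearn.linear_model import Lasso

def stability_scores(X, y, alpha=0.1, n_runs=50, seed=0):
    # Randomized sparsity: subsample the data, randomly rescale the features,
    # run a sparse estimator, and count how often each feature is selected.
    rng = np.random.RandomState(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_runs):
        rows = rng.choice(n, n // 2, replace=False)     # subsample the data
        scale = rng.uniform(0.5, 1.0, size=p)           # random feature rescaling
        coef = Lasso(alpha=alpha).fit(X[rows] * scale, y[rows]).coef_
        counts += coef != 0
    return counts / n_runs                              # selection frequencies

rng = np.random.RandomState(1)
X = rng.randn(200, 50)
w_true = np.zeros(50)
w_true[:3] = 2.0
y = X @ w_true + 0.5 * rng.randn(200)
scores = stability_scores(X, y)
```

Thresholding the selection frequencies then gives the recovered support.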
2 Sparse recovery with correlated designs
Not enough samples: n_min ∼ 2 k log p
Spatial correlations
2 Sparse recovery with correlated designs
Combining clustering and sparsity [Varoquaux ICML 2012]
2 Brain parcellations
Spatially-connected hierarchical clustering ⇒ reduces voxel numbers [Michel Pat Rec 2011]
Replace features by the corresponding cluster average
+ Use a supervised learner on the reduced problem
Cluster choice is sub-optimal for regression
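In scikit-learn this parcellation step is feature agglomeration. A sketch on random data; real usage would pass a spatial connectivity matrix so the clusters form spatially connected parcels, which is omitted here:

```python
import numpy as np
from sklearn.cluster import FeatureAgglomeration

rng = np.random.RandomState(0)
X = rng.randn(100, 500)                  # toy data: 100 scans, 500 "voxels"

# Ward clustering of the features; each cluster is replaced by its average,
# reducing 500 voxels to 50 parcel signals.
agglo = FeatureAgglomeration(n_clusters=50, linkage="ward")
X_reduced = agglo.fit_transform(X)
```

Any supervised learner can then be fit on `X_reduced` instead of the full voxel matrix.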
2 Brain parcellations + sparsity
Hypothesis: clustering is compatible with support(w)
Benefits of clustering:
- Reduced k and p ⇒ n > n_min: the good side of the “sharp threshold”
- Correlated features are clustered together ⇒ improves RIP-like conditions
Recovery possible on reduced features
2 Randomized parcellations + sparsity
Randomization + stability scores
Marginalize the cluster choice
Relaxes the mutual-incoherence requirement
2 Algorithm
1. set the number of clusters and the sparsity by cross-validation
2. loop: randomly perturb the data
3. cluster to form reduced features
4. fit a sparse linear model on the reduced features
5. accumulate the non-zero features
6. threshold the map of selection counts
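Steps 2-6 above can be sketched as follows. A hypothetical minimal version: the real method uses spatially-constrained, randomized clustering and sets n_clusters and alpha by cross-validation (step 1), all simplified away here:

```python
import numpy as np
from sklearn.cluster import FeatureAgglomeration
from sklearn.linear_model import Lasso

def randomized_parcel_scores(X, y, n_clusters=25, alpha=0.1, n_runs=30, seed=0):
    rng = np.random.RandomState(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_runs):
        rows = rng.choice(n, n // 2, replace=False)           # 2: perturb the data
        agglo = FeatureAgglomeration(n_clusters=n_clusters)
        X_red = agglo.fit_transform(X[rows])                  # 3: reduced features
        coef = Lasso(alpha=alpha).fit(X_red, y[rows]).coef_   # 4: sparse model
        counts += coef[agglo.labels_] != 0                    # 5: map clusters back
    return counts / n_runs            # 6: threshold this map of selection counts

rng = np.random.RandomState(2)
latent = rng.randn(200)
X = rng.randn(200, 100)
X[:, :10] += latent[:, None]          # a correlated, predictive patch of features
y = latent + 0.3 * rng.randn(200)
scores = randomized_parcel_scores(X, y)
```

Because correlated features land in the same cluster, the whole patch inherits the selection of its cluster, instead of splitting the selection frequency m ways.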
2 Simulations
p = 2048, k = 64, n = 256 (n_min > 1000)
Weights w: patches of varying size
Design matrix: 2D Gaussian random images of varying smoothness

Estimators: randomized lasso, elastic net, our approach, univariate F-test
Parameters set by cross-validation

Performance metric: recovery seen as a 2-class problem ⇒ report the AUC of the precision-recall curve
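This metric can be computed with scikit-learn by treating the ground-truth support as the positive class and the estimated |w| as the score. Toy numbers, not those of the simulations:

```python
import numpy as np
from sklearn.metrics import average_precision_score

rng = np.random.RandomState(0)
support = np.zeros(100, dtype=bool)
support[:10] = True                      # ground-truth non-zero coefficients

w_hat = 0.1 * np.abs(rng.randn(100))     # an estimated weight map...
w_hat[:8] += 1.0                         # ...that clearly recovers 8 of the 10

# recovery as a 2-class problem: summarize the precision-recall curve
auc_pr = average_precision_score(support, np.abs(w_hat))
```

`average_precision_score` is the standard scikit-learn summary of the precision-recall curve (it averages precision over recall steps rather than integrating by trapezoids).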
2 When can we recover patches?
Smoothness helps (it reduces the noise degrees of freedom)
Small patches are hard to recover
2 What is the best method for patch recovery?
For small patches: elastic net
For large patches: randomized-clustered sparsity
For large patches and very smooth images: F-test
2 Randomizing clusters matters!
Non-random (Ward) clustering is inefficient: it gives a degenerate family of cluster assignments
Fully-random clustering performs quite well
Randomized Ward gives an extra gain
2 fMRI: face vs house discrimination [Haxby 2001]
[brain maps (slices y=-31, x=17, z=-17) comparing: F-scores; ℓ1 logistic; randomized ℓ1 logistic; randomized clustered ℓ1 logistic]
2 Predictive model on selected features
Object recognition [Haxby 2001]
Using recovered features improves prediction
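The effect reproduces on a hypothetical synthetic sketch: when only a few features are informative, fitting the classifier on selected features improves cross-validated prediction. Here univariate `SelectKBest` is a simple stand-in for the recovery step:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.RandomState(0)
n, p = 200, 500
X = rng.randn(n, p)
y = (X[:, :10].sum(axis=1) > 0).astype(int)   # only 10 features are informative

# SVC on all 500 features vs. SVC on 20 selected features
acc_all = cross_val_score(SVC(), X, y, cv=5).mean()
acc_selected = cross_val_score(
    make_pipeline(SelectKBest(f_classif, k=20), SVC()), X, y, cv=5).mean()
```

Doing the selection inside the pipeline keeps it within each cross-validation fold, avoiding selection bias in the reported score.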
Small-sample brain mapping
Sparse recovery of patches on spatially-correlated designs
Ingredients: clustering + randomization
⇒ Reduced feature set compatible with recovery: matches the sparsity pattern + the recovery conditions

Compressive-sensing questions:
Can we recover k > n in the case of large patches?
When do we lose sub-linear sample complexity?
3 Having an impact: software
How do we reach our target audience (neuroscientists)?
How do we disseminate our ideas?
How do we facilitate new ideas?
3 Python as a scientific environment
General purpose
Easy, readable syntax
Interactive (ipython)
Great scientific libraries (numpy, scipy, matplotlib...)
3 Growing a software stack
Code lines are costly ⇒ open source + community driven
Need quality and impact ⇒ focus on general-purpose libraries first

Scikit-learn: machine learning in Python
http://scikit-learn.org
3 scikit-learn: machine learning in Python
Technical choices
- Prefer Python or Cython; focus on readability
- Documentation and examples are paramount
- Little object-oriented design; opt for simplicity
- Prefer algorithms to frameworks
- Code quality: consistency and testing
3 scikit-learn: machine learning in Python
API
- Inputs are numpy arrays
- Learn a model from the data: estimator.fit(X_train, y_train)
- Predict using the learned model: estimator.predict(X_test)
- Test goodness of fit: estimator.score(X_test, y_test)
- Apply a change of representation: estimator.transform(X, y)
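In use, the API looks like the following. A sketch on the iris data with an arbitrary estimator choice; the module paths are those of recent scikit-learn releases, which differ from the ones at the time of the talk:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)            # inputs are numpy arrays
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

estimator = SVC()
estimator.fit(X_train, y_train)              # learn a model from the data
y_pred = estimator.predict(X_test)           # predict using the learned model
score = estimator.score(X_test, y_test)      # goodness of fit
```

Every estimator exposes the same fit/predict/score verbs, so swapping models is a one-line change.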
3 scikit-learn: machine learning in Python
Computational performance

              scikit-learn   mlpy    pybrain   pymvpa   mdp     shogun
SVM                5.2        9.47    17.5     11.52    40.48    5.63
LARS               1.17     105.3      -       37.35      -       -
Elastic Net        0.52      73.7      -        1.44      -       -
kNN                0.57       1.41     -        0.56     0.58     1.36
PCA                0.18        -       -        8.93     0.47     0.33
k-Means            1.34       0.79     ∞         -      35.75     0.68

Algorithms rather than low-level optimization (convex optimization + machine learning)
Avoid memory copies
3 scikit-learn: machine learning in Python
Community
163 contributors since 2008, 397 github forks
25 contributors in the latest release (3-month span)

Why this success?
- Trendy topic?
- Low barrier of entry
- Friendly and very skilled mailing list
- Credit to people
3 Research code ≠ software library
Factor 10 in time investment:
- Corner cases in algorithms (numerical stability)
- Multiple platforms and library versions (BLAS)
- Documentation
- Making it simpler (and getting less educated users)
- User and developer support (∼ 100 mails/day)
Exhausting, but it has an impact on science and society

Technical + scientific tradeoffs:
- Ease of install and ease of use rather than speed
- Focus on “old science”
Nice publications and theorems are not a recipe for useful code
Statistical learning to study brain function
- Spatial regularization for predictive models: total variation
- Compressive-sensing approach: sparsity + randomized clustering for correlated designs
- Machine learning in Python: huge impact

Post-doc positions available
Bibliography
[Michel TMI 2011] V. Michel, et al., Total variation regularization for fMRI-based prediction of behaviour, IEEE Transactions on Medical Imaging (2011). http://hal.inria.fr/inria-00563468/en
[Varoquaux ICML 2012] G. Varoquaux, A. Gramfort, B. Thirion, Small-sample brain mapping: sparse recovery on spatially correlated designs with randomization and clustering, ICML (2012). http://hal.inria.fr/hal-00705192/en
[Michel Pat Rec 2011] V. Michel, et al., A supervised clustering approach for fMRI-based inference of brain states, Pattern Recognition (2011). http://hal.inria.fr/inria-00589201/en
[Pedregosa JMLR 2011] F. Pedregosa, et al., Scikit-learn: machine learning in Python, JMLR (2011). http://hal.inria.fr/hal-00650905/en