
10.2417/12009XX.XXXX

Advancing Neuroimaging Research with Predictive Multivariate Pattern Analysis (MVPA)

Yaroslav O. Halchenko and Michael Hanke

PyMVPA, a novel Python-based framework for multivariate pattern analysis, facilitates the application of statistical learning methods to neural data.

Nobel Prize winner Eric Kandel wrote: “The task of neural science is to explain behavior in terms of the activities of the brain.”¹ Unfortunately, the currently prevalent data analysis strategies per se do not aim at explaining behavior in terms of neural activity. Instead, most methods primarily explore the data by performing mass-univariate hypothesis tests, searching for statistically significant excursions of the signal from a “no-effect” baseline. These methods rely on restrictive modeling assumptions (e.g., a forward model of the hemodynamic response function in functional magnetic resonance imaging, fMRI) and require pre-processing steps (spatial and temporal smoothing, averaging, etc.) that necessarily ignore or obliterate some of the information embedded in the data. Furthermore, univariate modeling of the acquired signal in terms of behavioral factors neither considers the covariance and causal structure among distinct brain areas, nor accounts for the variability of response patterns across trials.
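To make the contrast concrete, the following sketch shows what a mass-univariate test amounts to: every voxel is tested on its own against a zero baseline, and voxels whose statistic exceeds a threshold are reported. It assumes NumPy, simulated per-trial response estimates, and an arbitrary, uncorrected threshold of t > 3; it illustrates the general strategy, not any particular SPM package.

```python
import numpy as np

def voxelwise_t(betas):
    """One-sample t statistic per voxel against a zero ("no-effect") baseline.

    betas: array of shape (n_trials, n_voxels), e.g. per-trial response estimates.
    Each voxel (column) is tested separately; covariance between voxels is ignored.
    """
    n_trials = betas.shape[0]
    mean = betas.mean(axis=0)
    sem = betas.std(axis=0, ddof=1) / np.sqrt(n_trials)
    return mean / sem


rng = np.random.default_rng(0)
n_trials, n_voxels = 30, 500
betas = rng.standard_normal((n_trials, n_voxels))  # simulated noise in every voxel
betas[:, :10] += 2.0                               # mean shift in 10 hypothetical "active" voxels

t = voxelwise_t(betas)
threshold = 3.0                                    # illustrative, uncorrected cut-off
active = np.flatnonzero(t > threshold)
```

Each column is handled independently: nothing in this computation uses the joint pattern across voxels, which is precisely the information a multivariate analysis can exploit.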

Recently, these limitations led fMRI-based research to reconsider² multivariate pattern analysis (MVPA) methods, which had been explored more than a decade earlier in studies employing positron emission tomography (PET)³,⁴. Striking developments, empowered by recent advances in statistical learning theory, have attracted considerable interest throughout the neuroscience community⁵⁻⁸. The application of regularized statistical classifiers (e.g., a support vector machine⁹, SVM) made it possible to reliably predict behavioral conditions from full-brain fMRI data¹⁰ for every single trial. This reversal of the analysis strategy, in which aspects of behavior are now modeled in terms of neural activity, represents a critical difference from currently established approaches (Fig. 1).
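A minimal sketch of single-trial prediction with a regularized linear classifier is given below. It uses NumPy only, and a ridge-regularized least-squares classifier stands in for the SVM mentioned above; the simulated data, the number of folds, and the penalty `alpha` are illustrative assumptions. Every trial receives a prediction from a model that never saw that trial during training.

```python
import numpy as np

def fit_ridge_classifier(X, y, alpha=10.0):
    """L2-regularized least-squares fit of +1/-1 labels; returns weights and intercept."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    b = y_mean - x_mean @ w
    return w, b

def predict(w, b, X):
    """Sign of the linear decision function, as +1/-1 labels."""
    return np.where(X @ w + b >= 0, 1, -1)

def cross_validated_predictions(X, y, n_folds=5, alpha=10.0):
    """One out-of-sample prediction per trial via k-fold cross-validation."""
    folds = np.arange(len(y)) % n_folds           # interleaved fold assignment
    predictions = np.empty_like(y)
    for k in range(n_folds):
        test = folds == k
        w, b = fit_ridge_classifier(X[~test], y[~test], alpha)  # train without fold k
        predictions[test] = predict(w, b, X[test])              # predict fold k only
    return predictions

# Simulated single-trial patterns: 100 trials x 50 voxels, two alternating conditions.
rng = np.random.default_rng(3)
y = np.tile([1, -1], 50)
X = rng.standard_normal((100, 50))
X[:, :10] += 0.75 * y[:, None]     # weak, distributed signal in 10 of the 50 voxels

predictions = cross_validated_predictions(X, y)
accuracy = (predictions == y).mean()
```

The direction of inference is the reverse of the voxelwise test: the input is a whole activity pattern, and the output is a guess about the behavioral condition.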

Despite the advantages and promise of these methods, various factors have delayed their adoption in the field. Although a growing number of studies employ statistical learning methods, the condensed verbal descriptions of these novel and rather complex analysis pipelines, coupled with the lack of a unified and flexible software framework, hinder straightforward replication attempts. However, replication, and hence validation of reported results by independent research groups, is essential for scientific progress.

Figure 1. Reversing the analysis flow: classical statistical parametric mapping (SPM) performs mass-univariate testing to localize hypothetical brain responses. In contrast, multivariate pattern analysis (MVPA) offers a direct quantifiable mapping⁸ from brain activity patterns onto behavioral states.

To provide the neuroscience community with an adequate tool for analyzing neural data with statistical learning methods, we have developed PyMVPA (Python MVPA, http://www.pymvpa.org). This is a free, open-source, and platform-agnostic project written in the Python programming language. Python is a particularly good choice because of its portability, its concise and descriptive syntax, and the ease with which it interfaces with low-level libraries and high-level scientific scripting environments, such as R¹¹. PyMVPA makes it easy to access data stored in standard formats (e.g., NIfTI), to perform typical procedures of statistical learning (e.g., training, testing, feature selection, and cross-validation without “peeking” or “double-dipping”¹²), and to explore the multitude of available learning methods; it is also designed to facilitate rapid development and easy contributions from any interested researcher.
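The following sketch illustrates why feature selection has to happen inside each cross-validation fold. On pure noise, selecting the most “discriminative” features once on the full dataset and then cross-validating yields accuracies far above chance, whereas repeating the selection within every fold does not. NumPy, the nearest-centroid classifier, the mean-difference selection criterion, and the data sizes are assumptions chosen for illustration; PyMVPA's own interface is not shown.

```python
import numpy as np

def select_features(X, y, k):
    """Indices of the k features with the largest absolute class-mean difference."""
    diff = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
    return np.argsort(np.abs(diff))[-k:]

def nearest_centroid(X_train, y_train, x):
    """Label (0 or 1) of the class whose training mean is closest to x."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    return int(np.linalg.norm(x - c1) < np.linalg.norm(x - c0))

def loo_accuracy(X, y, k, select_inside_fold):
    """Leave-one-out accuracy with feature selection inside each fold or once on all data."""
    n = len(y)
    # Selecting on the full dataset lets every test sample influence its own feature set.
    global_sel = None if select_inside_fold else select_features(X, y, k)
    hits = 0
    for i in range(n):
        train = np.arange(n) != i
        sel = select_features(X[train], y[train], k) if select_inside_fold else global_sel
        pred = nearest_centroid(X[train][:, sel], y[train], X[i, sel])
        hits += int(pred == y[i])
    return hits / n

rng = np.random.default_rng(42)
X = rng.standard_normal((40, 1000))   # pure noise: no real class information
y = np.repeat([0, 1], 20)

biased = loo_accuracy(X, y, k=10, select_inside_fold=False)  # "double-dipping"
proper = loo_accuracy(X, y, k=10, select_inside_fold=True)
```

Only the second estimate is an honest measure of generalization; the first merely reflects that the test samples took part in choosing the features.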

We designed PyMVPA to offer a high-level programming interface in which the provided building blocks can be combined flexibly, so that complex analysis pipelines can be expressed in just a few lines of code¹¹. This enables researchers to easily replicate existing studies and to carry out novel, non-standard analyses. Moreover, the descriptive power of human-readable yet compact source code makes it possible to include the complete source code of a study as supplemental material to a publication. Making such code inclusion mandatory for every research paper could tremendously expedite the verification and adoption of novel analysis strategies.

To demonstrate the power and applicability of the proposed analysis methodology, we¹³ analyzed data from four different neural modalities and accompanied the publication with the complete source code of all analyses. Essentially the same workflow was used for all modalities: basic preprocessing, training and testing (by cross-validation) of statistical classifiers, and analysis of the trained classifiers' sensitivities with respect to each input dimension. Applied to extracellular recordings (post-stimulus time histograms of spike counts), it was possible to reliably identify which of eight auditory stimulus conditions had been presented in each single trial, and to assess the relevance of each neuron to the processing of the stimulus conditions. Applied to EEG data from a visual processing experiment, it was possible not only to confirm the results of conventional event-related potential (ERP) analysis, but also to discover a late response component not revealed by ERPs. Applied to fMRI data from an event-related visual object processing experiment¹⁴, PyMVPA made it possible to identify the original stimulus condition of each trial, and to provide spatio-temporal category specificity profiles without imposing any specific hemodynamic response model.
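For a linear classifier, one simple notion of sensitivity is the magnitude of its weight on each input dimension. The sketch below assumes NumPy and again uses a ridge-regularized least-squares classifier in place of an SVM, on simulated data in which only the first three features carry class information. It is not PyMVPA's sensitivity API, and weight magnitudes are most meaningful when features are on comparable scales (e.g., after z-scoring).

```python
import numpy as np

def ridge_weights(X, y, alpha=1.0):
    """Weights of an L2-regularized least-squares fit of +1/-1 labels on centered data."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)

rng = np.random.default_rng(1)
n_samples, n_features = 200, 20
y = np.where(np.arange(n_samples) % 2 == 0, 1.0, -1.0)   # two alternating conditions
X = rng.standard_normal((n_samples, n_features))
X[:, :3] += 2.0 * y[:, None]       # only features 0, 1 and 2 carry class information

sensitivity = np.abs(ridge_weights(X, y))   # one value per input dimension
ranking = np.argsort(sensitivity)[::-1]     # most to least sensitive feature
```

In a real analysis the features would be neurons, EEG time points, or voxels, and the resulting sensitivity values could be mapped back onto that space.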

MVPA methods are in no way limited to processing data from one modality at a time. For example, a reliable description of fMRI data in terms of a simultaneously recorded EEG signal (Fig. 2) allows for the identification of areas that are active during a given task, and for the localization of generators and covariates of dominant EEG frequency bands¹⁵. Furthermore, the constructed EEG-to-fMRI mapping can be used to filter fMRI and EEG signals, and for EEG-driven interpolation of fMRI time series.
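The workflow in Fig. 2, with one multiple-regression mapper per voxel evaluated by the correlation between predicted and actual fMRI signal, can be sketched as follows. NumPy, the simulated data, and the split into a training half and a test half are assumptions; a realistic analysis would also have to model the hemodynamic delay between EEG and fMRI, which this sketch ignores.

```python
import numpy as np

def fit_voxel_mappers(eeg, bold):
    """Multiple regression from EEG features to every voxel, solved jointly.

    eeg:  (n_timepoints, n_eeg_features) regressors
    bold: (n_timepoints, n_voxels) fMRI time series
    Returns coefficients of shape (n_eeg_features + 1, n_voxels), intercept row first.
    """
    design = np.column_stack([np.ones(len(eeg)), eeg])
    coef, *_ = np.linalg.lstsq(design, bold, rcond=None)
    return coef

def predict_bold(coef, eeg):
    """Apply the per-voxel mappers to (new) EEG data."""
    return np.column_stack([np.ones(len(eeg)), eeg]) @ coef

def columnwise_corr(a, b):
    """Pearson correlation between corresponding columns of a and b."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    return (a * b).sum(axis=0) / np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum(axis=0))

rng = np.random.default_rng(7)
n_t, n_eeg, n_vox = 400, 5, 6
eeg = rng.standard_normal((n_t, n_eeg))            # simulated EEG features
bold = 0.5 * rng.standard_normal((n_t, n_vox))     # simulated voxel noise
bold[:, :3] += eeg[:, :3] + eeg[:, 1:4]            # voxels 0-2 follow the EEG; 3-5 do not

train, test = slice(0, n_t // 2), slice(n_t // 2, n_t)   # fit on first half, test on second
coef = fit_voxel_mappers(eeg[train], bold[train])
r = columnwise_corr(predict_bold(coef, eeg[test]), bold[test])
```

Thresholding `r`, as in the lower part of Fig. 2, would then yield a map of the voxels whose signal the EEG predicts well on held-out data.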

Figure 2. Reliable mapping from EEG onto fMRI data¹⁵. The upper plot outlines the analysis workflow, in which a mapper (multiple regression) is trained on the joint EEG signal for each fMRI voxel. The lower part shows thresholded maps of correlation coefficients between predicted and actual fMRI data from an auditory experiment.

To improve the understanding of brain function, neuroscience research requires versatile computing environments and advanced methods that make efficient use of acquired data. Methods developed in the domain of machine and statistical learning are generic and powerful, and their application to neural research has already provided new insights into the brain. Our PyMVPA analysis framework aims to provide a convenient, extensive, and expandable environment for applying existing methods and developing new ones for the analysis of neural data. PyMVPA's user base has been growing steadily, and new data analysis methods and methodologies are continuously being added to the framework. Future development will further enrich the available techniques and offer promising analysis strategies. One of the immediate next steps is to allow for improved, transparent, and unbiased model selection. This new functionality will especially help in applying complex non-linear methods while ensuring valid results¹⁶.

The authors would like to thank all PyMVPA developers and contributors, whose continuous efforts help to forge an open and customizable analysis framework.

Author Information

Michael Hanke
Department of Experimental Psychology
University of Magdeburg
Magdeburg, Germany

Yaroslav Halchenko
Department of Psychological and Brain Sciences, Center for Cognitive Neuroscience
Dartmouth College
Hanover, New Hampshire

References

1. E. R. Kandel, J. H. Schwartz, and T. M. Jessell, Principles of Neural Science, 4th ed., McGraw-Hill, New York, 2000.

2. J. Haxby, M. Gobbini, M. Furey, A. Ishai, J. Schouten, and P. Pietrini, Distributed and overlapping representations of faces and objects in ventral temporal cortex, Science 293, pp. 2425–2430, 2001. doi:10.1126/science.1063736

3. J. R. Moeller and S. Strother, A regional covariance approach to the analysis of functional patterns in positron emission tomographic data, Journal of Cerebral Blood Flow and Metabolism 11, pp. 121–135, 1991.

4. J. S. Kippenhahn, W. W. Barker, S. Pascal, J. Nagel, and R. Duara, Evaluation of a neural-network classifier for PET scans of normal and Alzheimer’s disease subjects, Journal of Nuclear Medicine 33, pp. 1459–1467, 1992.

5. S. Hanson, T. Matsuka, and J. Haxby, Combinatorial codes in ventral temporal lobe for object recognition: Haxby (2001) revisited: is there a “face” area?, NeuroImage 23, pp. 156–166, 2004. doi:10.1016/j.neuroimage.2004.05.020

6. K. A. Norman, S. M. Polyn, G. J. Detre, and J. V. Haxby, Beyond mind-reading: multi-voxel pattern analysis of fMRI data, Trends in Cognitive Sciences 10, pp. 424–430, 2006. doi:10.1016/j.tics.2006.07.005

7. J.-D. Haynes and G. Rees, Decoding mental states from brain activity in humans, Nature Reviews Neuroscience 7, pp. 523–534, 2006.

8. A. J. O’Toole, F. Jiang, H. Abdi, N. Penard, J. P. Dunlop, and M. A. Parent, Theoretical, statistical, and practical perspectives on pattern-based classification approaches to the analysis of functional neuroimaging data, Journal of Cognitive Neuroscience 19, pp. 1735–1752, 2007.

9. V. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, 1995.

10. S. J. Hanson and Y. O. Halchenko, Brain reading using full brain support vector machines for object recognition: there is no “face” identification area, Neural Computation 20, pp. 486–503, 2008. doi:10.1162/neco.2007.09-06-340

11. M. Hanke, Y. O. Halchenko, P. B. Sederberg, S. J. Hanson, J. V. Haxby, and S. Pollmann, PyMVPA: A Python toolbox for multivariate pattern analysis of fMRI data, Neuroinformatics 7 (1), pp. 37–53, Mar. 2009. doi:10.1007/s12021-008-9041-y

12. N. Kriegeskorte, W. K. Simmons, P. S. F. Bellgowan, and C. I. Baker, Circular analysis in systems neuroscience: the dangers of double dipping, Nature Neuroscience 12 (5), pp. 535–540, 2009. doi:10.1038/nn.2303

13. M. Hanke, Y. O. Halchenko, P. B. Sederberg, E. Olivetti, I. Fründ, J. W. Rieger, C. S. Herrmann, J. V. Haxby, S. J. Hanson, and S. Pollmann, PyMVPA: A Unifying Approach to the Analysis of Neuroscientific Data, Frontiers in Neuroinformatics 3 (3), 2009. doi:10.3389/neuro.11.003.2009

14. M. Hanke, Advancing the Understanding of Brain Function with Multivariate Pattern Analysis. PhD thesis, Otto-von-Guericke-University, Magdeburg, Germany, May 2009.

15. Y. O. Halchenko, Predictive Decoding of Neural Data. PhD thesis, NJIT, Newark, NJ, USA, May

16. F. Pereira, T. Mitchell, and M. Botvinick, Machine learning classifiers and fMRI: A tutorial overview, NeuroImage 45, pp. 199–209, 2009. doi:10.1016/j.neuroimage.2008.11.007

© 2009 Institute of Neuromorphic Engineering