Data mining in neuroimaging - brain mapping · 2013. 7. 17.

Tal Yarkoni
University of Colorado Boulder

Data mining in neuroimaging
Some whys and one how

Wednesday, July 17, 13

What is data mining?

• “...the computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems.” --wikipedia

• Much of what you’ve learned about already falls under data mining

• Often contrasted with hypothesis-driven research

• Until recently, data mining had a bad reputation in psychology/neuroscience
  • “Going on a fishing expedition...”


Why mine your data?

• It’s fun

• Neuroimaging data is expensive; we should maximize its utility

• You can learn interesting and unexpected new things
  • Bottom-up and top-down approaches are complementary

• It’s probably* wishful thinking to think you can avoid it



I. Sensitivity training



The promise of fMRI

Fig. 1, Canli et al (2001)


Look at those scatterplots!

Canli et al (2001)

Extraversion Neuroticism


In 2001...

• The neural correlates of personality (and many other things) seemed:
  • Highly localized
  • Extremely strong
  • Theoretically predictable

• The answers were just around the corner


What happened?

• Have these findings been replicated?

• Yes: Canli et al (2002); Kehoe et al (2011); Paulus et al (2003); Ochsner et al (2006), etc.
  • Some are conceptual replications
  • Localization varies
  • Measures vary

• No: Britton et al (2007); De Gelder et al (2008); Hutcherson et al (2008); Kret et al (2011), etc.
  • Some contradictory findings
  • Many unreported non-replications

• Depends on how you define replication
  • Arguably no finding in personality neuroscience has been replicated (Yarkoni, in press)

• Why?


Power is often very low

Yarkoni (2009, PPS)


How low?



A simulation



The effects of sampling…



The (hypothetical) truth





Inflated statistically significant effects

Yarkoni (2009)
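The inflation of statistically significant effects in small samples can be reproduced with a small simulation. The sketch below is not the simulation from Yarkoni (2009); the true correlation (0.2), the sample size (20), the number of simulated studies, and the Fisher z approximation to the p < .05 cutoff are all assumptions chosen for illustration.

```python
import math
import random


def pearson_r(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(true_r=0.2, n=20, n_studies=5000, seed=0):
    """Run many small studies of the same true effect (all values hypothetical)."""
    rng = random.Random(seed)
    # Approximate two-tailed p < .05 cutoff on the Fisher z scale.
    z_crit = 1.96 / math.sqrt(n - 3)
    noise_sd = math.sqrt(1 - true_r ** 2)
    all_r, sig_r = [], []
    for _ in range(n_studies):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [true_r * xi + noise_sd * rng.gauss(0, 1) for xi in x]
        r = pearson_r(x, y)
        all_r.append(r)
        if abs(math.atanh(r)) > z_crit:  # the "significance filter"
            sig_r.append(r)
    return all_r, sig_r


all_r, sig_r = simulate()
power = len(sig_r) / len(all_r)
mean_all = sum(all_r) / len(all_r)
mean_sig = sum(abs(r) for r in sig_r) / len(sig_r)
print(f"power ~ {power:.2f}")
print(f"mean r across all studies = {mean_all:.2f}")
print(f"mean |r| among significant studies = {mean_sig:.2f}")
```

With n = 20, a correlation has to exceed roughly 0.44 in absolute value to pass the cutoff, so every significant estimate overstates a true effect of 0.2 by construction.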


ANALYSIS

Power failure: why small sample size undermines the reliability of neuroscience

Katherine S. Button (1,2), John P. A. Ioannidis (3), Claire Mokrysz (1), Brian A. Nosek (4), Jonathan Flint (5), Emma S. J. Robinson (6) and Marcus R. Munafò (1)

Abstract | A study with low statistical power has a reduced chance of detecting a true effect, but it is less well appreciated that low power also reduces the likelihood that a statistically significant result reflects a true effect. Here, we show that the average statistical power of studies in the neurosciences is very low. The consequences of this include overestimates of effect size and low reproducibility of results. There are also ethical dimensions to this problem, as unreliable research is inefficient and wasteful. Improving reproducibility in neuroscience is a key priority and requires attention to well-established but often ignored methodological principles.

It has been claimed and demonstrated that many (and possibly most) of the conclusions drawn from biomedical research are probably false [1]. A central cause for this important problem is that researchers must publish in order to succeed, and publishing is a highly competitive enterprise, with certain kinds of findings more likely to be published than others. Research that produces novel results, statistically significant results (that is, typically p < 0.05) and seemingly ‘clean’ results is more likely to be published [2,3]. As a consequence, researchers have strong incentives to engage in research practices that make their findings publishable quickly, even if those practices reduce the likelihood that the findings reflect a true (that is, non-null) effect [4]. Such practices include using flexible study designs and flexible statistical analyses and running small studies with low statistical power [1,5]. A simulation of genetic association studies showed that a typical dataset would generate at least one false positive result almost 97% of the time [6], and two efforts to replicate promising findings in biomedicine reveal replication rates of 25% or less [7,8]. Given that these publishing biases are pervasive across scientific practice, it is possible that false positives heavily contaminate the neuroscience literature as well, and this problem may affect at least as much, if not even more so, the most prominent journals [9,10].

Here, we focus on one major aspect of the problem: low statistical power. The relationship between study power and the veracity of the resulting finding is under-appreciated. Low statistical power (because of low sample size of studies, small effects or both) negatively affects the likelihood that a nominally statistically significant finding actually reflects a true effect. We discuss the problems that arise when low-powered research designs are pervasive. In general, these problems can be divided into two categories. The first concerns problems that are mathematically expected to arise even if the research conducted is otherwise perfect: in other words, when there are no biases that tend to create statistically significant (that is, ‘positive’) results that are spurious. The second category concerns problems that reflect biases that tend to co-occur with studies of low power or that become worse in small, underpowered studies. We next empirically show that statistical power is typically low in the field of neuroscience by using evidence from a range of subfields within the neuroscience literature. We illustrate that low statistical power is an endemic problem in neuroscience and discuss the implications of this for interpreting the results of individual studies.

Low power in the absence of other biases
Three main problems contribute to producing unreliable findings in studies with low power, even when all other research practices are ideal. They are: the low probability of finding true effects; the low positive predictive value (PPV; see BOX 1 for definitions of key statistical terms) when an effect is claimed; and an exaggerated estimate of the magnitude of the effect when a true effect is discovered. Here, we discuss these problems in more detail.

(1) School of Experimental Psychology, University of Bristol, Bristol, BS8 1TU, UK.
(2) School of Social and Community Medicine, University of Bristol, Bristol, BS8 2BN, UK.
(3) Stanford University School of Medicine, Stanford, California 94305, USA.
(4) Department of Psychology, University of Virginia, Charlottesville, Virginia 22904, USA.
(5) Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, OX3 7BN, UK.
(6) School of Physiology and Pharmacology, University of Bristol, Bristol, BS8 1TD, UK.
Correspondence to M.R.M. e-mail: [email protected]
doi:10.1038/nrn3475
Published online 10 April 2013
Corrected online 15 April 2013

NATURE REVIEWS | NEUROSCIENCE, VOLUME 14, MAY 2013, p. 365
© 2013 Macmillan Publishers Limited. All rights reserved
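The link between power and positive predictive value can be made concrete with the formula Button et al. (2013) use, PPV = ([1 − β] × R) / ([1 − β] × R + α), where 1 − β is power, α is the type I error rate, and R is the pre-study odds that a probed effect is real. The sketch below is only an illustration; the pre-study odds of 0.25 are an assumed value, not an estimate for any particular field.

```python
def ppv(power, prior_odds, alpha=0.05):
    """Positive predictive value of a significant finding.

    power      : 1 - beta, probability of detecting a true effect
    prior_odds : R, pre-study odds that the effect is real (assumed input)
    alpha      : type I error rate
    """
    return (power * prior_odds) / (power * prior_odds + alpha)


# Hypothetical scenario: 1 in 5 tested effects is real (odds R = 0.25).
for power in (0.2, 0.5, 0.8):
    print(f"power = {power:.1f} -> PPV = {ppv(power, prior_odds=0.25):.2f}")
```

Holding the prior odds fixed, lowering power from 0.8 to 0.2 lowers the share of significant findings that are true, which is the second of the three problems listed above.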


Big Data helps

• Larger samples and/or aggregation of studies

• Increased power

• Better effect size estimation

• Helps us separate truly large effects from small effects that only seem large

• In practice, an interesting thing happens...



Increasing sensitivity through meta-analysis

Functional group analysis: implementation

Parcellation. The goal of parcellation was to initially reduce the data from the voxel level to the parcel level and group voxels that were closely related (i.e., activated in largely the same contrasts). This step was necessary to make subsequent NMDS and clustering analyses computationally tractable. Input data was an indicator matrix of 437 comparison maps × 18,489 suprathreshold voxels (defined as those voxels that survived the height threshold or the most stringent cluster threshold). Indicator values were either 0 (“this contrast did not activate within 10 mm of this voxel”) or 1 (“this contrast did activate within 10 mm of this voxel”), as shown in Fig. 2A. This 437 × 18,489 indicator matrix was subjected to SVD to allow reduction of the data to a smaller set of voxels whose activation profile across studies explains most of the variance in the larger set. Examination of the eigenvalues revealed an elbow at ~50 components. Thus, we saved 50 components and assigned them index values of 1 through 50. Associations between each voxel and each component were assessed using Kendall's Tau-b, referred to here as τ (Gibbons, 1993; Gibbons et al., 2003), a nonparametric measure of association that does not require normally distributed data and thus, unlike Pearson's r, is appropriate for indicator data. For each voxel, we computed τ with each component. Highest values of τ between a voxel and one of the components led to the assignment of the component 1–50 ID to that voxel. Consequently, sets of contiguous voxels that had the same ID value were defined as parcels; that is, parcels were made of voxels that correlated most highly with the same component, indicating co-activation. As our parcellation algorithm was a hard-clustering algorithm, each voxel was assigned to the component on which it loaded most highly, even if it was closely related to two components. While other methods such as the NMDS approach we used in subsequent steps can address issues of ambiguous groupings, this was not computationally feasible at the voxel level. Importantly, this procedure allowed us to reduce the voxel space enough to make the subsequent NMDS possible and to use the preferred measure of association, τ. At the conclusion of this stage, as illustrated in Fig. 2B, 172 parcels, ranging in size from 10 to 1060 voxels with a mean of 99.72 voxels (from 80 mm³ to 8480 mm³, with a mean of 797.76 mm³) were found across the brain. These are roughly comparable in size to known functional subregions in anatomical studies.

Identifying regions. To further investigate whether parcels are part of larger regions or functional distributed groups (e.g. move from the parcel level to the region level), clustering could theoretically be performed on the 172 parcels identified by SVD. However, variability between parcels is likely to be accounted for by far fewer dimensions, rendering the to-be-clustered space sparse and consequently, noisy. Therefore, the goal of this portion of the analysis was to reduce the

Fig. 1. Schematic representation of the procedures for multilevel kernel density analysis (MKDA). (A) Peak coordinates in three of the 437 comparison maps included in this meta-analysis. (B) Peak coordinates in each map were separately convolved with a 10 mm kernel, generating comparison indicator maps (CIMs) of values 0 or 1 (1 shown in black). (C) The weighted average of the CIMs (weights based on sample size and analysis type) is thresholded by the maximum proportion of activated comparison maps expected under the null hypothesis (shown in D) to produce significant results. (E) Significant results: yellow voxels are familywise error rate (FWER) corrected at p < .05. Other colored regions are FWER corrected for spatial extent at p < .05 with primary alpha levels of .001 (orange), and .01 (pink).

H. Kober et al. / NeuroImage 42 (2008) 998–1031, p. 1005

Wager, Lindquist, & Kaplan (2007)
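Steps (B) and (C) of the MKDA schematic can be sketched in a few lines. This is a toy illustration rather than the published implementation: the voxel grid, the peak coordinates, the study weights, and the `null_threshold` value are all made up, and a real analysis would derive the threshold from a Monte Carlo null distribution as the caption describes.

```python
import math


def indicator_map(peaks, voxels, radius_mm=10.0):
    """Comparison indicator map: 1 if any peak lies within radius_mm of the voxel."""
    return [
        1 if any(math.dist(v, p) <= radius_mm for p in peaks) else 0
        for v in voxels
    ]


def weighted_density(maps, weights):
    """Weighted proportion of comparison maps that are active at each voxel."""
    total = sum(weights)
    n_vox = len(maps[0])
    return [
        sum(w * m[i] for m, w in zip(maps, weights)) / total
        for i in range(n_vox)
    ]


# Hypothetical data: three voxels (mm coordinates) and three comparison maps.
voxels = [(0, 0, 0), (20, 0, 0), (40, 0, 0)]
peaks_per_map = [
    [(2, 0, 0)],
    [(0, 5, 0), (22, 0, 0)],
    [(41, 0, 0)],
]
weights = [1.0, 2.0, 1.0]  # stand-ins for sample-size-based weights

maps = [indicator_map(p, voxels) for p in peaks_per_map]
density = weighted_density(maps, weights)

null_threshold = 0.4  # assumed; normally estimated under the null hypothesis
significant = [d > null_threshold for d in density]
print(density, significant)
```

Because each comparison map contributes at most 1 per voxel, a single study reporting many nearby peaks cannot dominate the density, which is the point of working with indicator maps rather than raw peak counts.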


clustering quality. In a high quality solution, similarity will be high among members of the group, and low among members of different groups. Clustering quality is defined in:

$$ q = \sum_{k} \sum_{i} \frac{D_{i0} - D_{inn}}{\max(D_{i0}, D_{inn})} $$

where $D$ denotes distance, $D_{i0}$ is the distance from parcel $i$ to the center of its own set, $D_{inn}$ is the distance to the nearest neighboring set, and $k$ indexes over parcels (Struyf et al., 1996). Once clustering quality $q$ for a solution was established, we then permuted the columns of the component scores, re-applied the clustering algorithm, and calculated cluster quality $q$ based on the permuted data. This permutation procedure disrupts relationships between parcels by exchanging their locations in each dimension with those of other parcels, while preserving the marginal distribution of scores in each dimension. This process was repeated 5000 times to assess average cluster quality for solutions of that size, and ultimately, to develop a null-hypothesis distribution of $q$ for each of 2–30 possible cluster solutions.
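The column-permutation scheme described above can be sketched as follows. This is a generic illustration under stated assumptions, not the authors' code: `statistic` is a placeholder for "re-apply the clustering and compute q", and the toy statistic and score matrix used in the demo are hypothetical.

```python
import random


def permute_columns(scores, rng):
    """Shuffle each column independently across rows (parcels).

    Each dimension keeps its marginal distribution, but the pairing of
    values across dimensions within a parcel is broken.
    """
    n_rows, n_cols = len(scores), len(scores[0])
    columns = []
    for j in range(n_cols):
        col = [scores[i][j] for i in range(n_rows)]
        rng.shuffle(col)
        columns.append(col)
    return [[columns[j][i] for j in range(n_cols)] for i in range(n_rows)]


def null_distribution(scores, statistic, n_perm=5000, seed=0):
    """Statistic recomputed on n_perm column-permuted copies of scores."""
    rng = random.Random(seed)
    return [statistic(permute_columns(scores, rng)) for _ in range(n_perm)]


def toy_statistic(matrix):
    """Hypothetical stand-in for cluster quality: cross-product of two dimensions."""
    return sum(row[0] * row[1] for row in matrix)


# Hypothetical component scores: 8 parcels x 2 perfectly related dimensions.
scores = [[float(i), float(i)] for i in range(8)]
observed = toy_statistic(scores)
null = null_distribution(scores, toy_statistic, n_perm=200)
p_value = (1 + sum(v >= observed for v in null)) / (1 + len(null))
print(observed, max(null), p_value)
```

Swapping `toy_statistic` for a function that clusters the permuted scores and returns q would give a null distribution of q for a given number of clusters.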

Fig. 3B shows the number of regions in the possible solutions on the x-axis, plotted against the improvement over permuted solutions on the y-axis; a 21-region solution gives the maximum improvement over the permuted data. The permuted-data distribution of quality is shown in Fig. 3C for the 21-region solution. Quality for the observed-data solution is shown by the vertical black line. Because the 21 clusters of parcels found by this method were predominantly contiguous in space (a minority was distributed, often homologous), we refer to them as regions rather than groups, as illustrated in Fig. 2C.

Identifying groups. To investigate whether co-activation patterns across contrasts culminated in coherent functional groups across the whole brain (and to move from the region level to the group level), we computed new indicator maps for each of these 21 regions, culminating in a 437 comparison maps × 21 regions indicator matrix. The indicator maps encoded whether each contrast reported an activation coordinate anywhere within each region. We then repeated the dimension reduction and clustering steps as described above. These NMDS procedures (Shepard plot in Fig. 3D) yielded 16 dimensions, and the subsequent clustering resulted in six functional groups, as illustrated in Figs. 2D and 3E–F.

Mediation analysis. In the right panel of Fig. 2D, regions that comprise each of the six functional groups are plotted in the space of

Fig. 4. (A–C) Un-weighted peak activations from all 437 contrasts in our meta-analysis are plotted on the lateral, orbital, and medial surfaces on the brain, respectively. Activations across studies are distributed throughout the cortex, though clusters of consistent results are concentrated in some areas. (D–F) Regions that were consistently activated across neuroimaging studies as determined by multi-level kernel density analysis. To achieve significance in our analysis, any single voxel had to be activated by at least ~4% of the contrasts in our meta-analysis (e.g. 18 contrasts or more, depending on the study weights). Yellow voxels are family-wise error rate (FWER) corrected at p < .05. Other colored regions are FWER corrected for spatial extent at p < .05 with primary alpha levels of .001 (orange), and .01 (pink). See Table 1 for abbreviations of brain region names.

Kober et al. (2008)


Meta-analyze all the things!

Molenberghs et al. (2011) - Mirror system

Engelmann et al. (2011) - Smoking cue reactivity

Kim (2011) - Subsequent memory

Kim et al. (2011) - Task-switching

Chase et al. (2011) - Drug craving


Fan et al. (2011) - Empathy

Swick et al. (2011) - Go/NoGo & Stop-Signal

Rehme et al. (2012) - Movement after stroke

Brooks et al. (2011) - Subliminal arousing stimuli

Tillisch et al. (2011) - Rectal distension


Dude, where’s my effect?

• A funny thing: as samples grow, effects seem to shrink

• Vul et al (2009b) reported a correlation of -0.62 between sample size and effect size

• The largest fMRI studies invariably report much smaller effects

• E.g., Whelan et al (2012): n ~ 2,000

• Highly significant effects

• But how big?
  • Not big; r ~ .1

Whelan et al (2012)
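A quick calculation shows how an effect can be highly significant and still small. The sketch below plugs the round numbers from the slide (r ~ .1, n ~ 2,000) into the standard t statistic for a correlation; the p-value uses a normal approximation, which is an assumption that is reasonable at this sample size.

```python
import math


def t_from_r(r, n):
    """t statistic for testing a Pearson correlation against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r, n = 0.1, 2000
t = t_from_r(r, n)
# Two-tailed p-value from the normal approximation to the t distribution.
p_approx = math.erfc(abs(t) / math.sqrt(2))
variance_explained = r ** 2

print(f"t = {t:.2f}, p ~ {p_approx:.1e}, variance explained = {variance_explained:.1%}")
```

The same correlation in a sample of 20 would give a t statistic well below 1, which is why effects of this size are invisible to typical small studies.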


Fig. S6. Effect of sample size on the correlation between age and RSFC. Shown is the mean correlation ± 2 SD across 10,000 calculations of the correlation between age and functional connectivity strength as a function of sample size. For each of the two regions illustrating the effect of age for the PCC seed ROI in Fig. 2, we calculated the correlation between age and RSFC as a function of sample size. We randomly sampled subgroups, ranging in size from 10 to 1,090 participants, from the total of 1,093 participants. We then calculated the correlation between age and RSFC for each of the subgroups. This procedure was iterated 10,000 times to optimize randomization. (A) Mean correlation ± 2 times the SD across 10,000 iterations for the region illustrated in Fig. 2 that showed a positive correlation between age and RSFC with the PCC seed. (B) Mean correlation ± 2 times the SD across the 10,000 iterations for the region illustrated in Fig. 2 that showed a negative correlation between age and RSFC with the PCC seed. In each figure, the actually observed correlation is indicated on the y-axis in a smaller font.

Fig. S7. Center-, age-, and sex-related variations in R-fMRI amplitude measures. The first column depicts group-level maps for voxelwise measures of ALFF (Upper) and fALFF (Lower). Before group-level analyses, each participant’s ALFF/fALFF map is Z-transformed, such that positive voxels reflect greater low-frequency fluctuation amplitudes than baseline (whole brain mean) and negative voxels reflect low-frequency fluctuation amplitudes below baseline. The second column depicts voxels exhibiting significant effects of center, as detected by one-way ANOVA (across 24 centers, including 1,093 participants). Columns 3 and 4 depict voxels exhibiting age- and sex-related variations. Center, age, and sex findings were corrected for multiple comparisons (Z > 2.3; P < 0.05, corrected). “Male” refers to significantly greater connectivity in males; similarly, “female” refers to significantly greater connectivity in females. “Older” refers to significantly increasing connectivity with increasing age, whereas “younger” refers to significantly increasing connectivity with decreasing age.

Biswal et al., www.pnas.org/cgi/content/short/0911855107, p. 9 of 10

Biswal et al (2010)
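The subsampling procedure in Fig. S6 can be imitated on synthetic data. The sketch below is an assumption-laden stand-in: it generates 1,093 fake participants with a made-up true correlation of 0.2, and it uses 500 iterations per sample size rather than the 10,000 reported in the caption, purely to keep the run time short.

```python
import math
import random


def pearson_r(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def subsample_spread(x, y, sizes, n_iter=500, seed=0):
    """Mean and SD of r across random subgroups of each size."""
    rng = random.Random(seed)
    idx = list(range(len(x)))
    out = {}
    for k in sizes:
        rs = []
        for _ in range(n_iter):
            pick = rng.sample(idx, k)  # subgroup drawn without replacement
            rs.append(pearson_r([x[i] for i in pick], [y[i] for i in pick]))
        mean = sum(rs) / n_iter
        sd = math.sqrt(sum((r - mean) ** 2 for r in rs) / (n_iter - 1))
        out[k] = (mean, sd)
    return out


# Synthetic "age" and "connectivity" values; the 0.2 correlation is assumed.
data_rng = random.Random(42)
age = [data_rng.gauss(0, 1) for _ in range(1093)]
rsfc = [0.2 * a + math.sqrt(1 - 0.2 ** 2) * data_rng.gauss(0, 1) for a in age]

spread = subsample_spread(age, rsfc, sizes=[10, 100, 1000])
for k, (mean, sd) in spread.items():
    print(f"n = {k:4d}: mean r = {mean:.2f}, mean ± 2 SD = [{mean - 2 * sd:.2f}, {mean + 2 * sd:.2f}]")
```

The band around the mean narrows sharply as the subgroup size grows, so small subgroups can easily return correlations several times larger than, or opposite in sign to, the full-sample value.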


Voxelwise genome-wide association study (vGWAS)

Jason L. Stein (a), Xue Hua (a), Suh Lee (a), April J. Ho (a), Alex D. Leow (a,b), Arthur W. Toga (a), Andrew J. Saykin (c,d), Li Shen (c), Tatiana Foroud (d), Nathan Pankratz (d), Matthew J. Huentelman (e), David W. Craig (e), Jill D. Gerber (e), April N. Allen (e), Jason J. Corneveaux (e), Bryan M. DeChairo (f), Steven G. Potkin (g), Michael W. Weiner (h,i), Paul M. Thompson (a,*) and the Alzheimer's Disease Neuroimaging Initiative (1)

(a) Laboratory of Neuro Imaging, Department of Neurology, University of California, Los Angeles School of Medicine, Neuroscience Research Building 225E, 635 Charles Young Drive, Los Angeles, CA 90095-1769, USA
(b) University of Illinois School of Medicine, Chicago, IL USA, and Community Psychiatry Associates, Sacramento, CA, USA
(c) Center for Neuroimaging, Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, USA
(d) Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, USA
(e) The Translational Genomics Research Institute, Phoenix, AZ, USA
(f) Neuroscience, Molecular Medicine, Pfizer Global Research and Development, New London, CT, USA
(g) Department of Psychiatry and Human Behavior, University of California, Irvine, Irvine, CA, USA
(h) Departments of Radiology, Medicine and Psychiatry, University of California, San Francisco, San Francisco, CA, USA
(i) Department of Veterans Affairs Medical Center, San Francisco, CA, USA

Article history: Received 31 August 2009. Revised 21 January 2010. Accepted 11 February 2010. Available online 17 February 2010.

Abstract

The structure of the human brain is highly heritable, and is thought to be influenced by many common genetic variants, many of which are currently unknown. Recent advances in neuroimaging and genetics have allowed collection of both highly detailed structural brain scans and genome-wide genotype information. This wealth of information presents a new opportunity to find the genes influencing brain structure. Here we explore the relation between 448,293 single nucleotide polymorphisms in each of 31,622 voxels of the entire brain across 740 elderly subjects (mean age ± s.d.: 75.52 ± 6.82 years; 438 male) including subjects with Alzheimer's disease, Mild Cognitive Impairment, and healthy elderly controls from the Alzheimer's Disease Neuroimaging Initiative (ADNI). We used tensor-based morphometry to measure individual differences in brain structure at the voxel level relative to a study-specific template based on healthy elderly subjects. We then conducted a genome-wide association at each voxel to identify genetic variants of interest. By studying only the most associated variant at each voxel, we developed a novel method to address the multiple comparisons problem and computational burden associated with the unprecedented amount of data. No variant survived the strict significance criterion, but several genes worthy of further exploration were identified, including CSMD2 and CADPS2. These genes have high relevance to brain structure. This is the first voxelwise genome-wide association study to our knowledge, and offers a novel method to discover genetic influences on brain structure.

© 2010 Elsevier Inc. All rights reserved.

Introduction

A key goal in imaging neuroscience is to discover specific genetic variants that influence brain structure and function (Glahn et al., 2007a; Glahn et al., 2007b). The dynamic trajectory of brain development and aging throughout life is strongly influenced by genetic factors, and genetic variants have been discovered that increase the risk for Alzheimer's disease (Corder et al., 1993), other mental illness (Gottesman and Gould, 2003; Meyer-Lindenberg and Weinberger, 2006; Purcell et al., 2009) and even obesity (Frayling et al., 2007; Ho et al., 2010). The goals are both scientific and practical: by selecting those at genetic risk for early treatment, drug trials will be better powered to detect treatment effects (Frisoni et al., 2010). A more mechanistic understanding of mental illness will be achieved if gene variants over-represented in patients are studied both at the molecular level and in terms of their effects on brain structure.

Early neuroimaging studies of twins found that several aspects of brain structure are under strong genetic control (Thompson et al., 2001; Posthuma et al., 2002) and that common sets of genes may

NeuroImage 53 (2010) 1160–1174
doi:10.1016/j.neuroimage.2010.02.032

* Corresponding author. Fax: +1 310 206 5518. E-mail address: [email protected] (P. M. Thompson).

(1) Data used in the preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (http://www.loni.ucla.edu/ADNI). As such, there are investigators within the ADNI who contributed to the design and implementation of ADNI and/or provided data but did not participate in the analysis or writing of this report. A complete listing of ADNI investigators is available at http://www.loni.ucla.edu/ADNI/Collaboration/ADNI_Manuscript_Citations.pdf.
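The scale of the multiple comparisons problem in a voxelwise GWAS follows directly from the numbers in the abstract. The sketch below simply counts the SNP-by-voxel tests and shows what a naive Bonferroni threshold would look like; Bonferroni is used here only as a familiar yardstick and is an assumption of this example, not the correction Stein et al. applied (they analyzed the most associated variant at each voxel).

```python
n_snps = 448_293    # single nucleotide polymorphisms, from the abstract
n_voxels = 31_622   # voxels, from the abstract

n_tests = n_snps * n_voxels
# Naive Bonferroni threshold, shown for scale only (not the authors' method).
bonferroni_alpha = 0.05 / n_tests

print(f"{n_tests:,} tests; naive Bonferroni threshold = {bonferroni_alpha:.2e}")
```

With roughly fourteen billion tests and 740 subjects, only very large genetic effects could pass such a threshold, which fits the abstract's report that no variant survived the strict significance criterion.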


This does not mean...

• ...that all fMRI research is grossly underpowered

• Many questions can be answered with 4, 14, or 40 subjects

• It depends on the kind of question

• But in most cases, the prior probability of huge effects is not high



The limits of memory


participants made male/female judgments about old and new faces (rather than explicit memory judgments), whereas during the second half of scanning, participants made explicit recognition decisions. Thus, Exp. 2 assessed classification under circumstances in which old/new recognition was indirectly probed and examined whether the neural signatures that characterize explicit recognition are also diagnostic of indirect (implicit) recognition.

Results

Exp. 1: Explicit Recognition Task. Behavioral performance. Sixteen participants were scanned while making explicit memory judgments on 400 probe faces. For each, participants indicated their mnemonic experience using one of five responses: recollected as studied (“R old”), high confidence studied (“HC old”), low confidence studied (“LC old”), low confidence unstudied (“LC new”), or high confidence unstudied (“HC new”) (39). Mean recognition accuracy was 0.71 [(hit rate (0.70) + correct rejection (CR) rate (0.71))/2]; mean d′ (1.15) differed from chance [t(15) = 7.42, p < 10⁻⁵]. The distribution of responses to objectively old (OLD) and objectively new (NEW) faces confirmed that participants used the response options appropriately, rarely responding “R old” or “HC old” to NEW faces or “HC new” to OLD faces (Table S1, Exp. 1). Reaction times (RTs) followed an inverted U-shaped function, with the fastest RTs occurring for responses at the endpoints of the recognition scale (i.e., “R old” and “HC new”) and the slowest RTs for LC responses. Despite increased study-test lag, mnemonic interference, and potential fatigue, performance was relatively stable (mean d′ in the first (1.23) and second (1.09) half of the session did not significantly differ [t(15) = 1.65, P = 0.11]).

fMRI analyses. Assessing classifier performance. We used regularized logistic regression to classify the mnemonic status of individual trials based on distributed fMRI activation patterns. Classification performance was indexed by receiver operating characteristic (ROC) curves, which rank the classification outputs according to their probability estimates (from strongly favoring Class A to strongly favoring Class B) and chart the relationship between the classifier’s true positive rate (probability of correctly labeling examples of Class A as Class A) and false positive rate (probability of incorrectly labeling examples of Class B as Class A) across a range of decision boundaries. The area under the curve (AUC) indexes the mean accuracy with which a randomly chosen pair of Class A and Class B trials could be assigned to their correct classes (0.5 = random performance; 1.0 = perfect performance).

Classifying faces as OLD vs. NEW. As a first assessment of the MVPA classifier’s ability to decode whether a face was OLD or NEW, we analyzed trials in which the participant correctly labeled the face’s objective mnemonic status, training the classifier to discriminate OLD faces that participants called “old” (Hits) from NEW faces called “new” (CRs). In this classification scheme, the objective and subjective old/new status of the faces in each class were identical, and thus the classifier could capitalize on neural

Fig. 1. Mnemonic decoding results. Mean ROC curves (A, C, and E) and their corresponding AUC values (B, D, and F) summarize classifier performance for various classification schemes in Exp. 1 (A–D) and Exp. 2 (E and F). AUC values are plotted for each participant’s data using unique identifiers, with the group means indicated by the vertical bars. Chance performance (AUC = 0.5) is indicated by the dashed line. For each classification scheme, participants with fewer than 18 trials in each class were excluded from analysis (Table S2). In E and F, "Implicit → Explicit" refers to a classifier trained to discriminate OLD vs. NEW on the Implicit Recognition Task data and tested on the Explicit Recognition Task data, and "Explicit → Implicit" refers to the converse classification scheme.

9850 | www.pnas.org/cgi/doi/10.1073/pnas.1001028107 Rissman et al.


Rissman, Greely, & Wagner (2010)
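The pairwise reading of AUC given in the Rissman et al. excerpt (the accuracy with which a randomly chosen Class A and Class B pair is assigned correctly) can be computed directly. The sketch below is a minimal illustration; the classifier outputs are invented numbers, and counting ties as half-correct is a common convention assumed here rather than something stated in the paper.

```python
def auc(scores_a, scores_b):
    """AUC as the fraction of (A, B) pairs ranked correctly.

    scores_a : classifier outputs for Class A trials (higher = more A-like)
    scores_b : classifier outputs for Class B trials
    Ties count as half-correct (an assumed convention).
    """
    wins = 0.0
    for a in scores_a:
        for b in scores_b:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(scores_a) * len(scores_b))


# Hypothetical probability estimates for three OLD and three NEW trials.
old_scores = [0.9, 0.8, 0.4]
new_scores = [0.7, 0.3, 0.2]
print(f"AUC = {auc(old_scores, new_scores):.3f}")
```

A classifier whose outputs carry no information ranks about half of the pairs correctly, which is why 0.5 marks chance performance in the figure.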


Two possible interpretations

• The largest studies to date are kind of crappy; the small studies are better

• We’re engaging in a collective exercise in wishful thinking


II. On fooling ourselves


Surely, we jest...

• Could we really be so wrong?

• Yes

• Some examples...


Deja vu all over again

Association of Anxiety-Related Traits with a Polymorphism in the Serotonin Transporter Gene Regulatory Region

Klaus-Peter Lesch,* Dietmar Bengel, Armin Heils, Sue Z. Sabol, Benjamin D. Greenberg, Susanne Petri, Jonathan Benjamin, Clemens R. Muller, Dean H. Hamer, Dennis L. Murphy

Transporter-facilitated uptake of serotonin (5-hydroxytryptamine or 5-HT) has been implicated in anxiety in humans and animal models and is the site of action of widely used uptake-inhibiting antidepressant and antianxiety drugs. Human 5-HT transporter (5-HTT) gene transcription is modulated by a common polymorphism in its upstream regulatory region. The short variant of the polymorphism reduces the transcriptional efficiency of the 5-HTT gene promoter, resulting in decreased 5-HTT expression and 5-HT uptake in lymphoblasts. Association studies in two independent samples totaling 505 individuals revealed that the 5-HTT polymorphism accounts for 3 to 4 percent of total variation and 7 to 9 percent of inherited variance in anxiety-related personality traits in individuals as well as sibships.

K.-P. Lesch, A. Heils, S. Petri, Department of Psychiatry, University of Wurzburg, Fuchsleinstrasse 15, 97080 Wurzburg, Germany.
D. Bengel, B. D. Greenberg, J. Benjamin, D. L. Murphy, Laboratory of Clinical Science, National Institute of Mental Health, National Institutes of Health, Bethesda, MD 20892, USA.
S. Z. Sabol and D. H. Hamer, Laboratory of Biochemistry, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA.
C. R. Muller, Institute of Human Genetics, University of Wurzburg, Am Hubland, 97074 Wurzburg, Germany.
*To whom correspondence should be addressed. E-mail: [email protected]

Anxiety-related traits are fundamental, enduring, and continuously distributed dimensions of normal human personality (1-3). Although twin studies have indicated that individual variation in measures of anxiety-related personality traits is 40 to 60% heritable (4), none of the relevant genes has yet been identified. Variance in personality traits, including those related to anxiety, is thought to be generated by a complex interaction of environmental and experiential factors with a number of gene products involving distinct brain systems such as the midbrain raphe serotonin (5-HT) system (4). Neurotransmission mediated by 5-HT contributes to many physiologic functions such as motor activity, food intake, sleep, and reproductive activity, as well as to cognition and emotional states including mood and anxiety (5). By regulating the magnitude and duration of serotonergic responses, the 5-HT transporter (5-HTT) is central to the fine-tuning of brain serotonergic neurotransmission and of the peripheral actions of 5-HT. In the brain, 5-HTT expression is particularly abundant in cortical and limbic areas involved in emotional aspects of behavior (5). The human 5-HTT is encoded by a single gene (SLC6A4) on chromosome 17q12 (6-8). Although 5-HTT has long been suspected to play a role in behavioral and psychiatric disorders, previous studies did not reveal any common, replicated 5-HTT gene sequence variation in either neuropsychiatric patients or healthy individuals (9).

Recently, we reported a polymorphism in the transcriptional control region upstream of the 5-HTT coding sequence (10). Initial experiments demonstrated that the long and short variants of this 5-HTT gene-linked polymorphic region (5-HTTLPR) had different transcriptional efficiencies when fused to a reporter gene and transfected into human placental choriocarcinoma (JAR) cells (10). The 5-HTTLPR is located ~1 kb upstream of the 5-HTT gene transcription initiation site and is composed of 16 repeat elements. The polymorphism consists of a 44-base pair (bp) insertion or deletion involving repeat elements 6 to 8 (Fig. 1A). In the present study, polymerase chain reaction (PCR)-based genotype analysis of 505 subjects revealed allele frequencies of 57% for the long (l) and 43% for the short (s) allele ( 1). The 5-HTTLPR genotypes were distributed according to Hardy-Weinberg equilibrium: 32% l/l, 49% l/s, and 19% s/s.

Because appropriate cell models for human serotonergic neurons do not exist and JAR cells are monozygotic for the 5-HTTLPR, we studied 5-HTT gene expression in human lymphoblastoid cell lines. Like 5-HT neurons and JAR cells, lymphoblasts constitutively express functional 5-HTT and exhibit adenosine 3′,5′-monophosphate (cAMP)-dependent and protein kinase C (PKC)-dependent 5-HTT gene regulation, but they do not express dopamine or norepinephrine transporters (12). Cell lines with the complete range of different 5-HTTLPR genotypes can readily be obtained (13).

Lymphoblast cell lines with different genotypes were first transfected with constructs in which a luciferase reporter gene was fused to ~1.4 kb of the 5′-flanking promoter sequence containing the l or s form of the 5-HTTLPR (11, 13, 14). The basal activity of the l variant was more than twice that of the s form of the 5-HTT gene promoter (Fig. 1B). Stimulation of PKC by phorbol 12-myristate 13-acetate (PMA) or activation of adenylyl cyclase with forskolin-induced transcriptional activity was observed in both the l and s promoter variants, but the dose-dependent increases remained proportionally smaller in the s variant (Fig. 1B).

Although transfection experiments with reporter gene constructs are useful in assaying the transcriptional competence of a promoter sequence, they could conceivably give spurious results because of the absence of distant control elements or chromatin effects. Therefore, we next studied the expression of the native 5-HTT gene in lymphoblast cell lines cultured from subjects with different 5-HTTLPR genotypes (15). Cells homozygous for the l form of the

SCIENCE • VOL. 274 • 29 NOVEMBER 1996 1527



ORIGINAL ARTICLE

Meta-analysis of genome-wide association studies for personality

MHM de Moor1, PT Costa2, A Terracciano2, RF Krueger3, EJC de Geus1, T Toshiko2, BWJH Penninx4,5,6, T Esko7,8,9, PAF Madden10, J Derringer3, N Amin11, G Willemsen1, J-J Hottenga1, MA Distel1, M Uda12, S Sanna12, P Spinhoven5, CA Hartman4, P Sullivan13, A Realo14, J Allik14, AC Heath10, ML Pergadia10, A Agrawal10, P Lin10, R Grucza10, T Nutile15, M Ciullo15, D Rujescu16, I Giegling16, B Konte16, E Widen17, DL Cousminer17, JG Eriksson18,19,20,21,22, A Palotie17,23,24,25, L Peltonen17,23,24,25,†, M Luciano26, A Tenesa27, G Davies26, LM Lopez26, NK Hansell28, SE Medland28, L Ferrucci2, D Schlessinger2, GW Montgomery28, MJ Wright28, YS Aulchenko11, ACJW Janssens11, BA Oostra29, A Metspalu7,8,9, GR Abecasis30, IJ Deary26, K Raikkonen31, LJ Bierut10, NG Martin28, CM van Duijn11,32 and DI Boomsma1,32

1Department of Biological Psychology, VU University Amsterdam, Amsterdam, The Netherlands; 2National Institute on Aging, NIH, Baltimore, MD, USA; 3Department of Psychology, University of Minnesota, Minneapolis, MN, USA; 4Department of Psychiatry, University Medical Center Groningen, Groningen, The Netherlands; 5Departments of Clinical Psychology and Psychiatry, Leiden University, Leiden, The Netherlands; 6Department of Psychiatry, EMGO+ Institute, Neuroscience Campus Amsterdam, VU University Medical Center Amsterdam, Amsterdam, The Netherlands; 7Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia; 8Estonian Biocentre, Tartu, Estonia; 9Estonian Genome Project of University of Tartu, Tartu, Estonia; 10Department of Psychiatry, Washington University School of Medicine, St Louis, MO, USA; 11Department of Epidemiology, Erasmus University Medical Center, Rotterdam, The Netherlands; 12Istituto di Neurogenetica e Neurofarmacologia, CNR, Monserrato, Cagliari, Italy; 13Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA; 14Department of Psychology, University of Tartu, Tartu, Estonia; 15Institute of Genetics and Biophysics, A Buzzati-Traverso, CNR, Naples, Italy; 16Department of Psychiatry, University of Munich (LMU), Munich, Germany; 17Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland; 18National Institute for Health and Welfare, Helsinki, Finland; 19Department of General Practice and Primary Health Care, University of Helsinki, Helsinki, Finland; 20Vasa Central Hospital, Vasa, Finland; 21Unit of General Practice, Helsinki University Central Hospital, Helsinki, Finland; 22Folkhalsan Research Centre, Helsinki, Finland; 23Wellcome Trust Sanger Institute, Cambridge, UK; 24Broad Institute of Harvard and MIT, Cambridge, MA, USA; 25Department of Medical Genetics, University of Helsinki and Helsinki University Hospital, Helsinki, Finland; 26Department of Psychology, Centre for Cognitive Ageing and Cognitive Epidemiology, The University of Edinburgh, Edinburgh, UK; 27MRC Human Genetics Unit, The Institute of Genetics and Molecular Medicine, Western General Hospital, Edinburgh, UK; 28Queensland Institute of Medical Research, Brisbane, QLD, Australia; 29Department of Clinical Genetics, Erasmus University Medical Center, Rotterdam, The Netherlands; 30Department of Biostatistics, Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA and 31Department of Psychology, University of Helsinki, University of Helsinki, Helsinki, Finland

Personality can be thought of as a set of characteristics that influence people’s thoughts, feelings and behavior across a variety of settings. Variation in personality is predictive of many outcomes in life, including mental health. Here we report on a meta-analysis of genome-wide association (GWA) data for personality in 10 discovery samples (17 375 adults) and five in silico replication samples (3294 adults). All participants were of European ancestry. Personality scores for Neuroticism, Extraversion, Openness to Experience, Agreeableness and Conscientiousness were based on the NEO Five-Factor Inventory. Genotype data of ~2.4M single-nucleotide polymorphisms (SNPs; directly typed and imputed using HapMap data) were available. In the discovery samples, classical association analyses were performed under an additive model followed by meta-analysis using the weighted inverse variance method. Results showed genome-wide significance for Openness to Experience near the RASA1 gene on 5q14.3 (rs1477268 and rs2032794, P = 2.8 × 10^−8 and 3.1 × 10^−8) and for Conscientiousness

Received 10 April 2010; revised 14 November 2010; accepted 16 November 2010; published online 21 December 2010

Correspondence: Dr MHM de Moor, Department of Biological Psychology, VU University Amsterdam, van der Boechorststraat 1, 1081 BT, Amsterdam, The Netherlands. E-mail: [email protected]
†Deceased.
32Equal last authorship.

Molecular Psychiatry (2012) 17, 337–349
© 2012 Macmillan Publishers Limited All rights reserved 1359-4184/12

www.nature.com/mp

34


35


Hypothesis-driven tests in neuroimaging

• Paradigmatic case is focusing on a single ROI for theoretical reasons

• This is often held up as an example of the right way to do science

• Is it?
  • How many ROIs were actually tested?
  • Were the ROIs truly a priori?
  • Correction for multiple comparisons?
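To see why the number of ROIs actually tested matters, here is a small sketch of how the chance of at least one false positive grows with the number of tests. It assumes the tests are independent, which real ROIs usually are not, so treat the numbers as a rough guide. The function names are illustrative only.

```python
def familywise_error_rate(alpha, n_tests):
    """P(at least one false positive) across n independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(alpha, n_tests):
    """Per-test threshold that keeps the familywise rate at or below alpha."""
    return alpha / n_tests


for n_rois in (1, 5, 20):
    print(
        n_rois,
        round(familywise_error_rate(0.05, n_rois), 3),
        bonferroni_alpha(0.05, n_rois),
    )
```

A "single a priori ROI" that was quietly chosen from five candidates therefore behaves more like a test at roughly alpha = 0.23 than one at 0.05.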

36


Seeing things

• Can you distinguish good data from bad?

37


38


39


40


The first principle

41

One example: Millikan measured the charge on an electron by an experiment with falling oil drops, and got an answer which we now know not to be quite right. It's a little bit off, because he had the incorrect value for the viscosity of air. It's interesting to look at the history of measurements of the charge of the electron, after Millikan. If you plot them as a function of time, you find that one is a little bigger than Millikan's, and the next one's a little bit bigger than that, and the next one's a little bit bigger than that, until finally they settle down to a number which is higher.

Why didn't they discover that the new number was higher right away? It's a thing that scientists are ashamed of--this history--because it's apparent that people did things like this: When they got a number that was too high above Millikan's, they thought something must be wrong--and they would look for and find a reason why something might be wrong. When they got a number closer to Millikan's value they didn't look so hard. And so they eliminated the numbers that were too far off, and did other things like that. ...

The first principle is that you must not fool yourself--and you are the easiest person to fool. So you have to be very careful about that. After you've not fooled yourself, it's easy not to fool other scientists. You just have to be honest in a conventional way after that.

--Richard Feynman, Cargo Cult Science


You are the easiest person to fool

“For example, the maximal correlation between right superior PFC activation and choice performance was r = -.75. The magnitude of this correlation may surprise those unfamiliar with recent imaging studies that find similarly strong correlations between isolated brain activity and overt behavior across a wide variety of contexts ... We would suggest that given the magnitude of the correlations obtained, it is not unreasonable to entertain the possibility that complex cognitive functions are localized to specific regions.”

Yarkoni et al. (2005), p. 552


Big Data helps

• It’s more difficult to fool yourself when you have large N’s

• Fudging tends to take place on the margins
  • With enough data, there are no margins

• Large datasets are more likely to be public, transparent (e.g., fc1000)

• Not a panacea, of course
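One way to make the large-N point concrete is a quick null simulation: correlate pure noise "behavior" with pure noise "activation" in many regions and keep the strongest correlation. This is a sketch under made-up settings (15 vs. 500 subjects, 500 regions, Gaussian noise), not an analysis of any real dataset.

```python
import random
import statistics


def pearson_r(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def max_null_correlation(n_subjects, n_regions, rng):
    """Largest |r| between a noise 'behavior' score and noise in each region."""
    behavior = [rng.gauss(0, 1) for _ in range(n_subjects)]
    best = 0.0
    for _ in range(n_regions):
        activation = [rng.gauss(0, 1) for _ in range(n_subjects)]
        best = max(best, abs(pearson_r(behavior, activation)))
    return best


rng = random.Random(0)  # fixed seed so the run is repeatable
small_n = max_null_correlation(n_subjects=15, n_regions=500, rng=rng)
large_n = max_null_correlation(n_subjects=500, n_regions=500, rng=rng)
print(f"max |r| with N=15:  {small_n:.2f}")
print(f"max |r| with N=500: {large_n:.2f}")
```

With no true effect anywhere, the small sample still tends to produce an impressive-looking peak correlation, while the large sample leaves much less room for one.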

43


III. The importance of being specific

44


45


The problem of reverse inference

• Reasoning from brain activity to mental states is hard (Poldrack, 2006)

• P(State|Activation) is not the same as P(Activation|State)

• Saying activation X implies state S1 is simultaneously saying that X implies the absence of states S2, S3, S4 ...

• But how would we know?

• No individual study, no matter how big, can tell us
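The gap between P(Activation|State) and P(State|Activation) is just Bayes' rule, and it depends on how often the region activates when the state is absent. The sketch below uses invented probabilities purely for illustration; none of them come from real data.

```python
def posterior(p_act_given_state, p_state, p_act_given_not_state):
    """P(State | Activation) via Bayes' rule."""
    p_act = (
        p_act_given_state * p_state
        + p_act_given_not_state * (1.0 - p_state)
    )
    return p_act_given_state * p_state / p_act


# Hypothetical numbers: the region activates in 80% of studies of the state,
# and the state is the topic of 10% of all studies.
selective = posterior(0.8, 0.1, 0.05)    # region rarely active otherwise
promiscuous = posterior(0.8, 0.1, 0.60)  # region active in most other studies too

print(round(selective, 3))
print(round(promiscuous, 3))
```

The forward probability is identical in both cases; only the base rate of activation elsewhere differs, and estimating that base rate requires data from many other kinds of studies.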

46


Meta-analysis doesn’t help

Molenberghs et al. (2011) - Mirror system

Engelmann et al. (2011) - Smoking cue reactivity

Kim (2011) - Subsequent memory

Kim et al. (2011) - Task-switching

Chase et al. (2011) - Drug craving

47


Fan et al. (2011) - Empathy

Swick et al. (2011) - Go/NoGo & Stop-Signal

Rehme et al. (2012) - Movement after stroke

Brooks et al. (2011) - Subliminal arousing stimuli

Tillisch et al. (2011) - Rectal distension

48


All regions are not equal(ly active)

Nelson et al. (2010)

49


E.g., anterior insula is “selectively” activated by...

• Disgust

• Pain

• Empathy

• Risk

• Interoceptive awareness

• Autonomic arousal

• Self-reflection

• Task-switching

• Conscious error perception

• Response inhibition

• Speech production

• Sustained attention

• Etc...

50


Conventional meta-analysis is not enough

• We need comprehensive coverage

• Must directly compare studies of one task with all others

• Not easy to do manually

• Large-scale data mining is necessary

51


IV. One how

• There are many, many data mining approaches

• You’ve already learned about several in this course

• You’ll learn about some more in the next few days

• But I’m giving this particular talk

• This particular talk where you’re sitting here and listening to me reading directly off the slide

• Which is the kind of thing you should never, ever do

• Ever

• So I will now talk about my approach

52


53


Neurosynth is...

• A website

54


A website that has...

• Automated meta-analysis images for hundreds of concepts

• Data on ~6,000 neuroimaging studies

• 150,000+ coactivation maps

• An image library you can explore

55


Neurosynth is...

• A website

• A database

56


A database

• The largest extant database of neuroimaging activation coordinates

• Maps activations reported in articles onto concepts

• Generated using two very simple text mining assumptions

57


Assumption 1: If it looks like a duck...

• Anything that looks like brain activation is brain activation

• Parser looks for x/y/z-like numbers in sequence within HTML tables
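A rough sketch of the "looks like a duck" idea: walk the rows of an HTML table and accept the first run of three consecutive number-like cells as an x/y/z focus. This is not the actual Neurosynth/ACE parser; the class and function names, the sample table, and the plausibility bounds are all assumptions made for illustration.

```python
import re
from html.parser import HTMLParser

# A cell "looks like" a coordinate if it is a short signed number.
NUMBER = re.compile(r"^[-−]?\d{1,3}(\.\d+)?$")


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows, self._row, self._cell = [], None, None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_foci(html):
    """Return one (x, y, z) tuple per row that contains three numeric cells in a row."""
    parser = TableRows()
    parser.feed(html)
    foci = []
    for row in parser.rows:
        for i in range(len(row) - 2):
            triple = row[i:i + 3]
            if all(NUMBER.match(cell) for cell in triple):
                x, y, z = (float(c.replace("−", "-")) for c in triple)
                # Assumed bounds (mm) for a coordinate inside a standard brain.
                if abs(x) <= 100 and abs(y) <= 130 and abs(z) <= 100:
                    foci.append((x, y, z))
                    break
    return foci


SAMPLE_HTML = (
    "<table>"
    "<tr><th>Region</th><th>x</th><th>y</th><th>z</th></tr>"
    "<tr><td>Anterior insula</td><td>−36</td><td>0</td><td>4</td></tr>"
    "<tr><td>Occipital</td><td>40</td><td>-64</td><td>-8</td></tr>"
    "</table>"
)
print(extract_foci(SAMPLE_HTML))
```

A heuristic this simple will misread some tables (for example, a cluster-size column placed just before x), which is one reason the extracted data are noisy.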

58


Assumption 2: The brain is a bag of words

• Each published article is an unordered list of words

• A study is about the words it uses with high frequency
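The bag-of-words assumption can be sketched in a few lines: count words, ignore order, and tag a study with every term whose relative frequency clears a cutoff. The function names, the example sentence, and the threshold values here are illustrative assumptions rather than the settings used to build the database.

```python
import re
from collections import Counter


def term_frequencies(text):
    """Relative frequency of each lowercase word; word order is discarded."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    total = len(words)
    return {word: count / total for word, count in counts.items()}


def tag_study(text, terms, threshold=0.001):
    """Terms the study is 'about': those used at or above the frequency cutoff."""
    freqs = term_frequencies(text)
    return {term for term in terms if freqs.get(term, 0.0) >= threshold}


# Hypothetical one-sentence "article" (11 words).
article = (
    "Painful heat stimulation evoked pain ratings; "
    "pain intensity tracked insula activity."
)
candidate_terms = ["pain", "memory", "insula"]

print(tag_study(article, candidate_terms, threshold=0.1))
```

Note that "painful" does not count toward "pain" here: with no stemming or ontology, related words are treated as unrelated features.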

59


Yarkoni et al (2011, Nature Methods)

What these assumptions buy us

60


[Figure: voxels and terms plotted on the z = 0 slice; term labels shown: visual, pain, painful, memory, noxious, perception, auditory.]

Figure 1: Visualization of the voxel space x restricted to the z = 0 transversal slice. The voxels and terms with the highest associations are plotted together with a blue outline of the brain and red Talairach axes. For each term only the 50 most associated voxels are shown. The labeling on axes are the centimeters from the anterior commissure, the structure indicating the origo of the Talairach space.

Tzourio-Mazoyer et al., 2002). This is in agreement with an expert review stating “the anterior cingulate cortex (ACC) is the cortical region that is activated in almost every study of elicited pain [...]” (Ingvar, 1999). Figure 1 shows an example where the analysis has been restricted to the z = 0 transversal slice (z is the inferior-superior axis). Under this constraint the most important “pain” locations are found in the anterior insula and thalamus, as also found in pain studies (Ingvar, 1999). A number of other pain-related words appear in the list, e.g., painful, noxious and heat. These are associated with voxels in the insular region (−36, 0, 4). Thus the highest association are in alignment with an expert review (Ingvar, 1999). Figure 2 displays the load of the individual studies on the pain components at the voxel with the highest association. 19 papers in total contribute to the association and at the specific voxel with the highest association there are 3 studies that dominates with over 62% of the probability. The reason why the pain topic dominates is that the Brede database presently records many thermal pain studies. Other words are associated with these frequent occurring studies: warm, hot, cold, somatosensory and sensation, and the general term “perception” associates with the same voxel as “noxious”.

The second highest is “visual” associated with (40,−64,−8). This is labeled as occipital lobe by the


Mining for associations between text and brain activation in a functional neuroimaging database

Finn Arup Nielsen1,2, Lars Kai Hansen2, Daniela Balslev1,3

1Neurobiology Research Unit, Rigshospitalet, Copenhagen University Hospital, Copenhagen, Denmark;

2Informatics and Mathematical Modelling, Technical University of Denmark, Lyngby, Denmark

3Danish Research Centre for Magnetic Resonance, Copenhagen University Hospital Hvidovre, Denmark

January 4, 2005

Abstract

We describe a method for mining a neuroimaging database for associations between text and brain locations. The objective is to discover association rules between words indicative of cognitive function as described in abstracts of neuroscience papers and sets of reported stereotactic Talairach coordinates. We invoke a simple probabilistic framework in which kernel density estimates are used to model distributions of brain activation foci conditioned on words in a given abstract. The principal associations are found in the joint probability density between words and voxels. We show that the statistically motivated associations are well aligned with general neuroscientific knowledge.

Keywords: Databases; Data Interpretation, Statistical; Information Storage and Retrieval; Magnetic Resonance Imaging; Positron-Emission Tomography; Brain Mapping; Meta-Analysis; Neuroimaging; Data mining

1 Introduction

One of the important goals of functional neuroimaging is to associate a specific cognitive function with a specific brain area or a network of such areas. A neuroimaging study typically investigates a cognitive function by manipulating the subjects’ behavior and observes the changes in brain activity that correlate with this manipulation. Brain areas with the highest change in activity are reported as a list with three-dimensional coordinates (locations) of the activation foci in the so-called Talairach space, a stereotactic space allowing approximate comparison of brains from different humans (Talairach and Tournoux, 1988). The accuracy of such ‘brain maps’, however, depends critically on several factors. The experiment should be able to control the subject’s behavior; the data acquisition protocol should be able to accurately measure the changes; the data analysis should be able to tease out the stimulus-induced changes from other fluctuations and confounds; the registration of the activation locations to the standard volume should be accurate. It is very difficult if not impossible for an experiment to fulfill these criteria, hence, a single neuroimaging study represents a ‘noisy’ measurement of the given effect. To reduce the noise in a neuroimaging database we may investigate for consensus in the brain-behavior relationships across a multiple relevant studies. Ideally, this would be a search through the published neuroimaging studies, in practice we are limited to mining of rather limited databases.


61


Pain vs. pain

Blue: manual; Green: automated

62


63


Quantitative reverse inference

64


Current state of the database

• 5,800 studies drawn from 50+ journals

• ~ 200,000 reported foci

• Represents almost any psychological state that can be indexed with words

• Considerable potential for growth--many more coming soon!

• Database is completely open

65


Neurosynth is...

• A website

• A database

• Software

66


Open-source software

• http://github.com/neurosynth/

• Available packages:
  • ACE: Automated coordinate extraction and very basic text mining (Python)
  • Core tools: Large-scale synthesis of fMRI data (Python)
  • Neurosynth viewer: JavaScript library for interactive 2D visualization (CoffeeScript)

• Everything on the website was generated using these tools

• Focus is on simple, large-scale analysis and ease of use

67



Running a meta-analysis

68


69


Identifying regions coactivated with a seed

70


Coactivation

• A common question in fMRI research: what brain regions co-activate?

• Meta-analytic functional connectivity (Toro et al, 2008; Robinson et al., 2010)

• For any given region, what other regions are reported in the same articles?
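The last question above (which other regions are reported in the same articles as a seed?) reduces to counting co-occurrences across studies. The sketch below works on region labels for brevity, whereas a coordinate database would first need a rule for deciding that a focus falls "in" a region; the study IDs, labels, and function name are invented for illustration.

```python
from collections import Counter


def coactivation_profile(study_regions, seed):
    """Among studies reporting the seed, the fraction that also report each other region."""
    seed_studies = [regions for regions in study_regions.values() if seed in regions]
    if not seed_studies:
        return {}
    counts = Counter()
    for regions in seed_studies:
        counts.update(region for region in regions if region != seed)
    return {region: n / len(seed_studies) for region, n in counts.items()}


# Hypothetical database: study ID -> set of regions with reported activation.
studies = {
    "s1": {"amygdala", "vmPFC"},
    "s2": {"amygdala", "vmPFC", "insula"},
    "s3": {"insula", "dACC"},
    "s4": {"amygdala", "dACC"},
}

profile = coactivation_profile(studies, seed="amygdala")
for region, fraction in sorted(profile.items(), key=lambda item: -item[1]):
    print(region, round(fraction, 2))
```

Raw co-occurrence fractions favor regions that are active in many studies regardless of the seed, so a fuller analysis would compare them against each region's overall base rate.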

[Figure: panels (a)-(d). Panel titles: Subdivisions of rat MPFC; Valence effects in human emotion (positive emotion, negative emotion); Co-activation maps. Region labels: PL, IL, dACC, rdACC, vmPFC, ACC, aMCC, pMCC, dmPFC, amygdala, dorsal raphe, ventral PAG, dorsal PAG, nucleus accumbens, hypothalamus.]

Roy, Shohamy, & Wager (2012)

71


Data-driven insula parcellation

Chang, Yarkoni, Khaw & Sanfey (2013)

72


Insula networks

Chang, Yarkoni, Khaw & Sanfey (2013)

73


Classification and decoding

74


Classification of cognitive states

Yarkoni et al. (2011, Nature Methods)

75


76


Decoding mental states in single subjects

• Can we identify cognitive states in individual subjects?

• Test ability to classify working memory, emotion, and pain

77

668 | VOL.8 NO.8 | AUGUST 2011 | NATURE METHODS

ARTICLES

goal-directed cognition [21–23] showed consistent activation in the forward analysis but were relatively nonselective in the reverse analysis (Fig. 2). For emotion, the reverse inference map revealed much more selective activation in the amygdala and ventromedial prefrontal cortex (Fig. 3). For pain, the regions of maximal pain-related activation in the insula and DACC shifted from anterior foci in the forward analysis to posterior ones in the reverse analysis (Fig. 3). This is consistent with studies of nonhuman primates that have implicated the dorsal posterior insula as a primary integration center for nociceptive afferents [24] and with studies of humans in which anterior aspects of the so-called ‘pain matrix’ responded nonselectively to multiple modalities [25].

Several frontal regions that showed consistent activation for emotion and pain in the forward analysis were associated with a decreased likelihood that a study involved emotion or pain in the reverse inference analysis (Fig. 3). This seeming paradox reflected the fact that even though lateral and medial frontal regions had been consistently activated in studies of emotion and pain, they had been activated even more frequently in studies that did not involve emotion or pain (Supplementary Fig. 8). Thus, the fact that these regions showed involvement in pain and emotion probably reflected their much more general role in cognition (for example, sustained attention or goal-directed processing [22,23]) rather than processes specific to pain or emotion.

These results showed that without the ability to distinguish consistency from selectivity, neuroimaging data can produce misleading inferences. For instance, neglecting the high base rate of DACC activity might lead researchers in the areas of cognitive control, pain and emotion to conclude that the DACC has a key role in each domain. Instead, because the DACC is activated consistently in all of these states, its activation may not be diagnostic of any one of them and conversely, might even predict their absence. The NeuroSynth framework can potentially address this problem by enabling researchers to conduct quantitative reverse inference on a large scale.

Open-ended classification of cognitive states

An emerging frontier in human neuroimaging is brain ‘decoding’: inferring a person’s cognitive state from their observed brain activity. The problem of decoding is essentially a generalization of the univariate reverse inference problem addressed above: instead of predicting the likelihood of a particular cognitive state given activation at a single voxel, one can generate a corresponding prediction based on an entire pattern of brain activity. The NeuroSynth framework is well positioned for such an approach: whereas previous decoding approaches have focused on discriminating between narrow sets of cognitive states and have required extensive training on raw fMRI datasets (for example, refs. 26–28), the breadth of cognitive concepts represented in the NeuroSynth database affords relatively open-ended decoding, with little or no training on new datasets.

To assess the ability of our approach to decode and classify cognitive states, we trained a naive Bayes classifier [29] that could discriminate between flexible sets of cognitive states given new images as input (Fig. 1c). First, we tested the classifier’s ability to classify studies in the NeuroSynth database that had been associated with different terms. In a tenfold cross-validated analysis, the classifier discriminated between studies of working memory, emotion and pain with high sensitivity and specificity (Fig. 4a), showing that each of these domains had a relatively distinct neural signature (Fig. 4b).

To assess the classifier’s ability to decode cognitive states in individual human subjects, we applied the classifier to 281 single-subject activation maps derived from contrasts between: n-back working memory performance and rest (94 maps); negative and neutral emotional photographs (108 maps); and intense and mild thermal pain (79 maps). The classifier performed substantially above chance, identifying the originating study type with sensitivities of 94%, 70% and 65%, respectively (chance = 33%), and specificities of 80%, 86% and 98% (Fig. 4a). Moreover, there were systematic differences in activation patterns for correctly and incorrectly classified subjects. For example, incorrectly classified subjects in physical pain tasks (Fig. 4c) systematically activated the lateral orbitofrontal cortex and dorsomedial prefrontal cortex but not secondary somatosensory cortex or the posterior insula, suggesting that the discomfort owing to noxious heat in these subjects may have been qualitatively different (for example, emotionally generated versus physically generated pain). Thus, these findings demonstrate the viability of decoding cognitive states in new subjects without training and suggest new hypotheses for exploration.

Next, to generalize beyond working memory, emotion and pain, we selected 25 broad psychological terms that occurred at high frequency in the database (Fig. 5). We estimated classification accuracy in tenfold cross-validated two-alternative and multiclass analyses. The classifier performed substantially above

Figure 4 | Three-way classification of working memory, emotion and pain. (a) Naive Bayes classifier performance when cross-validated on studies in the database (left) or applied to individual subjects from studies not in the database (right). (b) Whole-brain maximum posterior probability map; each voxel is colored by the term with the highest associated probability. (c) Whole-brain maps showing the proportion of individual subjects in the three pain studies (n = 79 subjects total) who showed activation at each voxel (P < 0.05, uncorrected), averaged separately for subjects who were classified correctly (n = 51 subjects; top) or incorrectly (n = 28 subjects; bottom). Regions are color-coded according to the proportion of subjects in the sample who showed activation at each voxel.
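The excerpt above mentions a naive Bayes classifier that assigns an activation map to a cognitive state. A toy version of that idea, with each map reduced to a few binary "active / not active" features, might look like the sketch below. It is a generic Bernoulli naive Bayes with Laplace smoothing written for illustration; the feature choice, training examples, and function names are assumptions and do not reproduce the published implementation.

```python
import math


def train(examples, n_features, smoothing=1.0):
    """Estimate a prior and per-feature activation probability for each label.

    examples: list of (label, features), where features is a list of 0/1 values.
    """
    by_label = {}
    for label, features in examples:
        by_label.setdefault(label, []).append(features)
    model = {}
    for label, rows in by_label.items():
        prior = len(rows) / len(examples)
        probs = [
            (sum(row[j] for row in rows) + smoothing) / (len(rows) + 2 * smoothing)
            for j in range(n_features)
        ]
        model[label] = (prior, probs)
    return model


def classify(model, features):
    """Return the label with the highest log posterior for a 0/1 feature list."""
    def log_posterior(label):
        prior, probs = model[label]
        return math.log(prior) + sum(
            math.log(p if active else 1.0 - p)
            for p, active in zip(probs, features)
        )
    return max(model, key=log_posterior)


# Hypothetical features: [amygdala, posterior insula, dorsolateral PFC].
training = [
    ("emotion", [1, 0, 0]),
    ("emotion", [1, 0, 1]),
    ("pain", [0, 1, 0]),
    ("pain", [0, 1, 1]),
    ("working memory", [0, 0, 1]),
    ("working memory", [0, 0, 1]),
]
model = train(training, n_features=3)

print(classify(model, [1, 0, 0]))
print(classify(model, [0, 1, 0]))
print(classify(model, [0, 0, 1]))
```

The "naive" part is the assumption that features are independent given the label, which is clearly false for neighboring voxels but keeps training cheap enough to run over thousands of studies.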


Insula networks


78



A tale of two inferences

[Figure: two panels, Forward Inference and Reverse Inference, each spanning −0.2 to 0.8, for three insula regions (ventral, dorsal, posterior). Terms shown: Emotion, Gustation, Face, Anxiety, Olfaction, Switching, Inhibition, Error Processing, Conflict, Feedback, Pain, Somatosensory, Sensorimotor, Music, Auditory.]

79



Work in progress

80


All of this is kind of neat...

• (if I do say so myself)

• ...but there are all kinds of limitations

81


All kinds of limitations!

• The data are very low quality
  • No concept of contrasts, deactivations, etc.
  • No ontology--every article is a bag of words

• No centralization of data
  • Neurosynth works on a “pull” model

• Arguably the wrong kind of data
  • Coordinates are thin slices of information
  • Images! Images! We want images!

• Still requires (a bit of) technical skill to use

• These are not easy problems to solve
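To make the "bag of words" limitation above concrete: each article is reduced to word counts, so order and context (including negation or the direction of a contrast) are discarded. A minimal sketch, assuming a made-up frequency threshold and hypothetical helper names; this is not Neurosynth's actual tagging code:

```python
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase, keep alphabetic tokens, and count them; word order is lost."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def tag_terms(text, terms, min_freq=0.05):
    """Return the single-word terms whose relative frequency is at least min_freq.

    min_freq is an illustrative value, not a documented default.
    """
    counts = bag_of_words(text)
    total = sum(counts.values())
    return {t for t in terms if total and counts[t] / total >= min_freq}


# Two toy "articles" with the same words in a different order.
a = "Pain ratings increased. No pain was reported in controls."
b = "Controls reported pain. In no ratings was pain increased."

print(bag_of_words(a) == bag_of_words(b))
print(tag_terms(a, {"pain", "emotion"}))
```

Both texts produce identical counts and receive the same tag, even though they make opposite claims.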



Neurosynth is...

• A website

• A database

• Software

• Open



Open

• You can download all the data, tools, results

• We provide a REST API for structured queries
  • E.g., http://beta.neurosynth.org/studies/280.xml

• Allows other applications to hook into Neurosynth
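As a rough idea of how another application might hook in, the sketch below builds the study URL shown on the slide and parses an XML response. Only the URL pattern comes from the slide; the helper names and the sample XML are hypothetical, the real response schema is not shown here, and the beta host may no longer be reachable.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Host and path pattern taken from the slide; availability is not guaranteed.
BASE_URL = "http://beta.neurosynth.org"


def study_url(study_id):
    """Build the per-study XML URL, following the example on the slide."""
    return f"{BASE_URL}/studies/{study_id}.xml"


def fetch_study(study_id, timeout=10):
    """Download one study record and return the parsed XML root element."""
    with urllib.request.urlopen(study_url(study_id), timeout=timeout) as resp:
        return ET.fromstring(resp.read())


def flatten(root):
    """Map each leaf element's tag to its text, without assuming a schema."""
    return {el.tag: (el.text or "").strip() for el in root.iter() if len(el) == 0}


# Hypothetical response body, used so the parsing step can run offline.
SAMPLE = "<study><id>280</id><title>Example title</title></study>"

if __name__ == "__main__":
    print(study_url(280))
    print(flatten(ET.fromstring(SAMPLE)))
```

Calling `fetch_study(280)` would perform the actual request; it is left out of the demo so the example runs without network access.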




Improving data quality and annotation

• Revamped coordinate extraction package
  • Easy to add new publishers, journals, etc.
  • Quality improving slowly

• Text-mining is more than just word counting
  • E.g., “topic mapping” (Poldrack et al., 2012)

• Crowdsourcing
  • Can’t eliminate human effort
  • Can we build elegant interfaces that make contributing easy?
  • E.g., Roberto Toro’s brainspell


Building ontologies

• Neurosynth is deliberately ontology agnostic

• It will use anything you give it

• Examples:
  • Russ Poldrack’s LDA topics (Poldrack et al., 2012, PLoS Comp. Bio)
  • Josh Carp’s NeuroTrends dataset (Carp, 2012, NeuroImage)

• Developing a “feature set” library anyone can contribute to

• Can ask questions like:
  • Which ontologies carve brain activity more sensibly?
  • How are different ontologies related?


Towards images instead of coordinates

• We’re still relying on discrete coordinates

• We would prefer to operate on images (e.g., Salimi-Khorshidi et al., 2009)

• Can we incentivize people to share?



NeuroVault

• neurovault.org

• Spearheaded by Chris Gorgolewski (see poster #1445)

• A public repository of whole-brain images

• Why use it?
  • Permanent URL w/ free storage
  • Interactive visualization
  • Easy sharing
  • Interfaces with other tools



Enables new applications

• NeuroVault has an open API

• Any image can be downloaded

• Opens the door to new applications

• For example...



The medium-term goal...

• You will hopefully soon be able to:
  • Perform sophisticated real-time meta-analyses in-browser
  • Instantly share the results interactively
  • Evaluate all details of others’ meta-analyses
  • Annotate anything: voxels, contrasts, studies, etc.

• Part of a broader movement towards open infrastructure and tools in neuroscience and neurotechnology
  • Allen Brain Atlas
  • Human Connectome Project
  • Etc. etc. etc.



Thanks to...

R01MH096906

+ many others...

