53
Image from Russian Varieties of “Voodoo” Ed Vul MIT

Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

  • View
    218

  • Download
    3

Embed Size (px)

Citation preview

Page 1: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

Image from Russian Newsweek

Varieties of

“Voodoo”

Ed VulMIT

Page 2: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

neurocritic

bum

pyb

rain

s

“What is the most we can learn about the mind by studying the brain?”

Reception

“an unrepentant Vul”

“I haven't had so much fun since Watergate!”

“there are people who

would otherwise be dead if we

adopted Vul’s

opinions”

“grossly over-strong pronouncements”

“they look like idiots because they're out of their statistical depth”

“such interchanges, in my experience, lead to more heat than light”

Page 3: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

Reception“such interchanges, in my experience, lead to more heat

than light”

From SEED magazine

Page 4: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

Varieties of “voodoo”

r = 0.96

Historical perspective

What captured our attention?

The “voodoo correlation” methodand generalizations

Objections

Solutions

Page 5: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

B-Projective Psychokinesis TestEdward Cureton (1950)

Can we use the same data to make an item analysis and to validate the test?

Thanks to: Dirk Vorberg!

This analysis produces high

validity from pure noise.

851 . . .1) Noise data:

29 students & grades85 random flips/student

2) Find tags predicting GPA3) Evaluate validity on same data

Result: validity = 0.82

Page 6: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

Cureton (1950)

Page 7: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

A new name in each field

• Auctioneering: “the winner’s curse”• Machine learning: “testing on training

data”“data snooping”

• Modeling: “overfitting”• Survey sampling: “selection bias”• Logic: “circularity”• Meta-analysis: “publication bias”• fMRI: “double-dipping”

“non-independence”

Page 8: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

Varieties of “voodoo”

r = 0.96

Historical perspective

What captured our attention?

The “voodoo correlation” methodand generalizations

Objections

Solutions

Page 9: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

Large Correlations

Eisenberger, Lieberman, & Satpute (2005)

r = 0.86

Page 10: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

Large Correlations

Takahashi, Kato, Matsuura, Koeda, Yahata, Suhara, & Okubo (2008)

r = 0.82

Page 11: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

Large Correlations

Hooker, Verosky, Miyakawa, Knight, & D’Esposito (2008)

r = 0.88

Page 12: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

Large Correlations

Eisenberger, Lieberman, & Williams (2003)

r = 0.88

Page 13: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

How common are these

correlations?

Phil Nguyen surveyed

soc/aff/person fMRI, in search of these

correlations.

Vul, Harris, Winkielman, & Pashler (2009)

Page 14: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

What’s wrong with large correlations?

• (Expected) observed correlations are limited by reliability:

• Maximum expected observed correlation should be the geo. mean. of reliabilities…

Expected observed correlation

Geometric mean of measure reliabilities

True underlyingcorrelation

Page 15: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

Reliability of personality measures

• Viswesvaran and Ones (2000) reliability of the Big Five: .73 to .78.

• Hobbs and Fowler (1974)MMPI: .66 - .94 (mean=.84)

We say: 0.8 optimistic guess for ad hoc scales

Page 16: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

Reliability of fMRI measures

• (a lot of variability depending on measure)• Choose estimates most relevant for

across-subject correlations– 0-0.76 Kong et al. (2006)– 0.23 – 0.93 (mean=0.60) Manoach et al (2001) – ~0.8 Aron, Gluck, and Poldrack (2006)– 0.59-0.64 Grinband (2008)

• We say: not often over 0.7

Page 17: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

Maximum expected correlation

Expected observed correlation

Geometric mean of measure reliabilities

True underlyingcorrelation

fMRI reliability

Personalityreliability

Max.truecorr.

Maximum expected correlatio

n

0.7 0.81.00.74

Page 18: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

“Upper bound”?

Many exceeding the maximum possible expected correlation.

(variability is possible, but this struck us as exceedingly unlikely)

Vul, Harris, Winkielman, & Pashler (2009)

Page 19: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

Varieties of “voodoo”

r = 0.96

Historical perspective

What captured our attention?

The “voodoo correlation” methodand generalizations

Objections

Solutions

Page 20: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

How are these correlations produced?

Would you please be so kind as to answer a few very quick questions about the analysis that produced, i.e., the correlations on page XX?...Response?

“No response”

The data plotted reflect the percent signal change or difference in parameter estimates (according to some contrast) of…1 …the average of a number of voxels2 …one peak voxel that was most significant according to some functional measure

1) What is the brain measure: One peak voxel, or average of several voxels?

If 1: The voxels whose data were plotted (i.e., the “region of interest”) were selected based on…1a …only anatomical constraints1b …only functional constraints1c … anatomical and functional constraints

1.1) How was this set of voxels selected?

Go to question 2“Independent”

Page 21: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

How are these correlations produced?The functional measure used to

select the voxel(s) plotted in the figure was…A …a contrast within individual subjectsB …the result of running a regression, across subjects, of the behavioral measure of interest against brain activation (or a contrast) at each voxelC …something else?

2) What sort of functional selection?

Finally: the fMRI data (runs/blocks/trials) displayed in the figure were…A …the same data as those employed in the analysis used to select voxelsB …different data from those employed in the analysis used to select voxels.

3) Same or different data?“Independent”

“Non-Independent”

54% said correlations were computed in voxels that were selected for containing high correlations.

Page 22: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

54% did what?

Vul, Harris, Winkielman, & Pashler (2009)

Page 23: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

What would this do on noise?Simulate 10 subjects, 10,000 voxels each.

This analysis produces high correlations from pure noise.

Vul, Harris, Winkielman, & Pashler (2009)

Page 24: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

False alarms and inflation

Vul, Harris, Winkielman, & Pashler (2009)

Multiple comparisons correction:

1) Minimizes false alarms2) Exacerbates effect-size

inflation

Page 25: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

Source of high correlations

Our “upper bound”

estimateColor coded by analysis

method ascertained from survey.

Vul, Harris, Winkielman, & Pashler (2009)

Biased method produced the

majority of the “surprisingly high”

correlations

Page 26: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

Varieties of “voodoo”

r = 0.96

Historical perspective

What captured our attention?

The “voodoo correlation” methodand generalizations

Objections

Solutions

Page 27: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

Same problem in different guises

• Null-hypothesis tests on non-independent data.

• Computing meaningless effect sizes (e.g., correlations).

• Showing uninformative/misleading graphs…(with error bars?)

Page 28: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

Testing null-hypothesesnon-independently

Somerville et al, 2006

r = 0.48 p<0.038

Partially non-

independent null-

hypotheses

Page 29: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

Baker, Hutchison, & Kanwisher (2007)

High selectivity from pure

noise.

Page 30: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

Computing effect sizesnon-independently

“r2 = 0.74”

Eisenberger, Lieberman, & Satpute (2005)

A “voodoo” correlation

Page 31: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

Modulating motion perceptionnon-independently

Low load High load

Motion No motion Motion No motion

!

Non-independent selection produces bizarre spurious

patternsRees, Frith, & Lavie (1997)

Page 32: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

Data Analysis

voxels

M2(data2)

M1(data1) > α

Reporting peak voxel, cluster mean, etc. amounts to two

analysis steps.

Page 33: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

Data Analysis

voxels

M2(data2)

M1(data1) > αM1 = M2

&

data1 = data2

Non-Independent:

M1 ≠ M2

or

data1 ≠ data2

Independent:

Page 34: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

Data Analysis

voxels

M2(data2)

M1(data1) > α

Non-Independent: Independent:

Conclusions can only be based on selection step

M2 is biased

Conclusions can be based on both

measures

M2 provides unbiased

estimates of effect size

Page 35: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

Data Analysis

voxels

M2(data2)

M1(data1) > α

Non-Independent: Independent:

Requires the most stringent multiple comparisons corrections!

(and reduced power)

Selection step need not have

the most stringent

corrections.

Inferences from M2 amount to a single test:

greater power

Page 36: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

Worst possible combination:

• Inadequate multiple comparisons (often used justifiably in independent analyses)

• Non-independent null-hypothesis testing.

• No conclusions may be drawn from either step!

Page 37: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

p < 0.005uncorrected

peak voxel:F(2,14) = 22.1;

p < .0004

Significant results,

potentially out of nothing!

Summerfield et al, 2006

Uncorrected thresholds and non-independence

Page 38: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

Significant results, potentially out of nothing!

Eisenberger et al, 2003

The “Forman” error and non-independence

p < 0.005cluster-size = 10

r = 0.88p < 0.005

Page 39: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

Generalizations

• Whenever the same data and measure are used to select voxels and later assess their signal:– Effect sizes will be inflated (e.g., correlations)– Data plots will be distorted and misleading– Null-hypothesis tests will be invalid– Only the selection step may be used for

inference

• If multiple comparisons are inadequate, results may be produced from pure noise.

Page 40: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

Varieties of “voodoo”

r = 0.96

Historical perspective

What captured our attention?

The “voodoo correlation” methodand generalizations

Objections

Solutions

Page 41: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

This is not specific to social neuroscience!

• We completely agree: the problem is very general

• Some fields are more susceptible:– fMRI, genetics, finance, immunology, etc.

• Non-independence arises when – A whole lot of data are available– Only some contains relevant signal– Researchers try to find relevant bits, and

estimate signal without extra costs of data

Page 42: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

Inflation isn’t so bad

• Cannot estimate inflation in general – must reanalyze data.

• We know of one comparison (in real data) of independent – non-independent correlations

Poldrack and Mumford (submitted)

~0.8 down to ~0.5

(non-indep. R2 inflated by

more than a factor of 2)

Page 43: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

Location, not magnitude, matters

• Do we want to study anatomy for its own sake?• Only location matters, but…

Localization seems to be overestimated by underpowered studies:Studies with adequate power yield small widespread correlations(Yarkoni and Braver, in press)

• Do we want to use measures/manipulations of the brain to predict/alter behavior?• Effect size really matters!

Page 44: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

Restriction of range?

• Lieberman et al:Independent studies are underestimating correlations due to restricted range

• No evidence of such an effect in our surveyed data95% CI: -0.02 – 0.06

Page 45: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

Inflated correlations not used for inference?

Eisenberger, Lieberman, & Satpute (2005)

Page 46: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

Varieties of “voodoo”

r = 0.96

Historical perspective

What captured our attention?

The “voodoo correlation” methodand generalizations

Objections

Solutions

Page 47: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

Suggestions for practitioners

1: Independent localizersUse different data or orthogonal measure to select voxels.

Page 48: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

Suggestions for practitioners

1: Independent localizers2: Cross-validation methods

Use one portion of data to select, another to obtain a validation measure, repeat.

What is “left out” determines generalization:- leave one run out: generalize to new runs of

same subjects - leave one subject out: generalize to new subjects!

Page 49: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

Suggestions for practitioners

1: Independent localizers2: Cross-validation methods3: Random/Permutation simulations to

evaluate independenceIf orthogonality of measures or independence of data are not obvious, try the analysis on random data (Generates empirical null-hypothesis distributions.)

Page 50: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

Warnings for consumers/reviewers

• If analysis is not independent:Do not be seduced by biased graphs, statisticsRely only on the multiple comparisons correction in the voxel selection step for conclusions.

• Heuristics for spotting non-independence– Heat-map, arrow, plot– New coordinates for

each analysis

• Read the methods!

Page 51: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

Questions for statisticians

• How can we quantify spatial uncertainty given anatomical variability?–…to assess “replications”–…to cross-validate across subjects–…to generalize results

• Is it possible to jointly estimate signal location and strength without bias?

?

Page 52: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

Closing• Reusing the same data and measure to

(a) select data and (b) estimate effect sizes leads to overestimation: “Baloney” - Cureton (1950)

• Highest correlations in social neuroscience individual differences research are obtained in this manner: “Voodoo”

• Many fMRI papers contain variants of this error 42% in 2008 (Kriegeskorte et al, 2009; Vul & Kanwisher, in press)

Simple solutions exist to avoid these problems!

Page 53: Image from Russian Newsweek Varieties of “Voodoo” Ed Vul MIT

Thanks!Thanks to:

Nancy Kanwisher, Hal Pashler, Chris Baker, Niko Kriegeskorte, Russ Poldrack

and many others for very engaging discussions.

Vul, Harris, Winkielman, Pashler (2009) Puzzlingly high correlations in fMRI studies of emotion, personality and social cognition, Perspectives on Psychological Science.

Kriegeskorte, Simmons, Bellgowan, Baker (2009) Circular analysis in systems neuroscience: the dangers of double dipping, Nature Neuroscience.

Yarkoni (2009) Big correlations in little studies…, Perspectives on Psychological Science.

Lieberman, Berkman, Wager (2009) Correlations in social neuroscience aren’t voodoo…, Perspectives on Psychological Science.

Vul, Harris, Winkielman, Pashler (2009) Reply to comments on “Puzzlingly high correlations…”, Perspectives on Psychological Science.

Cureton (1950) Validity, Reliability, and Baloney. Educational and Psychological Measurement.

Vul, Kanwisher (in press) Begging the question: The non-independence error in fMRI Data Analysis, Foundational issues for human brain mapping.

Poldrack, Mumford (submitted) Independence in ROI analyses: Where’s the voodoo?

Nichols, Poline (2009) Commentary on Vul et al…, Perspectives on Psychological Science.

Yarkoni, Braver (in press) Cognitive neuroscience approaches to individial differences in working memory… Handbook of individual differences in cognition.