What is the multiple comparisons problem? | Why don’t I (usually) care? | Some stories | Statistical framework and multilevel modeling
Why we (usually) don’t worry about multiple comparisons
Andrew Gelman, Jennifer Hill, and Masanao Yajima
Columbia University
8 Nov 2007
Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models
Contents
- What is the multiple comparisons problem?
- Why don’t I (usually) care about it?
- Some stories
- Statistical framework and multilevel modeling
What is the multiple comparisons problem?
- Even if nothing is going on, you can find things
- Data snooping
- Overwhelmed by data and plausible “findings”
- “If not accounted for, false positive differences are very likely to be identified”: 5% of our 95% intervals will be wrong
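As a rough illustration of the last bullet, the sketch below computes how likely it is that at least one of m 95% intervals misses when every true effect is zero. It assumes the m comparisons are independent, which the slides do not claim; the function names and the values of m are illustrative choices.

```python
def familywise_error(m, alpha=0.05):
    """Chance that at least one of m independent level-alpha tests rejects
    when every null hypothesis is true (independence is an assumption)."""
    return 1 - (1 - alpha) ** m


def expected_false_positives(m, alpha=0.05):
    """Expected number of false rejections among m true nulls."""
    return alpha * m


if __name__ == "__main__":
    for m in (1, 8, 38):  # illustrative numbers of comparisons
        print(m, round(familywise_error(m), 3), expected_false_positives(m))
```

The expected count does not need the independence assumption; the "at least one" probability does.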
Why don’t I (usually) care about multiple comparisons?
- When looked at one way, multiple comparisons seem like a major worry
- But from another perspective, they don’t matter at all:
  - I don’t (usually) study phenomena with zero effects
  - I don’t (usually) study comparisons with zero differences
  - I don’t mind being wrong 5% of the time
SAT coaching in 8 schools | Effects of electromagnetic fields at 38 frequencies | Teacher and school effects in NYC schools | Grades and classroom seating | Beautiful parents have more daughters | Comparing test scores across states
Some stories
- SAT coaching in 8 schools
- Effects of electromagnetic fields at 38 frequencies
- Teacher and school effects in NYC schools
- Grades and classroom seating
- Beautiful parents have more daughters
- Comparing test scores across states
SAT coaching in 8 schools
School   Estimated treatment effect, y_j   Standard error of effect estimate, σ_j
A         28                               15
B          8                               10
C         −3                               16
D          7                               11
E         −1                                9
F          1                               11
G         18                               10
H         12                               18
- Separate experiment in each school
- Variation in treatment effects is indistinguishable from 0
- Multilevel Bayes analysis
- Overlapping confidence intervals for the 8 school effects
- Statements such as Pr (effect in A > effect in C) = 0.7
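A statement like Pr (effect in A > effect in C) can be read off posterior draws from a multilevel model. The sketch below is one way to get such draws from the table above, not necessarily the computation behind the slide. It assumes the normal hierarchical model y_j ~ N(θ_j, σ_j²), θ_j ~ N(μ, τ²), with flat priors on μ and τ and τ restricted to a grid up to 40; the priors, grid, seed, and function names are all choices made for this example.

```python
import numpy as np

# Data from the table above.
y = np.array([28.0, 8.0, -3.0, 7.0, -1.0, 1.0, 18.0, 12.0])        # estimates y_j
sigma = np.array([15.0, 10.0, 16.0, 11.0, 9.0, 11.0, 10.0, 18.0])  # standard errors sigma_j


def log_marginal_tau(tau):
    """Log of p(tau | y) up to a constant, with mu integrated out,
    under flat priors on mu and tau (an assumption of this sketch)."""
    v = sigma**2 + tau**2
    mu_hat = np.sum(y / v) / np.sum(1 / v)
    v_mu = 1 / np.sum(1 / v)
    return (0.5 * np.log(v_mu) - 0.5 * np.sum(np.log(v))
            - 0.5 * np.sum((y - mu_hat) ** 2 / v))


def draw_school_effects(n_draws=20_000, tau_max=40.0, seed=0):
    """Draw the 8 school effects: tau from a grid, then mu | tau,
    then theta_j | mu, tau. Returns an array of shape (n_draws, 8)."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.01, tau_max, 2000)
    logp = np.array([log_marginal_tau(t) for t in grid])
    p = np.exp(logp - logp.max())
    tau = rng.choice(grid, size=n_draws, p=p / p.sum())

    v = sigma**2 + tau[:, None] ** 2                     # shape (n_draws, 8)
    mu_hat = np.sum(y / v, axis=1) / np.sum(1 / v, axis=1)
    mu = rng.normal(mu_hat, np.sqrt(1 / np.sum(1 / v, axis=1)))

    prec = 1 / sigma**2 + 1 / tau[:, None] ** 2          # posterior precision of theta_j
    mean = (y / sigma**2 + mu[:, None] / tau[:, None] ** 2) / prec
    return rng.normal(mean, np.sqrt(1 / prec))


theta = draw_school_effects()
pr_a_beats_c = np.mean(theta[:, 0] > theta[:, 2])   # share of draws with A above C
print(round(pr_a_beats_c, 2))
```

Any other comparison between schools comes from the same draws, so no separate adjustment is made for the number of comparisons examined.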
Effects of electromagnetic fields at 38 frequencies
[Figure, two panels: “Estimates with statistical significance” and “Estimates ± standard errors”. x-axis: frequency of magnetic field (Hz), 0 to 500; y-axis: estimated treatment effect, 0.0 to 0.4.]
- Background: electromagnetic fields and cancer
- Original article summarized using p-values
- Confidence intervals show comparisons more clearly
Separate estimates and hierarchical Bayes estimates
[Figure, two panels: “Estimates ± standard errors” and “Multilevel estimates ± standard errors”. x-axis: frequency of magnetic field (Hz), 0 to 500; y-axis: estimated treatment effect, 0.0 to 0.4.]
- Most comparisons are no longer statistically significant
- “Multiple comparisons” is less of a concern
- We moved the intervals together instead of widening them!
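The contrast in the last bullet can be sketched in a few lines: a classical correction keeps each estimate where it is and widens its interval, while partial pooling pulls the estimates toward a common mean. The code below uses simulated stand-in data, not the electromagnetic-field estimates, and a crude moment estimate of the between-group variance that is then treated as known. A full multilevel analysis would also carry the uncertainty in the mean and variance. All names and numbers here are assumptions for illustration.

```python
import numpy as np
from statistics import NormalDist


def bonferroni_halfwidth(se, m, alpha=0.05):
    """Classical fix: leave each estimate alone, widen its interval."""
    z = NormalDist().inv_cdf(1 - alpha / (2 * m))
    return z * se


def partial_pool(y, se):
    """Multilevel-style fix: pull each estimate toward the weighted mean.
    tau^2 is a rough moment estimate, floored at 0 and treated as known."""
    w = 1 / se**2
    mu_hat = np.sum(w * y) / np.sum(w)
    tau2 = max(np.var(y, ddof=1) - np.mean(se**2), 0.0)
    keep = tau2 / (tau2 + se**2)          # weight left on each raw estimate
    return mu_hat + keep * (y - mu_hat), se * np.sqrt(keep)


# Simulated stand-in data (hypothetical effect sizes and standard errors).
rng = np.random.default_rng(1)
m = 38
se = rng.uniform(0.03, 0.08, m)
true_effect = rng.normal(0.1, 0.05, m)
y = true_effect + rng.normal(0, se)

pooled, pooled_sd = partial_pool(y, se)
print("raw spread:   ", round(y.max() - y.min(), 3))
print("pooled spread:", round(pooled.max() - pooled.min(), 3))
```

In this sketch the pooled estimates sit closer together and their standard deviations are no larger than the raw standard errors, whereas the Bonferroni half-widths are always larger than the unadjusted 1.96 × s.e.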
Teacher and school effects in NYC schools
- Goal is to estimate range of variation (How important are teachers? Schools?)
- Key statistic is year-to-year persistence (e.g., for teachers ranked in top 25% one year, how well do they do the next?)
- The “multiple comparisons” issue never arises!
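To make the persistence statistic concrete, here is a small simulation sketch. It assumes each teacher has a fixed true effect plus independent yearly noise, both normal; the standard deviations, sample size, and function name are hypothetical and are not taken from the NYC data.

```python
import numpy as np


def top_quartile_persistence(true_sd, noise_sd, n_teachers=20_000, seed=0):
    """Among teachers ranked in the top 25% in year 1, the share who are
    again in the top 25% in year 2, under a simple simulated model."""
    rng = np.random.default_rng(seed)
    true_effect = rng.normal(0, true_sd, n_teachers)
    year1 = true_effect + rng.normal(0, noise_sd, n_teachers)
    year2 = true_effect + rng.normal(0, noise_sd, n_teachers)
    top1 = year1 >= np.quantile(year1, 0.75)
    top2 = year2 >= np.quantile(year2, 0.75)
    return np.mean(top2[top1])


if __name__ == "__main__":
    for true_sd in (0.0, 0.5, 1.0):
        print(true_sd, round(top_quartile_persistence(true_sd, noise_sd=1.0), 2))
```

With no real variation among teachers, persistence stays near the 25% expected by chance, and it rises as true variation grows relative to noise. That makes it a summary of the size of the variation rather than a set of pairwise tests.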
Grades and classroom seating
- Classroom demonstration
- Assign students random numbers as “grades”
- Ask students with “grades” 0–25 to raise one finger, students with “grades” 75–100 to raise one hand
- Instructor scans the room to find a statistically significant comparison (e.g., “boys on the left side of the classroom have higher grades than girls in the back row”)
- This is a pure multiple comparisons problem!
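The demonstration can be mimicked in code. The sketch below gives each simulated student a random "grade" and three random binary traits, then scans every pair of non-overlapping subgroups for the largest standardized difference. The class size of 60, the three traits, the minimum group size of 5, and the 1.96 cutoff are assumptions made for this example.

```python
import itertools
import numpy as np


def scan_classroom(rng, n_students=60, min_group=5):
    """Return (number of comparisons, largest |z|) for one random classroom."""
    grades = rng.uniform(0, 100, n_students)      # pure-noise "grades"
    sex = rng.integers(0, 2, n_students)          # 0 = girl, 1 = boy
    side = rng.integers(0, 2, n_students)         # 0 = left, 1 = right
    row = rng.integers(0, 2, n_students)          # 0 = front, 1 = back
    traits = np.column_stack([sex, side, row])

    # Every subgroup: each trait is fixed to 0, fixed to 1, or ignored (None).
    groups = []
    for spec in itertools.product([0, 1, None], repeat=3):
        mask = np.ones(n_students, dtype=bool)
        for col, val in enumerate(spec):
            if val is not None:
                mask &= traits[:, col] == val
        if mask.sum() >= min_group:
            groups.append(mask)

    n_comparisons, max_abs_z = 0, 0.0
    for a, b in itertools.combinations(groups, 2):
        if np.any(a & b):                         # skip overlapping subgroups
            continue
        ga, gb = grades[a], grades[b]
        se = np.sqrt(ga.var(ddof=1) / ga.size + gb.var(ddof=1) / gb.size)
        z = (ga.mean() - gb.mean()) / se
        n_comparisons += 1
        max_abs_z = max(max_abs_z, abs(z))
    return n_comparisons, max_abs_z


rng = np.random.default_rng(0)
results = [scan_classroom(rng) for _ in range(200)]
share_with_finding = np.mean([z > 1.96 for _, z in results])
print("share of classrooms with a 'significant' comparison:", share_with_finding)
```

Because the grades are noise by construction, any comparison that clears the cutoff is a false positive, and the share of classrooms with at least one can be compared against the nominal 5%.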
Beautiful parents have more daughters
I S. Kanazawa (2007). Beautiful parents have more daughters: a further implication of the generalized Trivers-Willard hypothesis. Journal of Theoretical Biology.
I Attractiveness was measured on a 1–5 scale (“very unattractive” to “very attractive”)
I 56% of children of parents in category 5 were girls
I 48% of children of parents in categories 1–4 were girls
I Statistically significant (2.44 s.e.’s from zero, p = 1.5%)
I But the simple regression of sex ratio on attractiveness is not significant (estimate is 1.5 with s.e. of 1.4)
I Multiple comparisons problem: 5 natural comparisons × 4 possible time summaries!
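The arithmetic behind these bullets can be checked with a short Python sketch. It assumes a normal approximation for both estimates. It also treats the 5 × 4 = 20 possible comparisons as independent tests at the 5% level, which is a simplification added here for illustration and not a claim from the talk.

```python
from statistics import NormalDist


def two_sided_p(z):
    """Two-sided tail probability of a z-score under the normal approximation."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Headline comparison: category 5 vs. categories 1-4, 2.44 s.e.'s from zero.
p_headline = two_sided_p(2.44)

# Simple regression of sex ratio on attractiveness: estimate 1.5, s.e. 1.4.
p_regression = two_sided_p(1.5 / 1.4)

# 5 natural comparisons x 4 possible time summaries.
n_comparisons = 5 * 4

# Chance that at least one of them reaches p < 0.05 when there is no effect,
# ASSUMING the tests are independent (an illustrative simplification).
p_any = 1 - 0.95 ** n_comparisons

print(f"headline p = {p_headline:.3f}")
print(f"regression p = {p_regression:.2f}")
print(f"P(at least one of {n_comparisons} significant) = {p_any:.2f}")
```

Under these assumptions the headline p-value is about 1.5%, the regression p-value is about 0.28, and the chance that at least one of 20 null comparisons looks significant is about 64%.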
Comparing test scores across states
I National Assessment of Educational Progress (NAEP)
I Comparing states: which comparisons are statistically significant?
I 50 × 49/2 = 1225 pairwise comparisons: a classic multiple comparisons problem!
I Our multilevel inferences
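To give a sense of scale for the 50 × 49/2 comparisons, the sketch below counts the pairs and shows what a classical correction would demand. The Bonferroni adjustment is used purely as a familiar illustration. It is an assumption of this example, not necessarily the procedure behind the NAEP charts.

```python
import math
from statistics import NormalDist

n_states = 50
alpha = 0.05

# Number of distinct state-vs-state comparisons: 50 * 49 / 2.
n_pairs = math.comb(n_states, 2)

# Expected number of "significant" pairs at the 5% level if no states truly differ.
expected_false_positives = alpha * n_pairs

# Illustrative Bonferroni correction (an assumption, see the paragraph above).
bonferroni_alpha = alpha / n_pairs
z_unadjusted = NormalDist().inv_cdf(1 - alpha / 2)
z_bonferroni = NormalDist().inv_cdf(1 - bonferroni_alpha / 2)

print(f"{n_pairs} pairwise comparisons")
print(f"expected false positives at 5%: {expected_false_positives:.2f}")
print(f"critical z: {z_unadjusted:.2f} unadjusted, {z_bonferroni:.2f} with Bonferroni")
```

The corrected threshold is roughly twice the unadjusted 1.96, so a difference between two states must be about twice as large, relative to its standard error, to be declared significant.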
Classical multiple comparisons inferences for NAEP
[Figure: matrix of classical pairwise comparisons among NAEP jurisdictions. Both axes list the jurisdictions in the same order (‡ as marked in the figure): ME, MN, CT, WI, ND, IN, IA‡, MA, TX, NE, MT‡, NJ‡, UT, MI‡, PA‡, CO, WA, VT‡, MO, NC, DD (DDESS), AK‡, OR, WV, DO (DoDDS), WY, VA, NY‡, MD, RI, KY, TN, NV‡, AZ‡, AR, FL, GA, DE, HI, NM, SC‡, AL, CA, LA, MS, GU, DC.]
Classical inferences for NAEP: close-up
[Figure: close-up of the classical pairwise comparison matrix for NAEP. Both axes list the jurisdictions from Maine (ME) through the District of Columbia (DC).]
Multilevel inferences for NAEP: close-up
[Figure: Comparisons of Average Mathematics Scale Scores for Grade 4 Public Schools in Participating Jurisdictions. State-by-state comparison grid; the column headings list the states in order from Maine to Mississippi, and each column repeats the state abbreviations.]
NAEP: classical vs. multilevel
• Both procedures are algorithmic (“push a button”)
• Both procedures treat 50 states exchangeably
• Multilevel inferences are sharper (more comparisons are “statistically significant”)
• How can this be?
Something for nothing? A free lunch?
• Classical multiple comparisons worries about $\theta_1 = \theta_2 = \cdots = \theta_{50}$
• Not an issue with NAEP
• Multilevel model estimates the group-level variance, decides based on the data how much to adjust
• Classical procedure does not learn from the data
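
The idea that the multilevel model "decides based on the data how much to adjust" can be illustrated with a rough sketch. The code below is not the Bayesian fit used in the talk; it swaps in a simple method-of-moments estimate of the group-level standard deviation and the usual normal-normal shrinkage factor. The function names and both data sets are hypothetical, chosen only for illustration.

```python
import statistics


def estimate_group_sd(estimates, std_errors):
    """Method-of-moments estimate of the sd of the true group effects.

    Assumption (not from the slides): each estimate is the true effect
    plus independent noise with a known standard error.
    """
    total_var = statistics.variance(estimates)  # sample variance of the raw estimates
    noise_var = statistics.fmean([se ** 2 for se in std_errors])  # average sampling variance
    return max(0.0, total_var - noise_var) ** 0.5  # truncate at zero


def shrinkage_factor(sigma_theta, std_error):
    """Fraction of a raw difference that is kept after partial pooling."""
    return sigma_theta ** 2 / (sigma_theta ** 2 + std_error ** 2)


# Hypothetical group means, each with standard error 2.0.
spread = [250.0, 262.0, 238.0, 271.0, 245.0, 229.0]  # groups clearly differ
tight = [250.0, 251.5, 249.0, 250.5, 248.5, 251.0]   # differences look like noise
std_errors = [2.0] * 6

for name, data in (("spread", spread), ("tight", tight)):
    sd = estimate_group_sd(data, std_errors)
    keep = shrinkage_factor(sd, 2.0)
    print(f"{name}: estimated group sd = {sd:.2f}, fraction of each difference kept = {keep:.2f}")
```

When the groups vary far more than their standard errors, almost none of each comparison is shrunk away; when the observed variation is no larger than the noise, the comparisons are pulled all the way together. A fixed classical correction applies the same adjustment in both cases.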
Message from the examples
• Classical multiple comparisons corrections don’t seem so important when we fit hierarchical models
• But they can be crucial for classical comparisons
Statistical framework
• Goal is to estimate $\theta_j$, for $j = 1, \dots, J$ (for example, effects of $J$ schools)
• Comparisons have the form $\theta_j - \theta_k$
• For simplicity, suppose data come from $J$ separate experiments
• Type S errors
• Multilevel modeling as a solution to the multiple comparisons issue
Type S (sign) errors
• I’ve never made a Type 1 error in my life
    • Type 1 error is $\theta_j = \theta_k$, but I claim they’re different
    • I’ve never studied anything where $\theta_j = \theta_k$
• I’ve never made a Type 2 error in my life
    • Type 2 error is $\theta_j \neq \theta_k$, but I claim they’re the same
    • I’ve never claimed that $\theta_j = \theta_k$
• But I make errors all the time!
• Type S error: $\theta_1 > \theta_2$, but I claim that $\theta_2 > \theta_1$ (or vice versa)
• Type S errors can occur when we make claims with confidence (i.e., have confidence intervals for $\theta_j - \theta_k$ that exclude zero)
• We want to make fewer claims with confidence when we are less certain about the ranking of the $\theta_j$’s
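
A small simulation can make the Type S idea concrete. The sketch below is an illustration under assumptions that are not stated on the slide: true effects are drawn from a normal distribution with sd `sigma_theta`, each estimate has known normal noise with sd `sigma_y`, and a classical "claim with confidence" is made when the unadjusted 95% interval for the difference excludes zero. The function name and parameter values are hypothetical.

```python
import math
import random


def simulate_type_s(sigma_theta, sigma_y=1.0, n_pairs=100_000, seed=0):
    """Return (rate of confident claims, Type S rate among those claims).

    Assumed setup: theta ~ N(0, sigma_theta^2), y ~ N(theta, sigma_y^2),
    and pairs of groups are compared with no pooling and no correction.
    """
    rng = random.Random(seed)
    # The classical 95% interval for y_j - y_k excludes zero beyond this value.
    threshold = 1.96 * math.sqrt(2.0) * sigma_y
    claims = 0
    sign_errors = 0
    for _ in range(n_pairs):
        theta_j = rng.gauss(0.0, sigma_theta)
        theta_k = rng.gauss(0.0, sigma_theta)
        y_j = rng.gauss(theta_j, sigma_y)
        y_k = rng.gauss(theta_k, sigma_y)
        diff_hat = y_j - y_k
        if abs(diff_hat) > threshold:
            claims += 1
            # Type S error: the claimed sign is the opposite of the true sign.
            if diff_hat * (theta_j - theta_k) < 0:
                sign_errors += 1
    claim_rate = claims / n_pairs
    type_s_rate = sign_errors / claims if claims else 0.0
    return claim_rate, type_s_rate


for sigma_theta in (0.1, 1.0, 10.0):
    claim_rate, type_s_rate = simulate_type_s(sigma_theta)
    print(f"sigma_theta={sigma_theta:5.1f}  claim rate={claim_rate:.3f}  "
          f"Type S rate among claims={type_s_rate:.3f}")
```

Under these assumptions, when the true effects are close together relative to the noise, roughly 5% of comparisons are still claimed with confidence and a large share of those claims have the wrong sign. When the true effects are far apart, most comparisons are claimed and sign errors are rare.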
Multilevel (hierarchical) modeling
• Key parameter: $\sigma_\theta$, the sd of the true $\theta_j$’s
• Understand through special cases:
    • $\sigma_\theta \approx 0$: no variation
        • Multilevel model pools the estimated $\theta_j$’s toward each other
        • “Multiple comparisons” correction is done by shrinking comparisons
        • Very few claims with confidence (far fewer than 5%)
    • $\sigma_\theta \to \infty$: large variation
        • Multilevel model is equivalent to estimating each $\theta_j$ separately
        • “Multiple comparisons” corrections are not needed
• Bayesian multilevel modeling bounds the Type S error rate by automatically restricting the rate of claims with confidence
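
The two special cases can be checked with a short worked derivation. It is a sketch under simplifying assumptions that the slide does not state: a normal model with a common, known data-level sd $\sigma_y$ (a symbol introduced here), and hyperparameters $\mu$ and $\sigma_\theta$ treated as known rather than estimated.

```latex
% Assumed model (for illustration only):
%   y_j | \theta_j ~ N(\theta_j, \sigma_y^2),   \theta_j ~ N(\mu, \sigma_\theta^2)
\begin{align*}
\lambda &= \frac{\sigma_\theta^2}{\sigma_\theta^2 + \sigma_y^2}
  && \text{(fraction of each raw difference that is kept)} \\
\theta_j \mid y &\sim \mathrm{N}\bigl(\mu + \lambda (y_j - \mu),\; \lambda \sigma_y^2\bigr) \\
\theta_j - \theta_k \mid y &\sim \mathrm{N}\bigl(\lambda (y_j - y_k),\; 2 \lambda \sigma_y^2\bigr) \\
\text{claim with confidence} &\iff
  |y_j - y_k| > 1.96 \sqrt{2}\, \sigma_y \sqrt{1 + \sigma_y^2 / \sigma_\theta^2} \\
\sigma_\theta \to 0 &: \quad \lambda \to 0, \quad \text{threshold} \to \infty
  && \text{(complete pooling, almost no claims)} \\
\sigma_\theta \to \infty &: \quad \lambda \to 1, \quad \text{threshold} \to 1.96 \sqrt{2}\, \sigma_y
  && \text{(separate estimates, classical threshold)}
\end{align*}
```

In this simplified setting the multilevel threshold is never smaller than the uncorrected classical one, and it grows exactly when the groups are hard to tell apart, which is how the rate of claims with confidence gets restricted.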
Multilevel (hierarchical) modeling
I Key parameter: σθ, the sd of the true θj ’sI Understand through special cases:
I σθ ≈ 0: no variationI Multilevel model pools the estimated θj ’s toward each otherI “Multiple comparisons” correction is done by shrinking
comparisonsI Very few claims with confidence (far fewer than 5%)
I σθ →∞: large variationI Multilevel model is equivalent to estimating each θj separatelyI “Multiple comparisons” corrections are not needed
I Bayesian multilevel modeling bounds the Type S error rate byautomatically restricting the rate of claims with confidence
Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models
What is the multiple comparisons problem?Why don’t I (usually) care?
Some storiesStatistical framework and multilevel modeling
Message from the examplesStatistical frameworkConclusionsFurther thoughts
Multilevel (hierarchical) modeling
I Key parameter: σθ, the sd of the true θj ’sI Understand through special cases:
I σθ ≈ 0: no variationI Multilevel model pools the estimated θj ’s toward each otherI “Multiple comparisons” correction is done by shrinking
comparisonsI Very few claims with confidence (far fewer than 5%)
I σθ →∞: large variationI Multilevel model is equivalent to estimating each θj separatelyI “Multiple comparisons” corrections are not needed
I Bayesian multilevel modeling bounds the Type S error rate byautomatically restricting the rate of claims with confidence
Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models
What is the multiple comparisons problem?Why don’t I (usually) care?
Some storiesStatistical framework and multilevel modeling
Message from the examplesStatistical frameworkConclusionsFurther thoughts
Multilevel (hierarchical) modeling
I Key parameter: σθ, the sd of the true θj ’sI Understand through special cases:
I σθ ≈ 0: no variationI Multilevel model pools the estimated θj ’s toward each otherI “Multiple comparisons” correction is done by shrinking
comparisonsI Very few claims with confidence (far fewer than 5%)
I σθ →∞: large variationI Multilevel model is equivalent to estimating each θj separatelyI “Multiple comparisons” corrections are not needed
I Bayesian multilevel modeling bounds the Type S error rate byautomatically restricting the rate of claims with confidence
Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models
What is the multiple comparisons problem?Why don’t I (usually) care?
Some storiesStatistical framework and multilevel modeling
Message from the examplesStatistical frameworkConclusionsFurther thoughts
Multilevel (hierarchical) modeling
I Key parameter: σθ, the sd of the true θj ’sI Understand through special cases:
I σθ ≈ 0: no variationI Multilevel model pools the estimated θj ’s toward each otherI “Multiple comparisons” correction is done by shrinking
comparisonsI Very few claims with confidence (far fewer than 5%)
I σθ →∞: large variationI Multilevel model is equivalent to estimating each θj separatelyI “Multiple comparisons” corrections are not needed
I Bayesian multilevel modeling bounds the Type S error rate byautomatically restricting the rate of claims with confidence
Andrew Gelman, Jennifer Hill, and Masanao Yajima Multiple comparisons using multilevel models
What is the multiple comparisons problem?Why don’t I (usually) care?
Some storiesStatistical framework and multilevel modeling
Message from the examplesStatistical frameworkConclusionsFurther thoughts
Conclusions
- “Multiple comparisons” is a real concern, but . . .
- Don't “fix” by altering p-values or (equivalently) by making confidence intervals wider
- Instead, multilevel modeling does partial pooling where necessary (especially when much of the variation in the data can be explained by noise), so that few claims can be made with confidence
- Learn from the data how much to adjust
- By comparison, classical intervals can have Type S error rates as high as 50%: then, multiple comparisons can be a big deal!
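The contrast between classical and multilevel claims can be checked with a small simulation. This is a minimal sketch, not the authors' analysis: it assumes a normal-normal model with known σ_θ and σ_y (in a real analysis σ_θ would be estimated), and the function name `type_s_simulation` and its parameters are hypothetical. For each pair of effects it records whether a 95% claim about the sign of the difference is made, and how often such claims have the wrong sign.

```python
import numpy as np


def type_s_simulation(sigma_theta, sigma_y=1.0, n_pairs=200_000, seed=0):
    """Compare classical and multilevel claims about the sign of theta_j - theta_k.

    Assumes known hyperparameters (an illustrative simplification).
    """
    rng = np.random.default_rng(seed)
    # True effects for each pair, drawn from the group-level distribution N(0, sigma_theta^2).
    theta = rng.normal(0.0, sigma_theta, size=(n_pairs, 2))
    # Noisy estimates of each effect.
    y = theta + rng.normal(0.0, sigma_y, size=(n_pairs, 2))
    true_diff = theta[:, 0] - theta[:, 1]
    raw_diff = y[:, 0] - y[:, 1]

    # Classical: claim a sign when the raw difference exceeds 1.96 standard errors.
    classical_claim = np.abs(raw_diff) > 1.96 * np.sqrt(2.0) * sigma_y

    # Multilevel: shrink the difference, then compare with the posterior sd.
    shrink = sigma_theta**2 / (sigma_theta**2 + sigma_y**2)
    post_mean = shrink * raw_diff
    post_sd = np.sqrt(2.0 * shrink) * sigma_y
    multilevel_claim = np.abs(post_mean) > 1.96 * post_sd

    # Shrinkage does not change the sign, so both methods claim the sign of raw_diff.
    wrong_sign = np.sign(raw_diff) != np.sign(true_diff)

    def summarize(claim):
        n_claims = int(claim.sum())
        type_s = float(wrong_sign[claim].mean()) if n_claims else float("nan")
        return {"claim_rate": n_claims / n_pairs, "type_s_rate": type_s}

    return {"classical": summarize(classical_claim),
            "multilevel": summarize(multilevel_claim)}


if __name__ == "__main__":
    for sigma_theta in (0.1, 1.0, 10.0):
        print(sigma_theta, type_s_simulation(sigma_theta))
```

With a small σ_θ relative to σ_y, the classical procedure still makes claims at roughly its nominal rate, and a large share of them have the wrong sign, while the multilevel procedure makes almost no claims. With a large σ_θ the two procedures nearly coincide.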
Further thoughts inspired by the workshop: part 1
- The key step in getting everything to work is to move the coefficient estimates: not just to adjust the p-values or widen intervals, but to actually move the estimates toward each other (or, more generally, toward a group-level regression model).
- But, to be honest, multilevel modeling can be difficult for complicated structures. For example, analyzing 16 schools or 16 outcomes is simple enough, but a 2^4 structure (e.g., 2 groups × 2 outcomes × 2 flavors of treatment × 2 time points) is more of a research problem.
- There was some discussion of main analyses as compared to minor analyses, with the thought that different levels of significance can be used for these two areas. I would prefer to use the full model for all of this and then report all the results graphically. In no case do I think it would be necessary to do a multiple comparisons adjustment. As noted in the first bullet point above, the multilevel inference should do fine at combining the information.
Further thoughts inspired by the workshop: part 2
- The key concern about multiple comparisons is when the noise level is high, so that classical Type S error rates will be high. Having many comparisons (100, or 1000, or whatever) is not a problem at all in the context of multilevel modeling. This gets back to our willingness to be wrong occasionally. If we are testing 1000 comparisons, and there's really nothing much going on, then the multilevel analysis will do lots of pooling, and the result is to make very few claims with confidence.
- For this presentation, I'd been thinking about the sort of examples where the differences are unquestionably real, and the key concern is Type S errors: for example, saying that School A is better than School B, when School B is really better than School A. But at the meeting, there was lots of discussion of examples where the differences might actually be very close to zero: for example, comparing different educational interventions, none of which might be very effective. Here we would want to be thinking about “Type M” (magnitude) errors: saying that a treatment effect is near zero when it is actually large, or saying that it's large when it's near zero.
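A Type M error can be illustrated in the same simulated setting. The sketch below is an assumption-laden illustration rather than anything taken from the talk: true effects are drawn with a small sd σ_θ relative to the noise sd σ_y, both treated as known, and the name `type_m_simulation` is hypothetical. It looks only at estimates that pass the classical 95% threshold and compares their magnitude with the true effects and with the partially pooled estimates.

```python
import numpy as np


def type_m_simulation(sigma_theta=0.2, sigma_y=1.0, n_effects=200_000, seed=0):
    """Magnitude of estimates among classically 'significant' results.

    Assumes known sigma_theta and sigma_y (an illustrative simplification).
    """
    rng = np.random.default_rng(seed)
    # True effects are mostly near zero when sigma_theta is small.
    theta = rng.normal(0.0, sigma_theta, size=n_effects)
    # Raw (classical) estimates with sampling noise.
    y = theta + rng.normal(0.0, sigma_y, size=n_effects)

    # Select the estimates a classical analysis would flag at the 95% level.
    significant = np.abs(y) > 1.96 * sigma_y

    # Partially pooled estimates under the assumed model (group mean is 0 here).
    shrink = sigma_theta**2 / (sigma_theta**2 + sigma_y**2)
    shrunk = shrink * y

    t, raw, pooled = theta[significant], y[significant], shrunk[significant]
    return {
        "n_significant": int(significant.sum()),
        "mean_abs_true": float(np.abs(t).mean()),
        "mean_abs_classical": float(np.abs(raw).mean()),
        "mean_abs_multilevel": float(np.abs(pooled).mean()),
        "classical_abs_error": float(np.abs(raw - t).mean()),
        "multilevel_abs_error": float(np.abs(pooled - t).mean()),
    }


if __name__ == "__main__":
    for key, value in type_m_simulation().items():
        print(f"{key}: {value}")
```

In this setup the flagged classical estimates overstate the true effect sizes by a large factor, which is the "saying that it's large when it's near zero" case; the pooled estimates are much closer in magnitude to the truth.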