Making Statistics Surprising
Roger Watt, Kelly Younger, Lizzie Collins, Rebecca Skinner, Francesca Worsnop
Science from the outside
[Diagram: Idea, Knowledge]
Science from the inside
[Diagram: Idea, Result, Evidence, Knowledge]
Science from the inside
[Diagram: Idea, Result, Hypothesis, Design, Evidence, Data Analysis, Inference, Knowledge, Persuading, Describing]
What matters here?
Decisions required
• Hypothesis
  – What variables?
  – What types of variable?
  – What relationships between variables?
• Design
  – What sampling method?
  – What deployment of sample (between/within)?
  – What sample size?
Lesson
• We must make decisions
  – these matter
• We may have preferences
  – these don’t matter
The Student Journey
[Diagram: Idea, Result, Hypothesis, Design, Evidence, Data Analysis, Inference, Knowledge, Persuading, Describing]
What appears to matter here to a student?
[Diagram: Result, Data Analysis, Inference]
• What test?
  – t-test, chi-square, correlation, ANOVA, regression, ANCOVA, MANOVA
• How to test?
  – Formulae
  – Calculations: Σ(xᵢ − x̄)²
• SPSS
  – What columns?
• Numbers…
  – Dozens of numbers: SSQ, F, t, p
  – How many sig figs?
The Student Experience
• Stats is Hard
  – disconnected facts
  – tedious arithmetic
• Stats is Disempowering
  – easy to make simple mistakes
  – myriad of details obscure concepts
• Stats is not fun
  – no pleasant surprises
The Main Goal: Doing stats
• Understanding:
  – Preserve the whole picture
• Conceptual Insight:
  – Full grasp of issues that matter for the outcome
• Skills:
  – Confident in essentials
The Plan
• Materials
  – Whole picture always present
  – Concentrate on research decisions
  – Remove disconnected facts
• Learning
  – Repeated experience
  – Immediate feedback
  – Discovery
The Whole Picture
[Diagram: Idea, Result, Hypothesis, Design, Evidence, Data Analysis, Inference, Knowledge, Persuading, Describing]
Research Decisions
• Hypothesis
  – What variables?
  – What types of variable?
  – What relationships between variables?
• Design
  – What sampling method?
  – What deployment of sample (between/within)?
  – What sample size?
BrawStats
• Materials
  – Whole process always visible
  – Decisions require user input
    • everything else automatic
• Learning
  – Encourages experimenting & discovery
  – Every action produces a relevant graphical output
    • immediately
BrawStats
• Hypothesis
  – How many variables?
  – What variables?
  – What types of variable?
  – What relationship between variables?
[Screenshot: BrawStats panels Variables, Logic and Prediction. Dependent variable: IQ (Interval), mean = 100, std = 15. Independent variable: gender (Categorical), female 50%, male 50%. Predicted means: female 107, male 93. The plot shows IQ (axis 50 to 150) for female and male.]
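The hypothesis in the screenshot can be turned into simulated data. The following is a minimal Python sketch, not BrawStats code: the function name `draw_sample`, the seed, and the within-group standard deviation (chosen here so that the two groups together have the overall Std = 15 shown on the slide) are assumptions made for illustration.

```python
import math
import random
import statistics

# Values read from the hypothesis panel on the slide.
PREDICTED_MEANS = {"female": 107.0, "male": 93.0}
OVERALL_SD = 15.0

# Assumption: both groups share one within-group sd, chosen so that a 50/50
# mixture of the groups (each 7 points from the mean of 100) has sd = 15.
WITHIN_SD = math.sqrt(OVERALL_SD**2 - 7.0**2)


def draw_sample(n_per_group, seed=None):
    """Return {gender: [IQ scores]} for one simulated between-groups sample."""
    rng = random.Random(seed)
    return {
        gender: [rng.gauss(group_mean, WITHIN_SD) for _ in range(n_per_group)]
        for gender, group_mean in PREDICTED_MEANS.items()
    }


if __name__ == "__main__":
    sample = draw_sample(n_per_group=21, seed=1)
    for gender, scores in sample.items():
        print(gender, round(statistics.mean(scores), 1))
```

Re-running with different seeds shows how far the sample means can drift from the predicted 107 and 93 purely through sampling error.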
BrawStats
• Design
  – How to sample?
  – Within/Between?
  – How many participants?
[Screenshots: the same IQ by gender hypothesis (predicted means: female 107, male 93), now with the panels Variables, Logic, Prediction and Design.]
BrawStats
• Everything else
  – done for you
[Screenshots: the same IQ by gender hypothesis, with the panels Variables, Logic, Prediction and Design, and then Evidence.]
BrawStats
• Structure
  1. Whole process always visible
  2. Decisions require user input
  3. Everything else automatic
• Learning
  4. Every action produces a relevant graphical output immediately
  5. Encourages experimenting & discovery
The Next Goal: Expected Outcomes
• Understanding:
  – Relationship of outcome to chance (sampling error)
• Conceptual Insight:
  – Strengths and weaknesses of statistical testing (NHST)
• Skills:
  – Interpret statistical outcomes
Consequences of the p-value distribution
We are locked into the type of system given by this truth table:
             H0 Correct      H0 Incorrect
p <= 0.05    Type I error    -
p > 0.05     -               Type II error
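The truth table can be written as a small function. This is an illustrative Python sketch; the function name `classify_outcome` and the label "correct decision" for the two cells the slide leaves empty are assumptions, not part of the original material.

```python
def classify_outcome(p_value, h0_true, criterion=0.05):
    """Name the truth-table cell that a single test result falls in."""
    significant = p_value <= criterion
    if h0_true and significant:
        return "Type I error"       # H0 correct, but rejected
    if not h0_true and not significant:
        return "Type II error"      # H0 incorrect, but not rejected
    return "correct decision"       # the two cells left empty on the slide


if __name__ == "__main__":
    print(classify_outcome(0.03, h0_true=True))
    print(classify_outcome(0.20, h0_true=False))
```

In real research `h0_true` is never known; the function is only usable in a simulation, where the population is set by the person running it.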
[Figure: t-test independent samples (n=63100). Two panels plot p(Type I error) and p(Type II error), each on a 0 to 1 axis, against criterion p (0.01, 0.1, 1.0).]
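The trade-off between the two error rates as the criterion moves can be sketched analytically. The Python below is a rough illustration under stated assumptions: it uses a normal (z) approximation rather than the t-test named in the figure, and the effect (`mean_diff=14`, from the 107 vs 93 example), `sd=15` and the group size are hypothetical values, since the slides do not state which settings produced the plots.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def error_rates(criterion, n_per_group, mean_diff=14.0, sd=15.0):
    """Approximate (Type I, Type II) error rates for a two-sided comparison
    of two independent groups, using a normal approximation."""
    se = sd * math.sqrt(2.0 / n_per_group)         # SE of the difference in means
    z_crit = STD_NORMAL.inv_cdf(1.0 - criterion / 2.0)
    shift = mean_diff / se                         # expected z when H0 is incorrect
    power = (1.0 - STD_NORMAL.cdf(z_crit - shift)) + STD_NORMAL.cdf(-z_crit - shift)
    type_i = criterion       # rate of p <= criterion when H0 is correct
    type_ii = 1.0 - power    # rate of p > criterion when H0 is incorrect
    return type_i, type_ii


if __name__ == "__main__":
    for criterion in (0.01, 0.05, 0.1, 0.5, 1.0):
        type_i, type_ii = error_rates(criterion, n_per_group=20)
        print(f"criterion={criterion:<5} Type I={type_i:.3f} Type II={type_ii:.3f}")
```

Loosening the criterion lowers the Type II rate only by raising the Type I rate, which is the lock-in the truth table describes.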
Lessons
• sampling error matters
• p-value
  – depends on sampling error
  – is poorly behaved
• p-values cannot be easily interpreted
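One way to see how poorly behaved the p-value is: repeat the same study many times and look at the spread of p-values. The Python below is a minimal simulation sketch, not the procedure used for the slides. It assumes a two-sided z-test with known `sd` (a simplification of the t-test), and the function name, effect size, group size and seed are all hypothetical.

```python
import math
import random
from statistics import NormalDist, mean

STD_NORMAL = NormalDist()


def simulate_p_values(n_per_group, mean_diff, sd=15.0, runs=1000, seed=0):
    """Repeat the same two-group study `runs` times and return the p-values."""
    rng = random.Random(seed)
    se = sd * math.sqrt(2.0 / n_per_group)   # SE of the difference in means
    p_values = []
    for _ in range(runs):
        group_a = [rng.gauss(100.0 + mean_diff / 2.0, sd) for _ in range(n_per_group)]
        group_b = [rng.gauss(100.0 - mean_diff / 2.0, sd) for _ in range(n_per_group)]
        z = (mean(group_a) - mean(group_b)) / se
        p_values.append(2.0 * (1.0 - STD_NORMAL.cdf(abs(z))))
    return p_values


if __name__ == "__main__":
    p_values = sorted(simulate_p_values(n_per_group=20, mean_diff=14.0))
    print("smallest p:", p_values[0])
    print("median p:  ", p_values[len(p_values) // 2])
    print("largest p: ", p_values[-1])
    print("share with p <= 0.05:", sum(p <= 0.05 for p in p_values) / len(p_values))
```

Every run samples from the same population, so all of the variation between p-values is sampling error.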
The Last Goal: Exploring stats
• Understanding:
  – Relationship of outcome to design decisions
• Conceptual Insight:
  – Strengths and weaknesses of designs
• Skills:
  – Make optimal decisions
[Diagram: the whole research process, annotated with the Hypothesis questions (What variables? What types of variable? What relationships between variables?) and the Design questions (What sampling method? What deployment of sample (between/within)? What sample size?).]
The Basic Design Choices
• Variable Type
• Between/Within
• Number of participants
• Sampling strategy
[Screenshot: blank plot axes (0 to 1) with the hypothesis panel. Dependent variable: IQ (Interval), mean = 100, std = 15. Independent variable: gender (Categorical), female 50%, male 50%. Predicted means: female 107, male 93.]
[Figures, Variable Type: p(Type I error) and p(Type II error), each on a 0 to 1 axis, against type of IV (i, o, c5, c4, c3, c2), for IQ. Panel titles: Pearson correlation (n=11260) and Pearson correlation (n=18380).]
[Figures, Between/Within: p(Type I error) and p(Type II error), each on a 0 to 1 axis, against repeated measures (i, r), for gender. Panel titles: t-test paired samples (n=10480) and t-test paired samples (n=162040).]
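A standard reason why the between/within decision matters is the standard error of the difference being tested. The Python below is a hedged sketch of that textbook relationship, not a reconstruction of the figures: the function names and the `correlation` parameter (how strongly a participant's two measurements agree) are assumptions, as the slides give no such value.

```python
import math


def se_between(sd, n_per_group):
    """SE of the difference in means for two independent groups."""
    return sd * math.sqrt(2.0 / n_per_group)


def se_within(sd, n_participants, correlation):
    """SE of the mean difference when each participant is measured twice
    and the two measurements correlate by `correlation`."""
    return sd * math.sqrt(2.0 * (1.0 - correlation) / n_participants)


if __name__ == "__main__":
    sd, n = 15.0, 20
    print("between:", round(se_between(sd, n), 2))
    for correlation in (0.0, 0.5, 0.8):
        print("within, r =", correlation, ":", round(se_within(sd, n, correlation), 2))
```

With zero correlation the two designs give the same standard error for the same number of scores per condition; the within design gains only as the correlation rises.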
[Figures, Number of participants: p(Type I error) and p(Type II error), each on a 0 to 1 axis, against no of participants (20 to 100), for gender. Panel titles: t-test independent samples (n=2780) and t-test independent samples (n=18000).]
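The sample-size decision can be explored with a small calculation. The Python below is a sketch under stated assumptions: it uses a normal approximation to the independent-samples t-test, counts participants per group (the figure does not say whether its axis is per group or in total), and takes the effect from the 107 vs 93 example with `sd=15`. The function names and the 0.8 target are illustrative.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power(n_per_group, mean_diff=14.0, sd=15.0, criterion=0.05):
    """Normal-approximation power of a two-sided, two-group comparison."""
    se = sd * math.sqrt(2.0 / n_per_group)
    z_crit = STD_NORMAL.inv_cdf(1.0 - criterion / 2.0)
    shift = mean_diff / se
    return (1.0 - STD_NORMAL.cdf(z_crit - shift)) + STD_NORMAL.cdf(-z_crit - shift)


def participants_needed(target_power=0.8, max_per_group=10_000, **kwargs):
    """Smallest group size whose approximate power reaches the target,
    or None if no size up to `max_per_group` does."""
    for n in range(2, max_per_group + 1):
        if power(n, **kwargs) >= target_power:
            return n
    return None


if __name__ == "__main__":
    for n in (10, 20, 40, 80):
        print(n, "per group -> Type II rate", round(1.0 - power(n), 3))
    print("per group for power 0.8:", participants_needed(0.8))
```

The Type II rate is 1 minus the power, so it falls as participants are added, while the Type I rate stays at the criterion.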
[Figures, Sampling strategy: p(Type I error) and p(Type II error), each on a 0 to 1 axis, against independence (0.2 to 0.8), for gender. Panel titles: t-test independent samples (n=27100) and t-test independent samples (n=13580).]
The Basic Assumptions
• Normality:
  – skew
  – kurtosis
[Figures: p(Type I error) and p(Type II error), each on a 0 to 1 axis, against skew (-1 to 1) and against kurtosis (-1 to 1), for gender. Panel titles: t-test independent samples (n=8270) and (n=15000) for skew; t-test independent samples (n=8640) and (n=8640) for kurtosis.]
Lessons
• early decisions matter:
  – interval > ordinal > categorical
  – number of participants
  – sampling strategy
    • between/within
    • non-independence
• not much else matters
  – skew
  – kurtosis
Lessons
• It (almost) worked
  – not sure why
  – maybe because:
    • no numbers/arithmetic
    • single coherent process
    • it is (??) self-explaining & self-illustrating
    • foraging for undocumented features