CAUSE Webinar: Introducing Math Majors to Statistics
Allan Rossman and Beth Chance
Cal Poly – San Luis Obispo
April 8, 2008
April 8, 2008 CAUSE Webinar 2
Outline
- Goals
- Guiding principles
- Content of an example course
- Assessment
- Examples (four)
Goals
Redesign introductory statistics course for mathematically inclined students in order to:
- Provide balanced introduction to the practice of statistics at appropriate mathematical level
- Offer a better alternative than the “Stat 101” or “Math Stat” sequence for math majors’ first statistics course
Guiding principles (Overview)
1. Put students in role of active investigator
2. Motivate with real studies, genuine data
3. Repeatedly experience entire statistical process from data collection to conclusion
4. Emphasize connections among study design, inference technique, scope of conclusions
5. Use variety of computational tools
6. Investigate mathematical underpinnings
7. Introduce probability “just in time”
Principle 1: Active investigator
- Curricular materials consist of investigations that lead students to discover statistical concepts and methods
- Students learn through constructing own knowledge, developing own understanding
- Need direction, guidance to do that
- Students spend class time engaged with these materials, working collaboratively, with technology close at hand
Principle 2: Real studies, genuine data
- Almost all investigations focus on a recent scientific study, existing data set, or student collected data
- Statistics as a science
- Frequent discussions of data collection issues and cautions
- Wide variety of contexts, research questions
Real studies, genuine data
- Popcorn and lung cancer
- Historical smoking studies
- Night lights and myopia
- Effect of observer with vested interest
- Kissing the right way
- Do pets resemble their owners
- Who uses shared armrest
- Halloween treats
- Heart transplant mortality
- Lasting effects of sleep deprivation
- Sleep deprivation and car crashes
- Fan cost index
- Drive for show, putt for dough
- Spock legal trial
- Hiring discrimination
- Comparison shopping
- Computational linguistics
Principle 3: Entire statistical process
- First two weeks:
  - Data collection
    - Observation vs. experiment (confounding, random assignment vs. random sampling, bias)
  - Descriptive analysis
    - Segmented bar graph
    - Conditional proportions, relative risk, odds ratio
  - Inference
    - Simulating randomization test for p-value, significance
    - Hypergeometric distribution, Fisher’s exact test
- Repeat, repeat, repeat, …
  - Random assignment → dotplots/boxplots/means/medians → randomization test
  - Sampling → bar graph → binomial → normal approximation
Principle 4: Emphasize connections
- Emphasize connections among study design, inference technique, scope of conclusions
- Appropriate inference technique determined by randomness in data collection process
  - Simulation of randomization test (e.g., hypergeometric)
  - Repeated sampling from population (e.g., binomial)
- Appropriate scope of conclusion also determined by randomness in data collection process
  - Causation
  - Generalizability
Principle 5: Variety of computational tools
- For analyzing data, exploring statistical concepts
- Assume that students have frequent access to computing
  - Not necessarily every class meeting in computer lab
- Choose right tool for task at hand
  - Analyzing data: statistics package (e.g., Minitab)
  - Exploring concepts: applets (interactivity, visualization)
  - Immediate updating of calculations: spreadsheet (Excel)
Principle 6: Mathematical underpinnings
- Primary distinction from “Stat 101” course
- Some use of calculus but not much
- Assume some mathematical sophistication
  - E.g., function, summation, logarithm, optimization, proof
- Often occurs as follow-up homework exercises
- Examples
  - Counting rules for probability
  - Hypergeometric, binomial distributions
  - Principle of least squares, derivatives to find minimum
  - Univariate as well as bivariate setting
  - Margin-of-error as function of sample size, population parameters, confidence level
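As one illustration of the kind of follow-up exercise listed above (a sketch under our own notation, not taken from the course materials), the univariate least squares problem asks which value $m$ minimizes the sum of squared deviations from observations $x_1, \dots, x_n$:

```latex
\begin{align*}
S(m)   &= \sum_{i=1}^{n} (x_i - m)^2 \\
S'(m)  &= -2 \sum_{i=1}^{n} (x_i - m) = 0
          \iff m = \frac{1}{n} \sum_{i=1}^{n} x_i = \bar{x} \\
S''(m) &= 2n > 0 \quad \text{so } \bar{x} \text{ is the minimizer.}
\end{align*}
```

The same derivative argument, applied to two parameters, gives the least squares line in the bivariate setting.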
Principle 7: Probability “just in time”
- Whither probability?
  - Not the primary goal
  - Studied as needed to address statistical issues
  - Often introduced through simulation
    - Tactile and then computer-based
    - Addressing “how often would this happen by chance?”
- Examples
  - Hypergeometric distribution: Fisher’s exact test for 2×2 table
  - Binomial distribution: Sampling from random process
  - Continuous probability models as approximations
Content of Example Course (ISCAM)

Chapters 1 through 6; each row lists its entries in chapter order.

Data Collection
- Observation vs. experiment, confounding, randomization
- Random sampling, bias, precision, nonsampling errors
- Paired data
- Independent random samples
- Bivariate

Descriptive Statistics
- Conditional proportions, segmented bar graphs, odds ratio
- Quantitative summaries, transformations, z-scores, resistance
- Bar graph
- Models, probability plots, trimmed mean
- Scatterplots, correlation, simple linear regression

Probability
- Counting, random variable, expected value
- Empirical rule
- Bernoulli processes, rules for variances, expected value
- Normal, Central Limit Theorem

Sampling/Randomization Distribution
- Randomization distribution for $\hat{p}_1 - \hat{p}_2$
- Randomization distribution for $\bar{x}_1 - \bar{x}_2$
- Sampling distribution for $X$, $\hat{p}$
- Large sample sampling distributions for $\bar{x}$, $\hat{p}$
- Sampling distributions of $\hat{p}_1 - \hat{p}_2$, OR, $\bar{x}_1 - \bar{x}_2$
- Chi-square statistic, F statistic, regression coefficients

Model
- Hypergeometric
- Binomial
- Normal, t
- Normal, t, log-normal
- Chi-square, F, t

Statistical Inference
- p-value, significance, Fisher’s Exact Test
- p-value, significance, effect of variability
- Binomial tests and intervals, two-sided p-values, type I/II errors
- z-procedures for proportions, t-procedures, robustness, bootstrapping
- Two-sample z- and t-procedures, bootstrap, CI for OR
- Chi-square for homogeneity, independence, ANOVA, regression
Assessments
- Investigations with summaries of conclusions
- Worked out examples
- Practice problems
  - Quick practice, opportunity for immediate feedback, adjustment to class discussion
- Homework exercises
- Technology explorations (labs)
  - e.g., comparison of sampling variability with stratified sampling vs. simple random sampling
- Student projects
  - Student-generated research questions, data collection plans, implementation, data analyses, report
Example 1: Friendly Observers
- Psychology experiment
  - Butler and Baumeister (1998) studied the effect of observer with vested interest on skilled performance

                        A: vested interest   B: no vested interest   Total
Beat threshold                   3                     8               11
Do not beat threshold            9                     4               13
Total                           12                    12               24

$\hat{p}_A = .250$, $\hat{p}_B = .667$

- How often would such an extreme experimental difference occur by chance, if there was no vested interest effect?
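Example 1: Friendly Observers
- Students investigate this question through
  - Hands-on simulation (playing cards)
  - Computer simulation (Java applet)
  - Mathematical model: counting techniques

$$\text{p-value} = P(X \le 3) = \frac{\binom{11}{3}\binom{13}{9} + \binom{11}{2}\binom{13}{10} + \binom{11}{1}\binom{13}{11} + \binom{11}{0}\binom{13}{12}}{\binom{24}{12}} \approx .0498$$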
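The three approaches on this slide can be mirrored in a few lines of code. The following is a minimal sketch, not the applet or materials used in the course: the function names, the number of repetitions, and the seed are our own choices. It computes the exact hypergeometric tail probability from the counting argument and then approximates it by repeatedly "re-dealing" the 24 outcomes, as in the playing-card simulation.

```python
from math import comb
import random

# Counts from the 2x2 table: 11 participants beat the threshold, 13 did not;
# 12 of the 24 were randomly assigned to group A (vested interest).
SUCCESSES, FAILURES, GROUP_A = 11, 13, 12

def exact_p_value(observed=3):
    """P(X <= observed), where X counts 'beat threshold' results landing in group A."""
    total = comb(SUCCESSES + FAILURES, GROUP_A)
    tail = sum(comb(SUCCESSES, k) * comb(FAILURES, GROUP_A - k)
               for k in range(observed + 1))
    return tail / total

def simulated_p_value(observed=3, reps=10_000, seed=1):
    """Card-shuffling analogue: re-deal the 24 outcomes and count extreme deals."""
    rng = random.Random(seed)  # seed is an arbitrary choice for repeatability
    cards = [1] * SUCCESSES + [0] * FAILURES
    hits = 0
    for _ in range(reps):
        rng.shuffle(cards)
        if sum(cards[:GROUP_A]) <= observed:  # first 12 cards play the role of group A
            hits += 1
    return hits / reps

print(round(exact_p_value(), 4))  # 0.0498
print(simulated_p_value())        # varies with the seed; should be close to the exact value
```

The simulated value is an empirical p-value and will differ slightly from run to run, which is the contrast the hands-on and mathematical approaches are meant to expose.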
Example 1: Friendly Observers
- Focus on statistical process
  - Data collection, descriptive statistics, inferential analysis
  - Arising from genuine research study
- Connection between the randomization in the design and the inference procedure used
- Scope of conclusions depends on study design
  - Cause/effect inference is valid
- Use of simulation motivates the derivation of the mathematical probability model
- Investigate/answer real research questions in first two weeks
Example 2: Sleep Deprivation
- Physiology experiment
  - Stickgold, James, and Hobson (2000) studied the long-term effects of sleep deprivation on a visual discrimination task (3 days later!)

sleep condition    n    Mean   StDev   Median    IQR
deprived          11    3.90   12.17     4.50   20.7
unrestricted      10   19.82   14.73    16.55   19.53

- How often would such an extreme experimental difference occur by chance, if there was no sleep deprivation effect?
Example 2: Sleep Deprivation
- Students investigate this question through
  - Hands-on simulation (index cards)
  - Computer simulation (Minitab)
  - Mathematical model

[Figure: observed difference in group means 15.92 marked; p-values shown: .0072 and .002]
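A randomization test for a difference in means can be sketched as below. This is an illustration of the general technique, not the Minitab macro used in the course; the function names are ours, and because the slides give only summary statistics, the demonstration uses small made-up numbers that are NOT the study's data. The sketch also shows the contrast between an empirical p-value (random shuffles) and an exact one (every possible assignment).

```python
from itertools import combinations
from statistics import mean
import random

def simulated_p_value(group_1, group_2, reps=10_000, seed=1):
    """Empirical p-value: share of random re-assignments whose difference in
    means (group_2 minus group_1) is at least as large as the observed one."""
    rng = random.Random(seed)  # seed is an arbitrary choice for repeatability
    pooled = list(group_1) + list(group_2)
    n1 = len(group_1)
    observed = mean(group_2) - mean(group_1)
    hits = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # index-card analogue: re-deal responses to the groups
        if mean(pooled[n1:]) - mean(pooled[:n1]) >= observed - 1e-12:
            hits += 1
    return hits / reps

def exact_p_value(group_1, group_2):
    """Exact p-value: enumerate every way to choose which responses form group_1."""
    pooled = list(group_1) + list(group_2)
    n1, n2 = len(group_1), len(group_2)
    total = sum(pooled)
    observed = mean(group_2) - mean(group_1)
    count = extreme = 0
    for idx in combinations(range(len(pooled)), n1):
        s1 = sum(pooled[i] for i in idx)
        diff = (total - s1) / n2 - s1 / n1
        count += 1
        extreme += diff >= observed - 1e-12
    return extreme / count

# Hypothetical toy responses for illustration only (not from Stickgold et al.).
toy_deprived = [1, 2, 3, 4]
toy_unrestricted = [5, 6, 7, 8]
print(exact_p_value(toy_deprived, toy_unrestricted))      # 1/70, about 0.0143
print(simulated_p_value(toy_deprived, toy_unrestricted))  # close to 1/70, varies with seed
```

With group sizes of 11 and 10, as in the study, full enumeration covers 352,716 assignments, which is still feasible on a laptop.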
Example 2: Sleep Deprivation
- Experience the entire statistical process again
- Develop deeper understanding of key ideas (randomization, significance, p-value)
- Tools change, but reasoning remains same
- Tools based on research study, question, not for their own sake
- Simulation as a problem solving tool
- Empirical vs. exact p-values
Example 3: Infants’ Social Evaluation
- Sociology study
  - Hamlin, Wynn, Bloom (2007) investigated whether infants would prefer a toy showing “helpful” behavior to a toy showing “hindering” behavior
  - Infants were shown a video with these two kinds of toys, then asked to select one
- 14 of 16 10-month-olds selected helper
- Is this result surprising enough (under null model of no preference) to indicate a genuine preference for the helper toy?
Example 3: Infants’ Social Evaluation
- Simulate with coin flipping
- Then simulate with applet
Example 3: Infants’ Social Evaluation
- Then learn binomial distribution, calculate exact p-value

$$\text{p-value} = P(X \ge 14) = \binom{16}{14}(.5)^{14}(.5)^{2} + \binom{16}{15}(.5)^{15}(.5)^{1} + \binom{16}{16}(.5)^{16}(.5)^{0} \approx .0021$$
[Figure: Distribution plot, Binomial, n=16, p=0.5. Horizontal axis: X = number who choose helper toy. Vertical axis: Probability. Upper tail from X = 14 has probability 0.00209.]
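The coin-flipping simulation and the exact binomial calculation can be compared directly. This is a minimal sketch with our own function names, repetition count, and seed, not the applet used in the course; it also tries one of the follow-ups (a different number of successes).

```python
from math import comb
import random

def binomial_upper_tail(n, observed, p=0.5):
    """Exact P(X >= observed) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(observed, n + 1))

def coin_flip_p_value(n, observed, reps=100_000, seed=1):
    """Empirical p-value: flip n fair coins many times, count results >= observed."""
    rng = random.Random(seed)  # seed is an arbitrary choice for repeatability
    hits = 0
    for _ in range(reps):
        heads = sum(rng.random() < 0.5 for _ in range(n))
        if heads >= observed:
            hits += 1
    return hits / reps

print(round(binomial_upper_tail(16, 14), 4))  # 0.0021, the study's result
print(round(binomial_upper_tail(16, 13), 4))  # 0.0106, follow-up: 13 of 16 instead
print(coin_flip_p_value(16, 14))              # close to 0.0021, varies with seed
```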
Example 3: Infants’ Social Evaluation
- Learn probability distribution to answer inference question from research study
- Again the analysis is completed with
  - Tactile simulation
  - Technology simulation
  - Mathematical model
- Modeling process of statistical investigation
  - Examination of methodology, further questions in study
- Follow-ups
  - Different number of successes
  - Different sample size
Example 4: Sleepless Drivers
- Sociology case-control study
  - Connor et al (2002) investigated whether those in recent car accidents had been more sleep deprived than a control group of drivers

                                No full night’s sleep   At least one full night’s   Sample sizes
                                in past week            sleep in past week
“case” drivers (crash)                   61                      510                    571
“control” drivers (no crash)             44                      544                    588
Example 4: Sleepless Drivers
- Sample proportion that were in a car crash
  - Sleep deprived: .581
  - Not sleep deprived: .484
- Odds ratio: 1.48
- How often would such an extreme observed odds ratio occur by chance, if there was no sleep deprivation effect?

[Figure: segmented bar graph (0% to 100%) of crash vs. no crash for “No full night’s sleep in past week” and “At least one full night’s sleep in past week”]
Example 4: Sleepless Drivers
- Students investigate this question through
  - Computer simulation (Minitab)
    - Empirical sampling distribution of odds-ratio
    - Empirical p-value
  - Approximate mathematical model

[Figure: sampling distribution of the odds ratio, with the observed value 1.48 marked]
Example 4: Sleepless Drivers
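- $\mathrm{SE}(\text{log odds ratio}) = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}$, where $a$, $b$, $c$, $d$ are the four cell counts of the 2×2 table
- Confidence interval for population log odds ratio: sample log odds ratio ± z* × SE(log odds ratio)
- Back-transformation
  - 90% CI for odds ratio: 1.05 to 2.08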
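The interval on this slide can be reproduced from the four cell counts of the case-control table. This is a minimal sketch using only the Python standard library; the function name and argument order are our own, and the critical value is taken from the standard normal quantile function rather than a table.

```python
from math import exp, log, sqrt
from statistics import NormalDist

# Cell counts from the case-control table for Connor et al. (2002).
a, b = 61, 510   # "case" drivers: no full night's sleep, at least one full night
c, d = 44, 544   # "control" drivers: no full night's sleep, at least one full night

def odds_ratio_ci(a, b, c, d, confidence=0.90):
    """Return (odds ratio, lower, upper) via the normal approximation on the log scale."""
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.645 for 90%
    lower = exp(log(odds_ratio) - z_star * se_log_or)        # back-transform endpoints
    upper = exp(log(odds_ratio) + z_star * se_log_or)
    return odds_ratio, lower, upper

estimate, lower, upper = odds_ratio_ci(a, b, c, d)
print(f"{estimate:.2f} ({lower:.2f}, {upper:.2f})")  # 1.48 (1.05, 2.08)
```

Because the interval is built on the log scale and then exponentiated, it is not symmetric around 1.48.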
Example 4: Sleepless Drivers
- Students understand process through which they can investigate statistical ideas
- Students piece together powerful statistical tools learned throughout the course to derive new (to them) procedures
  - Concepts, applications, methods, theory
For more information
- Investigating Statistical Concepts, Applications, and Methods (ISCAM), Cengage Learning, www.cengage.com
- Instructor resources: www.rossmanchance.com/iscam/
  - Solutions to investigations, practice problems, homework exercises
  - Instructor’s guide
  - Sample syllabi
  - Sample exams