Experiences with a new textbook emphasizing randomization/simulation: Results from early implementation. Nathan Tintle, Dordt College, Sioux Center, Iowa.
Experiences with a new textbook emphasizing randomization/simulation: Results from early implementation
Nathan Tintle
Dordt College
Sioux Center, Iowa
We’ve been sitting too long!
    Why change? (pre-2009)
Hold your breath
    Taking the plunge… (fall 2009)
Show me the data
    Data on fall 2009 implementation (pre-, post- and 4-month retention)
Are we there yet?
    Joining forces, major changes and refinements (Summer 2010-present)
Are we there yet?
    Open questions… (fall 2012 and beyond)
When are we going to get there (and where is there!)?
    This and other common questions we’ve heard…
Outline
Consensus curriculum
    Descriptive statistics, design, probability and sampling distributions, inference
    Using a TI calculator
    Lecture and lab separated
Feeling like we were teaching algebra and memorization of rules and not statistical thinking
Had heard George’s talk from 2005
Permutation tests more and more in research
Background
…so hold your breath!
Decided to throw out the old curriculum completely and start from scratch
    Hard to “sprinkle in”
    Unclear how we would “incrementally change”
Pilot in spring 2009 (loosely based on Rossman-Chance-Cobb-Holcomb modules)
Faculty training workshop in May 2009
Flesh out more in summer 2009 for use in fall 2009
Todd Swanson, Jill VanderStoep and I
Taking the plunge…
Unit #1 (4-5 weeks). Introduction to inference
    Chapter 1. Inference with a single proportion (simulation only)
    Chapter 2. Inference comparing two proportions (randomization test only)
    Chapter 3. Inference comparing two means (randomization test only)
    Chapter 4. Inference for correlation and regression (randomization test only)
Details of fall 2009 implementation
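As an illustration of the "simulation only" approach in Chapter 1, here is a minimal Python sketch of a simulation-based test for a single proportion. It is not taken from the course materials: the function name, the null value of 0.5, and the data (14 successes in 16 trials) are hypothetical and chosen only for the example.

```python
import random

def simulate_one_proportion_pvalue(successes, n, null_p=0.5, reps=10_000, seed=1):
    """One-sided simulation p-value: the share of simulated samples whose
    success count is at least as large as the observed count, assuming the
    true proportion equals null_p."""
    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(reps):
        # Simulate n independent trials under the null hypothesis.
        simulated = sum(1 for _ in range(n) if rng.random() < null_p)
        if simulated >= successes:
            at_least_as_extreme += 1
    return at_least_as_extreme / reps

# Hypothetical data: 14 successes in 16 trials, null proportion 0.5.
p_value = simulate_one_proportion_pvalue(14, 16)
print(f"simulated one-sided p-value: {p_value:.4f}")
```

The only idea students need here is "how often would chance alone produce a result this extreme?", which is why no sampling-distribution theory appears in the code.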
Unit #2 (8-9 weeks). Revisiting inference: theory-based approaches, confidence intervals and power
    Chapter 5: Correlation and regression: revisited
    Chapter 6: Comparing two or more means: revisited
    Chapter 7: Comparing two or more proportions: revisited
Details of fall 2009 implementation
Start with inference
    Simulation and randomization first; theory-based approaches later (revisit)
Topics-based chapters (Malone et al. argued for this)
    Descriptive statistics are just in time
Probability and sampling distributions introduced intuitively; less formally
Student projects (more, earlier)
Pedagogy (active learning inextricably linked with simulation/randomization)
Case studies/research articles: real data that matters
Key features of fall 2009
All instructors liked it better and felt that students learned more and better than before; student attitudes seemed better
Used CAOS as pre-post test and 4-month follow-up
Full results are published, compared against CAOS results from fall 2007
    Pre-post: Tintle NL, et al. “Development and assessment of a preliminary randomization-based introductory statistics curriculum.” Journal of Statistics Education. March 2011.
    4-month retention: Tintle NL, et al. “Retention of statistical concepts in a preliminary randomization-based introductory statistics curriculum.” Statistics Education Research Journal. May 2012.
Assessment from fall 2009
Pre-post changes
    6 items significantly better (p≤0.001) in fall 2009 (pre-post) as compared to fall 2007 (at Hope) (p-values and design)
    1 item significantly worse (standard deviation/histogram)
    33 items not significant (n/s)
4-month retention
    Average of 48% loss of knowledge gained 4 months post-course (traditional curriculum)
    Average of 6% loss of knowledge gained 4 months post-course (randomization curriculum)
Assessment highlights from fall 2009
Joined forces with Beth Chance, George Cobb, Allan Rossman and Soma Roy to produce an introductory statistics textbook
Many major and minor changes to address assessment results, teaching experience, etc. from fall 2009 (as well as experience over the last two years of teaching)
Summer 2010
Chapter 0: A few basics (statistical method, descriptive statistics and probability as long-run frequency) (~1 week)
Unit 1: LOGIC AND SCOPE OF INFERENCE (5-6 weeks)
    Chapter 1: Simulation and theory-based inferential approach for single proportion (SIGNIFICANCE)
    Chapter 2: Estimation using plausible values, ± 2SD, theory-based approach (ESTIMATION)
    Chapter 3: Drawing conclusions from sample to population (GENERALIZABILITY)
    Chapter 4: Association and causation (CAUSATION)
Course overview
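The "± 2SD" estimation idea from Chapter 2 can be sketched in a few lines of Python. This is an illustrative reading, not the textbook's procedure: it assumes the SD is taken from a simulated distribution of sample proportions centered at the observed proportion, and the function name and data (60 successes in 100 trials) are hypothetical.

```python
import random
from statistics import pstdev

def two_sd_interval(successes, n, reps=5000, seed=1):
    """Approximate interval: observed proportion +/- 2 * SD, where SD is the
    standard deviation of simulated sample proportions.
    Assumption: samples are simulated using the observed proportion."""
    rng = random.Random(seed)
    p_hat = successes / n
    simulated = [
        sum(1 for _ in range(n) if rng.random() < p_hat) / n
        for _ in range(reps)
    ]
    sd = pstdev(simulated)
    return p_hat - 2 * sd, p_hat + 2 * sd

# Hypothetical data: 60 successes in 100 trials.
low, high = two_sd_interval(60, 100)
print(f"approximate interval: ({low:.3f}, {high:.3f})")
```

The same interval could be built from a theory-based standard error; the simulated SD is used here only to stay close to the simulation-first framing of the course.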
Unit 2: COMPARING TWO GROUPS (5-6 weeks)
    Chapter 5: Comparing two proportions (randomization and theory-based)
    Chapter 6: Comparing two averages (randomization and theory-based)
    Chapter 7: Matched pairs and single mean (randomization and theory-based)
Course overview
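A randomization test for comparing two averages (the Chapter 6 topic) can be sketched as follows. The sketch assumes a two-sided test on the difference in group means; the function name and both data sets are hypothetical and are not from the course.

```python
import random
from statistics import mean

def randomization_test_two_means(group_a, group_b, reps=5000, seed=1):
    """Two-sided randomization test: re-shuffle the pooled responses into two
    groups of the original sizes and count how often the shuffled difference
    in means is at least as large (in absolute value) as the observed one."""
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # mimics re-running the random assignment
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / reps

# Hypothetical responses for two randomly assigned groups.
treatment = [10, 11, 12, 13, 14]
control = [1, 2, 3, 4, 5]
observed_diff, p_value = randomization_test_two_means(treatment, control)
print(f"observed difference: {observed_diff}, p-value: {p_value:.4f}")
```

Swapping the statistic (for example, a difference in proportions) leaves the shuffle loop unchanged, which is part of what makes the approach reusable across chapters.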
Unit 3: ANALYZING MORE GENERAL SITUATIONS (3-4 weeks)
    Chapter 8: Comparing more than two groups on categorical response
    Chapter 9: Comparing more than two groups with a quantitative response
    Chapter 10: Correlation and regression
**Note: Chapters 7-10 can be done in any order**
Course overview
Efficiency with randomization/simulation and theory-based done simultaneously
Logic and scope of inference (Significance, Estimation, Generalizability and Causation)
Easier validity conditions for theory-based tests
Much more… 3-S process for assessing statistical significance, 7-step method, approach on CIs, …
Changes/major decisions
Assessment 2011/2012
    Portability: Dordt similar results to Hope, showing improvement over time as we make tweaks
    Continues to be good
    Exploring alternative assessment tests/questions more tailored to our learning goals (e.g., GOALS, MOST, Garfield et al.)
Chapter 9. Comparing multiple group means on a quantitative response
Exploration 9.1: Exercise and Brain volume in the elderly (data from Mortimer et al. 2012)
Brain size typically declines. Shrinkage may be linked to dementia.
Randomized experiment with 4 groups: (a) Tai Chi (b) Walking (c) Social Interaction and (d) Control
40 weeks: Measure percent change in brain size
An example
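A four-group randomized experiment like the one above can be analyzed with a randomization test on a single summary statistic. The Python sketch below is illustrative only: it assumes the mean absolute pairwise difference between group means as the statistic (the deck does not name one), and the data are invented placeholders, NOT the Mortimer et al. measurements.

```python
import random
from itertools import combinations
from statistics import mean

def mean_abs_pairwise_diff(groups):
    """Average absolute difference between all pairs of group means."""
    means = [mean(g) for g in groups]
    pairs = list(combinations(means, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)

def randomization_test_groups(groups, reps=2000, seed=1):
    """Shuffle all responses across groups (keeping group sizes) and count how
    often the shuffled statistic is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = mean_abs_pairwise_diff(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if mean_abs_pairwise_diff(shuffled) >= observed:
            extreme += 1
    return observed, extreme / reps

# Hypothetical percent changes in brain volume (placeholder values only).
hypothetical = {
    "tai_chi": [0.6, 0.4, 0.9, 0.5, 0.7],
    "walking": [-0.2, 0.1, -0.1, 0.0, -0.3],
    "social": [0.3, 0.5, 0.2, 0.4, 0.1],
    "control": [-0.3, -0.1, -0.4, 0.0, -0.2],
}
observed_stat, p_value = randomization_test_groups(list(hypothetical.values()))
print(f"observed statistic: {observed_stat:.3f}, p-value: {p_value:.4f}")
```

An F statistic could be substituted for `mean_abs_pairwise_diff` without touching the shuffling logic.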
Class testing fall 2012 (seven other institutions, plus our own three)
Anticipate Wiley-published book within 18 months (or so)
Draft materials prepared now
Contact if interested in learning more: [email protected] or http://math.hope.edu/isi for more details
Where are we now
Debate over
    Bootstrapping vs. randomization vs. simulation
    How to handle confidence intervals
    Order
    Etc.
Where is there?
Are we there yet?
Q: How can I convince client departments?
    Hard if you don’t do theory-based approaches any longer
    If you say “We’re still doing the test that you care about, but we’re giving students a better scaffolding to understand what that test is doing…” I haven’t heard ANY client department respond negatively to this rationale
Q: OK, so how about my math colleagues?
    Harder. They like the MATH part of statistics, and that’s what we’re arguing to do less of to encourage STATISTICAL thinking.
    What is the goal of your course? Not algebra, not probability, not calculus, not mathematical thinking…
Common questions we’ve heard
Q: Sounds great, sounds like LOTS of work! Can I do this?
    Be willing to get out of the boat!
    New things are never completely smooth
    I know of no one (yet!) who has done this and wants to go back
    Utilize experienced instructor resources
        We have some and are working on more
        Need for more…
Common questions we’ve heard
Randomization is doable as intro course without alienating client departments (theory-based tests)
Shows promise (initial assessment data, feedback, etc. are positive)
More research needed to pinpoint the causes of improved assessment data and accepted “best practices” (what matters and what doesn’t) for a randomization approach
Conclusions
Collaborators:
    Hope College: Todd Swanson and Jill VanderStoep
    Cal Poly: Beth Chance, Allan Rossman and Soma Roy
    Mt. Holyoke: George Cobb
Testers: Numerous other faculty and many, many students
Funding: NSF (DUE-1140629)
Acknowledgments
% of students correct, by item and cohort

Item description (topic) / Cohort      Pretest   Posttest   Difference
Understanding of the purpose of randomization in an experiment (Data collection and design)
    NT                                    8.5      12.3        3.8
    HT                                    4.6       9.7        5.1
    HR                                    3.5      20.8       17.3
Understanding that low p-values are desirable in research studies (Tests of significance)
    NT                                   49.9      68.5       18.6
    HT                                   56.9      85.6       28.7
    HR                                   56.9      96.0       39.1
Significantly better
% of students correct, by item and cohort

Item description (topic) / Cohort      Pretest   Posttest   Difference
Understanding that no statistical significance does not guarantee that there is no effect (Tests of significance)
    NT                                   63.1      64.4        1.3
    HT                                   66.2      72.7        6.5
    HR                                   65.2      85.1       19.9
Ability to recognize a correct interpretation of a p-value (Tests of significance)
    NT                                   46.8      54.5        7.7
    HT                                   36.1      41.0        4.9
    HR                                   42.3      60.0       17.7
Significantly better
% of students correct, by item and cohort

Item description (topic) / Cohort      Pretest   Posttest   Difference
Ability to recognize an incorrect interpretation of a p-value. Specifically, probability that a treatment is not effective. (Tests of significance)
    NT                                   53.1      58.6        5.5
    HT                                   59.8      68.6        8.8
    HR                                   58.9      79.7       20.8
Understanding of how to simulate data to find the probability of an observed value (Probability)
    NT                                   20.4      19.5       -0.9
    HT                                   20.0      20.0        0.0
    HR                                   20.0      32.2       12.2
Significantly better
Significantly worse
% of students correct, by item and cohort

Item description (topic) / Cohort      Pretest   Posttest   Difference
Ability to correctly estimate and compare standard deviations for different histograms. (Descriptive statistics)
    NT                                   34.3      51.7       17.4
    HT                                   44.8      70.8       26.0
    HR                                   36.3      48.5       12.2
CAOS scores 4 months post-course (fall 2007 vs. fall 2009 students)
Significantly better retention in fall 2009 sample (p=0.002)
Retention
Average change in percent correct

Cohort       Posttest mean minus pretest mean (SD)   4-month retention mean minus posttest mean (SD)
Fall 2007    10.92 (9.5)                             -5.28 (10.1)
Fall 2009    10.04 (12.3)                            -0.61 (8.3)
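The "loss of knowledge gained" percentages quoted earlier in the talk (48% and 6%) appear to be consistent with dividing the 4-month drop by the pre-to-post gain from the figures above. The short Python check below makes that arithmetic explicit; the function name is hypothetical, and reading the percentages this way is an assumption rather than something the slides state.

```python
def percent_of_gain_lost(gain, retention_change):
    """Share of the pre-to-post gain that was lost by the 4-month follow-up.
    gain: posttest mean minus pretest mean
    retention_change: 4-month retention mean minus posttest mean (negative = loss)"""
    return 100 * (-retention_change) / gain

loss_2007 = percent_of_gain_lost(10.92, -5.28)  # fall 2007 cohort
loss_2009 = percent_of_gain_lost(10.04, -0.61)  # fall 2009 cohort
print(f"fall 2007: {loss_2007:.1f}% of gain lost")
print(f"fall 2009: {loss_2009:.1f}% of gain lost")
```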
Average score on topics

Topic / Cohort               Pretest   Posttest   4-month retention   Change
Data collection and design
    Randomized                31.58     43.09          41.12           -1.97
    Consensus                 41.02     47.44          34.94          -12.50
Tests of significance
    Randomized                51.54     71.27          72.37            1.10
    Consensus                 51.51     67.31          64.31           -2.95
Best retention areas
What do students prefer: simulation/randomization or theory-based?
    Depends on how you present it; students will take their cue from you
    My preference:
        A. Simulation requires computational power, which we didn’t have until recently
        B. Theory-based was the historical answer because it predicts what would have happened had you simulated
        C. Theory-based approaches use mathematical theory to give a good approximation of the simulation distribution under certain validity conditions
Common questions we’ve heard
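The claim that a theory-based test approximates the simulated distribution can be checked numerically. The Python sketch below compares a simulated one-sided p-value for a single proportion with the normal-approximation p-value; the function names and the data (60 successes in 100 trials, null proportion 0.5) are hypothetical and chosen so that the usual validity conditions are comfortably met.

```python
import math
import random

def simulated_upper_p(successes, n, null_p=0.5, reps=10_000, seed=1):
    """Share of simulated samples with at least `successes` successes."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        count = sum(1 for _ in range(n) if rng.random() < null_p)
        if count >= successes:
            hits += 1
    return hits / reps

def normal_upper_p(successes, n, null_p=0.5):
    """One-sided p-value from the normal approximation (no continuity correction)."""
    se = math.sqrt(null_p * (1 - null_p) / n)
    z = (successes / n - null_p) / se
    return 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z) for a standard normal

# Hypothetical data: 60 successes in 100 trials.
sim_p = simulated_upper_p(60, 100)
theory_p = normal_upper_p(60, 100)
print(f"simulation: {sim_p:.4f}, theory-based: {theory_p:.4f}")
```

With small samples or proportions near 0 or 1 the two answers drift apart, which is one way to motivate the validity conditions.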
Ongoing questions for debate (fall 2012 and beyond)
Confidence intervals approach: does it matter?
    Plausible values? Bootstrap? Theory only?
Study design and simulation approach
    Disconnect
    Connect
        Re-randomize = randomized experiment
        Bootstrap = random sample
Are we there yet?
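To make the "bootstrap = random sample" pairing concrete, here is a minimal Python sketch of a percentile bootstrap interval for a mean. It is one of several bootstrap variants and is not claimed to be the approach under debate; the function name and the sample values are hypothetical.

```python
import random
from statistics import mean

def bootstrap_percentile_interval(sample, reps=5000, level=0.95, seed=1):
    """Percentile bootstrap interval for the mean: resample the data with
    replacement (mimicking repeated random sampling from the population)
    and take the middle `level` share of the resampled means."""
    rng = random.Random(seed)
    n = len(sample)
    boot_means = sorted(mean(rng.choices(sample, k=n)) for _ in range(reps))
    lo_idx = int((1 - level) / 2 * reps)
    hi_idx = int((1 + level) / 2 * reps) - 1
    return boot_means[lo_idx], boot_means[hi_idx]

# Hypothetical random sample of measurements.
sample = [4.1, 5.3, 3.8, 6.0, 5.5, 4.7, 5.1, 4.4, 5.9, 4.9]
low, high = bootstrap_percentile_interval(sample)
print(f"95% percentile bootstrap interval for the mean: ({low:.2f}, {high:.2f})")
```

Resampling with replacement from one sample contrasts with a randomization test, which shuffles responses between groups to mimic random assignment.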
Q: Does this work?
    Assessment data
    Content objectives
    Attitudes
    More and more people doing this and saying it works!
    At least 15 faculty with our materials, ~10 new faculty this fall
    Lock’s, CATALST, NCSU, UCLA, STATCRUNCH… and more!
Common questions we’ve heard
Q: What makes the difference? Simulation? Randomization? Pedagogy? Talking about inference for 16 weeks instead of 5?
    We don’t know… yet
    Pedagogy and these changes are inextricably linked
    On Sunday at the modeling session one of the panelists said “How could you teach this without hands-on activities and technology?”
Common questions we’ve heard