Nathan Tintle Dordt College Sioux Center, Iowa

Experiences with a new textbook emphasizing randomization/simulation: Results from early implementation


We’ve been sitting too long!
Why change? (pre-2009)

Hold your breath
Taking the plunge… (fall 2009)

Show me the data
Data on fall 2009 implementation (pre-, post- and 4-month retention)

Are we there yet?
Joining forces, major changes and refinements (Summer 2010-present)

Are we there yet?
Open questions… (fall 2012 and beyond)

When are we going to get there (and where is there!)?
This and other common questions we’ve heard…

Outline

Consensus curriculum

Desc stats, design, prob and samp dist, inference

Using a TI

Lecture and lab separated

Feeling like we were teaching algebra and memorization of rules and not statistical thinking

Had heard George’s talk from 2005

Permutation tests more and more in research

Background

…so hold your breath!

Decided to throw out the old curriculum completely and start from scratch

Hard to “sprinkle in”; unclear how we would “incrementally change”

Pilot in spring 2009 (loosely based on Rossman-Chance-Cobb-Holcomb modules)

Faculty training workshop in May 2009

Flesh out more in summer 2009 for use in fall 2009

Todd Swanson, Jill VanderStoep and I

Taking the plunge…

Unit #1 (4-5 weeks). Introduction to inference

Chapter 1. Inference with a single proportion (simulation only)

Chapter 2. Inference comparing two proportions (randomization test only)

Chapter 3. Inference comparing two means (randomization test only)

Chapter 4. Inference for correlation and regression (randomization test only)

Details of fall 2009 implementation
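As a rough illustration of what "simulation only" inference for a single proportion can look like, the sketch below estimates a one-sided p-value by repeatedly simulating the null model. The function name, the counts (14 successes in 16 trials), the null probability of 0.5 and the number of repetitions are hypothetical choices for illustration; none of them come from the course materials.

```python
import random


def simulate_one_proportion(successes, n, null_prob=0.5, reps=10_000, seed=1):
    """Estimate a one-sided p-value by simulating the null model.

    Each repetition generates n yes/no outcomes with chance null_prob of
    "yes" and records whether the simulated count is at least as large as
    the observed count.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    at_least_as_extreme = 0
    for _ in range(reps):
        simulated = sum(rng.random() < null_prob for _ in range(n))
        if simulated >= successes:
            at_least_as_extreme += 1
    return at_least_as_extreme / reps


# Hypothetical data: 14 successes observed in 16 trials.
p_value = simulate_one_proportion(14, 16)
print(f"Simulated one-sided p-value: {p_value:.4f}")
```

The p-value here is just the proportion of simulated counts that are at least as extreme as the observed count, so no probability formula is needed.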

Unit #2 (8-9 weeks). Revisiting inference: theory-based approaches, confidence intervals and power

Chapter 5. Correlation and regression: revisited

Chapter 6. Comparing two or more means: revisited

Chapter 7. Comparing two or more proportions: revisited

Details of fall 2009 implementation

Start with inference: simulation and randomization first; theory-based approaches later (revisit)

Topics-based chapters (Malone et al. argued for this); descriptive statistics are just in time

Probability and sampling distributions introduced intuitively; less formally

Student projects (more, earlier)

Pedagogy (active learning inextricably linked with simulation/randomization)

Case studies/research articles: real data that matters

Key features of fall 2009

All instructors liked it better and felt that students learned more/better than before; student attitudes seemed better

Used CAOS as pre-post test and 4-month follow-up

Full results are published (compared with CAOS fall 2007)

Pre-post: Tintle NL, et al. “Development and assessment of a preliminary randomization-based introductory statistics curriculum.” Journal of Statistics Education. March 2011.

4-month retention: Tintle NL, et al. “Retention of statistical concepts in a preliminary randomization-based introductory statistics curriculum.” Statistics Education Research Journal. May 2012.

Assessment from fall 2009

Pre-post changes

6 items significantly better (p≤0.001) in fall 2009 (pre-post) as compared to fall 2007 (at Hope) (p-values and design)

1 item significantly worse (standard deviation/histogram)

33 items n/s

4-month retention

Average of 48% loss of knowledge gained, 4 months post-course (traditional curriculum)

Average of 6% loss of knowledge gained, 4 months post-course (randomization curriculum)

Assessment highlights from fall 2009

Joined forces with Beth Chance, George Cobb, Allan Rossman and Soma Roy to produce an introductory statistics textbook

Many major and minor changes to address assessment results, teaching experience, etc. from fall 2009 (as well as experience over last two years of teaching)

Summer 2010

Chapter 0: A few basics (statistical method, desc. stats and probability as long-run frequency) (~1 week)

Unit 1: LOGIC AND SCOPE OF INFERENCE (5-6 weeks)

Chapter 1: Simulation and theory-based inferential approach for single proportion (SIGNIFICANCE)

Chapter 2: Estimation using plausible values, ± 2SD, theory-based approach (ESTIMATION)

Chapter 3: Drawing conclusions from sample to population (GENERALIZABILITY)

Chapter 4: Association and causation (CAUSATION)

Course overview
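The "± 2SD" estimation idea listed for Chapter 2 can be sketched as follows: take the observed proportion and add or subtract two standard deviations of a simulated distribution of sample proportions. This is only a sketch under stated assumptions. The observed proportion (0.60), the sample size (100) and the probability used to drive the simulation (`sim_prob=0.5`) are hypothetical, and the textbook's exact procedure may differ.

```python
import random
from statistics import pstdev


def two_sd_interval(observed_prop, n, sim_prob=0.5, reps=5_000, seed=1):
    """Return (lower, upper, sd) for an observed proportion +/- 2 SD.

    The SD is taken from simulated sample proportions, each based on n
    yes/no outcomes with chance sim_prob of "yes" (an assumed value).
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    simulated_props = [
        sum(rng.random() < sim_prob for _ in range(n)) / n
        for _ in range(reps)
    ]
    sd = pstdev(simulated_props)
    return observed_prop - 2 * sd, observed_prop + 2 * sd, sd


# Hypothetical data: 60% successes observed in a sample of 100.
lower, upper, sd = two_sd_interval(0.60, 100)
print(f"Simulated SD: {sd:.3f}")
print(f"Approximate 2SD interval: ({lower:.3f}, {upper:.3f})")
```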

Unit 2: COMPARING TWO GROUPS (5-6 weeks)

Chapter 5: Comparing two proportions: randomization and theory-based

Chapter 6: Comparing two averages: randomization and theory-based

Chapter 7: Matched pairs and single mean: randomization and theory-based

Course overview
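For readers unfamiliar with the randomization side of Unit 2, a minimal sketch of a randomization test comparing two group averages is given below. It repeatedly reshuffles the pooled responses into two groups of the original sizes and counts how often the reshuffled difference in means is at least as large in absolute value as the observed difference. The function name and both data sets are made up for illustration and are not from the textbook.

```python
import random
from statistics import mean


def randomization_test(group_a, group_b, reps=5_000, seed=1):
    """Two-sided randomization test for a difference in group means.

    Returns the observed difference (mean of A minus mean of B) and the
    proportion of reshuffles with a difference at least as extreme.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # re-randomize responses to the two groups
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            at_least_as_extreme += 1
    return observed, at_least_as_extreme / reps


# Hypothetical responses for two treatment groups.
treatment = [12, 15, 14, 10, 13, 16, 11, 14]
control = [9, 11, 8, 12, 10, 7, 10, 9]
observed_diff, p_value = randomization_test(treatment, control)
print(f"Observed difference in means: {observed_diff}")
print(f"Randomization p-value: {p_value:.4f}")
```

The same shuffling loop works for two proportions if the responses are coded as 0 and 1.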

Unit 3: ANALYZING MORE GENERAL SITUATIONS (3-4 weeks)

Chapter 8: Comparing more than two groups on a categorical response

Chapter 9: Comparing more than two groups with a quantitative response

Chapter 10: Correlation and regression

**Note: Chapters 7-10 can be done in any order**

Course overview

Efficiency with randomization/simulation and theory-based done simultaneously

Logic and Scope of inference (Significance, Estimation, Generalizability and Causation)

Easier validity conditions for theory-based tests

Much more…3-S process for assessing statistical significance, 7-step method, approach on CIs,…

Changes/major decisions

Assessment 2011/2012

Portability: Dordt similar results to Hope, showing improvement over time as we make tweaks

Continues to be good

Exploring alternative assessment tests/questions more tailored to our learning goals (e.g., GOALS, MOST, Garfield et al.)

Chapter 9. Comparing multiple group means on a quantitative response

Exploration 9.1: Exercise and Brain volume in the elderly (data from Mortimer et al. 2012)

Brain size typically declines. Shrinkage may be linked to dementia.

Randomized experiment with 4 groups: (a) Tai Chi (b) Walking (c) Social Interaction and (d) Control

40 weeks: Measure percent change in brain size

An example

Class testing fall 2012 (seven other institutions, plus our own three)

Anticipate Wiley published book within 18 months (or so)

Draft materials prepared now

Contact if interested in learning more: [email protected] or http://math.hope.edu/isi for more details

Where are we now


Debate over

Bootstrapping vs. randomization vs. simulation

How to handle confidence intervals

Order

Etc.

Where is there?

Are we there yet?

Q: How can I convince client departments?

Hard if you don’t do theory-based approaches any longer

If you say “We’re still doing the test that you care about, but we’re giving students a better scaffolding to understand what that test is doing…”

I haven’t heard ANY client department respond negatively to this rationale

Q: Ok, so how about my math colleagues?

Harder. They like the MATH part of statistics, and that’s what we’re arguing to do less of to encourage STATISTICAL thinking.

What is the goal of your course?

Not algebra, not probability, not calculus, not mathematical thinking…

Common questions we’ve heard

Q: Sounds great, sounds like LOTS of work!

Can I do this?

Be willing to get out of the boat!

New things are never completely smooth

I know of no one (yet!) who has done this and wants to go back

Utilize experienced instructor resources

We have some and are working on more

Need for more…


Randomization is doable as an intro course without alienating client departments (theory-based tests)

Shows promise (initial assessment data; feedback, etc. is positive)

More research needed to pinpoint the causes of improved assessment data and accepted “best-practices” (what matters and what doesn’t) for a randomization-approach

Conclusions

Collaborators:

Hope College: Todd Swanson and Jill VanderStoep

Cal Poly: Beth Chance, Allan Rossman and Soma Roy

Mt. Holyoke: George Cobb

Testers: Numerous other faculty and many, many students

Funding: NSF (DUE-1140629)

Acknowledgments

Significantly better

Item description (topic), with % of students correct by cohort

Understanding of the purpose of randomization in an experiment (Data collection and design)

Cohort   Pretest   Posttest   Difference
NT           8.5       12.3          3.8
HT           4.6        9.7          5.1
HR           3.5       20.8         17.3

Understanding that low p-values are desirable in research studies (Tests of significance)

Cohort   Pretest   Posttest   Difference
NT          49.9       68.5         18.6
HT          56.9       85.6         28.7
HR          56.9       96.0         39.1

Understanding that no statistical significance does not guarantee that there is no effect (Tests of significance)

Cohort   Pretest   Posttest   Difference
NT          63.1       64.4          1.3
HT          66.2       72.7          6.5
HR          65.2       85.1         19.9

Ability to recognize a correct interpretation of a p-value (Tests of significance)

Cohort   Pretest   Posttest   Difference
NT          46.8       54.5          7.7
HT          36.1       41.0          4.9
HR          42.3       60.0         17.7

Ability to recognize an incorrect interpretation of a p-value. Specifically, probability that a treatment is not effective. (Tests of significance)

Cohort   Pretest   Posttest   Difference
NT          53.1       58.6          5.5
HT          59.8       68.6          8.8
HR          58.9       79.7         20.8

Understanding of how to simulate data to find the probability of an observed value (Probability)

Cohort   Pretest   Posttest   Difference
NT          20.4       19.5         -0.9
HT          20.0       20.0          0.0
HR          20.0       32.2         12.2

Significantly worse

Ability to correctly estimate and compare standard deviations for different histograms. (Descriptive statistics)

Cohort   Pretest   Posttest   Difference
NT          34.3       51.7         17.4
HT          44.8       70.8         26.0
HR          36.3       48.5         12.2

CAOS scores 4 months post-course (fall 2007 vs. fall 2009 students)

Significantly better retention in fall 2009 sample (p=0.002)

Retention

Average change in percent correct, mean (SD)

Cohort      Posttest minus pretest   4-month retention minus posttest
Fall 2007   10.92 (9.5)              -5.28 (10.1)
Fall 2009   10.04 (12.3)             -0.61 (8.3)
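The 48% and 6% "loss of knowledge gained" figures quoted earlier in the talk appear to be consistent with these means, if the loss is read as the 4-month drop divided by the pre-to-post gain. The short check below does that arithmetic; the function name is hypothetical, and this reading of how the percentages were derived is an assumption, not something the slides state.

```python
def percent_of_gain_lost(gain, change_after_course):
    """Share of the pre-to-post gain that was lost by the 4-month follow-up.

    gain: posttest mean minus pretest mean (percentage points)
    change_after_course: 4-month mean minus posttest mean (negative = loss)
    """
    return -change_after_course / gain * 100


# Means taken from the retention table above.
fall_2007_loss = percent_of_gain_lost(10.92, -5.28)
fall_2009_loss = percent_of_gain_lost(10.04, -0.61)

print(f"Fall 2007: {fall_2007_loss:.1f}% of the gain lost")
print(f"Fall 2009: {fall_2009_loss:.1f}% of the gain lost")
```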

Average score on topics

Topic                        Cohort       Pretest   Posttest   4-month retention   Change
Data collection and design   Randomized     31.58      43.09               41.12    -1.97
Data collection and design   Consensus      41.02      47.44               34.94   -12.50
Tests of Significance        Randomized     51.54      71.27               72.37     1.10
Tests of Significance        Consensus      51.51      67.31               64.31    -2.95

Best retention areas

What do students prefer: simulation/randomization or theory-based?

Depends on how you present it; students will take their cue from you

My preference:

A. Simulation requires computational power; we didn’t have that until recently

B. Theory-based was the historical answer because it predicts what would have happened had you simulated

C. Theory-based approaches use mathematical theory to give a good approximation of the simulation distribution under certain validity conditions
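Point C can be made concrete with a small comparison for a single proportion: simulate the null distribution, then set its spread and tail area beside the normal-theory values. This is a sketch only. The function name, the sample size (100), the observed count (60) and the null probability (0.5) are hypothetical, and the normal approximation is used here without a continuity correction.

```python
import random
from statistics import NormalDist, pstdev


def simulation_vs_theory(observed_count, n, null_prob=0.5, reps=10_000, seed=1):
    """Compare a simulated null distribution with its normal approximation."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    counts = [
        sum(rng.random() < null_prob for _ in range(n))
        for _ in range(reps)
    ]
    # Simulation-based SD and upper-tail proportion.
    sim_sd = pstdev([c / n for c in counts])
    sim_tail = sum(c >= observed_count for c in counts) / reps
    # Theory-based SD of a sample proportion and normal upper-tail area.
    theory_sd = (null_prob * (1 - null_prob) / n) ** 0.5
    theory_tail = 1 - NormalDist(null_prob, theory_sd).cdf(observed_count / n)
    return {
        "sim_sd": sim_sd,
        "theory_sd": theory_sd,
        "sim_tail": sim_tail,
        "theory_tail": theory_tail,
    }


# Hypothetical data: 60 successes in 100 trials, null probability 0.5.
result = simulation_vs_theory(60, 100)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

With a sample this large the two standard deviations should nearly agree, while the tail areas differ somewhat because the simulated counts are discrete.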


Ongoing questions for debate (fall 2012 and beyond)

Confidence intervals approach: does it matter? Plausible values? Bootstrap? Theory only?

Study design and simulation approach: disconnect or connect?

Re-randomize = randomized experiment

Bootstrap = random sample

Are we there yet?

Q: Does this work?

Assessment data

Content objectives

Attitudes

More and more people doing this and saying it works!

At least 15 faculty with our materials, ~10 new faculty this fall

Lock’s, CATALST, NCSU, UCLA, STATCRUNCH…and more!


Q: What makes the difference?

Simulation? Randomization? Pedagogy? Talking about inference for 16 weeks instead of 5?

We don’t know…yet.

Pedagogy and these changes are inextricably linked

On Sunday at the modeling session one of the panelists said “How could you teach this without hands-on activities and technology?”

