Data Analysis and the Shackles of Statistical Tradition Larry Weldon Statistics and Actuarial Science SFU


Page 1

Data Analysis and the Shackles of Statistical Tradition

Larry Weldon

Statistics and Actuarial Science, SFU

Page 2

Why is change needed?

• Computer revolution
  – Calculation revolution (1960+)
  – Communication revolution (1980+)
  – Data storage expansion (2000+)

• Inexpensive statistical software
  – Open source or widely available (e.g. R, Excel, …)

Page 3

Some Authoritative Opinions

“The question … is whether the 21st century statistics discipline should be equated so strongly to the traditional core topics as they are now.”

Jon Kettenring, ASA President, 1997

“A very limited view of statistics is that it is practiced by statisticians. … The wide view has far greater promise of a widespread influence of the intellectual content of the field of data science.”

W.S. Cleveland (1993)

Page 4

To come …

• Examples of anachronisms of traditional parametric inference

• Use of parametric models for simulation

• Limitations of traditional stats theory

• Suggestions for broader toolkit

Page 5

Major Implications?

• Less need for parametric fits & inference

• More use of simulation, resampling and graphics

• More use for communication of results to non-specialists

• Re-examination of traditional approach

Page 6

Ex 1: A time series

Polynomial model?

ARMA model?

Page 7

Ex 1: A time series

Non-parametric smooth, e.g. loess
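
A non-parametric smooth needs no choice between a polynomial and an ARMA form. The sketch below is a minimal local-linear smoother with tricube weights, in the spirit of loess. The function name `local_linear_smooth`, the `span` default, and the synthetic series are illustrative assumptions, not the data or code behind the slide.

```python
import math
import random

def local_linear_smooth(x, y, span=0.3):
    """Loess-style smooth: weighted straight-line fit around each x value.

    span is the fraction of points used in each local fit (assumed default).
    """
    n = len(x)
    k = max(3, int(math.ceil(span * n)))  # neighbours per local fit
    fitted = []
    for x0 in x:
        # k nearest neighbours of x0 and the distance to the farthest one
        nearest = sorted(range(n), key=lambda j: abs(x[j] - x0))[:k]
        h = max(abs(x[j] - x0) for j in nearest) or 1.0
        # tricube weights: 1 at x0, falling to 0 at distance h
        w = [(1 - (abs(x[j] - x0) / h) ** 3) ** 3 for j in nearest]
        sw = sum(w)
        xm = sum(wi * x[j] for wi, j in zip(w, nearest)) / sw
        ym = sum(wi * y[j] for wi, j in zip(w, nearest)) / sw
        sxx = sum(wi * (x[j] - xm) ** 2 for wi, j in zip(w, nearest))
        sxy = sum(wi * (x[j] - xm) * (y[j] - ym) for wi, j in zip(w, nearest))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(ym + slope * (x0 - xm))
    return fitted

# Synthetic series: a slow wave plus noise (made-up data for illustration)
random.seed(1)
t = list(range(100))
series = [math.sin(i / 15) + random.gauss(0, 0.3) for i in t]
smooth = local_linear_smooth(t, series, span=0.3)
```

A larger `span` gives a smoother curve and a smaller one follows the data more closely. That single tuning choice replaces the choice of model family.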

Page 8

Ex 2. Unbiasedness Criterion

• Being exactly right, on average!

• Better to be close, often?

• E.g. estimation of σ²: MMSE estimator?

Page 9

Normal Model

$$\hat{s}^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n + k}$$

Page 10

Expo Model

$$\hat{s}^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n + k}$$
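
The effect of the divisor n + k can be checked by simulation. The sketch below estimates the mean squared error of the variance estimator for several values of k under a Normal and an Exponential model. The sample size n = 10, the replication count, the seed, and the function names are assumptions chosen for illustration.

```python
import random

def var_estimate(sample, k):
    """Sum of squared deviations from the mean, divided by n + k."""
    n = len(sample)
    mean = sum(sample) / n
    return sum((v - mean) ** 2 for v in sample) / (n + k)

def simulated_mse(draw, true_var, k, n=10, reps=10000, seed=0):
    """Monte Carlo estimate of E[(s2_hat - true_var)^2] for divisor n + k."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        total += (var_estimate(sample, k) - true_var) ** 2
    return total / reps

# Each model: (function drawing one observation, true variance)
models = {
    "normal": (lambda rng: rng.gauss(0.0, 1.0), 1.0),
    "exponential": (lambda rng: rng.expovariate(1.0), 1.0),
}
for name, (draw, true_var) in models.items():
    for k in (-1, 0, 1, 5):
        mse = simulated_mse(draw, true_var, k)
        print(f"{name:12s} k={k:+d}  MSE ~ {mse:.3f}")
```

Here k = -1 is the unbiased divisor n - 1. Under the Normal model the MSE is smallest at k = 1 in theory. For a longer-tailed model such as the Exponential, the minimizing k tends to be larger.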

Page 11

MMSE Estimator?

• Does MSE really tell us what we want to know about our estimator of the variance?

• What is the distribution of the signed error of the variance estimate?

Page 12

Typical Error or Whole Dist’n?

• MSE measures typical error.

• Distribution of error is more informative & easy to report.

• Whole distributions often do not need parametric summary! Use Graph.
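
Reporting the whole error distribution, rather than a single MSE figure, takes only a few lines. The sketch below simulates the signed error of the variance estimate under a Normal(0, 1) model and prints a few quantiles and the share of underestimates. The sample size, replication count, seed, and function name are illustrative assumptions.

```python
import random
import statistics

def signed_errors(k, n=10, reps=5000, true_var=1.0, seed=0):
    """Simulate s2_hat - true_var for Normal samples, with divisor n + k."""
    rng = random.Random(seed)
    errors = []
    for _ in range(reps):
        sample = [rng.gauss(0.0, true_var ** 0.5) for _ in range(n)]
        mean = sum(sample) / n
        s2_hat = sum((v - mean) ** 2 for v in sample) / (n + k)
        errors.append(s2_hat - true_var)
    return errors

for k in (-1, 1):
    errs = signed_errors(k)
    # quantiles(..., n=20) returns the 5%, 10%, ..., 95% cut points
    cuts = statistics.quantiles(errs, n=20)
    share_low = sum(e < 0 for e in errs) / len(errs)
    print(f"k={k:+d}: 5%={cuts[0]:.2f} median={cuts[9]:.2f} "
          f"95%={cuts[18]:.2f} share underestimating={share_low:.2f}")
```

A histogram of `errs` would show the same information graphically, which is the form of report the slide argues for.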

Page 13

Ex 3. Does Variance measure Variation?

• E.g. Variance of Yield in Bushels Squared?

Page 14

Analysis of Variance: SST = SSR + SSE

How does it compare with an analysis of SD?

Is R-squared a ratio of useful units?

Is “64% of variance” as useful as “80% of SD”?
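
The arithmetic behind the two percentages can be laid out directly. The sketch assumes a least-squares fit with an intercept, so that SST = SSR + SSE holds. The variable names are illustrative.

```python
import math

r_squared = 0.64                            # "64% of variance" from the slide
sd_ratio_fit = math.sqrt(r_squared)         # SD of fitted values / SD of response
sd_ratio_resid = math.sqrt(1 - r_squared)   # residual SD / SD of response

print(f"fitted SD is {sd_ratio_fit:.0%} of total SD")
print(f"residual SD is {sd_ratio_resid:.0%} of total SD")
print(f"variance shares add to {r_squared + (1 - r_squared):.2f}")
print(f"SD shares add to {sd_ratio_fit + sd_ratio_resid:.2f}")
```

Variances add but SDs do not. With 64% of variance "explained", the residual SD is still 60% of the total SD, so the typical prediction error in original units shrinks by only 40%.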

Page 15

Anova Table

           DF  Sum Sq  Mean Sq  F value    Pr(>F)
block       5  343.29    68.66   4.4467  0.015939 *
N           1  189.28   189.28  12.2587  0.004372 **
P           1    8.40     8.40   0.5441  0.474904
K           1   95.20    95.20   6.1657  0.028795 *
N:P         1   21.28    21.28   1.3783  0.263165
N:K         1   33.14    33.14   2.1460  0.168648
P:K         1    0.48     0.48   0.0312  0.862752
Residuals  12  185.29    15.44

Enough?
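
One way to go beyond the table is to convert it back to the units of the response. The sketch below recomputes the mean squares and F values from the DF and Sum Sq columns and adds the residual SD. The numbers are copied from the table above; the response units are not stated on the slide, and small differences from the printed F values come from rounding.

```python
import math

# (term, DF, Sum Sq) copied from the Anova table on the slide
rows = [
    ("block", 5, 343.29), ("N", 1, 189.28), ("P", 1, 8.40), ("K", 1, 95.20),
    ("N:P", 1, 21.28), ("N:K", 1, 33.14), ("P:K", 1, 0.48),
]
resid_df, resid_ss = 12, 185.29

resid_ms = resid_ss / resid_df   # residual mean square (squared units)
resid_sd = math.sqrt(resid_ms)   # residual SD (units of the response)

for term, df, ss in rows:
    mean_sq = ss / df
    f_value = mean_sq / resid_ms
    print(f"{term:6s} Mean Sq={mean_sq:7.2f}  F={f_value:6.2f}")
print(f"residual SD = {resid_sd:.2f} (original units)")
```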

Page 16

Analysis of Variance?

• Data analysts need to know squared units are weird!

• Arithmetic simplicity does not justify descriptive complexity

Page 17

Ex 4: Are P-values useful?

• Irrelevant except in marginal cases

• Ambiguous in marginal cases

• Fixed error rate: not useful
  – arbitrary for decision making
  – arbitrary for scientific exploration

• A measure of credibility of H0 (needed?)

Page 18

P-value and Power

• Need fixed alpha to compute power?

• How do we decide on sample size if not fixed alpha?

• Anticipate precision relative to the feature of interest
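
Planning by anticipated precision can be sketched without fixing alpha or power. The example assumes the feature of interest is a mean, that a planning guess for the SD is available, and that a normal-theory interval is adequate. The function name and the numbers are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_for_margin(sd_guess, margin, coverage=0.95):
    """Smallest n giving a normal-theory interval half-width <= margin.

    sd_guess is a planning value for the population SD (supplied by the
    analyst); coverage is the nominal coverage of the interval.
    """
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    return math.ceil((z * sd_guess / margin) ** 2)

# Hypothetical planning values: SD about 4 units, mean wanted within 1 unit
print(sample_size_for_margin(sd_guess=4.0, margin=1.0))
```

The margin is chosen relative to the size of effect that matters in context. Halving the margin roughly quadruples the required sample size.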

Page 19

Ex. 5 Role of Simple Parametric Models?

For simulation of complex systems

e.g.
  – Stock market
  – Weather
  – Environmental degradation
  – Aging phenomena (survival)
  – Queues
  – Traffic
  – Etc.

Go to R
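
The talk switches to an R demonstration here, which is not part of this transcript. As a stand-in, the Python sketch below uses simple exponential models as building blocks for simulating one of the listed systems, a single-server queue. The function name, rates, and run length are assumptions for illustration.

```python
import random

def simulate_queue(arrival_rate, service_rate, customers=20000, seed=0):
    """Single-server queue with exponential inter-arrival and service times.

    Returns the average time a customer waits before service starts.
    """
    rng = random.Random(seed)
    arrival = 0.0          # arrival time of the current customer
    server_free_at = 0.0   # time at which the server next becomes idle
    total_wait = 0.0
    for _ in range(customers):
        arrival += rng.expovariate(arrival_rate)
        start = max(arrival, server_free_at)
        total_wait += start - arrival
        server_free_at = start + rng.expovariate(service_rate)
    return total_wait / customers

for load in (0.5, 0.8, 0.9):
    print(f"load {load}: mean wait ~ {simulate_queue(load, 1.0):.2f}")
```

The parametric models here are not fitted to data or tested. They serve as simple components from which the behaviour of a complex system, such as the sharp growth of waiting time as load approaches 1, can be explored.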

Page 20

Common Sense?

• How does it fit with stat culture?

• Stat as the tool of the Inference Police
  – Never assume something is simple
  – Never jump to conclusions
  – Never assume naive thinking will help

• Are students afraid to use their own “common sense”?

• Important Role: Stat as Discovery Tools

Page 21

Enlightened Common Sense?

• Know the dangers

• Use informed judgment

• Do not expect “objective” analysis!

• Information extraction from data is a Subjective process

Page 22

Classical Inference?

• Tests of hypothesis?
• Confidence intervals?
• Parametric inference?

• Difficult to explain to non-statisticians
• Unsuccessful in portraying what statisticians can do
• Maybe we rely too much on these data tools

Page 23

What is more useful?

• Graphs
  – For data analysis
  – For data summary
  – For result communication, especially for non-parametric smoothing

• Simulation
  – Resampling, bootstrapping
  – Building demos of complex phenomena
  – Testing if apparent effects are real
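
Testing whether an apparent effect is real can be done by resampling alone. The sketch below is a two-group permutation test on the difference in means. The function name, the replication count, and the measurements are made up for illustration.

```python
import random

def permutation_p_value(group_a, group_b, reps=5000, seed=0):
    """Share of random relabellings whose absolute mean difference is at
    least as large as the observed one."""
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    hits = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # random reassignment of group labels
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            hits += 1
    return hits / reps

# Made-up measurements for two groups (illustration only)
treated = [12.1, 14.3, 13.8, 15.2, 14.9, 13.5]
control = [11.0, 12.2, 11.8, 12.9, 12.4, 11.5]
print(permutation_p_value(treated, control))
```

A small returned share means the observed difference is hard to reproduce by relabelling alone. The logic can be explained to a non-specialist without reference to a sampling distribution.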

Page 24

Conclusion

Software has drastically expanded
  – What analysts can do
  – How analysts can do it
  – Which analysts can do it
  – The way results are reported

Statisticians have to expand their toolkit and communicate with the masses!

Page 25

Comments?

Thank you for listening.

Page 26

Some Questions

• Do data analysts really learn useful info from parametric inference (often)?

• Are graphs respectable vehicles to demonstrate results (without parametric inference)?

• Are simulation & resampling more useful tools than classical inference?

• What really is “basic stats”?

Page 27

Final Quote

• “All of this leads me to suggest that there is a very realistic possibility that statistics will cease to exist. It may flow out through its primordial roots back into substantive areas where it will be developed, in a piece-meal fashion as in its past, by an army of statistical users rather than statistical scientists. It is incumbent on all of us to resist this process of dissolution, to resist defining our subject out of existence. We can begin by not defining our subject too narrowly.”

Jim Zidek 1986

Page 28

Page 29

Coverage in Popular Intro-Stat Textbooks

Textbook                                     Overview and        Probability   Estimation
                                             Descriptive Stats   & Sampling    & Testing
Dixon and Massey (1957) 2nd Ed                9%                 21%           70%
Freund, J.E. (1960) 2nd Ed                   36%                 29%           35%
Huntsberger (1961)                           28%                 28%           44%

40 years of computers

Moore and McCabe 2nd Ed (1993)               37%                 25%           34%
Freedman, Pisani and Purves 3rd Ed. (1998)   38%                 37%           34%
Wild and Seber 1st Ed. (2000)                25%                 38%           32%

Page 30

Coverage in Popular Intro-Stat Textbooks

Textbook                                     "Smoothing"   "Multivariate     "Official
                                                           Data Display"     Statistics"
Dixon and Massey (1957) 2nd Ed               0             0                 0
Freund, J.E. (1960) 2nd Ed                   0             0                 0
Huntsberger (1961)                           0             0                 0
Moore and McCabe 2nd Ed (1993)               0.4%          0                 0.3%
Freedman, Pisani and Purves 3rd Ed. (1998)   0             0                 2.6%
Wild and Seber 1st Ed. (2000)                1.8%          0                 0