Para-normal Statistics: Analyzing what doesn't add up. Steven Lembark Workhorse Computing [email protected]

Page 1: Paranormal stats

Para-normal Statistics: Analyzing what doesn't add up.

Steven Lembark, Workhorse Computing [email protected]

Page 2: Paranormal stats

Normality

We expect data is normal.

It's what we are trained for.

Chi-Squared, F depend on it.

It's the guts of ANOVA.

Theory guarantees it, sort of.

Page 3: Paranormal stats

What is "normal"?

Normal data is:

Parametric

Real

Symmetric

Unimodal

Page 4: Paranormal stats

Ab-normal data

Not all data is parametric.

Page 5: Paranormal stats

Ab-normal data

Not all data is parametric:

"Bold" + "Tide" / 2 ==

Page 6: Paranormal stats

Ab-normal data

Not all data is parametric: Nominal Data

"Bold" + "Tide" / 2 == ??

"Bald" - "Harry" >= 0 ??

Page 7: Paranormal stats

Ab-normal data

Not all data is parametric: Ordinal Data

"On a scale of 1 to 5 how would you rate..."

Is the average really 3?

Are differences between ranks uniform?

Page 8: Paranormal stats

Ab-normal data

Not all data is parametric: Ordinal Data

"On a scale of 1 to 5 how would you rate..."

Is the average really 3?

For different people?

Page 9: Paranormal stats

Ab-normal data

Not all data is unimodal, symmetric.

Bi-modal data has higher sample variance.

Positive data is skewed.

Page 10: Paranormal stats

Ab-normal data

Counts are usually Binomial or Poisson.

Binomial: Coin flips.

Poisson: Sample success/failure.

Page 11: Paranormal stats

Power of Positive Thinking

Binomial: Count of success from IID experiments.

Mean = np

Variance = npq

Page 12: Paranormal stats

Power of Positive Thinking

Poisson: Count of occurrences in sample size n.

Mean = np

Variance = np
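
The moments quoted for the two distributions can be checked numerically. The sketch below is illustrative only: the helper names are hypothetical, and n = 20, p = 0.1 are assumed values chosen to match a later slide. It sums the probability mass functions directly instead of relying on the closed forms.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(k successes in n IID trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(k occurrences) for a Poisson distribution with rate lam."""
    return exp(-lam) * lam**k / factorial(k)

def moments(pmf_values):
    """Mean and variance of a distribution given as P(k) for k = 0, 1, 2, ..."""
    mean = sum(k * pk for k, pk in enumerate(pmf_values))
    var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf_values))
    return mean, var

n, p = 20, 0.1  # assumed example values
binom = [binomial_pmf(k, n, p) for k in range(n + 1)]
pois = [poisson_pmf(k, n * p) for k in range(60)]  # tail beyond 60 is negligible here

binom_mean, binom_var = moments(binom)  # expect np and npq
pois_mean, pois_var = moments(pois)     # expect np and np
print(binom_mean, binom_var, pois_mean, pois_var)
```

For small p the factor q is close to 1, so the binomial variance npq approaches the Poisson variance np.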

Page 13: Paranormal stats

Power of Positive Thinking

Curves all positive.

Right tailed.

Binomial has highest power if sample data is binomial.

Result: Smaller n for given Beta.

Page 14: Paranormal stats

Kinda normal

Approximations work...

Page 15: Paranormal stats

Kinda normal

Approximations work some of the time.

Rule: npq > 5 for binomial approximation.

Goal: Keep mean > 3σ so normal is all positive.

Q: How good an approximation?

A: It depends...

Page 16: Paranormal stats

The middle way

Binomial:

n=20, p=0.5

Normal:

µ = 10, σ = 2.24

Decent approximation.

Page 17: Paranormal stats

Off to one side

Binomial:

n=20, p=0.3

Normal:

µ = 6, σ = 2.0

Drifting negative.

Page 18: Paranormal stats

Life on the edge

Binomial:

n=20, p=0.1

Normal:

µ = 2, σ = 1.3

Significant negative.

Page 19: Paranormal stats

Neverneverland

Binomial:

n=20, p=0.0013

Normal:

µ = 0.026

σ = 0.16

Heavily negative.
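
One way to put a number on "drifting negative" is to ask how much of the approximating normal curve falls below zero, where a count can never be. This is a minimal sketch using only the Python standard library; the function name is hypothetical, and the four p values are the ones from the slides above with n = 20.

```python
from math import sqrt
from statistics import NormalDist

def normal_approximation(n, p):
    """Return (mu, sigma, mass below zero) for the normal fitted to Binomial(n, p)."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    below_zero = NormalDist(mu, sigma).cdf(0.0)
    return mu, sigma, below_zero

cases = [0.5, 0.3, 0.1, 0.0013]  # the four slides above, all with n = 20
results = {p: normal_approximation(20, p) for p in cases}

for p, (mu, sigma, below_zero) in results.items():
    print(f"p={p:<6} mu={mu:.3f} sigma={sigma:.3f} P(X<0)={below_zero:.4f}")
```

The mass below zero grows steadily as p moves away from 0.5, which is the same drift the plots show.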

Page 20: Paranormal stats

General rule: npq > 5

Small or large p is skewed.

Six-sigma range should be positive.

At that point n > 5 / pq.

For p = 0.0013, n = 3852.

Sample size around 4000?
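
The sample size implied by the npq > 5 rule is simple arithmetic. The helper below is a hypothetical sketch (its name and the `threshold` parameter are not from the talk) that returns the smallest whole n satisfying the strict inequality.

```python
from math import floor

def min_sample_size(p, threshold=5.0):
    """Smallest integer n with n * p * (1 - p) > threshold."""
    q = 1.0 - p
    return floor(threshold / (p * q)) + 1

for p in (0.5, 0.1, 0.0013):
    print(f"p={p}: n >= {min_sample_size(p)}")
```

Rare events push the required n up quickly because pq shrinks roughly in proportion to p.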

Page 21: Paranormal stats

When we assume we make...

Assuming normal data leaves a less robust conclusion.

Stronger, less robust:

Sensitive to individual datasets.

Not reproducible.

Page 22: Paranormal stats

Non-parametric Statistics

Origins in Psychology, Biology, Marketing.

Analyze counts, ranks.

Tests based on discrete distributions.

Page 23: Paranormal stats

Common in Quality

Frequency of failures.

QC with No-Go gauges.

Variations between batch runs.

Customer feedback.

Page 24: Paranormal stats

Example: Safety study

Q: Are departments equally "safe"?

Q: Is a new configuration any "safer"?

Compare sample populations.

Page 25: Paranormal stats

What is "safe"?

Fewer reported injuries?

What is P( injury ) per operation?

Page 26: Paranormal stats

What is "safe"?

Fewer reported injuries?

What is P( injury ) per operation?

0.5?

0.3?

Page 27: Paranormal stats

What is "safe"?

Fewer reported injuries?

What is P( injury ) per operation?

0.5?

0.1?

A whole lot less?

Page 28: Paranormal stats

What is "safe"?

Fewer reported injuries?

What is P( injury ) per operation?

0.5?

0.1?

A whole lot less?

N(0.01, 0.01) is heavily negative.

Page 29: Paranormal stats

Severe?

Parametric measure of injuries?

Page 30: Paranormal stats

Severe?

Parametric ranking of injuries?

( Finger + Thumb ) / 2 == ?

Page 31: Paranormal stats

Severe?

Parametric ranking of injuries?

( Finger + Thumb ) / 2 == ?

( Hand + Eye ) == Arm ?

( Hand + Hand ) == 2 * Hand ?

Page 32: Paranormal stats

Ordinal Data

Ranked data, not scaled.

Page 33: Paranormal stats

Ordinal Data

Ranked data, not scaled.

Hangnail < Finger Tip < Finger < Hand < Arm

"Fuzzy Buckets"

Have p( accident ) from history.

Page 34: Paranormal stats

Kolmogorov-Smirnov

Got tonic?

Page 35: Paranormal stats

Kolmogorov-Smirnov

Nope, not Vodka.

Like F or ANOVA: Populations are "different".

Page 36: Paranormal stats

K-S Test

Compare cumulative data (blue) vs. expected (red).

Measure is largest difference (arrow).

Page 37: Paranormal stats

K-S for safety

Rank the injuries on relative scale.

Compare counts by bucket.

Cumulative distribution:

accommodates empty cells.

minor miscategorization.
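
The K-S measure for bucketed injury counts can be sketched in a few lines. Everything numeric here is made up for illustration: the observed counts and the historical counts are hypothetical, and the bucket names are borrowed from the ordinal ranking slide. The code computes only the statistic (the largest gap between the two cumulative curves), not a significance level, since standard K-S critical values assume continuous data and are only approximate for binned counts.

```python
from itertools import accumulate

# Ordered from least to most severe (ranking taken from the ordinal-data slide).
BUCKETS = ["Hangnail", "Finger Tip", "Finger", "Hand", "Arm"]

def cumulative_proportions(counts):
    """Running share of the total at each bucket, in rank order."""
    total = sum(counts)
    return [running / total for running in accumulate(counts)]

def ks_statistic(observed_counts, expected_counts):
    """Largest absolute gap between the cumulative curves, and where it occurs."""
    obs = cumulative_proportions(observed_counts)
    exp = cumulative_proportions(expected_counts)
    diffs = [abs(o - e) for o, e in zip(obs, exp)]
    d = max(diffs)
    return d, diffs.index(d)

observed = [12, 9, 0, 3, 1]    # hypothetical department counts, one empty cell
expected = [50, 25, 15, 8, 2]  # hypothetical counts from history

d, where = ks_statistic(observed, expected)
print(f"D = {d:.3f} at bucket {BUCKETS[where]!r}")
```

Because the comparison runs on cumulative shares, the empty "Finger" cell needs no special handling, and moving a case into a neighboring bucket shifts the curve at only one point.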

Page 38: Paranormal stats

A good datum is hard to find, You always get the other kind.

Apologies to Bessie Smith

Sliding-scale questions:

"How would you rate..."

"How well did..."

"How likely are you to..."

Page 39: Paranormal stats

A good datum is hard to find, You always get the other kind.

Apologies to Bessie Smith

Reproducibility:

Variable skill.

Variable methods.

Variable data handling.

Page 40: Paranormal stats

A good datum is hard to find, You always get the other kind.

Apologies to Bessie Smith

Big Data:

Multiple sources.

Multiple populations.

Multiple data standards.

Page 41: Paranormal stats

Repeatable Analysis

Variety of NP tests for "messy" data.

Handle protocol, sampling variations.

Robust conclusions with real data.

Page 42: Paranormal stats

Summary

Non-parametric data: counts, nominal, ordinal data.

Non-parametric analysis avoids NID assumptions.

Robust analysis of real data.

Even the para-normal.

Page 43: Paranormal stats

Questions?

Page 44: Paranormal stats

References: N-P

http://www.uta.edu/faculty/sawasthi/Statistics/stnonpar.html

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC153434/

Nice writeups.

Page 45: Paranormal stats

References: K-S

http://itl.nist.gov/div898/handbook/eda/section3/eda35g.htm

Exploratory data analysis is worth exploring.

https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test

As always... really good writeup of the test definition, math.

Page 46: Paranormal stats

References: Robust Analysis

https://en.wikipedia.org/wiki/Robust_statistics

https://en.wikipedia.org/wiki/Robust_regression

Decent introductions. Also look up "robust statistics" at nist.gov or "robust statistical analysis" at duckduckgo.

Page 47: Paranormal stats

References: This talk

http://slideshare.net/lembark

Along with everything else I've done...