Statistics Overview ©2010 Dr. B. C. Paul

Why Are Statistics Important to Engineers?

Engineers build models (often mathematical models) of systems and things where we cannot afford to screw up and learn the hard way.

Modeling

We build a mathematical model of the situation and then do the math to see if it is going to work for us in the real world.

We may not think of it, but most of our engineering design equations are mathematical models that were fit to actual data long ago:
– Newtonian physics (we call them laws now)
– Darcy’s law and the Bernoulli Equation

How do You Decide if a Mathematical Model Fits What You See?

Because you usually can’t measure with 100% accuracy, and you don’t think of (or can’t consider) every minor effect
– Real results tend to be distributed around our potential mathematical models

Statistical models consider a distribution of answers around an underlying trend
– You can know the shape and spread of the variation without knowing its cause

Example

If I have a random number generator that produces numbers between 1 and 100, what value is most likely?

If I take 25 of those random numbers, what will their average most likely be close to?

What Did You Assume to Get Those Answers?

You assumed how those values were distributed
– You considered what is called a uniform distribution (all numbers are equally likely to come up)
– Statistics begins with a series of standard mathematical distributions

We try to pick the one that most nearly matches our reality
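A short simulation makes this reasoning concrete. This is a sketch in Python. It assumes integer draws from 1 to 100 inclusive, made with Python’s `random` module (the slides do not name a generator). Every single value is equally likely, but averages of 25 draws bunch up near the center of the range, 50.5.

```python
import random
import statistics

def average_of_draws(n_draws, rng):
    """Average of n_draws integers drawn uniformly from 1..100 inclusive."""
    return statistics.mean(rng.randint(1, 100) for _ in range(n_draws))

rng = random.Random(2010)  # fixed seed so the run is repeatable
averages = [average_of_draws(25, rng) for _ in range(10_000)]

# The averages pile up near the middle of the range (50.5),
# even though every individual value from 1 to 100 is equally likely.
print(round(statistics.mean(averages), 1))
# Spread of the averages: in theory about 28.9 / sqrt(25), roughly 5.8
print(round(statistics.stdev(averages), 1))
```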

Getting Your Answers

You also assumed that the numbers were taken from that distribution at random
– i.e., no one is cherry-picking any values preferentially to any others
– This is one of the reasons statisticians get so crazy if they think someone is cherry-picking the sample

The root of all statistics is the assumption that reality follows a standard mathematical distribution and that the part we see was picked at random from that distribution

How Do We Come Up With What Distribution Closely Resembles Our Reality?

The process starts with figuring out which of our standard model distributions it is

Three levels of effort
Level 1: Say “I Believe” and assume one
– Most commonly done with the “Normal Distribution” (the “Bell Curve”)
– Many things tend to be normally distributed
– Strength of past experience becomes the rationale
– There are also people who do it without having any idea what they have done
– Standard statistics is built around the normal distribution

Levels of Effort

Level 2: Study the distribution to see if we are doing something terrible
– A common approach is called a “Histogram”: a bar graph of how many data values fall into each range, so we can look at the shape
– There are also things like probability paper, where you plot your data and see if you get a straight line
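A histogram is just a count of how many values land in each bin. Below is a minimal text version in Python. It assumes equal-width bins and made-up, normally distributed data (the slides don’t use a specific data set here).

```python
import random

def text_histogram(data, n_bins=10):
    """Count values into equal-width bins and draw each count as a row of '#'."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in data:
        # The maximum value would land in bin n_bins, so clamp it into the last bin.
        i = min(int((x - lo) / width), n_bins - 1)
        counts[i] += 1
    for i, c in enumerate(counts):
        left_edge = lo + i * width
        print(f"{left_edge:8.2f} | {'#' * (c // 10)}")  # one '#' per 10 values
    return counts

rng = random.Random(1)
data = [rng.gauss(10.0, 2.0) for _ in range(1000)]  # made-up illustration data
counts = text_histogram(data)
```

With real data, you would look for a single hump and rough symmetry before deciding to assume a normal distribution.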

Effort Level 3

Use statistical techniques to test whether our sample data is like a set that could reasonably be pulled from some standard distribution
– Often done with goodness-of-fit tests

Each of the three levels of effort is customary in some fields of practice

Measuring Properties of Distributions

Put sample data into a standard equation that generates a number
– We often actually call that number a statistic
– It measures some property of the distribution that the data was taken from

Some statistics have obvious tangible meaning
– Example: the Mean, the mathematical average value of the sample or population

Calculating a Mean (or simple average)

Add up all the numbers and then divide by however many numbers you added

Example
– Numbers: 5, 10, 15, 20, 25
– What is the Mean?

Calculate
– (5 + 10 + 15 + 20 + 25)/5
– Numerator totals to 75
– Denominator is the number of values I put in (5)
– Divide the total by the number of values put in
– Answer is 15 (the Mean, or Average Value)

Statisticians Need Confusing Ways to Write Equations

$X_i$ means a sample value
– The i subscript tells you whether it was the first, second, third, etc. sample
– From the example on the last slide, we know $X_2$ was the second number we looked at, which was 10

Σ means the sum of a series of values

n means the number of samples considered

Thus we write the formula for the mean as

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

– We of course also have a special symbol for a mean: $\bar{X}$ (“X-bar”)
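The mean formula maps directly onto a few lines of Python. This minimal sketch uses the slide’s example numbers. Note that Python lists count from 0, so the slide’s $X_2$ is `samples[1]`.

```python
# Sample values from the slide example (X_1 .. X_5)
samples = [5, 10, 15, 20, 25]

n = len(samples)       # n: number of samples
total = sum(samples)   # sum of X_i for i = 1..n
x_bar = total / n      # the mean, X-bar

print(samples[1])  # X_2 in 1-based notation -> 10
print(total)       # 75
print(x_bar)       # 15.0
```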

More Measurements

Mode
– The value that has the greatest chance of coming up

Example
– If I have 10 people who are 5’10”, 2 people who are 4’3”, and 2 people who are 6’10”
– If you pick a person at random from my group, what height will that person most likely be?
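Python’s standard `statistics` module can answer the height question directly. This sketch assumes the heights are converted to inches (5’10” = 70, 4’3” = 51, 6’10” = 82) so they can be handled as plain numbers.

```python
import statistics

# Heights converted to inches: 5'10" = 70, 4'3" = 51, 6'10" = 82
heights = [70] * 10 + [51] * 2 + [82] * 2

print(statistics.mode(heights))    # 70, i.e. 5'10", the most common height
print(statistics.median(heights))  # middle of the sorted list
print(statistics.mean(heights))    # 966 / 14
```

The mode and median both come out at 70 inches (5’10”). The mean, 966/14 = 69 inches, is pulled down slightly because the two short people sit farther below 5’10” than the two tall people sit above it.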

More Measures

Median
– Half of the values are higher, half are lower

Mean, Median, and Mode all seem to have fairly obvious physical meanings

Other statistics are less obvious
– Variance: a number that comes out of a formula and tells you how spread out the distribution is
– The square root of the variance is the Standard Deviation: roughly, the typical difference between a sample and the mean value

The Standard Deviation

The Standard Deviation measures the typical difference between individual samples and the mean. Strictly, it is a root-mean-square difference, not a simple average.

$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$

What does it mean? Take each sample number, subtract the average sample value from it, and square the result. Do this for every number and add up the results. Then divide the total by one less than the number of samples you took, and take the square root of that value.
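Here is the recipe above written as a two-pass Python function: one pass for the average, a second for the squared deviations. This sketch applies it to the 5, 10, 15, 20, 25 example from the mean slide.

```python
import math

def sample_std_dev(values):
    """Sample standard deviation: sqrt of sum of squared deviations over (n - 1)."""
    n = len(values)
    mean = sum(values) / n                            # first pass: the average
    squared_devs = [(x - mean) ** 2 for x in values]  # second pass: deviations
    return math.sqrt(sum(squared_devs) / (n - 1))

s = sample_std_dev([5, 10, 15, 20, 25])
print(round(s, 4))  # sqrt(250 / 4) = sqrt(62.5), about 7.9057
```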

As a Practical Matter That’s a Pain

I have to compute the average before I can do the math for the standard deviation

Alternative Formula

$$s = \sqrt{\frac{n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}{n(n-1)}}$$

This tells you to keep track of two numbers:
1- Take each number, square it, and then add up the squares
2- Take each number, add them all up, and then square the total

Getting Standard Deviation

Statistical calculators have multiple memories
– They add up the numbers in one memory
– They square the numbers and add them up in another
– They count the entries in another
– They then apply the standard deviation formula

Of course, you can also use SPSS
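The calculator’s three memories can be mimicked with three running totals, so the data never has to be stored or averaged first. This is a minimal Python sketch that applies the shortcut formula using n, ΣX, and ΣX². The `RunningStats` class is a hypothetical name chosen for illustration.

```python
import math

class RunningStats:
    """Mimics a statistical calculator's memories: count, sum, and sum of squares."""

    def __init__(self):
        self.n = 0          # memory 1: number of entries
        self.sum_x = 0.0    # memory 2: running total of the values
        self.sum_x2 = 0.0   # memory 3: running total of the squared values

    def add(self, x):
        self.n += 1
        self.sum_x += x
        self.sum_x2 += x * x

    def std_dev(self):
        # s = sqrt((n * sum(x^2) - (sum x)^2) / (n * (n - 1)))
        n = self.n
        return math.sqrt((n * self.sum_x2 - self.sum_x ** 2) / (n * (n - 1)))

stats = RunningStats()
for x in [5, 10, 15, 20, 25]:
    stats.add(x)

print(stats.n, stats.sum_x, stats.sum_x2)  # 5 75.0 1375.0
print(round(stats.std_dev(), 4))           # 7.9057, same as the two-pass formula
```

A caution on this design: when the values are large compared with their spread, the shortcut subtracts two big, nearly equal numbers. In floating point that can lose precision, which the two-pass formula avoids.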

Types of Distributions

The idea is that we try to approximate reality with a mathematically defined distribution
– Then we can use mathematical operations to predict our answers

Distributions that often fit reality
– Normal Distribution (developed in 1733), the “Bell Curve”
– Uniform Distribution
– Binomial Distribution
– T Distribution
– Chi-Square Distribution
– Lognormal Distribution

Derived Distributions

The T, Chi-Square, and Lognormal Distributions are all derived from the Normal Distribution for specific types of situations

Normal Distribution

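Shaped like a bell (the “Bell Curve”)

Formula

$$Y = f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

where μ is the mean and σ is the standard deviation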

Symmetric Distributions with a Central Tendency

The Normal Distribution is the classic example
– Most of the chances are right near the center of the distribution
– Frequency drops off to the sides
– The Mode is at the center of the distribution
– The distribution is a mirror image about its center, which allows us to compute just one side
– Median = Mean = Mode

A lot of reality has a central tendency with relatively symmetric sides
– The T distribution is like that too, but its sides slope off a little differently

Why the Normal Distribution

One of the first mathematically defined distributions that was a really good fit
– People developed other formulas and distributions from calculations done on the normal distribution
– The T distribution and the Chi-Square Distribution both result from performing mathematical operations on samples from a normal distribution
– The Normal Distribution was first to press with a distribution that was heavy at the center and symmetric

Reality 101 for Statistical Distributions

There is probably no such thing as a true normal distribution in real life

Even if there were, we almost never count each and every member of the population, so we’d never know if it was

Statistical distributions let us take limited data and see what it approximately is
– Then we use the defined mathematical model to suddenly know everything about it

Back to Why the Normal Distribution

A big part of the real world has a central tendency and is symmetric

We found that calculations done with a normal distribution were robust
– A minor lack of fit in real-world data doesn’t change the answers much
– Thus it works on almost anything with a central tendency that is nearly symmetric

Most Common Lack of Fit

Not Symmetric

Robustness covers a little skewness

This type of shape can be fit with a distribution adapted from the normal, called the lognormal

If you take averages of about 25 samples from this, the averages will be close to normal (averaging normalizes)
Taking logarithms of the data will make the transformed distribution normal
Taking square roots will normalize a few others
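The log-transform trick can be checked numerically by measuring skewness before and after the transform. This Python sketch assumes lognormal test data with parameters 0 and 0.5, chosen only for illustration. It uses the simple moment-based skewness: the average cubed deviation divided by the standard deviation cubed.

```python
import math
import random

def skewness(values):
    """Moment-based skewness: average cubed deviation / (std dev cubed)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

rng = random.Random(42)
# Right-skewed data: lognormal with mu = 0, sigma = 0.5 (illustration only)
raw = [rng.lognormvariate(0.0, 0.5) for _ in range(20_000)]
logged = [math.log(x) for x in raw]

print(round(skewness(raw), 2))     # clearly positive: a long right tail
print(round(skewness(logged), 2))  # near 0: the log made it symmetric again
```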

Multi-Modal Distributions

These types of distributions are often 3 different normally distributed families overlying each other

Finding what is causing the three families often helps us to better understand our world

Uniform Distribution

All values within some range (which may or may not be plus or minus infinity) are equally likely

The distribution has no central tendency
– It tends to be associated with truly random events (or at least events where the underlying cause is eluding our mathematical modeling)

Characteristics of Uniform Distribution

Because all values are equally likely, it has no single mode

The Mean is at the center of the range

The uniform distribution is still symmetric about the Mean, so the Median and Mean are the same

The Standard Deviation is the range divided by √12, about 0.29 times the range (if the range is infinite, obviously that’s not defined)

The Variance is the Standard Deviation squared (range²/12)
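A quick simulation checks the uniform standard deviation against its exact value, range/√12. This Python sketch assumes an arbitrary range of 0 to 10.

```python
import math
import random
import statistics

rng = random.Random(7)
low, high = 0.0, 10.0  # arbitrary range chosen for illustration
draws = [rng.uniform(low, high) for _ in range(100_000)]

span = high - low
print(round(statistics.mean(draws), 2))    # near the center of the range, 5
print(round(statistics.median(draws), 2))  # about the same as the mean (symmetric)
print(round(statistics.stdev(draws), 3))   # simulated standard deviation
print(round(span / math.sqrt(12), 3))      # 2.887: range / sqrt(12)
```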

Binomial Distribution

Outcomes that are either off or on
– Clearly describes computers and digital data

Many things either work or they don’t
– In mining, whether our trucks are in working order
– In a water treatment plant, whether the water purification train is working or not working
– Coin tosses are heads or tails

New Problem

We can’t talk about means, modes, and medians the usual way because the outcome has no continuous distribution

We want to know what fraction of the outcomes are “yes”
– P = 0.85 means 85% of the members of a binomial population are positive

We are usually interested in the chance that we can take 5 members out of the population and have them all be positive
– Example: if I have 5 mining trucks, how much of the time will all 5 be running?
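The truck question is a binomial calculation. This Python sketch assumes each truck is running with probability 0.85, independently of the others. The independence is an assumption; the slide does not state it.

```python
from math import comb

p = 0.85  # chance any one truck is running (from the slide)
n = 5     # trucks in the fleet

def prob_k_running(k, n, p):
    """Binomial probability that exactly k of n independent units are up."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

for k in range(n + 1):
    print(k, round(prob_k_running(k, n, p), 4))

p_all_five = prob_k_running(n, n, p)
print(round(p_all_five, 4))  # 0.85 ** 5, about 0.4437
```

Under these assumptions, all 5 trucks are running only about 44% of the time.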

The Ordinate Problem

How continuously distributed are our outcomes?
– Our number line is continuous, so at first glance we almost assumed everything was continuous
– What if they are not?

This usually doesn’t take a very smart statistician to figure out
– Some things are yes-or-no distributed: use the binomial distribution model. Duh!

Some Things are Integer Distributed

Continuity really is a function of observational scale
– According to quantum physics, everything is made of integer numbers of discrete quanta
– At our observation scale the little integer jumps are perhaps so small we cannot even measure them
– Many times integer continuity is negligible

What If Integer Continuity is Not Negligible?

This happens when we have small numbers or integer-distributed data
– How does one deal with teacher rankings in classes of 5 students?
– Our scale of observation is integer
– Our sample size is small enough that we can’t mask it
– If it were a class of 500 students, we could probably model the outcomes rather well as if they were continuous

For these cases: Non-Parametric Statistical Models

Using Statistics

Confidence Intervals and Hypothesis Tests
– What would you say if we did a coin toss, it came up heads, and I won?
– What if I did it to you 50 times in a row?

If something differs too much from the expected value, you question the things you assumed
– The null hypothesis is that nothing is going on
– Rejecting the null hypothesis means you question the fundamental assumptions
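The coin example is a one-line probability calculation. This Python sketch assumes a fair coin and independent tosses.

```python
def prob_all_heads(n, p_heads=0.5):
    """Chance that every one of n independent tosses comes up heads."""
    return p_heads ** n

print(prob_all_heads(1))   # 0.5: one win is nothing unusual
print(prob_all_heads(50))  # about 8.9e-16: time to question the "fair coin" assumption
```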

Statistical Tests

What is a confidence interval? If I take a sample, where is it most likely to come from?

Suppose I pull a sample and its value is from way out here in the tail?

What do I know? That was pretty unlikely to happen. In fact, at some point I’m going to wonder whether I really got it from that population.

Confidence interval problems all have the flavor of deciding how far out in the tails (how rare) the sample is, or would be if you could get it.

Too Many Normal Distributions

A normal distribution is defined by its mean and standard deviation
– There are endless possibilities

We start by standardizing our results to a standard normal distribution with a mean of 0 and a stdev of 1
– It has the form

$$Z = \frac{X - \mu}{\sigma}$$
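Here is the standardizing formula as a one-line Python function. The numbers in the usage lines are hypothetical, chosen only to show how values map onto the standard normal scale.

```python
def z_score(x, mu, sigma):
    """Standardize x from a normal distribution with mean mu and std dev sigma."""
    return (x - mu) / sigma

# Hypothetical distribution with mean 10 and standard deviation 2
print(z_score(12.0, 10.0, 2.0))  # 1.0: one standard deviation above the mean
print(z_score(10.0, 10.0, 2.0))  # 0.0: the mean maps to the center
print(z_score(8.0, 10.0, 2.0))   # -1.0: one standard deviation below
```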

Just Any Normal Distribution

Our value X sits on just any normal distribution. Our formula converts that point to an equivalent point on the standard normal distribution (mean 0, Stdev = 1).

Once We Are on a Standard Normal Distribution, We Look at How Extreme a Value We Have

What % of the values are more extreme than this?

Preparing for Rainfall

Wendy Wetone has just designed a storm sewer system for a new housing project
– Culverts and intakes will handle a 2.5 inch rainfall in 24 hours
– The average big rainfall event in the area is only 1.25 inches

Wendy is OK, right? If the roads and homes in an area are going to wash out, maybe being ready for an average rain isn’t good enough

Reality for Major Rainfall Events

The average is 1.25 inches, but suppose there is a 1 inch standard deviation (μ = 1.25, σ = 1)

How would we know something like this? We built a model from weather records.

Is there enough of a chance up here, above 2.5 inches, that I should be getting heartburn over this design?

We Know How to Solve This One

The Normal Distribution is fully defined by a formula

We only need to know the average μ (in this case 1.25) and the variance σ² (the standard deviation squared, easy when the standard deviation is one)

$$Y = f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

What That Formula Does

Y is a probability density (the relative chance of occurrence)

X in this case is a rainfall event

Rather obviously, we are interested in rainfall events greater than 2.5 inches
– Guess that means x is 2.5

Problem: the formula gives the chance at only a single value, i.e., the density for a rain event of exactly 2.5 inches
– We are in fact worried about any event that exceeds our design capacity

That’s not a Problem for Us Smart Engineers

Just integrate the function from 2.5 inches on up
– In fact, most statistical modeling is done on cumulative probability distributions (i.e., integrated areas under the probability density function)

Just one little problem
– The normal probability density function is one of those beasts that math teachers don’t like to talk about: its integral has no analytical (closed-form) solution

That’s Only a Problem for Mathematicians

We have numeric integration

OK, maybe that is a problem if we have to integrate that thing ourselves
– Remember, desktop computers are of recent vintage
– Do you have a numeric integration package on your computer even now?

The Normal Distribution dates from 1733, so you know someone created tables of numeric integration
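What the old tables did can be sketched with a simple numeric integration. This Python sketch uses Simpson’s rule (our choice; the slides do not name a method) on the rainfall model (μ = 1.25, σ = 1). It stands in for plus infinity with μ + 10σ, beyond which the remaining area is negligible.

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal density: exp(-(x - mu)^2 / (2 sigma^2)) / sqrt(2 pi sigma^2)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule for the integral of f on [a, b]; n must be even."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

mu, sigma = 1.25, 1.0
# Plus infinity is replaced by mu + 10 sigma; the tail beyond that is negligible.
area_above = simpson(lambda x: normal_pdf(x, mu, sigma), 2.5, mu + 10 * sigma)
print(round(area_above, 4))  # about 0.1056: chance a big storm tops 2.5 inches
```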

Normal Distribution Table
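The table image is not reproduced here, but a table of the same kind (cumulative area from minus infinity up to each Z) can be regenerated. This sketch assumes Python 3.8+ for `statistics.NormalDist` and uses Z steps of 0.25 for brevity.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Cumulative area from minus infinity up to z: the quantity a printed Z table lists
z_values = [round(0.25 * i, 2) for i in range(-12, 13)]  # -3.00 to 3.00 in 0.25 steps
table = {z: round(standard_normal.cdf(z), 4) for z in z_values}

for z in z_values:
    print(f"{z:5.2f}  {table[z]:.4f}")
```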

Converting to a Value on Standard Normal Distribution

What we want to know is the chance of a rainfall event exceeding our drainage system design
– i.e., what percentage of big rainstorms will exceed 2.5 inches (on a distribution with a mean of 1.25 and a standard deviation of 1)

Convert 2.5 inches to an equivalent value on the standard normal distribution
– The area above that value under the curve will be the same as for our actual distribution

Magic Conversion Formula

$$Z = \frac{x - \mu}{\sigma} = \frac{2.5 - 1.25}{1} = 1.25$$

Now It’s Look-Up Time

Prob = 0.8944

Results

The table shows 0.8944 of the area lies from minus infinity to Z = 1.25
– i.e., 0.1056 is above 1.25

English translation
– There is a 10.56% chance that a large rainfall event will exceed the design capacity of our drainage system
– Sounds like Wendy might be doing some design work over

Basis for Rainfall Events

A 10% chance (0.10) is called a 10 year storm (based on the distribution of each year’s largest storm)
A 5% chance (0.05) is called a 20 year storm
A 1% chance (0.01) is called a 100 year storm

When we say something is designed for a 100 year flood, it doesn’t mean the flood only happens every 100 years
– It means a 1% chance in any given year
– The problem with the other thinking is that if you had a big flood 5 years ago, it would mean there is no chance it will ever happen again in your lifetime (Wrong!)
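The “1% chance in any given year” idea extends to longer horizons. This Python sketch assumes each year is independent and uses a hypothetical 30-year horizon.

```python
def chance_of_at_least_one(annual_prob, years):
    """Chance of one or more exceedances in `years` independent years."""
    return 1 - (1 - annual_prob) ** years

for label, p in [("10 year", 0.10), ("20 year", 0.05), ("100 year", 0.01)]:
    print(label, round(chance_of_at_least_one(p, 30), 3))

# Next year's chance is still the annual probability, whatever happened 5 years ago
print(chance_of_at_least_one(0.01, 1))
```

For the 100 year storm this gives about a 26% chance over 30 years. The chance for any single year stays at 1%.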

Ore Grade Control

Orville Orman is planning a truck fleet to haul his copper ore out of his mine
– Some rock will have so little copper in it that it would cost more to process than it’s worth
– This stuff is going to get put aside
– Other pay rock will be carried to the processing plant

We commonly have ore and waste truck fleets, but we need to know how much of each type of rock we will have in order to design them

Orville’s Ore

Orville knows the average grade is 0.95% Cu

The standard deviation is, say, 0.5% Cu

The cut-off grade (the point at which ore costs more to process than its Cu will sell for) is 0.25% Cu

What percentage of Orville’s ore is below cut-off grade?

The Situation

μ = 0.95, σ = 0.5, cut-off = 0.25

How much of my rock is down here, below 0.25?

Oh We Are Hot

Our critical x value is 0.25

We will convert this to a “Z score” on the standard normal distribution

We will then look up in the table how much of our distribution lies from minus infinity to our Z

We will then tell our truck planners how much rock to prepare for

Crunch Away

$$Z = \frac{0.25 - 0.95}{0.5} = -1.4$$

Go to the table

The table says 0.0808

About 8.1% of our rock is below cut-off
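The same look-up can be done in code. This sketch assumes Python 3.8+ and uses `statistics.NormalDist` in place of the printed table. It takes the slide’s values and, like the slide, assumes grade is normally distributed.

```python
from statistics import NormalDist

grade = NormalDist(mu=0.95, sigma=0.5)  # % Cu, values from the slide
cutoff = 0.25                           # % Cu cut-off grade

z = (cutoff - grade.mean) / grade.stdev
fraction_below = grade.cdf(cutoff)      # area from minus infinity to the cut-off

print(round(z, 2))               # -1.4
print(round(fraction_below, 4))  # 0.0808, about 8.1% of the rock
```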

Previous Examples

These are called one-tailed tests
– Our civil engineers were concerned about events larger than some amount: an upper tail test
– Our mining engineers were concerned about tonnage below cut-off: a lower tail test

What if we are interested in either too much or too little?
– Typical of a machine tolerance problem

Tolerance

Benjamin Bidwell would like to bid on a DOD order for machined shafts
– The spec says 1 inch +/- 0.005 inches
– Benjamin knows his men and equipment can make parts to any chosen size with a standard deviation of 0.0025 inches
– He figures he can put in a winning bid provided no more than 3% of the pieces he makes have to be rejected

Can Benjamin put in a winning bid on this order?

The Situation

μ = 1, σ = 0.0025, spec limits 0.995 and 1.005

How many products are out here in the tails?

We Know What to Do

Convert those limits to Z scores

Start with the top limit

$$Z = \frac{1.005 - 1}{0.0025} = 2$$

The table look-up says 0.9773, so 2.27% will be too large

Now we use our knowledge: this distribution and tolerance are symmetric, i.e., another 2.27% on the bottom end

That sucks: about 4.54% of products will be out of spec, more than Benjamin’s 3% limit
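Here is a two-tailed version of the calculation, adding both tails. This sketch uses `statistics.NormalDist` (Python 3.8+) and the slide’s numbers. The exact tail area is about 0.02275 per side, so a printed table may round it to 2.27% or 2.28%.

```python
from statistics import NormalDist

shaft = NormalDist(mu=1.0, sigma=0.0025)  # inches, from the slide
lower, upper = 0.995, 1.005               # spec: 1 inch +/- 0.005 inches

too_small = shaft.cdf(lower)              # lower tail
too_large = 1 - shaft.cdf(upper)          # upper tail
reject_rate = too_small + too_large       # both tails together

print(round(too_large, 5))    # roughly 0.023 in each tail
print(round(reject_rate, 4))  # roughly 0.0455 in total
print(reject_rate <= 0.03)    # False: over Benjamin's 3% limit
```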

The Hypothesis Test

Hubbert’s Hammers makes clobber balls for use in a doll recycling plant. Hardness is important in determining the longevity of the hammer balls. Herby has been getting some customer complaints about his balls not holding up, so he pulls a few off the assembly line for testing. He gets values of 3, 3.6, 4.2, 4.1, 2.7, 4.7, and 4.3. The balls are supposed to have a hardness of 4.5. Does Herby have a problem?

Herby Runs to SPSS

He enters his sample data and gets an average of 3.8 and a Stdev of 0.73.
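Herby’s SPSS numbers can be checked with Python’s `statistics` module. Its `stdev` uses the n − 1 denominator from the standard deviation formula.

```python
import statistics

hardness = [3, 3.6, 4.2, 4.1, 2.7, 4.7, 4.3]  # Herby's seven test balls

mean = statistics.mean(hardness)
s = statistics.stdev(hardness)  # sample standard deviation (n - 1 denominator)

print(round(mean, 2))  # 3.8
print(round(s, 2))     # 0.73
```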

Interpreting the Data

Everyone knows not every ball will be 4.5 hardness, but on average they need to be.

Herby knows that if he ran to his assembly line and grabbed another 7 balls at random, he would get a different average.

Herby’s World

Herby knows that 95% of the time the average of a sample of 7 grab balls will be within 1.96 standard deviation units of the true mean μ

$$\bar{X} \pm 1.96\,\sigma$$

(He’s spent too much time looking at normal distribution tables)

Herby Formulates a “Hypothesis Test”

Herby thinks the endurance of his balls has gone down.

The “null hypothesis” is that this one sad-looking sample is not enough to conclude that the mean ball hardness on the assembly line has changed
– If the sample falls within 1.96 standard deviation units of the target mean of 4.5, Herby has no grounds (at 95% confidence) to say his assembly line is out of tolerance
– If not, Herby will reject the “null hypothesis” and conclude that his assembly line is screwed

Oh gosh, get out the crosses and garlic: we’re starting to sound like statisticians.

The “Alpha Level”

In reality Herby could grab 7 balls from a perfectly normal assembly line and get any value
– Yet Herby is going to declare a disaster if he does not come out within 1.96 standard deviation units of his target value

Because in the real world a sample could come from anywhere, one of the decisions we have to make is how willing we are to be wrong
– This is called setting our Alpha Level: how great is the chance that we will reject the null hypothesis when we shouldn’t have? (1.96 standard deviation units corresponds to a 5% alpha level)

OK – Let’s Get On with Herby’s Test

Plug into the equation

$$3.8 \pm 1.96 \times \sigma$$

Holy Marshmallows! What do we use for the standard deviation?
– Our standard deviation was the standard deviation for individual samples, not averages

What’s the Big Deal About Individual Samples and Averages?

In a large general ed class, what kind of range do you get on people’s test scores?

Ever noticed that certain professors’ test averages tend to come out at about the same value year after year?

Point: in a random sample, the standard deviation of an average will always be less than the standard deviation of individual values.

OK- I Believe – Now Get Me the Dogone Standard Deviation

For a random sample the standard deviation of the mean is

$$\sigma_{\text{Mean}} = \frac{\sigma_{\text{samples}}}{\sqrt{n}}$$

where n = # samples used in the mean

If you think I’m going to try showing you the proof, you’re out of your mind.
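The formula as a tiny Python function (the name `sd_of_mean` is ours). Because n sits under a square root, each fourfold increase in sample size only halves the standard deviation of the mean.

```python
import math

def sd_of_mean(sd_samples: float, n: int) -> float:
    """Standard deviation of the mean of n random samples."""
    return sd_samples / math.sqrt(n)

# Each fourfold increase in n cuts the spread of the mean in half
for n in (1, 4, 16, 64):
    print(n, round(sd_of_mean(0.73, n), 4))
```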


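OK – Let’s Roll

Our standard deviation of the mean is

$$\sigma_{\text{Mean}} = \frac{0.73}{\sqrt{7}} = 0.276$$

Plug into the magic equation

$$3.8 + 1.96 \times 0.276 = 4.34$$

Even the top of the 95% range (4.34) falls short of the 4.5 target hardness.

Oh Crud – The Assembly Line is Turning Out Weak Balls!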
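The same plug-and-chug as a Python sketch, using the slide’s numbers (sample mean 3.8, individual standard deviation 0.73, n = 7, target 4.5); `mean_interval` is a hypothetical helper name.

```python
import math

def mean_interval(sample_mean, sd, n, z):
    """Sample mean plus or minus z standard deviations of the mean."""
    sd_mean = sd / math.sqrt(n)
    return sample_mean - z * sd_mean, sample_mean + z * sd_mean

low, high = mean_interval(3.8, 0.73, 7, 1.96)
target = 4.5
print(f"95% range: {low:.2f} to {high:.2f}")
# The line is in trouble if the target hardness falls outside the range
print("problem!" if not low <= target <= high else "looks ok")
```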


What if We Had Set a Lower Alpha Level

Plug and Chug for 1% Alpha Level

$$3.8 + 2.575 \times 0.276 = 4.51$$

Now we look OK: the 4.5 target falls inside the range

Note from the standard deviation formula that larger samples suck in the standard deviation
– If there really is a problem with Herby’s balls – how big a sample will it take to see the problem?
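Another way to see why the verdict flips: count how many standard deviations of the mean separate Herby’s 3.8 average from the 4.5 target, then compare that distance with both cutoffs. A minimal sketch with the slide’s numbers:

```python
import math

sd_mean = 0.73 / math.sqrt(7)   # standard deviation of the mean, about 0.276
z = (3.8 - 4.5) / sd_mean       # distance from the target in those units

for alpha, cutoff in ((0.05, 1.96), (0.01, 2.575)):
    verdict = "reject" if abs(z) > cutoff else "cannot reject"
    print(f"alpha {alpha}: |Z| = {abs(z):.2f} vs {cutoff} -> {verdict}")
```

The distance sits between the two cutoffs, which is exactly why the 5% test rejects and the 1% test does not.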


Figuring Out a Required Sample Size

Herby’s assembly line is supposed to turn out balls of 4.5 hardness

– How far out of spec can Herby tolerate things?

Suppose Herby decides he needs his estimates to be good to within 0.5 hardness units.

Next Herby has to decide how much of a chance he is willing to take that he will shut down the line and issue recalls when nothing is really wrong at all.

– Suppose Herby wants 99% confidence (i.e., an alpha level of 1%)

99% of a normal distribution is within +/- 2.575 standard deviation units of the true mean


Herby’s Task

Herby needs to detect a 0.5 hardness unit departure from the 4.5 target hardness but still have less than a 1% chance of shutting the line down by mistake.

Formula is

$$L = Z \times \frac{\sigma}{\sqrt{n}}$$

Note that this is just the plus or minus part of our confidence interval formula

Where L is the minimum error that must be detected and Z is the Z value for our alpha level


Doing the Math

First solve for our sample size needed

$$n = \frac{Z^2 \sigma^2}{L^2}$$

Then plug into the equation and solve

$$n = \frac{2.575^2 \times 0.73^2}{0.5^2} = 14.13$$

n = 14.13 as a practical matter means we need a sample of 15 to actually achieve the desired accuracy with an acceptable risk.

Note – this also implies that higher confidence requires more money spent on sampling and testing.
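The sample-size calculation as a short Python sketch (`required_sample_size` is a hypothetical helper). Rounding up with `math.ceil` reflects the point that you can’t test a fraction of a ball.

```python
import math

def required_sample_size(z, sd, tolerance):
    """Exact n from n = (z * sd / tolerance)^2, plus that n rounded up to a whole sample."""
    n_exact = (z * sd / tolerance) ** 2
    return n_exact, math.ceil(n_exact)

# 1% alpha (Z = 2.575), individual sd 0.73, must detect a 0.5 hardness shift
n_exact, n = required_sample_size(2.575, 0.73, 0.5)
print(f"n = {n_exact:.2f}, so sample {n} balls")
```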


Herby’s Assembly Line Analysis to Date

Herby has grabbed a sample of 7 balls off the assembly line.

With this sample Herby is 95% sure he has a problem with the hardness of the balls being produced.

When Herby checked for only a 1% chance that he was going to shut the line down for no reason at all, Herby’s sample could not furnish him enough certainty.

To detect a 0.5 unit departure from the target hardness of 4.5, with no more than a 1% chance of stopping the line over a quirk of sampling, Herby must grab 15 balls off the assembly line.


Comparing Two Samples

Red Rooster Carburetor company would like to claim that their carburetors improve fuel economy by 20% when their replacement carburetors are used.

Red Rooster assembles teams of drivers to drive two sets of cars – one that has been retrofit with Red Rooster Carburetors and one that uses the manufacturer’s original carburetors


Data Begins Coming In

The standard vehicles came in with an average of 21.4 mpg and stdev of 6.1 from 60 car and driver combinations

The Rooster Carburetor Vehicles came in with 29.5 mpg and stdev of 6.2 from 41 car and driver combinations


Setting Up A Test

If the average gas mileage of the standard (no Rooster) set is improved by 20%, its adjusted mean is 21.4 × 1.2 = 25.68 mpg

The Null Hypothesis is that the mean gas mileage of the two sets of cars is the same (after the 20% adjustment)
– Set the test up to reject the null hypothesis and conclude the Rooster Carburetor set is more than 20% better if the test statistic is extreme enough


The Test Statistic

$$Z = \frac{\bar{Y}_1 - \bar{Y}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

We will let Y1 be our Rooster Carburetor vehicles

We will let Y2 be our standard vehicles with the 20% improvement

If Y1 is bigger than Y2 it will cause Z to become increasingly large. If Z is so far out in the upper tail that there is little chance it could be a random event, we will reject the null hypothesis and conclude that the Red Rooster Carburetors improve fuel economy by more than 20%


A Note on Our Test Statistic

$$Z = \frac{\bar{Y}_1 - \bar{Y}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

The denominator combines the variances of the two samples into one estimate of the variance of the difference between the means

Strictly speaking the test is assuming that the two populations have the same variance. If the variances are close, it is accepted practice to allow the lie as close enough.

How different can the variances be and still be about the same? It is actually a bit of a judgment call, but I’m not worried about standard deviations of 6.1 and 6.2


Plug and Chug

$$Z = \frac{29.5 - 25.68}{\sqrt{\dfrac{6.2^2}{41} + \dfrac{6.1^2}{60}}}$$

Z = 3.06. Go to the table to look up how much of the normal distribution lies beyond 3.06 standard deviation units


Do A Table Look Up

The area under the curve up to Z = 3.06 is 0.99889, so 0.00111 (i.e., 0.111%) of the distribution is further out. There is about a 1/10th of 1% chance that the observed result is a fluke.

Action – Reject the null hypothesis and conclude that the Red Rooster Carburetor does improve fuel economy by more than 20%
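A Python sketch of the whole two-sample test, using `statistics.NormalDist` from the standard library in place of the printed table. It takes the upper tail only, since Red Rooster only cares about improvement.

```python
import math
from statistics import NormalDist

# Red Rooster fleet (Y1) and standard fleet with the 20% boost applied (Y2)
y1, s1, n1 = 29.5, 6.2, 41
y2, s2, n2 = 21.4 * 1.2, 6.1, 60

z = (y1 - y2) / math.sqrt(s1**2 / n1 + s2**2 / n2)
upper_tail = 1 - NormalDist().cdf(z)   # share of the normal curve beyond z

print(f"Z = {z:.2f}, upper-tail area = {upper_tail:.5f}")
```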


Paired Experiments

What if Red Rooster Carburetors is a group of students who designed their carburetor in the machine shop at school

– The idea that they can go out and build 41 carburetors and send 101 cars and drivers out to burn up a bunch of gas is kind of “iffy”

One way to get the sample size down is to get rid of some of that random variance

– What if we used the same car and driver with and without the Red Rooster Carburetor?

We just took out two sources of scatter in the data (differences between cars and differences between drivers)
– This is called a Paired Experiment


Paired Experiments

There needs to be a solid basis for pairing
– You can make the numbers crunch pairing up anything

Experiment – I want to show that students from Illinois are smarter than students from Missouri. I give a test to 40 SIU seniors that are Illinois residents. I then give the same test to 40 Kindergarteners from Missouri. I match the students up in the order in which tests were turned in and do my test.

– If my test statistic shows that my Illinois students scored higher are you willing to believe that Illinois students are smarter than Missouri students?


OK, that last one raises some concerns about the intelligence of whoever designed that experiment

The basis for pairing should be that we are pairing like items to eliminate variation from whatever we are trying to “write out” of the experiment by pairing.

Suppose we make one Red Rooster Carburetor to go on a Dodge Neon and I have 10 students drive the vehicle over the same road course before adding the carburetor. I then add the carburetor and have the same 10 students drive the same car over the same course. I will then pair the results before and after adding the carburetor


Looking at My Results

Standard Dodge Neon (mpg)
– Don Dork 26.5
– Kurt Kurtosis 25.7
– Angela Airhead 25.2
– Mark Maniac 23.9
– Katty Careful 28.1
– Jim Junkyard 26.2
– Steve Stickshift 25.9
– Burt Bunion 27.1
– Saedy Sadist 26.7
– Melvin Mizer 28.2

Neon with RR Carb (mpg)
– Don Dork 32.1
– Kurt Kurtosis 30.1
– Angela Airhead 31.8
– Mark Maniac 29.8
– Katty Careful 34.2
– Jim Junkyard 30.6
– Steve Stickshift 31.2
– Burt Bunion 33.2
– Saedy Sadist 32.8
– Melvin Mizer 34.5


The test requires us to get the differences within our pairing

Don Dork: 32.1 - 26.5 = 5.6
Kurt Kurtosis: 30.1 - 25.7 = 4.4
And so on through the pairing.
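The pairing step as a short Python sketch, with the mileages listed in the same driver order as the slide (Don Dork first, Melvin Mizer last).

```python
# Mileages in slide order: Don Dork, Kurt Kurtosis, ..., Melvin Mizer
standard = [26.5, 25.7, 25.2, 23.9, 28.1, 26.2, 25.9, 27.1, 26.7, 28.2]
with_rr = [32.1, 30.1, 31.8, 29.8, 34.2, 30.6, 31.2, 33.2, 32.8, 34.5]

# Pair each driver's two runs and take the difference (after minus before),
# rounding to one decimal to clear floating-point noise
differences = [round(after - before, 1) for before, after in zip(standard, with_rr)]
print(differences)
```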


Tuning in a Little More

Red Rooster actually wants to claim a 20% increase in gas mileage, so we may be able to normalize out some more variance by directly measuring % improvement (each driver’s difference divided by that driver’s standard mileage).

– Results 21.13%, 17.12%, 26.19%, 24.68%, 21.71%, 16.79%, 20.48%, 22.51%, 22.85%, 22.34%

We also are interested in how much these values differ from 20% improvement so we can subtract 20% from each value

– 1.13%, -2.88%, 6.19%, 4.68%, 1.71%, -3.21%, 0.48%, 2.51%, 2.85%, 2.34%

Plug the data into SPSS to get the mean and standard deviation
– Could also use Excel with the functions =AVERAGE(data range) for the mean and =STDEV(data range) for the standard deviation
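A Python sketch of this step, recomputing the percent improvements from the raw mileages. Python’s `statistics.stdev` divides by n - 1, the same sample standard deviation Excel’s STDEV gives. The recomputed values agree with the list above to within about 0.02 percentage points, so the mean and standard deviation still round to 1.58% and 2.95%.

```python
from statistics import mean, stdev

# Mileages in slide order: Don Dork, Kurt Kurtosis, ..., Melvin Mizer
standard = [26.5, 25.7, 25.2, 23.9, 28.1, 26.2, 25.9, 27.1, 26.7, 28.2]
with_rr = [32.1, 30.1, 31.8, 29.8, 34.2, 30.6, 31.2, 33.2, 32.8, 34.5]

# Percent improvement for each driver, then how far each sits above 20%
excess = [100 * (after - before) / before - 20 for before, after in zip(standard, with_rr)]

# statistics.stdev is the sample (n - 1) standard deviation, like Excel's STDEV
print(f"mean = {mean(excess):.2f}%, stdev = {stdev(excess):.2f}%")
```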


The Hypothesis

H0: there is no difference between our set of numbers and 0
– Specifically, this means we cannot be sure we have over 20% improvement

Rejecting the null hypothesis means we are sure we have over 20% improvement


The Test Statistic for a Paired Experiment

$$t = \frac{\bar{d}}{s_d / \sqrt{n}}$$

$\bar{d}$ (d with the bar over it) is the average difference (in this case 1.58%)

$s_d$ is the standard deviation of the individual differences as calculated (in this case 2.95%)

n is of course the number of samples (in this case 10)

Crunching the numbers we get t = 1.69
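The paired t statistic in Python, using the ten differences as listed on the slide. Depending on how far the mean and standard deviation are rounded before dividing, t comes out between about 1.69 and 1.70.

```python
import math
from statistics import mean, stdev

# Improvement above 20% for each driver, as listed on the slide (percent)
d = [1.13, -2.88, 6.19, 4.68, 1.71, -3.21, 0.48, 2.51, 2.85, 2.34]

n = len(d)
t = mean(d) / (stdev(d) / math.sqrt(n))   # paired t statistic
print(f"d-bar = {mean(d):.2f}, s_d = {stdev(d):.2f}, t = {t:.2f}")
```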


Looking Up Our Result

We have n - 1 degrees of freedom (in this case 9)

t = 1.69 falls between the one-tailed 90% and 95% significance points. We cannot reject the null hypothesis at the 95% level, but we can at about 93% confidence.
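Python’s standard library has no t distribution, so this sketch gets the one-tailed p-value by integrating the t density with Simpson’s rule (our choice of method; `scipy.stats.t.sf` or a t table would normally do the job). The function names are hypothetical.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """P(T > t) for t >= 0: half the curve minus the area from 0 to t (Simpson's rule)."""
    h = t / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

p = t_upper_tail(1.69, 9)
print(f"one-tailed p = {p:.3f}, confidence = {100 * (1 - p):.1f}%")
```

For t = 1.69 with 9 degrees of freedom the p-value comes out a little above 0.06, i.e. roughly 93 to 94% one-tailed confidence, in line with the table reading.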


Limitations of Our Results

93% confidence we have over 20% improvement may fall short of the proof some people would demand

– One way to strengthen the conclusion is more samples (the standard deviation of the mean, $s_d/\sqrt{n}$, shrinks with more samples, and since it is in the denominator, that makes t bigger)

We may also be concerned that all our tests were on a Dodge Neon, which furnishes no data on whether other cars would see the same improvement