Continuous Probability Distributions - ronharrow.comronharrow.com/wp-content/uploads/2012/08/Chapter-6-Continuous... · Elementary Statistics Triola, Elementary Statistics 11/e 43

Elementary Statistics

Triola, Elementary Statistics 11/e

43

Continuous Probability Distributions

Unit 12 Normal Probability Distribution

The Normal Probability Distribution is one of the most commonly encountered distributions in nature. When you measure some characteristic of a population, be it running time, weight, volume, etc., you will often get data that is approximately normally distributed. If you do not use a measuring device to generate the data, then the data is less likely to be distributed normally. Some examples of data sets that are not normally distributed are polls, daffodils in a lawn, leaves on a tree, or people’s income.

The histogram of data that is distributed normally will have the classic bell shape.

[Figure: bell-shaped curve of the Standard Normal Distribution]

Pictured here is the Standard Normal Distribution. It is an abstract distribution with mean equal to zero and standard deviation equal to 1.0. If we wish to compare two different populations, we can convert a data value from each population to a standard score, also called a z-score, and then compare the two scores. For example, the height and weight of men are normally distributed with means of 69 inches and 168 lbs respectively, and standard deviations of 2.8 inches and 27 lbs respectively. The ultimate average-looking male would weigh 168 lbs and be 69 inches tall. What about someone who is 76 inches tall and weighs 180 pounds? To convert his height to a standard score, we use the following formula, where $x$ is the data value, $\mu$ is the population mean, and $\sigma$ is the population standard deviation:

$$z = \frac{x - \mu}{\sigma}$$

His height converted to a standard score would then be

$$z = \frac{76 - 69}{2.8} = 2.5$$

The standard score corresponding to his weight would be

$$z = \frac{180 - 168}{27} \approx 0.44$$

Looking at these numbers, we would say that he’s much taller than he is heavy. In fact, this person would appear thin to us even though his weight is above average.
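The height and weight comparison above can be sketched in a few lines of Python. This is only an illustration, not part of the Excel workflow used in this unit; the function name `z_score` is a hypothetical helper chosen for this example.

```python
def z_score(x, mean, std_dev):
    """Return how many standard deviations x lies from the mean."""
    return (x - mean) / std_dev


# Population values quoted in the text: men's height and weight.
height_z = z_score(76, mean=69, std_dev=2.8)
weight_z = z_score(180, mean=168, std_dev=27)

print(round(height_z, 2))  # 2.5
print(round(weight_z, 2))  # 0.44

# A larger z-score for height than for weight is what
# "much taller than he is heavy" means numerically.
print(height_z > weight_z)  # True
```

Because both values are now on the same standardized scale, they can be compared directly even though the original units (inches and pounds) differ.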

Find the z-score of someone whose IQ is 120 if $\mu = 100$ and $\sigma = 15$.

1. _____________

It’s important to note that the histogram of sample data from a normally distributed population won’t be perfectly bell shaped, but it shouldn’t be noticeably skewed either. Compare the following two histograms:

[Figure: two sample histograms, roughly bell shaped on the left and skewed to the right on the right]

The histogram on the left is not perfectly bell shaped, but it is more or less centered around its mean of 234, which is also close to the mode of 235, and it starts out low, rises to a peak near the center, and then drops off. The histogram on the right is clearly skewed to the right. This skew has pulled the mean of 194 to the right of the mode at 190. In a perfect world the two distributions would look as follows:

[Figure: idealized smooth curves for the two distributions]

Area Equals Probability

Recall the histogram for throwing two dice. We’ll use the expected frequency as the height of each bar, and assume that we roll the dice 36 times.

For example, the height of the bar for bin 7 is six, because if we roll the dice 36 times we would expect to get six sevens:

$$36 \times P(7) = 36 \times \frac{6}{36} = 6$$

We can create a relative histogram by dividing the height of each bar by the total number of data points, in this case 36. Thus the height of the bar for bin 7 would be 6/36 = 1/6 and the height of the bar for bin 6 would be 5/36, etc. If you were to do this for every bar, and then add the heights of all the bars, what would that sum equal?

2. ____________________________

If we were to assume that the width of each bar in our relative histogram was exactly equal to 1.0, what would be the area of bar 7? Remember, area is equal to height times width, and the height of this bar is 1/6.

3. ____________________________

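What is the probability of rolling a 5 or a 6 or a 7? Remember that these are mutually exclusive events (a single roll cannot produce two different totals), so any “and” event has probability zero. For example, P(5 and 6 and 7) = 0. Remember, furthermore, that for mutually exclusive events the probabilities simply add:

$$P(5 \text{ or } 6 \text{ or } 7) = P(5) + P(6) + P(7)$$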

4. _____________________________

What is the sum of the areas of bars 5, 6 and 7?

5. ____________________________

What conclusion can you draw about the area under the histogram and the probabilities?

6. ______________________________________________________________________________

______________________________________________________________________________
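The dice questions above can be checked with a short Python sketch. It is an illustration only: it enumerates the 36 equally likely outcomes of two fair dice, and the variable names (`ways`, `expected`, `relative`) are assumptions made for this example rather than anything from the text.

```python
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely outcomes give each total.
ways = {}
for first, second in product(range(1, 7), repeat=2):
    total = first + second
    ways[total] = ways.get(total, 0) + 1

rolls = 36

# Expected frequency of each total in 36 rolls (the bar heights).
expected = {total: Fraction(n, 36) * rolls for total, n in ways.items()}

# Relative histogram: divide each bar height by the number of rolls.
relative = {total: count / rolls for total, count in expected.items()}

# With bars of width 1.0, area = height * width.
bar_width = 1
area_5_to_7 = sum(relative[t] * bar_width for t in (5, 6, 7))

# The totals are mutually exclusive, so their probabilities add.
prob_5_to_7 = Fraction(ways[5] + ways[6] + ways[7], 36)

print(expected[7])             # expected number of sevens in 36 rolls
print(relative[7])             # height of bar 7 in the relative histogram
print(sum(relative.values()))  # sum of all relative bar heights
print(area_5_to_7 == prob_5_to_7)
```

Using `Fraction` keeps every value exact, so the comparison between area and probability on the last line is not affected by rounding.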

[Figure: Histogram of expected frequency (vertical axis, 0 to 8) by bin (horizontal axis, totals 2 through 12) for 36 rolls of two dice]

When we are dealing with continuous datasets, we can’t have single value bins. For one thing, there is no such thing as someone being, for example, exactly 69 inches tall. Also, if our populations are huge, then we could use many bins in our histogram, so many that, when you looked at it from a distance, the outline of the histogram would begin to look like a smooth curve. Histograms are for our samples, but smooth curves are used to represent the data distribution for entire populations. Look at the similarity between the two:

The perfect smooth curve on the right is called the Normal Distribution or “bell curve”. When we take a relatively small sample and plot its histogram, we are not going to get a perfect bell curve, but if it’s close, we are going to assume that the sample is from a normally distributed population.

Area under the smooth curve is equivalent to probability just as it was for the histogram. So finding probabilities becomes a problem of finding areas. The mean of a normally distributed population is always the dead center of the curve. For example, IQ is normally distributed. The mean is 100 and the standard deviation is 15. The center of the bell curve would be 100, and 115 would be one standard deviation to the right and 85 would be one standard deviation to the left.

There’s a special normal curve called the Standard Normal Curve. Its mean is 0.0 and its standard deviation is 1.0. Also, when we are referring to this curve, the axis is labeled the z-axis and values along the z-axis are called z-scores. These are the exact same z-scores that you learned about earlier. To find the area under the curve to the left of a z-score we use the Excel function NORM.S.DIST. Let’s say we wanted to find the area to the left of z = 1.25.

We would enter 1.25 for z and set Cumulative to TRUE, that is, =NORM.S.DIST(1.25, TRUE).

The area is 0.8944. Note that Cumulative should always be set to TRUE when you want the area to the left of a value.
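For readers without Excel, the same area can be sketched in Python. This assumes the standard library's `statistics.NormalDist` class as a stand-in for NORM.S.DIST; it is not the tool used in this unit.

```python
from statistics import NormalDist

# Standard normal curve: mean 0.0, standard deviation 1.0.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

# cdf(z) is the area under the curve to the left of z,
# which corresponds to setting Cumulative to TRUE in Excel.
area_left = standard_normal.cdf(1.25)

print(round(area_left, 4))  # 0.8944
```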

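Given the area under the curve, it is also possible to find the corresponding z-score using NORM.S.INV. For example, entering the area found above, =NORM.S.INV(0.8944), returns a z-score of approximately 1.25.

Note that the tool asks for a probability and not an area. Since area equals probability, enter the area to the left of the z-score you are looking for.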
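The inverse lookup can be sketched the same way. This assumes `NormalDist.inv_cdf` from Python's standard library as a counterpart to NORM.S.INV, and the input 0.8944 is simply the area found earlier for z = 1.25, chosen here for illustration.

```python
from statistics import NormalDist

# NormalDist() with no arguments is the standard normal curve.
standard_normal = NormalDist()

# inv_cdf(p) returns the z-score with area p to its left.
z = standard_normal.inv_cdf(0.8944)

print(round(z, 2))  # 1.25
```

Going from z to area and back again should return the original z-score, up to the rounding of the area.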

NORM.S.DIST and NORM.S.INV play important roles later in our study of statistics. When working with real world populations we use other tools. Let’s take IQ again. The mean of the population is 100 and its standard deviation is 15. Let’s ask: what is the probability of randomly picking a person whose IQ is less than 90? This is equivalent to finding the area under the normal curve that lies to the left of the value 90. These generic normal distributions are based on the x-axis and the data is expressed as X, so we want to find $P(X < 90)$. We use the tool NORM.DIST (no S), entering =NORM.DIST(90, 100, 15, TRUE).

It says that the area under the curve to the left of 90 is 0.2525; therefore,

$$P(X < 90) = 0.2525$$
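The IQ calculation above can be sketched in Python as well, again assuming `statistics.NormalDist` as a stand-in for NORM.DIST. The variable name `iq` is chosen for this example.

```python
from statistics import NormalDist

# IQ population from the text: mean 100, standard deviation 15.
iq = NormalDist(mu=100, sigma=15)

# Area to the left of 90, i.e. P(X < 90).
p_below_90 = iq.cdf(90)

print(round(p_below_90, 4))  # 0.2525
```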

Suppose we wanted to do the inverse, that is, find the IQ such that 85% of the population fall below it. In this case we use NORM.INV, entering =NORM.INV(0.85, 100, 15).

We find that anyone with an IQ of 115.55 will have a higher IQ than 85% of the population. Recall from the discussion under percentiles that anyone with an IQ of 115.55 falls at the 85th percentile.
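A minimal sketch of the same percentile lookup, assuming `NormalDist.inv_cdf` from Python's standard library in place of NORM.INV:

```python
from statistics import NormalDist

# IQ population from the text: mean 100, standard deviation 15.
iq = NormalDist(mu=100, sigma=15)

# The IQ value with 85% of the population below it (85th percentile).
iq_85th = iq.inv_cdf(0.85)

print(round(iq_85th, 2))  # 115.55
```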

What if we wanted to find the percentage of the population that had IQs between 90 and 110? In order to find this, we have to find the area between 90 and 110. One way to get this is to subtract the area to the left of 90 from the area to the left of 110. You might have to draw some pictures to convince yourself of this. Therefore,

$$P(90 < X < 110) = P(X < 110) - P(X < 90)$$

Try it and see what you get:

7. _________________

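Finally, how would you find a probability of the form $P(X > a)$? The tool can only give you probabilities less than a number. How about using the complement? Isn’t $P(X > a) = 1 - P(X < a)$? For example, using the area found earlier,

$$P(X > 90) = 1 - P(X < 90) = 1 - 0.2525 = 0.7475$$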
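Both the subtraction rule and the complement rule can be sketched with one cumulative function. This assumes Python's `statistics.NormalDist` in place of NORM.DIST; the variable names are chosen for this example.

```python
from statistics import NormalDist

# IQ population from the text: mean 100, standard deviation 15.
iq = NormalDist(mu=100, sigma=15)

# Area between two values: left-of-upper minus left-of-lower.
p_between_90_and_110 = iq.cdf(110) - iq.cdf(90)

# Area to the right of a value: 1 minus the area to its left.
p_above_90 = 1 - iq.cdf(90)

print(round(p_between_90_and_110, 4))
print(round(p_above_90, 4))
```

Since the whole area under the curve is 1, every "greater than" or "between" probability can be built from "less than" areas alone.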

REMEMBER, AREA EQUALS PROBABILITY.

This is the end of Unit 12. In class, you will get more practice with these concepts by working exercises in MyMathLab.

Answers

1. 1.3333

2. 1.0

3. 1/6

4. 4/36 + 5/36 + 6/36 = 15/36 ≈ 0.4167

5. 15/36 ≈ 0.4167

6. The area of the histogram is equivalent to probability.

7. 0.7475 − 0.2525 = 0.4950