Elementary Statistics
Triola, Elementary Statistics 11/e
Continuous Probability Distributions
Unit 12 Normal Probability Distribution
The Normal Probability Distribution is the most commonly encountered distribution in nature.
Whenever you measure some characteristic of a population, be it running time, weight, or volume,
you will most likely get data that is approximately normally distributed. If you do not use a measuring
device to generate the data, then most likely the data will not be normally distributed. Some examples of data
sets that are not normally distributed are polls, daffodils in a lawn, leaves on a tree, and people’s income.
The histogram of data that is distributed normally will have the classic bell shape.
Pictured here is the Standard Normal Distribution. It is an abstract distribution with mean equal to zero
and standard deviation equal to 1.0. If we wish to compare two different populations, we can convert a
data value from each population to a standard score, also called a z-score, and then compare the two
scores. For example, height and weight of men are normally distributed with means 69 inches and 168
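lbs respectively, and standard deviations of 2.8 inches and 27 lbs respectively. The perfectly average
male would weigh 168 lbs and be 69 inches tall. What about someone who is 76 inches tall
and weighs 180 pounds? To convert his height to a standard score, we use the following formula,
$$z = \frac{x - \mu}{\sigma}$$
where $x$ is the data value, $\mu$ is the population mean, and $\sigma$ is the population standard deviation.
His height converted to a standard score would then be,
$$z = \frac{76 - 69}{2.8} = 2.5$$
The standard score corresponding to his weight would be,
$$z = \frac{180 - 168}{27} \approx 0.44$$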
Looking at these numbers, we would say that he’s much taller than he is heavy. In fact, this person
would appear thin to us even though his weight is above average.
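The height and weight comparison can be checked with a few lines of Python. This is only a sketch: the helper name `z_score` is made up for illustration, and the means and standard deviations are the ones quoted in the text.

```python
def z_score(x, mean, std_dev):
    """Return how many standard deviations x lies from the mean."""
    return (x - mean) / std_dev

# Population values quoted in the text for men.
height_z = z_score(76, mean=69, std_dev=2.8)   # height in inches
weight_z = z_score(180, mean=168, std_dev=27)  # weight in pounds

print(f"height z-score: {height_z:.2f}")  # 2.50
print(f"weight z-score: {weight_z:.2f}")  # 0.44
```

Because both values are now on the same unitless scale, they can be compared directly even though one started in inches and the other in pounds.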
Find the z-score of someone whose IQ is 120 if the mean IQ is 100 and the standard deviation is 15.
1. _____________
It’s important to note that the histogram of sample data from a normally distributed population won’t
be perfectly bell shaped, but it won’t be skewed either. Compare the following two histograms,
The histogram on the left is not perfectly bell shaped, but it is more or less centered around its mean of
234, which is also close to the mode of 235, and it starts out low, rises to a peak near the center, and then
drops off. The histogram on the right is clearly skewed to the right. This skew has pulled the mean of
194 to the right of the mode at 190. In a perfect world the two distributions would look as follows,
Area Equals Probability
Recall the histogram for throwing a pair of dice. We’ll use the expected frequency of each sum as the
height of its bar, and assume that we roll the dice 36 times.
For example, the height of the bar for bin 7 is six, because if we roll the dice 36 times we would expect to
get six sevens,
$$36 \times \frac{1}{6} = 6$$
We can create a relative histogram by dividing the height of
each bar by the total number of data points, in this case 36. Thus the height of the bar for bin 7 would
be 1/6 and the height of the bar for bin 6 would be 5/36, etc. If you were to do this for every bar, and
then added the height of all the bars, what would that sum equal?
2. ____________________________
If we were to assume that the width of each bar in our relative histogram was exactly equal to 1.0, what
would be the area of bar 7? Remember, area is equal to height times width, and the height of this bar is
1/6.
3. ____________________________
What is the probability of rolling a 5 or a 6 or a 7? Remember that these are mutually exclusive events, so any
“and” event has probability zero. For example, P(5 and 6 and 7) = 0. Remember, furthermore, that
$$P(5 \text{ or } 6 \text{ or } 7) = P(5) + P(6) + P(7)$$
4. _____________________________
What is the sum of the areas of bars 5, 6 and 7?
5. ____________________________
What conclusion can you draw about the area under the histogram and the probabilities?
6. ______________________________________________________________________________
______________________________________________________________________________
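The dice questions above can be explored with a short Python sketch. It assumes two fair six-sided dice (so 36 equally likely outcomes) and a bar width of exactly 1; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely outcomes give each sum.
ways = {}
for a, b in product(range(1, 7), repeat=2):
    ways[a + b] = ways.get(a + b, 0) + 1

rolls = 36
# Expected frequency of each sum in 36 rolls (bar heights of the histogram).
expected_count = {s: Fraction(n, 36) * rolls for s, n in ways.items()}
# Relative frequency of each sum (bar heights of the relative histogram).
relative_height = {s: Fraction(n, 36) for s, n in ways.items()}

# With a bar width of 1, each bar's area equals its height.
total_area = sum(relative_height.values())
area_5_to_7 = sum(relative_height[s] for s in (5, 6, 7))

print(expected_count[7])   # 6
print(relative_height[6])  # 5/36
print(total_area)          # 1
print(area_5_to_7)         # 5/12
```

Exact fractions are used instead of floating-point numbers so that the bar areas can be compared with the probabilities without rounding error.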
[Figure: histogram of expected frequencies for 36 rolls of the dice. Horizontal axis: Bin (sums 2 through 12); vertical axis: Frequency (0 to 8).]
When we are dealing with continuous datasets, we can’t have single value bins. For one thing, there is
no such thing as someone being, for example, exactly 69 inches tall. Also, if our populations are huge,
then we could use many bins in our histogram. So many, that when you looked at it from a distance, the
outline of the histogram would begin to look like a smooth curve. Histograms are for our samples, but
smooth curves are used to represent the data distribution for entire populations. Look at the similarity
between the two:
The perfect smooth curve on the right is called the Normal Distribution or “bell curve”. When we take a
relatively small sample and plot its histogram, we are not going to get a perfect bell curve, but if it’s close, we
are going to assume that the sample is from a normally distributed population.
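To see how a histogram of sample data approaches a bell shape, here is a rough Python sketch that draws a simulated sample and prints a text histogram. The population mean of 100, standard deviation of 15, sample size, bin width, and seed are all assumptions chosen for illustration; they are not taken from the figures in this unit.

```python
import random
from collections import Counter

random.seed(12)  # fixed seed so the sketch is repeatable

# Hypothetical sample: 10,000 values from a normal population.
sample = [random.gauss(100, 15) for _ in range(10_000)]

# Group the values into bins 5 units wide, labelled by their lower edge.
bin_width = 5
counts = Counter(int(x // bin_width) * bin_width for x in sample)

# Print one row per bin; each '#' stands for 25 data values.
for lower in sorted(counts):
    bar = "#" * (counts[lower] // 25)
    print(f"{lower:4d} {bar}")

sample_mean = sum(sample) / len(sample)
```

Making `bin_width` smaller and the sample larger gives an outline that looks more and more like a smooth curve.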
Area under the smooth curve is equivalent to probability just as it was for the histogram. So finding
probabilities becomes a problem of finding areas. The mean of a normally distributed population is
always the dead center of the curve. For example, IQ is normally distributed. The mean is 100 and the
standard deviation is 15. The center of the bell curve would be 100 and 115 would be one standard
deviation unit to the right and 85 would be one standard deviation to the left.
There’s a special normal curve called the Standard Normal Curve. Its mean is 0.0 and its standard
deviation is 1.0. Also, when we are referring to this curve, the axis is labeled the z-axis and values along
the z-axis are called z-scores. These are the exact same z-scores that you learned about earlier. To find
the area under the curve to the left of a z-score we would use the NORM.S.DIST Excel tool. Let’s say we
wanted to find the area to the left of z = 1.25.
We would do the following:
The area is 0.8944. Note that the Cumulative argument is always set to TRUE.
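For readers without Excel, the same area can be found in Python. This sketch uses `NormalDist` from Python's standard `statistics` module as a stand-in for NORM.S.DIST; it is not part of the original worksheet.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

# cdf(z) is the area under the curve to the left of z.
area_left = standard_normal.cdf(1.25)
print(round(area_left, 4))  # 0.8944
```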
Given the area under the curve, it is also possible to find the corresponding z-score using NORM.S.INV.
Consider the following,
Note that the tool asks for probability and not area.
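Going from an area back to a z-score can be sketched in Python as well, with `NormalDist.inv_cdf` standing in for NORM.S.INV. The probability 0.8944 is reused from the earlier z = 1.25 example purely as an illustration.

```python
from statistics import NormalDist

# inv_cdf(p) returns the z-score with area p to its left.
z = NormalDist().inv_cdf(0.8944)
print(round(z, 2))  # 1.25
```

The argument is a probability because, under the curve, area and probability are the same thing.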
NORM.S.DIST and NORM.S.INV play important roles later in our study of statistics. When working with
real world populations we use other tools. Let’s take IQ again. The mean of the population is 100 and
its standard deviation is 15. Let’s ask: what is the probability of randomly picking a person
whose IQ is less than 90? This is equivalent to finding the area under the normal curve that lies to the
left of the value 90. These generic normal distributions are based on the x-axis and the data is
expressed as X, so we want to find $P(X < 90)$. We use the tool NORM.DIST (no S).
It says that the area under the curve to the left of 90 is 0.2525, therefore,
$$P(X < 90) = 0.2525$$
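The IQ calculation can be reproduced in Python as a cross-check. Here `NormalDist(100, 15).cdf` plays the role of NORM.DIST with Cumulative set to TRUE; the variable names are illustrative.

```python
from statistics import NormalDist

# IQ population from the text: mean 100, standard deviation 15.
iq = NormalDist(mu=100, sigma=15)

# Area to the left of 90, i.e. P(X < 90).
p_below_90 = iq.cdf(90)
print(round(p_below_90, 4))  # 0.2525
```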
Suppose we wanted to do the inverse, that is find the IQ such that 85% of the population fall below it.
In this case we use NORM.INV,
We find that anyone with an IQ of 115.55 will have a higher IQ than 85% of the population. Recall from
the discussion under percentiles that an IQ of 115.55 is the 85th percentile.
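The inverse lookup works the same way in Python. In this sketch `inv_cdf` stands in for NORM.INV, with the IQ mean and standard deviation taken from the text.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

# IQ value with 85% of the population below it (the 85th percentile).
iq_85th = iq.inv_cdf(0.85)
print(round(iq_85th, 2))  # 115.55
```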
What if we wanted to find the percentage of the population that had IQs between 90 and 110? In
order to find this, we have to find the area between 90 and 110. One way to get this is to subtract the
area to the left of 90 from the area to the left of 110. You might have to draw some pictures to
convince yourself of this. Therefore,
$$P(90 < X < 110) = P(X < 110) - P(X < 90)$$
Try it and see what you get,
7. _________________
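If you would like to check your answer without Excel, the subtraction of the two left-hand areas can be sketched in Python as follows. The result is deliberately not printed in a comment, so you can compare it with your own work.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

# Area between 90 and 110 = (area left of 110) - (area left of 90).
p_between = iq.cdf(110) - iq.cdf(90)
print(round(p_between, 4))
```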
Finally, how would you find a probability of the form $P(X > a)$? The tool can only give you probabilities less than a number.
How about using the complement? Isn’t
$$P(X > a) = 1 - P(X < a)$$
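The complement idea can be sketched in Python too. The cutoff of 110 below is a hypothetical value chosen only for illustration; it is not taken from the worksheet.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

cutoff = 110  # hypothetical cutoff, for illustration only

# Area to the right = 1 - (area to the left).
p_above = 1 - iq.cdf(cutoff)
print(round(p_above, 4))
```

Since the total area under the curve is 1, the areas to the left and right of any cutoff always add up to 1.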
REMEMBER, AREA EQUALS PROBABILITY.
This is the end of Unit 12. In class, you will get more practice with these
concepts by working exercises in MyMathLab.
Answers
1. 1.3333
2. 1.0
3. 1/6
4. 15/36 = 5/12
5. 15/36 = 5/12
6. The area of the histogram is equivalent to probability.
7.