Using the 68-95-99.7 Rule Normal Quantile Plots



Learning Objectives

By the end of this lecture, you should be able to:

– Do various calculations involving areas under the density curve using the 68-95-99.7 rule

– Identify the mathematical technique used to help confirm (though not guarantee!) that our distribution is indeed Normal.


A few numbers worth memorizing (though not just yet)

• Because we encounter the Normal distribution so much, it is worth memorizing the approximate areas from the Normal table that correspond to a few different z-scores.

– I say approximate, because the values are rounded off.

• Look at the areas shown here, but don’t memorize them just yet. Each value is the area to the left of the given z-score:

– z = -2: about 2.3%
– z = -1: about 16%
– z = +1: about 84%
– z = +2: about 98%

• What I do want you to memorize are the 3 numbers shown in a famous ‘rule’ on the next slide.
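The rounded table areas above can be checked in code. This is only a sketch: it assumes Python 3.8 or later and uses the standard library's `statistics.NormalDist` as a stand-in for the printed Normal table that the slides refer to.

```python
from statistics import NormalDist

# Standard Normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Area to the left of each z-score, expressed as a percentage.
left_areas = {z: 100 * standard_normal.cdf(z) for z in (-2, -1, 1, 2)}

for z, pct in left_areas.items():
    print(f"z = {z:+d}: about {pct:.1f}% of observations lie below")
```

The printed values carry one more decimal than the rounded figures in the list, which is a reminder that the memorized numbers are approximations.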


The 68-95-99.7% Rule for Normal Distributions

This is essentially a “shortcut” for a mental ballpark of the areas under the Normal curve. It is definitely worth memorizing.

• The area between -1 and +1 standard deviations corresponds to about 68% of the observations.

• The area between -2 and +2 standard deviations corresponds to about 95% of the observations.

• The area between -3 and +3 standard deviations corresponds to about 99.7% of the observations.

You WILL be asked to use these numbers on quizzes and exams. Please note that on your exams you will not be provided with the three numbers (68, 95, 99.7).
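For the curious, the three numbers in the rule can be recovered from the Normal curve itself. The sketch below assumes Python 3.8+ and the standard library's `statistics.NormalDist`; the helper name `central_area` is made up for this illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1


def central_area(k: float) -> float:
    """Percentage of observations within k standard deviations of the mean."""
    return 100 * (standard_normal.cdf(k) - standard_normal.cdf(-k))


for k in (1, 2, 3):
    print(f"between -{k} and +{k} SD: {central_area(k):.2f}%")
```

The exact areas are roughly 68.27%, 95.45% and 99.73%, so the rule's 68, 95 and 99.7 are rounded values.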


The z = 0 line (black line) is very helpful in doing many of these calculations.

Now let’s play around with these numbers by answering some questions. All numbers refer to z-scores (i.e. standard deviations):

• What percentage of observations lie between -1 and +1?

• Answer: As we just discussed, about 68% of observations lie between -1 and +1 standard deviations.

• What percentage lie between 0 and +1?

• Answer: If -1 to +1 is 68%, then 0 to +1 is half of that, which is 34%.

• This is an important step to understand. Make sure you are clear on this point before moving on.

• There are a few ways to think of it: Look at the area between z = 0 (the black line) and z = +1. Note that it is half of the area between -1 and +1, because the Normal curve is symmetric about z = 0.

• If you need to visualize it, then shade in the area between z = 0 and z = +1.

• What percentage of observations lie below +1?

• Answer: To do this, look at your z = 0 line. Make sure you recognize that the area to the left of z = 0 represents 50% of observations. Now, what percentage of observations lie between 0 and +1? Recall from the previous question that this is 34%. Therefore, the area from 0 to +1 is 34% and the area below 0 is 50%, so the area to the left of +1 represents 84% of observations.
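The midline reasoning in these answers can be written out as a few lines of arithmetic. This is just an illustrative sketch: the names `RULE`, `area_zero_to` and `area_below` are invented here, and the helpers only accept the whole-number z-scores the rule covers.

```python
# The three memorized numbers: central area (in %) within ±k standard deviations.
RULE = {1: 68.0, 2: 95.0, 3: 99.7}


def area_zero_to(k: int) -> float:
    """Area between z = 0 and z = k: half of the symmetric central area."""
    return RULE[abs(k)] / 2


def area_below(k: int) -> float:
    """Area to the left of z = k, starting from the 50% on each side of z = 0."""
    if k == 0:
        return 50.0
    if k > 0:
        return 50.0 + area_zero_to(k)
    return 50.0 - area_zero_to(k)


print(RULE[1])          # between -1 and +1
print(area_zero_to(1))  # between 0 and +1
print(area_below(1))    # below +1
```

Each function mirrors one step of the answers above: halve the central area, then add it to (or subtract it from) the 50% that lies on one side of z = 0.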

Examples: The 68-95-99.7% ‘Shortcut’ Rule for Normal Distributions


More examples:

• What percentage of observations lies between -2 and +1?

• Answer: Use your midline! I would solve this by adding the area between -2 and 0 (half of 95%) to the area between 0 and +1 (half of 68%): 47.5% + 34% = 81.5%.

• What percentage of observations lies between 0 and +3?

• Answer: Half of the area between -3 and +3 (99.7%), which is 49.85%.

• What percentage of observations lies below -2?

• Answer: While this too can be answered in a few different ways, I would like you to make sure you can do it this way:

– Look at the area between -2 and +2. Our ‘shortcut’ tells us that this contains 95% of observations.

– This means that the area above +2 and the area below -2 together comprise 5% of observations. Because the curve is symmetric, the area above +2 comprises 2.5% of observations, and the area below -2 also comprises 2.5% of observations.

– Answer: 2.5%

• What percentage of observations lies above +3?

• Answer: Use the same technique as was just discussed:

– Between -3 and +3 makes up 99.7%.

– Therefore below -3 and above +3 together make up 0.3%.

– Therefore below -3 is 0.15% and above +3 is 0.15%.
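The tail technique used in the last two answers generalizes to any pair of whole-number z-scores. The sketch below is one way to express it; the function names are hypothetical, and results are rounded when printed because 99.7 is not exactly representable in floating point.

```python
# Central area (in %) within ±k standard deviations, from the 68-95-99.7 rule.
RULE = {1: 68.0, 2: 95.0, 3: 99.7}


def tail_beyond(k: int) -> float:
    """Area in ONE tail beyond k SDs: half of what the central area leaves out."""
    return (100.0 - RULE[abs(k)]) / 2


def area_below(k: int) -> float:
    """Area to the left of z = k (k must be a whole number from -3 to +3)."""
    if k == 0:
        return 50.0
    return tail_beyond(k) if k < 0 else 100.0 - tail_beyond(k)


def area_between(a: int, b: int) -> float:
    """Area between z = a and z = b, with a < b."""
    return area_below(b) - area_below(a)


print(round(area_between(-2, 1), 2))  # between -2 and +1
print(round(area_between(0, 3), 2))   # between 0 and +3
print(round(area_below(-2), 2))       # below -2
print(round(tail_beyond(3), 2))       # above +3
```

Subtracting two "area below" values is the same midline idea in a different form: both routes give 81.5% for the area between -2 and +1.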


One more!

• What percentage of observations lies below +2 standard deviations?

• Answer: Repeat the process from before to determine the area in each tail beyond -2 and +2. That value was 2.5%. If 2.5% of values lie above +2, then 97.5% of observations lie below it.

• Answer: 97.5%


mean µ = 64.5, standard deviation σ = 2.5

N(µ, σ) = N(64.5, 2.5)


• What percentage of women are between 62 and 67 inches tall?

• Answer: This corresponds to -1 to +1 SDs, that is, about 68%.

• What is the range of heights between which about 95% of women fall?

• Answer: About -2 to +2 SDs, so about 59.5 to 69.5 inches tall.

• What is the range of heights between which nearly all (over 99%) of women fall?

• Answer: A quick answer would simply be to pick the -3 to +3 SD range (57 to 72 inches).
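Turning a z-score range back into heights is just µ ± k·σ. A minimal sketch, using the mean and standard deviation from the slide; the names `MU`, `SIGMA` and `height_range` are made up for illustration.

```python
MU = 64.5    # mean height in inches (from the slide)
SIGMA = 2.5  # standard deviation in inches (from the slide)

# Central area (in %) within ±k standard deviations, from the rule.
RULE = {1: 68.0, 2: 95.0, 3: 99.7}


def height_range(k: int):
    """Heights lying k standard deviations below and above the mean."""
    return (MU - k * SIGMA, MU + k * SIGMA)


for k, pct in RULE.items():
    low, high = height_range(k)
    print(f"about {pct}% of women are between {low} and {high} inches tall")
```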


mean µ = 64.5, standard deviation σ = 2.5

N(µ, σ) = N(64.5, 2.5)

More Examples:

• What percentage are taller than 67 inches?

• Answer: If 68% of all women are between 62 and 67 inches tall, this means that 32% are outside of that range. In other words, 16% are shorter than 62 inches, and 16% are taller than 67 inches.

• What percentage are shorter than 59.5 inches?

• Answer: If 95% of all women are between 59.5 and 69.5 inches tall, then 5% are outside of that range. In other words, 2.5% are shorter than 59.5 inches and 2.5% are taller than 69.5 inches.
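These two answers can be compared against the exact areas under N(64.5, 2.5). The sketch assumes Python 3.8+ and the standard library's `statistics.NormalDist`; the variable and function names are illustrative only.

```python
from statistics import NormalDist

# Women's heights in inches, using the parameters from the slide.
heights = NormalDist(mu=64.5, sigma=2.5)


def z_score(x: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - heights.mean) / heights.stdev


# Exact tail areas, as percentages.
taller_than_67 = 100 * (1 - heights.cdf(67))
shorter_than_59_5 = 100 * heights.cdf(59.5)

print(f"67 inches is z = {z_score(67):+.1f}; {taller_than_67:.1f}% are taller")
print(f"59.5 inches is z = {z_score(59.5):+.1f}; {shorter_than_59_5:.1f}% are shorter")
```

The exact tails come out slightly below the shortcut's 16% and 2.5%, because 68 and 95 are themselves rounded.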


Shortcut Rule or Z-Table?

• Students have often been confused as to which should be used.

• Whenever possible, use your z-table, as you will get a much more accurate result. In particular, if you are given z-scores that are not anywhere near whole numbers (e.g. 2.332), then there is no shortcut to use! The shortcut can only be used with whole (integer) numbers between -3 and +3.

• The main purpose of learning the ‘shortcut’ rule (in addition to the fact that it comes up on all kinds of exams) is to encourage you to develop an understanding of what you are trying to do rather than just jumping to calculators and z-tables.

• For this course, you will be asked to do both.
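To see why a z-score such as 2.332 needs the table (or software) rather than the shortcut, here is a small sketch. It assumes Python 3.8+ and uses `statistics.NormalDist` in place of a printed z-table.

```python
from statistics import NormalDist

z = 2.332  # a z-score that is nowhere near a whole number

# Area to the left of z under the standard Normal curve, as a percentage.
area_below = 100 * NormalDist().cdf(z)
area_above = 100 - area_below

print(f"area below z = {z}: {area_below:.2f}%")
print(f"area above z = {z}: {area_above:.2f}%")
```

The shortcut only says that the area below this z-score lies somewhere between 97.5% (z = +2) and 99.85% (z = +3); the table lookup pins it down.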


Is the distribution truly Normal?

• Deciding whether data does indeed show a Normal (or close to Normal) distribution is a very important question.

• All the examples we’ve been discussing involving z-scores assume that the data is Normal. If the distribution of the data were not Normal, all of our answers and calculations would be flawed.

• Recall that there are many other types of distributions that are not Normal. Some examples include skewed, bimodal, Binomial (later in the quarter), Poisson, etc.

• Each type of distribution has its own characteristic formulas, calculations, inference techniques, etc. Again, because the Normal distribution is one of the most commonly encountered distributions, we will spend lots of time discussing it.

• So how do you decide if a distribution is Normal?

– You might be tempted to say “look at a graph”. And this is not entirely false: when examining data, a chart is a great (often the best) place to start!

– However, as humans, we are easily fooled. There are many histograms (and related density curves) that look Normal but, in fact, are not.

– Fortunately, we do have a statistical test that can support (though not guarantee) that our dataset does indeed appear to be Normal.


The Normal quantile plot is a graph that helps us determine if a distribution is indeed Normal.

It is a mathematical plot that we can create using our statistical software package of choice.

Here is the method (which is provided for interest only):

1. The data points are ranked and the percentile ranks are converted to z-scores with Table A. The z-scores are then used for the x axis, against which the data are plotted on the y axis of the Normal quantile plot.

2. If the distribution is indeed Normal, the plot will show a fairly straight line, indicating a good match between the data and a Normal distribution.

3. Systematic deviations from a straight line indicate a non-Normal distribution. Outliers appear as points that are far away from the overall pattern of the plot.
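Step 1 of the method can be sketched in a few lines. Several things here are assumptions rather than part of the slides: the percentile rank is taken as (rank - 0.5) / n, which is one common convention among several; `NormalDist().inv_cdf` from the Python 3.8+ standard library stands in for the Table A lookup; and the sample values are invented purely for illustration.

```python
from statistics import NormalDist


def normal_quantile_points(data):
    """Return (z-score, data value) pairs for a Normal quantile plot."""
    ordered = sorted(data)  # rank the data points
    n = len(ordered)
    standard_normal = NormalDist()
    points = []
    for rank, value in enumerate(ordered, start=1):
        # Assumed convention for the percentile rank; it keeps 0 < p < 1.
        percentile = (rank - 0.5) / n
        # Convert the percentile rank to a z-score (the Table A step).
        z = standard_normal.inv_cdf(percentile)
        points.append((z, value))
    return points


# Hypothetical sample, made up for this example only.
sample = [61.0, 62.5, 63.0, 64.0, 64.5, 65.0, 66.0, 66.5, 68.0]
points = normal_quantile_points(sample)

for z, value in points:
    print(f"x = {z:+.2f}, y = {value}")
```

Plotting these pairs with the z-scores on the x axis and the data on the y axis gives the Normal quantile plot; steps 2 and 3 are then a matter of judging how straight the pattern looks.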

Normal Quantile Plot


Normal quantile plots are complex to do by hand, but they are standard features in most statistical software.

The Normal quantile plot shows a good fit to a straight line: the distribution of rainwater pH values is close to Normal.

The Normal quantile plot is not a straight line. This tells us that the data do not follow a Normal distribution.


The Normal quantile test supports Normality, but does NOT guarantee it!

• Two key points here:

– If the plot is NOT straight, then your data is NOT Normal!

– If the plot IS straight, then you have supported the idea that your dataset is Normal. However, you have NOT guaranteed it!

• This concept (confirming / supportive tests) will come up with various other statistical concepts down the road. Whenever you encounter them, you should be sure to make use of them.
