Chapter 3, part C
III. Uses of means and standard deviations
Of course, we don’t calculate measures of location and dispersion just because we can; they have very important uses.
A. Z-scores
• A z-score measures the relative location of an item in the data set.
• It also measures the number of standard deviations an observation lies from the mean.
$z_i = \frac{x_i - \bar{x}}{s}$
For example, the airline price of $175 has a z-score of (175 - 219)/45.47 = -0.97. This means that a price of $175 falls almost one standard deviation below the mean.
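The airline calculation can be reproduced in a few lines of Python. This is only a minimal sketch: the function name `z_score` and its parameter names are illustrative choices, and the mean (219) and standard deviation (45.47) are taken directly from the example above.

```python
def z_score(x, mean, std_dev):
    """Return how many standard deviations x lies from the mean."""
    return (x - mean) / std_dev

# Values from the airline price example in the text.
z = z_score(175, 219, 45.47)
print(round(z, 2))  # -0.97
```

A negative result means the observation lies below the mean; a positive result means it lies above.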
B. Chebyshev’s Theorem
Chebyshev’s theorem: At least $(1 - 1/k^2)$ of the items in a data set must be within k standard deviations of the mean, where k is any value greater than 1.
In other words, the theorem tells us the minimum percentage of items that must be within a specified number of standard deviations of the mean.
Implications
If k=2, at least 75% of the data lie within 2 standard deviations of the mean.
How? (1 - 1/4) = 0.75, or 75%.
If k=3, this fraction rises to at least 89% of the data, since (1 - 1/9) ≈ 0.89.
If k=4, this fraction rises to at least 94% of the data, since (1 - 1/16) ≈ 0.94.
Example: A microeconomics exam has a mean of 72 with a standard deviation of 4. At least what percentage of the class scored between 64 and 80 on the exam?
Calculate the z-scores for both 64 and 80 to find k and then use Chebyshev’s theorem to answer the question.
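The steps for the exam example might be sketched as follows. The helper name `chebyshev_lower_bound` is a hypothetical one chosen for illustration, and taking the smaller of the two absolute z-scores as k is an assumption that only matters when the interval is not symmetric around the mean.

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2

# Exam figures from the example in the text.
mean, std_dev = 72, 4
low, high = 64, 80

# z-scores of the two endpoints.
z_low = (low - mean) / std_dev
z_high = (high - mean) / std_dev

# Both endpoints lie the same distance from the mean here, so k is that distance.
k = min(abs(z_low), abs(z_high))
bound = chebyshev_lower_bound(k)
print(f"k = {k}, at least {bound:.0%} of scores lie between {low} and {high}")
```

Both endpoints are 2 standard deviations from the mean, so k = 2 and at least 75% of the class scored between 64 and 80.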
C. The Empirical Rule
If the data are distributed normally (bell-shaped), the empirical rule tells us that:
• Approximately 68% of the data will be within 1 standard deviation of the mean.
• Approximately 95% of the data will be within 2 standard deviations of the mean.
• Almost all of the data will be within 3 standard deviations of the mean.
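The percentages in the empirical rule can be checked against the normal distribution itself. The sketch below uses the standard identity that, for a normal variable, the probability of falling within k standard deviations of the mean equals erf(k/√2). The function name `normal_fraction_within` is an illustrative choice, not something from the slides.

```python
import math

def normal_fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {normal_fraction_within(k):.1%}")
```

The results come out to roughly 68%, 95%, and 99.7%. These are higher than the Chebyshev minimums for the same k, because the empirical rule assumes a bell-shaped distribution while Chebyshev’s theorem applies to any data set.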
D. Detecting Outliers
• The empirical rule says that almost all observations will fall within 3 standard deviations of the mean.
• Thus, if an observation has a z-score greater than 3 in absolute value, it may be considered an outlier.
• What to do about an outlier? If it’s a case of an erroneous value (e.g. a typo), try to correct it. If it’s valid data, arguments can be made (for and against) dropping it from the sample.
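The z-score screening rule above could be sketched as follows. The data are made up purely for illustration (they do not come from the slides), and `find_outliers` is a hypothetical helper name. The sketch assumes the sample standard deviation is the appropriate measure of spread.

```python
import statistics

def find_outliers(data, threshold=3.0):
    """Return (value, z-score) pairs whose |z-score| exceeds the threshold."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation
    flagged = []
    for x in data:
        z = (x - mean) / s
        if abs(z) > threshold:
            flagged.append((x, z))
    return flagged

# Hypothetical data: nineteen values near 50 and one suspicious entry of 120.
scores = [48, 52, 50, 49, 51, 47, 53, 50, 49, 51,
          50, 48, 52, 50, 49, 51, 50, 52, 48, 120]

outliers = find_outliers(scores)
for value, z in outliers:
    print(f"{value} has z = {z:.2f}")
```

One caveat with this approach: an extreme value inflates both the mean and the standard deviation, so in very small samples no observation can reach a z-score of 3. The function only flags candidates; deciding whether to correct, keep, or drop them remains a judgment call.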