Chapter 3, part C

III. Uses of means and standard deviations

Of course, we don’t calculate measures of location and dispersion just because we can; they have very important uses.

A. Z-scores

• A z-score measures the relative location of an item in the data set.

• It also measures the number of standard deviations an observation lies from the mean.

$z_i = \dfrac{x_i - \bar{x}}{s}$

For example, the airline price of $175 has a z-score of (175 - 219)/45.47 = -0.97. This means that a price of $175 falls almost one standard deviation below the mean.
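The airline calculation can be reproduced with a few lines of Python. This is a minimal sketch; the function name `z_score` is chosen here for illustration, and the mean (219) and standard deviation (45.47) are the values quoted in the example.

```python
def z_score(x, mean, std):
    """Return how many standard deviations x lies from the mean."""
    if std <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / std


# Airline price example from the text: mean 219, standard deviation 45.47
z = z_score(175, 219, 45.47)
print(round(z, 2))  # -0.97
```

A negative result means the observation lies below the mean; a positive one means it lies above.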

B. Chebyshev’s Theorem

Chebyshev’s theorem: At least $(1 - 1/k^2)$ of the items in any data set must be within k standard deviations of the mean, where k is any value greater than 1.

In other words, the theorem gives the minimum percentage of items that must lie within a specified number of standard deviations of the mean.

Implications

If k=2, at least 75% of the data lie within 2 standard deviations of the mean.

How? 1 - 1/2^2 = 1 - 1/4 = 0.75, or 75%.

If k=3, this minimum rises to about 89% of the data.

If k=4, this minimum rises to about 94% of the data.
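The three bounds above come straight from the formula. A short sketch, with the helper name `chebyshev_bound` being an illustrative choice rather than anything from the slides:

```python
def chebyshev_bound(k):
    """Minimum fraction of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2


for k in (2, 3, 4):
    print(k, round(chebyshev_bound(k), 4))
# 2 0.75
# 3 0.8889
# 4 0.9375
```

These are lower bounds that hold for any distribution; the actual share of data inside the interval can be larger.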

Example: A microeconomics exam has a mean of 72 with a standard deviation of 4. At least what percentage of the class scored between 64 and 80 on the exam?

Calculate the z-scores for both 64 and 80 to find k and then use Chebyshev’s theorem to answer the question.
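One way to work through the exam example is sketched below. It assumes nothing beyond the numbers given (mean 72, standard deviation 4, interval 64 to 80); the variable names are illustrative.

```python
mean, std = 72, 4
low, high = 64, 80

z_low = (low - mean) / std    # -2.0
z_high = (high - mean) / std  # 2.0

# The interval is symmetric about the mean, so k is the common absolute z-score.
k = abs(z_low)
assert k == abs(z_high)

min_fraction = 1 - 1 / k**2
print(f"At least {min_fraction:.0%} of the class scored between {low} and {high}")
# At least 75% of the class scored between 64 and 80
```

Because both endpoints sit exactly 2 standard deviations from the mean, k = 2 and Chebyshev’s theorem guarantees at least 75%.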

C. The Empirical Rule

If the data are distributed normally (bell-shaped), the empirical rule tells us that:

• Approximately 68% of the data will be within 1 standard deviation of the mean.

• Approximately 95% of the data will be within 2 standard deviations of the mean.

• Almost all of the data (about 99.7%) will be within 3 standard deviations of the mean.
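The percentages in the empirical rule can be checked against the normal distribution itself. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `share_within` is an illustrative choice.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

Compare these with Chebyshev’s bounds for the same k: the empirical rule gives much tighter figures, but only when the data are approximately bell-shaped.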

D. Detecting Outliers

• The empirical rule says that almost all observations will fall within 3 standard deviations of the mean.

• Thus, if an observation has a z-score greater than 3 in absolute value, it may be considered an outlier.

• What to do about an outlier? If it is an erroneous value (e.g., a typo), try to correct it. If it is valid data, arguments can be made both for and against dropping it from the sample.
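The z-score rule for outliers can be sketched as follows. The exam scores are made-up illustration data, not taken from the slides, and `flag_outliers` is a hypothetical helper name. The sample standard deviation (`statistics.stdev`) is assumed.

```python
from statistics import mean, stdev


def flag_outliers(data, threshold=3.0):
    """Return the observations whose absolute z-score exceeds the threshold."""
    m = mean(data)
    s = stdev(data)  # sample standard deviation
    return [x for x in data if abs((x - m) / s) > threshold]


# Hypothetical scores: 19 values near 72 plus one suspicious entry of 120
scores = [70, 72, 74, 68, 76, 71, 73, 69, 75, 72,
          70, 74, 72, 71, 73, 68, 76, 72, 70, 120]

print(flag_outliers(scores))  # [120]
```

The function only flags candidates; whether a flagged value is a typo to correct or valid data to keep remains a judgment call, as noted above.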