
Module 6 – Descriptive Statistics


© 2002 The Regents of The University of Michigan. All Rights Reserved. No duplication without prior, written consent.

Quantitative Skills Workshop: Module 6 – Descriptive Statistics

Statistical Sampling

• Use a sample of data or observations to infer “the truth” about the entire population.

[Diagram: Population → gather sample → Sample → discover facts about sample → Statistical Inference → infer facts about population]

Sample of International Cartels

• Throughout this module we will be working with a data set of industries in which firms have been convicted, either in the United States or European Union, of international price-fixing conspiracies.

• In order to be included in this data set, the conspiracy had to occur during the 1990s.

• Why is this a sample and not the population of international cartels?


Discover Facts About the Sample: Knowing What Questions To Ask

• Excerpt from a speech on international price-fixing conspiracies prosecuted by the U.S. Department of Justice:

– “In July, our…Office…indicted three executives for their participation in the sorbates price-fixing and volume-allocation conspiracy. Sorbates are chemical preservatives used as mold inhibitors in high-moisture and high-sugar food products. This cartel once again proves that antitrust conspiracies are not unstable or ineffectual. The cartel lasted eighteen years and affected almost $1 billion in U.S. commerce.” (emphasis added)

• What would you like to know about cartels and their duration in order to assess the validity of this statement?

Cartel Data Sample: 1990s International Price-Fixing Conspiracies

Industry                   Duration (years)
Aluminum Phosphide          1
Bromine                     3
Carbon Cathode Block        2
Cartonboard                 5
Cement                     11
Citric Acid                 4
Ferrosilicon                2
Fine Arts                   6
Graphite Electrodes         5
Isostatic Graphite          9
Laminated Plastic Tubes     9
Lysine                      3
Maltol                      6
Plastic Dinnerware          1
Ship Construction           4
Ship Transportation         5
Sodium Gluconate            2
Sorbates                   17
Stainless Steel             1
Steel Beam                  6
Steel Heating Pipes         4
Steel Tube                  5
Sugar                       4
Thermal Fax Paper           1
Vitamins                    9


Measures of Central Tendency: Summarizing a Set of Data

• One number can capture what is “typical” of the data

• Mean = Average = (Sum of Values)/(Number of Values)

• Median = middle or central value when data are sorted from low to high

• Mode = most common value(s) in the data
  – Note that a data set may have more than one mode

Measures of Central Tendency: Mean

Let’s calculate these descriptive statistics for the cartel duration data set.

Mean = (Sum of cartel durations)/(# of cartels) = 125/25 = 5

So, the average cartel in this sample lasted for 5 years.

Median = ?

Mode = ?

To find these, we have to rearrange the data (see next slide).
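The mean calculation above can be reproduced in a few lines of Python. This is a minimal sketch; the list name `durations` is an assumption made for illustration, and the values are transcribed from the cartel data sample table.

```python
# Cartel durations in years, transcribed from the cartel data sample table.
durations = [1, 3, 2, 5, 11, 4, 2, 6, 5, 9, 9, 3, 6,
             1, 4, 5, 2, 17, 1, 6, 4, 5, 4, 1, 9]

total = sum(durations)   # sum of cartel durations
count = len(durations)   # number of cartels
mean = total / count     # mean = sum of values / number of values

print(total, count, mean)  # 125 25 5.0
```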


Measures of Central Tendency: Median

Data sorted from low to high:

Industry                   Duration (years)
Aluminum Phosphide          1
Plastic Dinnerware          1
Stainless Steel             1
Thermal Fax Paper           1
Carbon Cathode Block        2
Ferrosilicon                2
Sodium Gluconate            2
Bromine                     3
Lysine                      3
Citric Acid                 4
Ship Construction           4
Steel Heating Pipes         4
Sugar                       4   ← middle of the sample (13th value)
Graphite Electrodes         5
Steel Tube                  5
Ship Transportation         5
Cartonboard                 5
Fine Arts                   6
Maltol                      6
Steel Beam                  6
Isostatic Graphite          9
Laminated Plastic Tubes     9
Vitamins                    9
Cement                     11
Sorbates                   17

25 observations, so the middle position is (25 + 1)/2 = 13. Look for the 13th number in the sorted series. Median = 4.
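The sort-and-pick-the-middle procedure can be sketched in Python as follows. The function name `median` and the handling of an even number of observations (averaging the two middle values, a common convention) are assumptions added for illustration; the slide itself only covers the odd case.

```python
def median(values):
    """Return the middle value of the data once sorted from low to high."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2  # zero-based index of the middle value when n is odd
    if n % 2 == 1:
        return ordered[mid]
    # Assumed convention for an even count: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


# Cartel durations in years, transcribed from the cartel data sample table.
durations = [1, 3, 2, 5, 11, 4, 2, 6, 5, 9, 9, 3, 6,
             1, 4, 5, 2, 17, 1, 6, 4, 5, 4, 1, 9]

print(median(durations))  # 4, the 13th of the 25 sorted values
```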

Measures of Central Tendency: Mode

Number of cartels at each duration, counted from the sorted list:

Duration (years)   Number of cartels
 1                 4
 2                 3
 3                 2
 4                 4
 5                 4
 6                 3
 9                 3
11                 1
17                 1

Mode = most common value: 3-way tie = 1 year, 4 years, 5 years
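Counting how often each duration occurs, and then picking the value or values with the highest count, can be sketched with Python's `collections.Counter`. The variable names are illustrative assumptions.

```python
from collections import Counter

# Cartel durations in years, transcribed from the cartel data sample table.
durations = [1, 3, 2, 5, 11, 4, 2, 6, 5, 9, 9, 3, 6,
             1, 4, 5, 2, 17, 1, 6, 4, 5, 4, 1, 9]

counts = Counter(durations)      # maps each duration to its number of cartels
highest = max(counts.values())   # the largest frequency in the data

# A data set may have more than one mode, so collect every tied value.
modes = sorted(value for value, c in counts.items() if c == highest)

print(highest, modes)  # 4 [1, 4, 5]
```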


Measures of Central Tendency: Exercise

• Given the following data, find and interpret the mean and median:

Percentage change in real GDP

Period     US     Japan    EU
2000a      5.9     3.6     3.6
2000b      2.7    -0.2     2.7
2001a      1.2     2.2     2.6
2001b      1.9     0.0     2.5
2002a      3.3     1.2     2.7
Mean
Median

Source: OECD Economic Outlook, No. 69, June 2001, “Summary of Projection” (seasonally adjusted at annual rates, where “a” refers to the first six months of the year and “b” refers to the second six months).

Measures of Central Tendency – Exercise: Answers

• Given the following data, find and interpret the mean and median:

Percentage change in real GDP

Period     US     Japan    EU
2000a      5.9     3.6     3.6
2000b      2.7    -0.2     2.7
2001a      1.2     2.2     2.6
2001b      1.9     0.0     2.5
2002a      3.3     1.2     2.7
Mean       3.0     1.36    2.82
Median     2.7     1.2     2.7

Detailed calculations are shown on the following slides.


Measures of Central Tendency – Exercise: Answers

• Mean %∆GDP for US = (5.9 + 2.7 + 1.2 + 1.9 + 3.3)/5 = 3.0

• Mean %∆GDP for Japan = (3.6 – 0.2 + 2.2 + 0 + 1.2)/5 = 1.36

• Mean %∆GDP for EU = (3.6 + 2.7 + 2.6 + 2.5 + 2.7)/5 = 2.82

Measures of Central Tendency – Exercise: Answers

• Median %∆GDP for US: Rank from low to high and find the middle value (shown in brackets)…

1.2, 1.9, [2.7], 3.3, 5.9

• Median %∆GDP for Japan:

-0.2, 0, [1.2], 2.2, 3.6

• Median %∆GDP for EU:

2.5, 2.6, [2.7], 2.7, 3.6
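The exercise answers can be checked with Python's standard `statistics` module. This is a sketch only; the dictionary layout and names are assumptions, and means are rounded to two decimals to match the slides.

```python
from statistics import mean, median

# Percentage change in real GDP for 2000a through 2002a, from the exercise table.
gdp = {
    "US":    [5.9, 2.7, 1.2, 1.9, 3.3],
    "Japan": [3.6, -0.2, 2.2, 0.0, 1.2],
    "EU":    [3.6, 2.7, 2.6, 2.5, 2.7],
}

# For each region, store (mean rounded to 2 decimals, median).
summary = {region: (round(mean(values), 2), median(values))
           for region, values in gdp.items()}

for region, (avg, mid) in summary.items():
    print(region, avg, mid)
```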


Measures of Dispersion in the Data: Variance

• You can’t describe everything about a data set with one summary number. You need more…

• The range measures how spread out the data are.
  – Cartel duration ranges from 1 year to 17 years in our sample.

• The variance measures how tightly clustered the observations are around the mean. A small variance implies the observations lie in a narrow range, while a large variance implies that the observations are spread out.

Measures of Dispersion in the Data: Variance

Example: 6 months of Stock Returns (%)

         Stock A    Stock B
          0.35       1.75
          0.50       4.50
          1.25      10.00
         -1.00      -8.00
          3.20      -3.00
          1.70       0.75
Mean      1.00       1.00

But Stock B is clearly more volatile than Stock A. The mean alone cannot describe investment return. Variance is often used to measure the risk of a stock or stock portfolio, since it describes dispersion of returns around the mean.
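To make the contrast between the two stocks concrete, the sketch below computes the mean and the sample variance of each return series with Python's `statistics` module. The slide reports only the means; the variances here are computed from the listed returns and are not figures from the original deck.

```python
from statistics import mean, variance

# Six months of returns (%), from the stock example table.
stock_a = [0.35, 0.50, 1.25, -1.00, 3.20, 1.70]
stock_b = [1.75, 4.50, 10.00, -8.00, -3.00, 0.75]

mean_a, mean_b = mean(stock_a), mean(stock_b)
# statistics.variance divides by n - 1, i.e. it is the sample variance.
var_a, var_b = variance(stock_a), variance(stock_b)

print(round(mean_a, 2), round(mean_b, 2))  # both means are 1.0
print(round(var_a, 2), round(var_b, 2))    # Stock B's variance is far larger
```

Both series share the same mean, so only the variance reveals that Stock B's returns swing much more widely.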


Measures of Dispersion in the Data: Variance

Computing the Sample Variance

• Find the mean of the data.

• Find the deviation of each value from the mean.

• Square the deviations.
  – Why? Recall that the mean duration is 5 years. Consider two cartels:
    » Stainless Steel: 1 year duration
    » Laminated Plastic Tubes: 9 year duration
    » If we just looked at the difference from the mean, we would have (1 - 5) = -4 and (9 - 5) = 4. The total deviation would be -4 + 4 = 0, which wrongly suggests that there is no spread at all. So we square the deviations before adding, e.g., (-4)² + 4² = 32.

Measures of Dispersion in the Data: Variance

Computing the Sample Variance

• Find the mean of the data.

• Find the deviation of each value from the mean.

• Square the deviations.

• Sum the squared deviations.

• Divide the sum by n-1.
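The five steps above can be sketched as a small Python function. The function name `sample_variance` is an assumption for illustration; it is applied here to the cartel durations transcribed from the data sample table.

```python
def sample_variance(values):
    """Sample variance computed step by step, dividing by n - 1."""
    n = len(values)
    mean = sum(values) / n                    # 1. find the mean of the data
    deviations = [x - mean for x in values]   # 2. deviation of each value from the mean
    squared = [d ** 2 for d in deviations]    # 3. square the deviations
    total = sum(squared)                      # 4. sum the squared deviations
    return total / (n - 1)                    # 5. divide the sum by n - 1


# Cartel durations in years, transcribed from the cartel data sample table.
durations = [1, 3, 2, 5, 11, 4, 2, 6, 5, 9, 9, 3, 6,
             1, 4, 5, 2, 17, 1, 6, 4, 5, 4, 1, 9]

print(round(sample_variance(durations), 2))  # 13.92
```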


Measures of Dispersion in the Data: Variance

Cartel data (using the original table):

$$\text{Var} = \frac{(1-5)^2 + (3-5)^2 + (2-5)^2 + \dots + (4-5)^2 + (1-5)^2 + (9-5)^2}{25-1} = 13.92$$

Industry                   Duration
Aluminum Phosphide          1
Bromine                     3
Carbon Cathode Block        2
…
Sugar                       4
Thermal Fax Paper           1
Vitamins                    9

Measures of Dispersion in the Data: Standard Deviation

• How do we interpret the variance of 13.92?
  » Not easily, since it has units of (years)²

• Use standard deviation instead
  » Standard Deviation = Square root of Variance
  » It also measures how spread out the data are around their mean, but the units are the same as the original units of the data.

Standard Deviation = (13.92)^(1/2) = 3.73

  » Since the duration data are in years, one standard deviation from the mean is 3.73 years.
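As a quick check, the standard deviation can be obtained either by taking the square root of the reported variance or directly from the data with `statistics.stdev`, which also divides by n - 1. The variable names below are illustrative assumptions.

```python
import math
from statistics import stdev

# Cartel durations in years, transcribed from the cartel data sample table.
durations = [1, 3, 2, 5, 11, 4, 2, 6, 5, 9, 9, 3, 6,
             1, 4, 5, 2, 17, 1, 6, 4, 5, 4, 1, 9]

variance = 13.92                          # sample variance reported on the slide, in years squared
std_from_variance = math.sqrt(variance)   # back in the original units (years)
std_from_data = stdev(durations)          # sample standard deviation from the raw data

print(round(std_from_variance, 2))  # 3.73
print(round(std_from_data, 2))      # 3.73
```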


Measures of Dispersion: Exercise

• Given the following data, find the sample variance and standard deviation:

Percentage change in real GDP

Period     US     Japan    EU
2000a      5.9     3.6     3.6
2000b      2.7    -0.2     2.7
2001a      1.2     2.2     2.6
2001b      1.9     0.0     2.5
2002a      3.3     1.2     2.7
Variance
Std Dev

Measures of Dispersion – Exercise: Answers

• Given the following data, find the sample variance and standard deviation:

Percentage change in real GDP

Period     US     Japan    EU
2000a      5.9     3.6     3.6
2000b      2.7    -0.2     2.7
2001a      1.2     2.2     2.6
2001b      1.9     0.0     2.5
2002a      3.3     1.2     2.7
Variance   3.26    2.51    0.20
Std Dev    1.81    1.58    0.45

Detailed calculations are shown on the following slides.


Measures of Dispersion – Exercise: Answers

• Variance: Find the mean of the data. Find the deviation of each value from the mean. Square the deviations. Sum the squared deviations. Divide the sum by n-1.

• US: Mean = 3.0

Period    U.S.    Subtract Mean       Square
2000a     5.9     5.9 – 3 =  2.9      8.41
2000b     2.7     2.7 – 3 = -0.3      0.09
2001a     1.2     1.2 – 3 = -1.8      3.24
2001b     1.9     1.9 – 3 = -1.1      1.21
2002a     3.3     3.3 – 3 =  0.3      0.09
                              Sum =  13.04

Variance = 13.04/(5-1) = 3.26 ⇒ Standard Deviation = 1.81

One standard deviation from the mean for the US is 1.81%.

Measures of Dispersion – Exercise: Answers

• Japan: Mean = 1.36

Period    Japan   Subtract Mean          Square
2000a      3.6     3.6 – 1.36 =  2.24    5.02
2000b     -0.2    -0.2 – 1.36 = -1.56    2.43
2001a      2.2     2.2 – 1.36 =  0.84    0.71
2001b      0.0     0.0 – 1.36 = -1.36    1.85
2002a      1.2     1.2 – 1.36 = -0.16    0.03
                                 Sum =  10.04

Variance = 10.04/(5-1) = 2.51 ⇒ Standard Deviation = 1.58

One standard deviation from the mean for Japan is 1.58%.


Measures of Dispersion – Exercise: Answers

• EU: Mean = 2.82

Period    EU      Subtract Mean          Square
2000a     3.6     3.6 – 2.82 =  0.78     0.61
2000b     2.7     2.7 – 2.82 = -0.12     0.01
2001a     2.6     2.6 – 2.82 = -0.22     0.05
2001b     2.5     2.5 – 2.82 = -0.32     0.10
2002a     2.7     2.7 – 2.82 = -0.12     0.01
                                 Sum =   0.78

Variance = 0.78/(5-1) = 0.20 ⇒ Standard Deviation = 0.45

This is the smallest standard deviation of the three, indicating the most stable (least volatile) GDP growth over this period.
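The three worked answers can be cross-checked with `statistics.variance` and `statistics.stdev`, both of which divide by n - 1. This is a sketch with assumed names. Note that the library works from unrounded intermediate values, so the last digit can differ from the slides: for the EU it gives a standard deviation of about 0.44, whereas the slide's 0.45 is the square root of the already rounded variance 0.20.

```python
from statistics import stdev, variance

# Percentage change in real GDP for 2000a through 2002a, from the exercise table.
gdp = {
    "US":    [5.9, 2.7, 1.2, 1.9, 3.3],
    "Japan": [3.6, -0.2, 2.2, 0.0, 1.2],
    "EU":    [3.6, 2.7, 2.6, 2.5, 2.7],
}

# For each region, store (sample variance, sample standard deviation), rounded.
dispersion = {region: (round(variance(values), 2), round(stdev(values), 2))
              for region, values in gdp.items()}

for region, (var, sd) in dispersion.items():
    print(region, var, sd)
```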

Normal Distribution

• If you have a large sample of observations and you count the frequency with which each observation occurs (e.g., 4 cartels that last one year, 3 cartels that last two years, and so on), you will often find that the frequency follows a normal or bell-shaped distribution.

• In a normal distribution, the mean, median, and mode are equal. The curve reaches its highest point at the mean and it is symmetric around the mean.

[Figure: bell-shaped curve of Frequency of X against Value of X, peaking at the Mean]


Normal Distribution

• A bell-shaped curve can change shape as the mean and variance change.

• The graph on the left has observations that are tightly clustered around the mean (small variance), while the graph on the right shows observations that are much more spread out (large variance).

[Figure: two bell-shaped curves centered on their means; left panel: Small Variance; right panel: Large Variance]

Normal Distribution – Empirical Rule: Interpreting Standard Deviation

• If you have data from a bell-shaped distribution, then
  – Roughly 68% of data should be within 1 standard deviation of the mean
  – Roughly 95% of data should be within 2 standard deviations of the mean
  – Almost all the data (99.7%) should be within 3 standard deviations of the mean

[Figure: bell-shaped curve centered on the Mean, with the interval from 2 standard deviations below the mean to 2 standard deviations above it marked]


Normal Distribution – Empirical Rule: Interpreting Standard Deviation

Data from international cartel sample

• Mean = 5, Std Dev = 3.73

• 5 +/- 3.73 = (1.27, 8.73)
  – 64% of our data is in this range (16 of 25 cartels)

• 5 +/- 2(3.73) = (-2.46, 12.46)
  – 96% of our data is in this range (24 of 25 cartels)

• 5 +/- 3(3.73) = (-6.19, 16.19)
  – 96% of our data is in this range (24 of 25 cartels)

So… the 17-year sorbates cartel is an outlier
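The counts behind these percentages can be reproduced with a short helper. This is a minimal sketch; the function name `share_within` is an assumption, and the mean (5) and standard deviation (3.73) are the rounded figures from the slides.

```python
def share_within(values, mean, std_dev, k):
    """Count and share of values within k standard deviations of the mean."""
    low, high = mean - k * std_dev, mean + k * std_dev
    inside = [x for x in values if low <= x <= high]
    return len(inside), len(inside) / len(values)


# Cartel durations in years, transcribed from the cartel data sample table.
durations = [1, 3, 2, 5, 11, 4, 2, 6, 5, 9, 9, 3, 6,
             1, 4, 5, 2, 17, 1, 6, 4, 5, 4, 1, 9]

within_1 = share_within(durations, 5, 3.73, 1)
within_2 = share_within(durations, 5, 3.73, 2)
within_3 = share_within(durations, 5, 3.73, 3)

print(within_1)  # (16, 0.64)
print(within_2)  # (24, 0.96)
print(within_3)  # (24, 0.96)

# Values outside 3 standard deviations of the mean.
outliers = [x for x in durations if abs(x - 5) > 3 * 3.73]
print(outliers)  # [17]
```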

Normal Distribution: Empirical Rule – Exercise

• Apply the empirical rule for standard deviation for a normal distribution to the following data and interpret your answer:

Percentage change in real GDP

Period     US
2000a      5.9
2000b      2.7
2001a      1.2
2001b      1.9
2002a      3.3
Mean       3.0
Std Dev    1.81


Normal Distribution: Empirical Rule – Exercise: Answers

• Apply the empirical rule for standard deviation for a normal distribution to the following data and interpret your answer:

Percentage change in real GDP

Period     US
2000a      5.9
2000b      2.7
2001a      1.2
2001b      1.9
2002a      3.3
Mean       3.0
Std Dev    1.81

• 3 +/- 1.81 = (1.19, 4.81)
  – 80% of the data is in this range (4 of 5)

• 3 +/- 2(1.81) = (-0.62, 6.62)
  – 100% of the data is in this range

Descriptive Sample Statistics: Summation Notation

• It is often convenient to use the summation symbol, Σ, as shorthand notation for summing a list of numbers.

• Given observations X1, X2, …, Xn where n is the number of observations (e.g., n = 25 in the case of the cartel data set), we can denote the sum of the X’s as:

$$\sum_{i=1}^{n} X_i = X_1 + X_2 + X_3 + \dots + X_n$$


Descriptive Sample Statistics: Summary Using Summation Notation

• Observations: X1, X2, …, Xn

• Sample mean:

$$\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n}$$

• Sample variance:

$$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$$

• Sample standard deviation:

$$S = \sqrt{S^2}$$

Note: For population variance, divide by N (the size of the population), rather than by n-1.
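The difference between dividing by n-1 and dividing by N can be seen with Python's `statistics` module, where `variance` is the sample version and `pvariance` is the population version. The sketch below treats the 25 cartel durations both ways purely for illustration; the module's slides treat them as a sample.

```python
from statistics import pvariance, variance

# Cartel durations in years, transcribed from the cartel data sample table.
durations = [1, 3, 2, 5, 11, 4, 2, 6, 5, 9, 9, 3, 6,
             1, 4, 5, 2, 17, 1, 6, 4, 5, 4, 1, 9]

sample_var = variance(durations)       # sum of squared deviations / (n - 1)
population_var = pvariance(durations)  # sum of squared deviations / N

print(round(sample_var, 2))      # 13.92
print(round(population_var, 2))  # 13.36
```

The sum of squared deviations is the same in both cases; only the divisor changes, so the sample variance is always slightly larger.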

Application: Reporting Physician Income Data in the News

• Every year, physicians belonging to the American Medical Association (AMA) fill out a Socioeconomic Monitoring Survey (SMS). The survey has several questions about physician income.

• The AMA historically provided copies of the income tables when requested by the media. This created a great deal of controversy.

• The problem was that news reporters were taking selected income information from the survey results and reporting it in a way that did not give the full picture of variations in physician income.


Application: Reporting Physician Income Data in the News

• According to an article in American Medical News (November 15, 1999), an AMA study of this problem concluded that “few [news] articles take into account the wide variation in earnings across specialties and the fact that many reporters use an average salary across the board instead of the median – the salary number at which half of the physicians make more and half make less – which [we] believe gives a truer picture of physician income.”

• The AMA pared down what it distributed to the media and provided only a 3-page fact sheet with relevant descriptive statistics. The strategy worked. Reporters used median income rather than the mean in their news coverage.

• One of the physicians involved in the revamping said “the idea of proactive news management is a good one.”
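To see why the median can give a truer picture than the mean when earnings vary widely, consider the sketch below. The income figures are entirely hypothetical, invented for illustration, and are not taken from the AMA survey.

```python
from statistics import mean, median

# Hypothetical annual incomes in thousands of dollars (illustrative only,
# not AMA survey data): six similar earners and one very high earner.
incomes = [120, 130, 140, 150, 160, 170, 900]

avg = mean(incomes)    # pulled upward by the single very high income
mid = median(incomes)  # half earn more than this and half earn less

print(round(avg, 1), mid)  # 252.9 150
```

In this made-up example the mean exceeds what six of the seven earners make, while the median stays close to the typical income.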