Descriptive Statistics PPT from the course
3. The Methodology of Descriptive Statistics
The purpose of a descriptive statistical investigation is to clarify a number of characteristics of a given variable, measured over time or as a cross section.
A descriptive statistical analysis consists of:
- Setting up a histogram (or a time series plot)
- Calculating descriptive statistics
  - Measures of location and position
- Inspection for outliers
- Classification of the distribution of the examined data set
The range of statistical techniques utilized have not provided us with anything more than we would have got by taking the [...] variables and looking at their graphs
Statistics EUS & Negot Chinese 1
In statistics, we consider the following types of data:
- Cross-section: many sectors/categories/regions at a given point in time
- Time series: one sector/category/region over a period of time, e.g. a year
- Panel: a combination of time series and cross section
- Census: statistics provided through a questionnaire
4. Histogram
A histogram displays the classification of a quantitative variable into intervals.
- The horizontal axis (x-axis) is the interval scale
- The vertical axis (y-axis) is used to display the frequency
Data set with 20 observations of incomes in 1,000 DKK:
9 6 12 10 13 15 16 14 14 16 17 16 24 21 22 18 19 18 20 17
Ranked:
6 9 10 12 13 14 14 15 16 16 16 17 17 18 18 19 20 21 22 24
How can the data set be divided into some efficient categories or groups?
Ad hoc method:
More mathematical approach: $2^k = n$, where $k$ is the number of categories and $n$ is the number of observations
Interval               Below 5   6 to 10   11 to 15   16 to 20   21 or more   Total
Frequency                    0         3          5          9            3      20
Relative frequency        0.00      0.15       0.25       0.45         0.15    1.00
Cumulative frequency      0.00      0.15       0.40       0.85         1.00

Interval               6 to 10.5   10.5 to 15   15 to 19.5   19.5 to 24   Total
Frequency                      3            5            8            4      20
Relative frequency          0.15         0.25         0.40         0.20    1.00
Cumulative frequency        0.15         0.40         0.80         1.00
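The first frequency table can be reproduced with a few lines of Python. This is only a sketch: the class limits are taken from the table, while the variable names and the use of `bisect` are choices made for the illustration.

```python
import math
from bisect import bisect_left
from itertools import accumulate

incomes = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
           17, 16, 24, 21, 22, 18, 19, 18, 20, 17]
n = len(incomes)

# Solving 2^k = n for k gives a guide to the number of categories.
k_exact = math.log2(n)

# Upper limits of the first four classes; the last class is open-ended.
labels = ["Below 5", "6 to 10", "11 to 15", "16 to 20", "21 or more"]
upper_limits = [5, 10, 15, 20]

frequency = [0] * len(labels)
for x in incomes:
    # bisect_left returns the index of the first class whose upper limit is >= x
    frequency[bisect_left(upper_limits, x)] += 1

relative = [f / n for f in frequency]
cumulative = list(accumulate(relative))

for row in zip(labels, frequency, relative, cumulative):
    print("%-12s %3d %6.2f %6.2f" % row)
```

For n = 20 the equation gives k of about 4.3, which is in line with the four or five classes used in the two tables.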
[Figure: histogram "Monthly Income". x-axis: Interval (1,000 DKK) with classes Under 5, 5 to 10, 11 to 15, 16 to 20, Over 20; y-axis: Frequency, 0 to 10.]
Construction of a Histogram by use of Excel
A Special Histogram

Age                      0 to 4   5 to 14   15 to 29   30 to 49   50 to 69   70 or more     Total
Persons, mill.           116.60    196.90     350.50     283.10     147.90        36.80   1131.90
Persons, %                10.30     17.40      30.97      25.01      13.07         3.25    100.00
Units of 5 years              1         2          3          4          4         [4]*        18
% per unit of 5 years     10.30      8.70      10.32       6.25       3.27         0.81
* = assumed

Using data from the first part of the table the following graph can be drawn:
[Figure: bar chart "Population China 1990". x-axis: Age (0 to 4, 5 to 14, 15 to 29, 30 to 49, 50 to 69, 70 or more); y-axis: Percent, 0 to 35.]
[Figure: histogram "Population China 1.7.1990". x-axis: age in steps of 5 years from 0 to 85 plus; y-axis: 0 to 12.]
Age, year    0     5     10    15    20    25    30    35    40    45    50    55    60    65    70    75    80    85    >85
Person, %    10.3  8.7   8.7   10.3  10.3  10.3  6.25  6.25  6.25  6.25  3.3   3.3   3.3   3.3   0.8   0.8   0.8   0.8   0.8
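The adjustment behind the "special" histogram is a division of each class percentage by the class width, measured in units of 5 years. A minimal sketch, using the figures of the table and the total as stated there (the variable names are illustrative):

```python
age_classes = ["0 to 4", "5 to 14", "15 to 29", "30 to 49", "50 to 69", "70 or more"]
persons_mill = [116.60, 196.90, 350.50, 283.10, 147.90, 36.80]
units_of_5_years = [1, 2, 3, 4, 4, 4]   # the last width is the assumed value [4]*
total_mill = 1131.90                    # total as stated in the table

# Share of each class in percent, then the share per 5-year unit.
percent = [100 * p / total_mill for p in persons_mill]
percent_per_unit = [pct / u for pct, u in zip(percent, units_of_5_years)]

for label, pct, per_unit in zip(age_classes, percent, percent_per_unit):
    print("%-11s %6.2f %6.2f" % (label, pct, per_unit))
```

Plotting `percent_per_unit` instead of `percent` keeps the area of each bar proportional to the number of persons even though the classes have different widths.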
5. Measures of Location
Most frequent or typical observation
- Sample mean (MB page 26)
- Modus or Mode (MB page 32)
- Median (MB page 31)
- Geometric Mean (MB page 57)
- Relation among the mean, mode and median
- Quartiles and Percentiles
The mean
Uses information from all observations

Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

From the example: $\bar{x} = \frac{317}{20} = 15.85$

Grouped data set: $\bar{x} = \frac{1}{n}\sum_{i=1}^{k} f_i x_i$
Example of Grouped data set on Grades
Exam in the course International Economics that was held in February 2011 at the BA-int study in Flensburg

Grades of passed (7-point DK scale)    2    4    7   10   12   Total
Frequency                             10   26   33   19    4      92

Grouped mean: $\bar{x} = \frac{2(10) + 4(26) + 7(33) + 10(19) + 12(4)}{92} = \frac{593}{92} = 6.45$

Modus or Mode
This is the most frequently observed value (highest frequency)
- Income data example: mode = 16
- Grouped data example: mode = 7
Median
The middlemost observation:
Median = 0.50(n + 1) ordered position
Income data: 0.50(20 + 1) = 10.5th ordered observation = 16
Example with grades: the 0.50(92 + 1) = 46.5th ordered observation = 7
An important measure because it is not sensitive to outliers
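The three measures of location can be checked with Python's standard `statistics` module. This is a sketch for illustration; the dictionary of grades and the variable names are not part of the slides.

```python
from statistics import mean, median, mode

incomes = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
           17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

income_mean = mean(incomes)
income_median = median(incomes)   # average of the 10th and 11th ordered values
income_mode = mode(incomes)

# Grouped data: grade -> frequency
grades = {2: 10, 4: 26, 7: 33, 10: 19, 12: 4}
n_grades = sum(grades.values())

grade_mean = sum(grade * f for grade, f in grades.items()) / n_grades
grade_mode = max(grades, key=grades.get)   # grade with the highest frequency

# Expand the grouped data into single observations to find the median.
expanded = sorted(grade for grade, f in grades.items() for _ in range(f))
grade_median = median(expanded)

print(income_mean, income_median, income_mode)
print(round(grade_mean, 2), grade_median, grade_mode)
```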
Data        6    9    10   12   13   14   14   15   16   16   16   17   17   18   18   19   20   21   22   24
Frequency   .05  .05  .05  .05  .05  .05  .05  .05  .05  .05  .05  .05  .05  .05  .05  .05  .05  .05  .05  .05
Cumulative  .05  .10  .15  .20  .25  .30  .35  .40  .45  .50  .55  .60  .65  .70  .75  .80  .85  .90  .95  1.00
Number      1    2    3    4    5    6    7    8    9    10   11   12   13   14   15   16   17   18   19   20
Sum function:
Dealing with symmetry
Summing up
- Symmetry: $M_o = M_d = \bar{x}$
- Skewed to the right: $M_o < M_d < \bar{x}$ (bulk of data left)
- Skewed to the left: $\bar{x} < M_d < M_o$ (bulk of data right)

Income data set: $\bar{x} = 15.85 < M_o = 16$ and $M_d = 16$, so the data is skewed to the left
Grade data set: $\bar{x} = 6.45 < M_o = 7$ and $M_d = 7$, so the data is skewed to the left
Quartiles and Percentiles
Quartile = q(n+1) ordered position
Percentile = p(n+1) ordered position

5-point summary:
- 1st decile is the 0.10-percentile
- Lower quartile is the 0.25-percentile (called Q1)
- Median is the 0.50-percentile
- Upper quartile is the 0.75-percentile (called Q3)
- 9th decile is the 0.90-percentile
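Example
- 10: (20+1)(10/100) = 2.10, the 2.10th observation appears at 9.10
- 25: (20+1)(25/100) = 5.25, the 5.25th observation appears at 13.75
- 50: (20+1)(50/100) = 10.50, the 10.50th observation appears at 16.00
- 75: (20+1)(75/100) = 15.75, the 15.75th observation appears at 18.25
- 90: (20+1)(90/100) = 18.90, the 18.90th observation appears at 21.90

Note: linear interpolation at the positions 5.25 and 15.75 gives 13.25 and 18.75. The quartiles 13.75 and 18.25 stated here correspond to interpolation at the positions (n-1)p + 1, i.e. 5.75 and 15.25, which is the rule used by Excel's QUARTILE function.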
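The two position rules can be compared directly. The sketch below interpolates linearly between neighbouring ordered observations; the function names are hypothetical and chosen only for this illustration.

```python
import math

ranked = [6, 9, 10, 12, 13, 14, 14, 15, 16, 16,
          16, 17, 17, 18, 18, 19, 20, 21, 22, 24]

def value_at_position(sorted_data, position):
    """Linear interpolation at a 1-based, possibly fractional, position."""
    n = len(sorted_data)
    position = min(max(position, 1), n)   # keep the position inside the data
    lower = math.floor(position)
    if lower == n:
        return float(sorted_data[-1])
    fraction = position - lower
    below, above = sorted_data[lower - 1], sorted_data[lower]
    return below + fraction * (above - below)

def percentile_n_plus_1(sorted_data, p):
    """Position rule p(n + 1), as in the formula on the slide."""
    return value_at_position(sorted_data, p * (len(sorted_data) + 1))

def percentile_n_minus_1(sorted_data, p):
    """Position rule (n - 1)p + 1, as used by Excel's QUARTILE function."""
    return value_at_position(sorted_data, (len(sorted_data) - 1) * p + 1)

for p in (0.10, 0.25, 0.50, 0.75, 0.90):
    print(p, round(percentile_n_plus_1(ranked, p), 2),
          round(percentile_n_minus_1(ranked, p), 2))
```

Both rules agree on the median, but they differ for the quartiles and deciles, so it is worth stating which rule a reported value is based on.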
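Geometric (multiplicative) Mean
Defined as: $GM = \sqrt[n]{x_1 \cdot x_2 \cdots x_n}$
For positive observations the geometric mean is never larger than the arithmetic mean; the two are equal only when all observations are identical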
Example:
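As an illustration (not necessarily the example used on the original slide), the geometric mean of the income data can be computed with the standard library and compared with the arithmetic mean. The variable names are chosen for this sketch.

```python
import math
from statistics import geometric_mean, mean

incomes = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
           17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

# Library routine (available from Python 3.8)
gm = geometric_mean(incomes)

# The same value straight from the definition: n-th root of the product
gm_by_definition = math.prod(incomes) ** (1 / len(incomes))

am = mean(incomes)
print(round(gm, 2), round(am, 2))
```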
6. Measures of Dispersion
- Range, inter quartile range, decile range and Box-plot
- Variance and standard deviation
- Coefficient of variation
- Skewness and kurtosis
Range = maximum - minimum
Quartile range = Q3 - Q1 = 50 % of obs.
Decile range = D9 - D1 = 80 % of obs.
Box-plot
A Box-plot is used in order to identify outliers
- Outlier: obs. more than 3 times the IQR below Q1 or above Q3
- Suspected outlier: obs. more than 1.5 (but less than 3) times the IQR below Q1 or above Q3
For our little data set we get
[Figure: Box-plot of the income data on an axis from 0 to 30, marking a (suspected) outlier, Q1, the median and Q3.]
Lower inner fence: Q1 - 1.5 IQR = 13.75 - 1.5(4.5) = 7.00
Lower outer fence: Q1 - 3.0 IQR = 13.75 - 3.0(4.5) = 0.25
Upper inner fence: Q3 + 1.5 IQR = 18.25 + 1.5(4.5) = 25.00
Upper outer fence: Q3 + 3.0 IQR = 18.25 + 3.0(4.5) = 31.75
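The fences and the resulting classification of observations can be sketched as follows, taking Q1 = 13.75 and Q3 = 18.25 from the slides as given. The function names and the labels returned are assumptions made for this illustration.

```python
def fences(q1, q3):
    """Inner and outer fences of a box-plot from the two quartiles."""
    iqr = q3 - q1
    return {
        "lower_outer": q1 - 3.0 * iqr,
        "lower_inner": q1 - 1.5 * iqr,
        "upper_inner": q3 + 1.5 * iqr,
        "upper_outer": q3 + 3.0 * iqr,
    }

def classify(x, f):
    """Label an observation relative to the fences."""
    if x < f["lower_outer"] or x > f["upper_outer"]:
        return "outlier"
    if x < f["lower_inner"] or x > f["upper_inner"]:
        return "suspected outlier"
    return "regular"

ranked = [6, 9, 10, 12, 13, 14, 14, 15, 16, 16,
          16, 17, 17, 18, 18, 19, 20, 21, 22, 24]

income_fences = fences(13.75, 18.25)
flagged = {x: classify(x, income_fences) for x in ranked
           if classify(x, income_fences) != "regular"}
print(income_fences)
print(flagged)
```

With these quartiles only the smallest observation, 6, falls outside the inner fences.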
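Variance and Standard Deviation
Make use of all observations

$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
or
$s^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)$

Example on data set for incomes: $s^2 = \frac{5403 - 20(15.85)^2}{19} = \frac{378.55}{19} = 19.92$ and $s = \sqrt{19.92} = 4.46$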
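The sample variance and standard deviation of the income data can be verified with the standard library. The sketch also evaluates the computational shortcut based on the sum of squares; the variable names are illustrative.

```python
from statistics import mean, stdev, variance

incomes = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
           17, 16, 24, 21, 22, 18, 19, 18, 20, 17]
n = len(incomes)
x_bar = mean(incomes)

# Sample variance and standard deviation (divisor n - 1)
s2 = variance(incomes)
s = stdev(incomes)

# Shortcut form: (sum of squares - n * mean^2) / (n - 1)
sum_of_squares = sum(x * x for x in incomes)
s2_shortcut = (sum_of_squares - n * x_bar ** 2) / (n - 1)

print(round(s2, 2), round(s, 2), round(s2_shortcut, 2))
```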
Grouped data set
Example:
The Coefficient of Variation: $CV = s / \bar{x}$
- Gives the relative dispersion
- Recommended for comparisons of different data sets

If the distribution has large variation (is very flat), then CV takes a large value.
If the distribution has small variation (is very steep), then CV takes a small value.
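A short sketch of why the coefficient of variation suits comparisons across data sets: it does not depend on the unit of measurement. The helper name below is hypothetical, and the second data set is simply the income data converted from 1,000 DKK to DKK.

```python
from statistics import mean, stdev

def coefficient_of_variation(data):
    """Sample standard deviation relative to the mean."""
    return stdev(data) / mean(data)

incomes_1000_dkk = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
                    17, 16, 24, 21, 22, 18, 19, 18, 20, 17]
incomes_dkk = [1000 * x for x in incomes_1000_dkk]

cv_1000_dkk = coefficient_of_variation(incomes_1000_dkk)
cv_dkk = coefficient_of_variation(incomes_dkk)

# The standard deviation grows by a factor of 1,000; the CV is unchanged.
print(round(cv_1000_dkk, 3), round(cv_dkk, 3))
```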
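Some examples:
- SK > 0: skewed to the right
- SK = 0: symmetric
- SK < 0: skewed to the left
- KU large: peaked distribution (observations dense around the centre)
- KU low: flat distribution (close to uniform)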
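The slides do not preserve the formulas for SK and KU. The sketch below assumes the bias-adjusted sample formulas used by Excel's SKEW and KURT functions, which reproduce the values -0.35 and 0.12 reported for the income data in the outlier example. The function names are chosen for this illustration.

```python
from statistics import mean, stdev

def sample_skewness(data):
    """Adjusted sample skewness (assumed to match Excel's SKEW)."""
    n, m, s = len(data), mean(data), stdev(data)
    z3 = sum(((x - m) / s) ** 3 for x in data)
    return n / ((n - 1) * (n - 2)) * z3

def sample_excess_kurtosis(data):
    """Adjusted sample excess kurtosis (assumed to match Excel's KURT)."""
    n, m, s = len(data), mean(data), stdev(data)
    z4 = sum(((x - m) / s) ** 4 for x in data)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * z4 - correction

incomes = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
           17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

sk = sample_skewness(incomes)
ku = sample_excess_kurtosis(incomes)
print(round(sk, 2), round(ku, 2))
```

A negative SK agrees with the earlier conclusion that the income data is skewed to the left.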
7. Descriptive Statistics on a Computer or Calculator
- Use of Excel
- Use of Megastat
- Use of pocket calculator
8. Descriptive Statistics in Grouped Data Sets
More complex data set for the distribution of income, Denmark
Disposable household incomes, Denmark, 1987

Columns: (1)-(2) interval for incomes, 1,000 DKK; (3) f_i, number of households, 1,000; (4) x_i, mean income, 1,000 DKK; (5) f_i x_i, income mass, mio. DKK; (6) deviation (x_i - x̄); (7) square (x_i - x̄)^2; (8) f_i (x_i - x̄)^2

Interval (1-2)    f_i (3)   x_i (4)   f_i x_i (5)   Deviation (6)   Square (7)   f_i (x_i - x̄)^2 (8)
0 - 49.9              146      36.9         5,387          -128.7     16563.69               2418298
50 - 99.9             590      73.2        43,202           -92.4      8537.76               5036983
100 - 149.9           414     123.7        51,224           -41.9      1755.61                726822
150 - 199.9           323     175.1        56,568             9.5        90.25                 29151
200 - 249.9           325     225.9        73,435            60.3      3636.09               1181729
250 - 299.9           210     273.6        57,446           108.0     11664.00               2449440
300 - 399.9           139     340.6        47,339           175.0     30625.00               4256875
400 -                  55     548.3        30,156           382.7    146459.29               8055261
Sum                 2,202                 364,757                                           24154559

Source: Statistics Denmark, Annual Statistical Review, 1994, page 220-221.
Mean and Standard Deviation
There are 8 categories, i.e. k = 8. By insertion in the formulas:

Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{k} f_i x_i = \frac{364{,}757}{2{,}202} = 165.648$, i.e. 165,648 DKK

Standard deviation: $s = \sqrt{\frac{1}{n}\sum_{i=1}^{k} f_i (x_i - \bar{x})^2} = \sqrt{\frac{24{,}154{,}559}{2{,}202}} = 104.73$ (1,000 DKK)
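The grouped calculation can be retraced in a few lines. This is a sketch under two assumptions: the mean is taken from the published income mass column, as on the slide, and the deviations are computed from the unrounded mean, so the sum of squares differs marginally from the tabulated 24,154,559.

```python
import math

# Columns of the table: households (1,000), mean income (1,000 DKK),
# income mass (mio. DKK)
households = [146, 590, 414, 323, 325, 210, 139, 55]
mean_income = [36.9, 73.2, 123.7, 175.1, 225.9, 273.6, 340.6, 548.3]
income_mass = [5387, 43202, 51224, 56568, 73435, 57446, 47339, 30156]

n = sum(households)

# Grouped mean: total income mass divided by the number of households
grouped_mean = sum(income_mass) / n

# Grouped standard deviation with divisor n, as in the formula on the slide
weighted_squares = sum(f * (x - grouped_mean) ** 2
                       for f, x in zip(households, mean_income))
grouped_sd = math.sqrt(weighted_squares / n)

print(n, round(grouped_mean, 3), round(grouped_sd, 2))
```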
Histogram, Quartiles, Median and Box-plot
Consider the relative and cumulative distribution of data
Disposable household incomes, Denmark, 1987

Interval for incomes,   Number of households,   Relative frequency,   Cumulative frequency,
1,000 DKK               1,000 (f_i)             % (f_i/n)             %
0 - 49.9                  146                     6.6                   6.6
50 - 99.9                 590                    26.8                  33.4
100 - 149.9               414                    18.8                  52.2
150 - 199.9               323                    14.7                  66.9
200 - 249.9               325                    14.8                  81.7
250 - 299.9               210                     9.5                  91.2
300 - 399.9               139                     6.3                  97.5
400 -                      55                     2.5                 100.0
Sum                     2,202                   100.0

Source: Statistics Denmark, Annual Statistical Review, 1994, page 220-221
Histogram
[Figure: histogram "Distribution Income, Denmark, 1987". x-axis: income classes 0 - 49, 50 - 99, 100 - 149, 150 - 199, 200 - 249, 250 - 299, 300 - 349, 350 - 399, Above 400; y-axis: %, 0 to 30.]
Sum Function
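How to do the interpolation
We use a formula, for example given as:

$\text{Value} = \text{End value of interval} - \frac{\text{"too long" relative to fractile, in pct.}}{\text{total width in percent}} \times \text{interval width in value}$

[Figure: illustration of the interpolation. The cumulative frequency rises from 33.4 % at an income of 100 to 52.2 % at 149 (1,000 DKK); the income at which it reaches 50 % is sought.]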
Median: $150{,}000 - \frac{(52.2 - 50)}{18.8} \cdot 50{,}000 = 150{,}000 - 5{,}851 = 144{,}149$

Similarly for the other quartiles and deciles:

Lower quartile (Q1): $100{,}000 - \frac{(33.4 - 25)}{26.8} \cdot 50{,}000 = 84{,}328$

Upper quartile (Q3): $250{,}000 - \frac{(81.7 - 75)}{14.8} \cdot 50{,}000 = 227{,}365$

Lower decile: $100{,}000 - \frac{(33.4 - 10)}{26.8} \cdot 50{,}000 = 56{,}343$

Upper decile: $300{,}000 - \frac{(91.2 - 90)}{9.5} \cdot 50{,}000 = 293{,}684$
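The interpolation can be written as a small function over the cumulative table. This is a sketch: the tuple layout and the function name are assumptions, and the percentages are the rounded ones from the table, so the results match the hand calculations.

```python
# (upper limit in DKK, interval width in DKK, relative frequency %, cumulative %)
# The open-ended class "400 -" is left out because it has no upper limit.
classes = [
    (50_000, 50_000, 6.6, 6.6),
    (100_000, 50_000, 26.8, 33.4),
    (150_000, 50_000, 18.8, 52.2),
    (200_000, 50_000, 14.7, 66.9),
    (250_000, 50_000, 14.8, 81.7),
    (300_000, 50_000, 9.5, 91.2),
    (400_000, 100_000, 6.3, 97.5),
]

def fractile(pct):
    """Income (DKK) below which pct percent of the households fall."""
    for upper, width, relative, cumulative in classes:
        if pct <= cumulative:
            # Step back from the end of the interval by the share that is
            # "too long" relative to the fractile.
            return upper - (cumulative - pct) / relative * width
    raise ValueError("fractile lies in the open-ended top class")

for name, pct in [("Lower decile", 10), ("Q1", 25), ("Median", 50),
                  ("Q3", 75), ("Upper decile", 90)]:
    print(name, round(fractile(pct)))
```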
Inter Quartile Range (IQR): (Q3 - Q1) = 227,365 - 84,328 = 143,037
Lower inner fence: Q1 - 1.5 IQR = 84,328 - 1.5(143,037) = -130,228
Lower outer fence: Q1 - 3.0 IQR = 84,328 - 3.0(143,037) = -344,783
Upper inner fence: Q3 + 1.5 IQR = 227,365 + 1.5(143,037) = 441,921
Upper outer fence: Q3 + 3.0 IQR = 227,365 + 3.0(143,037) = 656,476

Box-plot
[Figure: Box-plot on an axis from -300 to 600 (1,000 DKK), marking LOF = -345, LIF = -130, Q1 = 84, M = 144, Q3 = 227, UIF = 442, UOF = 656.]
9. Descriptive Statistics: an Example of Outliers

Outliers are extremes
- Outliers make distributions non-normal
- Outliers change the mean, standard deviation and skewness
However, the median remains constant
                              Basic   Max=34   Max=44   Max=54
Mean                          15.85    16.35    16.85    17.35   Increases
Standard Error                 1.00     1.29     1.69     2.13
Median                           16       16       16       16   Constant!!
Modus / Mode                     16       16       16       16
Standard deviation             4.46     5.79     7.56     9.52
Sample variance               19.92    33.50    57.08    90.66
Kurtosis                       0.12     3.88     8.99    12.55
Skewness                      -0.35     1.19     2.43     3.16   Increases
Range                            18       28       38       48
Minimum                           6        6        6        6
Maximum                          24       34       44       54
Sum                             317      327      337      347
Observations                     20       20       20       20
Confidence interval (95 %)     2.09     2.71     3.54     4.46   Increases
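The pattern in the table can be reproduced by replacing the largest income (24) with 34, 44 and 54 in turn. A minimal sketch with the standard library, where the dictionary layout is an illustrative choice:

```python
from statistics import mean, median, stdev

ranked = [6, 9, 10, 12, 13, 14, 14, 15, 16, 16,
          16, 17, 17, 18, 18, 19, 20, 21, 22, 24]

results = {}
for new_max in (24, 34, 44, 54):
    # Replace only the largest observation; the other 19 stay unchanged.
    data = ranked[:-1] + [new_max]
    results[new_max] = {
        "mean": mean(data),
        "median": median(data),
        "stdev": stdev(data),
    }

for new_max, stats in results.items():
    print(new_max, round(stats["mean"], 2), stats["median"],
          round(stats["stdev"], 2))
```

The mean and the standard deviation move with the extreme value, while the median stays at 16 in all four cases.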