Distribution Summaries
Measures of central tendency: Mean, Median, Mode
Measures of spread: Standard Deviation, Interquartile Range (IQR)
Distribution spread
Range
Standard deviation
Variance
Range
The range of a distribution is the difference between the highest value and the lowest value
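The definition above can be sketched in a few lines of Python. The values in `months` are hypothetical and chosen only for illustration; they are not the survey data.

```python
# Hypothetical cohabitation lengths in months (illustrative values only).
months = [0, 5, 17, 32, 46, 103]

# Range = highest value minus lowest value.
value_range = max(months) - min(months)
print(value_range)  # 103
```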
[Figure: histogram of Length of Cohabitation in Months. x-axis: # Months Cohabited (0 to 103); y-axis: Fraction (0 to .6)]
Range (cont.)
. sum cohbl

Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
cohblnth |     626    11.74601    17.1347          0        103

Range (cont.)
. sum cohbl, d

                      # Months Cohabited
-------------------------------------------------------------
      Percentiles      Smallest
 1%            0              0
 5%            0              0
10%            0              0       Obs                 626
25%            0              0       Sum of Wgt.         626

50%            5                      Mean           11.74601
                        Largest       Std. Dev.       17.1347
75%           17             97
90%           32             97       Variance       293.5978
95%           46            103       Skewness       2.304175
99%           79            103       Kurtosis       9.411293
Range (cont.)
[Figure: plot of # Months Cohabited on a 0 to 100 axis, with the two largest values, 97 and 103, labeled]
Variance
The most commonly used measure of spread
One of the most fundamental concepts in statistics
Variance Formula
In words, the variance is the mean squared deviation (from the mean)
A deviation is the difference between a score and the mean of all scores
We square this deviation for all observations
We then take the mean of all these
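The steps described above can be followed one at a time in Python. This is a minimal sketch; the five scores are illustrative, and the variable names are chosen here for readability.

```python
scores = [1, 2, 3, 4, 5]  # illustrative scores

n = len(scores)
mean = sum(scores) / n  # mean of all scores

# A deviation is the difference between a score and the mean.
deviations = [x - mean for x in scores]

# Square each deviation, then take the mean of the squares.
squared_deviations = [d ** 2 for d in deviations]
variance = sum(squared_deviations) / n

print(variance)  # 2.0
```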
Variance Formula (cont.)
$$\sigma^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n}$$
Definitional Formula
Variance Formula (cont.)
$$\sigma^2 = \frac{\sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}}{n}$$
Computational Formula
Variance (example)
        Obs   Square   Dev   Dev Sq
          1        1    -2        4
          2        4    -1        1
          3        9     0        0
          4       16     1        1
          5       25     2        4
Sum      15       55     0       10
Mean      3                       2
Variance = (55 - 225 / 5) / 5 = (55-45) / 5 = 2
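The arithmetic above can be checked in Python. The sketch below computes the computational formula by hand and compares it with `statistics.pvariance`, which divides by n as the slides do. Note that `statistics.variance` divides by n - 1 instead, so it gives a different answer for the same scores.

```python
import statistics

scores = [1, 2, 3, 4, 5]
n = len(scores)

sum_x = sum(scores)                         # 15
sum_x_squared = sum(x * x for x in scores)  # 55

# Computational formula: (sum of squares - (sum)^2 / n) / n
computational = (sum_x_squared - sum_x ** 2 / n) / n

population_variance = statistics.pvariance(scores)  # divides by n
sample_variance = statistics.variance(scores)       # divides by n - 1

print(computational)                         # 2.0
print(computational == population_variance)  # True
print(sample_variance == 2.5)                # True
```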
Why sum the SQUARES?
Recall that the sum of the deviations around the mean is zero
Therefore the average deviation is zero
Squaring a positive or negative number always creates a positive result
This way we are assured of a sum that is greater than or equal to zero
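A short Python check of this point, using the scores 10 through 14 as an illustration: the raw deviations cancel out, while the squared deviations do not.

```python
scores = [10, 11, 12, 13, 14]
mean = sum(scores) / len(scores)  # 12.0

deviations = [x - mean for x in scores]

# Negative and positive deviations cancel exactly.
sum_of_deviations = sum(deviations)

# Squares are never negative, so nothing cancels.
sum_of_squares = sum(d ** 2 for d in deviations)

print(sum_of_deviations)  # 0.0
print(sum_of_squares)     # 10.0
```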
Compare
$$\sum_{i=1}^{n} (X_i - \bar{X}) = 0$$

$$\sum_{i=1}^{n} (X_i - \bar{X})^2 = \text{a positive number}$$
Compare (cont.)
         X    Mean   Deviation         Squared Deviation
        10      12   10 - 12 = -2      4
        11      12   11 - 12 = -1      1
        12      12   12 - 12 =  0      0
        13      12   13 - 12 =  1      1
        14      12   14 - 12 =  2      4
Sum     60      60    0               10
Mean    12      12    0                2  (Variance)
Standard Deviation
The second most commonly used measure of spread
The square root of the variance
Which brings us back to the original metric or units of measure
$$\text{Standard Deviation} = \sqrt{\text{Variance}} = \sqrt{\sigma^2}$$
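As a quick illustration, taking the square root of the variance of 2 from the worked example gives the standard deviation:

```python
import math

variance = 2  # variance of the scores 1 to 5 from the worked example
std_dev = math.sqrt(variance)

print(round(std_dev, 3))  # 1.414
```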
What are units?
Consider age
Units are years
Deviations are years
Squared deviations are years squared
Summing and taking mean leaves squared years
Taking square root yields years again
So we have the sd?
As a rough rule of thumb for approximately normal data, the standard deviation is about 1/6 of the range
For a normal distribution, about 68% of observations are within ± 1 σ of the mean
About 95% are within ± 2 σ of the mean
About 99.7% are within ± 3 σ of the mean
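The shares of a normal distribution lying within 1, 2, and 3 standard deviations of the mean can be computed with `statistics.NormalDist` from the Python standard library. This sketch uses a standard normal (mean 0, sd 1); the shares are the same for any normal distribution.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Share of observations within k standard deviations of the mean.
shares = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, share in shares.items():
    print(f"within {k} sd of the mean: {share:.4f}")
# within 1 sd of the mean: 0.6827
# within 2 sd of the mean: 0.9545
# within 3 sd of the mean: 0.9973
```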
Variance (example)
Scores: 1 2 3 4 5
Variance = 2, Std. Dev. = 1.414
[Figure: the five scores plotted with the mean marked]
Variability of the scores
Variability and spread of the scores indicate the second characteristic of a distribution that we need to know.
The first was the mean or central location of the distribution
The mean and variance are independent
Means can change without affecting the variance (or standard deviation)
Standard deviation (or variance) can change without affecting the mean
Two distributions may differ on means or on standard deviations or both (or neither)
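A small Python sketch of these points, using illustrative scores: adding a constant moves the mean but leaves the variance alone, while stretching the scores around their mean changes the variance but leaves the mean alone. The names `shifted` and `stretched` are chosen here for illustration.

```python
import statistics

scores = [10, 11, 12, 13, 14]
mean = statistics.mean(scores)

# Add a constant: the mean changes, the spread does not.
shifted = [x + 100 for x in scores]

# Double every deviation from the mean: the spread changes, the mean does not.
stretched = [mean + 2 * (x - mean) for x in scores]

for name, data in [("scores", scores), ("shifted", shifted), ("stretched", stretched)]:
    print(name, statistics.mean(data), statistics.pvariance(data))
```

Doubling every deviation multiplies the variance by four, since the deviations are squared.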
What makes scores variable?
Why are some scores high and others low?
Why does the variance change?
. tab sex, sum(income1)

            |         Summary of income1
        sex |        Mean   Std. Dev.       Freq.
------------+------------------------------------
     female |   16.207224    10.82088         263
       male |   22.371972   13.304104         289
------------+------------------------------------
      Total |   19.434783   12.557429         552