bhaveshshukla
8/9/2019 3 Descriptive Stats
Numeric Summaries and
Descriptive Statistics
populations vs. samples
we want to describe both samples and
populations
the latter is a matter of inference
outliers
minority cases, so different from the majority that they merit separate consideration
are they errors?
are they indicative of a different pattern?
think about possible outliers with care, but beware of mechanical treatments
significance of outliers depends on your research interests
summaries of distributions
graphic vs. numeric
graphic may be better for visualization
numeric are better for statistical/inferential purposes
resistance to outliers is usually an advantage in either case
general characteristics
kurtosis [peakedness]: leptokurtic (sharply peaked) vs. platykurtic (flat-topped)
skew (skewness)
right (positive) skew
left (negative) skew
central tendency
measures of central tendency
provide a sense of the value expressed by multiple cases, overall
mean
median
mode
mean
center of gravity
evenly partitions the sum of all measurements among all cases; average of all measures
$$ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} $$
mean pro and con
crucial for inferential statistics
mean is not very resistant to outliers
a trimmed mean may be better for descriptive purposes
mean
rim diameter (cm)
           unit 1   unit 2
             12.6     16.2
             11.6     16.4
             16.3     13.8
             13.1     13.2
             12.1     11.3
             26.9     14.0
              9.7      9.0
             11.5     12.5
             14.8     15.6
             13.5     11.2
             12.4     12.2
             13.6     15.5
                      11.7
n              12       13
total       168.1    172.6
total/n      14.0     13.3
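back-to-back stem-and-leaf plot (stem = whole cm, leaf = tenths of a cm; == marks each unit's mean)
          unit 1  stem  unit 2
               9   26
                   25
                   24
                   23
                   22
                   21
                   20
                   19
                   18
                   17
               3   16   24
                   15   56
14.0 ==        8   14   0
             651   13   28     == 13.3
             641   12   25
              65   11   237
                   10
               7    9   0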
R: mean(x)
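The totals in the table can be checked with a short R sketch. The vector names `unit1` and `unit2` are made up for this illustration; the values are the rim diameters listed on the slide.

```r
# Rim diameters (cm) from the slide; the vector names are illustrative only.
unit1 <- c(12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6)
unit2 <- c(16.2, 16.4, 13.8, 13.2, 11.3, 14.0, 9.0, 12.5, 15.6, 11.2, 12.2, 15.5, 11.7)

# The mean is the total divided by the number of cases.
mean_by_hand <- sum(unit1) / length(unit1)

# Built-in equivalent.
mean_unit1 <- mean(unit1)
mean_unit2 <- mean(unit2)

round(c(mean_unit1, mean_unit2), 1)  # 14.0 13.3
```

Note how the single 26.9 cm rim pulls the unit 1 mean above most of its cases.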
trimmed mean
rim diameter (cm), sorted
           unit 1   unit 2
              9.7      9.0
             11.5     11.2
             11.6     11.3
             12.1     11.7
             12.4     12.2
             12.6     12.5
             13.1     13.2
             13.5     13.8
             13.6     14.0
             14.8     15.5
             16.3     15.6
             26.9     16.2
                      16.4
(lowest and highest value trimmed from each unit: 9.7 and 26.9; 9.0 and 16.4)
n              10       11
total       131.5    147.2
total/n      13.2     13.4
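back-to-back stem-and-leaf plot (stem = whole cm, leaf = tenths of a cm; == marks each unit's trimmed mean)
          unit 1  stem  unit 2
               9   26
                   25
                   24
                   23
                   22
                   21
                   20
                   19
                   18
                   17
               3   16   24
                   15   56
               8   14   0
13.2 ==      651   13   28     == 13.4
             641   12   25
              65   11   237
                   10
               7    9   0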
R: mean(x, trim=.1)
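A minimal sketch of what `trim = .1` does with these batches: 10% of 12 or 13 cases rounds down to one case, so one value is dropped from each tail before averaging. The helper `trim_by_hand` and the vector names are hypothetical, introduced only for this illustration.

```r
# Rim diameters (cm) from the slide; the vector names are illustrative only.
unit1 <- c(12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6)
unit2 <- c(16.2, 16.4, 13.8, 13.2, 11.3, 14.0, 9.0, 12.5, 15.6, 11.2, 12.2, 15.5, 11.7)

# Hypothetical helper: sort, drop k cases from each tail, average the rest.
trim_by_hand <- function(x, k = 1) {
  s <- sort(x)
  mean(s[(k + 1):(length(s) - k)])
}

trimmed_unit1 <- mean(unit1, trim = 0.1)  # 13.15, shown as 13.2 on the slide
trimmed_unit2 <- mean(unit2, trim = 0.1)  # about 13.38, shown as 13.4

# The built-in and the by-hand version should agree.
all.equal(trim_by_hand(unit1), trimmed_unit1)
```

After trimming, the two units have nearly the same central value, which the untrimmed means obscured.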
median
50th percentile
less useful for inferential purposes
more resistant to effects of outliers
median
rim diameter (cm)
unit 1 unit 2
9.7 9.0
11.5 11.2
11.6 11.3
12.1 11.7
12.4 12.2
12.6 12.5
12.9 (median of unit 1, midway between 12.6 and 13.1)
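A short R sketch of the median for both units, plus a rough check of its resistance: dropping the 26.9 cm outlier from unit 1 shifts the mean far more than the median. The vector names are illustrative, and removing the outlier is done here only to show the contrast.

```r
# Rim diameters (cm) from the slides; the vector names are illustrative only.
unit1 <- c(12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6)
unit2 <- c(16.2, 16.4, 13.8, 13.2, 11.3, 14.0, 9.0, 12.5, 15.6, 11.2, 12.2, 15.5, 11.7)

median_unit1 <- median(unit1)  # 12.85: average of the two middle values (n is even)
median_unit2 <- median(unit2)  # 13.2: the middle value (n is odd)

# Resistance check: remove the 26.9 cm outlier and compare the shifts.
no_outlier <- unit1[unit1 != 26.9]
mean_shift <- mean(unit1) - mean(no_outlier)        # a bit over 1 cm
median_shift <- median(unit1) - median(no_outlier)  # 0.25 cm
```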
mode
the most numerous category
for ratio data, often implies that data have been grouped in some way
can be more or less created by the grouping procedure
for theoretical distributions, simply the location of the peak on the frequency distribution
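The point about grouping can be illustrated with a rough R sketch using the unit 2 rim diameters from the earlier slides. The bin widths (1 cm and 2 cm) and the variable names are arbitrary choices made for this example, not part of the original analysis.

```r
# Unit 2 rim diameters (cm) from the slides; the vector name is illustrative only.
unit2 <- c(16.2, 16.4, 13.8, 13.2, 11.3, 14.0, 9.0, 12.5, 15.6, 11.2, 12.2, 15.5, 11.7)

# Modal class with 1 cm bins: group each diameter by its whole-cm value.
bins_1cm <- table(floor(unit2))
mode_1cm <- names(bins_1cm)[which.max(bins_1cm)]  # "11", i.e. the 11-12 cm class

# Modal class with 2 cm bins (8-10, 10-12, ...): a different, equally arbitrary grouping.
bins_2cm <- table(2 * floor(unit2 / 2))
mode_2cm <- names(bins_2cm)[which.max(bins_2cm)]  # "12", i.e. the 12-14 cm class
```

The same batch yields a different modal class depending on how the bins are drawn, which is why a mode for ratio data should be read with the grouping in mind.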
[figure: frequency chart of settlement classes: isolated scatters, hamlets, villages, regional centers]
modal class = hamlets
dispersion
measures of dispersion
summarize degree of clustering of cases, esp.
with respect to central tendency
range
variance
standard deviation
range
rim diameter (cm), sorted
           unit 1   unit 2
              9.7      9.0
             11.5     11.2
             11.6     11.3
             12.1     11.7
             12.4     12.2
             12.6     12.5
             13.1     13.2
             13.5     13.8
             13.6     14.0
             14.8     15.5
             16.3     15.6
             26.9     16.2
                      16.4
back-to-back stem-and-leaf plot (the bar from * to * marks each unit's range)
          unit 1  stem  unit 2
*              9   26
|                  25
|                  24
|                  23
|                  22
|                  21
|                  20
|                  19
|                  18
|                  17
|              3   16   24     *
|                  15   56     |
|              8   14   0      |
|            651   13   28     |
|            641   12   25     |
|             65   11   237    |
|                  10          |
*              7    9   0      *
would be better to use midspread
R: range(x)
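A short R sketch contrasting the range with the midspread for the two units. The vector names are illustrative; `IQR()` is used here as a stand-in for the midspread, on the assumption that R's default quartile definition is close enough (hinge-based midspreads can differ slightly).

```r
# Rim diameters (cm) from the slides; the vector names are illustrative only.
unit1 <- c(12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6)
unit2 <- c(16.2, 16.4, 13.8, 13.2, 11.3, 14.0, 9.0, 12.5, 15.6, 11.2, 12.2, 15.5, 11.7)

range(unit1)                      # 9.7 26.9
range_width1 <- diff(range(unit1))  # 17.2 cm, driven by the single 26.9 cm rim
range_width2 <- diff(range(unit2))  # 7.4 cm

# Midspread (interquartile range): spread of the middle half of the cases.
midspread1 <- IQR(unit1)
midspread2 <- IQR(unit2)
```

The range makes unit 1 look far more dispersed than unit 2, while the midspread, which ignores the tails, does not.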
variance
analogous to average deviation of cases from mean
in fact, based on sum of squared deviations from the mean ("sum-of-squares")
$$ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} $$
R: var(x)
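A step-by-step sketch of the definitional formula in R, using the unit 1 rim diameters from the earlier slides (the variable names are illustrative only):

```r
# Unit 1 rim diameters (cm) from the slides; the vector name is illustrative only.
unit1 <- c(12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6)

# Step 1: deviation of each case from the mean.
deviations <- unit1 - mean(unit1)

# Step 2: sum of squared deviations (the "sum-of-squares").
sum_sq <- sum(deviations^2)

# Step 3: divide by n - 1.
variance_by_hand <- sum_sq / (length(unit1) - 1)

# The built-in should give the same value (about 19.29 cm^2).
all.equal(variance_by_hand, var(unit1))
```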
variance
computational form:
$$ s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 / n}{n - 1} $$
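The computational form needs only two running totals, the sum of the values and the sum of their squares. A minimal R sketch with the unit 2 rim diameters from the earlier slides (variable names are illustrative):

```r
# Unit 2 rim diameters (cm) from the slides; the vector name is illustrative only.
unit2 <- c(16.2, 16.4, 13.8, 13.2, 11.3, 14.0, 9.0, 12.5, 15.6, 11.2, 12.2, 15.5, 11.7)

n <- length(unit2)
sum_x <- sum(unit2)     # 172.6
sum_x2 <- sum(unit2^2)  # sum of the squared values

# Computational form: no need to compute the mean or the deviations first.
variance_comp <- (sum_x2 - sum_x^2 / n) / (n - 1)

# Agrees with the definitional form used by var() (about 5.02 cm^2).
all.equal(variance_comp, var(unit2))
```

This form is convenient for hand calculation, but it subtracts two large, nearly equal numbers, so it can lose precision when the values are large relative to their spread.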
note: units of variance are squared
this makes variance hard to interpret
ex.: projectile point sample:
mean = 22.6 mm
variance = 38 mm²
what does this mean???
standard deviation
square root of variance:
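$$ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} $$
computational form:
$$ s = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 / n}{n - 1}} $$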
standard deviation
units are in same units as base measurements
ex.: projectile point sample:
mean = 22.6 mm
standard deviation = 6.2 mm
mean +/- sd (16.4 to 28.8 mm) should give at least some intuitive sense of where most of the cases lie, barring major effects of outliers
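The arithmetic behind the projectile point example can be sketched in R as follows. Only the summary figures (mean 22.6 mm, variance 38 mm²) come from the slides; the raw measurements are not given, so the variable names and the calculation from summaries are illustrative.

```r
# Summary figures from the projectile point example (raw data not available).
point_mean <- 22.6     # mm
point_variance <- 38   # mm^2

# Standard deviation: square root of the variance, back in mm.
point_sd <- sqrt(point_variance)  # about 6.2 mm

# Interval of one standard deviation either side of the mean.
interval <- c(lower = point_mean - point_sd, upper = point_mean + point_sd)
round(interval, 1)  # 16.4 28.8
```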
rim diameter (cm)
           unit 1   unit 2
             12.6     16.2
             11.6     16.4
             16.3     13.8
             13.1     13.2
             12.1     11.3
             26.9     14.0
              9.7      9.0
             11.5     12.5
             14.8     15.6
             13.5     11.2
             12.4     12.2
             13.6     15.5
                      11.7
mean:        14.0     13.3
n:             12       13
deviations from the mean (cm)
           unit 1   unit 2
             -1.4      2.9
             -2.4      3.1
              2.3      0.5
             -0.9     -0.1
             -1.9     -2.0
             12.9      0.7
             -4.3     -4.3
             -2.5     -0.8
              0.8      2.3
             -0.5     -2.1
             -1.6     -1.1
             -0.4      2.2
                      -1.6
squared deviations (cm²)
              unit 1   unit 2
                1.98     8.54
                5.80     9.75
                5.25     0.27
                0.83     0.01
                3.64     3.91
              166.20     0.52
               18.56    18.29
                6.29     0.60
                0.63     5.40
                0.26     4.31
                2.59     1.16
                0.17     4.94
                         2.49
sum of sq.:   212.19    60.20
variance:      19.29     5.02
stand. dev.:    4.39     2.24
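The worked table can be reproduced with a short R sketch. The helper `dispersion_summary` and the vector names are hypothetical, introduced only for this illustration; the squared deviations are computed from the unrounded mean, which is why they differ slightly from squaring the rounded deviations shown.

```r
# Rim diameters (cm) from the table; the vector names are illustrative only.
unit1 <- c(12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6)
unit2 <- c(16.2, 16.4, 13.8, 13.2, 11.3, 14.0, 9.0, 12.5, 15.6, 11.2, 12.2, 15.5, 11.7)

# Hypothetical helper: the three summary rows of the table for one batch.
dispersion_summary <- function(x) {
  squared_dev <- (x - mean(x))^2
  sum_sq <- sum(squared_dev)
  variance <- sum_sq / (length(x) - 1)
  c(sum_sq = sum_sq, variance = variance, sd = sqrt(variance))
}

# One column per unit, one row per summary.
result <- sapply(list(unit1 = unit1, unit2 = unit2), dispersion_summary)
round(result, 2)
#          unit1 unit2
# sum_sq  212.19 60.20
# variance 19.29  5.02
# sd        4.39  2.24
```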
trimmed dispersion measures
variance and sd are even more sensitive to
extreme values (outliers) than the mean
why??
you can calculate a trimmed version of the variance simply by eliminating cases from the tails, and calculating the variance in the normal way
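One way to see why: deviations are squared before they are summed, so a single extreme case can dominate the total. The R sketch below uses the unit 1 rim diameters from the earlier slides, and trims one case from each tail as an illustrative choice (variable names are hypothetical).

```r
# Unit 1 rim diameters (cm) from the slides; the vector name is illustrative only.
unit1 <- c(12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6)

# Share of the sum-of-squares contributed by the single 26.9 cm outlier.
squared_dev <- (unit1 - mean(unit1))^2
outlier_share <- max(squared_dev) / sum(squared_dev)  # roughly 0.78

# Trimmed variance: drop one case from each tail, then use var() as usual.
sorted <- sort(unit1)
trimmed <- sorted[2:(length(sorted) - 1)]
variance_full <- var(unit1)       # about 19.29
variance_trimmed <- var(trimmed)  # far smaller once the outlier is gone
```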
trimmed standard deviation
trimmed sd is calculated differently
s_T = trimmed standard deviation
n = number of cases in untrimmed batch
s_W^2 = variance of trimmed (winsorized) batch
n_T = number of cases in the trimmed batch
$$ s_T = \sqrt{\frac{(n - 1)\, s_W^2}{n_T - 1}} $$
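A rough R sketch of this calculation, under two stated assumptions: that the formula reads s_T = sqrt((n - 1) s_W^2 / (n_T - 1)), and that the winsorized batch is formed by replacing each trimmed case with the nearest remaining value and taking its ordinary sample variance. The function `trimmed_sd` and its argument `k` (cases trimmed per tail) are hypothetical names for this illustration.

```r
# Unit 1 rim diameters (cm) from the earlier slides; the vector name is illustrative only.
unit1 <- c(12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6)

# Hypothetical helper: trimmed standard deviation with k cases trimmed per tail.
trimmed_sd <- function(x, k = 1) {
  n <- length(x)
  s <- sort(x)
  # Winsorize: overwrite the k lowest and k highest values with their nearest neighbors.
  w <- s
  w[1:k] <- s[k + 1]
  w[(n - k + 1):n] <- s[n - k]
  n_t <- n - 2 * k  # number of cases left after trimming
  sqrt((n - 1) * var(w) / (n_t - 1))
}

s_t <- trimmed_sd(unit1)  # roughly 1.9 cm, versus about 4.39 cm untrimmed
```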