3 Descriptive Stats


  • 8/9/2019 3 Descriptive Stats


    Numeric Summaries and Descriptive Statistics


    populations vs. samples

    we want to describe both samples and populations

    describing a population (the latter) is a matter of inference


    outliers

    minority cases, so different from the majority that they merit separate consideration

    are they errors?

    are they indicative of a different pattern?

    think about possible outliers with care, but beware of mechanical treatments

    significance of outliers depends on your research interests


    summaries of distributions

    graphic vs. numeric

    graphic may be better for visualization

    numeric are better for statistical/inferential purposes

    resistance to outliers is usually an advantage in either case


    general characteristics

    kurtosis [peakedness]

    leptokurtic (more peaked) vs. platykurtic (flatter)


    skew (skewness)

    right (positive) skew: long tail toward high values

    left (negative) skew: long tail toward low values


    central tendency

    measures of central tendency

    provide a sense of the value expressed by multiple cases, overall

    mean

    median

    mode


    mean

    center of gravity

    evenly partitions the sum of all measurements among all cases; average of all measures

    $$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$


    mean pro and con

    crucial for inferential statistics

    mean is not very resistant to outliers

    a trimmed mean may be better for descriptive purposes


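    mean

    rim diameter (cm)

               unit 1   unit 2
                12.6     16.2
                11.6     16.4
                16.3     13.8
                13.1     13.2
                12.1     11.3
                26.9     14.0
                 9.7      9.0
                11.5     12.5
                14.8     15.6
                13.5     11.2
                12.4     12.2
                13.6     15.5
                         11.7

    n           12       13
    total      168.1    172.6
    total/n     14.0     13.3

    back-to-back stem-and-leaf plot (stem = whole cm, leaf = tenths of a cm)

             unit 1 | stem | unit 2
                  9 |  26  |
                    |  25  |
                    |  24  |
                    |  23  |
                    |  22  |
                    |  21  |
                    |  20  |
                    |  19  |
                    |  18  |
                    |  17  |
                  3 |  16  | 24
                    |  15  | 56
    14.0 ==       8 |  14  | 0
                651 |  13  | 28    == 13.3
                641 |  12  | 25
                 65 |  11  | 237
                    |  10  |
                  7 |   9  | 0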

    R: mean(x)
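    The totals and means in the table can be re-derived with a few lines of R. This is only a sketch: the vector names unit1 and unit2 and the helper mean_by_hand() are introduced here for illustration and are not part of the slides.

```r
# Rim diameters (cm), transcribed from the slide's table
unit1 <- c(12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6)
unit2 <- c(16.2, 16.4, 13.8, 13.2, 11.3, 14.0, 9.0, 12.5, 15.6, 11.2, 12.2, 15.5, 11.7)

# Hypothetical helper: the mean as "total / n", exactly as in the table
mean_by_hand <- function(x) sum(x) / length(x)

sum(unit1)                      # total for unit 1
length(unit1)                   # n for unit 1
round(mean_by_hand(unit1), 1)   # total/n, agrees with the slide's 14.0
round(mean(unit1), 1)           # built-in mean gives the same value
round(mean(unit2), 1)           # agrees with the slide's 13.3
```

    Note how the single 26.9 cm rim pulls the unit 1 mean above most of its own cases.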


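    trimmed mean

    rim diameter (cm), sorted; the lowest and highest case in each unit are trimmed

               unit 1   unit 2
                 9.7      9.0    (trimmed)
                11.5     11.2
                11.6     11.3
                12.1     11.7
                12.4     12.2
                12.6     12.5
                13.1     13.2
                13.5     13.8
                13.6     14.0
                14.8     15.5
                16.3     15.6
                26.9     16.2    (unit 1: trimmed)
                         16.4    (trimmed)

    n           10       11
    total      131.5    147.2
    total/n     13.2     13.4

    back-to-back stem-and-leaf plot (stem = whole cm, leaf = tenths of a cm)

             unit 1 | stem | unit 2
                  9 |  26  |
                    |  25  |
                    |  24  |
                    |  23  |
                    |  22  |
                    |  21  |
                    |  20  |
                    |  19  |
                    |  18  |
                    |  17  |
                  3 |  16  | 24
                    |  15  | 56
                  8 |  14  | 0
    13.2 ==     651 |  13  | 28    == 13.4
                641 |  12  | 25
                 65 |  11  | 237
                    |  10  |
                  7 |   9  | 0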

    R: mean(x, trim=.1)
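    What trim = .1 does can be spelled out by hand: sort the batch, drop the same number of cases from each tail, and average what is left. The helper trimmed_mean_by_hand() below is a hypothetical name used only for this sketch; it assumes the number of cases dropped per tail is floor(n * trim), which matches the slide's n of 10 and 11.

```r
# Rim diameters (cm), transcribed from the slide's table
unit1 <- c(12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6)
unit2 <- c(16.2, 16.4, 13.8, 13.2, 11.3, 14.0, 9.0, 12.5, 15.6, 11.2, 12.2, 15.5, 11.7)

# Hypothetical helper: sort, drop k cases from each tail, average the rest
trimmed_mean_by_hand <- function(x, trim = 0.1) {
  x <- sort(x)
  n <- length(x)
  k <- floor(n * trim)          # cases dropped from EACH end (assumption)
  mean(x[(k + 1):(n - k)])
}

trimmed_mean_by_hand(unit1)     # drops 9.7 and 26.9 before averaging
mean(unit1, trim = 0.1)         # built-in call from the slide
mean(unit2, trim = 0.1)         # drops 9.0 and 16.4 before averaging
```

    Trimming moves the unit 1 mean from about 14.0 to about 13.2, while unit 2 barely changes, which is the sense in which the trimmed mean is more resistant.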


    median

    50th percentile

    less useful for inferential purposes

    more resistant to effects of outliers


    median

    rim diameter (cm)

               unit 1   unit 2
                 9.7      9.0
                11.5     11.2
                11.6     11.3
                12.1     11.7
                12.4     12.2
                12.6     12.5
                12.9              (median of unit 1)
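    The median can be checked in R for both units. The sketch below uses the full rim-diameter batches from the earlier slides; median_by_hand() is a hypothetical helper that takes the middle case for an odd n and the midpoint of the two middle cases for an even n.

```r
# Rim diameters (cm), transcribed from the earlier slides
unit1 <- c(12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6)
unit2 <- c(16.2, 16.4, 13.8, 13.2, 11.3, 14.0, 9.0, 12.5, 15.6, 11.2, 12.2, 15.5, 11.7)

# Hypothetical helper: the 50th percentile of a sorted batch
median_by_hand <- function(x) {
  x <- sort(x)
  n <- length(x)
  if (n %% 2 == 1) {
    x[(n + 1) / 2]                    # odd n: the middle case
  } else {
    mean(x[c(n / 2, n / 2 + 1)])      # even n: midpoint of the two middle cases
  }
}

median_by_hand(unit1)   # midpoint of the 6th and 7th sorted cases (12.6 and 13.1)
median(unit1)           # built-in equivalent
median(unit2)           # the 7th of 13 sorted cases
```

    Unlike the mean, the unit 1 median stays below 13 cm despite the 26.9 cm rim.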


    mode

    the most numerous category

    for ratio data, often implies that data have been grouped in some way

    can be more or less created by the grouping procedure

    for theoretical distributions, simply the location of the peak on the frequency distribution
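    The dependence of the mode on grouping can be illustrated with the unit 2 rim diameters from the earlier slides. This is a sketch under assumed class widths (1 cm and 2 cm, with class boundaries at whole numbers); neither choice comes from the slides.

```r
# Rim diameters (cm) for unit 2, transcribed from the earlier slides
unit2 <- c(16.2, 16.4, 13.8, 13.2, 11.3, 14.0, 9.0, 12.5, 15.6, 11.2, 12.2, 15.5, 11.7)

# Group into 1 cm classes; each class is labelled by its lower bound
counts_1cm <- table(floor(unit2))
counts_1cm
modal_1cm <- names(which.max(counts_1cm))   # first class with the highest count

# Group into 2 cm classes (lower bounds 8, 10, 12, ...)
counts_2cm <- table(2 * floor(unit2 / 2))
counts_2cm
modal_2cm <- names(which.max(counts_2cm))

c(modal_1cm = modal_1cm, modal_2cm = modal_2cm)
```

    With 1 cm classes the modal class starts at 11 cm; with 2 cm classes it starts at 12 cm. The data are the same, only the grouping changed.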


    [figure: frequency distribution of settlement classes (isolated scatters, hamlets, villages, regional centers)]

    modal class = hamlets


    dispersion

    measures of dispersion

    summarize degree of clustering of cases, esp. with respect to central tendency

    range

    variance

    standard deviation


    range

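    rim diameter (cm), sorted

               unit 1   unit 2
                 9.7      9.0
                11.5     11.2
                11.6     11.3
                12.1     11.7
                12.4     12.2
                12.6     12.5
                13.1     13.2
                13.5     13.8
                13.6     14.0
                14.8     15.5
                16.3     15.6
                26.9     16.2
                         16.4

    back-to-back stem-and-leaf plot; the bars (* to *) mark the range of each unit

         unit 1 | stem | unit 2
    *         9 |  26  |
    |           |  25  |
    |           |  24  |
    |           |  23  |
    |           |  22  |
    |           |  21  |
    |           |  20  |
    |           |  19  |
    |           |  18  |
    |           |  17  |
    |         3 |  16  | 24     *
    |           |  15  | 56     |
    |         8 |  14  | 0      |
    |       651 |  13  | 28     |
    |       641 |  12  | 25     |
    |        65 |  11  | 237    |
    |           |  10  |        |
    *         7 |   9  | 0      *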

    would be better to use midspread (interquartile range)

    R: range(x)
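    A short comparison of the range and the midspread for the two units. IQR() is base R's interquartile range; it uses R's default quantile rule, which may differ slightly from hinges computed by hand, so treat the exact midspread values as one reasonable convention rather than the slide's own figures.

```r
# Rim diameters (cm), transcribed from the slide's table
unit1 <- c(12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6)
unit2 <- c(16.2, 16.4, 13.8, 13.2, 11.3, 14.0, 9.0, 12.5, 15.6, 11.2, 12.2, 15.5, 11.7)

range(unit1)            # minimum and maximum, as in the slide's R call
range(unit2)

# Width of the range as a single number
width1 <- diff(range(unit1))
width2 <- diff(range(unit2))

# Midspread: spread of the middle half of the cases
mid1 <- IQR(unit1)
mid2 <- IQR(unit2)

rbind(range_width = c(unit1 = width1, unit2 = width2),
      midspread   = c(unit1 = mid1,   unit2 = mid2))
```

    Unit 1 has the wider range only because of the 26.9 cm rim; its midspread is narrower than that of unit 2.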


    variance

    analogous to average deviation of cases from mean

    in fact, based on sum of squared deviations from the mean ("sum-of-squares")

    $$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

    R: var(x)


    variance

    computational form:

    $$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 / n}{n - 1}$$
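    The definitional and computational forms of the variance give the same number, which can be confirmed on the unit 1 rim diameters from the earlier slides. The variable names below are illustrative only.

```r
# Rim diameters (cm) for unit 1, transcribed from the earlier slides
unit1 <- c(12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6)
n <- length(unit1)

# Definitional form: sum of squared deviations from the mean, over n - 1
s2_def <- sum((unit1 - mean(unit1))^2) / (n - 1)

# Computational form: sum of squares minus (sum)^2 / n, over n - 1
s2_comp <- (sum(unit1^2) - sum(unit1)^2 / n) / (n - 1)

# Built-in sample variance (also divides by n - 1)
s2_builtin <- var(unit1)

c(definitional = s2_def, computational = s2_comp, builtin = s2_builtin)
```

    The computational form avoids computing each deviation, which mattered for hand calculation; in floating-point arithmetic the definitional form is generally the safer of the two.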


    note: units of variance are squared

    this makes variance hard to interpret

    ex.: projectile point sample:

    mean = 22.6 mm

    variance = 38 mm²

    what does this mean???


    standard deviation

    square root of variance:

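    $$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

    computational form:

    $$s = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 / n}{n - 1}}$$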


    standard deviation

    units are in same units as base measurements

    ex.: projectile point sample:

    mean = 22.6 mm

    standard deviation = 6.2 mm

    mean +/- sd (16.4 to 28.8 mm) should give at least some intuitive sense of where most of the cases lie, barring major effects of outliers
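    The mean +/- sd interval is simple arithmetic, and the "where most of the cases lie" claim can be checked on real cases. The projectile point measurements themselves are not given in the slides, so the sketch below only reproduces the interval from the quoted summary values and then counts cases within one sd of the mean for the rim-diameter batches from the earlier slides. within_one_sd() is a hypothetical helper.

```r
# Summary values quoted on the slide (projectile point sample, mm)
m <- 22.6
s <- 6.2
interval <- m + c(-1, 1) * s    # lower and upper bound of mean +/- sd
interval

# Rim diameters (cm), transcribed from the earlier slides
unit1 <- c(12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6)
unit2 <- c(16.2, 16.4, 13.8, 13.2, 11.3, 14.0, 9.0, 12.5, 15.6, 11.2, 12.2, 15.5, 11.7)

# Hypothetical helper: TRUE for cases no more than one sd from the mean
within_one_sd <- function(x) abs(x - mean(x)) <= sd(x)

inside1 <- sum(within_one_sd(unit1))   # number of unit 1 cases inside
inside2 <- sum(within_one_sd(unit2))   # number of unit 2 cases inside
c(unit1 = inside1 / length(unit1), unit2 = inside2 / length(unit2))
```

    In unit 1 the outlier inflates the sd so much that every other case falls inside the interval, which is the "barring major effects of outliers" caveat in action.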


    rim diameter (cm), deviations from the mean, and squared deviations

                      unit 1                            unit 2
                x    x - mean  (x - mean)^2       x    x - mean  (x - mean)^2
              12.6     -1.4        1.98         16.2      2.9        8.54
              11.6     -2.4        5.80         16.4      3.1        9.75
              16.3      2.3        5.25         13.8      0.5        0.27
              13.1     -0.9        0.83         13.2     -0.1        0.01
              12.1     -1.9        3.64         11.3     -2.0        3.91
              26.9     12.9      166.20         14.0      0.7        0.52
               9.7     -4.3       18.56          9.0     -4.3       18.29
              11.5     -2.5        6.29         12.5     -0.8        0.60
              14.8      0.8        0.63         15.6      2.3        5.40
              13.5     -0.5        0.26         11.2     -2.1        4.31
              12.4     -1.6        2.59         12.2     -1.1        1.16
              13.6     -0.4        0.17         15.5      2.2        4.94
                                                11.7     -1.6        2.49

                     unit 1    unit 2
    mean:             14.0      13.3
    n:                12        13
    sum of sq.:      212.19     60.20
    variance:         19.29      5.02
    stand. dev.:       4.39      2.24
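    The worked table can be reproduced step by step in R. The sketch below does so for unit 1; the column names in the data frame are illustrative, and the squared deviations are computed from the unrounded mean, as the slide's values appear to be.

```r
# Rim diameters (cm) for unit 1, transcribed from the slide's table
unit1 <- c(12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6)
n <- length(unit1)

deviation <- unit1 - mean(unit1)   # second column of the table
squared   <- deviation^2           # third column of the table

# Rounded for display only; the sums below use the unrounded values
worked <- data.frame(x = unit1,
                     deviation = round(deviation, 1),
                     squared   = round(squared, 2))
print(worked)

sum_of_squares <- sum(squared)
variance       <- sum_of_squares / (n - 1)
std_dev        <- sqrt(variance)

round(c(sum_of_sq = sum_of_squares, variance = variance, sd = std_dev), 2)
```

    The last line should agree with the slide's 212.19, 19.29 and 4.39, and with var(unit1) and sd(unit1).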


    trimmed dispersion measures

    variance and sd are even more sensitive to extreme values (outliers) than the mean

    why??

    you can calculate a trimmed version of the variance simply by eliminating cases from the tails, and calculating the variance in the normal way
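    One way to see the answer to "why??": deviations are squared before they are summed, so a single extreme case can dominate the sum of squares. The sketch below measures that share for unit 1 of the rim-diameter data and then computes a trimmed variance by dropping one case from each tail. trimmed_var() and its argument k (cases dropped per tail) are hypothetical names for this illustration.

```r
# Rim diameters (cm) for unit 1, transcribed from the earlier slides
unit1 <- c(12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6)

# Share of the sum of squares contributed by the single largest deviation
squared <- (unit1 - mean(unit1))^2
outlier_share <- max(squared) / sum(squared)
outlier_share

# Hypothetical helper: drop k cases from each tail, then take the ordinary variance
trimmed_var <- function(x, k = 1) {
  x <- sort(x)
  n <- length(x)
  var(x[(k + 1):(n - k)])
}

full_variance    <- var(unit1)
trimmed_variance <- trimmed_var(unit1, k = 1)   # drops 9.7 and 26.9
c(full = full_variance, trimmed = trimmed_variance)
```

    The 26.9 cm rim alone accounts for more than three quarters of the unit 1 sum of squares, so removing it and the lowest case cuts the variance sharply.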


    trimmed standard deviation

    trimmed sd is calculated differently

    $$s_T = \sqrt{\frac{(n - 1)\, s_W^2}{n_T - 1}}$$

    s_T = trimmed standard deviation

    n = number of cases in untrimmed batch

    s_W^2 = variance of trimmed (winsorized) batch

    n_T = number of cases in the trimmed batch
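    A minimal sketch of the trimmed standard deviation, assuming "winsorized" has its usual meaning: each trimmed case is replaced by the nearest retained value, so the winsorized batch still has n cases. It also assumes the formula is the square root of (n - 1) times the winsorized variance over (n_T - 1), and that floor(n * trim) cases are cut from each tail. The function name trimmed_sd() and these choices are assumptions for illustration, not taken from the slides.

```r
# Rim diameters (cm) for unit 1, transcribed from the earlier slides
unit1 <- c(12.6, 11.6, 16.3, 13.1, 12.1, 26.9, 9.7, 11.5, 14.8, 13.5, 12.4, 13.6)

# Hypothetical helper implementing sT = sqrt((n - 1) * s2w / (nT - 1))
trimmed_sd <- function(x, trim = 0.1) {
  x  <- sort(x)
  n  <- length(x)               # cases in the untrimmed batch
  k  <- floor(n * trim)         # cases cut from each tail (assumption)
  nT <- n - 2 * k               # cases in the trimmed batch

  # Winsorize: overwrite the k lowest / highest cases with the nearest kept value
  w <- x
  if (k > 0) {
    w[1:k] <- x[k + 1]
    w[(n - k + 1):n] <- x[n - k]
  }

  s2w <- var(w)                 # variance of the winsorized batch
  sqrt((n - 1) * s2w / (nT - 1))
}

sT <- trimmed_sd(unit1, trim = 0.1)
c(ordinary_sd = sd(unit1), trimmed_sd = sT)
```

    For unit 1 the trimmed sd comes out well under half the ordinary sd, since the 26.9 cm rim no longer dominates the sum of squares.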