Descriptive Statistics

Descriptive Statistics. the everyday notions of central tendency Usual Customary Most Standard Expected normal Ordinary Medium commonplace NY Times, 10/24


the everyday notions of central tendency

Usual Customary Most Standard Expected normal Ordinary Medium commonplace

NY Times, 10/24/2010: "Stories vs. Statistics" by John Allen Paulos


Overview
• What are descriptive statistics?
• A bit of terminology/notation
• Measures of Central Tendency
  Mean, Mode, Median
• Measures of Variability
  Ranges, Standard Deviations
• The Normal Curve


Terminology/Notation
• A data distribution = a set of data/scores (the whole thing)
  1, 2, 4, 7
• X = a raw, single score (e.g., 2 from above)
• ∑ = summation (added up)
  ∑X = 14 (each individual score added up)
• n = sample size (distribution size, or number of scores)
  n = 4 (from above)
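
The notation above maps directly onto a few lines of Python. This is only an illustrative sketch; the variable names are assumptions and not part of the slides.

```python
# Notation from the slide, using the example distribution 1, 2, 4, 7.
scores = [1, 2, 4, 7]   # the data distribution (the whole thing)
x = scores[1]           # X: a raw, single score (here, the 2)
sum_x = sum(scores)     # sum of X: each individual score added up
n = len(scores)         # n: sample size (number of scores)

print(f"X = {x}, sum of X = {sum_x}, n = {n}")
```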


Descriptive Statistics
• Descriptive statistics are the side of statistics we most often use in our everyday lives
• Realize that most observations/data are too “large” for a human to take in and comprehend, so we must “reduce” them
• How can we summarize what we see?
• Example: Grades/Registrar


Making sense out of chaos

Descriptive Statistics
• Descriptive statistics = describing the data
• n = 50, a test score of 83%
  Where does it fit in the class?


Descriptive Statistics
• Transform a set of numbers or observations into indices that describe or characterize the data
• “Summary statistics”
• A large group of statistics that are used in all research manuscripts
• Even the most complex statistical tests and studies start with descriptive statistics


Descriptive Statistics

Measurement Scales
• Nominal
• Ordinal
• Interval
• Ratio

Graphic Portrayals
• Frequencies
• Histograms
• Bar graphs
• Normal distribution

Central Tendency
• Mean
• Median
• Mode

Relationship
• Scatterplot
• Correlation
• Regression

Variability
• Range
• Standard deviation
• Standardized scores


Descriptive Statistics
• Descriptive statistics usually accomplish two major goals:
  1) Describe the central location of the data
  2) Describe how the data are dispersed about that point
• In other words, they provide:
  1) Measures of Central Tendency
  2) Measures of Variability


Measure of Central Tendency
• What SINGLE summary value best describes the CENTRAL location of an entire distribution?
• Mode: the value that occurs most often
• Median: the value above and below which 50% of the cases fall (the middle; 50th percentile)
• Mean: mathematical balance point; arithmetic/mathematical average


Mode
• Most frequent occurrence
• What if data were?
  17, 19, 20, 20, 22, 23, 25, 28
  17, 19, 20, 20, 22, 23, 23, 28
• Problem: a set of numbers can be bimodal, or trimodal, depending on the scores
• Not a stable measure
  Ex. 17, 19, 20, 22, 23, 28, 28
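
The three example distributions can be checked with Python's standard `statistics.multimode`, which returns every value tied for most frequent. This is a rough sketch; the dictionary labels are assumptions added for illustration.

```python
from statistics import multimode

# The three example distributions from the slide (labels are illustrative).
datasets = {
    "one mode": [17, 19, 20, 20, 22, 23, 25, 28],
    "bimodal": [17, 19, 20, 20, 22, 23, 23, 28],
    "mode at the extreme": [17, 19, 20, 22, 23, 28, 28],
}

# multimode() returns a list, so ties (bimodal, trimodal) are all reported.
modes = {label: multimode(values) for label, values in datasets.items()}

for label, found in modes.items():
    print(f"{label}: {found}")
```

Changing a single score (25 to 23) adds a second mode, and in the last set the mode sits at the top of the distribution, which is why the mode is described as unstable.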


Median
• Rank numbers, pick middle one
• What if data were…?
  17, 19, 20, 23, 23, 28
• Solution: add up two middle scores, divide by 2 (= 21.5)
• Best measure in an asymmetrical (i.e., skewed) distribution; not sensitive to extreme scores
  Ex. 17, 19, 20, 23, 23, 428
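
A short sketch of the median examples, using the standard `statistics` module. The comparison with the mean is added here only to show what "not sensitive to extreme scores" means; variable names are assumptions.

```python
from statistics import mean, median

scores = [17, 19, 20, 23, 23, 28]
with_extreme = [17, 19, 20, 23, 23, 428]   # same scores, one extreme value

# With an even number of scores, median() averages the two middle ones.
median_scores = median(scores)
median_extreme = median(with_extreme)

# The mean, by contrast, is pulled toward the extreme score.
mean_scores = mean(scores)
mean_extreme = mean(with_extreme)

print(f"median: {median_scores} vs {median_extreme}")
print(f"mean:   {mean_scores:.1f} vs {mean_extreme:.1f}")
```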


Mean = X̄
• Add up the numbers and divide by the sample size (the number of numbers!)
• Try this one… 2, 3, 5, 6, 9
  2 + 3 + 5 + 6 + 9 = 25; 25 / 5 = 5
• (Usually) best measure of the three: uses the most information (all values from distribution contribute)

$\bar{X} = \frac{\sum X}{n}$
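
The formula for the mean can be written out as a small function. This is a minimal sketch; the function name `arithmetic_mean` is hypothetical.

```python
def arithmetic_mean(scores):
    """Return the sum of the scores divided by n (the number of scores)."""
    return sum(scores) / len(scores)

# The "try this one" example from the slide.
example = [2, 3, 5, 6, 9]
example_mean = arithmetic_mean(example)
print(example_mean)
```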


Characteristics of the Mean
• Balance point
• Point around which deviations sum to zero
• Deviation = X − X̄
• For instance, if scores are 2, 3, 5, 6, 9, the mean is 5
• Sum of deviations: (−3) + (−2) + 0 + 1 + 4 = 0
  ∑(X − X̄) = 0
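
The balance-point property can be verified numerically. A small sketch using the same five scores; variable names are illustrative.

```python
scores = [2, 3, 5, 6, 9]
mean_score = sum(scores) / len(scores)

# Deviation of each score from the mean: X minus the mean.
deviations = [x - mean_score for x in scores]

# The deviations always sum to zero around the mean.
total_deviation = sum(deviations)

print(deviations, total_deviation)
```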


Characteristics of the Mean
• Affected by extreme scores
• Example 1: Scores 7, 11, 11, 14, 17
  Mean = 12, Mode and Median = 11
• Example 2: Scores 7, 11, 11, 14, 170
  Mean = 42.6, Mode & Median = 11
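
Both examples can be reproduced with the standard `statistics` module. This is a sketch; the helper name `summarize` is hypothetical.

```python
from statistics import mean, median, mode

def summarize(scores):
    """Return (mean, median, mode) for a list of scores."""
    return mean(scores), median(scores), mode(scores)

example_1 = summarize([7, 11, 11, 14, 17])
example_2 = summarize([7, 11, 11, 14, 170])   # one extreme score

# Only the mean moves when 17 is replaced by 170.
print("Example 1:", example_1)
print("Example 2:", example_2)
```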


Characteristics of the Mean
• Balance point
• Affected by extreme scores
• Appropriate for use with interval or ratio scales of measurement
• More stable than Median or Mode when multiple samples are drawn from the same population
• Basis for inferential stats


Guidelines to Choose Measure of Central Tendency
• Mean is preferred because it is the basis of inferential statistics
• Median may be better for skewed data
  Distribution of wealth in the US, e.g., annual household income in Washington state for 2000: mean = $76,818; median = $42,024
• Mode to describe average of nominal data (eye color, hair color, etc.)


Normal Distribution
[Figure: normal curve; x-axis: Scores; y-axis: Frequency (how often a score occurs)]


MLB batting averages over 3-year span (min. 100 AB)

Mean = 0.267

n = 1291


Normal Distribution
“Normal” distribution indicates the data are perfectly symmetrical
[Figure: normal curve with Mean, Median, and Mode labeled; x-axis: Scores]


Positively skewed distribution
[Figure: positively skewed curve with Mode, Median, and Mean labeled; x-axis: Scores]


NFL Salaries 2011


Negatively skewed distribution
[Figure: negatively skewed curve with Mean, Median, and Mode labeled; x-axis: Scores]


Relationship among the MCT & shape of distribution


Alaska’s average elevation of 1900 feet is less than that of Kansas. Nothing in that average suggests the 16 highest mountains in the United States are in Alaska. Averages mislead, don’t they?

Grab Bag, Pantagraph, 08/03/2000


Variability

Measures of dispersion or spread

The only thing constant is variation.


the notions of variability

• Unusual
• Peculiar
• Strange
• Original
• Extreme
• Special
• Unlike
• Deviant
• Dissimilar
• Different

NY Times, 10/24/2010: "Stories vs. Statistics" by John Allen Paulos


Variability defined
• Measures of Central Tendency provide a summary level of the data
• Recognizes that scores vary across individual cases
  i.e., the mean or median may not be an actual score in your distribution
• Variability quantifies the spread of performance
  How scores vary around mean/mode/median


To describe a distribution
1) Measure of Central Tendency
   Mean, Mode, Median
2) Measure of Variability
   Multiple measures
   Range, Interquartile Range, Semi-Interquartile Range
   Standard Deviation


Range
• Range = difference between low/high score
• # of hours spent watching TV/week
  2, 5, 7, 7, 8, 8, 10, 12, 12, 15, 17, 20
• Range = (Max − Min) score
  20 − 2 = 18
• Very susceptible to outliers
• Doesn’t indicate anything about variability around the mean/central point
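
The range calculation for the TV-hours data, as a brief sketch. The second data set with a single extreme viewer is a hypothetical addition to illustrate the outlier point.

```python
def score_range(scores):
    """Range = highest score minus lowest score."""
    return max(scores) - min(scores)

tv_hours = [2, 5, 7, 7, 8, 8, 10, 12, 12, 15, 17, 20]
tv_range = score_range(tv_hours)

# Hypothetical: replace the top score with one extreme viewer (60 hours).
with_outlier = tv_hours[:-1] + [60]
outlier_range = score_range(with_outlier)

print(tv_range, outlier_range)
```

One changed score moves the range from 18 to 58 even though the other eleven scores are identical.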


Semi-Interquartile range
• What is a quartile?
  Divide sample into 4 parts of equal size
  Q1, Q2, Q3 = quartile points
• Interquartile Range = Q3 − Q1
  Difference between the highest (Q3) and lowest (Q1) quartile points
• SIQR = IQR / 2
• Related to the Median…prevents outliers from overly skewing the measure
• For ordinal data or skewed interval/ratio data
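
A sketch of IQR and SIQR using `statistics.quantiles` from the Python standard library, applied to the TV-hours data from the Range slide. Note the assumption: textbooks and software use several slightly different conventions for locating quartile points, so Q1 and Q3 computed by hand may differ a little from what this function returns.

```python
from statistics import median, quantiles

tv_hours = [2, 5, 7, 7, 8, 8, 10, 12, 12, 15, 17, 20]

# n=4 splits the data into four parts and returns the three cut points.
q1, q2, q3 = quantiles(tv_hours, n=4)

iqr = q3 - q1      # interquartile range
siqr = iqr / 2     # semi-interquartile range

print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}")
print(f"IQR = {iqr}, SIQR = {siqr}")
print(f"Q2 equals the median: {q2 == median(tv_hours)}")
```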


BMD and walking
Quartiles based on miles walked/week

Krall et al, 1994, Walking is related to bone density and rates of bone loss. AJSM, 96:20-26


Notes: Skewed Distribution?

95th Percentile?

50th Percentile vs Median?


Standard Deviation
• Most commonly accepted measure of spread
  1. Compute the deviations of all numbers from the mean
  2. Square each of the deviations and THEN sum them
  3. Divide by the number of deviations
  4. Finally, take the square root

$SD = \sqrt{\frac{\sum (X - \bar{X})^2}{n}}$

Variation itself is nature's only irreducible essence. (Stephen Jay Gould)


Standard Deviation
Distribution = 1, 3, 5, 7
X̄ = 16 / 4 = 4
1) Compute deviations = −3, −1, 1, 3
2) Square deviations = 9, 1, 1, 9
3) Sum squared deviations = 20
4) Divide by n = 20 / 4 = 5
5) Take square root = √5 ≈ 2.2
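
The worked example can be followed step by step in Python. This is a sketch under one assumption worth stating: the slides divide by n, which matches `statistics.pstdev` (population SD); `statistics.stdev` divides by n − 1 and would give a larger value.

```python
import math
from statistics import pstdev

scores = [1, 3, 5, 7]
n = len(scores)
mean_score = sum(scores) / n                      # 16 / 4

deviations = [x - mean_score for x in scores]     # 1) deviations from the mean
squared = [d ** 2 for d in deviations]            # 2) square each deviation
sum_squared = sum(squared)                        # 3) sum the squared deviations
variance = sum_squared / n                        # 4) divide by n
sd = math.sqrt(variance)                          # 5) take the square root

print(f"SD by hand = {sd:.2f}")
print(f"pstdev()   = {pstdev(scores):.2f}")       # divides by n, like the slide
```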


Key points about SD
• SD small → data clustered round mean
• SD large → data scattered from the mean
• Affected by extreme scores (just like mean)…oftentimes called “outliers”
• Consistent (more stable) across samples from the same population
  Just like the mean, so it works well with inferential stats (where repeated samples are taken)


SD Example
• Three NFL quarterbacks with similar QB ratings in 2006:
  Matt Hasselbeck (SEA) = 76.0
  Rex Grossman (CHI) = 73.9
  Brett Favre (GB) = 72.7
• Note: QB rating involves a complex formula accounting for passing attempts, completions, yards, touchdowns, and interceptions…100+ is considered outstanding & 70-80 is average
• All appear to have had very similar, somewhat mediocre seasons as QBs


SD Example
• Let’s look at the SD of their game-by-game QB ratings:
  Matt Hasselbeck (SEA) = 29.97
  Rex Grossman (CHI) = 47.60
  Brett Favre (GB) = 27.81
• Grossman had, by far, the most variability (i.e., inconsistency) in his game-by-game performances…is this good or bad?


Clinical Use of SD


SD and the normal curve
• The following concepts are critical to your understanding of how descriptive statistics works
• Remember: a “normal” curve is perfectly symmetrical. This is not typical, but usually data are almost normal…


SD and the normal curve
X̄ = 70, SD = 10
About 68% of scores fall within 1 SD of mean
[Figure: normal curve with axis ticks at 60, 70, 80; 34.1% of scores on each side of the mean]


The standard deviation and the normal curve
X̄ = 70, SD = 10
About 68% of scores fall between 60 and 80
[Figure: normal curve with axis ticks at 60, 70, 80; 34% of scores on each side of the mean]


The standard deviation and the normal curve
X̄ = 70, SD = 10
About 95% of scores fall within 2 SD of mean
[Figure: normal curve with axis ticks at 50, 60, 70, 80, 90; band labels 34.1% and 13.6% on each side of the mean]


The standard deviation and the normal curve
X̄ = 70, SD = 10
About 95% of scores fall between 50 and 90
[Figure: normal curve with axis ticks at 50, 60, 70, 80, 90; band labels 34.1% and 13.6% on each side of the mean]


The standard deviation and the normal curve
X̄ = 70, SD = 10
About 99.7% of scores fall within 3 S.D. of the mean
[Figure: normal curve with axis ticks at 40, 50, 60, 70, 80, 90, 100; band labels 34.1%, 13.6%, and 2.3% on each side of the mean]


The standard deviation and the normal curve
X̄ = 70, SD = 10
About 99.7% of scores fall between 40 and 100
[Figure: normal curve with axis ticks at 40, 50, 60, 70, 80, 90, 100; band labels 34.1%, 13.6%, and 2.3% on each side of the mean]
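
The 68 / 95 / 99.7 percentages can be checked against an idealized normal curve with `statistics.NormalDist` (Python 3.8+). This is a sketch that assumes the scores are exactly normal with mean 70 and SD 10; real data will only approximate these shares.

```python
from statistics import NormalDist

mean_score, sd = 70, 10
curve = NormalDist(mu=mean_score, sigma=sd)

# Share of scores within k standard deviations of the mean.
shares = {}
for k in (1, 2, 3):
    low, high = mean_score - k * sd, mean_score + k * sd
    shares[k] = curve.cdf(high) - curve.cdf(low)
    print(f"within {k} SD ({low} to {high}): {shares[k]:.1%}")
```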


What about X̄ = 70, SD = 5?
• What approximate percentage of scores fall between 65 & 75?
  …1 SD below + 1 SD above = 68%
• What range includes about 99.7% of all scores?
  …3 SD below to 3 SD above = 55 to 85


Interpreting The Normal Table
• Area under Normal Curve
  Specific SD values (z) include certain percentages of the scores
• Values of Special Interest
  1.96 SD = 47.5% of scores (47.5 + 47.5 = 95%)
  2.58 SD = 49.5% of scores (49.5 + 49.5 = 99%)
• i.e., 95% of scores fall within 1.96 standard deviations of the mean (1.96 above and 1.96 below)
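
The two table values of special interest can be recomputed from the standard normal curve (mean 0, SD 1) with `statistics.NormalDist`. A small sketch; the helper name `area_mean_to_z` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist()   # mean 0, SD 1

def area_mean_to_z(z):
    """Share of scores between the mean and z standard deviations above it."""
    return standard_normal.cdf(z) - 0.5

areas = {z: area_mean_to_z(z) for z in (1.96, 2.58)}
for z, one_side in areas.items():
    print(f"z = {z}: {one_side:.1%} on one side, {2 * one_side:.1%} on both sides")
```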


IQ
X̄ = 100, SD = 15
68% have an IQ between 85-115
[Figure: normal curve with axis ticks at 55, 70, 85, 100, 115, 130, 145; band labels 34.1%, 13.6%, and 2.3% on each side of the mean]


MLB players’ batting averages over a 3-year span (min. 100 at bats)

~95% of players have an average between 0.196 and 0.337


Next Week…
• We will utilize our understanding of descriptive statistics concepts, including central tendency, variability, and the normal curve, to examine standardized scores
• Homework = Cronk 3.1 – 3.4
• Bring calculator to class
• In-class activity 2…