
Statistics- Landmark Summaries Interpreting Typical Values and Percentiles


8/3/2019 Statistics- Landmark Summaries Interpreting Typical Values and Percentiles

http://slidepdf.com/reader/full/statistics-landmark-summaries-interpreting-typical-values-and-percentiles 1/31

Slide 4-1

2/10/2012

Chapter 4

Landmark Summaries: Interpreting Typical Values and Percentiles

Slide 4-2

Average or Mean

Add the data, divide by n or N (the number of elementary units)

Divides the total equally; the only summary that does so

A representative, central number (if the data set is approximately normal)

Summation notation

 – Σ is capital Greek sigma

Sample average

$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

Population average

$$\mu = \frac{X_1 + X_2 + \cdots + X_N}{N} = \frac{1}{N}\sum_{i=1}^{N} X_i$$

Slide 4-3

Example: Number of Defects

Defects measured for each of 10 production lots

4, 1, 3, 7, 3, 0, 7, 14, 5, 9

Fig 4.1.1: Histogram of defects per lot (horizontal axis: defects per lot, 0 to 20; vertical axis: frequency in lots)

Average is 5.3 defects per lot (the ten values listed sum to 53)
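As a minimal sketch of "add the data, divide by n", the following Python computes the average for the ten lots listed above. The function name `average` is chosen here for illustration and is not from the slides.

```python
def average(values):
    """Sample average: add the data, then divide by n."""
    n = len(values)
    if n == 0:
        raise ValueError("average of an empty data set is undefined")
    return sum(values) / n


# Defects measured for each of the 10 production lots (from the slide)
defects = [4, 1, 3, 7, 3, 0, 7, 14, 5, 9]

total = sum(defects)
mean_defects = average(defects)

# "Divides the total equally": n lots at the average reproduce the total
print(total, mean_defects, mean_defects * len(defects))
```

The last printed value illustrates why the average preserves totals: multiplying it by the number of lots gives back the total number of defects, up to floating-point rounding.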

Slide 4-4

Median

Also summarizes the data

The middle one

 – Put data in order
 – Pick the middle one (or average the middle two if n is even)
 – Median(9, 4, 5) = Median(4, 5, 9) = 5
 – Median(9, 4, 5, 7) = Median(4, 5, 7, 9) = (5+7)/2 = 6

Rank of the median is (1+n)/2

 – If n=3, rank is (1+3)/2 = 2
 – If n=4, rank is (1+4)/2 = 2.5 (so average the 2nd and 3rd)
 – If n=262, rank is (1+262)/2 = 131.5 (so average the 131st and 132nd)
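The rank rule can be sketched in Python as follows. This is an illustrative implementation of the (1+n)/2 rule from the slide, with the function name `median` chosen here; it is not code from the course.

```python
def median(values):
    """Median via the rank rule: the rank of the median is (1 + n) / 2."""
    ordered = sorted(values)          # put data in order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    rank = (1 + n) / 2                # 1-based rank, may end in .5
    lower = int(rank)                 # discard the decimal, if any
    if rank == lower:                 # odd n: pick the middle one
        return ordered[lower - 1]
    # even n: average the two middle values
    return (ordered[lower - 1] + ordered[lower]) / 2


print(median([9, 4, 5]))      # 5
print(median([9, 4, 5, 7]))   # 6.0
```

With n=262 the rank is 131.5, so the function averages the 131st and 132nd ordered values.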

Slide 4-5

Median (continued)

A representative, central number 

 – If the data set has a center

Less sensitive to outliers than the average

For skewed data, represents the "typical case" better than the average does

 – e.g., incomes
   Average income for a country equally divides the total, which may include some very high incomes
   Median income chooses the middle person (half earn less, half earn more), giving less influence to high incomes (if any)

Slide 4-6

Example: Spending

Customers plan to spend ($thousands)

3.8, 1.4, 0.3, 0.6, 2.8, 5.5, 0.9, 1.1

Rank ordered from smallest to largest

Rank:    1    2    3    4    5    6    7    8
Value:  0.3  0.6  0.9  1.1  1.4  2.8  3.8  5.5

Rank of median = (1+8)/2 = 4.5

Median is (1.1+1.4)/2 = 1.25

 – Smaller than the average, 2.05
 – Due to slight skewness?

[Figure: stem-and-leaf display of the spending data (stems 0 to 5), with the median and average marked]

Slide 4-7

Example: The Crash of 1987

Dow-Jones Industrials, stock-price changes as each stock began trading that fateful morning

Fairly normal

Mean and median are similar 

Fig 4.1.2: Histogram of percent change at opening (horizontal axis: -20% to 0%; vertical axis: frequency)

Average = -8.2%

Median = -8.6%

Slide 4-8

Example: Incomes

Personal income of 100 people

Average is higher than median due to skewness

Fig 4.1.3: Histogram of income (horizontal axis: income, $0 to $200,000; vertical axis: frequency, 0 to 50)

Average = $38,710

Median = $27,216

Slide 4-9

Mode

Also summarizes the data

Most common data value

 – Middle of tallest histogram bar

Problems:

 – Depends on how you draw the histogram (bin width)
 – Might be more than one mode (two tallest bars)

Good if most data values are "correct"

Good for nominal data (e.g., elections)

[Figure: histogram with two modes marked]
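A small Python sketch of "most common data value" that also returns every tied value, since there might be more than one mode. The function name `modes` and the ballot list are hypothetical, made up for illustration; the defect counts are the ones from the earlier example.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs most often (there may be ties)."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    # Counter keeps first-seen order, so ties come back in data order
    return [value for value, count in counts.items() if count == top]


# Nominal data: hypothetical ballots, invented for this illustration
ballots = ["Lee", "Kim", "Lee", "Park", "Kim", "Lee"]
print(modes(ballots))

# Quantitative data: defects per lot, where two values tie
defects = [4, 1, 3, 7, 3, 0, 7, 14, 5, 9]
print(modes(defects))
```

This counts exact values rather than histogram bars, so it sidesteps the bin-width problem only for data with repeated values; for continuous measurements the mode still depends on how the data are grouped.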

Slide 4-10

Normal Distribution

Average, median, and mode are identical

 – If the data come from a normal distribution (for a sample, they agree only approximately)

[Figure: normal distribution curve; average, median, and mode are identical in the case of a normal distribution]

Slide 4-11

Skewed Distribution

Average, median, and mode are different

 – The few large (or small) values influence the mean more than the median
 – The highest point is not in the center

[Figure: skewed distribution curve with the average, median, and mode marked]

Slide 4-12

Which summary to use?

Average

 – Best for normal data
 – Preserves totals

Median

 – Good for skewed data or data with outliers, provided you do not need to preserve or estimate total amounts

Mode

 – Best for categories (nominal data)
 – The mode is the only summary computable for nominal data!

Slide 4-13

Which Summary? (continued)

Average requires quantitative data (numbers)

Median works with quantitative or ordinal

Mode works with quantitative, ordinal, or nominal

          Quantitative   Ordinal   Nominal
Average   Yes            -         -
Median    Yes            Yes       -
Mode      Yes            Yes       Yes

Slide 4-14

Weighted Average

Ordinary average gives the same weight to all elementary units

$$\bar{X} = \frac{1}{n}X_1 + \frac{1}{n}X_2 + \cdots + \frac{1}{n}X_n$$

Weighted average allows different weights

$$\text{Weighted average} = w_1 X_1 + w_2 X_2 + \cdots + w_n X_n$$

Weights must add up to 1

$$w_1 + w_2 + \cdots + w_n = 1$$

 – If not, then divide each by their total

Slide 4-15

Weighted Average (continued)

Average is per elementary unit

 – The average of your course grades is your "average per course"

Weighted average is per unit of weight

 – Your GPA (grade point average) is a weighted average, using credit hours to define the weights. The weighted average is your "average per credit hour"

Slide 4-16

Example: Portfolio Rate of Return

Portfolio expected return (an interest rate, indicating performance) is the weighted average of the expected rates of return of the assets in the portfolio, weighted by dollars invested

Portfolio contains three stocks. One ($1,000 invested) is expected to return 20%. Another ($1,800 invested) is expected to return 15%. The third is $2,200 and 30%.

Total invested is 1,000+1,800+2,200 = $5,000

Slide 4-17

Example (continued)

Weights are

w1 = $1,000/$5,000 = 0.20

w2 = $1,800/$5,000 = 0.36

w3 = $2,200/$5,000 = 0.44

Weighted average is

0.20×(20%) + 0.36×(15%) + 0.44×(30%) = 22.6%

 – The expected return for the portfolio
 – Each stock is represented in proportion to $ invested
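The portfolio calculation can be sketched in Python. The function `weighted_average` is an illustrative name, and dividing by the total weight is one way to handle weights that do not already add up to 1, so raw dollar amounts can be passed directly.

```python
def weighted_average(values, weights):
    """Weighted average; weights are divided by their total, so they
    need not add up to 1 beforehand."""
    if len(values) != len(weights):
        raise ValueError("need exactly one weight per value")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


# Three stocks from the slide: dollars invested and expected return (%)
invested = [1000, 1800, 2200]
expected_return = [20, 15, 30]

portfolio_return = weighted_average(expected_return, invested)
print(round(portfolio_return, 1))  # 22.6
```

Passing the normalized weights 0.20, 0.36, and 0.44 instead of the dollar amounts gives the same result, up to floating-point rounding.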

Slide 4-18

Percentiles

Landmark summaries in the same measurement units as the data

 – e.g., dollars, people, miles per gallon, ...

Some familiar percentiles

 – Smallest data value is 0th percentile
 – Median is 50th percentile
 – Largest data value is 100th percentile
 – 90th percentile is larger than 90% of elementary units

Finding percentiles

 – Difficult to see from histogram
 – Easy using CDF (Cumulative Distribution Function)

Slide 4-19

Cumulative Distribution Function

Data axis horizontally (as in histogram)

Cumulative percent vertically

Equal vertical jump at each data value

0.3, 0.6, 0.9, 1.1, 1.4, 2.8, 3.8, 5.5

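[Figure: CDF of spending (horizontal axis: Spending, $0 to $6, in $thousands; vertical axis: Cumulative Percent, 0% to 100%). Reading across from 80% on the vertical axis gives the 80th percentile, $3.80]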
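Reading a percentile off the CDF can be sketched as follows, assuming the rule "take the smallest data value whose cumulative percent reaches p". That rule is an assumption made for this illustration: it matches the 80th percentile shown for the spending data, but other percentile conventions interpolate between values, and for p=50 it returns a data value rather than the averaged median. The function names are hypothetical.

```python
def ecdf_points(values):
    """(value, cumulative percent) pairs: an equal jump at each data value."""
    ordered = sorted(values)
    n = len(ordered)
    return [(x, 100 * (i + 1) / n) for i, x in enumerate(ordered)]


def percentile_from_cdf(values, p):
    """Smallest data value whose cumulative percent is at least p."""
    if not values:
        raise ValueError("percentile of an empty data set is undefined")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    for x, cumulative in ecdf_points(values):
        if cumulative >= p:
            return x


spending = [3.8, 1.4, 0.3, 0.6, 2.8, 5.5, 0.9, 1.1]  # $thousands
print(percentile_from_cdf(spending, 80))  # 3.8
```

With eight values each jump is 12.5%, so the cumulative percent is 75% at 2.8 and 87.5% at 3.8; the 80% line is first reached at 3.8.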

Slide 4-20

Five-Number Summary

Selected landmarks to represent entire data set

 – Median = 50th percentile
 – Quartiles
   LQ = Lower Quartile = 25th percentile
    – Rank = $\dfrac{1 + \operatorname{int}\left[\frac{1+n}{2}\right]}{2}$, where $\frac{1+n}{2}$ is the rank of the median and int discards the decimal, if any: int(10.5) = 10, int(35) = 35
   UQ = Upper Quartile = 75th percentile
    – Rank is n + 1 - [rank of lower quartile]
 – Extremes
   Smallest = 0th percentile
   Largest = 100th percentile

Slide 4-21

Five-Number Summary (continued)

Provides information about

 – Central summary
   Median
 – Range of the data
   Largest - smallest
 – "Middle half" of the data
   From LQ to UQ
 – Skewness
   If median is not approximately halfway between quartiles

Slide 4-22

Box Plot

Displays five-number summary

Less detail than histogram

 – Easier to compare many groups

[Figure: box plot on an axis from 0 to 8, labeling the smallest value, lower quartile, median, upper quartile, and largest value; the box spans the middle half of the data]

Slide 4-23

Example: Spending

Spending rank ordered from smallest to largest

Rank:    1    2    3    4    5    6    7    8
Value:  0.3  0.6  0.9  1.1  1.4  2.8  3.8  5.5

Rank of median = (1+8)/2 = 4.5

Rank of LQ = (1+4)/2 = 2.5, where 4 = int(4.5)

Rank of UQ = 8+1-2.5 = 6.5

LQ is (0.6+0.9)/2 = 0.75

UQ is (2.8+3.8)/2 = 3.3
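The rank calculations for the spending data can be sketched in Python. This follows the int-based quartile rank rule given in the slides; other textbooks and software packages define quartiles differently, and the function names here are illustrative.

```python
def value_at_rank(ordered, rank):
    """Value at a 1-based rank; a rank ending in .5 averages its neighbors."""
    lower = int(rank)
    if rank == lower:
        return ordered[lower - 1]
    return (ordered[lower - 1] + ordered[lower]) / 2


def five_number_summary(values):
    """(smallest, LQ, median, UQ, largest) using the slides' rank rules."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("summary of an empty data set is undefined")
    median_rank = (1 + n) / 2
    lq_rank = (1 + int(median_rank)) / 2   # int discards the decimal
    uq_rank = n + 1 - lq_rank
    return (
        ordered[0],
        value_at_rank(ordered, lq_rank),
        value_at_rank(ordered, median_rank),
        value_at_rank(ordered, uq_rank),
        ordered[-1],
    )


spending = [3.8, 1.4, 0.3, 0.6, 2.8, 5.5, 0.9, 1.1]  # $thousands
summary = five_number_summary(spending)
print([round(x, 2) for x in summary])
```

For n=8 the ranks come out as 2.5, 4.5, and 6.5, so each quartile and the median is the average of two neighboring ordered values.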

Slide 4-24

Example: Spending (continued)

Five-number summary

0.3, 0.75, 1.25, 3.3, 5.5

Smallest, LQ, Median, UQ, Largest

Box plot

 – Shows some skewness (lack of symmetry)

[Figure: box plot of spending ($thousands), axis from 0 to 5]

Slide 4-25

Identifying Outliers

Outliers are defined as observations, if any, either:

 – More than UQ + 1.5 × (UQ - LQ), or
 – Less than LQ - 1.5 × (UQ - LQ)

Outliers are far from the center of the distribution and may be interesting as special cases

[Figure: box plot diagram. The box runs from LQ to UQ, with width UQ - LQ. Lower outliers lie more than 1.5 × (UQ - LQ) below LQ; upper outliers lie more than 1.5 × (UQ - LQ) above UQ]
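The outlier rule can be sketched in Python, taking the quartiles as inputs. The function names are illustrative, and the extra value of 9.0 in the second call is hypothetical, added only to show an observation that crosses the upper limit.

```python
def outlier_fences(lq, uq):
    """Lower and upper limits: 1.5 * (UQ - LQ) beyond each quartile."""
    spread = uq - lq
    return lq - 1.5 * spread, uq + 1.5 * spread


def find_outliers(values, lq, uq):
    """Observations below the lower limit or above the upper limit."""
    low, high = outlier_fences(lq, uq)
    return [x for x in values if x < low or x > high]


# Spending data ($thousands) with the quartiles from the earlier example
spending = [0.3, 0.6, 0.9, 1.1, 1.4, 2.8, 3.8, 5.5]
lq, uq = 0.75, 3.3

print(outlier_fences(lq, uq))
print(find_outliers(spending, lq, uq))          # no outliers in this data
print(find_outliers(spending + [9.0], lq, uq))  # hypothetical extra value
```

For the spending data the limits are roughly -3.075 and 7.125, so the largest value, 5.5, is not an outlier under this rule.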

Slide 4-26

Example: Technology CEO Pay

CEO compensation in technology companies

 – Detailed box plot identifies outliers and the most extreme non-outliers, giving more detail than the (ordinary) box plot

Fig 4.2.3: Box plot and detailed box plot of technology CEO compensation (axis: $0 to $10,000,000). Companies labeled in the detailed box plot: IBM, AMD, Sun Microsystems, Apple Computer

Slide 4-27

Example: CEO Compensation

Box plots to compare firms within industry groups

 – Utilities group generally shows lower compensation
 – Highest-paid are in Financial Services group

Fig 4.2.3: Box plots of CEO compensation by industry group (Energy, Financial, Technology, Utilities); axis: $0 to $30,000,000

Slide 4-28

CEO Compensation (continued)

Detailed box plots (with outliers and most extreme non-outliers named)

Fig 4.2.3: Detailed box plots of CEO compensation by industry group (Energy, Financial, Technology, Utilities); axis: $0 to $30,000,000. Companies labeled: IBM, AMD, Enron, Citigroup, Goldman Sachs, Bear Stearns, Merrill Lynch, Morgan Stanley Dean Witter, Lehman Brothers, Phillips Petroleum, Sun Microsystems, Duke Energy, GPU, Apple Computer, Baker Hughes, Berkshire Hathaway

Slide 4-29

Mining the Donations Database

More frequent donors (top) tend to give smaller current donation amounts (shift to left)

Fig 4.2.4: Box plots of the size of the current donation (horizontal axis: $0 to $100), one for each number of previous gifts in the past 2 years (1, 2, 3, 4+)

Slide 4-30

Example: Business Failures

Per million people, by state

90th percentile is 432.4

50th percentile is 260.2

Fig 4.2.9: CDF of business failures (horizontal axis: Failures, 0 to 700; vertical axis: Cumulative Percent, 0% to 100%)

Slide 4-31

Example: Business Failures

Compare histogram, box plot, and CDF

Fig 4.2.10: Three panels showing the business failures data as a histogram, a box plot, and a CDF, each on a horizontal axis of Failures from 0 to 500