Mean and Variance


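Distribution?

Distribution of a sample vs. population distribution

Statistics

A (sample) statistic summarizes the distribution of a sample; a (population) parameter summarizes the population distribution.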

Population distribution

Outcome   X   %freq
Head      1   0.5
Tail      0   0.5
Total         1.0

Distribution of a sample

Outcome   X   freq   %freq
Head      1   20     0.4
Tail      0   30     0.6
Total         50     1.0

Outcome   X   %freq
Head      1   0.35
Tail      0   0.65
Total         1.0

Population distribution

Y       %freq
1       1/6
2       1/6
3       1/6
4       1/6
5       1/6
6       1/6
Total   1.0

Distribution of a sample

Y       freq   %freq
1       10     0.1
2       20     0.2
3       10     0.1
4       20     0.2
5       20     0.2
6       20     0.2
Total   100    1.0
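To make the contrast between a population distribution and the distribution of a sample concrete, here is a minimal Python sketch that simulates 100 rolls of a fair die and tabulates them. The helper name `sample_distribution`, the seed, and the simulated rolls are assumptions made for illustration; the simulated frequencies will not match the table above.

```python
import random
from collections import Counter


def sample_distribution(observations):
    """Return {value: (freq, relative freq)} for a list of observations."""
    n = len(observations)
    counts = Counter(observations)
    return {value: (counts[value], counts[value] / n) for value in sorted(counts)}


# Population distribution of a fair die: every face has probability 1/6.
population = {face: 1 / 6 for face in range(1, 7)}

# Hypothetical sample: 100 simulated rolls (the seed 0 is arbitrary).
rng = random.Random(0)
rolls = [rng.randint(1, 6) for _ in range(100)]
sample = sample_distribution(rolls)

print("Y  freq  %freq  population")
for face in range(1, 7):
    freq, rel_freq = sample.get(face, (0, 0.0))
    print(f"{face}  {freq:4d}  {rel_freq:5.2f}  {population[face]:.3f}")
```

The sample column changes with every new sample, while the population column stays fixed.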

A new variable X from mseg of the credit card data

mseg               X
Low Spender        1
Med Low Spender    2
Average Spender    3
Med High Spender   4
High Spender       5

Variable X of the credit card data

Distribution of a sample

X       freq   %freq
1       26     0.26
2       20     0.20
3       11     0.11
4       25     0.25
5       18     0.18
Total   100    1.00

Population distribution

X       %freq
1       ?
2       ?
3       ?
4       ?
5       ?
Total   1.00

Measures of location (center)

Mean

Mode

Median

(truncated, winsorized) Mean

[Figure: positions of the mean, median, and mode on a distribution; the median splits it 50% / 50%]

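Hit/Stop, Bust

Dealer's hidden card?

Card values: 2-9; 1 or 11 (ace); 10

The most likely value (the mode) is 10, because 10, J, Q, and K all count as 10.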

Outlier

Truncated mean / Winsorized mean

[Figure: dot diagrams showing an outlier and how the truncated and winsorized means handle it]
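The slide names the truncated and winsorized means without defining them. The sketch below uses the usual definitions: the truncated mean drops the k smallest and k largest observations, and the winsorized mean replaces them with the nearest remaining values. The function names, the choice k = 1, and the data are assumptions for illustration only.

```python
from statistics import mean


def truncated_mean(data, k):
    """Average after dropping the k smallest and k largest observations."""
    ordered = sorted(data)
    if 2 * k >= len(ordered):
        raise ValueError("k is too large for this data set")
    return mean(ordered[k:len(ordered) - k])


def winsorized_mean(data, k):
    """Average after replacing the k smallest and k largest observations
    with the nearest remaining values."""
    ordered = sorted(data)
    if 2 * k >= len(ordered):
        raise ValueError("k is too large for this data set")
    low, high = ordered[k], ordered[-k - 1]
    return mean([min(max(x, low), high) for x in ordered])


# Hypothetical data with one outlier (64).
data = [4, 5, 5, 6, 64]
print(mean(data))                # pulled upward by the outlier
print(truncated_mean(data, 1))   # uses 5, 5, 6
print(winsorized_mean(data, 1))  # uses 5, 5, 5, 6, 6
```

Both variants stay close to the bulk of the data, whereas the plain mean is dominated by the single extreme value.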

Quartiles

Q1 = 25th percentile: 25% below, 75% above
Q2 = 50th percentile = median: 50% below, 50% above
Q3 = 75th percentile: 75% below, 25% above

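Illustration: Yoo Jae-il (유재일), reporter, jae0903@chosun.com

Housing statistics miss the mark, and real estate policy stumbles too

Korea's PIR (price-to-income ratio) is calculated from the average house price and the average household income of urban workers. The US PIR, in contrast, is based on the median price and the median income. The median price is found by lining up the homes sold in an area from the cheapest to the most expensive and taking the one in the middle.

Kim Sun-deok (김선덕), head of the Construction Industry Strategy Institute (건설산업전략연구소), said that "average prices and average incomes can be distorted when a few very expensive homes or extremely high earners are included." Moreover, Korean house prices are asking prices, whereas US house prices are based on actual transaction prices.

Cha Hak-bong (차학봉), reporter, hbcha@chosun.com. Posted: 2007.03.26 23:31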

Wrong housing statistics lead to wrong real estate policy.

Although the median represents house prices better than the mean, the Korean government publishes house-price statistics based on the mean. A mean price can be distorted by just one or two extreme prices.
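The point about extreme prices can be checked with a few lines of Python. The prices below are hypothetical (arbitrary units) and are not taken from the article; they only show how one extreme value moves the mean far more than the median.

```python
from statistics import mean, median

# Hypothetical house prices in arbitrary units.
prices = [300, 320, 350, 380, 400]
prices_with_extreme = prices + [5000]  # add one very expensive house

print(mean(prices), median(prices))                            # 350 and 350
print(mean(prices_with_extreme), median(prices_with_extreme))  # 1125 and 365.0
```

Adding a single house moves the mean from 350 to 1125, while the median only moves from 350 to 365.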

Percentile

p-th percentile: p% of the distribution lies below it and (100-p)% above it.
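Quartiles and percentiles can be computed with Python's standard `statistics.quantiles`. The data below are hypothetical, and the `"inclusive"` interpolation method is only one of several conventions; other software may give slightly different cut points.

```python
from statistics import median, quantiles

data = list(range(1, 10))  # hypothetical data: 1, 2, ..., 9

# Quartiles: the three cut points that split the ordered data into four parts.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

# Percentiles: 99 cut points; the p-th percentile sits at index p - 1.
percentiles = quantiles(data, n=100, method="inclusive")
p25, p50, p75 = percentiles[24], percentiles[49], percentiles[74]

print(q1, q2, q3)      # 3.0 5.0 7.0
print(p50 == median(data))
```

The 25th, 50th, and 75th percentiles coincide with Q1, Q2 (the median), and Q3.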

Measures of variability

Range

InterQuartile Range (IQR)

Variance

Standard Deviation

Range

Q1 Q2 Q3

$IQR = Q_3 - Q_1$
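A short sketch of the range and the interquartile range, assuming the range is the largest observation minus the smallest (the slide does not spell this out) and using hypothetical data. The quartile convention is `statistics.quantiles` with `method="inclusive"`, which is an assumption.

```python
from statistics import quantiles

data = [1, 3, 5, 7, 9, 11, 13, 15, 17]  # hypothetical observations

value_range = max(data) - min(data)      # largest minus smallest

q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                            # spread of the middle 50%

print(value_range, iqr)
```

The range depends only on the two extreme observations, so a single outlier changes it; the IQR ignores the lowest and highest quarters of the data.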

variance, standard deviation

Population distribution

Y       %freq
1       1/6
2       1/6
3       1/6
4       1/6
5       1/6
6       1/6
Total   1.0

Mean(Y) = 1*(1/6) + 2*(1/6) + ... + 6*(1/6) = 3.5

Distribution of a sample

Y       freq   %freq
1       10     0.1
2       20     0.2
3       10     0.1
4       20     0.2
5       20     0.2
6       20     0.2
Total   100    1.0

Mean(Y) = 1*0.1 + 2*0.2 + 3*0.1 + ... + 6*0.2 = 3.8
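Both means above are weighted sums of value times relative frequency. The sketch below recomputes them with exact fractions; the function name `mean_of_distribution` is an assumption for illustration.

```python
from fractions import Fraction


def mean_of_distribution(dist):
    """Weighted mean: sum of value * relative frequency."""
    return sum(y * p for y, p in dist.items())


# Population distribution of the die: each face has relative frequency 1/6.
population = {y: Fraction(1, 6) for y in range(1, 7)}

# Distribution of the sample: frequencies from the table, divided by n = 100.
sample_freq = {1: 10, 2: 20, 3: 10, 4: 20, 5: 20, 6: 20}
n = sum(sample_freq.values())
sample = {y: Fraction(f, n) for y, f in sample_freq.items()}

print(float(mean_of_distribution(population)))  # 3.5
print(float(mean_of_distribution(sample)))      # 3.8
```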

Mean of X

mseg               X   freq   %freq
Low Spender        1   26     0.26
Med Low Spender    2   20     0.20
Average Spender    3   11     0.11
Med High Spender   4   25     0.25
High Spender       5   18     0.18
Total                  100    1.00

Mean(X) = 1*0.26 + 2*0.20 + 3*0.11 + 4*0.25 + 5*0.18 = 2.89

$X \sim f$

$E(X) = \sum_i x_i f(x_i)$

X       f
$x_1$   $f(x_1)$
...     ...
$x_n$   $f(x_n)$
Total   1

$\sum_i f(x_i) = 1$

$X \sim f$

$E(X^2) = \sum_i x_i^2 f(x_i)$

X       $X^2$     f
$x_1$   $x_1^2$   $f(x_1)$
...     ...       ...
$x_n$   $x_n^2$   $f(x_n)$
Total             1

A new variable $Q = (X - 3)^2$

mseg               X   Q        %freq
Low Spender        1   (-2)^2   0.26
Med Low Spender    2   (-1)^2   0.20
Average Spender    3   0^2      0.11
Med High Spender   4   1^2      0.25
High Spender       5   2^2      0.18
Total                           1.00

Mean(Q) = (-2)^2*0.26 + (-1)^2*0.20 + 0^2*0.11 + 1^2*0.25 + 2^2*0.18
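The slide leaves Mean(Q) unevaluated. A minimal sketch that evaluates it from the %freq column, using exact fractions; the function name `expected_squared_distance` is an assumption for illustration.

```python
from fractions import Fraction

# %freq of X from the credit card table (X = 1, ..., 5).
dist_x = {
    1: Fraction(26, 100),
    2: Fraction(20, 100),
    3: Fraction(11, 100),
    4: Fraction(25, 100),
    5: Fraction(18, 100),
}


def expected_squared_distance(dist, c):
    """Mean of (X - c)^2: sum of (x - c)^2 * relative frequency."""
    return sum((x - c) ** 2 * p for x, p in dist.items())


mean_q = expected_squared_distance(dist_x, 3)  # Q = (X - 3)^2
print(float(mean_q))  # 2.21
```

Term by term this is 4*0.26 + 1*0.20 + 0*0.11 + 1*0.25 + 4*0.18 = 2.21.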

$X \sim f$

$E[(X - c)^2] = \sum_i (x_i - c)^2 f(x_i)$

Let $c = E(X)$:

$Var(X) = E[(X - E(X))^2]$

$X \sim f^*$

$E^*(X) = \sum_i x_i f^*(x_i) = \bar{X}$

X       $f^*$
$x_1$   $f^*(x_1)$
...     ...
$x_n$   $f^*(x_n)$
Total   1

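Distribution of a sample

Sample mean

$E^*(X) = \sum_i x_i f^*(x_i) = \frac{1}{n} \sum_i x_i = \bar{X}$

Ungrouped (one row per observation):

X       $f^*$
1       1/5
3       1/5
2       1/5
1       1/5
2       1/5
Total   1

Grouped (one row per distinct value):

X       freq   $f^*$
1       2      2/5
3       1      1/5
2       2      2/5
Total   5      1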
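The two tables describe the same five observations, so both must give the same sample mean. A small check with exact fractions, assuming the observations are 1, 3, 2, 1, 2 as read from the tables (their order is an assumption and does not affect the result):

```python
from collections import Counter
from fractions import Fraction

sample = [1, 3, 2, 1, 2]  # five observations, each with weight 1/5
n = len(sample)

# Ungrouped: every observation carries weight 1/n.
mean_ungrouped = sum(Fraction(x, n) for x in sample)

# Grouped: each distinct value carries weight freq/n.
freq = Counter(sample)
f_star = {x: Fraction(count, n) for x, count in freq.items()}
mean_grouped = sum(x * p for x, p in f_star.items())

print(mean_ungrouped, mean_grouped)  # 9/5 9/5
```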

$X \sim f^*$

$Var^*(X) = E^*[(X - E^*(X))^2] = \sum_i (x_i - \bar{x})^2 f^*(x_i) = \frac{1}{n} \sum_i (x_i - \bar{x})^2$

(O)

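Sample variance

$s^2$ or $s_X^2 = \frac{1}{n-1} \sum_i (x_i - \bar{x})^2$

$\frac{n}{n-1} Var^*(X) = \frac{n}{n-1} E^*[(X - E^*(X))^2] = \frac{1}{n-1} \sum_i (x_i - \bar{x})^2$

For large n,

$\frac{1}{n} \sum_i (x_i - \bar{x})^2 \approx s^2$, because $\frac{n}{n-1} \approx 1$

$n \ge 20$: large enough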
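The factor n/(n-1) is easy to see numerically. The sketch below computes both versions of the variance for hypothetical data 1, 2, ..., n and prints their ratio; the function names are assumptions for illustration.

```python
def var_star(data):
    """Variance of the distribution of a sample: divide by n."""
    n = len(data)
    x_bar = sum(data) / n
    return sum((x - x_bar) ** 2 for x in data) / n


def sample_variance(data):
    """Sample variance s^2: divide by n - 1."""
    n = len(data)
    x_bar = sum(data) / n
    return sum((x - x_bar) ** 2 for x in data) / (n - 1)


for n in (5, 20, 100):
    data = list(range(1, n + 1))  # hypothetical data 1, 2, ..., n
    ratio = sample_variance(data) / var_star(data)
    print(n, round(ratio, 4))     # the ratio equals n / (n - 1)
```

For n = 20 the two versions differ by about 5%, and the gap keeps shrinking as n grows.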

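$s^2 = \frac{1}{n-1} \sum_i (x_i - \bar{x})^2$

$n \to N$

$\sigma_X^2 = \frac{1}{N} \sum_i (x_i - \mu)^2$, where $\mu$ is the population mean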

Standard deviation

$sd(X) = \sqrt{Var(X)}$

$sd^*(X) = \sqrt{Var^*(X)}$

$V = (X - 2.89)^2$

mseg               X   V            freq
Low Spender        1   (1-2.89)^2   26
Med Low Spender    2   (2-2.89)^2   20
Average Spender    3   (3-2.89)^2   11
Med High Spender   4   (4-2.89)^2   25
High Spender       5   (5-2.89)^2   18
Total                               100

$s_X^2 = \frac{1}{99}[(1-2.89)^2 \cdot 26 + \dots + (5-2.89)^2 \cdot 18] = 2.22$

$s_X = \sqrt{s_X^2} = 1.49$
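The figures 2.22 and 1.49 can be reproduced from the frequency table. The sketch also computes the divide-by-n version for comparison; the variable names are assumptions for illustration.

```python
from math import sqrt

# X: freq from the credit card table.
freq = {1: 26, 2: 20, 3: 11, 4: 25, 5: 18}
n = sum(freq.values())

x_bar = sum(x * f for x, f in freq.items()) / n
sum_sq = sum((x - x_bar) ** 2 * f for x, f in freq.items())

var_n = sum_sq / n          # divides by n = 100
var_n1 = sum_sq / (n - 1)   # divides by n - 1 = 99
sd_n1 = sqrt(var_n1)

print(round(x_bar, 2), round(var_n, 2), round(var_n1, 2), round(sd_n1, 2))
# 2.89 2.2 2.22 1.49
```

With n = 100 the two versions differ only in the second decimal place (2.20 versus 2.22).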

Statistics

Distribution of a sample   Population distribution
sample mean                population mean
sample variance            population variance
sample median              population median
...                        ...

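$n \to N$

Examples of variables: no. of teeth, weight of body, no. of phone calls

Discrete variable (e.g. no. of teeth, no. of phone calls):

$f(x_i) = freq_i / N$ in the population ($freq_i / n$ in a sample), with $\sum_i f(x_i) = 1$

Continuous variable (e.g. weight of body):

$f(x)$, with $\int f(x)\,dx = 1$

Sums in the discrete case correspond to integrals in the continuous case:

$\sum_i f(x_i) \leftrightarrow \int f(x)\,dx$

$\sum_i x_i^2 f(x_i) \leftrightarrow \int x^2 f(x)\,dx$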

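The expectation E can be taken with respect to $f^*(x_i)$, $f(x_i)$, or $f(x)$.

Discrete:

$E(X) = \sum_i x_i f(x_i)$

$Var(X) = E[(X - E(X))^2] = \sum_i (x_i - E(X))^2 f(x_i)$

Continuous:

$E(X) = \int x f(x)\,dx$

$Var(X) = E[(X - E(X))^2] = \int (x - E(X))^2 f(x)\,dx$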

Expected value

Continuous: $E(X) = \int x f(x)\,dx$

Discrete: $E(X) = \sum_i x_i f(x_i)$

Outcome   X   $f(x_i)$
Head      1   0.5
Tail      0   0.5

$E(X) = 0.5$

Y   $f(y_i)$
1   1/6
2   1/6
3   1/6
4   1/6
5   1/6
6   1/6

$E(Y) = 3.5$

$E(1) = 1$

$E(1) = \sum_i 1 \cdot f(x_i) = 1$

$E(c) = c$

X   $f(x_i)$
1   1/2
1   1/4
1   1/8
1   1/8

$E(3X) = 3E(X)$

$E(3X) = \sum_i 3 x_i f(x_i) = 3 \sum_i x_i f(x_i) = 3E(X)$

X   3X   $f(x_i)$
1   3    1/2
2   6    1/4
3   9    1/8
4   12   1/8

$E(cX) = cE(X)$
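The rules E(c) = c and E(cX) = cE(X) can be verified on the small table with probabilities 1/2, 1/4, 1/8, 1/8. The helper `expectation` is an assumption for illustration; it simply sums g(x) times f(x).

```python
from fractions import Fraction

# X and f(x_i) from the table: values 1..4 with probabilities 1/2, 1/4, 1/8, 1/8.
dist = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 8), 4: Fraction(1, 8)}


def expectation(dist, g=lambda x: x):
    """E[g(X)]: sum of g(x) * f(x) over the distribution."""
    return sum(g(x) * p for x, p in dist.items())


e_x = expectation(dist)                    # E(X)
e_3x = expectation(dist, lambda x: 3 * x)  # E(3X)
e_one = expectation(dist, lambda x: 1)     # E(1)

print(e_x, e_3x, e_one)  # 15/8 45/8 1
```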

$E(E(X)) = E(E(X) \cdot 1) = E(X) E(1) = E(X)$

$E(X E(X)) = E(E(X) X) = E(X) E(X) = (E(X))^2$

The expectation E can be taken with respect to $f^*(x_i)$, $f(x_i)$, or $f(x)$.

100 × X (coin) + 10 × Y (die)

$\sum_i (a x_i + b y_i) = a \sum_i x_i + b \sum_i y_i$

$E(aX + bY) = aE(X) + bE(Y)$

X       Y   100X   10Y   100X+10Y   f
1 (H)   1   100    10    110        1/12
0 (T)   1   0      10    10         1/12
1 (H)   2   100    20    120        1/12
0 (T)   2   0      20    20         1/12
...
1 (H)   6   100    60    160        1/12
0 (T)   6   0      60    60         1/12

$E(100X + 10Y) = (1/12)[110 + 10 + \dots + 60] = 100E(X) + 10E(Y) = 85$
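The value 85 can be checked by listing all 12 coin-and-die outcomes. This sketch assumes, as the table does, that every (X, Y) pair has probability 1/12; the variable names are assumptions for illustration.

```python
from fractions import Fraction
from itertools import product

coin = {1: Fraction(1, 2), 0: Fraction(1, 2)}   # X: 1 = head, 0 = tail
die = {y: Fraction(1, 6) for y in range(1, 7)}  # Y: face of the die

# Joint distribution: each (x, y) pair gets probability 1/2 * 1/6 = 1/12.
joint = {
    (x, y): px * py
    for (x, px), (y, py) in product(coin.items(), die.items())
}

# Left-hand side: average 100X + 10Y over all 12 outcomes.
e_combined = sum((100 * x + 10 * y) * p for (x, y), p in joint.items())

# Right-hand side: 100 E(X) + 10 E(Y).
e_x = sum(x * p for x, p in coin.items())
e_y = sum(y * p for y, p in die.items())

print(e_combined, 100 * e_x + 10 * e_y)  # 85 85
```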

$Var(X) = E[(X - E(X))^2]$

$= E[X^2 - 2X E(X) + (E(X))^2]$

$= E(X^2) - 2(E(X))^2 + (E(X))^2$

$= E(X^2) - (E(X))^2$

For any constant c,

$Var(X) = E[(X - E(X))^2] \le E[(X - c)^2]$

$Var(1) = 0$

$Var(aX) = a^2 Var(X)$
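These variance rules can be checked exactly on the fair-die distribution. The helpers `expectation` and `variance` and the choices a = 3 and c = 3 are assumptions for illustration.

```python
from fractions import Fraction

die = {y: Fraction(1, 6) for y in range(1, 7)}


def expectation(dist, g=lambda x: x):
    """E[g(X)]: sum of g(x) * f(x) over the distribution."""
    return sum(g(x) * p for x, p in dist.items())


def variance(dist):
    """Var(X) = E[(X - E(X))^2]."""
    mu = expectation(dist)
    return expectation(dist, lambda x: (x - mu) ** 2)


var_y = variance(die)
shortcut = expectation(die, lambda y: y ** 2) - expectation(die) ** 2

# Var(aX) = a^2 Var(X), here with a = 3.
var_3y = variance({3 * y: p for y, p in die.items()})

# Var(1) = 0: a constant has no spread.
var_one = variance({1: Fraction(1)})

# E[(X - c)^2] is never smaller than Var(X); here c = 3.
about_3 = expectation(die, lambda y: (y - 3) ** 2)

print(var_y, shortcut, var_3y, var_one, about_3)
```

For the die, Var(Y) = 35/12 both ways, Var(3Y) = 9 × 35/12 = 105/4, and the squared distance about c = 3 is 19/6, which exceeds 35/12.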

Thank you !!
