R cu aplicații în statistică (R with Applications in Statistics) / Adrian Dușa, Bogdan Oancea, Nicoleta...


Scientific reviewers:

Prof. Dr. Elena Druică

Universitatea din București

Prof. Dr. Tudorel Andrei

Academia de Studii Economice

Prof. Dr. Călin Vâlsan

Bishop’s University, Canada

William’s School of Business

© Editura Universității din București

Șos. Panduri nr. 90-92, 050663 București – ROMÂNIA

Tel./Fax: +40 214102384

E-mail: editura.unibuc@gmail.com

Website: http://editura-unibuc.ro

Sales center:

Bd. Regina Elisabeta nr. 4-12,

030018 București – ROMÂNIA

Tel. +40 213053703

Typesetting: ADRIAN DUȘA

Cover: MARIUS JULA

CIP description of the National Library of Romania (Biblioteca Națională a României)

R cu aplicații în statistică / Adrian Dușa, Bogdan Oancea, Nicoleta Caragea, … - București : Editura Universității din București, 2015

Includes bibliography

Index

ISBN 978-606-16-0643-6

I. Dușa, Adrian

II. Oancea, Bogdan

III. Caragea, Nicoleta

004:311


F, M, M, F, M, F, F, M, F, M


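$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

$$\frac{n+1}{2}$$

$$8 - 1 = 7, \quad 7 - 2 = 5, \quad 5 - 4 = 1$$

$$x_i - \bar{x}, \quad i = 1, 2, \ldots, n$$

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

$$s = \sqrt{s^2}$$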
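The definitions of the mean, the variance and the standard deviation above can be checked numerically. The following is a minimal sketch in R; the sample `x` is hypothetical and only serves as an illustration.

```r
# Hypothetical sample; the values are illustrative only.
x <- c(2, 4, 4, 5, 7, 8)
n <- length(x)

x_bar <- sum(x) / n                      # arithmetic mean
s2    <- sum((x - x_bar)^2) / (n - 1)    # sample variance (divisor n - 1)
s     <- sqrt(s2)                        # standard deviation

# The built-in functions use the same definitions.
stopifnot(isTRUE(all.equal(x_bar, mean(x))),
          isTRUE(all.equal(s2, var(x))),
          isTRUE(all.equal(s, sd(x))))
```

Note that `var()` and `sd()` divide by n - 1, which matches the formula for s² given here.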

$$Q_1, \quad Q_2, \quad Q_3$$

$$AIQ = Q_3 - Q_1$$

$$Q_3 - Q_1 = 1488 - 1225 = 263$$

$$Q_3 + 1.5 \cdot AIQ = 1488 + 1.5 \cdot 263 = 1882.5$$
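The arithmetic of the worked example (AIQ is the interquartile range) can be reproduced in a few lines of R. This is only a sketch: the quartile values are the ones from the example, while the lower threshold is an assumption added by symmetry and does not appear in the text.

```r
q1 <- 1225                       # first quartile, from the worked example
q3 <- 1488                       # third quartile, from the worked example

aiq         <- q3 - q1           # interquartile range
upper_limit <- q3 + 1.5 * aiq    # upper threshold used in the example
lower_limit <- q1 - 1.5 * aiq    # assumption: symmetric lower threshold

# Flags values outside the two thresholds as potential outliers.
is_outlier <- function(v) v > upper_limit | v < lower_limit
```

For real data the quartiles would typically come from `quantile(x, c(0.25, 0.75))`, whose result depends on the chosen `type` argument.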

$$Y = f(X, e)$$

$$M(Y/X) = a_0 + a_1 X$$

$$Y = M(Y/X) + e$$

$$Y = a_0 + a_1 X + e$$

$$\hat{Y} = \hat{a}_0 + \hat{a}_1 X$$

$$u_t = Y_t - \hat{Y}_t, \quad t = 1, 2, \ldots, n$$

$$F(\hat{a}_0, \hat{a}_1) = \sum_{t=1}^{n} u_t^2 = \sum_{t=1}^{n} (Y_t - \hat{Y}_t)^2 = \sum_{t=1}^{n} (Y_t - \hat{a}_0 - \hat{a}_1 X_t)^2$$

$$\sum_{t=1}^{n} Y_t = n \hat{a}_0 + \hat{a}_1 \sum_{t=1}^{n} X_t$$

$$\sum_{t=1}^{n} X_t Y_t = \hat{a}_0 \sum_{t=1}^{n} X_t + \hat{a}_1 \sum_{t=1}^{n} X_t^2$$
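The two normal equations above form a 2 × 2 linear system in the estimates of a0 and a1. A minimal sketch in R, using hypothetical data, solves that system directly and compares the result with `lm()`, which minimises the same sum of squared residuals.

```r
# Hypothetical data, for illustration only.
X <- c(1, 2, 3, 4, 5)
Y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
n <- length(X)

# Normal equations written as M %*% a = b
M <- matrix(c(n,      sum(X),
              sum(X), sum(X^2)), nrow = 2, byrow = TRUE)
b <- c(sum(Y), sum(X * Y))
a <- solve(M, b)          # a[1] estimates a0, a[2] estimates a1

# Same estimates from the built-in least squares fit
fit <- lm(Y ~ X)
stopifnot(isTRUE(all.equal(unname(coef(fit)), a)))
```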

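$$X_1, \quad X_2, \quad X_3, \quad X_4, \quad Y$$

$$Y_t = a_0 + a_1 X_{1t} + e_t$$

$$\hat{Y}_t = -0.669 + 1.1245 \cdot X_{1t}$$

$$R^2 = 0.6752$$

10%; 100 − 8.74%; 99.99%

$$Y_t = a_0 + a_1 X_{1t} + a_2 X_{2t} + e_t$$

$$\hat{Y}_t = \hat{a}_0 + \hat{a}_1 X_{1t} + \hat{a}_2 X_{2t}$$

$$Y_t = \hat{Y}_t + u_t$$

$$F(\hat{a}_0, \hat{a}_1, \hat{a}_2) = u'u = \sum_{t=1}^{n} u_t^2 = \sum_{t=1}^{n} (Y_t - \hat{a}_0 - \hat{a}_1 X_{1t} - \hat{a}_2 X_{2t})^2$$

$$\hat{Y}_t = 1.9836 + 0.4405 \cdot X_{1t} - 0.6387 \cdot X_{2t}$$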

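$$\begin{cases}
Y_1 = a_0 + a_1 X_{11} + a_2 X_{21} + \cdots + a_k X_{k1} + e_1 \\
Y_2 = a_0 + a_1 X_{12} + a_2 X_{22} + \cdots + a_k X_{k2} + e_2 \\
Y_3 = a_0 + a_1 X_{13} + a_2 X_{23} + \cdots + a_k X_{k3} + e_3 \\
\vdots \\
Y_t = a_0 + a_1 X_{1t} + a_2 X_{2t} + \cdots + a_k X_{kt} + e_t \\
\vdots \\
Y_n = a_0 + a_1 X_{1n} + a_2 X_{2n} + \cdots + a_k X_{kn} + e_n
\end{cases}$$

$$Y = \begin{pmatrix} Y_1 \\ Y_2 \\ Y_3 \\ \vdots \\ Y_n \end{pmatrix}, \quad
X = \begin{pmatrix}
1 & X_{11} & X_{21} & \cdots & X_{k1} \\
1 & X_{12} & X_{22} & \cdots & X_{k2} \\
1 & X_{13} & X_{23} & \cdots & X_{k3} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & X_{1n} & X_{2n} & \cdots & X_{kn}
\end{pmatrix}, \quad
A = \begin{pmatrix} a_0 \\ a_1 \\ a_2 \\ \vdots \\ a_k \end{pmatrix}, \quad
e = \begin{pmatrix} e_1 \\ e_2 \\ e_3 \\ \vdots \\ e_n \end{pmatrix}$$

$$Y = XA + e$$

$$\hat{A} = (X'X)^{-1} X'Y$$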
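The matrix formula for the estimator can be transcribed almost literally into R. The sketch below uses simulated data (all values are hypothetical) and builds the design matrix X with a leading column of ones, as in the definition of X above.

```r
set.seed(1)                                  # simulated, illustrative data
n  <- 50
X1 <- rnorm(n)
X2 <- rnorm(n)
Y  <- 1 + 2 * X1 - 0.5 * X2 + rnorm(n, sd = 0.1)

X <- cbind(1, X1, X2)                        # column of ones for a0

# Literal transcription of (X'X)^{-1} X'Y
A_hat <- solve(t(X) %*% X) %*% t(X) %*% Y

# Same system solved without forming the inverse explicitly
A_hat2 <- solve(crossprod(X), crossprod(X, Y))

fit <- lm(Y ~ X1 + X2)                       # reference fit
```

Solving the linear system with `solve(crossprod(X), crossprod(X, Y))` is usually preferred numerically over computing the inverse, although both give the same estimates here.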

$$Y_t = a_0 + a_1 X_{1t} + a_2 X_{2t} + a_3 X_{3t} + e_t$$

$$Y_t = a_0 + a_1 X_{1t} + a_2 X_{2t} + a_3 X_{3t} + a_4 X_{4t} + e_t$$

regresie4

$$\hat{Y}_t = 2.1128 - 3.8260 \cdot X_{1t} - 2.5528 \cdot X_{2t} + 3.7555 \cdot X_{3t} + 2.9481 \cdot X_{4t}$$

$$P(a_1 = 0)$$

$$H_0, \quad H_1$$

$$t_{\hat{a}_i} = \frac{\hat{a}_i}{s_{\hat{a}_i}}$$

$$H_0: a_i = 0$$

$a_1$: 31.79%

$$R^2 = 1 - \frac{\sum_{t=1}^{n} u_t^2}{\sum_{t=1}^{n} (Y_t - \bar{Y})^2}$$

$$R^2 \cdot 100\%$$

$$\bar{R}^2 = 1 - \frac{n - 1}{n - k - 1} (1 - R^2)$$
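Both coefficients of determination can be computed from the residuals of a fitted model. The sketch below uses the built-in `mtcars` data as a stand-in (it is not the data set used in the book) with k = 2 explanatory variables.

```r
# Stand-in example on a built-in data set; not the book's data.
fit <- lm(mpg ~ wt + hp, data = mtcars)

u <- resid(fit)                 # residuals u_t
Y <- mtcars$mpg
n <- length(Y)
k <- 2                          # number of explanatory variables

R2     <- 1 - sum(u^2) / sum((Y - mean(Y))^2)
R2_adj <- 1 - (n - 1) / (n - k - 1) * (1 - R2)

# summary() reports the same two quantities
s <- summary(fit)
stopifnot(isTRUE(all.equal(R2, s$r.squared)),
          isTRUE(all.equal(R2_adj, s$adj.r.squared)))
```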

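$$\mu_r = \frac{\sum_{t=1}^{n} u_t^r}{n}$$

$$JB = n \left[ \frac{1}{6} \cdot \frac{\mu_3^2}{\mu_2^3} + \frac{1}{24} \cdot \left( \frac{\mu_4}{\mu_2^2} - 3 \right)^2 \right] + n \left( \frac{3}{2} \cdot \frac{\mu_1^2}{\mu_2} - \frac{\mu_3 \cdot \mu_1}{\mu_2^2} \right)$$

$$H_0, \quad H_1$$

$$\chi^2_2(\alpha)$$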
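The JB statistic can be written as a short R function that follows the formula term by term. This is a sketch under the assumption that the moments are the uncentred ones defined by µr above; the residual vector is hypothetical. When the residuals have zero mean, the second bracket vanishes.

```r
# Moment of order r of the residuals: mu_r = sum(u^r) / n
moment <- function(u, r) sum(u^r) / length(u)

jarque_bera <- function(u) {
  n  <- length(u)
  m1 <- moment(u, 1); m2 <- moment(u, 2)
  m3 <- moment(u, 3); m4 <- moment(u, 4)
  n * ((1 / 6) * m3^2 / m2^3 + (1 / 24) * (m4 / m2^2 - 3)^2) +
    n * ((3 / 2) * m1^2 / m2 - m3 * m1 / m2^2)
}

u  <- c(-1, 0, 1)               # hypothetical residuals with zero mean
jb <- jarque_bera(u)

# p-value from the chi-square distribution with 2 degrees of freedom
p_value <- pchisq(jb, df = 2, lower.tail = FALSE)
```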

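$$dw = \frac{\sum_{t=2}^{n} (u_t - u_{t-1})^2}{\sum_{t=1}^{n} u_t^2}$$

$$d_L, \quad d_U, \quad k, \quad n$$

$H_0$ is not rejected: $d_U \le dw \le 4 - d_U$

$H_0$ is rejected: $dw \le d_L$ or $dw \ge 4 - d_L$

The test is inconclusive: $d_L \le dw \le d_U$ or $4 - d_U \le dw \le 4 - d_L$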
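The Durbin-Watson statistic is a one-line computation once the residuals are available. The sketch below uses hypothetical residual vectors; the critical values dL and dU still have to be read from the tables for the given k and n.

```r
# dw = sum_{t=2}^{n} (u_t - u_{t-1})^2 / sum_{t=1}^{n} u_t^2
durbin_watson <- function(u) sum(diff(u)^2) / sum(u^2)

u_alternating <- c(1, -1, 1, -1)   # hypothetical, strongly negative autocorrelation
u_constant    <- c(1, 1, 1, 1)     # hypothetical, strongly positive autocorrelation

dw_alt   <- durbin_watson(u_alternating)
dw_const <- durbin_watson(u_constant)
```

Values close to 4 point to negative autocorrelation, values close to 0 to positive autocorrelation, and values around 2 to no first-order autocorrelation.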

$$(X'X)$$

$$\lambda_{max}, \quad \lambda_{min}$$

$$CI_i = \sqrt{\frac{\lambda_{max}}{\lambda_i}}$$
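The condition indices follow directly from the eigenvalues of X'X. A minimal sketch in R is given below; it assumes X is used as it is, without the column scaling that is often applied first, and the example matrix is hypothetical.

```r
# CI_i = sqrt(lambda_max / lambda_i), lambda_i being the eigenvalues of X'X
condition_indices <- function(X) {
  lambda <- eigen(crossprod(X), symmetric = TRUE, only.values = TRUE)$values
  sqrt(max(lambda) / lambda)
}

X  <- diag(c(2, 1))             # hypothetical matrix: X'X has eigenvalues 4 and 1
ci <- condition_indices(X)
```

`eigen()` returns the eigenvalues in decreasing order, so the first index is always 1 and the last one is the largest.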

$$n = 35$$

$$\Omega = \frac{p}{1 - p}$$

$$p = 0.6, \quad 1 - p = 0.4, \quad \Omega = \frac{0.6}{0.4} = 1.5$$

$$p = 0.99, \quad 1 - p = 0.01, \quad \Omega = \frac{0.99}{0.01} = 99$$

$$\Omega > 1, \quad \Omega < 1, \quad \Omega = 1$$

$$\Omega \in (0, +\infty)$$

$$\text{logit} \in (-\infty, +\infty)$$

$$\Omega = 1 \Rightarrow \text{logit} = 0$$

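$$\ln\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 x$$

$$\text{logit}(p) = \beta_0 + \beta_1 x$$

$$\frac{p}{1 - p} = e^{\beta_0 + \beta_1 x}$$

$$e^{\ln(A)} = A$$

$$\Omega = e^{\beta_0 + \beta_1 x}$$

$$p = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$$

$$p > 0.5 \Rightarrow y = 1$$

$$\Omega > 1, \quad \text{logit} > 0$$

$p(x)$: S-shaped curve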
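The passage from the logit to the probability p can be illustrated with two small R functions. This is a sketch: the function names `logit` and `inv_logit` are chosen here for illustration, and base R already provides the equivalent `qlogis()` and `plogis()`.

```r
logit     <- function(p) log(p / (1 - p))          # ln(p / (1 - p))
inv_logit <- function(z) exp(z) / (1 + exp(z))     # e^z / (1 + e^z)

p <- 0.6
z <- logit(p)                  # equals log(1.5), the log of the odds
p_back <- inv_logit(z)         # recovers p

# The built-in equivalents give the same values.
stopifnot(isTRUE(all.equal(z, qlogis(p))),
          isTRUE(all.equal(p_back, plogis(z))))
```

Plotting `curve(inv_logit(x), -6, 6)` shows the S shape of p as a function of the linear predictor.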

$$\text{logit}(p) = \beta_0 + \beta_1 x$$

$$n = 35$$

$$\ln\left(\frac{p}{1 - p}\right) = 5.495958 - 0.004889 \times \text{Venit}$$

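$$\Omega(x) = e^{\beta_0 + \beta_1 x} = e^{\beta_0} \times e^{\beta_1 x}$$

$$x \rightarrow x + 1$$

$$\Omega(x + 1) = e^{\beta_0 + \beta_1 (x + 1)} = e^{\beta_0} \times e^{\beta_1 x} \times e^{\beta_1}$$

$$OR = \frac{\Omega(x + 1)}{\Omega(x)} = \frac{e^{\beta_0 + \beta_1 (x + 1)}}{e^{\beta_0 + \beta_1 x}} = \frac{e^{\beta_0} \times e^{\beta_1 x} \times e^{\beta_1}}{e^{\beta_0} \times e^{\beta_1 x}} = e^{\beta_1}$$

$e^{\beta_1} > 1$, e.g. $e^{\beta_1} = 1.5$

$e^{\beta_1} < 1$, e.g. $e^{\beta_1} = 0.5$

$e^{\beta_1} = 1 \Leftrightarrow \beta_1 = 0$

$$p(x) = \frac{e^{5.495958 - 0.004889 \times x}}{1 + e^{5.495958 - 0.004889 \times x}}$$

$$\beta_0 = 5.495958, \quad \beta_1 = -0.004889$$

$$e^{\beta_1} = 0.995123 < 1$$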
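The derivation shows that the odds ratio for a one-unit increase in x does not depend on x. A short numerical check in R, using the coefficients estimated above (with the negative sign of β1 from the fitted equation), is sketched below; the evaluation points 100 and 101 are arbitrary.

```r
b0 <- 5.495958
b1 <- -0.004889

odds <- function(x) exp(b0 + b1 * x)     # Omega(x)

# Ratio of odds at x + 1 and at x, for an arbitrary x
or_100 <- odds(101) / odds(100)
or_500 <- odds(501) / odds(500)

round(exp(b1), 6)                        # 0.995123
```

Both ratios equal `exp(b1)` up to floating point error, so each additional unit of x multiplies the odds by about 0.995.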

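$$n = 30$$

$$\frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}} \quad (x = 1, \; y = 1)$$

$$\frac{e^{\beta_0}}{1 + e^{\beta_0}} \quad (x = 0, \; y = 1)$$

$$\frac{1}{1 + e^{\beta_0 + \beta_1 x}} \quad (x = 1, \; y = 0)$$

$$\frac{1}{1 + e^{\beta_0}} \quad (x = 0, \; y = 0)$$

$$\ln\left(\frac{p}{1 - p}\right) = -19.57 + 21.60 \times \text{occupational status}$$

$$e^{\beta_1} = e^{21.6} = 2.4 \times 10^9 > 1$$

$$p(x) = \frac{e^{5.495958 - 0.004889 \times 100}}{1 + e^{5.495958 - 0.004889 \times 100}} = 0.993354 \quad (y = 1, \; x = 100)$$

$$p(x) = \frac{e^{-19.57 + 21.60}}{1 + e^{-19.57 + 21.60}} \quad (y = 1, \; x = 1)$$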
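The two predicted probabilities above can be evaluated with `plogis()`, which computes e^z / (1 + e^z). The sketch below only plugs the reported coefficients into the formulas; the variable names are chosen for illustration.

```r
# Income model, evaluated at x = 100
p_income <- plogis(5.495958 - 0.004889 * 100)

# Occupational status model, evaluated at x = 1
p_status <- plogis(-19.57 + 21.60 * 1)

round(p_income, 6)    # 0.993354, as in the worked example
```

With fitted `glm` objects the same values would normally be obtained through `predict(fit, newdata, type = "response")`.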

likelihood ∈ [0, 1]; LL ∈ (−∞, 0]

$$\chi^2 = -2LL_0 - (-2LL_M) = -2 \ln\left(\frac{LL_0}{LL_M}\right)$$

$$LL_0 = LL_M$$

$$SST = SSR + SSE$$

$$\ln\left(\frac{p}{1 - p}\right) = 5.495958 - 0.004889 \times \text{average monthly income}$$

$$SSE = \sum_i (y_i - \hat{y}_i)^2$$

$$SST = \sum_i (y_i - \bar{y})^2$$

$$SSR = \sum_i (\hat{y}_i - \bar{y})^2$$

$$\beta_1 = -0.004889$$

$$0.00370 < 0.01$$

$$e^{\beta_1} = 0.995123$$

$$\beta_1: [-0.009456652, -0.0238618]$$

Pr(>Chi) = 6.453e−08 < 0.05

$$-2LL$$

29.22

$$df = n - k - 1$$

$$k + 1, \quad n - 2$$
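The reported significance level can be reproduced from the chi-square distribution. The sketch below assumes that 29.22 is the value of the χ² statistic (the difference between −2LL0 and −2LLM) and that it has 1 degree of freedom, one predictor being added to the null model; this reading is an assumption, although it is consistent with the reported Pr(>Chi).

```r
chi2 <- 29.22        # assumed: difference -2LL0 - (-2LLM)
df   <- 1            # assumed: one predictor added to the null model

# Upper-tail probability of the chi-square distribution
p_value <- pchisq(chi2, df = df, lower.tail = FALSE)
```

For a fitted model the same test is typically obtained with `anova(fit, test = "Chisq")`.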

$$R^2 = 1 - \left[ \frac{-2LL_0}{-2LL_M} \right]^{2/n}$$

$$LL_0, \quad LL_M, \quad n$$

$$R^2 = \frac{1 - \left[ \frac{-2LL_0}{-2LL_M} \right]^{2/n}}{1 - (-2LL_0)^{2/n}}$$

$$AIC = -2LL_k + 2k$$

$$BIC = -2LL_k + k \times \log(n)$$
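Both information criteria can be computed by hand from the log-likelihood of a fitted model. The sketch below uses simulated data (hypothetical values) and the standard definitions AIC = −2LL + 2k and BIC = −2LL + k·log(n), where k is the number of estimated parameters; these are the definitions implemented by R's `AIC()` and `BIC()`.

```r
set.seed(42)                               # simulated, illustrative data
n <- 35
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.5 + x))

fit <- glm(y ~ x, family = binomial)

LL <- as.numeric(logLik(fit))              # log-likelihood of the model
k  <- length(coef(fit))                    # number of estimated parameters

aic <- -2 * LL + 2 * k
bic <- -2 * LL + k * log(n)

stopifnot(isTRUE(all.equal(aic, AIC(fit))),
          isTRUE(all.equal(bic, BIC(fit))))
```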

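$$\ln\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 x_1 + \ldots + \beta_k x_k$$

$$\text{logit}(p) = \beta_0 + \beta_1 x_1 + \ldots + \beta_k x_k$$

$$\frac{p}{1 - p} = e^{\beta_0 + \beta_1 x_1 + \ldots + \beta_k x_k}$$

$$\Omega = \frac{p}{1 - p}$$

$$\Omega = e^{\beta_0 + \sum_k \beta_k x_k}$$

$$p = \frac{e^{\beta_0 + \beta_1 x_1 + \ldots + \beta_k x_k}}{1 + e^{\beta_0 + \beta_1 x_1 + \ldots + \beta_k x_k}}$$

$$x_j, \quad x_i, \; i \ne j$$

$$OR = \frac{\Omega(x_j + 1)}{\Omega(x_j)} = \frac{e^{\beta_0 + \beta_j (x_j + 1)}}{e^{\beta_0 + \beta_j x_j}} = \frac{e^{\beta_0} \times e^{\beta_j x_j} \times e^{\beta_j}}{e^{\beta_0} \times e^{\beta_j x_j}} = e^{\beta_j}$$

$e^{\beta_j} > 1$, $x_j \rightarrow x_j + 1$, e.g. $e^{\beta_j} = 1.5$

$e^{\beta_j} < 1$, $x_j \rightarrow x_j + 1$, e.g. $e^{\beta_j} = 0.5$

$e^{\beta_j} = 1 \Leftrightarrow \beta_j = 0$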

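$$x_1 \in \{1, 2\}, \quad x_2 \in \{1, 2\}$$

$$\ln\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$$

$$\beta_0 = -6.202, \quad \beta_1 = -2.449, \quad \beta_2 = 6.297$$

$$e^{\beta_1} = e^{-2.449} = 0.08636 < 1 \quad (x_1 = 1 \rightarrow x_1 = 2)$$

$\beta_1$: $p = 0.02916$

$$e^{\beta_2} = e^{6.297} = 542.9406 > 1 \quad (x_2 = 1 \rightarrow x_2 = 2)$$

$$e^{\beta_j} = 1 \Leftrightarrow \beta_j = 0$$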

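$$Y_i = \begin{cases} \text{category } 1 \\ \text{category } 2 \\ \text{category } 3 \\ \ldots \\ \text{category } J \end{cases}$$

$$p_i = \begin{cases} p_{i1} \\ p_{i2} \\ p_{i3} \\ \ldots \\ p_{ij} \\ \ldots \\ p_{iJ} \end{cases}$$

$$j = 1, 2, \ldots, J, \quad J > 2$$

$$\ln\left(\frac{p_{i1}}{1 - p_{iJ}}\right) = \beta_{10} + \beta_{11} \times x_{i1} + \beta_{12} \times x_{i2} + \ldots + \beta_{1k} \times x_{ik} = \beta'_1 \times x_i$$

$$\ln\left(\frac{p_{i2}}{1 - p_{iJ}}\right) = \beta_{20} + \beta_{21} \times x_{i1} + \beta_{22} \times x_{i2} + \ldots + \beta_{2k} \times x_{ik} = \beta'_2 \times x_i$$

$$\ln\left(\frac{p_{ij}}{1 - p_{iJ}}\right) = \beta_{j0} + \beta_{j1} \times x_{i1} + \beta_{j2} \times x_{i2} + \ldots + \beta_{jk} \times x_{ik} = \beta'_j \times x_i$$

$$\ln\left(\frac{p_{iJ}}{1 - p_{iJ}}\right) = \beta_{J0} + \beta_{J1} \times x_{i1} + \beta_{J2} \times x_{i2} + \ldots + \beta_{Jk} \times x_{ik} = \beta'_J \times x_i$$

$$\Omega = \frac{p_{ij}}{1 - p_{ij}} = e^{\beta_{j0} + \beta_{j1} \times x_{i1} + \beta_{j2} \times x_{i2} + \ldots + \beta_{jk} \times x_{ik}}$$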

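$$J = 2 \quad (j = 1, 2)$$

$$\ln\left(\frac{p_{i1}}{1 - p_{i1}}\right) = \beta_{10} + \beta_{11} \times x_{i1} + \beta_{12} \times x_{i2} + \ldots + \beta_{1k} \times x_{ik} = \beta'_1 \times x_i$$

$$p_{i1} = \frac{e^{\beta_{10} + \beta_{11} \times x_{i1} + \beta_{12} \times x_{i2} + \ldots + \beta_{1k} \times x_{ik}}}{1 + e^{\beta_{10} + \beta_{11} \times x_{i1} + \beta_{12} \times x_{i2} + \ldots + \beta_{1k} \times x_{ik}}} = \frac{e^{\beta'_1 \times x_i}}{1 + e^{\beta'_1 \times x_i}}$$

$$p_{i2} = 1 - p_{i1} = 1 - \frac{e^{\beta_{10} + \beta_{11} \times x_{i1} + \beta_{12} \times x_{i2} + \ldots + \beta_{1k} \times x_{ik}}}{1 + e^{\beta_{10} + \beta_{11} \times x_{i1} + \beta_{12} \times x_{i2} + \ldots + \beta_{1k} \times x_{ik}}} = \frac{1}{1 + e^{\beta'_1 \times x_i}}$$

$$\Omega = \frac{p_{i1}}{1 - p_{i1}} = e^{\beta_{10} + \beta_{11} \times x_{i1} + \beta_{12} \times x_{i2} + \ldots + \beta_{1k} \times x_{ik}} = e^{\beta'_1 \times x_i}$$

$$\Omega = \frac{p}{1 - p} = e^{\beta_0 + \beta_1 \times x_i}$$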

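$$J > 2$$

$$p_{ij} = \frac{e^{\beta'_j \times x_i}}{1 + \sum_{l=1}^{J-1} e^{\beta'_l \times x_i}}, \quad j < J$$

$$p_{iJ} = \frac{1}{1 + \sum_{l=1}^{J-1} e^{\beta'_l \times x_i}}, \quad j = J$$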
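The category probabilities for J > 2 can be computed with a small R function. This is a sketch: the function name `multinom_probs`, the coefficient matrix `B` and the predictor value are hypothetical, and category J is taken as the reference category, as in the formulas above.

```r
# B: (J - 1) x (k + 1) matrix, one row of coefficients (intercept first)
#    for each non-reference category; x: vector of k predictor values.
multinom_probs <- function(x, B) {
  eta   <- as.vector(B %*% c(1, x))     # beta'_j x_i for j = 1, ..., J - 1
  denom <- 1 + sum(exp(eta))
  c(exp(eta) / denom, 1 / denom)        # last element is p_iJ
}

# Hypothetical coefficients for J = 3 categories and one predictor
B <- rbind(c(0.2, 1.0),
           c(-0.5, 0.3))
p <- multinom_probs(x = 1.5, B = B)
```

By construction the J probabilities are positive and sum to 1; with all coefficients equal to zero every category gets probability 1/J.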

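$$x_{i1}, \ldots, x_{ik}$$

$$x_{ik} \rightarrow x_{ik} + 1$$

$$OR = \frac{\Omega(x_{ik} + 1)}{\Omega(x_{ik})} = \frac{e^{\beta_{j0} + \beta_{j1} \times x_{i1} + \beta_{j2} \times x_{i2} + \ldots + \beta_{jk} \times (x_{ik} + 1)}}{e^{\beta_{j0} + \beta_{j1} \times x_{i1} + \beta_{j2} \times x_{i2} + \ldots + \beta_{jk} \times x_{ik}}} = e^{\beta_{jk}}$$

$$-2LL$$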

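$$\ln\left(\frac{p_{\text{casnica}}}{1 - p_{\text{salariat}}}\right) = \beta_{\text{casnica},0} + \beta_{\text{casnica},\text{NATMaghiar}} \times x_{\text{NATMaghiar}} + \beta_{\text{casnica},\text{NATRoman}} \times x_{\text{NATRoman}} + \beta_{\text{casnica},\text{NATRom}} \times x_{\text{NATRom}} + \beta_{\text{casnica},\text{NIVEscazut}} \times x_{\text{NIVEscazut}} + \beta_{\text{casnica},\text{NIVEsuperior}} \times x_{\text{NIVEsuperior}}$$

$$\ln\left(\frac{p_{\text{elev}}}{1 - p_{\text{salariat}}}\right) = \beta_{\text{elev},0} + \beta_{\text{elev},\text{NATMaghiar}} \times x_{\text{NATMaghiar}} + \beta_{\text{elev},\text{NATRoman}} \times x_{\text{NATRoman}} + \beta_{\text{elev},\text{NATRom}} \times x_{\text{NATRom}} + \beta_{\text{elev},\text{NIVEscazut}} \times x_{\text{NIVEscazut}} + \beta_{\text{elev},\text{NIVEsuperior}} \times x_{\text{NIVEsuperior}}$$

$$\ln\left(\frac{p_{\text{pensionar}}}{1 - p_{\text{salariat}}}\right) = \beta_{\text{pensionar},0} + \beta_{\text{pensionar},\text{NATMaghiar}} \times x_{\text{NATMaghiar}} + \beta_{\text{pensionar},\text{NATRoman}} \times x_{\text{NATRoman}} + \beta_{\text{pensionar},\text{NATRom}} \times x_{\text{NATRom}} + \beta_{\text{pensionar},\text{NIVEscazut}} \times x_{\text{NIVEscazut}} + \beta_{\text{pensionar},\text{NIVEsuperior}} \times x_{\text{NIVEsuperior}}$$

$$\ln\left(\frac{p_{\text{student}}}{1 - p_{\text{salariat}}}\right) = \beta_{\text{student},0} + \beta_{\text{student},\text{NATMaghiar}} \times x_{\text{NATMaghiar}} + \beta_{\text{student},\text{NATRoman}} \times x_{\text{NATRoman}} + \beta_{\text{student},\text{NATRom}} \times x_{\text{NATRom}} + \beta_{\text{student},\text{NIVEscazut}} \times x_{\text{NIVEscazut}} + \beta_{\text{student},\text{NIVEsuperior}} \times x_{\text{NIVEsuperior}}$$

$$e^{\beta_{\text{student},\text{NIVEsuperior}}} = 0.2540029$$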


$$Volume_i = a \cdot Girth_i + b \cdot Height_i + \epsilon_i$$

$$Perm_i = a \cdot Area_i + b \cdot Peri_i + c \cdot Shape_i + \epsilon_i$$

$$Y_i = a \cdot X_{1i} + b \cdot X_{2i} + \epsilon_i, \quad i = 1 \ldots 100000000$$

Partial word counts:

acest: 12, si: 35, de: 45

acest: 10, si: 40, de: 15

acest: 12, si: 78, de: 12

Combined word counts:

acest: 34, si: 153, de: 72
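The word-count example can be mimicked in plain R, without Hadoop, by summing the partial counts for each word. The sketch below treats each partial result as a named vector; the object names are illustrative, and grouping the counts by key stands in for the reduce step.

```r
# Partial counts, one named vector per block of text
part1 <- c(acest = 12, si = 35, de = 45)
part2 <- c(acest = 10, si = 40, de = 15)
part3 <- c(acest = 12, si = 78, de = 12)

# "Shuffle": put all (word, count) pairs together
pairs <- unlist(list(part1, part2, part3))

# "Reduce": sum the counts that share the same key (word)
total <- tapply(pairs, names(pairs), sum)
```

The result holds 34 for `acest`, 153 for `si` and 72 for `de`, matching the combined counts of the example.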

[Figure: layered architecture for statistical analysis with Hadoop. Hardware: Computer 1, Computer 2, ..., Computer n. Hadoop system: distributed file system (HDFS), MapReduce. Middleware level: Pig, Hive, Rhipe, RHadoop, Streaming. Interface: statistical analysis software (R).]

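$$y_i = \beta_1 \times x_{i1} + \cdots + \beta_p \times x_{ip} + \epsilon_i = x_i^T \times \beta + \epsilon_i, \quad i = 1, \ldots, n$$

$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad
X = \begin{pmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_n^T \end{pmatrix} =
\begin{pmatrix}
x_{1,1} & \cdots & x_{1,p} \\
x_{2,1} & \cdots & x_{2,p} \\
\vdots & & \vdots \\
x_{n,1} & \cdots & x_{n,p}
\end{pmatrix}, \quad
\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_p \end{pmatrix}, \quad
\epsilon = \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{pmatrix}$$

$$y = X \times \beta + \epsilon$$

$$\hat{\beta} = (X^T X)^{-1} X^T y$$

$$X^T X \hat{\beta} = X^T y$$

solve(XTX, XTy)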

$$n = 20000$$

$X$: (20000, 15); $y$: 20000

solve(XTX, XTy)

$X^T X$: (15, 20000) × (20000, 15) = (15, 15)

$X^T y$: (15, 20000) × (20000, 1) = (15, 1)

$X$: 20000 × 15 = 300000

Rows of $X$: 1, 2, 3, ..., 20000

$$X_r, \quad m < n$$
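Since XTX has only 15 × 15 elements and XTy only 15, both can be accumulated over blocks of rows of X, so that the full matrix never has to be processed at once. The sketch below is one possible way to do this in plain R, under the assumption that X is split into row blocks Xr of m < n rows; the data are simulated and the block size is arbitrary.

```r
set.seed(123)                           # simulated, illustrative data
n <- 20000
p <- 15
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
beta_true <- seq_len(p) / 10            # hypothetical coefficients
y <- as.vector(X %*% beta_true + rnorm(n))

m   <- 1000                             # rows per block (m < n), arbitrary
XtX <- matrix(0, p, p)
Xty <- numeric(p)

for (start in seq(1, n, by = m)) {
  rows <- start:min(start + m - 1, n)
  Xr   <- X[rows, , drop = FALSE]       # one block of rows of X
  XtX  <- XtX + crossprod(Xr)           # adds t(Xr) %*% Xr
  Xty  <- Xty + as.vector(crossprod(Xr, y[rows]))
}

beta_hat <- solve(XtX, Xty)             # solves XtX %*% beta = Xty
```

In a distributed setting each block would be handled by a separate map task and the partial sums added in the reduce step; here the loop only imitates that flow on a single machine.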
