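7. 1.4. Problem statement. $X$ is the set of objects (samples) and $Y$ is the set of responses. There is an unknown function $f^*: X \to Y$ whose values are known only at the points $x_1, x_2, \dots, x_N$:
$$f^*(x_i) = y_i \quad (i = 1, 2, \dots, N).$$
Each pair $(x_i, y_i) \in X \times Y$ is a training example, and the set $\{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$ is the training sample. Task: recover $f^*$ from the training sample.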
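8. What does it mean to recover $f^*$? We look for a function $f: X \to Y$ from some class such that $f(x_i) = f^*(x_i)$, or at least $f(x_i) \approx f^*(x_i)$ $(i = 1, 2, \dots, N)$. Choosing $f$ so that it agrees with the training sample is called fitting.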
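9. Objects in $X$ are described by features. A feature is a map $\xi: X \to D$, and its type is determined by the set $D$ of admissible values. If $D$ is finite, the feature is nominal (categorical), for example $D = \{1, 2, \dots, s\}$. If $|D| = 2$, the feature is binary, and we may take $D = \{0, 1\}$. If $D$ is finite and ordered, the feature is ordinal. If $D = \mathbb{R}$, the feature is quantitative. ...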
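10. Suppose there are $p$ features $\xi_1, \xi_2, \dots, \xi_p$ with values $\xi_1(x), \xi_2(x), \dots, \xi_p(x)$ on the object $x$. The object is identified with its feature vector:
$$x = (x_1, x_2, \dots, x_p) = \bigl(\xi_1(x), \xi_2(x), \dots, \xi_p(x)\bigr), \qquad X = D_1 \times D_2 \times \dots \times D_p.$$
Responses $y \in Y$ are described in the same way:
$$y = (y_1, y_2, \dots, y_q) = \bigl(\eta_1(y), \eta_2(y), \dots, \eta_q(y)\bigr),$$
and usually $q = 1$. Here $y$ is the output and the components $x_j$ of $x$ are the inputs (predictors).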
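11. The type of learning problem is determined by $Y$. If $Y$ is finite, say $Y = \{1, 2, \dots, K\}$, the problem is classification (pattern recognition): $X$ is split into $K$ classes
$$X_k = \{x \in X : f^*(x) = k\} \quad (k = 1, 2, \dots, K),$$
and for a given $x$ we must name the class it belongs to. If $Y = \mathbb{R}$, the problem is regression: recovering $f^*$ means estimating a real-valued function. ...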
36.
  No.  English   German   Italian   French      Czech
    1  I         ich      io        je          já
    2  you       du       tu        tu          ty
    3  he        er       lui       il          on
    4  we        wir      noi       nous        my
    5  you       ihr      voi       vous        vy
    6  they      sie      loro      ils         oni
    7  this      dieses   questo    ceci        tento
    8  that      jenes    quello    cela        tamten
    9  here      hier     qui       ici         zde
   10  there     dort     là        là          tam
   11  who       wer      chi       qui         kdo
   12  what      was      che       quoi        co
   13  where     wo       dove      où          kde
   14  when      wann     quando    quand       kdy
   15  how       wie      come      comment     jak
   16  not       nicht    non       ne...pas    ne
  ...
  205  if        wenn     se        si          jestliže
  206  because   weil     perché    parce que   protože
  207  name      Name     nome      nom         jméno
39. [Figure: pairwise comparison matrix of languages. Row and column labels: English, German, Dutch, Swedish, Danish, Italian, French, Spanish, Portuguese, Latin, Esperanto, Slovene, Czech, Polish, Slovio, Lithuanian, Latvian, Hungarian, Finnish, Estonian, Euskara, Quenya, Sindarin.]
40. [Figure: the 23 languages, labelled in this order: English, German, Dutch, Swedish, Danish, Italian, French, Spanish, Portuguese, Latin, Esperanto, Slovene, Slovio, Czech, Polish, Lithuanian, Latvian, Hungarian, Finnish, Estonian, Quenya, Sindarin, Euskara.]
42. R : R . , !
43.
[1] Hastie T., Tibshirani R., Friedman J. The elements of statistical learning. Springer, 2001.
[2] Ripley B. D. Pattern recognition and neural networks. Cambridge University Press, 1996.
[3] Bishop C. M. Pattern recognition and machine learning. Springer, 2006.
[4] Duda R. O., Hart P. E., Stork D. G. Pattern classification. New York: John Wiley and Sons, 2001.
[5] Mitchell T. Machine learning. McGraw Hill, 1997.
[6] ..., 2005.
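48. 1.7. Probabilistic setting. The pairs $(x, y)$ are realizations of a $(p+1)$-dimensional random variable $(X, Y)$ defined on a probability space $(\Omega, \mathcal{F}, \Pr)$, with $X \in \mathbb{R}^p$, $Y \in \mathbb{R}$. The joint distribution $P(x, y) = P(x\,|\,y)P(y)$ is unknown. Only the training sample $\{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$ is known, where the $(x_i, y_i)$ are independent realizations of $(X, Y)$. We need a function $f: X \to Y$ that predicts $y$ from $x$.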
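49. Loss (penalty) function $L(\hat y\,|\,y) = L(f(x)\,|\,y)$. Here $x$ is the input, $y$ is the true output, and $\hat y = f(x)$ is the predicted output.
Squared error (regression): $L(\hat y\,|\,y) = (\hat y - y)^2$.
Prediction error (classification): $L(\hat y\,|\,y) = 0$ if $f(x) = y$, and $L(\hat y\,|\,y) = 1$ if $f(x) \ne y$.
For classification with $K$ classes the losses form a $K \times K$ matrix $L = (\ell_{ky})$, $\ell_{ky} = L(k\,|\,y)$. For example, with $Y = \{0, 1\}$ and the two kinds of error costing differently: $L(1|1) = L(0|0) = 0$, $L(1|0) = 1$, $L(0|1) = 10$.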
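To see what the asymmetric loss matrix on this slide does to a decision, here is a minimal sketch. It assumes we are handed a number `p1` standing for $\Pr(y = 1\,|\,x)$; the names `LOSS`, `expected_loss` and `decide` are illustrative and not part of the slides.

```python
# LOSS[(predicted, true)] reproduces the 2x2 example: L(1|0) = 1, L(0|1) = 10.
LOSS = {(0, 0): 0.0, (1, 1): 0.0, (1, 0): 1.0, (0, 1): 10.0}


def expected_loss(predicted, p1):
    """Average loss of answering `predicted` when Pr(y = 1 | x) = p1 (assumed known)."""
    return LOSS[(predicted, 0)] * (1.0 - p1) + LOSS[(predicted, 1)] * p1


def decide(p1):
    """Pick the answer with the smaller expected loss."""
    return min((0, 1), key=lambda predicted: expected_loss(predicted, p1))


print(decide(0.05), decide(0.2))
```

Under these numbers the answer switches from 0 to 1 once `p1` exceeds 1/11, far below 1/2, because missing class 1 costs ten times more.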
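51. Expected risk.
$$R(f) = \mathrm{E}\, L\bigl(f(x)\,|\,y\bigr) = \int_{X \times Y} L\bigl(f(x)\,|\,y\bigr)\, dP(x, y)$$
is the mean loss. Task: find $f \in F$ that minimizes $R(f)$. Difficulty: $P(x, y)$ is unknown, so $R(f)$ cannot be computed. Two approaches: 1) estimate $P(x, y)$ from the sample, then minimize $R(f)$; 2)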
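52. 1.8. 1.8.1. Recovering the distribution.
$$R(f) = \int_{X \times Y} L\bigl(f(x)\,|\,y\bigr)\, dP(x, y) \qquad (*)$$
1) From the sample $(x_1, y_1), \dots, (x_N, y_N)$ estimate $P(x, y)$. 2) Substitute the estimate for $P(x, y)$ in (*) and minimize. Estimating $P(x, y)$ reliably requires a large $N$.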
53. 1.1 , , , . , ( ) .
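54. 1.8.2. Let $\{(x_1, y_1), \dots, (x_N, y_N)\}$ be a sample of independent draws from $P(X, Y)$. Instead of $R(f)$ we minimize the empirical risk
$$\hat R(f) = \hat R(f, x_1, y_1, \dots, x_N, y_N) = \frac{1}{N} \sum_{i=1}^{N} L\bigl(f(x_i)\,|\,y_i\bigr).$$
Since the $x_i, y_i$ are random, $\hat R(f)$ is a random variable (a statistic). Clearly,
$$\mathrm{E}\,\hat R(f) = \mathrm{E}\, L\bigl(f(X)\,|\,Y\bigr) = R(f), \qquad \mathrm{D}\,\hat R(f) = \frac{\sigma^2}{N},$$
where $\sigma^2 = \mathrm{D}\, L(f(X)\,|\,Y)$. Note that $\sigma^2$ depends on $f$.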
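The empirical risk is just the sample average of the loss. A minimal sketch, assuming the predictor and the loss are passed in as plain Python callables (`empirical_risk` and `squared_loss` are illustrative names):

```python
def squared_loss(y_pred, y_true):
    """L(y_hat | y) = (y_hat - y)^2."""
    return (y_pred - y_true) ** 2


def empirical_risk(f, xs, ys, loss):
    """(1/N) * sum of loss(f(x_i), y_i) over the training sample."""
    return sum(loss(f(x), y) for x, y in zip(xs, ys)) / len(xs)


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 7.0]
risk = empirical_risk(lambda x: 2.0 * x, xs, ys, squared_loss)
print(risk)
```

Only the third point is missed (by 1), so the value is one third.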
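55. Statement 1.2. With probability at least $1 - \eta$,
$$\hat R(f) - \frac{\sigma}{\sqrt{\eta N}} \le R(f) \le \hat R(f) + \frac{\sigma}{\sqrt{\eta N}}.$$
Proof. By Chebyshev's inequality,
$$\Pr\bigl\{|\hat R(f) - \mathrm{E}\,\hat R(f)| > \varepsilon\bigr\} \le \frac{\mathrm{D}\,\hat R(f)}{\varepsilon^2}.$$
As shown above, $\mathrm{E}\,\hat R(f) = \mathrm{E}\, L(f(X)\,|\,Y) = R(f)$ and $\mathrm{D}\,\hat R(f) = \sigma^2/N$; take $\varepsilon = \sigma/\sqrt{\eta N}$.
Statement 1.3. For every $f \in F$ and every $\varepsilon > 0$,
$$\lim_{N \to \infty} \Pr\bigl\{|\hat R(f) - R(f)| > \varepsilon\bigr\} = 0,$$
that is, $\hat R(f)$ converges to $R(f)$ in probability.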
56. , .
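57. Empirical risk minimization principle: in the class $F$ find the function $\hat f$ that minimizes $\hat R(f)$ and take it as an approximation to the function $f^*$ that minimizes $R(f)$. For this to work, $R(\hat f)$ must be close to $R(f^*)$. Pointwise convergence of $\hat R(f)$ to $R(f)$ is not enough for that. Instead of Statement 1.3 we need uniform convergence:
$$\lim_{N \to \infty} \Pr\Bigl\{\sup_{f \in F} |\hat R(f) - R(f)| > \varepsilon\Bigr\} = 0.$$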
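59. [Figure: $F = \{f : f(x, \alpha),\ \alpha \in [0, 1]\}$; curves $R(\alpha)$ and $\hat R(\alpha)$ with their minimizers marked. Pointwise convergence: $\lim_{N \to \infty} \Pr\{|\hat R(f) - R(f)| > \varepsilon\} = 0$.]
60. [Figure: the same curves $R(\alpha)$ and $\hat R(\alpha)$ under uniform convergence: $\lim_{N \to \infty} \Pr\{\sup_{f \in F} |\hat R(f) - R(f)| > \varepsilon\} = 0$.]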
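61. Example 1.4. The least squares method minimizes the empirical risk
$$\hat R(f) = \frac{1}{N} \sum_{i=1}^{N} \bigl(y_i - f(x_i)\bigr)^2.$$
The maximum likelihood method minimizes the empirical risk
$$\hat R(p) = -\frac{1}{N} \sum_{i=1}^{N} \ln p(x_i)$$
(the negative log-likelihood).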
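62. 1.8.3.
$$R(f) = \int_{X \times Y} L\bigl(f(x)\,|\,y\bigr)\, dP(x, y) = \int_X \int_Y L\bigl(f(x)\,|\,y\bigr)\, dP(y\,|\,x)\, dP(x),$$
that is,
$$R(f) = \int_X \mathrm{E}\bigl(L(f(x)\,|\,Y)\,\big|\,x\bigr)\, dP(x).$$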
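63. For the squared loss:
$$R(f) = \int_X \int_Y \bigl(y - f(x)\bigr)^2\, dP(y\,|\,x)\, dP(x) = \int_X \mathrm{E}\bigl((Y - f(x))^2\,\big|\,x\bigr)\, dP(x).$$
Clearly, $R(f)$ is minimized pointwise:
$$f^*(x) = \operatorname*{argmin}_{c} \mathrm{E}\bigl((Y - c)^2\,\big|\,x\bigr), \qquad (1)$$
$$f^*(x) = \mathrm{E}(Y\,|\,x). \qquad (2)$$
This is the regression function: the best prediction of $y$ at the point $x$ is the conditional mean.
Exercise 1.5. Show that (1) implies (2), and that $R(f^*) = \mathrm{E}\,\mathrm{D}(Y\,|\,X)$.
Exercise 1.6. Show that if $L(\hat y\,|\,y) = |\hat y - y|$, then $f^*(x) = \operatorname{median}(Y\,|\,x)$.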
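64. How can $\mathrm{E}(Y\,|\,x)$ be estimated from the sample?
1) Replace $f^*(x)$ by the sample mean:
$$f(x) = \frac{1}{|I(x)|} \sum_{i \in I(x)} y_i, \qquad I(x) = \{i : x_i = x\},$$
but, as a rule, each $x$ occurs in the sample at most once.
2) The $k$ nearest neighbors method:
$$f(x) = \frac{1}{k} \sum_{x_i \in N_k(x)} y_i,$$
where $N_k(x)$ is the set of the $k$ sample points nearest (in the chosen metric) to $x$. In the special case of a single (nearest) neighbor, $f(x) = y_i$, where $x_i$ is the sample point nearest to $x$.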
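The nearest-neighbor estimate of the conditional mean can be sketched in a few lines. This version assumes one-dimensional inputs and absolute difference as the distance; both are simplifications chosen for the example, and `knn_regress` is an illustrative name.

```python
def knn_regress(x, xs, ys, k):
    """Average the responses of the k training points closest to x."""
    # Indices of the training points, sorted by distance to the query point.
    order = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x))
    nearest = order[:k]
    return sum(ys[i] for i in nearest) / k


xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 10.0, 20.0, 30.0]
print(knn_regress(1.1, xs, ys, k=2))  # averages the responses at x = 1 and x = 2
print(knn_regress(1.1, xs, ys, k=1))  # nearest-neighbor case
```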
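67. 1.8.4. Consider classification, $Y = \{1, 2, \dots, K\}$:
$$R(f) = \int_X \sum_{y=1}^{K} L\bigl(f(x)\,|\,y\bigr) \Pr(y\,|\,x)\, dP(x). \qquad (*)$$
Take the 0-1 loss
$$L(\hat y\,|\,y) = \begin{cases} 0, & \hat y = y, \\ 1, & \hat y \ne y. \end{cases} \qquad (**)$$
Then (for each $x$)
$$R(f) = \int_X \Bigl(1 - \Pr\bigl(Y = f(x)\,|\,x\bigr)\Bigr)\, dP(x),$$
and $f^* = \operatorname{argmin}_f R(f)$ is given pointwise by
$$f^*(x) = \operatorname*{argmin}_{y \in Y} \bigl(1 - \Pr(y\,|\,x)\bigr).$$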
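Minimizing $1 - \Pr(y\,|\,x)$ over $y$ amounts to picking the class with the largest conditional probability. A minimal sketch, assuming those probabilities are already known and passed in as a dictionary (in practice they have to be estimated); `bayes_classify` is an illustrative name.

```python
def bayes_classify(posterior):
    """Return (class, conditional error) for the class minimizing 1 - Pr(y | x).

    `posterior` maps each class label to Pr(y | x) at one fixed x (assumed known).
    """
    best = min(posterior, key=lambda y: 1.0 - posterior[y])
    return best, 1.0 - posterior[best]


label, error = bayes_classify({1: 0.2, 2: 0.5, 3: 0.3})
print(label, error)
```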
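73. 1.8.5. Stochastic approximation [Robbins, Monro, 1951, ..., 1965, Amari, 1967, ..., 1971, 1973]. Let the class $F$ be parametrized: $F = \{f(x) = f(x, \alpha) : \alpha \in \mathbb{R}^q\}$. Then
$$R(\alpha) = \int_{X \times Y} L\bigl(f(x, \alpha)\,|\,y\bigr)\, dP(x, y).$$
The iteration is
$$\alpha^{(k+1)} = \alpha^{(k)} - \gamma_k \nabla_\alpha L\bigl(f(x^{(k)}, \alpha^{(k)})\,|\,y^{(k)}\bigr) \quad (k = 1, 2, \dots, N),$$
where $\gamma_k$ is the step size and the gradient of $L(f(x, \alpha)\,|\,y)$ on a single example replaces the gradient of $R(\alpha)$.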
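The update rule can be tried on the simplest parametrized class. The sketch below assumes $f(x, \alpha) = \alpha_0 + \alpha_1 x$ with squared loss, so the per-example gradient is $2(f - y)\cdot(1, x)$; the step-size schedule is passed in as a function. The names and the toy data are illustrative only.

```python
def sgd_linear(samples, alpha_start, step):
    """One pass of alpha <- alpha - gamma_k * grad L over a stream of (x, y) pairs."""
    a0, a1 = alpha_start
    for k, (x, y) in enumerate(samples, start=1):
        err = (a0 + a1 * x) - y          # f(x, alpha) - y
        g0, g1 = 2.0 * err, 2.0 * err * x  # gradient of (f - y)^2 in (a0, a1)
        gamma = step(k)
        a0, a1 = a0 - gamma * g0, a1 - gamma * g1
    return a0, a1


# Noise-free toy stream generated from y = 1 + 2x, cycled many times.
stream = [(x, 1.0 + 2.0 * x) for _ in range(2000) for x in (0.0, 0.5, 1.0)]
a0, a1 = sgd_linear(stream, (0.0, 0.0), step=lambda k: 0.1)
print(a0, a1)
```

A constant step is used here only because the toy data are noise-free; with noisy data a decreasing sequence $\gamma_k$ is the usual choice.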
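82. [Figure: brain weight against body weight on logarithmic axes (body from 1e-01 to 1e+05, brain from 1e-01 to 5e+03). Labelled points: African elephant, Asian elephant, Human, Giraffe, Donkey, Horse, Chimpanzee, Cow, Gorilla, Rhesus monkey, Sheep, Pig, Jaguar, Brachiosaurus, Potar monkey, Grey wolf, Goat, Triceratops, Kangaroo, Dipliodocus, Cat, Rabbit, Mountain beaver, Guinea pig, Mole, Rat, Golden hamster, Mouse.]
Fitted line: $\lg \text{brain} = \beta_0 + \beta_1 \lg \text{body}$ with $\beta_0 = 0.94$, $\beta_1 = 0.75$, that is, $\text{brain} = 8.6 \cdot (\text{body})^{3/4}$.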
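83. Given a training sample $(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)$ with $x_i \in X$, $y_i \in Y$ $(i = 1, 2, \dots, N)$, find a function $f$ such that $f(x_i) \approx y_i$ $(i = 1, 2, \dots, N)$. In regression, $Y = \mathbb{R}$.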
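84. Model: $y = f^*(x) + \varepsilon$, where $\varepsilon$ is a random variable (noise) independent of $x$ with $\mathrm{E}\,\varepsilon = 0$. Then $f^*(x) = \mathrm{E}(Y\,|\,X = x)$, and the conditional distribution $P(y\,|\,x)$ depends on $X$ only through $f^*(x)$.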
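85. We assume a parametric form for $f(x)$. Examples are the linear model
$$f(x) = \beta_0 + \sum_{j=1}^{p} x_j \beta_j \qquad (1)$$
or, more generally, a linear combination of basis functions
$$f(x) = \sum_{j=1}^{q} \beta_j h_j(x), \qquad (2)$$
where the $\beta_j$ are unknown parameters and the $h_j(x)$ are given functions. Models (1) and (2) are linear in the parameters $\beta_j$. A model can also be nonlinear in its parameters, for example $y = \beta_1 e^{\lambda_1 x} + \beta_2 e^{\lambda_2 x}$.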
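86. The parameters are chosen, for example, by least squares, that is, by minimizing the residual sum of squares
$$RSS(\beta) = \sum_{i=1}^{N} \bigl(y_i - f(x_i, \beta)\bigr)^2.$$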
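87. Maximum likelihood. Let $Y$ be a random variable with density $p(y, \theta)$, where $\theta$ is a vector of parameters. $N$ copies of $Y$: $Y_1, Y_2, \dots, Y_N$ ($N$ i.i.d. random variables) give $N$ realizations $y_1, y_2, \dots, y_N$. The joint density of $(Y_1, Y_2, \dots, Y_N)$ is the likelihood function
$$L(\theta) = p(y_1, y_2, \dots, y_N, \theta) = p(y_1, \theta) \cdot p(y_2, \theta) \cdots p(y_N, \theta),$$
and the log-likelihood is
$$\ell(\theta) = \ln L(\theta) = \sum_{i=1}^{N} \ln p(y_i, \theta).$$
(If $Y$ is discrete, $p(y_i, \theta)$ is replaced by $\Pr\{Y = y_i\}$.) The maximum likelihood estimate is the $\theta$ that maximizes $L(\theta)$ (equivalently, $\ell(\theta)$).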
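88. If $y = f^*(x, \beta) + \varepsilon$ with $\varepsilon \sim N(0, \sigma^2)$, then the conditional density $p(y\,|\,x)$ is
$$p(y\,|\,x, \beta) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(y - f(x, \beta))^2}{2\sigma^2}},$$
and
$$\ell(\beta) = \sum_{i=1}^{N} \ln p(y_i\,|\,x_i, \beta) = -\frac{N}{2} \ln 2\pi - N \ln \sigma - \frac{1}{2\sigma^2} \sum_{i=1}^{N} \bigl(y_i - f(x_i, \beta)\bigr)^2.$$
The last sum is $RSS(\beta)$, so in this case maximizing the likelihood is equivalent to least squares.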
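89. 2.1. Linear regression model:
$$f(x) = \beta_0 + \sum_{j=1}^{p} x_j \beta_j.$$
The variables $X_j$ may be: quantitative inputs; transformations of them (logarithm, square root, etc.); basis expansions; numeric codes of qualitative inputs; interactions between variables, for example $X_3 = X_1 \cdot X_2$. The parameters $\beta = (\beta_0, \beta_1, \dots, \beta_p)$ are found by least squares:
$$RSS(\beta) = \sum_{i=1}^{N} \bigl(y_i - f(x_i)\bigr)^2 = \sum_{i=1}^{N} \Bigl(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij} \beta_j\Bigr)^2.$$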
90. , . , xi , yi xi.
91. [Figure: regression plane $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$ over the $(x_1, x_2)$ plane.]
92. [Figure: plot with axes x and y.]
93. [Figure: plot with axes x and y.]
94. [Figure: plot with axes x and y.]
95. [Figure: data with three fitted lines: regression y~x, regression x~y, and the principal component (prin. comp.); axes x and y.]
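96. How do we minimize $RSS(\beta)$? Let
$$X = \begin{pmatrix} 1 & x_{11} & x_{12} & \dots & x_{1p} \\ 1 & x_{21} & x_{22} & \dots & x_{2p} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{N1} & x_{N2} & \dots & x_{Np} \end{pmatrix}, \qquad y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}.$$
Then
$$RSS(\beta) = \|y - X\beta\|^2 = (y - X\beta)^\top (y - X\beta).$$
The minimizer is the least squares solution (pseudo-solution) of the overdetermined system $X\beta = y$. $RSS(\beta)$ is a quadratic function of the $p + 1$ unknowns $\beta_0, \beta_1, \dots, \beta_p$. Differentiating, we get
$$\frac{\partial RSS}{\partial \beta} = -2X^\top (y - X\beta), \qquad \frac{\partial^2 RSS}{\partial \beta\, \partial \beta^\top} = 2X^\top X.$$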
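97. Let $x_0, x_1, \dots, x_p$ denote the columns of $X$. If $x_0, x_1, \dots, x_p$ are linearly independent, then $X^\top X$ is positive definite, so $RSS(\beta)$ has a unique minimum, found from
$$X^\top (y - X\beta) = 0 \iff X^\top X \beta = X^\top y, \qquad \hat\beta = (X^\top X)^{-1} X^\top y.$$
Thus the system $X\beta = y$ is replaced by the normal equations $X^\top X \beta = X^\top y$. The matrix $X^+ = (X^\top X)^{-1} X^\top$ is the pseudoinverse of $X$. The fitted values at $x_1, x_2, \dots, x_N$ are
$$\hat y = (\hat y_1, \hat y_2, \dots, \hat y_N)^\top = X\hat\beta = X (X^\top X)^{-1} X^\top y.$$
With $H = X (X^\top X)^{-1} X^\top$ we have $\hat y = Hy$: $\hat y$ is the orthogonal projection of $y$ onto the subspace spanned by $x_0, x_1, \dots, x_p$.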
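The normal equations can be formed and solved directly. The following sketch uses only the standard library, so it includes a small Gaussian elimination routine; that solver, and the names `solve` and `ols`, are conveniences of this example rather than anything prescribed by the slides. A numerically more careful implementation would use a QR decomposition.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(rows, y):
    """Least squares coefficients (beta_0, ..., beta_p) from X^T X beta = X^T y."""
    X = [[1.0] + list(r) for r in rows]  # prepend the column of ones
    k = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    return solve(xtx, xty)


beta = ols([[0.0], [1.0], [2.0], [3.0]], [1.0, 3.0, 5.0, 7.0])
print(beta)  # data lie exactly on y = 1 + 2x
```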
98. H
99. [Figure: the vector $y$ and its orthogonal projection $\hat y$ onto the plane spanned by $x_1$ and $x_2$.]
100. X , , RSS(), , , -, y y x0, x1, . . . , xp .
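101. 2.1.1. Gauss-Markov model (assumptions).
$$Y_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} + E_i \quad (i = 1, 2, \dots, N),$$
where the $\beta_j$ $(j = 0, 1, \dots, p)$ are unknown parameters, the $x_{ij}$ are fixed (non-random) values, and the $E_i$ are random variables with
$$\mathrm{E}\, E_i = 0, \qquad \operatorname{Var} E_i = \sigma^2, \qquad \operatorname{Cov}(E_i, E_j) = 0 \ (i \ne j).$$
Then the $Y_i$ are random variables with
$$\mathrm{E}\, Y_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij}, \qquad (1)$$
$$\operatorname{Var} Y_i = \sigma^2, \qquad \operatorname{Cov}(Y_i, Y_j) = 0 \ (i \ne j).$$
In matrix form, (1) reads $\mathrm{E}\, y = X\beta$.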
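102. Under these assumptions, since $\hat\beta = (X^\top X)^{-1} X^\top y$,
$$\mathrm{E}\,\hat\beta = (X^\top X)^{-1} X^\top\, \mathrm{E}\, y = (X^\top X)^{-1} X^\top X \beta = \beta,$$
$$\operatorname{Cov} \hat\beta = (X^\top X)^{-1} X^\top \sigma^2 X (X^\top X)^{-1} = (X^\top X)^{-1} \sigma^2.$$
Since $\mathrm{E}\,\hat\beta = \beta$, the estimate $\hat\beta$ is unbiased. The residuals are
$$e_i = y_i - \hat y_i = y_i - \hat\beta_0 - \sum_{j=1}^{p} \hat\beta_j x_{ij}.$$
It is easy to see that
$$\sum_{i=1}^{N} e_i = 0. \qquad (2)$$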
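103. It follows from (2) that the regression plane passes through the point of means:
$$\bar y = \hat\beta_0 + \sum_{j=1}^{p} \hat\beta_j \bar x_j, \qquad \bar y = \frac{1}{N} \sum_{i=1}^{N} y_i, \quad \bar x = \frac{1}{N} \sum_{i=1}^{N} x_i.$$
Moreover,
$$\sum_{i=1}^{N} \hat y_i = \sum_{i=1}^{N} y_i.$$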
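104. An unbiased estimate of $\sigma^2$ is
$$\hat\sigma^2 = \frac{1}{N - p - 1} \sum_{i=1}^{N} (y_i - \hat y_i)^2.$$
Indeed, $RSS = y^\top (I - H) y$ and $\mathrm{E}\, RSS = \sigma^2 (N - p - 1)$. The number $N - p - 1$ is the number of degrees of freedom.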
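105. Besides $RSS$, consider the total sum of squares
$$TSS = \sum_{i=1}^{N} (y_i - \bar y)^2$$
and the sum of squares due to regression (the explained sum of squares)
$$SSR = \sum_{i=1}^{N} (\hat y_i - \bar y)^2.$$
They satisfy $TSS = RSS + SSR$.
Exercise 2.1. Prove that $TSS = RSS + SSR$. Hint: the vector $y - \hat y$ is orthogonal to $\hat y - \bar y$.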
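106. Coefficient of determination:
$$r^2 = \frac{SSR}{TSS} = 1 - \frac{RSS}{TSS}.$$
$RSS$ measures the scatter of the $Y_i$ about $f(x_i)$ and $TSS$ the scatter of the $y_i$ about $\bar y$, so $r^2$ is the share of the variation explained by the regression. We have $0 \le r^2 \le 1$. If $r^2$ is close to 1, then $RSS$ is much smaller than $TSS$. Since $r^2$ does not decrease when predictors are added, the adjusted coefficient is also used:
$$r_a^2 = r^2 - \frac{(1 - r^2)\,p}{N - p - 1},$$
which can be less than 0.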
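The sums of squares and both coefficients are simple to compute from the observed and fitted values. A minimal sketch; `fit_summary` is an illustrative name, and the adjusted coefficient is written in the standard form $r^2 - (1 - r^2)\,p/(N - p - 1)$, which is an assumption about what the slide intends. The fitted values in the usage lines are made up for the arithmetic and are not an actual least squares fit.

```python
def fit_summary(y, y_hat, p):
    """Return (TSS, RSS, r2, adjusted r2) for observations y, fitted values y_hat, p predictors."""
    n = len(y)
    y_bar = sum(y) / n
    tss = sum((yi - y_bar) ** 2 for yi in y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    r2 = 1.0 - rss / tss
    r2_adj = r2 - (1.0 - r2) * p / (n - p - 1)
    return tss, rss, r2, r2_adj


tss, rss, r2, r2_adj = fit_summary(
    [1.0, 2.0, 3.0, 4.0, 5.0], [1.2, 1.9, 3.0, 4.1, 4.8], p=1
)
print(tss, rss, r2, r2_adj)
```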
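107. Additional assumption: the $E_i$ are normal, $E_i \sim N(0, \sigma^2)$ $(i = 1, 2, \dots, N)$, and independent. Then
$$\hat\beta \sim N\bigl(\beta, (X^\top X)^{-1} \sigma^2\bigr), \qquad (N - p - 1)\,\hat\sigma^2 \sim \sigma^2 \chi^2_{N - p - 1}.$$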
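110. Comparing two nested models (a larger one and a smaller one obtained from it by dropping variables):
$$F = \frac{(RSS_2 - RSS_1)/(p_1 - p_2)}{RSS_1/(N - p_1 - 1)},$$
where $RSS_1$ is the residual sum of squares of the larger model with $p_1 + 1$ parameters and $RSS_2$ that of the smaller model with $p_2 + 1$ parameters (so $p_1 - p_2$ parameters are set to zero). Under the normality assumption and the null hypothesis that the smaller model is correct, $F$ has the distribution $F(p_1 - p_2,\, N - p_1 - 1)$. When a single coefficient is dropped, the $F$ statistic is equivalent to the statistic $z_j$ from (3).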
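The statistic is a one-line computation once both residual sums of squares are available. A small sketch with argument names following the slide ($RSS_1$, $p_1$ for the larger model, $RSS_2$, $p_2$ for the smaller); `f_statistic` itself is an illustrative name and the numbers in the usage lines are made up.

```python
def f_statistic(rss1, p1, rss2, p2, n):
    """F = ((RSS2 - RSS1) / (p1 - p2)) / (RSS1 / (N - p1 - 1))."""
    return ((rss2 - rss1) / (p1 - p2)) / (rss1 / (n - p1 - 1))


# Larger model: 3 predictors, RSS 4; smaller model: 1 predictor, RSS 10; N = 24.
f_value = f_statistic(rss1=4.0, p1=3, rss2=10.0, p2=1, n=24)
print(f_value)
```

Taking the smaller model to be the intercept alone ($p_2 = 0$, $RSS_2 = TSS$) gives the test of the whole regression.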
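111. Significance of the regression as a whole. Null hypothesis: $\beta_1 = \dots = \beta_p = 0$ (all coefficients except $\beta_0$), so the smaller model has 1 parameter instead of $p + 1$: $y = \beta_0$. Clearly,
$$\hat\beta_0 = \bar y = \frac{1}{N} \sum_{i=1}^{N} y_i.$$
The residual sum of squares of this (shortened) model is
$$TSS = \sum_{i=1}^{N} \Bigl(y_i - \frac{1}{N} \sum_{i=1}^{N} y_i\Bigr)^2,$$
and the $F$ statistic is
$$F = \frac{(TSS - RSS)/p}{RSS/(N - p - 1)},$$
where $RSS = RSS(\hat\beta)$ is computed for the full model.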
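112. Under the null hypothesis this statistic has the distribution $F_{p,\, N - p - 1}$. Recall that $TSS = RSS + SSR$, where
$$SSR = \sum_{i=1}^{N} (\hat y_i - \bar y)^2,$$
so the numerator of $F$ equals $SSR/p$.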
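113. Confidence intervals. A confidence interval for $\beta_j$ is
$$\bigl(\hat\beta_j - z^{(1-\alpha)} \sqrt{v_j}\,\hat\sigma,\ \ \hat\beta_j + z^{(1-\alpha)} \sqrt{v_j}\,\hat\sigma\bigr),$$
where $z^{(1-\alpha)}$ is the critical value of the normal distribution for confidence level $1 - \alpha$: $z^{(1-0.1)} = 1.645$, $z^{(1-0.05)} = 1.96$, $z^{(1-0.01)} = 2.58$, and so on. Here $v_j$ is the $j$-th diagonal element of $(X^\top X)^{-1}$ and $\operatorname{se} \hat\beta_j = \sqrt{v_j}\,\hat\sigma$. Thus $\hat\beta \pm 2 \operatorname{se} \hat\beta$ is approximately a 95% interval.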
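As a quick numeric illustration of the interval, here is a sketch that takes the estimate, the diagonal element $v_j$ and $\hat\sigma$ as given numbers; the function name and the values in the usage lines are made up for the example.

```python
def confidence_interval(beta_j, v_j, sigma_hat, z=1.96):
    """Interval beta_j +/- z * se, with se = sqrt(v_j) * sigma_hat."""
    se = (v_j ** 0.5) * sigma_hat
    return beta_j - z * se, beta_j + z * se


low, high = confidence_interval(beta_j=2.0, v_j=0.25, sigma_hat=2.0)  # se = 1
print(low, high)
```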
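124. Checking the residuals. Normality check: $W = 0.9451$, p-value 0.02153, significance level $\alpha = 0.01$.
Durbin-Watson test (null hypothesis: no autocorrelation of the residuals):
$$D = \frac{\sum_{i=1}^{N-1} (e_{i+1} - e_i)^2}{\sum_{i=1}^{N} e_i^2}.$$
Exercise 2.2. Prove that $0 \le D \le 4$.
If $D < D_L(\alpha)$ or $D > 4 - D_L(\alpha)$, the hypothesis of no autocorrelation is rejected.
125. If $D_L(\alpha) < D < D_U(\alpha)$ or $4 - D_U(\alpha) < D < 4 - D_L(\alpha)$, the test is inconclusive. If $D_U(\alpha) < D < 4 - D_U(\alpha)$, the hypothesis is accepted. The values $D_L(\alpha)$ and $D_U(\alpha)$ depend on $N$, $p$ and $\alpha$ and are tabulated, see [..., vol. 1, p. 211].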
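The Durbin-Watson statistic is easy to compute from a list of residuals. A minimal sketch (`durbin_watson` is an illustrative name); the two toy residual sequences show the extremes of strongly alternating and constant residuals.

```python
def durbin_watson(residuals):
    """D = sum of squared successive differences / sum of squared residuals."""
    num = sum(
        (residuals[i + 1] - residuals[i]) ** 2 for i in range(len(residuals) - 1)
    )
    den = sum(e * e for e in residuals)
    return num / den


print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # alternating signs give a large D
print(durbin_watson([1.0, 1.0, 1.0]))         # identical residuals give D = 0
```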
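126. Hypothesis testing. Let $P(x)$ be the unknown distribution of the random variable $X$. Null hypothesis $H_0$: $P(x) = P_0(x)$. Alternative hypothesis $H_1$. Type I error: $H_0$ is rejected ($H_1$ is accepted) although it is true. Type II error: $H_0$ is accepted although it is false.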
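127. Fix the significance level $\alpha$, the probability of a type I error (for example, 0.1, 0.05, 0.01), and a test statistic $t = t(X)$ whose distribution is known under $H_0$. Choose a critical region $T = T(H_0, \alpha)$ for $t(X)$ such that $\Pr(t \in T\,|\,H_0) = \alpha$. If $t \in T$, then $H_0$ is rejected. If $t \notin T$, then the data do not contradict $H_0$, and $H_0$ is accepted.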
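128. [Figure: three forms of the critical region $T(H_0, \alpha)$ drawn under the density $p(t)$: right-sided $\{t : t \ge t_\alpha\}$, left-sided $\{t : t \le t_\alpha\}$, and two-sided $\{t : |t| \ge t_\alpha\}$.]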
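129. 2.1.2. p-value. The p-value is a number in $[0, 1]$ computed from the observed value $t^*$ of the statistic $t(X)$ (for the given $H_0$). It is the smallest significance level at which $t^*$ falls into the critical region $T(H_0, \alpha)$:
$$\text{p-value}(t^*, H_0) = \inf\{\alpha : t^* \in T(H_0, \alpha)\}.$$
If the p-value is less than $\alpha$, then $H_0$ is rejected. Otherwise $H_0$ is accepted.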
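130. [Figure: the p-value as an area under the density $p(t)$ for each form of the critical region.]
If $T = \{t : t \ge t_\alpha\}$, then p-value $= \Pr\{t(X) \ge t^*\}$.
If $T = \{t : t \le t_\alpha\}$, then p-value $= \Pr\{t(X) \le t^*\}$.
If $T = \{t : |t| \ge t_\alpha\}$, then p-value $= \Pr\{|t(X)| \ge |t^*|\}$.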
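For the two-sided region the p-value is a tail probability of the null distribution. The sketch below assumes, purely for illustration, that the statistic is standard normal under $H_0$; then $\Pr\{|t(X)| \ge |t^*|\}$ equals `erfc(|t*| / sqrt(2))`, which the standard library provides.

```python
import math


def two_sided_p_value(t_obs):
    """Pr{|Z| >= |t_obs|} for Z ~ N(0, 1) (the normal null distribution is an assumption)."""
    return math.erfc(abs(t_obs) / math.sqrt(2.0))


print(two_sided_p_value(1.96))  # close to 0.05
print(two_sided_p_value(0.0))   # the whole distribution lies in the tail
```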
131. 2.2. , , . : . , : . , , , , - , , .
132. [Figure: data and polynomial fits of degree 1, 2, 5 and 8; axes x and y.] The data are generated from $Y = X^2 - 0.8X + 7 + \varepsilon$, $\varepsilon \sim N(0, 0.05)$.
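147. Forward stepwise selection starts with the model containing only $\beta_0$ and adds variables one at a time. Suppose the current model has $k$ variables with estimate $\hat\beta$, and adding one more variable gives the estimate $\tilde\beta$. The improvement is measured by
$$F = \frac{RSS(\hat\beta) - RSS(\tilde\beta)}{RSS(\tilde\beta)/(N - k - 2)}.$$
At each step we add the variable with the largest $F$. We stop when no variable gives an $F$ above the 90% or 95% quantile of the $F(1, N - k - 2)$ distribution.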
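148. 2.3.2. Ridge regression. Ridge regression minimizes $RSS$ plus a penalty on the size of the coefficients $\beta_j$:
$$\hat\beta^{ridge} = \operatorname*{argmin}_{\beta} \Bigl\{ \sum_{i=1}^{N} \Bigl(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij} \beta_j\Bigr)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 \Bigr\}.$$
An equivalent formulation:
$$\hat\beta^{ridge} = \operatorname*{argmin}_{\beta} \sum_{i=1}^{N} \Bigl(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij} \beta_j\Bigr)^2 \quad \text{subject to} \quad \sum_{j=1}^{p} \beta_j^2 \le s.$$
The parameter $s$ is determined by $\lambda$, and vice versa.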
149. [Figure: the ridge estimate $\hat\beta^{ridge}$ in the $(\beta_1, \beta_2)$ plane with origin $O$.]
150. , . , Xi Xj . : j , i : Xi Xj , j xj ixi. , j .
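151. The intercept $\beta_0$ is not penalized, and the solution depends on the scale of the $X_j$. The procedure therefore has two steps:
1. Center the data: $x_{ij} \leftarrow x_{ij} - \bar x_j$ $(i = 1, 2, \dots, N;\ j = 1, 2, \dots, p)$ and estimate $\beta_0$ by $\bar y$, where
$$\bar x_j = \frac{1}{N} \sum_{i=1}^{N} x_{ij}, \qquad \bar y = \frac{1}{N} \sum_{i=1}^{N} y_i.$$
2. Find the remaining coefficients (without $\beta_0$) by ridge regression; from now on $X$ has $p$ columns (not $p + 1$).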
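152. In matrix form,
$$RSS^{ridge}(\beta, \lambda) = (y - X\beta)^\top (y - X\beta) + \lambda \beta^\top \beta, \qquad RSS^{ridge}(\beta, \lambda) \to \min_\beta.$$
Differentiating, we get
$$\frac{\partial RSS^{ridge}}{\partial \beta} = -2X^\top (y - X\beta) + 2\lambda\beta, \qquad \frac{\partial^2 RSS^{ridge}}{\partial \beta\, \partial \beta^\top} = 2X^\top X + 2\lambda I,$$
$$\hat\beta^{ridge} = (X^\top X + \lambda I)^{-1} X^\top y.$$
The ridge estimate $\hat\beta^{ridge}$ is a linear function of $y$. If $\lambda > 0$, the matrix $X^\top X + \lambda I$ is nonsingular (positive definite) even when the columns of $X$ are linearly dependent. The system $X\beta = y$ is replaced by $(X^\top X + \lambda I)\beta = X^\top y$.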
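With a single centered predictor the closed form collapses to a scalar expression, $\hat\beta^{ridge} = \sum_i x_i y_i / (\sum_i x_i^2 + \lambda)$, which makes the shrinkage easy to see. The sketch below covers only this one-predictor case (a simplification chosen for the example); `ridge_1d` is an illustrative name.

```python
def ridge_1d(x, y, lam):
    """Ridge fit with one predictor: center x, set beta_0 = mean(y), shrink the slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    xc = [v - x_bar for v in x]  # centered predictor
    slope = sum(a * (b - y_bar) for a, b in zip(xc, y)) / (
        sum(a * a for a in xc) + lam
    )
    return y_bar, slope


x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
print(ridge_1d(x, y, lam=0.0))  # ordinary least squares slope
print(ridge_1d(x, y, lam=5.0))  # slope shrunk toward zero
```

Here the centered sum of squares is 5, so choosing $\lambda = 5$ halves the least squares slope.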
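153. Regularization. The regularization method solves $(X^\top X + \lambda I)\beta = X^\top y$ instead of $X\beta = y$, where $\lambda > 0$ is the regularization parameter.
Exercise 2.3. Let $\lambda_n > 0$ and $\lambda_n \to 0$. Show that $\beta(\lambda_n)$ converges to a least squares solution of $X\beta = y$.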
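154. 2.3.3. Consider the SVD (singular value decomposition) of the $N \times p$ matrix $X$:
$$\underset{N \times p}{X} = \underset{N \times p}{U}\ \underset{p \times p}{D}\ \underset{p \times p}{V^\top},$$
where $U$ is an $N \times p$ matrix with orthonormal columns ($U^\top U = I$), $V$ is an orthogonal $p \times p$ matrix ($V^\top = V^{-1}$), and $D = \operatorname{diag}(d_1, d_2, \dots, d_p)$ is a diagonal $p \times p$ matrix with $d_1 \ge d_2 \ge \dots \ge d_p \ge 0$. The numbers $d_1, d_2, \dots, d_p$ are the singular values of $X$. The columns $u_1, u_2, \dots, u_p$ of $U$ form an orthonormal basis of the subspace spanned by the columns $x_1, x_2, \dots, x_p$ of $X$.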
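155. Express $\hat\beta^{ls}$ and $\hat y^{ls}$ (least squares) through the SVD:
$$\hat\beta^{ls} = (X^\top X)^{-1} X^\top y = \bigl((UDV^\top)^\top UDV^\top\bigr)^{-1} (UDV^\top)^\top y = (VDU^\top UDV^\top)^{-1} VDU^\top y = (V^\top)^{-1} D^{-2} V^{-1} VDU^\top y = VD^{-1} U^\top y,$$
$$\hat y^{ls} = X\hat\beta^{ls} = UDV^\top VD^{-1} U^\top y = UU^\top y = \sum_{j=1}^{p} u_j u_j^\top y.$$
Note that $u_j (u_j^\top y)$ is the projection of $y$ onto $u_j$, so the sum is the projection of $y$ onto the subspace spanned by $u_1, u_2, \dots, u_p$. Similarly, for ridge regression:
$$\hat y^{ridge} = X\hat\beta^{ridge} = X (X^\top X + \lambda I)^{-1} X^\top y = UD(D^2 + \lambda I)^{-1} DU^\top y = \sum_{j=1}^{p} u_j \frac{d_j^2}{d_j^2 + \lambda} u_j^\top y.$$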
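156.
$$\hat y^{ridge} = UD(D^2 + \lambda I)^{-1} DU^\top y = \sum_{j=1}^{p} u_j \frac{d_j^2}{d_j^2 + \lambda} u_j^\top y.$$
So, as in least squares, $u_j (u_j^\top y)$ is the projection of $y$ onto $u_j$, but now each projection is multiplied by the factor
$$\frac{d_j^2}{d_j^2 + \lambda} \le 1.$$
If $d_j$ is large, the factor is close to 1. If $d_j$ is small, the factor is close to 0. Thus the components with small $d_j$ are shrunk the most.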
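The shrinkage factors depend only on the singular values and on $\lambda$, so they can be tabulated without touching the data again. A minimal sketch with made-up singular values; `shrinkage_factors` is an illustrative name.

```python
def shrinkage_factors(singular_values, lam):
    """Factors d_j^2 / (d_j^2 + lambda) applied to the projections of y onto u_j."""
    return [d * d / (d * d + lam) for d in singular_values]


factors = shrinkage_factors([3.0, 1.0, 0.1], lam=1.0)
print(factors)  # large d_j: factor near 1; small d_j: factor near 0
```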
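157. What does a small $d_j$ mean? Let the columns of $X$ be centered. The sample covariance matrix is $S = X^\top X / N$, and
$$S = \frac{1}{N} X^\top X = \frac{1}{N} VDU^\top (VDU^\top)^\top = V\, \frac{D^2}{N}\, V^\top.$$
Hence the columns $v_1, v_2, \dots, v_p$ of $V$ are eigenvectors of $S$ with eigenvalues $d_j^2 / N$. The vectors $v_j$ are the principal component directions (principal components) of $X$.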
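166. Shrinkage factor:
$$t = \frac{s}{\sum_{j=1}^{p} |\hat\beta_j|}.$$
If $t = 0$, all $\hat\beta_j^{lasso}$ are zero. If $t = 1$, then $\hat\beta_j^{lasso} = \hat\beta_j$ $(j = 1, 2, \dots, p)$.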
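167. [Figure: lasso coefficients $\hat\beta_j^{lasso}$ (vertical axis, from -4 to 4) as functions of $t = s / \sum_{j=1}^{p} |\hat\beta_j|$ (horizontal axis, from 0 to 1). Curves are labelled RM, RAD, ZN, B, CHAS, INDUS, AGE, CRIM, TAX, NOX, PTRATIO, DIS, LSTAT.]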
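168. 2.3.5. Principal component regression: regress $y$ on $z_1, z_2, \dots, z_M$, where $M \le p$ and $z_m = X v_m$. Since $z_1, z_2, \dots, z_M$ are orthogonal, the regression is a sum of $M$ univariate regressions of $y$ on the $z_m$:
$$\hat y^{pcr} = \bar y + \sum_{m=1}^{M} \hat\theta_m z_m, \qquad \hat\beta^{pcr}(M) = \sum_{m=1}^{M} \hat\theta_m v_m, \qquad \hat\theta_m = \frac{\langle z_m, y \rangle}{\langle z_m, z_m \rangle}.$$
The columns of $X$ should be standardized first. With $M = p$ we get the least squares solution; with $M < p$ the problem is reduced from $p$ to $M$ dimensions.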
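169. 2.3.6. Partial least squares. The method of partial least squares also builds linear combinations $z_m$ $(m = 1, 2, \dots, M)$ of the inputs $x_j$ $(j = 1, 2, \dots, p)$. Unlike principal component regression, it uses both the $x_j$ and $y$ to construct them.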
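170. Standardize each $x_j$ to have mean 0 and variance 1.
begin
  $\hat y^{(0)} = \bar y \cdot \mathbf{1}$, $x_j^{(0)} = x_j$ $(j = 1, 2, \dots, p)$
  for $m = 1, 2, \dots, p$
    $z_m = \sum_{j=1}^{p} \hat\varphi_{mj}\, x_j^{(m-1)}$, where $\hat\varphi_{mj} = \langle y, x_j^{(m-1)} \rangle$
    $\hat y^{(m)} = \hat y^{(m-1)} + \hat\theta_m z_m$, where $\hat\theta_m = \dfrac{\langle y, z_m \rangle}{\langle z_m, z_m \rangle}$
    $x_j^{(m)} = x_j^{(m-1)} - \dfrac{\langle x_j^{(m-1)}, z_m \rangle}{\langle z_m, z_m \rangle}\, z_m$ $(j = 1, 2, \dots, p)$
  end
  return $\hat y^{(m)}$ $(m = 1, 2, \dots, p)$ together with the coefficients $\hat\beta_{jm}^{pls}$
end
Since every $z_m$ is a linear combination of the $x_j$, so is each fit: $\hat y^{(m)} = \sum_{j=1}^{p} \hat\beta_{jm}^{pls} x_j$ $(m = 1, 2, \dots, p)$.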
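The loop above translates almost line by line into code. The sketch below is a plain-Python reading of it, under the assumptions that the predictors are given as a list of columns and that a step with a numerically zero direction simply stops the loop (that guard is an addition of this example). It returns the fitted vectors $\hat y^{(m)}$ only, not the coefficients.

```python
def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def pls_fits(columns, y, n_steps):
    """Return the fitted vectors y_hat^(1), ..., y_hat^(M) of partial least squares."""
    n = len(y)
    y_bar = sum(y) / n
    xs = []
    for col in columns:  # standardize each column: mean 0, variance 1
        mean = sum(col) / n
        centered = [v - mean for v in col]
        scale = (dot(centered, centered) / n) ** 0.5
        xs.append([v / scale for v in centered])
    y_hat = [y_bar] * n
    fits = []
    for _ in range(n_steps):
        phi = [dot(y, xj) for xj in xs]
        z = [sum(phi[j] * xs[j][i] for j in range(len(xs))) for i in range(n)]
        zz = dot(z, z)
        if zz < 1e-12:  # no direction left (guard added for this sketch)
            break
        theta = dot(y, z) / zz
        y_hat = [yh + theta * zi for yh, zi in zip(y_hat, z)]
        # Orthogonalize every column with respect to the new direction z.
        xs = [[xi - dot(xj, z) / zz * zi for xi, zi in zip(xj, z)] for xj in xs]
        fits.append(y_hat)
    return fits


x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]
fits = pls_fits([x1, x2], y, n_steps=2)
residual = [yi - fi for yi, fi in zip(y, fits[-1])]
print(dot(residual, x1), dot(residual, x2))
```

After $p$ steps the residual is orthogonal to every original column, which is the least squares property, so on this toy data both printed numbers are zero up to rounding.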
171. , xj y xj y
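172. It can be shown that the $m$-th principal component direction $v_m$ solves
$$\max_\alpha \operatorname{Var}(X\alpha) \quad \text{subject to} \quad \|\alpha\| = 1, \quad v_\ell^\top S \alpha = 0, \quad \ell = 1, 2, \dots, m - 1,$$
where $S = X^\top X / N$ is the sample covariance matrix of $X$. The condition $v_\ell^\top S \alpha = 0$ guarantees that $z_m = X\alpha$ is uncorrelated with $z_\ell = X v_\ell$ $(\ell < m)$. The $m$-th partial least squares direction $\hat\varphi_m$ solves
$$\max_\alpha \operatorname{Corr}^2(y, X\alpha) \operatorname{Var}(X\alpha) \quad \text{subject to} \quad \|\alpha\| = 1, \quad \hat\varphi_\ell^\top S \alpha = 0, \quad \ell = 1, 2, \dots, m - 1.$$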
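173. 2.3.7. Computational aspects. The least squares problem is solved by matrix decompositions, among them the QR decomposition and the SVD. The cost is $O(Np^2 + p^3)$ operations.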
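174. The lasso is a quadratic programming problem:
$$\hat\beta^{lasso} = \operatorname*{argmin}_{\beta} \sum_{i=1}^{N} \Bigl(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij} \beta_j\Bigr)^2 \quad \text{subject to} \quad \sum_{j=1}^{p} |\beta_j| \le s.$$
The constraint is equivalent to
$$\sum_{j=1}^{p} \sigma_j \beta_j \le s \quad \text{for all } \sigma_j \in \{-1, 1\},$$
which gives $p + 1$ variables and $2^p$ constraints! Alternatively, write
$$\beta_j = \beta_j^{(+)} - \beta_j^{(-)}, \qquad \beta_j^{(+)} \ge 0, \quad \beta_j^{(-)} \ge 0 \quad (j = 1, 2, \dots, p),$$
and impose the constraints
$$\sum_{j=1}^{p} \bigl(\beta_j^{(+)} + \beta_j^{(-)}\bigr) \le s, \qquad \beta_j^{(+)} \ge 0, \quad \beta_j^{(-)} \ge 0 \quad (j = 1, 2, \dots, p).$$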