CS 2710 Foundations of AI
Lecture 23

Supervised learning. Multilayer neural networks.

Milos Hauskrecht
milos@cs.pitt.edu
5329 Sennott Square
Announcements
Homework 10:
• Due on Thursday, December 2, 2004

Final exam: December 14, 2004, 1:00-3:00pm
• Location: Sennott Square 5129
• Closed book
• Cumulative
Linear units
[Figure: two units; inputs 1, x_1, x_2, ..., x_d feed a weighted sum (weights w_0, w_1, w_2, ..., w_d); the linear unit outputs f(x) directly, the logistic unit passes the sum z through a sigmoid to output p(y = 1 | x)]

Linear regression:

  f(x) = w_0 + \sum_{j=1}^{d} w_j x_j

Logistic regression:

  f(x) = p(y = 1 | x, w) = g(w_0 + \sum_{j=1}^{d} w_j x_j)

On-line gradient update (the same for both models):

  w_0 \leftarrow w_0 + \alpha (y - f(x))
  w_j \leftarrow w_j + \alpha (y - f(x)) x_j
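A minimal Python sketch of this on-line update (not from the slides; names such as online_update are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def online_update(w, x, y, alpha, logistic=False):
    """One on-line gradient step for a linear or logistic unit.

    w: weights of length d+1 (w[0] is the bias w_0); x: input of length d.
    """
    z = w[0] + np.dot(w[1:], x)
    f = sigmoid(z) if logistic else z     # g(.) only for the logistic unit
    error = y - f
    w[0] += alpha * error                 # w_0 <- w_0 + alpha (y - f(x))
    w[1:] += alpha * error * x            # w_j <- w_j + alpha (y - f(x)) x_j
    return w
```

The update rule really is the same for both models; only the output function f differs.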
Limitations of basic linear units
[Figure: the same linear and logistic units as on the previous slide]

Linear regression:

  f(x) = w_0 + \sum_{j=1}^{d} w_j x_j

Logistic regression:

  f(x) = p(y = 1 | x, w) = g(w_0 + \sum_{j=1}^{d} w_j x_j)

The function is linear in the inputs: a linear decision boundary!
Extensions of simple linear units
• Use feature (basis) functions to model nonlinearities:

  \phi_j(x) – an arbitrary function of x

Linear regression:

  f(x) = w_0 + \sum_{j=1}^{m} w_j \phi_j(x)

Logistic regression:

  f(x) = g(w_0 + \sum_{j=1}^{m} w_j \phi_j(x))

[Figure: a unit with inputs 1, \phi_1(x), \phi_2(x), ..., \phi_m(x) and weights w_0, w_1, w_2, ..., w_m]
Regression with the quadratic model.

[Figure: the same data fit with a linear model and with a quadratic model]

Limitation of the linear model: a linear hyper-plane only; a non-linear surface can fit the data better.
Classification with the linear model.
The logistic regression model defines a linear decision boundary.
• Example: 2 classes (blue and red points)

[Figure: scatter plot of the two classes with the linear decision boundary drawn between them]
Linear decision boundary
• The logistic regression model is not optimal, but not that bad.

[Figure: two classes separated reasonably well by a linear boundary]
When does logistic regression fail?

• An example in which the logistic regression model fails:

[Figure: two classes for which no linear decision boundary works well]
Limitations of linear units.
[Figure: XOR-like data; the two classes occupy opposite corners]

• Logistic regression does not work for parity functions: no linear decision boundary exists.

Solution: a model of a non-linear decision boundary.
Example. Regression with polynomials.

Regression with polynomials of degree m:
• Data points: pairs of <x, y>
• Feature functions: m feature functions

  \phi_i(x) = x^i ,   i = 1, 2, ..., m

• Function to learn:

  f(x, w) = w_0 + \sum_{i=1}^{m} w_i \phi_i(x) = w_0 + \sum_{i=1}^{m} w_i x^i

[Figure: a unit with inputs 1, \phi_1(x) = x, \phi_2(x) = x^2, ..., \phi_m(x) = x^m and weights w_0, w_1, w_2, ..., w_m]
Learning with extended linear units
Feature (basis) functions model nonlinearities.

Linear regression:

  f(x) = w_0 + \sum_{j=1}^{m} w_j \phi_j(x)

Logistic regression:

  f(x) = g(w_0 + \sum_{j=1}^{m} w_j \phi_j(x))

Important property:
• The problem of learning the weights is the same as it was for the linear units.
• Trick: we have changed the inputs, but the function is still linear in the weights.

[Figure: the same unit as before, with basis-function inputs \phi_1(x), \phi_2(x), ..., \phi_m(x)]
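A small Python sketch of this trick (not from the slides): after the basis expansion, the weights can be found with ordinary linear least squares, exactly as for a linear unit.

```python
import numpy as np

# Expand inputs with polynomial basis functions, then solve an ordinary
# linear regression in the expanded feature space (linear in the weights).
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
y = np.sin(3 * x) + rng.normal(scale=0.1, size=200)

m = 5                                                  # number of basis functions
Phi = np.column_stack([x**i for i in range(m + 1)])    # phi_0 = 1, phi_i = x^i
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)            # least-squares weights
y_hat = Phi @ w                                        # a non-linear fit in x
```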
Learning with feature functions.

• Function to learn:

  f(x, w) = w_0 + \sum_{i=1}^{k} w_i \phi_i(x)

• On-line gradient update for the <x, y> pair:

  w_0 \leftarrow w_0 + \alpha (y - f(x, w))
  w_j \leftarrow w_j + \alpha (y - f(x, w)) \phi_j(x)

• Gradient updates are of the same form as in the linear and logistic regression models.
Example. Regression with polynomials.

• Regression with polynomials of degree m:

  f(x, w) = w_0 + \sum_{i=1}^{m} w_i \phi_i(x) = w_0 + \sum_{i=1}^{m} w_i x^i

• On-line update for the <x, y> pair:

  w_0 \leftarrow w_0 + \alpha (y - f(x, w))
  w_j \leftarrow w_j + \alpha (y - f(x, w)) x^j
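A minimal sketch of this on-line polynomial update in Python (illustrative names, not from the slides):

```python
import numpy as np

def poly_features(x, m):
    """Feature functions phi_i(x) = x^i for i = 1..m."""
    return np.array([x**i for i in range(1, m + 1)])

def online_poly_update(w, x, y, alpha, m):
    """One on-line gradient step for polynomial regression of degree m."""
    phi = poly_features(x, m)
    f = w[0] + np.dot(w[1:], phi)     # f(x, w) = w_0 + sum_i w_i x^i
    error = y - f
    w[0] += alpha * error             # w_0 <- w_0 + alpha (y - f(x, w))
    w[1:] += alpha * error * phi      # w_j <- w_j + alpha (y - f(x, w)) x^j
    return w

# Example: track a noisy cubic with degree-3 features
rng = np.random.default_rng(0)
w = np.zeros(4)
for _ in range(5000):
    x = rng.uniform(-1, 1)
    y = 1 + 2 * x - 3 * x**3 + rng.normal(scale=0.05)
    w = online_poly_update(w, x, y, alpha=0.05, m=3)
```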
Multi-layered neural networks
• An alternative way to introduce nonlinearities into regression/classification models.
• Idea: cascade several simple neural models with logistic units, much like neuron connections.
Multilayer neural network
Cascades multiple logistic regression units. Also called a multilayer perceptron (MLP).

[Figure: input layer (1, x_1, x_2, ..., x_d) -> hidden layer of logistic units (inputs z_1^{(1)}, z_2^{(1)}, weights w^{(1)}) -> output layer (input z_1^{(2)}, weights w^{(2)}) producing p(y = 1 | x)]

Example: a (2-layer) classifier with non-linear decision boundaries.
Multilayer neural network
• Models non-linearities through logistic regression units.
• Can be applied to both regression and binary classification problems.

[Figure: input layer (x_1, ..., x_d) -> hidden layer of logistic units z_1^{(1)}, z_2^{(1)} -> output layer. Output options:
  f(x) = f(x, w)            – regression
  f(x) = p(y = 1 | x, w)    – classification]
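A minimal Python sketch of the forward pass through such a network (not from the slides; the shapes and the name mlp_forward are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, W1, b1, W2, b2, classification=True):
    """Forward pass of a 2-layer MLP (one hidden layer of logistic units).

    W1: (h, d) hidden-layer weights, b1: (h,) hidden biases
    W2: (h,)  output-unit weights,   b2: scalar output bias
    """
    z1 = W1 @ x + b1          # inputs to the hidden sigmoids, z_i(1)
    x1 = sigmoid(z1)          # hidden-layer outputs, x_i(1) = g(z_i(1))
    z2 = W2 @ x1 + b2         # input to the output unit, z(2)
    # output option: sigmoid for classification, linear for regression
    return sigmoid(z2) if classification else z2
```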
Multilayer neural network
• Non-linearities are modeled using multiple hidden logistic regression units (organized in layers).
• The output layer determines whether it is a regression or a binary classification problem.

[Figure: input layer (x_1, ..., x_d) -> multiple hidden layers -> output layer. Output options:
  f(x) = f(x, w)            – regression
  f(x) = p(y = 1 | x, w)    – classification]
Learning with MLP
• How do we learn the parameters of the neural network?
• Online gradient descent algorithm
  – Weight updates based on the online error J_{online}(D_i, w):

    w_j \leftarrow w_j - \alpha \frac{\partial}{\partial w_j} J_{online}(D_i, w)

• We need to compute gradients for the weights in all units.
• They can be computed in one backward sweep through the net!
• The process is called back-propagation.
Backpropagation
[Figure: a chain of units across levels (k-1), k, and (k+1); unit i on level k receives the outputs x_j(k-1) through weights w_{i,j}(k) and sends its output x_i(k) to units l on level (k+1) through weights w_{l,i}(k+1)]

x_i(k) – output of unit i on level k
z_i(k) – input to the sigmoid function of unit i on level k

  z_i(k) = w_{i,0}(k) + \sum_j w_{i,j}(k) x_j(k-1)

  x_i(k) = g(z_i(k))

w_{i,j}(k) – weight between unit j on level (k-1) and unit i on level k
Backpropagation
Update weight w_{i,j}(k) using a data point D_u = <x, y>:

  w_{i,j}(k) \leftarrow w_{i,j}(k) - \alpha \frac{\partial}{\partial w_{i,j}(k)} J_{online}(D_u, w)

Let

  \delta_i(k) = \frac{\partial}{\partial z_i(k)} J_{online}(D_u, w)

Then:

  \frac{\partial J_{online}(D_u, w)}{\partial w_{i,j}(k)} = \frac{\partial J_{online}(D_u, w)}{\partial z_i(k)} \frac{\partial z_i(k)}{\partial w_{i,j}(k)} = \delta_i(k) x_j(k-1)

\delta_i(k) is computed from x_i(k) and the \delta_l(k+1) of the next layer:

  \delta_i(k) = x_i(k) (1 - x_i(k)) \sum_l \delta_l(k+1) w_{l,i}(k+1)

Last unit (the same as for the regular linear units):

  \delta_i(K) = -(y - f(x, w))

It is the same for classification with the log-likelihood measure of fit and for linear regression with the least-squares error!
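A small Python sketch of these delta computations for a network with one hidden layer (illustrative, not from the slides):

```python
import numpy as np

def backprop_deltas(y, f, x1, w2):
    """Delta terms for a 2-layer MLP with sigmoid hidden units.

    y:  target value, f: network output f(x, w)
    x1: hidden-layer outputs x_i(1)
    w2: output-unit weights w_{1,i}(2), one per hidden unit
    """
    delta_out = -(y - f)                      # last unit: delta(K) = -(y - f(x, w))
    # hidden units: delta_i(k) = x_i(k)(1 - x_i(k)) * sum_l delta_l(k+1) w_{l,i}(k+1)
    delta_hidden = x1 * (1.0 - x1) * (w2 * delta_out)
    return delta_hidden, delta_out
```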
Learning with MLP
• Online gradient descent algorithm
  – Weight update:

    w_{i,j}(k) \leftarrow w_{i,j}(k) - \alpha \frac{\partial}{\partial w_{i,j}(k)} J_{online}(D_u, w)

    \frac{\partial J_{online}(D_u, w)}{\partial w_{i,j}(k)} = \delta_i(k) x_j(k-1)

    w_{i,j}(k) \leftarrow w_{i,j}(k) - \alpha \delta_i(k) x_j(k-1)

  x_j(k-1) – the j-th output of layer (k-1)
  \delta_i(k) – derivative computed via backpropagation
  \alpha – a learning rate
Online gradient descent algorithm for MLP
Online-gradient-descent (D, number of iterations)
  initialize all weights w_{i,j}(k)
  for i = 1 : number of iterations do
    select a data point D_u = <x, y> from D
    set the learning rate \alpha
    compute the outputs x_j(k) for each unit
    compute the derivatives \delta_i(k) via backpropagation
    update all weights (in parallel):
      w_{i,j}(k) \leftarrow w_{i,j}(k) - \alpha \delta_i(k) x_j(k-1)
  end for
  return weights w
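Putting the pieces together, a runnable Python sketch of this algorithm for a 2-layer MLP (one hidden layer; the name train_mlp and the weight initialization are assumptions, not from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mlp(D, hidden, iterations, alpha=0.5, seed=0):
    """Online gradient descent for a 2-layer MLP with sigmoid units.

    D: list of (x, y) pairs; x is an array of length d, y is in {0, 1}.
    """
    rng = np.random.default_rng(seed)
    d = len(D[0][0])
    # initialize all weights w_{i,j}(k)
    W1 = rng.normal(scale=0.5, size=(hidden, d))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=hidden)
    b2 = 0.0
    for _ in range(iterations):
        x, y = D[rng.integers(len(D))]          # select a data point D_u = <x, y>
        x1 = sigmoid(W1 @ x + b1)               # outputs x_i(1) of the hidden units
        f = sigmoid(W2 @ x1 + b2)               # network output f(x, w)
        delta2 = -(y - f)                       # delta of the output unit
        delta1 = x1 * (1 - x1) * W2 * delta2    # hidden deltas via backpropagation
        # update all weights: w <- w - alpha * delta_i(k) * x_j(k-1)
        W2 -= alpha * delta2 * x1
        b2 -= alpha * delta2
        W1 -= alpha * np.outer(delta1, x)
        b1 -= alpha * delta1
    return W1, b1, W2, b2
```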
XOR example.

[Figure: XOR data; the two classes sit at opposite corners of the square]

• A linear decision boundary does not exist.
XOR example. Linear unit
XOR example. Neural network with 2 hidden units
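As a rough usage sketch (reusing the train_mlp and sigmoid helpers sketched earlier; a different seed or more iterations may be needed if training lands in a poor local optimum):

```python
import numpy as np

# XOR data: no linear boundary exists, but 2 hidden units suffice
D = [(np.array([0.0, 0.0]), 0), (np.array([0.0, 1.0]), 1),
     (np.array([1.0, 0.0]), 1), (np.array([1.0, 1.0]), 0)]

W1, b1, W2, b2 = train_mlp(D, hidden=2, iterations=20000, alpha=0.5)
for x, y in D:
    f = sigmoid(W2 @ sigmoid(W1 @ x + b1) + b2)
    print(x, y, round(float(f), 2))   # predictions should approach the 0/1 targets
```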
XOR example. Neural network with 10 hidden units
MLP in practice
• Optical character recognition of 20x20 digit images
  – Automatic sorting of mail
  – A 5-layer network with multiple outputs

20x20 = 400 inputs; 10 outputs (digits 0, 1, ..., 9)

Layer | Neurons | Weights
------|---------|--------
  5   |     10  |   3000
  4   |    300  |   1200
  3   |   1200  |  50000
  2   |    784  |   3136
  1   |   3136  |  78400