CS 2710 Foundations of AI
Lecture 23

Supervised learning. Multilayer neural networks.

Milos Hauskrecht
milos@cs.pitt.edu
5329 Sennott Square
Announcements
Homework 10:
• Due on Thursday, December 2, 2004

Final exam: December 14, 2004, 1:00-3:00pm
• Location: Sennott Square 5129
• Closed book
• Cumulative
Linear units
[Figure: two units; inputs 1, x_1, x_2, ..., x_d feed a weighted sum (weights w_0, w_1, w_2, ..., w_d); the linear unit outputs f(x) directly, the logistic unit passes the sum z through a sigmoid to output p(y = 1 | x)]

Linear regression:

  f(x) = w_0 + \sum_{j=1}^{d} w_j x_j

Logistic regression:

  f(x) = p(y = 1 | x, w) = g(w_0 + \sum_{j=1}^{d} w_j x_j)

On-line gradient update (the same for both models):

  w_0 \leftarrow w_0 + \alpha (y - f(x))
  w_j \leftarrow w_j + \alpha (y - f(x)) x_j
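A minimal Python sketch of this on-line update (not from the slides; names such as online_update are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def online_update(w, x, y, alpha, logistic=False):
    """One on-line gradient step for a linear or logistic unit.

    w: weights of length d+1 (w[0] is the bias w_0); x: input of length d.
    """
    z = w[0] + np.dot(w[1:], x)
    f = sigmoid(z) if logistic else z     # g(.) only for the logistic unit
    error = y - f
    w[0] += alpha * error                 # w_0 <- w_0 + alpha (y - f(x))
    w[1:] += alpha * error * x            # w_j <- w_j + alpha (y - f(x)) x_j
    return w
```

The update rule really is the same for both models; only the output function f differs.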
Limitations of basic linear units
[Figure: the same linear and logistic units as on the previous slide]

Linear regression:

  f(x) = w_0 + \sum_{j=1}^{d} w_j x_j

Logistic regression:

  f(x) = p(y = 1 | x, w) = g(w_0 + \sum_{j=1}^{d} w_j x_j)

The function is linear in the inputs: a linear decision boundary!
Extensions of simple linear units
• Use feature (basis) functions to model nonlinearities:

  \phi_j(x) – an arbitrary function of x

Linear regression:

  f(x) = w_0 + \sum_{j=1}^{m} w_j \phi_j(x)

Logistic regression:

  f(x) = g(w_0 + \sum_{j=1}^{m} w_j \phi_j(x))

[Figure: a unit with inputs 1, \phi_1(x), \phi_2(x), ..., \phi_m(x) and weights w_0, w_1, w_2, ..., w_m]
Regression with the quadratic model.

[Figure: the same data fit with a linear model and with a quadratic model]

Limitation of the linear model: a linear hyper-plane only; a non-linear surface can fit the data better.
Classification with the linear model.
The logistic regression model defines a linear decision boundary.
• Example: 2 classes (blue and red points)

[Figure: scatter plot of the two classes with the linear decision boundary drawn between them]
Linear decision boundary
• The logistic regression model is not optimal, but not that bad.

[Figure: two classes separated reasonably well by a linear boundary]
When does logistic regression fail?

• An example in which the logistic regression model fails:

[Figure: two classes for which no linear decision boundary works well]
Limitations of linear units.
[Figure: XOR-like data; the two classes occupy opposite corners]

• Logistic regression does not work for parity functions: no linear decision boundary exists.

Solution: a model of a non-linear decision boundary.
Example. Regression with polynomials.

Regression with polynomials of degree m:
• Data points: pairs of <x, y>
• Feature functions: m feature functions

  \phi_i(x) = x^i ,   i = 1, 2, ..., m

• Function to learn:

  f(x, w) = w_0 + \sum_{i=1}^{m} w_i \phi_i(x) = w_0 + \sum_{i=1}^{m} w_i x^i

[Figure: a unit with inputs 1, \phi_1(x) = x, \phi_2(x) = x^2, ..., \phi_m(x) = x^m and weights w_0, w_1, w_2, ..., w_m]
Learning with extended linear units
Feature (basis) functions model nonlinearities.

Linear regression:

  f(x) = w_0 + \sum_{j=1}^{m} w_j \phi_j(x)

Logistic regression:

  f(x) = g(w_0 + \sum_{j=1}^{m} w_j \phi_j(x))

Important property:
• The problem of learning the weights is the same as it was for the linear units.
• Trick: we have changed the inputs, but the function is still linear in the weights.

[Figure: the same unit as before, with basis-function inputs \phi_1(x), \phi_2(x), ..., \phi_m(x)]
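A small Python sketch of this trick (not from the slides): after the basis expansion, the weights can be found with ordinary linear least squares, exactly as for a linear unit.

```python
import numpy as np

# Expand inputs with polynomial basis functions, then solve an ordinary
# linear regression in the expanded feature space (linear in the weights).
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
y = np.sin(3 * x) + rng.normal(scale=0.1, size=200)

m = 5                                                  # number of basis functions
Phi = np.column_stack([x**i for i in range(m + 1)])    # phi_0 = 1, phi_i = x^i
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)            # least-squares weights
y_hat = Phi @ w                                        # a non-linear fit in x
```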
Learning with feature functions.

• Function to learn:

  f(x, w) = w_0 + \sum_{i=1}^{k} w_i \phi_i(x)

• On-line gradient update for the <x, y> pair:

  w_0 \leftarrow w_0 + \alpha (y - f(x, w))
  w_j \leftarrow w_j + \alpha (y - f(x, w)) \phi_j(x)

• Gradient updates are of the same form as in the linear and logistic regression models.
Example. Regression with polynomials.

• Regression with polynomials of degree m:

  f(x, w) = w_0 + \sum_{i=1}^{m} w_i \phi_i(x) = w_0 + \sum_{i=1}^{m} w_i x^i

• On-line update for the <x, y> pair:

  w_0 \leftarrow w_0 + \alpha (y - f(x, w))
  w_j \leftarrow w_j + \alpha (y - f(x, w)) x^j
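A minimal sketch of this on-line polynomial update in Python (illustrative names, not from the slides):

```python
import numpy as np

def poly_features(x, m):
    """Feature functions phi_i(x) = x^i for i = 1..m."""
    return np.array([x**i for i in range(1, m + 1)])

def online_poly_update(w, x, y, alpha, m):
    """One on-line gradient step for polynomial regression of degree m."""
    phi = poly_features(x, m)
    f = w[0] + np.dot(w[1:], phi)     # f(x, w) = w_0 + sum_i w_i x^i
    error = y - f
    w[0] += alpha * error             # w_0 <- w_0 + alpha (y - f(x, w))
    w[1:] += alpha * error * phi      # w_j <- w_j + alpha (y - f(x, w)) x^j
    return w

# Example: track a noisy cubic with degree-3 features
rng = np.random.default_rng(0)
w = np.zeros(4)
for _ in range(5000):
    x = rng.uniform(-1, 1)
    y = 1 + 2 * x - 3 * x**3 + rng.normal(scale=0.05)
    w = online_poly_update(w, x, y, alpha=0.05, m=3)
```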
Multi-layered neural networks
• An alternative way to introduce nonlinearities into regression/classification models.
• Idea: cascade several simple neural models with logistic units, much like neuron connections.
Multilayer neural network
Cascades multiple logistic regression units. Also called a multilayer perceptron (MLP).

[Figure: input layer (1, x_1, x_2, ..., x_d) -> hidden layer of logistic units (inputs z_1^{(1)}, z_2^{(1)}, weights w^{(1)}) -> output layer (input z_1^{(2)}, weights w^{(2)}) producing p(y = 1 | x)]

Example: a (2-layer) classifier with non-linear decision boundaries.
Multilayer neural network
• Models non-linearities through logistic regression units.
• Can be applied to both regression and binary classification problems.

[Figure: input layer (x_1, ..., x_d) -> hidden layer of logistic units z_1^{(1)}, z_2^{(1)} -> output layer. Output options:
  f(x) = f(x, w)            – regression
  f(x) = p(y = 1 | x, w)    – classification]
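A minimal Python sketch of the forward pass through such a network (not from the slides; the shapes and the name mlp_forward are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, W1, b1, W2, b2, classification=True):
    """Forward pass of a 2-layer MLP (one hidden layer of logistic units).

    W1: (h, d) hidden-layer weights, b1: (h,) hidden biases
    W2: (h,)  output-unit weights,   b2: scalar output bias
    """
    z1 = W1 @ x + b1          # inputs to the hidden sigmoids, z_i(1)
    x1 = sigmoid(z1)          # hidden-layer outputs, x_i(1) = g(z_i(1))
    z2 = W2 @ x1 + b2         # input to the output unit, z(2)
    # output option: sigmoid for classification, linear for regression
    return sigmoid(z2) if classification else z2
```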
Multilayer neural network
• Non-linearities are modeled using multiple hidden logistic regression units (organized in layers).
• The output layer determines whether it is a regression or a binary classification problem.

[Figure: input layer (x_1, ..., x_d) -> multiple hidden layers -> output layer. Output options:
  f(x) = f(x, w)            – regression
  f(x) = p(y = 1 | x, w)    – classification]
Learning with MLP
• How do we learn the parameters of the neural network?
• Online gradient descent algorithm
  – Weight updates based on the online error J_{online}(D_i, w):

    w_j \leftarrow w_j - \alpha \frac{\partial}{\partial w_j} J_{online}(D_i, w)

• We need to compute gradients for the weights in all units.
• They can be computed in one backward sweep through the net!
• The process is called back-propagation.
Backpropagation
[Figure: a chain of units across levels (k-1), k, and (k+1); unit i on level k receives the outputs x_j(k-1) through weights w_{i,j}(k) and sends its output x_i(k) to units l on level (k+1) through weights w_{l,i}(k+1)]

x_i(k) – output of unit i on level k
z_i(k) – input to the sigmoid function of unit i on level k

  z_i(k) = w_{i,0}(k) + \sum_j w_{i,j}(k) x_j(k-1)

  x_i(k) = g(z_i(k))

w_{i,j}(k) – weight between unit j on level (k-1) and unit i on level k
Backpropagation
Update weight w_{i,j}(k) using a data point D_u = <x, y>:

  w_{i,j}(k) \leftarrow w_{i,j}(k) - \alpha \frac{\partial}{\partial w_{i,j}(k)} J_{online}(D_u, w)

Let

  \delta_i(k) = \frac{\partial}{\partial z_i(k)} J_{online}(D_u, w)

Then:

  \frac{\partial J_{online}(D_u, w)}{\partial w_{i,j}(k)} = \frac{\partial J_{online}(D_u, w)}{\partial z_i(k)} \frac{\partial z_i(k)}{\partial w_{i,j}(k)} = \delta_i(k) x_j(k-1)

\delta_i(k) is computed from x_i(k) and the \delta_l(k+1) of the next layer:

  \delta_i(k) = x_i(k) (1 - x_i(k)) \sum_l \delta_l(k+1) w_{l,i}(k+1)

Last unit (the same as for the regular linear units):

  \delta_i(K) = -(y - f(x, w))

It is the same for classification with the log-likelihood measure of fit and for linear regression with the least-squares error!
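A small Python sketch of these delta computations for a network with one hidden layer (illustrative, not from the slides):

```python
import numpy as np

def backprop_deltas(y, f, x1, w2):
    """Delta terms for a 2-layer MLP with sigmoid hidden units.

    y:  target value, f: network output f(x, w)
    x1: hidden-layer outputs x_i(1)
    w2: output-unit weights w_{1,i}(2), one per hidden unit
    """
    delta_out = -(y - f)                      # last unit: delta(K) = -(y - f(x, w))
    # hidden units: delta_i(k) = x_i(k)(1 - x_i(k)) * sum_l delta_l(k+1) w_{l,i}(k+1)
    delta_hidden = x1 * (1.0 - x1) * (w2 * delta_out)
    return delta_hidden, delta_out
```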
Learning with MLP
• Online gradient descent algorithm
  – Weight update:

    w_{i,j}(k) \leftarrow w_{i,j}(k) - \alpha \frac{\partial}{\partial w_{i,j}(k)} J_{online}(D_u, w)

    \frac{\partial J_{online}(D_u, w)}{\partial w_{i,j}(k)} = \delta_i(k) x_j(k-1)

    w_{i,j}(k) \leftarrow w_{i,j}(k) - \alpha \delta_i(k) x_j(k-1)

  x_j(k-1) – the j-th output of layer (k-1)
  \delta_i(k) – derivative computed via backpropagation
  \alpha – a learning rate
Online gradient descent algorithm for MLP
Online-gradient-descent (D, number of iterations)
  initialize all weights w_{i,j}(k)
  for i = 1 : number of iterations do
    select a data point D_u = <x, y> from D
    set the learning rate \alpha
    compute the outputs x_j(k) for each unit
    compute the derivatives \delta_i(k) via backpropagation
    update all weights (in parallel):
      w_{i,j}(k) \leftarrow w_{i,j}(k) - \alpha \delta_i(k) x_j(k-1)
  end for
  return weights w
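Putting the pieces together, a runnable Python sketch of this algorithm for a 2-layer MLP (one hidden layer; the name train_mlp and the weight initialization are assumptions, not from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mlp(D, hidden, iterations, alpha=0.5, seed=0):
    """Online gradient descent for a 2-layer MLP with sigmoid units.

    D: list of (x, y) pairs; x is an array of length d, y is in {0, 1}.
    """
    rng = np.random.default_rng(seed)
    d = len(D[0][0])
    # initialize all weights w_{i,j}(k)
    W1 = rng.normal(scale=0.5, size=(hidden, d))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=hidden)
    b2 = 0.0
    for _ in range(iterations):
        x, y = D[rng.integers(len(D))]          # select a data point D_u = <x, y>
        x1 = sigmoid(W1 @ x + b1)               # outputs x_i(1) of the hidden units
        f = sigmoid(W2 @ x1 + b2)               # network output f(x, w)
        delta2 = -(y - f)                       # delta of the output unit
        delta1 = x1 * (1 - x1) * W2 * delta2    # hidden deltas via backpropagation
        # update all weights: w <- w - alpha * delta_i(k) * x_j(k-1)
        W2 -= alpha * delta2 * x1
        b2 -= alpha * delta2
        W1 -= alpha * np.outer(delta1, x)
        b1 -= alpha * delta1
    return W1, b1, W2, b2
```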
XOR example.

[Figure: XOR data; the two classes sit at opposite corners of the square]

• A linear decision boundary does not exist.
XOR example. Linear unit
XOR example. Neural network with 2 hidden units
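As a rough usage sketch (reusing the train_mlp and sigmoid helpers sketched earlier; a different seed or more iterations may be needed if training lands in a poor local optimum):

```python
import numpy as np

# XOR data: no linear boundary exists, but 2 hidden units suffice
D = [(np.array([0.0, 0.0]), 0), (np.array([0.0, 1.0]), 1),
     (np.array([1.0, 0.0]), 1), (np.array([1.0, 1.0]), 0)]

W1, b1, W2, b2 = train_mlp(D, hidden=2, iterations=20000, alpha=0.5)
for x, y in D:
    f = sigmoid(W2 @ sigmoid(W1 @ x + b1) + b2)
    print(x, y, round(float(f), 2))   # predictions should approach the 0/1 targets
```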
XOR example. Neural network with 10 hidden units
MLP in practice
• Optical character recognition of 20x20 digit images
  – Automatic sorting of mail
  – A 5-layer network with multiple outputs

20x20 = 400 inputs; 10 outputs (digits 0, 1, ..., 9)

Layer | Neurons | Weights
------|---------|--------
  5   |     10  |   3000
  4   |    300  |   1200
  3   |   1200  |  50000
  2   |    784  |   3136
  1   |   3136  |  78400