Page 1

Neural Networks and Machine Learning Applications
CSC 563

Prof. Mohamed Batouche

Computer Science Department, CCIS – King Saud University

Riyadh, Saudi Arabia

[email protected]

Page 2

Artificial Complex Systems

Artificial Neural Networks: Perceptrons and Multilayer Perceptrons (MLP)

Page 3

Artificial Neural Networks

Perceptron

Page 4

The Perceptron

[Figure: a perceptron with inputs x1, …, x5, weights w0, w1, …, w5, a summation unit Σ, and output y]

y = sgn( Σ i=0..n  wi xi ), with x0 = 1

Initialisation: wi ← 0 for i = 0, …, n

The first model of a biological neuron

Page 5

Artificial Neuron: Perceptron

It’s a step function based on a linear combination of real-valued inputs. If the combination is above a threshold, it outputs 1; otherwise it outputs –1.

[Figure: a perceptron with inputs x0 = 1, x1, x2, …, xn, weights w0, w1, w2, …, wn, a summation unit Σ, and an output in {1 or –1}]

Page 6

Perceptron: activation rule

O(x1, x2, …, xn) = 1 if w0 + w1 x1 + w2 x2 + … + wn xn > 0
                 = –1 otherwise

To simplify, we can represent the function as follows:

O(X) = sgn(WᵀX), where

sgn(y) = 1 if y > 0
       = –1 otherwise

Activation rule: linear threshold (step) unit
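For illustration (not part of the original slides), this linear-threshold activation can be written directly in Matlab; the weights and input below are arbitrary placeholder values, with w(1) playing the role of the bias weight w0 applied to the constant input x0 = 1.

w = [-0.5 0.8 0.3];          % [w0 w1 w2], arbitrary example weights
x = [1 0.6 0.2];             % [x0 x1 x2], with x0 = 1
net_input = w * x';          % w0 + w1*x1 + w2*x2
if net_input > 0
    o = 1;                   % output 1 above the threshold
else
    o = -1;                  % output -1 otherwise
end
disp(o)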

Page 7

What does a perceptron do?

For a perceptron with two input variables, x1 and x2, the equation WᵀX = 0 determines a line separating positive from negative examples.

[Figure: the line w1 x1 + w2 x2 + w0 = 0 in the (x1, x2) plane, and the corresponding perceptron with inputs x1, x2, weights w0, w1, w2, summation unit Σ, and output y]

y = sgn(w1 x1 + w2 x2 + w0)

Page 8

What does a perceptron do?

For a perceptron with n input variables, it draws a hyperplane as the decision boundary over the (n-dimensional) input space. It classifies input patterns into two classes.

The perceptron outputs 1 for instances lying on one side of the hyperplane and outputs –1 for instances on the other side.

[Figure: the plane w1 x1 + w2 x2 + w3 x3 + w0 = 0 separating points in the (x1, x2, x3) space]

Page 9

What can be represented using Perceptrons?

[Figure: decision lines for the AND and OR functions over two inputs]

Representation theorem: perceptrons can only represent linearly separable functions. Examples: AND, OR, NOT.

Page 10

Limits of the Perceptron

A perceptron can learn only examples that are called “linearly separable”. These are examples that can be perfectly separated by a hyperplane.

[Figure: a linearly separable set of + and – examples (separable by a straight line) and a non-linearly separable set]

Page 11

Functions for Perceptron

Perceptrons can learn many boolean functions: AND, OR, NAND, NOR, but not XOR.

AND:

[Figure: perceptron computing AND, with inputs x0 = 1, x1, x2 and weights w0 = –0.8, w1 = 0.5, w2 = 0.5]
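As a quick check of these weights: for input (1, 1), w0 + w1 x1 + w2 x2 = –0.8 + 0.5 + 0.5 = 0.2 > 0, so the output is 1; for (1, 0) or (0, 1) the sum is –0.8 + 0.5 = –0.3 < 0, and for (0, 0) it is –0.8 < 0, so the output is –1 in those three cases, matching the AND truth table.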

Page 12

Learning Perceptrons

• Learning is a process by which the free parameters of a neural network are adapted through a process of stimulation by the environment in which the network is embedded. The type of learning is determined by the manner in which the parameter changes take place.

• In the case of perceptrons, we use supervised learning.

• Learning a perceptron means finding the right values for W that satisfy the input examples {(inputi, targeti)*}.

• The hypothesis space of a perceptron is the space of all weight vectors.

Page 13

Learning Perceptrons

Principle of learning using the perceptron rule:

1. A set of training examples is given: {(x, t)*}, where x is the input and t the target output [supervised learning].

2. Examples are presented to the network.

3. For each example, the network gives an output o.

4. If there is an error, the hyperplane is moved in order to correct the output error.

5. When all training examples are correctly classified, stop learning.

Page 14

Learning Perceptrons

More formally, the algorithm for learning perceptrons is as follows:

1. Assign random values to the weight vector.

2. Apply the perceptron rule to every training example.

3. Are all training examples correctly classified?
   Yes: quit.  No: go back to Step 2.

Page 15

Perceptron Training Rule

The perceptron training rule:

For a new training example [X = (x1, x2, …, xn), t], update each weight according to this rule:

wi = wi + Δwi

where Δwi = η (t – o) xi

t: target output
o: output generated by the perceptron
η: constant called the learning rate (e.g., 0.1)
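A minimal Matlab sketch of this rule (an illustration, not the toolbox code used later in these slides), assuming targets in {–1, 1}, a constant bias input of 1, and the AND examples as training data:

% Perceptron training rule: w_i <- w_i + eta*(t - o)*x_i
X   = [1 0 0; 1 0 1; 1 1 0; 1 1 1];    % one example per row, leading 1 = bias input
T   = [-1; -1; -1; 1];                  % AND targets in {-1, 1}
w   = 0.1 * randn(3, 1);                % small random initial weights
eta = 0.1;                              % learning rate
for epoch = 1:100
    errors = 0;
    for k = 1:size(X, 1)
        o = sign(X(k, :) * w);
        if o == 0, o = -1; end          % treat the boundary case as -1
        if o ~= T(k)
            w = w + eta * (T(k) - o) * X(k, :)';   % perceptron rule
            errors = errors + 1;
        end
    end
    if errors == 0, break; end          % all examples correctly classified: stop
end
disp(w')

The loop stops as soon as a full pass over the data produces no misclassification, mirroring the stopping condition of the algorithm above.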

Page 16

Perceptron Training Rule

Comments about the perceptron training rule:

• If the example is correctly classified, the term (t – o) equals zero, and no update on the weight is necessary.

• If the perceptron outputs –1 and the real answer is 1, the weight is increased.

• If the perceptron outputs 1 and the real answer is –1, the weight is decreased.

• Provided the examples are linearly separable and a small value for η is used, the rule is proved to classify all training examples correctly.

Page 17

Perceptron Training Rule

Consider the following example (two classes: red and green):

Page 18

Perceptron Training Rule

Random initialization of perceptron weights …

Page 19

Perceptron Training Rule

Apply the perceptron training rule iteratively to the different examples:

Page 20

Perceptron Training Rule

Apply the perceptron training rule iteratively to the different examples:

Page 21

Perceptron Training Rule

Apply the perceptron training rule iteratively to the different examples:

Page 22

Perceptron Training Rule

Apply the perceptron training rule iteratively to the different examples:

Page 23

Perceptron Training Rule

All examples are correctly classified … stop learning.

Page 24

Perceptron Training Rule

The straight line w1 x + w2 y + w0 = 0 separates the two classes.

[Figure: the learned line w1 x + w2 y + w0 = 0 drawn between the two classes]

Page 25

Matlab Demo

• Perceptron training rule demo
• Learning the AND, OR functions
• Try to learn XOR with a perceptron

Page 26

Learning AND/OR operations

P = [ 0 0 1 1; ...     % input patterns (one column per example)
      0 1 0 1 ];
T = [ 0 1 1 1 ];       % desired outputs (here: the OR function)

net = newp([0 1; 0 1], 1);      % perceptron with two inputs in [0,1] and one output
net.adaptParam.passes = 35;     % number of passes over the training data
net = adapt(net, P, T);         % adapt the weights with the perceptron rule

x = [1; 1];                     % test input
y = sim(net, x);                % simulate the trained perceptron
display(y);
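Note that the targets shown, T = [0 1 1 1], correspond to the OR function; to learn AND with the same code, T would be [0 0 0 1].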

[Figure: the corresponding perceptron with inputs x1, x2, weights w0, w1, w2, summation unit Σ, and output y]

Page 27

Artificial Neural Networks

MultiLayer Perceptron (MLP)

Page 28

Solution for XOR: add a hidden layer!

[Figure: a multilayer network with input nodes x1, x2, internal (hidden) nodes, and an output node computing x1 XOR x2]

Page 29

Solution for XOR: add a hidden layer!

[Figure: the same network with inputs x1, x2, hidden nodes, and output x1 XOR x2]

The problem is: how to learn multilayer perceptrons?

Solution: the Backpropagation algorithm, invented by Rumelhart and colleagues in 1986.

Page 30

MultiLayer Perceptron

In contrast to perceptrons, multilayer networks can learn not only multiple decision boundaries, but the boundaries may be nonlinear.

[Figure: a network with input nodes, internal (hidden) nodes, and output nodes]

Page 31

MultiLayer Perceptron: Decision Boundaries

[Figure: examples of decision regions separating classes A and B for networks of increasing depth]

• Single-layer: half plane bounded by a hyperplane
• Two-layer: convex open or closed regions
• Three-layer: arbitrary regions (complexity limited by the number of neurons)

Page 32

Example

[Figure: example decision regions over inputs x1 and x2]

Page 33

One single unit

To make nonlinear partitions of the space, we need to define each unit as a nonlinear function (unlike the perceptron). One solution is to use the sigmoid unit.

[Figure: a sigmoid unit with inputs x0 = 1, x1, x2, …, xn, weights w0, w1, w2, …, wn, net input net = Σi wi xi, and output O]

O = σ(net) = 1 / (1 + e^(–net))

Page 34

Sigmoid or logistic function

O(x1, x2, …, xn) = σ(WX)

where: σ(WX) = 1 / (1 + e^(–WX))

Function σ is called the sigmoid or logistic function.

This function is easy to differentiate and has the following property:

dσ(y)/dy = σ(y) (1 – σ(y))
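As a quick numerical check (not from the slides), a central finite difference can be compared with σ(y)(1 – σ(y)) at an arbitrary test point:

% Numerical check that d(sigma)/dy = sigma(y)*(1 - sigma(y))
sigma = @(y) 1 ./ (1 + exp(-y));
y = 0.7;                                              % arbitrary test point
h = 1e-6;
numeric  = (sigma(y + h) - sigma(y - h)) / (2 * h);   % central difference
analytic = sigma(y) * (1 - sigma(y));
fprintf('numeric = %.6f, analytic = %.6f\n', numeric, analytic);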

Page 35

Hyperbolic Tangent activation function: Tanh

O(x1, x2, …, xn) = tanh(WX)

where: tanh(WX) = (e^(WX) – e^(–WX)) / (e^(WX) + e^(–WX))

Function tanh is called the hyperbolic tangent function.

This function is easy to differentiate and has the following property:

d tanh(y)/dy = (1 + tanh(y)) (1 – tanh(y))

Page 36

Learning MultiLayer Perceptron

BackPropagation Algorithm:

Goal: to learn the weights for all links in an interconnected multilayer network.

We begin by defining our measure of error:

E(W) = ½ Σd Σk (tkd – okd)² = ½ Σexamples (t – o)² = ½ Err²

where k varies over the output nodes and d over the training examples.

The idea is to use gradient descent over the space of weights to find a global minimum (no guarantee).

Page 37

Gradient Descent

Page 38

Minimizing Error Using Steepest Descent

The main idea: find the way downhill and take a step.

[Figure: error E plotted against a weight x, with the minimum marked]

downhill direction = –dE/dx

η = step size

x ← x – η dE/dx
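A minimal Matlab sketch of this update, using the arbitrary one-dimensional error function E(x) = (x – 3)², whose derivative is dE/dx = 2(x – 3):

% Gradient descent on E(x) = (x - 3)^2
dEdx = @(x) 2 * (x - 3);
x    = 0;            % starting point
eta  = 0.1;          % step size (learning rate)
for step = 1:50
    x = x - eta * dEdx(x);    % x <- x - eta * dE/dx
end
disp(x)              % should be close to the minimum at x = 3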

Page 39

Reduction of Squared Error

Gradient descent reduces the squared error by calculating the partial derivative of E with respect to each weight:

E = ½ Err², where Err = t – g(in) and in = Σj Wj xj (this weighted sum is called “in”)

∂E/∂Wj = Err · ∂Err/∂Wj                 (chain rule for derivatives)
       = Err · ∂(t – g(in))/∂Wj         (expand the second Err to (t – g(in)))
       = –Err · g'(in) · xj             (∂t/∂Wj = 0 because t is a constant, and chain rule)

So, moving against the gradient, each weight is updated as:

Wj ← Wj + η · Err · g'(in) · xj

The weight is updated by η times this gradient of the error in weight space. The fact that the weight is updated in the correct direction (+/–) can be verified with examples.

The learning rate, η, is typically set to a small value such as 0.1.
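A minimal Matlab sketch of this update for a single sigmoid unit (an illustration, not code from the slides), assuming targets in {0, 1}, the OR examples as training data, and g'(in) = g(in)(1 – g(in)) from the sigmoid slide:

% Single sigmoid unit trained with W_j <- W_j + eta*Err*g'(in)*x_j
g      = @(in) 1 ./ (1 + exp(-in));      % sigmoid g
gprime = @(in) g(in) .* (1 - g(in));     % g'(in) = g(in)*(1 - g(in))
X   = [1 0 0; 1 0 1; 1 1 0; 1 1 1];      % OR examples, leading 1 = bias input
T   = [0; 1; 1; 1];
W   = 0.1 * randn(3, 1);
eta = 0.5;
for epoch = 1:5000
    for k = 1:size(X, 1)
        in  = X(k, :) * W;               % "in" = weighted sum of inputs
        Err = T(k) - g(in);              % Err = t - g(in)
        W   = W + eta * Err * gprime(in) * X(k, :)';
    end
end
disp(round(g(X * W)'))                   % should print 0 1 1 1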

Page 40

BackPropagation Algorithm

• Create a network with nin input nodes, nhidden internal nodes, and nout output nodes.

• Initialize all weights to small random numbers in the range of –0.5 to 0.5.

• Until the error is small, do:
  For each example X do:
    Propagate example X forward through the network.
    Propagate errors backward through the network.

Page 41

BackPropagation Algorithm

[Figure: a training pair (X, D); the input X = (x1, …, x5) is propagated forward through the network to produce outputs y1, …, y4, which are compared with the desired output D to obtain errors e1, …, e4 that are propagated backward]

In the classification phase, only the forward propagation step is used to classify patterns.

Page 42

The Backpropagation Algorithm for Three-Layer Networks with Sigmoid Units

Initialize all weights in the network to small random numbers.

Until the weights converge (this may take thousands of iterations) do:

  For each training example do:
    Compute the network output vector o.

    For each output unit i do (error gradient):
      δi = oi (1 – oi) (ti – oi)

    For each hidden unit j do (error backpropagation):
      δj = oj (1 – oj) Σ i∈outputs Wij δi

    Update each network weight from hidden unit j to output unit i:
      Wij ← Wij + η δi aj

    Update each network weight from input k to hidden unit j:
      Wjk ← Wjk + η δj ak
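A compact Matlab sketch of these updates applied to the XOR problem (an illustration, not code from the slides); the hidden layer size, learning rate, and epoch count are arbitrary choices:

% Minimal backpropagation: 2 inputs, 3 hidden sigmoid units, 1 sigmoid output
sigma = @(z) 1 ./ (1 + exp(-z));
P = [0 0 1 1; 0 1 0 1];                 % input patterns (one per column)
T = [0 1 1 0];                          % XOR targets
W1 = 0.5 * randn(3, 3);                 % hidden weights: 3 units x [bias + 2 inputs]
W2 = 0.5 * randn(1, 4);                 % output weights: 1 unit x [bias + 3 hidden]
eta = 0.5;
for epoch = 1:20000
    for k = 1:4
        a0 = [1; P(:, k)];              % input activation with bias term
        a1 = [1; sigma(W1 * a0)];       % hidden activations with bias term
        o  = sigma(W2 * a1);            % network output
        % error terms (deltas) as in the slide:
        d_out = o * (1 - o) * (T(k) - o);                               % output unit
        d_hid = a1(2:end) .* (1 - a1(2:end)) .* (W2(2:end)' * d_out);   % hidden units
        % weight updates:
        W2 = W2 + eta * d_out * a1';
        W1 = W1 + eta * d_hid * a0';
    end
end
% typically prints 0 1 1 0; as noted earlier, there is no guarantee of a global minimum
disp(round(sigma(W2 * [ones(1,4); sigma(W1 * [ones(1,4); P])])))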

Page 43

The problem of overfitting …

Approximation of the function y = f(x):

[Figure: fits obtained with 2, 5, and 40 neurons in the hidden layer, plotted as y against x]

Overfitting is not detectable in the learning phase …

so use cross-validation.

Page 44

Application of ANNs

The general scheme when using ANNs is as follows:

[Figure: a stimulus is encoded into an input pattern (a bit vector), fed to the network, which produces an output pattern that is decoded into the response]

Page 45

Application: Digit Recognition

Page 46

Matlab Demo

• Function approximation
• Pattern recognition

Page 47

Learning XOR Operation: Matlab Code

P = [ 0 0 1 1; ...     % input patterns (one column per example)
      0 1 0 1 ];
T = [ 0 1 1 0 ];       % desired outputs (XOR)

net = newff([0 1; 0 1], [6 1], {'tansig' 'tansig'});   % 2 inputs, 6 hidden units, 1 output

net.trainParam.epochs = 4850;    % maximum number of training epochs
net = train(net, P, T);          % train with backpropagation

X = [0; 1];                      % test input (a column vector: one pattern)
Y = sim(net, X);                 % simulate the trained network
display(Y);

Page 48

Function Approximation: Learning the Sine Function

P = 0:0.1:10;                    % input samples
T = sin(P) * 10.0;               % target outputs: a scaled sine wave

net = newff([0.0 10.0], [8 1], {'tansig' 'purelin'});  % 1 input, 8 hidden units, 1 linear output
plot(P, T); pause;
Y = sim(net, P);                 % output of the untrained network
plot(P, T, P, Y, 'o'); pause;

net.trainParam.epochs = 4850;
net = train(net, P, T);          % train with backpropagation

Y = sim(net, P);                 % output of the trained network
plot(P, T, P, Y, 'o');