Intro to ML and Learning Algorithms for Single-Layer NN


SC 549 Artificial Neural Networks 2015/2016

Topic 02: Introduction to Machine Learning and Learning Algorithms for Single-Layer Neural Networks

Postgraduate Institute of Science MSc in Computer Science SC549 ANN

Contents

Machine learning and Learning algorithms

Supervised and unsupervised

Learning in neural networks

Hebb rule (Hebbian learning)

Perceptron and its learning algorithm

ADALINE and its learning algorithm



Learning in Neural Networks

Learning (training) in a neural network essentially means selecting, from the set of allowed models, the one that minimizes a cost function.

It is the process of finding the decision boundary by adjusting the weights.

It is the process of finding the weight matrix that provides the correct classification.

Machine Learning

Machine learning is:

The scientific discipline that explores the construction and study of algorithms that can learn from data and make predictions on data.

The science of getting computers to act without being explicitly programmed.




Types of Learning Algorithms

Supervised learning

infers a function from labeled training data

Unsupervised learning

tries to find hidden structure in unlabeled data


Supervised Learning

The NN is trained repeatedly by a teacher.

Each input presented to the network has an associated desired output.

In each learning cycle, the error between the actual and the desired output is used to adjust the weights.

When the error falls to an acceptable amount, the learning stops.

Applications: classification and regression problems


Unsupervised Learning

A teacher is not involved.

The network uses only the inputs.

The inputs cluster automatically based on some closeness or similarity criterion.

Meanings are associated with these clusters depending on the data.

Applications: clustering, dimensionality reduction


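Machine Learning

[Figure: taxonomy of machine learning methods.]

Unsupervised learning (input x, model p(x)): clustering (K-means, GMM (EM), mean shift); dimensionality reduction (PCA, LDA).

Supervised learning (input-target pairs (x, t), model f: x → t): classification (t = {1, ..., n}) and regression (t is a continuous variable); methods include ANN, SVM, decision tree, polynomial curve fit, and Gaussian process.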




Application of Machine Learning

Pattern Recognition

Pattern recognition is a branch of machine learning that focuses on the recognition of patterns and regularities in data.


Pattern Recognition

Predicting tumor cells as benign or malignant

Classifying credit card transactions as legitimate or fraudulent

Classifying secondary structures of protein as alpha-helix, beta-sheet, or random coil

Categorizing news stories as finance, weather, entertainment, sports, etc.


Pattern Recognition Basic Concepts

Given a collection of records (the training set):

Each record contains a set of attributes; one of the attributes is the class.

Find a model for the class attribute as a function of the values of the other attributes.


Pattern Recognition Basic Concepts

Goal: previously unseen records should be assigned a class as accurately as possible.

A test set is used to determine the accuracy of the model.

Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
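The split described above can be sketched in a few lines of Python. This is only an illustration: the function names, the 30% test fraction, the toy records, and the majority-class "model" are all assumptions made for the example, not part of the course material.

```python
import random

def train_test_split(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of the records and split it into (train, test)."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(model, test_set):
    """Fraction of test records whose predicted class equals the true class."""
    correct = sum(1 for attrs, label in test_set if model(attrs) == label)
    return correct / len(test_set)

# Hypothetical toy records of the form (attributes, class).
records = [((i,), "Yes" if i % 2 else "No") for i in range(10)]
train, test = train_test_split(records)

# Placeholder model: always predict the most common class in the training set.
labels = [label for _, label in train]
majority = max(set(labels), key=labels.count)

def model(attrs):
    return majority

print(len(train), len(test), accuracy(model, test))
```

The model is built only from `train`; `test` is touched only when measuring accuracy, which is the point of keeping the two sets apart.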



Pattern Recognition Example

Training set (used to learn the model):

Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Test set (the learned model is applied to it):

Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?


Classification Techniques

Decision tree based methods

Rule-based methods

Memory-based reasoning

Neural networks

Naïve Bayes and Bayesian belief networks

Support vector machines


Classifiers: Examples

Support vector machine: LibSVM, SVMLight

Decision tree: J48 (C4.5)

K-Nearest Neighbor

Bayesian: Naïve Bayes

Artificial neural networks: Perceptron, Multilayer Perceptron, Self-organizing maps

Homework : Go through the list of classifiers in Weka


Learning Algorithms for Single-Layer Neural Networks

Hebbian learning (Hebb rule)

Perceptron learning

Least mean square (LMS) learning



Hebb Nets and Hebbian Learning

Donald Hebb, in his influential book The Organization of Behavior (1949), claimed:

Behavior changes are primarily due to changes in the synaptic strengths (w_ij) between neurons i and j.

The weight between two neurons increases if the two neurons activate simultaneously, and reduces if they activate separately.

That is, w_ij increases only when both i and j (two connected neurons) are on: the Hebbian learning law (algorithm).


Hebb Nets and Hebbian Learning

In ANN, the Hebbian law can be stated: $w_i$ increases only if the outputs of both units $x_i$ and $y$ have the same sign. This is a generalized version of the Hebbian law.

The weights are increased as follows:

$\Delta w_i = w_i(\text{new}) - w_i(\text{old}) = x_i y$

Sometimes, there is a learning rate $\eta$:

$w_i(\text{new}) - w_i(\text{old}) = \eta \, x_i y$


Hebbian learning algorithm

Step 0. Initialization: b = 0, wi = 0, i = 1 to n

Step 1. For each training sample s:t do steps 2-4
        /* s is the input pattern, t the target output of the sample */

Step 2. xi := si, i = 1 to n     /* set s to input units */

Step 3. y := t                   /* set y to the target */

Step 4. wi := wi + xi * y, i = 1 to n     /* update weight */
        b := b + y                         /* update bias */
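The steps above translate almost line for line into Python. The following is a minimal sketch; the function name `hebb_train` and the list-of-pairs data layout are choices made for this example only.

```python
def hebb_train(samples):
    """One pass of the Hebb rule over (input pattern, target) pairs.

    Returns (weights, bias) after each sample has been used once.
    """
    n = len(samples[0][0])
    w = [0] * n                  # Step 0: wi = 0
    b = 0                        # Step 0: b = 0
    for s, t in samples:         # Step 1: for each training sample s:t
        x = list(s)              # Step 2: set s to the input units
        y = t                    # Step 3: set y to the target
        for i in range(n):
            w[i] += x[i] * y     # Step 4: update weight
        b += y                   # Step 4: update bias
    return w, b

print(hebb_train([((1, 1), 1), ((1, 0), 0)]))
```

Note that a sample with target 0 changes nothing, because every update is multiplied by y.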


Hebb Net Example: AND Function

Binary units (1, 0)

(x1, x2)  y=t  w1  w2  b
(1, 1)    1    1   1   1
(1, 0)    0    1   1   1
(0, 1)    0    1   1   1
(0, 0)    0    1   1   1

An incorrect boundary, 1 + x1 + x2 = 0, is learned after using each sample once.





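With binary units, the boundary 1 + x1 + x2 = 0 is learned. This is not the correct boundary: 1 + x1 + x2 is positive for all four binary inputs, so every input falls on the same side.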



Bipolar units (1, -1)

(x1, x2)    y=t  w1  w2  b
(1, 1)      1    1   1   1
(1, -1)     -1   0   2   0
(-1, 1)     -1   1   1   -1
(-1, -1)    -1   2   2   -2

A correct boundary, -1 + x1 + x2 = 0, is successfully learned (-2 + 2x1 + 2x2 = 0 is the boundary, and the common factor 2 cancels out).



With bipolar units, a correct boundary, -1 + x1 + x2 = 0, is successfully learned.
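The two AND examples can be checked numerically. The sketch below applies the Hebb rule once to each sample and then counts how many training samples the learned boundary misclassifies. The helper names and the thresholding convention (output 1 when the net input is positive, otherwise the "low" value of the encoding) are assumptions made for this illustration.

```python
def hebb_train(samples):
    """One pass of the Hebb rule for two inputs; returns (weights, bias)."""
    w, b = [0, 0], 0
    for (x1, x2), t in samples:
        w[0] += x1 * t
        w[1] += x2 * t
        b += t
    return w, b

def classify(w, b, x, low):
    """Threshold the net input: 1 if positive, otherwise `low` (0 or -1)."""
    net = b + w[0] * x[0] + w[1] * x[1]
    return 1 if net > 0 else low

binary = [((1, 1), 1), ((1, 0), 0), ((0, 1), 0), ((0, 0), 0)]
bipolar = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]

for name, data, low in [("binary", binary, 0), ("bipolar", bipolar, -1)]:
    w, b = hebb_train(data)
    errors = sum(classify(w, b, x, low) != t for x, t in data)
    print(name, "w =", w, "b =", b, "misclassified =", errors)
```

With the binary encoding the weights end at w = [1, 1], b = 1 and three of the four samples are misclassified; with the bipolar encoding they end at w = [2, 2], b = -2 and none are.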


Stronger learning methods are needed

Classification error can be used to determine the weight update.

Training samples can be used repeatedly, each time changing the weights only slightly.

The learning methods of the Perceptron and ADALINE models are error driven.



Perceptron

The perceptron occupies a special place in the

historical development of neural networks.

It was the first algorithmically described

neural network.

It was invented by Frank Rosenblatt, a psychologist, in 1958; his book Principles of Neurodynamics followed in 1962.


Perceptron

Rosenblatt's perceptron is built around the McCulloch-Pitts model of a neuron.

Basically, it consists of a single neuron with adjustable synaptic weights and bias.

The perceptron works as a binary classifier.


Perceptron

[Figure: model of the perceptron.]

Activation function = signum function


Perceptron

The output of the perceptron, y = f(s), is computed using the signum (sign) activation function.

[Equations: two alternative definitions of the activation function, separated by OR.]




Perceptron

Perceptrons can differentiate patterns only if they are linearly separable.

Rosenblatt proved that if the patterns (vectors) used to train the perceptron are drawn from two linearly separable classes, then the perceptron algorithm converges and positions the decision surface in the form of a hyperplane between the two classes.

The proof of convergence of the algorithm is known as the perceptron convergence theorem.


Perceptron learning algorithm

Variables and Parameters:

x(n) = (m + 1)-by-1 input vector = [1, x1(n), x2(n), ..., xm(n)]^T

w(n) = (m + 1)-by-1 weight vector = [b, w1(n), w2(n), ..., wm(n)]^T

b = bias

y(n) = actual response

d(n) = desired response (target)

η = learning-rate parameter



1. Initialization. Set w(0) = 0. Then perform the following computations for time-step (iteration) n = 1, 2, ....

2. Activation. At time-step n, activate the perceptron by applying the continuous-valued input vector x(n) and the desired response d(n).

3. Computation of Actual Response. Compute the actual response of the perceptron as

$y(n) = \mathrm{sgn}[w^T(n)\,x(n)]$, where sgn(·) is the signum function.

4. Adaptation of Weight Vector. Update the weight vector of the perceptron to obtain

$w(n+1) = w(n) + \eta\,[d(n) - y(n)]\,x(n)$

5. Continuation. Increment time-step n by one and go back to step 2 until convergence.
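The five steps above can be sketched in plain Python as follows. Several details are assumptions made for this illustration rather than part of the algorithm as stated: sgn(0) is taken as +1, the learning rate defaults to 1, "convergence" is taken to mean one full pass over the samples without a misclassification, and a cap on the number of passes guards against data that is not linearly separable.

```python
def sgn(v):
    """Signum function; sgn(0) is taken as +1 here (an assumption)."""
    return 1 if v >= 0 else -1

def perceptron_train(samples, eta=1.0, max_epochs=100):
    """samples: list of (x, d), x a tuple of m inputs, d in {-1, +1}."""
    m = len(samples[0][0])
    w = [0.0] * (m + 1)                  # step 1: w(0) = 0; w[0] is the bias b
    for _ in range(max_epochs):
        errors = 0
        for x, d in samples:             # step 2: apply x(n) and d(n)
            xa = (1.0,) + tuple(x)       # augmented input [1, x1, ..., xm]
            y = sgn(sum(wi * xi for wi, xi in zip(w, xa)))          # step 3
            if y != d:
                errors += 1
            w = [wi + eta * (d - y) * xi for wi, xi in zip(w, xa)]  # step 4
        if errors == 0:                  # step 5: stop once a full pass is clean
            break
    return w

# Bipolar AND function as a small linearly separable test problem.
and_data = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
w = perceptron_train(and_data)
print(w)
```

When the response is correct, d(n) - y(n) is zero and the weights are left unchanged, so only misclassified samples move the boundary.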



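Weight update rule

$w(n+1) = w(n) + \eta\,[d(n) - y(n)]\,x(n)$

The weight update is based on this error-correction rule. The perceptron convergence theorem guarantees that it converges when the two classes are linearly separable.

Learning-rate parameter (learning rate): $0 < \eta \le 1$

As an alternative to w(0) = 0, the initial weights can be set to small random values.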


Perceptron Convergence (Stopping Condition)

Each iteration goes through every sample in the training set. One such iteration is called an epoch. The algorithm runs for several epochs until convergence.

Convergence: the mean error over the m input samples, where $e_i(n) = d_i(n) - y_i(n)$ is the error on sample i, is less than a threshold value or, ideally, equal to 0.

Alternatively, training stops when a predetermined number of iterations has been completed.
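A stopping test along these lines might look like the sketch below. The exact form of the mean error is not given in the text, so the use of absolute values (to keep positive and negative errors from cancelling) is an assumption, as are the function names and default values.

```python
def mean_error(desired, actual):
    """Mean absolute error over the samples of one epoch.

    Taking absolute values is an assumption made for this sketch so that
    errors of opposite sign cannot cancel each other.
    """
    return sum(abs(d - y) for d, y in zip(desired, actual)) / len(desired)

def should_stop(desired, actual, epoch, threshold=0.0, max_epochs=100):
    """Stop when the mean error is small enough or the epoch budget is spent."""
    return mean_error(desired, actual) <= threshold or epoch >= max_epochs

print(mean_error([1, -1, -1, -1], [1, 1, -1, -1]))   # one of four samples wrong
print(should_stop([1, -1], [1, -1], epoch=3))        # no errors left
```

With bipolar outputs a single wrong sample contributes an error of magnitude 2, so a threshold other than 0 needs to be chosen with the output encoding in mind.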


An Application of Perceptron

Character recognition

7 characters (A, B, C, D, E, F, and G) from 3 fonts are provided as shown in the next slide.

21 input samples

An algorithm should be developed to classify a given character into one of the seven characters.


An Application of Perceptron: Input Samples

[Figure: the 21 input samples (7 characters in 3 fonts).]


An Application of Perceptron: Character Recognition

Solution: single-layer neural network of perceptrons

Input layer: 63 binary inputs

Representing 9x7 pixels, where a dot is 0 and a hash is 1

Output layer: 7 perceptrons

Perceptron 1 outputs A or Not A, perceptron 2 outputs B or Not B, and so on.

E.g., the output vector for letter B (only perceptron 2 is on) is 0100000 (or -1 +1 -1 -1 -1 -1 -1).
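The input and output encodings described above can be sketched as follows. The function names and the sample pattern are hypothetical; the pattern is made up for illustration and is not one of the fonts from the slides.

```python
LETTERS = "ABCDEFG"

def pattern_to_input(rows):
    """Flatten a 9x7 pattern of '.' and '#' into 63 binary inputs."""
    return [1 if ch == "#" else 0 for row in rows for ch in row]

def target_vector(letter, bipolar=False):
    """One target per output perceptron: 'on' only for the matching letter."""
    off = -1 if bipolar else 0
    return [1 if c == letter else off for c in LETTERS]

# Hypothetical 9x7 pattern (9 rows of 7 pixels).
rows = ["..##...", "...#...", "...#...", "..#.#..", "..#.#..",
        ".#####.", ".#...#.", ".#...#.", "###.###"]
x = pattern_to_input(rows)

print(len(x))
print(target_vector("B"))
print(target_vector("B", bipolar=True))
```

Each of the 7 perceptrons is then trained independently on the same 63 inputs, using its own component of the target vector as the desired response.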



Learning Rate

The learning rate has to be chosen appropriately:

A small value will make the learning process extremely slow.

A large value will result in fast learning, but the learning process may not converge.
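The effect of the learning rate is easy to see on a deliberately tiny problem. The sketch below is an illustration under stated assumptions, not the perceptron itself: it uses a single weight with a linear output, a constant input x = 1 and target d = 1, so that each update is w ← w + η(d − w). The function name, tolerance, and the three values of η are chosen only for the demonstration.

```python
def steps_to_converge(eta, d=1.0, tol=1e-3, max_steps=1000):
    """Repeat w <- w + eta * (d - w) for a constant input x = 1.

    Returns the number of steps until |d - w| < tol, or None if the
    iteration diverges or does not converge within max_steps.
    """
    w = 0.0
    for step in range(1, max_steps + 1):
        w += eta * (d - w)
        if abs(d - w) < tol:
            return step
        if abs(w) > 1e6:
            return None            # diverged
    return None                    # too slow for the step budget

for eta in (0.01, 0.5, 2.5):
    print(eta, steps_to_converge(eta))
```

In this toy setting the error is multiplied by (1 − η) at every step, so a small η shrinks it very slowly, a moderate η shrinks it quickly, and η > 2 makes it grow without bound.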


Least Mean Square (LMS) Learning and ADALINE

The least-mean-square (LMS) algorithm was the first linear adaptive-filtering algorithm for solving problems such as prediction and communication-channel equalization.

LMS finds a desired filter by computing the filter coefficients that produce the least mean square of the error signal (the difference between the desired and the actual signal).

It was invented in 1960 by Stanford University professor Bernard Widrow and his first Ph.D. student, Ted Hoff.


Adaptive Filter

An adaptive filter is a system with a linear filter that has a transfer function controlled by variable parameters and a means to adjust those parameters according to an optimization (adaptive) algorithm.


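Adaptive Filter

[Figure: block diagram of an adaptive filter. The input x(n) passes through the linear filter to produce y(n); the error e(n) = d(n) - y(n) is fed to the adaptive algorithm, which adjusts the filter.]

This system can easily be modeled using a simple neuron (McCulloch-Pitts model).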




ADALINE

The Adaptive Linear Neuron (ADALINE), introduced by Widrow and Hoff (1960), is an implementation of an adaptive filter.

ADALINE networks are similar to the perceptron, but their transfer function is linear (f(u) = u) rather than hard-limiting (i.e., signum).

This allows their outputs to take on any value, whereas the perceptron output is limited to either 0 or 1 (or -1 or +1).

Like the perceptron, ADALINE is built around the McCulloch-Pitts model of a neuron.


ADALINE

[Figure: model of the ADALINE.]

Activation function = linear function


ADALINE

The ADALINE is trained using the least-mean-square (LMS) rule, also known as the Widrow-Hoff rule.


LMS Learning Rule

The LMS learning rule is similar to perceptron learning, except for the weight update rule.

The LMS rule adjusts the weights to reduce the difference (error) between the net input (induced local field) and the desired output.

This is because the activation function is linear.



LMS Learning Rule

Learning algorithm: similar to perceptron learning except for the weight update rule,

$w(n+1) = w(n) + \eta\,[d(n) - y(n)]\,x(n)$

where $y(n) = w^T(n)\,x(n)$
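A single LMS update can be written out as below. This is a minimal sketch: the function name `lms_step`, the learning rate of 0.1, and the sample values are assumptions made for the example. The only difference from the perceptron update is that y(n) is the raw linear output, with no signum applied.

```python
def lms_step(w, x, d, eta):
    """One LMS update with a linear output.

    y = w^T x, e = d - y, and the new weights are w + eta * e * x.
    Returns (new weights, error before the update).
    """
    y = sum(wi * xi for wi, xi in zip(w, x))
    e = d - y
    return [wi + eta * e * xi for wi, xi in zip(w, x)], e

# Augmented input [1, x1, x2]; w[0] plays the role of the bias.
w, e = lms_step([0.0, 0.0, 0.0], [1.0, 1.0, -1.0], d=-1.0, eta=0.1)
print(w, e)
```

Because the error is a real number rather than one of a few discrete values, the size of each correction shrinks as the output approaches the target.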


LMS Convergence (Stopping Condition)

LMS stops when the mean-square error (MSE) is less than a certain threshold value.

With error $e(n) = d(n) - y(n)$, the MSE over N training samples is

$\mathrm{MSE} = \frac{1}{N}\sum_{n=1}^{N} e^2(n)$


ADALINE LMS Algorithm

Step 0. Initialize weights. Set learning rate η.

Step 1. While the stopping condition is false, do Steps 2-6.

Step 2. For each training pair s : d, do Steps 3-5.

Step 3. Set activations of input units, i = 1, ..., n: xi(n) = si.

Step 4. Compute net input to output unit: $y(n) = w^T(n)\,x(n)$

Step 5. Update bias and weights, i = 1, ..., n: $w(n+1) = w(n) + \eta\,[d(n) - y(n)]\,x(n)$

Step 6. Test for stopping condition: if the mean-square error is less than a threshold value, then stop; otherwise go to Step 2 and continue.
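Steps 0-6 can be put together as in the sketch below. It is an illustration under stated assumptions: weights start at zero, the bias is handled by prepending a constant 1 to each input, the MSE is measured after each full pass, and the learning rate, threshold, and epoch cap are arbitrary values chosen for the bipolar AND data. For that data the smallest MSE a single linear unit can reach is 0.25, so the threshold has to be set above that.

```python
def net_input(w, s):
    """Linear output y = w^T x with x = [1, s1, ..., sn] (w[0] is the bias)."""
    return w[0] + sum(wi * si for wi, si in zip(w[1:], s))

def adaline_train(pairs, eta=0.1, threshold=0.3, max_epochs=1000):
    n = len(pairs[0][0])
    w = [0.0] * (n + 1)                              # Step 0
    mse = None
    for epoch in range(1, max_epochs + 1):           # Step 1
        for s, d in pairs:                           # Step 2
            x = [1.0] + list(s)                      # Step 3
            y = net_input(w, s)                      # Step 4
            w = [wi + eta * (d - y) * xi for wi, xi in zip(w, x)]  # Step 5
        mse = sum((d - net_input(w, s)) ** 2 for s, d in pairs) / len(pairs)
        if mse < threshold:                          # Step 6
            break
    return w, mse, epoch

and_data = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
w, mse, epochs = adaline_train(and_data)
print(w, mse, epochs)
```

To use the trained unit as a classifier, the sign of the linear output can be taken afterwards; during training only the linear output enters the update.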


MADALINE

Extension of ADALINE

MADALINE (Many ADALINEs) is a three-layer (input, hidden, output), fully connected, feedforward artificial neural network architecture for classification that uses ADALINE units in its hidden and output layers.
