7/24/2019 Intro to ML and Learning Algorithms for Single-Layer NN
SC 549 Artificial Neural
Networks 2015/2016
Topic 02: Introduction to Machine
Learning and Learning Algorithms for Single-Layer Neural Networks
Postgraduate Institute of Science MSc in Computer Science SC549 ANN
Contents
Machine learning and Learning algorithms
Supervised and unsupervised
Learning in neural networks
Hebb rule (Hebbian learning)
Perceptron and its learning algorithm
ADALINE and its learning algorithm
Learning in Neural Networks
Learning (training) in a neural network
essentially means selecting, from the set
of allowed models, the one that minimizes a
cost function
It is the process of finding the decision
boundary by adjusting the weights
It is the process of finding the weight matrix that provides the correct classification
Machine Learning
Machine learning
scientific discipline that explores the construction
and study of algorithms that can learn from data
and make predictions on data
science of getting computers to act without being
explicitly programmed
Types of Learning Algorithms
Supervised learning
infers a function from labeled training data
Unsupervised learning
tries to find hidden structure in unlabeled data
Supervised Learning
The NN is trained repeatedly by a teacher.
Each input presented to the network has an associated desired output.
In each learning cycle, the error between the actual and the desired output is used to adjust the weights.
Learning stops when the error falls to an acceptable level.
Applications: Classification and Regression problems
Unsupervised Learning
A teacher is not involved
The network uses only the inputs
The inputs are clustered automatically based on some closeness or similarity criterion.
Meanings are associated with these clusters depending on the data.
Applications: Clustering, Dimensionality reduction
Machine Learning
Unsupervised Learning: data x, model p(x)
  Clustering (K-means, GMM (EM), Mean shift)
  Dimensionality Reduction (PCA, LDA)
Supervised Learning: data (x, t), model f: x -> t
  Classification: t = {1, ..., n}
  Regression: t is a continuous variable
  Methods: ANN, SVM, Decision tree, Polynomial curve fit, Gaussian Process
Application of Machine Learning
Pattern Recognition
Pattern recognition is a branch of machine learning
that focuses on the recognition of patterns and regularities in data
Pattern Recognition
Predicting tumor cells as benign or malignant
Classifying credit card transactions as legitimate or fraudulent
Classifying secondary structures of protein as alpha-helix, beta-sheet, or random coil
Categorizing news stories as finance, weather, entertainment, sports, etc.
Pattern Recognition Basic Concepts
Given a collection of records (training set):
Each record contains a set of attributes; one of the attributes is the class.
Find a model for the class attribute as a function
of the values of the other attributes.
Pattern Recognition Basic Concepts
Goal: previously unseen records should be
assigned a class as accurately as possible.
A test set is used to determine the accuracy of
the model.
Usually, the given data set is divided into
training and test sets, with the training set used to
build the model and the test set used to validate it.
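A minimal sketch of the train/test procedure described above, using only the Python standard library. The helper names `split_records` and `accuracy`, the 70/30 split, the toy records, and the majority-class "model" are illustrative assumptions, not part of the course material.

```python
import random

def split_records(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of the records and split it into (training set, test set)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(model, test_set):
    """Fraction of test records whose class the model predicts correctly."""
    correct = sum(1 for attributes, label in test_set if model(attributes) == label)
    return correct / len(test_set)

# Hypothetical records: (attributes, class) pairs.
records = [((i, i % 3), "Yes" if i % 2 else "No") for i in range(10)]
train_set, test_set = split_records(records)

# Toy "model": always predict the majority class seen in the training set.
train_labels = [label for _, label in train_set]
majority = max(set(train_labels), key=train_labels.count)

def model(attributes):
    return majority

test_accuracy = accuracy(model, test_set)
```

The model never sees the test records while it is being built, so `test_accuracy` estimates how well it handles previously unseen records.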
Pattern Recognition Example
Training set (used to learn the model):
Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes
Test set (the learned model is applied to it):
Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?
Classification Techniques
Decision tree based Methods
Rule-based Methods
Memory based reasoning
Neural Networks
Naïve Bayes and Bayesian Belief Networks
Support Vector Machines
Classifiers: Examples
Support vector machine: LibSVM, SVMLight
Decision Tree: J48 (C4.5)
K-Nearest Neighbor
Bayesian: Naïve Bayes
Artificial Neural Networks: Perceptron, Multilayer Perceptron, Self-organizing maps
Homework: Go through the list of classifiers in Weka
Learning Algorithms for Single-Layer Neural Networks
Hebbian learning (Hebb rule)
Perceptron learning
Least mean square (LMS) learning
Hebb Nets and Hebbian Learning
Donald Hebb, in his influential book The
Organization of Behavior (1949), claimed:
Behavior changes are primarily due to the changes of synaptic strengths (wij) between
neurons i and j
The weight between two neurons increases if
the two neurons activate simultaneously, and
reduces if they activate separately
That is, wij increases only when both i and j
(two connected neurons) are on: the
Hebbian learning law (algorithm)
Hebb Nets and Hebbian Learning
In ANN, the Hebbian law can be stated:
wi increases only if the outputs of both units
xi and y have the same sign.
This is a generalized version of the Hebbian law.
The weights are increased as follows:
wi(new) = wi(old) + xi y
Sometimes, there is a learning rate η:
Δwi = wi(new) - wi(old) = η xi y
Hebbian learning algorithm
Step 0. Initialization: b = 0, wi = 0, i = 1 to n
Step 1. For each training sample s:t do steps 2-4
        /* s is the input pattern, t the target output of the sample */
Step 2. xi := si, i = 1 to n    /* set s to input units */
Step 3. y := t                  /* set y to the target */
Step 4. wi := wi + xi * y, i = 1 to n    /* update weight */
        b := b + y                       /* update bias */
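The steps above can be sketched in Python as follows. The function name `hebb_train` and the sample lists are illustrative assumptions; the update itself follows Steps 0 to 4 with no learning rate. The sketch is applied to the AND function in both a binary (1, 0) and a bipolar (1, -1) encoding.

```python
def hebb_train(samples):
    """One pass of the Hebb rule over (input pattern s, target t) pairs.

    Returns the learned weights [w1, ..., wn] and the bias b.
    """
    n = len(samples[0][0])
    weights = [0] * n              # Step 0: wi = 0
    bias = 0                       # Step 0: b = 0
    for s, t in samples:           # Step 1: each sample is used once
        x = list(s)                # Step 2: set input units
        y = t                      # Step 3: set output unit to the target
        for i in range(n):         # Step 4: wi := wi + xi * y
            weights[i] += x[i] * y
        bias += y                  #         b := b + y
    return weights, bias

# AND function in binary and bipolar encodings.
binary_and = [((1, 1), 1), ((1, 0), 0), ((0, 1), 0), ((0, 0), 0)]
bipolar_and = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]

binary_weights, binary_bias = hebb_train(binary_and)     # ([1, 1], 1)
bipolar_weights, bipolar_bias = hebb_train(bipolar_and)  # ([2, 2], -2)
```

With binary units, only the first sample changes the weights, because every other sample has target 0. With bipolar units, every sample contributes an update.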
Hebb Net Example AND Function
Example: AND function
Binary units (1, 0)
(x1, x2)  y=t  w1  w2  b
(1, 1)    1    1   1   1
(1, 0)    0    1   1   1
(0, 1)    0    1   1   1
(0, 0)    0    1   1   1
An incorrect boundary,
1 + x1 + x2 = 0,
is learned after using
each sample once
Hebb Net Example AND Function
A boundary 1 + x1 + x2 = 0
is learned. This is not the
correct boundary
Hebb Net Example AND Function
Bipolar units (1, -1)
(x1, x2)   y=t  w1  w2  b
(1, 1)     1    1   1   1
(1, -1)    -1   0   2   0
(-1, 1)    -1   1   1   -1
(-1, -1)   -1   2   2   -2
A correct boundary,
-1 + x1 + x2 = 0,
is successfully
learned
(-2 + 2x1 + 2x2 = 0
is the boundary
and the factor 2 is cancelled
out)
Hebb Net Example AND Function
With bipolar units, a
correct boundary
-1 + x1 + x2 = 0
is successfully learned
Stronger learning methods are needed
Classification error can be used to determine the weight update
Training samples can be used repeatedly, changing the weights only slightly each time
The learning methods of the Perceptron and ADALINE models are error driven
Perceptron
The perceptron occupies a special place in the
historical development of neural networks.
It was the first algorithmically described
neural network.
It was invented by Frank Rosenblatt, a
psychologist (1958).
Perceptron
Rosenblatt's perceptron is built around the
McCulloch-Pitts model of a neuron
Basically, it consists of a single neuron with
adjustable synaptic weights and bias
Perceptron works as a binary classifier
Perceptron
Activation function = Signum function
Perceptron
The output of the perceptron, y = f(s), is
computed using the signum (sign) activation
function.
[Equations: two alternative definitions of f(s), joined by "OR"]
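One common convention, assumed here because the activation formulas themselves are not reproduced in the text, defines the bipolar signum and its binary (threshold) variant as shown below. The value assigned at s = 0 differs between texts.

```latex
y = \operatorname{sgn}(s) =
\begin{cases}
  +1, & s > 0 \\
  -1, & s \le 0
\end{cases}
\qquad \text{or} \qquad
y =
\begin{cases}
  1, & s > 0 \\
  0, & s \le 0
\end{cases}
```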
Perceptron
Perceptrons can differentiate patterns only if they are linearly separable.
Rosenblatt proved that if the patterns (vectors) used to train the perceptron are drawn from two linearly separable classes, then the perceptron algorithm converges and
positions the decision surface in the form of a hyperplane between the two classes.
The proof of convergence of the algorithm is known as the perceptron convergence theorem.
Perceptron learning algorithm
Variables and Parameters:
x(n) = (m + 1)-by-1 input vector
     = [1, x1(n), x2(n), ..., xm(n)]
w(n) = (m + 1)-by-1 weight vector
     = [b, w1(n), w2(n), ..., wm(n)]
b = bias
y(n) = actual response
d(n) = desired response (target)
η = learning-rate parameter
Perceptron learning algorithm
1. Initialization. Set w(0) = 0. Then perform the following computations for time-step (iteration) n = 1, 2, ....
2. Activation. At time-step n, activate the perceptron by applying continuous-valued input vector x(n) and desired response d(n).
3. Computation of Actual Response. Compute the actual response of the perceptron as
   y(n) = sgn[w^T(n)x(n)], where sgn() is the signum function
4. Adaptation of Weight Vector. Update the weight vector of the perceptron to obtain
   w(n + 1) = w(n) + η[d(n) - y(n)]x(n)
5. Continuation. Increment time step n by one and go back to step 2 until convergence.
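The five steps can be sketched in Python as below. The names `sgn`, `train_perceptron` and `predict`, the choice sgn(0) = -1, the default learning rate of 1, the epoch limit, and the bipolar AND data are assumptions made for this illustration.

```python
def sgn(v):
    """Signum activation; mapping v == 0 to -1 is an assumption of this sketch."""
    return 1 if v > 0 else -1

def predict(w, inputs):
    """Actual response y = sgn(w^T x), with x = [1, x1, ..., xm]."""
    x = [1.0] + list(inputs)
    return sgn(sum(wi * xi for wi, xi in zip(w, x)))

def train_perceptron(samples, eta=1.0, max_epochs=100):
    """samples: list of (inputs, desired) pairs with desired in {+1, -1}.

    Returns the weight vector w = [b, w1, ..., wm].
    """
    m = len(samples[0][0])
    w = [0.0] * (m + 1)                      # 1. Initialization: w(0) = 0
    for _ in range(max_epochs):
        mistakes = 0
        for inputs, d in samples:            # 2. Activation: apply x(n) and d(n)
            x = [1.0] + list(inputs)
            y = predict(w, inputs)           # 3. Actual response
            if y != d:
                mistakes += 1
            # 4. Adaptation: w(n+1) = w(n) + eta * [d(n) - y(n)] * x(n)
            w = [wi + eta * (d - y) * xi for wi, xi in zip(w, x)]
        if mistakes == 0:                    # 5. Stop after an error-free pass
            break
    return w

bipolar_and = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
w = train_perceptron(bipolar_and)            # [-2.0, 2.0, 2.0]
```

When y(n) equals d(n) the bracketed term is zero, so correctly classified samples leave the weights unchanged.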
Perceptron learning algorithm
Weight update rule
w(n + 1) = w(n) + η[d(n) - y(n)]x(n)
Weight update is based on this error-correction rule; the perceptron convergence theorem guarantees that it converges for linearly separable classes.
Learning parameter (learning rate) η, 0 < η ≤ 1
The initial weights are set to small random values.
Perceptron Convergence (Stopping Condition)
Each iteration goes through each sample in the
training set. One iteration is called an epoch.
The algorithm runs for several epochs until convergence
Convergence: when the mean error over the m input samples, where ei(n) =
di(n) - yi(n), is less than a threshold value or, ideally,
when the mean error = 0
Or when a predetermined number of iterations has been completed
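A small sketch of this stopping test. The exact form of the mean error is not given in the text, so the mean of the absolute errors |ei| is assumed here (it prevents errors of opposite sign from cancelling). The names `mean_error` and `should_stop`, and the default values, are hypothetical.

```python
def mean_error(desired, actual):
    """Mean absolute error over the m samples of one epoch (assumed form)."""
    m = len(desired)
    return sum(abs(d - y) for d, y in zip(desired, actual)) / m

def should_stop(desired, actual, epoch, threshold=0.0, max_epochs=100):
    """Stop when the mean error is within the threshold or the epoch budget is used up."""
    return mean_error(desired, actual) <= threshold or epoch >= max_epochs

desired = [1, -1, -1, -1]
one_wrong = [1, 1, -1, -1]                          # second sample misclassified

error_with_mistake = mean_error(desired, one_wrong)  # |(-1) - 1| / 4 = 0.5
```

With the default threshold of 0, training ends only when every sample is classified correctly or the epoch limit is reached.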
An Application of Perceptron
Character recognition
7 characters (A, B, C, D, E, F, and G) from 3 fonts
are provided as shown in the next slide.
21 input samples
An algorithm should be developed to classify a
given character into one of seven characters
An Application of Perceptron Input Samples
An Application of Perceptron: Character Recognition
Solution: Single-layer neural network of perceptrons
Input layer: 63 binary inputs
Representing 9x7 pixels where dot is 0 and hash is 1
Output layer: 7 perceptrons
Perceptron 1 outputs A or Not A, perceptron 2 outputs B or Not B, and so on.
E.g., the output vector for letter B (only perceptron 2 is on) is 0100000 (or -1+1-1-1-1-1-1)
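A sketch of the input and output encodings described above. The names `LETTERS`, `target_vector` and `pixels_to_inputs` are hypothetical, and reading "9x7" as 9 rows of 7 pixels is an assumption; the sample grid is a placeholder pattern, not one of the actual font samples.

```python
LETTERS = "ABCDEFG"

def target_vector(letter, bipolar=False):
    """Desired outputs of the 7 perceptrons for one training character."""
    off = -1 if bipolar else 0
    return [1 if candidate == letter else off for candidate in LETTERS]

def pixels_to_inputs(rows):
    """Flatten a grid of '.' and '#' characters into binary inputs (dot = 0, hash = 1)."""
    return [1 if ch == "#" else 0 for row in rows for ch in row]

binary_b = target_vector("B")                  # [0, 1, 0, 0, 0, 0, 0]
bipolar_b = target_vector("B", bipolar=True)   # [-1, 1, -1, -1, -1, -1, -1]

placeholder_grid = ["#.#.#.#"] * 9             # 9 rows of 7 pixels, not a real letter
inputs = pixels_to_inputs(placeholder_grid)    # 63 values
```

Each of the 7 perceptrons is then trained on the same 63 inputs, using its own component of the target vector as the desired response.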
Learning Rate
The learning rate has to be chosen
appropriately:
A small value will make the learning process
extremely slow.
A large value will result in fast learning, but the
learning process may not converge.
Least Mean Square (LMS) Learning and ADALINE
The least-mean-square (LMS) algorithm was the first linear
adaptive-filtering algorithm for solving problems such
as prediction and communication-channel equalization.
LMS finds a desired filter by computing the filter coefficients that produce the least mean square of the error signal (the difference between the desired and the actual signal).
It was invented in 1960 by Stanford University professor Bernard Widrow and his first Ph.D. student, Ted Hoff.
Adaptive Filter
An adaptive filter is a system with a linear
filter that has a transfer function controlled by
variable parameters and a means to adjust those parameters according to an
optimization (adaptive) algorithm.
Adaptive Filter
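[Figure: adaptive filter block diagram. The input x(n) feeds the Linear Filter, which outputs y(n); y(n) is subtracted from the desired signal d(n) to form the error e(n), which drives the Adaptive Algorithm.]
This system can easily be modeled using a simple
neuron (McCulloch-Pitts model)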
ADALINE
The Adaptive Linear Neuron (ADALINE), introduced by Widrow and Hoff (1960), is an implementation of an
adaptive filter.
ADALINE networks are similar to the perceptron, but their transfer function is linear (f(u) = u) rather than hard-limiting (i.e., signum).
This allows their outputs to take on any value, whereas the
perceptron output is limited to either 0 or 1 (or -1 or 1).
Like the perceptron, ADALINE is built around the McCulloch-Pitts model of a neuron.
ADALINE
Activation function = Linear function
ADALINE
The ADALINE is trained using the least-mean-square (LMS) or Widrow-Hoff rule
LMS Learning Rule
The LMS learning rule is similar to perceptron
learning, except for the weight update rule.
The LMS rule adjusts the weights to reduce
the difference (error) between the net input (induced
local field) and the desired output
This is because the activation function is linear
LMS Learning Rule
Learning algorithm: similar to Perceptron
learning except for the weight update rule,
w(n + 1) = w(n) + η[d(n) - y(n)]x(n)
where y(n) = w^T(n)x(n)
LMS Convergence (Stopping Condition)
LMS stops when the mean-square error (MSE)
is less than a certain threshold value,
where the error is e(n) = d(n) - y(n) and the
MSE is the mean of the squared errors over the training samples
ADALINE LMS Algorithm
Step 0 Initialize weights. Set learning rate η.
Step 1 While stopping condition is false, do Steps 2-6.
Step 2 For each training pair s : d, do Steps 3-5.
Step 3 Set activations of input units, i = 1, ..., n: xi(n) = si.
Step 4 Compute net input to output unit: y(n) = w^T(n)x(n)
Step 5 Update bias and weights, i = 1, ..., n: w(n + 1) = w(n) + η[d(n) - y(n)]x(n)
Step 6 Test for stopping condition: if the mean-square error is less than a threshold value, then stop; otherwise go to Step 2 and continue.
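A minimal Python sketch of the ADALINE LMS steps. The names `net_input` and `train_adaline`, the learning rate 0.1, the threshold 1e-3, the epoch limit, and the toy data (targets generated from d = 1 + 2*x1 - x2) are assumptions chosen only to make the example runnable.

```python
def net_input(w, inputs):
    """Linear output y = w^T x, with x = [1, x1, ..., xn] so that w[0] is the bias."""
    x = [1.0] + list(inputs)
    return sum(wi * xi for wi, xi in zip(w, x))

def train_adaline(samples, eta=0.1, threshold=1e-3, max_epochs=1000):
    """samples: list of (inputs, desired) pairs. Returns (w, final MSE)."""
    n = len(samples[0][0])
    w = [0.0] * (n + 1)                              # Step 0: initialize weights
    mse = float("inf")
    for _ in range(max_epochs):                      # Step 1
        for inputs, d in samples:                    # Step 2
            x = [1.0] + list(inputs)                 # Step 3: set input units
            y = net_input(w, inputs)                 # Step 4: net input
            # Step 5: w(n+1) = w(n) + eta * [d(n) - y(n)] * x(n)
            w = [wi + eta * (d - y) * xi for wi, xi in zip(w, x)]
        # Step 6: mean-square error over the training set
        mse = sum((d - net_input(w, inputs)) ** 2 for inputs, d in samples) / len(samples)
        if mse < threshold:
            break
    return w, mse

# Toy targets from the linear rule d = 1 + 2*x1 - x2 (illustrative only).
samples = [((0, 0), 1.0), ((1, 0), 3.0), ((0, 1), 0.0), ((1, 1), 2.0)]
w, mse = train_adaline(samples)
```

Unlike the perceptron update, the error here is measured on the linear output, so the weights keep changing slightly even for samples that a thresholded output would already classify correctly.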
MADALINE
Extension of ADALINE
MADALINE (Many ADALINEs) is a three-layer
(input, hidden, output), fully connected, feed-forward
artificial neural network architecture
for classification that uses ADALINE units in its
hidden and output layers