Unit-I & II Slides



    Neural Networks

    Unit I & II

    (of a Total of VIII units)

    K Raghu Nathan

    Retd Dy Controller (R&D)


    Topics covered in this Unit

    Biological Neural Networks

    Computers & Biological Neural Networks

Models of Neuron [Artificial Neurons]

    ANN Terminology

    Artificial Neural Networks

    Historical Development of NN Principles


    Topics covered [contd]

    ANN Topologies

    ANN Functional Usage

Pattern Recognition Tasks

    Learning in ANNs

    Basic Learning Laws


    Biological Neural Networks

    Nervous System

    Complex system of interconnected nerves

    Made up of Nerve Cells called Neurons

Neurons receive & transmit information between various parts/organs of the body

    Types: Sensory (Receptor) Neuron, Motor Neuron, Inter-Neuron

    Transmission of signal is a complex electro-chemical process


    The Biological Neuron


    Biological Neuron

Cell Body [Soma]: has a Nucleus

    Dendrites

    Fiber-like; large in number; branched structure

    Receive signals from other neurons

    Axon: one per neuron; longer & thicker; branched at its end

    Transmits signals to other neurons

    Contains Vesicles, which hold chemical substances called neurotransmitters


    Biological Neuron [contd]

Synapse [Synaptic Cleft / Synaptic Gap]

    Junction of axon & dendrites

    Pre-synaptic neuron: the transmitting neuron

    Post-synaptic neuron: the receiving neuron


    The Synapse


    Neuron Signals

    Complex electro-chemical process

Incoming signals raise or lower the electrical potential inside the neuron

    If the potential crosses a threshold, a short electrical pulse is produced

    We say the neuron fires [is triggered or activated]

    The pulse is sent down the axon

    This is the electrical activity inside the neurons

    Chemical activity occurs at the synapses

    Vesicles in the axon release chemical substances, called neurotransmitters

    These are collected by dendrites of the receiving neuron

    This raises/lowers the electric potential in the receiving neuron


    Neuron Signals

Each neuron receives a lot of input signals through its dendrites

    from many other neurons

    Sends an output signal through its axon to many other neurons

    Output depends on all inputs

    Cell body acts like a summing & processing device

    The processing depends on the type of neuron


    Characteristics of Biological NN

Robustness & Fault Tolerance

    Decay of nerve cells does not seem to affect performance significantly

    Flexibility

    Automatically adjusts to new environment

Ability to deal with wide variety of situations

    Uncertain, Vague, Inconsistent, Noisy

    Collective Computation

    Massively Parallel

    Distributed


Computers vs Biological Neural Networks

    Aspect | Computer | Biological NN

    Speed | Numeric: faster; Patterns: slower | Numeric: slower; Patterns: faster

    Processing | Sequential | Massively Parallel

    Size & Complexity | Less complex | Very complex

    Storage | In memory locations; addressable; fixed capacity; new info overwrites old info | In the strengths of the interconnections; adaptable size, to add new info

    Fault Tolerance | No | Yes

    Control Mechanism | Centralized | Distributed


    Artificial Neuron - Neuron Models

    Mathematical Models of Neuron

    M & P model

Perceptron

    Adaline

    Madaline

    Neocognitron


    McCulloch & Pitts Model

[Diagram] Inputs a1, a2, ..., ai, ..., an pass through weights w1, w2, ..., wi, ..., wn into a summing part, followed by an output part.

    Activation value: x = Σ ai·wi + b   [b = bias]

    Output signal: s = f(x)   [f = output function]
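As an illustration (not part of the original slides), here is a minimal Python sketch of the M&P summing-and-threshold computation; the names weights, bias and threshold are assumptions chosen for readability:

```python
def mcculloch_pitts(inputs, weights, bias=0.0, threshold=0.0):
    """McCulloch & Pitts unit: activation x = sum(a_i * w_i) + b,
    output s = 1 if x >= threshold else 0 (binary output function)."""
    x = sum(a * w for a, w in zip(inputs, weights)) + bias   # activation value
    return 1 if x >= threshold else 0                        # output signal s

# Example: two inputs with equal positive weights behave like a 2-input AND
print(mcculloch_pitts([1, 1], [1, 1], bias=0.0, threshold=2))  # -> 1
print(mcculloch_pitts([1, 0], [1, 1], bias=0.0, threshold=2))  # -> 0
```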


    Output Function

Binary: s = 1 if x >= t; else s = 0   (t = threshold)

    Linear: s = k·x

    [Plots of the binary and linear output functions omitted]


Ramp: s rises linearly with x between its lower and upper limits and saturates outside them [plot only in the original]

    Sigmoid: s = 1/(1 + e^(-x))
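A short Python sketch of the four output functions named above; the saturation limits 0 and 1 used for the ramp are an assumption, since only a plot appears in the original:

```python
import math

def binary(x, t=0.0):
    """Binary (threshold) output: s = 1 if x >= t, else 0."""
    return 1.0 if x >= t else 0.0

def linear(x, k=1.0):
    """Linear output: s = k * x."""
    return k * x

def ramp(x, lower=0.0, upper=1.0):
    """Ramp output: linear between the limits, saturating outside them."""
    return max(lower, min(upper, x))

def sigmoid(x):
    """Sigmoid output: s = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

for f in (binary, linear, ramp, sigmoid):
    print(f.__name__, [round(f(x), 3) for x in (-2.0, 0.0, 0.5, 2.0)])
```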


    NOR gate, using the M&P model

[Diagram] Inputs a1 and a2 feed the unit through weights -1 and -1; the output is s. [For the table below to hold, a bias of +1 and a threshold of t = 1 are implied.]

    a1  a2    x    s
     0   0    1    1
     0   1    0    0
     1   0    0    0
     1   1   -1    0
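A quick check of the table above in Python: with weights -1, -1, an assumed bias of +1 and threshold t = 1 (the bias and threshold are inferred from the x column, not stated on the slide), the M&P unit reproduces the NOR truth table:

```python
def mp_nor(a1, a2, bias=1, threshold=1):
    """M&P unit with weights -1, -1: x = bias - a1 - a2, s = 1 if x >= threshold."""
    x = bias - a1 - a2
    return x, 1 if x >= threshold else 0

for a1, a2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    x, s = mp_nor(a1, a2)
    print(a1, a2, x, s)   # matches the a1 a2 x s rows above
```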


    Perceptron

Inputs are first processed by Association Units

    Weights are adjustable, to enable Learning

    Actual output is compared with desired output; the difference is Error

    Error is used to adjust the weights, to obtain desired output


    Perceptron (contd)

[Diagram] Sensory units feed association units a1, a2, a3; these pass through weights w1, w2, w3 into the summing unit, which forms x = Σ ai·wi + b; the output function gives s = f(x).


    Perceptron (contd)

Expected output = s'

    Actual output = s

    Error: δ = s' - s

    Weight change: Δwi = η·δ·ai

    η is the Learning Rate parameter
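A minimal Python sketch of this update rule; training on the AND function is just an illustrative assumption, and the learning rate, bias handling and epoch count are choices made for the sketch:

```python
def train_perceptron(samples, eta=0.1, epochs=20):
    """Perceptron learning: delta = desired - actual, dw_i = eta * delta * a_i."""
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for a, desired in samples:
            actual = 1 if sum(ai * wi for ai, wi in zip(a, w)) + b >= 0 else 0
            delta = desired - actual            # error
            w = [wi + eta * delta * ai for wi, ai in zip(w, a)]
            b += eta * delta                    # bias treated as a weight on a constant input
    return w, b

# AND function as the desired input-output pairs
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
print(train_perceptron(and_samples))
```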


    Perceptron Learning

    Perceptron Learning Rule

    Procedure for adjusting the weights

If the weight adjustments lead to zero error, we say it converges

    Whether the error reduces to 0 depends on the nature of the desired input-output pairs of data

    Perceptron Convergence Theorem: to determine whether the desired input-output pairs are representable [achievable]


    Adaline

Adaline = Adaptive Linear Element

    Similar to Perceptron; the difference is:

    Employs Linear Output Function (s = x)

    Weight update rule minimises the mean squared error, averaged over all inputs

    Hence known as the LMS (Least Mean Squared) Error Learning Rule

    Also known as: Gradient Descent Algorithm
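A sketch of the Adaline/LMS update for comparison with the perceptron rule above; here the error is computed from the linear output s = x itself (the training data and learning rate are assumptions for illustration):

```python
def train_adaline(samples, eta=0.05, epochs=50):
    """LMS / gradient-descent rule: dw_i = eta * (desired - x) * a_i, with linear output s = x."""
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for a, desired in samples:
            x = sum(ai * wi for ai, wi in zip(a, w)) + b   # linear output s = x
            err = desired - x
            w = [wi + eta * err * ai for wi, ai in zip(w, a)]
            b += eta * err
    return w, b

samples = [((0, 0), 0.0), ((0, 1), 1.0), ((1, 0), 1.0), ((1, 1), 2.0)]  # target: a1 + a2
print(train_adaline(samples))
```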


    Terminology

    Processing Unit

    Summing part, output part

    Inputs, weights, bias, activation value

Output function, output signal

    Interconnections - various Topologies

    Operations

    Activation Dynamics, Learning Laws

    Update - Synchronous, Asynchronous


    Artificial Neural Networks

It is possible to create models of the biological neurons as processing units and link them to form closely interconnected networks

    Models may be electronic / software

    Such networks are called Artificial Neural Networks [ANN]


    ANN

ANNs exhibit abilities surprisingly similar to Biological NNs

    They can Learn, Recognize, Remember, Match & Retrieve Patterns of Information

    Hardware implementations of ANN are also available nowadays

    Costly but faster than software implementation


    Historical Development of ANN

1943 - McCulloch & Pitts - Model of Neuron

    1949 - Hebb - Hebbian Learning Law

    1958 - Rosenblatt - Perceptron

    1960 - Widrow & Hoff - Adaptive Linear Element [Adaline] & Least Mean Squared [LMS] Error Learning Law

    1969 - Minsky & Papert - Multilayer Perceptron

    1971 - Kohonen - Associative Memory

    1971 - Willshaw - Self-Organization

    1974 - Werbos - Error Backpropagation


    Historical Development of ANN [contd]

1976 - Grossberg - Adaptive Resonance Theory [ART]

    1980 - Fukushima - Neocognitron

    1982 - Hopfield - Energy Analysis

    1985 - Sejnowski - Boltzmann Machine

    1987 - Hecht-Nielsen - Counterpropagation [CPN]

    1988 - Kosko - Bidirectional Associative Memory [BAM]

    1988 - Broomhead - Radial Basis Function [RBF]


    Topology

    Topology is the physical organisation of the ANN

Arrangement of the processing units, interconnections & pattern input & output

    ANN is made up of Layers of Neurons

    All Neurons within one layer have the same activation dynamics & output function

    In addition to interlayer connections, intralayer connections may also be made

    Connections across the layers may be in feed-forward or feed-back manner


    Topology (contd)

    One Input layer, one output layer

zero or more intermediate layers (usually referred to as hidden layers)

    No limit on no. of layers

    There can be any no. of neurons in any layer; all layers need not have the same no. of neurons

    If there is no hidden layer, the ANN is called a single-layer network

    If one or more hidden layers are present, it is called a multi-layer network


    Topology (contd)

    Feedforward Networks

the units are connected such that data flows only in the forward direction, ie. from input layer to output layer, via successive hidden layers if any

    Feedback Networks

    data flows in the forward direction, as above

    in addition, the connections allow data flow from the output layer towards the input layer also

    the reverse flow (feedback) is for error-correction, for adjusting the weights suitably to get the desired output, which is an essential feature of the mechanism for NN Learning


    Single Layer FF Network

[Diagram: input layer connected directly to output layer]


    Multilayer Feedforward Network

[Diagram: input layer feeding one or more hidden layers, which feed the output layer]
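As an illustration of the forward data flow through such a multilayer feedforward network, a minimal sketch in Python; the layer sizes, random weights and the sigmoid output function are assumptions of the sketch, not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(x, layer_weights, layer_biases):
    """Data flows only forward: input layer -> hidden layers -> output layer."""
    s = x
    for W, b in zip(layer_weights, layer_biases):
        s = 1.0 / (1.0 + np.exp(-(W @ s + b)))   # sigmoid output function at every unit
    return s

sizes = [4, 3, 2]                                 # input -> hidden -> output
weights = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]
print(forward(np.array([1.0, 0.0, 0.5, -0.5]), weights, biases))
```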


    Feedback Network


    Neuronal Dynamics

Operation of NN is governed by Neuronal Dynamics

    Dynamics of activation state

    Dynamics of synaptic weights

    Short Term Memory (STM) is modelled by the activation state of the NN

    Long Term Memory (LTM) corresponds to the encoded pattern of info in the synaptic weights


Applications of Artificial Neural Networks

    [Diagram: application areas arranged around a central node labelled "Artificial Intellect with Neural Networks"]

    Intelligent Control

    Technical Diagnostics

    Intelligent Data Analysis and Signal Processing

    Advanced Robotics

    Machine Vision

    Image & Pattern Recognition

    Intelligent Security Systems

    Intelligent Medicine Devices

    Intelligent Expert Systems


    Major Areas of Usage

    Pattern Recognition Tasks

    These tasks necessarily involve Learning

    Memory

    Information Retrieval


    Patterns

    Computers deal with Data

    Humans deal with Patterns

Objects/Images, voices/sounds, even actions [walking etc] have patterns

    Different images, sounds & actions have different patterns

    Patterns enable us to recognise, classify & identify objects & to take decisions based on such identification


    Pattern Recognition Tasks

    Pattern Association

    Pattern Classification

    Pattern Mapping

    Pattern Clustering (aka Pattern Grouping)

    Feature Mapping


    Pattern Association

Every input pattern is associated with an output pattern, to form a pair of input-output patterns

    There will be many such pairs of input-output patterns

    A well-designed ANN can be trained to learn (remember) many such pairs of patterns

    Whenever a pattern is input, the ANN should retrieve (output) the corresponding output pattern

    Supervised Learning has to be employed [being taught]

    This is purely a memory function & is called an auto-association task


    Pattern Association (contd)

Desirable: even if the input pattern is incomplete or noisy [ie. contains some errors], we should get the correct output pattern

    Among the various input patterns in its memory, the ANN should select the one pattern which is closest to the test input & the corresponding output pattern should be output by the ANN

    This needs content-addressable memory & the process is called accretive behaviour

    Example of a Pattern Association task: OCR of printed characters


    Pattern Classification

Objects belonging to the same class have many common features/patterns

    This fact enables us to classify objects into classes & to identify new classes

    Supervised Learning - the patterns for each class have to be taught to the system

    Pattern classification tasks must exhibit accretive behaviour, ie. an incomplete or noisy input should produce an output corresponding to its closest known input pattern

    Examples of Pattern Classification tasks: Voice Recognition, Handwriting Recognition


    Pattern Mapping

Capturing the relation between the input pattern & its corresponding output pattern

    This is a generalisation task, not mere memorising

    This is called interpolative behaviour

    Example of a Pattern Mapping task: Speech Recognition


    Pattern Clustering

Identifying subsets of patterns having similar distinctive features & grouping them together

    Sounds similar to Pattern Classification, but is not the same

    Has to employ Unsupervised Learning


Classification

    Patterns for each class are input separately

    That is, the system is trained to learn the patterns of one class first

    Then it is taught the patterns of another class

    Clustering

    Patterns belonging to several groups are mixed in the set of inputs

    The system has to resolve them into different groups


    Feature Mapping

In several patterns, the features may not be unambiguous

    May vary over a time-period

    Therefore, difficult to cluster

    In this case, the system learns a feature map, rather than clustering or classifying

    Has to employ unsupervised learning

    Example: you see a new object for the first time - never seen before - & it has some distinct features, as well as some features common to many known classes or groups


    Pattern Recognition Problem

In any pattern recognition task, we have a set of input patterns & a set of desired output patterns

    Depending on the nature of the desired output patterns & the nature of the task environment, the problem would be one of the following three types:

    Pattern Association Problem

    Pattern Classification Problem

    Pattern Mapping Problem


    Pattern Association Problem

    Problem: to design an ANN

Input-output pairs are (a1,b1), (a2,b2), (a3,b3), ..., (aL,bL)

    al = (al1, al2, ..., alM) & bl = (bl1, bl2, ..., blN) are vectors of dimensions M & N

    The ANN should associate the input patterns with the corresponding output patterns


    Pattern Association Problem (contd)

If al & bl are distinct, the problem is hetero-associative

    If al = bl, it is auto-associative; al = bl means M = N, and the input & output patterns both refer to the same point in an N-dimensional space

    Storing the association of the pairs of input & output patterns = deciding the weights in the network, by applying the operations of the network on the input pattern
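One classical way to "decide the weights" from the input-output pairs is a linear associator whose weight matrix is a sum of outer products of the pairs; this is only a hedged illustration of the idea, with small bipolar example patterns chosen for the sketch, not the specific network of these slides:

```python
import numpy as np

def store_pairs(pairs):
    """Weights as a sum of outer products b_l a_l^T over the training pairs."""
    M, N = len(pairs[0][0]), len(pairs[0][1])
    W = np.zeros((N, M))
    for a, b in pairs:
        W += np.outer(b, a)
    return W

def recall(W, a):
    """Recall the stored output pattern for a (possibly noisy) input a."""
    return np.sign(W @ a)

# Two bipolar pattern pairs (a_l, b_l) with M = 4 and N = 2
pairs = [(np.array([1, -1, 1, -1]), np.array([1, -1])),
         (np.array([-1, -1, 1, 1]), np.array([-1, 1]))]
W = store_pairs(pairs)
print(recall(W, np.array([1, 1, 1, -1])))   # noisy first input (second bit flipped) -> [1, -1]
```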


    Pattern Association Problem (contd)

If a given input pattern = same as what was used for training the network, the output pattern = same as what was used during training

    If the input pattern is slightly different (incomplete or noisy), the output may also be different

    If the actual input is a = al + ε [ε = noise vector]:

    If the output is bl [as desired], the NW is showing accretive behaviour

    If the output is b = bl + δ, and δ → 0 as ε → 0, the NW is showing interpolative behaviour


    Basic Functional Units

Basic functional unit = the simplest form of the 3 types of NN, viz. FF, FB & Combination NWs

    Simplest FF NN is a single-layer NW


Simplest FB NN has N units, each connected to all others & to itself


Simplest Combination of FF & FB NW [aka Competitive Learning NW] is a single-layer NW in which the units in the output layer have feedback connections among themselves


    Types of ANN & their suitable tasks

FF NN - Pattern Association, Classification & Mapping

    FB NN - Auto-Association, Pattern Storage (LTM), Pattern Environment Storage (LTM)

    FF & FB (CL) NN - Pattern Storage (STM), Clustering & Feature Mapping


    FF NN Pattern Association

[Diagram: input patterns a1...a6 mapped to output patterns b1...b4]

    For input pattern ai, the corresponding output pattern is bi.

    a5 & a6 are noisy versions of a3.

    In a5 the noise is less; it is nearest to a3, so the NW outputs b3 [desired]; this is accretive.

    In a6 the noise is more; it is nearer to a4 than a3, so the NW may output b4.


Real-Life Example

    [Diagram: pixel-grid images of the letters A, B, ..., Z]

    A → 1000001
    B → 1000010
    ...
    Z → 1011010

    Inputs are 8x8 grids of pixels of binary values.

    The input pattern space is therefore a binary 64-dimensional space.

    Outputs are 7-bit binary numbers (the 7-bit ASCII codes shown above).

    The output pattern space is a binary 7-dimensional space.

    Noisy versions of the input patterns can occur, when the values of some pixels get changed, due to noise in the transmission channel or dust/stain spots on the document being scanned.


    FF NN Pattern Classification

    Some of the output patterns may be identical

So, a set of input patterns may correspond to the same output pattern

    Each distinct output pattern = a class label

    Input patterns corresponding to each class = samples of that class

    In such cases, the NN has to classify the input patterns

    That is: for each input pattern, the NN should identify the class [output pattern] to which it belongs


    Real-Life Example

[Diagram: several variants of the letters A and B (samples of the two classes) all map to the class outputs A → 1000001 and B → 1000010]


    CL NN Pattern Classification

    Accretive behaviour


    FF NN Pattern Mapping

The NN is trained with some pairs of input-output patterns, not all possible pairs

    When a new input pattern is given, the NN is made to find the corresponding output pattern [though the NN was not trained with this pair]

    Suppose the NN has been trained with the i/o pairs an & bn

    If the new input pattern am is closer to some known input pattern an, the NN tries to find an output pattern bm which is closer to bn

    Interpolative behaviour


    Pattern Mapping Action

[Diagram: input patterns a1...a6 mapped to output patterns b1...b6]

    NN trained with (a1,b1) to (a5,b5) only; not trained with the (a6,b6) pair.

    a6 is closer to a3; so the NN maps it on to b6, which is closer to b3.


    FB NN Pattern Association

If the input patterns are identical to the output patterns, the input & output spaces are identical

    The problem reduces to auto-association

    Trivial; the NW merely stores the input patterns

    If a noisy pattern arrives at the input, the NW outputs the same noisy pattern as output

    Absence of accretive behaviour


FB NN Pattern Association (contd)

    [Diagram: the patterns a1...a5 appear as both the inputs and the outputs (auto-association)]


    FB NN Pattern Storage (LTM)

    Auto-association with accretive behaviour

Input patterns are stored; stored patterns can be retrieved by a noisy/approximate input pattern also

    Very useful in practice

    Two possibilities:

    Stored patterns = same as the input patterns; the input pattern space is continuous; the output pattern space is a fixed, finite set of patterns that are stored

    Stored patterns = some transformed versions of the input patterns; the output space has the same dimensions as the input space


FB NN Pattern Storage (contd)


    FB NN Pattern Environment Storage

Pattern Environment = a set of patterns + the probabilities of their occurrence

    The NW is designed to recall the patterns with the lowest probability of error

    More about this in Unit-VII


    CL NN Pattern Storage (STM)

    STM = short term memory = temporary storage

The given input [as it is, or a transformed version] is stored

    As long as the same pattern is input, the stored pattern is recalled

    When a new pattern is input, the stored pattern is lost & the new pattern is stored

    Such a NW can be studied out of academic interest only - it is not of practical use

    CL NN Pattern Clustering


    Patterns are grouped, based on similarities

Input is an individual pattern; the output is the pattern of the group to which the input belongs

    That is: a group of approximately similar patterns is identified with one & the same cluster label & will produce the same output pattern

    Two types possible:

    A new input pattern, not belonging to any group, is forced into one of the existing groups (Accretive behaviour)

    A new input pattern is shown as belonging to a new group; if the input is close to some known input pattern x, the new group is close to x's group (Interpolative behaviour)


CL NN Pattern Clustering (contd)

    Interpolative behaviour


    CL NN Feature Mapping

    Similar to clustering; difference is:

Similar inputs produce similar outputs [not the same output]

    The similarities of the inputs are retained at the output

    No accretive behaviour; only interpolative

    Output patterns are much larger [than for clustering]


Types of Learning (contd)

    Reinforcement Learning

    Bridges the gap between supervised & unsupervised methods

    Output is not known

    System receives feedback from the environment

    Reward for correctness

    Punishment for error

    System adapts its parameters based on this feedback


    Learning Equation

    Implementation of Synaptic Dynamics

Expression for the updating of weights

    Expresses the weight vector of the ith processing unit at time instant t+1, in terms of that weight vector at time instant t:

    wi(t+1) = wi(t) + Δwi(t), where Δwi(t) is the change in the weight vector

    Different researchers have proposed different expressions for calculating Δwi(t); these are called Learning Laws
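Read as code, the equation above is a generic training loop in which only the rule for Δwi(t) changes from law to law; a small Python sketch, where the Hebbian-style rule plugged in is just an example of one such law:

```python
def train(w, inputs, delta_rule, steps=10):
    """w(t+1) = w(t) + dw(t); the learning law supplies dw(t)."""
    for _ in range(steps):
        for a in inputs:
            dw = delta_rule(w, a)
            w = [wi + dwi for wi, dwi in zip(w, dw)]
    return w

def hebbian_rule(w, a, eta=0.01):
    """Example law: dw_j = eta * s * a_j with s = w^T a."""
    s = sum(wj * aj for wj, aj in zip(w, a))
    return [eta * s * aj for aj in a]

print(train([0.1, 0.2], [(1.0, 0.5), (0.5, 1.0)], hebbian_rule))
```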


    Learning Laws

Hebb's Law [Hebbian Learning Law]

    Perceptron Learning Law

    Delta Learning Law

    LMS Learning Law

    Correlation Learning Law

Instar [winner-take-all] Learning Law

    Outstar Learning Law


    Boltzmann Learning

    Stochastic Learning Algorithm

A Network designed to apply the Boltzmann Learning Rule is called a Boltzmann Machine

    The neurons constitute a recurrent structure & give binary output [+1 or -1], corresponding to whether the neuron is on or off


    Memory-based Learning

Past experiences = patterns which the NN has been trained to recognise/classify

    Each experience is a pair of input & output patterns

    All or most of the past experiences are stored in a large memory

    Any new input pattern can be compared with the patterns stored in memory & the corresponding output pattern can be output


    Memory-based Learning (contd)

Memory-based learning algorithms involve 2 essential ingredients:

    Criteria applied to define the local neighbourhood [patterns which are similar]

    Learning rule applied for training the NN

    Algorithms will differ based on how these 2 ingredients are defined
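As a hedged sketch of these two ingredients in Python, a nearest-neighbour style recall where the "local neighbourhood" criterion is Euclidean distance and the "learning rule" is simply storing every experience; both choices are assumptions for illustration, not the slides' specific algorithm:

```python
import math

memory = []   # list of stored (input_pattern, output_pattern) experiences

def learn(input_pattern, output_pattern):
    """Learning rule: simply store the experience in the large memory."""
    memory.append((list(input_pattern), output_pattern))

def recall(query):
    """Neighbourhood criterion: the stored pattern closest in Euclidean distance."""
    stored, output = min(memory, key=lambda exp: math.dist(exp[0], query))
    return output

learn([0.0, 0.0], "class-1")
learn([1.0, 1.0], "class-2")
print(recall([0.9, 0.8]))   # -> "class-2"
```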


    Summary of Learning Laws

See Table 1.2 on page 35 of Yegnanarayana's book.

    Columns: Learning Law | Weight Update (Δwij) Formula | Initial Weights | Type of Learning | Remarks


(for j = 1, 2, ..., M)

    Hebbian | Δwij = η·si·aj | Near zero | Unsupervised |

    Perceptron | Δwij = η·(bi - si)·aj | Random | Supervised | Bipolar output functions

    Delta | Δwij = η·(bi - si)·f'(xi)·aj | Random | Supervised |

    Widrow-Hoff (LMS) | Δwij = η·(bi - wi^T a)·aj | Random | Supervised |

    Correlation | Δwij = η·bi·aj | Near zero | Supervised |

    Winner-Take-All (Instar) | Δwkj = η·(aj - wkj) | Random, but normalised | Unsupervised | Competitive Learning; k is the winning unit

    Outstar | Δwjk = η·(bj - wjk) | Zero | Supervised | Grossberg Learning
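The update formulas in the table, written out as small Python functions for a single unit i; this is a sketch only, and the sigmoid output function f, its derivative used by the delta rule, the default η and the vector helpers are assumptions of the sketch rather than part of the table:

```python
import numpy as np

def f(x):                      # assumed sigmoid output function
    return 1.0 / (1.0 + np.exp(-x))

def hebbian(w, a, eta=0.1):
    return eta * f(w @ a) * a                        # dw = eta * s_i * a

def perceptron(w, a, b_i, eta=0.1):
    s_i = 1.0 if w @ a >= 0 else -1.0                # bipolar output
    return eta * (b_i - s_i) * a

def delta(w, a, b_i, eta=0.1):
    x = w @ a
    return eta * (b_i - f(x)) * f(x) * (1 - f(x)) * a   # f'(x) for the sigmoid

def widrow_hoff(w, a, b_i, eta=0.1):
    return eta * (b_i - w @ a) * a                   # LMS

def correlation(w, a, b_i, eta=0.1):
    return eta * b_i * a

def instar(w_k, a, eta=0.1):
    return eta * (a - w_k)                           # winner-take-all, k = winning unit

def outstar(w_col, b, eta=0.1):
    return eta * (b - w_col)                         # Grossberg learning

a = np.array([1.0, -1.0, 0.5]); w = np.zeros(3)
print(hebbian(w, a), widrow_hoff(w, a, b_i=1.0))
```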


    End of Units I & II