8/18/2019 09 Artificial Neural Networks and Classification
1/43
Artificial Neural Networks and Classification

An artificial neural network is a simple brain-like device that can learn by adjusting connections between its neurons
The brain as a computer
The brain’s architecture
Human (and animal) brains have a 'computer' architecture which consists of a complex web of about 10^11 highly inter-connected processing units called neurons
Processing involves signals being sent from neuron to neuron by complicated electrochemical reactions in a highly parallel manner
The neuron

A neuron is a nerve cell consisting of
- a cell body (soma) containing a nucleus
- a number of fibres called dendrites branching out from the body
- a single long fibre called the axon, a centimetre or longer
The axon branches and connects to the dendrites of other neurons
- the connecting junction is called the synapse
- each neuron connects to between a dozen and 100,000 other neurons
A real neuron
Signal propagation

Chemical transmitter substances are released from the synapses and enter the dendrites
This raises or lowers the electrical potential of the cell body
- synapses that raise potential are called excitatory; those that lower it are inhibitory
When a threshold is reached, an electrical pulse, the action potential, is sent down the axon (firing)
This spreads into the axon's branches, reaching synapses and releasing transmitters into the cell bodies of other neurons
Brain versus computer

Storage capacity
- the brain has more neurons than a computer has bits
Speed
- the brain is much slower than a computer: a neuron has a firing speed of 10^-3 secs compared to a computer switching speed of 10^-11 secs
- the brain relies on massive parallelism for performance: you can recognise your mother in 0.1 secs
The brain is more suited to intelligence processing and learning
- it is good at forming associations; this seems to be the basis of learning
- it is more fault tolerant: neurons die all the time and computation continues
- task performance exhibits graceful degradation, in contrast to the brittleness of computers
Artificial neural networks
What is an artificial neural network?

An artificial neural network (ANN) is a grossly oversimplified version of the brain's architecture
- It has far fewer 'neurons': several hundred or thousand
- It has a much simpler internal structure
- The firing mechanism is less complex
- The signals consist of real numbers passed from one neuron to another
How does a network behave?

Most ANNs can be regarded as input-output devices
- numerical input is propagated through the network from neuron to neuron till it reaches the output
The connections between neurons have numerical weights which are used to combine the signals reaching a neuron
Learning involves establishing the weight values (strengths) to achieve a particular goal
- In theory the strengths could be programmed rather than learnt, but for the most part this would be impossibly tedious
Designing a network

Creating an ANN requires the following to be specified:
Network topology
- the number of units
- the pattern of interconnectivity amongst them
- the mathematical type of the weights
Transfer function
- this combines the inputs impinging on the unit and produces the unit activation level, which then becomes the output signal
Representation for examples
Learning law
- this states how weights are to be modified to achieve the learning goal
Network topology - neurons and layers

Specifies how many nodes (neurons) there are and how they are connected
- in a fully connected network each node is connected to every other
Often networks are organised in layers (slabs) with no connections between nodes in a layer - only across
- The first layer is the input layer; the last, the output layer
- Layers between the input and output layers are called hidden
The input units typically do not carry out internal computation, ie do not have transfer functions; they merely pass on their signal values
The output units send their signal directly to the outside world
Network topology - weights

Weights are usually real-valued
At the start of learning, their values are often set randomly
If there is a connection from a to b then a has influence over the activation value of b
Excitatory influence
- high activation in unit a contributes to high activation in unit b
- is modelled by a positive weight
Inhibitory influence
- high activation in unit a contributes to low activation in unit b
- is modelled by a negative weight
Network topology - flow of computation

Although connections are uni-directional, some networks have pairs of units connected in both directions
- there is a connection from unit a to unit b and one back from unit b to unit a
Networks in which there is no looping back of connections are called feed-forward
- signals are 'fed forward' from input through to output
Networks in which outputs are eventually fed back into the network as inputs are called recurrent
Examples of feed-forward topologies

[Figure: a single layer network with a 6-node input layer and a 2-node output layer; a two layer network with 1 hidden layer, consisting of a 4-node input layer, a 4-node hidden layer and a 1-node output layer]
The transfer function - combining input signals

The input signals to a neuron must be combined into a single value, the activation level to be output
Usually this transfer takes place in two stages
- first the inputs are combined
- and then passed through another function to produce the output
The most common method of combination is the weighted sum

  sum = w1 x1 + ... + wn xn

Here xi is the signal and wi is the weight on connection i, and n is the number of input signals
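In Python, the weighted sum can be sketched as follows (the function name is illustrative, not from the slides):

```python
# Weighted-sum combination of a neuron's input signals:
# sum = w1*x1 + ... + wn*xn
def weighted_sum(weights, signals):
    assert len(weights) == len(signals)
    return sum(w * x for w, x in zip(weights, signals))
```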
The transfer function - the activation level

The weighted sum is passed through an activation function to produce the output signal (activation level) y′
Commonly used functions are:
Linear
- The output is just the weighted sum
Linear threshold (step function)
- The weighted sum is thresholded at a value c: if it is less than c, then y′ = 0, otherwise y′ = 1
Sigmoid response (logistic) function
- a continuous version of the step function which produces graceful degradation around the 'step' at c

  y′ = 1 / (1 + e^-(sum - c))
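The three activation functions can be sketched as follows, applied to the weighted sum s with threshold c (function names are illustrative):

```python
import math

def linear(s):
    return s                      # output is just the weighted sum

def step(s, c):
    return 0 if s < c else 1      # linear threshold at c

def sigmoid(s, c):
    # logistic function: a smooth version of the step at c
    return 1.0 / (1.0 + math.exp(-(s - c)))
```

Note that the sigmoid outputs exactly 0.5 when the weighted sum equals the threshold c, and approaches 0 or 1 as the sum moves away from c.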
Activation function graphs

[Figure: graphs of the sigmoid, step and linear activation functions, each marked with the threshold c; the sigmoid and step rise from 0 to 1 around c]
Example

w1 = 0.3
w…
Learning with ANNs
What tasks can a network learn?

Networks can be trained for the following tasks:
- Classification
- Pattern association
  - eg English verbs mapped to their past tense
- Content addressable/associative memory
  - eg can recall/restore a whole image when provided with a part of it
These all involve mappings
- The mapping of input to output is determined by the settings of all the weights in the network (the weight vector) - this is what is learnt
- The network node configuration together with the weight vector is the knowledge structure
Learning laws

Learning provides a means of finding the weight settings to implement a mapping
- This is only possible if the network is capable of representing the mapping
- The more complex the mapping, the larger the network that will be required, including a greater number of hidden layers
Initially, weights are set at random and altered in response to the training data
A regime for weight alteration to achieve the required mapping is called a learning law
Even if a network can represent a mapping, a particular learning law may not be able to learn it
Representation of training examples

Unlike decision trees, which handle both discrete and continuous (numeric) attributes, ANNs can handle only the latter
All discrete attributes must be converted (encoded) to be numeric
- This also applies to the class
Several ways are available and the choice affects the success of learning
Description attributes

It is desirable for all attributes to have values in the same range
- This is usually taken to be 0 to 1
- Achieved for numeric attributes using normalisation:
  value → (value − min value) / (max value − min value)
For discrete attributes can use:
- 1-out-of-N encoding (distributed)
  - N binary (0-1) units used to represent the N values of the attribute, one for each
- local encoding
  - values mapped to numbers in range 0 to 1
  - more suited to ordered values
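The three encodings above can be sketched as follows (function names are illustrative; local encoding is assumed to space the ordered values evenly over 0 to 1):

```python
def normalise(value, min_v, max_v):
    # value -> (value - min) / (max - min), mapping the range to [0, 1]
    return (value - min_v) / (max_v - min_v)

def one_out_of_n(value, values):
    # distributed 1-out-of-N encoding: one binary unit per possible value
    return [1 if v == value else 0 for v in values]

def local_encoding(value, ordered_values):
    # map the i-th of N ordered values to i/(N-1), a number in [0, 1]
    i = ordered_values.index(value)
    return i / (len(ordered_values) - 1)
```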
Class attribute
1-out-of-N or local encoding can be used for the class
The network output after learning is usually only approximate
- eg in a binary class problem with classes represented by 0 and 1, the network might output 0.8 and this would be taken as '1'
Using 1-out-of-N encoding allows for a probabilistic interpretation, eg
- classes for car domain: unacc, acc, good, vgood
- can be represented with four binary units
- eg acc → (0, 1, 0, 0)
- Output of (0…
Network configuration

Encoding of training examples affects network size
Input layer will have:
- one unit for each numeric attribute
- one for each locally encoded discrete attribute
- 1 for each binary discrete attribute
- k for each distributed encoding of a discrete attribute, where the attribute has k > 2 values
Usually have a small number of hidden layers (one or two)
Pyramid structure

Hidden layers are used to reduce the dimensionality of the input
A network has a pyramid structure if:
- the first hidden layer has fewer nodes than the input layer
- each hidden layer has fewer than its predecessor
- the output layer has fewest
The pyramid structure facilitates learning
- In classification, each hidden layer appears to partially classify the examples until the actual classes are reached in the output layer
The learning process

Classification learning uses a feedback mechanism
An example is fed through the network using the existing weights
The output value is O; the correct output value, ie the class in the example, is T (target)
If O ≠ T, some or all of the weights are changed slightly
The extent of the change usually depends on T − O, called the error
The delta rule
A weight, wi, on a connection carrying signal, xi, can be modified by adding an amount Δwi proportional to the error:

  Δwi = η (T − O) xi

where η is the learning rate
- η is a positive constant, usually set at about 0.1 and gradually decreased during learning
The update formula for wi is then:

  wi ← wi + Δwi
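The delta rule update can be sketched as follows (the function name is illustrative):

```python
# Delta rule: w_i <- w_i + eta * (T - O) * x_i, applied to every weight
def delta_rule_update(weights, signals, target, output, eta=0.1):
    error = target - output          # T - O
    return [w + eta * error * x for w, x in zip(weights, signals)]
```

Note that a weight only moves when its input signal is non-zero, and the direction of the move is set by the sign of the error.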
Trainin$ epochs
For each example in the training set:
- the description attribute values are fed as input to the network and propagated through to the output
- each weight is updated
This constitutes one epoch or cycle of learning
The process is repeated till it is decided to stop
- Many thousands of epochs may be necessary
The final set of weights represents the learned mapping
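An epoch-based training loop for a single-layer, step-activation network using the delta rule can be sketched as follows (a sketch; names and the fixed epoch count are illustrative):

```python
def train_epochs(examples, weights, threshold=0.5, eta=0.1, epochs=100):
    for _ in range(epochs):                      # one pass = one epoch/cycle
        for signals, target in examples:
            s = sum(w * x for w, x in zip(weights, signals))
            output = 1 if s >= threshold else 0  # step activation
            error = target - output              # T - O
            weights = [w + eta * error * x for w, x in zip(weights, signals)]
    return weights
```

For instance, starting from zero weights this learns the (linearly separable) boolean OR mapping within a few epochs.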
Worked example - golf domain

Conversion of attributes:

Attribute      Values
Outlook        sunny, overcast, rain
Temperature    −50 to 100 °F
Humidity       low, normal, high
Windy          true, false
Class          yes, no

Encoded attributes:

Attribute      Encoding
Outlook        sunny → (1, 0, 0); overcast → (0, 1, 0); rain → (0, 0, 1)
Temperature    0 to 1: T ← (T + 50) / 150
Humidity       low → (1, 0, 0); normal → (0, 1, 0); high → (0, 0, 1)
Windy          true → 1; false → 0
Play golf      yes → 1; no → 0
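The conversion above can be sketched as a single encoding function (names are illustrative, and the −50 to 100 °F range is taken from the table above):

```python
OUTLOOKS = ['sunny', 'overcast', 'rain']
HUMIDITIES = ['low', 'normal', 'high']

def encode_golf_example(outlook, temp_f, humidity, windy):
    vec = [1 if v == outlook else 0 for v in OUTLOOKS]      # 1-out-of-3
    vec.append((temp_f + 50) / 150)                         # normalise -50..100 F
    vec += [1 if v == humidity else 0 for v in HUMIDITIES]  # 1-out-of-3
    vec.append(1 if windy else 0)                           # binary attribute
    return vec
```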
Network configuration

Use a single-layer network (no hidden units) with a step function to illustrate the delta rule
Initialise weights as shown
Set η = 0.1

[Figure: eight input units (Sunny, Overcast, Rain, Temperature, Low, Normal, High, Windy) plus a bias input fixed at −1, connected by weights w0 … w8 to a single output unit]

w0 = 0.2 (bias)   w1 = −0.5   w2 = 0.3
w3 = −0.4         w4 = 0.2    w5 = 0.1
w6 = −0.1         w7 = −0.3   w8 = 0.4
Feeding a training example

First example is (sunny, …
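A forward pass through this single-layer network can be sketched as follows. The bias weight is paired with a constant input of −1 as in the diagram; the weight values below are reconstructed from a damaged source and should be treated as illustrative:

```python
# One forward pass through the single-layer step network;
# weights[0] is the bias weight, paired with a constant input of -1.
def feed(weights, inputs, threshold=0.0):
    s = weights[0] * -1 + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1 if s >= threshold else 0

weights = [0.2, -0.5, 0.3, -0.4, 0.2, 0.1, -0.1, -0.3, 0.4]  # w0 (bias) .. w8
sunny_example = [1, 0, 0, 0.5, 0, 0, 1, 0]  # (sunny, 25 F, high humidity, not windy)
```

If the output disagrees with the example's class, every weight is then nudged by the delta rule before the next example is fed in.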
The backpropagation algorithm
Learning in multi-layered networks

Networks with one or more hidden layers are necessary to represent complex mappings
In such a network the basic delta learning law is insufficient
- It only defines how to update weights in output units (uses T − O)
To update hidden node weights, we have to define their error
- This is achieved by the backpropagation algorithm
The backpropagation process

Inputs are fed through the network in the usual way
- this is the forward pass
Output layer weights are adjusted based on errors
- … then weights in the previous layer are adjusted …
- … and so on back to the first layer
- this is the backwards pass (or backpropagation)
Errors determined in a layer are used to determine those in the previous layer
Illustrating the error contribution

A hidden node is partially 'credited' for errors in the next layer
- these errors are created in the forward pass

[Figure: a hidden node connected by weights w1 … wk to k nodes in the next layer, which have errors error1 … errork]

error contribution = w1 · error1 + … + wk · errork
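The error contribution above can be sketched directly (the function name is illustrative):

```python
# A hidden node's error contribution from the k nodes it feeds into:
# contribution = w1*error1 + ... + wk*errork
def hidden_error_contribution(outgoing_weights, next_layer_errors):
    return sum(w * e for w, e in zip(outgoing_weights, next_layer_errors))
```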
The backpropagation algorithm

A backpropagation network is a multi-layered feed-forward network using the sigmoid response activation function

Backpropagation algorithm:
1. Initialise all network weights to small random numbers (e.g. between −0.05 and 0.05)
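The full loop can be sketched for one hidden layer and a single sigmoid output unit. This is a minimal sketch assuming the standard pattern (forward pass, output error term, backpropagated hidden error terms, weight updates); all names are illustrative:

```python
import math
import random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def train_backprop(examples, n_in, n_hidden, eta=0.5, epochs=3000, seed=1):
    rng = random.Random(seed)
    # 1. initialise all weights to small random numbers
    w_h = [[rng.uniform(-0.05, 0.05) for _ in range(n_in + 1)]
           for _ in range(n_hidden)]
    w_o = [rng.uniform(-0.05, 0.05) for _ in range(n_hidden + 1)]
    for _ in range(epochs):
        for x, t in examples:
            # forward pass (each unit gets an extra constant bias input of 1)
            xb = x + [1.0]
            h = [sigmoid(sum(w * v for w, v in zip(ws, xb))) for ws in w_h]
            hb = h + [1.0]
            o = sigmoid(sum(w * v for w, v in zip(w_o, hb)))
            # backward pass: output error term, then hidden error terms
            d_o = o * (1 - o) * (t - o)
            d_h = [h[j] * (1 - h[j]) * w_o[j] * d_o for j in range(n_hidden)]
            # weight updates
            w_o = [w + eta * d_o * v for w, v in zip(w_o, hb)]
            w_h = [[w + eta * d_h[j] * v for w, v in zip(w_h[j], xb)]
                   for j in range(n_hidden)]
    return w_h, w_o

def predict(w_h, w_o, x):
    xb = x + [1.0]
    hb = [sigmoid(sum(w * v for w, v in zip(ws, xb))) for ws in w_h] + [1.0]
    return sigmoid(sum(w * v for w, v in zip(w_o, hb)))
```

After training on a simple mapping such as boolean AND, the output for positive examples ends up clearly higher than for negative ones.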
Termination conditions
Many thousands of iterations (epochs or cycles) may be necessary to learn a classification mapping
- The more complex the mapping to be learnt, the more cycles will be required
Several termination conditions are used:
- stop after a given number of epochs
- stop when the error on the training examples (or on a separate validation set) falls below some agreed level
Stopping too soon results in underfitting, too late in overfitting
Backpropagation as a search

Learning is a search for a network weight vector to implement the required mapping
The search is hill-climbing, or rather descending, called steepest gradient descent
- The heuristic used is the total of the (T − O)² errors
Problems with the search

The size of step is controlled by the learning rate parameter
- This must be tuned for individual problems
- If the step is too large, search becomes inefficient
The error surface tends to have extensive flat areas
- troughs with very little slope
- It can be difficult to reduce error in such regions
- Weights have to move large distances and it can be hard to determine the right direction
- High numerical accuracy is required
The trained network
After learning, backpropagation may be used as a classifier
- Descriptions of new examples are fed into the network and the class is read from the output layer
- For 1-out-of-N output representations, exact values of 0 and 1 will not usually be obtained
Sensitivity analysis (using test data) determines which attributes are most important for classification
- An attribute is regarded as important if small changes in its value affect the classification
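Sensitivity analysis can be sketched as a simple perturbation test: nudge one input at a time and measure how much the network output moves (names and the perturbation size are illustrative):

```python
# Score each attribute by how much a small change in its value
# changes the network's output, relative to the size of the change.
def sensitivity(predict_fn, example, delta=0.01):
    base = predict_fn(example)
    scores = []
    for i in range(len(example)):
        perturbed = list(example)
        perturbed[i] += delta        # nudge attribute i only
        scores.append(abs(predict_fn(perturbed) - base) / delta)
    return scores
```

Attributes with high scores are the ones the trained network is most sensitive to, and hence the ones regarded as important.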
Backpropagation versus ID3

These two algorithms are the giants of classification learning
Which is better?
- the jury is still out
There are major differences:
- ID3 favours discrete attributes, Backprop favours continuous (but each handles both types)
- Backprop handles noise well. By using pruning, so does ID3
- Backprop is much slower than ID3 and may get stuck
- ID3 tells us which attributes are important. Backprop does this (to some extent) with sensitivity analysis
- Backprop's learned knowledge structure (weight vector) is not understandable, whereas an ID3 tree can be comprehended (although this is difficult if the tree is large)