ECE 517: Reinforcement Learning in Artificial Intelligence
Lecture 13: Artificial Neural Networks – Introduction, Feedforward Neural Networks
Dr. Itamar Arel
College of Engineering
Electrical Engineering and Computer Science Department
The University of Tennessee
Fall 2012
October 30, 2012
ECE 517 - Reinforcement Learning
Final projects – logistics
Projects can be done individually or in pairs
Students are encouraged to propose a topic
Please email me your top three choices for a project, along with a preferred date for your presentation
Presentation dates: Nov. 27, 29 and Dec. 4
Format: 17 min presentation + 3 min Q&A
~7 min for background and motivation
~10 min for description of your work and conclusions
Written report due: Friday, Dec. 7
Format similar to project report
Final projects – topics
Tetris player using RL (and NN)
Curiosity-based TD learning*
States vs. Rewards in RL
Human reinforcement learning
Reinforcement Learning of Local Shape in the Game of Go
Where do rewards come from?
Efficient Skill Learning using Abstraction Selection
AIBO playing on a PC using RL*
AIBO learning to walk within a maze*
Study of value function definitions for TD learning*
Outline
Introduction
Brain vs. Computers
The Perceptron
Multilayer Perceptrons (MLP)
Feedforward Neural Networks and Backpropagation
Pigeons as art experts (Watanabe et al., 1995)
Experiment:
Pigeon was placed in a closed box
Present paintings of two different artists (e.g. Chagall / Van Gogh)
Reward for pecking when presented a particular artist (e.g. Van Gogh)
Pigeons were able to discriminate between Van Gogh and Chagall with 95% accuracy (when presented with pictures they had been trained on)
Pictures by different artists
Interesting results
Discrimination was still 85% successful for previously unseen paintings of the artists
Conclusions from the experiment:
Pigeons do not simply memorise the pictures
They can extract and recognise patterns (e.g. artistic 'style')
They generalise from the already seen to make predictions
This is what neural networks (biological and artificial) are good at (unlike conventional computers)
Provided further justification for the use of ANNs
"Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination." – Albert Einstein
The "Von Neumann" architecture vs. Neural Networks

Von Neumann:
Memory for programs and data
CPU for math and logic
Control unit to steer program flow
Follows rules
Solution can/must be formally specified
Cannot generalize
Not error tolerant

Neural Net:
Learns from data
Rules on data are not visible
Able to generalize
Copes well with noise
Biological Neuron
Input builds up on receptors (dendrites)
Cell has an input threshold
When the cell's threshold is breached, an activation is fired down the axon
Synapses (i.e. weights) exist just before the dendrite (input) interfaces
Connectionism
Connectionist techniques (a.k.a. neural networks) are inspired by the strong interconnectedness of the human brain. Neural networks are loosely modeled after the biological processes involved in cognition:
1. Information processing involves many simple processing elements called neurons.
2. Signals are transmitted between neurons using connecting links.
3. Each link has a weight that modulates (or controls) the strength of its signal.
4. Each neuron applies an activation function to the input that it receives from other neurons. This function determines its output.
Links with positive weights are called excitatory links. Links with negative weights are called inhibitory links.
Some definitions
A Neural Network is an interconnected assembly of simple processing elements, units or nodes. The long-term memory of the network is stored in the inter-unit connection strengths, or weights, obtained by a process of adaptation to, or learning from, a set of training patterns.
Biologically inspired learning mechanism
Brain vs. Computer
The brain's performance tends to degrade gracefully under partial damage
In contrast, most programs and engineered systems are brittle: if you remove some arbitrary parts, very likely the whole will cease to function
The brain performs massively parallel computations extremely efficiently. For example, complex visual perception occurs within less than 100 ms, that is, 10 processing steps!
Dimensions of Neural Networks
Various types of neurons
Various network architectures
Various learning algorithms
Various applications
We'll focus mainly on supervised-learning-based networks
The architecture of a neural network is linked with the learning algorithm used to train it
ANNs – The basics
ANNs incorporate the two fundamental components of biological neural nets:
Neurons – computational nodes
Synapses – weights, or memory storage devices
Neuron vs. Node
The Artificial Neuron
[Figure: artificial neuron model – input signals x1, x2, …, xm; synaptic weights w1, w2, …, wm; bias b; a summing function producing the local field v = w1·x1 + … + wm·xm + b; an activation function φ(·); output y = φ(v)]
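The figure's computation can be sketched in Python. The logistic function is used here as one common choice for the activation φ (the slide leaves φ generic):

```python
import math

def neuron_output(x, w, b):
    """Artificial neuron: local field v = sum_j w_j * x_j + b,
    followed by a logistic activation y = phi(v)."""
    v = sum(wj * xj for wj, xj in zip(w, x)) + b  # local field
    return 1.0 / (1.0 + math.exp(-v))             # squashing activation

# Two inputs with illustrative weights and bias: v = 0.4*1.0 - 0.2*0.5 + 0.1 = 0.4
y = neuron_output(x=[1.0, 0.5], w=[0.4, -0.2], b=0.1)
```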
Bias as an extra input
Bias is an external parameter of the neuron. It can be modeled by adding an extra, fixed-valued input:
[Figure: the same neuron model with an extra input x0 = +1 and weight w0 = b, so the local field becomes v = Σ_{j=0..m} wj·xj]
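Equivalently, in code (a small sketch with illustrative values): absorbing the bias as weight w0 on a constant input x0 = +1 yields exactly the same local field.

```python
def local_field(x, w):
    """v = sum_j w_j * x_j over paired inputs and weights."""
    return sum(wj * xj for wj, xj in zip(w, x))

def augment(x):
    """Prepend the constant bias input x0 = +1."""
    return [1.0] + list(x)

b, w = 0.1, [0.4, -0.2]
x = [1.0, 0.5]
v_explicit = local_field(x, w) + b               # bias kept separate
v_augmented = local_field(augment(x), [b] + w)   # bias folded in as w0
# both equal 0.4
```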
Face recognition example
90% accuracy in learning head pose and recognizing 1 of 20 faces
The XOR problem
A single-layer (linear) neural network cannot solve the XOR problem.

Input   Output
00      0
01      1
10      1
11      0

To see why this is true, we can try to express the problem as a linear equation: aX + bY = Z
a·0 + b·0 = 0
a·0 + b·1 = 1  ->  b = 1
a·1 + b·0 = 1  ->  a = 1
a·1 + b·1 = 0  ->  a = -b
The last constraint contradicts a = b = 1, so no choice of a and b works.
The XOR problem (cont.)
But by adding a third bit the problem can be resolved.

Input   Output
000     0
010     1
100     1
111     0

Once again, we express the problem as a linear equation: aX + bY + cZ = W
a·0 + b·0 + c·0 = 0
a·0 + b·1 + c·0 = 1  ->  b = 1
a·1 + b·0 + c·0 = 1  ->  a = 1
a·1 + b·1 + c·1 = 0  ->  a + b + c = 0  ->  1 + 1 + c = 0  ->  c = -2
So the equation X + Y - 2Z = W solves the problem.
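The derived weights can be checked directly (a quick sketch; note the auxiliary third bit in the table is Z = X AND Y):

```python
def w_of(x, y, z):
    """Linear map derived above: W = X + Y - 2Z (a = 1, b = 1, c = -2)."""
    return x + y - 2 * z

# Truth table from the slide, with the auxiliary bit Z = X AND Y
table = {(0, 0, 0): 0, (0, 1, 0): 1, (1, 0, 0): 1, (1, 1, 1): 0}
solves_xor = all(w_of(*inp) == out for inp, out in table.items())  # True
```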
A Multilayer Network for the XOR function
[Figure: multilayer network computing XOR, with thresholds labeled on the units]
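One concrete realization of such a network (the weights and thresholds below are an assumed standard choice, not necessarily the figure's exact values): an OR unit and an AND unit in the hidden layer, combined by the output unit as "OR but not AND".

```python
def step(v):
    """Hard-threshold activation: fire (1) when the local field is positive."""
    return 1 if v > 0 else 0

def xor_net(x, y):
    """Two hidden threshold units feeding one output threshold unit."""
    h_or = step(1.0 * x + 1.0 * y - 0.5)    # OR:  fires if any input is 1
    h_and = step(1.0 * x + 1.0 * y - 1.5)   # AND: fires only if both are 1
    return step(1.0 * h_or - 1.0 * h_and - 0.5)  # OR and not AND == XOR
```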
Hidden Units
Hidden units are a layer of nodes that are situated between the input nodes and the output nodes
Hidden units allow a network to learn non-linear functions
The hidden units allow the net to represent combinations of the input features
Given too many hidden units, however, a net will simply memorize the input patterns
Given too few hidden units, the network may not be able to represent all of the necessary generalizations
Backpropagation Networks
Backpropagation networks are among the most popular and widely used neural networks because they are relatively simple and powerful
Backpropagation was one of the first general techniques developed to train multilayer networks, which do not have many of the inherent limitations of the earlier, single-layer neural nets criticized by Minsky and Papert
Backpropagation networks use a gradient descent method to minimize the total squared error of the output
A backpropagation net is a multilayer, feedforward network that is trained by backpropagating the errors using the generalized delta rule
The idea behind (error) backpropagation learning
Feedforward training of input patterns
Each input node receives a signal, which is broadcast to all of the hidden units
Each hidden unit computes its activation, which is broadcast to all of the output nodes
Backpropagation of errors
Each output node compares its activation with the desired output
Based on this difference, the error is propagated back to all previous nodes
Adjustment of weights
The weights of all links are computed simultaneously, based on the errors that were propagated backwards
[Figure: Multilayer Perceptron (MLP)]
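The three phases above can be sketched for a small one-hidden-layer sigmoid network (a minimal illustration with made-up weights, not the notation developed later in the slides; biases are folded in as weights on a constant +1 input, as on the earlier slide):

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def train_step(x, d, W1, w2, eta=0.5):
    """One backprop step for one pattern x with target d.
    W1: hidden-layer weight rows; w2: output weights (last entry = bias)."""
    # 1) Feedforward: inputs broadcast to hidden units, then to the output
    xa = list(x) + [1.0]                                  # bias input +1
    h = [sigmoid(sum(w * xi for w, xi in zip(row, xa))) for row in W1]
    ha = h + [1.0]
    y = sigmoid(sum(w * hi for w, hi in zip(w2, ha)))
    # 2) Backpropagation of the error (generalized delta rule)
    delta_o = (d - y) * y * (1 - y)
    delta_h = [delta_o * w2[i] * h[i] * (1 - h[i]) for i in range(len(h))]
    # 3) Simultaneous weight adjustment
    for i in range(len(ha)):
        w2[i] += eta * delta_o * ha[i]
    for i in range(len(W1)):
        for j in range(len(xa)):
            W1[i][j] += eta * delta_h[i] * xa[j]
    return (d - y) ** 2                                   # squared error

# Repeated steps on a single pattern drive its squared error down
W1 = [[0.5, -0.4, 0.3], [-0.2, 0.6, -0.1]]
w2 = [0.7, -0.5, 0.2]
errors = [train_step([1.0, 0.0], 1.0, W1, w2) for _ in range(200)]
```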
Activation functions
• Transforms the neuron's input into its output
• Features of activation functions:
• A squashing effect is required – it prevents accelerating growth of activation levels through the network
• Simple and easy to calculate
Backpropagation Learning
We want to train a multilayer feedforward network by gradient descent to approximate an unknown function, based on some training data consisting of pairs (x, d)
Vector x represents a pattern of input to the network, and the vector d the corresponding target (desired output)
BP is a gradient-descent based scheme:
The overall gradient with respect to the entire training set is just the sum of the gradients for each pattern
We will therefore describe how to compute the gradient for just a single training pattern
We will number the units, and denote the weight from unit j to unit i by w_ij
BP – Forward Pass at Layer 1
BP – Forward Pass at Layer 2
BP – Forward Pass at Layer 3
The last layer produces the network's output
We can now derive an error (the difference between the output and the target)
BP – Backpropagation of error – output layer
We have an error with respect to the target (z)
This error signal will be propagated back towards the input layer (layer 1)
Each neuron will forward error information to the neurons feeding it from the previous layer
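The slide's formulas are not reproduced in this transcript; a standard statement of the generalized delta rule (with η the learning rate, φ the activation function, and w_ij the weight from unit j to unit i, as defined above) is:

```latex
% Output-layer error term for unit k, with target d_k and output y_k = \varphi(v_k):
\delta_k = (d_k - y_k)\,\varphi'(v_k)
% Hidden-layer error term for unit j, accumulated from the units k it feeds:
\delta_j = \varphi'(v_j) \sum_k w_{kj}\,\delta_k
% Weight update for the link from unit j to unit k:
\Delta w_{kj} = \eta\,\delta_k\,y_j
```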
BP – Backpropagation of error towards the hidden layer
BP – Backpropagation of error towards the input layer
BP – Illustration of Weight Update