Introduction to Computational Neuroscience
Artificial Neural Networks
Tambet Matiisen
15.10.2018
Artificial neural network
NB! Inspired by biology, not based on biology!
Applications:
• Automatic speech recognition
• Automatic image tagging
• Machine translation
Learning objectives
How do artificial neural networks work?
What types of artificial neural networks are used for what tasks?
What are the state-of-the-art results achieved with artificial neural networks?
HOW DO NEURAL NETWORKS WORK?
Part 1
Frank Rosenblatt (1957)
Added a learning rule to the McCulloch-Pitts neuron.
Perceptron

Prediction:
    z = 1, if Σi wi·xi + b > 0
    z = 0, otherwise

Learning:
    wi ← wi + (y − z)·xi
    b ← b + (y − z)

[Diagram: inputs x1, x2 with weights w1, w2 and bias b feed a summation unit Σ whose thresholded output is z]
Let’s try it out!
x1 x2 y = x1 or x2
0 0 0
0 1 1
1 0 1
1 1 1
Algorithm:
    repeat
        for each example (x1, x2, y):
            z = 1, if w1·x1 + w2·x2 + b > 0; otherwise z = 0
            w1 ← w1 + (y − z)·x1
            w2 ← w2 + (y − z)·x2
            b ← b + (y − z)
    until y = z holds for the entire dataset
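As a minimal sketch, the algorithm above can be run on the OR truth table from the previous slide (the zero initialization and epoch limit are illustrative assumptions, not from the slides):

```python
# Perceptron learning rule applied to the OR truth table.
# Weights and bias start at zero (an illustrative choice).

def perceptron_train(data, epochs=10):
    """Repeat the slide's update rule until y = z holds for the whole dataset."""
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        converged = True
        for x1, x2, y in data:
            z = 1 if w1 * x1 + w2 * x2 + b > 0 else 0  # prediction
            if z != y:
                converged = False
                w1 += (y - z) * x1                      # learning rule
                w2 += (y - z) * x2
                b += (y - z)
        if converged:
            break
    return w1, w2, b

OR_DATA = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
w1, w2, b = perceptron_train(OR_DATA)
predictions = [1 if w1 * x1 + w2 * x2 + b > 0 else 0 for x1, x2, _ in OR_DATA]
print(predictions)  # matches the y column of the truth table: [0, 1, 1, 1]
```

Because OR is linearly separable, the loop terminates after a few epochs; on XOR it would run forever, which is exactly the limitation discussed on the next slide.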
Perceptron limitations
Perceptron learning algorithm converges only for linearly separable problems.
Minsky, Papert, “Perceptrons” (1969)
Multi-layer perceptrons
Add non-linear activation functions
Add hidden layer(s)
Universal approximation theorem: any continuous function can be approximated to a given precision by a feed-forward neural network with a single hidden layer containing a finite number of neurons.
Forward propagation
[Diagram: inputs x1, x2 and a bias input +1 connect to two hidden units; the hidden units h1, h2 and a bias input +1 connect to the output z]

Activation function (sigmoid): φ(x) = 1 / (1 + e^(−x))

a1 = x1·w11 + x2·w21 + b1        h1 = φ(a1)
a2 = x1·w12 + x2·w22 + b2        h2 = φ(a2)

z = h1·v1 + h2·v2 + c
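The forward pass above can be sketched directly from the formulas (the numeric weight values below are made-up examples, not from the slides):

```python
import math

def sigmoid(x):
    """phi(x) = 1 / (1 + e^(-x)) from the slide."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(x1, x2, w11, w12, w21, w22, b1, b2, v1, v2, c):
    a1 = x1 * w11 + x2 * w21 + b1   # input to hidden unit 1
    h1 = sigmoid(a1)
    a2 = x1 * w12 + x2 * w22 + b2   # input to hidden unit 2
    h2 = sigmoid(a2)
    return h1 * v1 + h2 * v2 + c    # linear output unit

# Example with arbitrary weights (illustrative values only):
z = forward(1.0, 0.0,
            w11=0.5, w12=-0.3, w21=0.8, w22=0.1,
            b1=0.0, b2=0.0, v1=1.0, v2=1.0, c=0.0)
```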
Loss function
• Function approximation:
L = ½·(z − y)²

[Plot: L = ½·(z − 10)² as a function of z, for target y = 10]
Now we just need to find weight values that minimize the loss function for all inputs. How do we do that?
Backpropagation
[Diagram: the same network as in forward propagation]

φ'(x) = φ(x)·(1 − φ(x))

∂L/∂z = z − y

Output layer:
∂L/∂c  = ∂L/∂z · ∂z/∂c  = (z − y)
∂L/∂v1 = ∂L/∂z · ∂z/∂v1 = (z − y)·h1
∂L/∂v2 = ∂L/∂z · ∂z/∂v2 = (z − y)·h2

Hidden layer (chain rule through h and a):
∂L/∂a1 = ∂L/∂z · ∂z/∂h1 · ∂h1/∂a1 = (z − y)·v1·h1·(1 − h1)
∂L/∂a2 = ∂L/∂z · ∂z/∂h2 · ∂h2/∂a2 = (z − y)·v2·h2·(1 − h2)

∂L/∂b1  = ∂L/∂a1        ∂L/∂b2  = ∂L/∂a2
∂L/∂w11 = ∂L/∂a1·x1     ∂L/∂w12 = ∂L/∂a2·x1
∂L/∂w21 = ∂L/∂a1·x2     ∂L/∂w22 = ∂L/∂a2·x2

Recall: ai = x1·w1i + x2·w2i + bi,  hi = φ(ai),  z = h1·v1 + h2·v2 + c,  L = ½·(z − y)²
Gradient Descent
• Gradient descent finds weight values that result in small loss.
• Gradient descent is guaranteed to find only a local minimum.
• But there are plenty of them, and they are often good enough!
θ ← θ − α·∂L/∂θ,   for each θ ∈ {wij, vj, bj, c}
α — learning rate
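Putting the backpropagation formulas and the gradient-descent update together gives a complete training loop. The sketch below trains the 2-2-1 network from the slides on the OR dataset; the learning rate, epoch count, and random initialization are illustrative assumptions:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train(data, alpha=0.5, epochs=5000, seed=0):
    """Stochastic gradient descent on the two-hidden-unit network."""
    rng = random.Random(seed)
    w11, w21, w12, w22 = (rng.uniform(-1, 1) for _ in range(4))
    v1, v2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
    b1 = b2 = c = 0.0
    for _ in range(epochs):
        for x1, x2, y in data:
            # forward pass (same formulas as the forward-propagation slide)
            h1 = sigmoid(x1 * w11 + x2 * w21 + b1)
            h2 = sigmoid(x1 * w12 + x2 * w22 + b2)
            z = h1 * v1 + h2 * v2 + c
            # backward pass: dL/dz = z - y, then the chain rule
            dz = z - y
            da1 = dz * v1 * h1 * (1 - h1)
            da2 = dz * v2 * h2 * (1 - h2)
            # gradient-descent step: theta <- theta - alpha * dL/dtheta
            v1 -= alpha * dz * h1; v2 -= alpha * dz * h2; c -= alpha * dz
            w11 -= alpha * da1 * x1; w21 -= alpha * da1 * x2; b1 -= alpha * da1
            w12 -= alpha * da2 * x1; w22 -= alpha * da2 * x2; b2 -= alpha * da2

    def predict(px1, px2):
        h1 = sigmoid(px1 * w11 + px2 * w21 + b1)
        h2 = sigmoid(px1 * w12 + px2 * w22 + b2)
        return h1 * v1 + h2 * v2 + c
    return predict

OR = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
predict = train(OR)
```

Note that the gradients are computed with the *old* weight values before any update is applied, matching the derivation on the backpropagation slide.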
Other loss functions
• Binary classification:
    p = σ(z)   (sigmoid output)
    L = −( y·log(p) + (1 − y)·log(1 − p) )

• Multi-class classification:
    pi = softmax(z)i = e^zi / Σj e^zj
    L = −Σi yi·log(pi) = −log(pk),  where k is the index of the correct class

[Plot: the penalty curves −log(p) and −log(1 − p)]
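A short numeric sketch of both losses (the logit values are made-up examples; the max-subtraction trick is a standard addition for numerical stability, not something from the slides):

```python
import math

def binary_cross_entropy(y, z):
    """Binary classification loss: p = sigmoid(z)."""
    p = 1.0 / (1.0 + math.exp(-z))
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def softmax_cross_entropy(y_index, zs):
    """Multi-class loss: -log of the softmax probability of the correct class."""
    m = max(zs)                              # subtract max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    ps = [e / total for e in exps]
    return -math.log(ps[y_index])

# With z = 0 the sigmoid gives p = 0.5, so the loss is log 2:
loss = binary_cross_entropy(1, 0.0)
```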
Things to remember...
The perceptron, invented in the late 1950s, was the first artificial neuron model with a learning rule.
A perceptron can learn only linearly separable classification problems.
Feed-forward networks with non-linear activation functions and hidden layers can overcome limitations of perceptrons.
Multi-layer artificial neural networks are trained using backpropagation and gradient descent.
NEURAL NETWORKS TAXONOMY
Part 2
Simple feed-forward networks
• Architecture:
– Each node is connected to all nodes of the previous layer.
– Information moves in one direction only.
• Used for:
– Function approximation
– Simple classification problems
– Not too many inputs (~100)
[Diagram: input layer → hidden layer → output layer, fully connected]
Convolutional neural networks
• Architecture:
– Convolutional layer: local connections + weight sharing.
– Pooling layer: translation invariance.
• Used for:
– images and spatial data,
– any other data with a locality property (e.g. adjacent characters make up a word).
[Diagram: 1-D example — the input layer is convolved with shared weights (1, 0, −1) to produce the convolutional layer; taking the max of adjacent scores produces the pooling layer]
Hubel & Wiesel (1959)
• Performed experiments with an anesthetized cat.
• Discovered topographical mapping, sensitivity to orientation and hierarchical processing.
Convolution
Convolution matches the same pattern over the entire image and calculates a score for each match.
Example: edge detector
https://developer.apple.com/library/ios/documentation/Performance/Conceptual/vImage/ConvolutionOperations/ConvolutionOperations.html
Pooling
Pooling achieves translation invariance by taking the maximum of adjacent convolution scores.
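A minimal 1-D sketch of both operations, using the weight vector (1, 0, −1) from the earlier diagram (the input signal itself is a made-up example):

```python
def conv1d(signal, weights):
    """Slide the SAME weights over the signal: local connections + weight sharing."""
    k = len(weights)
    return [sum(w * x for w, x in zip(weights, signal[i:i + k]))
            for i in range(len(signal) - k + 1)]

def max_pool(scores, size=2):
    """Take the maximum of adjacent scores, so small shifts give the same output."""
    return [max(scores[i:i + size]) for i in range(0, len(scores) - size + 1, size)]

signal = [0, 0, 1, 1, 1, 0, 0]        # a step up followed by a step down
scores = conv1d(signal, [1, 0, -1])   # large magnitude exactly at the edges
pooled = max_pool(scores)
```

With these weights the convolution scores are [-1, -1, 0, 1, 1]: the filter responds (with opposite signs) at the rising and falling edges of the signal, and pooling keeps only the strongest response in each pair.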
Example: handwritten digit recognition
Y. LeCun et al., “Handwritten digit recognition: Applications of neural net chips and automatic learning”, 1989.
LeCun et al. (1989)
Recurrent neural networks
• Architecture:
– Hidden layer nodes are connected to themselves.
– This allows the network to retain internal state (memory).
• Used for:
– speech recognition,
– machine translation,
– language modeling,
– any time series.
[Diagram: input layer → hidden layer with recurrent self-connections → output layer]
Backpropagation through time
[Diagram: the recurrent network unrolled in time — at each step t = 1…4, input x_t and the previous hidden state h_(t−1) produce hidden state h_t and output z_t, which is compared with target y_t; loss L = ½·(z − y)² is summed over the time steps]
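The unrolling can be sketched with a scalar hidden state: the same three weights are reused at every time step, which is exactly what makes backpropagation through time possible. All numeric values below are illustrative assumptions, not from the slides:

```python
import math

def rnn_forward(xs, w_xh=0.5, w_hh=0.9, w_hz=1.0, h0=0.0):
    """Unroll a scalar-state recurrent net over an input sequence xs."""
    hs, zs = [h0], []
    for x in xs:
        h = math.tanh(x * w_xh + hs[-1] * w_hh)  # new state depends on old state
        z = h * w_hz                              # output at this time step
        hs.append(h)
        zs.append(z)
    return hs, zs

# A single impulse at t = 1 keeps influencing later outputs via the hidden state:
hs, zs = rnn_forward([1.0, 0.0, 0.0, 0.0])
```

Training computes gradients by applying the chain rule backwards through this unrolled chain, summing each weight's gradient contributions over all time steps.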
Different configurations
Autoencoders
• Architecture:
– Input and output layers are the same.
– Hidden layer functions as a “bottleneck”.
– Network is trained to reconstruct input from hidden layer activations.
• Used for:
– image semantic hashing
– dimensionality reduction
[Diagram: input layer → narrow hidden layer (bottleneck) → output layer = input layer]
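A tiny sketch of the idea: train a network to reproduce its input through a 2-unit bottleneck. The 4-2-4 task, learning rate, and iteration count are illustrative assumptions, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.eye(4)                        # four one-hot inputs to reconstruct
W1 = rng.normal(0.0, 0.5, (4, 2))    # encoder: 4 inputs -> 2 hidden units
b1 = np.zeros(2)
W2 = rng.normal(0.0, 0.5, (2, 4))    # decoder: 2 hidden units -> 4 outputs
b2 = np.zeros(4)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

losses = []
for _ in range(5000):
    H = sigmoid(X @ W1 + b1)         # bottleneck activations
    Z = sigmoid(H @ W2 + b2)         # reconstruction of the input
    E = Z - X                        # dL/dZ for L = 0.5 * sum((Z - X)**2)
    losses.append(0.5 * float((E ** 2).sum()))
    dZ = E * Z * (1 - Z)             # backprop through the output sigmoid
    dH = (dZ @ W2.T) * H * (1 - H)   # backprop through the bottleneck sigmoid
    W2 -= 0.5 * H.T @ dZ; b2 -= 0.5 * dZ.sum(axis=0)
    W1 -= 0.5 * X.T @ dH; b1 -= 0.5 * dH.sum(axis=0)
```

After training, the 2-dimensional hidden activations form a compressed code for each input, which is the sense in which the bottleneck performs dimensionality reduction.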
We didn’t talk about...
• Long Short-Term Memory networks (LSTMs)
• Restricted Boltzmann Machines (RBMs)
• Echo State Networks / Liquid State Machines
• Hopfield Network
• Self-organizing maps (SOMs)
• Radial basis function networks (RBFs)
• But we covered the most important ones!
Things to remember...
Simple feed-forward networks are usually used for function approximation and classification with few input features.
Convolutional neural networks are mostly used for images and spatial data.
Recurrent neural networks are used for language modeling and time series.
Autoencoders are used for image semantic hashing and dimensionality reduction.
SOME STATE-OF-THE-ART RESULTS
Part 3
Deep Learning
• Artificial neural networks and backpropagation have been around since the 1980s. What's all this fuss about "deep learning"?
• What has changed:
– we have much bigger datasets,
– we have much faster computers (think GPUs),
– we have learned a few tricks for training neural networks with very many layers.
Revolution of Depth
[Chart: image classification error falling year by year as networks get deeper; human error ~5.1%]
Neural Image Processing
Instance Segmentation
https://github.com/matterport/Mask_RCNN
https://www.youtube.com/watch?v=OOT3UIXZztE
Image Captioning
Image Captioning Errors
Reinforcement learning
[Figure: Atari games — Pong, Breakout, Space Invaders, Seaquest, Beam Rider, Enduro; the network receives the screen and score as input and outputs actions]
http://sodeepdude.blogspot.com/2015/03/deepminds-atari-paper-replicated.html
Mnih et al., “Human-level control through deep reinforcement learning” (2015)
Skype Translator
https://www.youtube.com/watch?v=NhxCg2PA3ZI
Adversarial Examples
https://www.youtube.com/watch?v=XaQu7kkQBPc
Things to remember...
Artificial neural networks are state-of-the-art in image recognition, speech recognition, machine translation and many other fields.
Anything a human can do in about 1 second, we can probably train a neural network to do as well, i.e. neural nets can do perception.
But in the end they are just reactive function approximators and can be easily fooled. In particular they do not think like humans (yet).
Thank you!