Introduction to Computational Neuroscience
Artificial Neural Networks
Tambet Matiisen
15.10.2018
Artificial neural network
NB! Inspired by biology, not based on biology!
Applications:
• Automatic speech recognition
• Automatic image tagging
• Machine translation
Learning objectives
How do artificial neural networks work?
What types of artificial neural networks are used for what tasks?
What are the state-of-the-art results achieved with artificial neural networks?
HOW DO NEURAL NETWORKS WORK?
Part 1
Frank Rosenblatt (1957)
Added a learning rule to the McCulloch-Pitts neuron.
Perceptron

Prediction:
    z = 1, if Σi wi·xi + b > 0
    z = 0, otherwise

Learning:
    wi ← wi + (y − z)·xi
    b ← b + (y − z)

[Diagram: inputs x1, x2 with weights w1, w2 and bias b feed a summation unit Σ whose thresholded output is z]
Let’s try it out!
x1 x2 y = x1 or x2
0 0 0
0 1 1
1 0 1
1 1 1
Algorithm:
    repeat
        for each example (x1, x2, y):
            z = 1, if w1·x1 + w2·x2 + b > 0; otherwise z = 0
            w1 ← w1 + (y − z)·x1
            w2 ← w2 + (y − z)·x2
            b ← b + (y − z)
    until y = z holds for the entire dataset
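As a minimal sketch, the algorithm above can be run on the OR truth table from the previous slide (the zero initialization and epoch limit are illustrative assumptions, not from the slides):

```python
# Perceptron learning rule applied to the OR truth table.
# Weights and bias start at zero (an illustrative choice).

def perceptron_train(data, epochs=10):
    """Repeat the slide's update rule until y = z holds for the whole dataset."""
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        converged = True
        for x1, x2, y in data:
            z = 1 if w1 * x1 + w2 * x2 + b > 0 else 0  # prediction
            if z != y:
                converged = False
                w1 += (y - z) * x1                      # learning rule
                w2 += (y - z) * x2
                b += (y - z)
        if converged:
            break
    return w1, w2, b

OR_DATA = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
w1, w2, b = perceptron_train(OR_DATA)
predictions = [1 if w1 * x1 + w2 * x2 + b > 0 else 0 for x1, x2, _ in OR_DATA]
print(predictions)  # matches the y column of the truth table: [0, 1, 1, 1]
```

Because OR is linearly separable, the loop terminates after a few epochs; on XOR it would run forever, which is exactly the limitation discussed on the next slide.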
Perceptron limitations
Perceptron learning algorithm converges only for linearly separable problems.
Minsky, Papert, “Perceptrons” (1969)
Multi-layer perceptrons
Add non-linear activation functions
Add hidden layer(s)
Universal approximation theorem: any continuous function can be approximated to a given precision by a feed-forward neural network with a single hidden layer containing a finite number of neurons.
Forward propagation
[Diagram: inputs x1, x2 and a bias input +1 connect to two hidden units; the hidden units h1, h2 and a bias input +1 connect to the output z]

Activation function (sigmoid): φ(x) = 1 / (1 + e^(−x))

a1 = x1·w11 + x2·w21 + b1        h1 = φ(a1)
a2 = x1·w12 + x2·w22 + b2        h2 = φ(a2)

z = h1·v1 + h2·v2 + c
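The forward pass above can be sketched directly from the formulas (the numeric weight values below are made-up examples, not from the slides):

```python
import math

def sigmoid(x):
    """phi(x) = 1 / (1 + e^(-x)) from the slide."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(x1, x2, w11, w12, w21, w22, b1, b2, v1, v2, c):
    a1 = x1 * w11 + x2 * w21 + b1   # input to hidden unit 1
    h1 = sigmoid(a1)
    a2 = x1 * w12 + x2 * w22 + b2   # input to hidden unit 2
    h2 = sigmoid(a2)
    return h1 * v1 + h2 * v2 + c    # linear output unit

# Example with arbitrary weights (illustrative values only):
z = forward(1.0, 0.0,
            w11=0.5, w12=-0.3, w21=0.8, w22=0.1,
            b1=0.0, b2=0.0, v1=1.0, v2=1.0, c=0.0)
```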
Loss function
• Function approximation:
L = ½·(z − y)²

[Plot: L = ½·(z − 10)² as a function of z, for target y = 10]
Now we just need to find weight values that minimize the loss function for all inputs. How do we do that?
Backpropagation
[Diagram: the same network as in forward propagation]

φ'(x) = φ(x)·(1 − φ(x))

∂L/∂z = z − y

Output layer:
∂L/∂c  = ∂L/∂z · ∂z/∂c  = (z − y)
∂L/∂v1 = ∂L/∂z · ∂z/∂v1 = (z − y)·h1
∂L/∂v2 = ∂L/∂z · ∂z/∂v2 = (z − y)·h2

Hidden layer (chain rule through h and a):
∂L/∂a1 = ∂L/∂z · ∂z/∂h1 · ∂h1/∂a1 = (z − y)·v1·h1·(1 − h1)
∂L/∂a2 = ∂L/∂z · ∂z/∂h2 · ∂h2/∂a2 = (z − y)·v2·h2·(1 − h2)

∂L/∂b1  = ∂L/∂a1        ∂L/∂b2  = ∂L/∂a2
∂L/∂w11 = ∂L/∂a1·x1     ∂L/∂w12 = ∂L/∂a2·x1
∂L/∂w21 = ∂L/∂a1·x2     ∂L/∂w22 = ∂L/∂a2·x2

Recall: ai = x1·w1i + x2·w2i + bi,  hi = φ(ai),  z = h1·v1 + h2·v2 + c,  L = ½·(z − y)²
Gradient Descent
• Gradient descent finds weight values that result in small loss.
• Gradient descent is guaranteed to find only a local minimum.
• But there are plenty of them, and they are often good enough!
θ ← θ − α·∂L/∂θ,   for each θ ∈ {wij, vj, bj, c}
α — learning rate
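Putting the backpropagation formulas and the gradient-descent update together gives a complete training loop. The sketch below trains the 2-2-1 network from the slides on the OR dataset; the learning rate, epoch count, and random initialization are illustrative assumptions:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train(data, alpha=0.5, epochs=5000, seed=0):
    """Stochastic gradient descent on the two-hidden-unit network."""
    rng = random.Random(seed)
    w11, w21, w12, w22 = (rng.uniform(-1, 1) for _ in range(4))
    v1, v2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
    b1 = b2 = c = 0.0
    for _ in range(epochs):
        for x1, x2, y in data:
            # forward pass (same formulas as the forward-propagation slide)
            h1 = sigmoid(x1 * w11 + x2 * w21 + b1)
            h2 = sigmoid(x1 * w12 + x2 * w22 + b2)
            z = h1 * v1 + h2 * v2 + c
            # backward pass: dL/dz = z - y, then the chain rule
            dz = z - y
            da1 = dz * v1 * h1 * (1 - h1)
            da2 = dz * v2 * h2 * (1 - h2)
            # gradient-descent step: theta <- theta - alpha * dL/dtheta
            v1 -= alpha * dz * h1; v2 -= alpha * dz * h2; c -= alpha * dz
            w11 -= alpha * da1 * x1; w21 -= alpha * da1 * x2; b1 -= alpha * da1
            w12 -= alpha * da2 * x1; w22 -= alpha * da2 * x2; b2 -= alpha * da2

    def predict(px1, px2):
        h1 = sigmoid(px1 * w11 + px2 * w21 + b1)
        h2 = sigmoid(px1 * w12 + px2 * w22 + b2)
        return h1 * v1 + h2 * v2 + c
    return predict

OR = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
predict = train(OR)
```

Note that the gradients are computed with the *old* weight values before any update is applied, matching the derivation on the backpropagation slide.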
Other loss functions
• Binary classification:
    p = σ(z)   (sigmoid output)
    L = −( y·log(p) + (1 − y)·log(1 − p) )

• Multi-class classification:
    pi = softmax(z)i = e^zi / Σj e^zj
    L = −Σi yi·log(pi) = −log(pk),  where k is the index of the correct class

[Plot: the penalty curves −log(p) and −log(1 − p)]
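A short numeric sketch of both losses (the logit values are made-up examples; the max-subtraction trick is a standard addition for numerical stability, not something from the slides):

```python
import math

def binary_cross_entropy(y, z):
    """Binary classification loss: p = sigmoid(z)."""
    p = 1.0 / (1.0 + math.exp(-z))
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def softmax_cross_entropy(y_index, zs):
    """Multi-class loss: -log of the softmax probability of the correct class."""
    m = max(zs)                              # subtract max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    ps = [e / total for e in exps]
    return -math.log(ps[y_index])

# With z = 0 the sigmoid gives p = 0.5, so the loss is log 2:
loss = binary_cross_entropy(1, 0.0)
```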
Things to remember...
The perceptron, invented in the late 1950s, was the first artificial neuron model with a learning rule.
A perceptron can learn only linearly separable classification problems.
Feed-forward networks with non-linear activation functions and hidden layers can overcome limitations of perceptrons.
Multi-layer artificial neural networks are trained using backpropagation and gradient descent.
NEURAL NETWORKS TAXONOMY
Part 2
Simple feed-forward networks
• Architecture:
– Each node is connected to all nodes of the previous layer.
– Information moves in one direction only.
• Used for:
– Function approximation
– Simple classification problems
– Not too many inputs (~100)
[Diagram: input layer → hidden layer → output layer, fully connected]
Convolutional neural networks
• Architecture:
– Convolutional layer: local connections + weight sharing.
– Pooling layer: translation invariance.
• Used for:
– images and spatial data,
– any other data with a locality property (e.g. adjacent characters make up a word).
[Diagram: 1-D example — the input layer is convolved with shared weights (1, 0, −1) to produce the convolutional layer; taking the max of adjacent scores produces the pooling layer]
Hubel & Wiesel (1959)
• Performed experiments with an anesthetized cat.
• Discovered topographical mapping, sensitivity to orientation and hierarchical processing.
Convolution
Convolution matches the same pattern over the entire image and calculates a score for each match.
Example: edge detector
https://developer.apple.com/library/ios/documentation/Performance/Conceptual/vImage/ConvolutionOperations/ConvolutionOperations.html
Pooling
Pooling achieves translation invariance by taking the maximum of adjacent convolution scores.
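A minimal 1-D sketch of both operations, using the weight vector (1, 0, −1) from the earlier diagram (the input signal itself is a made-up example):

```python
def conv1d(signal, weights):
    """Slide the SAME weights over the signal: local connections + weight sharing."""
    k = len(weights)
    return [sum(w * x for w, x in zip(weights, signal[i:i + k]))
            for i in range(len(signal) - k + 1)]

def max_pool(scores, size=2):
    """Take the maximum of adjacent scores, so small shifts give the same output."""
    return [max(scores[i:i + size]) for i in range(0, len(scores) - size + 1, size)]

signal = [0, 0, 1, 1, 1, 0, 0]        # a step up followed by a step down
scores = conv1d(signal, [1, 0, -1])   # large magnitude exactly at the edges
pooled = max_pool(scores)
```

With these weights the convolution scores are [-1, -1, 0, 1, 1]: the filter responds (with opposite signs) at the rising and falling edges of the signal, and pooling keeps only the strongest response in each pair.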
Example: handwritten digit recognition
Y. LeCun et al., “Handwritten digit recognition: Applications of neural net chips and automatic learning”, 1989.
LeCun et al. (1989)
Recurrent neural networks
• Architecture:
– Hidden layer nodes are connected to themselves.
– This allows the network to retain internal state (memory).
• Used for:
– speech recognition,
– machine translation,
– language modeling,
– any time series.
[Diagram: input layer → hidden layer with recurrent self-connections → output layer]
Backpropagation through time
[Diagram: the recurrent network unrolled in time — at each step t = 1…4, input x_t and the previous hidden state h_(t−1) produce hidden state h_t and output z_t, which is compared with target y_t; loss L = ½·(z − y)² is summed over the time steps]
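The unrolling can be sketched with a scalar hidden state: the same three weights are reused at every time step, which is exactly what makes backpropagation through time possible. All numeric values below are illustrative assumptions, not from the slides:

```python
import math

def rnn_forward(xs, w_xh=0.5, w_hh=0.9, w_hz=1.0, h0=0.0):
    """Unroll a scalar-state recurrent net over an input sequence xs."""
    hs, zs = [h0], []
    for x in xs:
        h = math.tanh(x * w_xh + hs[-1] * w_hh)  # new state depends on old state
        z = h * w_hz                              # output at this time step
        hs.append(h)
        zs.append(z)
    return hs, zs

# A single impulse at t = 1 keeps influencing later outputs via the hidden state:
hs, zs = rnn_forward([1.0, 0.0, 0.0, 0.0])
```

Training computes gradients by applying the chain rule backwards through this unrolled chain, summing each weight's gradient contributions over all time steps.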
Different configurations
Autoencoders
• Architecture:
– Input and output layers are the same.
– Hidden layer functions as a “bottleneck”.
– Network is trained to reconstruct input from hidden layer activations.
• Used for:
– image semantic hashing
– dimensionality reduction
[Diagram: input layer → narrow hidden layer (bottleneck) → output layer = input layer]
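A tiny sketch of the idea: train a network to reproduce its input through a 2-unit bottleneck. The 4-2-4 task, learning rate, and iteration count are illustrative assumptions, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.eye(4)                        # four one-hot inputs to reconstruct
W1 = rng.normal(0.0, 0.5, (4, 2))    # encoder: 4 inputs -> 2 hidden units
b1 = np.zeros(2)
W2 = rng.normal(0.0, 0.5, (2, 4))    # decoder: 2 hidden units -> 4 outputs
b2 = np.zeros(4)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

losses = []
for _ in range(5000):
    H = sigmoid(X @ W1 + b1)         # bottleneck activations
    Z = sigmoid(H @ W2 + b2)         # reconstruction of the input
    E = Z - X                        # dL/dZ for L = 0.5 * sum((Z - X)**2)
    losses.append(0.5 * float((E ** 2).sum()))
    dZ = E * Z * (1 - Z)             # backprop through the output sigmoid
    dH = (dZ @ W2.T) * H * (1 - H)   # backprop through the bottleneck sigmoid
    W2 -= 0.5 * H.T @ dZ; b2 -= 0.5 * dZ.sum(axis=0)
    W1 -= 0.5 * X.T @ dH; b1 -= 0.5 * dH.sum(axis=0)
```

After training, the 2-dimensional hidden activations form a compressed code for each input, which is the sense in which the bottleneck performs dimensionality reduction.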
We didn’t talk about...
• Long Short-Term Memory networks (LSTMs)
• Restricted Boltzmann Machines (RBMs)
• Echo State Networks / Liquid State Machines
• Hopfield Network
• Self-organizing maps (SOMs)
• Radial basis function networks (RBFs)
• But we covered the most important ones!
Things to remember...
Simple feed-forward networks are usually used for function approximation and classification with few input features.
Convolutional neural networks are mostly used for images and spatial data.
Recurrent neural networks are used for language modeling and time series.
Autoencoders are used for image semantic hashing and dimensionality reduction.
SOME STATE-OF-THE-ART RESULTS
Part 3
Deep Learning
• Artificial neural networks and backpropagation have been around since the 1980s. What's all this fuss about "deep learning"?
• What has changed:
– we have much bigger datasets,
– we have much faster computers (think GPUs),
– we have learned a few tricks for training neural networks with very many layers.
Revolution of Depth
[Chart: image classification error falling year by year as networks get deeper; human error ~5.1%]
Neural Image Processing
Instance Segmentation
https://github.com/matterport/Mask_RCNN
https://www.youtube.com/watch?v=OOT3UIXZztE
Image Captioning
Image Captioning Errors
Reinforcement learning
[Figure: Atari games — Pong, Breakout, Space Invaders, Seaquest, Beam Rider, Enduro; the network receives the screen and score as input and outputs actions]
http://sodeepdude.blogspot.com/2015/03/deepminds-atari-paper-replicated.html
Mnih et al., “Human-level control through deep reinforcement learning” (2015)
Skype Translator
https://www.youtube.com/watch?v=NhxCg2PA3ZI
Adversarial Examples
https://www.youtube.com/watch?v=XaQu7kkQBPc
Things to remember...
Artificial neural networks are state-of-the-art in image recognition, speech recognition, machine translation and many other fields.
Anything a human can do in about 1 second, we can probably train a neural network to do as well, i.e. neural nets can do perception.
But in the end they are just reactive function approximators and can be easily fooled. In particular they do not think like humans (yet).
Thank you!