Page 1:

Neural Networks

Marcello Pelillo

University of Venice, Italy

Artificial Intelligence, a.y. 2017/18

Page 2:

DARPA Neural Network Study (1989)

“Over the history of computing science, two advances have matured: High speed numerical processing and knowledge processing (Artificial Intelligence).

Neural networks seem to offer the next necessary ingredient for intelligent machines − namely, knowledge formation and organization.”

Page 3:

DARPA Neural Network Study (1989)

Two key features which, it is widely believed, distinguish neural networks from any other sort of computing developed thus far:

Neural networks are adaptive, or trainable. Neural networks are not so much programmed as they are trained with data − thus many believe that the use of neural networks can relieve today’s computer programmers of a significant portion of their present programming load. Moreover, neural networks are said to improve with experience − the more data they are fed, the more accurate or complete their response.

Neural networks are naturally massively parallel. This suggests they should be able to make decisions at high-speed and be fault tolerant.

Page 4:

History

Early work (1940-1960)

• McCulloch & Pitts (Boolean logic)

• Rosenblatt (Learning)

• Hebb (Learning)

Transition (1960-1980)

• Widrow & Hoff (LMS rule)

• Anderson (Associative memories)

• Amari

Resurgence (1980-1990’s)

• Hopfield (Ass. mem. / Optimization)

• Rumelhart et al. (Back-prop)

• Kohonen (Self-organizing maps)

• Hinton & Sejnowski (Boltzmann machine)

New resurgence (2012 -)

• CNNs, deep learning, GANs, …

Page 5:

A Few Figures

The human cerebral cortex is composed of about 100 billion (10^11) neurons of many different types.

Each neuron is connected to about 1,000–10,000 other neurons, which yields 10^14–10^15 connections.

The cortex covers about 0.15 m² and is 2–5 mm thick.

Page 6:

The Neuron

Cell Body (Soma): 5-10 microns in diameter

Axon: the neuron's output mechanism; there is one axon per cell, but a single axon can have thousands of branches reaching many other cells

Dendrites: receive incoming signals from the axons of other neurons via synapses

Page 7:

Neural Dynamics

The transmission of signals in the cerebral cortex is a complex process:

electrical → chemical → electrical

Simplifying:

1) The cell body performs a "weighted sum" of the incoming signals.

2) If the result exceeds a certain threshold value, the neuron produces an "action potential" which is sent down the axon (the cell has "fired"); otherwise it remains in a resting state.

3) When the electrical signal reaches the synapse, it triggers the release of "neuro-transmitters", which bind to the "receptors" in the post-synaptic membrane.

4) The post-synaptic receptors provoke the diffusion of an electrical signal in the post-synaptic neuron.

Page 8:

Page 9:

Synapses

The SYNAPSE is the relay point where information is conveyed by chemical transmitters from neuron to neuron. A synapse consists of two parts: the knoblike tip of an axon terminal and the receptor region on the surface of another neuron. The membranes are separated by a synaptic cleft some 200 nanometers across. Molecules of chemical transmitter, stored in vesicles in the axon terminal, are released into the cleft by arriving nerve impulses. The transmitter changes the electrical state of the receiving neuron, making it either more likely or less likely to fire an impulse.

Page 10:

Synaptic Efficacy

The synaptic efficacy is the amount of current that enters the post-synaptic neuron, relative to the action potential of the pre-synaptic neuron.

Learning takes place by modifying the synaptic efficacy.

Two types of synapses:

• Excitatory: favor the generation of an action potential in the post-synaptic neuron

• Inhibitory: hinder the generation of an action potential

Page 11:

The McCulloch and Pitts Model (1943)

The McCulloch-Pitts (MP) Neuron is modeled as a binary threshold unit

The unit "fires" if the net input reaches (or exceeds) the unit's threshold T. If the neuron is firing, its output y is 1, otherwise it is 0:

$$y_i = g\!\left( \sum_j w_{ij} \, I_j - T \right)$$

where the weight $w_{ij}$ represents the strength of the synapse between neuron j and neuron i, and g is the unit step function:

$$g(x) = \begin{cases} 0 & \text{if } x < 0 \\ 1 & \text{if } x \ge 0 \end{cases}$$
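As a concrete illustration (not part of the original slides), an MP unit can be written in a few lines of Python; the AND-gate weights and threshold below are hand-picked example values:

```python
import numpy as np

def mp_neuron(inputs, weights, threshold):
    """McCulloch-Pitts unit: fire (output 1) iff the weighted net input reaches the threshold."""
    net = np.dot(weights, inputs)
    return 1 if net >= threshold else 0

# Example: a two-input AND gate (weights = [1, 1], threshold = 2)
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, mp_neuron(np.array(x), weights=np.array([1.0, 1.0]), threshold=2.0))
```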

Page 12:

Properties of McCulloch-Pitts Networks

By properly combining MP neurons one can simulate the behavior of any Boolean circuit.

Three elementary logical operations: (a) negation, (b) AND, (c) OR. In each diagram the states of the neurons on the left are at time t and those on the right at time t + 1.

The construction for the exclusive OR (XOR).

Page 13:

Network Topologies and Architectures

• Feedforward only vs. Feedback loop (Recurrent networks)

• Fully connected vs. sparsely connected

• Single layer vs. multilayer

Examples: multilayer perceptrons, Hopfield networks, Boltzmann machines, Kohonen networks, …

(a) A feedforward network and (b) a recurrent network

Page 14:

Classification Problems

Given:

1) some "features": f_1, f_2, …, f_n

2) some "classes": c_1, …, c_m

Problem: to classify an "object" according to its features.

Page 15:

Example #1

To classify an "object" as:

c_1 = "watermelon"
c_2 = "apple"
c_3 = "orange"

according to the following features:

f_1 = "weight"
f_2 = "color"
f_3 = "size"

Example:

weight = 80 g
color = green
size = 10 cm³

→ "apple"

Page 16:

Example #2

Problem: establish whether a patient has the flu

• Classes: { "flu", "non-flu" }

• (Potential) features:

f_1: body temperature
f_2: headache? (yes / no)
f_3: throat is red? (yes / no / medium)
f_4: …

Page 17:

Example #3: Hand-written Digit Recognition

Page 18:

Example #4: Face Detection

Page 19:

Example #5: Spam Detection

Page 20:

Geometric Interpretation

Example:

Classes = { 0 , 1 }

Features = x, y: both taking values in [0, +∞)

Idea: objects are represented as "points" in a geometric space

Page 21:

Neural Networks for Classification

A neural network can be used as a classification device.

Input ≡ feature values

Output ≡ class labels

Example: 3 features, 2 classes

Page 22:

Thresholds

We can get rid of the thresholds associated with neurons by adding an extra unit permanently clamped at −1. In so doing, thresholds become weights and can be adaptively adjusted during learning.
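Concretely, this is the standard rewriting of the threshold condition (not spelled out on the slide): with an extra input $x_0 = -1$ and weight $w_0 = T$,

$$\sum_{j=1}^{n} w_j x_j \ge T \quad \Longleftrightarrow \quad \sum_{j=0}^{n} w_j x_j \ge 0 .$$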

Page 23:

The Perceptron

A network consisting of one layer of M&P neurons connected in a

feedforward way (i.e. no lateral or feedback connections).

• Discrete output (+1 / -1)

• Capable of “learning” from examples (Rosenblatt)

• Suffers from serious computational limitations

Page 24:

The Perceptron Learning Algorithm

Page 25:

The Perceptron Learning Algorithm
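The algorithm itself appears on these two slides only as images; a minimal Python sketch of the classic Rosenblatt update rule, assuming ±1 labels, inputs already augmented with a constant −1 component, and learning rate η, might look like this:

```python
import numpy as np

def train_perceptron(X, y, eta=0.1, max_epochs=100):
    """Rosenblatt's perceptron rule: for each misclassified example,
    move the weight vector towards the correct side of the hyperplane."""
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for x_i, y_i in zip(X, y):
            if y_i * np.dot(w, x_i) <= 0:   # misclassified (or on the boundary)
                w += eta * y_i * x_i        # update: w <- w + eta * y * x
                errors += 1
        if errors == 0:                     # converged: every example is classified correctly
            break
    return w

# Example: learn the (linearly separable) AND function; the third input is the constant -1
X = np.array([[x1, x2, -1.0] for x1 in (0, 1) for x2 in (0, 1)])
y = np.array([1 if x1 and x2 else -1 for x1 in (0, 1) for x2 in (0, 1)])
print(train_perceptron(X, y))
```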

Page 26:

Decision Regions

A decision region is an area of feature space within which all examples are assigned to the same class.

Examples:

Page 27:

Linear Separability

A classification problem is said to be linearly separable if the decision regions

can be separated by a hyperplane.

Example: AND

X   Y   X AND Y
0   0      0
0   1      0
1   0      0
1   1      1

Page 28:

Limitations of Perceptrons

It has been shown that perceptrons can only solve linearly separable problems.

Example: XOR (exclusive OR)

X   Y   X XOR Y
0   0      0
0   1      1
1   0      1
1   1      0

Page 29:

The Perceptron Convergence Theorem

Theorem (Rosenblatt, 1960)

If the training set is linearly separable, the perceptron learning algorithm always converges to a consistent hypothesis after a finite number of epochs, for any η > 0.

If the training set is not linearly separable, after a certain number of epochs the weights start oscillating.

Page 30:

A View of the Role of Units

Page 31:

Multi–Layer Feedforward Networks

• Limitation of simple perceptron: can implement only linearly separable functions

• Add "hidden" layers between the input and output layer. A network with just one hidden layer can represent any Boolean function, including XOR.

• The power of multilayer networks was known long ago, but algorithms for training them, e.g. the back-propagation method, became widely available only much later (invented several times, popularized in 1986).

• Universal approximation power: a two-layer network can approximate any smooth function (Cybenko, 1989; Funahashi, 1989; Hornik et al., 1989).

• Static (no feedback)

Page 32:

Continuous-Valued Units

Sigmoid (or logistic)

Page 33:

Continuous-Valued Units

Hyperbolic tangent
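Neither formula is reproduced in the transcript; in their standard forms (the slope parameter $a$ of the logistic is an assumed convention, often set to 1) they read:

$$g(h) = \frac{1}{1 + e^{-a h}} \qquad \text{(sigmoid / logistic, values in } (0, 1)\text{)}$$

$$g(h) = \tanh(h) = \frac{e^{h} - e^{-h}}{e^{h} + e^{-h}} \qquad \text{(hyperbolic tangent, values in } (-1, 1)\text{)}$$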

Page 34:

Back-propagation Learning Algorithm

• An algorithm for learning the weights in a feed-forward network,

given a training set of input-output pairs

• The algorithm is based on the gradient-descent method.

Page 35:

Supervised Learning

Supervised learning algorithms require the presence of a "teacher" who provides the right answers to the input questions.

Technically, this means that we need a training set of the form

$$L = \left\{ (\mathbf{x}^1, \mathbf{y}^1), \ldots, (\mathbf{x}^p, \mathbf{y}^p) \right\}$$

where $\mathbf{x}^\mu$ ($\mu = 1 \ldots p$) is the network input vector and $\mathbf{y}^\mu$ ($\mu = 1 \ldots p$) is the desired network output vector.

Page 36:

Supervised Learning

The learning (or training) phase consists of determining a configuration of weights such that the network output is as close as possible to the desired output, for all the examples in the training set.

Formally, this amounts to minimizing an error function such as (not the only possible one):

$$E = \frac{1}{2} \sum_{\mu} \sum_{k} \left( y_k^{\mu} - O_k^{\mu} \right)^2$$

where $O_k^{\mu}$ is the output provided by output unit k when the network is given example μ as input.

Page 37:

Back-Propagation

To minimize the error function E we can use the classic gradient-descent rule

$$\Delta w = -\eta \, \frac{\partial E}{\partial w} \qquad (\eta = \text{"learning rate"})$$

To compute the partial derivatives we use the error back-propagation algorithm.

It consists of two stages:

Forward pass: the input to the network is propagated layer after layer in the forward direction.

Backward pass: the "error" made by the network is propagated backward, and the weights are updated accordingly.

Page 38:

Notations

Given pattern μ, hidden unit j receives a net input

$$h_j^{\mu} = \sum_k w_{jk} \, x_k^{\mu}$$

and produces as output:

$$V_j^{\mu} = g(h_j^{\mu}) = g\!\left( \sum_k w_{jk} \, x_k^{\mu} \right)$$

Page 39:

Back-Prop:Updating Hidden-to-Output Weights

Starting from the error function

$$E = \frac{1}{2} \sum_{\mu} \sum_{k} \left( y_k^{\mu} - O_k^{\mu} \right)^2$$

the gradient-descent update for a hidden-to-output weight $W_{ij}$ is:

$$\Delta W_{ij} = -\eta \, \frac{\partial E}{\partial W_{ij}}
= -\eta \, \frac{\partial}{\partial W_{ij}} \left[ \frac{1}{2} \sum_{\mu} \sum_{k} \left( y_k^{\mu} - O_k^{\mu} \right)^2 \right]
= \eta \sum_{\mu} \sum_{k} \left( y_k^{\mu} - O_k^{\mu} \right) \frac{\partial O_k^{\mu}}{\partial W_{ij}}$$

$$= \eta \sum_{\mu} \left( y_i^{\mu} - O_i^{\mu} \right) \frac{\partial O_i^{\mu}}{\partial W_{ij}}
= \eta \sum_{\mu} \left( y_i^{\mu} - O_i^{\mu} \right) g'(h_i^{\mu}) \, V_j^{\mu}
= \eta \sum_{\mu} \delta_i^{\mu} \, V_j^{\mu}$$

where $\delta_i^{\mu} = \left( y_i^{\mu} - O_i^{\mu} \right) g'(h_i^{\mu})$.

Page 40:

Back-Prop:Updating Input-to-Hidden Weights (1)

For an input-to-hidden weight $w_{jk}$ the chain rule gives:

$$\Delta w_{jk} = -\eta \, \frac{\partial E}{\partial w_{jk}}
= \eta \sum_{\mu} \sum_{i} \left( y_i^{\mu} - O_i^{\mu} \right) \frac{\partial O_i^{\mu}}{\partial w_{jk}}
= \eta \sum_{\mu} \sum_{i} \left( y_i^{\mu} - O_i^{\mu} \right) g'(h_i^{\mu}) \, \frac{\partial h_i^{\mu}}{\partial w_{jk}}$$

where, since $h_i^{\mu} = \sum_l W_{il} V_l^{\mu}$,

$$\frac{\partial h_i^{\mu}}{\partial w_{jk}}
= \frac{\partial}{\partial w_{jk}} \sum_{l} W_{il} \, V_l^{\mu}
= W_{ij} \, \frac{\partial V_j^{\mu}}{\partial w_{jk}}
= W_{ij} \, g'(h_j^{\mu}) \, \frac{\partial h_j^{\mu}}{\partial w_{jk}}$$

Page 41:

Back-Prop:Updating Input-to-Hidden Weights (2)

Since $h_j^{\mu} = \sum_k w_{jk} x_k^{\mu}$, we have $\partial h_j^{\mu} / \partial w_{jk} = x_k^{\mu}$. Hence, we get:

$$\Delta w_{jk}
= \eta \sum_{\mu} \sum_{i} \left( y_i^{\mu} - O_i^{\mu} \right) g'(h_i^{\mu}) \, W_{ij} \, g'(h_j^{\mu}) \, x_k^{\mu}
= \eta \sum_{\mu} \hat{\delta}_j^{\mu} \, x_k^{\mu}$$

where $\hat{\delta}_j^{\mu} = g'(h_j^{\mu}) \sum_i \delta_i^{\mu} \, W_{ij}$.

Page 42:

Error Back-Propagation

Page 43:

Locality of Back-Prop

Page 44:

The Back-Propagation Algorithm

Page 45:

The Back-Propagation Algorithm
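Both slides show the algorithm only as images; the NumPy sketch below (a minimal one-hidden-layer network with sigmoid units, bias inputs, and batch updates, written for this transcript rather than taken from the course material) implements the update rules derived on the previous pages:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def backprop_epoch(X, Y, w, W, eta=0.5):
    """One batch gradient-descent update of a one-hidden-layer network.
    w: input-to-hidden weights, W: hidden-to-output weights (last column = bias)."""
    # Forward pass
    V = sigmoid(X @ w.T)                        # hidden outputs V_j = g(h_j)
    Vb = np.hstack([V, np.ones((len(X), 1))])   # append a constant bias unit to the hidden layer
    O = sigmoid(Vb @ W.T)                       # network outputs O_i = g(h_i)

    # Backward pass
    delta_out = (Y - O) * O * (1 - O)                   # delta_i = (y_i - O_i) g'(h_i)
    delta_hid = (delta_out @ W[:, :-1]) * V * (1 - V)   # delta_hat_j = g'(h_j) sum_i delta_i W_ij

    # Weight updates: Delta W_ij = eta * sum_mu delta_i V_j,  Delta w_jk = eta * sum_mu delta_hat_j x_k
    W += eta * delta_out.T @ Vb
    w += eta * delta_hid.T @ X
    return 0.5 * np.sum((Y - O) ** 2)                   # error E

# Example: fit XOR with a 2-2-1 network (plus biases); may need a restart to escape a local minimum
rng = np.random.default_rng(0)
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], float)   # third column = bias input
Y = np.array([[0], [1], [1], [0]], float)
w = rng.normal(0, 1, (2, 3))    # 2 hidden units, 3 inputs (including bias)
W = rng.normal(0, 1, (1, 3))    # 1 output unit, 2 hidden units + bias
for _ in range(10000):
    E = backprop_epoch(X, Y, w, W)
print("final error:", E)
```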

Page 46:

The Role of the Learning Rate

Page 47:

The Momentum Term

Gradient descent may:

• Converge too slowly if η is too small

• Oscillate if η is too large

Simple remedy:

The momentum term allows us to use large values of η, thereby avoiding oscillatory phenomena.

Typical choice: α = 0.9, η = 0.5
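The momentum update itself appears on the slide only as an image; in its standard form, with the α and η just mentioned, it reads:

$$\Delta w(t+1) = -\eta \, \frac{\partial E}{\partial w} + \alpha \, \Delta w(t)$$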

Page 48:

The Momentum Term

Page 49:

The Problem of Local Minima

Back-prop cannot avoid local minima.

Choice of initial weights is important.

If they are too large, the nonlinearities tend to saturate from the beginning of the learning process.

Page 50:

NETtalk

Page 51:

NETtalk

Page 52:

• A network to pronounce English text

• 7 x 29 (=203) input units

• 1 hidden layer with 80 units

• 26 output units encoding phonemes

• Trained on 1024 words in context

• Produces intelligible speech after 10 training epochs

• Functionally equivalent to DEC-talk

• Rule-based DEC-talk was the result of a decade of effort by many linguists

• NETtalk learns from examples and requires no linguistic knowledge

NETtalk

Page 53:

Theoretical / Practical Questions

▪ How many layers are needed for a given task?

▪ How many units per layer?

▪ To what extent does representation matter?

▪ What do we mean by generalization?

▪ What can we expect a network to generalize to?

• Generalization: performance of the network on data not included in the training set

• Size of the training set: how large should a training set be for "good" generalization?

• Size of the network: too many weights in a network result in poor generalization

Page 54:

True vs Sample Error

The true error is unknown (and will remain so forever…).On which sample should I compute the sample error?

Page 55:

Training vs Test Set

Page 56:

Cross-validation

Leave-one-out: using as many test folds as there are examples (size of test fold = 1)
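As an illustration (not from the slides), k-fold cross-validation, of which leave-one-out is the extreme case k = number of examples, can be sketched in a few lines of Python; `train_and_evaluate` is a hypothetical stand-in for whatever training-plus-scoring procedure is being validated:

```python
import numpy as np

def cross_validation_score(X, y, train_and_evaluate, k):
    """Split the data into k folds; each fold is used once as the test set.
    With k = len(X) this is leave-one-out cross-validation."""
    indices = np.arange(len(X))
    scores = []
    for test_idx in np.array_split(indices, k):
        train_idx = np.setdiff1d(indices, test_idx)
        scores.append(train_and_evaluate(X[train_idx], y[train_idx],
                                         X[test_idx], y[test_idx]))
    return np.mean(scores)
```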

Page 57:

(a) A good fit to noisy data. (b) Overfitting of the same data: the fit is perfect on the "training set" (x's), but is likely to be poor on the "test set", represented by the circle.

Overfitting

Page 58:

Early Stopping
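The early-stopping curves on this slide are an image; the idea can be sketched as a training loop that watches the validation error and keeps the best weights seen so far (all function and method names here are hypothetical placeholders):

```python
def train_with_early_stopping(model, train_step, validation_error,
                              max_epochs=1000, patience=20):
    """Stop when the validation error has not improved for `patience` consecutive epochs."""
    best_err, best_weights, epochs_without_improvement = float("inf"), None, 0
    for _ in range(max_epochs):
        train_step(model)                       # one pass over the training set
        err = validation_error(model)           # error on a held-out validation set
        if err < best_err:
            best_err, best_weights = err, model.get_weights()
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                           # validation error stopped improving
    model.set_weights(best_weights)             # roll back to the best weights seen
    return model
```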

Page 59:

• The size (i.e. the number of hidden units) of an artificial neural network affects both its functional capabilities and its generalization performance

• Small networks may not be able to realize the desired input/output mapping

• Large networks lead to poor generalization performance

Size Matters

Page 60:

The Pruning Approach

Train an over-dimensioned net and then remove redundant nodes

and / or connections:

• Sietsma & Dow (1988, 1991)

• Mozer & Smolensky (1989)

• Burkitt (1991)

Advantages:

• arbitrarily complex decision regions

• faster training

• independence of the training algorithm

Page 61:

An Iterative Pruning Algorithm

Consider (for simplicity) a net with one hidden layer:

Suppose that unit h is to be removed:

IDEA: Remove unit h (and its in/out connections) and adjust the

remaining weights so that the I/O behavior is the same

G. Castellano, A. M. Fanelli, and M. Pelillo, An iterative pruning algorithm for feedforward neural networks, IEEE Transactions on Neural Networks 8(3):519-531, 1997.

Page 62:

This is equivalent to solving the system (network output before removal = after removal):

$$\sum_{j=1}^{n_h} w_{ij} \, y_j^{(\mu)} = \sum_{\substack{j=1 \\ j \neq h}}^{n_h} \left( w_{ij} + \delta_{ij} \right) y_j^{(\mu)}, \qquad i = 1 \ldots n_O, \;\; \mu = 1 \ldots P$$

which is equivalent to the following linear system (in the unknown δ's):

$$\sum_{\substack{j=1 \\ j \neq h}}^{n_h} \delta_{ij} \, y_j^{(\mu)} = w_{ih} \, y_h^{(\mu)}, \qquad i = 1 \ldots n_O, \;\; \mu = 1 \ldots P$$

An Iterative Pruning Algorithm

Page 63:

In a more compact notation:

$$A\mathbf{x} = \mathbf{b}, \qquad A \in \Re^{P n_O \times n_O (n_h - 1)}$$

But a solution does not always exist. Least-squares solution:

$$\min_{\mathbf{x}} \; \left\| A\mathbf{x} - \mathbf{b} \right\|$$

An Iterative Pruning Algorithm
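For illustration only (not code from the paper), the least-squares problem above can be solved directly with NumPy; `y_hidden` is a hypothetical P x n_h matrix of hidden-unit outputs over the training patterns, `w_out` the hidden-to-output weight matrix, and `h` the index of the unit to be removed:

```python
import numpy as np

def pruning_deltas(y_hidden, w_out, h):
    """Least-squares corrections delta_ij for removing hidden unit h:
    solves  sum_{j != h} delta_ij y_j = w_ih y_h  for every output unit i."""
    keep = [j for j in range(y_hidden.shape[1]) if j != h]
    A = y_hidden[:, keep]                        # P x (n_h - 1) coefficient matrix
    deltas = np.zeros_like(w_out)                # corrections (column h stays unused)
    for i in range(w_out.shape[0]):
        b = w_out[i, h] * y_hidden[:, h]         # right-hand side for output unit i
        sol, *_ = np.linalg.lstsq(A, b, rcond=None)
        deltas[i, keep] = sol
    return deltas                                # add to w_out[:, keep], then drop unit h
```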

Page 64:

Detecting Excessive Units

• Residual-reducing methods for linear least-squares problems start with an initial solution x_0 and produce a sequence of points {x_k} such that the residuals decrease:

$$A\mathbf{x}_k = \mathbf{b} - \mathbf{r}_k, \qquad \left\| \mathbf{r}_{k+1} \right\| \le \left\| \mathbf{r}_k \right\|$$

• Starting point: $\mathbf{x}_0 = \mathbf{0} \;\Rightarrow\; \mathbf{r}_0 = \mathbf{b}$

• Excessive units can be detected as those for which $\left\| \mathbf{b} \right\|$ is minimum.

Page 65:

The Pruning Algorithm

1) Start with an over-sized trained network

2) Repeat

2.1) find the hidden unit h for which ‖b‖ is minimum

2.2) solve the corresponding least-squares system

2.3) remove unit h

Until Perf(pruned) – Perf(original) < epsilon

3) Reject the last reduced network

Page 66:

Example: 4-bit parity

Ten initial 4-10-1 networks; the pruned nets have 5 hidden nodes on average (nine 4-5-1 networks and one 4-4-1 network).

[Figure: recognition rate (%) and MSE as a function of the number of hidden units (from 10 down to 1), with the minimum network marked.]

Page 67:

Example: 4-bit symmetry

Ten initial 4-10-1 networks; the pruned nets have 4.6 hidden nodes on average.

[Figure: recognition rate (%) and MSE as a function of the number of hidden units (from 10 down to 1), with the minimum network marked.]

Page 68:

Deep Neural Networks

Page 69:

The Age of “Deep Learning”

Page 70:

The Deep Learning “Philosophy”

• Learn a feature hierarchy all the way from pixels to classifier

• Each layer extracts features from the output of previous layer

• Train all layers jointly

Page 71:

Shallow vs Deep Networks

From: R. E. Turner

Shallow architectures are inefficient at representing deep functions

Page 72:

Performance Improves with More Data

Page 73:

Old Idea… Why Now?

1. We have more data (from Lena to ImageNet).

2. We have more computing power; GPUs are really good at this.

3. Last but not least, we have new ideas.

Page 74:

Image Classification

Predict a single label (or a distribution over labels, as shown here to indicate our confidence) for a given image. Images are 3-dimensional arrays of integers from 0 to 255, of size Width x Height x 3. The 3 represents the three color channels Red, Green, Blue.

From: A. Karpathy

Page 75:

Challenges

From: A. Karpathy

Page 76:

The Data-Driven Approach

An example training set for four visual categories.

In practice we may have thousands of categories and hundreds of thousands of images for each category. From: A. Karpathy

Page 77:

Inspiration from Biology

From: R. E. Turner

Biological vision is hierarchically organized

Page 78:

Hierarchy of Visual Areas

From: D. Zoccolan

Page 79:

The Retina

Page 80:

The Retina

Page 81:

Receptive Fields

"The region of the visual field in which light stimuli evoke responses of a given neuron."

Page 82:

Cellular Recordings

Kuffler, Hubel, Wiesel, …

1953: Discharge patterns and functional organization of mammalian retina

1959: Receptive fields of single neurones in the cat's striate cortex

1962: Receptive fields, binocular interaction and functional architecture in the cat's visual cortex

1968 ..

Page 83:

Retinal Ganglion Cell Response

Page 84:

Beyond the Retina

Page 85:

Simple Cells

Orientation selectivity: most V1 neurons are orientation selective, meaning that they respond strongly to lines, bars, or edges of a particular orientation (e.g., vertical) but not to the orthogonal orientation (e.g., horizontal).

Page 86:

Complex Cells

Page 87:

Hypercomplex Cells (end-stopping)

Page 88:

Take-Home Message:Visual System as a Hierarchy of Feature Detectors

Page 89:

Convolution

Page 90:

Convolution
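The operation itself appears on these slides only as figures; a minimal NumPy sketch of 2D filtering (cross-correlation, "valid" borders, single-channel image) is:

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image and take a weighted sum at every position
    ('valid' mode: no padding, so the output is smaller than the input)."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Example: a 3x3 mean filter (see the next slide) applied to a random "image"
image = np.random.rand(8, 8)
mean_filter = np.ones((3, 3)) / 9.0
print(convolve2d(image, mean_filter).shape)   # -> (6, 6)
```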

Page 91:

Mean Filters

Page 92:

Gaussian Filters

Page 93:

Gaussian Filters
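The kernel shown on these slides has the standard 2D Gaussian form (σ is the standard deviation, which controls the amount of smoothing):

$$G_{\sigma}(x, y) = \frac{1}{2\pi\sigma^{2}} \, e^{-\frac{x^{2} + y^{2}}{2\sigma^{2}}}$$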

Page 94:

The Effect of Gaussian Filters

Page 95:

The Effect of Gaussian Filters

Page 96:

Kernel Width Affects Scale

Page 97:

Edge detection

Page 98:

Edge detection

Page 99:

Using Convolution for Edge Detection

Page 100:

A Variety of Image Filters

Laplacian of Gaussians (LoG) (Marr 1982)

Page 101:

Gabor filters (directional) (Daugman 1985)

A Variety of Image Filters

Page 102:

From: M. Sebag

A Variety of Image Filters

Page 103:

From: M. Sebag

Traditional vs Deep Learning Approach

Page 104:

Convolutional Neural Networks (CNNs)

(LeCun 1998)

(Krizhevsky et al. 2012)

Page 105:

Fully- vs Locally-Connected Networks

From: M. A. Ranzato

Fully-connected: 400,000 hidden units = 16 billion parameters

Locally-connected: 400,000 hidden units with 10 x 10 fields = 40 million parameters

Local connections capture local dependencies

Page 106:

Using Several Trainable Filters

Normally, several filters are packed together and learnt automatically during training.

Page 107:

Pooling

Max pooling is a way to simplify the network architecture by downsampling the output of the filtering operations, reducing the number of neurons.
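A minimal NumPy sketch of 2 x 2 max pooling (assuming the feature-map height and width are divisible by 2) is:

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Keep the maximum of each non-overlapping 2x2 block, halving height and width."""
    H, W = feature_map.shape
    blocks = feature_map.reshape(H // 2, 2, W // 2, 2)
    return blocks.max(axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool_2x2(x))   # -> [[ 5.  7.] [13. 15.]]
```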

Page 108:

Combining Feature Extraction and Classification

Page 109:

AlexNet (2012)

• 8 layers total

• Trained on the ImageNet dataset (1000 categories, 1.2M training images, 150K test images)

Page 110:

AlexNet Architecture

• 1st layer: 96 kernels (11 x 11 x 3)
• Normalized, pooled
• 2nd layer: 256 kernels (5 x 5 x 48)
• Normalized, pooled
• 3rd layer: 384 kernels (3 x 3 x 256)
• 4th layer: 384 kernels (3 x 3 x 192)
• 5th layer: 256 kernels (3 x 3 x 192)
• Followed by 2 fully connected layers, 4096 neurons each
• Followed by a 1000-way SoftMax layer

650,000 neurons, 60 million parameters

Page 111:

Training on Multiple GPU’s

Page 112:

Rectified Linear Units (ReLU’s)

Problem: the sigmoid activation takes on values in (0, 1). When the gradient is propagated back to the initial layers, it tends to become 0 (the vanishing gradient problem).

From a practical perspective, this slows down the training procedure of the initial layers of the network.
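The rectified linear unit, introduced to mitigate this, is only pictured on the next slide; its standard definition, whose derivative is exactly 1 for every positive input, is:

$$\mathrm{ReLU}(x) = \max(0, x), \qquad \frac{d}{dx}\,\mathrm{ReLU}(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x < 0 \end{cases}$$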

Page 113:

Rectified Linear Units (ReLU’s)

Page 114:

Mini-batch Stochastic Gradient Descent

Loop:

1. Sample a batch of data

2. Forward prop it through the graph, get loss

3. Backprop to calculate the gradients

4. Update the parameters using the gradient
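The four steps above map directly onto a generic training loop; the sketch below is framework-agnostic Python where `sample_batch`, `forward_loss`, `backward_gradients`, and the dictionary-of-arrays parameter layout are all hypothetical placeholders:

```python
def sgd_training_loop(params, data, learning_rate=0.01, batch_size=128, num_steps=10000):
    """Mini-batch SGD: repeatedly sample -> forward -> backward -> update."""
    for _ in range(num_steps):
        batch = sample_batch(data, batch_size)        # 1. sample a batch of data
        loss = forward_loss(params, batch)            # 2. forward prop, get the loss
        grads = backward_gradients(params, batch)     # 3. backprop to get the gradients
        for name in params:                           # 4. update the parameters
            params[name] -= learning_rate * grads[name]
    return params
```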

Page 115:

Data Augmentation

The easiest and most common method to reduce overfitting on image data is to artificially enlarge the dataset using label-preserving transformations.

AlexNet uses two forms of this data augmentation.

• The first form consists of generating image translations and horizontal reflections.

• The second form consists of altering the intensities of the RGB channels in training images.
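A minimal NumPy sketch of the first form, a random crop (i.e. a translation of the viewing window) plus a random horizontal reflection of an H x W x 3 image array, is:

```python
import numpy as np

def augment(image, crop_size, rng=np.random.default_rng()):
    """Random crop plus a random horizontal flip (both label-preserving)."""
    H, W, _ = image.shape
    top = rng.integers(0, H - crop_size + 1)
    left = rng.integers(0, W - crop_size + 1)
    crop = image[top:top + crop_size, left:left + crop_size, :]
    if rng.random() < 0.5:
        crop = crop[:, ::-1, :]          # horizontal reflection
    return crop

image = np.zeros((256, 256, 3), dtype=np.uint8)
print(augment(image, crop_size=224).shape)   # -> (224, 224, 3)
```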

Page 116:

Dropout

Set to zero the output of each hidden neuron with probability 0.5.

The neurons which are "dropped out" in this way do not contribute to the forward pass and do not participate in backpropagation.

So every time an input is presented, the neural network samples a different architecture, but all these architectures share weights.

This reduces complex co-adaptations of neurons, since a neuron cannot rely on the presence of particular other neurons.
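A minimal NumPy sketch of this mechanism (training time only; the rescaling by 1/(1−p), so-called inverted dropout, is an addition not mentioned on the slide, used so that no rescaling is needed at test time):

```python
import numpy as np

def dropout(activations, p=0.5, rng=np.random.default_rng()):
    """Zero each hidden activation with probability p; survivors are scaled by 1/(1-p)."""
    mask = (rng.random(activations.shape) >= p).astype(activations.dtype)
    return activations * mask / (1.0 - p)

h = np.ones((4, 8))
print(dropout(h, p=0.5))   # roughly half the entries zeroed, the rest scaled to 2.0
```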

Page 117:

ImageNet

Page 118:

ImageNet Challenges

Page 119:

ImageNet Challenge 2012

Page 120:

Revolution of Depth

Page 121:

A Hierarchy of Features

From: B. Biggio

Page 122:

Layer 1

Each 3x3 block shows the top 9 patches for one filter.

Page 123:

Layer 2

Page 124:

Layer 3

Page 125:

Layer 3

Page 126:

Page 127:

• A well-trained ConvNet is an excellent feature extractor.

• Chop the network at the desired layer and use the output as a feature representation to train an SVM on some other dataset (Zeiler & Fergus, 2013).

• Improve further by taking a pre-trained ConvNet and re-training it on a different dataset (fine-tuning).

Feature Analysis

Page 128:

Today deep learning, in its several manifestations, is being applied in a variety of different domains besides computer vision, such as:

• Speech recognition

• Optical character recognition

• Natural language processing

• Autonomous driving

• Game playing (e.g., Google’s AlphaGo)

• …

Other Success Stories of Deep Learning

Page 129:

References

• http://neuralnetworksanddeeplearning.com

• http://deeplearning.stanford.edu/tutorial/

• http://www.deeplearningbook.org/

• http://deeplearning.net/

Platforms:

• Theano

• Torch

• TensorFlow

• …