Artificial Intelligence
Krzysztof Ślot
Institute of Applied Computer Science,
Technical University of Lodz, Poland
Introduction to computing with neural networks – feedforward nets
Introduction
Motivation
• To solve "hard" problems – computationally intense, with unclear rules, etc.
• To mimic "higher" brain functions – recognition, classification, etc.
[Figure: a sample recognition task]
Researching neural networks
• Neurons are living cells that consume energy to operate
• Another illusion, based on the same mechanism, reveals the three-channel nature of visual input
Neural Networks: background
Biological reference
• Architecture of a neuron
[Figure: neuron anatomy – dendrites, cell body with nucleus, axon with myelin sheath and nodes of Ranvier, synapses]
• Neuron's operation
[Figure: action potential – a roughly 1 ms spike from the -70 mV resting potential toward 0 V, followed by a refraction period]
• Firing frequency is proportional to total excitation
• A neuron is a simple multi-input, single-output unit
Modeling a neuron
• Physical modeling
– Pulse propagation phenomenon
– Hodgkin-Huxley model (Nobel prize)
• Functional modeling
– Of interest to AI: provides means for simulating/emulating neural nets
McCulloch-Pitts model
[Figure: inputs x_1, x_2, ..., x_n with weights w_1, w_2, ..., w_n, plus a constant input carrying the threshold weight w_0 = T; the weighted sum s feeds an activation function f]
y = f(\mathbf{w}^T \mathbf{x}')
Activation functions
• Linear: f(s) = s
• Non-linear, differentiable
– Hyperbolic tangent: f(s) = \mathrm{th}(s) = (1 - e^{-\beta s}) / (1 + e^{-\beta s}), f(s) \in (-1 \dots 1)
– Sigmoid: f(s) = 1 / (1 + e^{-\beta s}), f(s) \in (0 \dots 1)
• Non-linear, non-differentiable
– Step: f(s) = 1(s), f(s) \in (0 \dots 1)
– Sign: f(s) = \mathrm{sgn}(s), f(s) \in (-1 \dots 1)
Perceptron
y = f\left( \sum_{i=1}^{n} w_i x_i - T \right) = f\left( \sum_{i=0}^{n} w_i x_i \right)
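As a minimal illustration (not from the slides), the unit above fits in a few lines of NumPy; the step and sigmoid activations follow the definitions given here, and the names and input values are made up:

```python
import numpy as np

def step(s):
    """Step activation: f(s) = 1(s), output in (0...1)."""
    return np.where(s >= 0, 1.0, 0.0)

def sigmoid(s, beta=1.0):
    """Sigmoid activation: f(s) = 1 / (1 + e^{-beta*s}), output in (0...1)."""
    return 1.0 / (1.0 + np.exp(-beta * s))

def neuron(x, w, T, f=step):
    """McCulloch-Pitts unit: y = f(sum_i w_i x_i - T)."""
    s = np.dot(w, x) - T            # total excitation minus threshold
    return f(s)

# Example call; equivalent augmented form: x' = [-1, x], w' = [T, w]
x = np.array([0.5, 1.2])
w = np.array([1.0, 1.0])
print(neuron(x, w, T=2.5))          # 0.0: excitation below threshold
```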
Neuron's function
• How to interpret a neuron's outcome?
– Outcome: assessment of the activation level (dot product)
– Assume only two inputs – the activation is a linear expression: s = w_1 x_1 + w_2 x_2 - T
– Sample parameters: w_1 = 1, w_2 = 1, T = 2.5
– The boundary f(x_1, x_2) = x_1 + x_2 - 2.5 = 0 is a line
– With the step activation, y = f(s) = 0 for s < 0 and 1 for s \geq 0:
x_A = (3, 1.5): s > 0, f(s) = 1, so y = 1; x_B = (0.2, 0.3): s < 0, f(s) = 0, so y = 0
• Interpretation
– Outcome: a decision (e.g. hunt or not)
– Data classification
[Figure: "Frog's world" – prey candidates plotted by size and distance, separated by the decision line]
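The slide's numbers can be checked directly; a small sketch using the step-activation neuron with the parameters given above:

```python
import numpy as np

w = np.array([1.0, 1.0])    # w1 = w2 = 1
T = 2.5                     # threshold

def decide(x):
    """y = 1 (hunt) if w1*x1 + w2*x2 - T >= 0, else y = 0."""
    return 1 if np.dot(w, x) - T >= 0 else 0

print(decide([3.0, 1.5]))   # point A: s =  2.0 -> y = 1
print(decide([0.2, 0.3]))   # point B: s = -2.0 -> y = 0
```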
Training neural networks
Supervised learning algorithms
• Objective: determine weights which provide the desired network operation
• Given:
– Training vector set: { x^i }
– Desired network responses: { d^i } (expected output for the i-th training vector)
– Actual neuron outputs: { y^i }
[Figure: a single neuron with inputs x_0 ... x_n and weights w_0 ... w_n; for some training vector the actual output is y^i = 1 while the expected output is d^i = 0]
• Error for the i-th training vector:
e_i = c (d^i - y^i)^2, where y = f\left( \sum_{k=0}^{n} x_k w_k \right)
so the error is a function of the weights: e_i = f(\mathbf{w})
Supervised learning
• Basic idea – adjust the weights to minimize the error
e_i = c (d^i - y^i)^2, y = f(s) = f(\mathbf{w}^T \mathbf{x})
• Gradient-descent methods (differentiable error function):
\Delta w_k = -\eta \frac{\partial e_i}{\partial w_k}
• Expanding with the chain rule:
\Delta w_k = -\eta \frac{\partial e_i}{\partial y^i} \frac{\partial y^i}{\partial s^i} \frac{\partial s^i}{\partial w_k}
• Result:
\Delta w_k = \eta (d^i - y^i) f'(s^i) x_k^i
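A single gradient step for a sigmoid unit, following the update rule above; the learning rate and data are arbitrary, and \beta = 1/2 is assumed so that f'(s) = f(1 - f):

```python
import numpy as np

eta = 0.1                              # learning rate (arbitrary)
x = np.array([-1.0, 0.4, 0.9])         # input; x_0 = -1 carries the threshold
w = np.array([0.2, -0.3, 0.5])         # current weights
d = 1.0                                # desired output

s = np.dot(w, x)                       # activation s = w^T x
y = 1.0 / (1.0 + np.exp(-s))           # sigmoid output (beta = 1/2 convention)
f_prime = y * (1.0 - y)                # f'(s) = f(1 - f) for this sigmoid

w += eta * (d - y) * f_prime * x       # Delta w_k = eta (d - y) f'(s) x_k
```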
Delta rule
• Linear activation function: y = \sum_l w_l x_l
\Delta w_k = -\eta \frac{\partial e_i}{\partial w_k} = -\eta \frac{\partial}{\partial w_k}\left(d^i - \sum_l w_l x_l^i\right)^2 = \eta (d^i - y^i) x_k^i
• Non-linear activation functions
– Sigmoid: f(s) = 1 / (1 + e^{-2\beta s}); its derivative is f'(s) = 2\beta f (1 - f), so
\Delta w_k = 2\beta\eta (d^i - y^i) f (1 - f) x_k^i
No differentiation required!
– Step and sign functions: \Delta w_k = \eta (d^i - y^i) x_k^i
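A delta-rule training loop for a step-activation unit is sketched below; logical AND is a made-up example task, linearly separable as the rule requires:

```python
import numpy as np

# Training set for logical AND; x_0 = -1 carries the threshold w_0 = T
X = np.array([[-1, 0, 0], [-1, 0, 1], [-1, 1, 0], [-1, 1, 1]], dtype=float)
d = np.array([0, 0, 0, 1], dtype=float)

eta = 0.2
w = np.zeros(3)

for epoch in range(50):
    for xi, di in zip(X, d):
        yi = 1.0 if np.dot(w, xi) >= 0 else 0.0
        w += eta * (di - yi) * xi       # delta rule: no differentiation required

print(w)    # learned weights; w[0] plays the role of the threshold T
```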
Delta-rule learning – geometrical interpretation
[Figure: in the (x_0, x_1) plane, the weight vector w(i-1) is updated to w(i) by \Delta \mathbf{w} = \eta (d^i - y^i) \mathbf{x}^i; here d^i - y^i is negative, so the weight vector rotates away from x^i]
• Direction of change: the vector x^i
Neuron function
• Data classification
– Application domain: linearly-separable tasks
– Minsky, Papert (1969): triggered a recession in ANN research
[Figure: a task that is not linearly separable – a single neuron cannot solve it]
Multi-layer ANN
[Figure: inputs x_0, x_1 feed two first-layer neurons with outputs y_0, y_1; a second-layer neuron combines them into the output y. Combining the two half-plane decisions yields a decision region that a single neuron cannot produce]
Multi-layer ANN
• The first layer feeds the output neuron with binary inputs y_0 ... y_n, so the output neuron's function is linearly separable – e.g. logical OR or AND
[Figure: the resulting decision region in the (x_0, x_1) plane, bounded by the lines of the first-layer neurons]
• Decision regions formed by a two-layer NN are convex
Multi-layer ANN
• ML ANN decision regions in classification tasks
[Figure: table of achievable decision-region shapes per network depth; source: Lippmann, "An Introduction to Computing with Neural Nets"]
Data processing in multi-layer ANNs
• Input vector: X (components x_k, k = 1 ... M)
• First-layer neurons:
y_i^1 = f(s_i^1) = f\left( \sum_{k=1}^{M} w_{ik}^1 x_k \right), \quad i = 1, \dots, N_1
• Output-layer (layer N) neurons – a composition of all the layer mappings:
y_i^N = f(s_i^N) = f\left( \sum_j w_{ij}^N y_j^{N-1} \right) = f\left( \sum_j w_{ij}^N f\left( \sum_k w_{jk}^{N-1} f(\dots) \right) \right)
• Output vector: Y
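A sketch of this layer-by-layer composition in NumPy; the layer sizes and weights are invented, and bias terms are omitted for brevity:

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def forward(x, weights):
    """Propagate x through the layers: y^n = f(W^n y^{n-1}), y^0 = x."""
    y = x
    for W in weights:
        y = sigmoid(W @ y)      # y_i = f( sum_k w_ik y_k )
    return y

rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 3)),     # layer 1: M = 3 inputs -> N_1 = 4 units
           rng.normal(size=(2, 4))]     # output layer: 4 -> 2 units
Y = forward(np.array([0.5, -1.0, 2.0]), weights)
```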
Learning in multi-layer ANNs – notation
[Figure: a two-layer network. Input components x_k feed hidden-layer neurons with weights w_{i,k} and outputs y_i; these feed output-layer neurons with weights W_{l,i} and outputs Y_l (l = 1 ... m). Each layer also receives a constant bias input of -1]
MLP Training
• Supervised setup
• Criterion: mean-squared error (MSE)
E = \sum_{l=1}^{m} (t_l - Y_l)^2, \quad Y_l = f\left( \sum_i W_{l,i} y_i \right), \quad y_i = f\left( \sum_k w_{i,k} x_k \right)
• Weight update for output neurons: the delta rule
\Delta W_{l,i} = \eta (t_l - Y_l) f'(S_l) y_i
• For hidden units the error cannot be directly estimated; we only know E = g(w_{i,k})
• Solution: basic calculus
MLP training
• Derivative of a compound function: the chain rule
\Delta w_{i,k} = -\eta \frac{\partial E}{\partial w_{i,k}} = -\eta \sum_l \frac{\partial E}{\partial Y_l} \frac{\partial Y_l}{\partial y_i} \frac{\partial y_i}{\partial w_{i,k}}
with
\frac{\partial E}{\partial Y_l} = -2 (t_l - Y_l), \quad \frac{\partial Y_l}{\partial y_i} = W_{l,i} f'(S_l), \quad \frac{\partial y_i}{\partial w_{i,k}} = f'(s_i) x_k
• Result (the constant 2 absorbed into \eta):
\Delta w_{i,k} = \eta \sum_{l=1}^{m} (t_l - Y_l) f'(S_l) W_{l,i} f'(s_i) x_k
• Weight update interpretation
– Analogous to the delta rule
– Error: back-projected from the upper layer through the weights W_{l,i}:
\delta_i = \sum_{l=1}^{m} (t_l - Y_l) f'(S_l) W_{l,i}
• This is the Error Back-Propagation (BP) algorithm
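Putting the two update rules together gives the BP algorithm; below is a minimal NumPy sketch for a two-layer network. XOR is a made-up example task, \beta = 1/2 is assumed so f' = f(1 - f), and convergence depends on the random initialization:

```python
import numpy as np

def f(s):                                  # sigmoid; f'(s) = f(1 - f)
    return 1.0 / (1.0 + np.exp(-s))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # inputs x_k
t = np.array([[0], [1], [1], [0]], dtype=float)               # targets t_l

rng = np.random.default_rng(1)
w = rng.normal(size=(2, 3))     # hidden weights w_{i,k} (+ bias column)
W = rng.normal(size=(1, 3))     # output weights W_{l,i} (+ bias column)
eta = 0.5

for epoch in range(10000):
    for x, tl in zip(X, t):
        xb = np.append(x, -1.0)            # bias input of -1
        y = f(w @ xb)                      # hidden outputs y_i = f(s_i)
        yb = np.append(y, -1.0)
        Y = f(W @ yb)                      # outputs Y_l = f(S_l)
        dY = (tl - Y) * Y * (1 - Y)        # (t_l - Y_l) f'(S_l)
        dy = (W[:, :2].T @ dY) * y * (1 - y)   # back-projected error * f'(s_i)
        W += eta * np.outer(dY, yb)        # Delta W_{l,i} = eta dY_l y_i
        w += eta * np.outer(dy, xb)        # Delta w_{i,k} = eta dy_i x_k

for x in X:
    y = f(w @ np.append(x, -1.0))
    print(x, f(W @ np.append(y, -1.0)))    # ideally approaches 0, 1, 1, 0
```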
Networks with radial-basis units
• Radial functions – distance-dependent output:
y = f(r) = f(\|\mathbf{x} - \mathbf{p}\|)
• Typical example: Gaussian
y = C e^{-(\mathbf{x}-\mathbf{m})^T \mathbf{S}^{-1} (\mathbf{x}-\mathbf{m})}
• Net's architecture
[Figure: input vector x feeds N RBF units with outputs y_1 ... y_N; a linear output unit computes Y = \sum_i W_i y_i]
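A direct transcription of the Gaussian unit and the linear output unit above; the centers, covariance, and weights are illustrative:

```python
import numpy as np

def rbf(x, m, S_inv):
    """Gaussian radial unit: y = exp(-(x-m)^T S^{-1} (x-m)), with C = 1."""
    d = x - m
    return np.exp(-d @ S_inv @ d)

x = np.array([1.0, 2.0])
centers = [np.array([0.0, 0.0]), np.array([1.0, 3.0])]
S_inv = np.eye(2)                   # identity covariance for simplicity
W = np.array([0.7, 1.3])            # output-unit weights

y = np.array([rbf(x, m, S_inv) for m in centers])
Y = W @ y                           # linear output: Y = sum_i W_i y_i
```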
Feed-forward ANN applications
• Regression (function approximation)
[Figure: training points (x, y) approximated by a network with three hidden neurons (1, 2, 3) feeding a linear output neuron. With RBF hidden units the approximation is a sum of localized bumps; with sigmoid hidden units it is a sum of smooth steps]
Network training
• Parameters to be determined
– Number of hidden neurons – number of approximating functions
– RBF function parameters (means, covariance matrices) if RBF neurons are
used in the hidden layer
– Sigmoid parameters if sigmoid units are used
– Weights of the output neuron
• Learning strategies
– Supervised
– Mixed: unsupervised learning of the hidden-layer units’ parameters, then supervised
learning of the output weights (see the sketch below)
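The mixed strategy might be sketched as follows: hidden-unit Gaussian centers chosen without labels (here simply on a grid, as a stand-in for any unsupervised method such as clustering), then the output weights solved by supervised least squares; the data and parameters are made up:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 1, 40)           # hypothetical 1D training inputs
d = np.sin(2 * np.pi * x)           # hypothetical targets

# Unsupervised step: choose hidden-unit parameters without using d
M, sigma = 6, 0.15
mu = np.linspace(0, 1, M)           # centers on a grid (clustering stand-in)

# Supervised step: least-squares solution for the output weights
Phi = np.exp(-(x[:, None] - mu)**2 / (2 * sigma**2))   # hidden-layer outputs
w, *_ = np.linalg.lstsq(Phi, d, rcond=None)
y_hat = Phi @ w                     # network output on the training set
```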
RBF supervised training
• Training criterion – minimize the approximation error over the training set:
E = \sum_i \sum_{k=1}^{N} (d_k^i - Y_k^i)^2, \quad Y_k = \sum_{j=1}^{M} w_{kj} \phi_j(\mathbf{x} - \boldsymbol{\mu}_j)
N – number of output units, M – number of hidden units, i – training sample index
• Sample network
– 1D RBF units (e.g. Gaussian), one output unit:
Y = \sum_{j=1}^{M} w_j e^{-(x - \mu_j)^2 / (2\sigma_j^2)}
• Gradient-descent approach
– Output weights:
\Delta w_s = -c \frac{\partial E}{\partial Y} \frac{\partial Y}{\partial w_s} = 2c (d^i - Y^i) e^{-(x^i - \mu_s)^2 / (2\sigma_s^2)}
– RBF parameters (centers):
\Delta \mu_s = -c \frac{\partial E}{\partial Y} \frac{\partial Y}{\partial \mu_s} = 2c (d^i - Y^i) w_s e^{-(x^i - \mu_s)^2 / (2\sigma_s^2)} \frac{x^i - \mu_s}{\sigma_s^2}
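The update rules above might look as follows in NumPy; the data, widths, and learning rate are made up, and the constant factors are folded into the learning rate:

```python
import numpy as np

rng = np.random.default_rng(3)
xs = rng.uniform(-1, 1, 100)        # hypothetical 1D samples
ds = xs**2                          # hypothetical targets

M, sigma = 5, 0.4
mu = np.linspace(-1, 1, M)          # centers mu_j
w = np.zeros(M)                     # output weights w_j
eta = 0.05

for epoch in range(200):
    for x, d in zip(xs, ds):
        phi = np.exp(-(x - mu)**2 / (2 * sigma**2))
        Y = w @ phi                 # Y = sum_j w_j exp(-(x-mu_j)^2/(2 sigma^2))
        err = d - Y
        dw = eta * err * phi                            # ~ (d - Y) phi_s
        dmu = eta * err * w * phi * (x - mu) / sigma**2 # ~ (d - Y) w_s phi_s (x-mu_s)/sigma^2
        w += dw
        mu += dmu
```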
Feedforward NNs: problems
• Overfitting
– For an overly complex net and an insufficient amount of data, the model learns the
training samples rather than the underlying rule. A model should generalize well.
[Figure: 20 training samples fitted with 5 hidden RBF units (smooth approximation) vs. 50 hidden RBF units (the fit chases individual samples)]
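The effect can be reproduced with a least-squares RBF fit; a hypothetical setup contrasting 5 and 50 hidden units on 20 noisy samples (unit width shrinks as more units are used):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(0, 1, 20)                               # 20 training samples
d = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=20)   # noisy targets

def fit_rbf(M):
    """Least-squares fit of M Gaussian units on a grid."""
    mu, sigma = np.linspace(0, 1, M), 1.0 / M
    Phi = np.exp(-(x[:, None] - mu)**2 / (2 * sigma**2))
    w, *_ = np.linalg.lstsq(Phi, d, rcond=None)
    return Phi @ w                  # fitted values at the training points

for M in (5, 50):
    print(M, np.mean((fit_rbf(M) - d)**2))  # training error -> ~0 for M = 50,
                                            # but the 50-unit fit follows the noise
```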
Limitations of ML-FF ANNs
• Local minima
– Only if the error function is convex can one expect a correct training outcome
(gradient descent gets us to the minimum). Unfortunately, error functions for
multi-layer feedforward ANNs are rarely convex …
– Possible ways to alleviate the problem:
• Boltzmann machines
• Multiple initial points
• Regularization
• Overfitting
– If the learning set is not significantly larger than the parameter set, the network learns
the examples, not the rule (there are many well-fitting units)
• The capabilities of multi-layer networks trained using the BP algorithm
and its descendants for solving real-life problems are limited
Summary of multilayer feed-forward ANNs
• Drawbacks
– Learning is a challenge: local minima of the error function result in non-optimal
solutions, as gradient-descent methods cannot find the global minima of non-convex
functions; possible solution – stochastic methods (simulated annealing for global
minimum search)
– Slow convergence of the BP algorithm; possible solution: consider second-order
derivatives in the error approximation (Levenberg-Marquardt)
– Fundamental difficulties with VLSI implementations of nets
– ANNs are hard to analyze (especially feedback nets)
• Advantages
– Theoretically capable of solving hard problems
– Extremely fast execution (if implemented in hardware, but also if simulated)
– Can constantly learn and improve, even after deployment
• Practical applications
– Rare …
– Until recently …
Deep Neural Networks and Deep Learning
• Deep neural networks: a breakthrough in the performance of intelligent data
processing
– Recognition of contents of R^n data: images (object recognition, scene
analysis, image classification)
– Recognition of contents of R^n data sequences: video (action recognition),
speech (recognition, transcription, translation), NLP (document classification,
analysis)
– Generation of R^n data: image objects, textures
– Generation of R^n data sequences: control, description, speech
Recognition
• Classification of image objects
– DNNs perform better than humans
– Categories: 40, examples: 30,000 – accuracy: humans 96%, CNN 99.6%
– Categories: 100, examples: 400,000 – accuracy: humans 82%, CNN 86.1%
Recognition and generation
• Application: autonomous vehicles
– Scene understanding, vehicle control
– Nvidia: https://www.youtube.com/watch?v=qhUvQiKec2U
Generation
• Robot motion control
– Boston Dynamics: https://www.youtube.com/watch?v=-e9QzIkP5qI
DCGAN creations
• Learning abstract concepts: painting style
[Figure: an input image rendered in the styles of Van Gogh and Munch]
– Source: http://www.boredpanda.com/computer-deep-learning-algorithm-painting-masters/
Convolutional Neural Networks
• Automated image annotation
Deep Learning and Convolutional Neural Networks
• Deep
– Multiple layers (dozens, hundreds, thousands)
– A huge number of parameters
– Appropriate measures needed for training
• Convolutional neural networks
[Figure: typical CNN pipeline – Data → Conv 1 (filters S_i, S_j) + ReLU → Pooling 1 (MAX) → Conv 2 + ReLU → Pooling 2 (MAX) → … → Conv n + ReLU → fully-connected ANN → Output]
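The two building blocks named in the pipeline, convolution + ReLU and MAX pooling, might be sketched in plain NumPy as below; the image and filter are made up:

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2-D convolution (no padding, stride 1)."""
    H, W = img.shape
    kh, kw = kernel.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def maxpool2(x):
    """2x2 MAX pooling, stride 2 (odd edges are trimmed)."""
    H, W = x.shape
    return x[:H//2*2, :W//2*2].reshape(H//2, 2, W//2, 2).max(axis=(1, 3))

img = np.random.default_rng(5).normal(size=(8, 8))
edge = np.array([[1.0, -1.0], [1.0, -1.0]])        # a made-up 2x2 filter
feature_map = maxpool2(relu(conv2d(img, edge)))    # Conv -> ReLU -> Pool(MAX)
```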