Artificial Intelligence
Krzysztof Ślot
Institute of Applied Computer Science,
Technical University of Lodz, Poland
Introduction to computing with neural networks – feedforward nets
Introduction
Motivation
• To solve "hard" problems – computationally intense, with unclear rules, etc.
• To mimic "higher" brain functions – recognition, classification, etc.
[Figure: a sample recognition task]
Researching neural networks
• Neurons are living cells that consume energy to operate
• Another illusion, based on the same mechanism, reveals the three-channel nature of visual input
Neural Networks: background
Biological reference
• Architecture of a neuron
[Figure: neuron anatomy – dendrites, cell body with nucleus, axon with myelin sheath and nodes of Ranvier, synapses]
• Neuron's operation
[Figure: action potential – a roughly 1 ms spike from the -70 mV resting potential toward 0 V, followed by a refraction period]
• Firing frequency is proportional to total excitation
• A neuron is a simple multi-input, single-output unit
Modeling a neuron
• Physical modeling
– Pulse propagation phenomenon
– Hodgkin-Huxley model (Nobel prize)
• Functional modeling
– Of interest to AI: provides means for simulating/emulating neural nets
McCulloch-Pitts model
[Figure: inputs x_1, x_2, ..., x_n with weights w_1, w_2, ..., w_n, plus a constant input carrying the threshold weight w_0 = T; the weighted sum s feeds an activation function f]
y = f(\mathbf{w}^T \mathbf{x}')
Activation functions
• Linear: f(s) = s
• Non-linear, differentiable
– Hyperbolic tangent: f(s) = \mathrm{th}(s) = (1 - e^{-\beta s}) / (1 + e^{-\beta s}), f(s) \in (-1 \dots 1)
– Sigmoid: f(s) = 1 / (1 + e^{-\beta s}), f(s) \in (0 \dots 1)
• Non-linear, non-differentiable
– Step: f(s) = 1(s), f(s) \in (0 \dots 1)
– Sign: f(s) = \mathrm{sgn}(s), f(s) \in (-1 \dots 1)
Perceptron
y = f\left( \sum_{i=1}^{n} w_i x_i - T \right) = f\left( \sum_{i=0}^{n} w_i x_i \right)
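As a minimal illustration (not from the slides), the unit above fits in a few lines of NumPy; the step and sigmoid activations follow the definitions given here, and the names and input values are made up:

```python
import numpy as np

def step(s):
    """Step activation: f(s) = 1(s), output in (0...1)."""
    return np.where(s >= 0, 1.0, 0.0)

def sigmoid(s, beta=1.0):
    """Sigmoid activation: f(s) = 1 / (1 + e^{-beta*s}), output in (0...1)."""
    return 1.0 / (1.0 + np.exp(-beta * s))

def neuron(x, w, T, f=step):
    """McCulloch-Pitts unit: y = f(sum_i w_i x_i - T)."""
    s = np.dot(w, x) - T            # total excitation minus threshold
    return f(s)

# Example call; equivalent augmented form: x' = [-1, x], w' = [T, w]
x = np.array([0.5, 1.2])
w = np.array([1.0, 1.0])
print(neuron(x, w, T=2.5))          # 0.0: excitation below threshold
```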
Neuron's function
• How to interpret a neuron's outcome?
– Outcome: assessment of the activation level (dot product)
– Assume only two inputs – the activation is a linear expression: s = w_1 x_1 + w_2 x_2 - T
– Sample parameters: w_1 = 1, w_2 = 1, T = 2.5
– The boundary f(x_1, x_2) = x_1 + x_2 - 2.5 = 0 is a line
– With the step activation, y = f(s) = 0 for s < 0 and 1 for s \geq 0:
x_A = (3, 1.5): s > 0, f(s) = 1, so y = 1; x_B = (0.2, 0.3): s < 0, f(s) = 0, so y = 0
• Interpretation
– Outcome: a decision (e.g. hunt or not)
– Data classification
[Figure: "Frog's world" – prey candidates plotted by size and distance, separated by the decision line]
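The slide's numbers can be checked directly; a small sketch using the step-activation neuron with the parameters given above:

```python
import numpy as np

w = np.array([1.0, 1.0])    # w1 = w2 = 1
T = 2.5                     # threshold

def decide(x):
    """y = 1 (hunt) if w1*x1 + w2*x2 - T >= 0, else y = 0."""
    return 1 if np.dot(w, x) - T >= 0 else 0

print(decide([3.0, 1.5]))   # point A: s =  2.0 -> y = 1
print(decide([0.2, 0.3]))   # point B: s = -2.0 -> y = 0
```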
Training neural networks
Supervised learning algorithms
• Objective: determine weights which provide the desired network operation
• Given:
– Training vector set: { x^i }
– Desired network responses: { d^i } (expected output for the i-th training vector)
– Actual neuron outputs: { y^i }
[Figure: a single neuron with inputs x_0 ... x_n and weights w_0 ... w_n; for some training vector the actual output is y^i = 1 while the expected output is d^i = 0]
• Error for the i-th training vector:
e_i = c (d^i - y^i)^2, where y = f\left( \sum_{k=0}^{n} x_k w_k \right)
so the error is a function of the weights: e_i = f(\mathbf{w})
Supervised learning
• Basic idea – adjust the weights to minimize the error
e_i = c (d^i - y^i)^2, y = f(s) = f(\mathbf{w}^T \mathbf{x})
• Gradient-descent methods (differentiable error function):
\Delta w_k = -\eta \frac{\partial e_i}{\partial w_k}
• Expanding with the chain rule:
\Delta w_k = -\eta \frac{\partial e_i}{\partial y^i} \frac{\partial y^i}{\partial s^i} \frac{\partial s^i}{\partial w_k}
• Result:
\Delta w_k = \eta (d^i - y^i) f'(s^i) x_k^i
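A single gradient step for a sigmoid unit, following the update rule above; the learning rate and data are arbitrary, and \beta = 1/2 is assumed so that f'(s) = f(1 - f):

```python
import numpy as np

eta = 0.1                              # learning rate (arbitrary)
x = np.array([-1.0, 0.4, 0.9])         # input; x_0 = -1 carries the threshold
w = np.array([0.2, -0.3, 0.5])         # current weights
d = 1.0                                # desired output

s = np.dot(w, x)                       # activation s = w^T x
y = 1.0 / (1.0 + np.exp(-s))           # sigmoid output (beta = 1/2 convention)
f_prime = y * (1.0 - y)                # f'(s) = f(1 - f) for this sigmoid

w += eta * (d - y) * f_prime * x       # Delta w_k = eta (d - y) f'(s) x_k
```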
Delta rule
• Linear activation function: y = \sum_l w_l x_l
\Delta w_k = -\eta \frac{\partial e_i}{\partial w_k} = -\eta \frac{\partial}{\partial w_k}\left(d^i - \sum_l w_l x_l^i\right)^2 = \eta (d^i - y^i) x_k^i
• Non-linear activation functions
– Sigmoid: f(s) = 1 / (1 + e^{-2\beta s}); its derivative is f'(s) = 2\beta f (1 - f), so
\Delta w_k = 2\beta\eta (d^i - y^i) f (1 - f) x_k^i
No differentiation required!
– Step and sign functions: \Delta w_k = \eta (d^i - y^i) x_k^i
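A delta-rule training loop for a step-activation unit is sketched below; logical AND is a made-up example task, linearly separable as the rule requires:

```python
import numpy as np

# Training set for logical AND; x_0 = -1 carries the threshold w_0 = T
X = np.array([[-1, 0, 0], [-1, 0, 1], [-1, 1, 0], [-1, 1, 1]], dtype=float)
d = np.array([0, 0, 0, 1], dtype=float)

eta = 0.2
w = np.zeros(3)

for epoch in range(50):
    for xi, di in zip(X, d):
        yi = 1.0 if np.dot(w, xi) >= 0 else 0.0
        w += eta * (di - yi) * xi       # delta rule: no differentiation required

print(w)    # learned weights; w[0] plays the role of the threshold T
```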
Delta-rule learning – geometrical interpretation
[Figure: in the (x_0, x_1) plane, the weight vector w(i-1) is updated to w(i) by \Delta \mathbf{w} = \eta (d^i - y^i) \mathbf{x}^i; here d^i - y^i is negative, so the weight vector rotates away from x^i]
• Direction of change: the vector x^i
Neuron function
• Data classification
– Application domain: linearly-separable tasks
– Minsky, Papert (1969): triggered a recession in ANN research
[Figure: a task that is not linearly separable – a single neuron cannot solve it]
Multi-layer ANN
[Figure: inputs x_0, x_1 feed two first-layer neurons with outputs y_0, y_1; a second-layer neuron combines them into the output y. Combining the two half-plane decisions yields a decision region that a single neuron cannot produce]
Multi-layer ANN
• The first layer feeds the output neuron with binary inputs y_0 ... y_n, so the output neuron's function is linearly separable – e.g. logical OR or AND
[Figure: the resulting decision region in the (x_0, x_1) plane, bounded by the lines of the first-layer neurons]
• Decision regions formed by a two-layer NN are convex
Multi-layer ANN
• ML ANN decision regions in classification tasks
[Figure: table of achievable decision-region shapes per network depth; source: Lippmann, "An Introduction to Computing with Neural Nets"]
Data processing in multi-layer ANNs
• Input vector: X (components x_k, k = 1 ... M)
• First-layer neurons:
y_i^1 = f(s_i^1) = f\left( \sum_{k=1}^{M} w_{ik}^1 x_k \right), \quad i = 1, \dots, N_1
• Output-layer (layer N) neurons – a composition of all the layer mappings:
y_i^N = f(s_i^N) = f\left( \sum_j w_{ij}^N y_j^{N-1} \right) = f\left( \sum_j w_{ij}^N f\left( \sum_k w_{jk}^{N-1} f(\dots) \right) \right)
• Output vector: Y
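A sketch of this layer-by-layer composition in NumPy; the layer sizes and weights are invented, and bias terms are omitted for brevity:

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def forward(x, weights):
    """Propagate x through the layers: y^n = f(W^n y^{n-1}), y^0 = x."""
    y = x
    for W in weights:
        y = sigmoid(W @ y)      # y_i = f( sum_k w_ik y_k )
    return y

rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 3)),     # layer 1: M = 3 inputs -> N_1 = 4 units
           rng.normal(size=(2, 4))]     # output layer: 4 -> 2 units
Y = forward(np.array([0.5, -1.0, 2.0]), weights)
```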
Learning in multi-layer ANNs – notation
[Figure: a two-layer network. Input components x_k feed hidden-layer neurons with weights w_{i,k} and outputs y_i; these feed output-layer neurons with weights W_{l,i} and outputs Y_l (l = 1 ... m). Each layer also receives a constant bias input of -1]
MLP Training
• Supervised setup
• Criterion: mean-squared error (MSE)
E = \sum_{l=1}^{m} (t_l - Y_l)^2, \quad Y_l = f\left( \sum_i W_{l,i} y_i \right), \quad y_i = f\left( \sum_k w_{i,k} x_k \right)
• Weight update for output neurons: the delta rule
\Delta W_{l,i} = \eta (t_l - Y_l) f'(S_l) y_i
• For hidden units the error cannot be directly estimated; we only know E = g(w_{i,k})
• Solution: basic calculus
MLP training
• Derivative of a compound function: the chain rule
\Delta w_{i,k} = -\eta \frac{\partial E}{\partial w_{i,k}} = -\eta \sum_l \frac{\partial E}{\partial Y_l} \frac{\partial Y_l}{\partial y_i} \frac{\partial y_i}{\partial w_{i,k}}
with
\frac{\partial E}{\partial Y_l} = -2 (t_l - Y_l), \quad \frac{\partial Y_l}{\partial y_i} = W_{l,i} f'(S_l), \quad \frac{\partial y_i}{\partial w_{i,k}} = f'(s_i) x_k
• Result (the constant 2 absorbed into \eta):
\Delta w_{i,k} = \eta \sum_{l=1}^{m} (t_l - Y_l) f'(S_l) W_{l,i} f'(s_i) x_k
• Weight update interpretation
– Analogous to the delta rule
– Error: back-projected from the upper layer through the weights W_{l,i}:
\delta_i = \sum_{l=1}^{m} (t_l - Y_l) f'(S_l) W_{l,i}
• This is the Error Back-Propagation (BP) algorithm
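Putting the two update rules together gives the BP algorithm; below is a minimal NumPy sketch for a two-layer network. XOR is a made-up example task, \beta = 1/2 is assumed so f' = f(1 - f), and convergence depends on the random initialization:

```python
import numpy as np

def f(s):                                  # sigmoid; f'(s) = f(1 - f)
    return 1.0 / (1.0 + np.exp(-s))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # inputs x_k
t = np.array([[0], [1], [1], [0]], dtype=float)               # targets t_l

rng = np.random.default_rng(1)
w = rng.normal(size=(2, 3))     # hidden weights w_{i,k} (+ bias column)
W = rng.normal(size=(1, 3))     # output weights W_{l,i} (+ bias column)
eta = 0.5

for epoch in range(10000):
    for x, tl in zip(X, t):
        xb = np.append(x, -1.0)            # bias input of -1
        y = f(w @ xb)                      # hidden outputs y_i = f(s_i)
        yb = np.append(y, -1.0)
        Y = f(W @ yb)                      # outputs Y_l = f(S_l)
        dY = (tl - Y) * Y * (1 - Y)        # (t_l - Y_l) f'(S_l)
        dy = (W[:, :2].T @ dY) * y * (1 - y)   # back-projected error * f'(s_i)
        W += eta * np.outer(dY, yb)        # Delta W_{l,i} = eta dY_l y_i
        w += eta * np.outer(dy, xb)        # Delta w_{i,k} = eta dy_i x_k

for x in X:
    y = f(w @ np.append(x, -1.0))
    print(x, f(W @ np.append(y, -1.0)))    # ideally approaches 0, 1, 1, 0
```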
Networks with radial-basis units
• Radial functions – distance-dependent output:
y = f(r) = f(\|\mathbf{x} - \mathbf{p}\|)
• Typical example: Gaussian
y = C e^{-(\mathbf{x}-\mathbf{m})^T \mathbf{S}^{-1} (\mathbf{x}-\mathbf{m})}
• Net's architecture
[Figure: input vector x feeds N RBF units with outputs y_1 ... y_N; a linear output unit computes Y = \sum_i W_i y_i]
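A direct transcription of the Gaussian unit and the linear output unit above; the centers, covariance, and weights are illustrative:

```python
import numpy as np

def rbf(x, m, S_inv):
    """Gaussian radial unit: y = exp(-(x-m)^T S^{-1} (x-m)), with C = 1."""
    d = x - m
    return np.exp(-d @ S_inv @ d)

x = np.array([1.0, 2.0])
centers = [np.array([0.0, 0.0]), np.array([1.0, 3.0])]
S_inv = np.eye(2)                   # identity covariance for simplicity
W = np.array([0.7, 1.3])            # output-unit weights

y = np.array([rbf(x, m, S_inv) for m in centers])
Y = W @ y                           # linear output: Y = sum_i W_i y_i
```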
Feed-forward ANN applications
• Regression (function approximation)
[Figure: training points (x, y) approximated by a network with three hidden neurons (1, 2, 3) feeding a linear output neuron. With RBF hidden units the approximation is a sum of localized bumps; with sigmoid hidden units it is a sum of smooth steps]
Network training
• Parameters to be determined
– Number of hidden neurons – number of approximating functions
– RBF function parameters (means, covariance matrices) if RBF neurons are
used in the hidden layer
– Sigmoid parameters if sigmoid units are used
– Weights of the output neuron
• Learning strategies
– Supervised
– Mixed: unsupervised learning of the hidden-layer units’ parameters, then supervised
learning of the output weights (see the sketch below)
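The mixed strategy might be sketched as follows: hidden-unit Gaussian centers chosen without labels (here simply on a grid, as a stand-in for any unsupervised method such as clustering), then the output weights solved by supervised least squares; the data and parameters are made up:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 1, 40)           # hypothetical 1D training inputs
d = np.sin(2 * np.pi * x)           # hypothetical targets

# Unsupervised step: choose hidden-unit parameters without using d
M, sigma = 6, 0.15
mu = np.linspace(0, 1, M)           # centers on a grid (clustering stand-in)

# Supervised step: least-squares solution for the output weights
Phi = np.exp(-(x[:, None] - mu)**2 / (2 * sigma**2))   # hidden-layer outputs
w, *_ = np.linalg.lstsq(Phi, d, rcond=None)
y_hat = Phi @ w                     # network output on the training set
```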
RBF supervised training
• Training criterion – minimize the approximation error over the training set:
E = \sum_i \sum_{k=1}^{N} (d_k^i - Y_k^i)^2, \quad Y_k = \sum_{j=1}^{M} w_{kj} \phi_j(\mathbf{x} - \boldsymbol{\mu}_j)
N – number of output units, M – number of hidden units, i – training sample index
• Sample network
– 1D RBF units (e.g. Gaussian), one output unit:
Y = \sum_{j=1}^{M} w_j e^{-(x - \mu_j)^2 / (2\sigma_j^2)}
• Gradient-descent approach
– Output weights:
\Delta w_s = -c \frac{\partial E}{\partial Y} \frac{\partial Y}{\partial w_s} = 2c (d^i - Y^i) e^{-(x^i - \mu_s)^2 / (2\sigma_s^2)}
– RBF parameters (centers):
\Delta \mu_s = -c \frac{\partial E}{\partial Y} \frac{\partial Y}{\partial \mu_s} = 2c (d^i - Y^i) w_s e^{-(x^i - \mu_s)^2 / (2\sigma_s^2)} \frac{x^i - \mu_s}{\sigma_s^2}
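The update rules above might look as follows in NumPy; the data, widths, and learning rate are made up, and the constant factors are folded into the learning rate:

```python
import numpy as np

rng = np.random.default_rng(3)
xs = rng.uniform(-1, 1, 100)        # hypothetical 1D samples
ds = xs**2                          # hypothetical targets

M, sigma = 5, 0.4
mu = np.linspace(-1, 1, M)          # centers mu_j
w = np.zeros(M)                     # output weights w_j
eta = 0.05

for epoch in range(200):
    for x, d in zip(xs, ds):
        phi = np.exp(-(x - mu)**2 / (2 * sigma**2))
        Y = w @ phi                 # Y = sum_j w_j exp(-(x-mu_j)^2/(2 sigma^2))
        err = d - Y
        dw = eta * err * phi                            # ~ (d - Y) phi_s
        dmu = eta * err * w * phi * (x - mu) / sigma**2 # ~ (d - Y) w_s phi_s (x-mu_s)/sigma^2
        w += dw
        mu += dmu
```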
Feedforward NNs: problems
• Overfitting
– For an overly complex net and an insufficient amount of data, the model learns the
training samples rather than the underlying rule. A model should generalize well.
[Figure: 20 training samples fitted with 5 hidden RBF units (smooth approximation) vs. 50 hidden RBF units (the fit chases individual samples)]
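The effect can be reproduced with a least-squares RBF fit; a hypothetical setup contrasting 5 and 50 hidden units on 20 noisy samples (unit width shrinks as more units are used):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(0, 1, 20)                               # 20 training samples
d = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=20)   # noisy targets

def fit_rbf(M):
    """Least-squares fit of M Gaussian units on a grid."""
    mu, sigma = np.linspace(0, 1, M), 1.0 / M
    Phi = np.exp(-(x[:, None] - mu)**2 / (2 * sigma**2))
    w, *_ = np.linalg.lstsq(Phi, d, rcond=None)
    return Phi @ w                  # fitted values at the training points

for M in (5, 50):
    print(M, np.mean((fit_rbf(M) - d)**2))  # training error -> ~0 for M = 50,
                                            # but the 50-unit fit follows the noise
```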
Limitations of ML-FF ANNs
• Local minima
– Only if the error function is convex can one expect a correct training outcome
(gradient descent gets us to the minimum). Unfortunately, error functions for
multi-layer feedforward ANNs are rarely convex …
– Possible ways to alleviate the problem:
• Boltzmann machines
• Multiple initial points
• Regularization
• Overfitting
– If the learning set is not significantly larger than the parameter set, the network learns
the examples, not the rule (there are many well-fitting units)
• The capabilities of multi-layer networks trained using the BP algorithm
and its descendants for solving real-life problems are limited
Summary of multilayer feed-forward ANNs
• Drawbacks
– Learning is a challenge: local minima of the error function result in non-optimal
solutions, as gradient-descent methods cannot find the global minima of non-convex
functions; possible solution – stochastic methods (simulated annealing for global
minimum search)
– Slow convergence of the BP algorithm; possible solution: consider second-order
derivatives in the error approximation (Levenberg-Marquardt)
– Fundamental difficulties with VLSI implementations of nets
– ANNs are hard to analyze (especially feedback nets)
• Advantages
– Theoretically capable of solving hard problems
– Extremely fast execution (if implemented in hardware, but also if simulated)
– Can constantly learn and improve, even after deployment
• Practical applications
– Rare …
– Until recently …
Deep Neural Networks and Deep Learning
• Deep neural networks: a breakthrough in the performance of intelligent data
processing
– Recognition of contents of R^n data: images (object recognition, scene
analysis, image classification)
– Recognition of contents of R^n data sequences: video (action recognition),
speech (recognition, transcription, translation), NLP (document classification,
analysis)
– Generation of R^n data: image objects, textures
– Generation of R^n data sequences: control, description, speech
Recognition
• Classification of image objects
– DNNs perform better than humans
– Categories: 40, examples: 30,000 – accuracy: humans 96%, CNN 99.6%
– Categories: 100, examples: 400,000 – accuracy: humans 82%, CNN 86.1%
Recognition and generation
• Application: autonomous vehicles
– Scene understanding, vehicle control
– Nvidia: https://www.youtube.com/watch?v=qhUvQiKec2U
Generation
• Robot motion control
– Boston Dynamics: https://www.youtube.com/watch?v=-e9QzIkP5qI
DCGAN creations
• Learning abstract concepts: painting style
[Figure: an input image rendered in the styles of Van Gogh and Munch]
– Source: http://www.boredpanda.com/computer-deep-learning-algorithm-painting-masters/
Convolutional Neural Networks
• Automated image annotation
Deep Learning and Convolutional Neural Networks
• Deep
– Multiple layers (dozens, hundreds, thousands)
– A huge number of parameters
– Appropriate measures needed for training
• Convolutional neural networks
[Figure: typical CNN pipeline – Data → Conv 1 (filters S_i, S_j) + ReLU → Pooling 1 (MAX) → Conv 2 + ReLU → Pooling 2 (MAX) → … → Conv n + ReLU → fully-connected ANN → Output]
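The two building blocks named in the pipeline, convolution + ReLU and MAX pooling, might be sketched in plain NumPy as below; the image and filter are made up:

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2-D convolution (no padding, stride 1)."""
    H, W = img.shape
    kh, kw = kernel.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def maxpool2(x):
    """2x2 MAX pooling, stride 2 (odd edges are trimmed)."""
    H, W = x.shape
    return x[:H//2*2, :W//2*2].reshape(H//2, 2, W//2, 2).max(axis=(1, 3))

img = np.random.default_rng(5).normal(size=(8, 8))
edge = np.array([[1.0, -1.0], [1.0, -1.0]])        # a made-up 2x2 filter
feature_map = maxpool2(relu(conv2d(img, edge)))    # Conv -> ReLU -> Pool(MAX)
```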