8. Lecture: Neural Networks - Learning Process, Soft Control (AT 3, RMA)

Page 1

8. Lecture

Neural Networks

Learning Process

Soft Control

(AT 3, RMA)

Page 2

Contents of the 8th lecture

1. Introduction to Soft Control: Definition and Limitations, Basics of "Intelligent" Systems

2. Knowledge Representation and Knowledge Processing (Symbolic AI)
   Application: Expert Systems

3. Fuzzy Systems: Dealing with Fuzzy Knowledge
   Application: Fuzzy Control

4. Connectionist Systems: Neural Networks
   Application: Identification and Neural Control
   1. Basics
   2. Learning

5. Genetic Algorithms: Stochastic Optimization
   Application: Optimization

6. Summary & Literature References

Page 3

Contents of the 7th Lecture

Learning in Neural Networks

Supervised (monitored) learning
  Fixed learning task:
  Given: input E, output A
  Example: Backpropagation

Unsupervised (unmonitored) learning
  Free learning task:
  Given: input E
  Example: Competitive Learning

Page 4

Unsupervised Learning

Learning in Neural Networks

Supervised (monitored) learning
  Fixed learning task:
  Given: input E, output A
  Example: Backpropagation

Unsupervised (unmonitored) learning
  Free learning task:
  Given: input E
  Example: Competitive Learning

Source: Carola Huthmacher

Page 5

Principle of Competitive Learning for the Clustering Problem

Objectives of clustering:
• Differences between objects within the same cluster are minimal
• Differences between objects of different clusters are maximal

Learning through competition:
• Competition principle
• Objective: each group activates exactly one output neuron (binary output)

Page 6

Architecture of a Competitive Learning Network

[Figure: an input vector x ∈ ℝⁿ, e.g. x = (1 0 1 1 ...), feeds the input layer (neurons 1, 2, 3, ..., n); the input layer is fully connected to the competitive layer, whose binary output is y ∈ 𝔹ᵐ, e.g. y = (1 0 ...).]

Page 7

Processes in the Competitive Layer

[Figure: neuron j in the competitive layer receives the input x = (x1, x2, ..., xn) ∈ ℝⁿ through the weights wj1, wj2, ..., wjn.]

• Measure of the offset (angle) between the input vector and the weight vector:
  Sj = Σi wij·xi = |w|·|x|·cos φ
  Sj is large for a small offset (small angle φ)
• Winner: the neuron j with Sj > Sk for all k ≠ j
• Output:
  y_winner = 1
  y_loser = 0
  ("winner takes all")

Page 8

Unsupervised Learning Algorithm

• Initialization:
  Either random weights (normalized weight vectors),
  or vectors taken from the training inputs (normalized) as initial weights
• Competitive process
• Learning:
  The input is a vector x.
  Recalculate the weights of the winner neuron j:
  wj(t+1) = wj(t) + η(t)·[x − wj(t)]
  η(t) is the learning rate (0.01 to 0.3);
  it is gradually reduced during learning.
  Normalization (standardization) of the weight vector
• Termination:
  Stop when a termination criterion is fulfilled.

[Figure: on the unit circle, the winner's weight vector wj(t) is moved by η(t)·[x − wj(t)] toward the input x, yielding wj(t+1).]
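As an illustration of the update rule above, here is a minimal sketch of competitive learning in Python/NumPy. The data set, the number of competitive neurons m, the seed, and the linearly decaying learning-rate schedule are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def competitive_learning(X, m, eta0=0.3, eta_min=0.01, epochs=50, seed=0):
    """Cluster the rows of X with a single competitive layer of m neurons."""
    rng = np.random.default_rng(seed)
    # Initialization: normalized training inputs as initial weight vectors
    W = X[rng.choice(len(X), size=m, replace=False)].astype(float)
    W /= np.linalg.norm(W, axis=1, keepdims=True)

    for e in range(epochs):
        eta = max(eta_min, eta0 * (1 - e / epochs))   # learning rate, gradually reduced
        for x in X[rng.permutation(len(X))]:
            xn = x / np.linalg.norm(x)                # normalize the input
            s = W @ xn                                # S_j = sum_i w_ij * x_i
            j = int(np.argmax(s))                     # winner: S_j > S_k for all k != j
            W[j] += eta * (xn - W[j])                 # w_j(t+1) = w_j(t) + eta(t) * [x - w_j(t)]
            W[j] /= np.linalg.norm(W[j])              # keep the winner's weight vector normalized
    return W

# Usage: two well separated point clouds should end up as two weight vectors
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([5, 0], 0.5, (20, 2)), rng.normal([0, 5], 0.5, (20, 2))])
print(competitive_learning(X, m=2))
```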

Page 9

Advantages and Disadvantages

• Disadvantages:
  Difficult to find a good initialization
  Unstable
  Problem: choosing the number of neurons in the competitive layer
• Advantages:
  Good clustering
  Simple and fast algorithm
  Building block for more complex networks

Page 10

Supervised Learning

Learning in Neural Networks

Supervised (monitored) learning
  Fixed learning task:
  Given: input E, output A
  Example: Backpropagation

Unsupervised (unmonitored) learning
  Free learning task:
  Given: input E
  Example: Competitive Learning

Source: Dr. Van Bang Le

Page 11

The Backpropagation Learning Algorithm

History:
• Werbos (1974)
• Rumelhart, Hinton, Williams (1986)
• A very important and well-known supervised learning procedure for feed-forward networks

Idea:
• Minimize the error function by gradient descent

Consequences:
• Backpropagation is a gradient-based procedure.
• Learning here is mathematics; there is no biological motivation!

Page 12

Task and Aim of Backpropagation Learning

• Learning task:
  A set of input/output examples (training set):
  L = {(x1, t1), ..., (xk, tk)}, where:
  xi = input example (input pattern)
  ti = solution (desired output, target) for input xi
• Learning objective:
  Every task (x, t) from L should be computed by the network with as little error as possible.

Page 13

BP: General Approach to Learning

• Subdivide the available data into
  Training data
  Validation data
• Train until the desired error is reached
• Validate
• Problem: finding the optimal end point for training
  Underfitting
  Overfitting

[Figure: training error and validation error plotted over the training iterations; the training error keeps falling while the validation error eventually rises again.]
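The slide only names the problem of choosing the end point; the sketch below shows one common way to pick it, namely stopping when the validation error no longer improves. The callables passed in (train_one_epoch, validation_error, get_weights, set_weights) are hypothetical placeholders for a concrete network, and the patience value is an arbitrary choice.

```python
def train_with_validation(train_one_epoch, validation_error, get_weights, set_weights,
                          max_epochs=1000, patience=20):
    """Run training epochs and stop roughly where the validation error starts to rise."""
    best_err, best_w, since_best = float("inf"), None, 0
    for epoch in range(max_epochs):
        train_one_epoch()                  # one pass of (e.g.) backpropagation over the training data
        err = validation_error()           # error on the held-out validation data
        if err < best_err:                 # still improving: remember these weights
            best_err, best_w, since_best = err, get_weights(), 0
        else:
            since_best += 1
            if since_best >= patience:     # validation error has stopped improving: overfitting
                break
    set_weights(best_w)                    # restore the best weights found
    return best_err
```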

Page 14

The Backpropagation Learning Algorithm

• Error measure:
  Let (x, t) ∈ L and let y be the actual output of the network when the input is x.
• Error for the pair (x, t):
  E_x,t = ½·||t − y||² = ½·Σi (ti − yi)²
• Total error:
  E = Σ_(x,t)∈L E_x,t = Σ_(x,t)∈L ½·Σi (ti − yi)²
• Note:
  The factor ½ is not relevant (||t − y||² is minimal exactly when ½·||t − y||² is minimal), but it simplifies the formulas later on.
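A short numeric check of the error measure, assuming NumPy; the target t and output y are made-up values.

```python
import numpy as np

t = np.array([1.0, 0.0])            # target t for one pair (x, t)
y = np.array([0.8, 0.3])            # actual network output y for input x

E_xt = 0.5 * np.sum((t - y) ** 2)   # E_x,t = 1/2 * ||t - y||^2 = 1/2 * sum_i (t_i - y_i)^2
print(E_xt)                         # 0.5 * (0.04 + 0.09) = 0.065
```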

Page 15

The Gradient Method

1. Consider the error as a function of the weights.
2. The weight vector w = (w11, w12, ...) corresponds to the point (w, E(w)) on the error surface.
3. Since E is differentiable, the gradient of the error surface can be computed at the point w; descending a fraction of the way along the negative gradient yields a new weight vector w'.
4. Repeat the procedure at the point w', and so on.

[Figure: error surface E(w) over the weights; one descent step leads from w to w'. Axis labels: error (Fehler), weights (Gewichte).]

Page 16

The Gradient

Let f : ℝⁿ → ℝ be a real-valued function.

• Partial derivative of f with respect to xi: ∂f/∂xi
• Gradient of f: ∇f = (∂f/∂x1, ∂f/∂x2, ..., ∂f/∂xn)
• ∇f(x1, ..., xn) points "in the direction of the steepest ascent" of f at the point (x1, ..., xn).
  Direction of descent: −∇f
  Direction of descent in the xi-direction: −∂f/∂xi

Example: f(x1, x2) = ½·x1² − x2, ∇f(x1, x2) = (x1, −1)
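A small NumPy check of the example above: the analytic gradient (x1, −1) is compared against a finite-difference estimate, and one step along −∇f is taken. The starting point and the step size 0.1 are arbitrary illustrations.

```python
import numpy as np

def f(x):                       # f(x1, x2) = 1/2 * x1^2 - x2
    return 0.5 * x[0] ** 2 - x[1]

def grad_f(x):                  # analytic gradient: (x1, -1)
    return np.array([x[0], -1.0])

x = np.array([2.0, 1.0])
h = 1e-6
num_grad = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(2)])
print(grad_f(x), num_grad)      # both approximately (2, -1)

x_new = x - 0.1 * grad_f(x)     # one step in the direction of steepest descent
print(f(x_new) < f(x))          # True: f decreases along -grad f
```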

Page 17

BP for Multilayer Networks

We consider multilayer networks without shortcuts (pure feed-forward networks with connections only between successive layers).

Notation (assume the network has been completely evaluated for the input x, i.e. the forward pass is done):
• A := {i : i is an output neuron}, the set of output neurons.
  For (x, t) ∈ L, y = (oi), i ∈ A, is then the output when the input is x.
• Output of neuron i: oi
• Input of neuron j: netj := Σ_{i: i→j} oi·wij

[Figure: connection from neuron i to neuron j with weight wij.]

Page 18

BP for Multilayer Networks: Notation, Error Function

Error function:
  E = Σ_(x,t)∈L E_x,t with E_x,t = ½·Σ_{j∈A} (tj − oj)²

• oj = f(netj), where f is the activation function of the neurons.
• netj = Σ_{i: i→j} oi·wij

Since f is differentiable, E_x,t and E are also differentiable, and the gradient descent method can be applied!

Offline version: weight change after computing the total error E (batch learning).
Online version: weight change using the current error E_x,t.

Page 19

Sigmoid as Activation Function

Until now the activation function f was the step function, which is not differentiable everywhere.

From now on the sigmoid function s(x) = s1(x) is used as the activation function for all neurons; it is differentiable everywhere:

  sc(x) = 1 / (1 + e^(−c·x))

It holds that: s'(x) = s(x)·(1 − s(x))

[Figure: step function (left) and sigmoid curves s1, s2 for different slopes c (right).]
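A brief NumPy sketch of the sigmoid and the derivative identity from this slide (for c = 1); the finite-difference comparison at the end is just a sanity check.

```python
import numpy as np

def sigmoid(x, c=1.0):
    """s_c(x) = 1 / (1 + exp(-c*x)); the lecture uses s = s_1."""
    return 1.0 / (1.0 + np.exp(-c * x))

def sigmoid_prime(x):
    """s'(x) = s(x) * (1 - s(x))  (for c = 1)."""
    sx = sigmoid(x)
    return sx * (1.0 - sx)

x = np.linspace(-4.0, 4.0, 9)
h = 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)   # finite-difference derivative
print(np.max(np.abs(numeric - sigmoid_prime(x))))       # approximately 0
```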

Page 20

The Backpropagation Learning Algorithm: Online Version

(1) Initialize the weights wij with random values.
(2) Choose a pair (x, t) ∈ L.
(3) Compute the output y for the input x.
(4) Consider the error E_x,t as a function of the weights:
    E_x,t = ½·||t − y||² = E_x,t(w11, w12, ...)
(5) Change each wij by a fraction η (the learning rate) in the direction of the steepest descent of the error:
    wij := wij + η·(−∂E_x,t/∂wij)
(6) If the termination criterion is not met, repeat from (2).

Page 21

The Backpropagation Learning Algorithm: Online Version (2)

Calculation of −∂E_x,t/∂wij for the connection i → j:

For a fixed pair i, j, the error E_x,t is considered as a function of wij (all other weights are treated as constants in this calculation).

• E_x,t depends on the network output y (i.e. on oj, j ∈ A)
• oj, j ∈ A, depends on the input netj of neuron j
• netj depends on wkj and ok for all connections k → j
• ...

So −∂E_x,t/∂wij is determined backward through the network: backpropagation!

Page 22

The Backpropagation Learning Algorithm: Online Version (3)

Dependency: E_x,t depends on netj, and netj depends on wij.

Application of the chain rule:
  ∂E_x,t/∂wij = (∂E_x,t/∂netj)·(∂netj/∂wij)

• ∂netj/∂wij = oi
• δj := −∂E_x,t/∂netj, the "error signal"

Therefore: −∂E_x,t/∂wij = oi·δj

Page 23

The Backpropagation Learning Algorithm: Online Version (4)

Dependency: E_x,t depends on oj, and oj depends on netj.

Application of the chain rule:
  ∂E_x,t/∂netj = (∂E_x,t/∂oj)·(∂oj/∂netj)

• ∂oj/∂netj = f'(netj) = ...
  For the sigmoid activation function f = s this continues as:
  ... = s'(netj) = s(netj)·(1 − s(netj)) = oj·(1 − oj)

Page 24

The Backpropagation Learning Algorithm: Online Version (5)

Calculation of ∂E_x,t/∂oj for the connection i → j:

Case 1: j is an output neuron.

  ∂E_x,t/∂oj = ∂/∂oj [ ½·Σ_{k∈A} (tk − ok)² ] = 2·½·(tj − oj)·(−1) = −(tj − oj)

Page 25

The Backpropagation Learning Algorithm: Online Version (6)

Calculation of ∂E_x,t/∂oj for the connection i → j:

Case 2: j is not an output neuron.

Dependency: oj is passed on to all successor neurons k of j, and E_x,t depends on them.

Application of the chain rule:
  ∂E_x,t/∂oj = Σ_{k: j→k} (∂E_x,t/∂netk)·(∂netk/∂oj) = Σ_{k: j→k} (−δk)·wjk = −Σ_{k: j→k} δk·wjk

Page 26

The Backpropagation Learning Algorithm: Online Version (7)

Summary:

Error signal:
  δj = oj·(1 − oj)·(tj − oj)              if j ∈ A
  δj = oj·(1 − oj)·Σ_{k: j→k} δk·wjk      otherwise

Descent direction for wij: −∂E_x,t/∂wij = oi·δj

Correction for wij: wij := wij + η·oi·δj

To compute δj, all δk for the connections j → k must already be known: backpropagation!

Page 27

The Backpropagation Learning Algorithm: Online Version (8)

• Initialize the weights with random values.
• Choose a termination criterion ε for the total error E.
• Choose a maximum number of epochs e_max.

e := 1
repeat
  E := 0
  for all (x, t) ∈ L do
    • compute E_x,t = ½·Σ_{j∈A} (tj − oj)²
    • E := E + E_x,t
    • compute the error signals δj backward, layer by layer, starting with the output layer
    • wij := wij + η·oi·δj
  endfor
  e := e + 1
until (E fulfills ε) or (e > e_max)
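A compact Python/NumPy sketch of the online algorithm above, for a network with one hidden layer and sigmoid activations everywhere. The network size, the learning rate, the XOR training set, and the constant bias input appended to x (the slides do not treat bias weights) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def s(x):                                          # sigmoid, with s'(x) = s(x) * (1 - s(x))
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative training set L: XOR, with a constant 1 appended to every input as a bias term
L = [(np.array([a, b, 1.0]), np.array([float(a != b)])) for a in (0, 1) for b in (0, 1)]

n_in, n_hid, n_out = 3, 4, 1
W1 = rng.uniform(-0.5, 0.5, (n_hid, n_in))         # small random initial weights (symmetry breaking)
W2 = rng.uniform(-0.5, 0.5, (n_out, n_hid))
eta, eps, e_max = 0.5, 0.01, 20000

for e in range(1, e_max + 1):
    E = 0.0
    for x, t in L:
        o_hid = s(W1 @ x)                          # forward pass: hidden outputs o_j = s(net_j)
        o_out = s(W2 @ o_hid)                      # output layer, y = (o_j), j in A
        E += 0.5 * np.sum((t - o_out) ** 2)        # E_x,t = 1/2 * sum_j (t_j - o_j)^2
        delta_out = o_out * (1 - o_out) * (t - o_out)          # error signal, j in A
        delta_hid = o_hid * (1 - o_hid) * (W2.T @ delta_out)   # otherwise: sum over k of delta_k * w_jk
        W2 += eta * np.outer(delta_out, o_hid)     # w_ij := w_ij + eta * o_i * delta_j
        W1 += eta * np.outer(delta_hid, x)
    if E <= eps:
        break

print("epochs:", e, " total error E:", round(E, 4))
for x, t in L:
    print(x[:2], "->", np.round(s(W2 @ s(W1 @ x)), 2), " target", t)
```

The total error E is printed at the end; it should shrink toward ε as the epochs proceed, matching the termination test in the pseudocode above.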

Page 28

The Backpropagation Learning Algorithm: Offline Version

Offline means that the error is minimized over all input data together.

In this mode the weights are modified only after the presentation of all tasks (x, t) ∈ L:

  wij := wij + Δwij
  with Δwij = η·(−∂E/∂wij) = η·Σ_(x,t)∈L (−∂E_x,t/∂wij) = η·Σ_(x,t)∈L oi(x)·δj(x)
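For comparison with the online sketch above, here is a sketch of one offline (batch) epoch: the corrections oi·δj are accumulated over all pairs in L and applied in a single weight change. The one-hidden-layer layout and the variable shapes follow the online sketch and are assumptions, not part of the slide.

```python
import numpy as np

def batch_epoch(W1, W2, L, eta, s=lambda z: 1.0 / (1.0 + np.exp(-z))):
    """One offline (batch) epoch for a one-hidden-layer sigmoid network:
    accumulate delta_w_ij = eta * o_i * delta_j over all (x, t) in L,
    then apply the summed change once."""
    dW1, dW2, E = np.zeros_like(W1), np.zeros_like(W2), 0.0
    for x, t in L:
        o_hid = s(W1 @ x)
        o_out = s(W2 @ o_hid)
        E += 0.5 * np.sum((t - o_out) ** 2)
        delta_out = o_out * (1 - o_out) * (t - o_out)
        delta_hid = o_hid * (1 - o_hid) * (W2.T @ delta_out)
        dW2 += eta * np.outer(delta_out, o_hid)   # accumulate instead of applying immediately
        dW1 += eta * np.outer(delta_hid, x)
    W2 += dW2                                     # single weight change after presenting all of L
    W1 += dW1
    return E
```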

Page 29

Online vs. Offline

• In offline learning (batch learning) each correction step optimizes the total error function (over all data).
• The descent therefore follows the true gradient direction of the total error function.
• In online learning the weights are adapted immediately after the presentation of each example.
• The direction of this adaptation is in general not the gradient direction.
• If the examples are selected in random order, the gradient is followed on average.
• The online version is necessary if not all pairs (x, t) are known at the beginning of learning (adaptation to new data, adaptive systems), or if the offline version is too expensive.

Page 30

Problems of Backpropagation: Symmetry Breaking

In fully connected, layered feed-forward networks, the weights must not all be initialized with the same value. Otherwise backpropagation always assigns equal values to the weights between two layers:

[Figure: network with input neurons 1, 2, 3, hidden neurons 4, 5, 6 and output neurons 7, 8.]

Initialization: wij = a for all i, j.
After the forward phase:
  o4 = o5 = o6, hence δ4 = δ5 = δ6, hence
  w14 = w15 = w16, w24 = w25 = w26, w34 = w35 = w36,
  w47 = w57 = w67, w48 = w58 = w68

This situation recurs after every forward phase. Such an initialization therefore creates a symmetry that can never be broken again!

Solution: small random initial weights.
Then the network input neti is almost zero for all neurons i, so s'(neti) is large and the network adapts quickly.

Page 31

Problems of Backpropagation: Local Minima

As with all gradient methods, backpropagation may get stuck in a local minimum of the error surface:

[Figure: error E over a weight w with several minima at w0, w1, w2, w3.]

There is no guarantee that a global minimum (optimal weights) will be found.

With a growing number of connections (the dimension of the weight space becomes large) the error surface becomes more and more jagged, so landing in a local minimum becomes likely!

Way out:
• Do not choose the learning rate too small.
• Try several different initializations of the weights.
Experience shows that a minimum found this way is usually an acceptable solution for the concrete application.

Page 32

Problems of Backpropagation: Leaving Good Minima

Leaving good minima:
• The size of a weight change depends on the magnitude of the gradient.
• If a good minimum lies in a steep valley, the magnitude of the gradient can be so large that the good minimum is skipped and a worse minimum in its vicinity is reached instead:

[Figure: error E over a weight w; a large step jumps over a deep, narrow minimum into a shallower one.]

Way out:
• Do not choose the learning rate too large.
• Try several different initializations of the weights.
Experience shows that a minimum found this way is usually an acceptable solution for the concrete application.

Page 33

Problems of Backpropagation: Flat Plateaus

Flat plateaus:
• On very flat parts of the error surface the gradient is small, and the weights change only marginally.
• This requires especially many iteration steps (long training time).
• In the extreme case the weights do not change at all!

[Figure: error E over a weight w with a long flat plateau.]

Page 34

Problems of Backpropagation: Oscillation

Oscillation:
• In steep ravines (gorges) of the error surface the procedure can oscillate.
• At the edges of a steep ravine the weight change throws the weight from one side to the other, because the gradient there has the same magnitude but the opposite sign:

[Figure: error E over a weight w; the weight jumps back and forth between the two walls of a steep ravine.]

Page 35

Modifications of Backpropagation

• There are many modifications that remedy the problems discussed above. All are based on heuristics: in many cases they considerably accelerate convergence.
• However, there are also cases in which the assumption behind the heuristic does not hold, and the result is worse than with standard backpropagation.
• Some popular modifications (a momentum sketch follows below):

  Momentum term (also: conjugate gradient descent): addresses the problems on flat plateaus and in steep ravines. Idea: increase the effective step size on flat plateaus and reduce it in narrow valleys.

  Weight decay: large weights are neurobiologically implausible and cause steep and rugged error surfaces. The error function is therefore modified so that the weights are minimized at the same time (weight decay).

  Quickprop: Heuristic: a valley of the error surface (around a local minimum) can be approximately described by an upward-open parabola. Idea: jump in a single step toward the vertex of the parabola (the expected minimum of the error function).
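A sketch of the momentum-term idea from the list above: part of the previous weight change is carried over into the current one, which damps oscillation in steep ravines and keeps the weights moving on flat plateaus. The momentum factor, the learning rate, and the toy quadratic error surface are illustrative assumptions.

```python
import numpy as np

def momentum_step(W, grad_E, velocity, eta=0.1, alpha=0.9):
    """Gradient step with momentum term:
    delta_w(t) = -eta * dE/dw + alpha * delta_w(t-1)."""
    velocity = -eta * grad_E + alpha * velocity   # carry over part of the previous change
    W = W + velocity
    return W, velocity

# Usage on a toy quadratic error surface E(w) = 1/2 * w^T A w with a "steep ravine":
A = np.diag([50.0, 1.0])                          # very different curvatures: oscillation risk
w = np.array([1.0, 1.0])
v = np.zeros_like(w)
for _ in range(100):
    w, v = momentum_step(w, A @ w, v, eta=0.02, alpha=0.9)   # gradient of E is A @ w
print(w)                                          # w approaches the minimum at (0, 0)
```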

Page 36

Summary and Learning Goals of the 8th Lecture

Know the basic forms of learning in neural networks:
  Supervised
  Unsupervised

Know the idea of learning without a teacher, based on competitive learning.

Know the idea of learning by minimizing the error (with a "teacher"):
  Example: backpropagation

Know backpropagation:
  Procedure
  Possible problems