
Knowledge Discovery in Databases T8: neural networks

P. Berka, 2019

Neural networks

Biological neuron

Models of a single neuron

Scheme of a neuron

1. Logical neuron (McCulloch, Pitts, 1943)

w ∈ R, x, y ∈ {0, 1}


2. ADALINE (Widrow, 1960)

x, w ∈ R, y ∈ {0, 1}

SUM = Σ_{i=1}^{n} w_i x_i

y' = 1 for Σ_{i=1}^{n} w_i x_i > w_0

y' = 0 for Σ_{i=1}^{n} w_i x_i < w_0

[Figure: attribute space (Prostor atributů) — training examples of two classes (A, n) plotted by account balance (konto) and income (příjem)]

příjem + 0.2·konto − 16000 = 0

(analogy with linear regression)
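As a rough illustration of the ADALINE decision rule above (not part of the original slides), the following Python sketch thresholds a weighted sum of the two attributes from the figure; the weights and threshold simply restate the separating line příjem + 0.2·konto − 16000 = 0 and are assumptions, not fitted values.

```python
# Sketch of the ADALINE decision rule: y' = 1 iff sum_i w_i * x_i > w_0.
# Weights chosen to match the illustrative separating line
# prijem + 0.2*konto - 16000 = 0 (an assumption, not a trained model).

def adaline_output(x, w, w0):
    """Return 1 if the weighted sum of the inputs exceeds the threshold w0, else 0."""
    weighted_sum = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if weighted_sum > w0 else 0

# x = [prijem (income), konto (account balance)]
w = [1.0, 0.2]
w0 = 16000.0

print(adaline_output([12000, 50000], w, w0))  # 12000 + 0.2*50000 = 22000 > 16000 -> 1
print(adaline_output([4000, 20000], w, w0))   # 4000 + 0.2*20000 = 8000 <= 16000 -> 0
```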


3. Current models

x, w ∈ R, y ∈ [0, 1] (or [-1, 1])

Transfer (activation) functions

sigmoidal function f(SUM) = 1 / (1 + e^(−SUM)); output of neuron y' is in the range [0, 1],

hyperbolic tangent f(SUM) = tanh(SUM); output of neuron y' is in the range [-1, 1].

Sometimes the nonlinear transformation is missing, i.e. f(SUM) = SUM. In this case the output of the neuron is the weighted sum of its inputs.
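A minimal Python sketch of the three transfer functions just listed (sigmoid, hyperbolic tangent, identity); the function names are my own.

```python
import math

def sigmoid(s):
    """Sigmoidal transfer function: output in [0, 1]."""
    return 1.0 / (1.0 + math.exp(-s))

def tanh(s):
    """Hyperbolic tangent transfer function: output in [-1, 1]."""
    return math.tanh(s)

def identity(s):
    """No nonlinear transformation: the output is the weighted sum itself."""
    return s

SUM = 0.8
print(sigmoid(SUM), tanh(SUM), identity(SUM))
```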


Learning ability

Modification of weights w using training data [xk, yk]

Learning as approximation – looking for parameters

of given function f(x)

Hebb’s law

w_(k+1) = w_k + η · y_k · x_k

gradient method

Err(w) = 1/2 · Σ_{k=1}^{N} (y_k − f(x_k))²

∂/∂w Σ_{k=1}^{N} (y_k − f(x_k))² = 0

w_(k+1) = w_k − η · ∂Err(w)/∂w

for y'_k = f(x_k) = w_k · x_k = Σ_i w_ik x_ik

w_(k+1) = w_k + η · (y_k − y'_k) · x_k

∂F/∂w_i = 1/2 · Σ_{k=1}^{n} ∂/∂w_i (y_k − y'_k)²
        = 1/2 · Σ_{k=1}^{n} 2 (y_k − y'_k) · ∂/∂w_i (y_k − y'_k)
        = Σ_{k=1}^{n} (y_k − y'_k) · ∂/∂w_i (y_k − w_k · x_k)
        = Σ_{k=1}^{n} (y_k − y'_k) (−x_ik)
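A minimal sketch of the delta rule derived above, w ← w + η(y_k − y'_k)x_k, for a single linear neuron; the learning rate, number of epochs and toy data are assumptions chosen only so that the example runs.

```python
# Delta-rule (gradient) learning for a single linear neuron y' = w . x,
# using the update w <- w + eta * (y - y') * x derived above.

def train_linear_neuron(examples, n_inputs, eta=0.1, epochs=200):
    w = [0.0] * n_inputs                                   # initial weights
    for _ in range(epochs):
        for x, y in examples:
            y_pred = sum(wi * xi for wi, xi in zip(w, x))  # y' = w . x
            error = y - y_pred
            w = [wi + eta * error * xi for wi, xi in zip(w, x)]
    return w

# Toy data generated by y = 2*x1 - x2 (the constant input 1.0 plays the role of w0)
examples = [([1.0, 1.0, 1.0], 1.0),
            ([2.0, 1.0, 1.0], 3.0),
            ([1.0, 2.0, 1.0], 0.0),
            ([0.0, 1.0, 1.0], -1.0)]
print(train_linear_neuron(examples, n_inputs=3))  # approaches [2, -1, 0]
```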


Error function for linear activation function

Error function for threshold activation function


Perceptron

Rosenblatt 1957, model of individual neurons in the

visual cortex of cats

(Kotek et al., 1980)

Hierarchical system composed of three layers:

receptors (0, 1 outputs)

associative elements (fixed weights +1, -1)

reacting elements (weighted sum Σ_i w_i x_i)

Learning only on the layer of reacting elements


The non-equivalence problem: A ⊕ B

more generally, the problem of tasks that are not

linearly separable

Minsky M., Papert S.: Perceptrons, an introduction

to computational geometry, MIT Press 1969 –

criticism of neural networks

„Silent years“

A \ B   0   1
0       F   T
1       T   F


Rebirth of neural networks, 1980s

More structured networks - Hopfield, Hecht-Nielsen,

Rumelhart and Kohonen

1. the ability to approximate an arbitrary continuous

function

an arbitrary logical function (in DNF form) can be expressed using a three-layer network consisting of neurons for conjunction, disjunction and negation

conjunction

disjunction

negation

E.g. non-equivalence A ⊕ B = (A ∧ ¬B) ∨ (¬A ∧ B)

2. new learning algorithms

[Figure: three-layer network of logical neurons computing the non-equivalence]
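A small sketch (weights and thresholds are my own choice, since the values in the original network diagram did not survive the conversion) showing how the non-equivalence A ⊕ B = (A ∧ ¬B) ∨ (¬A ∧ B) can be computed by a three-layer network of logical threshold neurons.

```python
# Non-equivalence (XOR) built from logical threshold neurons:
# a neuron fires (outputs 1) when its weighted input sum reaches its threshold.
# Illustrative weights/thresholds, not the ones from the slide's diagram.

def neuron(inputs, weights, threshold):
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0

def non_equivalence(a, b):
    h1 = neuron([a, b], [1, -1], 1)     # fires for A AND NOT B
    h2 = neuron([a, b], [-1, 1], 1)     # fires for NOT A AND B
    return neuron([h1, h2], [1, 1], 1)  # disjunction of the two hidden neurons

for a in (0, 1):
    for b in (0, 1):
        print(a, b, '->', non_equivalence(a, b))   # 0 0->0, 0 1->1, 1 0->1, 1 1->0
```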


Multilayer perceptron (MLP)

Network used for classification or prediction

3 layers:

input – distributes data to next layer

hidden

output – gives the results

sigmoidal activation function for neurons in hidden and

output layer


Backpropagation – supervised learning

Minimization of the error F(w) = 1/2 · Σ_{k=1}^{N} Σ_{v ∈ outputs} (y_vk − y'_vk)²

Stopping criteria:

number of iterations

value of the error

change of error between iterations

Backpropagation algorithm

1. initialize the weights in the network with small random numbers (e.g. from the interval [-0.05, 0.05])

2. repeat

2.1. for every example [x, y]

2.1.1. compute the output o_u for each neuron u in the network

2.1.2. for each neuron v in the output layer compute the error
error_v = o_v (1 − o_v) (y_v − o_v)

2.1.3. for each neuron h in the hidden layer compute the error
error_h = o_h (1 − o_h) Σ_{v ∈ outputs} (w_h,v · error_v)

2.1.4. for each link from neuron j to neuron k modify the weight of the link
w_j,k = w_j,k + Δw_j,k, where Δw_j,k = η · error_k · x_j,k

until the stopping criterion is satisfied
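The following Python sketch implements the updates listed above for a network with one hidden layer of sigmoid neurons and a single sigmoid output; the variable names, learning rate, number of epochs and the toy non-equivalence data are my own assumptions, not part of the slides.

```python
import math, random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def train_mlp(data, n_in, n_hidden, eta=0.5, epochs=5000, seed=1):
    rnd = random.Random(seed)
    # step 1: initialize weights with small random numbers from [-0.05, 0.05]
    w_hid = [[rnd.uniform(-0.05, 0.05) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [rnd.uniform(-0.05, 0.05) for _ in range(n_hidden + 1)]
    for _ in range(epochs):                              # step 2: repeat
        for x, y in data:                                # step 2.1: for every example [x, y]
            xb = x + [1.0]                               # inputs plus a constant bias input
            # 2.1.1 compute the output of every neuron
            o_h = [sigmoid(sum(w * xi for w, xi in zip(wh, xb))) for wh in w_hid]
            ob = o_h + [1.0]
            o_v = sigmoid(sum(w * oi for w, oi in zip(w_out, ob)))
            # 2.1.2 error of the output neuron
            err_v = o_v * (1 - o_v) * (y - o_v)
            # 2.1.3 errors of the hidden neurons
            err_h = [oh * (1 - oh) * w_out[j] * err_v for j, oh in enumerate(o_h)]
            # 2.1.4 weight updates: delta w_jk = eta * error_k * x_jk
            w_out = [w + eta * err_v * oi for w, oi in zip(w_out, ob)]
            w_hid = [[w + eta * err_h[j] * xi for w, xi in zip(w_hid[j], xb)]
                     for j in range(n_hidden)]
    return w_hid, w_out

# toy training data: the non-equivalence problem from the earlier slide
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
w_hid, w_out = train_mlp(data, n_in=2, n_hidden=2)
```

Here the stopping criterion is simplified to a fixed number of epochs; in practice any of the criteria listed above can be used.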


Implementation (Weka)


RBF network

Network used for classification or prediction

3 layers:

input – distributes data to next layer

hidden – RBF activation function, e.g. the Gaussian exp(−(x − c)² / (2σ²))

output – linear activation function
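A small sketch of a Gaussian RBF hidden unit and a linear output layer as described above; the centers, widths and weights are illustrative assumptions.

```python
import math

def rbf(x, c, sigma):
    """Gaussian radial basis function exp(-||x - c||^2 / (2 * sigma^2))."""
    dist2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return math.exp(-dist2 / (2.0 * sigma ** 2))

def rbf_network(x, centers, sigmas, weights, w0):
    """Linear combination of the RBF hidden units (linear output activation)."""
    return w0 + sum(w * rbf(x, c, s) for w, c, s in zip(weights, centers, sigmas))

# illustrative parameters: two hidden RBF units in a 2-dimensional attribute space
centers = [[0.0, 0.0], [1.0, 1.0]]
sigmas = [0.5, 0.5]
weights = [1.0, -1.0]
print(rbf_network([0.2, 0.1], centers, sigmas, weights, w0=0.0))
```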


Implementation (EM)


Differences between MLP and RBF (Nauck, Klawonn, Kruse, 1997)

MLP is a "global" classifier, RBF builds a "local" model

MLP is more suitable for linearly separable tasks,

RBF is more suitable for isolated elliptic clusters


Kohonen map (SOM)

Network used for segmentation and clustering, competition

among neurons (winner takes all).

2 layers:

input – distributes input data to next layer

Kohonen map – activation function ||x – w||

lateral inhibition

unsupervised learning:

F(w) = 1/2 · Σ_{k=1}^{N} (x_k − w)²

w_(k+1) = w_k + η · (x_k − w_k) · y'_k

[Figure: distance from the neuron (vzdálenost od neuronu)]
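A minimal sketch of the winner-takes-all competition and the update rule w ← w + η(x_k − w) applied only to the winning neuron; the map size, learning rate and the omission of a neighbourhood function are simplifying assumptions on my part.

```python
import random

def winner(x, weights):
    """Competition: the winner is the neuron whose weight vector is closest to x."""
    dists = [sum((xi - wi) ** 2 for xi, wi in zip(x, w)) for w in weights]
    return dists.index(min(dists))

def train_som(data, n_neurons, dim, eta=0.3, epochs=50, seed=0):
    rnd = random.Random(seed)
    weights = [[rnd.random() for _ in range(dim)] for _ in range(n_neurons)]
    for _ in range(epochs):
        for x in data:
            win = winner(x, weights)                 # winner takes all
            # move only the winning neuron towards the example: w <- w + eta * (x - w)
            weights[win] = [wi + eta * (xi - wi) for wi, xi in zip(weights[win], x)]
    return weights

# toy data: two obvious clusters in the plane
data = [[0.1, 0.1], [0.2, 0.0], [0.0, 0.2], [0.9, 0.9], [1.0, 0.8], [0.8, 1.0]]
print(train_som(data, n_neurons=2, dim=2))           # one weight vector per cluster
```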


Implementation (Clementine)

Kohonen map

Found clusters


Application of neural networks

select the type of network (topology, neurons) – hyperparameter tuning

(see e.g. https://playground.tensorflow.org)

choose training data

train the network

[Figure: attribute space (Prostor atributů) — the same income (příjem) / account balance (konto) example with classes A and n as above]

Expressive power of MLP

neural networks suitable for numeric attributes

neural networks can express complex clusters in the

attribute space

the models found are hard to interpret


Example 1 - Classification

Task: credit risk assessment of a bank client

Input: data about loan applicant

Output: decision „grant/don’t grant the loan“

Solution:

Network topology: MLP

Input layer: one neuron for each characteristic

of the applicant

Output layer: single neuron (decision)

Hidden layer: ??

Learning: training data consists of information

about past applications together with the decision

of the bank


Example 2 - Prediction

Task: exchange rate prediction

Input: past values of exchange rate

Output: future values of exchange rate

Solution:

Network topology: MLP

Input layer: one neuron for each past value

Output layer: one neuron for each future value

Hidden layer: ??

Learning: past and current data are used to build the model

[Figure: time series y(t) sampled at times t0, t1, t2, ...]

inputs                              output
y(t0) y(t1) y(t2) y(t3)             y(t4) [sign(y(t4) − y(t3))]
y(t1) y(t2) y(t3) y(t4)             y(t5) [sign(y(t5) − y(t4))]
y(t2) y(t3) y(t4) y(t5)             y(t6) [sign(y(t6) − y(t5))]
. . .
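A small sketch of how the training examples in the table above can be generated from a time series by a sliding window; the window length of four inputs and the option of predicting either the next value or only the sign of its change follow my reading of the table and are otherwise assumptions.

```python
def make_windows(series, n_inputs=4, predict_sign=False):
    """Build (inputs, output) pairs with a sliding window: inputs y(t)...y(t+n-1),
    output y(t+n), or only the sign of the change when predict_sign is True."""
    examples = []
    for t in range(len(series) - n_inputs):
        inputs = series[t:t + n_inputs]
        nxt, last = series[t + n_inputs], series[t + n_inputs - 1]
        change = nxt - last
        output = (1 if change > 0 else -1 if change < 0 else 0) if predict_sign else nxt
        examples.append((inputs, output))
    return examples

rates = [24.1, 24.3, 24.0, 24.2, 24.5, 24.4, 24.6]   # illustrative exchange rates
for x, y in make_windows(rates, predict_sign=True):
    print(x, '->', y)
```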


New Trend – Deep Learning

Neural network architectures with more hidden

layers.

Convolutional neural network: deep (multi-layered), feed-forward artificial neural networks typically used for analyzing images

the layers are organized in 3 dimensions: width, height and depth

the neurons in one layer do not connect to all the

neurons in the next layer but only to a small

region of it

the final output is a single vector of probability

scores, organized along the depth dimension

Convolutional network performs both feature

construction and classification


multi-layer vs. convolutional network

Convolution (combination of two functions): e.g. combines the input with a filter by performing a matrix multiplication

ReLU: activation function

f(x) = max(0,x)

Pooling: non-linear down-

sampling (data

compression), e.g. max

pooling
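A small sketch of the ReLU activation and 2×2 max pooling mentioned above, written with plain nested lists; the feature-map values and the pooling window size are assumptions for illustration.

```python
def relu(x):
    """ReLU activation f(x) = max(0, x)."""
    return max(0.0, x)

def max_pool_2x2(feature_map):
    """Non-linear down-sampling: keep the maximum of every 2x2 block."""
    pooled = []
    for i in range(0, len(feature_map) - 1, 2):
        row = []
        for j in range(0, len(feature_map[i]) - 1, 2):
            block = [feature_map[i][j], feature_map[i][j + 1],
                     feature_map[i + 1][j], feature_map[i + 1][j + 1]]
            row.append(max(block))
        pooled.append(row)
    return pooled

fm = [[1.0, -2.0, 0.5, 3.0],
      [0.0, 4.0, -1.0, 2.0],
      [-3.0, 1.0, 2.0, 0.0],
      [2.0, 0.5, -1.0, 1.0]]
fm = [[relu(v) for v in row] for row in fm]   # apply ReLU element-wise
print(max_pool_2x2(fm))                       # [[4.0, 3.0], [2.0, 2.0]]
```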


SVM

1. change (using data transformations) classification tasks where the classes are not linearly separable into tasks where the classes are linearly separable

Φ(x) = Φ(x1, x2) = (x1², √2·x1·x2, x2²) = z

2. Find a separating hyperplane which has maximal distance from the transformed examples in the training data (maximum margin hyperplane). It is sufficient to work only with the examples closest to the hyperplane (support vectors)


we are looking for a linear discriminant function

f(x) = w · Φ(x) + w0, where w = Σ_i α_i y_i Φ(x_i)

(maximum margin hyperplane)

f(x) = Σ_i α_i y_i Φ(x_i) · Φ(x) + w0

Kernel trick – special functions (kernel functions) can be used during the computations:

f(x) = Σ_i α_i y_i K(x_i, x) + w0

E.g.

Polynomial kernel K(x_i, x) = (x_i · x)^d

Gaussian kernel K(x_i, x) = exp(−‖x_i − x‖²)

Tanh kernel K(x_i, x) = tanh(x_i · x + c)
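A small sketch of the three kernels listed above together with the decision function f(x) = Σ_i α_i y_i K(x_i, x) + w0; the support vectors, coefficients α_i and kernel parameters here are illustrative assumptions, not the result of actually solving the SVM optimisation problem.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def polynomial_kernel(xi, x, d=2):
    return dot(xi, x) ** d

def gaussian_kernel(xi, x, gamma=1.0):      # gamma is an assumed width parameter
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, x)))

def tanh_kernel(xi, x, c=-1.0):
    return math.tanh(dot(xi, x) + c)

def svm_decision(x, support_vectors, alphas, labels, w0, kernel):
    """f(x) = sum_i alpha_i * y_i * K(x_i, x) + w0; classify by the sign of f(x)."""
    return sum(a * y * kernel(sv, x)
               for a, y, sv in zip(alphas, labels, support_vectors)) + w0

# illustrative support vectors and coefficients (not a trained SVM)
svs = [[1.0, 1.0], [-1.0, -1.0]]
alphas = [0.7, 0.7]
labels = [+1, -1]
print(svm_decision([0.8, 0.9], svs, alphas, labels, w0=0.0, kernel=gaussian_kernel))
```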