nn perceptron 1 - cneuro.rmki.kfki.hu/sites/default/files/nn_perceptron_1_0.pdf

Page 1

Idegrendszeri modellezés (Modeling of the Nervous System)

Orbán Gergő

http://golab.wigner.mta.hu

Page 2

1

Neurons and neural networks I.

Page 3

2

High-level vs. low level modeling

Previously in this course:

* Mechanistic description of neurons
* Descriptive models of neuronal operations

Goals of the next classes:

Are these models compatible with the neural architecture?
How can neural computations be interpreted in this framework?

Page 4

3

Outline

• single neuron
• neural network
• supervised learning
• unsupervised learning

Page 5

4

Neural networks, basic concepts

The goal of a neural network can be phrased as:
• learning about its input
• forming memories

Address-based memory
• not associative
• not fault tolerant
• not distributed

Biological memory
• associative
• error tolerant
  • to cue errors
  • to hardware faults
• parallel and distributed

Page 6

5

Terminology

Architecture: what variables are involved and how they interact (weights, activities)

Activity rule: short time-scale behavior; how the activity of neurons depends on their inputs

Learning rule: long time-scale behavior; how the weights change as a function of the activities of other neurons, time, or other factors

Page 7

6

Terminology

Supervised learning: labelled data is provided, i.e. an input-output relationship is given; the ‘teacher’ tells whether the response of the neuron to a given input was correct.

Unsupervised learning: only unlabelled data is available; the network learns the statistics of the input, either by generating a table of occurrences or by learning some more sophisticated representation.

Page 8

7

Single neuron as a classifier: Definitions

Architecture: I inputs x_i, weights w_i

Activity rule:
1. the activation is calculated
2. the output is calculated
   • linear
   • logistic
   • tanh
   • threshold
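A minimal sketch of this activity rule in Python; the notation a = Σ_i w_i x_i for the activation and the four output functions follow the slide, while the sample weight and input values are invented for illustration:

```python
import numpy as np

def activation(w, x):
    # Step 1 of the activity rule: the activation is the weighted sum of inputs.
    return np.dot(w, x)

# Step 2: the output y = f(a), for the four choices listed on the slide.
def linear(a):    return a
def logistic(a):  return 1.0 / (1.0 + np.exp(-a))   # output in (0, 1)
def tanh_out(a):  return np.tanh(a)                  # output in (-1, 1)
def threshold(a): return np.where(a > 0, 1.0, 0.0)   # hard binary decision

w = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 0.0, 0.5])    # x[0] = 1 can serve as a constant bias input
a = activation(w, x)             # 0.5*1 - 1*0 + 2*0.5 = 1.5
```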


Page 10

8

Single neuron as a classifier: Concepts

(Figures: the input space and the weight space of the neuron)

Page 13

9

Single neuron as a classifier: Concepts

The supervised learning paradigm: given example inputs x and target outputs t, learn the mapping between them.

The trained network is supposed to give the ‘correct’ response for any given input stimulus.

training is equivalent to learning the appropriate weights

in order to achieve this, an objective function (or error function) is defined, which is minimized during training

Page 14

10

Binary classification in a neuron

Error function; two readings of the same quantity:
1. the information content of the outcomes
2. the relative entropy between the distributions (t, 1-t) and (y, 1-y)

Learning: the back-propagation algorithm ("here is the error")
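The error function on the original slide is an equation lost in conversion; given the two readings listed, it is presumably the cross-entropy G(w) = -Σ_n [t_n ln y_n + (1 - t_n) ln(1 - y_n)], i.e. the information content of the outcomes, which up to a t-dependent constant equals the relative entropy between (t, 1-t) and (y, 1-y). A sketch under that assumption:

```python
import numpy as np

def cross_entropy(t, y, eps=1e-12):
    # G(w) = -sum_n [ t_n ln y_n + (1 - t_n) ln(1 - y_n) ]
    y = np.clip(y, eps, 1.0 - eps)   # guard the logarithms at y = 0 or 1
    return -np.sum(t * np.log(y) + (1.0 - t) * np.log(1.0 - y))

t = np.array([1.0, 0.0, 1.0])   # binary targets
y = np.array([0.9, 0.2, 0.8])   # logistic outputs of the neuron
G = cross_entropy(t, y)
```

For a logistic output this choice also gives the simple gradient ∂G/∂w_i = Σ_n (y_n - t_n) x_i,n: the quantity (y - t) is the "error" that the learning rule propagates back to the weights.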

Page 17

11

Binary classification: The algorithm

1. Get the data
2. Calculate the activations
3. Calculate the output of the neuron
4. Calculate the error
5. Update the weight(s)

Alternatively, batch learning can be done:
• gradient descent
• stochastic gradient descent
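The five steps can be sketched as a stochastic gradient descent loop. This assumes a logistic output neuron and the cross-entropy error, whose per-example gradient is (y - t) x; the learning rate and the toy data set are invented for illustration:

```python
import numpy as np

def train_sgd(X, t, eta=0.1, epochs=200, rng=np.random.default_rng(0)):
    """Online (stochastic) gradient descent for a single logistic neuron."""
    N, I = X.shape
    w = np.zeros(I)
    for _ in range(epochs):
        for n in rng.permutation(N):          # 1. get a data point
            a = np.dot(w, X[n])               # 2. activation
            y = 1.0 / (1.0 + np.exp(-a))      # 3. logistic output
            e = y - t[n]                      # 4. error signal dG/da
            w -= eta * e * X[n]               # 5. weight update
    return w

# Toy linearly separable data; the first column is a constant bias input.
X = np.array([[1, 0.0], [1, 1.0], [1, 3.0], [1, 4.0]])
t = np.array([0.0, 0.0, 1.0, 1.0])
w = train_sgd(X, t)
y = 1.0 / (1.0 + np.exp(-X @ w))
```

For batch gradient descent, sum the per-example gradients over the whole data set before each single weight update; the stochastic variant above updates after every example instead.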

Page 22

12

Binary classification: Demonstration

(Figures: the evolution of the weights and of the error function during training)

Page 28

13

Binary classification: Are we happy?

The w's seem to grow unbounded. Is this bad for us?

Generic painkiller: regularization (weight decay)
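Weight decay in this setting means adding a penalty αE_W(w), with E_W(w) = ½ Σ_i w_i², to the error function, so the gradient acquires an extra αw term and every update shrinks the weights a little. A sketch assuming the logistic neuron used above; α is a free hyperparameter:

```python
import numpy as np

def regularized_update(w, x, t, eta=0.1, alpha=0.01):
    """One SGD step on M(w) = G(w) + alpha * (1/2) * sum(w**2)."""
    y = 1.0 / (1.0 + np.exp(-np.dot(w, x)))
    grad = (y - t) * x + alpha * w      # data gradient + weight-decay term
    return w - eta * grad

w = np.array([5.0, -5.0])               # large weights
x = np.zeros(2)                          # with no input evidence...
w_new = regularized_update(w, x, t=0.0)
# ...the decay term alone pulls the weights toward zero.
```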

Page 32

14

Binary classification: Effect of regularization

Page 34

15

Capacity of the neuron

How much information can the neuron transmit?
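One standard way to make this question concrete (an illustration, not spelled out on the slide) is Cover's function-counting theorem: a linear threshold neuron with I weights can realize T(N, I) = 2 Σ_{k=0}^{I-1} C(N-1, k) of the 2^N possible labellings of N input points in general position, and the realizable fraction falls to 1/2 at N = 2I, which is the sense in which the capacity of the neuron is roughly two patterns per weight:

```python
from math import comb

def realizable_fraction(N, I):
    """Fraction of the 2**N dichotomies of N points (in general position)
    in I dimensions that a linear threshold neuron can implement
    (Cover's function-counting theorem)."""
    T = 2 * sum(comb(N - 1, k) for k in range(I))
    return T / 2**N

# Below capacity every labelling is realizable; at N = 2*I exactly half are.
```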

Page 35

16

Interpreting learning as inference

So far: optimization with respect to an objective function (the regularized error)

What's this quirky regularizer, anyway?

Page 37

17

Interpreting learning as inference

Let's interpret y(x, w) as a probability; in a compact form, the likelihood of the data can then be expressed with the original error function.

The regularizer has the form of a prior!

What we get in the objective function M(w) is the posterior distribution of w.

Page 42

18

Interpreting learning as inference: Relationship between M(w) and the posterior

Interpretation: minimizing M(w) amounts to finding the maximum a posteriori estimate w_MP.

The log-probability interpretation of the objective function retains the additivity of errors while keeping the multiplicativity of probabilities.
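The relationship can be written out explicitly (a reconstruction in standard notation, assuming the cross-entropy error G and the quadratic regularizer E_W used earlier):

```latex
M(\mathbf{w}) = G(\mathbf{w}) + \alpha E_W(\mathbf{w}),
\qquad
E_W(\mathbf{w}) = \tfrac{1}{2} \sum_i w_i^2 .

% Exponentiating turns the additive objective into a product of probabilities:
e^{-M(\mathbf{w})}
  = e^{-G(\mathbf{w})} \, e^{-\alpha E_W(\mathbf{w})}
  \;\propto\; P(\{t_n\} \mid \{\mathbf{x}_n\}, \mathbf{w}) \, P(\mathbf{w})
  \;\propto\; P(\mathbf{w} \mid \mathrm{data}) .
```

So the likelihood comes from the error term, the Gaussian prior from the regularizer, and minimizing M(w) is the same as maximizing the posterior, i.e. finding w_MP.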

Page 43

19

Interpreting learning as inference: Properties of the Bayesian estimate

• The probabilistic interpretation makes our assumptions explicit: with the regularizer we imposed a soft constraint on the learned parameters, which expresses our prior expectations.

• An additional plus: beyond obtaining w_MP, we get a measure of the uncertainty of the learned parameters.