Artificial Neural Networks: An Introduction


Page 1: Artificial Neural Networks

Artificial Neural Networks

An Introduction

Page 2: Artificial Neural Networks

Outline

• Introduction

• Biological and artificial neurons

• Perceptrons (problems)

• Backpropagation network

• Training

• Other ANNs (examples in HEP)

Page 3: Artificial Neural Networks

Introduction - What are ANNs?

• Artificial Neural Networks:
  – data analysis tools (computational modelling tools)
  – model complex real-world problems
  – structures comprised of densely interconnected simple processing elements
  – each element is linked to its neighbours with varying strengths
  – learning is accomplished by adjusting these strengths so that the network outputs appropriate results
  – learn from experience (rather than being explicitly programmed with rules)
  – inspired by biological neural networks (the idea is not to replicate the operation of biological systems, but to use what is known of their functionality to solve complex problems)

Page 4: Artificial Neural Networks

• Information processing characteristics:
  – nonlinearity (allows a better fit to the data)
  – fault and failure tolerance (for uncertain data and measurement errors)
  – learning and adaptivity (allows the system to update its internal structure in response to a changing environment)
  – generalization (enables application of the model to unlearned data)

• Generally ANNs outperform other computational tools in solving a variety of problems:
  – Pattern classification: categorizes a set of input patterns in terms of different features
  – Clustering: clusters are formed by exploring similarities between input patterns based on their inter-correlations
  – Function approximation: the ANN is trained to approximate the underlying rules relating the inputs to the outputs

Page 5: Artificial Neural Networks

Biological Neuron

• 3 major functional units:
  – Dendrites
  – Cell body
  – Axon
• Synapse
• Amount of signal passing through a neuron depends on:
  – Intensity of the signal from feeding neurons
  – Their synaptic strengths
  – Threshold of the receiving neuron
• Hebb rule (plays a key part in learning): a synapse which repeatedly triggers the activation of a postsynaptic neuron will grow in strength; others will gradually weaken.
• Learning is achieved by adjusting the magnitudes of the synapses' strengths

[Figure: schematic neuron with inputs x1, x2, …, xn, weights w1, w2, …, wn, summed input ξ, and activation g(ξ) producing output y]

Page 6: Artificial Neural Networks

Artificial Neurons (basic computational entities of an ANN)

• Analogy between artificial and biological (connection weights represent synapses)

• In 1958 Rosenblatt introduced the mechanics (the perceptron)
• Input to output: y = g(∑i wi xi)
• Only when the sum exceeds the threshold limit will the neuron fire
• Weights can enhance or inhibit
• The collective behaviour of neurons is what is interesting for intelligent data processing

[Figure: artificial neuron with inputs x1, x2, x3, weights w1, w2, w3, weighted sum ∑ w·x, and activation g(·) giving output y]
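As an illustration of the thresholded weighted sum described above, here is a minimal sketch in Python (the function name and the numbers are illustrative, not from the slides):

```python
import numpy as np

def neuron_output(x, w, threshold=0.0):
    """Single artificial neuron: fire (output 1) only when the weighted
    sum of the inputs exceeds the threshold, otherwise output 0."""
    xi = np.dot(w, x)                     # weighted sum, sum_i w_i * x_i
    return 1 if xi > threshold else 0

# Positive weights enhance an input's contribution, negative weights inhibit it.
print(neuron_output(np.array([1.0, 1.0]), np.array([0.8, -0.3]), threshold=0.2))  # -> 1
print(neuron_output(np.array([0.0, 1.0]), np.array([0.8, -0.3]), threshold=0.2))  # -> 0
```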

Page 7: Artificial Neural Networks

Perceptrons

• Can be trained on a set of examples using a special learning rule (process)
• Weights are changed in proportion to the difference (error) between the target output and the perceptron's solution for each example.
• Minimize the summed square error function

  E = ½ ∑p ∑i (oi^(p) − ti^(p))²

  with respect to the weights.
• The error is a function of all the weights and forms an irregular multidimensional hypersurface with many peaks, saddle points and minima.
• The error is minimized by finding the set of weights that corresponds to the global minimum.
• This is done with the gradient descent method (the weights are incrementally updated in proportion to ∂E/∂wij).
• The update reads: wij(t + 1) = wij(t) + Δwij, with Δwij = −η ∂E/∂wij.
• The aim is to produce a true mapping for all patterns.

[Figure: perceptron with inputs xj, weights wij, summed input ξ, threshold, and activation g(ξ) producing outputs oi]

Page 8: Artificial Neural Networks

Summary of Learning for Perceptron

1. Initialize wij with random values.

2. Repeat until wij(t + 1) ≈ wij(t):
   • Pick a pattern p from the training set.
   • Feed the input to the network and calculate the output.
   • Update the weights according to

     wij(t + 1) = wij(t) + Δwij, where Δwij = −η ∂E/∂wij.

3. When no change (within some accuracy) occurs, the weights are frozen and the network is ready to use on data it has never seen. A sketch of these steps is given below.
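The following Python sketch is illustrative, not the lecture's code: the classic perceptron rule with a step activation, where the error-derivative term reduces to (target − output) · input and the threshold is absorbed as a bias weight on a constant input.

```python
import numpy as np

def train_perceptron(X, t, eta=0.1, epochs=100):
    """Minimal perceptron training sketch.
    X: (n_patterns, n_inputs) array; t: target outputs (0 or 1)."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])    # prepend constant-1 bias input
    w = np.random.uniform(-0.5, 0.5, Xb.shape[1])    # 1. random initial weights
    for _ in range(epochs):                          # 2. repeat until (approx.) no change
        w_old = w.copy()
        for x, target in zip(Xb, t):                 # pick each pattern p
            o = 1.0 if np.dot(w, x) > 0 else 0.0     # feed input, compute output
            w += eta * (target - o) * x              # delta-rule update, w <- w + eta*(t-o)*x
        if np.allclose(w, w_old):                    # 3. no change -> weights frozen
            break
    return w

# Usage: learn the AND mapping from the next slide
X = np.array([[1, 1], [1, 0], [0, 1], [0, 0]], dtype=float)
t = np.array([1, 0, 0, 0], dtype=float)
w = train_perceptron(X, t)
```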

Page 9: Artificial Neural Networks

Example

       AND                 OR
  x1  x2  t           x1  x2  t
   1   1  1            1   1  1
   1   0  0            1   0  1
   0   1  0            0   1  1
   0   0  0            0   0  0

• A perceptron learns these rules easily (i.e. it sets appropriate weights and threshold): w = (w0, w1, w2) = (−1.5, 1.0, 1.0) for AND and (−0.5, 1.0, 1.0) for OR, where w0 corresponds to the threshold term.
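Plugging the quoted weights into the thresholded sum reproduces both truth tables; a quick check (variable names are mine, the weights are from the slide):

```python
import numpy as np

# w0 is the threshold term, applied to a constant input x0 = 1.
w_and = np.array([-1.5, 1.0, 1.0])
w_or  = np.array([-0.5, 1.0, 1.0])

for x1 in (0, 1):
    for x2 in (0, 1):
        x = np.array([1, x1, x2])             # x0 = 1 carries the threshold
        print(x1, x2,
              int(np.dot(w_and, x) > 0),      # reproduces the AND column
              int(np.dot(w_or, x) > 0))       # reproduces the OR column
```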

Page 10: Artificial Neural Networks

Problems

• Perceptrons can only perform accurately with linearly separable classes (a linear hyperplane can place one class of objects on one side of the plane and the other class on the other); the classic counterexample is XOR.

• ANN research was put on hold for roughly 20 years.

• Solution: additional (hidden) layers of neurons, the MLP architecture

• Able to solve non-linear classification problems

[Figure: two plots in the (x1, x2) plane, one showing a linearly separable class arrangement and one that is not]

Page 11: Artificial Neural Networks

MLPs

• The learning procedure is an extension of the simple perceptron algorithm

• Response function:

  oi = g(∑j wij g(∑k wjk xk))

  which is non-linear, so the network is able to perform non-linear mappings

• (Theory tells us that a neural network with at least 1 hidden layer can approximate any continuous function arbitrarily well)

• A vast number of ANN types exist

[Figure: MLP with inputs xk, hidden nodes hj, weights wjk (input to hidden) and wij (hidden to output), and outputs oi]
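A minimal sketch of the response function oi = g(∑j wij g(∑k wjk xk)), assuming a sigmoid for g; the layer sizes and array names are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, W_jk, W_ij):
    """One-hidden-layer forward pass.
    W_jk: (n_hidden, n_inputs) input->hidden weights; W_ij: (n_outputs, n_hidden)."""
    h = sigmoid(W_jk @ x)          # hidden activations h_j = g(sum_k w_jk x_k)
    o = sigmoid(W_ij @ h)          # outputs o_i = g(sum_j w_ij h_j)
    return o

x = np.array([0.2, 0.7, 0.1])
W_jk = np.random.randn(4, 3)       # 3 inputs -> 4 hidden nodes
W_ij = np.random.randn(2, 4)       # 4 hidden nodes -> 2 outputs
print(mlp_forward(x, W_jk, W_ij))
```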

Page 12: Artificial Neural Networks

Backpropagation ANNs

• Most widely used type of network

• Feedforward

• Supervised (learns mapping from one data space to another using examples)

• Error propagated backwards

• Versatile. Used for data modelling, classification, forecasting, data and image compression and pattern recognition.

Page 13: Artificial Neural Networks

BP Learning Algorithm

• Like the perceptron, BP uses gradient descent to minimize the error (generalized to the case with hidden layers)

• Each iteration constitutes two sweeps: a forward pass to compute the outputs and a backward pass to propagate the errors

• To minimize the error we need ∂E/∂wij but also ∂E/∂wjk (which we get using the chain rule)

• Training an MLP using BP can be thought of as a walk in weight space along an energy surface, trying to find the global minimum and avoiding local minima

• Unlike for the perceptron, there is no guarantee that the global minimum will be reached, but in most cases the energy landscape is smooth

Page 14: Artificial Neural Networks

Summary of BP learning algorithm

1. Initialize wij and wjk with random values.

2. Repeat until wij and wjk have converged or the desired performance level is reached:

   • Pick a pattern p from the training set.
   • Present the input and calculate the output.
   • Update the weights according to:

     wij(t + 1) = wij(t) + Δwij
     wjk(t + 1) = wjk(t) + Δwjk

     where Δw = −η ∂E/∂w.

(…etc… for extra hidden layers). A sketch of this loop follows.
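The sketch below assumes sigmoid units and the summed-square error from the perceptron slide; variable names, layer sizes and hyperparameters are illustrative, not from the lecture:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_bp(X, T, n_hidden=4, eta=0.5, epochs=5000):
    """Backpropagation sketch for one hidden layer.
    X: (n_patterns, n_inputs), T: (n_patterns, n_outputs)."""
    rng = np.random.default_rng(0)
    W_jk = rng.uniform(-0.5, 0.5, (n_hidden, X.shape[1]))   # input -> hidden weights
    W_ij = rng.uniform(-0.5, 0.5, (T.shape[1], n_hidden))   # hidden -> output weights
    for _ in range(epochs):
        for x, t in zip(X, T):
            # forward sweep
            h = sigmoid(W_jk @ x)
            o = sigmoid(W_ij @ h)
            # backward sweep: dE/dw via the chain rule, E = 1/2 * sum_i (o_i - t_i)^2
            delta_o = (o - t) * o * (1 - o)                  # output-layer error terms
            delta_h = (W_ij.T @ delta_o) * h * (1 - h)       # hidden-layer error terms
            W_ij -= eta * np.outer(delta_o, h)               # w_ij <- w_ij - eta * dE/dw_ij
            W_jk -= eta * np.outer(delta_h, x)               # w_jk <- w_jk - eta * dE/dw_jk
    return W_jk, W_ij
```

Bias terms are omitted for brevity; in practice each layer usually gets a constant-1 input with its own weight, exactly as the threshold term w0 was handled for the perceptron.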

Page 15: Artificial Neural Networks

Training

• Generalization: the network's performance on a set of test patterns it has never seen before (typically lower than on the training set).

• The training set is used to let the ANN capture the features in the data or the mapping.

• The initial large drop in error is due to learning, but the subsequent slow reduction is due to:

1. Network memorization (too many training cycles used).

2. Overfitting (too many hidden nodes).

(The network learns individual training examples and loses its ability to generalize.)

[Figure: error (e.g. SSE) versus number of hidden nodes or training cycles, with training and testing curves; the optimum network lies where the testing error is lowest]
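One common way to pick the "optimum network" from curves like the one above is to stop training once the test-set error stops improving. A sketch, where train_one_epoch and sse are hypothetical stand-ins for the training pass and the summed-square-error evaluation:

```python
def fit_with_early_stopping(train_one_epoch, sse, train_set, test_set,
                            max_epochs=1000, patience=20):
    """Stop when the test-set error has not improved for `patience` epochs."""
    best_err, best_epoch, stale = float("inf"), 0, 0
    for epoch in range(max_epochs):
        train_one_epoch(train_set)      # learning pass over the training set
        test_err = sse(test_set)        # generalization measured on unseen patterns
        if test_err < best_err:
            best_err, best_epoch, stale = test_err, epoch, 0
        else:
            stale += 1
            if stale >= patience:       # further training only memorizes / overfits
                break
    return best_epoch, best_err
```

In practice one would also snapshot the weights at the best epoch and restore them before using the network.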

Page 16: Artificial Neural Networks

Other Popular ANNs

Some applications may be solved using a variety of ANN types, some only via a specific type (depending on the problem's logistics).

• Hopfield networks: used for optimization. Presented with an incomplete/noisy pattern, the network responds by retrieving the internally stored pattern it most closely resembles.

• Kohonen networks (self-organizing): trained in an unsupervised manner to form clusters in the data. Used for pattern classification and data compression.
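As an illustration of the Hopfield retrieval idea, here is a Hebb-style sketch with ±1 patterns and synchronous updates (not from the slides; names and patterns are illustrative):

```python
import numpy as np

def hopfield_store(patterns):
    """Build the weight matrix from stored +/-1 patterns (sum of outer products)."""
    P = np.asarray(patterns, dtype=float)      # shape (n_patterns, n_units)
    W = P.T @ P / P.shape[1]
    np.fill_diagonal(W, 0.0)                   # no self-connections
    return W

def hopfield_recall(W, noisy, steps=10):
    """Iterate the network so a noisy/incomplete pattern settles onto a stored one."""
    s = np.asarray(noisy, dtype=float)
    for _ in range(steps):
        s = np.sign(W @ s)
        s[s == 0] = 1.0                        # break ties
    return s

stored = [[1, -1, 1, -1, 1, -1], [1, 1, 1, -1, -1, -1]]
W = hopfield_store(stored)
print(hopfield_recall(W, [1, -1, 1, -1, 1, 1]))   # the corrupted last unit is flipped back
```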

Page 17: Artificial Neural Networks

HEP Applications

ANNs are applied everywhere from off-line data analysis to low-level experimental triggers:

• Improving signal-to-background ratios (BP networks), e.g. in flavour tagging and Higgs detection

• Feature recognition problems in track finding (feed-back networks)

• Function approximation tasks (feed-back networks), e.g. reconstructing the mass of a decayed particle from calorimeter information

Page 18: Artificial Neural Networks

References

• http://www.doc.ic.ac.uk/~nd/surprise_96.journal/vol4/cs11/report.html

• http://www.cs.stir.ac.uk/~lss/NNIntro/InvSlides.html

• Carsten Peterson and Thorsteinn Rognvaldsson, "An Introduction to Artificial Neural Networks", LU TP 91-23, September 1991 (lectures given at the 1991 CERN School of Computing, Sweden)