Introduction to Neural Networks
Introduction
Artificial Neural Networks (ANN) are also named "Connectionist Models" or "Parallel Distributed Processing (PDP)"
An ANN consists of a pool of simple processing units, called neurons or cells, which communicate by sending signals to each other over a large number of weighted connections
Introduction
An ANN behaves like a human brain: it demonstrates the ability to learn, recall, and generalize from training patterns or data
The processing element in an ANN is also called a "neuron"
A human brain consists of some 10 billion neurons
Each biological neuron is connected to several thousand other neurons, similar to the connectivity in an ANN
Introduction
Dendrites receive activation from other neurons
The neuron's cell body (soma) processes the incoming activations and converts them into output activations
Axons act as transmission lines that send activation to other neurons
Figure: biological and artificial neurons
Introduction
A set of connections brings in activations from other neurons
A processing unit sums the inputs and then applies an activation function
An output line transmits the result to other neurons
Figure: biological and artificial neurons (inputs, weights, processing, outputs)
Introduction
General Structure of ANN
Figure: a feed-forward network with an input layer (x1, …, xn), two hidden layers, and an output layer (y1, y2); each connection carries a weight (w11, w21, …)
Introduction
Classification of ANN: Architecture
Topology or architecture: how information flows from input to output
Type of neuron: characterized by the activation function used
Examples: feedforward or recurrent
Introduction
Classification of ANN: Activation Function
Introduction
Classification of ANN: Learning Algorithm
Learning algorithm: defines how the weights are set
Examples: supervised vs unsupervised
Supervised learning
Both inputs and outputs are provided. The network processes the
inputs and compares its resulting outputs against the desired
outputs. Errors are calculated to adjust the weights
Unsupervised learning
The network is provided with inputs but not with outputs. The system itself must decide what features it will use to group the input data. This is often referred to as self-organization
Single-Layer Feed-Forward NN
ANN Neuron Model
McCulloch-Pitts Neuron Model (M-P)
O = ƒ(Σ xj wj − θ), where the sum runs over j = 1 to n
Activation (step function): a(ƒ) = 1 if ƒ ≥ 0, 0 otherwise
This was the first attempt, in the mid-forties; there is no learning, as the weights are never updated
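As a concrete illustration of the M-P model, here is a minimal sketch in Python (the function names and the AND example are illustrative, not from the slides):

```python
# A minimal sketch of a McCulloch-Pitts neuron with a step activation.
# Function names and the AND example are illustrative assumptions.

def step(f):
    """Step function: 1 if f >= 0, else 0."""
    return 1 if f >= 0 else 0

def mp_neuron(x, w, theta):
    """Output O = step(sum_j x_j * w_j - theta)."""
    f = sum(xj * wj for xj, wj in zip(x, w)) - theta
    return step(f)

# Example: a 2-input neuron that behaves like logical AND,
# with weights (1, 1) and threshold 1.5.
print(mp_neuron([1, 1], [1, 1], 1.5))  # 1
print(mp_neuron([1, 0], [1, 1], 1.5))  # 0
```

Note that the weights here are fixed by hand; as stated above, the M-P model has no learning rule.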
ANN Neuron Model
Perceptron Model (early sixties)
An arrangement of one input layer of M-P neurons feeding forward to one output layer of M-P neurons is known as a Perceptron (single-layer feed-forward)
Activation (signum function): a(ƒ) = 1 if ƒ ≥ 0, −1 if ƒ < 0
ANN Neuron Model
Training of the Perceptron
Initialize the weights and threshold to small random numbers
Choose an input-output pattern from the training data set
Compute the output O = ƒ(Σ xj wj − θ)
Adjust the weights: w(k+1) = w(k) + µ[t(k) − O(k)] x(k), where t(k) is the target output and µ is a positive number between 0 and 1 representing the learning rate
Repeat until the network outputs converge to the target outputs
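The training steps above can be sketched as follows, assuming a signum activation and folding the threshold in as a bias weight on a constant input of 1 (a common trick, not stated on the slides):

```python
# A sketch of the perceptron training rule described above.
# Assumptions: signum activation, threshold folded in as a bias weight,
# targets in {-1, +1}; the OR data set below is illustrative.
import random

def sgn(f):
    return 1 if f >= 0 else -1

def train_perceptron(patterns, mu=0.1, epochs=100):
    """patterns: list of (x, t) pairs with target t in {-1, +1}."""
    n = len(patterns[0][0])
    w = [random.uniform(-0.5, 0.5) for _ in range(n + 1)]  # last entry = bias
    for _ in range(epochs):
        converged = True
        for x, t in patterns:
            xb = list(x) + [1]                        # append bias input
            o = sgn(sum(wi * xi for wi, xi in zip(w, xb)))
            if o != t:                                # adjust only on error
                w = [wi + mu * (t - o) * xi for wi, xi in zip(w, xb)]
                converged = False
        if converged:          # all outputs match targets: training ceases
            break
    return w

# Linearly separable example: learn logical OR.
data = [([0, 0], -1), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w = train_perceptron(data)
```

The `converged` flag implements the stopping condition from the steps above: once every pattern is classified correctly, no weight changes occur and training stops.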
ANN Neuron Model
Convergence of Perceptron Learning
The weight changes need to be applied repeatedly. One pass through the whole training set is called one epoch of training
When all outputs match the targets for all training patterns, all the ∆wij will be zero and the process of training will cease. We then say that the training process has converged
ANN Neuron Model
Example
Consider a 4-input single neuron to be trained, with input vector x and initial weight vector w(1):
x = [1, -2, 0, 1.5]ᵀ
w(1) = [1, -1, 0.5, 0]ᵀ
O(1) = Sgn[w(1)ᵀ x] = Sgn(1·1 + (-1)·(-2) + 0.5·0 + 0·1.5) = Sgn(3) = 1
First iteration (taking µ = 1):
w(2) = w(1) - µ O(1) x = [1, -1, 0.5, 0]ᵀ - [1, -2, 0, 1.5]ᵀ = [0, 1, 0.5, -1.5]ᵀ
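The first-iteration arithmetic can be checked with a short script (µ = 1 is assumed, since w(2) = w(1) − x reproduces the stated result):

```python
# Checking the first training iteration of the worked example.
# Assumption: mu = 1, which matches the arithmetic shown.

def sgn(f):
    return 1 if f >= 0 else -1

x = [1, -2, 0, 1.5]
w1 = [1, -1, 0.5, 0]

net = sum(wi * xi for wi, xi in zip(w1, x))   # w(1)^T x
o1 = sgn(net)                                  # Sgn(3) = 1

mu = 1.0
w2 = [wi - mu * o1 * xi for wi, xi in zip(w1, x)]
print(net, o1, w2)  # 3.0 1 [0.0, 1.0, 0.5, -1.5]
```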
Practical Considerations
Training Data Pre-processing
We cannot just use any raw data to train our networks; it is necessary to carry out some preprocessing of the training data before feeding it to the network
We should make sure that the training data is representative: it should not contain too many examples of one type. On the other hand, if one class of pattern is easy to learn, having large numbers of patterns from that class in the training set will only slow down the learning process
Practical Considerations
Training Data Pre-processing
If the training data is continuous, it is a good idea to rescale the input values. Simply shifting the zero of the scale so that the mean value of each input is near zero, and normalizing so that the standard deviation of the values for each input is roughly the same, can make a big difference.
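The rescaling described above (zero mean, roughly unit standard deviation) can be sketched as a small helper; the function name and example values are illustrative:

```python
# A sketch of per-input standardization: shift to zero mean,
# scale to unit standard deviation. Names are illustrative.
import statistics

def standardize(values):
    mean = statistics.mean(values)
    std = statistics.pstdev(values)    # population standard deviation
    return [(v - mean) / std for v in values]

# Example: a column of raw input values.
years = [1991, 1993, 1995, 1998]
scaled = standardize(years)
```

After this transform each input column has mean 0 and standard deviation 1, so no single input dominates the weighted sums early in training.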
Practical Considerations
Scaling the Input Data
Data is scaled to a range of -1 to 1:
Scaled value = 2 (unscaled value - min value) / (max value - min value) - 1
For example: an input for year of construction ranges from 1991 to 1998. For a given value of 1995, its scaled value =
[2 (1995 - 1991) / (1998 - 1991)] - 1 = 0.14
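A quick sketch of this scaling formula, reproducing the year-of-construction example (the helper name is illustrative):

```python
# The min-max scaling formula above, mapping [vmin, vmax] to [-1, 1].

def scale_to_unit_range(value, vmin, vmax):
    return 2 * (value - vmin) / (vmax - vmin) - 1

# Year-of-construction example: range 1991..1998, value 1995.
s = scale_to_unit_range(1995, 1991, 1998)
print(round(s, 2))  # 0.14
```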
Practical Considerations
Choosing the Initial Weights
Do not start all the weights with the same value; otherwise all the hidden units will end up doing the same thing and the network will never learn properly
For that reason, we generally start off all the weights with small random values around zero
We usually train the network from a number of different random initial weight sets
In networks with hidden layers, we can expect different final sets of weights to emerge from the learning process for different choices of random initial weights
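The initialization advice above might look like this in practice (a sketch; the scale 0.1 and helper name are arbitrary choices, not from the slides):

```python
# Small random initial weights around zero, with several different
# random starts. The scale 0.1 is an illustrative assumption.
import random

def init_weights(n, scale=0.1, seed=None):
    rng = random.Random(seed)
    return [rng.uniform(-scale, scale) for _ in range(n)]

# Three different random initial weight sets for a 4-input unit,
# e.g. to train the same network from several starting points.
starts = [init_weights(4, seed=s) for s in range(3)]
```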
Practical Considerations
Choosing the Learning Rate
Choosing the learning rate µ is constrained by two opposing facts: if µ is too small, it will take too long to get anywhere near the minimum of the error function; if µ is too large, the weight updates will overshoot the error minimum and the weights will oscillate, or even diverge
The optimal value is network dependent, so one cannot formulate a general rule. Generally, one should try a range of different values (e.g. µ = 0.1, 0.01, 1.0, 0.0001) and use the results as a guide
There is no necessity to keep the learning rate fixed throughout the learning process
Practical Considerations
Choosing the Transfer Function
In terms of computational efficiency, the standard sigmoid is
better than the step function of the Simple Perceptron
A convenient alternative to the logistic function is the hyperbolic tangent: f(x) = tanh(x)
When the outputs are required to be non-binary, i.e.
continuous real values, having sigmoidal transfer functions no
longer makes sense. In these cases, a simple linear transfer
function f(x) = x is appropriate.
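For reference, the three transfer functions mentioned above can be written out directly:

```python
# The three transfer functions discussed above: logistic sigmoid,
# hyperbolic tangent, and linear (identity).
import math

def logistic(x):
    """Standard sigmoid, output in (0, 1)."""
    return 1 / (1 + math.exp(-x))

def tanh(x):
    """Hyperbolic tangent, output in (-1, 1)."""
    return math.tanh(x)

def linear(x):
    """Identity, for continuous real-valued outputs."""
    return x

print(logistic(0), tanh(0), linear(2.5))  # 0.5 0.0 2.5
```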
Practical Considerations
Batch Training vs Online Training
When we add up the weight changes for all the training
patterns, and apply them in one go, it is called Batch Training.
A natural alternative is to update all the weights immediately
after processing each training pattern. This is called On-line
Training (or Sequential Training).
In on-line learning, a much lower learning rate is normally necessary than for batch learning. However, the learning is often much quicker.
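A sketch contrasting the two schedules for a single linear unit trained by gradient-style updates (illustrative, not from the slides); note that one epoch of each generally produces different weights:

```python
# Batch vs on-line updates for a single linear unit, one epoch each.
# The update rule and data are illustrative assumptions.

def online_epoch(w, patterns, mu):
    """Update w immediately after processing each pattern."""
    for x, t in patterns:
        o = sum(wi * xi for wi, xi in zip(w, x))
        w = [wi + mu * (t - o) * xi for wi, xi in zip(w, x)]
    return w

def batch_epoch(w, patterns, mu):
    """Add up the changes for all patterns, apply them in one go."""
    dw = [0.0] * len(w)
    for x, t in patterns:
        o = sum(wi * xi for wi, xi in zip(w, x))
        dw = [d + mu * (t - o) * xi for d, xi in zip(dw, x)]
    return [wi + di for wi, di in zip(w, dw)]

patterns = [([1.0, 1.0], 1.0), ([1.0, 0.0], -1.0)]
w_online = online_epoch([0.0, 0.0], patterns, mu=0.1)
w_batch = batch_epoch([0.0, 0.0], patterns, mu=0.1)
```

In the on-line version the second pattern already sees the weights changed by the first, which is why the two schedules diverge and why on-line training usually needs a smaller learning rate.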