Introduction to Neural Networks
Introduction
Artificial Neural Networks (ANN) are also named "Connectionist Models" or "Parallel Distributed Processing (PDP)"
An ANN consists of a pool of simple processing units, called neurons or cells, which communicate by sending signals to each other over a large number of weighted connections
Introduction
An ANN behaves like a human brain: it demonstrates the ability to learn, recall, and generalize from training patterns or data
The processing element in an ANN is also called a "neuron"
A human brain consists of some 10 billion neurons
Each biological neuron is connected to several thousand other neurons, similar to the connectivity in an ANN
Introduction
Dendrites receive activation from other neurons
The neuron's cell body (soma) processes the incoming activations and converts them into output activations
Axons act as transmission lines that send activation to other neurons
Figure: biological and artificial neurons
Introduction
A set of connections brings in activations from other neurons
A processing unit sums the inputs and then applies an activation function
An output line transmits the result to other neurons
Figure: biological and artificial neurons (inputs, weights, processing, outputs)
Introduction
General Structure of ANN
Figure: a feed-forward network with an input layer (x1, …, xn), two hidden layers, and an output layer (y1, y2); each connection carries a weight (w11, w21, …)
Introduction
Classification of ANN: Architecture
Topology or architecture: how information flows from input to output
Type of neuron: characterized by the activation function used
Examples: feedforward or recurrent
Introduction
Classification of ANN: Activation Function
Introduction
Classification of ANN: Learning Algorithm
Learning algorithm: defines how the weights are set
Examples: supervised vs unsupervised
Supervised learning
Both inputs and outputs are provided. The network processes the
inputs and compares its resulting outputs against the desired
outputs. Errors are calculated to adjust the weights
Unsupervised learning
The network is provided with inputs but not with outputs. The system itself must decide what features it will use to group the input data. This is often referred to as self-organization
Single-Layer Feed-Forward NN
ANN Neuron Model
McCulloch-Pitts Neuron Model (M-P)
O = ƒ(Σ xj wj − θ), where the sum runs over j = 1 to n
Activation (step function): a(ƒ) = 1 if ƒ ≥ 0, 0 otherwise
This was the first attempt, in the mid-forties; there is no learning, as the weights are never updated
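As a concrete illustration of the M-P model, here is a minimal sketch in Python (the function names and the AND example are illustrative, not from the slides):

```python
# A minimal sketch of a McCulloch-Pitts neuron with a step activation.
# Function names and the AND example are illustrative assumptions.

def step(f):
    """Step function: 1 if f >= 0, else 0."""
    return 1 if f >= 0 else 0

def mp_neuron(x, w, theta):
    """Output O = step(sum_j x_j * w_j - theta)."""
    f = sum(xj * wj for xj, wj in zip(x, w)) - theta
    return step(f)

# Example: a 2-input neuron that behaves like logical AND,
# with weights (1, 1) and threshold 1.5.
print(mp_neuron([1, 1], [1, 1], 1.5))  # 1
print(mp_neuron([1, 0], [1, 1], 1.5))  # 0
```

Note that the weights here are fixed by hand; as stated above, the M-P model has no learning rule.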
ANN Neuron Model
Perceptron Model (early sixties)
An arrangement of one input layer of M-P neurons feeding forward to one output layer of M-P neurons is known as a Perceptron (single-layer feed-forward)
Activation (signum function): a(ƒ) = 1 if ƒ ≥ 0, −1 if ƒ < 0
ANN Neuron Model
Training of the Perceptron
Initialize the weights and threshold to small random numbers
Choose an input-output pattern from the training data set
Compute the output O = ƒ(Σ xj wj − θ)
Adjust the weights: w(k+1) = w(k) + µ[t(k) − O(k)] x(k), where t(k) is the target output and µ is a positive number between 0 and 1 representing the learning rate
Repeat until the network outputs converge to the target outputs
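The training steps above can be sketched as follows, assuming a signum activation and folding the threshold in as a bias weight on a constant input of 1 (a common trick, not stated on the slides):

```python
# A sketch of the perceptron training rule described above.
# Assumptions: signum activation, threshold folded in as a bias weight,
# targets in {-1, +1}; the OR data set below is illustrative.
import random

def sgn(f):
    return 1 if f >= 0 else -1

def train_perceptron(patterns, mu=0.1, epochs=100):
    """patterns: list of (x, t) pairs with target t in {-1, +1}."""
    n = len(patterns[0][0])
    w = [random.uniform(-0.5, 0.5) for _ in range(n + 1)]  # last entry = bias
    for _ in range(epochs):
        converged = True
        for x, t in patterns:
            xb = list(x) + [1]                        # append bias input
            o = sgn(sum(wi * xi for wi, xi in zip(w, xb)))
            if o != t:                                # adjust only on error
                w = [wi + mu * (t - o) * xi for wi, xi in zip(w, xb)]
                converged = False
        if converged:          # all outputs match targets: training ceases
            break
    return w

# Linearly separable example: learn logical OR.
data = [([0, 0], -1), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w = train_perceptron(data)
```

The `converged` flag implements the stopping condition from the steps above: once every pattern is classified correctly, no weight changes occur and training stops.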
ANN Neuron Model
Convergence of Perceptron Learning
The weight changes need to be applied repeatedly. One pass through the whole training set is called one epoch of training
When all outputs match the targets for all training patterns, all the ∆wij will be zero and the process of training will cease. We then say that the training process has converged
ANN Neuron Model
Example
Consider a 4-input single neuron to be trained, with input vector x and initial weight vector w(1):
x = [1, -2, 0, 1.5]ᵀ
w(1) = [1, -1, 0.5, 0]ᵀ
O(1) = Sgn[w(1)ᵀ x] = Sgn(1·1 + (-1)·(-2) + 0.5·0 + 0·1.5) = Sgn(3) = 1
First iteration (taking µ = 1):
w(2) = w(1) - µ O(1) x = [1, -1, 0.5, 0]ᵀ - [1, -2, 0, 1.5]ᵀ = [0, 1, 0.5, -1.5]ᵀ
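The first-iteration arithmetic can be checked with a short script (µ = 1 is assumed, since w(2) = w(1) − x reproduces the stated result):

```python
# Checking the first training iteration of the worked example.
# Assumption: mu = 1, which matches the arithmetic shown.

def sgn(f):
    return 1 if f >= 0 else -1

x = [1, -2, 0, 1.5]
w1 = [1, -1, 0.5, 0]

net = sum(wi * xi for wi, xi in zip(w1, x))   # w(1)^T x
o1 = sgn(net)                                  # Sgn(3) = 1

mu = 1.0
w2 = [wi - mu * o1 * xi for wi, xi in zip(w1, x)]
print(net, o1, w2)  # 3.0 1 [0.0, 1.0, 0.5, -1.5]
```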
Practical Considerations
Training Data Pre-processing
We cannot just use any raw data to train our networks; it is necessary to carry out some preprocessing of the training data before feeding it to the network
We should make sure that the training data is representative: it should not contain too many examples of one type. On the other hand, if one class of pattern is easy to learn, having large numbers of patterns from that class in the training set will only slow down the learning process
Practical Considerations
Training Data Pre-processing
If the training data is continuous, it is a good idea to rescale the input values. Simply shifting the zero of the scale so that the mean value of each input is near zero, and normalizing so that the standard deviation of the values for each input is roughly the same, can make a big difference.
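The rescaling described above (zero mean, roughly unit standard deviation) can be sketched as a small helper; the function name and example values are illustrative:

```python
# A sketch of per-input standardization: shift to zero mean,
# scale to unit standard deviation. Names are illustrative.
import statistics

def standardize(values):
    mean = statistics.mean(values)
    std = statistics.pstdev(values)    # population standard deviation
    return [(v - mean) / std for v in values]

# Example: a column of raw input values.
years = [1991, 1993, 1995, 1998]
scaled = standardize(years)
```

After this transform each input column has mean 0 and standard deviation 1, so no single input dominates the weighted sums early in training.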
Practical Considerations
Scaling the Input Data
Data is scaled to a range of -1 to 1:
Scaled value = 2 (unscaled value - min value) / (max value - min value) - 1
For example: an input for year of construction ranges from 1991 to 1998. For a given value of 1995, its scaled value =
[2 (1995 - 1991) / (1998 - 1991)] - 1 = 0.14
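A quick sketch of this scaling formula, reproducing the year-of-construction example (the helper name is illustrative):

```python
# The min-max scaling formula above, mapping [vmin, vmax] to [-1, 1].

def scale_to_unit_range(value, vmin, vmax):
    return 2 * (value - vmin) / (vmax - vmin) - 1

# Year-of-construction example: range 1991..1998, value 1995.
s = scale_to_unit_range(1995, 1991, 1998)
print(round(s, 2))  # 0.14
```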
Practical Considerations
Choosing the Initial Weights
Do not start all the weights with the same value; otherwise all the hidden units will end up doing the same thing and the network will never learn properly
For that reason, we generally start off all the weights with small random values around zero
We usually train the network from a number of different random initial weight sets
In networks with hidden layers, we can expect different final sets of weights to emerge from the learning process for different choices of random initial weights
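The initialization advice above might look like this in practice (a sketch; the scale 0.1 and helper name are arbitrary choices, not from the slides):

```python
# Small random initial weights around zero, with several different
# random starts. The scale 0.1 is an illustrative assumption.
import random

def init_weights(n, scale=0.1, seed=None):
    rng = random.Random(seed)
    return [rng.uniform(-scale, scale) for _ in range(n)]

# Three different random initial weight sets for a 4-input unit,
# e.g. to train the same network from several starting points.
starts = [init_weights(4, seed=s) for s in range(3)]
```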
Practical Considerations
Choosing the Learning Rate
Choosing the learning rate µ is constrained by two opposing facts: if µ is too small, it will take too long to get anywhere near the minimum of the error function; if µ is too large, the weight updates will overshoot the error minimum and the weights will oscillate, or even diverge
The optimal value is network dependent, so one cannot formulate a general rule. Generally, one should try a range of different values (e.g. µ = 0.1, 0.01, 1.0, 0.0001) and use the results as a guide
There is no necessity to keep the learning rate fixed throughout the learning process
Practical Considerations
Choosing the Transfer Function
In terms of computational efficiency, the standard sigmoid is
better than the step function of the Simple Perceptron
A convenient alternative to the logistic function is the hyperbolic tangent: f(x) = tanh(x)
When the outputs are required to be non-binary, i.e.
continuous real values, having sigmoidal transfer functions no
longer makes sense. In these cases, a simple linear transfer
function f(x) = x is appropriate.
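For reference, the three transfer functions mentioned above can be written out directly:

```python
# The three transfer functions discussed above: logistic sigmoid,
# hyperbolic tangent, and linear (identity).
import math

def logistic(x):
    """Standard sigmoid, output in (0, 1)."""
    return 1 / (1 + math.exp(-x))

def tanh(x):
    """Hyperbolic tangent, output in (-1, 1)."""
    return math.tanh(x)

def linear(x):
    """Identity, for continuous real-valued outputs."""
    return x

print(logistic(0), tanh(0), linear(2.5))  # 0.5 0.0 2.5
```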
Practical Considerations
Batch Training vs Online Training
When we add up the weight changes for all the training
patterns, and apply them in one go, it is called Batch Training.
A natural alternative is to update all the weights immediately
after processing each training pattern. This is called On-line
Training (or Sequential Training).
In on-line learning, a much lower learning rate is normally necessary than for batch learning. However, the learning is often much quicker.
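A sketch contrasting the two schedules for a single linear unit trained by gradient-style updates (illustrative, not from the slides); note that one epoch of each generally produces different weights:

```python
# Batch vs on-line updates for a single linear unit, one epoch each.
# The update rule and data are illustrative assumptions.

def online_epoch(w, patterns, mu):
    """Update w immediately after processing each pattern."""
    for x, t in patterns:
        o = sum(wi * xi for wi, xi in zip(w, x))
        w = [wi + mu * (t - o) * xi for wi, xi in zip(w, x)]
    return w

def batch_epoch(w, patterns, mu):
    """Add up the changes for all patterns, apply them in one go."""
    dw = [0.0] * len(w)
    for x, t in patterns:
        o = sum(wi * xi for wi, xi in zip(w, x))
        dw = [d + mu * (t - o) * xi for d, xi in zip(dw, x)]
    return [wi + di for wi, di in zip(w, dw)]

patterns = [([1.0, 1.0], 1.0), ([1.0, 0.0], -1.0)]
w_online = online_epoch([0.0, 0.0], patterns, mu=0.1)
w_batch = batch_epoch([0.0, 0.0], patterns, mu=0.1)
```

In the on-line version the second pattern already sees the weights changed by the first, which is why the two schedules diverge and why on-line training usually needs a smaller learning rate.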