Chapter 3 Neural Network
Xiu-jun GONG (Ph.D)
School of Computer Science and Technology, Tianjin University
http://cs.tju.edu.cn/faculties/gongxj/course/ai/
Outline
Introduction
Training a single TLU
Network of TLUs—Artificial Neural Network
Pros & Cons of ANN
Summary
Biological/Artificial Neural Network
[Figure: SMI32-stained pyramidal neurons in cerebral cortex; structure of a typical neuron]
[Diagram: an artificial neuron with inputs x1, x2, …, xn, weights w1, w2, …, wn, a summed input s, and activation output f(s)]
[Diagram: ANN at the intersection of Artificial Intelligence, recognition modeling, and neuroscience]
Definition of ANN
Also called a simulated neural network (SNN) or simply NN.
It is an interconnected group of artificial neurons that uses a mathematical or computational model for information processing, based on a connectionist approach to computation. In most cases an ANN is an adaptive system that changes its structure based on external or internal information flowing through the network.
Applications of ANN
Function approximation, or regression analysis, including time-series prediction and modeling.
Classification, including pattern and sequence recognition, novelty detection, and sequential decision making.
Data processing, including filtering, clustering, blind signal separation, and compression.
Extension of a TLU
Threshold Logic Unit -> Perceptron (Neuron)
Inputs are not limited to Boolean values
Outputs are not limited to binary functions
Output functions of a perceptron
Threshold function: f(s) = 1 if s ≥ θ, f(s) = 0 otherwise
Sigmoid function: f(s) = 1 / (1 + e^(-s))
Characteristics of the sigmoid function
Smooth, continuous, and monotonically increasing (derivative is always positive)
Bounded range, but never reaches the max or min
The logistic function is often used: f(s) = 1 / (1 + e^(-s)), with derivative f' = f (1 - f)
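A minimal sketch of the two output functions above, plus the sigmoid's derivative identity f' = f (1 - f); the function names are illustrative:

```python
import math

def threshold(s, theta=0.0):
    """TLU output: 1 if the weighted sum reaches the threshold, else 0."""
    return 1.0 if s >= theta else 0.0

def sigmoid(s):
    """Logistic function: smooth, monotonic, bounded in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-s))

def sigmoid_prime(s):
    """Derivative via the identity f'(s) = f(s) * (1 - f(s))."""
    f = sigmoid(s)
    return f * (1.0 - f)

print(sigmoid(0.0))        # 0.5 at s = 0
print(sigmoid_prime(0.0))  # 0.25, the maximum of the derivative
```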
Linearly separable functions realized by a TLU
A single TLU can realize only linearly separable Boolean functions.
[Slide: example Boolean functions f(x1, x2, x3) and f(x1, x2), written as sums of products with complemented literals, illustrating which functions a single TLU can realize]
A network of TLUs
[Diagram: a two-layer network with inputs x1, x2; hidden TLUs y1, y2 with weights (1, -1) and (-1, 1) and thresholds 0.5; and an output TLU f with weights (1, 1) and threshold 0.5]
XOR: f(x1, x2) = x1·¬x2 + ¬x1·x2
Even-Parity Function
f(x1, x2) = x1·x2 + ¬x1·¬x2
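The XOR network above can be sketched directly from the diagram's weights and thresholds (a reconstruction from the slide's figure; function names are illustrative):

```python
def tlu(inputs, weights, theta):
    """Threshold logic unit: fires (1) when the weighted sum reaches theta."""
    s = sum(w * x for w, x in zip(weights, inputs))
    return 1 if s >= theta else 0

def xor_net(x1, x2):
    # Hidden layer: y1 detects x1 AND NOT x2; y2 detects x2 AND NOT x1.
    y1 = tlu((x1, x2), (1, -1), 0.5)
    y2 = tlu((x1, x2), (-1, 1), 0.5)
    # Output layer: OR of the two hidden units.
    return tlu((y1, y2), (1, 1), 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))  # last column matches a XOR b
```

No single TLU can compute XOR, but this two-layer network does, which is the point of moving from one TLU to a network.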
Training a single neuron
What is learning/training?
The methods: the Delta Procedure, the Generalized Delta Procedure, the Error-Correction Procedure
Reformulating the representation of a perceptron
[Diagram: inputs x1, x2, …, xn plus a fixed input xn+1 ≡ 1, weights w1, w2, …, wn+1, a summing junction, an activation function, and output f = f(s)]
s = Σ_{i=1}^{n} w_i x_i + w_{n+1} = Σ_{i=1}^{n+1} w_i x_i = (x1, x2, …, xn, 1)·(w1, w2, …, wn+1)^T = W·X
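A sketch of this augmented form, where the threshold becomes the extra weight wn+1 on a fixed input of 1 and the whole sum collapses to one dot product (weights and activation here are illustrative):

```python
def perceptron_output(x, w, f):
    """Forward pass with the bias absorbed as an extra input x_{n+1} = 1,
    so s = W . X is a single dot product over n+1 terms."""
    x_aug = list(x) + [1.0]                               # x_{n+1} == 1
    s = sum(wi * xi for wi, xi in zip(w, x_aug))          # summing junction
    return f(s)                                           # activation function

# Example with a step activation; the weight values are made up:
step = lambda s: 1.0 if s >= 0 else 0.0
print(perceptron_output([1.0, 0.0], [1.0, 1.0, -0.5], step))  # 1.0
```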
Gradient Descent Methods
Minimize the squared error between the desired response d and the neuron output f
Squared error function: ε = (d − f)²
With W = (w1, w2, …, wn+1) and s = W·X:
∂ε/∂W = (∂ε/∂f)(∂f/∂s)(∂s/∂W) = −2(d − f)(∂f/∂s) X
Weight update (learning rate c): W ← W − c ∂ε/∂W
The Delta Procedure
Using the linear function f = s
Weight update: W ← W + c (d − f) X
This is the delta rule (Widrow-Hoff rule)
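A minimal sketch of the delta rule for a linear unit, trained on a made-up linear target (the data, learning rate c, and epoch count are all illustrative assumptions):

```python
# Widrow-Hoff (delta) rule for a linear unit f = s = W . X, with the bias
# absorbed as a third input fixed at 1. Target: d = 2*x1 - x2 + 0.5.
samples = [([x1, x2, 1.0], 2.0 * x1 - x2 + 0.5)
           for x1 in (0.0, 0.5, 1.0) for x2 in (0.0, 0.5, 1.0)]
w = [0.0, 0.0, 0.0]
c = 0.1  # learning rate
for _ in range(500):
    for x, d in samples:
        f = sum(wi * xi for wi, xi in zip(w, x))            # linear output f = s
        w = [wi + c * (d - f) * xi for wi, xi in zip(w, x)]  # W <- W + c(d-f)X
print([round(wi, 3) for wi in w])  # converges toward [2.0, -1.0, 0.5]
```

Because the target is itself linear, the rule recovers the generating weights; on noisy data it converges to the least-squares fit instead.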
The Generalized Delta Procedure
Using the sigmoid function f(s) = 1 / (1 + e^(-s))
Weight update: W ← W + c (d − f) f (1 − f) X
Since f (1 − f) → 0 as f → 0 or f → 1, weight change can occur only within a 'fuzzy' region surrounding the hyperplane, near the point f = 0.5
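A small sketch of one generalized-delta update, showing the f (1 − f) factor at work: near f = 0.5 the step is large, while for a saturated unit it nearly vanishes (weights, inputs, and c are illustrative):

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def generalized_delta_step(w, x, d, c):
    """One update W <- W + c (d - f) f (1 - f) X for a sigmoid unit."""
    s = sum(wi * xi for wi, xi in zip(w, x))
    f = sigmoid(s)
    g = f * (1.0 - f)  # gating factor: near 0 when f saturates at 0 or 1
    return [wi + c * (d - f) * g * xi for wi, xi in zip(w, x)]

# At s = 0 (f = 0.5) the weights move noticeably; at s = 10 they barely move.
w_mid = generalized_delta_step([0.0, 0.0], [1.0, 1.0], 1.0, 1.0)
w_sat = generalized_delta_step([5.0, 5.0], [1.0, 1.0], 0.0, 1.0)
print(w_mid)  # each weight moves by 0.125
print(w_sat)  # each weight moves by less than 1e-4
```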
The Error-Correction Procedure
Using the threshold function (output: 0 or 1)
The weight change rule: W ← W + c (d − f) X, i.e. W ← W ± cX when the output is wrong
In the linearly separable case, W converges to a solution after finitely many iterations.
In the nonlinearly separable case, W never converges.
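A sketch of the error-correction procedure on AND, which is linearly separable and therefore converges; the learning rate and epoch count are illustrative:

```python
def train_error_correction(samples, c=0.5, epochs=20):
    """W <- W + c (d - f) X with a 0/1 threshold output. Since d - f is 0 on
    a correct output, weights change only on mistakes (W <- W +/- cX)."""
    w = [0.0, 0.0, 0.0]  # two input weights plus a bias weight
    for _ in range(epochs):
        for x, d in samples:
            f = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0
            if f != d:
                w = [wi + c * (d - f) * xi for wi, xi in zip(w, x)]
    return w

# AND is linearly separable, so the procedure settles on a separating W.
and_samples = [([0, 0, 1], 0), ([0, 1, 1], 0), ([1, 0, 1], 0), ([1, 1, 1], 1)]
w = train_error_correction(and_samples)
print([1 if x1 * w[0] + x2 * w[1] + w[2] >= 0 else 0
       for x1, x2 in ((0, 0), (0, 1), (1, 0), (1, 1))])  # [0, 0, 0, 1]
```

Running the same loop on XOR samples would cycle forever, which is the non-separable case described above.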
An example
[Diagram: a TLU whose inputs combine sensor values, x1 = s2 + s3, x2 = s4 + s5, x3 = s6 + s7, x4 = s8 + s9, plus a bias input 1, with weights w11 … w51 and output "east"]
ANN: its topologies
Feedforward: inputs flow forward through the layers to the outputs, with no cycles
Recurrent: feedback connections return outputs to a context layer, which feeds back into the network
Training Neural Networks
Supervised method
Trained by matching input and output patterns
Input-output pairs can be provided by an external teacher, or by the system
Unsupervised method (self-organization)
An (output) unit is trained to respond to clusters of patterns within the input.
There is no a priori set of categories
Reinforcement learning
An intermediate form of the above two types of learning. The learning machine performs some action on the environment and gets a feedback response from the environment. The learning system grades its action as good (rewarding) or bad (punishable) based on the environmental response and adjusts its parameters accordingly.
Supervised training
Back-propagation: notations
[Diagram: layers j = 0, 1, …, M; inputs x_p1, x_p2, …, x_pN0; outputs O_p1, O_p2, …, O_pNM; targets T_p1, T_p2, …, T_pNM]
j: layer index
N_j: number of neurons in layer j
N_0: number of inputs
p: the pth pattern of n patterns
N_M: number of outputs (layer M is the output layer)
Y_ji: output of the ith neuron in layer j
δ_ji: the error value associated with the ith neuron in layer j
W_jik: the connection weight from the kth neuron in layer (j−1) to the ith neuron in layer j
Back-propagation: the method
1. Initialize connection weights to small random values.
2. Present the pth sample input vector X_p = (x_p1, x_p2, …, x_pN0) and the corresponding output target T_p = (T_p1, T_p2, …, T_pNM) to the network.
3. Pass the input values to the first layer. For every input node i in layer 0: Y_0i = x_pi
4. For every neuron i in every layer j = 1, 2, …, M, from input to output layer, find the output from the neuron: Y_ji = f( Σ_{k=1}^{N_{j−1}} W_jik Y_{(j−1)k} )
5. Obtain the output values. For every output node i in layer M: O_pi = Y_Mi
6. Calculate the error value for every neuron i in every layer, in backward order j = M, M−1, …, 2, 1.
6.1 For the output layer, the error value is: δ_Mi = Y_Mi (1 − Y_Mi)(T_pi − Y_Mi)
6.2 For a hidden layer, the error value is: δ_ji = Y_ji (1 − Y_ji) Σ_{k=1}^{N_{j+1}} δ_{(j+1)k} W_{(j+1)ki}
6.3 Adjust the weight of every connection from neuron k in layer (j−1) to neuron i in layer j: W_jik ← W_jik + η δ_ji Y_{(j−1)k}
Steps 2 through 6 are repeated for every training sample pattern p, and the whole set is repeated until the root mean square (RMS) of the output errors is minimized, where the error for pattern p is E_p = Σ_{j=1}^{N_M} (T_pj − O_pj)²
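The steps above can be sketched for a network with one hidden layer; the learning rate eta, layer sizes, and XOR training set are illustrative choices, not part of the procedure itself:

```python
import math, random

def backprop_train(samples, n_hidden=2, eta=0.5, epochs=2000, seed=1):
    """Steps 1-6 for a single hidden layer; last weight in each row is the bias."""
    rng = random.Random(seed)
    n_in, n_out = len(samples[0][0]), len(samples[0][1])
    # 1. Initialize connection weights to small random values.
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]
    sig = lambda s: 1.0 / (1.0 + math.exp(-s))
    for _ in range(epochs):
        for x, t in samples:                       # 2. present pattern and target
            xb = list(x) + [1.0]                   # 3. inputs (plus bias input)
            # 4-5. Forward pass, layer by layer, to the output values.
            y1 = [sig(sum(w * v for w, v in zip(ws, xb))) for ws in w1]
            y1b = y1 + [1.0]
            y2 = [sig(sum(w * v for w, v in zip(ws, y1b))) for ws in w2]
            # 6.1 Output-layer error values: delta = Y (1 - Y)(T - Y).
            d2 = [y * (1 - y) * (ti - y) for y, ti in zip(y2, t)]
            # 6.2 Hidden-layer error values, propagated backward.
            d1 = [y * (1 - y) * sum(dk * w2[k][i] for k, dk in enumerate(d2))
                  for i, y in enumerate(y1)]
            # 6.3 Weight adjustment: W <- W + eta * delta * Y_prev.
            for k, dk in enumerate(d2):
                w2[k] = [w + eta * dk * v for w, v in zip(w2[k], y1b)]
            for i, di in enumerate(d1):
                w1[i] = [w + eta * di * v for w, v in zip(w1[i], xb)]
    return w1, w2

def predict(w1, w2, x):
    sig = lambda s: 1.0 / (1.0 + math.exp(-s))
    y1 = [sig(sum(w * v for w, v in zip(ws, list(x) + [1.0]))) for ws in w1] + [1.0]
    return [sig(sum(w * v for w, v in zip(ws, y1))) for ws in w2]

xor = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [0])]
w1, w2 = backprop_train(xor)
print([round(predict(w1, w2, x)[0], 2) for x, _ in xor])  # outputs for the 4 patterns
```

Repeated presentation of the patterns drives the squared output error down, per the stopping criterion above; whether XOR is solved exactly depends on the random initialization.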
Generalization vs. specialization
Optimal number of hidden neurons
Too many hidden neurons: you get an overfit; the training set is memorized, making the network useless on new data sets
Not enough hidden neurons: the network is unable to learn the problem concept
Overtraining: with too many examples, the ANN memorizes the examples instead of the general idea
Generalization vs. specialization trade-off
K-fold cross-validation is often used
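A minimal sketch of how k-fold cross-validation partitions a data set: each fold serves once as the validation set while the remaining folds train the network (the index-striping scheme here is one simple choice among several):

```python
def kfold_indices(n, k):
    """Split indices 0..n-1 into k near-equal folds; yield (train, val) pairs
    where each fold is the validation set exactly once."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, val in enumerate(folds):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, val

for train, val in kfold_indices(6, 3):
    print(sorted(val), sorted(train))
```

Averaging the validation error over the k runs gives a less optimistic estimate of generalization than the training error alone.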
Unsupervised method
No help from the outside; no training data, no information available on the desired output
Learning by doing
Used to pick out structure in the input: clustering, reduction of dimensionality / compression
Kohonen's learning law (Self-Organizing Map): winner takes all (only the weights of the winning neuron are updated)
SOM algorithm
An example: the Kohonen network
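A minimal sketch of the winner-take-all rule at the heart of Kohonen's law, without the neighborhood function of a full SOM; the data, unit count, and learning rate are illustrative:

```python
import random

def train_wta(data, n_units=2, lr=0.2, epochs=50, seed=0):
    """Competitive learning: for each input, only the unit whose weight
    vector is closest moves toward that input (winner takes all)."""
    rng = random.Random(seed)
    units = [[rng.random() for _ in data[0]] for _ in range(n_units)]
    for _ in range(epochs):
        for x in data:
            dists = [sum((wi - xi) ** 2 for wi, xi in zip(w, x)) for w in units]
            win = dists.index(min(dists))  # the winning neuron
            units[win] = [wi + lr * (xi - wi) for wi, xi in zip(units[win], x)]
    return units

# Two obvious clusters; with no labels given, each unit settles near one center.
data = [[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9]]
print([[round(w, 2) for w in u] for u in train_wta(data)])
```

This shows why no a priori categories are needed: the cluster structure of the input alone determines where the units end up.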
Reinforcement learning
Instead of providing target outputs, a teacher scores the performance on the training examples
The performance score is used to shuffle the weights 'randomly'
Learning is relatively slow due to this 'randomness'
Anatomy of ANN learning algorithm
Pros & Cons of ANN
Pros:
A neural network can perform tasks that a linear program cannot.
When an element of the neural network fails, the network can continue without any problem, thanks to its parallel nature.
A neural network learns and does not need to be reprogrammed.
It can be applied in a wide range of applications.
Cons:
The neural network needs training to operate.
The architecture of a neural network is different from the architecture of microprocessors, and therefore needs to be emulated.
Large neural networks require high processing time.
Summary
The representational capability of ANNs
Training a single perceptron
Training neural networks
Keep in mind the generalization vs. specialization trade-off