Integrating Neural Networks and Genetic Algorithms to Solve Function Approximation Combined with an Optimization Problem
Term presentation for CSC7333 Machine Learning
Xiaoxi Xu
May 3, 2006
Outline
Problem description & analysis
Neural Network
Genetic Algorithm
Implementations & experiments
Results
Remarks
Conclusion
Problem Description
We have plentiful data gathered over time;
We are not aware of the underlying relationship between the data input (some of which is human-controllable) and its output;
We expect to minimize or maximize the output in the future;
We hope to know what kind of input would generate a minimum or maximum output, so that we could adjust the input to achieve our end.
Problem Analysis
The characteristics of this problem are:
a. The exact nature of the relationship between input and output is unknown, and likely non-linear
b. Inputs are likely N-dimensional (N > 10)
In addition…
c. The global optimum is expected to be obtained
Solution for Function Approximation
This solution should meet the following requirements:
a. Have a parallel structure to handle N-dimensional variables
b. Be able to model the non-linear relation between variables and their responses
c. Preferably be fault-tolerant (noisy data may appear when the data set is large)
Solution for Function Approximation (Cont’d)
A Neural Network could be one. Why?
(Any function can be approximated to arbitrary accuracy by a 3-layer network with a linear transfer function in the output layer and sigmoid functions in the hidden layer)
We want to train a NN as such:
Topology: Multi-Layer Network
Connection Type: Feed-forward (From a mathematical point of view, a feed-forward neural network is a function: it takes an input and produces an output.)
Transfer Functions: Log-sigmoid, linear
Training Algorithm: Back-propagation
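As a minimal sketch of the claim that "a feed-forward neural network is a function," the forward pass of such a network can be written in a few lines. The tiny layer sizes and random weights here are illustrative assumptions, not the ones used in the experiments:

```python
import numpy as np

def logsig(z):
    """Log-sigmoid transfer function for the hidden layer."""
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(x, W_hi, b_hi, W_oh, b_oh):
    """Forward pass: log-sigmoid hidden layer, linear (purelin) output layer."""
    hidden = logsig(W_hi @ x + b_hi)
    return W_oh @ hidden + b_oh

# a hypothetical tiny network (1 input, 3 hidden nodes, 1 output) with random weights
rng = np.random.default_rng(0)
W_hi, b_hi = rng.standard_normal((3, 1)), rng.standard_normal(3)
W_oh, b_oh = rng.standard_normal((1, 3)), rng.standard_normal(1)
y = feedforward(np.array([0.5]), W_hi, b_hi, W_oh, b_oh)  # a single real output
```

The same composition, with the weights found by back-propagation, later serves as the fitness function of the GA.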
Solution for the Optimization Problem
To solve it mathematically?
In mathematics, even LOCAL optima can only be found for functions with good properties, such as convexity or unimodality. Such line-search methods include conjugate gradient descent, quasi-Newton, and so on.
This solution should meet the following requirements:
a. Be able to recognize the expression of the objective function
b. Be able to solve the function
c. Have a good chance of finding a global optimum
Solution for the Optimization Problem (Cont’d)
A Genetic Algorithm could be one. Why?
(GAs have mostly been applied to optimization problems)
We can use a GA as such:
a. Representation (any real number can be represented by a string of digits in base 10)
b. Fitness function (the Neural Network)
c. Genetic operators (crossover, selection, mutation)
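A minimal sketch of the base-10 representation and the genetic operators, under the assumption that a chromosome is a fixed-length list of decimal digits decoded linearly into the search range (the slides' exact handling of the sign digit is not reproduced here):

```python
import random

RANGE_LO, RANGE_HI = -4.0, 4.0   # search range of the 2-D experiment
CHROM_LEN = 6                    # chromosome length from the 2-D experiment

def decode(chrom):
    """Map a list of base-10 digits, e.g. [3, 1, 4, 1, 5, 9], linearly into the range."""
    as_int = int("".join(map(str, chrom)))
    return RANGE_LO + as_int / (10 ** CHROM_LEN - 1) * (RANGE_HI - RANGE_LO)

def one_point_crossover(a, b):
    """Swap the tails of two chromosomes at a random cut point."""
    cut = random.randint(1, CHROM_LEN - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(chrom, pr):
    """Replace each digit with a fresh random digit with probability pr."""
    return [random.randrange(10) if random.random() < pr else d for d in chrom]
```

With this decoding, the all-zeros chromosome maps to -4.0 and the all-nines chromosome to 4.0.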
Implementation & Experiment of NN for 2-D Function Approximation
Initialization with randomly selected weights
Multi-layer network with one hidden layer of 20 hidden nodes
Transfer functions: sigmoid (hidden layer), purelin (output layer)
Back-propagation algorithm
Learning rate: 0.05
Momentum: 0.1
Stop criteria: 1. MSE below 0.01%  2. Epochs (training iterations) exceed 100
Test function: tan(sin(x)) - sin(tan(x))
Training data: [-4, 4] at -4, -3.6, -3.2, -2.8, …
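The 2-D training setup above can be sketched roughly as follows. This is not the original code but a from-scratch back-propagation loop with momentum, using the hyperparameters listed (20 hidden nodes, learning rate 0.05, momentum 0.1, 100 epochs); the uniform weight-initialization range is an assumption:

```python
import numpy as np

def logsig(z):
    """Log-sigmoid transfer function (hidden layer)."""
    return 1.0 / (1.0 + np.exp(-z))

# training data: x in [-4, 4] spaced 0.4 apart, target tan(sin(x)) - sin(tan(x))
X = np.arange(-4.0, 4.0 + 1e-9, 0.4).reshape(1, -1)
T = np.tan(np.sin(X)) - np.sin(np.tan(X))

# 1-20-1 network; the uniform initialization range is assumed, not from the slides
rng = np.random.default_rng(1)
n_hidden = 20
W1 = rng.uniform(-0.5, 0.5, (n_hidden, 1)); b1 = np.zeros((n_hidden, 1))
W2 = rng.uniform(-0.5, 0.5, (1, n_hidden)); b2 = np.zeros((1, 1))

lr, momentum, max_epochs, goal = 0.05, 0.1, 100, 1e-4  # goal: MSE below 0.01%
vW1 = np.zeros_like(W1); vb1 = np.zeros_like(b1)
vW2 = np.zeros_like(W2); vb2 = np.zeros_like(b2)

mse0 = np.mean((W2 @ logsig(W1 @ X + b1) + b2 - T) ** 2)  # MSE before training

for epoch in range(max_epochs):
    H = logsig(W1 @ X + b1)   # hidden activations, shape (n_hidden, n_samples)
    Y = W2 @ H + b2           # linear (purelin) output layer
    E = Y - T
    mse = np.mean(E ** 2)
    if mse < goal:            # stop criterion 1: MSE below 0.01%
        break
    # back-propagation: linear output layer, log-sigmoid hidden layer
    n = X.shape[1]
    dY = 2.0 * E / n
    dW2 = dY @ H.T;  db2 = dY.sum(axis=1, keepdims=True)
    dH = (W2.T @ dY) * H * (1.0 - H)
    dW1 = dH @ X.T;  db1 = dH.sum(axis=1, keepdims=True)
    # gradient descent step with momentum
    vW2 = momentum * vW2 - lr * dW2;  W2 += vW2
    vb2 = momentum * vb2 - lr * db2;  b2 += vb2
    vW1 = momentum * vW1 - lr * dW1;  W1 += vW1
    vb1 = momentum * vb1 - lr * db1;  b1 += vb1
```

After the loop, the learned weights define the approximation the GA will optimize over.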
Implementation & Experiment of Genetic Algorithm for 2-D Function Optimization
Representation: a string of base-10 digits representing a real number and its sign
Random initialization
Range: [-4, 4]
Population size: 25
Chromosome length: 6
One-point crossover pr: 0.7
Mutation pr: 0.3
Roulette-wheel selection: preference for best-fit individuals
Fitness function: the Neural Network, represented by its inputs, weights, and biases as follows: weight_oh*sigmoid(weight_hi*input + bias_hi*1) + bias_oh*1
Elitism: the best-fit individual goes to the next generation
Stop criteria: 1. The value of the fitness function changes by less than 0.01 over 10 consecutive generations  2. Maximum generation is 30
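The GA loop with roulette-wheel selection and elitism might look like the sketch below. A simple analytic surrogate stands in for the trained network as the fitness function, and the per-digit mutation scheme is an assumption; the population size, chromosome length, crossover and mutation probabilities, and generation cap follow the slide:

```python
import random

POP_SIZE, CHROM_LEN = 25, 6          # as in the 2-D experiment
PC, PM = 0.7, 0.3                    # crossover / mutation probabilities
MAX_GEN = 30
LO, HI = -4.0, 4.0

def decode(chrom):
    """Map a base-10 digit string to a real number in [LO, HI]."""
    return LO + int("".join(map(str, chrom))) / (10 ** CHROM_LEN - 1) * (HI - LO)

def fitness(chrom):
    # hypothetical stand-in for the trained network: a surrogate peaking at x = 1
    x = decode(chrom)
    return -(x - 1.0) ** 2

def roulette(pop, fits):
    """Roulette-wheel selection: shift fitnesses positive, sample proportionally."""
    lo = min(fits)
    weights = [f - lo + 1e-9 for f in fits]
    return random.choices(pop, weights=weights, k=1)[0]

def evolve():
    pop = [[random.randrange(10) for _ in range(CHROM_LEN)] for _ in range(POP_SIZE)]
    for _ in range(MAX_GEN):
        fits = [fitness(c) for c in pop]
        elite = max(pop, key=fitness)             # elitism: best survives unchanged
        nxt = [elite[:]]
        while len(nxt) < POP_SIZE:
            a, b = roulette(pop, fits), roulette(pop, fits)
            if random.random() < PC:              # one-point crossover
                cut = random.randint(1, CHROM_LEN - 1)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for child in (a, b):
                if len(nxt) < POP_SIZE:           # per-digit mutation (an assumption)
                    nxt.append([random.randrange(10) if random.random() < PM else d
                                for d in child])
        pop = nxt
    return decode(max(pop, key=fitness))
```

Running `evolve()` should return a value near the surrogate's optimum at x = 1; in the experiment, the fitness call would instead evaluate the trained network on the decoded input.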
Experiment Result: 2-D
Real function: red solid line
Approximation by NN: blue dashed line
Optimum found by GA on the approximated function: magenta star
Implementation of NN for 3-D Function Approximation
One hidden layer of 30 hidden nodes
Learning rate: 0.06
Momentum: 0.5
Stop criteria: 1. MSE below 0.01%  2. Epochs (training iterations) exceed 250
Test function: 1.85*sin(x)*exp(-3*(y-1.5)^2) + 0.7*x*exp(-4*(x-1.2)^2) - 1.4*cos(x+y)*exp(-5*(y+1.3)^2) - 1.9*exp(-8*(x+0.5)^2)
Training data: [-1, 3] (spacing between points is 0.17)
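For reference, the 3-D test surface and its training grid can be written out directly; the grid spacing of 0.17 over [-1, 3] follows the slide, while the choice of a square meshgrid over both coordinates is an assumption:

```python
import numpy as np

def f3d(x, y):
    """The 3-D test surface used for the approximation experiment."""
    return (1.85 * np.sin(x) * np.exp(-3 * (y - 1.5) ** 2)
            + 0.7 * x * np.exp(-4 * (x - 1.2) ** 2)
            - 1.4 * np.cos(x + y) * np.exp(-5 * (y + 1.3) ** 2)
            - 1.9 * np.exp(-8 * (x + 0.5) ** 2))

# training grid: both coordinates in [-1, 3], spaced 0.17 apart
pts = np.arange(-1.0, 3.0 + 1e-9, 0.17)
X, Y = np.meshgrid(pts, pts)
Z = f3d(X, Y)
```

The surface mixes sharp Gaussian bumps with oscillatory terms, which is what makes it a reasonable stress test for a one-hidden-layer network.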
Implementation & Experiment of Genetic Algorithm for 3-D Function Optimization
Random initialization
Range: [-1, 3]
Population size: 40
Chromosome length: 12
One-point crossover pr: 0.8
Mutation pr: 0.6
Stop criteria: 1. The fitness function changes by less than 0.01 over 10 consecutive generations  2. Maximum generation is 100
Remarks
How to adjust some parameters for NN?
It’s really application-dependent.
With regard to my experiments:
Learning rate:
  too small: no apparent decrease of MSE over a long while
  too large: MSE jumped back and forth between decreasing and increasing
Momentum:
  compared to smaller values, larger values could model better
  but keep it within a proper range
Hidden nodes:
  keep the count within a proper range, otherwise overfitting will occur
  more nodes, better performance
  but our computer crashed before we knew how good the performance would get
Epochs:
  longer, better performance, but avoid overfitting
  trade-off between accuracy and training time
Remarks (Cont’d)
How to adjust some parameters for GA?
Population size: bigger is better, but keep it within a proper range, otherwise overfitting will occur
Crossover probability: typically [0.6, 0.9]; this worked in my experiment
Mutation probability: typically [1/pop_size, 1/chromosome_length]; a larger value in my experiment
Generation size: larger, better performance, but avoid overfitting; trade-off between accuracy and time
Random initialization influences performance success
We used randomly selected weights when training the NN and randomly selected individuals for the first generation of the GA. We found that sometimes the random initialization values determine whether a run succeeds.
Room for Improvements
Random data with added noise would be used to train the neural network
More complex examples would be tested to evaluate the performance of the NN
Building up more knowledge on adjusting NN & GA parameters
The error surface would be shown
Time complexity would be analyzed
Conclusion
Integrating a GA and a NN by using the NN as the fitness function of the GA can find a global optimum quite close to the one found by a GA using the real function
The GA performs well in searching for the global optimum regardless of what the fitness function is
In practice, multi-layer NNs with one hidden layer have overall good performance in function approximation, but they sometimes still have difficulty
The random initialization values can sometimes determine the performance success