CogNovaTechnologies
Network Design & Training
Network Design & Training Issues

Design:
– Architecture of network
– Structure of artificial neurons
– Learning rules

Training:
– Ensuring optimum training
– Learning parameters
– Data preparation
– and more ...
Network Design
Network Design

Architecture of the network:
– How many nodes? (determines the number of network weights)
– How many layers?
– How many nodes per layer?

(Diagram: Input Layer – Hidden Layer – Output Layer)

Automated methods:
– augmentation (cascade correlation)
– weight pruning and elimination
Network Design

Architecture of the network: Connectivity?
– Concept of model or hypothesis space
– Constraining the number of hypotheses:
  – selective connectivity
  – shared weights
  – recursive connections
Network Design
Structure of artificial neuron nodes

Choice of input integration:
– summed; squared and summed
– multiplied

Choice of activation (transfer) function:
– sigmoid (logistic)
– hyperbolic tangent
– Gaussian
– linear
– soft-max
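The activation choices above can be written down directly; a minimal NumPy sketch (the function names are illustrative, not from any particular library):

```python
import numpy as np

def sigmoid(x):
    # logistic function: squashes any input into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # hyperbolic tangent: squashes any input into (-1, 1)
    return np.tanh(x)

def gaussian(x):
    # radial response: peaks at 1 when x = 0, falls off symmetrically
    return np.exp(-x ** 2)

def linear(x):
    # identity: the output equals the integrated input
    return x

def softmax(x):
    # turns a vector of activations into values that sum to 1
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()
```

Sigmoid and tanh pair naturally with the back-prop weight updates discussed later; soft-max is typically reserved for the output layer of a classifier.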
Network Design

Selecting a Learning Rule:
– Generalized delta rule (steepest descent)
– Momentum descent
– Advanced weight-space search techniques

The global error function can also vary:
– normal, quadratic, cubic
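Momentum descent amounts to adding a velocity term to the steepest-descent update. A minimal pure-Python sketch on a one-dimensional error surface (the parameter defaults echo the typical settings quoted later in the deck):

```python
def momentum_update(w, grad, velocity, lr=0.1, momentum=0.8):
    # generalized delta rule with momentum: the step blends the
    # current gradient with the direction of the previous step
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

# Minimize E(w) = w^2 (gradient 2w), starting from w = 5.0
w, v = 5.0, 0.0
for _ in range(100):
    w, v = momentum_update(w, 2 * w, v)
# w ends up very close to the minimum at 0
```

The momentum term lets the search coast through small bumps and flat stretches of the error surface that would stall plain steepest descent.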
Network Training
Network Training

How do you ensure that a network has been well trained?
– Objective: to achieve good generalization accuracy on new examples/cases
– Establish a maximum acceptable error rate
– Train the network using a validation test set to tune it
– Validate the trained network against a separate test set, usually referred to as a production test set
Network Training

Approach #1: Large Sample
When the amount of available data is large ...
– Divide the available examples randomly: 70% training set, 30% test set (a production set is held out separately)
– Use the training set to develop one ANN model
– Compute the test error on the test set
– Generalization error = test error
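The large-sample approach can be sketched in a few lines (the 70/30 proportions come from the slide; the helper name is ours):

```python
import random

def large_sample_split(examples, train_frac=0.70, seed=42):
    # randomly divide the available examples: 70% to develop one ANN
    # model, 30% to compute the test error (= generalization error)
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

train_set, test_set = large_sample_split(range(100))
# 70 training examples, 30 test examples, no overlap
```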
Network Training

Approach #2: Cross-validation
When the amount of available data is small ...
– Divide the available examples randomly: 90% training set, 10% test set (a production set is held out separately)
– Repeat 10 times, developing 10 different ANN models and accumulating the test errors
– Generalization error determined by the mean test error and its standard deviation
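A sketch of the 10-fold procedure (the `train_and_test` callback, which should fit a model on the training fold and return its test error, is an assumption of this sketch):

```python
import random
import statistics

def cross_validate(examples, train_and_test, k=10, seed=42):
    # split the examples into k folds; each fold serves once as the
    # 10% test set while the other folds form the 90% training set
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    errors = []
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        errors.append(train_and_test(train, test))
    # generalization error summarized by mean and std dev of the k errors
    return statistics.mean(errors), statistics.stdev(errors)
```

Because every example is tested exactly once, the mean test error uses all the scarce data, which is why this approach suits small data sets.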
Network Training

How do you select between two ANN designs?
– A statistical test of hypothesis is required to ensure that a significant difference exists between the error rates of the two ANN models
– If the Large Sample method has been used, then apply McNemar's test*
– If Cross-validation, then use a paired t test for the difference of two proportions

* We assume a classification problem; if this is function approximation, then use a paired t test for the difference of means
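McNemar's test for the Large Sample case needs only the counts of disagreements between the two models; a sketch with the usual continuity correction (the 3.84 threshold is the standard chi-square critical value with 1 degree of freedom at the 0.05 level):

```python
def mcnemar_statistic(b, c):
    # b = test cases model A classified correctly and model B missed
    # c = test cases model B classified correctly and model A missed
    # compare the statistic against 3.84 (chi-square, 1 df, alpha = 0.05)
    if b + c == 0:
        return 0.0
    return (abs(b - c) - 1) ** 2 / (b + c)

# The two models disagree on 40 test cases: A wins 30, B wins 10
stat = mcnemar_statistic(30, 10)
# stat = (|30 - 10| - 1)^2 / 40 = 9.025, which exceeds 3.84:
# the difference in error rates is significant at the 0.05 level
```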
Network Training

Mastering ANN Parameters
                 Typical    Range
learning rate    0.1        0.01 - 0.99
momentum         0.8        0.1 - 0.9
weight-cost      0.1        0.001 - 0.5

Fine tuning:
– adjust individual parameters at each node and/or connection weight
– automatic adjustment during training
Network Training

Network weight initialization
– Random initial values within some +/- range
– Smaller weight values for nodes with many incoming connections
– Rule of thumb: the initial weight range should be approximately
  +/- 1 / sqrt(# weights coming into a node)
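The rule of thumb translates directly into an initializer (a pure-Python sketch; the helper name is illustrative):

```python
import random

def init_incoming_weights(fan_in, seed=None):
    # draw each of the node's incoming weights uniformly from the
    # range +/- 1/sqrt(fan_in): more incoming connections means
    # smaller initial weights, as the rule of thumb suggests
    bound = 1.0 / fan_in ** 0.5
    rng = random.Random(seed)
    return [rng.uniform(-bound, bound) for _ in range(fan_in)]

weights = init_incoming_weights(25, seed=0)
# 25 weights, each within +/- 0.2 (= 1/sqrt(25))
```

Keeping the summed input small at the start keeps sigmoid-style nodes out of their flat saturated regions, where gradients vanish and training stalls.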
Network Training

Typical Problems During Training
(Three plots of total error E vs. # of iterations)
– Would like: steady, rapid decline in total error
– But sometimes: seldom a local minimum - reduce the learning or momentum parameter
– But sometimes: reduce the learning parameters - may indicate the data is not learnable
Data Preparation
Data Preparation

Garbage in, garbage out
– The quality of results relates directly to the quality of the data
– 50%-70% of ANN development time will be spent on data preparation
– The three steps of data preparation:
  – Consolidation and Cleaning
  – Selection and Preprocessing
  – Transformation and Encoding
Data Preparation

Data Types and ANNs
Three basic data types:
– nominal: discrete symbolic (A, yes, small)
– ordinal: discrete numeric (-5, 3, 24)
– continuous: numeric (0.23, -45.2, 500.43)

BP ANNs accept only continuous numeric values (typically in the 0 - 1 range)
Data Preparation

Consolidation and Cleaning
– Determine appropriate input attributes
– Consolidate data into a working database
– Eliminate or estimate missing values
– Remove outliers (obvious exceptions)
– Determine prior probabilities of categories and deal with volume bias
Data Preparation

Selection and Preprocessing
– Select examples: random sampling
– Consider the number of training examples
– Reduce attribute dimensionality:
  – remove redundant and/or correlated attributes
  – combine attributes (sum, multiply, difference)
– Reduce attribute value ranges:
  – group symbolic discrete values
  – quantize continuous numeric values
Data Preparation
Transformation and Encoding

Discrete symbolic or numeric values
– Transform to discrete numeric values
– Encode the value 4 as follows:
  – one-of-N code (0 1 0 0 0) - five inputs
  – thermometer code (1 1 1 1 0) - five inputs
  – real value (0.4)* - one input
– Consider the relationship between values:
  – (single, married, divorced) vs. (youth, adult, senior)

* Target values should be in the 0.1 - 0.9 range, not 0.0 - 1.0
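The three codes for a discrete value can be sketched as follows (the indexing convention for one-of-N and the 0-10 scale for the real value are assumptions of this sketch):

```python
def one_of_n(index, n):
    # one-of-N code: a single 1 in an N-element vector (N inputs)
    return [1 if i == index else 0 for i in range(n)]

def thermometer(value, n):
    # thermometer code: `value` leading 1s, e.g. 4 of 5 -> 1 1 1 1 0
    return [1 if i < value else 0 for i in range(n)]

def as_real(value, scale=10.0):
    # single real-valued input, e.g. 4 -> 0.4
    return value / scale

print(thermometer(4, 5))  # [1, 1, 1, 1, 0]
print(as_real(4))         # 0.4
```

The thermometer code preserves the ordering between values, which suits ordinal data; one-of-N treats values as unrelated, which suits nominal data.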
Data Preparation
Transformation and Encoding

Continuous numeric values
– De-correlate example attributes via normalization of values:
  – Euclidean: n = x / sqrt(sum of all x^2)
  – Percentage: n = x / (sum of all x)
  – Variance based: n = (x - mean of all x) / variance
– Scale values using a linear transform if the data is uniformly distributed, or a non-linear transform (log, power) if the distribution is skewed
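The three normalizations follow directly from the formulas above (a pure-Python sketch; note the variance-based form divides by the variance itself, as the slide states, not by the standard deviation; population variance is assumed):

```python
def euclidean_norm(xs):
    # n = x / sqrt(sum of all x^2)
    length = sum(x * x for x in xs) ** 0.5
    return [x / length for x in xs]

def percentage_norm(xs):
    # n = x / (sum of all x)
    total = sum(xs)
    return [x / total for x in xs]

def variance_norm(xs):
    # n = (x - mean of all x) / variance
    mean = sum(xs) / len(xs)
    variance = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [(x - mean) / variance for x in xs]
```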
Data Preparation
Transformation and Encoding

Continuous numeric values
Encode the value 1.6 as:
– a single real-valued number (0.16)* - OK!
– bits of a binary number (010000) - BAD!
– one-of-N quantized intervals (0 1 0 0 0) - NOT GREAT! (discontinuities)
– distributed (fuzzy) overlapping intervals (0.3 0.8 0.1 0.0 0.0) - BEST!

* Target values should be in the 0.1 - 0.9 range, not 0.0 - 1.0
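The distributed (fuzzy) overlapping-interval code can be sketched with a triangular membership function (the triangular shape, the interval centers, and the width are assumptions of this sketch; the slide does not fix them):

```python
def fuzzy_encode(x, centers, width=1.0):
    # each input reports how strongly x belongs to its interval:
    # membership falls off linearly with distance from the center,
    # so neighbouring intervals respond to the same value and the
    # encoding varies smoothly - no hard discontinuities
    return [max(0.0, 1.0 - abs(x - c) / width) for c in centers]

codes = fuzzy_encode(1.6, centers=[1.0, 2.0, 3.0, 4.0, 5.0])
# roughly [0.4, 0.6, 0.0, 0.0, 0.0]: the two nearest intervals share
# the activation, the distant ones stay at zero
```

Unlike one-of-N quantization, a small change in x produces a small change in the code, which is why the slide ranks this encoding best.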
TUTORIAL #5

Develop and train a BP network on real-world data
Post-Training Analysis
Post-Training Analysis

Examining the neural net model:
– Visualizing the constructed model
– Detailed network analysis

Sensitivity analysis of input attributes:
– Analytical techniques
– Attribute elimination
Post-Training Analysis

Visualizing the Constructed Model
– Graphical tools can be used to display the output response as selected input variables are changed

(Plot: Response as a function of Size and Temp)
Post-Training Analysis

Detailed network analysis
– Hidden nodes form an internal representation
– Manual analysis of weight values is often difficult - graphics are very helpful
– Conversion to an equation or executable code
– Automated ANN-to-symbolic-logic conversion is a hot area of research
Post-Training Analysis

Sensitivity analysis of input attributes
– Analytical techniques:
  – factor analysis
  – network weight analysis
– Feature (attribute) elimination:
  – forward feature elimination
  – backward feature elimination
The ANN Application Development Process

Guidelines for using neural networks:
1. Try the best existing method first
2. Get a big training set
3. Try a net without hidden units
4. Use a sensible coding for input variables
5. Consider methods of constraining the network
6. Use a test set to prevent over-training
7. Determine confidence in generalization through cross-validation
Example Applications

– Pattern Recognition (reading zip codes)
– Signal Filtering (reduction of radio noise)
– Data Segmentation (detection of seismic onsets)
– Data Compression (TV image transmission)
– Database Mining (marketing, finance analysis)
– Adaptive Control (vehicle guidance)
Pros and Cons of Back-Prop
Pros and Cons of Back-Prop

Cons:
– Local minima - but not generally a concern
– Seems biologically implausible
– Space and time complexity: O(W^3) leads to lengthy training times
– It's a black box - you can't see how it makes its decisions
– Best suited for supervised learning
– Works poorly on dense data with few input variables
Pros and Cons of Back-Prop

Pros:
– Proven training method for multi-layer nets
– Able to learn any arbitrary function (XOR)
– Most useful for non-linear mappings
– Works well with noisy data
– Generalizes well given sufficient examples
– Rapid recognition speed
– Has inspired many new learning algorithms
Other Networks and Advanced Issues
Other Networks and Advanced Issues

– Variations in feed-forward architecture:
  – jump connections to output nodes
  – hidden nodes that vary in structure
– Recurrent networks with feedback connections
– Probabilistic networks
– General Regression networks
– Unsupervised self-organizing networks
THE END

Thanks for your participation!