CogNovaTechnologies
Network Design & Training
Network Design & Training Issues

Design:
– Architecture of network
– Structure of artificial neurons
– Learning rules

Training:
– Ensuring optimum training
– Learning parameters
– Data preparation
– and more ...
Network Design
Network Design

Architecture of the network:
– How many nodes? (determines the number of network weights)
– How many layers?
– How many nodes per layer?

(Diagram: Input Layer – Hidden Layer – Output Layer)

Automated methods:
– augmentation (cascade correlation)
– weight pruning and elimination
Network Design

Architecture of the network: Connectivity?
– Concept of model or hypothesis space
– Constraining the number of hypotheses:
  – selective connectivity
  – shared weights
  – recursive connections
Network Design
Structure of artificial neuron nodes

Choice of input integration:
– summed; squared and summed
– multiplied

Choice of activation (transfer) function:
– sigmoid (logistic)
– hyperbolic tangent
– Gaussian
– linear
– soft-max
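The activation choices above can be written down directly; a minimal NumPy sketch (the function names are illustrative, not from any particular library):

```python
import numpy as np

def sigmoid(x):
    # logistic function: squashes any input into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # hyperbolic tangent: squashes any input into (-1, 1)
    return np.tanh(x)

def gaussian(x):
    # radial response: peaks at 1 when x = 0, falls off symmetrically
    return np.exp(-x ** 2)

def linear(x):
    # identity: the output equals the integrated input
    return x

def softmax(x):
    # turns a vector of activations into values that sum to 1
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()
```

Sigmoid and tanh pair naturally with the back-prop weight updates discussed later; soft-max is typically reserved for the output layer of a classifier.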
Network Design

Selecting a Learning Rule:
– Generalized delta rule (steepest descent)
– Momentum descent
– Advanced weight-space search techniques

The global error function can also vary:
– normal, quadratic, cubic
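Momentum descent amounts to adding a velocity term to the steepest-descent update. A minimal pure-Python sketch on a one-dimensional error surface (the parameter defaults echo the typical settings quoted later in the deck):

```python
def momentum_update(w, grad, velocity, lr=0.1, momentum=0.8):
    # generalized delta rule with momentum: the step blends the
    # current gradient with the direction of the previous step
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

# Minimize E(w) = w^2 (gradient 2w), starting from w = 5.0
w, v = 5.0, 0.0
for _ in range(100):
    w, v = momentum_update(w, 2 * w, v)
# w ends up very close to the minimum at 0
```

The momentum term lets the search coast through small bumps and flat stretches of the error surface that would stall plain steepest descent.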
Network Training
Network Training

How do you ensure that a network has been well trained?
– Objective: to achieve good generalization accuracy on new examples/cases
– Establish a maximum acceptable error rate
– Train the network using a validation test set to tune it
– Validate the trained network against a separate test set, usually referred to as a production test set
Network Training

Approach #1: Large Sample
When the amount of available data is large ...
– Divide the available examples randomly: 70% training set, 30% test set (a production set is held out separately)
– Use the training set to develop one ANN model
– Compute the test error on the test set
– Generalization error = test error
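The large-sample approach can be sketched in a few lines (the 70/30 proportions come from the slide; the helper name is ours):

```python
import random

def large_sample_split(examples, train_frac=0.70, seed=42):
    # randomly divide the available examples: 70% to develop one ANN
    # model, 30% to compute the test error (= generalization error)
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

train_set, test_set = large_sample_split(range(100))
# 70 training examples, 30 test examples, no overlap
```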
Network Training

Approach #2: Cross-validation
When the amount of available data is small ...
– Divide the available examples randomly: 90% training set, 10% test set (a production set is held out separately)
– Repeat 10 times, developing 10 different ANN models and accumulating the test errors
– Generalization error determined by the mean test error and its standard deviation
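A sketch of the 10-fold procedure (the `train_and_test` callback, which should fit a model on the training fold and return its test error, is an assumption of this sketch):

```python
import random
import statistics

def cross_validate(examples, train_and_test, k=10, seed=42):
    # split the examples into k folds; each fold serves once as the
    # 10% test set while the other folds form the 90% training set
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    errors = []
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        errors.append(train_and_test(train, test))
    # generalization error summarized by mean and std dev of the k errors
    return statistics.mean(errors), statistics.stdev(errors)
```

Because every example is tested exactly once, the mean test error uses all the scarce data, which is why this approach suits small data sets.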
Network Training

How do you select between two ANN designs?
– A statistical test of hypothesis is required to ensure that a significant difference exists between the error rates of the two ANN models
– If the Large Sample method has been used, then apply McNemar's test*
– If Cross-validation, then use a paired t test for the difference of two proportions

* We assume a classification problem; if this is function approximation, then use a paired t test for the difference of means
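McNemar's test for the Large Sample case needs only the counts of disagreements between the two models; a sketch with the usual continuity correction (the 3.84 threshold is the standard chi-square critical value with 1 degree of freedom at the 0.05 level):

```python
def mcnemar_statistic(b, c):
    # b = test cases model A classified correctly and model B missed
    # c = test cases model B classified correctly and model A missed
    # compare the statistic against 3.84 (chi-square, 1 df, alpha = 0.05)
    if b + c == 0:
        return 0.0
    return (abs(b - c) - 1) ** 2 / (b + c)

# The two models disagree on 40 test cases: A wins 30, B wins 10
stat = mcnemar_statistic(30, 10)
# stat = (|30 - 10| - 1)^2 / 40 = 9.025, which exceeds 3.84:
# the difference in error rates is significant at the 0.05 level
```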
Network Training

Mastering ANN Parameters
                 Typical    Range
learning rate    0.1        0.01 - 0.99
momentum         0.8        0.1 - 0.9
weight-cost      0.1        0.001 - 0.5

Fine tuning:
– adjust individual parameters at each node and/or connection weight
– automatic adjustment during training
Network Training

Network weight initialization
– Random initial values within some +/- range
– Smaller weight values for nodes with many incoming connections
– Rule of thumb: the initial weight range should be approximately
  +/- 1 / sqrt(# weights coming into a node)
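The rule of thumb translates directly into an initializer (a pure-Python sketch; the helper name is illustrative):

```python
import random

def init_incoming_weights(fan_in, seed=None):
    # draw each of the node's incoming weights uniformly from the
    # range +/- 1/sqrt(fan_in): more incoming connections means
    # smaller initial weights, as the rule of thumb suggests
    bound = 1.0 / fan_in ** 0.5
    rng = random.Random(seed)
    return [rng.uniform(-bound, bound) for _ in range(fan_in)]

weights = init_incoming_weights(25, seed=0)
# 25 weights, each within +/- 0.2 (= 1/sqrt(25))
```

Keeping the summed input small at the start keeps sigmoid-style nodes out of their flat saturated regions, where gradients vanish and training stalls.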
Network Training

Typical Problems During Training
(Three plots of total error E vs. # of iterations)
– Would like: steady, rapid decline in total error
– But sometimes: seldom a local minimum - reduce the learning or momentum parameter
– But sometimes: reduce the learning parameters - may indicate the data is not learnable
Data Preparation
Data Preparation

Garbage in, garbage out
– The quality of results relates directly to the quality of the data
– 50%-70% of ANN development time will be spent on data preparation
– The three steps of data preparation:
  – Consolidation and Cleaning
  – Selection and Preprocessing
  – Transformation and Encoding
Data Preparation

Data Types and ANNs
Three basic data types:
– nominal: discrete symbolic (A, yes, small)
– ordinal: discrete numeric (-5, 3, 24)
– continuous: numeric (0.23, -45.2, 500.43)

BP ANNs accept only continuous numeric values (typically in the 0 - 1 range)
Data Preparation

Consolidation and Cleaning
– Determine appropriate input attributes
– Consolidate data into a working database
– Eliminate or estimate missing values
– Remove outliers (obvious exceptions)
– Determine prior probabilities of categories and deal with volume bias
Data Preparation

Selection and Preprocessing
– Select examples: random sampling
– Consider the number of training examples
– Reduce attribute dimensionality:
  – remove redundant and/or correlated attributes
  – combine attributes (sum, multiply, difference)
– Reduce attribute value ranges:
  – group symbolic discrete values
  – quantize continuous numeric values
Data Preparation
Transformation and Encoding

Discrete symbolic or numeric values
– Transform to discrete numeric values
– Encode the value 4 as follows:
  – one-of-N code (0 1 0 0 0) - five inputs
  – thermometer code (1 1 1 1 0) - five inputs
  – real value (0.4)* - one input
– Consider the relationship between values:
  – (single, married, divorced) vs. (youth, adult, senior)

* Target values should be in the 0.1 - 0.9 range, not 0.0 - 1.0
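The three codes for a discrete value can be sketched as follows (the indexing convention for one-of-N and the 0-10 scale for the real value are assumptions of this sketch):

```python
def one_of_n(index, n):
    # one-of-N code: a single 1 in an N-element vector (N inputs)
    return [1 if i == index else 0 for i in range(n)]

def thermometer(value, n):
    # thermometer code: `value` leading 1s, e.g. 4 of 5 -> 1 1 1 1 0
    return [1 if i < value else 0 for i in range(n)]

def as_real(value, scale=10.0):
    # single real-valued input, e.g. 4 -> 0.4
    return value / scale

print(thermometer(4, 5))  # [1, 1, 1, 1, 0]
print(as_real(4))         # 0.4
```

The thermometer code preserves the ordering between values, which suits ordinal data; one-of-N treats values as unrelated, which suits nominal data.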
Data Preparation
Transformation and Encoding

Continuous numeric values
– De-correlate example attributes via normalization of values:
  – Euclidean: n = x / sqrt(sum of all x^2)
  – Percentage: n = x / (sum of all x)
  – Variance based: n = (x - mean of all x) / variance
– Scale values using a linear transform if the data is uniformly distributed, or a non-linear transform (log, power) if the distribution is skewed
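The three normalizations follow directly from the formulas above (a pure-Python sketch; note the variance-based form divides by the variance itself, as the slide states, not by the standard deviation; population variance is assumed):

```python
def euclidean_norm(xs):
    # n = x / sqrt(sum of all x^2)
    length = sum(x * x for x in xs) ** 0.5
    return [x / length for x in xs]

def percentage_norm(xs):
    # n = x / (sum of all x)
    total = sum(xs)
    return [x / total for x in xs]

def variance_norm(xs):
    # n = (x - mean of all x) / variance
    mean = sum(xs) / len(xs)
    variance = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [(x - mean) / variance for x in xs]
```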
Data Preparation
Transformation and Encoding

Continuous numeric values
Encode the value 1.6 as:
– a single real-valued number (0.16)* - OK!
– bits of a binary number (010000) - BAD!
– one-of-N quantized intervals (0 1 0 0 0) - NOT GREAT! (discontinuities)
– distributed (fuzzy) overlapping intervals (0.3 0.8 0.1 0.0 0.0) - BEST!

* Target values should be in the 0.1 - 0.9 range, not 0.0 - 1.0
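The distributed (fuzzy) overlapping-interval code can be sketched with a triangular membership function (the triangular shape, the interval centers, and the width are assumptions of this sketch; the slide does not fix them):

```python
def fuzzy_encode(x, centers, width=1.0):
    # each input reports how strongly x belongs to its interval:
    # membership falls off linearly with distance from the center,
    # so neighbouring intervals respond to the same value and the
    # encoding varies smoothly - no hard discontinuities
    return [max(0.0, 1.0 - abs(x - c) / width) for c in centers]

codes = fuzzy_encode(1.6, centers=[1.0, 2.0, 3.0, 4.0, 5.0])
# roughly [0.4, 0.6, 0.0, 0.0, 0.0]: the two nearest intervals share
# the activation, the distant ones stay at zero
```

Unlike one-of-N quantization, a small change in x produces a small change in the code, which is why the slide ranks this encoding best.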
TUTORIAL #5

Develop and train a BP network on real-world data
Post-Training Analysis
Post-Training Analysis

Examining the neural net model:
– Visualizing the constructed model
– Detailed network analysis

Sensitivity analysis of input attributes:
– Analytical techniques
– Attribute elimination
Post-Training Analysis

Visualizing the Constructed Model
– Graphical tools can be used to display the output response as selected input variables are changed

(Plot: Response as a function of Size and Temp)
Post-Training Analysis

Detailed network analysis
– Hidden nodes form an internal representation
– Manual analysis of weight values is often difficult - graphics are very helpful
– Conversion to an equation or executable code
– Automated ANN-to-symbolic-logic conversion is a hot area of research
Post-Training Analysis

Sensitivity analysis of input attributes
– Analytical techniques:
  – factor analysis
  – network weight analysis
– Feature (attribute) elimination:
  – forward feature elimination
  – backward feature elimination
The ANN Application Development Process

Guidelines for using neural networks:
1. Try the best existing method first
2. Get a big training set
3. Try a net without hidden units
4. Use a sensible coding for input variables
5. Consider methods of constraining the network
6. Use a test set to prevent over-training
7. Determine confidence in generalization through cross-validation
Example Applications

– Pattern Recognition (reading zip codes)
– Signal Filtering (reduction of radio noise)
– Data Segmentation (detection of seismic onsets)
– Data Compression (TV image transmission)
– Database Mining (marketing, finance analysis)
– Adaptive Control (vehicle guidance)
Pros and Cons of Back-Prop
Pros and Cons of Back-Prop

Cons:
– Local minima - but not generally a concern
– Seems biologically implausible
– Space and time complexity: O(W^3) leads to lengthy training times
– It's a black box - you can't see how it makes its decisions
– Best suited for supervised learning
– Works poorly on dense data with few input variables
Pros and Cons of Back-Prop

Pros:
– Proven training method for multi-layer nets
– Able to learn any arbitrary function (XOR)
– Most useful for non-linear mappings
– Works well with noisy data
– Generalizes well given sufficient examples
– Rapid recognition speed
– Has inspired many new learning algorithms
Other Networks and Advanced Issues
Other Networks and Advanced Issues

– Variations in feed-forward architecture:
  – jump connections to output nodes
  – hidden nodes that vary in structure
– Recurrent networks with feedback connections
– Probabilistic networks
– General Regression networks
– Unsupervised self-organizing networks
THE END

Thanks for your participation!