7/29/2019 Introduction of Artificial Neural Networks
Neural Networks
Analogy between neural nets and the nervous system
History of neural networks
How neural nets work
Example problem
Common questions about neural networks
Application examples
Selected references
Summary
Analogy Between Neural Nets and the Nervous System

Neural nets are based on nodes and connections
 Analogous to nerve cells: roughly 10^12 neurons and 10^14 synaptic connections in the human brain
Nodes have input signals
 Dendrites carry an impulse to the neuron
Nodes have one output signal
 Axons carry the signal out of the neuron; synapses are local regions where signals are transmitted from the axon of one neuron to the dendrites of another
Input signal weights are summed at each node
 Nerve impulses are binary: they are go or no go. Neurons sum up the incoming signals and fire if a threshold value is reached
History of Neural Networks

Attempts to mimic the human brain date back to work in the 1930s, 1940s, and 1950s by Alan Turing, Warren McCulloch, Walter Pitts, Donald Hebb, and John von Neumann
1957: Rosenblatt at Cornell developed the Perceptron, a hardware neural net for character recognition
1959: Widrow and Hoff at Stanford developed Adaline for adaptive control of noise on telephone lines
History of Neural Networks

1960s & 1970s: a period hindered by inflated claims and criticisms of the early work
1982: Hopfield, a Caltech physicist, mathematically tied together many of the ideas from previous research
Since then, growth has exploded: over 80% of the Fortune 500 have neural net R&D programs
 Thousands of research papers
 Commercial software applications
Neural Network Layers

[Figure: feedforward network diagram showing an input layer, hidden layers, and an output layer]
Mathematical Model of a Node

[Figure: a single node with incoming activations a0 ... ai ... an, connection weights w0 ... wi ... wn, an adder function, a threshold function, and one outgoing activation]
Mathematical Model of a Node: Adder Function

[Same node diagram] The adder function sums the weighted incoming activations:

 x = Σ wi ai  (summed over i = 0 ... n)
Mathematical Model of a Node: Threshold Function

[Same node diagram] The threshold function is a hard step applied to the adder's sum:

 f(x) = 1 if x > 0
 f(x) = 0 if x ≤ 0
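The adder and threshold functions above can be sketched in a few lines of Python (an illustrative sketch; the function name and example values are assumptions, not from the original):

```python
def node_output(activations, weights):
    """One node: weighted sum (adder), then a hard threshold."""
    x = sum(w * a for w, a in zip(weights, activations))  # adder function
    return 1 if x > 0 else 0                              # threshold function

# a0 = 1 acts as a bias input whose weight w0 shifts the threshold
print(node_output([1, 0.5, 1.0], [-0.2, 0.4, 0.3]))  # 1, since -0.2 + 0.2 + 0.3 > 0
```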
How Neural Nets Work

Implementation
 Hardware - electronic circuits mimic neurons
 Software - linkages of nodes, inputs, and outputs can be programmed
Uses a trial-and-error method of learning
 Finds patterns associating inputs and outputs using a large set of training data where both inputs and outputs are known (e.g. use the intermarket relationship among the Standard & Poor's 500 index, 30-year Treasury bonds, and the Commodity Research Bureau index to predict the direction of the S&P 500 index trend 5 weeks into the future)
 Initially begins with random weights and corrects mistakes by modifying the weight that it has given each input item
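The trial-and-error scheme above (random initial weights, corrected after each mistake) can be sketched with the classic perceptron rule; the AND task, learning rate, and cycle count are illustrative assumptions:

```python
import random

random.seed(0)
weights = [random.uniform(-1, 1) for _ in range(3)]  # random start; weights[0] is a bias weight
data = [([1, 0, 0], 0), ([1, 0, 1], 0), ([1, 1, 0], 0), ([1, 1, 1], 1)]  # AND, with bias input 1

def predict(inputs):
    # threshold node: fire if the weighted sum of the inputs is positive
    return 1 if sum(w * a for w, a in zip(weights, inputs)) > 0 else 0

for _ in range(20):                          # training cycles
    for inputs, target in data:
        error = target - predict(inputs)     # nonzero only on a mistake
        for i, a in enumerate(inputs):       # adjust the weight on each input item
            weights[i] += 0.1 * error * a

print([predict(x) for x, _ in data])         # [0, 0, 0, 1]: the AND function is learned
```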
How Neural Nets Work

Feedback network
 A given node's output can be transmitted back to itself or to other previous nodes as another input
Feedforward network
 All outputs only go forward
Parallel distributed processing versus serial symbolic processing
How Neural Nets Work: Learning

Tradeoff between training speed and weight quality
 if too fast, weights may not be effective for new data
 if too slow, network may memorize the data and not predict well for new data
Models and rules for learning are based in biology and psychology
 Hebb's rule - changes in synaptic strengths are proportional to neuron activation (Hebb 1949). Basis for neural nets.
 Grossberg learning - self-training and self-organization allow the net to adapt to changes in input data over time (Grossberg 1982)
 Kohonen's learning law - two-layer network with content-addressable associative memory for unsupervised learning (Kohonen 1984)
How Neural Nets Work: Unsupervised Learning

Nets are self-learning
 BAM (bi-directional associative memory) used for OCR, spell checking, voice recognition
Weight adjustments are not from comparison with known values
Based on the input pattern, only weights for the winning node or a few nodes are modified

ΔWij = Ai Aj, where:
 Ai is the activation of the ith node in one layer
 Aj is the activation of the jth node in another layer
 Wij is the connection strength between the two nodes
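The Hebbian-style update above can be sketched directly; the function name and the tiny two-by-two example are assumptions:

```python
# delta_Wij = Ai * Aj: strengthen only connections whose two end nodes
# are active together (a learning-rate factor is often added; it is
# omitted here to match the slide's formula)

def hebbian_update(W, layer1, layer2):
    """Return W with Ai * Aj added to each connection weight Wij."""
    return [[wij + ai * aj for aj, wij in zip(layer2, row)]
            for ai, row in zip(layer1, W)]

W = [[0.0, 0.0],
     [0.0, 0.0]]
W = hebbian_update(W, [1, 0], [0, 1])  # node 1 of layer 1 and node 2 of layer 2 co-active
print(W)  # [[0.0, 1.0], [0.0, 0.0]]: only the co-active pair is strengthened
```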
How Neural Nets Work: Supervised Learning

Gradually train weights to meet desired outputs
 inputs presented to the network
 weights adjusted to achieve desired output for training data
 corrections based on the difference between actual and desired output, which is computed for each training cycle
 if average error is within tolerance, stop; else continue training
 weights are locked in and the network is ready to use

ΔWij = η Ai (Cj - Bj), where:
 η is the learning rate,
 Ai is the activation of the ith node in one layer,
 Bj is the actual activation of the jth node in the recalled pattern,
 Cj is the desired activation of the jth node, and
 Wij is the connection strength between the two nodes
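The supervised update above can be sketched as follows; eta and the one-input, two-output layer sizes are illustrative assumptions:

```python
# delta_Wij = eta * Ai * (Cj - Bj): nudge each weight in proportion
# to the error between desired and actual output activations

def delta_rule_update(W, A, B, C, eta=0.5):
    """Apply one supervised correction to every connection weight."""
    return [[wij + eta * ai * (cj - bj)
             for wij, bj, cj in zip(row, B, C)]
            for ai, row in zip(A, W)]

W = [[0.2, 0.4]]   # one input node feeding two output nodes
A = [1.0]          # activation of the input node
B = [1.0, 0.0]     # actual recalled activations
C = [0.0, 1.0]     # desired activations
print(delta_rule_update(W, A, B, C))  # [[-0.3, 0.9]]
```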
How Neural Nets Work: Back Propagation

1. Input is presented to the net and an output is produced
2. Compute the differences between actual and desired outputs
3. Adjust output layer weights using the discrepancies between desired and actual outputs
4. Then adjust hidden layer weights (if there is a hidden layer)
5. Then adjust input layer weights
6. Repeat steps 1 - 5 until the desired accuracy level is achieved

Advantage:
 ability to learn any arbitrarily complex nonlinear mapping
Disadvantages:
 extremely long - potentially infinite - learning times
 speed up using parallel hardware
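The six steps above can be sketched for a single hidden layer; a sigmoid replaces the hard threshold because back propagation needs a differentiable activation. The XOR task, layer sizes, learning rate, and epoch count are illustrative assumptions:

```python
import math
import random

random.seed(1)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

n_in, n_hid = 2, 2
# W1[j][0] and W2[0] are bias weights; the rest connect to the layer below
W1 = [[random.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hid)]
W2 = [random.uniform(-1, 1) for _ in range(n_hid + 1)]
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]  # XOR

def forward(x):
    h = [sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))) for w in W1]
    y = sigmoid(W2[0] + sum(wi * hi for wi, hi in zip(W2[1:], h)))
    return h, y

def total_error():
    return sum((t - forward(x)[1]) ** 2 for x, t in data)

before = total_error()
for _ in range(5000):
    for x, t in data:
        h, y = forward(x)                               # step 1: produce output
        d_out = (t - y) * y * (1 - y)                   # step 2: output discrepancy
        d_hid = [d_out * W2[j + 1] * h[j] * (1 - h[j])  # hidden-node discrepancies
                 for j in range(n_hid)]
        W2[0] += 0.5 * d_out                            # step 3: output-layer weights
        for j in range(n_hid):
            W2[j + 1] += 0.5 * d_out * h[j]
        for j in range(n_hid):                          # step 4: hidden-layer weights
            W1[j][0] += 0.5 * d_hid[j]
            for i in range(n_in):
                W1[j][i + 1] += 0.5 * d_hid[j] * x[i]

print(total_error() < before)  # the squared error shrinks with training
```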
Neural Network Layers: Back Propagation

[Figure: the same input / hidden / output layer diagram, with errors propagated backward from the output layer]
Common Questions About Neural Networks

What is a hidden layer?
 A group of nodes between the input and output layers
 Hidden layers increase the ability of the network to memorize the data
How many hidden layers should I use?
 As problem complexity increases, the number of hidden layers should also increase
 Start with none. Add hidden layers one at a time if training or testing results do not achieve target accuracy levels
What is a hidden node?
 A node in a hidden layer is called a hidden node
 Hidden nodes contain much of the knowledge in the network and act as filters to remove noise moving through the network
Common Questions About Neural Networks

How do I know if network modifications are needed?
 Low accuracy on training or test data indicates that a new hidden layer or more hidden nodes are needed
  if the number of hidden nodes exceeds the number of inputs and outputs, then add another hidden layer
  decrease the total hidden nodes by 50% in each successive hidden layer [if 10 nodes in the first layer, then use 5 in the second and 2 in the third]
 If Braincel performs well on the training and test ranges but poorly on new records, then it is treating each record as a special case and has memorized the data
  use fewer hidden nodes, or remove the hidden layer
  could also need more training cases per connection
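The halving heuristic above can be sketched directly (the function name is an assumption; rounding down reproduces the slide's 10 / 5 / 2 example):

```python
def hidden_layer_sizes(first_layer, n_layers):
    """Halve the node count in each successive hidden layer."""
    sizes = [first_layer]
    for _ in range(n_layers - 1):
        sizes.append(max(1, sizes[-1] // 2))  # halve, rounding down, at least 1 node
    return sizes

print(hidden_layer_sizes(10, 3))  # [10, 5, 2], matching the slide's example
```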
Application Examples: Finance and Banking

Firm failure prediction (Koster, Sondak, & Bourbia 1991; Wilson & Sharda 1993)
Bank failure prediction (Cinar & Lash 1992; Tam & King 1992)
Bond rating (Utans & Moody 1991)
Mortgage credit approval (Reilly et al. 1990)
Credit card fraud prevention at Chase Manhattan Bank, American Express, and Mellon Bank: examine unusual credit-charge patterns over a history of usage and compute a fraud-potential rating [for example, the Fraud Detection System by Nestor Corp. and a system by HNC Inc. (Rochester 1990)]
Takeover target prediction (Sen & Gibbs 1992)
Application Examples: Finance and Banking

Country risk rating for early warning of financial risk (Roy and Cosset 1990)
Stock price prediction (Fishman, Barr, & Loick 1991; Yoon & Stein 1991)
Commodity, futures, and currency trading at Merrill Lynch, Salomon Brothers, Shearson Lehman Brothers, & the World Bank. Citibank claims 25% returns in currency trading using GA-trained neural nets (Business Week, March 2, 1992)
Asset allocation (Steiger & Sharda 1991)
Corporate merger prediction (Sen, Oliver, & Sen 1992)
Application Examples: Manufacturing

Quality control
Predict tool breakage in milling operations
Force and/or wear analysis
Mechanical equipment fault diagnosis
Process management and control - maintain efficiency of electric arc furnaces in steel-making; uniformity in pulp & paper process management
Application Examples: Marketing

Customer mailing list management (Hall 1992)
 Spiegel Inc. mail order catalog targeting saved $1 million from reduced costs and increased sales (Business Week, March 2, 1992)
Airline seating allocation and passenger demand for Nationair Canada and US Air (IEEE Expert, Dec 1992)
Customer purchasing behavior and merchandising-mix strategies
Hotel room pricing - yield management (Relihan, W. 1989)
Application Examples: Medicine

Analysis of electrocardiogram data
Improved prosthetic devices
Pap smear detection of cancerous cells to drastically reduce errors
RNA & DNA sequencing in proteins
Medical image enhancement
Drug development without animal testing
Application Examples: Pattern Recognition

Signature validation (Francett 1989; Mighell 1989)
OCR scanning for machine-printed character recognition; also used at the Post Office to sort mail
Hand-printed character recognition (e.g. insurance forms) to reduce clerical data entry costs
Cursive handwriting recognition (e.g. for pen-based computing)
Airport bomb detection (1989, JFK International in NY): analyzes gamma ray patterns of various objects after they are struck with neutrons
Summary

Parallel distributed processing (especially a hardware-based neural net) is a good approach for complex pattern recognition (e.g. image recognition, forecasting, text retrieval, optimization)
Less need to determine relevant factors a priori when building a neural network
Lots of training data are needed
High tolerance to noisy data; in fact, noisy data can enhance post-training performance
Difficult to verify or discern learned relationships, even with special knowledge-extraction utilities developed for neural nets