CSC 562 Lecture 9 Chapter 6 – Artificial Neural Networks for Data Mining Business Intelligence 1/31/2011 1 Business Intelligence


CSC 562

Lecture 9: Chapter 6 – Artificial Neural Networks for Data Mining

Business Intelligence

1/31/2011 1 Business Intelligence


Learning Objectives

Understand the concept and definitions of artificial neural networks (ANN)

Know the similarities and differences between biological and artificial neural networks

Learn the different types of neural network architectures

Learn the advantages and limitations of ANN

Understand how backpropagation learning works in feedforward neural networks


Learning Objectives

Understand the step-by-step process of how to use neural networks

Appreciate the wide variety of applications of neural networks, which solve problem types such as classification, regression, clustering, association, and optimization

Opening Vignette (Page 242)

“Predicting Gambling Referenda with Neural Networks”

Using NeuroSolutions, this study developed and tested models to predict community support for commercial gaming.

The study examined the role of factors that contribute to the legalization and/or prohibition of gambling activities using neural networks.

It attempted to use neural network technology to predict the voting outcomes of various counties on this subject.


Opening Vignette:

On average, the models accurately predicted the voting results for about 4 out of every 5 counties (approximately 82% accuracy) on a sample data set of 1,287 records.

Interestingly, and contrary to popular belief, the counties' financial characteristics and age distribution were not found to be significant factors in determining ballot outcome. Dominant factors are identified on Page 244.

The study demonstrates that demographic data can be used to accurately predict voting outcomes on controversial issues.

Opening Vignette: Predicting Gambling Referenda…

[Figure: neural network model. Input layer: socio-demographic, religious, financial, and other factors. Hidden layer. Output layer: voted “yes” or “no” to legalizing gaming. Predicted outcomes are compared with actual outcomes.]

Opening Vignette:

NeuroSolutions is offered by NeuroDimension and provides algorithms in the field of artificial intelligence.

NeuroDimension offers NeuroSolutions, NeuroSolutions for Excel, and a Custom Solution Wizard, each of which can be downloaded for a free evaluation.


Opening Vignette:

A very good video is offered by the company that explains neural network algorithms and the field in general.

Pricing is relatively reasonable for the product: NeuroSolutions (NS) for Excel costs $295.

Neural Network Concepts (Page 245)

Neural networks (NN): a brain metaphor for information processing that uses artificial neurons (programming constructs that mimic the properties of biological neurons).

Neural computing: a pattern recognition methodology for machine learning

Artificial neural network (ANN): the resulting model from neural computing

ANN has many uses in pattern recognition, forecasting, prediction, and classification, across finance, marketing, manufacturing, operations, information systems, and so on


ANN Video

Here is an excellent video offered by NeuroSolutions that provides a good overview of ANN

Biological Neural Networks (Page 246)

[Figure: two interconnected brain cells (neurons), with the soma, axon, synapse, and dendrites labeled.]

An axon is a long, slender projection of a nerve cell, or neuron, that conducts electrical impulses away from the neuron's cell body or soma.

Dendrites are branched filaments in nerve cells (neurons). The word dendrite derives from the Greek word for tree, which describes their branching tree-like structure.

Biological Neural Networks (Page 246)

Synapse – able to increase or decrease the strength of the connection between neurons and cause excitation or inhibition of a subsequent neuron.

The word "soma" comes from the Greek word for "body"; the soma of a neuron is often called the cell body.

Processing Information in ANN (Page 247, Figure 6.3)

[Figure 6.3: a single neuron (processing element, PE) with inputs and outputs. Inputs x1, x2, …, xn arrive over connections with weights w1, w2, …, wn; the neuron applies a summation followed by a transfer function and produces the output Y (the figure also labels outputs Y1, Y2, …, Yn).]

Summation: $S = \sum_{i=1}^{n} X_i W_i$

Transfer function: $f(S)$

Biology Analogy (Page 247)

Elements of ANN (Pages 248-250)

Processing element (PE) – organized in different ways to form the network's structure.

Network architecture
  Hidden layers – take input from the previous layer and convert it into outputs for further processing (used in complex problems)

Parallel processing – resembles the way the brain works; differs from the serial processing of conventional computing

Elements of ANN (Pages 248-250)

Network information processing
  Inputs – single attributes such as age, income level, etc.
  Outputs – the solution to the problem, e.g., loan application “yes” or “no”
  Connection weights – relative strength of input data (how important)
  Summation function – weighted sum of all input elements entering a PE.

Elements of ANN (Figure 6.4, Page 249)

[Figure 6.4: neural network with one hidden layer. Inputs x1, x2, x3 enter the input layer, pass through a hidden layer of PEs, and reach the output layer, which produces Y1. Each PE computes a weighted sum (S) followed by a transfer function (f).]


Elements of ANN

Summation Function for a Single Neuron (a) and Several Neurons (b)

(a) Single neuron, with inputs x1, x2 and weights w1, w2:

$Y = X_1 W_1 + X_2 W_2$

(b) Multiple neurons, with inputs x1, x2 and weights w11, w21, w12, w22, w23:

$Y_1 = X_1 W_{11} + X_2 W_{21}$

$Y_2 = X_1 W_{12} + X_2 W_{22}$

$Y_3 = X_2 W_{23}$

PE: Processing Element (or neuron)
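The summation for several neurons in panel (b) can be sketched in code. This is a minimal Python illustration, assuming a weight table indexed as weights[i][j] (input i to neuron j) in which 0.0 stands in for a missing connection, such as x1 to the third neuron. The function name and the numeric values are hypothetical and do not come from the slide.

```python
def weighted_sums(inputs, weights):
    """Return one weighted sum per neuron.

    weights[i][j] is the weight on the connection from input i to neuron j.
    A value of 0.0 is used here (an assumption) where no connection exists.
    """
    n_neurons = len(weights[0])
    return [
        sum(inputs[i] * weights[i][j] for i in range(len(inputs)))
        for j in range(n_neurons)
    ]


# Hypothetical values: two inputs, three neurons.
x = [1.0, 2.0]
w = [
    [0.5, -1.0, 0.0],   # w11, w12, (no connection from x1 to neuron 3)
    [0.25, 2.0, 3.0],   # w21, w22, w23
]
y1, y2, y3 = weighted_sums(x, w)
print(y1, y2, y3)
```

Each entry of the result corresponds to one of the equations for Y1, Y2, and Y3; a transfer function would normally be applied to each sum afterwards.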

Elements of ANN (Page 251)

Transformation (transfer) function – computes the activation level of a neuron (based on this level the neuron may or may not produce an output). Computed via the sigmoid (logistic activation) function: $Y_T = 1/(1 + e^{-Y})$
  Y is computed via weighted summation
  Threshold value: any value less than the threshold will not be passed to the output (0); anything above is passed (1)

Example for a processing element (PE) with inputs X1 = 3, X2 = 1, X3 = 2 and weights W1 = 0.2, W2 = 0.4, W3 = 0.1:

Summation function: Y = 3(0.2) + 1(0.4) + 2(0.1) = 1.2

Transfer function: $Y_T = 1/(1 + e^{-1.2}) = 0.77$
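The worked example above can be checked with a few lines of Python. This is only a sketch: the function name `neuron_output` is made up for illustration, while the inputs and weights are the ones shown on the slide.

```python
import math


def neuron_output(inputs, weights):
    """Return (Y, Y_T): the weighted sum and its sigmoid transform."""
    y = sum(x * w for x, w in zip(inputs, weights))   # summation function
    y_t = 1.0 / (1.0 + math.exp(-y))                  # sigmoid transfer function
    return y, y_t


# Values from the slide: X = (3, 1, 2), W = (0.2, 0.4, 0.1)
y, y_t = neuron_output([3, 1, 2], [0.2, 0.4, 0.1])
print(round(y, 2), round(y_t, 2))
```

Rounded to two decimals, the result agrees with the slide's Y = 1.2 and YT = 0.77.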

Neural Network Architectures (Pages 251-252)

Several ANN architectures exist
  Feedforward (Figure 6.4, page 249; shown on an earlier slide)
  Recurrent (Figure 6.7, page 252; next slide)
  Associative memory
  Self-organizing feature maps
  Hopfield networks, etc.

Neural Network Architectures: Recurrent Neural Networks (Page 252, Figure 6.7)

Neural Network Architectures (Page 252)

Architecture of a neural network is driven by the task it is intended to address

Most popular architecture: feedforward, multi-layered perceptron with the backpropagation learning algorithm
  That is, the feedforward perceptron is the architecture and backpropagation is the learning algorithm.

Neural Network Architectures

The perceptron is a type of artificial neural network invented in 1957 at the Cornell Aeronautical Laboratory by Frank Rosenblatt. Rosenblatt, a computer scientist born in 1928 in New York City, helped to create the Perceptron computer (a.k.a. the Mark 1) in 1960 at Cornell University. This was the first computer that could learn skills by trial and error in an attempt to mimic human thought processes through the use of a neural network. Rosenblatt died in 1971.

Backpropagation is a common, supervised method for teaching artificial neural networks how to perform a given task. It was first described by Arthur E. Bryson and Yu-Chi Ho in 1969.


Frank Rosenblatt (1957)

Neural Network Architectures

Original Mark 1 (Automatic Sequence Controlled Calculator, ASCC; the Harvard Mark I, a different machine from Rosenblatt's Mark 1 Perceptron): the building elements of the ASCC were switches, relays, rotating shafts, and clutches.

Learning in ANN (Page 252)

A process by which a neural network learns the underlying relationship between inputs and outputs, or just among the inputs

Supervised learning
  For prediction type problems
  E.g., backpropagation

Unsupervised learning
  For clustering type problems
  Self-organizing
  E.g., adaptive resonance theory

A Taxonomy of ANN Learning Algorithms (Page 253, Figure 6.8)

[Figure 6.8: two trees, one for learning algorithms and one for architectures. The node labels recovered from the figure are listed below.]

Learning algorithms: branches for discrete/binary input and continuous input, each split into supervised and unsupervised

· Delta rule · Gradient descent · Competitive learning · Neocognitron · Perceptron

· Simple Hopfield · Outerproduct AM · Hamming net

· ART-1 · Carpenter/Grossberg

· ART-3 · SOFM (or SOM) · Other clustering algorithms

Architectures: supervised (recurrent, feedforward) and unsupervised (estimator, extractor)

· Hopfield · SOFM (or SOM) · Nonlinear vs. linear · Backpropagation · ML perceptron · Boltzmann

· ART-1 · ART-2


Most popular

Read Application Case (Page 254)

Microsoft used BrainMaker neural network software from California Scientific to maximize the return on direct mail

Some of the variables considered (25 in total)
  Recency (how long since last registration / product purchase)
  First date to file – loyal over time?
  Number of products bought and filed
  Value of products bought and registered
  Number of days from product release to purchase

Improved the response rate from 4.9% to 8.2%, a 35% cost saving on 40 million pieces of direct mail

A Supervised Learning Process (Pages 255-256, Figure 6.9)

[Figure 6.9: flowchart of the ANN model's learning loop. Compute output; if the desired output is achieved, stop learning; if not, adjust the weights and compute the output again.]

Three-step process:

1. Compute temporary outputs

2. Compare outputs with desired targets

3. Adjust the weights and repeat the process

How a Network Learns (Page 256)

Example: single neuron that learns the inclusive OR operation

* See page 257 for step-by-step progression of the learning process

Learning parameters: learning rate and momentum
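The inclusive OR example can be sketched as a short training loop. This is a minimal Python illustration, not the textbook's exact procedure: it assumes a step transfer function with a threshold of 0.5, a learning rate of 0.2, starting weights of 0.1 and 0.3, and the update rule "new weight = old weight + learning rate × (desired − actual) × input". All of these values and names are assumptions chosen for illustration, and momentum is left out to keep the sketch short.

```python
def step(s, threshold=0.5):
    """Step transfer function: output 1 only when the sum exceeds the threshold."""
    return 1 if s > threshold else 0


def train_or(weights, learning_rate=0.2, max_epochs=100):
    """Train a single two-input neuron on inclusive OR; return (weights, epochs used)."""
    cases = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
    w = list(weights)
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for (x1, x2), desired in cases:
            actual = step(x1 * w[0] + x2 * w[1])     # 1. compute temporary output
            delta = desired - actual                 # 2. compare with desired target
            if delta != 0:                           # 3. adjust the weights
                errors += 1
                w[0] += learning_rate * delta * x1
                w[1] += learning_rate * delta * x2
        if errors == 0:                              # every case correct: stop learning
            return w, epoch
    return w, max_epochs


final_weights, epochs_used = train_or([0.1, 0.3])
print(final_weights, epochs_used)
```

A larger learning rate changes the weights in bigger steps per error; the loop stops as soon as a full pass over the four cases produces no errors.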

Backpropagation Learning (Page 258)

Errors are used to correct weights – called Back-error propagation

The (supervised) learning algorithm procedure:

1. Initialize weights with random values and set other network parameters

2. Read in the inputs and the desired outputs

3. Compute the actual output (by working forward through the layers)

4. Compute the error (difference between the actual and desired output)

5. Change the weights by working backward through the hidden layers

6. Repeat steps 2-5 until weights stabilize
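The six steps above can be sketched for a tiny network. This is a hedged Python illustration under several assumptions that the slide does not specify: one hidden layer of three neurons, sigmoid transfer functions, a bias weight per neuron, a learning rate of 0.5, and a fixed number of passes in place of a "weights stabilize" check. The function names are made up for this sketch.

```python
import math
import random


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def forward(x, w_hidden, w_out):
    """Work forward through the layers; each weight row starts with a bias term."""
    hidden = [sigmoid(row[0] + sum(w * xi for w, xi in zip(row[1:], x)))
              for row in w_hidden]
    output = sigmoid(w_out[0] + sum(w * h for w, h in zip(w_out[1:], hidden)))
    return hidden, output


def mean_squared_error(data, w_hidden, w_out):
    return sum((z - forward(x, w_hidden, w_out)[1]) ** 2 for x, z in data) / len(data)


def train(data, n_hidden=3, learning_rate=0.5, epochs=5000, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # Step 1: initialize weights with random values.
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    initial_error = mean_squared_error(data, w_hidden, w_out)
    for _ in range(epochs):                    # Step 6: repeat (fixed count, an assumption)
        for x, z in data:                      # Step 2: read inputs and desired output
            hidden, y = forward(x, w_hidden, w_out)    # Step 3: compute actual output
            delta_out = (z - y) * y * (1 - y)  # Step 4: error, scaled by the sigmoid slope
            # Step 5: work backward, computing hidden-layer errors before any update.
            delta_hidden = [delta_out * w_out[j + 1] * h * (1 - h)
                            for j, h in enumerate(hidden)]
            w_out[0] += learning_rate * delta_out
            for j, h in enumerate(hidden):
                w_out[j + 1] += learning_rate * delta_out * h
            for j, row in enumerate(w_hidden):
                row[0] += learning_rate * delta_hidden[j]
                for i, xi in enumerate(x):
                    row[i + 1] += learning_rate * delta_hidden[j] * xi
    return w_hidden, w_out, initial_error


or_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w_hidden, w_out, initial_error = train(or_data)
print(initial_error, mean_squared_error(or_data, w_hidden, w_out))
```

Comparing the error before and after training shows whether the weight changes moved the outputs toward the desired targets.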

Backpropagation Learning (Figure 6.10, Page 258)

[Figure 6.10: backpropagation of error for a single neuron. Inputs x1, x2, …, xn with weights w1, w2, …, wn feed a neuron (or PE) that applies a summation and a transfer function; the error is fed back to adjust the weights.]

Summation: $S = \sum_{i=1}^{n} X_i W_i$

Transfer function: $Y = f(S)$

Error: $a(Z_i - Y_i)$

Development Process of an ANN (Page 259)

Similar to structured design for traditional IS, with some new elements

See page 253

An MLP ANN Structure for the Box-Office Prediction Problem (Page 262, Fig 6.12)

This is the vignette at the start of Chapter 5 (page 191)

Input layer (27 PEs)
  MPAA Rating (5): G, PG, PG13, R, NR
  Competition (3): High, Medium, Low
  Star Value (3): High, Medium, Low
  Genre (10): Sci-Fi, Action, ...
  Technical Effects (3): High, Medium, Low
  Sequel (2): Yes, No
  Number of Screens: positive integer

Hidden layer I (18 PEs)

Hidden layer II (16 PEs)

Output layer (9 PEs), classes by box-office (BO) receipts
  Class 1 - FLOP (BO < 1M)
  Class 2 (1M < BO < 10M)
  Class 3 (10M < BO < 20M)
  Class 4 (20M < BO < 40M)
  Class 5 (40M < BO < 65M)
  Class 6 (65M < BO < 100M)
  Class 7 (100M < BO < 150M)
  Class 8 (150M < BO < 200M)
  Class 9 - BLOCKBUSTER (BO > 200M)
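The input layer size of 27 PEs is consistent with one PE per category value (5 + 3 + 3 + 10 + 3 + 2 = 26) plus one PE for the number of screens. The Python sketch below illustrates that reading with a 1-of-N encoding; the encoding scheme itself is an assumption, since the slide only gives the counts, and the names `CATEGORY_SIZES`, `one_hot`, and `encode_movie` are hypothetical. Categories are passed as integer positions because the slide does not list all ten genres.

```python
# Number of values per categorical input, as listed on the slide.
CATEGORY_SIZES = [
    ("MPAA Rating", 5),
    ("Competition", 3),
    ("Star Value", 3),
    ("Genre", 10),
    ("Technical Effects", 3),
    ("Sequel", 2),
]


def one_hot(index, size):
    """Return a list of `size` values with a 1.0 at `index` and 0.0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def encode_movie(category_indices, number_of_screens):
    """Build the input vector: one PE per category value, plus the screen count."""
    vector = []
    for name, size in CATEGORY_SIZES:
        vector.extend(one_hot(category_indices[name], size))
    vector.append(float(number_of_screens))
    return vector


# Hypothetical movie, with each category given as a position in its value list.
movie = {
    "MPAA Rating": 2,
    "Competition": 0,
    "Star Value": 1,
    "Genre": 4,
    "Technical Effects": 0,
    "Sequel": 1,
}
inputs = encode_movie(movie, number_of_screens=2500)
print(len(inputs))
```

In practice the screen count would usually be scaled to a range comparable to the 0/1 inputs before training; that step is omitted here.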

Data Collection and Testing (Page 261)

Data is split into three parts: training (~60%), validation (~20%), and testing (~20%)
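A three-way split like this can be sketched in a few lines of Python. The function name `split_records`, the fixed seed, and the choice to shuffle before splitting are assumptions made for illustration; the slide only gives the approximate proportions.

```python
import random


def split_records(records, train=0.6, validation=0.2, seed=0):
    """Shuffle the records and split them into training, validation, and testing sets.

    Whatever remains after the training and validation portions goes to testing.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_validation = int(len(shuffled) * validation)
    training = shuffled[:n_train]
    validating = shuffled[n_train:n_train + n_validation]
    testing = shuffled[n_train + n_validation:]
    return training, validating, testing


training, validating, testing = split_records(range(100))
print(len(training), len(validating), len(testing))
```

The training set is used to adjust the weights, the validation set to monitor the model during development, and the testing set is held back for a final check.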

Sensitivity Analysis on ANN Models (Pages 264-265)

A common criticism for ANN: The black-box syndrome!

Answer: sensitivity analysis
  Conducted on a trained ANN
  The inputs are changed while the relative change in the output is measured/recorded
  Results illustrate the relative importance of input variables
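The idea can be sketched in Python by perturbing one input at a time and recording how much the output moves. This is only an illustration: `toy_model` is a hypothetical stand-in for a trained ANN, and its weights, the baseline values, and the perturbation size are all assumptions.

```python
import math


def toy_model(inputs):
    """Hypothetical stand-in for a trained ANN (fixed weights, sigmoid output)."""
    s = 0.8 * inputs["x1"] + 0.1 * inputs["x2"] - 0.4 * inputs["x3"]
    return 1.0 / (1.0 + math.exp(-s))


def sensitivity(model, baseline, delta=0.1):
    """Perturb each input by `delta` in turn and record the absolute output change."""
    base_output = model(baseline)
    changes = {}
    for name in baseline:
        perturbed = dict(baseline)          # copy so other inputs stay at baseline
        perturbed[name] = baseline[name] + delta
        changes[name] = abs(model(perturbed) - base_output)
    return changes


baseline = {"x1": 0.0, "x2": 0.0, "x3": 0.0}
changes = sensitivity(toy_model, baseline)
ranking = sorted(changes, key=changes.get, reverse=True)
print(ranking)
```

Inputs whose perturbation moves the output the most are ranked as the most important, which gives some insight into an otherwise black-box model.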

Sensitivity Analysis on ANN Models (Page 265, Figure 6.13)

[Figure 6.13: systematically perturbed inputs are fed to the trained ANN (“the black box”), and the change in outputs is observed.]

See and read example Application Case 6.5 (Page 266): sensitivity analysis reveals the most important injury severity factors in traffic accidents

Sensitivity Analysis on ANN Models (Page 266)

Application Case 6.5
  41,000 people die in 6 million US traffic accidents
  Goal: analyze the factors that elevate the risk of severe injury
  Factors include behavior, environment, technical, etc.
  A series of ANN models was used to estimate the significance of the crash factors on the level of severity sustained by the driver
  A two-step process was used: (1) prediction models, (2) sensitivity analysis on the trained neural networks
  Results show significant differences among models built for different injury severity levels (the most influential factors depend heavily on the level of injury)

A Sample Neural Network Project: Bankruptcy Prediction (Pages 267-270)

A comparative analysis of ANN versus logistic regression (LR), a statistical method

Inputs
  X1: Working capital/total assets
  X2: Retained earnings/total assets
  X3: Earnings before interest and taxes/total assets
  X4: Market value of equity/total debt
  X5: Sales/total assets

A Sample Neural Network Project: Bankruptcy Prediction

Data was obtained from Moody's Industrial Manuals
  Time period: 1975 to 1982
  129 firms (65 of which went bankrupt during the period and 64 nonbankrupt)

Different training and testing proportions are used/compared
  90/10 versus 80/20 versus 50/50
  Resampling is used to create 60 data sets

A Sample Neural Network Project: Bankruptcy Prediction

[Figure: network with inputs x1 to x5 and two outputs, BR = 1 (bankrupt) and NBR = 1 (nonbankrupt).]

Network specifics
  Feedforward MLP
  Backpropagation
  Varying learning and momentum values
  5 input neurons (1 for each financial ratio), 10 hidden neurons, 2 output neurons (1 indicating a bankrupt firm and the other indicating a nonbankrupt firm)

A Sample Neural Network Project: Bankruptcy Prediction – Results (Page 269, Figure 6.2)

Bottom Line: Advantages of ANN (Pages 274-276)

Able to deal with (identify/model) highly nonlinear relationships

Can handle a variety of problem types (loan applications, forecasting profitability and finances, sports team success, fraud prevention, time-series forecasting, and health care and medicine, e.g., diagnosing breast cancer; see Case 6.4 on page 276)

Usually provides better results (prediction and/or clustering) compared to its statistical counterparts


Disadvantages of ANN

They are deemed to be black-box solutions, lacking explainability

It is hard to find optimal values for the large number of network parameters
  Optimal design is still an art: it requires expertise and extensive experimentation

It is hard to handle a large number of variables (especially rich nominal attributes)

Training may take a long time for large datasets, which may require case sampling

ANN Software (Page 263)

Standalone ANN software tools
  NeuroSolutions
  BrainMaker
  NeuralWare
  NeuroShell
  … for more, see pcai.com …

Part of a data mining software suite
  PASW (formerly SPSS Clementine)
  SAS Enterprise Miner
  Statistica Data Miner
  … many more …


Next lecture

Chapter 7 - Text and Web Mining
