Chapter 7: Neural Networks in Data Mining
Automatic Model Building
(Machine Learning)
Artificial Intelligence

Contents
Describes neural networks as used in data mining
Reviews real applications of each model
Shows the application of models to larger data sets

High-Growth Product
For some types of data, neural network models usually outperform other methods, particularly when the data contain complicated (nonlinear) relationships.
Used for classifying data: target customers, bank loan approval, hiring, stock purchase
Used for prediction

Neural Network
Neural networks are the most widely used method in data mining.
The idea of neural networks was derived from how neurons operate in the brain.
Real neurons are connected to each other, and accept electrical charges across synapses and pass on the electrical charge to other neighboring neurons.
An artificial neural network (ANN) is usually arranged in at least three layers (with at least one hidden layer) and has a defined, constant structure that can reflect complex nonlinear relationships.

Network
[Figure: network diagram with an input layer, hidden layers, and an output layer; the output nodes are labeled Good and Bad]

Neural Network
For classification neural network models, the output layer has one node for each classification category (for example, true or false).
Each node is connected by an arc to nodes in the next layer. These arcs have weights, which are multiplied by the value of incoming nodes and summed.
Middle layer node values are the sum of incoming node values multiplied by the arc weights.
ANNs learn through feedback loops. Output is compared to target values, and the difference between attained and target output is fed back to the system to adjust the weights on arcs.
Measure fit, then fine-tune around the best fit
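
The weighted sums and the feedback loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sigmoid transfer function, the absence of bias terms, the learning rate of 0.5, and every function name (`node_values`, `forward`, `feedback_step`) are choices made for this example and are not taken from the slides.

```python
import math

def sigmoid(x):
    # Assumed transfer function; the slides do not name one.
    return 1.0 / (1.0 + math.exp(-x))

def node_values(inputs, weights):
    # weights[j][i] is the arc weight from incoming node i to node j.
    # Each node value is the sum of incoming values times arc weights.
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]

def forward(inputs, hidden_w, output_w):
    hidden = [sigmoid(s) for s in node_values(inputs, hidden_w)]
    output = [sigmoid(s) for s in node_values(hidden, output_w)]
    return hidden, output

def feedback_step(inputs, target, hidden_w, output_w, rate=0.5):
    # One feedback pass: compare output with target, then adjust the
    # arc weights in place. Returns the squared error before the update.
    hidden, output = forward(inputs, hidden_w, output_w)
    out_delta = [(t - o) * o * (1 - o) for t, o in zip(target, output)]
    hid_delta = [
        h * (1 - h) * sum(out_delta[k] * output_w[k][j]
                          for k in range(len(output_w)))
        for j, h in enumerate(hidden)
    ]
    for k, row in enumerate(output_w):
        for j in range(len(row)):
            row[j] += rate * out_delta[k] * hidden[j]
    for j, row in enumerate(hidden_w):
        for i in range(len(row)):
            row[i] += rate * hid_delta[j] * inputs[i]
    return sum((t - o) ** 2 for t, o in zip(target, output))

# Made-up weights for a 2-input, 2-hidden-node, 1-output network.
hidden_w = [[0.1, -0.2], [0.3, 0.4]]
output_w = [[0.2, -0.1]]
first_error = feedback_step([1.0, 0.0], [1.0], hidden_w, output_w)
for _ in range(200):
    last_error = feedback_step([1.0, 0.0], [1.0], hidden_w, output_w)
```

Repeating the feedback step on the same case shrinks the error, which is the "measure fit, fine-tune" cycle in miniature.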

Neural Network
ANN can apply learned experience to new cases, for decisions, classifications, and forecasts.
ANN modeling should consider:
• Input variable selection and manipulation
• Selection of learning parameters, such as the number of hidden layers, learning rate, momentum, activation function…
About 95% of business applications were reported to use a multilayered feedforward neural network with the backpropagation learning rule.
• Supervised learning
• Each element in each layer is connected to all elements of the next layer.
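
To make two of the learning parameters concrete, the sketch below shows one common way the learning rate and momentum enter a weight update. It assumes the classical momentum form; actual packages may define these parameters differently, and the name `momentum_update` is hypothetical.

```python
def momentum_update(weights, gradients, velocity,
                    learning_rate=0.1, momentum=0.9):
    # The velocity keeps a fraction (momentum) of the previous change
    # and adds a step against the error gradient, scaled by the
    # learning rate.
    new_velocity = [momentum * v - learning_rate * g
                    for v, g in zip(velocity, gradients)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity

# Two steps with the same made-up gradient: the second step is larger
# because part of the first change carries over.
w, v = momentum_update([1.0], [2.0], [0.0], learning_rate=0.5, momentum=0.5)
w2, v2 = momentum_update(w, [2.0], v, learning_rate=0.5, momentum=0.5)
```

A larger learning rate takes bigger steps per update, while momentum smooths the direction of change across successive updates.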

Neural Network
Multilayered feedforward neural networks are analogous to regression and discriminant analysis in dealing with cases where training data is available.
The self-organizing map (SOM) is analogous to clustering techniques, used when there is no training data.
• Classifies data to maximize the similarity of patterns within clusters while minimizing the similarity to patterns of different clusters.
Kohonen SOMs were developed to detect strong features of large data sets.
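
A single training step of a self-organizing map can be sketched as follows. This is a simplified one-dimensional map under assumed settings: squared Euclidean distance, a fixed learning rate, and a fixed neighborhood radius (real SOM implementations usually shrink both over time). The names `best_matching_unit` and `som_step` are hypothetical.

```python
def best_matching_unit(units, x):
    # Index of the map unit whose weight vector is closest to x.
    dists = [sum((u_i - x_i) ** 2 for u_i, x_i in zip(u, x)) for u in units]
    return dists.index(min(dists))

def som_step(units, x, rate=0.5, radius=1):
    # Move the winning unit and its neighbors on the map toward x.
    bmu = best_matching_unit(units, x)
    for j, u in enumerate(units):
        if abs(j - bmu) <= radius:
            for i in range(len(u)):
                u[i] += rate * (x[i] - u[i])
    return bmu

# Two map units; with radius 0 only the winner is updated.
units = [[0.0, 0.0], [10.0, 10.0]]
winner = som_step(units, [2.0, 2.0], rate=0.5, radius=0)
```

No target values appear anywhere in the step, which is what makes the method usable when there is no training data with known outcomes.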

Neural Network Testing
Usually train on part of the available data
• package tries weights until it successfully categorizes a selected proportion of the training data
When trained, test the model on part of the data
• if the given proportion is successfully categorized, quits
• if not, works some more to get a better fit
The “model” is internal to the package
Model can be applied to new data
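
The train-then-test cycle above can be illustrated with a deliberately small stand-in model. The sketch uses a single-node perceptron rather than a full multilayer network, and the data, the target proportion, and the names `predict`, `accuracy`, and `train_until` are all assumptions made for this example.

```python
def predict(weights, row):
    # weights[0] is a bias; the rest pair up with the input values.
    s = weights[0] + sum(w * x for w, x in zip(weights[1:], row))
    return 1 if s > 0 else 0

def accuracy(weights, rows, labels):
    hits = sum(predict(weights, r) == y for r, y in zip(rows, labels))
    return hits / len(rows)

def train_until(rows, labels, target=0.9, rate=0.1, max_epochs=100):
    # Keep adjusting weights until the selected proportion of the
    # training data is categorized correctly (or epochs run out).
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(max_epochs):
        if accuracy(weights, rows, labels) >= target:
            break
        for r, y in zip(rows, labels):
            err = y - predict(weights, r)
            weights[0] += rate * err
            for i, x in enumerate(r):
                weights[i + 1] += rate * err * x
    return weights

# Made-up one-variable data: train on one part, test on another.
train_rows, train_labels = [[0.0], [1.0], [3.0], [4.0]], [0, 0, 1, 1]
test_rows, test_labels = [[0.5], [3.5]], [0, 1]
model = train_until(train_rows, train_labels, target=1.0)
test_accuracy = accuracy(model, test_rows, test_labels)
```

If the test accuracy falls short of the required proportion, training would resume or the settings would be changed; otherwise the weights are kept and applied to new data.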

Neural Network Process
1. Collect data
2. Separate into training, test sets
3. Transform data to appropriate units
• Categorical works better, but not necessary
4. Select, train, & test the network
• Can set number of hidden layers
• Can set number of nodes per layer
• A number of algorithmic options
5. Apply (need to use system on which built)
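
Step 2 of the process above might look like the following sketch. The 30% test fraction, the fixed random seed, and the name `split_train_test` are assumptions for illustration; the slides do not prescribe a split ratio here.

```python
import random

def split_train_test(records, test_fraction=0.3, seed=0):
    # Shuffle a copy so the caller's list is left unchanged, then cut
    # off the first part as the test set.
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Ten made-up record identifiers split into 7 training and 3 test cases.
train_set, test_set = split_train_test(list(range(10)), test_fraction=0.3)
```

Shuffling before the cut avoids a test set made up only of the most recently collected records.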

Loan Applications
The loan decision is repetitive and time consuming, and every attempt should be made to reach a decision that is fair to the applicant while reducing the risk of default to the lender.
1. Data collection: sex, marital status, number of dependent children, occupation, …
2. Separating data: learning data (at least 100 sets) and testing data (100 sets)
3. Transform the inputs: ANN requires numeric data. See page 125.

Loan Applications
4. Select, train and test the network:
• Choose the number of middle layer nodes, the transfer function, and the learning algorithm.
• Too many hidden layer nodes result in the ANN memorizing the input data without learning a generalizable pattern for the accurate analysis of new data. Too few nodes require more training time and result in less accurate models.
5. Repeat steps 1 through 4 until the prescribed tolerance is reached.

Neural Nets to Predict Bankruptcy
Wilson & Sharda (1994)
Monitor firm financial performance
• Useful to identify internal problems, investment evaluation, auditing
Predict bankruptcy - multivariate discriminant analysis of financial ratios (develop formula of weights over independent variables)
Neural network - inputs were 5 financial ratios - data from Moody’s Industrial Manuals (129 firms, 1975-1982; 65 went bankrupt)
Tested against discriminant analysis
Neural network significantly better

Ranking Neural Network
Wilson (1994)
Decision problem - ranking candidates for position, computer systems, etc.
INPUT - manager’s ranking of alternatives
Real decision - hire 2 sales people from 15 applicants
Each applicant scored by manager
Neural network took the scores and rank-ordered the applicants
Best fit to the manager's ranking of the alternatives compared (AHP)

Application results

Exercise
Data coding refers to page 117.
Variable     Value             Code
Age          <20               0
             20~50             (age-20)/30
             >50               1.0
State        CA                1.0
             Rest              0
Degree       Cert              0
             UG                0.5
             Rest              1.0
Major        IS                1.0
             Csci, Engr Sci    0.9
             BusAd             0.7
             Other             0.5
             None              0
Experience   Max Years/5   Minimal 2   Adequate 3
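
As a starting point for the exercise, the age, state, and degree codings above can be written as small functions. The function names are hypothetical, and major and experience are left for the reader to code from the table on page 117.

```python
def encode_age(age):
    # Below 20 maps to 0, above 50 maps to 1.0, and ages in between
    # are scaled linearly as (age - 20) / 30.
    if age < 20:
        return 0.0
    if age > 50:
        return 1.0
    return (age - 20) / 30

def encode_state(state):
    # CA is coded 1.0; every other state is coded 0.
    return 1.0 if state == "CA" else 0.0

def encode_degree(degree):
    # Cert is 0, UG is 0.5, and any other degree is 1.0.
    return {"Cert": 0.0, "UG": 0.5}.get(degree, 1.0)

def encode_applicant(age, state, degree):
    return [encode_age(age), encode_state(state), encode_degree(degree)]

# A made-up applicant: 35 years old, from CA, undergraduate degree.
coded = encode_applicant(35, "CA", "UG")
```

Every coded value falls between 0 and 1, so no single input dominates the weighted sums simply because of its units.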