ECLT 5810
Classification – Decision Trees
Prof. Wai Lam
Classification and Decision Tree
- What is classification?
- Issues regarding classification
- Classification by decision tree induction
Classification vs. Prediction

Classification:
- predicts categorical class labels
- classifies data (constructs a model) based on the training set and the values (class labels) of a classifying attribute, and uses the model to classify new data
- E.g. categorize bank loan applications as either safe or risky.

Prediction:
- models continuous-valued functions, i.e., predicts unknown or missing values
- E.g. predict the expenditures of potential customers on computer equipment given their income and occupation.

Typical applications:
- credit approval
- target marketing
- medical diagnosis
- treatment effectiveness analysis
Classification - A Two-Step Process

Step 1 (Model construction): describe a predetermined set of data classes
- Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
- The set of tuples used for model construction is the training set
- The individual tuples making up the training set are referred to as training samples
- Supervised learning: the model is learned from a given training set
- The learned model is represented as classification rules, decision trees, or mathematical formulae
Classification - A Two-Step Process

Step 2 (Model usage): the model is used for classifying future or unseen objects
- Estimate the accuracy of the model: the known label of each test sample is compared with the result produced by the model
- The accuracy rate is the percentage of test set samples that are correctly classified by the model
- The test set must be independent of the training set; otherwise over-fitting will occur
- If the accuracy is acceptable, the model is used to classify future data tuples with unknown class labels
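To make the two steps concrete, here is a minimal sketch using scikit-learn; the library choice, the integer encoding of categorical attributes, and the tiny loan dataset are my own illustration, not part of the lecture.

```python
# A minimal sketch of the two-step process, assuming scikit-learn is available.
# The tiny dataset below is illustrative, not the lecture's bank-loan data.
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Categorical attributes (age band, income level) encoded as small integers.
X = [[0, 0], [0, 0], [1, 2], [2, 1], [2, 1], [1, 2], [0, 1], [1, 0]]
y = ["risky", "risky", "safe", "safe", "safe", "safe", "risky", "safe"]

# Step 1 (model construction): learn a classifier from the training set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
model = DecisionTreeClassifier().fit(X_train, y_train)

# Step 2 (model usage): estimate accuracy on the independent test set,
# then classify an unseen tuple.
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
print("unseen tuple ->", model.predict([[1, 1]])[0])
```

In practice the test set would be much larger; with only two held-out samples the accuracy estimate is very noisy.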
Classification Process (1): Model Construction

Training data:

NAME  AGE     INCOME  CREDIT RATING
Mike  <=30    low     fair
Mary  <=30    low     poor
Bill  31..40  high    excellent
Jim   >40     med     fair
Dave  >40     med     fair
Anne  31..40  high    excellent

A classification algorithm learns a classifier (model) from these samples, e.g.:

IF age = "31..40" AND income = high
THEN credit_rating = excellent
Classification Process (2): Use the Model in Prediction

The classifier is first applied to the testing data to estimate its accuracy; if acceptable, it is then used on unseen data, e.g. (John, 31..40, med) -> credit rating?

Testing data:

NAME   AGE     INCOME  CREDIT RATING
May    <=30    high    fair
Ana    31..40  low     poor
Wayne  >40     high    excellent
Jack   <=30    med     fair
Supervised vs. Unsupervised Learning

Supervised learning (classification):
- Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations
- New data is classified based on the training set

Unsupervised learning (clustering):
- The class labels of the training data are unknown
- Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
Issues regarding Classification and Prediction (1): Data Preparation

Data cleaning:
- Preprocess data in order to reduce noise and handle missing values

Relevance analysis (feature selection):
- Remove irrelevant or redundant attributes; e.g. the date of a bank loan application is not relevant
- Improves the efficiency and scalability of data mining

Data transformation:
- Data can be generalized to higher-level concepts (concept hierarchy)
- Data should be normalized when methods involving distance measurements are used in the learning step (e.g. neural networks); see the sketch below
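For example, min-max normalization rescales an attribute to [0, 1] so that no single attribute dominates a distance computation; a small sketch with made-up income values:

```python
# A minimal sketch of min-max normalization to [0, 1], as used before
# distance-based learners; the income values are illustrative only.
incomes = [20_000, 35_000, 50_000, 120_000]
lo, hi = min(incomes), max(incomes)
scaled = [(x - lo) / (hi - lo) for x in incomes]
print(scaled)  # [0.0, 0.15, 0.3, 1.0]
```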
Issues regarding Classification and Prediction (2): Evaluating Classification Methods

- Predictive accuracy
- Speed and scalability: time to construct the model, time to use the model
- Robustness: handling noise and missing values
- Scalability: efficiency on disk-resident databases (large amounts of data)
- Interpretability: understanding and insight provided by the model
- Goodness of rules: decision tree size, compactness of classification rules
Classification by Decision Tree Induction

Decision tree:
- A flow-chart-like tree structure
- Each internal node denotes a test on an attribute
- Each branch represents an outcome of the test
- Leaf nodes represent class labels or class distributions

Use of a decision tree: to classify an unknown sample, test the attribute values of the sample against the decision tree
An Example of a Decision Tree for "buys_computer"

age?
- <=30: student?
    - no: no
    - yes: yes
- 31..40: yes
- >40: credit rating?
    - excellent: no
    - fair: yes
How to Obtain a Decision Tree?

- Manual construction
- Decision tree induction: automatically discover a decision tree from data

Tree construction:
- At the start, all the training examples are at the root
- Partition the examples recursively based on selected attributes

Tree pruning:
- Identify and remove branches that reflect noise or outliers
Training Dataset

This follows an example from Quinlan's ID3.

age     income  student  credit_rating  buys_computer
<=30    high    no       fair           no
<=30    high    no       excellent      no
31..40  high    no       fair           yes
>40     medium  no       fair           yes
>40     low     yes      fair           yes
>40     low     yes      excellent      no
31..40  low     yes      excellent      yes
<=30    medium  no       fair           no
<=30    low     yes      fair           yes
>40     medium  yes      fair           yes
<=30    medium  yes      excellent      yes
31..40  medium  no       excellent      yes
31..40  high    yes      fair           yes
>40     medium  no       excellent      no
Algorithm for Decision Tree Induction

Basic algorithm (a greedy algorithm):
- The tree is constructed in a top-down, recursive, divide-and-conquer manner
- At the start, all the training examples are at the root
- Attributes are categorical (continuous-valued attributes are discretized in advance)
- Examples are partitioned recursively based on selected attributes
Basic Algorithm for Decision Tree Induction

- If the samples are all of the same class, the node becomes a leaf and is labeled with that class.
- Otherwise, a statistical measure (e.g., information gain) is used to select the attribute that best separates the samples into individual classes. This attribute becomes the "test" or "decision" attribute at the node.
- A branch is created for each known value of the test attribute, and the samples are partitioned accordingly.
- The algorithm applies the same process recursively to form a decision tree for the samples in each partition. Once an attribute has been used at a node, it need not be considered in any of the node's descendants.
Basic Algorithm for Decision Tree Induction

The recursive partitioning stops only when one of the following conditions is true:
- All samples for a given node belong to the same class.
- There are no remaining attributes on which the samples may be further partitioned. In this case, majority voting is employed: the node is converted into a leaf and labeled with the majority class among its samples.
- There are no samples for the branch test-attribute = a_i. In this case, a leaf is created and labeled with the majority class of the parent node's samples.
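The recursion above can be written compactly. Below is a minimal sketch for categorical attributes, using information gain (defined on the next slides) as the statistical measure; the function names and the dict-based tree encoding are my own, not the lecture's.

```python
# A minimal sketch of top-down decision-tree induction for categorical
# attributes; samples are dicts mapping attribute names to values.
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum(c / total * math.log2(c / total)
                for c in Counter(labels).values())

def induce(samples, labels, attributes):
    # Stop 1: all samples belong to the same class -> leaf with that class.
    if len(set(labels)) == 1:
        return labels[0]
    # Stop 2: no remaining attributes -> leaf labeled by majority voting.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Pick the attribute with the highest information gain; since I(p, n) is
    # fixed at this node, that is the attribute minimizing expected entropy.
    def expected_entropy(a):
        return sum(entropy([l for s, l in zip(samples, labels) if s[a] == v])
                   * sum(1 for s in samples if s[a] == v) / len(samples)
                   for v in {s[a] for s in samples})
    best = min(attributes, key=expected_entropy)
    remaining = [a for a in attributes if a != best]
    # One branch per value of the test attribute seen at this node (an empty
    # branch, stop condition 3, cannot arise with this formulation).
    tree = {"attribute": best, "branches": {}}
    for v in {s[best] for s in samples}:
        part = [(s, l) for s, l in zip(samples, labels) if s[best] == v]
        ps, pl = zip(*part)
        tree["branches"][v] = induce(list(ps), list(pl), remaining)
    return tree

def classify(tree, sample):
    # Test the sample's attribute values against the tree until a leaf.
    while isinstance(tree, dict):
        tree = tree["branches"][sample[tree["attribute"]]]
    return tree
```

Run on the 14-sample buys_computer data from the earlier slide, induce should reproduce the age-rooted tree shown before.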
Attribute Selection by Information Gain Computation

attribute1  attribute2  class label
high        high        yes
high        high        yes
high        high        yes
high        low         yes
high        low         yes
high        low         yes
high        low         no
low         low         no
low         low         no
low         high        no
low         high        no
low         high        no
Consider attribute1 (class label counts):

attribute1  yes  no
high        6    1
low         0    5

Consider attribute2 (class label counts):

attribute2  yes  no
high        3    3
low         3    3
attribute1 is better than attribute2 for classification purposes: its values separate the classes almost perfectly (6 yes / 1 no vs. 0 yes / 5 no), while attribute2's values carry no class information (3/3 in each).
Information Gain (ID3/C4.5)

- To measure the degree of diversity, we can use the entropy function.
- Assume there are two groups, P and N.
- Suppose there are p elements of group P and n elements of group N.
The entropy function I(p, n) is defined as:

  I(p, n) = -\frac{p}{p+n} \log_2 \frac{p}{p+n} - \frac{n}{p+n} \log_2 \frac{n}{p+n}
Some examples: I(6, 0) = 0 (all elements in one group, no diversity); I(3, 3) = 1 (evenly mixed, maximum diversity); I(4, 2) ≈ 0.918.
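The definition translates directly into code; a short check of these values (my own illustration):

```python
import math

def I(p, n):
    # Entropy of a two-class set with p elements of P and n elements of N.
    total = p + n
    return -sum(x / total * math.log2(x / total) for x in (p, n) if x > 0)

print(I(6, 0))            # 0.0  -> all elements in one group, no diversity
print(I(3, 3))            # 1.0  -> evenly mixed, maximum diversity
print(round(I(4, 2), 3))  # 0.918
print(round(I(9, 5), 3))  # 0.94 -> the value used on the later slides
```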
Information Gain in Decision Tree Induction

- Assume that, using attribute A, a set S will be partitioned into sets {S_1, S_2, ..., S_v}, where {1, ..., v} are the possible values of A.
- If S_i contains p_i examples of P and n_i examples of N, the entropy, or the expected information needed to classify objects in all subtrees S_i, is

  E(A) = \sum_{i=1}^{v} \frac{p_i + n_i}{p + n} I(p_i, n_i)

- The encoding information that would be gained by branching on A is

  Gain(A) = I(p, n) - E(A)

- The attribute with the highest information gain is chosen as the test attribute for the given set S.
- The above generalizes to m classes.
Attribute Selection by Information Gain Computation

For the training dataset above, I(p, n) = I(9, 5) = 0.940.

Consider the attribute age:

age     yes  no  I(p_i, n_i)
<=30    2    3   0.971
31..40  4    0   0
>40     3    2   0.971

  E(age) = \frac{5}{14} I(2, 3) + \frac{4}{14} I(4, 0) + \frac{5}{14} I(3, 2) = 0.694

  Gain(age) = I(p, n) - E(age) = 0.940 - 0.694 = 0.246

Consider the other attributes in a similar way:

  Gain(income) = 0.029
  Gain(student) = 0.151
  Gain(credit_rating) = 0.048
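These numbers can be reproduced from the 14-sample training set; a small verification sketch (the data layout and helper names are my own):

```python
import math

def I(p, n):
    t = p + n
    return -sum(x / t * math.log2(x / t) for x in (p, n) if x > 0)

# The 14 training samples: (age, income, student, credit_rating, buys_computer).
data = [
    ("<=30", "high", "no", "fair", "no"), ("<=30", "high", "no", "excellent", "no"),
    ("31..40", "high", "no", "fair", "yes"), (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"), (">40", "low", "yes", "excellent", "no"),
    ("31..40", "low", "yes", "excellent", "yes"), ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"), (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"), ("31..40", "medium", "no", "excellent", "yes"),
    ("31..40", "high", "yes", "fair", "yes"), (">40", "medium", "no", "excellent", "no"),
]

def gain(col):
    p = sum(r[-1] == "yes" for r in data)
    e = 0.0
    for v in {r[col] for r in data}:
        pi = sum(r[col] == v and r[-1] == "yes" for r in data)
        ni = sum(r[col] == v and r[-1] == "no" for r in data)
        e += (pi + ni) / len(data) * I(pi, ni)
    return I(p, len(data) - p) - e

for name, col in [("age", 0), ("income", 1), ("student", 2), ("credit_rating", 3)]:
    print(name, round(gain(col), 3))
# Prints: age 0.247, income 0.029, student 0.152, credit_rating 0.048
# (the slide's 0.246 and 0.151 come from rounding the intermediate entropies).
```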
Learning (Constructing) a Decision Tree
Since age has the highest information gain, it is selected as the root attribute, and the training examples are partitioned into three branches: age <= 30, age 31..40, and age > 40. Induction then continues recursively within each partition.
Extracting Classification Rules from Trees

- Represent the knowledge in the form of IF-THEN rules
- One rule is created for each path from the root to a leaf
- Each attribute-value pair along a path forms a conjunction
- The leaf node holds the class prediction
- Rules are easier for humans to understand

Example (see the sketch after the tree below):

IF age = "<=30" AND student = "no" THEN buys_computer = "no"
IF age = "<=30" AND student = "yes" THEN buys_computer = "yes"
IF age = "31..40" THEN buys_computer = "yes"
IF age = ">40" AND credit_rating = "excellent" THEN buys_computer = "no"
IF age = ">40" AND credit_rating = "fair" THEN buys_computer = "yes"
(The buys_computer decision tree shown earlier: age at the root, student? under age <= 30, credit rating? under age > 40.)
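Generating one rule per root-to-leaf path is mechanical; a minimal sketch over a nested-dict encoding of the buys_computer tree (the encoding is my own convention, not the lecture's):

```python
# A minimal sketch: emit one IF-THEN rule per root-to-leaf path.
tree = {"attribute": "age", "branches": {
    "<=30": {"attribute": "student",
             "branches": {"no": "no", "yes": "yes"}},
    "31..40": "yes",
    ">40": {"attribute": "credit_rating",
            "branches": {"excellent": "no", "fair": "yes"}},
}}

def rules(node, path=()):
    if not isinstance(node, dict):      # leaf: the path is a complete rule
        conds = " AND ".join(f'{a} = "{v}"' for a, v in path)
        print(f'IF {conds} THEN buys_computer = "{node}"')
        return
    for value, child in node["branches"].items():
        rules(child, path + ((node["attribute"], value),))

rules(tree)  # prints the five rules listed above
```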
Classification in Large Databases

- Classification is a classical problem extensively studied by statisticians and machine learning researchers.
- Scalability: classifying data sets with millions of examples and hundreds of attributes at reasonable speed.

Why decision tree induction in data mining?
- Relatively fast learning speed (compared with other classification methods)
- Convertible to simple, easy-to-understand classification rules
- Classification accuracy comparable with other methods
Presentation of Classification Results