Classification and Decision Trees
Iza Moise, Evangelos Pournaras, Dirk Helbing
Overview
Classification
Decision Trees
Classification
Definition
Classification is a data mining function that assigns items in a collection to target categories or classes.
The goal is to accurately predict the target class for each data point.
• Supervised learning
• Outcome → class label
Types of Classification
• binary classification → the target attribute has only two values
• multi-class classification → the target attribute has more than two values
• crisp classification → given an input, the classifier returns its label
• probabilistic classification → given an input, the classifier returns its probability of belonging to each class
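The contrast between crisp and probabilistic output can be sketched with a toy frequency-based classifier (the data and function names here are illustrative, not from the slides):

```python
# Toy illustration of crisp vs. probabilistic classification output.
from collections import Counter

def train_freq(examples):
    """examples: list of (feature_value, label) pairs."""
    counts = {}
    for x, y in examples:
        counts.setdefault(x, Counter())[y] += 1
    return counts

def predict_proba(counts, x):
    """Probabilistic output: estimated P(class | feature value)."""
    c = counts[x]
    total = sum(c.values())
    return {label: n / total for label, n in c.items()}

def predict(counts, x):
    """Crisp output: the single most likely class label."""
    return max(predict_proba(counts, x).items(), key=lambda kv: kv[1])[0]

data = [("sunny", "play"), ("sunny", "play"), ("sunny", "stay"),
        ("rainy", "stay")]
model = train_freq(data)
print(predict(model, "sunny"))        # crisp: a single label, "play"
print(predict_proba(model, "sunny"))  # probabilistic: one probability per class
```

Note that the crisp label is just the argmax of the probabilistic output; a probabilistic classifier therefore carries strictly more information.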
Applications
Classification Example: Spam Filtering
Classify as “Spam” or “Not Spam”
1 Machine Learning: CS 6375 Introduction, Instructor: Vibhav Gogate, The University of Texas at Dallas
Applications [cont.]
Classification Example: Weather Prediction
2 Machine Learning: CS 6375 Introduction, Instructor: Vibhav Gogate, The University of Texas at Dallas
Applications [cont.]
• Customer Target Marketing
• Medical Disease Diagnosis
• Supervised Event Detection
• Multimedia Data Analysis
• Document Categorization and Filtering
• Social Network Analysis
A Three-Phase Process
1. Training phase: a model is constructed from the training instances.
→ the classification algorithm finds relationships between predictors and targets
→ these relationships are summarised in a model
→ the model is trained on data with known labels (training data)
2. Testing phase: test the model on a test sample whose class labels are known but were not used for training the model (testing data)
3. Usage phase: use the model for classification on new data whose class labels are unknown (new data)
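The three phases above can be sketched end-to-end with a deliberately trivial "majority class" baseline classifier (the spam/ham dataset and split are made up for illustration):

```python
# The three phases of classification, shown with a trivial majority-class model.
from collections import Counter

# 1. Training phase: construct a model from labelled training data.
train_labels = ["spam", "ham", "ham", "ham", "spam", "ham"]
model = Counter(train_labels).most_common(1)[0][0]  # the "model" is just the majority class

# 2. Testing phase: evaluate on held-out data whose labels are known
#    but were not used for training.
test_labels = ["ham", "spam", "ham", "ham"]
predictions = [model] * len(test_labels)
accuracy = sum(p == y for p, y in zip(predictions, test_labels)) / len(test_labels)
print(accuracy)  # → 0.75

# 3. Usage phase: classify new data whose labels are unknown.
new_items = ["message-1", "message-2"]
print([model for _ in new_items])
```

A real classifier would learn from the predictors rather than ignoring them, but the train/test/use workflow is identical.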
Training Phase - Model Construction
[Figure: model construction from training data]
3 Data Warehousing and Data Mining, Instructor: Prof. Hany Saleeb
Testing Phase - Model Usage
[Figure: applying the model to test data]
4 Data Warehousing and Data Mining, Instructor: Prof. Hany Saleeb
Methods of classification
• Decision Trees
• k-Nearest Neighbours
• Neural Networks
• Logistic Regression
• Linear Discriminant Analysis
Decision Trees
Main principles
A decision tree creates a hierarchical partitioning of the data which relates the different partitions at the leaf level to the different classes.
Data requirements:
• Attribute-value description: each object is expressible in terms of a fixed collection of properties or attributes (e.g., hot, mild, cold).
• Predefined classes (target values): the target function has discrete output values (boolean or multi-class).
• Sufficient data: enough training cases should be provided to learn the model.
Main principles [cont.]
• decision node = a test on an attribute
• branch = an outcome of the test
• leaf node = a classification or decision
• root = the best predictor
• path = a conjunction of tests leading to the final decision
Classification of new instances is done by following the matching path from the root to a leaf node.
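Root-to-leaf classification can be sketched with a tree encoded as nested dicts; the attribute names below follow the classic "play tennis" style of example and are purely illustrative:

```python
# A decision tree as nested dicts: inner dicts are decision nodes,
# strings are leaf nodes (class labels).
tree = {
    "outlook": {                  # root = decision node testing 'outlook'
        "sunny": {"humidity": {   # internal decision node
            "high": "no",         # leaf nodes = class labels
            "normal": "yes",
        }},
        "overcast": "yes",
        "rainy": "no",
    }
}

def classify(node, instance):
    """Follow the matching path from the root down to a leaf."""
    while isinstance(node, dict):
        attribute = next(iter(node))                  # attribute tested at this node
        node = node[attribute][instance[attribute]]   # take the matching branch
    return node                                       # leaf = classification

print(classify(tree, {"outlook": "sunny", "humidity": "normal"}))  # → yes
```

Each `while` iteration follows one branch, so the cost of classifying an instance is bounded by the depth of the tree.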
[Figure: example decision tree]
5 Dr. Saed Sayad, adjunct Professor at the University of Toronto
Split criterion
A condition (or predicate) on:
• a single attribute → univariate split
• multiple attributes → multivariate split
• recursively split the training data
• goal: maximise the information gain (the discrimination among the classes)
→ i.e., how well an attribute separates the examples according to their target classification
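Information gain for a univariate split is the reduction in class entropy; a minimal sketch (the toy data and attribute name are made up):

```python
# Entropy and information gain for a univariate split.
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(examples, attribute):
    """Entropy reduction achieved by splitting on `attribute`.
    examples: list of (features_dict, label) pairs."""
    labels = [y for _, y in examples]
    groups = {}
    for x, y in examples:
        groups.setdefault(x[attribute], []).append(y)
    # Weighted entropy of the partitions induced by the split.
    remainder = sum(len(g) / len(examples) * entropy(g)
                    for g in groups.values())
    return entropy(labels) - remainder

data = [({"windy": "yes"}, "stay"), ({"windy": "yes"}, "stay"),
        ({"windy": "no"}, "play"), ({"windy": "no"}, "play")]
print(information_gain(data, "windy"))  # → 1.0 (the split separates the classes perfectly)
```

A gain of 1.0 bit is the maximum possible for a two-class problem with balanced labels; an attribute that leaves the class mix unchanged in every partition would score 0.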
How to build a decision tree?
Top-down tree construction:
• all training data are the root
• data are partitioned recursively based on selected attributes
• bottom-up tree pruning
→ remove subtrees or branches, in a bottom-up manner, to improve the estimated accuracy on new cases
• conditions for stopping partitioning:
• all samples for a given node belong to the same class
• there are no remaining attributes for further partitioning
• there are no samples left
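The top-down construction and its stopping conditions can be sketched in an ID3-style recursion (no pruning; the dataset and attribute names are illustrative):

```python
# Minimal top-down (ID3-style) decision tree construction, without pruning.
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def build_tree(examples, attributes):
    """examples: list of (features_dict, label); returns nested dicts or a leaf label."""
    labels = [y for _, y in examples]
    # Stop: all samples at this node belong to the same class.
    if len(set(labels)) == 1:
        return labels[0]
    # Stop: no remaining attributes -> majority class.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Select the attribute with the highest information gain.
    def gain(a):
        groups = {}
        for x, y in examples:
            groups.setdefault(x[a], []).append(y)
        remainder = sum(len(g) / len(examples) * entropy(g)
                        for g in groups.values())
        return entropy(labels) - remainder
    best = max(attributes, key=gain)
    # Partition the data recursively on the chosen attribute.
    rest = [a for a in attributes if a != best]
    tree = {best: {}}
    for value in {x[best] for x, _ in examples}:
        subset = [(x, y) for x, y in examples if x[best] == value]
        tree[best][value] = build_tree(subset, rest)
    return tree

data = [({"outlook": "sunny"}, "no"), ({"outlook": "overcast"}, "yes"),
        ({"outlook": "rainy"}, "yes")]
print(build_tree(data, ["outlook"]))
```

All training data start at the root, and each recursive call handles one partition; a production learner would add the pruning step described above to avoid overfitting.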
Pros and Cons
Pros:
✓ simple to understand and interpret
✓ little data preparation and little computation required
✓ indicates which attributes are most important for classification
Pros and Cons
Cons:
✗ learning an optimal decision tree is NP-complete
✗ performs poorly with many classes and small data
✗ computationally expensive to train
✗ over-complex trees do not generalise well from the training data (overfitting)
What’s next?
• k-Nearest Neighbours
• Clustering