Classification and Decision Trees
Iza Moise, Evangelos Pournaras, Dirk Helbing
Overview
Classification
Decision Trees
Classification
Definition
Classification is a data mining function that assigns items in a collection to target categories or classes.
The goal is to accurately predict the target class for each data point.
• Supervised learning
• Outcome → class label
Types of Classification
• binary classification → the target attribute has only two values
• multi-class classification → the target attribute has more than two values
• crisp classification → given an input, the classifier returns its label
• probabilistic classification → given an input, the classifier returns its probability of belonging to each class
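The contrast between crisp and probabilistic output can be sketched with a toy frequency-based classifier (the data and function names here are illustrative, not from the slides):

```python
# Toy illustration of crisp vs. probabilistic classification output.
from collections import Counter

def train_freq(examples):
    """examples: list of (feature_value, label) pairs."""
    counts = {}
    for x, y in examples:
        counts.setdefault(x, Counter())[y] += 1
    return counts

def predict_proba(counts, x):
    """Probabilistic output: estimated P(class | feature value)."""
    c = counts[x]
    total = sum(c.values())
    return {label: n / total for label, n in c.items()}

def predict(counts, x):
    """Crisp output: the single most likely class label."""
    return max(predict_proba(counts, x).items(), key=lambda kv: kv[1])[0]

data = [("sunny", "play"), ("sunny", "play"), ("sunny", "stay"),
        ("rainy", "stay")]
model = train_freq(data)
print(predict(model, "sunny"))        # crisp: a single label, "play"
print(predict_proba(model, "sunny"))  # probabilistic: one probability per class
```

Note that the crisp label is just the argmax of the probabilistic output; a probabilistic classifier therefore carries strictly more information.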
Applications
Classification Example: Spam Filtering
Classify as “Spam” or “Not Spam”
1 Machine Learning: CS 6375 Introduction, Instructor: Vibhav Gogate, The University of Texas at Dallas
Applications [cont.]
Classification Example: Weather Prediction
2 Machine Learning: CS 6375 Introduction, Instructor: Vibhav Gogate, The University of Texas at Dallas
Applications [cont.]
• Customer Target Marketing
• Medical Disease Diagnosis
• Supervised Event Detection
• Multimedia Data Analysis
• Document Categorization and Filtering
• Social Network Analysis
A Three-Phase Process
1. Training phase: a model is constructed from the training instances.
→ the classification algorithm finds relationships between predictors and targets
→ these relationships are summarised in a model
→ the model is trained on data with known labels (training data)
2. Testing phase: test the model on a test sample whose class labels are known but were not used for training the model (testing data)
3. Usage phase: use the model for classification on new data whose class labels are unknown (new data)
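The three phases above can be sketched end-to-end with a deliberately trivial "majority class" baseline classifier (the spam/ham dataset and split are made up for illustration):

```python
# The three phases of classification, shown with a trivial majority-class model.
from collections import Counter

# 1. Training phase: construct a model from labelled training data.
train_labels = ["spam", "ham", "ham", "ham", "spam", "ham"]
model = Counter(train_labels).most_common(1)[0][0]  # the "model" is just the majority class

# 2. Testing phase: evaluate on held-out data whose labels are known
#    but were not used for training.
test_labels = ["ham", "spam", "ham", "ham"]
predictions = [model] * len(test_labels)
accuracy = sum(p == y for p, y in zip(predictions, test_labels)) / len(test_labels)
print(accuracy)  # → 0.75

# 3. Usage phase: classify new data whose labels are unknown.
new_items = ["message-1", "message-2"]
print([model for _ in new_items])
```

A real classifier would learn from the predictors rather than ignoring them, but the train/test/use workflow is identical.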
Training Phase - Model Construction
[Figure: model construction from training data]
3 Data Warehousing and Data Mining, Instructor: Prof. Hany Saleeb
Testing Phase - Model Usage
[Figure: applying the model to test data]
4 Data Warehousing and Data Mining, Instructor: Prof. Hany Saleeb
Methods of classification
• Decision Trees
• k-Nearest Neighbours
• Neural Networks
• Logistic Regression
• Linear Discriminant Analysis
Decision Trees
Main principles
A decision tree creates a hierarchical partitioning of the data which relates the different partitions at the leaf level to the different classes.
Data requirements:
• Attribute-value description: each object is expressible in terms of a fixed collection of properties or attributes (e.g., hot, mild, cold).
• Predefined classes (target values): the target function has discrete output values (boolean or multi-class).
• Sufficient data: enough training cases should be provided to learn the model.
Main principles [cont.]
• decision node = a test on an attribute
• branch = an outcome of the test
• leaf node = a classification or decision
• root = the best predictor
• path = a conjunction of tests leading to the final decision
Classification of new instances is done by following the matching path from the root to a leaf node.
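Root-to-leaf classification can be sketched with a tree encoded as nested dicts; the attribute names below follow the classic "play tennis" style of example and are purely illustrative:

```python
# A decision tree as nested dicts: inner dicts are decision nodes,
# strings are leaf nodes (class labels).
tree = {
    "outlook": {                  # root = decision node testing 'outlook'
        "sunny": {"humidity": {   # internal decision node
            "high": "no",         # leaf nodes = class labels
            "normal": "yes",
        }},
        "overcast": "yes",
        "rainy": "no",
    }
}

def classify(node, instance):
    """Follow the matching path from the root down to a leaf."""
    while isinstance(node, dict):
        attribute = next(iter(node))                  # attribute tested at this node
        node = node[attribute][instance[attribute]]   # take the matching branch
    return node                                       # leaf = classification

print(classify(tree, {"outlook": "sunny", "humidity": "normal"}))  # → yes
```

Each `while` iteration follows one branch, so the cost of classifying an instance is bounded by the depth of the tree.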
[Figure: example decision tree]
5 Dr. Saed Sayad, adjunct Professor at the University of Toronto
Split criterion
A condition (or predicate) on:
• a single attribute → univariate split
• multiple attributes → multivariate split
• recursively split the training data
• goal: maximise the information gain (the discrimination among the classes)
→ i.e., how well an attribute separates the examples according to their target classification
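Information gain for a univariate split is the reduction in class entropy; a minimal sketch (the toy data and attribute name are made up):

```python
# Entropy and information gain for a univariate split.
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(examples, attribute):
    """Entropy reduction achieved by splitting on `attribute`.
    examples: list of (features_dict, label) pairs."""
    labels = [y for _, y in examples]
    groups = {}
    for x, y in examples:
        groups.setdefault(x[attribute], []).append(y)
    # Weighted entropy of the partitions induced by the split.
    remainder = sum(len(g) / len(examples) * entropy(g)
                    for g in groups.values())
    return entropy(labels) - remainder

data = [({"windy": "yes"}, "stay"), ({"windy": "yes"}, "stay"),
        ({"windy": "no"}, "play"), ({"windy": "no"}, "play")]
print(information_gain(data, "windy"))  # → 1.0 (the split separates the classes perfectly)
```

A gain of 1.0 bit is the maximum possible for a two-class problem with balanced labels; an attribute that leaves the class mix unchanged in every partition would score 0.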
How to build a decision tree?
Top-down tree construction:
• all training data are the root
• data are partitioned recursively based on selected attributes
• bottom-up tree pruning
→ remove subtrees or branches, in a bottom-up manner, to improve the estimated accuracy on new cases
• conditions for stopping partitioning:
• all samples for a given node belong to the same class
• there are no remaining attributes for further partitioning
• there are no samples left
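The top-down construction and its stopping conditions can be sketched in an ID3-style recursion (no pruning; the dataset and attribute names are illustrative):

```python
# Minimal top-down (ID3-style) decision tree construction, without pruning.
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def build_tree(examples, attributes):
    """examples: list of (features_dict, label); returns nested dicts or a leaf label."""
    labels = [y for _, y in examples]
    # Stop: all samples at this node belong to the same class.
    if len(set(labels)) == 1:
        return labels[0]
    # Stop: no remaining attributes -> majority class.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Select the attribute with the highest information gain.
    def gain(a):
        groups = {}
        for x, y in examples:
            groups.setdefault(x[a], []).append(y)
        remainder = sum(len(g) / len(examples) * entropy(g)
                        for g in groups.values())
        return entropy(labels) - remainder
    best = max(attributes, key=gain)
    # Partition the data recursively on the chosen attribute.
    rest = [a for a in attributes if a != best]
    tree = {best: {}}
    for value in {x[best] for x, _ in examples}:
        subset = [(x, y) for x, y in examples if x[best] == value]
        tree[best][value] = build_tree(subset, rest)
    return tree

data = [({"outlook": "sunny"}, "no"), ({"outlook": "overcast"}, "yes"),
        ({"outlook": "rainy"}, "yes")]
print(build_tree(data, ["outlook"]))
```

All training data start at the root, and each recursive call handles one partition; a production learner would add the pruning step described above to avoid overfitting.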
Pros and Cons
Pros:
✓ simple to understand and interpret
✓ little data preparation and little computation required
✓ indicates which attributes are most important for classification
Pros and Cons
Cons:
✗ learning an optimal decision tree is NP-complete
✗ performs poorly with many classes and small data
✗ computationally expensive to train
✗ over-complex trees do not generalise well from the training data (overfitting)
What’s next?
• k-Nearest Neighbours
• Clustering