Oblique Decision Trees Using Householder Reflection

Chitraka Wickramarachchi

Dr. Blair Robertson, Dr. Marco Reale, Dr. Chris Price, Prof. Jennifer Brown

Outline of the Presentation

Introduction
Literature Review
Methodology
Results and Discussion

Introduction

Example: A bank wants to predict the potential status (default or not) of a new credit card customer.

For its existing customers, the bank has the following data:

Salary
Number of credit cards
Total credit amount
Number of loans
Total loan installment
Value of other earnings
Gender
Age

Possible approach: generalized linear models with binomial errors

The model becomes complex when the structure of the data is complex.

Decision Tree (DT)

A decision tree is a tree-structured classifier.

[Figure: example decision tree. The root node tests Salary <= s, the non-terminal nodes test TCA < tc and TLA < tl, and the terminal nodes are labelled D (default) or ND (not default). Each test is based on the features.]

A decision tree recursively partitions the feature space into disjoint sub-regions until each sub-region becomes homogeneous with respect to a particular class.

Partitions

[Figure: partition of the feature space; axes X1 and X2.]

Choosing the best split

[Figure: candidate splits with their computed values, and the resulting tree with the tests X2 <= 0.6819, X1 <= 0.4026 and X1 <= 0.5713.]
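The slides do not name the criterion used to rank candidate splits, so the following is only a minimal sketch that assumes Gini impurity and midpoint thresholds. The data and the helper names `gini` and `best_axis_parallel_split` are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (assumed criterion)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_axis_parallel_split(X, y):
    """Return (feature index, threshold, weighted impurity) of the best split.

    Candidate thresholds are midpoints between consecutive distinct values
    of each feature; the split with the lowest weighted impurity wins.
    """
    n, d = len(X), len(X[0])
    best = (None, None, float("inf"))
    for j in range(d):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (j, t, score)
    return best


# Hypothetical two-feature data: the classes separate cleanly on X2.
X = [[0.1, 0.9], [0.2, 0.8], [0.3, 0.2], [0.7, 0.3], [0.8, 0.1], [0.9, 0.7]]
y = ["A", "A", "B", "B", "B", "A"]
feature, threshold, impurity = best_axis_parallel_split(X, y)
print(feature, threshold, impurity)
```

Applying the same search recursively to each resulting sub-region gives the axis parallel tree described earlier.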

Types of DTs

Decision Trees
    Univariate DT: axis parallel splits
    Multivariate DT
        Linear DT: oblique splits
        Non-linear DT

Axis parallel splits

Advantages
Easy to implement
Low computational complexity
Easy to interpret

Disadvantage
When the true boundaries are not axis parallel, axis parallel splits produce a complicated boundary structure.

[Figure: axis parallel boundaries; axes X1 and X2.]

Oblique splits

Advantage
Simple boundary structure

Disadvantages
Implementation is challenging
High computational complexity

Therefore, a computationally less expensive oblique tree induction method would be desirable.

[Figure: oblique splits; axes X1 and X2.]

Literature Review

Oblique splits search for splits of the form

$$\sum_{i=1}^{d} a_i x_i + a_0 \le c$$

CART-LC, Breiman et al. (1984)
Starts with the best axis parallel split
Perturbs each coefficient until it finds the best split

Limitations
Can get trapped in local minima
No upper bound on the time spent at any node
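To make the oblique split form concrete, here is a small sketch of the test itself. The function name `oblique_test` and the coefficient values are illustrative assumptions. An axis parallel split is the special case in which only one coefficient is non-zero.

```python
def oblique_test(x, a, a0, c):
    """Return True when sum_i a[i] * x[i] + a0 <= c."""
    return sum(ai * xi for ai, xi in zip(a, x)) + a0 <= c


# Hypothetical oblique split: x1 + x2 <= 1
a, a0, c = [1.0, 1.0], 0.0, 1.0
points = [[0.2, 0.3], [0.9, 0.8], [0.5, 0.5]]
sides = [oblique_test(p, a, a0, c) for p in points]
print(sides)

# Axis parallel special case: x1 <= 0.4 (only the first coefficient is non-zero)
axis_sides = [oblique_test(p, [1.0, 0.0], 0.0, 0.4) for p in points]
print(axis_sides)
```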

Literature Review

Simulated Annealing Decision Trees (SADT), Heath et al. (1993)
First places a hyperplane in a canonical location
Perturbs each coefficient randomly
Randomization is used to try to escape from local minima

Limitations
The algorithm runs much slower than CART-LC

Literature Review

Oblique Classifier 1 (OC1), Murthy et al. (1994)
Starts with the best axis parallel split
Perturbs each coefficient
At a local minimum, perturbs the hyperplane randomly

Since 1994, many ODT induction methods have been developed based on evolutionary algorithms (EA) and neural network concepts.

Proposed Methodology

Our approach is to
Transform the data set so that it is oriented parallel to one of the feature axes
Apply axis parallel splits
Back-transform the splits into the original space

The transformation is done using Householder reflection.

Householder Reflection

Let X and Y be vectors with the same norm. Then there exists an orthogonal symmetric matrix H such that

$$Y = HX, \quad \text{where } H = I - 2WW^T \text{ and } W = \frac{X - Y}{\|X - Y\|_2}$$
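The construction can be checked numerically. The sketch below builds H from two hypothetical vectors of equal norm using plain Python lists. The helper names `householder` and `matvec` are illustrative only.

```python
import math


def householder(x, y):
    """Return H = I - 2 w w^T with w = (x - y) / ||x - y||_2 as nested lists.

    Assumes x and y have the same Euclidean norm. If x equals y the
    reflection is undefined, so the identity matrix is returned instead.
    """
    n = len(x)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    diff = [xi - yi for xi, yi in zip(x, y)]
    norm = math.sqrt(sum(v * v for v in diff))
    if norm == 0.0:
        return identity
    w = [v / norm for v in diff]
    return [[identity[i][j] - 2.0 * w[i] * w[j] for j in range(n)]
            for i in range(n)]


def matvec(M, v):
    """Multiply matrix M (nested lists) by vector v."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]


# Hypothetical vectors with equal norm (both have length 5).
x = [3.0, 4.0]
y = [5.0, 0.0]
H = householder(x, y)
Hx = matvec(H, x)    # approximately equal to y
HHx = matvec(H, Hx)  # reflecting twice returns x, since H is its own inverse
print(H, Hx, HHx)
```

Because H is symmetric and orthogonal, the same matrix performs both the forward and the back transformation.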

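The orientation of a cluster can be represented by the dominant eigenvector d of its variance-covariance matrix.

[Figure: axes X1 and X2.]

The Householder matrix reflects d onto e_1, the unit vector along the first feature axis:

$$e_1 = Hd, \quad \text{where } H = I - 2WW^T \text{ and } W = \frac{d - e_1}{\|d - e_1\|_2}$$

[Figure: axes X1 and X2.]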
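The following sketch strings these steps together on hypothetical data. It estimates the dominant eigenvector by power iteration (an assumption, since the slides do not say how the eigenvector is computed), reflects it onto e_1, and shows that an axis parallel test on the first reflected coordinate is an oblique test in the original space. All names and data are illustrative.

```python
import math


def covariance(X):
    """Sample variance-covariance matrix of the rows of X."""
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    return [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in X) / (n - 1)
             for j in range(d)] for i in range(d)]


def dominant_eigenvector(S, iters=200):
    """Unit dominant eigenvector of S by power iteration (assumes S != 0)."""
    d = len(S)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(S[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v


def householder_to_e1(dvec):
    """H = I - 2 w w^T with w = (d - e1) / ||d - e1||_2, so that H d = e1."""
    n = len(dvec)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    diff = [dvec[i] - identity[0][i] for i in range(n)]
    norm = math.sqrt(sum(c * c for c in diff))
    if norm < 1e-12:  # d already lies along the first axis
        return identity
    w = [c / norm for c in diff]
    return [[identity[i][j] - 2.0 * w[i] * w[j] for j in range(n)]
            for i in range(n)]


def matvec(M, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]


# Hypothetical cluster stretched along the diagonal x2 = x1.
X = [[0.0, 0.1], [1.0, 0.9], [2.0, 2.1], [3.0, 2.9], [4.0, 4.1]]
d = dominant_eigenvector(covariance(X))
H = householder_to_e1(d)
Hd = matvec(H, d)               # approximately e1 = [1, 0]
Z = [matvec(H, x) for x in X]   # reflected data, aligned with the first axis

# The axis parallel test z1 <= t in the reflected space is the oblique test
# H[0][0] * x1 + H[0][1] * x2 <= t in the original space.
t = 2.5  # hypothetical threshold
reflected_side = [z[0] <= t for z in Z]
original_side = [sum(h * xi for h, xi in zip(H[0], x)) <= t for x in X]
print(Hd, reflected_side, original_side)
```

The first row of H supplies the coefficients of the oblique split, so no separate back-transformation matrix is needed.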


Cost-complexity pruning

Cost-complexity pruning is used to avoid over-fitting.

[Figure: accuracy versus number of terminal nodes.]
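As a rough illustration of the trade-off, the sketch below scores a nested sequence of subtrees with the usual CART cost-complexity measure, error plus alpha times the number of terminal nodes. The subtree sizes, error rates and value of alpha are made up for the example and are not results from this work.

```python
def cost_complexity(error_rate, n_terminal_nodes, alpha):
    """Cost-complexity measure: error + alpha * number of terminal nodes."""
    return error_rate + alpha * n_terminal_nodes


# Hypothetical nested subtrees: (number of terminal nodes, error rate).
subtrees = [(1, 0.40), (2, 0.20), (4, 0.10), (8, 0.08), (16, 0.07)]
alpha = 0.01  # hypothetical complexity penalty

best = min(subtrees, key=lambda t: cost_complexity(t[1], t[0], alpha))
print(best)
```

Larger values of alpha favour smaller trees. In practice alpha is typically chosen by cross-validation.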


Data sets: UCI Machine Learning Repository

Data set              Number of examples   Number of features   Number of classes
Iris Data             150                  4                    3
Boston Housing Data   506                  13                   2

Accuracy estimates were obtained from ten 5-fold cross-validation experiments.
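One way to organise ten 5-fold cross-validation experiments is sketched below. It only generates the train and test index sets. The function name `repeated_kfold`, the seed, and the unstratified shuffling are assumptions, since the slides do not describe how the folds were formed.

```python
import random


def repeated_kfold(n, k=5, repeats=10, seed=0):
    """Yield (repeat, fold, train indices, test indices) for repeated k-fold CV."""
    rng = random.Random(seed)
    for r in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)                       # new random partition per repeat
        folds = [idx[f::k] for f in range(k)]  # k nearly equal folds
        for f in range(k):
            test = folds[f]
            train = [i for g in range(k) if g != f for i in folds[g]]
            yield r, f, train, test


# 150 examples, as in the Iris data: 10 repeats x 5 folds = 50 train/test splits.
splits = list(repeated_kfold(150))
print(len(splits), len(splits[0][2]), len(splits[0][3]))
```

The accuracy estimate would then be the average test accuracy over all 50 splits.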


Classifier           Iris Data   Housing Data
Householder Method
CART-LC
OC1
C4.5

Results
High accuracy
Computationally inexpensive

THANK YOU
