SLIQ

SLIQ: A Fast Scalable Classifier for Data Mining

Manish Mehta, Rakesh Agrawal, Jorma Rissanen

Presentation by: Sara Alaee , Zahra Taheri


Presented in: 5th International Conference on Extending Database Technology, Avignon, France, March 25–29, 1996, Proceedings

927 citations

Outline
• Introduction
• Motivation
• SLIQ Algorithm
  • Building tree
  • Pruning
  • Example
• Evaluation
• Conclusion

04/13/23

Introduction

Most classification algorithms are designed for memory-resident data, which limits their suitability for mining large training datasets.

Solution: build a scalable classifier, SLIQ.

SLIQ: Supervised Learning in Quest


Motivation

Goal: improve the scalability of tree classifiers.

Previous proposals:
• Sampling the data at each node
• Discretizing numerical attributes
• Partitioning the input data and building a tree for each partition

All of these methods lose accuracy!

SLIQ – improve learning time without loss in accuracy!


Motivation (cont.)

Recall (ID3, C4.5, CART):


Motivation (cont.)

Non-scalable decision trees:
• Complexity in determining the best split for each attribute
  • Cost of evaluating splits for numerical attributes = cost of sorting the values at each node
  • Cost of evaluating splits for categorical attributes = cost of searching for the best subset
• Pruning
  • Cross-validation: inapplicable for large datasets
  • Dividing the data into two parts, a training set and a test set: problems with the sizes and distributions of the two parts


SLIQ – Algorithm

Key features:
• Tree classifier that handles both numerical and categorical attributes
• Numerical attributes are pre-sorted before the tree is built
• Breadth-first growing strategy
• Goodness test: Gini index
• Inexpensive tree pruning algorithm based on Minimum Description Length (MDL)
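The Gini index used as the goodness test can be sketched as follows. This is a minimal illustration, assuming the usual definition gini(S) = 1 - sum of squared class frequencies and a size-weighted average for a binary split; the function names `gini` and `gini_split` are hypothetical, not taken from the paper.

```python
from collections import Counter


def gini(labels):
    """Gini index of a collection of class labels: 1 - sum_j p_j^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_split(left, right):
    """Size-weighted Gini index of a binary split; lower means a purer split."""
    n = len(left) + len(right)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
```

A split that separates the classes perfectly scores 0, while an evenly mixed two-class node scores 0.5.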


SLIQ – Algorithm (cont.)

Pre-sorting:
• Eliminates the need to sort the data at each node
• Create a sorted list for each numerical attribute
• Create a class list
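The pre-sorting step can be sketched roughly as below. This is an in-memory illustration only, assuming each attribute list holds (value, record id) pairs and each class list entry holds a class label plus a reference to the leaf the record currently belongs to; the dictionary layout, the root name "N1", and the function name `build_lists` are assumptions made for the example.

```python
def build_lists(records, labels, numeric_attrs):
    """Build one sorted attribute list per numerical attribute and one class list.

    records: list of dicts mapping attribute name -> value (index = record id)
    labels:  list of class labels, aligned with records
    """
    # Class list: one entry per record; every record starts at the root leaf.
    class_list = [{"label": y, "leaf": "N1"} for y in labels]
    # Attribute lists: (value, record id) pairs, sorted once by value.
    attribute_lists = {
        attr: sorted((rec[attr], rid) for rid, rec in enumerate(records))
        for attr in numeric_attrs
    }
    return attribute_lists, class_list
```

The record id in each attribute list entry is what links a sorted value back to its class label and current leaf, so the lists never need to be re-sorted as the tree grows.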

Example:


Split evaluation:
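A rough sketch of split evaluation for one numerical attribute: a single scan over the pre-sorted attribute list maintains, for every current leaf, class histograms of the records already seen ("below") and not yet seen ("above"). The layout of `attr_list` and `class_list`, the test form `value <= threshold`, and the function names are assumptions for illustration; using the attribute value itself as the threshold is a simplification.

```python
from collections import Counter, defaultdict
from itertools import groupby


def gini_from_counts(counts):
    """Gini index computed from a class -> count histogram."""
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_numeric_splits(attr_list, class_list):
    """Return {leaf: (weighted_gini, threshold)} for the test 'value <= threshold'.

    attr_list:  (value, record id) pairs sorted by value
    class_list: per-record dicts with keys 'label' and 'leaf'
    """
    # Initially every record of every leaf lies above the candidate threshold.
    above = defaultdict(Counter)
    for entry in class_list:
        above[entry["leaf"]][entry["label"]] += 1
    below = defaultdict(Counter)
    best = {}
    # Records sharing a value are moved together so ties never get separated.
    for value, group in groupby(attr_list, key=lambda pair: pair[0]):
        touched = set()
        for _, rid in group:
            entry = class_list[rid]
            below[entry["leaf"]][entry["label"]] += 1
            above[entry["leaf"]][entry["label"]] -= 1
            touched.add(entry["leaf"])
        for leaf in touched:
            n_below = sum(below[leaf].values())
            n_above = sum(above[leaf].values())
            if n_above == 0:
                continue  # nothing left on the other side, so not a real split
            score = (
                n_below * gini_from_counts(below[leaf])
                + n_above * gini_from_counts(above[leaf])
            ) / (n_below + n_above)
            if leaf not in best or score < best[leaf][0]:
                best[leaf] = (score, value)
    return best
```

Because the histograms are kept per leaf, one pass over the attribute list evaluates candidate splits for all leaves of the current tree level at once, which is what the breadth-first growing strategy relies on.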


Example:


Update class list:
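One way to picture the class list update: once a split has been chosen for a leaf, the attribute list of the splitting attribute is scanned and each record's leaf reference in the class list is redirected to the matching child. The `splits` mapping, the child names, and the function name below are hypothetical and only serve the illustration.

```python
def update_class_list(attr_list, class_list, splits):
    """Redirect class list entries from split leaves to their new children.

    attr_list:  (value, record id) pairs for the splitting attribute
    class_list: per-record dicts with keys 'label' and 'leaf' (updated in place)
    splits:     {leaf: (threshold, left_child, right_child)} for leaves that
                are split on this attribute with the test 'value <= threshold'
    """
    for value, rid in attr_list:
        entry = class_list[rid]
        split = splits.get(entry["leaf"])
        if split is None:
            continue  # this record's leaf is not split on this attribute
        threshold, left_child, right_child = split
        entry["leaf"] = left_child if value <= threshold else right_child
```

Only the class list changes; the attribute lists keep their sorted order and are reused at the next tree level.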


Example:


SLIQ – Algorithm (cont.)

• When a node becomes pure, stop splitting it
• Condense the attribute lists by discarding examples that belong to the pure node
• Categorical attributes: a cardinality threshold decides how the best split is found. Below the threshold all possible subsets are evaluated; for large-cardinality attributes the best split is computed greedily
• SLIQ scales to large datasets with no loss in accuracy: the splits evaluated with or without pre-sorting are identical
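The threshold rule for categorical attributes might look like the sketch below. It assumes the exhaustive search is used when the number of categories is at most a hypothetical `max_exhaustive`, and that the greedy variant grows the subset one category at a time until the weighted Gini index stops improving; the parameter name, its default, and the exact greedy procedure are assumptions, not details confirmed by the slides.

```python
from collections import Counter
from itertools import combinations


def gini_from_counts(counts):
    """Gini index computed from a class -> count histogram."""
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_gini(hist, subset):
    """Weighted Gini index when categories in `subset` go left, the rest right.

    hist: {category: Counter(class label -> count)}
    """
    left, right = Counter(), Counter()
    for category, counts in hist.items():
        (left if category in subset else right).update(counts)
    n_left, n_right = sum(left.values()), sum(right.values())
    return (
        n_left * gini_from_counts(left) + n_right * gini_from_counts(right)
    ) / (n_left + n_right)


def best_subset(hist, max_exhaustive=10):
    """Pick the subset of categories that defines the best binary split."""
    categories = sorted(hist)
    if len(categories) < 2:
        return None  # a single category cannot be split
    if len(categories) <= max_exhaustive:
        # Small cardinality: evaluate every non-empty proper subset.
        candidates = (
            frozenset(combo)
            for size in range(1, len(categories))
            for combo in combinations(categories, size)
        )
        return min(candidates, key=lambda s: split_gini(hist, s))
    # Large cardinality: greedily add the category that helps most.
    subset, best_score = frozenset(), float("inf")
    while len(subset) < len(categories) - 1:
        options = [subset | {c} for c in categories if c not in subset]
        candidate = min(options, key=lambda s: split_gini(hist, s))
        score = split_gini(hist, candidate)
        if score >= best_score:
            break  # no further improvement
        subset, best_score = candidate, score
    return subset
```

The exhaustive branch grows exponentially with the number of categories, which is why a cap of some kind is needed for large-cardinality attributes.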


SLIQ - Pruning

Post-pruning algorithm based on the Minimum Description Length principle

Find a model that minimizes:

Cost(M, D) = Cost(D|M) + Cost(M)

• Cost(M): cost of the model
• Cost(D|M): cost of encoding the data D given the model M


SLIQ - Pruning

• Cost of the data: classification error
• Cost of the model:
  • Encoding the tree: number of bits
  • Encoding the splits:
    • numerical attribute: constant (empirically 1)
    • categorical attribute: depends on cardinality
• MDL pruning evaluates the code length at each node to decide whether to prune
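The node-by-node comparison of code lengths can be illustrated with a small bottom-up sketch. It assumes the data cost of a leaf is its number of misclassified examples, that one bit (`NODE_BITS`) marks whether a node is a leaf or internal, and that the split cost is supplied per node (1 for a numerical test, following the slide); the `Node` class and these constants are hypothetical, and only the simplest option (keep both children or collapse to a leaf) is covered.

```python
from dataclasses import dataclass
from typing import Optional

NODE_BITS = 1.0  # assumed cost of encoding "leaf" vs "internal node"


@dataclass
class Node:
    errors: int               # misclassified examples if this node were a leaf
    split_cost: float = 1.0   # bits to encode the split test at this node
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def prune(node):
    """Prune bottom-up; return the code length of the (possibly pruned) subtree."""
    leaf_cost = NODE_BITS + node.errors
    if node.left is None or node.right is None:
        return leaf_cost
    subtree_cost = NODE_BITS + node.split_cost + prune(node.left) + prune(node.right)
    if leaf_cost <= subtree_cost:
        # Encoding the node as a leaf is no more expensive: drop both children.
        node.left = node.right = None
        return leaf_cost
    return subtree_cost
```

A subtree survives only when the errors it removes outweigh the bits needed to describe its extra nodes and split tests.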


SLIQ - Pruning

Pruning Algorithm:

C’(ti) : cost of encoding the children’s examples using the parent’s statistics.


SLIQ - Pruning

Three pruning strategies:
• Full: prune both children and convert the node to a leaf
• Partial: prune the node into a leaf, prune only the left child, prune only the right child, or leave the node intact
• Hybrid: apply the Full method first, then Partial (prune left, prune right, or leave intact)


Evaluation

Metrics:
• Primary: classification accuracy
• Secondary: classification time and size of the decision tree

Setup:
• Small benchmarks: datasets from the STATLOG classification benchmark
• Synthetic databases: 9 attributes per tuple, 2 classification functions

Evaluation

STATLOG benchmark:


Evaluation

Pruning strategy comparison:

Hybrid pruning is the preferred approach, and is used for the experiments in this paper.

Evaluation

Small datasets:
• IND-Cart:
  • good accuracy
  • small trees
  • an order of magnitude slower than the others
• IND-C4:
  • accurate
  • fast
  • large decision trees
• SLIQ:
  • accurate
  • smaller trees than IND-C4
  • faster than IND-


Evaluation

Scalability:


Conclusion

• SLIQ proves to be a fast, low-cost, scalable classifier that builds accurate trees
• In empirical tests, SLIQ matches the accuracy of other algorithms while producing smaller decision trees
• Scalability? Memory becomes a problem as the number of attributes or the number of classes increases


THANK YOU!
