Pruning and Dynamic Scheduling of Cost-sensitive Ensembles
Wei Fan, Haixun Wang, and Philip S. Yu, IBM T.J. Watson, Hawthorne, New York
Fang Chu, UCLA, Los Angeles, CA

Page 1:

Pruning and Dynamic Scheduling of Cost-sensitive Ensembles

Wei Fan, Haixun Wang, and Philip S. Yu
IBM T.J. Watson, Hawthorne, New York

Fang Chu
UCLA, Los Angeles, CA

Page 2:

Inductive Learning

Training Data → Learner → Classifier

($43.45, retail, 10025, 10040, ..., nonfraud)
($246.70, weapon, 10001, 94583, ..., fraud)

Learners: 1. Decision trees  2. Rules  3. Naive Bayes  ...

Task: Transaction → {fraud, nonfraud}

Test Data → Classifier → Class Labels

($99.99, pharmacy, 10013, 10027, ..., ?)
($1.00, gas, 10040, 00234, ..., ?)

Predicted labels: nonfraud, fraud

Page 3:

Page 4:

Cost-sensitive Problems

- Charity donation: solicit people who are likely to donate a large amount.
  - It costs $0.68 to send a letter.
  - A(x): predicted donation amount.
  - Only solicit if A(x) > $0.68; otherwise we lose money.

- Credit card fraud detection: detect frauds with a high transaction amount.
  - It costs $90 to challenge a potential fraud.
  - A(x): fraudulent transaction amount.
  - Only challenge if A(x) > $90; otherwise we lose money.

Page 5:

Scalability Issues of Data Mining

- Learning algorithms:
  - Non-linear complexity in the dataset size n.
  - Memory-based, due to the random access pattern over records in the dataset.
  - Significantly slower if the dataset is not held entirely in memory.

- State of the art:
  - Many scalable solutions are algorithm-specific.
  - General algorithms are not very scalable and only work for cost-insensitive problems.
  - Cost-sensitive examples: charity donation (solicit people who will donate a lot); credit card fraud (detect frauds with a high transaction amount).

- Our solution: a general framework for both cost-sensitive and cost-insensitive problems.

Page 6:

Training

D: large dataset
→ partition into K subsets: D1, D2, ..., Dk
→ train with learning algorithms ML1, ML2, ... (one per subset)
→ generate K models: C1, C2, ..., Ck
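A minimal sketch of this training phase, assuming scikit-learn's DecisionTreeClassifier as a stand-in base learner (the experiments use C4.5) and numpy for partitioning; the function and parameter names are illustrative, not the authors' code:

```python
# Partition a large dataset into K disjoint subsets and train one base model
# per subset. DecisionTreeClassifier stands in for C4.5 here (assumption).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_ensemble(X, y, K=8):
    """Return a list of K classifiers, one trained on each data partition."""
    models = []
    for X_part, y_part in zip(np.array_split(X, K), np.array_split(y, K)):
        clf = DecisionTreeClassifier()
        clf.fit(X_part, y_part)
        models.append(clf)
    return models
```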

Page 7:

Testing

D: test set
→ sent to the k models C1, C2, ..., Ck
→ compute k predictions P1, P2, ..., Pk
→ combine into one prediction P

Page 8:

Cost-sensitive Decision Making

- Assume that b(y', y) records the benefit received by predicting an example of class y to be an instance of class y'.
- The expected benefit received for predicting an example x to be an instance of class y' (regardless of its true label) is
  E[b(y', x)] = Σ_y p(y|x) · b(y', y)
- The optimal decision-making policy chooses the label that maximizes the expected benefit, i.e.,
  y*(x) = argmax_{y'} E[b(y', x)]
- When b(y, y) = 1 and b(y', y) = 0 for y' ≠ y, this is a traditional accuracy-based problem.
- Total benefits: the sum of benefits received over all examples in the test set.
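A minimal sketch of this decision rule, assuming a benefit matrix stored as benefit[predicted][true] and per-class probabilities p(y|x) supplied as a dict; the numbers in the usage example are made up for illustration:

```python
# Choose the label whose expected benefit E[b(y', x)] = sum_y p(y|x) * b(y', y)
# is largest. The nested-dict data layout is an assumption, not the paper's code.
def optimal_label(probs, benefit):
    """probs: {label: p(label|x)};  benefit: {predicted: {true: benefit}}."""
    def expected_benefit(pred):
        return sum(probs[y] * benefit[pred][y] for y in probs)
    return max(benefit, key=expected_benefit)

# Hypothetical fraud example: $250 transaction, $90 cost to challenge.
probs = {"fraud": 0.5, "nonfraud": 0.5}
benefit = {
    "fraud":    {"fraud": 250 - 90, "nonfraud": -90},  # challenge the transaction
    "nonfraud": {"fraud": 0,        "nonfraud": 0},    # let it pass
}
print(optimal_label(probs, benefit))  # "fraud": 0.5*160 + 0.5*(-90) = 35 > 0
```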

Page 9:

Charity Donation Example

- It costs $0.68 to send a solicitation.
- Assume that Y(x) is the best estimate of the donation amount of individual x.
- Cost-sensitive decision making will solicit an individual x if and only if
  p(donate|x) · Y(x) > $0.68
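As a concrete, made-up illustration: if the models estimate p(donate|x) = 0.4 and Y(x) = $2.00, the expected return is 0.4 × $2.00 = $0.80 > $0.68, so x is solicited; with Y(x) = $1.50 the expected return is only $0.60 and no letter is sent.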

Page 10:

Credit Card Fraud Detection Example

- It costs $90 to challenge a potential fraud.
- Assume that y(x) is the transaction amount.
- The cost-sensitive decision-making policy will predict a transaction x to be fraudulent if and only if
  p(fraud|x) · y(x) > $90

Page 11:

Adult Dataset

- Downloaded from the UCI repository.
- Associate a benefit factor of 2 with positives and a benefit factor of 1 with negatives.
- The decision to predict positive is
  2 · p(positive|x) > 1 · p(negative|x)

Page 12:

Calculating probabilities

- For decision trees: if n is the number of examples in a node and k is the number of examples with class label y, then the probability is p(y|x) = k / n. More sophisticated methods: smoothing, early stopping, and early stopping plus smoothing (see the sketch after this list).
- For rules, the probability is calculated in the same way as for decision trees.
- For naive Bayes: if s(y, x) is the score for class label y, then p(y|x) = s(y, x) / Σ_{y'} s(y', x). A more sophisticated method: binning.
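A minimal sketch of leaf-probability estimation with smoothing; the slide only names "smoothing", so the Laplace-style correction below is an assumed common choice, not necessarily the authors' exact formula:

```python
# Estimate p(y|x) at a decision-tree leaf from leaf counts.
def leaf_probability(k, n, num_classes=2, smoothed=True):
    """k: examples of class y at the leaf; n: total examples at the leaf."""
    if smoothed:
        # Laplace-style smoothing pulls small leaves toward the uniform estimate.
        return (k + 1) / (n + num_classes)
    return k / n if n else 1.0 / num_classes

print(leaf_probability(3, 4))                  # 0.667 instead of the raw 0.75
print(leaf_probability(3, 4, smoothed=False))  # 0.75
```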

Page 13:

Page 14:

Combining Technique: Averaging

- Each model Ci computes an expected benefit Ei[b(y', x)] for example x over every class label y'.
- Combine the individual expected benefits by averaging:
  E[b(y', x)] = (1/k) Σ_i Ei[b(y', x)]
- We choose the label with the highest combined expected benefit (see the sketch below).
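A minimal sketch of the averaging combiner, assuming each model exposes its class probabilities as a dict and reusing the benefit-matrix layout from the earlier sketch; the names are illustrative, not the authors' code:

```python
# Average each label's expected benefit across the k models, then pick the
# label with the largest average.
def combined_label(prob_list, benefit):
    """prob_list: one {label: p(label|x)} dict per model; benefit[pred][true]."""
    def avg_expected_benefit(pred):
        per_model = [sum(p[y] * benefit[pred][y] for y in p) for p in prob_list]
        return sum(per_model) / len(per_model)
    return max(benefit, key=avg_expected_benefit)
```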

Page 15:

Why is accuracy higher?

(Figure annotations:)
1. Decision threshold line.
2. Examples on the left are more profitable than those on the right.
3. "Evening effect": a bias towards the big fish.

Page 16:

Experiments

- Decision tree learner: C4.5 version 8
- Datasets: Donation, Credit Card, Adult

Page 17:

Accuracy comparison

Page 18:

Accuracy comparison

Page 19:

Accuracy comparison

Page 20:

Detailed Spread

Page 21:

Credit Card Fraud Dataset

Page 22:

Adult Dataset

Page 23:

Why is accuracy higher?

Page 24:

Pruning

D: large dataset
→ partition into K subsets: D1, D2, ..., Dk
→ train with learning algorithms ML1, ML2, ...
→ generate K models: C1, C2, ..., Ck
→ pruning
→ keep only k of the K models

Page 25:

Techniques

- Always use a greedy procedure to choose the next classifier (see the sketch after this list).
- Criteria:
  - Directly use accuracy or total benefits of the combined ensemble: choose the classifier that improves it most.
  - Most diversified.
  - Most accurate.
  - Combinations of the above.
- Result: directly using accuracy (or total benefits) works best.
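A minimal sketch of the greedy selection loop, assuming a caller-supplied score(subset) function that evaluates accuracy or total benefits of a candidate ensemble on a hold-out set; nothing here is taken from the authors' implementation:

```python
# Greedily grow the kept set: at each step add the classifier that most
# improves the ensemble's score (accuracy or total benefits).
def greedy_prune(models, score, k):
    chosen, remaining = [], list(models)
    while remaining and len(chosen) < k:
        best = max(remaining, key=lambda m: score(chosen + [m]))
        chosen.append(best)
        remaining.remove(best)
    return chosen
```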

Page 26:

Pruning Results

Page 27:

Dynamic Scheduling

- For a fixed number of classifiers, do we need every classifier to predict on every example? Not necessarily.

- Some examples are easier to predict than others; easier examples do not require as many classifiers as more difficult ones.

- Technique (see the sketch below):
  - Order the classifiers into a pipeline according to their accuracy; the most accurate classifier is always called first.
  - Each prediction comes with a confidence describing how likely the current prediction is to be the same as the prediction made by the full, fixed set of classifiers.
  - If the confidence is too low, more classifiers are employed.
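A minimal sketch of the confidence-driven pipeline, assuming an expected_benefits(model, x) helper and a simple margin-based confidence; the paper's actual confidence estimate is statistical, so this stopping rule is only an assumed stand-in:

```python
# Consult classifiers in decreasing order of accuracy; stop as soon as the
# running decision looks unlikely to change if the remaining models were used.
def dynamic_predict(pipeline, x, expected_benefits, threshold=0.9):
    totals = {}  # running sum of expected benefits per label
    used = 0
    for model in pipeline:
        used += 1
        for label, eb in expected_benefits(model, x).items():
            totals[label] = totals.get(label, 0.0) + eb
        ranked = sorted(totals.values(), reverse=True)
        # crude confidence: per-model margin between the top two labels
        margin = (ranked[0] - ranked[1]) / used if len(ranked) > 1 else float("inf")
        if margin >= threshold:
            break  # confident enough; skip the remaining classifiers
    return max(totals, key=totals.get), used
```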

Page 28:

Dynamic Scheduling

D (test examples) → pipeline of classifiers C1, C2, C3, ...
Each stage outputs (pred, conf) for every example:
- examples predicted confidently after C1 are finished, labeled using (C1) only
- the rest are passed to C2 and, if confident, labeled using (C1, C2)
- the remainder continue to C3 and are labeled using (C1, C2, C3), and so on

Page 29:

Dynamic Scheduling Result