
Page 1: BigML Winter 2017 Release

Introducing Boosted Trees

BigML Winter 2017 Release

Page 2: BigML Winter 2017 Release

Winter 2017 Release

Speaker: POUL PETERSEN (CIO)

Moderator: ATAKAN CETINSOY (VP Predictive Applications)

Questions: enter questions into the chat box – we’ll answer some via chat, others at the end of the session

Resources: https://bigml.com/releases

Contact: [email protected]

Twitter: @bigmlcom

Page 3: BigML Winter 2017 Release


Boosted Trees

• Ensemble method

• Supervised Learning

• Classification & Regression

• Now "BigML Easy"!

[Screenshots: the NUMBER OF MODELS slider set to 1, 10, and 20]

Page 4: BigML Winter 2017 Release


Questions…

• Novice: Wait, what is Supervised Learning again?

• Intermediate: When are ensemble methods useful and how are Boosted Trees different from the ensemble methods already available in BigML?

• Power User: How does Boosting work and what do all these knobs do?

Page 5: BigML Winter 2017 Release


Supervised Learning

ADDRESS            | BEDS | BATHS | SQFT | LOT SIZE | YEAR BUILT | LATITUDE   | LONGITUDE   | LAST SALE PRICE
1522 NW Jonquil    | 4    | 3     | 2424 | 5227     | 1991       | 44.594828  | -123.269328 | 360000
7360 NW Valley Vw  | 3    | 2     | 1785 | 25700    | 1979       | 44.643876  | -123.238189 | 307500
4748 NW Veronica   | 5    | 3.5   | 4135 | 6098     | 2004       | 44.5929659 | -123.306916 | 600000
411 NW 16th        | 3    |       | 2825 | 4792     | 1938       | 44.570883  | -123.272113 | 435350
2975 NW Stewart    | 3    | 1.5   | 1327 | 9148     | 1973       | 44.597319  | -123.25452  | 245000
4113 NW Peppertree | 4    | 2     | 1876 | 7841     | 1977       | 44.598141  | -123.29689  | 366000
2450 NW 13th       | 3    | 2     | 1502 | 9148     | 1964       | 44.593322  | -123.267435 | 265000
3545 NE Canterbury | 3    | 1.5   | 1172 | 10019    | 1972       | 44.603042  | -123.236332 | 245000
1245 NW Kainui     | 4    | 2     | 1864 | 49658    | 1974       | 44.651826  | -123.250615 | 341500

Rows are the instances, columns are the features, and the LAST SALE PRICE column is the label.

Supervised learning: there is a column of data, the label, that we want to predict.
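To make that concrete, here is a minimal sketch in plain Python (the field names are illustrative, taken from the table above) of separating the label column from the feature columns:

```python
# Each instance is a row of features plus the label we want to predict.
rows = [
    {"beds": 4, "baths": 3, "sqft": 2424, "year_built": 1991, "last_sale_price": 360000},
    {"beds": 3, "baths": 2, "sqft": 1785, "year_built": 1979, "last_sale_price": 307500},
]
LABEL = "last_sale_price"

# X holds the features; y holds the known labels the model learns to predict.
X = [{k: v for k, v in row.items() if k != LABEL} for row in rows]
y = [row[LABEL] for row in rows]
```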

Page 6: BigML Winter 2017 Release


Classification & Regression

Classification:

animal   | state  | … | proximity | action
tiger    | hungry | … | close     | run
elephant | happy  | … | far       | take picture
…        | …      | … | …         | …

Regression:

animal | state  | … | proximity | min_kmh
tiger  | hungry | … | close     | 70
hippo  | angry  | … | far       | 10
…      | …      | … | …         | …

In both cases the last column is the label: a category for classification, a number for regression.

Page 7: BigML Winter 2017 Release


Evaluation

[Diagram: the housing table from the previous slide split in two. The first five instances (1522 NW Jonquil through 2975 NW Stewart) form the training set and are fed to the MODEL; the remaining four instances (4113 NW Peppertree through 1245 NW Kainui) are held out, and their known LAST SALE PRICE labels are compared with the model's predictions in the EVALUATION.]
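A minimal sketch of that split in plain Python (the 80/20 fraction and the shuffle are assumptions for illustration, not BigML's fixed defaults):

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle the instances, then hold out a fraction for evaluation."""
    rows = rows[:]                       # don't mutate the caller's list
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]        # (training set, holdout set)

# train -> build the MODEL; test -> EVALUATION against the known labels
```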

Page 8: BigML Winter 2017 Release


Demo #1

Page 9: BigML Winter 2017 Release


Why Ensembles?

DATASET → MODEL → PREDICTION

Problems:

• A model can "overfit" - especially if there are outliers. Consider: mansions.
• A model might "underfit" - especially if there is not enough training data to discover a single good representation. Consider: no mid-size homes.
• The data may be too "complicated" for the model to represent at all. Consider: trying to fit a line to exponential pricing.
• Models might be "unlucky" - algorithms typically rely on stochastic processes. This is more subtle…

Ensemble intuition:

• Build many models, each with only some of the data, and the outliers will be averaged out.
• Build many models instead of relying on one guess.
• Build many models, allowing each to focus on part of the data, thus increasing the complexity of the solution.
• Build many models to avoid unlucky initial conditions.

Page 10: BigML Winter 2017 Release


Decision Forest

DATASET → SAMPLE 1…4 → MODEL 1…4 → PREDICTION 1…4 → VOTE → PREDICTION

aka "Bagging"

Page 11: BigML Winter 2017 Release


Demo #2

Page 12: BigML Winter 2017 Release


Random Decision Forest

The same pipeline as bagging, except that each tree also restricts itself to a random subset of the candidate features:

DATASET → SAMPLE 1…4 → MODEL 1…4 (random candidate features at each split) → PREDICTION 1…4 → VOTE → PREDICTION
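The only new ingredient is the feature-selection step; a sketch of it (the square-root subset size is a common convention, assumed here rather than taken from the slides):

```python
import math
import random

def candidate_features(all_features, rng=random):
    """Random Decision Forest: each split considers a random feature subset."""
    k = max(1, int(math.sqrt(len(all_features))))
    return rng.sample(all_features, k)
```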

Page 13: BigML Winter 2017 Release


Demo #3

Page 14: BigML Winter 2017 Release


Boosting

Question:
• Instead of sampling tricks… can we incrementally improve a single model?

DATASET → MODEL → BATCH PREDICTION

Intuition:
• The training data has the correct answer, so we know the errors exactly. Can we model the errors?

DATASET (label) → DATASET (label, prediction) → DATASET (label, prediction, error)
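A minimal sketch of that idea, assuming a generic `fit(rows, targets)` helper (hypothetical): train a model, measure its errors on the training data, then train a second model on the errors themselves.

```python
def boost_once(rows, labels, fit):
    model_1 = fit(rows, labels)
    predictions = [model_1.predict(x) for x in rows]
    # The training data has the correct answers, so each error is known
    # exactly (here: error = prediction - label, as on the next slide).
    errors = [p - y for p, y in zip(predictions, labels)]
    # ...and since the errors are just another column, we can model them too.
    model_2 = fit(rows, errors)
    return model_1, model_2
```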

Page 15: BigML Winter 2017 Release


Boosting

Iteration 1 - MODEL 1 predicts the sale price (ERROR = predicted - actual):

ADDRESS           | BEDS | BATHS | SQFT | LOT SIZE | YEAR BUILT | LATITUDE   | LONGITUDE   | LAST SALE PRICE | PREDICTED SALE PRICE | ERROR
1522 NW Jonquil   | 4    | 3     | 2424 | 5227     | 1991       | 44.594828  | -123.269328 | 360000          | 360750               | 750
7360 NW Valley Vw | 3    | 2     | 1785 | 25700    | 1979       | 44.643876  | -123.238189 | 307500          | 306875               | -625
4748 NW Veronica  | 5    | 3.5   | 4135 | 6098     | 2004       | 44.5929659 | -123.306916 | 600000          | 587500               | -12500
411 NW 16th       | 3    |       | 2825 | 4792     | 1938       | 44.570883  | -123.272113 | 435350          | 435350               | 0

Iteration 2 - the ERROR column becomes the label, and MODEL 2 learns to predict it:

ADDRESS           | BEDS | BATHS | SQFT | LOT SIZE | YEAR BUILT | LATITUDE   | LONGITUDE   | ERROR  | PREDICTED ERROR
1522 NW Jonquil   | 4    | 3     | 2424 | 5227     | 1991       | 44.594828  | -123.269328 | 750    | 750
7360 NW Valley Vw | 3    | 2     | 1785 | 25700    | 1979       | 44.643876  | -123.238189 | -625   | -625
4748 NW Veronica  | 5    | 3.5   | 4135 | 6098     | 2004       | 44.5929659 | -123.306916 | -12500 | -12393.83333
411 NW 16th       | 3    |       | 2825 | 4792     | 1938       | 44.570883  | -123.272113 | 0      | 6879.67857

Why stop at one iteration?

• "Hey Model 1, what do you predict is the sale price of this home?"

• "Hey Model 2, how much error do you predict Model 1 just made?"

Page 16: BigML Winter 2017 Release


Boosting

Iteration 1: DATASET → MODEL 1 → PREDICTION 1
Iteration 2: DATASET 2 (the errors so far) → MODEL 2 → PREDICTION 2
Iteration 3: DATASET 3 → MODEL 3 → PREDICTION 3
Iteration 4: DATASET 4 → MODEL 4 → PREDICTION 4
etc…

Final PREDICTION = SUM of the per-iteration predictions.
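A minimal sketch of the whole pipeline, again with a generic `fit` helper (hypothetical). Here each iteration fits the residuals (label minus the running sum), which is the slides' signed error with the sign flipped, and it makes the final combine a plain SUM:

```python
def fit_boosted(rows, labels, fit, n_iterations=4):
    models = []
    residuals = list(labels)          # iteration 1 fits the labels themselves
    for _ in range(n_iterations):
        model = fit(rows, residuals)
        models.append(model)
        # Each new dataset holds whatever the sum so far still gets wrong.
        residuals = [r - model.predict(x) for r, x in zip(residuals, rows)]
    return models

def predict_boosted(models, x):
    return sum(m.predict(x) for m in models)   # the SUM in the diagram
```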


Page 18: BigML Winter 2017 Release


Demo #4

Page 19: BigML Winter 2017 Release


Boosting Parameters

• Number of iterations - similar to number of models for DF/RDF
• Iterations can be limited with Early Stopping:
  • Early out of bag: tests with the out-of-bag samples

Page 20: BigML Winter 2017 Release


Boosting

The same boosting pipeline, with one addition: at each iteration the “OUT OF BAG” SAMPLES (rows not drawn into that iteration's training sample) are used to test whether boosting is still improving:

Iteration 1: DATASET → MODEL 1 → PREDICTION 1
Iteration 2: DATASET 2 → MODEL 2 → PREDICTION 2
Iteration 3: DATASET 3 → MODEL 3 → PREDICTION 3
Iteration 4: DATASET 4 → MODEL 4 → PREDICTION 4
etc…

Final PREDICTION = SUM.

Page 21: BigML Winter 2017 Release


Boosting Parameters

• Number of iterations - similar to number of models for DF/RDF
• Iterations can be limited with Early Stopping:
  • Early out of bag: tests with the out-of-bag samples
  • Early holdout: tests with a portion of the dataset
  • None: performs all iterations
• Note: in general, it is better to use a high number of iterations and let the early stopping work.

Page 22: BigML Winter 2017 Release


Iterations

Boosted Ensemble #1:
1 2 3 4 5 6 7 8 9 10 ….. 41 42 43 44 45 46 47 48 49 50
(the Early Stop fires before the # iterations limit)

This is OK: the early stop means the iterative improvement has become small, so the ensemble has "converged" before being forcibly stopped by the iteration limit.

Boosted Ensemble #2:
1 2 3 4 5 6 7 8 9 10 ….. 41 42 43 44 45 46 47 48 49 50
(the # iterations limit is hit before the Early Stop fires)

This is NOT OK: the hard limit on iterations cut boosting off while it was still improving, long before enough iterations had run to reach the best quality.
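A sketch of the early-stopping loop those two pictures describe (the tolerance and the exact stopping test are assumptions; BigML's internal criterion may differ):

```python
def boost_with_early_stop(fit_iteration, holdout_error, max_iterations=50,
                          tolerance=1e-4):
    """Run boosting iterations until held-out error stops improving."""
    best = float("inf")
    for i in range(max_iterations):
        fit_iteration(i)            # add one more model to the ensemble
        err = holdout_error()       # error on out-of-bag / holdout samples
        if best - err < tolerance:
            return i + 1            # converged: the early stop fired (OK)
        best = err
    return max_iterations           # hit the hard limit first (maybe NOT OK)
```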

Page 23: BigML Winter 2017 Release


Boosting Parameters

• Number of iterations - similar to number of models for DF/RDF
• Iterations can be limited with Early Stopping:
  • Early out of bag: tests with the out-of-bag samples
  • Early holdout: tests with a portion of the dataset
  • None: performs all iterations
• Note: in general, it is better to use a high number of iterations and let the early stopping work.
• Learning Rate: controls how aggressively boosting will fit the data (see the sketch after this list):
  • Larger values ~ maybe a quicker fit, but a risk of overfitting
• You can combine sampling with Boosting!
  • Samples with Replacement
  • Add Randomize
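The learning rate simply scales each iteration's contribution before it enters the running sum; a sketch of the residual update (same residual convention as the earlier sketch):

```python
def boosting_update(residuals, model, rows, learning_rate=0.1):
    """Shrink each model's contribution by the learning rate.

    Smaller rates take smaller, safer steps (more iterations needed);
    larger rates may fit quicker but risk overfitting. The final
    prediction must use the same scaling: sum(learning_rate * m.predict(x)).
    """
    return [r - learning_rate * model.predict(x)
            for r, x in zip(residuals, rows)]
```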


Page 26: BigML Winter 2017 Release


Boosting Parameters

• Number of iterations - similar to number of models for DF/RDF
• Iterations can be limited with Early Stopping:
  • Early out of bag: tests with the out-of-bag samples
  • Early holdout: tests with a portion of the dataset
  • None: performs all iterations
• Note: in general, it is better to use a high number of iterations and let the early stopping work.
• Learning Rate: controls how aggressively boosting will fit the data:
  • Larger values ~ maybe a quicker fit, but a risk of overfitting
• You can combine sampling with Boosting!
  • Samples with Replacement
  • Add Randomize
• All other tree-based options are the same:
  • weights: for unbalanced classes
  • missing splits: to learn from missing data
  • node threshold: increase to allow "deeper" trees
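All of these knobs are scriptable. A sketch using the BigML Python bindings; `create_ensemble` is the bindings' real entry point, but treat the exact boosting option names here as assumptions to verify against the release documentation:

```python
from bigml.api import BigML

api = BigML()  # reads BIGML_USERNAME / BIGML_API_KEY from the environment

ensemble = api.create_ensemble("dataset/<your-dataset-id>", {
    "boosting": {
        "iterations": 100,         # generous limit; let early stopping work
        "early_out_of_bag": True,  # test each iteration on out-of-bag rows
        "learning_rate": 0.1,
    },
    "sample_rate": 0.8,            # combine sampling with boosting
    "randomize": True,             # random candidate features per split
})
```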

Page 27: BigML Winter 2017 Release


Wait a second…. what about classification?

pregnancies | plasma glucose | blood pressure | triceps skin thickness | insulin | bmi  | diabetes pedigree | age | diabetes | MODEL 1: predicted diabetes | ERROR
6           | 148            | 72             | 35                     | 0       | 33.6 | 0.627             | 50  | TRUE     | TRUE                        | ?
1           | 85             | 66             | 29                     | 0       | 26.6 | 0.351             | 31  | FALSE    | TRUE                        | ?
8           | 183            | 64             | 0                      | 0       | 23.3 | 0.672             | 32  | TRUE     | FALSE                       | ?
1           | 89             | 66             | 23                     | 94      | 28.1 | 0.167             | 21  | FALSE    | FALSE                       | ?

… we could try encoding the classes as numbers (TRUE → 1, FALSE → 0), so ERROR = label - prediction:

pregnancies | plasma glucose | blood pressure | triceps skin thickness | insulin | bmi  | diabetes pedigree | age | diabetes | MODEL 1: predicted diabetes | ERROR
6           | 148            | 72             | 35                     | 0       | 33.6 | 0.627             | 50  | 1        | 1                           | 0
1           | 85             | 66             | 29                     | 0       | 26.6 | 0.351             | 31  | 0        | 1                           | -1
8           | 183            | 64             | 0                      | 0       | 23.3 | 0.672             | 32  | 1        | 0                           | 1
1           | 89             | 66             | 23                     | 94      | 28.1 | 0.167             | 21  | 0        | 0                           | 0
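That encoding makes the regression trick from the previous slides work for two classes; a quick check of the ERROR column:

```python
labels      = [1, 0, 1, 0]   # diabetes: TRUE -> 1, FALSE -> 0
predictions = [1, 1, 0, 0]   # MODEL 1's predicted classes

errors = [y - p for y, p in zip(labels, predictions)]
print(errors)  # [0, -1, 1, 0] -- matches the ERROR column above
```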

Page 28: BigML Winter 2017 Release


Wait a second…. but then what about multiple classes?

pregnancies | plasma glucose | blood pressure | triceps skin thickness | insulin | bmi  | diabetes pedigree | age | favorite color | MODEL 1: predicted favorite color | ERROR
6           | 148            | 72             | 35                     | 0       | 33.6 | 0.627             | 50  | RED            | BLUE                              | ?
1           | 85             | 66             | 29                     | 0       | 26.6 | 0.351             | 31  | GREEN          | GREEN                             | ?
8           | 183            | 64             | 0                      | 0       | 23.3 | 0.672             | 32  | BLUE           | RED                               | ?
1           | 89             | 66             | 23                     | 94      | 28.1 | 0.167             | 21  | RED            | GREEN                             | ?

Page 29: BigML Winter 2017 Release


Boosting Classification

Iteration 1 - train one "class vs not-class" model per class and compute each class's probability error:

MODEL 1 RED/NOT RED:

pregnancies | plasma glucose | blood pressure | triceps skin thickness | insulin | bmi  | diabetes pedigree | age | favorite color | Class RED Probability | Class RED ERROR
6           | 148            | 72             | 35                     | 0       | 33.6 | 0.627             | 50  | RED            | 0.9                   | 0.1
1           | 85             | 66             | 29                     | 0       | 26.6 | 0.351             | 31  | GREEN          | 0.7                   | -0.7
8           | 183            | 64             | 0                      | 0       | 23.3 | 0.672             | 32  | BLUE           | 0.46                  | 0.54
1           | 89             | 66             | 23                     | 94      | 28.1 | 0.167             | 21  | RED            | 0.12                  | -0.12

MODEL 1 BLUE/NOT BLUE:

Class BLUE Probability: 0.1, 0.3, 0.54, 0.88 → Class BLUE ERROR: -0.1, 0.7, -0.54, 0.12

…and repeat for each class at each iteration.

Iteration 2 - each class's ERROR column becomes the label for that class's next model:

MODEL 2 RED/NOT RED ERROR:

pregnancies | plasma glucose | blood pressure | triceps skin thickness | insulin | bmi  | diabetes pedigree | age | ERROR | PREDICTED ERROR
6           | 148            | 72             | 35                     | 0       | 33.6 | 0.627             | 50  | 0.1   | 0.05
1           | 85             | 66             | 29                     | 0       | 26.6 | 0.351             | 31  | -0.7  | -0.54
8           | 183            | 64             | 0                      | 0       | 23.3 | 0.672             | 32  | 0.54  | 0.32
1           | 89             | 66             | 23                     | 94      | 28.1 | 0.167             | 21  | -0.12 | -0.22

…and repeat for each class at each iteration.
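A sketch of one such iteration, assuming generic `predict_proba` and `fit` helpers (hypothetical): each class gets its own membership target, its own error dataset, and its own model.

```python
def per_class_boosting_iteration(rows, labels, classes, predict_proba, fit):
    """One boosting iteration: one error dataset and one model per class."""
    models = {}
    for cls in classes:
        # Membership target: 1 if the row belongs to this class, else 0.
        targets = [1.0 if y == cls else 0.0 for y in labels]
        # Probability error for this class: target minus predicted probability.
        errors = [t - predict_proba(x, cls) for t, x in zip(targets, rows)]
        models[cls] = fit(rows, errors)
    return models
```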

Page 30: BigML Winter 2017 Release


Boosting Classification

Iteration 1: DATASET → MODELS 1 (one per class) → PREDICTIONS 1 (one per class)
Iteration 2: DATASETS 2 (one per class) → MODELS 2 (one per class) → PREDICTIONS 2 (one per class)
Iteration 3: DATASETS 3 (one per class) → MODELS 3 (one per class) → PREDICTIONS 3 (one per class)
Iteration 4: DATASETS 4 (one per class) → MODELS 4 (one per class) → PREDICTIONS 4 (one per class)
etc…

The per-iteration predictions are combined into a PROBABILITY per class.

Page 31: BigML Winter 2017 Release


Demo #5

Page 32: BigML Winter 2017 Release


Programmable

• API First!
• Bindings!
• WhizzML!

Page 33: BigML Winter 2017 Release


What to use?

• The one that works best! Ok, but seriously: did you evaluate?
• For "large" / "complex" datasets:
  • Use DF/RDF with a deeper node threshold
  • Even better, use Boosting with more iterations
• For "noisy" data:
  • Boosting may overfit
  • RDF preferred
• For "wide" data:
  • Randomize features (RDF) will be quicker
• For "easy" data:
  • A single model may be fine
  • Bonus: it also has the best interpretability!
• For classification with a "large" number of classes:
  • Boosting will be slower
• For "general" data:
  • DF/RDF likely better than a single model or Boosting
  • Boosting will be slower since the models are processed serially

Page 34: BigML Winter 2017 Release

Questions?

Twitter: @bigmlcom
Mail: [email protected]
Resources: https://bigml.com/releases