Improving Model Predictions via Stacking and Hyper-parameters Tuning

Improv ing Model Pred ictions v ia S tack ing and Hyper-Parameters Tuning

Jo-fai (Joe) ChowData Scientistjoe@h2o.ai

@matlabulus

About Me

• 2005 - 2015

• Water Engineero Consultant for Utilitieso EngD Research

• 2015 - Present

• Data Scientisto Virgin Mediao Domino Data Labo H2O.ai

Mango Data Science Radar

About This Talk

• Predictive modellingo Kaggle as an example

• Improve predictions with simple tricks

• Use data science for social good 👍

About Kaggle

• World’s biggest predictive modelling competition platform

• 560k members• Competition types:

o Featured (prize)o Recruitmento Playground o 101

Predicting Shelter Animal Outcomes

• X: Predictorso Nameo Gendero Type (🐱 or 🐶)o Date & Timeo Ageo Breedo Colour

• Y: Outcomes (5 types)o Adoptiono Diedo Euthanasiao Return to Ownero Transfer

• Datao Training (27k samples)o Test (11k)

Basic Feature Engineering

X Raw (Before) Reformatted (After)

Name Elsa, Steve, Lassie [name_len]: 4, 5, 6

Date & Time 2014-02-12 18:22:00 [year]: 2014[month]: 2[weekday]: 4[hour]: 18

Age 1 year, 3 weeks, 2 days [age_day]: 365, 21, 2

Breed German Shepherd, Pit Bull Mix [is_mix]: 0, 1

Colour Brown Brindle/White [simple_colour]: brown

Common Machine Learning Techniques

• Ensembleso Bagging/boosting of

decision treeso Reduces variance and

increase accuracyo Popular R Packages (used

in next example)• “randomForest”• “xgboost”

• There are a lot more machine learning packages in R:o “caret”, “caretEnsemble”o “h2o”, “h2oEnsemble”o “mlr”

Simple Trick – Model Averaging

• Stratified samplingo 80% for trainingo 20% for validation

• Evaluation metrico Multi-class Log Losso Lower the bettero 0 = Perfect

• 50 runso different random seed

More Advanced Methods

• Model Stackingo Uses a second-level metalearner

to learn the optimal combination of base learners

o R Packages:• “SuperLearner”• “subsemble”• “h2oEnsemble”• “caretEnsemble”

• Hyper-parameters Tuningo Improves the performance of

individual machine learning algorithms

o Grid search• Full / Random

o R Packages:• “caret”• “h2o”

For more info, seehttps://github.com/h2oai/h2o-meetups/tree/master/2016_05_20_MLconf_Seattle_Scalable_Ensembleshttps://github.com/h2oai/h2o-3/blob/master/h2o-docs/src/product/tutorials/gbm/gbmTuning.Rmd

Trade-Off of Advanced Methods

• Strengtho Model tuning + stacking

won nearly all Kaggle competitions.

o Multi-algorithm ensemble may better approximate the true predictive function than any single algorithm.

• Weaknesso Increased training and

prediction times.o Increased model

complexity.o Requires large machines

or clusters for big data.

R + H2O = Scalable Machine Learning

• H2O is an open-source, distributed machine learning library written in Java with APIs in R, Python and more.

• ”h2oEnsemble” is the scalable implementation of the Super Learner algorithm for H2O.

H2O Random Grid Search ExampleDefine search range and criteria

Best models

H2O Model Stacking Example

17 out of 717 teams (≈ top 2%)

Getting reasonable results Using h2o.stack(…) to combine multiple models

Conclusions

• Many R packages for predictive modelling.

• Use hyper-parameters tuning to improve individual models.

• Use model averaging / stacking to improve predictions.

• Trade-off between model performance and computational costs.

• Use R + H2O for scalable machine learning.

• H2O random grid search and stacking.

• Use data science for social good 👍

Big Thank You!

• Mango Solutions• RStudio• Domino Data Lab• H2O

o Erin LeDello Raymond Pecko Arno Candel

1st LondonR TalkCrime Map Shiny Appbit.ly/londonr_crimemap

2nd LondonR TalkDomino API Endpointbit.ly/1cYbZbF

Any Questions?

• Contacto joe@h2o.aio @matlabulouso github.com/woobe

• Slides & Codeo github.com/h2oai/h2o-

meetups

• H2O in Londono Meetups / Office (soon)o www.h2o.ai/careers

• More H2O at Strata tomorrowo Innards of H2O (11:15)o Intro to Generalised Low-Rank

Models (14:05)

Extra Sl ide (Stratified Sampling)

Improving Model Predictions via Stacking and Hyper-parameters Tuning

Data & Analytics

Loading, Unloading, Stacking

Verb Specification Guide - Steelcase - Office Furniture ... · PDF fileChevron Tables 8 Team Tables ... Verb Specification Guide ... Stacking Stacking Stacking L Stacking T Stacking

Core Stacking

TM STACK-RACK Portable Stacking Racks - ESCPSTACK-RACK Portable Stacking Racks ESCP Portable Stacking racks are bene˜cial in cost, time, and space saving versatility. Stacking racks

Stacking Booklet

Molded Plastic Stacking Chairs - Furniture Concepts · 2019-05-03 · Molded plastic stacking chairs, stacking chairs, stacking chairs for high abuse environments, durable stacking

Stacking Process

Cable Stacking

Barrier RF Stacking

Stacking the Decks

Bracketing - Stacking

Stacking cups

Habit stacking

TM STACK-RACK Portable Stacking Racks - escp.net · STACK-RACK Portable Stacking Racks ESCP Portable Stacking racks are bene˜cial in cost, time, and space saving versatility. Stacking

STACKING & BUNDLING

Stacking GDD

Stacking Cups!

FLEXIBLE STACKING

Stacking Velocity

Stacking stones