Better than Deep Learning: Gradient Boosting Machines (GBM) Szilárd Pafka, PhD Chief Scientist, Epoch (USA) ½ Day Workshop, Budapest Data Forum Conference June 2018

Better than Deep Learning: Gradient Boosting Machines (GBM)biconsulting.hu/letoltes/2018budapestdata/pafka... · structured/tabular data: GBM (or RF) very small data: LR very large

Better than Deep Learning:
Gradient Boosting Machines (GBM)

Szilárd Pafka, PhD
Chief Scientist, Epoch (USA)

½ Day Workshop, Budapest Data Forum Conference
June 2018

At a Glance...

ML: supervised learning: y = f(x), “learn” f from data (y, X)
training, testing/prediction, algos (LR, DT, NN…), optimization, overfitting, regularization...

GBM: ensemble of decision trees

GBM libs: R/Python
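
To make "ensemble of decision trees" concrete, here is a minimal sketch of gradient boosting for squared-error regression. It is not the code of any particular GBM library. The function names `fit_gbm` and `predict_gbm`, the synthetic data, and all parameter values are assumptions made for illustration, and the trees come from scikit-learn's `DecisionTreeRegressor`.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_gbm(X, y, n_trees=100, learning_rate=0.1, max_depth=3):
    """Fit a toy GBM for squared-error regression (illustrative only)."""
    init = float(np.mean(y))              # constant starting prediction
    pred = np.full(len(y), init)
    trees = []
    for _ in range(n_trees):
        residual = y - pred               # negative gradient of 0.5 * (y - pred)**2
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residual)             # each tree fits what is still unexplained
        pred += learning_rate * tree.predict(X)
        trees.append(tree)
    return init, learning_rate, trees


def predict_gbm(model, X):
    """Sum the shrunken tree predictions on top of the initial constant."""
    init, learning_rate, trees = model
    pred = np.full(X.shape[0], init)
    for tree in trees:
        pred += learning_rate * tree.predict(X)
    return pred


# Synthetic data (an assumption for illustration): y = f(x) + noise
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(0, 0.1, size=500)

X_train, y_train = X[:400], y[:400]       # training
X_test, y_test = X[400:], y[400:]         # testing/prediction

model = fit_gbm(X_train, y_train)
mse_gbm = float(np.mean((y_test - predict_gbm(model, X_test)) ** 2))
mse_baseline = float(np.mean((y_test - y_train.mean()) ** 2))
print(f"test MSE: GBM={mse_gbm:.3f}, predict-the-mean baseline={mse_baseline:.3f}")
```

The number of trees, the learning rate (shrinkage) and the tree depth are the knobs that trade off fit against overfitting in this sketch.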

other than GBMs

Disclaimer:

✔ I understand this is an intermediate/advanced workshop

Prerequisites:

basic ML concepts
R/Python experience

Schedule:

1. Intro talk (slides)

2. Demo main features (me running code)

3. Hands-on (you install/run code)

Student Intros / Goals

Disclaimer:

I am not representing my employer (Epoch) in this talk

I cannot confirm nor deny if Epoch is using any of the methods, tools, results etc. mentioned in this talk

Source: Andrew Ng

Source: https://twitter.com/iamdevloper/

...

http://www.cs.cornell.edu/~alexn/papers/empirical.icml06.pdf

http://lowrank.net/nikos/pubs/empirical.pdf

structured/tabular data: GBM (or RF)
very small data: LR
very large sparse data: LR with SGD (+L1/L2)
images/videos, speech: DL

it depends / try them all / hyperparam tuning / ensembles
feature engineering / other goals e.g. interpretability

the title of this talk was misguided
but so is almost every recent use of the term AI

Source: Hastie et al., The Elements of Statistical Learning (ESL), 2nd ed.

I usually use other people’s code [...] I can find open source code for what I want to do, and my time is much better spent doing research and feature engineering -- Owen Zhang
http://blog.kaggle.com/2015/06/22/profiling-top-kagglers-owen-zhang-currently-1-in-the-world/

10x

http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf

http://www.argmin.net/2016/06/20/hypertuning/
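
The JMLR paper linked above (Bergstra and Bengio, 2012) makes the case for random search over grid search in hyperparameter tuning. A minimal sketch of random search for a GBM follows, using scikit-learn's `RandomizedSearchCV` with `GradientBoostingClassifier`. The dataset, the search ranges and the number of iterations are assumptions chosen only so the example runs quickly. They are not recommendations from the talk.

```python
from scipy.stats import loguniform, randint
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

# Small synthetic classification problem (illustrative only)
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=5, random_state=0
)

# Distributions to sample from, instead of a fixed grid
param_distributions = {
    "learning_rate": loguniform(0.01, 0.3),
    "max_depth": randint(2, 6),         # integers 2..5
    "n_estimators": randint(50, 151),   # integers 50..150
}

search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=8,               # number of random configurations to try
    cv=3,                   # 3-fold cross-validation per configuration
    scoring="roc_auc",
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

With a fixed budget of `n_iter` trials, every trial samples a fresh value for each hyperparameter, so the important ones get explored at more distinct values than a grid of the same size would allow.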

ML training:

lots of CPU cores
lots of RAM

limited time

“people that know what they’re doing just use open source [...] the same open source tools that the MLaaS services offer” - Bradford Cross

no-one is using this crap

More:
