MUNGING, MODELING, AND PIPELINES USING PYTHON Hank Roark

H2O World - Munging, modeling, and pipelines using Python - Hank Roark

MUNGING, MODELING, AND PIPELINES USING PYTHON

Hank Roark

COMMUNITY FEEDBACK

Pythonic Interface to H2O, R interface parity

Rapid learning and iteration

Leverage existing knowledge and skills

Interface cleanly with PyData ecosystem

More Environments, esp. PySpark

Python Pipelines to Production

EXAMPLE FROM THE IOT

Domain: Prognostics and Health Management

Machine: Turbofan Jet Engines

Data Set: A. Saxena and K. Goebel (2008). "Turbofan Engine Degradation Simulation Data Set", NASA Ames Prognostics Data Repository

Predict Remaining Useful Life from Partial Life Runs

Six operating modes, two failure modes, manufacturing variability

Training: 249 jet engines run to failure

Test: 248 jet engines

WHY THIS EXAMPLE?

GETTING READY FOR BRONTOBYTES

LOADING DATA
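The loading step can be sketched roughly as follows. This is a minimal sketch, not the code from the slide: the column names are illustrative assumptions (the data set has a unit number, a cycle counter, three operational settings, and 21 sensor measurements per row), the file paths are placeholders supplied by the caller, and it assumes the H2O parser splits each file into exactly those 26 columns.

```python
def cmapss_column_names(n_settings=3, n_sensors=21):
    """Build illustrative column names for one turbofan data file.

    The names are assumptions made for this sketch; the raw files have
    no header row.
    """
    settings = ["OpSet" + str(i) for i in range(1, n_settings + 1)]
    sensors = ["SensorMeasure" + str(i) for i in range(1, n_sensors + 1)]
    return ["UnitNumber", "Cycle"] + settings + sensors


def load_frames(train_path, test_path):
    """Parse both files into H2OFrames that live in the H2O cluster."""
    # Imported lazily so the helper above works without H2O installed.
    import h2o

    h2o.init()  # connect to a running cluster, or start a local one
    names = cmapss_column_names()
    train = h2o.import_file(train_path, col_names=names)
    test = h2o.import_file(test_path, col_names=names)
    return train, test
```

The frames returned by `load_frames` stay in the cluster; only a handle comes back to Python.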

SUMMARY STATISTICS
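A hedged sketch of how summary statistics might be requested from Python. `summarize` is a hypothetical helper name; `shape` and `describe()` are existing `H2OFrame` members, and the statistics are computed in the cluster rather than in local memory.

```python
def summarize(frame):
    """Print the frame size and per-column summary statistics."""
    n_rows, n_cols = frame.shape  # H2OFrame.shape is (rows, columns)
    print("rows:", n_rows, "columns:", n_cols)
    frame.describe()  # prints per-column summary statistics
    return n_rows, n_cols
```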

FEATURE ENGINEERING

Calculate Total Cycles For Each Unit

FEATURE ENGINEERING

Append To Original Frame

FEATURE ENGINEERING

Create New Feature of Cycles Remaining
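The three feature engineering steps (total cycles per unit, append to the original frame, cycles remaining) might look roughly like this. It is a sketch under assumptions: the column names `UnitNumber`, `Cycle`, and `CyclesRemaining` are illustrative, and `remaining_cycles` is a plain-Python reference added only to make the arithmetic easy to check.

```python
from collections import defaultdict


def remaining_cycles(rows):
    """Plain-Python reference: rows are (unit, cycle) pairs."""
    total = defaultdict(int)
    for unit, cycle in rows:
        total[unit] = max(total[unit], cycle)  # last cycle seen per unit
    return [(unit, cycle, total[unit] - cycle) for unit, cycle in rows]


def add_remaining_cycles(frame):
    """Same computation on an H2OFrame, executed in the cluster."""
    # Step 1: total cycles for each unit (result column is "max_Cycle").
    totals = frame.group_by("UnitNumber").max("Cycle").get_frame()
    # Step 2: append the per-unit total to every row of that unit.
    merged = frame.merge(totals)
    # Step 3: new feature = total cycles minus current cycle.
    merged["CyclesRemaining"] = merged["max_Cycle"] - merged["Cycle"]
    return merged.drop("max_Cycle")
```

For a unit that ran three cycles, the reference helper yields 2, 1, 0 remaining cycles for cycles 1, 2, 3.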

EXPLORATORY DATA ANALYSIS

Boolean Indexing
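Boolean indexing on an H2OFrame reads much like it does in Pandas. A minimal sketch, assuming a column named `UnitNumber` (an illustrative name) and a hypothetical helper `rows_for_unit`:

```python
def rows_for_unit(frame, unit_number):
    """Select all rows belonging to one engine."""
    # Comparing a column with a value gives a boolean column ...
    mask = frame["UnitNumber"] == unit_number
    # ... which can then be used as the row selector.
    return frame[mask, :]
```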

EXPLORATORY DATA ANALYSIS

Sample the data to local memory
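One way to bring a sample into local memory is to filter on a uniform random column and convert the result. This is a sketch, not necessarily the approach on the slide; `sample_to_pandas`, the 10% fraction, and the seed are assumptions.

```python
def sample_to_pandas(frame, fraction=0.1, seed=42):
    """Pull roughly `fraction` of the rows out of the cluster."""
    # runif() gives one uniform random number per row, computed in H2O.
    keep = frame.runif(seed=seed) < fraction
    # as_data_frame() downloads the selected rows (a Pandas DataFrame
    # when Pandas is installed).
    return frame[keep, :].as_data_frame()
```

Only the sampled rows cross the wire, so the full data set never has to fit in the Python process.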

EXPLORATORY DATA ANALYSIS

Use your favorite visualization tools (Seaborn!)

Ugh, where are trends over time?

[Figure: sensor measurements plotted against time, up to zero remaining useful life]

MODEL BASED DATA ENRICHMENT

Sensor measurements appear in clusters

Corresponding to operating mode!

MODEL BASED DATA ENRICHMENT

Use H2O k-means to find cluster centers

MODEL BASED DATA ENRICHMENT

Enrich existing data with operating mode membership
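The k-means step and the enrichment step could be sketched as below. Several things are assumptions: clustering on three operational-setting columns named `OpSet1` to `OpSet3`, the new column name `OperatingMode`, the seed, and both helper names. The value `k=6` follows the six operating modes mentioned earlier in the deck.

```python
SETTING_COLUMNS = ["OpSet1", "OpSet2", "OpSet3"]  # illustrative names


def fit_operating_modes(train, n_modes=6, seed=1234):
    """Find cluster centers with H2O k-means."""
    # Imported lazily so the module loads without H2O installed.
    from h2o.estimators.kmeans import H2OKMeansEstimator

    model = H2OKMeansEstimator(k=n_modes, seed=seed)
    model.train(x=SETTING_COLUMNS, training_frame=train)
    return model


def add_operating_mode(frame, model):
    """Append the predicted cluster as a categorical column."""
    mode = model.predict(frame).asfactor()  # one column named "predict"
    mode.set_names(["OperatingMode"])       # rename in place
    return frame.cbind(mode)
```

The same fitted model can label both the training and the test frame, so operating modes are defined once, from the training data.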

MORE FEATURE ENGINEERING

For non-constant sensor measurements within an operating mode,

Standardize each sensor measurement by operating mode

Based on the training data
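The per-mode standardization can be illustrated in plain Python. This is a reference sketch of the arithmetic, not the H2O code from the talk: statistics are fitted on training data only and then applied to any data. Using the population standard deviation and mapping constant measurements to 0.0 are assumptions of this sketch.

```python
from collections import defaultdict
from statistics import mean, pstdev


def fit_mode_stats(modes, values):
    """Mean and standard deviation of one sensor, per operating mode."""
    grouped = defaultdict(list)
    for mode, value in zip(modes, values):
        grouped[mode].append(value)
    return {m: (mean(vs), pstdev(vs)) for m, vs in grouped.items()}


def standardize_by_mode(modes, values, stats):
    """Apply training-set statistics to training or test values."""
    out = []
    for mode, value in zip(modes, values):
        mu, sigma = stats[mode]
        # A sensor that is constant within a mode carries no signal;
        # this sketch maps it to 0.0 instead of dividing by zero.
        out.append(0.0 if sigma == 0 else (value - mu) / sigma)
    return out
```

Fitting on the training set and reusing those statistics on the test set keeps test information out of the features.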

TRENDS OVER TIME!

[Figure: two panels of sensor measurements over time, "Before H2O Munging" and "Ready for H2O Learning"; x-axis: Time]

MODELING

Configure an Estimator

MODELING

Train an Estimator
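Configuring and training follow a two-step pattern: build the estimator with its hyperparameters, then call `train` with column names and a frame. The sketch below uses a gradient boosting machine purely for illustration; the choice of estimator, the hyperparameter values, the target name `CyclesRemaining`, and the helper names are all assumptions.

```python
def feature_columns(all_names, target, exclude=("UnitNumber",)):
    """Predictor columns: everything except the target and identifiers."""
    return [n for n in all_names if n != target and n not in exclude]


def train_gbm(train, feature_cols, target="CyclesRemaining"):
    """Configure an estimator, then train it on an H2OFrame."""
    # Imported lazily so the helper above works without H2O installed.
    from h2o.estimators.gbm import H2OGradientBoostingEstimator

    # Step 1: configure. No data is touched yet.
    model = H2OGradientBoostingEstimator(
        distribution="gaussian",  # regression on remaining cycles
        ntrees=100,
        max_depth=5,
        learn_rate=0.1,
        seed=1234,
    )
    # Step 2: train. The work runs in the H2O cluster.
    model.train(x=feature_cols, y=target, training_frame=train)
    return model
```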

MODEL EVALUATION

Evaluate Performance at a glance in Python
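A minimal sketch of checking performance from Python, assuming a trained regression model and a held-out H2OFrame. `evaluate` is a hypothetical helper; `model_performance`, `mse`, and `rmse` are existing H2O calls.

```python
def evaluate(model, test):
    """Score a trained model on held-out data and collect key metrics."""
    perf = model.model_performance(test_data=test)
    return {"mse": perf.mse(), "rmse": perf.rmse()}
```

Printing the performance object itself shows the full metric table in a notebook.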

MODEL EVALUATION

Evaluate Performance at a glance in H2O Flow

MODEL EVALUATION

Evaluate Performance at a glance graphically in Python

CROSS VALIDATION

Setup Hyperparameter Search Options

CROSS VALIDATION

Configure full grid search

CROSS VALIDATION

Execute grid search

CROSS VALIDATION

Evaluate results & model selection
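The four cross validation steps (search options, grid configuration, execution, evaluation) might be put together as follows. This is a sketch under assumptions: the hyperparameter values, the GBM base estimator, plain `nfolds` cross validation, and sorting by MSE are illustrative choices, and `grid_size` is a small helper added to show how large a full grid becomes.

```python
HYPER_PARAMS = {  # illustrative search options
    "ntrees": [50, 100],
    "max_depth": [3, 5],
    "learn_rate": [0.05, 0.1],
}


def grid_size(hyper_params):
    """Number of models a full (Cartesian) grid search will train."""
    size = 1
    for values in hyper_params.values():
        size *= len(values)
    return size


def run_grid_search(train, feature_cols, target, nfolds=5):
    """Configure, execute, and rank a full grid search."""
    # Imported lazily so the helpers above work without H2O installed.
    from h2o.estimators.gbm import H2OGradientBoostingEstimator
    from h2o.grid.grid_search import H2OGridSearch

    base = H2OGradientBoostingEstimator(
        distribution="gaussian", nfolds=nfolds, seed=1234
    )
    grid = H2OGridSearch(model=base, hyper_params=HYPER_PARAMS)
    grid.train(x=feature_cols, y=target, training_frame=train)
    # Best model first: lowest cross-validated MSE.
    return grid.get_grid(sort_by="mse", decreasing=False)
```

With the options above, the full grid trains `grid_size(HYPER_PARAMS)` models, each cross-validated `nfolds` times, which is why search options are worth setting deliberately.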

MORE CONTROL – SCIKIT PIPELINES

Create Pipelines

Hyperparameter Options

Cross validation strategy

Hyperparameter Search Strategy

Fit

DATA PIPELINES USING H2OASSEMBLY

Typical Data Preparation

Add some structure
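An H2OAssembly gives data preparation a named, ordered list of steps, much like a scikit-learn pipeline. The sketch below is an assumption-heavy illustration: the column names are placeholders, and the cosine step is used only because it is the operation shown in the H2O documentation example, not because it suits this data.

```python
KEEP_COLUMNS = ["UnitNumber", "Cycle", "SensorMeasure2", "SensorMeasure3"]


def build_prep_assembly(keep_columns=KEEP_COLUMNS, op_column="SensorMeasure2"):
    """Describe data preparation as a sequence of named steps."""
    # Imported lazily so this module loads without H2O installed.
    from h2o.assembly import H2OAssembly
    from h2o.frame import H2OFrame
    from h2o.transforms.preprocessing import H2OColOp, H2OColSelect

    return H2OAssembly(steps=[
        # Step 1: keep only the columns of interest.
        ("select", H2OColSelect(list(keep_columns))),
        # Step 2: apply a unary operation to one column, in place.
        ("cos_op", H2OColOp(op=H2OFrame.cos, col=op_column, inplace=True)),
    ])
```

Calling `build_prep_assembly().fit(train)` would run the steps against an H2OFrame and return the prepared frame.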

H2OASSEMBLY TO PRODUCTION

Java for Production Scoring

Python
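Moving from Python to Java for scoring could be sketched like this. `export_for_java`, the output directory, and the POJO name are hypothetical; the sketch assumes the assembly has already been fitted and that the model type supports POJO export.

```python
def export_for_java(assembly, model, out_dir, pojo_name="MungingPojo"):
    """Write Java sources for the data prep steps and the model."""
    # Data preparation: the fitted H2OAssembly becomes a Java class.
    assembly.to_pojo(pojo_name=pojo_name, path=out_dir, get_jar=True)
    # Model: download the scoring POJO plus the genmodel jar it needs.
    return model.download_pojo(path=out_dir, get_genmodel_jar=True)
```

The generated Java can then be compiled into a production service without a Python runtime.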

MORE ENVIRONMENTS

PySparkling Water = Python + Spark + H2O

Python + Sparkling Water

COMMUNITY FEEDBACK

Pythonic Interface to H2O, R interface parity

Rapid learning and iteration

Leverage existing knowledge and skills

Interface cleanly with PyData ecosystem

More Environments, esp. PySpark

Python Pipelines to Production

RESULTS

H2O Python Framework:

H2OFrame & H2OEstimators

H2OAssembly for Data Prep Pipelines

Python, Jupyter Notebooks, Pandas, Scikit-Learn Integration

PySparkling Water

RESOURCES

• Python booklet
• Tibshirani release
• Python documentation
• Github examples
• Jupyter Notebook of Example

THANK YOU