Open Source AI Platform for Business Transformation

Start Getting Your Feet Wet in Open Source Machine and Deep Learning


Desmond Chan, Senior Director of Marketing, H2O.ai

Agenda for H2O Introduction Webinar

▪ Company Introduction (5 mins)

▪ H2O Introduction and Demo (35 mins)

– Installation of H2O

– Flight delay prediction use case

• Use case description

• Data set description

• Data munging

• Model creation

▪ Q&A (10 mins)

H2O AI Platform

▪ In-Memory, Distributed Machine Learning with Visual Intelligence
▪ H2O AI in Spark with Data Prep and ML Pipelines
▪ Operationalize Model Building and Deployment Governance
▪ Deep Water: Best-of-breed GPU Deep Learning with an easy API and AutoML, using TensorFlow, MXNet, or Caffe and H2O
▪ AI For Business Transformation: Insights on Text, Images, Transactions, Speech
▪ Best Machine Learning Algorithms on Spark
▪ Platform to Build and Scale Data Products; dual licensing (AGPL and Commercial)

H2O is the #1 Platform for Open Source AI

Open Source Drives Community Adoption

Companies Using H2O.ai and H2O.ai Users*

Year   Companies Using H2O.ai   H2O.ai Users
2014   400                      1,000
2015   3,810                    38,257
2016   6,427                    54,163
2017   9,173                    83,108

* Data from July of every year, except for 2017 when data from Feb 21st are used.

H2O Recognized by Press and Customers

H2O.ai Strongly Positioned in Key Analyst Reports and Press

“Overall customer satisfaction is very high.”

“H2O is especially suited to IoT edge and device scenarios.”

“H2O had the highest reference customer analytics support score of all the vendors.”

H2O.ai is a Visionary in the Gartner Magic Quadrant for Data Science Platforms

“H2O.ai has significant adoption by large enterprises such as Macy’s, Comcast, and Capital One.”

“H2O.ai is best known for developing open source, cluster-distributed ML algorithms at a time (2011) when big data demanded them, but no one else had them.”

H2O.ai is a Strong Performer in the Forrester Wave for Predictive Analytics & Machine Learning

H2O.ai is among Forbes' Top 10 Hot Artificial Intelligence (AI) Technologies, named alongside Nvidia, Google, IBM, Intel, Microsoft, SAS, and others (contributed by Gil Press)

H2O Use Cases – Videos and Talks

▪ Progressive (Auto Insurance, UBI Telematics): Pawan Divarkarla, Chief Data Officer
  “H2O is an enabler in how people are thinking about data.”

▪ Zurich (Commercial Insurance, Risk Analytics): Conor Jensen, Analytics Director
  “Advanced analytics was one of the key investments we decided to make.”

▪ Capital One (Financial Services, Customer Insights): Brendan Herger, Data Scientist
  “H2O is the best solution to iterate very quickly on large datasets and produce meaningful models.”

▪ Nielsen Catalina (Digital Marketing, Consumer Behavior): Satya Satyamoorthy, Director, Software Dev
  “I am a big fan of open source. H2O is the best fit in terms of cost as well as ease of use, scalability, and usability.”

Amy Wang, Math Hacker, H2O.ai

What is H2O?

Open source in-memory prediction engine

Math Platform
• Parallelized and distributed algorithms that make full use of multithreaded systems
• GLM, Random Forest, GBM, PCA, etc.

Easy to use and adopt
API
• Written in Java – perfect for Java programmers
• REST API (JSON) – drives H2O from R, Python, Excel, Tableau

More data? Or better models? BOTH
Big Data
• Use all of your data – model without downsampling
• Run a simple GLM or a more complex GBM to find the best fit for the data
• More Data + Better Models = Better Predictions
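Because R, Python, and the other clients all drive H2O through the same JSON REST API, any language that can send HTTP requests can talk to a cluster. The sketch below only builds the URL for an "import this file" request. The endpoint `/3/ImportFiles` and its `path` parameter are assumptions based on H2O-3's v3 REST API and should be checked against the REST reference for your H2O version; port 54321 is H2O's default; the helper names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlencode, urlunsplit

# Assumption: a single H2O node listening on the default port.
DEFAULT_HOST = "localhost"
DEFAULT_PORT = 54321


def import_files_url(path, host=DEFAULT_HOST, port=DEFAULT_PORT):
    """Build a GET URL asking the cluster to import `path`.

    The endpoint /3/ImportFiles and the `path` query parameter are assumed
    from H2O-3's v3 REST API; verify them for your H2O version.
    """
    query = urlencode({"path": path})  # percent-encodes '/', ':' and similar
    return urlunsplit(("http", f"{host}:{port}", "/3/ImportFiles", query, ""))


def get_json(url, timeout=10.0):
    """Send the request and decode the JSON reply (needs a running cluster)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

For example, `import_files_url("../data/allyears2k.csv")` yields `http://localhost:54321/3/ImportFiles?path=..%2Fdata%2Fallyears2k.csv`; passing that to `get_json` only works when a cluster is actually running at that address.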

H2O Algorithms

Supervised Learning

Statistical Analysis
• Generalized Linear Models: Binomial, Gaussian, Gamma, Poisson, and Tweedie
• Naive Bayes: Binary Text Classification

Ensembles
• Distributed Random Forest: Classification or Regression Models
• Gradient Boosting Machine: Ensembles of shallow decision trees with increasingly refined approximations

Deep Neural Networks
• Deep Learning: Create multi-layer feed-forward neural networks starting with an input layer followed by multiple layers of nonlinear transformations
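The Gradient Boosting Machine bullet describes boosting: each new shallow tree corrects what the ensemble built so far still gets wrong. Here is a minimal sketch of that idea for squared-error regression on one numeric feature, using depth-1 trees (stumps) and a shrinkage factor. It is a single-machine illustration, not H2O's distributed GBM, and the function names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree (one split on x) to the residuals by least squares."""
    pairs = sorted(zip(xs, residuals))
    mean_all = sum(residuals) / len(residuals)
    # Start from "no split" and keep the split with the lowest squared error.
    best = (sum((r - mean_all) ** 2 for r in residuals), None, mean_all, mean_all)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [r for _, r in pairs[:i]]
        right = [r for _, r in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    if threshold is None:
        return lambda x: mean_all  # no useful split: predict the mean residual
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_trees=50, learning_rate=0.1):
    """Squared-error gradient boosting: each stump is fitted to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        # For squared error, the negative gradient is simply y - prediction.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Shrinkage: take only a small step toward the stump's correction.
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def mse(model, xs, ys):
    """Mean squared error of `model` on the given data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)
```

With `xs = list(range(10))` and the step-shaped target `ys = [1.0] * 5 + [5.0] * 5`, each stump removes 10% of the remaining residual, so training error drops from about 3.24 with one tree to roughly 0.0001 with 50 trees.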

Unsupervised Learning

Clustering
• K-means: Partition observations into k clusters of the same spatial size. Categorical features are one-hot encoded.
• Archetypes [GLRM]: Partition observations into k archetypes.

Dimensionality Reduction
• Principal Component Analysis: Linearly transforms correlated variables into uncorrelated components
• Generalized Low Rank Model: Approximates a data set as the product of two low-dimensional factors. Extends PCA to handle sparse and categorical data and adds regularization.

Anomaly Detection
• Autoencoders [Deep Learning]: Train a multi-layer feed-forward neural network to reconstruct its own input; observations with a high reconstruction error are flagged as anomalies
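The K-means bullet notes that categorical features are one-hot encoded before clustering. The sketch below shows both pieces on toy data: a one-hot helper and Lloyd's algorithm, which alternates nearest-center assignment with center updates. The naive "first k points" initialization and all names are assumptions for illustration; H2O's implementation is distributed and offers its own initialization options.

```python
import math


def one_hot(rows, cat_index, levels):
    """Replace the categorical column at cat_index with 0/1 indicator columns."""
    encoded = []
    for row in rows:
        numeric = [float(v) for i, v in enumerate(row) if i != cat_index]
        indicators = [1.0 if row[cat_index] == level else 0.0 for level in levels]
        encoded.append(numeric + indicators)
    return encoded


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with a naive initialization (the first k points)."""
    centers = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

On six toy rows such as `[1.0, "a"]` and `[9.0, "b"]`, encoding column 1 with levels `["a", "b"]` and running `kmeans(points, 2)` separates the "a" rows from the "b" rows within a few iterations.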

Accuracy with Speed and Scale

▪ Data sources: HDFS, S3, SQL, NoSQL
▪ Fast Modeling Engine: In-Memory, Map Reduce/Fork Join, Columnar Compression
▪ Data preparation: Munging, Feature Engineering
▪ Algorithms: Classification, Regression, Deep Learning, PCA, GLM, Cox, Random Forest / GBM Ensembles, Matrix Factorization, Clustering
▪ Scoring: Streaming Nano Fast Java Scoring Engines
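Map Reduce/Fork Join is the pattern behind the fast modeling engine: a column is held in chunks, a function runs on each chunk in parallel, and the partial results are merged pairwise. This is a minimal single-process sketch of that pattern, computing a column's mean and variance. The thread pool only stands in for cores and nodes (Python threads will not speed up this CPU-bound work), the merge uses Chan et al.'s pairwise update so partial results combine correctly, and all names are hypothetical rather than H2O's Java API.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(values):
    """Map: summarize one chunk as (count, mean, sum of squared deviations)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values)
    return n, mean, m2


def combine(a, b):
    """Reduce: merge two partial summaries with Chan et al.'s pairwise update."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2


def column_mean_variance(column, chunk_size=1000, workers=4):
    """Split a non-empty column into chunks, map them in parallel, then reduce."""
    chunks = [column[i:i + chunk_size] for i in range(0, len(column), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    n, mean, m2 = reduce(combine, partials)
    return mean, m2 / n  # population variance
```

Because `combine` only needs each chunk's summary, the same code gives the same answer (up to rounding) whether the column is cut into 10 chunks or 1,000.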

Reading Data into H2O with R

STEP 1

R user:

library(h2o); h2o.init()
h2o_df <- h2o.importFile("../data/allyears2k.csv")

Reading Data from HDFS into H2O with R

STEP 2

2.1 R function call: h2o.importFile()
2.2 HTTP REST API request to H2O has the HDFS path to data.csv
2.3 H2O cluster initiates distributed ingest
2.4 H2O cluster requests data from HDFS
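In step 2.3 the cluster ingests the file in parallel rather than streaming it through a single reader. One common way to split a text file among workers is by byte ranges, with each worker aligning itself to line boundaries; the sketch below assumes that approach and is not a description of H2O's parser. It also assumes no quoted field contains a newline, and a production parser must additionally detect separators and column types. All names are hypothetical.

```python
def split_ranges(size, n_parts):
    """Cut [0, size) into at most n_parts contiguous byte ranges."""
    step = max(1, -(-size // n_parts))  # ceiling division
    return [(start, min(start + step, size)) for start in range(0, size, step)]


def parse_range(data, start, end):
    """Parse every line that *starts* inside [start, end).

    A partial line at the front belongs to the previous range and is skipped;
    a line that starts before `end` is read to completion, even past `end`.
    """
    if start > 0 and data[start - 1:start] != b"\n":
        newline = data.find(b"\n", start)
        start = len(data) if newline == -1 else newline + 1
    rows = []
    pos = start
    while pos < end:
        newline = data.find(b"\n", pos)
        stop = len(data) if newline == -1 else newline
        if stop > pos:  # skip empty lines
            rows.append(data[pos:stop].decode("utf-8").split(","))
        pos = stop + 1
    return rows


def parse_parallel(data, n_workers):
    """Each (start, end) range could be handed to a different node."""
    parts = [parse_range(data, s, e) for s, e in split_ranges(len(data), n_workers)]
    return [row for part in parts for row in part]
```

Whatever the number of workers, every line is parsed exactly once: by the worker whose range contains the line's first byte.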

STEP 3

3.1 HDFS provides data.csv
3.2 Distributed H2OFrame is created in the DKV (H2O's distributed key/value store)
3.3 Pointer to the data is returned in a REST API JSON response
3.4 h2o_df object (an H2OFrame holding the cluster IP, cluster port, and pointer to the data) is created in R
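The key point of STEP 3 is that the R object h2o_df does not hold the data itself: it holds the cluster's IP, port, and a pointer (key) to a frame that lives in the cluster's DKV. The toy simulation below models that split inside one Python process. Everything in it (FAKE_HDFS, cluster_handle_import, FrameHandle, the frame_key JSON field) is hypothetical and is not H2O's actual API or wire format.

```python
import json
import uuid
from dataclasses import dataclass

# Toy stand-ins (hypothetical): HDFS is a dict of path -> rows, and the
# cluster's distributed key/value store (DKV) is a plain local dict.
FAKE_HDFS = {"hdfs://namenode/data.csv": [["1", "2"], ["3", "4"]]}
DKV = {}


def cluster_handle_import(request_json):
    """Cluster side: load the file, keep the data in the DKV, reply with a key only."""
    path = json.loads(request_json)["path"]
    key = "frame_" + uuid.uuid4().hex[:8]
    DKV[key] = FAKE_HDFS[path]  # steps 3.1-3.2: data lands in the cluster
    return json.dumps({"frame_key": key})  # step 3.3: JSON reply with a pointer


@dataclass(frozen=True)
class FrameHandle:
    """Client side: like h2o_df, it holds cluster IP, port, and a pointer to the data."""
    ip: str
    port: int
    key: str

    def nrow(self):
        # A real client would ask the cluster over REST; the toy reads the DKV directly.
        return len(DKV[self.key])


def client_import(ip, port, path):
    """Client side: send only the path and get back a handle (step 3.4)."""
    reply = cluster_handle_import(json.dumps({"path": path}))
    return FrameHandle(ip, port, json.loads(reply)["frame_key"])
```

Only a short JSON message crosses the client/cluster boundary, so an R or Python session on a laptop can work with a frame far larger than its own memory.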

Data Munging in R

Installing in R

R> install.packages("h2o")

Installing in Python

$ pip install h2o

Demo Time!

Questions?

Thanks for joining us!