Hands On : PySparkling Water - By Nidhi Mehta



What is PySparkling Water

PySparkling Water = Python + Spark + H2O

PySparkling Water = Python + Sparkling Water

PySparkling Architecture

[Architecture diagram. Labels on the Master side: Driver, Python, H2O Python, Py4J, Spark Context, H2O Context. Between Master and Workers: Cluster Manager. Labels on the Workers side: two Executors, each with H2O. H2O Python calls h2o.init ( ip, port ) and talks to H2O through the H2O REST API.]
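The client-side connection in the diagram can be sketched as below. This is a minimal sketch rather than code from the deck: `cloud_status_url` and `connect_h2o` are hypothetical helper names, port 54321 is assumed as the default, and `/3/Cloud` is taken to be the cluster-status endpoint of the H2O REST API. The `h2o` package is imported inside the function so the helpers load on a machine without H2O.

```python
def cloud_status_url(ip, port):
    """Build the URL of the H2O cluster-status REST endpoint (/3/Cloud)."""
    return "http://{}:{}/3/Cloud".format(ip, port)


def connect_h2o(ip="localhost", port=54321):
    """Point the H2O Python client at an H2O node (hypothetical helper).

    h2o is imported lazily so this module loads without H2O installed.
    """
    import h2o

    h2o.init(ip=ip, port=port)  # the call shown on the architecture slide
    return h2o
```

Opening the URL returned by `cloud_status_url` in a browser is a quick way to check that the H2O node is reachable before calling `connect_h2o`.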

Demo Workflow

Aim: Build a model to predict Arrest for Chicago crime dataset

● Import Chicago Crime Dataset
● Combine Crime data with Census and Weather data
● Build a model to predict whether an arrest was made
● Predict on a test dataset
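The four workflow steps might look roughly like this in H2O Python. This is a sketch under assumptions, not the demo notebook: the file paths are placeholders, the three files are assumed to already share their join-key column names, GBM is an arbitrary choice of algorithm (the slides do not name one), and the estimator-style API shown is the one in current h2o releases, which may differ from the 3.6.0.3 build pinned for this demo. `predictor_columns` and `run_workflow` are hypothetical names.

```python
def predictor_columns(columns, response):
    """Use every column except the response as a predictor."""
    return [c for c in columns if c != response]


def run_workflow(crime_path, census_path, weather_path, response="Arrest"):
    """Import, combine, train, predict. Paths are hypothetical placeholders."""
    import h2o
    from h2o.estimators.gbm import H2OGradientBoostingEstimator

    h2o.init()

    # 1. Import the three data sets as H2OFrames.
    crime = h2o.import_file(crime_path)
    census = h2o.import_file(census_path)
    weather = h2o.import_file(weather_path)

    # 2. Combine: merge() joins on the column names both frames share,
    #    so matching key columns are assumed to exist already.
    data = crime.merge(census).merge(weather)

    # 3. Make the response categorical so a classifier is trained.
    data[response] = data[response].asfactor()
    train, test = data.split_frame(ratios=[0.8], seed=42)
    model = H2OGradientBoostingEstimator(ntrees=50)
    model.train(x=predictor_columns(data.columns, response),
                y=response, training_frame=train)

    # 4. Predict on the held-out rows.
    return model.predict(test)
```

Keeping the response out of the predictor list is the one step that is easy to get wrong; `predictor_columns` isolates it so it can be checked without a cluster.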

Prerequisites to run the demo

- Install Spark-1.5.1

- Install and build Sparkling Water-1.5.6 ( ./gradlew build -x check )

- Install H2O-3.6.0.3

- Install H2O-python ( sudo pip install h2o-3.6.0.3-py2.py3-none-any.whl )
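Because the list pins exact versions, a quick check that the installed H2O Python package matches the wheel can save a failed start. The following is a small sketch with hypothetical helper names; it assumes Python 3.8 or later for `importlib.metadata`, which is newer than the Python typically used with these releases.

```python
from importlib import metadata

PINNED_H2O = "3.6.0.3"  # version named in the prerequisites list


def wheel_version(wheel_filename):
    """Return the version field of a wheel filename (name-version-...whl)."""
    return wheel_filename.split("-")[1]


def installed_h2o_version():
    """Return the installed h2o package version, or None if it is absent."""
    try:
        return metadata.version("h2o")
    except metadata.PackageNotFoundError:
        return None


def h2o_matches_pin():
    """True only when h2o is installed at exactly the pinned version."""
    return installed_h2o_version() == PINNED_H2O
```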

Command to Start/Access PySparkling Water Cluster

1)

Set the Spark environment by specifying SPARK_HOME and MASTER

export SPARK_HOME=Path_to_Spark_dir

export MASTER='local-cluster[2,8,6040]'

2)

- To run from a Python notebook:

IPYTHON_OPTS="notebook" Path_to_Sparkling_dir/bin/pysparkling

- To run from the regular Python shell:

Path_to_Sparkling_dir/bin/pysparkling
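The same setup can be prepared from Python, which also makes the MASTER value easier to read. In Spark's `local-cluster[N,C,M]` form, N is the number of workers, C the cores per worker, and M the memory per worker in MB. The sketch below uses hypothetical helper names and only builds the command and environment; it does not start anything.

```python
import os
import re


def parse_local_cluster(master):
    """Split 'local-cluster[N,C,M]' into workers, cores and MB per worker."""
    match = re.fullmatch(
        r"local-cluster\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]", master)
    if match is None:
        raise ValueError("not a local-cluster master URL: %r" % master)
    workers, cores, memory_mb = (int(g) for g in match.groups())
    return {"workers": workers,
            "cores_per_worker": cores,
            "memory_mb_per_worker": memory_mb}


def pysparkling_launch(spark_home, sparkling_dir, master, notebook=False):
    """Return (command, env) for starting PySparkling; nothing is executed."""
    # Copy the current environment and add the two variables from step 1.
    env = dict(os.environ, SPARK_HOME=spark_home, MASTER=master)
    if notebook:
        env["IPYTHON_OPTS"] = "notebook"  # notebook variant from step 2
    command = [os.path.join(sparkling_dir, "bin", "pysparkling")]
    return command, env
```

The returned pair can be handed to `subprocess.call(command, env=env)` once the paths point at real installations.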


Let's Run the Demo!


Why use PySparkling

● Automatic parallelization and fewer lines of code

● Much faster on big data: uses H2O's REST API calls to connect to the H2O cluster


Thank You


What do these stickers mean?

I have Sparkling Water Installed

I have Python installed

I have H2O installed

I have the H2O World data sets

Pick up stickers or get install help at the information booth