Beauty and Big Data [Made possible by H2O and Tableau]
Amy Wang
“A data scientist knows more statistics than a computer scientist and more computer science than a statistician.”
What is H2O?
Open source in-memory prediction engine
Math Platform
• Parallelized and distributed algorithms that make the most of multithreaded systems
• GLM, Random Forest, GBM, PCA, etc.
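As a rough illustration of the "parallelized and distributed algorithms" idea, here is a minimal sketch of a map/reduce style fit of a straight line. It is not H2O code and does not use the H2O API: the names `chunk_stats`, `combine`, and `fit_line` are hypothetical, and plain Python threads stand in for a real cluster. Each chunk of rows is reduced to a few sums, and the sums from all chunks are added together before solving for the line.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def chunk_stats(chunk):
    """Map step: sufficient statistics for y = a + b*x on one chunk of (x, y) rows."""
    n = len(chunk)
    sx = sum(x for x, _ in chunk)
    sy = sum(y for _, y in chunk)
    sxx = sum(x * x for x, _ in chunk)
    sxy = sum(x * y for x, y in chunk)
    return (n, sx, sy, sxx, sxy)


def combine(a, b):
    """Reduce step: statistics from two chunks add element-wise."""
    return tuple(u + v for u, v in zip(a, b))


def fit_line(rows, n_chunks=4):
    """Fit y = intercept + slope*x by combining per-chunk statistics."""
    if len(rows) < 2:
        raise ValueError("need at least two rows")
    size = -(-len(rows) // n_chunks)  # ceiling division
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    # Threads are used only to show the pattern; a real engine would
    # spread the chunks over cores and machines.
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = list(pool.map(chunk_stats, chunks))
    n, sx, sy, sxx, sxy = reduce(combine, partials)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


# Synthetic data on the line y = 3 + 2x (made up for this sketch).
rows = [(float(x), 3.0 + 2.0 * x) for x in range(1000)]
intercept, slope = fit_line(rows)
print(intercept, slope)
```

Because the per-chunk sums simply add, no chunk ever needs to see the whole dataset, which is what makes this style of computation easy to spread across workers.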
Easy to use and adopt
API
• Written in Java – perfect for Java programmers
• REST API (JSON) – drives H2O from R, Python, Excel
More data? Or better models? Both?
Big Data
• Use all of your data – model without down sampling
• Run a simple GLM or a more complex GBM to find the best fit for the data
• More Data + Better Models = Better Predictions
[Architecture diagram: H2O Prediction Engine]
• Data sources: SQL, HDFS, NoSQL, S3
• Interfaces: R, JSON, Scala, Java, Python
• Built on top: Intelligent Enterprise Applications
• Algorithm labels: Classify, Regression, Trees, Forest, Boosting, Gradients, Ensembles, Solvers, Deep learning
• Engine labels: Memory Manager, Columnar Compression, Query Processor, R-engine, In-Mem Map Reduce, Nano Fast Scoring Engine, Cluster, Processes
• Throughput figures: 2M row ingest/sec, 50M row regression/sec, 750M row aggregates/sec
• Deployment: On premise, on/off Hadoop, on EC2
Installation Process
Start playing with H2O with R yourself!
Grab H2O and our R package:
• Download from website: 0xdata.com/downloads
• Build from git: https://github.com/0xdata/h2o
Get support at:
• http://docs.0xdata.com/
Demo: Big Data Workflow using R with H2O
OSEMN
• Obtain and Scrub
• Explore [in R or Tableau]
• Model [in H2O] and explore the different models
• iNterpret [in Tableau]
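To make the OSEMN steps concrete, here is a minimal sketch of the workflow as five small functions. It is a stand-in only: the demo itself uses R, H2O, and Tableau, while this sketch uses plain Python with a made-up inline dataset (`RAW`) and hypothetical function names, and fits an ordinary least squares line where the demo would fit a model in H2O.

```python
import csv
import io
import statistics

# Made-up raw data with one missing value and one malformed row.
RAW = """x,y
1,2.1
2,3.9
3,
4,8.2
bad,1.0
5,9.8
"""


def obtain(text):
    """Obtain: read raw records (here from an inline CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))


def scrub(records):
    """Scrub: keep only rows where both fields parse as numbers."""
    clean = []
    for record in records:
        try:
            clean.append((float(record["x"]), float(record["y"])))
        except (TypeError, ValueError):
            continue  # drop missing or malformed rows
    return clean


def explore(rows):
    """Explore: quick summary statistics before modeling."""
    xs, ys = zip(*rows)
    return {"n": len(rows),
            "mean_x": statistics.mean(xs),
            "mean_y": statistics.mean(ys)}


def model(rows):
    """Model: ordinary least squares fit of y = intercept + slope*x."""
    xs, ys = zip(*rows)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in rows)
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope


def interpret(params):
    """iNterpret: turn fitted parameters into a readable statement."""
    intercept, slope = params
    return f"y changes by about {slope:.2f} per unit of x (intercept {intercept:.2f})"


rows = scrub(obtain(RAW))
summary = explore(rows)
params = model(rows)
print(summary)
print(interpret(params))
```

Keeping each stage as its own function mirrors the slide: any one step can be swapped for a heavier tool without touching the others.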
[Diagram labels: H2O, Data, REST API, Local Socket Server]
Demo: Big Data Modeling Visualization in Tableau through R with H2O
A little about us
Advisors: Systems, Data, File Systems and Hadoop
• Doug Lea – ACM Fellow; malloc for C, fork-join, Java memory model; SUNY Oswego
• Chris Pouliot – VP of Data Science, Lyft; formerly Netflix, Google
• Dhruba Borthakur – HDFS, Hive, Facebook
Scientific Advisory Council
• Stephen Boyd – Professor of Electrical Engineering, Stanford
• Rob Tibshirani – Professor of Health Research and Policy, and Statistics, Stanford
• Trevor Hastie – Professor of Statistics, Stanford
Investors
• Jishnu Bhattacharjee – Nexus Venture Partners
• Anand Babu Periasamy – Founder, Gluster (RedHat)
• Anand Rajaraman – Founder, Junglee (Amazon), Kosmix (WalmartLabs)
• Dipchand “Deep” Nishar – SVP of Products & UX (LinkedIn)
We’ve Got the Who’s Who of Predictive Analytics