Breaking Data Science Open: How Open Data Science is Eating the World

© 2016 Continuum Analytics - Confidential & Proprietary

About Us

Michele Chambers @mcAnalytics
•  EVP Product & CMO, Continuum Analytics
•  M.B.A. Duke University, B.S. Computer Engineering
•  Author
•  Breaking Data Science Open: O’Reilly
•  Modern Analytics Methodologies: Driving Business Value with Analytics, Pearson FT Press
•  Advanced Analytics Methodologies: Driving Business Value with Analytics, Pearson FT Press
•  Big Data Big Analytics, Wiley

About Us

Christine Doig @ch_doig
•  Sr. Data Scientist & Product Mgr., Continuum Analytics
•  M.S. in Industrial Engineering, Polytechnic University of Catalonia
•  Open Source advocate and speaker: PyData, EuroPython, SciPy, PyCon
•  Author
•  Breaking Data Science Open: O’Reilly
•  5+ years in advanced analytics, operations research, machine learning in energy, manufacturing & banking

Business Intelligence & Predictive Analytics Using Data for Insight & Human-in-the-Loop actions

Cognitive Intelligence Using Data & Deep Learning to Make Recommendations

Open Data Science Connecting Data, Analytics & Computation

Data Science is…

“An interdisciplinary field about processes and systems to extract knowledge or insights from data in various forms.”

Open Data Science is…

an inclusive movement that makes open source tools of data science (data, analytics, & computation) easily work together as a connected ecosystem

Data Science is not just Machine Learning…

Distributed Systems

Business Intelligence

Machine Learning / Statistics

Web

Scientific Computing / HPC

Data Science is Interdisciplinary…

Distributed Systems: Hadoop, Spark

Business Intelligence: Data warehouse, querying, reporting

Machine Learning / Statistics: Classification, deep learning, regression, PCA

Web: Web crawling, scraping, 3rd party data & API providers, predictive services & APIs

Scientific Computing / HPC: GPUs, multi-cores

Open Source Communities Create Powerful Technology for Data Science

Numba, dask, xlwings, Airflow, Blaze

Distributed Systems

Business Intelligence

Web

Scientific Computing / HPC

Machine Learning / Statistics

Python is the common language

Numba, dask, xlwings, Airflow, Blaze

Distributed Systems

Business Intelligence

Web

Scientific Computing / HPC

Machine Learning / Statistics

Python’s Not the Only One…

Distributed Systems

Business Intelligence

Web

Scientific Computing / HPC

SQL

Machine Learning / Statistics

But it’s also a Great Glue Language

Distributed Systems

Business Intelligence

Machine Learning / Statistics

Web

Scientific Computing / HPC

SQL
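As a rough illustration of the glue-language idea on this slide, the sketch below uses only Python's standard library to push an aggregation down to SQL and then continue the analysis in Python. The `sales` table, its columns, and the `totals_by_region` helper are hypothetical names chosen for the example; they do not come from the presentation.

```python
import sqlite3
import statistics


def totals_by_region(rows):
    """Aggregate (region, amount) rows in SQL, then summarize the result in Python.

    Hypothetical helper for illustration only.
    """
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        # SQL does the grouping and summing.
        cursor = conn.execute(
            "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
        )
        totals = dict(cursor.fetchall())
    finally:
        conn.close()
    # Python takes over for the follow-up statistic.
    average_total = statistics.mean(list(totals.values()))
    return totals, average_total


if __name__ == "__main__":
    sample = [("east", 10.0), ("west", 30.0), ("east", 20.0)]
    print(totals_by_region(sample))
```

The same pattern applies when the SQL side is a data warehouse and the Python side is an analytics library; only the connection and the follow-up computation change.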

Anaconda is the Open Data Science Platform Bringing Technology Together…

Numba, dask, xlwings, Airflow, Blaze

Distributed Systems

Business Intelligence

Web

Scientific Computing / HPC

Machine Learning / Statistics

Empowering the Data Science Team

Data Scientist Biz Analyst Data Engineer Developer DevOps

Explore & Analyze

Collaborate & Publish

Deploy & Operate

Modern Data Science Teams use…

RIGHT TECHNOLOGY FOR THE PROBLEM

Data Scientist: Hadoop / Spark, Programming Languages, Analytic Libraries, IDE, Notebooks, Visualization

Biz Analyst: Spreadsheets, Visualization, Notebooks, Analytic Development Environment

Data Engineer: Database / Data Warehouse, ETL

Developer: Programming Languages, Analytic Libraries, IDE, Notebooks, Visualization

DevOps: Database / Data Warehouse, Middleware, Programming Languages

Modern Data Science Teams Want…

DATA SCIENCE COLLABORATION

SELF-SERVICE DATA SCIENCE

DATA SCIENCE DEPLOYMENT

OPEN DATA SCIENCE

Anaconda is the leading Open Data Science platform powered by Python, the fastest growing data science language

•  Accelerate Time-to-Value

•  Connect Data, Analytics & Compute

•  Empower Data Science Teams

ACCELERATE Time-to-Value
•  INNOVATE faster through managed agile experimentation
•  MOVE from analysis to deployment immediately
•  DELIVER powerful results backed by a high-performance open data science platform

CONNECT Data, Analytics & Compute
•  LEVERAGE innovative open source analytics to extract value from data
•  MAXIMIZE your computational power to easily analyze all data
•  CONNECT and integrate all your data sources for predictive models

EMPOWER Data Science Teams
•  ITERATE quickly to create powerful analysis and predictive models
•  COLLABORATE and share with your data science team
•  PUBLISH interactive results to the business

Open Data Science Platform ACCELERATE. CONNECT. EMPOWER

Anaconda Gives Superpowers To People Who Change The World

Open Data Science: Vibrant and Growing Community

•  Python Community: 30M+
•  Packages in Anaconda: 720+
•  R Community: 16M+
•  Spark Python Usage: 50%+
•  Anaconda Downloads: 3M+

Anaconda is Trusted by Industry Leaders

Financial Services
•  Risk management, quant modeling, data exploration and processing, algorithmic trading, compliance reporting

Government
•  Fraud detection, data crawling, web & cyber data analytics, statistical modeling

Healthcare & Life Sciences
•  Genomics data processing, cancer research, natural language processing for health data science

High Tech
•  Customer behavior, recommendations, ad bidding, retargeting, social media analytics

Retail & CPG
•  Engineering simulation, supply chain modeling, scientific analysis

Oil & Gas
•  Pipeline monitoring, noise logging, seismic data processing, geophysics

DEMOS

Anaconda Enterprise Notebooks A collaborative environment for Data Science teams

Anaconda Fusion Bringing Data Science and Interactive Visualizations to Microsoft Excel

Anaconda Enterprise Notebooks A collaborative environment for data science teams

Search projects per tag and collaborators

Manage contributors

Manage collaborative projects

Organize notebooks, scripts and other files in projects

Manage teams’ collaborators

Save favorite projects

Data lineage

Interactive Visualizations

Advanced notebook extensions

Access to collaborative executable notebooks

Use advanced notebook extensions for enhanced collaboration

•  Publishing to Anaconda Repository integration
•  Revision control, commit and notebook diff comparison
•  Collaborative locking
•  Advanced interactive presentations editor

Easily publish and share your results with Business Leaders and Analysts

Leverage revision control, commit and diff comparison in notebooks

Notebook version tracking

Notebook change diff comparison

Commit your work so you can return to it and compare changes with other revisions
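To make the idea of comparing notebook revisions concrete, here is a minimal sketch using Python's standard `difflib` module. It is not how Anaconda Enterprise Notebooks implements its diff feature; it assumes each revision is simply a list of cell source strings, and `diff_cells`, `rev1`, and `rev2` are hypothetical names used only for illustration.

```python
import difflib


def diff_cells(old_cells, new_cells):
    """Return unified-diff lines between two notebook revisions.

    Each revision is assumed to be a list of cell source strings
    (a simplification chosen for this example).
    """
    old_lines = "\n".join(old_cells).splitlines()
    new_lines = "\n".join(new_cells).splitlines()
    return list(
        difflib.unified_diff(
            old_lines, new_lines, fromfile="rev1", tofile="rev2", lineterm=""
        )
    )


if __name__ == "__main__":
    before = ["x = 1", "print(x)"]
    after = ["x = 2", "print(x)"]
    for line in diff_cells(before, after):
        print(line)
```

Removed lines appear with a leading `-` and added lines with a leading `+`; two identical revisions produce an empty diff.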

Collaborate with notebook locking features

Edit slides layout and content

Edit slides theme

Present your slides with embedded interactive visualizations

Transform a notebook into an interactive presentation with an advanced editor

Anaconda Fusion Bringing Data Science and Interactive Visualizations to Microsoft Excel

Create browser-based Interactive Visualizations directly from your spreadsheet

Write your visualization directly into the formula

Access a powerful interactive toolbox

Enhance exploration with a customizable hover tool

Interactively explore your spreadsheet data with the cross filter app

Select the variables to plot and the color, palette, and size of the points

Immediately view your updates in the visualization

Access advanced Machine Learning models to cluster your data

Simple formulas for advanced modeling applications

Easily input variables into algorithms with interactive widgets

Access a wide range of modeling algorithms

Anaconda Enterprise Open Data Science Platform

DATA SCIENCE COLLABORATION: Empower the Data Science Team
•  Explore data interactively
•  Build, test, validate data science models with Python & R
•  Publish, share & reproduce data science results easily

SELF-SERVICE DATA SCIENCE: Arm Citizen Data Scientists with Intelligent Apps
•  Empower your team with intelligent & interactive apps
•  Leverage data science from Microsoft Excel®
•  Create portable data transformations for reuse

DATA SCIENCE DEPLOYMENT: Move Data Science into Production to Get Results
•  Go from ad hoc to production deployment easily
•  Launch & provision distributed environments
•  Boost performance by maximizing your computational power

Starting the Journey to Open Data Science

Journey to Open Data Science

What are typical enterprise barriers to adopting Open Data Science?

1.  Reproducibility

2.  Governance

3.  Open source assurance

Embrace Innovation Without Anarchy

From http://www.slideshare.net/RevolutionAnalytics/r-at-microsoft

Reproducibility

Embrace Innovation Without Anarchy

Controlled access to data science assets

Governance

Embrace Innovation Without Risk

Mitigate legal risk through selection of an appropriate OSS license and vendor-backed open source assurance

Open Source Assurance

Next Steps

Download Anaconda
•  Download: continuum.io/downloads
•  Documentation: docs.continuum.io/

Check Out Anaconda Enterprise
•  Anaconda with scalable high performance, team collaboration & governance: continuum.io/anaconda-subscriptions/anaconda-enterprise

Get Data Science Training
•  Private corporate training and public online training formats available at continuum.io/training

Migrate Your First Model to Python
•  Engage us for migrating SAS models to Python; to learn more, contact [email protected]

Anaconda Subscriptions

Thank You

Michele Chambers Twitter: @mcAnalytics

Christine Doig Twitter: @ch_doig

Email: [email protected] Twitter: @ContinuumIO

Continuum Analytics We empower data science teams to make the world a better place

We Empower Data Science Teams to Make the World Better