© 2016 Continuum Analytics - Confidential & Proprietary 1
Breaking Data Science Open
How Open Data Science is Eating the World
About Us
Michele Chambers @mcAnalytics
• EVP Product & CMO, Continuum Analytics
• M.B.A., Duke University; B.S., Computer Engineering
• Author
  • Breaking Data Science Open (O’Reilly)
  • Modern Analytics Methodologies: Driving Business Value with Analytics (Pearson FT Press)
  • Advanced Analytics Methodologies: Driving Business Value with Analytics (Pearson FT Press)
  • Big Data, Big Analytics (Wiley)
About Us
Christine Doig @ch_doig
• Sr. Data Scientist & Product Mgr., Continuum Analytics
• M.S. in Industrial Engineering, Polytechnic University of Catalonia
• Open source advocate and speaker: PyData, EuroPython, SciPy, PyCon
• Author: Breaking Data Science Open (O’Reilly)
• 5+ years in advanced analytics, operations research, and machine learning in energy, manufacturing & banking
Business Intelligence & Predictive Analytics
Using Data for Insight & Human-in-the-Loop Actions
Cognitive Intelligence
Using Data & Deep Learning to Make Recommendations
Open Data Science
Connecting Data, Analytics & Computation
Data Science is…
“An interdisciplinary field about processes and systems to extract knowledge or insights from data in various forms.”
Open Data Science is…
an inclusive movement that makes the open source tools of data science (data, analytics, and computation) work together easily as a connected ecosystem
Data Science is not just Machine Learning…
Distributed Systems
Business Intelligence
Machine Learning / Statistics
Web
Scientific Computing / HPC
Data Science is Interdisciplinary…
• Distributed Systems: Hadoop, Spark
• Business Intelligence: Data warehouse, querying, reporting
• Machine Learning / Statistics: Classification, deep learning, regression, PCA
• Web: Web crawling, scraping, 3rd-party data & API providers, predictive services & APIs
• Scientific Computing / HPC: GPUs, multi-cores
Open Source Communities Create Powerful Technology for Data Science
Numba, dask, xlwings, Airflow, Blaze
Distributed Systems
Business Intelligence
Web
Scientific Computing / HPC
Machine Learning / Statistics
Python is the Common Language
Numba, dask, xlwings, Airflow, Blaze
Distributed Systems
Business Intelligence
Web
Scientific Computing / HPC
Machine Learning / Statistics
Python’s Not the Only One…
Distributed Systems
Business Intelligence
Web
Scientific Computing / HPC
SQL
Machine Learning / Statistics
But it’s also a Great Glue Language
Distributed Systems
Business Intelligence
Machine Learning / Statistics
Web
Scientific Computing / HPC
SQL
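The "glue language" idea can be illustrated with a minimal sketch that uses only the Python standard library: SQL does the aggregation, and Python does the follow-up statistics. The table name, column names, sample figures, and the function `monthly_revenue_summary` are hypothetical and chosen only for illustration; they do not come from the slides.

```python
import sqlite3
import statistics


def monthly_revenue_summary(rows):
    """Aggregate rows with SQL, then post-process the result in Python.

    `rows` is an iterable of (month, amount) tuples (hypothetical schema).
    """
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        conn.execute("CREATE TABLE sales (month TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        # SQL handles the grouping and summing.
        totals = conn.execute(
            "SELECT month, SUM(amount) FROM sales "
            "GROUP BY month ORDER BY month"
        ).fetchall()
    finally:
        conn.close()

    # Python handles the statistics on the aggregated values.
    amounts = [total for _, total in totals]
    return {
        "totals": dict(totals),
        "mean": statistics.mean(amounts),
        "stdev": statistics.pstdev(amounts),  # population standard deviation
    }


# Made-up sample data, for illustration only.
SAMPLE_SALES = [
    ("2016-01", 100.0),
    ("2016-01", 50.0),
    ("2016-02", 200.0),
    ("2016-03", 250.0),
]

summary = monthly_revenue_summary(SAMPLE_SALES)
print(summary["totals"])
print(summary["mean"], summary["stdev"])
```

The same pattern would apply to a real data warehouse by swapping the in-memory connection for the appropriate database driver.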
Anaconda is the Open Data Science Platform Bringing Technology Together…
Numba, dask, xlwings, Airflow, Blaze
Distributed Systems
Business Intelligence
Web
Scientific Computing / HPC
Machine Learning / Statistics
Empowering the Data Science Team
Data Scientist Biz Analyst Data Engineer Developer DevOps
Explore & Analyze
Collaborate & Publish
Deploy & Operate
Modern Data Science Teams use…
Data Scientist
• Hadoop / Spark
• Programming Languages
• Analytic Libraries
• IDE
• Notebooks
• Visualization
Biz Analyst
• Spreadsheets
• Visualization
• Notebooks
• Analytic Development Environment
Data Engineer
• Database / Data Warehouse
• ETL
Developer
• Programming Languages
• Analytic Libraries
• IDE
• Notebooks
• Visualization
DevOps
• Database / Data Warehouse
• Middleware
• Programming Languages
RIGHT TECHNOLOGY FOR THE PROBLEM
Modern Data Science Teams Want…
DATA SCIENCE COLLABORATION
SELF-SERVICE DATA SCIENCE
DATA SCIENCE DEPLOYMENT
OPEN DATA SCIENCE
Anaconda is the leading Open Data Science platform, powered by Python, the fastest-growing data science language
• Accelerate Time-to-Value
• Connect Data, Analytics & Compute
• Empower Data Science Teams
ACCELERATE Time-to-Value
• INNOVATE faster through managed agile experimentation
• MOVE from analysis to deployment immediately
• DELIVER powerful results backed by a high-performance open data science platform
CONNECT Data, Analytics & Compute
• LEVERAGE innovative open source analytics to extract value from data
• MAXIMIZE your computational power to easily analyze all data
• CONNECT and integrate all your data sources for predictive models
EMPOWER Data Science Teams
• ITERATE quickly to create powerful analysis and predictive models
• COLLABORATE and share with your data science team
• PUBLISH interactive results to the business
Open Data Science Platform ACCELERATE. CONNECT. EMPOWER
Anaconda Gives Superpowers To People Who Change The World
Open Data Science Vibrant and Growing Community
• Python Community: 30M+
• Packages in Anaconda: 720+
• R Community: 16M+
• Spark Python Usage: 50%+
• Anaconda Downloads: 3M+
Anaconda is Trusted by Industry Leaders
Financial Services
• Risk management, quant modeling, data exploration and processing, algorithmic trading, compliance reporting
Government
• Fraud detection, data crawling, web & cyber data analytics, statistical modeling
Healthcare & Life Sciences
• Genomics data processing, cancer research, natural language processing for health data science
High Tech
• Customer behavior, recommendations, ad bidding, retargeting, social media analytics
Retail & CPG
• Engineering simulation, supply chain modeling, scientific analysis
Oil & Gas
• Pipeline monitoring, noise logging, seismic data processing, geophysics
DEMOS
• Anaconda Enterprise Notebooks: A collaborative environment for data science teams
• Anaconda Fusion: Bringing data science and interactive visualizations to Microsoft Excel
Anaconda Enterprise Notebooks
A collaborative environment for data science teams
Search projects per tag and collaborators
Manage contributors
Manage collaborative projects
Organize notebooks, scripts and other files in projects
Manage teams’ collaborators
Save favorite projects
Data lineage
Interactive Visualizations
Advanced notebook extensions
Access to collaborative executable notebooks
Use advanced notebook extensions for enhanced collaboration
• Publishing through Anaconda Repository integration
• Revision control, commit, and notebook diff comparison
• Collaborative locking
• Advanced interactive presentations editor
Easily publish and share your results with Business Leaders and Analysts
Leverage revision control, commit and diff comparison in notebooks
Notebook version tracking
Notebook change diff comparison
Commit your work so you can return to earlier revisions and compare changes between them
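As a rough illustration of what a notebook diff comparison involves, the sketch below compares the cell sources of two notebook revisions using Python's standard `difflib`. It is not the Anaconda Enterprise Notebooks implementation; the helper names `cell_sources` and `diff_notebooks`, the revision labels, and the simplified notebook dictionaries are assumptions made for this example.

```python
import difflib


def cell_sources(notebook):
    """Return each cell's source as a single string.

    Assumes a simplified notebook dict with a "cells" list, where each
    cell's "source" is either a string or a list of strings.
    """
    sources = []
    for cell in notebook.get("cells", []):
        src = cell.get("source", "")
        sources.append("".join(src) if isinstance(src, list) else src)
    return sources


def diff_notebooks(old, new):
    """Return a unified diff (list of lines) between two notebook revisions."""
    old_lines = "\n".join(cell_sources(old)).splitlines()
    new_lines = "\n".join(cell_sources(new)).splitlines()
    return list(
        difflib.unified_diff(
            old_lines, new_lines, fromfile="rev1", tofile="rev2", lineterm=""
        )
    )


# Two hypothetical revisions of the same one-cell notebook.
REV1 = {"cells": [{"cell_type": "code", "source": ["x = 1\n", "print(x)"]}]}
REV2 = {"cells": [{"cell_type": "code", "source": ["x = 2\n", "print(x)"]}]}

for line in diff_notebooks(REV1, REV2):
    print(line)
```

Comparing only cell sources ignores outputs and metadata, which keeps the diff readable; a fuller tool would likely treat those separately.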
Collaborate with notebook locking features
Edit slides layout and content
Edit slides theme
Present your slides with embedded interactive visualizations
Transform a notebook into an interactive presentation with an advanced editor
Anaconda Fusion
Bringing data science and interactive visualizations to Microsoft Excel
Create browser-based Interactive Visualizations directly from your spreadsheet
Write your visualization directly into the formula
Access a powerful interactive toolbox
Enhance exploration with a customizable hover tool
Interactively explore your spreadsheet data with the cross filter app
Select the variables to plot and the color, palette, and size of the points
Immediately view your updates in the visualization
Access advanced Machine Learning models to cluster your data
Simple formulas for advanced modeling applications
Easily input variables into algorithms with interactive widgets
Access a wide range of modeling algorithms
Anaconda Enterprise Open Data Science Platform
DATA SCIENCE COLLABORATION
Empower the Data Science Team
• Explore data interactively
• Build, test, validate data science models with Python & R
• Publish, share & reproduce data science results easily
SELF-SERVICE DATA SCIENCE
Arm Citizen Data Scientists with Intelligent Apps
• Empower your team with intelligent & interactive apps
• Leverage data science from Microsoft Excel®
• Create portable data transformations for reuse
DATA SCIENCE DEPLOYMENT
Move Data Science into Production to Get Results
• Go from ad hoc to production deployment easily
• Launch & provision distributed environments
• Boost performance by maximizing your computational power
Starting the Journey to Open Data Science
What are typical enterprise barriers to adopting Open Data Science?
1. Reproducibility
2. Governance
3. Open source assurance
Embrace Innovation Without Anarchy
From http://www.slideshare.net/RevolutionAnalytics/r-at-microsoft
Reproducibility
Embrace Innovation Without Anarchy
Controlled access to data science assets
Governance
Embrace Innovation Without Risk
Mitigate legal risk through selection of an appropriate OSS license and vendor-backed open source assurance
Open Source Assurance
Next Steps
Download Anaconda
• Download: continuum.io/downloads
• Documentation: docs.continuum.io/
Check Out Anaconda Enterprise
• Anaconda with scalable high performance, team collaboration & governance: continuum.io/anaconda-subscriptions/anaconda-enterprise
Get Data Science Training
• Private corporate training and public online training formats available at continuum.io/training
Migrate Your First Model to Python
• Engage us for migrating SAS models to Python. To learn more, contact [email protected]
Thank You
Michele Chambers, Twitter: @mcAnalytics
Christine Doig, Twitter: @ch_doig
Email: [email protected], Twitter: @ContinuumIO