Hassle Free Data Science Apps with Bokeh Webinar

Hassle-Free Data Science Apps with Bokeh

Presenters

Peter Wang is the CTO and Co-founder of Continuum Analytics and the creator of Bokeh.

He has been developing commercial scientific computing and visualization software for over 15 years.

As a creator of the PyData conference, he devotes time and energy to growing the Python data community, and advocating and teaching Python at conferences worldwide.

Bryan Van de Ven is the lead developer on the Bokeh project.

He holds an undergraduate degree in Computer Science & Mathematics from UT Austin, and a Master's degree in Physics from UCLA.

Previously, Bryan developed data exploration and visualization software for sonar feature detection, financial risk modeling, and fluid mixing simulation.

Overview

• What is Bokeh?

• Overview and tour of major features

• Demo 1: Scikit-learn clustering

• Demo 2: Gapminder

• Demo 3: Streaming data

• Really big data: Preview of data shading

• Q&A

Overview of Anaconda

Anaconda is the modern open source analytics platform powered by Python, the fastest growing open data science language

• Easy to Build, Maintain & Deploy Analytics
• Talks with Everything, Runs Anywhere
• High Performance, Scalable Analytics

Anaconda: Accelerating Adoption of Python for Enterprises

• COLLABORATIVE NOTEBOOKS with publication, authentication, & search (Jupyter/IPython)
• PYTHON & PACKAGE MANAGEMENT for Hadoop & Apache stack (Spark)
• PERFORMANCE with compiled Python for lightning fast execution (Numba)
• VISUAL APPS for interactivity, streaming, & Big (Bokeh)
• SECURE & ROBUST REPOSITORY of data science libraries, scripts, & notebooks (Conda)
• ENTERPRISE DATA INTEGRATION with optimized connectors & out-of-core processing (NumPy & Pandas)

Anaconda for Data Science: Empowering Everyone on the Team

Data Scientist
• Advanced analytics with Python & R
• Simplified library management
• Easily share data science notebooks & packages

Developer
• Support for common APIs & data formats
• Common language with data scientists
• Python extensibility with C, C++, etc.

Business Analyst
• Collaborative interactive analytics with notebooks
• Rich browser based visualizations
• Powerful MS Excel integration

Data Engineer
• Powerful & efficient libraries for data transformations
• Robust processing for noisy dirty data
• Support for common APIs & data formats

Ops
• Validated source of up-to-date packages including indemnification
• Agile Enterprise Package Management
• Supported across platforms

Computational Scientist
• Rich set of advanced analytics
• Trusted & production ready libraries for numerics
• Simplified scale up & scale out on clusters & GPUs

Modern Analytics Stack

Write Once, Deploy Anywhere

[Diagram: MANAGED PYTHON stack with the layers Explore & Visualize; Python & R Advanced Analytics; High Performance & Scalability; Data Engineering & Analysis; Collaboration & Integration]

• Servers: Linux, Windows, OSX
• GPUs & High End Workstations: Linux & Windows; NVIDIA, AMD, X86/ARM
• Clusters: Yarn, Mesos, MPI, Power8, LSF, Sun Grid Engine
• NoSQL: MongoDB, Cassandra/DataStax
• Hadoop: Cloudera, Hortonworks, Apache Hadoop & Spark
• Files: Microsoft Excel, Trifacta, Import.io
• DW & SQL: Any SQL DB, Any SQL DW, Impala

Bokeh Overview & Tour

Bokeh

http://bokeh.pydata.org

• Interactive visualization
• Novel graphics
• Streaming, dynamic, large data
• For the browser, with or without a server
• No need to write Javascript
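As a rough illustration of the "without a server" and "no Javascript" points, the sketch below builds a plot in Python and renders it to a standalone HTML string. It assumes a recent Bokeh release (the API has changed since this webinar was given); the data values and the title are made up.

```python
from bokeh.embed import file_html
from bokeh.plotting import figure
from bokeh.resources import CDN

# Made-up sample data for illustration only.
x = [1, 2, 3, 4, 5]
y = [6, 7, 2, 4, 5]

# Build the plot entirely in Python; no Javascript is written by hand.
plot = figure(
    title="Minimal Bokeh example",
    x_axis_label="x",
    y_axis_label="y",
    tools="pan,wheel_zoom,box_zoom,reset",
)
plot.line(x, y, line_width=2)

# Render a standalone HTML page. BokehJS is loaded from a CDN, so no
# Bokeh server is needed to view or interact with the plot.
html = file_html(plot, CDN, "Minimal Bokeh example")
```

Writing `html` to a file and opening it in a browser should show an interactive plot with pan and zoom tools.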

Versatile Plots

Novel Graphics

Linked Plots (Notebook 2)

• Easy to show multiple plots and link them
• Easy to link data selections between plots
• Can easily customize the kind of linkage straight from Python, without needing to fiddle around with JS
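A minimal sketch of the two kinds of linkage listed above, assuming a recent Bokeh release. Sharing a range object links panning and zooming, and sharing a data source links selections. The data and column names are invented for the example and are not taken from the webinar notebook.

```python
from bokeh.layouts import gridplot
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Made-up data shared by both plots.
source = ColumnDataSource(data=dict(
    x=[1, 2, 3, 4, 5],
    y0=[2, 5, 3, 8, 6],
    y1=[9, 4, 7, 1, 3],
))

tools = "pan,wheel_zoom,box_select,reset"

left = figure(title="y0 vs x", tools=tools)
left.scatter("x", "y0", source=source, size=10)

# Passing the same range object links panning and zooming on the x axis.
right = figure(title="y1 vs x", tools=tools, x_range=left.x_range)
# Using the same data source links selections between the two plots.
right.scatter("x", "y1", source=source, size=10)

grid = gridplot([[left, right]])
```

Selecting points with the box select tool in one plot should highlight the same rows in the other, because both glyphs read from one `ColumnDataSource`.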

Flexible Tools (Notebook 3)

• Many useful tools with built-in functionality
• Easy to extend with Javascript, if so inclined

rBokeh

http://hafen.github.io/rbokeh

Plays well with R ecosystem: HTMLwidget, RMarkdown…

rBokeh with RStudio & Shiny

Architecture

Traditional Web Visualization

[Diagram: Data (CSV, SQL); Server-side Data Processing: Python, Java, etc.; HTML, CSS, Javascript; JavaScript Plotting library (D3, Highcharts, Flot, nvd3, dcjs); Browser]

Tech:
• Python/R/Java
• HTML & browser compat
• CSS/LESS/Sass
• JS plotting library API
• Javascript
  • jQuery, underscore
  • svg, canvas2D
  • webGL, three.js
  • React
  • Angular
  • node.js, browserify, gulp, grunt, npm, …

Traditional Web Viz - Interaction

[Diagram: User; Client (HTML, CSS, Javascript); Server (Python, Ruby, Java, .NET); Data; Data’]

Simple dashboard: Server language generating HTML, JS, CSS styling, subset of data

Handling user interaction: Custom Javascript, calling Server endpoint, which generates updated JSON or JS that gets pushed back to client via websocket

Bokeh Conceptual Architecture

[Diagram: User; Server: Bokeh (Python, R, Scala) and Data; JSON; Client: BokehJS (HTML, CSS)]

Simple dashboard: Single language, no need to write HTML, JS, CSS

Handling user interaction: Single language that you already know; interactive data updates feel seamless to the user
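To make the JSON hand-off between Python and BokehJS more concrete, the sketch below serializes a plot with `bokeh.embed.json_item`. It assumes a recent Bokeh release; `"myplot"` is a hypothetical id for the page element that BokehJS would draw into.

```python
import json

from bokeh.embed import json_item
from bokeh.plotting import figure

plot = figure(title="Serialized from Python")
plot.line([1, 2, 3], [4, 6, 5])

# json_item() returns a JSON-serializable description of the plot.
# "myplot" is a hypothetical id of the DOM element to draw into.
item = json_item(plot, "myplot")
payload = json.dumps(item)

# Inspect the top-level keys of what would be sent to the browser.
print(sorted(item.keys()))
```

The `payload` string is what a web endpoint could return. On the client, BokehJS turns it back into a rendered plot, so the Python side never emits HTML, CSS, or Javascript directly.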

Comparison Chart

Traditional web visualization (Server: Python, Ruby, Java, .NET; Client: HTML, CSS, Javascript)
• Skills required: 5-10 skills
• Time to market: weeks to months
• Server code: 100s to 1000s lines

Bokeh (Server: Python, R; Client: BokehJS)
• Skills required: ~1 skill
• Time to market: minutes
• Server code: 0

Some Bokeh Users

Community & Adoption

Github
• 3500+ watchers
• 680 forks

Mailing list
• 400+ members
• 150+ posts in November

Downloads
• 21,500 / month (conda)
• 10,000 / month (pip)

http://cecp.mit.edu

Embeds Well

Demo: Clustering with Scikit-learn

Demo Overview

In this demo, we will build a basic application which lets us visualize different kinds of clustering approaches with Scikit-learn.

• We will use a drop-down to select the algorithm
• We will write a Python handler function which responds to the user action, and pushes an update to the plot in the browser.

• Notebook for basic viz: ~25 LOC
• Example app with 1 dropdown: < 100 LOC
• Multiple dropdown and sliders: < 200 LOC
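The demo code itself is not reproduced in these slides. The sketch below is one way such an app could look, assuming recent Bokeh and scikit-learn releases. The synthetic data, the two algorithms offered, the palette, and the names `cluster_colors` and `update` are all choices made for this illustration.

```python
from bokeh.io import curdoc
from bokeh.layouts import column
from bokeh.models import ColumnDataSource, Select
from bokeh.plotting import figure
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs

PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c"]

# Synthetic data standing in for whatever dataset the demo used.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

ALGORITHMS = {
    "KMeans": lambda: KMeans(n_clusters=3, n_init=10, random_state=0),
    "AgglomerativeClustering": lambda: AgglomerativeClustering(n_clusters=3),
}


def cluster_colors(name):
    """Fit the named algorithm and map each point's label to a color."""
    labels = ALGORITHMS[name]().fit_predict(X)
    return [PALETTE[int(label) % len(PALETTE)] for label in labels]


source = ColumnDataSource(data=dict(
    x=X[:, 0].tolist(),
    y=X[:, 1].tolist(),
    color=cluster_colors("KMeans"),
))

plot = figure(title="KMeans")
plot.scatter("x", "y", color="color", source=source)

select = Select(title="Algorithm", value="KMeans", options=list(ALGORITHMS))


def update(attr, old, new):
    """Python handler: re-cluster and push the new colors to the browser."""
    source.data = dict(source.data, color=cluster_colors(new))
    plot.title.text = new


select.on_change("value", update)
curdoc().add_root(column(select, plot))
```

Saved under a hypothetical file name such as `clustering_app.py`, this could be run with `bokeh serve clustering_app.py`. Picking a different entry in the drop-down calls `update` on the server, and the changed data source is synchronized to the browser.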

Demo: Gapminder

Demo Overview

This demo shows how we can embed a little bit of Javascript to make a server-less but very capable interactive visualization.

• We will build up the visualization from the ground up, showing different kinds of Bokeh plotting primitives

• We will do it inside the Jupyter Notebook, so we can see our changes immediately

• Then we will wire up an interactive slider

The resulting interactive visualization will be embedded in the browser, with no reliance on a server to handle user interactions.
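A small sketch of the slider wiring described here, assuming a recent Bokeh release. The numbers are made up and are not the Gapminder dataset; the names `all_data`, `visible`, and `rows_for` are illustrative. The Javascript snippet follows the documented `CustomJS` pattern, but it runs only in the browser and was not exercised as part of this sketch.

```python
from bokeh.layouts import column
from bokeh.models import ColumnDataSource, CustomJS, Slider
from bokeh.plotting import figure

# Made-up numbers for illustration; this is not the Gapminder dataset.
all_data = ColumnDataSource(data=dict(
    year=[1990, 1990, 2000, 2000, 2010, 2010],
    x=[2.1, 5.3, 2.0, 4.8, 1.9, 4.1],
    y=[62.0, 55.0, 66.0, 58.0, 70.0, 63.0],
))


def rows_for(year):
    """Select the rows for one year (used for the initial view)."""
    data = all_data.data
    keep = [i for i, value in enumerate(data["year"]) if value == year]
    return dict(x=[data["x"][i] for i in keep], y=[data["y"][i] for i in keep])


visible = ColumnDataSource(data=rows_for(1990))

plot = figure(title="Slider-driven scatter", x_range=(0, 6), y_range=(50, 75))
plot.scatter("x", "y", source=visible, size=12)

slider = Slider(start=1990, end=2010, step=10, value=1990, title="Year")

# This small Javascript snippet runs in the browser, so no server is needed.
callback = CustomJS(args=dict(all_data=all_data, visible=visible), code="""
    const year = cb_obj.value;
    const full = all_data.data;
    const out = {x: [], y: []};
    for (let i = 0; i < full.year.length; i++) {
        if (full.year[i] === year) {
            out.x.push(full.x[i]);
            out.y.push(full.y[i]);
        }
    }
    visible.data = out;
""")
slider.js_on_change("value", callback)

layout = column(slider, plot)
```

All of the data is shipped to the browser once, and the callback only swaps which rows are visible. That design is what removes the need for a server round trip on each slider move.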

Demo: Animation & Streaming example

Demo Overview

In this demo, we will demonstrate how the Bokeh server makes it easy to visualize streaming and dynamic data.

• A minimal example with < 50 LOC
• Demonstrates ease of pushing data from Python code into the browser
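Pushing data from Python into the browser could be sketched as follows, assuming a recent Bokeh release. The sine-wave data, the 100 ms period, and the `ROLLOVER` value are arbitrary choices for this example.

```python
import math
from itertools import count

from bokeh.io import curdoc
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

ROLLOVER = 100  # keep only the most recent points

source = ColumnDataSource(data=dict(t=[], value=[]))

plot = figure(title="Streaming sine wave")
plot.line("t", "value", source=source)

steps = count()


def update():
    """Append one new sample; older samples beyond ROLLOVER are dropped."""
    step = next(steps)
    source.stream(
        dict(t=[step], value=[math.sin(step / 10.0)]),
        rollover=ROLLOVER,
    )


doc = curdoc()
doc.add_root(plot)
doc.add_periodic_callback(update, 100)  # call update() every 100 ms
```

When run under `bokeh serve`, each call to `source.stream` sends only the appended rows to connected browsers rather than the whole data set.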

• Realtime audio sampling via PyAudio, realtime FFT via Numpy
• 30 fps
• ~200 lines of code
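The FFT step of such a spectrum display might look like the sketch below. A synthetic 440 Hz tone stands in for one frame of microphone input, so PyAudio is not needed here. The sample rate and frame size are assumed values, not those of the demo.

```python
import numpy as np

SAMPLE_RATE = 8000  # samples per second (assumed)
FRAME_SIZE = 800    # samples per frame (assumed)

# Synthetic 440 Hz tone standing in for one frame of microphone input.
t = np.arange(FRAME_SIZE) / SAMPLE_RATE
frame = np.sin(2 * np.pi * 440.0 * t)

# Real-input FFT: magnitudes for frequencies from 0 up to SAMPLE_RATE / 2.
spectrum = np.abs(np.fft.rfft(frame))
freqs = np.fft.rfftfreq(FRAME_SIZE, d=1.0 / SAMPLE_RATE)

# Frequency of the strongest component in this frame.
peak_hz = freqs[np.argmax(spectrum)]
```

In a live app, `freqs` and `spectrum` would be written into a Bokeh data source on every frame so that the plot in the browser follows the audio.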

Bokeh: Progress and Future

Visualizing Big Data: Preview of “Data Shading”

Billions and billions…

Data Shading Main Points

• When trying to visualize millions of points, browser vs. rich client doesn’t really matter
• Raft of common problems that are ignored: Overdraw, over- & under-saturation, clipping, coarse binning
• Statistical transformations of data are a first-class aspect of the visualization
• Rapid iteration of visual styles & configs, interactive selections and filtering are key concerns in data exploration

When data is large, you don’t know when the viz is lying.
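One common way to avoid overdraw and saturation is to aggregate points onto a pixel-sized grid and then map the counts to intensities. The NumPy sketch below illustrates that general idea only. It is not the data shading implementation previewed in the webinar, and the grid size, random data, and log transfer function are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# One million synthetic points; too many to draw as individual glyphs.
n = 1_000_000
x = rng.normal(0.0, 1.0, n)
y = rng.normal(0.0, 1.0, n)

# Aggregate: count the points falling in each cell of a 300 x 200 grid.
counts, x_edges, y_edges = np.histogram2d(
    x, y, bins=(300, 200),
    range=[[x.min(), x.max()], [y.min(), y.max()]],
)

# Transfer: map counts to [0, 1] with a log scale so that dense regions
# do not saturate and sparse regions stay visible.
image = np.log1p(counts) / np.log1p(counts.max())
```

Because every point contributes to exactly one cell, nothing is hidden by overdraw, and swapping the transfer function changes the look without re-reading the data.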

Data Shading Pipeline

[Diagram: Data → Project / Synthesize → Scene → Sample / Raster → Aggregates → Transfer → Image]

[Diagram: Source Data → Data Transforms → Data Tables → Visual Mappings → Visual Abstraction → View Transforms → Views. Additional labels: Selection, Aggregation, Transfer; Significant Set, Aggregates]

Anaconda Subscriptions and Resources

Anaconda Subscriptions

• ANACONDA: Open Source Modern Analytics Platform Powered by Python. Community Support. FREE FOREVER. DOWNLOAD
• ANACONDA PRO: Anaconda with Support & Indemnification. Priority 1 support. Starting at $10,000 per year + $1,000 per year for additional users. CONTACT US
• ANACONDA WORKGROUP: Anaconda with High Performance and Team Collaboration. Priority 1 support. Starting at $30,000 per year + $3,000 per year for additional users. CONTACT US
• ANACONDA ENTERPRISE: Anaconda with Scalable High Performance and Team Collaboration. Priority 1 support with Dedicated Customer Support Rep. Starting at $60,000 per year + $6,000 per year for additional users. CONTACT US

Contact Information and Additional Details

• Contact sales@continuum.io for more information about Anaconda subscriptions, consulting, or training
• View documentation and examples at bokeh.pydata.org
• View demo notebooks on Anaconda Cloud: notebooks.anaconda.org/pwang/

Thank you

Email: sales@continuum.io

Twitter: @ContinuumIO

Peter Wang
Twitter: @pwang

Bokeh
Twitter: @bokehplots
