A tour (of a small part) of the Python visualization landscape

PyHEP Workshop - Abingdon 2019

Philipp Rudiger (@philippjfr)

PhD in Computational Neuroscience @ University of Edinburgh
Left academia to switch to software engineering immediately
Working as consultant for Anaconda Inc. for 4+ years
Spend most of my time on open-source visualization tools

Anaconda Open Source Tools

Anaconda employs 17+ open-source developers
Conda (4), Pandas (2), Dask (2), Numba (4), HoloViz (4), Intake (1), scikit-learn (1)

Packaging

Used by 18 million users per month
Serve 1 PB of packages per month

Data intake & processing

pyviz.org provides an overview of all Python visualization tools
Actively working with maintainers to expand guidance and materials

Tools that work well together and with the rest of the ecosystem
Take you from exploratory workflows smoothly to interactive dashboards
Focused on iterating within notebooks but not tied to notebooks

High-level plotting interface like DataFrame.plot
Renders Bokeh plots using HoloViews
Supports interactivity and datashading
Supports Pandas, Dask, Streamz, XArray, GeoPandas, NetworkX

In [2]: import dask.dataframe as dd
        import hvplot.pandas

In [3]: ddf = dd.read_parquet('/Users/philippjfr/development/datashader/examples/data/nyc_taxi_wide.parq').compute()
        print('The parquet table has %d rows.\n' % len(ddf))
        print('The table has the following columns: %s' % list(ddf.columns))

The parquet table has 11842094 rows.

The table has the following columns: ['tpep_pickup_datetime', 'tpep_dropoff_datetime', 'passenger_count', 'trip_distance', 'pickup_x', 'pickup_y', 'dropoff_x', 'dropoff_y', 'fare_amount', 'tip_amount', 'dropoff_hour', 'pickup_hour']

In [4]: ddf.hvplot.hist('pickup_hour', bins=24)

Out[4]: [interactive plot: histogram of pickup_hour with 24 bins]

In [5]: ddf[(ddf.trip_distance>0) & (ddf.trip_distance<10)].sort_values('pickup_hour').hvplot.violin(
            'trip_distance', by='pickup_hour', c='pickup_hour', cmap='Category20')

Out[5]: [interactive plot: violin plot of trip_distance by pickup_hour]

Fast 2D histogramming/binning using Numba
Parallelized using dask
Support for different glyphs including points, lines, rasters, areas, meshes etc.
Working on GPU support in collaboration with NVIDIA
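To make "2D binning" concrete, here is a dependency-free sketch of the idea: every point is dropped into a cell of a fixed-size grid, and only the per-cell counts are kept for rendering. This is not Datashader's code (which, as noted above, relies on Numba and dask); `bin_points` and its parameters are hypothetical names used only for illustration.

```python
# Conceptual sketch only: count points per cell of a fixed-size grid.
# bin_points is a hypothetical helper, not part of Datashader.

def bin_points(xs, ys, x_range, y_range, width, height):
    """Count how many (x, y) points fall into each cell of a width x height grid."""
    x_min, x_max = x_range
    y_min, y_max = y_range
    grid = [[0] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Points outside the viewport are skipped, as they would be when zoomed in.
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue
        # Map the coordinate to a column/row index; clamp the upper edge.
        col = min(int((x - x_min) / (x_max - x_min) * width), width - 1)
        row = min(int((y - y_min) / (y_max - y_min) * height), height - 1)
        grid[row][col] += 1
    return grid


xs = [0.1, 0.2, 0.9, 0.95, 0.5, 2.0]
ys = [0.1, 0.15, 0.9, 0.8, 0.5, 0.5]
counts = bin_points(xs, ys, x_range=(0, 1), y_range=(0, 1), width=2, height=2)
print(counts)  # [[2, 0], [0, 3]]
```

The size of the result depends only on the grid, not on the number of points, which is why this kind of aggregation suits tables with millions of rows.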

In [6]: ddf.hvplot.scatter(
            'pickup_x', 'pickup_y', datashade=True, cmap='viridis',
            xaxis=False, yaxis=False)

Out[6]: [interactive plot: datashaded scatter of pickup locations]

In [7]: ddf.hvplot.scatter(
            'pickup_x', 'pickup_y', groupby='pickup_hour', tiles='Wikipedia',
            datashade=True, cmap='viridis', xaxis=False, yaxis=False)

Out[7]: [interactive plot: datashaded pickup locations over map tiles (© OpenStreetMap contributors, https://www.openstreetmap.org/copyright), with a pickup_hour widget]

High-level visualization library focused on annotating your data with semantic metadata
Allows building dynamic pipelines to process data as events come in
Supports streaming data and many data formats (pandas, dask, xarray, soon cuDF)
Renders to different plotting backends including Bokeh, Matplotlib and Plotly

In [8]: import holoviews as hv
        from holoviews.operation.datashader import datashade, dynspread

        ds = hv.Dataset(ddf)

        geo = datashade(hv.Points(ds, kdims=['pickup_x', 'pickup_y'])).opts(
            width=600, xaxis=None, yaxis=None)
        fares = datashade(hv.Points(ds, kdims=['fare_amount', 'tip_amount'])).opts(
            width=600, xlim=(0, 50), ylim=(0, 10))

        link = hv.selection.link_selections.instance()

        link(geo + fares)

Out[8]: [two linked datashaded plots: pickup locations, and tip_amount against fare_amount]

Easy apps and dashboards for any viz library.
Panel is a relatively new library, but built on Bokeh's solid 1.x release.
Panel lets all your notebooks double as apps or dashboards.
Use just about any plotting library, image type, or other objects.
Develop dashboards in a notebook, deploy, revise, repeat.
Fully usable with Jupyter and without Jupyter installed.

To dive in, let's say we have a dataset to explore, such as this UCI ML dataset measuring the environment in a meeting room (http://archive.ics.uci.edu/ml/datasets/Occupancy+Detection+).

In [9]: import numpy as np   # used by find_outliers below
        import pandas as pd
        import panel as pn   # used from In [12] onwards
        import matplotlib.pyplot as plt

        data = pd.read_csv('./occupancy.csv', index_col='date', parse_dates=True)
        data.tail()

Out[9]:
date                 Temperature  Humidity  Light  CO2         HumidityRatio  Occupancy
2015-02-10 09:29:00  21.05        36.0975   433.0  787.250000  0.005579       1
2015-02-10 09:29:59  21.05        35.9950   433.0  789.500000  0.005563       1
2015-02-10 09:30:59  21.10        36.0950   433.0  798.500000  0.005596       1
2015-02-10 09:32:00  21.10        36.2600   433.0  820.333333  0.005621       1
2015-02-10 09:33:00  21.10        36.2000   447.0  821.000000  0.005612       1

And we've written some code that smooths a time series and plots it using Matplotlib with outliers highlighted:

In [10]: def mpl_plot(avg, highlight):
             ax = avg.plot()
             if len(highlight):
                 highlight.plot(style='o', ax=ax)
             fig = ax.get_figure()
             plt.close(fig)
             return fig

         def find_outliers(variable='Temperature', window=30, sigma=10, view_fn=mpl_plot):
             avg = data[variable].rolling(window=window).mean()
             residual = data[variable] - avg
             std = residual.rolling(window=window).std()
             outliers = (np.abs(residual) > std * sigma)
             return view_fn(avg, avg[outliers])
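To see what the outlier rule in find_outliers does without loading the dataset, here is a small standard-library sketch of the same three steps: a rolling mean, the residual from it, and a threshold of sigma times the rolling standard deviation of that residual. The helper names (`rolling`, `sample_std`, `outlier_indices`) and the toy readings are made up for illustration; the sketch assumes the pandas defaults, i.e. no result until a window is full and a sample standard deviation.

```python
from statistics import mean, stdev


def rolling(values, window, fn):
    """Apply fn to each trailing window; None until the window is full of numbers."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1):i + 1]
        if len(chunk) < window or any(v is None for v in chunk):
            out.append(None)
        else:
            out.append(fn(chunk))
    return out


def sample_std(chunk):
    """Sample standard deviation; undefined (None) for fewer than two values."""
    return stdev(chunk) if len(chunk) > 1 else None


def outlier_indices(values, window, sigma):
    """Indices where |value - rolling mean| exceeds sigma * rolling std of the residual."""
    avg = rolling(values, window, mean)
    residual = [None if a is None else v - a for v, a in zip(values, avg)]
    std = rolling(residual, window, sample_std)
    return [i for i, (r, s) in enumerate(zip(residual, std))
            if r is not None and s is not None and abs(r) > sigma * s]


# Toy readings with a single spike at index 7.
readings = [10.0, 12.0, 11.0, 10.0, 12.0, 11.0, 10.0, 30.0, 11.0, 12.0]
print(outlier_indices(readings, window=3, sigma=1.5))  # [7]
```

In this toy data, raising sigma to 2 already flags nothing, because the spike also inflates the standard deviation of its own window. That sensitivity is one reason to explore window and sigma interactively.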

We can call the function with parameters and get a plot:

In [11]: find_outliers(variable='Temperature', window=20, sigma=10)

Out[11]: [Matplotlib figure: smoothed Temperature with outliers highlighted]

It works! But exploring all these parameters by typing Python is slow and tedious. Plus we want someone else to be able to try it out.

Let's make a panel instead:

In [12]: pn.interact(find_outliers)

Out[12]: window: 30

sigma: 10


Let's do a bit more work and capture the full range of parameters that can be varied:

In [13]: kw = dict(window=(1, 60), variable=sorted(list(data.columns)), sigma=(1, 20))
         pn.interact(find_outliers, **kw)

Out[13]: variable

Temperature

window: 30

sigma: 10
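Judging by the widgets rendered above, the abbreviations in kw work like this: a (start, end) tuple of integers becomes an integer slider, a list becomes a dropdown, and the initial value comes from the function's default. The sketch below only mimics that mapping with the standard library; `describe_widgets` is a hypothetical helper, not how Panel implements interact, and it covers just the two cases seen in this talk.

```python
import inspect


def describe_widgets(fn, **specs):
    """Hypothetical helper: describe the widget each abbreviation would produce."""
    sig = inspect.signature(fn)
    described = {}
    for name, spec in specs.items():
        default = sig.parameters[name].default  # becomes the initial widget value
        if isinstance(spec, tuple):
            start, end = spec
            described[name] = f"IntSlider(start={start}, end={end}, value={default})"
        elif isinstance(spec, list):
            described[name] = f"Select(options={spec}, value={default!r})"
        else:
            raise TypeError(f"no rule for {name}={spec!r} in this sketch")
    return described


def find_outliers(variable='Temperature', window=30, sigma=10):
    """Stand-in with the same defaults as the function in the talk."""


kw = dict(window=(1, 60), variable=['CO2', 'Humidity', 'Temperature'], sigma=(1, 20))
for name, widget in describe_widgets(find_outliers, **kw).items():
    print(name, '->', widget)
```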


Now that we've explored it, let's share it with someone:

In [14]: pn.interact(find_outliers, **kw).show()

Out[14]: <bokeh.server.server.Server at 0x32dfaa750>

The person we are sharing it with comes back and says they don't know how to use the dashboard.

Let's look at it and see how it's made, so we can add some explanatory text:

In [15]: i = pn.interact(find_outliers, **kw)
         print(i)

Column
    [0] Column
        [0] Select(name='variable', options=['CO2', 'Humidity', ...], value='Temperature')
        [1] IntSlider(end=60, name='window', start=1, value=30)
        [2] IntSlider(end=20, name='sigma', start=1, value=10)
    [1] Row
        [0] Matplotlib(Figure, name='interactive05169')

Ah, it's just a column of widgets and a plot.

Let's unpack that, rearrange it, add some instructions, and hide one of the widgets that will just be confusing:

In [16]: text = "<br>\n# Room Occupancy\nSelect the variable, and the time window for smoothing"

         p = pn.Row(i[1][0], pn.Column(text, *i[0][:2], width=400))
         p

Out[16]: [layout: plot beside the "Room Occupancy" text, with the variable (Temperature) and window (30) widgets]

Once we're happy with that in the notebook, we'll share it again:

In [17]: p.show()

Out[17]: <bokeh.server.server.Server at 0x32388c090>

In [18]: p

Out[18]: [layout: the same "Room Occupancy" text, variable (Temperature) and window (30) widgets, and plot]

Note that even widgets in another notebook cell stay linked:

In [19]: i[0][2]

Out[19]: sigma: 10

Also note that Panel widgets are reactive, so they will update even if you set the values by hand:

In [20]: p

Out[20]: [layout: the same "Room Occupancy" text, variable and window (30) widgets, and plot]

In [21]: i[0][0].value = 'Light'
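One way to picture what "reactive" means here is the observer pattern: assigning to a widget's value notifies every callback that is watching it, no matter whether the change came from a click or from code. The `Widget` class below is a hypothetical stand-in written for this illustration; Panel's real widgets are built on Param and do far more.

```python
class Widget:
    """Hypothetical stand-in for a widget whose value can be watched."""

    def __init__(self, value):
        self._value = value
        self._watchers = []

    def watch(self, callback):
        """Register a callback to run whenever the value changes."""
        self._watchers.append(callback)

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new):
        old, self._value = self._value, new
        if new != old:  # only notify on an actual change
            for callback in self._watchers:
                callback(new)


rendered = []
variable = Widget('Temperature')
variable.watch(lambda v: rendered.append(f'plot of {v}'))

variable.value = 'Light'  # set by hand, as in the cell above
print(rendered)  # ['plot of Light']
```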

Of course, you don't need to use the magic of interact; it's also easy to make widgets and link them up by hand:

In [22]: import panel.widgets as pnw

         variable = pnw.RadioButtonGroup(name='variable', value='Temperature',
                                         options=list(data.columns))
         window = pnw.IntSlider(name='window', value=10, start=1, end=60)

         @pn.depends(variable, window)
         def reactive_outliers(variable, window):
             return find_outliers(variable, window, 10)

         widgets = pn.Column("<br>\n# Room occupancy", variable, window)
         occupancy = pn.Row(reactive_outliers, widgets)

In [23]: occupancy

Out[23]: [layout: plot beside the "Room occupancy" heading, a button group listing the data columns, and the window slider (10)]

So far we've only used Matplotlib.

What about other libraries? You can use almost anything!

E.g. hvPlot (http://hvplot.pyviz.org), a drop-in replacement for Pandas .plot() (and xarray, dask, intake, ...) that gives fully interactive Bokeh (http://bokeh.pydata.org) plots in panels:

In [24]: import hvplot.pandas

         def hvplot(avg, highlight):
             return avg.hvplot(height=300, legend=False) * highlight.hvplot.scatter(
                 color='orange', padding=0.1, legend=False)

         text2 = "## Room Occupancy\nSelect the variable and the smoothing values"
         hvp = pn.interact(find_outliers, view_fn=hvplot, **kw)
         occupancy_app = pn.Column(pn.Row(pn.panel(text2, width=400), hvp[0]), hvp[1])
         occupancy_app.servable()

Out[24]: [layout: "Room Occupancy" text beside the variable (Temperature), window (30) and sigma (10) widgets, above an interactive Bokeh plot]

We can bring in other datasets, dozens of other plotting libraries, linked plots, drilling down, big data, etc., but let's call that done for now.

Can we keep it running as a server indefinitely now?

Sure, just mark the item to serve with .servable() in the notebook, then run: panel serve --show PyHEP2019.ipynb

As you can see, Panel is designed to support your entire data-analysis workflow -- work in a notebook, explore freely, deploy freely, do batch or cron jobs, and never have to rewrite to use your code in a new context.

Other Features

Templating

{% extends base %}

<!-- goes in body -->
{% block postamble %}
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/materialize/1.0.0/css/materialize.min.css">
<link href="https://fonts.googleapis.com/icon?family=Material+Icons" rel="stylesheet">
<script src="https://cdnjs.cloudflare.com/ajax/libs/materialize/1.0.0/js/materialize.min.js"></script>
<style type="text/css">
body {
    background-color: var(--jp-layout-color0);
    overflow-y: scroll;
}
.nav-wrapper {
    background-color: #2f2f2f;
}
.brand-logo {

In [29]: tmpl = pn.Template(template)

         tmpl.add_panel('vega', vega)
         tmpl.add_panel('plotly', plotly)
         tmpl.add_panel('holoviews', holoviews)

         tmpl.servable()

Out[29]: Template
             [vega] Row
                 [0] Vega(dict, height=425, sizing_mode='fixed', width=750)
                 [1] Column
                     [0] RangeSlider(end=125, name='Color limits', step=1, value=(0, 40), width=200)
                     [1] IntSlider(end=125, name='IMDB Ratings Bins', step=25, value=60, width=200)
                     [2] IntSlider(end=125, name='Rotten Tomato R..., step=25, value=40, width=200)
             [plotly] Plotly(Figure)
             [holoviews] Row
                 [0] Column
                     [0] WidgetBox(css_classes=['widget-box'])
                         [0] FloatSlider(end=1, name='Alpha', start=0, value=1)
                         [1] Select(name='Colormap', options={'fire': <matplotlib.color...}, value=<matplotlib.colors.LinearS...)
                         [2] Select(name='Location', options=['dropof

Demos

http://localhost:5006/
https://glaciers.pyviz.demo.anaconda.com/glaciers
https://webcam-classifier.pyviz.demo.anaconda.com/app

Thank you!

See holoviz.org (https://holoviz.org) to see how everything fits together and go through a tutorial!

Come talk to me about any of our open source tools!

Twitter & GitHub: @philippjfr

[email protected]