Python and Data: What’s next?

Wes McKinney @wesmckinn

PyCon Singapore 2013-06-14

PyCon Singapore 2013 Keynote



Me

2013: Analytics Startup in SF


Book: Python for Data Analysis (O’Reilly)

• Python essentials

• NumPy

• IPython

• matplotlib

• pandas

Published October 2012


Some context

• 2007 to 2013

• NumPy and SciPy matured

• IPython Notebook

• Key libraries/tools developed: scikit-learn, statsmodels, PyCUDA, ...

• pandas helps make Python a desirable data preparation language


pandas

• Fast structured data manipulation tools for Python with a nice API

• Goal: make Python a halfway decent language for data preparation / statistical analysis

• Sometimes described as “R data frames in Python”

• Fast-growing user base / community
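A minimal sketch of what “R data frames in Python” looks like in practice, assuming pandas is installed; the table and its column names (`city`, `units`, `price`) are invented purely for illustration:

```python
import pandas as pd

# Hypothetical sales records: each column carries its own dtype,
# much like the columns of an R data.frame.
df = pd.DataFrame({
    "city": ["Singapore", "SF", "Singapore", "NYC"],
    "units": [3, 5, 2, 7],
    "price": [10.0, 12.5, 10.0, 9.0],
})

# Vectorized arithmetic over whole columns creates a derived column.
df["revenue"] = df["units"] * df["price"]

# Boolean row filter plus column selection in a single .loc call.
big_orders = df.loc[df["units"] > 2, ["city", "revenue"]]

print(big_orders)
```

The point is the shape of the code: whole-column operations and label-based selection instead of explicit loops over rows.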


Aside: vbench


Tool inefficiency impedes innovation


What can [image] tell us about Python?


Some Trends

• Decline of Desktop, Rise of Web/Cloud

• SVG / HTML5 Canvas / WebGL Tech

• Big Data

• JIT-compile all the things

• Democratize all the things


Challenge: Keeping Python relevant


Data on the Web

• Nirvana: ubiquitous, easy data analysis

• Challenges

  • JavaScript: weak language for implementing analytics

  • Computation needs to run “close” to data

  • Maintaining interactivity


Golden age for web visualization

SVG


Embracing JavaScript

• Build bridges, not walls

• Some examples

  • IPython Notebook

  • RStudio

  • Rob Story’s pandas integrations

  • Chartkick
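One simple bridge is to hand data from pandas to browser-side code as JSON. The sketch below assumes pandas is installed and that the JavaScript charting library on the receiving end accepts an array of row objects; the `day`/`visits` data is hypothetical, and this does not reproduce any of the specific integrations listed above:

```python
import json

import pandas as pd

# Hypothetical daily traffic numbers to be charted in the browser.
df = pd.DataFrame({"day": ["Mon", "Tue", "Wed"], "visits": [120, 95, 143]})

# orient="records" emits one JSON object per row, e.g. {"day": "Mon", "visits": 120}.
payload = df.to_json(orient="records")

# Parsing it back shows the structure JavaScript code would receive.
rows = json.loads(payload)
print(payload)
```

Keeping the handoff format this plain lets the analysis stay in Python while the rendering (SVG, Canvas, WebGL) stays in JavaScript.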


In search of the perfect “data language”

• Minimal syntax overhead

• Domain-specific data types that all support missing (NA) values

• Rich built-in data preparation operations

  • e.g. set logic, group by, sorting, binning, indexing

• Integrate within a larger application
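A rough sketch of how several of these operations look in pandas, assuming pandas and NumPy are installed; the `group`/`value` data is hypothetical. It touches NA handling, group by, sorting, binning, set logic on indexes, and label-based indexing:

```python
import numpy as np
import pandas as pd

# Hypothetical measurements; np.nan marks a missing (NA) value.
df = pd.DataFrame({
    "group": ["a", "b", "a", "b", "a"],
    "value": [1.0, np.nan, 3.0, 4.0, 10.0],
})

# Missing data: aggregations skip NA by default.
total = df["value"].sum()  # 18.0

# Group by: per-group mean, with NA excluded from group "b".
means = df.groupby("group")["value"].mean()

# Sorting: NA values are placed last by default.
sorted_df = df.sort_values("value")

# Binning: intervals (0, 2], (2, 5], (5, 20] get the labels below; NA stays NA.
df["bucket"] = pd.cut(df["value"], bins=[0, 2, 5, 20],
                      labels=["low", "mid", "high"])

# Set logic on index labels.
left = pd.Index(["a", "b", "c"])
right = pd.Index(["b", "c", "d"])
common = left.intersection(right)
either = left.union(right)

# Label-based indexing: every "value" row labeled "a".
group_a = df.set_index("group").loc["a", "value"]

print(total, means, sorted_df, df, common, either, group_a, sep="\n\n")
```

Whether this counts as the “perfect data language” is exactly the open question on the slide; the sketch only shows that each listed operation is a single call rather than hand-written loop code.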


JIT compiler tech

• LLVM: growing in popularity

• Rolling a new, fast compute engine is much easier than it used to be

• But: not sure compiling Python code is the optimal long-term strategy


Big Data SQL


Some thoughts

• Web-friendliness: essential for survival

• You can never be too productive

• The data’s not getting any smaller


Thanks!