Data science-summit MTL 2015 - The end of IT departments and data-science empowered by ipython notebook

Plan

1. Topics
○ Why the end of IT departments will help data scientists
○ Data science empowered by IPython notebooks

2. Use cases
○ Algo trading
○ Clustering visualization
○ Confusion matrix visualization
○ Outlier inspection
○ Session clustering (idstats)
○ Amazing data-science platform: Quantopian

Reminder: Data Maturity

Barriers of entry
Levels: Level 1 | Level 2 | Level 3 | Level 4 | Level 5

ML ● Sampling ● Big-Data

QA: just another barrier of entry

The end of IT departments

● Car > 30K
● Gas + parking = 5K
● max speed = 180 km/h
● avg speed = 10 km/h
● ROI = 29%

● Bike < 1K
● max speed = 45 km/h
● avg speed = 30 km/h
● ROI = 3000%

IT department
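
The slide does not say how the ROI figures are computed. One reading that happens to reproduce both numbers, offered here only as an assumption, is average speed (km/h) divided by total cost in thousands, with the car taken at 30K plus 5K of gas and parking and the bike at its 1K upper bound:

```python
def roi_percent(avg_speed_kmh, total_cost_k):
    """Assumed ROI: average speed obtained per 1K spent, as a percentage."""
    return 100.0 * avg_speed_kmh / total_cost_k


# Car: "> 30K" taken as 30K, plus 5K for gas and parking.
car_roi = roi_percent(avg_speed_kmh=10, total_cost_k=30 + 5)
# Bike: "< 1K" taken as 1K.
bike_roi = roi_percent(avg_speed_kmh=30, total_cost_k=1)

print(f"car:  {car_roi:.0f}%")   # car:  29%
print(f"bike: {bike_roi:.0f}%")  # bike: 3000%
```

Under this reading, the maximum speed plays no role: only the speed actually achieved per unit of cost counts.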

The IT department's only argument

Strategies to get rid of IT department*

*If they don't cooperate, are too slow, or always have an excuse -> union approach

1. Bypass them/ignore -> workarounds

http://fraka6.blogspot.ca/2014/08/dev-principle-you-should-apply-every.html

2. Play their game -> help them hang themselves

Strategy: play the game, don't fight

1. Dialogue = explain goals
2. Listen to their proposal
3. Explain why it's not a good idea, if it's not
4. Do as they say (don't fight too much) -> try
5. Evaluate: failure + cost + 3 months lost
6. Who will be fired?

The NLU pipeline: virtual assistant

Why?
● Measure, understand and improve the virtual assistant user experience

What?
● Measure user experience (task completion), retention, ...
● Understand good/bad user experience ->
○ Speech
○ UX
○ Dialog
○ User
○ Client vs server side
○ Latency ...

IT layer: R&D hadoop cluster

SQL layer of abstraction

Hook -> hadoop streaming
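
Hadoop Streaming lets any program that reads lines from stdin and writes tab-separated key/value lines to stdout act as a mapper or reducer, which is what makes it usable as a hook for Python code. The sketch below is only an illustration: the record layout (`session_id<TAB>latency_ms`) and the per-session average are hypothetical, not taken from the talk, and the `sorted` call stands in for Hadoop's shuffle.

```python
from itertools import groupby


def mapper(lines):
    """Emit 'session_id<TAB>latency_ms' for each well-formed input record."""
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2:
            continue  # skip malformed records
        session_id, latency_ms = parts
        yield f"{session_id}\t{latency_ms}"


def reducer(sorted_lines):
    """Average latency per session; input must arrive grouped by key."""
    pairs = (line.rstrip("\n").split("\t") for line in sorted_lines)
    for session_id, group in groupby(pairs, key=lambda kv: kv[0]):
        values = [float(v) for _, v in group]
        yield f"{session_id}\t{sum(values) / len(values):.1f}"


# Local simulation of map -> shuffle/sort -> reduce on made-up records.
records = ["s1\t120\n", "s2\t300\n", "s1\t80\n", "bad line\n"]
mapped = sorted(mapper(records))
result = list(reducer(mapped))
print(result)  # ['s1\t100.0', 's2\t300.0']
```

In a real job the mapper and reducer would be separate scripts iterating over `sys.stdin`, passed to the streaming jar through its `-mapper` and `-reducer` options.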

Data-Science empowered by ipython notebook

wsgi

Proto to Prod

Exploration to Proto
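
The slide names WSGI as the step from prototype to production but gives no code. As a minimal sketch under that assumption, a function developed in a notebook can be wrapped in a WSGI callable using only the standard library. The `score` function and the `x` query parameter are hypothetical placeholders.

```python
import json
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults


def score(x):
    """Hypothetical stand-in for a model prototyped in a notebook."""
    return 2.0 * x + 1.0


def app(environ, start_response):
    """WSGI callable: a request with ?x=3 returns the score as JSON."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    try:
        x = float(query["x"][0])
    except (KeyError, ValueError):
        start_response("400 Bad Request", [("Content-Type", "application/json")])
        return [b'{"error": "expected numeric query parameter x"}']
    body = json.dumps({"x": x, "score": score(x)}).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json")])
    return [body]


# Exercise the app in-process, without opening a socket.
environ = {}
setup_testing_defaults(environ)
environ["QUERY_STRING"] = "x=3"
statuses = []
response = app(environ, lambda status, headers: statuses.append(status))
print(statuses[0], response[0].decode())  # 200 OK {"x": 3.0, "score": 7.0}
```

For a local trial the same `app` can be served with `wsgiref.simple_server.make_server`; in production any WSGI server can host it unchanged.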

IPython Notebook

The IPython Notebook is an interactive computational environment, in which you can combine code execution, rich text, mathematics, plots and rich media, as shown in this example session:

ipython notebook

http://fraka6.blogspot.ca/2015/04/how-to-create-your-ipython-datascience.html

Extends to all languages

Notebook: train an algo trading strategy
trading/example/run.py (pytrade -> pandas + sklearn + theanets + matplotlib)

notebook integration in git
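
The slide does not show how the notebooks are kept in git. One common practice, which may or may not be what the author used, is to strip cell outputs before committing so that diffs only contain code and text. The sketch below assumes the nbformat v4 JSON layout (a top-level `cells` list whose code cells carry `outputs` and `execution_count`).

```python
import json


def strip_outputs(notebook):
    """Return a copy of a v4 notebook dict with code-cell outputs removed."""
    cleaned = json.loads(json.dumps(notebook))  # deep copy of plain JSON data
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# Tiny in-memory notebook used only for illustration.
nb = {
    "nbformat": 4, "nbformat_minor": 0, "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "metadata": {}, "source": ["1 + 1"],
         "execution_count": 7,
         "outputs": [{"output_type": "execute_result",
                      "data": {"text/plain": ["2"]},
                      "metadata": {}, "execution_count": 7}]},
    ],
}
clean = strip_outputs(nb)
print(json.dumps(clean["cells"][1], sort_keys=True))
```

Applied to a file, the same function would sit between `json.load` and `json.dump`, for example in a pre-commit step.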

Default queries
PCA, LLE, Isomap, t-SNE, etc. -> mlboost/clustering/visu.py (scikit-learn + matplotlib)

http://fraka6.blogspot.ca/2013/04/simplifying-clustering-visualization.html
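
As a rough illustration of the kind of projection listed above, the sketch below reduces synthetic 5-dimensional data to 2 dimensions with scikit-learn's `PCA`. It does not reproduce mlboost's `visu.py`; the two clusters are made-up data.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
# Two synthetic 5-dimensional clusters (illustrative data only).
cluster_a = rng.normal(loc=0.0, scale=1.0, size=(50, 5))
cluster_b = rng.normal(loc=6.0, scale=1.0, size=(50, 5))
X = np.vstack([cluster_a, cluster_b])
labels = np.array([0] * 50 + [1] * 50)

# Project onto the two directions of largest variance.
X_2d = PCA(n_components=2).fit_transform(X)
print(X_2d.shape)  # (100, 2)

# The clusters should be far apart along the first component.
gap = abs(X_2d[labels == 0, 0].mean() - X_2d[labels == 1, 0].mean())
print(gap > 5)  # True
```

The 2-D points can then be drawn with matplotlib, for example `plt.scatter(X_2d[:, 0], X_2d[:, 1], c=labels)`. `Isomap`, `LocallyLinearEmbedding` and `TSNE` in `sklearn.manifold` expose the same `fit_transform` method, so they can be swapped in for comparison.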

Confusion matrices visu
/mlboost/util/sklearn_confusion_matrix.py (sklearn+matplotlib)

http://fraka6.blogspot.ca/2013/05/generating-confusion-matrix-great.html
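
For readers unfamiliar with the underlying call, here is a minimal sketch of computing a confusion matrix with scikit-learn. The labels are invented for illustration and the mlboost helper itself is not reproduced.

```python
from sklearn.metrics import confusion_matrix

# Made-up ground truth and predictions.
y_true = ["cat", "dog", "dog", "cat", "bird", "dog"]
y_pred = ["cat", "dog", "cat", "cat", "bird", "bird"]
class_names = ["bird", "cat", "dog"]

# Rows are true classes, columns are predicted classes, in class_names order.
cm = confusion_matrix(y_true, y_pred, labels=class_names)
print(cm)
# [[1 0 0]
#  [0 2 0]
#  [1 1 1]]
```

The last row shows that the three true "dog" samples were predicted once each as bird, cat and dog. The matrix can be rendered as a heat map with matplotlib's `imshow`.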

Simple way to inspect outliers?
mlboost/clustering/visu.py (matplotlib + scipy)

http://fraka6.blogspot.ca/2013/09/a-simple-way-to-identify-outliers-and.html
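
The method behind the linked post is not described on the slide. As a generic stand-in, and not the author's approach, the sketch below flags points whose z-score exceeds a threshold, using NumPy only. The data and the threshold of 3 are assumptions.

```python
import numpy as np


def zscore_outliers(values, threshold=3.0):
    """Return indices of values more than `threshold` std devs from the mean."""
    values = np.asarray(values, dtype=float)
    std = values.std()
    if std == 0:
        return np.array([], dtype=int)  # all values identical: no outliers
    z = (values - values.mean()) / std
    return np.flatnonzero(np.abs(z) > threshold)


# Twenty ordinary measurements followed by one extreme value (made-up data).
values = [9.8, 10.1, 10.0, 9.9, 10.2] * 4 + [50.0]
outlier_idx = zscore_outliers(values)
print(outlier_idx)  # [20]
```

Because the mean and standard deviation are themselves pulled by extreme values, this simple rule works best when outliers are rare.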

How to see session clusters?
mlboost/utils/idstats.py (mlboost)

http://fraka6.blogspot.ca/2013/09/a-simple-way-to-identify-outliers-and.html

The Quantopian use case
Community + Research -> Experiment -> Deploy

Quantopian = leader in data-science platforms & the fintech revolution

Self-disrupt or be disrupted

Python as leverage

Conclusion -> disrupt or be disrupted

● IT department = constraint to efficient data science
○ IT -> business solution but also biggest problem
○ IT departments will die: it's not a question of if but when
○ Last argument = security
○ Strategy = outsource (Amazon) or be inefficient
○ Why they hire old CIO …
○ IPython notebook = efficient exploration
● Follow the lead of Quantopian
○ Community + Python (Research -> Experiment -> Deploy)
● To be data-driven, we need data efficiency at any cost

[email protected]

hum...