Data science-summit MTL 2015 - The end of IT departments and data-science empowered by ipython...

francis@qmining.com

Plan

1. Topics
○ Why the end of IT departments will help data-scientists
○ Data-Science empowered by ipython notebooks

2. Use cases
○ Algo trading
○ Clustering visualization
○ Confusion matrix visualization
○ Outlier inspection
○ Session clustering (idstats)
○ Amazing data-science platform: Quantopian

3. QA

Just another barrier of entry

Reminder: Data Maturity
Barriers of entry Levels

● ML
● Sampling
● Big-Data

Level 5 | Level 1 | Level 2 | Level 3 | Level 4

The end of IT departments

● Car > 30K
● Gas + parking = 5K
● max speed = 180 km/h
● avg speed = 10 km/h
● ROI = 29%

● Bike < 1K
● max speed = 45 km/h
● avg speed = 30 km/h
● ROI = 3000%

IT department

IT department only argument

Strategies to get rid of IT department*

*If they don't cooperate, are too slow, or always have an excuse -> union approach

1. Bypass them/ignore -> workarounds

http://fraka6.blogspot.ca/2014/08/dev-principle-you-should-apply-every.html

Strategies to get rid of IT department*

*If they don't cooperate, are too slow, or always have an excuse -> union approach

1. Bypass them/ignore -> workarounds
2. Play their game -> Help them hang themselves

Strategy: Play the game, don't fight

1. Dialogue = explain goals
2. Listen to their proposal
3. Explain why it's not a good idea if it's not
4. Do as they say (don't fight too much) -> Try
5. Evaluate: Failure + cost + lost 3 months
6. Who will be fired?

The NLU pipeline: virtual assistant

Why?
● Measure, understand and improve virtual assistant user experience

What?
● Measure user experience (task completion), retention, ...
● Understand good/bad user experience ->
○ Speech
○ UX
○ Dialog
○ User
○ Client vs server side
○ Latency...

IT layer: R&D hadoop cluster

SQL layer of abstraction

Hook -> hadoop streaming
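
A minimal sketch of what a Hadoop streaming hook can look like in Python. Hadoop streaming passes input lines to the mapper on stdin and expects tab-separated key/value lines back, with reducer input sorted by key. The log layout (`session_id<TAB>event`) and the function names `mapper`, `reducer` and `run_local` are assumptions made for illustration, not taken from the talk.

```python
from itertools import groupby


def mapper(lines):
    """Emit (session_id, 1) for every well-formed input line."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Assumed layout: session_id <TAB> event; skip anything else.
        if len(fields) >= 2 and fields[0]:
            yield fields[0], 1


def reducer(pairs):
    """Sum counts per key; pairs must arrive sorted by key."""
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)


def run_local(lines):
    """Simulate map -> sort -> reduce in-process, handy inside a notebook."""
    return dict(reducer(sorted(mapper(lines))))


sample = ["s1\tclick\n", "s2\tview\n", "s1\tview\n", "bad line\n"]
events_per_session = run_local(sample)
```

On a cluster, the same two functions would be wrapped in small scripts that read `sys.stdin` and print `key<TAB>value` lines; `run_local` only exists to test the logic before submitting a job.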

Data-Science empowered by ipython notebook

wsgi

Proto to Prod

Exploration to Proto

IPython Notebook

The IPython Notebook is an interactive computational environment, in which you can combine code execution, rich text, mathematics, plots and rich media, as shown in this example session:

ipython notebook

http://fraka6.blogspot.ca/2015/04/how-to-create-your-ipython-datascience.html

Extends to all languages

notebook - train an algo trading strategy
trading/example/run.py (pytrade->pandas + sklearn + theanets + matplotlib)

notebook integration in git

Default queries

PCA, LLE, isomap, t-sne etc. -> mlboost/clustering/visu.py (scikit-learn+matplotlib)

http://fraka6.blogspot.ca/2013/04/simplifying-clustering-visualization.html
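
To make the projection step concrete, here is a small sketch of a 2-D PCA done with plain NumPy (SVD of the centered data). It is a stand-in, not the code in mlboost/clustering/visu.py, which the slide says relies on scikit-learn and matplotlib. The function name `pca_2d` and the synthetic blobs are assumptions for illustration.

```python
import numpy as np


def pca_2d(X):
    """Project rows of X onto their first two principal components."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


rng = np.random.default_rng(0)
# Two made-up blobs in 5 dimensions, used only to exercise the function.
blob_a = rng.normal(loc=0.0, scale=1.0, size=(50, 5))
blob_b = rng.normal(loc=6.0, scale=1.0, size=(50, 5))
points = pca_2d(np.vstack([blob_a, blob_b]))
```

Plotting `points[:, 0]` against `points[:, 1]` with a matplotlib scatter plot, colored by cluster label, gives the kind of 2-D view the slide refers to. LLE, isomap and t-SNE follow the same pattern: many dimensions in, two coordinates per point out.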

Confusion matrices visu

mlboost/util/sklearn_confusion_matrix.py (sklearn+matplotlib)

http://fraka6.blogspot.ca/2013/05/generating-confusion-matrix-great.html
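
For readers unfamiliar with the structure being visualized, here is a plain-Python sketch of how a confusion matrix is built: rows are true labels, columns are predicted labels. This is the same convention as `sklearn.metrics.confusion_matrix`. It is not the mlboost script; the helper name and the toy labels are made up for illustration.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels=None):
    """Return (labels, matrix); matrix[i][j] counts true labels[i] predicted as labels[j]."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if labels is None:
        labels = sorted(set(y_true) | set(y_pred))
    counts = Counter(zip(y_true, y_pred))
    matrix = [[counts[(t, p)] for p in labels] for t in labels]
    return labels, matrix


# Toy data, invented for the example.
y_true = ["cat", "cat", "dog", "dog", "dog", "bird"]
y_pred = ["cat", "dog", "dog", "dog", "cat", "bird"]
labels, matrix = confusion_matrix(y_true, y_pred)
```

The diagonal holds the correct predictions; the off-diagonal cells show which classes get mistaken for which, and that is the part worth plotting as a heat map.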

Simple way to inspect outliers?

mlboost/clustering/visu.py (matplotlib+scipy)

http://fraka6.blogspot.ca/2013/09/a-simple-way-to-identify-outliers-and.html
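
One common way to surface candidate outliers before inspecting them visually is a z-score threshold. The sketch below uses that technique as an illustration only. The slide does not say which method mlboost/clustering/visu.py applies, and the function name, threshold and sample values are assumptions.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=2.5):
    """Return (index, value) pairs lying more than `threshold` std devs from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values identical: nothing can be an outlier.
        return []
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Made-up measurements with one obvious anomaly at the end.
latencies = [10, 11, 9, 10, 12, 10, 11, 9, 10, 100]
suspects = zscore_outliers(latencies)
```

A single extreme value inflates both the mean and the standard deviation, so on small samples a strict threshold such as 3.0 can miss it. That is one reason to look at the points on a plot rather than trust a single cutoff.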

How to see session clusters?

mlboost/utils/idstats.py (mlboost)

http://fraka6.blogspot.ca/2013/09/a-simple-way-to-identify-outliers-and.html

The Quantopian use case

Community+Research->Experiment->deploy

Quantopian = leader in data-science platform & fintech revolution

Self-disrupt or be disrupted

Python as leverage

Conclusion -> disrupt or be disrupted

● IT department = constraint to efficient data-science
○ IT -> business solution but also biggest problem
○ IT departments will die: it's not an if but a when
○ Last argument = Security
○ Strategy = outsource (amazon) or be inefficient
○ Why they hire old CIO …
○ IPython notebook = efficient exploration
● Follow the lead of Quantopian
○ Community + python (Research->Experiment->deploy)
● To be data-driven, we need data efficiency at any cost

francis@qmining.com

hum...
