Plan
1. Topics
○ Why the end of IT departments will help data-scientists
○ Data-Science empowered by ipython notebooks
2. Use cases
○ Algo trading
○ Clustering visualization
○ Confusion matrix visualization
○ Outlier inspection
○ Session clustering (idstats)
○ Amazing data-science platform: Quantopian
Q&A
Just another barrier to entry
Reminder: Data Maturity
Barriers to entry: Levels
ML
● Sampling
● Big-Data
Level 5 | Level 1 | Level 2 | Level 3 | Level 4
The end of IT departments
● Car > 30K
● Gas + parking = 5K
● max speed = 180 km/h
● avg speed = 10 km/h
● ROI = 29%

● Bike < 1K
● max speed = 45 km/h
● avg speed = 30 km/h
● ROI = 3000%
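The slide does not define ROI. The two figures are consistent with dividing average speed (km/h) by total cost in thousands, so the sketch below assumes that reading. The function name `speed_roi` is hypothetical, and the costs use the slide's bounds (30K + 5K for the car, 1K for the bike) as if they were exact values.

```python
def speed_roi(avg_speed_kmh, cost_k):
    """Average speed (km/h) obtained per thousand spent, as a fraction.

    Assumed definition: the slide only gives the resulting percentages.
    """
    return avg_speed_kmh / cost_k


# Car: purchase (30K) plus gas and parking (5K), average speed 10 km/h.
car = speed_roi(avg_speed_kmh=10, cost_k=30 + 5)

# Bike: purchase (1K), average speed 30 km/h.
bike = speed_roi(avg_speed_kmh=30, cost_k=1)

print(f"car:  {car:.0%}")   # car:  29%
print(f"bike: {bike:.0%}")  # bike: 3000%
```

Under this reading, max speed plays no role: only the speed you actually get counts against what you pay.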
IT department
IT department's only argument
Strategies to get rid of IT department*
*If they don't cooperate, are too slow, or always have an excuse -> union approach
1. Bypass them/ignore -> workarounds
http://fraka6.blogspot.ca/2014/08/dev-principle-you-should-apply-every.html
2. Play their game -> help them hang themselves
Strategy: Play the game, don't fight
1. Dialogue = explain goals
2. Listen to their proposal
3. Explain why it's not a good idea, if it's not
4. Do as they say (don't fight too much) -> Try
5. Evaluate: Failure + cost + lost 3 months
6. Who will be fired?
The NLU pipeline
virtual assistant
Why?
● Measure, understand and improve the virtual assistant user experience
What?
● Measure user experience (task completion), retention, ...
● Understand good/bad user experience ->
○ Speech
○ UX
○ Dialog
○ User
○ Client vs server side
○ Latency ...
IT layer: R&D hadoop cluster
SQL layer of abstraction
Hook -> hadoop streaming
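A minimal sketch of what a Hadoop Streaming hook can look like in Python. Hadoop Streaming pipes input lines to a mapper on stdin, sorts the mapper's tab-separated key/value output by key, and pipes it to a reducer. The log format (`user_id<TAB>completed`) and the task-completion metric are assumptions made for illustration; the slides do not show the actual pipeline code.

```python
from itertools import groupby


def mapper(lines):
    """Emit 'user_id<TAB>1' or 'user_id<TAB>0' per log line.

    Hypothetical input format: user_id<TAB>completed, with completed
    being the string 'true' or 'false'.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            continue  # skip malformed records
        user_id, completed = fields
        yield f"{user_id}\t{1 if completed == 'true' else 0}"


def reducer(lines):
    """Compute the task-completion rate per user.

    Relies on the input being sorted by key, which Hadoop guarantees
    between the map and reduce steps.
    """
    parsed = (line.rstrip("\n").split("\t") for line in lines)
    for user_id, group in groupby(parsed, key=lambda kv: kv[0]):
        values = [int(value) for _, value in group]
        yield f"{user_id}\t{sum(values) / len(values):.2f}"


# Local dry run: sorted() stands in for Hadoop's shuffle/sort phase.
logs = ["u1\ttrue\n", "u1\tfalse\n", "u2\ttrue\n"]
for row in reducer(sorted(mapper(logs))):
    print(row)
```

On a cluster, each function would live in a script that reads `sys.stdin` and prints its results, passed to the streaming jar through the `-mapper` and `-reducer` options along with `-input` and `-output`.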
Data-Science empowered by ipython notebook
wsgi
Proto to Prod
Exploration to Proto
IPython Notebook
The IPython Notebook is an interactive computational environment, in which you can combine code execution, rich text, mathematics, plots and rich media, as shown in this example session:
ipython notebook
http://fraka6.blogspot.ca/2015/04/how-to-create-your-ipython-datascience.html
Extends to all languages
Notebook: train an algo trading strategy
trading/example/run.py (pytrade -> pandas + sklearn + theanets + matplotlib)
notebook integration in git
Default queries
PCA, LLE, Isomap, t-SNE etc. -> mlboost/clustering/visu.py (scikit-learn+matplotlib)
http://fraka6.blogspot.ca/2013/04/simplifying-clustering-visualization.html
Confusion matrices visu/mlboost/util/sklearn_confusion_matrix.py (sklearn+matplotlib)
http://fraka6.blogspot.ca/2013/05/generating-confusion-matrix-great.html
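For readers unfamiliar with the idea, here is a dependency-free sketch of how a confusion matrix is built and printed. It is not the mlboost script referenced on the slide, which uses sklearn and matplotlib; the helper names `confusion_matrix` and `format_matrix` and the toy labels are illustrative only. Rows are true labels and columns are predicted labels, the same convention `sklearn.metrics.confusion_matrix` uses.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Count (true, predicted) pairs; rows = true label, columns = predicted."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def format_matrix(matrix, labels):
    """Render the matrix as aligned text with row and column headers."""
    width = max(len(str(label)) for label in labels) + 2
    header = " " * width + "".join(f"{label:>{width}}" for label in labels)
    rows = [
        f"{label:>{width}}" + "".join(f"{n:>{width}}" for n in row)
        for label, row in zip(labels, matrix)
    ]
    return "\n".join([header] + rows)


# Toy data, made up for the example.
labels = ["cat", "dog"]
y_true = ["cat", "cat", "dog", "dog", "dog"]
y_pred = ["cat", "dog", "dog", "dog", "cat"]

matrix = confusion_matrix(y_true, y_pred, labels)
print(matrix)  # [[1, 1], [1, 2]]
print(format_matrix(matrix, labels))
```

The off-diagonal cells are the interesting part: each one tells you which class is being mistaken for which.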
Simple way to inspect outliers?
mlboost/clustering/visu.py (matplotlib+scipy)
http://fraka6.blogspot.ca/2013/09/a-simple-way-to-identify-outliers-and.html
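The slide does not say which rule visu.py uses to flag outliers. As a stand-in, the sketch below uses a plain z-score rule with the standard library; the function name `zscore_outliers`, the threshold and the sample latencies are all assumptions for illustration.

```python
from statistics import mean, stdev


def zscore_outliers(values, threshold=3.0):
    """Return (index, value) pairs lying more than `threshold` sample
    standard deviations away from the mean.

    Requires at least two values (statistics.stdev needs them).
    """
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [
        (i, v) for i, v in enumerate(values)
        if abs(v - mu) / sigma > threshold
    ]


# Made-up latencies; the last one is the obvious suspect.
latencies = [10, 11, 9, 10, 12, 10, 11, 9, 10, 100]
print(zscore_outliers(latencies, threshold=2.0))  # [(9, 100)]
```

With only ten points, no value can reach a z-score of 3, so small samples need a lower threshold. Returning the index along with the value makes it easy to go back and inspect the offending record.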
How to see session clusters?
mlboost/utils/idstats.py (mlboost)
http://fraka6.blogspot.ca/2013/09/a-simple-way-to-identify-outliers-and.html
The Quantopian use case
Community + Research -> Experiment -> Deploy
Quantopian = leader in data-science platforms & the fintech revolution
Self-disrupt or be disrupted
Python as leverage
Conclusion -> disrupt or be disrupted
● IT department = constraint to efficient data-science
○ IT -> business solution but also biggest problem
○ IT departments will die: it's not an if but a when
○ Last argument = Security
○ Strategy = outsource (Amazon) or be inefficient
○ Why they hire old CIO …
○ IPython notebook = efficient exploration
● Follow the lead of Quantopian
○ Community + Python (Research -> Experiment -> Deploy)
● To be data-driven, we need data efficiency at any cost
hum...