peadar-coyle
Probabilistic Programming
A Brief introduction to Probabilistic Programming and Python
EuroSciPy - University of Cambridge August 2015
All opinions my own
Who am I?
I work as a Data Scientist for a large Telecommunications Company
Master's in Mathematics
Interned at Amazon
Was a consultant for a while
Occasional contributor to Pandas and other projects
Co-organizer of the Data Science Meetup in Luxembourg
Member of the Royal Statistical Society and NumFOCUS
@springcoil
What is Probabilistic Programming?
Basically, using random variables instead of ordinary variables
Allows you to create a generative story rather than a black box
A different tool from machine learning
A different paradigm from frequentist statistics
Forces you to be explicit about your 'subjective' assumptions
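To make "random variables instead of variables" concrete, here is a minimal sketch in plain Python (standard library only, not PyMC3). The coin's bias is itself drawn from a Uniform(0, 1) prior, the program tells the generative story of 10 flips, and we then condition on having observed 7 heads by simple rejection sampling. The function names and numbers are illustrative assumptions; real probabilistic programming systems use far more efficient inference than rejection.

```python
import random


def generative_story(rng, n_flips=10):
    # The coin's bias is a random variable, not a fixed number
    bias = rng.random()  # Uniform(0, 1) prior
    # Simulate the flips given that bias and count the heads
    heads = sum(rng.random() < bias for _ in range(n_flips))
    return bias, heads


def infer_bias(observed_heads, n_flips=10, n_draws=20000, seed=42):
    """Condition the story on data by rejection: keep only the runs
    whose simulated output matches what was actually observed."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        bias, heads = generative_story(rng, n_flips)
        if heads == observed_heads:
            accepted.append(bias)
    return accepted


samples = infer_bias(observed_heads=7)
print(len(samples), sum(samples) / len(samples))
```

With a uniform prior the exact posterior is Beta(8, 4), whose mean is 8/12 ≈ 0.667, so the mean of the accepted samples should land close to that.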
Source: Olivier Grisel
Bayesian Statistics
I studied Mathematics and encountered Bayesian methods in textbooks
This is a hard area to work in with pen and paper: most of the integrals involved can't be solved in closed form
Thankfully, Monte Carlo simulation was invented
These simulations are used to approximate the posterior distribution (the likelihood combined with the prior) instead of solving those integrals exactly
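As a minimal sketch of the idea, assuming the same toy coin example (7 heads in 10 flips, uniform prior), the random-walk Metropolis sampler below draws from the posterior using only the unnormalised density, so the hard normalising integral never has to be computed. This is a generic textbook MCMC algorithm in plain Python, not PyMC3's sampler; the step size, number of steps and burn-in length are arbitrary choices.

```python
import math
import random


def log_posterior(p, heads=7, flips=10):
    """Unnormalised log posterior for a coin's bias under a Uniform(0, 1) prior.
    Metropolis only needs ratios, so the normalising integral is never computed."""
    if not 0.0 < p < 1.0:
        return -math.inf
    return heads * math.log(p) + (flips - heads) * math.log(1.0 - p)


def metropolis(log_target, start, n_steps=50000, step=0.1, seed=1):
    rng = random.Random(seed)
    current, current_lp = start, log_target(start)
    samples = []
    for _ in range(n_steps):
        proposal = current + rng.gauss(0.0, step)  # symmetric random-walk proposal
        proposal_lp = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(current));
        # 1.0 - random() lies in (0, 1], so the log is always defined
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


draws = metropolis(log_posterior, start=0.5)[5000:]  # discard burn-in
print(sum(draws) / len(draws))  # should be close to the exact posterior mean 8/12
```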
Some terminology
Attribution: Quantopian blog
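For reference, the standard terms fit together in Bayes' theorem, written here with θ for the unknown quantity (for example a team's strength) and y for the observed data. The denominator is the integral that usually has no closed form, which is why sampling methods are needed.

```latex
\underbrace{p(\theta \mid y)}_{\text{posterior}}
  = \frac{\overbrace{p(y \mid \theta)}^{\text{likelihood}}\;
          \overbrace{p(\theta)}^{\text{prior}}}
         {\underbrace{p(y)}_{\text{evidence}}},
\qquad
p(y) = \int p(y \mid \theta)\, p(\theta)\, d\theta
```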
How do you pick your prior?
This is a bit of an art
You generally base the prior on experience
As you add more data, the choice of prior matters less and less
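A small illustration of the last point, assuming a coin-flip setting where the Beta prior is conjugate to the binomial likelihood, so the posterior mean has a closed form. Two hypothetical analysts start from very different priors (a flat one, and one expecting about 80% heads) and see the same illustrative data, exactly half heads; their posterior means converge as the number of flips grows.

```python
def posterior_mean(alpha, beta, heads, flips):
    """Mean of the Beta posterior after observing `heads` out of `flips`.
    A Beta(alpha, beta) prior with binomial data gives
    Beta(alpha + heads, beta + flips - heads)."""
    return (alpha + heads) / (alpha + beta + flips)


flat_prior = (1, 1)         # no preference for any bias
optimistic_prior = (20, 5)  # hypothetical analyst expecting ~80% heads

gaps = []
for flips in (10, 100, 1000, 10000):
    heads = flips // 2  # illustrative data: exactly half the flips are heads
    flat = posterior_mean(*flat_prior, heads, flips)
    optimistic = posterior_mean(*optimistic_prior, heads, flips)
    gaps.append(abs(optimistic - flat))
    print(f"{flips:>6} flips: flat={flat:.3f} optimistic={optimistic:.3f}")
```

Here the gap between the two posterior means falls from about 0.21 after 10 flips to under 0.001 after 10,000.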
Huh, but isn't Probabilistic Programming just Stan and BUGS?
No, in Python you have PyMC3
A complete rewrite of PyMC2, now in beta
Based on Theano
Computational techniques for handling gradients: automatic differentiation and GPU speed-ups
Theano is also used in deep learning!
Currently there is a project to port 'BMH' (Bayesian Methods for Hackers) from PyMC2 to PyMC3
I gave a thorough tutorial on this: my github
Key authors: John Salvatier, Thomas Wiecki, Chris Fonnesbeck
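Gradient-based samplers such as NUTS, which PyMC3 uses, need the derivative of the log posterior. The sketch below shows what "automatic differentiation" means using forward-mode dual numbers in plain Python: derivatives are propagated exactly through each operation by the chain rule, with no finite differences. Note that this is a different technique from Theano's symbolic, graph-based differentiation; the `Dual` class and the coin log posterior are hypothetical illustrations.

```python
import math


class Dual:
    """Forward-mode AD: carries a value and its derivative together."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def lift(x):
        # Treat plain numbers as constants (derivative 0)
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__

    def __neg__(self):
        return Dual(-self.value, -self.deriv)

    def __sub__(self, other):
        return self + (-Dual.lift(other))

    def __rsub__(self, other):
        return Dual.lift(other) + (-self)


def log(x):
    # Chain rule for log: d/dx log(u) = u' / u
    return Dual(math.log(x.value), x.deriv / x.value)


def derivative(f, x):
    """Evaluate f'(x) exactly by seeding the derivative slot with 1."""
    return f(Dual(x, 1.0)).deriv


def log_posterior(p, heads=7, flips=10):
    # Unnormalised coin log posterior (uniform prior, binomial likelihood)
    return heads * log(p) + (flips - heads) * log(1 - p)


print(derivative(log_posterior, 0.5))  # analytically 7/0.5 - 3/0.5 = 8.0
```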
Case study: Rugby Analytics
I wanted to do a model of the Six Nations last year.
I wanted to build an understandable model to predict the winner
Key Info: Inferring the 'strength' of each team.
We only have scoring data, which is noisy; hence Bayesian statistics.
What did I do?
1. I picked a Gamma prior for all teams
2. I used a hierarchical model because I wanted home advantage to be stronger for stronger teams
3. From this I was able to create a novel model based only on historical results and scoring intensity
4. I sampled from the posterior distribution using MCMC
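The slides don't spell out the model's equations, so the following is only a hypothetical generative sketch of steps 1 to 3 in plain Python. A shared, randomly drawn concentration parameter (the hierarchical layer) controls how spread out the teams' Gamma-distributed attack and defence strengths are, the home side's scoring rate gets a multiplicative home-advantage factor, and points are Poisson draws. The multiplicative rate formula, the hyperprior, the base rate of 20 points and the factor 1.1 are all assumptions chosen for illustration. Step 4 would run in the opposite direction, using MCMC to infer the strengths from real results.

```python
import math
import random

TEAMS = ["England", "France", "Ireland", "Italy", "Scotland", "Wales"]


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for rugby-sized rates."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_round_robin(seed=0, base_points=20.0, home_advantage=1.1):
    rng = random.Random(seed)
    # Hierarchical layer: one shared concentration controls team spread
    concentration = rng.gammavariate(20.0, 0.5)  # hypothetical hyperprior, mean 10

    def strength():
        # Gamma with mean 1 and variance 1 / concentration
        return rng.gammavariate(concentration, 1.0 / concentration)

    attack = {team: strength() for team in TEAMS}
    defence = {team: strength() for team in TEAMS}
    results = []
    for home in TEAMS:
        for away in TEAMS:
            if home == away:
                continue
            # A larger defence value lowers the opponent's scoring rate
            home_rate = base_points * home_advantage * attack[home] / defence[away]
            away_rate = base_points * attack[away] / defence[home]
            results.append((home, away,
                            sample_poisson(home_rate, rng),
                            sample_poisson(away_rate, rng)))
    return attack, defence, results


attack, defence, results = simulate_round_robin(seed=3)
print(results[0])
```

Every team hosts every other team here, giving 30 simulated matches rather than the real tournament's 15.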
Run the model
What actually happened
The model incorrectly predicted that England would come out on top.
Ireland actually won on points difference, by a margin of 6 points. It really came down to the wire!
"Prediction is difficult, especially about the future"
One of the problems is what we call 'over-shrinkage'. You can delve into the results to see what the errors are; the actual outcome was within my model's error bars.
Hat tip: thanks to Abraham Flaxman and the PyMC3 developers for helping me port this from PyMC2 to PyMC3
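One way to see what over-shrinkage means is a deliberately simplified sketch, assuming a normal model with known variances rather than the Gamma-prior rugby model: each team's estimate is a precision-weighted average of its own observed mean and the grand mean. When the assumed spread of true team strengths (`tau`) is too small, every team is pulled towards the average, so a genuinely strong side looks ordinary. The team names and numbers are hypothetical, and the grand mean is plugged in from the data as an empirical-Bayes shortcut.

```python
def partial_pooling(team_means, n_games, sigma, tau):
    """Shrink each team's observed mean towards the grand mean.

    Normal-normal model with known variances: sigma is the per-game noise,
    tau the spread of true team strengths. Smaller tau => stronger shrinkage."""
    grand_mean = sum(team_means.values()) / len(team_means)
    data_precision = n_games / sigma ** 2
    prior_precision = 1.0 / tau ** 2
    weight = data_precision / (data_precision + prior_precision)
    return {team: weight * m + (1 - weight) * grand_mean
            for team, m in team_means.items()}


# Hypothetical average points margins over a 5-game season
observed = {"team_a": 12.0, "team_b": 3.0, "team_c": -4.0, "team_d": -11.0}
loose = partial_pooling(observed, n_games=5, sigma=15.0, tau=10.0)
tight = partial_pooling(observed, n_games=5, sigma=15.0, tau=2.0)
print(loose["team_a"], tight["team_a"])
```

With `sigma = 15` and 5 games, team_a's margin of 12 is shrunk to about 8.3 when `tau = 10` but to about 1.0 when `tau = 2`.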
Lessons learned
I can build an explainable model using PyMC2 and PyMC3
Generative stories help you build up interest with your colleagues
Communication is the 'last mile' problem of Data Science
PyMC3 is cool: please use it, and please contribute
Wanna learn more?
BMH
Jake VanDerPlas
PyMC3