a talk
by Dan Frank | @danielhfrank
Reproducibility: Data Analysis that Keeps on Giving
A software product that enables payments, not the other way around
Data at Stripe
• Product usage analytics for business / product development, machine learning for fraud prevention, and much more
• Data team as a service team to the rest of the organization: enablers, not gatekeepers
Stripe Checkout
Owning checkout experience -> optimize conversions across all sites
"Feature X improves conversions by 8%"
"How do you know?"
"Trust me, it did."
A failure of tooling, not of methodology
Goal: data workspace that facilitates publishing both results and methodology
Meanwhile, in academia...
Reproducibility solves these problems and more
• Revisit analysis with new data
• Allow readers to deep dive into methodology
• Collaborate à la GitHub
Meanwhile, in academia...
• "I realized that ... the open source community practiced many of the ideals of science better than academia."
• "open source ... is reproducible by necessity"
• "a combination of technical tools and social practices"
Source: http://bit.ly/fperez_blog
Idea: use constraint of programmatic reproducibility as a forcing function for better reports
Execution: programmatically executed publishing
• Publishing environment where all analysis is re-executed: programmatic reproducibility required for this to work
• Users work within constraints of reproducibility, and get beautiful publishing environment in exchange
• Viewers of published reports see methodology, and are guaranteed reproducibility
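To make "re-executed on publish" concrete, here is a minimal sketch using only the Python standard library. It is not the tooling described in the talk: the `publish` function, the report layout, and the idea of treating an analysis as a single script are all assumptions made for illustration. The point it tries to show is that a report is only published if its analysis runs cleanly from scratch, and that the source is published alongside the results.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def publish(analysis_source: str) -> str:
    """Re-run an analysis in a fresh interpreter and return a report.

    Hypothetical helper for illustration only. Raises RuntimeError if the
    analysis does not run cleanly from scratch.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "analysis.py"
        script.write_text(analysis_source)
        # Fresh process, empty working directory: no leftover variables
        # and no stray local files for the analysis to lean on.
        result = subprocess.run(
            [sys.executable, str(script)],
            cwd=workdir,
            capture_output=True,
            text=True,
        )
    if result.returncode != 0:
        raise RuntimeError("analysis is not reproducible:\n" + result.stderr)
    # Publish the methodology (source) next to the results (output).
    return "METHODOLOGY\n" + analysis_source + "\nRESULTS\n" + result.stdout
```

Calling `publish("print(6 * 7)\n")` would return a report containing both the one-line source and its output, while an analysis that depends on state from the author's session (an undefined variable, a CSV in a home directory) fails at publish time instead of reaching readers.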
Tooling as a guideline towards good behavior
Make the easiest tools produce the best behavior
Example: Data Access
• SSH Tunnels, CSVs in home dirs, oh my
• Automate away the hard parts, and make that solution play nicely with other tools
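One way to picture "automate away the hard parts" is a single query helper that hides connection setup, so the easiest path for an analyst is also the reproducible one. The sketch below is an assumption, not the helpers from the talk: `connect` and `query` are hypothetical names, an in-memory SQLite database stands in for the real warehouse, and the `charges` rows are made-up sample data.

```python
import sqlite3
from contextlib import closing


def connect() -> sqlite3.Connection:
    """Hypothetical single entry point for data access.

    In a real setup this is where tunnels and credentials would be
    handled. Here it builds an in-memory database with made-up rows.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE charges (id INTEGER, converted INTEGER)")
    conn.executemany(
        "INSERT INTO charges VALUES (?, ?)",
        [(1, 1), (2, 0), (3, 1), (4, 1)],  # illustrative sample data only
    )
    return conn


def query(sql: str, params: tuple = ()) -> list:
    """Run a query and return rows as dicts keyed by column name."""
    with closing(connect()) as conn:
        cursor = conn.execute(sql, params)
        columns = [col[0] for col in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchall()]
```

Because the query lives in the analysis itself rather than in a CSV exported by hand, rerunning the report picks up new data without extra steps, for example `query("SELECT COUNT(*) AS n FROM charges")`.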
Bonus: enabling users with good tooling leads to more and better reports
Get Started
bit.ly/notebook_helpers
Thanks.
Dan Frank | @danielhfrank