Daniel Frank, Data Engineer at Stripe: Reproducibility: Data Analysis that Keeps on Giving

a talk

by Dan Frank | @danielhfrank

Reproducibility: Data Analysis that Keeps on Giving

A software product that enables payments, not the other way around

Data at Stripe

• Product usage analytics for business / product development, machine learning for fraud prevention, and much more

• Data team as a service team to the rest of the organization: enablers, not gatekeepers

Stripe Checkout

Owning checkout experience -> optimize conversions across all sites

"Feature X improves conversions by 8%"

"How do you know?"

"Trust me, it did."

A failure of tooling, not of methodology

Goal: data workspace that facilitates publishing both results and methodology

Meanwhile, in academia...

Reproducibility solves these problems and more

• Revisit analysis with new data

• Allow readers to deep dive into methodology

• Collaborate à la GitHub

Meanwhile, in academia...

• "I realized that ... the open source community practiced many of the ideals of science better than academia."

• "open source ... is reproducible by necessity"

• "a combination of technical tools and social practices"

Source: http://bit.ly/fperez_blog

Idea: use constraint of programmatic reproducibility as a forcing function for better reports

Execution: programmatically executed publishing

• Publishing environment where all analysis is re-executed; programmatic reproducibility is required for this to work

• Users work within constraints of reproducibility, and get a beautiful publishing environment in exchange

• Viewers of published reports see methodology, and are guaranteed reproducibility
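
A minimal sketch of the "re-execute on publish" idea, not the tooling described in the talk. The function name `publish`, the list-of-strings "cells" format, and the visit and purchase counts are all assumptions made up for illustration. Each cell runs in a fresh namespace, so a report that depends on leftover state from the author's session fails instead of being published, and the rendered output pairs each piece of code with the result it produced.

```python
import contextlib
import io


def publish(cells):
    """Re-execute every code cell in a fresh namespace and render a report.

    Hypothetical helper: any exception propagates, so an analysis that
    cannot be reproduced from its own source is never published.
    """
    namespace = {}  # fresh state: nothing leaks in from the author's session
    sections = []
    for source in cells:
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(source, namespace)
        # Show the methodology (the code) next to the result it produced.
        sections.append(f"In:\n{source}\nOut:\n{buffer.getvalue()}")
    return "\n".join(sections)


# Made-up numbers, for illustration only.
cells = [
    "visits, purchases = 200, 16",
    "print(f'conversion rate: {purchases / visits:.1%}')",
]
report = publish(cells)
print(report)
```

Because the published text is generated only from what was re-run, a reader who asks "How do you know?" can follow the code that produced each number.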

Tooling as a guideline towards good behavior

Make the easiest tools produce the best behavior

Example: Data Access

• SSH Tunnels, CSVs in home dirs, oh my

• Automate away the hard parts, and make that solution play nicely with other tools
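
One way to picture "automate away the hard parts" is a single helper that owns connection setup, so analysts never manage tunnels or CSV exports by hand. The sketch below is an assumption for illustration, not the helper from the talk: `warehouse_connection` and `query` are hypothetical names, and an in-memory SQLite database stands in for a real data store so the example runs anywhere.

```python
import contextlib
import sqlite3


@contextlib.contextmanager
def warehouse_connection():
    """Hypothetical helper that hides how the connection is made.

    A real version might open a tunnel and load credentials here; this
    stand-in uses an in-memory SQLite database.
    """
    conn = sqlite3.connect(":memory:")
    try:
        yield conn
    finally:
        conn.close()


def query(sql, params=()):
    """Run a query and return the rows as a list of plain dicts."""
    with warehouse_connection() as conn:
        cursor = conn.execute(sql, params)
        columns = [description[0] for description in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchall()]


rows = query("SELECT 1 AS id, 'checkout' AS product")
print(rows)
```

Returning plain dicts is one design choice that keeps the helper friendly to other tools; for example, a list of dicts can be passed directly to `pandas.DataFrame`.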

Bonus: enabling users with good tooling leads to more and better reports

Get Started: bit.ly/notebook_helpers

Thanks.

Dan Frank | @danielhfrank
