Daniel Frank, Data Engineer at Stripe: Reproducibility: Data Analysis that Keeps on Giving

a talk

by Dan Frank | @danielhfrank

Reproducibility: Data Analysis that Keeps on Giving

A software product that enables payments, not the other way around

Data at Stripe

• Product usage analytics for business / product development, machine learning for fraud prevention, and much more

• Data team as a service team to the rest of the organization: enablers, not gatekeepers

Stripe Checkout

Owning checkout experience -> optimize conversions across all sites

"Feature X improves conversions by 8%"

"How do you know?"

"Trust me, it did."

A failure of tooling, not of methodology

Goal: data workspace that facilitates publishing both results and methodology

Meanwhile, in academia...

Reproducibility solves these problems and more

• Revisit analysis with new data

• Allow readers to deep dive into methodology

• Collaborate à la GitHub

Meanwhile, in academia...

• "I realized that ... the open source community practiced many of the ideals of science better than academia."

• "open source ... is reproducible by necessity"

• "a combination of technical tools and social practices"

Source: http://bit.ly/fperez_blog

Idea: use the constraint of programmatic reproducibility as a forcing function for better reports

Execution: programmatically executed publishing

• Publishing environment where all analysis is re-executed; programmatic reproducibility is required for this to work

• Users work within the constraints of reproducibility, and get a beautiful publishing environment in exchange

• Viewers of published reports see the methodology, and are guaranteed reproducibility
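
The publish-by-re-execution idea can be sketched in a few lines. This is a minimal illustration only, not the tooling described in the talk: the function name `publish`, the report layout, and the use of a plain Python script as the unit of analysis are all assumptions made for the example. The point it shows is that a report reaches readers only if its analysis re-runs cleanly from scratch, and that the published artifact carries the methodology (the source) alongside the results.

```python
import subprocess
import sys
from pathlib import Path


def publish(script, out_dir):
    """Re-execute an analysis script and publish it only if it runs cleanly.

    Hypothetical helper for illustration: `script` is a standalone Python
    file, and the "published report" is a text file holding both the source
    (methodology) and the captured output (results).
    """
    script = Path(script)
    out_dir = Path(out_dir)

    # Run in a fresh interpreter so no hidden state from the author's
    # session can leak into the published results.
    run = subprocess.run(
        [sys.executable, str(script)],
        capture_output=True,
        text=True,
    )
    if run.returncode != 0:
        # Refuse to publish anything that cannot be reproduced.
        raise RuntimeError(
            f"{script.name} did not re-execute cleanly:\n{run.stderr}"
        )

    out_dir.mkdir(parents=True, exist_ok=True)
    report = out_dir / f"{script.stem}.report.txt"
    report.write_text(
        "METHODOLOGY\n" + script.read_text() + "\nRESULTS\n" + run.stdout
    )
    return report
```

Because publishing fails loudly when the script fails, "Trust me, it did" stops being an option: the only way to get a report out is to hand over something that runs.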

Tooling as a guideline towards good behavior

Make the easiest tools produce the best behavior

Example: Data Access

• SSH Tunnels, CSVs in home dirs, oh my

• Automate away the hard parts, and make that solution play nicely with other tools
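
One way to picture "automating away the hard parts" is a single helper that hides connection setup and hands back plain Python data that any other tool can consume. The sketch below is purely illustrative: `query` and `_connect` are hypothetical names, the `checkouts` table and its rows are made-up sample data, and an in-memory SQLite database stands in for whatever real data store would otherwise sit behind an SSH tunnel.

```python
import sqlite3
from contextlib import closing


def _connect():
    """Open a connection to the data store.

    Assumption for this sketch: an in-memory SQLite database with made-up
    sample rows stands in for the real store. In practice this is where
    tunnel setup and credentials would be hidden from the analyst.
    """
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.execute("CREATE TABLE checkouts (variant TEXT, converted INTEGER)")
    conn.executemany(
        "INSERT INTO checkouts VALUES (?, ?)",
        [("control", 0), ("control", 1), ("feature_x", 1), ("feature_x", 1)],
    )
    return conn


def query(sql, params=()):
    """Run a query and return the rows as a list of plain dicts."""
    with closing(_connect()) as conn:
        return [dict(row) for row in conn.execute(sql, params)]


rows = query(
    "SELECT variant, AVG(converted) AS rate "
    "FROM checkouts GROUP BY variant ORDER BY variant"
)
```

With a helper like this, the query that produced a number lives in the report itself rather than in a CSV in someone's home directory, so a reader can re-run it.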

Bonus: enabling users with good tooling leads to more and better reports

Get Started

bit.ly/notebook_helpers

Thanks.

Dan Frank | @danielhfrank