A/B Testing and the Infinite Monkey Theory


1. A/B Testing and the Infinite Monkey Theorem. Lukasz Twardowski, www.useitbetter.com (http://en.wikipedia.org/wiki/Portraits_of_Shakespeare)
2. A monkey hitting keys at random for an infinite amount of time will almost surely type the complete works of William Shakespeare.
3. A monkey A/B testing at random for an infinite amount of time will almost surely reach the conversion rate of Amazon.
4. THEORY: A/B testing helps find out which of two versions performs better while both run simultaneously.
5. We do this because every day is different, unlike in the Groundhog Day movie. (Groundhog Day, 1993, dir. Harold Ramis)
6. A single change, bad or good, will not change a trend. Unless a change is A/B tested, you won't know its impact. (http://nerds.airbnb.com/experiments-at-airbnb/)
7. Why the monkey metaphor?
8. EXERCISE 1. Provide the benchmark: the industry average hit rate for A/B testing = ___
9. The industry average hit rate for A/B testing = 14%. Just 1 out of 7 A/B tests is successful! (http://conversionxl.com/ab-tests-fail/)
10. How to be the greatest monkey in the biz if infinity is not an option? (King Kong, 1933, dir. Merian Cooper, Ernest Schoedsack)
11. How to be the best monkey in the biz? Be a quick monkey.
12. EXERCISE 2. Do the math: 1 out of 7 tests wins x 2 weeks per test = slow growth. Running one test at a time, that is one winning test roughly every 14 weeks, so only three or four wins a year. Unless you experiment at scale.
13-15. The currency in which you pay for A/B tests is traffic. The more you have, the more tests you can run. Never waste what you have.
16. Shop Direct, a 100+ year old company, scaled to 101 experiments a month in two years. Etsy, a startup launched in 2005, does 25 releases a day, most of them A/B tests. (http://www.slideshare.net/danmckinley/design-for-continuous-experimentation)
17. Zero Tests Per Month: "Here's the test idea, numbers and execution. Can we proceed?" "Let's meet to discuss. Maybe next week?" "Looks good. Will check with Z and get back to you." "So here's the test idea, numbers..." "Sorry, had other priorities. Can we meet next week?" "Sure! (D***!)" "Have you checked with Z?" "Have you?" "Have you?"
18. Ground rules: 1. Test ideas are subject to prioritization, not approval.
19. EXERCISE 3. The magic formula: evidence x opportunity size x strategy = priority. The worst idea gets tested if resources are available. (A minimal scoring sketch follows after these slides.)
20. 101 Tests Per Month: "OK then, we'll do this, this and that test. Others will wait." "Guys, our strategy shifted to checkout optimization." "Guys, we need to increase basket value." "Now this and that one." "And this." "These two would work." "Xmas is coming! DO NOTHING!" "This, this and that."
21. Ground rules: 2. Accept the fact that things will go wrong.
22. How to be the best monkey in the biz? Cheat like a monkey.
23. If 1 out of 7 tests wins, what about the other 6?
24. EXERCISE 1. What was the result of the Button Colors Test by Groove? (https://www.groovehq.com/blog/failed-ab-tests)
25. If 1 out of 7 tests wins, what about the other 6? 5 of them will be inconclusive.
26. Most tests are inconclusive because: a) too few users were using the changed feature for it to reach statistical significance; b) the changed feature had little to do with the metrics used to evaluate the test; c) there were multiple changes in the same test and they levelled each other out. (A minimal significance-check sketch follows after these slides.)
27. EXERCISE 4. Complete the sentence: A/B testing is NOT about __________ (making money). You do it to find out what works and how well.
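Slide 19's formula can be read as a simple multiplicative score. The sketch below is only one way to make that reading concrete; the 1-5 scales, the example backlog items and the scores are assumptions for illustration, not part of the deck.

```python
def priority(evidence, opportunity_size, strategy_fit):
    """Slide 19's 'magic formula': evidence x opportunity size x strategy = priority.
    Each input is a team-agreed score (assumed here to be 1-5); the deck does not
    prescribe a scale."""
    return evidence * opportunity_size * strategy_fit

# Hypothetical backlog: even the weakest idea keeps a score and may still run
# if resources are available, because ideas are prioritized, never "approved".
backlog = {
    "shorter checkout form": priority(evidence=4, opportunity_size=3, strategy_fit=5),
    "homepage hero video":   priority(evidence=2, opportunity_size=4, strategy_fit=2),
}
for idea, score in sorted(backlog.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{idea}: {score}")
```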
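To make slide 26's point about statistical significance concrete, here is a minimal two-proportion z-test of the kind commonly used to evaluate an A/B test. It is a generic textbook check, not the deck's own method, and the visitor and conversion counts are invented for illustration.

```python
import math

def ab_significance(conv_a, visitors_a, conv_b, visitors_b):
    """Two-sided two-proportion z-test for a simple A/B test."""
    p_a, p_b = conv_a / visitors_a, conv_b / visitors_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conv_a + conv_b) / (visitors_a + visitors_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return p_b - p_a, p_value

# Invented numbers: 3.0% vs 3.3% conversion on 10,000 visitors per variant.
lift, p = ab_significance(300, 10_000, 330, 10_000)
print(f"lift: {lift:.2%}, p-value: {p:.3f}")  # p is about 0.22, i.e. inconclusive
```

With this few users on the changed feature, even a real 10% relative lift stays inconclusive, which is exactly the situation slide 26a describes.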
28. You can successfully run tests that have no chance of success.
29-30. Cheat: experiment to test significance. Removing a feature or slowing down the website are changes with no chance of "winning", but if the results show the change didn't reduce conversion, they also show we shouldn't waste time on that area.
31. Cheat: one change per test, and order matters. The plan "select products, produce videos, upload, add links, launch test" bundles several changes; tested together (add links + select products + produce videos), the result comes back INCONCLUSIVE.
32. Cheat: measure against your hypothesis. "Adding videos had no impact on conversion" is INCONCLUSIVE; "people don't click watch-video links" is CONCLUSIVE.
33. A great presentation by Etsy: goo.gl/WQpY65
34-35. The benefit you get from A/B testing is knowledge, not revenue. Revenue will come as a result of applied knowledge.
36. How to be the best monkey in the biz? Don't be a monkey. Don't be a gnome either.
37. What about this 1 test out of 7 that fails?
38. 3 out of 4 companies (that are A/B testing) make changes based on intuition or best practices. (Chart: 50% of companies A/B testing, 50% not. http://conversionxl.com/ab-tests-fail/)
39. EXERCISE 5. Solve the equation: collect underpants + ? = profit.
40. A/B Testing Flow, fail-fast approach: an A/B test is launched; test results come back negative; the idea gets killed and the next test is launched.
41. One failed test doesn't make collecting underpants a bad idea.
42. Example of an A/B testing flow at Spotify: prepare for failure. Pre-test research is done; users' behaviours are logged; users are surveyed alongside the test; the A/B test is launched; test results come back negative; survey responses give a clue why; respondents' logs give another clue; respondents are emailed to clarify the issue; the issue is solved and the test is relaunched. (Courtesy of @bendressler, researcher at Spotify.)
43. The real price you pay for not researching why tests fail is the death of great ideas.
44-45. Evidence-led flow for hypothesis-based A/B testing: user testing, voice of customer and qual/quant analytics provide insight and evidence; a hypothesis is written ("I predict that doing B will change X by Y% because of Z"); metrics-based evaluation asks whether the metrics are good and the hypothesis is accepted or rejected; a hypothesis check asks what really happened. (A minimal sketch of such a hypothesis record follows after these slides.)
46. UseItBetter, the platform for evidence-led experimentation at scale: collect behavioural data, build segmentation rules, explore, analyse, visualise, quantify an opportunity, translate an insight into a test. (Average stats per website from the last month: 1 TB of raw behavioural data, 40M unique interactions, 41 sets of rules created.)
47. An analyst researching for an infinite amount of time will almost surely get you to a 100% hit ratio. Which isn't good either.
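One way to make the "I predict that doing B will change X by Y% because of Z" template operational is to store each test as a structured record and judge it against its own hypothesis metric, in the spirit of slide 32. This is only an illustrative sketch; the field names, the video example and all numbers are assumptions, not taken from the deck.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    """One record per test, following the deck's template:
    'I predict that doing B will change X by Y% because of Z.'"""
    change: str            # B: what we change
    metric: str            # X: the metric the change should move
    predicted_lift: float  # Y: expected relative change
    rationale: str         # Z: the insight/evidence behind the prediction

# Hypothetical example in the spirit of slide 32 (names and numbers are made up):
h = Hypothesis(
    change="add product videos to the listing page",
    metric="clicks on the 'watch video' link",
    predicted_lift=0.10,
    rationale="session recordings show users hunting for more product detail",
)

def evaluate(h: Hypothesis, observed_lift: float) -> str:
    """Judge the test against its own hypothesis metric, not only against revenue."""
    verdict = "supported" if observed_lift >= h.predicted_lift else "not supported"
    return f"'{h.change}' -> {h.metric}: {observed_lift:.0%} observed, hypothesis {verdict}"

print(evaluate(h, observed_lift=0.02))  # conclusive on its own terms: people don't click
```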
48-55. If you are going to A/B test:
1. Never waste your traffic.
2. Many small changes are better than one big change.
3. Even the smallest change needs an insight.
4. Prepare for failure.
5. It's OK to fail if you know why you failed.
6. Iterate.
7. Be honest.
56. Disclaimer: for the sake of this presentation, I assumed that the results of the 7 tests I referred to had been correctly read out by people who are familiar with terms like statistical significance, confidence intervals, p-value, etc. Otherwise, it's likely that the one winning test was just a phantom.
57. THE FINAL EXERCISE. Get in touch: Łukasz Twardowski, https://linkedin.com/in/twardowski