http://en.wikipedia.org/wiki/Portraits_of_Shakespeare
A/B Testing and the Infinite Monkey Theorem
Łukasz Twardowski
www.useitbetter.com
a monkey hitting keys at random for an infinite amount of time will almost surely type the complete works of William Shakespeare.
a monkey A/B testing at random for an infinite amount of time will almost surely reach the conversion rate of Amazon.
A/B testing helps you find out which of two versions, running simultaneously, performs better.
THEORY
We do this because every day is different,
unlike in the Groundhog Day movie.
Groundhog Day (1993, Dir. Harold Ramis)
http://nerds.airbnb.com/experiments-at-airbnb/
A single change, bad or good, will not change a trend.
Unless a change is A/B tested, you won’t know its impact.
Why the monkey metaphor?
The industry average hit rate for A/B testing = ?
EXERCISE 1: Provide the benchmark.
The industry average hit rate for A/B testing = 14%
Just 1 out of 7 A/B tests is successful!
http://conversionxl.com/ab-tests-fail/
King Kong (1933, Dir. Merian Cooper, Ernest Schoedsack)
How to be the greatest monkey in the biz if infinity is not an option?
Be a quick monkey.
How to be the best monkey in the biz?
1 out of 7 tests wins × 2 weeks per test = slow growth
EXERCISE 2: Do the math.
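Exercise 2 can be sketched as a quick back-of-the-envelope calculation. The 14% hit rate, 2-week test length, and 101-tests-a-month figure come from the slides; everything else is plain arithmetic:

```python
# Back-of-the-envelope math for Exercise 2: one test at a time,
# 2 weeks per test, industry-average ~14% (1 in 7) hit rate.

WEEKS_PER_TEST = 2
HIT_RATE = 1 / 7  # ~14% of tests produce a winner

tests_per_year = 52 // WEEKS_PER_TEST      # 26 sequential tests a year
wins_per_year = tests_per_year * HIT_RATE  # fewer than 4 winners a year

print(f"Sequential: {tests_per_year} tests/year, "
      f"~{wins_per_year:.1f} winners/year")

# Experimenting at scale (e.g. Shop Direct's 101 tests a month)
# changes the picture entirely:
tests_per_year_at_scale = 101 * 12
wins_at_scale = tests_per_year_at_scale * HIT_RATE

print(f"At scale: {tests_per_year_at_scale} tests/year, "
      f"~{wins_at_scale:.0f} winners/year")
```

That is the whole argument for scale: the hit rate stays the same, so the only lever left is the number of tests you can afford to run.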
Unless you experiment at scale.
The currency in which you pay for A/B tests is traffic. The more you have, the more tests you can run. Never waste what you have.
Shop Direct, a 100+ year old company:
scaled to 101 experiments a month in two years.
Etsy, a startup launched in 2005:
25 releases a day, most of them A/B tests.
http://www.slideshare.net/danmckinley/design-for-continuous-experimentation (linkedin)
Zero Tests Per Month.
Here’s the test idea, numbers and execution.
Can we proceed?
Let’s meet to discuss. Maybe next week?
Looks good. Will check with Z and get back to you.
So here’s the test idea, numbers…
Sorry, had other priorities. Can we meet next week?
Sure! (D***!) Have you checked with Z?
Have you…?
Have you…?
Ground rules: 1. Test ideas are subject to prioritization, not approval.
evidence × opportunity size × strategy = priority
EXERCISE 3: The magic formula.
The worst idea gets tested if resources are available.
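A minimal sketch of how the evidence × opportunity size × strategy formula could be turned into a backlog score. The 1-to-5 scale and the example ideas are assumptions for illustration, not from the deck:

```python
# A sketch of the prioritization formula from the slides:
# priority = evidence x opportunity size x strategy.
# The 1-5 scoring scale and the example ideas are illustrative assumptions.

def priority(evidence: int, opportunity: int, strategy: int) -> int:
    """Each factor scored 1 (weak) to 5 (strong); higher product = test sooner."""
    return evidence * opportunity * strategy

backlog = {
    "simplify checkout form": priority(evidence=4, opportunity=5, strategy=5),
    "new homepage hero image": priority(evidence=1, opportunity=2, strategy=2),
    "show delivery date earlier": priority(evidence=3, opportunity=4, strategy=3),
}

# Ideas are ranked, never approved or rejected: everything eventually runs,
# highest-priority first, if resources are available.
for idea, score in sorted(backlog.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{score:3d}  {idea}")
```

Note the design choice implied by the ground rule: even the lowest-scoring idea stays in the queue rather than being vetoed.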
101 Tests Per Month.
OK then, we’ll do this, this and that test. Others will wait.
Guys, our strategy shifted to checkout optimization.
Guys, we need to increase basket value.
Now this and that one…
And this…
These two would work…
Xmas is coming!
DO NOTHING!
…this, this and that…
Ground rules: 2. Accept the fact that things will go wrong.
Cheat like a monkey.
How to be the best monkey in the biz?
If 1 out of 7 tests wins, what about the other 6?
https://www.groovehq.com/blog/failed-ab-tests
EXERCISE 1: What was the result of the Button Colors Test by Groove?
If 1 out of 7 tests wins, what about the other 6? 5 of them will be inconclusive.
Most tests are inconclusive because:
a) too few users used the changed feature for the test to reach statistical significance.
b) the changed feature had little to do with the metrics used to evaluate the test.
c) there were multiple changes in the same test and their effects cancelled each other out.
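Reason (a) can be quantified: every test needs a minimum amount of traffic before significance is even possible. Below is a standard normal-approximation sample-size formula for comparing two proportions, using only the Python standard library; the 2% baseline and 10% uplift are assumed example numbers:

```python
# How many users per variant does a test need before it can reach
# statistical significance? Standard normal-approximation formula
# for two proportions. Baseline rate and uplift are example numbers.
from math import ceil
from statistics import NormalDist

def sample_size_per_variant(baseline, uplift, alpha=0.05, power=0.8):
    """Users needed per variant to detect a relative `uplift` over `baseline`."""
    p1 = baseline
    p2 = baseline * (1 + uplift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2
    return ceil(n)

# A 2% baseline conversion rate and a hoped-for 10% relative uplift:
n = sample_size_per_variant(baseline=0.02, uplift=0.10)
print(f"~{n} users per variant")  # tens of thousands, not hundreds
```

This is exactly why testing a change that few users ever see tends to end inconclusive: the feature simply never accumulates enough exposures.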
EXERCISE 4: Complete the sentence.
A/B testing is NOT about __________. (making money)
You do it to find out what works and how well.
You can successfully run tests that have no chance of success.
… removing a feature… slowing down the website…
Cheat: Experiment to test significance. Test results show that…
… didn’t reduce conversion…
… we shouldn’t waste time on that.
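The "test significance" cheat boils down to an ordinary two-proportion z-test: if removing a feature shows no detectable drop, that null result is itself useful. A stdlib sketch with made-up numbers (the visitor and conversion counts are illustrative assumptions):

```python
# Two-proportion z-test (pooled), stdlib only. The conversion numbers
# are made up for illustration: variant B has a feature removed.
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return the two-sided p-value for H0: the two conversion rates are equal."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# A (feature kept):    400 conversions out of 20,000 visitors.
# B (feature removed): 390 conversions out of 20,000 visitors.
p_value = two_proportion_z_test(400, 20_000, 390, 20_000)
if p_value > 0.05:
    print(f"p = {p_value:.2f}: removing the feature didn't hurt "
          "conversion detectably -- stop maintaining it.")
```

A conclusive "no effect" here is a win: one less feature to build, test, and support.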
Cheat: One change per test. Order matters.
Select products, produce videos, upload, add links, launch test
Add links → Select products → Produce videos → …
INCONCLUSIVE
… people don’t click “watch video” links.
Cheat: Measure against your hypothesis.
… adding videos had no impact on conversion.
INCONCLUSIVE → CONCLUSIVE
Test results show that…
A great presentation by Etsy:
goo.gl/WQpY65
The benefit you get from A/B testing is knowledge, not revenue. Revenue will come as a result of applied knowledge.
Don’t be a monkey.
How to be the best monkey in the biz?
Don’t be a gnome either.
What about this 1 test out of 7 that fails?
http://conversionxl.com/ab-tests-fail/
3 out of 4 companies (that are A/B testing) make changes based on intuition or best practices.
50% NOT A/B testing
50% A/B testing
collect underpants + ? = profit
EXERCISE 5: Solve the equation.
A/B test is launched.
Test results come back negative.
The idea gets killed, the next test is launched.
A/B Testing Flow
Fail Fast Approach
One failed test doesn’t make collecting underpants a bad idea.
Example of A/B Testing Flow at Spotify:
Pre-test research is done.
Users’ behaviors are logged.
The A/B test is launched; users are surveyed alongside the test.
Test results come back negative.
Survey responses give a clue why; respondents’ logs give another clue.
Respondents are emailed to clarify the issue.
The issue is solved, the test relaunched.
Prepare for failure.
Courtesy of @bendressler, researcher at Spotify
The real price you pay for not researching why tests fail is the death of great ideas.
Evidence-Led, Hypothesis-Based A/B Testing Flow:
Insight and Evidence: Qual/Quant Analytics, User Testing, Voice of Customer.
Hypothesis: “I predict that doing B will change X by Y% because of Z.”
Metrics-Based Evaluation: are the metrics good? The hypothesis is accepted or rejected.
Hypothesis check: what really happened?
UseItBetter: The Platform for Evidence-Led Experimentation at Scale
Collect behavioral data: 1 TB of raw behavioral data, 40M unique interactions.
Build segmentation rules: 41 sets of rules created.
Explore, analyze, visualize.
Quantify an opportunity.
Translate an insight into a test.
(Average stats per website from the last month.)
An analyst researching for an infinite amount of time will almost surely get you to 100% hit ratio. Which isn’t good either.
If you are going to A/B test:
1. Never waste your traffic.
2. Many small changes are better than one big change.
3. Even the smallest change needs an insight.
4. Prepare for failure.
5. It’s OK to fail if you know why you failed.
6. Iterate.
7. Be honest.
Disclaimer
For the sake of this presentation, I assumed that the results of the 7 tests I referred to had been correctly read by people familiar with terms like statistical significance, confidence intervals, and p-values. Otherwise, it’s likely that the one winning test was just a phantom.
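The phantom-winner risk can be put in numbers with Bayes' rule. The α and power values are conventional defaults, and the 14% share of genuinely effective ideas is an illustrative assumption borrowed from the deck's hit-rate figure:

```python
# What share of "winning" tests are phantoms? Bayes' rule with
# illustrative assumptions: alpha = 0.05 (false-positive rate),
# power = 0.8 (chance of detecting a real effect), and suppose
# ~14% of tested ideas have a genuine effect.
alpha = 0.05
power = 0.8
p_real = 0.14  # prior share of ideas with a true effect (assumed)

p_sig_given_real = power * p_real        # true positives
p_sig_given_fake = alpha * (1 - p_real)  # false positives
p_phantom = p_sig_given_fake / (p_sig_given_real + p_sig_given_fake)

print(f"~{p_phantom:.0%} of 'winning' tests would be phantoms")
```

Under these assumptions, roughly a quarter of apparent winners would be statistical noise, which is why misreading significance can quietly poison a testing program.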
THE FINAL EXERCISE: Get in touch.
Łukasz Twardowski
https://linkedin.com/in/twardowski