Madison+ UX 2014: A/B Testing - The Good, The Bad, and The Ugly

DESCRIPTION

A/B testing is a great technique for experimenting with changes to your product. At Etsy we make extensive use of A/B tests to try out ideas; we've got 30+ running right now. Although the concept is simple, the execution is a bit trickier than you'd think. In this talk I will cover the common, and a few of the not-so-common, mistakes that can skew your results.

Page 1: Madison+ UX 2014: A/B Testing - The Good, The Bad, and The Ugly

- Hi, I'm Corey, from Etsy (@coreyloose)
- Etsy is a marketplace where people around the world connect to buy and sell unique goods (not all that different from the art fair going on right now)
- We like to run a lot of A/B tests

- This talk is 201-level, but here's the quick 101

- Have a theory about something that will make your product better
- Show it to a random subset of visitors, but keep each visitor's experience consistent ("bucketing")
- Try both versions for a while and see which one does better
- Not only does this test whether your idea is good, it also tests your implementation and all sorts of complex interactions
- Would this one cause an increased error rate in variation selection?
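Here is a minimal sketch of the "keep it consistent" part of bucketing. It assumes each visitor has a stable ID, for example from a cookie. The function name `bucket`, the ID format, and the choice of SHA-256 are illustrative assumptions, not a description of how Etsy's tools work. Hashing the experiment name together with the ID means the same visitor always gets the same variant, and no stored assignment is needed.

```python
import hashlib


def bucket(user_id: str, experiment: str, variants: tuple = ("control", "treatment")) -> str:
    """Deterministically map a visitor to one variant of an experiment (hypothetical helper)."""
    # Salting the hash with the experiment name means a visitor's bucket in one
    # experiment says nothing about their bucket in another.
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # Same inputs -> same digest -> same variant, on every request.
    return variants[int(digest[:8], 16) % len(variants)]
```

Calling `bucket("cookie-abc123", "listing_page_v2")` twice returns the same variant both times. Across many IDs, the split comes out close to 50/50.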

- As I just explained it, A/B testing sounds simple and awesome
- And it is, but as always the devil is in the details
- I'm going to tell a bunch of stories about things we did wrong; not to be negative, but it's just more interesting than spraying champagne around
- Let's start with a really common no-no

- Trying one thing for a week, then trying another

- Alluring because it doesn't require you to have rich metric gathering or bucketing
- You're going to need some tooling
- We built Feature and Catapult

- (The only code in the presentation)
- Plenty of other options out there, but we're happy with this
- Open source
- Easy enough that PMs can change experiment weights
- Uses a cookie to ensure the user experience stays consistent
- You'll need your own logging to do analysis
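Adjustable weights can be layered on the same hashing idea. This is a hypothetical Python helper, not Feature's API. Each visitor's hash becomes a fixed point between 0 and 1, and the weights carve that range into slices.

```python
from __future__ import annotations

import hashlib


def choose_variant(user_id: str, experiment: str, weights: dict[str, float]) -> str:
    """Pick a variant with probability proportional to its weight (hypothetical helper)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    # 48 bits of the hash give each visitor a fixed point in [0, 1).
    point = int.from_bytes(digest[:6], "big") / 2**48
    # Walk the cumulative weights until the visitor's point falls inside a slice.
    cumulative = 0.0
    for name, weight in weights.items():
        cumulative += weight / total
        if point < cumulative:
            return name
    # Floating-point round-off can leave the final boundary a hair below 1.0.
    return list(weights)[-1]
```

This design has one consequence worth knowing. With `control` listed first, raising `treatment` from 10 to 20 keeps everyone already in treatment where they are, but moves some control visitors over. Changing weights mid-experiment therefore changes who is in each group.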

- Internal tool that analyzes A/B tests using data processed from Feature's event logs

- For this experiment: more pages viewed but fewer add-to-carts
- No statistical significance for conversion rate
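"No statistical significance" typically means a comparison like the one below produced a large p-value. This sketch uses the standard two-proportion z-test. Whether the internal tool uses this exact test is an assumption, and the counts in the usage note are made up.

```python
from __future__ import annotations

import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally well.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("need at least one conversion and one non-conversion overall")
    z = (p_b - p_a) / se
    # Two-sided p-value: P(|Z| >= |z|) for a standard normal Z.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

For example, `two_proportion_z_test(500, 10_000, 530, 10_000)` compares 5.0% vs. 5.3% on 10,000 visitors each. It gives z ≈ 0.96 and p ≈ 0.34, well short of the usual 0.05 threshold.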

- A bit sobering, but you gotta have a lot of traffic, or make a big change, to do this

- Written by an Etsy alumnus
- To detect a small change you need a lot of time

- The good news is that if you can make a bigger effect, it gets much easier to detect (1% => 5%)
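The standard normal-approximation sample-size formula for comparing two conversion rates shows why both slides hold. The number of visitors you need grows with the inverse square of the difference you are trying to detect. This sketch assumes a 5% baseline conversion rate, a two-sided α of 0.05, and 80% power. It also reads "1% => 5%" as relative lifts. All of those inputs are illustrative.

```python
import math
from statistics import NormalDist


def visitors_per_variant(baseline: float, relative_lift: float,
                         alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate visitors needed in EACH variant to detect `relative_lift` over `baseline`."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance threshold (~1.96)
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power (~0.84)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    # The required sample size scales with 1 / (difference in rates) ** 2.
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)


small_effect = visitors_per_variant(0.05, 0.01)  # hypothetical 5% baseline, 1% relative lift
big_effect = visitors_per_variant(0.05, 0.05)    # same baseline, 5% relative lift
```

With those assumptions, `small_effect` comes out near 3.0 million visitors per variant, while `big_effect` is about 122,000. A 5× bigger effect needs roughly 25× less traffic.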

- Have a hypothesis going in; no fishing ("let's just pump some people full of this new chemical")

- Let's get into some more interesting failures
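The "no fishing" rule has a simple arithmetic reason behind it: every extra metric you check is another chance for noise to look significant. This is a rough illustration that assumes the metrics are independent. Real metrics are usually correlated, so treat the numbers as intuition rather than an exact rate.

```python
def chance_of_false_positive(num_metrics: int, alpha: float = 0.05) -> float:
    """Probability that at least one of `num_metrics` independent, truly unaffected
    metrics comes out "significant" at level `alpha`."""
    # Each metric avoids a false positive with probability (1 - alpha);
    # all of them avoid one with probability (1 - alpha) ** num_metrics.
    return 1 - (1 - alpha) ** num_metrics
```

`chance_of_false_positive(1)` is 0.05, but `chance_of_false_positive(20)` is about 0.64. Check twenty metrics with no real effect, and you will more likely than not find a "win".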

- Going to tell a few stories about the first type of failure: mechanical

- All users get bucketed, but only Australian users are eligible for the experiment

- This is what really happens, since the rest of the world isn't eligible
- The ineligible users dilute the results, so any effect will be under-represented

- You need to exclude the rest (the ineligible users) from the analysis
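A toy analysis with made-up numbers shows how much the ineligible users hide. The data, the `country` field, and the `"US"` stand-in for the rest of the world are all hypothetical. In this data, the treatment doubles conversion for Australian users, who are the only ones who can see it.

```python
from collections import defaultdict


def conversion_by_variant(records, is_eligible=lambda r: True):
    """Return {variant: conversion rate}, counting only records that pass `is_eligible`."""
    visitors = defaultdict(int)
    conversions = defaultdict(int)
    for r in records:
        if not is_eligible(r):
            continue  # these users never saw a difference, so they only dilute the result
        visitors[r["variant"]] += 1
        conversions[r["variant"]] += int(r["converted"])
    return {v: conversions[v] / visitors[v] for v in visitors}


def fake_users(country, variant, n, converted):
    """Made-up data: n users in `country`, the first `converted` of whom converted."""
    return [{"country": country, "variant": variant, "converted": i < converted}
            for i in range(n)]


# Hypothetical data: only "AU" users can see the change; "US" stands in for everyone else.
records = (fake_users("AU", "control", 100, 10) + fake_users("AU", "treatment", 100, 20)
           + fake_users("US", "control", 900, 45) + fake_users("US", "treatment", 900, 45))

diluted = conversion_by_variant(records)  # everyone who was bucketed
eligible_only = conversion_by_variant(records, is_eligible=lambda r: r["country"] == "AU")
```

Counting everyone, conversion goes from 5.5% to 6.5%. Restricted to eligible users, it goes from 10% to 20%.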

- If your experiment makes the page a lot bigger, weirdness can happen
- The page loads more slowly

- This ensures the user actually saw the page, and gives us access to more information

- Slow network speeds on mobile
- The combination led to experiments being under-reported
- We noticed because the experiment group appeared to have far fewer people in it
- Lesson: watch your page weight
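A group that is much smaller than its allocation predicts is often called a sample ratio mismatch. It can be flagged automatically instead of noticed by eye. This sketch uses a chi-square goodness-of-fit test against the intended split. The talk does not say Etsy checked it this way, and the counts below are hypothetical.

```python
import math


def sample_ratio_p_value(n_control: int, n_treatment: int,
                         expected_treatment_share: float = 0.5) -> float:
    """Chi-square goodness-of-fit p-value: could this split plausibly come from the intended allocation?"""
    total = n_control + n_treatment
    expected_treatment = total * expected_treatment_share
    expected_control = total - expected_treatment
    chi2 = ((n_control - expected_control) ** 2 / expected_control
            + (n_treatment - expected_treatment) ** 2 / expected_treatment)
    # Two groups means one degree of freedom; for chi-square with 1 d.o.f.,
    # P(X >= chi2) equals erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical counts from an experiment that was meant to be a 50/50 split.
p = sample_ratio_p_value(50_000, 48_500)
```

With a 50/50 allocation, 50,000 control vs. 48,500 treatment visitors gives p well below 0.0001. That is a strong hint that something, such as a slow-loading page dropping exposure logs, is losing users from one side.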

- We don't support IE7
- We once ran an experiment that looked like this in IE7
- It was still enough traffic to tank the experiment
- Lesson: slice by user groups in the analysis
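Slicing is just grouping the same counts by one more key. This sketch uses made-up numbers in which the treatment is fine in Chrome but broken in IE7. The `browser` field and the figures are hypothetical.

```python
from collections import defaultdict


def conversion_by_segment(records, segment_key):
    """Return {(segment, variant): conversion rate} so each user group can be compared separately."""
    visitors = defaultdict(int)
    conversions = defaultdict(int)
    for r in records:
        key = (r[segment_key], r["variant"])
        visitors[key] += 1
        conversions[key] += int(r["converted"])
    return {key: conversions[key] / visitors[key] for key in visitors}


def fake_users(browser, variant, n, converted):
    """Made-up data: n users on `browser`, the first `converted` of whom converted."""
    return [{"browser": browser, "variant": variant, "converted": i < converted}
            for i in range(n)]


# Hypothetical numbers: the change is slightly positive in Chrome
# but completely broken (zero conversions) in IE7.
records = (fake_users("Chrome", "control", 1000, 50) + fake_users("Chrome", "treatment", 1000, 52)
           + fake_users("IE7", "control", 200, 10) + fake_users("IE7", "treatment", 200, 0))

rates = conversion_by_segment(records, "browser")
```

Pooled, treatment looks worse (about 4.3% vs. 5.0%). Sliced by browser, Chrome is 5.2% vs. 5.0% and IE7 is 0%, which points straight at the browser-specific bug.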

- (HAL 9000)
- Ran an experiment on our activity feed, at a small %
- All the metrics tanked
- It turned out a bot we use to monitor page times had been bucketed in
- Lesson: make sure your A/B tooling ignores your bots
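One way to make the tooling ignore bots is to check the user agent before bucketing. Requests that match skip both variant assignment and exposure logging. This is a hypothetical sketch: the regular expression, the `PageTimeMonitor` user agent, and the function name are illustrative. User-agent matching also only catches bots that identify themselves.

```python
import hashlib
import re
from typing import Optional

# Hypothetical pattern: in practice you would list your own monitoring bots'
# user agents plus a maintained list of known crawlers.
BOT_PATTERN = re.compile(r"bot|crawler|spider|pagetimemonitor", re.IGNORECASE)


def variant_for_request(user_agent: str, user_id: str, experiment: str) -> Optional[str]:
    """Return a variant for a human visitor, or None to leave the request out of the experiment."""
    # Requests with no user agent are treated as bots too (a judgment call).
    if not user_agent or BOT_PATTERN.search(user_agent):
        return None  # caller shows the default experience and logs no exposure
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return ("control", "treatment")[int(digest[:8], 16) % 2]
```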

- The previous stories were mechanical, but the real power of A/B testing is seeing how your idea interacts with the world

- Implemented as a monolithic release
- The A/B test was kept as a hurdle at the end

- Go check out Dan McKinley's talk

- It failed terribly: purchases were down over 20%
- Since we built it all at once, we had nothing to pin it on
- What if we had done something simple, like testing whether more items are better? For example, 40 vs. 80 items on a page
- Lesson: test ideas in isolation

- Here's a story about an A/B test telling us something our product intuition didn't
- Seems like an obvious, simple win
- Logins are way down
- Turns out average users use way worse passwords than employees
- Ended up being a no-go for other reasons
- Lesson: watch for unintended consequences

- You can't measure everything that matters
- You can iron out the mechanical issues
- You can run tightly scoped tests that let you make confident decisions
- What if you asked ½ of the people you met for the rest of the day for $1?
- You'd end up with more money

- That's what you're doing with this
- If you A/B test it, you'll get more signups and probably better time-on-page
- Maybe a few more bounces
- But goodwill and brand impression are hard to measure
