Funnel Analysis in Hadoop at Etsy

DESCRIPTION

As an ecommerce site with more than 800,000 sellers, Etsy is particularly interested in understanding how shoppers find the items they seek. Part of this understanding involves attributing successful purchases to specific features on the site. This attribution model allows us to compare and refine Etsy’s features, and it also provides valuable signals for A/B testing, search quality, and recommenders. However, the path to a successful handmade purchase often involves multiple features over the course of several visits. This talk will discuss the challenges of funnel analysis at Etsy and the corresponding deficiencies of several widely used web analytics tools. We’ll then dive into our event sequence matching tool, which we’ve successfully applied to hundreds of millions of visits in a single Hadoop job and which is widely used across our big data stack. Finally, we’ll take a look at some of our applications of the tool and compare it to related work.

Steve Mardenfeld, Wil Stuckey, Matt Walker

Tuesday, March 12, 13

Handmade Marketplace

Data Driven Development

• Use data to make informed decisions

• Use data to evaluate the efficacy of our products

Funnel Analysis

Registration Funnel

[Figure: registration funnel. The first build shows only the entry and the exit: 50,000 visits (100%) in, 10,000 (20%) out. The full build shows four steps with counts 50,000; 45,000; 15,000; 10,000, step conversion 100%, 90%, 33%, 60%, and total conversion 100%, 90%, 29%, 20%.]

Funnels ++

• Funnels are more than just an optimization tool

• Use them to understand different pathways throughout our site

• Partition and compare these pathways by attributes

• A/B tests, categories, queries, cohorts

• Attribution model

Attribution Models

• Tie conversions and successes to specific products and actions

• Use to gain understanding of our users’ interaction with Etsy

• Help us to measure gains in A/B testing

• Easily compare different varieties of the same product

• Attribution techniques for internal and external attribution

Attributions Illustrated

• last product interaction - browse

• co-occurrence - search and browse

• direct funnel attribution - search
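
The three labels above can be read as three rules applied to one visit. The Scala sketch below is one plausible reading, not Etsy's implementation: the event types (`FeatureView`, `ListingView`, `Purchase`), the rule definitions, and the example visit in the usage note are all assumptions, chosen so that the rules reproduce the labels on the slide.

```scala
object AttributionSketch {
  // Hypothetical event model for a single visit (not from the talk).
  sealed trait Event
  case class FeatureView(feature: String) extends Event              // e.g. "search", "browse"
  case class ListingView(listingId: Long, referrer: String) extends Event
  case class Purchase(listingId: Long) extends Event

  // Last product interaction: the last feature seen before the first purchase.
  def lastInteraction(visit: Seq[Event]): Option[String] =
    visit.takeWhile(e => !e.isInstanceOf[Purchase])
      .collect { case FeatureView(f) => f }
      .lastOption

  // Co-occurrence: every feature that appears in a visit containing a purchase.
  def coOccurrence(visit: Seq[Event]): Set[String] =
    if (visit.exists(e => e.isInstanceOf[Purchase]))
      visit.collect { case FeatureView(f) => f }.toSet
    else Set.empty

  // Direct funnel attribution: the feature that referred an earlier view
  // of the listing that was later purchased.
  def directFunnel(visit: Seq[Event]): Set[String] = {
    val indexed = visit.zipWithIndex
    val views = indexed.collect { case (ListingView(id, ref), i) => (id, ref, i) }
    val purchases = indexed.collect { case (Purchase(id), i) => (id, i) }
    (for {
      (bought, p) <- purchases
      (viewed, ref, v) <- views
      if viewed == bought && v < p
    } yield ref).toSet
  }
}
```

For a hypothetical visit of search, a listing reached from search, browse, and then a purchase of that listing, the three rules credit browse, search and browse, and search respectively.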

Indirect Attribution Illustrated

• direct funnel attribution - shop home

• indirect funnel attribution - search

Aggregate Results

• Direct (events): Search 50,000 → Clicked Listing 25,000 → Purchased 5,000

• Indirect (events): Search 50,000 → Clicked Listing 25,000 → Shop 10,000 → Clicked Listing 5,000 → Purchased 3,000

• Combined (events): Search 50,000 → Clicked Listing 25,000 → Purchased 8,000

Segmenting Within Funnels

• Old Algorithm (counts): Search 50,000 → Clicked Listing 20,000 → Purchased 15,000

• New Hotness (counts): Search 100,000 → Clicked Listing 60,000 → Purchased 15,000

• Old Algorithm (step / total): Search 100% / 100% → Clicked Listing 50% / 50% → Purchased 40% / 20%

• New Hotness (step / total): Search 100% / 100% → Clicked Listing 60% / 60% → Purchased 25% / 15%
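
The difference between the "step" and "total" rows can be made concrete with a small calculation. This is a minimal Scala sketch; the object and function names are hypothetical. The step rate divides each count by the previous step's count, and the total rate divides it by the first step's count.

```scala
object FunnelRates {
  // Percentage of the previous step's count that reached each step.
  // The first step is 100% by definition. A zero count in a preceding
  // step is not handled here and would yield Infinity or NaN.
  def stepRates(counts: Seq[Long]): Seq[Double] =
    if (counts.isEmpty) Seq.empty
    else 100.0 +: counts.zip(counts.tail).map { case (prev, cur) => 100.0 * cur / prev }

  // Percentage of the first step's count that reached each step.
  def totalRates(counts: Seq[Long]): Seq[Double] =
    counts.map(c => 100.0 * c / counts.head)
}
```

Applied to the New Hotness counts on the slide (100,000; 60,000; 15,000), this gives step rates of 100%, 60%, 25% and total rates of 100%, 60%, 15%, matching the percentages shown for that segment.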

Segmenting Across Funnels

Conversion from each entry point (*) to Clicked Listing and then to Purchased:

• Search: 100% → 50% → 40%

• Browse: 100% → 40% → 30%

• Home: 100% → 60% → 36%

• Activity Feed: 100% → 62% → 28%

• Taste Test: 100% → 47% → 31%

• Search Ads: 100% → 45% → 38%

Democratized Funnels

How do we make this awesomeness available for everyone?

Awesome infrastructure but...

• Must be an engineer to write your own queries

• Engineering resources become the bottleneck

• Hard to scale as the company grows

So we want to:

• Allow the data engineers to focus on higher priority things.

• Allow people to answer their own questions.

How do we do this?

The Path of Funnel Tools

Google spreadsheets → internal tools:

• Funnel Cake

• Feature Funnel

Funnel Cake

Version 1

Real Time

Let’s do it in PHP!

Why?

It’s the “Etsy Way”

• Already have existing infrastructure

• Operationally stable

• The same tools everyone else is using means a better adoption rate across Engineering

• Event Stream

• Code runs on every page view

• Simple matching system

• (Ab)used memcached as temporary storage

• Rolled up to the database every minute (near real time)

What happened?

Broader adoption! Over 75 funnels were created.

Shiny Visualized Results

• Funnels had to be set up ahead of time (no backfills)

• Reconciliation is hard

• Limited to events in our web clickstream (iOS and Android would be excluded)

• Scaling and Operational issues

• Difficult to maintain multiple stacks

Turns out that we don’t make Product decisions in real time

http://mcfunley.com/whom-the-gods-would-destroy-they-first-give-real-time-analytics

Version 2

Getting back on the elephant

• Able to carry over the User Interface from v1

• Standardized event sessionization

• Operationally supported infrastructure

• Nightly batch process

Feature Funnel

Attribution for ALL

Built in metrics

• Click through rate (direct attribution)

• Listing impressions (direct attribution)

• Purchase rate (direct attribution)

• Purchase from shop rate (indirect attribution)

• Favorite from shop rate (indirect attribution)

• More...

Segmentation

• Primarily used for A/B test analysis

• Arbitrary segmentation

How do we get it?

select (clicks * 100.0 / visits) as "CTR"
  from feature_funnel
 where event_type = 'search'
   and ab_test = 'sitewide'
   and epoch_s = 1361318400
   and group_name = 'ALL_GROUPINGS';

The Future?

Features

• Eliminate the need for an engineer to write the queries

• Robust segmentation

• Not be limited to visit sessions

• Run Ad Hoc queries

Build your own Funnels

[Placeholder: funnel builder UI shown here]

Mechanics of Funnel Analysis

Input

• Event sequences

• Typically visits to website

• Sessionization scheme left to developer

Funnel

• Search

• Listing

• Cart

Query

• Only discussed event types

• Funnel steps require additional constraints:

• Listing referred by search

• Added that listing to cart

Cart

  added_listing_id    119469855
  cart_listing_ids    119469855, ...
  cart_type           guest
  ...                 ...

cart.added_listing_id == listing.listing_id

Query

• Search

• Listing & Referred

• Cart & AddedListing

Pattern Matching

• Apply query to event sequence

• Select out tuples of matching events

What About Incomplete Funnels?

[Figure: matches that stop before the last funnel step are padded with null, one null for each step that was not reached.]

Funnel Analysis

• Replace events with 1

• Replace nulls with 0

• Sum
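
The three steps above can be sketched in a few lines of Scala. This is an illustration under stated assumptions rather than the tool's code: a match is modeled as a sequence of `Option` values (with `None` standing in for null), and the names `indicators`, `sumColumns` and `perVisit` are hypothetical. The `perVisit` helper applies the min(n, 1) cap that appears later in the deck.

```scala
object FunnelCounts {
  // One slot per funnel step; None marks a step that was not reached.
  type Match = Seq[Option[String]]

  // Replace events with 1 and nulls (None) with 0.
  def indicators(m: Match): Seq[Int] =
    m.map(e => if (e.isDefined) 1 else 0)

  // Sum the indicator rows column by column (one column per funnel step).
  def sumColumns(rows: Seq[Seq[Int]]): Seq[Int] =
    rows.transpose.map(_.sum)

  // Cap each step at 1 so that a visit with several matches counts once.
  def perVisit(matches: Seq[Match]): Seq[Int] =
    sumColumns(matches.map(indicators)).map(n => math.min(n, 1))
}
```

With one complete match, one match missing the last step and one match missing the last two steps, `sumColumns` yields 3, 2, 1 and `perVisit` yields 1, 1, 1.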

[Figure: three matches, one complete, one ending in a single null and one ending in two nulls, become the indicator rows (1, 1, 1), (1, 1, 0) and (1, 0, 0). Their column sums are 3, 2 and 1. Taking min(3, 1), min(2, 1) and min(1, 1) then gives 1, 1, 1.]

What Do We Keep?

Search

  query          “dinosaur”
  listing_ids    119469855, 90583707, ...
  loc            “http://www.etsy.com/search/handmade/patterns?q=dinosaur&order=most_relevant&view_type=gallery&ship_to=ZZ”
  ...            ...

Listing

  listing_id     119469855
  ref            “http://www.etsy.com/search/handmade/patterns?q=dinosaur&order=most_relevant&view_type=gallery&ship_to=ZZ”
  shop_id        7415158
  ...            ...

Cart

  added_listing_id    119469855
  cart_listing_ids    119469855, ...
  cart_type           guest
  ...                 ...

Segmenting Funnels

[Figure: from the matched Search, Listing and Cart events above, only the segmenting properties are kept: query, listing_id and cart_type.]

Segmented Funnel Analysis

• Extract segmenting properties

• Compute indicators as before

• Group on segmenting properties and sum
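
A minimal Scala sketch of the grouping step follows. The `Row` record and the segment values in the usage note are hypothetical. The sketch assumes the indicators have already been computed for each match and shows only the group-and-sum.

```scala
object SegmentedFunnel {
  // Hypothetical record: the segmenting property extracted from a match
  // (for example cart_type) plus the 0/1 indicators, one per funnel step.
  case class Row(segment: String, indicators: Seq[Int])

  // Group on the segmenting property and sum the indicators per step.
  def bySegment(rows: Seq[Row]): Map[String, Seq[Int]] =
    rows.groupBy(_.segment).map { case (segment, group) =>
      segment -> group.map(_.indicators).transpose.map(_.sum)
    }
}
```

For example, two "guest" rows with indicators (1, 1, 1) and (1, 0, 0) and one "member" row with (1, 1, 0) produce guest → (2, 1, 1) and member → (1, 1, 0). In a MapReduce setting the same shape would typically be expressed as a map that emits (segment, indicators) and a reduce that sums.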

MapReduce

• Work is done map-side

• Common first step in our jobs

• Expensive computation limited to first round mappers

Event Sequence Pattern Matching

Components

• Predicate: matches/rejects events

• Query: tuple of predicates

• Match: tuple of events

Match Predicates

• Select an event based on:

• Full event sequence

• Prior matched events

• Current candidate

Match Predicate DSL

• Combine predicates with logical operators

• val Query = Seq(Search, Listing & Referred, Cart & AddedListing)
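
To show how a line like the `Query` definition above could type-check, here is a small Scala sketch of a predicate type with an `&` combinator. It is an assumption about the shape of the DSL, not the tool's actual API: the `Event` class, the `ofType` and `linked` helpers, and the choice to compare `loc` with `ref` and `listing_id` with `added_listing_id` are all hypothetical. The property names themselves come from the event tables shown earlier. For brevity these predicates look only at the prior matched events and the current candidate, not at the full event sequence.

```scala
object PredicateDsl {
  // Hypothetical event: a type name plus string properties.
  case class Event(eventType: String, props: Map[String, String] = Map.empty)

  // A predicate accepts or rejects a candidate given the events matched so far.
  trait Predicate { self =>
    def apply(prior: Seq[Event], candidate: Event): Boolean

    // Logical AND of two predicates.
    def &(other: Predicate): Predicate = new Predicate {
      def apply(prior: Seq[Event], candidate: Event): Boolean =
        self(prior, candidate) && other(prior, candidate)
    }
  }

  // Accepts candidates of the given event type.
  def ofType(t: String): Predicate = new Predicate {
    def apply(prior: Seq[Event], candidate: Event): Boolean = candidate.eventType == t
  }

  // Accepts a candidate whose `candidateKey` property equals the
  // `priorKey` property of the most recently matched event.
  def linked(priorKey: String, candidateKey: String): Predicate = new Predicate {
    def apply(prior: Seq[Event], candidate: Event): Boolean =
      (for {
        last <- prior.lastOption
        a <- last.props.get(priorKey)
        b <- candidate.props.get(candidateKey)
      } yield a == b).getOrElse(false)
  }

  val Search: Predicate = ofType("search")
  val Listing: Predicate = ofType("listing")
  val Cart: Predicate = ofType("cart")
  val Referred: Predicate = linked("loc", "ref")
  val AddedListing: Predicate = linked("listing_id", "added_listing_id")

  val Query: Seq[Predicate] = Seq(Search, Listing & Referred, Cart & AddedListing)
}
```

Each element of `Query` is tested against a candidate together with the events already matched for the preceding predicates, which is what lets `Referred` and `AddedListing` tie one funnel step to the previous one.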

Semantics

• Fixed number of events in match

• Arbitrary number of matches per sequence

• Collect and extend all partial matches

[Figure: a visit laid out as three rows of pages (Search, Listing, Shop; Listing, Cart, Home; Search, Listing, Search), then flattened into one event sequence: Search, Listing, Shop, Listing, Cart, Home, Search, Listing, Search.]

[Animation: the query (Search, Listing & Referred, Cart & AddedListing) consumes the sequence one event at a time. At each step every node that can still be extended is tested against the next event, and the match tree grows from Root. The final tree holds three Search nodes, two Listing nodes and one Cart node, which yields one complete match, one match padded with a single null and one match padded with two nulls.]

Match Tree

• Purely functional data structure

• Holds matched events and indices

• Match prefixes shared

Match Tree Algorithm

• Fold over sequence accumulating tree

• May extend any non-terminal node

• Each level in tree corresponds to predicate
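
The fold can be sketched in Scala as follows. To keep the example short it swaps the shared-prefix tree for a flat list of partial matches stored as event indices, so it illustrates the "collect and extend all partial matches" semantics rather than the purely functional tree itself. All names here (`Event`, `Predicate`, `partialMatches`, `matches`) are hypothetical.

```scala
object MatchFold {
  // Hypothetical event: a type name plus string properties.
  case class Event(eventType: String, props: Map[String, String] = Map.empty)

  // A predicate sees the events matched so far and the current candidate.
  type Predicate = (Seq[Event], Event) => Boolean

  // Fold over the sequence, accumulating partial matches as index vectors.
  // A partial of length k corresponds to level k of the tree and may be
  // extended only by an event that satisfies predicate k. The unextended
  // partial is kept too, so later events can still extend it.
  def partialMatches(query: Seq[Predicate], events: IndexedSeq[Event]): Seq[Vector[Int]] =
    events.indices.foldLeft(Seq(Vector.empty[Int])) { (partials, i) =>
      val extended = partials.collect {
        case p if p.length < query.length &&
          query(p.length)(p.map(j => events(j)), events(i)) => p :+ i
      }
      partials ++ extended
    }

  // Keep partials that are not a prefix of a longer one (the leaves of the
  // tree) and pad them with None up to the query length.
  def matches(query: Seq[Predicate], events: IndexedSeq[Event]): Seq[Seq[Option[Event]]] = {
    val all = partialMatches(query, events).filter(_.nonEmpty)
    val leaves = all.filter(p => !all.exists(q => q.length > p.length && q.startsWith(p)))
    leaves.map(p => p.map(j => Option(events(j))).padTo(query.length, None))
  }
}
```

The flat list grows in the same way the tree does. The tree's advantage, per the slide above, is that matches with a common prefix share it instead of copying it.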

Practicality

• Explodes, but

• Queries are constant length

• Sequences are bounded (visits)

• Predicates constrain growth

Summary

• Why funnels are interesting

• What we’ve built with them at Etsy

• Our approach to funnel analysis

Questions?

• Steve Mardenfeld

• Wil Stuckey @quiiver

• Matt Walker @data_daddy
