October 28, 2017
Giuseppe “Pino” Di Fabbrizio
Rakuten Institute of Technology – Boston
• Motivations
• Traditional information retrieval models
• Learning-to-rank models
• Relevance
• Ranking Metrics
• Algorithms
• Ranking optimization
• Use cases
• Summary
• What is next?
Disclaimer: unless otherwise specified, images in this presentation comply with the Creative Commons (CC) publishing license.
• E-commerce growing faster than traditional brick-and-mortar market ($4.06T by 2020)
• Mobile shopping adoption increasing worldwide (46% of shoppers in Asia and 28% in North America)
• Online catalogs offering broader selections and competitive products
• Electronic money transactions gaining more consumers’ trust
• Massive data collected during web and mobile interactions providing foundation for machine learning-driven optimizations
1.61B shoppers · $1.86T sales · $150B* revenues
*2016 combined revenues for Amazon, Otto Group, and Rakuten
https://www.statista.com/topics/871/online-shopping/
250M+ Products
40k+ Categories
How do we find the most relevant products for a search query?
[Screenshot: www.rakuten.com, Oct 10, 2017]
[Diagram: a ranking function takes a query and a document collection and returns an ordered result list (positions 1–9). Screenshot: www.rakuten.com, Nov 2016]
• Relevance is estimated by lexical matches of query terms with document terms
• Examples:
• Boolean models
• Vector space models
• Latent semantic indexing
• Okapi BM25
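Of these, Okapi BM25 is the usual workhorse. A minimal self-contained sketch (toy corpus, no stemming or stop-wording; the tiny product titles are invented) shows how it combines inverse document frequency with saturating term frequency and document-length normalization:

```python
import math

def bm25_score(query_terms, doc, corpus, k1=1.2, b=0.75):
    """Okapi BM25 score of one document for a bag-of-words query.

    corpus: list of documents, each a list of terms.
    """
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)         # document frequency
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1)  # smoothed IDF
        tf = doc.count(term)                             # term frequency
        # Saturating TF: repeated occurrences add less and less, and longer
        # documents are penalized through b * |doc| / avgdl
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
    return score

corpus = [["iphone", "7", "case"],
          ["iphone", "iphone", "charger"],
          ["samsung", "tv", "stand"]]
query = ["iphone", "case"]
scores = [bm25_score(query, d, corpus) for d in corpus]
```

The document matching both query terms outranks the one that merely repeats "iphone", illustrating both the IDF weighting and the repeated-term penalty from the bullets above.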
[Diagram: classic retrieval architecture. Off-line, an indexer builds the index from the documents; on-line, a query is matched against the index by the scoring model to return the top-n retrieved documents]
[Screenshot: www.rakuten.com, Oct 10, 2017]
Example: query Q = "iphone 7 case" against two documents, as term-count vectors:

       iphone   7   case
  Q       1     1     1
  D1      2     2     2
  D2      3     1     0

[Diagram: Q, D1, and D2 plotted as vectors; similarity is the angle between query and document vectors]
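In the vector space model, relevance is the cosine of the angle between query and document vectors. A quick check with the counts from the table above:

```python
import math

def cosine(u, v):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Term-count vectors over the vocabulary (iphone, 7, case)
Q  = [1, 1, 1]
D1 = [2, 2, 2]
D2 = [3, 1, 0]

sim_d1 = cosine(Q, D1)  # D1 is a scaled copy of Q, so the angle is 0
sim_d2 = cosine(Q, D2)
```

D1 repeats every query term equally, so its direction matches Q exactly (similarity 1.0), while D2, which drops "case", scores lower.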
• Basic ideas
• Lexical similarity metrics
• Penalizing repeated occurrences of the same term
• Penalizing term frequency in longer documents
• Only a few features
• Feature weights hand-tuned with heuristics
• Cannot include important search signals such as user feedback, product popularity, purchase history, etc.
• Fast and scalable
• Data-driven approach
• Directly optimize product ranking based on relevance (different from classification and regression ML tasks)
• Handle thousands of features
• Robust to noisy data
• Handle personalization
• Industry & research state-of-the-art (Amazon, eBay, Microsoft, Yahoo!, Yandex, etc.)
A document is relevant if it contains the information the user was looking for when submitting the query.
Relevance is subjective and depends on many factors:
• context (what is displayed and how)
• task (purchase, informational search, question answering, etc.)
• novelty (unexpected data, ads, etc.)
• time and user effort involved
[Screenshot: www.rakuten.com, Nov 2016]
[Screenshot: www.rakuten.com, Nov 2016, with user actions annotated: click, add to cart, buy]
• Clickthrough data (user’s implicit feedback) as source of relevance for search query / document pairs
• Pros
• Abundant and easy to harvest
• Always fresh
• Unbiased
• Cons
• Noisy
• Sparse for long-tail queries
• Simple relevance mapping:
• score = 0 (not relevant), score = 3 (highly relevant)
• Purchase > cart > click > impression
Score | User's implicit feedback
------+-----------------------------------
  3   | Product purchased
  2   | Product added to the shopping cart
  1   | Product clicked
  0   | No clicks
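The mapping above can be sketched directly; the strongest observed signal for a (query, product) pair wins. The event names here are illustrative placeholders, not the names used in any Rakuten log format:

```python
# Graded relevance from implicit feedback, following the
# purchase > cart > click > impression ordering (event names are invented).
EVENT_SCORE = {"purchase": 3, "add_to_cart": 2, "click": 1, "impression": 0}

def relevance_label(events):
    """Strongest observed signal wins; no events at all counts as not relevant."""
    return max((EVENT_SCORE[e] for e in events), default=0)

label = relevance_label(["impression", "click", "add_to_cart"])  # -> 2
```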
[Figure: the browser viewport splits a result page into seen products, potentially seen products, and unseen products; a click can only land on what the user actually saw. Screenshot: www.rakuten.com, Aug 2017]
[Chart: Normalized and Discounted Cumulative Gain (NDCG) over the top 1–10 ranked documents, on a 0–1 scale]
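NDCG can be computed in a few lines. This sketch uses the common 2^rel - 1 gain for the graded 0–3 labels defined earlier; the example label list is invented:

```python
import math

def dcg(relevances, k=None):
    """Discounted cumulative gain with the 2^rel - 1 gain for graded labels."""
    rels = relevances[:k] if k else relevances
    return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(rels))

def ndcg(relevances, k=None):
    """DCG normalized by the ideal (sorted) ordering, so 1.0 = perfect ranking."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0

# Graded labels of the documents in ranked order (3 = purchased ... 0 = no click)
ranked = [1, 3, 0, 2]
score = ndcg(ranked)
```

The logarithmic discount means a highly relevant document buried below rank 1 costs more NDCG than the same mistake further down the list, which is why the metric concentrates on the top positions.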
• Tree ensemble method
• Handle sparse data
• Handle missing values and various value types
• Robust to outliers
• Learn higher-order feature interactions
• Invariant to feature scaling
• Highly scalable and optimized open source implementation (XGBoost)
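A hedged sketch of how such a model might be configured with XGBoost's learning-to-rank interface. The parameter names are XGBoost's own; the depth-3 / 500-tree values echo the sweep reported later in this deck, while the learning rate and all data names are assumptions:

```python
# Sketch of an XGBoost learning-to-rank configuration (not the exact setup
# used in this deck; data and feature names are placeholders).
params = {
    "objective": "rank:ndcg",  # list-wise ranking objective optimizing NDCG
    "eval_metric": "ndcg@10",  # report NDCG at cut-off 10 during training
    "max_depth": 3,
    "eta": 0.1,                # learning rate (assumed; not given in the deck)
}
num_boost_round = 500

# Training would look roughly like this (requires the xgboost package; set_group
# tells XGBoost how many documents belong to each query so lists stay together):
#
# import xgboost as xgb
# dtrain = xgb.DMatrix(feature_matrix, label=relevance_labels)
# dtrain.set_group(docs_per_query)
# model = xgb.train(params, dtrain, num_boost_round=num_boost_round)
```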
Point-wise
• Input: single documents / Output: class labels or scores
• Classify each document as relevant or non-relevant
• Adjust w to reduce classification errors
Pair-wise ranking
• Input: document pairs / Output: partial-order preferences
• Classify pairs of documents: is D1 > D2?
• Adjust w to reduce discordant pairs
List-wise ranking
• Input: document collections / Output: ranked document lists
• Score permutations: is {D1, D2, ...} > {D1', D2', ...}?
• Adjust w to directly maximize the ranking measure of interest (NDCG)
(Notation: point-wise scores a single Di given Q; pair-wise orders Di > Dj given Q; list-wise orders Di > Dj > Dk given Q)
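The pair-wise objective becomes concrete if we count discordant pairs, the quantity a pair-wise loss pushes down. The scores and labels below are invented for illustration:

```python
from itertools import combinations

def discordant_pairs(scores, labels):
    """Count document pairs the model orders against the relevance labels."""
    return sum(
        1
        for (s_i, l_i), (s_j, l_j) in combinations(zip(scores, labels), 2)
        if (l_i > l_j and s_i < s_j) or (l_i < l_j and s_i > s_j)
    )

labels = [3, 0, 1, 2]          # graded relevance of four documents
scores = [0.9, 0.8, 0.3, 0.1]  # model scores for the same documents
errors = discordant_pairs(scores, labels)
```

Note that every discordant pair counts the same here regardless of where it sits in the list; list-wise objectives fix exactly this by weighting swaps by their effect on the ranking metric.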
Green = relevant
Gray = non-relevant
Blue arrows = boost from the pair-wise loss function
Red arrows = boost from the list-wise loss function
(a) is the perfect ranking;
(b) is a ranking with 10 pairwise errors;
(c) is a ranking with 8 pairwise errors
• Relevance: User’s behavior signals
• Ranking Metrics: NDCG
• Machine Learning Algorithm: Gradient Tree Boosting
• Ranking optimization: List-wise with NDCG metrics
[Diagram: learning-to-rank architecture. Off-line, the indexer builds the index from the documents, and a learning-to-rank module trains a re-ranking model from training data built out of queries, scoring-model scores, features, and relevance labels. On-line, a query is first matched against the index by the scoring model to retrieve the top-n ranked documents (n > 1M); their features are then fed to the re-ranking model, which returns the top-m re-ranked documents (m < 1k)]
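The two-stage flow can be sketched end to end in a toy form. Here a simple term-overlap scorer stands in for the first-stage scoring model, and an invented popularity feature with made-up blend weights stands in for the trained re-ranking model:

```python
# Toy two-stage retrieval: cheap lexical scoring over the whole collection,
# then a richer (here: hand-invented) model over the short candidate list.

def lexical_score(query, doc):
    """First stage: cheap term-overlap score (a real system would use BM25)."""
    q_terms = set(query.split())
    return len(q_terms & set(doc.split())) / len(q_terms)

def rerank_score(query, doc, popularity):
    """Second stage: blend lexical match with a behavioral signal (weights invented)."""
    return 0.5 * lexical_score(query, doc) + 0.5 * popularity.get(doc, 0.0)

def search(query, documents, popularity, n=2, m=2):
    # Stage 1: retrieve top-n candidates by lexical score (n >> m in a real system)
    candidates = sorted(documents, key=lambda d: lexical_score(query, d), reverse=True)[:n]
    # Stage 2: re-rank only the candidates with the more expensive model
    return sorted(candidates, key=lambda d: rerank_score(query, d, popularity), reverse=True)[:m]

docs = ["iphone 7 case", "iphone charger", "samsung tv stand"]
pop = {"iphone 7 case": 0.2, "iphone charger": 0.9}
results = search("iphone case", docs, pop)
```

Even in this toy, the behavioral signal reorders the candidate list, which is the point of the second stage: only the short list pays the cost of the richer model.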
[Screenshot: www.rakuten.com, Mar 2017]
Search query: "40inch tv"
[Side-by-side screenshots: regular text search surfaces several not-relevant products; search with user's signals and learning-to-rank models returns relevant ones]
Conversion rate (simulation):

                           NDCG     CTR      Simulated queries
  Relative gain            15.58%   7.50%    10,000

Depth / Estimators:

                           5 / 500   3 / 500   10 / 500   3 / 500
  NDCG                     0.687     0.688     0.685      0.689
  Relative gain            15.14%    15.41%    14.92%     15.58%
  Training time (56 cores) 2:45:48   1:20:57   35:25:44   1:58:07
Automatic Speech Recognition (2011) · Computer Vision (2013) · Natural Language Processing (2013–2015) · Information Retrieval (2017?)
Bhaskar Mitra, Fernando Diaz, and Nick Craswell. 2017. Learning to Match using Local and Distributed Representations of Text for Web Search. In Proceedings of the 26th International Conference on World Wide Web (WWW '17).
• Traditional IR methods do not scale to modern e-commerce needs
• User's implicit feedback is a proxy for the relevance of search query / document pairs
• Learning-to-rank (LTR) methods scale to thousands of features and are robust to data noise
• LTR with a list-wise loss function substantially improves search relevance (15.6% NDCG increase on e-commerce data)
• NDCG improvements directly correlate with conversion rates (7.5% CTR increase on e-commerce data)
• DNN methods for IR are starting to outperform traditional ML methods