Find it! Nail it! Boosting e-commerce search conversions with machine learning at scale

October 28, 2017
Giuseppe “Pino” Di Fabbrizio
Rakuten Institute of Technology – Boston


Page 3

• Motivations

• Traditional information retrieval models

• Learning-to-rank models

• Relevance

• Ranking Metrics

• Algorithms

• Ranking optimization

• Use cases

• Summary

• What is next?

Disclaimer: Unless otherwise specified, images in this presentation comply with the Creative Commons (CC) publishing license.

Page 4

• E-commerce growing faster than the traditional brick-and-mortar market ($4.06T by 2020)

• Mobile shopping adoption increasing worldwide (46% of shoppers in Asia and 28% in North America)

• Online catalogs offering broader selections and competitive products

• Electronic money transactions gaining consumers’ trust

• Massive data collected during web and mobile interactions providing foundation for machine learning-driven optimizations

1.61B shoppers | $1.86T sales | $150B* revenues

*2016 combined revenues for Amazon, Otto Group, and Rakuten

Source: https://www.statista.com/topics/871/online-shopping/

Page 6

250M+ Products

40k+ Categories

Page 7

How do we find the most relevant products for a search query?

[Screenshot: www.rakuten.com, Oct 10, 2017]

Page 8

[Figure: a query passes through a ranking function that orders the documents into result positions 1 to 9. Screenshot: www.rakuten.com, Nov 2016]

Page 9

• Relevance is estimated by lexical matches between query terms and document terms

• Examples:

• Boolean models

• Vector space models

• Latent semantic indexing

• Okapi BM25

[Figure: off-line, an indexer builds an index from the documents; on-line, a scoring model matches the query against the index and returns the top-n retrieved documents]
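The off-line/on-line split above can be illustrated with the simplest of the listed models: an inverted index built off-line and a Boolean AND query evaluated on-line. This is a minimal sketch that assumes whitespace tokenization and uses hypothetical product titles. It is not a description of any production indexer.

```python
from collections import defaultdict

def build_index(docs):
    """Off-line step: map each lowercase term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def boolean_and(index, query):
    """On-line step: IDs of documents that contain every query term (no ranking)."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()

# Hypothetical product titles.
products = {
    "p1": "iPhone 7 case black",
    "p2": "iPhone 7 screen protector",
    "p3": "Galaxy S8 case",
}
index = build_index(products)
print(sorted(boolean_and(index, "iphone 7 case")))  # ['p1']
```

A Boolean model only decides whether a document matches. The vector space and BM25 models go further and assign each document a score, which is what allows the top-n documents to be ordered.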

Page 10

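Query (Q): “iphone 7 case”, compared with Document 1 (D1) and Document 2 (D2) as term-count vectors (screenshot: www.rakuten.com, Oct 10, 2017):

       iphone   7   case
Q        1      1    1
D1       2      2    2
D2       3      1    0

[Figure: Q, D1, and D2 drawn as vectors in term space]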
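The following is a minimal sketch of the vector space model applied to the term-count table above. It assumes raw counts as vector weights (no TF-IDF weighting) and cosine similarity as the relevance score.

```python
import math

# Term-count vectors over the vocabulary ["iphone", "7", "case"], from the table above.
Q = [1, 1, 1]
D1 = [2, 2, 2]
D2 = [3, 1, 0]

def cosine(u, v):
    """Cosine of the angle between two term vectors: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

scores = {"D1": cosine(Q, D1), "D2": cosine(Q, D2)}
for doc, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(doc, round(score, 4))
```

D1 points in exactly the same direction as Q, so it scores 1.0 even though every term appears twice. The norm in the denominator cancels the doubled counts. D2 scores 4/√30 ≈ 0.73 because it never mentions “case”.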

Page 11

• Basic ideas

• Lexical similarity metrics

• Penalizing repeated occurrences of the same term

• Penalizing term frequency for longer documents

• Only a few features

• Feature weights hand-tuned manually with heuristics

• Cannot include important search signals such as user feedback, product popularity, purchase history, etc.

• Fast and scalable
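The first two “basic ideas” above are what Okapi BM25 (listed on Page 9) encodes with two parameters. k1 controls term-frequency saturation, and b controls document-length normalization. This is a minimal sketch that assumes the conventional defaults k1 = 1.2 and b = 0.75, along with one common IDF variant that adds 1 inside the logarithm to keep it non-negative. The toy corpus is hypothetical.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query` with Okapi BM25."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer-than-average documents get a larger denominator.
            norm = k1 * (1 - b + b * len(d) / avgdl)
            # Saturation: the contribution grows sublinearly in tf and never exceeds idf * (k1 + 1).
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores

# Hypothetical tokenized product titles.
docs = [
    "iphone 7 case".split(),                                             # 0: short, exact match
    "iphone 7 case case case case".split(),                              # 1: repeated term
    "iphone 7 case black leather wallet cover with card slots".split(),  # 2: long title
    "samsung galaxy charger".split(),                                    # 3: no overlap
]
scores = bm25_scores("iphone 7 case".split(), docs)
print([round(s, 3) for s in scores])
```

Document 1 repeats “case” four times yet still scores below document 0. Saturation limits what the extra occurrences add, and its above-average length increases the normalization term for every matched term. Document 2 matches the same three terms once each but is penalized for its length.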

Page 12

• Data-driven approach

• Directly optimize product ranking based on relevance (unlike classification and regression ML tasks)

• Handle thousands of features

• Robust to noisy data

• Handle personalization

• Industry & research state-of-the-art (Amazon, eBay, Microsoft, Yahoo!, Yandex, etc.)

Page 13

A document is relevant if it contains the information the user was looking for when submitting the query

Relevance is subjective and depends on many factors:

• context (what is displayed and how)

• task (purchase, information search, answer, etc.)

• novelty (unexpected data, ads, etc.)

• time and user’s effort involved

Page 14

[Screenshot: www.rakuten.com, Nov 2016]

Page 15

[Screenshot: www.rakuten.com, Nov 2016, with the click, add-to-cart, and buy actions marked]

Page 16

• Clickthrough data (users’ implicit feedback) as the source of relevance for query/document pairs

• Pros

• Abundant and easy to harvest

• Always fresh

• Unbiased

• Cons

• Noisy

• Sparse for long-tail queries

• Simple relevance mapping:

• scores from 0 (not relevant) to 3 (highly relevant)

• purchase > cart > click > impression

Score   User’s implicit feedback
3       Product purchased
2       Product added to the shopping cart
1       Product clicked
0       No clicks
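The following is a minimal sketch of the mapping above. It assumes a hypothetical event log of (query, product, event) tuples and labels each query/product pair with the strongest event observed for it. Products that were shown but never clicked keep label 0. The event names are illustrative.

```python
# Graded relevance from the table above: purchase > cart > click > impression.
# The event names are hypothetical.
EVENT_SCORE = {"impression": 0, "click": 1, "add_to_cart": 2, "purchase": 3}

def relevance_labels(events):
    """Map (query, product, event) tuples to one graded label per (query, product) pair.

    Each pair keeps the highest score among its events, so a product that was
    clicked and later purchased is labeled 3, not 1.
    """
    labels = {}
    for query, product, event in events:
        key = (query, product)
        labels[key] = max(labels.get(key, 0), EVENT_SCORE[event])
    return labels

# Hypothetical log entries for illustration.
log = [
    ("40inch tv", "tv-001", "impression"),
    ("40inch tv", "tv-001", "click"),
    ("40inch tv", "tv-001", "purchase"),
    ("40inch tv", "tv-002", "impression"),
    ("40inch tv", "tv-003", "click"),
    ("40inch tv", "tv-003", "add_to_cart"),
]
labels = relevance_labels(log)
print(labels)
```

Aggregating with `max` is one design choice. Counting events or discounting them by display position are alternatives that the slides do not specify.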

Page 17

[Figure: a results page divided into seen products (inside the browser viewport), potentially seen products, and unseen products, with a click marked. Screenshot: www.rakuten.com, Aug 2017]

Page 18

[Figure: Normalized Discounted Cumulative Gain (NDCG, from 0 to 1) over documents at rank positions 1 to 10]
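NDCG compares the discounted gain of a ranking against that of the ideal ordering of the same documents. This is a minimal sketch that assumes one common formulation, with gain 2^rel − 1 and discount log2(rank + 1). The slides do not give a formula, so treat these choices as assumptions.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k results, using the exponential gain 2^rel - 1."""
    # Position i (0-based) is discounted by log2(i + 2), i.e. log2(rank + 1) for 1-based rank.
    return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG@k divided by the DCG@k of the ideal (descending) ordering; 0.0 if nothing is relevant."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Graded labels (0-3, as on the relevance slides) in the order a ranker returned them.
ranked = [3, 2, 0, 1, 0]
print(round(ndcg_at_k(ranked, 5), 4))
```

Swapping the relevant item at rank 4 with the non-relevant item at rank 3 only costs a little here, because the discount is already small at those positions. The same mistake at ranks 1 and 2 would cost much more.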

Page 19

• Tree ensemble method

• Handle sparse data

• Handle missing values and various value types

• Robust to outliers

• Learn higher-order feature interactions

• Invariant to feature scaling

• Highly scalable and optimized open source implementation (XGBoost)

Page 20

Point-wise ranking

• Input: single documents / Output: class labels or scores

• Classify each document as relevant or non-relevant.

• Adjust w to reduce classification errors

Pairwise ranking

• Input: document pairs / Output: partial order preferences

• Classify pairs of documents – D1 > D2?

• Adjust w to reduce discordant pairs

List-wise ranking

• Input: document collections / Output: ranked document lists

• Score permutations: is {D1, D2, …} > {D1’, D2’, …}?

• Adjust w to directly maximize the ranking measure of interest (NDCG)

[Figure: for a query Q, point-wise scores a single document Di; pairwise compares Di > Dj; list-wise orders Di > Dj > Dk]
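To make the pairwise formulation concrete, here is a minimal sketch that turns the graded labels of one query into preference pairs (Di, Dj) with Di > Dj. A pairwise learner is then trained to score the first document of each pair higher. The document IDs and labels are hypothetical.

```python
from itertools import combinations

def preference_pairs(labeled_docs):
    """Return (better, worse) document pairs for one query.

    `labeled_docs` is a list of (doc_id, relevance) tuples. Pairs with equal
    relevance express no preference and are skipped.
    """
    pairs = []
    for (doc_a, rel_a), (doc_b, rel_b) in combinations(labeled_docs, 2):
        if rel_a > rel_b:
            pairs.append((doc_a, doc_b))
        elif rel_b > rel_a:
            pairs.append((doc_b, doc_a))
    return pairs

# One query's documents with graded labels (0-3).
docs = [("D1", 3), ("D2", 0), ("D3", 1), ("D4", 0)]
print(preference_pairs(docs))
```

A point-wise learner would instead fit each (doc_id, relevance) row on its own, while a list-wise learner consumes the whole labeled list for the query at once.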

Page 21

Green = relevant

Gray = not-relevant

Blue arrows = boost for pair-wise loss function

Red arrows = boost for list-wise loss function

(a) is the perfect ranking;

(b) is a ranking with 10 pairwise errors;

(c) is a ranking with 8 pairwise errors
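The contrast between pair-wise and list-wise boosts in the figure can be illustrated with a toy example. This is a minimal sketch with binary relevance (1 = relevant, 0 = not relevant). It counts pairwise errors as pairs where a less relevant document is ranked above a more relevant one, and it computes NDCG with gain 2^rel − 1 and discount log2(rank + 1). The two rankings are hypothetical and are not the ones shown in the figure.

```python
import math

def pairwise_errors(relevances):
    """Count pairs where a less relevant document is ranked above a more relevant one."""
    return sum(
        1
        for i in range(len(relevances))
        for j in range(i + 1, len(relevances))
        if relevances[i] < relevances[j]
    )

def ndcg(relevances):
    """NDCG over the full list, with gain 2^rel - 1 and discount log2(rank + 1)."""
    def dcg(rels):
        return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

top_heavy = [1, 0, 0, 0, 0, 1]  # a relevant item first, the other relevant item last
middle = [0, 1, 1, 0, 0, 0]     # both relevant items near, but not at, the top

for name, ranking in [("top_heavy", top_heavy), ("middle", middle)]:
    print(name, pairwise_errors(ranking), round(ndcg(ranking), 4))
```

A pairwise loss prefers `middle`, which has 2 errors against 4. NDCG prefers `top_heavy` because it weights the first position most. This mismatch motivates optimizing the list-wise measure directly.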

Page 22

• Relevance: User’s behavior signals

• Ranking Metrics: NDCG

• Machine Learning Algorithm: Gradient Tree Boosting

• Ranking optimization: List-wise with NDCG metrics

Page 23

[Figure: off-line, an indexer builds the index from the documents, and a learning-to-rank step trains a re-ranking model from features and relevance-labeled training data. On-line, the scoring model scores the query against the index to produce the top-n ranked documents (n > 1M), and the re-ranking model, using query features, turns them into the top-m re-ranked documents (m < 1k).]
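This is a minimal sketch of one reading of the two-stage flow in the diagram. A cheap scoring model ranks every matching document (the top-n list), and a more expensive learned model re-orders only the head of that list (the top-m). Both scoring functions and the `ctr`/`rating` features are hypothetical stand-ins (a term-overlap count and a hand-weighted linear model). They are not the production models.

```python
def first_stage_score(query_terms, doc):
    """Cheap lexical score: how many query terms appear in the product title (stand-in for BM25)."""
    title_terms = set(doc["title"].lower().split())
    return sum(term in title_terms for term in query_terms)

def rerank_score(query_terms, doc):
    """Hypothetical learned model: fixed linear weights over lexical and behavioral features."""
    return first_stage_score(query_terms, doc) + 2.0 * doc["ctr"] + 0.5 * doc["rating"]

def search(query, docs, m=10):
    terms = query.lower().split()
    # Stage 1: keep documents with at least one matching term and sort them by the cheap score.
    matched = [d for d in docs if first_stage_score(terms, d) > 0]
    ranked = sorted(matched, key=lambda d: first_stage_score(terms, d), reverse=True)
    # Stage 2: re-order only the top-m first-stage results with the more expensive model.
    head = sorted(ranked[:m], key=lambda d: rerank_score(terms, d), reverse=True)
    return head + ranked[m:]

# Hypothetical catalog with made-up click-through rates and ratings.
catalog = [
    {"id": "a", "title": "40inch TV stand", "ctr": 0.01, "rating": 3.0},
    {"id": "b", "title": "40inch LED TV 1080p", "ctr": 0.20, "rating": 4.5},
    {"id": "c", "title": "TV remote", "ctr": 0.05, "rating": 4.0},
    {"id": "d", "title": "USB cable", "ctr": 0.30, "rating": 5.0},
]
print([d["id"] for d in search("40inch tv", catalog, m=2)])
```

The TV stand and the TV tie in the first stage, and the behavioral features in the re-ranker break the tie in favor of the TV. The cable never reaches the re-ranker, however popular it is. Limiting the expensive model to m documents keeps its per-query cost bounded.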

Page 24

[Screenshot: www.rakuten.com, Mar 2017]

Page 25

Search query: “40inch tv”

[Figure: results from regular text search next to results from search with user’s signals and learning-to-rank models; three results are marked “Not relevant”]

Page 26

Conversion rate (simulation): NDCG relative gain 15.58% | CTR relative gain 7.50% | Simulated queries: 10,000

Depth / Estimators               5 / 500    3 / 500    10 / 500    3 / 500
NDCG                             0.687      0.688      0.685       0.689
Relative gain                    15.14%     15.41%     14.92%      15.58%
Training time (56 cores, h:m:s)  2:45:48    1:20:57    35:25:44    1:58:07

Page 27

[Timeline: Automatic Speech Recognition (2011), Computer Vision (2013), Natural Language Processing (2013–2015), Information Retrieval (2017?)]

Page 28

Source: Bhaskar Mitra, Fernando Diaz, and Nick Craswell. 2017. Learning to Match using Local and Distributed Representations of Text for Web Search. In Proceedings of the 26th International Conference on World Wide Web (WWW '17).


Page 31

• Traditional IR methods do not scale to modern e-commerce needs

• User implicit feedback is a proxy for query/document relevance

• Learning-to-rank (LTR) methods scale to thousands of features and are robust to noisy data

• LTR with a list-wise loss function substantially improves search relevance (15.6% NDCG gain on e-commerce data)

• NDCG improvements directly correlate with conversion rates (7.5% simulated CTR increase on e-commerce data)

• DNN methods for IR are starting to outperform traditional ML methods
