October 28, 2017
Giuseppe “Pino” Di Fabbrizio
Rakuten Institute of Technology – Boston
• Motivations
• Traditional information retrieval models
• Learning-to-rank models
• Relevance
• Ranking Metrics
• Algorithms
• Ranking optimization
• Use cases
• Summary
• What is next?
Disclaimer: unless otherwise specified, images in this presentation comply with the Creative Commons (CC) publishing license.
• E-commerce growing faster than traditional brick-and-mortar market ($4.06T by 2020)
• Mobile shopping adoption increasing worldwide (46% of shoppers in Asia and 28% in North America)
• Online catalogs offering broader selections and competitive products
• Electronic money transactions gaining more consumers’ trust
• Massive data collected during web and mobile interactions providing foundation for machine learning-driven optimizations
1.61B shoppers · $1.86T sales · $150B* revenues
*2016 combined revenues for Amazon, Otto Group, and Rakuten
https://www.statista.com/topics/871/online-shopping/
250M+ Products
40k+ Categories
How do we find the most relevant products for a search query?
[Screenshot: www.rakuten.com, Oct 10, 2017]
[Diagram: a ranking function takes a query and a document collection and returns an ordered result list (positions 1–9). Screenshot: www.rakuten.com, Nov 2016]
• Relevance is estimated by lexical matches of query terms with document terms
• Examples:
• Boolean models
• Vector space models
• Latent semantic indexing
• Okapi BM25
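Of these, Okapi BM25 is the usual workhorse. A minimal self-contained sketch (toy corpus, no stemming or stop-wording; the tiny product titles are invented) shows how it combines inverse document frequency with saturating term frequency and document-length normalization:

```python
import math

def bm25_score(query_terms, doc, corpus, k1=1.2, b=0.75):
    """Okapi BM25 score of one document for a bag-of-words query.

    corpus: list of documents, each a list of terms.
    """
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)         # document frequency
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1)  # smoothed IDF
        tf = doc.count(term)                             # term frequency
        # Saturating TF: repeated occurrences add less and less, and longer
        # documents are penalized through b * |doc| / avgdl
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
    return score

corpus = [["iphone", "7", "case"],
          ["iphone", "iphone", "charger"],
          ["samsung", "tv", "stand"]]
query = ["iphone", "case"]
scores = [bm25_score(query, d, corpus) for d in corpus]
```

The document matching both query terms outranks the one that merely repeats "iphone", illustrating both the IDF weighting and the repeated-term penalty from the bullets above.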
[Diagram: classic retrieval architecture. Off-line, an indexer builds the index from the documents; on-line, a query is matched against the index by the scoring model to return the top-n retrieved documents]
[Screenshot: www.rakuten.com, Oct 10, 2017]
Example: query Q = "iphone 7 case" against two documents, as term-count vectors:

       iphone   7   case
  Q       1     1     1
  D1      2     2     2
  D2      3     1     0

[Diagram: Q, D1, and D2 plotted as vectors; similarity is the angle between query and document vectors]
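In the vector space model, relevance is the cosine of the angle between query and document vectors. A quick check with the counts from the table above:

```python
import math

def cosine(u, v):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Term-count vectors over the vocabulary (iphone, 7, case)
Q  = [1, 1, 1]
D1 = [2, 2, 2]
D2 = [3, 1, 0]

sim_d1 = cosine(Q, D1)  # D1 is a scaled copy of Q, so the angle is 0
sim_d2 = cosine(Q, D2)
```

D1 repeats every query term equally, so its direction matches Q exactly (similarity 1.0), while D2, which drops "case", scores lower.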
• Basic ideas
• Lexical similarity metrics
• Penalizing repeated occurrences of the same term
• Penalizing term frequency in longer documents
• Only a few features
• Feature weights hand-tuned with heuristics
• Cannot include important search signals such as user feedback, product popularity, purchase history, etc.
• Fast and scalable
• Data-driven approach
• Directly optimize product ranking based on relevance (different from classification and regression ML tasks)
• Handle thousands of features
• Robust to noisy data
• Handle personalization
• Industry & research state-of-the-art (Amazon, eBay, Microsoft, Yahoo!, Yandex, etc.)
A document is relevant if it contains the information the user was looking for when submitting the query.
Relevance is subjective and depends on many factors:
• context (what is displayed and how)
• task (purchase, informational search, question answering, etc.)
• novelty (unexpected data, ads, etc.)
• time and user effort involved
[Screenshot: www.rakuten.com, Nov 2016]
[Screenshot: www.rakuten.com, Nov 2016, with user actions annotated: click, add to cart, buy]
• Clickthrough data (user’s implicit feedback) as source of relevance for search query / document pairs
• Pros
• Abundant and easy to harvest
• Always fresh
• Unbiased
• Cons
• Noisy
• Sparse for long-tail queries
• Simple relevance mapping:
• score = 0 (not relevant), score = 3 (highly relevant)
• Purchase > cart > click > impression
Score | User's implicit feedback
------+-----------------------------------
  3   | Product purchased
  2   | Product added to the shopping cart
  1   | Product clicked
  0   | No clicks
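The mapping above can be sketched directly; the strongest observed signal for a (query, product) pair wins. The event names here are illustrative placeholders, not the names used in any Rakuten log format:

```python
# Graded relevance from implicit feedback, following the
# purchase > cart > click > impression ordering (event names are invented).
EVENT_SCORE = {"purchase": 3, "add_to_cart": 2, "click": 1, "impression": 0}

def relevance_label(events):
    """Strongest observed signal wins; no events at all counts as not relevant."""
    return max((EVENT_SCORE[e] for e in events), default=0)

label = relevance_label(["impression", "click", "add_to_cart"])  # -> 2
```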
[Figure: the browser viewport splits a result page into seen products, potentially seen products, and unseen products; a click can only land on what the user actually saw. Screenshot: www.rakuten.com, Aug 2017]
[Chart: Normalized and Discounted Cumulative Gain (NDCG) over the top 1–10 ranked documents, on a 0–1 scale]
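NDCG can be computed in a few lines. This sketch uses the common 2^rel - 1 gain for the graded 0–3 labels defined earlier; the example label list is invented:

```python
import math

def dcg(relevances, k=None):
    """Discounted cumulative gain with the 2^rel - 1 gain for graded labels."""
    rels = relevances[:k] if k else relevances
    return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(rels))

def ndcg(relevances, k=None):
    """DCG normalized by the ideal (sorted) ordering, so 1.0 = perfect ranking."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0

# Graded labels of the documents in ranked order (3 = purchased ... 0 = no click)
ranked = [1, 3, 0, 2]
score = ndcg(ranked)
```

The logarithmic discount means a highly relevant document buried below rank 1 costs more NDCG than the same mistake further down the list, which is why the metric concentrates on the top positions.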
• Tree ensemble method
• Handle sparse data
• Handle missing values and various value types
• Robust to outliers
• Learn higher-order feature interactions
• Invariant to feature scaling
• Highly scalable and optimized open source implementation (XGBoost)
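A hedged sketch of how such a model might be configured with XGBoost's learning-to-rank interface. The parameter names are XGBoost's own; the depth-3 / 500-tree values echo the sweep reported later in this deck, while the learning rate and all data names are assumptions:

```python
# Sketch of an XGBoost learning-to-rank configuration (not the exact setup
# used in this deck; data and feature names are placeholders).
params = {
    "objective": "rank:ndcg",  # list-wise ranking objective optimizing NDCG
    "eval_metric": "ndcg@10",  # report NDCG at cut-off 10 during training
    "max_depth": 3,
    "eta": 0.1,                # learning rate (assumed; not given in the deck)
}
num_boost_round = 500

# Training would look roughly like this (requires the xgboost package; set_group
# tells XGBoost how many documents belong to each query so lists stay together):
#
# import xgboost as xgb
# dtrain = xgb.DMatrix(feature_matrix, label=relevance_labels)
# dtrain.set_group(docs_per_query)
# model = xgb.train(params, dtrain, num_boost_round=num_boost_round)
```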
Point-wise
• Input: single documents / Output: class labels or scores
• Classify each document as relevant or non-relevant
• Adjust w to reduce classification errors
Pair-wise ranking
• Input: document pairs / Output: partial-order preferences
• Classify pairs of documents: is D1 > D2?
• Adjust w to reduce discordant pairs
List-wise ranking
• Input: document collections / Output: ranked document lists
• Score permutations: is {D1, D2, ...} > {D1', D2', ...}?
• Adjust w to directly maximize the ranking measure of interest (NDCG)
(Notation: point-wise scores a single Di given Q; pair-wise orders Di > Dj given Q; list-wise orders Di > Dj > Dk given Q)
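The pair-wise objective becomes concrete if we count discordant pairs, the quantity a pair-wise loss pushes down. The scores and labels below are invented for illustration:

```python
from itertools import combinations

def discordant_pairs(scores, labels):
    """Count document pairs the model orders against the relevance labels."""
    return sum(
        1
        for (s_i, l_i), (s_j, l_j) in combinations(zip(scores, labels), 2)
        if (l_i > l_j and s_i < s_j) or (l_i < l_j and s_i > s_j)
    )

labels = [3, 0, 1, 2]          # graded relevance of four documents
scores = [0.9, 0.8, 0.3, 0.1]  # model scores for the same documents
errors = discordant_pairs(scores, labels)
```

Note that every discordant pair counts the same here regardless of where it sits in the list; list-wise objectives fix exactly this by weighting swaps by their effect on the ranking metric.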
Green = relevant
Gray = non-relevant
Blue arrows = boost from the pair-wise loss function
Red arrows = boost from the list-wise loss function
(a) is the perfect ranking;
(b) is a ranking with 10 pairwise errors;
(c) is a ranking with 8 pairwise errors
• Relevance: User’s behavior signals
• Ranking Metrics: NDCG
• Machine Learning Algorithm: Gradient Tree Boosting
• Ranking optimization: List-wise with NDCG metrics
[Diagram: learning-to-rank architecture. Off-line, the indexer builds the index from the documents, and a learning-to-rank module trains a re-ranking model from training data built out of queries, scoring-model scores, features, and relevance labels. On-line, a query is first matched against the index by the scoring model to retrieve the top-n ranked documents (n > 1M); their features are then fed to the re-ranking model, which returns the top-m re-ranked documents (m < 1k)]
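The two-stage flow can be sketched end to end in a toy form. Here a simple term-overlap scorer stands in for the first-stage scoring model, and an invented popularity feature with made-up blend weights stands in for the trained re-ranking model:

```python
# Toy two-stage retrieval: cheap lexical scoring over the whole collection,
# then a richer (here: hand-invented) model over the short candidate list.

def lexical_score(query, doc):
    """First stage: cheap term-overlap score (a real system would use BM25)."""
    q_terms = set(query.split())
    return len(q_terms & set(doc.split())) / len(q_terms)

def rerank_score(query, doc, popularity):
    """Second stage: blend lexical match with a behavioral signal (weights invented)."""
    return 0.5 * lexical_score(query, doc) + 0.5 * popularity.get(doc, 0.0)

def search(query, documents, popularity, n=2, m=2):
    # Stage 1: retrieve top-n candidates by lexical score (n >> m in a real system)
    candidates = sorted(documents, key=lambda d: lexical_score(query, d), reverse=True)[:n]
    # Stage 2: re-rank only the candidates with the more expensive model
    return sorted(candidates, key=lambda d: rerank_score(query, d, popularity), reverse=True)[:m]

docs = ["iphone 7 case", "iphone charger", "samsung tv stand"]
pop = {"iphone 7 case": 0.2, "iphone charger": 0.9}
results = search("iphone case", docs, pop)
```

Even in this toy, the behavioral signal reorders the candidate list, which is the point of the second stage: only the short list pays the cost of the richer model.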
[Screenshot: www.rakuten.com, Mar 2017]
Search query: "40inch tv"
[Side-by-side screenshots: regular text search surfaces several not-relevant products; search with user's signals and learning-to-rank models returns relevant ones]
Conversion rate (simulation):

                           NDCG     CTR      Simulated queries
  Relative gain            15.58%   7.50%    10,000

Depth / Estimators:

                           5 / 500   3 / 500   10 / 500   3 / 500
  NDCG                     0.687     0.688     0.685      0.689
  Relative gain            15.14%    15.41%    14.92%     15.58%
  Training time (56 cores) 2:45:48   1:20:57   35:25:44   1:58:07
Automatic Speech Recognition (2011) · Computer Vision (2013) · Natural Language Processing (2013–2015) · Information Retrieval (2017?)
Bhaskar Mitra, Fernando Diaz, and Nick Craswell. 2017. Learning to Match using Local and Distributed Representations of Text for Web Search. In Proceedings of the 26th International Conference on World Wide Web (WWW '17).
• Traditional IR methods do not scale to modern e-commerce needs
• User's implicit feedback is a proxy for the relevance of search query / document pairs
• Learning-to-rank (LTR) methods scale to thousands of features and are robust to data noise
• LTR with a list-wise loss function substantially improves search relevance (15.6% NDCG increase on e-commerce data)
• NDCG improvements directly correlate with conversion rates (7.5% CTR increase on e-commerce data)
• DNN methods for IR are starting to outperform traditional ML methods