Machine Learning at Quora, Nikhil Dandekar (@nikhilbd), 2/26/2016

Page 1: Machine Learning at Quora (2/26/2016)

Machine Learning at Quora

Nikhil Dandekar (@nikhilbd)

2/26/2016

Page 2: Machine Learning at Quora (2/26/2016)

Our Mission

“To share and grow the world’s knowledge”

● Millions of questions & answers
● Millions of users
● Over a million topics
● ...

Page 3: Machine Learning at Quora (2/26/2016)

What we care about

Demand

Quality

Relevance

Page 4: Machine Learning at Quora (2/26/2016)

Agenda

● The core data
● Feed ranking
● Other Machine Learning
● Data Science @Quora
● Personalization

Page 5: Machine Learning at Quora (2/26/2016)

The Core Data

Page 6: Machine Learning at Quora (2/26/2016)

Lots of data relations

Page 7: Machine Learning at Quora (2/26/2016)

Complex network propagation effects

Page 8: Machine Learning at Quora (2/26/2016)

Importance of topics & semantics

Page 9: Machine Learning at Quora (2/26/2016)

Feed ranking @Quora

Page 10: Machine Learning at Quora (2/26/2016)

Ranking - Feed

• Goal: Present most interesting stories for a user at a given time
• Interesting = topical relevance + social relevance + timeliness
• Stories = questions + answers
• Relevance-ordered vs time-ordered = big gains in engagement
• Challenges:
    • potentially many candidate stories
    • real-time ranking
    • optimize for relevance
• Use Machine Learning for feed ranking

Page 11: Machine Learning at Quora (2/26/2016)

Feed dataset: impression logs

[Figure: feed impression log, with stories annotated by the user actions recorded on them: click, upvote, downvote, expand, share, answer, pass, follow]

Page 12: Machine Learning at Quora (2/26/2016)

What is relevance?

● Value of showing a story to a user, e.g. weighted sum of actions: $v = \sum_a v_a \, \mathbb{1}\{y_a = 1\}$
● Goal: predict this value for new stories. 2 possible approaches:
  ○ predict value directly: $v_{\text{pred}} = f(x)$
    ■ pros: single regression model
    ■ cons: can be ambiguous, coupled
  ○ predict probabilities for each action, then compute expected value: $v_{\text{pred}} = E[V \mid x] = \sum_a v_a \, p(a \mid x)$
    ■ pros: better use of supervised signal, decouples action models from action values
    ■ cons: more costly, one classifier per action
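The two value formulas on this slide can be sketched in a few lines of Python. This is only an illustration: the action names come from the impression-log slide, but the numeric weights in `ACTION_VALUES` and the function names are hypothetical, since the deck does not give the real action values.

```python
from typing import Dict

# Hypothetical action values v_a; the slides do not state real numbers.
ACTION_VALUES: Dict[str, float] = {
    "click": 1.0,
    "upvote": 5.0,
    "share": 10.0,
    "downvote": -8.0,
}


def observed_value(actions: Dict[str, bool]) -> float:
    """v = sum_a v_a * 1{y_a = 1} for one logged impression."""
    return sum(ACTION_VALUES[a] for a, happened in actions.items() if happened)


def expected_value(action_probs: Dict[str, float]) -> float:
    """v_pred = sum_a v_a * p(a | x), given one classifier output per action."""
    return sum(ACTION_VALUES[a] * p for a, p in action_probs.items())
```

Keeping the weights in a separate table mirrors the "decouples action models from action values" point: the weights can be retuned without retraining any per-action classifier.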

Page 13: Machine Learning at Quora (2/26/2016)

Feature engineering

● Essential for getting good ranking
● Better if updated in real-time (more reactive)
● Main sets of features:
  ○ user (e.g. age, country, recent activity)
  ○ story (e.g. popularity, trendiness, quality)
  ○ interactions between the two (e.g. topic or author affinity)
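As a rough sketch of the three feature sets, the function below builds one feature dictionary per (user, story) pair. The field names (`recent_actions`, `followed_topics`, `followed_users`, `upvotes`, `topics`, `author_id`) and the specific features are assumptions chosen for illustration, not Quora's actual schema.

```python
from typing import Dict, Set


def build_features(user: dict, story: dict) -> Dict[str, float]:
    """Combine user, story, and user-story interaction features."""
    followed_topics: Set[str] = set(user["followed_topics"])
    story_topics: Set[str] = set(story["topics"])
    overlap = len(followed_topics & story_topics)
    return {
        # user feature: how active the user has been recently
        "user_recent_actions": float(user["recent_actions"]),
        # story feature: a simple popularity signal
        "story_upvotes": float(story["upvotes"]),
        # interaction features: topic affinity and author affinity
        "topic_affinity": overlap / len(story_topics) if story_topics else 0.0,
        "follows_author": 1.0 if story["author_id"] in user["followed_users"] else 0.0,
    }
```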

Page 14: Machine Learning at Quora (2/26/2016)

Models

● Linear
  ○ simple, fast to train
  ○ manual, non-linear transforms for richer representation (buckets, ngrams)
● Decision trees
  ○ learn non-linear representations
● Tree ensembles
  ○ Random forests
  ○ Gradient boosted decision trees
● In-house C++ training code, third-party libraries for prototyping new models
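The "buckets" transform mentioned for linear models can be illustrated as follows: a continuous feature is turned into a one-hot vector so that a linear model can learn a separate weight per range. The function name and the boundary values in the usage note are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence


def bucketize(value: float, boundaries: Sequence[float]) -> List[float]:
    """One-hot encode `value` by the bucket it falls into.

    With ascending boundaries [b0, b1, ...], bucket 0 holds value < b0,
    bucket i holds b(i-1) <= value < b(i), and the last bucket holds
    value >= the final boundary.
    """
    one_hot = [0.0] * (len(boundaries) + 1)
    one_hot[bisect_right(boundaries, value)] = 1.0
    return one_hot
```

For example, with boundaries `[10, 100]` an upvote count of 5 lands in the first bucket and a count of 250 in the last. Tree-based models learn such splits on their own, which is the contrast the slide draws.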

Page 15: Machine Learning at Quora (2/26/2016)

Scalability: feed backend system

[Diagram: requests from Web (python) reach a tier of aggregators (Aggregator 1, Aggregator 2, Aggregator 3, ...) that sits above a tier of leaves (Leaf 1, Leaf 2, Leaf 3, ...). Diagram labels: user_id, object_id.]
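The slide names aggregators and leaves but does not describe how they cooperate. One common arrangement, assumed here purely for illustration, is scatter-gather: each leaf scores its own shard of candidates and returns a local top k, and an aggregator merges those partial lists. The function names and the in-memory score dictionaries are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Scored = Tuple[float, int]  # (score, object_id)


def leaf_top_k(scores: Dict[int, float], k: int) -> List[Scored]:
    """A leaf ranks the candidates in its shard and returns its local top k."""
    return heapq.nlargest(k, ((s, oid) for oid, s in scores.items()))


def aggregate_top_k(leaf_results: List[List[Scored]], k: int) -> List[Scored]:
    """An aggregator merges the leaves' partial results into a global top k."""
    return heapq.nlargest(k, (item for result in leaf_results for item in result))
```

Because each leaf returns at most k items, the aggregator handles at most k times the number of leaves, regardless of how many candidates each shard holds.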

Page 16: Machine Learning at Quora (2/26/2016)

Machine Learning @Quora

Page 17: Machine Learning at Quora (2/26/2016)

Ranking - Answer ranking

What is a good Quora answer?

• truthful
• reusable
• provides explanation
• well formatted
• ...

Page 18: Machine Learning at Quora (2/26/2016)

Ranking - Answer ranking

How are those dimensions translated into features?

• Features that relate to the text quality itself

• Interaction features (upvotes/downvotes, clicks, comments…)

• User features (e.g. expertise in topic)

Page 19: Machine Learning at Quora (2/26/2016)

How we think of search

Page 20: Machine Learning at Quora (2/26/2016)

Ranking - Search ranking

● Match user queries to Quora entities

● Corpus: Quora questions, answers, topics, users, blogs etc.

● Ranking: Traditional IR scores (e.g. BM25), hand-tuned or ML-ranking

● Focus on long-term satisfaction
  ○ If a question exists, but the answer is unsatisfactory, let the user “Re-Ask” the question
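BM25 is named above as an example of a traditional IR score. A minimal sketch of Okapi BM25 over pre-tokenized documents is shown below; the default parameters `k1 = 1.5` and `b = 0.75` are conventional choices, not values from the deck, and the IDF uses a common smoothed variant that stays non-negative. It assumes a non-empty corpus with at least one non-empty document.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: List[str], docs: List[List[str]],
                k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score each tokenized document against the query with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        term_freq = Counter(doc)
        score = 0.0
        for term in query:
            if term not in term_freq:
                continue
            df = doc_freq[term]
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            tf = term_freq[term]
            # Length normalization: longer-than-average docs are penalized.
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores
```

In an ML-ranking setup, a score like this would typically be one input feature among many rather than the final ranking.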

Page 21: Machine Learning at Quora (2/26/2016)

Question Asking

Goal: Find the best people to answer a question
● Understand the question
● Find people who can best answer the question
● “Ask to Answer”: Route the question to these people
● Either manual or automated A2A

Page 22: Machine Learning at Quora (2/26/2016)

Recommendations - Topics

Goal: Recommend new topics for the user to follow
• Based on:
    • Other topics followed
    • Users followed
    • User interactions
    • Topic-related features
    • ...

Page 23: Machine Learning at Quora (2/26/2016)

Recommendations - Users

Goal: Recommend new users to follow
• Based on:
    • Other users followed
    • Topics followed
    • User interactions
    • User-related features
    • ...

Page 24: Machine Learning at Quora (2/26/2016)

Related Questions

• Given interest in question A (source) what other questions will be interesting?

• Not only about similarity, but also “interestingness”

• Features such as:
    • Textual
    • Co-visit
    • Topics
    • …

• Important for logged-out use case
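A co-visit feature can be approximated by counting how often two questions appear in the same browsing session. The sketch below assumes sessions are available as lists of question ids; that input format and the function name are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def covisit_counts(sessions: List[List[int]]) -> Dict[Tuple[int, int], int]:
    """Count how many sessions visited both questions of each pair."""
    counts: Counter = Counter()
    for session in sessions:
        # Deduplicate and sort so each pair is counted once per session
        # and always keyed as (smaller_id, larger_id).
        for a, b in combinations(sorted(set(session)), 2):
            counts[(a, b)] += 1
    return dict(counts)
```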

Page 25: Machine Learning at Quora (2/26/2016)

Duplicate Questions

• Important issue for Quora
• Want to make sure knowledge about the same question is not dispersed across duplicates
• Solution: binary classifier trained with labelled data
• Features
    • Textual vector space models
    • Usage-based features
    • ...
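One simple instance of a textual vector space feature is the cosine similarity between bag-of-words vectors of two questions. The sketch below uses whitespace tokenization and raw term counts, both simplifying assumptions; the deck does not say which representation is actually used.

```python
import math
from collections import Counter


def cosine_similarity(question_a: str, question_b: str) -> float:
    """Cosine similarity between bag-of-words term-count vectors."""
    vec_a = Counter(question_a.lower().split())
    vec_b = Counter(question_b.lower().split())
    dot = sum(count * vec_b[term] for term, count in vec_a.items())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

A value like this would be one input to the binary classifier, alongside usage-based features, rather than a duplicate decision on its own.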

Page 26: Machine Learning at Quora (2/26/2016)

User Trust/Expertise Inference

Goal: Infer user’s trustworthiness in relation to a given topic
• We take into account:
    • Answers written on topic
    • Upvotes/downvotes received
    • Endorsements
    • ...
• Trust/expertise propagates through the network
• Must be taken into account by other algorithms
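The deck does not say how trust propagates through the network. As one possible illustration, the sketch below uses a PageRank-style iteration in which each user splits their current trust evenly among the users they endorse. The algorithm, the `damping` parameter, and the input format are all assumptions, and every endorsed user must also appear in `base_trust`.

```python
from typing import Dict, List


def propagate_trust(endorsements: Dict[str, List[str]],
                    base_trust: Dict[str, float],
                    damping: float = 0.5,
                    iterations: int = 20) -> Dict[str, float]:
    """Mix each user's base trust with trust received from endorsers.

    endorsements[u] lists the users that u endorses on the topic.
    """
    trust = dict(base_trust)
    for _ in range(iterations):
        received = {user: 0.0 for user in base_trust}
        for endorser, endorsed in endorsements.items():
            for user in endorsed:
                received[user] += trust[endorser] / len(endorsed)
        trust = {u: (1 - damping) * base_trust[u] + damping * received[u]
                 for u in base_trust}
    return trust
```

In this toy version, an endorsement from a highly trusted user raises the endorsed user's score more than one from an unknown user, which is the propagation effect the slide refers to.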

Page 27: Machine Learning at Quora (2/26/2016)

Spam Detection/Moderation

• Very important for Quora to keep quality of content
• Pure manual approaches do not scale
• Hard to get algorithms 100% right
• ML algorithms detect content/user issues
• Output of the algorithms feeds manually curated moderation queues

Page 28: Machine Learning at Quora (2/26/2016)

Trending Topics

Goal: Highlight current events that are interesting for the user
• We take into account:
    • Global “Trendiness”
    • Social “Trendiness”
    • User’s interest
    • ...
• Trending topics are a great discovery mechanism

Page 29: Machine Learning at Quora (2/26/2016)

Models

Page 30: Machine Learning at Quora (2/26/2016)

Models

● Logistic Regression
● Elastic Nets
● Gradient Boosted Decision Trees
● Random Forests
● (Deep) Neural Networks
● LambdaMART
● Matrix Factorization
● LDA
● ...

Page 31: Machine Learning at Quora (2/26/2016)

Data Science @Quora

Page 32: Machine Learning at Quora (2/26/2016)

Data Science at Quora

Page 33: Machine Learning at Quora (2/26/2016)

Data Science at Quora

● Both ML engineers and data scientists are involved in machine learning
● ML engineers build, implement, and maintain production machine learning systems.
● Data scientists conduct research to generate ideas about machine learning projects, and perform analysis to understand the metrics impact of machine learning systems.

Page 34: Machine Learning at Quora (2/26/2016)

Experimentation

● Extensive A/B testing, data-driven decision-making
● Separate, orthogonal “layers” for different parts of the system
● Experiment framework showing comparisons for various metrics
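Orthogonal experiment layers are often implemented by hashing the user id together with a per-layer salt, so that a user's assignment in one layer is effectively independent of their assignment in another. The sketch below shows that general technique; it is an assumption about how such layers could work, not a description of Quora's framework, and the layer names are hypothetical.

```python
import hashlib


def assign_bucket(user_id: int, layer: str, num_buckets: int = 100) -> int:
    """Deterministically map a user to a bucket within one experiment layer.

    Salting the hash with the layer name decorrelates assignments
    across layers.
    """
    digest = hashlib.sha256(f"{layer}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


def in_treatment(user_id: int, layer: str, treatment_percent: int) -> bool:
    """Place the first `treatment_percent` buckets of the layer in treatment."""
    return assign_bucket(user_id, layer) < treatment_percent
```

With this scheme, an experiment in a hypothetical `"feed_ranking"` layer and one in a `"search_ranking"` layer can run at the same time, with each user's treatment status computed independently per layer.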

Page 35: Machine Learning at Quora (2/26/2016)

Personalization

Page 36: Machine Learning at Quora (2/26/2016)

Importance of Personalization

The importance of personalization is inversely proportional to how specific the user intent is.

The importance of personalization is directly proportional to the number of “right answers”.

Page 37: Machine Learning at Quora (2/26/2016)

Importance of Personalization

Page 38: Machine Learning at Quora (2/26/2016)

Other contexts

● At a high level, personalization is adding a “user” context to relevance tasks
● Other contexts:
  ○ Location
  ○ Time
  ○ etc.
● Previous learnings generalize to these other contexts

Page 39: Machine Learning at Quora (2/26/2016)

The Search-Recommendation-Notification Spectrum

Page 40: Machine Learning at Quora (2/26/2016)

Questions?