Lessons Learned from Building real-life Recsys Xavier Amatriain (Quora) Deepak Agarwal (LinkedIn)

Recsys2016 Tutorial by Xavier and Deepak

Page 1: Recsys2016 Tutorial by Xavier and Deepak

Lessons Learned from Building real-life Recsys

Xavier Amatriain (Quora) Deepak Agarwal (LinkedIn)

Page 2: Recsys2016 Tutorial by Xavier and Deepak

What is a recommender system?

A recommender system recommends items to users to optimize a utility composed of one or more objectives.

Almost every website is powered by a recommender system.

Page 3: Recsys2016 Tutorial by Xavier and Deepak

Web Recommender Problem

User i with user features xi

(demographics, browse history, geo-location, search history, topics of questions answered, Topics interested in, …)

visits

item j with item features xj

(keywords, content categories, author, ...)

Algorithm selects

(i, j) : response yij

Interaction (Click, share, like, answer, ask, follow,..)/no-interaction

Which item should we select?
• Exploit: the one with highest predicted utility
• Explore: the one most useful for improving the utility prediction model
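The exploit/explore choice can be illustrated with a small sketch. It uses epsilon-greedy, a deliberately simple stand-in: the slide describes exploring with the item most useful for improving the model, whereas this sketch just explores uniformly at random. The function name, the `epsilon` parameter, and the utilities are all assumptions made for illustration.

```python
import random


def select_item(predicted_utility, epsilon=0.1, rng=None):
    """Pick one item id from a dict of {item_id: predicted utility}."""
    if rng is None:
        rng = random.Random()
    if rng.random() < epsilon:
        # Explore: serve a random candidate to gather new response data.
        return rng.choice(sorted(predicted_utility))
    # Exploit: serve the candidate with the highest predicted utility.
    return max(predicted_utility, key=predicted_utility.get)


scores = {"job_a": 0.12, "job_b": 0.30, "job_c": 0.05}  # made-up utilities
print(select_item(scores, epsilon=0.0))  # epsilon=0 always exploits
```

A larger `epsilon` collects more data on uncertain items at the cost of short-term utility.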

Page 4: Recsys2016 Tutorial by Xavier and Deepak

Today • We are going to talk about recommender systems at

Page 5: Recsys2016 Tutorial by Xavier and Deepak

Agenda

• Recommender Systems at LinkedIn (Deepak)
  • Context & Overview
  • End-to-end of recommender systems in practice
    • Examples: Jobs Recommendation, LinkedIn Feed
  • Lessons Learned
• Recommender Systems at Quora (Xavier)
  • Context & Overview
  • Lessons Learned
• Conclusion (Xavier)

Page 6: Recsys2016 Tutorial by Xavier and Deepak

Our vision Create economic opportunity for every member of the global workforce

Our mission Connect the world’s professionals to make them more productive and successful

Our core value Members first!

Page 7: Recsys2016 Tutorial by Xavier and Deepak

Companies Jobs Skills People Schools Knowledge

Page 8: Recsys2016 Tutorial by Xavier and Deepak

Actors and value propositions

Page 9: Recsys2016 Tutorial by Xavier and Deepak

Value proposition for Users (Members)

CONNECT

with your professional world

STAY INFORMED

through professional news and knowledge

GET HIRED

and build your career

Page 10: Recsys2016 Tutorial by Xavier and Deepak

Value proposition for Customers

HIRE MARKET SELL @WORK

Page 11: Recsys2016 Tutorial by Xavier and Deepak

Several Recommendation Problems

• Member experience
  • LinkedIn Feed
  • PYMK (People You May Know)
  • Job recommendation
  • …

Page 12: Recsys2016 Tutorial by Xavier and Deepak

Recommendation Problems continued ….

• Customer experience
  • Recruiter (source candidates for recruiters)
  • Sales Solution (close deals with companies)
  • LinkedIn Learning (course recommendation)
  • Recommend user segments in advertising

Page 13: Recsys2016 Tutorial by Xavier and Deepak

Recommendations: Delivery Mechanisms

• Pull Model: Serve most relevant when the user visits
  • Desktop, mobile web, mobile app, tablet, …
• Push Model: Get in touch with user to deliver recommendations {Email, Notifications}
  • Higher relevance bar (do not spam and inundate the users)
  • Right message, right user, right time, right frequency, right channel

Done through ML and optimization

Page 14: Recsys2016 Tutorial by Xavier and Deepak

MATCH-MAKING: Know your items, your users and their interactions

Page 15: Recsys2016 Tutorial by Xavier and Deepak

User Characteristics

Profile Information
Title, seniority, skills, education, endorsements, presentations, …

Behavioral
Activities, search, …

Edge features (ego-centric network)
Connection strength, content affinities, …

Professional profile of record

Page 16: Recsys2016 Tutorial by Xavier and Deepak

Item Features

Articles

author, sharer, keywords, named entities, topics, category, likes, comments, latent representation, etc.

Jobs

company, title, skills, keywords, geo, …

.......

Page 17: Recsys2016 Tutorial by Xavier and Deepak

User Intent

• Why are you here?
  • Hire, get hired, stay informed, grow network, nurture connections, sell, market, …
• Explicit (e.g., visiting jobs homepage, search query)
• Implicit (needs to be inferred, e.g., based on activities)

Page 18: Recsys2016 Tutorial by Xavier and Deepak

How to Scale Recommendations?

•  Formulate objectives to optimize

• Optimize via ML models
  • Incorporate both implicit and explicit signals about user and items

• Automate

Page 19: Recsys2016 Tutorial by Xavier and Deepak

Connecting long-term objectives to proxies that can be optimized by machines/algorithms

Long-term objectives (return visits to site, connections, quality job applies, …)

Short-term proxies (CTR, connection prob, apply prob, …)

Large scale optimization via ML, UI changes,..

Experiment Learn Innovate

Page 20: Recsys2016 Tutorial by Xavier and Deepak

Automation

Optimize proxies with short feedback loop via Machine Learning!

INPUT SIGNALS
Whom? User Profile, User Intent
What? Item Filtering, Understanding
Context
Interaction Data

MACHINE LEARNING
SCORE Items: P(Click), P(Share), Similarity, …
RANK Items: Sort by Score, Multi-objective, Business rule

Page 21: Recsys2016 Tutorial by Xavier and Deepak

Under the Hood of a Typical Recommender System at LinkedIn

Page 22: Recsys2016 Tutorial by Xavier and Deepak

Example Application: Job Recommendation

Page 23: Recsys2016 Tutorial by Xavier and Deepak

Objective: Job Applications

Predict the probability that a user i would apply for a job j, given …

• User features
  • Profile: Industry, skills, job positions, companies, education
  • Network: Connection patterns
• Item (i.e., job) features
  • Source: Poster, company, location
  • Content: Keywords, title, skills
• Data about users’ past interactions with different types of items
  • Items: Jobs, articles, updates, courses, comments
  • Interactions: Apply, click, like, share, connect, follow

Page 24: Recsys2016 Tutorial by Xavier and Deepak

System Architecture

[Architecture diagram. Labels shown: User; Front End Service; Ranking Service; Ranking Library; Item Index; Search Index; Live Index Updater; User Feature Stores; User DB; Item DB; User Activity Data Streams; Data Stream Processing; User Feature Pipelines; Item Feature Pipelines; ETL; Online; Offline; Offline Data Pipelines; Model Training Pipelines; Offline Index Builder; Photon-ML; Apache Hadoop, Pig, Scalding, Spark, …; Experimentation platform]

Page 25: Recsys2016 Tutorial by Xavier and Deepak

Feature Generation
• Types: User features, item features, activity features
• Processing methods: Streaming, offline

Streaming example: Skills required by a job

[Diagram labels: new job j; Job DB; Kafka (distributed data/event delivery and queueing system); Skill Extraction Pipeline; Metadata; Live Index Updater; Item Index; Data ETLed to Hadoop]

Skill extractor
- ML model
- Predict p(job j requires skill s) based on job description, …
- Skills are standardized

Page 26: Recsys2016 Tutorial by Xavier and Deepak

Model Training

Feature Processing
Raw User Features → DAG of Transformers → Feature Vector of User i: xi
Raw Item Features → DAG of Transformers → Feature Vector of Item j: zj
DAG of Transformers (trees, similarities) → Matching Feature Vector: mij

Parameter Learning
p(i applies for j) = f( xi, zj, mij | θ, αi, βj )
Global parameter vector: θ
Parameter vector for each user i: αi
Parameter vector for each item j: βj
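The slide leaves the function f unspecified. As a minimal sketch of one possible choice, the code below assumes a logistic model in which the global vector θ weighs all features, the per-user vector αi weighs the item features, and the per-item vector βj weighs the user features. This particular form, the function names, and the plain-list representation are assumptions for illustration, not the authors' stated implementation.

```python
import math


def dot(u, v):
    """Inner product of two equal-length lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def apply_probability(x_i, z_j, m_ij, theta, alpha_i, beta_j):
    """Assumed form of p(i applies for j) = f(x_i, z_j, m_ij | theta, alpha_i, beta_j)."""
    # Global effect: theta covers user, item and matching features (concatenated).
    global_term = dot(theta, x_i + z_j + m_ij)
    # Per-user effect: user i's own weights on the item features.
    user_term = dot(alpha_i, z_j)
    # Per-item effect: item j's own weights on the user features.
    item_term = dot(beta_j, x_i)
    # Logistic link maps the combined score to a probability.
    return 1.0 / (1.0 + math.exp(-(global_term + user_term + item_term)))


p = apply_probability(
    x_i=[1.0, 0.0], z_j=[0.5], m_ij=[0.8],
    theta=[0.2, -0.1, 0.4, 1.0], alpha_i=[0.3], beta_j=[0.1, 0.0],
)
```

With all parameters at zero the sketch returns 0.5, which is a quick sanity check on the link function.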

Page 27: Recsys2016 Tutorial by Xavier and Deepak

Model Deployment

[Diagram labels: User Feature Stores; Live Index Updater; Item Index]

p(i applies for j) = f( xi, zj, mij | θ, αi, βj )

Global parameter vector: θ
Parameter vector for each user i: αi
Parameter vector for each item j: βj

Page 28: Recsys2016 Tutorial by Xavier and Deepak

Online Ranking

[Online ranking diagram. Labels shown: User; Front End Service; Ranking Service; User Features & Parameters; Item Index; Live Index Updater; Item DB; Item Feature Pipelines; User Feature Stores; User DB; User Feature Pipelines; Data Stream Processing; User Activity Data Stream; ETL; Online; Offline; Offline Data Pipelines; Model Training Pipelines; Offline Index Builder]

Page 29: Recsys2016 Tutorial by Xavier and Deepak

Online A/B Experiments

Experiment setting
- Select a user segment
- Allocate traffic to different models

Result reporting
- Report experimental results
- Impact on a large number of metrics
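Traffic allocation is often done by hashing the user id into a bucket, so that the same user always sees the same model. The sketch below assumes that approach; the function name, the experiment key, and the 50/50 split are hypothetical and do not describe LinkedIn's experimentation platform.

```python
import hashlib


def assign_variant(user_id, experiment, variants):
    """Deterministically map a user to a variant.

    variants: list of (name, traffic_fraction) pairs whose fractions sum to 1.0.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # Use the first 32 bits of the hash as a uniform bucket in [0, 1).
    bucket = int(digest[:8], 16) / 2**32
    cumulative = 0.0
    for name, fraction in variants:
        cumulative += fraction
        if bucket < cumulative:
            return name
    # Guard against floating-point rounding in the fractions.
    return variants[-1][0]


split = [("control", 0.5), ("treatment", 0.5)]  # hypothetical allocation
variant = assign_variant("user42", "jobs_ranker_v2", split)
```

Including the experiment name in the hash key keeps assignments independent across experiments.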

Page 30: Recsys2016 Tutorial by Xavier and Deepak

LinkedIn Feed

Page 31: Recsys2016 Tutorial by Xavier and Deepak

The Feed:

Page 32: Recsys2016 Tutorial by Xavier and Deepak

Function of the Feed

• Deliver on the Value Propositions:
  • Stay connected with your Network (your network is your identity!)
  • Ability to build your professional reputation
  • Stay informed with relevant professional knowledge
  • Discover opportunities
  • Generate revenue (directly or indirectly)

Page 33: Recsys2016 Tutorial by Xavier and Deepak

Challenges of the Feed

• Heterogeneity of Types
  • Organic Content
    • Articles by Influencers, Articles by Network, Shares by network, Content by topic (follows), Jobs, PYMK, group discussions, etc.
  • Sponsored
    • Sponsored updates, Jobs ads, …

Page 34: Recsys2016 Tutorial by Xavier and Deepak

The Feed: Not all types are equal

Action rates per type (Normalized)

Page 35: Recsys2016 Tutorial by Xavier and Deepak

Impression Discounting

• Reduce the chance of showing the same item to the same user repeatedly

• Decay the score of an item based on #times that the user saw the item before

• Using real-time feedback

• Discounting by user segments and item types

Global (over all types)

Impression discounting curves of a few item types
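Impression discounting can be sketched as a multiplicative decay on the relevance score. The exponential form and the per-type decay factors below are assumptions for illustration; the slide only says that the score decays with the number of prior impressions and that discounting differs by user segment and item type.

```python
# Hypothetical per-type decay factors; real curves would be fit from data.
DECAY_BY_TYPE = {"article": 0.7, "job": 0.9}
DEFAULT_DECAY = 0.8


def discounted_score(score, item_type, impressions):
    """Decay a relevance score by how often the user already saw the item."""
    decay = DECAY_BY_TYPE.get(item_type, DEFAULT_DECAY)
    return score * decay ** impressions


fresh = discounted_score(0.5, "job", impressions=0)  # unseen item keeps its score
stale = discounted_score(0.5, "job", impressions=3)  # seen three times before
```

Feeding the impression counts from a real-time store is what lets the discount react within a session.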

Page 36: Recsys2016 Tutorial by Xavier and Deepak

Diversification

• Users’ experience deteriorates when exposed to the same kind of items multiple times on the same page
• Decay relevance scores of repeat items from the same actor and of the same type

Discounting actor repetitions

Group Discussion | CTR Drop
2 adjacent discussions | 21%
3 adjacent discussions | 48%
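One way to decay scores of repeat items is a greedy re-ranker that penalizes each candidate by how many items from the same actor and of the same type were already placed. The penalty values, field names, and the greedy strategy itself are assumptions for this sketch.

```python
def diversify(items, actor_penalty=0.5, type_penalty=0.8):
    """Greedy re-ranking of dicts with 'id', 'actor', 'type' and 'score' keys."""
    remaining = list(items)
    ranked = []
    actor_counts, type_counts = {}, {}

    def adjusted(item):
        # Each earlier item by the same actor or of the same type shrinks the score.
        return (item["score"]
                * actor_penalty ** actor_counts.get(item["actor"], 0)
                * type_penalty ** type_counts.get(item["type"], 0))

    while remaining:
        best = max(remaining, key=adjusted)
        remaining.remove(best)
        ranked.append(best)
        actor_counts[best["actor"]] = actor_counts.get(best["actor"], 0) + 1
        type_counts[best["type"]] = type_counts.get(best["type"], 0) + 1
    return ranked


feed = [
    {"id": "a1", "actor": "alice", "type": "article", "score": 1.0},
    {"id": "a2", "actor": "alice", "type": "article", "score": 0.9},
    {"id": "b1", "actor": "bob", "type": "job", "score": 0.6},
]
order = [item["id"] for item in diversify(feed)]
```

In this example the second article by the same actor drops below the job update even though its raw score is higher.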

Page 37: Recsys2016 Tutorial by Xavier and Deepak

How to Combine Different Objectives

• The feed system serves updates based on relevance scores
• Adjust the serving strategy to optimize revenue while enforcing engagement (e.g. CTR) constraints

For user x, item i, rank by: eCPI(i|x) + SB * pCTR(i|x)

maximize revenue such that engagement >= engagement target

- eCPI: estimated revenue for a given update
- For organic updates, eCPI = 0
- SB: shadow bid (intrinsic valuation of organic clicks to LinkedIn)
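The ranking rule eCPI(i|x) + SB * pCTR(i|x) can be sketched directly. The field names and the numbers are made up; in practice the shadow bid SB would be tuned so that the engagement constraint holds, which this sketch does not attempt.

```python
def rank_feed(updates, shadow_bid):
    """Order updates by eCPI + SB * pCTR, highest first.

    updates: list of dicts with 'id', 'ecpi' and 'pctr'; organic updates have ecpi = 0.
    """
    return sorted(
        updates,
        key=lambda u: u["ecpi"] + shadow_bid * u["pctr"],
        reverse=True,
    )


updates = [
    {"id": "sponsored", "ecpi": 0.05, "pctr": 0.01},
    {"id": "organic", "ecpi": 0.0, "pctr": 0.04},
]
low_sb = [u["id"] for u in rank_feed(updates, shadow_bid=1.0)]   # favors revenue
high_sb = [u["id"] for u in rank_feed(updates, shadow_bid=2.0)]  # favors engagement
```

Raising the shadow bid moves the organic update above the sponsored one, which is the revenue versus engagement trade-off the slide describes.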

Page 38: Recsys2016 Tutorial by Xavier and Deepak

Tradeoffs Points and Efficient Frontier

[Figure: revenue gain (relative) vs. engagement gain (relative). Labeled points: Original System (no Optim); Conservative (high SB); Aggressive (low SB); More aggressive (very low SB). A better efficient frontier is also shown.]

Page 39: Recsys2016 Tutorial by Xavier and Deepak

Encouraging Viral loops: Some heuristics

• Value of share, comment, like > Value of click
• Rank by using linear combination of CTR and Viral Action Rates
  • Lose CTR but gain more viral actions (shares, likes, comments)
  • Increasing viral actions increases unique user visits & feed sessions
  • Viral action triggers notification to actors in many cases (e.g., like/comment on a post written by your connection)
• Encourage users to share/comment/like more
  • Boost article scores by users who share good stuff and who don’t share very often
  • May lose some CTR in short-term but increase cohort that shares on LinkedIn
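A linear combination of click and viral action rates might look like the sketch below. The weights are hypothetical; they only encode the heuristic that a share, comment, or like is worth more than a click.

```python
# Hypothetical weights: viral actions are valued above a plain click.
WEIGHTS = {"click": 1.0, "like": 2.0, "comment": 3.0, "share": 4.0}


def viral_aware_score(rates, weights=WEIGHTS):
    """Weighted sum of predicted action rates, given as {action: probability}."""
    return sum(weights[action] * p for action, p in rates.items())


clicky = {"click": 0.10, "share": 0.00}  # high CTR, never shared
viral = {"click": 0.08, "share": 0.02}   # lower CTR, sometimes shared
```

Under these weights the update with the lower CTR but some shares scores higher, so it would be ranked first.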

Page 40: Recsys2016 Tutorial by Xavier and Deepak

The Feed: A three stage Ranker

Update Type 1 … Update Type N: Each type scores and orders its potential updates

Blending Results: The second stage rank orders every update using ML model

Multiple Objective: The third stage adjusts for diversity, impression discounting, balance of objectives: engagement & revenue
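The three stages can be sketched as one small pipeline. Everything here is assumed for illustration: the `type_score` field, the per-type cutoff `k`, and the representation of third-stage adjustments as functions that rescale a score.

```python
import heapq


def three_stage_rank(candidates_by_type, k, model, adjustments):
    """candidates_by_type: {type_name: [update dicts with a 'type_score' key]}."""
    # Stage 1: each type orders its own candidates; keep the top k per type.
    shortlisted = []
    for updates in candidates_by_type.values():
        shortlisted.extend(
            heapq.nlargest(k, updates, key=lambda u: u["type_score"])
        )
    # Stage 2: one model scores every shortlisted update on a common scale.
    scored = [(model(u), u) for u in shortlisted]
    # Stage 3: apply adjustments (e.g. diversity, impression discounting,
    # revenue balance), then sort by the adjusted score.
    final = []
    for score, update in scored:
        for adjust in adjustments:
            score = adjust(update, score)
        final.append((score, update))
    final.sort(key=lambda pair: pair[0], reverse=True)
    return [update for _, update in final]


candidates = {
    "article": [
        {"id": "a1", "type_score": 0.9, "seen": 3},
        {"id": "a2", "type_score": 0.1, "seen": 0},
    ],
    "job": [{"id": "j1", "type_score": 0.5, "seen": 0}],
}
result = three_stage_rank(
    candidates,
    k=1,
    model=lambda u: u["type_score"],                  # stand-in for the ML model
    adjustments=[lambda u, s: s * 0.5 ** u["seen"]],  # impression discount
)
```

Keeping the stages separate lets each update type own its candidate generation while the blending and business adjustments stay in one place.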

Page 41: Recsys2016 Tutorial by Xavier and Deepak

LESSONS LEARNT

Page 42: Recsys2016 Tutorial by Xavier and Deepak

1. Cost of a Bad Recommendation

• How does ML work where a few bad recommendations can hurt the brand?
  • Maximize precision without hurting performance metrics significantly
• Collect negative feedback from users, crowd; incorporate within algorithms
• Create better product focus, filter unnecessary content from inventory
  • E.g., unprofessional content on Feed
• Better insights/explanations associated with recommendations help build trust

Page 43: Recsys2016 Tutorial by Xavier and Deepak

2. Data Tracking

• Proper data tracking and monitoring is not always easy!
  • Data literacy and understanding across organization (front-end, UI, SRE)
  • Proper tooling, continuous monitoring very important to scale the process
• Our philosophy: Loose coupling between FE and BE teams!
  • FE (client) emits limited events along with trackingid
  • BE emits more details and joins against trackingid
• Tracking events can have significant impact
  • View-port tracking (what the user actually saw) for more informative negatives
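The loose coupling between FE and BE events can be sketched as a join on trackingid. Only the `trackingid` key comes from the slide; the other field names and the in-memory join are assumptions, since at scale this would more likely run as an offline or streaming job.

```python
def join_tracking(fe_events, be_events):
    """Attach back-end details to front-end events that share a trackingid."""
    be_by_id = {event["trackingid"]: event for event in be_events}
    joined = []
    for fe in fe_events:
        be = be_by_id.get(fe["trackingid"])
        if be is not None:
            # Back-end details first, then the client-side fields on top.
            joined.append({**be, **fe})
    return joined


fe_events = [
    {"trackingid": "t1", "event": "click"},
    {"trackingid": "t9", "event": "view"},  # no matching back-end record
]
be_events = [{"trackingid": "t1", "item_id": "job_7", "score": 0.4}]
rows = join_tracking(fe_events, be_events)
```

Because the client only has to emit the id and the event type, front-end changes rarely break the richer back-end logging.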

Page 44: Recsys2016 Tutorial by Xavier and Deepak

3. Content Inventory

• High quality and comprehensive content inventory as important as recommendation algorithms
  • Examples: Learning, Jobs, Feed
• Supply and demand analysis, gap analysis, proactively producing more high quality content for inventory

Page 45: Recsys2016 Tutorial by Xavier and Deepak

4. A/B Testing with Network Interference

• Random treatment assignments (spillover effects, need to adjust)
  • Treatment recommendations affect control group as well
  • A like/share in treatment may create a new item when ranking in control
