Recommendations @ LinkedIn
Think Platform, Leverage Hadoop
The world’s largest professional network
Over 50% of members are now international
*as of Nov 4, 2011
**as of June 30, 2011
LinkedIn Members (Millions)
2004: 2, 2005: 4, 2006: 8, 2007: 17, 2008: 32, 2009: 55, 2010: 90
135M+*
75% of Fortune 100 companies use LinkedIn to hire
Company Pages: >2M**
New members joining: ~2/sec
Recommendations Opportunity
The Recommendations Opportunity
Pandora Search for People
Events You May Be Interested In
Groups browse maps
50%
Positions
Education
Summary
Experience
Skills
Are all titles the same?
- Software Engineer
- Technical Yahoo
- Member Technical Staff
- Software Development Engineer
- SDE
Are all companies the same?
‘IBM’ has 8000+ variations
- ibm – ireland
- ibm research
- T J Watson Labs
- International Bus. Machines
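The title and company variations above suggest a standardization step before any matching. A minimal sketch, assuming a hand-built alias table: `TITLE_ALIASES`, `normalize`, and `canonical_title` are hypothetical names, and the choice of "software engineer" as the canonical form is illustrative only. The deck does not say how LinkedIn resolves these variants.

```python
import re

# Hypothetical alias table: keys and values are illustrative only,
# not an actual LinkedIn taxonomy.
TITLE_ALIASES = {
    "software engineer": "software engineer",
    "technical yahoo": "software engineer",
    "member technical staff": "software engineer",
    "software development engineer": "software engineer",
    "sde": "software engineer",
}


def normalize(text: str) -> str:
    """Lowercase, turn punctuation into spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def canonical_title(raw: str) -> str:
    """Map a raw title to its canonical form; unknown titles are only cleaned."""
    cleaned = normalize(raw)
    return TITLE_ALIASES.get(cleaned, cleaned)
```

The same `normalize` step applies to company names such as "ibm – ireland", although resolving thousands of variants would likely need more than a lookup table.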
Recommendation Trade-offs: The need for a common platform
Real Time vs. Time Independent
Content Analysis vs. Collaborative
Recall vs. Precision
Related Titles, Related Companies, Related Industries
Related Titles, Related Companies, Related Industries
Title, Specialty, Education, Experience, Location, Industry, Seniority, Skills
Title, Specialty, Education, Experience, Location, Industry, Seniority, Skills
Specialty -> Specialty
Seniority -> Seniority
Skills -> Skills
Title -> Title
Summary -> Summary
Title -> Related Title
Education -> Education
...
Binary
Exact match
Exact match in bucket
Soft Match
v1 = tf * idf
$$\cos\theta = \frac{v_1 \cdot v_2}{|v_1| \, |v_2|}$$
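The soft match above can be sketched as tf-idf weighting followed by cosine similarity. This is a minimal illustration, assuming tokenized text fields and a smoothed idf formula; the deck gives only "tf * idf" and the cosine formula, so the smoothing and the function names `tfidf_vectors` and `cosine` are assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse tf-idf vector (dict: term -> weight) per tokenized document."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed idf (an assumption): log((1 + n) / (1 + df)) + 1 stays positive.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(v1, v2):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v2.get(t, 0.0) for t, w in v1.items())
    norm1 = math.sqrt(sum(w * w for w in v1.values()))
    norm2 = math.sqrt(sum(w * w for w in v2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)
```

For example, two skill lists that share "java" and "hadoop" but differ in one term score strictly between 0 and 1, while lists with no terms in common score 0.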
Matching
0.58, 0.94, 0.26, 0.18, 0.98, 0.16, 0.40
Importance weight vector (Skills -> Skills)
Similarity score vector (Skills -> Skills)
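One plausible way to combine the importance weight vector with the similarity score vector is a weighted average over feature pairs, followed by sorting. The deck does not state the actual combination rule, so the weighted average, the function names, and the feature-pair keys below are all assumptions.

```python
def match_score(similarity, importance):
    """Weighted average of per-feature-pair similarities (assumed scoring rule).

    similarity: dict like {"skills->skills": 0.9}, values in [0, 1]
    importance: dict with a non-negative weight per feature pair
    """
    total_weight = sum(importance.get(pair, 0.0) for pair in similarity)
    if total_weight == 0:
        return 0.0
    weighted = sum(importance.get(pair, 0.0) * s for pair, s in similarity.items())
    # Dividing by the total weight normalizes the score back into [0, 1].
    return weighted / total_weight


def rank(candidates, importance):
    """Return (score, name) pairs, best match first."""
    scored = [(match_score(sim, importance), name) for name, sim in candidates.items()]
    return sorted(scored, reverse=True)
```

With weights of 3.0 for skills and 1.0 for title, a candidate that matches only on skills scores 0.75 and outranks one that matches only on title at 0.25.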
Normalization, Scoring & Ranking
Filtering
Location, Company, Industry
Feedback
0.94
0.70
Technologies
Hadoop Case Studies
• Scaling
• Blending Recommendation Algorithms
• Grandfathering
• Model Selection
• A/B Testing
• Tracking and Reporting
Scaling: Billions of Recommendations
Latency > 1 sec
Latency < 1 sec
Recall = Low
Latency < 1 sec
Recall = High
Minhashing
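Minhashing compresses each item's token set into a short signature so that similar items can be found without comparing every pair in full. The sketch below shows the generic technique only; how it is wired into the Hadoop pipeline is not described in the deck. The salted MD5 hash family, `num_hashes=64`, and the function names are assumptions.

```python
import hashlib


def _hash(seed: int, token: str) -> int:
    """One member of a hash family, derived by salting MD5 with a seed."""
    digest = hashlib.md5(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(tokens, num_hashes=64):
    """Signature = minimum hash value under each of num_hashes hash functions.

    tokens must be non-empty.
    """
    unique = set(tokens)
    return [min(_hash(seed, t) for t in unique) for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching positions estimates the Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)
```

Items whose signatures agree in many positions become candidate pairs, which keeps recall high while avoiding a full pairwise comparison.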
Blending Recommendation Algorithms
Co-View Impact Latency ~ Minutes
Complexity = High
Co-View Impact Latency ~ Hours
Complexity = Low
Grandfathering: Adding and Changing Features
No Time Guarantees
Minimal Disruption
Next Profile Edit
Time ~ Week
Significant Systems Work
Parallel Feature Extraction Pipeline
Time ~ Hour
Minimal Disruption
Grandfather When Ready
Model Selection
• Features
• Models
• Parameters
SVM
Logistic Regression
Content, Collaborative
SVM
Decision Trees
L1+L2 Regularization
A/B Testing: Is Option A Better Than Option B? Let’s Test
Traffic
A: New Model, 10%
B: Old Model, 90%
Send 10% of members who have more than 100 connections, AND who have logged in within the past week, AND who are based in Europe
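The targeting rule above can be sketched as a segment filter plus deterministic hash bucketing, so that a member always lands in the same arm of the test. This is an illustration under assumptions: the `Member` fields, the experiment name, and the SHA-256 bucketing scheme are hypothetical, not a description of LinkedIn's A/B testing system.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Member:
    member_id: int
    connections: int
    days_since_login: int
    region: str


def in_segment(m: Member) -> bool:
    """More than 100 connections AND logged in within a week AND based in Europe."""
    return m.connections > 100 and m.days_since_login <= 7 and m.region == "Europe"


def bucket(member_id: int, experiment: str, buckets: int = 100) -> int:
    """Stable bucket in [0, buckets) derived from the member id and experiment name."""
    digest = hashlib.sha256(f"{experiment}:{member_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def assign_model(m: Member, experiment: str = "new-model-test",
                 treatment_percent: int = 10) -> str:
    """Send about treatment_percent of the segment to the new model; everyone else stays on the old one."""
    if in_segment(m) and bucket(m.member_id, experiment) < treatment_percent:
        return "new"
    return "old"
```

Salting the hash with the experiment name keeps assignments in one test independent of assignments in another.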
Tracking and Reporting: K-way joins across billions of rows
Up-to-the-minute reporting
Nearsightedness
K-way join complexity
Lacks up-to-the-minute reporting
Simple k-way joins
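A k-way join in the MapReduce style tags each record with its source, groups records by key, and emits only keys present in every source. The in-memory sketch below imitates that map and reduce structure on a single machine; the table names in the usage note and the function name `k_way_join` are hypothetical, and a real Hadoop job would distribute the grouping step across reducers.

```python
from collections import defaultdict


def k_way_join(tables):
    """Inner-join k tables on a shared key.

    tables: dict mapping table name -> iterable of (key, value) rows.
    Returns {key: {table name: [values]}} for keys present in every table.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    # "Map" phase: tag every record with the table it came from.
    for name, rows in tables.items():
        for key, value in rows:
            grouped[key][name].append(value)
    # "Reduce" phase: keep keys that appeared in all k tables.
    joined = {}
    for key, by_source in grouped.items():
        if len(by_source) == len(tables):
            joined[key] = {name: by_source[name] for name in tables}
    return joined
```

For instance, joining hypothetical `impressions`, `clicks`, and `applies` tables on a recommendation id keeps only the ids that were shown, clicked, and applied to.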
Think Platform, Leverage Hadoop
Come work with us at LinkedIn
Applied Research
Engineer
You