What's wrong with you, Recruiter-John? A non-trivial recommender challenge.
Budapest, June 2016
@fabianabel
http://recsyschallenge.com
Challenge
Given a user, the goal is to recommend job postings…
1. that the user may be interested in and
2. for which the user is an appropriate candidate.
[Figure: a job recommender sits between a user and the job postings of companies, e.g. "Scala Dev (m/w)", "Scala Engineer", "Scala Dev, Hamburg".]
Job recommendations
Key properties of a job posting
- Title
- Company
- Employment type and career level
- Full-text description
Key sources for understanding user demands
- Profile (e.g. Fabian Abel, Data Scientist): haves and interests such as web science, big data, hadoop; skills & co.
- Social Network: explicit and implicit connections
- Interactions: clicks, shares, ratings (e.g. on data, web, social media, big data, kununu)
- Interactions of similar users (e.g. hadoop, scala)
Relevance Estimation
Features feeding the relevance estimation (regression model):
- Content-based features
- Collaborative features
- Social features
- Usage behavior features
Logistic Regression
$$P(\text{relevant} \mid x) = \frac{1}{1 + e^{-\left(b_0 + \sum_{i=1}^{n} b_i x_i\right)}}$$
where x is the feature vector and b_i is the impact of feature x_i.
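The logistic regression formula on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the production model: the feature values, the weights b_i and the bias b_0 below are made-up numbers, and the function name `relevance` is hypothetical.

```python
import math
from typing import Sequence


def relevance(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """P(relevant | x) = 1 / (1 + exp(-(b0 + sum_i b_i * x_i)))."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    z = bias + sum(b * x for b, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical feature vector x: one value per feature group
# (content-based, collaborative, social, usage behavior).
x = [0.8, 0.5, 0.0, 0.3]
# Hypothetical weights b_1..b_4 and bias b_0; real values would be learned from data.
b = [1.5, 2.0, 0.7, 1.0]
b0 = -2.0

score = relevance(x, b, b0)
print(f"P(relevant | x) = {score:.3f}")
```

A larger weight b_i means that feature x_i pushes the probability more strongly, which is what the slide calls the impact of a feature.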
Relevance Estimation + Additional Filters
Pipeline: the feature groups (content-based, collaborative, social, usage behavior) feed the Relevance Estimation (regression model), whose output passes through Filtering & Diversification:
- Location-based filtering
- Content-based diversification
- Monetary-based diversification
- Career Level filtering
Result: a ranked list of job postings with relevance scores (e.g. 0.92, 0.8, 0.76, …)
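A rough sketch of how filtering and diversification could follow the relevance estimation. Everything here is an assumption for illustration: the `ScoredJob` fields, the rule "same region" for location-based filtering, the rule "at most one career level apart" for career level filtering, and "at most one posting per title" as a very simplified content-based diversification. Monetary-based diversification is left out because the slides do not describe it.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class ScoredJob:
    job_id: str
    title: str
    region: str
    career_level: int
    score: float  # output of the relevance estimation


def filter_and_diversify(
    jobs: List[ScoredJob],
    user_regions: Set[str],
    user_level: int,
    max_level_gap: int = 1,
    max_per_title: int = 1,
) -> List[ScoredJob]:
    # Filtering: keep jobs in an acceptable region and close to the user's career level.
    kept = [
        j for j in jobs
        if j.region in user_regions and abs(j.career_level - user_level) <= max_level_gap
    ]
    kept.sort(key=lambda j: j.score, reverse=True)

    # Simplified content-based diversification: cap postings with the same title.
    per_title: Dict[str, int] = {}
    result: List[ScoredJob] = []
    for j in kept:
        key = j.title.lower()
        if per_title.get(key, 0) < max_per_title:
            result.append(j)
            per_title[key] = per_title.get(key, 0) + 1
    return result


jobs = [
    ScoredJob("a", "Scala Dev", "Hamburg", 3, 0.92),
    ScoredJob("b", "Scala Dev", "Hamburg", 3, 0.80),
    ScoredJob("c", "Scala Engineer", "Hamburg", 4, 0.76),
    ScoredJob("d", "Scala Engineer", "Munich", 3, 0.70),
    ScoredJob("e", "Head of Engineering", "Hamburg", 6, 0.65),
]
recommended = filter_and_diversify(jobs, user_regions={"Hamburg"}, user_level=3)
print([j.job_id for j in recommended])
```

In this toy run, "b" is dropped as a duplicate title, "d" fails the location filter and "e" fails the career level filter.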
Challenges
Issues that we have to fight with…
What John writes… and what he means…
Recruiter-John
- Writes: International Sales Manager; means: Call Center Agent (10 EUR per hour)
- Writes: Sales Manager; means: Sales Manager for B2B customers (80K EUR per year)
- Writes: Data Scientist skilled in Hadoop, Scala, Elasticsearch, … with PhD in …; means: Data Analyst (skilled in SAS or Excel)
What Paul says he is… and what he means…
Paul, the Candidate
- Says: CEO; means: Network Engineer (currently unemployed)
- Says: Data Scientist with 100+ skills; means: BI Engineer (skilled in old-school ETL)
- Says: Sales Manager; means: Shopman (in a kiosk)
Understanding the meaning of things that people write
in job postings and in their profiles is not trivial…
Profiles vs. People's wishes for their future
A profile describes a user's past/current position(s), not future wishes.
Career path patterns: locations
How far away are the jobs that the users bookmark?
- 0-50 km: 35%
- 51-200 km: 22%
- >200 km: 43%
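One way such a distance breakdown could be computed is sketched below. The slides do not say how distances were measured, so the great-circle (haversine) distance between user and job coordinates is an assumption, as are the function names and the example coordinates.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple

EARTH_RADIUS_KM = 6371.0
BUCKETS = ("0-50 km", "51-200 km", ">200 km")
Point = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance between two points on a sphere."""
    phi1, phi2 = math.radians(a[0]), math.radians(b[0])
    dphi = phi2 - phi1
    dlmb = math.radians(b[1] - a[1])
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def bucket(distance_km: float) -> str:
    if distance_km <= 50:
        return BUCKETS[0]
    if distance_km <= 200:
        return BUCKETS[1]
    return BUCKETS[2]


def distance_shares(bookmarks: Iterable[Tuple[Point, Point]]) -> Dict[str, float]:
    """Share of bookmarks per distance bucket; each bookmark is (user location, job location)."""
    counts = Counter(bucket(haversine_km(user, job)) for user, job in bookmarks)
    total = sum(counts.values())
    return {name: (counts[name] / total if total else 0.0) for name in BUCKETS}


# Hypothetical bookmarks of a user located in Hamburg.
hamburg, bremen, berlin = (53.55, 9.99), (53.08, 8.80), (52.52, 13.405)
shares = distance_shares([(hamburg, hamburg), (hamburg, bremen), (hamburg, berlin), (hamburg, berlin)])
print(shares)
```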
Career path patterns: locations
Climbing up the ladder?
[Figure: career level today vs. next step, across the levels junior, senior, manager, senior manager. Percentages shown for the next step, in that order: 53%, 72%, 54%, 52%.]
CTR of job recommender over time
[Figure: CTR of the job recommender from 2014 to 2016, annotated with the following notes.]
- New year, new job
- Bad CTR
- No love for job RecSys
- Intensified love for job RecSys
- B/A test: are the changes in the job RecSys really responsible for increased CTRs?
- Increase of job inventory (from 100k to ca. 750k)
- Using algorithms from RecSys Challenge
- Feedback App
- LSI for MoreLikeThis component
- Entity resolution
- Explanations
- …
- Running out of ideas :-)
RecSys Challenge
http://recsyschallenge.com
RecSys Challenge
Given a user, the goal is to predict those job postings that the
user will interact with.
[Figure: given 2 months of impressions & interactions (e.g. click, bookmark), which job postings (e.g. "Scala (m/w)", "Scala Engineer", "Scala Dev, Hamburg") will the user interact with?]
Datasets
1. Training data:
• User demographics (jobtitle, discipline, industry, career level, # CV entries, country, region) [1M]
• Job postings (title, discipline, industry, career level, country, region) [1M]
• Interactions (user_id, item_id, interaction_type, timestamp) [10M, 2 months]
• Impressions (user_id, item_id, week) [30M, 2 months]
2. Task files:
• Users (= User IDs for whom recommendations should be computed) [150k]
• Candidate items (= item IDs that are allowed to be recommended) [300k]
3. Solution (secret)
• Interactions (user_id, item_id) [1M, 1 week]
Anonymization: strings are replaced by IDs; users and interactions are enriched with artificial noise.
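To make the data layout concrete, here is a small popularity baseline over records shaped like the interactions listed above. It is only an illustrative sketch, not a method from the challenge: the in-memory tuple representation, the function name `popularity_baseline` and the `exclude_seen` option are assumptions, and nothing here reads the actual challenge files.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# (user_id, item_id, interaction_type, timestamp), mirroring the interactions schema.
Interaction = Tuple[int, int, int, int]


def popularity_baseline(
    interactions: Iterable[Interaction],
    target_users: Iterable[int],
    candidate_items: Set[int],
    k: int = 30,
    exclude_seen: bool = True,
) -> Dict[int, List[int]]:
    """Recommend the most interacted-with candidate items to every target user."""
    popularity: Counter = Counter()
    seen: Dict[int, Set[int]] = defaultdict(set)
    for user_id, item_id, _interaction_type, _timestamp in interactions:
        seen[user_id].add(item_id)
        if item_id in candidate_items:  # only candidate items may be recommended
            popularity[item_id] += 1

    ranked = [item for item, _count in popularity.most_common()]
    recommendations: Dict[int, List[int]] = {}
    for user_id in target_users:
        already = seen.get(user_id, set()) if exclude_seen else set()
        recommendations[user_id] = [i for i in ranked if i not in already][:k]
    return recommendations


# Tiny made-up example: item 30 is popular but not a candidate item.
toy = [(1, 10, 1, 0), (2, 10, 1, 1), (2, 20, 1, 2), (3, 30, 1, 3), (3, 10, 2, 4)]
recs = popularity_baseline(toy, target_users=[1, 4], candidate_items={10, 20})
print(recs)
```

User 4 has no history and simply gets the globally most popular candidate items, which is the cold-start behavior one would expect from such a baseline.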
Evaluation Measure
Mixture of…
- Precision@k (k = 2, 4, 6, 20) = fraction of relevant items in the top k
- Recall@30 = fraction of all relevant items that appear in the top 30
- Success@30 = probability that at least one relevant item was recommended in the top 30
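The three measures can be sketched per user as follows. The slide only says the final score is a mixture of them, so the weighting is deliberately not reproduced here; the function names are hypothetical, and each recommendation list is assumed to contain no duplicate items.

```python
from typing import List, Set


def precision_at_k(recommended: List[int], relevant: Set[int], k: int) -> float:
    """Fraction of the top-k recommended items that are relevant."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def recall_at_k(recommended: List[int], relevant: Set[int], k: int = 30) -> float:
    """Fraction of all relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)


def success_at_k(recommended: List[int], relevant: Set[int], k: int = 30) -> float:
    """1.0 if at least one relevant item is in the top k, else 0.0."""
    return 1.0 if any(item in relevant for item in recommended[:k]) else 0.0


# Made-up example for one user: items 2 and 4 are hits, item 9 is missed.
recommended = [1, 2, 3, 4]
relevant = {2, 4, 9}
print(precision_at_k(recommended, relevant, 2))  # 1 hit in the top 2
print(recall_at_k(recommended, relevant))        # 2 of 3 relevant items found
print(success_at_k(recommended, relevant))
```

Averaging `success_at_k` over all target users gives the probability that the slide refers to.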
Current Status
Current Status (ordered by rank)
Join the challenge!
• Deadline for submissions: June 26th 2016
• Current leaders: >600k points (ca. 20% of max. possible points)
• Prizes: 1st = 3,000 EUR; 2nd = 1,500 EUR; 3rd = 500 EUR
• Workshop at RecSys 2016 in Boston: Sep 15th
• RecSys Challenge 2017: Dream = online evaluation
Thank you @fabianabel
http://recsyschallenge.com
www.xing.com