1
LCARS: A Location-Content-Aware Recommender System
Hongzhi Yin†, Yizhou Sun‡, Bin Cui†, Zhiting Hu†, Ling Chen
†Peking University, ‡Northeastern University, University of Technology, Sydney
2
Outline
■ Introduction
  Background
  Challenges
■ Our Solution – LCARS
  Offline Modeling – LCA-LDA
  Online Recommendation – TA algorithm
■ Experiments
  Experimental Setup
  Experimental Results
■ Conclusions
4
Background
■ Location-based Social Networks (LBSNs): Facebook Places, Loopt, Foursquare
Users share photos, comments or check-ins at a location
Expanded rapidly, e.g., Foursquare gets over 3 million check-ins every day
5
Background
6
Background
■ Event-based Social Networks (EBSNs), e.g., Meetup.com
8
Problem Definition
■ We aim to mine useful knowledge from the user activity history data in LBSNs and EBSNs to answer two typical questions in daily life:
  If we want to visit venues in a city such as Beijing, where should we go?
  If we want to attend local events such as dramas and exhibitions in a city, which events should we attend?
For simplicity, we propose the notion of spatial items to denote both venues and events in a unified way, so that we can define our problem as follows: given a querying user $u$ with a querying city $l_u$, find $k$ interesting spatial items within $l_u$ that match the preference of $u$.
9
Challenge(1/5)
■ Spatial Item Recommendations in LBSN and EBSN
■ Existing Solutions
  Based on item/user collaborative filtering: similar users give similar ratings to similar items
  Latent factor models
[Figure: users visit some spatial items; the user activity histories are used to build recommendation models (similar users, similar items); given a querying user and a querying city, the models produce recommendations]
Existing solutions are based on the model of co-rating and co-visiting.
So, what is the PROBLEM here? Why?
Mao Ye, Peifeng Yin, Wang-Chien Lee: “Location recommendation for location-based social networks.” GIS 2010
Justin J. Levandoski, Mohamed Sarwat, Ahmed Eldawy, and Mohamed F. Mokbel: “LARS: A Location-Aware Recommender System.” ICDE 2012
10
Challenge(2/5)
■ User-item rating/visiting matrix
[Figure: user-item matrix with users U0, …, Ui, Uj, …, Un as rows and spatial items V1, V2, V3, …, Vm-2, Vm-1, Vm as columns]
  Millions of locations around the world
  A user visits ~100 spatial items
  Recommendation queries target an area (a very specific subset)
[Figure: maps of user activity in New York City and Los Angeles]
User activity histories are locally clustered
A. Noulas, S. Scellato, C. Mascolo and M. Pontil: “An Empirical Study of Geographic User Activity Patterns in Foursquare.” ICWSM 2011
11
Challenge(3/5)
■ A user’s activities are very limited in distant locations
  The user may NOT get any recommendations in some areas
  Things get worse in NEW areas (small cities and abroad), which is where recommendations are needed the most
12
Challenge(4/5)
[Figure: users U1 to U4 and items V1 to V3 in Los Angeles; users U5 to U8 and items V4 to V6 in New York City; a gap separates the two groups]
User activity histories are locally clustered
New City Problem: When U3 travels to New York City, which is new to him because he has no activity history there, how can we recommend spatial items to him? In other words, how can we link the users on one side to the items on the other side?
Both user-based and item-based CF methods fail in this scenario.
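To see why co-visit based methods break down here, the following is a minimal sketch on toy data that mirrors the figure. The visit sets, the Jaccard similarity and the helper names are illustrative assumptions, not taken from the slides.

```python
# Hypothetical toy data mirroring the figure: U1-U4 are active only in
# Los Angeles (items V1-V3), U5-U8 only in New York City (items V4-V6).
visits = {
    "U1": {"V1", "V2"},
    "U2": {"V2", "V3"},
    "U3": {"V1", "V3"},
    "U4": {"V1", "V2", "V3"},
    "U5": {"V4", "V5"},
    "U6": {"V5", "V6"},
    "U7": {"V4", "V6"},
    "U8": {"V4", "V5", "V6"},
}
nyc_items = {"V4", "V5", "V6"}


def jaccard(a, b):
    """Co-visit similarity between two users' item sets (an assumed choice)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def user_cf_scores(user, candidates):
    """Score each candidate by the summed similarity of the users who visited it."""
    scores = {}
    for item in sorted(candidates):
        scores[item] = sum(
            jaccard(visits[user], items)
            for other, items in visits.items()
            if other != user and item in items
        )
    return scores


# U3 shares no visited item with any New York user, so every score is zero
# and user-based CF cannot rank the New York items at all.
scores = user_cf_scores("U3", nyc_items)
print(scores)
```

Item-based CF hits the same wall in this toy setting: no user has visited items on both sides, so every cross-city item-item similarity is zero as well.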
13
Challenge(5/5)
■ Existing latent factor models also fail to alleviate the new city problem. When these existing topic models are used to analyze user activity history data, the spatial items in each discovered topic are clustered by location, so the topics describe users' spatial areas of activity rather than interest-related features (e.g., categories and genres of spatial items) such as concert, film and exhibition.
Table 1: Topics discovered by LDA in an event-based social network
15
Framework of LCARS
16
Our Main Ideas (1/3)
[Figure: a recommender system that combines (1) user personal interests/preferences (e.g., movie, food, shopping) and (2) local preference]
For spatial item recommendation, we need to consider (1) the querying user’s interest; (2) the local preference of the querying city, i.e., the local word-of-mouth opinion for a spatial item in the querying city.
17
Our Main Ideas (2/3)
[Figure: user personal interests/preferences (movie, food, shopping) and the local preference in a querying city]
Main idea #1: Identify user interest using semantic information from the user activity history
Main idea #2: Discover local preference in a specific querying city
Main idea #3: Combine user interest and local preference for recommendation in a unified way
18
Our Main Ideas (3/3)
[Figure: users U1 to U4 and items V1 to V3 in Los Angeles; users U5 to U8 and items V4 to V6 in New York City; the two sides are connected through the content words of items, such as tags and categories (e.g., movie, shopping, nightlife)]
The users on one side and the items on the other side can be linked together by the item contents.
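A rough illustration of the bridging idea: even with no shared visits, a user's content words from one city can be matched against item content words in another. This is a plain word-overlap stand-in and not the LCA-LDA model; the item tags and helper names are hypothetical.

```python
from collections import Counter

# Hypothetical content words (tags/categories) attached to each spatial item.
item_words = {
    "V1": ["movie"],
    "V2": ["shopping"],
    "V3": ["movie", "nightlife"],
    "V4": ["movie"],
    "V5": ["shopping"],
    "V6": ["museum"],
}
u3_history = ["V1", "V3"]          # U3 has only visited Los Angeles items
nyc_items = ["V4", "V5", "V6"]     # candidates in the new city


def word_profile(history):
    """Normalised content-word frequencies over a user's visited items."""
    counts = Counter(w for v in history for w in item_words[v])
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def content_score(profile, item):
    """Average profile weight of the item's content words."""
    words = item_words[item]
    return sum(profile.get(w, 0.0) for w in words) / len(words)


profile = word_profile(u3_history)
ranked = sorted(nyc_items, key=lambda v: content_score(profile, v), reverse=True)
print(ranked[0])  # the New York item tagged "movie" comes first
```

The point of the sketch is only that content words give U3 a non-zero signal for New York items, which co-visit data alone cannot provide.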
19
Offline Modeling LCA-LDA Model
■ Some basic definitions
  User Profile: For each user $u$ in the dataset, we create a user profile $D_u$, which is a set of triples $\langle v, l_v, c_v \rangle$. $l_v$ denotes the location of item $v$ at the region level (e.g., city). $c_v$ is a content word, such as a tag or a category word, associated with $v$.
  Topic: Each topic $z$ in our work has two topic models, $\phi_z$ and $\phi'_z$. The former is a probability distribution over items (item IDs) and the latter is a probability distribution over content words.
  User Interest: The intrinsic interest of user $u$ is represented by $\theta_u$, a probability distribution over topics.
  Local Preference: The local preference in region $l_u$ is represented by $\theta'_{l_u}$, a probability distribution over topics.
20
The Generative Process of LCA-LDA
We use the LCA-LDA model to simulate the user's decision-making process for visiting behaviors.
22
Online Recommendation
■ Once we have inferred the LCA-LDA model parameters in the offline modeling phase, such as the user interest $\theta_u$, the local preference $\theta'_{l_u}$, the topics $\phi_z$ and $\phi'_z$, and the mixing weights $\lambda_u$, the online recommendation part computes a ranking score for each spatial item $v$ within the querying region $l_u$, and then returns the top-k ranked spatial items as the recommendations.
  The ranking score of $v$ w.r.t. query $(u, l_u)$
  The preference weight of query $(u, l_u)$ on topic $z$
  The score of item $v$ on topic $z$, i.e., $F(v, l_u, z)$
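The three quantities above can be combined as in the sketch below. The exact formula is not recoverable from the slide text, so two assumptions are made here: the ranking score is a weighted sum over topics, and the query's topic weight is a convex mix of user interest and local preference controlled by a mixing weight. All function names and numbers are illustrative.

```python
def topic_weights(theta_user, theta_region, lam):
    """Assumed form: mix user interest and local preference per topic."""
    return [lam * tu + (1.0 - lam) * tr
            for tu, tr in zip(theta_user, theta_region)]


def ranking_score(weights, item_topic_scores):
    """Assumed form: sum over topics of (query weight) * (item score on topic)."""
    return sum(w * f for w, f in zip(weights, item_topic_scores))


# Hypothetical three-topic example.
theta_user = [0.7, 0.2, 0.1]      # user interest over topics
theta_region = [0.1, 0.3, 0.6]    # local preference of the querying city
W = topic_weights(theta_user, theta_region, lam=0.5)

F_v = [0.2, 0.0, 0.4]             # item v's score on each topic
S = ranking_score(W, F_v)
print(W, S)
```

With non-negative weights, a score of this form never decreases when any per-topic item score increases, which is the monotonicity property that threshold-style top-k algorithms rely on.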
23
Naïve online algorithm
■ Given a query $(u, l_u)$
■ Compute the ranking scores for all items within the querying region
■ Find the best one, then the second best one, …, the k-th best one
■ Works for small-scale problems
■ Not feasible at large scale, e.g., when there are millions of items in the dataset
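The naive procedure can be sketched as a full scan followed by a top-k selection. The weighted-sum score and the toy per-topic scores are assumptions for illustration; `heapq.nlargest` is from the Python standard library.

```python
import heapq


def naive_top_k(weights, item_scores, k):
    """Score every item in the querying region, then keep the k best.

    weights: per-topic weights of the query (assumed non-negative).
    item_scores: dict mapping item id -> list of per-topic scores.
    """
    def score(item):
        return sum(w * f for w, f in zip(weights, item_scores[item]))

    # Every item is scored once, so the cost grows linearly with the
    # number of items in the region.
    return heapq.nlargest(k, item_scores, key=score)


# Hypothetical data: four items, three topics.
weights = [0.4, 0.25, 0.35]
item_scores = {
    "A": [0.5, 0.1, 0.0],
    "B": [0.1, 0.6, 0.2],
    "C": [0.0, 0.2, 0.7],
    "D": [0.3, 0.3, 0.3],
}
print(naive_top_k(weights, item_scores, k=2))
```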
24
Threshold-based Algorithm
■ For each region $l_u$, we pre-compute K sorted lists of spatial items, one per topic. In the list for topic $z$, the items are sorted by their score on topic $z$, i.e., $F(v, l_u, z)$.
■ Given a query $(u, l_u)$, we sequentially access the items in each sorted list and compute their ranking scores.
■ For each list, take the last item examined under sorted access. The threshold value is obtained by applying the ranking function to the per-topic scores of these last-seen items.
As soon as at least k items have been examined whose ranking scores are equal to or larger than the threshold value, halt. Return the k examined items with the highest ranking scores to the querying user.
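The steps above can be sketched as follows. This is a generic threshold-algorithm sketch under the assumption that the ranking score is a non-negative weighted sum of per-topic scores; the data and function names are hypothetical, and it is not the authors' implementation.

```python
import heapq


def build_sorted_lists(item_scores, num_topics):
    """Offline step: for each topic z, items sorted by their score on z (descending)."""
    return [
        sorted(item_scores, key=lambda v: item_scores[v][z], reverse=True)
        for z in range(num_topics)
    ]


def ta_top_k(weights, item_scores, sorted_lists, k):
    """Return (top-k items, number of items examined)."""
    def score(v):
        return sum(w * f for w, f in zip(weights, item_scores[v]))

    seen = set()
    top = []  # min-heap of (score, item) holding at most k entries
    for depth in range(len(item_scores)):
        # Sorted access: read position `depth` of every list.
        for lst in sorted_lists:
            v = lst[depth]
            if v not in seen:
                seen.add(v)
                entry = (score(v), v)
                if len(top) < k:
                    heapq.heappush(top, entry)
                elif entry > top[0]:
                    heapq.heapreplace(top, entry)
        # Threshold: the best score any unexamined item could still reach,
        # built from the last-seen item of each list.
        threshold = sum(
            w * item_scores[lst[depth]][z]
            for z, (w, lst) in enumerate(zip(weights, sorted_lists))
        )
        if len(top) == k and top[0][0] >= threshold:
            break
    return [v for _, v in sorted(top, reverse=True)], len(seen)


# Hypothetical data: six items, three topics.
weights = [0.4, 0.25, 0.35]
item_scores = {
    "A": [0.5, 0.1, 0.0],
    "B": [0.1, 0.6, 0.2],
    "C": [0.0, 0.2, 0.7],
    "D": [0.3, 0.3, 0.3],
    "E": [0.05, 0.05, 0.05],
    "F": [0.02, 0.01, 0.03],
}
lists = build_sorted_lists(item_scores, num_topics=3)
top_items, examined = ta_top_k(weights, item_scores, lists, k=2)
print(top_items, examined)
```

In this toy run the search stops after examining four of the six items, because the two low-scoring items cannot beat the current top 2 once the threshold drops below the second-best score.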
25
Nice Properties of TA
■ The TA algorithm is able to correctly find the top-k items by examining the minimum number of items, since the ranking function we defined is strictly monotone.
■ The threshold value is obtained by aggregating the maximum per-topic scores, represented by the last seen item in each list. Consequently, it is the maximum possible ranking score that can be achieved by the remaining unexamined items. Hence, if the smallest ranking score of the k examined items is no less than the threshold value, the algorithm can terminate immediately, because no remaining item can have a higher ranking score than the k items found.
27
Experimental Data Sets
■ Data Sets
  DoubanEvent: DoubanEvent is China’s largest event-based social networking site, where users can publish and participate in social events. This data set consists of 100,000 users, 300,000 events and 3,500,000 check-ins.
  Foursquare: This data set contains 11,326 users, 182,968 venues and 1,385,223 check-ins.
[Figure: User and Event Distributions over Cities in DoubanEvent]
28
Evaluation Method (1/2)
■ We design two real-world settings to evaluate the recommendation effectiveness of our LCA-LDA model:
  Querying cities are new cities to the querying users;
  Querying cities are home cities to the querying users.
■ We then divide a user’s activity history into a test set and a training set. We adopt two different dividing strategies for the two settings.
  For the first setting, we randomly select a visited non-home city as the new city, mark off all spatial items visited by the user in that city as the test set, and use the rest of the user's activity history in other cities as the training set.
  For the second setting, we randomly select 20% of the spatial items visited by the user in the user's home city as the test set, and use the rest of the user's activity history as the training set.
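The two dividing strategies can be sketched as below. The representation of a history as a list of unique (item, city) pairs, the function names and the rounding of the 20% fraction are assumptions made for illustration.

```python
import random


def split_new_city(history, home_city, rng):
    """Setting 1: hold out every item the user visited in one random non-home city."""
    other_cities = sorted({city for _, city in history if city != home_city})
    if not other_cities:
        return list(history), []
    new_city = rng.choice(other_cities)
    train = [(v, c) for v, c in history if c != new_city]
    test = [(v, c) for v, c in history if c == new_city]
    return train, test


def split_home_city(history, home_city, rng, test_fraction=0.2):
    """Setting 2: hold out a random 20% of the user's home-city items."""
    home = [(v, c) for v, c in history if c == home_city]
    n_test = int(round(test_fraction * len(home)))
    test = rng.sample(home, n_test)
    held_out = set(test)
    train = [pair for pair in history if pair not in held_out]
    return train, test


# Hypothetical activity history of one user whose home city is Beijing.
history = [
    ("V1", "Beijing"), ("V2", "Beijing"), ("V3", "Beijing"),
    ("V4", "Beijing"), ("V5", "Beijing"),
    ("V6", "Shanghai"), ("V7", "Shanghai"), ("V8", "Xi'an"),
]
rng = random.Random(0)
train_new, test_new = split_new_city(history, "Beijing", rng)
train_home, test_home = split_home_city(history, "Beijing", rng)
```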
29
Evaluation Method (2/2)
■ For each test case $(u, v, l_v)$ in the test set:
  Randomly select 1000 additional items located at $l_v$ and unrated by user $u$.
  Compute the ranking score for the test item $v$ as well as the additional 1000 spatial items.
  Form a ranked list by ordering the 1001 items according to their ranking scores. Let p denote the rank of the test item $v$ within this list (the best result: p = 0).
  Form a top-k recommendation list by picking the k top-ranked items from the list. If p < k, we have a hit; otherwise, we have a miss.
■ For any single test case, recall is either 0 (miss) or 1 (hit). The overall recall is defined by averaging over all test cases.
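The hit/miss protocol above reduces to a short computation once the rank p of each test item is known. In this sketch the tie-breaking rule (only strictly higher scores push the test item down) and the sample ranks are assumptions.

```python
def rank_of_test_item(test_score, other_scores):
    """Zero-based rank p: how many sampled items score strictly higher."""
    return sum(1 for s in other_scores if s > test_score)


def recall_at_k(ranks, k):
    """Average of per-test-case hits, where a hit means p < k."""
    hits = sum(1 for p in ranks if p < k)
    return hits / len(ranks)


# One hypothetical test case: the test item is beaten by two sampled items.
p = rank_of_test_item(0.30, [0.10, 0.45, 0.29, 0.31])

# Hypothetical ranks from four test cases.
ranks = [0, 3, 12, 7]
print(recall_at_k(ranks, k=5), recall_at_k(ranks, k=10))
```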
30
Baseline Methods
■ USG: A unified location recommendation framework which linearly fuses User interest with Social influence and Geographical influence.
■ User-based CF methods
  CKNN: A Category-based k-Nearest Neighbors algorithm. CKNN projects a user's activity history into the category space and models user preference using a weighted category hierarchy. The similarity between two users in CKNN is computed according to their weights in the category hierarchy.
  IKNN: An Item-based k-Nearest Neighbors algorithm. The similarity between two users is computed as the cosine similarity between the two users' item vectors.
■ LDA: A user is viewed as a document, and the items visited by the user are viewed as words in the document.
■ Location-Aware LDA (LA-LDA): One component of LCA-LDA
■ Content-Aware LDA (CA-LDA): Another component of LCA-LDA
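For the IKNN baseline, the user-user similarity is a cosine between item vectors. A minimal sketch follows, assuming each user is stored as a sparse dict from item id to visit weight; the representation and example values are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine between two sparse item vectors given as {item_id: weight} dicts."""
    dot = sum(w * b[v] for v, w in a.items() if v in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Two users sharing one of their two visited items.
user_a = {"V1": 1.0, "V2": 1.0}
user_b = {"V2": 1.0, "V3": 1.0}
# A user from another city with no item in common.
user_c = {"V4": 1.0, "V5": 1.0}

sim_ab = cosine_similarity(user_a, user_b)
sim_ac = cosine_similarity(user_a, user_c)
print(sim_ab, sim_ac)
```

The second similarity is exactly zero, which is the same disjoint-region effect described under the new city problem.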
32
Experimental Results
■ Recommendation Effectiveness
33
Experimental Results
■ Recommendation Effectiveness
34
Experimental Results
■ Efficiency of online recommendation; the querying cities are Beijing and Shanghai
35
Experimental Results
■ To see the performance of LCARS more clearly, we zoom in on the results as follows.
36
Latent Information Analysis
38
Conclusion
■ Spatial Item Recommendations
  Data sparsity is a big challenge in recommender systems
  The new city problem amplifies the data sparsity challenge
  The mobile scenario requires the recommender system to generate real-time responses to user queries
■ Our Solution – LCARS
  Exploit the Local Preference of the querying city to alleviate the data sparsity. Local word-of-mouth is a valuable resource for making a recommendation.
  Take advantage of the Content Information of items to overcome the sparsity. The contents build a bridge between users and items from disjoint regions.
  Extend the Threshold-based algorithm (TA) to produce fast online recommendations
■ Result
  LCARS can produce more effective and more efficient recommendations
39
Thanks
Q&A