1 LCARS: A Location-Content-Aware Recommender System Hongzhi Yin †, Yizhou Sun ‡, Bin Cui † Zhiting Hu †, Ling Chen † Peking University ‡ Northeastern

1

LCARS: A Location-Content-Aware Recommender System

Hongzhi Yin†, Yizhou Sun‡, Bin Cui†, Zhiting Hu†, Ling Chen§

†Peking University, ‡Northeastern University, §University of Technology, Sydney

2

Outline

■ Introduction
  Background
  Challenges

■ Our Solution – LCARS
  Offline Modeling – LCA-LDA
  Online Recommendation – TA algorithm

■ Experiments
  Experimental Setup
  Experimental Results

■ Conclusions


4

Background

■ Location-based Social Networks (LBSNs), e.g., Facebook Places, Loopt, Foursquare

Users share photos, comments or check-ins at a location

Expanded rapidly, e.g., Foursquare gets over 3 million check-ins every day

5

Background

6

Background

■ Event-based Social Networks (EBSNs), e.g., Meetup.com


8

Problem Definition

■ We aim to mine useful knowledge from the user activity history data in LBSNs and EBSNs to answer two typical questions in our daily life

If we want to visit venues in a city such as Beijing, where should we go?

If we want to attend local events such as dramas and exhibitions in a city, which events should we attend?

For simplicity, we propose the notion of spatial items to denote both venues and events in a unified way, so that we can define our problem as follows: given a querying user $u$ with a querying city $l_u$, find k interesting spatial items within $l_u$ that match the preference of $u$.

9

Challenge(1/5)

■ Spatial Item Recommendations in LBSN and EBSN

■ Existing Solutions
  Based on item/user collaborative filtering: similar users give similar ratings to similar items
  Latent factor models

[Figure: users visit spatial items; the resulting user activity histories are used to build recommendation models (similar users, similar items), which return recommendations for a querying user and querying city]

Existing solutions are based on the model of co-rating and co-visit. So, what is the PROBLEM here? Why?

Mao Ye, Peifeng Yin, Wang-Chien Lee: “Location recommendation for location-based social networks.” GIS 2010

Justin J. Levandoski, Mohamed Sarwat, Ahmed Eldawy, Mohamed F. Mokbel: “LARS: A Location-Aware Recommender System.” ICDE 2012

10

Challenge(2/5)

■ User-item rating/visiting matrix
  Millions of locations around the world
  A user visits ~100 spatial items
  Recommendation queries target an area (a very specific subset)
  User activity histories are locally clustered

[Figure: user-item matrix with users U0, …, Ui, Uj, …, Un as rows and spatial items V1, V2, V3, …, Vm-2, Vm-1, Vm as columns; user activity maps for New York City and Los Angeles]

A. Noulas, S. Scellato, C. Mascolo, M. Pontil: “An Empirical Study of Geographic User Activity Patterns in Foursquare.” ICWSM 2011

11

Challenge(3/5)

■ User’s activities are very limited in distant locations
  May NOT get any recommendations in some areas
  Things can get worse in NEW areas (small cities and abroad), where you need recommendations the most

12

Challenge(4/5)

[Figure: users U1–U4 and items V1–V3 in Los Angeles; users U5–U8 and items V4–V6 in New York City; a gap separates the two groups]

User activity histories are locally clustered

New City Problem: When U3 travels to New York City, which is new to her since she has no activity history there, how can we recommend spatial items to her? In other words, how can we link the users on one side to the items on the other side?

Both user-based and item-based CF methods would fail in this scenario.

13

Challenge(5/5)

■ Existing latent factor models also fail to alleviate the new city problem. When we use existing topic models to analyze user activity history data, the spatial items in the discovered topics are clustered by their locations. As a result, the topics describe the users' spatial areas of activity rather than interest-related features of spatial items (e.g., categories and genres such as concert, film and exhibition).

Table 1: Topics discovered by LDA in an event-based social network


15

Framework of LCARS

16

Our Main Ideas (1/3)

[Figure: recommender system drawing on 1. user personal interests/preferences (movie, food, shopping) and 2. local preference]

For spatial item recommendation, we need to consider (1) the querying user’s interest; (2) the local preference of the querying city, i.e., the local word-of-mouth opinion for a spatial item in the querying city.

17


[Figure: user personal interests/preferences (movie, food, shopping) and local preference in a querying city]

Main idea #1: Identify user interest using semantic information from the user activity history

Main idea #2: Discover local preference in a specific querying city

Main idea #3: Combine user interest & local preference for recommendation in a unified way

18

[Figure: users U1–U4 and items V1–V3 in Los Angeles; users U5–U8 and items V4–V6 in New York City; both sides are connected through the content words of items]

Content Words of Items: such as tags and categories (e.g., movie, shopping, nightlife)

The users on one side and the items on the other side can be linked together by the item contents.

19

Offline Modeling LCA-LDA Model

■ Some basic definitions
  User Profile: For each user $u$ in the dataset, we create a user profile $D_u$, which is a set of triples $\langle v, l_v, c_v \rangle$. $l_v$ denotes the location of item $v$ at the region level (e.g., city). $c_v$ is a content word, such as a tag or a category word, associated with $v$.
  Topic: Each topic $z$ in our work has two topic models, $\phi_z$ and $\phi'_z$. The former is a probability distribution over items (item IDs) and the latter is a probability distribution over content words.
  User Interest: The intrinsic interest of user $u$ is represented by $\theta_u$, a probability distribution over topics.
  Local Preference: The local preference in region $l_u$ is represented by $\theta'_{l_u}$, a probability distribution over topics.

20

The Generative Process of LCA-LDA

We use the LCA-LDA model to simulate the user's decision-making process for visiting behaviors.
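The slide's diagram of the generative process did not survive extraction, so the following is only a rough sketch of how one visiting decision could be simulated from the quantities named elsewhere in the deck (user interest, local preference, mixing weights, and the two topic models). The function name `sample_activity`, the dictionary layout of the parameters, and the exact order of the draws are assumptions made for illustration, not the authors' specification.

```python
import random


def sample_activity(u, l_u, lam, theta, theta_local, phi, phi_content, rng):
    """Simulate one visit of user u in region l_u (illustrative sketch).

    lam[u]          assumed mixing weight: chance of following personal interest
    theta[u]        user interest, a dict topic -> probability
    theta_local[l]  local preference of region l, a dict topic -> probability
    phi[z]          topic model over items, a dict item -> probability
    phi_content[z]  topic model over content words, a dict word -> probability
    """
    # Coin toss: follow the user's own interest or the region's local preference.
    topic_dist = theta[u] if rng.random() < lam[u] else theta_local[l_u]

    # Draw a topic from the chosen distribution.
    topics = list(topic_dist)
    z = rng.choices(topics, weights=[topic_dist[t] for t in topics])[0]

    # Draw an item and a content word from the two models of that topic.
    items = list(phi[z])
    v = rng.choices(items, weights=[phi[z][i] for i in items])[0]
    words = list(phi_content[z])
    c = rng.choices(words, weights=[phi_content[z][w] for w in words])[0]
    return z, v, c
```

With a mixing weight of 1.0 the sketch always follows personal interest, and with 0.0 it always follows the local preference, which is the intuition behind combining the two signals.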


22

Online Recommendation

■ Once we have inferred the model parameters of the LCA-LDA model in the offline modeling phase, such as the user interest $\theta_u$, the local preference $\theta'_{l_u}$, the topics $\phi_z$ and $\phi'_z$, and the mixing weights $\lambda_u$, the online recommendation part computes a ranking score for each spatial item $v$ within the querying region $l_u$, and then returns the top-k ranked spatial items as the recommendations.

The ranking score of $v$ w.r.t. query $(u, l_u)$: $S(u, l_u, v) = \sum_z W(u, l_u, z)\, F(v, l_u, z)$

  $W(u, l_u, z)$: the preference weight of query $(u, l_u)$ on topic $z$

  $F(v, l_u, z)$: the score of item $v$ on topic $z$
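The formulas on this slide were lost in extraction, so the sketch below only illustrates the structure the labels describe: a per-topic query weight multiplied by a per-topic item score, summed over topics. The form of the weight (a mixture of user interest and local preference controlled by the mixing weight) is an assumption based on the parameters listed above, and the per-topic item scores are taken as given inputs rather than derived from the topic models.

```python
def preference_weight(lam_u, theta_u, theta_local, z):
    """Assumed query weight on topic z: mix of user interest and local preference."""
    return lam_u * theta_u[z] + (1.0 - lam_u) * theta_local[z]


def ranking_score(item_topic_scores, lam_u, theta_u, theta_local):
    """Sum over topics of (query weight on topic) * (item score on topic).

    item_topic_scores maps each topic z to the item's score on z, which
    is treated here as a precomputed number.
    """
    return sum(
        preference_weight(lam_u, theta_u, theta_local, z) * f
        for z, f in item_topic_scores.items()
    )
```

Because the item scores do not depend on the querying user, they can be computed offline, leaving only the weights to be evaluated per query.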

23

Naïve online algorithm

■ Given a query $(u, l_u)$

■ Compute the ranking scores for all items within the querying region

■ Find the best one, then the second best one, …, the k-th best one

■ Good for small-scale problems

■ Not feasible for large-scale problems, e.g., when there are millions of items in the dataset
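The naïve procedure above can be sketched in a few lines. This is an illustrative version only: it assumes each item's per-topic scores and the query's per-topic weights are already available as plain lists, and the name `naive_top_k` is hypothetical.

```python
import heapq


def naive_top_k(items, weights, k):
    """Score every item in the querying region and keep the k best.

    items   maps item id -> list of per-topic scores
    weights is the list of per-topic query weights, in the same topic order
    """
    def score(v):
        return sum(w * f for w, f in zip(weights, items[v]))

    # Every item is scored exactly once, so the cost grows linearly
    # with the number of items in the region.
    return heapq.nlargest(k, items, key=score)
```

The linear scan over all items is what makes this approach impractical when a region contains millions of items.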

24

Threshold-based Algorithm

■ For each region $l$, we pre-compute K sorted lists of spatial items, one per topic. In each list $L_z$, the items are sorted by their score on topic z, i.e., $F(v, l, z)$.

■ Given a query $(u, l_u)$, we sequentially access the items in each sorted list and compute their ranking scores.

■ For each list $L_z$, let $v_z$ be the last item examined under sorted access. Define the threshold value as follows:

$S_{Ta} = \sum_z W(u, l_u, z)\, F(v_z, l_u, z)$, where $W(u, l_u, z)$ is the preference weight of the query on topic z.

As soon as at least k items have been examined whose ranking scores are greater than or equal to the threshold value, halt. Let $R$ be a list containing the k examined items with the highest ranking scores. Return $R$ to the querying user.
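The steps above can be sketched as a standard round-robin threshold algorithm. This is a minimal illustration, not the authors' implementation: it assumes non-negative query weights, per-topic item scores held in memory as lists, and one sorted access per list per round. The names `build_sorted_lists` and `ta_top_k` are hypothetical.

```python
import heapq


def build_sorted_lists(items, num_topics):
    """Offline step: one list per topic, items sorted by descending topic score."""
    return [
        sorted(((scores[z], v) for v, scores in items.items()), reverse=True)
        for z in range(num_topics)
    ]


def ta_top_k(items, sorted_lists, weights, k):
    """Return (top-k item ids, number of items examined)."""
    def score(v):
        return sum(w * f for w, f in zip(weights, items[v]))

    seen = set()
    top = []  # min-heap holding the best k (score, item) pairs found so far
    for depth in range(len(items)):
        # Sorted access: read the next entry of every list.
        for lst in sorted_lists:
            _, v = lst[depth]
            if v not in seen:
                seen.add(v)
                entry = (score(v), v)
                if len(top) < k:
                    heapq.heappush(top, entry)
                elif entry > top[0]:
                    heapq.heapreplace(top, entry)
        # Threshold: best score any unexamined item could still reach.
        threshold = sum(w * lst[depth][0] for w, lst in zip(weights, sorted_lists))
        if len(top) >= k and top[0][0] >= threshold:
            break
    return [v for _, v in sorted(top, reverse=True)], len(seen)
```

On skewed score distributions the loop typically stops after reading only a short prefix of each list, which is where the savings over a full scan come from.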

25

Nice Properties of TA

■ The TA algorithm is able to correctly find the top-k items by examining the minimum number of items, since the ranking function we define is strictly monotone.

■ The threshold value is obtained by aggregating the maximum scores represented by the last seen item in each list $L_z$. Consequently, it is the maximum possible ranking score that can be achieved by the remaining unexamined items. Hence, if the smallest ranking score of the k examined items is no less than the threshold value, the algorithm can terminate immediately because no remaining item will have a higher ranking score than the k items already found.


27

Experimental Data Sets

■ Data Sets
  DoubanEvent: DoubanEvent is China’s largest event-based social networking site, where users can publish and participate in social events. This data set consists of 100,000 users, 300,000 events and 3,500,000 check-ins.
  Foursquare: This data set contains 11,326 users, 182,968 venues and 1,385,223 check-ins.

[Figure: User and Event Distributions over Cities in DoubanEvent]

28

Evaluation Method (1/2)

■ We design two real settings to evaluate the recommendation effectiveness of our LCA-LDA model:
  Querying cities are new cities to querying users;
  Querying cities are home cities to querying users.

■ We then divide a user’s activity history into a test set and a training set. We adopt two different dividing strategies with respect to the two settings.
  For the first setting, we randomly select a visited non-home city as the new city, mark off all spatial items visited by the user in that city as the test set, and use the rest of the user's activity history in other cities as the training set.
  For the second setting, we randomly select 20% of the spatial items visited by the user in his or her home city as the test set, and use the rest of the user's activity history as the training set.
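The two dividing strategies can be sketched as follows. This is an illustrative version under simple assumptions: a user's history is a list of unique (item, city) pairs, the home city is known, and the function names are hypothetical.

```python
import random


def split_new_city(history, home_city, rng):
    """First setting: hold out everything the user did in one non-home city."""
    candidates = sorted({city for _, city in history if city != home_city})
    if not candidates:
        return list(history), []
    new_city = rng.choice(candidates)
    test = [(v, c) for v, c in history if c == new_city]
    train = [(v, c) for v, c in history if c != new_city]
    return train, test


def split_home_city(history, home_city, rng, test_fraction=0.2):
    """Second setting: hold out a fraction of the user's home-city items."""
    home = [(v, c) for v, c in history if c == home_city]
    test = rng.sample(home, int(len(home) * test_fraction))
    held_out = set(test)
    # Assumes records are unique, so membership identifies the held-out ones.
    train = [record for record in history if record not in held_out]
    return train, test
```

In the first setting the training set contains no activity at all in the test city, which is what makes it a new-city test.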

29

Evaluation Method (2/2)

■ For each test case $(u, v, l_v)$ in the test set:
  Randomly select 1000 additional items located at $l_v$ and unrated by user $u$.
  Compute the ranking score for the test item $v$ as well as the additional 1000 spatial items.
  Form a ranked list by ordering the 1001 items according to their ranking scores. Let p denote the rank of the test item $v$ within this list. (The best result: p=0.)
  Form a top-k recommendation list by picking the k top-ranked items from the list. If p<k we have a hit. Otherwise we have a miss.

■ For any single test case, recall can take either the value 0 (miss) or 1 (hit). The overall recall is defined by averaging over all test cases.
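The hit/miss protocol above reduces to a short computation once the scores are available. The sketch below assumes each test case is given as the score of the held-out item together with the scores of the sampled additional items, and it resolves ties in favor of the test item, which is a choice the slides do not specify.

```python
def hit_at_k(test_score, candidate_scores, k):
    """Return 1 if the test item ranks within the top k, else 0."""
    # p is the zero-based rank: how many sampled items score strictly higher.
    p = sum(1 for s in candidate_scores if s > test_score)
    return 1 if p < k else 0


def recall_at_k(test_cases, k):
    """Average the per-case hits over all (test_score, candidate_scores) pairs."""
    hits = [hit_at_k(t, candidates, k) for t, candidates in test_cases]
    return sum(hits) / len(hits)
```

For example, with two test cases where the held-out item ranks first in one and third in the other, recall is 0.5 at k=1 and 1.0 at k=3.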

30

Baseline Methods

■ USG: A unified location recommendation framework which linearly fuses User interest along with Social influence and Geographical influence.

■ User-based CF methods
  CKNN: A Category-based k-Nearest Neighbors algorithm. CKNN projects a user's activity history into the category space and models user preference using a weighted category hierarchy. The similarity between two users in CKNN is computed according to their weights in the category hierarchy.
  IKNN: An Item-based k-Nearest Neighbors algorithm. The similarity between two users is computed by the cosine similarity between the two users' item vectors.

■ LDA: A user is viewed as a document, and the items visited by her are viewed as words in the document.

■ Location-Aware LDA (LA-LDA): One component of LCA-LDA

■ Content-Aware LDA (CA-LDA): Another component of LCA-LDA
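As a small illustration of the item-vector similarity used by the IKNN baseline, the sketch below computes cosine similarity between two users represented as sparse item vectors. The representation (a dict from item id to visit count) and the function name are assumptions made for this example.

```python
import math


def cosine_similarity(items_a, items_b):
    """Cosine similarity between two users' sparse item vectors."""
    shared = items_a.keys() & items_b.keys()
    dot = sum(items_a[v] * items_b[v] for v in shared)
    norm_a = math.sqrt(sum(x * x for x in items_a.values()))
    norm_b = math.sqrt(sum(x * x for x in items_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Two users whose histories lie in different cities share no items, so their similarity is exactly 0.0, which shows why item-vector similarity gives no signal in the new city setting.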


32

Experimental Results

■ Recommendation Effectiveness

33

■ Recommendation Effectiveness


34

■ Efficiency of online recommendation, querying cities are Beijing and Shanghai


35


■ To see the performance of LCARS more clearly, we zoom in on the results as follows.

36

Latent Information Analysis


38

Conclusion

■ Spatial Item Recommendations
  Data sparsity is a big challenge in recommendation systems.
  The new city problem amplifies the data sparsity challenge.
  The mobile scenario requires the recommender system to generate real-time responses to user queries.

■ Our Solution - LCARS
  Exploit the Local Preference of the querying city to alleviate data sparsity. Local word-of-mouth is a valuable resource for making a recommendation.
  Take advantage of the Content Information of items to overcome the sparsity. The contents build a bridge between users and items from disjoint regions.
  Extend the Threshold-based Algorithm (TA) to produce fast online recommendations.

■ Result
  LCARS can produce more effective and more efficient recommendations.

39

Thanks

Q&A
