Multiple Location Profiling for Users and Relationships
from Social Network and Content
Rui Li, Shengjie Wang, Kevin Chen-Chuan Chang
University of Illinois at Urbana-Champaign
2
Users’ locations are important for many information services:
• Local Content Recommendation
• Local Friends Recommendation
• and many others.
[Figure: a user, Carol (lives in: Los Angeles), a social network, and a content provider]
3
The community has explored social networks and content to profile users’ locations.
Profiling a User’s Home Location
Location: Los Angeles
Tweets:
• Terrible LA traffic!
• Want to go to Honolulu for Spring vacation!
• See Gaga in Hollywood.
• Good Morning!
Social Network: Mike (LA), Carol (?), Lucy (Austin), Gaga (NY), Bob (San Diego), Jean (?)
4
Problem 1: They profile only a single home location.
Locations of a user’s friends
Locational Word Frequencies
Paramount 1
Los Angeles 1
Hollywood 2
Austin 2
Tweeted Locational Words
Carol lives in Los Angeles and studied at the University of Texas at Austin
o incomplete
o inaccurate
5
Problem 2: They completely miss profiling relationships.
Relationships Profiling
Carol follows Bob: both Carol and Bob work in Los Angeles
Carol follows Lucy: both Carol and Lucy studied in Austin
Carol tweets Hollywood: Carol lives in Los Angeles
o useful!
6
We focus on multiple location profiling for users and relationships.
Carol in the real world
Location: Los Angeles
Education: University of Texas at Austin
Carol’s Location Profile: Los Angeles, Austin
Carol follows Lucy: Austin, Austin
7
Our approach is to build a model to connect known relationships with unknown locations.
Known Relationships
Following Relationships
Carol follows Lucy
Carol follows Mike
….
Tweeting Relationships
Carol tweets Hollywood
Carol tweets Honolulu
….
Users’ Locations
?
Unknown Locations
MLP Model
Generation Model
Inference Algorithm
8
There are three challenges in building MLP.
Challenge 1: How to connect users’ locations with relationships?
A. From users’ locations to following relationships
B. From users’ locations to tweeting relationships
Challenge 2: How to model that the relationships are mixed?
A. Some relationships are not based on locations.
B. Each relationship is based on a different location.
Challenge 3: How to utilize home locations from labeled users?
9
Challenge 1.A: We need to connect following relationships with two users’ locations.
Even a user with only one location follows others from different locations.
Following Probability
Carol in Los Angeles follows Bob in San Diego: 20%
Carol in Los Angeles follows Mike in Los Angeles: 30%
…
We define the following probability as the probability of generating a following relationship from one user to another based on their locations.
10
Observation: We explore the following probability by investigating a corpus.
• It captures our intuition well.
• It fits a power law distribution.
11
Solution: We derive a location-based following model for the following probability.
The location-based following model
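As a rough illustration of what such a model could look like, the sketch below assumes the power law is taken over the distance d between the two users' locations, so that the following probability is beta * d^alpha. The parameter values, the distance floor, the city coordinates, and the function names are all hypothetical; the slides give no fitted values.

```python
import math

# Hypothetical parameters for illustration only; the slides give no fitted values.
ALPHA = -0.5   # assumed exponent (negative: probability decays with distance)
BETA = 0.005   # assumed scale factor


def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 3958.8 * math.asin(math.sqrt(h))


def following_probability(loc_i, loc_j, alpha=ALPHA, beta=BETA, min_dist=1.0):
    """Power-law following probability beta * d^alpha, with d floored at min_dist."""
    d = max(haversine_miles(loc_i, loc_j), min_dist)
    return min(1.0, beta * d ** alpha)


# Approximate city coordinates, used only for this example.
los_angeles = (34.05, -118.24)
san_diego = (32.72, -117.16)
austin = (30.27, -97.74)

for name, loc in [("Los Angeles", los_angeles), ("San Diego", san_diego), ("Austin", austin)]:
    print(name, following_probability(los_angeles, loc))
```

Under these assumptions a user in Los Angeles is most likely to follow someone in Los Angeles, less likely someone in San Diego, and least likely someone in Austin, yet every pair keeps a nonzero probability.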
12
Challenge 1.B: We need to connect tweeting relationships with a user’s location.
A user at one location tweets about different locations.
We define the tweeting probability as the probability of generating a tweeting relationship from a user to a venue based on the user’s location.
Probability of Tweeting
Carol in Los Angeles tweets about watching a show in Hollywood: 30%
Carol in Los Angeles tweets about traffic in Los Angeles: 40%
…
13
Observation: We explore the tweeting probability by investigating a corpus.
• The tweeting probabilities capture our intuition well.
• They can be modeled as a set of multinomial distributions.
14
Solution: We derive a location-based tweeting model for the tweeting probability.
The location-based tweeting model
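One way to picture "a set of multinomial distributions" is one distribution over venues per location, estimated from counts. The sketch below is a minimal illustration under that reading; the additive smoothing, the function name, and the toy observations are assumptions, not taken from the slides.

```python
from collections import Counter, defaultdict


def fit_tweeting_model(observations, smoothing=1.0):
    """Estimate P(venue | location) from (location, venue) pairs.

    Uses additive smoothing so that every venue keeps a nonzero probability
    at every location (an assumption made for this sketch).
    """
    counts = defaultdict(Counter)
    venues = set()
    for location, venue in observations:
        counts[location][venue] += 1
        venues.add(venue)
    model = {}
    for location, venue_counts in counts.items():
        total = sum(venue_counts.values()) + smoothing * len(venues)
        model[location] = {
            v: (venue_counts[v] + smoothing) / total for v in sorted(venues)
        }
    return model


# Toy, made-up observations: (user's location, venue mentioned in a tweet).
observations = [
    ("Los Angeles", "Los Angeles"), ("Los Angeles", "Los Angeles"),
    ("Los Angeles", "Hollywood"), ("Los Angeles", "Honolulu"),
    ("Austin", "Austin"), ("Austin", "Austin"), ("Austin", "Hollywood"),
]
model = fit_tweeting_model(observations)
print(model["Los Angeles"])
```

Each row of `model` is one multinomial: the probabilities for a given location sum to 1 across all venues.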
15
Challenge 2.A: There are both noisy and location-based relationships.
Noisy Relationships
Carol follows Lady Gaga
Carol tweets Honolulu
Location-based Relationships
Carol follows Lucy
Carol tweets Los Angeles
Noisy relationships are not useful!
16
Solution: We propose a mixture component for two types of relationships.
1. A relationship is generated based on either a location-based model or a random model.
2. A binary model selector μ indicates which model is used.
3. The selector is generated via a binomial distribution.
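The three steps above can be sketched as a small generative routine. A binary selector corresponds to a single-trial binomial (Bernoulli) draw; the mixing weight `rho`, the two stand-in models, and their venue lists are hypothetical placeholders chosen for illustration.

```python
import random


def generate_relationship(rho, location_based_model, random_model, rng):
    """Draw selector mu ~ Bernoulli(rho), then generate from the chosen model.

    mu == 1 means the location-based model is used, mu == 0 the random model.
    """
    mu = 1 if rng.random() < rho else 0
    source = location_based_model if mu == 1 else random_model
    return mu, source(rng)


def location_based(rng):
    # Stand-in: venues close to an assumed Los Angeles location.
    return rng.choice(["Los Angeles", "Hollywood"])


def random_model(rng):
    # Stand-in: any venue, regardless of the user's location.
    return rng.choice(["Los Angeles", "Hollywood", "Honolulu", "Austin", "New York"])


rng = random.Random(0)
samples = [generate_relationship(0.7, location_based, random_model, rng)
           for _ in range(1000)]
location_share = sum(mu for mu, _ in samples) / len(samples)
print(f"share generated by the location-based model: {location_share:.2f}")
```

With this structure, a tweet about Honolulu can only come from the random model, which matches the idea of separating noisy relationships from location-based ones.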
17
Challenge 2.B Location-based relationships are related to multiple locations.
Location-based relationships
Carol follows Lucy: both Carol and Lucy studied in Austin
Carol tweets Hollywood: Carol lives in Los Angeles
Accurate! Complete!
18
Solution: We fundamentally model users’ multiple locations in generating relationships.
Carol
{Los Angeles 0.1, Austin 0.1, … }
A location profile is a multinomial distribution over locations.
Each relationship is based on one particular location from the user’s profile.
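A minimal sketch of this idea: a profile is a dictionary of location probabilities, and each relationship draws its own location assignment from it. The probabilities below are made up for the example (the slide elides the actual values), and `sample_assignment` is a hypothetical helper name.

```python
import random


def sample_assignment(profile, rng):
    """Draw one location for a relationship from a user's location profile."""
    locations = list(profile)
    weights = [profile[loc] for loc in locations]
    return rng.choices(locations, weights=weights, k=1)[0]


# Made-up profile for illustration; the probabilities sum to 1.
carol_profile = {"Los Angeles": 0.6, "Austin": 0.3, "New York": 0.1}

rng = random.Random(0)
draws = [sample_assignment(carol_profile, rng) for _ in range(2000)]
for loc in carol_profile:
    print(loc, draws.count(loc) / len(draws))
```

Because every relationship gets its own draw, one user's relationships can be explained by different locations, for example "follows Lucy" by Austin and "tweets Hollywood" by Los Angeles.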
19
Challenge 3: We should utilize observed locations from some users’ profiles.
Mike (LA), Carol (?), Lucy (Austin), Gaga (NY), Bob (San Diego), Jean (?)
• They are useful for profiling locations!
• We cannot use them directly to generate relationships!
20% of users provide their home locations in their profiles.
Solution: We utilize observed locations as priors to generate users’ profiles.
Bob
{San Diego 0.9, Los Angeles 0.05, …}
We assume users’ profiles are generated from prior distributions.
Under these priors, a user’s observed home location is likely to be generated.
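The slides do not name the prior family. As one plausible reading, the sketch below assumes a Dirichlet prior (the usual conjugate prior for a multinomial) whose pseudo-count is boosted at the observed home location. The `base` and `boost` values, the candidate list, and the function names are hypothetical.

```python
import random


def prior_pseudo_counts(candidates, home=None, base=0.5, boost=5.0):
    """Build Dirichlet pseudo-counts, boosting the observed home location if any."""
    return {loc: base + (boost if loc == home else 0.0) for loc in candidates}


def sample_profile(pseudo_counts, rng):
    """Draw a location profile from Dirichlet(pseudo_counts) via normalized Gamma draws."""
    draws = {loc: rng.gammavariate(a, 1.0) for loc, a in pseudo_counts.items()}
    total = sum(draws.values())
    return {loc: g / total for loc, g in draws.items()}


rng = random.Random(0)
candidates = ["San Diego", "Los Angeles", "Austin", "New York"]

# Bob lists San Diego as his home location; Carol lists nothing.
bob_prior = prior_pseudo_counts(candidates, home="San Diego")
carol_prior = prior_pseudo_counts(candidates)

bob_profile = sample_profile(bob_prior, rng)
print(bob_profile)
```

The observed location only shapes the prior here, so a labeled user such as Bob can still end up with probability mass on other locations, while an unlabeled user such as Carol starts from a symmetric prior.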
21
Therefore, we arrive at a complete model.
22
We evaluate our model on a large Twitter corpus.
We crawled a subset of Twitter: 139K users, 50 million tweets, and 2 million following relationships.
23
Task 1: In profiling users’ home locations, MLP performs accurately and improves on the baselines.
24
Task 2: In profiling users’ multiple locations, MLP performs accurately and completely.
Precision and Recall at Rank 2
Case Studies
Locations in a similar region
Locations in different areas
Accurately
Completely
25
Task 3: In profiling following relationships, MLP achieves 57% accuracy.
26
Thanks and Questions!
27
Backup for Questions
28
Experiments 1
• We use the home location provided in users’ profiles as ground truth.
• We compare against two baseline methods proposed in the literature.
29
Experiments 2
• We manually labeled the locations of 1000 users and obtained 585 users who clearly have multiple locations.
• We compare against the same baseline methods as in the previous task.
• We measure the performance in terms of “precision” and “recall”.
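For reference, precision and recall at rank k can be computed as in the sketch below. It assumes exact string matches between predicted and labeled locations; the slides do not say how a match is decided, so that criterion and the function name are assumptions.

```python
def precision_recall_at_k(ranked, truth, k=2):
    """Precision and recall of the top-k predicted locations against labeled ones."""
    top_k = ranked[:k]
    hits = sum(1 for loc in top_k if loc in truth)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


# Carol's labeled locations, following the running example.
truth = {"Los Angeles", "Austin"}

both_found = precision_recall_at_k(["Los Angeles", "Austin", "New York"], truth)
one_found = precision_recall_at_k(["Los Angeles", "New York", "Austin"], truth)
print(both_found, one_found)
```

A method that returns only a home location can reach at most 50% recall for a user like Carol, which is why recall reflects completeness and precision reflects accuracy.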
30
Experiments 3
• We manually labeled the location assignments of relationships for the 585 users whose multiple locations are known to us, and obtained 4426 relationships.
• We design a meaningful baseline method, which profiles a relationship based on the users’ home locations.
31
MLP defines the joint probability of observations, parameters, and latent variables.
We infer users’ locations and location assignments from the observed relationships and the given parameters.
We treat users’ locations and the location assignments of relationships as latent variables in the joint probability.
We develop our inference algorithm based on the Gibbs sampling method.
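To make the inference step more concrete, here is a heavily simplified Gibbs sampler, not the MLP algorithm itself. It assumes a single user, tweeting relationships only, a fixed and known P(venue | location), and Dirichlet pseudo-counts as the prior. Each sweep resamples the location assignment of every relationship in proportion to (count of other relationships at that location + pseudo-count) times the venue probability. All names and numbers are hypothetical.

```python
import random


def gibbs_profile(venues, tweet_model, pseudo_counts, sweeps=200, seed=0):
    """Sample location assignments for one user's tweeted venues.

    Returns the profile implied by the final sample and the assignments.
    """
    rng = random.Random(seed)
    locations = list(pseudo_counts)
    # Random initial location assignment for each tweeting relationship.
    z = [rng.choice(locations) for _ in venues]
    counts = {loc: 0 for loc in locations}
    for loc in z:
        counts[loc] += 1
    for _ in range(sweeps):
        for i, venue in enumerate(venues):
            counts[z[i]] -= 1  # take relationship i out of the counts
            weights = [
                (counts[loc] + pseudo_counts[loc]) * tweet_model[loc].get(venue, 1e-9)
                for loc in locations
            ]
            z[i] = rng.choices(locations, weights=weights, k=1)[0]
            counts[z[i]] += 1  # put it back under the new assignment
    total = sum(counts.values()) + sum(pseudo_counts.values())
    profile = {loc: (counts[loc] + pseudo_counts[loc]) / total for loc in locations}
    return profile, z


# Made-up tweeting model: P(venue | location) for three candidate locations.
tweet_model = {
    "Los Angeles": {"Hollywood": 0.5, "Los Angeles": 0.45, "Austin": 0.05},
    "Austin": {"Hollywood": 0.05, "Los Angeles": 0.05, "Austin": 0.9},
    "New York": {"Hollywood": 0.1, "Los Angeles": 0.1, "Austin": 0.1, "New York": 0.7},
}
venues = ["Hollywood", "Los Angeles", "Hollywood", "Austin", "Austin", "Los Angeles"]
pseudo = {"Los Angeles": 0.5, "Austin": 0.5, "New York": 0.5}

profile, assignments = gibbs_profile(venues, tweet_model, pseudo)
print(profile)
print(assignments)
```

The sketch reads the profile off the last sample only; averaging over many samples after a burn-in period would give a steadier estimate.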