Upload
others
View
0
Download
0
Embed Size (px)
Citation preview
Transferring Heterogeneous Links across Location-Based Social Networks
Xiangnan Kong
University of Illinois at Chicago
Jiawei Zhang Philip S. Yu
Social Links
Contents: Tweets
Locations
8 AM 12 PM 4 PM 8 PM 11 PM
Temporal Activities
Social Network: Who Where What When
location Links
Problem Description: Collective Link Prediction
Social Links
Contents: Tweets
Locations
8 AM 12 PM 4 PM 8 PM 11 PM
Temporal Activities
social links to be predicted
location links to be predicted
Solve Challenge 1: Heterogeneous Features
Social Links
Contents: Tweets
Locations
8 AM 12 PM 4 PM 8 PM 11 PM
Temporal Activities
Extract Heterogeneous Features (1)
Social
Spatial
Temporal
Content
?
social links
common friend
Extract Heterogeneous Features (2)
Social
Spatial
Temporal
Content
?
location links
extended common “friend”
Solve Challenge 2: Collective Link Prediction
Social
Spatial
Temporal
Content
?
location links
Social
Spatial
Temporal
Content
?
social links
?
?
?
?
√
√
?
?
√
√
Solve Challenge 3: Cold Start Problem
Social Links
Contents: Tweets
Locations
8 AM 12 PM 4 PM 8 PM 11 PM
Temporal Activities
New NetworkBrand New Network
Locations
Tips
8 AM 12 PM 4 PM 8 PM 11 PM
User Accounts
Temporal Activities
New
User Accounts
Locationslocate
locate
Tweets
8 AM 12 PM 4 PM 8 PM 11 PM
Temporal Activities
Oldtarget network source network
Anchor Links across Aligned Networks
Locations
Tips
8 AM 12 PM 4 PM 8 PM 11 PM
User Accounts
Temporal Activities
New
User Accounts
Locationslocate
locate
Tweets
8 AM 12 PM 4 PM 8 PM 11 PM
Temporal Activities
Oldtarget network source network
Experiments
Data Setsanchor links based upon the ranking scores of the classifier.Our solution is motivated by the stable marriage problem [8]in mathematics.
We first use a toy example in Figure 2 to illustrate themain idea of our solution. Suppose in Figure 2(a) we aregiven the ranking scores from the classifiers. We can see inFigure 2(b) that link prediction methods with a fixed thresh-old may not be able to predict well, because the predictedlinks do not satisfy the constraint of one-to-one relationship.Thus one user account in the source network can be linkedwith multiple accounts in the target network. In Figure 2(c),weighted maximium matching methods can find a set of linkswith maximum sum of weights. However, it is worth notingthat the input scores are uncalibrated, so maximum weightmatching may not be a good solution for anchor link predic-tion problems. The input scores only indicate the ranking ofdi↵erent user pairs, i.e., the preference relationship amongdi↵erent user pairs.
Here we say ‘node x prefers node y over node z’, if thescore of pair (x, y) is larger than the score of pair (x, z). Forexample, in Figure 2(c), the weight of pair a, i.e., Score(a) =0.8, is larger than Score(c) = 0.6. It shows that user u
s1
(thefirst user in the source network) prefers u
t1
over u
t2
. Theproblem with the prediction result in Figure 2(c) is that,the pair (us
1
, u
t1
) should be more likely to be an anchor linkdue to the following reasons: (1) u
s1
prefers u
t1
over u
t2
; (2)u
t1
also prefers u
s1
over u
s2
.Definition (Blocking Pair): A pair (us
i , utj) is a blocking
pair i↵ u
si and u
tj both prefer each other over their current
assignments respectively in the predicted set of anchor linksA0.Definition (Stable Matching): An inferred anchor link setA0 is stable if there is no blocking pair.
We propose to formulate the anchor link prediction prob-lem as a stable matching problem between user accounts insource network and accounts in target network. Assume thatwe have two sets of unlabeled user accounts, i.e., Us = {u
si }i
in source network and U t = {u
tj}j in target network. Each
u
si has a ranking list or preference list P (us
i ) over all the useraccounts in target network (ut
j 2 U t) based upon the inputscores of di↵erent pairs. For example, in Figure 2(a), thepreference list of node u
s1
is P (us1
) = (ut1
> u
t2
), indicatingthat node u
t1
is preferred by u
s1
over u
t2
. The preference listof node u
s2
is also P (us2
) = (ut1
> u
t2
). Similarly, we alsobuild a preference list for each user account in the targetnetwork. In Figure 2(a), P (ut
1
) = P (ut2
) = (us1
> u
s2
).The proposed Mna method for anchor link prediction is
shown in Algorithm 1. In each iteration, we first randomlyselect a free user account u
si from the source network. Then
we get the most prefered user node u
tj by u
si in its preference
list P (usi ). We then remove u
tj from the preference list, i.e.,
P (usi ) = P (us
i ) � u
tj .
If u
tj is also a free account, we add the pair of accounts
(usi , u
tj) into the current solution set A0. Otherwise, u
tj is
already occupied with u
sp in A0. We then examine the pref-
erence of u
tj . If u
tj also prefers u
si over u
sp, it means that the
pair (usi , u
tj) is a blocking pair. We remove the blocking pair
by replacing the pair (usp, u
tj) in the solution set A0 with the
pair (usi , u
tj). Otherwise, if u
tj prefers u
sp over u
si , we start
the next iteration to reach out the next free node in thesource network. The algorithm stops when all the users inthe source network are occupied, or all the preference listsof free accounts in the source network are empty.
Table 2: Properties of the Heterogeneous SocialNetworks
network
property Twitter Foursquare
# nodeuser 5,223 5,392tweet/tip 9,490,707 48,756location 297,182 38,921
# linkfriend/follow 164,920 31,312write 9,490,707 48,756locate 615,515 48,756
4. EXPERIMENTS
4.1 Data PreparationIn order to evaluate the performance of the proposed ap-
proach for anchor link prediction, we tested our algorithm ontwo real-world social networks as summarized in Table 2. Wechose Twitter and Foursquare as our data sources becausepublic tweets and Foursquare tips can be easily collected bytheir APIs.
1) Foursquare: the first network we crawled is the Foursquarewebsite, a representative location-based social network(LBSN). We collected a dataset consisting of 5,392users and 94,187 tips. For each tip, the location data(latitude and longitude) as well as the timestamp areavailable. Moreover, Foursquare network also providesdata about whether one user is following or a friendof another user. These links can indicate the socialrelationship among the users.
2) Twitter: The second network we crawled is Twitter,an online social microblogging network. We collected5,223 users and 9,490,707 tweets. In Twitter network,all tweets include time data, and some tweets includelocation data. In total, we have 615,515 tweets withlocation data (latitude and longitude), which is about6.5% of all the tweets we collected.
In order to conduct experiments, we pre-process these rawdata to obtain the ground-truth of users’ anchor links. InFoursquare network, we can collect some users’ Twitter IDsin their account pages. We use these information to build theground-truth of anchor links between user accounts acrossthe two networks. If a Foursquare user has shown his/herTwitter ID in the website, we treat it as an anchor linkbetween this user’s Foursquare account and Twitter account.
4.2 Comparative MethodsIn order to study the e↵ectiveness of the proposed ap-
proach, we compare our method with eight baseline meth-ods, which are commonly used baselines including both su-pervised and unsupervised link prediction approaches. Thecompared methods are summarized as follows:
• Multi-Network Anchoring(Mna): the proposed methodin this paper. Mna can explicitly exploit four types ofinformation from both networks to infer anchor links,i.e., social, spatial, temporal and text data. In ad-dition, Mna incorporates the one-to-one constraint inthe inference process. We argue that by combining the
Evaluation Metric1. Ground Truth
2. Evaluation Metric
existing social and location links
(1) Accuracy(2) AUC
Experiment Resultscollective link prediction
independent link prediction
Parameter Analysis
Summary1. we study the collective link prediction problem
simultaneously: social links & location links
2. we use information from multiple aligned networks simultaneously: new network & aligned old network.
3. we propose a tentative method to solve the cold start problem!
Q&A