Transferring Heterogeneous Links across Location-Based ...jzhang2/files/2014_wsdm_slides.pdf · two real-world social networks as summarized in Table 2. We chose Twitter and Foursquare

Transferring Heterogeneous Links across Location-Based Social Networks

Xiangnan Kong

University of Illinois at Chicago

Jiawei Zhang Philip S. Yu

Social Links

Contents: Tweets

Locations

8 AM 12 PM 4 PM 8 PM 11 PM

Temporal Activities

Social Network: Who Where What When

location Links

Problem Description: Collective Link Prediction

Social Links

Contents: Tweets

Locations

8 AM 12 PM 4 PM 8 PM 11 PM

Temporal Activities

social links to be predicted

location links to be predicted

Solve Challenge 1: Heterogeneous Features

Social Links

Contents: Tweets

Locations

8 AM 12 PM 4 PM 8 PM 11 PM

Temporal Activities

Extract Heterogeneous Features (1)

Social

Spatial

Temporal

Content

?

social links

common friend

Extract Heterogeneous Features (2)

Social

Spatial

Temporal

Content

?

location links

extended common “friend”

Solve Challenge 2: Collective Link Prediction

Social

Spatial

Temporal

Content

?

location links

Social

Spatial

Temporal

Content

?

social links

?

?

?

?

√

√

?

?

√

√

Solve Challenge 3: Cold Start Problem

Social Links

Contents: Tweets

Locations

8 AM 12 PM 4 PM 8 PM 11 PM

Temporal Activities

New NetworkBrand New Network

Locations

Tips

8 AM 12 PM 4 PM 8 PM 11 PM

User Accounts

Temporal Activities

New

User Accounts

Locationslocate

locate

Tweets

8 AM 12 PM 4 PM 8 PM 11 PM

Temporal Activities

Oldtarget network source network

Anchor Links across Aligned Networks

Locations

Tips

8 AM 12 PM 4 PM 8 PM 11 PM

User Accounts

Temporal Activities

New

User Accounts

Locationslocate

locate

Tweets

8 AM 12 PM 4 PM 8 PM 11 PM

Temporal Activities

Oldtarget network source network

Experiments

Data Setsanchor links based upon the ranking scores of the classifier.Our solution is motivated by the stable marriage problem [8]in mathematics.

We first use a toy example in Figure 2 to illustrate themain idea of our solution. Suppose in Figure 2(a) we aregiven the ranking scores from the classifiers. We can see inFigure 2(b) that link prediction methods with a fixed thresh-old may not be able to predict well, because the predictedlinks do not satisfy the constraint of one-to-one relationship.Thus one user account in the source network can be linkedwith multiple accounts in the target network. In Figure 2(c),weighted maximium matching methods can find a set of linkswith maximum sum of weights. However, it is worth notingthat the input scores are uncalibrated, so maximum weightmatching may not be a good solution for anchor link predic-tion problems. The input scores only indicate the ranking ofdi↵erent user pairs, i.e., the preference relationship amongdi↵erent user pairs.

Here we say ‘node x prefers node y over node z’, if thescore of pair (x, y) is larger than the score of pair (x, z). Forexample, in Figure 2(c), the weight of pair a, i.e., Score(a) =0.8, is larger than Score(c) = 0.6. It shows that user u

s1

(thefirst user in the source network) prefers u

t1

over u

t2

. Theproblem with the prediction result in Figure 2(c) is that,the pair (us

1

, u

t1

) should be more likely to be an anchor linkdue to the following reasons: (1) u

s1

prefers u

t1

over u

t2

; (2)u

t1

also prefers u

s1

over u

s2

.Definition (Blocking Pair): A pair (us

i , utj) is a blocking

pair i↵ u

si and u

tj both prefer each other over their current

assignments respectively in the predicted set of anchor linksA0.Definition (Stable Matching): An inferred anchor link setA0 is stable if there is no blocking pair.

We propose to formulate the anchor link prediction prob-lem as a stable matching problem between user accounts insource network and accounts in target network. Assume thatwe have two sets of unlabeled user accounts, i.e., Us = {u

si }i

in source network and U t = {u

tj}j in target network. Each

u

si has a ranking list or preference list P (us

i ) over all the useraccounts in target network (ut

j 2 U t) based upon the inputscores of di↵erent pairs. For example, in Figure 2(a), thepreference list of node u

s1

is P (us1

) = (ut1

> u

t2

), indicatingthat node u

t1

is preferred by u

s1

over u

t2

. The preference listof node u

s2

is also P (us2

) = (ut1

> u

t2

). Similarly, we alsobuild a preference list for each user account in the targetnetwork. In Figure 2(a), P (ut

1

) = P (ut2

) = (us1

> u

s2

).The proposed Mna method for anchor link prediction is

shown in Algorithm 1. In each iteration, we first randomlyselect a free user account u

si from the source network. Then

we get the most prefered user node u

tj by u

si in its preference

list P (usi ). We then remove u

tj from the preference list, i.e.,

P (usi ) = P (us

i ) � u

tj .

If u

tj is also a free account, we add the pair of accounts

(usi , u

tj) into the current solution set A0. Otherwise, u

tj is

already occupied with u

sp in A0. We then examine the pref-

erence of u

tj . If u

tj also prefers u

si over u

sp, it means that the

pair (usi , u

tj) is a blocking pair. We remove the blocking pair

by replacing the pair (usp, u

tj) in the solution set A0 with the

pair (usi , u

tj). Otherwise, if u

tj prefers u

sp over u

si , we start

the next iteration to reach out the next free node in thesource network. The algorithm stops when all the users inthe source network are occupied, or all the preference listsof free accounts in the source network are empty.

Table 2: Properties of the Heterogeneous SocialNetworks

network

property Twitter Foursquare

# nodeuser 5,223 5,392tweet/tip 9,490,707 48,756location 297,182 38,921

# linkfriend/follow 164,920 31,312write 9,490,707 48,756locate 615,515 48,756

4. EXPERIMENTS

4.1 Data PreparationIn order to evaluate the performance of the proposed ap-

proach for anchor link prediction, we tested our algorithm ontwo real-world social networks as summarized in Table 2. Wechose Twitter and Foursquare as our data sources becausepublic tweets and Foursquare tips can be easily collected bytheir APIs.

1) Foursquare: the first network we crawled is the Foursquarewebsite, a representative location-based social network(LBSN). We collected a dataset consisting of 5,392users and 94,187 tips. For each tip, the location data(latitude and longitude) as well as the timestamp areavailable. Moreover, Foursquare network also providesdata about whether one user is following or a friendof another user. These links can indicate the socialrelationship among the users.

2) Twitter: The second network we crawled is Twitter,an online social microblogging network. We collected5,223 users and 9,490,707 tweets. In Twitter network,all tweets include time data, and some tweets includelocation data. In total, we have 615,515 tweets withlocation data (latitude and longitude), which is about6.5% of all the tweets we collected.

In order to conduct experiments, we pre-process these rawdata to obtain the ground-truth of users’ anchor links. InFoursquare network, we can collect some users’ Twitter IDsin their account pages. We use these information to build theground-truth of anchor links between user accounts acrossthe two networks. If a Foursquare user has shown his/herTwitter ID in the website, we treat it as an anchor linkbetween this user’s Foursquare account and Twitter account.

4.2 Comparative MethodsIn order to study the e↵ectiveness of the proposed ap-

proach, we compare our method with eight baseline meth-ods, which are commonly used baselines including both su-pervised and unsupervised link prediction approaches. Thecompared methods are summarized as follows:

• Multi-Network Anchoring(Mna): the proposed methodin this paper. Mna can explicitly exploit four types ofinformation from both networks to infer anchor links,i.e., social, spatial, temporal and text data. In ad-dition, Mna incorporates the one-to-one constraint inthe inference process. We argue that by combining the

Evaluation Metric1. Ground Truth

2. Evaluation Metric

existing social and location links

(1) Accuracy(2) AUC

Experiment Resultscollective link prediction

independent link prediction

Parameter Analysis

Summary1. we study the collective link prediction problem

simultaneously: social links & location links

2. we use information from multiple aligned networks simultaneously: new network & aligned old network.

3. we propose a tentative method to solve the cold start problem!

Q&A

Documents

Transferring Heterogeneous Links across Location-Based ...jzhang2/files/2014_wsdm_slides.pdf · two real-world social networks as summarized in Table 2. We chose Twitter and Foursquare