

That’s What Friends Are For: Inferring Location in Online Social Media Platforms Based on Social Relationships

David Jurgens✢

Sapienza University of Rome

Supported by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior National Business Center (DoI / NBC) contract number D12PC00285. The IARPA research focuses solely on Latin America. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DoI/NBC, or the U.S. Government.

✢Work done while at HRL Laboratories, LLC

Tuesday, July 9, 2013


Location matters

Regional collapse or local occurrence?

Budding epidemic or just a case of the flu?

The start of a mass riot or just an unhappy person?


But good location data is relatively sparse

[Figure: percent of daily Twitter volume (0 to 100), GPS vs. "Who knows"]

User-provided locations: Hecht et al. (2011), Pontes et al. (2012)

Message Content: Cheng et al. (2010), Mahmud et al. (2012), Ikawa et al. (2012), Bo et al. (2013)

Social Network: Backstrom et al. (2010), Davis Jr. et al. (2011), Sadilek et al. (2012)


Sociological Contribution

Locality is still a dominant factor in the social relationships people have online


Pragmatic Contribution

Geo-tag 77% of all Twitter data

independent of country

independent of language

(mostly) independent of ego-network size

Median error ~10 km


In olden times, your social network was only people nearby


Does location matter if we can be friends with anyone, anywhere?


Location is still alive in online social networks

[Figure: frequency distribution of where your friends are relative to you, centered on "Where you are"]

Based on 20.5M relationships in Twitter


Online Social Networks under focus

• Twitter

  • Bi-directional @mentions

  • Bi-directional followers

• Foursquare

  • Explicit friendships

All have location data


Twitter Social Networks

• Bi-directional followers (crawled)

  • ~96K individuals and 16.6M relationships

• Bi-directional mentions

  • from a 10% sample of Twitter over 7 months

  • 47.7M individuals and 254M relationships

  • 5.3% tagged with user-level location


Foursquare overview

• Built from a crawl over 3 months

• ~4M individuals and 17.6M relationships

• 1.6M also had linked Twitter accounts

• 52.8% of Foursquare relationships for Twitter-linked accounts also had bi-directional mentions in Twitter

• Self-reported location was highly accurate, so we mapped 68.8% of users to a location


How close is the closest friend?

[Figure: cumulative distribution P(dist. ≤ x) of the distance of the closest neighbor (km; axis ticks at 0, 1, 10, 100, 1000, 10000) for three networks: Twitter Bidirectional @Mention, Twitter Bidirectional Follower, and Foursquare Friends]


High-level Algorithm: Your location is a function of your friends’ locations

do this for a while:

  for everyone in the network:

    1. Get their friends’ locations

    2. Pick one of them (smartly) as the user’s location
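The loop on this slide could be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the author's implementation: the function name `propagate`, the adjacency-dict graph format, the pluggable `select` callback, keeping seed locations fixed, and applying updates once per pass are all choices made here for clarity. The default of 4 passes follows the "a while ≈ 4" remark later in the deck.

```python
from typing import Callable, Dict, Hashable, List, Optional, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees


def propagate(
    graph: Dict[Hashable, List[Hashable]],
    seeds: Dict[Hashable, Coord],
    select: Callable[[List[Coord]], Optional[Coord]],
    iterations: int = 4,
) -> Dict[Hashable, Coord]:
    """Hypothetical helper: give each non-seed user a location chosen from friends."""
    known = dict(seeds)
    for _ in range(iterations):                  # "do this for a while"
        updates: Dict[Hashable, Coord] = {}
        for user, friends in graph.items():      # "for everyone in the network"
            if user in seeds:
                continue                         # assumption: seed locations never change
            locations = [known[f] for f in friends if f in known]  # step 1
            if locations:
                choice = select(locations)       # step 2: pick one of them
                if choice is not None:
                    updates[user] = choice
        known.update(updates)                    # assumption: apply updates once per pass
    return known


# Tiny usage example on a three-user chain a - b - c with one seed.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
result = propagate(chain, {"a": (41.88, -87.63)}, select=lambda locs: locs[0])
```

Because updates are applied at the end of each pass, a label travels one hop per pass in this sketch, so the number of passes bounds how far from a seed a user can be and still receive a location.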


Label Propagation

[Figure: example network whose nodes are labeled Chicago (five nodes), Rockford, and Madison]


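The slight problem with Label Propagation

[Figure: example network whose nodes are labeled Chicago (two nodes), Brooklyn, Staten Island, Queens, Bronx, and Manhattan; Chicago is the most frequent label even though five of the seven labels are New York City boroughs]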


Spatial Label Propagation

Location data is actually latitude and longitude

Pick the geometric median of the friends’ locations
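One way to read "pick the geometric median" is sketched below, assuming the median is restricted to the friends' own locations (in line with "pick one of them" in the high-level algorithm) and that distances are great-circle distances. The helper names `haversine_km` and `geometric_median`, the Earth radius of 6371 km, and the approximate city coordinates are assumptions made for this illustration only.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance in km, assuming a spherical Earth of radius 6371 km."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(min(1.0, h)))


def geometric_median(points: List[Coord]) -> Coord:
    """Return the candidate point with the smallest total distance to all points."""
    return min(points, key=lambda p: sum(haversine_km(p, q) for q in points))


# Approximate coordinates for the label-propagation counterexample:
# two friends in Chicago and one in each New York City borough.
chicago = (41.88, -87.63)
boroughs = [(40.68, -73.94), (40.58, -74.15), (40.73, -73.79),
            (40.84, -73.86), (40.78, -73.97)]
choice = geometric_median([chicago, chicago] + boroughs)
# choice is one of the New York points, not Chicago
```

Unlike a majority vote over place names, this selection treats the five borough points as near one another, so the two distant Chicago friends no longer win. The pairwise comparison is quadratic in the number of friends, which is a simplification of this sketch rather than a claim about how the talk's system works.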


Comparisons

do this for a while:

  for everyone in the network:

    1. Get their friends’ locations

    2. Pick one of them (smartly) as the user’s location

1. Pick any random user’s location

2. Pick a random friend’s location

3. Pick the most frequent location name among friends’ (assumes coordinates have been converted to names)
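The three comparison strategies listed here might look like the following in Python. The function names are hypothetical, and the sketch assumes locations are passed in as plain sequences (coordinates for the first two strategies, place-name strings for the third).

```python
import random
from collections import Counter
from typing import Optional, Sequence, Tuple

Coord = Tuple[float, float]


def random_user_location(all_locations: Sequence[Coord], rng: random.Random) -> Coord:
    """Strategy 1: ignore the social network and pick any labeled user's location."""
    return rng.choice(all_locations)


def random_friend_location(friend_locations: Sequence[Coord],
                           rng: random.Random) -> Optional[Coord]:
    """Strategy 2: pick one friend's location uniformly at random."""
    return rng.choice(friend_locations) if friend_locations else None


def most_frequent_name(friend_names: Sequence[str]) -> Optional[str]:
    """Strategy 3: majority vote over place names (ties go to the first name seen)."""
    if not friend_names:
        return None
    return Counter(friend_names).most_common(1)[0][0]


names = ["Chicago", "Chicago", "Brooklyn", "Staten Island",
         "Queens", "Bronx", "Manhattan"]
winner = most_frequent_name(names)  # "Chicago", although five friends are in New York City
```

The last line reproduces the label-propagation weakness shown two slides earlier: a vote over names has no notion that the five boroughs are close together.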


Evaluation Methodology

• Partition users with known locations into five sets

• Hold out one set, run method on complete graph using other four as seed locations (~2M seeds; 4% of network labeled)

• Measure error on held out set (0.5M test)
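A minimal sketch of this hold-out scheme, assuming the partition is made by shuffling the labeled users and dealing them into five folds. The names `five_fold_splits` and `median_error_km`, and the fixed random seed, are illustrative choices and not taken from the talk.

```python
import random
import statistics
from typing import Hashable, Iterable, Iterator, List, Sequence, Tuple


def five_fold_splits(
    labeled_users: Iterable[Hashable], k: int = 5, seed: int = 0
) -> Iterator[Tuple[List[Hashable], List[Hashable]]]:
    """Yield (seed_users, held_out_users) pairs, holding out one fold at a time."""
    users = sorted(labeled_users, key=repr)   # stable order before shuffling
    random.Random(seed).shuffle(users)
    folds = [users[i::k] for i in range(k)]
    for i, held_out in enumerate(folds):
        seeds = [u for j, fold in enumerate(folds) if j != i for u in fold]
        yield seeds, held_out


def median_error_km(errors_km: Sequence[float]) -> float:
    """Summarize the per-user errors measured on the held-out fold."""
    return statistics.median(errors_km)


for seeds, held_out in five_fold_splits(range(10)):
    # The method would run on the complete graph with `seeds` as known locations,
    # and error would be measured only for the users in `held_out`.
    pass
```

The inference step itself is left out on purpose; only the split and the error summary are shown.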


Results

[Figure: cumulative distribution P(dist. ≤ x) of the error from true location (km)]


Country-level Results

[Figure: P(dist. ≤ x) by country]

Using geometric median only


Results per ego-network size

[Figure: P(dist. ≤ x) by ego-network size]

Using geometric median only


Convergence is quick (a while ≈ 4)

These users account for 77% of the daily Twitter volume


Can we do better?


RQ1: Does triadic structure predict locality? No

Pick the geometric median among the locations for closed triads in the ego-network
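As a rough illustration of the triad restriction, the sketch below keeps only those friends who share at least one other friend with the user, that is, friends who close a triangle in the ego-network. The function name `triad_friends` and the adjacency-set graph format are assumptions, and the slide does not say exactly how triads were extracted.

```python
from typing import Dict, Hashable, Set


def triad_friends(graph: Dict[Hashable, Set[Hashable]], user: Hashable) -> Set[Hashable]:
    """Friends of `user` who are also connected to another friend of `user`."""
    friends = graph.get(user, set())
    return {f for f in friends if graph.get(f, set()) & (friends - {f})}


# u is friends with a, b, and c; only a and b know each other.
ego = {
    "u": {"a", "b", "c"},
    "a": {"u", "b"},
    "b": {"u", "a"},
    "c": {"u"},
}
closed = triad_friends(ego, "u")  # {"a", "b"}
```

The location choice would then be made over the locations of `closed` only, instead of over all friends.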


RQ1: Does triadic structure predict locality? No

[Figure: P(dist. ≤ x)]

0.278 correlation between the size of the ego-network and the distance to friends


RQ2: Does linguistic similarity predict geographic closeness? No*

• Two representations of all of a user’s tweets

  • A unigram language model

  • A vector-space based model

• Correlate the similarity of two users’ representations with their distances

[Figure: example tweets from three users: "Cannot wait for Pats preseason to start", "Going to the FC Barcelona game 2mrw", "Pats are goin all da way to the superbowl!"]
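The correlation test described on this slide could be sketched as follows. The slides do not specify the exact representations, so this example stands in a plain term-count vector with cosine similarity for the vector-space model, and a hand-written Spearman correlation with average ranks for ties. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity of whitespace-tokenized, lowercased term counts."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((p - mx) * (q - my) for p, q in zip(rx, ry))
    sx = math.sqrt(sum((p - mx) ** 2 for p in rx))
    sy = math.sqrt(sum((q - my) ** 2 for q in ry))
    return cov / (sx * sy) if sx and sy else 0.0


pats_1 = "Cannot wait for Pats preseason to start"
barca = "Going to the FC Barcelona game 2mrw"
pats_2 = "Pats are goin all da way to the superbowl!"
same_team = cosine_similarity(pats_1, pats_2)
other_team = cosine_similarity(pats_1, barca)
```

In a full experiment, one list would hold the similarity for each user pair and the other the pair's geographic distance, and `spearman` would be applied to the two lists.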


RQ2: Does linguistic similarity predict geographic closeness? No*

• 0.030 Spearman’s correlation for language model

• 0.011 Spearman’s correlation for vector space

• Correlation was consistent across country and ego-network size


RQ3: Can we improve using platform metadata? Sort of

• Leverage self-reported Time Zone data

• Remove a relationship between two users if their sets of time zones are disjoint

  • But only if they self-report

• Pruned 96.7M edges from network (38%)
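The pruning rule on this slide might be sketched like this, assuming edges are user pairs and each self-reporting user maps to a set of time zone names. The function name `prune_edges` and the data layout are illustrative assumptions.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def prune_edges(edges: Iterable[Edge],
                time_zones: Dict[Hashable, Set[str]]) -> List[Edge]:
    """Drop an edge only when both users report time zones and the sets are disjoint."""
    kept = []
    for u, v in edges:
        tz_u, tz_v = time_zones.get(u), time_zones.get(v)
        if tz_u and tz_v and tz_u.isdisjoint(tz_v):
            continue                      # both self-report and share no time zone
        kept.append((u, v))               # keep if zones overlap or a report is missing
    return kept


zones = {"a": {"Rome"}, "b": {"Rome", "Madrid"}, "c": {"Tokyo"}}
remaining = prune_edges([("a", "b"), ("a", "c"), ("b", "d")], zones)
# ("a", "c") is removed; ("b", "d") stays because d reports no time zone
```

Keeping edges with a missing report mirrors the "but only if they self-report" bullet, so the filter never removes a relationship on the basis of absent data.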


RQ3: Can we improve using platform metadata? Sort of

[Figure: P(dist. ≤ x)]

3X performance improvement

No loss in accuracy

Some loss in coverage


What if we had no ground truth?


Option 1: Use whatever users provide

• Conservatively map self-reported location names to coordinates

• 11.3M users tagged (23.7%)

• Run using only self-reported data and test against held-out GPS data


Option 1: Use whatever users provide

[Figure: P(dist. ≤ x)]


Option 2: Get the locations from another online social network

• Goal: Predict locations of Foursquare users using only location data from Twitter

• Merge the networks using the 1.6M of the 4M Foursquare users who have identities in both platforms

• Test on Foursquare-only users


Option 2: Get the locations from another online social network

[Figure: P(dist. ≤ x)]


Insights

• Social networks provide a huge source of location information

• A little bit of good location data goes a long way, but even bad data is okay

• Multi-platform identities enable new types of geolocated data


Open Questions

• What types of communications do predict locality?

• How does the structure of the ego-network relate to locality?

• What benefit can be seen by applying both network-based and linguistic-based geolocation approaches?


Thank you

David Jurgens

[email protected]
