1
Exploiting Text and Network Context for Geolocation of Social Media Users Afshin Rahimi, Duy Vu, Trevor Cohn, and Timothy Baldwin Department of Computing and Information Systems, Department of Mathematics and Statistics, The University of Melbourne OVERVIEW Task: Find the location of Twitter users based on text and net- work information Previous Shortcoming: No comparison of text-based and network-based models, no use of both. Datasets: 3 Twitter geolocation datasets: GeoText, Twitter-US, Twitter-World. Sample Format: userid, text, mention-list, latitude/longitude Y OU ARE WHERE YOUR WORDS SAY YOU ARE Usage of mountain in U.S. T EXT - BASED M ODEL ( LR ) Logistic regression with l 1 regularisation over k -d tree discretisation of latitude/longitude. Y OU ARE WHERE YOUR FRIENDS ARE Most of our online interactions are local. Twitter mention N ETWORK - BASED M ODEL ( LP ) Label Propagation in @-mention Network: Build an @-mention network. Initialise the location of training nodes with their known location. Iteratively update non-training nodes’ location to the median of their neighbours. Converges after 10 iterations. N ETWORK VERSUS T EXT For connected users, Network-based models are more accurate. For disconnected users (about 20% of the nodes), Text-based models are more accurate. Solution: Utilise both text and network informa- tion together! L ABEL P ROPAGATION OVER T EXT P REDICTIONS Initialise training nodes with their known location and test nodes with their text-based prediction. Iteratively update the location of non-training nodes to the median of their neighbours. Converges after 10 iterations. Isolated test nodes will keep their text-based prediction. D ENVER S T OP F EATURES R ESULTS State of the art results over all three datasets!

Exploiting Text and Network Context for Geolocation of Social Media Users

Embed Size (px)

Citation preview

Exploiting Text and Network Context for Geolocation of Social Media UsersAfshin Rahimi,♥ Duy Vu,♠ Trevor Cohn,♥ and Timothy Baldwin♥

♥ Department of Computing and Information Systems, ♠ Department of Mathematics and Statistics, The University of Melbourne

OVERVIEW

Task: Find the location of Twitter users based on text and net-work information

Previous Shortcoming: No comparison of text-based andnetwork-based models, no use of both.

Datasets: 3 Twitter geolocation datasets:GeoText, Twitter-US, Twitter-World.

Sample Format: userid, text, mention-list, latitude/longitude

YOU ARE WHERE YOUR WORDS SAY YOU ARE

Usage of mountain in U.S.

TEXT-BASED MODEL (LR)

Logistic regression with l1 regularisationover k-d tree discretisation of latitude/longitude.

130 120 110 100 90 80 70 60Longitude

25

30

35

40

45

50

Lati

tude

YOU ARE WHERE YOUR FRIENDS ARE

Most of our online interactions are local.

Twitter mention

NETWORK-BASED MODEL (LP)

Label Propagation in @-mention Network:

• Build an @-mention network.

• Initialise the location of training nodes with theirknown location.

• Iteratively update non-training nodes’ locationto the median of their neighbours.

• Converges after 10 iterations.

NETWORK VERSUS TEXT

• For connected users, Network-based modelsare more accurate.

• For disconnected users (about 20% of thenodes), Text-based models are more accurate.

• Solution: Utilise both text and network informa-tion together!

LABEL PROPAGATION OVER TEXT PREDICTIONS

• Initialise training nodes with their known locationand test nodes with their text-based prediction.

• Iteratively update the location of non-trainingnodes to the median of their neighbours.

• Converges after 10 iterations.

• Isolated test nodes will keep their text-basedprediction.

DENVER’S TOP FEATURES

RESULTS

State of the art results over all three datasets!

GEOTEXT TwitterUS TwitterWorld0

100

200

300

400

500

600

Med

ian

Erro

r in

km

Text-based Method (LR)Network-based Method (LP)Hybrid Method (LP-LR)Wing and Baldrige (2014)Ahmed et al. (2013)