1
Using GIS to Understand Behavior Patterns of Twitter Users
Yue Li
M.S. Civil/Geomatics Engineering
Purdue University
Committee: Dr. Jie Shan (Chair), Dr. Nicole Kong, Dr. James Bethel
2
Introduction
• Volunteered Geographic Information (VGI) [1]
− Emergency management, event detection, tourist behavior, knowledge discovery…
• Twitter
− The most popular micro-blogging site
− Geo-tagged tweets carry the longitude and latitude of the poster
− A gold mine for scholars in geography, linguistics, sociology, economics, health, and psychology [2]
− Marketing, advertising, regulation,…
3
Research Goal
• To discover the spatio-temporal pattern of tweets
• To infer the human mobility patterns behind the tweets
• To understand the lifestyle of college students
4
Study Area
• College towns/cities with Big Ten universities
• West Lafayette, IN
− Most densely populated city in IN
− Home of Purdue University
• Ann Arbor, MI
− University of Michigan
• Bloomington, IN
− Indiana University, Bloomington
• Columbus, OH
− Ohio State University
5
Data
• Geo-tagged tweets downloaded with Twitter Streaming API
• With longitude and latitude at time of posting
• Nov 18, 2013 to Apr 2, 2014
− West Lafayette: 59,238 tweets
− Ann Arbor: 220,117 tweets
− Bloomington: 247,202 tweets
− Columbus: 1,936,238 tweets
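The collection step can be sketched as follows. This assumes the tweets were saved one JSON object per line in the Twitter API v1.1 format. In that format `coordinates` is either null or a GeoJSON point stored as [longitude, latitude], and `created_at` is in UTC. The function name `parse_geotagged` and the study-area bounding box are hypothetical, because the deck does not say how each city was delimited.

```python
import json
from datetime import datetime

# Twitter API v1.1 timestamp format, e.g. "Tue Jan 21 17:30:00 +0000 2014" (UTC).
CREATED_AT_FMT = "%a %b %d %H:%M:%S %z %Y"


def parse_geotagged(lines, bbox):
    """Yield (user_id, time_utc, lon, lat) for geo-tagged tweets inside bbox.

    bbox = (min_lon, min_lat, max_lon, max_lat) is a hypothetical study-area box.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    for line in lines:
        line = line.strip()
        if not line:
            continue  # stream keep-alive newlines
        tweet = json.loads(line)
        geo = tweet.get("coordinates")
        if not geo:
            continue  # not geo-tagged, or a non-tweet message (e.g. a limit notice)
        lon, lat = geo["coordinates"]  # GeoJSON order: longitude first
        if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat:
            when = datetime.strptime(tweet["created_at"], CREATED_AT_FMT)
            yield tweet["user"]["id_str"], when, lon, lat


# Usage with two made-up records; the box roughly around West Lafayette is illustrative only.
sample = [
    '{"created_at": "Tue Jan 21 17:30:00 +0000 2014", "user": {"id_str": "42"},'
    ' "coordinates": {"type": "Point", "coordinates": [-86.91, 40.43]}}',
    '{"created_at": "Tue Jan 21 17:31:00 +0000 2014", "user": {"id_str": "7"},'
    ' "coordinates": null}',
]
rows = list(parse_geotagged(sample, (-87.0, 40.38, -86.85, 40.48)))
```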
6
Methods
• Pure Spatial
− Point density analysis
• Pure Temporal
• Spatio-Temporal
− Tweets in Land Use
− Event/Anomaly detection
− Individual Twitter user patterns
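Point density analysis can be illustrated with a neighborhood count on a regular grid. This is similar in spirit to the Point Density tool in ArcGIS: each cell receives the number of tweets within a search radius, divided by the area of that circle. It is a minimal sketch. The grid parameters, the radius and the function name `point_density` are assumptions. Coordinates are assumed to be projected (e.g., meters), not raw longitude/latitude.

```python
import math


def point_density(points, xmin, ymin, cell, ncols, nrows, radius):
    """Neighborhood point density on a regular grid.

    For every cell center, count the points within `radius` and divide by the
    circle's area, giving points per unit area. Row 0 is the southern (min-y) row.
    Coordinates are assumed to be projected (e.g. meters), not lon/lat degrees.
    """
    area = math.pi * radius ** 2
    r2 = radius ** 2
    grid = []
    for row in range(nrows):
        cy = ymin + (row + 0.5) * cell
        values = []
        for col in range(ncols):
            cx = xmin + (col + 0.5) * cell
            n = sum(1 for x, y in points if (x - cx) ** 2 + (y - cy) ** 2 <= r2)
            values.append(n / area)
        grid.append(values)
    return grid


# Usage: three made-up points on a 2 x 2 grid of 50-unit cells.
grid = point_density([(10, 10), (12, 11), (90, 90)], 0, 0, 50, 2, 2, radius=30)
```

A real analysis of hundreds of thousands of tweets would use a spatial index or a GIS tool rather than this brute-force loop.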
7
Tweets in West Lafayette
8
Tweets in Ann Arbor
9
Tweets in Bloomington
10
Tweets in Columbus
11
Tweets by Hour
12
Tweets by Hour
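The hourly profiles amount to binning tweet timestamps by the local hour of posting. A minimal sketch follows. It assumes the timestamps are timezone-aware UTC datetimes, as parsed from `created_at`. It also uses a fixed UTC-5 offset for the four Eastern-time study areas. That offset is a simplification: daylight saving time began on March 9, 2014, inside the study period. The name `tweets_by_hour` is illustrative.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Simplification: fixed US Eastern Standard Time (UTC-5). Daylight saving time
# started on March 9, 2014, so later tweets would need UTC-4 (or a tz database).
EASTERN_STANDARD = timezone(timedelta(hours=-5))


def tweets_by_hour(times_utc, tz=EASTERN_STANDARD):
    """Return a 24-element list: number of tweets posted in each local hour."""
    counts = Counter(t.astimezone(tz).hour for t in times_utc)
    return [counts.get(h, 0) for h in range(24)]


# Usage with three made-up UTC timestamps.
times = [
    datetime(2014, 1, 21, 17, 30, tzinfo=timezone.utc),  # 12:30 local
    datetime(2014, 1, 21, 18, 5, tzinfo=timezone.utc),   # 13:05 local
    datetime(2014, 1, 22, 3, 0, tzinfo=timezone.utc),    # 22:00 local, Jan 21
]
hourly = tweets_by_hour(times)
```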
13
Tweets and Land Use
• Land use in Ann Arbor, MI
− Commercial
− Industrial
− Mixed Use
− Office
− Public/Education
− Recreation
− Residential
− Transportation
− Vacant
• Spatially join the tweets with land use
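The spatial join gives each tweet the land-use class of the polygon that contains it. The sketch below uses a plain even-odd (ray casting) point-in-polygon test so that it runs without a GIS. It assumes simple polygons without holes, in the same projected coordinates as the tweets. The names `point_in_polygon` and `join_land_use` are illustrative. In practice a GIS spatial join (e.g., ArcGIS Spatial Join or GeoPandas `sjoin`) would handle this.

```python
from collections import Counter


def point_in_polygon(x, y, ring):
    """Even-odd (ray casting) test; ring is a list of (x, y) vertices, no holes."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def join_land_use(points, parcels):
    """Count points per land-use class; parcels is a list of (class_name, ring)."""
    counts = Counter()
    for x, y in points:
        for land_use, ring in parcels:
            if point_in_polygon(x, y, ring):
                counts[land_use] += 1
                break  # first matching polygon wins
        else:
            counts["Unmatched"] += 1
    return counts


# Usage with two made-up square parcels.
parcels = [
    ("Commercial", [(0, 0), (10, 0), (10, 10), (0, 10)]),
    ("Residential", [(10, 0), (20, 0), (20, 10), (10, 10)]),
]
counts = join_land_use([(5, 5), (15, 5), (15, 6), (30, 30)], parcels)
```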
14
Tweets and Land Use
15
1 - Commercial; 2 - Industrial; 3 - Mixed Use; 4 - Office; 5 - Public/Education; 6 - Recreation; 7 - Residence; 8 - Transportation; 9 - Vacant/River
Tweets and Land Use
16
Event Detection
• Events are spatially and temporally concentrated
− Football games, concerts, festivals,…
• Example: the Purdue University shooting on Jan 21, 2014
− Campus lockdown from about 12:00 to 14:00
• Temporally
− 710 tweets from 231 unique users between 12:00 and 14:00 on Jan 21
− 7,443 tweets from 1,080 unique users between 12:00 and 14:00 across the whole dataset
• Spatially
− How to measure spatial anomaly?
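The temporal counts above can be put on a per-day scale with simple arithmetic. The sketch rests on two assumptions. First, the stream was recorded every day from Nov 18, 2013 to Apr 2, 2014 without gaps. Second, the 7,443 figure includes Jan 21. Under those assumptions it compares the lockdown window with the average 12:00-14:00 count on the other days.

```python
from datetime import date

# Counts reported on the slide for the 12:00-14:00 window (presumably West Lafayette).
lockdown_tweets = 710     # Jan 21, 2014 only
all_window_tweets = 7443  # same window, whole dataset (assumed to include Jan 21)

# Assumption: one uninterrupted capture per day, Nov 18, 2013 - Apr 2, 2014 inclusive.
n_days = (date(2014, 4, 2) - date(2013, 11, 18)).days + 1
baseline_per_day = (all_window_tweets - lockdown_tweets) / (n_days - 1)
ratio = lockdown_tweets / baseline_per_day

print(f"{n_days} days; baseline {baseline_per_day:.1f} tweets per window; "
      f"lockdown window is {ratio:.1f}x the baseline")
```

Under these assumptions, the lockdown window holds roughly 14 times the tweets of an average day's 12:00-14:00 window. Gaps in the stream would raise the true baseline and lower this ratio.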
17
Hypotheses
• Challenge: tweets form an inhomogeneous/clustered process even outside the lockdown period
− Were tweets significantly more clustered during the lockdown than on average?
• Intensity of tweets is correlated with distance to campus buildings
• Extent of clustering is positively correlated with the chi-square value
18
Covariate: Purdue Buildings
Purdue building shapefile converted to a tessellation
R libraries: maptools, sp, spatstat
Functions: as.mask → im → tess
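The `as.mask → im → tess` chain turns building footprints into a pixel image and then into tiles. Below is a rough Python analogue, not the spatstat code used in the study. It assumes just two tiles: pixels whose center falls inside a building, and all other pixels. It also assumes a caller-supplied predicate `is_building(x, y)`. The two-tile layout and the function names `rasterize` and `two_tile_summary` are hypothetical.

```python
def rasterize(is_building, xmin, ymin, cell, ncols, nrows):
    """Binary mask (rough analogue of as.mask): True where a pixel center is in a building."""
    return [[is_building(xmin + (c + 0.5) * cell, ymin + (r + 0.5) * cell)
             for c in range(ncols)]
            for r in range(nrows)]


def two_tile_summary(mask, points, xmin, ymin, cell):
    """Treat the mask as a two-tile tessellation (building / other) and return
    (area per tile, point count per tile). Points off the grid are ignored."""
    nrows, ncols = len(mask), len(mask[0])
    area = {True: 0.0, False: 0.0}
    count = {True: 0, False: 0}
    for row in mask:
        for in_building in row:
            area[in_building] += cell * cell
    for x, y in points:
        c = int((x - xmin) // cell)
        r = int((y - ymin) // cell)
        if 0 <= r < nrows and 0 <= c < ncols:
            count[mask[r][c]] += 1
    return area, count


# Usage: one made-up 40 x 40 building in a 100 x 100 area, 10-unit pixels.
def is_building(x, y):
    return 0 <= x < 40 and 0 <= y < 40


mask = rasterize(is_building, 0, 0, 10, 10, 10)
area, count = two_tile_summary(mask, [(5, 5), (15, 25), (39, 39), (80, 80)], 0, 0, 10)
```

Tile areas and counts like these are what a quadrat test over a tessellation compares: observed counts per tile against counts expected from tile area alone.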
19
Randomization Test
Algorithm (by Ken Kellner):
1. Select 710 random tweets, without replacement, from dates 1/16/14 - 1/26/14 and hours 12:00 - 14:00
2. Call quadratcount() and quadrat.test() on the new random dataset
3. Save the chi-square value
4. Repeat 1000 times to obtain a distribution of chi-square values
5. Compare the actual chi-square value obtained on 1/21/14 with this distribution
6. Quasi-p value: proportion of simulated values more extreme than the observed value
Assumption: greater chi-square value = more inhomogeneous/clustered (tested with simulation)
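The randomization test above can be sketched in Python. This is a re-implementation under stated assumptions, not the spatstat code used in the study. `quadrat_chisq` computes a Pearson chi-square of tweet counts per tile against a homogeneous Poisson expectation, with the expected count proportional to tile area, in the spirit of `quadrat.test()`. The names `tiles_of`, `areas`, `quadrat_chisq` and `randomization_test` are hypothetical.

```python
import random


def quadrat_chisq(points, tiles_of, areas):
    """Pearson chi-square of counts per tile against a homogeneous Poisson
    expectation (expected count proportional to tile area).

    tiles_of(x, y) -> tile key; areas: dict mapping tile key -> area.
    """
    counts = {k: 0 for k in areas}
    for x, y in points:
        counts[tiles_of(x, y)] += 1
    n = len(points)
    total_area = sum(areas.values())
    chisq = 0.0
    for k, a in areas.items():
        expected = n * a / total_area
        chisq += (counts[k] - expected) ** 2 / expected
    return chisq


def randomization_test(observed, pool, tiles_of, areas, n_reps=1000, seed=1):
    """Steps 1-6: resample len(observed) tweets from the pool without replacement,
    recompute the statistic, and report the share of simulated values that are
    more extreme (greater) than the observed one."""
    rng = random.Random(seed)
    obs_stat = quadrat_chisq(observed, tiles_of, areas)
    sims = [quadrat_chisq(rng.sample(pool, len(observed)), tiles_of, areas)
            for _ in range(n_reps)]
    quasi_p = sum(s > obs_stat for s in sims) / n_reps
    return obs_stat, quasi_p


# Usage: a made-up 2 x 2 grid of equal quadrats on the unit square.
def tiles_of(x, y):
    return (int(x >= 0.5), int(y >= 0.5))


areas = {(i, j): 0.25 for i in (0, 1) for j in (0, 1)}
rng = random.Random(0)
pool = [(rng.random(), rng.random()) for _ in range(2000)]  # "ordinary" tweets
observed = [(0.1 + 0.003 * i, 0.2) for i in range(100)]     # all in one quadrat
stat, p = randomization_test(observed, pool, tiles_of, areas)
```

In this toy usage every observed tweet falls in one quadrat, so no random draw can exceed its chi-square and the quasi-p value is 0. With real data, `tiles_of` and `areas` would come from the building tessellation.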
20
Randomization Test Result
Chi-square: 85162.85
Quasi-p value: 0.038
• We were able to detect a change in the pattern of tweets during the lockdown, when presumably more people than on average were stuck in Purdue buildings.
21
Event Detection
• Anomalies are visible in the Twitter data both temporally and spatially
• However, we still need a complete, integrated algorithm that can also be applied to other events
• To be continued
22
Frequent Twitter Users
• The 10 Twitter users with the most tweets in Ann Arbor
• Plot the tweets of each individual user
• Four typical patterns
− Work-Home
− Work-Road-Home
− Work-Home-Short Visit
− Multiple Clusters
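Selecting the most active users and collecting each user's tweet locations for plotting is a simple count. A minimal sketch, assuming tweets are (user_id, lon, lat) tuples. `top_users` is an illustrative name. Classifying the resulting tracks into the four patterns is not attempted here.

```python
from collections import Counter, defaultdict


def top_users(tweets, k=10):
    """Return the k most active users and each one's (lon, lat) points for plotting.

    tweets: iterable of (user_id, lon, lat) tuples.
    """
    tweets = list(tweets)
    counts = Counter(user for user, _, _ in tweets)
    top = [user for user, _ in counts.most_common(k)]
    wanted = set(top)
    tracks = defaultdict(list)
    for user, lon, lat in tweets:
        if user in wanted:
            tracks[user].append((lon, lat))
    return top, dict(tracks)


# Usage with made-up records.
sample = [("a", -83.74, 42.28), ("b", -83.73, 42.27), ("a", -83.74, 42.29),
          ("c", -83.70, 42.30), ("a", -83.75, 42.28), ("b", -83.72, 42.27)]
top, tracks = top_users(sample, k=2)
```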
23
Frequent Twitter Users
24
Frequent Twitter Users
25
Future Work
• On-going research
• Complete analysis in all 4 study areas, and compare the patterns
• Develop/Find an algorithm for event detection
• …
• Any suggestions are welcome!
26
References
• 1. Goodchild, M. F., 2007. Citizens as sensors: The world of volunteered geography, GeoJournal, 69, 211-221.
• 2. Ghosh, D., and R. Guha, 2013. What are we ‘tweeting’ about obesity? Mapping tweets with topic modeling and Geographic Information System, Cartography and Geographic Information Science, 40(2), 90-102.
27
QUESTIONS?