Data and Information Systems Laboratory University of Illinois Urbana-Champaign
ASONAM 2011 July, 2011
Geo-Friends Recommendation in GPS-Based Cyber-Physical Social Network
Xiao Yu, Ang Pan, Lu-An Tang, Zhenhui Li, Jiawei Han
University of Illinois at Urbana-Champaign
Acknowledgements: NSF, ARL, NASA, AFOSR (MURI), IBM & Boeing
Roadmap
• Motivation
• Background and Preliminaries
• Geo-friend Finding Framework
  • GPS Pattern Extraction
  • Build Pattern-Based Information Network
  • Random Walk with Restart on Heterogeneous Information Network
• Experiments
• Conclusions
Motivation: Popularity of Mobile Devices
• Mobile devices: Very popular, a major medium of communication
• Data from mobile devices (like real time GPS location, moving trajectories): Reflect users’ daily activities and real life social interactions
• Social network services: Allow users to store and share locations and trajectories collected from their mobile devices
A List of Major Location-Based Social Network Services
Foursquare, Facebook Places, Google Latitude, Twitter Location Update, Yelp Check-in, Google+, ...
Motivation: Geo-Friends Recommendation
• A social network with data collected from sensors is usually referred to as a Cyber-Physical Social Network
• Problem to be solved: Friend recommendation in GPS-based cyber-physical social networks, by combining GPS data with social network information
• Our method discovers real life friends on web-based social networks
• Geo-Friends: Potential real life friends, who have both social similarities and geographical correlation
A Geo-Friend Finding Example
• Real life friends play an important role in off-line social events, while most virtual on-line friends cannot fulfill such a social function
• Alex needs geo-friends to join him in a local charity event
• Bob is a college friend who now lives in another country
• Carlos is a co-worker but has no social network similarity with Alex
• David shares common friends with Alex and goes to the same gym and the same game store
David is more likely to be Alex’s geo-friend, but we cannot get this information by only analyzing social network or GPS data.
Contribution
• Propose a geo-friend recommendation problem, and discuss how it differs from the previously studied link prediction problem
• Define and generate a set of GPS patterns to describe people’s real life social interaction and correlation
• Propose a random walk-based statistical framework for geo-friend recommendation
• Design and conduct a series of experiments on both synthetic and real-world datasets
• Demonstrate the power of our method in various situations
Data Model
• GPS Trajectory: Sequentially connecting the GPS records of a particular user, following the ascending order of timestamps
• GPS-Based Cyber-Physical Social Network: G(S, V, E)
  • V: Set of people in the network
  • E: Set of edges, representing all the links between people
  • S: Set of GPS trajectories associated with people
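The data model above can be sketched as a small container. This is only an illustration: the class name `CyberPhysicalNetwork`, the `GpsRecord` field layout (timestamp, latitude, longitude), and the choice of undirected edges are assumptions, not details taken from the slides.

```python
from dataclasses import dataclass, field
from typing import NamedTuple


class GpsRecord(NamedTuple):
    """One GPS reading; the field layout is an assumption."""
    timestamp: float
    lat: float
    lon: float


@dataclass
class CyberPhysicalNetwork:
    """G(S, V, E): people V, links E, and GPS trajectories S."""
    people: set = field(default_factory=set)         # V
    edges: set = field(default_factory=set)          # E, stored as unordered pairs
    trajectories: dict = field(default_factory=dict) # S, one record list per person

    def add_friendship(self, a, b):
        self.people.update((a, b))
        self.edges.add(frozenset((a, b)))

    def add_record(self, person, record):
        self.people.add(person)
        self.trajectories.setdefault(person, []).append(record)

    def trajectory(self, person):
        # A trajectory connects a user's records in ascending timestamp order.
        return sorted(self.trajectories.get(person, []), key=lambda r: r.timestamp)
```

Records can be added in any order; `trajectory` returns them sorted by timestamp, matching the definition of a GPS trajectory.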
Problem Definition
• Given G(S, V, E), and a particular query posed by person v∗
• Return a ranked list of people nodes in V such that each element v′ in the list satisfies <v∗, v′> ∉ E
• Moreover, the ranking score in the process should be relevant to both GPS trajectory S and social network (V, E)
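The output constraint of the problem (only people not yet linked to the query person are returned) can be sketched as a filter over any score table. The function name `rank_candidates` and the tie-breaking by name are assumptions made for illustration; how the scores are produced is left open here.

```python
def rank_candidates(query, scores, edges):
    """Rank people by score, skipping the query and anyone already linked to it.

    scores: {person: relevance score}
    edges:  iterable of (person, person) pairs, treated as undirected
    """
    linked = set()
    for a, b in edges:
        if a == query:
            linked.add(b)
        elif b == query:
            linked.add(a)
    candidates = [v for v in scores if v != query and v not in linked]
    # Highest score first; ties are broken by name so the order is deterministic.
    return sorted(candidates, key=lambda v: (-scores[v], v))
```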
Geo-Friends Finding Framework: 3 Steps
• GPS pattern extraction
• Convert raw, noisy GPS data to meaningful and representative GPS patterns
• Pattern-based heterogeneous information network building
• Combine geographical and social information together in one network
• Random walk with restart on the network
• Use random walk score to measure similarity between people vertices
GPS Pattern Extraction
• Based on empirical observations and heuristics, we propose four different GPS patterns to capture this information
• First, convert the raw GPS trajectory dataset S to a categorical dataset Scat and a sequential dataset Sseq
• Scat : Discard temporal information and keep discretized locations in an unordered manner
• Sseq : Locations are sequentially connected by the order of timestamps
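The conversion to Scat and Sseq can be sketched as follows. The slides do not say how locations are discretized, so the fixed grid with a 0.01-degree cell size and the collapsing of consecutive repeats in the sequence are assumptions for illustration only.

```python
import math


def discretize(lat, lon, cell=0.01):
    """Map a coordinate to a grid-cell id (grid and cell size are assumptions)."""
    return (math.floor(lat / cell), math.floor(lon / cell))


def to_categorical(trajectory, cell=0.01):
    """Scat entry: the unordered set of visited cells, with time discarded."""
    return {discretize(lat, lon, cell) for _, lat, lon in trajectory}


def to_sequential(trajectory, cell=0.01):
    """Sseq entry: visited cells in timestamp order, consecutive repeats collapsed."""
    sequence = []
    for _, lat, lon in sorted(trajectory):  # tuples sort by timestamp first
        location = discretize(lat, lon, cell)
        if not sequence or sequence[-1] != location:
            sequence.append(location)
    return sequence
```

Each trajectory is a list of `(timestamp, lat, lon)` tuples; applying both functions to every user yields the two datasets used by the pattern definitions.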
FL-Pattern
• FL-Pattern: Closed frequent patterns with support ≥ 2 in Scat are defined as Frequent Location Patterns
• Frequent patterns in Scat could be generated using FP-Growth
• Heuristic: GPS locations can reflect people’s interests, and people tend to go to their interest-related locations more often
• If two people share common locations, which suggests they might share common interests, the probability that they become friends would be higher.
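To make the FL-Pattern definition concrete, here is a minimal sketch that finds closed frequent location sets by brute force. It is not FP-Growth: it relies on the fact that every closed itemset is an intersection of transactions, which is simple but only practical for small inputs. The function name and the input layout (`{user: set of locations}`) are assumptions.

```python
from itertools import combinations


def closed_frequent_locations(s_cat, min_support=2):
    """Return {location set: supporting users} for closed sets with enough support."""
    transactions = [frozenset(locations) for locations in s_cat.values()]
    # Start from pairwise intersections, then keep intersecting until nothing new appears.
    closed = {a & b for a, b in combinations(transactions, 2)}
    closed.discard(frozenset())
    changed = True
    while changed:
        changed = False
        for pattern in list(closed):
            for transaction in transactions:
                smaller = pattern & transaction
                if smaller and smaller not in closed:
                    closed.add(smaller)
                    changed = True
    result = {}
    for pattern in closed:
        supporters = {u for u, locations in s_cat.items() if pattern <= set(locations)}
        if len(supporters) >= min_support:
            result[pattern] = supporters
    return result
```

With three users who visit {gym, store, office}, {gym, store, park}, and {office, gym}, the closed patterns are {gym, store}, {gym, office}, and {gym}.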
FT-Pattern
• FT-Pattern: A closed sequential pattern with support ≥ 2 and length ≥ 2 in Sseq is a Frequent Trajectory Pattern
• Sequential patterns in Sseq could be generated using PrefixSpan
• Heuristic: GPS trajectory segments indicate people’s habits and routines
• People who share similar routines tend to become friends
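The FT-Pattern definition can be illustrated with a compact prefix-projection miner in the spirit of PrefixSpan, followed by a closedness filter. This is a simplified sketch for sequences of single locations, not a full PrefixSpan implementation; the function names and the `{user: list of locations}` input are assumptions.

```python
def frequent_sequences(s_seq, min_support=2, min_length=2):
    """Return {pattern tuple: supporting users} for frequent subsequences."""
    found = {}

    def grow(prefix, projected):
        # projected maps each supporting user to the index just past the prefix match.
        counts = {}
        for user, start in projected.items():
            for item in set(s_seq[user][start:]):
                counts.setdefault(item, set()).add(user)
        for item, users in counts.items():
            if len(users) < min_support:
                continue
            pattern = prefix + (item,)
            if len(pattern) >= min_length:
                found[pattern] = users
            grow(pattern, {u: s_seq[u].index(item, projected[u]) + 1 for u in users})

    grow((), {user: 0 for user in s_seq})
    return found


def is_subsequence(short, long):
    remaining = iter(long)
    return all(item in remaining for item in short)


def closed_only(patterns):
    """Drop patterns that have a longer super-pattern with the same support."""
    return {
        p: users
        for p, users in patterns.items()
        if not any(
            p != q and len(users) == len(other) and is_subsequence(p, q)
            for q, other in patterns.items()
        )
    }
```

For example, if two users both visit home, then gym, then store (possibly with other stops in between), the only closed pattern is (home, gym, store); its sub-patterns have the same support and are dropped.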
FLT-Pattern
• FLT-Pattern: For each FL-Pattern, if the locations share the same timestamp in all corresponding GPS trajectories, and no super-pattern with the same support can be generated by adding another time-constrained location, this pattern is a Frequent Location with Time Constraint Pattern
• Heuristic: If two people share the same locations at the same timestamps in their GPS trajectories, they should be geographically related.
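The time-constraint idea can be sketched in a simplified pairwise form: attach a time slot to every visited location and intersect the resulting sets. This is not the full FLT-Pattern definition (it skips the closedness check over super-patterns), and treating "same timestamp" as "same one-hour slot" is an assumption, as are the function names.

```python
from itertools import combinations


def time_slot(timestamp, slot_seconds=3600):
    """Bucket a timestamp; the one-hour slot width is an assumption."""
    return int(timestamp // slot_seconds)


def co_locations(visits, slot_seconds=3600):
    """visits: {user: [(timestamp, location), ...]}.

    Returns {(user_a, user_b): set of (slot, location) both users share}.
    """
    stamped = {
        user: {(time_slot(t, slot_seconds), location) for t, location in records}
        for user, records in visits.items()
    }
    shared = {}
    for a, b in combinations(sorted(stamped), 2):
        common = stamped[a] & stamped[b]
        if common:
            shared[(a, b)] = common
    return shared
```

Two users who visit the same gym in the same hour are paired; a third user who visits the same gym two hours later is not.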
FTT-Pattern
• FTT-Pattern: Similarly to the FLT-Pattern, a Frequent Trajectory with Time Constraint Pattern can be defined as a closed sequential pattern with support ≥ 2 and length ≥ 2 in Sseq that shares the same time period in the corresponding GPS trajectories
• Heuristic: If two people share the same routine in a specific time period, they are likely hanging out during that period
• If two people hang out, the probability of their becoming geo-friends would be higher
Pattern-Based Social Network
• Build a pattern-based heterogeneous information network by combining GPS patterns and social network structures
• Given G(S, V, E), first discard raw GPS trajectory set S
• Then for each GPS pattern, create an additional node p, and link corresponding person node v with p if this GPS pattern exists in person v’s GPS trajectory history
Pattern-Based Social Network (2)
• Create a new edge <v, p>, and add it to E′. Set E′ contains three types of edges: edges between people, edges from person nodes to pattern nodes, and edges from pattern nodes to person nodes.
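The construction described above can be sketched as an adjacency structure with two node types. The tagging of nodes as `("person", name)` and `("pattern", id)` and the function name are assumptions for illustration; edge weights are not assigned at this stage.

```python
def build_pattern_network(people, friendships, patterns):
    """Build the heterogeneous network as {node: set of neighbor nodes}.

    people:      iterable of person names (V)
    friendships: iterable of (person, person) pairs (E)
    patterns:    {pattern id: set of people whose trajectories contain it}
    """
    adjacency = {("person", v): set() for v in people}
    # Edges between people, kept from the original social network.
    for a, b in friendships:
        adjacency.setdefault(("person", a), set()).add(("person", b))
        adjacency.setdefault(("person", b), set()).add(("person", a))
    # One extra node per GPS pattern, linked in both directions to its supporters.
    for pattern_id, supporters in patterns.items():
        p = ("pattern", pattern_id)
        adjacency[p] = set()
        for v in supporters:
            adjacency.setdefault(("person", v), set()).add(p)  # person -> pattern
            adjacency[p].add(("person", v))                    # pattern -> person
    return adjacency
```

Note that the raw trajectories are no longer needed once the patterns are known, which matches the step of discarding S before building the network.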
Pattern Refinement
• Adding a large number of GPS patterns without selection may badly degrade performance
• Common locations, e.g., bus stops and hospitals, contain no social similarity
• Instead of manually refining patterns, we employ an entropy-based thresholding measure* to refine and select discriminative GPS patterns
• This method filters out patterns with high frequency and low length
* J.N. Kapur, P.K. Sahoo and A.K.C. Wong. A new method for gray-level picture thresholding using the entropy of the histogram. In Computer Vision, Graphics, and Image Processing, March 1985.
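For reference, the cited thresholding rule can be sketched on a generic histogram: pick the cut that maximizes the sum of the entropies of the two sides. How the histogram is built from GPS patterns (for example, over pattern support counts) is not stated on the slide, so that part is left to the caller and should be read as an assumption.

```python
import math


def kapur_threshold(histogram):
    """Return the bin index t that maximizes entropy(bins <= t) + entropy(bins > t)."""
    total = sum(histogram)
    probs = [count / total for count in histogram]
    best_t, best_entropy = 0, float("-inf")
    for t in range(len(probs) - 1):
        low, high = probs[: t + 1], probs[t + 1 :]
        p_low, p_high = sum(low), sum(high)
        if p_low == 0 or p_high == 0:
            continue  # one side is empty, so the split is not meaningful
        entropy = -sum(p / p_low * math.log(p / p_low) for p in low if p > 0)
        entropy -= sum(p / p_high * math.log(p / p_high) for p in high if p > 0)
        if entropy > best_entropy:
            best_t, best_entropy = t, entropy
    return best_t
```

One possible use, under the assumption above, is to histogram pattern frequencies and drop the patterns that fall on the high-frequency side of the returned cut.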
Edge Weights: between Pattern Nodes and Person Nodes
• After the construction of the heterogeneous information network, edge weights between nodes need to be defined
• From different types of GPS pattern nodes to person nodes
Nbp(v) is the set of pattern nodes connected to person node v
length(p) denotes the length of pattern p
timespan(p) denotes the time span of a time-constrained pattern p
Parameters α, β, γ and θ control pattern importance
Edge Weights (2)
• From pattern nodes to person nodes
• Nbv(p) denotes the set of person nodes connecting to pattern node p
• From person nodes to person nodes
• Nbv(v) denotes the set of person nodes connected to person node v
Transition Matrix
• In order to apply random walk with restart on the network, we need to convert the network into a transition matrix and then normalize the edge weights of pattern nodes
• Pr(V) is a |V| × |V| matrix representing the transition probability from person nodes to person nodes
• Pr(A) is a |P| × |V| matrix representing the transition probability from GPS pattern nodes to person nodes
• Pr(B) is a |V| × |P| matrix representing the transition probability from person nodes to GPS pattern nodes
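The three blocks can be sketched with plain dictionaries. The example below normalizes each node's outgoing weights to sum to 1 and then splits the result by node type; normalizing every row jointly is an assumption, since the slide only states that pattern-node weights are normalized, and the function names are hypothetical.

```python
def row_normalize(weights):
    """weights: {source: {target: weight}} -> same shape, each row summing to 1."""
    normalized = {}
    for source, row in weights.items():
        total = sum(row.values())
        normalized[source] = {t: w / total for t, w in row.items()} if total > 0 else {}
    return normalized


def split_blocks(transition, people):
    """Split transition probabilities into Pr(V), Pr(A), Pr(B) keyed by (source, target)."""
    pr_v, pr_a, pr_b = {}, {}, {}
    for source, row in transition.items():
        for target, prob in row.items():
            if source in people and target in people:
                pr_v[(source, target)] = prob  # person -> person
            elif source not in people and target in people:
                pr_a[(source, target)] = prob  # pattern -> person
            elif source in people:
                pr_b[(source, target)] = prob  # person -> pattern
    return pr_v, pr_a, pr_b
```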
Why Choose Random Walk with Restart
• Random Walk with Restart can simulate the following aspects of friend finding in a GPS-based social network
  • If a GPS pattern contains more geographical information, the incoming probability from person nodes to this pattern should be higher, which increases the probability of moving from one person to another via this GPS pattern
  • If two people share more GPS patterns, the overall probability of one person linking to another via these GPS pattern nodes would be higher
  • If a GPS pattern is rare, the outgoing probability of this node would be larger, so that people connected to this pattern would have a higher probability of being linked together
Random Walk with Restart
• Denote the query person as v∗. The random walk process can be represented as:
• RN is a vector that represents the link relevance from all the nodes to the query person v∗
• R(t)N represents the link relevance of each node at the t-th iteration
• We assign R(0)N(v∗) = 1, where v∗ is the query node, and set all the other elements to 0
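A minimal sketch of the iteration, using the textbook form of random walk with restart, R(t+1) = (1 − c) · Pᵀ · R(t) + c · R(0). The restart probability c (0.15 here), the stopping tolerance, and the dictionary-based transition table P are assumptions for illustration and are not taken from the slides.

```python
def random_walk_with_restart(transition, query, restart=0.15, tol=1e-10, max_iter=1000):
    """Return {node: relevance to query} under the textbook RWR update.

    transition: {source: {target: probability}}, each row summing to 1
    """
    nodes = set(transition) | {t for row in transition.values() for t in row}
    # R(0): all mass on the query node.
    start = {n: 1.0 if n == query else 0.0 for n in nodes}
    scores = dict(start)
    for _ in range(max_iter):
        # Restart term, then mass pushed along outgoing edges.
        updated = {n: restart * start[n] for n in nodes}
        for source, row in transition.items():
            for target, prob in row.items():
                updated[target] += (1.0 - restart) * prob * scores[source]
        delta = sum(abs(updated[n] - scores[n]) for n in nodes)
        scores = updated
        if delta < tol:
            break
    return scores
```

In a chain alex, pattern p1, david, carlos, the walk started at alex gives david a higher score than carlos, because carlos can only be reached through david.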
Datasets
• We generate 4 synthetic datasets with different sizes, attributes and distributions in order to cover different scenarios and thoroughly test our framework
• We also apply our method to the MIT Reality Mining dataset
Competitor Methods
• Random: random selection
• Same Edge: choose friends based on the number of common friends
• GPS Similarity: choose friends by measuring GPS location and trajectory similarity
• Random Walk without GPS Patterns: Recommend friends by applying random walk with restart on the original social network
• Bluetooth (only MIT dataset): Recommend friends by returning people who share high meeting frequency
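As an illustration of the simplest structural baseline above, the common-friends idea can be sketched as a count of shared neighbors. The exact scoring used in the experiments is not given on the slide, so the function below is only an assumed reading of "Same Edge".

```python
def common_friend_scores(query, friends):
    """friends: {person: set of friends}. Score each non-friend by shared friend count."""
    own = friends.get(query, set())
    return {
        person: len(own & their_friends)
        for person, their_friends in friends.items()
        if person != query and person not in own
    }
```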
Performance (1)
[Figures: gpsnet120 precision; gpsnet120 recall]
Performance (2)
[Figures: MIT dataset precision; MIT dataset recall]
Performance (3)
[Figures: gpsnet120 dataset precision-recall curve; MIT dataset precision-recall curve]
Precision-recall curves comparing Random Walk with Restart without GPS information and our method
Please refer to the paper for more experiment results and analysis
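For readers unfamiliar with the metrics in these plots, precision and recall for a ranked recommendation list can be sketched with their standard definitions at a cutoff k. The evaluation protocol actually used in the experiments (how held-out friends are chosen, which k values are plotted) is not shown on the slides, so this is a generic illustration only.

```python
def precision_recall_at_k(ranked, relevant, k):
    """ranked: recommended people, best first; relevant: set of true friends."""
    top = ranked[:k]
    hits = sum(1 for person in top if person in relevant)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```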
Conclusions
• Propose the problem of identifying geographically related friends, together with a three-step statistical framework that combines geo-information with social analysis
• Future work
  • Domain-oriented GPS pattern definition
  • Friend recommendation based on users and their interests
  • Real-time friend recommendation by tracking user GPS usage on the fly