CIKM 2011 :: Glasgow :: Oct 25, 2011
Toward Traffic-Driven Location-Based Web Search
Zhiyuan Cheng, James Caverlee, Krishna Kamath, and Kyumin Lee, Texas A&M University
Location-Based Search (aka Local Search)
• 20% of queries on Google are related to location (100 million per day).
• Mobile search is taking off…
Example query “Sushi”, with results:
• Sorted by content-based relevance
• Sorted by distance
• Sorted by user rating
• Sorted by online reputation (PageRank scores)
However, the real-world trails of human activity are missing!
• For example, which sushi places are “hot” in terms of traffic?
• Which ones are relatively less “packed” at noon?
• What places do people “like me” go to?
• Our focus: Location sharing services
3400% in 2010!
Over 1 billion in 2011!
Incorporating Check-in Data into Local Search
•How do we model the real-world trails of human activity from the check-ins?
• Our proposal: venue-based traffic patterns built from the timestamps of the check-ins at each venue.
• Besides traffic patterns, category labels for locations can help refine search quality.
Expressing “Traffic” for a Location
[Figure: Daily Traffic Pattern for Walmart; Weekly Traffic Pattern for Walmart]
For check-ins from a specific location:
• bucket the check-in timestamps into the 24 hours of the day to generate a daily traffic pattern; and
• bucket the timestamps into the 7 days of the week to generate a weekly traffic pattern.
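The bucketing step can be sketched in Python. This is a minimal illustration and not the authors' code. It assumes each check-in timestamp is a `datetime` already converted to the venue's local time. The function name `traffic_patterns` is hypothetical.

```python
from datetime import datetime


def traffic_patterns(timestamps):
    """Bucket check-in timestamps into a daily (24-slot) and weekly (7-slot) pattern.

    Assumption: every timestamp is a datetime in the venue's local time zone.
    """
    daily = [0] * 24   # one bucket per hour of the day, 0..23
    weekly = [0] * 7   # one bucket per day of the week, Monday=0 .. Sunday=6
    for ts in timestamps:
        daily[ts.hour] += 1
        weekly[ts.weekday()] += 1
    return daily, weekly


# Example: two lunchtime check-ins on Monday 24 Oct 2011, one evening check-in on Saturday 29 Oct 2011.
example = [datetime(2011, 10, 24, 12, 5), datetime(2011, 10, 24, 12, 40), datetime(2011, 10, 29, 18, 15)]
daily, weekly = traffic_patterns(example)
print(weekly)  # [2, 0, 0, 0, 0, 1, 0]
```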
Research Questions
Question 1: Do traffic patterns encode semantically meaningful information, or are they just noise?
Question 2: If category labels are missing, can we predict them from traffic patterns?
Question 3: Can we find groups of venues that are semantically correlated by traffic pattern? (Please refer to the paper.)
Gathering Check-ins
• 22.5 million unique check-ins, 53.5% of which are from Foursquare.
• Data format: check-in = {user, text, location, time}
Gathering Venues
•20 million venues crawled from Foursquare
• From the venue information, we extract user-generated tags so that we can compare the traffic-pattern-based method with the typical text-similarity-based method.
• Among the 12.7 million venues analyzed by our best-effort parser, 7.8% (1.0m) are tagged and 56.4% (7.1m) are voluntarily categorized.
Category information is missing for 43.6% of venues!
• For each venue, we generate a daily traffic pattern and a weekly traffic pattern, each measured by a corresponding frequency function ($F_T$):
$F_T = (f_{t_1}, f_{t_2}, \ldots, f_{t_n})$, where $f_{t_i}$ is the frequency for time unit $t_i$ over the whole period $T$.
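The slides do not say whether $f_{t_i}$ is a raw count or a normalized share. The sketch below assumes it is the relative frequency, meaning the fraction of all check-ins in period $T$ that fall into time unit $t_i$, so that venues of very different popularity become comparable. The name `frequency_function` is hypothetical.

```python
def frequency_function(counts):
    """Turn per-time-unit check-in counts into F_T = (f_t1, ..., f_tn).

    Assumption: f_ti is the relative frequency, i.e. the share of all
    check-ins over the period T that fall into time unit t_i.
    """
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)  # venue with no check-ins: all-zero pattern
    return [c / total for c in counts]


weekly_counts = [2, 0, 0, 0, 0, 1, 1]  # hypothetical weekly counts for one venue
print(frequency_function(weekly_counts))  # [0.5, 0.0, 0.0, 0.0, 0.0, 0.25, 0.25]
```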
Exploring Semantic Correlation Between Venues
[Figure (recall): examples of daily and weekly traffic patterns for Walmart]
• Apply temporal correlation to the frequency functions to measure the similarity between traffic patterns:
[Figure: examples of a low and a high value of temporal correlation]
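The slides do not reproduce the temporal-correlation formula. The sketch below treats it as the Pearson correlation coefficient over the n time units. This is an assumption, though it matches the reported values, which range from about 0.95 down to negative numbers. The name `temporal_correlation` is hypothetical.

```python
import math


def temporal_correlation(f_a, f_b):
    """Pearson correlation between two frequency functions of equal length.

    Assumption: the slides' "temporal correlation" is modeled here as the
    Pearson correlation coefficient over the n time units.
    """
    if len(f_a) != len(f_b):
        raise ValueError("frequency functions must cover the same time units")
    n = len(f_a)
    mean_a = sum(f_a) / n
    mean_b = sum(f_b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(f_a, f_b))
    var_a = sum((x - mean_a) ** 2 for x in f_a)
    var_b = sum((y - mean_b) ** 2 for y in f_b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a perfectly flat pattern carries no temporal signal
    return cov / math.sqrt(var_a * var_b)
```

Pearson correlation is unchanged by rescaling either input. As a result, it gives the same value whether it is applied to raw hourly counts or to normalized frequencies.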
• We selected the 271 venues with the densest check-ins and calculated the pairwise correlation between their daily traffic patterns.

Top Pairs with Highest Correlation          Correlation
Target -- Borders                           0.949
Walgreens -- CVS Pharmacy                   0.947
Staples -- Apple Store                      0.946
Subway -- Jason's Deli                      0.945
Starbucks -- Caribou Coffee                 0.944

Bottom Pairs with Lowest Correlation        Correlation
Texas Roadhouse -- US Post Office           0.010
Hampton Inn by Hilton -- Discount Tire Co   -0.278
Shell Station -- Denny’s                    -0.364
Dairy Queen -- Einstein Bros Bagels         -0.379
White Castle -- Dunkin’ Donuts              -0.653
•Groups of venues that are highly similar in terms of traffic pattern.
A natural next step is to see whether we can cluster venues according to their traffic patterns (refer to the paper).
• Does it work all the time?

Pair                                        Temporal Correlation
McDonald’s – Bank of America ATM            0.890
McDonald’s – Verizon Wireless               0.883
Church’s Chicken – Victoria’s Secret        0.859

• So far, we have only used daily traffic patterns. Would weekly traffic patterns help?
T-Correlation: 0.564
Supervised Venue Categorization
• Recall that category information is missing for 43.6% of venues.
• Categorizing unlabeled venues is a crucial step toward traffic-based location search.
• In the experiments, we generated two sets of features:
  • daily and weekly traffic patterns; and
  • text features from user-generated tags.
• Dataset:
  • Training set: the aforementioned set of 271 venues.
  • Test sets: sets of venues with a pre-specified minimum number of check-ins.
10-Fold Cross-Validation on Training Data
1NN works best with traffic-pattern features.
Naïve Bayes works best with text features.
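A 1-nearest-neighbour classifier of the kind reported above can be sketched as follows. The slides specify neither the feature layout nor the distance, so two assumptions apply. Each venue's feature vector is taken to be its daily and weekly frequency functions concatenated, and distance is taken to be Euclidean. The name `predict_category_1nn` is hypothetical.

```python
import math


def predict_category_1nn(train, query):
    """Predict a venue category with a 1-nearest-neighbour rule.

    `train` is a list of (feature_vector, category) pairs for labeled venues;
    `query` is the feature vector of an unlabeled venue. Distance is Euclidean
    (an assumption; any distance over the frequency functions could be used).
    """
    if not train:
        raise ValueError("need at least one labeled venue")
    # The unlabeled venue inherits the category of its single closest neighbour.
    return min(train, key=lambda item: math.dist(item[0], query))[1]
```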
Traffic pattern + vector space model features, with chi-square feature selection (10-fold cross-validation on training data):

Metric        Naïve Bayes   1NN     AdaBoostM1   SimpleCart
F1-Measure    0.866         0.765   0.66         0.75

Feature selection combined with both feature types yields the best overall categorization performance.
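Chi-square feature selection on tag features can be sketched with the textbook 2×2 contingency-table statistic. The slides do not give the exact setup, so this is an illustrative sketch and not the authors' pipeline. The names `chi_square` and `select_tags` are hypothetical.

```python
def chi_square(n11, n10, n01, n00):
    """Chi-square statistic of a 2x2 tag/category contingency table.

    n11: venues in the category that carry the tag
    n10: venues outside the category that carry the tag
    n01: venues in the category without the tag
    n00: venues outside the category without the tag
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        return 0.0  # an empty margin means the table carries no evidence
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def select_tags(tag_sets, labels, category, k):
    """Return the k tags whose presence depends most on `category`."""
    in_cat = [label == category for label in labels]
    scores = {}
    for tag in set().union(*tag_sets):
        has = [tag in tags for tags in tag_sets]
        n11 = sum(h and c for h, c in zip(has, in_cat))
        n10 = sum(h and not c for h, c in zip(has, in_cat))
        n01 = sum(not h and c for h, c in zip(has, in_cat))
        n00 = sum(not h and not c for h, c in zip(has, in_cat))
        scores[tag] = chi_square(n11, n10, n01, n00)
    # Highest score first; ties broken alphabetically for a stable result.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

The statistic measures dependence in either direction. A tag that never appears at cafés can therefore score as high as one that always does.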
Number of Venues in Test Sets
Min # of Check-ins   10     30    50    100   200   300   500
# of Venues          1392   983   695   353   142   60    21
Impact of Data Sparsity
[Figure: evaluation with the best-performing classifier, 1NN]
Note that we do not use text features in this evaluation because of data sparsity.
• So far, we have observed that:
  • Traffic patterns generated from check-ins at location-sharing services do reveal semantic correlation among related venues. [Chien WWW 2005; Ciaramita WWW 2008; Golder 2007]
  • Traffic patterns can be used to accurately predict the semantic category of uncategorized locations, with an F1-measure of 0.8.
Scenarios for Traffic-Based Local Search
• A group of people is looking for off-peak (i.e., quiet) restaurants around 7 PM where they can have dinner and talk.
[Figure: Brunch Restaurants; Sandwich Shops; Bar & Grill]
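The off-peak scenario can be sketched as a ranking by each venue's daily frequency at the requested hour. This rests on two assumptions. First, a lower share of check-ins at that hour is read as "less packed." Second, the candidate set has already been filtered to restaurants, for example using known or predicted categories. The name `quiet_at` is hypothetical.

```python
def quiet_at(venues, hour, k=3):
    """Return the k venues with the lowest share of check-ins at `hour`.

    `venues` maps a venue name to its 24-slot daily frequency function.
    Assumption: a lower f_t at the requested hour means "less packed".
    """
    # Sort by frequency at that hour; ties are broken by name for stability.
    return sorted(venues, key=lambda v: (venues[v][hour], v))[:k]
```

For the 7 PM dinner case, call `quiet_at(restaurants, 19)`.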
• Jerry is looking for a place to play basketball whose traffic is similar to that of Williams Park.
[Figure: Recommended Locations]
Takeaways
•Traffic patterns do reveal semantic information.
•Traffic patterns can help accurately predict the semantic category of uncategorized locations.
•Traffic patterns and the category information can be naturally incorporated into traditional location-based search. [Markowetz 2005; McCurley WWW 2001; Watters JASIST 2003]
Future Work
•Incorporate new models of human dynamics based on location sharing services [Noulas ICWSM 2011; Yang WSDM 2011; Ye SIGSPATIAL 2010]
•Explore personalized traffic-based local search by considering users’ profiles
•Explore possibilities for activity-based local search.
Conclusion
Thanks! Questions?
For Paper + Slides + Dataset: http://infolab.tamu.edu/data/