
CIKM 2011 :: Glasgow :: Oct 25, 2011

Toward Traffic-Driven Location-Based Web Search

Zhiyuan Cheng, James Caverlee, Krishna Kamath, and Kyumin Lee Texas A&M University

Location-Based Search (aka Local Search)

20% of queries on Google are related to location (100 million / day)

Mobile search is taking off …

Sushi

Sorted by Content-Based Relevance:

Sushi

Sorted by Distance:

Sushi

Sorted by User Rating:

Sushi

Sorted by On-line Reputation (PageRank scores):

However, the real-world trails of human activity are missing!

• For example, what sushi places are “hot” in terms of traffic?
• Which ones are relatively less “packed” at noon?
• What places do people “like me” go to?

• Our focus: Location sharing services

3400% in 2010!

Over 1 billion in 2011!

Incorporating Check-in Data into Local Search

•How do we model the real-world trails of human activity from the check-ins?

• Our proposal: venue-based traffic pattern based on the timestamps of the check-ins @ the venue.

• Besides the traffic patterns, category labels for locations help refine search quality.

Express “Traffic” for a Location

[Figure: Daily Traffic Pattern for Walmart; Weekly Traffic Pattern for Walmart]

For check-ins from a specific location:
• bucket the timestamps of the check-ins into 24 hours to generate a daily traffic pattern;
• and bucket the timestamps into 7 days to generate a weekly traffic pattern.
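
The bucketing step above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `traffic_patterns` and the sample check-ins are hypothetical, and time zones are ignored (each timestamp is assumed to already be in the venue's local time).

```python
from collections import Counter
from datetime import datetime


def traffic_patterns(timestamps):
    """Return (daily, weekly) check-in counts for one venue.

    daily  -- list of 24 counts, index = hour of day (0-23)
    weekly -- list of 7 counts, index = day of week (0 = Monday)
    """
    hours = Counter(ts.hour for ts in timestamps)
    days = Counter(ts.weekday() for ts in timestamps)
    daily = [hours.get(h, 0) for h in range(24)]
    weekly = [days.get(d, 0) for d in range(7)]
    return daily, weekly


# Hypothetical check-in times for a single venue.
checkins = [
    datetime(2011, 3, 7, 12, 15),   # Monday, noon hour
    datetime(2011, 3, 7, 12, 40),   # Monday, noon hour
    datetime(2011, 3, 12, 18, 5),   # Saturday, 6 PM hour
]
daily, weekly = traffic_patterns(checkins)
```

Here `daily[12]` counts the two noon-hour check-ins and `weekly[5]` counts the single Saturday check-in.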

Research Questions

Question 1: Do traffic patterns encode semantically meaningful information, or are they just noise?

Question 2: If category labels are missing, can we predict the category label using traffic patterns?

Question 3: Can we find groups of venues that are semantically correlated by traffic pattern? (Please refer to the paper.)

Gathering Check-ins

• 22.5 million unique check-ins, of which 53.5% are from Foursquare.

• Data format: check-in = {user, text, location, time}

Gathering Venues

•20 million venues crawled from Foursquare

•From venue information, we can extract user-generated tags so that we can compare the traffic pattern based method with the typical method using text similarities.

Gathering Venues

• Among the 12.7 million venues analyzed by our best-effort parser, 7.8% (1.0m) are tagged, and 56.4% (7.1m) are voluntarily categorized.

Category information is missing for 43.6% of venues!

• For each venue, a daily traffic pattern and a weekly traffic pattern are generated, each represented by a corresponding frequency function $F_T$:

$F_T = (f_{t_1}, f_{t_2}, \ldots, f_{t_n})$, in which $f_{t_i}$ is the frequency for time unit $t_i$ over the whole period $T$.

Exploring Semantic Correlation Between Venues

Recall: examples of traffic patterns for Walmart

[Figure: Daily Traffic Pattern for Walmart; Weekly Traffic Pattern for Walmart]


• Apply temporal correlation to measure the similarity between two traffic patterns, i.e., between their frequency functions.

[Figure: example pattern pairs with a low value of temporal correlation and with a high value of temporal correlation]
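
The slides do not reproduce the formula for temporal correlation. A minimal sketch, assuming it is the Pearson correlation coefficient between the two frequency functions; the function name, the handling of flat patterns, and the shortened toy patterns are all illustrative assumptions:

```python
import math


def temporal_correlation(f, g):
    """Pearson correlation of two equal-length frequency functions."""
    if len(f) != len(g):
        raise ValueError("patterns must have the same length")
    n = len(f)
    mean_f = sum(f) / n
    mean_g = sum(g) / n
    cov = sum((a - mean_f) * (b - mean_g) for a, b in zip(f, g))
    var_f = sum((a - mean_f) ** 2 for a in f)
    var_g = sum((b - mean_g) ** 2 for b in g)
    if var_f == 0 or var_g == 0:
        # Assumption: a completely flat pattern is treated as uncorrelated.
        return 0.0
    return cov / math.sqrt(var_f * var_g)


# Hypothetical 6-bin patterns (real daily patterns would have 24 bins).
lunch_spot = [0, 1, 5, 9, 4, 1]
other_lunch_spot = [0, 2, 10, 18, 8, 2]   # same shape, twice the volume
late_night_spot = [9, 5, 1, 0, 1, 4]

same_shape = temporal_correlation(lunch_spot, other_lunch_spot)
opposite_shape = temporal_correlation(lunch_spot, late_night_spot)
```

Because the measure is scale-invariant, two venues with the same shape but different check-in volume score close to 1, while venues that peak at opposite times score below 0.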


Top Pairs with Highest Correlation    Correlation
Target -- Borders                     0.949
Walgreens -- CVS Pharmacy             0.947
Staples -- Apple Store                0.946
Subway -- Jason's Deli                0.945
Starbucks -- Caribou Coffee           0.944

•Selected 271 venues with densest check-ins, and calculated pair-wise correlation between their daily traffic patterns.

Bottom Pairs with Lowest Correlation        Correlation
Texas Roadhouse -- US Post Office           0.010
Hampton Inn by Hilton -- Discount Tire Co   -0.278
Shell Station -- Denny’s                    -0.364
Dairy Queen -- Einstein Bros Bagels         -0.379
White Castle -- Dunkin’ Donuts              -0.653
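
The pair-wise comparison behind these two tables can be sketched as follows. This assumes Pearson correlation as the similarity measure; the venue names and the 6-bin patterns are made up for illustration and stand in for 24-hour daily patterns.

```python
import math
from itertools import combinations


def pearson(f, g):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(f)
    mean_f, mean_g = sum(f) / n, sum(g) / n
    cov = sum((a - mean_f) * (b - mean_g) for a, b in zip(f, g))
    var_f = sum((a - mean_f) ** 2 for a in f)
    var_g = sum((b - mean_g) ** 2 for b in g)
    if var_f == 0 or var_g == 0:
        return 0.0
    return cov / math.sqrt(var_f * var_g)


def rank_pairs(patterns):
    """Return [(correlation, venue_a, venue_b), ...] sorted from high to low."""
    scored = [
        (pearson(patterns[a], patterns[b]), a, b)
        for a, b in combinations(sorted(patterns), 2)
    ]
    return sorted(scored, reverse=True)


# Hypothetical venues with toy patterns (morning-heavy vs. evening-heavy).
patterns = {
    "Coffee A": [8, 6, 2, 1, 1, 0],
    "Coffee B": [9, 5, 3, 1, 0, 0],
    "Bar": [0, 0, 1, 2, 6, 9],
}
ranked = rank_pairs(patterns)
top_pair = ranked[0]
bottom_pair = ranked[-1]
```

The head of `ranked` corresponds to a "top pairs" table and the tail to a "bottom pairs" table.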


•Groups of venues that are highly similar in terms of traffic pattern.

A natural follow-up is to see whether we can cluster venues according to their traffic patterns (refer to the paper).


Pair                                    Temporal Correlation
McDonald’s -- Bank of America ATM       0.890
McDonald’s -- Verizon Wireless          0.883
Church’s Chicken -- Victoria’s Secret   0.859

•Does it work all the time?

• So far, we have only used daily traffic patterns. Will weekly traffic patterns help?

T-Correlation: 0.564

Supervised Venue Categorization

• Recall that category information is missing for 43.6% of venues.

• Categorizing unlabeled venues is a crucial step toward traffic-based location search.

• In the experiments, we generated two sets of features:
  • daily and weekly traffic patterns;
  • text features from user-generated tags.

• Dataset:
  • Training set: the aforementioned set of 271 venues
  • Test sets: sets of venues with a pre-specified minimum number of check-ins.

Supervised Venue Categorization

10-Fold Cross Validation on Training Data

1NN works best with traffic pattern features.
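
A 1-nearest-neighbor classifier over traffic-pattern features can be sketched as below. The slides do not say which distance the 1NN classifier used, so Euclidean distance is an assumption here, and the labeled vectors are shortened, made-up stand-ins for the daily and weekly patterns.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def predict_1nn(train, query):
    """Return the label of the training example closest to `query`.

    train -- list of (feature_vector, category_label) pairs
    """
    _, label = min(train, key=lambda item: euclidean(item[0], query))
    return label


# Hypothetical normalized 4-bin patterns (morning, noon, afternoon, night).
train = [
    ([0.6, 0.3, 0.1, 0.0], "Coffee Shop"),
    ([0.1, 0.6, 0.2, 0.1], "Sandwich Shop"),
    ([0.0, 0.1, 0.3, 0.6], "Bar"),
]
unlabeled_venue = [0.5, 0.4, 0.1, 0.0]
predicted = predict_1nn(train, unlabeled_venue)
```

The unlabeled venue inherits the category of whichever labeled venue has the most similar traffic shape, which is the intuition behind using traffic patterns as features.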

Supervised Venue Categorization

10-Fold Cross Validation on Training Data

Naïve Bayes works best with text features.


Metric        Naïve Bayes   1NN     AdaBoostM1   SimpleCart
F1-Measure    0.866         0.765   0.66         0.75

Traffic Pattern + Vector Space Models plus Chi-Square Feature Selection

10-Fold Cross Validation on Training Data

Feature selection and the combination of both feature sets yield the best overall categorization performance.
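
The slides name chi-square feature selection without giving details. A minimal sketch of the standard 2x2 chi-square statistic for one (tag, category) pair is shown below; how the authors applied it is not stated, and the function name and sample venues are hypothetical.

```python
def chi_square(tagged_venues, tag, category):
    """Chi-square statistic for the association between a tag and a category.

    tagged_venues -- list of (set_of_tags, category_label) pairs
    """
    a = b = c = d = 0
    for tags, label in tagged_venues:
        has_tag = tag in tags
        in_category = label == category
        if has_tag and in_category:
            a += 1          # tag present, venue in category
        elif has_tag:
            b += 1          # tag present, venue in another category
        elif in_category:
            c += 1          # tag absent, venue in category
        else:
            d += 1          # tag absent, venue in another category
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator


# Hypothetical tagged venues.
venues = [
    ({"latte", "wifi"}, "Coffee Shop"),
    ({"latte", "espresso"}, "Coffee Shop"),
    ({"beer", "wifi"}, "Bar"),
    ({"beer", "pool"}, "Bar"),
]
latte_score = chi_square(venues, "latte", "Coffee Shop")
wifi_score = chi_square(venues, "wifi", "Coffee Shop")
```

Tags with a high score (here "latte") discriminate between categories and would be kept; tags spread evenly across categories (here "wifi") score near zero and would be dropped.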


Number of Venues in Test Sets

Min # of Check-ins   10     30    50    100   200   300   500
# of Venues          1392   983   695   353   142   60    21
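
Building test sets by a minimum check-in count can be sketched as follows. This assumes "minimum number of check-ins" means a count greater than or equal to the threshold; the venue names and counts are made up, and any other filtering the authors applied is not shown.

```python
def build_test_sets(checkin_counts, thresholds):
    """Map each threshold to the venues with at least that many check-ins."""
    return {
        t: sorted(v for v, n in checkin_counts.items() if n >= t)
        for t in thresholds
    }


# Hypothetical per-venue check-in counts.
counts = {"venue_a": 12, "venue_b": 45, "venue_c": 130, "venue_d": 520}
test_sets = build_test_sets(counts, [10, 30, 50, 100, 200, 300, 500])
```

The sets are nested: raising the threshold can only remove venues, which is why the number of venues shrinks from left to right in the table.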

Impacts of Sparsity of Data

Evaluation with the best-performing classifier, 1NN

Note that we do not use text features in this evaluation due to data sparsity.

• So far, we observed that:
  • Traffic patterns generated from check-ins @ location sharing services do reveal semantic correlation among related venues [Chien WWW 2005; Ciaramita WWW 2008; Golder 2007]
  • Traffic patterns can be used to accurately predict the semantic category of uncategorized locations, with an F1-measure of 0.8

Scenarios for Traffic-Based Local Search

•A group of people are looking for off-peak (i.e., quiet) restaurants around 7PM to have dinner and talk.

[Figure panels: Brunch Restaurants; Sandwich Shops; Bar & Grill]
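
One way to turn the off-peak scenario into a query is sketched below. The definition of "off-peak" (share of a venue's daily check-ins falling in the requested hour), the `max_share` threshold, and the sample patterns are all assumptions made for illustration.

```python
def off_peak_venues(daily_patterns, hour, max_share=0.05):
    """Venues whose share of check-ins in `hour` is at most max_share, quietest first."""
    quiet = []
    for venue, daily in daily_patterns.items():
        total = sum(daily)
        if total == 0:
            continue  # no check-ins, so no basis for a recommendation
        share = daily[hour] / total
        if share <= max_share:
            quiet.append((share, venue))
    return [venue for share, venue in sorted(quiet)]


def pattern(peaks):
    """Build a 24-hour pattern with a baseline of 1 check-in per hour plus peaks."""
    daily = [1] * 24
    for peak_hour, count in peaks.items():
        daily[peak_hour] = count
    return daily


# Hypothetical daily traffic patterns per category.
daily_patterns = {
    "Brunch Restaurant": pattern({10: 30, 11: 40}),
    "Sandwich Shop": pattern({12: 50, 13: 30}),
    "Bar & Grill": pattern({19: 40, 20: 40}),
}
quiet_at_7pm = off_peak_venues(daily_patterns, hour=19)
```

In this toy data the bar and grill peaks at 7 PM and is filtered out, while the brunch and sandwich venues are returned as quiet options.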

Scenarios for Traffic-Based Local Search

• Jerry is looking for a place to play basketball whose traffic is similar to that of Williams Park.

Recommended Locations

Takeaways

•Traffic patterns do reveal semantic information.

•Traffic patterns can help accurately predict the semantic category of uncategorized locations.

•Traffic patterns and the category information can be naturally incorporated into traditional location-based search. [Markowetz 2005; McCurley WWW 2001; Watters JASIST 2003]

Future Work

•Incorporate new models of human dynamics based on location sharing services [Noulas ICWSM 2011; Yang WSDM 2011; Ye SIGSPATIAL 2010]

•Explore personalized traffic-based local search by considering users’ profiles

•Explore possibilities for activity-based local search.

Conclusion

Thanks! Questions?

For Paper + Slides + Dataset: http://infolab.tamu.edu/data/
