CIKM 2011 :: Glasgow :: Oct 25, 2011

Toward Traffic-Driven Location-Based Web Search

Zhiyuan Cheng, James Caverlee, Krishna Kamath, and Kyumin Lee Texas A&M University

Location-Based Search (aka Local Search)

• 20% of queries on Google are related to location (100 million per day).
• Mobile search is taking off …

Sushi

Sorted by Content-Based Relevance:

Sushi

Sorted by Distance:

Sushi

Sorted by User Rating:

Sushi

Sorted by Online Reputation (PageRank scores):

However, the real-world trails of human activity are missing!

• For example, which sushi places are “hot” in terms of traffic?
• Which ones are relatively less “packed” at noon?
• Which places do people “like me” go to?

• Our focus: Location sharing services

3400% in 2010!

Over 1 billion in 2011!

Incorporating Check-in Data into Local Search

• How do we model the real-world trails of human activity from the check-ins?

• Our proposal: a venue-based traffic pattern built from the timestamps of the check-ins at each venue.

• Besides traffic patterns, category labels for locations can help refine search quality.

Express “Traffic” for a Location

(Figures: daily traffic pattern for Walmart; weekly traffic pattern for Walmart.)

For check-ins from a specific location:
• bucket the timestamps of the check-ins into the 24 hours of the day to generate a daily traffic pattern;
• bucket the timestamps into the 7 days of the week to generate a weekly traffic pattern.
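The two bucketing steps can be sketched in Python as below. This is a minimal illustration, not the authors' code: it assumes each timestamp is a `datetime` already expressed in the venue's local time (the slides do not say how time zones are handled), and the function names `daily_pattern` and `weekly_pattern` are hypothetical.

```python
from datetime import datetime
from typing import Iterable, List


def daily_pattern(timestamps: Iterable[datetime]) -> List[int]:
    """Count check-ins per hour of the day; index 0 covers 00:00-00:59."""
    counts = [0] * 24
    for ts in timestamps:
        counts[ts.hour] += 1
    return counts


def weekly_pattern(timestamps: Iterable[datetime]) -> List[int]:
    """Count check-ins per day of the week; index 0 is Monday (datetime.weekday())."""
    counts = [0] * 7
    for ts in timestamps:
        counts[ts.weekday()] += 1
    return counts


# Hypothetical check-ins at one venue, assumed to be in local time.
checkins = [
    datetime(2011, 10, 24, 12, 5),   # Monday, 12:05
    datetime(2011, 10, 24, 12, 40),  # Monday, 12:40
    datetime(2011, 10, 25, 18, 0),   # Tuesday, 18:00
]
print(daily_pattern(checkins)[12], daily_pattern(checkins)[18])  # 2 1
print(weekly_pattern(checkins))  # [2, 1, 0, 0, 0, 0, 0]
```

Keeping raw counts here leaves normalization as a separate, explicit choice.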

Research Questions

Question 1: Do traffic patterns encode semantically meaningful information, or are they just noise?

Question 2: If category labels are missing, can we predict them using traffic patterns?

Question 3: Can we find groups of venues that are semantically correlated by traffic pattern? (See the paper.)

Gathering Check-ins

• 22.5 million unique check-ins, 53.5% of which are from Foursquare.

• Data format: check-in = {user, text, location, time}
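A minimal sketch of the check-in record and of grouping check-ins per venue, which is the input to the per-venue traffic patterns. The class `CheckIn`, the function `timestamps_by_venue`, and treating `location` as a venue-ID string are assumptions for illustration; the slides only list the four fields.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass(frozen=True)
class CheckIn:
    """One check-in, following the format {user, text, location, time}."""
    user: str
    text: str
    location: str  # ASSUMPTION: a venue identifier; the slides do not give its type
    time: datetime


def timestamps_by_venue(checkins: List[CheckIn]) -> Dict[str, List[datetime]]:
    """Group check-in timestamps by venue (hypothetical helper)."""
    grouped: Dict[str, List[datetime]] = defaultdict(list)
    for c in checkins:
        grouped[c.location].append(c.time)
    return dict(grouped)


# Made-up sample records, not taken from the dataset.
sample = [
    CheckIn("u1", "Lunch!", "walmart-123", datetime(2011, 10, 24, 12, 5)),
    CheckIn("u2", "Groceries", "walmart-123", datetime(2011, 10, 24, 18, 30)),
    CheckIn("u1", "Coffee", "starbucks-9", datetime(2011, 10, 25, 8, 15)),
]
print({venue: len(ts) for venue, ts in timestamps_by_venue(sample).items()})
# {'walmart-123': 2, 'starbucks-9': 1}
```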

Gathering Venues

• 20 million venues crawled from Foursquare.

• From the venue information we can extract user-generated tags, so that we can compare the traffic-pattern-based method with the typical method based on text similarity.

Gathering Venues

• Among the 12.7 million venues analyzed by our best-effort parser, 7.8% (1.0M) are tagged and 56.4% (7.1M) are voluntarily categorized.

Category information is missing for 43.6% of venues!

Exploring Semantic Correlation Between Venues

• For each venue, a daily traffic pattern and a weekly traffic pattern are generated, each represented by its frequency function $F_T$:

$F_T = (f_{t_1}, f_{t_2}, \ldots, f_{t_n})$, where $f_{t_i}$ is the frequency for time unit $t_i$ over the whole series of period $T$ (hours of the day for the daily pattern, $n = 24$; days of the week for the weekly pattern, $n = 7$).

(Figures: recall the examples of daily and weekly traffic patterns for Walmart.)

Exploring Semantic Correlation Between Venues

• Apply temporal correlation to the frequency functions of two traffic patterns to measure their similarity:

(Figures: a pair with a low value of temporal correlation; a pair with a high value of temporal correlation.)
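The slides do not give the formula for temporal correlation. The sketch below assumes it is the Pearson correlation coefficient between two frequency functions, as in the temporal-correlation work of [Chien WWW 2005] cited in this deck; `temporal_correlation` is a hypothetical name.

```python
import math
from typing import Sequence


def temporal_correlation(f: Sequence[float], g: Sequence[float]) -> float:
    """Pearson correlation between two frequency functions over the same time units."""
    if len(f) != len(g) or len(f) < 2:
        raise ValueError("patterns must have equal length of at least 2")
    n = len(f)
    mean_f = sum(f) / n
    mean_g = sum(g) / n
    cov = sum((a - mean_f) * (b - mean_g) for a, b in zip(f, g))
    var_f = sum((a - mean_f) ** 2 for a in f)
    var_g = sum((b - mean_g) ** 2 for b in g)
    if var_f == 0 or var_g == 0:
        raise ValueError("correlation is undefined for a flat pattern")
    return cov / math.sqrt(var_f * var_g)


# Same shape at different volumes: correlation 1.0.
print(round(temporal_correlation([1, 5, 9, 2], [10, 50, 90, 20]), 3))  # 1.0
# Opposite shapes: correlation -1.0.
print(round(temporal_correlation([1, 2, 3, 4], [4, 3, 2, 1]), 3))  # -1.0
```

Because Pearson correlation is unchanged when either pattern is rescaled, raw counts and normalized frequencies give the same value, so a busy venue and a quiet venue with the same daily shape come out as highly correlated.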

Exploring Semantic Correlation Between Venues

Top pairs with the highest correlation (pair, correlation):

Target -- Borders 0.949

Walgreens -- CVS Pharmacy 0.947

Staples -- Apple Store 0.946

Subway -- Jason's Deli 0.945

Starbucks -- Caribou Coffee 0.944

• Selected the 271 venues with the densest check-ins and calculated the pairwise correlation between their daily traffic patterns.

Bottom pairs with the lowest correlation (pair, correlation):

Texas Roadhouse -- US Post Office 0.010

Hampton Inn by Hilton -- Discount Tire Co -0.278

Shell Station -- Denny’s -0.364

Dairy Queen -- Einstein Bros Bagels -0.379

White Castle -- Dunkin’ Donuts -0.653
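The pairwise step can be sketched as follows, again assuming temporal correlation is Pearson correlation. For 271 venues there are 271 × 270 / 2 = 36,585 unordered pairs. The venue names and 4-slot patterns below are made-up toy data, not values from the dataset, and `rank_pairs` is a hypothetical helper.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def pearson(f: Sequence[float], g: Sequence[float]) -> float:
    """Pearson correlation, used here as the assumed temporal correlation."""
    n = len(f)
    mf, mg = sum(f) / n, sum(g) / n
    cov = sum((a - mf) * (b - mg) for a, b in zip(f, g))
    var = sum((a - mf) ** 2 for a in f) * sum((b - mg) ** 2 for b in g)
    return cov / math.sqrt(var)  # assumes neither pattern is flat


def rank_pairs(patterns: Dict[str, Sequence[float]]) -> List[Tuple[str, str, float]]:
    """Correlate every unordered pair of venues, sorted from highest to lowest."""
    scored = [(a, b, pearson(patterns[a], patterns[b]))
              for a, b in combinations(sorted(patterns), 2)]
    return sorted(scored, key=lambda t: t[2], reverse=True)


patterns = {
    "Cafe A": [9, 6, 2, 1],
    "Cafe B": [8, 7, 3, 1],
    "Bar C": [0, 1, 5, 9],
}
ranked = rank_pairs(patterns)
print(ranked[0][:2], ranked[-1][:2])  # ('Cafe A', 'Cafe B') ('Bar C', 'Cafe B')
```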

Exploring Semantic Correlation Between Venues

• Groups of venues that are highly similar in terms of traffic pattern.

A natural follow-up is to check whether venues can be clustered according to their traffic patterns (see the paper).
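The paper's clustering method is not described on these slides. As one simple, swapped-in illustration (not the authors' approach), venues can be grouped as connected components of a graph that links every pair whose correlation is at least a threshold, assuming Pearson correlation as the temporal correlation. The threshold, the helper names, and the toy patterns are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence


def pearson(f: Sequence[float], g: Sequence[float]) -> float:
    """Pearson correlation, used here as the assumed temporal correlation."""
    n = len(f)
    mf, mg = sum(f) / n, sum(g) / n
    cov = sum((a - mf) * (b - mg) for a, b in zip(f, g))
    var = sum((a - mf) ** 2 for a in f) * sum((b - mg) ** 2 for b in g)
    return cov / math.sqrt(var)  # assumes neither pattern is flat


def threshold_groups(patterns: Dict[str, Sequence[float]], threshold: float) -> List[List[str]]:
    """Connected components of the graph linking venues with correlation >= threshold."""
    names = sorted(patterns)
    parent = {name: name for name in names}  # union-find forest

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if pearson(patterns[a], patterns[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two components

    groups: Dict[str, List[str]] = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())


patterns = {
    "Cafe A": [9, 6, 2, 1],
    "Cafe B": [8, 7, 3, 1],
    "Bar C": [0, 1, 5, 9],
    "Pub D": [1, 2, 6, 8],
}
print(threshold_groups(patterns, threshold=0.9))
# [['Bar C', 'Pub D'], ['Cafe A', 'Cafe B']]
```

Connected components behave like single-link clustering, so a chain of pairwise-similar venues can merge two dissimilar endpoints; a proper clustering algorithm avoids that.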

Exploring Semantic Correlation Between Venues

Pair (temporal correlation):

McDonald’s – Bank of America ATM 0.890

McDonald’s – Verizon Wireless 0.883

Church’s Chicken – Victoria’s Secret 0.859

• Does it work all the time?

• So far, we have only used daily traffic patterns. Will weekly traffic patterns help?

T-Correlation: 0.564

Supervised Venue Categorization

• Recall that category information is missing for 43.6% of venues.

• Categorizing unlabeled venues is a crucial step toward traffic-based location search.

• In the experiments, we generated two sets of features:
  • daily and weekly traffic patterns;
  • text features from user-generated tags.

• Dataset:
  • Training set: the aforementioned set of 271 venues;
  • Test sets: sets of venues with a pre-specified minimum number of check-ins.
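One way to turn the two traffic patterns into a single feature vector is sketched below. Normalizing each pattern to relative frequencies and concatenating the 24 hourly and 7 daily values into 31 features are assumptions for illustration; the slides do not specify the exact encoding, and the function names are hypothetical.

```python
from typing import List, Sequence


def normalize(counts: Sequence[int]) -> List[float]:
    """Turn raw counts into relative frequencies that sum to 1 (all zeros stay zeros)."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def traffic_features(daily: Sequence[int], weekly: Sequence[int]) -> List[float]:
    """Concatenate a 24-hour daily pattern and a 7-day weekly pattern into 31 features."""
    if len(daily) != 24 or len(weekly) != 7:
        raise ValueError("expected 24 hourly and 7 daily buckets")
    return normalize(daily) + normalize(weekly)


# Toy venue: 3 check-ins at noon, 1 at 18:00; 2 on Monday, 1 each on Tue/Wed.
daily = [0] * 24
daily[12], daily[18] = 3, 1
weekly = [2, 1, 1, 0, 0, 0, 0]
features = traffic_features(daily, weekly)
print(len(features), features[12], features[24])  # 31 0.75 0.5
```

Normalizing keeps a venue's overall popularity from dominating distance-based classifiers, so venues are compared by the shape of their traffic.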

Supervised Venue Categorization

10-Fold Cross-Validation on Training Data

1NN works best with traffic-pattern features.
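A minimal 1NN classifier over traffic-pattern feature vectors, assuming Euclidean distance (the slides do not state the distance measure). The category labels and 3-value vectors below are toy data, and `nearest_neighbor_label` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple


def nearest_neighbor_label(
    train: List[Tuple[Sequence[float], str]], query: Sequence[float]
) -> str:
    """Return the category of the single closest training venue (1NN)."""
    if not train:
        raise ValueError("training set is empty")
    # Euclidean distance between feature vectors (assumed metric).
    _, label = min(train, key=lambda item: math.dist(item[0], query))
    return label


train = [
    ([0.7, 0.2, 0.1], "Coffee Shop"),  # toy morning-heavy pattern
    ([0.1, 0.2, 0.7], "Bar"),          # toy evening-heavy pattern
]
print(nearest_neighbor_label(train, [0.6, 0.3, 0.1]))  # Coffee Shop
```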

Supervised Venue Categorization

10-Fold Cross-Validation on Training Data

Naïve Bayes works best with text features.
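For the text side, a minimal multinomial Naive Bayes over user-generated tags is sketched below, with add-one (Laplace) smoothing. Treating tags as a bag of words, the smoothing choice, and the toy training data are assumptions; the slides do not say which Naive Bayes variant or text weighting was used.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Model = Tuple[Counter, Dict[str, Counter], Set[str]]


def train_nb(docs: List[Tuple[List[str], str]]) -> Model:
    """Count venues per category and tag occurrences per category."""
    class_counts: Counter = Counter(label for _, label in docs)
    tag_counts: Dict[str, Counter] = defaultdict(Counter)
    vocab: Set[str] = set()
    for tags, label in docs:
        tag_counts[label].update(tags)
        vocab.update(tags)
    return class_counts, tag_counts, vocab


def predict_nb(model: Model, tags: List[str]) -> str:
    """Return the category with the highest log posterior (add-one smoothing)."""
    class_counts, tag_counts, vocab = model
    n_docs = sum(class_counts.values())
    best_label, best_score = "", -math.inf
    for label, count in class_counts.items():
        total = sum(tag_counts[label].values())
        score = math.log(count / n_docs)  # log prior
        for tag in tags:
            # Laplace-smoothed P(tag | category); unseen tags get a small non-zero mass.
            score += math.log((tag_counts[label][tag] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


docs = [
    (["coffee", "latte", "wifi"], "Coffee Shop"),
    (["espresso", "coffee"], "Coffee Shop"),
    (["beer", "pub", "wings"], "Bar"),
]
model = train_nb(docs)
print(predict_nb(model, ["coffee", "wifi"]))  # Coffee Shop
```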

Supervised Venue Categorization

Features: traffic patterns + vector space models, plus chi-square feature selection
10-fold cross-validation on training data

Metric       Naïve Bayes   1NN     AdaBoostM1   SimpleCart
F1-Measure   0.866         0.765   0.66         0.75

Feature selection and the combination of both feature sets give the best overall categorization performance.
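Chi-square feature selection scores each term by how strongly its presence depends on the category, using a 2×2 contingency table per term and category. The sketch below keeps the k terms with the highest score over any category; taking the maximum over categories, the value of k, and the toy data are assumptions, not settings from the paper.

```python
from typing import Dict, List, Set, Tuple


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic of a 2x2 term/category table.

    a: in category, has term       b: other categories, has term
    c: in category, lacks term     d: other categories, lacks term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def select_terms(docs: List[Tuple[Set[str], str]], k: int) -> List[str]:
    """Keep the k terms whose highest chi-square score over all categories is largest."""
    labels = {label for _, label in docs}
    vocab = set().union(*(terms for terms, _ in docs))
    scores: Dict[str, float] = {}
    for term in vocab:
        best = 0.0
        for label in labels:
            a = sum(1 for terms, lab in docs if lab == label and term in terms)
            b = sum(1 for terms, lab in docs if lab != label and term in terms)
            c = sum(1 for terms, lab in docs if lab == label and term not in terms)
            d = len(docs) - a - b - c
            best = max(best, chi_square(a, b, c, d))
        scores[term] = best
    # Sort by descending score; ties are broken alphabetically for determinism.
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


docs = [
    ({"coffee", "wifi"}, "Cafe"),
    ({"coffee", "latte"}, "Cafe"),
    ({"beer", "wifi"}, "Bar"),
    ({"beer", "wings"}, "Bar"),
]
print(select_terms(docs, 2))  # ['beer', 'coffee']
```

Here "wifi" scores 0 because it appears equally in both categories, which is exactly the kind of uninformative tag the selection step drops.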

Supervised Venue Categorization

Number of venues in test sets:

Min # of check-ins   10     30    50    100   200   300   500
# of venues          1392   983   695   353   142   60    21

Impact of data sparsity: evaluated with 1NN, the best-performing classifier on traffic-pattern features.

Note that text features are not used in this evaluation because of data sparsity.
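Building the nested test sets from per-venue check-in counts can be sketched as below. Whether the threshold is inclusive and whether training venues are excluded are not stated on the slides, so both are assumptions here (inclusive, no exclusion); the venue counts and the helper name `venues_by_min_checkins` are made up.

```python
from typing import Dict, List, Sequence


def venues_by_min_checkins(
    checkin_counts: Dict[str, int], thresholds: Sequence[int]
) -> Dict[int, List[str]]:
    """For each minimum check-in threshold, list the venues that meet it."""
    return {
        t: sorted(venue for venue, n in checkin_counts.items() if n >= t)
        for t in thresholds
    }


counts = {"venue-1": 12, "venue-2": 45, "venue-3": 230, "venue-4": 8}
sets = venues_by_min_checkins(counts, [10, 30, 50, 100, 200, 300, 500])
print({t: len(v) for t, v in sets.items()})
# {10: 3, 30: 2, 50: 1, 100: 1, 200: 1, 300: 0, 500: 0}
```

Each higher threshold yields a subset of the previous one, which matches the shrinking venue counts in the table.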

• So far, we observed that:
  • traffic patterns generated from check-ins at location-sharing services do reveal semantic correlation among related venues [Chien WWW 2005; Ciaramita WWW 2008; Golder 2007];
  • traffic patterns can be used to accurately predict the semantic category of uncategorized locations, with an F1-measure of 0.8.

Scenarios for Traffic-Based Local Search

• A group of people is looking for off-peak (i.e., quiet) restaurants around 7 PM where they can have dinner and talk.

(Figure panels: Brunch Restaurants; Sandwich Shops; Bar & Grill.)
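A hypothetical scoring rule for this scenario: rank candidate restaurants by how far their traffic at the requested hour (19, i.e., 7 PM) sits below their own peak hour. This is only one possible interpretation of “off-peak”, not the paper's ranking function; the venues, counts, and helper names are invented.

```python
from typing import Dict, List, Sequence


def pattern(hourly: Dict[int, int]) -> List[int]:
    """Build a 24-slot daily pattern from {hour: check-in count} (toy helper)."""
    counts = [0] * 24
    for hour, n in hourly.items():
        counts[hour] = n
    return counts


def quietest_at(patterns: Dict[str, Sequence[int]], hour: int, k: int = 3) -> List[str]:
    """Rank venues by traffic at `hour` relative to each venue's own peak hour."""

    def load(name: str) -> float:
        counts = patterns[name]
        peak = max(counts)
        # A venue with no check-ins at all gets 0.0; sparse data needs separate care.
        return counts[hour] / peak if peak else 0.0

    return sorted(patterns, key=lambda name: (load(name), name))[:k]


restaurants = {
    "Brunch Place": pattern({10: 9, 11: 12, 12: 8, 19: 1}),
    "Sandwich Shop": pattern({12: 10, 13: 7, 19: 3}),
    "Bar & Grill": pattern({12: 5, 19: 10, 20: 8}),
}
print(quietest_at(restaurants, hour=19, k=2))  # ['Brunch Place', 'Sandwich Shop']
```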

Scenarios for Traffic-Based Local Search

• Jerry is looking for a place to play basketball whose traffic is similar to that of Williams Park.

Recommended Locations
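Query-by-example on traffic can be sketched as ranking candidate places by their temporal correlation with the reference place, again assuming Pearson correlation. The patterns for Williams Park and the candidates are invented 4-slot toy values, and `similar_traffic` is a hypothetical name.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(f: Sequence[float], g: Sequence[float]) -> float:
    """Pearson correlation, used here as the assumed temporal correlation."""
    n = len(f)
    mf, mg = sum(f) / n, sum(g) / n
    cov = sum((a - mf) * (b - mg) for a, b in zip(f, g))
    var = sum((a - mf) ** 2 for a in f) * sum((b - mg) ** 2 for b in g)
    return cov / math.sqrt(var)  # assumes neither pattern is flat


def similar_traffic(
    reference: Sequence[float], candidates: Dict[str, Sequence[float]], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k candidates whose traffic pattern correlates best with the reference."""
    scored = [(name, pearson(reference, p)) for name, p in candidates.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))[:k]


# Toy 4-slot patterns (morning, noon, afternoon, evening); not real data.
williams_park = [1, 2, 8, 6]
candidates = {
    "Park X": [2, 3, 9, 7],
    "Gym Y": [6, 2, 2, 8],
    "Library Z": [5, 8, 2, 1],
}
for name, score in similar_traffic(williams_park, candidates, k=2):
    print(name, round(score, 2))
```

In a real system this ranking would be combined with the usual distance and category filters rather than replace them.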

Takeaways

•Traffic patterns do reveal semantic information.

•Traffic patterns can help accurately predict the semantic category of uncategorized locations.

•Traffic patterns and the category information can be naturally incorporated into traditional location-based search. [Markowetz 2005; McCurley WWW 2001; Watters JASIST 2003]

Future Work

•Incorporate new models of human dynamics based on location sharing services [Noulas ICWSM 2011; Yang WSDM 2011; Ye SIGSPATIAL 2010]

•Explore personalized traffic-based local search by considering users’ profiles

•Explore possibilities for activity-based local search.

Conclusion

Thanks! Questions?

For paper, slides, and dataset: http://infolab.tamu.edu/data/