
Likes and Locations - Adventure in Social Data Mining

Page 1: Likes and Locations - Adventure in Social Data Mining

Likes and Locations - Adventure in Social Data Mining

Gene Chuang – Exec Dir of Social Eng, ATTi

Masahji Stewart – Founder, Synctree

Q CTO Dinner 4/6/11 – Lawry’s Beverly Hills, CA

Page 2: Likes and Locations - Adventure in Social Data Mining

Dedication

Page 3: Likes and Locations - Adventure in Social Data Mining

Background

Page 4: Likes and Locations - Adventure in Social Data Mining
Page 5: Likes and Locations - Adventure in Social Data Mining

Social Local Mobile Loco

Page 6: Likes and Locations - Adventure in Social Data Mining

Why Mine Social and Local Data?

• Signals to improve user experience

• Timely and “Placely”

• Engagement

• Provide value – save time, save money

• Opt In, Privacy

Page 7: Likes and Locations - Adventure in Social Data Mining

YP.com Infrastructure

• Ruby on Rails for Web, Login and API

• Solr/Lucene for Search

• Hadoop for Data pipeline

• Hive for Ad Hoc queries on Hadoop

• Ruby ETL scripts

Page 8: Likes and Locations - Adventure in Social Data Mining

OAuth 2

• OAuth 2 is an open protocol that lets users share private resources (e.g. photos, videos, contact lists) stored on one site with another site without handing out their username and password; instead, they hand out tokens

• Think Valet Key
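To make the token idea concrete, here is a minimal sketch of the two halves of an OAuth 2 authorization code flow as seen by the requesting site: sending the user to the provider to approve access, then calling the provider with a token rather than a password. The endpoint URL, client id, scope name, and helper names are all hypothetical; a real provider documents its own values.

```ruby
require "uri"
require "securerandom"

# Hypothetical provider endpoint, for illustration only.
AUTHORIZE_URL = "https://provider.example.com/oauth/authorize"

# Build the URL the user is redirected to in order to approve access.
# The password is typed at the provider, never at the requesting site.
def authorization_url(client_id:, redirect_uri:, scope:, state:)
  query = URI.encode_www_form(
    response_type: "code",   # ask for an authorization code
    client_id: client_id,
    redirect_uri: redirect_uri,
    scope: scope,            # the limited access being requested (the "valet key")
    state: state             # random value echoed back to guard against CSRF
  )
  "#{AUTHORIZE_URL}?#{query}"
end

# Once a token has been issued, requests carry the token, not credentials.
def bearer_header(access_token)
  { "Authorization" => "Bearer #{access_token}" }
end

puts authorization_url(
  client_id: "abc123",                               # hypothetical
  redirect_uri: "https://app.example.com/callback",  # hypothetical
  scope: "user_likes",                               # hypothetical scope name
  state: SecureRandom.hex(8)
)
```

The token can be scoped and revoked independently of the user's password, which is the point of the valet key comparison.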

Page 9: Likes and Locations - Adventure in Social Data Mining

YP.com Login/Registration

Page 10: Likes and Locations - Adventure in Social Data Mining

Login Layer


Page 11: Likes and Locations - Adventure in Social Data Mining

OAuth 2 Dance

Page 12: Likes and Locations - Adventure in Social Data Mining

Semi-Social Search

Page 13: Likes and Locations - Adventure in Social Data Mining
Page 14: Likes and Locations - Adventure in Social Data Mining
Page 15: Likes and Locations - Adventure in Social Data Mining

Social Mining - Extract

Extract Script

Pull data out of a database (such as Oracle), Hive, files, Facebook, or any other source, and output JSON data to STDOUT.

For example, to get the number of users signed up per day along with the running total:

$ RAILS_ENV=production sdm extract total-users-by-day 2011-02-14
{"day":"2011-02-14","count":891,"total":1328636}
{"day":"2011-02-15","count":1088,"total":1329724}
{"day":"2011-02-16","count":1016,"total":1330740}
{"day":"2011-02-17","count":1359,"total":1332099}
{"day":"2011-02-18","count":1143,"total":1333242}
{"day":"2011-02-19","count":660,"total":1333902}
{"day":"2011-02-20","count":597,"total":1334499}
{"day":"2011-02-21","count":874,"total":1335373}
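The output format above (one JSON object per line, with "total" increasing by each day's "count") could be produced by something like the following sketch. It is not the sdm source: the function name is hypothetical, the counts are passed in as an array instead of being queried from a database, and the starting total of 1,327,745 is simply derived from the first output line (1,328,636 minus 891).

```ruby
require "json"

# Turn per-day signup counts into JSON lines that carry a running total.
# daily_counts: array of [day, count] pairs, oldest first.
# starting_total: number of users signed up before the first day.
def total_users_by_day(daily_counts, starting_total)
  total = starting_total
  daily_counts.map do |day, count|
    total += count
    JSON.generate("day" => day, "count" => count, "total" => total)
  end
end

# First three days from the slide; a real extract script would query these.
counts = [["2011-02-14", 891], ["2011-02-15", 1088], ["2011-02-16", 1016]]
puts total_users_by_day(counts, 1_327_745)
```

Emitting one self-contained JSON object per line is what lets the later transform and load steps process the stream line by line.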

Page 16: Likes and Locations - Adventure in Social Data Mining

Social Mining - Transform

Transform scripts take JSON data in via STDIN and print JSON data out to STDOUT.

For example, to add ypids to existing Facebook likes, then keep only the name, phone, location, id, and ypid matching fields:

$ cat data/facebook_likes_2011_01_12.json | sdm transform add-ypid | sdm transform filter-fields name phone location ypid_best_match ypids ypid_match_results id
{"name":"Snuggle Bunnies","location":{"city":"Carlisle","zip":"45005","country":"United States","state":"OH"},"id":"106864249335072","ypid_match_results":[]}
{"name":"Associate Construction","location":{"city":"Franklin","zip":"45005","country":"United States","street":"31 Eagle Court","state":"OH"},"id":"235027821862","ypid_best_match":"6197197","phone":"(937)-746-2932"}
{"name":"PH Bistro","location":{"city":"Franklin","zip":"45005","country":"United States","street":"543 S Main Street","state":"OH"},"id":"261032274490","ypid_best_match":"1120570","phone":"(937)-743-0069"}
{"name":"Bullwinkle's Top Hat Bistro - Miamisburg, OH","location":{"city":"Miamisburg","zip":"45342-2312","country":"United States","street":"19 North Main St","state":"OH"},"id":"260274607015","ypid_best_match":"12255503","phone":"(937)-859-7677"}
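A transform in this style is a small filter over a stream of JSON lines. The sketch below shows one way a filter-fields step might work; the function names are hypothetical, and the "category" field in the sample input is made up to give the filter something to drop.

```ruby
require "json"
require "stringio"

# Keep only the requested top-level fields of one JSON record.
def filter_fields(json_line, fields)
  record = JSON.parse(json_line)
  JSON.generate(record.select { |key, _value| fields.include?(key) })
end

# Read JSON lines from input and write the filtered lines to output.
def transform(input, output, fields)
  input.each_line do |line|
    next if line.strip.empty?
    output.puts filter_fields(line, fields)
  end
end

# Stand-in for STDIN; "category" is a hypothetical extra field.
sample = StringIO.new(
  %({"name":"PH Bistro","category":"Restaurant","id":"261032274490"}\n)
)
transform(sample, $stdout, %w[name id])
```

In a real command-line script the same function would be called with `$stdin`, `$stdout`, and the field names from `ARGV`, which is what makes the steps composable with shell pipes.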

Page 17: Likes and Locations - Adventure in Social Data Mining

Social Mining - Load

Load scripts read data in from STDIN and load it into another system (a dashboard, for example).

For example, to load total Facebook accounts by day into the web dashboard:

$ sdm extract total-fb-accounts-by-day 2011-01-10 | sdm load dashboard total_fb_accounts day total
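The load command takes a series name followed by the two fields to plot. A minimal sketch of that step, assuming the dashboard simply stores (x, y) points per series, might look like this; the `Dashboard` class and `load_dashboard` function are hypothetical stand-ins, since the deck does not describe the real dashboard API. The sample input reuses two lines of the extract output shown earlier in the deck.

```ruby
require "json"
require "stringio"

# Hypothetical in-memory dashboard: series name => list of [x, y] points.
class Dashboard
  attr_reader :series

  def initialize
    @series = Hash.new { |hash, name| hash[name] = [] }
  end

  def add_point(name, x, y)
    @series[name] << [x, y]
  end
end

# Read JSON lines and load one point per record into the named series.
def load_dashboard(input, dashboard, series_name, x_field, y_field)
  input.each_line do |line|
    next if line.strip.empty?
    record = JSON.parse(line)
    # fetch raises KeyError if a record lacks the expected field
    dashboard.add_point(series_name, record.fetch(x_field), record.fetch(y_field))
  end
  dashboard
end

input = StringIO.new(<<~JSONL)
  {"day":"2011-02-14","count":891,"total":1328636}
  {"day":"2011-02-15","count":1088,"total":1329724}
JSONL
dashboard = load_dashboard(input, Dashboard.new, "total_users", "day", "total")
p dashboard.series["total_users"]
```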

Page 18: Likes and Locations - Adventure in Social Data Mining
Page 19: Likes and Locations - Adventure in Social Data Mining
Page 20: Likes and Locations - Adventure in Social Data Mining

Location Real-Time Fuzzy Matcher

FP0 (exact match)

• Append LISTING_NAME + ADDRESS + CITY + PHONE

• Tokenize, normalize, strip punctuation, and stem

• Append tokens

FP3 (fuzzy match)

• Append LISTING_NAME + ADDRESS + CITY + PHONE

• Tokenize, normalize, strip punctuation, and stem

• Remove tokens that are less than 2 chars long

• Remove upper-case short tokens (e.g., MD, CPA, DDS)

• Remove non-phone, short, numerical tokens

• Remove stopwords based on the 170 most frequently occurring listing_name tokens

• Order tokens alphabetically

• Append tokens

Example:

Vijay K. Sammy CPA, LLC
153 Orchard St
Elmwood Park NJ - 07407
(201) 218-0710

FP Method   Value
FP0         vijaiksammicpallc153orchardstelmwoodpark2012180710
FP3         0710201218elmwoodorchardparksammistvijai
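The two fingerprints can be sketched in a few lines of Ruby. This is an illustration under stated assumptions, not the production matcher: the stemmer is a crude stand-in that only rewrites a trailing "y" to "i" (a real stemmer does far more), the stopword list is a tiny hypothetical substitute for the 170-token list, "short" is assumed to mean four characters or fewer, and the removal filters are applied to the raw tokens before lowercasing so that upper-case tokens can still be recognized.

```ruby
require "set"

# Hypothetical substitute for the top-170 listing_name stopword list.
STOPWORDS = Set.new(%w[inc llc co the and]).freeze

# Crude stand-in for a real stemmer: trailing "y" becomes "i".
def stem(token)
  token.sub(/(?<=[a-z])y\z/, "i")
end

# Split text into alphanumeric tokens, which also strips punctuation.
def tokens(text)
  text.scan(/[A-Za-z0-9]+/)
end

def normalize(token)
  stem(token.downcase)
end

# FP0: all tokens, normalized and concatenated in their original order.
def fp0(name:, address:, city:, phone:)
  tokens([name, address, city, phone].join(" ")).map { |t| normalize(t) }.join
end

# FP3: drop noisy tokens, normalize, remove stopwords, sort, concatenate.
def fp3(name:, address:, city:, phone:)
  kept = tokens([name, address, city].join(" ")).reject do |t|
    t.length < 2 ||                                 # single characters
      (t.length <= 4 && t.match?(/\A[A-Z]+\z/)) ||  # MD, CPA, DDS, LLC
      (t.length <= 4 && t.match?(/\A\d+\z/))        # short non-phone numbers
  end
  all = (kept + tokens(phone)).map { |t| normalize(t) }
  all.reject { |t| STOPWORDS.include?(t) }.sort.join
end

listing = {
  name: "Vijay K. Sammy CPA, LLC",
  address: "153 Orchard St",
  city: "Elmwood Park",
  phone: "(201) 218-0710"
}
puts fp0(**listing)
puts fp3(**listing)
```

On the slide's example listing this sketch reproduces both fingerprint values from the table. Because FP3 discards initials, titles, and street numbers and sorts what remains, a listing entered as "Sammy Vijay" with the same street, city, and phone yields the same FP3 while its FP0 differs.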

Page 21: Likes and Locations - Adventure in Social Data Mining

Social Data

• Valid Facebook Access Tokens: 14K

• Total Unique Likes: 300K

• % Likes with Locations and/or Phones: 19%

• % Likes mapped to YPID: 38%

• Total Check-Ins: 530

Page 22: Likes and Locations - Adventure in Social Data Mining

Social Mining Mother Lode

• Social Search

• Local Recommendation Engine

• Discovery Wall

• Top 10 List

• Social e-Commerce

• Online Presence Management – Social CRM

Page 23: Likes and Locations - Adventure in Social Data Mining

Questions?

[email protected]

• http://www.twitter.com/genechuang

• http://www.quora.com/Gene-Chuang

• http://www.linkedin.com/in/genechuang