3 Easy Ways To Reach Financial Freedom:
How Twitter Uses Geo to Win Advertising
Sen Xu
SIGSpatial 2016 MELT Workshop
Mobile Entity Localization, Tracking and Analysis
• Step one
  – Use a catchy title
• Twitter has more than 284 million monthly active users. (October 2014)
• 500 million Tweets are sent per day, or 1 billion every ~2 days. (August 2013)
• More than 300 billion Tweets have been sent since company founding in 2006. (October 2013)
• TPS record: one-second peak of 143,199 Tweets per second, in Japan (August 2013)
• 80% of our active users are mobile users. (October 2014)
• 40% of our active users simply consume content on Twitter.
• Twitter supports 35 different languages. (March 2013)
• 77% of Twitter accounts are outside the U.S. (October 2013)
• Content Generation
  – How to create features that make users want to share private information with us?
    • How to get users to turn on location service?
    • How to collect user birthdays?
  – Import third-party data: data plumbing
  – Data Correctness/Legal Issue/Disputed Territory
• Monetization
  – Features for the other side: advertisers
  – Targeting: Geo, Age, Interest, Behavior (Follow/Following)
• Service/Technology (AKA How to make your service faster)
  – QA your data source
  – Tech infra (Geohash-based Reverse-geocoding)
How to create features so attractive that users are willing to share data
Content Generation
Targeting criteria: Geo, Bio, Behavior
Ads Analytics
Features for the other side: advertisers
Nielsen DMA
Selection, Cleaning/Plumbing
Service
Dealing with third party data
QA of vendor data is absolutely necessary
Spot the difference?
PlaceType: COUNTRY PlaceType: TOWN
More interesting (potentially dangerous) insights:
Pitney Bowes Geometry (conflated) Zipcode:
United States Mexico
The same PlaceType in different countries may have different coverage
Territory in Dispute
Definition, Plumbing, Cleaning
Geo Data Pipeline Infra
In the Geo Stack…
– A place has id, names, attributes, parents, and geography
  • place_id: unique u64 id
  • name: one place may have multiple names, but only one preferred name
  • Attributes (annotations): open-ended key-value store for custom attributes. For POI: address, phone, URL, twitter, existence, etc.
  • parents: upper administrative level, e.g., in the US, a City’s closest parent is State (Admin1). Or determined by geometry containment, e.g., a POI can have a Neighborhood as parent if it is contained by it.
  • geography: point (for POI), polygon/multi-polygon (for all other place types). line geometry
Place:
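The place record described on this slide could be modeled roughly as follows. This is only a sketch: the slides name the fields (id, names, attributes, parents, geography) but not the actual schema, so the class name `Place`, the field names `preferred_name` and `alternate_names`, the type aliases, and the `all_names` helper are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union

# Hypothetical geometry aliases; the real Geo Stack types are not shown in the slides.
Point = Tuple[float, float]       # (lat, lon), used for POIs
Polygon = List[Point]             # one ring of (lat, lon) vertices
MultiPolygon = List[Polygon]


@dataclass
class Place:
    place_id: int                                               # unique u64 id
    preferred_name: str                                         # exactly one preferred name
    alternate_names: List[str] = field(default_factory=list)    # any other names
    attributes: Dict[str, str] = field(default_factory=dict)    # open-ended key-value annotations
    parents: List[int] = field(default_factory=list)            # place_ids of parent places
    geography: Union[Point, Polygon, MultiPolygon, None] = None

    def __post_init__(self) -> None:
        # Enforce the "u64" constraint from the slide.
        if not 0 <= self.place_id < 2**64:
            raise ValueError("place_id must fit in an unsigned 64-bit integer")

    def all_names(self) -> List[str]:
        """Preferred name first, followed by the remaining names."""
        others = [n for n in self.alternate_names if n != self.preferred_name]
        return [self.preferred_name] + others


# Usage: a POI with a point geography and a few custom attributes.
cafe = Place(
    place_id=42,
    preferred_name="Example Cafe",
    alternate_names=["The Example"],
    attributes={"phone": "555-0100"},
    parents=[7],
    geography=(37.77, -122.42),
)
```

Keeping the preferred name in its own field, rather than flagging one entry in a list, is one simple way to guarantee the "only one preferred name" rule by construction.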
Glossary
POI: Point of Interest. Uses a point (lat, lon) as a simplified representation of a place. Common POIs are restaurants, landmarks, parks, and dentist offices*
*although POIs can all be interesting/useful on certain occasions, some will be more interesting than others for geotagging purposes.
Pitney Bowes: Polygonal data vendor (188 countries)
Factual: POI data vendor (49 countries)
What kind of data do we need for a fully-fledged Geo Service?
Service | Required Data Set | Rockdove | Geoduck
Geocoding (text to lat/lon), Reverse-Geocoding (lat/lon to text) | Popular geopolitical names and geometry (e.g., Neighborhood, City, State, Country) | Unresolved merge of 13 different data sources of various data quality | Pitney Bowes
 | Polygonal data for specific marketing needs | Unlicensed simplified geometries | Nielsen
 | Useful, high-quality POI | UGC… | Factual
IP reverse-lookup | IP blocks to lat/lon or Place (confidence) | NetAcuity | NetAcuity with user modeling
• User-generated places (e.g., Mom’s basement)
• Overlaps within the same PlaceType (data bug!)
Historically… Rockdove allows
• Geometries within each PlaceType do not overlap each other
• Keep Reverse-Geocoding (RGC) Trie sane
• Maintain Rockdove ID
• Historically geo-tagged Tweets will display correctly (deleted)
• Reuse Rockdove ID and update with geometry
• Historically geotagged “New York City” tweets will be related to the same PlaceID, with updated geometry and attributes
Requirements for Geoduck
Geoduck Data Pipeline (v1)
Data Pipeline
• Duplicate places coming from different vendors with slightly different name and geometry
• Simple Solution: For each incoming place, find potential candidates (name-match, Levenshtein distance) then validate using geometry
Conflation Challenges
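The "simple solution" above (name match by Levenshtein distance, then geometric validation) might look something like the sketch below. Several things here are assumptions, not taken from the slides: places are plain dicts with hypothetical keys `id`, `name`, and `bbox`; geometry is reduced to a bounding box `(min_lat, min_lon, max_lat, max_lon)`; validation uses intersection-over-union of the boxes; and the thresholds `max_name_distance` and `min_iou` are arbitrary illustrative values.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def bbox_iou(b1, b2) -> float:
    """Intersection-over-union of two (min_lat, min_lon, max_lat, max_lon) boxes."""
    min_lat, min_lon = max(b1[0], b2[0]), max(b1[1], b2[1])
    max_lat, max_lon = min(b1[2], b2[2]), min(b1[3], b2[3])
    if min_lat >= max_lat or min_lon >= max_lon:
        return 0.0  # boxes do not overlap
    inter = (max_lat - min_lat) * (max_lon - min_lon)
    area1 = (b1[2] - b1[0]) * (b1[3] - b1[1])
    area2 = (b2[2] - b2[0]) * (b2[3] - b2[1])
    return inter / (area1 + area2 - inter)


def find_duplicates(incoming, existing, max_name_distance=2, min_iou=0.5):
    """Return ids of existing places that look like the same place as `incoming`."""
    matches = []
    for place in existing:
        # Step 1: cheap name match to collect candidates.
        distance = levenshtein(incoming["name"].lower(), place["name"].lower())
        if distance > max_name_distance:
            continue
        # Step 2: validate the candidate using geometry.
        if bbox_iou(incoming["bbox"], place["bbox"]) >= min_iou:
            matches.append(place["id"])
    return matches


# Usage: the same city from two vendors, with a misspelled name and a slightly different outline.
existing_places = [
    {"id": 1, "name": "San Francisco", "bbox": (37.70, -122.52, 37.83, -122.35)},
    {"id": 2, "name": "San Mateo", "bbox": (37.52, -122.36, 37.60, -122.27)},
]
incoming_place = {"name": "San Fransisco", "bbox": (37.71, -122.51, 37.82, -122.36)}
duplicates = find_duplicates(incoming_place, existing_places)
```

A production pipeline would presumably compare real polygons and avoid scanning every existing place per incoming record; the linear scan here only illustrates the two-step candidate-then-validate idea.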
from O(N log N) to O(1)
Reverse-Geocoding
Geohash
Geometry/Geography Input
Output Data Structure
Transform into Geohash with precision set arbitrarily (e.g., precision = 7)
Geohash-based Reverse-Geocoding
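As a rough illustration of the idea on these slides, the sketch below precomputes a hash map from geohash cells to place ids, so that a lookup becomes one geohash encoding plus one dictionary access instead of a search over geometries. The geohash encoding itself is the standard public algorithm; everything else is an assumption: the slides do not show the actual output data structure, the function names `build_index` and `reverse_geocode` are hypothetical, and each place is simplified to a bounding box `(min_lat, min_lon, max_lat, max_lon)` rather than a real polygon.

```python
import math

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat: float, lon: float, precision: int = 7) -> str:
    """Standard geohash: interleave longitude/latitude bisection bits, 5 bits per character."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, bit_count, use_lon = [], 0, 0, True
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = 1 if lon >= mid else 0
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = 1 if lat >= mid else 0
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        bits = (bits << 1) | bit
        bit_count += 1
        use_lon = not use_lon
        if bit_count == 5:
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def cell_size(precision: int):
    """Height and width, in degrees, of one geohash cell at this precision."""
    total_bits = 5 * precision
    lon_bits = (total_bits + 1) // 2  # longitude takes the first bit, so it gets the extra one
    lat_bits = total_bits // 2
    return 180.0 / (1 << lat_bits), 360.0 / (1 << lon_bits)


def build_index(places, precision: int = 7):
    """Map every geohash cell touching a place's bounding box to that place id.

    Simplification (assumption): bounding boxes stand in for polygons, and a later
    place overwrites an earlier one if both touch the same cell.
    """
    height, width = cell_size(precision)
    index = {}
    for place_id, (min_lat, min_lon, max_lat, max_lon) in places.items():
        # Work in integer cell coordinates so no cell is skipped by float stepping.
        i_lo = math.floor((min_lat + 90.0) / height)
        i_hi = math.floor((max_lat + 90.0) / height)
        j_lo = math.floor((min_lon + 180.0) / width)
        j_hi = math.floor((max_lon + 180.0) / width)
        for i in range(i_lo, i_hi + 1):
            for j in range(j_lo, j_hi + 1):
                center_lat = (i + 0.5) * height - 90.0
                center_lon = (j + 0.5) * width - 180.0
                index[geohash_encode(center_lat, center_lon, precision)] = place_id
    return index


def reverse_geocode(index, lat: float, lon: float, precision: int = 7):
    """Constant-time lookup: encode the point, then one dictionary access."""
    return index.get(geohash_encode(lat, lon, precision))


# Usage: precision 5 keeps the example index small (cells are roughly 0.044 degrees on a side).
index = build_index({"place_a": (37.70, -122.52, 37.83, -122.35)}, precision=5)
hit = reverse_geocode(index, 37.77, -122.42, precision=5)
miss = reverse_geocode(index, 40.0, -74.0, precision=5)
```

The trade-off in this kind of design is memory and preprocessing time for lookup speed: a higher precision gives finer cells and better boundary accuracy but many more index entries per place.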
• What would happen when users don’t share GPS?
  – IP: NetAcuity, MaxMind, Neustar
  – DIY?
    • Blacklist
    • Whitelist
    • Requires polygons
Mapping Uber’s Future: Uber Maps is Hiring
*https://newsroom.uber.com/mapping-ubers-future/
“Over the past decade mapping innovation has disrupted industries and changed daily life in ways I couldn’t have imagined when I started. That progress will only accelerate in the coming years especially with technologies like self-driving cars. I remain excited by the prospect of how maps can put the world at our fingertips, improve everyday life, impact billions of people and enable innovations we can’t even imagine today.”
-- Brian McClendon, VP of Engineering, Uber
Twitter: @alex_senxu
Wechat: senxu_alex
Email: [email protected]