Talk for the Socal Piggies meetup on 2013-01-17 at Telesign's office.
Reliable and Efficient Facebook data processing
Andres Buritica
Socal Piggies
2013-01-17
http://thelinuxkid.com
Python developer
Facebook Graph API experience
Ubernear
FounderDating
Meta
Draws on lessons from Ubernear
Based on traversal of public nodes in Graph
Use case
Pages or users (owners) with public events
Event discovery
Flow
Owner IDs
Update owners
Update owner events
Partial events
Expire events
Event details
Complete events
Why?
Separation of concerns
Parallel processing with separate user/apps/servers
Less data load on batch requests
Want to store all data
Update owners
Check for migrated owners
○ (#21) Page ID <old_id> was migrated to page ID <new_id>. Please update your API calls to the new ID
Move migrated owners to separate table
Add new owners
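A minimal sketch of spotting a migrated owner from the "(#21)" message quoted on this slide. The function name `parse_migration` and the regular expression are assumptions made for illustration, and the IDs are assumed to be numeric.

```python
import re

# Pattern modeled on the "(#21)" message quoted on the slide.
# Assumption: both IDs are plain digit strings.
MIGRATED_RE = re.compile(
    r"\(#21\) Page ID (?P<old_id>\d+) was migrated to page ID (?P<new_id>\d+)"
)


def parse_migration(message):
    """Return (old_id, new_id) if the message reports a migration, else None."""
    match = MIGRATED_RE.search(message)
    if match is None:
        return None
    return match.group("old_id"), match.group("new_id")
```

A caller could use the returned pair to move the old owner to the separate table and add the new ID as a fresh owner.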
Update owner events
Events for owners not checked since datetime
Stop at last event previously collected
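One way to sketch "stop at last event previously collected". This assumes the events arrive newest first, one page at a time, as lists of dicts with an "id" key; `new_events` and its arguments are hypothetical names, not part of any Facebook client.

```python
def new_events(pages, last_seen_id):
    """Yield events until the previously collected event is reached.

    pages: iterable of lists of event dicts, assumed newest first.
    last_seen_id: ID of the newest event stored on the last run, or None.
    """
    for page in pages:
        for event in page:
            if event["id"] == last_seen_id:
                # Everything from here on was collected on an earlier run.
                return
            yield event
```

Because the generator returns as soon as it meets the known ID, later pages are never requested when `pages` is itself lazy.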
Expire events
If end_time has passed, move the event to another table
False
○ Data might exist
Should have returned False?
○ [100] Unsupported get request
○ Data might exist
Expire events
Alias not found
○ (#803) Some of the aliases you requested do not exist...
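A rough sketch of the end_time check from the first Expire events slide. It assumes each event is a dict whose "end_time", when present, is already a timezone-aware datetime, and that events without an end_time stay active; `split_expired` is a hypothetical helper and the two lists stand in for the two tables.

```python
from datetime import datetime, timezone


def split_expired(events, now):
    """Partition events into (expired, active) by comparing end_time to now."""
    expired, active = [], []
    for event in events:
        end_time = event.get("end_time")
        if end_time is not None and end_time < now:
            expired.append(event)  # would be moved to the other table
        else:
            active.append(event)   # assumption: no end_time means keep
    return expired, active


# Illustrative data only.
now = datetime(2013, 1, 17, tzinfo=timezone.utc)
events = [
    {"id": "a", "end_time": datetime(2013, 1, 16, tzinfo=timezone.utc)},
    {"id": "b", "end_time": datetime(2013, 1, 18, tzinfo=timezone.utc)},
    {"id": "c"},
]
expired, active = split_expired(events, now)
```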
Event details
Skip completed (no refreshing)
Transient errors (retry)
○ None
○ OAuthException...Error validating application
○ (#1) "Unknown error occured"
○ "(#2) Service temporarily unavailable"
○ "(#4) User request limit reached" (throttle)
○ "(#4) Application request limit reached" (throttle)
○ "(#17) User request limit reached" (throttle)
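The transient errors listed on this slide could be handled along these lines. This is a sketch under assumptions: `fetch` is a hypothetical callable returning `(data, error_message)`, the backoff factors are arbitrary, and the classification only mirrors the codes named on the slide.

```python
import re
import time

RETRY_CODES = {1, 2}       # unknown error, service temporarily unavailable
THROTTLE_CODES = {4, 17}   # request limit reached
CODE_RE = re.compile(r"\(#(\d+)\)")


def classify(message):
    """Return 'retry', 'throttle' or 'fatal' for an error message (or None)."""
    if message is None:
        return "retry"  # an empty response is treated as transient
    if "Error validating application" in message:
        return "retry"
    match = CODE_RE.search(message)
    code = int(match.group(1)) if match else None
    if code in THROTTLE_CODES:
        return "throttle"
    if code in RETRY_CODES:
        return "retry"
    return "fatal"


def fetch_with_retry(fetch, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it returns data or the attempts run out."""
    for attempt in range(attempts):
        data, error = fetch()
        if data is not None and error is None:
            return data
        kind = classify(error)
        if kind == "fatal":
            raise RuntimeError(error)
        # Assumption: throttling deserves a longer pause than other errors.
        factor = 10 if kind == "throttle" else 1
        sleep(base_delay * factor * 2 ** attempt)
    raise RuntimeError("gave up after %d attempts" % attempts)
```

Passing `sleep` as an argument keeps the helper easy to test without real delays.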
Datetimes
All in ISO-8601
Events
○ date_format modifier has no effect
○ timezones after "Events Timezone Migration"
○ is_date_only
○ legacy without timezone
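The three event time shapes named on this slide (date only, with timezone, legacy without timezone) might be told apart as below. The exact string layouts are assumptions, shown only to illustrate the branching; `parse_event_time` is a hypothetical helper.

```python
from datetime import datetime


def parse_event_time(value):
    """Return (datetime, kind); kind is 'date_only', 'aware' or 'naive'."""
    if len(value) == 10:
        # Assumed date-only form, e.g. "2013-01-17".
        return datetime.strptime(value, "%Y-%m-%d"), "date_only"
    try:
        # Assumed post-migration form with a UTC offset, e.g. "...-0800".
        return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S%z"), "aware"
    except ValueError:
        # Assumed legacy form without any timezone information.
        return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S"), "naive"
```

Keeping the kind alongside the value lets later steps decide how to compare naive and aware times instead of guessing a timezone.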
Batch requests
POST
50 requests in one
Large or complex can time out
Nested calls count towards rate limiting
One top level access token, many nested
Batch request example
User's profile and 50 friends
batch=[{ "method":"GET", "relative_url":"me" }, { "method":"GET", "relative_url":"me/friends?limit=50"}]
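A small sketch of building the form fields for a batch POST like the one above, with the 50-request limit from the earlier slide. `chunk` and `batch_fields` are hypothetical helpers; sending the request is left out, and a single top-level access token is assumed.

```python
import json

BATCH_LIMIT = 50  # "50 requests in one"


def chunk(items, size=BATCH_LIMIT):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_fields(relative_urls, access_token):
    """Return the POST form fields for one batch of GET operations."""
    batch = [{"method": "GET", "relative_url": url} for url in relative_urls]
    return {"access_token": access_token, "batch": json.dumps(batch)}
```

For a long list of URLs, each slice from `chunk` would become one POST built with `batch_fields`.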
Batch dependencies
Reference results of a previous operation
JSONPath
Child operation executed after parent
Parent returns None unless forced
Batch dependencies example
Get details of 5 friends
batch=[{ "method":"GET", "name":"get-friends", "relative_url":"me/friends?limit=5" }, { "method":"GET", "relative_url":"?ids={result=get-friends:$.data.*.id}"}]
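The dependent batch above can be assembled in Python roughly as follows. `friends_details_batch` is a hypothetical helper; the `omit_response_on_success` flag is the batch option understood here to force the parent's response to be returned, so treat its use as an assumption to check against the current documentation.

```python
import json


def friends_details_batch(limit=5, keep_parent=False):
    """Return the JSON `batch` value for a parent and a dependent child."""
    parent = {
        "method": "GET",
        "name": "get-friends",
        "relative_url": "me/friends?limit=%d" % limit,
    }
    if keep_parent:
        # Assumption: this asks for the parent's result instead of None.
        parent["omit_response_on_success"] = False
    child = {
        "method": "GET",
        # JSONPath reference to every id in the parent's data list.
        "relative_url": "?ids={result=get-friends:$.data.*.id}",
    }
    return json.dumps([parent, child])
```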
Field expansion
GET
"join" multiple graph queries into a single call
Replacement for FQL
fields, connections, modifiers and identifiers
No JSONPath
Field expansion example
User's name and birthday plus id and picture of the last 10 photos
/me?fields=name,birthday,photos.limit(10).fields(id,picture)
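A field expansion string like the one above can be composed from parts. `expand` and `fields_query` are hypothetical helpers that only cover the `limit` modifier and nested `fields` shown in the example.

```python
def expand(name, limit=None, fields=None):
    """Return a connection with optional limit and nested fields."""
    parts = [name]
    if limit is not None:
        parts.append("limit(%d)" % limit)
    if fields:
        parts.append("fields(%s)" % ",".join(fields))
    return ".".join(parts)


def fields_query(node, fields):
    """Return the relative URL for a node with the given fields."""
    return "/%s?fields=%s" % (node, ",".join(fields))


# Rebuilds the example from the slide.
query = fields_query(
    "me", ["name", "birthday", expand("photos", limit=10, fields=["id", "picture"])]
)
```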
Batch request with field expansions
User's profile and picture link of 10 photos
batch=[{ "method":"GET", "relative_url":"me" }, { "method":"GET", "relative_url":"me?fields=photos.limit(10).fields(picture)"}]
Process
Facebook is constantly improving
Testing crucial
Beta tier
Not covered
Refreshing data
Pagination
Throttling
Insights
Thanks
Sample code at http://ubernear.com
Questions
http://thelinuxkid.com