Reliable and Efficient Facebook data processing Andres Buritica Socal Piggies 2013-01-17

DESCRIPTION

Talk for the socal piggies meetup 2013-01-17 at Telesign's office.

Page 1: Reliable and Efficient Facebook data processing

Reliable and Efficient Facebook data processing

Andres Buritica

Socal Piggies

2013-01-17

Page 2: Reliable and Efficient Facebook data processing

http://thelinuxkid.com

Python developer

Facebook Graph API experience

Ubernear

FounderDating

Page 3: Reliable and Efficient Facebook data processing

Meta

Draws on lessons from Ubernear

Based on traversal of public nodes in Graph

Page 4: Reliable and Efficient Facebook data processing

Use case

Pages or users (owners) with public events

Event discovery

Page 5: Reliable and Efficient Facebook data processing

Flow

Owner IDs

Update owners

Update owner events

Partial events

Expire events

Event details

Complete events

Page 6: Reliable and Efficient Facebook data processing

Why?

Separation of concerns

Parallel processing with separate user/apps/servers

Less data load on batch requests

Want to store all data

Page 7: Reliable and Efficient Facebook data processing

Update owners

Check for migrated owners

○ (#21) Page ID <old_id> was migrated to page ID <new_id>. Please update your API calls to the new ID

Move migrated owners to separate table

Add new owners
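A minimal sketch of how the migration check might be automated, assuming the error text matches the message quoted above and that both IDs are numeric. The function name and the regular expression are illustrative and do not come from the talk.

```python
import re

# Pattern modeled on the error text quoted on the slide. The exact wording
# returned by the API is an assumption and may differ or change over time.
MIGRATED_RE = re.compile(
    r"\(#21\) Page ID (?P<old_id>\d+) was migrated to page ID (?P<new_id>\d+)"
)


def parse_migration(message):
    """Return (old_id, new_id) if the message reports a migrated page, else None."""
    match = MIGRATED_RE.search(message or "")
    if match is None:
        return None
    return match.group("old_id"), match.group("new_id")
```

A caller could use the returned pair to move the old owner to the migrated table and add the new ID as a fresh owner.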

Page 8: Reliable and Efficient Facebook data processing

Update owner events

Events for owners not checked since datetime

Stop at last event previously collected
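The "stop at last event previously collected" rule could be sketched as a generator. This assumes the events arrive newest first and that each event is a dict with an "id" key; both are assumptions for illustration, as is the function name.

```python
def new_events(events, last_seen_id):
    """Yield events until the previously collected one is reached.

    Assumes `events` is ordered newest first. If `last_seen_id` is None or
    never appears, every event is yielded.
    """
    for event in events:
        if event["id"] == last_seen_id:
            break
        yield event
```

Because it is a generator, a paged fetch wrapped around it can stop requesting further pages as soon as the known event shows up.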

Page 9: Reliable and Efficient Facebook data processing

Expire events

If end_time has passed, move to another table

False

○ Data might exist

Should have returned False?

○ [100] Unsupported get request

○ Data might exist

Page 10: Reliable and Efficient Facebook data processing

Expire events

Alias not found

○ (#803) Some of the aliases you requested do not exist...
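The end_time check from the Expire events slides might look like the sketch below. It assumes each event is a dict whose "end_time" is already a timezone-aware datetime (or missing); the function name and the choice to keep events without an end_time are illustrative.

```python
from datetime import datetime, timezone


def split_expired(events, now=None):
    """Split events into (active, expired) by comparing end_time with `now`.

    Events without an end_time are kept as active, since there is nothing
    to compare against.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    active, expired = [], []
    for event in events:
        end_time = event.get("end_time")
        if end_time is not None and end_time < now:
            expired.append(event)
        else:
            active.append(event)
    return active, expired
```

The expired list is what would be moved to the separate table.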

Page 11: Reliable and Efficient Facebook data processing

Event details

Skip completed (no refreshing)

Transient errors (retry)

○ None

○ OAuthException...Error validating application

○ (#1) "Unknown error occured"

○ "(#2) Service temporarily unavailable"

○ "(#4) User request limit reached" (throttle)

○ "(#4) Application request limit reached" (throttle)

○ "(#17) User request limit reached" (throttle)
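One way to act on the transient error list is a small retry helper. This is a sketch under assumptions: errors are taken to arrive as a dict shaped like {"error": {"message": ...}}, the marker strings are copied from the list above, and the helper names, attempt count and backoff are hypothetical.

```python
import time

# Substrings taken from the slide's list of transient errors.
TRANSIENT_MARKERS = (
    "Error validating application",
    "(#1)",
    "(#2)",
    "(#4)",
    "(#17)",
)


def is_transient(response):
    """Treat None, or an error message containing a listed marker, as retryable."""
    if response is None:
        return True
    if not isinstance(response, dict):
        return False
    message = response.get("error", {}).get("message", "")
    return any(marker in message for marker in TRANSIENT_MARKERS)


def fetch_with_retry(fetch, attempts=3, delay=1.0, sleep=time.sleep):
    """Call `fetch` up to `attempts` times, backing off between transient failures.

    Returns the first non-transient response, or the last response seen.
    """
    response = None
    for attempt in range(attempts):
        response = fetch()
        if not is_transient(response):
            return response
        if attempt < attempts - 1:
            sleep(delay * 2 ** attempt)  # exponential backoff, also eases throttling
    return response
```

Passing `sleep` as a parameter keeps the helper testable without real delays.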

Page 12: Reliable and Efficient Facebook data processing

Datetimes

All in ISO-8601

Events

○ date_format modifier has no effect

○ timezones after "Events Timezone Migration"

○ is_date_only

○ legacy without timezone
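The three event datetime shapes listed above could be handled by trying each format in turn. The concrete string layouts below (date only, timestamp with a numeric UTC offset, timestamp without one) are assumptions about what the API returns, and the function name is illustrative.

```python
from datetime import datetime


def parse_event_time(value):
    """Return (datetime, is_date_only) for an ISO-8601 event time string.

    Date-only and legacy values come back timezone-naive; values with a
    UTC offset come back timezone-aware.
    """
    try:
        return datetime.strptime(value, "%Y-%m-%d"), True
    except ValueError:
        pass
    try:
        # Post-migration style: offset present, e.g. 2013-01-17T19:00:00-0800
        return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S%z"), False
    except ValueError:
        # Legacy style: no timezone information at all
        return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S"), False
```

Naive and aware datetimes cannot be compared directly in Python, so the caller still has to decide how to treat legacy values before checking end_time.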

Page 13: Reliable and Efficient Facebook data processing

Batch requests

POST

50 requests in one

Large or complex can time out

Nested calls count towards rate limiting

One top level access token, many nested
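The 50-request limit suggests chunking work before POSTing it. The sketch below only builds the form payloads and sends nothing; the function name, the use of GET for every nested request, and the payload layout with a single top-level access_token are assumptions for illustration.

```python
import json

BATCH_LIMIT = 50  # per the slide: 50 requests in one


def build_batches(relative_urls, access_token, limit=BATCH_LIMIT):
    """Return a list of POST form payloads, each holding up to `limit` requests."""
    payloads = []
    for start in range(0, len(relative_urls), limit):
        chunk = relative_urls[start:start + limit]
        batch = [{"method": "GET", "relative_url": url} for url in chunk]
        payloads.append({
            "access_token": access_token,  # top-level token for the whole batch
            "batch": json.dumps(batch),
        })
    return payloads
```

Since large or complex batches can time out, a smaller `limit` may be worth trying for expensive queries.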

Page 14: Reliable and Efficient Facebook data processing

Batch request example

User's profile and 50 friends

batch=[
  { "method":"GET", "relative_url":"me" },
  { "method":"GET", "relative_url":"me/friends?limit=50" }
]

Page 15: Reliable and Efficient Facebook data processing

Batch dependencies

Reference results of a previous operation

JSONPath

Child operation executed after parent

Parent returns None unless forced

Page 16: Reliable and Efficient Facebook data processing

Batch dependencies example

Get details of 5 friends

batch=[
  { "method":"GET", "name":"get-friends", "relative_url":"me/friends?limit=5" },
  { "method":"GET", "relative_url":"?ids={result=get-friends:$.data.*.id}" }
]
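The same dependent batch could be built in Python, which avoids hand-written JSON mistakes. The function name is hypothetical. The `omit_response_on_success` key is the Batch API option for forcing the parent's output as best recalled here; treat it as an assumption and check it against the current API documentation.

```python
import json


def friends_details_batch(limit=5, include_parent=False):
    """Build the `batch` parameter for fetching details of a user's friends."""
    parent = {
        "method": "GET",
        "name": "get-friends",
        "relative_url": "me/friends?limit=%d" % limit,
    }
    if include_parent:
        # Assumed option: ask for the parent's response as well, which is
        # otherwise returned as None when a child depends on it.
        parent["omit_response_on_success"] = False
    child = {
        "method": "GET",
        # JSONPath expression referencing the named parent operation
        "relative_url": "?ids={result=get-friends:$.data.*.id}",
    }
    return json.dumps([parent, child])
```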

Page 17: Reliable and Efficient Facebook data processing

Field expansion

GET

"join" multiple graph queries into a single call

Replacement for FQL

fields, connections, modifiers and identifiers

No JSONPath

Page 18: Reliable and Efficient Facebook data processing

Field expansion example

User's name and birthday plus id and picture of the last 10 photos

/me?fields=name,birthday,photos.limit(10).fields(id,picture)
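Field expansion strings nest quickly, so a pair of small helpers can keep them readable. This is only a string-building sketch; the helper names are hypothetical and only the `limit` modifier and nested `fields` from the example above are covered.

```python
def expand(name, limit=None, fields=None):
    """Render one connection with optional limit modifier and nested fields."""
    parts = [name]
    if limit is not None:
        parts.append("limit(%d)" % limit)
    if fields:
        parts.append("fields(%s)" % ",".join(fields))
    return ".".join(parts)


def fields_query(node, fields):
    """Render the relative URL for a node with the requested fields."""
    return "/%s?fields=%s" % (node, ",".join(fields))
```

For example, `fields_query("me", ["name", "birthday", expand("photos", limit=10, fields=["id", "picture"])])` rebuilds the query shown on the slide.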

Page 19: Reliable and Efficient Facebook data processing

Batch request with field expansions

User's profile and picture link of 10 photos

batch=[
  { "method":"GET", "relative_url":"me" },
  { "method":"GET", "relative_url":"me?fields=photos.limit(10).fields(picture)" }
]

Page 20: Reliable and Efficient Facebook data processing

Process

Facebook is constantly improving

Testing crucial

Beta tier

Page 21: Reliable and Efficient Facebook data processing

Not covered

Refreshing data

Pagination

Throttling

Insights

Page 22: Reliable and Efficient Facebook data processing

Thanks

Sample code at http://ubernear.com

Questions

http://thelinuxkid.com