Scalable Data

Scale
- #2 site on the Internet (time on site)
- >200 billion monthly page views
- Over 1 million developers in 180 countries
- Over 300 million active users
- More than 2^32 photos
- 100 million search queries per day
- >3.9 trillion feed actions processed per day
- 2 billion pieces of content per week
- 6 billion minutes per day

Growth Rate
[Chart: active users, in millions]

Social Networks
nikos | METIS

OSNs are popular!
- Online social networks (OSNs) have become wildly popular over the last few years: Facebook > 800M users, Twitter > 230M, etc.
- Distributed across the planet
- Changed how content is created and consumed: content is inherently long-tailed, since usually only friends are interested in it
- Explosion of smartphones: photos and HD videos are easy to shoot and share

Scaling Social Networks
Much harder than typical websites, where...
- Typically only 1-2% of users are online, so the data is easy to cache
- Partitioning and scaling are relatively easy
What do you do when everything is interconnected?
[Figure: a graph of interconnected users; each node holds a name, status, privacy settings, and a profile photo or video thumbnail]

System Architecture

Overall architecture: Facebook
- Facebook has 2 datacenters, one per coast
- Reads are spread across both; writes go only to the West Coast and are periodically (~10 minutes) replicated to the East Coast
- >2000 MySQL servers, >25TB RAM for memcached
- Challenge: inconsistency due to stale data. If I change my status message, friends served by the East Coast datacenter don't see the change for up to 10 minutes.
- What if an East Coast user changes their own status??
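The East Coast question is the classic read-your-own-writes problem under asynchronous replication. A minimal sketch, assuming a hypothetical `GeoReplicatedStore` with a logical clock in place of real time, and assuming a routing rule the slides do not state: for one replication window after a user writes, that user's reads go to the West Coast master, while everyone else keeps reading from their local datacenter.

```python
from dataclasses import dataclass, field

REPLICATION_LAG = 600  # seconds; the slides cite ~10 minutes


@dataclass
class Datacenter:
    """Hypothetical key-value store standing in for one datacenter."""
    name: str
    data: dict = field(default_factory=dict)


class GeoReplicatedStore:
    """Writes go to the West Coast master; the East Coast replica receives
    them only after REPLICATION_LAG seconds (simulated logical clock)."""

    def __init__(self):
        self.west = Datacenter("west")
        self.east = Datacenter("east")
        self.pending = []     # (apply_at, key, value) not yet replicated
        self.last_write = {}  # user -> time of that user's last write
        self.now = 0

    def advance(self, seconds):
        """Move the clock forward and apply any replication that is due."""
        self.now += seconds
        still_pending = []
        for apply_at, key, value in self.pending:
            if apply_at <= self.now:
                self.east.data[key] = value
            else:
                still_pending.append((apply_at, key, value))
        self.pending = still_pending

    def write(self, user, key, value):
        self.west.data[key] = value
        self.pending.append((self.now + REPLICATION_LAG, key, value))
        self.last_write[user] = self.now

    def read(self, user, key, local_dc):
        """Read locally, unless this user wrote within the replication
        window: then read from the master so they see their own change."""
        dc = self.west if local_dc == "west" else self.east
        wrote_at = self.last_write.get(user)
        if wrote_at is not None and self.now - wrote_at < REPLICATION_LAG:
            dc = self.west
        return dc.data.get(key)
```

Usage: after an East Coast user writes a status, a friend reading from the East Coast still gets the stale value (`None` here) until `advance(600)`, while the author immediately sees the new status. The cost of this rule is extra cross-country reads for recent writers.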
Web at 100 feet: georeplication & CDNs
Source: "How Facebook Works," Technology Review, Jul/Aug

Architecture
- Database (slow, persistent)
- Load Balancer (assigns a web server)
- Web Server (PHP assembles data)
- Memcache (fast, simple)

Memcache
- Simple in-memory hash table
- Supports get/set, delete, multiget, multiset
- Not a write-through cache

Pros and Cons
- The Database Shield!
- Low latency, very high request rates
- Can be easy to corrupt; inefficient for very small items

Memcache Optimization
- Multithreading and efficient protocol code: 50k req/s
- Polling network drivers: 150k req/s
- Breaking up stats lock: 200k req/s
- Batching packet handling: 250k req/s
- Breaking up cache lock: future

Network Incast
[Figure: a PHP client sends many small get requests through a switch to the memcache servers; many big data packets come back through the same switch]

Memcache Clustering
[Figure: a PHP client fetches objects spread across three memcache servers (3, 4, and 3 objects); 1 round trip per server, 3 round trips total]

Scribe
[Figure: three groups of three Scribe servers]

Thousands of MySQL servers in two datacenters. MySQL has played a role from the beginning.

Photos
Photos + Social Graph = Awesome!

Photos: Scale
- 20 billion photos x4 = 80 billion
- Would wrap around the world more than 10 times!
- Over 40M new photos per day
- 600K photos / second

Photos Scaling: The easy wins
- Upload tier: handles uploads, scales images, stores them on NFS
- Serving tier: images served from NFS via HTTP

However...
- File systems are not good at supporting a large number of files
- Metadata is too large to fit in memory, causing too many I/Os for each file read
- Limited by I/O, not storage density

Easy wins
- CDN
- Cachr (HTTP server + caching)
- NFS file handle cache
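The Memcache slides describe a simple in-memory hash table that is not write-through and acts as a shield in front of the database. A minimal sketch of the look-aside pattern that implies, assuming dict-backed `FakeCache` and `FakeDB` classes as hypothetical stand-ins for a memcached client and MySQL. Reads demand-fill the cache on a miss; writes go to the database and then delete the cached key. Deleting rather than updating on write is an assumption here; the slides only say the cache is not write-through.

```python
class FakeCache:
    """Hypothetical memcached stand-in: get/set/delete/multiget/multiset."""
    def __init__(self):
        self.store = {}

    def get(self, key):
        return self.store.get(key)

    def set(self, key, value):
        self.store[key] = value

    def delete(self, key):
        self.store.pop(key, None)

    def multiget(self, keys):
        return {k: self.store[k] for k in keys if k in self.store}

    def multiset(self, mapping):
        self.store.update(mapping)


class FakeDB:
    """Hypothetical MySQL stand-in; counts queries to show the shield."""
    def __init__(self, rows):
        self.rows = dict(rows)
        self.queries = 0

    def select(self, key):
        self.queries += 1
        return self.rows.get(key)

    def update(self, key, value):
        self.queries += 1
        self.rows[key] = value


def read(cache, db, key):
    value = cache.get(key)
    if value is None:              # miss: fall through to the database
        value = db.select(key)
        if value is not None:
            cache.set(key, value)  # demand-fill so the next read is a hit
    return value


def read_many(cache, db, keys):
    found = cache.multiget(keys)   # one batched cache lookup
    filled = {}
    for key in keys:
        if key not in found:
            value = db.select(key)
            if value is not None:
                filled[key] = value
    cache.multiset(filled)         # refill all misses in one batch
    found.update(filled)
    return found


def write(cache, db, key, value):
    db.update(key, value)  # the database stays the source of truth
    cache.delete(key)      # invalidate; the next read refills from the DB
```

Because the web tier, not the cache, decides when to fill and invalidate, a cache failure degrades to extra database queries rather than lost writes, which is what makes the cache a "shield" rather than a store.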
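Two items on the Memcache Optimization list ("Breaking up stats lock", "Breaking up cache lock") replace one global lock with finer-grained ones. The slides do not say how memcached was changed; a minimal sketch of one common technique, lock striping, assuming a hypothetical `StripedCache` whose `num_stripes` independently locked tables let threads touching different keys avoid contending on a single lock.

```python
import threading


class StripedCache:
    """Hash table split into stripes, each guarded by its own lock, so
    threads touching different stripes do not contend on one global lock."""

    def __init__(self, num_stripes=16):
        self.locks = [threading.Lock() for _ in range(num_stripes)]
        self.tables = [{} for _ in range(num_stripes)]

    def _stripe(self, key):
        # Every operation on a given key always maps to the same stripe.
        return hash(key) % len(self.locks)

    def get(self, key):
        i = self._stripe(key)
        with self.locks[i]:
            return self.tables[i].get(key)

    def set(self, key, value):
        i = self._stripe(key)
        with self.locks[i]:
            self.tables[i][key] = value

    def delete(self, key):
        i = self._stripe(key)
        with self.locks[i]:
            self.tables[i].pop(key, None)

    def incr(self, key, delta=1):
        """Atomic read-modify-write, holding only this key's stripe lock."""
        i = self._stripe(key)
        with self.locks[i]:
            value = self.tables[i].get(key, 0) + delta
            self.tables[i][key] = value
            return value
```

In CPython the global interpreter lock limits the real speedup, so this shows the structure rather than the performance; the same layout in C is where striping pays off.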
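The Network Incast figure shows many memcache servers answering one PHP client at once, so their big response packets pile up at the client's switch. The slides only name the problem; one common client-side mitigation, assumed here, is to cap how many requests are in flight. A minimal asyncio sketch, where `fetch_from_memcache` is a hypothetical stand-in for a network get and `max_outstanding` is an illustrative parameter.

```python
import asyncio


async def fetch_from_memcache(server, key):
    """Hypothetical stand-in for one get request over the network."""
    await asyncio.sleep(0.001)
    return f"{server}:{key}"


async def bounded_multiget(requests, max_outstanding=4):
    """Issue (server, key) gets concurrently, but never more than
    max_outstanding at once, so replies cannot all arrive together.
    Returns the results in request order and the peak in-flight count."""
    sem = asyncio.Semaphore(max_outstanding)
    in_flight = 0
    peak = 0

    async def one(server, key):
        nonlocal in_flight, peak
        async with sem:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await fetch_from_memcache(server, key)
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(one(s, k) for s, k in requests))
    return results, peak
```

Usage: `asyncio.run(bounded_multiget(requests, max_outstanding=4))` on 50 requests returns all 50 results while never having more than 4 outstanding. A smaller window trades latency for less burstiness at the switch.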
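The Memcache Clustering figure counts round trips per server rather than per object. A minimal sketch, assuming simple hash-mod placement of keys on servers (real deployments often use consistent hashing instead) and hypothetical server names: the client groups the keys it needs by owning server and sends one multiget to each, so round trips equal the number of distinct servers touched, not the number of objects.

```python
import zlib
from collections import defaultdict


def server_for(key, servers):
    """Place a key on a server by hashing; crc32 is stable across processes."""
    return servers[zlib.crc32(key.encode()) % len(servers)]


def plan_multiget(keys, servers):
    """Group keys by owning server: one batched request per server."""
    batches = defaultdict(list)
    for key in keys:
        batches[server_for(key, servers)].append(key)
    return dict(batches)
```

With 10 objects spread over 3 servers, `len(plan_multiget(keys, servers))` is at most 3, matching the figure's "1 round trip per server" instead of one round trip per object.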
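The photo figures can be turned into rates. Taking 40M new photos per day at face value, and assuming the 600K photos/second figure counts images served (the slide does not say), a rough read-to-upload ratio is:

```latex
\frac{40 \times 10^{6}\ \text{photos/day}}{86{,}400\ \text{s/day}} \approx 463\ \text{uploads/s},
\qquad
\frac{600{,}000\ \text{photos/s}}{463\ \text{uploads/s}} \approx 1{,}300
```

Under that assumption, the serving tier handles on the order of a thousand reads for every new upload, which is why the easy wins (CDN, Cachr, file handle cache) all target the read path.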
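The NFS file handle cache targets the metadata I/O problem: resolving a path to a file handle costs directory and inode lookups on every read. A minimal local-filesystem sketch of the idea, assuming a hypothetical `HandleCache` that keeps an LRU map from path to an open OS file descriptor with a hypothetical `capacity` limit; a real NFS handle cache in the serving tier would cache NFS handles, not local descriptors. `os.pread` is Unix-only.

```python
import os
from collections import OrderedDict


class HandleCache:
    """LRU cache of open file descriptors keyed by path, so repeated reads
    of hot files skip the path lookup and the open() call."""

    def __init__(self, capacity=1024):
        self.capacity = capacity
        self.fds = OrderedDict()  # path -> fd, least recently used first
        self.opens = 0            # how many real open() calls were made

    def _fd(self, path):
        if path in self.fds:
            self.fds.move_to_end(path)  # mark as most recently used
            return self.fds[path]
        fd = os.open(path, os.O_RDONLY)
        self.opens += 1
        self.fds[path] = fd
        if len(self.fds) > self.capacity:
            _, old_fd = self.fds.popitem(last=False)  # evict the LRU entry
            os.close(old_fd)
        return fd

    def read(self, path, size=-1):
        """Read the whole file (or `size` bytes) from offset 0."""
        fd = self._fd(path)
        if size < 0:
            size = os.fstat(fd).st_size
        return os.pread(fd, size, 0)

    def close(self):
        for fd in self.fds.values():
            os.close(fd)
        self.fds.clear()
```

Using `pread` with an explicit offset keeps the shared descriptor's file position irrelevant, so one cached handle can serve repeated reads of the same photo.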