Datacenter@Night: How Big Data Technologies Power Facebook

DESCRIPTION

Karthik Ranganathan, former lead engineer at Facebook and now at Nutanix, uses the Facebook business use case to explain modern datacenters in this presentation.

Page 1: Datacenter@Night: How Big Data Technologies Power Facebook

Kannan Muthukkaruppan & Karthik Ranganathan
Jun/20/2013

How Big Data Technologies Power Facebook
Karthik Ranganathan
September 2013

Page 2: Datacenter@Night: How Big Data Technologies Power Facebook

Introduction

Email: [email protected]
Twitter: @KarthikR
Current: Member of Technical Staff, Nutanix
Background: Technical Engineering Lead at Facebook. Co-built Cassandra for Facebook Inbox Search and improved the performance and resiliency of HBase for Facebook Messages and Search Indexing.

Page 3: Datacenter@Night: How Big Data Technologies Power Facebook

Agenda

• Big data at Facebook
• HBase use cases
  • OLTP
  • Analytics
• Operating at scale
• The Nutanix solution

Page 4: Datacenter@Night: How Big Data Technologies Power Facebook

Big Data at Facebook

OLTP
• User databases (MySQL)
• Photos (Haystack)
• Facebook Messages, Operational Data Store (HBase)

Warehouse
• Hive analytics
• Graph Search Indexing

Page 5: Datacenter@Night: How Big Data Technologies Power Facebook

HBase in a nutshell

• Apache project, modeled after Google's BigTable
• Distributed, large-scale data store
• Built on top of Hadoop DFS (HDFS)
• Efficient at random reads and writes (see the sketch below)
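
To make "random reads and writes" concrete, here is a minimal sketch of a single put and get through the HBase Java client. The table, row key, and column names are hypothetical, and this uses the modern connection-based client API rather than the 2013-era HTable one:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseBasics {
      public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("messages"))) {  // hypothetical table
          // Random write: one row key, one column family ("d"), one column
          Put put = new Put(Bytes.toBytes("user123"));
          put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("body"), Bytes.toBytes("hello"));
          table.put(put);

          // Random read of the same cell by row key
          Result r = table.get(new Get(Bytes.toBytes("user123")));
          System.out.println(Bytes.toString(r.getValue(Bytes.toBytes("d"), Bytes.toBytes("body"))));
        }
      }
    }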

Page 6: Datacenter@Night: How Big Data Technologies Power Facebook

FB’s Largest HBase Application

Facebook Messages

Page 7: Datacenter@Night: How Big Data Technologies Power Facebook

The New Facebook Messages

Page 8: Datacenter@Night: How Big Data Technologies Power Facebook

Why HBase?

Evaluated a bunch of different options
• MySQL, Cassandra, building a custom storage system for messages

Chose HBase for:
• Horizontal scalability
• Automatic failover and load balancing
• Optimized for write-heavy workloads
• HDFS already battle-tested at Facebook
• HBase’s strong consistency model

Page 9: Datacenter@Night: How Big Data Technologies Power Facebook

Quick stats (as of Nov 2011)

Traffic to HBase
• Billions of messages per day
• 75B+ RPCs per day

Usage pattern
• 55% reads, 45% writes
• Average write: 16 key-values to multiple column families

Page 10: Datacenter@Night: How Big Data Technologies Power Facebook

Data Sizes

7PB+ of online data
• ~21PB with replication
• LZO compressed
• Excludes backups

Growth rate
• 500TB+ per month
• ~20PB of raw disk per year!
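
The two growth figures line up: assuming the same ~3x replication implied by 7PB growing to ~21PB, 500 TB/month × 12 months × 3 replicas ≈ 18 PB, i.e. roughly the 20 PB of raw disk per year quoted above.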

Page 11: Datacenter@Night: How Big Data Technologies Power Facebook

Growing with size

Constant need for new features as the system grows

Read and write path improvements
• Performance optimizations
• IOPS reduction
• New database file format

Intelligent data and compute placement
• Shard-level block placement
• Locality-based load balancing

Page 12: Datacenter@Night: How Big Data Technologies Power Facebook

Other OLTP use cases of HBase

• Operational Data Store
• Multi-tenant key-value store
• Site integrity – fighting spam

Page 13: Datacenter@Night: How Big Data Technologies Power Facebook

Warehouse use cases of HBase

Graph Search Indexing
• Complex application logic
• Multiple verticals

Hive over HBase
• Real-time data ingest
• Enables real-time analytics

Page 14: Datacenter@Night: How Big Data Technologies Power Facebook

Real-time monitoring and anomaly detection

Operational Data Store

Page 15: Datacenter@Night: How Big Data Technologies Power Facebook

ODS: Facebook’s #1 Debugging Tool

Collects metrics from production servers

Supports complex aggregations and transformations

Really well-designed UI

Page 16: Datacenter@Night: How Big Data Technologies Power Facebook

Quick stats

Traffic to HBase
• 150B+ ops per day

Usage pattern
• Heavy reads of recent data
• Frequent MR jobs for rollups
• TTL to expire older data (see the sketch below)
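
The TTL bullet maps to a per-column-family setting in HBase. A sketch using the HBase 2.x admin API, where the ods_metrics table and its "ts" family are hypothetical names and the 30-day TTL is illustrative:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;

    public class SetTtl {
      public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
          // TTL is set per column family, in seconds; HBase hides expired
          // cells at read time and physically removes them on compaction.
          ColumnFamilyDescriptor cf = ColumnFamilyDescriptorBuilder
              .newBuilder(Bytes.toBytes("ts"))
              .setTimeToLive(30 * 24 * 3600)
              .build();
          admin.modifyColumnFamily(TableName.valueOf("ods_metrics"), cf);
        }
      }
    }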

Page 17: Datacenter@Night: How Big Data Technologies Power Facebook

Real-time Analytics

Facebook Insights

Page 18: Datacenter@Night: How Big Data Technologies Power Facebook

Real-time URL/Domain Insights

Deep analytics for websites
• Facebook widgets

Massive scale
• Billions of URLs
• Millions of increments/sec (see the sketch below)
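
Increments at this rate rely on HBase's atomic counter support, which applies the increment on the region server without a client-side read-modify-write. A minimal sketch; the table, family, and row-key scheme are hypothetical:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;

    public class UrlCounter {
      public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("url_insights"))) {
          // Atomically bump the click counter for one URL; returns the new value
          long clicks = table.incrementColumnValue(
              Bytes.toBytes("example.com/page"),  // row key: the URL
              Bytes.toBytes("m"),                 // metrics column family
              Bytes.toBytes("clicks"),            // one counter per metric
              1L);
          System.out.println("clicks = " + clicks);
        }
      }
    }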

Page 19: Datacenter@Night: How Big Data Technologies Power Facebook

Detailed Insights

Tracks many metrics
• Clicks, likes, shares, impressions
• Referral traffic

Detailed breakdown
• Age buckets, gender, location

Page 20: Datacenter@Night: How Big Data Technologies Power Facebook

Controlled Multi-tenancy

Generic Key-Value Store

Page 21: Datacenter@Night: How Big Data Technologies Power Facebook

A Multi-tenant solution on HBase

Generic key-value store
• Multiple apps on the same cluster
• Transparent schema design
• Simple API (implementation sketch below):

put(appid, key, value)
value = get(appid, key)
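
One plausible way to implement this API on a shared HBase table is to prefix each row key with the app id, so every app owns a contiguous key range. The sketch below assumes that scheme; the class, family, and column names are hypothetical:

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class TenantKVStore {
      private static final byte[] CF = Bytes.toBytes("v");
      private static final byte[] COL = Bytes.toBytes("value");
      private final Table table;  // one shared HBase table for all apps

      public TenantKVStore(Table table) { this.table = table; }

      // Row key = appid + ':' + key keeps each app's data together
      private static byte[] rowKey(String appid, byte[] key) {
        return Bytes.add(Bytes.toBytes(appid + ":"), key);
      }

      public void put(String appid, byte[] key, byte[] value) throws IOException {
        table.put(new Put(rowKey(appid, key)).addColumn(CF, COL, value));
      }

      public byte[] get(String appid, byte[] key) throws IOException {
        return table.get(new Get(rowKey(appid, key))).getValue(CF, COL);
      }
    }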

Page 22: Datacenter@Night: How Big Data Technologies Power Facebook

Architecture

[Architecture diagram: clients write with put(appid, key, value) directly to HBase; reads with get(appid, key) are served through Memcache sitting in front of HBase. A cache sketch follows below.]
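
A look-aside cache along the lines of the diagram could behave as sketched here; an in-process map stands in for memcache, and TenantKVStore is the hypothetical HBase-backed store from the previous sketch:

    import java.io.IOException;
    import java.util.concurrent.ConcurrentHashMap;
    import org.apache.hadoop.hbase.util.Bytes;

    public class CachedKVStore {
      private final TenantKVStore kvStore;  // HBase-backed store (previous sketch)
      private final ConcurrentHashMap<String, byte[]> cache = new ConcurrentHashMap<>();

      public CachedKVStore(TenantKVStore kvStore) { this.kvStore = kvStore; }

      public byte[] get(String appid, byte[] key) throws IOException {
        String cacheKey = appid + ":" + Bytes.toString(key);
        byte[] value = cache.get(cacheKey);
        if (value == null) {                  // cache miss: fall through to HBase
          value = kvStore.get(appid, key);
          if (value != null) cache.put(cacheKey, value);
        }
        return value;
      }

      public void put(String appid, byte[] key, byte[] value) throws IOException {
        kvStore.put(appid, key, value);       // writes always go to HBase
        cache.remove(appid + ":" + Bytes.toString(key));  // invalidate stale entry
      }
    }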

Page 23: Datacenter@Night: How Big Data Technologies Power Facebook

Multi-tenancy Issues

Not a self-service model
• Each app is reviewed

Global and per-app metrics
• Monitor RPCs by type, latencies, errors
• Friendly names for apps

If things go wrong
• Per-app kill switch

Page 24: Datacenter@Night: How Big Data Technologies Power Facebook

Powering FB’s Semantic Search Engine

Graph Search Indexing

Page 25: Datacenter@Night: How Big Data Technologies Power Facebook

Framework to build search indexes

• Multiple, independent input sources
• HBase stores document info
• Output is the search index image

rowKey = document id
value = terms, document data
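
Given that row layout, the MR indexing jobs could scan the HBase table with a mapper along these lines; the family/qualifier names and the whitespace-separated term encoding are assumptions for illustration:

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.Text;

    // Emits (term, docId) pairs; a reducer would group by term to
    // produce the inverted-index image files.
    public class IndexMapper extends TableMapper<Text, Text> {
      private static final byte[] CF = Bytes.toBytes("d");
      private static final byte[] TERMS = Bytes.toBytes("terms");

      @Override
      protected void map(ImmutableBytesWritable rowKey, Result row, Context context)
          throws IOException, InterruptedException {
        String docId = Bytes.toString(rowKey.get());   // rowKey = document id
        byte[] raw = row.getValue(CF, TERMS);          // value = terms
        if (raw == null) return;                       // skip docs without terms
        for (String term : Bytes.toString(raw).split("\\s+")) {
          context.write(new Text(term), new Text(docId));
        }
      }
    }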

Page 26: Datacenter@Night: How Big Data Technologies Power Facebook

Architecture

[Architecture diagram: document sources 1 and 2 feed the HBase cluster; an MR cluster reads from HBase and writes out the search index image files.]

Page 27: Datacenter@Night: How Big Data Technologies Power Facebook

Do’s and Don’ts From Experience

Operating at Scale

Page 28: Datacenter@Night: How Big Data Technologies Power Facebook

Design for failures(!)

Architect for failures and manageability

No single point of failure
• Killing any process is legit

Minimize manual intervention
• Especially for frequent failures

Uptime is important
• Rolling upgrades are the norm
• Need to survive rack failures

Page 29: Datacenter@Night: How Big Data Technologies Power Facebook

Dashboard and Metrics

Single place to graph/report everything

RPC calls and SLA misses
• Latencies, p99, errors
• Per-request profiling

Cluster and node health
Network utilization

Page 30: Datacenter@Night: How Big Data Technologies Power Facebook

Health Checks

Constantly monitor nodes

Auto-exclude nodes on failure
• Machine not ssh-able
• Hardware failures (HDD failure, etc.)
• Do NOT exclude on rack failures

Auto-include nodes once repaired
Rate-limit remediation of nodes

Page 31: Datacenter@Night: How Big Data Technologies Power Facebook

In a nutshell…

Use commodity hardware

Scaling out is #1
Efficiency is #2
• though pretty close behind scale-out

Design for failures
• Frequent failures must be handled automatically

Metrics, Metrics, Metrics!

Page 32: Datacenter@Night: How Big Data Technologies Power Facebook

Overview through comparison

The Nutanix Solution

Page 33: Datacenter@Night: How Big Data Technologies Power Facebook

Nutanix compared with HBase

Evaluated a bunch of different options
• MySQL, Cassandra, building a custom storage system for messages

Horizontal scalability
• Just add more nodes to scale out

Automatic failover and load balancing
• When a node goes down, others take its place automatically
• The load of the failed node is distributed across many others

Page 34: Datacenter@Night: How Big Data Technologies Power Facebook

Nutanix compared with HBase philosophy

Optimized for write-heavy workloads
• Nutanix is optimized for virtualized environments
• Handles both read- and write-heavy workloads
• Transparent use of flash to boost performance

HDFS already battle-tested at Facebook
• Nutanix is also quite battle-tested

HBase’s strong consistency model
• Nutanix is also strongly consistent

Page 35: Datacenter@Night: How Big Data Technologies Power Facebook

Other aspects of Nutanix

Architected for failures and manageability
• No single point of failure
• Minimal manual intervention for frequent failures

Uptime is important
• Rolling upgrades are the norm
• Need to survive rack failures

Single place to graph/report everything
• Prism UI to report on and manage the entire cluster

Constantly monitor nodes
• Auto-exclude nodes on failure

Page 36: Datacenter@Night: How Big Data Technologies Power Facebook

In a nutshell about Nutanix…

Runs on commodity hardware

Scaling out is #1
• Drop-in scale-out for nodes

Efficiency is #2
• Constant work on performance improvements

Design for failures
• Frequent failures handled automatically
• Alerts in the UI for many other states

Metrics, Metrics, Metrics!
• Prism UI gives insight into cluster health

Page 37: Datacenter@Night: How Big Data Technologies Power Facebook

Questions?

Page 38: Datacenter@Night: How Big Data Technologies Power Facebook

Thank You

NUTANIX INC. – CONFIDENTIAL AND PROPRIETARY