
The little engine(s) that could: scaling online social networks


Presentation of the SIGCOMM 2010 paper: "The little engine(s) that could: scaling online social networks". The full paper is available at: http://ccr.sigcomm.org/online/?q=node/642 or at http://research.tid.es/jmps/


Page 1: The little engine(s) that could: scaling online social networks

ACM/SIGCOMM 2010 – New Delhi, India

The Little Engine(s) that could: Scaling Online Social Networks

Josep M. Pujol, Vijay Erramilli, Georgos Siganos, Xiaoyuan Yang

Nikos Laoutaris, Parminder Chhabra and Pablo Rodriguez

Telefonica Research, Barcelona

Page 2: The little engine(s) that could: scaling online social networks

Problem

Solution

Implementation

Conclusions

Page 3: The little engine(s) that could: scaling online social networks

Scalability is a pain: the designers’ dilemma

“Scalability is a desirable property of a system, a network, or a process, which indicates its ability to either handle growing amounts of work in a graceful manner or to be readily enlarged.”

The designers’ dilemma: wasted complexity and long time to market vs. short time to market but risk of death by success.

Page 4: The little engine(s) that could: scaling online social networks

Towards transparent scalability

The advent of the Cloud: virtualized machinery + a gigabit network.

Hardware scalability ≠ application scalability.

Page 5: The little engine(s) that could: scaling online social networks

Towards transparent scalability

Elastic resource allocation works for the presentation and logic layers: add more machines when load increases. These components are stateless and therefore independent and duplicable.

The data layer is different: its components are interdependent and therefore non-duplicable on demand.

Pages 6–10: The little engine(s) that could: scaling online social networks

Scaling the data backend...

• Full replication
– Load of read requests decreases with the number of servers
– State does not decrease with the number of servers
– Maintains data locality: operations on the data (e.g. a query) can be resolved in a single server from the application’s perspective

[Diagram: a centralized deployment (application logic over a single data store) vs. a distributed one (application logic over several data stores)]

• Horizontal partitioning, or sharding
– Load of read requests decreases with the number of servers
– State decreases with the number of servers
– Maintains data locality as long as... the splits are disjoint (independent)

Bad news for OSNs!!

Page 11: The little engine(s) that could: scaling online social networks

Why sharding is not enough for OSNs

• Shards in an OSN can never be disjoint, because operations on user i require access to the data of other users at least one hop away.

• From graph theory: if the network forms a single connected component, there is no partition in which every node has itself and all of its neighbors in the same partition.

Data locality cannot be maintained by partitioning a social network!!
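To make this concrete, here is a minimal sketch (hypothetical code, not from the paper) of DHT-style sharding on a toy graph: any user with a friend hashed to a different shard already forces a cross-server read.

# Toy illustration of why sharding breaks data locality in an OSN:
# with hash-based shard assignment, almost every user has some friend
# whose data lives on a different server.

NUM_SHARDS = 4

def shard_of(user_id: int) -> int:
    # Hash-based (here: modulo) shard assignment, DHT-style.
    return user_id % NUM_SHARDS

# A tiny friendship graph: user -> friends (undirected edges).
friends = {
    0: [1, 5], 1: [0, 2], 2: [1, 3], 3: [2, 4],
    4: [3, 5], 5: [4, 0],
}

for user, fs in friends.items():
    remote = [f for f in fs if shard_of(f) != shard_of(user)]
    if remote:
        print(f"user {user} (shard {shard_of(user)}) must query "
              f"other shards for friends {remote}")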

Page 12: The little engine(s) that could: scaling online social networks

A very active area…

• Online Social Networks are massive systems spanning hundreds of machines in multiple datacenters

• Partition and placement to minimize network traffic and delay

– Karagiannis et al. “Hermes: Clustering Users in Large-Scale Email Services” -- SoCC 2010

– Agarwal et al. “Volley: Automated Data Placement of Geo-Distributed Cloud Services” -- NSDI 2010

• Partition and replication towards scalability

– Pujol et al. “Scaling Online Social Networks without Pains” -- NetDB/SOSP 2009

Page 13: The little engine(s) that could: scaling online social networks

Problem

Solution

Implementation

Conclusions

Pages 14–26: The little engine(s) that could: scaling online social networks

OSN’s operations 101

[Figure: a small social graph of users 0–9; user 0 is “Me”, with edges to friends; each user has an Inbox and a Friends list]

The inbox is a list of message keys. It starts empty ([]); when a friend posts a message stored under key-101, that key is pushed into the inbox: [key-101]. After more friends post, the inbox reads [key-146, key-121, key-105, key-101].

“Let’s see what’s going on in the world.” Reading the timeline takes two operations:

1) FETCH inbox [from server A]
R = [key-146, key-121, key-105, key-101]
2) FETCH text WHERE key IN R [from server ?]

Where is the data? If the keys are all in server A, then data locality! If they are in servers B, F and E, then no data locality!
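As a sketch of those two operations (hypothetical key-value layout; the keys match the slides, the server contents are invented):

# Sketch of the timeline read described above: first fetch the inbox
# (a list of message keys), then fetch the message bodies. Data
# locality holds only if every key resolves to the same server.

servers = {
    "A": {"inbox:me": ["key-146", "key-121", "key-105", "key-101"],
          "key-101": "hello"},
    "B": {"key-105": "lunch?"},
    "E": {"key-121": "gig tonight"},
    "F": {"key-146": "new photo"},
}

def locate(key):
    # Directory lookup: which server holds this key?
    return next(name for name, kv in servers.items() if key in kv)

# Step 1: FETCH inbox (from server A).
inbox = servers["A"]["inbox:me"]

# Step 2: FETCH text WHERE key IN inbox -- possibly from many servers.
touched = {locate(k) for k in inbox}
texts = [servers[locate(k)][k] for k in inbox]
print(texts, "-> servers touched:", sorted(touched))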

Page 27: The little engine(s) that could: scaling online social networks

OSN’s operations 101

1) Relational databases: selects and joins across multiple shards of the database are possible, but performance is poor (e.g. MySQL Cluster, Oracle RAC).

2) Key-value stores (DHT): more efficient than relational databases, with multi-get primitives to transparently fetch data from multiple servers.

But it’s not a silver bullet:
o lose the SQL query language => programmatic queries
o lose abstraction from data operations
o suffer from high traffic, eventually affecting performance:
• incast issue
• multi-get hole
• latency dominated by the worst-performing server
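A small sketch (toy latency model, invented numbers) of why a multi-get’s latency is dominated by the worst-performing server:

# Why multi-get latency tracks the slowest server: a request fans out
# to every server holding one of the keys and completes only when the
# last response arrives.

import random

def server_latency_ms():
    # Assumed toy latency model: mostly fast, occasionally slow.
    return random.choice([2, 3, 4, 50])

def multiget_latency(num_servers):
    # The fan-out finishes when the slowest shard answers.
    return max(server_latency_ms() for _ in range(num_servers))

random.seed(0)
for n in (1, 4, 16, 64):
    trials = [multiget_latency(n) for _ in range(1000)]
    avg = sum(trials) / len(trials)
    print(f"{n:3d} servers per multi-get -> mean latency {avg:5.1f} ms")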

Pages 28–48: The little engine(s) that could: scaling online social networks

Maintaining data locality

[Figure: the social graph of users 0–9, randomly split between Server 1 and Server 2, with friendship edges crossing between the two servers]

Random partition (DHT-like): users are assigned to servers randomly.

How do we enforce data locality? Replication.

[Figure: replicas of users 4 and 7 are added on the other server, so each of them now has a master plus a replica (#4 master + replica, #7 master + replica)]

That does it for user #3, but what about the others? The very same procedure... and after a while, data locality for reads for any user is guaranteed!

[Figure: replicas keep accumulating until every user has a replica on the other server]

Random partition + replication can lead to high replication overheads: this is full replication!! Can we do better?

We can leverage the underlying social structure for the partition...

[Figure: a social partition places the community of users 0, 4, 3, 9, 2 on one server and 6, 8, 5, 7, 1 on the other]

Social Partition and Replication: SPAR... only two replicas (of users 3 and 2) are needed to achieve data locality.
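As a plausibility check, here is a sketch that counts the replicas needed for read locality under a random split vs. a community-aware split. The 10-user graph is invented for illustration (chosen to mirror the slides’ two-community example), not taken from the paper:

# Toy comparison of replication overhead: count the replicas needed
# for read locality (every user's friends present on the user's
# server) under a random-style split vs. a community-aware split.

friends = {
    0: {4, 3, 9, 2}, 4: {0, 3}, 3: {0, 4, 9, 2}, 9: {0, 3}, 2: {0, 3, 5},
    6: {8, 5}, 8: {6, 5}, 5: {6, 8, 7, 1, 2}, 7: {5, 1}, 1: {5, 7},
}

def replicas_for_locality(assignment):
    # A replica of user u is needed on every other server that hosts
    # the master of one of u's friends (so the friend can read locally).
    replicas = set()
    for u, fs in friends.items():
        for f in fs:
            if assignment[f] != assignment[u]:
                replicas.add((u, assignment[f]))
    return len(replicas)

random_split = {u: u % 2 for u in friends}        # DHT-style split
social_split = {u: 0 for u in (0, 4, 3, 9, 2)}    # community-aware split
social_split.update({u: 1 for u in (6, 8, 5, 7, 1)})

print("random :", replicas_for_locality(random_split), "replicas")
print("social :", replicas_for_locality(social_split), "replicas")

On this toy graph the random split needs 8 replicas while the social split needs only 2, the same flavor of saving the slides illustrate.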

Pages 49–50: The little engine(s) that could: scaling online social networks

SPAR algorithm from 10,000 feet

“All partition algorithms are equal, but some are more equal than others”

1) Online (incremental)
a) Dynamics of the social network (add/remove node, edge)
b) Dynamics of the system (add/remove server)

2) Fast (and simple)
a) Local information
b) Hill-climbing heuristic
c) Load balancing via back-pressure

3) Stable
a) Avoid cascades

4) Effective
a) Optimize for MIN_REPLICA (i.e. NP-hard)
b) While maintaining a fixed number of replicas per user for redundancy (e.g. K = 2)
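As a rough illustration of points 1–2, and explicitly not the paper’s exact algorithm, here is a minimal hill-climbing sketch: on each new edge it either keeps the current placement, or moves one endpoint next to the other, whichever choice needs the fewest replicas.

# Minimal sketch of a SPAR-flavoured online heuristic (illustration
# only): hill-climb on the number of replicas needed for read
# locality on every edge creation. The real algorithm also handles
# node/server dynamics, K redundant replicas and back-pressure.

from collections import defaultdict

class SparSketch:
    def __init__(self, num_servers):
        self.num_servers = num_servers
        self.master = {}                  # user -> server of its master
        self.friends = defaultdict(set)   # user -> set of friends

    def add_user(self, u):
        # Place a new master on the currently least-loaded server.
        loads = [0] * self.num_servers
        for s in self.master.values():
            loads[s] += 1
        self.master[u] = loads.index(min(loads))

    def _replicas(self, assignment):
        # Replicas needed so every user finds all friends locally.
        # (Computed globally here for clarity; the slide stresses that
        # the real algorithm uses only local information.)
        return sum(
            len({assignment[f] for f in fs} - {assignment[u]})
            for u, fs in self.friends.items()
        )

    def add_edge(self, u, v):
        self.friends[u].add(v)
        self.friends[v].add(u)
        # Hill climbing: keep the placement, move u to v's server, or
        # move v to u's server -- whichever needs the fewest replicas.
        best, best_cost = dict(self.master), None
        for move in ({}, {u: self.master[v]}, {v: self.master[u]}):
            trial = dict(self.master)
            trial.update(move)
            cost = self._replicas(trial)
            if best_cost is None or cost < best_cost:
                best, best_cost = trial, cost
        self.master = best

spar = SparSketch(num_servers=2)
for u in range(6):
    spar.add_user(u)
for edge in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    spar.add_edge(*edge)
print(spar.master)   # the two triangles end up co-located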

Page 51: The little engine(s) that could: scaling online social networks

Looks good on paper, but... let’s try it for real

Real OSN data:
• Twitter: 2.4M users, 48M edges, 12M tweets over 15 days (Twitter as of Dec 2008)
• Orkut (MPI): 3M users, 223M edges
• Facebook (MPI): 60K users, 500K edges

Partition algorithms:
• Random (DHT)
• METIS
• MO+ (modularity optimization + a hack)
• SPAR online

Pages 52–55: The little engine(s) that could: scaling online social networks

How many replicas are generated?

            Twitter, 16 partitions         Twitter, 128 partitions
            Rep. overhead   % over K=2     Rep. overhead   % over K=2
SPAR        2.44            +22%           3.69            +84%
MO+         3.04            +52%           5.14            +157%
METIS       3.44            +72%           8.03            +302%
Random      5.68            +184%          10.75           +434%

SPAR needs 2.44 replicas per user (on average): 2 to guarantee redundancy + 0.44 to guarantee data locality (+22%).

Replication with random partitioning is too costly!

Other social-based partitioning algorithms have higher replication overhead than SPAR.

Page 56: The little engine(s) that could: scaling online social networks

Problem

Solution

Implementation

Conclusions

Pages 57–60: The little engine(s) that could: scaling online social networks

The SPAR architecture

[Diagram, built up across the slides:]
• The application, 3-tiers
• SPAR middleware (data-store specific)
• SPAR controller: partition manager and directory service
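A minimal sketch of how these pieces fit together, with a hypothetical API invented for illustration (not the actual implementation): the middleware sits between the application and the data stores, and asks the directory service which server holds a user’s master so every read can be served locally.

# Sketch of the SPAR directory service and middleware roles as
# described on the slide. Class and method names are assumptions.

class DirectoryService:
    def __init__(self):
        self.masters = {}    # user -> server with the master copy
        self.replicas = {}   # user -> set of servers with replicas

    def register(self, user, master, replicas=()):
        self.masters[user] = master
        self.replicas[user] = set(replicas)

    def lookup(self, user):
        return self.masters[user]

class SparMiddleware:
    """Sits between the application tier and the data stores."""
    def __init__(self, directory, stores):
        self.directory = directory
        self.stores = stores   # server name -> key-value dict

    def get(self, user, key):
        # Route the read to the server holding the user's master;
        # SPAR's placement guarantees the friend data is there too.
        server = self.directory.lookup(user)
        return self.stores[server].get(key)

directory = DirectoryService()
directory.register("alice", master="s1", replicas={"s2"})
mw = SparMiddleware(directory, {"s1": {"inbox:alice": ["key-101"]}, "s2": {}})
print(mw.get("alice", "inbox:alice"))   # -> ['key-101']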

Page 61: The little engine(s) that could: scaling online social networks

SPAR in the wild

• Twitter clone (Laconica, now StatusNet)

– Centralized architecture, PHP + MySQL/Postgres

• Twitter data

– Twitter as of end of 2008, 2.4M users, 12M tweets in 15 days

• The little engines

– 16 commodity desktops: Pentium Duo at 2.33 GHz, 2 GB RAM, connected by a Gigabit Ethernet switch

• Test SPAR on top of:

– MySQL (v5.5)

– Cassandra (v0.5.0)

Page 62: The little engine(s) that could: scaling online social networks

SPAR on top of MySQL

• Different read request rates (application level):
– 1 call to the LDS, plus 1 join on the notice and notice inbox tables to get the last 20 tweets and their content
– A user is queried only once per session (4 min)
– SPAR allows the data to be partitioned, improving both query times and cache hit ratios
– Full replication suffers from a low cache hit ratio and large tables (1B+ rows)

                      MySQL full replication    SPAR + MySQL
99th pct. < 150 ms    16 req/s                  2500 req/s

Page 63: The little engine(s) that could: scaling online social networks

SPAR on top of Cassandra

Network bandwidth is not an issue, but network I/O and CPU I/O are:
– Network delays are reduced (the worst-performing server is what produces the delays)
– Memory hit ratio is increased (random partitioning destroys correlations)

• Different read request rates (application level):

                      Vanilla Cassandra    SPAR + Cassandra
99th pct. < 100 ms    200 req/s            800 req/s

Page 64: The little engine(s) that could: scaling online social networks

Problem

Solution

Implementation

Conclusions

Page 65: The little engine(s) that could: scaling online social networks

Conclusions

SPAR provides the means to achieve transparent scalability:

1) for applications using an RDBMS (not necessarily limited to OSNs), and

2) a performance boost for key-value stores, due to the reduction of network I/O.

Page 66: The little engine(s) that could: scaling online social networks

[email protected]

http://research.tid.es/jmps/

@solso

Thanks!
