Poor Man's Social Network
Consistently Trade Freshness For Scalability
Zhiwu Xie, Jinyang Liu, Herbert Van de Sompel, Johann van Reenen and Ramiro Jordan

Poor Man's Social Network - USENIX · implemented in Python/Django and PostgreSQL 16 . Experiment: Configurations • Used ~ 100 cloud instances from Amazon • Most are used for


Outline
• Scaling feed following
• Algorithm
• Experiment and results
• Conclusions

Feed Following

[Figure: a follow graph with nodes A through K, labeled with producer and consumer roles, alongside an example "Feed Following" view of messages.]

Feed Following Scalability

"Give me the 20 most recent tweets sent by all the people I follow"
• Individualized queries
• Fast-changing global state
• Partitioning, replication, and caching
• NoSQL: trade consistency for scalability
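
The individualized query quoted on this slide can be sketched in a few lines of Python. This is only an illustration of why the query is hard to share across users, not the implementation used in the talk; the `Tweet` record, the `recent_feed` function, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tweet:
    author: str     # hypothetical user name
    sent_at: float  # seconds on some shared clock (assumption)
    text: str


def recent_feed(tweets, following, limit=20):
    """Return the `limit` most recent tweets written by anyone in `following`."""
    relevant = [t for t in tweets if t.author in following]
    relevant.sort(key=lambda t: t.sent_at, reverse=True)
    return relevant[:limit]


# Made-up sample data for illustration only.
tweets = [
    Tweet("alice", 1.0, "a1"),
    Tweet("bob", 2.0, "b1"),
    Tweet("carol", 3.0, "c1"),
    Tweet("alice", 4.0, "a2"),
]
feed = recent_feed(tweets, following={"alice", "carol"}, limit=2)
print([t.text for t in feed])
```

Because the result depends on each user's own `following` set, two users rarely get the same answer, which is what makes this form of the query difficult to cache.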

Consistency
• Atomicity, Linearizability, or One-copy Serializability (1SR)

[Figure: "Feed Following" views of the same messages laid out along a time axis.]

Retweet Anomaly

[Figure: users A, B, and C. One "Feed Following" view shows a message together with its retweet; another view shows only the retweet.]

New Approach: TimeMap Query

"Who has created new tweets during the past scheduled release periods?"
• Global time across partitions
• Scheduled release
• Client-side processing and caching
• Consistently trade freshness for scalability
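
As a rough sketch of what answering a TimeMap query might involve on the server, the snippet below collects the ids of users who tweeted inside one release window. The `(user_id, timestamp)` log layout, the half-open interval, and the name `build_timemap` are assumptions made for illustration; the slides do not specify them.

```python
def build_timemap(tweet_log, window_start, window_end):
    """Return the sorted ids of users who tweeted in [window_start, window_end).

    `tweet_log` is a hypothetical iterable of (user_id, timestamp) pairs.
    """
    authors = {uid for uid, ts in tweet_log if window_start <= ts < window_end}
    return sorted(authors)


# Made-up log: timestamps are seconds on an assumed global clock.
tweet_log = [(7, 100.2), (3, 101.9), (7, 103.0), (12, 105.0), (5, 99.9)]
timemap = build_timemap(tweet_log, 100.0, 105.0)
print(timemap)
```

Unlike the individualized feed query, this result does not depend on who is asking, so the same answer can be served to every client.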

CAP Theorem
• Preconditioned on the asynchronous network model: the only way to coordinate the distributed nodes is to pass messages
• In the partially synchronous model, where global time is assumed to be available, CAP may indeed be simultaneously achievable most of the time

Global Time
• “One of the mysteries of the universe is that it is possible to construct a system of physical clocks which, running quite independently of one another, will satisfy the Strong Clock Condition.” – Time, Clocks, and the Ordering of Events in a Distributed System, by Leslie Lamport

Scheduled Release Algorithm

"Who has created new tweets during the past scheduled release periods?"
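
One way to picture the scheduled release idea is a function that maps the current global time to the most recent release period that has fully elapsed. The sketch below assumes periods are aligned to multiples of the period length on a shared clock and returns only the latest period; both choices, and the names `last_released_window` and `to_seconds`, are illustrative assumptions rather than details taken from the slides.

```python
def last_released_window(now, period=5):
    """Return (start, end) of the most recent fully elapsed release period.

    `now` and `period` are in seconds on an assumed global clock; windows are
    assumed to be aligned to multiples of `period`.
    """
    end = (now // period) * period
    return end - period, end


def to_seconds(hours, minutes, seconds):
    """Convert a time of day to seconds since midnight."""
    return hours * 3600 + minutes * 60 + seconds


# Two requests a couple of seconds apart fall in the same release period,
# so they ask about the same window.
window_a = last_released_window(to_seconds(13, 5, 37))
window_b = last_released_window(to_seconds(13, 5, 39))
print(window_a, window_b)
```

With a 5-second period, requests made at 1:05:37PM and 1:05:39PM both resolve to the window from 1:05:30PM to 1:05:35PM.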

Partitioning: Send A New Tweet

Partition 0: User_id 0, 5, 10, 15, …
Partition 1: User_id 1, 6, 11, 16, …
Partition 2: User_id 2, 7, 12, 17, …
Partition 3: User_id 3, 8, 13, 18, …
Partition 4: User_id 4, 9, 14, 19, …
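
The user id listing on this slide is consistent with assigning each user to a partition by taking the id modulo the number of partitions. The slide does not name the mapping function, so the sketch below is an assumption that merely reproduces the listed pattern; `partition_for` and `NUM_PARTITIONS` are hypothetical names.

```python
NUM_PARTITIONS = 5  # the slide shows partitions 0 through 4


def partition_for(user_id, num_partitions=NUM_PARTITIONS):
    """Return the partition that stores tweets written by `user_id`."""
    return user_id % num_partitions


# Group the first twenty user ids by partition to compare with the slide.
layout = {p: [u for u in range(20) if partition_for(u) == p]
          for p in range(NUM_PARTITIONS)}
print(layout)
```

Under this assumption, a new tweet is written only to its author's partition, so sending a tweet touches a single server.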

Partitioning: TimeMap

[Figure: partitions 0, 1, 2, 3, …, N-1.]

Client Side Processing

A: If the current time is 1:05:37PM, please tell me who (no matter if I follow any of them or not) have sent new tweets from 1:05:30PM to 1:05:35PM. I’ll figure out by myself if any of these new tweets are relevant to me, and if so, I’ll retrieve these tweets separately by myself.

B: If the current time is 1:05:39PM, please tell me who (no matter if I follow any of them or not) have sent new tweets from 1:05:30PM to 1:05:35PM. I’ll figure out by myself if any of these new tweets are relevant to me, and if so, I’ll retrieve these tweets separately by myself.

Cache!
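
The exchange above can be sketched as a shared cache keyed by release window plus a per-client filtering step. This is a minimal illustration under stated assumptions: a plain dictionary stands in for whatever cache sits between clients and servers, and `TimeMapCache`, `relevant_authors`, and `fake_server` are hypothetical names with made-up data.

```python
class TimeMapCache:
    """Caches TimeMap responses by release window (dict used as a stand-in)."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable that asks the server for one window
        self._store = {}
        self.misses = 0

    def get(self, window):
        if window not in self._store:
            self.misses += 1
            self._store[window] = frozenset(self._fetch(window))
        return self._store[window]


def relevant_authors(timemap, following):
    """Client-side step: keep only the authors this client follows."""
    return sorted(timemap & following)


def fake_server(window):
    # Made-up response: ids of users who tweeted during `window`.
    return [3, 7, 12]


cache = TimeMapCache(fake_server)
window = ("13:05:30", "13:05:35")
a_hits = relevant_authors(cache.get(window), frozenset({7, 40}))
b_hits = relevant_authors(cache.get(window), frozenset({3, 12, 99}))
print(a_hits, b_hits, cache.misses)
```

Clients A and B follow different people, yet both requests are served from one cached response; only the filtering differs, and it runs on the client.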

Staleness vs. Latency

• Asked at 1:00: "How are you?" Answered at 2:00: "I’m fine (as of 2:00)". Fresh, but 1 hour latency.
• Asked at 1:00: "How were you at 12:55?" Answered at 1:05: "I was fine (as of 12:55)". 10 minutes stale but only 5 minutes latency.
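
The two figures quoted on this slide can be checked with a little arithmetic. The sketch below assumes latency is the time between asking and receiving the answer, and staleness is the age of the answer at the moment it is received; these definitions are inferred from the slide's numbers, and the clock times are written in 24-hour form on the assumption that they are afternoon times.

```python
from datetime import datetime


def latency_and_staleness(asked_at, answered_at, as_of):
    """Return (latency, staleness) in minutes for one question and answer."""
    latency = (answered_at - asked_at).total_seconds() / 60
    staleness = (answered_at - as_of).total_seconds() / 60
    return latency, staleness


def clock(hhmm):
    """Parse a 24-hour HH:MM time of day (the date part is irrelevant here)."""
    return datetime.strptime(hhmm, "%H:%M")


# "How are you?" asked at 1:00, answered at 2:00 with the state as of 2:00.
fresh = latency_and_staleness(clock("13:00"), clock("14:00"), clock("14:00"))

# "How were you at 12:55?" asked at 1:00, answered at 1:05.
stale = latency_and_staleness(clock("13:00"), clock("13:05"), clock("12:55"))
print(fresh, stale)
```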

Trade Freshness For Scalability
• Mass transit system vs. private car
• Lose flexibility, but gain overall efficiency by sharing resources
• Stale up to the length of the scheduled release period, e.g., 5 seconds.

Experiment
• Implemented on AWS
• A Twitter-like feed following application
• Server side: Python/Django, PostgreSQL, PL/pgSQL
• Client side: emulated browser, implemented in Python/Django and PostgreSQL

Experiment: Configurations
• Used ~100 cloud instances from Amazon
• Most are used for emulated browsers
• 3 to 6 c1.medium as servers
• Use memcached to simulate caches

Experiment: Workload
• Workload similar to the Yahoo! PNUTS experiment
• A following network of ~200,000 users
• Synthetic workload generated by Yahoo! Cloud Serving Benchmark

Experiment Result: Query Rate

Experiment Result: Latency

Experiment Results: Caching

Experiment Results: CPU Load

[Figure: CPU load, shown for Server and Client.]

Conclusions
• Consistently scale feed following
• Linear scalability
• Practical low-cost solution

Thank You
• Questions?
