Distributed Algorithms Epidemic Dissemination Alberto Montresor University of Trento, Italy 2014/10/07 Alberto Montresor (UniTN) DS - Epidemic Dissemination 2014/10/07 1 / 45


Page 1: Distributed Algorithms Epidemic Dissemination

Distributed Algorithms
Epidemic Dissemination

Alberto Montresor

University of Trento, Italy

2014/10/07


Page 2: Distributed Algorithms Epidemic Dissemination

Contents

1 Introduction
  - Motivation
  - System model
  - Specifications
  - Models of epidemics

2 Algorithms
  - Best-effort
  - Anti-entropy
  - Rumor-mongering
  - Deletion and death certificates
  - Probabilistic broadcast

3 Bibliography

4 What’s next

Page 3: Distributed Algorithms Epidemic Dissemination

Introduction Motivation

Data dissemination

Problem

Application-level broadcast/multicast is an important building block for modern distributed applications
  - Video streaming
  - RSS feeds

We want efficiency, robustness, and speed when scaling
  - Flooding (reliable broadcast) is robust, but inefficient ($O(n^2)$ messages)
  - Tree distribution is efficient ($O(n)$ messages), but fragile
  - Gossip is both efficient ($O(n \log n)$ messages) and robust, but has relatively high latency

Scalability is measured by:
  - Total number of messages sent by all nodes
  - Number of messages sent by each node

Page 4: Distributed Algorithms Epidemic Dissemination

Introduction Motivation

Trade-offs

[Figure: trade-off diagram placing Tree, Flood, and Gossip against three properties: efficient, fast, robust. Courtesy: Robbert van Renesse]

Page 5: Distributed Algorithms Epidemic Dissemination

Introduction Motivation

Introduction

Solution

Reverting to a probabilistic approach based on epidemics / gossip

Nodes infect each other through messages

Total number of messages is less than $O(n^2)$

No node is overloaded

But:

No deterministic guarantee on reliability

Only probabilistic ones


Page 6: Distributed Algorithms Epidemic Dissemination

Introduction Motivation

History of the Epidemic/Gossip Paradigm

First defined by Alan Demers et al. (1987)

Protocols based on epidemics predate Demers’ paper (e.g. NNTP)

’90s: gossip applied to the information dissemination problem

’00s: gossip beyond dissemination

2006: Workshop on the future of gossip (Leiden, the Netherlands)


Page 7: Distributed Algorithms Epidemic Dissemination

Introduction Motivation

Reality check (1987)

XEROX Clearinghouse Servers

Database replicated at thousands of nodes

Heterogeneous, unreliable network

Independent updates to single elements of the DB are injected at multiple nodes

Updates must propagate to all nodes or be supplanted by later updates of the same element

Replicas become consistent once no new updates are injected

Assuming a reasonable update rate, most information at any given replica is “current”

Page 8: Distributed Algorithms Epidemic Dissemination

Introduction Motivation

Reality check (today)

Amazon uses a gossip protocol to quickly spread information throughout the S3 system

Amazon’s Dynamo uses a gossip-based failure detection service

The basic information exchange in BitTorrent is based on gossip


Page 9: Distributed Algorithms Epidemic Dissemination

Introduction System model

Basic assumptions

System is asynchronous
  - No bounds on message delays or process execution delays

Processes fail by crashing
  - They stop executing actions after the crash
  - We do not consider Byzantine failures

Communication is subject to benign failures
  - Message omission
  - No message corruption, creation, or duplication

Page 10: Distributed Algorithms Epidemic Dissemination

Introduction System model

Data model

We consider a database that is replicated at a set of $n$ nodes $\Pi = \{p_1, \ldots, p_n\}$

The copy of the database at node $p_i$ can be represented by a time-varying partial function:

$\mathit{value}_i : K \to V \times T$

where:
  - $K$ is the set of keys
  - $V$ is the set of values
  - $T$ is the set of timestamps

The update operation is formalized as $\mathit{value}_i(k) \leftarrow (v, \mathit{now}())$, where $\mathit{now}()$ returns a globally unique timestamp
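
The data model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `Replica`, the method `lookup`, and the encoding of timestamps as `(counter, node_id)` pairs are hypothetical choices, not taken from the slides. The pair encoding makes timestamps unique across nodes, but it does not by itself order updates issued at different nodes by real time.

```python
import itertools
from typing import Dict, Hashable, Optional, Tuple

Timestamp = Tuple[int, int]  # (local counter, node id): an assumed encoding


class Replica:
    """One node's copy of the database: a partial map key -> (value, timestamp)."""

    def __init__(self, node_id: int) -> None:
        self.node_id = node_id
        self._counter = itertools.count(1)
        self.value: Dict[Hashable, Tuple[object, Timestamp]] = {}

    def now(self) -> Timestamp:
        # Unique across nodes: the node id distinguishes equal counters.
        return (next(self._counter), self.node_id)

    def update(self, key: Hashable, v: object) -> None:
        # value_i(k) <- (v, now())
        self.value[key] = (v, self.now())

    def lookup(self, key: Hashable) -> Optional[Tuple[object, Timestamp]]:
        # Partial function: a key that was never written yields None.
        return self.value.get(key)
```

Tuples compare lexicographically in Python, so two such timestamps can be ordered directly with `<`.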

Page 11: Distributed Algorithms Epidemic Dissemination

Introduction Specifications

Database Specification

The goal of the update distribution process is to drive the system towards consistency:

Definition (Eventual consistency)

If no new updates are injected after some time $t$, eventually all correct nodes will obtain the same copy of the database:

$\forall p_i, p_j \in \Pi,\ \forall k \in K : \mathit{value}_i(k) = \mathit{value}_j(k)$

Page 12: Distributed Algorithms Epidemic Dissemination

Introduction Specifications

Probabilistic Broadcast Specification

An alternative view

PB1 – Probabilistic validity

There is a given probability such that for any two correct nodes $p$ and $q$, every message PB-broadcast by $p$ is eventually PB-delivered by $q$

PB2 – Integrity

Every message is PB-delivered by a node at most once, and only if it was previously PB-broadcast

Page 13: Distributed Algorithms Epidemic Dissemination

Introduction Specifications

Best-Effort Broadcast vs Probabilistic Broadcast

Similarities:

No agreement property

Differences:

Probability in BEB is “hidden” in process life-cycle

Probability in PB is explicit


Page 14: Distributed Algorithms Epidemic Dissemination

Introduction Specifications

Some simplifying assumptions

Every node knows Π (the communication network is a full graph)

Communication costs between nodes are homogeneous

We assume there is a single entry in the database
  - We simplify notation ($\mathit{value}_i(k) \to \mathit{value}_i$)
  - Managing multiple keys is more complex than adding "foreach $k \in K$"

To be relaxed later...

Page 15: Distributed Algorithms Epidemic Dissemination

Introduction Models of epidemics

Models of epidemics

Definition (Epidemiology)

Epidemiology studies the spread of a disease or infection in terms of populations of infected/uninfected individuals and their rates of change

Why?

To understand if an epidemic will turn into a pandemic

To adopt countermeasures to reduce the rate of infection
  - Inoculation
  - Isolation

Page 16: Distributed Algorithms Epidemic Dissemination

Introduction Models of epidemics

SIR Model

Definition (SIR Model – Kermack and McKendrick, 1927)

An individual p can be:

Susceptible: if p is not yet infected by the disease

Infective: if p is infected and capable of spreading the disease

Removed: if p has been infected and has recovered from the disease

[Diagram: Susceptible → Infective → Removed]

Page 17: Distributed Algorithms Epidemic Dissemination

Introduction Models of epidemics

SIR Model

How does it work?

Initially, a single individual is infective

Individuals get in touch with each other, spreading the disease

Susceptible individuals are turned into infective ones

Eventually, infective individuals will become removed

http://en.wikipedia.org/wiki/File:Sirsys-p9.png


Page 18: Distributed Algorithms Epidemic Dissemination

Introduction Models of epidemics

SIR is not the only model...

Definition (SIRS Model)

The SIR model plus temporary immunity, so recovered nodes may become susceptible again.

[Diagram: Susceptible → Infective → Removed → Susceptible]

Definition (SEIRS Model)

The SEIRS model takes into consideration the exposed or latent period of the disease

[Diagram: Susceptible → Exposed → Infective → Removed → Susceptible]

Page 19: Distributed Algorithms Epidemic Dissemination

Introduction Models of epidemics

From Epidemiology to Distributed Systems

The idea

Diseases spread quickly and robustly

Our goal is to spread an update as fast and as reliably as possible

Can we apply these ideas to distributed systems?

Definition (SIR Model for Database replication)

Susceptible: if p has not yet received an update

Infective: if p holds an update and is willing to share it

Removed: if p has the update but is no longer willing to share it

Note: Rumor spreading, or gossiping, is based on the same principles


Page 20: Distributed Algorithms Epidemic Dissemination

Algorithms

Algorithm – Summary

Best effort

Anti-entropy (simple epidemics)
  - Push
  - Pull
  - Push-pull

Rumor mongering (complex epidemics)
  - Push
  - Pull
  - Push-pull

Probabilistic broadcast

Page 21: Distributed Algorithms Epidemic Dissemination

Algorithms Best-effort

Best-effort

Best-effort (direct mail) algorithm

Notify all other nodes of an update as soon as it occurs.

When receiving an update, check if it is “news”

Direct mail protocol executed by process p_i:

upon value_i ← (v, now()) do
    foreach p_j ∈ Π do
        send 〈update, value_i〉 to p_j

upon receive 〈update, (v, t)〉 do
    if value_i.time < t then
        value_i ← (v, t)

Neither a randomized nor an epidemic algorithm: just the simplest one
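
The two handlers of the direct mail protocol can be sketched in Python as follows. This is a minimal illustration, not the author's implementation: the class `Node`, its attribute `peers`, and the use of plain integers as timestamps are assumptions, and message passing is modelled as a direct method call, so message loss and crashes are not represented.

```python
from typing import List, Tuple

Entry = Tuple[object, int]  # (value, timestamp); integer timestamps are assumed


class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.value: Entry = (None, 0)
        self.peers: List["Node"] = []  # plays the role of Π (full knowledge)

    def update(self, v: object, t: int) -> None:
        # upon value_i <- (v, now()): mail the new entry to every other node
        self.value = (v, t)
        for peer in self.peers:
            if peer is not self:
                peer.receive_update(self.value)

    def receive_update(self, entry: Entry) -> None:
        # Accept the entry only if it is "news", i.e. strictly newer.
        if self.value[1] < entry[1]:
            self.value = entry
```

The sender pays n - 1 messages per update, and any lost message leaves a replica stale until some other mechanism repairs it, which is what motivates the epidemic protocols that follow.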

Page 22: Distributed Algorithms Epidemic Dissemination

Algorithms Anti-entropy

Anti-entropy

Anti-entropy: Algorithm

Every node regularly chooses another node at random and exchanges database contents, resolving differences.

Nodes are either
  - susceptible – they do not know the update
  - infective – they know the update

Anti-entropy protocol executed by process p_i:

repeat every ∆ time units
    Process p_j ← random(Π)    % Select a random neighbor
    { exchange messages to resolve differences }

Page 23: Distributed Algorithms Epidemic Dissemination

Algorithms Anti-entropy

Anti-entropy: graphical representation

Rounds

During a round of length ∆, every node
  - has the possibility of contacting one random node
  - can be contacted by several nodes

Page 24: Distributed Algorithms Epidemic Dissemination

Algorithms Anti-entropy

Resolving differences – Summary

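[Figure: message exchanges between p_i and p_j in the three variants. Push: p_i sends 〈PUSH, v, t〉 to p_j. Pull: p_i sends 〈PULL, p_i〉 to p_j, which answers with 〈REPLY, v, t〉. Push-pull: p_i sends 〈PUSHPULL, p_i, v, t〉 to p_j, which answers with 〈REPLY, v, t〉.]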

Page 25: Distributed Algorithms Epidemic Dissemination

Algorithms Anti-entropy

Resolving differences – Push

Anti-entropy, Push protocol executed by process pi:

repeat every ∆ time units
    Process p_j ← random(Π)    % Select a random neighbor
    send 〈push, value_i〉 to p_j

upon receive 〈push, (v, t)〉 do
    if value_i.time < t then
        value_i ← (v, t)

Page 26: Distributed Algorithms Epidemic Dissemination

Algorithms Anti-entropy

Resolving differences – Pull

Anti-entropy, Pull protocol executed by process pi:

repeat every ∆ time units
    Process p_j ← random(Π)    % Select a random neighbor
    send 〈pull, p_i, value_i.time〉 to p_j

upon receive 〈pull, p_j, t〉 do
    if value_i.time > t then
        send 〈reply, value_i〉 to p_j

upon receive 〈reply, (v, t)〉 do
    if value_i.time < t then
        value_i ← (v, t)

Page 27: Distributed Algorithms Epidemic Dissemination

Algorithms Anti-entropy

Resolving differences – Push-Pull

Anti-entropy, Push-Pull protocol executed by process pi:

repeat every ∆ time units
    Process p_j ← random(Π)    % Select a random neighbor
    send 〈pushpull, p_i, value_i〉 to p_j

upon receive 〈pushpull, p_j, (v, t)〉 do
    if value_i.time < t then
        value_i ← (v, t)
    else if value_i.time > t then
        send 〈reply, value_i〉 to p_j

upon receive 〈reply, (v, t)〉 do
    if value_i.time < t then
        value_i ← (v, t)
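
To get a feel for how the push, pull, and push-pull variants behave, the sketch below simulates them in synchronous rounds. It rests on several simplifying assumptions that are not in the slides: rounds are perfectly synchronized, no messages are lost, no node crashes, and information received during a round is only forwarded from the next round on. The function names `anti_entropy_round` and `rounds_until_consistent` are hypothetical.

```python
import random
from typing import List, Tuple

Entry = Tuple[object, int]  # (value, timestamp)


def anti_entropy_round(values: List[Entry], mode: str, rng: random.Random) -> None:
    """One synchronous round: every node contacts one random other node."""
    n = len(values)
    snapshot = list(values)  # state at the start of the round
    for i in range(n):
        j = rng.randrange(n - 1)
        if j >= i:
            j += 1  # uniform choice among the other n - 1 nodes
        if mode in ("push", "pushpull") and values[j][1] < snapshot[i][1]:
            values[j] = snapshot[i]  # p_j adopts the newer entry sent by p_i
        if mode in ("pull", "pushpull") and values[i][1] < snapshot[j][1]:
            values[i] = snapshot[j]  # p_i adopts the newer entry replied by p_j


def rounds_until_consistent(n: int, mode: str, seed: int = 0) -> int:
    """Rounds needed until every replica holds the single injected update."""
    rng = random.Random(seed)
    values: List[Entry] = [(None, 0)] * n
    values[0] = ("update", 1)  # a single node starts out infective
    rounds = 0
    while any(v != values[0] for v in values):
        anti_entropy_round(values, mode, rng)
        rounds += 1
    return rounds
```

Calling `rounds_until_consistent(1000, "push")` and the same with `"pull"` or `"pushpull"` for a few seeds gives an empirical view of the convergence speed of each variant.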

Page 28: Distributed Algorithms Epidemic Dissemination

Algorithms Anti-entropy

Analytical results

Definition (Compartmental model analysis)

We want to evaluate convergence of the protocol based on the size of the populations of susceptible and infective nodes (compartments)

$s_t$: the probability of a node being susceptible after $t$ anti-entropy rounds

$i_t = 1 - s_t$: the probability of a node being infective after $t$ anti-entropy rounds

Probability at round $t + 1$

Initial condition: $s_0 = \frac{n-1}{n}$

Pull: $E[s_{t+1}] = s_t^2$

Push: $E[s_{t+1}] = s_t \left(1 - \frac{1}{n}\right)^{(1 - s_t) n} \approx s_t e^{-(1 - s_t)}$
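
The pull and push recurrences are easy to iterate numerically. The sketch below does so, treating the expected value as if it were the actual fraction at every round (a deliberate simplification); the function name `susceptible_fraction` is hypothetical.

```python
from typing import List


def susceptible_fraction(n: int, rounds: int, mode: str) -> List[float]:
    """Iterate the recurrence for s_t, starting from s_0 = (n - 1) / n."""
    s = (n - 1) / n
    history = [s]
    for _ in range(rounds):
        if mode == "pull":
            s = s * s  # E[s_{t+1}] = s_t^2
        elif mode == "push":
            s = s * (1 - 1 / n) ** ((1 - s) * n)  # E[s_{t+1}] = s_t (1 - 1/n)^{(1 - s_t) n}
        else:
            raise ValueError(f"unknown mode: {mode}")
        history.append(s)
    return history
```

Comparing `susceptible_fraction(10_000, 20, "pull")` with the `"push"` series shows the difference in the final phase: once most nodes are infective, pull squares $s_t$ at every round, while push only shrinks it by a factor close to $1/e$.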

Page 29: Distributed Algorithms Epidemic Dissemination

Algorithms Anti-entropy

Analytical results

Definition (Termination time)

Let $S_n$ be the first round in which $s_t = 0$ in a population of $n$ individuals

Push: Pittel (1987) shows that, in probability, $S_n = \log_2 n + \ln n + O(1)$

Push-pull: Karp et al. (2000) show that, in probability, $S_n = \log \log n$

Relevant bibliography:

B. Pittel. On spreading a rumor. SIAM Journal on Applied Mathematics, 47(1):213–223, 1987.

R. Karp, C. Schindelhauer, S. Shenker, B. Vöcking. Randomized rumor spreading. In Proc. of the 41st Symp. on Foundations of Computer Science, 2000.

Page 30: Distributed Algorithms Epidemic Dissemination

Algorithms Anti-entropy

Analytical results

In summary: all methods converge to 0, but pull and push-pull are much more rapid

[Figure: $s_t$ on a logarithmic axis (1 down to 1e-50) against rounds (0 to 20), with one curve for Pull and one for Push; n = 10 000]

Page 31: Distributed Algorithms Epidemic Dissemination

Algorithms Anti-entropy

Comments

Benefits

Simple epidemic eventually “infects” all the population

It is extremely robust

Drawbacks

Propagates updates much more slowly than direct mail (best effort)

Requires examining the contents of the database even when most data agrees, so it cannot practically be used too often

Normally used as support to best effort/rumor mongering, i.e. left running in the background

Page 32: Distributed Algorithms Epidemic Dissemination

Algorithms Anti-entropy

Working with multiple values

Examples of techniques to compare databases:

Maintain a checksum; compare databases only if the checksums are unequal

Maintain recent update lists for time T; exchange the lists first

Maintain an inverted index of the database by timestamp; exchange information in reverse timestamp order, incrementally re-computing checksums
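
The first technique, comparing checksums before comparing databases, can be sketched as follows. This is an illustration under assumptions: the helper names `checksum` and `resolve`, the use of SHA-256, and the in-memory dictionaries standing in for two replicas are all hypothetical choices. In a real exchange only the digest would cross the network in the common case where the replicas already agree.

```python
import hashlib
from typing import Dict, Tuple

Database = Dict[str, Tuple[str, int]]  # key -> (value, timestamp)


def checksum(db: Database) -> str:
    """Digest of the whole database, independent of insertion order."""
    h = hashlib.sha256()
    for key in sorted(db):
        value, ts = db[key]
        h.update(repr((key, value, ts)).encode("utf-8"))
    return h.hexdigest()


def resolve(a: Database, b: Database) -> bool:
    """Make a and b agree; return True if a full comparison was needed."""
    if checksum(a) == checksum(b):
        return False  # cheap path: nothing to exchange
    for key in set(a) | set(b):
        candidates = [e for e in (a.get(key), b.get(key)) if e is not None]
        newest = max(candidates, key=lambda e: e[1])  # newest timestamp wins
        a[key] = b[key] = newest
    return True
```

Recomputing the digest from scratch on every exchange is itself linear in the database size; maintaining it incrementally, as the third technique suggests, avoids that cost.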

Notes:

Those ideas apply to databases

Strongly application-dependent

We will see how anti-entropy may be used beyond information dissemination

Page 33: Distributed Algorithms Epidemic Dissemination

Algorithms Rumor-mongering

Rumor mongering in brief

Nodes are initially “susceptible”

When a node receives a new update, the update becomes a “hot rumor” and the node becomes “infective”

A node that holds a rumor periodically chooses another node at random and sends it the rumor

Eventually, a node will “lose interest” in spreading the rumor and become “removed”
  - It has spread the rumor too many times
  - Everybody knows the rumor

A sender can hold (and transmit) a list of infective updates rather than just one

Page 34: Distributed Algorithms Epidemic Dissemination

Algorithms Rumor-mongering

Rumor mongering: loss of interest

When: Counter vs coin (random)
  - Coin (random): lose interest with probability 1/k
  - Counter: lose interest after k contacts

Why: Feedback vs blind
  - Feedback: lose interest only if the recipient knows the rumor.
  - Blind: lose interest regardless of the recipient.
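
The four combinations (counter or coin, feedback or blind) can be explored with a small simulation of push rumor mongering. The sketch below is only one possible reading of the rules, under assumptions that are not in the slides: rounds are synchronous, nodes infected during a round start pushing in the next round, a single rumor is spread, and `n` is at least 2. The function name `rumor_mongering` and its parameters are hypothetical. It returns the fraction of nodes left susceptible and the number of messages sent per node.

```python
import random
from typing import Tuple


def rumor_mongering(n: int, k: int, feedback: bool, counter: bool,
                    seed: int = 0) -> Tuple[float, float]:
    """Push rumor mongering; returns (fraction still susceptible, messages per node)."""
    rng = random.Random(seed)
    SUSCEPTIBLE, INFECTIVE, REMOVED = 0, 1, 2
    state = [SUSCEPTIBLE] * n
    state[0] = INFECTIVE
    strikes = [0] * n  # contacts that count towards losing interest
    sent = 0
    while INFECTIVE in state:
        for i in [p for p in range(n) if state[p] == INFECTIVE]:
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1  # a random node other than i
            sent += 1
            already_knew = state[j] != SUSCEPTIBLE
            if not already_knew:
                state[j] = INFECTIVE
            if feedback and not already_knew:
                continue  # feedback: only unnecessary contacts count
            if counter:
                strikes[i] += 1
                if strikes[i] >= k:
                    state[i] = REMOVED  # counter: give up after k counted contacts
            elif rng.random() < 1 / k:
                state[i] = REMOVED  # coin: give up with probability 1/k
    return state.count(SUSCEPTIBLE) / n, sent / n
```

Because the timing model here is an assumption, the numbers produced need not match published simulation results; the sketch is meant for comparing the variants against each other as `k` grows.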

Page 35: Distributed Algorithms Epidemic Dissemination

Algorithms Rumor-mongering

Analytical results

Question

How fast does the system converge to a state where no node is infective? (inactive state)

Compartmental analysis again – Feedback, coin

Let $s$, $i$ and $r$ denote the fraction of susceptible, infective, and removed nodes, respectively. Then:

$s + i + r = 1$

$\frac{ds}{dt} = -si$

$\frac{di}{dt} = +si - \frac{1}{k}(1 - s)i$

Solving the differential equations, at the end of the epidemic ($i = 0$):

$s = e^{-(k+1)(1-s)}$

Increasing $k$ increases the probability that the nodes get the rumor
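
The implicit equation $s = e^{-(k+1)(1-s)}$ has no closed-form solution, but it can be solved numerically. The sketch below uses plain fixed-point iteration, which is an assumed choice of method, and the function name `residue` is hypothetical. The iteration starts from 0 because $s = 1$ is also a solution of the equation (the trivial case in which the rumor never spreads).

```python
import math


def residue(k: int, iterations: int = 200) -> float:
    """Smallest solution of s = exp(-(k + 1) * (1 - s)), by fixed-point iteration."""
    s = 0.0  # starting from 0 avoids the trivial solution s = 1
    for _ in range(iterations):
        s = math.exp(-(k + 1) * (1 - s))
    return s
```

Evaluating `residue(k)` for increasing `k` shows the fraction of nodes that never hear the rumor shrinking quickly, which is the quantitative content of the last remark above.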

Page 36: Distributed Algorithms Epidemic Dissemination

Algorithms Rumor-mongering

Quality measures

Definition (Residue)

The fraction of nodes that remain susceptible when the epidemic ends: the value of $s$ when $i = 0$.

The residue must be as small as possible

Definition (Traffic)

The average number of database updates sent between nodes

$m = \frac{\text{total update traffic}}{\text{number of nodes}}$

Definition (Convergence)

$t_{avg}$: average, over all nodes, of the time it takes for a node to get the update

$t_{last}$: time it takes for the last node to get the update

Page 37: Distributed Algorithms Epidemic Dissemination

Algorithms Rumor-mongering

Simulation results

Using feedback and counter

Counter k   Residue s   Traffic m   Convergence t_avg   Convergence t_last
1           0.176       1.74        11.0                16.8
2           0.037       3.30        12.1                16.9
3           0.011       4.53        12.5                17.4
4           0.0036      5.64        12.7                17.5
5           0.0012      6.68        12.8                17.7

Using blind and random

Counter k   Residue s   Traffic m   Convergence t_avg   Convergence t_last
1           0.960       0.04        19                  38
2           0.205       1.59        17                  33
3           0.060       2.82        15                  32
4           0.021       3.91        14.1                32
5           0.008       4.95        13.8                32

Page 38: Distributed Algorithms Epidemic Dissemination

Algorithms Rumor-mongering

Push vs pull

Push (what we have assumed so far)
  - If the database becomes quiescent, push stops trying to introduce updates

Pull
  - If there are many independent updates, pull is more likely to find a source with a non-empty rumor list
  - But if the database is quiescent, several update requests are wasted

Page 39: Distributed Algorithms Epidemic Dissemination

Algorithms Rumor-mongering

Rumor Mongering + Anti-Entropy

Quick & Dirty Push Rumor Mongering
  - spreads updates fast with low traffic
  - however, there is still a nonzero probability of nodes remaining susceptible after the epidemic

Not-so-fast Push-Pull Anti-Entropy
  - can be run (infrequently) in the background to ensure all nodes eventually get the update with probability 1

Page 40: Distributed Algorithms Epidemic Dissemination

Algorithms Deletion and death certificates

Dealing with deletions

Deletion

We cannot delete an entry just by removing it from a node: the absence of the entry is not propagated

If the entry has been updated recently, there may still be an update traversing the network!

Definition (Death certificate)

Replace the deleted item with a death certificate that has a timestamp and spreads like an ordinary update.
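
A death certificate can be modelled as an ordinary timestamped entry whose value is a special marker. The sketch below illustrates the idea under assumptions: the sentinel `TOMBSTONE`, the helpers `merge`, `delete`, and `read`, and the integer timestamps are hypothetical names and choices. The point it demonstrates is that an older update arriving after the deletion is rejected by the usual "newest timestamp wins" rule, so the deleted entry is not resurrected.

```python
from typing import Dict, Optional, Tuple

TOMBSTONE = object()  # hypothetical marker standing in for a death certificate

Entry = Tuple[object, int]  # (value or TOMBSTONE, timestamp)


def merge(db: Dict[str, Entry], key: str, incoming: Entry) -> None:
    """Apply an incoming update or death certificate: the newest timestamp wins."""
    current = db.get(key)
    if current is None or current[1] < incoming[1]:
        db[key] = incoming


def delete(db: Dict[str, Entry], key: str, now: int) -> None:
    # Do not remove the entry: overwrite it with a timestamped death certificate.
    db[key] = (TOMBSTONE, now)


def read(db: Dict[str, Entry], key: str) -> Optional[object]:
    """Return the live value, or None if the key is absent or deleted."""
    entry = db.get(key)
    if entry is None or entry[0] is TOMBSTONE:
        return None
    return entry[0]
```

Since the certificate is stored like any other entry, it is disseminated by the same `merge` path that handles ordinary updates.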

Page 41: Distributed Algorithms Epidemic Dissemination

Algorithms Deletion and death certificates

Dealing with deletions

Problem

We must, at some point, delete DCs or they may consume significant space

Strategy 1: retain each DC until all nodes have received it
  - requires a protocol to determine which nodes have it and to handle node failures

Strategy 2: hold DCs for some time (e.g. 30 days) and discard them
  - pragmatic approach
  - we still have the “resurrection” problem
  - increasing the time requires more space

Page 42: Distributed Algorithms Epidemic Dissemination

Algorithms Deletion and death certificates

Dormant certificates

Observation:

we can delete very old DCs, retaining only a few “dormant” copies in some nodes

if an obsolete update reaches a dormant DC, the DC is “awakened” and re-propagated

Analogy with epidemiology:

the awakened DC is like an antibody triggered by an immune reaction

Page 43: Distributed Algorithms Epidemic Dissemination

Algorithms Probabilistic broadcast

Epidemics vs probabilistic broadcast

Anti-entropy: the DB is the collection of messages received

Rumor mongering: updates ≡ messages broadcast

Page 44: Distributed Algorithms Epidemic Dissemination

Bibliography

Reading material

A. J. Demers, D. H. Greene, C. Hauser, W. Irish, J. Larson, S. Shenker, H. E. Sturgis, D. C. Swinehart, and D. B. Terry. Epidemic algorithms for replicated database maintenance. In Proc. of the 6th Annual ACM Symposium on Principles of Distributed Computing (PODC’87), pages 1–12, 1987. http://www.disi.unitn.it/~montreso/ds/papers/demers87.pdf

M. Jelasity. Gossip. In Self-organising Software, pages 139–162. Springer Berlin Heidelberg, 2011. http://www.disi.unitn.it/~montreso/ds/papers/gossip11.pdf

Page 45: Distributed Algorithms Epidemic Dissemination

What’s next

What’s next?

Problems

Membership: how do processes get to know each other, and how many do they need to know?

Network awareness: how to make the connections between processes reflect the actual network topology such that the performance is acceptable?

Buffer management: which updates to drop when the storage buffer of a process is full?

Does not end here...

The epidemic/gossip approach has been successfully applied to other problems. We will provide a brief overview of them.