Probabilistic Broadcast
Presented by Keren Censor
1
Traditional client-server model:
Central point of failure
Performance bottleneck
Heavy load on servers
2
Peer-to-Peer (P2P):
No central point of failure
No single performance bottleneck
Load is spread across peers
3
Information Dissemination – Deterministic solutions
Flooding – send a message to every neighbor
#Messages = O(#edges), Time = diameter
Deterministic routing – send according to a spanning tree
Non-resilient to failures, Time = O(#nodes)
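Flooding, mentioned above, can be sketched in a few lines of Python; the graph and node names are illustrative, not part of the slides:

```python
from collections import deque

def flood(graph, source):
    """Flooding: every node forwards the message to each of its
    neighbors the first time it receives it.
    Returns the set of informed nodes and the total message count."""
    informed = {source}
    messages = 0
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            messages += 1              # one message per edge traversal
            if neighbor not in informed:
                informed.add(neighbor)
                queue.append(neighbor)
    return informed, messages

# Tiny triangle network: each edge is used once in each direction,
# so #messages = 2 * #edges.
g = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
reached, msgs = flood(g, "a")
```

Every node sends on all of its edges, which is where the O(#edges) message count comes from.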
4
Requirements: Reliable broadcast
Reach all nodes
Resilient to failures
Considering: dynamic network topology, crashes, disconnections, packet losses
5
Random information spreading
Trade reliability for scalability: the algorithm may be less reliable, but should scale well with system size
Basic gossip algorithm: forward information to a randomly chosen subset of your neighbors
Design parameters:
Buffer capacity B
Fan-out F
Number of times a message is forwarded T
6
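The basic gossip loop just described can be sketched as follows; this is a minimal illustration under the slide's parameters (fan-out F, forward count T), and the dictionary-based message representation is an assumption:

```python
import random

def gossip_round(buffer, neighbors, fanout, ttl):
    """One round of the basic gossip scheme: forward each buffered
    message to `fanout` randomly chosen neighbors, at most `ttl`
    (the slide's T) times in total."""
    sends = []                          # (message id, target) pairs
    for msg in buffer:
        if msg["forwarded"] >= ttl:
            continue                    # already forwarded T times
        for target in random.sample(neighbors, min(fanout, len(neighbors))):
            sends.append((msg["id"], target))
        msg["forwarded"] += 1
    return sends

# F = 2, T = 1: "e1" is forwarded to two random neighbors, "e2" is spent.
buf = [{"id": "e1", "forwarded": 0}, {"id": "e2", "forwarded": 1}]
out = gossip_round(buf, ["p2", "p3", "p4"], fanout=2, ttl=1)
```

The buffer capacity B would bound `len(buffer)`; truncation is covered later in the deck.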
Previous algorithms
First developed for consistency management of replicated databases [Demers et al. 1987]
Reliability in Bimodal Multicast [Birman et al. 1999]:
The set of nodes that a message reaches is:
Almost all of the nodes, with high probability
Almost none of the nodes, with small probability
Other subsets, with vanishingly small probability
7
Design constraints
Membership – knowledge of the participants
Network awareness – knowledge of the real network topology
Buffer management – memory usage
Message filtering – according to different interests
8
Design constraints: Membership
Knowledge of the participants in the system
Previous algorithms assume this knowledge
Problems:
Storage increases linearly with the system size n
Maintenance imposes extra load
9
Design constraints: Membership
Solution: integrate membership with gossip and maintain a partial view
Uniformity: how to gossip to members chosen uniformly at random from the entire system?
Adaptivity: some parameter must grow with the system size; how do we estimate the system size?
Bootstrapping: how is the system initialized?
10
Design constraints: Network awareness
Knowledge of the real network topology
Problem: a message sent by p to a nearby q may be routed through a remote w
11
Design constraints: Network awareness
Solution: organize processes in a hierarchy that reflects the network topology
Distributed? Fault tolerant? Scalable?
12
Related approaches
Directional gossip [Lin and Marzullo, 1999]: each neighbor is assigned a weight according to its connectivity, and neighbors with lower weight are gossiped to with higher probability
(Figure: Pr[reaching the blue node] = 1/(n−1) in each round, while Pr[reaching the green node] grows with each round)
13
Design constraints: Buffer management
Memory usage
Problem: limited buffers. When a buffer is full:
Drop new messages? Drop old messages?
In Bimodal Multicast, a message is gossiped by a node for a limited number of rounds and then erased
14
Design constraints: Buffer management
Solutions: age-based priorities, application semantics
(Elaborated on later)
15
Design constraints: Message filtering
According to different interests
Problem: redundancy if there are topics of interest
How does a process know the interests of its neighbors?
Even if this information were magically available, should p decide not to send q a message that q is not interested in?
Solution: hierarchy of processes
(Figure: p's message to q passes through w. What if w is interested?)
16
LPBCAST Lightweight Probabilistic Broadcast
[Eugster, Guerraoui, Handurukande, Kouznetsov, and Kermarrec, 2003]
Main contribution: scalable memory consumption for
Membership management
Message buffering
17
Model
Set of processes Π = {p1, p2, …}
Synchronous rounds
Complete logical network
LPBCAST has partial views
(Figure: each process runs the Application layer on top of the LPBCAST layer)
18
Buffers:
Event notifications – events
Event notification identifiers – eventIds
Unsubscriptions – unSubs
Subscriptions – subs and view
Each buffer L has a maximum size |L|max
Truncation of L: removing random elements so that |L| ≤ |L|max
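Truncation as defined here is easy to state in code; this sketch assumes buffers are plain Python lists:

```python
import random

def truncate(buf, max_size):
    """Truncation of a buffer L: remove uniformly random elements
    until |L| <= |L|max."""
    while len(buf) > max_size:
        buf.pop(random.randrange(len(buf)))
    return buf

# A buffer of 10 elements truncated to |L|max = 4.
buf = truncate(list(range(10)), 4)
```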
19
Receiving an event from the application
Upon LPBCAST(e): add e to events
(Figure: the application hands e to LPBCAST, which appends it to the events buffer)
20
Gossiping
Periodically (every gossip period of T ms), generate a message and send it to F (fanout) members chosen randomly from view
(Figure: the gossip message contains events, eventIds, unSubs, and subs ∪ {pi}, and is sent to F random elements of view; after gossiping, events := Ø)
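A sketch of this gossip step, assuming the message layout on this slide (events, eventIds, unSubs, subs ∪ {pi}); the function and field names are illustrative:

```python
import random

def make_gossip(pid, view, events, event_ids, unsubs, subs, fanout):
    """Build one gossip message and pick its F destinations.
    The sender adds its own id to the forwarded subscriptions
    (subs ∪ {pi}) and empties its events buffer afterwards."""
    message = {
        "events": list(events),
        "eventIds": list(event_ids),
        "unSubs": list(unsubs),
        "subs": list(subs) + [pid],     # subs ∪ {p_i}
    }
    targets = random.sample(view, min(fanout, len(view)))
    events.clear()                      # events := Ø after gossiping
    return message, targets

events = [{"id": "e1"}]
msg, targets = make_gossip("p1", ["p2", "p3", "p4", "p5"],
                           events, ["e1"], [], ["p9"], fanout=3)
```

Including the sender's own id in `subs` is what keeps subscriptions circulating continuously, as slide 26 discusses.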
21
Gossip reception
Unsubscriptions:
Remove the unsubscribing processes from view and subs
Add the unsubscriptions to unSubs and truncate it (removing random elements)
22
Gossip reception
Subscriptions:
Add the new subscriptions to view and subs
Truncate view into subs (random elements removed from view are moved to subs)
Truncate subs (removing random elements)
23
Gossip reception
Events:
Deliver new event notifications to the application
Add them to events and their ids to eventIds, then truncate both
If a received id is not in eventIds, add it to retrieveBuf
(eventIds keeps the ids of all delivered events)
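The event-handling rules can be sketched as below; the buffer representations and the `deliver` callback are assumptions, and truncation is omitted for brevity:

```python
def receive_events(msg, events, event_ids, retrieve_buf, deliver):
    """Handle the events part of a received gossip message:
    deliver unseen notifications, record their ids, and note ids
    whose event bodies are still missing in retrieveBuf."""
    for event in msg["events"]:
        if event["id"] not in event_ids:
            deliver(event)              # new notification: hand to application
            events.append(event)
            event_ids.append(event["id"])
    for eid in msg["eventIds"]:
        if eid not in event_ids and eid not in retrieve_buf:
            retrieve_buf.append(eid)    # body unseen: schedule retrieval

# "e1" is new and delivered; "e0" was seen before; id "e2" has no body yet.
delivered = []
events, event_ids, retrieve_buf = [], ["e0"], []
msg = {"events": [{"id": "e1"}, {"id": "e0"}], "eventIds": ["e0", "e1", "e2"]}
receive_events(msg, events, event_ids, retrieve_buf, delivered.append)
```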
24
Retrieving events
If more than k rounds have passed since an eventId was inserted into retrieveBuf and the matching event has not yet been received:
Ask for the event from the process q from whom the eventId was received
If there is no reply within r rounds: ask a random neighbor for the event
Finally, ask the source of the event
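The retrieval schedule might look like this in code; the exact round arithmetic (k rounds of waiting, r rounds for the gossiper, then the fallbacks) is an interpretation of the slide, not a verbatim procedure:

```python
def retrieval_action(entry, current_round, k, r):
    """Decide how to fetch a missing event whose id sits in
    retrieveBuf: wait k rounds, then ask the gossiper for r rounds,
    then a random neighbor, then the event's source."""
    age = current_round - entry["inserted_round"]
    if age <= k:
        return "wait"                       # give gossip k rounds first
    if age <= k + r:
        return ("ask", entry["sender"])     # ask whoever gossiped the id
    if age == k + r + 1:
        return ("ask", "random-neighbor")   # no reply after r rounds
    return ("ask", entry["source"])         # last resort: the source

entry = {"inserted_round": 0, "sender": "q", "source": "s"}
```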
25
Subscriptions and unsubscriptions
Subscribe: pi subscribes through some known pj, which gossips this subscription. Gossip messages will then start reaching pi; otherwise it subscribes again
Unsubscribe: unsubscriptions carry timestamps after which they become obsolete
Subscriptions are gossiped continuously to ensure uniformly distributed views: a failed process will be removed from all views with high probability
26
Analysis – Assumptions
n processes; Π is constant
Latency is smaller than the gossip period T
Failures are stochastically independent:
Probability of a message being lost ≤ ε
Number of crashes ≤ f, so Pr[a given process crashes] ≤ f/n
Event notification identifiers are unique
27
Analysis – Distribution of views
Assume each process has an independent, uniformly distributed random view of size l
In round r: Pr[p ∈ view] = l/(n−1)
In round r+1:
Pr[p ∈ view] = (l/(n−1)) · (1 − l/(|subs|max·F))   [p in view and not removed]
             + (1 − l/(n−1)) · l/(|subs|max·F)     [p not in view and enters view]
For l ≪ |subs|max·F, this is estimated by l/(n−1)
28
Analysis – Event propagation
pr = Pr[p receives a given gossip message]
   ≥ (l/(n−1)) · (F/l) · (1−ε) · (1−f/n)   [p in view · p is chosen · message not lost · p doesn’t crash]
   = (F/(n−1)) · (1−ε) · (1−f/n), which does not depend on l
sr,e = #processes that received event e by round r
Markov chain: pij = Pr[sr+1 = j | sr = i] =
   C(n−i, j−i) · (1 − q^i)^(j−i) · (q^i)^(n−j)   if j ≥ i
   0                                             if j < i
where q = 1 − pr
29
Analysis – Event propagation
Markov chain: pij = Pr[sr+1 = j | sr = i] =
   C(n−i, j−i) · (1 − q^i)^(j−i) · (q^i)^(n−j)   if j ≥ i
   0                                             if j < i
where q = 1 − pr
Distribution of sr:
Pr[s0 = j] = 1 if j = 1, 0 otherwise
Pr[sr+1 = j] = Σi Pr[sr = i] · pij
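The transition matrix and the recurrence above can be evaluated numerically; this sketch computes the distribution of sr for a small system (the parameter values are illustrative):

```python
from math import comb

def transition(i, j, n, q):
    """p_ij = Pr[s_{r+1} = j | s_r = i] for the propagation chain:
    each of the n-i uninformed processes independently misses all
    i gossipers with probability q^i."""
    if j < i:
        return 0.0
    stay = q ** i
    return comb(n - i, j - i) * (1 - stay) ** (j - i) * stay ** (n - j)

def round_distribution(n, q, rounds):
    """Distribution of s_r, starting from a single informed source."""
    dist = [0.0] * (n + 1)
    dist[1] = 1.0                       # Pr[s_0 = 1] = 1
    for _ in range(rounds):
        nxt = [0.0] * (n + 1)
        for i, pi in enumerate(dist):
            if pi == 0.0:
                continue
            for j in range(i, n + 1):   # s_r never decreases
                nxt[j] += pi * transition(i, j, n, q)
        dist = nxt
    return dist

dist = round_distribution(n=20, q=0.8, rounds=5)
```

Since each row of the transition matrix sums to 1, `dist` remains a probability distribution after every round, and most of the mass quickly shifts toward large sr (the bimodal behavior cited earlier).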
30
Analysis – Gossip rounds
The number of rounds decreases as the fanout F increases
Claim: the number of rounds increases logarithmically with the system size n
Compare: the diameter of a random graph is O(log n)
The view size l does not influence the number of rounds
But we needed to assume l ≪ |subs|max·F; so does the subs buffer pay the price?
31
Analysis – Partitioning
Pr[partition of size i] = P(i, n, l) = C(n, i) · (C(i−1, l) / C(n−1, l))^i · (C(n−i−1, l) / C(n−1, l))^(n−i)
In one set, each of the i views includes only the other i−1 processes; in the other set, each of the n−i views includes only the other n−i−1
For constant n: decreases as l increases
For constant l: decreases as n increases
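P(i, n, l) can be computed directly with binomial coefficients; this sketch also evaluates the (1 − Σ P)^r form of the no-partition probability used on the following slide (parameter values are illustrative):

```python
from math import comb

def partition_prob(i, n, l):
    """P(i, n, l): probability that some i-process set and its
    complement each hold views pointing only inside themselves."""
    inside = comb(i - 1, l) / comb(n - 1, l)       # one view stays in the i-set
    outside = comb(n - i - 1, l) / comb(n - 1, l)  # one view stays in the rest
    return comb(n, i) * inside ** i * outside ** (n - i)

def no_partition_prob(n, l, r):
    """Chance of no partition during r rounds."""
    total = sum(partition_prob(i, n, l) for i in range(l + 1, n // 2 + 1))
    return (1.0 - total) ** r

p = partition_prob(4, 10, 3)       # smallest possible partition for l = 3
phi = no_partition_prob(10, 3, 5)
```

Note that a set of at most l processes cannot be closed under views of size l, so the sum starts at i = l + 1; Python's `comb` returns 0 for C(i−1, l) when i−1 < l, which makes P vanish there automatically.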
32
Analysis – Partitioning
Pr[no partition up to round r] = (1 − Σ i=l+1..n/2 P(i, n, l))^r
This decreases very slowly with r
Design: l can be chosen as a function of some required probability of not partitioning
In practice, add a hierarchy – a set of processes that are always known to everyone
33
Age-based message purging
Replaces the random truncation of the events buffer
Each event e has an age parameter:
Initialized to 0 by the application
Incremented by every gossiping process
While |events| > |events|max:
First remove the smallest-id events among events from the same source
Then remove the oldest events, according to their age
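The purging loop can be sketched as follows; the priority order (duplicate sources first, then oldest age) follows the slide, while the data layout is an assumption:

```python
def purge_events(events, max_size):
    """Age-based purging: while the buffer is too large, first drop
    the smallest-id event of a source that appears more than once,
    otherwise drop the oldest event by age."""
    while len(events) > max_size:
        by_source = {}
        for event in events:
            by_source.setdefault(event["source"], []).append(event)
        duplicated = [group for group in by_source.values() if len(group) > 1]
        if duplicated:
            victim = min(duplicated[0], key=lambda e: e["id"])
        else:
            victim = max(events, key=lambda e: e["age"])
        events.remove(victim)
    return events

# Source "a" appears twice, so its smallest-id event is purged first.
events = [{"source": "a", "id": 1, "age": 5},
          {"source": "a", "id": 2, "age": 0},
          {"source": "b", "id": 3, "age": 9}]
purge_events(events, max_size=2)
```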
34
Age-based message purging
Evaluation:
Delivery ratio: ratio between the number of messages delivered and the number of messages sent per round
Redundancy: fraction of identical messages received by a given process in a given round
Throughput: as a function of stability (a message is stable once delivered by 90% of the processes)
Fault tolerance: delivery ratio in the presence of faults
35
Frequency-based membership purging
With random truncation, a new member has the same probability of being removed as a well-known member
Each element in subs has a frequency parameter, incremented each time the element is received
Truncating: avg = average frequency in the list
1. Choose a random element from the list
2. If its frequency > k·avg, remove this element
3. Otherwise, increment its frequency and go to 1
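The truncation loop in steps 1-3 can be written directly; the list layout and the value of k are illustrative:

```python
import random

def truncate_subs(subs, max_size, k):
    """Frequency-based truncation: pick a random element; evict it
    only if its frequency exceeds k times the average, otherwise
    increment its frequency and try again."""
    while len(subs) > max_size:
        avg = sum(s["freq"] for s in subs) / len(subs)   # avg frequency
        candidate = random.choice(subs)                  # step 1
        if candidate["freq"] > k * avg:                  # step 2
            subs.remove(candidate)
        else:
            candidate["freq"] += 1                       # step 3, goto 1
    return subs

random.seed(7)  # deterministic run for the example
# A well-known member (freq 100) is far more likely to be evicted
# than the four newcomers (freq 1).
subs = [{"id": i, "freq": 1} for i in range(4)] + [{"id": 99, "freq": 100}]
truncate_subs(subs, max_size=4, k=2)
```

Bumping the frequency of survivors on each failed attempt is what protects newcomers: an element must be seen (or retried) many times before it becomes eligible for eviction.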
36
Frequency-based membership purging
Evaluation:
Propagation delay: number of informed processes as a function of the round number
Membership management: number of times membership information about a process is seen by others, measured on processes removed from the subs buffer
37
References
Epidemic Algorithms for Replicated Database Maintenance [Demers et al. 1987]
Bimodal Multicast [Birman et al. 1999]
Directional Gossip: Gossip in a Wide Area Network [Lin and Marzullo 1999]
Lightweight Probabilistic Broadcast [Eugster et al. 2003]
38
Thank you :)
39