2007/1/15 std/lpbcast.ppt: Lightweight Probabilistic Broadcast. M2 Tatsuya Shirai, M1 Dai Saito

2007/1/15 http://www.logos.ic.i.u-tokyo.ac.jp/~std/lpbcast.ppt

Lightweight Probabilistic Broadcast

M2 Tatsuya Shirai

M1 Dai Saito


Broadcast in Large Scale Environment

• End users send messages to all other users more frequently.
  – P2P BBS
  – Stock markets
• These applications need software broadcast.
• Participating processes change more dynamically than processes on servers:
  – machine crashes
  – logins to and logouts from applications


Deterministic Broadcast

• Each process transfers messages along defined routes.
• This approach provides consistent message delivery ordering.
  – Messages from each process arrive in the order in which it sent them.
• Reliability is expressed as “best effort”.


Deterministic Broadcast cont.

• Poor scalability
  – Single point of failure
  – Cost of maintaining routing information
• Low reliability on unstable networks
  – Perturbation of a few processes lowers the performance of healthy processes.

[Figure; x-axis: rate of perturbed processes]


Probabilistic Broadcast

• Each process transfers messages to randomly selected processes without using defined routing information.

• Approximate redundancy enhances reliability.
• Reliability is relatively high and stable in large-scale and unstable environments.
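A minimal sketch of the idea above, assuming a loss-free network and a pure push model in which every process that already holds the message forwards it to `fanout` randomly chosen peers per round. The function name `gossip_rounds` and its parameters are hypothetical and do not come from the slides.

```python
import random

def gossip_rounds(n, fanout, seed=0, max_rounds=1000):
    """Count rounds until a message starting at process 0 reaches all n processes."""
    rng = random.Random(seed)
    infected = {0}
    rounds = 0
    while len(infected) < n and rounds < max_rounds:
        newly = set()
        for p in sorted(infected):
            # Each infected process picks up to `fanout` distinct random peers (not itself).
            peers = [q for q in range(n) if q != p]
            newly.update(rng.sample(peers, min(fanout, len(peers))))
        infected |= newly
        rounds += 1
    return rounds
```

No routing table is consulted anywhere: the only input a process needs is the list of peers it can pick from, which is what makes the approach insensitive to individual failures.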


Pbcast [Birman et al. 1999]

• This approach combines deterministic and probabilistic broadcast.
  – While network load is low, deterministic broadcast achieves high reliability at low cost.
  – While network load is high, probabilistic broadcast ensures a certain level of reliability, especially for healthy processes.


Deterministic Broadcast

• The first protocol is deterministic broadcast.
• It uses IP multicast or, if that is not available, randomly composed spanning trees.
  – But composing spanning trees needs full membership information, so this approach is limited to a few hundred processes, as mentioned in the paper.


Anti-Entropy Protocol

• The second protocol is an anti-entropy protocol based on gossip.
  – In each round, each member chooses some other members at random and sends them a digest (a summary of its message history).
  – A process that receives a digest checks which messages it lacks and requests them from the original sender.

[Figure: two processes, each holding a message history and membership info, exchange digests. One finds that it lacks messages 5 and 8, the other that it lacks 3 and 9; each requests the missing IDs and receives the corresponding messages.]
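The exchange in the figure can be sketched as follows. This is an illustration only: the function names and the choice of a dict (message ID to payload) as the message history are assumptions, and both directions of the exchange are collapsed into one call for brevity.

```python
def missing_ids(own_history, digest):
    """IDs listed in a peer's digest that we have not received yet."""
    return sorted(set(digest) - set(own_history))

def anti_entropy_exchange(a, b):
    """One pairwise exchange; a and b are dicts mapping message ID -> payload."""
    # Each side sends a digest (IDs only, no payloads).
    digest_a, digest_b = set(a), set(b)
    # Each side requests what it lacks, and the peer replies with the payloads.
    need_a = missing_ids(a, digest_b)
    need_b = missing_ids(b, digest_a)
    a.update({i: b[i] for i in need_a})
    b.update({i: a[i] for i in need_b})
    return need_a, need_b
```

With `a = {1: "m1", 3: "m3", 9: "m9"}` and `b = {1: "m1", 5: "m5", 8: "m8"}`, the first process requests 5 and 8 and the second requests 3 and 9, as in the figure. Only the digests travel unconditionally; payloads are sent on request.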


Anti-Entropy Protocol cont.

• Message size and fanout (the number of processes to which a process sends in one round) determine the network load of this protocol.

• Message size is limited by the lifetime of messages on each process.
  – A process gossips about a message for a fixed number of rounds after its initial reception.
  – After that, the message is given up.

[Figure: message history buffers holding message IDs 1 to 9 across gossip rounds]
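A small sketch of the lifetime rule, assuming rounds are counted locally and a message is dropped once `lifetime_rounds` rounds have passed since its first reception. The class name `GossipBuffer` and its methods are hypothetical.

```python
class GossipBuffer:
    """Message history whose entries are gossiped for a fixed number of rounds."""

    def __init__(self, lifetime_rounds):
        self.lifetime_rounds = lifetime_rounds
        self.received_at = {}   # message ID -> round of first reception
        self.round = 0

    def receive(self, msg_id):
        # Only the first reception starts the lifetime clock.
        self.received_at.setdefault(msg_id, self.round)

    def end_round(self):
        """Advance one round and drop messages whose lifetime has expired."""
        self.round += 1
        self.received_at = {
            m: r for m, r in self.received_at.items()
            if self.round - r < self.lifetime_rounds
        }

    def digest(self):
        """IDs currently advertised to gossip partners."""
        return sorted(self.received_at)
```

Because every entry leaves the buffer after a fixed number of rounds, the digest size depends on the message rate and the lifetime, not on how long the system has been running.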


Flow Control

• Flow control applies while the network load is high.
  – The rate of pbcast messages should be limited.
    • Normally every 100 ms.
  – Retransmission should be delayed by some rounds if many other processes request it.

[Figure: digest exchange with retransmission requests for messages 3, 9 and 5, 8]



Evaluations

• Parameters:
  – Message loss rate
  – Fanout, the number of processes a process sends to in one round
• Reliability:
  – (infected processes − failed processes) > (all processes) / 2
    • for applications based on quorum replication algorithms
• Throughput:
  – The number of messages a process receives in 1 second.
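The reliability condition above is a strict-majority test. As a one-function sketch (the name `is_reliable` is hypothetical):

```python
def is_reliable(infected, failed, total):
    """True when infected, non-failed processes form a strict majority of all processes."""
    return (infected - failed) > total / 2
```

For 50 processes, 30 infected with 4 failed satisfies the condition (26 > 25), while 30 infected with 5 failed does not (25 is not greater than 25).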


Effects of Fanout

• Predicate I shows pbcast.
  – Message loss rate is 0.05.
  – Deterministic broadcast reaches 10% of the processes.
  – 50 processes participate.
• The probability of failure decreases as the fanout increases up to 8.

[Figure; x-axis: fanout (0~10)]


Scalability

• Predicate I shows pbcast.
  – Message loss rate is 0.05.
  – Deterministic broadcast reaches 10% of the processes.
• The probability of failure decreases at larger scale.
  – Though broadcast to all processes takes more rounds.

[Figure; x-axis: processes (0~60)]


Time for broadcast to all processes

• At 1024 processes, messages are received in 12 rounds on average and in fewer than 20 rounds.
  – Fanout is 1.
  – Det. broadcast is not used.
• This result shows that the mean grows as O(log N).

[Figure: rounds (0~20) for 16, 32, and 1024 processes]
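A short bound shows why fewer than about 10 rounds is impossible here. It assumes that only processes already holding the message can pass it on, and that with fanout 1 each of them reaches at most one new process per round. Writing $I_r$ for the number of infected processes after round $r$ (a symbol introduced here for illustration):

```latex
I_0 = 1, \qquad I_{r+1} \le 2\, I_r
\;\Longrightarrow\; I_r \le 2^{r},
\qquad
I_r \ge N \;\Longrightarrow\; r \ge \log_2 N .
```

For $N = 1024$ this gives $r \ge 10$, so the measured mean of 12 rounds is close to the lower bound under these assumptions.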


Throughput

• 150 messages are sent per second.
  – When message loss happens frequently, fanout is limited to a small value.
• Throughput of perturbed processes decreases, but healthy processes retain full throughput.

[Figure; x-axis: rate of perturbed processes; curves: deterministic, pbcast]


Throughput cont.

• Throughput at 200 msg/sec.
  – 25% of the processes are perturbed 25% of the time.
  – Det. broadcast is not used.
• A high frequency of packet loss lowers throughput.
• In this case, average throughput decreases to 60% at 96 processes at high bandwidth.

[Figure; x-axis: loss rate (0~0.2); curves for 32~96 processes]


Conclusion of pbcast

• Gossip based protocol achieves scalability and reliability in general network environments.

• However, the cost imposed on each process is not considered. The next topic is memory management for pbcast.


Membership Management

• Assumption
  – Each process knows all members
    • memory consumption at large scale
    • communication required to ensure the consistency of the membership
  – Scalability problems in large-scale environments


Membership Management of lpbcast

• Membership management + gossip
  – Each process knows a subset of all members
  – Messages are sent with member information attached
  – Size limitation on the membership management buffer
    • Fixed memory consumption


Memory Management

• The memory requirement of a process should not change (at large scale)
  – Buffer for membership management
  – Buffer for outgoing messages
  → Scalability
• pbcast from the viewpoint of “memory consumption”


lpbcast algorithm

• Assumptions
  – Each process has a unique ID
  – Each message has a unique ID (including the process ID)
  – joining/leaving (= subscribing/unsubscribing)
• Buffers
  – Events : event notifications
  – EventIDs : event IDs
  – Subs : subscription information
  – unSubs : unsubscription information
  – View : targets of gossip messages
• Size limitation for all buffers
  – Especially for Events and Subs
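The buffers listed above could be held in a structure like the following. This is a sketch under assumptions: the class name `LpbcastState`, the helper `truncate_randomly`, and the limit values `events_max` and `subs_max` are hypothetical and are not taken from the slides.

```python
import random
from dataclasses import dataclass, field

@dataclass
class LpbcastState:
    """Per-process buffers; the *_max limits are hypothetical parameters."""
    pid: str
    events: dict = field(default_factory=dict)    # event ID -> notification
    event_ids: set = field(default_factory=set)   # event IDs already seen
    subs: set = field(default_factory=set)        # subscriptions to forward
    unsubs: set = field(default_factory=set)      # unsubscriptions to forward
    view: set = field(default_factory=set)        # gossip targets
    events_max: int = 60
    subs_max: int = 15

def truncate_randomly(items, limit, rng=random):
    """Remove random elements from a set in place until len(items) <= limit."""
    while len(items) > limit:
        items.remove(rng.choice(sorted(items)))
```

Since every buffer has a fixed upper bound, the memory used by one process does not grow with the total number of processes.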


sending

• lpbcast(e)
  – Add e to Events
• periodic gossip
  – Send buffers to a subset of View (every 50 ms)

[Figure: e is added to Events; a gossip message Mes carrying Events, EventIDs, Subs, and unSubs is sent to targets in View]
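The two sending steps might look like this in code. The sketch assumes the process state is a plain dict with the buffer names from the slides; the function names, the `fanout` parameter, and the message layout are illustrative assumptions, and the 50 ms timer that would call `gossip_round` is left to the caller.

```python
import random

def lpbcast(state, event_id, notification):
    """Publish: buffer the event so the next gossip round carries it."""
    state["events"][event_id] = notification

def gossip_round(state, fanout, rng=random):
    """Build one gossip message and choose its targets from View."""
    mes = {
        "events": dict(state["events"]),
        "event_ids": set(state["event_ids"]),
        "subs": set(state["subs"]),
        "unsubs": set(state["unsubs"]),
    }
    view = sorted(state["view"])
    # A random subset of View; the whole View if it is smaller than fanout.
    targets = rng.sample(view, min(fanout, len(view)))
    return targets, mes
```

Note that `lpbcast` itself sends nothing: publication and forwarding both ride on the same periodic gossip message.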


receiving

• When receiving gossip…
  – Membership management
    • Add Mes.unSubs to unSubs, and remove Mes.unSubs from View and Subs
    • Add Mes.Subs to View and Subs
    • If View grows too large, move randomly chosen items to Subs

[Figure: Mes.unSubs and Mes.Subs applied to View and Subs]
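The three membership steps above can be sketched as one function. The name `on_gossip_membership`, the `view_max` limit, and the rule that a process never adds itself to its own View are assumptions made for this illustration.

```python
import random

def on_gossip_membership(view, subs, unsubs, mes_subs, mes_unsubs, self_id,
                         view_max, rng=random):
    """Update View, Subs, and unSubs (all sets, in place) from one gossip message."""
    # Unsubscriptions: remember them and forget those processes.
    unsubs |= mes_unsubs
    view -= mes_unsubs
    subs -= mes_unsubs
    # Subscriptions: learn new processes (assumption: never add ourselves).
    new = mes_subs - {self_id}
    view |= new
    subs |= new
    # Keep View bounded: move random overflow entries to Subs.
    while len(view) > view_max:
        victim = rng.choice(sorted(view))
        view.remove(victim)
        subs.add(victim)
```

A process dropped from View is still advertised through Subs, so other processes can keep learning about it even though this process no longer gossips to it.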


receiving

• When receiving gossip…
  – Event transmission
    • Events received for the first time are transmitted to other processes in View
    • If Events grows too large, remove items at random
  – Retrieving events
    • When an undelivered event ID is received in Mes.EventIDs, a request to retrieve the event is sent

[Figure: an unknown event e is added to Events; an unknown ID in EventIDs triggers a retrieval request]
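A sketch of the event-handling part of reception, assuming event IDs are sortable values such as strings. The function name, the return values, and the `events_max` limit are hypothetical.

```python
import random

def on_gossip_events(events, event_ids, mes_events, mes_event_ids,
                     events_max, rng=random):
    """Process the event part of a gossip message.

    events: dict ID -> notification (to be forwarded); event_ids: set of known IDs.
    Returns (delivered, to_retrieve).
    """
    delivered = []
    for eid, notification in mes_events.items():
        if eid not in event_ids:          # first reception
            event_ids.add(eid)
            events[eid] = notification    # forwarded with the next gossip
            delivered.append(eid)
    # IDs we only heard about: ask for the event itself.
    to_retrieve = sorted(set(mes_event_ids) - event_ids)
    # Keep Events bounded by dropping random entries.
    while len(events) > events_max:
        del events[rng.choice(sorted(events))]
    return delivered, to_retrieve
```

Random removal keeps the Events buffer bounded, and the EventIDs exchange is what lets a process notice and recover an event that was purged before it arrived.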


subscribing

• A subscribing process must know at least one process that is already a member
• It sends gossip with itself appended to Subs
• On timeout, it retransmits



unsubscribing

• The leaving process sends gossip with itself appended to unSubs
  – The process is gradually removed from individual views
  – A timeout is set on unSubs entries
  – Assumption: a removed process will not recover soon

[Figure: unSubs entries propagating between processes]


features of lpbcast

• Throughput is as high as pbcast
• Memory consumption can be estimated
• The membership algorithm and the dissemination of events are handled at the same level.
• Each view is independent and uniformly distributed
  – True P2P model
  → suitable for WAN
  – Need to recognize “locality”



Optimization

• Age-based
  – Optimization of the Events buffer
  – Now: the Events buffer is purged randomly
  → better to remove well-disseminated messages
  – Age = # of hops

[Figure: P1 calls bcast(m1) and P2 calls bcast(m2); after gossip(m2), P1 delivers m2 and its buffer goes from [m1] to [m1,m2]]
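Age-based purging can be sketched as follows, assuming each buffered event carries a hop count that the sender increments before forwarding. The function name and the deterministic "oldest first" rule are simplifications chosen for this illustration.

```python
def purge_by_age(events, ages, limit):
    """Drop the oldest events (highest hop count) until at most `limit` remain.

    events: dict ID -> notification; ages: dict ID -> number of hops so far.
    """
    while len(events) > limit:
        # Ties on age are broken by ID so the result is deterministic.
        oldest = max(events, key=lambda eid: (ages[eid], eid))
        del events[oldest]
        del ages[oldest]
```

An event that has already travelled many hops has probably reached most processes, so dropping it costs less than dropping a fresh one.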


Optimization

• Frequency-based
  – Optimization of the Subs buffer
  – Now: the Subs buffer is purged randomly
  → better to remove well-known processes
  – well-known = included in Subs buffers

[Figure: P1, P2, and P3 exchange Subs(P1, P2) and Subs(P2); the resulting buffers are [P2] and [P1,P2]]
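A sketch of frequency-based purging, assuming each process counts how often every process ID has appeared in the Subs buffers it received. The function names and the deterministic "most frequent first" rule are simplifications for illustration.

```python
from collections import Counter

def record_subs(seen, mes_subs):
    """Count how often each process ID has appeared in received Subs buffers."""
    seen.update(mes_subs)

def purge_by_frequency(subs, seen, limit):
    """Drop the most frequently seen ("well-known") processes first."""
    while len(subs) > limit:
        # Ties on frequency are broken by ID so the result is deterministic.
        best_known = max(subs, key=lambda p: (seen[p], p))
        subs.remove(best_known)
```

In the figure's scenario, a process that received Subs(P1, P2) and then Subs(P2) has seen P2 twice, so P2 is the first entry it would drop.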


Experiment : # of rounds

• Simulation
  – Prob. of message loss: 0.05
  – Prob. of process crash: 0.01
  – Fanout = 3
• # of rounds needed to reach 99% of all processes
• Grows logarithmically


Experiment : Reliability

  – SUN Ultra 10 (Solaris 2.6, 256 MB memory)
  – 100 Mbps Ethernet
  – 40 msg/round, len(Events) = 60
• The probability that any given process delivers any given event notification


Experiment : Optimization Effect

• Age-based optimization
  – Delivery ratio = (# of delivered messages) / (# of broadcast messages)
  – 30 msg/round, len(Events) = 30, Fanout = 4, 60 processes

[Figure: delivery ratio; curves: Optimized, Random]


Conclusion

• Scalability + Reliability
• Bimodal Multicast
  – Gossip-based protocol achieves scalability and reliability.
• Lightweight Probabilistic Broadcast
  – Pays attention to the cost imposed on processes
  – Memory management for pbcast
  – Lightweight in large-scale environments
