Peer-to-Peer Networks

Christian Scheideler

Institut für Informatik

Technische Universität München

Motivation

• Every distributed system must be based on a network interconnecting its sites

• Network: of physical or logical nature

Physical Network

Supercomputers, multicore systems,…

Logical Network

[Figure: sites connected through the Internet, with an overlay network on top]

Basic question: how to organize sites in a scalable and robust overlay network?

Overview

• Graph Theory

• Supervised and Peer-to-Peer Overlay Networks

• Continuous-Discrete Approach

• Maintaining a robust Cycle

• Skip Graphs

• Locality-aware Overlay Networks

• Networks for non-uniform Peers

Graph theory

Graph G=(V,E):

• V: set of nodes / vertices

• $E \subseteq \{(v,w) \mid v,w \in V\}$: set of edges / arcs

[Figure: example graph on nodes A, B, C, D with a valid path marked]

Edge (v,w): v knows w, so v can send info to w

Graph theory

• $\delta(v,w)$: distance (length of a shortest path) from v to w in G

• $D = \max_{v,w} \delta(v,w)$: diameter of G

[Figure: example graph on nodes A, B, C, D, labeled D=4]

Graph theory

• $\Gamma(U)$: set of neighbors of node set U (nodes outside U adjacent to a node in U)

• $\alpha(U) = |\Gamma(U)| / |U|$

• $\alpha(G) = \min_{U,\ |U| \le |V|/2} \alpha(U)$: expansion of G
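The two measures can be checked by brute force on small graphs. The following is a minimal sketch, assuming an undirected graph given as an adjacency dictionary, neighbors counted outside U only, and the minimum taken over all nonempty U with at most half of the nodes; the function names are illustrative.

```python
from collections import deque
from fractions import Fraction
from itertools import combinations

def diameter(adj):
    """Largest shortest-path distance, found by a BFS from every node."""
    best = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        if len(dist) < len(adj):
            return float("inf")  # graph is not connected
        best = max(best, max(dist.values()))
    return best

def expansion(adj):
    """Minimum of |neighbors of U| / |U| over all U with 1 <= |U| <= |V|/2."""
    nodes = list(adj)
    best = None
    for size in range(1, len(nodes) // 2 + 1):
        for U in combinations(nodes, size):
            inside = set(U)
            gamma = {w for v in U for w in adj[v]} - inside
            ratio = Fraction(len(gamma), size)
            if best is None or ratio < best:
                best = ratio
    return best

# A cycle on 4 nodes as a tiny example.
cycle4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
```

The enumeration is exponential in the number of nodes, so it only serves to check small examples by hand.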

[Figure: example graph on nodes A, B, C, D with a node set U of size $|U| = 2$ that has $|\Gamma(U)| = 1$ neighbor]

Graph theory

Network G=(V,E,c):

• V: set of nodes, E: set of edges

• $c: E \to \mathbb{R}^+$: edge capacities

[Figure: example network on nodes A, B, C, D with an edge of capacity 2]

Graph Theory

Unless mentioned otherwise:

• All edges have capacity 1

• {v,w} represents {(v,w), (w,v)}


Network topologies

Ideally, complete network:

Problem: does not scale well! (~$n^2$ edges)

Line Network

• degree 2 (optimal), BUT

• diameter bad (n-1 for n nodes)

• expansion bad ( $\alpha(\text{line}) = 2/n$ )

How to get a low diameter?

Binary Tree

• $n = 2^{k+1}-1$ nodes, degree 3

• diameter is $2k \approx 2 \log_2 n$, BUT

• expansion is still bad ( $\alpha(\text{tree}) = 2/n$ )

[Figure: complete binary tree with levels 0 to k, depth k]

2-dimensional Grid

• $n = k^2$ nodes, maximum degree 4

• diameter is $2(k-1) < 2\sqrt{n}$

• expansion is ~$2/\sqrt{n}$

• Not too bad, but can we get better values?

[Figure: grid with rows and columns 1 to k, side length k]

Hypercube

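• Nodes: $(x_1,\dots,x_d) \in \{0,1\}^d$

• Edges: $\forall i$: $(x_1,\dots,x_d) \to (x_1,\dots,1-x_i,\dots,x_d)$

[Figure: hypercubes for d=1, d=2, d=3]

Degree d, diameter d, expansion $1/\sqrt{d}$

Routing: $(x_1,x_2,\dots,x_d) \to (y_1,x_2,\dots,x_d) \to (y_1,y_2,x_3,\dots,x_d) \to \dots \to (y_1,y_2,\dots,y_d)$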
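The routing rule fixes the coordinates one at a time from left to right. A minimal sketch, assuming labels are given as tuples of bits and hops that would not change the label are skipped (the function name is illustrative):

```python
def hypercube_route(x, y):
    """Path from x to y in the hypercube, correcting one differing bit per hop."""
    cur = list(x)
    path = [tuple(cur)]
    for i in range(len(cur)):
        if cur[i] != y[i]:
            cur[i] = y[i]          # follow the edge that flips coordinate i
            path.append(tuple(cur))
    return path

route = hypercube_route((0, 1, 1), (1, 1, 0))
```

The number of hops equals the number of positions in which x and y differ, so it never exceeds d.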

Butterfly

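• Nodes: $(k,(x_d,\dots,x_1)) \in \{0,\dots,d\} \times \{0,1\}^d$

• Edges: $(k-1,(x_d,\dots,x_1)) \to (k,(x_d,\dots,x_k,\dots,x_1)),\ (k,(x_d,\dots,1-x_k,\dots,x_1))$

Degree 4, diameter 2d, expansion ~1/d

[Figure: butterflies for d=1 (levels 0, 1; columns 0, 1) and d=2 (levels 0, 1, 2; columns 00, 01, 10, 11)]

Routing: $(0,(x_1,x_2,\dots,x_d)) \to (1,(y_1,x_2,\dots,x_d)) \to (2,(y_1,y_2,x_3,\dots,x_d)) \to \dots \to (d,(y_1,y_2,\dots,y_d))$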

Cube-Connected-Cycles

• Nodes: $(k,(x_1,\dots,x_d)) \in \{0,\dots,d-1\} \times \{0,1\}^d$

• Edges: $(k,(x_1,\dots,x_d)) \to (k-1,(x_1,\dots,x_d)),\ (k+1,(x_1,\dots,x_d)),\ (k,(x_1,\dots,1-x_{k+1},\dots,x_d))$

De Bruijn Graph

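• Nodes: $(x_1,\dots,x_d) \in \{0,1\}^d$

• Edges: $(x_1,\dots,x_d) \to (0,x_1,\dots,x_{d-1}),\ (1,x_1,\dots,x_{d-1})$

[Figure: de Bruijn graphs for d=2 (nodes 00, 01, 10, 11) and d=3 (nodes 000, 001, 010, 011, 100, 101, 110, 111)]

Routing: $(x_1,\dots,x_d) \to (y_d,x_1,\dots,x_{d-1}) \to (y_{d-1},y_d,x_1,\dots,x_{d-2}) \to \dots$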
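Routing in the de Bruijn graph shifts the bits of the destination in from the left, last bit first. A minimal sketch, assuming labels are tuples of bits (the function name is illustrative):

```python
def de_bruijn_route(x, y):
    """Path from x to y: in each hop, prepend the next bit of y and drop the last bit."""
    cur = tuple(x)
    path = [cur]
    for bit in reversed(y):        # y_d first, y_1 last
        cur = (bit,) + cur[:-1]    # edge (x_1..x_d) -> (bit, x_1..x_{d-1})
        path.append(cur)
    return path

route = de_bruijn_route((0, 0, 1), (1, 1, 0))
```

After exactly d hops all bits of x have been shifted out, so the path ends at y regardless of the start label.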

The Diameter

Theorem: Every graph of maximum degree d>2 and size n must have a diameter of at least (log n)/(log(d-1))-1.

Theorem: For every even d>2 there is a family of graphs of maximum degree d and size n with diameter $(\log n)/(\log d - 1)$.

[Figure: tree of all reachable nodes at distance k]

The Expansion

Theorem: For every graph G the expansion $\alpha(G)$ is at most 1.

Theorem: There are families of constant-degree graphs with constant expansion.

Example: Gabber-Galil Graph

• Node set: $(x,y) \in \{0,\dots,n-1\}^2$

• $(x,y) \to (x,x+y),\ (x,x+y+1),\ (x+y,y),\ (x+y+1,y) \pmod n$

Overlay Network

Basic question: how to organize sites in a scalable and robust overlay network?

Scalability: works efficiently for a large number of sites

Robustness: can handle faults and malicious behavior

Server-based approach

[Figure: sites connected through the Internet to a central server]

Does not scale well!

Alternatives

Supervised overlay network: supervisor assists in maintaining the network

Peer-to-peer overlay network: peers maintain the network themselves

Overlay Network

Problem: How to maintain an overlay network as peers join and leave?

Supervised Overlay Network

• Supervisor assigns peers to points in [0,1) so that peers are evenly distributed

• Neighboring peers connect to form a cycle

[Figure: cycle on [0,1) with peers at 0, 1/8, 1/4, 3/8, 1/2, 5/8, 3/4, 7/8]


• Node v wants to join (n nodes in system): give it the (n+1)th position

• Node w wants to leave: move the last node v to w's position
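The two rules can be sketched as follows. This is an illustration under assumptions: the slides do not say in which order the points of [0,1) are handed out, so the sketch uses the bit-reversal order 0, 1/2, 1/4, 3/4, 1/8, ... (which keeps the points evenly spread), and the names `position` and `Supervisor` are hypothetical.

```python
from fractions import Fraction

def position(i):
    """Point of the i-th peer (i = 0, 1, 2, ...): binary digits of i mirrored behind the point."""
    p, step = Fraction(0), Fraction(1, 2)
    while i:
        if i & 1:
            p += step
        i >>= 1
        step /= 2
    return p

class Supervisor:
    def __init__(self):
        self.peers = []                  # self.peers[i] sits at position(i)

    def join(self, v):
        """Give the new peer the next free position and return its point."""
        self.peers.append(v)
        return position(len(self.peers) - 1)

    def leave(self, w):
        """Move the last peer into the slot of the leaving peer w."""
        i = self.peers.index(w)
        last = self.peers.pop()
        if i < len(self.peers):          # w was not the last peer itself
            self.peers[i] = last
```

A dictionary from peer to slot index would avoid the linear search in `leave`; the list is kept here only for brevity.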

[Figure: cycle on [0,1) in which the last node v moves to the position of the leaving node w]


• v: node at nth position

• supervisor: stores pred(v), v, succ(v), succ(succ(v))

• join and graceful leave operation:

[Figure: cycle on [0,1) with the last node v marked]

Pure Peer-to-Peer Network

We also focus on [0,1).

Every peer is mapped to a random point in [0,1).

Peers form a cycle based on their points.

• Chord: cryptographic hash function

• CAN: random number

[Figure: peer v on the cycle [0,1)]

Continuous-Discrete Approach

Problem: cycle is not a good routing topology!

[Figure: cycle on [0,1) with long paths between peers]

Continuous-discrete Approach

• V: set of peers, U: virtual space

• Each $v \in V$ is mapped to a region $R(v) \subseteq U$

• Family F of functions $f: U \to U$

• {v,w} is an edge $\Leftrightarrow$ $[F(R(v)) \cap R(w)] \cup [F(R(w)) \cap R(v)] \neq \emptyset$


Basic questions:

• How to map peers to regions?

• What family F to choose?


• Take a classical family of networks (Hypercube, de Bruijn graph, …)

• Convert it into continuous form by interpreting node labels as points in U and edges as a family of functions F

• Mapping peers to regions will then convert the continuous form back into a discrete graph.

Hypercube

Classical hypercube:

• V: nodes with labels $(x_1,\dots,x_d) \in \{0,1\}^d$

• For all i: $(x_1,\dots,x_d) \to (x_1,\dots,1-x_i,\dots,x_d)$

Continuous version of hypercube:

• Interpret $(x_1,\dots,x_d)$ as $z = \sum_i x_i/2^i$

• $d \to \infty$: $U = [0,1)$

• F: $f_i^+(x) = x + 1/2^i$, $f_i^-(x) = x - 1/2^i$ for all $i > 0$ (taken mod 1)

De Bruijn Graph

Classical de Bruijn graph:

• V: nodes with labels $(x_1,\dots,x_d) \in \{0,1\}^d$

• E: $(x_1,\dots,x_d) \to (0,x_1,\dots,x_{d-1}),\ (1,x_1,\dots,x_{d-1})$

Continuous de Bruijn graph:

• Interpret $(x_1,\dots,x_d)$ as $z = \sum_i x_i/2^i$

• $d \to \infty$: $U = [0,1)$

• F: $f_0(x) = x/2$, $f_1(x) = (1+x)/2$
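To see how the continuous de Bruijn graph turns back into a discrete graph, the edge rule of the continuous-discrete approach can be evaluated directly. The sketch below assumes that each peer is a point in [0,1) and owns the region from its own point up to the next peer's point on the cycle; the helper names are illustrative.

```python
def regions(points):
    """Region [v, succ(v)) of every peer, as a list of plain intervals (split at the wrap-around)."""
    pts = sorted(points)
    regs = {}
    for i, v in enumerate(pts):
        succ = pts[(i + 1) % len(pts)]
        regs[v] = [(v, succ)] if v < succ else [(v, 1.0), (0.0, succ)]
    return regs

F = [lambda x: x / 2, lambda x: (1 + x) / 2]   # f0 and f1

def overlaps(a, b):
    """True if the half-open intervals a and b share a point."""
    return max(a[0], b[0]) < min(a[1], b[1])

def de_bruijn_overlay(points):
    """Neighbor sets: {v,w} is an edge if some f maps part of R(v) into R(w) or vice versa."""
    regs = regions(points)
    nb = {v: set() for v in regs}
    for v, rv in regs.items():
        for f in F:
            for a, b in rv:
                image = (f(a), f(b))           # f is increasing, so intervals map to intervals
                for w, rw in regs.items():
                    if w != v and any(overlaps(image, iw) for iw in rw):
                        nb[v].add(w)
                        nb[w].add(v)
    return nb

peers = [i / 8 for i in range(8)]
overlay = de_bruijn_overlay(peers)
```

With 8 evenly spaced peers, peer i/8 ends up connected to the peers with index ⌊i/2⌋ and 4 + ⌊i/2⌋, which is the 3-dimensional de Bruijn graph with edge directions ignored.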

Gabber-Galil Graph

Classical Gabber-Galil graph:

• Node set: $(x,y) \in \{0,\dots,n-1\}^2$

• $(x,y) \to (x,x+y),\ (x,x+y+1),\ (x+y,y),\ (x+y+1,y) \pmod n$

Continuous Gabber-Galil graph:

• $n \to \infty$: $U = [0,1)^2$

• F: $f_1(x,y) = (x,x+y)$, $f_2(x,y) = (x+y,y)$ (taken mod 1)



• How to map peers to regions?

• Consider any space $U = [0,1)^d$

• Hierarchical decomposition tree:

[Figure: hierarchical decomposition tree with node labels 0, 1, 000, 001, 01, 10, 11]


Fact:

• Volumes of subcubes assigned to nodes differ by factor of at most 2.

• Subcubes pairwise disjoint.

• Union of subcubes gives U.

Combine this with family F of functions.
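The three facts follow if a joining peer always splits a region on the shallowest level of the tree. A minimal sketch for U=[0,1), assuming regions are identified by the bit-string labels of the tree leaves and that ties are broken by taking the leftmost leaf (that tie-break and the function names are assumptions):

```python
from fractions import Fraction

def join(leaves):
    """Split the leftmost shallowest leaf into its two children."""
    depth = min(len(label) for label in leaves)
    target = min(label for label in leaves if len(label) == depth)
    rest = [label for label in leaves if label != target]
    return sorted(rest + [target + "0", target + "1"])

def volume(label):
    """A leaf with a label of length k covers a 2^-k fraction of U."""
    return Fraction(1, 2 ** len(label))

leaves = [""]                 # a single peer owns all of U
for _ in range(4):            # four more peers join
    leaves = join(leaves)
```

After the four joins the leaves are 000, 001, 01, 10, 11, and a further join splits 01 into 010 and 011.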

Join Operation

[Figure: decomposition tree with labels 0, 1, 000, 001, 01, 10, 11, 010, 011; the region of v is split between v and the new node w]

[Figure: regions 000, 001, 10, 11 with R(v) and R(w) connected by functions f and f']

{u,v} is an edge $\Leftrightarrow$ $[F(R(u)) \cap R(v)] \cup [F(R(v)) \cap R(u)] \neq \emptyset$

w inherits connections from v

Leave Operation

[Figure: decomposition tree in which the regions of v and w are merged]

v inherits connections from w


For any supervised network based on continuous-discrete approach with [0,1)d:

• Sufficient if supervisor introduces new peer to cycle neighbors. From these, new peer can get all F-connections

• Join/leave can be performed with constant time and work for supervisor.

High robustness:

• Sufficient to secure base cycle!

Peer-to-Peer Overlay Network

We focus on U=[0,1).

Every peer mapped to random point in [0,1).

[Figure: peer v on the cycle [0,1)]

v owns region [v, succ(v))

Join Operation

• New peer chooses random position x.

• Route to peer v owning position.

• Inherit all relevant edges w.r.t. F from v


Leave Operation

• Node that wants to leave transfers its connections to its predecessor.
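The ownership rule behind join and leave can be sketched with a sorted list of points. This is a simplified illustration: it assumes v owns [v, succ(v)) as stated above, models only who owns which region, and leaves out the transfer of F-edges; the class name `Ring` is hypothetical.

```python
from bisect import bisect_right, insort

class Ring:
    def __init__(self, points):
        self.points = sorted(points)

    def owner(self, x):
        """Peer v with x in [v, succ(v)); index -1 wraps around to the last peer."""
        return self.points[bisect_right(self.points, x) - 1]

    def join(self, x):
        """Insert a new peer at x and return the peer it inherits its edges from."""
        v = self.owner(x)
        insort(self.points, x)
        return v

    def leave(self, v):
        """Remove v and return the predecessor that takes over its region and connections."""
        self.points.remove(v)
        return self.points[bisect_right(self.points, v) - 1]

ring = Ring([0.1, 0.4, 0.8])
```

In a real join the position x would be drawn at random and reached by routing; here it is passed in explicitly so the behavior is easy to follow.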


Peer-to-Peer Overlay Network

Scalability: with hypercube / de Bruijn

• network has logarithmic diameter

• peers have (poly-)logarithmic degree

• join/leave need (poly-)logarithmic time/work (w.h.p.)

Robustness:

• Make sure base ring is robust!

Maintaining a robust cycle

Problem: cycle is a very fragile structure!

Solution: connect to $\Theta(\log n)$ nearest neighbors

[Figure: cycle on [0,1) with edges to the 2 nearest neighbors]

Chernoff bounds: nodes still connected under a constant fraction of random failures (with high probability)

Nodes randomly distributed on cycle: a constant fraction of correlated failures reduces to the random failure case


Problem: what if adversarial peers are part of the system?

[Figure: adversarial peers and honest peers on the cycle]

The system cannot distinguish between peers!

Supervised cycle

[Figure: cycle on [0,1) with peers v and w]

Nodes connect to $\Theta(\log n)$ nearest neighbors: hard for adversarial peers to isolate honest peers

Peer-to-peer cycle

Chord: uses cryptographic hash function to map peers to points in [0,1)

• randomly distributes honest peers

• does not randomly distribute adversarial peers

Peer-to-peer cycle

CAN: map peers to random points in [0,1)

Peer-to-peer cycle

Group spreading:

• Map peers to random points in [0,1)

• Limit lifetime of peers

Too expensive!

Peer-to-peer cycle

How can the system enforce an even distribution of honest and adversarial peers in the [0,1) space?

Peer-to-peer cycle

• n honest peers, $\epsilon n$ adversarial peers

• partition [0,1) space into regions of size $(c \log n)/n$ for some constant c

For any region $I \subseteq [0,1)$ of size $(c \log n)/n$:

• Balancing condition: $\Theta(\log n)$ peers in I (scalability)

• Majority condition: honest peers in majority (robustness)

How to satisfy conditions?

• Rule that works: k-cuckoo rule

[Figure: a joining peer evicts the peers of a k/n-region; n honest and $\epsilon n$ adversarial peers]

$\epsilon < 1 - 1/k$
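The slide only shows the rule as "evict k/n-region". The sketch below fills in the details as an assumption: the joining peer gets a random point x, every peer in the k/n-region containing x is evicted and re-inserted at a fresh random point, and k/n-regions are taken to start at multiples of k/n. Function and parameter names are illustrative.

```python
import random

def cuckoo_join(points, k, n, rng):
    """Return the peer positions after one join under the (assumed) k-cuckoo rule."""
    x = rng.random()                       # random point for the new peer
    size = k / n
    start = (x // size) * size             # left end of the k/n-region containing x
    in_region = lambda p: start <= p < start + size
    kept = [p for p in points if not in_region(p)]
    evicted = [p for p in points if in_region(p)]
    # Evicted peers move to new random points; they do not evict anyone in turn.
    return kept + [x] + [rng.random() for _ in evicted]

rng = random.Random(0)                     # fixed seed for a repeatable run
before = [i / 16 for i in range(16)]
after = cuckoo_join(before, k=2, n=16, rng=rng)
```

Every join therefore scatters the peers around the newcomer, which is what makes it hard to accumulate adversarial peers in one region.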

Limitation of k-cuckoo rule

• Only works for any sequence of join and leave requests of adversarial peers.

• Does not work for any sequence of join and leave requests.

Example: adversary orders all peers in a region of size O(log n / n) to leave

Solution: also rearrange peers on leave operations.

k-Flip&Cuckoo Rule

• Join: as before (k-cuckoo rule)

• Leave: choose random k/n-region among neighboring (c log n) k/n-regions, empty & flip it with random k/n-region

[Figure: join and flip operations; n honest and $\epsilon n$ adversarial peers]

Random Number Generation

Critical component: robust distributed random number generator

Solution:

• very simple (no error-correcting codes)

• works for public channels

• even if constant fraction is adversarial

Trick: generate groups of random numbers


• So far, only proactive techniques (i.e., techniques that protect cycle)

• Proactive techniques expensive and have their limits (minority of adv. peers)

• Also reactive techniques needed (i.e., techniques that can recover cycle)

Recovering the cycle

First approach: recover sorted list

[Figure: nodes 5, 12, 20, 2, 8 in an arbitrary connected graph are rearranged into the sorted list 2, 5, 8, 12, 20]

Recovering a sorted list

Naïve approach:

• Continuously collect info about neighbors of neighbors until all nodes known

• Transform neighborhood into sorted list

[Figure: initial graph]

Not scalable! Not easy to check!


Better approach: linearization

Every node does the following locally:

[Figure: neighborhood of a node before and after its local linearization step]

coordination problem
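The figure with the local rule is lost, so the sketch below assumes the usual linearization step: a node keeps only its closest neighbor on each side and hands every farther neighbor over to the next closer one. It sidesteps the coordination problem by running the nodes one after another. The example graph on the nodes 2, 5, 8, 12, 20 and all names are hypothetical.

```python
def linearize_step(nb, v):
    """One local step at v; nb maps each node to its (symmetric) neighbor set."""
    changed = False
    right = sorted(w for w in nb[v] if w > v)
    left = sorted((w for w in nb[v] if w < v), reverse=True)
    for side in (right, left):
        for a, b in zip(side, side[1:]):   # a is closer to v than b
            nb[v].discard(b)
            nb[b].discard(v)               # drop edge {v,b} ...
            nb[a].add(b)
            nb[b].add(a)                   # ... and replace it by the shorter edge {a,b}
            changed = True
    return changed

def linearize(nb):
    """Repeat passes over all nodes until no node changes anything."""
    while any([linearize_step(nb, v) for v in sorted(nb)]):
        pass
    return nb

graph = {5: {12, 20}, 12: {5}, 20: {5, 2, 8}, 2: {20}, 8: {20}}
linearize(graph)
edges = {(v, w) for v in graph for w in graph[v] if v < w}
```

Each handover replaces an edge by a strictly shorter one and keeps the graph connected, so the loop stops exactly when every node has at most one neighbor on each side, which for a connected graph is the sorted list.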


Naïve solution of coordination problems:

• Suppose that time is synchronized

• In each round (2 time steps) each node v performs:

– right linearization

– left linearization

[Figure: node v before and after right and left linearization]


Correctness of right/left linearization:

• Consider arbitrary consecutive pair v,w

• Range of path from v to w reduces by 1 in each round

• Degree increases by 2 in each round

[Figure: consecutive nodes v and w with the range of the path from v to w]

More realistic approach: take asynchronous behavior into account

• Peers operate in actions: <label>: <guard> → <commands>

• v.NB: neighbor list of v

• we assume: $w \in v.NB \Leftrightarrow v \in w.NB$

[Figure: edge {v,w} between v and w as a 0/1 value]

Edges are like shared variables; there are no edges {v,v}


u.L, u.R: left / right neighborhood of u

Actions for node u:

• grow right: $(v \in u.R) \wedge (w \in v.L) \wedge (w \notin u.NB) \to u.NB := u.NB \cup \{w\}$

• trim right: $(v,w \in u.R) \wedge (w \in v.L) \to u.NB := u.NB \setminus \{v\}$

• grow left and trim left similar
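The two right-hand actions can be written as guarded updates on the neighbor sets. A minimal sketch, assuming nodes are comparable keys, neighbor sets are kept symmetric as stated earlier, and the guard of grow right also excludes w = u because there are no self-loops; the function names are illustrative.

```python
def grow_right(nb, u):
    """Fire grow right once at u if its guard holds; return True if it fired."""
    for v in sorted(x for x in nb[u] if x > u):          # v in u.R
        for w in sorted(x for x in nb[v] if x < v):      # w in v.L
            if w != u and w not in nb[u]:
                nb[u].add(w)
                nb[w].add(u)                             # keep NB symmetric
                return True
    return False

def trim_right(nb, u):
    """Fire trim right once at u if its guard holds; return True if it fired."""
    right = sorted(x for x in nb[u] if x > u)
    for v in reversed(right):                            # try the farthest v first
        for w in right:
            if w < v and w in nb[v]:                     # w in u.R and w in v.L
                nb[u].discard(v)
                nb[v].discard(u)
                return True
    return False

nb = {1: {3}, 2: {3}, 3: {1, 2}}
fired = [grow_right(nb, 1), trim_right(nb, 1)]
```

In the example node 1 first learns about node 2 through node 3 and then drops its longer edge to node 3, which leaves the sorted list 1, 2, 3.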

[Figure: configurations of the nodes u, w, v for grow right and trim right]

• safe if executed sequentially in each node

• preferred op to keep degree low

• wait until $w \in u.NB$ and $u \in w.NB$

Recovering a sorted cycle

Establish wrap-around edge:

• v.wa: wrap-around edge of v

• we assume: $v.wa = w \Leftrightarrow w.wa = v$

• v sets v.wa to w: $v.NB := v.NB \cup \{v.wa\}$, $v.wa := w$

Problem: more cases for initial state!

Recovering a sorted cycle

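Additional actions for node u:

• wrap: $(u.L = \emptyset) \wedge (u.wa = \bot) \wedge (w \in u.R) \to u.wa := w$

• extend: $(u.L = \emptyset) \wedge (u.wa \neq \bot) \wedge (w \in u.wa.R) \to u.wa := w$

• unwrap: $(u.L \neq \emptyset) \wedge (u.wa \neq \bot) \wedge (u.wa > u) \to u.wa := \bot$

[Figure: node configurations for wrap, extend and unwrap]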

Skip Graphs

Problem: messages between local peers may be sent across world

Skip Graphs

Better:

• Give nodes hierarchically specified names, e.g. europe.germany.bavaria.munich.tum

• Sort nodes according to names

[Figure: nodes placed in the name space]

Problem: high imbalance, so the continuous-discrete approach does not work!

Skip Graphs

• Each node v has an arbitrary unique name ID(v) and a random bit string s(v)

• $\mathrm{prefix}_i(s(v))$: first i bits of s(v)

Skip graph rule:

For every node v and $i \in \mathbb{N}_0$:

• v connects to the closest successor and predecessor w (w.r.t. ID(v)) with $\mathrm{prefix}_i(s(w)) = \mathrm{prefix}_i(s(v))$
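The rule can be evaluated level by level: all nodes sharing the same first i bits form one list sorted by name, and neighbors in that list get connected. A minimal sketch, assuming the lists are not closed into cycles and only finitely many levels are considered; the names and bit strings in the example are made up.

```python
def skip_graph_neighbors(ids, s, levels):
    """ids: unique names; s: name -> bit string; returns name -> set of neighbors."""
    order = sorted(ids)
    nb = {v: set() for v in ids}
    for i in range(levels + 1):
        groups = {}
        for v in order:                        # group nodes by their first i bits
            groups.setdefault(s[v][:i], []).append(v)
        for members in groups.values():        # members are already sorted by name
            for a, b in zip(members, members[1:]):
                nb[a].add(b)
                nb[b].add(a)
    return nb

bits = {"A": "00", "B": "10", "C": "01", "D": "11"}
nb = skip_graph_neighbors(list(bits), bits, levels=2)
```

Level 0 yields the sorted list A, B, C, D, level 1 adds the shortcuts A–C and B–D, and at level 2 every node is alone in its list.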

Skip Graphs

[Figure: the two level-1 lists of a skip graph, one with the nodes v with s(v)=0… and one with the nodes v with s(v)=1…]

Skip Graphs

Hierarchical view:

[Figure: hierarchy of lists for the prefixes 0, 1; 00, 01, 10, 11; 000, 001, …]

$\Theta(\log n)$ degree, $\Theta(\log n)$ diameter, $\Theta(1)$ expansion w.h.p.

Routing in Skip Graphs

[Figure: routing between peers in Australia, Africa, America, Asia and Europe]

O(log n) hops w.h.p.

The Hyperring

Is randomization in skip graphs necessary?

Hyperring: deterministic form of skip graph

Approach similar to skip graphs: organize nodes in cycle according to real names.

[Figure: cycle of nodes ordered by name: Apple, Banana, Cherry]

Shortcuts: Intertwined Rings

[Figure: intertwined rings connected by a bridge]

Join and Leave

• Inserting a node: bottom up

Join and Leave

• Deleting a node: bottom up

k-separated Hyperring

In every level, bridges are k nodes apart.

How large does k have to be to guarantee polylogarithmic expansion?

Theorem: = (1/n)(1/ k )

So k has to be non-constant ( ( log n ) ).

Do areas with old insertions/deletions have to be revisited?

2

k-separated Hyperring

Rule: Choose k=6(d+3), where d is the current degree of the node initiating the operation.

Theorem:

• degree: O(log n)

• expansion: $\Omega(1/\log n)$

• congestion for permutations: O(log n) w.h.p.

• work for Join/Leave: $O(\log^3 n)$

Locality-aware Overlay Networks

Problem: in general, a distance metric cannot be embedded well into 1-dimensional space

So the applicability of skip graphs is limited

Use a different construction based on Plaxton, Rajaraman and Richa


For a node v let

• s(v) be its random bit string and

• $B_i(v)$ be the ball around v of minimum radius so that $B_i(v)$ contains $c\, 2^i \log n$ peers

[Figure: nested balls $B_1(v)$, $B_2(v)$, $B_3(v)$ around v]


Assumption: growth-bounded metric

• N(v,r): set of nodes w with d(v,w) < r

• There is a constant $\delta > 0$ so that $|N(v,(1+\delta)r)| < 2|N(v,r)|$ for all v, r

[Figure: nested balls $B_1(v)$, $B_2(v)$, $B_3(v)$ around v]


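Topology: for every node v and $i \in \mathbb{N}$:

• v connects to all nodes $w \in B_i(v)$ with $\mathrm{prefix}_{i-1}(s(v)) = \mathrm{prefix}_{i-1}(s(w))$

[Figure: nested balls $B_1(v)$, $B_2(v)$, $B_3(v)$ around v, with $c\, 2^i \log n$ peers in $B_i(v)$]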


Topology rule implies:

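• degree of each node $\Theta(\log^2 n)$ w.h.p.

• v has nodes w in $B_i(v)$ with $\mathrm{prefix}_i(s(w)) = \mathrm{prefix}_{i-1}(s(v)) \circ x$ for all $x \in \{0,1\}$ w.h.p.

[Figure: nested balls $B_1(v)$, $B_2(v)$, $B_3(v)$ around v, with $c\, 2^i \log n$ peers in $B_i(v)$]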

Locality-aware Routing

Routing from v to w:

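• $s(v) = (x_1, x_2, x_3, \dots)$, $s(w) = (y_1, y_2, y_3, \dots)$

• $v \to$ closest $u_1$ in $B_1(v)$ with $\mathrm{prefix}_1(s(u_1)) = y_1$

• $u_1 \to$ closest $u_2$ in $B_2(u_1)$ with $\mathrm{prefix}_2(s(u_2)) = y_1 y_2$

• …

• until we reach $u_{k-1}$ with w in $B_k(u_{k-1})$

[Figure: route from v via $u_1$ and $u_2$ to w through the balls $B_1(v)$, $B_2(v)$, $B_2(u_1)$, $B_3(u_1)$, $B_3(u_2)$]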


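Let r(B) be the radius of ball B.

• $d(u_1,v) < r(B_1(v))/\beta$ w.h.p. ( $\beta = \Theta(\log_{1+\delta} c)$ )

• $r(B_2(u_1)) > (1+\delta-1/\beta)\, r(B_1(v))$

• $d(u_2,u_1) < r(B_2(u_1))/\beta$ w.h.p.

• $r(B_3(u_2)) > (1+\delta-1/\beta)\, r(B_2(u_1))$

• …

After k hops ( $r = r(B_1(v))$ ):

• $d(u_k,v) < \sum_{i=0}^{k-1} (1+\delta-1/\beta)^i\, r/\beta < (\beta\delta-1)^{-1}\, r\, (1+\delta-1/\beta)^k$

• $d(u_k,w) < d(v,w) + \sum_{i=0}^{k-1} (1+\delta-1/\beta)^i\, r/\beta < d(v,w) + (\beta\delta-1)^{-1}\, r\, (1+\delta-1/\beta)^k$

• $r(B_{k+1}(u_k)) > (1+\delta-1/\beta)^k\, r$

Finally, $w \in B_{k+1}(u_k)$:

• $d(v,w) > r(B_k(u_{k-1})) - d(u_{k-1},v) > (1 - 1/(\beta\delta-1))\, (1+\delta-1/\beta)^{k-1}\, r$

• $d(u_k,v) < d^* = (\beta\delta-1)^{-1}\, r\, (1+\delta-1/\beta)^k$ and total path length $< 2d^* + d(v,w)$

[Figure: path from v via $u_k$ to w]

$d^* < (\epsilon/2)\, d(v,w)$ if $\beta\delta > 2(1+\delta)/\epsilon + 2$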

Networks for non-uniform peers

Problem: peers have non-uniform bandwidth

Cont-disc and skip graphs do not work!


Ad-hoc solutions:

• cut large peers into many small peers

• multi-tier network

Better approach:

• organize peers in a heap

How to design scalable distributed heap?


PAGODA heap network

[Figure: PAGODA as a stack of leveled de Bruijn graphs dB(1), dB(2), dB(3), dB(4), … with 3, 4, 5, … levels; nodes v and w marked]

• dB(d): leveled de Bruijn graph of dimension d

• Routing between v and w via nodes of two dB-levels up

Join

[Figure: PAGODA heap network built from dB(1), dB(2), dB(3), dB(4), …; ~log2 n levels]

Move upwards until all parents have larger bandwidth

Leave

[Figure: PAGODA heap network built from dB(1), dB(2), dB(3), dB(4), …; ~log2 n levels]

Set bandwidth to 0, send downwards until no further children, remove node


[Figure: PAGODA heap network built from dB(1), dB(2), dB(3), dB(4), …; ~log2 n levels]

Problem: updating PAGODA may need O(log2 n) time


SHELL network: oblivious heap

Join operation: O(log n) time

Leave operation: O(1) time

Conclusions

Many interesting fronts to work on in the context of scalable distributed systems:

• self-optimizing networks

• social networks

• proactive approaches

• reactive approaches (repairs under adversarial presence)

• new paradigms

Questions?

