P2P and multimedia applications over the Internet


Notes on the course

Fiandrino Claudio

July 4, 2011


Contents

1 P2P systems
  1.1 Introduction
  1.2 Definition
  1.3 Time evolution of applications
  1.4 Issues
    1.4.1 General Issues
    1.4.2 Issues for ISP
    1.4.3 Issues for Users
  1.5 Overlay network
  1.6 Family of systems
  1.7 Napster
  1.8 Gnutella
    1.8.1 Analysis
    1.8.2 Messages
    1.8.3 Characteristics
    1.8.4 Performance evaluation
  1.9 Chord
    1.9.1 Analysis
    1.9.2 Example
    1.9.3 Issues
    1.9.4 Load balance
    1.9.5 Comparison between Chord and Gnutella
  1.10 CAN
    1.10.1 Routing
    1.10.2 Join
    1.10.3 Performances
    1.10.4 Leaving of a node and failures
  1.11 Tapestry
  1.12 BitTorrent
    1.12.1 Analysis
    1.12.2 Policies
    1.12.3 Case study: Flash Crowd
  1.13 Skype
  1.14 P2P Streaming systems
    1.14.1 Tree-based systems
    1.14.2 Meshed-based systems

2 Random graphs
  2.1 Introduction and definitions
  2.2 Erdos-Renyi Model
    2.2.1 Average degree
    2.2.2 Degree distribution
  2.3 Bender-Canfield Model
    2.3.1 Node reachability
    2.3.2 Small-world effect
    2.3.3 Clustering
  2.4 Heavy-Tailed Distribution
  2.5 Watts-Strogatz model
    2.5.1 Clustering analysis
    2.5.2 Small-world analysis
  2.6 Theory of evolving networks
  2.7 Resume scheme

Chapter 1

P2P systems

1.1 Introduction

From the point of view of P2P analysis, the Internet is an already defined and perfectly working structure: only users are taken into account, and they are called hosts or peers. Hosts communicate through the Internet, which can be seen as the transport medium that carries data; the analysis therefore focuses on layers 4 and 7 of the OSI stack. Knowledge of the transport layer is necessary to understand and predict the behavior of the network, but it is also necessary to know what features users may require from the application layer, since they interact with applications.

[Figure: protocol stack highlighting Layer 7 and Layer 4]

1.2 Definition

P2P (peer-to-peer) systems are systems in which users both receive and provide part of the service. This is a general definition, since the concept of service still has to be specified. The important point is that hosts also contribute to service provisioning: the service is distributed rather than centralized as in a web browsing application. Depending on the type of service, users provide different things using their own resources.


Sharable resources

This section focuses on the kinds of sharable resources.

The first type is content: users share content stored on their machines. If no other user has a given content, the quality of service will be very bad, while if many hosts share the same content the service will be excellent. An example application is Napster, where the content is music. Content can be of various types; grouping them, it is possible to introduce the following classification:

. file sharing;

. directories.

File sharing covers many possible contents: music, games, videos, films, ebooks. Directories are typically parts of a distributed database: once a part is received, it is redistributed and anyone can access it (Skype).

Another sharable resource is CPU: in this context computational power is shared. For example, if an application requires a computational capacity that no single machine owns, it can be distributed among Internet hosts, each using its computational power to process one part of the application (for example, an application that searches for new forms of life and needs shared power for signal processing).

The last sharable resource is bandwidth. Consider a host that owns a very popular film requested by many other peers: if it has to distribute the film to everyone, a very large bandwidth is required on its access link. It is better if it distributes parts of the film to other users that in turn redistribute them: in this way the bandwidth actually used is greater. Examples of applications are BitTorrent, P2P TV, gaming.

1.3 Time evolution of applications

At the beginning, the Internet was in a certain sense peer-to-peer: flat topology, distributed features and protocols. As it grew, it moved to the client-server paradigm, in which someone provides a service requested by other users: web browsing is a typical client-server application. ISPs developed applications along these lines, and that choice implied asymmetric access: upload and download are treated separately, typically assigning much more bandwidth to download (ADSL). Indeed, there is usually one server with several clients.

With the development of peer-to-peer applications the situation has become fairly symmetric, and there is no longer a strict division between download and upload bandwidth: if peers have to redistribute contents, they need an application able to exploit the upload bandwidth in particular.


Moreover, with the technological evolution of devices, much greater computational power has made it possible to push some tasks down from the core network to the edges.

1.4 Issues

1.4.1 General Issues

Peer-to-peer systems suffer from critical issues. One is churning, the high variability of the system over time: hosts can freely join or leave, so the quantity of available content changes very frequently. For example, in P2P TV, resources have to be balanced between the quantity that a peer can redistribute and the quantity that it needs.

Furthermore, precise knowledge of the participants is required, such as their IP address, which, due to churning, can change over time. This knowledge is not strictly necessary in other applications.

If a peer is hidden behind a NAT or a firewall, further information is required, in particular the public IP address of the NAT. The reason is that NATs were developed for client-server applications. Firewalls, instead, can deny a machine access to the P2P application.

Every P2P system has to deal with the join issue: when users want to join the network, they require some information, such as the address of a first neighbor. If at a certain moment there are no peers in the network, the service cannot be provided. To join, it is possible to:

. access a web page that contains a list of active or recently active peers: the new peer contacts them until it finds one that is up;

. connect to some always-on server.

These mechanisms are centralized techniques: an application that uses them is BitTorrent.

1.4.2 Issues for ISP

ISPs have to cope with the following problems:

. traffic engineering: to improve the service, with the goal of satisfying user requirements, ISPs can balance traffic (symmetric or asymmetric access means different amounts of traffic in the network);

. capacity problems: many applications generate a lot of traffic, and ISPs that exchange traffic with other ISPs have to respect the agreed cost policies; moreover, the quantity of traffic can be huge because applications do not take the physical topology into account, so being neighbors in the peer network does not imply belonging to the same ISP: the consequence is that, in general, ISP boundaries are crossed many times;

. competitive services: an ISP can have its own telephony company that offers a paid service; of course it also carries data traffic and, if that traffic is Skype traffic, which is a free VoIP service, it may penalize it because it is a competitor.

1.4.3 Issues for Users

Users, in turn, have to deal with:

. legal issues: some services, for example file sharing, may run into this issue because contents are distributed in violation of copyright;

. security and privacy issues: some applications may be malicious and exchange potentially dangerous traffic (viruses, malware, spyware).

1.5 Overlay network

The layer 7 network that connects peers is called the overlay network. The overlay network is completely independent of the physical network and may or may not be a full mesh (it is not when peers do not know all other peers and only have a partial view of the topology). The picture below shows an example.

[Figure: Overlay Network on top of ISP 1, ISP 2 and ISP 3]

Links are of course logical. Two peers connected by a link of the overlay network are neighbors, and they may belong to different ISPs: physically they can be located very far apart. Links can be created in different ways, for example with direct TCP connections, or with UDP plus some further information.


The overlay network is used to implement functions that differ from application to application, and it is possible to have more than one overlay network nested together. Some examples are:

. Gnutella: query files over the overlay network; retrieve files over a TCP connection;

. BitTorrent: retrieve files over the overlay network.

1.6 Family of systems

According to the following classification, it is possible to distinguish:

. unstructured P2P systems: systems in which the topology is not regular, but a random graph (neighbors are randomly chosen); an example is Gnutella;

. structured P2P systems: in these systems the topology is regular; an example is Chord;

. hierarchical P2P systems: a hierarchy is created among peers, distinguishing high priority peers (super peers) and ordinary peers; super peers are connected together in a structured way, while ordinary peers are connected with an unstructured topology; an example is Skype.

1.7 Napster

Napster can be considered the first P2P system; it was developed by Shawn Fanning with Sean Parker and released in 1999. Actually it was not a true P2P system, since users were not connected to each other (they had to join servers), but it had some characteristic features of P2P systems. The servers stored, in a database, the lists of sharable contents that users had on their PCs. The architecture was something like a star whose central nodes were servers; it is briefly shown in the following picture.

[Figure: star architecture with a central Server and its Db, connected to Users]


Properties

. Information that users declare: ID, IP address, port number, list of sharable contents.

. Fundamental function: query for a given content.

How it worked

When a user wanted to retrieve some content, such as a song, it sent a request to the server; the server then looked the content up in the database to find out who held it. If someone had it, the server returned to the requesting user all the information about the user that had the content: in this way the two hosts could exchange the content over a direct connection.
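The lookup just described can be sketched as a toy central directory. This is only an illustration under stated assumptions: the class name `CentralIndex`, its methods, and the sample peers, addresses and file names are hypothetical and do not come from Napster itself.

```python
from collections import defaultdict


class CentralIndex:
    """Toy central directory: maps a content name to the peers that declared it."""

    def __init__(self):
        self._holders = defaultdict(set)

    def register(self, peer_id, ip, port, contents):
        # A joining user declares its ID, IP address, port and sharable contents.
        for name in contents:
            self._holders[name].add((peer_id, ip, port))

    def query(self, name):
        # The server only answers *who* holds the content; the transfer itself
        # happens afterwards over a direct connection between the two hosts.
        return sorted(self._holders.get(name, set()))


# Hypothetical users and contents.
index = CentralIndex()
index.register("alice", "10.0.0.1", 6699, ["song_a.mp3", "song_b.mp3"])
index.register("bob", "10.0.0.2", 6699, ["song_b.mp3"])
holders = index.query("song_b.mp3")
```

The server never touches the content: it only stores and returns contact information, which is why the notes describe the architecture as a star around the servers.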

1.8 Gnutella

Gnutella is not an application or a system, but a protocol that other applications implement (for example Shareaza, BearShare, LimeWire). The topology is unstructured and there is no distinction among peers: it is serverless. Moreover, each node can both request and distribute contents: peers of this kind are called servents.

It is assumed that users share contents stored on their PCs, so they first have to declare to the network which contents they hold. The purpose of Gnutella is to make queries in a smart way. To discover the requested file, a query has to search the lists of contents held by peers; such a search is realized by flooding: the initial node can send the request only to its neighbors, they forward it to their neighbors, and so on. This implies that no node has a global view of the network.

1.8.1 Analysis

To analyze a P2P protocol, the attention has to be focused on the following aspects:

. join: how users join the network;

. maintenance: the fundamental task needed to deal with churning;

. search: how some content is discovered in the network (a typical task for file sharing applications);

. download: how the file is downloaded once a search succeeds.


Joining

The protocol does not specify a procedure: usually a web page holds a list of peers that are active or were recently seen active. The new user connects to that page and downloads the list; it then simply tries to contact the users on the list until it finds one that is active: at that point it opens a connection and waits for the acknowledgement. To be contacted, each peer has to declare its ID, its IP address and its port number.

The graphical explanation is reported below.

[Figure: joining procedure in five panels, each showing the new user A and the Web Page]

Step 1: contact the web page

Step 2: download the list of peers

Step 3: contact a peer

Step 4: wait for the acknowledgement

Step 5: the new user is a peer

Steps 1-4 are called the signalling procedure: after them the new user becomes a peer and, at the beginning, it has just one neighbor (the peer contacted by means of the web page); in Gnutella, two peers are neighbors when they have established a TCP connection (at that time, using TCP was quite unusual). Since any peer in the list can be contacted, the topology is created randomly.

Maintenance

Once a peer is connected, it has to discover other neighbors to have good connectivity; indeed, if its only neighbor switches off, it is no longer connected to the network. The goals of the maintenance mechanism are:

. guarantee good connectivity;

. give the possibility of changing neighbors (in order to discover peers with more contents).

Because of the second feature, and because of churning, the overlay changes a lot over time.

To achieve the two goals, the following mechanism is provided:

. from time to time a ping message is sent to check whether neighbors are alive;

. when a ping message is received:

  . the neighbor signals that it is alive with a pong message;

  . the peer forwards the ping to all its neighbors (they will answer with a pong only to the peer that forwarded the ping, not to the initial sender);

. when a pong message is received, it is forwarded to the peers that previously sent a ping.

This mechanism is called flooding or discovery method because the new peer, thanks to pings and pongs, can discover new neighbors.

The algorithm terminates thanks to the TTL field of both messages, which makes it possible to:


. avoid messages that travel forever in the network;

. discover only a part of the topology, not the complete network.

Since each message has an almost unique identifier (it is selected randomly from a large set, so the probability of two messages having the same ID is negligible), a peer does not forward a message (either ping or pong) that it has already received; this choice was made:

. to avoid useless propagation of messages;

. to keep the cache in which messages are stored small (possible only if useless messages are not propagated).

The mechanism does not specify the policy a new peer follows once it has discovered new peers through pongs: it may contact all of them, a randomly chosen part, or a part chosen according to some criterion.

Search mechanism

Like the maintenance mechanism, the search method is implemented with flooding. When a peer wants to search for a given file, it sends a query message to its neighbors; the message contains all the fundamental information about the file. Nodes that receive the query check whether they have that content:

. if not, they forward the message to their neighbors (as before, the message has a unique ID, so a peer that receives it more than once simply ignores the copies);

. if so, they answer with a query hit message.

A node that has the content does not forward the query message any further; notice that the query hit uses the reverse path to reach the initial node. The reverse path is exactly the path followed by the query message, traversed backwards, and it is extremely important since, as mentioned, no node has a global view of the topology.
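The search by flooding can be sketched as a breadth-first visit with a TTL, duplicate suppression and reverse-path bookkeeping. This is a simplified, centralized simulation under assumptions: the function name `flood_query`, the sample overlay and the file names are hypothetical, and a real servent would keep the "already seen" cache per node rather than globally.

```python
from collections import deque


def flood_query(graph, files, source, wanted, ttl):
    """Flood a query from `source`; return {holder: reverse path back to source}."""
    came_from = {source: None}   # also acts as the cache of nodes that saw the query
    frontier = deque([(source, ttl)])
    hits = {}
    while frontier:
        node, remaining = frontier.popleft()
        if remaining == 0:
            continue             # TTL expired: the query is not forwarded further
        for neighbor in graph[node]:
            if neighbor in came_from:
                continue         # duplicate copy of the query: ignored
            came_from[neighbor] = node
            if wanted in files.get(neighbor, ()):
                # Query hit: it travels back along the reverse path,
                # and the holder does not forward the query any further.
                path, hop = [neighbor], node
                while hop is not None:
                    path.append(hop)
                    hop = came_from[hop]
                hits[neighbor] = path
            else:
                frontier.append((neighbor, remaining - 1))
    return hits


# Hypothetical overlay: A has neighbors B, C, D; each of them has two more neighbors.
graph = {
    "A": ["B", "C", "D"], "B": ["A", "E", "F"], "C": ["A", "G", "H"],
    "D": ["A", "I", "L"], "E": ["B"], "F": ["B"], "G": ["C"],
    "H": ["C"], "I": ["D"], "L": ["D"],
}
files = {"G": {"x.mp3"}, "B": {"y.mp3"}}
hits_ttl2 = flood_query(graph, files, "A", "x.mp3", ttl=2)
hits_ttl1 = flood_query(graph, files, "A", "x.mp3", ttl=1)
```

With a TTL of 2 the query reaches G through C and the hit returns along G, C, A; with a TTL of 1 the same file is not found, which illustrates how the TTL bounds the portion of the network that is explored.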

Download

When a query succeeds, the initial requester has to download the file; it can do so because the query hit message contains all the information about the node that holds the content, in particular:

. IP address;

. peer ID;

. port number.

The download uses the HTTP protocol and takes place directly between the requester and the peer that holds the file: contents are not distributed over the overlay network, only queries are.

1.8.2 Messages

Messages, or descriptors, are used to implement the functions mentioned above, such as maintenance and search. They are composed of a header (common to all messages) and a payload (which differs from function to function):

Field    Bytes
Header   0-22
Payload  23-... (variable)

Header

The header is composed of:

Field          Bytes
Descriptor ID  0-15
PT             16
TTL            17
Hops           18
Length         19-22

where:

. descriptor ID is the unique identifier;

. PT is the payload type;

. TTL is a counter decremented at each hop crossed;

. Hops is a counter incremented at each hop crossed;

. Length is the field that specifies the length of the payload (being variable, it is not known a priori).
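As a rough illustration of this layout, the 23-byte header can be packed and unpacked with Python's `struct` module. This is a minimal sketch: the little-endian byte order of the length field, the payload type value 0x00 and the helper names `pack_header` and `forward` are assumptions made here and are not stated in these notes.

```python
import os
import struct

# 16-byte descriptor ID, then payload type, TTL and hops (1 byte each),
# then a 4-byte payload length. "<" (little-endian, no padding) is an assumption.
HEADER = struct.Struct("<16sBBBI")


def pack_header(descriptor_id, payload_type, ttl, hops, length):
    return HEADER.pack(descriptor_id, payload_type, ttl, hops, length)


def forward(raw_header):
    """Return the header as re-sent by a relaying peer, or None if the TTL expires."""
    descriptor_id, payload_type, ttl, hops, length = HEADER.unpack(raw_header)
    if ttl <= 1:
        return None  # the message must not travel any further
    # TTL is decremented and Hops incremented at each hop crossed.
    return HEADER.pack(descriptor_id, payload_type, ttl - 1, hops + 1, length)


# A ping-like message: random 16-byte ID, hypothetical type 0x00, no payload.
header = pack_header(os.urandom(16), 0x00, 7, 0, 0)
relayed = forward(header)
```

Note that TTL plus Hops stays constant along the path, so a receiver can recover the initial TTL chosen by the sender.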

Payload

Ping This message has no payload.


Pong

Field        Bytes
Port number  0-1
IP address   2-5
Num. files   6-9
Num. KB      10-13

The last two fields represent the sharing capability of the node (in number of files and KB): this information helps to decide which peer it is convenient to connect to.

Query

Field             Bytes
Min. speed        0-1
Search criterion  2-... (variable)

where:

. minimum speed is the minimum rate at which the peer wants to receive the file (measured in kbit/s);

. search criterion is the field that contains the information used to search for the content; since the protocol does not constrain it, each application can specify its own policy, which is a good choice because the more general the search criterion, the easier the search.

Query Hit

Field        Bytes
Num. hits    0
Port number  1-2
IP address   3-6
Speed        7-10
Result set   11 to N-1 (variable)
Servent ID   N to N+15

where:

. num. hits represents how many contents satisfy the search;

. speed represents the minimum speed (see the query message);

. each entry of the result set contains:

Field       Bytes
File index  0-3
File size   4-7
File name   8-... (variable)


Push If the node that holds the file is behind a firewall, the requesting servent is not able to contact it: in this situation the requester sends a push message to its neighbors. Once the message reaches the final node (again by flooding), the connection between the two servents is opened by that peer and not by the requester. A push message is composed of:

Field        Bytes
Servent ID   0-15
File index   16-19
IP address   20-23
Port number  24-25

1.8.3 Characteristics

Network aspects

From the network point of view, the main characteristics to keep in mind are:

. scalability with the number of peers: the system scales very well because it is completely distributed;

. robustness with respect to churning/failures: the system is very robust to both churning and failures because maintenance is realized with flooding and the connectivity is very high.

User aspects

From the users' point of view, the main characteristic of interest is the efficiency, or response time. It depends on the popularity of the content:

. if it is very or fairly popular, the hit will probably happen before the TTL reaches 0;

. if it is not popular, there is no guarantee of finding the content before the TTL reaches 0.

In the first case efficiency is guaranteed, while in the second it is not.

Costs

Since there is a lot of traffic to deal with, from the network point of view the protocol is extremely costly: this is the main drawback of Gnutella.

For users, the algorithm is simple and cheap in terms of resources consumed, since the storage devoted to the protocol is small. The only things to manage are:

. neighbors;

. cache.


1.8.4 Performance evaluation

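To evaluate the performance of Gnutella, the analysis focuses on the flooding procedure.

[Figure: flooding tree rooted at A; A contacts B, C and D; B contacts E and F; C contacts G and H; D contacts I and L. Each arrow color represents a different step of the procedure, which forms a sort of tree.]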

To perform the analysis, some parameters have to be defined first:

. κ is the number of neighbors of each peer (in the previous picture κ = 3: for example, A can contact B, C and D, while C can contact G, H and A); it is assumed constant;

. H is the number of hops: it represents the depth (number of levels) of the tree;

. N is the number of peers;

. T is the average time to contact a peer; it is a random variable depending on:

  . layers 3-4;

  . physical distance;

  . number of routers crossed;

  . possible congestion in the network;

. p represents the popularity of the file: it is the probability that a given peer holds that content.


Number of contacted peers

Since κ is assumed to be constant, at each level of the tree each node can contact exactly κ other nodes; to approximate the number of contacted peers c, the following assumptions are made:

. common neighbors are neglected, therefore at the second level κ·(κ−1) peers are contacted (all the children in the tree, excluding the parent);

. the value κ·(κ−1) is approximated by κ².

In conclusion, summing the peers reached at each step:

$$c = \kappa + \kappa^2 + \kappa^3 + \dots + \kappa^H$$

It is possible to rewrite the expression as:

$$c = \sum_{i=1}^{H} \kappa^i$$

Example Taking values for H and κ it is possible to determine realistic values for c:

$$\kappa = 4,\ H = 7 \implies c \cong 22\,\mathrm{k}$$

If the message is a ping, peers will answer with a pong; therefore, for each ping in a scenario like the preceding one, about 44k messages are exchanged.
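The figures of the example can be checked with a few lines of Python. This is only a numerical sketch of the approximation above; the function name `contacted_peers` is chosen here for illustration.

```python
def contacted_peers(kappa, hops):
    """c = sum of kappa**i for i = 1..hops (the approximation used in the notes)."""
    return sum(kappa ** i for i in range(1, hops + 1))


c = contacted_peers(4, 7)   # 21844, i.e. about 22k peers
messages = 2 * c            # one ping plus one pong per contacted peer, about 44k
```

The sum is dominated by its last term (4^7 = 16384), which shows that most of the traffic is generated at the deepest level of the flooding tree.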

Time needed to contact peers

To compute it, an assumption has to be made first: at each level of the tree the time to contact a peer (from the parent node to its children) is fixed and equal to T. Implicitly, this means that the time required to send the messages sequentially is considered negligible with respect to the time needed to reach the neighbors.

Under that assumption, and considering each level of the tree independent, propagation occurs in parallel, so:

$$\mathrm{Avg}\{\text{time}\} = H \cdot T$$

In a time $H \cdot T$, $\kappa^H$ nodes are reached.

Example Considering:

$$H = 7,\ T \cong 200\ \text{ms} \implies \mathrm{Avg}\{\text{time}\} = 0.2 \cdot 7 = 1.4\ \text{s}$$

Therefore, it is possible to say that the response from a huge number of peers arrives quite quickly.

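Footnotes: the terms of the sum for c correspond to the first step, the second step, the third step, up to the H-th level of the tree; in the average time, H is the number of hops and T is the time to cross a hop.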


Probability of not finding a content

This is an inefficiency of the system perceived by users. In general, the number of copies of a given content with popularity p is N·p, under the assumption that each peer independently holds the content with probability p.

Denoting by c the number of contacted peers, the probability of not finding the content is:

$$P(\text{not find}) = (1-p)^c$$

Choosing a target F below which P(not find) must be kept:

$$P(\text{not find}) < F \implies (1-p)^c < F$$

Taking the logarithm (log(1−p) is negative, so dividing by it reverses the inequality):

$$c \cdot \log(1-p) < \log F \implies c > \frac{\log F}{\log(1-p)}$$

Example Considering κ = 4:

Value of H   Value of c
1            4
2            20
3            84
4            340
5            1364
6            5460
7            21844

Keeping κ = 4 and considering F = 0.01:

$$p = 0.05\ (5\%) \implies c > 90, \text{ take } H = 4$$

$$p = 0.01\ (1\%) \implies c > 458, \text{ take } H = 5$$
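The bound on c and the corresponding depth H can be reproduced numerically. The sketch below is an illustration only; the function names `min_contacts` and `min_hops` are hypothetical, and c is computed with the same sum of powers of κ used in the table.

```python
import math


def min_contacts(p, target):
    """Smallest integer c such that (1 - p)**c < target."""
    c = math.ceil(math.log(target) / math.log(1.0 - p))
    if (1.0 - p) ** c >= target:   # guard against the boundary case
        c += 1
    return c


def min_hops(kappa, p, target):
    """Smallest depth H whose number of contacted peers reaches min_contacts."""
    needed = min_contacts(p, target)
    hops, reached = 0, 0
    while reached < needed:
        hops += 1
        reached += kappa ** hops
    return hops


h_popular = min_hops(4, 0.05, 0.01)   # 4 hops are enough
h_rare = min_hops(4, 0.01, 0.01)      # 5 hops are needed
```

Making the content five times less popular costs only one extra hop, but that hop multiplies the number of contacted peers (and hence the traffic) by roughly four.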


Performance

Performance here mainly means the average number of hops that have to be crossed before the first hit. For example:

$$P(1) = P(\text{find the file at the first hop}) = 1 - (1-p)^{\kappa}$$

Continuing:

$$P(2) = (1 - P(1)) \cdot \left[1 - (1-p)^{\kappa^2}\right]$$

$$P(3) = (1 - P(1)) \cdot (1 - P(2)) \cdot \left[1 - (1-p)^{\kappa^3}\right]$$

The average time to send a request is:

$$\left(\sum_{i=0}^{H} i \cdot P(i)\right) \cdot T$$

The average time to receive an answer is:

$$\left(\sum_{i=0}^{H} i \cdot P(i)\right) \cdot 2T$$
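The recursion for P(i) and the two averages can be evaluated with a short script. This sketch simply follows the formulas as written in these notes; the function names and the sample values (κ = 4, p = 0.05, H = 7, T = 0.2 s) are chosen here for illustration.

```python
def first_hit_probabilities(kappa, p, max_hops):
    """P(1), ..., P(max_hops), following the recursion given in the notes."""
    probs = []
    not_yet = 1.0   # product of (1 - P(j)) over the previous hops j
    for i in range(1, max_hops + 1):
        hit_here = 1.0 - (1.0 - p) ** (kappa ** i)
        probs.append(not_yet * hit_here)
        not_yet *= 1.0 - probs[-1]
    return probs


def average_times(kappa, p, max_hops, t_hop):
    """Average time to send a request and to receive the answer."""
    probs = first_hit_probabilities(kappa, p, max_hops)
    # The i = 0 term of the sum contributes nothing, so it is skipped.
    mean_hops = sum(i * prob for i, prob in enumerate(probs, start=1))
    return mean_hops * t_hop, mean_hops * 2 * t_hop


probs = first_hit_probabilities(4, 0.05, 7)
send_time, answer_time = average_times(4, 0.05, 7, 0.2)
```

The answer time is twice the request time because the query hit travels back along the reverse path, crossing the same number of hops.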

1.9 Chord

Chord is a structured system (at the overlay level), which implies that churning is a big issue since the topology is fixed. The choice of the topology is therefore very relevant: it cannot be a star because, in general, a P2P system has no role distinctions like the one a star topology introduces with its central node. Moreover, other regular structured topologies are not so good either, since they introduce a notion of priority based on the geographical position. The topology actually used is a ring.

The attention must be focused on the P2P technology, so the application layer and the network layer are not considered; as a diagram, the stack is:

[Figure: stack with three layers, from top to bottom: Application, P2P Technology, Layer 3/4]


The P2P technology covers features such as overlay creation and maintenance, the join operation and the management of messages.

Chord is similar to Gnutella in that it is a protocol, but it distributes the information about contents rather than the requests for a given file. For example, the peer that knows where a certain content is located may not be the one that holds it: the two aspects are completely separate.

1.9.1 Analysis

A regular structure like the ring implicitly gives knowledge about the distance between nodes. This is very useful for the join operation: a new peer that wants to connect just has to know in which position it should be placed. The distance is not the physical one, which would be too complex to manage. Moreover, physical distance would introduce differences between peers: if the application running this protocol becomes very popular in a given country, the nodes of that country will be physically close to each other compared with a node in another country, and the density would be different.

On the contrary, defining distance at the overlay level allows peers that are physically very far apart to be considered neighbors. Nodes are placed on the ring by applying a function F to a list of information about the peer: the outcome is deterministic and uniformly distributed over an interval. This outcome is a number represented in bits: the ring usually uses m bits and, consequently, the interval is divided into $2^m - 1$ parts.

[Figure: Peer Info -> F -> Node Id]

The function F is realized with a cryptographic hash function (SHA-1):

. because it makes it difficult to obtain the peer information list from the Node Id;

. because it maps a lot of information into a uniformly distributed space, avoiding proximity among peers;

. although the mapping looks random over the interval $[0, 2^m - 1]$, the function is deterministic, so given two identical inputs it provides the same output (collisions are possible).
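A function with these properties can be sketched with the SHA-1 implementation in Python's `hashlib`. This is a sketch under assumptions: the peer information string (an "ip:port" pair) and the reduction of the digest modulo 2^m are illustrative choices made here, and the name `ring_id` is hypothetical.

```python
import hashlib


def ring_id(info: str, m: int) -> int:
    """Map an arbitrary string to an identifier in [0, 2**m - 1] using SHA-1."""
    digest = hashlib.sha1(info.encode("utf-8")).digest()   # 160-bit digest
    # Assumption: the digest is reduced modulo 2**m to fit an m-bit ring.
    return int.from_bytes(digest, "big") % (2 ** m)


# Hypothetical peer information: "ip:port".
node_a = ring_id("192.0.2.10:6346", m=6)
node_b = ring_id("192.0.2.11:6346", m=6)
```

Two peers with adjacent IP addresses generally land in unrelated positions of the ring, while hashing the same input twice always returns the same identifier. With a small m such as 6, distinct peers can easily collide on the same identifier, which is why m is chosen large in practice.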

The Node Id represents the final position of the peer on the ring; thanks to that topology, each peer has just two neighbors, called predecessor (i − 1 in the following picture) and successor (i + 1 in the picture); it is therefore possible to define the neighbors as the closest active peers of the considered node (i).

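[Figure: ring of identifiers from 0 to $2^m - 1$, showing node i with its predecessor i − 1 and its successor i + 1]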

Join

With what has been described so far, the join operation takes place in the following steps:

. the new node applies the function F to its peer information list, obtaining its own position (N7);

. it has to know another peer and contact it (N24);

. this peer contacts its successor, and so on until the right position of the new node is reached;

. when the successor and predecessor of the new node are found, the connection is established and the node becomes a peer.

Graphically:

[Figure: join of node N7 through node N24, shown in successive ring snapshots]


How information is distributed

Unlike Gnutella, in Chord the information about where contents are located is distributed among peers. Each peer knows that information thanks to keys, which are generated by applying a function G to metadata (data that concisely describe the content). Graphically:

[Figure: Metadata -> G -> Key]

Keys are values generated with the same properties as Node Ids, therefore they are uniformly distributed over the same interval $[0, 2^m - 1]$. It is important to remark that F and G, starting from different inputs (peer information list and metadata), map different kinds of outputs (Node Ids and keys) into the same interval.

To associate keys with Node Ids, each key is assigned to the nearest peer that follows the key value on the ring.
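The assignment rule can be sketched as a search for the first Node Id at or after the key, wrapping around the end of the ring. The node identifiers and keys below are borrowed from the example ring of section 1.9.2; the function name `successor` mirrors the notation of the notes, but its signature is an assumption of this sketch.

```python
import bisect


def successor(node_ids, value, m):
    """First node ID >= value on the ring, wrapping past 2**m - 1 back to the start."""
    ring = sorted(node_ids)
    i = bisect.bisect_left(ring, value % (2 ** m))
    return ring[i % len(ring)]


nodes = [4, 8, 14, 21, 32, 39, 42, 48, 51, 56]
owners = {key: successor(nodes, key, m=6) for key in (10, 24, 30, 38, 54, 58)}
```

Key 58 has no node after it before the end of the identifier space, so it wraps around and is assigned to node 4.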

Queries

When node N wants to retrieve a content, it runs the function G over the metadata, obtaining the key. Since it knows only its neighbors, it forwards the query to them, and the query is forwarded again at each step. In this way, sooner or later the peer that holds the key searched for by N is found.

If there are n peers overall, the expected number of hops needed to find the one with the right key is n/2. This holds because both keys and Node Ids are uniformly distributed. The complexity is therefore quite high compared with Gnutella, but Chord guarantees that the content is found (in Gnutella it depends on its popularity).

Shortcuts The query process is improved by using shortcuts: in practice each node does not only know its neighbors, but also knows the location of more peers. Those peers are not chosen randomly but with a specific rule: each time, the space in which a file may be searched must be divided into two parts. The graphical explanation is:


The principal advantage of using shortcuts is that the search, instead of being linear (complexity n), becomes dichotomic, and the complexity is therefore log n. The main drawback is that a sort of routing table is required: in Chord it is called the finger table. For a given node N, it has m entries and is built as:

Index  Value          Successor
1      $N + 2^0$      successor($N + 1$)
...
i      $N + 2^{i-1}$  successor($N + 2^{i-1}$)
...
m      $N + 2^{m-1}$  successor($N + 2^{m-1}$)

The value of m is critical: if it is large, the probability of conflicts (the same output value obtained by applying the function to different inputs) is negligible; on the other hand, high values of m imply:

. a large number of bits used;

. a long finger table.
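The construction of a finger table can be sketched as follows, for a ring with m = 6 whose node identifiers are the ones used in the example of section 1.9.2. The helper names `successor` and `finger_table` are chosen here for illustration, and values are reduced modulo 2^m so that they wrap around the ring.

```python
import bisect


def successor(ring, value):
    """First node ID >= value in the sorted list `ring`, wrapping around."""
    i = bisect.bisect_left(ring, value)
    return ring[i % len(ring)]


def finger_table(n, ring, m):
    """Rows (index, value, successor) with value = (n + 2**(i-1)) mod 2**m."""
    rows = []
    for i in range(1, m + 1):
        value = (n + 2 ** (i - 1)) % (2 ** m)
        rows.append((i, value, successor(ring, value)))
    return rows


ring = [4, 8, 14, 21, 32, 39, 42, 48, 51, 56]
table_n8 = finger_table(8, ring, m=6)
```

The first entries point to nearby nodes and the last one jumps half a ring ahead, which is what makes each forwarding step able to halve the remaining search space.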

1.9.2 Example

Consider the following picture, with m = 6 bits and therefore $2^6 = 64$ identifiers:

[Figure: Chord ring with nodes N4, N8, N14, N21, N32, N39, N42, N48, N51, N56; key K10 at N14, keys K24 and K30 at N32, key K38 at N39, key K54 at N56]

Consider the case in which N8 is looking for K54. The finger table of N8 is:


Index  Value     Successor
1      8+1=9     N14
2      8+2=10    N14
3      8+4=12    N14
4      8+8=16    N21
5      8+16=24   N32
6      8+32=40   N42

In this case the query is forwarded to N42, the nearest peer preceding the key; the finger table of N42 is:

Index  Value                 Successor
1      42+1=43               N48
2      42+2=44               N48
3      42+4=46               N48
4      42+8=50               N51
5      42+16=58              N4
6      42+32=74 mod 64=10    N14


At this point, the closest peer is N51; its finger table is:

Index   Value                    Successor
1       51+1=52                  N56
2       51+2=53                  N56
3       51+4=55                  N56
4       51+8=59                  N4
5       51+16=67=3 (mod 64)      N4
6       51+32=83=19 (mod 64)     N21

Since the key lies between the values 53 and 55, the peer selected is N56: the key is found in three hops (N8 → N42 → N51 → N56).
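The example above can be replayed with a short Python sketch. It is only an illustration under simplifying assumptions: the whole ring is known globally (a real peer only knows its own finger table), and the names `successor`, `finger_table`, `in_interval` and `lookup` are hypothetical helpers, not part of any Chord library.

```python
M = 6                      # number of identifier bits, as in the example
RING = 2 ** M              # 64 identifiers
NODES = sorted([4, 8, 14, 21, 32, 39, 42, 48, 51, 56])  # ring of the example


def successor(ident):
    """First node whose ID is >= ident, wrapping around the ring."""
    ident %= RING
    for node in NODES:
        if node >= ident:
            return node
    return NODES[0]


def finger_table(node):
    """Entry i (1-based) is successor(node + 2^(i-1))."""
    return [successor(node + 2 ** (i - 1)) for i in range(1, M + 1)]


def in_interval(x, a, b):
    """True if x lies strictly between a and b going clockwise."""
    x, a, b = x % RING, a % RING, b % RING
    if a < b:
        return a < x < b
    return x > a or x < b


def lookup(start, key):
    """Return the list of nodes visited while searching for key."""
    path = [start]
    node = start
    while True:
        fingers = finger_table(node)
        succ = fingers[0]
        # The immediate successor is responsible for keys in (node, succ].
        if key == succ or in_interval(key, node, succ):
            path.append(succ)
            return path
        # Otherwise jump to the farthest finger that does not pass the key.
        nxt = next((f for f in reversed(fingers) if in_interval(f, node, key)), succ)
        path.append(nxt)
        node = nxt


print(finger_table(8))
print(lookup(8, 54))
```

The second call follows the same path as the text: N8 forwards to N42, N42 to N51, and N51 hands the query to N56.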

Join procedure with shortcuts

If a new node wants to connect to the P2P application, it runs the function F to discover its node ID: assume it is N26. In the example, it has to be placed between N21 and N32. If, for example, it contacts N4 to discover its successor and predecessor, the search uses shortcuts, exactly like a query: first the successor of N26 is found (N32), and then, by contacting N32, it is possible to discover N21, which will be the predecessor of N26 but at the moment is still the predecessor of N32. After this preliminary step, all finger tables have to be updated.

Procedure

1. ask some node to retrieve successor(n) and predecessor(n);

2. create the finger table of n and update the finger tables of the other nodes; the update operation is very complex;

3. redistribute the keys.

1.9.3 Issues

A possible consistency problem arises when finger tables are updated: for example, if a node is searching for a key held by a given node N, but the finger tables that point to N have not been updated, the content will not be found.

Another issue is the failure of a peer. When a peer is simply switched off, notifications are sent to the other nodes; but if a node fails abruptly, how can notifications be sent?

To avoid some of these issues, it is possible to introduce some redundancy: each node maintains a list of several successors, and not only the knowledge of one predecessor and one successor. If, for some reason, the immediate successor fails, the node contacts one of the other successors.

Stabilization procedure

It is run periodically: each peer n asks its successor n+1 who its predecessor is; if the answer is n, peer n is actually the predecessor of n+1. Otherwise, if the answer is a different peer p, two anomalies are possible:

1. p > n, that is, the order on the ring is n, p, n+1: the information held by n is wrong, and node n has to update its finger table, since its successor is p and not n+1;

2. p < n, that is, the order on the ring is p, n, n+1: the information held by n+1 is wrong, and node n+1 has to update its finger table, since its predecessor is n and not p.
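The two anomalies can be sketched in Python as follows. This is a minimal illustration, assuming a 64-identifier ring as in the earlier example; the `Node` class, the `between` helper and the returned strings are hypothetical and only model the successor and predecessor pointers, not the full finger table.

```python
RING = 2 ** 6  # identifier space assumed for this sketch


class Node:
    def __init__(self, ident):
        self.id = ident
        self.successor = self      # a lone node is its own successor
        self.predecessor = None


def between(x, a, b):
    """True if x lies strictly between a and b going clockwise."""
    x, a, b = x % RING, a % RING, b % RING
    if a < b:
        return a < x < b
    return x > a or x < b


def stabilize(n):
    """One stabilization round run by node n."""
    succ = n.successor
    p = succ.predecessor
    if p is n:
        return "consistent"
    if p is not None and between(p.id, n.id, succ.id):
        # Case 1: p sits between n and its successor, so p is the real successor.
        n.successor = p
        return "successor updated"
    # Case 2: the successor points to a stale (or missing) predecessor.
    succ.predecessor = n
    return "predecessor updated"
```

For instance, right after N26 joins between N21 and N32, a round run by N21 falls in case 1 and makes N21 point to N26.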

1.9.4 Load balance

The amount of work that each peer has to deal with depends on how keys are associated to nodes. Let A and B be two consecutive nodes on the ring, and let x be:

\[ x = \frac{B - A}{2^m} \]

This parameter x is simply the fraction of the ring that peer B is in charge of; the larger x is, the larger the number of keys assigned to B can be, so that node has to deal with a large amount of work. In other words, x is the probability that B stores a given key: since keys are uniformly distributed over the space (normalized to the interval from 0 to 1 in the picture below), the probability of holding a key is proportional to the portion of space that a node is in charge of:

[Figure: the interval from 0 to 1 with a segment of length x.]

Assuming that there are κ keys in the system, the probability that B holds no keys is:

\[ P(\text{B has no keys}) = (1 - x)^{\kappa} \]

while the probability that B holds exactly i keys is:

\[ P(\text{B has } i \text{ keys}) = \binom{\kappa}{i} \cdot x^{i} \cdot (1 - x)^{\kappa - i} \]

The distribution of that probability is something like:

[Figure: distribution f(n) of the number n of keys held by a peer, with region 1 on the left tail and region 2 on the right tail.]

Region 1 represents nodes that hold few keys, while region 2 describes peers with a huge amount of work to deal with; since the distribution is symmetric with a low variance, the load is assigned quite fairly to nodes.

The mean number of keys stored in peer B is:

E[# keys] = κ · x

and, if there are N active peers, since they are uniformly distributed over the ring:

\[ x = \frac{1}{N} \]

Therefore:

\[ E[\#\,\text{keys}] = \kappa \cdot \frac{1}{N} = \frac{\kappa}{N} \]
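The binomial model of this section can be checked numerically with a small Python sketch. The function names and the sample values (20 keys, x = 0.1) are assumptions chosen only for illustration.

```python
from math import comb


def prob_keys(i, kappa, x):
    """Probability that a peer owning a fraction x of the ring holds exactly i of kappa keys."""
    return comb(kappa, i) * x ** i * (1 - x) ** (kappa - i)


def expected_keys(kappa, n_peers):
    """Mean number of keys per peer when the N peers are uniformly spread (x = 1/N)."""
    return kappa / n_peers


kappa, x = 20, 0.1  # illustrative values
distribution = [prob_keys(i, kappa, x) for i in range(kappa + 1)]
mean = sum(i * p for i, p in enumerate(distribution))
print(round(sum(distribution), 6), round(mean, 6))
```

Summing the probabilities over all i gives 1, and the mean of the distribution agrees with κ · x, as stated above.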

A fair assignment of keys to nodes on average is not necessarily a good thing: if, for example, peer A has much more bandwidth than peer B, it would be better to assign more keys to A in order to provide a better service to all users.

1.9.5 Comparison between Chord and Gnutella

                            Chord                    Gnutella
scalability                 very good                very good
robustness (to churning)    poor                     very good
overlay maintenance         complex/less costly      simple/costly
performances (users)        service guaranteed       no service guaranteed
responsiveness              O(log n)                 O(H)
performances (network)      efficient (shortcuts)    inefficient (flooding)
                            O(log n)                 O(κH)
node: complexity            small                    very small
node: storage size          order of m               order of κ
node: load                  balanced                 depends on κ
node: contents              no user dependency       user dependency

Robustness in Chord is poor since the routing is deterministic (shortcuts): if churning is high, updating finger tables causes consistency problems. Indeed, structured systems suffer from an intrinsic issue, due to the fact that peers have a fairly large knowledge of the topology: the amount of state information is high, and it has to be very accurate, otherwise the system will not be reliable.

The response time is similar for both protocols, but they are not really comparable, because one is a structured system and the other is unstructured: Chord uses deterministic routing to find contents, while Gnutella uses flooding.


1.10 CAN

CAN (Content Addressable Network) uses the same basic approach as Chord: peers, thanks to a hash function, are mapped onto a space, like keys. Moreover, the space is the same for both keys and peers; the main difference is that the space is not one-dimensional as in Chord, but can have d dimensions.

[Figure: the function F maps peer info to node IDs; the function G maps contents to keys.]

For example, with d = 2, the space has two dimensions identified by two coordinates:

[Figure: two-dimensional space with axes x and y.]

Keys are assigned to peers on the basis of distance: the space is divided fairly among peers and each one controls its own region. This implies that all keys placed in a given region are assigned to the peer that is in charge of that region. Graphically, peers are marked in blue and keys in orange:


1.10.1 Routing

When a peer is looking for a given key, it follows the shortest path to contact the peer that is in charge of the region where the key is placed. Implicitly, this means that peers have a detailed knowledge of their neighbors (kept in a routing table): indeed, to follow the shortest path, a peer has to choose, among its neighbors, the best one to contact in order to reach the key.

1.10.2 Join

Once a new host has run the hash function, it knows its final position in the space. First it has to download a list of active peers, for example from a web page. Then it contacts one of them: this node, by contacting its neighbors, determines the position of the new peer in the same way in which queries are performed. When the right position is discovered, the node that is in charge of that region has to partition it, assigning a portion to the new node. Regions describe the load that each peer deals with, therefore a large area means a high load. Graphically, the pictures show the scenario before and after the arrival of a new peer (marked in yellow):

[Figure: the space before the join, with a region owned by A, and after the join, with that region split between A and B.]

At first, peer A was in charge of a huge area with 2 keys. After the arrival of peer B, the area has been reduced, and nodes A and B have to deal with one key each. In practice, step 3 of the Chord join procedure (redistribution of keys) is realized implicitly, just by dividing the area.

It could happen that the hash function returns very similar values for two different peers: in this scenario it is possible that one of the two nodes is in charge of a region although it does not physically belong to that region. For example:

[Figure: nodes A and B placed close together; the yellow region is assigned to B although B lies outside it.]

B is in charge of the yellow region although it does not belong to it. This phenomenon is due to the fact that the algorithm tries to obtain a fair distribution of the load and, therefore, to divide areas regularly.

1.10.3 Performances

The complexity of a query request or a join can be evaluated by means of the average path length:

\[ \mathrm{AVG}\{\text{path length}\} = \frac{d}{4} \cdot n^{1/d} \]

The formula says that, to keep the complexity from being too high, d must be taken sufficiently large; but a large value of d means having many dimensions and, therefore, many neighbors to contact each time a message is sent.

The parameter d is much more critical than the parameter m analyzed in Chord: indeed, the complexity in Chord grows as log n independently of m, while the complexity of CAN is directly determined by the value of d.
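To get a feeling for the trade-off, the formula can be evaluated for a few values of d. This is only a sketch: the function names are hypothetical, and the neighbor count of 2d per node is an assumption taken from the usual description of a uniformly partitioned CAN, not from these notes.

```python
def can_avg_path_length(n, d):
    """Average path length (d / 4) * n^(1/d) for n peers and d dimensions."""
    return (d / 4) * n ** (1 / d)


def can_neighbors(d):
    """Assumed number of neighbors per node in a uniformly partitioned space."""
    return 2 * d


n = 10_000  # illustrative number of peers
for d in (2, 4, 8):
    print(d, round(can_avg_path_length(n, d), 2), can_neighbors(d))
```

With the same number of peers, raising d shortens the average path but increases the number of neighbors each node has to track.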

1.10.4 Leaving of a node and failures

When a node leaves, notifications must be sent to its neighbors in order to decide which of them has to take care of the leaving peer's region. Periodically, peers send messages containing information to their neighbors: among this information there is also the size of their area. Indeed, the criterion that peers use to take over a region is simple: the neighbor with the smallest area becomes the new owner. This is done to maintain some uniformity in the space.

When a message is sent and a timeout expires without any notification having been received, the peer realizes that some problem has occurred. To recover, a timer is started and the peer waits for further information about the neighbor that seems to have failed. If nothing arrives, the takeover procedure takes place. The timer is proportional to the area owned by the neighbor of the node that seems to have failed; therefore, being in charge of a small area allows a peer to enter the recovery procedure quickly. The takeover consists of:

. sending pickover messages to all neighbors of the node that is assumed to have failed (this implies that each peer also knows the neighbors of its neighbors);

. assigning the area of the failed node to one of them.

All these management mechanisms are asynchronous and are needed only in structured systems, which are very complex to manage.
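The timer rule can be illustrated with a tiny Python sketch. The proportionality constant `scale`, the function names and the sample areas are assumptions made only for this example.

```python
def takeover_timer(area, scale=1.0):
    """Timer started by a neighbor: proportional to the area it owns."""
    return scale * area


def first_to_take_over(neighbor_areas, scale=1.0):
    """Neighbor whose timer expires first, that is, the one with the smallest area."""
    timers = {peer: takeover_timer(area, scale) for peer, area in neighbor_areas.items()}
    return min(timers, key=timers.get)


# Areas owned by the neighbors of the node that seems to have failed (hypothetical values).
areas = {"A": 0.25, "B": 0.125, "C": 0.5}
print(first_to_take_over(areas))
```

Because the timer grows with the owned area, the neighbor with the smallest region starts the takeover first, which matches the criterion of giving the region to the least loaded neighbor.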

1.11 Tapestry

Tapestry adopts the same method as Chord and CAN: peers and keys are mapped onto the same space. The peculiarity is that the space is made of 160 bits, organized into 40 hexadecimal digits.

To know the distance between nodes, the digits that represent the peers are compared; for example, considering:

Node 4227:

. Node 4228 has distance 1, so it is a Layer 4 neighbor (1 digit different);

. Node 42A2 has distance 2, so it is a Layer 3 neighbor (2 digits different);

. Node 43C9 has distance 3, so it is a Layer 2 neighbor (3 digits different);

. Node 6FA0 has distance 4, so it is a Layer 1 neighbor (4 digits different).

Therefore:

. Layer 4: 422x;

. Layer 3: 42xx;

. Layer 2: 4xxx;

. Layer 1: xxxx.

where x is any hexadecimal digit (from 0 to F).
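The layer patterns above amount to counting how many leading digits two identifiers share. A minimal Python sketch, assuming 4-digit identifiers as in the example (the function names are hypothetical):

```python
def shared_prefix_len(a, b):
    """Number of leading digits that two identifiers have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def layer(node, other):
    """Layer of `other` as seen from `node`: shared prefix length plus one."""
    return min(shared_prefix_len(node, other) + 1, len(node))


for other in ("4228", "42A2", "43C9", "6FA0"):
    print(other, layer("4227", other))
```

For the four neighbors of node 4227 listed above, this reproduces layers 4, 3, 2 and 1 respectively.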

If each digit value corresponds to a known peer, the knowledge of the nodes near the considered one is very detailed, while it becomes coarser going farther away: this mechanism is called mesh routing and reduces complexity.


Routing

It is very similar to longest prefix match: if peer 5230 queries 42A1:

[Figure: the query travels 5230 → 400F → 4277 → 42A2 → 42A1, crossing layers L1, L2, L3 and L4.]

The search space shrinks as the query goes deeper into the layers, but this advantage has a cost: the maintenance of tables that are potentially large. If β is the base of the digits, the complexity is $O(\log_\beta n)$.
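The routing example can be mimicked with a toy Python sketch in which each hop resolves one more digit of the target. It assumes, unrealistically, that the list of nodes is globally known; `route` and `shared_prefix_len` are hypothetical helpers and not part of any Tapestry implementation.

```python
def shared_prefix_len(a, b):
    """Number of leading digits that two identifiers have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def route(source, target, nodes):
    """Hop to a node that shares a longer prefix with the target, one step at a time."""
    path = [source]
    current = source
    while current != target:
        matched = shared_prefix_len(current, target)
        candidates = [n for n in nodes if shared_prefix_len(n, target) > matched]
        if not candidates:
            return None  # hole in the table: no known peer for the next digit
        # Pick the candidate that extends the match by the smallest amount.
        current = min(candidates, key=lambda n: shared_prefix_len(n, target))
        path.append(current)
    return path


known = ["5230", "400F", "4277", "42A2", "42A1"]
print(route("5230", "42A1", known))
```

The `None` branch corresponds to a table entry with no peer associated to the required digit.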

It could happen that the table is not completely full: this means that some digits are not associated to any peer. This is very risky, because the algorithm was designed for a stable number of peers, which implies that it is not robust to churning.

1.12 BitTorrent

BitTorrent is a very popular system and it is a bit different from the previously mentioned systems. The objective is to distribute very large files to a potentially high number of customers. The peculiar feature is that the content is not stored by a single user, but is distributed among peers that share their bandwidth to download it. The overlay, therefore, is designed for this purpose and not for making queries.

The content is divided into small pieces called chunks: to consume the file they all have to be downloaded, so, from a peer's point of view, they have the same importance. The usual size of chunks is around 64−256 kbit: they are quite small. The neighborhood (overlay) is established randomly, so peers are forced both to download (new chunks) and to upload (chunks they hold). Transmission takes place over TCP.

1.12.1 Analysis

The distributor that wants to share the file has to create a .torrent file by means of a hash function: indeed the .torrent is simply a file that indexes all chunks, including the hash keys that guarantee the correctness of the chunks and, therefore, of the file. The .torrent also contains other information, such as the file name, the file size, the number of chunks into which the file is divided, and the address of the tracker.
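A rough Python sketch of this idea is given below. It is not the real .torrent format: the dictionary layout, the field names, the choice of SHA-1 as hash function and the tracker address are all assumptions made for illustration.

```python
import hashlib


def make_torrent(name, data, tracker, chunk_size):
    """Build a toy index of the file: metadata plus one hash per chunk."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    return {
        "name": name,
        "size": len(data),
        "num_chunks": len(chunks),
        "tracker": tracker,
        "hashes": [hashlib.sha1(chunk).hexdigest() for chunk in chunks],
    }


def verify_chunk(torrent, index, chunk):
    """Check a downloaded chunk against the hash stored in the index."""
    return hashlib.sha1(chunk).hexdigest() == torrent["hashes"][index]


torrent = make_torrent("demo.bin", b"abcdefghij", "http://tracker.example", chunk_size=4)
print(torrent["num_chunks"], verify_chunk(torrent, 0, b"abcd"))
```

A peer that receives a chunk from a neighbor can recompute the hash and discard the chunk if it does not match the index.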

After the creation of the .torrent, the distributor has to upload it to a website from which peers can download it and start to receive the file. There is a central authority that maintains the list of active peers that are sharing the content: it is called the tracker. The tracker is not connected to the overlay; its purpose is just to help peers download the file and, for reliability, it is better to have more than one tracker managing the overlay for each file.

[Figure: (1) the distributor uploads the .torrent to the website; (2) peer A requests it and (3) downloads the .torrent; (4) A contacts the tracker and (5) receives the list of peers.]

The list downloaded from the tracker is usually composed of 40 peers: they will become the neighborhood of peer A.

Definitions

. seeders: peers that hold the whole content; they are very important for the good behaviour of the system, because every chunk can be downloaded from a seeder;

. leechers: peers that hold just a part of the content;

. swarm: the totality of peers (seeders and leechers) that share the file;

. choked peers: these nodes are not allowed to receive content from a given peer;

. unchoked peers: these nodes are allowed to receive content from a given peer.

Among the list of 40 peers downloaded from the tracker, the node selects just 4 peers: they are the ones it is actually in contact with.


1.12.2 Policies

This section describes the policies by which a peer selects the 4 nodes to exchange traffic with and how it selects the chunks to be downloaded.

Selection of chunks

Peers distribute a map that shows which chunks they hold; this map is sent to the peer's neighbors, so they can decide which chunk should be downloaded. The policy is simple: the rarest chunk is selected, and this is done for two reasons:

. avoid risks that a rare chunk disappears from the network;

. speed up the download.
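The rarest-first policy can be sketched in a few lines of Python. The data layout (a dictionary from neighbor name to the set of chunk indices it advertises) and the tie-breaking rule on the lowest index are assumptions made for this example.

```python
from collections import Counter


def rarest_first(my_chunks, neighbor_maps):
    """Pick the missing chunk that the fewest neighbors advertise."""
    counts = Counter()
    for chunks in neighbor_maps.values():
        counts.update(chunks)
    candidates = [c for c in counts if c not in my_chunks]
    if not candidates:
        return None  # nothing new is available in the neighborhood
    # Ties are broken by the lowest chunk index (an arbitrary choice).
    return min(candidates, key=lambda c: (counts[c], c))


maps = {"p1": {0, 1, 2}, "p2": {1, 2}, "p3": {1, 3}}
print(rarest_first({0}, maps))
```

In this example chunk 3 is advertised by a single neighbor, so it is requested before chunks 1 and 2.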

Chunks are subdivided into sub-blocks, each composed of around 10 TCP packets (∼ 16 kbit). If several neighbors have the same chunk, it is possible to open more TCP connections to download several sub-blocks (typically 5) in parallel. In this way a higher download bit rate is expected because the available bandwidth is enlarged: indeed, if the connection established for downloading one sub-block is very slow, the effect on the global rate is mitigated by the other connections.

Selection of peers

Actually BitTorrent introduces two overlays:

. one for the list of 40 peers downloaded from the tracker (green peers);

. a second one that contains the 4 peers (marked in orange) that a given peer (the blue one) is in contact with.

The following picture shows this concept:

[Figure: three stacked planes, from top to bottom: Overlay 2, Overlay 1, Physical network.]

The selection is based on the tit-for-tat technique: it depends on how much peers contributed in the past. The global advantage is that connections with large bandwidth are favoured, and the local advantage is that the system forces each peer to share more, because in this way it will receive a better service (avoiding free riders: peers that want just to download without contributing). In conclusion, tit-for-tat:

. improves cooperation among peers;

. provides fairness.

Due to tit-for-tat, there is the distinction between choked and unchoked peers: if a node has contributed very little in the past, it will probably be put in the choked list. Each peer has its own choked list, recomputed every time window (10 s for example), in which nodes are ordered by how much they shared: unchoked peers are put in the first positions.

The main drawback is that, at the beginning, each node would receive a very bad service, since it is not able to contribute much. This is avoided thanks to optimistic unchoking: each time, one choked peer is unchoked. Indeed, when a peer receives requests from others, the ones it serves are the peers that have lots of chunks (they have lots of rare chunks and can contribute very well to sharing). This means that the rarest-first approach cannot be used by beginning users: they have to choose the chunks to download randomly; then, when the number of chunks they hold is sufficiently high, they can start to use the rarest-first approach, since their contribution will be sufficient.
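The choking decision taken at each time window can be sketched as follows. This is a simplified model: the function name, the contribution values and the choice of adding the optimistic peer on top of the 4 regular slots are assumptions, not a description of the actual client.

```python
import random


def choose_unchoked(contributions, slots=4, rng=None):
    """Rank neighbors by past contribution; unchoke the best ones plus one random choked peer."""
    rng = rng or random.Random()
    ranked = sorted(contributions, key=contributions.get, reverse=True)
    unchoked = ranked[:slots]
    choked = ranked[slots:]
    # Optimistic unchoking: give one choked peer a chance regardless of its history.
    optimistic = rng.choice(choked) if choked else None
    return unchoked, optimistic


# Amount of data received from each neighbor in the last window (hypothetical values).
received = {"a": 50, "b": 40, "c": 30, "d": 20, "e": 10, "f": 0}
print(choose_unchoked(received, rng=random.Random(0)))
```

Peer "f", which has contributed nothing so far, can only be served through the optimistic slot; this is how a newcomer gets its first chunks.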

Tit-for-tat tries to improve fairness by balancing how much a peer can contribute against the service it desires, but it is possible that, due to the asymmetry of network flows, it reduces the performance of the system. Imagine that two peers are exchanging chunks belonging to the same content: if the two directions of the communication follow different paths, it is possible that one of them is bottlenecked. This implies that one of the two peers (A) has a very slow upload rate with respect to the other (B); therefore B cannot completely exploit its bandwidth, because the mechanism tries to punish A for its low contribution.

To improve efficiency and performance the end game mechanism has been introduced: for each chunk, the last sub-blocks are requested by the peer in broadcast from its neighbors. Once a positive answer is received, the other requests are aborted. This technique prevents an unlucky receiver from waiting too long for the download from a slow peer: indeed, since only one chunk at a time can be downloaded, waiting for just the last sub-blocks is a waste of time that can be avoided. The download is therefore sped up.


1.12.3 Case study: Flash Crowd

Suppose that a content is very popular and the purpose is to distribute it to the largest possible number of customers. Assume:

. the number of interested peers is $n = 2^\kappa$;

. two cases are considered:

1. a client/server scenario;

2. a scenario in which the content is redistributed by peers;

. the content distributed is an atomic entity;

. all peers have the same upload bandwidth b.

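If the size of the content is s, the time needed to download/upload the content is:

\[ T = \frac{s}{b} \]

Plotting on the x axis the number of peers reached at each step and on the y axis the time:

[Figure: distribution tree, peers versus time; the number of peers holding the content doubles at each step: 2 at time T, 4 at 2T, 8 at 3T, ..., $2^\kappa$ at κT.]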

Case 1

Considering the client/server scenario, the service capacity needed is:

[Figure: service capacity C(t) versus time t, with the level B marked.]

where B is the global capacity of the server, and B > b.

Case 2

In the other approach:

[Figure: service capacity C(t) versus time t, with the level b marked.]

This method is very effective: in a very short time, it reaches the capacity of the client/server approach.

Now consider the case of parallel download: each peer divides its upload bandwidth in two, so that two other peers can download the content simultaneously. In this case the time to complete a download is:

\[ T_x = \frac{s}{b/2} = \frac{2s}{b} = 2T \]

The graph will be:

[Figure: distribution tree with parallel download, peers versus time (T, 2T, 3T, 4T, ..., κT); 3 peers hold the content at 2T and 9 at 4T.]

If the content is a chunk, comparing the two graphs makes it immediately clear that it is better not to divide the bandwidth when distributing it: this speeds up the download, because more peers are reached in less time. Moreover, it now becomes clear why the size of chunks is kept small: if s is small, T is also small, and if the download time is small, redistribution takes place quickly, improving performance.
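The comparison between the two graphs can be reproduced with a short Python sketch. Time is measured in units of T and the counts include the source; the function names are hypothetical.

```python
def peers_sequential(steps):
    """Peers holding the content after `steps` periods of length T (full bandwidth to one peer)."""
    return 2 ** steps


def peers_parallel(steps):
    """Same question when the bandwidth is split in two: a round lasts 2T and triples the holders."""
    return 3 ** (steps // 2)


for steps in (2, 4, 6, 8):
    print(steps, peers_sequential(steps), peers_parallel(steps))
```

After the same elapsed time the sequential strategy has always reached at least as many peers, because doubling every T grows faster than tripling every 2T.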

The source (colored in blue in both graphs) is the peer that works for the longest time, whereas the peers involved in step (κ − 1), which is the most effective one because it reaches half of the peers interested in the content, work only for a short while: this implies that the potential bandwidth ($2^\kappa \cdot b$) is not completely exploited. A way to improve this is to have independent distribution trees: they represent the paths followed by chunks to reach peers.

The most effective step is, as mentioned before, the last one, because it reaches a large number of peers: this is one reason why the rarest chunk selection is implemented. Indeed, in the first steps the chunk is very rare, so it is better to distribute it, otherwise it can disappear from the network; at the end it is very popular and the risk of a loss is negligible.

1.13 Skype

Skype is a very popular system that adopts proprietary solutions; the design is therefore closed and everything is encrypted. What is known about this system has been obtained through reverse engineering. In this system, directories of people are distributed and managed only by super-peers. The reasons for its success are:

. very good design and high quality (also in the presence of NATs/firewalls);

. users are drawn to it because a lot of people already use it.

The overlay is hierarchical and distinguishes:

. peers;

. super-peers that are very well connected.

An example is:

[Figure: hierarchical overlay of super-peers and normal peers.]

Super-peers are chosen by election among normal peers, and it is possible to configure the software so that it is never elected; super-peers must have:

. a public IP address;

. bandwidth to share.

Super-peers are in charge of managing their normal peers: they know when peers are online or offline, and they help peers to find other contacts and to communicate in the presence of NATs/firewalls. However, each normal peer can contact more than one super-peer for reliability.

Users are not identified by their IP address, but by an identifier: this lets people use the application regardless of where they are. Indeed, they can use one PC at home and another one at the office, but for the application the user is the same. This is achieved through an authentication method: each time, the user has to declare his identity before being connected. Due to this fact, it is possible to distinguish two classes of signalling:

. one to login and to authenticate;

. one to look for other users.


In general, UDP is used as transport protocol: since the human voice requires a low bandwidth, to avoid fluctuations it is better to use UDP, which does not provide congestion control, although it is not reliable. Of course, when it is needed (in particular in the presence of NATs and firewalls), it is possible to use TCP; the signalling traffic, instead, is always sent over TCP.

A communication between two hosts that are not behind a NAT happens as follows:

. the initiator asks its super-peer for information (IP address and port number) about the peer it wants to talk with;

. the super-peer provides that information;

. a connectivity test takes place: the initiator tries to open a direct connection;

. if this is possible, they can start to communicate.

If the initiator is behind a NAT, the connectivity test fails, because the information retrieved by the super-peer differs from the actual information seen by the receiver: the answer is therefore negative, and the message specifies the current IP address and port number. By using those new parameters, the initiator appears as if it were not behind the NAT. If instead the destination is behind the NAT, it cannot be reached: the initiator therefore contacts the super-peer, asking that the destination be the one to start the conversation.

When both are behind a NAT, they also have to retrieve their public information from super-peers before starting the communication. It is possible to conclude that reachability in Skype is very high: indeed, super-peers can also work as relay nodes in the presence of NATs or firewalls; in this case the two links are completely independent and the transport protocols used can be different. The solutions discussed are called Simple Traversal of UDP through NATs (STUN) and Traversal Using Relay NAT (TURN).

With Skype it is also possible to reach the fixed telephone network (procedures called SkypeIn/SkypeOut) by using gateways: in this case the perceived quality is the same as that of the fixed telephone network, because a different codec is used (G.729). Usually the voice codec is selected from a list; the main features are:

. bit rate: 10−32 kbit/s;

. fixed inter packet gap (IPG): 30 ms.

Moreover, to deal with losses, Skype introduces redundancy.
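As a rough worked example, assuming a constant bit rate and ignoring headers and the redundancy mentioned above, the voice payload carried by each packet is the bit rate multiplied by the inter packet gap:

```latex
\[
10\ \text{kbit/s} \times 30\ \text{ms} = 300\ \text{bit},
\qquad
32\ \text{kbit/s} \times 30\ \text{ms} = 960\ \text{bit}
\]
```

Each packet therefore carries between about 38 and 120 bytes of voice, which is consistent with the low bandwidth requirement stated earlier.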

1.14 P2P Streaming systems

P2P streaming systems are systems that provide multimedia services; the fundamental assumption is that the user interested in the content consumes it in real time, that is, while downloading it. Several efforts are therefore made in this direction: avoiding service interruptions and reducing the delay are some of them.

Services provided are video, audio, or both. Those systems can be distinguished based on the kind of service provided:

. VoD: video on demand (example: catalogue of video channels);

. real-time TV (examples: live sports events, interactive TV).

The fundamental distinction is the delay: in the second case the constraint is much tighter than in the first. For real-time TV the latency, that is, the gap between the moment in which the video is generated and the moment in which it is consumed, must therefore be very short. Regardless of this classification, there is always a delay to take into account: the delay due to the distribution of the content. Therefore, the peers that compose the neighborhood of a given node are just those interested in the same part of the content. Peers are not forced to be synchronized, but in general they are interested in consuming the same part of the content at more or less the same time.

Reasons for which these systems are now popular are:

. the possibility of distributing content everywhere at the same time (for example to foreign communities, or to places with little infrastructure where only the Internet arrives);

. scenarios with closed markets or where other solutions are too expensive;

. small distributors: small communities interested in a given event, where the number of users is large but sparse, for example scientific contests.

Another classification of these systems is based on the type of overlay used:

. tree-based;

. mesh-based (similar to BitTorrent).

The overlay is in charge of the distribution of contents, and the tasks performed are:

. how to find content;

. how to find neighbors.

Users share their upload bandwidth to distribute contents.


1.14.1 Tree-based systems

These systems were proposed as an alternative to IP multicast distribution, which uses routers to reach more than one user simultaneously. That method suffered because it assumed routers with much more capability; moreover, multicast suffers from the following issues:

. routers bottlenecked;

. addresses;

. group maintenance;

. security.

Hosts are divided into the source (which generates the content), destinations and intermediate hosts. An example of topology is:

[Figure: distribution tree rooted at Source, with peers P1 to P9 as intermediate hosts and destinations.]

If each node is in charge of distributing the content just to its children, the bottleneck problem disappears because the required bandwidth is not too high.

Tree construction The parameters to define are:

. the number of levels of the tree (the number of hops to reach the last layer of the tree);

. the fan out: the maximum number of children that each node can have.


Based on the number of levels, it is possible to impose an upper bound on the delay: it will be small if the number of levels is small. Based on the fan out, instead, it is possible to impose a limit on the upload bandwidth: too many children are difficult to manage. Indeed, the maximum fan out is:

\[ F_{out} = \frac{\text{global capacity}}{\text{bit rate for each video}} \]

The important thing to remember is that the upload bandwidth cannot be completely exploited, because some signalling is needed.
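The two parameters can be related with a small Python sketch. The function names, the units (kbit/s) and the assumption of a complete tree in which every node uses its full fan out are illustrative choices, not part of the notes.

```python
def max_fan_out(upload_capacity, video_bit_rate):
    """Largest whole number of children that can be fed at the video bit rate."""
    return int(upload_capacity // video_bit_rate)


def peers_reached(fan_out, levels):
    """Peers below the source in a complete tree with the given fan out and number of levels."""
    return sum(fan_out ** level for level in range(1, levels + 1))


fan_out = max_fan_out(upload_capacity=2000, video_bit_rate=500)  # values in kbit/s
print(fan_out, peers_reached(fan_out, levels=3))
```

A larger fan out reaches the same number of peers with fewer levels, and hence a lower delay, at the price of more upload bandwidth per node.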

Tree maintenance This is a very critical point because trees suffer from an intrinsic vulnerability: when a node switches off, the topology is split, and some parts of the tree may therefore incur a service interruption.

Potential problem Based on their position in the network, nodes can contribute more or less to the distribution of contents, apart from the nodes placed in the last layer: they do not contribute at all. This implies that there is some unfairness.

End-system multicast (ESM)

This system was not designed with a P2P approach and it has two overlays:

. one in charge of tree maintenance: it is based on a mesh topology (information related to maintenance is distributed by flooding);

. a second one in charge of distributing and finding contents: it is based on the tree topology.

The approach is distributed, but peers actually maintain a global view of the network.

Join operation

. After a bootstrap phase (in which the joining node downloads a list of active peers from a web page), it contacts one of them;

. a join message is sent through the mesh overlay to all peers (in this way everyone knows that a new peer wants to join);

. the same happens for the leaving step: a leave message is propagated through the mesh topology.


Periodically each peer sends a message by flooding: in this way nodes can build a neighbor table, because messages contain information such as the ID of the peer from which the message was received, the IP address, the ID of the message and a timestamp. This helps to detect when a node leaves the network after a failure: if, after a timeout (checked with respect to the last timestamp received from a given node), no message arrives from that node, the peer sends a message to it; if no answer is received, the peer is in charge of notifying the departure by flooding, otherwise it just has to update its neighbor table.

Once the mesh topology is created, a distance vector algorithm is used to select the subset of the graph that forms the tree.

Multi-tree systems

They are still tree-based systems, in which more than one tree is used; those systems are also called second generation systems.

They were developed to deal with the issues of single tree-based systems, such as:

. little robustness to churning (part of the tree becomes isolated);

. inefficient use of the bandwidth (the last layer of the tree does not contribute).

The content is organized in m sub-streams and each one is served by a different tree. In this way, nodes that are in the final layer of one tree can be sources for another tree: this improves efficiency and robustness because, when a node leaves, problems arise not in all trees, but only in those in which the node is not in the last layer. An important advantage is that managing several trees does not imply too much complexity.

There is a balance between what a peer receives and how much it contributes: in some trees it acts as an internal peer, distributing contents to m children, and in the others it acts as a leaf, only receiving contents.

[Figure: a peer with m links, each carrying a sub-stream of rate 1/m.]


A drawback is that the parameter m cannot be adapted over time (to the capacity, to the number of peers), but has to be decided a priori.

Peer Join A new peer that wants to join the network has to:

. find its position in the m trees;

. join as an internal node in one tree; its parent will be the first node with the lowest depth that can accept a further child.

The higher the position, the more the peer will contribute to the distribution.

Peer leaving The leaving of a peer does not cause problems if it is placed as a leaf, while it does if it is an interior peer: its children have to perform the join operation again.

Descriptors Multi-tree systems were designed for multi-descriptor codecs: the original information is taken and coded into several descriptors, where each one has a different codec. If the user is able to receive all of them, he can consume the content with a high quality; if he is able to receive just a part of them, he is still able to consume the content, but with a lower quality.

Multi-tree systems ensure that a node leaving is not a critical issue, because it can only happen that some descriptors are lost: there is no service interruption, and the content is received with a lower quality.

The drawback is compression: the efficiency of multi-tree plus multi-descriptor systems is a bit reduced, because multi-descriptor codecs need much more bandwidth to reach a good quality.

1.14.2 Meshed-based systems

These systems take inspiration from BitTorrent, although the purpose is different. Pieces of a streaming content are distributed to neighbors, and there is a tracker whose role is to let peers join by sending them a list of active peers. As in BitTorrent, there is no structured overlay: nodes are not forced into a given position according to a given topology. The overlay is instead a randomly created mesh. Maintenance is provided by a gossip algorithm: peers send their list of neighbors to their neighbors, and presence is periodically notified through Hello messages. Having a small neighborhood limits a peer, because:

. it restricts distributing/receiving;

. the peer is more likely to end up out of service.


Gossiping is not flooding: no rules are imposed to reach all nodes with a given update.

As in BitTorrent, the neighborhood with which a peer exchanges traffic is reduced: the peer selects its neighbors based on:

. their workload or capacity;

. path characteristics: RTT, loss probability (these are time-varying parameters and have to be measured);

. content availability.

[Figure: mesh topology, showing the neighbors of a peer and the subset of neighbors it exchanges traffic with.]

Data delivery Contents are divided into pieces called chunks that are treated independently, therefore they can follow different distribution trees.

Policies to distribute chunks are local, so there is no network-wide coordination; the scheduling mechanisms are basically:

. push: decisions are taken by the transmitter;


. pull: decisions are taken by the receiver.

With a push, the peer decides which chunk to send and sends it to a neighbor without negotiation, while with a pull it is the receiver that requests the desired chunk: this implies having some knowledge about contents.

| Push | Pull |
|---|---|
| short delays (no negotiation) | requires more signalling |
| multiple copies (waste of bw) | no multiple copies |
| possible losses | larger delays |

Push may suffer from losses when, due to multiple copies, the bandwidth is not enough.

Strategies Let:

. u and v be peers and neighbors;

. c(u) be the set of chunks held by u;

. C ⊆ c(u) be the set of chunks sent by u.

Strategies are methods to decide what to transmit or request, based on C and v, i.e. on chunks and neighbors.

. peer selected first:

. random selection;

. random selection of a useful peer (one that needs something from u):

v such that c(u) \ c(v) ≠ ∅

here it is very important to keep in mind the order of selection: if the peer is chosen first, there are constraints on the chunks to deliver, while, on the contrary, choosing the chunk first implies constraints on the peer;

. most deprived peer (the one that can receive the most chunks from u);

. chunk selected first:

. random selection;

. random useful selection;

. latest blind chunk (the most recent chunk, with respect to source generation, is sent: it is the one most urgently needed by peers, the one with the tightest delay constraint);

. latest useful.

Indeed, as in BitTorrent, the latest chunks are held by few peers, so it is better to distribute them in order to make them safe (not easily lost).

Examples

. random peer/latest blind: this combination always pushes the latest chunk; if the source is greedy, the perceived service is good because, with latest blind, it does not matter which chunks the receiver holds; properties are:

. little overhead;

. minimum delay;

. possible losses and duplicates;

. most deprived/latest useful: first the peer that holds the fewest chunks is selected, then the latest useful chunk is sent to it; this implies that peers must know which chunks their neighbors hold; properties are:

. large overhead;

. large delays.
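The two example combinations can be sketched in a few lines of Python. This is only an illustrative sketch, not a reference implementation: chunk identifiers are assumed to be integers that grow with generation time, c(u) and c(v) are modelled as Python sets, and all function names are hypothetical.

```python
import random


def useful_peers(u_chunks, neighbors):
    """Neighbors v for which c(u) minus c(v) is not empty."""
    return [v for v, held in neighbors.items() if u_chunks - held]


def most_deprived_peer(u_chunks, neighbors):
    """Neighbor that can receive the largest number of chunks from u."""
    candidates = useful_peers(u_chunks, neighbors)
    if not candidates:
        return None
    return max(candidates, key=lambda v: len(u_chunks - neighbors[v]))


def latest_blind(u_chunks):
    """Most recent chunk held by u, whatever the receiver already holds."""
    return max(u_chunks) if u_chunks else None


def latest_useful(u_chunks, v_chunks):
    """Most recent chunk held by u that v still misses."""
    missing = u_chunks - v_chunks
    return max(missing) if missing else None


def random_peer_latest_blind(u_chunks, neighbors, rng=random):
    """Cheap push: random neighbor, latest chunk, no knowledge of c(v)."""
    return rng.choice(sorted(neighbors)), latest_blind(u_chunks)


def most_deprived_latest_useful(u_chunks, neighbors):
    """Informed push: needs the chunk sets of all the neighbors."""
    peer = most_deprived_peer(u_chunks, neighbors)
    if peer is None:
        return None, None
    return peer, latest_useful(u_chunks, neighbors[peer])


# Hypothetical state: u holds chunks 1..5, its neighbors hold subsets.
c_u = {1, 2, 3, 4, 5}
neighbors = {"a": {1, 2, 3, 4, 5}, "b": {1, 2, 3}, "c": {5}}
print(most_deprived_latest_useful(c_u, neighbors))  # ('c', 4)
```

The second strategy needs the chunk sets of every neighbor, which is where its larger overhead comes from; the first one may send a chunk the receiver already owns (neighbor "a" here), which is the source of duplicates.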

Performances There are two complementary indices:

. diffusion rate r(t): probability that a generic peer receives a chunk within a time smaller than t; it gives an idea of the delays;

[Figure: diffusion rate. Given a time, how many peers can be reached?]

. diffusion delay: the delay that a chunk takes to reach a fraction 1 − ε of peers; for a fixed ε, e.g. 5%, the diffusion delay measures the time needed to reach 95% of peers.

[Figure: diffusion delay. Given a population, how long does it take to reach a part of it?]
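As a rough illustration, both indices can be estimated from the delays measured for one chunk. The sketch below assumes a list with one delay per peer that actually received the chunk; the function names and the sample data are hypothetical.

```python
import math


def diffusion_rate(delays, t, n_peers):
    """Empirical r(t): fraction of peers that received the chunk in less than t."""
    return sum(1 for d in delays if d < t) / n_peers


def diffusion_delay(delays, n_peers, eps=0.05):
    """Smallest observed delay within which a fraction 1 - eps of peers is reached.

    Returns None when too few peers ever received the chunk.
    """
    # The small tolerance protects the ceiling from floating point noise.
    needed = math.ceil((1 - eps) * n_peers - 1e-12)
    ordered = sorted(delays)
    if needed <= 0:
        return 0
    if needed > len(ordered):
        return None
    return ordered[needed - 1]


# Hypothetical measurements: 20 peers, reached after 1, 2, ..., 20 time units.
delays = list(range(1, 21))
print(diffusion_rate(delays, t=11, n_peers=20))  # 0.5
print(diffusion_delay(delays, n_peers=20))       # 19
```

The two functions answer the two complementary questions: the first fixes the time and counts peers, the second fixes the fraction of peers and returns a time.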


Relation delay-losses Since users consume the content while they are downloading it, the delay should be as short as possible:

[Figure: timeline. The source emits packets 1, 2, 3 spaced by ∆t; after crossing the layer 3 network, the peer receives packets 1, 2, 3.]

The delay of the layer 3 network is a combination of scheduling and buffer policies, possible congestion and propagation delay; delays at layers 4 and 7 also have to be considered. Each packet is therefore received with a different delay: the variability of the delay is called jitter. Moreover, packets can be received out of order:

[Figure: timeline. The source emits packets 1, 2, 3 spaced by ∆t; the peer receives them in the order 1, 3, 2.]

Once the first packet has started to be played, the codec needs the second one to be ready exactly ∆t later, and so on; out-of-order packets are therefore very dangerous, because they decrease the perceived quality. To deal with this it is possible to introduce an initial playout delay: it is artificial and is used just to increase the probability of receiving the right chunks before playing them.

[Figure: timeline. The source emits packets 1, 2, 3 spaced by ∆t; the peer receives them in the order 1, 3, 2; after the playout delay, the peer plays them in the order 1, 2, 3 spaced by ∆t.]

The trade off delays-losses is:

. higher delay −→ no losses −→ high quality perceived;

. lower delay −→ possible losses −→ low quality perceived.

Losses can be due to:

. packets/chunks never received;

. packets/chunks received late.

The second category is much more critical because packets received late are useless and waste resources (in terms of bandwidth). The following picture highlights this fact:

[Figure: chunk sequence around the chunk currently played. Chunks before it are useless info, chunks after it form the buffer. Legend: chunk lost, chunk owned, chunk not yet received.]

Chunks not yet received that precede, in sequence order, the chunk about to be played are useless, while chunks that follow it can still be received: a buffer is therefore needed to store them. With a latest blind policy, the chunk needed is the one shown in orange in the previous picture. The use of a buffer also brings some synchronization: peers are interested in the same content at the same time, so in a situation like:


[Figure: buffers of Peer 1 and Peer 2.]

the two peers are not interested in communicating with each other. As a consequence, chunks do not all have the same relevance, unlike in BitTorrent: some of them are more urgent and others can become useless if not received in time.

The information peers need in order to communicate which chunks they can transmit is the buffer map (BM): a map that describes which chunks a given peer owns and which it does not. For example:

1 0 0 1 1

The exchange of buffer maps can happen:

. periodically (the issue is choosing the period: a long one implies delays, a short one implies overhead);

. at each received chunk.
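A minimal sketch of how a receiver could use a neighbor's buffer map under a pull policy is given here. It assumes that buffer maps are lists of 0/1 flags aligned on the same chunk window and that `play_pos` is the index of the chunk about to be played; both the representation and the names are assumptions made for illustration.

```python
def chunks_to_request(my_map, neighbor_map, play_pos):
    """Indices the neighbor owns, that I miss, and that are not yet past playout."""
    return [
        i
        for i, (mine, theirs) in enumerate(zip(my_map, neighbor_map))
        if i >= play_pos and not mine and theirs
    ]


neighbor_bm = [1, 0, 0, 1, 1]   # the example buffer map above
my_bm = [1, 1, 0, 0, 0]         # hypothetical buffer map of the requesting peer

print(chunks_to_request(my_bm, neighbor_bm, play_pos=0))  # [3, 4]
print(chunks_to_request(my_bm, neighbor_bm, play_pos=4))  # [4]
```

When the returned list is empty, the two peers have nothing to exchange in that direction, which is the situation of peers whose buffers do not overlap.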

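When a peer has to redistribute a chunk:

[Figure: peer A, which has just received a new chunk, and its neighbors B, C and D.]

the temporal diagram, considering a pull policy, is:

[Figure: temporal diagram between A, B and C under a pull policy. Labels: BM to B, BM to C, request from B, request from C, chunk to B, chunk to C, received B, received C, delay A-B, delay A-C.]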

It is possible to conclude that the exchange of buffer maps introduces a further delay. Considering a push policy, instead, the temporal diagram is:

[Figure: temporal diagram between A, B and C under a push policy, with the delays A-B and A-C.]

The delay is reduced, but if B and C do not need that particular chunk, the bandwidth is wasted to no purpose.

There are some proposals to reduce the delay when using a pull policy:

. select the peer based on RTT: the buffer map exchange allows the RTT to be measured, so selecting the peer on that measure can lead to better decisions; a possible drawback is that distance-based decisions may degenerate into partitioning the network, in terms of connectivity: locality is introduced;

. select the peer based on probability: with probability p select randomly, with probability 1 − p use RTT measures;

. wait an amount of time t before selecting the same peer again: this inhibits the repeated selection of the same peer, so as not to favour it;

. bandwidth aware policy: reduce delays by making the best use of the bandwidth, especially when peers have different upload bandwidths, because delivery favours the nodes that have more of it; the picture shows this fact, drawing peers with more bandwidth with a larger size:

statistically, this reduces the number of hops because trees are much shorter; the main issue is the detection of the upload bandwidth.
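The probabilistic proposal in the list above can be sketched as follows. The sketch assumes that the measured RTTs are available in a dictionary keyed by peer; the names `select_peer` and `rtt_ms` are hypothetical.

```python
import random


def select_peer(rtt_ms, p, rng=random):
    """With probability p pick a random peer, otherwise the peer with the lowest RTT."""
    peers = sorted(rtt_ms)
    if rng.random() < p:
        return rng.choice(peers)                  # random choice: avoids pure locality
    return min(peers, key=lambda v: rtt_ms[v])    # RTT-based choice


rtt_ms = {"B": 40.0, "C": 12.0, "D": 95.0}        # hypothetical measurements
print(select_peer(rtt_ms, p=0.0))                 # always 'C', the closest peer
```

With p = 0 the policy is purely RTT-based (and exposed to the partitioning risk mentioned above); with p = 1 it is purely random; intermediate values mix the two behaviours.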

Issues

. Fairness: very evident in the bandwidth aware policy, where some nodes may distribute more than they receive.

. Depending on the codec, the download bandwidth cannot be increased too much, therefore the quality is bounded.

. Content awareness: efficiency can be improved by changing the codec, but chunks do not all have the same importance, so the more relevant ones have to be transmitted in such a way as to be sure that they are received.

. Costs for ISPs.


Chapter 2

Random graphs

2.1 Introduction and definitions

Random graphs are created through rules that provide randomness: they are used to model and describe systems with many components and high complexity. Application fields are:

. modelling the Internet (layer 3 network);

. modelling the web, www (interconnections followed when browsing pages, layer 7 network);

. network design;

. biology;

. social networking.

P2P systems are based on overlays: a way to model them is through random graphs. These models are used to:

. understand the system;

. tune parameters;

. make design choices;

. evaluate performance (in simulation, for example, evaluation of scalability).

Definitions

. Graph: composed by:

. nodes/vertices;

. edges/links;


. neighbor: node connected directly through a link;

. degree: number of neighbors of a given node;

. component: subset of nodes connected to each other through links (more than one component inside the graph implies a disconnected network, because two nodes picked from two different components cannot reach each other);

. giant component: a component containing a finite fraction of the nodes (if the number of nodes is high and there is a giant component, the network has very good connectivity; in a biology scenario, where the aim is to isolate viruses, the presence of a giant component is bad because it lets infections spread easily: low connectivity is better);

. clustering: the probability that two nodes are neighbors increases if they have at least one neighbor in common;

. clustering coefficient: the average probability that two neighbors of a given node are neighbors too;

. radius (around a node): the distance (in number of hops) to reachany node from a given node.

2.2 Erdos-Renyi Model

Given:

. n: nodes;

. p: probability that a link between two nodes exists.

the resulting graph is called G(n , p). An equivalent definition is: G(n , p) is a set of graphs of n nodes in which each graph appears with a probability that depends on its number of links. Indeed, considering:

. n: nodes;

. m: links;

there are many combinations, each with a certain probability of appearing:

$$ P = p^{m} \cdot (1 - p)^{M - m} $$

where M is the total number of possible links. Analysing each term:

. $p^{m}$: is the probability that the m links of the graph are present;

. $(1 - p)^{M - m}$: is the probability that all the other links do not exist.


In the end, P is the probability that a given graph G appears, but several graphs can be built over the same number of nodes; consider another one of them, G′, with the same number of links m:

$$ P(G) = P(G') $$

M is the total number of possible links, i.e. the number of links of a full mesh topology:

$$ M = \frac{n \cdot (n - 1)}{2} $$

where the division by 2 is necessary since the direction of links does not count.

2.2.1 Average degree

The average degree of a node is the average number of links that it has; the degree depends on the graph and is a random variable. The average degree can be computed as the average number of links generated divided by the number of nodes.

. M · p is the average number of links generated by the process (the number of potential links times the success probability);

. n is the number of nodes.

Actually this is not enough because, to be precise, one link consists of two link ends that connect two nodes, therefore the average number of link ends generated is 2 · Mp. In conclusion:

$$ \mathrm{avg}\{\mathrm{degree}\} = \frac{2 \cdot M p}{n} = \frac{n \cdot (n - 1) \cdot p}{n} = (n - 1) \cdot p $$

The average degree can also be written as z or < κ >. For large n:

$$ z = (n - 1) \cdot p \sim n \cdot p $$

Values of z

. z = 1: is a critical value.

. z > 1: with high probability there is a giant component.

. z < 1: there is not a giant component.
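The role of the threshold z = 1 can be observed with a small simulation. The sketch below is only illustrative: it generates one G(n , p) sample with p = z/(n − 1) and measures the fraction of nodes in the largest component with a union-find structure; the function name and the parameter values are assumptions.

```python
import random


def largest_component_fraction(n, z, seed=0):
    """Sample G(n, p) with p = z / (n - 1); return the largest component size over n."""
    rng = random.Random(seed)
    p = z / (n - 1)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Each of the M = n(n-1)/2 possible links exists independently with probability p.
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj

    sizes = {}
    for x in range(n):
        root = find(x)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / n


for z in (0.5, 1.0, 3.0):
    print(z, largest_component_fraction(600, z))
```

For z well below 1 the largest component is expected to contain only a small fraction of the nodes, while for z well above 1 it should contain most of them; the exact values depend on the seed.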

Clustering coefficient

The clustering coefficient perceived is:

$$ c = p $$

therefore:

$$ c = p = \frac{z}{n} $$

for a large number of nodes.


2.2.2 Degree distribution

Given κ, the random variable describing the degree, and Pκ, the probability that the degree of a node is equal to κ, it is possible to say that:

$$ P_{\kappa} = \binom{n - 1}{\kappa} \, p^{\kappa} \cdot (1 - p)^{(n - 1) - \kappa} $$

where:

. n − 1 is the total number of possible experiments: all nodes minus the one considered;

. κ is exactly the number of successful experiments.

If n ≫ (κ · z):

$$ P_{\kappa} = \frac{z^{\kappa} \cdot e^{-z}}{\kappa!} $$

which is a Poisson distribution with parameter z: it means that E[κ] = z. This approximation is due to the fact that the binomial distribution tends to a Poisson distribution for large n and small κ.
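The quality of the Poisson approximation can be checked numerically. The following sketch compares the two expressions for a hypothetical network with n = 10 000 nodes and z = 5; the function names are illustrative.

```python
import math


def binomial_pmf(k, n, p):
    """Exact degree distribution of G(n, p): n - 1 trials, k successes."""
    return math.comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k)


def poisson_pmf(k, z):
    """Poisson approximation with parameter z."""
    return z**k * math.exp(-z) / math.factorial(k)


n, z = 10_000, 5.0
p = z / (n - 1)

# Largest pointwise gap between the two distributions for small degrees.
max_gap = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, z)) for k in range(21))
# Mean of the Poisson distribution (the tail beyond k = 59 is negligible).
poisson_mean = sum(k * poisson_pmf(k, z) for k in range(60))

print(max_gap, poisson_mean)
```

The gap is small because n is much larger than κ · z over the range considered, and the mean of the Poisson distribution matches z.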

2.3 Bender-Canfield Model

This model deals with random graphs that have a given non-Poisson degree distribution. Graphs are built in two steps:

. assign edge-ends to nodes (for each value of the degree probability density function, edge-ends are assigned accordingly);

. randomly connect edge-ends.

This is a different way to build random graphs with respect to the Erdos-Renyi model because positions are independent and there is no notion of locality.

The following sections deal with properties derived from this model.
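The two construction steps can be sketched as follows, under the assumption that the degree of every node has already been drawn from the desired distribution and is passed as a list. The function name is hypothetical, and this simple pairing may produce self-loops and repeated links, which a more careful construction would have to handle.

```python
import random


def bender_canfield(degrees, seed=0):
    """Pair edge-ends uniformly at random; returns a list of (node, node) links."""
    if sum(degrees) % 2:
        raise ValueError("the sum of the degrees must be even")
    rng = random.Random(seed)
    # Step 1: node i receives degrees[i] edge-ends.
    edge_ends = [node for node, k in enumerate(degrees) for _ in range(k)]
    # Step 2: a random permutation, read two by two, connects the edge-ends.
    rng.shuffle(edge_ends)
    return list(zip(edge_ends[0::2], edge_ends[1::2]))


degrees = [3, 2, 2, 1, 1, 1]       # hypothetical degree sequence
links = bender_canfield(degrees)
print(links)
```

Whatever the pairing, every node ends up with exactly the number of edge-ends assigned to it, so the degree distribution of the graph is the one chosen in the first step.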

2.3.1 Node reachability

The node reachability property studies the possibility of having a giant component: if nodes are easily reached, the probability of having a giant component increases, while, on the contrary, bad reachability implies low connectivity and therefore the giant component will not be present.

Consider the following topology, in which, starting from a given node (marked in orange), the reachability of 1-hop (in light blue) and 2-hop (in violet) neighbors is studied:


. 1-hop neighbors: their number is the degree;

. 2-hop neighbors: to compute their number, the degree distribution of the 1-hop neighbors is required; in principle, each node has the same probability Pκ of being picked, but 1-hop neighbors are not picked randomly: a node with a higher degree is much more likely to be picked, so the rule is κ · Pκ.

To understand this concept, consider the star topology:

in which the n nodes are:

. the center, with degree n − 1;

. n − 1 nodes with degree 1.

From this it is possible to derive:

[Figure: degree distribution of the star topology; axes κ and Pκ, with Pκ = (n − 1)/n at κ = 1 and Pκ = 1/n at κ = n − 1.]

where the height is proportional to the degree and, to be a distribution, is normalized. Starting from the center, the degree perceived is 1, but starting from any other node the degree perceived is n − 1 because the center is easy to reach. Therefore:

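[Figure: degree distribution of the star topology as perceived when reaching a node through a link; axes κ and Pκ, with ticks at κ = 1 and κ = n − 1 and heights (n − 1)/n and 1/n.]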

If each node counts proportionally to its degree, the center counts:

$$ (n - 1) \cdot P_{\kappa} $$

because it is reached many times, while any other node counts:

$$ 1 \cdot P_{\kappa} $$

In conclusion, it is possible to say that the general distribution of 2-hop neighbors is proportional to:

$$ \kappa \cdot P_{\kappa} $$

Of course it is not a distribution because it does not sum to 1. Since the initial node is also reachable from the 1-hop neighbors, it must not be counted, therefore the new nodes reachable are κ − 1. It implies that the probability density function of new nodes reachable in 2 hops is:

$$ q_{\kappa - 1} \cong P_{\kappa} \cdot \kappa $$

Therefore:

$$ q_{\kappa} = P_{\kappa + 1} \cdot (\kappa + 1) $$

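To be a distribution:

$$ q_{\kappa} = \frac{P_{\kappa + 1} \cdot (\kappa + 1)}{\sum_{j} j \cdot P_{j}} $$

The average is given by:

$$ \mathrm{Avg}\{q_{\kappa}\} = \sum_{\kappa = 0}^{\infty} \kappa \cdot q_{\kappa} = \sum_{\kappa = 0}^{\infty} \frac{\kappa \cdot P_{\kappa + 1} \cdot (\kappa + 1)}{\sum_{j} j \cdot P_{j}} $$

By substituting i = κ + 1 (so that κ = 0 corresponds to i = 1):

$$ \mathrm{Avg}\{q_{\kappa}\} = \sum_{i = 1}^{\infty} \frac{P_{i} \cdot i (i - 1)}{\sum_{j} j \cdot P_{j}} = \sum_{i = 1}^{\infty} \frac{P_{i} \cdot (i^{2} - i)}{\sum_{j} j \cdot P_{j}} $$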


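By splitting the numerator into two sums:

$$ \mathrm{Avg}\{q_{\kappa}\} = \frac{\sum_{i = 1}^{\infty} P_{i} \cdot i^{2} - \sum_{i = 1}^{\infty} P_{i} \cdot i}{\sum_{j} j \cdot P_{j}} $$

Now:

. $\sum_{i = 1}^{\infty} P_{i} \cdot i^{2}$ is the second moment < κ² >;

. $\sum_{i = 1}^{\infty} P_{i} \cdot i$ and $\sum_{j} j \cdot P_{j}$ are the first moment (average) < κ >.

Therefore:

$$ \mathrm{Avg}\{q_{\kappa}\} = \frac{\langle \kappa^{2} \rangle - \langle \kappa \rangle}{\langle \kappa \rangle} $$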

Since this represents the average number of nodes discovered in two hops it will be denoted with z2. So far only the 2-hop neighbors reached through one 1-hop neighbor of a given node have been considered; the following picture shows this fact by highlighting the paths mentioned in red:

Of course, the initial node has more neighbors, so, to compute z2 exactly, all of them have to be considered: to do this, it is enough to multiply z2 by the number of neighbors of the initial node, and this number is the degree < κ > (it can also be called z1, to emphasize that it counts the 1-hop reachable neighbors):

$$ z_{2} = \frac{\langle \kappa^{2} \rangle - \langle \kappa \rangle}{\langle \kappa \rangle} \cdot z_{1} = \frac{\langle \kappa^{2} \rangle - \langle \kappa \rangle}{\langle \kappa \rangle} \cdot \langle \kappa \rangle = \langle \kappa^{2} \rangle - \langle \kappa \rangle $$

The formula shows how the number of reachable nodes grows: the dominant value is < κ² >.
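The quantities derived above can be checked on the star topology. In the sketch below a degree distribution is represented as a dictionary mapping each degree κ to Pκ; the helper names are hypothetical, and a star with n = 5 nodes is used as an example.

```python
def moments(P):
    """Return (<k>, <k^2>) for a degree distribution P: degree -> probability."""
    m1 = sum(k * pk for k, pk in P.items())
    m2 = sum(k * k * pk for k, pk in P.items())
    return m1, m2


def new_nodes_distribution(P):
    """q_k: distribution of the new nodes reachable through a 1-hop neighbor."""
    m1, _ = moments(P)
    return {k - 1: k * pk / m1 for k, pk in P.items() if k >= 1}


def z1_z2(P):
    """Average number of 1-hop and 2-hop neighbors."""
    m1, m2 = moments(P)
    return m1, m2 - m1


n = 5
star = {1: (n - 1) / n, n - 1: 1 / n}   # n - 1 leaves and one center
q = new_nodes_distribution(star)
z1, z2 = z1_z2(star)
print(q, z1, z2)
```

For the star, a leaf sees n − 2 = 3 nodes at two hops and the center sees none, so the average is 4 · 3 / 5 = 2.4, which agrees with < κ² > − < κ > = 4 − 1.6.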


Example

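If the distribution is Poisson (as in the Erdos-Renyi model) the variance is equal to the mean value and:

$$ \langle \kappa \rangle = \langle \kappa^{2} \rangle - \langle \kappa \rangle^{2} \implies \langle \kappa^{2} \rangle = \langle \kappa \rangle^{2} + \langle \kappa \rangle $$

Therefore:

$$ z_{2} = \langle \kappa^{2} \rangle - \langle \kappa \rangle = \langle \kappa \rangle^{2} + \langle \kappa \rangle - \langle \kappa \rangle = \langle \kappa \rangle^{2} $$

□

Starting from z2, by iteration, it is possible to discover that: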

$$ z_{m} = \frac{\langle \kappa^{2} \rangle - \langle \kappa \rangle}{\langle \kappa \rangle} \cdot z_{m - 1} $$

Since:

. z2 = < κ² > − < κ >

. z1 = < κ >

the result is:

$$ z_{m} = \frac{z_{2}}{z_{1}} \cdot z_{m - 1} = \left( \frac{z_{2}}{z_{1}} \right)^{m - 1} \cdot z_{1} $$

By analysing the fraction z2/z1:

. if $\frac{z_{2}}{z_{1}} < 1$: when m grows (the distance grows) the number of reachable nodes tends to a constant, so there is bad connectivity: it implies that there is not a giant component;

. if $\frac{z_{2}}{z_{1}} > 1$: on the contrary, all conditions lead to a giant component;

. if $\frac{z_{2}}{z_{1}} = 1$: this is the so-called critical condition: it is difficult to study the behaviour.


Example

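Focusing on the Erdos-Renyi model in critical conditions:

$$ z_{2} = \langle \kappa \rangle^{2}, \qquad \frac{z_{2}}{z_{1}} = 1 \implies \frac{\langle \kappa \rangle^{2}}{\langle \kappa \rangle} = 1 $$

therefore:

$$ \langle \kappa \rangle = 1 $$

The condition that leads to a giant component is:

$$ \langle \kappa \rangle > 1 $$

Since:

$$ \frac{z_{2}}{z_{1}} = \langle \kappa \rangle $$

it is possible to discover that:

$$ z_{m} = \langle \kappa \rangle^{m - 1} \cdot z_{1} \implies z_{m} = \langle \kappa \rangle^{m - 1} \cdot \langle \kappa \rangle \implies z_{m} = \langle \kappa \rangle^{m} $$

which means that the discovery process of reachable nodes grows geometrically.

□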

2.3.2 Small-world effect

This effect says that, in a network with a large number of users, the distance between them is relatively small because some of the users are very well connected.

Assuming:

$$ \frac{z_{2}}{z_{1}} \gg 1 \qquad (2.1) $$

there is certainly a giant component, therefore the network is very well connected. Now m represents the distance between a given node and any other: each iteration (1 , 2 , . . . , m) discovers a very high number of new nodes, but it is the last iteration, the one that reaches nodes at distance m, that discovers the most nodes. As a consequence, the mean value is dominated by the last hop. If n is the number of nodes, when z_l ≅ n the most distant nodes are reached and, thanks to hypothesis (2.1), it is certainly possible to reach them. In formulas:

$$ z_{l} = \left( \frac{z_{2}}{z_{1}} \right)^{l - 1} \cdot z_{1} = n $$

By taking the logarithm:

$$ \log \left( \frac{z_{2}}{z_{1}} \right)^{l - 1} = \log \frac{n}{z_{1}} \implies l - 1 = \frac{\log (n / z_{1})}{\log (z_{2} / z_{1})} $$

In conclusion:

$$ l = \frac{\log (n / z_{1})}{\log (z_{2} / z_{1})} + 1 $$

where l is the average distance inside the network: it is also called diameter. The parameter l grows as the logarithm of n: if the number of nodes is very large, l does not grow too much, therefore the small-world effect is ensured. It also means that randomly built graphs have short distances.

Since in the Erdos-Renyi model z1 = < κ > = z and z2 = (< κ >)² = z²:

$$ l = \frac{\log (n / z_{1})}{\log z} + 1 \cong \frac{\log (n / z_{1})}{\log z} \cong \frac{\log n - \log z}{\log z} \cong \frac{\log n}{\log z} $$

This behavior is also valid for tree topologies, while for regular structures:

. the ring has an average distance that grows with n (because it is n/2);

. a grid topology with n nodes has an average distance that grows with √n.

It means that regular structures have intrinsically worse performance because they:

. have higher distances;

. are less robust to churn (maintenance is hard).

Example

Consider an average delay D = 0.2 s; in order not to exceed a maximum average delay R = 1 s, the distance l should be such that the product l · D (‡) satisfies:

$$ l \cdot D < R $$

By using:

$$ l \cdot D \cong \frac{\log n}{\log z} \cdot D < R $$

it is possible to obtain:

$$ \log z > \frac{\log n}{R} \cdot D $$

Consider:

. n = 10^4 ⟹ log z > (4 · 0.2) ⟹ z > 6.3

. n = 10^6 ⟹ log z > (6 · 0.2) ⟹ z > 15.8

‡ This term, l · D, shows the average delay to reach the farthest node.

It means that the degree increases by a factor of about 2.5 every time the number of nodes increases by a factor of 100.

□
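The numbers of this example can be reproduced with a few lines of Python. The sketch assumes, as in the example, that every hop costs the same average delay D and that l ≅ log n / log z; the function names are illustrative.

```python
import math


def average_distance(n, z):
    """Small-world estimate l = log n / log z for a random graph."""
    return math.log(n) / math.log(z)


def min_degree(n, hop_delay, max_delay):
    """Degree above which l * D < R holds: log z > (D / R) log n, i.e. z > n^(D/R)."""
    return n ** (hop_delay / max_delay)


D, R = 0.2, 1.0
for n in (10**4, 10**6):
    print(n, round(min_degree(n, D, R), 1))

# Growth of the required degree when n is multiplied by 100.
print(round(min_degree(10**6, D, R) / min_degree(10**4, D, R), 2))
```

The ratio between the two thresholds is 100^(D/R) = 100^0.2, which is where the factor of about 2.5 comes from.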

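Focusing on the critical condition, it is possible to say that:

$$ \frac{z_{2}}{z_{1}} = 1 \implies z_{2} = z_{1} $$

Therefore:

$$ \langle \kappa^{2} \rangle - \langle \kappa \rangle = \langle \kappa \rangle \implies \langle \kappa^{2} \rangle - 2 \langle \kappa \rangle = 0 $$

That is:

$$ \sum_{\kappa = 0}^{\infty} \kappa \cdot (\kappa - 2) \cdot P_{\kappa} = 0 $$

By analysing this expression, it is clear that terms with κ = 0 , 1 , 2 have no effect on the final result (the occurrence of the giant component) because:

. terms with κ = 0 are isolated nodes;

. in terms of reachability, κ = 1 , 2 are the same:

[Figure: equivalence between the κ = 1 and κ = 2 cases.]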

2.3.3 Clustering

The following analysis holds for any distribution that is not Poisson; the clustering property gives the probability that two neighbors of a given node are neighbors. For this to happen, the orange link in the following picture must be established:

[Figure: node A with its neighbors B and C; the orange link connects B and C.]

Therefore the clustering coefficient describes how much locality is introduced into the network. Considering that:

. node B has connectivity κi;

. node C has connectivity κj,

the clustering coefficient is given by:

$$ c = \frac{\langle \kappa_{i} \rangle \cdot \langle \kappa_{j} \rangle}{n \cdot z} $$

where:

. the numerator represents all the ways in which the two nodes can be connected;

. the denominator represents the average number of links in the network because it is given by the number of nodes n multiplied by the average degree of each node z.

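For 1-hop neighbors the distribution is qκ, and it is independent across different nodes, therefore:

$$ c = \frac{\langle \kappa_{i} \rangle \cdot \langle \kappa_{j} \rangle}{n \cdot z} = \frac{1}{n \cdot z} \cdot \left[ \sum_{\kappa} \kappa \cdot q_{\kappa} \right]^{2} = \frac{1}{n \cdot z} \cdot \left[ \frac{\langle \kappa^{2} \rangle - \langle \kappa \rangle}{\langle \kappa \rangle} \right]^{2} $$

By multiplying and dividing by z²:

$$ c = \frac{z}{n} \cdot \left[ \frac{\langle \kappa^{2} \rangle - \langle \kappa \rangle}{\langle \kappa \rangle^{2}} \right]^{2} $$

Now the quantity (< κ >)² is added to and subtracted from the numerator:

$$ c = \frac{z}{n} \cdot \left[ \frac{\langle \kappa^{2} \rangle - \langle \kappa \rangle^{2} + \langle \kappa \rangle^{2} - \langle \kappa \rangle}{\langle \kappa \rangle^{2}} \right]^{2} $$

In this way it is possible to recognize the variance within the numerator. Since the coefficient of variation is defined as:

$$ c_{v} = \frac{\sqrt{\mathrm{var}}}{\mathrm{avg}} = \frac{\sqrt{\langle \kappa^{2} \rangle - \langle \kappa \rangle^{2}}}{\langle \kappa \rangle} $$

within the clustering coefficient it is possible to recognize its square:

$$ (c_{v})^{2} = \frac{\langle \kappa^{2} \rangle - \langle \kappa \rangle^{2}}{\langle \kappa \rangle^{2}} $$

Therefore:

$$ c = \frac{z}{n} \cdot \left[ (c_{v})^{2} + \frac{\langle \kappa \rangle - 1}{\langle \kappa \rangle} \right]^{2} $$

Since the clustering coefficient depends on the square of the coefficient of variation, the dominant value is the variance. In conclusion the variance is extremely important: it ensures high connectivity and introduces locality.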


[Figure: the variance drives both the giant component and the clustering coefficient.]

Example

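Using these formulas for the Erdos-Renyi model:

$$ (c_{v})^{2} = \frac{\mathrm{Var}\{\kappa\}}{\langle \kappa \rangle^{2}} = \frac{\langle \kappa \rangle}{\langle \kappa \rangle^{2}} = \frac{1}{z} $$

Therefore:

$$ c = \frac{z}{n} \cdot \left[ \frac{1}{z} + \frac{z - 1}{z} \right]^{2} = \frac{z}{n} \cdot 1 = \frac{n \cdot p}{n} = p $$

Indeed, p is the probability that two nodes have a link that connects them, so it is also the clustering coefficient.

□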
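The general formula can be evaluated directly from the first two moments of the degree distribution. The sketch below uses a hypothetical helper, `clustering_from_moments`, and checks the Poisson case, for which < κ² > = z² + z.

```python
def clustering_from_moments(n, m1, m2):
    """c = (z / n) * [cv^2 + (z - 1) / z]^2, with z = <k> = m1 and m2 = <k^2>."""
    z = m1
    cv2 = (m2 - m1**2) / m1**2          # squared coefficient of variation
    return z / n * (cv2 + (z - 1) / z) ** 2


n, z = 1000, 4.0
poisson_c = clustering_from_moments(n, z, z**2 + z)
heavier_c = clustering_from_moments(n, z, 3 * (z**2 + z))   # same mean, larger variance
print(poisson_c, heavier_c)
```

With Poisson moments the result reduces to z/n = p; keeping the same mean while increasing the second moment makes the clustering coefficient grow, in line with the dominant role of the variance.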

2.4 Heavy-Tailed Distribution

The heavy-tailed distribution (also called power-law) is used to represent phenomena like P2P systems, the topology of the Internet, how long a client stays connected, and social networks: they all have in common the feature that their distribution does not decrease exponentially, and therefore cannot be represented through a Poisson distribution. It means that the probability of having large values is not negligible; it is:

$$ P_{\kappa} \cong \alpha \cdot \kappa^{-\gamma} $$

where γ typically takes values:

$$ 2 < \gamma < 3 $$

In mathematical terms, these systems have a finite average but infinite variance, since the second moment tends to infinity:

$$ \int_{n}^{\infty} \kappa^{2} \cdot P_{\kappa} \, d\kappa \longrightarrow \infty $$

This behavior is not really good, because both the small-world and the clustering properties depend largely on the variance. Moreover, the distribution comes from measurements, and the tail is typically difficult to estimate precisely.
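The different behaviour of the first and second moment can be observed by truncating the distribution at a maximum degree and letting the cut-off grow. The sketch below assumes a discrete power law on κ = 1 , . . . , k_max with γ = 2.5; the function name and the cut-off values are arbitrary choices.

```python
def truncated_moments(gamma, k_max):
    """Mean and second moment of P_k proportional to k^(-gamma), for k = 1..k_max."""
    degrees = range(1, k_max + 1)
    weights = [k ** -gamma for k in degrees]
    norm = sum(weights)                   # plays the role of 1 / alpha
    mean = sum(k * w for k, w in zip(degrees, weights)) / norm
    second = sum(k * k * w for k, w in zip(degrees, weights)) / norm
    return mean, second


for k_max in (10**3, 10**5):
    print(k_max, truncated_moments(2.5, k_max))
```

Moving the cut-off from 10^3 to 10^5 changes the mean only slightly, while the second moment keeps growing roughly like the square root of the cut-off: this is the numerical counterpart of a finite average with infinite variance.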


Scale-free property

The scale-free property says that after the change:

$$ \kappa \longrightarrow \lambda \cdot \kappa $$

the shape of the distribution does not change. However, the mean value is not very representative of the systems described before: think of connection time. There are few users that have very long connections, while most users have short connections.

2.5 Watts-Strogatz model

This model represents a family of random graphs obtained as an intermediate solution between pure random graphs and regular structures. This interpolation provides the peculiar properties of both families:

. regular structures (lattices): notion of locality (clustering);

. random graphs: small-world effect.

Starting from a regular structure (a ring, for example), a Watts-Strogatz model is built by introducing randomness:

The connectivity, considering a given node (marked in blue in the picture), is:

. m nodes in clockwise order;

. m nodes in counter-clockwise order.

Therefore, each node has a degree equal to 2m. The average distance between nodes grows linearly with the number of nodes n in the network: thanks to shortcuts (as in Chord) it is possible to reduce it. Indeed, the process to obtain a Watts-Strogatz model is:

. for each node:


. take each clockwise link;

. rewire it randomly with probability p (or keep it with probability 1 − p).

The following picture shows this procedure:

[Figure: a ring lattice before and after random rewiring.]

The properties mentioned before (small-world effect and clustering) depend on p:

. if it is large, the system tends to a pure random graph (for p → 1 it tends to an Erdos-Renyi graph);

. if it is small, the system tends to a regular structure with high clustering (long fixed routes to reach the farthest nodes).
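The rewiring procedure can be sketched as follows. This is a simplified illustration rather than the authors' exact construction: it assumes n > 2m, stores the graph as a dictionary of neighbor sets, and, when a link is rewired, picks the new endpoint uniformly among the nodes not yet connected to the node considered (so that self-loops and repeated links are avoided). The function name is hypothetical.

```python
import random


def watts_strogatz(n, m, p, seed=0):
    """Ring of n nodes, each linked to m nodes per side; clockwise links rewired with prob. p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    # Regular structure: m clockwise and m counter-clockwise neighbors per node.
    for v in range(n):
        for d in range(1, m + 1):
            w = (v + d) % n
            adj[v].add(w)
            adj[w].add(v)
    # For each node, take each clockwise link and rewire it with probability p.
    for v in range(n):
        for d in range(1, m + 1):
            w = (v + d) % n
            if w in adj[v] and rng.random() < p:
                candidates = [x for x in range(n) if x != v and x not in adj[v]]
                if candidates:
                    new = rng.choice(candidates)
                    adj[v].discard(w)
                    adj[w].discard(v)
                    adj[v].add(new)
                    adj[new].add(v)
    return adj


lattice = watts_strogatz(40, 2, 0.0)
rewired = watts_strogatz(40, 2, 0.3, seed=1)
print(sorted(len(nb) for nb in rewired.values()))
```

With p = 0 every node keeps degree 2m; with p > 0 the number of links stays n · m, but some of them become shortcuts towards random nodes and the degrees are no longer all equal.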

2.5.1 Clustering analysis

When p = 0, the clustering coefficient is:

$$ c = \frac{3 \cdot (m - 1)}{2 \cdot (2m - 1)} $$

therefore it depends basically on m, and it is very high (well above 0, while for Erdos-Renyi it is something near $10^{-4}$). It means that the probability of two nodes being neighbors is high if they have a common neighbor. Indeed, look at the following picture:

the green nodes are neighbors and have a common neighbor: the blue node. This behavior has to be taken into account not just for the degree of one node, but for the degree of all of them: the result is very high locality.

When p > 0:

$$ c = \frac{3 \cdot (m - 1)}{2 \cdot (2m - 1)} \cdot (1 - p)^{3} $$

it means that when p increases, the connectivity based on locality decreases.
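The expression for p = 0 can be verified by building the ring and counting, for every node, how many pairs of its neighbors are linked. The sketch below does this for a hypothetical ring with n = 30 and m = 3, and then evaluates the formula for some values of p; all names are illustrative.

```python
from itertools import combinations


def ws_clustering(m, p=0.0):
    """Clustering coefficient of the Watts-Strogatz model as given by the formula."""
    return 3 * (m - 1) / (2 * (2 * m - 1)) * (1 - p) ** 3


def ring_lattice(n, m):
    """Regular ring: every node is linked to its m nearest nodes on each side."""
    return {v: {(v + d) % n for d in range(-m, m + 1) if d != 0} for v in range(n)}


def average_clustering(adj):
    """Average fraction of pairs of neighbors of a node that are neighbors too."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        linked = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += linked / (k * (k - 1) / 2)
    return total / len(adj)


print(average_clustering(ring_lattice(30, 3)), ws_clustering(3))
for p in (0.01, 0.1, 0.5):
    print(p, ws_clustering(3, p))
```

For m = 3 both the count on the ring and the formula give 0.6; the factor (1 − p)³ then reduces this value as p grows.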

2.5.2 Small-world analysis

The small-world property describes the distance between nodes. In regular structures the average distance depends on the number of nodes: if it is a grid:

. with 2 dimensions, the complexity is O(√n);

. with 3 dimensions, the complexity is O(∛n);

In general:

$$ l \sim O(n) $$

Look at the following graph:

In the top-left region the values of p lead to a regular structure, while the bottom-right region describes random graphs. In the center there is a zone in which both the small-world property and clustering are satisfied.

Considering the ring, it is possible to say that, by introducing a few shortcuts (few with respect to the number of links), the small-world property starts to be ensured because those shortcuts connect very distant nodes. When the number of shortcuts inserted increases, their benefit decreases: it is indeed better to introduce few of them and use them just to reach the farthest regions, then use the local connections to reach the destination.

With shortcuts, the size of the regions obtained by splitting is given by:

$$ \frac{\sim n}{\sim n p} \sim \frac{1}{p} $$ ‡

where:

. ∼ n is the space size (number of nodes);

. ∼ np is the number of shortcuts introduced.

‡ The complexity of this formula is linear.

To ensure the small-world property:

$$ \frac{1}{p} \ll n \implies p \gg \frac{1}{n} $$

If the network is large, the small-world property is ensured even with a small p.

To guarantee clustering:

$$ p \ll 1 $$ ‡

In conclusion, to have the small-world effect and clustering simultaneously, it is necessary to have:

$$ \frac{1}{n} \ll p \ll 1 $$

This model has been widely used to model P2P systems: for example, in a P2P streaming system, too much locality leads to bad performance because a chunk takes a very long time to reach the entire network (so the delay increases). To deal with this, neighbors are sometimes picked randomly: this can be seen as a shortcut. The same happens in BitTorrent: to diversify the downloadable content, neighbors are not always selected based on the tit-for-tat procedure, but are sometimes selected randomly.

2.6 Theory of evolving networks

This model takes care of the evolution of the network: how the overlay evolves in time. The algorithm:

. defines a graph with n final nodes;

. starts with m0 nodes, where m0 < n (m0 is the initial condition);

. adds a node at each step: it takes n − m0 steps to build the final topology.

The time evolution is characterized by the fact that, at each step, nodes have different degrees: depending on the policy adopted, the system can evolve differently. The simplest policy is to attach new nodes to the ones that have a higher degree: this helps to reach more nodes with shorter paths.

‡ This term is due to the term (1− p)3.


Definitions

. s: time at which a node is introduced; it represents the age (older nodes have more chance of being well connected);

. κs: degree of the node introduced at time s; it is described with a differential equation: to simplify the math, it is assumed to be continuous (κs(t));

. m: links of the new node.

The evolution of the system is described by:

$$ \frac{\partial \kappa_{s}(t)}{\partial t} = m \cdot \pi(\kappa_{s}(t)) \qquad (2.2) $$

The increase of the degree depends on the number of links and is proportional to the degree itself. The term π(·) is a function that describes how new nodes are connected to the existing network: it is the connection policy and can be considered as the term that describes the system evolution. At the beginning the degree is:

$$ \kappa_{s}(s) = m $$

Barabasi-Albert criterion

This approach says that scale-free networks are built with a preferential attachment criterion. The algorithm is:

. start with an initial graph;

. at each step a node is attached (m links);

. links are preferentially attached to nodes based on their degree:

$$ \pi(\kappa_{s}(t)) = \frac{1}{\sum_{j} \kappa_{j}(t)} \cdot \kappa_{s}(t) \qquad (2.3) $$

The term:

$$ \sum_{j} \kappa_{j}(t) $$

is a normalization coefficient that describes, statistically, the sum of the degrees of all the nodes.

By substituting (2.3) in (2.2), it is possible to obtain:

$$ \frac{\partial \kappa_{s}(t)}{\partial t} = \frac{m \cdot \kappa_{s}(t)}{2 m t + 2 m_{0} \langle \kappa_{s} \rangle} \qquad (2.4) $$

where:


. 2mt represents the links already introduced in the network;

. 2m0 < κs > is the initial distribution of the degree since m0 is the number of links at time s and < κs > is the average degree at the beginning.

The denominator is, globally, the normalization coefficient seen in (2.3). Equation (2.4) shows that at each step t the total degree grows by 2m: each of the m new links contributes to the degree of two different nodes.

At the beginning:

$$ \kappa_{s}(s) = m \qquad \langle \kappa_{s} \rangle = 2m $$

At the end:

$$ \kappa_{s}(t) \cong m \cdot \left( \frac{t}{s} \right)^{1/2} \quad \text{for } t \to \infty $$

This suggests that the degree increases as a square root function of time; the denominator s represents the node considered: the degree is high if the node is older, therefore it depends on the age of nodes. Consider a node s′ older than s, where:

$$ s' < s < t $$

The ratio is:

$$ \frac{\kappa_{s'}(t)}{\kappa_{s}(t)} \cong \left( \frac{s}{s'} \right)^{1/2} $$

Looking at large values of t:

$$ P_{\kappa} = 2 \cdot m^{2} \kappa^{-3} $$

therefore the probability that a node has degree κ follows a heavy-tailed distribution: the scale-free property is ensured. As far as the small-world property and the clustering are concerned:

$$ l \sim \frac{\log n}{\log \log n} \qquad c = \frac{m}{8n} \cdot (\log n)^{2} $$

The small-world property is expected because there are a few very well connected nodes: they are the oldest nodes. The clustering is similar to that of the Erdos-Renyi model, in that it decreases with the number of nodes.
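The dependence of the degree on the age of a node can be observed with a small simulation of the preferential attachment criterion. The sketch below makes some assumptions that are not fixed by the text: the initial graph is a full mesh of m + 1 nodes, and the m targets of a new node are distinct. Sampling from the list of link ends selects a node with probability proportional to its degree, as in (2.3). The function name is hypothetical.

```python
import random


def barabasi_albert(n, m, seed=0):
    """Grow a graph by preferential attachment; returns the final degree of each node."""
    rng = random.Random(seed)
    m0 = m + 1                      # assumed initial graph: full mesh of m + 1 nodes
    degree = [0] * n
    link_ends = []                  # every node appears once per link end it owns
    for a in range(m0):
        for b in range(a + 1, m0):
            degree[a] += 1
            degree[b] += 1
            link_ends += [a, b]
    for s in range(m0, n):          # node s is introduced at step s
        targets = set()
        while len(targets) < m:     # m distinct targets, chosen proportionally to degree
            targets.add(rng.choice(link_ends))
        for v in targets:
            degree[s] += 1
            degree[v] += 1
            link_ends += [s, v]
    return degree


degree = barabasi_albert(2000, 2)
oldest = sum(degree[:50]) / 50
youngest = sum(degree[-50:]) / 50
print(oldest, youngest, max(degree))
```

The oldest nodes end up with an average degree several times larger than the youngest ones, which stay close to m, and a few nodes reach very high degrees, as expected from a heavy-tailed distribution.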


2.7 Resume scheme

| Model | Small-world | Clustering |
|---|---|---|
| ER | l ≅ log n / log z | c = p ∼ 1/n |
| RG with empirical distr. | l = log(n/z1) / log(z2/z1) | c = (z/n) · [(cv)² + (z − 1)/z]² |
| WS (p = 0) | l ∼ n, not ensured | c = 3 · (m − 1) / (2 · (2m − 1)) |
| WS (p > 0) | ensured | high clustering |
| BA | ensured | low clustering |

For random graphs with empirical distribution, both the small-world property and clustering depend on the variance: with a power law the scale-free property is ensured.

For Watts-Strogatz the value of p should be chosen such that:

$$ \frac{1}{n} \ll p \ll 1 $$
