Farnoush Banaei-Kashani and Cyrus Shahabi

Criticality-based Analysis and Design

of Unstructured P2P Networks as “Complex Systems”

Mohammad Al-Rifai

2 December 2003 Mohammad Al Rifai

Outline

• Introduction
- Motivation
- Flooding search

• Probabilistic Flooding
- Percolation Theory

• TTL selection policy

• Summary

• Questions

Introduction

• Motivation
- improving the scalability of the flooding search applied in unstructured P2P networks (Gnutella)

• Proposed approach
- recognizing P2P networks as Complex Systems, and exploiting the accurate statistical models used to characterize them for the formal analysis and efficient design of P2P networks.

Introduction

• Flooding search

• Each query is flooded through the entire network

Algorithm:

– a node initiates a query, sets a TTL value, and sends the query to all of its neighbors.

– each receiver of the query decrements the TTL by one and forwards the query to its neighbors in turn, and so on.

– the flooding continues until the object is found or the TTL reaches zero.
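The flooding algorithm can be sketched in a few lines of Python. This is a minimal illustration, not Gnutella's actual protocol: the adjacency-dictionary graph, the function name `flood`, and the message counters are assumptions made for the example, and every node forwards to all of its neighbors (including the one it received the query from). The sketch also keeps flooding after the target is reached, since other branches of the flood cannot know the object was already found.

```python
from collections import deque

def flood(graph, source, target, ttl):
    """Flood a query from `source`; return (found, messages, duplicates).

    `graph` maps each node to a list of its neighbors (hypothetical layout).
    """
    seen = {source}
    frontier = deque([(source, ttl)])
    messages = 0      # every query copy sent over a link
    duplicates = 0    # copies that arrive at a node that already saw the query
    found = source == target
    while frontier:
        node, remaining = frontier.popleft()
        if remaining == 0:
            continue  # TTL exhausted: this node does not forward
        for neighbor in graph[node]:
            messages += 1
            if neighbor in seen:
                duplicates += 1
                continue
            seen.add(neighbor)
            if neighbor == target:
                found = True
            # the receiver decrements the TTL before forwarding in turn
            frontier.append((neighbor, remaining - 1))
    return found, messages, duplicates

# A 4-node ring: half of the messages are duplicates even in this tiny graph.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
print(flood(ring, source=0, target=2, ttl=2))
```

With a TTL of 1 the same query never reaches node 2, which illustrates why a fixed initial TTL chosen without regard to network size is a problem.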

Introduction

• Flooding search

Problems:

– extra overhead through duplicated queries

– the initial TTL is set regardless of the size of the network

Hence, flooding search does not scale.

Proposed solutions:

1- Probabilistic flooding search

2- TTL self-selection policy

I- Probabilistic Flooding

• Each node forwards the query to its neighbors with probability p, and drops the query with probability (1 – p).

• Normal flooding search is the extreme case of probabilistic flooding with p = 1.

• Decreasing the value of p cuts some paths (not only redundant ones).

• Decreasing p further towards 0 cuts more and more paths and results in low reachability, and thus an inefficient search.
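The trade-off between message cost and reachability can be explored with a small simulation. The sketch below is illustrative only: it assumes the forwarding decision is made independently for each neighbor (which matches the open/closed bond picture of percolation), it ignores the TTL, and the names `probabilistic_flood` and `rng` are hypothetical.

```python
import random

def probabilistic_flood(graph, source, p, rng):
    """Run one probabilistic flood; return (reached_nodes, messages_sent).

    Each node that holds the query forwards it to each neighbor with
    probability p and drops it with probability 1 - p (assumed per neighbor).
    """
    reached = {source}
    stack = [source]
    messages = 0
    while stack:
        node = stack.pop()
        for neighbor in graph[node]:
            if rng.random() >= p:   # drop with probability 1 - p
                continue
            messages += 1
            if neighbor not in reached:
                reached.add(neighbor)
                stack.append(neighbor)
    return reached, messages

# A 10-node ring as a toy network.
n = 10
ring = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
rng = random.Random(0)
for p in (1.0, 0.5, 0.0):
    reached, messages = probabilistic_flood(ring, 0, p, rng)
    print(p, len(reached), messages)
```

With p = 1 the sketch reduces to normal flooding and reaches every node; with p = 0 the query never leaves the originator.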

I- Probabilistic Flooding

• Goal: all redundant paths must be cut to eliminate duplicated queries and avoid their overhead cost, while full reachability is preserved.

• How? p must be tuned to an optimal (critical) operating point pc. To achieve that, the system must be formally modeled.

I- Probabilistic Flooding

• Formalizing and modeling P2P networks

- Unstructured P2P networks are large-scale, dynamic, and self-configuring systems, which are the main characteristics of Complex Systems.

- Hence, P2P networks can be recognized as Complex Systems, and the theoretical and statistical models applied to Complex Systems can be exploited for P2P networks.

- Percolation Theory is one of the most important theories applied to Complex Systems, and it can help to find the critical value pc.

I- Probabilistic Flooding – Percolation Theory

Given a 2D lattice of sites (dots) and bonds (lines) connecting neighboring sites (in terms of P2P networks, sites are nodes and bonds are links between them).

Assume that each bond is open with probability p, or closed with probability (1 – p).

Depending on p, clusters (sites connected by open bonds) start to appear. The larger the value of p, the larger the clusters.

According to Percolation Theory: above a threshold probability pc, a giant cluster spanning the whole lattice starts to appear.
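The emergence of a giant cluster can be observed numerically. The following sketch opens each bond of a square lattice independently with probability p and reports the fraction of sites in the largest cluster, using a simple union-find structure. The lattice size, the function name, and the choice of union-find are assumptions made for illustration. For bond percolation on the infinite square lattice the threshold is known to be 1/2, so on a finite lattice the largest-cluster fraction should rise sharply around that value.

```python
import random
from collections import Counter

def largest_cluster_fraction(size, p, rng):
    """Bond percolation on a size x size square lattice.

    Returns the fraction of sites that belong to the largest cluster.
    """
    parent = list(range(size * size))

    def find(x):
        # path halving keeps the trees shallow
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    for row in range(size):
        for col in range(size):
            site = row * size + col
            # bond to the right neighbor, open with probability p
            if col + 1 < size and rng.random() < p:
                union(site, site + 1)
            # bond to the neighbor below, open with probability p
            if row + 1 < size and rng.random() < p:
                union(site, site + size)

    counts = Counter(find(site) for site in range(size * size))
    return max(counts.values()) / (size * size)

rng = random.Random(42)
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, round(largest_cluster_fraction(50, p, rng), 3))
```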

I- Probabilistic Flooding – Percolation Theory

- Unstructured P2P networks are random graphs of size N → ∞, with connectivity distribution P(k).

- Nodes and the links between them may be thought of as sites and bonds, respectively, in terms of Percolation Theory.

Percolation Theory verifies that once probabilistic flooding is applied, above a threshold pc the giant cluster spans the whole network with minimum connectivity.

How could pc be computed?

I- Probabilistic Flooding

Analysis:

The following assumption has been made:

“percolation threshold takes place when each node i connected to a node j in the spanning cluster is also connected to at least one other node”

I- Probabilistic Flooding

Analysis:

This criterion can be written as follows:

$$\langle k_i \mid i \leftrightarrow j \rangle = \sum_{k_i} k_i \, P(k_i \mid i \leftrightarrow j) = 2 \qquad (1)$$

where k_i is the degree of node i, the left-hand side is the expected value of k_i, and P(k_i | i ↔ j) is the conditional probability of node i having degree k_i, given that it is connected to j.

By Bayes' rule,

$$P(k_i \mid i \leftrightarrow j) = \frac{P(k_i,\, i \leftrightarrow j)}{P(i \leftrightarrow j)} = \frac{P(i \leftrightarrow j \mid k_i)\, P(k_i)}{P(i \leftrightarrow j)}$$

where

$$P(i \leftrightarrow j) = \frac{\langle k \rangle}{N-1} \quad \text{and} \quad P(i \leftrightarrow j \mid k_i) = \frac{k_i}{N-1}$$

and N is the total number of nodes.

Substituting these into (1), the factors of N – 1 cancel and the sum becomes the second moment of the degree divided by the first. Thus, at criticality:

$$\frac{\langle k^2 \rangle}{\langle k \rangle} = 2$$
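As a quick check of the criticality condition, the ratio of the second to the first moment can be computed directly from a degree distribution. The sketch below is only an illustration: the dictionary representation of P(k) and the function names are assumptions, and it relies on the standard reading of this criterion that a giant cluster exists when the ratio exceeds 2.

```python
def moment_ratio(degree_dist):
    """Return <k^2> / <k> for a distribution given as {degree: probability}."""
    first = sum(k * prob for k, prob in degree_dist.items())
    second = sum(k * k * prob for k, prob in degree_dist.items())
    return second / first

def has_giant_cluster(degree_dist):
    """Apply the criterion <k^2> / <k> > 2 (the critical point is at exactly 2)."""
    return moment_ratio(degree_dist) > 2

# Every node has degree 2 (a ring): the ratio is exactly 2, i.e. at criticality.
print(moment_ratio({2: 1.0}))
# Half the nodes have degree 1 and half have degree 3: above the threshold.
print(moment_ratio({1: 0.5, 3: 0.5}), has_giant_cluster({1: 0.5, 3: 0.5}))
```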

I- Probabilistic Flooding

Analysis:

Given the connectivity distribution of the network, P(k), using probabilistic flooding results in the effective connectivity distribution P_e(k) as follows:

$$P_e(k) = \sum_{n \ge k} \binom{n}{k} p^k (1-p)^{n-k} P(n) \qquad (2)$$

At the critical point,

$$\frac{\langle k_e^2 \rangle}{\langle k_e \rangle} = 2$$

must hold. The first and second moments, ⟨k_e⟩ and ⟨k_e²⟩, are computed using (2).

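I- Probabilistic Flooding

Analysis:

$$\begin{aligned}
\langle k_e \rangle &= \sum_{k=0}^{\infty} k \sum_{n=k}^{\infty} \binom{n}{k} p^k (1-p)^{n-k} P(n) \\
&= \sum_{n=0}^{\infty} P(n) \sum_{k=0}^{n} k \binom{n}{k} p^k (1-p)^{n-k} \\
&= p \sum_{n=0}^{\infty} n P(n) \\
&= p \langle k \rangle \qquad (3)
\end{aligned}$$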

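I- Probabilistic Flooding

Analysis:

$$\begin{aligned}
\langle k_e^2 \rangle &= \sum_{k=0}^{\infty} k^2 \sum_{n=k}^{\infty} \binom{n}{k} p^k (1-p)^{n-k} P(n) \\
&= \sum_{n=0}^{\infty} P(n) \sum_{k=0}^{n} k^2 \binom{n}{k} p^k (1-p)^{n-k} \\
&= \sum_{n=0}^{\infty} P(n) \left( n^2 p^2 + n p (1-p) \right) \\
&= p^2 \langle k^2 \rangle + p(1-p) \langle k \rangle \qquad (4)
\end{aligned}$$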
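The closed forms for the first and second moments of the effective distribution can be verified numerically. The sketch below builds P_e(k) from an arbitrary small degree distribution by binomial thinning and compares its moments with p⟨k⟩ and p²⟨k²⟩ + p(1 – p)⟨k⟩. The example distribution and the function names are made up for illustration.

```python
from math import comb

def effective_distribution(dist, p):
    """P_e(k) = sum over n >= k of C(n, k) p^k (1 - p)^(n - k) P(n)."""
    max_degree = max(dist)
    return {
        k: sum(comb(n, k) * p**k * (1 - p)**(n - k) * prob
               for n, prob in dist.items() if n >= k)
        for k in range(max_degree + 1)
    }

def moments(dist):
    """Return (<k>, <k^2>) for a distribution given as {degree: probability}."""
    first = sum(k * prob for k, prob in dist.items())
    second = sum(k * k * prob for k, prob in dist.items())
    return first, second

# Hypothetical degree distribution and forwarding probability.
dist = {1: 0.2, 3: 0.5, 6: 0.3}
p = 0.4
k1, k2 = moments(dist)
e1, e2 = moments(effective_distribution(dist, p))
print(e1, p * k1)                           # first moment vs. p<k>
print(e2, p * p * k2 + p * (1 - p) * k1)    # second moment vs. closed form
```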

I- Probabilistic Flooding

Analysis:

From (3) and (4), the ratio of the second to the first moment at the critical point is:

$$\frac{\langle k_e^2 \rangle}{\langle k_e \rangle} = p_c \frac{\langle k^2 \rangle}{\langle k \rangle} + (1 - p_c) = 2 \;\Longrightarrow\; p_c = \frac{1}{\alpha - 1} \qquad (5)$$

where α = ⟨k²⟩/⟨k⟩ is the ratio of the second to the first moment of the actual graph.

The Gnutella network follows a power-law connectivity distribution, i.e. of the form

$$P(k) = C\, k^{-\tau} e^{-k/v} \qquad (6)$$

where C is a normalization factor, τ is the power-law exponent, and e^{-k/v} is the exponential cutoff factor required for representing real-world networks.

I- Probabilistic Flooding

Analysis:

The ratio α is computed from equation (6):

$$\alpha = \frac{\mathrm{Li}_{\tau-2}(e^{-1/v})}{\mathrm{Li}_{\tau-1}(e^{-1/v})}$$

so that

$$p_c = \frac{\mathrm{Li}_{\tau-1}(e^{-1/v})}{\mathrm{Li}_{\tau-2}(e^{-1/v}) - \mathrm{Li}_{\tau-1}(e^{-1/v})} \qquad (7)$$

where Li_τ(x) is the τ-th polylogarithm of x:

$$\mathrm{Li}_{\tau}(x) = \sum_{k=1}^{\infty} \frac{x^k}{k^{\tau}}$$

Hence, pc is a function of the cutoff index v and the power-law exponent τ.

For Gnutella, the power-law exponent has been estimated as low as 1.4 and as high as 2.3 at different times, and v is in the range of 100 to 1000.
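The critical probability can be evaluated numerically for the parameter ranges quoted for Gnutella. The sketch below sums the polylogarithm series directly instead of calling a special-function library; the truncation rule (60·v terms) and the function names are assumptions chosen for this illustration, and they rely on the argument e^{-1/v} being strictly less than 1 so that the series converges.

```python
from math import exp

def polylog(s, x, terms):
    """Truncated series Li_s(x) = sum_{k=1}^{terms} x^k / k^s, for 0 <= x < 1."""
    total = 0.0
    power = 1.0
    for k in range(1, terms + 1):
        power *= x              # power == x**k
        total += power / k**s
    return total

def critical_probability(tau, v):
    """p_c = 1 / (alpha - 1), with alpha = Li_{tau-2}(x) / Li_{tau-1}(x), x = e^{-1/v}."""
    x = exp(-1.0 / v)
    terms = int(60 * v)         # beyond this, x**k is below e^{-60}
    alpha = polylog(tau - 2, x, terms) / polylog(tau - 1, x, terms)
    return 1.0 / (alpha - 1.0)

for tau in (1.4, 2.3):
    for v in (100, 500, 1000):
        print(tau, v, critical_probability(tau, v))
```

A larger cutoff index v lets higher-degree nodes exist, which raises α and lowers the critical probability.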

I- Probabilistic Flooding

[Figure: critical probability pc (vertical axis, 0.01 to 0.08) versus cut-off index v (horizontal axis, 100 to 1000), with one curve for power-law exponent τ = 2.3 and one for τ = 1.4, computed from the expression for pc in terms of polylogarithms of e^{-1/v}.]

• Critical probability can be less than 0.01

• Hence, flooding cost is reduced by more than 99% without losing reachability, i.e. scalable search

II- TTL selection policy

Problem: in normal flooding search, the TTL is restricted to the initial value set by the search originator, regardless of the actual size of the network, i.e. not scalable.

Solution: the selection policy is based on the typical length λ of the shortest path between two randomly chosen nodes on any random graph, which is given by Newman as follows:

$$\lambda = \frac{\ln\left[(N-1)(z_2 - z_1) + z_1^2\right] - \ln\left(z_1^2\right)}{\ln\left(z_2 / z_1\right)}$$

N: number of nodes; the average number of active nodes does not vary heavily over short time intervals

z1: number of neighbors which are one hop away

z2: number of neighbors which are two hops away

II- TTL selection policy

Solution: each node estimates z1 and z2 periodically with local ping packets, and sets the TTL of its query to the estimated typical path length λ between two nodes.

The TTL is adapted based on information collected locally, hence: scalable TTL selection.
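The TTL selection can be sketched as a direct evaluation of the typical path length from N, z1 and z2. This is an illustration under stated assumptions: the formula used is λ = (ln[(N – 1)(z2 – z1) + z1²] – ln(z1²)) / ln(z2/z1), rounding λ up to an integer TTL is a choice made here rather than something the slides specify, the sample values are hypothetical, and the sketch requires z2 > z1 > 0.

```python
from math import ceil, log

def typical_path_length(n, z1, z2):
    """Typical shortest-path length between two random nodes of a random graph.

    n  : number of nodes
    z1 : number of neighbors one hop away
    z2 : number of neighbors two hops away (must exceed z1)
    """
    return (log((n - 1) * (z2 - z1) + z1 ** 2) - log(z1 ** 2)) / log(z2 / z1)

def select_ttl(n, z1, z2):
    """Round the estimated path length up to a whole number of hops (assumption)."""
    return ceil(typical_path_length(n, z1, z2))

# Hypothetical local estimates: 10,000 nodes, 4 first and 12 second neighbors.
print(typical_path_length(10_000, 4, 12), select_ttl(10_000, 4, 12))
```

Because λ grows only logarithmically with N, the selected TTL changes slowly as the network grows.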

Summary

• Flooding search scalability is improved by employing probabilistic flooding search and adopting a new TTL selection policy.

• Percolation Theory is used to formally analyze P2P networks at critical operating points.

• Conclusion: theoretical and statistical models applied to Complex Systems can be exploited effectively to formally model and analyze unstructured P2P networks.

Questions?