46
An Overview of Peer- to-Peer Sami Rollins 11/14/02

Peer to-peer

Embed Size (px)

DESCRIPTION

 

Citation preview

Page 1: Peer to-peer

An Overview of Peer-to-Peer

Sami Rollins

11/14/02

Page 2: Peer to-peer

Outline

• P2P Overview– What is a peer?– Example applications– Benefits of P2P

• P2P Content Sharing– Challenges– Group management/data placement approaches– Measurement studies

Page 3: Peer to-peer

What is Peer-to-Peer (P2P)?

• Napster?

• Gnutella?

• Most people think of P2P as music sharing

Page 4: Peer to-peer

What is a peer?

• Contrasted with Client-Server model

• Servers are centrally maintained and administered

• Client has fewer resources than a server

Page 5: Peer to-peer

What is a peer?

• A peer’s resources are similar to the resources of the other participants

• P2P – peers communicating directly with other peers and sharing resources

Page 6: Peer to-peer

Levels of P2P-ness

• P2P as a mindset– Slashdot

• P2P as a model– Gnutella

• P2P as an implementation choice– Application-layer multicast

• P2P as an inherent property– Ad-hoc networks

Page 7: Peer to-peer

P2P Application Taxonomy

P2P Systems

Distributed ComputingSETI@home

File SharingGnutella

CollaborationJabber

PlatformsJXTA

Page 8: Peer to-peer

P2P Goals/Benefits

• Cost sharing

• Resource aggregation

• Improved scalability/reliability

• Increased autonomy

• Anonymity/privacy

• Dynamism

• Ad-hoc communication

Page 9: Peer to-peer

P2P File Sharing

• Content exchange– Gnutella

• File systems– Oceanstore

• Filtering/mining– Opencola

Page 10: Peer to-peer

P2P File Sharing Benefits

• Cost sharing

• Resource aggregation

• Improved scalability/reliability

• Anonymity/privacy

• Dynamism

Page 11: Peer to-peer

Research Areas

• Peer discovery and group management

• Data location and placement

• Reliable and efficient file exchange

• Security/privacy/anonymity/trust

Page 12: Peer to-peer

Current Research

• Group management and data placement– Chord, CAN, Tapestry, Pastry

• Anonymity– Publius

• Performance studies– Gnutella measurement study

Page 13: Peer to-peer

Management/Placement Challenges

• Per-node state

• Bandwidth usage

• Search time

• Fault tolerance/resiliency

Page 14: Peer to-peer

Approaches

• Centralized

• Flooding

• Document Routing

Page 15: Peer to-peer

Centralized

• Napster model• Benefits:

– Efficient search

– Limited bandwidth usage

– No per-node state

• Drawbacks:– Central point of failure

– Limited scale

Bob Alice

JaneJudy

Page 16: Peer to-peer

Flooding

• Gnutella model• Benefits:

– No central point of failure

– Limited per-node state

• Drawbacks:– Slow searches

– Bandwidth intensive

Bob

Alice

Jane

Judy

Carl

Page 17: Peer to-peer

Document Routing

• FreeNet, Chord, CAN, Tapestry, Pastry model

• Benefits:– More efficient searching

– Limited per-node state

• Drawbacks:– Limited fault-tolerance vs

redundancy

001 012

212

305

332

212 ?

212 ?

Page 18: Peer to-peer

Document Routing – CAN

• Associate to each node and item a unique id in an d-dimensional space

• Goals– Scales to hundreds of thousands of nodes

– Handles rapid arrival and failure of nodes

• Properties – Routing table size O(d)

– Guarantees that a file is found in at most d*n1/d steps, where n is the total number of nodes

Slide modified from another presentation

Page 19: Peer to-peer

CAN Example: Two Dimensional Space

• Space divided between nodes• All nodes cover the entire

space• Each node covers either a

square or a rectangular area of ratios 1:2 or 2:1

• Example: – Node n1:(1, 2) first node that

joins cover the entire space1 2 3 4 5 6 70

1

2

3

4

5

6

7

0

n1

Slide modified from another presentation

Page 20: Peer to-peer

CAN Example: Two Dimensional Space

• Node n2:(4, 2) joins space is divided between n1 and n2

1 2 3 4 5 6 70

1

2

3

4

5

6

7

0

n1 n2

Slide modified from another presentation

Page 21: Peer to-peer

CAN Example: Two Dimensional Space

• Node n2:(4, 2) joins space is divided between n1 and n2

1 2 3 4 5 6 70

1

2

3

4

5

6

7

0

n1 n2

n3

Slide modified from another presentation

Page 22: Peer to-peer

CAN Example: Two Dimensional Space

• Nodes n4:(5, 5) and n5:(6,6) join

1 2 3 4 5 6 70

1

2

3

4

5

6

7

0

n1 n2

n3 n4n5

Slide modified from another presentation

Page 23: Peer to-peer

CAN Example: Two Dimensional Space

• Nodes: n1:(1, 2); n2:(4,2); n3:(3, 5); n4:(5,5);n5:(6,6)

• Items: f1:(2,3); f2:(5,1); f3:(2,1); f4:(7,5);

1 2 3 4 5 6 70

1

2

3

4

5

6

7

0

n1 n2

n3 n4n5

f1

f2

f3

f4

Slide modified from another presentation

Page 24: Peer to-peer

CAN Example: Two Dimensional Space

• Each item is stored by the node who owns its mapping in the space

1 2 3 4 5 6 70

1

2

3

4

5

6

7

0

n1 n2

n3 n4n5

f1

f2

f3

f4

Slide modified from another presentation

Page 25: Peer to-peer

CAN: Query Example• Each node knows its neighbors

in the d-space• Forward query to the neighbor

that is closest to the query id• Example: assume n1 queries f4• Can route around some failures

– some failures require local flooding

1 2 3 4 5 6 70

1

2

3

4

5

6

7

0

n1 n2

n3 n4n5

f1

f2

f3

f4

Slide modified from another presentation

Page 26: Peer to-peer

CAN: Query Example• Each node knows its neighbors

in the d-space• Forward query to the neighbor

that is closest to the query id• Example: assume n1 queries f4• Can route around some failures

– some failures require local flooding

1 2 3 4 5 6 70

1

2

3

4

5

6

7

0

n1 n2

n3 n4n5

f1

f2

f3

f4

Slide modified from another presentation

Page 27: Peer to-peer

CAN: Query Example• Each node knows its neighbors

in the d-space• Forward query to the neighbor

that is closest to the query id• Example: assume n1 queries f4• Can route around some failures

– some failures require local flooding

1 2 3 4 5 6 70

1

2

3

4

5

6

7

0

n1 n2

n3 n4n5

f1

f2

f3

f4

Slide modified from another presentation

Page 28: Peer to-peer

CAN: Query Example• Each node knows its neighbors

in the d-space• Forward query to the neighbor

that is closest to the query id• Example: assume n1 queries f4• Can route around some failures

– some failures require local flooding

1 2 3 4 5 6 70

1

2

3

4

5

6

7

0

n1 n2

n3 n4n5

f1

f2

f3

f4

Slide modified from another presentation

Page 29: Peer to-peer

Node Failure Recovery

• Simple failures– know your neighbor’s neighbors– when a node fails, one of its neighbors takes

over its zone

• More complex failure modes– simultaneous failure of multiple adjacent nodes – scoped flooding to discover neighbors– hopefully, a rare event

Slide modified from another presentation

Page 30: Peer to-peer

Document Routing – Chord

• MIT project• Uni-dimensional ID

space• Keep track of log N

nodes• Search through log N

nodes to find desired key

N32

N10

N5

N20

N110

N99

N80

N60

K19

Page 31: Peer to-peer

Doc Routing – Tapestry/Pastry

• Global mesh• Suffix-based routing• Uses underlying network

distance in constructing mesh

13FE

ABFE

1290

239E

73FE

9990

F990

993E

04FE

43FE

Page 32: Peer to-peer

Comparing Guarantees

logbNNeighbor map

Pastry

b logbNlogbNGlobal MeshTapestry

2ddN1/dMulti-dimensional

CAN

log Nlog NUni-dimensional

Chord

StateSearchModel

b logbN + b

Page 33: Peer to-peer

Remaining Problems?

• Hard to handle highly dynamic environments

• Usable services

• Methods don’t consider peer characteristics

Page 34: Peer to-peer

Measurement Studies

• “Free Riding on Gnutella”

• Most studies focus on Gnutella

• Want to determine how users behave

• Recommendations for the best way to design systems

Page 35: Peer to-peer

Free Riding Results

• Who is sharing what?

• August 2000

70%2,182,0871,667 hosts (5%)

37%1,142,645 333 hosts (1%)

87% 2,692,0823,334 hosts (10%)

99%3,082,5728,333 hosts (25%)

98%3,037,2326,667 hosts (20%)

94%2,928,9055,000 hosts (15%)

As percent of wholeShareThe top

Page 36: Peer to-peer

Saroiu et al Study

• How many peers are server-like…client-like?– Bandwidth, latency

• Connectivity

• Who is sharing what?

Page 37: Peer to-peer

Saroiu et al Study

• May 2001

• Napster crawl– query index server and keep track of results– query about returned peers– don’t capture users sharing unpopular content

• Gnutella crawl– send out ping messages with large TTL

Page 38: Peer to-peer

Results Overview

• Lots of heterogeneity between peers– Systems should consider peer capabilities

• Peers lie– Systems must be able to verify reported peer

capabilities or measure true capabilities

Page 39: Peer to-peer

Measured Bandwidth

Page 40: Peer to-peer

Reported Bandwidth

Page 41: Peer to-peer

Measured Latency

Page 42: Peer to-peer

Measured Uptime

Page 43: Peer to-peer

Number of Shared Files

Page 44: Peer to-peer

Connectivity

Page 45: Peer to-peer

Points of Discussion

• Is it all hype?

• Should P2P be a research area?

• Do P2P applications/systems have common research questions?

• What are the “killer apps” for P2P systems?

Page 46: Peer to-peer

Conclusion

• P2P is an interesting and useful model

• There are lots of technical challenges to be solved