Introduction to Peer-to-Peer Networks

What is a P2P network

• A P2P network is a large distributed system. It uses the vast resources of PCs distributed at the edge of the Internet to build a network that allows resource sharing without any central authority.

• Client-server vs. peer-to-peer: a peer is both a client and a server. Control is decentralized.

• Much more than a system for sharing pirated music.

Why does P2P need attention?

A P2P network is an overlay network

Network of peers. Each link between peers consists of one or more IP links. The overlay network resides in the application layer.

[Figure: overlay network linking the peers Alice, Bob, and Carol]
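The overlay idea can be sketched in a few lines. This is only an illustration: the router names (r1, r2, r3) and the paths are hypothetical and not taken from the slides. It shows that one overlay link between two peers may be carried by several IP links underneath.

```python
# Hypothetical underlay paths: each overlay link between two peers
# is carried by a sequence of IP hops through routers r1..r3.
overlay_links = {
    ("Alice", "Bob"): ["Alice", "r1", "r2", "Bob"],
    ("Alice", "Carol"): ["Alice", "r1", "Carol"],
    ("Bob", "Carol"): ["Bob", "r2", "r3", "r1", "Carol"],
}


def ip_hops(a, b):
    """Number of IP links underneath the overlay link between peers a and b."""
    path = overlay_links.get((a, b)) or overlay_links.get((b, a))
    if path is None:
        raise KeyError(f"no overlay link between {a} and {b}")
    return len(path) - 1


alice_bob_hops = ip_hops("Alice", "Bob")
carol_bob_hops = ip_hops("Carol", "Bob")
```

The peers see a single logical link in each case; the IP paths are invisible at the application layer.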

Well-known P2P Systems

• Napster

• Gnutella

• KaZaA

• eDonkey

• Chord

• Tapestry

• CAN

• Pastry

• BitTorrent

Some important issues

Search

Storage

Security

Applications

A Distributed Storage Service

[Figure: the peers Alice, Bob, Carol, and David]

Promises

Consider File Sharing as an Example

– Available 24/7

– Durable despite machine failures

– Information is protected

– Resilient to Denial of Service

Additional Goals

• Massive scalability

• Anonymity

• Deniability

• Resistance to censorship

Challenges

• A P2P network must be self-organizing. Join and leave operations must be self-managed.

• The infrastructure is untrusted and the components are unreliable. The number of faulty nodes grows linearly with system size. Yet the aggregate behavior has to be trustworthy.

Challenges

• Tolerance to failures and churn

• Efficient routing even if the structure of the network is unpredictable

• Dealing with freeriders

• Load balancing

• Security issues

Looking up data

• How do you locate data/files/objects in a large P2P system built around a dynamic set of nodes in a scalable manner, without any centralized server or hierarchy?

• Napster index servers used a central database. Questionable scalability and poor resilience.

• Check how names are looked up in the Internet's DNS.

Napster

Developed by Shawn Fanning in 1999. Shut down after 2 years for copyright infringement. Centralized directory servers were a bottleneck.

[Figure: Napster architecture. Users reach a root/redirector and three directory servers over the Internet. The servers store indices of songs only.]
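A minimal sketch of a central directory in the Napster style, assuming a single in-memory index. The class and method names are hypothetical, not Napster's actual protocol. It shows that the server holds only an index from song titles to peers, while the files stay on the peers.

```python
from collections import defaultdict


class CentralIndex:
    """Hypothetical central directory: maps a song title to the peers holding it."""

    def __init__(self):
        self._index = defaultdict(set)

    def register(self, peer, titles):
        # A joining peer announces the songs it is willing to share.
        for title in titles:
            self._index[title.lower()].add(peer)

    def unregister(self, peer):
        # A leaving peer is removed from every entry.
        for holders in self._index.values():
            holders.discard(peer)

    def lookup(self, title):
        # Only peer names are returned; the transfer itself is peer to peer.
        return sorted(self._index.get(title.lower(), set()))


index = CentralIndex()
index.register("alice", ["Double Helix"])
index.register("bob", ["Double Helix", "Blue Train"])
napster_hits = index.lookup("double helix")
index.unregister("alice")
napster_hits_after_leave = index.lookup("double helix")
```

Every query goes through the one index, which is why the directory servers become a bottleneck and a single point of failure.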

Gnutella

Truly decentralized system. A search like "where is Double Helix?" is based on flooding the query over a graph of arbitrary topology. This has an obvious scalability problem, and the wasted bandwidth caused serious inefficiencies.

[Figure: Gnutella graph, showing a client looking for "double helix" and a node labeled "double helix"]
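Query flooding can be sketched as a breadth-first spread with a hop limit. This is a simplified model, not the Gnutella wire protocol: the five-node graph, the `ttl` parameter, and the message counter are assumptions made for illustration. Each node forwards the query to every neighbor except the one it came from, and nodes that have already seen the query drop the duplicate.

```python
from collections import deque


def flood_search(graph, files, start, wanted, ttl):
    """Flood a query from `start`; return (nodes holding `wanted`, messages sent)."""
    seen = {start}
    hits = [start] if wanted in files.get(start, ()) else []
    messages = 0
    queue = deque([(start, None, ttl)])  # (node, sender, hops left)
    while queue:
        node, sender, hops_left = queue.popleft()
        if hops_left == 0:
            continue  # hop limit reached, stop forwarding
        for neighbor in graph[node]:
            if neighbor == sender:
                continue  # do not send the query back where it came from
            messages += 1  # every forwarded copy costs bandwidth
            if neighbor in seen:
                continue  # duplicate query, dropped by the receiver
            seen.add(neighbor)
            if wanted in files.get(neighbor, ()):
                hits.append(neighbor)
            queue.append((neighbor, node, hops_left - 1))
    return hits, messages


# Hypothetical overlay of five peers; only E stores the file.
graph = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "E"],
    "D": ["B", "E"],
    "E": ["C", "D"],
}
files = {"E": {"double helix"}}
gnutella_hits, gnutella_messages = flood_search(graph, files, "A", "double helix", ttl=2)
```

Even in this tiny graph, some messages are duplicates that the receiver discards. That waste grows quickly with the number of nodes and their degree.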

Unstructured vs. Structured

• Unstructured P2P networks allow resources to be placed at any node. The network topology is arbitrary, and the growth is spontaneous.

• Structured P2P networks simplify resource location and load balancing by defining a topology and defining rules for resource placement.

Distributed Hash Table (DHT)

Object-to-machine mapping uses unique keys.

H (object name) = key (H = hash function)

H (machine name) = key

An object whose name maps to key k is placed on the machine whose name maps to key k.

Simplifies object location.
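The mapping above can be sketched as follows. Two things here are assumptions not stated on the slide: SHA-1 is used as the hash function H, and because an object key rarely equals a machine key exactly, the object is given to the first machine whose key is greater than or equal to the object's key, wrapping around the keyspace. The keyspace size and machine names are hypothetical.

```python
import hashlib
from bisect import bisect_left

KEYSPACE = 2 ** 16  # N: illustrative size of the keyspace 0 .. N-1


def H(name):
    """Hash an object name or machine name to a key in 0 .. KEYSPACE-1."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % KEYSPACE


def responsible_machine(object_name, machines):
    """Return the machine that stores the object.

    Assumed rule: first machine key >= object key, wrapping around
    to the smallest machine key if none is larger.
    """
    ring = sorted((H(m), m) for m in machines)
    keys = [k for k, _ in ring]
    i = bisect_left(keys, H(object_name)) % len(ring)
    return ring[i][1]


machines = ["machine-a", "machine-b", "machine-c", "machine-d"]
owner = responsible_machine("double helix", machines)
```

Any peer that knows the hash function and the machine keys can compute the owner locally, without flooding and without a central directory.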

Distributed Hash Table (DHT)

[Figure: keyspace from 0 to N-1 containing the keys a, b, and c. A machine name is hashed to b, and an object name is hashed to b.]

Basic idea
