brandon-thornton
Introduction to Peer-to-Peer Networks
What is a P2P network
• A P2P network is a large distributed system. It uses the
vast resources of PCs distributed at the edge of the Internet
to build a network that allows resource sharing without
any central authority.
• Client-Server vs. Peer-to-peer. A peer is both a client
and a server. Control is decentralized.
• Much more than a system for sharing pirated music.
Why does P2P need attention?
A P2P network is an overlay network
Network of peers. Each link between peers consists of one or
more IP links. The overlay network resides in the
application layer.
[Figure: overlay network linking the peers Alice, Bob, and Carol]
Well-known P2P Systems
• Napster
• Gnutella
• KaZaA
• eDonkey
• Chord
• Tapestry
• CAN
• Pastry
• BitTorrent
Some important issues
Search
Storage
Security
Applications
A Distributed Storage Service
[Figure: the peers Alice, Bob, Carol, and David]
Promises
Consider File Sharing as an Example
– Available 24/7
– Durable despite machine failures
– Information is protected
– Resilient to Denial of Service
Additional Goals
• Massive scalability
• Anonymity
• Deniability
• Resistance to censorship
Challenges
• A P2P network must be self-organizing. Join
and leave operations must be self-managed.
• The infrastructure is untrusted and the
components are unreliable. The number of faulty
nodes grows linearly with system size. Yet, the
aggregate behavior has to be trustworthy.
Challenges
• Tolerance to failures and churn
• Efficient routing even if the structure of the
network is unpredictable.
• Dealing with freeriders
• Load balancing
• Security issues
Looking up data
• How do you locate data/files/objects in a large P2P
system built around a dynamic set of nodes in a
scalable manner without any centralized server or
hierarchy?
• Napster index servers used a central database.
Questionable scalability and poor resilience.
• Compare with how names are looked up in the Internet's DNS.
Napster
Developed by Shawn Fanning in 1999 and shut down after 2 years for copyright infringement. Centralized directory servers were a bottleneck.
[Figure: Napster architecture. Users connect over the Internet to a root/redirector and three directory servers. Annotation: stores indices of songs only.]
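The central-directory idea can be sketched in a few lines. This is only an illustration under stated assumptions, not Napster's actual protocol: the class `DirectoryServer` and its methods `register`, `unregister`, and `lookup` are hypothetical names. The server keeps an index of which peer holds which title and never stores the files themselves.

```python
from collections import defaultdict


class DirectoryServer:
    """Hypothetical central index: maps a song title to the peers that hold it."""

    def __init__(self):
        self.index = defaultdict(set)

    def register(self, peer, titles):
        # A peer announces the titles it is willing to share.
        for title in titles:
            self.index[title.lower()].add(peer)

    def unregister(self, peer):
        # A departing peer is removed from every index entry.
        for peers in self.index.values():
            peers.discard(peer)

    def lookup(self, title):
        # Only peer names are returned; the transfer itself is peer-to-peer.
        return sorted(self.index.get(title.lower(), set()))


server = DirectoryServer()
server.register("alice", ["Double Helix"])
server.register("bob", ["Double Helix", "Another Song"])
print(server.lookup("double helix"))  # ['alice', 'bob']
server.unregister("alice")
print(server.lookup("double helix"))  # ['bob']
```

Every lookup passes through this one object, which is why such a directory becomes both a bottleneck and a single point of failure.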
Gnutella
Truly decentralized system. A search such as
"Where is Double Helix?"
is carried out by flooding the query over a graph of
arbitrary topology. This has an obvious scalability problem, and
the wasted bandwidth caused serious
inefficiencies.
[Figure: Gnutella graph showing a client looking for "double helix" and the peer that holds "double helix"]
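A minimal sketch of query flooding, assuming a small hand-made graph and a time-to-live (TTL) hop limit. The function `flood_search`, the peer names, and the `files` table are hypothetical. As a simplification, each peer forwards the query to all of its neighbors, including the one it came from, so the sketch is not a faithful model of the Gnutella protocol.

```python
from collections import deque


def flood_search(graph, files, start, wanted, ttl):
    """Breadth-first flood; returns (peers holding the file, query messages sent)."""
    seen = {start}
    hits = []
    messages = 0
    queue = deque([(start, ttl)])
    while queue:
        peer, remaining = queue.popleft()
        if wanted in files.get(peer, set()):
            hits.append(peer)
        if remaining == 0:
            continue  # TTL exhausted: this peer does not forward the query
        for neighbor in graph[peer]:
            messages += 1  # every forwarded copy costs bandwidth, even duplicates
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, remaining - 1))
    return hits, messages


# Hypothetical four-peer overlay; only david holds the file.
graph = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "carol", "david"],
    "carol": ["alice", "bob", "david"],
    "david": ["bob", "carol"],
}
files = {"david": {"double helix"}}

print(flood_search(graph, files, "alice", "double helix", ttl=2))  # (['david'], 8)
print(flood_search(graph, files, "alice", "double helix", ttl=1))  # ([], 2)
```

Even in this tiny graph, one successful search costs eight messages, most of them duplicates, and lowering the TTL to save bandwidth makes the search miss the file.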
Unstructured vs. Structured
• Unstructured P2P networks allow resources
to be placed at any node. The network
topology is arbitrary, and the growth is
spontaneous.
• Structured P2P networks simplify resource
location and load balancing by defining a
topology and defining rules for resource
placement.
Distributed Hash Table (DHT)
Object-to-machine mapping uses unique keys.
H (object name) = key (H = hash function)
H (machine name) = key
An object whose name maps to key k is placed on the machine whose
name maps to key k.
This simplifies object location.
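The mapping can be sketched as follows. The slides only require that object and machine names be hashed into the same keyspace. Since two names rarely hash to exactly the same key, this sketch adds an assumption that is not in the slides: an object goes to the machine whose key is the first one at or after the object's key, wrapping around the keyspace (the successor rule used by consistent hashing). `KEYSPACE`, `build_ring`, and `locate` are hypothetical names.

```python
import hashlib
from bisect import bisect_left

KEYSPACE = 2 ** 16  # N: keys lie in 0 .. N-1 (size chosen only for illustration)


def H(name):
    """Hash a machine or object name to a key in the shared keyspace."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % KEYSPACE


def build_ring(machines):
    """Sorted list of (key, machine name) pairs."""
    return sorted((H(m), m) for m in machines)


def locate(ring, object_name):
    """Return the machine responsible for the object (successor rule, an assumption)."""
    key = H(object_name)
    keys = [k for k, _ in ring]
    i = bisect_left(keys, key) % len(ring)  # wrap to the first machine past N-1
    return ring[i][1]


machines = ["alice", "bob", "carol", "david"]
ring = build_ring(machines)
print(locate(ring, "double helix"))
```

Any node that knows the hash function and the set of machine keys can compute the same answer, so no central index is needed.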
Distributed Hash Table (DHT)
[Figure: keyspace from 0 to N-1 containing keys a, b, and c. A machine name and an object name are both hashed to b.]
Basic idea