Introduction to Peer-to-Peer Networks
What is a P2P network
• Uses the vast resources of the machines at the edge
of the Internet to build a network that allows resource
sharing without any central authority.
• Client-Server vs. Peer-to-peer. A peer is both a
client and a server. Control is decentralized.
• Much more than a system for sharing pirated
music.
Historical Perspective
• The Internet originally emphasized working in the P2P
mode instead of the client-server mode.
• SRI, UCLA, UCSB and the University of Utah had powerful
host machines forming a league of equals. ARPANET
connected them in the late 1960s.
Historical Perspective
• USENET was originally based on UUCP (Unix-to-
Unix Copy Protocol). It allowed users on two different
Unix machines to exchange messages and files.
Why does P2P need attention?
Overlay network
A P2P network is an overlay network. Each link
between peers consists of one or more IP links.
[Figure: peers Alice, Bob, and Carol connected by overlay links]
Well-known P2P Systems
• Napster
• Gnutella
• KaZaA
• Limewire
• eDonkey
• Chord
• Tapestry
• CAN
• Pastry
• BitTorrent
• Kademlia
• Skype
• Various Social networks
Some important issues
Search
Storage
Security
Applications
A Distributed Storage Service
[Figure: peers Alice, Bob, Carol, and David in a distributed storage service]
Promises
Consider File Sharing as an Example
– Available 24/7
– Durable despite machine failures
– Information is protected
– Resilient to Denial of Service
Additional Goals
• Massive scalability
• Anonymity
• Deniability
• Resistance to censorship
Challenges
• A P2P network must be self-organizing. Join
and leave operations must be self-managed.
• The infrastructure is untrusted and the
components are unreliable. The number of faulty
nodes grows linearly with system size. Yet, the
aggregate behavior has to be trustworthy.
Challenges
• Tolerance to failures and churn
• Efficient routing even if the structure of the
network is unpredictable.
• Dealing with freeriders
• Load balancing
• Security issues
Looking up data
• How do you locate data/files/objects in a large P2P
system built around a dynamic set of nodes in a
scalable manner without any centralized server or
hierarchy?
• Napster’s index servers used a central database, giving
questionable scalability and poor resilience.
• For comparison, consider how names are looked up in the Internet’s DNS.
Napster
Developed by Shawn Fanning in 1999 and shut down after two years for copyright infringement. The centralized directory servers were a bottleneck.
[Figure: Napster architecture. Users connect over the Internet to a root/redirector and a set of directory servers, which store indices of songs only.]
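The directory-server idea can be sketched in a few lines. This is only an illustration of a central index that stores song indices and never the files themselves; the class name `DirectoryServer` and its methods are hypothetical and are not taken from Napster's actual protocol.

```python
from collections import defaultdict


class DirectoryServer:
    """Hypothetical central index: song title -> set of peers sharing it.

    Only the index lives here; the files stay on the peers.
    """

    def __init__(self):
        self.index = defaultdict(set)

    def register(self, peer, titles):
        """A peer announces the titles it is willing to share."""
        for title in titles:
            self.index[title.lower()].add(peer)

    def unregister(self, peer):
        """Remove a departing peer from every index entry."""
        for peers in self.index.values():
            peers.discard(peer)

    def lookup(self, title):
        """Return the peers that share a title, in sorted order."""
        return sorted(self.index.get(title.lower(), set()))


server = DirectoryServer()
server.register("alice", ["Double Helix"])
server.register("bob", ["Double Helix", "Another Song"])
print(server.lookup("double helix"))
```

Every lookup goes through the one server object, which is where the bottleneck and the single point of failure come from.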
Gnutella
Truly decentralized system. A search such as
“where is Double Helix?”
is carried out by flooding the query over a graph of
arbitrary topology. This has an obvious scalability problem, and
the wasted bandwidth caused serious
inefficiencies.
[Figure: Gnutella graph showing a client looking for “double helix” and a node labeled “double helix”]
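A minimal sketch of query flooding, to make the bandwidth cost concrete. The graph, the peer names, and the function `flood_search` are made up for illustration. The hop limit `ttl` is an assumption added here to bound the flood; it is not mentioned on the slide.

```python
from collections import deque


def flood_search(graph, files, start, wanted, ttl):
    """Flood a query from `start` for at most `ttl` hops.

    Returns (peers holding `wanted`, number of query messages sent).
    """
    seen = {start}
    frontier = deque([(start, ttl)])
    hits = set()
    messages = 0
    while frontier:
        peer, hops_left = frontier.popleft()
        if wanted in files.get(peer, ()):
            hits.add(peer)
        if hops_left == 0:
            continue  # hop budget used up, do not forward further
        for neighbor in graph[peer]:
            messages += 1  # every forwarded copy costs bandwidth
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, hops_left - 1))
    return hits, messages


# Hypothetical four-peer overlay; only D holds the file.
graph = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["B", "C"],
}
files = {"D": {"double helix"}}

hits, messages = flood_search(graph, files, "A", "double helix", ttl=2)
print(hits, messages)  # {'D'} 8
```

Even in this four-peer graph, finding one file costs eight messages, most of them duplicates delivered to peers that have already seen the query.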
Unstructured vs. Structured
• Unstructured P2P networks allow resources
to be placed at any node. The network
topology is arbitrary, and the growth is
spontaneous.
• Structured P2P networks simplify resource
location and load balancing by defining a
topology and defining rules for resource
placement.
Distributed Hash Table (DHT)
Object-to-machine mapping uses unique keys.
H (object name) = key (H = hash function)
H (machine name) = key
An object whose name maps to key k is placed on the machine whose
name maps to key k.
Simplifies object location.
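The mapping can be sketched as follows. Two assumptions are made that are not on the slide: the keyspace is 0 to N-1 with N = 2^16, and when no machine hashes to exactly the object's key, the object goes to the machine with the next larger key, wrapping around (the successor rule used by Chord). The helper name `responsible_machine` and the machine names are hypothetical.

```python
import hashlib
from bisect import bisect_left

KEYSPACE_BITS = 16  # assumption: keyspace 0 .. N-1 with N = 2**16


def H(name):
    """Hash a name (object or machine) onto the keyspace 0 .. N-1."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** KEYSPACE_BITS)


def responsible_machine(object_name, machines):
    """Pick the machine that stores `object_name`.

    Assumed rule: the first machine whose key is >= the object's key,
    wrapping around to the smallest key at the end of the keyspace.
    """
    ring = sorted((H(m), m) for m in machines)
    keys = [key for key, _ in ring]
    i = bisect_left(keys, H(object_name))
    return ring[i % len(ring)][1]


machines = ["alice.example", "bob.example", "carol.example"]
print(H("double helix"), responsible_machine("double helix", machines))
```

Any peer that knows the hash function and the set of machine keys computes the same answer, so locating an object needs no central directory and no flooding.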
Distributed Hash Table (DHT)
[Figure: a keyspace running from 0 to N-1 with keys a, b, and c marked. A machine name and an object name are both hashed to b.]
Basic idea