Introduction
Widespread unstructured P2P network
  Currently between 200,000 and 300,000 hosts
Ideal as a research test bed
  Large-scale network demonstrates the need for scalable P2P protocols
A Gnutella client has 4-10 TCP connections to other peers
UDP is used for signaling traffic, and an "ultra-peer" state was created to gain some of the benefits of server-based networks
Introduction (Cont.)
"Ultra-peer" status is self-assigned by powerful peers and provides some extra functionality compared to ordinary nodes
There are many freely available Gnutella clients
Some of the most popular are:
  LimeWire
  BearShare
  Morpheus
  Shareaza
    It has the fastest-growing number of users
    It has a very pleasant GUI and also connects to eDonkey and BitTorrent
Its Main Features
This protocol underlies much of the current file-sharing activity on the Internet.
It is based on TCP/IP and HTTP.
A file sharing network (fsn) is a group of machines that exchange files using Gnutella.
To connect to a Gnutella network, you need the IP address of a single machine that is already part of the network.
Gnutella
Peer-to-peer indexing and searching service.
Peer-to-peer point-to-point file downloading using HTTP.
A Gnutella node needs a server (or a set of servers) to "start up"; gnutellahosts.com provides a service with reliable initial connection points
  But this introduces a new single point of failure!
Gnutella vs. Napster
Like Napster, distributed file storage and transmission
Added the ability to distribute file discovery
  Ask your direct peers who else they know
  Query those machines directly
Concepts of Unstructured Services
There are many interesting ideas being explored:
  Breaking shared files into many parts, both to increase bandwidth (parallel I/O) and to increase security of content, as no one site can access files without cooperation from its peers
    This type of technology makes censorship very hard.
  MojoNation has a load balancing and scheduling algorithm in the form of micropayments to reward those who contribute most to the community of peers.
  Gnutella, which is a family of related products, is usually described as a P2P search engine, as its interface is nearer to that of a search engine than to that of a Web file system
Characteristics
Gnutella is a distributed system for file sharing
  provides means for network discovery
  provides means for file searching and sharing
Defines a network at the application level
Employs the concept of peer-to-peer
  all hosts are equal (symmetry)
  there is no central point
  search is anonymous, but IP addresses are revealed when downloading
Connection
Once you establish a connection to the first servent, you announce your presence.
The first servent will pass that message on to all the servents that it is connected to, and so on.
These servents all reply with data about themselves:
  how many files each is sharing
  how many kilobytes the files take up
This already adds up to a lot of traffic!
Gnutella File Sharing model
Users register files with network neighbors
Search across the network to find files to copy
Does not require a centralized broker (as Napster does)
[Figure: four peers (Bob, Carol, Ted, Alice). The query "Where is Final Fantasy 4?" is passed from peer to peer, the answer "Carol has Final Fantasy 4" comes back, and the file is then copied from Carol.]
Decentralized File-sharing Model
Peers have the same capability and responsibility
The communication between peers is symmetric
There is no central directory server
Index on the metadata of shared files is stored locally among all peers
  Gnutella
  FreeServe
  MojoNation
Resource Discovery
Decentralized (Cont.)
Every user acts as a client, a server, or both (servent)
A user connects to the framework and becomes a member of the community, allowing others to connect through him/her
Users speak directly to other users with no intermediate or central authority
No one entity controls the information that passes through the community
Advantages and Disadvantages
Advantages:
  Inherent scalability
  Avoidance of the "single point of litigation" problem
  Fault tolerance
Disadvantages:
  Slow information discovery
  More query traffic on the network
Unstructured Decentralized Services
There are some 200 Napster clones available to support this area: http://www.ultimateresourcesite.com/napster/main.htm
Currently the most popular is Imesh [http://www.imesh.com], which has some 2 million users and can share any type of file.
Some of the best known file sharing systems are:
  MojoNation [http://www.mojonation.net]
  Freenet [http://freenet.sourceforge.net/]
  Gnutella [http://gnutella.wego.com/]
These three are not server-based like Napster; instead they support waves of software agents, expressing resource availability and interest, that propagate among an informal, dynamic network of peers
DFS Variations

                       | FTP                 | NFS                | Web                          | Napster (Shawn Fanning)         | Gnutella (Gene Kan @ AOL)            | Freenet (Ian Clarke)
Purpose                | Remote file sharing | Local file sharing | Remote file sharing (portal) | File-sharing community (portal) | Decentralized file sharing community | Decentralized anonymous file sharing
Moderated?             | Yes                 | Yes                | Yes                          | Yes                             | No                                   | No
Access control?        | Yes                 | Yes                | No                           | No                              | No                                   | No
Search                 | Server-based        | Server-based       | Server-based                 | Server-based                    | p2p                                  | p2p
File transfer          | Client/server       | Client/server      | Client/server                | p2p                             | p2p                                  | p2p
File transfer protocol | ftp                 | nfs                | http, caching                | proprietary                     | http                                 | Proprietary, encrypted, caching

DFS: Distributed File Sharing
P2P File Sharing Benefits
Cost sharing
Resource aggregation
Improved scalability/reliability
Anonymity/privacy
Dynamism
Management/Placement Challenges
Per-node state
Bandwidth usage
Search time
Fault tolerance/resiliency
Gnutella in Detail
Share any type of file (not just music)
Decentralized search, unlike Napster
You ask your neighbors for files of interest
Neighbors ask their neighbors, and so on
  A TTL field quenches messages after a number of hops
Users with matching files reply to you
Figure from http://computer.howstuffworks.com/file-sharing.htm
The Gnutella protocol (v0.4)
PING – Notify a peer of your existence
PONG – Reply to a PING request
QUERY – Find a file in the network
RESPONSE – Give the location of a file
PUSHREQUEST – Request a server behind a firewall to push a file out to a client
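As a rough illustration of how these message types might look on the wire, the sketch below packs and parses a v0.4-style descriptor header. The 23-byte layout (16-byte descriptor ID, payload type, TTL, hops, payload length), the type codes, and the little-endian length field are recalled from the published v0.4 specification rather than taken from these slides, so treat them as assumptions. The function names are hypothetical.

```python
import os
import struct

# Payload type codes as recalled from the v0.4 specification (assumption,
# not stated on the slides). RESPONSE corresponds to a query hit.
PING, PONG, PUSH, QUERY, QUERY_HIT = 0x00, 0x01, 0x40, 0x80, 0x81

# 16-byte descriptor ID, 1-byte type, 1-byte TTL, 1-byte hops,
# 4-byte payload length (little-endian is assumed here).
HEADER = struct.Struct("<16sBBBI")


def pack_header(msg_type, ttl, hops, payload_len, msg_id=None):
    """Build a 23-byte descriptor header (hypothetical helper)."""
    if msg_id is None:
        msg_id = os.urandom(16)  # random ID used to recognise duplicates
    return HEADER.pack(msg_id, msg_type, ttl, hops, payload_len)


def unpack_header(data):
    """Parse the first 23 bytes of a message into a dict."""
    msg_id, msg_type, ttl, hops, payload_len = HEADER.unpack(data[:HEADER.size])
    return {"id": msg_id, "type": msg_type, "ttl": ttl,
            "hops": hops, "payload_len": payload_len}


def forward(data):
    """Return the header to relay, or None once the TTL is used up."""
    h = unpack_header(data)
    if h["ttl"] <= 1:
        return None
    # Relaying keeps the ID, decrements the TTL and increments the hop count.
    return pack_header(h["type"], h["ttl"] - 1, h["hops"] + 1,
                       h["payload_len"], h["id"])
```

Keeping the descriptor ID unchanged while relaying is what lets a peer recognise a message it has already seen.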
Joining the Gnutella Network
The new node connects to a well-known "anchor" node.
It then sends a PING message to discover other nodes.
PONG messages are sent in reply from hosts offering new connections to the new node.
Direct connections are then made to the newly discovered nodes.
[Figure: a new node joins via anchor node A; its PING is forwarded through the Gnutella network and PONG replies are returned.]
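The joining steps can be sketched as a small in-memory simulation. No real networking is involved; the overlay, the host names, and the TTL value are made up for illustration, and a PONG is modelled simply as the host appearing in the result list.

```python
from collections import deque


def discover(overlay, anchor, ttl):
    """Return the hosts that would answer a PING entering at `anchor`.

    `overlay` maps each host to the hosts it is connected to. The PING is
    forwarded hop by hop until its TTL runs out; every host that sees it
    answers with a PONG (modelled here as adding the host to the result).
    """
    pongs = []
    seen = {anchor}
    queue = deque([(anchor, ttl)])
    while queue:
        host, remaining = queue.popleft()
        pongs.append(host)  # this host replies with a PONG
        if remaining <= 1:
            continue  # TTL exhausted: do not forward further
        for neighbour in overlay[host]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, remaining - 1))
    return pongs


# Hypothetical overlay; "A" plays the role of the well-known anchor node.
overlay = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

new_peers = discover(overlay, "A", ttl=3)
```

With a TTL of 3 the PING never reaches host E, which shows how the TTL bounds the part of the network a joining node can learn about.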
Properties of the Flooding
Searching by flooding:
  If you don't have the file you want, query 7 of your partners.
  If they don't have it, they contact 7 of their partners, for a maximum hop count of 10.
  Requests are flooded, but there is no tree structure.
  There is no looping, but packets may be received twice.
Note: Play gnutella animation at:
http://www.limewire.com/index.jsp/p2p
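To make the "no looping, but packets may be received twice" point concrete, here is a minimal flooding sketch. It assumes each peer remembers queries it has already processed and drops repeats; the function name and the overlay used in the usage note are hypothetical.

```python
def flood(overlay, source, ttl):
    """Flood a query from `source`; return (peers reached, duplicate deliveries)."""
    seen = {source}               # peers that have already processed the query
    duplicates = 0
    frontier = [(source, None)]   # (peer, peer it received the query from)
    for _ in range(ttl):
        next_frontier = []
        for peer, sender in frontier:
            for neighbour in overlay[peer]:
                if neighbour == sender:
                    continue      # never send a query back where it came from
                if neighbour in seen:
                    duplicates += 1   # received again, then dropped
                else:
                    seen.add(neighbour)
                    next_frontier.append((neighbour, peer))
        frontier = next_frontier
    return len(seen) - 1, duplicates
```

On a ring of four peers, `flood({0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}, 0, 10)` reaches the three other peers and still produces two duplicate deliveries, because the overlay is a general graph and not a tree.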
Query flooding
Gnutella
  no hierarchy
  use a bootstrap node to learn about others
  join message
Send query to neighbors
Neighbors forward the query to all attached neighbors (floods)
If a queried peer has the object, it sends a message back to the querying peer
[Figure: overlay of peers showing join and query messages.]
More on query flooding
Pros
  peers have similar responsibilities: no group leaders
  highly decentralized
  no peer maintains directory info
Cons
  excessive query traffic
  query radius: may not find content even when it is present
  bootstrap node still required
  maintenance of the overlay network
About the Flooding
There is nothing that stops a servent from flooding its network region with messages.
Cost of maintaining the network
Cost of searching for a file
Breadth-First Search (BFS)
[Figure legend: source, forward query, processed query, found result, forward response.]
Pros and Cons
Benefits:
  Peers speak directly with no central authority
  Nobody owns the Gnutella network and nobody can shut it down
  No central point of failure
  Limited per-node state
  Isolated node failure can quickly and automatically be worked around
  Free loading
  Scalability
Drawbacks:
  Searches are less effective and can be slow
  Bandwidth intensive
Gnutella network evolving to include "controlled decentralization" (LimeWire, BearShare, ToadNode)
Searching for a File
A node broadcasts its QUERY to all its peers, who in turn broadcast it to their peers.
Nodes route QUERYHITs back along the QUERY path to the sender; the QUERYHITs contain file location details.
To download files, a direct connection is made using the details of the host in the QUERYHIT messages.
[Figure: QUERY messages spreading through the Gnutella network and HIT messages returning to the sender.]
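The reverse-path routing of QUERYHITs can be sketched as follows. This is a simplified model under stated assumptions: each peer records only the neighbour that first delivered the query, the flood runs in synchronous rounds, and the names `route_query` and `came_from` are hypothetical.

```python
def route_query(overlay, source, holder, ttl):
    """Flood a QUERY and return the path a QUERYHIT takes back to `source`.

    Each peer remembers which neighbour first delivered the query
    (`came_from`). The hit is then passed back neighbour by neighbour along
    those entries. Returns None if `holder` is not reached within `ttl` hops.
    """
    came_from = {source: None}
    frontier = [source]
    for _ in range(ttl):
        next_frontier = []
        for peer in frontier:
            for neighbour in overlay[peer]:
                if neighbour not in came_from:
                    came_from[neighbour] = peer
                    next_frontier.append(neighbour)
        frontier = next_frontier
    if holder not in came_from:
        return None
    path = [holder]
    while came_from[path[-1]] is not None:
        path.append(came_from[path[-1]])
    return path
```

The returned list starts at the peer holding the file and ends at the querying peer. Only the QUERYHIT follows this path; the download itself is a direct connection between the two end points.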
The Cooperation Spectrum
Free Riding
File sharing networks rely on users sharing data
Two types of free riding
  Downloading but not sharing any data
  Not sharing any interesting data
On Gnutella
  15% of users contribute 94% of content
  63% of users never responded to a query
    Didn't have "interesting" data
Data from E. Adar and B.A. Huberman (2000), “Free Riding on Gnutella”
Example: GNUTELLA
Summary of Gnutella's Features
Decentralized
  No single point of failure
  Not as susceptible to denial of service
  Cannot ensure correct results
Flooding queries
  Search is now distributed but still not scalable
Initial Problems and Fixes
Freeloading: WWW sites offering search/retrieval from the Gnutella network without providing file sharing or query routing
  Block file-serving to browser-based non-file-sharing users
Prematurely terminated downloads:
  software bugs
  long download times over modems
  modem users run a Gnutella peer only briefly (a Napster problem also!), or a user becomes overloaded
  fix: peer can reply "I have it, but I am busy. Try again later"
Initial Problems and Fixes 2
2000: average size of reachable network only 400-800 hosts
Why so small?
  modem users: not enough bandwidth to provide search routing capabilities: routing black holes
Fix: create peer hierarchy based on capabilities
  previously: all peers identical, most modem black holes
  connection preferencing:
    favors routing to well-connected peers
    favors reply to clients that themselves serve a large number of files: prevents freeloading
  LimeWire gateway functions as a Napster-like central server on behalf of other peers, for searching purposes
Gnutella Enhancements
Pings/Pongs can consume up to 50% of bandwidth
Solutions:
  Pong limiting
  Pong caching
  Ping multiplexing
http://www.limewire.com/index.jsp/pingpong
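One way to picture pong caching is a peer that answers incoming PINGs from recently seen PONGs and does not forward the PING further. The sketch below is an illustrative guess at such a cache, not the scheme of any particular client; the cache lifetime, the reply limit, and the class name `PongCache` are all assumptions.

```python
import time


class PongCache:
    """Remember recently seen PONGs so PINGs can be answered locally."""

    def __init__(self, max_age=3.0, clock=time.monotonic):
        self.max_age = max_age      # seconds a cached PONG stays usable (assumed)
        self.clock = clock
        self._entries = {}          # host address -> time the PONG was seen

    def add(self, host):
        """Record a PONG received from `host`."""
        self._entries[host] = self.clock()

    def answer_ping(self, limit=10):
        """Return up to `limit` fresh hosts instead of forwarding the PING."""
        now = self.clock()
        fresh = {h: t for h, t in self._entries.items()
                 if now - t <= self.max_age}
        self._entries = fresh       # drop expired entries
        newest_first = sorted(fresh, key=fresh.get, reverse=True)
        return newest_first[:limit]
```

Capping the number of hosts returned per PING combines pong caching with a simple form of pong limiting.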
Gnutella Enhancements 2
Cache query responses
Results
Evolving protocol
  Gnutella Developer Forum
UltraPeers
Alternative query routing algorithms
Can Heterogeneity Make Gnutella Scale?
Ideas
  Replace query flooding with multiple random walks
  Proactive replication
    #replicas proportional to sqrt(request rate)
Result: two orders of magnitude improvement in terms of query time, per-node load, and message traffic
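The square-root replication rule can be illustrated with a few lines of arithmetic. The request rates and the replica budget below are invented for the example, and the function name is hypothetical.

```python
import math


def sqrt_replicas(request_rates, total_replicas):
    """Split `total_replicas` among files in proportion to sqrt(request rate)."""
    weights = {name: math.sqrt(rate) for name, rate in request_rates.items()}
    total_weight = sum(weights.values())
    return {name: total_replicas * w / total_weight
            for name, w in weights.items()}


# Hypothetical request rates (queries per hour); not measured data.
rates = {"popular": 100.0, "average": 25.0, "rare": 1.0}
shares = sqrt_replicas(rates, total_replicas=160)
```

Here a file requested 100 times as often as the rare one gets only 10 times as many replicas (100 versus 10 out of 160), so rare files stay easier to find than under replication proportional to the request rate.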
Can Heterogeneity Make Gnutella Scale? 2
Gnutella assumption: all peers are equal
  Not true! Heterogeneity among P2P peers (dial-up users vs. college users)
Evolve topology to match node capacities
Use random walks over this topology
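A search by multiple random walks might be sketched as below. For simplicity the walkers run one after another and not in parallel, and the walker count, step limit, and function name are assumptions made for the example.

```python
import random


def random_walk_search(overlay, holders, source, walkers=4, max_steps=50, seed=None):
    """Search with several independent random walks instead of flooding.

    Returns (found, messages): whether any walker reached a peer in `holders`
    and how many query messages were sent in total.
    """
    rng = random.Random(seed)
    messages = 0
    for _ in range(walkers):
        peer = source
        for _ in range(max_steps):
            if peer in holders:
                return True, messages
            peer = rng.choice(overlay[peer])   # hand the query to one neighbour
            messages += 1
        if peer in holders:
            return True, messages
    return False, messages
```

The message count is bounded by `walkers * max_steps` regardless of how well connected the peers are, which is the contrast with flooding that the slide is drawing.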
Can Heterogeneity Make Gnutella Scale? 3
Solution outline
  c_i: capacity of node i
  in[j,i]: messages from j->i; out[i,j]: messages from i->j
  Init: in[i,j] = out[i,j] = 0, OutMax[i,j] = c_i/d_i
  Update according to the messages received/sent
  Check if overloaded
    If so, redirect a high-input neighbor to a neighbor with high OutMax (spare capacity)
    Intuitively, take yourself out of the loop
    If no such node can be found, ask the neighbor to throttle back
Result: average query length reduces from 70 to 2-9 hops, depending on topology
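The outline is terse, so the following is only one possible reading of it and not the authors' algorithm. It assumes d_i is the number of neighbours of node i and approximates "spare capacity" as capacity minus messages received; the full OutMax bookkeeping is not reproduced. All class and function names are hypothetical.

```python
class Node:
    """Per-node counters from the outline; names are illustrative."""

    def __init__(self, name, capacity, neighbours):
        self.name = name
        self.capacity = capacity                        # c_i
        self.in_count = {n: 0 for n in neighbours}      # in[j, i]
        self.out_count = {n: 0 for n in neighbours}     # out[i, j]

    def out_max(self):
        """OutMax = c_i / d_i, with d_i taken to be the number of neighbours."""
        return self.capacity / len(self.in_count)

    def receive(self, sender):
        self.in_count[sender] += 1

    def send(self, receiver):
        self.out_count[receiver] += 1

    def spare(self):
        """Assumed measure of spare capacity: capacity minus messages received."""
        return self.capacity - sum(self.in_count.values())

    def overloaded(self):
        return self.spare() < 0


def relieve(node, nodes):
    """Decide how an overloaded node sheds load (one reading of the outline)."""
    if not node.overloaded():
        return None
    busiest = max(node.in_count, key=node.in_count.get)   # high-input neighbour
    candidates = [nodes[n] for n in node.in_count
                  if n != busiest and nodes[n].spare() > 0]
    if not candidates:
        return ("throttle", busiest)       # nobody can take over: slow the sender
    target = max(candidates, key=lambda n: n.spare())
    return ("redirect", busiest, target.name)
```

Redirecting the busiest sender to a neighbour with spare capacity is the "take yourself out of the loop" step; throttling is the fallback when no such neighbour exists.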
Measurement Results
Who is sharing what?
August 2000

The top           | Share     | As percent of whole
333 hosts (1%)    | 1,142,645 | 37%
1,667 hosts (5%)  | 2,182,087 | 70%
3,334 hosts (10%) | 2,692,082 | 87%
5,000 hosts (15%) | 2,928,905 | 94%
6,667 hosts (20%) | 3,037,232 | 98%
8,333 hosts (25%) | 3,082,572 | 99%
Problems With Gnutella
Protocol scalability
  Message broadcast technique imposes limitations on the network size
    $\text{packets per message} = \sum_{i=0}^{\mathrm{TTL}} \text{noPeers}^{i}$
  In November 2000 the dial-up bandwidth barrier was reached
Overlay network efficiency
  Random selection of peers results in inefficient use of the underlying network
  Redundant traffic is generated on the Internet
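The broadcast cost on this slide, read as the sum of noPeers^i for i from 0 to TTL, can be evaluated directly. The reading of the superscript as an exponent is an assumption, as is the function name; the sample values 7 and 10 are the fan-out and hop count quoted earlier for flooding.

```python
def packets_per_message(no_peers, ttl):
    """Sum of no_peers**i for i = 0..ttl (geometric series)."""
    return sum(no_peers ** i for i in range(ttl + 1))


# Fan-out 4 with TTL 7 gives 21845 packets for a single message.
small = packets_per_message(4, 7)

# Fan-out 7 with TTL 10 is an upper bound only: it ignores duplicate
# suppression and the fact that the real network has far fewer hosts.
large = packets_per_message(7, 10)
```

The count grows geometrically with the TTL, which is why the broadcast technique limits the size of the network that can be searched.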
Heterogeneous connection qualities in Gnutella
  35% have upstream bottleneck bandwidth of at least 100 Kbps
  only 8% have at least 10 Mbps bandwidth
  22% have bandwidth of 100 Kbps or less
Number of Shared Files
Why Look at Gnutella
Widespread unstructured P2P network
  Currently between 200,000 and 300,000 hosts
  2006: still heavily in use by about 2 million users
Gnutella clients (among others):
  LimeWire
  Morpheus
  BearShare
  OpenCola
  Shareaza
    It has the fastest-growing number of users
    It has a very pleasant GUI and also connects to eDonkey and BitTorrent
Ideal as a research test bed
  Large-scale network demonstrates the need for scalable P2P protocols
LimeWire: Improvement on Gnutella
Creation of a peer hierarchy based on capabilities
  previously: all peers identical, most modem black holes
Connection preferencing:
  favors routing to well-connected peers
  favors reply to clients that themselves serve a large number of files: prevents freeloading
LimeWire gateway functions as a Napster-like central server on behalf of other peers
  for searching purposes
LimeWire
The LimeWire P2P file sharing program connects to the Gnutella P2P network
LimeWire client software is widely recognized for its clean user interface that does not contain adware
Sometimes billed as the "fastest file sharing program"
LimeWire claims to offer relatively good search and download performance
Free LimeWire software downloads are available for Windows, Linux and Macintosh operating systems
LimeWire Pro pay clients also exist
BearShare
The BearShare P2P file sharing program is a popular free software client for the Gnutella P2P network
Both free and pay downloads of BearShare file sharing programs exist
Shareaza
Shareaza is an up-and-coming P2P file sharing program
This client offers an extremely powerful search engine capable of connecting to multiple popular P2P networks, including eDonkey, BitTorrent and Gnutella
Shareaza file sharing software includes intelligence for detecting fake and/or corrupted files
The free Shareaza download also contains no ads or spyware
As the installed base of Shareaza client users grows, expect Shareaza to become an even better P2P file sharing program
Anonymous?
The person you are getting the file from knows who you are
  That's not anonymous.
Other protocols exist where the owner of the files doesn't know the requester.
Peer-to-peer anonymity exists
Summary
Peer-to-peer networking: applications connect to peer applications
Focus: decentralized method of searching for files
Each application instance serves to:
  store selected files
  route queries (file searches) from and to its neighboring peers
  respond to queries (serve file) if the file is stored locally
Gnutella history:
  3/14/00: release by AOL, almost immediately withdrawn
  too late: 23K users on Gnutella at 8 am this AM
  many iterations to fix poor initial design (poor design turned many people off)
What we care about:
  How much traffic does one query generate?
  How many hosts can it support at once?
  What is the latency associated with querying?
  Is there a bottleneck?