5
Computer Networks Lecture 8: Content Delivery Infrastructure: Peer-to-Peer 2015 Internet Traffic Analysis Sandvine’s Global Internet Phenomena Report: https://www.sandvine.com/trends/global-internet-phenomena/ Content Distribution Most popular content can only be served if it is highly replicated across multiple servers reduce load at origin server improve performance for end users Most Content Delivery Infrastructures (CDI) have a large number of servers distributed across the Internet and cache content on these servers Content Delivery Infrastructure Peer-to-peer (p2p): hybrid p2p with a centralized server pure p2p hierarchical p2p end-host (p2p) multicast Content-Distribution Network (CDN)

2015 Internet Traffic Analysisweb.eecs.umich.edu/~sugih/courses/eecs489/lectures/08-P2P.pdf · Sandvine’s Global Internet Phenomena Report: https: ... To limit resource usage, the

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

  • Computer Networks

    Lecture8:ContentDeliveryInfrastructure:Peer-to-Peer

    2015InternetTrafficAnalysis

    Sandvine’sGlobalInternetPhenomenaReport:https://www.sandvine.com/trends/global-internet-phenomena/

    ContentDistribution

    Mostpopularcontentcanonlybeservedifitishighlyreplicatedacrossmultipleservers•  reduceloadatoriginserver•  improveperformanceforendusersMostContentDeliveryInfrastructures(CDI)havealargenumberofserversdistributedacrosstheInternetandcachecontentontheseservers

    ContentDeliveryInfrastructure

    Peer-to-peer(p2p):•  hybridp2pwithacentralizedserver•  purep2p•  hierarchicalp2p•  end-host(p2p)multicast

    Content-DistributionNetwork(CDN)

  • HybridP2PandCentralizedServer

    Napster:•  P2Pfiletransfer•  centralizedfilesearch:•  peersregisterIPaddressandcontentatacentralindexserver

    •  peersquerycentralindexservertolocatecontent

    register

    queryreply

    PureP2PArchitecture• noalways-onserver• arbitraryendsystemsdirectlycommunicate• peersareintermittentlyconnectedandchangeIPaddresses• example:Gnutella• highlyscalable(why?)• butdifficulttomanage•  howtofindpeer?•  howtofindcontent?

    Gnutella•  nocentralizedindexserver•  networkdiscoveryusingpingandpongmessages

    •  filediscoveryusingqueryandqueryHitmessages

    •  bothpingandquerymessagesareforwardedusingthefloodingalgorithm:forwardonalllinksexceptincomingone

    •  previouslyseenmessagesarenotfurtherforwarded

    •  newversionofgnutellausesKaZaA-likesupernodes

    A

    B

    ping

    pong

    A

    B

    query

    queryHit

    HierarchicalP2PFastTrackusedbyKaZaA,Groskster,iMesh,Morpheus•  hierarchicalarchitecture•  peersdividedintosupernodesandordinarynodes

    •  eachsupernodekeepsanindexofallitschildren’sfiles

    •  requestsaresenttosupernodes•  supernodesqueryeachotherforfilesnotintheirlocalindices

    •  ordinarynodesare“promoted”tosupernodesiftheyhaveenoughresourcesandhavestayedonnetworklongenough

    •  paralleldownloadoffiles

    reg

    req

  • HierarchicalP2P:SkypeSkypeformsahierarchicalP2P:•  indexmappingusernamestoIPaddressesisdistributedacrosssupernodes•  searchesforSkypeusersaresenttosupernodes

    •  supernodesqueryeachotherforusersnotintheirlocalindex

    • supernodeschooseapeertoactasrelayfortwoNATtedusers

    eDonkey/eMulealsobuildsahierarchicalnetwork,butthe“supernodes”arededicatedservers,notjustmoreequalpeers

    openhost

    A

    B

    maintain a central index of files, so that userscan send requests directly to information hold-ers. Unfortunately, centralization creates a sin-gle point of failure that is easy to attack. Forexample, if you were trying to phone MichaelJordan, the simplest way to get his numberwould ordinarily be to call directory assistance.However, because directory assistance is central-ized, your access can be easily blocked if Jordanor someone else decides to remove his directoryentry, or if the service goes down.

    Systems like Gnutella broadcast queries to everyconnected node within some radius. Using thismethod, you would ask all of your friends if anyof them knew Jordan’s number, get them to asktheir friends, and so on. Within a few steps, thou-sands of people could be looking for his number.Although this process would eventually find youranswer, it is clearly wasteful and unscalable.

    Freenet avoids both problems by using asteepest-ascent hill-climbing search: Each nodeforwards queries to the node that it thinks isclosest to the target. You might start searchingfor Jordan by asking a friend who once playedcollege basketball, for example, who might passyour request on to a former coach, who couldpass it to a talent scout, who might pass it to Jor-dan’s agent, who could put you in touch with theman himself.

    Requesting files. Every node maintains a routingtable that lists the addresses of other nodes and theGUID keys it thinks they hold. When a nodereceives a query, it first checks its own store, and ifit finds the file, returns it with a tag identifyingitself as the data holder. Otherwise, the node for-wards the request to the node in its table with theclosest key to the one requested. That node thenchecks its store, and so on. If the request is suc-

    cessful, each node in the chain passes the file backupstream and creates a new entry in its routingtable associating the data holder with the request-ed key. Depending on its distance from the holder,each node might also cache a copy locally.

    To conceal the identity of the data holder, nodeswill occasionally alter reply messages, setting theholder tags to point to themselves before passingthem back up the chain. Later requests will stilllocate the data because the node retains the truedata holder’s identity in its own routing table andforwards queries to the correct holder. Routingtables are never revealed to other nodes.

    To limit resource usage, the requester gives eachquery a time-to-live limit that is decremented ateach node. If the TTL expires, the query fails,although the user can try again with a higher TTL(up to some maximum). Because the TTL can giveclues about where in the chain the requester is,Freenet offers the option of enhancing security byadding an initial mix-net route before normalrouting. This effectively repositions the start of thechain away from the requester.

    If a node sends a query to a recipient that isalready in the chain, the message is bounced backand the node tries to use the next-closest keyinstead. If a node runs out of candidates to try, itreports failure back to its predecessor in the chain,which then tries its second choice, and so on.

    Figure 1 depicts a typical request sequence. Theuser initiates a request at node A and forwards therequest to B, which forwards it to C. Node C isunable to contact any other nodes and returns a“request failed” message to B. Node B then triesits second choice, E, which forwards the requestto F. Node F forwards the request to B, whichdetects a loop and bounces the message back.Unable to contact any additional nodes, node Fbacktracks one step to E, which forwards therequest to its second choice, D, and locates thefile. D returns the file via E and B back to A,which sends it to the user. Along the way, E, B,and A might also cache the file.

    With this approach, the request homes in closerwith each hop until the key is found. A subsequentquery for this key will tend to approach the firstrequest’s path, and a locally cached copy can sat-isfy the query after the two paths converge. Sub-sequent queries for similar keys will also jump overintermediate nodes to one that has previously sup-plied similar data. Nodes that reliably answerqueries will be added to more routing tables, andhence, will be contacted more often than nodesthat do not.

    44 JANUARY • FEBRUARY 2002 http://computer.org/internet/ IEEE INTERNET COMPUTING

    Peer-to-Peer Networking

    Figure 1.Typical request sequence.The request moves through thenetwork from node to node, backing out of a dead-end (step 3) anda loop (step 7) before locating the desired file.

    = Data request

    = Data reply

    = Request failedRequester

    Data holder

    2 3

    58

    910

    4

    1

    11

    12

    67

    a b

    c

    d

    ef

    Freenet:AnonymousP2P•  noindexserver•  requesterdoesn’tconnectdirectlytocontentprovider

    •  instead,contentispassedinabucket-brigadefashionfromprovidertorequester

    •  subsequentrequestforthesamecontentissatisfiedfromthenearestcache

    •  requestercannotdifferentiateproviderfromacacheholderoraforwardingpeer(allowsforanonymity)

    BitTorrentContentdistribution:•  contentisdividedintoNpiecesof16KBeachandsenttoNpeers

    Contentdownload:•  todownloadafile,apeermustfirstregisterwithaTracker•  Trackerreturnsarandomlistofpeerswhohavethefile•  peeropensabout5TCPconnectionstotheprovidedpeers•  apeerwillonlyuploadtopeersfromwhomitcanalsodownload(“tit-for-tat”)

    Summary:P2POverlayNetworksP2Papplications/peersneedto:•  trackidentitiesandIPaddressesofpeers•  theremaybealargenumberofpeers•  peersmaycomeandgofrequently(highchurn)•  can’tkeeptrackofallpeers•  routemessagesamongpeers•  maybemulti-hop

    Overlaynetwork•  peershavetodobothnamingandrouting•  IPbecomes just thelow-leveldeliverysubstrate•  allIProutingisopaque

    application

    transport

    network

    link

    physical

  • ModesofDeliveryUnicast,broadcast,multicastAssumingavideoconference

    involvingS,D2,andD3•  unicasting:twocopiesofpacketsfromSaresentovertheSRlink•  broadcasting:onecopyofpacketssentfromStoalldestinations,butpacketssenttoD1andD4unnecessarily•  multicasting:onecopyofpacketsfromSissentovertheSRlink,RthensendsonecopyeachtoD2andD3

    D4

    D3

    D2

    D1

    RS

    MulticastDelivery

    Usesofmulticasting:•  videoconferencing,distancelearning,distributedcomputation,p2pdelivery,multi-playergaming,etc.

    Multicastdesigngoals:•  cansupportmillionsofreceiverspermulticastgroup•  receiverscanjoinandleaveanygroupatanytime•  sendersdon’tknowallreceivers•  sendersdon’thavetobemembersofagrouptosend•  therecouldbemorethanonesenderspergroup

    MulticastGroupManagementIssuesinmulticastgroupmanagement:1.  howtoadvertise/discoveramulticastgroup?2.  howtojoinamulticastgroup?3.  deliveringmulticastpacketstothegroup

    IPmulticast:• usemulticastaddressesasanonymousrendezvouspoint:•  IPv4:Class-D(224.0.0.0to239.255.255.255[RPC3171])•  265Mmulticastgroupsatmost•  IPv6:multicastprefix:FF00::/8

    •  createawell-knownmulticastgroup(address)toadvertise/discovermulticastgroups

    MulticastDeliveryIPmulticast:

    •  sendersendsasinglepackettotheIPmulticastaddress

    •  multicastdataissentbest-effort,usingUDP(why?)

    •  routersdeliverpacketsoutallinterfacesthathasareceiverbelongingtothemulticastgroup

    •  receiversjoingroupsbyinformingupstreamrouters,e.g.,byusingInternetGroupManagementProtocol(IGMP)

    •  notuniformlydeployedthroughouttheInternet

  • FloodandPruneHowtoensurethatonlyonecopyofpacketfromSisforwardedbyP3toP4?•  keeptrackofpacketsequencenumber•  onlyforwardpacketthatcomesfromshortestpathfrom(to)source

    HowtoensurethatonlyonecopyofpacketfromSreachesP3?•  onlyforwardifselfisonneighbor’sshortestpathfrom(to)source

    •  prune(P3tellsP2nottoforwardpacketsfromS )•  mustbedonepersourceiftherearemultiplesources,

    eachsourceformingitsownmulticastgroupand(logically)itsownmulticasttree

    •  mustperiodicallyfloodincaseofmembershipchange

    S

    P3

    P2P1

    P4

    End-hostMulticastIssuesinmulticastgroupmanagement:1.  howtoadvertise/discoveramulticastgroup?2.  howtojoinamulticastgroup?3.  deliveringmulticastpacketstothegroup

    End-host(p2p)multicast:•  usesawell-known,centralizedrendezvousserver•  eachpeermustregisterwithrendezvousserver•  rendezvousserverreturnsa(random)listofpeers•  eachpeercansupportonlyalimitednumberofpeers•  avoidsendingduplicatemessagesandlooping:•  ifsinglesource,constructsashortest-pathtreerootedatsource•  orusesflood-and-prunealgorithm

    •  preferspeersinsamesubnet

    P2PChallenges

    RelativetoIPnetworking:• muchhigherfunction,moreflexible• muchlesscontrollable/predictable

    Relativetootherparallel/distributedsystems:•  noadministrativeorganizations•  fewguaranteesontransport,storage,etc.•  partialfailure•  churn•  networkbottlenecksandotherresourceconstraints•  trustissues:security,privacy,incentives

    ChallengesforP2PNetworks1.  NATandfirewall:•  cannotpeerwithahostyoucan’taddress

    Solutions:•  Gnutella:•  queriersendsPUSHmessagetoresponderoverthep2pnetwork•  responderopensaTCPconnectiontoquerierandsendsoverthefile•  noluckifbotharebehindfirewalls

    •  KaZaA,eDonkey,Skype:•  asupernodeactsasrelayifbothpeersarebehindfirewalls

    •  StandardstotraverseNAT(andfirewall!):UPnP,STUN,TURN

    2. Download/uploadbandwidthasymmetryneedsbandwidthsubsidybycontentproviderorCDN,orsufferlongdownloadtime