Collaborative Content Delivery
Werner Vogels, Robbert van Renesse, Ken Birman
Dept. of Computer Science, Cornell University
A peer-to-peer solution for web-based publish/subscribe
.: DRAFT :.
© Copyright 2002 Werner Vogels
Presentation duality …
The case for Collaborative Content Delivery
vs.
The innovative technology used to build the system:
- Spectacularly scalable technology
- Secure, reliable, robust & fast
- A solution to many distributed management problems
Late night reading
Epidemic Theory of Infectious Diseases and its Applications
N.T.J. Bailey, Hafner Press
Second Edition, 1975
The Problem
- Access to real-time information at syndicated news sites is highly inefficient
- An estimated 70-80% of the bandwidth is wasted on redundant transport, both at the consumer and at the publisher
- Consumers frequently return to the website to receive timely updates
Isn’t this solved already?
- RSS: channels provide summaries for processing by bots, but the mechanism remains “pull”
- HTTP delta encoding should reduce bandwidth cost
- News feeds from major vendors
  - “Push” is the right model for frequently changing data that needs timely delivery
  - Proprietary formats and high fees
  - Email summaries as a cheap alternative, but still a high bandwidth cost at the publisher
- Hybrid “push/pull” by organizations exploiting distributed content delivery
Scale is a major obstacle
- No coordinated action by syndication sites to provide a shared information push infrastructure
- The one-to-many technologies currently in use are inherently not scalable
- No technology is available that can deliver data from thousands of publishers to millions of subscribers in real time
We can do better
- Current push solutions fail to exploit the collaborative power of the Internet
- Ideally, a publisher injects one update into the world and all interested subscribers receive it
- In this model all consumers collaborate to route the information to the right subscribers
- The information arrives at all desktops within tens of seconds of publishing
Peer-to-Peer Solution
- P2P is the only approach to a cost-effective, scalable solution
- Subscribers weave an ad-hoc infrastructure for subscription-based routing
- Scalable, autonomous & decentralized management
- High level of robustness and reliability in message delivery
- Authentication of publishers
Emerging technologies
- Astrolabe, CAN, Chord, and Pastry are emerging research technologies
- Astrolabe is the furthest along in:
  - Scalability
  - Security integration
  - Manageability
  - Firewall, proxy and NAT support
- A complete technology that we are now using to develop applications
Astrolabe/Mariner
A system for ultra-scalable, distributed state management
- Robust, through the use of epidemic techniques
- Scalable, through the use of information aggregation and fusion
- Secure, through certificates
- Flexible, through secure mobile code
Simulated, emulated, tested and deployed.
Astrolabe
Robust and Scalable Technology for Distributed System Monitoring, Management and Data Mining
Distributed Systems Management
- Extremely important in the deployment of large systems
- Scalable management of applications and systems is still a major quest
- Management technology needs to be integrated into applications
- The management subsystem is often more complex than the application itself
Astrolabe
- Information/state management system
- Monitors the dynamically changing state of sets of distributed resources
- Reports summaries to its consumers
- Uses information hierarchies to organize the data
- Uses aggregation techniques to continuously compute the summary nodes in the system
Current use of Mariner
- Monitor and control applications, systems and infrastructure
- Resource discovery
- Collaboration management
- Coordination of distributed tasks
- Edge-caching control
- CDN dynamic management
Intuitively
- You can see Mariner as a large database with information about the global system
- None of this information resides on a single server
- Each principal has a row in the virtual database, which it is allowed to update with <attribute, value> pairs
- A principal can directly access only the rows of the other nodes in its own zone and the rows of the intermediate zones on its path up the hierarchy to the root
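The access rule can be sketched as follows, assuming hierarchical zone names written as slash-separated paths such as `/USA/Cornell/Node1`. The naming scheme, `ancestor_zones`, and `readable_tables` are hypothetical illustrations, not Mariner's API, and the attribute values are made up. A principal reads the table of its own zone, whose rows are its siblings, plus the table of every zone above it; other zones are not directly reachable.

```python
def ancestor_zones(leaf: str) -> list[str]:
    """Zones on the path from a leaf's own zone up to the root.

    Hypothetical naming: "/USA/Cornell/Node1" is leaf Node1 in zone /USA/Cornell.
    """
    parts = [p for p in leaf.split("/") if p]
    # Every proper prefix of the leaf's name is an enclosing zone, nearest first.
    zones = ["/" + "/".join(parts[:i]) for i in range(len(parts) - 1, 0, -1)]
    zones.append("/")  # the root zone
    return zones


def readable_tables(leaf: str, tables: dict[str, dict]) -> dict[str, dict]:
    """Tables a principal may access directly: its own zone and each ancestor zone.

    `tables` maps a zone name to that zone's table (child name -> attribute row).
    """
    return {zone: tables[zone] for zone in ancestor_zones(leaf) if zone in tables}


if __name__ == "__main__":
    tables = {  # illustrative values only
        "/USA/Cornell": {"Node1": {"Load": 2.0}, "Node2": {"Load": 1.5}},
        "/USA": {"Cornell": {"AvgLoad": 1.75}, "MIT": {"AvgLoad": 3.0}},
        "/": {"USA": {"AvgLoad": 2.4}, "Europe": {"AvgLoad": 1.9}},
        "/Europe/Paris": {"n7": {"Load": 0.4}},  # not on Node1's path
    }
    print(sorted(readable_tables("/USA/Cornell/Node1", tables)))  # ['/', '/USA', '/USA/Cornell']
```

Because every ancestor table is readable, each principal automatically sees a summary of the whole system at the root without holding any other leaf's raw rows.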
Mariner in a single zone
Name      Load  Weblogic?  SMTP?  Word Version
…
swift     2.0   0          1      6.2
falcon    1.5   1          0      4.1
cardinal  4.5   1          0      6.0
- The lowest level in the hierarchy can be nodes, or finer grained if the application requires it
- A zone's security key is needed to add a new column; a user key is needed to update a row
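The key rule on this slide can be sketched as a small table class. The names (`ZoneTable`, `add_column`, `add_principal`, `update`) are hypothetical, and shared secret keys compared with `hmac.compare_digest` stand in for the real certificate-based checks. Adding a column needs the zone's key; writing a row needs the key of the principal that owns it.

```python
import hmac


class ZoneTable:
    """Toy table for one zone: one row per principal, columns added under the zone key.

    Keys are plain shared secrets compared in constant time. This is a hypothetical
    stand-in for Mariner's certificate-based checks, kept only to show who may do what.
    """

    def __init__(self, zone_key: bytes) -> None:
        self._zone_key = zone_key
        self._user_keys: dict[str, bytes] = {}
        self.columns: list[str] = []
        self.rows: dict[str, dict[str, object]] = {}

    def _require_zone_key(self, key: bytes) -> None:
        if not hmac.compare_digest(key, self._zone_key):
            raise PermissionError("zone key required")

    def add_column(self, name: str, zone_key: bytes) -> None:
        self._require_zone_key(zone_key)
        if name not in self.columns:
            self.columns.append(name)

    def add_principal(self, principal: str, user_key: bytes, zone_key: bytes) -> None:
        self._require_zone_key(zone_key)
        self._user_keys[principal] = user_key
        self.rows[principal] = {}

    def update(self, principal: str, user_key: bytes, column: str, value: object) -> None:
        """Only the owner of a row, holding its user key, may write to it."""
        expected = self._user_keys.get(principal)
        if expected is None or not hmac.compare_digest(user_key, expected):
            raise PermissionError(f"user key for {principal!r} required")
        if column not in self.columns:
            raise KeyError(f"unknown column {column!r}")
        self.rows[principal][column] = value


if __name__ == "__main__":
    zone = ZoneTable(zone_key=b"zone-secret")
    zone.add_column("Load", zone_key=b"zone-secret")
    zone.add_principal("swift", user_key=b"swift-key", zone_key=b"zone-secret")
    zone.update("swift", b"swift-key", "Load", 2.0)
    print(zone.rows)  # {'swift': {'Load': 2.0}}
```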
Scalability through Hierarchy
- Leaves are organized into zones
- Each leaf has a self-managed attribute list
- The base zone is the collection of the individual attribute lists of its leaves
- Each intermediate zone is a collection of attribute lists constructed by aggregating the information in its child zones
- Each list has some basic attributes that Mariner uses to manage itself, such as contact lists, timestamps, etc.
Simple Hierarchy

San Francisco (SF):
Name      Load  Weblogic?  SMTP?  Word Version
…
swift     2.0   0          1      6.2
falcon    1.5   1          0      4.1
cardinal  4.5   1          0      6.0

New Jersey (NJ):
Name      Load  Weblogic?  SMTP?  Word Version
…
gazelle   1.7   0          0      4.5
zebra     3.2   0          1      6.2
gnu       0.5   1          0      6.2

Parent zone (one row per child zone):
Name   Avg Load  WL contact    SMTP contact
SF     2.6       123.45.61.3   123.45.61.17
NJ     1.8       127.16.77.6   127.16.77.11
Paris  3.1       14.66.71.8    14.66.71.12
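The step from the leaf tables to the parent row can be sketched in a few lines, using only the leaf rows visible on this slide. Assumptions: the average is a plain mean of `Load`, a service's contact is the least-loaded leaf offering it (the slides do not say how contacts are chosen), and leaf names stand in for the IP addresses. With these rows NJ averages to 1.8, matching the parent table; SF's three visible rows give about 2.67, so the slide's 2.6 presumably reflects the rows elided by "…" or a different rounding.

```python
from statistics import mean

# Leaf rows from the two child zones on this slide (rows elided by "…" are not included).
SF = {
    "swift": {"Load": 2.0, "Weblogic": 0, "SMTP": 1},
    "falcon": {"Load": 1.5, "Weblogic": 1, "SMTP": 0},
    "cardinal": {"Load": 4.5, "Weblogic": 1, "SMTP": 0},
}
NJ = {
    "gazelle": {"Load": 1.7, "Weblogic": 0, "SMTP": 0},
    "zebra": {"Load": 3.2, "Weblogic": 0, "SMTP": 1},
    "gnu": {"Load": 0.5, "Weblogic": 1, "SMTP": 0},
}


def summarize(rows: dict[str, dict]) -> dict:
    """Aggregate one child zone's leaf rows into a single row of the parent zone.

    Assumption: a service's contact is the least-loaded leaf that offers it;
    leaf names stand in for the IP addresses shown on the slide.
    """

    def contact(service: str):
        offering = [name for name, row in rows.items() if row[service] == 1]
        return min(offering, key=lambda name: rows[name]["Load"], default=None)

    return {
        "Avg Load": round(mean(row["Load"] for row in rows.values()), 2),
        "WL contact": contact("Weblogic"),
        "SMTP contact": contact("SMTP"),
    }


if __name__ == "__main__":
    parent = {"SF": summarize(SF), "NJ": summarize(NJ)}
    for zone, row in parent.items():
        print(zone, row)
```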
Information Aggregation
- Aggregation functions are programmable
  - A subset of SQL
- Code is embedded in aggregation function certificates (AFCs)
- A signed certificate is installed into an attribute list
  - It is used to construct (new) attributes in the zones of the hierarchy
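A rough sketch of an AFC as "signed SQL": the certificate carries a query, the agent verifies the signature, and then evaluates the query over the rows of a zone's children. This is not Mariner's implementation. Python's built-in `sqlite3` stands in for Mariner's SQL subset, and a shared-key HMAC stands in for the signature of a real certificate; `AFC`, `sign`, and `evaluate` are hypothetical names.

```python
import hashlib
import hmac
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class AFC:
    """Hypothetical aggregation function certificate: a named SQL query plus a signature."""
    name: str
    query: str  # a SELECT over a table called `children`
    signature: bytes


def _mac(key: bytes, name: str, query: str) -> bytes:
    return hmac.new(key, f"{name}\n{query}".encode(), hashlib.sha256).digest()


def sign(name: str, query: str, key: bytes) -> AFC:
    return AFC(name, query, _mac(key, name, query))


def evaluate(afc: AFC, key: bytes, child_rows: list[dict]) -> dict:
    """Check the AFC's signature, then run its query over one zone's child rows."""
    if not hmac.compare_digest(afc.signature, _mac(key, afc.name, afc.query)):
        raise PermissionError(f"invalid signature on AFC {afc.name!r}")
    db = sqlite3.connect(":memory:")
    try:
        db.execute("CREATE TABLE children (name TEXT, load REAL, weblogic INTEGER, smtp INTEGER)")
        db.executemany(
            "INSERT INTO children VALUES (:name, :load, :weblogic, :smtp)", child_rows
        )
        cursor = db.execute(afc.query)
        columns = [col[0] for col in cursor.description]
        return dict(zip(columns, cursor.fetchone()))
    finally:
        db.close()


if __name__ == "__main__":
    rows = [
        {"name": "swift", "load": 2.0, "weblogic": 0, "smtp": 1},
        {"name": "falcon", "load": 1.5, "weblogic": 1, "smtp": 0},
        {"name": "cardinal", "load": 4.5, "weblogic": 1, "smtp": 0},
    ]
    afc = sign("load_summary",
               "SELECT AVG(load) AS avg_load, SUM(smtp) AS smtp_hosts FROM children",
               key=b"issuer-secret")
    print(evaluate(afc, b"issuer-secret", rows))
```

Each column of the query's result becomes an attribute of the zone's row in the parent table, which is how an AFC can introduce new attributes higher up the hierarchy.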
Epidemic Dissemination
- Each Astrolabe instance maintains all the zones on its path to the root
- There are no centralized servers for intermediate zones
- Consequently, each instance has a copy of the root zone
- Replication is achieved through gossip techniques
- Guarantees eventual consistency
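A minimal sketch of how gossip yields eventual consistency, flattened to a single zone for brevity. Assumptions: each row carries a version number issued by the row's owner (so no clock synchronization is needed), a replica keeps the newest version it has seen of each row, and in each round every replica does a push-pull exchange with one random peer. The function names are hypothetical.

```python
import random

Row = tuple[int, float]  # (version, value); versions are issued by the row's owner


def merge(mine: dict[str, Row], theirs: dict[str, Row]) -> None:
    """Adopt every row of `theirs` that `mine` lacks or holds in an older version."""
    for key, (version, value) in theirs.items():
        if key not in mine or mine[key][0] < version:
            mine[key] = (version, value)


def gossip_round(replicas: list[dict[str, Row]], rng: random.Random) -> None:
    """Each replica does a push-pull exchange with one random other replica."""
    n = len(replicas)
    for i in range(n):
        j = rng.randrange(n - 1)
        j = j if j < i else j + 1  # any replica except i itself
        merge(replicas[i], replicas[j])
        merge(replicas[j], replicas[i])


def rounds_until_consistent(replicas, rng, max_rounds=100):
    """Gossip until all replicas hold identical state; None if max_rounds is exceeded."""
    for r in range(1, max_rounds + 1):
        gossip_round(replicas, rng)
        if all(rep == replicas[0] for rep in replicas):
            return r
    return None


if __name__ == "__main__":
    # 50 agents, each initially knowing only its own row.
    replicas = [{f"node{i}": (1, float(i))} for i in range(50)]
    replicas[7]["node7"] = (2, 99.0)  # node7 publishes a newer version of its row
    print("consistent after", rounds_until_consistent(replicas, random.Random(42)), "rounds")
```

Merging by "newest version wins" is order-insensitive, so replicas that have seen the same updates hold the same state regardless of the order in which the exchanges happened.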
AFC propagation
1. The output of an AFC includes a copy of the AFC itself, which places a copy of the AFC in the parent zone
   - It reaches the root and the leaves of other zones
2. Adoption: check the ancestors' lists to find new AFCs
- AFCs spread through the system in the order of tens of seconds
- Certificates have an expiration date; unless they are refreshed, aggregation eventually halts
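The adoption step and the expiry rule can be sketched together. Assumptions: each certificate carries an issue time and an absolute expiry time, a later issue time means a refreshed certificate, and an agent periodically scans the AFC lists of its ancestor zones. `Cert` and `adopt` are hypothetical names, and signature checking is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    """Hypothetical AFC record: only the fields needed for adoption and expiry."""
    name: str
    issued: float   # issue time; a refreshed certificate has a later one
    expires: float  # absolute time after which the certificate is ignored


def adopt(installed: dict[str, Cert], ancestor_lists: list[list[Cert]], now: float) -> dict[str, Cert]:
    """Adoption step: take new or refreshed AFCs found in ancestor zones; drop expired ones."""
    result = {name: c for name, c in installed.items() if c.expires > now}
    for certs in ancestor_lists:
        for cert in certs:
            if cert.expires <= now:
                continue  # expired and never refreshed: it no longer drives aggregation
            current = result.get(cert.name)
            if current is None or cert.issued > current.issued:
                result[cert.name] = cert
    return result


if __name__ == "__main__":
    installed = {"old_query": Cert("old_query", issued=0, expires=10)}
    ancestors = [
        [Cert("load_summary", issued=5, expires=100)],   # parent zone
        [Cert("load_summary", issued=3, expires=100),    # root zone: older copy
         Cert("stale", issued=1, expires=4)],
    ]
    print(adopt(installed, ancestors, now=20))
```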
I’ll skip
- Aggregation function details
- Mobile code details
- Eventual consistency
- Certificates
- Authentication
- Firewalls & NATs
Robustness through Gossip
- Use of epidemic techniques to disseminate data and AFCs
- Pure peer-to-peer communication
- Fully autonomous progress
- Actions based on probability theory
- Robustness improves with scale
- Fixed low overhead, independent of scale
- Control as well as data transport
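Why per-node overhead stays fixed while dissemination time grows only slowly can be seen with a small push-gossip simulation. It is a simplified model, not Astrolabe's protocol: one informed node at the start, no failures, and each informed node contacts one uniformly random peer per round. A known result for this model puts the number of rounds near log2 n + ln n, while each node still sends at most one message per round.

```python
import math
import random


def rounds_to_inform_all(n: int, rng: random.Random) -> int:
    """Push gossip: each round, every informed node tells one random other node.

    Each node sends at most one message per round no matter how large n is.
    """
    informed = {0}
    rounds = 0
    while len(informed) < n:
        newly = set()
        for node in informed:
            peer = rng.randrange(n - 1)
            newly.add(peer if peer < node else peer + 1)  # any node except itself
        informed |= newly
        rounds += 1
    return rounds


if __name__ == "__main__":
    rng = random.Random(1)
    for n in (100, 1_000, 10_000):
        estimate = math.log2(n) + math.log(n)  # classic estimate for push gossip
        print(f"n={n:>6}: {rounds_to_inform_all(n, rng)} rounds (estimate ~{estimate:.1f})")
```

Since the informed set can at most double per round, at least log2 n rounds are always needed; the simulation shows the actual count stays within a small factor of that as n grows by orders of magnitude.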
Gossip
Conceptually, each zone periodically picks another zone at random and exchanges the state of those zones.
It is slightly more complex because there are virtual zones …
Gossip target selection
[Figure: example zone hierarchy, levels from top to bottom: Asia, Europe, USA | Cornell, MIT, UCSD, U-Wash | Node1 to Node4 | System, Inventory, Monitor]
1. Each instance updates its own attributes and evaluates the AFCs that depend on them
2. An agent (instance) gossips on behalf of those zones for which it is a contact, at a rate that depends on the configuration
3. At each level, the agent picks a child at random from the contact list and exchanges state with it
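Steps 2 and 3 can be sketched as a per-level partner choice. Assumptions (not taken from Mariner): each zone's table lists, per child, the agents acting as that child's contacts; an agent gossips in a zone only if it is a contact for one of that zone's children; and at each such level it picks a random other child and then one of that child's contacts. The gossip rate is not modeled, and all names are hypothetical.

```python
import random


def pick_gossip_targets(
    agent: str,
    path: list[str],
    contacts: dict[str, dict[str, list[str]]],
    rng: random.Random,
) -> dict[str, str]:
    """Pick one gossip partner per level of the hierarchy for `agent`.

    path: the agent's zones from its own zone up to the root.
    contacts: zone -> {child name -> contact agents representing that child}.
    The agent only gossips in a zone where it is a contact for one of the children.
    """
    targets = {}
    for zone in path:
        children = contacts[zone]
        mine = next((child for child, agents in children.items() if agent in agents), None)
        if mine is None:
            continue  # not a contact at this level, so no gossip on its behalf
        others = [child for child in children if child != mine]
        if others:
            partner_zone = rng.choice(others)
            targets[zone] = rng.choice(children[partner_zone])
    return targets


if __name__ == "__main__":
    contacts = {
        "/USA/Cornell": {"Node1": ["Node1"], "Node2": ["Node2"], "Node3": ["Node3"]},
        "/USA": {"Cornell": ["Node1"], "MIT": ["mit-a", "mit-b"], "UCSD": ["ucsd-a"]},
        "/": {"USA": ["Node1"], "Europe": ["eu-a"], "Asia": ["asia-a"]},
    }
    path = ["/USA/Cornell", "/USA", "/"]
    rng = random.Random(0)
    print(pick_gossip_targets("Node1", path, contacts, rng))  # partners at all three levels
    print(pick_gossip_targets("Node2", path, contacts, rng))  # only within /USA/Cornell
```

Only the few agents that are contacts for a zone gossip at its level, so the load of the upper levels is spread over a small, configurable set of agents rather than every node.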
Membership
- Failure detection
  - If no update has been seen for an agent within time T_fail, remove it from the system
- Integration
  - After partitions, crashes, etc., renegade trees can form
  - Use broadcast, multicast, and hints to discover other agents
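The failure-detection rule needs only local bookkeeping. A minimal sketch, assuming each agent records the local time at which it last saw fresh state from every other agent; T_fail appears as the parameter `t_fail`, its value is a deployment choice the slides do not give, and the function names are hypothetical.

```python
def note_update(last_seen: dict[str, float], agent: str, now: float) -> None:
    """Record that fresh state from `agent` arrived (e.g. via gossip) at time `now`."""
    last_seen[agent] = now


def expire_failed(last_seen: dict[str, float], now: float, t_fail: float) -> list[str]:
    """Remove agents not heard from within t_fail; return them in sorted order."""
    failed = sorted(agent for agent, seen in last_seen.items() if now - seen > t_fail)
    for agent in failed:
        del last_seen[agent]
    return failed


if __name__ == "__main__":
    last_seen: dict[str, float] = {}
    note_update(last_seen, "falcon", 100.0)
    note_update(last_seen, "swift", 70.0)
    note_update(last_seen, "cardinal", 95.0)
    print(expire_failed(last_seen, now=110.0, t_fail=30.0))  # ['swift']
```

No agent needs to agree with any other on who has failed: each removes stale entries on its own, and gossip eventually makes the views converge.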
Subscription routing
- At the leaves, subscribers store subscription information
- Aggregation functions combine the subscriptions of participants into a subscription for the zone
- Publishers use zone.send(subscription, data), which is forwarded if the zone has children that match the subscription
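A toy version of this routing scheme, assuming subscriptions are plain topic strings and that a zone's aggregated subscription is the union of its children's (the slides leave both open). The `send` method mirrors the slide's zone.send(subscription, data); the extra `delivered` list exists only to collect what reaches the leaves. In Astrolabe the zone subscription would be maintained by an AFC rather than recomputed on every call as it is here.

```python
from dataclasses import dataclass, field


@dataclass
class Zone:
    """Toy zone tree for subscription routing; topics stand in for subscriptions.

    Leaves hold their subscribers' topics; an inner zone's subscription is the
    aggregate (here: the union) of its children's, recomputed on demand for brevity.
    """
    name: str
    children: list["Zone"] = field(default_factory=list)
    topics: set[str] = field(default_factory=set)

    def subscription(self) -> set[str]:
        if not self.children:
            return set(self.topics)
        return set().union(*(child.subscription() for child in self.children))

    def send(self, subscription: str, data: str, delivered: list) -> None:
        """Forward `data` only into subtrees whose aggregated subscription matches."""
        if not self.children:
            if subscription in self.topics:
                delivered.append((self.name, data))
            return
        for child in self.children:
            if subscription in child.subscription():
                child.send(subscription, data, delivered)


if __name__ == "__main__":
    root = Zone("/", [
        Zone("USA", [
            Zone("Cornell", [Zone("Node1", topics={"sports"}), Zone("Node2", topics={"tech"})]),
            Zone("MIT", [Zone("Node3", topics={"tech", "world"})]),
        ]),
        Zone("Europe", [Zone("Node4", topics={"world"})]),
    ])
    delivered = []
    root.send("tech", "new chip announced", delivered)
    print(delivered)  # [('Node2', 'new chip announced'), ('Node3', 'new chip announced')]
```

The Europe subtree never sees the item, because its aggregated subscription does not contain "tech"; this pruning is what keeps the publisher's single injection from flooding the whole system.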
Routing infrastructure
- Each zone dynamically selects 2-3 routing nodes using AFCs based on various load factors
- These nodes receive news items for the children in their zone
- Forwarding is based on the individual subscription information
- Redundancy is used to achieve robustness and reliability
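The router choice can be sketched as "pick the k least-loaded members". The scoring formula and the `bw_util` attribute are hypothetical; the slides say only that various load factors are used. In the real system this selection would be expressed as an AFC and recomputed as the zone's attributes change.

```python
import heapq


def select_routers(rows: dict[str, dict[str, float]], k: int = 3) -> list[str]:
    """Choose the k members of a zone with the lowest load score as its routing nodes.

    Hypothetical score: CPU load plus half the bandwidth utilization; a real AFC
    could weigh any attributes the zone's members publish.
    """

    def score(name: str) -> float:
        row = rows[name]
        return row["load"] + 0.5 * row["bw_util"]

    return heapq.nsmallest(k, rows, key=score)


if __name__ == "__main__":
    zone = {
        "n1": {"load": 2.0, "bw_util": 0.2},
        "n2": {"load": 1.5, "bw_util": 0.9},
        "n3": {"load": 4.5, "bw_util": 0.1},
        "n4": {"load": 0.5, "bw_util": 0.4},
    }
    print(select_routers(zone, k=2))  # ['n4', 'n2']
```

Choosing more than one router per zone is what provides the redundancy: a news item handed to every selected router still reaches the zone if one of them has failed.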
Summary