Building Very Large Overlay Networks

Jorg Liebeherr
University of Virginia
HyperCast Project
• HyperCast is a set of protocols for large-scale overlay multicasting and peer-to-peer networking
Research problems:
• Investigate which topologies are best suited for
… a large number of peers
… that spontaneously join and leave
… in a network that is dynamically changing
• Make it easier to write applications for overlays by
… providing suitable abstractions (“overlay socket”)
… shielding application programmer from overlay topology issues
Acknowledgements
• Team:
– Past: Bhupinder Sethi, Tyler Beam, Burton Filstrup, Mike Nahas, Dongwen Wang, Konrad Lorincz, Jean Ablutz, Haiyong Wang, Weisheng Si, Huafeng Lu
– Current: Jianping Wang, Guimin Zhang, Guangyu Dong, Josh Zaritsky
• This work is supported in part by the National Science Foundation:
DENALI
Applications with many receivers
[Figure: applications placed by number of senders (1 to 1,000,000) vs. number of receivers (10 to 1,000,000): streaming, software distribution, collaboration tools, games, peer-to-peer applications, sensor networks]
Need for Multicasting?

• Maintaining unicast connections is not feasible
• The infrastructure or a service needs to support a “send to group”
Multicast support in the network infrastructure (IP Multicast)
• Reality Check:
– Deployment has encountered severe scalability limitations in both the size and number of groups that can be supported
– IP Multicast is still plagued with concerns pertaining to scalability, network management, deployment and support for error, flow and congestion control
Overlay Multicasting
• Logical overlay resides on top of the layer-3 network infrastructure (“underlay”)
• Data is forwarded between neighbors in the overlay
• Transparent to the underlay network
• Overlay topology should match the layer-3 infrastructure
Overlay-based approaches for multicasting
• Build an overlay mesh network and embed trees into the mesh:
– Narada (CMU)
– RMX/Gossamer (UCB)
– more
• Build a shared tree:
– Yallcast/Yoid (NTT, ACIRI)
– AMRoute (Telcordia, UMD – College Park)
– Overcast (MIT)
– Nice (UMD)
– more
• Build an overlay using distributed hash tables and embed trees:
– Chord (UCB, MIT)
– CAN (UCB, ACIRI)
– Pastry/Scribe (Rice, MSR)
– more
HyperCast Overlay Topologies
Applications organize themselves to form a logical overlay network with a given topology
Data is forwarded along the edges of the overlay network
Hypercube
Delaunay triangulation

Spanning tree (for mobile ad hoc networks)
Programming in HyperCast: Overlay Socket
• Socket-based API
• UDP (unicast or multicast) or TCP
• Supports different semantics for transport of data
• Supports different overlay protocols
• Implementation in Java
[Figure: overlay socket architecture. The overlay socket contains an overlay node (with an overlay node interface and a node adapter), a forwarding engine, a message store, an application receive buffer, and a socket adapter. A statistics interface exposes statistics of all components. Application messages enter through the overlay socket interface; messages of the overlay protocol and application messages travel through the adapters over the underlay network.]
• HyperCast Project website: http://www.cs.virginia.edu/hypercast
Network of overlay sockets
• Overlay socket is an endpoint of communication in an overlay network
• An overlay network is a collection of overlay sockets
• Overlay sockets in the same overlay network have a common set of attributes
• Each overlay network has an overlay ID, which can be used to access the attributes of the overlay
• Nodes exchange data with neighbors in the overlay topology
[Figure: several applications, each with its own overlay socket, connected through the Internet]
Unicast and Multicast in overlays
[Figure: a multicast tree rooted at a sender, and a unicast path from a sender to a root receiver, both embedded in the overlay]
• Unicast and multicast are done along trees that are embedded in the overlay network.
• Requirement: Overlay node must be able to compute the child nodes and parent node with respect to a given root
Overlay Message Format
+-------+---+---+---------------+---------------+---------------+
|Version|LAS|Dmd| Traffic Class |  Flow Label   |  Next Header  |
+-------+---+---+---------------+---------------+---------------+
|       OL Message Length       |           Hop Limit           |
+-------------------------------+-------------------------------+
|                            Src LA                             |
+---------------------------------------------------------------+
|                           Dest LA                             |
+---------------------------------------------------------------+
• Common Header:
Version (4 bit): Version of the protocol (current version is 0x0)
LAS (2 bit): Size of the logical address field
Dmd (4 bit): Delivery mode (multicast, flood, unicast, anycast)
Traffic Class (8 bit): Specifies the quality-of-service class (default: 0x00)
Flow Label (8 bit): Flow identifier
Next Header (8 bit): Specifies the type of the next header following this header
OL Message Length (8 bit): Length of the overlay message
Hop Limit (16 bit): TTL field
Src LA ((LAS+1)*4 bytes): Logical address of the source
Dest LA ((LAS+1)*4 bytes): Logical address of the destination
Loosely modeled after IPv6 minimal header with extensions
Socket Based API
// Generate the configuration object
OverlayManager om = new OverlayManager("hypercast.prop");
String overlayID = om.getDefaultProperty("MyOverlayID");
OverlaySocketConfig config = om.getOverlaySocketConfig(overlayID);

// Create an overlay socket
OL_Socket socket = config.createOverlaySocket(callback);

// Join an overlay
socket.joinGroup();

// Create a message
OL_Message msg = socket.createMessage(data, length);

// Send the message to all members in the overlay network
socket.sendToAll(msg);

// Receive a message from the socket
OL_Message received = socket.receive();

// Extract the payload
byte[] payload = received.getPayload();
• Tries to stay close to Socket API for UDP Multicast
• Program is independent of overlay topology
Property File
# This is the HyperCast configuration file
#
# LOG FILE:
LogFileName = stderr
# ERROR FILE:
ErrorFileName = stderr
# OVERLAY SERVER:
OverlayServer =
# OVERLAY ID:
OverlayID = 224.228.19.78/9472
KeyAttributes = Socket,Node,SocketAdapter
# SOCKET:
Socket = HCast2-0
HCAST2-0.TTL = 255
HCAST2-0.ReceiveBufferSize = 200
HCAST2-0.ReadTimeout = 0
. . .
# SOCKET ADAPTER:
SocketAdapter = TCP
SocketAdapter.TCP.MaximumPacketLength = 16384
SocketAdapter.UDP.MessageBufferSize = 100
# NODE:
Node = HC2-0
HC2-0.SleepTime = 400
HC2-0.MaxAge = 5
HC2-0.MaxMissingNeighbor = 10
HC2-0.MaxSuppressJoinBeacon = 3
# NODE ADAPTER:
#NodeAdapter = UDPMulticast
NodeAdapter.UDP.MaximumPacketLength = 8192
NodeAdapter.UDP.MessageBufferSize = 18
NodeAdapter.UDPServer.UdpServer0 = 128.143.71.50:8081
NodeAdapter.UDPServer.MaxTransmissionTime = 1000
NodeAdapter.UDPMulticastAddress = 224.242.224.243/2424
• Stores attributes that configure the overlay socket (overlay protocol, transport protocol and addresses)
Hypercast Software: Demo Applications
Distributed whiteboard
Multicast file transfer
Data aggregation in P2P Net: Homework assignment in CS757 (Computer Networks)
Application: Emergency Response Network for Arlington County, Virginia
Project directed by B. Horowitz and S. Patek
Application of P2P Technology to
Advanced Emergency Response Systems
Priority Messaging
Situational Awareness
Dynamic Grouping andInformation Management
Other Features of Overlay Socket
Current version (2.0):
• Statistics: each part of the overlay socket maintains statistics, which are accessed using a common interface
• Monitor and control: XML-based protocol for remote access of statistics and remote control of experiments
• LoToS: simulator for testing and visualization of overlay protocols
• Overlay Manager: server for storing and downloading overlay configurations

Next version (parts are done, release in Summer 2004):
– “MessageStore”: enhanced data services, e.g., end-to-end reliability, persistence, streaming, etc.
– HyperCast for mobile ad-hoc networks on handheld devices
– Service differentiation for multi-stream delivery
– Clustering
– Integrity and privacy with distributed key management
Delaunay Triangulation Overlays
Nodes are assigned x-y coordinates (e.g., based on geographic location).

[Figure: nodes at coordinates (0,6), (4,9), (5,2), (10,8), (12,0) in a plane]
Nodes in a Plane
The Voronoi region of a node is the region of the plane that is closer to this node than to any other node.

[Figure: Voronoi regions of the five nodes]
Voronoi Regions
The Delaunay triangulation has edges between nodes in neighboring Voronoi regions.

[Figure: Delaunay triangulation of the five nodes]
Delaunay Triangulation
An equivalent definition: a triangulation such that, for the circumscribing circle of each triangle, no vertex lies in the interior of the circle.

[Figure: circumscribing circles of the triangulation]
Delaunay Triangulation
Locally Equiangular Property
• Sibson 1977: Maximize the minimum angle
For every convex quadrilateral formed by triangles ACB and ABD that share a common edge AB, the minimum internal angle of triangles ACB and ABD is at least as large as the minimum internal angle of triangles ACD and CBD.
[Figure: convex quadrilateral with vertices A, B, C, D and shared edge AB]
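The equiangular test above can be sketched in code. This is an illustrative implementation, not HyperCast source: it compares the minimum internal angle of the two triangles sharing edge AB with that of the flipped configuration ACD/CBD.

```java
// Hypothetical sketch of Sibson's local equiangularity test; class and
// method names are illustrative, not taken from the HyperCast code base.
public class EquiangularTest {
    // Interior angle (radians) at vertex a of triangle (a, b, c).
    static double angleAt(double[] a, double[] b, double[] c) {
        double v1x = b[0] - a[0], v1y = b[1] - a[1];
        double v2x = c[0] - a[0], v2y = c[1] - a[1];
        double dot = v1x * v2x + v1y * v2y;
        return Math.acos(dot / (Math.hypot(v1x, v1y) * Math.hypot(v2x, v2y)));
    }

    // Smallest interior angle of triangle (a, b, c).
    static double minAngle(double[] a, double[] b, double[] c) {
        return Math.min(angleAt(a, b, c),
               Math.min(angleAt(b, a, c), angleAt(c, a, b)));
    }

    // True if edge AB (triangles ACB and ABD) is locally equiangular,
    // i.e. at least as good as the flipped edge CD (triangles ACD, CBD).
    static boolean edgeABisEquiangular(double[] a, double[] b,
                                       double[] c, double[] d) {
        double withAB = Math.min(minAngle(a, c, b), minAngle(a, b, d));
        double withCD = Math.min(minAngle(a, c, d), minAngle(c, b, d));
        return withAB >= withCD;
    }
}
```

A node receiving a Hello can run this test against each candidate neighbor to decide whether an edge belongs in the triangulation.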
Next-hop routing with Compass Routing
• A node’s parent in a spanning tree is its neighbor which forms the smallest angle with the root.
• A node need only know information on its neighbors – no routing protocol is needed for the overlay.
[Figure: a node with neighbors A (30°) and B (15°) relative to the direction of the root node; B is the node’s parent]
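The parent selection rule can be sketched as follows; this is an illustrative helper, not HyperCast code: among a node's neighbors, pick the one whose direction forms the smallest angle with the direction toward the root.

```java
// Hypothetical sketch of compass-routing parent selection.
public class CompassRouting {
    // Angle at 'node' between the ray to 'root' and the ray to 'nbr'.
    static double angleToRoot(double[] node, double[] root, double[] nbr) {
        double rx = root[0] - node[0], ry = root[1] - node[1];
        double nx = nbr[0] - node[0], ny = nbr[1] - node[1];
        double dot = rx * nx + ry * ny;
        return Math.acos(dot / (Math.hypot(rx, ry) * Math.hypot(nx, ny)));
    }

    // Index of the neighbor chosen as parent with respect to the root.
    static int parent(double[] node, double[] root, double[][] neighbors) {
        int best = 0;
        for (int i = 1; i < neighbors.length; i++) {
            if (angleToRoot(node, root, neighbors[i])
                    < angleToRoot(node, root, neighbors[best])) {
                best = i;
            }
        }
        return best;
    }
}
```

Because only neighbor coordinates are needed, every node can evaluate this rule locally, which is why no routing protocol is required.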
Spanning tree when node (8,4) is the root. The tree can be calculated by both parents and children.

[Figure: spanning tree over nodes (0,6), (4,9), (5,2), (8,4), (10,8), (12,0) rooted at (8,4)]
Evaluation of Delaunay Triangulation overlays
• Delaunay triangulation can consider location of nodes in an (x,y) plane, but is not aware of the network topology
Question: How does Delaunay triangulation compare with other overlay topologies?
Overlay Topologies
Overlays used by HyperCast:
• Delaunay triangulation and variants:
– DT
– Hierarchical DT
– Multipoint DT
• Hypercube

Overlays that assume knowledge of the network topology:
• Degree-6 graph – similar to graphs generated in Narada
• Degree-3 tree – similar to graphs generated in Yoid
• Logical MST – minimum spanning tree
Transit-Stub Network
Transit-stub topology (GeorgiaTech topology generator):
• 4 transit domains
• 416 stub domains
• 1024 total routers
• 128 hosts on stub domains
Evaluation of Overlays
• Simulation:
– Network with 1024 routers (“Transit-Stub” topology)
– 2–512 hosts
• Performance measures for trees embedded in an overlay network:
– Degree of a node in an embedded tree
– “Stretch”: Ratio of delay in overlay to shortest path delay
– “Stress”: Number of duplicate transmissions over a physical link
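The two measures can be illustrated with a small hypothetical helper (not part of the HyperCast API): stretch sums the per-hop overlay delays and divides by the unicast delay, and stress counts how many overlay edges map onto a given physical link.

```java
// Illustrative computation of the stretch and stress metrics.
public class OverlayMetrics {
    // Stretch: delay along the overlay path divided by the
    // shortest-path (unicast) delay between the same two hosts.
    static double stretch(int[] overlayHopDelays, int unicastDelay) {
        int total = 0;
        for (int d : overlayHopDelays) total += d; // sum per-hop delays
        return (double) total / unicastDelay;
    }

    // Stress of a physical link: the number of overlay edges whose
    // unicast paths traverse it, i.e. duplicate transmissions.
    static int stress(String link, String[][] pathsOfOverlayEdges) {
        int count = 0;
        for (String[] path : pathsOfOverlayEdges)
            for (String hop : path)
                if (hop.equals(link)) count++;
        return count;
    }
}
```

For the illustration on the next slide, an overlay path of six unit-delay hops over a unicast delay of 4 gives a stretch of 1.5, and a link carried by two overlay edges has stress 2.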
Illustration of Stress and Stretch
[Figure: overlay path from A to B over unit-delay links. One physical link carries two overlay edges, so its stress is 2. The unicast delay from A to B is 4, the delay from A to B in the overlay is 6, so the stretch for A→B is 1.5.]
[Plots: average stretch, 90th percentile of stretch, average stress, and 90th percentile of stress versus number of members; the Delaunay triangulation is highlighted in each plot]
The DT Protocol
Protocol which organizes members of a network in a Delaunay Triangulation
• Each member only knows its neighbors
• “soft-state” protocol
Topics:
• Nodes and neighbors
• Example: a node joins
• State diagram
• Rendezvous
• Measurement experiments
Each node sends Hello messages to its neighbors periodically.

[Figure: Hello messages exchanged along every edge of the triangulation]
• Each Hello contains the clockwise (CW) and counterclockwise (CCW) neighbors
• The receiver of a Hello runs a “neighbor test” (locally equiangular property)
• CW and CCW are used to detect new neighbors
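The Hello mechanism is soft state: neighbor entries are refreshed by incoming Hellos and expire otherwise. A minimal sketch with hypothetical class and field names (the real protocol also tracks CW/CCW neighbors and runs the neighbor test); the constants echo HC2-0.SleepTime and HC2-0.MaxMissingNeighbor from the property file.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hedged sketch of a soft-state neighbor table: a Hello refreshes a
// neighbor's timestamp, and neighbors that miss too many Hello
// periods are silently dropped. Names are illustrative.
public class SoftStateNeighbors {
    final long helloPeriodMs;  // cf. HC2-0.SleepTime in the property file
    final int maxMissing;      // cf. HC2-0.MaxMissingNeighbor
    final Map<String, Long> lastHeard = new HashMap<>();

    SoftStateNeighbors(long helloPeriodMs, int maxMissing) {
        this.helloPeriodMs = helloPeriodMs;
        this.maxMissing = maxMissing;
    }

    void onHello(String neighbor, long nowMs) {
        lastHeard.put(neighbor, nowMs); // refresh soft state
    }

    // Drop neighbors not heard from for maxMissing Hello periods.
    void purge(long nowMs) {
        Iterator<Map.Entry<String, Long>> it = lastHeard.entrySet().iterator();
        while (it.hasNext()) {
            if (nowMs - it.next().getValue() > maxMissing * helloPeriodMs)
                it.remove();
        }
    }
}
```

Because state is refreshed rather than negotiated, a node that crashes simply times out of its neighbors' tables; no explicit teardown is required.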
[Figure: node (5,2) sends a Hello to (10,8) containing CW = 12,0 and CCW = 4,9]

Neighborhood table of (10,8):

Neighbor   CW     CCW
5,2        12,0   4,9
4,9        5,2    –
12,0       –      10,8
A node that wants to join the triangulation contacts a node that is “close”.

[Figure: new node (8,4) sends a Hello to (5,2)]
Node (5,2) updates its Voronoi region, and the triangulation.

[Figure: updated triangulation including (8,4)]
(5,2) sends a Hello which contains info for contacting its clockwise and counterclockwise neighbors.

[Figure: Hello from (5,2) to (8,4)]
(8,4) contacts these neighbors …

[Figure: Hellos from (8,4) to (4,9) and (12,0)]
… which update their respective Voronoi regions.

[Figure: updated Voronoi regions of (4,9) and (12,0)]
(4,9) and (12,0) send Hellos and provide info for contacting their respective clockwise and counterclockwise neighbors.

[Figure: Hellos from (4,9) and (12,0) to (8,4)]
(8,4) contacts the new neighbor (10,8) …

[Figure: Hello from (8,4) to (10,8)]
… which updates its Voronoi region …

[Figure: updated Voronoi region of (10,8)]
… and responds with a Hello.

[Figure: Hello from (10,8) to (8,4)]
This completes the update of the Voronoi regions and the Delaunay triangulation.

[Figure: final triangulation with nodes (0,6), (4,9), (5,2), (8,4), (10,8), (12,0)]
Rendezvous Methods
• Rendezvous problems:
– How does a new node detect a member of the overlay?
– How does the overlay repair a partition?
• Three solutions:
1. Announcement via broadcast
2. Use of a rendezvous server
3. Use `likely’ members (“Buddy List”)
Rendezvous Method 1: Announcement via broadcast (e.g., using IP Multicast).

[Figure: new node (8,4) broadcasts a Hello to the overlay]
A leader is a node with a Y-coordinate higher than any of its neighbors.

[Figure: the leader node in the triangulation]
Rendezvous Method 2: The new node and the leader contact a rendezvous server. The server keeps a cache of some other nodes.

[Figure: the new node (8,4) sends a ServerRequest to the server, receives a ServerReply naming (12,0), and then sends a NewNode Hello to (12,0)]
Rendezvous Method 3: Each node has a list of “likely” members of the overlay network.

[Figure: the new node (8,4), with buddy list (12,0) and (4,9), sends NewNode Hellos to its buddies]
State Diagram of a Node
[State diagram: states are Leader without Neighbor, Leader with Neighbor, Not Leader, Leaving, and Stopped. When the application starts, a node is a Leader without Neighbor. Adding a neighbor with smaller coordinates makes it a Leader with Neighbor; adding a neighbor with larger coordinates makes it Not Leader. If all neighbors leave or time out, the node becomes a Leader without Neighbor again; after removing some neighbor, the node with the largest coordinates becomes the leader. When the application exits, the node sends Goodbye messages and moves through Leaving to Stopped.]
Sub-states of a Node
[State diagram: sub-states are Stable without Candidate Neighbor, Stable with Candidate Neighbor, and Not Stable. A node contained in a NewNode message that passes the neighbor test becomes a candidate neighbor; after handling the candidate neighbor the node may remain stable; after a neighborhood update the node becomes stable or not stable.]
• A node is stable when all nodes that appear in the CW and CCW neighbor columns of the neighborhood table also appear in the neighbor column
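This stability condition can be checked mechanically. A hedged sketch with a hypothetical table layout (each row holds neighbor, CW, CCW, with “-” marking an empty entry):

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative check of the stability condition: a node is stable when
// every node listed in a CW or CCW column also appears as a neighbor
// in its own right. The table layout is hypothetical.
public class StabilityCheck {
    static boolean isStable(String[][] table) {
        // Collect the neighbor column (row[0] of each row).
        Set<String> neighbors = new HashSet<>();
        for (String[] row : table) neighbors.add(row[0]);
        // Every non-empty CW (row[1]) and CCW (row[2]) entry
        // must also be a known neighbor.
        for (String[] row : table) {
            for (int col = 1; col <= 2; col++) {
                String entry = row[col];
                if (!entry.equals("-") && !neighbors.contains(entry))
                    return false;
            }
        }
        return true;
    }
}
```

A CW or CCW entry that is not yet a neighbor is exactly how a node discovers candidates it still needs to contact.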
Measurement Experiments
• Experimental platform: Centurion cluster at UVA (cluster of 300 Linux PCs)
– 2 to 10,000 overlay members
– 1–100 members per PC
• Random coordinate assignments
[Figure: testbed layout – centurion128–255 machines grouped under Switches 3–11, interconnected by Gigabit Ethernet and connected to the Internet]
How long does it take to add M members to an overlay network of N members ?
Experiment: Adding Members
[Plot: time to complete (sec) for adding M members to an overlay of N members, versus M+N]
Experiment: Throughput of Multicasting
[Plot: average throughput (Mbps) versus number of members N, showing measured values and bandwidth bounds (due to stress)]
100 MB bulk transfer for N = 2–100 members (1 node per PC); 10 MB bulk transfer for N = 20–1000 members (10 nodes per PC)
Experiment: Delay
[Plot: delay of a packet (msec) versus number of nodes N, for the same bulk-transfer experiments]
Wide-area experiments
• PlanetLab: worldwide network of Linux PCs
• Obtain coordinates via “triangulation”:
– Delay measurements to 3 PCs
– Global Network Positioning (Eugene Ng, CMU)
Planetlab Video Streaming Demo
Summary
• The overlay socket is a general API for programming P2P systems
• Several proof-of-concept applications
• Intensive experimental testing:
– Local PC cluster (on 100 PCs)
• 10,000-node overlay in < 1 minute
• Throughput of 2 Mbps for 1,000 receivers
– PlanetLab (on 60 machines)
• Up to 2,000 nodes
• Video streaming experiment with 80 receivers at 800 kbps had few losses
HyperCast (Version 2.0) site: http://www.cs.virginia.edu/hypercast
Design documents, software download, user manual