Dr. Multicast for Data Center Communication Scalability Ymir Vigfusson Hussam Abu-Libdeh Mahesh Balakrishnan Ken Birman Cornell University Yoav Tock IBM Research Haifa HotNets, October 5, 2008



IP Multicast in Data Centers

• IPMC is not used in data centers
• Would speed up products that use multicast


IP Multicast in Data Centers

• Why is IP multicast rarely used?
  o Limited IPMC scalability on switches/routers and NICs
  o Broadcast storms: loss triggers a horde of NACKs, which trigger more loss, etc.
  o Disruptive even to non-IPMC applications


IP Multicast in Data Centers

• IP multicast has a bad reputation
  o Works great up to a point, after which it breaks catastrophically


IP Multicast in Data Centers

• Bottom line:
  o Administrators have no control over multicast use...
  o Without control, they opt for "never"


Dr. Multicast  


Dr. Multicast (MCMD)

• Policy: Permits data center operators to selectively enable and control IPMC
• Transparency: Standard IPMC interface; the system calls are overloaded
• Performance: Uses IPMC when possible, otherwise point-to-point unicast
• Robustness: Distributed, fault-tolerant service

 


Terminology

• Process: Application that joins logical IPMC groups
• Logical IPMC group: A virtualized abstraction
• Physical IPMC group: An ordinary IP multicast group, as usual
• UDP multi-send: New kernel-level system call
• Collection: Set of logical IPMC groups with identical membership


Acceptable Use Policy

• Assume a higher-level network management tool compiles policy into primitives
• Explicitly allow a process to use IPMC groups:
  o allow-join(process, logical IPMC)
  o allow-send(process, logical IPMC)
• UDP multi-send is always permitted
• Additional restraints:
  o max-groups(process, limit)
  o force-udp(process, logical IPMC)
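The policy primitives above can be read as a small in-memory check. The Python sketch below is a hypothetical model, not MCMD's implementation: the `Policy` class, its method names, and the assumption that a sender denied IPMC is downgraded to UDP multi-send (rather than blocked) are all illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # (process, logical IPMC group)


@dataclass
class Policy:
    """Hypothetical in-memory form of the compiled policy primitives."""
    allow_join: Set[Pair] = field(default_factory=set)
    allow_send: Set[Pair] = field(default_factory=set)
    max_groups: Dict[str, int] = field(default_factory=dict)  # process -> limit
    force_udp: Set[Pair] = field(default_factory=set)

    def may_join(self, process: str, group: str, joined: Set[str]) -> bool:
        """Permit a join only if allow-join holds and max-groups is not exceeded."""
        if (process, group) not in self.allow_join:
            return False
        limit = self.max_groups.get(process)
        return limit is None or len(joined) < limit

    def send_transport(self, process: str, group: str) -> str:
        """Return 'ipmc' if the process may use IPMC for this group.

        UDP multi-send is always permitted, so every other case falls back
        to it (assumption: a denied sender is downgraded, not blocked).
        """
        pair = (process, group)
        if pair in self.allow_send and pair not in self.force_udp:
            return "ipmc"
        return "udp-multisend"


policy = Policy(
    allow_join={("web1", "L1")},
    allow_send={("web1", "L1"), ("web2", "L1")},
    max_groups={"web1": 1},
    force_udp={("web2", "L1")},
)
print(policy.may_join("web1", "L1", set()))   # True
print(policy.may_join("web1", "L1", {"L0"}))  # False: max-groups reached
print(policy.send_transport("web2", "L1"))    # udp-multisend (force-udp)
```

Whether an eligible sender actually gets a physical IPMC address still depends on the mapping computed by the leader.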


Overview

• Library module
• Mapping module
• Gossip layer
• Optimization questions
• Results


MCMD Library Module

• Transparent: Overloads the IPMC functions
  o setsockopt(), send(), etc.
• Translation: Each logical IPMC group maps to a set of P-IPMC and/or unicast addresses
  o Two extremes
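A minimal sketch of the library module's send path, assuming the local agent has already supplied a cached translation map. The names `translate`, `mcmd_sendto`, and `uses_physical_ipmc`, and the map layout, are hypothetical; the loop over destinations is a user-space stand-in for the UDP multi-send system call.

```python
import ipaddress
from typing import Callable, Dict, List, Tuple

Addr = Tuple[str, int]
# Hypothetical cached map from the local MCMD agent:
# logical IPMC address -> physical IPMC address and/or member unicast addresses.
TranslationMap = Dict[Addr, List[Addr]]


def translate(logical: Addr, tmap: TranslationMap) -> List[Addr]:
    """Look up the physical destinations of a logical group (KeyError if unmapped)."""
    return tmap[logical]


def mcmd_sendto(send: Callable[[bytes, Addr], object], payload: bytes,
                logical: Addr, tmap: TranslationMap) -> int:
    """Send a datagram addressed to a logical group; return the number of
    physical datagrams emitted (1 for a P-IPMC address, N for N unicast members)."""
    dests = translate(logical, tmap)
    for dest in dests:  # user-space stand-in for the UDP multi-send system call
        send(payload, dest)
    return len(dests)


def uses_physical_ipmc(logical: Addr, tmap: TranslationMap) -> bool:
    """True if any destination is an IPv4 multicast (224.0.0.0/4) address."""
    return any(ipaddress.ip_address(ip).is_multicast for ip, _ in translate(logical, tmap))


# Example map: one group on a physical IPMC address, one on per-member unicast.
tmap: TranslationMap = {
    ("239.1.1.1", 5000): [("239.192.0.7", 5000)],
    ("239.1.1.2", 5000): [("10.0.0.5", 5000), ("10.0.0.6", 5000)],
}
```

With a real socket, pass `sock.sendto` for a `socket.socket(socket.AF_INET, socket.SOCK_DGRAM)`. Taking a send callable keeps the translation logic testable without a network.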


MCMD Mapping Role

• An MCMD agent runs on each machine
  o Contacted by the library modules
  o Provides a mapping
• One agent is elected to be the leader:
  o Allocates IPMC resources according to the current policy
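The slides do not say how the leader is elected. One simple rule, sketched here purely as an assumption, is for every agent to pick the lowest ID among nodes its failure detector still considers live; agents with the same membership view then agree. The function names, heartbeat table, and timeout are hypothetical.

```python
from typing import Dict, List, Optional


def live_nodes(last_heartbeat: Dict[str, float], now: float, timeout: float) -> List[str]:
    """Failure detection: a node is presumed live if heard from within `timeout`."""
    return sorted(n for n, t in last_heartbeat.items() if now - t <= timeout)


def elect_leader(last_heartbeat: Dict[str, float], now: float,
                 timeout: float) -> Optional[str]:
    """Deterministic rule: every agent with the same view picks the lowest live ID."""
    live = live_nodes(last_heartbeat, now, timeout)
    return live[0] if live else None


heartbeats = {"n1": 0.0, "n2": 9.0, "n3": 9.5}
print(elect_leader(heartbeats, now=10.0, timeout=5.0))  # n2: n1 has timed out
```

Agents whose views differ (for example, during gossip propagation) may briefly disagree on the leader under this rule.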


MCMD Mapping Role

• Allocating IPMC resources: An optimization problem

[Figure: processes (Procs) subscribed to logical IPMC groups (L-IPMC); the same subscriptions regrouped as Procs to Collections to L-IPMC]


MCMD Gossip Layer

• Runs system-wide as part of the agent
• Automatic failure detection
• Group membership fully replicated via gossip
  o Each node reports its own state
  o Future: replicate more selectively
  o The leader runs the optimization algorithm on the data and reports the mapping
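Full replication via gossip can be illustrated with a push-pull anti-entropy sketch. Assumptions: each node's reported state is a versioned record, every round each node exchanges its whole table with one random peer, and higher versions win. This is a generic gossip scheme for illustration, not MCMD's actual protocol or message format.

```python
import random
from typing import Dict, List, Tuple

Entry = Tuple[int, str]   # (version, reported state)
Table = Dict[int, Entry]  # node id -> latest entry this node has seen


def merge(mine: Table, theirs: Table) -> None:
    """Anti-entropy merge: keep the highest-version entry for every node."""
    for node, (version, state) in theirs.items():
        if node not in mine or mine[node][0] < version:
            mine[node] = (version, state)


def gossip_round(tables: List[Table], rng: random.Random) -> None:
    """Each node exchanges its full table with one random peer (push-pull)."""
    n = len(tables)
    for i in range(n):
        j = rng.randrange(n - 1)
        if j >= i:
            j += 1  # skip self
        merge(tables[i], tables[j])
        merge(tables[j], tables[i])


def rounds_until_known(n: int, seed: int = 0, max_rounds: int = 1000) -> int:
    """Node 0 reports a new membership state; count rounds until all n >= 2 nodes see it."""
    rng = random.Random(seed)
    tables: List[Table] = [{i: (0, "initial")} for i in range(n)]
    tables[0][0] = (1, "joined L1")
    for r in range(1, max_rounds + 1):
        gossip_round(tables, rng)
        if all(t.get(0) == (1, "joined L1") for t in tables):
            return r
    raise RuntimeError("no convergence within max_rounds")


print(rounds_until_known(200, seed=42))
```

The number of rounds grows roughly logarithmically with the number of nodes, but each round is one gossip period, which is why propagation is slow in wall-clock terms.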


MCMD Gossip Layer

• But gossip is slow...
• Implications:
  o Slow propagation of group membership
  o Slow propagation of new maps
  o We assume a low rate of membership churn
• Remedy: Broadcast module
  o Leader broadcasts urgent messages
  o Bounded bandwidth of urgent channel
  o Trade-off between latency and scalability
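The slides only say the urgent channel has bounded bandwidth. A token bucket is one standard way to enforce such a bound; it is swapped in here as an assumption, and the `UrgentChannel` class, its parameters, and the fall-back-to-gossip behavior are hypothetical.

```python
class UrgentChannel:
    """Token-bucket cap on the leader's urgent broadcast bandwidth (hypothetical)."""

    def __init__(self, rate_bytes_per_s: float, burst_bytes: float, now: float = 0.0):
        self.rate = rate_bytes_per_s
        self.capacity = burst_bytes
        self.tokens = burst_bytes
        self.last = now

    def admit(self, size: int, now: float) -> bool:
        """True if an urgent message of `size` bytes fits in the budget now."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False  # caller leaves the update to ordinary gossip


channel = UrgentChannel(rate_bytes_per_s=1000, burst_bytes=1500)
print(channel.admit(1000, now=0.0))  # True
print(channel.admit(1000, now=0.0))  # False: budget exhausted
print(channel.admit(1000, now=0.5))  # True: 500 bytes refilled
```

A smaller rate protects the network as the system grows, at the cost of more updates waiting for gossip, which is the latency/scalability trade-off noted above.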


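Optimization Questions

[Figure: processes (Procs) subscribed to logical IPMC groups (L-IPMC), regrouped as Procs to Collections to L-IPMC]

• First step: compress logical IPMC groups with identical membership into collections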
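Because a collection is a set of logical groups with identical membership, the exact-match compression step reduces to grouping by membership set. A minimal sketch; the `compress` function and the input layout (logical group to set of processes) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set


def compress(subscriptions: Dict[str, Set[str]]) -> List[Set[str]]:
    """Group logical IPMC groups that have identical membership into collections.

    subscriptions maps each logical group to the set of processes that joined it.
    """
    by_members: Dict[FrozenSet[str], Set[str]] = defaultdict(set)
    for group, members in subscriptions.items():
        by_members[frozenset(members)].add(group)
    return list(by_members.values())


subs = {"L1": {"p1", "p2"}, "L2": {"p2", "p1"}, "L3": {"p3"}}
print(compress(subs))  # two collections: {L1, L2} and {L3}
```

This handles only exact duplicates; the NP-complete general problem concerns groups whose memberships merely overlap.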


Optimization Questions

• How compressible are subscriptions?
  o Multi-objective optimization: minimize the number of collections and minimize bandwidth overhead on the network
  o Thm: The general problem is NP-complete
  o Thm: With uniform random allocation, there is "little" compression opportunity
  o Social preferences
  o Lots of duplicates due to replication (e.g., for load balancing)


Optimization Questions

• Which collections get an IPMC address?
  o Thm: If collections are ordered by decreasing traffic × size and P-IPMC addresses are assigned greedily in that order, bandwidth is minimized
• Tiling heuristic:
  o Sort L-IPMC groups by traffic × size
  o Greedily collapse identical groups
  o Assign IPMC addresses to collections in decreasing order of traffic × size; use UDP multi-send for the rest
• Building tilings incrementally
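The tiling heuristic can be sketched in a few lines. Assumptions, all hypothetical: inputs are logical groups with member addresses and a traffic rate, a collection's traffic is the sum of its groups' traffic, and collapsing happens before sorting for simplicity. Incremental tiling is not shown.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple


def tile(groups: Dict[str, Tuple[Set[str], float]],
         physical_addrs: List[str]) -> Dict[str, List[str]]:
    """groups: logical group -> (member addresses, traffic rate).

    Returns logical group -> destinations: one P-IPMC address, or the member
    list to be reached by UDP multi-send.
    """
    # 1. Collapse groups with identical membership into collections; a
    #    collection's traffic is the sum of its groups' traffic (assumption).
    traffic: Dict[FrozenSet[str], float] = defaultdict(float)
    names: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for name, (members, rate) in groups.items():
        key = frozenset(members)
        traffic[key] += rate
        names[key].append(name)
    # 2. Order collections by decreasing traffic * size.
    order = sorted(traffic, key=lambda m: traffic[m] * len(m), reverse=True)
    # 3. Greedily hand out the limited P-IPMC addresses; the rest use multi-send.
    mapping: Dict[str, List[str]] = {}
    for rank, members in enumerate(order):
        dests = [physical_addrs[rank]] if rank < len(physical_addrs) else sorted(members)
        for name in names[members]:
            mapping[name] = dests
    return mapping


groups = {
    "L1": ({"10.0.0.1", "10.0.0.2", "10.0.0.3"}, 10.0),
    "L2": ({"10.0.0.1", "10.0.0.2", "10.0.0.3"}, 5.0),
    "L3": ({"10.0.0.4", "10.0.0.5"}, 1.0),
    "L4": ({"10.0.0.6"}, 10.0),
}
mapping = tile(groups, ["239.192.0.1"])
print(mapping["L1"], mapping["L3"])
```

With a single physical address, the {10.0.0.1-3} collection (traffic 15, size 3) outranks the others and gets IPMC; L3 and L4 fall back to UDP multi-send.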

 


Experimental Results


Overhead (max. throughput)

• Insignificant overhead when mapping L-IPMC to P-IPMC


Overhead (CPU utilization)

• Insignificant overhead when mapping L-IPMC to P-IPMC


Network Overhead

• The gossip layer uses constant background bandwidth; the urgent channel behaves well


Latency

• Latency of propagation of joins/leaves and new maps

    


Policy control

• A malfunctioning node bombards an existing IPMC group
• MCMD policy prevents ill effects

[Figure annotations: "Traffic starts", "New policy"]


Conclusion

• IPMC has been a bad citizen...
• Dr. Multicast has the cure!
• Opportunity for big performance enhancements and policy control


Thank you!



Overhead

• A Linux kernel module increases UDP multi-send throughput by 17% (compared to user-space UDP multi-send)


Latency of events

• Gossip: 99% of nodes are aware of a change within 9 epochs (an epoch is currently 1 sec)


Conclusions

• Policy: Allows data center operators to enable and control IPMC
• Transparency: Standard IPMC interface; the system calls are overloaded
• Performance: Uses IPMC when possible, otherwise point-to-point UDP
• Robustness: Distributed, fault-tolerant service


Results

• Library module
  o Insignificant slowdown
  o Linux kernel module provides a 17% speed-up for UDP multi-send


Optimization questions

[Figure: users subscribed to topics (Users to Topics), regrouped as Users to Groups to Topics]

• Multi-objective:
  o Minimize number of groups
  o Minimize bandwidth overhead on network
• Thm: This problem is NP-complete
  o Reduction to Minimum Normal Set Basis


MCMD Library Layer

• Overloads the IPMC functions
  o setsockopt(), send(), etc.

• Translates logical IPMC addresses to physical IPMC, or point-to-point UDP packets depending on policy

• Notifies MCMD immediately about joins/leaves

• Learns about new mappings from MCMD

• Keeps statistics about group traffic rates
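The traffic statistics kept by the library layer are what a traffic × size ranking needs. A minimal sketch, assuming per-group byte counters that are turned into rates and reset at each report; the `TrafficStats` class and its reporting interval are hypothetical.

```python
from collections import defaultdict
from typing import Dict


class TrafficStats:
    """Hypothetical per-logical-group byte counters kept by the library layer."""

    def __init__(self, start: float) -> None:
        self.window_start = start
        self.bytes: Dict[str, int] = defaultdict(int)

    def record(self, group: str, nbytes: int) -> None:
        """Called on every send to a logical group."""
        self.bytes[group] += nbytes

    def snapshot(self, now: float) -> Dict[str, float]:
        """Return bytes/second per group since the last snapshot, then reset."""
        elapsed = now - self.window_start
        rates = {g: b / elapsed for g, b in self.bytes.items()} if elapsed > 0 else {}
        self.bytes = defaultdict(int)
        self.window_start = now
        return rates


stats = TrafficStats(start=0.0)
stats.record("L1", 1000)
stats.record("L1", 500)
stats.record("L2", 200)
rates = stats.snapshot(now=2.0)
print(rates)  # {'L1': 750.0, 'L2': 100.0}
```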


MCMD Library Layer

• Overloads the IPMC functions
  o setsockopt(), send(), etc.

• Translates logical IPMC addresses to physical IPMC, or point-to-point UDP packets depending on policy

• Caches translation maps
• Maintains a connection to MCMD for updates
