Progress on SDN, OVS and dynamic circuits
Ramiro Voicu, California Institute of Technology
LHCOPN-LHCONE meeting, Amsterdam, October 28-29, 2015
LHC Collisions at 13 TeV (image source: CMS)
Complex Workflow: the Flow Patterns Have Increased in Scale and Complexity, even at the start of LHC Run 2
[Figure: WLCG Dashboard transfer throughput for CMS, ATLAS, ALICE and LHCb, June 2015 to October 2015; aggregate throughput grew from ~15 GB/s to ~20 GB/s]
PhEDEx and Dynamic Circuits
Using dynamic circuits in PhEDEx allows for more deterministic workflows, useful for co-scheduling CPU with data movement. Integrating circuit awareness into the FileDownload agent:
• Application is backend agnostic; no modifications to the PhEDEx DB
• All control logic is in the FileDownload agent
• Transparent for all other PhEDEx instances
[Figure: PhEDEx throughput on a shared path (with 5 Gbps of UDP cross traffic) and on a high-speed WAN circuit, between T2_ANSE_Amsterdam (sandy01-ams, wood1-ams) and T2_ANSE_Geneva (sandy01-gva, hermes2); seamless switchover between the paths with no interruption of service]
[Figure: PhEDEx rate (MB/s) on a dedicated path, 1h moving average; FDT sustained rates ~1500 MB/s, average over 24 hrs ~1360 MB/s]
Integrating NSI Circuits with PhEDEx: Control Logic
1. PhEDEx requests a circuit between sites A and B and waits for confirmation
2. A wrapper gets a vector of source and destination IPs of all servers involved in the transfer, via an SRM plugin
3. The wrapper passes this information to the OF controller
4. PhEDEx receives the confirmation of the circuit and informs the OF controller that a circuit has been established between the two sites
5. The OF controller adds routing information to the OF switches that directs all traffic on the subnet to the circuit (see the sketch below)
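As an illustration of step 5, a minimal sketch of the kind of flow rule the controller could install, assuming an OVS-based switch named br0 with the circuit attached to port 2 and a hypothetical transfer subnet 10.0.2.0/24:

  # Steer all traffic destined for the transfer subnet out the circuit-facing port.
  ovs-ofctl add-flow br0 "priority=100,ip,nw_dst=10.0.2.0/24,actions=output:2"

  # Verify the installed flow and its packet/byte counters.
  ovs-ofctl dump-flows br0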
Advanced Network Services for Experiments (ANSE) Integrating PhEDEx with Dynamic Circuits for CMS
• Building on the AutoGOLE fabric of Open Exchange Points and NSI standard virtual circuits; OSCARS + NSI circuits are used to create WAN paths with reserved bandwidth across the AutoGOLE fabric
• OF flow-matching is done on specific subnets to route only the desired traffic
• OpenFlow can also be used to select paths outside the fabric
Lapadatescu, Wildish, Mughal, Bunn, Legrand, Newman
ANSE: 1st real life application integration of NSI and the AutoGOLE fabric with PhEDEx for CMS
Open vSwitch (OVS)
• "Open vSwitch is a production quality, multilayer virtual switch"
• OpenFlow protocol support (1.3)
• Kernel and user space forwarding engines
• Runs on all major Linux distributions used in HEP
• NIC bonding
• Fine-grained QoS support: ingress qdisc, HFSC, HTB
• Used in a variety of hardware platforms (e.g. Pica8) and software appliances like Mininet
• Interoperates with OpenStack
• OVN (Open Virtual Network): virtualized network "implemented on top of a tunnel-based (VXLAN, NVGRE, IPsec, etc.) overlay network"
Using OVS for end-host orchestration
Integrating PhEDEx with Dynamic Circuits for CMS
Standard OpenFlow (or OVSDB) protocol for end-host network orchestration (no need for a custom SB protocol). Simple procedure to migrate to OVS on the end-host (see the sketch below); an SDN controller is not required in the initial deployment phase. Host type (storage, compute) is dynamically discovered using the OF identification string.
Use an SDN controller to create an overlay network from the circuit end-point (Border Router) to the storage.
• OVS 2.3.1 with stock RH 6.x kernel
• OVS bridged interface achieved the same performance as hardware (10 Gbps)
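A minimal sketch of that migration procedure, assuming a hypothetical host whose 10G NIC is eth0 with address 10.0.1.5/24; the NIC is enslaved to an OVS bridge and the host's IP moves to the bridge:

  # Create an OVS bridge and attach the physical NIC to it.
  ovs-vsctl add-br br0
  ovs-vsctl add-port br0 eth0

  # Move the host's IP address from the NIC to the bridge interface.
  ip addr del 10.0.1.5/24 dev eth0
  ip addr add 10.0.1.5/24 dev br0
  ip link set br0 up

With no controller configured, the bridge behaves as a normal learning switch, which is why an SDN controller is optional in the initial deployment phase.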
SDN QoS: Traffic Shaping with Open vSwitch (OVS)
• Egress rate-limit (see the configuration sketch below)
• Based on the Linux kernel:
  - HTB (Hierarchical Token Bucket)
  - HFSC (Hierarchical Fair-Service Curve)
• OVS 2.4 with stock kernel
• NSI circuit Caltech -> UMICH (~60 ms)
• Very stable up to 7.5 Gbps; fairly good shaping above 8 Gbps (small instabilities)
Traffic Shaping with Open vSwitch (OVS): WAN tests over NSI
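A minimal sketch of configuring an OVS egress shaper of the kind used in these tests, assuming a hypothetical bridge port eth0 and a 7.5 Gbps cap:

  # Attach an HTB-based QoS policy capping egress on port eth0 at 7.5 Gbps.
  ovs-vsctl set port eth0 qos=@newqos -- \
    --id=@newqos create qos type=linux-htb other-config:max-rate=7500000000 \
    queues:0=@q0 -- \
    --id=@q0 create queue other-config:max-rate=7500000000

  # For HFSC instead of HTB, use type=linux-hfsc with the same structure.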
OVS benefits
• Standard OpenFlow (and/or OVSDB) end-host orchestration
• QoS SDN orchestration in non-OpenFlow clusters
• OVS works with the stock SL/CentOS/RH 6.x kernel used in HEP; works out-of-the-box on SL7/CC7
• OVS bridged interface achieved the same performance as the hardware (10 Gbps)
• No CPU overhead when OVS does traffic shaping on the physical port
• Traffic shaping (egress) of outgoing flows may help performance in cases where the upstream switch (or ToR) has smaller buffers
https://indico.cern.ch/event/376098/contribution/24/material/slides/1.pdf
ODL controlling OVS via OVSDB and OF
[Diagram: ODL controller managing OVS instances via OVSDB and OpenFlow]
• OVSDB: Open vSwitch Database Management Protocol (RFC 7047)
• Supported as a SB protocol in major SDN controllers
• Used to create the virtual bridges
• Virtual bridges can use standard OF to speak with the controller
• Normal routing if the controller is down (see the sketch below)
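A minimal sketch of pointing an OVS instance at such a controller, assuming a hypothetical ODL host at 10.0.0.100 and the default OVSDB (6640) and OpenFlow (6633) ports:

  # Register the ODL instance as the OVSDB manager for this host.
  ovs-vsctl set-manager tcp:10.0.0.100:6640

  # Point the bridge at the controller over standard OpenFlow.
  ovs-vsctl set-controller br0 tcp:10.0.0.100:6633

  # Fall back to normal (learning-switch) forwarding if the controller is unreachable.
  ovs-vsctl set bridge br0 fail_mode=standalone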
OpenFlow topology discovery with non-OpenFlow islands
[Diagram: ODL controller; OVS instances on two hosts, connected through a non-OF switch]
• Same topology used for the QoS tests
• OVS instances on the two hosts controlled by a standard ODL (BE)
• Non-OpenFlow switch in the middle
The topology seen by the controller is missing the link between the OVS instances.
OpenFlow topology discovery with non-OpenFlow islands
[Diagram: ODL controller; OVS instances on two hosts, connected through a non-OF switch]
• Use the same EtherType 0x88cc (LLDP)
• Instead of the bridge-filtered multicast MAC (01:80:C2:00:00:0E), use a normal multicast MAC address (01:23:00:00:00:01)
This mechanism is unofficially known as the OpenFlow Discovery Protocol (OFDP).
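Because the discovery frames use an ordinary multicast MAC, the non-OF switch forwards them like any other traffic, and they can be observed directly on the hosts; a minimal sketch, assuming the OVS-attached interface is eth0:

  # Capture OFDP-style discovery frames (LLDP EtherType, non-bridge-filtered MAC).
  tcpdump -e -i eth0 'ether dst 01:23:00:00:00:01 and ether proto 0x88cc'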
OpenFlow islands over WAN & NSI circuit
[Diagram: ODL @ Caltech controlling OF island 1, OF island 2 and OF island 3 (UMich), joined by an NSI circuit Caltech - UMICH, with DTN/FDT servers at Caltech and UMICH]
A. Mughal, I. Legrand
• Simplified topology management using a "single" controller
• The distributed SDN controller is a redundant setup for HA; the controllers share the same view of the network (support for redundancy is embedded in ODL and ONOS)
• OVS can be used to identify the flows right at the edges
• Can also be used for QoS right at the edges
• Standard (and widely supported) protocols and software components for control
• Seamless topology discovery even with non-OF devices and/or NSI (L2) circuits in the middle
Possible architecture with a "single" controller
[Diagram: sites interconnected by an NSI circuit under a single logical controller]
100G DTN TESTS
DTN: 100G network tests
Network topology: servers used in the tests: Sc-100G-1 and Sc-100G-2
• Two identical Haswell servers:
  - E5-2697 v3
  - X10DRi motherboard
  - 128 GB DDR4 RAM
  - Mellanox VPI NICs
  - Qlogic NICs
  - CentOS 7.1
• Dell Z9100 100GE switch
• 100G CR4 cables from Elpeus for the switch connections
• 100G CR4 cables from Mellanox for the back-to-back connections
100G TCP tests
• 1 TCP stream: stable 68 Gbps
• 2 TCP streams: stable 94 Gbps
• Line rate (100G) with 4+ TCP streams (see the variant sketched below)
Client # numactl --physcpubind=20 --localalloc java -jar fdt.jar -c 1.1.1.2 -nettest -P 1 -p 7000
Server # numactl --physcpubind=20 --localalloc java -jar fdt.jar -p 7000
CPU: Single Core - 100% utilization
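A hedged sketch of the 4-stream variant implied by the line-rate result above (FDT's -P flag sets the number of parallel TCP streams; the exact core binding is an assumption):

  # Client with 4 parallel streams; binding several cores avoids the
  # single-core saturation seen with one stream.
  numactl --physcpubind=20-23 --localalloc java -jar fdt.jar -c 1.1.1.2 -nettest -P 4 -p 7000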
Pacific Research Platform (DTN Test)
Disk to Disk Transfers (actual files on disk)
UCSD -> Caltech = avg 36 Gbps, peaks of 40 Gbps
UCLA -> Caltech = avg 35 Gbps, peaks of 36 Gbps
UCSC -> Caltech = avg 10 Gbps
Caltech: FDT server = 4x PCIe Gen3 NVMe drives
UCSD: FIONA = zpool (16x SSD drives, Intel 530, 120G)
UCLA: FIONA = zpool (16x SSD drives, Intel 530, 120G)
Several issues (mostly disk I/O related):
NVMe drives:
• Zpool performance tuning: poor performance compared to individual drives
• Software RAID: sync hangs, system crashes
Dual DTN Test – Disk performance
Reading from disk at both UCSD and UCLA; the disk pools at each site are zpools of 16 Intel 530 SSD drives.
On the receiver side at Caltech, 8 Intel D3700 NVMe drives were used (4 each for UCSD and UCLA).
Traffic peaked at 5.7 GB/s; at this rate with TCP, the CPU cores were fully utilized.
THANK YOU!
QUESTIONS?
EXTRA SLIDES
SDN Multipath OpenDaylight Demonstrations at Supercomputing 2014
• 100 Gbit links between Brocade and Extreme switches at the Caltech, iCAIR and Vanderbilt booths
• 40 Gbit links from many booth hosts to the switches
• Single ODL/Multipath controller operating in "reactive" mode:
  - For matching packets: the controller writes flow rules into the switches, with a variety of path selection strategies
  - Unmatched packets are "punted" to the controller by the switch (see the sketch below)
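The "punting" corresponds to a low-priority table-miss rule; a minimal illustration of such a rule on an OVS bridge (the demo itself used hardware switches, so br0 here is purely hypothetical):

  # Lowest-priority catch-all: send unmatched packets to the controller,
  # which then computes a path and writes specific flow rules back.
  ovs-ofctl add-flow br0 "priority=0,actions=CONTROLLER:65535"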
Demonstrated:
• Successful, high-speed flow path calculation, selection and writing
• OF switch support from vendors
• Resilience against changing network topologies (at layer 1 or 2)
• Monitoring and control
SC14 Demo: Caltech/iCAIR/Vanderbilt OF links
Focused Technical Workshop Demo 2015:SDN-Driven Multipath Circuits
- Hardened OESS and OSCARS installations at Caltech, UMich, AmLight
- Updated Dell switch firmware to operate stably with OpenFlow
- Dynamic circuit paths under SDN control
- Prelude to the ANSE architecture: SDN load-balanced, moderated flows
A. Mughal, J. Bezerra
Caltech, Michigan, FIU, ANSP and Rio, with Network Partners:Internet2, CENIC, Merit, FLR, AmLight, ANSP and Rio in Brazil