Evolution of OpenStack Networking at CERN
Nova Network, Neutron and SDN
Belmiro Moreira · @belmiromoreira · belmiro.moreira@cern.ch
Ricardo Rocha · @ahcorporto · ricardo.rocha@cern.ch
Fundamental Science
● Founded in 1954
● What is 96% of the universe made of?
● Why isn't there anti-matter in the universe?
● What was the state of matter just after the Big Bang?
Scalability & Flexibility
[ Diagram: a top-level cell routing to CELL 1 ... CELL N, each with Compute / GPU nodes, running either Nova Network or Neutron ]
● Per-cell capabilities: CPU Pinning, Huge Pages, SMP, GPU, ...
● Per-cell configuration: Neutron vs Nova Network, Allowed Projects, ...
● See also: Moving from CellsV1 to CellsV2 at CERN, Mon 21 11:35
[ Diagram: a cell with hypervisors (NODE 1, NODE 2, ...), each hosting virtual machines (V1, V2, V3, ..., VN) ]
● Order of ~10s of cells (currently 70), with ~200 hypervisors per cell
● Number of virtual machines per hypervisor varies per use case
  ○ From 4 to 30 VMs per hypervisor
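As a back-of-envelope check, the figures above imply the following fleet size (a quick sketch using only the numbers on this slide):

```python
# Back-of-envelope fleet size from the figures above:
# ~70 cells, ~200 hypervisors per cell, 4-30 VMs per hypervisor.
cells = 70
hypervisors_per_cell = 200

hypervisors = cells * hypervisors_per_cell  # total hypervisors
vms_low = 4 * hypervisors                   # lower bound on VM count
vms_high = 30 * hypervisors                 # upper bound on VM count

print(hypervisors, vms_low, vms_high)  # 14000 56000 420000
```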
[ Diagram: hypervisors on S513-V-IP123 137.1XX.43.0/24 (Primary Service), virtual machines on S513-V-VM908 188.1XX.191.0/24 (Secondary Service) ]
● Flat but segmented network, with multiple broadcast domains
  ○ Scalability
  ○ Segmentation done on Primary Services
● Primary Services can have multiple Secondaries
● No route if Secondary is in a different Primary
  ○ VM IP allocation must belong to the hypervisor's Primary
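The allocation rule above can be sketched in a few lines. The subnet values and the second service name below are illustrative (the slide anonymizes the real ranges):

```python
# Sketch of the routing constraint above: a VM IP is only usable if its
# Secondary subnet is attached to the hypervisor's Primary Service.
# Subnets and the "S513-V-IP999" service are made up for illustration.
import ipaddress

SECONDARIES = {
    "S513-V-IP123": [ipaddress.ip_network("188.184.191.0/24")],
    "S513-V-IP999": [ipaddress.ip_network("188.185.10.0/24")],  # hypothetical
}

def can_allocate(vm_ip, hypervisor_primary):
    # The VM's address must fall in a Secondary attached to this Primary.
    ip = ipaddress.ip_address(vm_ip)
    return any(ip in subnet for subnet in SECONDARIES[hypervisor_primary])

print(can_allocate("188.184.191.8", "S513-V-IP123"))  # True
print(can_allocate("188.184.191.8", "S513-V-IP999"))  # False: different Primary, no route
```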
LanDB
Source of Truth
● All devices must be present
● Used for different purposes
  ○ Security checks
  ○ DNS/DHCP configuration
  ○ Switch/router configuration
  ○ Active Directory, ...
[ Diagram: Primary Services and Secondary Services covering hypervisors and virtual machines — IPv4, IPv6, DNS, aliases, IPv6 readiness, ownership, ... ]
Phase 1. Nova Network
Phase 2. Neutron
Phase 3. SDN
Phase 1. Nova Network
● Custom NetworkManager
● Late IP allocation - after scheduling to compute nodes
● Patching done directly in the Nova code
● Nova Network is being deprecated...
  ○ Quantum is the new thing... Neutron is the new thing...
[ Diagram: NOVA COMPUTE updating both LanDB and the NOVA DB ]
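The idea behind the custom NetworkManager can be sketched as below. This is an illustration only, not CERN's actual Nova patch; all class and field names are hypothetical, and a stub stands in for the real LanDB call:

```python
# Illustrative sketch of "late IP allocation": pick the address only after
# scheduling, from the Primary subnet of the hypervisor the VM landed on,
# and register it in LanDB (the source of truth for DNS/DHCP/security).
import ipaddress

class Hypervisor:
    def __init__(self, name, primary_cidr):
        self.name = name
        self.primary = ipaddress.ip_network(primary_cidr)

class LanDB:
    """Stand-in for the real LanDB registration call."""
    def __init__(self):
        self.records = {}
    def register(self, hostname, ip):
        self.records[hostname] = ip

def allocate_late(instance_name, hypervisor, landb, used):
    # Scheduling already happened, so the broadcast domain is known;
    # pick the first free host address from the hypervisor's Primary.
    for ip in hypervisor.primary.hosts():
        if ip not in used:
            used.add(ip)
            landb.register(instance_name, ip)  # keep the source of truth updated
            return ip
    raise RuntimeError("subnet exhausted")

landb = LanDB()
hv = Hypervisor("p06253927y321a1", "137.138.43.0/24")
ip = allocate_late("vm-001", hv, landb, used=set())
print(ip)  # 137.138.43.1
```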
Phase 2. Neutron
● Linuxbridge, Flat / Provider networks
● Better integration using ML2, mechanism driver and extensions
  ○ Quickly became possible to have it out of tree
  ○ Our extensions have a similar role to Neutron Segments
● Gradual enroll, cell by cell
● Vanilla upstream packages for Neutron, much smaller patch on Nova
● More split pieces, potential points of failure
  ○ Periodic consistency checks
[ Diagram: NOVA COMPUTE ↔ Neutron call flow (steps 1-3), with LanDB updated in steps 4a/4b ]
https://gitlab.cern.ch/cloud-infrastructure/openstack-neutron-cern
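The shape of an out-of-tree ML2 mechanism driver can be sketched like this. Real drivers subclass MechanismDriver from neutron_lib.plugins.ml2 and receive PortContext objects; this stand-alone skeleton only mirrors the real hook names, and the dict-based context and LanDB bookkeeping are stand-ins:

```python
# Skeleton of an ML2 mechanism driver (hook names follow the real ML2
# driver interface; the context and registry below are simplified stand-ins
# to keep the example self-contained).

class CERNMechanismDriver:
    def initialize(self):
        # Called once at server start: a real driver would set up its
        # LanDB client here.
        self.registered = {}

    def create_port_precommit(self, context):
        # Runs inside the DB transaction: validate the port's IP against
        # the target host's allowed subnets before committing.
        pass

    def create_port_postcommit(self, context):
        # Runs after commit: push the new device to the external system
        # (LanDB) so DNS/DHCP and security checks pick it up.
        self.registered[context["port_id"]] = context["ip"]

    def delete_port_postcommit(self, context):
        self.registered.pop(context["port_id"], None)

driver = CERNMechanismDriver()
driver.initialize()
driver.create_port_postcommit({"port_id": "p1", "ip": "137.138.43.17"})
print(driver.registered)  # {'p1': '137.138.43.17'}
```

The pre/postcommit split is the key design point: precommit hooks can veto a change atomically, while postcommit hooks do the slow external calls after the transaction.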
Phase 2. Neutron
Subnet Cluster
Which subnets belong to this cluster?

neutron cluster-list
+-----+----------------------+-----------------------+
| id  | name                 | subnets               |
+-----+----------------------+-----------------------+
| ... | VMPOOL SXXXX-C-IPZZZ | ... 188.xxx.yy.zz/22  |
| ... | VMPOOL SBBBB-C-IPWWW | ... 137.aaa.bb.ccc/25 |
|     |                      | ... 137.bbb.cc.0/25   |
|     |                      | ... 137.bbb.dd.0/25   |
+-----+----------------------+-----------------------+
Phase 2. Neutron
Host Restrictions
Which subnets can I use for this hypervisor?

neutron host p06253927y321a1
+-------------------------+--------------------------------------+
| Field                   | Value                                |
+-------------------------+--------------------------------------+
| all_subnets             | 4ca09148-32b5-4da4-95f9-35e83e2e1984 |
| available_random_subnet | 4ca09148-32b5-4da4-95f9-35e83e2e1984 |
| available_subnets       | 4ca09148-32b5-4da4-95f9-35e83e2e1984 |
| least_available_subnet  | 4ca09148-32b5-4da4-95f9-35e83e2e1984 |
| most_available_subnet   | 4ca09148-32b5-4da4-95f9-35e83e2e1984 |
+-------------------------+--------------------------------------+
Phase 2. Neutron
● Single control plane, no partitioning (as with Nova cells)
Scaling RabbitMQ was (is) a challenge
3 Virtual Machines → 5x 64GB Virtual Machines
~ default rabbit configuration
~ default neutron configuration
~ looking ok(ish)
< 1000 Nodes
Phase 2. Neutron
● Single control plane, no partitioning (as with Nova cells)
Scaling RabbitMQ was (is) a challenge
Cluster crashes once, crashes constantly
    Cannot allocate 1318267840 bytes of memory (of type "heap").
Statistics db issues
→ collect_statistics_interval = 60000
Agents (too) aggressively trying to reconnect
→ rabbit_retry_backoff = 60
Agents not re-connecting properly
→ restart neutron servers
Scale up Rabbit nodes, larger VMs
1200 Nodes
Phase 2. Neutron
● Single control plane, no partitioning (as with Nova cells)
Scaling RabbitMQ was (is) a challenge
Cluster crashes periodically
    Lots of queued messages, until it goes
( neutron server )
→ rpc_thread_pool_size = 2048
→ rpc_conn_pool_size = 60
→ rpc_response_timeout = 120
→ rpc_workers = 4
( rabbit )
→ tcp_backlog: 4096
→ tcp_listen_options { reuseaddr: true, keepalive: true }
→ tcp_keepalive = true
→ rabbitmq_server_erl_args = '+K +A128 +P 1048576'
→ vm_memory_high_watermark = 0.8
→ ulimits (65536 for nofile/nproc soft and hard)
→ cluster_partition_handling = autoheal
2000 Nodes
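Collected as a configuration fragment, the Neutron-server side of the tuning above would look roughly like this (values are the ones from this slide; which section each option lives in varies by OpenStack release, so treat this as a sketch, not a drop-in file):

```ini
# neutron.conf fragment (illustrative; option sections vary by release)
[DEFAULT]
rpc_thread_pool_size = 2048
rpc_conn_pool_size = 60
rpc_response_timeout = 120
rpc_workers = 4
```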
Phase 2. Neutron
● Single control plane, no partitioning (as with Nova cells)
Scaling RabbitMQ was (is) a challenge
Cluster crashes less, but still happens
    Lots of queued messages, until it goes
( rabbit virtual machines )
→ ip link set <dev> txqueuelen 10000
( neutron agent )
→ report_interval = 43200
( neutron server )
→ agent_down_time = 86500
Other Considerations ( not done, not helpful )
→ increase rpc_state_report_workers
→ heartbeat timeouts on the rabbit cluster
2400 Nodes
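The relationship between the two settings above can be sanity-checked; the usual rule of thumb (an assumption here, not stated on the slide) is that the server's down-time threshold should be at least twice the agent report interval, or healthy agents get flagged dead between reports:

```python
# Sanity check on the heartbeat settings above: agents report state every
# report_interval seconds; the server marks an agent dead after
# agent_down_time seconds of silence. Values are the ones from this slide.
report_interval = 43200   # neutron agent, seconds (12 hours)
agent_down_time = 86500   # neutron server, seconds

ratio = agent_down_time / report_interval
# Rule of thumb (assumption): threshold >= 2x the report interval.
assert ratio >= 2, "healthy agents would be flagged dead between reports"
print(round(ratio, 2))  # 2.0
```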
Phase 2. Neutron
● Single control plane, no partitioning (as with Nova cells)
Scaling RabbitMQ was (is) a challenge
Stable cluster
→ 5x 64GB Virtual Machines
Occasional network partitions
→ recovering most times, but not always
→ procedure for a quick cluster rebuild (~10min downtime)
~5000 Nodes
Phase 2. Neutron
Migrating existing cells from Nova Network
● Puppet for reconfiguration
● Custom command for the live VM changes

$ openstack network cluster migrate --dry-run --host p06146676a327ab
$ openstack network cluster migrate --host p06146676a327ab
$ openstack network cluster migrate --cluster 'VMPOOL SXXXX-C-IPZZZ'

https://gitlab.cern.ch/cloud-infrastructure/python-neutronclient-cern

commands.extend([
    "brctl delif %s %s" % (NOVA_BRIDGE, raw_device),
    "ip link set %s down" % NOVA_BRIDGE,
    "ip link set %s name %s" % (NOVA_BRIDGE, CERN_NETWORK_BRIDGE),
    "brctl addif %s %s" % (CERN_NETWORK_BRIDGE, raw_device),
    "ip link set %s up" % CERN_NETWORK_BRIDGE,
    "ip route add default via %s dev %s" % (gw, CERN_NETWORK_BRIDGE),
])

for instance in instances:
    ip = instance.addresses['CERN_NETWORK'][0]
    mac = ip['OS-EXT-IPS-MAC:mac_addr']
    nova_tap = nova_interfaces[mac]
    neutron_tap = neutron_interfaces[mac]  # map each MAC to its new Neutron tap device
    commands.extend([
        "brctl delif %s %s" % (NOVA_BRIDGE, nova_tap),
        "ip link set %s name %s" % (nova_tap, neutron_tap),
    ])
Phase 3. SDN
● Current network deployment has significant limitations
● Limited IP Mobility
  ○ Segmented broadcast domains
  ○ Live migration limited to single cluster
  ○ Ad-hoc tunnels for hardware retirement campaigns
● Hardware Repurposing
  ○ Multiple network domains (General, Services, ...)
  ○ Services dedicated to a single domain
● No Floating IPs
● No Tenant/Private Networks
Phase 3. SDN
● Small prototype setups to evaluate functionality
                          Neutron/OpenVSwitch    OpenDaylight           OVN
DHCP                      Neutron                Neutron / Built-in     Built-in
Floating IPs              Yes                    Yes                    Yes
Distributed Routing       Only with DVR          Yes                    Yes
Tunneling Protocols       VXLAN / GRE / Geneve   VXLAN / GRE / Geneve   VXLAN / Geneve
Security Groups           IPTables               OpenFlow Native        OpenFlow Native + Logging
Load Balancing            Octavia                Octavia                Octavia / OVN Native
Acceleration              Limited DPDK           DPDK                   DPDK
Tracing                   tcpdump                tcpdump                ovn-trace
Physical Switch Integr.   L2 / L3                L2 / L3                L2 / L3
Phase 3. SDN
● In the end we picked OpenContrail / Tungsten
[ Diagram: Contrail architecture — OpenStack and the Contrail CONTROLLER; XMPP to vRouters on hypervisors; BGP to the WAN gateway; NETCONF/EVPN/OVSDB to physical devices; overlay tunnels via MPLSoUDP/GRE and VXLAN ]
● Controller components: Cassandra, Config, Analytics, ...
● https://github.com/Juniper/contrail-helm-deployer
● Neutron ML2 vs Monolithic
● Separate Region
Summary
● Scaling Neutron was not trivial, mostly due to the agents / RabbitMQ
  ○ Deployed in production and stable
● Currently finalizing the migration from Nova Network to Neutron
● Evaluated different SDN solutions
● Ongoing work deploying Tungsten in a new Region
● Looking forward to offering Floating IPs, Private Networks and much more
Questions?
Belmiro Moreira · belmiro.moreira@cern.ch · @belmiromoreira
Ricardo Rocha · ricardo.rocha@cern.ch · @ahcorporto