Neutron L3 Agent HA Or: How I Learned to Stop Worrying and Love the API Kevin Bringard // OpenStack Juno Summit // May 2014

Neutron L3 Agent HA
Or: How I Learned to Stop Worrying and Love the API

Kevin Bringard // OpenStack Juno Summit // May 2014

• There is no “one right way”
• The goal is to move L3 resources to a new L2 resource as quickly and seamlessly as possible
• This is a really difficult, but important, problem to solve

Layer 3
Internet Happens

[Diagram: three L3 agents, router1 through router6, VM1 through VM7, and a core router]

[Diagram: two L3 agents, router1 through router6, VM1 through VM7, and a core router]

Layer 2
The ARPing is the hardest part

• One L3 resource may only be tied to one L2 resource at a time
• Many technologies exist to sort of work around this
  • HSRP
  • VRRP
  • CARP
• Work is being done to implement VRRP-like functionality in Juno
  • https://blueprints.launchpad.net/neutron/+spec/l3-high-availability
• Nothing is currently integrated into OpenStack

Pacemaker
http://docs.openstack.org/high-availability-guide/content/_highly_available_neutron_l3_agent.html

• False positives caused more downtime than actual outages
• Split brain possibilities
• Assumes control of L3 agent start/stop functions
• Limited Horizontal Scale
  • More difficult to run multiple Active L3 agents
  • Failover requires stopping and starting entire services
• Active/Passive Model Requires More Hardware
• Works on a “per agent” level
  • Akin to RAID1

[Diagram: three L3 agents, router1 through router6, VM1 through VM7, and a core router]

[Diagram: two L3 agents, router1 through router6, VM1 through VM7, and a core router]

[Diagram: three L3 agents, router1 through router6, VM1 through VM7, and a core router]

Neutron HA Tool
https://raw.githubusercontent.com/stackforge/cookbook-openstack-network/master/files/default/neutron-ha-tool.py

• API Driven
  • Uses native API calls to perform all functions
  • Can be run externally from infrastructure or cross site
  • Supports any operation the neutron client libraries support
• Easily Extendable
  • Written in Python
  • Leverages standard OpenStack libraries
• Works on a “per resource” level
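
The "per resource" idea can be sketched with the neutron client library. This is a minimal illustration, not the logic of neutron-ha-tool.py itself: it assumes `client` is a `neutronclient.v2_0.client.Client` (or any object exposing the same `list_agents`, `list_routers_on_l3_agent`, `remove_router_from_l3_agent` and `add_router_to_l3_agent` calls). The function name and the least-loaded placement policy are hypothetical choices made for the example.

```python
def migrate_routers_from_dead_agents(client):
    """Move each router on a dead L3 agent to the least-loaded live agent.

    `client` is assumed to behave like neutronclient.v2_0.client.Client.
    Returns a list of (router_id, old_agent_id, new_agent_id) tuples.
    """
    agents = client.list_agents(agent_type="L3 agent")["agents"]
    live = [a for a in agents if a["alive"] and a["admin_state_up"]]
    dead = [a for a in agents if not a["alive"]]
    if not live:
        raise RuntimeError("no live L3 agents available to take over routers")

    # Count the routers on each live agent so placement stays balanced.
    load = {
        a["id"]: len(client.list_routers_on_l3_agent(a["id"])["routers"])
        for a in live
    }

    moved = []
    for agent in dead:
        for router in client.list_routers_on_l3_agent(agent["id"])["routers"]:
            target = min(load, key=load.get)  # illustrative policy only
            client.remove_router_from_l3_agent(agent["id"], router["id"])
            client.add_router_to_l3_agent(target, {"router_id": router["id"]})
            load[target] += 1
            moved.append((router["id"], agent["id"], target))
    return moved
```

Because each router is rescheduled individually, only the routers on the failed agent move; routers on healthy agents are never touched.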

[Diagram: three L3 agents, router1 through router6, VM1 through VM7, and a core router]

[Diagram: two L3 agents, router1 through router6, VM1 through VM7, and a core router]

[Diagram: two L3 agents with router1 through router6 rearranged across them, VM1 through VM7, and a core router]

• Only routers/IPs on the affected L3 agent are impacted

• Recovery time depends on the number of routers which need to be migrated and the number of IPs on each router

• Migration happens quickly, but every IP on the routers must re-ARP to the upstream switch

• Meta-data proxies migrate with the routers
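
To make the re-ARP step concrete, here is a minimal sketch of what a gratuitous ARP announcement looks like on the wire. It only builds the frame bytes using the standard ARP layout. Whether the L3 agent announces moved IPs with exactly this packet is an assumption, and the function name and example addresses are hypothetical.

```python
import socket
import struct


def gratuitous_arp_frame(mac, ip):
    """Build a broadcast Ethernet frame carrying a gratuitous ARP request.

    In a gratuitous ARP the sender and target protocol addresses are the
    same IP, which prompts neighbors to refresh their ARP/MAC tables.
    """
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    ip_bytes = socket.inet_aton(ip)
    broadcast = b"\xff" * 6

    # Ethernet header: destination, source, EtherType 0x0806 (ARP).
    ethernet = struct.pack("!6s6sH", broadcast, mac_bytes, 0x0806)
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,            # hardware type: Ethernet
        0x0800,       # protocol type: IPv4
        6,            # hardware address length
        4,            # protocol address length
        1,            # opcode: request
        mac_bytes,    # sender hardware address
        ip_bytes,     # sender protocol address
        b"\x00" * 6,  # target hardware address (unused here)
        ip_bytes,     # target protocol address, same as the sender's
    )
    return ethernet + arp
```

One such announcement is needed per IP, which is why recovery time grows with the number of IPs on each migrated router.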

OK, so what’s the catch?

• Not seamless
  • The ARP processes happen in parallel, but generally take 60-90 seconds for all IPs to complete
• Various *aaS offerings further complicate things
  • Currently only accounts for “l3-agent” controlled services
• No coordination between HA tools
  • How do you HA the HA?
• Currently not daemonized, runs from cron
  • Adds 60 seconds to total recovery time
  • Jitter protection further increases total recovery time
• No mechanism by which to ensure resources actually come up/work
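
The timing figures above can be combined into a rough recovery window. This is back-of-the-envelope arithmetic under stated assumptions, not a measurement: the cron interval is taken to be 60 seconds, the best case assumes the job fires immediately after the failure, and `jitter_delay` is a hypothetical parameter standing in for whatever extra wait jitter protection adds. The time Neutron needs to mark the agent as down is not included.

```python
def recovery_window(cron_interval=60, arp_range=(60, 90), jitter_delay=0):
    """Return (best, worst) total recovery time in seconds.

    best:  the cron job fires right after the agent fails.
    worst: the failure happens just after a run, so a full interval passes.
    jitter_delay is a hypothetical extra wait added by jitter protection.
    """
    arp_min, arp_max = arp_range
    best = jitter_delay + arp_min
    worst = cron_interval + jitter_delay + arp_max
    return best, worst


print(recovery_window())                 # (60, 150)
print(recovery_window(jitter_delay=30))  # (90, 180)
```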

What about DHCP?

• Multiple DHCP agents may be run Active/Active
  • DHCP agents per subnet may be specified in your agent config file
  • Each agent requires an IP in the tenant’s subnet
• DHCP requests are broadcast
  • All agents have the same lease file
  • The first one to reply binds to the VM
• Any DHCP agent may reply to a DNS request and resolve all known leases
• By default, each DHCP agent hands out a list of every agent as available resolvers
• HA tool has an option to replicate DHCP to all agents
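
Replicating DHCP to all agents can be sketched with the same client library. This is a minimal sketch rather than the HA tool's actual implementation: it assumes `client` behaves like `neutronclient.v2_0.client.Client`, using its `list_agents`, `list_networks`, `list_dhcp_agent_hosting_networks` and `add_network_to_dhcp_agent` calls. The function name is hypothetical.

```python
def replicate_dhcp_to_all_agents(client):
    """Schedule every network onto every live DHCP agent.

    `client` is assumed to behave like neutronclient.v2_0.client.Client.
    Returns a list of (network_id, agent_id) pairs that were added.
    """
    agents = client.list_agents(agent_type="DHCP agent")["agents"]
    live_ids = [a["id"] for a in agents if a["alive"] and a["admin_state_up"]]

    added = []
    for network in client.list_networks()["networks"]:
        hosting = client.list_dhcp_agent_hosting_networks(network["id"])["agents"]
        already = {a["id"] for a in hosting}
        for agent_id in live_ids:
            # Skip agents that already serve this network.
            if agent_id not in already:
                client.add_network_to_dhcp_agent(
                    agent_id, {"network_id": network["id"]}
                )
                added.append((network["id"], agent_id))
    return added
```

Since the agents run Active/Active, the sketch only ever adds bindings and never removes one.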

Moving Forward

• VRRP-like functionality
  • Specify number of Active L3 agents per subnet
  • Leverage conntrackd/keepalived
• Point of diminishing returns for HA tool?
• The beauty of open source:
  • There is no “one right way”
  • Think outside the box
  • Do cool things

Questions?