Considerations for Operating an OpenStack Cloud

Mark T. Voelker, Technical Leader @ Cisco

OpenStack ATC/StackForge Puppet Core/Foundation Member #54

All Things Open 2014

© 2010 Cisco and/or its affiliates. All rights reserved. Cisco Confidential

@marktvoelker

• Tech Lead at Cisco, StackForge Puppet core developer, OS Foundation Member #54

• Fact: can be bribed with doughnuts

• Currently works in Cisco’s Cloud & Virtualization Group

• In copious (hah!) spare time: OpenStack solutions, Big Data, Massively Scalable Data Centers, DevOps, making sawdust with extreme prejudice

• Tech lead, manager, software developer, architect

• Started in OpenStack in 2011 at the Diablo Design Summit

The great thing about my job is that I get to have fun exploring a lot of new things…

…and I get to help build a LOT of clouds.

Today’s talk won’t be overly formal….

…because I tend to get excited by this stuff.

…then you know how to get to Day 1.

Now let’s talk about getting to Day 30…

• Architecture

• Components

• High Availability

• Bare metal bring-up

• Config management

• CI/CD

• Packaging

• Automated test

• Monitoring

• Up/down alerting

• Trending data

• Logging and log search

High Availability? Sounds great--I’ll take two!

• Consider whether you want active/active or active/passive

• Setup and tooling differs a bit, but I generally like active/active

• Note that docs.openstack.org has an HA Guide

• A bit dated…patches welcome!

• Prioritize HA for the control plane

• That also means thinking about your database, network, and RPC bus

• Instance-level HA: there be dragons

• But yes, it’s being looked at

• Pets vs cattle

• Note: HA == more hardware

• Some components need at least 3 nodes
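The "at least 3 nodes" point follows from majority-quorum arithmetic, which clustered components such as Galera rely on to avoid split-brain. A minimal sketch of that arithmetic (the function names are illustrative and not taken from any OpenStack tool):

```python
def quorum_size(nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """How many nodes can be lost while a majority still remains."""
    return nodes - quorum_size(nodes)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(f"{n} nodes: quorum={quorum_size(n)}, "
              f"tolerated failures={tolerated_failures(n)}")
```

Two nodes tolerate no failures under a majority rule, which is why three is the usual minimum and why HA means more hardware.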

• Stuff OpenStack needs to run: message brokers

• Check out RabbitMQ clustering and mirrored queues

• Check out Galera for MySQL/MariaDB

• I usually see Percona XtraDB

• Frontend with an HAProxy/Keepalived pair

• Don’t do RabbitMQ clustering over a WAN

• Be aware of the SELECT…FOR UPDATE issue

• Long story short: Neutron and some parts of Nova use an SQL pattern known as “SELECT…FOR UPDATE”, which Galera doesn’t support due to issues with cross-node locking.

• This can cause deadlock-like symptoms.

• The Neutron/Nova code is being refactored to remove the pattern, but that work will likely not be done until at least Kilo.

• Meanwhile: use HAProxy to send writes to a single Galera node and you should be fine

• With the obvious scalability bottleneck

• More info here.

• Thank Jay Pipes & Peter Boros for the find!
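One way to sketch the single-writer workaround is an HAProxy `listen` section in which the first Galera node is active and the others are marked `backup`. The helper below just renders such a stanza as text. The node names, addresses, and virtual IP are made up for illustration, and real deployments usually add health-check and timeout options that are omitted here.

```python
from typing import Sequence, Tuple


def render_galera_listen(vip: str,
                         nodes: Sequence[Tuple[str, str]],
                         port: int = 3306) -> str:
    """Render an HAProxy stanza that sends traffic to one Galera node.

    The first node is the active writer; all others carry the `backup`
    keyword, so HAProxy only uses them when the active node is down.
    """
    if not nodes:
        raise ValueError("at least one Galera node is required")
    lines = [
        "listen galera",
        f"    bind {vip}:{port}",
        "    mode tcp",
    ]
    for index, (name, address) in enumerate(nodes):
        role = "" if index == 0 else " backup"
        lines.append(f"    server {name} {address}:{port} check{role}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical addresses from the documentation range 192.0.2.0/24.
    print(render_galera_listen(
        "192.0.2.10",
        [("db1", "192.0.2.11"), ("db2", "192.0.2.12"), ("db3", "192.0.2.13")],
    ))
```

Because only one backend takes traffic at a time, this trades write scalability for safety, which is the bottleneck noted above.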

• Use Swift, Ceph, or other highly available storage to back Glance

• Pick a highly available storage backend for Cinder too

• Use Keepalived/HAProxy to front-end multiple API servers

• Or another load balancer technology of your choice

• Can be deployed on dedicated nodes for scale, or co-located with other services

• Network: DVR vs Provider Network Extensions

• Distributed Virtual Routers are a new experimental feature in Juno (not yet ready for production)

• Please go test it and report/fix bugs!

• Provider networks essentially punt the availability issue to your physical network

• Allows you to use standard tools like virtual port channels and VRRP

• Also highly performant

We start with bare metal.

• For a cloud of any real size, you don’t want to be installing operating systems by hand

• Remember that baremetal bringup actually isn’t something that just happens once…often recurs for upgrades, capacity expansion, etc.

• Baremetal bringup tools can also have other uses, like inventory or bootstrapping configuration management agents.

• A simple (~15k lines of Python code) tool for managing baremetal deployments

• Flexible usage (API, CLI, GUI)

• Allows you to define systems (actual machines) and profiles (what you want to do with them)

• Provides hooks for Puppet so you can then do further automation once the OS is up and running

• Provides control for power (via IPMI or other means), DHCP/PXE (for netbooting machines), and more.

• Razor

  • Developed by EMC, managed by Puppet Labs (occasionally used with Chef too)

  • Initial release in 2012

  • Uses a “microkernel” loaded onto the machine to gather facts before provisioning

  • Tag + Policy model

• Crowbar

  • Originally written by Dell, now a community project

  • Originally designed to deploy OpenStack all the way from baremetal

  • Now deploys other stuff too (namely, Hadoop)

  • Uses Chef to handle everything after the OS install

• Foreman

  • Used by Red Hat among others

  • Does baremetal bringup and serves as a Puppet ENC

“Cloud isn’t just an infrastructure technology….it’s a new operations model. And with OpenStack in particular, it’s one that’s very well suited to a DevOps style of management. Many companies aren’t just adopting cloud, they’re changing how they operate.”

“Besides, logging into servers to mess with config files makes me sad.”

--That ranty guy in Raleigh again

• Remember, OpenStack is a set of interoperating distributed systems

• That means you’re going to have a lot of software to configure on a lot of machines

• You’re probably going to want to make changes over time

• You’re probably going to have more than one person touching your cloud

• CM tools help you treat configuration as code, so you can collaborate more easily
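To make "configuration as code" concrete: the desired state is declared as data that can be reviewed and versioned, and a tool works out what must change to get there. The toy sketch below shows that idea in plain Python. It is not how Puppet or Chef are implemented, and the setting names are only examples.

```python
def plan_changes(current: dict, desired: dict) -> dict:
    """Return only the settings that must change to reach the desired state."""
    return {
        key: value
        for key, value in desired.items()
        if key not in current or current[key] != value
    }


def converge(current: dict, desired: dict) -> dict:
    """Apply the plan and return the new state; a second run changes nothing."""
    updated = dict(current)
    updated.update(plan_changes(current, desired))
    return updated


if __name__ == "__main__":
    desired = {"debug": "False", "verbose": "True"}   # reviewed like code
    state = converge({"debug": "True"}, desired)
    print(state)
    print(plan_changes(state, desired))  # empty plan: already converged
```

The empty second plan is the property that matters when several people change the same cloud over time: re-running the tool is safe.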

Pile of Bash Scripts

• An increasingly common pattern:

• Puppet or Chef for configuration management, PLUS

• Ansible or Salt for cross-node orchestration

• Recommendation: use the tools that work for you!

• But remember: you don’t have to do it alone.

• Several CM tools have thriving collaborators in the OpenStack community

• Links for later:

• Puppet for OpenStack

• Chef for OpenStack

• Ansible for OpenStack

• SaltStack for OpenStack

• Pile of bash scripts for OpenStack

• Unit tests for your deployment code are a good idea

• ServerSpec tests to make sure your config management system did what it was supposed to are great
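ServerSpec itself is a Ruby/RSpec tool; the same idea can be sketched in Python as a post-run check that a rendered config file really holds the value the CM code was meant to set. The file contents, section, and option names below are invented samples, and the helper name is hypothetical.

```python
import configparser

# Invented sample of an INI-style service config as it might look after a CM run.
SAMPLE_CONF = """\
[DEFAULT]
debug = False

[database]
connection = mysql://nova@192.0.2.10/nova
"""


def setting_matches(config_text: str, section: str, option: str,
                    expected: str) -> bool:
    """True if the INI text has section/option set to the expected value."""
    # Interpolation is disabled so values containing '%' are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    return parser.get(section, option, fallback=None) == expected


if __name__ == "__main__":
    print(setting_matches(SAMPLE_CONF, "DEFAULT", "debug", "False"))
```

In practice the text would be read from the deployed file on the node, and a failed check would fail the pipeline rather than print a value.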

…well, haven’t you always wanted a butler?

• DevOps: actually pretty handy

• OpenStack change velocity (community’s and yours)

• Anecdote: the majority of deployments I work with have some customizations or backports from future releases

• It’s not just OpenStack, it’s all the underpinning components and your CM code too!

• OpenStack itself uses CI/CD tools in its development process…you should consider using them in your cloud buildout too!

• The OpenStack Infra team has created some awesome tools: JJB, Zuul, etc

• They’re all open source and you can even see how OpenStack’s own CI is set up (check out Elizabeth Joseph’s slides from yesterday for more!).

• The basics:

• An integration server (Jenkins, Go, Travis, etc)

• A code review and repository tool (Gerrit, Cgit, GitHub, etc)

• A battery of automated tests (lint checks, rspec-puppet, Tempest, Rally, etc)

• Some form of packaging (rpmbuild/mock, sbuild/pbuilder, etc)

• An artifact repository (Artifactory, yum/apt repos, etc)

• Optionally, some deployment jobs (usually powered by your CM tool)

• …you never intend to change the code yourself

• …building your own packages would violate a support contract with your distribution

• …you’ve never used a CI/CD pipeline before (but really: you should start learning)

• …you have a static environment that absolutely will not change, need to add capacity, etc.

• Now that you have a cloud, you’ll probably want to know that all its parts stay in good working order.

• I’ve worked on a lot of OpenStack clouds and almost everyone has their own preferred monitoring toolset.

• One possible exception: almost everybody seems to love Graphite.

• The golden rule is: use the tools that work for you!

• Very often this will be whatever you’re using in the rest of your infrastructure.

• Break it down into at least two buckets:

• Up/down and alerting (ex: Nagios or its derivatives…yes, there are OpenStack plugins out there on Nagios Exchange)

• Trending data collection/plotting (ex: collectd/statsd feeding Graphite)

• Also: use your peers!

• Check out Tong Li’s Monitoring as a Service talk later today!

• Operators are often willing to share, so ask on the openstack-operators list.
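For the trending bucket, it helps to know how little is needed to get a data point into Graphite: Carbon's plaintext protocol is one line per sample, `path value timestamp`, conventionally sent over TCP port 2003. The sketch below formats and sends such a line. The metric path and host are hypothetical, and collectd or statsd would normally do this for you.

```python
import socket
import time


def graphite_line(path: str, value: float, timestamp: int) -> str:
    """Format one sample in Carbon's plaintext protocol."""
    return f"{path} {value} {timestamp}\n"


def send_metric(host: str, path: str, value: float, port: int = 2003) -> None:
    """Send a single sample to a Carbon listener (assumed reachable at host:port)."""
    line = graphite_line(path, value, int(time.time()))
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(line.encode("ascii"))


if __name__ == "__main__":
    # Only formats the line; send_metric() needs a real Carbon endpoint.
    print(graphite_line("cloud.api.nova.latency_ms", 42, 1414000000), end="")
```

A dotted path such as `cloud.api.nova.latency_ms` becomes a folder hierarchy in the Graphite tree, so it is worth agreeing on a naming scheme early.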

• Distributed systems generate logs…all over the place.

• Finding the root of problems may mean correlating logs from different machines…but which?

• OpenStack in particular *can* be pretty verbose

• You may also be dealing with logs from other distributed tools in your cloud (RabbitMQ, databases, etc)

• Generally you want to get logs together, be able to search them, and be able to visualize them.
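One way to answer the "but which?" question is to correlate on request IDs: OpenStack services tag log lines with identifiers of the form `req-<uuid>`. The sketch below groups lines gathered from several hosts by that ID. The host names and log lines are invented, and real line formats depend on each service's logging configuration.

```python
import re
from collections import defaultdict

# Matches request IDs of the form req-<uuid>.
REQUEST_ID = re.compile(r"req-[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}")


def group_by_request(lines_by_host: dict) -> dict:
    """Map each request ID to the (host, line) pairs that mention it."""
    grouped = defaultdict(list)
    for host, lines in lines_by_host.items():
        for line in lines:
            # set() avoids double-counting a line that repeats the same ID.
            for request_id in set(REQUEST_ID.findall(line)):
                grouped[request_id].append((host, line))
    return dict(grouped)


if __name__ == "__main__":
    rid = "req-11111111-2222-3333-4444-555555555555"
    logs = {
        "controller1": [f"INFO nova.api [{rid}] POST /servers"],
        "compute7": [f"ERROR nova.compute [{rid}] spawn failed",
                     "INFO periodic task ran"],
    }
    for request_id, hits in group_by_request(logs).items():
        print(request_id, [host for host, _ in hits])
```

A central log store does the same join at scale; the point is that the request ID is the key that ties lines from different machines together.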

Unlike monitoring tools, there seems to be pretty broad consensus on good tools here in deployments I’ve worked with….

http://www.elasticsearch.org/blog/openstack-elastic-recheck-powered-elk-stack/

ELK stack: Logstash (collection), Elasticsearch (search/analytics), Kibana (visualization)

Questions? @marktvoelker

http://openstack.org/

http://cisco.com/go/openstack/

(yes, we’re hiring!)
