
Webinar: OpenStack Best Practices for Production


Page 1: Webinar: OpenStack Best Practices for Production

Sirish Raghuram Co-founder, CEO

Platform9

7 OpenStack Best Practices

Private Clouds Made Easy

Roopak Parikh Co-founder, VP Engineering

Platform9

Page 2: Webinar: OpenStack Best Practices for Production

© 2015 Platform9 Systems, Inc. Webinar: Best Practices for OpenStack in Production

Speaker Bio


Sirish Raghuram

• Co-founder, CEO at Platform9

• Previously: Staff Engineer at VMware (12 years)

• Technical and Management responsibility for multiple VMware products

Roopak Parikh

• Co-founder, VP Engineering at Platform9

• Previously: Staff Engineer at VMware (7 years)

• Architect for multiple VMware products

Page 3: Webinar: OpenStack Best Practices for Production

Preamble

• Best practices drawn from managing 50+ active OpenStack deployments
• Recommended for a technical audience looking to use OpenStack in production
• Assumes a fair knowledge of OpenStack

Page 4: Webinar: OpenStack Best Practices for Production

OpenStack Architecture

[Diagram labels: Clarity UI; CLI / Tools / Scripts; Heat (Orchestration); Keystone (Identity); Nova; Scheduler; Cinder; Neutron; Glance (Images); Compute; Basic Storage; Block Storage; Basic Network; Network Controller]

Page 5: Webinar: OpenStack Best Practices for Production

Platform9 Managed OpenStack

• Your servers host your data
• Platform9 hosts the OpenStack controller as a service, with an SLA
• No need to install, monitor, troubleshoot or upgrade OpenStack

Page 6: Webinar: OpenStack Best Practices for Production

#1: Instrument & Monitor

• Controller API logs
  • Nginx or Apache
• Controller services
  • /var/log/nova/*, /var/log/glance/*, /var/log/keystone…
• RabbitMQ
  • /var/log/rabbitmq
• Controller system health
  • CPU, memory, disk, network
  • File descriptors
  • Sockets
• Compute node logs (occasionally)
  • nova, glance, other services
  • Rarely, libvirt
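As a starting point for watching the service logs listed above, the sketch below tallies log lines per logger and severity. The line layout (date, time, process id, level, logger name, message) is an assumption and should be adjusted to whatever format your deployment actually emits; the sample lines and the `nova.example` / `keystone.example` logger names are synthetic, not real service output.

```python
import re
from collections import Counter

# Assumed line layout: "<date> <time> <pid> <LEVEL> <logger> <message>".
LOG_LINE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(?:\.\d+)? \d+ "
    r"(?P<level>[A-Z]+) (?P<logger>\S+) "
)


def tally_levels(lines):
    """Count log lines per (logger, level); lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            counts[(match["logger"], match["level"])] += 1
    return counts


# Synthetic sample lines (hypothetical logger names and messages).
sample = [
    "2015-05-01 10:00:00.123 4321 INFO nova.example [req-1] starting work",
    "2015-05-01 10:00:01.456 4321 ERROR nova.example [req-1] work failed",
    "2015-05-01 10:00:02.789 977 WARNING keystone.example [-] request rejected",
    "Traceback (most recent call last):",  # continuation line, skipped
]
counts = tally_levels(sample)
print(counts[("nova.example", "ERROR")])
```

A per-logger error count like this is easy to feed into whatever dashboard or alerting tool is already in place.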

Page 7: Webinar: OpenStack Best Practices for Production

Platform9 Log Telemetry

[Diagram labels: raw log (×4, …); Pre-process (filter); log storage, archival and search; Alert filters; alert mechanism; Alerts]
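The stages named in the telemetry diagram can be sketched as a small in-memory pipeline. This is only an illustration under assumptions: the ordering (filter, then store, then alert), the substring-based filters, and the `LogPipeline` class itself are hypothetical and are not a description of Platform9's actual system.

```python
from dataclasses import dataclass, field


@dataclass
class LogPipeline:
    drop_substrings: tuple   # pre-process filter: matching lines are discarded
    alert_substrings: tuple  # alert filter: matching stored lines raise an alert
    stored: list = field(default_factory=list)
    alerts: list = field(default_factory=list)

    def ingest(self, source, line):
        """Filter one raw log line, store it, and record an alert if it matches."""
        if any(s in line for s in self.drop_substrings):
            return
        self.stored.append((source, line))
        if any(s in line for s in self.alert_substrings):
            self.alerts.append((source, line))

    def search(self, term):
        """Return stored (source, line) pairs whose line contains the term."""
        return [entry for entry in self.stored if term in entry[1]]


pipeline = LogPipeline(drop_substrings=("DEBUG",), alert_substrings=("ERROR",))
pipeline.ingest("controller", "DEBUG polling")
pipeline.ingest("controller", "INFO request served")
pipeline.ingest("compute-1", "ERROR agent lost connection")
print(len(pipeline.stored), len(pipeline.alerts))
```

In a real deployment the storage and alert stages would be external services; keeping the filter rules as plain data makes them easy to review and automate.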

Page 8: Webinar: OpenStack Best Practices for Production

Takeaways

• 100% automation is key
• Alerts can be very noisy
• Future:
  • Sentry / Rollbar to easily discern problem areas by severity and priority
  • Migrate from Papertrail to ELK?
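One common way to tame noisy alerts is to suppress repeats of the same alert within a time window. The sketch below shows that idea only; the `AlertSuppressor` class, the alert keys, and the 300-second window are assumptions for illustration, and timestamps are passed in explicitly so the behaviour is easy to test.

```python
class AlertSuppressor:
    """Let an alert through only if the same key was not sent recently."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._last_sent = {}  # alert key -> timestamp of last delivered alert

    def should_send(self, key, now):
        last = self._last_sent.get(key)
        if last is not None and now - last < self.window_seconds:
            return False  # duplicate inside the window: suppress
        self._last_sent[key] = now
        return True


suppressor = AlertSuppressor(window_seconds=300)
decisions = [
    suppressor.should_send("queue-disconnect", now=0),
    suppressor.should_send("queue-disconnect", now=10),
    suppressor.should_send("disk-full", now=10),
    suppressor.should_send("queue-disconnect", now=301),
]
print(decisions)
```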

Page 9: Webinar: OpenStack Best Practices for Production

#2: High Availability Configuration

• Common points of failure
  • OpenStack Controller
    • Database
    • Python applications (Keystone, Nova, Glance, et al.)
    • RabbitMQ
  • Compute Nodes
    • Agent software uptime

Page 10: Webinar: OpenStack Best Practices for Production

Platform9 HA Architecture

[Diagram labels: Internet; Intranet; UI; Virtual IP; Load Balancer; OpenStack Controller (×3); Replicated DB; Compute Node (×4, …)]
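The role of the load balancer in front of several controllers can be sketched as round-robin selection that skips unhealthy nodes. This is a toy model, not Platform9's implementation: the `ControllerPool` class, the controller names, and the injected `is_healthy` callback are all hypothetical stand-ins for a real load balancer and its health checks.

```python
import itertools


class ControllerPool:
    """Round-robin over controllers, skipping any that fail the health check."""

    def __init__(self, controllers, is_healthy):
        self.controllers = list(controllers)
        self.is_healthy = is_healthy  # callable: controller name -> bool
        self._cycle = itertools.cycle(self.controllers)

    def pick(self):
        # Try each controller at most once per call.
        for _ in range(len(self.controllers)):
            candidate = next(self._cycle)
            if self.is_healthy(candidate):
                return candidate
        raise RuntimeError("no healthy controller available")


down = {"controller-2"}  # simulate one failed controller
pool = ControllerPool(
    ["controller-1", "controller-2", "controller-3"],
    is_healthy=lambda name: name not in down,
)
picks = [pool.pick() for _ in range(4)]
print(picks)
```

Requests keep flowing while one controller is down, which is the property the virtual IP and load balancer are there to provide.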

Page 11: Webinar: OpenStack Best Practices for Production

#3: Backup / Restore

• The SLA requires quick recovery from losing the controller
• Back up the controller DB
• Back up the controller state
• Automate the recipe to restore from backup
• Test the restore recipe
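Testing the restore recipe can be automated by restoring each backup into a scratch directory and comparing checksums with the source. The sketch below uses only the Python standard library; the directory layout, file names, and helper functions are hypothetical, and a real controller backup would also involve a proper database dump.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def checksum_tree(root):
    """SHA-256 over the relative paths and contents of all files under root."""
    digest = hashlib.sha256()
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            digest.update(path.relative_to(root).as_posix().encode())
            digest.update(path.read_bytes())
    return digest.hexdigest()


def backup(state_dir, archive_base):
    """Archive state_dir as <archive_base>.tar.gz and return the archive path."""
    return shutil.make_archive(str(archive_base), "gztar", root_dir=str(state_dir))


def restore_and_verify(archive_path, expected_checksum):
    """Unpack the archive into a scratch directory and compare checksums."""
    with tempfile.TemporaryDirectory() as scratch:
        shutil.unpack_archive(archive_path, scratch)
        return checksum_tree(scratch) == expected_checksum


with tempfile.TemporaryDirectory() as work:
    state = Path(work) / "controller_state"  # hypothetical state directory
    state.mkdir()
    (state / "db_dump.sql").write_text("-- placeholder dump\n")
    (state / "service.conf").write_text("[DEFAULT]\n")
    expected = checksum_tree(state)
    archive = backup(state, Path(work) / "backup-0001")
    restore_ok = restore_and_verify(archive, expected)
    mismatch_detected = not restore_and_verify(archive, "0" * 64)
print(restore_ok, mismatch_detected)
```

Running a check like this on a schedule turns "test the restore recipe" into something that fails loudly before the backup is actually needed.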

Page 12: Webinar: OpenStack Best Practices for Production

#4: Upgrade / Patch Rollout

• Automated mechanism to roll out
  • Controller upgrades
  • Compute node agent upgrades
• Plan for testing the upgrade before committing
• Roll back if required
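The rollout, test, and roll-back steps above can be sketched as a loop that upgrades one node at a time and reverts everything if a smoke test fails. The function name, node names, and the injected `apply_version` / `passes_smoke_test` callbacks are hypothetical; they stand in for whatever deployment tooling and checks are actually used.

```python
def rolling_upgrade(nodes, new_version, apply_version, passes_smoke_test):
    """Upgrade nodes one by one; on a failed smoke test, restore prior versions.

    nodes maps node name -> current version and is updated in place.
    Returns True if every node was upgraded, False if the rollout was reverted.
    """
    previous = {}
    for name in nodes:
        previous[name] = nodes[name]
        apply_version(name, new_version)
        nodes[name] = new_version
        if not passes_smoke_test(name):
            # Roll back every node touched so far, including the failing one.
            for done, old_version in previous.items():
                apply_version(done, old_version)
                nodes[done] = old_version
            return False
    return True


nodes = {"controller": "v1", "compute-1": "v1", "compute-2": "v1"}
actions = []  # record of (node, version) changes, for illustration
succeeded = rolling_upgrade(
    nodes,
    "v2",
    apply_version=lambda name, version: actions.append((name, version)),
    passes_smoke_test=lambda name: name != "compute-2",  # simulate one failure
)
print(succeeded, nodes)
```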

Page 13: Webinar: OpenStack Best Practices for Production

Platform9 Orchestration

[Diagram labels: Vanilla OS; Template Image V1; customer state; Customer Server V1; Fresh Install; Upgrade; Vanilla OS; Template Image V2; Customer Server V2]
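One possible reading of the diagram is that a customer server is always built from a versioned template image plus separately stored customer state, so that an upgrade means rebuilding from the new image and reattaching the same state. The sketch below models that reading only; it is an assumption about the diagram, and `TemplateImage`, `build_server`, and `upgrade_server` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemplateImage:
    version: str
    packages: tuple  # software baked into the image


def build_server(image, customer_state):
    """Fresh install: combine a template image with customer state."""
    return {
        "image_version": image.version,
        "packages": image.packages,
        "state": dict(customer_state),  # copied so the server owns its state
    }


def upgrade_server(server, new_image):
    """Upgrade: rebuild from the new image, carrying the state over unchanged."""
    return build_server(new_image, server["state"])


image_v1 = TemplateImage("V1", ("controller-v1",))
image_v2 = TemplateImage("V2", ("controller-v2",))
server_v1 = build_server(image_v1, {"tenants": 3})
server_v2 = upgrade_server(server_v1, image_v2)
print(server_v2["image_version"], server_v2["state"])
```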

Page 14: Webinar: OpenStack Best Practices for Production


Platform9: Havana to Juno Upgrade


Page 15: Webinar: OpenStack Best Practices for Production

#5: Workload Tiering

• Segregate underlying infrastructure for different classes of workloads (or users!)
  • By workload, hardware type, geography or organization
• Illustrations:
  • Test/Dev vs Production
  • Tier 1 vs Tier 2
  • SSD vs HDD

Page 16: Webinar: OpenStack Best Practices for Production

Intelligent Placement

[Diagram labels: DevOps; Private Cloud; Tier-1; Tier-2; Tier-1 Infra; Tier-2 Infra]
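The matching idea behind tiered placement can be sketched as tagging hosts and only placing a workload on hosts that carry every tag it requires. In OpenStack itself, host aggregates and availability zones are the usual building blocks for this kind of segregation; the code below calls no OpenStack API, and the host names and tags are hypothetical.

```python
def eligible_hosts(hosts, required_tags):
    """Return the sorted names of hosts that carry every required tag."""
    required = set(required_tags)
    return sorted(name for name, tags in hosts.items() if required <= set(tags))


# Hypothetical inventory: host name -> tags describing tier and hardware.
hosts = {
    "host-a": {"tier1", "ssd"},
    "host-b": {"tier2", "hdd"},
    "host-c": {"tier1", "hdd"},
}
tier1_hosts = eligible_hosts(hosts, {"tier1"})
tier1_ssd_hosts = eligible_hosts(hosts, {"tier1", "ssd"})
print(tier1_hosts, tier1_ssd_hosts)
```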

Page 17: Webinar: OpenStack Best Practices for Production

#6: Hardened Messaging Libs

• OpenStack controller and compute node software communicate over message queues
• Reliable message delivery is critical to OpenStack
• Issue
  • Once in ~2-5000 API requests, a compute node or controller node can lose its connection to the queue
  • Result: messages get stuck in the queue and are never delivered
  • Result: operations can stall, seemingly at random
• Resolution
  • oslo.messaging heartbeating applied Jan 2015
    • Ref: https://github.com/openstack/oslo.messaging/commit/b9e134d7e955b9180482d2f7c8844501c750adf6
    • Disabled in April: https://github.com/openstack/oslo.messaging/commit/287a4f56f45ed9cd40116a9e7b6e529f3382a925
  • Platform9 has its own heartbeat mechanism, which leverages the Platform9 web socket architecture
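The general idea of heartbeating is that each peer reports in periodically, and any peer that stays silent past a timeout is treated as disconnected so its connection can be re-established. The sketch below shows only that generic idea; it is neither the oslo.messaging implementation nor Platform9's mechanism, and the class name, peer names, and 60-second timeout are assumptions.

```python
class HeartbeatMonitor:
    """Track the last heartbeat per peer and report peers that went silent."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self._last_seen = {}  # peer name -> timestamp of last heartbeat

    def record(self, peer, now):
        self._last_seen[peer] = now

    def stale_peers(self, now):
        """Peers whose last heartbeat is older than the timeout."""
        return sorted(
            peer
            for peer, seen in self._last_seen.items()
            if now - seen > self.timeout_seconds
        )


monitor = HeartbeatMonitor(timeout_seconds=60)
monitor.record("compute-1", now=0)
monitor.record("compute-2", now=0)
monitor.record("compute-1", now=50)  # compute-2 stops reporting
stale = monitor.stale_peers(now=90)
print(stale)
```

A caller would typically reconnect or alert for each stale peer, which turns a silently stuck queue into a visible, recoverable event.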

Page 18: Webinar: OpenStack Best Practices for Production

#7: Troubleshooting / Debugging

• Issue #6 is an example of the kind of issue you will run into
• Be prepared to
  • Debug / diagnose
    • It took us ~7 man-days to debug issue #6 (worst-case example)
  • Roll out a patch
• Techniques
  • Separate webinar topic!

Page 19: Webinar: OpenStack Best Practices for Production

Recap

• Reviewed 7 best practices for running OpenStack successfully
• Share your own tips via the GTM chat panel!

Page 20: Webinar: OpenStack Best Practices for Production

Summary

• Production-grade OpenStack without the hard work
  • Request your own Platform9 account
• Related resources
  • Recorded webinars: OpenStack benefits for KVM / VMware
  • Upcoming webinar: Jun 7, 2015
• Have questions?
  • Ask away!
• Get in touch:
  • @Platform9Sys
  • [email protected]