Cinder Live Migration and Replication - OpenStack Summit Austin


© 2016 NetApp, Inc. All rights reserved. --- NETAPP CONFIDENTIAL ---1

WE’VE GOT ALL YOUR OPENSTACK STORAGE COVERED.


Ed Balduf - ed.balduf@solidfire.com, @madskier5
Alex Meade - alex.meade@netapp.com, @mralexmeade

Cinder: How Stuff Works
Live Migration and Replication

Live Migration


▪Block Migration
  ▪Disk must be copied between Compute nodes
▪Shared Storage
  ▪Compute nodes share instance storage
▪Volume-based
  ▪Instance information is stored on Cinder backend

Guest OS on VM has no indication it changed compute nodes

False documentation


Live Migration and Storage Compatibility

Migration Type Local Storage Cinder Volumes Shared Storage

Block Migration

Live Migration

BM w/ RO devices

LM w/ RO devices

The Config Drive

▪2 ways to inject configuration information into a VM
  ▪MetaData service
  ▪Config Drive
▪The Config Drive is a R/O storage device
  ▪See previous slide
  ▪Nova force_config_drive option may be used to force a config drive
    ▪Do not use this option.
    ▪Or use shared storage for the config drive
▪Users can specify one if they want


Live Migration Flow with Block storage

1. Pre-Migration
  ▪Check Memory, CPU and Disk resources
2. Reservation
  ▪Mount Disks as needed
  ▪Calls Cinder initialize_connection() again.
3. Pre-Copy
4. Stop and Copy
5. Commitment
6. Clean-up
  ▪Unmount disks as necessary.

See the great presentation from Vancouver: https://www.openstack.org/summit/vancouver-2015/summit-videos/presentation/dive-into-vm-live-migration

Nova & Cinder (diagram: hypervisors on Compute A and Compute B, connected to Storage over the storage protocol)

Pre-Migration
▪Storage
▪Compute A: VM A [Running]
▪Compute B

Reservation
▪Storage
▪Compute A: VM A [Running]
▪Compute B: VM A [Reserved]

Pre-Copy
▪Storage
▪Compute A: VM A [Running]
▪Compute B: VM A [Paused]
▪Copy Memory

Stop and Copy
▪Storage
▪Compute A: VM A [Paused]
▪Compute B: VM A [Paused]
▪Copy Dirty Memory and CPU state

**NOTE** The maximum time in this phase is set by the live_migration_downtime Nova config option, which defaults to 500 milliseconds.
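As a rough illustration of the note above, the following sketch reads the option from a nova.conf fragment and checks whether a final copy would fit inside the allowed pause. Placing the option in a [libvirt] section is an assumption, and fits_in_downtime() is a hypothetical helper using a deliberately simple model (dirty bytes divided by link bandwidth); it is not how Nova or libvirt actually decides.

```python
import configparser

# Assumed nova.conf fragment; the [libvirt] section placement is an assumption.
NOVA_CONF = """
[libvirt]
# Maximum pause during the stop-and-copy phase, in milliseconds.
live_migration_downtime = 500
"""

parser = configparser.ConfigParser()
parser.read_string(NOVA_CONF)

# Fall back to the default quoted on the slide if the option is missing.
downtime_ms = parser.getint("libvirt", "live_migration_downtime", fallback=500)


def fits_in_downtime(dirty_bytes, bandwidth_bytes_per_s, limit_ms):
    """Hypothetical check: can the remaining dirty memory be copied in time?"""
    transfer_ms = dirty_bytes / bandwidth_bytes_per_s * 1000
    return transfer_ms <= limit_ms


# 50 MiB of dirty memory over a link moving about 1.25 GB/s.
small_copy_ok = fits_in_downtime(50 * 1024**2, 1.25e9, downtime_ms)
# 2 GiB of dirty memory over the same link.
large_copy_ok = fits_in_downtime(2 * 1024**3, 1.25e9, downtime_ms)
```

Under this simplified model, a guest that dirties memory faster than the link can copy it never gets its remaining set small enough to fit the pause budget.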

Commitment
▪Storage
▪Compute A
▪Compute B: VM A [Running]

Clean Up
▪Storage
▪Compute A
▪Compute B: VM A [Running]
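The six phases and the VM states shown on the slides above can be collected into one small lookup table. This is only a sketch for reading the flow at a glance; the names Phase, VM_STATES, PHASE_NOTES and describe() are made up for illustration and are not Nova code.

```python
from enum import Enum


class Phase(Enum):
    PRE_MIGRATION = 1
    RESERVATION = 2
    PRE_COPY = 3
    STOP_AND_COPY = 4
    COMMITMENT = 5
    CLEAN_UP = 6


# (VM A state on Compute A, VM A state on Compute B), as drawn on the slides.
# None means the slide shows no VM on that node.
VM_STATES = {
    Phase.PRE_MIGRATION: ("Running", None),
    Phase.RESERVATION: ("Running", "Reserved"),
    Phase.PRE_COPY: ("Running", "Paused"),
    Phase.STOP_AND_COPY: ("Paused", "Paused"),
    Phase.COMMITMENT: (None, "Running"),
    Phase.CLEAN_UP: (None, "Running"),
}

# Work the slides attach to a phase; phases without a note are omitted.
PHASE_NOTES = {
    Phase.PRE_MIGRATION: "check memory, CPU and disk resources",
    Phase.RESERVATION: "mount disks as needed; Cinder initialize_connection() is called again",
    Phase.PRE_COPY: "copy memory",
    Phase.STOP_AND_COPY: "copy dirty memory and CPU state",
    Phase.CLEAN_UP: "unmount disks as necessary",
}


def describe(phase):
    """Return a one-line summary of a migration phase."""
    source, destination = VM_STATES[phase]
    note = PHASE_NOTES.get(phase)
    line = f"{phase.value}. {phase.name}: A={source}, B={destination}"
    return f"{line} ({note})" if note else line


for phase in Phase:
    print(describe(phase))
```

The storage-facing steps are confined to Reservation (attach on the destination) and Clean-up (detach on the source), which is why the backend has to tolerate the same volume being connected from two compute nodes for the duration of the migration.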

Demo

Gotchas

▪Error reporting is non-existent
  ▪If you have authentication wrong or the firewall doesn’t allow the libvirt port, then it silently fails.
  ▪Mitaka is better about doing upfront storage checks in the API.
▪Cinder User Messages (coming in Newton)
  ▪Ex: cinder message-show 07ce25a6-3af4-4f05-9169-bf540eea9e22


+------------------+--------------------------------------------------+
| Property         | Value                                            |
+------------------+--------------------------------------------------+
| created_at       | 2016-04-13T21:21:50.000000                       |
| event_id         | MULTIPLE_ATTACHMENT_ERROR                        |
| guaranteed_until | 2016-05-13T21:21:50.000000                       |
| id               | 07ce25a6-3af4-4f05-9169-bf540eea9e22             |
| message_level    | ERROR                                            |
| request_id       | req-03110a48-3769-419b-b40b-e200ddf2c378         |
| resource_type    | VOLUME                                           |
| resource_uuid    | 450a62fd-f809-4226-96a2-75593a4ad558             |
| user_message     | Could not map target LUN to multiple initiators. |
+------------------+--------------------------------------------------+
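Output like the message above is easier to act on once it is a dictionary. The sketch below parses a Property/Value table of that shape; parse_property_table() is a hypothetical helper written for this example, not part of the Cinder client, and the sample text is a shortened copy of the slide's output.

```python
def parse_property_table(text):
    """Parse a two-column '| Property | Value |' table into a dict."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip the +----+ border rows and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 2 or cells[0] == "Property":
            continue  # skip the header row and anything malformed
        props[cells[0]] = cells[1]
    return props


SAMPLE = """
+---------------+--------------------------------------------------+
| Property      | Value                                            |
+---------------+--------------------------------------------------+
| event_id      | MULTIPLE_ATTACHMENT_ERROR                        |
| message_level | ERROR                                            |
| resource_type | VOLUME                                           |
| user_message  | Could not map target LUN to multiple initiators. |
+---------------+--------------------------------------------------+
"""

message = parse_property_table(SAMPLE)
if message["message_level"] == "ERROR":
    print(f"{message['resource_type']}: {message['user_message']}")
```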

Live Migration Resources

▪Live Migration Configuration
  ▪Current OpenStack documentation is now fantastic at describing this:
    ▪http://docs.openstack.org/admin-guide/compute-configuring-migrations.html
▪Blogs:
  ▪Remy van Elst - Kilo Release - 6/13/2015
    ▪https://raymii.org/s/articles/Openstack_-_(Manually)_migrating_(KVM)_Nova_Compute_Virtual_Machines.html#Configure_(live)_migration
  ▪John Griffith - Juno Release - 12/8/2014
    ▪http://j-griffith.github.io/2014/12/08/openstack-live-migration-with-cinder-backed-instances/
  ▪Kimi Zhang - Grizzly Release - 8/26/2013
    ▪https://kimizhang.wordpress.com/2013/08/26/openstack-vm-live-migration/
  ▪Sébastien Han - Essex Release - 7/12/2012
    ▪http://www.sebastien-han.fr/blog/2012/07/12/openstack-block-migration/
▪Video:
  ▪https://www.openstack.org/summit/vancouver-2015/summit-videos/presentation/dive-into-vm-live-migration


Replication in Cinder.

Why are we up here again?
Replication in the cloud with multiple vendor backends is HARD!
We’re on design #4

Early designs - Vendor centric. No knowledge in the cloud or applications.
Official V1 - Juno. IBM only.
Official V2 - Liberty. No drivers released.
Official V2.1 (aka Cheesecake) - Mitaka mid-cycle

Game plan for Cheesecake
Simplified use case:

Disaster Recovery only. Admin disaster recovery only.
Fail everything which is replicated to the DR site.
Non-replicated volumes are ‘Offline’

Before Cinder learned about replication

Vendor specific volume type extra specs - indication of replication state of the backend

Examples:
▪SolidFire example from Essex (sf:replication:all-of-the-replication-infos)
  ▪mvip: IPaddr, api_port: portNum, login: loginToMvip, password: secretPassword
  ▪not in tree: https://github.com/j-griffith/nova/blob/essex-sf-replication/nova/volume/san.py
▪NetApp example (netapp_mirrored)

OpenStack is completely unaware. If failover occurs, the admin must re-configure OpenStack.


Use Case for Cheesecake!

Straightforward DR
Non-automated failover of replicated volumes.

When a disaster is declared, there is an API for the Cloud Administrator to call to cause failover.

DR storage system is not seen or managed in OpenStack until failover
Non-replicated volumes are “Offline”

There is no split decision. DR Storage unit becomes the backend.

No failback (your primary is on fire, remember!)
No concept of a managed secondary!

Terms this time around

Fail-over: Switch over to the secondary array.
  Volumes which are replicated will be there.
  Volumes not replicated will not be available.
  Attached volumes will need to be re-attached manually.

Freeze: Do not allow any resource create/delete actions.
  snapshot-create, xxx-delete, resize, retype, etc. should return an InvalidCommand error.
  I/O is still allowed, but this is an admin freeze.
  The idea is to keep things stable for recovery (if possible).

Unfreeze: Allow resource create/delete commands.
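The three terms above can be modelled in a few lines. This is a toy sketch of the semantics as described on the slide, not Cinder's implementation: the Backend class, its method names, and the exact action strings are hypothetical, chosen only to mirror "fail-over", "freeze", and "unfreeze".

```python
class InvalidCommand(Exception):
    """Raised when a frozen backend is asked to create, delete, or reshape a resource."""


class Backend:
    """Toy model of a replicated backend (hypothetical, for illustration only)."""

    def __init__(self, volumes):
        # volumes maps a volume name to True if it is replicated.
        self.volumes = dict(volumes)
        self.frozen = False
        self.failed_over = False

    def failover(self):
        """Switch to the secondary array; only replicated volumes survive."""
        self.failed_over = True

    def freeze(self):
        self.frozen = True

    def unfreeze(self):
        self.frozen = False

    def volume_available(self, name):
        """After fail-over, volumes that were not replicated are unavailable."""
        return self.volumes[name] or not self.failed_over

    def request(self, action):
        """Reject create/delete/resize/retype while frozen; allow everything else."""
        blocked = action.endswith(("-create", "-delete")) or action in {"resize", "retype"}
        if self.frozen and blocked:
            raise InvalidCommand(f"{action} is not allowed while the backend is frozen")
        return f"{action}: ok"


backend = Backend({"db-data": True, "scratch": False})
backend.failover()
backend.freeze()
```

With this model, `backend.volume_available("scratch")` is false after fail-over, and `backend.request("snapshot-create")` raises InvalidCommand until `backend.unfreeze()` is called, while an I/O-style request such as `backend.request("read")` still goes through.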

Old Terms (no longer used)

Terms: promote, reenable, enabled/disabled

Status: disabled, inactive, active, active-stopped, error

Tasks (Admin): replication enable, replication disable, replication failover, list replication targets

How it works and what it does:

The driver must report replication_enabled = True in its capabilities.

[solidfire-1]
volume_driver = cinder.volume.drivers.solidfire.SolidFireDriver
san_ip = 172.27.1.50
san_login = admin
san_password = solidfire
volume_backend_name = solidfire
sf_account_prefix = balduf-master
replication_device = backend_id:172.27.1.191,mvip:172.27.1.191,login:admin,password:admin

(Note: No trailing comma allowed in replication_device KV pair list)
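To see why a trailing comma is a problem, here is a minimal sketch of splitting a replication_device value into key:value pairs. It is not the parser Cinder uses; parse_replication_device() is a hypothetical helper, and requiring backend_id is an assumption based on the example above.

```python
def parse_replication_device(value):
    """Split 'k1:v1,k2:v2' into a dict, rejecting empty or malformed pairs."""
    pairs = {}
    for item in value.split(","):
        key, sep, val = item.partition(":")
        if not sep or not key.strip() or not val.strip():
            # A trailing comma produces an empty item, which lands here.
            raise ValueError(f"malformed key:value pair: {item!r}")
        pairs[key.strip()] = val.strip()
    if "backend_id" not in pairs:
        # Assumption: every target needs an identifier to fail over to.
        raise ValueError("replication_device needs a backend_id")
    return pairs


device = parse_replication_device(
    "backend_id:172.27.1.191,mvip:172.27.1.191,login:admin,password:admin"
)

try:
    parse_replication_device("backend_id:172.27.1.191,mvip:172.27.1.191,")
    trailing_comma_ok = True
except ValueError:
    trailing_comma_ok = False
```

The trailing comma yields an empty final item with no key and no value, so a strict parser has nothing sensible to do with it other than fail.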

Volume extra specs

Keywords:

replication : enabled/disabled

All others are vendor specific:

Example: HP type: sync/periodic

Drivers Supporting Replication

Available in Mitaka:
SolidFire (out of tree)
DellEMC
HP
Huawei
Storwize
IBM
Pure

In process, coming in Newton:
NetApp Data ONTAP & E-series.

Fail-back or lack thereof
▪If there really is a disaster and ‘A’ is burned to a crisp, there is no fail-back!
▪But how do we make ‘B’ the new master?
▪And someday buy ‘C’ and replicate to it?
▪Fix the database

$ mysql -u root
MariaDB [(none)]> use cinder
MariaDB [cinder]> select id,host,disabled,disabled_reason,replication_status,frozen,active_backend_id from services;
+----+----------------------------------------------+----------+-----------------+--------------------+--------+-------------------+
| id | host                                         | disabled | disabled_reason | replication_status | frozen | active_backend_id |
+----+----------------------------------------------+----------+-----------------+--------------------+--------+-------------------+
|  1 | devstack-master.pm.solidfire.net             |        0 | NULL            | not-capable        |      0 | NULL              |
|  2 | devstack-master.pm.solidfire.net             |        0 | NULL            | not-capable        |      0 | NULL              |
|  3 | devstack-master.pm.solidfire.net@solidfire-1 |        1 | NULL            | failed-over        |      0 | 172.27.50.191     |
|  4 | devstack-master.pm.solidfire.net@lvmdriver-1 |        0 | NULL            | disabled           |      0 | NULL              |
+----+----------------------------------------------+----------+-----------------+--------------------+--------+-------------------+
4 rows in set (0.00 sec)

MariaDB [cinder]> update services set disabled=0,disabled_reason=NULL,replication_status='disabled',active_backend_id=NULL where id=3;
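The effect of that UPDATE can be tried safely against a throwaway copy of the row. The sketch below uses Python's built-in sqlite3 with an in-memory table instead of the real MariaDB cinder database, and only recreates the columns shown in the query above; the real services table has more columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE services (
           id INTEGER PRIMARY KEY,
           host TEXT,
           disabled INTEGER,
           disabled_reason TEXT,
           replication_status TEXT,
           frozen INTEGER,
           active_backend_id TEXT
       )"""
)

# The failed-over SolidFire service row from the slide.
conn.execute(
    "INSERT INTO services VALUES (?, ?, ?, ?, ?, ?, ?)",
    (3, "devstack-master.pm.solidfire.net@solidfire-1", 1, None,
     "failed-over", 0, "172.27.50.191"),
)

# Same statement as on the slide: re-enable the service and clear fail-over state.
conn.execute(
    """UPDATE services
          SET disabled = 0,
              disabled_reason = NULL,
              replication_status = 'disabled',
              active_backend_id = NULL
        WHERE id = ?""",
    (3,),
)

row = conn.execute(
    "SELECT disabled, replication_status, active_backend_id FROM services WHERE id = 3"
).fetchone()
conn.close()
```

After the update the row no longer records a fail-over, so the service looks like an ordinary, non-replicating backend again.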

▪Go to Page #1


Demo

Tiramisu (next) Newton

Design cycle here in Austin.
‘Goal’ is some control by the tenant

What if the tenant doesn’t want to wait for Admin?
What if the tenant has a disaster somewhere else in their application?

‘Goal’ to deal with vendor/tenant grouping constructs for replication
May become a separate effort
