
Disaster recovery of OpenStack Cinder using DRBD


By

Viswesuwara Nathan ([email protected]) & Bharath Krishna M ([email protected])

On May 25 2015


Contents

Introduction
DRBD (Distributed Replicated Block Devices)
    The DRBD operation
OpenStack Cinder
Configuring OpenStack Disaster recovery using DRBD in Ubuntu 14.04 LTS
    Configuration
    LVM over DRBD Configuration
    OpenStack Cinder Configuration
    Attaching an existing logical volume in the OpenStack target node


Introduction

Storage plays an important part in a cloud environment. We want it to be fast, network-accessible and as reliable as possible. One way is to buy a SAN solution from a prominent vendor for a considerable sum of money. Another way is to take commodity hardware and use open source magic to turn it into distributed network storage. Guess what we did?

We have several primary goals ahead. First, our storage has to be reliable; we want to survive both minor and major hardware crashes, from HDD failure to host power loss. Second, it must be flexible enough to slice quickly and easily and to resize those slices as we like. Third, we will manage and mount our storage from cloud nodes over the network. And, last but not least, we want decent performance from it.

DRBD (Distributed Replicated Block Devices)

For now, we have decided on the DRBD driver for our storage. DRBD® refers to block devices designed as a building block for high availability (HA) clusters. This is done by mirroring a whole block device over a dedicated network; DRBD can be understood as network-based RAID-1. It has lots of features, has been well tested and is reasonably stable.

DRBD has been supported by the Linux kernel since version 2.6.33. It is implemented as a kernel module and is included in the mainline kernel. The DRBD driver and its command line tools can be installed using the standard package distribution mechanism.

The DRBD software is free software released under the terms of the GNU General Public License version 2.

The DRBD operation

Now, let's look at the basic operation of DRBD. Figure 1 provides an overview of DRBD in the context of two independent servers that provide independent storage resources. One of the servers is commonly defined as the primary and the other as the secondary. Users access the DRBD block devices as a traditional local block device or as a storage area network or network-attached storage solution. The DRBD software provides synchronization between the primary and secondary servers for user-based Read and Write operations as well as for other synchronization operations.


Figure 1: Basic DRBD model of operation

In the active/passive model, the primary node is used for Read and Write operations for all users. The secondary node is promoted to primary if the clustering solution detects that the primary node is down. Write operations occur through the primary node and are performed to the local storage and secondary storage simultaneously (see Figure 2). DRBD supports two modes for Write operations called fully synchronous and asynchronous.

In fully synchronous mode, Write operations must be safely committed to both nodes' storage before the Write transaction is acknowledged to the writer. In asynchronous mode, the Write transaction is acknowledged after the write data are stored on the local node's storage; the replication of the data to the peer node occurs in the background. Asynchronous mode is less safe, because a window exists for a failure to occur before data is replicated, but it is faster than fully synchronous mode, which is the safest mode for data protection. Although fully synchronous mode is recommended, asynchronous mode is useful in situations where replication occurs over longer distances (such as over a wide area network for geographic disaster recovery scenarios). Read operations are performed against local storage (unless the local disk has failed, at which point the secondary storage is accessed through the secondary node).
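
In the DRBD resource configuration, this choice is made with the 'protocol' keyword inside the resource definition. A minimal sketch of the relevant line (the full resource file used in this paper is shown later):

protocol C;   # fully synchronous replication; "protocol A;" selects asynchronous replication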


Figure 2: Read/Write operations with DRBD

DRBD can also support the active/active model, such that Read and Write operations can occur at both servers simultaneously in what's called the shared-disk mode. This mode relies on a shared-disk file system, such as the Global File System (GFS) or the Oracle Cluster File System version 2 (OCFS2), which includes distributed lock-management capabilities.

OpenStack Cinder

Cinder is the Block Storage Service for OpenStack. The logical volumes created on the production node's block storage (SAN, NAS or local hard disk) should be synced to the target node for OpenStack IaaS storage disaster recovery. DRBD helps sync the storage blocks created on the production node to the target storage node.

The logical volumes created on the production-side block storage node must be synced to the target storage node for effective management of RTO and RPO during the disaster recovery process.

From the OpenStack Juno release, Cinder provides a command line option for managing (manage/unmanage) existing volumes on the target node, and from Kilo this option is extended to the dashboard (OpenStack Horizon) as well.

Configuring OpenStack Disaster recovery using DRBD in Ubuntu 14.04 LTS

For the purpose of this white paper, I'll name the production and target storage nodes drbd-1 and drbd-2. The storage nodes must be able to resolve each other's hostnames, so we should either have DNS or enter the hostnames into /etc/hosts manually. Since DRBD can start before the DHCP client gets an IP address, you should set up both servers with static IPs.

hostname    IP address     partition for DRBD
drbd-1      192.168.0.1    /dev/sda3
drbd-2      192.168.0.2    /dev/sda3

/etc/hosts:

127.0.0.1     localhost
192.168.0.1   drbd-1
192.168.0.2   drbd-2
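
On Ubuntu 14.04 the static addresses can be set in /etc/network/interfaces. A minimal sketch for drbd-1, assuming the replication network is on interface eth1 with a /24 netmask (both are assumptions and depend on your hardware):

auto eth1
iface eth1 inet static
    address 192.168.0.1
    netmask 255.255.255.0

On drbd-2, use address 192.168.0.2 instead.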

Figure 3 shows the GParted utility screen with /dev/sda3 partitioned at 55.82 GiB and left unformatted. The partition size should be the same on both the drbd-1 and drbd-2 storage nodes.

Figure 3: GParted utility showing /dev/sda

From this point you should do everything as root (sudo -i).

Next, install the drbd8-utils package on both nodes.
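
On Ubuntu 14.04 this can be done with apt; a minimal sketch, assuming both nodes have access to the standard repositories:

# On both drbd-1 and drbd-2, as root
apt-get update
apt-get install drbd8-utils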

Configuration

DRBD needs a resource file, /etc/drbd.d/r0.res. This file must be identical on both servers and should look like the listing below. If the file doesn't exist, create it on both the primary and target storage nodes.

resource r0 {
  device /dev/drbd0;
  disk /dev/sda3;
  meta-disk internal;
  net {
    allow-two-primaries;
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;
    sndbuf-size 0;
  }
  startup {
    become-primary-on both;
  }
  on drbd-1 {
    address 192.168.0.1:7789;
  }
  on drbd-2 {
    address 192.168.0.2:7789;
  }
}

Configuration Walkthrough

We are creating a relatively simple configuration: one DRBD resource shared between two nodes. On each node, the back-end for the resource is the device /dev/sda3. The hosts are connected back-to-back by Ethernet interfaces with private addresses.

resource r0 {
  device /dev/drbd0;
  disk /dev/sda3;
  meta-disk internal;
  on drbd-1 {
    address 192.168.0.1:7789;
  }
  on drbd-2 {
    address 192.168.0.2:7789;
  }
}


As we need write access to the resource on both nodes, we must make it 'primary' on both nodes. A DRBD device in the primary role can be used unrestrictedly for read and write operations. This mode is called 'dual-primary' mode and requires additional configuration. In the 'startup' section, the 'become-primary-on' directive is set to 'both'. In the 'net' section, the following is recommended:

net {
  allow-two-primaries;
  after-sb-0pri discard-zero-changes;
  after-sb-1pri discard-secondary;
  after-sb-2pri disconnect;
  sndbuf-size 0;
}

The 'allow-two-primaries' directive allows both ends to send data. The next three 'after-sb-*' parameters define DRBD's automatic split-brain recovery policies. The 'sndbuf-size' is set to 0 to allow dynamic adjustment of the TCP send buffer size.

Enabling DRBD service

To initialize the DRBD metadata for the resource and make the device /dev/drbd0 available for later use, we use the drbdadm command:

# drbdadm create-md r0

We need to run this command on both storage nodes. /dev/sda3 must not be formatted with any filesystem at the time this command is run.
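
If the partition was used before and still carries an old filesystem signature, it can be cleared first. A hedged sketch (this step is destructive, so double-check the device name before running it):

# Only if /dev/sda3 still has an old filesystem signature; this wipes it
wipefs -a /dev/sda3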

Next, on both nodes, start the drbd daemon.

# sudo service drbd start

On drbd-1, the primary storage node, enter the following:

# sudo drbdadm -- --overwrite-data-of-peer primary all

After executing the above command, the data will start syncing to the secondary host. To watch the progress, enter the following on drbd-2:

# watch -n1 cat /proc/drbd

To stop watching the output press Ctrl+c.


With this we are done with the DRBD configuration. From now on, any changes made on the primary storage node will be synced to the target storage node.

LVM over DRBD Configuration

Let us now create the volume group on the primary storage node that will be made available to OpenStack Cinder.

Configuration of LVM over DRBD requires changes to /etc/lvm/lvm.conf. First, a physical volume is created:

# pvcreate /dev/drbd0

This command writes the LVM Physical Volume metadata on the drbd0 device and also on the underlying sda3 device. This can pose a problem, as LVM's default behavior is to scan all block devices for LVM PV signatures; two devices with the same UUID would then be detected and an error issued. This can be avoided by excluding the underlying device (/dev/sda3) from scanning in the /etc/lvm/lvm.conf file using the 'filter' parameter:

# filter = [ "r/sda3/", "a/drbd.*/" ]

The vgscan command must be executed after the file is changed. It forces LVM to discard its configuration cache and re-scan the devices for PV signatures. Different 'filter' configurations can be used, but they must ensure that: 1. DRBD devices used as PVs are accepted (included); 2. Corresponding lower-level devices are rejected (excluded).

It is also necessary to disable the LVM write cache:

# write_cache_state = 0
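
Putting the two lvm.conf settings together with the required rescan, the result would look roughly like this (a sketch; adjust the filter to your actual device names):

# /etc/lvm/lvm.conf (relevant settings)
filter = [ "r/sda3/", "a/drbd.*/" ]
write_cache_state = 0

# then force LVM to drop its cache and rescan for PV signatures
vgscan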

These steps must be repeated on the peer node. Now we can create a Volume Group using the configured PV /dev/drbd0; the Logical Volumes in this VG will later be created by Cinder. Execute this command on one of the nodes:

# vgcreate nova /dev/drbd0

To validate the configuration, you can run the following commands on the primary storage node.

root@drbd-1:~# vgdisplay
  --- Volume group ---
  VG Name               nova
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  3
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               0
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               55.82 GiB
  PE Size               4.00 MiB
  Total PE              14289
  Alloc PE / Size       5120 / 20.00 GiB
  Free  PE / Size       9169 / 35.82 GiB
  VG UUID               uqdzuK-NiRa-EGkr-GNQb-MsQp-7e5E-CG9jQa

root@drbd-1:~# pvdisplay
  --- Physical volume ---
  PV Name               /dev/drbd0
  VG Name               nova
  PV Size               55.82 GiB / not usable 1.22 MiB
  Allocatable           yes
  PE Size               4.00 MiB
  Total PE              14289
  Free PE               9169
  Allocated PE          5120
  PV UUID               N17uup-90oa-1WqO-dfDy-hm1K-z2ed-BT655B

The same data is replicated automatically to the target storage node; we can confirm this by running the same commands on the target storage node, drbd-2.

root@drbd-2:~# vgdisplay
  --- Volume group ---
  VG Name               nova
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  3
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               0
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               55.82 GiB
  PE Size               4.00 MiB
  Total PE              14289
  Alloc PE / Size       5120 / 20.00 GiB
  Free  PE / Size       9169 / 35.82 GiB
  VG UUID               uqdzuK-NiRa-EGkr-GNQb-MsQp-7e5E-CG9jQa

root@drbd-2:~# pvdisplay
  --- Physical volume ---
  PV Name               /dev/drbd0
  VG Name               nova
  PV Size               55.82 GiB / not usable 1.22 MiB
  Allocatable           yes
  PE Size               4.00 MiB
  Total PE              14289
  Free PE               9169
  Allocated PE          5120
  PV UUID               N17uup-90oa-1WqO-dfDy-hm1K-z2ed-BT655B

OpenStack Cinder Configuration

The OpenStack Cinder configuration file /etc/cinder/cinder.conf needs to be modified on both OpenStack nodes (production and target) to register the new volume group created on the DRBD device backed by /dev/sda3. We added a new volume backend named lvmdriver-2; lvmdriver-1 shown below is created by default as part of the devstack installation of OpenStack Cinder.

default_volume_type = lvmdriver-2
enabled_backends = lvmdriver-1,lvmdriver-2

[lvmdriver-2]
iscsi_helper = tgtadm
volume_group = nova
volume_driver = cinder.volume.drivers.lvm.LVMVolumeDriver
volume_backend_name = lvmdriver-2

After updating the /etc/cinder/cinder.conf file, we need to restart the Cinder services (c-vol, c-api and c-sch) to enable the lvmdriver-2 volume backend in OpenStack Cinder.
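
The names c-vol, c-api and c-sch are devstack screen sessions. On a packaged (non-devstack) Ubuntu installation, the equivalent restart would look roughly like this; the service names are an assumption based on the Ubuntu Cinder packages:

# Packaged installation (not devstack); run as root on the Cinder node
service cinder-api restart
service cinder-scheduler restart
service cinder-volume restart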

The changes made in the cinder.conf file also need a matching volume type in the OpenStack dashboard. As the admin user, create a new volume type and set its extra spec key "volume_backend_name" to the value "lvmdriver-2".
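
A hedged CLI equivalent of that dashboard step (the type name lvmdriver-2 is simply a convenient choice here):

# As an admin user
cinder type-create lvmdriver-2
cinder type-key lvmdriver-2 set volume_backend_name=lvmdriver-2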


Now, on the production node, we can create a logical volume on the lvmdriver-2 volume backend using the OpenStack dashboard. Any logical volumes created on the lvmdriver-2 backend will land on /dev/sda3 (through /dev/drbd0).

The logical volume created can then be attached to a powered-on virtual machine instance.
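
A hedged CLI equivalent of the create-and-attach steps (the volume name, the 20 GiB size and the instance/volume IDs below are placeholders):

cinder create --volume-type lvmdriver-2 --display-name dr-test-volume 20
nova volume-attach <instance-id> <volume-id>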

NOTE: To make use of a logical volume attached to a virtual machine instance, we need to create a filesystem on it before mounting it.
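
Inside the guest this would look roughly as follows; the device name /dev/vdb and the mount point are assumptions and depend on the hypervisor and instance:

# Inside the VM, assuming the volume shows up as /dev/vdb
mkfs.ext4 /dev/vdb
mkdir -p /mnt/data
mount /dev/vdb /mnt/data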

To view the logical volumes created on the storage node, run the following command on the storage node:

root@drbd-1:~# lvdisplay
  --- Logical volume ---
  LV Path                /dev/nova/volume-253adf38-2713-4097-a24a-8564099548c3
  LV Name                volume-253adf38-2713-4097-a24a-8564099548c3
  VG Name                nova
  LV UUID                KqNRCI-f3GW-CAyH-37wD-vOF6-ChFW-6liLsh
  LV Write Access        read/write
  LV Creation host, time drbd-1, 2015-05-15 15:26:20 +0530
  LV Status              available
  # open                 0
  LV Size                20.00 GiB
  Current LE             5120
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:0

With the help of DRBD, the logical volume created on the primary storage node is also created and synced on the target storage node.

root@drbd-2:~# lvdisplay
  --- Logical volume ---
  LV Path                /dev/nova/volume-253adf38-2713-4097-a24a-8564099548c3
  LV Name                volume-253adf38-2713-4097-a24a-8564099548c3
  VG Name                nova
  LV UUID                KqNRCI-f3GW-CAyH-37wD-vOF6-ChFW-6liLsh
  LV Write Access        read/write
  LV Creation host, time drbd-1, 2015-05-15 15:26:20 +0530
  LV Status              available
  # open                 0
  LV Size                20.00 GiB
  Current LE             5120
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:0


Attaching an existing logical volume in the OpenStack target node

To make use of the VG and LVs on the target node, we must first activate the volume group there:

# vgchange -a y nova

1 logical volume(s) in volume group "nova" now active

To make it available to OpenStack, we must configure the same backend driver (lvmdriver-2) on the target node, as we did in the primary OpenStack setup.

To add the existing volume available on the target storage node to OpenStack Cinder, we can either use the CLI option "cinder manage", available from the Juno release (Cinder client version 1.1.1), or the dashboard option (Horizon) available from the Kilo release.
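
A hedged example of the CLI route; the host@backend string depends on how the target Cinder node and backend are named in your deployment, and the source name is the LV shown by lvdisplay above:

cinder manage --id-type source-name --name recovered-volume \
    drbd-2@lvmdriver-2#lvmdriver-2 \
    volume-253adf38-2713-4097-a24a-8564099548c3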

On the OpenStack dashboard, use "Manage Volume" to add the existing logical volumes as shown in Figure 4. Use the "lvdisplay" CLI command to get the details about the logical volumes needed to fill in this form.


Figure 4: OpenStack dashboard for Cinder Manage volumes

After adding the existing volume through the OpenStack Cinder service, we can attach the logical volumes to the virtual machines running on the target node.