Cinder Cheesecake & iSCSI Secondary Devices

By Chad Morgenstern and Jenny Yang

Overview

In the blog post Cinder Cheesecake: Things to Consider, we wrote that we would be looking at various failure scenarios, scenarios that build on one another to give a clear view of what is happening in the environment. This blog focuses on scenario 1, namely:

  • Your environment encounters an issue where the storage platform has failed yet the Nova compute node has survived.  In this scenario, the failure does not affect the root volume. In our studies, we chose to use a persistent root volume, though the scenario is no different with an ephemeral root.

Now we show you how to perform failover and failback when using iSCSI Cinder backends. Specifically, this scenario entails recovery via cinder failover-host following the loss of an iSCSI backend while the root volumes remain accessible. More blogs covering other disaster recovery scenarios involving Cheesecake, the subject of this sub-series, are coming soon.

If this is the first blog post that you are reading in the series, please STOP and review the blog post Cinder Cheesecake: Things to Consider before proceeding forward.  All technologies have strengths and weaknesses to be considered before implementing; Cheesecake is no exception.

Before diving in, here is a summary of the steps detailed in the main body of the blog.  Consider it a checklist of sorts for when you do this yourself.  Please don’t be intimidated by the density of this blog. The process is simple:

Configure backends and get replication working from production (site A).

  1. Configure the production and disaster recovery (or site A and site B) backends in the cinder.conf file.
  2. Restart Cinder volume services: systemctl restart openstack-cinder-volume.service.
  3. Time passes, users create Cinder volumes, work happens…
  4. Trigger Cinder failover: cinder failover-host --backend_id <backend_name> <host@backend>.
  5. Stop the impacted Nova instances, detach and reattach the Cinder block devices, then start the Nova instances.
    Note: See the Python script found here. It is designed to detach and reattach Cinder block devices while preserving attachment order. The script is unsupported, so please feel free to modify it. Use is, of course, at your own risk.
  6. Re-enable the Cinder backend if you plan to add additional Cinder volumes while failed over.

Make the disaster recovery site (site B) the primary site and replicate back to site A.

  1. Shut down Cinder services: systemctl stop openstack-cinder*.
  2. Capture a database backup.
  3. Modify some rows in the cinder.volumes and cinder.services tables.
  4. Modify the enabled_backends stanza in /etc/cinder/cinder.conf to make the disaster recovery backend active instead of the production backend.
  5. Start up all Cinder services: systemctl start openstack-cinder-{api,scheduler,volume}.service.

Make production (site A) the primary site again and replicate back to site B.

  1. Shut down all Nova instances that are using “to be failed back” Cinder volumes.
  2. Manually perform a final SnapMirror update.
  3. Shut down Cinder services: systemctl stop openstack-cinder*.
  4. Capture a database backup.
  5. Modify some rows in the cinder.volumes and cinder.services tables.
  6. Modify the enabled_backends stanza in /etc/cinder/cinder.conf to make the production backend active instead of the disaster recovery backend.
  7. Start up all Cinder services: systemctl start openstack-cinder-{api,scheduler,volume}.service.
  8. Detach and reattach the Cinder block devices, then start the Nova instances.
    Note: See the Python script attached to this blog. It is designed to detach and reattach Cinder block devices while preserving attachment order. The script is unsupported, so please feel free to modify it. Use is, of course, at your own risk.

Getting production ready: preparing for DR

1) Modify the /etc/cinder/cinder.conf to enable replication

Our test environment includes both iSCSI and NFS backends. We set up an NFS backend to store the root devices for the Nova instances, as Cheesecake is being evaluated in this blog against the iSCSI backends only.  The NFS backend is known by the name loweropenstack1.  All references to backends from this point forward are to the iSCSI backends only.

Please notice in the below /etc/cinder/cinder.conf that replication is set up in each iSCSI backend stanza.  This is to simplify the replicate -> failover -> replicate -> failback process. Note that replication only occurs for the enabled backends.

To keep things simple, we:

  • Only activated one set of backends at a time, either disaster recovery or production.
  • Used the cluster admin user. You may use SVM users; please make sure to familiarize yourself with the documentation for your OpenStack release to ensure that the SVM user is defined with the appropriate level of permissions.
  • Established the cluster peer relationship in advance between the two ONTAP clusters.
  • Established the SVM peer relationships in advance for each set of backends.
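For reference, here is a minimal sketch of what the relevant parts of /etc/cinder/cinder.conf might look like. The cluster hostnames and credentials are placeholders, and the exact set of replication options depends on your driver release, so treat this as an illustration rather than a drop-in configuration:

  [DEFAULT]
  enabled_backends = loweropenstack1,iprod

  [iprod]
  volume_backend_name = iprod
  volume_driver = cinder.volume.drivers.netapp.common.NetAppDriver
  netapp_storage_family = ontap_cluster
  netapp_storage_protocol = iscsi
  netapp_server_hostname = cluster-a.example.com
  netapp_vserver = iprod
  netapp_login = admin
  netapp_password = <password>
  # Point replication at the other backend stanza; your driver release may
  # require additional replication options (for example, an aggregate map).
  replication_device = backend_id:idr

  [idr]
  volume_backend_name = idr
  volume_driver = cinder.volume.drivers.netapp.common.NetAppDriver
  netapp_storage_family = ontap_cluster
  netapp_storage_protocol = iscsi
  netapp_server_hostname = cluster-b.example.com
  netapp_vserver = idr
  netapp_login = admin
  netapp_password = <password>
  replication_device = backend_id:iprod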

2) Enable the Cinder backend and check that the backend is initialized

Please take note, it is possible for the volume services to have a status of ‘up’ even if the backend failed to properly initialize.  While it is important to view the services via cinder service-list --withreplication, please do not forget to inspect the /var/log/cinder/volume.log file for initialization errors as well.
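A quick sanity check along those lines might look like the following; the output is omitted here:

  # Confirm that the volume services are up and have replication enabled
  cinder service-list --withreplication

  # Confirm that the drivers actually initialized cleanly
  grep -iE 'error|failed' /var/log/cinder/volume.log | tail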

3) Ensure that SnapMirror has initialized and is in a healthy state
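From the destination cluster, an ONTAP command along these lines can confirm that the relationships are initialized and healthy; the exact fields available depend on your ONTAP release:

  cluster-b::> snapmirror show -fields state,status,healthy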

4) Remove resource limitation for the demo project

Modify the quotas for the demo project, setting instance and volume resources to unlimited.
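One way to do this with the OpenStack CLI, where a value of -1 means unlimited:

  openstack quota set --instances -1 --volumes -1 --gigabytes -1 demo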

5) Create Cinder volume types and map volume backend names to them via extra-specs

Use the command cinder get-capabilities to see the possible Cinder volume attributes.  This comes in handy when assigning the extra-specs to volume types too.
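As a sketch, the volume types used in this blog might be created and wired to their backends like this; the replication_enabled extra-spec is optional and shown here only as an assumption about how you may want the scheduler to treat replicated volumes:

  # Volume type for the NFS root volumes
  cinder type-create netapp_nfs
  cinder type-key netapp_nfs set volume_backend_name=loweropenstack1

  # Volume type for the replicated iSCSI data volumes
  cinder type-create netapp_iscsi
  cinder type-key netapp_iscsi set volume_backend_name=iprod
  cinder type-key netapp_iscsi set replication_enabled='<is> True'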

6) Create Cinder volumes on their respective backends:

Create the root Cinder volumes using the netapp_nfs volume type.

Create the data Cinder volumes using the netapp_iscsi volume type.
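For example, a single pair of volumes might be created as follows; the names, sizes, and image ID are illustrative:

  # Bootable root volume on the NFS backend, built from the CirrOS image
  cinder create --volume-type netapp_nfs --image-id <cirros-image-id> --name cirros-root-1 2

  # Data volume on the replicated iSCSI backend
  cinder create --volume-type netapp_iscsi --name prefailover-iscsi-1 1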

7) Create Nova instances and associate the Cinder volumes

Before creating the Nova instances, we need to create a security group (firewall) and keypair.

Create forty-nine Nova instances associated with the forty-nine CirrOS root volumes.

Connect the pre-failover Cinder volumes to Nova instances.

Review the attachments.
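Sketching those steps with the OpenStack CLI; the flavor, network, and resource names are placeholders, and older CLI releases may need --block-device-mapping rather than --volume:

  # One-time setup: keypair plus a security group allowing SSH and ping
  openstack keypair create demo-key > demo-key.pem
  openstack security group create demo-secgroup
  openstack security group rule create --protocol tcp --dst-port 22 demo-secgroup
  openstack security group rule create --protocol icmp demo-secgroup

  # Boot an instance from its persistent root volume
  openstack server create --flavor m1.tiny --volume cirros-root-1 \
      --key-name demo-key --security-group demo-secgroup --network private cirros-1

  # Attach the matching pre-failover data volume, then review the attachments
  openstack server add volume cirros-1 prefailover-iscsi-1
  openstack volume list --long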

8) Leave breadcrumbs in the Cinder volumes for later review

At every stage of this process Cinder volumes will be formatted and mounted, and a file will be written within each Nova instance. In effect, we are leaving breadcrumbs to prove that the process works at every point.

Connect to the Nova instance to prepare the block device for use.
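Inside each guest, the preparation and the first breadcrumb look something like this; the filesystem type, mount point, and file name are illustrative:

  # Use whichever mkfs your guest image provides
  sudo mkfs.ext3 /dev/vdb
  sudo mkdir -p /mnt/prefailover
  sudo mount /dev/vdb /mnt/prefailover
  # Breadcrumb: the file name encodes the instance name and the stage
  sudo touch /mnt/prefailover/$(hostname)-pre-failover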

Simulate a failure of the production backend

While a true disaster recovery scenario can happen at any moment, planned failovers afford the opportunity to avoid data loss.  In the event of an unplanned disaster recovery scenario, only perform step 3.

  1. (Planned failover only) Power off all instances accessing impacted volumes.
  2. (Planned failover only) With the instances powered off, issue a SnapMirror update from either NetApp’s OnCommand System Manager or via the snapmirror update command. In the event of a planned outage, steps 1 and 2 ensure zero data loss.
  3. Issue the cinder failover-host --backend_id <backend_name> <host@backend> command.

To simulate an actual failure scenario, we chose to disable iSCSI protocol access on the storage virtual machine backing the production backend. Towards that end, we disabled the production SVM.

1) Disable protocol access on the source storage cluster

2) Trigger failover of the production Cinder backend
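With the backend names used in this blog, and assuming a Cinder volume host named openstack9 (a placeholder), the failover command looks like this:

  cinder failover-host --backend_id idr openstack9@iprod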

3) Check the status of the Cinder services and ensure that production is failed over

If I/O is performed against the prefailover-iscsi-* block device attached to any of the Nova instances, an I/O error will be triggered. The cinder failover-host command has no immediate effect on the Nova instances, as they are still pointing to the production target (IQN and LUN). A look inside the Nova database will clear things up a bit.

The nova.block_device_mapping table shows that the Nova instances have not been instructed to disconnect from the production SVM (iprod) and reconnect to the disaster recovery SVM (idr).

The nova.block_device_mapping table maintains a listing of all block devices attached or formerly attached to each instance. We see, for example, that for the instance f7714ff4-c55e-45f6-bf40-a52f37988bb4, a device known to the database as /dev/vdb is presently attached (deleted = 0). The connection_info column shows us that Nova is tracking information about the connection behind /dev/vdb.
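The query used for this kind of inspection looks roughly like the following; it is run against the nova database, and the column names may vary slightly between Nova releases:

  SELECT device_name, deleted, connection_info
    FROM block_device_mapping
   WHERE instance_uuid = 'f7714ff4-c55e-45f6-bf40-a52f37988bb4'
     AND source_type = 'volume';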

The connection_info column of the block_device_mapping table is a JSON object. We need to focus specifically on the key:value pairs found within the data key. The data key is a hash storing key:value pairs describing the target, that is, the storage device backing /dev/vdb.

The target_iqn is the production storage array's iSCSI Qualified Name.

The target_portal is the IP address belonging to a logical interface on the production SVM.

4) Detach and reattach the Cinder volumes from Nova instances

Show the current state of the Cinder volumes from Cinder’s perspective.

Issue the request to detach the Cinder volumes from all of the Nova instances.

Below we see that the detachment request occurred successfully.

For academic purposes, let’s look once more inside the Nova database.

A similar query of the Nova database’s block_device_mapping table shows that the detach has no impact on information maintained in the connection info field. Notice however that the deleted field is now a positive integer, signifying that Nova is aware that the device has been detached. To make the output easier to read, we collapsed the connector hash.

Reattach the Cinder volumes to the Nova instances.

Let’s look once more inside of the Nova database to see what happens upon reattach.

This time the query returns a second entry backing /dev/vdb for the f7714ff4-c55e-45f6-bf40-a52f37988bb4 instance, in addition to the first one shown above (omitted below for clarity's sake). Please notice that this second entry's attachment information references the disaster recovery SVM's iSCSI IQN and LIF. This shows us that the connection information is only modified on attach.

Detach/reattach left the instances in a funky state, proving the necessity of rebooting the instances.

As we attached the block device without rebooting the instances, each instance sees the reattached device as /dev/vdc rather than /dev/vdb, which causes problems accessing the filesystems previously mounted from /dev/vdb. The following demonstrates the point.

At this point we reboot the instances to clean up the instances' device entries.

Check that access to the volumes has been restored by remounting, then leave breadcrumbs.

So far so good, breadcrumbs are in place.

5) Enable creation of Cinder volumes in DR

At this point we have confirmed that pre-existing Cinder volumes can be accessed from DR. We have yet to re-enable replication from disaster recovery to any other site, but this is coming up soon. First, let's test out the creation and attachment of additional Cinder volumes in DR.

Enable the disabled backend so that we can create a new Cinder volume.
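Re-enabling the failed-over backend's volume service is done with cinder service-enable; the host name is again a placeholder:

  cinder service-enable openstack9@iprod cinder-volume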

The status of the iprod backend is enabled.

6) Create post-failover Cinder volumes, then attach the volumes to their respective instances

7) Create more breadcrumbs, this time on the /dev/vdc device

Prepare idr for production needs and eventual failback

Though the disaster recovery site is serving in the place of production, it is not yet the source of replication. Additionally, you may eventually wish to fail back to the original production site. The steps in this section accomplish both needs.

Keep the following in mind:

  • Cheesecake only enables replication on active backends. At this point, the idr backend is only active by virtue of iprod being failed over. The steps in this section enable replication. Replication will occur between idr and the backend configured in /etc/cinder/cinder.conf.
  • When the iprod environment is brought back online (eventually), you may wish to fail back to iprod. This section walks through making that possible.

Before triggering the failback

1) Prepare for failback by enabling the idr backend and resyncing backwards from idr to iprod

Modify /etc/cinder/cinder.conf placing idr in the list of enabled_backends instead of iprod.
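In other words, the only change at this point is the enabled_backends line in the [DEFAULT] stanza, for example:

  [DEFAULT]
  # Before: enabled_backends = loweropenstack1,iprod
  enabled_backends = loweropenstack1,idr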

2) Shut down all Cinder services, as we are about to modify the Cinder database; this is non-disruptive to client I/O

3) Create breadcrumbs

Stopping the OpenStack control plane has no impact on data I/O: even with the openstack-cinder-* services stopped, client I/O continues. To prove that, we created breadcrumbs here.

Back up your database:

The instructions that follow will take you through modifications of the table(s) within the Cinder database. Modification of OpenStack database(s) is inherently dangerous; as such, it is imperative to preserve the database against the eventuality that something should go wrong. Numerous database engines can be used in OpenStack, so please consult the user's guide for your database flavor regarding database backups/dumps.
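In a MariaDB/MySQL-based deployment, for example, the backup can be as simple as the following; adjust the credentials and destination path to match your environment:

  mysqldump --single-transaction cinder > /root/cinder-backup-$(date +%F).sql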

4) Modify cinder.volumes and cinder.services tables using the following SQL code

Before making changes to the cinder database, we need to understand what the tables that we are about to change look like. @iprod will be replaced with @idr when done.

Replace iprod with idr in the host column of the volumes table. Post failover, all Cinder volumes on the iprod backend have been accessed via the idr backend. From this point forward, the Cinder volumes will be accessed directly from idr without reference to iprod.

The next set of commands removes the failed-over flags from all volumes and services, and sets the active backend field to NULL. Once this is done, two benefits are derived:

  • Cheesecake will be able to resync from idr to iprod. Otherwise, though Cheesecake will set up the SnapMirror relationship, it will not re-establish the transfers between the two sites.
  • iprod, when the time comes, will be able to actively serve data once again.

Before making the change, pay attention to the active_backend_id, disabled_reason, and replication_status fields.

NOTE: The above command will have no effect, as you already enabled the service. However, the command is included here just in case you had not done so previously.

Update the replication_status fields for affected volumes.
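Taken together, the changes described above amount to UPDATE statements along the lines of the following sketch. Column names and status values can differ between Cinder releases, so verify them against your own tables (and your backup) before running anything similar:

  USE cinder;

  -- Point the volumes that lived on iprod at the idr backend instead
  UPDATE volumes
     SET host = REPLACE(host, '@iprod', '@idr')
   WHERE host LIKE '%@iprod%';

  -- Clear the failed-over markers on the volume service(s)
  UPDATE services
     SET disabled = 0,
         disabled_reason = NULL,
         active_backend_id = NULL,
         replication_status = 'enabled'
   WHERE topic = 'cinder-volume';

  -- Reset the replication status on the affected volumes
  UPDATE volumes
     SET replication_status = 'enabled'
   WHERE replication_status = 'failed-over';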

5) Start up the openstack-cinder services

Check the status of the Cinder services as well as the replication status. The idr host no longer shows as failed-over, and idr is shown to be active. Just as cool, replication is now occurring between sites idr and iprod.

6) Create and attach one more set of Cinder volumes at the disaster recovery site

7) Once more, we leave breadcrumbs

Failback from idr to iprod

Failing back from the disaster recovery site to the production site is a planned event.  It is important that you shut down your Nova instances, manually initiate a final SnapMirror transfer, and then fail back.  This section details the process of failing back from the DR backend to the production backend.

Before triggering the failback

1) Enable the SVM at the production site (site A)

Recovering the production environment may be more complicated than enabling the production SVM.

2) Shut down the instances prior to failing back

Shutting down the instances prevents further I/O and ensures no data loss.

3) Confirm that the instances have shut down

4) Initiate a manual SnapMirror update

SnapMirror transfers have been occurring between idr and iprod, perhaps for some time. After shutting down the Nova instances, trigger a final SnapMirror transfer to pick up the last set of data.
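With the instances quiesced, the final transfer is triggered from the destination (production) cluster; the destination path shown here is a placeholder:

  cluster-a::> snapmirror update -destination-path iprod:<volume_name>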

5) Trigger the failback
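Because replication is now running from idr back to iprod, the failback is simply another failover-host call in the opposite direction; the host name is again a placeholder:

  cinder failover-host --backend_id iprod openstack9@idr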

Notice that idr is now listed as failed over to iprod.

Make the production site production again

1) Modify /etc/cinder/cinder.conf by placing iprod in the list of enabled_backends instead of idr

2) Shut down all Cinder services, as we are about to modify the Cinder database

3) Modify the Cinder database, replacing idr with iprod

NOTE: Get a MySQL database backup before performing this step.

4) Restart Cinder services

5) Re-enable the idr backend so it’s ready to go in case you want to fail over again in the future

6) Using automation, detach and reattach the Cinder volumes to the Nova instances

Reattaching (detaching and reattaching) multiple block devices from a Nova instance is more complicated than reattaching a single block device because of attachment order issues.  If you want to maintain the attachment order, the devices must all be detached from an instance before any are reattached.  Also, the devices must be reattached in the same order each time.

To simplify this process, I have written and attached to this blog a short Python script called recinder.py. The script is currently only intended for use with Nova instances that are booted from persistent volumes.  In its current form, it will only detach and reattach non-root disks.

How it works:

recinder.py makes an ordered list of all:

  • Cinder volumes attached to Nova instances
  • The Nova instance names
  • The current device names associated with the Cinder volumes according to Cinder. For example /dev/vdb, /dev/vdc, etc…

Using this ordered list, recinder.py detaches all non-root devices (anything other than /dev/vda) from all instances and then reattaches the Cinder volumes in the original order of device_name, i.e. starting at /dev/vdb, then /dev/vdc, etc… In this version, the root device is not considered, as root devices cannot be detached. Future versions may have the ability to destroy the Nova instances associated with persistent root volumes and then redeploy and reattach all associated Cinder volumes.
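The script itself is attached to the blog. As a rough illustration of the logic just described, here is a stripped-down sketch that shells out to the OpenStack CLI; the helper names are hypothetical, and it omits the error handling and status polling a real script needs:

  #!/usr/bin/env python
  # Illustrative sketch of the recinder.py logic, not the attached script itself.
  # Assumes the usual OS_* environment variables are set for the openstack CLI.
  import json
  import subprocess


  def openstack(*args):
      """Run an openstack CLI command and return its parsed JSON output."""
      out = subprocess.check_output(("openstack",) + args + ("-f", "json"))
      return json.loads(out)


  def attachments_by_instance():
      """Build {server_id: [(device, volume_id), ...]}, sorted by device name."""
      plan = {}
      for vol in openstack("volume", "list"):
          detail = openstack("volume", "show", vol["ID"])
          # 'attachments' mirrors the Cinder API field: a list of dicts carrying
          # the server_id and device of each attachment (format varies by release).
          for att in detail.get("attachments", []):
              if att["device"].endswith("vda"):   # leave the root disk alone
                  continue
              plan.setdefault(att["server_id"], []).append((att["device"], vol["ID"]))
      for devices in plan.values():
          devices.sort()                          # /dev/vdb before /dev/vdc, etc.
      return plan


  def reattach_all(plan):
      """Detach every non-root volume, then reattach in the original device order."""
      for server, devices in plan.items():
          for _, volume_id in devices:
              subprocess.check_call(
                  ["openstack", "server", "remove", "volume", server, volume_id])
      # A production script would wait for each volume to reach 'available' here.
      for server, devices in plan.items():
          for _, volume_id in devices:
              subprocess.check_call(
                  ["openstack", "server", "add", "volume", server, volume_id])


  if __name__ == "__main__":
      reattach_all(attachments_by_instance())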

7) Create and attach a final set of Cinder volumes

View all post-failback Cinder volumes. Let’s focus on the total attachment list for a couple of Nova instances.

8) Start up the Nova instances

9) Create more breadcrumbs

10) Check for breadcrumbs

We left breadcrumbs at every stage of the test. These breadcrumbs help us to confirm that the filesystems residing on the Cinder block devices attached to each Nova instance were usable at the end of every stage. Rather than clutter an already wordy blog post, we delayed showing the breadcrumbs until this final point.

Confirm that all filesystems exist, that they contain the correct number of files, and that the files are named as expected.

Picking a sample of instances to examine:

  • Notice that each instance shows four filesystems, one for each failover/failback stage.
  • Notice that the mapping of the filesystems matches what we expect:
    • Prefailover -> vdb
    • Postfailover -> vdc
    • Prefailback -> vdd
    • Postfailback -> vde
  • The files inside each filesystem are as follows, and each file name starts with the corresponding instance name:
    • Prefailover -> 3 files [*pre-failover,*post-recover-from-failover,*down-services]
    • Postfailover -> 2 files [*post-failover, *down-services]
    • Prefailback -> 1 file [*pre-failback]
    • Postfailback -> 1 file [*post-failback]

Summary

Failures and disasters happen, so plan for them and test your plans. We hope that this blog has helped you become more comfortable with this one mechanism for performing disaster recovery.  There are many more scenarios to be considered and reviewed. We covered only one in this blog, namely:

  • Recovery by cinder failover-host following the loss of an iSCSI backend while the root volumes remain accessible

The procedures in this blog have demonstrated that Cinder's built-in disaster recovery mechanism (supported with clustered Data ONTAP as of the Newton release) is a viable part of your disaster recovery repertoire. Together we have overcome the main obstacle presented by Cinder Cheesecake: a lack of native failback. Remember, no disaster recovery plan can be called such unless it not only can be tested, but is tested, and tested, and tested again.

Chad Morgenstern
Chad is a 10-year veteran at NetApp, having held positions in escalations, on reference architecture teams, and most recently on the workload performance team, where he contributed to significant understandings of AFF performance for areas such as VDI environments, the SPC-1 benchmark, and more. In addition, Chad has spent time building automated tools and frameworks for performance testing, a few of them based on open source technologies such as cloud and containers.

Chad is happily married, the father of four daughters, and soon to be the owner of two ferrets.
