By Chad Morgenstern and Jenny Yang
In the blog post "An Introduction to Cheesecake in OpenStack," we introduced Cinder Cheesecake replication, but we didn’t go into detail about how it works. In summary:
- Cinder automatically sets up a replication relationship between backends.
- If a primary storage backend fails, the OpenStack administrator issues the cinder failover-host command. This causes the Cinder driver to point away from the primary backend and toward the secondary backend.
- Cinder volumes are now ready and waiting on the secondary site for access by Nova.
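As a rough illustration of the second step, the failover command can be assembled in a script. The sketch below only builds the argument list for the cinder client and does not execute it. The host string and backend ID shown are hypothetical placeholders, and the --backend_id option is assumed to be available in your client version.

```python
from typing import List, Optional


def build_failover_command(host: str, backend_id: Optional[str] = None) -> List[str]:
    """Return the argv for `cinder failover-host` without executing it."""
    if not host:
        raise ValueError("host must be a non-empty Cinder host string")
    cmd = ["cinder", "failover-host"]
    if backend_id:
        # Assumption: the client accepts --backend_id to pick the target.
        cmd += ["--backend_id", backend_id]
    cmd.append(host)
    return cmd


if __name__ == "__main__":
    # Hypothetical host and backend names, for illustration only.
    print(" ".join(build_failover_command("openstack@primary", "secondary")))
```

Keeping the command as a list makes it easy to hand to subprocess.run later, once the values have been checked against your own cinder.conf.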
The first two steps above work as expected, but through our lab studies we have discovered some complications associated with the third. So before showing you how to use Cheesecake for both failover and failback, as we promised in the previous post, let’s take a look at some nuances that you should be aware of.
As the previous blog pointed out, following Cinder failover, volumes that are attached to instances must be detached and reattached. This is necessary to pick up the change in provider location. Provider location includes such things as the NFS export, iSCSI target IP address, and iSCSI target IQN or WWPN for Nova, and the SVM, FlexVol, and LUN for Cinder. More simply put, it is the information needed to connect the Nova instance to the block device.
Why are we pointing this out? This degree of complexity is not just an academic issue, but one that requires orchestration to solve. Cinder failover fails over the storage, but it does not appear to tell any upstream service that the storage target has changed. Even if it does tell Nova, for example, Nova isn't listening. So, after failover, your environment is still in the same state: down, with no ability to self-heal. To complete the recovery, all affected Cinder volumes need to be detached from their Nova instances and then reattached.
According to the spec's author, the original spec at one point stated that a volume in an "in-use" state would be auto-detached during a failover. All our lab work has shown this not to be the case.
At a high level, what does this mean? It means that somebody must issue both Nova detach and reattach commands against every impacted Cinder volume, for each affected instance, in every tenant. Following this, the tenant then has to reboot all affected instances. Without a reboot, at least on CentOS 7.2, the block device IDs change: what was /dev/vdb is now /dev/vdc, for example.
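To give a sense of the orchestration involved, here is a minimal sketch that turns a list of (server, volume) pairs into an ordered plan of openstack CLI calls: detach everything on an instance, reattach it, then reboot. It only builds the commands and does not run them. The function name and the input shape are hypothetical, and the plan assumes it will be executed with tenant credentials against non-root volumes.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def plan_recovery(attachments: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Build CLI calls: per server, detach all, reattach all, then reboot."""
    # Group volume IDs by the server they are attached to.
    by_server: Dict[str, List[str]] = defaultdict(list)
    for server_id, volume_id in attachments:
        by_server[server_id].append(volume_id)

    commands: List[List[str]] = []
    for server_id, volume_ids in by_server.items():
        for vol in volume_ids:
            commands.append(
                ["openstack", "server", "remove", "volume", server_id, vol])
        for vol in volume_ids:
            commands.append(
                ["openstack", "server", "add", "volume", server_id, vol])
        # Reboot last so the guest sees stable block device names again.
        commands.append(["openstack", "server", "reboot", server_id])
    return commands


if __name__ == "__main__":
    # Hypothetical IDs, for illustration only.
    for cmd in plan_recovery([("s1", "v1"), ("s1", "v2"), ("s2", "v3")]):
        print(" ".join(cmd))
```

Even in this simplified form, the plan grows with every volume, instance, and tenant, which is why the recovery cannot realistically be left as a manual exercise.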
Detach, reattach, and reboot are tenant-level rather than admin-level operations, and they require a degree of cooperation between the OpenStack admin and tenant users. This brings us to the subject of admin-level versus tenant-level DR planning. If the tenant has their own plans for DR, the OpenStack admin is not involved. Equally, when the OpenStack admin develops a DR plan, the tenants are not consulted.
There is an implied firewall between cloud administrator and tenants. Clouds are designed as self-service, and each tenant exists in their own virtual private space, i.e. the virtual private cloud. Whatever a cloud admin does needs to be invisible to the tenants. Cinder failover, however, requires tenant involvement in the post-failover recovery, contrary to this principle of admin-tenant isolation. Because the tenant wasn't involved in the administrator's DR planning, and in most cases won't be, the tenant cannot be expected to be a willing or informed participant in the execution of recovery.
To the OpenStack development community, we have a request: add code to Nova that gives it the ability to both auto-detach and auto-reattach upon Cinder failover. Do this and the mixing of tenant and cloud admin responsibilities is eliminated. Alternatively, add code to Nova that simply auto-detaches non-root block devices upon Cinder failover. This also eliminates the mixing of responsibilities. Upon failover, the block devices are ready for use, leaving it up to the tenant to use these devices or not. Perhaps they will need them, perhaps they will not. Either way, it is up to the tenant to decide.
Moving on from cloud admin and tenant DR, we address another topic: root volume detachment is prohibited through at least the Pike release of OpenStack. Because Nova can’t detach root volumes, the databases are never updated with the correct provider locations for root devices. Effectively, Nova is unaware that the backend has failed over at all, which means the instance is irreparable. The user then must delete and recreate the instance as a workaround; the user has the option to reuse the original persistent root volume.
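Because root and non-root volumes need different handling, a recovery script would first have to tell them apart. The sketch below shows one possible way, under the assumption that each attachment record carries the guest device name and that you have already looked up each instance's root device name from Nova. The record layout and the function name are hypothetical, and comparing device names is a heuristic rather than Nova's own definition of a boot volume.

```python
from typing import Dict, List, Tuple

Attachment = Dict[str, str]


def split_root_attachments(
    attachments: List[Attachment], root_devices: Dict[str, str]
) -> Tuple[List[Attachment], List[Attachment]]:
    """Split attachments into (root, non_root) by comparing device names."""
    root: List[Attachment] = []
    non_root: List[Attachment] = []
    for att in attachments:
        # Assumption: a volume is the root volume when its device name
        # matches the root device name recorded for its server.
        is_root = att["device"] == root_devices.get(att["server_id"])
        (root if is_root else non_root).append(att)
    return root, non_root


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    sample = [
        {"server_id": "s1", "volume_id": "v1", "device": "/dev/vda"},
        {"server_id": "s1", "volume_id": "v2", "device": "/dev/vdb"},
    ]
    root, non_root = split_root_attachments(sample, {"s1": "/dev/vda"})
    print("delete and recreate:", [a["server_id"] for a in root])
    print("detach and reattach:", [a["volume_id"] for a in non_root])
```

Instances that show up in the root list are the ones that fall under the delete-and-recreate workaround; everything in the non-root list can go through detach and reattach.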
For at least two years there have been ongoing discussions around enabling detachment of root volumes. Nova patches have been written, but as of Pike they have not been merged. See the spec here: https://blueprints.launchpad.net/nova/+spec/detach-boot-volume. If you feel that this blueprint needs to be acted upon, please speak up in the weekly Nova IRC meeting or contact the author of the spec personally.
Ultimately, Cheesecake’s value is that it replicates block devices from one backend to another, where backends are defined here as separate storage devices, ideally in separate availability zones. Cheesecake ensures that the block devices are available for use and that the data is preserved within the bounds of a service level agreement. While many applications are all but immune to physical outages, especially in the 3rd platform, others are not. Cheesecake replication makes the option of re-attachment available to all.
If your environment does not require a separation of roles or a virtual firewall between cloud admin and tenant, Cheesecake should work fine for you, and the following blogs will show you how.
There are 3 scenarios we will cover in more detail in upcoming posts. For now, we simply want to introduce them:
- Your environment encounters an issue where the storage platform has failed yet the Nova compute node has survived. In this scenario, the failure does not affect the root volume. In our studies, we chose to use a persistent root volume, though the scenario is no different with an ephemeral root.
- This time, the Nova instance has a persistent root volume, and this root volume is affected by the disaster.
- The compute node fails as part of the disaster. We haven’t explored this yet, but we will swing back later.
There may be some training involved in the recovery process, so keep that in mind. If, however, the separation between cloud admin and tenant needs to be maintained, keep the issues we have identified in mind before moving forward.