Trident and Disaster Recovery Workflow Using ONTAP SnapMirror Volume

Welcome to the final blog of the four-part series on Trident disaster recovery using ONTAP data replication technologies. In this blog, we discuss how Trident disaster recovery can be performed using ONTAP SnapMirror Volume replication.

SnapMirror Volume Replication Setup

The following section discusses in depth the steps to set up ONTAP SnapMirror Volume Replication using OnCommand System Manager.

  1. Set up peering between the source and destination clusters and SVMs. For information on setting up cluster and SVM peering, refer to the following documentation.
  2. Create a protection policy, which controls the behavior of the relationship and specifies its configuration attributes. Specify the appropriate “Policy Type”, “Policy Name”, and “Transfer Priority”.
  3. Create schedules on the OnCommand System Manager “Schedules” page by choosing the required schedule.
  4. Create a SnapMirror relationship between the destination volume and the source volume using the System Manager “Volume Relationships” page. Click “Create” to launch the SnapMirror creation wizard.
    • Select the appropriate “Replication” and “Replication Type”. For more information on which type of replication is suitable for you, refer to the following documentation.
    • Choose the source cluster, source SVM and the source volume.
    • Choose the destination SVM and the destination volume.
    • Configure the appropriate Mirror and Vault Policy and the schedule.
    • Click the “Validate” button, and if the validation completes without any issues, click the “Create” button.
  5. After the SnapMirror relationship is created, initialize it so that a baseline transfer from the source volume to the destination volume is completed. (An equivalent ONTAP CLI sequence is sketched below.)
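
For readers who prefer the command line over System Manager, the same relationship can be created from the ONTAP CLI. The following is a minimal sketch, assuming cluster and SVM peering are already in place; the SVM, volume, aggregate, and schedule names (src_svm, dst_svm, trident_vol, trident_vol_dst, aggr1, hourly) are placeholders.

    # Run on the destination cluster. Create the destination volume as a data-protection (DP) volume.
    volume create -vserver dst_svm -volume trident_vol_dst -aggregate aggr1 -type DP -size 20g

    # Create the SnapMirror relationship with a mirror policy and a schedule.
    snapmirror create -source-path src_svm:trident_vol -destination-path dst_svm:trident_vol_dst -type XDP -policy MirrorAllSnapshots -schedule hourly

    # Initialize the relationship to run the baseline transfer, then verify its health.
    snapmirror initialize -destination-path dst_svm:trident_vol_dst
    snapmirror show -destination-path dst_svm:trident_vol_dst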

 

SnapMirror Volume Disaster Recovery Workflow (Trident 19.04 and Below)

Disaster recovery using SnapMirror Volume Replication is not as seamless as SnapMirror SVM Replication: a few extra steps are required before the destination volume can be used.

  1. In the event of a disaster, stop all scheduled SnapMirror transfers and abort all ongoing SnapMirror transfers. Break the replication relationship between the destination and source Trident etcd volumes so that the destination volume becomes Read/Write. Also make sure that all the other application data volumes are made Read/Write on the destination side. (A command sketch of this workflow follows the list.)
  2. Uninstall Trident from the Kubernetes cluster using the “tridentctl uninstall -n <namespace>” command. Don’t use the -a flag during the uninstall.
  3. Create a new backend.json file using the new IP and new SVM name of the destination SVM where the Trident etcd volume resides.
  4. Re-install Trident in the Kubernetes cluster using the “tridentctl install -n <namespace>” command with the --volume-name option to point to the previously mirrored Trident volume.
  5. The Trident plugin will now be up and running, ready to provision volumes dynamically, with the Trident volume being served from the destination SVM.
  6. Create new backends on Trident to point to the required destination SVMs where the data volumes have been mirrored to. Use the new IP and the new SVM name of the destination SVM while creating the new backends.
  7. Create new Storage Classes to point to the newly created backends.
  8. Clean up the previous deployments that were consuming PVCs bound to volumes on the source SVM.
  9. Now import the required application data volumes as a PV bound to a new PVC using the Trident import feature.
  10. Re-deploy the application deployments with the newly created PVCs.
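
The following is a minimal command sketch of the recovery flow above, assuming the Trident namespace is trident and that the mirrored Trident volume on the destination SVM is named trident_vol_dst; the backend and PVC file names are also placeholders.

    # On the destination cluster: stop replication and make the destination volume Read/Write.
    snapmirror quiesce -destination-path dst_svm:trident_vol_dst
    snapmirror abort -destination-path dst_svm:trident_vol_dst
    snapmirror break -destination-path dst_svm:trident_vol_dst

    # On the Kubernetes cluster: uninstall Trident (without -a, so state is preserved),
    # then re-install it pointing at the mirrored Trident volume.
    tridentctl uninstall -n trident
    tridentctl install -n trident --volume-name trident_vol_dst

    # Create a backend for the destination SVM, then import an application data volume
    # as a PV bound to a new PVC.
    tridentctl create backend -f backend.json -n trident
    tridentctl import volume <backendName> <volumeName> -f pvc.yaml -n trident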

If the disaster recovery operation should involve setting up a new Kubernetes cluster on the destination side, then the recovery steps should be as follows.

  1. Make sure that the Kubernetes objects are backed up using a backup utility tool.
  2. Install Trident on the Kubernetes cluster using the Trident volume at the destination site with the “./tridentctl install -n <namespace> --volume-name <volume-name>” command.
  3. Create new backends on Trident that point to the required destination SVMs where the data volumes have been mirrored. Use the new IP, SVM name, and password of the destination SVM while creating the new backends (a sample backend file is sketched after this list).
  4. Create new Storage Classes to point to the newly created backends.
  5. Import all the other application volumes at the secondary site into the Kubernetes cluster as PVs bound to PVCs using the Trident import feature.
  6. After all the PVs are imported, redeploy the application deployments with the imported PVs to restart the containerized applications on the cluster.
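
As a reference for step 3 above, a backend file for the destination SVM might look like the following sketch for the ontap-nas driver; the LIF addresses, SVM name, and credentials shown here are hypothetical and must be replaced with the values of the secondary site.

    {
      "version": 1,
      "storageDriverName": "ontap-nas",
      "backendName": "dr-backend",
      "managementLIF": "10.0.1.10",
      "dataLIF": "10.0.1.11",
      "svm": "dst_svm",
      "username": "vsadmin",
      "password": "secret"
    }

The backend is then added with “tridentctl create backend -f backend.json -n <namespace>”, and a new Storage Class can be created to select it.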

SnapMirror Volume Disaster Recovery Workflow (Trident 19.07 and Above)

As mentioned before, Trident v19.07 and beyond uses Kubernetes CRDs to store and manage its own state, which means Trident’s metadata is stored in the Kubernetes cluster’s etcd database. Here we assume that the Kubernetes etcd data files and certificates are stored on a NetApp FlexVol volume that is SnapMirrored to a destination volume at the secondary site. The following steps describe how to recover a single-master Kubernetes cluster with Trident.
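
As a quick reference, the CRDs that Trident 19.07 creates can be listed as shown below; the exact set may vary slightly between releases.

    kubectl get crd | grep trident.netapp.io
    # Typical resources include tridentbackends, tridentversions, tridentvolumes,
    # tridentnodes, tridentstorageclasses, tridenttransactions, and tridentsnapshots,
    # all under the trident.netapp.io group.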

  1. In the event of a disaster, the Kubernetes cluster may become unstable because it no longer has access to the volume that contains the etcd member data folder once the site has gone down. Under such circumstances, the cluster cannot make any changes to its current state, which means no new pods can be scheduled.
  2. From the destination side, stop all the scheduled SnapMirror transfers and abort all ongoing SnapMirror transfers. Break the replication relationship between the destination and source volumes so that the destination volume becomes Read/Write.
  3. From the destination SVM, mount the volumes that contain the Kubernetes etcd data files and certificates at the appropriate locations on the host that has been set up as a master node.
  4. Use the kubectl get crd command to verify that all the Trident custom resources have come up, and retrieve Trident objects to make sure that all the data is available (see the sketch after this list).
  5. Clean up the previous backends and create new backends on Trident using the new Management and Data LIFs, SVM name, and password of the destination SVM.
  6. Clean up the deployments, PVCs, and PVs from the Kubernetes cluster.
  7. Now import the required volumes as a PV bound to a new PVC using the Trident import feature.
  8. Re-deploy the application deployments with the newly created PVCs.
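
A minimal sketch of the commands behind steps 2 to 7 above, assuming an NFS volume named k8s_etcd_dst holds the etcd data files, 10.0.1.11 is the destination data LIF, and trident is the Trident namespace (all placeholder values); the certificate volume would be mounted in the same way.

    # On the destination cluster: stop replication and make the volumes Read/Write.
    snapmirror quiesce -destination-path dst_svm:k8s_etcd_dst
    snapmirror abort -destination-path dst_svm:k8s_etcd_dst
    snapmirror break -destination-path dst_svm:k8s_etcd_dst

    # On the master node: mount the mirrored etcd data files.
    mount -t nfs 10.0.1.11:/k8s_etcd_dst /var/lib/etcd

    # Verify Trident's state, then replace the old backend with one for the destination SVM.
    tridentctl get backend -n trident
    tridentctl delete backend <old-backend-name> -n trident
    tridentctl create backend -f backend.json -n trident

    # Import a mirrored application data volume as a PV bound to a new PVC.
    tridentctl import volume dr-backend <volumeName> -f pvc.yaml -n trident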

If the disaster recovery operation should involve setting up a new Kubernetes cluster on the destination side, then the recovery steps should be as follows.

  1. Make sure that the Kubernetes objects are backed up using a backup utility tool.
  2. Mount the volumes from the secondary site that contain the Kubernetes etcd data files and certificates at the appropriate locations on the host that will be set up as a master node.
  3. Now create a Kubernetes cluster with the kubeadm init command along with the --ignore-preflight-errors=DirAvailable--var-lib-etcd flag, as shown in the sketch after this list. Note that the hostnames used for the Kubernetes nodes must be the same as in the source Kubernetes cluster.
  4. Create new backends on Trident using the new Management and Data LIFs, SVM name, and password of the destination SVM.
  5. Create new Storage Classes to point to the newly created backends.
  6. Import all the other application volumes at the secondary site into the Kubernetes cluster as PVs bound to PVCs using the “tridentctl import volume” command.
  7. After all the PVs are imported, redeploy the application deployments with the imported PVs to restart the containerized applications on the cluster.
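
A minimal sketch of steps 2 and 3 above; the export paths, data LIF, and mount points are placeholders, and note that the kubeadm flag value uses double hyphens, which blog formatting often turns into dashes.

    # On the new master node: mount the mirrored etcd data files and certificates
    # from the secondary site (placeholder paths and LIF).
    mount -t nfs 10.0.1.11:/k8s_etcd_dst /var/lib/etcd
    mount -t nfs 10.0.1.11:/k8s_certs_dst /etc/kubernetes/pki

    # Initialize the cluster, ignoring the preflight check that /var/lib/etcd is not empty.
    kubeadm init --ignore-preflight-errors=DirAvailable--var-lib-etcd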

 

Summing Up

In this blog series, we have looked at the different replication technologies offered by NetApp for disaster recovery, including MetroCluster, SnapMirror SVM, and SnapMirror Volume, and have discussed in detail the Trident disaster recovery workflows for each of these replication technologies.

We know you will have more questions, so please reach out to us on our Slack team or through GitHub issues. We’re happy to help!

Part 1 > Part 2 > Part 3 > Part 4

Jacob Andathethu
Technical Marketing Engineer at NetApp
A dynamic professional with over 13 years of experience working in the data storage industry [NetApp and Dell-EMC].
Currently working as a Technical Marketing Engineer for Open Ecosystem Products in NetApp (Docker, Docker Swarm, Kubernetes, OpenShift).
