Trident and Disaster Recovery Workflow Using MetroCluster

Welcome to the second installment of Trident and Disaster Recovery using ONTAP Data Replication Technologies. In this blog, we will be discussing a MetroCluster configuration.

MetroCluster Setup

Configuring MetroCluster is beyond the scope of this blog; please follow the documentation for details on how to configure it. There are various MetroCluster configurations to choose from, with key differences in their required components, so make sure to choose the appropriate MetroCluster solution for your needs.

Trident Disaster Recovery Workflow

Now let’s examine how to perform disaster recovery for Trident in a MetroCluster scenario. In the following scenario, Site B is the primary site and Site A is the switchover site; MetroCluster has been set up across the two sites, and the Trident volume resides on an SVM on the cluster at Site B. In this example, we will perform a switchover to Site A through System Manager to confirm uninterrupted data availability.

  1. Let’s suppose Site B, which hosts the Trident volume, has encountered a full disruption and is down. As part of the DR plan, we must initiate a switchover to Site A. To do so, log on to the Site A cluster using OnCommand System Manager and navigate to the MetroCluster tab under the “Configuration” tab. The Site A cluster will detect that the Site B cluster has gone down and will indicate whether a switchover needs to be done from Site B to Site A. The following picture shows the Site A cluster System Manager UI, which depicts Site A as local and Site B as remote. It shows Site B as “UNREACHABLE” and Site A as “ACTIVE”.

  2. Once the switchover is initiated, control is transferred to Site A and all the SVMs that had been mirrored to Site A are activated. Once the switchover completes successfully, the switchover icon “ACTIVE, SWITCHOVER MODE” is displayed in green. (An equivalent ONTAP CLI sequence is sketched below.)
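
If you prefer the ONTAP CLI over System Manager, the switchover can also be driven from the Site A cluster shell. A minimal sketch, assuming administrative access to the surviving cluster:

    # Negotiated (planned) switchover, issued from the surviving site
    siteA::> metrocluster switchover

    # Unplanned switchover after a site disaster
    siteA::> metrocluster switchover -forced-on-disaster true

    # Monitor the operation until its state shows "successful"
    siteA::> metrocluster operation show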

Disaster Recovery Workflow for Trident v19.04 and below

The following section discusses the steps for performing a recovery after the switchover has been completed.

  1. After the switchover to Site A has been completed, all the SVMs and volumes that had been replicated will now start to serve data.
  2. Please note that the LIFs on a cluster in a MetroCluster configuration are replicated on the partner cluster. However, the SVM name on the partner cluster will have an “-mc” suffix.
  3. All the data volumes provisioned by Trident will start serving data as soon as the Site A SVMs are activated.
  4. Make sure to update all the required backends to reflect the new destination SVM name using the “./tridentctl update backend <backend-name> -f <backend-json-file> -n <namespace>” command (see the example after this list).
  5. All containerized applications should be running without any disruptions.
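
To illustrate step 4, here is a minimal sketch of a backend update; the backend name, SVM name, LIF address, credentials, and file names below are hypothetical:

    # backend-dr.json -- same backend definition, now pointing at the "-mc" SVM
    {
        "version": 1,
        "storageDriverName": "ontap-nas",
        "backendName": "nas-backend",
        "managementLIF": "10.0.0.1",
        "svm": "trident_svm-mc",
        "username": "admin",
        "password": "password"
    }

    # Push the updated definition to Trident
    ./tridentctl update backend nas-backend -f backend-dr.json -n trident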

If the disaster recovery operation involves setting up a new Kubernetes cluster on the destination side, then the recovery steps are as follows.

  1. Make sure that the Kubernetes objects are backed up using a backup utility tool.
  2. Install Trident on the Kubernetes cluster, reusing the Trident volume at Site A, with the “./tridentctl install -n <namespace> --volume-name <volume-name>” command (steps 2 and 4 are sketched after this list).
  3. Once Trident is up and running, update all the required backends to reflect the new destination SVM name using the “./tridentctl update backend <backend-name> -f <backend-json-file> -n <namespace>” command.
  4. Import all the application volumes from the SVM at the secondary Site A into the Kubernetes cluster as PVs bound to PVCs using the “tridentctl import volume” command.
  5. After all the PVs are imported, deploy the application deployment files to restart the containerized applications on the cluster.
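
A minimal sketch of steps 2 and 4; the namespace, volume name, backend name, and PVC file name are examples only:

    # Step 2: install Trident, reusing the replicated Trident volume at Site A
    ./tridentctl install -n trident --volume-name trident

    # Step 4: import an application volume as a PV bound to a PVC
    # (the YAML file contains the PVC definition the imported PV should bind to)
    ./tridentctl import volume nas-backend app_data_vol -f pvc-app-data.yaml -n trident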


Disaster Recovery Workflow for Trident v19.07 and above

Trident v19.07 (and beyond) takes advantage of Custom Resource Definitions (CRDs) to store and manage its own state, using the Kubernetes cluster’s etcd to store its metadata. Here we assume that the Kubernetes etcd data files and certificates are stored on a NetApp FlexVol volume.
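
Because this state lives in CRDs, it can be inspected directly with kubectl once Trident is running (assuming Trident is installed in the “trident” namespace):

    # List the Trident CRDs registered with the cluster
    kubectl get crd | grep trident.netapp.io

    # Inspect the backends and volumes Trident tracks as custom resources
    kubectl get tridentbackends -n trident
    kubectl get tridentvolumes -n trident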

  1. After the switchover to Site A has been completed, all the SVMs and volumes that had been replicated will now start to serve data. The interfaces of the SVMs will remain the same.
  2. Please note that the LIFs on a cluster in a MetroCluster configuration are replicated on the partner cluster. However, the SVM name on the partner cluster will have an “-mc” suffix.
  3. All the data volumes provisioned by Trident will start serving data as soon as the Site A SVMs are activated.
  4. Make sure to update all the required backends to reflect the new destination SVM name using the “./tridentctl update backend <backend-name> -f <backend-json-file> -n <namespace>” command (as in the example shown earlier).
  5. All containerized applications should be running without any disruptions (a quick verification is sketched after this list).
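
A quick way to verify step 5 from the Kubernetes side, using standard kubectl commands:

    # Confirm the application pods are healthy
    kubectl get pods --all-namespaces

    # Confirm every PVC is still bound to its PV
    kubectl get pvc --all-namespaces
    kubectl get pv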

If the disaster recovery operation involves setting up a new Kubernetes cluster on the destination side, then the recovery steps are as follows.

  1. Mount the volume from the secondary site that contains the Kubernetes etcd data files and certificates onto the host that will be set up as the master node.
  2. Copy all the required certificates pertaining to the Kubernetes cluster to /etc/kubernetes/pki and the etcd member files to /var/lib/etcd.
  3. Now create a Kubernetes cluster with the “kubeadm init” command along with the “--ignore-preflight-errors=DirAvailable--var-lib-etcd” flag. Please note that the hostnames used for the Kubernetes nodes must be the same as those of the source Kubernetes cluster. (Steps 1 through 3 are sketched after this list.)
  4. Update all the required backends to reflect the new destination SVM name using the “./tridentctl update backend <backend-name> -f <backend-json-file> -n <namespace>” command.
  5. Clean up all the application deployments, PVCs, and PVs on the cluster.
  6. Import all the application volumes from the SVM at the secondary Site A into the Kubernetes cluster as PVs bound to PVCs using the “tridentctl import volume” command.
  7. After all the PVs are imported, redeploy the application deployments with the imported PVs to restart the containerized applications on the cluster.
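
A minimal sketch of steps 1 through 3, assuming the etcd data and certificates were kept on an NFS volume; the LIF address, junction path, and mount point below are hypothetical:

    # Step 1: mount the replicated volume that holds the etcd data and certificates
    mount -t nfs 10.0.0.2:/k8s_etcd_vol /mnt/k8s_etcd_vol

    # Step 2: restore the certificates and the etcd member files
    cp -r /mnt/k8s_etcd_vol/pki /etc/kubernetes/
    cp -r /mnt/k8s_etcd_vol/etcd /var/lib/

    # Step 3: initialize the cluster, reusing the existing etcd directory
    kubeadm init --ignore-preflight-errors=DirAvailable--var-lib-etcd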

Next Installment

In the next blog, we will be discussing disaster recovery using ONTAP SnapMirror SVM.

We know you will have more questions, so please reach out to us on our Slack team or through GitHub issues. We’re happy to help! Please go through the rest of the Trident Disaster Recovery blog series to understand how Trident DR can be done using different NetApp data replication technologies.

Part 1 > Part 2 > Part 3 > Part 4

Jacob Andathethu
Technical Marketing Engineer at NetApp
A dynamic professional with over 13 years of experience working in the data storage industry [NetApp and Dell-EMC].
Currently working as a Technical Marketing Engineer for Open Ecosystem Products at NetApp (Docker, Docker Swarm, Kubernetes, OpenShift).
