Understanding Different NetApp Data Replication Technologies and DR Best Practices
It is important to understand the different disaster recovery (DR) methodologies for Trident and stateful containerized applications so that the business can keep running when a disaster occurs. Business continuance is about making sure that there are no disruptions to operations during and after a disaster. It is defined by two main criteria: recovery point objective (RPO), the maximum amount of data loss that can be tolerated, and recovery time objective (RTO), the maximum time allowed to restore service. ONTAP offers several technologies for setting up disaster recovery that facilitate failover from the source storage to destination storage located at a remote site. These technologies include MetroCluster, ONTAP SnapMirror SVM, and ONTAP SnapMirror Volume.
In this four-part blog series, we look at the DR technologies offered by ONTAP and their respective features, and at how DR should be done for Trident versions 19.04 and below and for Trident versions 19.07 and above using the different NetApp data replication technologies. There is a major difference in how these two groups of Trident versions are deployed, so the DR steps differ as well. Trident 19.04 and below uses its own etcd to store metadata about its volumes and the storage classes they are associated with, whereas Trident 19.07 and above uses Kubernetes CRDs to store and manage its state, which means its metadata lives in the Kubernetes cluster’s etcd.
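The version split described above decides which metadata store a DR plan has to protect. As a minimal sketch, the rule can be written as a small Python helper. The function names and the returned labels are hypothetical and exist only for illustration; they are not part of Trident.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int]:
    """Turn a release string such as '19.07' or 'v19.10.0' into (19, 7) or (19, 10)."""
    parts = version.lstrip("v").split(".")
    return int(parts[0]), int(parts[1])


def metadata_store(version: str) -> str:
    """Return where a given Trident release keeps its metadata.

    The labels are illustrative only:
      'trident-etcd'    -> Trident 19.04 and below, which runs its own etcd
      'kubernetes-crds' -> Trident 19.07 and above, state held in Kubernetes CRDs
    """
    if parse_version(version) >= (19, 7):
        return "kubernetes-crds"
    return "trident-etcd"


if __name__ == "__main__":
    for release in ("19.04", "19.07", "v19.10.0"):
        print(release, "->", metadata_store(release))
```

A runbook could branch on this result: for the older releases the Trident etcd volume itself must be recoverable, while for the newer releases the Kubernetes cluster’s etcd carries the Trident state.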
The MetroCluster business continuity solution provides continuous data availability for your critical applications. MetroCluster replicates a cluster to a secondary site by combining High Availability (HA) and SyncMirror. SyncMirror mirrors the aggregate data of each cluster in copies, or plexes, which gives us redundancy for disk shelves, while HA gives us redundancy for our controllers. Because a mirrored aggregate commits a write only once it has been mirrored to the remote plex, SyncMirror makes sure that transactions are not lost.
Consider the MetroCluster disaster recovery solution if you want the following:
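The commit rule described above (a write is acknowledged only after it exists on both plexes) can be illustrated with a toy model. This is a sketch for intuition, not ONTAP code: the `Plex` and `MirroredAggregate` classes are hypothetical, and failure handling, such as what happens when one plex is unreachable, is deliberately left out.

```python
class Plex:
    """One copy of an aggregate's data (a toy stand-in, not an ONTAP object)."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def store(self, block_id, data):
        self.blocks[block_id] = data


class MirroredAggregate:
    """Acknowledges a write only after both plexes hold the data."""

    def __init__(self, local, remote):
        self.local = local
        self.remote = remote
        self.acknowledged = []  # block ids the client has been told are safe

    def write(self, block_id, data):
        self.local.store(block_id, data)
        self.remote.store(block_id, data)   # mirror to the remote site first...
        self.acknowledged.append(block_id)  # ...and only then acknowledge
        return True


local, remote = Plex("site-a"), Plex("site-b")
aggregate = MirroredAggregate(local, remote)
aggregate.write("blk-1", b"order 1001")
aggregate.write("blk-2", b"order 1002")

# If site A is lost at this point, every acknowledged write is still at site B.
survivors = [b for b in aggregate.acknowledged if b in remote.blocks]
print(survivors == aggregate.acknowledged)
```

Because acknowledgement comes last, there is never an acknowledged write that exists only at one site, which is what a zero RPO means in practice.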
- A cluster-wide business continuity solution.
- Zero data loss protection for all volumes on the cluster.
- Support for synchronous replication over FC or IP networks.
- Zero RPO and near-zero RTO.
- A no-charge feature built into ONTAP.
- Support for third-party storage with NetApp FlexArray® technology.
- Data efficiencies, including deduplication, compression, and compaction.
For more information on MetroCluster solution architecture and design, refer to the following documentation.
ONTAP SnapMirror SVM Replication
ONTAP SnapMirror for SVM, also referred to as SVM DR, is a solution that uses SnapMirror to mirror a storage virtual machine’s (SVM’s) volumes and configuration to simplify data recovery. SnapMirror can replicate a complete SVM, including its configuration settings and its volumes, to the secondary site. In the event of a disaster, the SnapMirror destination SVM can be activated to start serving data, and service can be switched back to the primary when the systems are restored.
Consider the SnapMirror SVM disaster recovery solution if you want the following:
- An SVM-level business continuity solution.
- Manual application failover.
- RTO and RPO on the order of minutes.
- A simple setup and deployment procedure.
- No incremental licensing costs (requires SnapMirror licensing).
For more information on ONTAP SnapMirror SVM Replication, refer to the following documentation.
ONTAP SnapMirror Volume Replication
ONTAP SnapMirror Volume Replication is a disaster recovery feature that enables failover from primary storage to destination storage at the volume level. SnapMirror creates a replica, or mirror, of a primary volume on the secondary storage by syncing snapshots.
SnapMirror creates a replica of the primary working data by taking snapshots and replicating them to the secondary storage. A snapshot is a point-in-time copy that provides a quick and easy way to recover data that has been corrupted or accidentally lost as a result of human or technological error. ONTAP SnapMirror Volume can be used for disaster recovery in two modes: SnapMirror Asynchronous and SnapMirror Synchronous.
Choose SnapMirror Volume Asynchronous or Synchronous for disaster recovery based on your requirements. The following table compares their key features.
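To see why snapshot-based asynchronous replication has a non-zero RPO, consider the toy model below. It is a sketch, not how SnapMirror is implemented (real transfers are incremental and scheduled by ONTAP). The `SourceVolume` and `AsyncMirror` classes and the file names are hypothetical.

```python
class SourceVolume:
    """Primary volume holding the working data."""

    def __init__(self):
        self.files = {}

    def snapshot(self):
        """Return a point-in-time copy of the volume contents."""
        return dict(self.files)


class AsyncMirror:
    """Destination is refreshed only when update() transfers a new snapshot."""

    def __init__(self, source):
        self.source = source
        self.destination = {}

    def update(self):
        self.destination = self.source.snapshot()


source = SourceVolume()
mirror = AsyncMirror(source)

source.files["invoice-1"] = "paid"
mirror.update()                      # scheduled transfer runs
source.files["invoice-2"] = "open"   # written after the last transfer

# Disaster at the primary: only what reached the destination can be recovered.
lost = sorted(set(source.files) - set(mirror.destination))
print(lost)
```

Everything written after the last completed transfer is at risk, so in asynchronous mode the transfer schedule bounds the RPO. In synchronous mode each write reaches the destination before it is acknowledged, which is why that mode offers zero data loss for the selected volume.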
| SnapMirror Volume Asynchronous | SnapMirror Volume Synchronous |
|---|---|
| Volume-level business continuity solution. | Volume-level business continuity solution. |
| Manual application failover. | Manual application failover. |
| RTO on the order of minutes to hours. | RTO on the order of minutes. |
| RPO on the order of minutes to hours. | Zero data loss protection for the selected volume. |
| Requires no complex networking or setup. | Requires no complex networking or setup. |
| No additional licensing costs (requires SnapMirror licensing). | No additional licensing costs (requires Premium/Flash bundle licensing). |
| Easy to set up, deploy, and operate. | Easy to set up, deploy, and operate. |
For more information on ONTAP SnapMirror Volume Synchronous, refer to the following documentation.
For more information on ONTAP SnapMirror Volume Asynchronous, refer to the following documentation.
NOTE: Volume snapshots are an alpha feature in Kubernetes and are not meant to be used in production.
Comparison of Different Replication Technologies Offered by ONTAP
The following table compares the general features of the different replication technologies.
| Feature | MetroCluster | SnapMirror SVM | SnapMirror Volume (Asynchronous) | SnapMirror Volume (Synchronous) |
|---|---|---|---|---|
| Protection | Zero data loss protection for all volumes | Zero data loss protection for SVM volumes | Point-in-time protection for selected volumes | Zero data loss protection for selected volumes |
| License | SnapMirror license | SnapMirror license | SnapMirror license | Sync + SnapMirror license |
| Protocol | SAN and NAS protocols | SAN and NAS protocols | SAN and NAS protocols | SAN and NAS protocols |
| Failover | Auto failover | Manual failover | Manual failover | Manual failover |
Disaster Recovery Best Practices for Trident
There are certain general best practices to follow when setting up disaster recovery for Trident:
- Create backends on Trident for all the source SVMs that contain Trident volumes and other application volumes and that have a peering relationship with a destination SVM.
- Set an appropriate Snapshot policy on the volumes provisioned by Trident on the source SVM by using the “snapshotPolicy” parameter in the backend file.
- Design StorageClasses so that the backends pertaining to the source SVM are chosen and the mirrored backends are chosen only when required.
- Take care when designing StorageClasses so that volumes that do not need the protection of a replication relationship are provisioned onto the other backend(s).
- Application administrators should understand the additional cost and complexity associated with replicating data, and a recovery plan should be determined before they leverage data replication.
- With the MetroCluster, SnapMirror SVM, and SnapMirror Volume DR solutions, Trident does not automatically detect SVM failures. Therefore, upon a failure, the administrator needs to run the appropriate steps to make sure that data is served from the destination cluster.
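The backend and StorageClass practices above can be sketched as follows. This Python snippet builds a backend definition that sets “snapshotPolicy” and a StorageClass that pins provisioning to that backend, then prints both as JSON. Every concrete value (backend name, addresses, SVM name, policy name, aggregate name, StorageClass name) is a placeholder, and the provisioner string assumes a non-CSI Trident deployment. Check the documentation for your Trident version before relying on any field.

```python
import json

# All values below are placeholders for illustration; substitute your own.
SOURCE_BACKEND_NAME = "ontap-nas-source"   # hypothetical backend name
SNAPSHOT_POLICY = "trident-dr-policy"      # hypothetical ONTAP Snapshot policy

# Backend definition for the source SVM. The policy under "defaults" is applied
# to the volumes Trident provisions on this backend.
source_backend = {
    "version": 1,
    "storageDriverName": "ontap-nas",
    "backendName": SOURCE_BACKEND_NAME,
    "managementLIF": "192.0.2.10",   # placeholder address
    "dataLIF": "192.0.2.11",         # placeholder address
    "svm": "svm_source",             # placeholder SVM name
    "username": "vsadmin",
    "password": "<password>",
    "defaults": {"snapshotPolicy": SNAPSHOT_POLICY},
}

# StorageClass that directs provisioning to the replicated source backend.
replicated_storage_class = {
    "apiVersion": "storage.k8s.io/v1",
    "kind": "StorageClass",
    "metadata": {"name": "ontap-replicated"},   # hypothetical name
    # Assumption: non-CSI Trident; CSI deployments use a different provisioner.
    "provisioner": "netapp.io/trident",
    "parameters": {
        "backendType": "ontap-nas",
        # "<backend name>:<pool list>"; "aggr1" is a placeholder aggregate.
        "storagePools": SOURCE_BACKEND_NAME + ":aggr1",
    },
}

if __name__ == "__main__":
    print(json.dumps(source_backend, indent=2))
    print(json.dumps(replicated_storage_class, indent=2))
```

A second StorageClass that points at a non-replicated backend would follow the same pattern, so that volumes that do not need replication are kept off the mirrored SVM.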
In the next post, we discuss how Trident disaster recovery can be done using MetroCluster: Trident and Disaster Recovery Pt. 2
We know you will have more questions, so please reach out to us on our Slack team or through GitHub issues. We’re happy to help! Please go through the rest of the Trident Disaster Recovery blog series to understand how Trident DR can be done using the different NetApp data replication technologies.