Self-Service Data Recovery using Trident and NFS

Since the earliest days of ONTAP, snapshots have played a significant role in point-in-time recovery of data that has been accidentally lost or corrupted. They are instant, space efficient, and, when the snapshot directory is enabled for the volume, give the application or user a way to recover data without having to rely on the storage administrator.

Configuring Snapshots

Data recovery via snapshots in volumes provisioned by Trident is quite easy. The following process applies to both Trident for Docker and Trident for Kubernetes, with either the ontap-nas or ontap-nas-economy driver. First, a couple of configuration checks:

  1. The snapshot directory needs to be enabled. It is hidden, by default, for volumes provisioned by Trident.

    To enable the snapshot directory using the backend configuration file, set the corresponding option in the defaults section. All volumes created using this backend will then have the snapshot directory visible by default, so the setting does not need to be specified in the Kubernetes PVC or as a CLI option for Docker.

    To enable the snapshot directory in the PVC, use an annotation. This will make the snapshot directory visible regardless of the setting in the backend configuration, though if it’s enabled in the backend you won’t need to specify the value here.

    To enable the snapshot directory using Docker volume options, pass the option to the docker volume create command. Like the PVC setting, this will apply regardless of the setting in the backend.

  2. A snapshot policy needs to be applied. Trident does not manage snapshots for the volumes; instead, it relies on the ONTAP system’s intrinsic functionality to create and delete snapshots according to a schedule. If no snapshot policy is defined in the backend or at volume creation, then Trident will set the policy to “none”, meaning no snapshots will be taken.

    By default, ONTAP has three snapshot policies: none, default, and default-1weekly. Your storage administrator may have modified those policies or created customized policies, so be sure to work with them when selecting the snapshot policy to use.

    To specify the snapshot policy using the backend configuration file, set the option in the defaults section. Just like above, this will be the default value used for all volumes created using this backend which do not have the option specified in the PVC or at the Docker command line.

    To set the snapshot policy in the PVC, use an annotation. This will override the value in the backend configuration.

    Finally, using Docker volume options you can specify the policy to use.

With these two options configured, we can now be assured that the ONTAP system is automatically creating and managing snapshots for us, and the application is able to access data in the snapshots to recover should something happen.
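
As a rough illustration of the two settings described above, the Python sketch below builds the relevant fragment for each of the three places they can be set. The key and annotation names (snapshotDir, snapshotPolicy, trident.netapp.io/snapshotDirectory, trident.netapp.io/snapshotPolicy) are written as best recalled from the Trident documentation and should be verified against your Trident version. The helper names, the "netapp" Docker driver alias, the storage class name, and the volume names are assumptions made for the example.

```python
import json


def _flag(enabled):
    # Annotation values must be strings; the backend file is assumed to
    # accept the same string form.
    return "true" if enabled else "false"


def backend_defaults(snapshot_dir=True, snapshot_policy="default"):
    """Fragment to merge into the "defaults" section of a backend file."""
    return {
        "defaults": {
            "snapshotDir": _flag(snapshot_dir),
            "snapshotPolicy": snapshot_policy,
        }
    }


def pvc_manifest(name, size="1Gi", storage_class="ontap-nas",
                 snapshot_dir=True, snapshot_policy="default"):
    """PVC whose annotations override the backend defaults (assumed names)."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {
            "name": name,
            "annotations": {
                "trident.netapp.io/snapshotDirectory": _flag(snapshot_dir),
                "trident.netapp.io/snapshotPolicy": snapshot_policy,
            },
        },
        "spec": {
            "accessModes": ["ReadWriteMany"],
            "resources": {"requests": {"storage": size}},
            "storageClassName": storage_class,  # hypothetical class name
        },
    }


def docker_volume_args(name, snapshot_dir=True, snapshot_policy="default"):
    """Argument list for docker volume create; "netapp" is an assumed alias."""
    return [
        "docker", "volume", "create", "-d", "netapp", "--name", name,
        "-o", f"snapshotDir={_flag(snapshot_dir)}",
        "-o", f"snapshotPolicy={snapshot_policy}",
    ]


if __name__ == "__main__":
    print(json.dumps(backend_defaults(), indent=2))
    print(json.dumps(pvc_manifest("app-data"), indent=2))
    print(" ".join(docker_volume_args("app-data")))
```

The point of the sketch is the precedence: the backend fragment sets defaults for every volume, while the PVC annotations and the Docker options apply to a single volume and take priority over the backend.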

Recovering Data

At this point the application or user can simply browse to the .snapshot directory in their volume. This is a hidden, read-only directory, so data cannot be modified in place; instead, you will need to copy it back to the original location. Inside the .snapshot directory are a number of additional directories, each representing a snapshot on the volume. Each directory is named for when the snapshot was taken, so “cd” into the one from the time you want to recover data from. Inside that directory you will see what looks like the live file system as of that time. Browse to your data location and copy it back.
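
The browse-and-copy procedure above can be sketched in a few lines of Python, run from wherever the volume is mounted. This is a minimal sketch rather than a supported tool: the function names, the mount path, and the snapshot directory name in the usage note are hypothetical, and it assumes the snapshot directory is visible at .snapshot in the root of the volume.

```python
import shutil
from pathlib import Path


def list_snapshots(volume_root):
    """Return the names of the snapshot directories visible in the volume."""
    snap_dir = Path(volume_root) / ".snapshot"
    return sorted(p.name for p in snap_dir.iterdir() if p.is_dir())


def restore_file(volume_root, snapshot_name, relative_path):
    """Copy one file from a snapshot back to its original location."""
    root = Path(volume_root)
    source = root / ".snapshot" / snapshot_name / relative_path
    target = root / relative_path
    if not source.is_file():
        raise FileNotFoundError(f"{source} does not exist in the snapshot")
    # The snapshot is read only, so the copy always goes into the live
    # file system, never the other way around.
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, target)  # copy2 also preserves timestamps
    return target
```

For example, list_snapshots("/mnt/app-data") would show what is available, and restore_file("/mnt/app-data", "hourly.2018-03-07_0805", "db/data.txt") would overwrite the live copy of db/data.txt with the version from that snapshot.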

There are a few important things to know about using the snapshot directory for data recovery:

  • When using OpenShift, the storage administrator will need to set the SVM-level configuration option v3-fsid-change to disabled. Because of how OpenShift handles SELinux contexts, access to the snapshot directory is denied regardless of permissions unless this option is disabled. From the CLI, the option is available at the advanced privilege level, and it can also be changed using the PowerShell Toolkit.

  • Copying large files is not efficient: it may take a long time and use a lot of system resources. For very large files, it is better to work with the storage administrator to do a single-file FlexClone from the snapshot back to the “live” filesystem, which is instant regardless of the size of the file. The storage administrator can do this from the CLI using the volume file clone create command, or using the PowerShell Toolkit.

  • Snapshots occur at scheduled times. For example, with an unmodified default policy, hourly snapshots are taken at 5 minutes past the hour, daily snapshots at 12:10am, and weekly snapshots on Sunday at 12:15am.

    Importantly, only a limited number of each is kept. For example, with two daily snapshots being kept, if it is currently Wednesday at 8am, then there will be a daily snapshot from Wednesday at 12:10am and one from Tuesday at 12:10am.

    A volume will not have any snapshots until at least one of the scheduled times has passed, so it’s normal for the .snapshot directory to be empty for a bit at first.

  • Some applications do not function with the snapshot directory visible. For example, MySQL will not start because it is unable to write to the directory, which causes an error. If you encounter this issue with your application, you will need to disable the snapshot directory, either by destroying and recreating the volume using Trident without the setting enabled (remember, you can override the backend setting using the PVC or Docker CLI), or by working with your storage admin to disable the directory for an existing volume from the CLI or the PowerShell Toolkit.
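
Returning to the retention example in the list above (two daily snapshots, taken at 12:10am), the arithmetic can be sketched in Python to check which recovery points should exist at a given moment. The function name and parameters are hypothetical, the date is just an arbitrary Wednesday, and the sketch assumes the volume has existed long enough for every retained snapshot to have been taken.

```python
from datetime import datetime, timedelta


def retained_daily_snapshots(now, keep=2, hour=0, minute=10):
    """Return the times of the daily snapshots that should exist at `now`.

    Results are ordered newest first. The defaults mirror the schedule
    described in the text: one snapshot per day at 12:10am, two kept.
    """
    latest = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if latest > now:
        # Today's snapshot has not been taken yet, so the newest one
        # is from yesterday.
        latest -= timedelta(days=1)
    return [latest - timedelta(days=i) for i in range(keep)]


if __name__ == "__main__":
    wednesday_8am = datetime(2018, 3, 7, 8, 0)
    for taken in retained_daily_snapshots(wednesday_8am):
        print(taken.strftime("%A %H:%M"))
```

Called with Wednesday at 8am and keep=2, this yields Wednesday at 12:10am and Tuesday at 12:10am, matching the example in the text. Anything deleted before Tuesday at 12:10am would have to come from a weekly snapshot, if one covers it.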

Summary

Snapshots are an important part of the self-service paradigm enabled by Trident for the containers ecosystem. Using snapshots, with the snapshot directory visible, enables the user or application to recover data without having to rely on, or wait on, the storage administrator. The application team and the storage team should work together to define snapshot policies which meet their needs, accounting for RTO requirements and capacity consumed by snapshots, then communicate with the Kubernetes (or Docker) administrator to define policies enabled by default, if desired.

If you have any questions about snapshots, snapshot policies, or how to do self-service data recovery using snapshots, please don’t hesitate to reach out to us using the comments below, our Slack Team, or the NetApp Communities!

Andrew Sullivan
Technical Marketing Engineer at NetApp
Andrew has worked in the information technology industry for over 10 years, with a rich history of database development, DevOps experience, and virtualization. He is currently focused on storage and virtualization automation, and driving simplicity into everyday workflows.
