The etcd database is crucial to NetApp’s Trident storage orchestrator. It stores all backends, storage classes, and volume information used by Trident. This information is necessary to create Persistent Volumes and attach them to the backend storage. But what if the data in etcd is corrupted or lost? Although the currently running pods would continue to operate normally, their volumes would no longer be manageable and no new volumes could be dynamically provisioned.

To prevent losing all this information in a catastrophe, many enterprises prefer to back up Trident’s etcd volume to restore later if necessary. Currently, there are two methods to back up etcd:

  • ONTAP Snapshot feature
  • etcdctl

In order to use the ONTAP Snapshot feature, storage administrators back up the volume by assigning Trident’s etcd volume to an appropriate snapshot policy defined in ONTAP. The volume could then be restored by the administrator using the ONTAP CLI command: volume snapshot restore.

As an alternative, Kubernetes administrators can use etcdctl to back up Trident’s etcd data store and ultimately restore the volume if required. This blog describes the procedure to back up Trident’s internal etcd by creating snapshots using etcdctl. It also covers the procedure to restore from those snapshots via etcdctl.

What is etcdctl?

The etcdctl command line utility provides access to an etcd key-value store. It is installed automatically when the etcd cluster is set up and configured. etcdctl can take a snapshot of the etcd cluster, which can later be restored using the same utility.

For backward compatibility, etcdctl uses its APIv2 to communicate with the etcd datastore by default. However, the APIv3 must be used to create and restore snapshots. Therefore, be sure to set the API to version 3 prior to creating or restoring a snapshot.
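The API version requirement is easy to forget in a script. A minimal sketch of a guard that could run before any snapshot command is shown here; `require_etcdctl_v3` is a hypothetical helper name, not part of etcdctl.

```shell
#!/bin/sh
# Sketch: refuse to continue unless etcdctl is set to API version 3.
# require_etcdctl_v3 is a hypothetical helper, not an etcdctl feature.
require_etcdctl_v3() {
    if [ "${ETCDCTL_API:-}" != "3" ]; then
        echo "ETCDCTL_API must be 3 for snapshot save/restore" >&2
        return 1
    fi
}
```

A script could then call `require_etcdctl_v3 || exit 1` before invoking `etcdctl snapshot save` or `etcdctl snapshot restore`.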

Creating etcdctl Snapshots

Create snapshots of Trident’s etcd data regularly to use as backups. Store the snapshots under /var/etcd/data, which resides on Trident’s persistent NetApp volume, so that they survive a pod restart. Check the volume periodically to be sure it does not run out of space.
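Regular snapshots accumulate, so some form of naming and retention helps keep the volume from filling up. The following is a minimal sketch, assuming timestamped file names of the form snapshot-YYYYMMDD-HHMMSS.db; `snapshot_path` and `prune_snapshots` are hypothetical helper names, and the naming scheme is an assumption rather than anything Trident requires.

```shell
#!/bin/sh
# Sketch: timestamped snapshot names plus simple retention.
# snapshot_path and prune_snapshots are hypothetical helpers.

# Print a path such as /var/etcd/data/snapshot-20190109-134523.db
snapshot_path() {
    dir=$1
    echo "$dir/snapshot-$(date +%Y%m%d-%H%M%S).db"
}

# Keep the $2 newest snapshot-*.db files in directory $1, delete the rest.
# Relies on the timestamped names sorting in chronological order.
prune_snapshots() {
    dir=$1
    keep=$2
    ls "$dir"/snapshot-*.db 2>/dev/null | sort -r | tail -n +"$((keep + 1))" |
    while IFS= read -r old; do
        rm -f -- "$old"
    done
}
```

For example, `etcdctl snapshot save "$(snapshot_path /var/etcd/data)"` followed by `prune_snapshots /var/etcd/data 7` would keep the seven most recent files, and `df -h /var/etcd/data` shows how much space remains on the volume.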

To create the snapshot, log into Trident’s etcd container. Issue the command etcdctl snapshot save /var/etcd/data/<name>.db to take a point-in-time snapshot of the etcd cluster. The example below connects to Trident’s etcd container, sets the API to version 3, and takes a snapshot of the etcd database. The snapshot file is called snapshot.db in this example.

ubuntu@demo-tme-ubuntu:~$ kubectl get pods -n trident
NAME                       READY   STATUS    RESTARTS   AGE
trident-7df76c5dcb-67h62   2/2     Running   0          47h

ubuntu@demo-tme-ubuntu:~$ kubectl -n trident exec -it trident-7df76c5dcb-67h62 -c etcd -- sh
/ # export ETCDCTL_API=3
/ # etcdctl --endpoints=http://127.0.0.1:8001/ snapshot save /var/etcd/data/snapshot.db
 
Snapshot saved at /var/etcd/data/snapshot.db

/ # exit

 

Note that port 8001 on localhost is the client endpoint of the etcd server running inside the etcd container; etcdctl connects to it to take the snapshot.
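The same snapshot can also be taken without an interactive shell, which is convenient for scheduled jobs. This is a sketch only: `trident_etcd_snapshot` is a hypothetical wrapper, and the `KUBECTL` override is an assumption added so the command can be previewed (for example with `KUBECTL=echo`) before it is run for real.

```shell
#!/bin/sh
# Sketch: take a snapshot non-interactively.
# trident_etcd_snapshot is a hypothetical helper. $1 is the Trident pod
# name, $2 the snapshot file name. Set KUBECTL=echo to print the command
# instead of running it.
trident_etcd_snapshot() {
    pod=$1
    file=$2
    ${KUBECTL:-kubectl} exec -n trident "$pod" -c etcd -- \
        sh -c "ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:8001/ snapshot save /var/etcd/data/$file"
}
```

Calling `trident_etcd_snapshot trident-7df76c5dcb-67h62 snapshot.db` runs the same etcdctl command as the interactive session above, with the API version set inline.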

etcdctl Snapshot Restore

Now that you have taken snapshots, you can restore the etcd data from a prior snapshot if necessary. To restore Trident’s etcd volume, take the following four steps:

  1. Restore a snapshot to a new directory on Trident’s volume.
  2. Uninstall Trident.
  3. Mount Trident’s data volume to an alternate host and copy the new data.
  4. Re-install Trident and verify recovery has completed successfully.

Step 1: Restore a snapshot to a new directory.

After selecting which snapshot to use, restore the snapshot to a new folder within Trident’s volume.

Attach to the etcd container within Trident’s pod. From the etcd container, use the command etcdctl snapshot restore to retrieve the older etcd data. Be sure to restore the snapshot to a new directory inside Trident’s volume. The example below attaches to the etcd container, sets the API to version 3, and restores the snapshot to the /var/etcd/data/etcd-test2 folder.

ubuntu@demo-tme-ubuntu:~$ kubectl get pods -n trident
NAME                       READY   STATUS    RESTARTS   AGE
trident-7df76c5dcb-67h62   2/2     Running   0          47h

ubuntu@demo-tme-ubuntu:~$ kubectl -n trident exec -it trident-7df76c5dcb-67h62 -c etcd -- sh
/ # export ETCDCTL_API=3
/ # etcdctl snapshot restore /var/etcd/data/snapshot.db --data-dir \
> /var/etcd/data/etcd-test2 --name etcd1 --initial-cluster \
> etcd1=http://127.0.0.1:8002 --initial-cluster-token \
> etcd1 --initial-advertise-peer-urls http://127.0.0.1:8002

2019-01-09 13:45:23.618260 I | etcdserver/membership: added member 81ddb55b4108e0de 
[http://127.0.0.1:8002] to cluster 48c4961e2c5f42dd

/ # exit

The port shown above (8002) is assigned to a specific etcd function when Trident is deployed. More information can be found in the deployment file located on GitHub.

This command creates a new ‘member’ folder under the new directory etcd-test2. This folder is the one copied into place in Step 3.
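Before moving on, it can be worth confirming that the restore produced a usable data directory. An etcd data directory keeps its state under member/snap and member/wal, so a quick check might look like the sketch below; `check_restored_dir` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: confirm a restored directory has the layout etcd expects
# (member/snap and member/wal). check_restored_dir is a hypothetical helper.
check_restored_dir() {
    dir=$1
    for sub in member/snap member/wal; do
        if [ ! -d "$dir/$sub" ]; then
            echo "missing $dir/$sub" >&2
            return 1
        fi
    done
    echo "$dir looks like a restored etcd data directory"
}
```

Inside the etcd container this would be called as `check_restored_dir /var/etcd/data/etcd-test2`.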

Step 2: Uninstall Trident.

Because the running Trident pod still has the corrupted directory mounted, we must uninstall Trident to release the mount. We will reinstall it in a later step.

After the restore is complete, uninstall Trident as shown below. Never use the -a option, because it deletes the PV and PVC that hold the restored data.

ubuntu@demo-tme-ubuntu:~$ cd trident-installer
ubuntu@demo-tme-ubuntu:~/trident-installer$ ./tridentctl uninstall -n trident

Step 3: Mount Trident’s data volume to an alternate host and copy the new data.

Next, mount Trident’s data volume on an alternate host that can also reach Trident’s backend. Be sure the NFS client utilities are installed before mounting. Once the volume is mounted on the alternate host, we can delete the old, corrupted etcd data and replace it with the version restored from the snapshot.

ubuntu@tme-ubuntu-lab:~$ sudo apt-get install -y nfs-common

Mount Trident’s volume manually on the alternate host as shown in the example below. Delete the corrupted “member” folder located under /etcd/. Then copy the new “member” folder from the restored folder, /etcd/etcd-test2, back to /etcd/.

ubuntu@tme-ubuntu-lab:~$ sudo apt-get install -y nfs-common
ubuntu@tme-ubuntu-lab:~$ sudo mkdir /etcd
ubuntu@tme-ubuntu-lab:~$ sudo mount 192.0.2.1:/trident_trident /etcd
ubuntu@tme-ubuntu-lab:~$ cd /etcd
ubuntu@tme-ubuntu-lab:/etcd$ sudo rm -rf member
ubuntu@tme-ubuntu-lab:/etcd$ sudo cp -a etcd-test2/. /etcd

Replace 192.0.2.1 with the IP of your ManagementLIF where Trident’s volume is located, and etcd-test2 with the name of your new folder.
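A more cautious variant of the copy step renames the corrupted folder instead of deleting it, so that it can still be inspected afterwards. This is a sketch under that assumption, not the procedure used in this blog; `swap_member` is a hypothetical helper, and the member.corrupt.<timestamp> naming is made up for illustration.

```shell
#!/bin/sh
# Sketch: swap in the restored "member" folder but keep the old one.
# swap_member is a hypothetical helper. $1 is the mount point of Trident's
# volume, $2 the name of the folder the snapshot was restored into.
swap_member() {
    mnt=$1
    restored=$2
    [ -d "$mnt/$restored/member" ] || { echo "no $mnt/$restored/member" >&2; return 1; }
    if [ -d "$mnt/member" ]; then
        # Move the corrupted data aside rather than removing it.
        mv "$mnt/member" "$mnt/member.corrupt.$(date +%Y%m%d-%H%M%S)" || return 1
    fi
    cp -a "$mnt/$restored/member" "$mnt/member"
}
```

With the volume mounted on /etcd as above, the call would be `swap_member /etcd etcd-test2`, run as root. The renamed folder can be removed once the recovery has been verified.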

After this is complete, unmount the Trident volume from the alternate host using the command:

ubuntu@tme-ubuntu-lab:/etcd$ cd ..
ubuntu@tme-ubuntu-lab:/$ sudo umount 192.0.2.1:/trident_trident

Step 4: Re-install Trident and verify recovery is successful.

After the copy is complete, re-install Trident. Trident will mount the recovered etcd volume.

Re-install Trident using the following command:

ubuntu@demo-tme-ubuntu:~/trident-installer$ ./tridentctl install -n trident

Verify that the restore and recovery completed successfully by making sure all the required data is present, as shown below.

ubuntu@demo-tme-ubuntu:~$ kubectl get pods -n trident
NAME                       READY   STATUS    RESTARTS   AGE
trident-7df76c5dcb-67h62   2/2     Running   1          70s

ubuntu@demo-tme-ubuntu:~$ kubectl -n trident exec -it trident-7df76c5dcb-67h62 \
-c etcd  -- sh
/ # export ETCDCTL_API=3
/ # etcdctl --endpoints=http://127.0.0.1:8001/ get --prefix ""
/trident/store
{"store_version":"etcdv3","orchestrator_api_version":"1"}
/trident/v1/backend/ontapnas_xx.xx.xx.xx
{"version":"1","config":{"ontap_config":{"version":1,"storageDriverName":"ontap-nas"..
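Reading through the full dump by eye gets tedious on a busy cluster. One rough way to compare the restored store with what you expect is to count the keys under a given prefix. The sketch below assumes the output format shown above, where each key is printed on its own line followed by its value; `count_keys` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: count the keys under a prefix in `etcdctl get --prefix ""` output.
# Keys are printed on their own lines, so lines that start with the prefix
# are keys. count_keys is a hypothetical helper that reads from stdin.
count_keys() {
    prefix=$1
    # grep -c exits non-zero when the count is 0; treat that as success.
    grep -c "^$prefix" || true
}
```

For example, `etcdctl --endpoints=http://127.0.0.1:8001/ get --prefix "" | count_keys /trident/v1/backend/` prints the number of backends in the restored store, which can be compared with the number of backends you had configured.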

In Conclusion

In this blog, we have demonstrated how to back up and restore Trident’s etcd datastore using the etcdctl command line utility. Take etcd snapshots regularly to provide additional safety in the event of a catastrophe.

If you have any questions or comments about what you’ve seen here, we’d love to hear from you! Please add a comment below, reach out to us on the #containers channel on Slack, or open a support case to let us know how we can help.

Jacob Andathethu
Technical Marketing Engineer at NetApp
A dynamic professional with over 13 years of experience working in the data storage industry (NetApp and Dell EMC).
Currently working as a Technical Marketing Engineer for Open Ecosystem Products at NetApp (Docker, Docker Swarm, Kubernetes, OpenShift).
Diane Patton
Technical Marketing Engineer at NetApp
Diane is a Technical Marketing Engineer with NetApp, in the open eco-systems group supporting Docker, Kubernetes, and OpenShift integration with NetApp products. She works with product management, marketing, and development to evangelize, support, and help drive new technologies into Trident, the open source storage provisioner for containers maintained by NetApp. Diane has over 20 years experience in the IT industry (Optical DWDM, Layer 2, Layer 3, container networking, storage), holds CCIE 2537 Emeritus, and has a BS and MS in Electrical Engineering. When not working on open technologies, you will find Diane on a tennis court.