Ever tried to run the NetApp docker volume plugin (nDVP) with Docker Swarm? It’s fairly easy. But what if you want to do this on AWS and use ONTAP Cloud as storage? Let’s have a look.

Docker on AWS

First, what are the options to run Swarm on AWS?

An obvious way would be to deploy some Linux EC2 instances and create a Swarm cluster the traditional way. But Docker’s recommendation looks different:

With Docker for AWS they provide AWS CloudFormation templates to easily provision a Swarm cluster. Enter some parameters and it automatically deploys a cluster on EC2. In my tests it deployed Docker CE for AWS 17.09.0-ce.

From here on, installing nDVP should be fairly easy. On a traditional Linux. But this is no traditional Linux.

Enter LinuxKit

Instead of deploying a typical Linux like Ubuntu, Red Hat, or Amazon Linux, the template deploys a LinuxKit-based OS, which is Docker’s intended way forward. If you are not aware of it, I recommend doing some research on Moby and its sub-projects like LinuxKit, containerd, etc. In short, LinuxKit is a hardened, very stripped-down, immutable Linux which runs everything (even system processes) as containers.

If you log in to such a system, you don’t end up on the Docker host itself, but in a management container:

~ $ docker ps
 CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
 c299280b515f docker4x/l4controller-aws:17.09.0-ce-aws1 ...
 43c7de9fd299 docker4x/meta-aws:17.09.0-ce-aws1 ...
 ec36f58b0e55 docker4x/guide-aws:17.09.0-ce-aws1 ...
 9d1fb5462b8e docker4x/shell-aws:17.09.0-ce-aws1 ...
~ $

shell-aws is the container you end up in when you SSH into a Swarm manager.

And no, you cannot access the host’s filesystem from it. This is the point where the nDVP installation instructions break, since you cannot create the configuration file. I will show you how to get it running anyway, but first, let’s get our persistent cloud storage configured. Look here for why you would want to use ONTAP.

Installing ONTAP Cloud

First, provision an ONTAP Cloud (OTC) system in the same VPC as your Swarm. If you are unfamiliar with how to do this, start with this video or follow the quick installation instructions. For a more detailed look at how to use OnCommand Cloud Manager (OCCM) to install ONTAP Cloud, see here.

After the installation is done, use OCCM to find out the cluster management IP and the NFS data IP of your storage.

Where to find LIF IPs in OnCommand Cloud Manager.

Configuring ONTAP Cloud

First, we create an export policy for our Docker hosts. Since OTC only has private IPs, we use one of the Swarm hosts as a jump host:

# ssh -i <your AWS keyfile> docker@<any swarm manager public IP>
Welcome to Docker!
~ $ ssh admin@10.160.7.114 # the cluster management IP
Password:
okdocker01::> export-policy create -policyname dockerPolicy
okdocker01::> export-policy rule create -policyname dockerPolicy -clientmatch <VPC CIDR> -rorule sys -rwrule sys -protocol nfs -superuser sys 

You might want to narrow the clientmatch, or create individual rules per Swarm host. But keep in mind that the Swarm is configured with an AWS autoscaling group, which might spin up additional hosts. If your clientmatch doesn’t account for them, those hosts will run into “permission denied” errors when they try to mount a volume.
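If you go for one rule per host, the rule commands can be generated from a list of private IPs. This is a minimal sketch: the function name export_rule_cmds is hypothetical, the IPs in the example are made up, and it only prints the ONTAP commands (with the same options as the rule above) so you can paste them into the cluster shell.

```shell
# Hypothetical helper: print one "export-policy rule create" command per host IP.
# It reuses the options from the rule above and only prints; nothing is executed.
export_rule_cmds() {
    policy="$1"
    shift
    for ip in "$@"; do
        printf 'export-policy rule create -policyname %s -clientmatch %s -rorule sys -rwrule sys -protocol nfs -superuser sys\n' "$policy" "$ip"
    done
}

# Example with made-up private IPs:
# export_rule_cmds dockerPolicy 10.160.7.31 10.160.7.32
```

Keep in mind that this list goes stale as soon as the autoscaling group adds a host, which is why the VPC-wide clientmatch is the more forgiving choice.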

Create a service user for nDVP as described in the documentation. I will use the system’s admin user for now.

Installing nDVP

Now comes the tricky part. We need to create the nDVP config file in the host’s /etc/netappdvp/, which the management container cannot access. But we can solve it the Docker way and create a container which mounts /etc from the host:

~ $ docker run --rm -ti -v /etc:/data alpine:latest ash
/ # mkdir -p /data/netappdvp
/ # cat << EOF > /data/netappdvp/config.json
> {
>  "version": 1,
>  "storageDriverName": "ontap-nas",
>  "managementLIF": "10.160.7.114",
>  "dataLIF": "10.160.7.22",
>  "svm": "svm_okdocker01",
>  "username": "admin",
>  "password": "superpassword",
>  "aggregate": "aggr1",
>  "nfsMountOptions": "-o nfsvers=3,nolock",
>  "defaults": {
>   "size": "10G",
>   "spaceReserve": "none",
>   "exportPolicy": "dockerPolicy"
>  }
> }
> EOF
/ # exit

You need to adjust managementLIF, dataLIF, svm, username, and password according to your environment. The only special lines in this config file are exportPolicy, which names the export policy we created above, and nfsMountOptions.
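If you would rather not type the file interactively, the same config can be rendered from shell variables. This is only a sketch under assumptions: the function name render_ndvp_config and the variable names (MGMT_LIF, DATA_LIF, SVM, ONTAP_USER, ONTAP_PASS, AGGREGATE, EXPORT_POLICY) are made up for illustration, and no JSON escaping is done, so values must not contain double quotes or backslashes.

```shell
# Hypothetical helper: print an nDVP config built from shell variables.
# Assumption: values contain no double quotes or backslashes (no JSON escaping).
render_ndvp_config() {
    cat <<EOF
{
  "version": 1,
  "storageDriverName": "ontap-nas",
  "managementLIF": "${MGMT_LIF}",
  "dataLIF": "${DATA_LIF}",
  "svm": "${SVM}",
  "username": "${ONTAP_USER}",
  "password": "${ONTAP_PASS}",
  "aggregate": "${AGGREGATE}",
  "nfsMountOptions": "-o nfsvers=3,nolock",
  "defaults": {
    "size": "10G",
    "spaceReserve": "none",
    "exportPolicy": "${EXPORT_POLICY}"
  }
}
EOF
}

# Possible usage on a Swarm host, writing through a helper container:
# render_ndvp_config | docker run --rm -i -v /etc:/data alpine:latest \
#     sh -c 'mkdir -p /data/netappdvp && cat > /data/netappdvp/config.json'
```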

nfsMountOptions is required here. The LinuxKit host doesn’t start rpc.lockd or rpc.statd, which are required for locking with NFSv3. Without the “nolock” option, mounting/attaching nDVP volumes will fail, due to missing rpc.statd. I found no way to start rpc.statd on this deployment, since the host OS is so well guarded. With “nolock” we simply disable NFS locking; the host now manages locks locally without propagating them.

Locking in NFSv3 is a story of its own. Since you normally don’t attach a volume to multiple hosts, you should be fine. If you do, beware that you cannot lock files and processes might concurrently write to the same file.
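Because a missing “nolock” only shows up later as a failed mount, a quick pre-flight check on the config file can save some head-scratching. A minimal sketch, assuming the hypothetical function name config_has_nolock and that nfsMountOptions sits on a single line, as in the config above:

```shell
# Hypothetical pre-flight check: succeed only if the nfsMountOptions line
# of the given config file contains "nolock".
config_has_nolock() {
    grep '"nfsMountOptions"' "$1" | grep -q 'nolock'
}

# Possible usage inside the helper container that mounts the host's /etc:
# config_has_nolock /data/netappdvp/config.json || echo "nolock is missing"
```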

Now we can install the plugin on the docker host:

~ $ docker plugin install --grant-all-permissions --alias netapp netapp/ndvp-plugin:latest config=config.json
latest: Pulling from netapp/ndvp-plugin
92601a6e915c: Download complete 
Digest: sha256:26e1826c4e534f5b87028027780abf9160e2eb5e57153b35cac514a3e69ddd2d
Status: Downloaded newer image for netapp/ndvp-plugin:latest
Installed plugin netapp/ndvp-plugin:latest
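
To confirm that the plugin is not only installed but also enabled, the plugin list can be checked from a script. This is a sketch, not part of the nDVP instructions: the function name plugin_enabled is hypothetical, and it assumes your Docker version supports the --format option of docker plugin ls with the .Name and .Enabled fields.

```shell
# Hypothetical helper: read "name enabled" pairs on stdin and succeed if the
# given plugin name is listed with enabled = true.
plugin_enabled() {
    want="$1"
    while read -r name enabled; do
        if [ "$name" = "$want" ] && [ "$enabled" = "true" ]; then
            return 0
        fi
    done
    return 1
}

# Possible usage on a Swarm host (the alias shows up as "netapp:latest"):
# docker plugin ls --format '{{.Name}} {{.Enabled}}' | plugin_enabled netapp:latest \
#     && echo "nDVP is enabled"
```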

Using nDVP

Finally we can create and use docker volumes, stored securely on an ONTAP NFS backend:

~ $ docker volume create -d netapp --name myVol
myVol
~ $ docker run --rm -ti -v myVol:/data alpine:latest ash
/ # touch /data/test01

Last but not least: clone a volume using FlexClone. It is instant, space-efficient (only changed data consumes storage), and performance-neutral.

~ $ docker volume create -d netapp --name myClone -o from=myVol
myClone
~ $ docker run --rm -ti -v myClone:/data alpine:latest ash
/ # ls /data
test01
~ $ docker volume ls
DRIVER VOLUME NAME
netapp:latest myClone
netapp:latest myVol
local sshkey

Next steps

Installing nDVP on a Docker Swarm – deployed via Docker’s CloudFormation template in AWS – requires some tweaks, but works well.

The next step is to install nDVP on all hosts of the Swarm. This can be done manually, but remember, it is an autoscaling group and hosts can appear or disappear. New hosts will be missing nDVP and will not be able to access the volumes until nDVP is installed. Not a desirable thing in a Swarm cluster.

It makes more sense to include the installation of nDVP into the automated deployment process.
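
Whatever the automation ends up looking like, the install step should be safe to run more than once. A rough sketch of such a step, assuming the hypothetical function name install_ndvp_if_missing, that /etc/netappdvp/config.json already exists on the host, and that docker plugin ls supports --format on your version:

```shell
# Hypothetical helper: install the plugin only if no plugin named
# "netapp:latest" is present yet. Assumes the config file is already in place.
install_ndvp_if_missing() {
    if docker plugin ls --format '{{.Name}}' | grep -qx 'netapp:latest'; then
        echo "nDVP already installed"
    else
        docker plugin install --grant-all-permissions --alias netapp \
            netapp/ndvp-plugin:latest config=config.json
    fi
}
```

How this gets triggered on new hosts is left open here; it only shows the idempotent check around the install command used above.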

But that is another story and shall be told another time …

If you have any questions, please reach out using the comments below or the Slack team.

EDIT: Replaced the link to the ONTAP Cloud installation with an updated one.

Oliver Krause
Cloud Solution Architect at NetApp Deutschland GmbH
Nerd doing computer stuff for over 20 years. Ex-NAS expert, Ex-ONTAP expert, Ex-MetroCluster expert, ex-many other stuff. Now doing Clouds. The water and the server ones. My shell is my castle.