The initial release of the BeeGFS CSI driver included everything necessary to get started using BeeGFS in Kubernetes, but it left room for improvement in a couple of key areas. The 1.1.0 version (available now) has some important enhancements:
- A robust set of automated tests built on the Kubernetes end-to-end test framework is now included in the repo. These tests are run in multiple environments on every build, increasing development efficiency and decreasing the possibility of bugs.
- BeeGFS connection-based authentication is now supported, and the Kubernetes deployment manifests use a Kubernetes Secret to maintain the safety of pre-shared keys.
- BeeGFS Storage Classes can now include directory permission and ownership configuration. This provides increased flexibility and paves the way for administrators to use the BeeGFS quotas feature if it makes sense in their environment.
As always, consult the changelog for a complete description of changes and enhancements.
Why per Storage Class ownership and permissions?
BeeGFS provides the most value when it is used as a large-scale, single-namespace, shared file system. Even in a “traditional” parallel file system environment, this can present challenges:
- File/directory ownership and permissions must be managed carefully. Mistakes can give a user or workload unintended access to confidential data (or prevent access by the intended user or workload).
- POSIX file systems don’t have a concept of directory capacity. By default, a single user or group can consume the entire capacity of a BeeGFS file system through a single directory.
This post focuses on managing BeeGFS file system permissions in Kubernetes using functionality provided by the CSI driver. This functionality also enables the integration of Storage Classes with BeeGFS quota tracking and enforcement, which can be used to alleviate capacity usage concerns. Watch for a future post on that topic.
In a traditional HPC environment using a workload manager like Slurm:
- Each job is submitted by a known user and its processes run as that user.
- Access to existing directories is limited by uid and gid according to file permissions or ACLs.
- New files and directories created by one of a job’s processes have permissions governed by the umask of that process.
Things can look quite a bit different in a Kubernetes environment. A pod like the one below can run a container as an arbitrary user or group. If no user or group is specified, many container images are configured to run as root.
    kind: Pod
    metadata:
      name: my-pod
    spec:
      securityContext:
        runAsUser: 1000   # any arbitrary value
        runAsGroup: 1000  # any arbitrary value
        fsGroup: 1000     # any arbitrary value
      ...
What, then, is the correct way for the BeeGFS CSI driver to handle permissions on the directories it creates? If they are locked down too tightly, pods deployed according to best practices (without using root containers) can’t access them. If they aren’t locked down at all, data may be unintentionally exposed to external file system users.
How to use per Storage Class ownership and permissions
When the BeeGFS CSI driver is used as a dynamic provisioner, every Persistent Volume Claim triggers the creation of a new directory under a configured base directory on a BeeGFS file system (the file system and base directory are specified by the sysMgmtdHost and volDirBasePath parameters of a Storage Class, respectively). It’s best practice for an administrator to create the base directory on the file system first and set its permissions out of band from the driver (e.g., using beegfs-ctl), but the driver will create the base directory (and any nonexistent parent directories) if necessary.
Because the driver doesn’t know at provision time which user or group needs access, in our initial release it always created directories with root:root (uid=0, gid=0) ownership and 0777 permissions. This is still the default behavior, but in 1.1.0 administrators can change any of these parameters on a per Storage Class basis. For example, the following Storage Class causes directories to be created with uid=1000, gid=1000, and drwxr-xr-x (0755) POSIX permissions. Pods with spec.securityContext.runAsUser=1000 can read, write, and execute, but other pods (as well as external file system users who might have access to the base directory) are limited to read and execute access.
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: my-storage-class
    provisioner: beegfs.csi.netapp.com
    parameters:
      sysMgmtdHost: mgmtd.host.name
      volDirBasePath: /path/to/parent/dir
      permissions/uid: "1000"
      permissions/gid: "1000"
      permissions/mode: "0755"
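As a rough sketch of how a workload might consume this class, the claim and pod below request a volume from my-storage-class and run as the matching uid and gid. The claim name, container name, image, command, mount path, and requested size are illustrative assumptions, not values taken from the driver's documentation.

```yaml
# Hypothetical claim against the Storage Class defined above.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc                        # assumed name
spec:
  accessModes:
    - ReadWriteMany                   # shared access across pods
  storageClassName: my-storage-class
  resources:
    requests:
      storage: 100Gi                  # assumed size; the PVC API requires a request
---
# Hypothetical pod that runs as the uid/gid that owns the new directory.
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  securityContext:
    runAsUser: 1000                   # matches permissions/uid
    runAsGroup: 1000                  # matches permissions/gid
  containers:
    - name: app                       # assumed container name
      image: busybox:1.36             # assumed image
      command: ["sh", "-c", "touch /mnt/beegfs/hello && sleep 3600"]
      volumeMounts:
        - name: data
          mountPath: /mnt/beegfs      # assumed mount path
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: my-pvc
```

Under these assumptions the pod's uid owns the provisioned directory, so the write should succeed. A pod running as a different non-root uid could still list and read the directory but could not create files in it.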
Obviously, pods with spec.securityContext.runAsUser=0 (root), or pods that don’t specify a runAsUser and use container images configured to run as root, can bypass all permissions-based restrictions. Although deprecated in Kubernetes v1.21, Kubernetes Pod Security Policies are one good way to ensure pods cannot run as root if desired. (Kubernetes SIG Security and SIG Auth are working on a replacement, which should be fully implemented before Pod Security Policies are removed in v1.25.) Deploying Open Policy Agent as a Validating Admission Controller is another potential strategy.
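For clusters that still support Pod Security Policies, a minimal sketch of a policy that rejects root containers might look like the following. The policy name is hypothetical, and the permissive RunAsAny rules and volume list are placeholders that a real cluster would likely tighten. A policy like this only takes effect when the PodSecurityPolicy admission controller is enabled and the policy is authorized for the pod's user or service account through RBAC.

```yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: non-root-example          # hypothetical name
spec:
  privileged: false
  runAsUser:
    rule: MustRunAsNonRoot        # reject pods that would run as uid 0
  # The remaining rules are required by the API; RunAsAny is a placeholder.
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  volumes:                        # assumed allow-list of volume types
    - persistentVolumeClaim
    - configMap
    - secret
    - emptyDir
```

With MustRunAsNonRoot, a pod must either set a non-zero runAsUser or use an image that declares a numeric non-root user; otherwise its containers are not allowed to start.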
Why not use fsGroup instead?
Many Kubernetes storage plugins rely on the fsGroup field to mitigate permissions issues. When mounting a volume that supports it, Kubernetes recursively changes the ownership of all files and directories within the volume to match the provided fsGroup and adds that fsGroup to the container processes as a supplemental gid. For example, if fsGroup is set to 1000, all files and directories within a mounted volume are assigned a gid of 1000 and all container processes can access them. This behavior makes sense for certain volume types (it is enabled by default for ReadWriteOnce volumes served up by most plugins), but the intended scale and shared nature of BeeGFS deployments pose several challenges:
- The operation takes a long time for directory trees with many files, and until Kubernetes 1.20 it would occur every time a volume was mounted, whether necessary or not.
- Multiple pods within the same Deployment, or even multiple Deployments, might access the same volume simultaneously, exacerbating the above issue.
- Unexpected permissions/ownership changes within a shared BeeGFS file system may be confusing to administrators and detrimental to security. This is especially true in the static provisioning workflow, where a BeeGFS directory created and maintained outside of Kubernetes is used inside of Kubernetes.
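To make the first point concrete: for volume types where fsGroup is applied, Kubernetes 1.20 and later honor a pod-level fsGroupChangePolicy field by default. The sketch below, with hypothetical pod, image, and claim names, asks Kubernetes to skip the recursive change when the volume root already has the expected ownership and permissions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: fsgroup-example               # hypothetical name
spec:
  securityContext:
    fsGroup: 1000
    # Skip the recursive ownership change when the volume root
    # already matches; the default value is Always.
    fsGroupChangePolicy: OnRootMismatch
  containers:
    - name: app                       # assumed container name
      image: busybox:1.36             # assumed image
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data
          mountPath: /data            # assumed mount path
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: some-rwo-claim     # hypothetical claim on a volume type that supports fsGroup
```

This setting only reduces how often the recursive walk happens. It does nothing about the shared-access and unexpected-ownership concerns listed above.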
For these reasons, the BeeGFS CSI driver intentionally disables fsGroup functionality in its default deployment manifests using the fsGroupPolicy field.
    apiVersion: storage.k8s.io/v1
    kind: CSIDriver
    metadata:
      name: beegfs.csi.netapp.com
    spec:
      attachRequired: false
      fsGroupPolicy: None
      volumeLifecycleModes:
        - Persistent
NOTE: The fsGroupPolicy field does not exist in Kubernetes <1.19 and is ignored in Kubernetes 1.19 and 1.20 when the CSIVolumeFSGroupPolicy feature gate is not enabled. In clusters where the field has no effect, the default fsGroup behavior described above still applies to ReadWriteOnce volumes, so it is a good idea to use ReadWriteMany or ReadOnlyMany volumes in most current clusters.
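For contrast, fsGroupPolicy accepts two values besides None: ReadWriteOnceWithFSType (the Kubernetes default when the field is omitted) and File. A hypothetical block-storage driver that wants the default fsGroup handling could declare it explicitly, as sketched here. The driver name and attachRequired value are illustrative assumptions and do not describe any real driver.

```yaml
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: block.csi.example.com         # hypothetical driver name
spec:
  attachRequired: true                # assumed for a block-storage driver
  # Apply fsGroup only to ReadWriteOnce volumes that define a file
  # system type; this is what Kubernetes assumes when the field is omitted.
  fsGroupPolicy: ReadWriteOnceWithFSType
```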
Until next time…
The BeeGFS CSI driver continues to evolve. Check back in late August for another update, and until then, remember to visit netapp.com/ai to learn more about this and other NetApp AI and HPC solutions.