In our version 1.1.0 introductory post, we described how the new per Storage Class ownership and permissions configuration feature works in the BeeGFS CSI driver. While this feature is certainly useful on its own, one of its primary motivations was enabling BeeGFS quota tracking and enforcement.
The reason for quotas
BeeGFS provides the most value when it is used as a large-scale, single-namespace, shared file system. HPC and AI applications benefit greatly from a fully POSIX-compliant file system that can scale virtually indefinitely in both capacity and performance. However, this architecture does have its drawbacks. There is no POSIX-compliant way to divide up a single file system into pieces with defined capacity. For example, you can’t generally say, “This directory can hold 5 TB before it’s full and this other directory can only hold 2 TB.” Since all users and groups share the same file system, by default, one user can continue to fill a single directory they have access to until the entire file system is consumed.
This problem is obviously not new, nor is it specific to BeeGFS. To combat it, XFS, ext4, ZFS, and other file systems provide various quota management features and utilities. For example, the xfs_quota tool can be used to limit the disk space (blocks) and number of files (inodes) consumed by a single user, group, or project, in a properly mounted XFS file system. There are a number of ways to configure quotas (e.g. hard versus soft limits), but ultimately all configurations result in creations/writes by an offending user/group being met with a “disk quota exceeded” message and rejected.
BeeGFS is a distributed file system typically built on top of some number of underlying XFS, ext4, or ZFS mounts. Its quota tracking and enforcement features aggregate the data provided by the underlying mounts to enable file system-wide quota management.
Enabling basic user/group quotas
It’s fairly straightforward to get started with BeeGFS quotas in Kubernetes, but there are some caveats to a simplistic approach that limit its effectiveness. Before BeeGFS quotas can be used, they must be enabled on all BeeGFS server nodes. Client support is also required. Enable this support by setting the quotaEnabled BeeGFS client configuration parameter to true in the configuration file when deploying the BeeGFS CSI driver, as in the example below:
config:
  beegfsClientConf:
    quotaEnabled: true  # could be enabled here for ALL BeeGFS file systems
fileSystemSpecificConfigs:
  - sysMgmtdHost: mgmtd.host.name
    config:
      beegfsClientConf:
        quotaEnabled: true  # could be enabled here for a specific BeeGFS file system
From a BeeGFS-enabled workstation (one with beegfs-ctl installed and network access to the appropriate BeeGFS file system), use beegfs-ctl to set disk or file-based limits for a particular user or group. For example, use the following command to limit the group with gid=1000 to 5 GB of space and 500 files:
beegfs-ctl --setquota --gid=1000 --sizelimit=5G --inodelimit=500
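After setting a limit, it can be useful to confirm what BeeGFS has recorded for the group. The sketch below assumes beegfs-ctl is on the PATH and that its --getquota mode accepts a gid as shown. The show_group_quota function name is made up for this example, and the function only prints a notice when beegfs-ctl is unavailable.

```shell
#!/bin/sh
# show_group_quota is a hypothetical helper name, not part of BeeGFS.
# It prints the usage and limits BeeGFS reports for a single gid.
show_group_quota() {
    gid="$1"
    if command -v beegfs-ctl >/dev/null 2>&1; then
        # Assumes quota tracking is already enabled on the BeeGFS servers.
        beegfs-ctl --getquota --gid "$gid"
    else
        echo "beegfs-ctl not found; run this from a BeeGFS-enabled workstation"
    fi
}

show_group_quota 1000
```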
Kubernetes-specific challenges
As we mentioned in our previous post, Kubernetes presents some specific quota-related challenges. Most importantly, a Pod can run as any arbitrary user or group, and many container images are configured to run as root by default. A Pod like the one below would be appropriately limited by the quota set above, but it’s not always a good idea to rely on users to police themselves (e.g. to set the right runAsUser or runAsGroup for their Pod).
kind: Pod
metadata:
  name: my-pod
spec:
  securityContext:
    runAsGroup: 1000  # container processes run with this group
  ...
Kubernetes Pod Security Policies, which were deprecated in v1.21 and removed in v1.25, can be one good way to ensure Pods run as a particular user or group on clusters that still support them. Deploying Open Policy Agent as a validating admission controller is another potential strategy.
Forcing containers to run as a particular user/group and tying quotas to that user/group works, but it often doesn’t make sense. Many container images expect to run as a particular user, and user management via POSIX uids and gids isn’t all that “Kubernetes-native.” An alternative approach is Storage Class project directory quota tracking.
Enabling Storage Class-based project directory quotas
Many local POSIX file systems provide per-project or per-directory quota features, where a single directory (tree) is tracked instead of a user or group. The BeeGFS quota management mechanism is strictly user and group-based, but the documentation describes how BeeGFS quotas can be used to emulate a per-project mechanism. There are essentially three steps:
- Designate a particular gid to represent the “project”.
- Create a new directory for the project owned by its designated gid.
mkdir /mnt/beegfs/project01
chown :1000 /mnt/beegfs/project01
- Enable the setgid flag on the new directory.
chmod g+s /mnt/beegfs/project01
The setgid flag ensures all files and subdirectories created within the project directory have the same group ownership (and that subdirectories propagate the setgid flag), regardless of the user/group that creates them. As a result, all files and subdirectories can be managed together using BeeGFS quotas.
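The inheritance behavior described above can be observed on any Linux file system, not just BeeGFS. The following sketch works in a scratch directory created with mktemp (an assumption made so it can be tried safely without touching /mnt/beegfs) and checks that a new subdirectory picks up the setgid flag.

```shell
#!/bin/sh
# Demonstrates setgid inheritance in a scratch directory (not on BeeGFS).
project_dir="$(mktemp -d)"

# Enable the setgid flag on the stand-in "project" directory.
chmod g+s "$project_dir"

# Create a subdirectory and a file the way a workload might.
mkdir "$project_dir/dir1"
touch "$project_dir/file1"

# test -g succeeds when the setgid flag is set on the given path.
if [ -g "$project_dir/dir1" ]; then
    echo "dir1 inherited the setgid flag"
fi

# The numeric group column of each entry matches the parent's group.
ls -ldn "$project_dir"
ls -ln "$project_dir"
```

In a scratch directory the parent's group is usually your own primary group, so the group column looks unremarkable. The effect becomes visible when the project directory is owned by a different, designated gid.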
The 1.1.0 release of the BeeGFS CSI driver introduces the ability to set ownership and permissions within a Storage Class. To track and enforce quotas on a per Storage Class basis, apply a designated project gid to the storage class and set permissions so that the setgid flag is enabled. The following Storage Class enables gid=1000 quota tracking and enforcement, even when Pods consuming provisioned Persistent Volumes run as an arbitrary user or group.
NOTE: In octal notation, a digit preceding the standard POSIX user, group, and other permissions designates special modes. The “2” in “2777” enables the setgid flag.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-beegfs-dyn-sc
provisioner: beegfs.csi.netapp.com
parameters:
  sysMgmtdHost: mgmtd.host.name
  volDirBasePath: /project01  # create PVC directories with a /project01/ prefix
  permissions/uid: "0"  # create PVC directories with root ownership
  permissions/gid: "1000"  # create all files and subdirectories with gid=1000
  permissions/mode: "2777"  # enable Pods running as any user or group
reclaimPolicy: Delete
volumeBindingMode: Immediate
allowVolumeExpansion: false
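A workload consumes this Storage Class through an ordinary Persistent Volume Claim. The sketch below writes an example manifest to a scratch file. The claim name my-project-pvc and the 100Gi request are placeholders chosen for illustration, and the sketch assumes the driver does not enforce the requested size as a capacity limit, which is why the gid quota matters.

```shell
#!/bin/sh
# Writes an example PVC manifest to a scratch file.
# my-project-pvc and 100Gi are placeholder values, not taken from the driver docs.
manifest="$(mktemp)"
cat > "$manifest" <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-project-pvc
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 100Gi  # required by the Kubernetes API; assumed not enforced by the driver
  storageClassName: csi-beegfs-dyn-sc  # the Storage Class defined above
EOF

echo "Wrote $manifest"
# With cluster access, the claim could then be created with:
#   kubectl apply -f "$manifest"
```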
Examining the file system from a BeeGFS-enabled workstation after deploying a couple of workloads yields the following:
--> ls -Rln /mnt/beegfs/project01/
/mnt/beegfs/project01/:
drwxrwsrwx 1 0 1000 4096 May 19 12:41 pvc-db7a3ec0
drwxrwsrwx 1 0 1000 4096 May 19 12:41 pvc-e65q1v65

/mnt/beegfs/project01/pvc-db7a3ec0:
drwxr-sr-x 1 0 1000 4096 May 19 12:41 dir1
-rw-r--r-- 1 0 1000    0 May 19 12:27 file1
-rw-r--r-- 1 0 1000    0 May 19 12:27 file2

/mnt/beegfs/project01/pvc-e65q1v65:
drwxr-sr-x 1    0 1000 4096 May 19 12:41 dir1
-rw-rw-r-- 1 1000 1000    0 May 19 12:28 file1
-rw-rw-r-- 1 1000 1000    0 May 19 12:28 file2
There are a few key takeaways here:
- All directories have the setgid flag enabled (“s” in “drwxrwsrwx” or “drwxr-sr-x”).
- pvc-db7a3ec0 has been used by a container running as root, but its created files and directories have gid=1000 group ownership.
- pvc-e65q1v65 has been used by a container with spec.securityContext.runAsUser=1000 and its created files and directories also have gid=1000 group ownership.
- All files and directories created by containers within PVC directories have permissions consistent with the container process umask. The setgid flag ensures proper group ownership but has no additional, unintended effect.
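To connect the “2” in “2777” with the “s” shown in the listing, the following sketch splits a four-digit octal mode into its special-mode digit and its ordinary permission bits using plain shell arithmetic. The variable names are chosen for this example only.

```shell
#!/bin/sh
# Split a four-digit octal mode into its special-mode digit and permission bits.
mode=2777

special=$(( 0$mode >> 9 ))           # leading digit: 4=setuid, 2=setgid, 1=sticky
perms=$(( 0$mode & 0777 ))           # user, group, and other permission bits
setgid=$(( (0$mode & 02000) != 0 ))  # 1 when the setgid flag is set

printf 'special digit: %d\n' "$special"
printf 'permission bits: %o\n' "$perms"
printf 'setgid enabled: %d\n' "$setgid"
```

For mode 2777 this reports a special digit of 2, permission bits of 777, and the setgid flag enabled, which corresponds to the “drwxrwsrwx” string on the PVC directories.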
Now BeeGFS quota tracking and enforcement can be applied as necessary to gid=1000. Kubernetes workloads using this Storage Class cannot consume more than their fair share of file system resources.
It’s worth noting that setting “2777” permissions in this Storage Class has the potential to open a security hole, as it could give external file system users access to PVC directories. To mitigate this, either:
- Pre-create the project01 directory on the BeeGFS file system with selective ownership and reduced permissions (root:root and 0700 make it impossible for any user but root to navigate to the PVC directories), or
- Set more restrictive ownership and permissions in the Storage Class and ensure Pods run as a user or group with appropriate access.
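The first option might look something like the sketch below. BEEGFS_MOUNT is an assumed variable that defaults to a scratch directory so the commands can be tried safely. On a real system it would point at the BeeGFS mount (for example /mnt/beegfs), and the directory would need to exist before any volumes are provisioned under it.

```shell
#!/bin/sh
# BEEGFS_MOUNT is an assumed variable; it defaults to a scratch directory so
# this sketch does not touch a real file system unless told to.
BEEGFS_MOUNT="${BEEGFS_MOUNT:-$(mktemp -d)}"
project_dir="$BEEGFS_MOUNT/project01"

# Create the parent of the PVC directories with owner-only access.
mkdir -p "$project_dir"
chmod 0700 "$project_dir"

# Changing ownership to root:root (0:0) requires root privileges.
if [ "$(id -u)" -eq 0 ]; then
    chown 0:0 "$project_dir"
else
    echo "not running as root; skipping chown 0:0"
fi

ls -ldn "$project_dir"
```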
We could write a novel on how to tune ownership and permissions for BeeGFS quota scenarios with the BeeGFS CSI driver in Kubernetes, but we likely still wouldn’t hit on your specific access needs and use case. Hopefully this post has armed you with the basic information you need to get started, but feel free to reach out to firstname.lastname@example.org with directed questions or open an issue on our GitHub page with specific concerns. Additional information and examples are available in our quotas documentation as well. As always, remember to visit netapp.com/ai to learn more about this and other NetApp AI and HPC solutions.