If you haven’t been living under a rock, you have probably heard about Prometheus and Grafana. Both are great tools for building a monitoring framework that is visually descriptive and powerful. Trident provides a set of Prometheus metrics that you can use to gain insight into its performance and the various entities it manages: backends, volumes created, space allocated, Storage Classes managed, and a lot more. With 20.07, Trident also provides users with per-volume usage information: the total space allocated, the space available, and the space used per volume. This blog will show you how this works.
Configuration
The best place to get started is an earlier blog I wrote: https://netapp.io/2020/02/20/prometheus-and-trident/. I already have a Kubernetes cluster with Prometheus and Grafana installed. In addition, Trident 20.07 is installed using the instructions found here. There is nothing customized about the Trident install; the metrics are returned right out of the box.
$ tridentctl version -n trident
+----------------+----------------+
| SERVER VERSION | CLIENT VERSION |
+----------------+----------------+
| 20.07.1        | 20.07.1        |
+----------------+----------------+
Kubelet exposes per-volume metrics that you can now use to identify how much space is being used per volume. To enable this, set the volumeStatsAggPeriod parameter to a non-zero value. Depending on how you install kubelet, it should be as easy as setting volumeStatsAggPeriod in your kubelet config file. I am working with kubeadm, so I had to update my /var/lib/kubelet/config.yaml.
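For reference, here’s a minimal excerpt of what that setting looks like, assuming the KubeletConfiguration format that kubeadm writes to /var/lib/kubelet/config.yaml (your file will contain many more fields; volumeStatsAggPeriod is the only one relevant here):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Aggregate per-volume stats every minute; any non-zero duration enables collection.
volumeStatsAggPeriod: 1m0s

Restart kubelet afterwards (for example, systemctl restart kubelet) so the change takes effect.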
There are different ways you can access your Prometheus and Grafana dashboards, depending on your environment and how you install them. In my case, I used the Prometheus operator, and here’s how I did it:
$ kubectl create ns monitoring
$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm repo update
$ helm install prometheus prometheus-community/kube-prometheus-stack -f custom-values.yaml -n monitoring
custom-values.yaml looks like this:
prometheus:
  service:
    type: NodePort
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: storage-class-nas
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 2Gi
grafana:
  adminPassword: "admin-password"
  enabled: true
  defaultDashboardsEnabled: true
  persistence:
    enabled: true
    type: pvc
    size: 1Gi
    storageClassName: storage-class-nas
    accessModes: ["ReadWriteOnce"]
  service:
    type: NodePort
    nodePort: 30093
This will expose the Prometheus UI on port 30090 and the Grafana UI on port 30093, and store their data on PVCs provisioned by Trident.
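Before crafting queries, it’s worth a quick sanity check that Trident is actually serving metrics. The sketch below assumes the default trident-csi service in the trident namespace exposing its metrics port on 8001 (the 20.07 default); adjust the names if your install differs:

$ kubectl port-forward -n trident svc/trident-csi 8001:8001
$ curl -s http://localhost:8001/metrics | grep trident_volume_count

Run the curl from a second terminal while the port-forward is active; you should see Trident’s counters in the usual Prometheus exposition format.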
PromQL: your new best friend
There’s a great number of queries that you can craft with PromQL. Here are some of my favorites, starting with some simple ones:
1. Fetch the total number of volumes per backend:
sum(trident_volume_count) by (backend_uuid)
2. Used bytes per PVC. The first query below returns the used bytes per PVC across all namespaces; the second narrows it down to a single namespace:
(kubelet_volume_stats_used_bytes)/(1024*1024*1024)
(kubelet_volume_stats_used_bytes{namespace="default"})/(1024*1024*1024)

3. Extending (1) to get backend information, along with the number of volumes per backend:
trident_volume_count * on (backend_uuid) group_left (backend_name, backend_type) trident_backend_info or on (backend_uuid) trident_volume_count
There’s a lot happening here so let’s break this down:
- The * operator is effectively a join on two metrics (trident_volume_count and trident_backend_info). The key used for the join is backend_uuid.
- To also include labels that are not available in trident_volume_count, group_left is used. So here, I’m getting backend_name and backend_type from trident_backend_info.
- The or operator is there to fetch backends that do not have any volumes yet (empty sets).
4. Observe the amount of space allocated per backend over time:
(sum(trident_volume_allocated_bytes) by (backend_uuid)) / (1024*1024*1024) * on (backend_uuid) group_left(backend_name) trident_backend_info
5. Find the rate of ONTAP operations, by operation/SVM. For example, here’s what the autosupport call rate looks like:
rate(trident_ontap_operation_duration_in_milliseconds_by_svm_sum{op="ems-autosupport-log"}[5m])
The rate function calculates the per-second average rate of increase of the counter over the range specified. You may want to use a larger range to average over a longer timeframe.
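For example, widening the range from 5 minutes to 30 minutes (an arbitrary choice, just for illustration) averages the same query over a longer window and smooths out short-lived spikes:

rate(trident_ontap_operation_duration_in_milliseconds_by_svm_sum{op="ems-autosupport-log"}[30m])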
Insight is right around the corner
There are loads of great sessions at this year’s Insight, which will be an all-virtual event. The best part? Registration is free. You can sign up here. Be sure to be on the lookout for Trident sessions, a few of which are listed below:
BREAKOUT SESSIONS:
- [BRK-1170-2] Operator, Operator: A New Way to Automatically Manage Trident
- [BRK-1273-2] K8s and Cloud Volumes ONTAP – a Match Made in the Cloud
- [BRK-1281-3] Take a Deep Dive into Red Hat OpenShift on NetApp HCI
CUSTOMER BREAKOUT SESSIONS:
- [BRK-1565-2] The Road to Stateful Applications on Kubernetes at Yahoo! Japan
- [BRK-1462-2] Provisioning Containers with NetApp Storage via Trident Anywhere [Tractor Supply Company]
- [BRK-1499-2] SK Telecom All Container Orchestrator with NetApp Trident
SPEED BREAKOUT SESSIONS:
- [SPD-1115-2] Getting Results Quickly with AI Inference on NetApp HCI
- [SPD-1171-2] Monitoring in the Kubernetes Era: Prometheus and Trident
DEMOS:
- [DEM-1191-3] Trident Demo: Metrics
- [DEM-1116-2] NetApp IT Automation Use Case for Trident, OpenShift and Ansible
Questions? Comments? Stay in touch with us on Slack! Join our Slack workspace and hang out at the #containers channel.