If you haven't been living under a rock, you have probably heard about Prometheus and Grafana. Both are great tools for implementing a monitoring framework that is visually descriptive and powerful. Trident provides a set of Prometheus metrics that you can use to gain insight into its performance and the various entities it manages: backends, volumes created, space allocated, Storage Classes managed, and a lot more. With 20.07, Trident also provides users with per-volume usage information: the total space allocated, the amount of space available, and the space used per volume. This blog will show you how this works.

Configuration

The best place to get started is an earlier blog I wrote: https://netapp.io/2020/02/20/prometheus-and-trident/. I already have a Kubernetes cluster with Prometheus and Grafana installed. In addition, Trident 20.07 is installed using the instructions found here. There is nothing customized about the Trident install; the metrics are returned right out of the box.

$ tridentctl version -n trident
+----------------+----------------+
| SERVER VERSION | CLIENT VERSION |
+----------------+----------------+
| 20.07.1        | 20.07.1        |
+----------------+----------------+
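
Before wiring anything up to Prometheus, you can sanity-check that Trident is serving metrics. Here is one way to do it, assuming the default metrics port of 8001 is exposed on the trident-csi service (verify the port and service name in your install with kubectl get svc -n trident):

$ kubectl port-forward -n trident svc/trident-csi 8001:8001 &
$ curl -s http://localhost:8001/metrics | grep trident_volume_count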

 

Kubelet exposes per-volume metrics that you can now scrape to identify how much space is being used per volume. To get them, you will need to set the volumeStatsAggPeriod parameter to a non-zero value. Depending on how you install kubelet, it should be as easy as setting volumeStatsAggPeriod in your kubelet config file. I am working with kubeadm, so I had to update "/var/lib/kubelet/config.yaml".
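
For reference, here is a minimal sketch of the relevant stanza, assuming a kubeadm-style install where the config lives at /var/lib/kubelet/config.yaml (merge it into your existing file rather than replacing it, and restart the kubelet afterwards):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# collect per-volume filesystem stats every minute; any positive duration enables the kubelet_volume_stats_* metrics
volumeStatsAggPeriod: 1m

$ sudo systemctl restart kubelet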

There are different ways you can access your Prometheus and Grafana dashboards, depending on your environment and how you installed them. In my case, I used the Prometheus operator, and here's how I did it:

$ kubectl create ns monitoring
$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm repo update
$ helm install prometheus prometheus-community/kube-prometheus-stack -f custom-values.yaml -n monitoring

 

custom-values.yaml looks like this:

prometheus:
  service:
    type: NodePort
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: storage-class-nas
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 2Gi
grafana:
  adminPassword: "admin-password"
  enabled: true
  defaultDashboardsEnabled: true
  persistence:
    enabled: true
    type: pvc
    size: 1Gi
    storageClassName: storage-class-nas
    accessModes: ["ReadWriteOnce"]
  service:
    type: NodePort
    nodePort: 30093

This will expose the Prometheus UI on port 30090 and the Grafana UI on port 30093, and store their data on PVCs provisioned by Trident.
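
If your Prometheus instance does not pick up Trident's metrics automatically, you can point it at the trident-csi service with a ServiceMonitor, as covered in the earlier blog. Here is a minimal sketch, assuming the service exposes a port named metrics and carries the app: controller.csi.trident.netapp.io label (check with kubectl get svc -n trident --show-labels), and that the release label matches the Helm release name used above:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: trident-sm
  namespace: monitoring
  labels:
    release: prometheus
spec:
  jobLabel: trident
  selector:
    matchLabels:
      app: controller.csi.trident.netapp.io
  namespaceSelector:
    matchNames:
      - trident
  endpoints:
    - port: metrics
      interval: 15s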

PromQL: your new best friend

There are a great number of queries you can craft with PromQL. Here are some of my favorites, starting off with some simple ones:

1. Fetch the total number of volumes per backend:
sum(trident_volume_count) by (backend_uuid)

2. Used bytes per PVC. This returns the used bytes per PVC across all namespaces, but you can easily narrow it down to a single namespace too:
(kubelet_volume_stats_used_bytes)/(1024*1024*1024)
(kubelet_volume_stats_used_bytes{namespace="default"})/(1024*1024*1024)

3. Extending (1) to get backend information, along with the number of volumes per backend:
 trident_volume_count * on (backend_uuid)
 group_left (backend_name, backend_type) trident_backend_info
 or on (backend_uuid) trident_volume_count
 

There's a lot happening here, so let's break it down:

  • The * operator is effectively a join on the two metrics (trident_volume_count and trident_backend_info). The key used for the join is backend_uuid.
  • group_left is used to also include labels that are not available in trident_volume_count. Here, I'm pulling backend_name and backend_type from trident_backend_info.
  • The or clause ensures that entries that don't match the join, such as backends that do not have any volumes yet (empty sets), are still returned instead of being dropped.
4. Observe the amount of space allocated per backend over time:
  (sum(trident_volume_allocated_bytes) by (backend_uuid)) / (1024*1024*1024)
   * on (backend_uuid) group_left(backend_name) trident_backend_info
  

5. Find the rate of ONTAP operations, by operation/SVM. For example, here’s what the autosupport call rate looks like:
 rate(trident_ontap_operation_duration_in_milliseconds_by_svm_sum{op="ems-autosupport-log"}[5m])
 

The rate function calculates the per-second average rate of increase of the counter over the time window specified (5 minutes here). You may want to use a larger window to average over a longer timeframe.
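
One more that pairs nicely with (2): the percentage of space used per PVC. This is a sketch that assumes kubelet_volume_stats_capacity_bytes is exposed alongside the used-bytes metric once volumeStatsAggPeriod is enabled:

 100 * kubelet_volume_stats_used_bytes{namespace="default"}
   / kubelet_volume_stats_capacity_bytes{namespace="default"}

This makes a handy starting point for alerting, for example firing when a Trident-provisioned PVC crosses 80% utilization.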

Insight is right around the corner

There are loads of great sessions at this year's Insight, which will be an all-virtual event. The best part? Registration is free; you can sign up here. Be sure to be on the lookout for Trident sessions, a few of which are listed below:

 

BREAKOUT SESSIONS:

  1. [BRK-1170-2] Operator, Operator: A New Way to Automatically Manage Trident
  2. [BRK-1273-2] K8s and Cloud Volumes ONTAP – a Match Made in the Cloud
  3. [BRK-1281-3] Take a Deep Dive into Red Hat OpenShift on NetApp HCI

 

CUSTOMER BREAKOUT SESSIONS:

  1. [BRK-1565-2] The Road to Stateful Applications on Kubernetes at Yahoo! Japan
  2. [BRK-1462-2] Provisioning Containers with NetApp Storage via Trident Anywhere [Tractor Supply Company]
  3. [BRK-1499-2] SK Telecom All Container Orchestrator with NetApp Trident

 

SPEED BREAKOUT SESSIONS:

  1. [SPD-1115-2] Getting Results Quickly with AI Inference on NetApp HCI
  2. [SPD-1171-2] Monitoring in the Kubernetes Era: Prometheus and Trident

 

DEMOS:

  1. [DEM-1191-3] Trident Demo: Metrics
  2. [DEM-1116-2] NetApp IT Automation Use Case for Trident, OpenShift and Ansible

 

Questions? Comments? Stay in touch with us on Slack! Join our Slack workspace and hang out in the #containers channel.

Bala RameshBabu
Bala is a Technical Marketing Engineer who focuses on Trident, NetApp's dynamic storage provisioner for Kubernetes and Docker. With a background in OpenStack, he works on open-source solutions and DevOps workflows. When not at work, you can find him on a soccer field or reading biographies.
