A couple of months ago, our team received the mandate to set up a Red Hat OpenShift cluster in offline mode to support a brilliant use case for gaining performance insight. I was very excited. I use the word “excited” because it’s fascinating to see how Kubernetes has become one of the fastest-growing open-source projects on GitHub. OpenShift automates application builds, deployments, scaling, health management, and more by leveraging integrated components from Kubernetes.

Working with OpenShift and Kubernetes was new for me. I have been working with Docker for quite some time, specifically to containerize the Hadoop cluster. When tackling that complexity, I often remind myself that real applications don’t run in a single container. I initially tried to scale Docker containers across multiple hosts by setting up an overlay network. This is when I realized the importance of a good orchestrator that can simplify cumbersome network settings.

In the Docker world, we hear very little about orchestrators, and it is really tough to choose one that suits your application’s needs without any prior knowledge of them. My love of Kubernetes started when I saw the powerful networking model that it offers across multiple hosts. I thought about why Red Hat chose Kubernetes, and the more I worked with it, the better I understood. I was really surprised to find the powerful features and flexibility that Kubernetes provides to extend and maintain code.

Our Use Case

Now that I’ve explained why we chose OpenShift to meet the needs of our environment, let me give a brief explanation about how we plan to use it. Our first use case is a microservice that has intelligence to parse the AutoSupport® data and provide APIs to consume different performance counters.

For NetApp® ONTAP® 9.2 and later versions, performance-related data is provided in the Cluster Counter Manager Archive (CCMA) format inside a tarball. The existing MapReduce setup does not have the intelligence to parse CCMA files, so the team came up with a Docker container-based solution that parses CCMA data and exposes REST APIs to consume counters. For any NetApp AutoSupport payload that contains CCMA data, the microservice transforms the data and makes it queryable. This approach provides valuable insights into the performance content for all kernel objects.

Is It a Cakewalk? Definitely Not!

Before I start implementing something in offline mode with only limited pointers to the software binaries, I like to remind myself that the task might not be a piece of cake. That’s especially true when I’m working with a new technology for the first time. I was able to redefine happiness by finding an internal repository containing all the OpenShift binaries. With this information and with recommendations from the IT team, I had all the necessary dependencies, so I hit the road.

For me, the only comfortable part of the journey was that I was working with an Ansible-based installation, which helped a lot in going back and forth with the installation process. Red Hat provides many references for a customized implementation, assuming that a user is familiar with Ansible. Thankfully, I am. But the complexity lies in finding the right source of documentation for your customization and in iterating on the installation process to resolve all the dynamic issues.

In my case, the installation procedure expected the hardware to be equipped with a few dependencies that had not been covered by the preflight considerations from IT. Also, the installer by default connects to the internet to obtain all the necessary images. Therefore, additional configuration changes were required to support the offline mode.

Understanding the Red Hat references and moving all the necessary images inside our network required a lot of reading about how the software works. The flow included pushing these images into the NetApp centralized Docker registry, pointing the installer to make use of that internal registry, and triggering the required infrastructure pods.
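As a rough sketch, pointing the installer at an internal registry boils down to a handful of variables in the openshift-ansible inventory. The variable names below follow the OpenShift 3.x disconnected-install pattern; the registry address is a placeholder, not our actual internal hostname:

```ini
# Hypothetical excerpt from an openshift-ansible inventory for a
# disconnected (offline) install; the registry address is a placeholder.
[OSEv3:vars]
# Pull infrastructure images from the internal registry instead of
# the public Red Hat registry.
oreg_url=registry.internal.example.com/openshift3/ose-${component}:${version}
openshift_docker_additional_registries=registry.internal.example.com
openshift_docker_insecure_registries=registry.internal.example.com
# Block the public registries so nothing tries to reach the internet.
openshift_docker_blocked_registries=registry.access.redhat.com,docker.io
```

With settings along these lines, the Ansible run resolves every image reference against the mirrored registry, which is what makes the offline installation possible.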

Service requests for block and object storage, DNS wildcards, application service accounts, and many other issues came into play as I went through the setup step by step. I also had to try out different Docker versions, because the setup required a specific version of Docker, which I had to source separately, outside the Ansible run.

These are just a few of the issues that I encountered during this iterative process, and they helped me to learn this container application platform.

The First Application Is Always Special

It’s always special to see the end-to-end flow of an application after it goes live. With the help of our development team, we connected all the dots and came to understand the different segments in detail. This level of understanding and documentation is required to support any kind of application. In fact, this project helped me a great deal in working on the dynamic Hadoop cluster setup on top of the OpenShift–Amazon Web Services cluster. We made great progress with this project, but going forward we still need to explore OpenShift CLI support for automated deployments from Jenkins.

My first time out with Kubernetes and OpenShift was a great learning process. There is still a lot to learn about OpenShift features, and I’m really looking forward to seeing all the other use cases that we can support with this infrastructure.




Ravi Teja Dama
Big Data Software Engineer at NetApp
