Deploying Apache Atlas on Kubernetes

Install Apache Atlas on AKS with Helm Charts

Manjit Singh
Oct 19, 2020 · 6 min read
[Figure: Apache Atlas on Kubernetes Helm]

In the previous post, we went through the installation of Apache Atlas on an Ubuntu VM. Continuing our Atlas journey, we will now simplify the process further by installing Apache Atlas in a containerized environment. We will deploy Apache Atlas as a set of microservices running on a Kubernetes cluster, using Helm.

In this article, we will focus on deploying Apache Atlas on Azure Kubernetes Service (AKS) with Helm Charts. The article assumes that you already have an AKS cluster and that Helm is set up to work with it. The Helm Chart should also work on other managed clusters such as EKS or GKE.

What is Apache Atlas?

Apache Atlas is an open source metadata management and data governance tool that can store metadata, track lineage information, derive relationships between data assets and support data governance.

I have written a detailed post on the fundamentals of Apache Atlas and its components.

Components of Apache Atlas

Before we dive into the installation, let us look at the components and dependencies that Atlas requires.

Apache Atlas requires three applications to be installed:

  1. Solr: indexes the Atlas data so that we can search it from the Atlas UI.
  2. Cassandra: acts as the backend store for the data ingested by Apache Atlas, which is the metadata itself.
  3. Zookeeper: a highly available, centralized coordination service used for cluster management.

Structure of the Helm Chart

[Figure: Apache Atlas Helm Chart root directory components]

The Atlas Helm Chart consists of a directory with all the typical Helm Chart components, such as a templates directory that stores the YAML configurations and a values.yaml file that stores the chart-specific values. The folder also has a Dockerfile, in case you want to build the Docker image on your own.

The important thing to notice here is a directory called “charts”. This charts directory contains all the dependent Helm Charts. In this case, it contains the Helm Charts for Solr, Zookeeper and Cassandra. Let’s take a look at the folder structure of the Apache Atlas Helm Chart.

[Figure: Apache Atlas Helm Chart tree structure]

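As we can see in the structure above, the charts directory contains the dependent Helm Charts: Cassandra, Solr and Zookeeper. This means that when we deploy the Atlas Chart, Helm deploys these dependent charts as part of the same release, so a single install brings up Atlas together with the three services it depends on. Helm does not wait for the dependencies to become ready before creating the Atlas resources, so Atlas may take a little longer to become ready while they start.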
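To check which dependent charts are bundled before deploying, a small helper along these lines can list the contents of the charts directory. This is only a sketch: the function name list_subcharts is made up for illustration, and it assumes the chart was cloned into a folder named atlas-helm-chart.

```shell
# List the dependent charts bundled under a chart's charts/ directory.
# The default path assumes the repository was cloned as atlas-helm-chart.
list_subcharts() {
  chart_dir="${1:-atlas-helm-chart}"
  for entry in "$chart_dir"/charts/*; do
    # Skip the unexpanded pattern when charts/ is empty or missing.
    if [ -e "$entry" ]; then
      basename "$entry"
    fi
  done
}
```

Running list_subcharts from the parent folder of the clone should print one line per bundled dependency.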

Deploying the Helm Chart

This section shows the steps to deploy the Helm Chart. The first step is to get the Helm Chart by cloning the Git repository.

git clone https://github.com/manjitsin/atlas-helm-chart
cd atlas-helm-chart
docker build -t apache-atlas-image .

Building the Docker image is optional; you can run the Helm Chart directly. If you want to build your own image, run the docker build command shown above. Once the image is built, you can push it to a container registry. You can also use the local image to deploy the Helm Chart.
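As a rough sketch of the push step, the image can be tagged with the registry name and then pushed. The registry myregistry.azurecr.io and the tag 1.0 below are placeholders that do not come from the chart, and the function names are made up for illustration.

```shell
# Placeholder values (assumptions): replace with your own registry and tag.
REGISTRY="myregistry.azurecr.io"
TAG="1.0"

# Build the fully qualified image reference for the locally built image.
image_ref() {
  echo "${REGISTRY}/apache-atlas-image:${TAG}"
}

# Tag the local image and push it; stops if tagging fails.
push_atlas_image() {
  docker tag apache-atlas-image "$(image_ref)" &&
    docker push "$(image_ref)"
}
```

After logging in to the registry (for Azure Container Registry, az acr login --name with your registry name), calling push_atlas_image uploads the image.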

Now that we have the Apache Atlas image, we can run the Helm Chart to deploy the application. Before deploying the Helm Chart, make sure to update the image name and tag in the values.yaml file.
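As an alternative to editing values.yaml by hand, the image can also be overridden on the command line. The sketch below assumes that the chart reads the image from keys named image.repository and image.tag; those key names are a guess based on common chart conventions, so check them against the chart's values.yaml first. The function name is also made up for illustration.

```shell
# Install or upgrade the release named atlas with a custom image.
# image.repository and image.tag are assumed key names, not confirmed
# against the chart; adjust them to match values.yaml.
deploy_atlas_with_image() {
  repository="$1"
  tag="$2"
  helm upgrade --install atlas atlas-helm-chart \
    --set image.repository="$repository" \
    --set image.tag="$tag"
}
```

For example, deploy_atlas_with_image myregistry.azurecr.io/apache-atlas-image 1.0, run from the folder that contains atlas-helm-chart. The upgrade --install form is used here because it creates the release when it does not exist yet and updates it otherwise.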

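To deploy the Helm Chart, run the command below from the folder that contains the atlas-helm-chart directory. The --name flag is Helm 2 syntax; with Helm 3, pass the release name as the first argument instead (helm install atlas atlas-helm-chart).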

helm install --name atlas atlas-helm-chart
[Figure: Installing the Helm Chart]

Once we run the helm command, we can check the status of all the resources deployed by Helm.

kubectl get all
[Figure: Output of the kubectl get all command, showing all the resources deployed by the Helm Chart]

As shown in the image above, Helm has deployed Pods for Apache Atlas, Cassandra, Solr and Zookeeper. It has also deployed an apache-atlas service of type LoadBalancer, and the output shows the external IP of that load balancer.

To access the Apache Atlas application, get the load balancer IP and open it in a browser on port 21000.
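Instead of reading the IP from the kubectl get all output, it can be extracted with a JSONPath query. The sketch below uses the apache-atlas service name mentioned above; the function name atlas_url is made up for illustration.

```shell
# Print the Atlas UI URL for a LoadBalancer service (default: apache-atlas).
atlas_url() {
  service="${1:-apache-atlas}"
  ip="$(kubectl get service "$service" \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}')"
  # The field stays empty until the cloud provider assigns an address.
  if [ -z "$ip" ]; then
    echo "No external IP assigned to $service yet" >&2
    return 1
  fi
  echo "http://${ip}:21000"
}
```

On providers that expose the load balancer by DNS name rather than by IP, such as EKS, the hostname field of the same ingress entry would need to be queried instead.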

[Figure: Apache Atlas login page on the load balancer IP and port 21000]
[Figure: Apache Atlas home page]
[Figure: A test entity created of type hdfs_path]

The deployment of the Pods will take a few minutes. Once the Pods are up and running, you can navigate to the load balancer IP on port 21000, where you will find the Apache Atlas page.
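Rather than polling kubectl get all by hand, kubectl wait can block until the Pods report Ready. This is a minimal sketch: the ten-minute default timeout is an arbitrary choice, the function name is made up, and the command waits on every Pod in the current namespace, so it assumes the namespace holds only this release.

```shell
# Block until all Pods in the current namespace are Ready, or time out.
# The 600s default is an arbitrary assumption; pass another value to change it.
wait_for_atlas_pods() {
  timeout="${1:-600s}"
  kubectl wait --for=condition=Ready pod --all --timeout="$timeout"
}
```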

Note that if you do not have persistent volumes available, you will have to disable persistence by setting the “enabled” parameter to false in the values.yaml files of the Cassandra and Zookeeper charts.
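A quick way to tell whether missing persistent volumes are the problem is to look for volume claims that are stuck in the Pending state. The helper below is a sketch with a made-up function name; it relies on STATUS being the second column of the kubectl get pvc output.

```shell
# Print the names of PersistentVolumeClaims that are still Pending.
pending_pvcs() {
  kubectl get pvc --no-headers 2>/dev/null |
    awk '$2 == "Pending" { print $1 }'
}
```

If this prints claims that belong to Cassandra or Zookeeper and they never become Bound, disabling persistence as described above is the likely fix.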

Atlas Integration with Messaging Service

Apache Atlas can be integrated with a messaging interface that is based on Kafka. This is useful for sending metadata objects to Atlas, and also for consuming metadata change events from Atlas.

The messaging interface can also help if one wishes to use a more loosely coupled integration with Atlas that could allow for better scalability, reliability etc.

This Helm Chart can use Azure Event Hubs as the messaging service for Apache Atlas. You will have to create an Event Hub in Azure. After that, specify the URL and connection string of the Event Hub under the kafka parameter in the values.yaml file of Apache Atlas.

Once you provide the URL and connection string and run the Helm Chart, Apache Atlas can accept events and messages from the Event Hub.
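The Kafka endpoint of an Event Hubs namespace is the namespace host name on port 9093, and that host name is already contained in the connection string. The sketch below derives the endpoint from a connection string; the function name and the sample namespace in the usage note are made up, and the exact key names expected under the kafka parameter should be taken from the chart's values.yaml.

```shell
# Derive the Kafka bootstrap server (host:9093) from an Event Hubs
# connection string of the form Endpoint=sb://<host>/;SharedAccessKeyName=...
eventhub_bootstrap_server() {
  conn="$1"
  host="$(printf '%s\n' "$conn" |
    sed -n 's|^Endpoint=sb://\([^/;]*\).*|\1|p')"
  if [ -z "$host" ]; then
    echo "Could not find an Endpoint=sb:// entry in the connection string" >&2
    return 1
  fi
  echo "${host}:9093"
}
```

For a connection string that starts with Endpoint=sb://mynamespace.servicebus.windows.net/ this prints mynamespace.servicebus.windows.net:9093. Kafka clients authenticate against Event Hubs with SASL PLAIN over SSL, using the literal user name $ConnectionString and the full connection string as the password.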

What if we do not have Managed Kubernetes?

You can also deploy the Atlas Helm Chart if you do not have a managed Kubernetes cluster such as AKS or EKS. In that case, you will have to make some minor modifications to the Helm Chart.

You will have to change the service type from LoadBalancer to NodePort. If you do not have persistent volumes, you will also have to disable persistence in the Zookeeper and Cassandra Helm Charts.
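With a NodePort service there is no load balancer IP, so the URL is built from a node address and the port that Kubernetes allocated. The sketch below assumes the service keeps the name apache-atlas, that its first port is the Atlas UI port, and that the first node's internal IP is reachable from your machine; the function name is made up for illustration.

```shell
# Print the Atlas UI URL for a NodePort service (default: apache-atlas).
atlas_nodeport_url() {
  service="${1:-apache-atlas}"
  # Port allocated by Kubernetes for the first port of the service.
  port="$(kubectl get service "$service" \
    -o jsonpath='{.spec.ports[0].nodePort}')"
  # Internal IP of the first node; any node of the cluster would do.
  node_ip="$(kubectl get nodes \
    -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')"
  if [ -z "$port" ] || [ -z "$node_ip" ]; then
    echo "Could not determine node IP or node port for $service" >&2
    return 1
  fi
  echo "http://${node_ip}:${port}"
}
```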

Why Run on Kubernetes?

Deploying applications on the Kubernetes platform has some key benefits:

  • Resiliency: Provides self-healing, high-availability, and scalability as core features.
  • Efficiency: Requires less effort to deploy and manage the workloads.
  • Portability: Ability to migrate workloads between different environments and infrastructure with minimal effort.
  • Productivity: Deployment happens with a single command. This allows us to deploy, test and remove the application faster.

Using Helm, the Kubernetes package manager, simplifies the installation considerably, because it installs all the components of the workload together in a single release.

The structure of this Helm Chart is based on another Helm Chart. Here is the link to that Helm Chart. You can use that Helm Chart as well to deploy Apache Atlas.

Summary

This is the third article in the Apache Atlas series. In the first article, I talked about the various concepts of Apache Atlas. If you are new to Atlas, do refer to that article: Discover your Metadata-Apache Atlas

In the second article, I talked about installing Apache Atlas on a VM. The installation process is usually complex, which makes it difficult to try out the features of Atlas. In that article, I have tried to simplify the installation by listing all the commands and their dependencies. Please refer to the article: Apache Atlas-Installation Guide.
