First Step in Monitoring with Prometheus
Prometheus is a popular monitoring tool in the DevOps world because it is designed to work well in cloud-native and containerized environments.
In this post, I want to give you a complete beginner's guide to what Prometheus is, what it does, and its features and benefits. If you are a beginner, this one is for you. I will show how to install and use Prometheus in the next article.
Before going into Prometheus, let’s take a step back and see how IT infrastructure has evolved over the last few decades.
Initially, IT infra was static: dedicated server hardware, each machine with its own core function. The IT environment was then transformed gradually by a few game-changing technologies.
The first of them was virtualization, which provided faster provisioning and better utilization of resources.
Next came containers, which brought resource efficiency and immutable architecture. Container orchestration then made deployments faster and added high availability (HA) and scalability.
The cloud made all of these improvements widely available and took IT infra to a new level.
The main takeaway is that, over time, IT infra has shifted from a static setup to a much more dynamic environment. We have moved from the traditional monolithic approach to a microservices approach.
This is where Prometheus has the edge. It was born in the age of dynamic infra and was created specifically to discover and monitor dynamic workloads automatically.
What is Prometheus
Prometheus is an open-source monitoring and alerting tool with a built-in time series database. It collects data from applications and systems, lets you visualize that data, and can trigger alerts based on it.
1) Metric Collection
- Language: Primarily written in Go. Some components are written in Java, Python and Ruby.
- License: Prometheus uses the open-source Apache 2.0 license.
- Launch: Prometheus was created at SoundCloud in 2012, when the team decided that their existing monitoring platforms were inadequate for their use case. The first v1 release of Prometheus was ready in 2016.
Prometheus Server Component
- Time Series Database: It stores the scraped metrics in a Time Series Database.
- Data Retrieval Worker: It is responsible for pulling (scraping) metrics from external sources and writing them to the Time Series Database.
- Web Server: Provides a simple web interface for viewing the configuration and querying the stored data.
Prometheus Full architecture
The above diagram shows-
- Prometheus Server: This is the core component. It scrapes the metrics and stores them in a Time Series Database (TSDB).
- Exporters: They are installed on targets and expose the metrics on an endpoint that the Prometheus server can scrape.
- Push Gateway: Used for monitoring batch applications and short-lived jobs. Instead of waiting to be scraped, these jobs push their metrics to the Push Gateway, and the Prometheus server then scrapes the Push Gateway like any other target.
- Service Discovery: Prometheus can automatically discover its hosts, targets and services, so you do not have to specify them manually.
- Alert Manager: This component handles the alerts generated by Prometheus. We define threshold values in alert rules, and when a rule triggers, the alert can be sent to various applications.
- PromQL: Prometheus Query Language used to read and query the metrics scraped by Prometheus.
I plan to write a future blog post on each of the Prometheus components in more detail.
How Prometheus is different
Before we go into more detail about Prometheus, I want to highlight the two main features that make it stand out from other monitoring systems:
Time Series Database: Prometheus stores metrics in a time series database. In a time series database, each incoming data point is recorded as a new entry, and the data typically arrives in time order. This stores not only the current value but also the changes that happened to a metric over a period of time.
This practice of recording every change as a new, separate row is what makes time-series data so powerful. It allows us to measure change: analyze how something changed in the past, monitor how it is changing in the present, and predict how it may change in the future. We can analyze the metric data stored by Prometheus over time and make informed decisions.
Pull Mechanism: Many monitoring systems, such as CloudWatch and Application Insights, use a push-based mechanism in which the clients (applications/servers) are responsible for sending their metrics to a centralized monitoring server.
Prometheus uses a pull-based mechanism: instead of clients pushing their metrics, Prometheus goes out and scrapes the metrics from simple HTTP endpoints exposed by exporters.
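To make this concrete, here is a minimal Python sketch of a time series as an append-only list of (timestamp, value) samples. The metric name, timestamps and values are made up for illustration, and this is only a mental model, not how Prometheus stores data internally.

```python
from typing import List, Tuple

# One time series: an append-only list of (unix_timestamp, value) samples.
Sample = Tuple[float, float]


def append_sample(series: List[Sample], timestamp: float, value: float) -> None:
    """Record a new sample; existing samples are never overwritten."""
    if series and timestamp <= series[-1][0]:
        raise ValueError("samples must arrive in time order")
    series.append((timestamp, value))


def change_over(series: List[Sample], start: float, end: float) -> float:
    """Return last value minus first value among samples in [start, end]."""
    window = [value for (ts, value) in series if start <= ts <= end]
    if len(window) < 2:
        return 0.0
    return window[-1] - window[0]


# Hypothetical memory_usage samples (in MB), one every 15 seconds.
memory_usage: List[Sample] = []
for ts, val in [(0, 512.0), (15, 530.0), (30, 541.0), (45, 610.0)]:
    append_sample(memory_usage, ts, val)

print(change_over(memory_usage, 0, 45))   # growth over the whole window
print(change_over(memory_usage, 15, 30))  # growth between two samples
```

Because every sample is kept, the same series can answer both "what is the value now?" and "how much did it change in the last N seconds?".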
- Prometheus offers a dimensional data model where time series data is defined by metric name and key/value dimensions.
- A flexible Query Language
- Targets are discovered via service discovery or static configuration.
- Multiple modes of support for graphs and dashboards.
- Alerts are handled by the Alert Manager.
- Easy integration with other systems such as Grafana.
How Prometheus Works
- Suppose you want to monitor a VM with Prometheus. This VM is now a target for Prometheus. The first step is to install an exporter on the target machine.
- Prometheus has a wide range of exporters for various target types. To monitor a VM, Prometheus uses the node exporter, so we have to install the node exporter on the VM.
- Once the exporter is installed, it gathers the metrics from the target and exposes them on an endpoint at a specific port.
- After that, the Prometheus server scrapes (pulls) the metrics from the endpoint exposed by the exporter.
- As we can see in the above diagram, the Prometheus server pulls the metrics via HTTP from various targets over the endpoints exposed by the exporters.
- Exporters are pieces of software that read metrics from Prometheus targets.
- Exporters are installed on the targets to gather the metrics. A target is a machine that Prometheus is monitoring. The exporter gathers the metrics and exposes them on a specific HTTP port, and Prometheus scrapes the exposed metrics from the target.
- Prometheus has a number of exporters, such as the node exporter (to monitor VM nodes), GitHub exporter, MySQL exporter, Jira exporter and Docker exporter.
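The exporter-and-scrape flow described above can be sketched in a few lines of Python using only the standard library. This is a toy model, not the real node exporter or Prometheus server: the metric names and values are hypothetical, and the "server" here is just a function that pulls the text over HTTP once.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical metric values; a real exporter would read these from the OS.
METRICS = {"demo_cpu_load": 0.42, "demo_disk_free_bytes": 1024.0}


class MetricsHandler(BaseHTTPRequestHandler):
    """Toy exporter: serves current metric values as plain text on /metrics."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = "".join(f"{name} {value}\n" for name, value in METRICS.items())
        payload = body.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def scrape(url: str) -> dict:
    """Toy scraper: pull the metrics over HTTP and parse name/value pairs."""
    # Ignore any system-wide proxy settings for this local demo.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=5) as response:
        text = response.read().decode("utf-8")
    samples = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, value = line.rsplit(" ", 1)
        samples[name] = float(value)
    return samples


# Start the exporter on a free local port, scrape it once, then stop it.
server = HTTPServer(("127.0.0.1", 0), MetricsHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
scraped = scrape(f"http://127.0.0.1:{server.server_port}/metrics")
server.shutdown()
server.server_close()
print(scraped)
```

The key point is the direction of the request: the exporter only exposes an endpoint and waits, while the monitoring side decides when to collect.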
Metrics and Labels in Prometheus
A metric in Prometheus essentially represents any parameter that we are monitoring. It refers to a general feature of a system or application that is being measured.
Every metric in Prometheus has a metric name. The metric name simply identifies the feature being measured. Examples: cpu_load, memory_usage, disk_free.
Importance of Metric Labels
If you query the metric cpu_load, Prometheus will show the cpu_load of all the VMs that are being monitored. If you want the cpu_load of one particular VM, you have to use the metric with labels. With labels, we can identify exactly which source we want to read metrics from.
Generally, a metric in Prometheus has multiple labels. Labels are used to filter the metric on various parameters. They are written as a comma-separated list inside curly braces after the metric name.
In the image below, node_cpu is a metric that contains multiple labels such as cpu, instance, job and mode.
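As a rough illustration of how labels narrow down a metric, here is a small Python sketch. The label names (cpu, instance, job, mode) come from the example above, but the label values and sample numbers are made up, and the `select` helper is a hypothetical stand-in for what a label filter does, not Prometheus code.

```python
from typing import Dict, List, Tuple

# A sample is (metric name, labels, value).
Series = Tuple[str, Dict[str, str], float]

# Hypothetical samples; instance addresses and values are made up.
SAMPLES: List[Series] = [
    ("node_cpu", {"cpu": "cpu0", "instance": "vm1:9100", "job": "node", "mode": "idle"}, 9120.5),
    ("node_cpu", {"cpu": "cpu0", "instance": "vm1:9100", "job": "node", "mode": "user"}, 310.2),
    ("node_cpu", {"cpu": "cpu0", "instance": "vm2:9100", "job": "node", "mode": "idle"}, 8790.0),
]


def format_series(name: str, labels: Dict[str, str]) -> str:
    """Render a series as name{key="value",...}, the notation described above."""
    inner = ",".join(f'{key}="{value}"' for key, value in sorted(labels.items()))
    return f"{name}{{{inner}}}"


def select(samples: List[Series], name: str, **matchers: str) -> List[Series]:
    """Keep samples with this metric name whose labels equal every matcher."""
    return [
        sample
        for sample in samples
        if sample[0] == name
        and all(sample[1].get(key) == value for key, value in matchers.items())
    ]


# Without labels: every VM and every mode is returned.
print(len(select(SAMPLES, "node_cpu")))

# With labels: only the idle CPU time of one particular VM.
for name, labels, value in select(SAMPLES, "node_cpu", instance="vm1:9100", mode="idle"):
    print(format_series(name, labels), value)
```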
Metric types refer to the different ways in which exporters represent the metric data they provide.
- Counter: A single number that can only increase or be reset to zero. Counters represent cumulative values. Examples: number of records processed, number of application restarts, number of errors.
- Gauge: A single number that can increase or decrease over time. Examples: current HTTP requests, CPU usage, memory usage.
- Histogram: Counts the number of observations/events that fall into a set of configurable buckets, each with its own separate time series. A histogram uses labels to differentiate between buckets.
- Summary: Similar to a histogram, but exposes metrics in the form of quantiles instead of buckets. While buckets divide values based on specific boundaries, quantiles divide values based on the percentiles into which they fall.
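The behaviour of the first three types can be modelled with a few small Python classes. This is a simplified sketch written for this post, not the Prometheus client library; the class and method names, bucket boundaries and observed values are all assumptions chosen for illustration. Summary is left out because computing quantiles needs more machinery than fits here.

```python
from typing import List


class Counter:
    """A value that can only go up (or restart from zero with the process)."""

    def __init__(self) -> None:
        self.value = 0.0

    def inc(self, amount: float = 1.0) -> None:
        if amount < 0:
            raise ValueError("a counter can only increase")
        self.value += amount


class Gauge:
    """A value that can go up and down."""

    def __init__(self) -> None:
        self.value = 0.0

    def set(self, value: float) -> None:
        self.value = value

    def inc(self, amount: float = 1.0) -> None:
        self.value += amount

    def dec(self, amount: float = 1.0) -> None:
        self.value -= amount


class Histogram:
    """Counts observations in cumulative buckets, plus a running sum and count."""

    def __init__(self, upper_bounds: List[float]) -> None:
        # A final +Inf bucket catches every observation.
        self.upper_bounds = sorted(upper_bounds) + [float("inf")]
        self.bucket_counts = [0] * len(self.upper_bounds)
        self.sum = 0.0
        self.count = 0

    def observe(self, value: float) -> None:
        self.sum += value
        self.count += 1
        # Cumulative: count the value in every bucket whose bound is >= value.
        for index, bound in enumerate(self.upper_bounds):
            if value <= bound:
                self.bucket_counts[index] += 1


errors_total = Counter()
for _ in range(3):
    errors_total.inc()

in_flight_requests = Gauge()
in_flight_requests.inc()
in_flight_requests.inc()
in_flight_requests.dec()

# Hypothetical request durations in seconds, with made-up bucket boundaries.
request_seconds = Histogram([0.1, 0.5, 1.0])
for duration in [0.05, 0.3, 0.3, 2.0]:
    request_seconds.observe(duration)

print(errors_total.value, in_flight_requests.value)
print(list(zip(request_seconds.upper_bounds, request_seconds.bucket_counts)))
```

Each bucket in the histogram would be exposed as its own time series, with a label carrying the bucket's upper bound.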
Querying allows you to access and work with your metric data in Prometheus. You write queries in PromQL (Prometheus Query Language) to retrieve useful information from the metric data that Prometheus has collected.
Different ways of Querying-
1. Expression Browser
2. Prometheus HTTP API
3. Visualization tools like Grafana
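As an example of the second option, the sketch below builds the URL for an instant query against the HTTP API and reads values out of a response. It assumes a Prometheus server at http://localhost:9090, and the sample response is hand-written for illustration (it follows the shape of an instant-vector result but was not produced by a real server). The helper names are my own.

```python
import json
from typing import List, Optional, Tuple
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str, time: Optional[float] = None) -> str:
    """Build the URL for an instant query on the /api/v1/query endpoint."""
    params = {"query": promql}
    if time is not None:
        params["time"] = str(time)
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode(params)}"


def parse_vector(response_text: str) -> List[Tuple[dict, float]]:
    """Extract (labels, value) pairs from an instant-vector response body."""
    body = json.loads(response_text)
    if body.get("status") != "success":
        raise ValueError("query failed")
    return [
        (item["metric"], float(item["value"][1]))  # value is [timestamp, "string"]
        for item in body["data"]["result"]
    ]


# Assumed server address; adjust to wherever your Prometheus is running.
url = instant_query_url("http://localhost:9090", 'node_cpu{mode="idle"}')
print(url)

# Hand-written sample response, for illustration only.
sample_response = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "node_cpu", "mode": "idle"}, "value": [1700000000, "9120.5"]}
        ],
    },
})
print(parse_vector(sample_response))
```

The same PromQL expression can be typed into the Expression Browser or used in a Grafana panel; only the way you send it differs.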
Handling Alerts in Prometheus
Prometheus handles alerts in two phases. The first phase is creating alert rules in the Prometheus server, which raise alerts and send them to the Alert Manager.
The second phase is how the Alert Manager handles those alerts. It provides features like silencing, inhibition and aggregation of alerts. Most importantly, the Alert Manager is responsible for sending the alerts as notifications to various channels such as email, Slack and PagerDuty.
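The first phase can be pictured with a small Python sketch of a threshold rule that must stay true for a while before it fires. This is a simplified model with hypothetical names and numbers: real alert rules are written as PromQL expressions in rule files, and the second phase (silencing, grouping and notifications in the Alert Manager) is not modelled here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThresholdRule:
    """Toy alert rule: fire when a value stays above a threshold long enough."""

    name: str
    threshold: float
    for_seconds: float
    active_since: Optional[float] = None

    def evaluate(self, value: float, now: float) -> str:
        """Return the alert state for this evaluation: inactive, pending or firing."""
        if value <= self.threshold:
            self.active_since = None  # condition cleared, reset the timer
            return "inactive"
        if self.active_since is None:
            self.active_since = now  # condition just became true
        if now - self.active_since >= self.for_seconds:
            return "firing"  # true for long enough: hand over for notification
        return "pending"


# Hypothetical rule: cpu_load above 0.9 for at least 60 seconds.
rule = ThresholdRule(name="HighCpuLoad", threshold=0.9, for_seconds=60)

# Made-up (timestamp, cpu_load) evaluations, 30 seconds apart.
states = [rule.evaluate(value, now) for now, value in [(0, 0.95), (30, 0.97), (60, 0.99), (90, 0.5)]]
print(states)
```

Waiting for the condition to hold over several evaluations avoids sending a notification for a short spike.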
I will talk about installation and other aspects of Prometheus in another article.
- Prometheus is a very powerful tool that can effectively monitor the infra and applications.
- Prometheus is designed for monitoring the cloud-based, containerized world.
- Prometheus uses a pull model to scrape metrics from its targets.
- Prometheus has various exporters that can be installed on target machines and scraped for data.
- Prometheus uses metrics and labels to identify the various parameters read from a target.
- Prometheus stores the metrics in a Time Series Database.
- Prometheus has a Query language PromQL to query and read metrics.
- The Alert Manager handles the alerts generated by Prometheus and sends out notifications for them.
- Prometheus has a service discovery feature that can automatically discover new targets.
- Prometheus can be integrated with Grafana to get a nice dashboard view of monitoring data.