Getting Started with PromQL: A Comprehensive Guide
Introduction
Prometheus, an open-source monitoring and alerting toolkit, relies on PromQL (Prometheus Query Language) to interact with and analyze time series data. PromQL is a powerful language that allows users to query and manipulate metrics collected by Prometheus. In this guide, we will explore the key concepts of PromQL and provide examples to help you get started.
Basics of PromQL
Metric Selectors
Metric selectors are the core of PromQL: they retrieve the time series that match a metric name and a set of label matchers. Label values are always quoted strings. The basic syntax is as follows:
<metric_name>{<label_name>="<label_value>", ...}
Let’s consider an example:
cpu_usage{job="webapp", instance="server-1"}
This query selects the CPU usage metric for the “webapp” job on the “server-1” instance.
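To make the matching rule concrete, here is a minimal Python sketch of how an equality selector filters series. It is an illustration only, not how Prometheus is implemented: the `series` list and the `select` helper are hypothetical, and only exact-match (`=`) matchers are modeled.

```python
# Hypothetical in-memory series; each one is a dict of labels.
# The metric name is stored under the reserved label "__name__".
series = [
    {"__name__": "cpu_usage", "job": "webapp", "instance": "server-1"},
    {"__name__": "cpu_usage", "job": "webapp", "instance": "server-2"},
    {"__name__": "cpu_usage", "job": "db", "instance": "server-1"},
]


def select(all_series, name, **matchers):
    """Return the series whose name and labels equal every given matcher."""
    return [
        s
        for s in all_series
        if s.get("__name__") == name
        and all(s.get(label) == value for label, value in matchers.items())
    ]


# Mirrors cpu_usage{job="webapp", instance="server-1"}
matched = select(series, "cpu_usage", job="webapp", instance="server-1")
```

Dropping a matcher widens the result: `select(series, "cpu_usage", job="webapp")` returns both webapp series.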
Instant Vector Selectors
Instant vector selectors return a single value per matching series: the most recent sample at the evaluation time. For example:
cpu_usage{job="webapp"}  # Returns the current CPU usage for the "webapp" job.
Range Vector Selectors
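Range vector selectors return, for each matching series, all samples within a time window that ends at the evaluation time. The window is written in square brackets after the selector, and the result is usually passed to a function. For example, to calculate the rate of a metric over the last 5 minutes: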
rate(http_requests_total{job="api"}[5m])
This query calculates the per-second rate of HTTP requests for the “api” job over the last 5 minutes.
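The arithmetic behind a per-second rate can be sketched in a few lines of Python. This is a simplified model under stated assumptions: `simple_rate` is a hypothetical helper, the samples are made up, and the extrapolation Prometheus applies at the window edges is deliberately left out.

```python
def simple_rate(samples, window_seconds):
    """Per-second increase of a counter over a window.

    samples: list of (timestamp_seconds, value), sorted by time.
    Simplification: divides by the window length and skips the
    edge extrapolation that Prometheus performs.
    """
    if len(samples) < 2:
        return None  # not enough points to compute a rate
    increase = 0.0
    prev = samples[0][1]
    for _, value in samples[1:]:
        # A drop in value is treated as a counter reset: count from zero.
        increase += value if value < prev else value - prev
        prev = value
    return increase / window_seconds


# One sample per minute over a 5-minute window, growing by 60 each time.
samples = [(0, 100.0), (60, 160.0), (120, 220.0),
           (180, 280.0), (240, 340.0), (300, 400.0)]
result = simple_rate(samples, 300)  # 300 requests over 300 seconds
```

The reset handling is why this kind of calculation only makes sense for counters, which are expected to go up and restart from zero.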
Aggregation and Grouping
PromQL supports various aggregation functions to summarize and analyze data. For instance, to calculate the total CPU usage per instance for the “app” job:
sum(cpu_usage{job="app"}) by (instance)
This groups the data by the “instance” label and calculates the sum of CPU usage for each instance.
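Grouping works much like a GROUP BY: series that share the grouping label are collapsed into one, and all other labels are dropped. A rough Python sketch, assuming a hypothetical `sum_by` helper and an invented `cpu` label to give each instance more than one series:

```python
from collections import defaultdict


def sum_by(vector, label):
    """Sum values of an instant vector, grouped by a single label."""
    totals = defaultdict(float)
    for labels, value in vector:
        # Series that lack the label are grouped under the empty string.
        totals[labels.get(label, "")] += value
    return dict(totals)


# Each element is (labels, value); the "cpu" label is made up for the example.
vector = [
    ({"job": "app", "instance": "server-1", "cpu": "0"}, 0.25),
    ({"job": "app", "instance": "server-1", "cpu": "1"}, 0.5),
    ({"job": "app", "instance": "server-2", "cpu": "0"}, 0.125),
]
per_instance = sum_by(vector, "instance")
```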
Common Functions and Operators
Rate Function
The rate() function calculates the per-second rate of increase over a specified time range. Example:
rate(http_requests_total{job="frontend"}[1h])
This query computes the rate of HTTP requests per second over the last hour for the “frontend” job.
Increase Function
The increase() function returns the total increase in a metric over a specified time range. Example:
increase(http_errors_total{job="api"}[30m])
This query returns the total increase in HTTP errors for the “api” job over the last 30 minutes.
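increase() and rate() are two views of the same calculation: the increase over a window equals the per-second rate multiplied by the number of seconds in that window. A small Python sketch of that relationship, with hypothetical helper names and a duration parser that only handles a single unit such as "30m":

```python
def parse_duration_seconds(duration):
    """Convert a simple duration such as '30m' or '1h' to seconds.

    Only a single number followed by one unit is handled here.
    """
    units = {"s": 1, "m": 60, "h": 3600, "d": 86400}
    return int(duration[:-1]) * units[duration[-1]]


def increase_from_rate(rate_per_second, duration):
    """Total increase implied by a per-second rate held over a window."""
    return rate_per_second * parse_duration_seconds(duration)


# A steady 0.5 errors per second over 30 minutes.
errors = increase_from_rate(0.5, "30m")
```

Because the result is derived from a rate, an increase can be a non-integer even when the underlying counter only moves in whole steps.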
Binary Operators
PromQL supports binary operators such as +, -, *, and / for arithmetic operations between vectors.
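sum(http_requests_total{job="app"}) + sum(http_requests_total{job="api"})
This adds the total HTTP requests for the “app” and “api” jobs. Each side is aggregated first because arithmetic between two vectors only pairs series whose labels match exactly; without sum(), the differing job labels would leave nothing to pair and the result would be empty.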
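Arithmetic between two instant vectors pairs elements by their label sets, so two series are only combined when their labels are identical. The Python sketch below models that rule under simplifying assumptions: `add_vectors` is a hypothetical helper, the metric name is treated as already stripped, and only one-to-one matching is covered.

```python
def add_vectors(left, right, ignoring=()):
    """Add two instant vectors, pairing elements with identical label sets.

    Each vector is a list of (labels_dict, value). Labels named in
    `ignoring` are dropped before comparison, loosely mirroring the
    PromQL ignoring(...) modifier.
    """
    def key(labels):
        return frozenset((k, v) for k, v in labels.items() if k not in ignoring)

    right_by_key = {key(labels): value for labels, value in right}
    return {
        key(labels): value + right_by_key[key(labels)]
        for labels, value in left
        if key(labels) in right_by_key
    }


app = [({"job": "app", "instance": "server-1"}, 120.0)]
api = [({"job": "api", "instance": "server-1"}, 80.0)]

no_match = add_vectors(app, api)                    # job labels differ
matched = add_vectors(app, api, ignoring=("job",))  # compare on instance only
```

Here `no_match` is empty, while `matched` holds a single element keyed by the remaining `instance` label.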
Advanced Topics
Subqueries
PromQL allows the use of subqueries to perform more complex analyses. Example:
avg_over_time(http_request_duration_seconds{job="webapp"}[5m:1m])
This evaluates the selector at 1-minute steps across the last 5 minutes and averages the resulting values, giving the average request duration over that range.
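The `[range:step]` notation can be read as "evaluate the inner expression at every step within the range, then hand those points to the outer function". A minimal Python sketch of that idea, where `avg_over_subquery` and `evaluate` are hypothetical names and the handling of the window boundaries is a simplification that may differ from a given Prometheus version:

```python
def avg_over_subquery(evaluate, end, range_seconds, step_seconds):
    """Average evaluate(t) at every step-aligned t in (end - range, end].

    evaluate: function mapping a timestamp (seconds) to a value.
    Timestamps are aligned to multiples of the step.
    """
    start = end - range_seconds
    # First multiple of the step that lies strictly after the window start.
    t = (start // step_seconds + 1) * step_seconds
    values = []
    while t <= end:
        values.append(evaluate(t))
        t += step_seconds
    return sum(values) / len(values) if values else None


# Mirrors [5m:1m] evaluated at t=600s, with a made-up inner expression.
average = avg_over_subquery(lambda t: t / 60, end=600,
                            range_seconds=300, step_seconds=60)
```

With these inputs the inner expression is evaluated at 360, 420, 480, 540, and 600 seconds.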
Alerting Rules
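Prometheus uses PromQL expressions in alerting rules to define the conditions that trigger alerts. Since Prometheus 2.0, rules are written in YAML rule files; the older ALERT ... IF ... FOR syntax is no longer accepted. Example (the group name is a placeholder):
groups:
  - name: example
    rules:
      - alert: HighErrorRate
        expr: rate(http_errors_total{job="api"}[5m]) > 0.5
        for: 3m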
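This rule fires once the 5-minute error rate for the “api” job has stayed above 0.5 errors per second for 3 minutes without interruption. Until then, the alert is in a pending state, and it resets if the rate drops back below the threshold.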
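The effect of the 3-minute hold period can be sketched as a small state machine. The Python below is an illustration under assumptions, not Prometheus code: `alert_states` is a hypothetical helper, rules are assumed to be evaluated once per minute, and the rate values are invented.

```python
def alert_states(values, threshold, for_evaluations):
    """Return 'inactive', 'pending', or 'firing' for each rule evaluation.

    values: the expression result at each evaluation, in order.
    for_evaluations: hold period divided by the evaluation interval,
    for example 3 for a 3-minute hold evaluated every minute.
    """
    states = []
    consecutive = 0  # evaluations in a row where the condition held
    for value in values:
        if value > threshold:
            consecutive += 1
            # Fire only after the condition has held for the full hold period.
            states.append("firing" if consecutive > for_evaluations else "pending")
        else:
            consecutive = 0
            states.append("inactive")
    return states


# One value per minute: the error rate crosses 0.5 and stays there.
states = alert_states([0.2, 0.6, 0.7, 0.8, 0.9, 0.3],
                      threshold=0.5, for_evaluations=3)
```

The alert stays pending for the first three evaluations above the threshold, fires on the fourth, and returns to inactive as soon as the rate drops.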
Conclusion
PromQL is a versatile language for querying and analyzing time series data collected by Prometheus. This guide provides a foundation for understanding the basics, aggregation, functions, operators, and advanced features of PromQL. Experimenting with these examples in a Prometheus environment will deepen your understanding and proficiency in using PromQL for effective monitoring and alerting.