Monitoring Applications with Prometheus and Grafana

Complete DevOps tutorial on Prometheus. Learn PromQL, exporters, targets, alerts (Alertmanager), Grafana dashboards.

Introduction

Monitoring applications is a crucial aspect of DevOps and Site Reliability Engineering (SRE) that ensures systems are performing optimally and issues are detected promptly. Prometheus is a powerful open-source monitoring and alerting toolkit that collects and stores metrics as time series data, while Grafana is a popular open-source platform for visualizing metrics. Together, they provide a robust solution for monitoring applications, enabling teams to gain insights into performance, reliability, and health.

Using Prometheus, developers and operators write PromQL (Prometheus Query Language) queries to extract insights from their metrics. Exporters let Prometheus scrape metrics from services that do not expose them natively, which makes it usable across diverse environments. Prometheus evaluates alerting rules against defined thresholds and sends firing alerts to Alertmanager, which routes the notifications, so teams can respond to issues proactively. Finally, Grafana provides a user-friendly interface for building dashboards that visualize these metrics, facilitating data-driven decisions.

This tutorial will guide you through the process of setting up Prometheus and Grafana, configuring exporters, creating alerts, and visualizing metrics effectively.


Prerequisites

Before we begin, ensure you have the following:

  • Software:

    • Docker (for running Prometheus and Grafana)
    • kubectl (if deploying to Kubernetes)
  • Cloud Subscriptions:

    • Optional: A cloud provider account (e.g., AWS, GCP, Azure) if deploying in the cloud.
  • Permissions:

    • Administrative access to the servers or cloud resources you intend to monitor.
  • Tools:

    • Basic knowledge of YAML and JSON formats.
    • CLI tools for interacting with Docker and Kubernetes.

Core Concepts

Definitions

  • Prometheus: An open-source monitoring and alerting toolkit designed for reliability and scalability.
  • Grafana: A visualization tool that integrates with various data sources, including Prometheus, to create dynamic dashboards.
  • PromQL: The query language used by Prometheus for querying time series data.
  • Exporters: Applications that expose metrics from third-party services to Prometheus.
  • Targets: Endpoints that Prometheus scrapes metrics from.
  • Alertmanager: A tool that handles alerts sent by Prometheus, allowing for routing, silencing, and deduplication.

Architecture

Prometheus operates on a pull-based model, where it scrapes metrics from configured targets at specified intervals. The architecture consists of:

  1. Prometheus Server: Responsible for scraping and storing metrics.
  2. Exporters: Act as intermediaries to expose metrics from various services.
  3. Alertmanager: Manages alerts generated by Prometheus.
  4. Grafana: Visualizes the data collected by Prometheus.
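
To make the pull model concrete, here is a minimal sketch of what a scrape target looks like from the application side, using only the Python standard library. The counter names, the REQUEST_COUNTS dictionary, and port 8080 are assumptions for illustration; a real service would more likely use an official client library than hand-render the text format.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counters; a real application would update these as it runs.
REQUEST_COUNTS = {("GET", "200"): 0, ("GET", "500"): 0}


def render_metrics(counts):
    """Render counters in the Prometheus text exposition format."""
    lines = [
        "# HELP http_requests_total Total HTTP requests handled.",
        "# TYPE http_requests_total counter",
    ]
    for (method, status), value in sorted(counts.items()):
        lines.append(
            f'http_requests_total{{method="{method}",status="{status}"}} {value}'
        )
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Answers each scrape of /metrics with the current counter values."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUEST_COUNTS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Block and serve /metrics on the given port (call from your own entry point)."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling serve() starts the endpoint; Prometheus then fetches /metrics on its own schedule, which is why the application never needs to know where Prometheus runs.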

When to Use

Prometheus is best suited for environments with dynamic workloads, where microservices or containerized architectures are prevalent. It is ideal for:

  • Monitoring Kubernetes clusters
  • Tracking application performance in real-time
  • Monitoring infrastructure such as hosts and containers through exporters

Limitations

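  • Prometheus stores data on local disk and is not designed for durable long-term storage. By default it retains 15 days of data; longer retention typically relies on a remote storage integration.
  • Advanced features such as federation require a relatively complex setup.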

Pricing Notes

Prometheus is free and open-source. However, running it in a cloud environment may incur costs based on storage, compute, and networking.


Syntax/Configuration

Prometheus Configuration

Prometheus is configured using a YAML file. A basic configuration might look like this:

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'nodejs-app'
    static_configs:
      - targets: ['localhost:3000']

Docker Commands

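To run Prometheus using Docker, use the following command. Note that inside the container, localhost refers to the container itself, so a target such as localhost:3000 in prometheus.yml is reachable only if Prometheus shares the host's network (for example, with --network host on Linux) or the target is listed under an address the container can resolve: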

docker run -d \
  -p 9090:9090 \
  -v $(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml \
  prom/prometheus

Grafana Configuration

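To run Grafana using Docker, use the following command. Grafana listens on port 3000 by default; if an application on the host already uses that port (as the nodejs-app target above does), publish Grafana on a different host port, for example -p 3001:3000: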

docker run -d \
  -p 3000:3000 \
  grafana/grafana

Practical Examples

Example 1: Setting Up Prometheus

  1. Create a prometheus.yml file:
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'my_app'
    static_configs:
      - targets: ['localhost:8080']
  2. Run Prometheus:
docker run -d \
  -p 9090:9090 \
  -v $(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml \
  prom/prometheus

Example 2: Scraping Node Exporter Metrics

  1. Start the Node Exporter:
docker run -d \
  -p 9100:9100 \
  prom/node-exporter
  2. Update prometheus.yml to scrape Node Exporter:
scrape_configs:
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['localhost:9100']
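
What Prometheus receives from an exporter is plain text, one sample per line. The following sketch parses that text into (name, labels, value) tuples to show the shape of the data. The regular expressions are simplified assumptions (they skip timestamps and do not unescape label values), and the EXAMPLE payload is illustrative rather than captured from a real Node Exporter.

```python
import re

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# One label pair inside the braces: name="value".
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return a list of (metric_name, labels_dict, value) from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the # HELP / # TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Illustrative payload, shaped like Node Exporter output.
EXAMPLE = """\
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 1234.5
node_cpu_seconds_total{cpu="0",mode="user"} 67.25
node_memory_MemTotal_bytes 8.0e+09
"""
```

Running parse_samples(EXAMPLE) yields three samples; each distinct combination of metric name and labels is a separate time series in Prometheus.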

Example 3: Using PromQL to Query Metrics

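In the Prometheus web UI, navigate to the "Graph" tab and run the following PromQL query to see the per-second rate of CPU time over the last 5 minutes, broken down by core and mode:

rate(node_cpu_seconds_total[5m])

Because the result includes the idle mode, it is not a utilization figure by itself. To see the fraction of CPU time each instance spent busy, exclude idle time:

1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))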
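
For intuition about what rate() does with a counter, here is a simplified sketch: it sums the increases between consecutive samples, treats any drop as a counter reset, and divides by the elapsed time. This is an approximation under stated assumptions; Prometheus additionally extrapolates toward the window boundaries, which this sketch ignores. The function name simple_rate is hypothetical.

```python
def simple_rate(samples):
    """Per-second increase of a counter over a list of (timestamp, value) samples.

    A value lower than the previous one is treated as a restart from zero.
    Returns None when there are fewer than two samples or no time has elapsed.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset, everything counted since the restart is new increase.
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None
```

For example, a counter scraped every 15 seconds that grows by 15 per scrape gives a rate of 1.0 per second, and a process restart in the middle of the window does not produce a negative rate.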

Example 4: Configuring Alerts

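In prometheus.yml, reference a rule file:

rule_files:
  - "alert.rules"

Then define the alerting rule in alert.rules, a separate YAML file next to prometheus.yml (when running in Docker, mount it into /etc/prometheus/ as well). The expression excludes idle time so that the alert fires on busy CPUs rather than on idle ones:

# alert.rules
groups:
  - name: example
    rules:
      - alert: HighCpuUsage
        expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) > 0.7
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage detected"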
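
The for clause means the condition must hold continuously before the alert fires; until then the alert is pending. A minimal sketch of that state machine, assuming one rule, one series, and evaluation timestamps in seconds (the function name alert_states is hypothetical):

```python
def alert_states(evaluations, for_seconds):
    """Return a list of (timestamp, state) for a single alerting rule.

    evaluations: iterable of (timestamp_seconds, condition_is_true).
    state is one of "inactive", "pending", or "firing".
    """
    active_since = None
    states = []
    for ts, condition in evaluations:
        if not condition:
            # Any false evaluation resets the timer.
            active_since = None
            states.append((ts, "inactive"))
            continue
        if active_since is None:
            active_since = ts
        state = "firing" if ts - active_since >= for_seconds else "pending"
        states.append((ts, state))
    return states
```

With for: 10m (600 seconds) and an evaluation every minute, a condition that becomes true at t=0 stays pending through t=540 and fires at t=600. A single false evaluation in between starts the wait over, which is what makes for useful against short spikes.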

Example 5: Running Alertmanager

  1. Run Alertmanager:
docker run -d \
  -p 9093:9093 \
  prom/alertmanager
  2. Configure Alertmanager in prometheus.yml:
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']
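
To check that Alertmanager is reachable before wiring up real rules, one option is to post a hand-made test alert to its v2 HTTP API. The sketch below assumes Alertmanager is listening on localhost:9093 as in the command above; the helper names and the alert contents are made up for illustration.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def build_test_alert(name, severity, summary, minutes=5):
    """Build a one-element alert list in the shape the v2 alerts endpoint accepts."""
    now = datetime.now(timezone.utc)
    return [{
        "labels": {"alertname": name, "severity": severity},
        "annotations": {"summary": summary},
        "startsAt": now.isoformat(),
        # After endsAt passes, Alertmanager treats the alert as resolved.
        "endsAt": (now + timedelta(minutes=minutes)).isoformat(),
    }]


def send_alerts(alerts, base_url="http://localhost:9093"):
    """POST the alerts to Alertmanager and return the HTTP status code."""
    request = urllib.request.Request(
        f"{base_url}/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

Calling send_alerts(build_test_alert("HighCpuUsage", "warning", "High CPU usage detected")) should make the alert appear in the Alertmanager web UI if the service is up.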

Example 6: Creating Grafana Dashboard

  • Access Grafana at http://localhost:3000.
  • Add Prometheus as a data source.
  • Create a new dashboard and add a panel with the following query:
sum(rate(http_requests_total[5m])) by (status)
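
Grafana obtains panel data by querying the Prometheus HTTP API, and the same query can be run by hand to verify it before building a panel. A minimal sketch, assuming Prometheus at localhost:9090 and an application that exposes http_requests_total with a status label (the helper names are hypothetical):

```python
import json
import urllib.parse
import urllib.request


def instant_query_url(query, base_url="http://localhost:9090"):
    """Build the URL for an instant query against the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": query})


def parse_vector(payload):
    """Turn an instant-vector response into a list of (labels, value) pairs."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each value is a [timestamp, "number as string"] pair.
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


def run_query(query, base_url="http://localhost:9090"):
    """Execute the query and return the parsed (labels, value) pairs."""
    with urllib.request.urlopen(instant_query_url(query, base_url), timeout=5) as resp:
        return parse_vector(json.load(resp))
```

An empty list from run_query usually means the metric or label does not exist yet, which is worth ruling out before debugging the dashboard itself.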

Example 7: Using Grafana Variables

To create a more dynamic dashboard, use Grafana variables. For example, create a variable for the application name and use it in your queries:

sum(rate(http_requests_total{app="$app_name"}[5m])) by (status)
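
Grafana performs the variable substitution itself before sending the query to Prometheus. The sketch below only mimics the simplest single-value case with Python's string.Template, to show what the final query looks like; the value "checkout" is a made-up application name, and multi-value variables are handled differently by Grafana.

```python
from string import Template

# The panel query, with $app_name as the placeholder Grafana would fill in.
PANEL_QUERY = Template(
    'sum(rate(http_requests_total{app="$app_name"}[5m])) by (status)'
)


def render_query(app_name):
    """Return the query as it would look for one selected variable value."""
    return PANEL_QUERY.substitute(app_name=app_name)
```

render_query("checkout") produces the query with app="checkout" in the label matcher, so one dashboard can serve every application that exposes the same metric.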

Example 8: Advanced Alerting

Add further rules to alert.rules as needed. For example, the following rule fires when available memory stays below 15% of total memory for 5 minutes:

groups:
  - name: advanced-alerts
    rules:
      - alert: HighMemoryUsage
        expr: (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) < 0.15
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Memory usage critical"

Real-World Scenarios

Scenario 1: Monitoring a Microservices Architecture

In a microservices architecture, each service can expose its metrics. Using Prometheus and Grafana, you can monitor the health of each service and visualize dependencies between them, enabling better incident management.

Scenario 2: Kubernetes Cluster Monitoring

When deployed in Kubernetes, Prometheus can scrape metrics from pods and nodes effortlessly, allowing for a comprehensive view of cluster health. With Grafana, you can create dashboards that visualize Kubernetes resource usage and application performance.

Scenario 3: Alerting on Performance Metrics

Using Prometheus and Alertmanager, you can set up alerts based on critical performance metrics, such as response time or error rates. This allows teams to respond quickly to performance degradation before it impacts end-users.


Best Practices

  1. Secure Your Setup: Use TLS/SSL for communication between Prometheus and its targets. Ensure Grafana is secured with proper authentication.
  2. Optimize Scrape Intervals: Adjust scrape intervals based on the criticality of your services to minimize load and maximize responsiveness.
  3. Use Labels Wisely: Organize metrics with labels to facilitate easier querying and aggregation.
  4. Implement Retention Policies: Configure data retention (for example, with the --storage.tsdb.retention.time flag) to manage storage effectively.
  5. Automate Deployments: Use tools like Helm for deploying Prometheus and Grafana in Kubernetes environments to ensure consistency across deployments.
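
Scrape interval and retention together drive disk usage. The Prometheus storage documentation gives a rough sizing formula: retention time in seconds, times ingested samples per second, times bytes per sample (typically 1 to 2 bytes after compression). The sketch below applies it; the series count in the example is an assumption, so substitute your own figures.

```python
def samples_per_second(active_series, scrape_interval_seconds):
    """Each active series produces one sample per scrape."""
    return active_series / scrape_interval_seconds


def disk_bytes(active_series, scrape_interval_seconds, retention_days,
               bytes_per_sample=2.0):
    """Rough disk estimate: retention seconds * samples per second * bytes per sample."""
    retention_seconds = retention_days * 24 * 60 * 60
    rate = samples_per_second(active_series, scrape_interval_seconds)
    return retention_seconds * rate * bytes_per_sample
```

For 100,000 active series scraped every 15 seconds and kept for 15 days, disk_bytes(100_000, 15, 15) comes to roughly 17 GB at 2 bytes per sample; moving to a 60-second interval cuts that to about a quarter.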

Common Errors

  1. Error: "Error scraping target"

    • Cause: The target is not reachable.
    • Fix: Check network connectivity and ensure the target service is running.
  2. Error: "no data points found"

    • Cause: Prometheus cannot find metrics at the specified endpoint.
    • Fix: Verify that the exporter is running and properly configured.
  3. Error: "alert is firing" when it shouldn't

    • Cause: The alerting expression may be too sensitive.
    • Fix: Adjust the alerting threshold or conditions.
  4. Error: "Invalid syntax" in PromQL

    • Cause: Incorrectly formatted query.
    • Fix: Review the query syntax against PromQL documentation.
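
For scrape problems in particular, the Prometheus targets API reports each target's health and last scrape error, which is often quicker than guessing. A minimal sketch, assuming Prometheus at localhost:9090 (the helper names are hypothetical):

```python
import json
import urllib.request


def unhealthy_targets(payload):
    """Return (job, scrape URL, last error) for every active target that is not up."""
    problems = []
    for target in payload["data"]["activeTargets"]:
        if target.get("health") != "up":
            problems.append((
                target.get("labels", {}).get("job", ""),
                target.get("scrapeUrl", ""),
                target.get("lastError", ""),
            ))
    return problems


def fetch_targets(base_url="http://localhost:9090"):
    """Fetch the current target list from the Prometheus HTTP API."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)
```

unhealthy_targets(fetch_targets()) returns an empty list when every target is being scraped successfully; otherwise the last error usually points at the cause, such as a refused connection or a timeout.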

Related Services/Tools

Tool          | Description                                            | Integration Capability
Prometheus    | Open-source monitoring and alerting toolkit.           | Excellent with various exporters
Grafana       | Visualization tool to create dashboards from metrics.  | Integrates well with Prometheus
Alertmanager  | Manages alerts sent by Prometheus.                     | Directly integrated with Prometheus
ELK Stack     | A suite for logging and monitoring.                    | Can be used alongside Prometheus
Zabbix        | An alternative monitoring tool with built-in alerting. | Standalone, not directly integrated

Automation Script

Here’s a simple Bash script to automate the setup of Prometheus and Grafana using Docker:

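#!/bin/bash
# Stop at the first failing command or unset variable.
set -euo pipefail

# Create Prometheus configuration (quoted EOF: write the text literally)
cat <<'EOF' > prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'nodejs-app'
    static_configs:
      - targets: ['localhost:3000']
EOF

# Run Prometheus with the generated configuration mounted into the container.
# The path is quoted so that directories containing spaces still work.
docker run -d \
  -p 9090:9090 \
  -v "$(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml" \
  prom/prometheus

# Run Grafana
docker run -d \
  -p 3000:3000 \
  grafana/grafana

echo "Prometheus (http://localhost:9090) and Grafana (http://localhost:3000) are now running!"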

Conclusion

In this tutorial, we explored the essential aspects of monitoring applications using Prometheus and Grafana. We covered how to set up Prometheus, configure exporters, create alerts, and visualize metrics using Grafana. Monitoring is a critical part of maintaining reliability and performance in modern applications, and leveraging these tools can greatly enhance observability.

For further exploration, consider diving into the official documentation for Prometheus and Grafana. You can set up more complex monitoring scenarios and improve your alerting strategies.
