Alerts and Observability in Azure Monitor with Terraform

A complete tutorial on azurerm_monitor_action_group in Terraform, covering action groups, alert rules, and Log Analytics.

Introduction

In today's cloud-centric environment, monitoring and observability are critical for maintaining the health and performance of applications and infrastructure. Azure Monitor provides a unified platform for monitoring your applications and services, enabling you to collect, analyze, and act on telemetry data from your Azure environment. A key component of Azure Monitor is action groups, which allow you to define a set of actions that are triggered when an alert is fired. This tutorial will guide you through setting up alerts and observability in Azure Monitor using Terraform, focusing on the azurerm_monitor_action_group, alert rules, and Log Analytics.

The importance of Infrastructure as Code (IaC) cannot be overstated. It allows developers and DevOps teams to automate the provisioning and management of infrastructure, reducing manual errors and increasing consistency. With Terraform, you can express your Azure infrastructure as code, making it easily reproducible and manageable.

Use Cases

  • Proactive Monitoring: Automatically monitor application performance and resource utilization.
  • Incident Response: Quickly notify team members of issues and trigger automated remediation actions.
  • Centralized Logging: Save logs for analysis and compliance using Log Analytics.

Prerequisites

Before you start, ensure you have the following:

  • Terraform CLI: Installed on your local machine. You can download it from Terraform's website.
  • Azure Subscription: If you don't have one, you can create a free account through the Azure free trial.
  • Azure CLI: Installed for managing Azure resources from the command line. Installation instructions are in the Azure CLI documentation.
  • Service Principal: Set up a service principal to authenticate Terraform with Azure. You can create one using the following command:
    az ad sp create-for-rbac --role="Contributor" --scopes="/subscriptions/{your_subscription_id}"
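
The command prints an appId, a password, and a tenant value. As a minimal sketch, those values can be passed to the azurerm provider through input variables. The variable names below are assumptions chosen for illustration; the provider also reads the ARM_CLIENT_ID, ARM_CLIENT_SECRET, ARM_TENANT_ID, and ARM_SUBSCRIPTION_ID environment variables, which avoids writing credentials into configuration at all.

```hcl
# Hypothetical variable names. Supply values through TF_VAR_* environment
# variables or a secrets store rather than committing them to version control.
variable "subscription_id" { type = string }
variable "tenant_id" { type = string }
variable "client_id" { type = string } # appId from the command output

variable "client_secret" {
  type      = string
  sensitive = true # password from the command output
}

provider "azurerm" {
  features {}

  subscription_id = var.subscription_id
  tenant_id       = var.tenant_id
  client_id       = var.client_id
  client_secret   = var.client_secret
}
```

Use this in place of a bare provider block, since a configuration can hold only one default azurerm provider configuration.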
    

Fundamental Concepts

Key Terminology

  • Action Group: A collection of notification preferences and automated actions that can be triggered by alerts.
  • Alert Rule: Defines the conditions under which an alert is fired, such as metric thresholds or log query results.
  • Log Analytics: A service that allows you to query and analyze log and performance data across Azure resources.

Resource Dependencies

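Terraform infers dependencies from references between resources. Because an alert rule refers to an action group by its ID, Terraform creates the action group first, and the resource group before both. An explicit depends_on is only needed when a dependency is not visible through such a reference.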
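
A small sketch of an implicit dependency, using illustrative resource names that are not part of the later examples:

```hcl
resource "azurerm_resource_group" "deps_demo" {
  name     = "deps-demo-resources"
  location = "East US"
}

# Implicit dependency: reading the resource group's name attribute tells
# Terraform to create the resource group before this action group.
resource "azurerm_monitor_action_group" "deps_demo" {
  name                = "deps-demo-action-group"
  resource_group_name = azurerm_resource_group.deps_demo.name
  short_name          = "depsAG"

  email_receiver {
    name          = "OnCall"
    email_address = "oncall@example.com"
  }
}

# Anything that reads the action group's ID, such as an alert rule's action
# block or this output, is ordered after the action group in the same way.
output "deps_demo_action_group_id" {
  value = azurerm_monitor_action_group.deps_demo.id
}
```

Running terraform graph prints the dependency graph Terraform derived, which is a quick way to confirm the ordering.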

State Management

Terraform maintains a state file that tracks the resources it manages. This file is crucial for understanding resource changes and ensuring that infrastructure is consistent with the code.
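
For teams, the state file is usually kept in a remote backend instead of on a local disk. The sketch below uses the azurerm backend. The resource group, storage account, and container names are placeholders and must point at storage that already exists.

```hcl
terraform {
  backend "azurerm" {
    resource_group_name  = "tfstate-resources"  # placeholder
    storage_account_name = "tfstateexample001"  # placeholder, must be globally unique
    container_name       = "tfstate"            # placeholder
    key                  = "monitoring.terraform.tfstate"
  }
}
```

After adding or changing a backend block, run terraform init so Terraform can configure the backend and, if needed, migrate existing state.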

Resource Syntax

The primary resource for action groups in Azure Monitor using Terraform is azurerm_monitor_action_group. Here’s an overview of its syntax:

resource "azurerm_monitor_action_group" "example" {
  name                = "example-action-group"
  resource_group_name = azurerm_resource_group.example.name
  short_name          = "exAG"

  email_receiver {
    name                    = "EmailReceiver1"
    email_address           = "example@example.com"
    use_common_alert_schema = true
  }

  sms_receiver {
    name         = "SMSReceiver1"
    country_code = "1" # Country calling code, digits only
    phone_number = "1234567890"
  }

  webhook_receiver {
    name        = "WebhookReceiver1"
    service_uri = "https://example.com/webhook"
  }
}

Arguments Table

Argument              Description
name                  The name of the action group.
resource_group_name   The name of the resource group where the action group is created.
short_name            A short name (up to 12 characters) used in SMS messages.
email_receiver        Email notification settings. The block can be repeated.
sms_receiver          SMS notification settings. The block can be repeated.
webhook_receiver      Webhook notification settings. The block can be repeated.
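
Because receiver blocks can be repeated, they can also be generated from a variable. The following is a sketch, assuming a hypothetical alert_emails map and the azurerm_resource_group.example resource defined in Example 1 below; it uses Terraform's dynamic block to emit one email_receiver per map entry.

```hcl
# Hypothetical input: receiver name => email address.
variable "alert_emails" {
  type = map(string)
  default = {
    OnCall   = "oncall@example.com"
    Platform = "platform@example.com"
  }
}

resource "azurerm_monitor_action_group" "team" {
  name                = "team-action-group"
  resource_group_name = azurerm_resource_group.example.name
  short_name          = "teamAG"

  # One email_receiver block is generated for each entry in the map.
  dynamic "email_receiver" {
    for_each = var.alert_emails
    content {
      name                    = email_receiver.key
      email_address           = email_receiver.value
      use_common_alert_schema = true
    }
  }
}
```

Adding or removing a recipient then becomes a change to the variable value, not to the resource definition.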

Practical Examples

Example 1: Basic Action Group

resource "azurerm_resource_group" "example" {
  name     = "example-resources"
  location = "East US"
}

resource "azurerm_monitor_action_group" "basic" {
  name                = "basic-action-group"
  resource_group_name = azurerm_resource_group.example.name
  short_name          = "basicAG"

  email_receiver {
    name                    = "alert@example.com"
    email_address           = "alert@example.com"
    use_common_alert_schema = true
  }
}

Example 2: Action Group with Multiple Receivers

resource "azurerm_monitor_action_group" "multi_receivers" {
  name                = "multi-receiver-action-group"
  resource_group_name = azurerm_resource_group.example.name
  short_name          = "multiAG"

  email_receiver {
    name                    = "EmailReceiver"
    email_address           = "email@example.com"
    use_common_alert_schema = true
  }

  sms_receiver {
    name         = "SMSReceiver"
    country_code = "1" # Country calling code, digits only
    phone_number = "1234567890"
  }

  webhook_receiver {
    name        = "WebhookReceiver"
    service_uri = "https://example.com/webhook"
  }
}

Example 3: Creating an Alert Rule for a Metric

resource "azurerm_monitor_metric_alert" "vm_alert" {
  name                = "vm-metric-alert"
  resource_group_name = azurerm_resource_group.example.name

  # The monitored resource is selected through scopes; metric alerts take no
  # location argument. azurerm_virtual_machine.example is assumed to be
  # defined elsewhere in your configuration.
  scopes = [azurerm_virtual_machine.example.id]

  criteria {
    metric_namespace = "Microsoft.Compute/virtualMachines"
    metric_name      = "Percentage CPU"
    aggregation      = "Average"
    operator         = "GreaterThan"
    threshold        = 80
  }

  action {
    action_group_id = azurerm_monitor_action_group.basic.id
  }
}

Example 4: Log Analytics Workspace

resource "azurerm_log_analytics_workspace" "example" {
  name                = "example-law"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
  sku                 = "PerGB2018"
  retention_in_days   = 30
}
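
A workspace holds only the data that is sent to it. As a hedged sketch, a subscription-level diagnostic setting can route Activity Log entries into the workspace above so that tables such as AzureActivity receive data. The resource name and the single Administrative category are illustrative choices, and the enabled_log block assumes a reasonably recent azurerm provider version.

```hcl
# Looks up the subscription the provider is authenticated against.
data "azurerm_subscription" "current" {}

resource "azurerm_monitor_diagnostic_setting" "activity_to_law" {
  name                       = "activity-log-to-law" # illustrative name
  target_resource_id         = data.azurerm_subscription.current.id
  log_analytics_workspace_id = azurerm_log_analytics_workspace.example.id

  # Only the Administrative category is forwarded here; add further
  # enabled_log blocks for other categories as needed.
  enabled_log {
    category = "Administrative"
  }
}
```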

Example 5: Query Alerts in Log Analytics

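# Log query alerts are created with the scheduled query rules resource.
resource "azurerm_monitor_scheduled_query_rules_alert_v2" "log_alert" {
  name                = "log-query-alert"
  resource_group_name = azurerm_resource_group.example.name
  location            = azurerm_resource_group.example.location

  scopes               = [azurerm_log_analytics_workspace.example.id]
  severity             = 2      # Required; 2 is an illustrative choice (0 is most severe)
  evaluation_frequency = "PT5M" # Check every 5 minutes
  window_duration      = "PT5M" # Look back 5 minutes

  criteria {
    # Adjust the column name and value to match the schema in your workspace.
    query                   = "AzureActivity | where ActivityStatus == 'Failed'"
    time_aggregation_method = "Count"
    operator                = "GreaterThan"
    threshold               = 0
  }

  action {
    action_groups = [azurerm_monitor_action_group.basic.id]
  }
}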

Example 6: Complex Alert with Multiple Conditions

resource "azurerm_monitor_metric_alert" "complex_alert" {
  name                = "complex-metric-alert"
  resource_group_name = azurerm_resource_group.example.name

  # azurerm_virtual_machine.example is assumed to be defined elsewhere.
  scopes = [azurerm_virtual_machine.example.id]

  # The alert fires only when every criteria block is satisfied.
  criteria {
    metric_namespace = "Microsoft.Compute/virtualMachines"
    metric_name      = "Network In"
    aggregation      = "Total"
    operator         = "GreaterThan"
    threshold        = 100000
  }

  criteria {
    metric_namespace = "Microsoft.Compute/virtualMachines"
    metric_name      = "Network Out"
    aggregation      = "Total"
    operator         = "GreaterThan"
    threshold        = 100000
  }

  action {
    action_group_id = azurerm_monitor_action_group.basic.id
  }
}

Example 7: Combining Action Groups with Alert Rules

resource "azurerm_monitor_action_group" "combined" {
  name                = "combined-action-group"
  resource_group_name = azurerm_resource_group.example.name
  short_name          = "combAG"

  email_receiver {
    name                    = "CombinedEmail"
    email_address           = "combined@example.com"
    use_common_alert_schema = true
  }

  sms_receiver {
    name         = "CombinedSMS"
    country_code = "1" # Country calling code, digits only
    phone_number = "9876543210"
  }
}

resource "azurerm_monitor_metric_alert" "combined_alert" {
  name                = "combined-alert"
  resource_group_name = azurerm_resource_group.example.name

  # azurerm_virtual_machine.example is assumed to be defined elsewhere.
  scopes = [azurerm_virtual_machine.example.id]

  criteria {
    metric_namespace = "Microsoft.Compute/virtualMachines"
    metric_name      = "Disk Read Bytes"
    aggregation      = "Average"
    operator         = "GreaterThan"
    threshold        = 1000
  }

  action {
    action_group_id = azurerm_monitor_action_group.combined.id
  }
}

Example 8: Cleanup Action Group

resource "azurerm_monitor_action_group" "cleanup" {
  count               = 2
  name                = "cleanup-action-group-${count.index}"
  resource_group_name = azurerm_resource_group.example.name
  short_name          = "cleanupAG"

  email_receiver {
    name                    = "CleanupEmailReceiver"
    email_address           = "cleanup@example.com"
    use_common_alert_schema = true
  }
}
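
Because this resource uses count, each instance is addressed by index. A short sketch, with a hypothetical output name, that exposes the IDs of both action groups through a splat expression:

```hcl
# Collects the ID of every instance created by count into one list.
output "cleanup_action_group_ids" {
  value = azurerm_monitor_action_group.cleanup[*].id
}
```

Setting count to 0, or deleting the resource block, and then running terraform apply removes these action groups again.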

Real-World Use Cases

  1. Application Performance Monitoring: Create action groups that notify developers via email and SMS when application performance metrics exceed thresholds.
  2. Infrastructure Health Checks: Use alert rules on virtual machines to monitor CPU and memory usage, triggering alerts when they exceed predefined limits.
  3. Security Monitoring: Set up log query alerts for unauthorized access attempts logged in Azure Activity logs, notifying the security team immediately.

Best Practices

  1. Use Modules: Organize your Terraform code into reusable modules to promote consistency and maintainability.
  2. Naming Conventions: Adopt a clear naming convention for resources and variables to make your infrastructure easier to understand.
  3. Version Control: Keep your Terraform configurations in a version control system (like Git) to track changes and collaborate effectively.
  4. State Management: Use remote state storage (e.g., Azure Storage) for collaborative environments to avoid conflicting changes.
  5. Testing: Regularly test your Terraform configurations using terraform plan before applying changes to ensure accuracy.
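
As one way to apply the naming convention practice, a shared prefix can be built once in a locals block and reused. The workload and environment variables and their defaults are hypothetical, and the example assumes the azurerm_resource_group.example resource from Example 1.

```hcl
variable "workload" {
  type    = string
  default = "shop" # hypothetical workload name
}

variable "environment" {
  type    = string
  default = "dev"
}

locals {
  # One prefix reused by every monitoring resource, for example "shop-dev".
  name_prefix = "${var.workload}-${var.environment}"
}

resource "azurerm_monitor_action_group" "named" {
  name                = "${local.name_prefix}-alerts-ag"
  resource_group_name = azurerm_resource_group.example.name
  short_name          = "alertsAG" # Limited to 12 characters

  email_receiver {
    name          = "${local.name_prefix}-oncall"
    email_address = "oncall@example.com"
  }
}
```

Running terraform validate and terraform plan after a change like this shows the resulting names before anything is created.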

Common Errors

  1. Error: "Resource group not found"

    • Cause: The specified resource group does not exist.
    • Solution: Ensure the resource group is created before referencing it in other resources.
  2. Error: "The action group has too many receivers"

    • Cause: Exceeding the limit of allowed receivers in an action group.
    • Solution: Review and reduce the number of receivers.
  3. Error: "Invalid or missing required property"

    • Cause: Required attributes for a resource are not provided.
    • Solution: Check the resource documentation to ensure all required fields are filled.
  4. Error: "Resource already exists"

    • Cause: Attempting to create a resource that already exists in Azure.
    • Solution: Use a unique name for resources or import the existing resource into Terraform.
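
For the "Resource already exists" case, an existing action group can be brought under Terraform management with an import block (available in Terraform 1.5 and later). This is a sketch: the subscription ID is a zeroed placeholder, and the resource group and action group names are assumed to match Example 1.

```hcl
import {
  # Resource address in your configuration that should own the existing object.
  to = azurerm_monitor_action_group.basic

  # Azure resource ID of the existing action group (placeholder subscription ID).
  id = "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/example-resources/providers/Microsoft.Insights/actionGroups/basic-action-group"
}
```

The next terraform plan then reports the import alongside any differences between the existing action group and the configuration.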

Related Resources

Resource Type             Terraform Resource
Action Groups             azurerm_monitor_action_group
Metric Alerts             azurerm_monitor_metric_alert
Log Analytics Workspace   azurerm_log_analytics_workspace

Complete Infrastructure Script

provider "azurerm" {
  features {}
}

resource "azurerm_resource_group" "example" {
  name     = "example-resources"
  location = "East US"
}

resource "azurerm_log_analytics_workspace" "example" {
  name                = "example-law"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
  sku                 = "PerGB2018"
  retention_in_days   = 30
}

resource "azurerm_monitor_action_group" "example" {
  name                = "example-action-group"
  resource_group_name = azurerm_resource_group.example.name
  short_name          = "exAG"

  email_receiver {
    name                    = "EmailReceiver1"
    email_address           = "example@example.com"
    use_common_alert_schema = true
  }
}

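# ID of the virtual machine to monitor. This variable is a placeholder so the
# alert has a target; supply the ID of a VM that exists in your subscription.
variable "monitored_vm_id" {
  type = string
}

resource "azurerm_monitor_metric_alert" "example" {
  name                = "example-metric-alert"
  resource_group_name = azurerm_resource_group.example.name
  scopes              = [var.monitored_vm_id]

  criteria {
    metric_namespace = "Microsoft.Compute/virtualMachines"
    metric_name      = "Percentage CPU"
    aggregation      = "Average"
    operator         = "GreaterThan"
    threshold        = 80
  }

  action {
    action_group_id = azurerm_monitor_action_group.example.id
  }
}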

Conclusion

In this tutorial, we explored how to set up alerts and observability in Azure Monitor using Terraform. We covered the creation of action groups, alert rules, and Log Analytics workspaces. By adopting Infrastructure as Code practices with Terraform, you can streamline your monitoring setup, ensure consistency, and enhance your DevOps workflows. As you continue to build out your Azure environment, remember to apply best practices and leverage Terraform's capabilities for efficient infrastructure management.

Now it's your turn to implement monitoring and observability in your Azure infrastructure using Terraform! 🚀