Alerts and Observability in Azure Monitor with Terraform
Introduction
Monitoring and observability are critical for maintaining the health and performance of cloud applications and infrastructure. Azure Monitor provides a unified platform for monitoring your applications and services, enabling you to collect, analyze, and act on telemetry data from your Azure environment. A key component of Azure Monitor is action groups, which let you define a set of actions that are triggered when an alert fires. This tutorial guides you through setting up alerts and observability in Azure Monitor using Terraform, focusing on the azurerm_monitor_action_group resource, alert rules, and Log Analytics.
The importance of Infrastructure as Code (IaC) cannot be overstated. It allows developers and DevOps teams to automate the provisioning and management of infrastructure, reducing manual errors and increasing consistency. With Terraform, you can express your Azure infrastructure as code, making it easily reproducible and manageable.
Use Cases
- Proactive Monitoring: Automatically monitor application performance and resource utilization.
- Incident Response: Quickly notify team members of issues and trigger automated remediation actions.
- Centralized Logging: Save logs for analysis and compliance using Log Analytics.
Prerequisites
Before you start, ensure you have the following:
- Terraform CLI: Installed on your local machine. You can download it from Terraform's website.
- Azure Subscription: If you don't have one, create a free account through the Azure free trial.
- Azure CLI: Installed for managing Azure resources from the command line. Installation instructions are in the Azure CLI documentation.
- Service Principal: Set up a service principal to authenticate Terraform with Azure. You can create one using the following command:
az ad sp create-for-rbac --role="Contributor" --scopes="/subscriptions/{your_subscription_id}"
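The command prints an appId, a password, and a tenant value. One common way to hand these to Terraform is through environment variables, so that no secret is written into the configuration. The following is a minimal sketch of a matching provider setup; the version constraint is an assumption and should be replaced with the provider version you have actually tested.

```hcl
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0" # assumption: pin to the major version you have tested
    }
  }
}

# Credentials are read from the environment instead of being hard-coded:
#   ARM_CLIENT_ID       = appId from the command output
#   ARM_CLIENT_SECRET   = password from the command output
#   ARM_TENANT_ID       = tenant from the command output
#   ARM_SUBSCRIPTION_ID = your subscription ID
provider "azurerm" {
  features {}
}
```

With the four variables exported in your shell, terraform init and terraform plan should authenticate as the service principal.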
Fundamental Concepts
Key Terminology
- Action Group: A collection of notification preferences and automated actions that can be triggered by alerts.
- Alert Rule: Defines the conditions under which an alert is fired, such as metric thresholds or log query results.
- Log Analytics: A service that allows you to query and analyze log and performance data across Azure resources.
Resource Dependencies
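Terraform infers the order of operations from references between resources. When an alert rule refers to an action group's id attribute, for example, Terraform creates the action group first and the alert rule afterwards. Resources that depend on each other without such a reference need an explicit depends_on.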
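As a small illustration of an implicit dependency, the sketch below creates a resource group and an action group that refers to it. All resource names and the email address are hypothetical and chosen only for this example.

```hcl
resource "azurerm_resource_group" "monitoring" {
  name     = "monitoring-rg" # hypothetical name
  location = "East US"
}

resource "azurerm_monitor_action_group" "oncall" {
  name = "oncall-action-group" # hypothetical name

  # Implicit dependency: this reference makes Terraform create the
  # resource group before the action group.
  resource_group_name = azurerm_resource_group.monitoring.name
  short_name          = "oncall"

  email_receiver {
    name          = "OnCallEmail"
    email_address = "oncall@example.com" # hypothetical address
  }
}

# Anything that consumes this value (an alert rule, for instance)
# is ordered after the action group in the same way.
output "oncall_action_group_id" {
  value = azurerm_monitor_action_group.oncall.id
}
```

Running terraform graph on a configuration like this shows the edges Terraform derived from the references.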
State Management
Terraform maintains a state file that tracks the resources it manages. This file is crucial for understanding resource changes and ensuring that infrastructure is consistent with the code.
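For teams, the state file is usually kept in shared remote storage rather than on one machine. A minimal sketch of an Azure Storage backend follows; the resource group, storage account, container, and key names are all hypothetical, and the storage account and container are assumed to exist before terraform init is run.

```hcl
terraform {
  backend "azurerm" {
    resource_group_name  = "tfstate-rg"                   # hypothetical
    storage_account_name = "tfstateexample001"            # hypothetical; must be globally unique
    container_name       = "tfstate"                      # hypothetical
    key                  = "monitoring.terraform.tfstate" # blob name for this configuration's state
  }
}
```

The backend block cannot reference variables or resources, which is why the values are written as literals here.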
Resource Syntax
The primary resource for action groups in Azure Monitor using Terraform is azurerm_monitor_action_group. Here’s an overview of its syntax:
resource "azurerm_monitor_action_group" "example" {
  name                = "example-action-group"
  resource_group_name = azurerm_resource_group.example.name
  short_name          = "exAG"

  email_receiver {
    name                    = "EmailReceiver1"
    email_address           = "example@example.com"
    use_common_alert_schema = true
  }

  sms_receiver {
    name         = "SMSReceiver1"
    country_code = "1" # digits only, without a leading "+"
    phone_number = "1234567890"
  }

  webhook_receiver {
    name        = "WebhookReceiver1"
    service_uri = "https://example.com/webhook"
  }
}
Arguments Table
| Argument | Description |
|---|---|
| name | The name of the action group. |
| resource_group_name | The name of the resource group where the action group is created. |
| short_name | A short name used in SMS messages. |
| email_receiver | Email notification settings. |
| sms_receiver | SMS notification settings. |
| webhook_receiver | Webhook notification settings. |
Practical Examples
Example 1: Basic Action Group
resource "azurerm_resource_group" "example" {
  name     = "example-resources"
  location = "East US"
}

resource "azurerm_monitor_action_group" "basic" {
  name                = "basic-action-group"
  resource_group_name = azurerm_resource_group.example.name
  short_name          = "basicAG"

  email_receiver {
    name                    = "alert@example.com"
    email_address           = "alert@example.com"
    use_common_alert_schema = true
  }
}
Example 2: Action Group with Multiple Receivers
resource "azurerm_monitor_action_group" "multi_receivers" {
  name                = "multi-receiver-action-group"
  resource_group_name = azurerm_resource_group.example.name
  short_name          = "multiAG"

  email_receiver {
    name                    = "EmailReceiver"
    email_address           = "email@example.com"
    use_common_alert_schema = true
  }

  sms_receiver {
    name         = "SMSReceiver"
    phone_number = "1234567890"
    country_code = "1" # digits only, without a leading "+"
  }

  webhook_receiver {
    name        = "WebhookReceiver"
    service_uri = "https://example.com/webhook"
  }
}
Example 3: Creating an Alert Rule for a Metric
# Assumes azurerm_virtual_machine.example is defined elsewhere in your configuration.
resource "azurerm_monitor_metric_alert" "vm_alert" {
  name                = "vm-metric-alert"
  resource_group_name = azurerm_resource_group.example.name

  # Metric alerts take no location; the monitored resources are listed in scopes.
  scopes = [azurerm_virtual_machine.example.id]

  criteria {
    metric_namespace = "Microsoft.Compute/virtualMachines"
    metric_name      = "Percentage CPU"
    aggregation      = "Average"
    operator         = "GreaterThan"
    threshold        = 80
  }

  action {
    action_group_id = azurerm_monitor_action_group.basic.id
  }
}
Example 4: Log Analytics Workspace
resource "azurerm_log_analytics_workspace" "example" {
  name                = "example-law"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
  sku                 = "PerGB2018"
  retention_in_days   = 30
}
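A workspace only becomes useful once something sends data to it. As one possible way to feed it, the sketch below routes the subscription's activity log into the workspace with a diagnostic setting. The setting name is hypothetical, the two categories are only a sample, and the enabled_log block assumes a reasonably recent azurerm provider (older releases used a log block instead).

```hcl
# Looks up the subscription the provider is authenticated against.
data "azurerm_subscription" "current" {}

resource "azurerm_monitor_diagnostic_setting" "activity_to_law" {
  name                       = "activity-log-to-law" # hypothetical name
  target_resource_id         = data.azurerm_subscription.current.id
  log_analytics_workspace_id = azurerm_log_analytics_workspace.example.id

  # Activity log categories to export; add others as needed.
  enabled_log {
    category = "Administrative"
  }

  enabled_log {
    category = "Security"
  }
}
```

Once records arrive, they appear in the AzureActivity table, which is what a log query alert on activity data reads from.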
Example 5: Query Alerts in Log Analytics
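# The azurerm provider has no azurerm_monitor_log_query_alert resource;
# log query alerts are created with azurerm_monitor_scheduled_query_rules_alert_v2.
resource "azurerm_monitor_scheduled_query_rules_alert_v2" "log_alert" {
  name                = "log-query-alert"
  resource_group_name = azurerm_resource_group.example.name
  location            = azurerm_resource_group.example.location
  scopes              = [azurerm_log_analytics_workspace.example.id]
  severity            = 2

  evaluation_frequency = "PT5M" # Check every 5 minutes
  window_duration      = "PT5M" # Look back 5 minutes

  criteria {
    query                   = "AzureActivity | where ActivityStatus == 'Failed'"
    time_aggregation_method = "Count"
    operator                = "GreaterThan"
    threshold               = 0
  }

  action {
    action_groups = [azurerm_monitor_action_group.basic.id]
  }
}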
Example 6: Complex Alert with Multiple Conditions
# Assumes azurerm_virtual_machine.example is defined elsewhere in your configuration.
# With several criteria blocks, the alert fires only when all of them are met.
resource "azurerm_monitor_metric_alert" "complex_alert" {
  name                = "complex-metric-alert"
  resource_group_name = azurerm_resource_group.example.name
  scopes              = [azurerm_virtual_machine.example.id]

  criteria {
    metric_namespace = "Microsoft.Compute/virtualMachines"
    metric_name      = "Network In"
    aggregation      = "Total"
    operator         = "GreaterThan"
    threshold        = 100000
  }

  criteria {
    metric_namespace = "Microsoft.Compute/virtualMachines"
    metric_name      = "Network Out"
    aggregation      = "Total"
    operator         = "GreaterThan"
    threshold        = 100000
  }

  action {
    action_group_id = azurerm_monitor_action_group.basic.id
  }
}
Example 7: Combining Action Groups with Alert Rules
resource "azurerm_monitor_action_group" "combined" {
  name                = "combined-action-group"
  resource_group_name = azurerm_resource_group.example.name
  short_name          = "combAG"

  email_receiver {
    name                    = "CombinedEmail"
    email_address           = "combined@example.com"
    use_common_alert_schema = true
  }

  sms_receiver {
    name         = "CombinedSMS"
    phone_number = "9876543210"
    country_code = "1" # digits only, without a leading "+"
  }
}
# Assumes azurerm_virtual_machine.example is defined elsewhere in your configuration.
resource "azurerm_monitor_metric_alert" "combined_alert" {
  name                = "combined-alert"
  resource_group_name = azurerm_resource_group.example.name
  scopes              = [azurerm_virtual_machine.example.id]

  criteria {
    metric_namespace = "Microsoft.Compute/virtualMachines"
    metric_name      = "Disk Read Bytes"
    aggregation      = "Average"
    operator         = "GreaterThan"
    threshold        = 1000
  }

  action {
    action_group_id = azurerm_monitor_action_group.combined.id
  }
}
Example 8: Cleanup Action Group
resource "azurerm_monitor_action_group" "cleanup" {
  count               = 2
  name                = "cleanup-action-group-${count.index}"
  resource_group_name = azurerm_resource_group.example.name
  short_name          = "cleanupAG"

  email_receiver {
    name                    = "CleanupEmailReceiver"
    email_address           = "cleanup@example.com"
    use_common_alert_schema = true
  }
}
Real-World Use Cases
- Application Performance Monitoring: Create action groups that notify developers via email and SMS when application performance metrics exceed thresholds.
- Infrastructure Health Checks: Use alert rules on virtual machines to monitor CPU and memory usage, triggering alerts when they exceed predefined limits.
- Security Monitoring: Set up log query alerts for unauthorized access attempts logged in Azure Activity logs, notifying the security team immediately.
Best Practices
- Use Modules: Organize your Terraform code into reusable modules to promote consistency and maintainability.
- Naming Conventions: Adopt a clear naming convention for resources and variables to make your infrastructure easier to understand.
- Version Control: Keep your Terraform configurations in a version control system (like Git) to track changes and collaborate effectively.
- State Management: Use remote state storage (e.g., Azure Storage) for collaborative environments to avoid conflicting changes.
- Testing: Run terraform plan before every apply and review the proposed changes to confirm they match your intent.
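The "Use Modules" practice above can be sketched as a small module that wraps an action group and is called from the root configuration. The module path, variable names, and example values are all hypothetical; this shows one possible layout, not a prescribed structure.

```hcl
# modules/alerting/variables.tf (hypothetical module layout)
variable "resource_group_name" { type = string }
variable "action_group_name" { type = string }
variable "short_name" { type = string } # Azure limits this to 12 characters
variable "email_addresses" { type = list(string) }

# modules/alerting/main.tf
resource "azurerm_monitor_action_group" "this" {
  name                = var.action_group_name
  resource_group_name = var.resource_group_name
  short_name          = var.short_name

  # One email_receiver block is generated per address in the list.
  dynamic "email_receiver" {
    for_each = var.email_addresses
    content {
      name                    = "email-${email_receiver.key}" # key is the list index
      email_address           = email_receiver.value
      use_common_alert_schema = true
    }
  }
}

output "action_group_id" {
  value = azurerm_monitor_action_group.this.id
}

# Root module: main.tf
module "alerting" {
  source = "./modules/alerting" # hypothetical local path

  resource_group_name = azurerm_resource_group.example.name
  action_group_name   = "team-a-alerts"
  short_name          = "teamA"
  email_addresses     = ["alert@example.com"]
}
```

Alert rules in the root module can then refer to module.alerting.action_group_id instead of repeating the action group definition for each team or environment.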
Common Errors
Error: "Resource group not found"
- Cause: The specified resource group does not exist.
- Solution: Ensure the resource group is created before referencing it in other resources.
Error: "The action group has too many receivers"
- Cause: Exceeding the limit of allowed receivers in an action group.
- Solution: Review and reduce the number of receivers.
Error: "Invalid or missing required property"
- Cause: Required attributes for a resource are not provided.
- Solution: Check the resource documentation to ensure all required fields are filled.
Error: "Resource already exists"
- Cause: Attempting to create a resource that already exists in Azure.
- Solution: Use a unique name for resources or import the existing resource into Terraform.
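For the import route, Terraform 1.5 and later accept an import block next to the resource definition. The sketch below assumes an existing action group named example-action-group in the example-resources resource group; the subscription ID is a placeholder that you would replace with your own.

```hcl
# Adopts an existing action group into state on the next plan/apply.
import {
  to = azurerm_monitor_action_group.example
  id = "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/example-resources/providers/Microsoft.Insights/actionGroups/example-action-group"
}
```

The resource block named in to must also exist in the configuration and match the real resource closely enough that terraform plan shows no unexpected changes. On older Terraform versions, the terraform import command serves the same purpose.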
Related Resources
| Resource Type | Resource Link |
|---|---|
| Action Groups | azurerm_monitor_action_group |
| Metric Alerts | azurerm_monitor_metric_alert |
| Log Analytics Workspace | azurerm_log_analytics_workspace |
Complete Infrastructure Script
provider "azurerm" {
  features {}
}

resource "azurerm_resource_group" "example" {
  name     = "example-resources"
  location = "East US"
}

resource "azurerm_log_analytics_workspace" "example" {
  name                = "example-law"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
  sku                 = "PerGB2018"
  retention_in_days   = 30
}

resource "azurerm_monitor_action_group" "example" {
  name                = "example-action-group"
  resource_group_name = azurerm_resource_group.example.name
  short_name          = "exAG"

  email_receiver {
    name                    = "EmailReceiver1"
    email_address           = "example@example.com"
    use_common_alert_schema = true
  }
}
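# Assumption: this script does not create a virtual machine, so the ID of an
# existing one is passed in through a variable (the name vm_id is illustrative).
variable "vm_id" {
  type        = string
  description = "Resource ID of the virtual machine to monitor"
}

resource "azurerm_monitor_metric_alert" "example" {
  name                = "example-metric-alert"
  resource_group_name = azurerm_resource_group.example.name
  scopes              = [var.vm_id] # metric alerts take scopes, not location

  criteria {
    metric_namespace = "Microsoft.Compute/virtualMachines"
    metric_name      = "Percentage CPU"
    aggregation      = "Average"
    operator         = "GreaterThan"
    threshold        = 80
  }

  action {
    action_group_id = azurerm_monitor_action_group.example.id
  }
}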
Conclusion
In this tutorial, we explored how to set up alerts and observability in Azure Monitor using Terraform. We covered the creation of action groups, alert rules, and Log Analytics workspaces. By adopting Infrastructure as Code practices with Terraform, you can streamline your monitoring setup, ensure consistency, and enhance your DevOps workflows. As you continue to build out your Azure environment, remember to apply best practices and leverage Terraform's capabilities for efficient infrastructure management.
References
- Terraform Registry - azurerm_monitor_action_group
- Azure Monitor Documentation
- Terraform Documentation
Now it's your turn to implement monitoring and observability in your Azure infrastructure using Terraform! 🚀