Deploying and Operating AKS Clusters with Terraform

A complete tutorial on the azurerm_kubernetes_cluster resource in Terraform. Learn node pools, autoscaling, CNI networking, and addons.

Introduction

Azure Kubernetes Service (AKS) is a managed Kubernetes service that simplifies deploying, managing, and scaling containerized applications using Kubernetes. By leveraging AKS, developers can focus on their applications rather than managing the underlying infrastructure. Infrastructure as Code (IaC) is a critical practice in modern DevOps, enabling teams to provision and manage cloud resources through code, ensuring consistency, reducing human error, and enhancing collaboration.

In this tutorial, we will explore how to deploy and operate AKS clusters using Terraform. We will cover essential aspects such as configuring node pools, enabling autoscaling, implementing CNI networks, and utilizing various addons. By the end of this tutorial, you will be equipped with the knowledge and practical skills to manage AKS clusters effectively.

Prerequisites

To follow this tutorial, you will need:

  • Terraform CLI installed on your local machine.
  • An Azure subscription. If you don't have one, you can create a free account.
  • Azure CLI installed and configured.
  • A service principal with the required permissions to create Azure resources. You can create one using the command:
    az ad sp create-for-rbac --role="Contributor" --scopes="/subscriptions/{subscription-id}"
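
Terraform can authenticate with the service principal created above. A minimal provider configuration might look like the sketch below; the placeholder values map to the appId, password, and tenant fields of the az ad sp create-for-rbac output. In practice, prefer setting these through the ARM_CLIENT_ID, ARM_CLIENT_SECRET, ARM_TENANT_ID, and ARM_SUBSCRIPTION_ID environment variables so credentials stay out of your code:

provider "azurerm" {
  features {}

  # Placeholder values from the service principal output above.
  # Omit these arguments entirely if you use ARM_* environment variables.
  subscription_id = "<subscription-id>"
  tenant_id       = "<tenant>"
  client_id       = "<appId>"
  client_secret   = "<password>"
}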
    

Fundamental Concepts

Before we dive into the code, let's clarify some key terminology and concepts:

  • Node Pool: A collection of virtual machines (VMs) that run your containerized applications. Each pool can have different VM configurations.
  • Autoscaling: Automatically adjusts the number of nodes in a pool based on the current demand.
  • CNI (Container Networking Interface): A specification for configuring network interfaces in Linux containers. Azure provides its own CNI plugin for enhanced networking capabilities.
  • Addons: Additional functionalities that can be enabled on your AKS cluster, such as monitoring and logging.

Resource Dependencies: In Terraform, resources can depend on one another. For example, an AKS cluster will depend on the virtual network and subnet it resides in.
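For example, Terraform infers an implicit dependency whenever one resource references another's attributes; an explicit depends_on is only needed when the relationship is not visible in the configuration. The names below are illustrative:

resource "azurerm_resource_group" "example" {
  name     = "example-rg"
  location = "East US"
}

resource "azurerm_kubernetes_cluster" "example" {
  # Referencing the resource group's attributes creates the dependency:
  # Terraform will create the group before the cluster.
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
  # ...
}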

State Management: Terraform maintains a state file that keeps track of the resources it manages. It’s essential to keep this file safe and updated to avoid inconsistencies.
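A common pattern is to store the state file in an Azure Storage container via the azurerm backend. The resource group, storage account, and container names below are placeholders and must already exist:

terraform {
  backend "azurerm" {
    resource_group_name  = "tfstate-rg"
    storage_account_name = "tfstatestorage" # placeholder; must be globally unique
    container_name       = "tfstate"
    key                  = "aks.terraform.tfstate"
  }
}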

Resource Syntax

The following is the basic syntax for the azurerm_kubernetes_cluster resource. Note that the legacy agent_pool_profile block has been replaced by default_node_pool in current versions of the azurerm provider:

resource "azurerm_kubernetes_cluster" "example" {
  name                = string
  resource_group_name = string
  location            = string
  dns_prefix          = string

  default_node_pool {
    name            = string
    node_count      = number
    vm_size         = string
    os_disk_size_gb = number
  }

  identity {
    type = string
  }

  # Other optional configurations
}
Arguments:

  • name: The name of the AKS cluster.
  • resource_group_name: The name of the resource group.
  • location: The Azure region for the resource.
  • dns_prefix: DNS prefix for the AKS cluster.
  • default_node_pool: Configuration for the default node pool (VMs). The default pool always runs Linux, so it takes no os_type argument.
  • identity: Managed identity settings.

Practical Examples

1. Basic AKS Cluster Creation

provider "azurerm" {
  features {}
}

resource "azurerm_resource_group" "aks_rg" {
  name     = "myAKSResourceGroup"
  location = "East US"
}

resource "azurerm_kubernetes_cluster" "aks" {
  name                = "myAKSCluster"
  location            = azurerm_resource_group.aks_rg.location
  resource_group_name = azurerm_resource_group.aks_rg.name
  dns_prefix          = "myaks"

  default_node_pool {
    name       = "default"
    node_count = 3
    vm_size    = "Standard_DS2_v2"
  }

  identity {
    type = "SystemAssigned"
  }
}
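
After terraform apply, the cluster's credentials are exposed as resource attributes. One way to retrieve them is an output block (the output name here is arbitrary):

output "kube_config" {
  # Raw kubeconfig for the cluster; marked sensitive so it is not
  # printed in plan/apply output.
  value     = azurerm_kubernetes_cluster.aks.kube_config_raw
  sensitive = true
}

You can write this value to a file and point kubectl at it, or simply run az aks get-credentials after the cluster is up.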

2. AKS with Multiple Node Pools

resource "azurerm_kubernetes_cluster_node_pool" "linux_pool" {
  name                  = "linuxpool"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.aks.id
  vm_size               = "Standard_DS2_v2"
  node_count            = 2
  os_type               = "Linux"
}

resource "azurerm_kubernetes_cluster_node_pool" "windows_pool" {
  name                  = "win" # Windows pool names are limited to 6 characters
  kubernetes_cluster_id = azurerm_kubernetes_cluster.aks.id
  vm_size               = "Standard_DS2_v2"
  node_count            = 2
  os_type               = "Windows"
}

Note that Windows node pools require the cluster's network_profile to use the azure CNI plugin (see the CNI networking example below).
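
Node pools can also carry Kubernetes labels and taints so that workloads land on the right pool. A sketch with an additional pool (the pool name, label, and taint values are illustrative):

resource "azurerm_kubernetes_cluster_node_pool" "batch_pool" {
  name                  = "batch"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.aks.id
  vm_size               = "Standard_DS2_v2"
  node_count            = 2

  # Pods select this pool via a matching nodeSelector...
  node_labels = {
    "workload" = "batch"
  }

  # ...and only pods tolerating this taint are scheduled here.
  node_taints = ["workload=batch:NoSchedule"]
}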

3. Enable Autoscaling

resource "azurerm_kubernetes_cluster" "aks" {
  # ... previous configurations

  default_node_pool {
    name                 = "default"
    vm_size              = "Standard_DS2_v2"
    auto_scaling_enabled = true # named enable_auto_scaling in azurerm < 4.0
    node_count           = 3    # initial size; managed by the autoscaler afterwards
    min_count            = 3    # Minimum nodes
    max_count            = 10   # Maximum nodes
  }
}

4. Configuring CNI Networking

resource "azurerm_kubernetes_cluster" "aks" {
  # ... previous configurations

  network_profile {
    network_plugin = "azure"
    service_cidr   = "10.0.0.0/16"
    dns_service_ip = "10.0.0.10" # must fall within service_cidr
  }
}

The docker_bridge_cidr argument seen in older examples was deprecated and has been removed in azurerm 4.0.
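
With the azure plugin, nodes (and by default pods) take their IPs from a virtual network subnet, so the cluster is typically pointed at a dedicated subnet. A sketch (the address ranges are illustrative and must not overlap with service_cidr):

resource "azurerm_virtual_network" "aks_vnet" {
  name                = "aks-vnet"
  location            = azurerm_resource_group.aks_rg.location
  resource_group_name = azurerm_resource_group.aks_rg.name
  address_space       = ["10.1.0.0/16"]
}

resource "azurerm_subnet" "aks_subnet" {
  name                 = "aks-subnet"
  resource_group_name  = azurerm_resource_group.aks_rg.name
  virtual_network_name = azurerm_virtual_network.aks_vnet.name
  address_prefixes     = ["10.1.0.0/22"]
}

# Then, inside the cluster's default_node_pool block:
#   vnet_subnet_id = azurerm_subnet.aks_subnet.id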

5. Adding Monitoring with Azure Monitor

In current provider versions, the diagnostic setting's legacy log block (with its enabled flag) has been replaced by enabled_log, and the workspace is referenced via log_analytics_workspace_id:

resource "azurerm_monitor_diagnostic_setting" "aks_monitor" {
  name                       = "aks-monitoring"
  target_resource_id         = azurerm_kubernetes_cluster.aks.id
  log_analytics_workspace_id = azurerm_log_analytics_workspace.log_workspace.id

  enabled_log {
    category = "kube-apiserver"
  }

  enabled_log {
    category = "kube-controller-manager"
  }

  metric {
    category = "AllMetrics"
  }
}
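
The diagnostic setting above references a Log Analytics workspace that must be declared separately (the workspace name is illustrative). To also collect container-level telemetry with Container Insights, enable the monitoring agent on the cluster via the oms_agent block:

resource "azurerm_log_analytics_workspace" "log_workspace" {
  name                = "aks-log-workspace"
  location            = azurerm_resource_group.aks_rg.location
  resource_group_name = azurerm_resource_group.aks_rg.name
  sku                 = "PerGB2018"
  retention_in_days   = 30
}

# Inside the azurerm_kubernetes_cluster resource:
#   oms_agent {
#     log_analytics_workspace_id = azurerm_log_analytics_workspace.log_workspace.id
#   }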

6. Integrating Azure Active Directory

The legacy azure_active_directory block (and its managed flag) has been replaced by azure_active_directory_role_based_access_control:

resource "azurerm_kubernetes_cluster" "aks" {
  # ... previous configurations

  azure_active_directory_role_based_access_control {
    admin_group_object_ids = ["<your-ad-group-id>"]
    azure_rbac_enabled     = true
  }
}

7. Using Cluster Extensions (e.g., Flux GitOps)

Cluster extensions are installed with the azurerm_kubernetes_cluster_extension resource. There is no "Helm" extension type; valid values include microsoft.flux and microsoft.dapr, and the cluster is referenced via cluster_id:

resource "azurerm_kubernetes_cluster_extension" "flux" {
  name           = "flux"
  cluster_id     = azurerm_kubernetes_cluster.aks.id
  extension_type = "microsoft.flux"
}

8. Full AKS Configuration with Variables

variable "rg_name" {
  type = string
  default = "myAKSResourceGroup"
}

variable "cluster_name" {
  type = string
  default = "myAKSCluster"
}

resource "azurerm_resource_group" "aks_rg" {
  name     = var.rg_name
  location = "East US"
}

resource "azurerm_kubernetes_cluster" "aks" {
  name                = var.cluster_name
  location            = azurerm_resource_group.aks_rg.location
  resource_group_name = azurerm_resource_group.aks_rg.name
  dns_prefix          = "myaks"

default_node_pool {
    name       = "default"
    node_count = 3
    vm_size    = "Standard_DS2_v2"
  }

  identity {
    type = "SystemAssigned"
  }
}

Real-World Use Cases

Scenario 1: Multi-Tenant Application

Deploy multiple AKS clusters for different teams within an organization, each configured with its own network policies and resource quotas. This design ensures isolation and compliance with security policies.
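
One way to sketch this is a for_each over a map of teams, producing one cluster per team (the team names and sizing below are illustrative):

variable "teams" {
  type = map(object({ node_count = number }))
  default = {
    payments = { node_count = 3 }
    search   = { node_count = 2 }
  }
}

resource "azurerm_kubernetes_cluster" "team" {
  for_each            = var.teams
  name                = "aks-${each.key}"
  location            = azurerm_resource_group.aks_rg.location
  resource_group_name = azurerm_resource_group.aks_rg.name
  dns_prefix          = "aks-${each.key}"

  default_node_pool {
    name       = "default"
    node_count = each.value.node_count
    vm_size    = "Standard_DS2_v2"
  }

  identity {
    type = "SystemAssigned"
  }
}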

Scenario 2: Autoscaling Web Application

Implement an AKS cluster with autoscaling configured for a web application that experiences variable traffic. This setup allows the application to scale up during peak hours and scale down during off-peak hours, optimizing resource usage.

Scenario 3: CI/CD Pipeline Integration

Integrate AKS with a CI/CD pipeline using tools like Azure DevOps or GitHub Actions. Automate the deployment of applications to the AKS cluster as part of the release process, ensuring consistent and repeatable deployments.

Best Practices

  1. State Management: Always use remote state storage like Azure Storage to manage Terraform states, enabling collaboration and preventing state conflicts.
  2. Security: Use managed identities for AKS to enhance security by avoiding the use of secrets in your code.
  3. Modules: Organize your Terraform code into reusable modules, making it easier to manage and scale your infrastructure.
  4. Naming Conventions: Use consistent naming conventions for resources to improve clarity and manageability.
  5. Monitoring: Implement monitoring and alerting for your AKS clusters to proactively detect and resolve issues.
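
Practices 1 and 3 work well together: keep the cluster definition in a reusable module and call it once per environment. The module path and input names below are hypothetical:

module "aks" {
  source       = "./modules/aks" # hypothetical local module
  cluster_name = "myAKSCluster"
  rg_name      = "myAKSResourceGroup"
  node_count   = 3
}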

Common Errors

  1. Error: "The requested resource 'xxx' was not found"

    • Cause: The specified resource group or resource name is incorrect.
    • Solution: Verify the names and ensure they match the existing Azure resources.
  2. Error: "Insufficient privileges to perform this action"

    • Cause: The service principal lacks permissions.
    • Solution: Ensure the service principal has the necessary roles assigned.
  3. Error: "Resource already exists"

    • Cause: Trying to create a resource that already exists.
    • Solution: Check if the resource is already present, or use terraform import to manage it.
  4. Error: "Authentication failed"

    • Cause: Incorrect service principal credentials.
    • Solution: Verify the client ID and secret used for authentication.
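
For the "Resource already exists" case, Terraform 1.5+ also supports declarative import blocks as an alternative to the terraform import CLI command. The subscription ID below is a placeholder:

import {
  # Bring the existing resource group under Terraform management
  # on the next terraform plan/apply.
  to = azurerm_resource_group.aks_rg
  id = "/subscriptions/<subscription-id>/resourceGroups/myAKSResourceGroup"
}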

Related Resources

  • Terraform Azure Provider (Terraform Registry)
  • Azure Kubernetes Service Documentation (Microsoft Docs)
  • Terraform Best Practices

Complete Infrastructure Script

provider "azurerm" {
  features {}
}

variable "rg_name" {
  type = string
  default = "myAKSResourceGroup"
}

variable "cluster_name" {
  type = string
  default = "myAKSCluster"
}

resource "azurerm_resource_group" "aks_rg" {
  name     = var.rg_name
  location = "East US"
}

resource "azurerm_kubernetes_cluster" "aks" {
  name                = var.cluster_name
  location            = azurerm_resource_group.aks_rg.location
  resource_group_name = azurerm_resource_group.aks_rg.name
  dns_prefix          = "myaks"

default_node_pool {
    name       = "default"
    node_count = 3
    vm_size    = "Standard_DS2_v2"
  }

  identity {
    type = "SystemAssigned"
  }
}

Conclusion

In this tutorial, we explored how to deploy and manage AKS clusters using Terraform, covering essential configurations like node pools, autoscaling, CNI networking, and addons. By leveraging Terraform's IaC capabilities, you can efficiently manage your Kubernetes infrastructure in Azure.

Next Steps

  • Experiment with different configurations to fit your workload needs.
  • Explore advanced features such as network policies and custom metrics for autoscaling.
