Creating AKS Clusters with Terraform

A complete tutorial on the azurerm_kubernetes_cluster resource in Terraform, covering AKS deployment, node pools, scaling, and networking.

Introduction

Azure Kubernetes Service (AKS) is a managed container orchestration service that simplifies the deployment, management, and scaling of containerized applications using Kubernetes. By leveraging AKS, organizations can focus on building applications rather than managing the underlying infrastructure. Infrastructure as Code (IaC) with Terraform enhances this experience by allowing developers to define, manage, and provision infrastructure using code. This approach ensures consistency, repeatability, and easier collaboration across teams.

Common use cases for AKS include deploying microservices applications, running batch processing jobs, and managing machine learning workloads. With Terraform, you can provision AKS clusters and related resources such as Azure Container Registry (ACR), virtual networks, and storage accounts with ease. This tutorial will guide you through the process of creating an AKS cluster using Terraform, including configuring node pools, scaling, and networking.

Prerequisites

Before you begin, ensure you have the following:

  1. Terraform CLI installed. You can download it from Terraform's official website.
  2. An Azure subscription. If you don’t have one, you can create a free account.
  3. Azure CLI installed. Follow the official Azure CLI installation instructions for your platform.
  4. A service principal for authenticating Terraform with Azure. You can create one with the following command:
    az ad sp create-for-rbac --name "<your-service-principal-name>" --role Contributor --scopes /subscriptions/<your-subscription-id>
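
The command prints an appId, a password, and a tenant. One way to hand these to Terraform is through provider arguments fed by input variables. The sketch below is only an illustration: the variable names are hypothetical and not part of the original tutorial, and this block would take the place of the bare provider block used in the examples.

```hcl
# Hypothetical input variables holding the service principal credentials.
variable "subscription_id" { type = string }
variable "tenant_id" { type = string }
variable "client_id" { type = string } # the appId from the command output

variable "client_secret" {
  type      = string
  sensitive = true # keeps the password out of plan output
}

provider "azurerm" {
  features {}

  subscription_id = var.subscription_id
  tenant_id       = var.tenant_id
  client_id       = var.client_id
  client_secret   = var.client_secret
}
```

The same values can instead be supplied through the ARM_SUBSCRIPTION_ID, ARM_TENANT_ID, ARM_CLIENT_ID, and ARM_CLIENT_SECRET environment variables, which keeps secrets out of the configuration entirely.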
    

Fundamental Concepts

Key Terminology

  • AKS (Azure Kubernetes Service): A managed Kubernetes service by Azure.
  • Node Pool: A group of nodes within an AKS cluster that can have different VM sizes and configurations.
  • Pod: The smallest deployable unit in Kubernetes, which can contain one or more containers.
  • Container: A lightweight, standalone, and executable package that includes everything needed to run a piece of software.
  • HCL (HashiCorp Configuration Language): The language used to define infrastructure in Terraform.

Resource Dependencies

Terraform infers dependencies from the references between resources and creates them in the correct order. For example, when an AKS node pool references a subnet ID, Terraform provisions the virtual network and subnet before it creates the cluster.
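
As a minimal sketch of how ordering can be expressed, the configuration below shows an implicit dependency (an attribute reference) next to an explicit one (depends_on). All resource names here are illustrative and separate from the examples later in this tutorial.

```hcl
resource "azurerm_resource_group" "deps_rg" {
  name     = "depsExampleResourceGroup"
  location = "East US"
}

# Implicit dependency: referencing the resource group's attributes tells
# Terraform to create the group before the virtual network.
resource "azurerm_virtual_network" "deps_vnet" {
  name                = "depsExampleVNet"
  address_space       = ["10.0.0.0/16"]
  location            = azurerm_resource_group.deps_rg.location
  resource_group_name = azurerm_resource_group.deps_rg.name
}

# Explicit dependency: the virtual network name is a literal string here,
# so Terraform cannot infer the ordering and depends_on states it directly.
resource "azurerm_subnet" "deps_subnet" {
  name                 = "depsExampleSubnet"
  resource_group_name  = azurerm_resource_group.deps_rg.name
  virtual_network_name = "depsExampleVNet"
  address_prefixes     = ["10.0.1.0/24"]

  depends_on = [azurerm_virtual_network.deps_vnet]
}
```

References are generally preferable; depends_on is best kept for cases where no attribute reference exists.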

State Management

Terraform maintains a state file to track the resources it manages. This state file allows Terraform to understand the current state of the infrastructure and to plan changes accurately.
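
By default the state lives in a local terraform.tfstate file. A sketch of moving it to Azure Blob Storage with the azurerm backend follows; the resource group, storage account, and container names are placeholders, and they are assumed to exist before terraform init is run.

```hcl
terraform {
  backend "azurerm" {
    resource_group_name  = "tfstate-rg"            # placeholder
    storage_account_name = "tfstateexample001"     # placeholder, must be globally unique
    container_name       = "tfstate"               # placeholder
    key                  = "aks.terraform.tfstate" # blob name for this state
  }
}
```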

Resource Syntax

The primary resource for creating an AKS cluster in Terraform is azurerm_kubernetes_cluster. Below is the syntax and an argument table.

resource "azurerm_kubernetes_cluster" "example" {
  name                = "<cluster-name>"
  location            = "<location>"
  resource_group_name = "<resource-group-name>"
  dns_prefix          = "<dns-prefix>"

  # Pricing tier of the control plane: "Free" or "Standard"
  sku_tier = "Free"

  # The default (system) node pool always runs Linux
  default_node_pool {
    name       = "<node-pool-name>"
    node_count = <number-of-nodes>
    vm_size    = "<vm-size>"
  }

  identity {
    type = "SystemAssigned"
  }

  tags = {
    environment = "dev"
  }
}

Argument              Description
name                  The name of the AKS cluster.
location              The Azure region where the cluster will be created.
resource_group_name   The name of the resource group where the cluster will reside.
dns_prefix            The DNS prefix for the cluster.
sku_tier              Pricing tier for the cluster (Free or Standard).
default_node_pool     Configuration for the default (system) node pool.
identity              Specifies the identity type for the cluster (e.g., SystemAssigned).
tags                  Tags for resource management.
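
Argument and block names for this resource have changed between major releases of the azurerm provider, so it may help to pin the provider version. The sketch below assumes the 3.x series; both version constraints are assumptions and should be adjusted to the release you actually use.

```hcl
terraform {
  required_version = ">= 1.0" # assumed minimum Terraform CLI version

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0" # assumed: any 3.x release
    }
  }
}
```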

Practical Examples

Example 1: Basic AKS Cluster

provider "azurerm" {
  features {}
}

resource "azurerm_resource_group" "aks_rg" {
  name     = "myResourceGroup"
  location = "East US"
}

resource "azurerm_kubernetes_cluster" "aks" {
  name                = "myAKSCluster"
  location            = azurerm_resource_group.aks_rg.location
  resource_group_name = azurerm_resource_group.aks_rg.name
  dns_prefix          = "myakscluster"
  sku_tier            = "Free"

  # System node pool: three Linux nodes
  default_node_pool {
    name       = "agentpool"
    node_count = 3
    vm_size    = "Standard_DS2_v2"
  }

  identity {
    type = "SystemAssigned"
  }
}

Example 2: AKS with a Windows Node Pool

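# Windows node pools require the cluster to use the Azure CNI network plugin
resource "azurerm_kubernetes_cluster_node_pool" "windows_pool" {
  kubernetes_cluster_id = azurerm_kubernetes_cluster.aks.id
  name                  = "winpl" # Windows pool names are limited to 6 characters
  vm_size               = "Standard_DS2_v2"
  node_count            = 2
  os_type               = "Windows"
}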

Example 3: Enable Monitoring on AKS

resource "azurerm_monitor_diagnostic_setting" "aks_diagnostics" {
  name               = "aksDiagnostics"
  target_resource_id = azurerm_kubernetes_cluster.aks.id
  storage_account_id = "<storage-account-id>"

  # Ship API server logs to the storage account. Retention (for example,
  # 30 days) is configured on the storage account itself, because the
  # retention_policy block is deprecated.
  enabled_log {
    category = "kube-apiserver"
  }

  metric {
    category = "AllMetrics"
    enabled  = true
  }
}

Example 4: Configuring Network Settings

resource "azurerm_virtual_network" "aks_vnet" {
  name                = "aksVNet"
  address_space       = ["10.0.0.0/16"]
  location            = azurerm_resource_group.aks_rg.location
  resource_group_name = azurerm_resource_group.aks_rg.name
}

resource "azurerm_subnet" "aks_subnet" {
  name                 = "aksSubnet"
  resource_group_name  = azurerm_resource_group.aks_rg.name
  virtual_network_name = azurerm_virtual_network.aks_vnet.name
  address_prefixes     = ["10.0.1.0/24"]
}

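resource "azurerm_kubernetes_cluster" "aks_with_network" {
  name                = "myAKSClusterWithNetwork"
  location            = azurerm_resource_group.aks_rg.location
  resource_group_name = azurerm_resource_group.aks_rg.name
  dns_prefix          = "myaksnetwork"

  default_node_pool {
    name           = "agentpool"
    node_count     = 3
    vm_size        = "Standard_DS2_v2"
    vnet_subnet_id = azurerm_subnet.aks_subnet.id
  }

  identity {
    type = "SystemAssigned"
  }

  # The service CIDR must not overlap the virtual network (10.0.0.0/16)
  network_profile {
    network_plugin = "azure"
    service_cidr   = "10.1.0.0/16"
    dns_service_ip = "10.1.0.10"
  }
}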

Example 5: Scaling AKS Nodes

resource "azurerm_kubernetes_cluster" "aks_scaling" {
  name                = "myAKSClusterScaling"
  location            = azurerm_resource_group.aks_rg.location
  resource_group_name = azurerm_resource_group.aks_rg.name
  dns_prefix          = "myaksscaling"

  default_node_pool {
    name       = "agentpool"
    node_count = 5 # Change this value and re-apply to scale the pool
    vm_size    = "Standard_DS2_v2"
  }

  identity {
    type = "SystemAssigned"
  }
}
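
Manual scaling means editing the node count and re-applying. As an alternative, the cluster autoscaler can adjust the count within bounds. The sketch below adds an autoscaled pool to the cluster from Example 1; the pool name and the bounds are illustrative assumptions, and the argument name shown is the azurerm 3.x spelling.

```hcl
resource "azurerm_kubernetes_cluster_node_pool" "autoscale_pool" {
  name                  = "autoscale" # illustrative pool name
  kubernetes_cluster_id = azurerm_kubernetes_cluster.aks.id
  vm_size               = "Standard_DS2_v2"

  # azurerm 3.x name; azurerm 4.x renames this to auto_scaling_enabled
  enable_auto_scaling = true
  min_count           = 1 # assumed lower bound
  max_count           = 5 # assumed upper bound
}
```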

Example 6: Using Azure Container Registry (ACR)

resource "azurerm_container_registry" "acr" {
  # Registry names must be globally unique (5-50 alphanumeric characters)
  name                = "myacr"
  resource_group_name = azurerm_resource_group.aks_rg.name
  location            = azurerm_resource_group.aks_rg.location
  sku                 = "Basic"
  admin_enabled       = true
}

Example 7: Assigning Roles to ACR

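resource "azurerm_role_assignment" "acr_pull" {
  # Image pulls are performed by the kubelet identity on the nodes,
  # not by the cluster's control plane identity
  principal_id                     = azurerm_kubernetes_cluster.aks.kubelet_identity[0].object_id
  role_definition_name             = "AcrPull"
  scope                            = azurerm_container_registry.acr.id
  skip_service_principal_aad_check = true
}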

Example 8: Complete Configuration

provider "azurerm" {
  features {}
}

resource "azurerm_resource_group" "aks_rg" {
  name     = "myResourceGroup"
  location = "East US"
}

resource "azurerm_virtual_network" "aks_vnet" {
  name                = "aksVNet"
  address_space       = ["10.0.0.0/16"]
  location            = azurerm_resource_group.aks_rg.location
  resource_group_name = azurerm_resource_group.aks_rg.name
}

resource "azurerm_subnet" "aks_subnet" {
  name                 = "aksSubnet"
  resource_group_name  = azurerm_resource_group.aks_rg.name
  virtual_network_name = azurerm_virtual_network.aks_vnet.name
  address_prefixes     = ["10.0.1.0/24"]
}

resource "azurerm_container_registry" "acr" {
  name                = "myacr"
  resource_group_name = azurerm_resource_group.aks_rg.name
  location            = azurerm_resource_group.aks_rg.location
  sku                 = "Basic"
  admin_enabled       = true
}

resource "azurerm_kubernetes_cluster" "aks" {
  name                = "myAKSCluster"
  location            = azurerm_resource_group.aks_rg.location
  resource_group_name = azurerm_resource_group.aks_rg.name
  dns_prefix          = "myakscluster"
  sku_tier            = "Free"

  default_node_pool {
    name           = "agentpool"
    node_count     = 3
    vm_size        = "Standard_DS2_v2"
    vnet_subnet_id = azurerm_subnet.aks_subnet.id
  }

  identity {
    type = "SystemAssigned"
  }

  # The service CIDR must not overlap the virtual network (10.0.0.0/16)
  network_profile {
    network_plugin = "azure"
    service_cidr   = "10.1.0.0/16"
    dns_service_ip = "10.1.0.10"
  }
}

# Grant the kubelet identity (which pulls images on the nodes) access to ACR
resource "azurerm_role_assignment" "acr_pull" {
  principal_id                     = azurerm_kubernetes_cluster.aks.kubelet_identity[0].object_id
  role_definition_name             = "AcrPull"
  scope                            = azurerm_container_registry.acr.id
  skip_service_principal_aad_check = true
}
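
Once the configuration is applied, a few outputs make the cluster easier to use. This is a sketch: the output names are arbitrary choices, and it assumes the aks and acr resources defined in the configuration above.

```hcl
output "cluster_name" {
  value = azurerm_kubernetes_cluster.aks.name
}

# Raw kubeconfig for kubectl; marked sensitive so it is hidden in CLI output
output "kube_config" {
  value     = azurerm_kubernetes_cluster.aks.kube_config_raw
  sensitive = true
}

# Registry hostname to use when tagging and pushing images
output "acr_login_server" {
  value = azurerm_container_registry.acr.login_server
}
```

The kubeconfig can then be written to a file with terraform output -raw kube_config and passed to kubectl.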

Real-World Use Cases

  1. Microservices Deployment: Deploying a microservices-based application where each service runs in its own pod, making it easier to manage, scale, and update independently.

  2. Machine Learning Workflows: Utilizing AKS for training machine learning models, leveraging GPU-enabled nodes, and deploying models as REST APIs for inference.

  3. CI/CD Integration: Automating the deployment of containerized applications using CI/CD tools like Azure DevOps or GitHub Actions, integrating with ACR for image storage.
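
For the machine learning case, GPU capacity is usually added as a separate node pool. The following is a sketch only: the VM size, label, and taint are assumptions chosen for illustration, and GPU sizes depend on regional availability and quota. It attaches to the cluster named aks from the examples above.

```hcl
resource "azurerm_kubernetes_cluster_node_pool" "gpu_pool" {
  name                  = "gpupool"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.aks.id
  vm_size               = "Standard_NC6s_v3" # assumed GPU size
  node_count            = 1

  # Label and taint the nodes so only workloads that tolerate the taint
  # (for example, training jobs) are scheduled onto the GPU hardware.
  node_labels = {
    workload = "gpu"
  }
  node_taints = ["sku=gpu:NoSchedule"]
}
```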

Best Practices

  1. Use Modules: Organize your Terraform code into reusable modules for better maintainability and clarity.

  2. Manage State Files: Use remote state storage (e.g., Azure Blob Storage) to avoid local state file conflicts and share state with your team.

  3. Implement Tagging: Tag your resources for better organization and billing insights.

  4. Secure Access: Integrate Kubernetes RBAC with Microsoft Entra ID (formerly Azure Active Directory) to manage access to your clusters.

  5. Monitor and Log: Enable monitoring and logging features to keep track of performance and troubleshoot issues.
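
The first and third practices can be combined in a root configuration like the sketch below. The module path and every input name are hypothetical; they assume a local module that wraps the cluster resources and declares matching variables.

```hcl
# Shared tags defined once and passed to every module or resource
locals {
  common_tags = {
    environment = "dev"
    project     = "aks-tutorial" # illustrative value
  }
}

module "aks" {
  source = "./modules/aks" # hypothetical local module

  # Hypothetical module inputs
  cluster_name        = "myAKSCluster"
  resource_group_name = "myResourceGroup"
  location            = "East US"
  node_count          = 3
  tags                = local.common_tags
}
```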

Common Errors

  1. Error: Resource '...' not found
    Cause: The resource specified does not exist.
    Solution: Ensure that all dependencies and resources are properly defined and created.

  2. Error: Insufficient permissions
    Cause: The service principal does not have adequate permissions.
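    Solution: Make sure that the service principal has the necessary roles assigned. Note that Contributor cannot create role assignments, so the AcrPull assignment in Example 7 requires Owner or User Access Administrator on the registry scope.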

  3. Error: Invalid subnet id
    Cause: The subnet ID specified does not exist or is incorrect.
    Solution: Verify the subnet ID in your configuration.

  4. Error: The resource group could not be found
    Cause: The specified resource group does not exist.
    Solution: Check if the resource group has been created before referencing it.

Related Resources

Resource                     Description
azurerm_kubernetes_cluster   Terraform documentation for the AKS resource.
Azure Kubernetes Service     Official Azure documentation for AKS.
Terraform Documentation      Official documentation for Terraform.

Conclusion

In this tutorial, we explored how to create Azure Kubernetes Service (AKS) clusters using Terraform, covering everything from basic configurations to more advanced networking and scaling options. By adopting IaC practices with Terraform, you can streamline your deployment processes, enhance collaboration, and maintain consistency across your environments.

For your next steps, consider exploring more complex configurations, integrating CI/CD pipelines, or implementing monitoring and logging for your AKS clusters.