Problem
When configuring the Terraform kubernetes provider, you may encounter connection errors during terraform apply or terraform destroy operations. These errors typically occur when the provider's configuration arguments, such as host and token, are supplied by data sources that depend on resources being actively managed in the same configuration.
Common error messages include:
Error: failed to create kubernetes rest client for read of resource: Get "http://localhost/api?timeout=32s": dial tcp 127.0.0.1:80: connect: connection refused
Error: Post "http://localhost/apis/storage.k8s.io/v1/storageclasses": dial tcp 127.0.0.1:80: connect: connection refused
[ERROR] [Configure]: Failed to load config:=" <nil> "... Delete "http://localhost/api/v1/namespaces/aws-test/configmaps/aws-test": dial tcp 127.0.0.1:80: connect: connection refused
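These errors frequently come from a configuration shaped like the following, where the provider's credentials are read from data sources tied to a cluster managed in the same configuration. This is a minimal sketch of the problematic pattern; the resource, data source, and variable names are illustrative.

variable "cluster_name" { type = string }
variable "cluster_role_arn" { type = string }
variable "subnet_ids" { type = list(string) }

resource "aws_eks_cluster" "this" {
  name     = var.cluster_name
  role_arn = var.cluster_role_arn

  vpc_config {
    subnet_ids = var.subnet_ids
  }
}

# These data sources cannot return valid connection details until the
# cluster above exists, so the provider may fall back to localhost.
data "aws_eks_cluster" "this" {
  name = aws_eks_cluster.this.name
}

data "aws_eks_cluster_auth" "this" {
  name = aws_eks_cluster.this.name
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.this.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.this.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.this.token
}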
Cause
These errors are typically caused by dependency and timing issues within a single Terraform run:
- Creation Race Condition: The kubernetes provider attempts to authenticate to a cluster (e.g., AWS EKS) that has not yet been created or is not yet fully available because it is being provisioned in the same apply operation.
- Expired Authentication Tokens: Data sources may retrieve a short-lived authentication token that expires before the provider uses it.
- Destruction Order: During terraform destroy, the EKS cluster may be destroyed before other Kubernetes resources that depend on it, causing the provider to fail when attempting to connect to a non-existent cluster to manage those dependent resources.
Best Practice Recommendation
Single-apply workflows are not a reliable method for deploying Kubernetes infrastructure with Terraform. HashiCorp strongly recommends separating the EKS Cluster configuration from the Kubernetes resources that depend on it. They should be deployed in separate Terraform configurations, with separate runs and state files. For more details, please refer to this GitHub issue comment.
Solutions
Here are three solutions corresponding to the common causes of this issue.
Solution 1: Separate EKS Cluster Creation
This solution addresses the race condition where the provider tries to connect to a cluster that is being created in the same run.
Procedure:
Ensure the EKS cluster is created in a separate Terraform configuration and run. Its state file should be independent of the configuration that deploys resources onto the cluster. This guarantees the EKS cluster is fully provisioned and available before the kubernetes provider attempts to authenticate.
Once the EKS cluster is available and its outputs are accessible (e.g., via a terraform_remote_state data source), you can apply the configuration for your Kubernetes resources.
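As an illustration, the second configuration can read the cluster details from the first configuration's state. This is a minimal sketch assuming the cluster configuration uses an S3 backend and exposes a cluster_name output; the bucket, key, region, and output names are assumptions.

data "terraform_remote_state" "eks" {
  backend = "s3"
  config = {
    bucket = "my-terraform-state"            # assumed bucket name
    key    = "eks-cluster/terraform.tfstate" # assumed state key
    region = "us-east-1"
  }
}

# The cluster already exists in its own state, so these data sources
# resolve to real connection details at plan time.
data "aws_eks_cluster" "cluster" {
  name = data.terraform_remote_state.eks.outputs.cluster_name
}

data "aws_eks_cluster_auth" "cluster" {
  name = data.terraform_remote_state.eks.outputs.cluster_name
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.cluster.token
}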
Solution 2: Use an Exec Plugin for Token Authentication
This solution addresses issues with expired authentication tokens from cloud providers with short-lived credentials.
Procedure:
Use an exec block within the kubernetes provider configuration. This allows the provider to execute a command to fetch a new, valid token dynamically during the run, ensuring the credentials are not stale.
Here is an example configuration for AWS EKS.
provider "kubernetes" {
host = data.aws_eks_cluster.cluster.endpoint
cluster_ca_certificate = base64decode(data.aws_eks_cluster.default.certificate_authority[0].data)
exec {
api_version = "client.authentication.k8s.io/v1beta1"
args = ["eks", "get-token", "--cluster-name", var.cluster_name]
command = "aws"
}
}For more information, refer to the provider documentation on exec plugins.
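The example above assumes an aws_eks_cluster data source named cluster and a cluster_name variable. A minimal sketch of those declarations (names are illustrative) could be:

variable "cluster_name" {
  type = string
}

data "aws_eks_cluster" "cluster" {
  name = var.cluster_name
}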
Solution 3: Manage Resource Destruction Order
This solution addresses failures during terraform destroy where the EKS cluster is removed before dependent resources.
Procedure:
To ensure dependent resources are destroyed first, use a multi-stage destroy process. The prevent_destroy lifecycle meta-argument on the EKS cluster resource protects it from being destroyed prematurely; note that Terraform rejects any plan that would destroy a resource while prevent_destroy is set, so the dependent Kubernetes resources must be removed before the cluster itself.
- Add the prevent_destroy lifecycle block to your EKS cluster resource.

  resource "aws_eks_cluster" "example" {
    # ... other configuration ...

    lifecycle {
      prevent_destroy = true
    }
  }

- Destroy the dependent Kubernetes resources first, while the cluster is still available, for example by removing their resource blocks from the configuration and running terraform apply.
- Remove the EKS cluster resource block (and its lifecycle block) from your configuration.
- Run terraform apply again to destroy the EKS cluster.
Alternatively, you can use targeted destroys (terraform destroy -target=...), but this can be time-consuming for configurations with many resources.