Article Contents
- Affected Components
- Problem Summary
- Scenarios
- Scenario Solutions
- Important Information
- Additional Information
Affected Components
- Kubernetes Provider
- AWS EKS Resources
- Terraform version 0.14.x and newer
Problem Summary
After configuring the kubernetes provider with arguments whose values are determined by Data Sources, the following errors may be seen.
Scenario A:
During a single-apply workflow that creates a new EKS Cluster and all other resources at the same time, an error occurs while attempting to GET a token and a host using a Data Source. These values cannot be determined or read by a Data Source because the EKS Cluster does not exist yet.
Provider Configuration
provider "kubernetes" {
host = data.aws_eks_cluster.default.endpoint
cluster_ca_certificate = base64decode(data.aws_eks_cluster.default.certificate_authority[0].data)
token = data.aws_eks_cluster_auth.default.token
}
Apply Error
Error: failed to create kubernetes rest client for read of resource: Get "http://localhost/api?timeout=32s": dial tcp 127.0.0.1:80: connect: connection refused
Scenario B:
A bad request may be returned when the provider block uses a Data Source to retrieve a token. The Data Source returns the token even if it has expired.
Provider Configuration
provider "kubernetes" {
host = aws_eks_cluster.test_eks_cluster.endpoint
cluster_ca_certificate = base64decode(aws_eks_cluster.test_eks_cluster.certificate_authority[0].data)
token = data.aws_eks_cluster_auth.test_cluster_auth.token
}
Apply Error
Error: Post "http://localhost/apis/storage.k8s.io/v1/storageclasses": dial tcp 127.0.0.1:80: connect: connection refused
Scenario C:
During terraform destroy, the EKS Cluster is destroyed before other objects and resources that depend on it. When a provider (or other parts of the configuration) uses data sources, those data sources try to read from a cluster that no longer exists, and the remaining objects are not destroyed.
Provider Configuration
provider "kubernetes" {
host = data.aws_eks_cluster.cluster.endpoint
cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
token = data.aws_eks_cluster_auth.cluster.token
}
provider "helm" {
kubernetes {
host = data.aws_eks_cluster.cluster.endpoint
cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
token = data.aws_eks_cluster_auth.cluster.token
}
}
Destroy Error
[ERROR] unknown instance ...
...
[ERROR] [Configure]: Failed to load config:=" <nil> "
...
[DEBUG] module.test-eks.kubernetes_config_map.test: apply errored, but we're indicating that via the Error pointer rather than returning it:
Delete "http://localhost/api/v1/namespaces/aws-test/configmaps/aws-test":
dial tcp 127.0.0.1:80: connect: connection refused
...
[DEBUG] module.test-eks.kubernetes_cluster_role.test[0]: Destroying... [id=real.id]
[DEBUG] module.test-eks.kubernetes_csi_driver.test: Destroying... [id=real.id2]
[DEBUG] module.test-eks.helm_release.test[0]: apply errored, but we're indicating that via the Error pointer rather than returning it:
Kubernetes cluster unreachable: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable
Scenario Solutions
Solution A:
In this case, the EKS Cluster referenced by the host argument was being created during the same Terraform run. The EKS Cluster needs to be created in its own run and recorded in a state file of its own. This is the only way to guarantee that the EKS Cluster is ready before it is referenced inside the Kubernetes provider block. Once the EKS Cluster is available, retry to confirm that the GET is successful.
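For illustration, the sketch below shows what the second configuration could look like. It assumes the EKS Cluster was created in a separate configuration that stores its state in an S3 backend and exports a cluster_name output; the bucket, key, and region values are placeholders.
data "terraform_remote_state" "eks" {
  backend = "s3"
  config = {
    bucket = "my-terraform-state"      # placeholder bucket name
    key    = "eks/terraform.tfstate"   # placeholder state key
    region = "us-east-1"
  }
}
data "aws_eks_cluster" "default" {
  name = data.terraform_remote_state.eks.outputs.cluster_name
}
data "aws_eks_cluster_auth" "default" {
  name = data.terraform_remote_state.eks.outputs.cluster_name
}
provider "kubernetes" {
  host                   = data.aws_eks_cluster.default.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.default.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.default.token
}
Because the cluster already exists in its own state file, the data sources resolve to real values at plan time.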
Solution B:
After adding an exec {...} sub-block inside of the provider block, the request is successful. Some cloud providers have short-lived authentication tokens that can expire relatively quickly. To ensure the Kubernetes provider receives valid credentials, an exec-based plugin can be used to fetch a new token before initializing the provider.
provider "kubernetes" {
host = data.aws_eks_cluster.cluster.endpoint
cluster_ca_certificate = base64decode(data.aws_eks_cluster.default.certificate_authority[0].data)
exec {
api_version = "client.authentication.k8s.io/v1beta1"
args = ["eks", "get-token", "--cluster-name", var.cluster_name]
command = "aws"
}
}
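The same exec pattern can also be applied to the helm provider's kubernetes block shown in Scenario C. A minimal sketch, assuming the same data source and cluster name variable:
provider "helm" {
  kubernetes {
    host                   = data.aws_eks_cluster.cluster.endpoint
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
    exec {
      api_version = "client.authentication.k8s.io/v1beta1"
      args        = ["eks", "get-token", "--cluster-name", var.cluster_name]
      command     = "aws"
    }
  }
}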
Solution C:
In some situations, the EKS Cluster is destroyed before the remaining objects that rely on it, and the data sources may be trying to read from a cluster that no longer exists. This is known to occur regardless of whether a depends_on reference to the EKS Cluster exists inside the remaining resources.
To resolve this, destroy the EKS resources last in a multi-destroy approach. An example of a multi-destroy approach would be to add `prevent_destroy` on the EKS Cluster to protect the cluster from being destroyed too soon. From there, the EKS Cluster block can be removed from the configuration to be destroyed in the next run.
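For example, a minimal sketch of protecting the cluster with a lifecycle block (resource and variable names are illustrative):
resource "aws_eks_cluster" "cluster" {
  name     = var.cluster_name
  role_arn = var.cluster_role_arn   # assumes an IAM role created elsewhere

  vpc_config {
    subnet_ids = var.subnet_ids
  }

  lifecycle {
    prevent_destroy = true
  }
}
Once the dependent resources have been destroyed, the prevent_destroy setting (or the EKS Cluster block itself) can be removed so the cluster is destroyed in the next run.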
Targeted destroys would also work in this scenario, but this may be too time consuming depending on the number of resources (e.g. terraform destroy -target).
Important Information
Single-apply workflows are not a reliable way of deploying Kubernetes infrastructure with Terraform. We strongly recommend separating the EKS Cluster from the Kubernetes resources. They should be deployed in separate runs, and recorded in separate state files. For more details, please refer to this GitHub issue.