Problem
Runs in Terraform Enterprise remain queued indefinitely and the workers fail with the following TLS verification error while registering themselves with Terraform Enterprise.
From /var/log/terraform-enterprise/task-worker.log:
{
"@level": "info",
"@message": "[ERROR] agent: Failed starting core plugin: error=\"failed configuring core: agent registration failed: POST https://<TFE_HOSTNAME>/api/agent/register giving up after 16 attempt(s): Post \\\"https://<TFE_HOSTNAME>/api/agent/register\\\": tls: failed to verify certificate: x509: certificate signed by unknown authority\"",
"@module": "task-worker.executor.task-output",
"id": "d0ad6cba-e6af-47f0-bd37-611cabe0517e",
"name": "agent-run",
"stream": "stdout"
}
Prerequisites
- Terraform Enterprise v202404-2 to v202406-1
- Kubernetes deployment
- Non-publicly trusted TLS certificate
- A CA bundle (tls.caCertData Helm value) containing the required certificates has been provided
Cause
When a worker starts to perform a remote run in Terraform Enterprise, it registers itself with Terraform Enterprise by making an internal API request to the Terraform Enterprise hostname, during which it attempts to verify the certificate. In Kubernetes deployments, if you provide a CA bundle, Terraform Enterprise creates a ConfigMap with the contents of the certificates file. This file is a concatenation of the default OS certificates of its container and the contents of the CA bundle. Terraform Enterprise mounts this file into the agent container at /etc/ssl/certs/ca-certificates.crt so that the worker can verify certificates using the provided bundle.
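As a rough illustration of the certificates file described above, the following sketch concatenates an OS trust store with a custom CA bundle. The function name build_trust_bundle and its arguments are hypothetical. This is not how Terraform Enterprise generates the file internally; it only shows the same idea.

```shell
# build_trust_bundle OS_CERTS CUSTOM_BUNDLE OUTPUT
# Hypothetical helper: concatenate the OS certificates and a custom CA bundle
# into a single PEM file. Assumes each input file ends with a newline so the
# PEM blocks do not run together.
build_trust_bundle() {
  os_certs=$1
  custom_bundle=$2
  output=$3
  cat "$os_certs" "$custom_bundle" > "$output"
}
```

For example, build_trust_bundle /etc/ssl/certs/ca-certificates.crt custom-ca.pem combined.crt would produce a file containing both sets of certificates (custom-ca.pem and combined.crt are placeholder names).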
A regression introduced in Terraform Enterprise v202404-2 prevents this functionality from working, resulting in TLS verification errors when the worker makes requests to its internal API or to external systems that are not publicly trusted.
This bug also affects HTTPS requests to any external systems, as Terraform uses the certificate store of the worker container to verify certificates from external systems. For example, if the Terraform Enterprise certificate is publicly trusted, but another external system's certificate is not, the worker will start, execute the Terraform run, and then Terraform itself will fail with TLS verification errors.
To confirm that this bug, rather than a misconfigured CA bundle, is the cause, run the following command. It uses curl inside the Terraform Enterprise container to request the target URL that triggers the TLS verification errors during runs.
$ kubectl exec -ti -n <TFE_NAMESPACE> <TFE_POD> -- curl -vI <TARGET_URL>
If curl is able to verify the certificate of the external server using the trusted certificates of the Terraform Enterprise container, yet runs continue to fail, this bug is likely the cause.
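The check above can also be scripted by inspecting the exit code that curl returns. The sketch below is a minimal example; the function names describe_curl_exit and check_tls are hypothetical. It relies on curl exit code 60 (the peer certificate could not be authenticated with the known CA certificates) and assumes kubectl exec passes the remote command's exit code through.

```shell
# describe_curl_exit CODE
# Hypothetical helper: translate a curl exit code into a short diagnosis.
describe_curl_exit() {
  case "$1" in
    0)  echo "certificate verified" ;;
    60) echo "certificate not trusted by the container CA store" ;;
    *)  echo "other failure (exit code $1)" ;;
  esac
}

# check_tls NAMESPACE POD URL
# Run curl inside the Terraform Enterprise pod and report the result.
check_tls() {
  kubectl exec -n "$1" "$2" -- curl -sSI -o /dev/null "$3"
  describe_curl_exit "$?"
}
```

If check_tls <TFE_NAMESPACE> <TFE_POD> <TARGET_URL> reports that the certificate was verified while runs still fail, that is consistent with this bug rather than a CA bundle misconfiguration.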
Solutions
To permanently resolve this issue, upgrade to Terraform Enterprise v202408-1 or later. If an upgrade is not immediately possible, use one of the following temporary solutions.
Solution 1: Use a Custom Agent Worker Pod Template
This workaround requires manually creating a ConfigMap with the contents of the generated certificates file in the agents namespace and configuring Terraform Enterprise with an agentWorkerPodTemplate that references the ConfigMap in a volume.
- Copy the certificates file from the Terraform Enterprise container.
$ kubectl cp -n <TFE_NAMESPACE> <TFE_POD>:var/run/terraform-enterprise/etc/ssl/certs/ca-certificates.crt ca-certificates.crt
- Create a ConfigMap in the agents namespace (<HELM_RELEASE_NAME>-agents) with the contents of the certificates file.
$ kubectl create configmap ca-certs -n <AGENTS_NAMESPACE> --from-file=ca-certificates.crt
- Add an agentWorkerPodTemplate in the Helm values that mounts this ConfigMap in the worker container and create a new Helm release.
agentWorkerPodTemplate:
  spec:
    containers:
      - volumeMounts:
          - name: ca-certs
            mountPath: /etc/ssl/certs
            readOnly: true
    volumes:
      - name: ca-certs
        configMap:
          name: ca-certs
          items:
            - key: ca-certificates.crt
              path: ca-certificates.crt
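Before relying on the workaround, it may help to confirm that the ConfigMap holds the same number of certificates as the file copied from the Terraform Enterprise container. The sketch below is one way to do that; count_certs and verify_configmap are hypothetical helper names, and the ConfigMap name ca-certs matches the steps above.

```shell
# count_certs
# Count PEM certificates read from standard input. grep exits non-zero when
# nothing matches, so "|| true" keeps the function from reporting failure.
count_certs() {
  grep -c 'BEGIN CERTIFICATE' || true
}

# verify_configmap NAMESPACE FILE
# Succeeds when the ca-certs ConfigMap in NAMESPACE contains as many
# certificates as the local FILE.
verify_configmap() {
  local_count=$(count_certs < "$2")
  remote_count=$(kubectl get configmap ca-certs -n "$1" \
    -o jsonpath='{.data.ca-certificates\.crt}' | count_certs)
  [ "$local_count" = "$remote_count" ]
}
```

For example, verify_configmap <AGENTS_NAMESPACE> ca-certificates.crt returns a zero exit status when the counts match. Matching counts are only a sanity check and do not prove that the contents are identical.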
Solution 2: Use a Custom Worker Image
Build a custom worker image that includes the CA bundle. For an example, see this guide on creating a custom TFC agent image. Push the image to a container registry accessible from the cluster, and configure Terraform Enterprise to use it as the run pipeline image by setting the TFE_RUN_PIPELINE_IMAGE setting. If the container registry requires authentication, you must create an ImagePullSecret in the agents namespace and reference it with the TFE_RUN_PIPELINE_KUBERNETES_IMAGE_PULL_SECRET_NAME setting.
Solution 3: Roll Back to a Previous Version
Roll back to Terraform Enterprise v202402-1. For guidance on rolling back Terraform Enterprise, refer to the Terraform Enterprise backup and restore guide.