Problem
Runs in Terraform Enterprise remain queued indefinitely and the workers fail with the following TLS verification error while registering themselves with Terraform Enterprise.
/var/log/terraform-enterprise/task-worker.log
{"@level":"info","@message":"2024-06-14T17:11:22.024Z [ERROR] agent: Failed starting core plugin: error=\"failed configuring core: agent registration failed: POST https://<TFE_HOSTNAME>/api/agent/register giving up after 16 attempt(s): Post \\\"https://<TFE_HOSTNAME>/api/agent/register\\\": tls: failed to verify certificate: x509: certificate signed by unknown authority\"","@module":"task-worker.executor.task-output","@timestamp":"2024-06-14T17:11:22.025982Z","id":"d0ad6cba-e6af-47f0-bd37-611cabe0517e","name":"agent-run","stream":"stdout"}
Prerequisites
- Terraform Enterprise v202404-2 to v202406-1
- Kubernetes deployment
- Non-publicly trusted TLS certificate
- A CA bundle (
tls.caCertData
Helm value) containing required certificates has been provided
Cause
When a worker is started to perform a remote run in Terraform Enterprise, it registers itself with Terraform Enterprise by making an internal API request to the Terraform Enterprise hostname, in which it will try to verify the certificate. In Kubernetes deployments, if a CA bundle is provided, Terraform Enterprise creates a ConfigMap with the contents of the certificates file, itself a concatenation of the default OS certificates of its container and the contents of the CA bundle, and mounts it into the agent container at /etc/ssl/certs/ca-certificates.crt
so that the worker can verify the certificates using the provided certificates in the bundle. However, a regression introduced in Terraform Enterprise v202404-2 prevents this functionality from working, resulting in TLS verification errors when the worker makes requests to its internal API or to external systems which are not publicly trusted.
Note that this bug will also affect HTTPS requests to any external systems, as Terraform uses the certificate store of the worker container to verify certificates from external systems. For example, if the Terraform Enterprise certificate is publicly trusted, but that of another external system to which requests are made during Terraform runs is not, the worker will proceed through startup and begin to execute the Terraform run and Terraform itself will fail with TLS verification errors.
To confirm this bug is the cause and not a misconfiguration with the CA bundle setting, run the following command, which makes a request with curl to the target URL (that which is triggering the TLS verification errors during runs).
kubectl exec -ti -n <TFE_NAMESPACE> <TFE_POD> -- curl -vI <TARGET_URL>
If curl is able to verify the certificate of the external server using the trusted certificates of the Terraform Enterprise container yet runs continue to fail, it is likely this bug is the cause.
Solution
To permanently resolve this issue, upgrade to Terraform Enterprise v202408-1. If an upgrade is not immediately possible, use one of the following temporary solutions until an upgrade is possible.
Custom Agent Worker Pod Template
This workaround requires manually creating a ConfigMap with the contents of the generated certificates file in the agents namespace and configuring Terraform Enterprise with a agentWorkerPodTemplate
which references the ConfigMap in a volume.
- Copy the certificates file from the Terraform Enterprise container.
kubectl cp -n <TFE_NAMESPACE> <TFE_POD>:var/run/terraform-enterprise/etc/ssl/certs/ca-certificates.crt ca-certificates.crt
- Create a ConfigMap in the agents namespace (
<HELM_RELEASE_NAME>-agents
) with the contents of the certificates file.
kubectl create configmap ca-certs -n <AGENTS_NAMESPACE> --from-literal ca-certificates.crt=$(cat ca-certificates.crt)
- Add a
agentWorkerPodTemplate
in the Helm values which mounts this ConfigMap in the worker container and create a new Helm release.
agentWorkerPodTemplate:
spec:
containers:
- volumeMounts:
- name: ca-certs
mountPath: /etc/ssl/certs
readOnly: true
volumes:
- name: ca-certs
configMap:
name: ca-certs
items:
- key: ca-certificates.crt
path: ca-certificates.crt
Customer Worker Image
Build a custom worker image which includes the CA bundle (see this article for an example), push the image to a container registry accessible from the cluster, and configure Terraform Enterprise to use it as the run pipeline image by setting the TFE_RUN_PIPELINE_IMAGE setting. Should the container registry require authentication, an ImagePullSecret will need to be created in the agents namespace and referenced by the TFE_RUN_PIPELINE_KUBERNETES_IMAGE_PULL_SECRET_NAME setting.
Rollback
Rollback to Terraform Enterprise v202402-1. For guidance on rolling back Terraform Enterprise, please see the following guide.
Additional Information
If you continue to experience issues, please contact HashiCorp Support.