Problem
When running Terraform Enterprise on OpenShift or Kubernetes, you may encounter a situation where no new runs start. The Terraform Enterprise user interface remains available, but runs fail to initiate for an unknown reason, and the application logs show no specific errors.
Prerequisites
- Terraform Enterprise deployed on OpenShift or Kubernetes.
Cause
When a workspace initiates a plan or apply, Terraform Enterprise creates a job in the Kubernetes or OpenShift cluster. This job starts a pod with a Terraform Enterprise Agent, which executes the run. After completion, the Kubernetes job should be marked as successful and removed.
Occasionally, the final cleanup step may not complete properly. The job is marked as successful but is not fully removed, leaving completed jobs visible in the cluster.
Check for completed jobs that have not been cleaned up.
$ kubectl get jobs -n terraform-enterprise-agents
An example output may show several completed jobs.
NAME                                            COMPLETIONS   DURATION   AGE
tfe-task-08613daa-f6a7-403e-8707-6aa5acfa8fb3   1/1           121m       179m
tfe-task-21745fc8-25bf-4fcb-9e1a-fce81d68ba80   1/1           3h11m      6h30m
tfe-task-48e306e9-40c1-468f-909c-b61eb6c8cbad   1/1           3h21m      5h35m
tfe-task-c065d786-3fd9-447d-8deb-9ed482b44f3b   1/1           71m        4h16m
tfe-task-f90862a2-c4f7-4e22-9082-245beaea9812   1/1           3h20m      6h2m
These completed jobs still count toward the concurrency limit defined by the TFE_CAPACITY_CONCURRENCY environment variable, which defaults to 10. If the number of uncleaned jobs reaches this limit, Terraform Enterprise will stop starting new runs.
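As a rough way to check whether leftover jobs have reached the limit, the sketch below counts rows whose COMPLETIONS column reads N/N and compares that count to the concurrency limit. The function names `count_completed_jobs` and `capacity_exhausted` are hypothetical helpers, not part of Terraform Enterprise or kubectl, and the sketch assumes the default tabular output of `kubectl get jobs`.

```shell
#!/usr/bin/env bash

# Hypothetical helper: count jobs whose COMPLETIONS column reads N/N
# in `kubectl get jobs` table output supplied on stdin.
count_completed_jobs() {
  awk 'NR > 1 { split($2, c, "/"); if (c[1] == c[2] && c[2] > 0) n++ } END { print n + 0 }'
}

# Hypothetical helper: succeed (exit 0) when the completed job count
# has reached the limit.
# $1 = completed job count
# $2 = concurrency limit (defaults to 10, the documented default)
capacity_exhausted() {
  local completed="$1"
  local limit="${2:-10}"
  [ "$completed" -ge "$limit" ]
}

# Example usage against a live cluster (namespace as in the command above):
#   completed=$(kubectl get jobs -n terraform-enterprise-agents | count_completed_jobs)
#   if capacity_exhausted "$completed" "${TFE_CAPACITY_CONCURRENCY:-10}"; then
#     echo "Completed jobs ($completed) have reached the concurrency limit"
#   fi
```

The usage lines are left as comments so the helpers can be sourced without contacting a cluster. Set the limit argument to whatever value your deployment configures for TFE_CAPACITY_CONCURRENCY.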
Manually deleting the completed jobs is not a sufficient solution, as the underlying issue may persist.
## This command provides temporary relief but does not solve the root cause
$ kubectl -n terraform-enterprise-agents delete job -l job-name-prefix=tfe-task --field-selector=status.successful=1
Solutions
Solution 1: Restart Terraform Enterprise Pods
The quickest solution is to restart the Terraform Enterprise pods. This action resets job information within the Terraform Enterprise database and the Kubernetes or OpenShift platform, clearing the stuck jobs and allowing new runs to proceed.
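One possible way to restart the pods is a rolling restart of the Deployment that runs Terraform Enterprise. The sketch below is a minimal example under stated assumptions: the namespace and Deployment name both default to `terraform-enterprise`, which may not match your installation, and `restart_tfe` and the `DRY_RUN` variable are hypothetical conveniences rather than anything Terraform Enterprise provides.

```shell
#!/usr/bin/env bash

# Hypothetical helper: rolling restart of the Terraform Enterprise Deployment.
# $1 = namespace (assumed default: terraform-enterprise)
# $2 = deployment name (assumed default: terraform-enterprise)
# Set DRY_RUN=1 to print the kubectl commands instead of running them.
restart_tfe() {
  local namespace="${1:-terraform-enterprise}"
  local deployment="${2:-terraform-enterprise}"
  local run=""
  if [ "${DRY_RUN:-0}" = "1" ]; then
    run="echo"
  fi
  # Trigger the restart, then wait for the new pods to become ready.
  $run kubectl -n "$namespace" rollout restart "deployment/$deployment"
  $run kubectl -n "$namespace" rollout status "deployment/$deployment"
}

# Example usage:
#   DRY_RUN=1 restart_tfe                 # preview the commands
#   restart_tfe my-namespace my-release   # restart a custom installation
```

Running with `DRY_RUN=1` first lets you confirm the namespace and Deployment name before anything is restarted.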
Solution 2: Upgrade Terraform Enterprise
Upgrade to Terraform Enterprise v202504-1 or newer. This release resolves the issue and includes improved job cleanup mechanisms.
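To decide whether an installation already includes the fix, its release string can be compared against v202504-1. The sketch below assumes releases follow the `vYYYYMM-N` pattern seen in that version number. `tfe_has_cleanup_fix` is a hypothetical helper, and how you obtain the running version is left to your environment.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeed (exit 0) when the given release is
# v202504-1 or newer. Accepts the version with or without a leading "v".
# Assumes the vYYYYMM-N release format.
tfe_has_cleanup_fix() {
  local version="${1#v}"
  local fixed="202504-1"
  local v_month="${version%%-*}"
  local v_patch="${version##*-}"
  local f_month="${fixed%%-*}"
  local f_patch="${fixed##*-}"
  if [ "$v_month" -ne "$f_month" ]; then
    # Different monthly release: the later month wins.
    [ "$v_month" -gt "$f_month" ]
  else
    # Same monthly release: compare the patch number.
    [ "$v_patch" -ge "$f_patch" ]
  fi
}

# Example usage:
#   if tfe_has_cleanup_fix "v202503-1"; then
#     echo "Fix included"
#   else
#     echo "Upgrade recommended"
#   fi
```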
Outcome
After applying one of the solutions, Terraform Enterprise will correctly manage and clean up completed jobs, allowing new runs to start as expected.
Additional Information
- For more details, refer to the TFE_CAPACITY_CONCURRENCY configuration documentation.