Problem
After a system restart or maintenance, Terraform runs in Terraform Enterprise are stuck across all workspaces using remote execution mode. New tfe-agent containers are not being created for new jobs.
This issue can occur with any Terraform Enterprise flexible deployment option.
Cause
If Terraform Enterprise is restarted without allowing active jobs to complete or terminate gracefully, the agent containers running those jobs may not shut down properly. These containers remain active but unmanaged, becoming orphaned. The orphaned containers cause naming conflicts that prevent Terraform Enterprise from launching new agent containers for remote execution mode runs.
To confirm this issue, check the task worker log at /var/log/terraform-enterprise/task-worker.log. The log shows that new containers cannot be created because of a name conflict, for example:
err: create container: Error response from daemon: Conflict. The container name "tfe-agent-xxxx" is already in use by container <container_id>. You have to remove (or rename) that container to be able to reuse that name.
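To see which container names are affected, the conflict lines can be filtered out of the log. The following is a minimal sketch, not an official tool: conflicting_agent_names is a hypothetical helper name, and it assumes the log lines follow the format of the error shown above.

```shell
#!/usr/bin/env bash
# Sketch only: conflicting_agent_names is a hypothetical helper,
# not part of Terraform Enterprise.

# Print the unique container names that appear in name-conflict errors.
# $1: path to the task worker log file.
conflicting_agent_names() {
  local log_file="$1"
  # Keep only the conflict lines, then extract the quoted container name.
  grep -F 'is already in use by container' "$log_file" \
    | grep -oE 'container name "[^"]+"' \
    | cut -d'"' -f2 \
    | sort -u
}
```

Call it with the path to the task worker log, for example conflicting_agent_names <path_to_task-worker.log>. If nothing is printed, the log contains no name-conflict errors in this format.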
Solutions
Solution 1: Clear Stuck Jobs and Remove Orphaned Containers
To resolve this issue, you must cancel the stuck jobs from the UI and then manually remove the orphaned agent containers from the host machine.
Procedure
- In the Terraform Enterprise UI, navigate to the runs queue and cancel all jobs that are not progressing.
- On the Terraform Enterprise host, identify any orphaned tfe-agent containers. These are containers that are still running but do not correspond to an active run in the UI.
- Forcefully remove each orphaned container using its container ID.
$ docker rm -f <container_id>
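The identification and removal steps above can be sketched as two small shell helpers. This is an illustration under stated assumptions: the helper names are hypothetical, and the name filter assumes agent containers are named with a tfe-agent prefix, as in the error message shown earlier. Compare the listed containers against active runs in the UI before removing anything.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of Terraform Enterprise.

# Print ID, name, and status for every container whose name contains
# "tfe-agent", including stopped ones (-a).
list_agent_containers() {
  docker ps -a --filter "name=tfe-agent" --format '{{.ID}} {{.Names}} {{.Status}}'
}

# Force-remove each container ID passed as an argument.
# Pass only IDs that you have confirmed are orphaned.
remove_containers() {
  local id
  for id in "$@"; do
    docker rm -f "$id"
  done
}
```

For example, run list_agent_containers, note the IDs that do not belong to an active run, and then run remove_containers <container_id> <container_id>.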
Outcome
After removing the orphaned containers, Terraform Enterprise will be able to create new agent containers for remote runs.
To verify the solution, start a new plan in a workspace using remote execution mode. A new agent container should be created, and the Terraform plan should process without errors. You can confirm the new container is running with the docker ps command.
$ docker ps
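To narrow the docker ps output to agent containers only, a name filter can be applied. The helper below is a rough sketch: running_agent_count is a hypothetical name, and the filter again assumes that agent container names contain tfe-agent.

```shell
#!/usr/bin/env bash
# Sketch only: running_agent_count is a hypothetical helper,
# not part of Terraform Enterprise.

# Print the number of running containers whose name contains "tfe-agent".
running_agent_count() {
  docker ps --quiet --filter "name=tfe-agent" --filter "status=running" \
    | wc -l \
    | tr -d '[:space:]'
}
```

A count greater than zero shortly after queuing a new plan suggests that a new agent container was started for it.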
Additional Information
- For more details on execution modes, please see the Workspace Settings documentation.