Problem

After a system restart or maintenance, Terraform runs are stuck across all workspaces in remote execution mode. No new agent containers are being created for the new Terraform jobs. This issue may occur across all Terraform Enterprise Flexible Deployment Options.

Cause

When Terraform Enterprise is restarted without allowing active jobs to complete or terminate gracefully, the agent containers running those jobs may not shut down properly. These containers remain active but are no longer managed by Terraform Enterprise. The orphaned containers then cause container name conflicts when new runs start in remote execution mode.

The error messages appear in /var/log/terraform-enterprise/task-worker.log. They show that the system cannot create new containers because of name conflicts with existing containers that were not properly terminated:

err: create container: Error response from daemon: Conflict. The container name tfe-agent-xxxx is already in use by container xxx
You have to remove (or rename) that container to be able to reuse that name.
error occurred: Init error removing container "": Error response from daemon: page not found
error executing task.
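
To confirm that this is the failure being hit, the conflicting container names can be pulled out of the task worker log. The following is a minimal sketch, assuming the log lines match the format of the excerpt above. The function name conflicting_agent_containers is hypothetical, and the log path is passed in as an argument rather than assumed.

```shell
#!/bin/sh
# Print the unique container names reported in "already in use" conflicts.
# Assumes the log line format shown in the excerpt above; quotes and a
# leading slash around the name, if present, are stripped.
conflicting_agent_containers() {
  log_file="$1"
  grep 'is already in use by container' "$log_file" \
    | sed -n 's/.*The container name \([^ ]*\) is already in use.*/\1/p' \
    | tr -d '"' \
    | sed 's|^/||' \
    | sort -u
}

# Example: pass the path to task-worker.log
# conflicting_agent_containers <path_to_task-worker.log>
```

Each name printed corresponds to a container that still holds a name Terraform Enterprise is trying to reuse.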

Solutions

To resolve the issue, take the following steps:

Cancel all runs that are not progressing from the web UI.
Remove all orphaned agent containers that might be causing naming conflicts, using the command docker rm -f <container_id>.
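
The container removal step can be sketched as two small helpers for a Docker-based deployment. This is an illustration under stated assumptions, not a definitive procedure: the function names and the DRY_RUN variable are hypothetical, and the tfe-agent name filter is taken from the error message above. The filter matches every agent container, including any that belong to healthy runs, so review the list before removing anything.

```shell
#!/bin/sh
# List every container (running or stopped) whose name contains "tfe-agent".
list_agent_containers() {
  docker ps -a --filter "name=tfe-agent" --format '{{.ID}} {{.Names}} {{.Status}}'
}

# Force-remove each container ID passed as an argument.
# Set DRY_RUN=1 to print the commands instead of running them.
remove_containers() {
  for id in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "docker rm -f $id"
    else
      docker rm -f "$id"
    fi
  done
}

# Example: review the list first, then remove only the containers
# confirmed as orphaned.
# list_agent_containers
# DRY_RUN=1 remove_containers <container_id>
```

Running with DRY_RUN=1 first shows exactly which docker rm -f commands would be issued.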

Outcome

Test by launching new plan jobs from different workspaces in remote execution mode.
A new agent container should be created; you can verify this on the command line with docker ps.
Terraform plans should be processed without any errors.
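
The docker ps check can be wrapped in a small helper for a Docker-based deployment. This is a sketch only: the function names are hypothetical, and the tfe-agent name filter is assumed from the container name in the error message.

```shell
#!/bin/sh
# Print the names of running containers whose name contains "tfe-agent".
running_agent_containers() {
  docker ps --filter "name=tfe-agent" --format '{{.Names}}'
}

# Succeed (exit status 0) only if at least one agent container is running.
agent_is_running() {
  [ -n "$(running_agent_containers)" ]
}

# Example: queue a plan in a workspace, then run
# agent_is_running && echo "agent container created"
```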

References:

Workspace settings - execution mode

Terraform runs fail with "tfe-agent is already in use by container"
