Introduction
Problem
Terraform runs remain stuck in the queued status after migrating from a Replicated deployment to the Docker flexible deployment option.
Prerequisites
- Terraform Enterprise
- Migrating from Replicated to Docker
Cause
- A Replicated deployment of Terraform Enterprise automatically creates a Docker network called tfe_terraform_isolation as the default network for tfe-agents, but when migrating to the Docker flexible deployment option this network must be created manually.
- To confirm the issue, review the Terraform Enterprise application startup logs:
docker logs -f terraform-enterprise-tfe-1
...
{"component":"task-worker","log":"YEAR/MONTH/DAY 16:04:12 Error response from daemon: network tfe_terraform_isolation not found"}
YEAR-MONTH-DAY 16:04:12,538 INFO exited: task-worker (exit status 1; not expected)
{"component":"supervisord","log":"YEAR/MONTH/DAY 16:04:12,538 INFO exited: task-worker (exit status 1; not expected)"}
- The log above confirms that the Docker network tfe_terraform_isolation (TFE_RUN_PIPELINE_DOCKER_NETWORK) does not exist, which causes the task-worker process to exit.
- When a run is created, errors appear in the sidekiq logs stating that it is unable to connect to the task-worker:
YEAR-MONTH-DAY 16:13:46 [ERROR] msg=Failed to dispatch AgentJob agent_job_id=1400 workload_type=Plan workload_id=899 exception=Failed to open TCP connection to 127.0.0.1:8000 (Connection refused - connect(2) for "127.0.0.1" port 8000)
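The log check described above can be scripted. The following is a minimal sketch, not an official HashiCorp tool: the function name missing_network_in_logs is hypothetical, and it assumes the error text keeps the "network NAME not found" form shown in the startup log above.

```shell
# Sketch: report Docker networks that the log text says are missing.
# Reads log text on stdin and prints each missing network name once.
# The function name is illustrative and not part of Terraform Enterprise.
missing_network_in_logs() {
  grep -oE 'network [A-Za-z0-9_.-]+ not found' | awk '{print $2}' | sort -u
}

# Usage (container name as used in this article; 2>&1 includes stderr output):
#   docker logs terraform-enterprise-tfe-1 2>&1 | missing_network_in_logs
```

If the command prints tfe_terraform_isolation, the startup logs contain the error described in this article. No output means the pattern was not found in the logs that were read.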
Solutions:
- Solution 1 - Stop the Terraform Enterprise application using docker compose down, edit the docker-compose.yml file to remove TFE_RUN_PIPELINE_DOCKER_NETWORK, and save the changes. Deploy the Terraform Enterprise application again using docker compose up --detach.
- Solution 2 - Stop the Terraform Enterprise application using docker compose down. Create the Docker network (e.g. docker network create tfe_terraform_isolation) and redeploy the Terraform Enterprise application.
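Solution 2 can be made safe to repeat by creating the network only when it is absent. This is a sketch under stated assumptions: the helper name ensure_network is hypothetical, and the usage lines assume the commands are run from the directory that contains docker-compose.yml.

```shell
# Sketch: create a Docker network only if it does not already exist.
# ensure_network is an illustrative helper, not part of Docker or Terraform Enterprise.
ensure_network() {
  name="$1"
  # docker network inspect exits non-zero when the network is missing.
  if docker network inspect "$name" >/dev/null 2>&1; then
    echo "network $name already exists"
  else
    docker network create "$name" >/dev/null && echo "network $name created"
  fi
}

# Usage, following the order given in Solution 2:
#   docker compose down
#   ensure_network tfe_terraform_isolation
#   docker compose up --detach
```

Checking first avoids the error that docker network create returns when a network with the same name already exists.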
Outcome
Terraform Runs will now proceed as expected.