Terraform Enterprise Node Drain Hangs and Fails to Stop Task Worker Service – HashiCorp Help Center

Problem

The tfectl node drain command hangs indefinitely, and queued runs continue to execute.

root@terraform-enterprise:~# docker compose exec terraform-enterprise tfectl node drain
Starting node drain activity. This process runs in the background. Please monitor its progress before proceeding with a complete application shutdown.

stopping service: service=sidekiq 
waiting for command to finish execution on node 650f8cfa0dac
successfully stopped service: service=sidekiq 
stopping service: service=task-worker 
waiting for command to finish execution on node 650f8cfa0dac
waiting for command to finish execution on node 650f8cfa0dac
waiting for command to finish execution on node 650f8cfa0dac
waiting for command to finish execution on node 650f8cfa0dac
waiting for command to finish execution on node 650f8cfa0dac
waiting for command to finish execution on node 650f8cfa0dac
waiting for command to finish execution on node 650f8cfa0dac
waiting for command to finish execution on node 650f8cfa0dac
waiting for command to finish execution on node 650f8cfa0dac
waiting for command to finish execution on node 650f8cfa0dac
^Croot@terraform-enterprise:~#

The output of supervisorctl status in the Terraform Enterprise container shows the tfe:task-worker service is stuck in a STOPPING state.

terraform-enterprise@75b9f4daceb8:/# supervisorctl status
fluent-bit RUNNING pid 45, uptime 0:02:21
postgres STOPPED Not started
redis RUNNING pid 49, uptime 0:02:20
terraform-enterprise RUNNING pid 25, uptime 0:02:24
tfe:archivist RUNNING pid 75, uptime 0:02:19
tfe:atlas RUNNING pid 76, uptime 0:02:19
tfe:backup-restore RUNNING pid 77, uptime 0:02:19
tfe:licensing RUNNING pid 82, uptime 0:02:19
tfe:metrics RUNNING pid 87, uptime 0:02:19
tfe:nginx RUNNING pid 91, uptime 0:02:19
tfe:outbound-http-proxy RUNNING pid 98, uptime 0:02:19
tfe:sidekiq STOPPED Sep 17 04:08 PM
tfe:slug-ingress RUNNING pid 103, uptime 0:02:19
tfe:task-worker STOPPING Sep 17 04:08 PM
tfe:terraform-registry-api RUNNING pid 123, uptime 0:02:19
tfe:terraform-registry-worker RUNNING pid 124, uptime 0:02:19
tfe:terraform-state-parser RUNNING pid 130, uptime 0:02:19
tfe:tfe-health-check RUNNING pid 141, uptime 0:02:19
tfe:vault RUNNING pid 144, uptime 0:02:18
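
The check above can be scripted. The following is a minimal sketch, not part of the product tooling: stuck_services is a hypothetical helper name, and it assumes the two-column layout (program name, then state) shown in the supervisorctl status output above.

```shell
#!/usr/bin/env bash
# Print the names of supervisor programs whose state column is STOPPING.
# Reads `supervisorctl status` output on stdin: name in column 1, state in column 2.
# stuck_services is a hypothetical helper, not a Terraform Enterprise command.
stuck_services() {
  awk '$2 == "STOPPING" { print $1 }'
}

# Example with lines captured from the output above. Inside the container,
# pipe `supervisorctl status` into the function instead.
sample='tfe:sidekiq STOPPED Sep 17 04:08 PM
tfe:task-worker STOPPING Sep 17 04:08 PM
tfe:vault RUNNING pid 144, uptime 0:02:18'

printf '%s\n' "$sample" | stuck_services
```

With the sample lines, the function prints tfe:task-worker. Empty output means no program is in the STOPPING state.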

Prerequisites

Terraform Enterprise v202404-2 to v202409-1
Docker, Podman, and Replicated deployments

Cause

When a node drain command is executed, the two services that make up the run pipeline, sidekiq and task-worker, are gracefully stopped so that in-flight jobs complete and no new jobs are enqueued. In Terraform Enterprise releases v202404-2 through v202409-1, a bug prevents the task-worker process from being shut down during the node drain.
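
To check whether a given release falls inside the affected range, a comparison along the following lines may help. This is a sketch only: release_key and is_affected are hypothetical helper names, and the code assumes release tags always follow the vYYYYMM-N pattern used in this article.

```shell
#!/usr/bin/env bash
# Convert a release tag such as v202404-2 into a comparable integer (20240402).
# Assumption: tags follow the vYYYYMM-N format with N below 100.
release_key() {
  local tag=${1#v}          # drop the leading "v"
  local month=${tag%%-*}    # YYYYMM portion
  local patch=${tag##*-}    # N portion
  echo $(( 10#$month * 100 + 10#$patch ))
}

# Return 0 when the tag is within v202404-2 through v202409-1 inclusive.
is_affected() {
  local key
  key=$(release_key "$1")
  [ "$key" -ge "$(release_key v202404-2)" ] && [ "$key" -le "$(release_key v202409-1)" ]
}

for tag in v202404-1 v202404-2 v202409-1 v202409-2; do
  if is_affected "$tag"; then
    echo "$tag: affected"
  else
    echo "$tag: not affected"
  fi
done
```

The loop reports v202404-2 and v202409-1 as affected and the two tags just outside the range as not affected.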

Solution

Upgrade to v202409-2 for a permanent fix. As a temporary workaround, use the command below for your container runtime in place of the tfectl node drain command.

Docker

docker exec -u 0 <TFE_CONTAINER> bash -c 'supervisorctl stop tfe:sidekiq && TTW_PIDS=$(pgrep -f /usr/local/bin/task-worker); for pid in $TTW_PIDS; do kill -s TERM $pid && echo "tfe:task-worker: stopped ($pid)"; done'

Podman

podman exec -u 0 <TFE_CONTAINER> bash -c 'supervisorctl stop tfe:sidekiq && TTW_PIDS=$(pgrep -f /usr/local/bin/task-worker); for pid in $TTW_PIDS; do kill -s TERM $pid && echo "tfe:task-worker: stopped ($pid)"; done'
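
The workaround commands send SIGTERM and print their message immediately, without waiting for the task-worker processes to exit. The sketch below shows one way to confirm that a set of PIDs is gone before shutting the application down. wait_for_exit is a hypothetical helper, the timeout value is up to you, and the function assumes it runs as a user allowed to signal the target processes (for example, root inside the container).

```shell
#!/usr/bin/env bash
# Poll until none of the given PIDs exist, or give up after a timeout.
# Usage: wait_for_exit TIMEOUT_SECONDS PID...
# Returns 0 once every PID is gone, 1 if any is still alive at the deadline.
# wait_for_exit is a hypothetical helper, not a Terraform Enterprise command.
wait_for_exit() {
  local timeout=$1; shift
  local deadline=$(( SECONDS + timeout ))
  local pid alive
  while :; do
    alive=0
    for pid in "$@"; do
      # kill -0 delivers no signal; it only checks that the PID can be signalled.
      if kill -0 "$pid" 2>/dev/null; then
        alive=1
      fi
    done
    [ "$alive" -eq 0 ] && return 0
    [ "$SECONDS" -ge "$deadline" ] && return 1
    sleep 1
  done
}
```

From a root shell in the container, one possible use is to capture the PIDs with the same pgrep call as the workaround before sending TERM, then pass them to wait_for_exit with a timeout long enough for in-flight runs to finish.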

Additional Information

Terraform Enterprise admin CLI reference
