Problem
The tfectl node drain command hangs indefinitely, and queued runs continue to be executed. The command output shows the process is stuck waiting to stop the task-worker service.
# docker compose exec terraform-enterprise tfectl node drain
Starting node drain activity. This process runs in the background. Please monitor its progress before proceeding with a complete application shutdown.
stopping service: service=sidekiq
waiting for command to finish execution on node 650f8cfa0dac
successfully stopped service: service=sidekiq
stopping service: service=task-worker
waiting for command to finish execution on node 650f8cfa0dac
waiting for command to finish execution on node 650f8cfa0dac
##...
^C
The output of supervisorctl status within the Terraform Enterprise container shows the tfe:task-worker service is stuck in a STOPPING state.
# supervisorctl status
fluent-bit                     RUNNING   pid 45, uptime 0:02:21
postgres                       STOPPED   Not started
redis                          RUNNING   pid 49, uptime 0:02:20
terraform-enterprise           RUNNING   pid 25, uptime 0:02:24
tfe:archivist                  RUNNING   pid 75, uptime 0:02:19
tfe:atlas                      RUNNING   pid 76, uptime 0:02:19
tfe:backup-restore             RUNNING   pid 77, uptime 0:02:19
tfe:licensing                  RUNNING   pid 82, uptime 0:02:19
tfe:metrics                    RUNNING   pid 87, uptime 0:02:19
tfe:nginx                      RUNNING   pid 91, uptime 0:02:19
tfe:outbound-http-proxy        RUNNING   pid 98, uptime 0:02:19
tfe:sidekiq                    STOPPED   Sep 17 04:08 PM
tfe:slug-ingress               RUNNING   pid 103, uptime 0:02:19
tfe:task-worker                STOPPING  Sep 17 04:08 PM
tfe:terraform-registry-api     RUNNING   pid 123, uptime 0:02:19
tfe:terraform-registry-worker  RUNNING   pid 124, uptime 0:02:19
tfe:terraform-state-parser     RUNNING   pid 130, uptime 0:02:19
tfe:tfe-health-check           RUNNING   pid 141, uptime 0:02:19
tfe:vault                      RUNNING   pid 144, uptime 0:02:18
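To check for this condition without reading the whole listing, the state column can be pulled out for a single service. The following is a minimal sketch, assuming the `name state ...` column layout shown in the output above. `service_state` is a hypothetical helper name used for illustration; it is not part of Terraform Enterprise or supervisor.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the state column for one service, reading
# `supervisorctl status` output from stdin.
service_state() {
  # $1 is the exact service name, for example tfe:task-worker.
  # Field 1 of each status line is the name, field 2 is the state.
  awk -v svc="$1" '$1 == svc { print $2 }'
}
```

Inside the container, `supervisorctl status | service_state tfe:task-worker` would print `STOPPING` for the output shown above.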
Prerequisites
- Terraform Enterprise versions v202404-2 to v202409-1
- Docker, Podman, or Replicated deployments
Cause
When a node drain command is executed, the sidekiq and task-worker services are gracefully stopped to ensure in-flight jobs complete and no new jobs are enqueued. In Terraform Enterprise releases from v202404-2 to v202409-1, a bug prevents the task-worker process from shutting down correctly during the node drain.
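Whether a given release falls inside the affected range can be checked with a short comparison. This is a sketch only, assuming release strings follow the `vYYYYMM-N` form used in this article. `is_affected_release` is a hypothetical helper name, and the bounds are the ones stated above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit 0 when a release string such as v202406-1 lies in
# the affected range v202404-2 through v202409-1, exit 1 when it lies outside,
# and exit 2 when the string is not in the assumed vYYYYMM-N form.
is_affected_release() {
  local release="${1#v}"   # drop the leading "v" if present
  [[ "$release" =~ ^([0-9]{6})-([0-9]+)$ ]] || return 2
  local month="${BASH_REMATCH[1]}"
  local patch=$(( 10#${BASH_REMATCH[2]} ))   # force base 10 for values like 08

  # Earlier than the first affected release (v202404-2).
  if (( month < 202404 || (month == 202404 && patch < 2) )); then
    return 1
  fi
  # Later than the last affected release (v202409-1).
  if (( month > 202409 || (month == 202409 && patch > 1) )); then
    return 1
  fi
  return 0
}
```

For example, `is_affected_release v202409-1 && echo affected` prints `affected`, while the same call with `v202409-2` prints nothing.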
Solutions
This issue is resolved in Terraform Enterprise version v202409-2.
Solution 1: Upgrade Terraform Enterprise (Permanent)
The recommended solution is to upgrade your Terraform Enterprise instance to version v202409-2 or a later version. This permanently resolves the bug that prevents the task-worker service from shutting down correctly.
Solution 2: Apply a Temporary Workaround
If you cannot upgrade immediately, you can use a modified command to gracefully stop the sidekiq and task-worker services instead of using tfectl node drain.
Choose the command that matches your deployment environment.
- For Docker deployments:
Execute the following command to stop the services.
# docker exec -u 0 <TFE_CONTAINER> bash -c \
'supervisorctl stop tfe:sidekiq && \
TTW_PIDS=$(pgrep -f /usr/local/bin/task-worker); \
for pid in $TTW_PIDS; do \
kill -s TERM $pid && echo "tfe:task-worker: stopped ($pid)"; \
done'
- For Podman deployments:
Execute the following command to stop the services.
# podman exec -u 0 <TFE_CONTAINER> bash -c \
'supervisorctl stop tfe:sidekiq && \
TTW_PIDS=$(pgrep -f /usr/local/bin/task-worker); \
for pid in $TTW_PIDS; do \
kill -s TERM $pid && echo "tfe:task-worker: stopped ($pid)"; \
done'
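The commands above only send a TERM signal, so the task-worker processes may still be finishing in-flight work when the command returns. One way to confirm they have exited before shutting the node down is to poll with `pgrep`. The following is a minimal sketch: `wait_for_exit` is a hypothetical helper name, and the 300-second default and 5-second polling interval are assumptions rather than documented values.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll until no process matches the pattern.
# Returns 0 once every match is gone, 1 if matches remain after the timeout.
wait_for_exit() {
  local pattern="$1"
  local timeout="${2:-300}"   # seconds; assumed default, adjust as needed
  local waited=0

  # pgrep exits 0 while at least one process matches the pattern.
  while pgrep -f "$pattern" >/dev/null; do
    if (( waited >= timeout )); then
      return 1
    fi
    sleep 5
    waited=$(( waited + 5 ))
  done
  return 0
}
```

After defining the function in an interactive shell inside the container, `wait_for_exit /usr/local/bin/task-worker 600` returns once no matching process remains or after roughly ten minutes. Run it from an interactive shell rather than through `bash -c`, because with `bash -c` the pattern also appears in the shell's own command line and `pgrep -f` would match it.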