tfe-admin node-drain Command Does Not Stop Queued Runs on Terraform Enterprise v202306-1 to v202312-1 (Replicated) – HashiCorp Help Center

Problem

Running tfe-admin node-drain does not stop runs from being dequeued and executed in Terraform Enterprise. Depending on the Terraform Enterprise release, the command output resembles one of the two examples below.

v202306-1 to v202309-1

root@ip-10-0-173-94:~# tfe-admin node-drain
Running node-drain (localhost)
2024-10-17T18:33:26.222Z [INFO] draining node: node=localhost
2024-10-17T18:33:26.223Z [INFO] stopping sidekiq
2024-10-17T18:33:27.581Z [INFO] successfully stopped sidekiq: output="tfe-sidekiq
"
2024-10-17T18:33:27.582Z [INFO] stopping build_manager and build_worker
2024-10-17T18:33:31.985Z [INFO] successfully stopped build_manager and build_worker: output="tfe-build-manager
tfe-build-worker
"

v202310-1 to v202312-1

root@ip-10-0-173-94:~# tfe-admin node-drain
Running node-drain (localhost)
2023-12-11T19:18:12.142Z [INFO] draining node: node=localhost
2023-12-11T19:18:12.143Z [INFO] stopping sidekiq
2023-12-11T19:18:13.044Z [INFO] successfully stopped sidekiq: output="tfe-sidekiq
"
2023-12-11T19:18:13.044Z [INFO] stopping build_manager and build_worker
2023-12-11T19:18:13.084Z [ERROR] error stopping build_manager and build_worker: error="exit status 1"
2023-12-11T19:18:13.084Z [ERROR] Error response from daemon: No such container: tfe-build-manager
Error response from daemon: No such container: tfe-build-worker
: error="exit status 1"
error draining node: error stopping build_manager and build_worker: exit status 1

Prerequisites

Terraform Enterprise v202306-1 to v202312-1 (Replicated deployment)
consolidated_services = 0 (v202306-1 to v202308-1) or consolidated_services_enabled = 0 (v202309-1 to v202312-1)

Cause

Support for the legacy pipeline was removed in v202306-1. That change unintentionally also removed a configuration option that directed the tfe-admin node-drain command to stop the tfe-task-worker container rather than the containers used in legacy pipeline mode. As a result, tfe-admin node-drain stops the legacy pipeline containers (tfe-build-manager and tfe-build-worker) instead of tfe-task-worker, the container used in the agent pipeline, so runs continue to be dequeued and executed.

Starting in v202310-1, the tfe-build-worker and tfe-build-manager containers are no longer part of Terraform Enterprise, which is why the command fails with an error in those releases. This is a bug, and it can be worked around using the steps below.
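
To see which of the containers named above exist on a node, their names can be matched against a container listing. The following is a minimal sketch, not part of Terraform Enterprise: the function name classify_containers is hypothetical, and it assumes container names arrive one per line, for example from docker ps --format '{{.Names}}'.

```shell
# Hypothetical helper: label the pipeline containers named in this article.
# Reads container names on stdin, one per line; other names are ignored.
classify_containers() {
  while IFS= read -r name; do
    case "$name" in
      tfe-task-worker)
        echo "$name: agent pipeline" ;;
      tfe-build-manager|tfe-build-worker)
        echo "$name: legacy pipeline" ;;
    esac
  done
}

# Example usage on a Terraform Enterprise node:
#   docker ps --format '{{.Names}}' | classify_containers
```

If tfe-task-worker is still listed after tfe-admin node-drain has run, the node is likely affected by this issue.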

Solutions

To resolve this issue, enable consolidated services mode. The name of the setting depends on the release.

v202306-1 to v202308-1

replicatedctl app-config set consolidated_services --value 1

v202309-1 to v202312-1

replicatedctl app-config set consolidated_services_enabled --value 1

For either release range, restart the application to apply the configuration change:

replicatedctl app apply-config
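
When the same procedure is scripted across nodes on different releases, the setting name can be derived from the release string. This is a sketch under stated assumptions: setting_for_release is a hypothetical helper, and it assumes release strings in the form used in this article (for example v202309-1).

```shell
# Hypothetical helper: print the consolidated services setting name
# for a release in the affected range (v202306-1 to v202312-1).
setting_for_release() {
  release="${1#v}"        # "v202309-1" -> "202309-1"
  month="${release%%-*}"  # "202309-1"  -> "202309"

  # Reject anything that is not a plain number before comparing.
  case "$month" in
    ''|*[!0-9]*) echo "unrecognized release: $1" >&2; return 1 ;;
  esac

  if [ "$month" -ge 202306 ] && [ "$month" -le 202308 ]; then
    echo consolidated_services
  elif [ "$month" -ge 202309 ] && [ "$month" -le 202312 ]; then
    echo consolidated_services_enabled
  else
    echo "release $1 is outside the affected range" >&2
    return 1
  fi
}

# Example usage:
#   replicatedctl app-config set "$(setting_for_release v202309-1)" --value 1
#   replicatedctl app apply-config
```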

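If enabling consolidated services is not immediately feasible, manually stop the tfe-task-worker container by running the following command directly after tfe-admin node-drain. The -t 86400 option gives the container up to 86,400 seconds (24 hours) to exit before Docker forcibly kills it.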

docker stop -t 86400 tfe-task-worker
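
The two workaround commands can be combined in a small wrapper. This is a sketch, not an official tool: drain_and_stop_task_worker and the TFE_ADMIN and DOCKER override variables are hypothetical names introduced here. It assumes that a non-zero exit from node-drain should not prevent the task worker from being stopped, since the command is expected to fail on v202310-1 to v202312-1.

```shell
# Hypothetical wrapper: run node-drain, then stop tfe-task-worker.
# TFE_ADMIN and DOCKER are illustrative overrides for the two binaries.
drain_and_stop_task_worker() {
  # On v202310-1 to v202312-1, node-drain exits non-zero after stopping
  # sidekiq, so its failure is reported but does not abort the wrapper.
  "${TFE_ADMIN:-tfe-admin}" node-drain ||
    echo "node-drain exited non-zero; continuing" >&2

  # Wait up to 86400 seconds (24 hours) before Docker kills the container.
  "${DOCKER:-docker}" stop -t 86400 tfe-task-worker
}

# Example usage (as root on the node being drained):
#   drain_and_stop_task_worker
```

The wrapper returns the exit status of docker stop, so a caller can tell whether the task worker was actually stopped.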

Additional Information
