Problem
Running tfe-admin node-drain does not stop runs from being dequeued and executed in Terraform Enterprise. Depending on the Terraform Enterprise release, the output of the command resembles one of the two examples below.
- v202306-1 to v202309-1
root@ip-10-0-173-94:~# tfe-admin node-drain
Running node-drain (localhost)
2024-10-17T18:33:26.222Z [INFO] draining node: node=localhost
2024-10-17T18:33:26.223Z [INFO] stopping sidekiq
2024-10-17T18:33:27.581Z [INFO] successfully stopped sidekiq: output="tfe-sidekiq
"
2024-10-17T18:33:27.582Z [INFO] stopping build_manager and build_worker
2024-10-17T18:33:31.985Z [INFO] successfully stopped build_manager and build_worker: output="tfe-build-manager
tfe-build-worker
"
- v202310-1 to v202312-1
root@ip-10-0-173-94:~# tfe-admin node-drain
Running node-drain (localhost)
2023-12-11T19:18:12.142Z [INFO] draining node: node=localhost
2023-12-11T19:18:12.143Z [INFO] stopping sidekiq
2023-12-11T19:18:13.044Z [INFO] successfully stopped sidekiq: output="tfe-sidekiq
"
2023-12-11T19:18:13.044Z [INFO] stopping build_manager and build_worker
2023-12-11T19:18:13.084Z [ERROR] error stopping build_manager and build_worker: error="exit status 1"
2023-12-11T19:18:13.084Z [ERROR] Error response from daemon: No such container: tfe-build-manager
Error response from daemon: No such container: tfe-build-worker
: error="exit status 1"
error draining node: error stopping build_manager and build_worker: exit status 1
Prerequisites
- Terraform Enterprise v202306-1 to v202312-1 (Replicated deployment)
- consolidated_services = 0 (v202306-1 to v202308-1) or consolidated_services_enabled = 0 (v202309-1 to v202312-1)
Cause
In v202306-1, support for the legacy pipeline was removed. That change unintentionally removed a configuration option that directed the tfe-admin node-drain command to stop the tfe-task-worker container rather than the containers used in legacy pipeline mode. As a result, tfe-admin node-drain stops the legacy pipeline containers (tfe-build-manager and tfe-build-worker) instead of tfe-task-worker, the container used in the agent pipeline. As of v202310-1, the tfe-build-worker and tfe-build-manager containers are no longer part of Terraform Enterprise, which is why the command fails with an error in those releases. This is a bug and can be worked around using the steps below.
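One way to confirm the behavior described above is to check whether the tfe-task-worker container is still running after tfe-admin node-drain returns. The following is a minimal sketch rather than an official diagnostic: the function name task_worker_running and the DOCKER override variable are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of tfe-admin): succeeds when a container
# named exactly "tfe-task-worker" appears in the list of running containers.
# DOCKER is an illustrative override; it defaults to the real docker CLI.
task_worker_running() {
  "${DOCKER:-docker}" ps --format '{{.Names}}' | grep -qx 'tfe-task-worker'
}
```

For example, `task_worker_running && echo "tfe-task-worker is still running"` run directly after the drain would indicate that the node can still pick up runs.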
Solutions
To resolve this issue, enable consolidated services mode.
- v202306-1 to v202308-1
replicatedctl app-config set consolidated_services --value 1
- v202309-1 to v202312-1
replicatedctl app-config set consolidated_services_enabled --value 1
Restart the application to apply the configuration change:
replicatedctl app apply-config
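Because the setting name differs by release, the steps above could be wrapped in a small script. This is a sketch under stated assumptions: the function names consolidated_setting_for and enable_consolidated_services are hypothetical, and the release string is assumed to follow the vYYYYMM-N form used in this article. Only the replicatedctl commands themselves come from the steps above.

```shell
#!/bin/sh
# Hypothetical helper: print the consolidated services setting name
# for a release string such as "v202309-1".
consolidated_setting_for() {
  cs_release="${1#v}"            # drop the leading "v"
  cs_month="${cs_release%%-*}"   # keep only the YYYYMM portion
  case "$cs_month" in
    202306|202307|202308)
      echo "consolidated_services" ;;
    202309|202310|202311|202312)
      echo "consolidated_services_enabled" ;;
    *)
      echo "release $1 is outside v202306-1 to v202312-1" >&2
      return 1 ;;
  esac
}

# Hypothetical wrapper: set the matching option to 1, then apply the config.
enable_consolidated_services() {
  cs_key=$(consolidated_setting_for "$1") || return 1
  replicatedctl app-config set "$cs_key" --value 1 &&
    replicatedctl app apply-config
}
```

For example, `enable_consolidated_services v202309-1` would set consolidated_services_enabled to 1 and then apply the configuration.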
If enabling consolidated services is not immediately feasible, manually stop the task-worker container by running the following command directly after tfe-admin node-drain:
docker stop -t 86400 tfe-task-worker
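The two manual steps can be combined so that the container stop is never forgotten. The sketch below is illustrative only: the function name drain_node and the TFE_ADMIN and DOCKER override variables are hypothetical. It assumes that tfe-admin node-drain may exit with an error on v202310-1 to v202312-1 (as in the output shown under Problem), so a drain failure is reported but does not prevent the stop. The -t 86400 value gives the container up to 86400 seconds (24 hours) to exit before Docker kills it.

```shell
#!/bin/sh
# Hypothetical wrapper: run node-drain, then stop tfe-task-worker.
# TFE_ADMIN and DOCKER are illustrative overrides that default to the
# real commands. The optional first argument is the stop timeout in seconds.
drain_node() {
  dn_timeout="${1:-86400}"
  # Assumption: node-drain can fail on v202310-1 to v202312-1, so its
  # failure is logged and the script continues to the container stop.
  "${TFE_ADMIN:-tfe-admin}" node-drain ||
    echo "node-drain reported an error; stopping tfe-task-worker anyway" >&2
  "${DOCKER:-docker}" stop -t "$dn_timeout" tfe-task-worker
}
```

Calling `drain_node` with no arguments matches the manual commands above. A shorter timeout can be passed as the first argument.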
Additional Information
- Terraform Enterprise Consolidated Services Architecture
- Terraform Enterprise Admin CLI Commands - node-drain