Introduction
When a run starts in Terraform Enterprise, it spawns a worker container from an image, which can be either the default worker image or a custom worker image. This worker container needs resources to start and to execute a series of commands.
While the worker container starts, several checks are performed, and a failed check can prevent the worker from starting properly. The reason for the failure is not shown in the Terraform Enterprise console. Instead, the run output shows a generic error message.
This article covers one of the possible issues that can cause this message to appear.
Problem
Running a Terraform plan or apply from Terraform Enterprise fails with the following generic error:
Setup failed: Failed setting up Terraform binary: Failed pushing binary to environment: exit status 125
Prerequisites
- Terraform Enterprise up to v202306-1
- Legacy run pipeline
Cause
Note: For installations using Terraform Enterprise v202205-01 through v202308-1, all container names follow the naming convention "tfe-<service>". Older versions can have the "ptfe" prefix.
Example:
ptfe_atlas > tfe-atlas
ptfe_archivist > tfe-archivist
More information about this change can be found in the release notes.
When the plan or apply phase of a run is performed, a disposable Docker container is started to perform the action. This container tries to start with the number of CPUs defined by the capacity_cpus setting. If this number is higher than the actual number of CPUs on the system, the Docker container fails to start.
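The condition described above can be checked before starting a run. The following is a minimal sketch, assuming capacity_cpus is a whole number and that nproc (GNU coreutils) reports the same CPU count that Docker sees on the host. The function name cpu_limit_ok is hypothetical and is not part of Terraform Enterprise.

```shell
#!/bin/sh
# Sketch: compare a configured capacity_cpus value with the host CPU count.
# cpu_limit_ok is a hypothetical helper name used only for illustration.
cpu_limit_ok() {
    configured="$1"   # value of the capacity_cpus setting
    available="$2"    # number of CPUs on the host

    # A value of 0 means unlimited, so it can never exceed the host.
    if [ "$configured" -eq 0 ]; then
        return 0
    fi

    # Any other value must not be higher than the number of host CPUs.
    [ "$configured" -le "$available" ]
}

# Example: check a capacity_cpus value of 4 against the current host.
if cpu_limit_ok 4 "$(nproc)"; then
    echo "capacity_cpus fits within the host CPUs"
else
    echo "capacity_cpus is higher than the host CPUs; the worker container would fail to start"
fi
```

The first argument would normally be the value exported from the application configuration rather than a literal.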
Monitor the logs of the ptfe_build_worker container:
sudo docker logs -f ptfe_build_worker 2>&1 | grep CPU
If a run fails and the following error appears in the log, continue to the solution.
{"@level":"error","@message":"(Docker: 14c3f9f7-eed6-d583-ccca-8c696036febf)
Failed to start container: exit status 125\nOutput:\ndocker:
Error response from daemon: Range of CPUs is from 0.01 to 2.00,
as there are only 2 CPUs available.\nSee 'docker run --help'.",
"@module":"terraform-build-worker.stdlog",
"@timestamp":"2022-05-17T08:08:09.745789Z"
,"git_commit":"e356de1","isolation_type":"docker"}
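The CPU count that Docker reports can be pulled out of such a log entry for comparison with capacity_cpus. This is a small sketch, assuming the daemon message keeps the wording shown in the entry above. The function name cpus_from_log is hypothetical, and the sample line is only the relevant fragment of that entry.

```shell
#!/bin/sh
# Sketch: extract the CPU count from the Docker daemon message.
# cpus_from_log is a hypothetical helper; it reads log lines on stdin and
# prints the number found in "as there are only N CPUs available".
cpus_from_log() {
    sed -n 's/.*as there are only \([0-9][0-9]*\) CPUs available.*/\1/p'
}

# Fragment of the log entry shown above, joined onto one line.
sample='Error response from daemon: Range of CPUs is from 0.01 to 2.00, as there are only 2 CPUs available.'

printf '%s\n' "$sample" | cpus_from_log
```

Lines that do not contain the message produce no output, so the function can be placed at the end of the docker logs pipeline in place of grep.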
Solution
The value defined by the capacity_cpus setting is higher than the number of CPUs available on the system.
Check the current value of capacity_cpus:
replicatedctl app-config export --template "{{.capacity_cpus.Value}}"
Change this to an appropriate value for your system and restart Terraform Enterprise. The default is "0", which means unlimited.
replicatedctl app-config set capacity_cpus --value 0
replicatedctl app stop
replicatedctl app start
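One way to pick a value is to cap the requested number at the host CPU count. The sketch below only prints the commands from this article with the chosen value and does not run them. The function name print_capacity_fix and the capping policy are assumptions made for illustration, and nproc (GNU coreutils) is assumed to reflect the CPUs that Docker sees.

```shell
#!/bin/sh
# Sketch: print (not run) the commands above with a value that fits the host.
# print_capacity_fix is a hypothetical helper; capping the requested value
# at the host CPU count is an assumed policy, not an official recommendation.
print_capacity_fix() {
    requested="$1"   # desired capacity_cpus value
    available="$2"   # number of CPUs on the host

    value="$requested"
    # Lower the value when it is higher than what the host can provide.
    if [ "$requested" -gt "$available" ]; then
        value="$available"
    fi

    echo "replicatedctl app-config set capacity_cpus --value $value"
    echo "replicatedctl app stop"
    echo "replicatedctl app start"
}

# Example: request 4 CPUs on the current host.
print_capacity_fix 4 "$(nproc)"
```

Review the printed commands before running them, since stopping the application interrupts any runs in progress.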
Outcome
Once Terraform Enterprise is restarted, subsequent plans should complete successfully. If the issue persists, review the additional information or contact HashiCorp Support for further assistance.
Additional Information
- Documentation about CPU capacity can be found here