Problem
Occasionally, Sentinel policy or Cost Estimation checks begin failing for all runs on a Terraform Enterprise instance. This may also affect the ability to generate and download Sentinel mocks in Terraform Enterprise workspaces.
Cause
One of the main causes is a restart of the Terraform Enterprise application. When the ptfe_nomad container (called tfe-nomad in Terraform Enterprise v202205-1 and later) is brought online as part of application startup, a race condition can occur in which the Nomad jobs are scheduled before the container is ready to accept them.
NOTE: Starting with Terraform Enterprise v202212-1, the tfe-nomad container has been replaced by tfe-task-worker.
If the race condition has occurred, the logs for ptfe_nomad or tfe-nomad will show the following error for Cost Estimation.
2020-08-13T06:45:37.601104025Z 2020/08/13 06:45:37.600969 [ERR] http: Request /v1/job/cost-estimation-worker/dispatch, error: parameterized job not found
Or the following for Sentinel.
2020-06-05T03:10:47.760300693Z 2020/06/05 03:10:47.760165 [ERR] http: Request /v1/job/sentinel-worker/dispatch, error: parameterized job not found
Additionally, the logs for ptfe_sidekiq (called tfe-sidekiq in Terraform Enterprise v202205-1 and later) will show the following error.
2020-08-11T13:08:39.137385067Z 2020-08-11 13:08:39 [ERROR] {:msg=>"Failed to enqueue cost estimate", :run_id=>622, :cost_estimate_id=>590, :exception=>#<RestClient::InternalServerError: 500 Internal Server Error>}
The logs for ptfe_nomad can be checked by running the following command on the Terraform Enterprise instance.
$ docker logs ptfe_nomad
For Terraform Enterprise v202205-1 through v202211-1:
$ docker logs tfe-nomad
For Terraform Enterprise v202212-1 or later:
$ docker logs tfe-task-worker
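The Nomad container logs can be long, so it may help to filter them for the dispatch error shown above. The following is a minimal sketch, not an official tool: the function name failed_dispatch_jobs is hypothetical, and the pattern assumes the log lines have the same shape as the two examples in this article.

```shell
# Hypothetical helper: read container logs on stdin and print the name of
# each Nomad job whose dispatch failed with "parameterized job not found".
failed_dispatch_jobs() {
  # Capture the job name from the request path, then de-duplicate.
  sed -n 's#.*Request /v1/job/\([^/]*\)/dispatch, error: parameterized job not found.*#\1#p' | sort -u
}

# Usage (pick the container name that matches the installed version):
#   docker logs ptfe_nomad 2>&1 | failed_dispatch_jobs
```

If the function prints cost-estimation-worker or sentinel-worker, the corresponding job was not scheduled when a run tried to dispatch it.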
To check the logs for ptfe_sidekiq, run the following command.
$ docker logs ptfe_sidekiq
For Terraform Enterprise v202205-1 or later:
$ docker logs tfe-sidekiq
Finally, to confirm whether the Nomad jobs for Sentinel and Cost Estimation checks are scheduled, run the following command.
$ docker exec -it ptfe_nomad nomad job status
For Terraform Enterprise v202205-1 through v202211-1:
$ docker exec -it tfe-nomad nomad job status
The output should show three scheduled jobs: cost-estimation-worker, plan-export-worker, and sentinel-worker.
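The check for the three expected jobs can also be scripted. The sketch below is illustrative only: the function name missing_jobs is hypothetical, and it assumes the job ID is the first whitespace-separated column of each line of the nomad job status output.

```shell
# Hypothetical helper: read `nomad job status` output on stdin and print
# each expected job that does not appear in it.
missing_jobs() {
  mj_status="$(cat)"
  for mj_job in cost-estimation-worker plan-export-worker sentinel-worker; do
    # Treat the job as present only if its name is the first field of a line.
    if ! printf '%s\n' "$mj_status" | awk -v j="$mj_job" '$1 == j { found = 1 } END { exit !found }'; then
      printf '%s\n' "$mj_job"
    fi
  done
}

# Usage (-it is omitted because the output is piped, not shown on a terminal):
#   docker exec ptfe_nomad nomad job status | missing_jobs
```

No output means all three jobs are listed. Any name printed is a job that still needs to be scheduled.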
Solution
To resolve this error, run the following command on the Terraform Enterprise instance.
$ docker exec -it ptfe_nomad /bin/bash -c 'for i in ${WORKERDIR}/*.job; do nomad run "${i}"; done'
For Terraform Enterprise v202205-1 through v202211-1:
$ docker exec -it tfe-nomad /bin/bash -c 'for i in ${WORKERDIR}/*.job; do nomad run "${i}"; done'
This manually reschedules the Nomad jobs required for Sentinel policy and Cost Estimation checks.
To verify the Nomad jobs are scheduled, run the following command.
$ docker exec -it ptfe_nomad nomad job status
For Terraform Enterprise v202205-1 through v202211-1:
$ docker exec -it tfe-nomad nomad job status
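Because the commands above differ only in the container name, the choice can be derived from the version string. This is a rough sketch under stated assumptions: the function name nomad_exec_container is hypothetical, and it assumes versions are written as in this article (for example v202205-1), comparing only the six-digit release month.

```shell
# Hypothetical helper: print the Nomad container name to use with
# `docker exec` for a given Terraform Enterprise version string.
nomad_exec_container() {
  nec_version="${1#v}"           # "v202205-1" -> "202205-1"
  nec_month="${nec_version%%-*}" # "202205-1"  -> "202205"
  case "$nec_month" in
    ''|*[!0-9]*) echo "unrecognized version: $1" >&2; return 2 ;;
  esac
  if [ "$nec_month" -lt 202205 ]; then
    echo ptfe_nomad
  elif [ "$nec_month" -lt 202212 ]; then
    echo tfe-nomad
  else
    # From v202212-1 the container is replaced by tfe-task-worker, and this
    # article lists no nomad commands for it.
    echo "no Nomad container for $1" >&2
    return 1
  fi
}

# Usage:
#   docker exec -it "$(nomad_exec_container v202008-1)" nomad job status
```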
Additional Information
This is a known bug in Terraform Enterprise v202008-1. To prevent it from recurring, upgrade Terraform Enterprise to a newer version.