Problem
Occasionally, Sentinel policy or Cost Estimation checks begin failing for all runs on a Terraform Enterprise instance. This may also affect the ability to generate and download Sentinel Mocks in Terraform Enterprise workspaces.
Cause
One of the main causes of this occurs upon restart of the Terraform Enterprise application. When the ptfe_nomad container is brought online as part of the application start, a race condition can occur in which the Nomad jobs are scheduled before the container is ready to accept them. If this is the case, the logs for ptfe_nomad will show the following error for Cost Estimation.
2020-08-13T06:45:37.601104025Z 2020/08/13 06:45:37.600969 [ERR] http: Request /v1/job/cost-estimation-worker/dispatch, error: parameterized job not found
Or the following for Sentinel.
2020-06-05T03:10:47.760300693Z 2020/06/05 03:10:47.760165 [ERR] http: Request /v1/job/sentinel-worker/dispatch, error: parameterized job not found
Additionally, the logs for ptfe_sidekiq will show the following error.
2020-08-11T13:08:39.137385067Z 2020-08-11 13:08:39 [ERROR] {:msg=>"Failed to enqueue cost estimate", :run_id=>622, :cost_estimate_id=>590, :exception=>#<RestClient::InternalServerError: 500 Internal Server Error>}
The logs for ptfe_nomad can be checked by running the following command on the Terraform Enterprise instance.
$ docker logs ptfe_nomad
To check the logs for ptfe_sidekiq, run the following command.
$ docker logs ptfe_sidekiq
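Because these logs can be long, it may help to filter them for the two error messages shown above. The following is a minimal sketch, not part of Terraform Enterprise: the function name find_known_errors is hypothetical, and it assumes the error text matches the examples in this article.

```shell
# Hypothetical helper: reads log text on stdin and prints only the lines
# containing one of the error messages quoted in this article.
find_known_errors() {
  grep -F \
    -e 'error: parameterized job not found' \
    -e 'Failed to enqueue cost estimate'
}

# Example usage on the Terraform Enterprise instance (2>&1 is used because
# docker logs replays the container's stderr stream on stderr):
#   docker logs ptfe_nomad 2>&1 | find_known_errors
#   docker logs ptfe_sidekiq 2>&1 | find_known_errors
```

If neither message is present, the function prints nothing and returns a non-zero exit status, as grep does when it finds no match.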
Finally, to confirm whether the Nomad jobs for Sentinel and Cost Estimation checks are scheduled, run the following command.
$ docker exec -it ptfe_nomad nomad job status
The output should show three scheduled jobs: cost-estimation-worker, plan-export-worker, and sentinel-worker.
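The check for the three expected jobs can also be scripted. The sketch below assumes that the job ID appears in the first column of the nomad job status output. The function name missing_jobs is hypothetical.

```shell
# Hypothetical helper: reads `nomad job status` output on stdin and prints
# the name of each expected job that does not appear in the first column.
missing_jobs() {
  job_status=$(cat)
  for job in cost-estimation-worker plan-export-worker sentinel-worker; do
    printf '%s\n' "$job_status" \
      | awk -v j="$job" '$1 == j { found = 1 } END { exit !found }' \
      || printf '%s\n' "$job"
  done
}

# Example usage on the Terraform Enterprise instance (-it is omitted here
# because the output is piped rather than shown on a terminal):
#   docker exec ptfe_nomad nomad job status | missing_jobs
```

No output means all three jobs are listed. Any job name printed is one that still needs to be scheduled.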
Solution
In order to resolve this error, run the following command on the Terraform Enterprise instance.
$ docker exec -it ptfe_nomad /bin/bash -c 'for i in ${WORKERDIR}/*.job; do nomad run "${i}"; done'
This will manually reschedule the Nomad jobs required for Sentinel policy and Cost Estimation checks.
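For readers who want to see what the one-liner does, the loop can be written out as a function. This is only an illustrative sketch intended for a shell inside the ptfe_nomad container. The function name run_job_files and the NOMAD_CMD override are hypothetical, and NOMAD_CMD exists only so that the loop can be dry-run with echo before running it for real.

```shell
# Hypothetical expanded form of the one-liner: submits every *.job file
# found in the given directory to Nomad.
run_job_files() {
  dir=$1
  for f in "$dir"/*.job; do
    # Skip the unexpanded pattern if the directory holds no .job files.
    [ -e "$f" ] || continue
    # NOMAD_CMD is a hypothetical override; it defaults to the nomad binary.
    ${NOMAD_CMD:-nomad} run "$f"
  done
}

# Example usage inside the container:
#   NOMAD_CMD=echo run_job_files "$WORKERDIR"   # dry run, prints the commands' arguments
#   run_job_files "$WORKERDIR"                  # submits the jobs
```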
To verify the Nomad jobs are scheduled, run the following command.
$ docker exec -it ptfe_nomad nomad job status
Additional Information
As this is a known bug in Terraform Enterprise v202008-1, Terraform Enterprise must also be upgraded to a newer version to prevent this from occurring again.