Problem
After cancelling a run during the plan stage in Terraform Cloud (TFC), the Terraform Cloud Agent executing the run becomes unresponsive and eventually exits after receiving a 403 response from Terraform Cloud while attempting to update its status. The agent log for an affected run resembles the following:
2023-12-20T16:13:44.006Z [INFO] core: Job received: job_type=plan job_id=run-1G6r1bdHF3Li3259
2023-12-20T16:13:44.006Z [INFO] terraform: Handling run: run_id=run-1G6r1bdHF3Li3259 run_operation=plan organization_name=example-org workspace_name=example-workspace
2023-12-20T16:13:44.222Z [INFO] terraform: Extracting Terraform from release archive
2023-12-20T16:13:45.073Z [INFO] terraform: Terraform CLI details: version=1.6.5
2023-12-20T16:13:45.073Z [INFO] terraform: Downloading Terraform configuration
2023-12-20T16:13:45.102Z [INFO] terraform: Running terraform init
2023-12-20T16:13:45.920Z [INFO] terraform: Running terraform plan
2023-12-20T16:14:15.630Z [INFO] terraform: Received signal: signal=force-cancel
2023-12-20T16:14:46.650Z [INFO] terraform: Received signal: signal=force-cancel
2023-12-20T16:15:17.042Z [INFO] terraform: Received signal: signal=force-cancel
2023-12-20T16:15:49.066Z [INFO] terraform: Received signal: signal=force-cancel
2023-12-20T16:16:20.436Z [INFO] terraform: Received signal: signal=force-cancel
2023-12-20T16:16:51.946Z [INFO] terraform: Received signal: signal=force-cancel
2023-12-20T16:17:23.555Z [INFO] terraform: Received signal: signal=force-cancel
2023-12-20T16:17:56.725Z [INFO] terraform: Received signal: signal=force-cancel
2023-12-20T16:18:27.502Z [INFO] terraform: Received signal: signal=force-cancel
2023-12-20T16:18:27.502Z [WARN] terraform: Signal channel is full, discarding force cancel signal
2023-12-20T16:18:57.740Z [INFO] terraform: Received signal: signal=force-cancel
2023-12-20T16:18:57.740Z [WARN] terraform: Signal channel is full, discarding force cancel signal
2023-12-20T16:19:28.579Z [INFO] terraform: Received signal: signal=force-cancel
2023-12-20T16:19:28.579Z [WARN] terraform: Signal channel is full, discarding force cancel signal
2023-12-20T16:20:01.080Z [INFO] terraform: Received signal: signal=force-cancel
2023-12-20T16:20:01.080Z [WARN] terraform: Signal channel is full, discarding force cancel signal
2023-12-20T16:20:33.841Z [INFO] terraform: Received signal: signal=force-cancel
2023-12-20T16:20:33.841Z [WARN] terraform: Signal channel is full, discarding force cancel signal
2023-12-20T16:21:06.477Z [INFO] terraform: Received signal: signal=force-cancel
2023-12-20T16:21:06.478Z [WARN] terraform: Signal channel is full, discarding force cancel signal
2023-12-20T16:21:39.080Z [INFO] terraform: Received signal: signal=force-cancel
2023-12-20T16:21:39.080Z [WARN] terraform: Signal channel is full, discarding force cancel signal
2023-12-20T16:22:09.186Z [INFO] terraform: Received signal: signal=force-cancel
2023-12-20T16:22:09.186Z [WARN] terraform: Signal channel is full, discarding force cancel signal
2023-12-20T16:22:18.135Z [INFO] terraform: Generating and uploading plan JSON
2023-12-20T16:22:18.222Z [INFO] terraform: Finished force canceling run
2023-12-20T16:22:18.269Z [ERROR] core: Unexpected HTTP response code: method=PUT url=https://app.terraform.io/api/agent/status status=403
2023-12-20T16:22:18.269Z [ERROR] core: Failed updating status: error="PUT https://app.terraform.io/api/agent/status: unexpected status code (403 Forbidden): The current agent process failed to report to Terraform Cloud for 9 minutes and has been marked as errored"
2023-12-20T16:22:18.270Z [INFO] core: Waiting for next job
2023-12-20T16:22:18.309Z [ERROR] core: Unexpected HTTP response code: method=GET url=https://app.terraform.io/api/agent/jobs status=403
2023-12-20T16:22:18.310Z [ERROR] agent: Unrecoverable error, shutting down: error="GET https://app.terraform.io/api/agent/jobs: unexpected status code (403 Forbidden): The current agent process failed to report to Terraform Cloud for 9 minutes and has been marked as errored"
2023-12-20T16:22:18.310Z [INFO] agent: Shutting down
2023-12-20T16:22:18.310Z [INFO] agent: Core plugin is shutting down
2023-12-20T16:22:18.353Z [ERROR] core: Unexpected HTTP response code: method=PUT url=https://app.terraform.io/api/agent/status status=403
2023-12-20T16:22:18.353Z [ERROR] core: Failed updating status: error="PUT https://app.terraform.io/api/agent/status: unexpected status code (403 Forbidden): The current agent process failed to report to Terraform Cloud for 9 minutes and has been marked as errored"
2023-12-20T16:22:18.353Z [INFO] core: Shutdown complete
Graceful shutdown complete
This can manifest as the following user-facing symptoms:
- In TFC organizations with a limited number of available agents, the agent remains unavailable for new jobs after the run it is currently executing is cancelled, potentially leading to a backup in the run pipeline.
- The TFC agent unexpectedly exits with the 403 error above, having been marked as errored by Terraform Cloud (impactful if the agent is expected to be long-running and is not running in single mode).
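To check whether a saved agent log matches the pattern shown above, the following minimal sketch counts the relevant messages. The function names and the matching rule (more than one force-cancel signal plus at least one "marked as errored" response) are illustrative assumptions, not part of the Terraform Cloud Agent; only the matched message strings come from the log excerpt above.

```python
# Message fragments copied from the agent log excerpt above.
FORCE_CANCEL = "Received signal: signal=force-cancel"
CHANNEL_FULL = "Signal channel is full, discarding force cancel signal"
MARKED_ERRORED = "has been marked as errored"


def summarize_agent_log(lines):
    """Count occurrences of the messages associated with this issue."""
    counts = {"force_cancel": 0, "channel_full": 0, "marked_errored": 0}
    for line in lines:
        if FORCE_CANCEL in line:
            counts["force_cancel"] += 1
        if CHANNEL_FULL in line:
            counts["channel_full"] += 1
        if MARKED_ERRORED in line:
            counts["marked_errored"] += 1
    return counts


def matches_symptom(lines):
    """Heuristic (an assumption): repeated force-cancel signals followed by
    the agent being marked as errored suggest this issue."""
    counts = summarize_agent_log(lines)
    return counts["force_cancel"] > 1 and counts["marked_errored"] > 0
```

For example, `matches_symptom(open("agent.log"))` could be run against a captured log file, where `agent.log` is a hypothetical file name. A match is only an indication; confirm the agent version against the Prerequisites section as well.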
Prerequisites
- The impacted workspace is configured with the agent execution mode
- Terraform Cloud Agent versions 1.10.0 through 1.14.1
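The version range above can be checked programmatically. The sketch below assumes the agent version is available as a plain "MAJOR.MINOR.PATCH" string, obtained however your deployment exposes it; the helper names are hypothetical.

```python
# Affected range taken from the prerequisite above (inclusive on both ends).
FIRST_AFFECTED = (1, 10, 0)
LAST_AFFECTED = (1, 14, 1)


def parse_version(version):
    """Turn a string such as "1.12.3" into a tuple of integers, (1, 12, 3)."""
    return tuple(int(part) for part in version.strip().split("."))


def is_affected(version):
    """Return True if the agent version falls within the affected range."""
    return FIRST_AFFECTED <= parse_version(version) <= LAST_AFFECTED
```

Comparing integer tuples avoids the string-comparison pitfall in which "1.9.0" would sort after "1.10.0".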
Cause
This is caused by a bug in Terraform Cloud Agent versions 1.10.0 through 1.14.1 in which interrupt signals do not reach the terraform plan process executed by the agent. The bug is typically noticeable in runs with a longer plan stage. After cancellation, the agent is unavailable for new runs and, if it is not configured to be restarted automatically, exits unexpectedly.
Solution
As a long-term solution, upgrade the Terraform Cloud Agent to version 1.14.2, which includes a fix for this bug. If an upgrade is not immediately feasible, use one or both of the following workarounds, depending on the impact.
TFC Agent is unavailable to take new runs
Manually stop and restart the tfc-agent process. Depending on the deployment method, restart the container (Docker), delete and recreate the pod (Kubernetes), or restart the Systemd service.
TFC Agent unexpectedly exits after receiving 403 from Terraform Cloud
Configure the container platform or process manager to restart the Terraform Cloud Agent automatically, using the option below that matches the deployment method.
- Docker: start the TFC Agent container with the --restart flag (for example, --restart=always or --restart=on-failure).
- Kubernetes: configure the TFC Agent pod's restart policy to be Always or OnFailure.
- Systemd: set the Restart directive to always or on-failure in the TFC Agent's Systemd unit.
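As an illustration of what each option in the list above looks like in practice, the following sketch renders the restart setting for a given deployment method. The helper and its names are hypothetical. The rendered strings use the standard syntax of each platform: the --restart flag of docker run, the restartPolicy field of a Kubernetes pod spec, and the Restart directive in the [Service] section of a Systemd unit.

```python
# Restart policies for each deployment method. The Kubernetes and Systemd
# values come from the list above; the Docker values are the matching
# standard --restart policies (an assumption, since the article names only
# "the restart flag").
ALLOWED_POLICIES = {
    "docker": {"always", "on-failure"},
    "kubernetes": {"Always", "OnFailure"},
    "systemd": {"always", "on-failure"},
}


def restart_setting(method, policy):
    """Return the restart setting text for a deployment method."""
    if policy not in ALLOWED_POLICIES.get(method, ()):
        raise ValueError(f"unsupported method/policy: {method!r}, {policy!r}")
    if method == "docker":
        return f"--restart={policy}"        # flag passed to `docker run`
    if method == "kubernetes":
        return f"restartPolicy: {policy}"   # field under the pod's spec
    return f"[Service]\nRestart={policy}"   # Systemd unit or drop-in file
```

For example, `restart_setting("systemd", "on-failure")` yields text suitable for a drop-in file for the agent's unit. Where that file lives and what the unit is called depend on how the agent was installed.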
Additional Information
If you continue to experience issues, please contact HashiCorp Support.