Problem
After you cancel a run during the plan stage in HCP Terraform, the HCP Terraform Agent executing the run becomes unresponsive. The agent eventually exits after receiving a 403 Forbidden response from HCP Terraform while attempting to update its status.
This issue can cause the following symptoms:
- In HCP Terraform organizations with a limited number of available agents, the affected agent remains unavailable for new jobs after the run is canceled, which can cause a backup in the run pipeline.
- The HCP Terraform agent unexpectedly exits after being marked as errored by HCP Terraform. This is most impactful if the agent is expected to be long-running and is not running in single-use mode.
The agent logs may show repeated force-cancel signals followed by 403 errors before shutting down:

```
[INFO] core: Job received: job_type=plan job_id=run-1G6r1bdHF3Li3259
[INFO] terraform: Handling run: run_id=run-1G6r1bdHF3Li3259 run_operation=plan organization_name=example-org workspace_name=example-workspace
[INFO] terraform: Extracting Terraform from release archive
[INFO] terraform: Terraform CLI details: version=1.6.5
[INFO] terraform: Downloading Terraform configuration
[INFO] terraform: Running terraform init
[INFO] terraform: Running terraform plan
[INFO] terraform: Received signal: signal=force-cancel
[INFO] terraform: Received signal: signal=force-cancel
## ... (repeated force-cancel signals)
[WARN] terraform: Signal channel is full, discarding force cancel signal
[INFO] terraform: Generating and uploading plan JSON
[INFO] terraform: Finished force canceling run
[ERROR] core: Unexpected HTTP response code: method=PUT url=https://app.terraform.io/api/agent/status status=403
[ERROR] core: Failed updating status: error="PUT https://app.terraform.io/api/agent/status: unexpected status code (403 Forbidden): The current agent process failed to report to Terraform Cloud for 9 minutes and has been marked as errored"
[INFO] core: Waiting for next job
[ERROR] core: Unexpected HTTP response code: method=GET url=https://app.terraform.io/api/agent/jobs status=403
[ERROR] agent: Unrecoverable error, shutting down: error="GET https://app.terraform.io/api/agent/jobs: unexpected status code (403 Forbidden): The current agent process failed to report to Terraform Cloud for 9 minutes and has been marked as errored"
[INFO] agent: Shutting down
[INFO] agent: Core plugin is shutting down
[ERROR] core: Unexpected HTTP response code: method=PUT url=https://app.terraform.io/api/agent/status status=403
[ERROR] core: Failed updating status: error="PUT https://app.terraform.io/api/agent/status: unexpected status code (403 Forbidden): The current agent process failed to report to Terraform Cloud for 9 minutes and has been marked as errored"
[INFO] core: Shutdown complete
Graceful shutdown complete
```
Prerequisites
- The impacted workspace is configured with agent execution mode.
- You are using HCP Terraform Agent versions 1.10.0 through 1.14.1.
Cause
A bug in HCP Terraform Agent versions 1.10.0 through 1.14.1 prevents interrupt signals from reaching the terraform plan process. This issue is most noticeable in runs with a long plan stage. When a run is canceled, the agent fails to terminate the plan process, becomes unresponsive, and is eventually marked as errored by HCP Terraform.
Solutions
A permanent solution and two workarounds are available, depending on your operational needs.
Solution 1: Upgrade the HCP Terraform Agent
As the recommended long-term solution, upgrade the HCP Terraform Agent to version 1.14.2 or newer. This version includes a fix for this bug.
Solution 2: Manually Restart the Agent (Workaround)
If an immediate upgrade is not possible and an agent is unavailable for new runs, you can manually stop and restart the tfc-agent process. The method depends on your deployment environment:
- Docker: Use the `docker restart` command for the container.
- Kubernetes: Delete and recreate the pod.
- Systemd: Restart the `tfc-agent` service.
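For example, the following commands restart the agent in each environment. The container name, pod name, and service name shown here are placeholders; substitute the names used in your deployment.

```shell
# Docker: restart the agent container (container name is a placeholder)
docker restart tfc-agent

# Kubernetes: delete the pod; its controller (for example, a Deployment)
# recreates it automatically (pod and namespace names are placeholders)
kubectl delete pod tfc-agent-0 --namespace tfc-agent

# Systemd: restart the service (service name is a placeholder)
sudo systemctl restart tfc-agent
```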
Solution 3: Configure Automatic Restarts (Workaround)
To prevent downtime from an agent unexpectedly exiting, configure your container platform or process manager to restart the agent automatically.
- Docker: Start the agent container with the `--restart` flag set to `always` or `on-failure`.
- Kubernetes: Configure the agent pod's restart policy to be `Always` or `OnFailure`.
- Systemd: Set the `Restart` directive to `always` or `on-failure` in the agent's systemd unit file.
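As an illustration, a minimal systemd unit for the agent might look like the following sketch. The binary path is a placeholder, and token configuration is omitted; adapt both to your environment.

```ini
[Unit]
Description=HCP Terraform Agent
After=network-online.target

[Service]
# Placeholder path; point this at your tfc-agent binary
ExecStart=/usr/local/bin/tfc-agent
# Restart the agent automatically if it exits with an error
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```

The Docker equivalent is starting the container with `docker run --restart=on-failure ...` (or `--restart=always`). In Kubernetes, pods managed by a Deployment already use `restartPolicy: Always` by default.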