Problem
Terraform Enterprise (TFE) runs intermittently fail with a timeout during plan filesystem uploads for certain workspaces. A typical error message appears as follows.
Operation failed: failed uploading filesystem: request failed: PUT https://<hostname>/_archivist/v1/object/...: http2: timeout awaiting response headers
Prerequisites
- Terraform Enterprise is installed and operational.
- Affected workspaces have completed previous runs successfully.
- The network path includes firewalls and load balancers between agents and the TFE application.
- You have administrative access to TFE logs and the ability to run commands inside the TFE container.
Cause
The most likely cause is a transient network interruption at the TCP/IP level, typically introduced by an external load balancer or firewall in the path between the agent and the TFE application. The upload to object storage succeeds, but the interruption prevents the response from reaching the agent, which times out waiting for the response headers and closes the connection.
Solutions
Solution 1: Perform Manual Diagnostics
These steps help you manually reproduce and diagnose the issue from within the TFE application container.
- Retrieve the plan filesystem URLs using the Rails console. Replace <run_external_id> with the ID of the failed run.
Run.find_by(external_id: "<run_external_id>").plan.executable_plan.download_url.to_s
Run.find_by(external_id: "<run_external_id>").plan.executable_plan.upload_url.to_s
- Simulate the download and upload operations using curl to test the connection.
First, download the plan archive using the generated download URL.
$ curl -v -o /tmp/test "https://<hostname>/_archivist<DOWNLOAD_URI>"
Next, attempt to upload the archive using the generated upload URL.
$ curl -v \
    -X PUT \
    -H "Content-Type: application/octet-stream" \
    --data-binary '@/tmp/test' \
    "https://<hostname>/_archivist<UPLOAD_URI>"
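Because the failure is intermittent, it can help to record how long the upload takes to return its response. The following is a minimal sketch, not part of TFE: the function name timed_put, the MAX_TIME variable, and the 120-second default are illustrative assumptions. It relies only on standard curl options (--max-time and the -w write-out variables).

```shell
#!/usr/bin/env bash
# Hypothetical helper: PUT a file and report the status code and timings.
# Usage: timed_put /tmp/test "https://<hostname>/_archivist<UPLOAD_URI>"
timed_put() {
  local file="$1" url="$2"
  # -s hides the progress meter, -o discards the response body,
  # -w prints transfer metrics once the transfer completes.
  # --max-time caps the whole request so a stalled response fails instead of hanging.
  # The 120-second default is an arbitrary illustration, not a TFE setting.
  curl -s -o /dev/null \
    --max-time "${MAX_TIME:-120}" \
    -X PUT \
    -H "Content-Type: application/octet-stream" \
    --data-binary "@${file}" \
    -w 'http_code=%{http_code} time_starttransfer=%{time_starttransfer}s time_total=%{time_total}s\n' \
    "$url"
}
```

A large gap between the end of the upload and time_starttransfer, or a curl exit status of 28 (operation timed out), would be consistent with the response being lost or delayed in the network path.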
- Confirm the plan file size is reasonable and not excessively large.
$ ls -lh /tmp/test
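The size check can also be scripted so that it produces a clear pass or warn result. This is a sketch under stated assumptions: the function name check_archive_size is hypothetical, and the 100 MiB default threshold is an arbitrary example rather than a documented TFE limit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: warn when an archive exceeds a chosen size threshold.
# Usage: check_archive_size /tmp/test [limit_in_MiB]
check_archive_size() {
  local file="$1" limit_mib="${2:-100}"   # 100 MiB is illustrative, not a TFE limit
  local bytes
  bytes=$(wc -c < "$file") || return 2    # return 2 if the file cannot be read
  bytes=$(( bytes ))                      # normalize any padding in wc output
  local limit_bytes=$(( limit_mib * 1024 * 1024 ))
  if [ "$bytes" -gt "$limit_bytes" ]; then
    echo "WARN: $file is $bytes bytes, above ${limit_mib} MiB"
    return 1
  fi
  echo "OK: $file is $bytes bytes"
}
```

For example, check_archive_size /tmp/test 50 returns a non-zero status when the downloaded archive is larger than 50 MiB.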
Solution 2: Enable Hairpin Addressing (Long-Term Resolution)
To prevent transient network issues with external devices, configure hairpin addressing. This routes internal requests to the TFE hostname directly to the node’s private IP, keeping traffic internal and bypassing external network devices.
- Update the pod specification with hostAliases.
- Set the TFE_RUN_PIPELINE_DOCKER_EXTRA_HOSTS environment variable.
- Ensure the TFE certificate or its root Certificate Authority (CA) is trusted by the system and any custom agent images.
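As an illustration of the first two settings, the sketch below prints a hostAliases fragment and an environment variable assignment for a given hostname and private IP. The function names and the example values are hypothetical. The hostAliases layout follows the standard Kubernetes pod specification; the hostname:ip format for TFE_RUN_PIPELINE_DOCKER_EXTRA_HOSTS is an assumption that should be confirmed against the TFE configuration reference for your version.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that print the two hairpin settings for a hostname and private IP.

# Kubernetes pod spec fragment; hostAliases is a standard PodSpec field.
print_host_aliases() {
  local hostname="$1" ip="$2"
  cat <<EOF
hostAliases:
  - ip: "${ip}"
    hostnames:
      - "${hostname}"
EOF
}

# Environment variable assignment, ASSUMING the value format is hostname:ip.
print_extra_hosts() {
  local hostname="$1" ip="$2"
  echo "TFE_RUN_PIPELINE_DOCKER_EXTRA_HOSTS=${hostname}:${ip}"
}
```

For example, print_host_aliases tfe.example.com 10.0.0.5 prints a fragment that can be merged into the pod specification, and print_extra_hosts tfe.example.com 10.0.0.5 prints the matching variable assignment. Both values are placeholders for the TFE hostname and the node's private IP.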
Outcome
After you configure hairpin addressing and ensure network stability, the following outcomes occur:
- Affected workspaces resume successful runs.
- New plan and apply actions complete without timeout issues.
- The error does not reoccur, confirming the root cause was transient external network interruptions.
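One way to confirm that hairpin addressing is in effect is to check that the TFE hostname maps to the private IP in the hosts file inside the relevant container, since Kubernetes hostAliases and Docker extra hosts both work by adding entries there. The function name hosts_entry_ip below is hypothetical, and this sketch only inspects a hosts file; it does not test connectivity.

```shell
#!/usr/bin/env bash
# Hypothetical check: print the IP that a hosts file maps to a given hostname.
# Usage: hosts_entry_ip tfe.example.com [/etc/hosts]
hosts_entry_ip() {
  local hostname="$1" hosts_file="${2:-/etc/hosts}"
  awk -v h="$hostname" '
    /^[[:space:]]*#/ { next }              # skip full-line comments
    {
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^#/) break               # stop at a trailing comment
        if ($i == h) { print $1; found = 1; exit }
      }
    }
    END { exit (found ? 0 : 1) }           # non-zero status when no entry matches
  ' "$hosts_file"
}
```

Run inside the container, hosts_entry_ip <hostname> should print the node's private IP. A non-zero exit status means no matching entry was found.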