Introduction
Problem
After upgrading Terraform Enterprise to v202310-1 or later, runs remain queued indefinitely.
The archivist logs show errors about the TTL exceeding the maximum:
2024-04-07T12:02:05.202509000Z {"log":"{\"@level\":\"error\",\"@message\":\"TTL not within required range\",\"@module\":\"archivist.server.http.create-object\",\"@timestamp\":\"2024-04-07T12:02:04.423012Z\",\"err\":\"TTL missing or exceeds the maximum value\",\"req.amazon_trace_id\":\"-\",\"req.callback\":\"\",\"req.filename\":\"\",\"req.id\":\"-\",\"req.key\":\"terraform/json-plan/07de7cb6/asmt-8dCeApMSBtFAksLL\",\"req.max_upload_bytes\":0,\"req.mode\":\"w\",\"req.stream\":false,\"req.ttl\":\"48h1h\"}","component":"archivist"}
The task-worker logs show an "Unexpected HTTP response code" error:
2024-04-07T11:07:33.201224000Z {"log":"{\"@level\":\"info\",\"@message\":\"2024-04-07T11:07:33.114Z [ERROR] core: Unexpected HTTP response code: method=GET url=https://<TFE hostname>/api/agent/jobs status=500\",\"@module\":\"task-worker.executor.task-output\",\"@timestamp\":\"2024-04-07T11:07:33.114647Z\",\"id\":\"a0a6f6af-fea2-48ca-9dfb-8fade6db35ea\",\"name\":\"agent-run\",\"stream\":\"stdout\"}","component":"task-worker"}
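To confirm that an instance is hitting this issue, the archivist error can be searched for in an exported log file. The following is a minimal Python sketch, not an official tool. It assumes each log line has the shape shown above: a timestamp, a space, and a JSON object whose "log" field contains a second JSON document. The function name find_ttl_errors is hypothetical.

```python
import json

# Message text taken from the archivist log entry shown above.
TTL_ERROR_MESSAGE = "TTL not within required range"


def find_ttl_errors(lines):
    """Return (timestamp, object key, ttl) for each archivist TTL error found."""
    hits = []
    for line in lines:
        # Assumed line shape: "<timestamp> <json>"; drop the leading timestamp.
        _, _, payload = line.strip().partition(" ")
        try:
            outer = json.loads(payload)
            # The "log" field holds a JSON document encoded as a string.
            inner = json.loads(outer["log"])
        except (ValueError, KeyError, TypeError):
            continue  # Skip lines that do not match the assumed shape.
        if not isinstance(inner, dict):
            continue
        if (outer.get("component") == "archivist"
                and inner.get("@message") == TTL_ERROR_MESSAGE):
            hits.append((inner.get("@timestamp"),
                         inner.get("req.key"),
                         inner.get("req.ttl")))
    return hits
```

Passing the lines of a log export to find_ttl_errors, for example an open file object, returns one tuple per matching entry. A req.ttl value well above 24h in the results is consistent with the cause described below.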
Cause
The error messages shown above are related to a known issue in Terraform Enterprise (TFE) v202310-1 and later. The issue involves the "Plan/Apply Run Timeout" setting in the TFE site admin settings. When this setting is configured to a value above 24 hours, Terraform runs can hang in the queue and never complete.
Solution
In the TFE admin settings at https://<TFE hostname>/app/admin/settings, under Terraform Timeouts, change the plan and apply timeouts to a value of 24h or less.