Introduction
This article addresses an issue in Terraform Enterprise where runs are queued and waiting for execution, while agents in the associated agent pool are reported as idle.
Problem
In a Terraform Enterprise environment with a large number of agents in an agent pool, you may observe that runs are slow to be picked up from the queue. This results in a significant delay before the number of concurrent running workspaces approaches the total number of available agents.
For example, with an agent pool of 120 agents, a single VCS change triggers 120 workspace runs. The execution progress may appear as follows.
- After 30 seconds: 23 running, 10 on-hold, 0 completed
- After 60 seconds: 49 running, 25 on-hold, 0 completed
- After 90 seconds: 66 running, 39 on-hold, 0 completed
- After 120 seconds: 86 running, 34 on-hold, 0 completed
- After 150 seconds: 106 running, 13 on-hold, 1 completed
- After 180 seconds: 108 running, 5 on-hold, 7 completed
- After 210 seconds: 93 running, 0 on-hold, 27 completed
The number of simultaneous runs ramps up slowly. In this example, concurrency peaks at 108 running workspaces after 180 seconds and never reaches the pool's full capacity of 120 agents.
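To make the ramp-up easier to read, the following minimal Python sketch converts the example figures into agent pool utilization (running workspaces divided by the 120 available agents). The numbers are copied from the example in this article. The names `observations` and `utilization` are illustrative only and are not part of any Terraform Enterprise tooling.

```python
AGENTS = 120  # size of the agent pool in the example

# (elapsed seconds, running, on-hold, completed), copied from the example
observations = [
    (30, 23, 10, 0),
    (60, 49, 25, 0),
    (90, 66, 39, 0),
    (120, 86, 34, 0),
    (150, 106, 13, 1),
    (180, 108, 5, 7),
    (210, 93, 0, 27),
]


def utilization(running, agents=AGENTS):
    """Fraction of the agent pool that is executing a run."""
    return running / agents


# Find the sample with the highest number of running workspaces.
peak_t, peak_running = max(
    ((t, running) for t, running, _, _ in observations),
    key=lambda sample: sample[1],
)

for t, running, on_hold, completed in observations:
    print(f"{t:>4}s  running={running:>3}  utilization={utilization(running):.0%}")

print(f"peak: {peak_running} running at {peak_t}s")
```

Even at its peak, three minutes after the runs were triggered, the pool in this example has not reached full utilization.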
Cause
By default, agents in an agent pool check for new runs on the queue every 30 seconds. When many agents check simultaneously, they may attempt to dequeue the same run. If one agent successfully claims the run, the other agents' attempts will fail. An agent that fails to dequeue a run will wait for another 30-second polling interval before retrying.
With a large number of agents, the probability of these dequeue conflicts increases, causing many agents to enter a 30-second wait state and leading to the observed delay in processing the run queue.
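The effect of the retry wait can be illustrated with a toy model. The sketch below is not how Terraform Enterprise implements its queue. It assumes that each agent starts polling at a random point within the polling interval, that each poll wins a run with a fixed probability `p_success` (a hypothetical value standing in for the dequeue conflict rate), and that a failed poll costs one full polling interval. The function names are illustrative.

```python
import random


def claim_times(num_agents, polling_interval_s, p_success, seed=0):
    """Return the sorted times (in seconds) at which each agent claims a run.

    Toy model: each agent's first poll happens at a random phase within the
    interval. A poll succeeds with probability p_success; a failed poll
    (a lost dequeue race) delays the agent by one full polling interval.
    """
    if not 0 < p_success <= 1:
        raise ValueError("p_success must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    times = []
    for _ in range(num_agents):
        t = rng.random() * polling_interval_s  # first poll, random phase
        while rng.random() >= p_success:       # lost the dequeue race
            t += polling_interval_s            # wait one interval, then retry
        times.append(t)
    return sorted(times)


def claimed_by(times, t):
    """Number of agents that have claimed a run by time t."""
    return sum(1 for claim_time in times if claim_time <= t)


if __name__ == "__main__":
    for interval in (30, 5):
        times = claim_times(120, interval, p_success=0.3)
        ramp = [claimed_by(times, t) for t in (30, 60, 90, 120)]
        print(
            f"interval={interval:>2}s  claimed at 30/60/90/120s: {ramp}  "
            f"last claim: {times[-1]:.0f}s"
        )
```

In this model, the time until the last agent claims a run is proportional to the polling interval, because every failed attempt adds one full interval to that agent's wait. The real conflict rate depends on the number of agents and on timing, so treat the output as an illustration of the mechanism rather than a prediction.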
Solutions
Solution 1: Decrease the Agent Polling Interval
To reduce the delay when multiple agents compete for runs, you can decrease the agent polling interval. This allows agents that fail to dequeue a run to retry more quickly.
- Log in to your Terraform Enterprise instance.
- Navigate to Admin > Settings.
- Locate the HCP Terraform Agents section and change the Polling Interval to a lower value, such as 5 seconds.
- Select Save settings.
Outcome
With a lower polling interval, agents that fail to dequeue a run will retry sooner. This increases the rate at which runs are processed, allowing the number of concurrent running agents to more quickly approach the maximum capacity of the agent pool.
Additional Information
For more details on agent configuration, please refer to the HCP Terraform Agents settings documentation.