Introduction
Problem
Runs are waiting in the queue to be executed while, at the same time, agents in the agent pool are shown as IDLE.
Example
An agent pool has 120 agents. In Terraform Enterprise, 120 workspaces are connected through VCS to a single monorepo. A change to the repository triggers a run in all 120 workspaces, and each run applies the change. Each run should take around 3 to 4 minutes to complete.
Viewing the runs shows something like the following:
after 30 seconds: 23 running - 10 on-hold - 0 completed
after 60 seconds: 49 running - 25 on-hold - 0 completed
after 90 seconds: 66 running - 39 on-hold - 0 completed
after 120 seconds: 86 running - 34 on-hold - 0 completed
after 150 seconds: 106 running - 13 on-hold - 1 completed
after 180 seconds: 108 running - 5 on-hold - 7 completed
after 210 seconds: 93 running - 0 on-hold - 27 completed
It takes a long time before the number of simultaneous runs gets close to 100, and it never reaches 120 within a minute (in the observations above, it peaks at 108).
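To make the symptom concrete, the observations above can be tabulated. The short Python sketch below copies the numbers from the example and, assuming each running run occupies exactly one agent (so idle agents = 120 minus running), puts idle agents next to on-hold runs. The names `OBSERVATIONS`, `POOL_SIZE` and `summarize` are illustrative only.

```python
# Observations copied from the example: (seconds, running, on_hold, completed).
OBSERVATIONS = [
    (30, 23, 10, 0),
    (60, 49, 25, 0),
    (90, 66, 39, 0),
    (120, 86, 34, 0),
    (150, 106, 13, 1),
    (180, 108, 5, 7),
    (210, 93, 0, 27),
]

POOL_SIZE = 120  # agents in the pool


def summarize(observations, pool_size):
    """Return one row per observation with idle agents and on-hold runs side by side.

    Assumption: each running run occupies exactly one agent.
    """
    rows = []
    for seconds, running, on_hold, completed in observations:
        idle_agents = pool_size - running
        rows.append({
            "seconds": seconds,
            "runs_seen": running + on_hold + completed,
            "idle_agents": idle_agents,
            "on_hold": on_hold,
            # On-hold runs that an idle agent could have taken at that moment.
            "waiting_despite_idle": min(on_hold, idle_agents),
            "utilization": running / pool_size,
        })
    return rows


if __name__ == "__main__":
    for row in summarize(OBSERVATIONS, POOL_SIZE):
        print(
            f"{row['seconds']:>4}s  runs seen={row['runs_seen']:>3}  "
            f"idle agents={row['idle_agents']:>3}  on-hold={row['on_hold']:>3}  "
            f"utilization={row['utilization']:.0%}"
        )
```

With these numbers, at 120 seconds all 120 runs have been created (86 + 34 + 0), yet 34 runs are on hold while 34 agents are idle, which is the situation described under Problem. Running runs peak at 108 of 120 (90%) at 180 seconds.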
Cause
By default, each agent in an agent pool checks every 30 seconds whether there is a run in the queue for it to pick up. The 120 agents do not all check at the same moment. When an agent finds a queued run, it tries to dequeue the run in order to execute it. This can fail because another agent tried to dequeue the same run at the same time. The agent that failed to dequeue the run then waits another 30 seconds before it checks again.
The more agents a pool has, the higher the chance that agents fail to dequeue a run, and the more the 30-second wait after each failure adds up.
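Following the behaviour described above, where an agent waits one full polling interval after every failed dequeue, the time an unlucky agent loses grows linearly with the number of consecutive failures. A minimal Python sketch of that arithmetic (the function name `retry_delay` is illustrative, and the model ignores any other delays):

```python
def retry_delay(failed_attempts: int, polling_interval: int) -> int:
    """Seconds an agent spends waiting after `failed_attempts` failed dequeues in a row.

    Simplified model: after each failed dequeue the agent waits one full
    polling interval before it checks the queue again.
    """
    if failed_attempts < 0 or polling_interval <= 0:
        raise ValueError("failed_attempts must be >= 0 and polling_interval > 0")
    return failed_attempts * polling_interval


if __name__ == "__main__":
    for interval in (30, 5):
        delays = [retry_delay(n, interval) for n in range(1, 4)]
        print(f"interval={interval:>2}s  delay after 1, 2, 3 failures: {delays}")
```

With the default 30 seconds, three failed attempts in a row cost 90 seconds of waiting; with 5 seconds they cost 15.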
Solutions
In Terraform Enterprise, the interval at which agents check for queued runs can be lowered from its default of 30 seconds. With a shorter interval, an agent that failed to dequeue a run checks for a new run sooner.
To change the polling interval:
- Log in to Terraform Enterprise
- Admin settings --> Settings
- Set the polling interval to 5 seconds
- Save settings
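The effect of the new value can be estimated with the same kind of arithmetic. Assuming each idle agent checks the queue once per polling interval (a simplification, not documented behaviour), lowering the interval from 30 to 5 seconds gives every idle agent 12 chances per minute to pick up a run instead of 2, and it also means Terraform Enterprise receives six times as many queue checks. `checks_per_minute` is a hypothetical helper for this estimate.

```python
SECONDS_PER_MINUTE = 60


def checks_per_minute(idle_agents: int, polling_interval: int) -> float:
    """Estimated queue checks per minute for a group of idle agents.

    Assumption: each idle agent checks the queue exactly once per polling interval.
    """
    if idle_agents < 0 or polling_interval <= 0:
        raise ValueError("idle_agents must be >= 0 and polling_interval > 0")
    return idle_agents * SECONDS_PER_MINUTE / polling_interval


if __name__ == "__main__":
    for interval in (30, 5):
        per_agent = checks_per_minute(1, interval)
        pool = checks_per_minute(120, interval)
        print(
            f"interval={interval:>2}s: {per_agent:.0f} checks/agent/min, "
            f"{pool:.0f} checks/min for 120 idle agents"
        )
```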
Outcome
With a lower polling interval, agents that fail to dequeue a run retry sooner. As a result, more runs execute at the same time, and the number of busy agents gets closer to the number of agents in the pool.
Additional Information
- Documentation about agent settings in Terraform Enterprise can be found here