Problem
In Terraform Enterprise, runs that use the internal agent pool and are queued for more than 10 minutes fail without a specific error message or log output.
Prerequisites
- Terraform Enterprise versions v202310-1 through v202312-1.
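To check a specific release against this range, you can use a small comparison function. The sketch below is a hypothetical helper, not a HashiCorp tool. It assumes release strings follow the vYYYYMM-N pattern used above and takes the range bounds directly from this article.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a Terraform Enterprise release string against
# the affected range stated in this article (v202310-1 through v202312-1).
# Assumes releases are named vYYYYMM-N, e.g. v202311-1.

# Turn "vYYYYMM-N" into a sortable integer (YYYYMM * 1000 + N).
tfe_release_key() {
  if [[ ! "$1" =~ ^v([0-9]{6})-([0-9]+)$ ]]; then
    echo "unrecognized release string: $1" >&2
    return 1
  fi
  # 10# forces base-10 so a part such as "08" is not read as octal.
  echo $(( 10#${BASH_REMATCH[1]} * 1000 + 10#${BASH_REMATCH[2]} ))
}

# Print "affected" or "not-affected"; return non-zero on malformed input.
tfe_release_status() {
  local key first last
  key=$(tfe_release_key "$1") || return 1
  first=$(tfe_release_key "v202310-1")
  last=$(tfe_release_key "v202312-1")
  if (( key >= first && key <= last )); then
    echo "affected"
  else
    echo "not-affected"
  fi
}
```

After sourcing the functions, pass your instance's release string:
$ tfe_release_status v202311-1
This prints "affected". A release such as v202401-1 prints "not-affected".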
Cause
A bug introduced in Terraform Enterprise v202310-1 causes any run queued for longer than 10 minutes to be considered stuck and forcibly terminated. Because the termination occurs before the run is de-queued, no logs are produced.
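To make the 10-minute cutoff concrete, the following sketch compares two ISO-8601 timestamps against a 600-second limit. It assumes GNU date (for `date -d`, which macOS date lacks). The helper names are hypothetical, and this sketch does not cover where you obtain a run's queued-at time.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: flag a run whose time in the queue exceeds the
# 10-minute cutoff described above. Requires GNU date for `date -d`.

QUEUE_LIMIT_SECONDS=600   # 10 minutes

# Print the number of seconds between two ISO-8601 timestamps.
seconds_between() {
  local start end
  start=$(date -u -d "$1" +%s) || return 1
  end=$(date -u -d "$2" +%s) || return 1
  echo $(( end - start ))
}

# Succeed (exit 0) if a run queued at $1 has waited longer than the limit by $2.
# Return 2 if either timestamp cannot be parsed.
exceeds_queue_limit() {
  local waited
  waited=$(seconds_between "$1" "$2") || return 2
  (( waited > QUEUE_LIMIT_SECONDS ))
}
```

For example, to test a queued-at time against the current time:
$ exceeds_queue_limit "2024-01-15T10:00:00Z" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" && echo "queued longer than 10 minutes"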
Solutions
Solution 1: Upgrade Terraform Enterprise
The recommended solution is to upgrade your Terraform Enterprise instance to version v202401-1 or later. This version contains the fix for the bug.
Solution 2: Use an External Agent Pool (Workaround)
If you cannot upgrade immediately, you can work around the bug by routing runs to an external agent pool instead of the internal one. In this workaround, the agents run as Docker containers on the Terraform Enterprise host machine itself.
Procedure
- Create a new agent pool in your organization and save its agent token. You will need this token in a later step.
- Navigate to your organization's settings, change the Default Execution Mode to Agent, select the agent pool you created in the previous step, and save the settings. Workspaces whose Execution Mode was previously set to Custom do not inherit this default, so update the Execution Mode in each of those workspaces directly.
- Connect to the Terraform Enterprise host via SSH to deploy the agents. Before deploying, check your instance's capacity settings to ensure you do not exceed the host's available resources. The example script deploys four agents, each limited to 2 GB of memory.
First, check your current capacity settings.
For Replicated deployments:
$ replicatedctl app-config export | jq -r '.capacity_concurrency, .capacity_memory'
For Docker FDO deployments:
$ docker exec -it terraform-enterprise_tfe_1 tfectl app config --format docker | grep TFE_CAPACITY_
Next, create a tfc-agent.sh file with the following content:
#!/bin/bash
## Prompt for Docker image name (use default if no input)
read -p "Enter Docker image name (default: hashicorp/tfc-agent:latest): " IMAGE_NAME
IMAGE_NAME=${IMAGE_NAME:-"hashicorp/tfc-agent:latest"}
## Prompt for TFE FQDN and store in ADDRESS
read -p "Enter TFE FQDN (e.g., your-tfe-domain.com): " ADDRESS
## Prompt for Agent Token and store in AGENT_TOKEN
read -p "Enter Agent Token: " AGENT_TOKEN
## Prompt for Agent name prefix (use default if no input)
read -p "Enter the Agent name prefix (default: tfc-agent): " CONTAINER_NAME_PREFIX
CONTAINER_NAME_PREFIX=${CONTAINER_NAME_PREFIX:-"tfc-agent"}
## Set other container options
OTHER_OPTIONS="--restart=unless-stopped --memory 2GB"
## Define the number of agents to deploy
read -p "Enter the number of agents to deploy (default: 4): " NUM_AGENTS
NUM_AGENTS=${NUM_AGENTS:-4}
## Create tfc-agent containers
## For other options refer to https://developer.hashicorp.com/terraform/cloud-docs/agents/agents#cli-options
for ((i=1; i<=$NUM_AGENTS; i++)); do
  CONTAINER_NAME="${CONTAINER_NAME_PREFIX}_${i}"
  docker run -d \
    -e TFC_ADDRESS="https://$ADDRESS" \
    -e TFC_AGENT_TOKEN="$AGENT_TOKEN" \
    -e TFC_AGENT_NAME="$CONTAINER_NAME" \
    --name "$CONTAINER_NAME" \
    $OTHER_OPTIONS \
    $IMAGE_NAME
  echo "HCP Terraform Agent $i deployed: $CONTAINER_NAME"
done
- Make the script executable and run it. Provide the requested values when prompted.
$ chmod +x tfc-agent.sh
$ ./tfc-agent.sh
Example execution:
Enter Docker image name (default: hashicorp/tfc-agent:latest):
Enter TFE FQDN (e.g., your-tfe-domain.com): tfe.example.net
Enter Agent Token: PASTE_YOUR_AGENT_POOL_TOKEN_HERE
Enter the Agent name prefix (default: tfc-agent):
Enter the number of agents to deploy (default: 4):
f02c2fca0c7ff2b4a2e1915b3b7dc71fe806a02ab2c37d2955147e1406530242
HCP Terraform Agent 1 deployed: tfc-agent_1
cbbb4df86f9f4cb00cbbfe8a2e5cfffdc1a62df7a9960557a78fc21e84ad6205
HCP Terraform Agent 2 deployed: tfc-agent_2
7d19060e2e68288fe8b2ccc36392e9e780ed3d5011db3bb8f02307134064b951
HCP Terraform Agent 3 deployed: tfc-agent_3
a768c4f045682aef8c6cd93c13df7472593def4b8c9304b0dac1fd6e66cf0028
HCP Terraform Agent 4 deployed: tfc-agent_4
- Verify that the agents have successfully registered with the new agent pool in the Terraform Enterprise UI.
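The capacity check in the procedure is left to judgment. The following sketch makes the arithmetic explicit. It is a hypothetical helper, not a HashiCorp tool, and it rests on several assumptions. Terraform Enterprise's own run budget is treated as capacity_concurrency multiplied by capacity_memory. All figures are in MiB. Docker's "--memory 2GB" is counted as 2048 MiB. Host memory is read from /proc/meminfo, so the sketch works on Linux only. The check is deliberately rough because it ignores memory used by the operating system and by Terraform Enterprise's own services, so leave headroom.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: rough check that agent containers fit in the memory
# left after Terraform Enterprise's run budget. Linux only; figures in MiB.

# Total host memory in MiB (MemTotal in /proc/meminfo is reported in kB).
host_memory_mib() {
  awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo
}

# agents_fit HOST_MIB CONCURRENCY CAPACITY_MEMORY_MIB NUM_AGENTS PER_AGENT_MIB
# Print the arithmetic; succeed only if the agents fit in the spare memory.
agents_fit() {
  local host_mib=$1 concurrency=$2 capacity_mib=$3 num_agents=$4 per_agent_mib=$5
  local reserved=$(( concurrency * capacity_mib ))   # TFE run budget
  local needed=$(( num_agents * per_agent_mib ))     # combined agent limits
  local spare=$(( host_mib - reserved ))
  echo "TFE run budget: ${reserved} MiB; agents need: ${needed} MiB; spare: ${spare} MiB"
  (( needed <= spare ))
}
```

Substitute the concurrency and memory values reported by the capacity command for your deployment type. In the following example, 10 and 512 are placeholders:
$ agents_fit "$(host_memory_mib)" 10 512 4 2048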
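As a host-side complement to the UI check, you can count the agent containers that Docker reports as running. The following sketch is a hypothetical helper. It reads "name<TAB>status" lines and assumes the default tfc-agent name prefix from the deployment script. A container counts as running when its status begins with "Up", which is how docker ps reports running containers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count agent containers that Docker reports as running.

# Read "NAME<TAB>STATUS" lines on stdin. Count entries whose name starts with
# "<prefix>_" and whose status starts with "Up". The prefix defaults to tfc-agent.
count_running_agents() {
  local prefix="${1:-tfc-agent}"
  awk -F '\t' -v p="${prefix}_" \
    'index($1, p) == 1 && $2 ~ /^Up/ { n++ } END { print n + 0 }'
}
```

After sourcing the function, pipe docker ps output into it:
$ docker ps --all --filter "name=tfc-agent_" --format '{{.Names}}\t{{.Status}}' | count_running_agents tfc-agent
If the count is lower than the number of agents you deployed, inspect the affected container's output with docker logs, for example "docker logs tfc-agent_1".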
Outcome
After you apply either solution, runs that stay queued for more than 10 minutes are no longer terminated. They start normally once capacity becomes available.