Problem
- Runs on internal agents queued for more than 10 minutes error out without a message or log.
Prerequisites
- Terraform Enterprise v202310-1 through v202312-1
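To check quickly whether a given release string falls in the affected range, a helper along these lines may be useful. It is only a sketch: is_affected is a hypothetical function name, the version is passed in by hand, and the range bounds are taken literally from this article (v202310-1 up to and including v202312-1).

```shell
# Hypothetical helper: returns 0 if the release is in the affected range,
# 1 if it is outside the range, 2 if the string is not of the form vYYYYMM-N.
is_affected() {
  ver=${1#v}
  case "$ver" in
    *-*) ;;
    *) return 2 ;;
  esac
  month=${ver%%-*}
  patch=${ver##*-}
  case "$month$patch" in
    ''|*[!0-9]*) return 2 ;;
  esac
  # Combine YYYYMM and the patch number into one sortable integer.
  key=$(( month * 100 + patch ))
  [ "$key" -ge 20231001 ] && [ "$key" -le 20231201 ]
}
```

For example, `is_affected v202311-1 && echo "affected"` prints "affected", while `is_affected v202401-1` returns a non-zero status.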
Cause
- A bug introduced in v202310-1 caused any run queued for longer than 10 minutes to be treated as stuck and forcibly killed. Because this happened before the run was dequeued, no logs were produced.
Solution
Upgrade your Terraform Enterprise instance to v202401-1 or later. If upgrading is not an option, use an external agent pool as a workaround by following these steps:
- Step 1 - Create a new Agent Pool in the Organization and save the agent token value for use in a later step.
- Step 2 - In the Organization settings, change the Default Execution Mode to Agent, select the Agent Pool you created in the previous step, and click Update organization to save the settings. (Workspaces whose Execution Mode was previously set to Custom may need to be updated individually.)
- Step 3 - SSH into the Terraform Enterprise host to deploy the agents. Adjust the number of agents and the memory allotted to each so that they do not exceed your capacity settings (the example below creates 4 tfc-agents with 2GB of memory each).
# Check your current capacity settings
# For Replicated deployments
replicatedctl app-config export | jq -r '.capacity_concurrency, .capacity_memory'
# For Docker FDO deployments
docker exec -it terraform-enterprise_tfe_1 tfectl app config --format docker | grep TFE_CAPACITY_
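One way to sanity-check the planned agent count and memory against those two values is a small helper like the one below. It is a sketch under assumptions: it treats the memory capacity setting as a per-run limit in megabytes and uses concurrency multiplied by memory as an approximation of the host's run budget. fits_capacity and its arguments are hypothetical names, not part of any Terraform Enterprise tooling.

```shell
# Hypothetical helper.
# Usage: fits_capacity NUM_AGENTS AGENT_MEM_MB CAPACITY_CONCURRENCY CAPACITY_MEMORY_MB
# Prints both totals and returns 0 only if the agents fit in the assumed budget.
fits_capacity() {
  num_agents=$1 agent_mem_mb=$2 concurrency=$3 run_mem_mb=$4
  requested=$(( num_agents * agent_mem_mb ))
  budget=$(( concurrency * run_mem_mb ))
  echo "requested=${requested}MB budget=${budget}MB"
  [ "$requested" -le "$budget" ]
}
```

For instance, `fits_capacity 4 2048 10 512` reports a request of 8192MB against a 5120MB budget and returns a non-zero status, which would suggest lowering the agent count or the per-agent memory before running the script.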
# Create a tfc-agent.sh file with the following content
-------------------------------------------------------
#!/bin/bash
# Prompt for Docker image name (use default if no input)
read -p "Enter Docker image name (default: hashicorp/tfc-agent:latest): " IMAGE_NAME
IMAGE_NAME=${IMAGE_NAME:-"hashicorp/tfc-agent:latest"}
# Prompt for TFE FQDN and store in ADDRESS
read -p "Enter TFE FQDN (e.g., your-tfe-domain.com): " ADDRESS
# Prompt for Agent Token and store in AGENT_TOKEN
read -p "Enter Agent Token: " AGENT_TOKEN
# Prompt for Agent name prefix (use default if no input)
read -p "Enter the Agent name prefix (default: tfc-agent): " CONTAINER_NAME_PREFIX
CONTAINER_NAME_PREFIX=${CONTAINER_NAME_PREFIX:-"tfc-agent"}
# Set other container options
OTHER_OPTIONS="--restart=unless-stopped --memory 2GB"
# Define the number of agents to deploy
read -p "Enter the number of agents to deploy (default: 4): " NUM_AGENTS
NUM_AGENTS=${NUM_AGENTS:-4}
# Create tfc-agent containers
# For other options refer to https://developer.hashicorp.com/terraform/cloud-docs/agents/agents#cli-options
for ((i=1; i<=NUM_AGENTS; i++)); do
  CONTAINER_NAME="${CONTAINER_NAME_PREFIX}_${i}"
  # $OTHER_OPTIONS is intentionally unquoted so it splits into separate arguments
  docker run -d \
    -e TFC_ADDRESS="https://$ADDRESS" \
    -e TFC_AGENT_TOKEN="$AGENT_TOKEN" \
    -e TFC_AGENT_NAME="$CONTAINER_NAME" \
    --name "$CONTAINER_NAME" \
    $OTHER_OPTIONS \
    "$IMAGE_NAME"
  echo "Terraform Cloud Agent $i deployed: $CONTAINER_NAME"
done
- Step 4 - Make the script executable and invoke it.
# Add the executable bit to the script file
chmod +x tfc-agent.sh
# Execute the script
./tfc-agent.sh
Enter Docker image name (default: hashicorp/tfc-agent:latest):
Enter TFE FQDN (e.g., your-tfe-domain.com): tfe.example.net
Enter Agent Token: PASTE_YOUR_AGENT_POOL_TOKEN_HERE
Enter the Agent name prefix (default: tfc-agent):
Enter the number of agents to deploy (default: 4):
f02c2fca0c7ff2b4a2e1915b3b7dc71fe806a02ab2c37d2955147e1406530242
Terraform Cloud Agent 1 deployed: tfc-agent_1
cbbb4df86f9f4cb00cbbfe8a2e5cfffdc1a62df7a9960557a78fc21e84ad6205
Terraform Cloud Agent 2 deployed: tfc-agent_2
7d19060e2e68288fe8b2ccc36392e9e780ed3d5011db3bb8f02307134064b951
Terraform Cloud Agent 3 deployed: tfc-agent_3
a768c4f045682aef8c6cd93c13df7472593def4b8c9304b0dac1fd6e66cf0028
Terraform Cloud Agent 4 deployed: tfc-agent_4
- Step 5 - Verify that the agents have registered successfully with the new TFE Agent Pool.
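Before checking the Agent Pool itself, it can help to confirm that every expected container is still running on the host. The sketch below assumes the naming scheme produced by the script above (prefix, underscore, index). missing_agents is a hypothetical helper that reads running container names on standard input and prints any expected name that is absent. It only checks local containers, so registration still has to be confirmed in the Agent Pool.

```shell
# Hypothetical helper.
# Usage: <list of running container names> | missing_agents PREFIX COUNT
# Prints each expected name (PREFIX_1 .. PREFIX_COUNT) not found on stdin.
missing_agents() {
  prefix=$1 count=$2
  running=$(cat)
  i=1
  while [ "$i" -le "$count" ]; do
    name="${prefix}_${i}"
    # -x matches the whole line, -F treats the name as a literal string
    printf '%s\n' "$running" | grep -qxF -- "$name" || printf '%s\n' "$name"
    i=$(( i + 1 ))
  done
}
```

With the defaults from the script, it could be fed from Docker as `docker ps --filter "name=tfc-agent_" --format '{{.Names}}' | missing_agents tfc-agent 4`. Empty output means all four containers are running, and `docker logs tfc-agent_1` can then be inspected for registration errors.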
Outcome
Runs no longer error out when queued for more than 10 minutes.