Overview
A failure in the Terraform Enterprise run pipeline, such as errors encountered during the setup of agents to perform runs, can occur due to application misconfigurations or other platform-related issues. When this happens, new runs created in the interim can remain queued indefinitely, even after the issue in the run pipeline is resolved. These runs will require manual intervention to be canceled so that a new one can be triggered in their associated workspaces. This guide presents a few options to programmatically cancel such runs as a cleanup procedure post-resolution.
Procedure
API
Runs can be forced cancelled through the Force a run into the "cancelled" state API. To programmatically cancel all queued runs, the following steps need to be performed.
- Query for all runs in
plan_queued
orapply_queued
state using the List all runs API with afilter[status]
URL query parameter. - Extract each runs external ID, paginating through the results.
- Call the Force a run into the "cancelled" state API with the external ID of each run.
Below is an example Bash script which implements the steps above to force cancel all runs stuck in queued state.
#!/bin/bash
TFE_HOSTNAME="<TFE_HOSTNAME>"
URL="https://$TFE_HOSTNAME/api/v2/admin/runs?filter%5Bstatus%5D=plan_queued,apply_queued"
QUEUED_RUNS=
echo "Gathering queued runs..."
while [ "$URL" != null ]; do
RESPONSE=$(curl \
--show-error \
--silent \
--header "Authorization: Bearer $TOKEN" \
--header "Content-Type: application/vnd.api+json" \
"$URL")
QUEUED_RUNS+=($(jq -r '.data[].id' <<<$RESPONSE))
URL=$(jq '.pagination."next-page"' <<<$RESPONSE)
done
for run in ${QUEUED_RUNS[@]}; do
echo "Force cancelling $run..."
curl \
--show-error \
--silent \
--header "Authorization: Bearer $TOKEN" \
--header "Content-Type: application/vnd.api+json" \
--data '{"comment": "cancelling stuck runs"}' \
"https://$TFE_HOSTNAME/api/v2/admin/runs/$run/actions/force-cancel"
sleep 1
done
Rails Console
Runs can be force canceled in the Rails console, using the following Ruby code. This code queries for all runs in a queued state which where created at least 30 minutes ago and force cancels them as an admin user.
u = User.find_by_email "<TERRAFORM_ENTERPRISE_ADMIN_EMAIL>"
runs = Run.currently_queued.where("created_at < ?", 30.minute.ago)
runs.each{ |r| r.force_cancel!(user: u, comment: "cancelling stuck runs") }
The time frame in the query should be adjusted to match the time at which the issue in the run pipeline was resolved as to only target those queued runs which were created during or before the time in which the issue in the run pipeline was active.
Manual
If there are not many runs stuck in a queued state, optionally navigate to the Admin UI and manually cancel the queued runs using the Force cancel button.