Problem
A VCS job has errored and triggered an excessive number of runs in Terraform Enterprise (TFE), which is degrading system performance.
Prerequisites (if applicable)
- Terraform Enterprise
Cause
- An errored VCS repository triggered a runaway stream of run requests.
Solution
Runs are stuck in pending status because the system has been flooded with invalid requests from an errored VCS repository. To clear them:
- Disable triggered runs in the affected workspace
- Set concurrent runs to 1
- Log in to the Atlas container and open a Rails console:
# Terraform Enterprise releases before v202205-1 (619)
$ sudo docker exec -it ptfe_atlas /usr/bin/init.sh /app/scripts/wait-for-token -- bash -i -c 'cd /app && ./bin/rails c'
# Terraform Enterprise v202205-1 (619) and newer
$ sudo docker exec -it tfe-atlas /usr/bin/init.sh /app/scripts/wait-for-token -- bash -i -c 'cd /app && ./bin/rails c'
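- In the Rails console, locate the runs that are still planning in each workspace, set each run and its plan and apply to ERRORED, then unlock the workspace.
# Iterate over every workspace in batches
Workspace.find_each { |w|
  # Mark each run still in the planning state, plus its plan and apply, as errored
  w.runs.planning.each { |r|
    r.update_attribute(:status, Run::ERRORED)
    r.plan.update_attribute(:status, Plan::ERRORED)
    r.apply.update_attribute(:status, Apply::ERRORED)
  }
  # Release the workspace lock so new runs can be queued
  w.unlock!
}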
- Restart Terraform Enterprise once the Rails commands have completed. This includes external services, such as RDS (PostgreSQL), if applicable.
- Reset the concurrency to its original setting. (It is recommended to increase this number gradually rather than restoring it in one step.)
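One way to plan the gradual increase is to write down the intermediate values in advance. The sketch below is only an illustration: `concurrency_ramp` is a hypothetical helper, not part of Terraform Enterprise, and doubling at each step is an assumption rather than a documented recommendation. Each value would still be applied by hand, with a pause to watch system load before moving on to the next.

```ruby
# Hypothetical helper (not a TFE API): list the concurrency values to step
# through, doubling from 1 until the original setting is reached.
def concurrency_ramp(original)
  raise ArgumentError, "original must be >= 1" if original < 1

  steps = [1]
  # Double the last value, but never overshoot the original setting.
  steps << [steps.last * 2, original].min while steps.last < original
  steps
end

p concurrency_ramp(10) # prints [1, 2, 4, 8, 10]
```

A smaller multiplier or fixed increments may be more appropriate on a heavily loaded system.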
Outcome
Log in to the admin console and check the count of pending runs. The system should now be functioning properly.
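As an additional check, the same `runs.planning` scope used in the cleanup loop can be queried read-only from the Rails console to confirm that nothing was missed. This is a minimal sketch: `remaining_planning_runs` is a hypothetical helper name, and it assumes only the `Workspace` model and `runs.planning` scope already shown in the Solution, plus the standard `id` attribute.

```ruby
# Sketch only: count the runs still in the planning state, per workspace.
# `workspaces` must respond to `find_each` (as the Workspace model does in the
# Rails console); each workspace must respond to `id` and `runs.planning`.
# Nothing is modified.
def remaining_planning_runs(workspaces)
  remaining = {}
  workspaces.find_each do |w|
    count = w.runs.planning.count
    remaining[w.id] = count if count.positive?
  end
  remaining
end

# In the Rails console (not executed here):
#   remaining_planning_runs(Workspace)
```

An empty hash indicates that no workspace still has runs in the planning state. Any workspace IDs returned can be inspected individually before repeating the cleanup.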