Recovering from Excessive VCS-Triggered Runs in Terraform Enterprise

Problem

A Version Control System (VCS) connection has triggered an excessive number of runs in Terraform Enterprise (TFE), degrading system performance. Runs remain stuck in the pending state because the run queue is flooded with invalid requests from a VCS repository connection that is in an errored state.

Prerequisites

Terraform Enterprise installation
Administrative access to the Terraform Enterprise host machine

Cause

A misconfiguration or error in a VCS repository connection can trigger a runaway process, flooding the run queue with invalid requests.

Solution

This procedure details how to stop the excessive runs and restore system stability.

In the Terraform Enterprise UI, navigate to the workspace settings for the affected workspace and disable runs triggered by VCS connections.
Navigate to the Capacity settings page in the TFE admin settings and set the concurrent run limit to 1. This restricts the instance to a single run at a time, so queued runs cannot start in parallel while you perform maintenance.
On the Terraform Enterprise host machine, open a Rails console inside the atlas container. The container name, and therefore the command, depends on your TFE version.

For Terraform Enterprise v202205-1 (build 619) and newer, run the following command:
```
$ sudo docker exec -it tfe-atlas /usr/bin/init.sh /app/scripts/wait-for-token -- bash -i -c 'cd /app && ./bin/rails c'
```
For older versions, run the following command:
```
$ sudo docker exec -it ptfe_atlas /usr/bin/init.sh /app/scripts/wait-for-token -- bash -i -c 'cd /app && ./bin/rails c'
```
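If you script this step across several hosts, the version check can be expressed as a small helper. The sketch below is hypothetical: `atlas_container_for` and `console_command` are illustrative names, and it assumes release strings follow the `vYYYYMM-N` pattern (such as `v202205-1`). Only the cutover point, v202205-1 and newer using `tfe-atlas`, comes from this article; the build number is not considered.

```ruby
# Hypothetical helper, not part of Terraform Enterprise.
# Assumes releases are formatted "vYYYYMM-N", e.g. "v202205-1".
# Per this article, v202205-1 (build 619) and newer name the container "tfe-atlas".
CUTOVER = [202205, 1].freeze

def atlas_container_for(release)
  match = /\Av(\d{6})-(\d+)\z/.match(release)
  raise ArgumentError, "unrecognized release: #{release.inspect}" unless match

  version = [match[1].to_i, match[2].to_i]
  # Array#<=> compares element by element: year-month first, then the patch number.
  (version <=> CUTOVER) >= 0 ? "tfe-atlas" : "ptfe_atlas"
end

# Builds the same docker exec command shown above for the chosen container.
def console_command(release)
  container = atlas_container_for(release)
  "sudo docker exec -it #{container} /usr/bin/init.sh /app/scripts/wait-for-token " \
    "-- bash -i -c 'cd /app && ./bin/rails c'"
end

puts console_command("v202205-1")
puts console_command("v202204-2")
```

An unrecognized release string raises `ArgumentError` rather than guessing, since running the wrong command simply fails to find the container.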

In the Rails console, execute the following Ruby script. It iterates through all workspaces, sets every run currently in the planning state to errored (along with that run's plan and apply), and then unlocks each workspace.

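```ruby
# Run this inside the Rails console opened in the previous step.
Workspace.find_each do |w|
  # Mark every run still in the planning state as errored,
  # together with its plan and apply records.
  w.runs.planning.each do |r|
    r.update_attribute(:status, Run::ERRORED)
    r.plan.update_attribute(:status, Plan::ERRORED)
    r.apply.update_attribute(:status, Apply::ERRORED)
  end
  # Release the workspace lock.
  w.unlock!
end
```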
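To see what the console script changes without touching a live installation, the following sketch replays the same loop against in-memory stand-ins. The `Stage`, `FakeRun`, and `FakeWorkspace` Structs are hypothetical and are not Terraform Enterprise's models; they only mimic the run, plan, and apply statuses and the workspace lock. Note that the original script unlocks every workspace, including ones that had no planning runs.

```ruby
# Illustrative only: these Structs are hypothetical stand-ins, NOT Terraform
# Enterprise's ActiveRecord models. They mirror just enough of the console
# script (run/plan/apply statuses and a workspace lock) to show its effect.
Stage = Struct.new(:status)
FakeRun = Struct.new(:status, :plan, :apply)
FakeWorkspace = Struct.new(:name, :runs, :locked)

# Same shape as the console script: error out planning runs, then unlock.
def error_planning_runs(workspaces)
  workspaces.each do |w|
    w.runs.select { |r| r.status == :planning }.each do |r|
      r.status = :errored
      r.plan.status = :errored
      r.apply.status = :errored
    end
    w.locked = false # stands in for w.unlock!, applied to every workspace
  end
end

stuck = FakeRun.new(:planning, Stage.new(:running), Stage.new(:pending))
done = FakeRun.new(:applied, Stage.new(:finished), Stage.new(:finished))
ws = FakeWorkspace.new("example-workspace", [stuck, done], true)

error_planning_runs([ws])
p ws.runs.map(&:status)  # => [:errored, :applied]
p stuck.apply.status     # => :errored
p ws.locked              # => false
```

Runs in any other state, such as the already applied run here, are left untouched.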

After the script completes, restart the Terraform Enterprise application. If the installation uses external services, such as an external PostgreSQL database, restart those as well.
Once the system is stable, return the concurrent run limit to its original value. We recommend raising it gradually rather than all at once, confirming that the system remains stable after each increase.
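The article does not prescribe step sizes for the gradual increase. As one possible approach, the hypothetical `ramp_schedule` helper below doubles the limit at each step until it reaches the original value; the doubling strategy and the example target of 10 are assumptions, not TFE defaults.

```ruby
# Hypothetical helper: one way to raise the concurrent run limit gradually,
# doubling at each step and capping at the original (target) value.
def ramp_schedule(target, start: 1)
  raise ArgumentError, "start must be positive" unless start.positive?
  raise ArgumentError, "target must be >= start" if target < start

  steps = [start]
  steps << [steps.last * 2, target].min while steps.last < target
  steps
end

p ramp_schedule(10)  # => [1, 2, 4, 8, 10]
```

Between steps, check that pending runs are draining before applying the next value in the admin settings.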

Outcome

After completing the procedure, log in to the admin console and verify that the count of pending runs has returned to a normal level. The system should now be functioning correctly.

Additional Information

For more details on managing a Terraform Enterprise instance, refer to the official administration documentation.
