TFE v202302-1 - Workspaces are failing with the error "Operation failed: failed running terraform plan (exit -1)" – HashiCorp Help Center

Introduction

This issue has been reported on Terraform Enterprise (TFE) v202302-1 after a migration from a standalone TFE installation to an Active/Active installation.

Problem

Runs in the majority of workspaces fail when the "Remote" execution mode is used.
If private/custom agents are used instead, the runs no longer fail.

Cause

There is a known issue in TFE v202302-1 with the agent run pipeline mode that does not occur in legacy mode.

Overview of possible solutions:

- Switch to `legacy` workers mode.

Rollback to legacy workers (command for standalone):

$ replicatedctl app-config set runpipelinemode --value 'legacy'  
$ replicatedctl app apply-config

Rollback to legacy workers (command for active/active, where <KEY> is `runpipelinemode` and <VALUE> is `legacy`, mirroring the standalone command):

$ tfe-admin app-config -k <KEY> -v <VALUE>

If falling back to legacy mode does not make runs succeed in all workspaces, and some runs now show as killed, check the support bundle for the following message, which confirms that the OS is killing Terraform:

terraform invoked oom-killer
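The check above can be sketched as a small shell helper. This is a minimal sketch, not an official procedure: it assumes the support bundle has already been extracted to a local directory, and the function name `count_terraform_oom` and the `./support-bundle` path are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: count "terraform invoked oom-killer" lines in an extracted support bundle.
# The function name and the bundle location are assumptions for illustration.

count_terraform_oom() {
  local bundle_dir="$1"
  # -r: search recursively, -h: omit file names, -F: match a fixed string.
  # grep exits non-zero when nothing matches, so "|| true" keeps the count at 0
  # instead of aborting scripts that run with "set -e".
  { grep -rhF "terraform invoked oom-killer" "$bundle_dir" 2>/dev/null || true; } \
    | wc -l | tr -d ' '
}

# Usage (hypothetical path):
#   count_terraform_oom ./support-bundle
```

A result greater than 0 suggests that the kernel killed Terraform for lack of memory, which is the case the capacity settings below are meant to address.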

- Once confirmed, adjust the worker capacity settings to match your system's needs: `capacity_concurrency`, `capacity_cpus`, and `capacity_memory`.

For more details, see the Capacity and Performance Guide.

For Active/Active, run the following commands:

$ tfe-admin app-config -k capacity_memory -v <value>

$ tfe-admin app-config -k capacity_concurrency -v <value>

$ tfe-admin app-config -k capacity_cpus -v <value>
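The three commands above can be wrapped in one helper so they are applied together. This is only a sketch: the function names `run` and `set_capacity`, the `DRY_RUN` variable, and the example values are hypothetical, and the values are placeholders rather than recommendations. By default the helper only prints the commands so they can be reviewed first.

```shell
#!/usr/bin/env bash
# Sketch: apply the three capacity settings on an Active/Active node.
# DRY_RUN, run, and set_capacity are illustrative names, not part of TFE.

run() {
  # Print the command by default; execute it only when DRY_RUN=0.
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

set_capacity() {
  local memory="$1" concurrency="$2" cpus="$3"
  run tfe-admin app-config -k capacity_memory -v "$memory"
  run tfe-admin app-config -k capacity_concurrency -v "$concurrency"
  run tfe-admin app-config -k capacity_cpus -v "$cpus"
}

# Usage (placeholder values; choose your own based on the capacity guide):
#   set_capacity 1024 5 2             # prints the three commands
#   DRY_RUN=0 set_capacity 1024 5 2   # actually runs them
```

Printing first makes it easier to confirm the values before changing a running installation. TFE still needs to be restarted afterwards for the new settings to take effect.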

- Restart TFE

Outcome

The runs are successful.

NOTE

If needed, you can switch to the agent run pipeline mode as follows:

# TFE standalone
$ replicatedctl app-config set runpipelinemode --value ''  
$ replicatedctl app apply-config

# TFE Active/Active
$ tfe-admin app-config -k runpipelinemode -v ''

Additional Information