Terraform known issue in v1.5.3 through v1.6.0: failed runs, increased memory consumption, and Terraform Enterprise UI slowdown

Introduction

Problem

A memory/caching issue in certain versions of Terraform caused runs to fail with varying symptoms, which could be intermittent or consistent, including:

terraform init failing to complete
terraform plan failing to start with errors during refresh
Terraform runs failing with Out of Memory (OOM) issues
Terraform Enterprise UI page loads slowing down

Prerequisites (if applicable)

Terraform versions v1.5.3 to v1.6.0
Terraform AWS provider versions v4.67.0 to v5.20.0

Cause

The main problem is peak memory usage:
- this peak occurs when Terraform makes calls to configured providers to load their resource/data source schemas,
- the Terraform protocol contains a single RPC which asks for all schemas regardless of the specific resources configured,
- this means that as the Terraform AWS provider grows, the memory requirements for using it also grow,
- some specific resources such as QuickSight and WAFv2 have extremely large nested schemas which can have an outsized effect on memory
  - for example, QuickSight resources are the largest contributors to the memory jump, and they were added in v4.67.0 and v5.1.0
The memory requirements vary based on the particular resources configured.
The problem occurred more frequently in Terraform configurations with multiple AWS providers configured or with a history of several versions of a provider.
The pressure on memory requirements has resulted in OOM errors in some cases.
Issue 31722 investigated how the increasing size of the provider, combined with the addition of resources with deep and complex schemas, has significantly increased the peak memory requirements of using the provider.
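
For illustration, a configuration with multiple AWS providers configured might look like the minimal sketch below. The alias name, regions, and bucket name are hypothetical and not taken from any affected workspace. Each additional provider block is another instance of the same provider, which is the pattern described above as hitting the problem more often.

```hcl
# Default AWS provider configuration (region is an illustrative assumption).
provider "aws" {
  region = "us-east-1"
}

# A second configuration of the same provider, selected via its alias.
provider "aws" {
  alias  = "west"
  region = "us-west-2"
}

# Hypothetical resource that uses the aliased provider configuration.
resource "aws_s3_bucket" "example" {
  provider = aws.west
  bucket   = "example-bucket-name"
}
```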

During terraform init, Terraform locates and installs the providers used within the configuration, including those used by any child modules it calls.

https://developer.hashicorp.com/terraform/cli/commands/init

Terraform Cloud and Terraform Enterprise install providers as part of every run.

Terraform CLI finds and installs providers when initializing a working directory. It can automatically download providers from a Terraform registry, or load them from a local mirror or cache. If you are using a persistent working directory, you must reinitialize whenever you change a configuration's providers.
To save time and bandwidth, Terraform CLI supports an optional plugin cache. You can enable the cache using the plugin_cache_dir setting in the CLI configuration file.

https://developer.hashicorp.com/terraform/language/providers#provider-installation
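
As a sketch of the plugin cache mentioned in the excerpt above, the CLI configuration file (typically ~/.terraformrc on Unix-like systems) could contain a single setting like the following. The directory shown is only an example path, and it needs to exist before Terraform will use it. This setting saves download time and bandwidth during terraform init; it is not presented here as a fix for the peak memory usage described in this article.

```hcl
# CLI configuration file, for example ~/.terraformrc
# Example cache location (an assumption); create the directory beforehand.
plugin_cache_dir = "$HOME/.terraform.d/plugin-cache"
```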

Overview of possible solutions (if applicable)

Solutions:

Most users will see a significant decrease in memory footprint by upgrading to:
- Terraform v1.6.0 and newer,
- Terraform AWS provider v5.20.0 and newer,
- other providers may also be affected and may also require updates.
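
One way to express these minimum versions in a configuration is with version constraints, as in the sketch below. The constraint values mirror the versions listed above; whether to use an open-ended ">=" constraint or a tighter pin is a choice for each workspace and is only an assumption here.

```hcl
terraform {
  # Require a Terraform release that includes the schema caching changes.
  required_version = ">= 1.6.0"

  required_providers {
    aws = {
      source = "hashicorp/aws"
      # Require an AWS provider release at or above the version listed above.
      version = ">= 5.20.0"
    }
  }
}
```

If a dependency lock file already pins an older provider version, running terraform init -upgrade lets Terraform select a newer version that satisfies the constraint.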

Outcome

Changes from Terraform v1.6.0 onwards included new functionality that allows a cached provider schema to be used rather than obtaining another copy, which significantly reduces memory consumption for configurations that include multiple instances of the same provider. Additionally, a regex cache was added to the Terraform AWS provider (released in v5.14.0), which in testing appears to have a significant impact on memory consumption.

Additional Information

On Terraform Enterprise, running out of memory generally impacts Terraform operations more than CPU constraints do:

The required CPU resources for an individual Terraform run vary considerably, but in general they are a much more minor factor than memory due to Terraform mostly waiting on IO from APIs to return.

https://developer.hashicorp.com/terraform/enterprise/system-overview/capacity#cpu

Some memory issues present as SIC-001 errors. These occur when oom-killer events happen on the Linux OS. When this happens, messages are written to the dmesg log, which is included in the support bundle at <host>/default/commands/dmesg/stdout.

The SIC-001 (Source Ingress Controller) error is a generic failure to process a Terraform slug. A slug refers to a blob of data which contains the current state of the Terraform configuration files. Terraform Enterprise uses slug services to pull in VCS information to extract, merge, and process Terraform configuration files. After a slug is ingressed and processed, it is uploaded to blob storage via archivist.

Terraform-Enterprise-Basic-Troubleshooting-Guide

Related to the Terraform AWS provider:

memory consumption increase since v4.67.0: https://github.com/hashicorp/terraform-provider-aws/issues/31722
memory allocation: https://github.com/hashicorp/terraform-provider-aws/issues/33553
monitoring of memory usage in providers: https://github.com/hashicorp/terraform-provider-aws/issues/32289

Related to the fix made:

Updates to Terraform were released in v1.6.0. Note that the Terraform AWS provider must also be updated for these changes to take effect.

core: Terraform will now skip requesting the (possibly very large) provider schema from providers which indicate during handshake that they don't require that for correct behavior, in situations where Terraform Core itself does not need the schema. (#33486)

https://github.com/hashicorp/terraform/blob/v1.6.0-beta1/CHANGELOG.md#160-august-31-2023