Problem
A memory and caching issue in specific versions of Terraform can cause runs to fail with varying symptoms. These symptoms may be intermittent or consistent and include:
- terraform init failing to complete.
- terraform plan failing to start, with errors during the refresh phase.
- Terraform runs failing with Out of Memory (OOM) errors.
- Slow page loads in the Terraform Enterprise UI.
Prerequisites
- Terraform versions v1.5.3 through v1.5.7.
- Terraform AWS provider versions v4.67.0 through v5.20.0.
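To check a workspace against these ranges quickly, a small helper along the following lines may be useful. This is a minimal sketch: the function names are hypothetical, it only understands plain `MAJOR.MINOR.PATCH` strings (no pre-release suffixes), and it assumes a workspace counts as affected only when both the Terraform version and the AWS provider version fall inside the listed ranges.

```python
import re

# Affected ranges as listed in the prerequisites above (inclusive bounds).
AFFECTED_TERRAFORM = ((1, 5, 3), (1, 5, 7))
AFFECTED_AWS_PROVIDER = ((4, 67, 0), (5, 20, 0))


def parse_version(text):
    """Turn 'v1.5.3' or '1.5.3' into a comparable tuple such as (1, 5, 3)."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unsupported version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def in_range(version, bounds):
    """True when the version lies between the inclusive bounds."""
    low, high = bounds
    return low <= parse_version(version) <= high


def is_affected(terraform_version, aws_provider_version):
    """Assumption: affected only when BOTH versions are inside the ranges."""
    return in_range(terraform_version, AFFECTED_TERRAFORM) and in_range(
        aws_provider_version, AFFECTED_AWS_PROVIDER
    )
```

For example, `is_affected("v1.5.7", "5.10.0")` evaluates to `True`, while `is_affected("1.6.0", "5.10.0")` evaluates to `False`.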
Cause
The primary cause of this issue is a significant increase in peak memory usage during Terraform operations. This peak occurs when Terraform calls configured providers to load their resource and data source schemas.
- The Terraform protocol includes a single Remote Procedure Call (RPC) that requests all schemas from a provider, regardless of which specific resources are used in the configuration.
- As providers like the Terraform AWS provider grow in size and complexity, the memory required to load their schemas also grows. Resources with large, nested schemas, such as those for AWS QuickSight and WAFv2, have an outsized effect on memory consumption.
- The issue is more frequent in configurations with multiple AWS provider instances or a history of many provider versions, which increases memory pressure and can lead to OOM errors.
- During terraform init, Terraform locates and installs the providers used in the configuration. Both HCP Terraform and Terraform Enterprise install providers as part of every run. For more information, refer to the documentation on terraform init and provider installation.
GitHub Issue #31722 provides a detailed investigation into how the addition of resources with complex schemas increased the provider's peak memory requirements.
Solution
To resolve this issue, upgrade to the following versions or newer, as they contain significant memory usage improvements:
- Terraform: v1.6.0
- Terraform AWS provider: v5.20.0
Other providers may also be affected and could require updates.
Outcome
Terraform v1.6.0 introduced new functionality that allows Terraform to use a cached provider schema instead of fetching a new copy, which significantly reduces memory consumption in configurations with multiple instances of the same provider.
Additionally, the Terraform AWS provider v5.14.0 and newer includes a regex cache that has a significant positive impact on memory consumption. You can find more details in the v5.14.0 release notes.
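The effect of reusing a cached schema across several instances of one provider can be illustrated with a toy model. The sketch below is not Terraform's implementation (Terraform is written in Go, and real memory use also depends on the provider processes themselves); `FakeProvider`, `load_schemas`, and the 500 MB schema size are hypothetical names and numbers chosen only for illustration.

```python
class FakeProvider:
    """Hypothetical stand-in for a provider plugin; not a real Terraform API."""

    def __init__(self, name, schema_size_mb):
        self.name = name
        self.schema_size_mb = schema_size_mb
        self.schema_requests = 0

    def get_schema(self):
        # Models the single RPC that returns every resource and data source schema.
        self.schema_requests += 1
        return {"provider": self.name, "size_mb": self.schema_size_mb}


def load_schemas(provider, instance_count, use_cache):
    """Return the schema objects held for `instance_count` provider instances."""
    cache = {}
    held = []
    for _ in range(instance_count):
        if use_cache and provider.name in cache:
            held.append(cache[provider.name])  # reuse the same object
            continue
        schema = provider.get_schema()
        cache[provider.name] = schema
        held.append(schema)
    return held


def distinct_schema_mb(held):
    """Sum the size of distinct schema objects, counting shared copies once."""
    return sum(s["size_mb"] for s in {id(s): s for s in held}.values())


for use_cache in (False, True):
    provider = FakeProvider("aws", schema_size_mb=500)  # size is made up
    held = load_schemas(provider, instance_count=4, use_cache=use_cache)
    print(use_cache, provider.schema_requests, distinct_schema_mb(held))
```

In this model, four instances without a cache trigger four schema requests and hold four separate copies, while the cached path issues one request and holds a single shared copy.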
Additional Information
Memory Usage on Terraform Enterprise
On Terraform Enterprise, memory is a more critical factor for Terraform operations than CPU. While CPU requirements vary, Terraform runs often wait on I/O from APIs, making memory the primary resource constraint. For more details, see the capacity and performance documentation.
SIC-001 Errors
Some memory-related issues may manifest as SIC-001 errors, which occur when the Linux oom-killer terminates a process. These events are logged in dmesg and are included in support bundles. The SIC-001 error is a generic failure related to processing a Terraform slug, which is a data blob containing the Terraform configuration. For more context, refer to the Terraform Enterprise Basic Troubleshooting Guide.
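When reviewing dmesg output from a support bundle, a short script can help pick out oom-killer terminations. The following is a rough sketch: the exact kernel message wording varies between kernel versions, the pattern covers only the common "Out of memory: Kill process" and "Killed process" forms (including the cgroup variant), and the sample log lines are illustrative rather than taken from a real system.

```python
import re

# Matches older ("Kill process") and newer ("Killed process") kernel wording.
# IGNORECASE also catches the "Memory cgroup out of memory: ..." variant.
OOM_KILL = re.compile(
    r"out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)", re.IGNORECASE
)


def find_oom_kills(log_text):
    """Return (pid, process_name) for each oom-killer termination in the log."""
    return [(int(pid), name) for pid, name in OOM_KILL.findall(log_text)]


# Illustrative sample lines, not real output.
sample = """\
[12345.678901] terraform invoked oom-killer: gfp_mask=0x100cca, order=0, oom_score_adj=0
[12345.679912] Out of memory: Killed process 4242 (terraform) total-vm:8123456kB, anon-rss:7012345kB
"""

print(find_oom_kills(sample))  # prints [(4242, 'terraform')]
```

If the list is empty, the SIC-001 error may have a cause other than memory exhaustion.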
Related GitHub Issues and Release Notes
- Terraform AWS Provider Issues:
  - Memory consumption increase since v4.67.0: Issue #31722
  - Memory allocation: Issue #33553
  - Monitoring of memory usage in providers: Issue #32289
- Terraform Core Fix:
  - The fix was released in v1.6.0. The release notes state that Terraform Core can now skip requesting large provider schemas when not needed. See the Terraform v1.6.0-beta1 changelog and Pull Request #33486.