Overview
Terraform Cloud Agents may fail during runs if they exhaust available memory on the host system. This typically presents as sudden agent crashes, stalled runs, or Terraform operations terminating unexpectedly during plan or apply.
This article explains common symptoms, root causes, how to diagnose memory-related failures, and steps to remediate and prevent them.
Symptoms
One or more of the following may be observed:
• Terraform runs stuck in “Planning” or “Applying” before failing
• Runs failing without clear Terraform errors
• Agent logs showing:
• OOMKilled
• signal: killed
• exit code 137
• Agent container or process restarting unexpectedly
• Host-level alerts indicating high memory usage
• Terraform Cloud UI showing the agent as offline intermittently
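For Docker-based agents, one quick way to confirm the OOMKilled / exit code 137 symptom is to inspect the container state (the container name terraform-agent is a placeholder):
# Exit code 137 (128 + SIGKILL) together with OOMKilled=true means the kernel killed the process
docker inspect --format '{{.State.ExitCode}} {{.State.OOMKilled}}' terraform-agent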
Common Causes
1. Large Terraform Plans
Terraform loads the entire dependency graph into memory during planning, so memory pressure increases with any of the following (see the sketch below):
• Large state files
• Extensive use of for_each or count
• Many modules or deeply nested modules
• Large provider schemas (e.g., AWS, Azure, Google)
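As an illustrative (hypothetical) example, a single for_each over a large collection multiplies the number of resource instances, and therefore graph nodes, Terraform must hold in memory:
variable "teams" {
  type    = set(string)
  default = ["alpha", "beta", "gamma"] # imagine hundreds of entries
}

# Every element of var.teams becomes its own resource instance in the plan graph
resource "aws_iam_user" "team_user" {
  for_each = var.teams
  name     = "svc-${each.value}"
}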
2. Insufficient Host Memory
• Agent running on a VM or container with limited RAM
• Multiple agents or workloads competing for memory on the same host
• Container memory limits set too low (Docker/Kubernetes)
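To check whether a container limit, rather than total host RAM, is the constraint, the configured limit can be read directly (container and pod names are placeholders):
# Docker: configured memory limit in bytes (0 means unlimited)
docker inspect --format '{{.HostConfig.Memory}}' terraform-agent
# Kubernetes: requests and limits set on the agent pod
kubectl get pod <agent-pod-name> -o jsonpath='{.spec.containers[0].resources}'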
3. Provider Behavior
Some providers are memory-intensive, especially when:
• Refreshing many resources
• Using data sources that enumerate large APIs
• Managing many resources in a single workspace
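For example, a broad data source such as the following (illustrative only) can pull back thousands of objects and hold them all in memory during refresh; the actual cost depends on the provider and the API:
# Enumerates every Amazon-owned AMI ID in the region, a very large result set
data "aws_ami_ids" "all_amazon" {
  owners = ["amazon"]
}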
4. Parallelism Settings
High parallelism increases memory usage:
• Default Terraform parallelism is 10
• Providers may internally parallelize additional operations
How to Diagnose
Step 1: Check Terraform Cloud Run Logs
Look for abrupt termination or missing error output near the end of a plan/apply.
Step 2: Review Agent Logs
For Docker-based agents:
docker logs terraform-agent
For Kubernetes:
kubectl logs <agent-pod-name>
Look for:
• OOMKilled
• Killed
• Memory allocation failures
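A quick filter over the agent logs often surfaces these markers (container and pod names are placeholders):
# Docker
docker logs terraform-agent 2>&1 | grep -Ei 'oom|killed|cannot allocate memory'
# Kubernetes: include the previous, crashed container instance
kubectl logs <agent-pod-name> --previous | grep -Ei 'oom|killed|cannot allocate memory'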
Step 3: Check Host Memory Usage
On the agent host:
free -h
top
For containers:
docker stats
kubectl describe pod <agent-pod-name>
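If the numbers look suspicious, kernel logs and container status usually confirm whether the OOM killer was involved (assumes a systemd host for journalctl; the pod name is a placeholder):
# Kernel-level OOM events on the agent host
journalctl -k | grep -i "out of memory"
# Kubernetes: reason the previous container instance terminated (e.g., OOMKilled)
kubectl get pod <agent-pod-name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'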
Resolution
Option 1: Increase Available Memory (Recommended)
VM / Bare Metal
• Increase RAM on the agent host
Docker
docker run --memory=8g --memory-swap=8g ...
Kubernetes
resources:
  requests:
    memory: "4Gi"
  limits:
    memory: "8Gi"
Terraform agents commonly require 4–8 GB RAM, and large environments may need more.
Option 2: Reduce Terraform Parallelism
Set lower parallelism in the workspace or run configuration:
terraform plan -parallelism=5
Or via environment variable:
TF_CLI_ARGS_plan="-parallelism=5"
TF_CLI_ARGS_apply="-parallelism=5"
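If workspace variables are managed with the tfe provider, the same environment variables can be set in code; the workspace reference below is a placeholder:
resource "tfe_variable" "plan_parallelism" {
  key          = "TF_CLI_ARGS_plan"
  value        = "-parallelism=5"
  category     = "env"
  workspace_id = tfe_workspace.example.id # placeholder workspace
}

resource "tfe_variable" "apply_parallelism" {
  key          = "TF_CLI_ARGS_apply"
  value        = "-parallelism=5"
  category     = "env"
  workspace_id = tfe_workspace.example.id
}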
Option 3: Reduce Plan Size
• Split large configurations into multiple workspaces
• Break monolithic states into smaller components
• Avoid unnecessary data sources
• Limit refresh-heavy patterns (e.g., frequent terraform refresh or full-refresh runs against large states)
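After splitting, a downstream workspace can read outputs from an upstream workspace instead of sharing one large state; a minimal sketch (organization and workspace names are placeholders):
data "terraform_remote_state" "network" {
  backend = "remote"
  config = {
    organization = "example-org"
    workspaces = {
      name = "network-prod"
    }
  }
}

# Reference upstream values, e.g. data.terraform_remote_state.network.outputs.vpc_id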
Option 4: Reduce Concurrent Agent Workloads
• Run fewer agents per host
• Ensure agents are not colocated with other memory-heavy services
• Use autoscaling (for Kubernetes-based agents)
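On Kubernetes, one simple way to keep agent pods off nodes shared with memory-heavy workloads is a nodeSelector targeting a dedicated node pool; this is a fragment of the Deployment's pod template, and the label is a placeholder:
spec:
  template:
    spec:
      nodeSelector:
        workload: tfc-agents # placeholder node label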
Prevention & Best Practices
• Allocate at least 4 GB RAM per agent, more for large environments
• Avoid running multiple agents on small hosts
• Monitor memory usage with alerts
• Periodically review workspace size and complexity
• Use multiple smaller workspaces instead of a single large one
• Pin provider versions to avoid unexpected memory regressions
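As one example of alert-based monitoring (a sketch assuming Prometheus with cAdvisor container metrics and an agent container named tfc-agent), a rule like the following flags sustained high memory usage before the OOM killer steps in:
groups:
  - name: tfc-agent-memory
    rules:
      - alert: TerraformAgentMemoryHigh
        expr: container_memory_working_set_bytes{container="tfc-agent"} > 6 * 1024 * 1024 * 1024
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Terraform agent memory working set above 6 GiB"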