Introduction
This article describes how to verify whether a Terraform run is failing because the worker does not have enough memory to complete it.
Problem
A run in Terraform Cloud or Terraform Enterprise might fail with the following error message:
|
| Error: timeout while waiting for plugin to start
|
Cause
This error can occur when the worker executing the run in Terraform Cloud or Terraform Enterprise does not have enough memory to complete it.
Follow the steps below to verify this in the logs:
- Go to the workspace that is having the issue
- Under Variables -> add the following variable
  - Environment variable
  - Key -> TF_LOG
  - Value -> TRACE
- Under Settings -> General -> User Interface
  - Select Console UI
  - Save settings
- Start another run that fails with the same error
- Download the run details from the workspace by clicking the "View raw log" button
- Open this file and look for a message that contains the word "killed", like the example below
2023-05-31T11:29:51.574Z [DEBUG] provider: plugin process exited: path=.terraform/providers/registry.terraform.io/hashicorp/aws/4.65.0/linux_amd64/terraform-provider-aws_v4.65.0_x5 pid=254 error="signal: killed"
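Searching a large TRACE log by eye is tedious, so the check can be scripted. The following is a minimal sketch, assuming the raw log has been saved locally; the function name `find_killed_plugins` and the file name `raw-log.txt` are hypothetical and not part of any Terraform tooling.

```shell
# Hypothetical helper: print the log lines where a provider plugin
# process exited because it was killed.
# $1: path to the downloaded raw log (the file name is an assumption)
find_killed_plugins() {
    grep -n 'plugin process exited' "$1" | grep 'signal: killed'
}

# Usage (prints matching lines prefixed with their line number,
# and returns non-zero when nothing matches):
#   find_killed_plugins raw-log.txt
```

A match only shows that the plugin process received a kill signal. It is consistent with the worker running out of memory, but it is not proof on its own.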
Solution 1:
Use fewer providers and resources in your code by splitting them across multiple workspaces. Each run will then use less memory.
Solution 2:
For Terraform Enterprise, you can increase the amount of memory a worker is allowed to use. This is done through capacity management, as described in the Terraform Enterprise documentation.
- Check how much memory a worker is currently allowed to use
replicatedctl app-config export --template "{{.capacity_memory.Value}}"
- Increase the value, for example to 2048MB
replicatedctl app-config set capacity_memory --value "2048"
- Restart your TFE application for the settings to take effect
replicatedctl app stop
replicatedctl app start
- Start another run to verify that the issue is resolved
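The steps above can be combined into one script. This is only a sketch: it uses the same `replicatedctl` commands shown in the steps, the function name `raise_worker_memory` is hypothetical, and it assumes the export command prints nothing but the number of megabytes.

```shell
# Hypothetical wrapper around the replicatedctl steps listed above.
# $1: target worker memory limit in MB, for example 2048
raise_worker_memory() {
    target_mb=$1
    # Assumption: the export prints only the numeric value in MB.
    current_mb=$(replicatedctl app-config export --template "{{.capacity_memory.Value}}")
    case $current_mb in
        ''|*[!0-9]*)
            echo "unexpected capacity_memory value: $current_mb" >&2
            return 1
            ;;
    esac
    if [ "$current_mb" -ge "$target_mb" ]; then
        echo "capacity_memory is already ${current_mb}MB; nothing to change"
        return 0
    fi
    replicatedctl app-config set capacity_memory --value "$target_mb"
    # The restart interrupts the application, so plan it accordingly.
    replicatedctl app stop
    replicatedctl app start
}

# Usage:
#   raise_worker_memory 2048
```

The sketch skips the restart when the current limit is already at or above the target. It does not wait for the application to finish stopping before starting it again, so confirm the application state yourself if the start command fails.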
Please be aware of the following:
The default memory limit for the Terraform worker container is 512MB. Multiply that by the default concurrency limit of 10 and you'll need 5GB of memory just for Terraform runs. Then you'll need another 4GB for Terraform Enterprise and some left over for the OS.
If you increase your memory limit from 512MB to 2048MB, keep in mind that you'll still need 4GB for Terraform Enterprise and another few GB for the OS. You may need to increase the amount of RAM on your server. If you don't, you will run into the same errors again, along with performance issues.
To give you an idea of the sizing:
- 16GB RAM for a concurrency of 10 and a 512MB memory limit
- 24GB RAM for a concurrency of 10 and a 1024MB memory limit
- 32GB RAM for a concurrency of 10 and a 2048MB memory limit
Outcome
After making the suggested changes, the run should complete without issue. If you are still seeing issues, please create a ticket with HashiCorp Support and attach the log file of the failed run.
Additional Information
- Capacity management is described in the Terraform Enterprise documentation.