Problem
A run in Terraform Enterprise may fail with the following error message, indicating that a provider plugin process terminated unexpectedly.
Error: timeout while waiting for plugin to start
Cause
This error can occur when the worker executing the run does not have enough memory, causing the operating system to terminate the plugin process. You can verify this by inspecting the detailed run logs for a signal: killed message.
Log Verification Steps
- Navigate to the workspace experiencing the issue.
- Add an environment variable with the key TF_LOG and the value TRACE to enable detailed logging.
- In the workspace settings under General > User Interface, select Console UI and save the settings.
- Start a new run to reproduce the failure.
- Download the raw log file by clicking the View raw log button in the run details.
- Search the downloaded log file for a line containing signal: killed, similar to the example below (a search sketch follows this list). This message confirms the process was terminated due to resource constraints.

2023-05-31T11:29:51.574Z [DEBUG] provider: plugin process exited: path=.terraform/providers/registry.terraform.io/hashicorp/aws/4.65.0/linux_amd64/terraform-provider-aws_v4.65.0_x5 pid=254 error="signal: killed"
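If the raw log is large, searching it from a terminal can be quicker than scrolling. A minimal sketch, assuming the log was saved locally as run.log (the filename is illustrative):

# print any matching lines with their line numbers
grep -n "signal: killed" run.log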
Alternatively, follow the steps in the How to Identify when Terraform Enterprise Runs Exceed Memory Capacity article to identify Linux OOM-killer events in the system logs.
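As a quick check on the Terraform Enterprise host itself, kernel OOM-killer events are usually visible in the kernel log. A minimal sketch, assuming shell access to the host (log locations and required privileges vary by distribution):

# both commands may require root; a line such as "Out of memory: Killed process"
# confirms the kernel terminated the process
dmesg -T | grep -i "out of memory"
journalctl -k | grep -i oom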
Solutions
Solution 1: Reduce Memory Usage
Reduce the memory required for a single run by splitting large configurations across multiple workspaces. Using fewer providers and resources in a single configuration decreases its memory footprint.
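Before deciding how to split a configuration, it can help to gauge its current size. A minimal sketch using standard Terraform CLI commands, run from the configuration directory (the counts are only a rough proxy for plan-time memory use):

terraform providers           # list every provider the configuration requires
terraform state list | wc -l  # count the resources currently under management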
Solution 2: Increase Worker Memory
You can increase the memory allocated to the Terraform worker container using the following steps:
Flexible Deployment Options
Edit the TFE_CAPACITY_MEMORY setting in your Terraform Enterprise deployment file (e.g., the Docker Compose YAML or Helm chart values) and restart Terraform Enterprise.
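For example, on a Docker Compose based deployment you might raise the limit to 2048MB and recreate the container. A minimal sketch (the compose file path is illustrative; TFE_CAPACITY_MEMORY is expressed in megabytes):

# check the current value in the compose file
grep TFE_CAPACITY_MEMORY /etc/terraform-enterprise/docker-compose.yml
# edit the environment block so it reads TFE_CAPACITY_MEMORY: 2048, then recreate:
docker compose -f /etc/terraform-enterprise/docker-compose.yml up --detach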
Replicated
- Check the currently allocated memory for a worker.
# replicatedctl app-config export --template "{{.capacity_memory.Value}}"
- Increase the memory value. For example, to set it to 2048MB:
# replicatedctl app-config set capacity_memory --value "2048"
- Apply the changes by restarting the Terraform Enterprise application.
# replicatedctl app stop
# replicatedctl app start
- Trigger another run to verify the issue is resolved.
Note on Capacity Planning: The default memory limit for a Terraform worker is 512MB on Replicated deployments and 2048MB on Flexible Deployment Options. When increasing this limit, ensure the host instance has sufficient total RAM: account for the memory required by the Terraform Enterprise application (approx. 4GB), the operating system, and the cumulative memory for all concurrent runs (TFE_CAPACITY_MEMORY * TFE_CAPACITY_CONCURRENCY). A worked example follows the sizing list below.
Example sizing recommendations:
- 16GB RAM: Supports 10 concurrent runs with a 512MB memory limit.
- 24GB RAM: Supports 10 concurrent runs with a 1024MB memory limit.
- 32GB RAM: Supports 10 concurrent runs with a 2048MB memory limit.
Insufficient host RAM can lead to performance degradation or other system errors.
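As a worked example for the 32GB row above (values in megabytes, plain shell arithmetic):

# app (~4GB) + 10 concurrent runs x 2048MB per run
echo $(( 4096 + 10 * 2048 ))   # prints 24576

The remaining ~8GB of the 32GB host covers the operating system and headroom.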
Outcome
After applying one of the suggested solutions, subsequent Terraform runs should complete successfully without the timeout error.
Additional Information
- For more details on managing TFE resources, refer to the capacity management documentation.