Problem
When you run a plan in a Terraform Enterprise (TFE) Flexible Deployment Options (FDO) installation on Kubernetes, the run fails with a generic error message such as "Plan errored" or "undefined" and gives no clear cause.
Prerequisites
- You have Terraform Enterprise with Flexible Deployment Options running on a Kubernetes cluster.
Cause
Terraform Enterprise creates a new task pod in the terraform-enterprise-agents namespace for each run. This error often occurs when the Kubernetes cluster lacks sufficient memory or CPU to schedule and start that pod.
You can verify this by checking the events in the agent namespace.
- Check the Kubernetes events for the agent namespace.
$ kubectl get events -n terraform-enterprise-agents
- Review the output for resource-related errors. An insufficient memory error may appear similar to the following example.
##...
0/1 nodes are available: 1 Insufficient memory. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod..
##...
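On a busy cluster the event list can be long, so it may help to narrow it to scheduling failures. The following is a minimal sketch, assuming the scheduler reports these failures with the standard FailedScheduling reason; the function name tfe_scheduling_failures is hypothetical and used only for illustration.

```shell
# Namespace from this article; adjust if your installation differs.
NS="terraform-enterprise-agents"

# List scheduling failures sorted by time and keep only the lines
# that mention a CPU or memory shortage.
# (Function name is illustrative, not part of TFE or kubectl.)
tfe_scheduling_failures() {
  kubectl get events -n "$NS" \
    --field-selector reason=FailedScheduling \
    --sort-by=.lastTimestamp \
  | grep -E 'Insufficient (cpu|memory)'
}
```

Running tfe_scheduling_failures against the cluster prints nothing when no resource-related scheduling failures are recorded, which suggests the cause lies elsewhere.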
Solutions
Solution 1: Increase Cluster Resources
The primary recommendation is to resize your Kubernetes cluster to ensure it has adequate resources to schedule the TFE task agent pods.
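Before resizing, it can be useful to see how much CPU and memory each node can allocate and how much is already requested. This is a rough sketch using standard kubectl output; the function names are hypothetical, and the number of lines shown after "Allocated resources" is an assumption about the layout of kubectl describe output.

```shell
# Show allocatable CPU and memory per node.
# (Function names are illustrative only.)
node_allocatable() {
  kubectl get nodes \
    -o custom-columns=NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory
}

# Show the resource requests and limits already committed on each node.
# The -A 5 context length is an assumption about the describe layout.
node_allocated() {
  kubectl describe nodes | grep -A 5 'Allocated resources'
}
```

Comparing the two outputs gives an approximate idea of how much headroom remains for new task pods.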
Solution 2: Adjust TFE Application Settings
If increasing cluster resources is not immediately possible, you can adjust TFE's capacity and concurrency settings to better fit the constraints of your Kubernetes cluster.
Refer to the TFE Application Settings documentation and consider altering the following:
- CPU/Memory limits: Adjust the CPU and memory that a task is allowed to consume, using the TFE_CAPACITY_CPU and TFE_CAPACITY_MEMORY environment variables.
- Concurrency: Modify the TFE_CAPACITY_CONCURRENCY environment variable to limit the number of concurrent runs.
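If TFE was installed with Helm, one way to change the concurrency setting might look like the sketch below. The release name, namespace, chart reference, and the env.variables values path are all assumptions, not details from this article; check them against your own values file before use. The function name set_tfe_concurrency is hypothetical.

```shell
# ASSUMPTIONS (not from this article): release name, namespace, and chart
# reference are placeholders, and env.variables is assumed to be the values
# path your chart uses for TFE environment variables.
RELEASE="terraform-enterprise"
TFE_NS="terraform-enterprise"
CHART="hashicorp/terraform-enterprise"

# Set the number of concurrent runs while keeping all other values.
# Usage: set_tfe_concurrency 5
set_tfe_concurrency() {
  helm upgrade "$RELEASE" "$CHART" -n "$TFE_NS" \
    --reuse-values \
    --set-string "env.variables.TFE_CAPACITY_CONCURRENCY=$1"
}
```

If you manage the installation through a values file, setting the variable there and upgrading with that file is likely the safer route, since it keeps the change under version control.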
Outcome
After allocating sufficient resources or adjusting application settings, TFE runs should execute successfully. You can confirm that the task pods are starting correctly by listing the pods in the agent namespace.
$ kubectl get pods -n terraform-enterprise-agents
A successful run will show a task pod in the Running state.
NAME                                                  READY   STATUS    RESTARTS   AGE
tfe-task-ddab255d-7dc6-4ca9-8ecb-b47e8a27bee0-klzkh   1/1     Running   0          11s
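To check whether any task pods are still waiting to be scheduled, you could filter the pod list by phase. This is a small sketch using the standard status.phase field selector; the function name pending_task_pods is hypothetical.

```shell
# Namespace from this article; adjust if your installation differs.
NS="terraform-enterprise-agents"

# List task pods that have not been scheduled or started yet.
# (Function name is illustrative only.)
pending_task_pods() {
  kubectl get pods -n "$NS" --field-selector=status.phase=Pending
}
```

An empty result while a run is in progress suggests that the scheduler is placing task pods without waiting on resources.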
Additional Information
- For more details on configuration, refer to the TFE FDO Kubernetes settings in the official documentation.