Problem
Autoscaling in Kubernetes enables dynamic scaling of workloads based on resource utilization. However, for versions of the Terraform Enterprise (TFE) Flexible Deployment Option (FDO) earlier than v202501-1, autoscaling configurations are not officially supported. Enabling autoscaling on these versions can lead to queued Terraform runs and degraded workflow efficiency.
Prerequisites
- Terraform Enterprise (TFE) Flexible Deployment Option (FDO) on Kubernetes, versions prior to v202501-1.
Cause
In TFE FDO versions earlier than v202501-1, cluster scaling events during periods of heavy load can trigger micro-outages at the workspace plan and apply level. These disruptions stem from how these unsupported versions manage resource allocation and workload distribution during scaling.
This can result in the following behaviors:
- Queued Runs: Ongoing and new runs are delayed and enter a queued state.
- Ineffective Node Scaling: Even when Kubernetes deploys additional nodes, TFE cannot efficiently rebalance workloads, preventing the expected performance improvement.
- Resource Allocation Lag: TFE struggles to recognize and utilize newly added resources during autoscaling events, which compounds delays.
Solutions
Solution 1: Upgrade Terraform Enterprise
The recommended solution is to upgrade your TFE instance to version v202501-1 or newer. In these versions, Kubernetes autoscaling is officially supported, and the underlying issues with resource allocation during scaling events have been resolved.
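If you deploy TFE with the official terraform-enterprise Helm chart, the upgrade typically amounts to pinning the new release in your overrides file and rolling it out with `helm upgrade`. The sketch below is illustrative: the repository and name values mirror the chart's defaults, and your overrides file name and registry may differ.

```yaml
# overrides.yaml (illustrative snippet; confirm against your existing values)
image:
  repository: images.releases.hashicorp.com
  name: hashicorp/terraform-enterprise
  tag: v202501-1  # first release with official Kubernetes autoscaling support
```

After updating the tag, run `helm upgrade` with your overrides file and confirm the new pods become healthy before relying on autoscaling.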
Solution 2: Adjust Worker Timeout for Existing Installations
If upgrading immediately is not possible, you can mitigate the issue in an unsupported configuration by adjusting a timeout setting. With autoscaling enabled in your Kubernetes environment, increase the TFE_RUN_PIPELINE_KUBERNETES_WORKER_TIMEOUT setting so that pending run pods can wait out node provisioning instead of timing out, in line with the recommended configuration for cluster autoscaling. This can reduce how often runs queue during scaling events; a sketch follows below.
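The following is a minimal sketch, assuming TFE is deployed with the official terraform-enterprise Helm chart, which passes plain environment variables through the `env.variables` map in your overrides file. The 300-second value is illustrative only; size it to how long your cluster typically takes to provision and ready a new node.

```yaml
# overrides.yaml (illustrative snippet, not a complete values file)
env:
  variables:
    # Allow pending run pods to wait for a new node to come online during
    # autoscaling events instead of timing out. Value is in seconds; 300 is
    # an assumed example, not an official recommendation.
    TFE_RUN_PIPELINE_KUBERNETES_WORKER_TIMEOUT: "300"
```

Apply the change with your usual `helm upgrade ... -f overrides.yaml` invocation. Note that this only reduces the symptom; it does not make autoscaling supported on pre-v202501-1 versions.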