Problem
When you run Terraform Enterprise in a Kubernetes environment, pods may crash due to ephemeral storage exhaustion. This occurs because logs written to the /var/log/terraform-enterprise directory eventually consume all allocated space, particularly if the ephemeral storage limit is set too low.
You can check the ephemeral storage requests and limits configured for the Terraform Enterprise pod by running the following command.
$ kubectl describe pod <tfe-pod-name> -n <namespace>
## ...
    Limits:
      ephemeral-storage:  1Gi
    Requests:
      ephemeral-storage:  1Gi
## ...
Cause
Terraform Enterprise uses Supervisord to manage service logs. The default Supervisord configuration sets logfile_backups to 10 and logfile_maxbytes to 50MB. With these settings, each of the approximately 20 services can retain up to 10 rotated log files of 50MB each, for a potential storage consumption of 500MB per service.
With around 20 services, the total log storage can reach nearly 10GB. If the pod's ephemeral storage is allocated at a low value, such as 1Gi or 2Gi, it will eventually fill up, causing the pod to crash.
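The estimate above can be reproduced with a few lines of shell arithmetic. This is only a rough sketch: the variable names are illustrative, the service count of 20 is the approximate figure quoted above, and the calculation counts only the retained 50MB files described in this article.

```shell
#!/bin/sh
# Rough worst-case log footprint, using the figures quoted in this article.
services=20   # approximate number of services (assumption from the text)
backups=10    # logfile_backups
max_mb=50     # logfile_maxbytes, expressed in MB

per_service_mb=$((backups * max_mb))
total_mb=$((services * per_service_mb))

echo "Per service:  ${per_service_mb} MB"
echo "All services: ${total_mb} MB"
```

Changing any of the three inputs shows how quickly the total moves relative to a 1Gi or 2Gi ephemeral storage limit.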
Solutions
Solution 1: Increase Ephemeral Storage Limit
The recommended ephemeral storage size for a Terraform Enterprise pod is 10Gi or higher. To apply this change, update your pod or deployment YAML manifest by setting the ephemeral-storage limit and request to the recommended value.
## ...
resources:
  limits:
    ephemeral-storage: "10Gi"
  requests:
    ephemeral-storage: "10Gi"
## ...
Outcome
After applying the updated configuration, the Terraform Enterprise pod will have sufficient ephemeral storage to accommodate log rotation without crashing, leading to improved stability.