Introduction:
When Terraform Enterprise (TFE) runs on Kubernetes, it writes logs to ephemeral storage, specifically inside the container under the path /var/log/terraform-enterprise. Over time these logs fill the storage, and when the ephemeral storage allocation is too small the pod crashes for lack of available space. Below are some potential workarounds to address this issue.
Scenario:
When a customer is running Terraform Enterprise (FDO) on Kubernetes, ephemeral storage is used by TFE to write logs. Over time, this storage can fill up, leading to pod crashes. This typically occurs when the ephemeral storage allocation is too low. Below is an example of how to check the ephemeral storage allocation for the TFE pod.
kubectl describe pod <tfe-pod-name> -n <namespace>
...
    Limits:
      ephemeral-storage:  1Gi
    Requests:
      ephemeral-storage:  1Gi
Cause:
In general, Supervisord is responsible for managing logs. One of its primary functions is to maintain an activity log that records its operations in real time.
In Terraform Enterprise (TFE), the default Supervisord configuration sets each service's logfile_backups to 10 and logfile_maxbytes to 50MB. Each service log is therefore rotated once it reaches 50MB, with up to 10 rotated files kept, for a total of up to 500MB per service. With approximately 20 services, logs can consume around 10 GB of ephemeral storage. If the allocated ephemeral storage is set below 2 GB, it will eventually fill up with logs, leading to pod crashes.
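The estimate above can be reproduced with a short shell calculation. This is a rough sketch only: the variable names are illustrative, the service count of 20 is the approximate figure quoted above, and 1 GB is treated as 1000 MB.

```shell
# Worst-case log footprint based on the Supervisord settings described above.
logfile_backups=10       # log files kept per service, as counted in the estimate
logfile_maxbytes_mb=50   # maximum size of each log file, in MB
services=20              # approximate number of services (figure from the text)

per_service_mb=$((logfile_backups * logfile_maxbytes_mb))
total_mb=$((per_service_mb * services))

echo "Per service: ${per_service_mb} MB"
echo "All services: ${total_mb} MB (about $((total_mb / 1000)) GB)"
```

With these values the script reports 500 MB per service and 10000 MB in total. Changing the service count shows how quickly the total moves away from a 1Gi or 2Gi allocation.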
Recommendation:
The current recommended ephemeral storage size is approximately 10 GB or higher. To apply this, update the pod or deployment YAML by setting the ephemeral-storage limit to 10Gi or higher.
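As a sketch of what that change might look like, the snippet below writes a resources fragment to a file for review. The file name tfe-resources-fragment.yaml is hypothetical, and raising the request to match the limit is an assumption rather than something this article prescribes. In a Deployment, the fragment belongs under the TFE container entry in spec.template.spec.containers.

```shell
# Write a hypothetical container "resources" fragment for review.
# Only the 10Gi limit comes from the recommendation above; the matching
# request is an assumption and can be set lower if scheduling requires it.
cat > tfe-resources-fragment.yaml <<'EOF'
resources:
  requests:
    ephemeral-storage: 10Gi   # assumption: request raised to match the limit
  limits:
    ephemeral-storage: 10Gi   # recommended minimum from this article
EOF

# Show the fragment so it can be merged into the pod or deployment YAML.
cat tfe-resources-fragment.yaml
```

After merging the fragment into the manifest used by your deployment method and rolling out the change, re-run the kubectl describe pod command shown earlier to confirm that the new limit is in effect.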
Additional Information