Problem
When running Terraform Enterprise (TFE) on Kubernetes, pods may enter a crash loop if the TFE image has been improperly customised. Specifically, if the image is modified to run as the root user instead of the built-in terraform-enterprise user (for example, to add certificates), the nginx component may fail to start because of permission denied errors when it tries to open its log file.
You may observe the following symptoms:
1. TFE pods enter a crash loop during startup.
2. Pod logs contain permission denied errors, such as:
{"component":"nginx","log":"nginx: [alert] could not open error log file: open() \"/var/log/terraform-enterprise/nginx.log\" failed (13: Permission denied)"}
3. Nginx exits unexpectedly, with log entries like:
INFO exited: nginx (exit status 1; not expected)
4. Logs show that Supervisord runs as root instead of the expected terraform-enterprise user:
Not running as builtin tfe user, will attempt to create scratch directories but skipping ownership changes...
CRIT Supervisor is running as root. Privileges were not dropped because no user is specified in the config file.
5. You can also confirm the issue by running the id command inside the container. If the process runs as root (UID 0) instead of terraform-enterprise (UID 1000), the image is misconfigured.
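To check a pod's logs for the symptoms above in one pass, the following is a minimal sketch. scan_tfe_logs is a hypothetical helper, not part of TFE or kubectl; it reads log text on standard input and matches the messages quoted above, which may be worded differently in other TFE releases.

```shell
#!/bin/sh
# Hypothetical helper (not part of TFE): scan TFE pod logs on stdin for the
# symptoms listed above. The patterns come from the log lines quoted in this
# article and may differ in other TFE releases.
# Returns 1 if any symptom is found, 0 otherwise.
scan_tfe_logs() {
  logs="$(cat)"
  found=0
  if printf '%s\n' "$logs" | grep -q 'nginx\.log.*Permission denied'; then
    echo "symptom: nginx cannot open its log file (permission denied)"
    found=1
  fi
  if printf '%s\n' "$logs" | grep -q 'exited: nginx'; then
    echo "symptom: nginx exited unexpectedly"
    found=1
  fi
  if printf '%s\n' "$logs" | grep -q 'Supervisor is running as root'; then
    echo "symptom: supervisord is running as root"
    found=1
  fi
  if [ "$found" -eq 0 ]; then
    echo "no known symptoms found"
  fi
  return "$found"
}

# Usage (replace the placeholders; for a crash-looping pod, --previous reads
# the logs of the last terminated container):
#   kubectl logs --previous <TFE pod name> -n <TFE namespace> | scan_tfe_logs
```

Each check is independent, so a log that shows several symptoms prints one line per symptom.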
Cause
The root cause of this issue is an improperly customised TFE image that runs the application as the root user instead of the built-in terraform-enterprise user. TFE is designed to run as the non-root terraform-enterprise user for security and correct file permissions.
When the TFE image has to be customised (for example, to add certificates), the Dockerfile may switch to the root user to perform privileged operations. A USER instruction stays in effect for the rest of the build stage, and if no USER is set at all, Docker runs containers as root by default. If the image does not switch back to terraform-enterprise afterwards, the application runs as root, leading to permission issues.
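Before deploying, you can check which user a built image will run as by reading its configured user with docker image inspect --format '{{.Config.User}}' followed by the image name; an empty value means no USER was set, so Docker falls back to root. The helper below is a hypothetical sketch (image_user_is_root is not part of TFE or Docker) that interprets that value, assuming the image is built and inspected with Docker.

```shell
#!/bin/sh
# Hypothetical helper: decide whether an image will run as root, given the
# value printed by:
#   docker image inspect --format '{{.Config.User}}' <custom TFE image>
# An empty value means no USER was set in the image, so Docker defaults to root.
# Returns 0 if the image runs as root, 1 otherwise.
image_user_is_root() {
  cfg_user="${1%%:*}"   # drop an optional ":group" suffix, e.g. "0:0" -> "0"
  case "$cfg_user" in
    ""|root|0) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (placeholder image name):
#   if image_user_is_root "$(docker image inspect --format '{{.Config.User}}' <custom TFE image>)"; then
#     echo "image runs as root; add USER terraform-enterprise after privileged steps"
#   fi
```

The check looks only at the user name or UID before any colon, since the group part does not affect whether the process runs as root.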
Solution
Ensure the customised TFE image runs as the terraform-enterprise user after any modifications. If a step requires switching to root, switch back to the non-root terraform-enterprise user once those operations are complete, for example:
USER root
...
RUN update-ca-certificates
USER terraform-enterprise
Then rebuild the Docker image with the updated Dockerfile and redeploy TFE using the new image.
To confirm the change, check the user running inside the TFE container. It should be terraform-enterprise:
$ kubectl exec <TFE pod name> -n <TFE namespace> -- id
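If you run this check regularly, for example in a post-deployment script, the output of id can be classified with a small helper. This is a minimal sketch: check_tfe_user is a hypothetical name, and it assumes the UID 1000 and user name terraform-enterprise described above.

```shell
#!/bin/sh
# Hypothetical helper: classify the output of `id` captured from a TFE container.
# Expected: uid=1000(terraform-enterprise); a misconfigured image shows uid=0(root).
# Returns 0 for terraform-enterprise, 1 for root, 2 for anything else.
check_tfe_user() {
  id_output="$1"
  case "$id_output" in
    'uid=1000(terraform-enterprise)'*)
      echo "ok: running as terraform-enterprise"; return 0 ;;
    'uid=0('*)
      echo "problem: running as root"; return 1 ;;
    *)
      echo "unexpected user: $id_output"; return 2 ;;
  esac
}

# Usage (replace the placeholders):
#   check_tfe_user "$(kubectl exec <TFE pod name> -n <TFE namespace> -- id)"
```

Distinct return codes let a calling script tell a root-running image apart from some other unexpected user.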
Outcome
TFE pods should start successfully.
References