Problem
Terraform Enterprise fails to start when either the /var mount or the Docker root directory mount (docker info -f '{{ .DockerRootDir}}') uses the noexec option.
Prerequisites
- Terraform Enterprise
Cause
- The noexec option is set for /var or the Docker root directory in /etc/fstab, for example:
UUID=920cd144-09bc-487c-ad56-0f09fdd53cb9 /var xfs nodev,noexec,nosuid 0 2
UUID=cd2822cd-6ed5-4bda-9912-eb013f62d763 /var/lib/docker xfs nodev,noexec,nosuid 0 2
- If the noexec option was applied to the mount before Terraform Enterprise was installed, the failure happens during the bootstrapping phase:
# tfe-bootstrap
/tfe-bootstrap.sh: line 12: /build-worker-metadata-firewall.sh: Permission denied
# TFE startup failed
[
{
"AppID": "0129182acab44ab5534b4b6205257550",
"Sequence": 722,
"PatchSequence": 0,
"State": "stopped",
"DesiredState": "started",
"Error": "Container tfe-bootstrap failed: Container 31da3bdbdcbdb79ef7513eda30cdd9f968e46b84e26c6d20be13a39144696ea8 exited with non-zero exit status 126: ",
"IsCancellable": false,
"IsTransitioning": false,
"LastModifiedAt": "2023-07-28T14:17:30.278240957Z"
}
]
- When the noexec option is applied after installing Terraform Enterprise, multiple containers fail to start, including tfe-vault, tfe-health-check, tfe-state-parser, tfe-backup-restore and tfe-base-startup. The Terraform Enterprise application startup eventually times out, leaving it in a failed state.
# tfe-vault
...
+ starting vault
...
Vault is already initialized
+ killing vault with pid 23
==> Vault shutdown triggered
+ vault has exited
+ exiting vault setup with 0
+ Retrieving Vault unseal key
+ Retrieving Vault root token
+ Setting IPC lock...
/usr/bin/vault-start: line 182: /gosu/gosu: Permission denied
+ Starting vault
# tfe-health-check
/usr/bin/setup-ca-certificates.sh: exec: line 41: /gosu/gosu: Permission denied
# tfe-state-parser
/usr/bin/setup-ca-certificates.sh: exec: line 41: /gosu/gosu: Permission denied
# tfe-backup-restore
INFO: Vault token retrieval timeout not yet reached
ERROR: Operation timed out waiting for vault token
# tfe-base-startup
INFO: Vault token retrieval timeout not yet reached
ERROR: Operation timed out waiting for vault token
# TFE startup failed
[
{
"AppID": "6e613d8ecae148c3642c6588fe75b597",
"Sequence": 722,
"PatchSequence": 0,
"State": "stopped",
"DesiredState": "started",
"Error": "Container tfe-base-startup failed: Container 56bd2bc2dcd8f523e281923354fe41fd913e0c12f9f569fdd82c3a5115eb0d23 exited with non-zero exit status 1: ",
"IsCancellable": false,
"IsTransitioning": false,
"LastModifiedAt": "2023-07-19T15:37:12.954203171Z"
}
]
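To confirm this cause on a given host, the effective mount options can be inspected directly. The following is a minimal sketch, assuming findmnt (util-linux) is available; has_noexec and check_path are hypothetical helper names used only for illustration and are not part of Terraform Enterprise or Replicated.

```shell
#!/bin/sh
# has_noexec OPTIONS: succeed if the comma-separated option list contains noexec.
# Hypothetical helper for illustration only.
has_noexec() {
  case ",$1," in
    *,noexec,*) return 0 ;;
    *) return 1 ;;
  esac
}

# check_path PATH: report the options of the mount that backs PATH.
check_path() {
  path="$1"
  opts="$(findmnt -n -o OPTIONS --target "$path" 2>/dev/null)" || return 2
  if has_noexec "$opts"; then
    echo "$path: noexec is set ($opts)"
  else
    echo "$path: ok ($opts)"
  fi
}

# Check /var and, when Docker is reachable, the Docker root directory.
if command -v findmnt >/dev/null 2>&1; then
  check_path /var
  if command -v docker >/dev/null 2>&1; then
    docker_root="$(docker info -f '{{ .DockerRootDir }}' 2>/dev/null)"
    if [ -n "$docker_root" ]; then
      check_path "$docker_root"
    fi
  fi
fi
```

Any path reported with noexec set is a candidate for the fixes described in the solutions.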
Solution #1:
- Stop the Terraform Enterprise application, Replicated services, Docker and associated services.
# Stop TFE
replicatedctl app stop
# Verify it has successfully stopped before proceeding
replicatedctl app status
# Stop Replicated services
sudo systemctl stop replicated replicated-operator replicated-ui
# Stop Docker
sudo systemctl stop docker
# Stop Docker socket
sudo systemctl stop docker.socket
# Stop containerd
sudo systemctl stop containerd
- Edit /etc/fstab and remove the noexec option from the /var or Docker root directory mount, save the changes and remount the partition.
# Use your favorite editor, example: vim
sudo vim /etc/fstab
# Example entry in fstab for /var/lib/docker
UUID=cd2822cd-6ed5-4bda-9912-eb013f62d763 /var/lib/docker xfs nodev,nosuid 0 2
# Remount the partition
sudo mount -o remount /var/lib/docker
- Start Docker, verify that Replicated has started successfully, and start Terraform Enterprise.
# Start Docker
sudo systemctl start docker
# Verify Replicated is ready before proceeding
replicatedctl system status
# Start Terraform Enterprise
replicatedctl app start
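The fstab edit in the steps above can also be previewed non-interactively before touching the real file. This is a sketch only: remove_noexec is a hypothetical helper that prints a modified copy of an fstab-style file and never writes to /etc/fstab itself. It assumes whitespace-separated fields with the mount options in the fourth field, and it rewrites the matched line with single spaces between fields.

```shell
#!/bin/sh
# remove_noexec FILE MOUNTPOINT: print FILE with noexec dropped from the
# options field of the entry for MOUNTPOINT. Other lines are printed unchanged.
# Hypothetical helper for illustration only.
remove_noexec() {
  awk -v mp="$2" '
    $1 !~ /^#/ && $2 == mp {
      n = split($4, opts, ",")
      out = ""
      for (i = 1; i <= n; i++)
        if (opts[i] != "noexec")
          out = out (out == "" ? "" : ",") opts[i]
      # Fall back to "defaults" if noexec was the only option.
      $4 = (out == "" ? "defaults" : out)
    }
    { print }
  ' "$1"
}
```

For example, remove_noexec /etc/fstab /var/lib/docker > /tmp/fstab.new writes a candidate file that can be compared against the original with diff before it is copied into place with sudo.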
Solution #2:
- If internal policies require the /var partition to stay mounted with the noexec option, ensure that /var/lib/docker (or the alternative Docker root directory mount) excludes the noexec option. In addition, create a separate partition of at least 2GB for the /var/lib/replicated-operator mount, also without the noexec option.
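The layout described in this solution can be sanity-checked once the partitions are in place. The sketch below assumes findmnt and df are available, assumes the default Docker root of /var/lib/docker, and reads the 2GB minimum as 2 GiB. size_ok and report_mount are hypothetical helper names.

```shell
#!/bin/sh
# Minimum size in KiB; assumes "2GB" is read as 2 GiB.
MIN_KIB=$((2 * 1024 * 1024))

# size_ok KIB: succeed if KIB meets the minimum size.
size_ok() {
  [ "$1" -ge "$MIN_KIB" ]
}

# report_mount PATH: print mount options and size for the mount backing PATH.
report_mount() {
  path="$1"
  opts="$(findmnt -n -o OPTIONS --target "$path" 2>/dev/null)" || {
    echo "$path: mount not found"
    return 0
  }
  size_kib="$(df -Pk "$path" | awk 'NR == 2 { print $2 }')"
  case ",$opts," in
    *,noexec,*) echo "$path: noexec is set" ;;
    *) echo "$path: noexec is not set" ;;
  esac
  if size_ok "$size_kib"; then
    echo "$path: size ${size_kib} KiB meets the minimum"
  else
    echo "$path: size ${size_kib} KiB is below the minimum"
  fi
}

for p in /var/lib/docker /var/lib/replicated-operator; do
  report_mount "$p"
done
```

The size requirement in this article applies only to /var/lib/replicated-operator; the size line for the Docker root directory is informational.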
Outcome
The Terraform Enterprise application will now start successfully. Watch the containers being launched and check the logs:
- Watch the Docker containers being launched
watch docker ps -f network=tfe_services
- Check the logs of the previously failed containers to confirm the issue has been resolved, for example:
$ docker logs -f tfe-vault
...
+ Configuring Vault
+ mounting vault transit backend
+ tuning vault lease TTLs
Success! Data written to: sys/mounts/auth/token/tune
+ adding vault policy
Success! Uploaded policy: atlas
+ Creating vault token for use by services
+ Successfully created vault token
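As an alternative to watching the containers manually, the application state can be polled until it reports started. This is a rough sketch that assumes replicatedctl app status prints JSON in the same shape as the status output shown earlier in this article. app_state and wait_for_started are hypothetical helper names, and the attempt count and delay are arbitrary defaults.

```shell
#!/bin/sh
# app_state: read status JSON on stdin and print the value of the "State" field.
# The pattern requires a quote directly before State, so "DesiredState" is skipped.
app_state() {
  sed -n 's/.*"State": *"\([^"]*\)".*/\1/p' | head -n 1
}

# wait_for_started [ATTEMPTS] [DELAY_SECONDS]: poll until the application
# state is "started"; return 1 if the attempts run out first.
wait_for_started() {
  attempts="${1:-60}"
  delay="${2:-10}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    state="$(replicatedctl app status 2>/dev/null | app_state)"
    echo "attempt $i: state=${state:-unknown}"
    if [ "$state" = "started" ]; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

Calling wait_for_started 60 10 after replicatedctl app start polls for up to ten minutes. A non-zero return suggests checking the container logs as described above.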