Terraform Enterprise Replicated app won't start after upgrade with an error "manifest unknown" – HashiCorp Help Center

Introduction

After upgrading a Terraform Enterprise (TFE) instance installed with the Replicated method, the application does not start. The error message reads: Error response from daemon: manifest for replicated/replicated:current not found: manifest unknown: manifest unknown.

Problem

Each TFE release supports specific Docker and Replicated versions.

If Docker on the TFE host was upgraded during the upgrade process, Docker pulls the service container images from a remote repository; these images are tagged current. Each image is referenced at TFE startup in its respective systemd service file. When the Replicated service files have been manually changed to a specific version, startup fails with the error message shown below:

Dec 30 23:28:33 tfehost systemd[1]: Starting Replicated Service...
Dec 30 23:28:33 tfehost docker[21167]: Error: No such container: replicated
Dec 30 23:28:33 tfehost systemd[1]: Started Replicated Service.
Dec 30 23:28:33 tfehost docker[21185]: Unable to find image 'replicated/replicated:current' locally
Dec 30 23:28:34 tfehost docker[21185]: docker: Error response from daemon: manifest for replicated/replicated:current not found: manifest unknown: manifest unknown.
Dec 30 23:28:34 tfehost docker[21185]: See 'docker run --help'.
Dec 30 23:28:34 tfehost systemd[1]: replicated.service: main process exited, code=exited, status=125/n/a
Dec 30 23:28:34 tfehost systemd[1]: Unit replicated.service entered failed state.
Dec 30 23:28:34 tfehost systemd[1]: replicated.service failed.
Dec 30 23:28:41 tfehost systemd[1]: replicated.service holdoff time over, scheduling restart.
Dec 30 23:28:41 tfehost systemd[1]: Stopped Replicated Service.
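
To see quickly which image reference Docker could not resolve, the log lines above can be filtered. The following is a minimal sketch, not an official HashiCorp or Replicated tool: the function name missing_manifests is hypothetical, and it assumes the log text has the same "manifest for <image> not found" wording shown above.

```shell
#!/bin/sh
# Hypothetical helper: print every image reference that Docker reported
# as missing, one per line, without duplicates.
# Reads log text on stdin, so it works with live journal output or a saved log file.
missing_manifests() {
    sed -n 's/.*manifest for \([^ ]*\) not found.*/\1/p' | sort -u
}

# Example (assumes the logs are available through the systemd journal):
#   journalctl -u replicated --no-pager | missing_manifests
```

For the log excerpt above, the function would print replicated/replicated:current, which is the tag to compare against the one referenced in the service file.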

Cause

The error can occur when the TFE version is changed or Docker is upgraded while the Replicated version is manually set to a specific version in a service file.

Docker is unable to start the Replicated containers and the commands fail with the errors above.

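The Replicated service file (replicated.service) has been manually edited so that the image is pinned to a specific version, e.g. replicated/replicated:stable-2.53.2, as opposed to replicated/replicated:current. The two listings below show the file with the current tag and with the pinned tag, respectively: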

# pwd
/etc/systemd/system

system# cat replicated.service
-------some output omitted for brevity---
 $REPLICATED_OPTS \
replicated/replicated:current
ExecStop=/usr/bin/docker stop replicated
Restart=on-failure
RestartSec=7

-------------------------
system# cat replicated.service
-------some output omitted for brevity---
$REPLICATED_OPTS \
replicated/replicated:stable-2.53.2
ExecStop=/usr/bin/docker stop replicated
Restart=on-failure
RestartSec=7
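
Rather than reading each service file by eye, the image references can be extracted directly. This is a minimal sketch under stated assumptions: the function name show_image_tags is hypothetical, and the glob in the usage comment assumes that any other Replicated unit files live in the same directory and follow the replicated*.service naming, which this article does not confirm.

```shell
#!/bin/sh
# Hypothetical helper: print each replicated image reference found in the
# given unit files, prefixed with the file name it came from.
show_image_tags() {
    grep -H -o 'replicated/replicated[^[:space:]]*' "$@"
}

# Example (the glob is an assumption, see above):
#   show_image_tags /etc/systemd/system/replicated*.service
```

A line ending in :current matches the first listing above; a line ending in a specific version such as :stable-2.53.2 indicates a file that was edited by hand.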

Overview of possible solutions

Solutions:

Do not edit the TFE service files manually, as doing so can lead to startup problems.
Revert the changes if any service files were edited by hand.
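
Reverting a hand-edited file could look like the sketch below. It is illustrative only: the function name revert_pinned_tag is hypothetical, and it assumes the original value was the current tag, as in the unedited listing in the Cause section. If the file was changed in other ways, restore it from a known-good copy instead.

```shell
#!/bin/sh
# Hypothetical helper: rewrite a pinned replicated image tag
# (e.g. replicated/replicated:stable-2.53.2) back to "current" in one unit file.
# A backup of the edited file is kept next to it as <file>.bak.
revert_pinned_tag() {
    unit_file=$1
    cp -p "$unit_file" "$unit_file.bak" || return 1
    # Read from the backup and write in place so the original file keeps
    # its ownership and permissions.
    sed 's#\(replicated/replicated[^:[:space:]]*\):stable-[0-9.]*#\1:current#' \
        "$unit_file.bak" > "$unit_file"
}

# Example, run as root on the TFE host:
#   revert_pinned_tag /etc/systemd/system/replicated.service
#   systemctl daemon-reload        # make systemd re-read the edited unit file
#   systemctl restart replicated
```

The daemon-reload step matters because systemd keeps using the previously loaded unit definition until it is told to re-read the file.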

Outcome

The Terraform Enterprise application starts up normally, and users should be able to perform Terraform runs.