Problem
When restarting Replicated services as part of Terraform Enterprise maintenance or troubleshooting, the replicated, replicated-operator, and replicated-ui services exit with status 125:
$ systemctl status replicated replicated-ui replicated-operator
● replicated.service - Replicated Service
Loaded: loaded (/etc/systemd/system/replicated.service; enabled; vendor preset: enabled)
Active: activating (auto-restart) (Result: exit-code) since Mon 2023-01-16 23:33:18 UTC; 255ms ago
Process: 395172 ExecStartPre=/usr/bin/docker rm -f replicated (code=exited, status=0/SUCCESS)
Process: 395197 ExecStartPre=/bin/mkdir -p /var/run/replicated /var/lib/replicated /var/lib/replicated/statsd /var/lib/replicated/retraced (code=exited, status=0/SUCCESS)
Process: 395200 ExecStartPre=/bin/chown -R 1001:998 /var/run/replicated /var/lib/replicated/branding (code=exited, status=0/SUCCESS)
Process: 395203 ExecStartPre=/bin/chown 1001:998 /var/lib/replicated /var/lib/replicated/statsd /var/lib/replicated/retraced (code=exited, status=0/SUCCESS)
Process: 395214 ExecStartPre=/bin/chmod -R 755 /var/lib/replicated/tmp (code=exited, status=0/SUCCESS)
Process: 395215 ExecStart=/usr/bin/docker run --name=replicated -p 9874-9879:9874-9879/tcp -u 1001:998 -v /var/lib/replicated:/var/lib/replicated -v /var/run/docker.sock:/host/var/run/docker.sock -v /proc:>
Main PID: 395215 (code=exited, status=125)
● replicated-ui.service - Replicated Service
Loaded: loaded (/etc/systemd/system/replicated-ui.service; enabled; vendor preset: enabled)
Active: activating (auto-restart) (Result: exit-code) since Mon 2023-01-16 23:33:18 UTC; 301ms ago
Process: 395171 ExecStartPre=/usr/bin/docker rm -f replicated-ui (code=exited, status=0/SUCCESS)
Process: 395195 ExecStartPre=/bin/mkdir -p /var/run/replicated (code=exited, status=0/SUCCESS)
Process: 395198 ExecStartPre=/bin/chown -R 1001:998 /var/run/replicated (code=exited, status=0/SUCCESS)
Process: 395201 ExecStart=/usr/bin/docker run --name=replicated-ui -p 8800:8800/tcp -u 1001:998 -v /var/run/replicated:/var/run/replicated --security-opt label=type:spc_t $REPLICATED_UI_OPTS replicated/rep>
Main PID: 395201 (code=exited, status=125)
● replicated-operator.service - Replicated Operator Service
Loaded: loaded (/etc/systemd/system/replicated-operator.service; enabled; vendor preset: enabled)
Active: activating (auto-restart) (Result: exit-code) since Mon 2023-01-16 23:33:18 UTC; 290ms ago
Process: 395170 ExecStartPre=/usr/bin/docker rm -f replicated-operator (code=exited, status=0/SUCCESS)
Process: 395194 ExecStartPre=/bin/mkdir -p /var/run/replicated-operator /var/lib/replicated-operator (code=exited, status=0/SUCCESS)
Process: 395196 ExecStartPre=/bin/chown -R 1001:998 /var/run/replicated-operator (code=exited, status=0/SUCCESS)
Process: 395199 ExecStartPre=/bin/chown 1001:998 /var/lib/replicated-operator (code=exited, status=0/SUCCESS)
Process: 395202 ExecStart=/usr/bin/docker run --name=replicated-operator -u 1001:998 -v /var/lib/replicated-operator:/var/lib/replicated-operator -v /var/run/replicated-operator:/var/run/replicated-operato>
Main PID: 395202 (code=exited, status=125)
The units' journal logs show that the services' start commands failed with the following Docker errors:
$ journalctl -u replicated -u replicated-ui -u replicated-operator -n 18
-- Logs begin at Mon 2023-01-16 19:16:40 UTC, end at Mon 2023-01-16 23:41:38 UTC. --
Jan 16 23:41:38 ip-10-0-173-9 systemd[1]: Started Replicated Service.
Jan 16 23:41:38 ip-10-0-173-9 systemd[1]: Started Replicated Service.
Jan 16 23:41:38 ip-10-0-173-9 systemd[1]: Started Replicated Operator Service.
Jan 16 23:41:38 ip-10-0-173-9 docker[402028]: Unable to find image 'replicated/replicated-ui:current' locally
Jan 16 23:41:38 ip-10-0-173-9 docker[402038]: Unable to find image 'replicated/replicated-operator:current' locally
Jan 16 23:41:38 ip-10-0-173-9 docker[402031]: Unable to find image 'replicated/replicated:current' locally
Jan 16 23:41:38 ip-10-0-173-9 docker[402028]: docker: Error response from daemon: manifest for replicated/replicated-ui:current not found: manifest unknown: manifest unknown.
Jan 16 23:41:38 ip-10-0-173-9 docker[402028]: See 'docker run --help'.
Jan 16 23:41:38 ip-10-0-173-9 systemd[1]: replicated-ui.service: Main process exited, code=exited, status=125/n/a
Jan 16 23:41:38 ip-10-0-173-9 systemd[1]: replicated-ui.service: Failed with result 'exit-code'.
Jan 16 23:41:38 ip-10-0-173-9 docker[402031]: docker: Error response from daemon: manifest for replicated/replicated:current not found: manifest unknown: manifest unknown.
Jan 16 23:41:38 ip-10-0-173-9 docker[402031]: See 'docker run --help'.
Jan 16 23:41:38 ip-10-0-173-9 systemd[1]: replicated.service: Main process exited, code=exited, status=125/n/a
Jan 16 23:41:38 ip-10-0-173-9 systemd[1]: replicated.service: Failed with result 'exit-code'.
Jan 16 23:41:38 ip-10-0-173-9 docker[402038]: docker: Error response from daemon: manifest for replicated/replicated-operator:current not found: manifest unknown: manifest unknown.
Jan 16 23:41:38 ip-10-0-173-9 docker[402038]: See 'docker run --help'.
Jan 16 23:41:38 ip-10-0-173-9 systemd[1]: replicated-operator.service: Main process exited, code=exited, status=125/n/a
Jan 16 23:41:38 ip-10-0-173-9 systemd[1]: replicated-operator.service: Failed with result 'exit-code'.
Cause
This can occur if the Replicated images were inadvertently removed from the local image repository during system maintenance (for example, through the use of docker system prune, docker image prune, or docker rmi). When Replicated is installed through the Terraform Enterprise install script, its services' container images are pulled from a remote registry and then tagged locally with current. These tagged images are referenced in the docker run commands in their respective systemd units. The current tag exists only in the local repository, not in the remote registry, so when docker run cannot find the image locally and tries to pull it, the registry responds with manifest unknown. This is also why attempting to pull the images manually will not resolve the problem. Without these local images, Docker is unable to start the Replicated containers and the commands fail with the errors above.
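One way to confirm this cause is to check whether the current tags are still present locally. The following is a minimal sketch, assuming a shell with access to the Docker CLI. The function name missing_current_tags is hypothetical, and the image names are the ones shown in the journal output above.

```shell
# Print every Replicated image reference whose "current" tag is missing locally.
# missing_current_tags is a hypothetical helper name, not part of Replicated or Docker.
missing_current_tags() {
  for image in replicated/replicated replicated/replicated-ui replicated/replicated-operator; do
    # "docker image inspect" exits non-zero when the reference is not in the local image store.
    if ! docker image inspect "${image}:current" >/dev/null 2>&1; then
      echo "${image}:current"
    fi
  done
}

missing_current_tags
```

If the function prints nothing, all three current tags exist locally and the failure likely has a different cause.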
Solution
To resolve this, you can run the Terraform Enterprise install script again to restore the state of the local repository. Alternatively, you can manually tag the Replicated images with the following steps if they still exist in the local repository:
- Run docker image ls and locate the replicated/replicated, replicated/replicated-operator, and replicated/replicated-ui images:
$ docker image ls
REPOSITORY                                          TAG             IMAGE ID       CREATED       SIZE
replicated/replicated                               stable-2.54.1   3f4f8cf65a77   5 weeks ago   348MB
replicated/replicated-ui                            stable-2.54.1   24cff54683d7   5 weeks ago   138MB
replicated/replicated-operator                      stable-2.54.1   c01aa42bf17b   5 weeks ago   129MB
registry.replicated.com/library/retraced            1.3.53          5d07c23ca0cd   5 weeks ago   407MB
registry.replicated.com/library/retraced-nsq        1.3.53          6d99361f86c0   5 weeks ago   120MB
registry.replicated.com/library/retraced-postgres   1.3.53          38f311ac3e42   5 weeks ago   81MB
registry.replicated.com/library/premkit             v1.4.5          462904c6af4d   5 weeks ago   86.5MB
- Tag each of those images as current, using the version tag shown in the listing (stable-2.54.1 in this example):
$ docker tag replicated/replicated:stable-2.54.1 replicated/replicated:current
$ docker tag replicated/replicated-operator:stable-2.54.1 replicated/replicated-operator:current
$ docker tag replicated/replicated-ui:stable-2.54.1 replicated/replicated-ui:current
Once complete, the replicated, replicated-operator, and replicated-ui services should start successfully on the next restart attempt.
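The manual tagging step can also be wrapped in a small loop so that all three images receive the same tag. This is a sketch only: tag_current is a hypothetical helper name, and the version tag passed to it must be the one that docker image ls reports on your system (stable-2.54.1 is just the value from the example listing).

```shell
# Tag each Replicated image at the given version as "current".
# tag_current is a hypothetical helper; pass the tag shown by "docker image ls".
tag_current() {
  version="$1"
  if [ -z "$version" ]; then
    echo "usage: tag_current VERSION_TAG" >&2
    return 2
  fi
  for image in replicated/replicated replicated/replicated-operator replicated/replicated-ui; do
    # Stop at the first failure, for example when the versioned image is not present locally.
    docker tag "${image}:${version}" "${image}:current" || return 1
  done
}

# Example invocation, using the tag from the listing above:
#   tag_current stable-2.54.1
```

If the helper returns a non-zero status, at least one versioned image was likely not found locally, in which case re-running the Terraform Enterprise install script is the remaining option described above.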