Problem
When troubleshooting Terraform Enterprise, using docker prune commands without careful consideration can cause significant operational delays. This is especially true when addressing disk space issues on the host machine.
Aggressive pruning commands like docker system prune -a can delete necessary images and stopped containers. In an online environment, this forces a time-consuming re-download of all required images. In an airgapped environment, this can be catastrophic if the installation bundle is not readily available, leading to extended downtime.
Cause
Docker's broad prune commands treat any resource that no running container references as "unused." This includes stopped containers and the images they use, which may still be critical for Terraform Enterprise operations or for recovering the system. The docker system prune -a command deletes all stopped containers and then every image that no remaining container references, and adding the --volumes flag also deletes unused volumes. Either command can remove resources you still need, which prolongs the recovery of a production system.
Solutions
Solution 1: Use Safe Pruning Commands in Online Environments
Before removing any resources, inspect your Docker environment to understand resource usage and prevent the accidental deletion of operationally necessary components.
- To view a list of all containers, including stopped ones, run the following command.
$ docker ps -a
- To see a breakdown of Docker's disk usage, run the following command.
$ docker system df
- To list all Docker volumes, run the following command.
$ docker volume ls
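The inspection commands can also be captured to files, so the state before pruning can be compared with the state afterwards. The following is a minimal sketch rather than an official procedure. The function name snapshot_docker_state and the output file names are assumptions made for illustration.

```shell
# Sketch: save the output of the three inspection commands to a directory.
# The function name and the file names are illustrative only.
snapshot_docker_state() {
  out_dir="$1"
  mkdir -p "$out_dir" || return 1
  docker ps -a     > "$out_dir/containers.txt" || return 1
  docker system df > "$out_dir/disk-usage.txt" || return 1
  docker volume ls > "$out_dir/volumes.txt"    || return 1
  echo "Snapshot written to $out_dir"
}
```

For example, call snapshot_docker_state with one directory before pruning and a second directory afterwards, then compare the two directories with diff -r to see exactly what was removed.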
Safe Pruning Strategies
Adopt a more targeted approach to pruning to avoid removing critical resources.
- Prune stopped containers only: This command removes only stopped containers, which is a safer first step.
$ docker container prune -f
- Prune dangling images only: This command removes only dangling images (images that are not tagged and are not referenced by any container).
$ docker image prune -f
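- Use time-based filters: This command limits pruning to dangling images created more than 24 hours ago, leaving more recent ones intact. Because it omits the -f flag, Docker asks for confirmation before deleting anything.
$ docker image prune --filter "until=24h"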
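Before running any of these commands with -f, it can help to list the candidates first. The sketch below does a dry run by default and prunes only when given an explicit flag. The function name prune_dangling_if_confirmed and its --yes argument are hypothetical. The listing is an approximation, because docker image ls with the dangling=true filter can include dangling images that a container still references, and docker image prune will not remove those.

```shell
# Sketch: show dangling images first, and prune only when --yes is passed.
# The function name and the --yes argument are illustrative only.
prune_dangling_if_confirmed() {
  candidates="$(docker image ls --filter "dangling=true" --quiet)"
  if [ -z "$candidates" ]; then
    echo "No dangling images found; nothing to prune."
    return 0
  fi
  echo "Dangling images that may be removed:"
  echo "$candidates"
  if [ "${1:-}" = "--yes" ]; then
    # Same command as in the article: removes dangling images only.
    docker image prune -f
  else
    echo "Dry run only. Re-run with --yes to prune."
  fi
}
```

Running the function with no arguments only prints the image IDs, which gives you a chance to review them before deleting anything.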
Understanding Destructive Commands
It is critical to understand the difference between safe and destructive docker image prune flags.
- Safe Command: The docker image prune command without the -a flag is a safe approach that only deletes dangling images.
$ docker image prune
- Destructive Command: The docker image prune -a command is potentially destructive, as it deletes all unused images from the machine, including those that may be needed for Terraform Enterprise to function correctly.
$ docker image prune -a
In Terraform Enterprise environments, pay special attention to critical TFE core images, custom worker images, and database container images. Do not remove images from the host even when they appear unused, as they may be required for specific operations.
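One way to see which critical images are exposed is to compare an inventory of protected images against the images that existing containers reference. The sketch below assumes a hypothetical plain-text inventory file with one image name per line, where blank lines and lines starting with # are ignored. It also assumes the names are written exactly as docker ps reports them. Any image it prints has no container referencing it, so docker image prune -a would treat it as unused.

```shell
# Sketch: print protected images that no container (running or stopped) references.
# The function name and the inventory file format are illustrative only.
at_risk_images() {
  inventory_file="$1"
  # Image names as recorded on every container, including stopped ones.
  in_use="$(docker ps -a --format '{{.Image}}' | sort -u)"
  # Skip comment lines and blank lines in the inventory.
  grep -v -e '^#' -e '^$' "$inventory_file" | while IFS= read -r image; do
    # Exact, whole-line, fixed-string match against the in-use list.
    if ! printf '%s\n' "$in_use" | grep -qxF -- "$image"; then
      printf '%s\n' "$image"
    fi
  done
}
```

If the function prints nothing, every image in the inventory is referenced by at least one container. The check does not confirm that an image is actually present on the host.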
Solution 2: Implement Backup Procedures in Airgapped Environments
For airgapped environments where re-downloading images is not possible, you must take extra precautions.
- Create Backups: Before performing any pruning, export critical images to a tarball as a backup.
$ docker save -o tfe-images-backup.tar <image_name_1> <image_name_2>
- Maintain an Inventory: Keep a documented inventory of all critical Terraform Enterprise images that should never be pruned.
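The backup and inventory steps can be combined by driving docker save from the inventory file. The following is a rough sketch under stated assumptions: the function name backup_images is hypothetical, and the inventory is assumed to be a plain-text file with one image name per line, with blank lines and lines starting with # ignored. The tar listing at the end is only a basic readability check, not proof that the backup is complete.

```shell
# Sketch: export every image listed in an inventory file to one tarball.
# The function name and the inventory file format are illustrative only.
backup_images() {
  inventory_file="$1"
  archive="$2"
  # Collect image names into the positional parameters (no arrays needed).
  set --
  while IFS= read -r image || [ -n "$image" ]; do
    case "$image" in
      ''|'#'*) continue ;;   # skip blank lines and comments
    esac
    set -- "$@" "$image"
  done < "$inventory_file"
  if [ "$#" -eq 0 ]; then
    echo "No images listed in $inventory_file" >&2
    return 1
  fi
  docker save -o "$archive" "$@" || return 1
  # Confirm the archive can at least be read back as a tar file.
  tar -tf "$archive" > /dev/null || return 1
  echo "Saved $# image(s) to $archive"
}
```

If an image is pruned by mistake, the tarball can be restored with docker load -i followed by the archive name. Store the archive on a disk other than the one being cleaned up, since the backup itself consumes space.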
Additional Information
For more details on Docker's pruning commands, refer to the official documentation.