Problem
Scheduled snapshots configured in the Terraform Enterprise dashboard are failing. The dashboard may display an error message indicating the failure.
Cause
This issue can occur due to several factors, including insufficient storage space, limited Docker volume capacity, temporary file conflicts, or a snapshot timeout that is too short for the operation to complete.
Solutions
Follow these solutions sequentially to diagnose and resolve the issue.
Solution 1: Verify and Clear Disk Space
Ensure that the Terraform Enterprise host has sufficient disk space and inodes available.
- Check for available disk space.
# df -Ph
- Check for available inodes.
# df -i
If either disk space or inodes are nearly exhausted, free up space or remove unneeded files on the host machine.
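The two checks above can be combined into one pass that flags only the file systems that look full. The following is a minimal sketch, not part of Terraform Enterprise or Replicated: the function name `over_threshold` and the 90 percent limit are assumptions chosen for illustration, and the parsing assumes the POSIX `df -P` column layout with mount points that contain no spaces.

```shell
#!/bin/sh
# Print every mount point whose usage percentage meets or exceeds a limit.
# Reads `df -P` or `df -Pi` style output on stdin. In that layout, column 5
# holds the percentage and column 6 holds the mount point.
over_threshold() {
    awk -v limit="$1" 'NR > 1 {
        pct = $5
        sub(/%/, "", pct)
        # Skip rows where the percentage is not numeric (for example "-").
        if (pct ~ /^[0-9]+$/ && pct + 0 >= limit + 0) print $6 " " $5
    }'
}

THRESHOLD=90   # hypothetical limit; adjust to your environment

echo "File systems at or above ${THRESHOLD}% disk usage:"
df -P | over_threshold "$THRESHOLD"

echo "File systems at or above ${THRESHOLD}% inode usage:"
df -Pi | over_threshold "$THRESHOLD"
```

An empty list under both headings suggests that host disk space and inodes are unlikely to be the cause, and you can move on to the next solution.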
Solution 2: Prune Docker Volumes
Insufficient space within Docker's storage can also cause snapshot failures. You can clear unused Docker containers and volumes to reclaim space.
Refer to the guide on Automating Docker container and volume pruning in Terraform Enterprise for detailed procedures.
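Before pruning anything, it can help to see how much space Docker is actually using. The sketch below only reports usage and removes nothing. The function name `show_docker_usage` is an assumption for illustration, while `docker system df` and `docker volume ls -f dangling=true` are standard Docker CLI commands.

```shell
#!/bin/sh
# Report Docker disk usage without deleting anything.
show_docker_usage() {
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker CLI not found"
        return 0
    fi
    # Summary of space used by images, containers, volumes, and build cache.
    docker system df
    # Volumes that are not referenced by any container.
    docker volume ls -f dangling=true
}

show_docker_usage || echo "could not query Docker"
```

If the reclaimable space reported here is small, pruning is unlikely to resolve the snapshot failure on its own.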
Solution 3: Clear the Replicated Temporary Directory
Lingering files in the Replicated temporary directory can sometimes interfere with the creation of new snapshots. Clear the contents of this directory.
# rm /var/lib/replicated/tmp/*
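To see what the directory holds before and after clearing it, a short inspection step can be useful. This is a minimal sketch under stated assumptions: the helper name `count_entries` is hypothetical, and the script only reads the directory, so it is safe to run repeatedly.

```shell
#!/bin/sh
# Print the number of entries (including dotfiles) directly inside a directory.
count_entries() {
    ls -A "$1" | wc -l | tr -d ' '
}

TMP_DIR=/var/lib/replicated/tmp

if [ -d "$TMP_DIR" ]; then
    echo "$(count_entries "$TMP_DIR") entries in $TMP_DIR"
    # Total size of the directory, in human-readable units.
    du -sh "$TMP_DIR"
else
    echo "$TMP_DIR does not exist on this host"
fi
```

Because `rm` without `-r` skips subdirectories and the `*` glob does not match dotfiles, a nonzero count after clearing is possible. Review any remaining entries before deciding whether to remove them.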
Solution 4: Increase the Snapshot Timeout
By default, a snapshot operation is allowed two minutes to complete. For larger installations, this may not be enough time. Increase the timeout value to 10 minutes.
# replicatedctl params set SnapshotsTimeout --value 10m
Additional Information
These Replicated snapshots store only the data required by the Replicated management console and are used for tasks such as restoring the console configuration. They are not a complete backup of your Terraform Enterprise application data.
For comprehensive backup and recovery strategies for your Terraform Enterprise environment, refer to the Backup and Recovery for Terraform Enterprise documentation.