Problem
In some Terraform Enterprise environments, the /var/lib/replicated/retraced directory may grow excessively, consuming a large amount of disk space and potentially causing outages.
Prerequisites
- A Terraform Enterprise instance installed via the Replicated method.
Cause
Terraform Enterprise installations managed by Replicated use an audit system to log events, such as access to the Replicated admin console or command-line operations. These audit events are stored in a database within the /var/lib/replicated/retraced directory.
The directory can grow unexpectedly due to a high volume of audit events generated by:
- A misconfigured load balancer health check that repeatedly hits an authentication endpoint like https://<TFE_FQDN>/authenticate.
- Automated scripts or monitoring tools that frequently run the replicatedctl app status command.
Over time, these repeated actions can generate a large volume of audit data, causing the directory to fill the available disk space.
Solutions
Resolving this issue involves two steps: first, remediate the underlying cause to prevent recurrence; second, clean up the existing data to reclaim disk space.
Solution 1: Remediate the Cause
To prevent the audit log from growing excessively, you must address the source of the high-volume events.
Correct the Load Balancer Health Check
Update your frontend load balancer to use the dedicated health check endpoint. This endpoint does not generate an audit event.
- Incorrect Endpoint: https://<TFE_FQDN>:8800/authenticate
- Correct Endpoint: https://<TFE_FQDN>:8800/ping
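Before changing the load balancer configuration, it can help to confirm that the health check endpoint responds. The following is a minimal sketch, not an official procedure: the function names ping_url and check_ping are hypothetical, and the --insecure flag assumes the admin console may be using a self-signed certificate.

```shell
# Hypothetical helpers for probing the dedicated health check endpoint.

# Build the health check URL for a given hostname (port 8800 as used above).
ping_url() {
  printf 'https://%s:8800/ping\n' "$1"
}

# Print the HTTP status code returned by the health check endpoint.
# --insecure skips certificate verification (assumption: self-signed cert);
# drop it if the console presents a trusted certificate.
check_ping() {
  curl --silent --insecure --output /dev/null --max-time 5 \
    --write-out '%{http_code}\n' "$(ping_url "$1")"
}
```

Running check_ping with your own hostname in place of <TFE_FQDN> prints the status code a load balancer would see when probing the endpoint.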
Review Automated Scripts
Ensure that no scheduled jobs or monitoring scripts are running the replicatedctl app status command at a high frequency.
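One way to look for such jobs is to search the locations where scheduled tasks are usually defined. This is only a sketch: the function name find_status_callers is hypothetical, and the directories to search depend on your distribution and on how your monitoring tools are installed.

```shell
# Hypothetical helper: list files under the given paths that contain the
# literal string "replicatedctl app status".
# -r recurses into directories, -l prints only file names, -F disables regex.
find_status_callers() {
  grep -rlF -- 'replicatedctl app status' "$@" 2>/dev/null
}
```

For example, find_status_callers /etc/cron.d /var/spool/cron would search two common cron locations (assumed paths, adjust as needed). Jobs defined elsewhere, such as in an external monitoring system, will not be found this way.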
Solution 2: Clean Up Existing Audit Data
After addressing the cause, you can safely remove the existing audit data to reclaim disk space.
Check the disk usage of the directories under /var/lib/replicated to confirm that retraced is the primary consumer of space.
# cd /var/lib/replicated
# du -sh *
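When the directory has many entries, sorting the output makes the largest consumer easier to spot. The sketch below assumes GNU coreutils, whose sort supports -h for human-readable sizes; the function name usage_by_size is hypothetical.

```shell
# Hypothetical helper: print disk usage of each entry under a directory,
# largest first. Requires GNU sort for the -h (human-numeric) option.
usage_by_size() {
  du -sh -- "$1"/* 2>/dev/null | sort -rh
}
```

Calling usage_by_size /var/lib/replicated lists the same information as the du command above, ordered so that the largest entry appears on the first line.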
Stop the Replicated service, remove the retraced directory, and restart the service. Replicated will recreate the directory with a clean database upon restart.
# sudo systemctl stop replicated
# sudo rm -r /var/lib/replicated/retraced/
# sudo systemctl start replicated
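Because this step deletes data recursively, some operators prefer to wrap it in a small script that can be previewed first. The following is a sketch under stated assumptions rather than a supported tool: the run and reset_retraced functions and the DRY_RUN variable are hypothetical, and the path check is just a simple guard against typos.

```shell
# Hypothetical wrapper: run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Stop Replicated, remove the retraced directory, and start Replicated again.
# The optional first argument overrides the default directory.
reset_retraced() {
  retraced_dir="${1:-/var/lib/replicated/retraced}"
  # Guard: refuse any path that does not end in replicated/retraced.
  case "$retraced_dir" in
    */replicated/retraced|*/replicated/retraced/) ;;
    *) echo "refusing to remove unexpected path: $retraced_dir" >&2
       return 1 ;;
  esac
  run sudo systemctl stop replicated || return 1
  run sudo rm -r -- "$retraced_dir" || return 1
  run sudo systemctl start replicated
}
```

Running DRY_RUN=1 reset_retraced prints the three commands without executing them. Running reset_retraced without DRY_RUN performs the same stop, remove, and start sequence shown above.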
Outcome
After completing these steps, the size of the /var/lib/replicated/retraced directory will be significantly reduced, freeing up disk space. The corrective measures in Solution 1 will prevent the issue from recurring.
Additional Information
For more details on managing your Terraform Enterprise instance, please refer to the official Terraform Enterprise documentation.