Important Note: The Terraform Enterprise Backup and Restore API is intended primarily for use before migrating to a new TFE host or when transitioning between Production mode types, rather than as a routine backup solution.
Problem
Terraform Enterprise (TFE) admins who regularly use the Backup and Restore API may see overall system memory utilization increase, even when there is no identifiable cause within the application itself.
Prerequisites
- Terraform Enterprise
- Excessive use of the TFE Backup and Restore API
Cause
Invoking the TFE Backup and Restore API on a regular basis increases the chance of a stalled backup-restore process. A stalled process becomes a zombie process that continues to consume system memory until it is killed.
The symptoms can be exacerbated when large TFE instances are backed up regularly.
Solution
How to find and remove a stalled backup-restore process.
Find zombie processes (the bracketed first letter keeps grep from matching its own command line):
ps -ef | grep '[b]ackup-restore'
Remove zombie process:
kill $(pgrep backup-restore)
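The two commands above can also be combined into a small script that reports matching processes before anything is terminated. This is a minimal sketch, not an official TFE tool: it assumes a Linux host with the procps versions of pgrep, pkill, and ps, and that the stalled process is named exactly backup-restore. The PROC_NAME variable, the list_stalled and stop_stalled helpers, and the --kill flag are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# Sketch: report backup-restore processes and optionally terminate them.
# PROC_NAME, list_stalled, stop_stalled, and --kill are illustrative names,
# not part of Terraform Enterprise.

PROC_NAME="${PROC_NAME:-backup-restore}"

# Print PID, elapsed time, resident memory (KiB), and command for each match.
# Returns non-zero when no process with that exact name exists.
list_stalled() {
  local name="$1" pids
  # pgrep -x matches the exact process name; -d, joins the PIDs with commas.
  pids=$(pgrep -x -d, "$name") || return 1
  ps -o pid,etime,rss,args -p "$pids"
}

# Send SIGTERM first; escalate to SIGKILL only for processes that survive.
stop_stalled() {
  local name="$1"
  pkill -TERM -x "$name" || return 0   # nothing to stop
  sleep 5                              # give the process time to exit
  pkill -KILL -x "$name" || true       # no survivors is not an error
}

if list_stalled "$PROC_NAME"; then
  echo "Found $PROC_NAME processes; review the list above."
  # Terminate only when explicitly requested with --kill.
  if [ "${1:-}" = "--kill" ]; then
    stop_stalled "$PROC_NAME"
  fi
else
  echo "No $PROC_NAME processes found."
fi
```

Run without arguments, the script only lists matches, so the elapsed time and memory columns can be checked against a backup that is still legitimately running. Passing --kill sends SIGTERM and falls back to SIGKILL after five seconds.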
Outcome
Once any zombie processes are removed, the consumed TFE system memory should return to its normal operating range.