Introduction
When running Terraform Enterprise (TFE), you should configure monitoring and alerting to proactively detect anomalous incidents and performance degradation, and to capture utilization trends.
You can export performance metrics and log details from a TFE instance to several analysis tools, including Amazon CloudWatch, Azure Monitor, Google Cloud Operations, and Prometheus. TFE can export metrics in either Prometheus format or JSON.
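To pull TFE metrics into your own tooling for ad hoc checks, you can scrape the instance's metrics endpoint and parse the Prometheus-format output. The Python sketch below assumes that metrics collection is enabled, that the endpoint is served over HTTP on port 9090 at `/metrics`, and that the `format=prometheus` query parameter selects Prometheus output instead of JSON. Treat all three as assumptions and confirm the port, path, and scheme (an HTTPS port may also be available) in the metrics settings for your TFE version. The parser only understands simple `name{labels} value` sample lines, so it is not a substitute for a real Prometheus server.

```python
import re
import urllib.request

# Matches one sample line of the Prometheus text exposition format, e.g.
#   metric_name{label="value",other="x"} 12.5
# Anything after the value (such as an optional timestamp) is ignored.
_SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
)
# Matches key="value" pairs inside the braces, allowing escaped quotes.
_LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_prometheus_text(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue  # not a sample line this simple parser understands
        labels = dict(_LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def fetch_tfe_metrics(host, port=9090):
    """Fetch and parse metrics from a TFE instance in Prometheus format.

    ASSUMPTION: plain-HTTP metrics on port 9090 at /metrics, with
    ?format=prometheus selecting Prometheus output. Verify for your version.
    """
    url = f"http://{host}:{port}/metrics?format=prometheus"
    with urllib.request.urlopen(url, timeout=10) as response:
        return parse_prometheus_text(response.read().decode("utf-8"))
```

For example, `fetch_tfe_metrics("tfe.example.internal")` (a hypothetical hostname) returns a list of tuples that you can filter by metric name or by label.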
Recommendation
Establish Metric Baselines
When monitoring TFE applications, you must establish baselines for metrics to surface resource utilization patterns and set appropriate thresholds for alerting. As a general guideline, collect metrics for one to two weeks before setting alert thresholds. Baselining can also reveal whether hosts or containers are under-provisioned for your workloads. If they are, adjust the underlying resources and then establish a new baseline.
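One common way to turn a baseline window into an alert threshold is to take a high percentile of the observed values and add some headroom so that routine peaks do not page anyone. The sketch below illustrates that approach; the function name, the 95th-percentile default, and the 1.2 headroom multiplier are illustrative assumptions, not TFE recommendations, so tune them against your own baseline data.

```python
import statistics


def baseline_threshold(samples, percentile=95, headroom=1.2):
    """Derive an alert threshold from a baseline window of metric samples.

    samples:    observations gathered over the baseline period, for example
                one to two weeks of CPU utilization readings.
    percentile: percentile (1-99) of normal behavior used as the baseline.
    headroom:   multiplier applied on top of that percentile.
    Both defaults are illustrative assumptions.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to compute a baseline")
    if not 1 <= percentile <= 99:
        raise ValueError("percentile must be between 1 and 99")
    # With n=100, statistics.quantiles returns the 1st..99th percentile cut points.
    cut_points = statistics.quantiles(samples, n=100)
    return cut_points[percentile - 1] * headroom
```

Recomputing the threshold after you resize hosts or containers keeps alerts aligned with the new baseline.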
Monitor Container-Level Metrics
TFE applications are deployed as a collection of Docker containers. You can export performance metrics from each container to monitor resource consumption. Key metrics include, but are not limited to:
- CPU usage at the kernel and user-space levels
- Memory usage and the configured memory limit
- Disk IOPS and byte counts
For more details, refer to the official documentation on Terraform Enterprise monitoring.
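Container CPU usage is commonly reported as cumulative time counters (the Docker stats API, for example, reports cumulative user-mode and kernel-mode CPU time in nanoseconds), so a utilization percentage has to be derived from the difference between two scrapes. The sketch below shows that arithmetic together with a memory-usage-to-limit ratio. The input shapes and the `user` and `kernel` field names are hypothetical placeholders, not TFE's actual metric names; map them to the container metrics your TFE version exports.

```python
def cpu_utilization(prev, curr, interval_seconds):
    """Percent of one CPU core used by a container between two scrapes.

    prev and curr are dicts of cumulative CPU time in nanoseconds with the
    hypothetical keys "user" and "kernel". Returns None if the counters went
    backwards, which usually means the container restarted.
    """
    used_ns = (curr["user"] - prev["user"]) + (curr["kernel"] - prev["kernel"])
    if used_ns < 0:
        return None  # counter reset; skip this interval
    return 100.0 * used_ns / (interval_seconds * 1e9)


def memory_pressure(used_bytes, limit_bytes):
    """Fraction of the container's memory limit currently in use."""
    if limit_bytes <= 0:
        return None  # no limit configured, so the ratio is meaningless
    return used_bytes / limit_bytes
```

A value above 100 from `cpu_utilization` simply means the container used more than one core on average during the interval.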
Monitor Host-Level Metrics
In addition to container-level metrics, you should monitor host-level metrics to confirm that the host is sized appropriately for its baseline utilization and to detect when any resource limit is exceeded. The specific metrics to monitor may differ based on your operational mode (External Services, Active/Active, or Mounted Disk). At a minimum, collect the following metrics from the TFE instance host:
- CPU utilization
- Memory utilization
- Disk space
- Disk IOPS (read/write)
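In practice a monitoring agent usually collects host metrics for you, but it helps to know where the numbers come from. The sketch below assumes a Linux host: it computes disk-space usage with Python's standard library and derives read and write IOPS from two samples of `/proc/diskstats`, whose fourth and eighth fields are the cumulative counts of completed reads and writes. The function names are hypothetical helpers for illustration.

```python
import shutil


def disk_space_percent(path="/"):
    """Share of the total capacity of the filesystem holding `path` that is used."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def parse_diskstats(text):
    """Map device name -> (reads_completed, writes_completed).

    Each /proc/diskstats line starts with: major minor device reads_completed
    reads_merged sectors_read ms_reading writes_completed ...
    """
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip malformed or truncated lines
        counters[fields[2]] = (int(fields[3]), int(fields[7]))
    return counters


def iops(before, after, interval_seconds):
    """Per-device (read IOPS, write IOPS) between two diskstats samples."""
    rates = {}
    for device, (reads_after, writes_after) in after.items():
        if device not in before:
            continue  # device appeared between samples; no baseline yet
        reads_before, writes_before = before[device]
        rates[device] = (
            (reads_after - reads_before) / interval_seconds,
            (writes_after - writes_before) / interval_seconds,
        )
    return rates
```

To use it, read `/proc/diskstats` twice a known number of seconds apart, parse both snapshots, and pass them to `iops` with that interval.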
Monitor External Services
For TFE instances deployed in External Services or Active/Active mode, you must also collect metrics for the following services in addition to the TFE instance and containers:
- Database (PostgreSQL, Amazon RDS, etc.)
- Redis (Active/Active mode only)
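If you manage monitoring configuration as code, the rule above can be captured as a small lookup so that each deployment gets the right set of targets. In this sketch the mode identifiers and target names are hypothetical, and Mounted Disk is modeled with no external dependencies because this guidance names none for it.

```python
# Components every TFE deployment should be monitored for.
BASE_TARGETS = frozenset({"host", "containers"})

# Additional external services per operational mode (hypothetical identifiers).
EXTRA_TARGETS = {
    "mounted_disk": frozenset(),
    "external_services": frozenset({"database"}),
    "active_active": frozenset({"database", "redis"}),
}


def required_targets(mode):
    """Return the minimum set of components to monitor for an operational mode."""
    try:
        return BASE_TARGETS | EXTRA_TARGETS[mode]
    except KeyError:
        raise ValueError(f"unknown operational mode: {mode!r}") from None
```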
Additional Resources
Official Documentation
- Reference Architecture
- Capacity and Performance
- Terraform Enterprise Metrics
- Container Metrics
- Terraform Enterprise and Postgres Utilization
Cloud Provider Monitoring Guides
- AWS
- Azure
- GCP