Metrics Monitoring for Terraform Enterprise – HashiCorp Help Center

Overview

When running Terraform Enterprise (TFE), it is important to configure monitoring and alerting so that you can proactively detect anomalous incidents and performance degradation, and capture utilization trends.

Performance metrics and log details can be exported from a TFE instance to a number of tools for analysis, including Amazon CloudWatch, Azure Monitor, Google Cloud Operations, and Prometheus.

Metrics can be exported from a TFE instance in either Prometheus format or JSON.
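As an illustration of consuming the Prometheus format, the sketch below parses text exposition output into (name, labels, value) tuples using only the Python standard library. The metric names in SAMPLE_TEXT are hypothetical placeholders, not names TFE is known to emit, and the parser is deliberately simplified: it ignores timestamps and does not unescape label values.

```python
import re
from typing import Dict, List, Tuple

# One sample line: metric name, optional {labels}, then the value.
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# One label pair inside the braces: key="value".
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

# Hypothetical sample payload; real metric names will differ.
SAMPLE_TEXT = """\
# HELP example_memory_used_bytes Memory used by a container.
# TYPE example_memory_used_bytes gauge
example_memory_used_bytes{name="web"} 1.5e+09
example_up 1
"""


def parse_prometheus_text(text: str) -> List[Tuple[str, Dict[str, str], float]]:
    """Return (name, labels, value) for every sample line in the text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and HELP/TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


if __name__ == "__main__":
    for name, labels, value in parse_prometheus_text(SAMPLE_TEXT):
        print(name, labels, value)
```

In practice a Prometheus server or a monitoring agent would scrape and parse this format for you; a hand-rolled parser like this is mainly useful for ad hoc inspection.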

Establishing Metric Baselines

When monitoring TFE applications, it is important to establish baselines for metrics in order to surface resource utilization patterns and set appropriate alerting thresholds. As a rule of thumb, collect metrics for 1-2 weeks before setting alert thresholds. The collection period can also reveal whether hosts or containers have been under-provisioned for your workloads, in which case the underlying resources should be adjusted before a new baseline is established.
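One possible way to turn a baseline into an alert threshold is sketched below. It assumes the threshold is the baseline mean plus a configurable number of standard deviations; this is a common heuristic, not a method prescribed by this article, and the function name and default multiplier are illustrative.

```python
import statistics
from typing import Sequence


def baseline_threshold(samples: Sequence[float], sigmas: float = 3.0) -> float:
    """Suggest an alert threshold from baseline samples (mean + sigmas * stdev).

    `samples` would be readings of one metric (for example CPU utilization
    in percent) gathered over the 1-2 week baseline period.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate a baseline")
    mean = statistics.fmean(samples)
    stdev = statistics.stdev(samples)  # sample standard deviation
    return mean + sigmas * stdev


if __name__ == "__main__":
    # Illustrative readings only, not real TFE data.
    cpu_percent = [41.0, 38.5, 44.2, 52.0, 39.9, 47.3]
    print(f"suggested alert threshold: {baseline_threshold(cpu_percent):.1f}%")
```

If the suggested threshold lands at or above the resource's capacity, that may indicate the host or container is under-provisioned rather than that the threshold should be raised.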

Container Level Metrics

TFE applications are deployed as a collection of Docker containers. Performance metrics can be exported from each container to monitor resource consumption at the container level. These include (but are not limited to):

CPU usage at the kernel and user space level
Memory usage and set limit
Disk IOPS and byte counts

More details on container level metrics and instructions for setup can be found here.
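To make the container-level figures above concrete, the following sketch derives two ratios from raw counters: memory used as a share of the set limit, and kernel-space CPU time as a share of total CPU time. The ContainerSample type and its field names are hypothetical; map them onto whatever names your metrics export actually provides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContainerSample:
    """Hypothetical container reading; field names are illustrative."""
    name: str
    cpu_kernel_ns: int       # CPU time spent in kernel space
    cpu_user_ns: int         # CPU time spent in user space
    memory_used_bytes: int
    memory_limit_bytes: int  # 0 is treated here as "no limit set"


def memory_utilization(sample: ContainerSample) -> Optional[float]:
    """Memory used as a percentage of the limit, or None without a limit."""
    if sample.memory_limit_bytes <= 0:
        return None
    return 100.0 * sample.memory_used_bytes / sample.memory_limit_bytes


def kernel_cpu_share(sample: ContainerSample) -> Optional[float]:
    """Kernel-space CPU time as a percentage of kernel + user time."""
    total = sample.cpu_kernel_ns + sample.cpu_user_ns
    if total <= 0:
        return None
    return 100.0 * sample.cpu_kernel_ns / total


if __name__ == "__main__":
    sample = ContainerSample("example", 250, 750, 512, 2048)
    print(memory_utilization(sample), kernel_cpu_share(sample))
```

Tracking memory against the limit, rather than as an absolute byte count, makes one alert rule reusable across containers with different limits.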

Host Level Metrics

In addition to container level metrics, it is important to monitor host level metrics to determine whether the host has been sized appropriately against the baseline and whether any resource limit is being exceeded. The metrics worth monitoring may differ by operational mode (External Services, Active/Active, Mounted Disk). The following metrics are useful to collect from the TFE instance host and should be treated as a minimum:

CPU utilization
Memory utilization
Disk space
Disk IOPS (read/write)
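For a quick manual check of two of the host metrics listed above (memory utilization and disk space), a standard-library-only sketch follows. It assumes a Linux host where /proc/meminfo exposes MemTotal and MemAvailable; CPU utilization and disk IOPS are left to a proper monitoring agent and are not covered here.

```python
import shutil
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; the unit suffix is dropped.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def memory_used_percent(meminfo: Dict[str, int]) -> float:
    """Percentage of memory not available for new workloads."""
    total = meminfo["MemTotal"]
    available = meminfo["MemAvailable"]
    return 100.0 * (total - available) / total


def disk_used_percent(path: str = "/") -> float:
    """Percentage of disk space used on the filesystem containing `path`."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total if usage.total else 0.0


if __name__ == "__main__":
    with open("/proc/meminfo") as handle:  # Linux only
        info = parse_meminfo(handle.read())
    print(f"memory used: {memory_used_percent(info):.1f}%")
    print(f"disk used:   {disk_used_percent('/'):.1f}%")
```

For continuous collection, the cloud provider agents referenced in the Appendix are a better fit than a script like this.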

NOTE: For TFE instances deployed in External Services or Active/Active mode, metrics should also be collected for the following services in addition to the TFE instance and containers:

* Database (PostgreSQL, RDS, etc.)
* Redis (Active/Active mode only)

Links to relevant metrics can be found in the Appendix.

Appendix I

Export host metrics in AWS to CloudWatch

Export host metrics in Azure to Azure Monitor

Export host metrics in GCP to GCP dashboards

AWS RDS Metrics (SQL)

AWS ElastiCache Metrics (Redis)

Azure Database Metrics (SQL)

Azure Cache (Redis)

GCP Cloud SQL Metrics (SQL)

GCP MemoryStore (Redis)

Appendix II (Examples)

Overview: Terraform Enterprise Monitoring

Example: Monitoring with Azure Monitor

Example: Monitoring within GCP

Additional Information