Introduction
Terraform Enterprise supports internal monitoring by capturing metrics from each component at runtime. Two dedicated containers, telegraf and influxdb, assist the monitoring process. They are only available when enable_metrics_collection is set to 1 in the application configuration, or when the Enable Metrics Collection setting is enabled in the Replicated console at https://<HOSTNAME>:8800/settings.
Problems
Terraform Enterprise v202104-1 fails to start with the error message below, leaving the ptfe-atlas container (tfe-atlas for v202205-1 or above) stuck in a restart loop. The application may return error 502 or a gateway timeout because the core application is unable to start. When checking the logs with the command "docker logs ptfe-atlas", the error below is displayed in the output.
2021-05-07T03:12:38.904354854Z [1] ERROR: ! statsd: notify stats failed:
2021-05-07T03:12:38.904391183Z getaddrinfo: Name does not resolve
2021-05-07T03:12:38.904396035Z /app/vendor/bundle/ruby/2.6.0/gems/puma-plugin-statsd-1.2.1/lib/puma/plugin/statsd.rb:23:in `send'
2021-05-07T03:12:38.904410133Z /app/vendor/bundle/ruby/2.6.0/gems/puma-plugin-statsd-1.2.1/lib/puma/plugin/statsd.rb:23:in `send'
2021-05-07T03:12:38.904414179Z /app/vendor/bundle/ruby/2.6.0/gems/puma-plugin-statsd-1.2.1/lib/puma/plugin/statsd.rb:175:in `block in stats_loop'
2021-05-07T03:12:38.904418070Z /app/vendor/bundle/ruby/2.6.0/gems/puma-plugin-statsd-1.2.1/lib/puma/plugin/statsd.rb:171:in `loop'
2021-05-07T03:12:38.904422120Z /app/vendor/bundle/ruby/2.6.0/gems/puma-plugin-statsd-1.2.1/lib/puma/plugin/statsd.rb:171:in `stats_loop'
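To check quickly whether a container log contains this signature, a small shell helper can be used. This is only a minimal sketch: the function name has_statsd_error is hypothetical, and it matches only the "statsd: notify stats failed" fragment from the excerpt above.

```shell
#!/bin/sh
# Hypothetical helper: reads log text on stdin and succeeds (exit 0)
# if it contains the statsd failure shown in the log excerpt.
has_statsd_error() {
  grep -q 'statsd: notify stats failed'
}

# Example usage on the Terraform Enterprise host
# (use tfe-atlas instead of ptfe-atlas on v202205-1 or above):
#   docker logs ptfe-atlas 2>&1 | has_statsd_error && echo "statsd error found"
```

The 2>&1 redirection is there because docker logs writes the container's stderr stream to its own stderr, which a plain pipe would not capture.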
Causes
- Previous versions of Terraform Enterprise used a static hosts file in each container to resolve hostnames, so disabling metrics collection (and therefore not running the metrics containers) caused no issue. v202104 introduced dedicated Docker networks to isolate traffic, so those hosts files were no longer needed and were removed, and hostname resolution between containers became dynamic. Unfortunately, the ptfe_atlas container does not handle this dynamically in v202104.
- Terraform Enterprise expects the value 1 to enable the property enable_metrics_collection and the value 0 to disable it. If the value is set to true, false, or any other string, the application does not recognize it and startup fails with the same error.
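The value rule described above can be sketched as a small shell check. The helper name is_valid_metrics_value is hypothetical and not part of Terraform Enterprise or Replicated. It only encodes the rule stated here, that 1 and 0 are the accepted values.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only for the two values this article
# says are accepted for enable_metrics_collection.
is_valid_metrics_value() {
  case "$1" in
    0|1) return 0 ;;
    *)   return 1 ;;
  esac
}

# Example: check a few candidate values before applying one.
for value in 1 0 true false; do
  if is_valid_metrics_value "$value"; then
    echo "'$value' is valid"
  else
    echo "'$value' is not valid, use 1 or 0"
  fi
done
```

To find the value currently configured, the output of replicatedctl app-config export can be inspected for enable_metrics_collection, assuming that subcommand is available in your Replicated version.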
NOTE: For Terraform Enterprise v202205-1 or later, the container names have changed because the leading "p" has been dropped. For example, the container named ptfe-atlas before v202205-1 is named tfe-atlas in v202205-1 and above.
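For scripts that must work across releases, the naming rule in this note can be sketched as follows. The function atlas_container_name is a hypothetical helper. It assumes release strings of the form vYYYYMM-N, as used in this article, and treats anything it cannot parse as a pre-v202205-1 release.

```shell
#!/bin/sh
# Hypothetical helper: prints the atlas container name for a given
# Terraform Enterprise release such as v202104-1 or v202205-1.
atlas_container_name() {
  release=${1#v}          # drop an optional leading "v"
  month=${release%%-*}    # keep the YYYYMM part before the first "-"
  if [ "$month" -ge 202205 ] 2>/dev/null; then
    echo "tfe-atlas"      # v202205-1 and above: "p" prefix dropped
  else
    echo "ptfe-atlas"     # earlier releases (or unparsable input)
  fi
}

# Example usage:
#   docker logs "$(atlas_container_name v202104-1)"
```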
Solutions
- If your Terraform Enterprise is on version v202104-1, you must either enable metrics collection or upgrade to v202105-1.
- If your Terraform Enterprise environment is already on v202105-1 or above, update the Enable Metrics Collection setting to ensure that it is configured with a valid value.
To enable metrics collection, follow one of these two options:
- Replicated UI
  - Navigate to https://<HOSTNAME>:8800/settings
  - Select Enable Metrics Collection.
  - Save
  - Restart the application
- Replicated CLI
  - SSH to the Terraform Enterprise host
  - Execute the command below to enable metrics collection:
replicatedctl app-config set enable_metrics_collection --value 1
To disable metrics collection, follow one of these two options:
- Replicated UI
  - Navigate to https://<HOSTNAME>:8800/settings
  - Deselect Enable Metrics Collection.
  - Save
  - Restart the application
- Replicated CLI
  - SSH to the Terraform Enterprise host
  - Execute the command below to disable metrics collection:
replicatedctl app-config set enable_metrics_collection --value 0