Introduction
Terraform Enterprise supports internal monitoring by capturing metrics from each component at runtime. Monitoring relies on two dedicated containers, telegraf and influxdb. They are only available when enable_metrics_collection is set to 1 in the application configuration, or when the Enable Metrics Collection setting is selected in the Replicated console at https://<HOSTNAME>:8800/settings.
Problems
TFE v202104-1 fails to start with the error message below, leaving the ptfe_atlas container stuck in a restart loop. Because the core application cannot start, users may see HTTP 502 (Bad Gateway) errors. Running the command "docker logs ptfe_atlas" shows the following error in the output.
2021-05-07T03:12:38.904354854Z [1] ERROR: ! statsd: notify stats failed:
2021-05-07T03:12:38.904391183Z getaddrinfo: Name does not resolve
2021-05-07T03:12:38.904396035Z /app/vendor/bundle/ruby/2.6.0/gems/puma-plugin-statsd-1.2.1/lib/puma/plugin/statsd.rb:23:in `send'
2021-05-07T03:12:38.904410133Z /app/vendor/bundle/ruby/2.6.0/gems/puma-plugin-statsd-1.2.1/lib/puma/plugin/statsd.rb:23:in `send'
2021-05-07T03:12:38.904414179Z /app/vendor/bundle/ruby/2.6.0/gems/puma-plugin-statsd-1.2.1/lib/puma/plugin/statsd.rb:175:in `block in stats_loop'
2021-05-07T03:12:38.904418070Z /app/vendor/bundle/ruby/2.6.0/gems/puma-plugin-statsd-1.2.1/lib/puma/plugin/statsd.rb:171:in `loop'
2021-05-07T03:12:38.904422120Z /app/vendor/bundle/ruby/2.6.0/gems/puma-plugin-statsd-1.2.1/lib/puma/plugin/statsd.rb:171:in `stats_loop'
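To confirm that a restart loop is caused by this particular error, the log output can be searched for the statsd failure line. The following is a minimal sketch; the function name has_statsd_failure is hypothetical and only matches the message text shown above.

```shell
#!/bin/sh
# Read log text on stdin and succeed (exit status 0) if it contains
# the statsd notification failure shown in the log excerpt above.
# has_statsd_failure is a hypothetical helper name.
has_statsd_failure() {
  grep -q 'statsd: notify stats failed'
}
```

One possible use is `docker logs ptfe_atlas 2>&1 | has_statsd_failure && echo "statsd hostname error found"`, which redirects stderr into the pipe so that both output streams are searched.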
Causes
- Previous versions of TFE used a static hosts file in each container to resolve hostnames, so disabling metrics collection, and therefore running without the metrics containers, caused no issue. With v202104 we introduced dedicated Docker networks to isolate traffic. The hosts files were no longer needed and were removed, and hostname resolution between containers became dynamic. The ptfe_atlas container in v202104 does not handle this dynamic resolution, so when the metrics containers are absent the hostname lookup fails.
- Terraform Enterprise expects enable_metrics_collection to be 1 (enabled) or 0 (disabled). If the value is set to true, false, or any other string, the application does not recognize it and fails to start with the same error.
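The second cause comes down to a simple rule: only the literal values 1 and 0 are accepted. A minimal sketch of that check is shown below; both function names are hypothetical and are not part of Terraform Enterprise or Replicated.

```shell
#!/bin/sh
# Succeed when the given value is one that Terraform Enterprise accepts
# for enable_metrics_collection ("1" or "0"); fail for anything else,
# including "true", "false", and the empty string.
is_valid_metrics_value() {
  case "$1" in
    0|1) return 0 ;;
    *)   return 1 ;;
  esac
}

# Hypothetical helper: print a short verdict for a value.
describe_metrics_value() {
  if is_valid_metrics_value "$1"; then
    echo "ok: enable_metrics_collection=$1"
  else
    echo "invalid: enable_metrics_collection=$1 (expected 1 or 0)"
  fi
}
```

On a live instance, the current value could be fed into describe_metrics_value from the exported settings, for example with `replicatedctl app-config export | jq -r '.enable_metrics_collection.value'`. This assumes jq is installed and that the export is JSON keyed by setting name with a nested value field, so verify the output shape on your installation first.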
Solutions
- If your Terraform Enterprise environment is on v202104-1, either enable metrics collection or upgrade to v202105-1.
- If your Terraform Enterprise environment is already on v202105-1 or later, update the Enable Metrics Collection setting so that it is configured with a valid value (1 or 0).
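The choice between the two remedies above depends only on the release string. The sketch below illustrates that decision, assuming releases follow the vYYYYMM-N pattern used in this article. The function names are hypothetical, and the release string is expected to be supplied by the operator.

```shell
#!/bin/sh
# Succeed when release $1 is the same as or newer than release $2.
# Releases are assumed to look like "v202104-1" (vYYYYMM-N).
release_at_least() {
  a=${1#v}; b=${2#v}
  a_main=${a%%-*}; a_patch=${a##*-}
  b_main=${b%%-*}; b_patch=${b##*-}
  if [ "$a_main" -ne "$b_main" ]; then
    [ "$a_main" -gt "$b_main" ]
  else
    [ "$a_patch" -ge "$b_patch" ]
  fi
}

# Print the remedy from the list above that applies to a release.
suggest_fix() {
  if [ "$1" = "v202104-1" ]; then
    echo "enable metrics collection or upgrade to v202105-1"
  elif release_at_least "$1" v202105-1; then
    echo "set enable_metrics_collection to 1 or 0"
  else
    echo "release not covered by this article"
  fi
}
```

For example, `suggest_fix v202104-1` prints the first remedy, while `suggest_fix v202106-1` prints the second.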
To enable metrics collection, use one of these two options:
- Replicated UI
  - Navigate to https://<HOSTNAME>:8800/settings
  - Select Enable Metrics Collection.
  - Save the settings.
  - Restart the application.
- Replicated CLI
  - SSH to the Terraform Enterprise instance.
  - Run the command below to enable metrics collection:
replicatedctl app-config set enable_metrics_collection --value 1
To disable metrics collection, use one of these two options:
- Replicated UI
  - Navigate to https://<HOSTNAME>:8800/settings
  - Deselect Enable Metrics Collection.
  - Save the settings.
  - Restart the application.
- Replicated CLI
  - SSH to the Terraform Enterprise instance.
  - Run the command below to disable metrics collection:
replicatedctl app-config set enable_metrics_collection --value 0
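Because an unrecognized value causes the same startup failure, it can help to wrap the two CLI commands above in a small guard that rejects anything other than 1 or 0 before calling replicatedctl. The following is a minimal sketch; the function name set_metrics_collection and the REPLICATEDCTL override variable are hypothetical conveniences, not part of Replicated.

```shell
#!/bin/sh
# Command used to talk to Replicated. It can be overridden (for example
# with "echo") to dry-run the function without changing any settings.
REPLICATEDCTL="${REPLICATEDCTL:-replicatedctl}"

# Set enable_metrics_collection to 1 (enable) or 0 (disable).
# Any other value is rejected before replicatedctl is invoked.
set_metrics_collection() {
  case "$1" in
    0|1) ;;
    *)
      echo "error: value must be 1 or 0, got '$1'" >&2
      return 2
      ;;
  esac
  "$REPLICATEDCTL" app-config set enable_metrics_collection --value "$1"
}
```

Calling `set_metrics_collection 1` runs the enable command shown earlier, and `set_metrics_collection 0` runs the disable command. The Replicated UI steps include an application restart. A change made from the CLI likely needs an equivalent step before it takes effect, possibly `replicatedctl app apply-config`, but treat that as an assumption and verify it against the documentation for your release.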