Problem
When enable_metrics_collection is enabled in Terraform Enterprise, you may encounter errors in the Telegraf logs related to Docker socket permissions and the InfluxDB database not being found. This can occur during new installations or after an upgrade.
Example errors from the telegraf.stderr log:
E! [inputs.docker] Error in plugin: Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.21/info": dial unix /var/run/docker.sock: connect: permission denied
E! [outputs.influxdb] When writing to [http://influxdb:8086]: 404 Not Found: database not found: "hashicorp"
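To check whether an instance shows either symptom, you could scan the Telegraf stderr log for the two error signatures above. This is a minimal sketch: the function name `check_telegraf_errors` is hypothetical, and because the log's location on disk is not stated here, the path is passed in as an argument.

```shell
# Hypothetical helper: report which of the two known Telegraf errors appear
# in a log file. The log's location is not assumed; pass its path in.
check_telegraf_errors() {
    log_file="$1"

    # Docker input plugin could not connect to the Docker socket.
    if grep -q 'inputs\.docker.*permission denied' "$log_file"; then
        echo "docker-socket-permission-denied"
    fi

    # InfluxDB output could not find the "hashicorp" database.
    if grep -q 'database not found: "hashicorp"' "$log_file"; then
        echo "influxdb-database-not-found"
    fi
}

# Example usage (the path is a placeholder, not a documented location):
# check_telegraf_errors /path/to/telegraf.stderr
```

The function prints one label per matched error and nothing when neither appears, so its output can be compared directly in scripts.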
Prerequisites
- Terraform Enterprise versions up to and including v202112-2.
Cause
A change to the Telegraf Docker image causes the Telegraf process to run as the non-root user telegraf. This user does not have the necessary permissions to access the Docker socket (/var/run/docker.sock), which is owned by root.
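Since the socket error comes down to file ownership and mode, one way to see whether a process running as an unrelated non-root user could open the socket is to inspect those bits. A minimal sketch, assuming GNU coreutils `stat` (Linux); `describe_socket_access` is a hypothetical name. Connecting to a Unix socket requires write permission on the socket file, so only the "other" write bit is checked, and membership in the socket's group is deliberately not considered.

```shell
# Hypothetical helper: print owner:group, octal mode, and whether "other"
# users (not the owner, not in the group) may connect to a Unix socket.
# Connecting to a Unix socket requires write permission on the socket file,
# so only the "other" write bit is checked. Group membership is NOT
# considered. Assumes GNU coreutils `stat`.
describe_socket_access() {
    path="${1:-/var/run/docker.sock}"
    info=$(stat -c '%U:%G %a' "$path") || return 1
    mode=${info##* }            # octal mode such as 660
    other=$(( mode % 10 ))      # last digit holds the "other" bits

    if [ $(( other & 2 )) -ne 0 ]; then
        echo "$info other: can write"
    else
        echo "$info other: no write"
    fi
}

# Example usage on the instance:
# describe_socket_access /var/run/docker.sock
```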
Additionally, a failed CA certificate injection can prevent the InfluxDB database from initializing correctly, leading to the 404 Not Found: database not found: "hashicorp" error.
You can confirm the database issue by running the following commands on the instance.
- Access the influxdb container shell.
  $ docker exec -it influxdb bash
- Attempt to list the databases. The command fails because the admin user was never created.
  # influx -precision rfc3339 -username $INFLUXDB_ADMIN_USER -password $INFLUXDB_ADMIN_USER_PASSWORD
  > SHOW DATABASES
  ERR: error authorizing query: create admin user first or disable authentication
  Warning: It is possible this error is due to not setting a database. Please set a database with the command "use <database>".
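If you prefer not to open an interactive shell, the same check can be sketched non-interactively with the `influx` CLI's `-execute` flag. The helper names below are hypothetical; the container name `influxdb` and the credential variable names are taken from the commands above, and the availability of `sh` inside the container is an assumption.

```shell
# Hypothetical helpers for a non-interactive version of the check above.

# Read `SHOW DATABASES` output on stdin; succeed only if the named database
# (default "hashicorp") appears as a line of its own.
has_influx_database() {
    grep -qxF "${1:-hashicorp}"
}

# Run SHOW DATABASES inside the influxdb container. The single quotes make
# the credential variables expand inside the container, not on the host.
show_influx_databases() {
    docker exec influxdb sh -c \
        'influx -username "$INFLUXDB_ADMIN_USER" -password "$INFLUXDB_ADMIN_USER_PASSWORD" -execute "SHOW DATABASES"'
}

# Example usage on the instance:
# if show_influx_databases | has_influx_database hashicorp; then
#     echo "hashicorp database present"
# else
#     echo "hashicorp database missing or query failed"
# fi
```

Splitting the parsing from the `docker exec` call keeps the matching logic testable on any machine.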
Solutions
The permanent fix for this issue was released in Terraform Enterprise versions after v202112-2. For affected versions, the workaround is to disable metrics collection.
Solution 1: Disable Metrics Collection on Standalone Instances
You can disable metrics collection either through the Replicated Console UI or via the command line.
Option A: Using the Replicated Console
- Log in to the Replicated Console at https://$TFE_FQDN:8800.
- Navigate to the Advanced Settings page.
- Uncheck Enable Metrics Collection.
- Click Save to apply the changes and restart the Terraform Enterprise application.
Option B: Using the Command Line
- Log in to the Terraform Enterprise host via SSH.
- Execute the following commands to disable metrics collection and apply the configuration.
  $ replicatedctl app-config set enable_metrics_collection --value 0
  $ replicatedctl app apply-config
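Whichever option you use, you could then confirm the stored value by parsing the output of `replicatedctl app-config export`. This is a sketch under assumptions: `app_config_value` is a hypothetical helper, it assumes the export uses a `{"setting_name": {"value": "..."}}` shape, and because it strips all whitespace before matching, it is only suitable for values without spaces, such as `0` or `1`.

```shell
# Hypothetical helper: print the value of one setting from the JSON produced
# by `replicatedctl app-config export`, read on stdin. Assumes the
# {"setting_name": {"value": "..."}} shape and a value with no whitespace,
# because all whitespace is removed before matching.
app_config_value() {
    key="$1"
    tr -d ' \t\r\n' |
        grep -o "\"$key\":{\"value\":\"[^\"]*\"" |
        sed 's/.*"value":"\([^"]*\)"$/\1/'
}

# Example usage on the instance; after the change this should print 0:
# replicatedctl app-config export | app_config_value enable_metrics_collection
```

Plain `tr`, `grep`, and `sed` are used instead of a JSON tool so the sketch has no extra dependencies; `jq` would be more robust if it is installed.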
Solution 2: Disable Metrics Collection on Active/Active Instances
- Log in to the secondary node via SSH and drain it.
  $ tfe-admin node-drain
- On the remaining primary node, execute the following commands to disable metrics collection and restart the application.
  ## Disable metrics collection
  $ tfe-admin app-config -k enable_metrics_collection -v 0
  ## Stop the application and confirm its status
  $ replicatedctl app stop
  $ replicatedctl app status
  ## Start the application
  $ replicatedctl app start
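Instead of re-running the status command by hand, the wait between stopping and starting could be scripted. This is a hedged sketch: all function names are hypothetical, and it assumes `replicatedctl app status` prints JSON containing a `"State"` field (for example `"stopped"` or `"started"`); adjust the parsing if your version prints something different.

```shell
# Hypothetical helpers. Assumes `replicatedctl app status` prints JSON with
# a "State" field; adjust the parsing if your version differs.

# Wraps the real command so it can be replaced when testing.
app_status_json() {
    replicatedctl app status
}

# Print the first "State" value (not "DesiredState") from the status JSON.
current_app_state() {
    app_status_json | tr -d ' \t\r\n' |
        grep -o '[{,]"State":"[^"]*"' | head -n 1 |
        sed 's/.*"State":"\([^"]*\)"$/\1/'
}

# Poll every 5 seconds until the state matches, or fail after a limit.
wait_for_app_state() {
    target="$1"
    limit="${2:-300}"    # seconds
    waited=0
    while [ "$(current_app_state)" != "$target" ]; do
        if [ "$waited" -ge "$limit" ]; then
            echo "timed out waiting for state '$target'" >&2
            return 1
        fi
        sleep 5
        waited=$((waited + 5))
    done
}

# Example usage between the stop and start commands above:
# replicatedctl app stop
# wait_for_app_state stopped 300 && replicatedctl app start
```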
Outcome
After you disable metrics collection, the influxdb and telegraf containers will be removed. You can confirm their removal by running the following command, which should produce no output.
$ docker ps -a | grep -Ei 'telegraf|influxdb'
Additional Information
- The fix for this issue was proposed in this GitHub pull request.