Prerequisites
- All TFE versions up to v202112-2 ~ 590.
Problem
This error can occur on new installs or upgrades when enable_metrics_collection is enabled.
Cause
While enable_metrics_collection is enabled, the following errors appear in the telegraf logs:
2021-12-22T17:48:50Z E! [inputs.docker] Error in plugin: Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.21/containers/json?filters=%7B%22status%22%3A%5B%22running%22%5D%7D&limit=0": dial unix /var/run/docker.sock: connect: permission denied
2021-12-22T17:48:53Z E! [outputs.influxdb] When writing to [http://influxdb:8086]: 404 Not Found: database not found: "hashicorp"
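The telegraf Docker image was changed so that the telegraf process now runs as user telegraf instead of root. As the output below shows, the Docker socket is accessible only to root and to GID 998 (the docker group on the host), so the telegraf user is denied access: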
docker exec -it telegraf ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
telegraf 1 0.5 0.5 5025432 92880 ? Ssl 17:45 0:01 telegraf
getent group docker
docker:x:998:replicated,azureuser
docker exec -i telegraf ls -l /var/run/docker.sock
srw-rw---- 1 root 998 0 Dec 20 22:27 /var/run/docker.sock
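The ownership check above can be scripted as a rough sketch. The helper name gid_in_list is hypothetical. The usage lines assume GNU stat on the host and that the id command is available inside the telegraf container, neither of which is confirmed by this article.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the GID in $1 appears in the
# space-separated GID list in $2 (the format printed by `id -G`).
gid_in_list() {
    for gid in $2; do
        [ "$gid" = "$1" ] && return 0
    done
    return 1
}

# Usage on the TFE host (assumptions: GNU stat, `id` present in the container):
#   sock_gid=$(stat -c '%g' /var/run/docker.sock)
#   user_gids=$(docker exec telegraf id -G)
#   if gid_in_list "$sock_gid" "$user_gids"; then
#       echo "telegraf user shares the socket's group"
#   else
#       echo "telegraf user is not in group $sock_gid"
#   fi
```

With the output shown above, the socket GID is 998, so the check fails unless the telegraf user's group list includes 998.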
Gather a support bundle and check the following files:
1. primary/app/logs/telegraf.stdout
Skipping CA certificate injection. Destination CA certificate file /etc/ssl/certs/ca-certificates.crt does not exist or is not writable.
2. primary/app/logs/telegraf.stderr
E! [inputs.docker] Error in plugin: Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.21/info": dial unix /var/run/docker.sock: connect: permission denied
E! [inputs.docker] Error in plugin: Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.21/containers/json?filters=%7B%22status%22%3A%5B%22running%22%5D%7D&limit=0": dial unix /var/run/docker.sock: connect: permission denied
E! [outputs.influxdb] When writing to [http://influxdb:8086]: 404 Not Found: database not found: "hashicorp"
The failed CA certificate injection shown above prevents the InfluxDB database from initializing. The 404 Not Found: database not found: "hashicorp" error in the telegraf.stderr log confirms this.
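As a minimal sketch, the two error signatures quoted above can be searched for in an extracted support bundle with a small shell helper. The function name classify_telegraf_log and its output labels are hypothetical and not part of any HashiCorp tooling; the helper only pattern-matches the messages quoted in this article.

```shell
#!/bin/sh
# Hypothetical helper: reads telegraf log text on stdin and prints one
# label per known error signature found, or "no-known-signature".
classify_telegraf_log() {
    log=$(cat)
    found=0
    # Docker socket permission error from the docker input plugin
    if printf '%s\n' "$log" | grep -q 'inputs\.docker.*permission denied'; then
        echo "docker-socket-permission-denied"
        found=1
    fi
    # InfluxDB database was never initialized
    if printf '%s\n' "$log" | grep -q 'outputs\.influxdb.*database not found'; then
        echo "influxdb-database-not-found"
        found=1
    fi
    if [ "$found" -eq 0 ]; then
        echo "no-known-signature"
    fi
}

# Usage, assuming the bundle was extracted into the current directory:
#   classify_telegraf_log < primary/app/logs/telegraf.stderr
```

Seeing both labels together matches the failure described in this article; either one alone may point to a different cause.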
The uninitialized database can also be confirmed directly on the instance by running:
docker exec -it influxdb bash
influx -precision rfc3339 -username $INFLUXDB_ADMIN_USER -password INFLUXDB_ADMIN_USER_PASSWORD
SHOW DATABASES
ERR: error authorizing query: create admin user first or disable authentication
Warning: It is possible this error is due to not setting a database.
Please set a database with the command "use <database>".
Solution
A fix has been proposed in this GitHub PR and will ship in the next release of Terraform Enterprise.
(v202112-2 ~ 590 is the current release at the time of this writing.)
Workaround:
For Standalone Installs
- Log in to the Replicated console at https://$TFE_FQDN:8800/settings#advanced.
- Uncheck "Enable Metrics Collection".
- Click Save to restart the TFE application.
OR
Log in to the Terraform Enterprise host via SSH and execute the following commands:
replicatedctl app-config set enable_metrics_collection --value 0
replicatedctl app apply-config
For Active/Active Installs
Log in via SSH and execute the following commands:
tfe-admin node-drain # Run on the secondary node, if one exists, before scaling in to 1 node
tfe-admin app-config -k enable_metrics_collection -v 0 # Run on the remaining node before scaling out
replicatedctl app stop
watch replicatedctl app status # Confirm the App has successfully stopped
replicatedctl app start
Workaround Outcome
Both the InfluxDB and telegraf containers will be removed. Confirm with:
docker ps -a|grep -Ei 'telegraf|influxdb' # Output will not return any results
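For scripted verification, the same check can be wrapped in a function that returns an exit status instead of relying on empty output. This is only a sketch; the name metrics_containers_removed is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `docker ps -a` output on stdin and succeeds
# only if no telegraf or influxdb container is listed.
metrics_containers_removed() {
    ! grep -Eiq 'telegraf|influxdb'
}

# Usage on the TFE host:
#   if docker ps -a | metrics_containers_removed; then
#       echo "metrics containers removed"
#   else
#       echo "metrics containers still present"
#   fi
```

The match is case-insensitive and applies to the whole line, so it also catches containers whose image name, rather than container name, contains telegraf or influxdb.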
Additional Information
- Please contact HashiCorp Support to request further assistance.