Problem
After the initial deployment of TFE FDO on GCP, the application appears to be running properly: the container is created, the UI loads, and you can authenticate and create a new workspace. However, the health check may fail with the following error:
ERROR: error reaching http://127.0.0.1:7675/healthz: Get "http://127.0.0.1:7675/healthz": dial tcp 127.0.0.1:7675: connect: connection refused
In addition, a test Terraform run fails with the following error:
There was an error connecting to Terraform Cloud.
Please do not exit Terraform to prevent data loss! Trying to restore the connection...
│ Error: Failed to create configuration version: internal server error
│ Terraform Cloud returned an unexpected error. Sometimes this is caused by network connection problems, in which case you could retry the
│ command. If the issue persists please open a support ticket to get help resolving the problem.
Cause
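Google Cloud Storage JSON credentials were not provided in the Docker Compose configuration (the TFE_OBJECT_STORAGE_GOOGLE_CREDENTIALS environment variable). Judging by the log entry below, the archivist service then tries to obtain a token for the instance's default service account from the GCE metadata server and fails to start because none is defined.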
To confirm this, inspect archivist.log in either of the following ways:
- Run the following Docker command:
docker exec -it terraform-enterprise-tfe-1 bash -c "cat /var/log/terraform-enterprise/archivist.log"
- Check archivist.log in the support bundle.
In both cases, the log contains an error similar to the following:
{"@level":"error","@message":"failed to start server","@module":"archivist","@timestamp":"2024-03-02T09:19:04.449099Z","err":"failed querying bucket attrs: Get \"https://XYZ?alt=json\u0026prettyPrint=false\u0026projection=full\": metadata: GCE metadata \"instance/service-accounts/default/token\" not defined"}
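The check above can also be scripted. The following is a minimal shell sketch, assuming the log has been saved to a local file (for example by redirecting the output of the docker exec command). The function name archivist_missing_gcs_creds is hypothetical, and it matches only on two substrings seen in the error above, so treat a match as a hint rather than proof.

```shell
# Hypothetical helper: exit 0 if the given archivist log contains the
# GCE metadata error shown above, non-zero otherwise.
archivist_missing_gcs_creds() {
  log_file=$1
  # Both substrings must appear on the same log line.
  grep -F 'failed querying bucket attrs' "$log_file" \
    | grep -qF 'instance/service-accounts/default/token'
}

# Example usage (container name taken from the command above):
#   docker exec terraform-enterprise-tfe-1 cat /var/log/terraform-enterprise/archivist.log > archivist.log
#   archivist_missing_gcs_creds archivist.log && echo "GCS credentials appear to be missing"
```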
Solution
Depending on how the GCP infrastructure was provisioned, you can obtain the required JSON by:
- Following this guide (manual configuration in the Google Cloud console)
- Using the google_service_account_key.example.private_key attribute (Terraform code)
Once you have the JSON, set it as the value of TFE_OBJECT_STORAGE_GOOGLE_CREDENTIALS in the Docker Compose file and redeploy TFE FDO.
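For the Terraform option, the private_key attribute of google_service_account_key holds the JSON key file in base64-encoded form, so it has to be decoded before use. The following is a minimal shell sketch, assuming the encoded value arrives on standard input (for instance from a Terraform output; the output name sa_key below is hypothetical) and that the plain JSON text is what goes into the environment variable. The function name decode_sa_key is also hypothetical.

```shell
# Hypothetical helper: read a base64-encoded service account key on stdin,
# write the decoded JSON to the file named by the first argument, and
# succeed only if the result looks like a service account key.
decode_sa_key() {
  out_file=$1
  # Create the file readable by the owner only, since it holds a private key.
  ( umask 077 && base64 --decode > "$out_file" ) || return 1
  # Basic sanity check: a service account key file declares this type.
  grep -q '"type": *"service_account"' "$out_file"
}

# Example usage (assumes a Terraform output named sa_key, which is not
# part of the article):
#   terraform output -raw sa_key | decode_sa_key credentials.json
```

The sanity check only looks for the "type" field; it does not validate the JSON or the key itself.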
Outcome
The following health check succeeds, and Terraform runs are processed without errors:
docker compose exec tfe tfe-health-check-status
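Right after a redeploy the services may still be starting, so a single health check can fail even when the configuration is correct. The following is a minimal shell sketch of a retry wrapper; the function name retry, the attempt count, and the delay are all hypothetical choices, not values from the product documentation.

```shell
# Hypothetical helper: run a command up to max_attempts times, sleeping
# delay seconds between attempts; succeed as soon as the command succeeds.
retry() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    attempt=$((attempt + 1))
    # Do not sleep after the final failed attempt.
    [ "$attempt" -le "$max_attempts" ] && sleep "$delay"
  done
  return 1
}

# Example usage (30 attempts, 10 seconds apart, are arbitrary values):
#   retry 30 10 docker compose exec tfe tfe-health-check-status
```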