Introduction
Terraform Enterprise v202103-1 introduced logic to upgrade the internally-managed PostgreSQL data from PostgreSQL 9.5 to PostgreSQL 12. This change only affects proof of concept and mounted disk installations.
Problem
Replicated, the service that Terraform Enterprise uses to schedule its containers, has a predefined timeout for how long it waits for certain events to trigger during the startup or upgrade phase. When this timeout is reached, Replicated marks Terraform Enterprise as failed to start and shows an error. Although the timeout is configurable, there is no reliable way to know in advance how long a given event will take, because the duration depends on factors such as the amount of disk space used and the available disk IOPS.
Operators upgrading from a previous version of Terraform Enterprise to Terraform Enterprise v202103-1 may run into the following error:
Timeout waiting for event PostgreSQL Upgraded
Cause
This error occurs when the PostgreSQL data upgrade takes longer than the Replicated timeout allows.
Solution
Once a Replicated timeout is reached and the Terraform Enterprise application is marked as failed to start, the containers actually running the Terraform Enterprise application may remain in the running state until the operator intervenes. Therefore, check the logs of the ptfe_postgres_upgrade (proof of concept) or ptfe_postgres_upgrade_disk (mounted disk) container to see the current status of the upgrade.
For proof of concept installations:
$ sudo docker logs -f ptfe_postgres_upgrade
For mounted disk installations:
$ sudo docker logs -f ptfe_postgres_upgrade_disk
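If it is not clear which installation type is in use, a small shell helper can look for either upgrade container by name. This is a sketch under stated assumptions: the function name find_upgrade_container is hypothetical and not part of Terraform Enterprise or Replicated, and it assumes the upgrade container still exists (running or stopped) so that `docker ps -a` lists it.

```shell
#!/usr/bin/env bash
# find_upgrade_container (hypothetical helper, not part of TFE or Replicated):
# reads container names, one per line, on stdin and prints the name of the
# PostgreSQL upgrade container it finds. Returns 1 if neither is present.
find_upgrade_container() {
  local names
  names="$(cat)"
  # grep -x requires a whole-line match, so "ptfe_postgres_upgrade" never
  # matches the "ptfe_postgres_upgrade_disk" name by accident.
  if grep -qx 'ptfe_postgres_upgrade_disk' <<<"$names"; then
    echo "ptfe_postgres_upgrade_disk"      # mounted disk installation
  elif grep -qx 'ptfe_postgres_upgrade' <<<"$names"; then
    echo "ptfe_postgres_upgrade"           # proof of concept installation
  else
    echo "no PostgreSQL upgrade container found" >&2
    return 1
  fi
}

# Example usage (lists all containers, including stopped ones):
#   container="$(sudo docker ps -a --format '{{.Names}}' | find_upgrade_container)" \
#     && sudo docker logs -f "$container"
```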
If the last line of the container logs reads:
PostgreSQL data upgrade completed successfully! Exiting.
Then the PostgreSQL data upgrade has completed successfully. Restart the Terraform Enterprise application so that it begins using the newly upgraded PostgreSQL data:
$ replicatedctl app stop
$ replicatedctl app start
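The check-then-restart step can be scripted so that the application is restarted only after the success message appears as the last log line. This is a minimal sketch: the function names upgrade_completed and restart_if_completed are hypothetical, and the logs are read without `-f` so that `docker logs` exits and the final line can be inspected.

```shell
#!/usr/bin/env bash
SUCCESS_LINE='PostgreSQL data upgrade completed successfully! Exiting.'

# upgrade_completed (hypothetical helper): reads the upgrade container's logs
# on stdin and succeeds only if the last line is the success message.
upgrade_completed() {
  local last
  last="$(tail -n 1)"
  [ "$last" = "$SUCCESS_LINE" ]
}

# restart_if_completed (hypothetical helper): restarts Terraform Enterprise
# only after the upgrade container reports success. Argument: container name.
restart_if_completed() {
  local container="$1"
  # No -f here: docker logs must exit so that tail can read the final line.
  # 2>&1 includes anything the container wrote to stderr.
  if sudo docker logs "$container" 2>&1 | upgrade_completed; then
    replicatedctl app stop
    replicatedctl app start
  else
    echo "Upgrade has not reported success yet; keep watching the logs." >&2
    return 1
  fi
}

# Example usage for a mounted disk installation:
#   restart_if_completed ptfe_postgres_upgrade_disk
```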
If the container logs are still being updated, the PostgreSQL data upgrade is still in progress. In this case, continue to watch the container logs until either an error occurs or the upgrade completes successfully. If an error occurs, the logs should contain detailed information that the operator can use to determine the next steps.
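One way to tell whether the logs are still being updated is to take two snapshots some time apart and compare them. The sketch below assumes a 60-second interval, which is an arbitrary choice, and the function name upgrade_progress is hypothetical. An "idle" result only means no new output appeared in that window; it is a prompt to read the end of the logs for an error, not proof that one occurred.

```shell
#!/usr/bin/env bash
SUCCESS_LINE='PostgreSQL data upgrade completed successfully! Exiting.'

# upgrade_progress (hypothetical helper): compares two log snapshots taken
# some time apart and prints "completed", "in-progress", or "idle".
upgrade_progress() {
  local before="$1" after="$2" last
  last="$(printf '%s\n' "$after" | tail -n 1)"
  if [ "$last" = "$SUCCESS_LINE" ]; then
    echo "completed"
  elif [ "$after" != "$before" ]; then
    # New output appeared between the snapshots: the upgrade is still running.
    echo "in-progress"
  else
    # No new output in the interval: inspect the end of the logs for errors.
    echo "idle"
  fi
}

# Example usage (the 60-second interval is an assumption):
#   container=ptfe_postgres_upgrade_disk
#   before="$(sudo docker logs "$container" 2>&1)"
#   sleep 60
#   after="$(sudo docker logs "$container" 2>&1)"
#   upgrade_progress "$before" "$after"
```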