Problem
Database migrations fail when upgrading to Terraform Enterprise v202305-1 or later. The RunPlanOnlyNonNull migration fails with the error
PG::CheckViolation: ERROR: check constraint "runs_plan_only_not_null" is violated by some row
and a stack trace resembling the following.
{"component":"atlas","log":"StandardError: An error has occurred, all later migrations canceled:"}
{"component":"atlas","log":""}
{"component":"atlas","log":"PG::CheckViolation: ERROR: check constraint \"runs_plan_only_not_null\" is violated by some row"}
...
{"component":"atlas","log":"Caused by:"}
{"component":"atlas","log":"ActiveRecord::StatementInvalid: PG::CheckViolation: ERROR: check constraint \"runs_plan_only_not_null\" is violated by some row"}
...
{"component":"atlas","log":"Caused by:"}
{"component":"atlas","log":"PG::CheckViolation: ERROR: check constraint \"runs_plan_only_not_null\" is violated by some row"}
...
{"component":"atlas","log":"2024-03-13 14:50:01 [INFO] Migrating to RunPlanOnlyNonNull (20230505152154)"}
{"component":"atlas","log":"== 20230505152154 RunPlanOnlyNonNull: migrating ==============================="}
{"component":"atlas","log":"-- execute(\"alter table runs add constraint runs_plan_only_not_null check (plan_only is not null) not valid;\")"}
{"component":"atlas","log":" -> 0.0035s"}
{"component":"atlas","log":"-- execute(\"alter table runs validate constraint runs_plan_only_not_null;\")"}
{"component":"atlas","log":"- migrations done! removing lock..."}
{"component":"atlas","log":"- exiting with failure - see migration output for details"}
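When the container log is long, the relevant entries can be hard to spot. The following is a minimal sketch, assuming the log has been captured as one JSON object per line in the shape shown above; the helper name constraint_violations is hypothetical and not part of Terraform Enterprise.

```ruby
require "json"

# Hypothetical helper (not part of Terraform Enterprise): returns the "log"
# messages that report a check-constraint violation for the given constraint.
def constraint_violations(lines, constraint: "runs_plan_only_not_null")
  lines.filter_map do |line|
    begin
      entry = JSON.parse(line)
    rescue JSON::ParserError
      next # skip lines that are not JSON, such as "..." elisions
    end
    next unless entry.is_a?(Hash)

    message = entry["log"].to_s
    message if message.include?("PG::CheckViolation") && message.include?(constraint)
  end
end

# Sample input copied from the log excerpt above.
sample = <<~'LOG'
  {"component":"atlas","log":"PG::CheckViolation: ERROR: check constraint \"runs_plan_only_not_null\" is violated by some row"}
  ...
  {"component":"atlas","log":"Caused by:"}
LOG

puts constraint_violations(sample.lines)
```

Only the first sample line is printed, because it is the only one that mentions both PG::CheckViolation and the constraint name.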
Prerequisites
- Terraform Enterprise instance is being upgraded to v202305-1 or later
Cause
Terraform Enterprise v202206-1 includes a background (asynchronous) database migration (BackfillPlanOnlyOnRuns) which uses the DataMigrations::BackfillPlanOnlyOnRuns migration class. This migration class is also used in a synchronous migration (RunPlanOnlyBackfillTfeFinish) included in v202305-1 and later. If the migration class is run as part of the installation of or an upgrade to v202205-1 or later, it is marked as completed and is skipped when invoked a second time by the RunPlanOnlyBackfillTfeFinish migration during an upgrade to v202305-1. If runs are created with a null plan_only value between those two versions, the skipped backfill leaves those rows unchanged, and a subsequent migration (RunPlanOnlyNonNull) fails because those rows violate the constraint it creates and validates on the runs table.
To confirm this is the cause, start a Rails console and check whether any runs have a null plan_only attribute.
irb(main):001:0> Run.where(plan_only: nil).any?
=> true
Additionally, inspect the background migration to confirm that it has already been recorded as completed and that the recorded timestamp predates the current installation or upgrade.
irb(main):002:0> BackgroundMigration.find_by(migration_class: "DataMigrations::BackfillPlanOnlyOnRuns").complete_at?
=> true
irb(main):003:0> BackgroundMigration.find_by(migration_class: "DataMigrations::BackfillPlanOnlyOnRuns").complete_at
=> Wed, 27 Mar 2023 15:37:21.988129000 UTC +00:00
Assuming the environment meets the prerequisites and this particular cause has been confirmed using the commands above, proceed to the solution below.
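The two checks above can be combined into a single decision. The following is a minimal sketch in plain Ruby; the helper name stale_backfill? and its parameters are hypothetical, and the timestamps are taken from the example output and log excerpt in this article.

```ruby
# Hypothetical helper: true only when null plan_only rows exist AND the
# backfill was already recorded as complete before the current upgrade began.
def stale_backfill?(null_runs_exist:, completed_at:, upgrade_started_at:)
  return false unless null_runs_exist   # no offending rows, so look for another cause
  return false if completed_at.nil?     # the backfill was never marked complete

  completed_at < upgrade_started_at
end

# In a Rails console the inputs could be gathered with the calls shown above:
#   migration = BackgroundMigration.find_by(migration_class: "DataMigrations::BackfillPlanOnlyOnRuns")
#   null_runs_exist = Run.where(plan_only: nil).any?
#   completed_at    = migration&.complete_at
puts stale_backfill?(
  null_runs_exist: true,
  completed_at: Time.utc(2023, 3, 27, 15, 37, 21),
  upgrade_started_at: Time.utc(2024, 3, 13, 14, 50, 1)
)
```

A result of true matches the cause described in this article; false suggests the migration failure has a different origin.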
Solution
This can be resolved by resetting the migration class's completion status and manually invoking the migration in the Rails console.
Start a Rails console and execute the following commands to set the background migration's completion timestamp to nil and invoke the migration.
migration = BackgroundMigration.find_by(migration_class: "DataMigrations::BackfillPlanOnlyOnRuns")
migration.complete_at = nil
migration.save!
BackgroundMigration.perform_all(migration.migration_class)
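The commands above raise a NoMethodError if no matching BackgroundMigration record exists. The following is a minimal sketch of the same steps with a guard for that case; the helper name rerun_backfill and its parameters are hypothetical, and the model classes are passed in as arguments so the method makes no assumptions beyond the calls already used in this article.

```ruby
# Hypothetical wrapper around the commands above. The model classes are
# injected so the method itself only relies on calls shown in this article.
def rerun_backfill(background_migrations:, runs:,
                   migration_class: "DataMigrations::BackfillPlanOnlyOnRuns")
  migration = background_migrations.find_by(migration_class: migration_class)
  raise ArgumentError, "no background migration recorded for #{migration_class}" if migration.nil?

  # Clear the completion timestamp so the migration is no longer skipped.
  migration.complete_at = nil
  migration.save!

  # Invoke the migration, as in the commands above.
  background_migrations.perform_all(migration.migration_class)

  # Number of runs that still have a null plan_only value (0 is expected).
  runs.where(plan_only: nil).count
end

# In a Rails console:
#   rerun_backfill(background_migrations: BackgroundMigration, runs: Run)
```

The return value is the number of rows that still have a null plan_only value, so anything other than 0 indicates the backfill did not cover every run.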
Ensure that there are no longer any rows in the runs table where plan_only is null.
irb(main):002:0> Run.where(plan_only: nil).any?
=> false
The previous migration failure will have left behind a constraint on the runs table. It must be removed before the application is restarted and the migrations are run again; otherwise, the RunPlanOnlyNonNull migration will fail again with the following error.
PG::DuplicateObject: ERROR: constraint "runs_plan_only_not_null" for relation "runs" already exists
The constraint can be safely removed with whichever of the following commands matches the deployment. It will be re-created, validated, and then dropped (after it is created as a table constraint) when the RunPlanOnlyNonNull migration is invoked again at startup.
Terraform Enterprise >= v202309-1 (Replicated deployment in consolidated services or Flexible Deployment Options):
docker exec terraform-enterprise bash -c '. atlas-env && psql $DATABASE_URL -c "alter table runs drop constraint runs_plan_only_not_null;"'
Terraform Enterprise < v202309-1 (Replicated deployment in non-consolidated services):
docker exec tfe-atlas bash -c '. atlas-env && psql $DATABASE_URL -c "alter table runs drop constraint runs_plan_only_not_null;"'
Outcome
After performing the steps above and restarting the Terraform Enterprise application, the database migrations should proceed to completion.
Additional Information
If you continue to experience issues, please contact HashiCorp Support.