A crashed or prematurely stopped TFE instance may not have finished running database migrations, which can cause other TFE instances connected to the same database to hang for up to one hour before proceeding with migrations.
TFE uses Redis to establish a lock around running database migrations. The lock prevents the data corruption that could occur if multiple TFE instances (such as in an Active/Active deployment) modified the database at the same time. In catastrophic cases, such as a compute instance being forcefully terminated while migrations are running, the lock value in Redis may not have been removed, leaving other TFE nodes waiting to acquire the lock before they can run migrations.
The migration lock is created with a one-hour TTL; if a migration crashes, the lock expires on its own one hour after the migration process began. To reduce this delay, the lock can be removed manually:
Connect to the Redis instance.
> del tfe_migration_lock
(integer) 1
Once the key is removed from Redis, any nodes attempting to run migrations will re-attempt to acquire the lock, and proceed with running migrations.
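The lock behaviour described above can be illustrated with a small model. This is a minimal sketch, not TFE's implementation: it assumes the lock behaves like an "acquire only if absent, with expiry" key, and it uses a hypothetical in-memory `FakeRedis` class and `simulate_crash_and_manual_delete` helper rather than a real Redis connection. Only the key name `tfe_migration_lock` and the one-hour TTL come from this article.

```python
from typing import Dict, Tuple

LOCK_KEY = "tfe_migration_lock"  # key name taken from the article
LOCK_TTL_SECONDS = 3600          # one-hour TTL taken from the article


class FakeRedis:
    """Hypothetical in-memory stand-in for the few Redis behaviours modelled here."""

    def __init__(self) -> None:
        # Maps key -> (value, absolute expiry time in seconds).
        self._data: Dict[str, Tuple[str, float]] = {}

    def _purge_if_expired(self, key: str, now: float) -> None:
        entry = self._data.get(key)
        if entry is not None and now >= entry[1]:
            del self._data[key]

    def set_if_absent(self, key: str, value: str, ttl: float, now: float) -> bool:
        """Create the key with a TTL only if it does not already exist."""
        self._purge_if_expired(key, now)
        if key in self._data:
            return False
        self._data[key] = (value, now + ttl)
        return True

    def delete(self, key: str, now: float) -> int:
        """Remove the key; return 1 if it existed, 0 otherwise (like DEL)."""
        self._purge_if_expired(key, now)
        return 1 if self._data.pop(key, None) is not None else 0


def simulate_crash_and_manual_delete() -> Tuple[bool, bool, int, bool]:
    store = FakeRedis()
    # Node A takes the lock at t=0, then crashes without releasing it.
    a_acquired = store.set_if_absent(LOCK_KEY, "node-a", LOCK_TTL_SECONDS, now=0)
    # Node B tries ten minutes later and is blocked by the stale lock.
    b_blocked = store.set_if_absent(LOCK_KEY, "node-b", LOCK_TTL_SECONDS, now=600)
    # An operator deletes the key manually.
    removed = store.delete(LOCK_KEY, now=601)
    # Node B retries and now acquires the lock.
    b_retry = store.set_if_absent(LOCK_KEY, "node-b", LOCK_TTL_SECONDS, now=602)
    return a_acquired, b_blocked, removed, b_retry


if __name__ == "__main__":
    print(simulate_crash_and_manual_delete())
```

In this model, a waiting node without manual intervention would stay blocked until the TTL elapses at t=3600, which corresponds to the delay of up to one hour described at the start of this article.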