Problem statement
When we need to migrate the Vault storage backend on Kubernetes, OpenShift, AKS, or EKS, we run into a problem: the pod is always running, which means the Vault process is always running. A backend migration requires the Vault process to be stopped, and the migration process also adds a lock to prevent the Vault server or another migration from starting.
Observation & Solution
We have observed the errors below when people try to migrate the backend by running the migration operation inside the pods.
Error 1) When migrating the backend to Raft
2022-09-12T18:59:16.711Z [WARN] appending trailing forward slash to the path
Error migrating: error mounting 'storage_destination': failed to create fsm: failed to open bolt file: open /vault/data/vault.db: no such file or directory
Sol 1) If you get the above error, the path defined in storage_destination
(here the directory /vault/data, in which Vault tries to create the vault.db file)
does not exist on the filesystem where the migration operation is running. Check the path on the filesystem and create it if it does not exist. It is also possible that the migration command is being run on the wrong cluster, so verify that as well; for this, you can visit this article.
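For reference, a minimal migration.hcl for a migration to Raft might look like the sketch below. It assumes Consul as the source purely for illustration; the Consul address, KV path, node_id, and cluster_addr values are hypothetical placeholders. The key point for Error 1 is that the Raft path is a directory that must already exist. The file is passed to the migration with vault operator migrate -config=migration.hcl.

```hcl
# migration.hcl: sketch of a Consul (source) to Raft (destination) migration.
storage_source "consul" {
  # Hypothetical Consul address and KV path; replace with your own values.
  address = "127.0.0.1:8500"
  path    = "vault/"
}

storage_destination "raft" {
  # A directory, not a file: Vault creates vault.db and the raft/ folder inside it.
  # This directory must already exist where the migration runs (see Error 1).
  path    = "/vault/data"
  node_id = "vault-0" # hypothetical node ID
}

# Needed when the destination is Raft; hypothetical address.
cluster_addr = "https://127.0.0.1:8201"
```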
Error 2) Consul to Raft Migration
2022-09-13T05:21:10.453Z [WARN] appending trailing forward slash to the path
Error migrating: error mounting 'storage_destination': failed to create fsm: failed to open bolt file: timeout
Sol 2) The above error occurs when the Vault process is running, regardless of whether Vault is initialized. The running process creates vault.db and the raft folder in the path defined in the Vault configuration file and holds a lock on vault.db, so the migration times out when it tries to open the same file; this is why we recommend stopping the Vault process. In this case, delete the generated vault.db file and raft folder, run the migration operation, and restart the pod once it has finished. Alternatively, set a different path for storage_destination in the migration.hcl file
and run the migration operation. Once it is done, move the newly created vault.db file and raft folder into the path defined in the Vault configuration file and restart the pod.
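The alternative in Sol 2 can be sketched as a migration.hcl whose storage_destination points at a separate, hypothetical directory (/vault/data-migration) so that it does not collide with the vault.db locked by the running Vault process. The source stanza and all values are assumptions for illustration.

```hcl
# migration.hcl: sketch that writes the Raft data to a separate directory.
storage_source "consul" {
  # Hypothetical Consul address and KV path.
  address = "127.0.0.1:8500"
  path    = "vault/"
}

storage_destination "raft" {
  # Hypothetical staging directory, different from the path in the Vault
  # configuration file, so the running Vault process does not hold its vault.db.
  # After the migration, move vault.db and raft/ from here into the configured path.
  path    = "/vault/data-migration"
  node_id = "vault-0" # hypothetical node ID
}

# Needed when the destination is Raft; hypothetical address.
cluster_addr = "https://127.0.0.1:8201"
```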
Error 3) General error that can come with any destination backend
Error migrating: error mounting 'storage_destination': could not bootstrap clustered storage: error bootstrapping cluster: cluster already has state
Sol 3) The above error occurs when the destination Vault cluster already has state. In this case, back up the Vault data on the destination cluster, delete it, and then re-run the migration operation. This resolves the issue.
Error 4) Raft to S3 Migration
Error migrating: error mounting 'storage_destination': unable to access bucket "vaultstorages3" in region "ap-south-1": InvalidAccessKeyId: The AWS Access Key Id you provided does not exist in our records.
status code: 403, request id: 50VNJ39PP20YNDHJ, host id: u4+YSoMdDV7SQBy1NwN0k+8P/19qLSY2qkjzqqMxM0DLgrhNkdLVpFflwSMl1wiDLxpuv0l0Mdo=
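Sol 4) To resolve the above error, add the session_token parameter to the s3 storage_destination
stanza and then re-run the migration operation. This applies when the access key belongs to temporary (STS) credentials, which S3 accepts only together with the matching session token.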
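A sketch of the corresponding migration.hcl is shown below. The bucket and region come from the error above; the Raft source path is assumed, and the credential values are placeholders to be replaced with your own temporary credentials.

```hcl
# migration.hcl: sketch of a Raft (source) to S3 (destination) migration.
storage_source "raft" {
  # Assumed Raft data directory of the source Vault server.
  path = "/vault/data"
}

storage_destination "s3" {
  bucket = "vaultstorages3"
  region = "ap-south-1"

  # Placeholder credentials; supply your own values.
  access_key = "<ACCESS_KEY_ID>"
  secret_key = "<SECRET_ACCESS_KEY>"

  # Needed with temporary credentials; without it S3 rejects the access key.
  session_token = "<SESSION_TOKEN>"
}
```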
In summary, on Kubernetes, OpenShift, AKS, and EKS, do not initialize the destination Vault cluster before performing the migration. Alternatively, take a snapshot of the source Vault cluster and restore it on a destination Vault cluster that uses the same storage backend as the source (this is only possible with Consul and Raft). Then, on the destination cluster, seal Vault and migrate the storage backend to another location on the same server. Finally, update the Vault configuration with the desired backend and move the migrated data into the path defined in the Vault configuration file.
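As an illustration of the final step, after a Consul-to-Raft migration the storage stanza of the Vault server configuration would name the new backend and point at the directory the migrated data was moved into. This is a sketch that shows only the storage stanza; the path and node_id are assumptions, and other settings (listener, cluster_addr, and so on) are omitted.

```hcl
# Vault server configuration after the migration (storage stanza only).
# The path is where the migrated vault.db and raft/ folder were moved.
storage "raft" {
  path    = "/vault/data"
  # Hypothetical; assumed to match the node_id used in migration.hcl.
  node_id = "vault-0"
}
```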
References
https://support.hashicorp.com/hc/en-us/articles/9594980972819-Vault-Storage-Backend-Migration
https://learn.hashicorp.com/tutorials/vault/raft-migration
Perform these tests in a local lab first, and then document the scenarios based on the results.