This article provides details and starting points for migrating Vault data stored in a Consul cluster, to inform your own Vault backup/restore and data migration strategies when using Consul as your Vault storage backend.
In particular, this article details two recommended techniques for migrating Vault data stored in Consul:
- Consul Snapshots
- Disaster Recovery (DR) Mode Replication
You can apply the general concepts shared here along with your specific policies and procedures to architect a backup and recovery plan or data migration strategy that is best suited for your organization’s needs.
Consul Snapshots
The recommended approach for migration of Vault data stored in Consul is to use Consul Snapshots.
If your Consul cluster is used exclusively for Vault data, then you can simply save and restore Consul snapshots as a backup/restoration or data migration solution. A saved snapshot provides an atomic point-in-time representation of your Vault instance’s key/value data from which you can restore to another Vault instance later.
Consul Enterprise users can also enable the automated snapshot agent, which helps ensure that snapshots are taken on your desired schedule and retained in your specified destination.
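For example, a minimal snapshot agent invocation might look like the following sketch; the interval, retention count, and save path shown here are assumptions to adapt for your environment, and you should check consul snapshot agent -help for the exact options your Consul Enterprise version supports:
$ consul snapshot agent \
    -interval=30m \
    -retain=30 \
    -local-save-path=/var/consul/snapshots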
If your Consul cluster is used for more than Vault data storage, then you must also take those data and the services which use them into consideration when restoring the snapshot, as selective restoration of key/value data from a snapshot is not currently possible.
This article discusses taking manual snapshots with consul snapshot, which is available in both Enterprise and OSS versions of Consul. See consul snapshot --help or the Consul Snapshot documentation for more information about the consul snapshot command.
Here is a brief and simplified example of manually saving and restoring a Consul snapshot.
Source Consul Cluster
On the source Consul server cluster that contains the Vault data to be saved in a snapshot, execute this command from either a Consul server directly or any system running a Consul client agent connected to the source server cluster:
$ consul snapshot save backup.snap
Saved and verified snapshot to index 1394
The snapshot file backup.snap will be present in the current working directory.
Inspecting the Snapshot
The snapshot file is simply a gzip-compressed archive. You can perform some operational inspection on the snapshot file via the consul snapshot inspect command, and also manually by decompressing the file and examining its contents.
$ consul snapshot inspect backup.snap
ID 2-1394-1515172423763
Size 481887
Index 1394
Term 2
Version 1
This output shows the snapshot ID, size in bytes, and the snapshot index, term, and version, which can be compared against the values reported by the server (e.g. via consul info) and is useful for detecting corruption.
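For example, if you check a server immediately after saving the snapshot (before further writes occur), the raft values reported by consul info should line up with the inspect output above; the exact field names can vary by Consul version:
$ consul info | grep -E "last_log_(index|term)"
	last_log_index = 1394
	last_log_term = 2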
If you’re interested in manually inspecting the snapshot contents, then steps like the following will be handy:
$ cp backup.snap backup.tgz
$ tar zxvf backup.tgz
x meta.json
x state.bin
x SHA256SUMS
The archive contains the binary snapshot data, metadata, and the SHA-256 checksums for each file, which you can compare using conventional tools:
$ cat SHA256SUMS ; \
echo " " ; \
sha256sum meta.json state.bin
2fe4fe2876073c5576c903a6752eea8303d68df440fb1492ae06b8bb7cbd7426 meta.json
64718fc06fae24ddde9a97bab300c95e589030d677fa71a3fbd5a6e982657b29 state.bin
2fe4fe2876073c5576c903a6752eea8303d68df440fb1492ae06b8bb7cbd7426 meta.json
64718fc06fae24ddde9a97bab300c95e589030d677fa71a3fbd5a6e982657b29 state.bin
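Alternatively, on systems with GNU coreutils you can have sha256sum perform the comparison itself:
$ sha256sum -c SHA256SUMS
meta.json: OK
state.bin: OK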
Destination Consul Cluster
Once you’ve validated your snapshot as the correct one to restore, you can restore it against the destination Consul server cluster like this:
$ consul snapshot restore backup.snap
Restored snapshot
IMPORTANT: Please see the Post Restoration Notes section for details on what could be required after restoring Vault data depending on your use case.
Disaster Recovery Mode Replication
Vault Enterprise offers a second data migration option that can be used with the Consul storage backend: DR mode replication.
In a nutshell, this data migration process consists of the following steps (a sketch of the corresponding commands follows this list):
- Enable replication on the source Vault cluster as a Disaster Recovery mode primary
- Configure and enable a DR mode secondary cluster
- Vault then replicates all data from the primary cluster to the secondary cluster
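As a rough sketch, the sequence of commands for these steps might look like the following; dr-secondary-1 is a placeholder secondary ID, <wrapping_token> stands for the response-wrapped token returned by the secondary-token call, and the full procedure, including unsealing the secondary with the primary's unseal keys, is covered in the replication documentation:
# On the source (primary) Vault cluster:
$ vault write -f sys/replication/dr/primary/enable
$ vault write sys/replication/dr/primary/secondary-token id=dr-secondary-1
# On the destination (secondary) Vault cluster:
$ vault write sys/replication/dr/secondary/enable token=<wrapping_token>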
Before using this method, you should be familiar with the Vault Enterprise replication documentation.
If your intention after migration of data via replication is to use the secondary DR cluster as a primary Vault cluster, then you must first promote the DR secondary cluster to a primary cluster.
See the /sys/replication/dr/secondary/promote API documentation and the output from vault path-help sys/replication/dr/secondary/promote for more details about promoting a DR secondary to a DR primary.
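A hypothetical promotion command, assuming a Vault version that requires a DR operation token (requirements differ across releases, so check the documentation for yours):
$ vault write sys/replication/dr/secondary/promote dr_operation_token=<token>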
CAUTION: Only one primary should be active at a given time. The replication model is not designed for active-active usage and enabling two primaries should never be done, as it can lead to data loss if they or their secondaries are ever reconnected.
Note that the items detailed in Post Restoration Notes typically will not apply to data migrations performed via replication.
Post Restoration Notes
This section contains important caveats and conditions around restoring Vault data which you should familiarize yourself with before performing any migration or backup/restore of Vault data stored in Consul.
High Availability Mode Lock
In a High Availability Vault cluster, the active node will have held the cluster leadership lock at the time of the data export or snapshot. After restoring Vault data to Consul, you must manually remove this lock so that the Vault cluster can elect a new leader.
Execute this consul kv command immediately after restoration of Vault data to Consul:
$ consul kv delete vault/core/lock
See consul kv delete --help or the Consul KV Delete documentation for more details on the command.
Dynamic Secret Backends
With dynamic secret backends, there could be user credentials in the backing system, for example a database used with the PostgreSQL secret backend, that Vault has no knowledge of.
If users for such dynamic backends were created after you took a snapshot, then Vault would not be aware of them after restoring the snapshot. There is not much that can be done to mitigate this; frequent snapshotting of Vault data can help.
If you’re using dynamic secret backends, then after restoring Vault data you can go through the active users and revoke them all, forcing your clients to obtain new credentials and generate new leases.
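For example, to revoke every lease issued by a hypothetical PostgreSQL backend mounted at postgresql/, you could revoke by lease prefix:
$ vault revoke -prefix postgresql/creds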
Deleted Users After Snapshot
If users for a given backend were deleted after you took the snapshot you are restoring from, you could experience issues with Vault automatically revoking their leases, which appear in the logs as revocation errors along with User not found or no such user messages.
In these cases, you’ll need to manually force revocation of the user by their lease ID. Here’s an example:
$ vault revoke -force -prefix ce9e899b-49d0-9646-9769-381909fea995
Success! Revoked the secret with ID 'ce9e899b-49d0-9646-9769-381909fea995', if it existed.
If you want to use the vault command to revoke, see vault revoke --help for more details on the -force flag syntax.
To learn more about doing this programmatically, see the Revoke Force API documentation.
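As a sketch, the equivalent API call with curl might look like this; the address, token, and lease prefix are placeholders, and the endpoint path varies by Vault version (newer versions expose it as sys/leases/revoke-force):
$ curl \
    --header "X-Vault-Token: $VAULT_TOKEN" \
    --request PUT \
    http://127.0.0.1:8200/v1/sys/revoke-force/postgresql/creds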
Restoration After Vault is Rekeyed
Here are some tips that can help in the scenario where an older Vault export or snapshot is restored after Vault has been rekeyed:
- Use a key manager to store unseal keys so that you have a versioned history of them
- When transmitting PGP-encrypted keys, use email so that you also have a history of the unseal keys there
- Archive PGP-encrypted unseal keys into a backup and store it somewhere in case you ever have to perform an older restore (see the rekey sketch after this list)
- You can even maintain a history of the PGP keys themselves stored in Vault
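For instance, initializing a rekey with PGP keys produces new unseal keys that are individually encrypted to each key holder, giving you artifacts that are safe to archive; the keybase usernames here are placeholders:
$ vault rekey -init -key-shares=3 -key-threshold=2 \
    -pgp-keys="keybase:alice,keybase:bob,keybase:carol"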