Introduction
This article shows how to migrate a performance replication (PR) secondary cluster to a new environment.
The solution presented below applies to moving the cluster from on-premises to a cloud platform or between cloud platforms. In addition, this method can be used to upgrade the performance secondary cluster to a newer Vault version.
The starting point is two clusters with performance replication enabled, a primary (cluster A) and a secondary (cluster B), and the secondary needs to be moved to a new environment.
The first solution that comes to mind is to deploy a new Vault cluster in the target environment and establish performance replication from the original cluster to the newly deployed cluster.
Challenge
Performance replication synchronizes configuration, policies, and secrets between clusters, but tokens and leases are not replicated: each cluster creates and manages its own. If the new cluster is set up directly as a new performance secondary of cluster A, the tokens, leases, and other cluster-specific mounts (PKI secrets mounts and local mounts) will not be replicated to the new cluster.
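To see in advance which mounts would be left behind, the mount listing can be filtered on its local flag. A minimal sketch, assuming a hypothetical, simplified two-column rendering (mount path and local flag) of the vault secrets list -detailed output instead of the real command:

```shell
# Hypothetical, simplified two-column rendering of
# `vault secrets list -detailed` output: mount path and its Local flag.
# Mounts flagged true exist only on this cluster and are not replicated
# to a performance secondary.
mounts='kv/ false
pki/ true
transit/ false
cubbyhole/ true'

# Print only the cluster-local (non-replicated) mounts.
printf '%s\n' "$mounts" | awk '$2 == "true" {print $1}'
```

Any paths printed here would have to be recreated by hand on a freshly enabled performance secondary, which is exactly what the DR-based approach described in the Solution section avoids.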
Solution
Disaster recovery (DR) replication can be used to create an exact replica of cluster B on the new cluster C. This setup ensures that cluster C mirrors all aspects of cluster B, including secrets and authentication mounts, tokens and leases, policies, and cluster capacity.
Prerequisites
- Vault Enterprise License: Both performance and DR replication require a Vault Enterprise license.
- New Vault Cluster: Deploy a new Vault cluster in the target environment.
- Network Configuration: Ensure network connectivity between the primary and new secondary clusters.
- Unique Cluster ID: Assign a unique identifier to the new secondary cluster.
Procedure
The steps below need to be verified in a lower environment before executing them in production.
- Set up a new Vault cluster in the target environment, ensuring it meets the required specifications and configurations.
- Take a backup of the Vault data on the performance secondary cluster (cluster B) in case a rollback or a restore from snapshot is needed.
If the Vault cluster is using Integrated Storage (Raft), the backup can be created by running the following command on the active node:
vault operator raft snapshot save /path/to/backup.snap
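Before relying on the snapshot, it is worth checking that the file was actually written. A minimal sketch, assuming a hypothetical /tmp/backup.snap path and a stand-in file in place of the real snapshot:

```shell
# Stand-in for the file produced by `vault operator raft snapshot save`;
# /tmp/backup.snap is a hypothetical path.
snap=/tmp/backup.snap
printf 'dummy snapshot bytes' > "$snap"

# A missing or zero-byte file means the save failed; do not proceed.
if [ -s "$snap" ]; then
  echo "snapshot OK"
else
  echo "snapshot missing or empty" >&2
  exit 1
fi
```

In a rollback scenario, the same file is fed back with vault operator raft snapshot restore.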
- Enable DR primary replication on the original PR secondary cluster (cluster B) and generate an activation token for the new DR secondary:
vault write -f sys/replication/dr/primary/enable
vault write sys/replication/dr/primary/secondary-token id="dr-secondary"
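The secondary-token call returns a wrapped activation token. A minimal sketch of capturing it into the DR variable used in the next step, assuming a hypothetical, trimmed -format=json response in place of the live call:

```shell
# Hypothetical, trimmed `-format=json` response from the
# secondary-token call; the activation token is returned under
# wrap_info (the token value here is a placeholder).
resp='{
  "wrap_info": {
    "token": "hvs.EXAMPLEWRAPPINGTOKEN",
    "ttl": 1800
  }
}'

# Extract the wrapping token; it is passed as token= when enabling
# DR secondary replication on cluster C.
DR=$(printf '%s\n' "$resp" | sed -n 's/.*"token": "\([^"]*\)".*/\1/p')
echo "$DR"
```

The wrapping token is short-lived, so the DR secondary should be enabled soon after it is generated.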
- Enable DR secondary replication on the new cluster (cluster C), where $DR holds the activation token generated in the previous step:
vault write sys/replication/dr/secondary/enable token=$DR
Create a policy named "dr-secondary-promotion" allowing the update operation against the sys/replication/dr/secondary/promote path:
vault policy write dr-secondary-promotion - <<EOF
path "sys/replication/dr/secondary/promote" {
  capabilities = [ "update" ]
}

# Only if using integrated storage (raft) as the storage backend
# To read the current autopilot status
path "sys/storage/raft/autopilot/state" {
  capabilities = [ "update", "read" ]
}
EOF
Create a token role named "failover-handler" with the dr-secondary-promotion policy attached, its type set to batch, the renewable parameter set to false, and the orphan parameter set to true.
vault write auth/token/roles/failover-handler \
    allowed_policies=dr-secondary-promotion \
    orphan=true \
    renewable=false \
    token_type=batch
Create a token for the role with time-to-live (TTL) set to 8 hours:
vault token create -role=failover-handler -ttl=8h
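The batch token from this command is what the promotion step consumes. A minimal sketch of capturing it into the BT variable referenced later, assuming a hypothetical, trimmed -format=json response in place of the live call:

```shell
# Hypothetical, trimmed response from
# `vault token create -role=failover-handler -ttl=8h -format=json`;
# the token itself sits under auth.client_token (placeholder value).
resp='{
  "auth": {
    "client_token": "hvb.EXAMPLEBATCHTOKEN",
    "token_type": "batch",
    "renewable": false,
    "orphan": true
  }
}'

BT=$(printf '%s\n' "$resp" | sed -n 's/.*"client_token": "\([^"]*\)".*/\1/p')
echo "$BT"
```

A batch token is used here because it remains usable on the DR secondary for the promote call.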
- Stop the Vault service on the original PR secondary cluster (cluster B)
sudo systemctl stop vault
In the event that something goes wrong with the promotion of the newly created DR secondary (cluster C), it is still possible to fall back to the original PR secondary (cluster B).
- Promote the DR secondary (cluster C) to primary, where ${BT} holds the batch token created earlier:
vault write sys/replication/dr/secondary/promote dr_operation_token=${BT}
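After the promote call, the DR status on cluster C should flip to primary mode. A minimal sketch of the check, assuming a hypothetical, trimmed vault read -format=json sys/replication/dr/status response in place of the live call:

```shell
# Hypothetical, trimmed DR status response from cluster C after
# promotion; "mode" should read "primary".
status='{
  "data": {
    "mode": "primary",
    "cluster_id": "fe5a06fd-b478-da86-79e3-ce9a18af010a"
  }
}'

mode=$(printf '%s\n' "$status" | sed -n 's/.*"mode": "\([^"]*\)".*/\1/p')
if [ "$mode" = "primary" ]; then
  echo "promotion complete"
else
  echo "still in mode: $mode" >&2
fi
```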
- Verify Replication Status
Check the replication status to ensure the new secondary cluster is properly connected and synchronized with the primary cluster:
vault read sys/replication/performance/status
Example output on the primary performance cluster:
vault read -format=json sys/replication/status
{
  ...
  "performance": {
    "cluster_id": "fe5a06fd-b478-da86-79e3-ce9a18af010a",
    "corrupted_merkle_tree": false,
    "known_secondaries": [
      "DC2-PR-B"
    ],
    "last_corruption_check_epoch": "-62135596800",
    "last_performance_wal": 400064,
    "last_reindex_epoch": "0",
    "last_wal": 400064,
    "merkle_root": "36dc35b7f1c79aa40127f8919618ea072de68642",
    "mode": "primary",
    "primary_cluster_addr": "",
    "secondaries": [
      {
        "api_address": "https://192.168.86.233:8200",
        "clock_skew_ms": "35",
        "cluster_address": "https://192.168.86.233:8201",
        "connection_status": "connected",
        "last_heartbeat": "2025-06-02T10:32:34Z",
        "last_heartbeat_duration_ms": "2",
        "node_id": "DC2-PR-B",
        "replication_primary_canary_age_ms": "298"
      }
    ],
    ...
  }
}
The output indicates the api_address and cluster_address of the new performance replication cluster.
Example output on the new secondary performance cluster:
vault read -format=json sys/replication/performance/status
{
  ...
  "data": {
    "cluster_id": "fe5a06fd-b478-da86-79e3-ce9a18af010a",
    "connection_state": "ready",
    "corrupted_merkle_tree": false,
    "known_primary_cluster_addrs": [
      "https://192.168.86.253:8201"
    ],
    ...
    "last_remote_wal": 400393,
    "last_start": "2025-05-26T16:08:03Z",
    "merkle_root": "b591d0e8f519b42b627105e8a17d2154d89f2c31",
    "mode": "secondary",
    "primaries": [
      {
        "api_address": "https://192.168.86.253:8200",
        "clock_skew_ms": "51",
        "cluster_address": "https://192.168.86.253:8201",
        "connection_status": "connected",
        "last_heartbeat": "2025-06-02T10:38:04Z",
        "last_heartbeat_duration_ms": "2",
        "replication_primary_canary_age_ms": "852"
      }
    ],
    "primary_cluster_addr": "https://192.168.86.253:8201",
    "secondary_id": "DC2-PR-B",
    "ssct_generation_counter": 1,
    "state": "stream-wals"
  },
  "warnings": null
}
Notice that known_primary_cluster_addrs holds the address of the primary performance cluster.
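The same status output can be validated with a small script rather than read by eye. A minimal sketch, assuming the status was saved to /tmp/pr-status.json (hypothetical path) and trimmed to the fields checked here:

```shell
# Hypothetical, trimmed copy of the performance status output, as if
# saved with: vault read -format=json \
#   sys/replication/performance/status > /tmp/pr-status.json
cat > /tmp/pr-status.json <<'EOF'
{
  "data": {
    "connection_state": "ready",
    "mode": "secondary",
    "state": "stream-wals"
  }
}
EOF

# "stream-wals" means the secondary is caught up and streaming changes.
if grep -q '"state": "stream-wals"' /tmp/pr-status.json; then
  echo "secondary healthy"
else
  echo "secondary not streaming WALs" >&2
fi
```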
- Monitor and Test
Monitor the logs and perform tests to verify that the new secondary cluster is handling read operations and forwarding write requests correctly.
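One simple test is to confirm that the secondary's WAL position keeps pace with the primary. A minimal sketch comparing the last_wal and last_remote_wal figures from the status outputs, hardcoded here for illustration (in practice they would be read from each cluster's status endpoint, and the lag threshold is an arbitrary example):

```shell
# WAL positions taken from the two status outputs; hardcoded stand-ins
# for values that would normally be read live from each cluster.
primary_last_wal=400393
secondary_last_remote_wal=400393

# A small, shrinking gap is normal; a large or growing one is not.
lag=$((primary_last_wal - secondary_last_remote_wal))
if [ "$lag" -le 10 ]; then
  echo "replication lag OK ($lag WALs behind)"
else
  echo "secondary lagging by $lag WALs" >&2
fi
```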
Conclusion
The procedure presented above shows how to move a performance secondary cluster to a new environment with minimal disruption and continued optimal performance.