Introduction
This KB describes the steps to perform a DR Fire-Drill to test a Vault DR setup.
Prerequisites (if applicable)
- This topic requires Vault Enterprise with DR replication enabled, running Vault version 1.4.x or higher.
- If the clusters do not have direct communication, please see the following article on configuring replication, which uses a load balancer to establish connectivity to the Primary cluster.
- You will need two Vault clusters:
- DR Primary Cluster (ClusterA)
- DR Secondary Cluster (ClusterB)
Set up DR (if not configured already)
Enable DR Primary on ClusterA
Depending on the connectivity between your clusters, you might need to specify the DR Primary Cluster Address of the Load Balancer using the primary_cluster_addr argument in the command below.
$ vault write -f sys/replication/dr/primary/enable
Generate a Secondary Activation Token on ClusterA for ClusterB
$ vault write -f sys/replication/dr/primary/secondary-token id=ClusterB
Enable DR Secondary on ClusterB.
Depending on the connectivity between your clusters, you might need to specify the DR Primary Cluster API Address of the Load Balancer using the primary_api_addr argument in the command below.
$ vault write -f sys/replication/dr/secondary/enable token=$TOKEN
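When the clusters can only reach each other through a load balancer, the setup commands above take the extra address arguments. The following is a minimal sketch, not a definitive implementation. The wrapper function names and the example hostnames are hypothetical, and it assumes VAULT_ADDR and VAULT_TOKEN already point at the correct cluster before each function is called.

```shell
# Sketch only: the wrapper functions are hypothetical. The paths and the
# primary_cluster_addr / primary_api_addr arguments are the ones named above.

# Run against ClusterA.
# $1 = cluster address of the load balancer in front of ClusterA,
#      e.g. https://clusterA-lb.example.com:8201 (hypothetical).
enable_dr_primary() {
  vault write sys/replication/dr/primary/enable primary_cluster_addr="$1"
}

# Run against ClusterB.
# $1 = secondary activation token generated on ClusterA
# $2 = API address of the load balancer in front of ClusterA,
#      e.g. https://clusterA-lb.example.com:8200 (hypothetical).
enable_dr_secondary() {
  vault write sys/replication/dr/secondary/enable token="$1" primary_api_addr="$2"
}
```

If the clusters have direct connectivity, the plain commands shown above are sufficient and these arguments can be omitted.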
We recommend running DR Fail-Over exercises in DEV and PRE-PROD environments before executing them in production.
Use Case
Provide a high-level overview of the steps for executing Vault DR Failover and Fail-back exercises.
Failover to DR Secondary Cluster:
- Demote DR Primary Cluster (ClusterA)
- Promote DR Secondary Cluster (ClusterB)
- Sync ClusterA to ClusterB, so ClusterA is DR Secondary of ClusterB.
- Test Vault access with your application while ClusterB is primary
Procedure:
Failover to DR Secondary:
1. Demote ClusterA and Promote ClusterB as Primary.
- Take a backup of your DR Primary Cluster.
- Before starting the failover, check the replication status to ensure the clusters are in sync.
$ vault read -format=json sys/replication/status
- Demote DR Primary ClusterA. This will change ClusterA to a DR Secondary Cluster.
$ vault write -f sys/replication/dr/primary/demote
- On ClusterB, generate a DR Operation Token. Use the unseal/recovery keys of ClusterA (the Primary Cluster).
$ vault operator generate-root -dr-token -init
$ vault operator generate-root -dr-token # it will ask for unseal/recovery keys.
$ vault operator generate-root -dr-token -otp=$OTP -decode=ENCODED_TOKEN
Note: You do not need to generate an operation token with these steps if a batch DR operation token is available. A batch DR operation token can be used to promote the DR secondary cluster even though it was generated by the DR primary cluster.
- Promote DR Secondary ClusterB, using the dr_operation_token generated in the step above.
$ vault write -f sys/replication/dr/secondary/promote dr_operation_token=$DR_TOKEN_1
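The note above mentions a batch DR operation token as an alternative to generating an operation token during the exercise. The following is a hedged sketch of how such a token could be prepared on the DR primary ahead of time. The policy name (dr-secondary-promotion), the role name (failover-handler), and the 8h TTL are illustrative assumptions, not values taken from this article.

```shell
# Sketch only: run on the DR primary (ClusterA) before any failover.
# The policy name, role name, and TTL below are illustrative assumptions.
create_batch_dr_token() {
  # Policy limited to the two DR secondary endpoints used in this procedure.
  vault policy write dr-secondary-promotion - <<'EOF'
path "sys/replication/dr/secondary/promote" {
  capabilities = ["update"]
}
path "sys/replication/dr/secondary/update-primary" {
  capabilities = ["update"]
}
EOF
  # Token role that issues orphan, non-renewable batch tokens with that policy.
  vault write auth/token/roles/failover-handler \
    allowed_policies=dr-secondary-promotion \
    orphan=true renewable=false token_type=batch
  # Print only the token value.
  vault token create -role=failover-handler -ttl=8h -field=token
}

# Usage (hypothetical):
#   DR_TOKEN_1=$(create_batch_dr_token)
```

A batch token cannot be renewed, so a stored token is only useful until its TTL expires. Pick a TTL and a rotation schedule that match how the token will be stored.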
Note: Redirect client traffic to ClusterB as soon as it is promoted. This helps reduce downtime for clients.
Note: In this scenario, ClusterA was demoted before promoting ClusterB. In an actual outage, you will promote ClusterB before demoting ClusterA (as ClusterA might be unavailable).
Note: If there is a Load Balancer configured to route traffic to these clusters, the rules on the LB should be modified to re-route traffic to the correct cluster during this activity. If you are using a DNS CNAME record, that record will need to be modified to point to the new DR Primary Cluster.
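Before changing load balancer rules or DNS, it can help to confirm which cluster is currently able to serve clients. The following sketch maps the default HTTP status codes of Vault's /v1/sys/health endpoint to a role. The function names and the example address are hypothetical, and it assumes curl is available and the default health status codes have not been overridden with query parameters.

```shell
# Sketch only: map the default HTTP status codes of /v1/sys/health to a role.
health_code_to_role() {
  case "$1" in
    200) echo "active" ;;               # initialized, unsealed, active
    429) echo "standby" ;;              # unsealed standby node
    472) echo "dr-secondary" ;;         # DR secondary: must not receive client traffic
    473) echo "performance-standby" ;;
    501) echo "uninitialized" ;;
    503) echo "sealed" ;;
    *)   echo "unknown" ;;
  esac
}

# $1 = base API address of a cluster,
#      e.g. https://clusterB.example.com:8200 (hypothetical).
cluster_role() {
  code=$(curl -s -o /dev/null -w '%{http_code}' "$1/v1/sys/health")
  health_code_to_role "$code"
}

# Usage (hypothetical):
#   cluster_role https://clusterB.example.com:8200
```

Many load balancers can use the same endpoint as a health check, so that only the cluster answering 200 receives client traffic.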
Note: ClusterA will show the WARNING message below in its logs. It will be resolved after ClusterA is updated with a secondary activation token from ClusterB, its new DR Primary Cluster. The activation token contains the ClusterB cluster address used for communication between the nodes.
2021-10-21T11:28:14.847+1100 [WARN] core: replicated cluster is a secondary but with no primary address, not starting client
After testing and validating the new DR Primary Cluster (ClusterB), the next step is to update ClusterA to use ClusterB as its new DR Primary Cluster.
2. Configure ClusterA as DR secondary to ClusterB (which is now the DR Primary).
- Generate secondary token on ClusterB
$ vault write -f sys/replication/dr/primary/secondary-token id=ClusterA
- On ClusterA, generate a DR Operation Token. Use the unseal/recovery keys of ClusterA to generate the DR operation token. After the threshold number of keys is provided, ClusterA will output an Encoded Token.
$ vault operator generate-root -dr-token -init
$ vault operator generate-root -dr-token # it will ask for unseal/recovery keys.
$ vault operator generate-root -dr-token -otp=$OTP -decode=ENCODED_TOKEN
Note: To save time in an emergency, you can create and securely store a Batch Token ahead of time. This avoids having to generate a DR Operation Token during an outage.
- Update ClusterA to use ClusterB as its DR Primary Cluster. This sets ClusterA as a DR secondary of ClusterB, which is required to synchronize the new data written to ClusterB (during the time ClusterB was Primary). Replace "xxx" with the secondary token generated on ClusterB.
$ vault write sys/replication/dr/secondary/update-primary dr_operation_token=$DR_TOKEN_2 token="xxx"
- Check replication status on both clusters:
$ vault read -format=json sys/replication/status
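One way to turn the status check on both clusters into a pass/fail test is to compare the merkle_root value each cluster reports. When the values match, the clusters are generally in sync. The following is a minimal sketch under stated assumptions: the function names and addresses are hypothetical, and it reads the DR-specific status path (sys/replication/dr/status) so that the field can be selected directly with -field.

```shell
# Sketch only: compare the DR merkle_root reported by two clusters.

# $1 = API address of a cluster (hypothetical value supplied by the caller).
dr_merkle_root() {
  VAULT_ADDR="$1" vault read -field=merkle_root sys/replication/dr/status
}

# $1, $2 = API addresses of the two clusters.
# Returns 0 when both report the same non-empty merkle_root.
dr_in_sync() {
  root_a=$(dr_merkle_root "$1") || return 1
  root_b=$(dr_merkle_root "$2") || return 1
  # An empty value means the field could not be read; treat that as not in sync.
  [ -n "$root_a" ] && [ "$root_a" = "$root_b" ]
}

# Usage (hypothetical addresses):
#   dr_in_sync https://clusterA.example.com:8200 https://clusterB.example.com:8200 \
#     && echo "in sync" || echo "NOT in sync"
```

On a busy primary the two values can differ briefly while the secondary catches up, so a single mismatch is a reason to re-check rather than proof of a problem.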
DR Fail-Back:
3. Currently ClusterB is a DR Primary and ClusterA is a DR Secondary. The next step is to fail-back to the original DR Primary Cluster (ClusterA).
- Check the DR replication status before starting (check that the clusters are in sync).
$ vault read -format=json sys/replication/status
- Demote DR Primary on ClusterB. This will change mode to secondary in ClusterB.
$ vault write -f sys/replication/dr/primary/demote
- The step below requires a dr_operation_token for ClusterA. Use the dr_operation_token generated in step 2 for ClusterA.
- Promote ClusterA to DR primary. Redirect all Vault clients to ClusterA, as it will start serving traffic as soon as it is Primary.
$ vault write -f sys/replication/dr/secondary/promote dr_operation_token=$DR_TOKEN_2
Test and Validate ClusterA (original Primary).
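Part of validating ClusterA after the promotion is confirming that it actually reports itself as DR primary before clients are pointed at it. The following sketch polls the DR status until the mode is "primary". The function name, the address, and the attempt count and 2-second interval are assumptions chosen for illustration.

```shell
# Sketch only: poll a cluster until its DR replication mode is "primary".
# $1 = API address of the cluster (hypothetical)
# $2 = maximum number of attempts (default 30), 2 seconds apart
wait_for_dr_primary() {
  attempts=${2:-30}
  while [ "$attempts" -gt 0 ]; do
    # A failed read is treated the same as "not primary yet".
    mode=$(VAULT_ADDR="$1" vault read -field=mode sys/replication/dr/status 2>/dev/null) || mode=""
    if [ "$mode" = "primary" ]; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep 2
  done
  return 1
}

# Usage (hypothetical address):
#   wait_for_dr_primary https://clusterA.example.com:8200 15 || echo "ClusterA is not primary"
```

This only confirms the replication mode. Application-level tests against ClusterA are still needed to validate the fail-back.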
4. Update ClusterB to use ClusterA as its DR Primary Cluster
$ vault write sys/replication/dr/secondary/update-primary dr_operation_token=$DR_TOKEN_2 token="xxx"
- Check replication status on both clusters:
$ vault read -format=json sys/replication/status
- Generate Secondary Activation Token on ClusterA
$ vault write -f sys/replication/dr/primary/secondary-token id=ClusterB
- On ClusterB, generate a DR Operation Token. Use the unseal/recovery keys of ClusterA (the Original Primary Cluster).
$ vault operator generate-root -dr-token -init
$ vault operator generate-root -dr-token # it will ask for unseal/recovery keys.
$ vault operator generate-root -dr-token -otp=$OTP -decode=ENCODED_TOKEN
- Update ClusterB to use ClusterA as its DR Primary Cluster
$ vault write sys/replication/dr/secondary/update-primary dr_operation_token=$DR_TOKEN_2 token="xxx"
Test and Validate ClusterA