Introduction
Expected Outcome
This KB describes the steps to perform a fire drill that tests a Vault DR setup.
Prerequisites (if applicable)
- This topic requires Vault Enterprise with DR replication enabled, running Vault version 1.4.x or higher.
- Two Vault clusters with DR replication set up:
- 1 Primary cluster (ClusterA)
- 1 DR cluster (ClusterB)
- Set up DR (if not configured already):
- Enable DR primary on ClusterA:
vault write -f sys/replication/dr/primary/enable
- Generate a secondary token on ClusterA for ClusterB:
vault write -f sys/replication/dr/primary/secondary-token id=ClusterB
- Enable DR secondary on ClusterB. Depending on your setup, you might need to provide additional options in the command below.
vault write -f sys/replication/dr/secondary/enable token=$TOKEN
- Test these steps in a DEV/PRE-PROD setup first to get comfortable with them.
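The secondary token generated during DR setup is returned as a response-wrapped token, so it can be captured into the $TOKEN variable used by the enable command instead of being copied by hand. The following is a minimal sketch: dr_secondary_token is a hypothetical helper name, and it assumes VAULT_ADDR and VAULT_TOKEN already point at the DR primary.

```shell
#!/usr/bin/env bash
# Sketch only: dr_secondary_token is a hypothetical helper, not a Vault command.
# Assumes VAULT_ADDR and VAULT_TOKEN already point at the DR primary.

dr_secondary_token() {
  local id="$1"
  # The endpoint returns a response-wrapped token; -field=wrapping_token
  # prints only the token value, with no table or JSON around it.
  vault write -field=wrapping_token \
    sys/replication/dr/primary/secondary-token id="$id"
}

# Usage (run against ClusterA, then point VAULT_ADDR at ClusterB):
#   TOKEN="$(dr_secondary_token ClusterB)"
#   vault write sys/replication/dr/secondary/enable token="$TOKEN"
```

The wrapped token is short-lived and single-use, so generate it immediately before enabling the secondary.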
Use Case
This article provides the steps to perform a Vault DR fire drill. The following is a summary of the steps required to complete a fire drill of a Vault replication setup.
Failover to DR:
- Demote Primary cluster (ClusterA)
- Promote DR cluster (ClusterB)
- Sync ClusterA to ClusterB, so that ClusterA becomes the DR secondary of ClusterB.
- Test vault access with your application while ClusterB is primary
Failback to Original Primary:
- Demote ClusterB
- Promote ClusterA
- Sync ClusterB to ClusterA, so that ClusterA is Primary and ClusterB is its DR secondary again.
- Test vault access with your application.
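The promote and demote steps in the summary above are the same in both directions, only with the clusters swapped. The following sketch shows that ordering as one helper. The names run, dr_switch and DRY_RUN, and the example addresses, are hypothetical. It assumes VAULT_TOKEN is valid on the cluster being demoted, and it does not cover the sync (update-primary) step.

```shell
#!/usr/bin/env bash
# Sketch only: run, dr_switch and DRY_RUN are hypothetical names.

# Prints the command when DRY_RUN is unset or 1 (the default), runs it otherwise.
# Note that the dry-run output includes the DR operation token.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# $1 = address of the cluster to promote
# $2 = address of the cluster to demote
# $3 = DR operation token valid on the cluster being promoted
dr_switch() {
  local promote_addr="$1" demote_addr="$2" dr_token="$3"
  # Promote first, as in a real DR event where the old primary may be unreachable.
  run vault write -address="$promote_addr" \
    sys/replication/dr/secondary/promote dr_operation_token="$dr_token"
  # Demote the old primary; this call needs a valid VAULT_TOKEN for that cluster.
  run vault write -address="$demote_addr" -f sys/replication/dr/primary/demote
}

# Failover: dr_switch https://clusterb.example:8200 https://clustera.example:8200 "$DR_TOKEN_1"
# Failback: dr_switch https://clustera.example:8200 https://clusterb.example:8200 "$DR_TOKEN_2"
```

Keeping dry-run as the default makes it possible to review the exact commands before anything changes on either cluster.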
Procedure:
Failover to DR:
1. Promote ClusterB as Primary and demote ClusterA.
- Take a backup of your Vault cluster as per https://learn.hashicorp.com/tutorials/vault/sop-backup.
- Check replication status before you start (check whether the clusters are in sync - https://learn.hashicorp.com/tutorials/vault/monitor-replication#are-my-dr-clusters-in-sync):
vault read -format=json sys/replication/status
- Generate a DR operation token on ClusterB. Use the unseal/recovery keys of ClusterA to generate the DR operation token. After the threshold number of keys is provided, the command outputs an encoded token. The steps to generate a DR operation token are also described at https://learn.hashicorp.com/tutorials/vault/disaster-recovery#promote-dr-secondary-to-primary.
vault operator generate-root -dr-token -init
vault operator generate-root -dr-token
# The command above prompts for the unseal/recovery keys.
vault operator generate-root -dr-token -otp=$OTP -decode=ENCODED_TOKEN
Note: There is no need to generate an operation token with these steps if a batch DR operation token is available. A batch DR operation token can be used to promote the DR secondary cluster even though it was generated by the DR primary cluster.
- Promote ClusterB, using the dr_operation_token generated in the step above.
vault write -f sys/replication/dr/secondary/promote dr_operation_token=$DR_TOKEN_1
- Replication status on both clusters will show mode: primary at this point in time.
- Demote the DR primary on ClusterA. This changes mode to secondary on ClusterA.
vault write -f sys/replication/dr/primary/demote
Note: There is a short period (between the promotion of ClusterB and the demotion of ClusterA) when both clusters are primary, but client traffic can be redirected to ClusterB as soon as it is promoted. This helps reduce downtime for clients.
Note: ClusterA can be demoted before ClusterB is promoted, but in an actual DR scenario you might promote ClusterB before demoting ClusterA (as ClusterA might be unavailable).
Note: If a load balancer is configured to route traffic to these clusters, modify the rules on the load balancer to re-route traffic to the correct cluster during this activity. Do the same where DNS is used, by updating DNS to point to the correct cluster.
Note: ClusterA will show the following WARN message in its logs:
2021-10-21T11:28:14.847+1100 [WARN] core: replicated cluster is a secondary but with no primary address, not starting client
Test and Validate ClusterB (new Primary).
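As part of validating the new primary, the mode reported by each cluster can be checked from a script. The following is a minimal sketch: dr_mode and check_dr_roles are hypothetical helper names, the addresses are placeholders, and it assumes the sys/replication/dr/status endpoint is reachable on both clusters.

```shell
#!/usr/bin/env bash
# Sketch only: dr_mode and check_dr_roles are hypothetical helpers.

# Prints the DR replication mode reported by the cluster at address $1.
dr_mode() {
  vault read -address="$1" -field=mode sys/replication/dr/status
}

# $1 = address expected to be DR primary, $2 = address expected to be DR secondary.
# Returns 0 when both match, 1 on a mismatch, 2 when a cluster cannot be read.
check_dr_roles() {
  local primary_addr="$1" secondary_addr="$2"
  local p s
  p="$(dr_mode "$primary_addr")" || return 2
  s="$(dr_mode "$secondary_addr")" || return 2
  if [ "$p" = "primary" ] && [ "$s" = "secondary" ]; then
    echo "OK: $primary_addr=primary $secondary_addr=secondary"
  else
    echo "MISMATCH: $primary_addr=$p $secondary_addr=$s"
    return 1
  fi
}

# Usage after failover (ClusterB expected primary, ClusterA expected secondary):
#   check_dr_roles https://clusterb.example:8200 https://clustera.example:8200
```

A mode check only confirms the replication roles. Application-level tests against ClusterB are still needed to validate client access.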
2. Set up ClusterA as DR secondary to ClusterB (which is now Primary).
- Generate a secondary token on ClusterB:
vault write -f sys/replication/dr/primary/secondary-token id=ClusterA
- Generate a DR operation token on ClusterA. Use the unseal/recovery keys of ClusterA to generate the DR operation token. After the threshold number of keys is provided, the command outputs an encoded token. The steps to generate a DR operation token are also described at https://learn.hashicorp.com/tutorials/vault/disaster-recovery#promote-dr-secondary-to-primary.
vault operator generate-root -dr-token -init
vault operator generate-root -dr-token
# The command above prompts for the unseal/recovery keys.
vault operator generate-root -dr-token -otp=$OTP -decode=ENCODED_TOKEN
Note: There is no need to generate an operation token with these steps if a batch DR operation token is available. A batch DR operation token can be used to promote the DR secondary cluster even though it was generated by the DR primary cluster.
- Enable DR secondary on ClusterA (using the update-primary endpoint) with the dr_operation_token generated in the step above. Here, token is the secondary token generated on ClusterB. This sets ClusterA as the DR secondary of ClusterB. This step is required to make sure that all data changed on ClusterB (while ClusterB was Primary) is replicated to ClusterA.
vault write sys/replication/dr/secondary/update-primary dr_operation_token=$DR_TOKEN_2 token="xxx"
- Check replication status on both clusters:
vault read -format=json sys/replication/status
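The status check above can be turned into a rough in-sync test before moving on to failback. The following sketch compares merkle_root on both clusters and checks that the secondary reports the stream-wals state. The helper names dr_field and dr_in_sync are hypothetical, and the field names are assumptions based on typical sys/replication/dr/status output. The roots can differ briefly while writes are in flight, so treat a single negative result as a prompt to re-check rather than as a failure.

```shell
#!/usr/bin/env bash
# Sketch only: dr_field and dr_in_sync are hypothetical helpers.
# Field names (state, merkle_root) are assumed from typical status output.

# $1 = cluster address, $2 = field name under sys/replication/dr/status
dr_field() {
  vault read -address="$1" -field="$2" sys/replication/dr/status
}

# $1 = DR primary address, $2 = DR secondary address
# Returns 0 when in sync, 1 when not, 2 when a cluster cannot be read.
dr_in_sync() {
  local primary_addr="$1" secondary_addr="$2"
  local state p_root s_root
  state="$(dr_field "$secondary_addr" state)" || return 2
  p_root="$(dr_field "$primary_addr" merkle_root)" || return 2
  s_root="$(dr_field "$secondary_addr" merkle_root)" || return 2
  if [ "$state" = "stream-wals" ] && [ "$p_root" = "$s_root" ]; then
    echo "in sync"
  else
    echo "not in sync (secondary state: $state)"
    return 1
  fi
}

# Usage while ClusterB is primary:
#   dr_in_sync https://clusterb.example:8200 https://clustera.example:8200
```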
DR FailBack.
3. Currently ClusterB is Primary and ClusterA is Secondary. We will now promote ClusterA (old Primary) as Primary.
- Check replication status before you start (check whether the clusters are in sync - https://learn.hashicorp.com/tutorials/vault/monitor-replication#are-my-dr-clusters-in-sync):
vault read -format=json sys/replication/status
- The step below requires the dr_operation_token of ClusterA. We will use the dr_operation_token generated in step 2 for ClusterA.
- Promote ClusterA to DR primary. Redirect all Vault clients to ClusterA, as it starts serving traffic as soon as it is Primary.
vault write -f sys/replication/dr/secondary/promote dr_operation_token=$DR_TOKEN_2
- Replication status on both clusters will show mode: primary at this point in time.
- Demote the DR primary on ClusterB. This changes mode to secondary on ClusterB.
vault write -f sys/replication/dr/primary/demote
Test and Validate ClusterA (original Primary).
4. Set up ClusterB (original DR) as DR secondary to ClusterA (original Primary).
- Generate a secondary token on ClusterA:
vault write -f sys/replication/dr/primary/secondary-token id=ClusterB
- The step below requires the dr_operation_token of ClusterB. We will use the dr_operation_token generated in step 1 for ClusterB.
- Enable DR secondary on ClusterB (using the update-primary endpoint) with the dr_operation_token generated in step 1 for ClusterB. Here, token is the secondary token generated on ClusterA at the start of this section. This sets ClusterB as the DR secondary of ClusterA, and your setup should now look the same as it did at the start of the DR activity.
vault write sys/replication/dr/secondary/update-primary dr_operation_token=$DR_TOKEN_1 token="xxx"
- Check replication status on both clusters:
vault read -format=json sys/replication/status
Additional Information
- DR setup learn guide - https://learn.hashicorp.com/tutorials/vault/disaster-recovery?in=vault/enterprise
- /sys/replication/dr endpoint - https://www.vaultproject.io/api-docs/system/replication/replication-dr#sys-replication-dr