The information contained in this article has been verified as up-to-date on the date of the original publication of the article. HashiCorp endeavors to keep this information up-to-date and correct, but it makes no representations or warranties of any kind, express or implied, about the ongoing completeness, accuracy, reliability, or suitability of the information provided.
All information contained in this article is for general information purposes only. Any reliance you place on such information as it applies to your use of your HashiCorp product is therefore strictly at your own risk.
Introduction
This guide provides the required steps and best practices to migrate a Vault Disaster Recovery (DR) cluster hosted on-premises (e.g., in Kubernetes) to a new, independent standalone Vault cluster hosted in Amazon Web Services (AWS), leveraging a Raft storage snapshot. The goal is to achieve a clean "lift-and-shift" transition, specifically promoting the former DR primary to a new, fully standalone primary in the AWS environment.
Expected Outcome
A new Vault cluster is running in AWS, initialized from the snapshot of the former DR primary cluster, with all data intact and operating in a standalone mode (replication fully disabled).
Prerequisites
The success of this migration is entirely dependent on meeting these critical prerequisites before beginning the migration process:
| Component | Requirement | Detail/Best Practice |
|---|---|---|
| Source Cluster | Former DR primary (replication disabled) | Must be in the `stream-wals` state with low or zero lag. The snapshot must be taken after cleanly disabling DR replication. |
| Target Cluster | New AWS Vault cluster | New, uninitialized EC2 instance prepared to restore the snapshot. Follow the Hardware Sizing for Vault Servers article linked below for recommended instance sizing. |
| Seal Type Consistency | Critical: use an identical seal type (Shamir, AWS KMS, etc.) | Snapshot restore requires identical seal configuration and keys on the target as on the source cluster. |
| Storage Backend | Same storage backend | Snapshot restore requires the target cluster to use the identical storage backend as the source cluster. We recommend Integrated Storage (Raft) for all new clusters if applicable to your environment. |
| Access | Root/admin token + unseal keys | Full administrative access and unseal keys are needed on both clusters for snapshot and restore operations. |
| Vault Versions | Must match exactly | Vault versions must be identical on the source (when the snapshot was taken) and the target (during restore). |
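Because the versions must match exactly, verify them on both clusters before starting. For example:
# Run on both the source and target; the reported versions must be identical
vault version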
Use Case
Issue: Migrating a Vault cluster and transitioning it out of a DR replication relationship simultaneously often leaves behind stale replication metadata, leading to configuration errors or unexpected behavior in the new environment.
Goal: Cleanly sever the replication link, capture a consistent data snapshot, and restore it to the AWS environment as a new, fully standalone Primary cluster, preventing stale replication metadata errors.
Procedure
The migration is split into two phases and consists of nine sequential steps to ensure a clean separation, successful migration, and full functionality check in the new AWS environment.
Phase 1: Preparation, Clean Shutdown, and Snapshot
This phase ensures a clean separation from the DR secondary before the snapshot is taken. Run all commands on the Source DR Primary Cluster.
Step 1: Validate Replication Status (Pre-Maintenance Check)
Confirm the source DR primary cluster is fully synced with the DR secondary cluster.
vault read -format=json sys/replication/status
Confirmation: Ensure the JSON output shows the dr mode as "primary" and that last_wal equals last_dr_wal (or is within a few transactions) to confirm the replication link is completely caught up.
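For a scripted check, the relevant fields can be extracted with jq. This is a minimal sketch only; field names such as last_wal, last_dr_wal, and the secondaries list can vary between Vault versions, so compare against your cluster's actual output:
# Show the DR mode, WAL positions, and known secondaries in one view
vault read -format=json sys/replication/status | jq '.data.dr | {mode, last_wal, last_dr_wal, secondaries}'
The secondaries entries also provide the ID needed for the revoke command in Step 2.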
Step 2: Cleanly Tear Down DR Replication
Disabling DR replication before the snapshot is critical, as it prevents replication-specific metadata from being carried over. This can be done via the GUI or CLI.
GUI Method (On-Prem DR Primary)
- Revoke the Secondary: Navigate to Monitoring -> Replication -> Disaster Recovery -> Secondaries. Click the three dots (...) next to the secondary cluster name, then select Revoke.
- Disable DR Replication Mode: Navigate to Disaster Recovery -> Manage. Click Disable Replication and confirm by typing "Disaster Recovery" when prompted.
CLI Method
- Revoke the Secondary: Run this on the DR primary cluster, using the <SECONDARY_ID> obtained by inspecting the output of vault read sys/replication/status in Step 1. This cleanly breaks the DR link.
vault write sys/replication/dr/primary/revoke-secondary id=<SECONDARY_ID>
- Disable DR Mode on the Primary: Once the secondary is revoked, disable DR mode on the primary cluster to clear all DR metadata.
vault write -f sys/replication/dr/primary/disable
Verification: Run vault read sys/replication/status to confirm that replication is no longer active and the dr mode is disabled.
Note on DR Data Cleanup: Disabling DR replication does not automatically remove data from the secondary. Vault retains all replicated data in a read-only state. Manual cleanup must be performed if the secondary DR node is no longer needed.
Step 3: Snapshot the Storage Backend
Immediately after cleanly disabling replication, take a final, verified snapshot of the primary cluster's state.
# Snapshot command for Integrated Storage (Raft)
vault operator raft snapshot save /tmp/VAULT_MIGRATION_SNAPSHOT.snap
Security Note: The /tmp path is used for convenience inside Kubernetes containers but is not persistent. For production use, always store snapshots in a durable, secure, and accessible location.
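As an optional integrity check before the transfer (not part of the core procedure), record the snapshot's checksum so it can be compared after each copy:
# Record the checksum on the source; compare it after each transfer stage
sha256sum /tmp/VAULT_MIGRATION_SNAPSHOT.snap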
Step 4: Securely Transfer the Snapshot to the AWS Target Cluster
This is a two-stage file transfer process:
- Stage 1: Copy from the Container to the Local Workstation: Execute this command from your local machine's terminal (where kubectl is installed).
# Syntax: kubectl cp <VAULT_NAMESPACE>/<VAULT_POD_NAME>:<container-path> <local-path>
kubectl cp VAULT_NAMESPACE/VAULT_POD_NAME:/tmp/VAULT_MIGRATION_SNAPSHOT.snap ./VAULT_MIGRATION_SNAPSHOT.snap
- Stage 2: Transfer to the AWS Target Cluster: Use scp to move the file to a persistent directory on the AWS target EC2 instance.
scp -i ~/.ssh/AWS_SSH_KEY.pem ./VAULT_MIGRATION_SNAPSHOT.snap ec2-user@TARGET_EC2_IP_ADDRESS:/home/ec2-user/
| Placeholder | Description |
|---|---|
| `~/.ssh/AWS_SSH_KEY.pem` | The private key file for SSH access to the AWS target server. |
| `VAULT_MIGRATION_SNAPSHOT.snap` | The snapshot file you copied in Stage 1. |
| `ec2-user@TARGET_EC2_IP_ADDRESS` | The target EC2 user and public IP address. |
| `/home/ec2-user/` | The persistent directory on the target server. |
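If you recorded a checksum in Step 3, verify it on the target instance after the transfer; the value must match the one recorded on the source:
# On the AWS target EC2 instance
sha256sum /home/ec2-user/VAULT_MIGRATION_SNAPSHOT.snap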
Step 5: Securely Transfer Unseal Keys
Ensure that you securely transfer or have readily accessible the original source cluster's unseal/recovery keys to the AWS target environment. These keys are mandatory for unsealing Vault after the restore operation.
Phase 2: Restore and Standalone Promotion (AWS)
This phase details the steps required to restore the snapshot and activate the new standalone cluster.
Step 6: Configure the New Vault Server
On the new EC2 instance, create the configuration file (e.g., /etc/vault/config.hcl). Configure the required storage backend (which must match the source cluster) and listener settings. Here's an example (note: your configuration will likely differ based on your specific environment and needs):
# File: /etc/vault/config.hcl
listener "tcp" {
address = "0.0.0.0:8200"
tls_disable = 1
# Disables TLS, often used when Vault sits behind a TLS-terminating load balancer.
}
storage "raft" {
path = "/opt/vault/data"
node_id = "vault-node-1"
# Unique node ID within the cluster.
}
seal "awskms" {
# CRITICAL: This enables AWS KMS Auto-Unseal and must match the Seal Type of the source cluster.
kms_key_id = "arn:aws:kms:us-east-1:AWS_ACCOUNT_ID:key/KMS_KEY_ID_HERE"
region = "us-east-1"
# The EC2 instance must have an IAM role that allows kms:Decrypt permission on this specific KMS Key.
}
ui = true
# Enables the built-in web user interface.
disable_mlock = true
# Required when Vault is run as a non-root user and mlock permissions cannot be granted.
api_addr = "http://TARGET_EC2_IP_ADDRESS:8200"
# Advertised address for client requests (use the IP or hostname).
cluster_addr = "http://TARGET_EC2_IP_ADDRESS:8201"
# Advertised address for inter-node communication (Raft).
license_path = "/etc/vault.d/vault.hclic"
# Path to the Vault Enterprise license file.
Step 7: Restore the Raft Snapshot
The vault operator raft snapshot restore command is an API operation, so the target Vault server must be running, initialized, and unsealed before the restore is executed. The -force flag permits restoring a snapshot that was taken on a different cluster.
- Start Vault Service:
sudo systemctl start vault
- Initialize the New Cluster: Initialize the server, unseal it if auto-unseal is not in use, and log in with the newly generated root token so the restore request is authenticated.
vault operator init
- Restore the Snapshot: Execute the restore command. This will overwrite all data in the storage backend with the source cluster's data.
vault operator raft snapshot restore -force /home/ec2-user/VAULT_MIGRATION_SNAPSHOT.snap
Cluster ID Note: The restored cluster will inherit the exact Cluster ID and data from the source cluster, treating the new AWS instance as the continuation of the original primary. After the restore, the server adopts the source cluster's seal and credentials; continue to Step 8 to unseal (if needed) and authenticate.
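A note on the systemctl commands used in this procedure: they assume Vault is managed by a systemd unit on the EC2 instance. If one does not exist yet, the following minimal sketch can serve as a starting point (the binary path, config path, and service user are assumptions; adjust them to your installation and add hardening options as required):
# File: /etc/systemd/system/vault.service (minimal sketch; assumes the config file from Step 6)
[Unit]
Description=HashiCorp Vault
After=network-online.target

[Service]
User=vault
Group=vault
ExecStart=/usr/local/bin/vault server -config=/etc/vault/config.hcl
ExecReload=/bin/kill --signal HUP $MAINPID
Restart=on-failure

[Install]
WantedBy=multi-user.target
After creating the file, run sudo systemctl daemon-reload and sudo systemctl enable vault so the service is known to systemd.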
Step 8: Unseal and Activate the Standalone Primary
The restored cluster must be unsealed.
- Unseal Vault: If manual unseal is required (i.e., not using AWS KMS auto-unseal), use the original source cluster's unseal keys.
# Repeat this command, supplying a different key each time until the unseal threshold is met.
vault operator unseal
- Authenticate: Once unsealed, log in using a valid root token from the original source cluster.
vault login <ROOT_TOKEN_HERE>
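To confirm the node is unsealed and ready, check the server status; the Sealed field should report false:
vault status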
Troubleshooting: CLI Connection Errors
| Error Message Observed | Cause | Fix (Run this command) |
|---|---|---|
| `http: server gave HTTP response to HTTPS client` | CLI defaults to HTTPS, but the Vault listener uses HTTP (no TLS). | `export VAULT_ADDR='http://127.0.0.1:8200'` |
| `dial tcp 127.0.0.1:8200: connect: connection refused` | Vault server not running or not started yet. | `sudo systemctl start vault` |
| `Code: 503. Errors: * Vault is sealed` | Server initialized but sealed. | Proceed with the unseal steps above. |
Step 9: Final Verification and Traffic Cutover
- Confirm Standalone Status: Verify the status on the AWS target cluster. The output should confirm the cluster is operating as a standalone primary with no replication.
vault read sys/replication/status
Expected Output (key fields):
Key            Value
---            -----
dr             map[mode:disabled]
performance    map[mode:disabled]
If the output shows mode:disabled for both dr and performance, the migration is complete, and the Vault cluster is a fully standalone primary.
- Verify Data Integrity: Log in using the root token or user credentials and verify critical secrets or policies exist (a broader verification sketch follows this list).
vault kv get secret/critical/SECRET_PATH
- Cut Over: Update application configs or load balancer settings to point all traffic to the new AWS Vault EC2 instance.
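Beyond reading a single secret, a quick inventory comparison against the source cluster can confirm the restore brought over the expected configuration. A minimal sketch using standard CLI listings (the kv path above is a placeholder from your own environment):
# Compare each listing against the same command's output on the source cluster
vault secrets list
vault auth list
vault policy list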
Additional Information
- Strategic Recommendation: You can leave the original primary DR cluster and secondary DR cluster intact without replication, build the AWS cluster in parallel, bootstrap it with a Raft snapshot taken from the current active primary DR cluster, and then simply redirect application traffic to the new AWS cluster. This leaves the on-prem clusters as a fallback.
- Vault Reference Architecture: Integrated Storage vs External Storage
- Hardware Sizing for Vault Servers: Sizing Recommendations
- Consul Snapshot Restore: Consul Snapshot Restore Documentation