Introduction
This article provides step-by-step instructions to enable HashiCorp Vault Disaster Recovery (DR) replication between two Vault Enterprise clusters deployed in separate Kubernetes environments using the official Helm chart. It addresses a common issue where DR replication fails because port 8201, required for internal cluster-to-cluster gRPC communication, is not exposed by default in standard Helm ingress configurations.
Expected Outcome
By following this guide:
Both Vault clusters will be accessible over the required API (8200) and cluster (8201) ports.
DR replication between the primary and secondary clusters will be successfully enabled.
Network and load balancer configuration will meet Vault’s connectivity requirements for mTLS-enabled replication.
Prerequisites
Vault Enterprise license with DR replication feature enabled.
Two Vault Enterprise clusters deployed via the official Vault Helm chart on separate Kubernetes clusters or namespaces.
Administrative access to modify the Helm chart’s values.yaml and the Kubernetes ingress and service resources.
Network connectivity between clusters (firewalls, security groups, or service meshes configured to allow traffic).
Familiarity with Helm upgrades and values.yaml customisation.
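As a quick pre-check, you can confirm the DR replication feature on each cluster from the license status endpoint. A minimal sketch, assuming VAULT_ADDR and a valid token are already set for the target cluster:

    # "DR Replication" should appear in the features list on both clusters.
    vault read sys/license/status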
Use Case
Many Kubernetes-based Vault deployments rely on ingress controllers configured only for port 8200 (the Vault API port). Vault DR replication, however, also requires port 8201 for cluster-to-cluster communication. Without proper forwarding of port 8201, DR replication setup fails during the initial handshake. This guide helps you properly expose both ports and configure replication addresses.
Procedure
- Verify Load Balancer Configuration
Each Vault cluster must be fronted by its own dedicated load balancer.
For the cluster port (8201), you must use a Layer 4 TCP load balancer (e.g., AWS NLB or equivalent).
The Layer 4 load balancer must point to the active node in the cluster. This is a mandatory requirement because cluster-to-cluster replication traffic must be routed to the active node to establish and maintain communication.
A Layer 7 load balancer such as AWS ALB or a standard HTTP(S) ingress controller (e.g., NGINX) cannot handle Vault’s cluster port traffic: the cluster port speaks gRPC over mTLS with certificates Vault generates internally, so any proxy that terminates or inspects TLS breaks the handshake.
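As an illustration of such a dedicated Layer 4 service, here is a minimal sketch for AWS, assuming a chart release named vault in the vault namespace and Kubernetes service registration enabled so the active pod carries the vault-active: "true" label; the service name and the exact selector labels are assumptions to adapt to your deployment:

    # Dedicated L4 (TCP) load balancer for the cluster port, targeting only
    # the active node via the vault-active label set by service registration.
    apiVersion: v1
    kind: Service
    metadata:
      name: vault-cluster-lb        # hypothetical name
      namespace: vault
      annotations:
        # AWS-specific; substitute your provider's TCP load balancer type
        service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    spec:
      type: LoadBalancer
      selector:
        app.kubernetes.io/name: vault
        app.kubernetes.io/instance: vault
        component: server
        vault-active: "true"
      ports:
        - name: https-internal
          protocol: TCP
          port: 8201
          targetPort: 8201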
- Expose Both Required Ports
Ensure your Kubernetes service exposes both 8200 and 8201 to the load balancer.
If external access is required, configure the load balancer to forward traffic on both ports to the Vault pods.
Do not route port 8201 through a Layer 7 ingress (NGINX or similar); use a separate Layer 4 TCP service or load balancer for cluster traffic, such as the one sketched in the previous step.
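Before wiring up the load balancer, it can help to confirm what the chart actually exposes. A sketch, assuming a release named vault in the vault namespace:

    # List the chart's services and the ports they expose; 8200 and 8201
    # should both be present on the server service.
    kubectl -n vault get svc
    kubectl -n vault get svc vault -o jsonpath='{range .spec.ports[*]}{.name}{"\t"}{.port}{"\n"}{end}'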
- Verify TLS and mTLS Requirements
Ensure mTLS traffic between clusters is not terminated or altered by intermediate components.
Avoid SSL termination or any packet/header modification in the load balancer path.
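One way to spot unwanted TLS termination is to compare the certificate returned through the load balancer with the certificate configured on Vault’s listener. A sketch using openssl, with the hypothetical load balancer DNS name used in the example further below:

    # If the subject/issuer belong to the load balancer rather than to the
    # certificate configured on Vault's listener, TLS is being terminated
    # somewhere in the path.
    openssl s_client -connect vault-primary-nlb.example.com:8200 </dev/null 2>/dev/null \
      | openssl x509 -noout -subject -issuer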
- Reconfigure DR Replication
Disable any existing DR replication configuration on the secondary cluster.
Generate a new DR secondary token on the primary cluster, as these tokens are single-use.
- Re-enable DR replication using the load balancer addresses (a command sketch follows the example below):
primary_cluster_addr: the DNS name of the primary cluster’s Layer 4 load balancer with port 8201. This address is set when enabling replication on the primary and is passed to the secondary inside the activation token.
primary_api_addr: the DNS name of the primary cluster’s Layer 4 or Layer 7 load balancer with port 8200, supplied when enabling replication on the secondary.
- Example:
primary_cluster_addr = https://vault-primary-nlb.example.com:8201
primary_api_addr = https://vault-primary-nlb.example.com:8200
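The following command sketch ties the reconfiguration together, using the hypothetical load balancer name from the example above. The endpoints and parameters are the documented DR replication ones; token values and any CA options must be adapted to your environment:

    # On the secondary: clear any existing DR configuration. If the
    # secondary is already active, supply a DR operation token via the
    # dr_operation_token parameter instead of -f.
    vault write -f sys/replication/dr/secondary/disable

    # On the primary: enable (or confirm) DR replication, advertising the
    # Layer 4 load balancer as the cluster address handed to secondaries.
    vault write sys/replication/dr/primary/enable \
        primary_cluster_addr="https://vault-primary-nlb.example.com:8201"

    # On the primary: generate a fresh, single-use secondary token.
    vault write sys/replication/dr/primary/secondary-token id="dr-secondary"

    # On the secondary: activate DR replication, overriding the API address
    # with the primary's load balancer.
    vault write sys/replication/dr/secondary/enable \
        token="<wrapped_secondary_token>" \
        primary_api_addr="https://vault-primary-nlb.example.com:8200"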
- Validate Connectivity and Status
Confirm that the secondary cluster can reach the primary cluster on both ports (see the reachability checks after this step).
Use the DR status API to verify replication status:
curl $VAULT_ADDR/v1/sys/replication/dr/status | jq
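Simple reachability checks from the secondary’s network can rule out network-level problems first; the DNS name below is the hypothetical one from the example above:

    # Both ports must be reachable from the secondary cluster.
    nc -vz vault-primary-nlb.example.com 8200
    nc -vz vault-primary-nlb.example.com 8201

    # The same status check via the Vault CLI:
    vault read -format=json sys/replication/dr/status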