The NLB
The Network Load Balancer (NLB) is the preferred way to load balance Vault in AWS because it can pass TLS connections through to the Vault nodes, which then handle TLS termination themselves. Using an Application Load Balancer (ALB) is discouraged: an ALB terminates TLS at the load balancer, while Vault needs end-to-end TLS connections.
[Figure: example NLB configuration, without the certificate]
Target group
The target group tells the load balancer where to send requests for Vault, so it must be configured carefully. The target group should run its health checks over HTTPS on port 8200 against the `/v1/sys/health` endpoint. Adding query parameters to this endpoint changes which status code a node returns, and therefore whether the load balancer treats that node as healthy. For example, if standby nodes need to receive requests, `/v1/sys/health?standbyok=true` makes a standby node return the active status code (200) instead of the standby code (429), so the load balancer marks it healthy. For further options, refer to the Vault API documentation for the `/sys/health` endpoint.
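The following Python sketch is a simplified model of how the health check path and a node's state map to the status code the target group sees. The status codes are Vault's documented defaults for `/v1/sys/health` (200 active, 429 standby, 473 performance standby, 501 not initialized, 503 sealed). DR replication and the remaining query parameters are left out, and the 200-399 success range in `is_healthy` is an assumption about the target group's health check matcher.

```python
from urllib.parse import urlencode

# Vault's default /v1/sys/health status codes.
ACTIVE = 200
STANDBY = 429
PERF_STANDBY = 473
NOT_INITIALIZED = 501
SEALED = 503


def health_check_path(standbyok: bool = False, perfstandbyok: bool = False) -> str:
    """Build the path a target group health check would request."""
    params = {}
    if standbyok:
        params["standbyok"] = "true"
    if perfstandbyok:
        params["perfstandbyok"] = "true"
    query = urlencode(params)
    return "/v1/sys/health" + ("?" + query if query else "")


def health_status(initialized: bool, sealed: bool, standby: bool,
                  perf_standby: bool = False, standbyok: bool = False,
                  perfstandbyok: bool = False) -> int:
    """Simplified model of the status code a node returns (DR replication ignored)."""
    if not initialized:
        return NOT_INITIALIZED
    if sealed:
        return SEALED
    if perf_standby:
        # perfstandbyok=true makes a performance standby report the active code.
        return ACTIVE if perfstandbyok else PERF_STANDBY
    if standby:
        # standbyok=true makes a standby report the active code instead of 429.
        return ACTIVE if standbyok else STANDBY
    return ACTIVE


def is_healthy(code: int) -> bool:
    # Assumption: the health check matcher accepts 200-399 as healthy.
    return 200 <= code < 400
```

With the default path, a standby answers 429 and drops out of the target group, so only the active node receives traffic; adding `standbyok=true` turns that 429 into a 200.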
The load balancer
The load balancer should allow TCP traffic on port 8200 through the target group to the Vault cluster. AWS NLBs support TLS termination, and a TLS certificate can be installed on the NLB, but make sure termination is not enabled in cases where Vault must see the client's TLS connection. In particular, the certificate auth method requires the connection to terminate directly on the Vault instance, so best practice in that case is not to use a certificate on the NLB.
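A minimal sketch, assuming boto3's `elbv2` client is used to create the resources: the helpers below only build the keyword arguments for `create_target_group` and `create_listener`, so they run without AWS credentials. The names, VPC ID and ARNs are hypothetical placeholders. The listener uses protocol `TCP` and carries no `Certificates`, so TLS passes through the NLB and terminates on the Vault nodes.

```python
from typing import Any, Dict

VAULT_PORT = 8200


def target_group_kwargs(name: str, vpc_id: str) -> Dict[str, Any]:
    """Arguments for elbv2 create_target_group: TCP traffic, HTTPS health checks."""
    return {
        "Name": name,
        "Protocol": "TCP",                    # pass TLS through untouched
        "Port": VAULT_PORT,
        "VpcId": vpc_id,
        "TargetType": "instance",
        "HealthCheckProtocol": "HTTPS",
        "HealthCheckPort": str(VAULT_PORT),   # this parameter is a string
        "HealthCheckPath": "/v1/sys/health",
    }


def tcp_listener_kwargs(lb_arn: str, target_group_arn: str) -> Dict[str, Any]:
    """Arguments for elbv2 create_listener: a TCP listener, so no certificate."""
    return {
        "LoadBalancerArn": lb_arn,
        "Protocol": "TCP",
        "Port": VAULT_PORT,
        "DefaultActions": [{"Type": "forward", "TargetGroupArn": target_group_arn}],
    }


def terminates_tls_at_lb(listener: Dict[str, Any]) -> bool:
    """A TLS listener (one with Certificates) would terminate TLS on the NLB."""
    return listener.get("Protocol") == "TLS" or "Certificates" in listener


# Hypothetical usage with boto3 (not executed here):
#   elbv2 = boto3.client("elbv2")
#   resp = elbv2.create_target_group(**target_group_kwargs("vault-tg", vpc_id))
#   tg_arn = resp["TargetGroups"][0]["TargetGroupArn"]
#   elbv2.create_listener(**tcp_listener_kwargs(nlb_arn, tg_arn))
```

Keeping the check in `terminates_tls_at_lb` separate makes it easy to flag a listener that would break the certificate auth method before it is created.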
Vault configuration
It is recommended that only the active node service requests from the load balancer. If standby nodes are also allowed to service requests, a redirect loop can form between a standby node and the load balancer because of how standby nodes handle requests. This is not an issue with performance standby nodes in Vault Enterprise.
From the Vault high availability documentation:
"If the only access to the Vault servers is via the load balancer, the api_addr on each node should be the same: the address of the load balancer. Clients that reach a standby node will be redirected back to the load balancer; at that point hopefully the load balancer's configuration will have been updated to know the address of the current leader. This can cause a redirect loop and as such is not a recommended setup when it can be avoided."
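The redirect loop described in the quote above can be sketched as a toy model in Python. It assumes every redirect sends the client back through the load balancer (because each node's `api_addr` is the load balancer's address) and represents the load balancer's routing decisions as a hypothetical sequence of node names; a real NLB routes per connection, not in a fixed rotation.

```python
from itertools import cycle, islice
from typing import Iterable, Optional


def redirects_until_active(lb_choices: Iterable[str], active: str,
                           max_redirects: int = 10) -> Optional[int]:
    """Count redirects before a request lands on the active node.

    `lb_choices` is the (hypothetical) node the load balancer picks on each
    attempt. A standby redirects the client back to the load balancer, so the
    next attempt takes the next choice. Returns None if the client gives up,
    i.e. it is stuck in a redirect loop.
    """
    for redirects, node in enumerate(islice(lb_choices, max_redirects + 1)):
        if node == active:
            return redirects
    return None


# Default health check: only the active node is in the target group.
print(redirects_until_active(cycle(["vault-1"]), active="vault-1"))
# Stale configuration: the LB still routes to a node that stepped down.
print(redirects_until_active(cycle(["vault-1"]), active="vault-2"))
```

When the health check only passes the active node, the first attempt succeeds. When the load balancer still routes to a node that is no longer the leader, the client never reaches it and gives up after `max_redirects`.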
Note: When replication is enabled behind a load balancer, set `primary_cluster_addr` on the primary cluster to the load balancer's address, so that secondaries reach the primary through the load balancer.
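As a sketch of the note above, the Python helper below builds (but does not send) the Vault Enterprise API request that enables performance replication on the primary with `primary_cluster_addr` set to the load balancer. The hostnames and token are hypothetical, and the sketch assumes the NLB also forwards Vault's default cluster port, 8201, to the cluster. DR replication uses the analogous `/v1/sys/replication/dr/primary/enable` endpoint.

```python
import json
import urllib.request


def enable_perf_primary_request(vault_addr: str, token: str,
                                lb_cluster_addr: str) -> urllib.request.Request:
    """Build the request that enables performance replication on the primary,
    advertising the load balancer as primary_cluster_addr."""
    body = json.dumps({"primary_cluster_addr": lb_cluster_addr}).encode()
    return urllib.request.Request(
        url=f"{vault_addr}/v1/sys/replication/performance/primary/enable",
        data=body,
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical addresses; 8201 is Vault's default cluster port.
req = enable_perf_primary_request(
    "https://vault.example.com:8200",
    "hypothetical-token",
    "https://vault-nlb.example.com:8201",
)
# To send it: urllib.request.urlopen(req)
```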
Suggested reading and tutorials
- https://learn.hashicorp.com/tutorials/vault/monitor-replication#port-traffic-consideration-with-load-balancer
- https://www.vaultproject.io/docs/concepts/ha
- https://www.vaultproject.io/docs/concepts/ha#behind-load-balancers