This article explains how to configure and enable readiness and liveness probes for Vault running in Kubernetes with replication enabled.
Scenario
Customers may already have readiness and liveness probes enabled before they enable replication.
If probes are already set up when replication is enabled, the pods on the secondary cluster fail their health checks with a 503, because Vault seals on the secondary when replication is enabled and a sealed node returns 503 from sys/health.
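To make the 503 failure concrete, the following is a minimal sketch of how the default sys/health status codes relate to a Kubernetes HTTP probe, which treats any status from 200 up to (but not including) 400 as a success. The function names `describe_health_code` and `probe_passes` are hypothetical helpers for illustration and are not part of Vault or kubectl. The codes shown are the documented defaults and can be overridden with query parameters.

```shell
#!/bin/sh
# Map a /v1/sys/health HTTP status code to the Vault state it signals.
# These are the default codes; query parameters can change them.
describe_health_code() {
  case "$1" in
    200) echo "initialized, unsealed, active" ;;
    429) echo "unsealed, standby" ;;
    472) echo "DR secondary, active" ;;
    473) echo "performance standby" ;;
    501) echo "not initialized" ;;
    503) echo "sealed" ;;
    *)   echo "unknown" ;;
  esac
}

# A Kubernetes HTTP probe succeeds for any status in the range [200, 400).
probe_passes() {
  [ "$1" -ge 200 ] && [ "$1" -lt 400 ]
}
```

Assuming curl is available and VAULT_ADDR points at a pod, the status code can be fetched with `curl -s -o /dev/null -w '%{http_code}' "$VAULT_ADDR/v1/sys/health"` and passed to either helper. A sealed secondary yields 503, which `probe_passes` rejects, matching the failure described above.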
Recommendation
- Update the Helm chart values to disable the readiness and liveness probes (enabled: false) on the secondary cluster
-
# In the hashicorp/vault chart, both probes are configured under the server key
server:
  readinessProbe:
    enabled: false
    path: '/v1/sys/health?standbyok=true&perfstandbyok=true'
  livenessProbe:
    enabled: false
    path: '/v1/sys/health?standbyok=true&perfstandbyok=true'
-
- Deploy the updated Helm chart to the secondary cluster
-
helm upgrade vault hashicorp/vault -f <vault-values>.yaml
-
- Reschedule pods on the secondary cluster
-
# Run for each pod in the cluster starting with the standbys before moving to the active
kubectl delete pod <vault-pod>
-
- Unseal the pods on the secondary using the secondary cluster's unseal keys. This step is automatic if auto-unseal is configured.
-
# Repeat with different key shares until the unseal threshold is reached
kubectl exec -ti <vault-pod> -- vault operator unseal "$UNSEAL_KEY"
-
- Enable replication (PR or DR). In the commands below, replace <replication_type> with performance for PR or dr for DR.
- On the primary cluster
-
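# Log in inside the pod so the token is available to the commands that follow
kubectl exec -ti <vault-pod> -- vault login <token>
kubectl exec -ti <vault-pod> -- vault write -f sys/replication/<replication_type>/primary/enable
# Flags such as -format must come before the path; the activation token is returned in wrap_info.token
kubectl exec -ti <vault-pod> -- vault write -format=json sys/replication/<replication_type>/primary/secondary-token id="secondary"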
-
- On the secondary cluster
-
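# Log in inside the pod so the token is available to the command that follows
kubectl exec -ti <vault-pod> -- vault login <token>
# $TOKEN is the secondary activation token generated on the primary cluster
kubectl exec -ti <vault-pod> -- vault write sys/replication/<replication_type>/secondary/enable token="$TOKEN"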
-
- Update the Helm chart values and set the readiness and liveness probes to true on the secondary cluster. Note: the path below may need to be modified for your environment. Refer to the sys/health API documentation for details.
-
# In the hashicorp/vault chart, both probes are configured under the server key
server:
  readinessProbe:
    enabled: true
    path: '/v1/sys/health?standbyok=true&perfstandbyok=true'
  livenessProbe:
    enabled: true
    path: '/v1/sys/health?standbyok=true&perfstandbyok=true'
-
- Deploy the updated Helm chart to the secondary cluster
-
helm upgrade vault hashicorp/vault -f <vault-values>.yaml
-
- Reschedule pods on the secondary cluster
-
# Run for each pod in the cluster starting with the standbys before moving to the active
kubectl delete pod <vault-pod>
-
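- Unseal the pods on the secondary using the primary cluster's unseal keys. Once replication is enabled, the secondary adopts the primary cluster's keys, so the secondary's original unseal keys no longer apply.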
-
# Repeat with different key shares until the unseal threshold is reached
kubectl exec -ti <vault-pod> -- vault operator unseal "$UNSEAL_KEY"
-
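As an illustration of how the probe path might be adjusted per environment, the sketch below builds a sys/health path for the Helm values file. It assumes that a DR secondary, which returns 472 by default, should count as healthy for the probe, and uses the sys/health `drsecondarycode` query parameter to return 200 instead. The function name `health_probe_path` and the role labels `pr` and `dr` are hypothetical, and whether DR secondary pods should report ready is an operational decision for your environment.

```shell
#!/bin/sh
# Build a sys/health probe path for the Helm values file.
# Usage: health_probe_path <role>, where <role> is a hypothetical label:
#   pr - performance replication secondary
#   dr - DR replication secondary
health_probe_path() {
  # Base path from this article: standbys and performance standbys return 200
  base='/v1/sys/health?standbyok=true&perfstandbyok=true'
  case "$1" in
    # Assumption: a DR secondary should pass the probe, so override its default 472
    dr) printf '%s&drsecondarycode=200\n' "$base" ;;
    pr) printf '%s\n' "$base" ;;
    *)  echo "unknown role: $1" >&2; return 1 ;;
  esac
}
```

For example, `health_probe_path dr` prints the path to place in the readinessProbe and livenessProbe path fields for a DR secondary, while `health_probe_path pr` prints the path used throughout this article.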