Introduction:
When Vault runs with TLS enabled (always recommended for production clusters), clients communicate with the server over TLS. After TLS is enabled, some issues can appear in your environment even though the Vault service is up and running.
By default, Vault TCP listeners accept only TLS 1.2 or 1.3 connections and drop connection requests from clients using TLS 1.0 or 1.1 (see Default TLS Configuration). To restrict Vault to specific TLS versions, set the tls_min_version and/or tls_max_version parameters in the listener stanza.
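To see which TLS versions a listener actually accepts, a client can attempt one handshake per version. The following is a minimal Python sketch, not an official Vault tool. The helper names pinned_context and accepts are hypothetical, and the host vault.example.com is a placeholder. Certificate verification is switched off because only the protocol version is being tested, so this is for diagnostics only.

```python
import socket
import ssl


def pinned_context(version: ssl.TLSVersion) -> ssl.SSLContext:
    """Build a client context that offers exactly one TLS version."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Diagnostics only: skip certificate checks, we test the protocol version.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    ctx.minimum_version = version
    ctx.maximum_version = version
    return ctx


def accepts(host: str, port: int, version: ssl.TLSVersion, timeout: float = 5.0) -> bool:
    """Return True if the TLS handshake succeeds with the pinned version."""
    ctx = pinned_context(version)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host):
                return True
    except OSError:
        # ssl.SSLError is a subclass of OSError, so handshake failures,
        # refused connections, and timeouts all land here.
        return False


# Example call (placeholder host, Vault API port 8200):
# for v in (ssl.TLSVersion.TLSv1_2, ssl.TLSVersion.TLSv1_3):
#     print(v.name, accepts("vault.example.com", 8200, v))
```

The sketch probes only TLS 1.2 and 1.3 because many current OpenSSL builds no longer allow a client to offer TLS 1.0 or 1.1, which would make a negative result for those versions ambiguous.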
Common issues & troubleshooting:
1. Specific clients are unable to access Vault through the API/UI -
Error message to verify:
[INFO] http: TLS handshake error from 122.161.77.146:51530: tls: no cipher suite supported by both client and server
Cause:
The client is attempting to use a TLS version or cipher suite that the server does not support.
Solution:
Check the list of tls_cipher_suites defined in the Vault configuration and verify whether the client supports at least one of those cipher suites.
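The comparison described above can be sketched in Python as follows. This assumes you already have the client's supported cipher suites as a list of names in the same naming scheme as the Vault configuration (for example from the client's documentation or a packet capture). The function name shared_cipher_suites is hypothetical.

```python
def shared_cipher_suites(vault_tls_cipher_suites: str, client_suites: list[str]) -> list[str]:
    """Return the configured suites the client also supports, in config order.

    vault_tls_cipher_suites is the comma-separated string from the listener
    stanza; client_suites is the list of suite names the client can offer.
    """
    configured = [name.strip() for name in vault_tls_cipher_suites.split(",") if name.strip()]
    offered = set(client_suites)
    return [name for name in configured if name in offered]


# An empty result corresponds to the "no cipher suite supported by both
# client and server" handshake error.
configured = "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384"
print(shared_cipher_suites(configured, ["TLS_RSA_WITH_AES_128_CBC_SHA"]))
```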
2. Vault is up on the Vault nodes, but clients accessing Vault through the load balancer address receive bad gateway errors -
Error message to verify:
http: TLS handshake error from LB-IP:9968: tls: client offered only unsupported versions: [303 302 301]
Cause:
The above error indicates that the load balancer cannot communicate with the Vault nodes using a TLS protocol version that Vault accepts (either the default versions, or those defined through tls_min_version/tls_max_version in the configuration).
Checking the Vault status from a terminal shows the following:
ubuntu@ip-172-31-22-19:~$ vault status
Error checking seal status: Error making API request.
URL: GET https://LB_address/v1/sys/seal-status
Code: 502. Raw Message: 502 Bad Gateway
Solution:
Update the policy at the load balancer level to allow communication over the required TLS protocol versions. For example, on an AWS Application Load Balancer this is managed through the security policy.
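To work out which versions the load balancer needs to offer, it helps to decode the bracketed values in the "client offered only unsupported versions" message. The sketch below assumes those values are hexadecimal TLS protocol version codes (0x0301 for TLS 1.0 through 0x0304 for TLS 1.3). The function name decode_offered_versions is hypothetical.

```python
import re

# Standard TLS protocol version codes.
TLS_VERSION_NAMES = {
    0x0300: "SSL 3.0",
    0x0301: "TLS 1.0",
    0x0302: "TLS 1.1",
    0x0303: "TLS 1.2",
    0x0304: "TLS 1.3",
}


def decode_offered_versions(log_line: str) -> list[str]:
    """Translate the version codes in a handshake error line into names."""
    match = re.search(r"unsupported versions: \[([0-9a-fA-F ]*)\]", log_line)
    if match is None:
        return []
    names = []
    for code in match.group(1).split():
        value = int(code, 16)  # assumption: codes are printed in hexadecimal
        names.append(TLS_VERSION_NAMES.get(value, f"unknown (0x{value:04x})"))
    return names


line = "http: TLS handshake error from LB-IP:9968: tls: client offered only unsupported versions: [303 302 301]"
print(decode_offered_versions(line))
```

Under that assumption, the sample line decodes to TLS 1.2, TLS 1.1, and TLS 1.0, which would mean the load balancer offered nothing newer than TLS 1.2 while the Vault listener required a newer version.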
3. Vault is operational, but TLS handshake related [INFO] level messages appear in the operational logs for different clients -
Error message to verify:
http: TLS handshake error from 122.161.77.132:31478: read tcp4 172.31.22.19:8200->122.161.77.132:31478: i/o timeout
[INFO] http: TLS handshake error from 122.161.77.132:57632: EOF
Cause:
The above errors most likely come from load balancer health checks, or from other unintended traffic reaching the Vault nodes, where the connection is opened but then closed (EOF) or left idle (i/o timeout) before the TLS handshake completes.
Solution:
Although these messages do not affect the health or performance of your Vault cluster, consider identifying the source of these addresses and disabling any unnecessary health checks that poll the Vault nodes.
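One way to find the source of these addresses is to count handshake errors per client IP in the operational log. The following is a minimal Python sketch that assumes log lines in the format shown above and matches IPv4 addresses only. The function name handshake_error_sources is hypothetical.

```python
import re
from collections import Counter

# Captures the client IPv4 address that follows "TLS handshake error from".
HANDSHAKE_ERROR = re.compile(r"TLS handshake error from (\d{1,3}(?:\.\d{1,3}){3}):\d+")


def handshake_error_sources(log_lines) -> Counter:
    """Count TLS handshake errors per source IP address."""
    counts = Counter()
    for line in log_lines:
        match = HANDSHAKE_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample_lines = [
    "http: TLS handshake error from 122.161.77.132:31478: read tcp4 172.31.22.19:8200->122.161.77.132:31478: i/o timeout",
    "[INFO] http: TLS handshake error from 122.161.77.132:57632: EOF",
]
for address, count in handshake_error_sources(sample_lines).most_common():
    print(address, count)
```

Addresses that appear at a steady interval are good candidates for health check probes and can be compared against the load balancer or monitoring hosts.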
References: