If the Consul storage backend is used for Vault, it is important to consider the Consul agent parameter http_max_conns_per_client. The default value of 200 connections may be insufficient under high load, when the number of parallel connections from Vault to Consul exceeds the configured maximum.
Some examples of related scenarios can include:
- High volume of requests to Vault, or a new peak in the number of requests per second. Vault telemetry such as vault.consul.*, as well as the Vault audit logs, should help identify volumes and transaction rates. Consul-specific metrics such as consul.rpc.request can also help.
- During the Vault startup / boot phase, iterative requests stop abruptly midway, causing the service to restart and cycle through similar events.
- Vault is restarted after a prolonged period offline and a large number of lease revocations must be performed at a rate that exceeds the available connections, resulting in a restart each time. This is particularly common in versions of Vault prior to 1.7.x, which lack the revocation manager improvements that help prevent this. In these cases, differences can be observed (different lease IDs) in the revoked leases referenced from the offending mount: Vault continues to expunge expired leases with each restart before exceeding the available connection limit and repeating the process with another restart.
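As a rough way to gauge request rates from the audit log, the sketch below counts request entries per second and lists the busiest seconds. It assumes a file audit device writing one JSON object per line with "time" and "type" fields; the function name requests_per_second and the log path in the usage comment are illustrative and not part of Vault.

```shell
# Reads Vault audit log lines on stdin and prints the busiest seconds
# as "<count> <timestamp>", highest count first.
# $1 (optional): number of rows to print (default 5).
requests_per_second() {
  grep '"type":"request"' |
    awk 'match($0, /"time":"[^"]+"/) {
           # skip the 8 characters of the key prefix, keep YYYY-MM-DDTHH:MM:SS
           print substr($0, RSTART + 8, 19)
         }' |
    sort | uniq -c | sort -rn | head -n "${1:-5}"
}

# Usage (the path is an assumption; adjust to the audit device in use):
#   sudo cat /var/log/vault/audit.log | requests_per_second 10
```

Only entries of type "request" are counted, so each client call is counted once rather than once per request and response pair.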
Set http_max_conns_per_client according to the demand measured from the Vault audit log or the Vault operational log, while also considering the hardware resources available to the Consul servers. For example, an increased value of 300 may be sufficient if the scenarios described above are intermittent, occurring only during certain peaks or after a restart that is followed by several more restarts before the Vault service becomes stable.
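One simple way to turn a measured peak into a candidate value is to add a safety margin on top of it. The sketch below does only that arithmetic; the function name suggest_limit and the default 25% headroom are illustrative assumptions rather than a recommendation, and the result still has to be weighed against the resources of the Consul servers.

```shell
# suggest_limit PEAK [HEADROOM_PCT]
# Prints PEAK increased by HEADROOM_PCT percent (default 25), rounded up.
suggest_limit() {
  peak=$1
  headroom_pct=${2:-25}   # assumed margin, purely illustrative
  # integer arithmetic; adding 99 before dividing rounds the result up
  echo $(( (peak * (100 + headroom_pct) + 99) / 100 ))
}

# Usage: a measured peak of 240 concurrent connections
#   suggest_limit 240
```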
To apply the increase, begin by stopping both the Vault and Consul agent services on the Vault hosts before making any adjustments to the Consul configuration file. It is also good practice to record the initial state of Consul members and peers so that the same state can be confirmed at the end, once all changes have been applied.
# // On Vault host:
consul members ;
sudo systemctl stop vault ;
sudo systemctl stop consul ;
Add the new limits stanza to the Consul HCL file:
# // contents of Consul agent conf '/etc/consul.d/consul.hcl' on Vault
server = false
data_dir = "/opt/consul"
node_name = "PR-US-vault1-agent"
# ... rest of conf ...
# // add:
limits {
  http_max_conns_per_client = 300
}
Start Consul again so that the new setting takes effect, then confirm its members:
sudo systemctl start consul && sudo journalctl -u consul -f ;
consul members ;
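To help compare the member state with what was recorded before the change, a small filter such as the one below can count members that are not reported as alive. It is a sketch that assumes the default tabular output of consul members, where the third column is the status; the function name count_not_alive is made up for illustration.

```shell
# Reads `consul members` output on stdin and prints how many members
# have a status other than "alive" (the header row is skipped).
count_not_alive() {
  awk 'NR > 1 && $3 != "alive" { n++ } END { print n + 0 }'
}

# Usage:
#   consul members | count_not_alive
```

A result of 0 both before and after the change suggests the cluster membership has returned to its initial state.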
Proceed to restart Vault:
sudo systemctl start vault && sudo journalctl -u vault -f ;
Continue to monitor all telemetry and logs to verify that the newly set limit is sufficient.
Other CLI tools such as lsof may also be used to count the connections open to Consul, and in conjunction with the watch command, fluctuations in the number of connections can be monitored. For example:
# // consul connections on IPv4
sudo lsof -i4 | grep consul ;
# consul 15643 consul 8u IPv4 45227 0t0 TCP ...:8301 (LISTEN)
# consul 15643 consul 9u IPv4 45228 0t0 UDP ...:8301
# consul 15643 consul 10u IPv4 45229 0t0 UDP localhost:8600
# consul 15643 consul 11u IPv4 45231 0t0 TCP localhost:8600 (LISTEN)
# consul 15643 consul 12u IPv4 45233 0t0 TCP localhost:8500 (LISTEN)
# consul 15643 consul 13u IPv4 45279 0t0 TCP localhost:8500->localhost:42466 (ESTABLISHED)
# consul 15643 consul 14u IPv4 45349 0t0 TCP ....:45753->....:8300 (ESTABLISHED)
# consul 15643 consul 16u IPv4 45313 0t0 TCP ....:41505->....:8300 (ESTABLISHED)
watch 'sudo lsof -i4 | grep consul | wc -l' ;
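Since http_max_conns_per_client applies per client IP address to the agent's HTTP(S) listener, it can be more telling to count only established connections to that listener, grouped by remote address. The following is a minimal sketch that assumes the default HTTP port 8500 and lsof output shaped like the listing above; the function name http_conns_per_client is hypothetical.

```shell
# Reads lsof output on stdin and prints "<count> <remote address>" for
# established connections whose local side is port 8500, highest first.
http_conns_per_client() {
  awk '/:8500->/ && /\(ESTABLISHED\)/ {
         for (i = 1; i <= NF; i++) if ($i ~ /:8500->/) {
           remote = $i
           sub(/^.*->/, "", remote)     # drop the local address and arrow
           sub(/:[^:]*$/, "", remote)   # drop the remote port
           count[remote]++
         }
       }
       END { for (r in count) print count[r], r }' | sort -rn
}

# Usage (-n and -P keep addresses and ports numeric):
#   sudo lsof -nP -i4 | grep consul | http_conns_per_client
```

Comparing the highest count against the configured limit gives a quick indication of how close a single client, such as the local Vault process, is to the maximum.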