Introduction
As of Vault 1.16.0, new installations of Vault Enterprise will include a default global quota with a max_leases value of 300000. This value is an intentionally low limit, intended to prevent runaway leases in the event that no other lease count quota is specified.
This limit will affect all new clusters with no pre-existing configuration. As with any other quota, the default can be directly increased, decreased, or removed using the lease-count-quotas endpoints.
The default may also be overridden by higher precedence quotas (specified for a namespace, mount, path, or role) as described in the Lease count quota precedence documentation.
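As a rough illustration of adjusting a quota through the lease-count-quotas endpoints, the sketch below only builds the HTTP request and does not send it. The quota name "default", the address, and the token shown in the usage comment are assumptions for illustration; the name of the default global quota should be verified on the cluster before changing anything.

```python
import json
import urllib.request


def build_quota_update(addr, token, name, max_leases, path=""):
    """Build (but do not send) a request that creates or updates a lease count quota.

    An empty path is intended to target a global quota; a namespace or mount
    path narrows the quota's scope.
    """
    body = json.dumps({"max_leases": max_leases, "path": path}).encode("utf-8")
    return urllib.request.Request(
        f"{addr}/v1/sys/quotas/lease-count/{name}",
        data=body,
        method="POST",
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
    )


# Hypothetical usage: raise the global limit to 500000 leases.
# req = build_quota_update("https://vault.example.com:8200", "<token>", "default", 500000)
# urllib.request.urlopen(req, timeout=30)
```

Building the request separately from sending it makes it possible to review the target URL and payload before applying the change.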
However, a Vault cluster that was created using a Vault version prior to Vault 1.16.0, or whose default global lease quota has been increased, might still suffer from a lease explosion which could exceed millions of leases. In both cases, with or without the default global lease quota set, finding the origin of the lease explosion might be difficult. The intention of this article is to share some of the different ways to start identifying which mounts are the cause of the lease explosion.
Prerequisites
- Vault Enterprise Edition.
Overview of possible ways to identify the busiest mount(s):
- Using Telemetry
- By Analyzing the Vault Audit logs
- By Listing the leases in the sys/leases/lookup path
Using Telemetry
While using Telemetry, the following Vault metrics could be of relevance:
- vault.expire.num_leases
- vault.expire.lease_expiration
- vault.token.count
- vault.token.count.by_ttl
- vault.token.count.by_auth
When visualizing the above metrics in, for example, Grafana or any similar dashboard, the metric labels can be useful when a Vault Cluster or Vault Namespace contains multiple Authentication Methods or Secrets Engines of the same type.
In this case it might be useful to separate leases per mount point.
The vault.token.count.by_auth metric has the following labels: cluster, namespace, and authentication method. The vault.token.creation metric has the following labels: cluster, namespace, authentication method, mount point, time to live (TTL), and token type.
This allows one to drill down into more detail, which might be required in case multiple instances of the same Vault Authentication Method are being used.
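The per-label drill-down described above can be sketched outside of a dashboard as well. The following minimal example sums a metric per label value from Prometheus-format text. The Prometheus-style metric name vault_token_creation, the label names (mount_point, auth_method, and so on), and the sample values are assumptions for illustration and should be checked against the cluster's actual telemetry output.

```python
import re
from collections import Counter

# One sample line of the Prometheus text format: name{labels} value
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def sum_by_label(metrics_text, metric_name, label):
    """Sum the values of metric_name, grouped by the value of one label."""
    totals = Counter()
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if not match or match.group("name") != metric_name:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        totals[labels.get(label, "<none>")] += float(match.group("value"))
    return totals


# Made-up sample data; real output would come from the cluster's metrics endpoint.
SAMPLE_METRICS = """
# TYPE vault_token_creation counter
vault_token_creation{auth_method="approle",cluster="c1",creation_ttl="1h",mount_point="auth/approle/",namespace="root",token_type="service"} 120
vault_token_creation{auth_method="approle",cluster="c1",creation_ttl="1h",mount_point="auth/approle-ci/",namespace="root",token_type="service"} 98000
vault_token_creation{auth_method="userpass",cluster="c1",creation_ttl="2h",mount_point="auth/userpass/",namespace="root",token_type="service"} 15
"""

per_mount = sum_by_label(SAMPLE_METRICS, "vault_token_creation", "mount_point")
for mount, count in per_mount.most_common():
    print(f"{mount}\t{count:.0f}")
```

In this made-up sample, grouping by mount_point separates the two AppRole mounts, which grouping by authentication method alone would combine.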
By Analyzing the Vault Audit log
The Vault Audit logs can be used to identify the busiest mount point and possibly the remote_address of faulty consumers which are generating an excess of leases. The following knowledge base article contains several examples of jq queries which can be used to narrow down the affected mount or busiest clients:
Vault Audit Log analysis using jq CLI
If a load balancer is used to front Vault, the Vault audit logs might show the load balancer address instead of the faulty consumer in the remote address field.
When the load balancer proxies a TCP connection, it overwrites the client’s source IP address with its own when communicating with the backend server. However, when relaying HTTP messages, it can store the client’s address in a non-standard HTTP header used for the purpose, such as X-Forwarded-For. The backend server can then be configured to read the value from that header to retrieve the client’s IP address. For more information please see the Vault TCP listener x_forwarded_for_authorized_addrs configuration parameter.
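As an alternative to jq, the same kind of audit log analysis can be sketched in a few lines of Python. The example below counts responses that appear to have issued a lease, grouped by mount and by remote address. It assumes a JSON-lines audit file with the fields type, request.path, request.mount_point, request.remote_address, response.auth, and response.secret.lease_id. Field availability varies between Vault versions, and treating any response containing auth data as a new lease is a heuristic, since renewals may also match.

```python
import json
from collections import Counter


def count_lease_creations(lines):
    """Count lease-issuing audit responses per mount and per client address."""
    by_mount = Counter()
    by_client = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        if not isinstance(entry, dict) or entry.get("type") != "response":
            continue
        response = entry.get("response") or {}
        secret = response.get("secret") or {}
        # Heuristic: auth data (token issued) or a secret lease_id (dynamic secret).
        if not (response.get("auth") or secret.get("lease_id")):
            continue
        request = entry.get("request") or {}
        # Fall back to the request path when mount_point is not logged.
        mount = request.get("mount_point") or request.get("path") or "<unknown>"
        by_mount[mount] += 1
        by_client[request.get("remote_address") or "<unknown>"] += 1
    return by_mount, by_client


# Made-up sample entries, reduced to the fields used above.
SAMPLE_LINES = [
    '{"type": "request", "request": {"path": "auth/approle/login", "remote_address": "10.0.0.5"}}',
    '{"type": "response", "request": {"path": "auth/approle/login", "mount_point": "auth/approle/", "remote_address": "10.0.0.5"}, "response": {"auth": {"token_type": "service"}}}',
    '{"type": "response", "request": {"path": "auth/approle/login", "mount_point": "auth/approle/", "remote_address": "10.0.0.5"}, "response": {"auth": {"token_type": "service"}}}',
    '{"type": "response", "request": {"path": "database/creds/readonly", "mount_point": "database/", "remote_address": "10.0.0.9"}, "response": {"secret": {"lease_id": "database/creds/readonly/abc"}}}',
    '{"type": "response", "request": {"path": "sys/health", "remote_address": "10.0.0.7"}, "response": {}}',
]

by_mount, by_client = count_lease_creations(SAMPLE_LINES)
print(by_mount.most_common(5))
print(by_client.most_common(5))
```

For a real log, the open file object can be passed in place of SAMPLE_LINES so that the file is processed line by line rather than loaded into memory at once.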
By listing the leases in sys/leases/lookup
Listing leases using, for example, the Vault CLI can be performed as follows:
vault list sys/leases/lookup/
In case of Vault authentication methods:
vault list sys/leases/lookup/auth/<name of authentication mount>
For example:
vault list /sys/leases/lookup/auth/approle/login
Some Vault Authentication methods differentiate roles used to authenticate with, which makes it possible to drill down even more. This is not the case for the Vault AppRole Authentication method used in this example, as it stores all leases under the login path.
In case of Vault Secrets Engines:
vault list /sys/leases/lookup/<name of secrets engine mount>
Depending on the Vault Secrets Engine used, it might be possible to drill down further using role names.
Please note that leases are stored under /sys/leases/lookup on a per-namespace basis, meaning that if the Vault Cluster contains several namespaces, identifying the mount which contains the most leases using the Vault CLI might not be feasible.
In case of a high number of leases, it might be required to increase the value specified for VAULT_CLIENT_TIMEOUT. For more information please refer to the Vault CLI documentation.
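Walking the lease tree by hand becomes tedious with many mounts, so the drill-down can be sketched as a small recursive counter. The example below assumes that keys ending in "/" returned by the list operation are sub-prefixes and that all other keys are individual leases. The make_lister helper, its parameters, and the address in the usage comment are hypothetical; the token used needs permission to list sys/leases/lookup, and each namespace has to be walked separately.

```python
import json
import urllib.error
import urllib.request


def make_lister(addr, token, namespace=None, timeout=60):
    """Return a function that lists keys under sys/leases/lookup/<prefix>."""
    def list_keys(prefix):
        url = f"{addr}/v1/sys/leases/lookup/{prefix}?list=true"
        req = urllib.request.Request(url, headers={"X-Vault-Token": token})
        if namespace:
            req.add_header("X-Vault-Namespace", namespace)
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                return json.load(resp)["data"]["keys"]
        except urllib.error.HTTPError as err:
            if err.code == 404:
                return []  # nothing stored under this prefix
            raise
    return list_keys


def count_leases(list_keys, prefix=""):
    """Recursively count leases below prefix using the given list function."""
    total = 0
    for key in list_keys(prefix):
        if key.endswith("/"):
            total += count_leases(list_keys, prefix + key)
        else:
            total += 1
    return total


def leases_per_prefix(list_keys, prefixes):
    """Return (prefix, lease count) pairs, largest first."""
    counts = [(p, count_leases(list_keys, p)) for p in prefixes]
    return sorted(counts, key=lambda item: item[1], reverse=True)


# Hypothetical usage against a live cluster:
# lister = make_lister("https://vault.example.com:8200", "<token>", namespace="team-a")
# print(leases_per_prefix(lister, ["auth/approle/", "database/"]))
```

Passing the list function as a parameter keeps the counting logic independent of the HTTP client, so it can be exercised against an in-memory tree before it is pointed at a cluster with millions of leases.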
Additional Information
API Documentation: lease-count-quotas
Vault Documentation: Prevent Lease Explosions
Vault KB Article: Troubleshooting Lease Expiration
Vault KB Article: Warning lease count exceeds warning lease threshold in Vault Operational Logs
Vault KB Article: Vault Audit Log analysis using jq CLI
Vault Documentation: All Vault telemetry metrics