Introduction
As of Vault 1.16.0, new installations of Vault Enterprise include a default global lease count quota with a max_leases value of 300000. This limit is intentionally low and is meant to prevent runaway lease growth when no other lease count quota has been specified.
This limit will affect all new clusters with no pre-existing configuration. As with any other quota, the default can be directly increased, decreased, or removed using the lease-count-quotas endpoints.
The default may also be overridden by higher precedence quotas (specified for a namespace, mount, path, or role) as described in the Lease count quota precedence documentation.
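The quota operations described above can be sketched with the Vault CLI as follows. This is a minimal sketch, assuming VAULT_ADDR is set and the token can read and write sys/quotas/lease-count. The quota name "approle2-limit", the mount path and the max_leases value are hypothetical examples only.

```bash
#!/bin/bash
# Sketch: inspect and adjust lease count quotas through the Vault CLI.
# Assumes VAULT_ADDR is set and the token may read/write sys/quotas/lease-count.

# Print the names of all configured lease count quotas.
list_lease_quotas() {
  vault list sys/quotas/lease-count
}

# Create or update a lease count quota scoped to a single mount.
# A mount-scoped quota takes precedence over the global quota.
#   $1 quota name (hypothetical, e.g. "approle2-limit")
#   $2 mount path (e.g. "auth/approle2/")
#   $3 maximum number of leases allowed for that mount
set_mount_lease_quota() {
  local name="$1" mount_path="$2" max_leases="$3"
  vault write "sys/quotas/lease-count/${name}" \
    path="$mount_path" \
    max_leases="$max_leases"
}
```

For example, `set_mount_lease_quota approle2-limit auth/approle2/ 50000` followed by `vault read sys/quotas/lease-count/approle2-limit` creates and then displays a hypothetical mount-level quota.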
However, a Vault cluster that was created with a Vault version prior to 1.16.0, or one whose default global lease count quota has been increased, can still suffer a lease explosion that may exceed millions of leases. Whether or not the default global lease count quota is set, finding the origin of a lease explosion can be difficult.
The Identifying mounts involved in Vault Lease Explosions KB Article explains how to identify the involved mounts. Once the affected mounts have been identified, it might be necessary to revoke their leases before the leases expire on their own according to their configured TTL. This article contains an example script that can be used to revoke leases in batches in such a scenario. The alternative is a prefix-based lease revocation, which can be very intrusive for the Vault cluster: revoking millions of leases in one go is very resource intensive and could leave the cluster unresponsive for an extended period of time.
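Before choosing between batch and prefix-based revocation, it can help to check how many leases sit under the affected prefix and to inspect a few of them. The helper functions below are a minimal sketch; count_leases and preview_leases are hypothetical names, and the sketch assumes VAULT_ADDR is set, jq is installed, and the token may list sys/leases/lookup/ (which requires sudo).

```bash
#!/bin/bash
# Sketch: size up a lease prefix before deciding how to revoke its leases.
# Assumes VAULT_ADDR is set, jq is installed, and the token can LIST
# sys/leases/lookup/ (this endpoint requires the sudo capability).

# Print the number of lease IDs directly under a prefix such as
# "auth/approle2/login/". Keys ending in "/" are sub-prefixes, not leases.
count_leases() {
  local prefix="$1" n
  n=$(vault list -format=json "sys/leases/lookup/${prefix}" \
    | jq '[.[] | select(endswith("/") | not)] | length')
  # An empty result (no leases, or a failed list call) is reported as 0.
  echo "${n:-0}"
}

# Show the details (TTL, expiry time) of the first few leases under a prefix,
# to confirm the prefix targets the intended mount before revoking anything.
preview_leases() {
  local prefix="$1" limit="${2:-5}"
  vault list -format=json "sys/leases/lookup/${prefix}" \
    | jq -r '.[] | select(endswith("/") | not)' \
    | head -n "$limit" \
    | while read -r id; do
        vault lease lookup "${prefix}${id}" < /dev/null
      done
}

# The intrusive alternative revokes every lease under the prefix in one call:
#   vault lease revoke -prefix "auth/approle2/login/"
```

For example, `count_leases "auth/approle2/login/"` followed by `preview_leases "auth/approle2/login/" 3` reports the lease count and shows three sample leases.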
Prerequisites
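- Vault Enterprise Edition.
- The vault CLI and jq installed on the host that runs the script.
- A Vault token with the list and sudo capabilities on sys/leases/lookup/ and permission to revoke leases via sys/leases/revoke.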
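A token with the required permissions could be backed by a dedicated policy. The function below is a minimal sketch of writing such a policy with the Vault CLI; the policy name "lease-revoker" is hypothetical, and both sys/leases/revoke and sys/leases/revoke/* are granted as an assumption to cover either form of the revoke request.

```bash
#!/bin/bash
# Sketch: write a policy that allows listing and revoking leases.
# The policy name "lease-revoker" is a hypothetical example.
write_lease_revoker_policy() {
  vault policy write lease-revoker - <<'EOF'
# Listing lease IDs under sys/leases/lookup requires the sudo capability.
path "sys/leases/lookup/*" {
  capabilities = ["list", "sudo"]
}
# Revoking individual leases (both request forms, as an assumption).
path "sys/leases/revoke" {
  capabilities = ["update"]
}
path "sys/leases/revoke/*" {
  capabilities = ["update"]
}
EOF
}
```

After calling `write_lease_revoker_policy`, a token for the script could be issued with `vault token create -policy=lease-revoker`.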
Example script:
The script serves as an example only. Please make sure to optimize and test it before using it in any production environment. Please note that the script is not supported by HashiCorp Global Support.
The example below revokes all leases for a particular auth method or secrets engine path. As a consequence, Vault clients might have to re-authenticate or request new secrets.
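#!/bin/bash
# Path whose leases should be revoked (keep the trailing slash).
AUTH_PATH="auth/approle2/login/"
LOOKUP_PATH="sys/leases/lookup/${AUTH_PATH}"
# Number of leases revoked per batch and the pause between batches (seconds).
BATCH_SIZE=50
SLEEP_SECONDS=30
echo "Fetching ALL lease IDs from Vault (this may take a moment)..."
# 1. Fetch the entire list into a local variable once.
#    Keys ending in "/" are sub-prefixes rather than lease IDs, so skip them.
ALL_IDS=$(vault list -format=json "$LOOKUP_PATH" | jq -r '.[] | select(endswith("/") | not)')
# 2. Check if we got anything (the result is also empty if the list call failed).
if [ -z "$ALL_IDS" ]; then
  echo "No leases found under $LOOKUP_PATH (check the output above for errors)."
  exit 0
fi
# 3. Convert the newline-separated list into a Bash array.
readarray -t ID_ARRAY <<< "$ALL_IDS"
total=${#ID_ARRAY[@]}
echo "Total leases found: $total"
# 4. Loop through the array in chunks of BATCH_SIZE.
for ((i = 0; i < total; i += BATCH_SIZE)); do
  # Get a slice of up to BATCH_SIZE IDs.
  batch=("${ID_ARRAY[@]:i:BATCH_SIZE}")
  echo "Processing batch $((i / BATCH_SIZE + 1)) (leases $((i + 1)) to $((i + ${#batch[@]})) of $total)..."
  for id in "${batch[@]}"; do
    # The full lease ID is the path followed by the listed key.
    vault lease revoke "${AUTH_PATH}${id}"
  done
  # Only sleep if there are more IDs left to process.
  if [ $((i + BATCH_SIZE)) -lt "$total" ]; then
    echo "Batch complete. Waiting ${SLEEP_SECONDS} seconds..."
    sleep "$SLEEP_SECONDS"
  fi
done
echo "All $total leases have been processed."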
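This script lists the lease IDs under sys/leases/lookup/auth/approle2/login/ and revokes them in batches of 50, pausing for 30 seconds after every batch except the last. The target path is set using the AUTH_PATH and LOOKUP_PATH variables; adjust AUTH_PATH to target a different auth method or secrets engine path.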
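The pauses add up quickly for large lease counts. Because a pause only happens between batches, N leases revoked in batches of B with a P-second pause spend (ceil(N / B) - 1) * P seconds waiting, not counting the time the revocations themselves take. The helper below is a small sketch of that arithmetic; estimate_pause_seconds is a hypothetical name.

```bash
#!/bin/bash
# Sketch: estimate the total time the batch script spends sleeping.
# Pauses occur between batches only: (ceil(total / batch_size) - 1) * pause.
#   $1 total number of leases
#   $2 batch size
#   $3 pause between batches, in seconds
estimate_pause_seconds() {
  local total="$1" batch_size="$2" pause="$3"
  if [ "$total" -le 0 ]; then
    echo 0
    return
  fi
  # Integer ceiling division gives the number of batches.
  local batches=$(( (total + batch_size - 1) / batch_size ))
  echo $(( (batches - 1) * pause ))
}
```

For example, `estimate_pause_seconds 1000000 50 30` prints 599970, roughly 6.9 days of pauses alone, which is worth weighing when choosing the batch size and pause for a very large lease explosion.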
With a high number of leases, the initial vault list request might time out, in which case the value of VAULT_CLIENT_TIMEOUT may need to be increased. For more information, please refer to the Vault CLI documentation.
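As a sketch, the timeout can be raised in the shell session before the script is started. The Vault CLI default is 60s; the 300s value below is only an example and should be sized to the number of leases involved.

```bash
#!/bin/bash
# Sketch: raise the Vault CLI client timeout (default 60s) before listing a
# very large number of leases. 300s is an arbitrary example value.
export VAULT_CLIENT_TIMEOUT=300s
```

Running the revocation script from the same shell session afterwards lets it inherit the exported variable.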
Additional Information
API Documentation: lease-count-quotas
Vault Documentation: Prevent Lease Explosions
Vault KB Article: Troubleshooting Lease Expiration
Vault KB Article: Warning lease count exceeds warning lease threshold in Vault Operational Logs
Vault KB Article: Vault Audit Log analysis using jq CLI
Vault KB Article: Identifying mounts involved in Vault Lease Explosions
Vault Documentation: All Vault telemetry metrics