Introduction
The expiration manager is an internal Vault component that is not directly exposed. It owns the lease store and performs the following functions (sketched in the example after this list):
- Loads all lease entries from storage into memory at startup
- When a lease is created or restored, starts a goroutine that handles revocation when the TTL expires
- Is called when a lease is explicitly revoked before its TTL expires, to clean up the lease entry
- Periodically publishes `num_leases` metrics based on pending leases in memory
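The following is a minimal, hypothetical sketch in Go of the pattern above, not Vault's actual implementation: lease entries held in an in-memory map, a timer per lease that triggers revocation when the TTL expires, and a count of pending leases corresponding to the `num_leases` gauge.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// leaseEntry is a simplified stand-in for the lease entries the
// expiration manager loads from storage at startup.
type leaseEntry struct {
	ID  string
	TTL time.Duration
}

// expirationManager tracks pending leases in memory and schedules
// revocation when each TTL elapses.
type expirationManager struct {
	mu      sync.Mutex
	pending map[string]*time.Timer
}

func newExpirationManager() *expirationManager {
	return &expirationManager{pending: make(map[string]*time.Timer)}
}

// register is called when a lease is created or restored: it starts a
// timer whose callback revokes the lease once the TTL expires.
func (m *expirationManager) register(le leaseEntry) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.pending[le.ID] = time.AfterFunc(le.TTL, func() { m.revoke(le.ID) })
}

// revoke stands in for routing the revocation to the originating
// backend and cleaning up the lease entry.
func (m *expirationManager) revoke(id string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	delete(m.pending, id)
	fmt.Printf("revoked lease %s\n", id)
}

// numLeases corresponds to the periodically published num_leases gauge.
func (m *expirationManager) numLeases() int {
	m.mu.Lock()
	defer m.mu.Unlock()
	return len(m.pending)
}

func main() {
	m := newExpirationManager()
	m.register(leaseEntry{ID: "database/creds/app/abc123", TTL: time.Second})
	fmt.Println("pending leases:", m.numLeases())
	time.Sleep(2 * time.Second)
	fmt.Println("pending leases:", m.numLeases())
}
```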
Revocation Mechanics: Sync, Failures and Forcing
User Revoke Requests
By default, revocations are done using the revoke endpoint. During the revocation, the lease's TTL is set to 0, allowing the expiration manager to handle the lease as though it had expired normally. There is an undocumented sync option which instead asks for the lease to be fully revoked while the caller waits.
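As an illustration, a revoke request can be sent with the official Go client (`github.com/hashicorp/vault/api`); the lease ID below is a placeholder, and the undocumented sync option mentioned above is not shown.

```go
package main

import (
	"log"

	vault "github.com/hashicorp/vault/api"
)

func main() {
	// Client configuration comes from VAULT_ADDR / VAULT_TOKEN in the
	// environment.
	client, err := vault.NewClient(vault.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// Hypothetical lease ID; in practice this is returned when the
	// secret is issued.
	leaseID := "database/creds/app/abc123"

	// Sends a revoke request to the revoke endpoint. By default the
	// expiration manager treats the lease as if its TTL reached 0.
	if err := client.Sys().Revoke(leaseID); err != nil {
		log.Fatalf("revocation failed: %v", err)
	}
	log.Printf("revocation accepted for %s", leaseID)
}
```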
To revoke a lease, the expiration manager routes a revoke request to the backend that originated the lease, using the path stored in the lease entry. This action reaches out to the external system and revokes the credential that was created. If the external revocation fails, an error is returned to the caller unless the `-force` flag was provided. If the `-force` flag was set to true, the lease entries and any secondary indexes are cleaned up regardless of any external errors.
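For forced revocation, the Go client exposes `Sys().RevokeForce`, which targets a lease prefix and requires sudo capability; the prefix below is a placeholder. Because external errors are ignored, the underlying credentials may remain live in the external system.

```go
package main

import (
	"log"

	vault "github.com/hashicorp/vault/api"
)

func main() {
	client, err := vault.NewClient(vault.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// Forced revocation of all leases under a (hypothetical) prefix.
	// Lease entries and secondary indexes are removed even if the
	// backend reports an error during external revocation.
	if err := client.Sys().RevokeForce("database/creds/app"); err != nil {
		log.Fatal(err)
	}
	log.Println("forced revocation complete")
}
```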
Internal Revoke on Expiry
When a TTL reaches 0 and the expiration manager needs to revoke a lease, the action performed is similar to an explicit request sent to the revoke endpoint. The key difference is in failure handling. When the external system's revocation fails, the error is logged, the manager sleeps with an exponential backoff, and the revocation is retried. Currently six attempts are made, after which the lease entry is deleted even though the lease was not revoked externally.
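A simplified sketch of the retry behavior described above; the backoff values and the simulated failure are illustrative, not Vault's exact implementation.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// revokeExternally stands in for the call the expiration manager makes
// to the backend that issued the lease; here it always fails so the
// retry loop is visible.
func revokeExternally(leaseID string) error {
	return errors.New("external system unavailable")
}

// revokeWithRetries mirrors the behavior described above: log the
// error, sleep with exponential backoff, and retry. After maxAttempts
// failures the lease entry is deleted anyway.
func revokeWithRetries(leaseID string, maxAttempts int) {
	delay := 100 * time.Millisecond
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err := revokeExternally(leaseID); err != nil {
			fmt.Printf("attempt %d failed for %s: %v\n", attempt, leaseID, err)
			time.Sleep(delay)
			delay *= 2 // exponential backoff
			continue
		}
		fmt.Printf("lease %s revoked\n", leaseID)
		return
	}
	fmt.Printf("giving up on %s; deleting lease entry anyway\n", leaseID)
}

func main() {
	revokeWithRetries("database/creds/app/abc123", 6)
}
```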
Performance Standbys
Performance standbys also use the expiration manager to keep track of which leases are valid. In this case, when expiration occurs the lease is simply removed from memory, since the active node handles revocation and storage cleanup.
Performance
The number of leases can impact Vault's performance. Loading leases into memory during startup can generate significant read I/O, and a large number of leases expiring at the same time can produce a burst of revocation and storage traffic, potentially overloading the storage layer. Several changes have been introduced to manage these scenarios more effectively, along with options to tune expiration manager behavior.
The default values are suitable for most workloads and should only be adjusted if you have a thorough understanding of the impact. Test any changes on clusters with comparable lease activity, monitor lease-specific and system-level metrics for changes in performance, and schedule a maintenance window when deploying them. As always, ensure that Vault is backed up prior to making any modifications.
Vault 1.7+ automatically throttles expirations, without user intervention, to avoid overloading the storage layer. The `VAULT_LEASE_REVOCATION_WORKERS` environment variable can be set to adjust the throttle; the default is 200 and it can be reduced.
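Conceptually, the throttle behaves like a fixed-size worker pool: only a bounded number of revocations run at once, so a burst of simultaneous expirations is drained gradually. The sketch below illustrates the idea and is not Vault's internal code; the pool size plays the role of `VAULT_LEASE_REVOCATION_WORKERS`.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// revoke stands in for a single external revocation call.
func revoke(leaseID string) {
	time.Sleep(50 * time.Millisecond) // simulated backend latency
	fmt.Println("revoked", leaseID)
}

func main() {
	// workers caps how many revocations run concurrently (a small
	// value is used here for brevity; the Vault default is 200).
	const workers = 4

	leases := make(chan string)
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for id := range leases {
				revoke(id)
			}
		}()
	}

	// A burst of simultaneous expirations is processed at a bounded
	// rate instead of hitting the storage layer all at once.
	for i := 0; i < 20; i++ {
		leases <- fmt.Sprintf("database/creds/app/lease-%02d", i)
	}
	close(leases)
	wg.Wait()
}
```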
Vault 1.8+ enhanced the expiration manager to internally mark a lease as irrevocable after six failed revocation attempts, so that revocation is no longer retried for leases identified as irrevocable. An HTTP API and CLI command are also available to help you identify irrevocable leases.
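As an example, the lease-count API from this timeframe can be queried with the Go client to see how many leases have been marked irrevocable. The `sys/leases/count` path and `type=irrevocable` parameter used below are assumptions based on the 1.8-era API and should be confirmed against the API documentation for your Vault version.

```go
package main

import (
	"fmt"
	"log"

	vault "github.com/hashicorp/vault/api"
)

func main() {
	client, err := vault.NewClient(vault.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// Count leases the expiration manager has marked irrevocable.
	secret, err := client.Logical().ReadWithData("sys/leases/count",
		map[string][]string{"type": {"irrevocable"}})
	if err != nil {
		log.Fatal(err)
	}
	if secret == nil {
		log.Fatal("no data returned; check the endpoint path and your Vault version")
	}
	fmt.Printf("irrevocable lease counts: %v\n", secret.Data)
}
```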