Introduction
This knowledge base article addresses a specific issue where an HCP Vault Dedicated (HVD) authentication mount, such as AppRole or GitHub, enters a tainted state after an attempt to disable it. This usually occurs when the mount being disabled has a very large number of associated leases or tokens, and the cleanup process times out before it completes. This document provides general guidelines and resolution steps for both customers and technical support to resolve the issue and prevent it from recurring.
Problem
An authentication method mount (e.g., AppRole, GitHub) becomes unusable and enters a tainted state following the execution of the vault auth disable <auth_path> command. This happens because the token and lease revocation process, which is part of the disable operation, exceeds the default client and server-side timeouts.
Symptom:
Executing the disable command (e.g., vault auth disable github) results in a timeout, and the output indicates a tainted state. Pipelines or applications dependent on the affected authentication method fail because the backend is inaccessible or incomplete.
Solution
The core of the issue is the massive number of tokens/leases requiring cleanup within the default timeout limits. The resolution focuses on increasing client and server-side request durations to allow the cleanup operation to complete successfully.
1. Extending Request Durations (Recommended Alternative Resolution)
For large-scale cleanups involving tens of thousands of leases, the recommended approach is to temporarily increase the relevant timeouts in the HVD server-side configuration, the client configuration, or both. This allows the internal cleanup process to finish.
This solution involves coordination between technical support and engineering, as it may require changes to the Vault server configuration to extend request durations for the cleanup to successfully finish.
Once the timeouts are extended, customers can run a script on their end to revoke all tokens and leases issued through the affected mount, and then retry disabling the auth method mount.
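The customer-side script mentioned above could look roughly like the following minimal sketch. It assumes the Vault CLI is installed and already authenticated against the cluster, and it relies on the `VAULT_CLIENT_TIMEOUT` environment variable (CLI default 60s) and the `vault token revoke -mode=path` command. The function names and the `30m` value are illustrative assumptions, not values prescribed by this article. Server-side timeouts still have to be extended by support and engineering.

```python
import os
import subprocess


def build_cleanup_commands(auth_path):
    """Return the CLI invocations: revoke tokens first, then disable the mount."""
    mount = auth_path.strip("/")
    return [
        # Revoke every token (and its children) created through this auth path.
        ["vault", "token", "revoke", "-mode=path", f"auth/{mount}"],
        # Then attempt to disable the mount itself.
        ["vault", "auth", "disable", mount],
    ]


def client_env(timeout="30m", base_env=None):
    """Copy the environment and raise the Vault CLI client timeout.

    The timeout value is an illustrative assumption; pick one that matches
    the server-side extension agreed with support.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["VAULT_CLIENT_TIMEOUT"] = timeout
    return env


def run_cleanup(auth_path, timeout="30m", runner=subprocess.run):
    """Run each step in order and stop at the first non-zero exit code."""
    env = client_env(timeout)
    for cmd in build_cleanup_commands(auth_path):
        result = runner(cmd, env=env)
        if result.returncode != 0:
            return False
    return True
```

Calling `run_cleanup("github")` would run both commands against a mount at the placeholder path `github`. The `runner` parameter exists only so the sequence can be exercised without a live cluster.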
2. Migration and Subsequent Cleanup (When Tainted State is Unrecoverable)
If the authentication mount is already in an unrecoverable tainted state, the immediate solution is often to migrate to a new, identical authentication mount and update all dependent applications to use the new mount.
| Step | Action | Detail |
| --- | --- | --- |
| 1. | Create New Auth Mount | Configure a new authentication mount with the required settings (e.g., a new AppRole path). |
| 2. | Update Application Config | Modify all affected applications/pipelines to authenticate via the newly created mount path. |
| 3. | Apply Recommended Limits | Ensure the configuration for the new mount includes supported limits and best practices to prevent similar issues (e.g., appropriate token TTLs and max_lease_ttl). |
| 4. | Cleanup Tainted Mount | After migration, run a script to loop the vault auth disable command on the original tainted auth method. This command may eventually delete the keys and reduce snapshot size once the timeouts are extended. |
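Step 4 describes looping the disable command. A minimal sketch of such a loop is shown here, assuming the Vault CLI is installed and authenticated and that a zero exit code means the disable completed. The function name, attempt count, and delay are hypothetical defaults chosen for illustration.

```python
import subprocess
import time


def disable_until_gone(auth_path, max_attempts=20, delay_seconds=60,
                       runner=subprocess.run, sleep=time.sleep):
    """Retry `vault auth disable` until it succeeds or attempts run out.

    Returns the attempt number that succeeded, or None if none did.
    """
    cmd = ["vault", "auth", "disable", auth_path.strip("/")]
    for attempt in range(1, max_attempts + 1):
        if runner(cmd).returncode == 0:
            return attempt
        if attempt < max_attempts:
            # Pause so server-side revocation can make progress between tries.
            sleep(delay_seconds)
    return None
```

A bounded number of attempts with a pause between them keeps the loop from hammering the cluster while revocation is still in progress.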
General Guidelines for Prevention
To prevent this issue from recurring, it is crucial to manage token and lease lifetimes effectively. Ensure that the configuration for authentication mounts includes recommended limits to prevent an excessive buildup of unexpired leases:
- Use appropriate token_ttl and token_max_ttl values.
- For methods like AppRole, ensure Secret IDs have a reasonable expiration time to prevent an accumulation of unexpired secret_ids.
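As an illustration of the limits listed above, the sketch below builds the CLI invocations that set `token_ttl`, `token_max_ttl`, and `secret_id_ttl` on an AppRole role, and that tune the default and maximum lease TTL on a mount. The mount path, role name, and every duration are placeholder assumptions and not recommended values; choose them to fit the workload.

```python
import shlex


def approle_role_limits(mount, role, token_ttl="1h", token_max_ttl="4h",
                        secret_id_ttl="24h"):
    """Build a `vault write` command that sets token and Secret ID lifetimes."""
    return [
        "vault", "write", f"auth/{mount.strip('/')}/role/{role}",
        f"token_ttl={token_ttl}",
        f"token_max_ttl={token_max_ttl}",
        f"secret_id_ttl={secret_id_ttl}",
    ]


def mount_ttl_tune(mount, default_lease_ttl="1h", max_lease_ttl="4h"):
    """Build a `vault auth tune` command that caps lease TTLs on the mount."""
    return [
        "vault", "auth", "tune",
        f"-default-lease-ttl={default_lease_ttl}",
        f"-max-lease-ttl={max_lease_ttl}",
        f"{mount.strip('/')}/",
    ]


def as_shell(cmd):
    """Render an argument list as a copy-pasteable shell line."""
    return shlex.join(cmd)
```

For example, `as_shell(mount_ttl_tune("approle"))` renders the tune command for a mount at the placeholder path `approle/`, which can be reviewed before it is run.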
Verification
After applying the resolution, the following steps confirm successful recovery:
- Check Mount Status: Verify that the original mount is successfully disabled (or the new mount is active).
- The vault auth list output should not show the original mount path, or the new mount should be listed as healthy.
- Verify Application Functionality: Confirm that the applications or pipelines depending on the authentication method are now successfully authenticating and performing their functions.
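The mount status check above can be scripted. The sketch below parses the output of `vault auth list -format=json`, which is a JSON object keyed by mount path with a trailing slash. The helper names and the mount paths in the usage note are hypothetical, and the check only confirms that a mount is listed, not that applications can authenticate through it.

```python
import json


def mount_is_listed(auth_list_json, auth_path):
    """Return True if auth_path appears in `vault auth list -format=json` output."""
    mounts = json.loads(auth_list_json)
    key = auth_path.strip("/") + "/"  # keys in the listing end with a slash
    return key in mounts


def migration_looks_complete(auth_list_json, old_path, new_path):
    """Old mount must be absent and the new mount present."""
    return (not mount_is_listed(auth_list_json, old_path)
            and mount_is_listed(auth_list_json, new_path))
```

With the listing captured in a string, `migration_looks_complete(output, "approle", "approle-v2")` would return True only once the original mount has disappeared and the replacement is present.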
Conclusion
The "tainted" state of an authentication mount after a disable operation is typically an artifact of a client- or server-side timeout when handling a large volume of tokens and leases. While the server itself may remain healthy, the cleanup process is incomplete. The primary fix involves temporarily increasing client and server timeouts to allow the disable/revocation operation to finish. If the mount is unrecoverable, the recommended path is migration to a new mount, with careful configuration of supported limits to prevent future lease accumulation.
References
HCP Vault Dedicated Overview
- https://developer.hashicorp.com/hcp/docs/vault/what-is-hcp-vault
- https://developer.hashicorp.com/hcp/docs/vault/what-is-hcp-vault/security-overview
Vault Authentication Methods
- AppRole auth: Use AppRole authentication (Vault documentation, HashiCorp Developer)
- AppRole HCP tutorial: https://developer.hashicorp.com/vault/tutorials/get-started-hcp-vault-dedicated/vault-auth-method
- GitHub auth: GitHub auth method (Vault documentation, HashiCorp Developer)
Token and Lease Management
- Tune lease TTL: https://developer.hashicorp.com/vault/docs/troubleshoot/tune-lease-ttl
- Tokens concepts: https://developer.hashicorp.com/vault/docs/concepts/tokens
- Best Practices