Problem
The following log line is observed in the Vault Operational Logs:
[WARN] failed to unseal core: error="stored unseal keys are supported, but none were found"
As a result, the node is unable to unseal itself and join the Vault cluster.
Prerequisites
- This article assumes that you are using the Vault Auto-Unseal feature. Check the Vault configuration to confirm which unseal mechanism is in use.
- Running vault status reports that Vault is initialized and sealed.
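The second prerequisite can be checked with a small script. The following is a minimal sketch, assuming the default table output of vault status (rows named Initialized and Sealed followed by true or false). The function name is_initialized_and_sealed is hypothetical and not part of the Vault CLI.

```shell
# Reads `vault status` table output on stdin.
# Exits 0 only when Vault reports Initialized=true AND Sealed=true.
# Assumption: rows look like "Initialized    true" and "Sealed    true".
is_initialized_and_sealed() {
  awk '
    $1 == "Initialized" { init = $2 }
    $1 == "Sealed"      { sealed = $2 }
    END {
      if (init == "true" && sealed == "true") exit 0
      exit 1
    }
  '
}

# Example usage (requires a reachable Vault node):
#   vault status | is_initialized_and_sealed && echo "initialized and sealed"
```

The check reads the text output rather than the exit code of vault status, because a sealed node makes the command itself return a non-zero exit code.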
Cause
- Vault is unable to unseal because it cannot reach the Cloud KMS or HSM provider.
- Vault is unable to unseal because the wrong key is specified, or because the key has been changed or removed.
- Vault is unable to unseal due to Autopilot marking the node as unhealthy.
Solutions
- Check the Vault Operational Logs for the impacted node and see whether there are any errors in finalizing seals. Watch especially for [ERROR] entries that point to problems reaching the KMS/HSM configured for your Vault environment.
- If there are no seal-related [ERROR] entries in the Vault Operational Log, and vault operator raft list-peers confirms that the node has joined the cluster as a non-voter, the next step is to look at Vault Autopilot. Autopilot may have marked the node as unhealthy, which prevents it from joining the Vault cluster and unsealing. Several settings can contribute to a node being marked unhealthy:
  - max_trailing_logs is the most common cause. When the Vault cluster is used heavily, a new node takes a long time to sync before it is considered healthy. Raising this value from its default of 1000 may be necessary. Run vault operator raft autopilot state to check the Last Index, and set the value to be larger than that index. Note that changing this value could impact the Vault cluster and cause it to freeze, because of the large amount of data required to sync a new node. It is therefore advisable to make the change during off-peak hours to minimize the performance impact. This issue has been resolved in versions 1.12.9, 1.13.5, and 1.14.0, so upgrading the cluster is strongly advised. Refer to the article New Raft Nodes failed to Join Cluster for more details.
  - Sometimes the Vault server takes longer to become ready, and Autopilot marks it as unhealthy before it is ready. Review the last_contact_threshold and dead_server_last_contact_threshold settings and ensure that they allow enough time for the node to become ready.
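The first check above, scanning the operational log for seal-related errors, can be sketched as a small filter. This is only an illustration: the function name seal_errors and the keyword list (seal, kms, hsm) are assumptions, and the way the logs are obtained depends on how Vault is run in your environment.

```shell
# Reads Vault operational log lines on stdin and prints only the
# [ERROR] entries that mention seals or the KMS/HSM provider.
# Assumption: the keywords below are a reasonable starting filter,
# not an exhaustive list of seal-related messages.
seal_errors() {
  grep -F '[ERROR]' | grep -Ei 'seal|kms|hsm'
}

# Example usage, assuming Vault runs as a systemd unit named "vault":
#   journalctl -u vault --no-pager | seal_errors
```

If the filter prints nothing, that suggests no seal-related errors were logged, and the Autopilot checks become the next step.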
Additional Information
- A few useful commands to check the autopilot state and configuration:
  - vault operator raft autopilot state lists whether each node is marked as healthy or unhealthy by the autopilot.
  - vault operator raft autopilot get-config lists the current configuration settings of the autopilot.
- The tutorial for autopilot is a useful introduction to the features of autopilot.
- The tutorial for auto-unseal is a useful guide to setting up auto-unseal with different KMS providers.
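To make the autopilot state output easier to act on, the unhealthy nodes and their Last Index can be pulled out with a short filter. This is a rough sketch that assumes the plain-text layout of vault operator raft autopilot state, where each server block contains Name:, Healthy:, and Last Index: rows. The function name unhealthy_nodes is hypothetical.

```shell
# Reads `vault operator raft autopilot state` text output on stdin and
# prints "<name> <last index>" for every server marked Healthy: false.
# Assumption: each server block has "Name:", "Healthy:" and "Last Index:" rows.
unhealthy_nodes() {
  awk '
    function report() {
      if (name != "" && healthy == "false") print name, idx
    }
    # A new "Name:" row starts the next server block.
    $1 == "Name:" { report(); name = $2; healthy = ""; idx = "" }
    # Ignore the cluster-wide "Healthy:" row that appears before any server.
    name != "" && $1 == "Healthy:" { healthy = $2 }
    name != "" && $1 == "Last" && $2 == "Index:" { idx = $3 }
    END { report() }
  '
}

# Example usage (requires a reachable, unsealed cluster member):
#   vault operator raft autopilot state | unhealthy_nodes
#
# If a higher limit is needed, the value could then be raised with, for example:
#   vault operator raft autopilot set-config -max-trailing-logs=5000
# (5000 is an illustrative value only; choose it based on the Last Index reported.)
```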