Introduction
Problem
Joining a new Vault node to an existing single-node Vault cluster that is already initialized and active fails with the following error message in Vault's operational logs:
... [ERROR] core: failed to retry join raft cluster: retry=2s
... err=
... | failed to send answer to raft leader node: error decrypting challenge: error decrypting seal wrapped val>
... | error decrypting using seal pkcs11: 2 errors occurred:
... | \t* pkcs11: 0xC0: CKR_SIGNATURE_INVALID
... | \t* error initializing fallback verify hmac operation: pkcs11: 0x60: CKR_KEY_HANDLE_INVALID
# ... = truncated date & journalctl process prefix
Prerequisites (if applicable)
- Vault Enterprise Edition
- A third-party HSM solution used for auto-unseal
Cause
In this particular case, the Vault configuration files for the initialized and active Vault node and for the joining Vault node contained identical parameters within the pkcs11 stanza used for the HSM configuration. The generate_key parameter was set to true on both nodes.
However, the configuration file of the third-party pkcs11 library, which Vault uses to communicate with the third-party HSM, pointed to a different IP address on the active node than on the joining node, and no replication was configured between the two separate HSM instances.
As a result, two HSM keys were generated with the same key_label and hmac_key_label, one on each HSM instance. The two Vault nodes were therefore using different HSM keys, so the joining node could not decrypt the challenge sent by the raft leader and the join failed.
Note that the CKR_SIGNATURE_INVALID and CKR_KEY_HANDLE_INVALID error messages were returned by the third-party pkcs11 library, not by Vault itself.
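For illustration, a seal stanza of the kind described above might look roughly like the following sketch. The parameter names are those of Vault's pkcs11 seal. Every value (library path, slot, PIN, and both labels) is a hypothetical placeholder and is not taken from the affected environment.

```hcl
# Illustrative sketch only: assumed to be identical on the active node and the joining node.
# All values below are hypothetical placeholders.
seal "pkcs11" {
  lib            = "/opt/vendor/lib/libvendor-pkcs11.so" # hypothetical path to the third-party pkcs11 library
  slot           = "0"                                   # placeholder slot number
  pin            = "placeholder-pin"                     # placeholder; do not store a real PIN like this
  key_label      = "vault-hsm-key"                       # placeholder label, same on both nodes
  hmac_key_label = "vault-hsm-hmac-key"                  # placeholder label, same on both nodes
  generate_key   = "true"                                # Vault generates a key if none with key_label is found
}
```

Because which HSM instance is contacted is decided by the pkcs11 library's own configuration file and not by this stanza, two nodes with an identical stanza can still end up on different HSM instances. With generate_key set to true, each node then finds or creates a key under the same label on its own HSM.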
Overview of possible solutions (if applicable)
Solutions:
- Make sure that the configuration file used by the third-party pkcs11 library on each Vault node is configured to use the same HSM instance.
- Make sure that HSM replication is properly configured, if the HSM infrastructure allows this kind of configuration.
Outcome
The expected outcome is that new Vault nodes are able to join the existing Vault cluster.