DISCLAIMER: This article should be used only when the quorum in the Raft cluster is lost PERMANENTLY, and there is no way to recover enough nodes to reach quorum (elect a leader).
Introduction :
The Integrated Storage option (also known as Raft) was introduced in Vault 1.4, and more and more practitioners are adopting it as their main storage backend for Vault.
Until now, many practitioners have used Consul as their storage backend for Vault, so maintaining quorum within the cluster was a Consul task. With Integrated Storage, maintaining quorum must now be considered part of your Vault environment.
The cluster quorum is updated dynamically as more nodes join the cluster. The quorum needed for a cluster to be able to perform read and write operations can be calculated using the formula (n+1)/2, rounded up, where n is the total number of nodes in the cluster. For example, if the number of all nodes in the cluster is 3, then (3+1)/2 = 2, so 2 nodes need to be operational in order for the cluster to function properly.
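For reference, applying the same arithmetic to common cluster sizes gives the following quorum and failure-tolerance values:

    Cluster size (n)    Quorum    Failure tolerance
    1                   1         0
    3                   2         1
    5                   3         2
    7                   4         3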
Note: There is an exception to this rule if the -non-voter option is used while joining the cluster; this option is only available in Vault Enterprise.
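For reference, joining a node as a non-voter looks like this (a sketch; the leader API address below is illustrative):

    vault operator raft join -non-voter https://active-node.example.com:8200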
Use Case :
The typical use case is a three-node Vault Raft cluster in which two of the nodes are permanently down and there is no way to recover them, but the third node is in perfectly working condition. One functioning node is not enough to reach the required quorum.
When quorum is not reached (there is no leader in the cluster), no operations such as reads and writes can be performed within the cluster; at least two nodes in this three-node cluster need to be functioning in order to reach quorum.
The names and statuses of the cluster nodes are listed below :
vault1 - up and healthy
vault2 - down, not recoverable
vault3 - down, not recoverable
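Before starting the procedure, you can confirm the loss of quorum from the healthy node. A minimal sketch of the checks (the seal status is answered locally, while commands that require an active node fail):

    # The seal status endpoint is served locally, so this still responds:
    vault status

    # Commands that need a leader, such as listing the Raft peers, will error out:
    vault operator raft list-peers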
Procedure :
- Log in to the healthy node, in this case vault1.
- Locate the Raft storage directory; it is set inside the configuration file (.hcl) that Vault is using. The stanza looks like this:

    storage "raft" {
      path    = "/path/to/raft/data"
      node_id = "raft_node_id"
    }
  Let’s assume it is /vault/data.
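  If you are not sure which configuration file the running Vault process uses, one quick way to find the stanza is to search for it; a minimal sketch, assuming the config lives at /etc/vault.d/vault.hcl (adjust the path to your environment):

    # Print the raft storage stanza plus the lines that follow it.
    # /etc/vault.d/vault.hcl is an assumed location, not a requirement.
    grep -A 3 'storage "raft"' /etc/vault.d/vault.hcl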
- Inside the storage directory, you should see a folder named raft.
- Within the raft directory, create a file named peers.json, so in this example, the file would be located at /vault/data/raft/peers.json.
- Edit the file with the following content :

    [
      {
        "id": "vault1",
        "address": "192.168.0.1:8201",
        "non_voter": false
      }
    ]

  Replace vault1 with the id of your node, and 192.168.0.1:8201 with the cluster address of the node (usually the one that uses port 8201). Note that the address is given as host:port, without a scheme, matching the recovered configuration shown in the logs below.
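  As a sketch, using the example path and values above, the file can be written like this (make sure the user running Vault can read, and later delete, the file):

    # Write peers.json with the surviving node's id and cluster address.
    # The path and values match this article's example; substitute your own.
    sudo tee /vault/data/raft/peers.json > /dev/null <<'EOF'
    [
      {
        "id": "vault1",
        "address": "192.168.0.1:8201",
        "non_voter": false
      }
    ]
    EOF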
- Stop and start Vault. If you are using systemd, you can execute systemctl restart vault. Sending the SIGHUP signal to the Vault process will not work.
- Unseal Vault and check the status. This step depends on the seal type that is being used and can be skipped if one of the auto-unseal methods is being used.
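  A minimal sketch of these two steps, assuming systemd and a Shamir seal:

    # Restart the Vault service so it reads peers.json on startup.
    systemctl restart vault

    # With a Shamir seal, repeat until the unseal threshold is met;
    # the command prompts for one key share at a time.
    vault operator unseal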
- If the procedure worked so far, you should see messages like these in the system logs while starting Vault :

    2020-06-18T09:55:05.012Z [TRACE] storage.raft: setting up raft cluster
    2020-06-18T09:55:05.014Z [INFO]  storage.raft: raft recovery initiated: recovery_file=peers.json
    2020-06-18T09:55:05.019Z [INFO]  storage.raft: raft recovery found new config: config="{[{Voter vault1 192.168.0.1:8201}]}"
    2020-06-18T09:55:05.024Z [INFO]  storage.raft: raft recovery deleted peers.json
- Now you should have a cluster consisting of one node (vault1); it is also the active node, quorum is reached, and reads and writes to the storage are allowed. You can verify that there is only one node in the cluster with vault operator raft list-peers.
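  The output should look similar to this (a sketch; values match this article's example):

    $ vault operator raft list-peers
    Node      Address              State     Voter
    ----      -------              -----     -----
    vault1    192.168.0.1:8201     leader    true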