DISCLAIMER: This article should be used only when the quorum in the Raft cluster is lost PERMANENTLY, and there is no way to recover enough nodes to reach quorum (elect a leader).
Introduction :
The Integrated Storage option (also known as Raft) was introduced in Vault 1.4, and more and more practitioners are adopting it as their main storage backend for Vault.
Until now, many practitioners have used Consul as their storage backend for Vault. Because of this, maintaining quorum within the cluster was a Consul task. With Integrated Storage within Vault, maintaining quorum must now be considered as part of your Vault environment.
The cluster quorum is updated dynamically as more nodes are joined to the cluster. The quorum needed for a cluster to be able to perform read and write operations can be calculated using the formula (n+1)/2, rounded up, where n is the total number of nodes in the cluster. For example, if the cluster has 3 nodes, then (3+1)/2 = 2, so 2 nodes need to be operational in order for the cluster to function properly.
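Applying the same formula to other common cluster sizes (assuming all nodes are voters): a 5-node cluster needs (5+1)/2 = 3 operational nodes, and a 7-node cluster needs (7+1)/2 = 4.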
Note: There is an exception to this rule if the `-non-voter` option is used when joining the cluster; this option is only available in Vault Enterprise. Voter status can be checked by referencing the `Voter` column in the output of `vault operator raft list-peers`.
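For illustration, the peer listing of a healthy three-node cluster might look like the following (node names and addresses are examples):

```
$ vault operator raft list-peers
Node      Address             State       Voter
----      -------             -----       -----
vault1    192.168.0.1:8201    leader      true
vault2    192.168.0.2:8201    follower    true
vault3    192.168.0.3:8201    follower    true
```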
Use Case :
The typical use case is a three-node Vault Raft cluster in which two of the nodes are permanently lost with no method of recovery, while the third node is in perfect working condition. One functioning node is not enough to reach the required quorum.
When a quorum is not reached (no leader in the cluster), no operations like reads and writes can be performed within the cluster, as at least two nodes in this three-node cluster need to be functioning in order to reach a quorum.
The names and statuses of the nodes in the cluster are reflected below :
vault1 - up and healthy
vault2 - down, not recoverable
vault3 - down, not recoverable
Procedure :
- Login to the healthy node, in this case `vault1`.
- Stop the Vault service on `vault1`. If you are using systemd you can execute `systemctl stop vault`. At this point the Vault service should be stopped on all nodes within the cluster, even if you have other healthy nodes.
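  A minimal check, assuming a systemd unit named `vault` (adjust to your environment):

  ```
  # Stop Vault on this node and confirm the unit is no longer active
  $ sudo systemctl stop vault
  $ systemctl is-active vault
  inactive
  ```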
- Locate the Raft storage directory. It is defined in the configuration file (typically `vault.hcl`) that Vault is using. The stanza looks like this:

  ```
  storage "raft" {
    path    = "/opt/vault/data"
    node_id = "vault1"
  }
  ```

  We will follow the example above of `/opt/vault/data`.
- Inside the data directory, you should see a folder named `raft`:

  ```
  $ tree /opt/vault/data
  /opt/vault/data
  ├── raft
  │   ├── raft.db
  │   └── snapshots
  └── vault.db

  2 directories, 2 files
  ```

- Within the `raft` directory create a file named `peers.json`. In this example, the full file path would be `/opt/vault/data/raft/peers.json`:

  ```
  $ tree /opt/vault/data
  /opt/vault/data
  ├── raft
  │   ├── raft.db
  │   ├── peers.json
  │   └── snapshots
  └── vault.db

  2 directories, 3 files
  ```

- Edit the file with the following content:
[ { "id": "vault1", "address": "192.168.0.1:8201", "non_voter": false } ]
  Update the `id` value so it matches the value of the `node_id` parameter specified in the Vault configuration file, and set the `address` value to match the `cluster_addr` parameter in the Vault configuration file. This value must be either an IP address or an FQDN, appended with the cluster port (by default `8201`). The value entered here must be reachable from all other Vault nodes.

  Note that the `node_id` may also be specified in an automatically generated file called `node_id` under the raft directory.
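  As a quick sanity check, assuming both parameters are set in `vault.hcl` rather than via environment variables (adjust the file path to your environment), you can pull the values to mirror in `peers.json` straight from the configuration file:

  ```
  $ grep -E 'node_id|cluster_addr' vault.hcl
    node_id = "vault1"
  cluster_addr = "https://192.168.0.1:8201"
  ```

  Note that `cluster_addr` carries an `https://` scheme, while the `address` field in `peers.json` takes only the host and cluster port.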
- As the recovery process will read and then delete the `peers.json` file, we suggest making a copy of the `peers.json` file and saving it elsewhere on the filesystem, should another attempt at recovery be necessary.
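  For example, assuming the data path used above:

  ```
  $ cp /opt/vault/data/raft/peers.json /opt/vault/peers.json.bak
  ```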
- Start the Vault service. If you are using systemd you can execute `systemctl start vault`. Sending the SIGHUP signal to the Vault process will not work.
- Unseal Vault. If an auto-unseal method is being used, this step is not necessary. Confirm that Vault is unsealed by running `vault status` and checking that the value for `Sealed` is `false`.
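  For a Shamir-sealed cluster, a minimal sketch of this step might look like the following:

  ```
  $ vault operator unseal     # repeat with additional key shares until the unseal threshold is met
  $ vault status | grep Sealed
  Sealed          false
  ```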
- If the procedure worked so far, you should see messages like the following in the system logs while starting Vault:

  ```
  2020-06-18T09:55:05.014Z [INFO] storage.raft: raft recovery initiated: recovery_file=peers.json
  2020-06-18T09:55:05.019Z [INFO] storage.raft: raft recovery found new config: config="{[{Voter vault1 192.168.0.1:8201}]}"
  2020-06-18T09:55:05.024Z [INFO] storage.raft: raft recovery deleted peers.json
  ```

  Running `journalctl -u vault --no-pager` on the server is a common method of obtaining these logs. The `peers.json` file should no longer be present at this point, as Vault has consumed its content.

- Check the output of `vault status` to confirm the value for `HA Mode` is `active`.
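  The output will vary with your Vault version and seal type; the fields to check are `Sealed` and `HA Mode` (values below are illustrative):

  ```
  $ vault status
  Key             Value
  ---             -----
  Seal Type       shamir
  Initialized     true
  Sealed          false
  ...
  HA Enabled      true
  HA Mode         active
  ```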
- Now you should have a cluster consisting of one node (`vault1`) that is the active/leader node; a quorum of one is reached, and reads and writes to the storage are allowed. You can verify that there is only one node in the cluster with `vault operator raft list-peers`.
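  In this recovered state the output should show only the one voter (the address shown is the example from above):

  ```
  $ vault operator raft list-peers
  Node      Address             State     Voter
  ----      -------             -----     -----
  vault1    192.168.0.1:8201    leader    true
  ```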
- Once additional nodes are ready to be joined to the Vault cluster, they can be joined using either their existing `retry_join` configuration or by using the `vault operator raft join` command (a sketch of both options is shown below).
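A minimal sketch of both options, assuming `vault1` is reachable at the API address `https://192.168.0.1:8200` and that the rebuilt node's storage path and TLS settings match your environment:

```
# Option 1: declaratively, in the rebuilt node's vault.hcl (node name is an example)
storage "raft" {
  path    = "/opt/vault/data"
  node_id = "vault2"

  retry_join {
    leader_api_addr = "https://192.168.0.1:8200"
  }
}
```

```
# Option 2: manually, from the rebuilt node once its Vault service is running
$ vault operator raft join https://192.168.0.1:8200
```

After joining, run `vault operator raft list-peers` on the leader again to confirm the new node appears in the peer list.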