Introduction
Problem
Performing a Vault Quorum recovery using a peers.json file fails with the following message:
[WARN] storage.raft: not part of stable configuration, aborting election
This is the relevant section of the Vault Operational Log:
[INFO] storage.raft: raft recovery initiated: recovery_file=peers.json
[INFO] storage.raft: raft recovery found new config: config="{[{Voter vault_4 192.168.56.9:8201}]}"
[INFO] storage.raft: snapshot restore progress: id=bolt-snapshot last-index=328 last-term=5 size-in-bytes=0 read-bytes=0 percent-complete="NaN%"
[INFO] storage.raft: raft recovery deleted peers.json
[INFO] storage.raft: creating Raft: config="&raft.Config{ProtocolVersion:3, HeartbeatTimeout:15000000000, ElectionTimeout:15000000000, CommitTimeout:50000000, MaxAppendEntries:64, BatchApplyCh:true, ShutdownOnRemove:true, TrailingLogs:0x2800, SnapshotInterval:120000000000, SnapshotThreshold:0x2000, LeaderLeaseTimeout:2500000000, LocalID:\"vault4\", NotifyCh:(chan<- bool)(0xc000d1fab0), LogOutput:io.Writer(nil), LogLevel:\"DEBUG\", Logger:(*hclog.interceptLogger)(0xc0011810e0), NoSnapshotRestoreOnStart:true, skipStartup:false}"
[INFO] storage.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:vault_4 Address:192.168.56.9:8201}]"
[INFO] storage.raft: entering follower state: follower="Node at 192.168.56.9:8201 [Follower]" leader-address= leader-id=
[WARN] storage.raft: not part of stable configuration, aborting election
[TRACE] storage.raft: reloaded raft config to set lower timeouts: config="raft.ReloadableConfig{TrailingLogs:0x2800, SnapshotInterval:120000000000, SnapshotThreshold:0x2000, HeartbeatTimeout:5000000000, ElectionTimeout:5000000000}"
[TRACE] storage.raft: finished setting up raft cluster
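The mismatch is visible in the log itself. The "creating Raft" line reports LocalID:\"vault4\", but the "initial configuration" line lists only ID:vault_4. The Python sketch below pulls both values out of log lines like these and flags the mismatch. The regular expressions are assumptions based only on the excerpt shown here, and other Vault versions may format these lines differently. The function name find_node_id_mismatch is hypothetical.

```python
import re
from typing import List, Optional

# Assumption: patterns derived only from the log excerpt in this article.
# LocalID is printed with escaped quotes, e.g. LocalID:\"vault4\"
LOCAL_ID_RE = re.compile(r'LocalID:\\?"([^"\\]+)')
# Server entries look like {Suffrage:Voter ID:vault_4 Address:...};
# \b keeps "LocalID:" from matching.
SERVER_ID_RE = re.compile(r"\bID:(\S+)\s+Address:")


def find_node_id_mismatch(log_lines: List[str]) -> Optional[str]:
    """Return a description of the mismatch, or None if none was found."""
    local_id: Optional[str] = None
    server_ids: List[str] = []
    for line in log_lines:
        if "creating Raft" in line:
            match = LOCAL_ID_RE.search(line)
            if match:
                local_id = match.group(1)
        elif "initial configuration" in line:
            server_ids = SERVER_ID_RE.findall(line)
    if local_id is None or not server_ids:
        return None  # not enough information in the supplied lines
    if local_id in server_ids:
        return None  # this node is part of the recovered configuration
    return f"LocalID {local_id!r} is not in the configured servers {server_ids}"
```

It can be fed the lines of a saved log file, for example `find_node_id_mismatch(open("vault.log").read().splitlines())`, where vault.log is a placeholder name.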
Prerequisites (if applicable)
- Vault Enterprise
- Vault is configured to use the Integrated Storage (Raft) backend
Cause
- This issue occurs if the (node) id specified in the peers.json file used for the quorum recovery does not match the node_id specified in the Vault configuration file. The log above shows this case: Raft starts with LocalID "vault4", but the recovered configuration contains only the server vault_4, so the node does not find itself in the configuration and aborts the election.
- This issue can also occur if the node_id specified in the Vault configuration file does not match the node_id stored in the local Integrated Storage (Raft) backend used by the Vault instance.
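Both causes come down to three values that must agree: the id entries in peers.json, the node_id from the Vault configuration, and the node_id stored by Raft. The Python sketch below reports which comparison fails. It assumes the caller has already read the three values. The function name check_node_ids is hypothetical, and the peers.json layout (a JSON array of objects with an "id" key) follows the format used for Vault quorum recovery.

```python
import json
from typing import List, Optional


def check_node_ids(peers_json: str, config_node_id: Optional[str],
                   stored_node_id: Optional[str]) -> List[str]:
    """Return human-readable problems; an empty list means the ids agree."""
    problems: List[str] = []
    peer_ids = [peer.get("id") for peer in json.loads(peers_json)]

    # Cause 1: this node's node_id must appear as an id in peers.json.
    if config_node_id is not None and config_node_id not in peer_ids:
        problems.append(
            f"node_id {config_node_id!r} (configuration) is not listed "
            f"in peers.json ids {peer_ids}")

    # Cause 2: the configured node_id must match the one stored by Raft.
    if (config_node_id is not None and stored_node_id is not None
            and config_node_id != stored_node_id):
        problems.append(
            f"node_id {config_node_id!r} (configuration) differs from "
            f"node_id {stored_node_id!r} stored by Raft")
    return problems
```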
Overview of possible solutions (if applicable)
Solutions:
- The (node) id specified in the peers.json file must match the node_id specified in the Vault configuration file.
- The node_id specified in the Vault configuration file must match the node_id stored in Raft.
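Instead of typing peers.json by hand, you can generate it from each node's actual node_id, which makes the first solution harder to get wrong. The Python sketch below uses the peers.json layout for Vault Integrated Storage recovery: a JSON array of objects with id, address and non_voter keys. The function name build_peers_json and its input keys (node_id, cluster_address) are hypothetical. The example values come from the log in this article, where the node actually runs with LocalID vault4.

```python
import json
from typing import Dict, List


def build_peers_json(nodes: List[Dict[str, str]]) -> str:
    """Render a peers.json document from each node's real Raft node_id.

    Each entry in `nodes` (hypothetical input shape) provides the node's
    actual node_id and its cluster address as host:port.
    """
    seen = set()
    peers = []
    for node in nodes:
        node_id = node["node_id"]
        if not node_id or node_id in seen:
            raise ValueError(f"missing or duplicate node_id: {node_id!r}")
        seen.add(node_id)
        peers.append({
            # Must equal the node_id Vault actually runs with; otherwise
            # the node is not part of the stable configuration.
            "id": node_id,
            "address": node["cluster_address"],
            # This sketch writes every recovered peer as a voter.
            "non_voter": False,
        })
    return json.dumps(peers, indent=2)
```

For the single node in this article, `build_peers_json([{"node_id": "vault4", "cluster_address": "192.168.56.9:8201"}])` yields one voter entry whose id matches the LocalID reported in the log.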
In most cases, the node_id can be obtained from the Vault configuration file. The VAULT_RAFT_NODE_ID environment variable overrides node_id. When neither VAULT_RAFT_NODE_ID nor node_id in the Vault configuration file is set, Vault assigns a random GUID during initialization and writes the value to data/node-id in the directory specified by the path parameter.
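The precedence described above can be written as a small resolver. This Python sketch models the documented behavior and is not Vault's own code. The function name effective_node_id is hypothetical. The caller supplies the location of the node-id file, because it depends on the path parameter.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

ENV_VAR = "VAULT_RAFT_NODE_ID"


def effective_node_id(config_node_id: Optional[str], node_id_file: Path,
                      env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the node_id Vault is expected to use, or None if unknown."""
    env = os.environ if env is None else env
    # 1. VAULT_RAFT_NODE_ID overrides node_id from the configuration file.
    override = env.get(ENV_VAR)
    if override:
        return override
    # 2. Otherwise node_id from the configuration file applies, if set.
    if config_node_id:
        return config_node_id
    # 3. Otherwise Vault uses the GUID it generated at initialization and
    #    persisted to the node-id file.
    if node_id_file.is_file():
        return node_id_file.read_text().strip()
    # Nothing set and no file yet: Vault would generate a new GUID.
    return None
```

The value returned is the one the id in peers.json has to match.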
Outcome
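Once the (node) id in peers.json has been corrected, the Vault Quorum recovery using a peers.json file should succeed, and the recovered Vault node should become the active node. Vault deletes peers.json after reading it (see the "raft recovery deleted peers.json" entry in the log), so the corrected file has to be created again before Vault is restarted.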
Additional Information
- Vault Tutorial: Vault cluster lost quorum recovery