Vault Raft quorum recovery fails with error: "not part of stable configuration, aborting election" – HashiCorp Help Center

Introduction

Problem

A Vault quorum recovery using a peers.json file fails with the following message:

[WARN] storage.raft: not part of stable configuration, aborting election

This is the relevant section of the Vault Operational Log:

[INFO] storage.raft: raft recovery initiated: recovery_file=peers.json
[INFO] storage.raft: raft recovery found new config: config="{[{Voter vault_4 192.168.56.9:8201}]}"
[INFO] storage.raft: snapshot restore progress: id=bolt-snapshot last-index=328 last-term=5 size-in-bytes=0 read-bytes=0 percent-complete="NaN%"
[INFO] storage.raft: raft recovery deleted peers.json
[INFO] storage.raft: creating Raft: config="&raft.Config{ProtocolVersion:3, HeartbeatTimeout:15000000000, ElectionTimeout:15000000000, CommitTimeout:50000000, MaxAppendEntries:64, BatchApplyCh:true, ShutdownOnRemove:true, TrailingLogs:0x2800, SnapshotInterval:120000000000, SnapshotThreshold:0x2000, LeaderLeaseTimeout:2500000000, LocalID:\"vault4\", NotifyCh:(chan<- bool)(0xc000d1fab0), LogOutput:io.Writer(nil), LogLevel:\"DEBUG\", Logger:(*hclog.interceptLogger)(0xc0011810e0), NoSnapshotRestoreOnStart:true, skipStartup:false}"
[INFO] storage.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:vault_4 Address:192.168.56.9:8201}]"
[INFO] storage.raft: entering follower state: follower="Node at 192.168.56.9:8201 [Follower]" leader-address= leader-id=
[WARN] storage.raft: not part of stable configuration, aborting election
[TRACE] storage.raft: reloaded raft config to set lower timeouts: config="raft.ReloadableConfig{TrailingLogs:0x2800, SnapshotInterval:120000000000, SnapshotThreshold:0x2000, HeartbeatTimeout:5000000000, ElectionTimeout:5000000000}"
[TRACE] storage.raft: finished setting up raft cluster

Prerequisites

Vault Enterprise
Vault is configured to use the Integrated Storage (Raft) backend

Cause

This issue occurs when the node ID specified in the peers.json file used for the quorum recovery does not match the node_id specified in the Vault configuration file. In the log above, for example, the recovery configuration lists the voter ID vault_4, while the Raft configuration reports LocalID vault4.
This issue can also occur when the node_id specified in the Vault configuration file does not match the node_id stored in the local Integrated Storage (Raft) backend used by the Vault instance.
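
The comparison described above can be checked directly against the operational log. The following Python sketch is only an illustration: the function name find_recovery_ids and the regular expressions are assumptions based on the two log lines shown in this article ("raft recovery found new config" and "creating Raft"), and other Vault versions may format these lines differently.

```python
import re

# Patterns derived from the log lines quoted in this article (assumption:
# the format may differ between Vault versions).
RECOVERY_RE = re.compile(r'raft recovery found new config: config="\{\[(.*)\]\}"')
PEER_RE = re.compile(r'\{(\w+) (\S+) (\S+)\}')          # {Suffrage ID Address}
LOCAL_ID_RE = re.compile(r'LocalID:\\?"([^"\\]+)\\?"')  # quotes may be escaped


def find_recovery_ids(log_text):
    """Return (peer_ids, local_id) parsed from a Vault operational log.

    peer_ids are the IDs read from peers.json during recovery;
    local_id is the node ID the Raft library was started with.
    """
    peer_ids, local_id = [], None
    for line in log_text.splitlines():
        match = RECOVERY_RE.search(line)
        if match:
            peer_ids = [p.group(2) for p in PEER_RE.finditer(match.group(1))]
        if "creating Raft" in line:
            match = LOCAL_ID_RE.search(line)
            if match:
                local_id = match.group(1)
    return peer_ids, local_id


def local_id_is_listed(log_text):
    """True if the node's own ID appears in the recovered configuration."""
    peer_ids, local_id = find_recovery_ids(log_text)
    return local_id is not None and local_id in peer_ids
```

For the log excerpt in this article, find_recovery_ids returns ['vault_4'] and 'vault4', so local_id_is_listed returns False, which corresponds to the "not part of stable configuration" warning.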

Overview of possible solutions

Solutions:

The node ID specified in the peers.json file must match the node_id specified in the Vault configuration file.
The node_id specified in the Vault configuration file must match the node_id stored in Raft.

In most cases, the node_id can be obtained from the Vault configuration file. You can override node_id with the VAULT_RAFT_NODE_ID environment variable. When neither VAULT_RAFT_NODE_ID nor node_id in the Vault configuration file is set, Vault assigns a random GUID during initialization and writes the value to data/node-id in the directory specified by the path parameter.
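
Before retrying the recovery, it can help to confirm that the peers.json file actually contains the node's ID. The following Python sketch is a minimal illustration under stated assumptions: the helper names (resolve_node_id, peers_json_ids, node_in_peers) are hypothetical, the caller supplies the node_id value from the configuration file and the location of the persisted node-id file, and peers.json is assumed to be a JSON array of objects that each carry an "id" field.

```python
import json
import os
from pathlib import Path


def resolve_node_id(config_node_id=None, node_id_file=None, env=None):
    """Return the node ID Vault is expected to use, or None if unknown.

    Precedence as described in this article: VAULT_RAFT_NODE_ID overrides
    node_id from the configuration file; the persisted node-id file is
    used only when neither is set. node_id_file is supplied by the caller.
    """
    env = os.environ if env is None else env
    if env.get("VAULT_RAFT_NODE_ID"):
        return env["VAULT_RAFT_NODE_ID"]
    if config_node_id:
        return config_node_id
    if node_id_file and Path(node_id_file).is_file():
        return Path(node_id_file).read_text().strip() or None
    return None


def peers_json_ids(peers_json_text):
    """Return the "id" values from peers.json content (assumed JSON array)."""
    return [entry["id"] for entry in json.loads(peers_json_text)]


def node_in_peers(peers_json_text, node_id):
    """True if node_id is listed in peers.json; IDs must match exactly."""
    return node_id in peers_json_ids(peers_json_text)
```

With a peers.json entry whose id is vault_4 and a configured node_id of vault4, node_in_peers returns False; the recovery would be expected to fail until one of the two values is corrected.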

Outcome

Once the node ID specified in the peers.json file has been corrected, the Vault quorum recovery should succeed and the recovered Vault node should become the active node.

Additional Information

Vault Tutorial: Vault cluster lost quorum recovery