The information contained in this article has been verified as up‑to‑date on the date of the original publication of the article. HashiCorp endeavors to keep this information up‑to‑date and correct, but it makes no representations or warranties of any kind, express or implied, about the ongoing completeness, accuracy, reliability, or suitability of the information provided.
All information contained in this article is for general information purposes only. Any reliance you place on such information as it applies to your use of your HashiCorp product is therefore strictly at your own risk.
Introduction
This article addresses a specific error that occurs when attempting to rejoin a Vault Enterprise Integrated Storage (Raft) node to a cluster. The error indicates the node was previously removed from Raft membership and its local Raft state is no longer valid. This prevents the node from joining until its local data is cleaned up. This article explains the cause and provides the remediation steps.
Problem
When running:
vault operator raft join <leader-address>
Vault returns the following error:
Error joining the node to the Raft cluster: Error making API request. URL: POST http://127.0.0.1:8200/v1/sys/storage/raft/join Code: 500. Errors: * node has been removed from the HA cluster. All vault data for this node must be cleaned up before it can be added back
As a result:
- The node fails to join Raft
- The node remains sealed
- The node does not appear in vault operator raft list-peers
- Logs contain Raft membership failures
Prerequisites
This troubleshooting information applies to:
- Vault Enterprise
- Integrated Storage (Raft) deployments
- Clusters using Shamir or auto-unseal
- Environments where operators have:
- OS‑level access to the affected node
- Access to Vault logs
- Ability to restart Vault
- Ability to inspect /vault/data or /opt/vault/data
Cause
This error occurs when the node was previously removed from the Raft cluster. Vault will refuse to allow a removed node to rejoin if it still has stale or mismatched Raft metadata.
The most common causes are:
1. Manual Peer Removal
Example:
vault operator raft remove-peer vault1
2. Autopilot Pruning
This can be triggered by, but is not limited to:
- Consecutive health check failures
- Long downtime
- Network isolation
- Unresponsive Raft engine
3. Local Raft State Corruption
Including vault.db or raft.db corruption.
4. Node Restart With Outdated Raft Metadata
When any of these occur, the node’s local Raft state:
- No longer matches the leader
- Cannot participate in Raft consensus
- Cannot be reused for joining
Thus Vault requires complete removal of the node’s local Raft data.
Solution
Remove the node's outdated local Raft state so Vault can rebuild it from healthy peers.
Before applying the solution, operators may optionally:
- Review the logs
- Confirm cluster health
- Reproduce the issue for testing
Step 1: Stop the Vault Service
Note: Adjust commands for your platform (systemd vs Kubernetes, etc.)
systemctl stop vault
Step 2: Back Up Existing Raft Data
This provides an additional safeguard in case restoration is needed after clearing and rebuilding Raft state.
Please note that this step is optional and may not be required depending on your use case. It creates a temporary backup, and once the process is complete and verified, you can safely remove the backup files generated during this step. Please also be mindful of disk space availability, as the size of these backup files may introduce storage constraints on the node.
Ensure Vault is still stopped on the node (see Step 1):
sudo systemctl status vault
Create file-level backups of vault.db and raft.db
cd /opt/vault/data
cp vault.db vault.db.bak
cp raft/raft.db raft.db.bak
Why this step matters
- These .db files will be removed in the next step to clear outdated Raft state.
- Backups provide a rollback option if the node cannot rejoin the cluster.
- The example uses cp (copy), but administrators may choose alternative methods, such as mv (move), according to internal practices.
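The backup step above can be sketched as a small script. The data directory path, the .bak naming, and the timestamp suffix are assumptions; for illustration the script stages dummy files in a temporary directory when DATA_DIR is unset, so it is safe to run anywhere. On a real node, export DATA_DIR to the actual Raft storage path (with Vault stopped) and the staging lines are no-ops.

```shell
#!/usr/bin/env bash
# Sketch: timestamped file-level backup of the Raft state files.
# Assumes the default layout /opt/vault/data with raft/raft.db beneath it.
set -euo pipefail

DATA_DIR="${DATA_DIR:-$(mktemp -d)}"   # real nodes: export DATA_DIR=/opt/vault/data
mkdir -p "$DATA_DIR/raft"

# Demo staging only: create stand-in files if they do not already exist.
[ -f "$DATA_DIR/vault.db" ]     || echo demo > "$DATA_DIR/vault.db"
[ -f "$DATA_DIR/raft/raft.db" ] || echo demo > "$DATA_DIR/raft/raft.db"

# Timestamped copies make repeated runs non-destructive.
STAMP="$(date +%Y%m%d-%H%M%S)"
cp "$DATA_DIR/vault.db"     "$DATA_DIR/vault.db.bak-$STAMP"
cp "$DATA_DIR/raft/raft.db" "$DATA_DIR/raft.db.bak-$STAMP"

ls -lh "$DATA_DIR"/*.bak-"$STAMP"
```

A timestamped suffix is optional; the plain .bak names used in the article work equally well for a one-off backup.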
Step 3: Remove the Outdated Raft Files
With Vault stopped and backups created, remove the outdated Raft metadata:
cd /opt/vault/data
rm -f vault.db
rm -f raft/raft.db
This clears old Raft state so the node can rebuild from the leader upon restart.
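Operators who scripted Step 2 may want a matching guard here: only delete the live files if a backup exists. The following sketch assumes the paths and .bak naming from the previous step; it stages demo files in a temporary directory so it can be run safely for illustration.

```shell
#!/usr/bin/env bash
# Sketch: remove stale Raft state only after verifying backups exist.
set -euo pipefail

DATA_DIR="${DATA_DIR:-$(mktemp -d)}"   # real nodes: export DATA_DIR=/opt/vault/data
mkdir -p "$DATA_DIR/raft"
# Demo staging only: stand-ins for the live files and their Step 2 backups.
touch "$DATA_DIR/vault.db" "$DATA_DIR/vault.db.bak" \
      "$DATA_DIR/raft/raft.db" "$DATA_DIR/raft.db.bak"

# Abort (via set -e) if either backup is missing.
ls "$DATA_DIR"/vault.db.bak* >/dev/null
ls "$DATA_DIR"/raft.db.bak*  >/dev/null

rm -f "$DATA_DIR/vault.db" "$DATA_DIR/raft/raft.db"
echo "stale Raft state removed from $DATA_DIR"
```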
Step 4: Restart Vault
systemctl start vault
Step 5: Rejoin the Cluster
vault operator raft join http://<leader>:8200
Expected result:
Joined true
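As an alternative to running the join command manually after every restart, the node's Raft storage stanza can include a retry_join block so Vault attempts the join automatically on startup. The path and node_id below are placeholders; keep the leader address appropriate to your environment.

```hcl
storage "raft" {
  path    = "/opt/vault/data"
  node_id = "vault1"            # placeholder node ID

  retry_join {
    leader_api_addr = "http://<leader>:8200"
  }
}
```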
Step 6: Unseal (If Using Shamir)
vault operator unseal
Step 7: Validate Cluster State
vault status
vault operator raft list-peers
Expected:
- Removed From Cluster: false in the vault status output
- Node appears as a standby/follower node
- Raft commit/apply indices converge with the leader
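The follower check can be scripted for automation. The sample output below is illustrative (node names and addresses are placeholders); on a real node, capture PEERS from vault operator raft list-peers instead.

```shell
#!/usr/bin/env bash
# Sketch: confirm the rejoined node is listed as a follower.
set -euo pipefail

# Illustrative sample of list-peers output; replace with:
#   PEERS="$(vault operator raft list-peers)"
PEERS='Node      Address             State       Voter
----      -------             -----       -----
vault0    10.0.0.10:8201      leader      true
vault1    10.0.0.11:8201      follower    true'

NODE="vault1"   # placeholder: the node that was rejoined
if printf '%s\n' "$PEERS" | grep -q "^$NODE .*follower"; then
  echo "ok: $NODE rejoined as follower"
else
  echo "warn: $NODE not listed as follower" >&2
  exit 1
fi
```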
Outcome
After removing local Raft files and restarting:
- The node unseals successfully
- Vault rebuilds vault.db and raft.db automatically
- The node joins as a follower
- Raft replication resumes
- The error no longer appears
If the node still cannot join:
- Verify network connectivity to peers
- Confirm other nodes are healthy
- Review disk space and file permissions
- Check logs for snapshot or RPC errors
If the issue persists, collect:
- Logs from the affected node
- vault debug output
- Raft directory listings from all nodes
and provide to HashiCorp Support.