The information contained in this article has been verified as up‑to‑date on the date of the original publication of the article. HashiCorp endeavors to keep this information up‑to‑date and correct, but it makes no representations or warranties of any kind, express or implied, about the ongoing completeness, accuracy, reliability, or suitability of the information provided.
All information contained in this article is for general information purposes only. Any reliance you place on such information as it applies to your use of your HashiCorp product is therefore strictly at your own risk.
Introduction
This article addresses a specific error that occurs when attempting to rejoin a Vault Enterprise Integrated Storage (Raft) node to a cluster. The error indicates the node was previously removed from Raft membership and its local Raft state is no longer valid. This prevents the node from joining until its local data is cleaned up. This article explains the cause and provides the remediation steps.
Problem
When running:
vault operator raft join <leader-address>
Vault returns the following error:
Error joining the node to the Raft cluster: Error making API request. URL: POST http://127.0.0.1:8200/v1/sys/storage/raft/join Code: 500. Errors: * node has been removed from the HA cluster. All vault data for this node must be cleaned up before it can be added back
As a result:
- The node fails to join Raft
- The node remains sealed
- The node does not appear in vault operator raft list-peers
- Logs contain Raft membership failures
Prerequisites
This troubleshooting information applies to:
- Vault Enterprise
- Integrated Storage (Raft) deployments
- Clusters using Shamir or auto-unseal
- Environments where operators have:
- OS‑level access to the affected node
- Access to Vault logs
- Ability to restart Vault
- Ability to inspect /vault/data or /opt/vault/data
Cause
This error occurs when the node was previously removed from the Raft cluster. Vault will refuse to allow a removed node to rejoin if it still has stale or mismatched Raft metadata.
The most common causes are:
1. Manual Peer Removal
Example:
vault operator raft remove-peer vault1
2. Autopilot Pruning
This can be triggered by, but is not limited to:
- Consecutive health check failures
- Long downtime
- Network isolation
- Unresponsive Raft engine
3. Local Raft State Corruption
Including vault.db or raft.db corruption.
4. Node Restart With Outdated Raft Metadata
When any of these occur, the node’s local Raft state:
- No longer matches the leader
- Cannot participate in Raft consensus
- Cannot be reused for joining
Thus Vault requires complete removal of the node’s local Raft data.
Solution
Remove the node's outdated local Raft state so Vault can rebuild it from healthy peers.
Before applying the solution, operators may optionally:
- Review the logs
- Confirm cluster health
- Reproduce the issue for testing
Step 1: Stop the Vault Service
Note: Adjust commands for your platform (systemd vs Kubernetes, etc.)
systemctl stop vault
Step 2: Back Up Existing Raft Data
This provides an additional safeguard in case restoration is needed after clearing and rebuilding Raft state.
Please note that this step is optional and may not be required depending on your use case. It creates a temporary backup, and once the process is complete and verified, you can safely remove the backup files generated during this step. Please also be mindful of disk space availability, as the size of these backup files may introduce storage constraints on the node.
Ensure Vault is still stopped on the node (see Step 1):
sudo systemctl status vault
Create file-level backups of vault.db and raft.db
cd /opt/vault/data
cp vault.db vault.db.bak
cp raft/raft.db raft.db.bak
Why this step matters
- These .db files will be removed in the next step to clear outdated Raft state.
- Backups provide a rollback option if the node cannot rejoin the cluster.
- The example uses cp (copy), but administrators may choose alternative methods, such as mv (move), according to internal practices.
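The backup step above can be sketched as a small script. The data directory path, the .bak naming, and the timestamp suffix are assumptions; for illustration the script stages dummy files in a temporary directory when DATA_DIR is unset, so it is safe to run anywhere. On a real node, export DATA_DIR to the actual Raft storage path (with Vault stopped) and the staging lines are no-ops.

```shell
#!/usr/bin/env bash
# Sketch: timestamped file-level backup of the Raft state files.
# Assumes the default layout /opt/vault/data with raft/raft.db beneath it.
set -euo pipefail

DATA_DIR="${DATA_DIR:-$(mktemp -d)}"   # real nodes: export DATA_DIR=/opt/vault/data
mkdir -p "$DATA_DIR/raft"

# Demo staging only: create stand-in files if they do not already exist.
[ -f "$DATA_DIR/vault.db" ]     || echo demo > "$DATA_DIR/vault.db"
[ -f "$DATA_DIR/raft/raft.db" ] || echo demo > "$DATA_DIR/raft/raft.db"

# Timestamped copies make repeated runs non-destructive.
STAMP="$(date +%Y%m%d-%H%M%S)"
cp "$DATA_DIR/vault.db"     "$DATA_DIR/vault.db.bak-$STAMP"
cp "$DATA_DIR/raft/raft.db" "$DATA_DIR/raft.db.bak-$STAMP"

ls -lh "$DATA_DIR"/*.bak-"$STAMP"
```

A timestamped suffix is optional; the plain .bak names used in the article work equally well for a one-off backup.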
Step 3: Remove the Outdated Raft Files
With Vault stopped and backups created, remove the outdated Raft metadata:
cd /opt/vault/data
rm -f vault.db
rm -f raft/raft.db
This clears old Raft state so the node can rebuild from the leader upon restart.
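Operators who scripted Step 2 may want a matching guard here: only delete the live files if a backup exists. The following sketch assumes the paths and .bak naming from the previous step; it stages demo files in a temporary directory so it can be run safely for illustration.

```shell
#!/usr/bin/env bash
# Sketch: remove stale Raft state only after verifying backups exist.
set -euo pipefail

DATA_DIR="${DATA_DIR:-$(mktemp -d)}"   # real nodes: export DATA_DIR=/opt/vault/data
mkdir -p "$DATA_DIR/raft"
# Demo staging only: stand-ins for the live files and their Step 2 backups.
touch "$DATA_DIR/vault.db" "$DATA_DIR/vault.db.bak" \
      "$DATA_DIR/raft/raft.db" "$DATA_DIR/raft.db.bak"

# Abort (via set -e) if either backup is missing.
ls "$DATA_DIR"/vault.db.bak* >/dev/null
ls "$DATA_DIR"/raft.db.bak*  >/dev/null

rm -f "$DATA_DIR/vault.db" "$DATA_DIR/raft/raft.db"
echo "stale Raft state removed from $DATA_DIR"
```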
Step 4: Restart Vault
systemctl start vault
Step 5: Rejoin the Cluster
vault operator raft join http://<leader>:8200
Expected result:
Joined true
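As an alternative to running the join command manually after every restart, the node's Raft storage stanza can include a retry_join block so Vault attempts the join automatically on startup. The path and node_id below are placeholders; keep the leader address appropriate to your environment.

```hcl
storage "raft" {
  path    = "/opt/vault/data"
  node_id = "vault1"            # placeholder node ID

  retry_join {
    leader_api_addr = "http://<leader>:8200"
  }
}
```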
Step 6: Unseal (If Using Shamir)
vault operator unseal
Step 7: Validate Cluster State
vault status
vault operator raft list-peers
Expected:
- Removed From Cluster: false in the vault status output
- Node appears as a standby/follower node
- Raft commit/apply indices converge with the leader
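The follower check can be scripted for automation. The sample output below is illustrative (node names and addresses are placeholders); on a real node, capture PEERS from vault operator raft list-peers instead.

```shell
#!/usr/bin/env bash
# Sketch: confirm the rejoined node is listed as a follower.
set -euo pipefail

# Illustrative sample of list-peers output; replace with:
#   PEERS="$(vault operator raft list-peers)"
PEERS='Node      Address             State       Voter
----      -------             -----       -----
vault0    10.0.0.10:8201      leader      true
vault1    10.0.0.11:8201      follower    true'

NODE="vault1"   # placeholder: the node that was rejoined
if printf '%s\n' "$PEERS" | grep -q "^$NODE .*follower"; then
  echo "ok: $NODE rejoined as follower"
else
  echo "warn: $NODE not listed as follower" >&2
  exit 1
fi
```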
Outcome
After removing local Raft files and restarting:
- The node unseals successfully
- Vault rebuilds vault.db and raft.db automatically
- The node joins as a follower
- Raft replication resumes
- The error no longer appears
If the node still cannot join:
- Verify network connectivity to peers
- Confirm other nodes are healthy
- Review disk space and file permissions
- Check logs for snapshot or RPC errors
If the issue persists, collect:
- Logs from the affected node
- vault debug output
- Raft directory listings from all nodes
and provide to HashiCorp Support.