How do I resolve the `error="remote error: tls: unrecognized name"` error in Vault server logs?
Problem
Error in Vault leader node logs:
`[ERROR] storage.raft: failed to make requestVote RPC: target="{Voter vault01.local 10.10.10.55:8201}" error="remote error: tls: unrecognized name"`
Error in Vault standby node logs:
`Oct 29 08:52:24 server02.candycorn.com vault[1463]: 2021-10-29T08:52:24.587-0500 [ERROR] core: failed to elect as performance standby: error="rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing x509: certificate is valid for fw-af2d1a66-299c-a64d-29df-94472147552c, not fw-892fd277-411a-250d-49ef-01a5793ffce1""`
Together, these two messages indicate that the raft database on `server02` is corrupt. The raft data on that node must be cleaned and the node restarted. To resolve this issue, use the following steps:
NOTE: Make sure you perform the steps below on the correct node, because they delete that node's local raft data.
Solutions:
- Stop Vault on the server node that is unable to rejoin the cluster.
    systemctl stop vault
- Clean the raft directory on the stopped node.
    # log in as root
    # /vault/data/ is the raft storage path in this example; use the path from your own storage "raft" configuration
    cd /vault/data/
    ls -l
    rm -rf *
    # check that all files and directories were removed
    ls -l
- Restart Vault on the stopped node.
    systemctl start vault
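The cleanup step above removes data irreversibly, so a more cautious variant may be worth considering. The following is a minimal sketch, not part of the original procedure: it refuses to run while the `vault` service is still active and moves the directory contents aside instead of deleting them. The function name `clean_raft_dir` and the backup location are hypothetical, and the sketch assumes a systemd unit named `vault` and a `find` that supports `-mindepth`/`-maxdepth` (GNU and BSD `find` do).

```shell
#!/bin/sh
# clean_raft_dir DATA_DIR BACKUP_DIR
# Hypothetical helper (illustrative only, not a Vault command):
# moves everything in DATA_DIR into BACKUP_DIR instead of deleting it.
clean_raft_dir() {
    data_dir=$1
    backup_dir=$2

    # Refuse to touch anything that is not an existing directory.
    if [ ! -d "$data_dir" ]; then
        echo "not a directory: $data_dir" >&2
        return 1
    fi

    # Refuse to run while Vault is still active.
    # Assumption: the systemd unit is named "vault"; the check is skipped
    # on hosts without systemctl.
    if command -v systemctl >/dev/null 2>&1 && systemctl is-active --quiet vault; then
        echo "vault is still running; stop it first" >&2
        return 1
    fi

    mkdir -p "$backup_dir" || return 1

    # Move every top-level entry, including dotfiles, out of the data directory.
    find "$data_dir" -mindepth 1 -maxdepth 1 -exec mv {} "$backup_dir"/ \;

    # Succeed only if the data directory is now empty.
    [ -z "$(ls -A "$data_dir")" ]
}
```

A call might look like `clean_raft_dir /vault/data /root/vault-data-backup`, where the backup path is only an example and must lie outside the data directory. Once the node has rejoined and caught up, the backup can be deleted.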
The Vault node should then be able to rejoin the cluster. Depending on your configuration, you may need to re-add it manually. To verify that the node has rejoined, run `vault status` on the server you just cleared and compare its `Raft Committed Index` and `Raft Applied Index` values with the `vault status` output from the other nodes. The values should be the same or very close on all nodes.
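The index comparison can be scripted. The sketch below is one possible way to do it, assuming the default table output of `vault status`, where each line is a label followed by its value. The helper names `raft_index` and `indexes_close`, and the default tolerance of 10, are assumptions made for illustration; what counts as "very close" depends on how busy the cluster is.

```shell
#!/bin/sh
# raft_index LABEL
# Reads `vault status` table output on stdin and prints the value of the
# line that starts with LABEL (for example "Raft Applied Index").
raft_index() {
    awk -v key="$1" 'index($0, key) == 1 { print $NF; exit }'
}

# indexes_close A B [TOLERANCE]
# Returns 0 when the two index values differ by at most TOLERANCE.
# The default of 10 is an arbitrary assumption, not a Vault recommendation.
indexes_close() {
    a=$1
    b=$2
    tol=${3:-10}
    diff=$((a - b))
    if [ "$diff" -lt 0 ]; then
        diff=$((0 - diff))
    fi
    [ "$diff" -le "$tol" ]
}
```

On each node, `vault status | raft_index 'Raft Applied Index'` prints that node's value, and `indexes_close` can then compare the numbers collected from two nodes. If the cleared node has to be re-added manually, `vault operator raft join` with the active node's API address is the usual command, and `vault operator raft list-peers` (run with a valid token) shows whether the node is listed as a peer.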
Summary
Deleting the data directory deletes the bolt.db database that Vault uses for raft. After Vault rejoins the cluster, the database is repopulated from the other cluster members.