Background
There might be a possibility while investigating the communication issues for any known or unknown reason among the cluster nodes, you may end up seeing the following errors all over in the Vault operational logs in large numbers:
[ERROR] storage.raft.autopilot: UpgradeVersionTag not found in server metadata:
id=REDACTED name="" address=REDACTED upgrade_version_tag=upgrade_version
Preceded by:
[INFO] storage.raft: starting autopilot:
config="&{false 0 10s 24h0m0s 1000 0 10s false redundancy_zone upgrade_version}"
reconcile_interval=0s
Pre-requisite
- Vault Enterprise v1.11.x - Latest
Explanation/Cause
This entry is logged when the communication among the cluster nodes is impacted, especially when a leader node does not hear from its follower because the "id" in the log corresponds to the follower node's identity and actually originates from the raft-autopilot-enterprise
library. This is only emitted when you don’t have the autopilot_upgrade_version
key specified in the Vault storage config.
autopilot_upgrade_version
is an optional string that, if provided, will be used to report to autopilot as Vault's version. This is then used by autopilot when it makes decisions regarding automated upgrades and if omitted, the version of Vault currently in use will be used.
The preceded log entry that you see is because of the autopilot_reconcile_interval
which is the interval after which autopilot picks up any state changes.
Workaround
However, these are harmless logs but the workaround is to disable the "Upgrade Migration" in the autopilot config via the below command to stop the noise until there is any engineering development on this one.
vault operator raft autopilot set-config -disable-upgrade-migration=true
References