Introduction
This article describes a panic that can occur on Vault standby nodes.
The defect has been logged, and a fix is included in Vault 1.12.3 and later.
Problem
Vault standby nodes panic when an irrevocable lease is deleted.
Prerequisites
- Vault Enterprise 1.12.0 - 1.12.2
- HCP Vault 1.12.0 - 1.12.2
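To check a set of nodes against the affected range, a small helper along the following lines could be used. This is a minimal sketch: the function name isAffected and the handling of suffixes such as "+ent" are assumptions made for illustration and are not part of Vault.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// isAffected reports whether a Vault version string falls in the
// 1.12.0 - 1.12.2 range listed above. Suffixes such as "+ent" are
// ignored. Hypothetical helper for illustration; not part of Vault.
func isAffected(version string) bool {
	v := strings.TrimPrefix(strings.TrimSpace(version), "v")
	// Drop build metadata or pre-release suffixes ("1.12.2+ent" -> "1.12.2").
	if i := strings.IndexAny(v, "+-"); i >= 0 {
		v = v[:i]
	}
	parts := strings.Split(v, ".")
	if len(parts) != 3 {
		return false
	}
	nums := make([]int, 3)
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return false
		}
		nums[i] = n
	}
	// Affected: major 1, minor 12, patch 0 through 2.
	return nums[0] == 1 && nums[1] == 12 && nums[2] <= 2
}

func main() {
	for _, v := range []string{"1.12.2+ent", "1.12.3+ent", "1.11.6"} {
		fmt.Printf("%s affected: %t\n", v, isAffected(v))
	}
}
```

The version string itself would come from wherever the deployment already records it, for example the output of the vault version command on each node.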
Cause
Due to a known defect in the versions listed above, Vault standby nodes panic during the invalidation of a lease that has already been flagged as irrevocable on the leader node.
The following is an example of the error that is reported in the operational logs of the standby nodes:
panic: interface conversion: interface {} is *vault.leaseEntry, not vault.pendingInfo
goroutine 21289 [running]:
github.com/hashicorp/vault/vault.(*ExpirationManager).invalidate(0xc026e801e0, {0xc00fdf619c, 0x3e})
/home/runner/actions-runner/_work/vault-enterprise/vault-enterprise/vault/expiration.go:496 +0x8b7
github.com/hashicorp/vault/vault.entSysInvalidate.func1({0x70148d8, 0xc00e4437d0}, {0xc00fdf6195, 0x45})
/home/runner/actions-runner/_work/vault-enterprise/vault-enterprise/vault/logical_system_helpers_ent.go:493 +0x4ca
github.com/hashicorp/vault/sdk/framework.(*Backend).InvalidateKey(0xc0013fbcc0?, {0x70148d8?, 0xc00e4437d0?}, {0xc00fdf6195?, 0x4?})
/home/runner/actions-runner/_work/vault-enterprise/vault-enterprise/sdk/framework/backend.go:393 +0x3e
github.com/hashicorp/vault/vault.(*Core).asyncInvalidateKey(0xc000a09b00, {0x7014830, 0xc017a04e80}, {0xc00fdf6180, 0x5a})
/home/runner/actions-runner/_work/vault-enterprise/vault-enterprise/vault/replication_invalidation_ent.go:58 +0x30a
github.com/hashicorp/vault/vault.(*Core).asyncInvalidateHandler(0xc000a09b00, {0x7014830, 0xc017a04e80}, 0xc009356de0, 0xc00a3fe4e0)
/home/runner/actions-runner/_work/vault-enterprise/vault-enterprise/vault/replication_invalidation_ent.go:71 +0x1d7
github.com/hashicorp/vault/vault.(*Core).waitForPerfStandby.func3()
/home/runner/actions-runner/_work/vault-enterprise/vault-enterprise/vault/ha_ent.go:132 +0x13b
github.com/oklog/run.(*Group).Run.func1({0xc007cd2660?, 0x67574a8?})
/home/runner/go/pkg/mod/github.com/oklog/run@v1.1.0/group.go:38 +0x2f
created by github.com/oklog/run.(*Group).Run
/home/runner/go/pkg/mod/github.com/oklog/run@v1.1.0/group.go:37 +0x22a
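The first line of the trace is the message the Go runtime emits when a single-value type assertion finds a different dynamic type than the one requested: here a *vault.leaseEntry where a vault.pendingInfo was expected. The sketch below reproduces that class of failure with stand-in types. Only the two type names come from the panic message; the struct fields and the functions are hypothetical, and the sketch does not claim to show Vault's actual code or how the fix in 1.12.3 was implemented.

```go
package main

import "fmt"

// Stand-in types: the real vault.leaseEntry and vault.pendingInfo have
// different fields; only the names are taken from the panic message.
type leaseEntry struct{ LeaseID string }
type pendingInfo struct{ LeaseID string }

// uncheckedLookup uses a single-value type assertion, which panics
// when the stored value has a different dynamic type.
func uncheckedLookup(v interface{}) pendingInfo {
	return v.(pendingInfo)
}

// checkedLookup uses the two-value form, which reports a mismatch
// through ok instead of panicking.
func checkedLookup(v interface{}) (pendingInfo, bool) {
	info, ok := v.(pendingInfo)
	return info, ok
}

// panics reports whether calling f results in a panic.
func panics(f func()) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	f()
	return false
}

func main() {
	// A pointer to a lease entry is stored where a pendingInfo is expected.
	var stored interface{} = &leaseEntry{LeaseID: "example/lease"}

	fmt.Println("unchecked assertion panics:", panics(func() { uncheckedLookup(stored) }))

	_, ok := checkedLookup(stored)
	fmt.Println("checked assertion ok:", ok)
}
```

In a long-running server an unrecovered panic of this kind terminates the process, which is consistent with the standby nodes going down rather than logging an error and continuing.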
Overview of possible solutions
- If the cluster still has an active (leader) node, a restart of the Vault service might be necessary on the affected standby nodes.
- If the cluster no longer has a leader, lost quorum recovery might have to be performed (this applies only to clusters that use the Raft storage backend).
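Which of the two paths applies depends on whether the cluster still reports a leader. As a rough sketch, the decision could be derived from the JSON returned by the sys/leader endpoint on a node that is still running. The field names below (ha_enabled, is_self, leader_address) are assumed to match that endpoint's response and should be verified against the API documentation for the Vault version in use; the function recoveryPath and the sample payloads are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// leaderStatus holds the subset of the leader status response used here.
// Field names are an assumption; verify them against the Vault API docs.
type leaderStatus struct {
	HAEnabled     bool   `json:"ha_enabled"`
	IsSelf        bool   `json:"is_self"`
	LeaderAddress string `json:"leader_address"`
}

const (
	restartStandbys = "restart the Vault service on the affected standby nodes"
	quorumRecovery  = "perform lost quorum recovery (Raft storage backend only)"
)

// recoveryPath maps a leader status payload to one of the two options
// described above. Hypothetical helper for illustration.
func recoveryPath(body []byte) (string, error) {
	var st leaderStatus
	if err := json.Unmarshal(body, &st); err != nil {
		return "", fmt.Errorf("decoding leader status: %w", err)
	}
	// An empty leader address is treated as "no leader in the cluster".
	if st.LeaderAddress == "" {
		return quorumRecovery, nil
	}
	return restartStandbys, nil
}

func main() {
	// Sample payloads; the address is a placeholder.
	samples := []string{
		`{"ha_enabled": true, "is_self": false, "leader_address": "https://vault-0.example.internal:8200"}`,
		`{"ha_enabled": true, "is_self": false, "leader_address": ""}`,
	}
	for _, s := range samples {
		path, err := recoveryPath([]byte(s))
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(path)
	}
}
```

The sketch only classifies the situation; the restart or the recovery procedure itself still has to be carried out by an operator.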
Outcome
After Vault has started and a leader has been successfully elected, operations should return to normal, provided that no new irrevocable leases appear.