Preface:
A critical issue has been discovered in which consul-dataplane instances/containers are unable to populate upstream endpoints.
Affected releases:
1.16.0
1.16.1
The issue affects VM and K8s deployments.
Bug:
Trigger:
Snapshot restoration on a fresh Consul cluster
Symptom:
consul-dataplane instances created after the snapshot restore procedure may not have upstream endpoints populated. Restarting the consul-dataplane instance does not help.
Example:
Description:
Service mesh in Consul versions 1.16.0 and 1.16.1 may have issues when a snapshot restore is performed and the servers are hosting xDS streams. When this bug triggers, it will cause Envoy to incorrectly populate upstream endpoints. Due to this issue, it is currently not recommended for service mesh users running agent-less workloads to upgrade Consul to these versions.
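As a rough aid for confirming the symptom, the sketch below lists Envoy clusters that report no hosts. It assumes you have already captured the JSON output of the Envoy admin endpoint `/clusters?format=json` from the affected sidecar; the function name and the sample cluster names are hypothetical. An empty host list can have other causes (for example, no healthy instances of the upstream), so treat the result as a hint rather than proof of this bug.

```python
import json


def clusters_without_endpoints(clusters_json: str) -> list:
    """Return names of clusters that have no host_statuses entries.

    Assumes the input is the JSON body of Envoy's /clusters?format=json
    admin endpoint, with a top-level "cluster_statuses" list.
    """
    data = json.loads(clusters_json)
    empty = []
    for cluster in data.get("cluster_statuses", []):
        # A missing or empty "host_statuses" means no endpoints were populated.
        if not cluster.get("host_statuses"):
            empty.append(cluster.get("name", "<unnamed>"))
    return empty


# Hypothetical sample: "web" has one endpoint, "db" has none.
sample = json.dumps({
    "cluster_statuses": [
        {"name": "web", "host_statuses": [{"address": {}}]},
        {"name": "db"},
    ]
})
result = clusters_without_endpoints(sample)
print(result)
```

Clusters that Envoy uses internally may also appear in the output, so compare the list against the upstreams the service is actually expected to have.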
Fixed version:
1.16.2
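The affected and fixed versions listed above can be expressed as a small check, for example when auditing a fleet. This is a minimal sketch: the function names are illustrative, and it assumes version strings in the usual `major.minor.patch` form, optionally with a leading `v` or a suffix such as `+ent`.

```python
# Versions named in this article as affected.
AFFECTED_VERSIONS = {(1, 16, 0), (1, 16, 1)}


def parse_version(version: str) -> tuple:
    """Parse 'v1.16.1+ent' or '1.16.1-rc1' into (1, 16, 1)."""
    core = version.strip().lstrip("v")
    # Drop build metadata ("+ent") and pre-release tags ("-rc1").
    core = core.split("+", 1)[0].split("-", 1)[0]
    major, minor, patch = (int(part) for part in core.split("."))
    return (major, minor, patch)


def is_affected(version: str) -> bool:
    """True if the Consul version is one of the releases listed as affected."""
    return parse_version(version) in AFFECTED_VERSIONS


for v in ("1.16.0", "v1.16.1+ent", "1.16.2"):
    print(v, "affected" if is_affected(v) else "not affected")
```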
Recovery:
Restarting the Consul servers appears to resolve the issue on both K8s and VMs. After the restart, upstream endpoints are populated.
If the restart does not help, please escalate to backline or service-mesh engineering.