Problem
When upgrading Consul servers in Kubernetes from a Consul version <1.14 to >=1.14 (consul-k8s >= 1.0.0), the following errors appear in the connect-injector and mesh-gateway pods:
[ERROR] consul-server-connection-manager: ACL auth method login failed: error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp <ip>:8502: connect: connection refused\""
[ERROR] consul-server-connection-manager: connection error: error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp <ip>:8502: connect: connection refused\""
Cause
The issue stems from changes to the gRPC TLS configuration in Consul 1.14.x. These modifications lead to changes in the Consul Server configuration within Kubernetes.
Existing Consul servers continue to listen on port 8503 for gRPC, while newly upgraded Consul servers listen on port 8502.
Before the new servers stabilize, upgraded pods for mesh-gateway and connect-injector attempt to communicate with Consul servers via port 8502, which results in a connection refused error.
Solution
Wait until the new Consul server stabilizes. The consul-server-connection-manager component in the mesh-gateway and connect-injector will continue to switch servers until it establishes a successful connection to a server on port 8502.
Note
If the new Consul server is unstable, diagnose the cause and make the necessary corrections. For instance, if the Consul server pod is stuck in a CrashLoopBackoff state due to an OOMKilled event, consider increasing the resources spec.
NOTE: DO NOT decrease `server.updatePartition` until the new Consul server pod is operational. Undertaking this action prematurely could jeopardize the cluster's quorum and bring the cluster down.
Outcome
All new consul-k8s components will upgrade successfully.
Next Steps
Should the issue continue after following the steps outlined above, please reach out to HashiCorp support, and provide us with the following information:
kubectl -n <namespace> describe sts/consul-server
kubectl -n <namespace> logs deploy/consul-mesh-gateway -c mesh-gateway-init
kubectl -n <namespace> logs deploy/consul-connect-injector