Issue
When deploying Consul on Kubernetes, users may encounter errors in the Consul Connect Injector pod and in existing API Gateway pods. The Consul Connect Injector pod logs the error "Unexpected response code: 500 (Invalid Role: A Role with Name "managed-gateway-acl-role-api-gateway" already exists)", and the existing API Gateway pod logs "[INFO] Unable to find registered gateway; retrying".
Environment
- Consul version: 1.15.x and later
- Kubernetes version: see the compatibility matrix at https://developer.hashicorp.com/consul/docs/k8s/compatibility
Cause
The error is caused by a conflict with an existing role in Consul when the Consul Connect Injector pod is restarted: the role "managed-gateway-acl-role-api-gateway" already exists, so Consul rejects the request with a 500 response. The message in the existing API Gateway pod indicates that the pod cannot find its registered gateway, potentially as a consequence of the Consul Connect Injector pod failure.
Resolution
Temporary Workaround:
This applies both when the Consul servers run on VMs with the other components on Kubernetes and when the complete Consul cluster runs on Kubernetes. In either case, manually delete the existing role "managed-gateway-acl-role-api-gateway" in Consul before restarting the Consul Connect Injector pod. This can be done using the Consul CLI or API.
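As an illustration of the API route, the following is a minimal Python sketch and not an official procedure. It assumes the Consul HTTP API is reachable at a base URL you supply and that you have an ACL token permitted to read and delete roles. It looks the role up by name through the ACL role endpoints (/v1/acl/role/name/<name>), then deletes it by ID (/v1/acl/role/<id>). The function names and the injectable send parameter are hypothetical helpers introduced for this example, and treating a 404 on lookup as "role not present" is an assumption about the server response.

```python
import json
import urllib.error
import urllib.request

ROLE_NAME = "managed-gateway-acl-role-api-gateway"


def build_request(base_url, path, token, method="GET"):
    """Build an authenticated request against the Consul HTTP API."""
    req = urllib.request.Request(base_url.rstrip("/") + path, method=method)
    # Consul accepts the ACL token in the X-Consul-Token header.
    req.add_header("X-Consul-Token", token)
    return req


def delete_role_by_name(base_url, token, name=ROLE_NAME,
                        send=urllib.request.urlopen):
    """Look up an ACL role by name and delete it by ID.

    Returns the deleted role's ID, or None if the lookup returns 404
    (assumed here to mean that the role does not exist).
    """
    lookup = build_request(base_url, "/v1/acl/role/name/" + name, token)
    try:
        with send(lookup) as resp:
            role = json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise

    delete = build_request(base_url, "/v1/acl/role/" + role["ID"], token,
                           method="DELETE")
    with send(delete) as resp:
        resp.read()
    return role["ID"]
```

For example, calling delete_role_by_name("http://127.0.0.1:8500", token) against a local agent would remove the role, after which the Consul Connect Injector pod can be restarted. The address shown is a placeholder; substitute the address and token for your environment.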
Permanent Fix:
This issue is fixed in Consul version 1.18.2 (consul-k8s version 1.4.2). Upgrading to these versions resolves the conflict with existing roles and restores normal operation of the Consul Connect Injector pod and API Gateway pod.
Upgrade Steps:
- Ensure you have a backup of your current configuration and data.
- Follow the upgrade instructions in the Consul documentation: https://developer.hashicorp.com/consul/docs/upgrading
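One possible way to take the backup mentioned in the first step is a Consul snapshot. The sketch below is illustrative only: it assumes the Consul HTTP API is reachable at the base URL you pass in and that the token has sufficient privileges to read /v1/snapshot. The function name, the destination path, and the injectable send parameter are hypothetical. A snapshot covers Consul server state; Helm values and other Kubernetes configuration need to be backed up separately.

```python
import shutil
import urllib.request


def save_snapshot(base_url, token, dest_path, send=urllib.request.urlopen):
    """Stream a Consul snapshot from /v1/snapshot into a local file."""
    req = urllib.request.Request(base_url.rstrip("/") + "/v1/snapshot")
    req.add_header("X-Consul-Token", token)
    # Copy the response body to disk in chunks rather than loading it
    # all into memory, since snapshots can be large.
    with send(req) as resp, open(dest_path, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest_path
```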
Impact
- Users may experience connectivity issues between services using Consul Connect.
- Restarting the Consul Connect Injector pod may lead to disruptions in service discovery.
Steps to Reproduce
- Deploy a Consul cluster on Kubernetes.
- If applicable, run the Consul servers on VMs.
- Restart the Consul Connect Injector pod.
- Observe the following error in the Connect Injector pod logs: "Unexpected response code: 500 (Invalid Role: A Role with Name "managed-gateway-acl-role-api-gateway" already exists)"
- Observe the following message in the API Gateway pod logs: [INFO] Unable to find registered gateway; retrying
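To confirm that a deployment is showing both symptoms from the steps above, the log text can be checked for the two messages. This is a minimal sketch that assumes you have already captured the pod logs as strings (for example with kubectl logs). The function name and the result keys are hypothetical. Because the injector writes the message with escaped quotes in its structured log output, the sketch normalizes \" to " before matching.

```python
INJECTOR_ERROR = (
    'A Role with Name "managed-gateway-acl-role-api-gateway" already exists'
)
GATEWAY_MESSAGE = "Unable to find registered gateway; retrying"


def find_symptoms(injector_log, gateway_log):
    """Report which of the two known log messages are present."""
    # Structured log lines may escape the inner quotes as \".
    normalized = injector_log.replace('\\"', '"')
    return {
        "injector_role_conflict": INJECTOR_ERROR in normalized,
        "gateway_not_registered": GATEWAY_MESSAGE in gateway_log,
    }
```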
Disclaimer
For customers running OpenShift, Consul engineering does not currently recommend upgrading to the following images:
- hashicorp/consul-enterprise:1.18.2-ent-ubi
- hashicorp/consul-k8s-control-plane:1.4.2
- hashicorp/consul-dataplane:1.4.2
Conclusion
To resolve this issue permanently, upgrade to Consul version 1.18.2 (consul-k8s version 1.4.2). If an immediate upgrade is not possible, follow the temporary workaround described above.