The information contained in this article has been verified as up-to-date on the date of the original publication of the article. HashiCorp endeavors to keep this information up-to-date and correct, but it makes no representations or warranties of any kind, express or implied, about the ongoing completeness, accuracy, reliability, or suitability of the information provided.
All information contained in this article is for general information purposes only. Any reliance you place on such information as it applies to your use of your HashiCorp product is therefore strictly at your own risk.
Problem
Application pods in a Consul on Kubernetes deployment are unable to start and end up in CrashLoopBackOff. The consul-connect-inject-init container repeatedly logs the following messages:
2023-07-12T05:22:33.744Z [INFO] Unable to find registered services; retrying
2023-07-12T05:22:34.748Z [INFO] Unable to find registered services; retrying
2023-07-12T05:22:35.754Z [INFO] Unable to find registered services; retrying
2023-07-12T05:22:36.759Z [INFO] Unable to find registered services; retrying
2023-07-12T05:22:37.764Z [INFO] Unable to find registered services; retrying
2023-07-12T05:22:38.768Z [INFO] Unable to find registered services; retrying
2023-07-12T05:22:39.789Z [INFO] Unable to find registered services; retrying
2023-07-12T05:22:40.797Z [INFO] Unable to find registered services; retrying
2023-07-12T05:22:41.814Z [INFO] Unable to find registered services; retrying
2023-07-12T05:22:42.818Z [INFO] Unable to find registered services; retrying
2023-07-12T05:22:42.818Z [INFO] Check to ensure a Kubernetes service has been created for this application. If your pod is not starting also check the connect-inject deployment logs.
The consul-connect-injector pod logs show the following error messages:
2023-07-12T06:16:05.815Z INFO controller.endpoints registering service with Consul {"name": "<service>", "id": "<service>-6ddbc5d9b-lt69p-<service>", "agentIP": "1.2.3.4"}
2023-07-12T06:16:05.819Z ERROR controller.endpoints failed to register service {"name": "<service>", "error": "Unexpected response code: 403 (ACL not found)"}
...
2023-07-12T06:16:05.819Z ERROR controller.endpoints failed to register services or health check {"name": "<service>", "ns": "default", "error": "Unexpected response code: 403 (ACL not found)"}
...
2023-07-12T06:16:05.908Z ERROR controller.endpoints failed to get service instances {"name": "<service>", "error": "Unexpected response code: 403 (ACL not found)"}
...
2023-07-12T06:16:05.908Z ERROR controller.endpoints failed to deregister endpoints on all agents {"name": "<service>", "ns": "default", "error": "Unexpected response code: 403 (ACL not found)"}
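To see which services are affected, the injector errors above can be summarized per service. The following is a minimal sketch and not part of the original procedure: list_acl_failures is a hypothetical helper name, and it assumes the log lines keep the format shown above, with the service name in the first "name" field.

```shell
# Hypothetical helper: print the unique service names found in injector
# log lines that contain "403 (ACL not found)".
# Reads log text from standard input.
list_acl_failures() {
  sed -n '/403 (ACL not found)/ s/.*{"name": "\([^"]*\)".*/\1/p' | sort -u
}

# Example usage (replace <namespace> with the namespace Consul runs in):
#   kubectl -n <namespace> logs deployment/consul-connect-injector | list_acl_failures
```

Note that kubectl logs against a deployment reads from a single pod, so when the injector runs several replicas it may be necessary to repeat the check for each pod.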
Cause
The consul-connect-injector pod is using a non-existent ACL token to register services. One potential cause is the bug addressed by "[0.49.x] remove livenessProbe from pods with preStop lifecycle hooks that delete ACL tokens #1914", which affects consul-k8s releases before v0.49.5.
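Whether an installation falls in the affected range can be checked with a simple version comparison. The sketch below is an illustration only: is_affected_version is a hypothetical helper name, it mirrors just the "before v0.49.5" threshold stated in this article, and it assumes a sort command that supports the -V (version sort) option.

```shell
# Hypothetical helper: succeed (exit 0) when the given consul-k8s version
# is older than 0.49.5, the threshold named in this article.
is_affected_version() {
  local version="${1#v}"   # accept "v0.49.4" as well as "0.49.4"
  local fixed="0.49.5"
  # Affected when the version differs from the fixed release and sorts before it.
  [ "$version" != "$fixed" ] &&
    [ "$(printf '%s\n%s\n' "$version" "$fixed" | sort -V | head -n 1)" = "$version" ]
}

# Example usage:
#   if is_affected_version v0.49.4; then echo "affected"; fi
```

If Consul was installed with the official Helm chart, the chart version reported by helm list -n <namespace> is assumed here to track the consul-k8s release and can be passed to the helper.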
The consul-connect-inject-init container waits for two services, application and sidecar-proxy, to be registered in Consul before it completes the Connect initialization process. The consul-connect-injector pod performs the service registration, which fails with the error 403 (ACL not found). This can happen when the consul-connect-injector liveness probe fails, which triggers the preStop hook to call consul logout. That call deletes the ACL token associated with the pod, which prevents the pod from reconciling the services.
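Because the token is deleted when a liveness probe failure restarts the container, a non-zero restart count on the injector pod is a useful hint, though not proof, that this sequence occurred. The following sketch filters pod listings for restarts. The helper name pods_with_restarts is hypothetical, and it assumes the default column layout of kubectl get pods (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Hypothetical helper: print the name and restart count of every pod whose
# RESTARTS column (fourth field) is greater than zero.
# Reads the output of "kubectl get pods --no-headers" from standard input.
pods_with_restarts() {
  awk '$4 + 0 > 0 { print $1, $4 }'
}

# Example usage (replace <namespace> with the namespace Consul runs in):
#   kubectl -n <namespace> get pods --no-headers | grep consul-connect-injector | pods_with_restarts
```

For a pod that shows restarts, kubectl describe pod lists recent events, which typically include liveness probe failures if they were the reason for the restart.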
Solution
- Upgrade consul-k8s to release v0.49.5 or later. The changelog entry for the bug fix is:
control-plane: fix issue where consul-connect-injector acl token was unintentionally being deleted and not recreated when a container was restarted due to a livenessProbe failure. [GH-1914]
- If you are unable to upgrade consul-k8s, apply the following immediate workaround by restarting the consul-connect-injector deployment:
kubectl -n <namespace> rollout restart deployment consul-connect-injector
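The restart command returns before the new pod is ready, so it can help to wait for the rollout to finish before checking the logs. A minimal sketch of that combination follows. The function name restart_injector and the 120-second timeout are illustrative choices, not values from this article.

```shell
# Hypothetical wrapper: restart the injector deployment in the given
# namespace, then block until the rollout completes or the timeout expires.
restart_injector() {
  local namespace="$1"
  kubectl -n "$namespace" rollout restart deployment consul-connect-injector &&
    kubectl -n "$namespace" rollout status deployment consul-connect-injector --timeout=120s
}

# Example usage (replace <namespace> with the namespace Consul runs in):
#   restart_injector <namespace>
```

The wrapper exits with a non-zero status if either the restart or the rollout wait fails, which makes it usable in scripts.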
Verification
- After the restart, the errors should no longer appear in the consul-connect-injector pod's logs, as in the example below:
2023-07-12T07:19:51.456Z INFO controller.endpoints registering service with Consul {"name": "<service>", "id": "<service-id>", "agentIP": "1.2.3.4"}
2023-07-12T07:19:51.459Z INFO controller.endpoints registering proxy service with Consul {"name": "<service>-sidecar-proxy"}
- The application pods should start successfully, and the consul-connect-inject-init logs should show the following lines:
2023-07-12T07:19:28.965Z [INFO] Consul login complete
2023-07-12T07:19:28.966Z [INFO] Checking that the ACL token exists when reading it in the stale consistency mode
2023-07-12T07:19:28.969Z [INFO] Successfully read ACL token from the server
2023-07-12T07:19:28.972Z [INFO] Registered service has been detected: service=<service>-sidecar-proxy
2023-07-12T07:19:28.972Z [INFO] Registered service has been detected: service=<service>
2023-07-12T07:19:28.973Z [INFO] Connect initialization completed
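The init container check can also be scripted by extracting the services it reports as registered. This is a sketch under the assumption that the log lines keep the format shown above. The helper name registered_services is hypothetical.

```shell
# Hypothetical helper: print the service names that the init container
# reported as registered. Reads log text from standard input.
registered_services() {
  sed -n 's/.*Registered service has been detected: service=//p'
}

# Example usage (<namespace> and <pod> are placeholders):
#   kubectl -n <namespace> logs <pod> -c consul-connect-inject-init | registered_services
```

For a healthy pod, the output should contain both the application service and its sidecar proxy service.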
Resources
- CHANGELOG 0.49.5 (March 9, 2023)
- [0.49.x] remove livenessProbe from pods with preStop lifecycle hooks that delete ACL tokens #1914