Introduction
This article walks you through general troubleshooting steps for `proxy-defaults` or any other Consul CRD errors found in `consul-controller` or `consul-connect-injector` logs.
Expected results
Changes to a Consul cluster on K8s, whether from upgrading the Helm chart or from making configuration changes in the `values.yaml` file, should not cause the CRDs to throw errors in the consul-controller/connect-injector logs.
For a Consul K8s cluster, config entries should be managed by Kubernetes itself. Config entries that already exist in Consul must be migrated into Kubernetes custom resources in order to manage them from Kubernetes.
Troubleshooting
Scenario 1
During a fresh cluster installation or a cluster upgrade, we sometimes see the following error in the controller logs for a CRD that the Helm chart did not populate correctly.
```
2023-09-10T09:12:59.781Z ERROR pkg/mod/k8s.io/client-go@v0.22.2/tools/cache/reflector.go:167: Failed to watch *v1alpha1.ProxyDefaults: failed to list *v1alpha1.ProxyDefaults: the server could not find the requested resource (get proxydefaults.consul.hashicorp.com)
2023-09-10T09:12:59.981Z "error": "no matches for kind \"ProxyDefaults\" in version \"consul.hashicorp.com/v1alpha1\""
```
If we list all CRDs managed by K8s, we can see that a CRD is missing; in the output below, `proxydefaults.consul.hashicorp.com` is absent.
```
NAME                                       CREATED AT
addons.k3s.cattle.io                       2023-08-26T09:50:17Z
helmcharts.helm.cattle.io                  2023-08-26T09:50:17Z
helmchartconfigs.helm.cattle.io            2023-08-26T09:50:17Z
traefikservices.traefik.containo.us        2023-08-26T09:51:38Z
ingressroutes.traefik.containo.us          2023-08-26T09:51:38Z
middlewares.traefik.containo.us            2023-08-26T09:51:38Z
tlsstores.traefik.containo.us              2023-08-26T09:51:38Z
tlsoptions.traefik.containo.us             2023-08-26T09:51:38Z
ingressroutetcps.traefik.containo.us       2023-08-26T09:51:38Z
middlewaretcps.traefik.containo.us         2023-08-26T09:51:38Z
ingressrouteudps.traefik.containo.us       2023-08-26T09:51:38Z
serverstransports.traefik.containo.us      2023-08-26T09:51:38Z
exportedservices.consul.hashicorp.com      2023-09-10T09:10:02Z
ingressgateways.consul.hashicorp.com       2023-09-10T09:10:02Z
meshes.consul.hashicorp.com                2023-09-10T09:10:02Z
servicedefaults.consul.hashicorp.com       2023-09-10T09:10:03Z
serviceintentions.consul.hashicorp.com     2023-09-10T09:10:03Z
serviceresolvers.consul.hashicorp.com      2023-09-10T09:10:03Z
servicerouters.consul.hashicorp.com        2023-09-10T09:10:03Z
servicesplitters.consul.hashicorp.com      2023-09-10T09:10:03Z
terminatinggateways.consul.hashicorp.com   2023-09-10T09:10:03Z
```
Solutions:
To resolve this, upgrade the cluster using:

```
helm upgrade consul hashicorp/consul --values <values_file> --version <version> --wait --debug
```

After the upgrade, the `proxydefaults` CRD should be present.
```
NAME                                       CREATED AT
addons.k3s.cattle.io                       2023-08-26T09:50:17Z
helmcharts.helm.cattle.io                  2023-08-26T09:50:17Z
helmchartconfigs.helm.cattle.io            2023-08-26T09:50:17Z
traefikservices.traefik.containo.us        2023-08-26T09:51:38Z
ingressroutes.traefik.containo.us          2023-08-26T09:51:38Z
middlewares.traefik.containo.us            2023-08-26T09:51:38Z
tlsstores.traefik.containo.us              2023-08-26T09:51:38Z
tlsoptions.traefik.containo.us             2023-08-26T09:51:38Z
ingressroutetcps.traefik.containo.us       2023-08-26T09:51:38Z
middlewaretcps.traefik.containo.us         2023-08-26T09:51:38Z
ingressrouteudps.traefik.containo.us       2023-08-26T09:51:38Z
serverstransports.traefik.containo.us      2023-08-26T09:51:38Z
exportedservices.consul.hashicorp.com      2023-09-10T09:10:02Z
ingressgateways.consul.hashicorp.com       2023-09-10T09:10:02Z
meshes.consul.hashicorp.com                2023-09-10T09:10:02Z
servicedefaults.consul.hashicorp.com       2023-09-10T09:10:03Z
serviceintentions.consul.hashicorp.com     2023-09-10T09:10:03Z
serviceresolvers.consul.hashicorp.com      2023-09-10T09:10:03Z
servicerouters.consul.hashicorp.com        2023-09-10T09:10:03Z
servicesplitters.consul.hashicorp.com      2023-09-10T09:10:03Z
terminatinggateways.consul.hashicorp.com   2023-09-10T09:10:03Z
proxydefaults.consul.hashicorp.com         2023-09-10T09:16:26Z
```
Scenario 2
After upgrading the cluster to helm chart >= `1.0.0`, you might see the following error messages in the connect-injector logs (the consul-controller functionality has been merged into connect-injector itself).
```
ERROR controller.proxydefaults sync status unknown {"request": "default/global", "error": "updating config entry in consul: Unexpected response code: 403 (Permission denied: token with AccessorID '4ab33d4c-a69c-0d6b-62ad-ce03f6631fd8' lacks permission 'mesh:write')"}
ERROR Reconciler error {"controller": "proxydefaults", "controllerGroup": "consul.hashicorp.com", "controllerKind": "ProxyDefaults", "ProxyDefaults": {"name":"global","namespace":"default"}, "namespace": "default", "name": "global", "reconcileID": "ced19f2c-4662-4e97-9180-837bf249bc71", "error": "updating config entry in consul: Unexpected response code: 403 (Permission denied: token with AccessorID '4ab33d4c-a69c-0d6b-62ad-ce03f6631fd8' lacks permission 'mesh:write')"}
ERROR Reconciler error {"controller": "proxydefaults", "controllerGroup": "consul.hashicorp.com", "controllerKind": "ProxyDefaults", "ProxyDefaults": {"name":"global","namespace":"default"}, "namespace": "default", "name": "global", "reconcileID": "841ec3cf-141e-4e3d-92ef-7bcf276b61fb", "error": "Operation cannot be fulfilled on proxydefaults.consul.hashicorp.com \"global\": the object has been modified; please apply your changes to the latest version and try again"}
```
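To see which policies are attached to the token named in the `403` error, you can read it with the Consul CLI. This is a sketch: the AccessorID below is the placeholder value from the log above, and the command assumes you have CLI access to a server with a management token.

```shell
# Inspect the token reported in the error to confirm which policies/roles
# it carries (the AccessorID here is the example value from the log).
consul acl token read -id 4ab33d4c-a69c-0d6b-62ad-ce03f6631fd8
```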
Also, after the upgrade the status of the CRD could show the synced state as `Unknown`:

```
$ kubectl get proxydefaults
NAME     SYNCED    LAST SYNCED   AGE
global   Unknown   0s            134m
```
Solutions:
- If ACLs are enabled, outdated ACL tokens will persist as a result of the upgrade. You can manually delete these tokens to declutter your Consul environment. Outdated connect-injector tokens have the following description: `token created via login: {"component":"connect-injector"}`. If you upgrade to helm chart >= 1.0.0, do not delete the tokens whose description includes a `pod` key, for example `token created via login: {"component":"connect-injector","pod":"default/consul-connect-injector-576b65747c-9547x"}`; the dataplane-enabled connect-inject pods use these tokens. You can also review the creation date of the tokens and delete only the injector tokens created before your upgrade, but do not delete all old tokens without confirming they are no longer in use. Some tokens, such as the server tokens, are still necessary.
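The token cleanup above can be sketched with the Consul CLI. This assumes CLI access with a management token; the AccessorID in the delete command is a placeholder, and the `jq` filter is just one way to surface descriptions for review.

```shell
# List all ACL tokens with their descriptions so you can pick out the
# outdated connect-injector ones (requires a management token).
consul acl token list -format=json | jq -r '.[] | "\(.AccessorID)\t\(.Description)"'

# Delete an outdated injector token by its AccessorID (placeholder shown).
# Do NOT delete tokens whose description contains a "pod" key on chart >= 1.0.0.
consul acl token delete -id 4ab33d4c-a69c-0d6b-62ad-ce03f6631fd8
```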
- You may also perform a rolling restart of the `connect-injector` (for chart >= 1.0.0) or the `controller` (for chart < 1.0.0), so that a new ACL token is generated with the proper privileges/permissions.
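The rolling restart can be done with `kubectl rollout`. This sketch assumes the release is installed in a `consul` namespace with the default deployment name `consul-connect-injector`; adjust both to match your installation.

```shell
# Rolling restart of the injector so it re-logs-in and gets a fresh ACL token
# (deployment and namespace names are assumptions; adjust for your release).
kubectl rollout restart deployment/consul-connect-injector -n consul
kubectl rollout status deployment/consul-connect-injector -n consul
```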
- You might also see the above error if the `server-acl-init` job did not complete successfully. This prevents all server pods from upgrading successfully and leaves pods such as the controller and connect-injector in `CrashLoopBackOff`, because the token rotation did not take effect. Therefore, validate that the server upgrade completed smoothly.
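To check whether the job finished, you can inspect it directly. The job and namespace names below are assumptions based on the default Helm release name; adjust them for your environment.

```shell
# Confirm the ACL bootstrap job completed, and review its logs if it did not
# (job/namespace names are assumptions; adjust for your release).
kubectl get jobs -n consul
kubectl logs job/consul-server-acl-init -n consul
```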
Scenario 3
Apart from upgrades, the controller/connect-injector can throw the error `config entry already exists in Consul` even when listing the CRD in K8s returns nothing. This means the config entry exists in Consul itself and is not currently managed by K8s.
```
ERROR controller.proxydefaults Reconciler error {"reconciler group": "consul.hashicorp.com", "reconciler kind": "ProxyDefaults", "name": "global", "namespace": "default", "error": "config entry already exists in Consul"}
```
```
$ kubectl get proxydefaults
No resources found
```
Solutions
- Restart the controller pod, or the connect-injector pod for chart >= 1.0.0.
- Grab the config entry (`proxy-defaults` in this example) from the consul-server using `consul config read -kind proxy-defaults -name <proxy-default-name>`, then create the corresponding Kubernetes custom resource for the config entry, making sure its values match and that the following annotation is set on the resource:

```
annotations:
  'consul.hashicorp.com/migrate-entry': 'true'
```
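As a sketch, a migrated `proxy-defaults` entry might look like the custom resource below. The `spec` contents here are illustrative only; mirror whatever `consul config read` returned for your entry.

```yaml
# Hypothetical example of a ProxyDefaults custom resource that adopts an
# existing Consul config entry; the spec values must match the entry in Consul.
apiVersion: consul.hashicorp.com/v1alpha1
kind: ProxyDefaults
metadata:
  name: global
  annotations:
    'consul.hashicorp.com/migrate-entry': 'true'
spec:
  meshGateway:
    mode: local   # illustrative value; copy from `consul config read` output
```

Apply it with `kubectl apply -f <file>` and the controller should adopt the existing entry instead of reporting a conflict.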
- Follow the controller logs to check for any `ERROR` entries.
- Run `kubectl get proxydefaults.consul.hashicorp.com` to confirm the entry is in `SYNCED` state.
- For detailed instructions on the above steps, please refer to the link.
```
$ kubectl get proxydefaults
NAME     SYNCED   LAST SYNCED   AGE
global   True     58s           30m
```
Conclusion
By following the above troubleshooting steps, you should be able to resolve these CRD-related errors in a consul-k8s setup. Other scenarios are possible, but most of these errors arise from cluster upgrades, where a chart change can alter functionality and produce errors like those highlighted above. This troubleshooting guidance applies to all Consul CRDs and to config entries (in the case of their migration to K8s).
Additional References