The information contained in this article has been verified as up-to-date on the date of the original publication of the article. HashiCorp endeavors to keep this information up-to-date and correct, but it makes no representations or warranties of any kind, express or implied, about the ongoing completeness, accuracy, reliability, or suitability of the information provided.
All information contained in this article is for general information purposes only. Any reliance you place on such information as it applies to your use of your HashiCorp product is therefore strictly at your own risk.
Introduction
This article describes an issue where Kubernetes service pods get stuck in the Init:CrashLoopBackOff state due to errors while applying traffic redirection rules. This often occurs when integrating Kubernetes with external Consul servers and using IPVS mode for kube-proxy.
Problem
When deploying services within a Kubernetes cluster integrated with external Consul servers, pods may fail to start and remain in the Init:CrashLoopBackOff state. The init container logs will likely show errors in applying traffic redirection rules. This prevents the service from becoming available and functioning correctly.
- The Consul Kubernetes components (webhook/injector) appear to be running:
default consul-connect-injector-5645d5b8f4-dq4cx 1/1 Running 0 74m
default consul-webhook-cert-manager-d445986b4-z9m92 1/1 Running 0 74m
- When a new service pod is deployed/registered, the volumes are mounted correctly, but the pod itself goes into an Init:CrashLoopBackOff state.
- In the service pod log, the following error is present (example commands for retrieving these logs follow this list):
2023-10-10T16:30:16.102Z [INFO] Registered service has been detected: service=static-server-sidecar-proxy
2023-10-10T16:30:16.173Z [ERROR] error applying traffic redirection rules:
err=
| failed to run command: /sbin/iptables -t nat -N CONSUL_PROXY_INBOUND, err: exit status 3, output: modprobe: can't change directory to '/lib/modules': No such file or directory
| iptables v1.8.8 (legacy): can't initialize iptables table `nat': Table does not exist (do you need to insmod?)
| Perhaps iptables or your kernel needs to be upgraded.
- firewalld has been disabled, and the error occurs only when Consul is installed in Kubernetes.
- The external Consul UI does display the service, but it is marked as failing all service checks.
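The messages above come from the init container that Consul injects into the pod. As a rough sketch for retrieving them (the init container name consul-connect-inject-init is the consul-k8s default and may differ in your deployment; the pod name and namespace are placeholders):

kubectl get pods --all-namespaces | grep consul
kubectl logs <pod-name> -n <namespace> -c consul-connect-inject-init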
Environmental Information
The error in this article was identified in the setup described below; however, it may also occur in other configurations.
- Consul Server 1.15.3+ent on a virtual machine running AlmaLinux 8
- Kubernetes cluster running consul-k8s 1.1.2, with 1 controller and 2 worker nodes
- kube-proxy mode: IPVS (not iptables)
Cause
The root cause of this issue is a conflict between Consul's transparent proxy feature and the use of IPVS mode for kube-proxy in Kubernetes. Consul's transparent proxy relies on iptables rules to intercept and redirect traffic. However, when kube-proxy is running in IPVS mode, iptables rules are not used for service routing; IPVS uses its own mechanism, bypassing iptables and preventing Consul's transparent proxy from functioning as expected. This conflict leads to the "error applying traffic redirection rules" errors and the subsequent Init:CrashLoopBackOff state.
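One way to confirm which mode kube-proxy is using is to inspect its configuration or logs. The commands below are a sketch that assumes a kubeadm-style cluster where kube-proxy is configured via the kube-proxy ConfigMap and labelled k8s-app=kube-proxy; adjust the names for your distribution:

kubectl -n kube-system get configmap kube-proxy -o yaml | grep -i mode
kubectl -n kube-system logs -l k8s-app=kube-proxy | grep -i proxier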
Overview of possible solutions
Solution
The solution is to disable Consul's transparent proxy feature. This can be achieved by setting the connectInject.transparentProxy.defaultEnabled configuration option to false. This setting prevents Consul from attempting to use iptables-based redirection, resolving the conflict with IPVS.
Steps:
- Modify Consul Configuration: Update your Consul configuration to include the following:
connectInject:
  transparentProxy:
    defaultEnabled: false
- Apply Configuration Change: Apply the updated Consul configuration. The exact method depends on how you manage your Consul on Kubernetes deployment (for example, by running helm upgrade with the updated values file, or via the consul-k8s CLI).
- Redeploy Affected Pods: After applying the configuration change, redeploy the Kubernetes pods that were previously stuck in the Init:CrashLoopBackOff state. This allows them to start correctly without encountering the traffic redirection errors. Example commands for these steps are sketched after this list.
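As a minimal sketch of the apply and redeploy steps, assuming Consul was installed from the official Helm chart under the release name consul in the consul namespace, the updated values live in values.yaml, and the affected workload is a Deployment named static-server (all of these names are assumptions; substitute your own):

helm upgrade consul hashicorp/consul --namespace consul --values values.yaml
kubectl rollout restart deployment/static-server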
Verification
After redeploying the pods, verify that they are running correctly. You can check the pod status using kubectl get pods and examine the logs of the init container to ensure there are no further errors. Additionally, check the Consul UI to confirm that the service is registered and healthy, with all health checks passing.
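For example, assuming the service registered as static-server and a Consul server reachable on port 8500 (the server address, port, ACL token, and service name are placeholders), the health checks can also be queried through Consul's HTTP API:

kubectl get pods
curl --header "X-Consul-Token: <token>" http://<consul-server>:8500/v1/health/checks/static-server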
Outcome
Following the steps in this article, you should expect the following outcome:
- Resolved CrashLoopBackOff: The Init:CrashLoopBackOff state for the affected pods will be resolved.
- Functional Service: The service will be functional and accessible within the cluster.
- Consul Confirmation: The Consul UI will display the service as running and healthy.
If, after implementing the solution, you still encounter issues:
- Double-check Configuration: Verify that the connectInject.transparentProxy.defaultEnabled setting is correctly set to false in your Consul configuration (one way to confirm the applied Helm values is sketched after this list).
- Examine Pod Logs: Use kubectl logs <pod-name> to check for any remaining errors in the pod's logs.
- Consul Logs: Review the Consul server logs for any related error messages.
- Network Connectivity: Ensure that there are no network connectivity issues between the Kubernetes nodes and the Consul servers.
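As a sketch, one way to confirm that the setting was applied to the Helm release (the release name consul and namespace consul are assumptions):

helm get values consul --namespace consul | grep -A 2 transparentProxy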
Additional Information
- HashiCorp Discuss board reference: How can I configure Consul for k8S in IPVS mode
- Transparent proxy overview