Introduction
Problem
When using an Envoy sidecar, some downstream applications may encounter an immediate 503 error when trying to connect to an upstream. The logged error message reads: no healthy host for HTTP connection pool.
Prerequisites
- Access to Envoy's logs and Admin Interface
  - Review the article Accessing and Setting Envoy Logs for Consul for additional information
Cause
This issue arises from Envoy's outlier detection feature. By default, when Envoy detects 5 consecutive 5xx errors from a host, it ejects that host from the load balancing pool. The host stays ejected for a base duration of 30 seconds, and that duration grows each time the same host is ejected again. If no other healthy hosts are available during that window, requests fail immediately with the 503 error described above.
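The ejection behavior described above can be illustrated with a small model. This is a simplified sketch, not Envoy's implementation: the `Host` class and the `record_response` and `route` functions are hypothetical names, the ejection window is fixed at 30 seconds, and settings such as the maximum ejection percentage are ignored.

```python
from dataclasses import dataclass
from typing import Sequence

CONSECUTIVE_5XX = 5          # default threshold cited in this article
BASE_EJECTION_SECONDS = 30   # default ejection duration cited in this article


@dataclass
class Host:
    """Hypothetical stand-in for one upstream endpoint."""
    address: str
    consecutive_5xx: int = 0
    ejected_until: float = 0.0

    def is_healthy(self, now: float) -> bool:
        # A host is usable again once its ejection window has passed.
        return now >= self.ejected_until


def record_response(host: Host, status: int, now: float) -> None:
    """Track the 5xx streak and eject the host when the threshold is reached."""
    if 500 <= status <= 599:
        host.consecutive_5xx += 1
        if host.consecutive_5xx >= CONSECUTIVE_5XX:
            host.ejected_until = now + BASE_EJECTION_SECONDS
            host.consecutive_5xx = 0
    else:
        # Any non-5xx response resets the streak.
        host.consecutive_5xx = 0


def route(hosts: Sequence[Host], now: float) -> str:
    """Return the first healthy host, or the error a downstream would see."""
    for host in hosts:
        if host.is_healthy(now):
            return host.address
    return "503 no healthy host"


if __name__ == "__main__":
    only_host = Host("10.0.0.1:8080")  # placeholder address
    for second in range(5):
        record_response(only_host, 503, now=float(second))
    print(route([only_host], now=5.0))   # inside the ejection window
    print(route([only_host], now=40.0))  # after the window has passed
```

With a single upstream host, five failing responses are enough to leave the pool empty, which is why the downstream sees the 503 immediately rather than the upstream's own error.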
- Envoy's debug logs contain the following message:
  [debug] envoy.upstream(x) host <ip:port> in cluster <upstream_cluster> was ejected by the outlier detector
- Envoy's stats endpoint contains this entry:
  <upstream_cluster>.outlier_detection.ejections_consecutive_5xx: <positive_number>
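To check the stats entry across many clusters at once, the output of the stats endpoint can be scanned with a short script. The sketch below is illustrative only: `ejected_clusters` and `fetch_stats` are hypothetical helper names, and the admin address `http://localhost:19000` is an assumption that should be replaced with wherever your Envoy Admin Interface is bound.

```python
import re
from typing import Dict
from urllib.request import urlopen

# Matches lines such as:
#   cluster.<upstream_cluster>.outlier_detection.ejections_consecutive_5xx: 3
_EJECTION_STAT = re.compile(
    r"^(?:cluster\.)?(?P<cluster>.+)\.outlier_detection\."
    r"ejections_consecutive_5xx:\s*(?P<count>\d+)\s*$"
)


def ejected_clusters(stats_text: str) -> Dict[str, int]:
    """Map each cluster name to its positive consecutive-5xx ejection count."""
    result: Dict[str, int] = {}
    for line in stats_text.splitlines():
        match = _EJECTION_STAT.match(line.strip())
        if match and int(match.group("count")) > 0:
            result[match.group("cluster")] = int(match.group("count"))
    return result


def fetch_stats(admin_url: str = "http://localhost:19000") -> str:
    """Read outlier detection stats from the Admin Interface (address assumed)."""
    url = f"{admin_url}/stats?filter=outlier_detection"
    with urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    for cluster, count in ejected_clusters(fetch_stats()).items():
        print(f"{cluster}: {count} ejection(s) for consecutive 5xx")
```

Any cluster printed by this script has had at least one host ejected for consecutive 5xx responses since the counters were last reset.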
Solutions
- Verify Upstream Host Health
  - Ensure that the upstream host is healthy and responsive.
  - Check for any issues or errors on the host that might be causing it to return 5xx errors.
- Adjust Envoy Configuration with Consul Config Entries
  - If you want to prevent hosts from being ejected due to consecutive 5xx errors, you may update the Consul Service Defaults config entry by setting PassiveHealthCheck.EnforcingConsecutive5xx to 0.
  - Example:
    ---
    apiVersion: consul.hashicorp.com/v1alpha1
    kind: ServiceDefaults
    metadata:
      name: <name>
      namespace: <namespace>
    spec:
      upstreamConfig:
        defaults:
          passiveHealthCheck:
            enforcingConsecutive5xx: 0
Outcome
By following the solutions mentioned above, users should be able to mitigate the immediate 503 errors caused by Envoy's outlier detection feature. This ensures smoother upstream connections without interruptions due to host ejections.
If the issue persists after following the above setup and troubleshooting steps, please contact HashiCorp support and provide all of the troubleshooting data you have collected.