Preface
This document lists the steps for collecting enough data for initial troubleshooting of Consul API Gateway in a Kubernetes (K8s) environment.
What is Consul API Gateway
Consul API Gateway is HashiCorp's implementation of the open-standard Kubernetes Gateway API. It serves as the entry point for ingress connections and routes them to pods inside the K8s cluster. The standard defines several K8s objects that together make up the gateway configuration. The most important are Gateway, HTTPRoute/TCPRoute and ReferenceGrant. Gateway configures the listener that faces ingress connections. HTTPRoute/TCPRoute defines the routing logic. ReferenceGrant allows cross-namespace references, for example when a Route forwards traffic to a backend in another namespace.
The Consul API Gateway solution has two main components:
- api-gateway-controller deployment - created automatically with the rest of the Consul components (if enabled in the values.yaml);
- api-gateway deployment - created by api-gateway-controller based on the supplied Gateway configuration object.
api-gateway-controller should be located in the same namespace as the rest of the Consul pods.
api-gateway can be created in any namespace.
Troubleshooting
Based on the components described above, useful data can be collected from several places.
client
- Screenshot of the error. Please share the image or text of the error you see on the client side.
consul-server
- Services configuration. This requires exec'ing into the consul-server pod [1][2][3]:
kubectl exec -it -n <consul_namespace> <consul-server_pod> -- sh
curl localhost:8500/v1/catalog/services | jq
curl localhost:8500/v1/catalog/service/<API_GW_service> | jq
curl localhost:8500/v1/catalog/service/<Destination_App> | jq
curl localhost:8500/v1/config/service-router/<Destination_App> | jq
curl localhost:8500/v1/config/service-splitter/<Destination_App> | jq
curl localhost:8500/v1/config/service-resolver/<Destination_App> | jq
curl localhost:8500/v1/health/connect/<Destination_App> | jq
curl localhost:8500/v1/connect/intentions | jq
curl localhost:8500/v1/status/leader | jq # Note the leader IP
- Logs. By default the server logs at INFO level, which should be enough to see errors. The level can be raised to DEBUG through the Helm chart configuration [4]. To view the logs, run:
kubectl get pods -n <consul_namespace> -o wide | grep <consul_leader_IP> # Note the pod name
kubectl logs -n <consul_namespace> <consul_leader_pod_name>
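The consul-server queries above can be wrapped in a small helper so that the headers from footnotes [2] and [3] are added only when needed. The following is a minimal sketch meant to be pasted into the shell inside the consul-server pod. The function names consul_get and leader_ip and the variables CONSUL_TOKEN, CONSUL_NS and CONSUL_ADDR are hypothetical names chosen for illustration; they are not part of Consul.

```shell
# Sketch only: helper names and variables below are illustrative assumptions.

# consul_get PATH: query the Consul HTTP API, adding the ACL token and
# namespace headers only when CONSUL_TOKEN / CONSUL_NS are set.
consul_get() {
  _path="$1"
  set -- -s
  if [ -n "${CONSUL_TOKEN:-}" ]; then
    set -- "$@" --header "X-Consul-Token: ${CONSUL_TOKEN}"
  fi
  if [ -n "${CONSUL_NS:-}" ]; then
    set -- "$@" --header "X-Consul-Namespace: ${CONSUL_NS}"
  fi
  curl "$@" "${CONSUL_ADDR:-localhost:8500}${_path}"
}

# leader_ip: /v1/status/leader returns a JSON string such as "10.0.1.5:8300".
# Strip the quotes and the port to get the IP (assumes an IPv4 address).
leader_ip() {
  consul_get /v1/status/leader | tr -d '"' | cut -d: -f1
}
```

With the functions defined, set CONSUL_TOKEN=<ACL_token> if ACLs are enabled and then run, for example, consul_get /v1/connect/intentions | jq. The value printed by leader_ip is the IP to look for in the kubectl get pods -o wide output.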
api-gateway-controller
- Pod details.
kubectl describe pod -n <consul_namespace> <api_gateway_controller_pod_name>
- Logs. By default the controller logs at INFO level, which should be enough to see errors. The level can be raised to DEBUG through the Helm chart configuration [4]. To view the logs, run:
kubectl logs -n <consul_namespace> <api_gateway_controller_pod_name>
api-gateway pod [5]
- Pod details.
kubectl describe pod -n <api_gateway_namespace> <api_gateway_pod_name>
- Proxy configuration and logs. This requires port-forwarding so that you can curl the proxy container directly:
kubectl port-forward -n <api_gateway_namespace> <api_gateway_pod_name> 19000:19000
In another terminal please run:
curl localhost:19000/config_dump
curl localhost:19000/clusters
curl -X POST 'localhost:19000/logging?level=debug'
kubectl logs -n <api_gateway_namespace> <api_gateway_pod_name> -c consul-api-gateway
curl -X POST 'localhost:19000/logging?level=info'
app pod
- Pod description.
kubectl describe pod -n <app_namespace> <app_pod_name>
- Proxy configuration and logs. This requires port-forwarding so that you can curl the proxy container directly:
kubectl port-forward -n <app_namespace> <app_pod_name> 19000:19000
In another terminal please run:
curl localhost:19000/config_dump
curl localhost:19000/clusters
curl -X POST 'localhost:19000/logging?level=debug'
kubectl logs -n <app_namespace> <app_pod_name> -c envoy-sidecar
curl -X POST 'localhost:19000/logging?level=info'
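The port-forward workflow used above for both the api-gateway pod and the app pod can be scripted so that the debug window is bounded and the output lands in files that are easy to attach. The following is a minimal sketch. It assumes kubectl port-forward to port 19000 is already running in another terminal, and the function name envoy_debug_capture, the ENVOY_ADMIN variable and the output file names are hypothetical.

```shell
# Sketch only: function name, ENVOY_ADMIN and file names are illustrative.
# Assumes "kubectl port-forward ... 19000:19000" is running in another terminal.

# envoy_debug_capture NAMESPACE POD CONTAINER [SECONDS]
# Saves config_dump and clusters, enables debug logging for SECONDS
# (default 60), saves the logs from that window, then restores info level.
envoy_debug_capture() {
  ns="$1"; pod="$2"; container="$3"; seconds="${4:-60}"
  admin="${ENVOY_ADMIN:-localhost:19000}"

  curl -s "${admin}/config_dump" > "${pod}-config_dump.json"
  curl -s "${admin}/clusters" > "${pod}-clusters.txt"

  curl -s -X POST "${admin}/logging?level=debug" > /dev/null
  echo "Debug logging on: reproduce the failing request within ${seconds}s" >&2
  sleep "$seconds"

  kubectl logs -n "$ns" "$pod" -c "$container" --since="${seconds}s" > "${pod}-envoy.log"
  curl -s -X POST "${admin}/logging?level=info" > /dev/null
}
```

For the api-gateway pod this would be called as envoy_debug_capture <api_gateway_namespace> <api_gateway_pod_name> consul-api-gateway, and for the app pod as envoy_debug_capture <app_namespace> <app_pod_name> envoy-sidecar. Files are written to the current directory.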
Kubernetes Gateway API objects
- Gateways.
kubectl describe gateway -n <api_gateway_namespace>
- HTTPRoutes or TCPRoutes.
kubectl describe httproutes -n <api_gateway_namespace>
kubectl describe httproutes -n <app_namespace>
OR
kubectl describe tcproutes -n <api_gateway_namespace>
kubectl describe tcproutes -n <app_namespace>
- ReferenceGrants.
kubectl describe referencegrants -n <app_namespace>
Other K8s objects
- CRDs.
kubectl get crd
- ServiceDefaults.
kubectl describe servicedefaults -n <api_gateway_namespace>
kubectl describe servicedefaults -n <app_namespace>
- ProxyDefaults.
kubectl describe proxydefaults -n <api_gateway_namespace>
kubectl describe proxydefaults -n <app_namespace>
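The describe commands in the two sections above can be run in one pass over every relevant namespace. The following is a minimal sketch; the function name describe_all and the output file layout are hypothetical. If a resource type is not installed in the cluster (for example TCPRoute), the kubectl error message is saved to the corresponding file instead.

```shell
# Sketch only: function name and file layout are illustrative.

# describe_all OUTDIR NAMESPACE...
# Writes one file per namespace and resource kind, plus the CRD list.
describe_all() {
  outdir="$1"; shift
  mkdir -p "$outdir"
  for ns in "$@"; do
    for kind in gateway httproutes tcproutes referencegrants servicedefaults proxydefaults; do
      kubectl describe "$kind" -n "$ns" > "${outdir}/${ns}-${kind}.txt" 2>&1
    done
  done
  kubectl get crd > "${outdir}/crds.txt" 2>&1
}
```

A typical call would be describe_all ./gw-objects <api_gateway_namespace> <app_namespace>.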
Tcpdump (packet capture)
The way to collect a packet capture on a K8s cluster varies depending on the environment and the security policies applied.
- API Gateway. A tcpdump taken on the API GW pod should include all ingress and egress traffic.
Additional information
Apart from all the above, any data related to the setup configuration is also helpful. In particular:
- Setup details (on-prem K8s, OpenShift, EKS etc.);
- Kubernetes version;
- <values>.yaml used for Consul deployment;
- <Gateway>.yaml, <HTTPRoute>.yaml, <ReferenceGrant>.yaml;
- Network diagram;
- HLD/use case scenario.
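If the original files are not at hand, part of the list above can be exported from the running cluster. The following is a minimal sketch under the assumption that Consul was installed with Helm; the function name export_setup, the release name argument and the output file names are hypothetical. Helm values can contain sensitive data, so review the output before sharing it.

```shell
# Sketch only: assumes a Helm-based Consul install; names are illustrative.

# export_setup OUTDIR CONSUL_NAMESPACE HELM_RELEASE NAMESPACE...
# Saves the Kubernetes version, the user-supplied Helm values and the
# Gateway API objects (as YAML) from each listed namespace.
export_setup() {
  outdir="$1"; consul_ns="$2"; release="$3"; shift 3
  mkdir -p "$outdir"
  kubectl version > "${outdir}/k8s-version.txt" 2>&1
  helm get values "$release" -n "$consul_ns" -o yaml > "${outdir}/values.yaml"
  for ns in "$@"; do
    for kind in gateway httproutes tcproutes referencegrants; do
      kubectl get "$kind" -n "$ns" -o yaml > "${outdir}/${ns}-${kind}.yaml" 2>&1
    done
  done
}
```

A typical call would be export_setup ./setup <consul_namespace> <helm_release_name> <api_gateway_namespace> <app_namespace>.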
Important!
Please collect the Envoy logs and the tcpdump while the failing requests are being generated, so that the support team can examine the processing flow.
________________
[1] In this article we assume that the Consul HTTP API is available at http://localhost:8500 (default). If you have a different configuration, please adjust the protocol/port accordingly.
[2] If ACLs are enabled, please add --header "X-Consul-Token: <ACL_token>" before the trailing | jq in every curl command run on consul-server.
[3] If Consul Namespaces are enabled, please add --header "X-Consul-Namespace: <Namespace>" before the trailing | jq in every curl command run on consul-server.
[4] Changing the log level on consul-server or api-gateway-controller requires re-creating the pods, which can be undesirable in production.
[5] The api-gateway deployment is named after the Gateway K8s object. The pods within that deployment have names prefixed with the deployment name.