Preface
The goal of this document is to list the steps needed to collect enough data for initial troubleshooting of the Consul Terminating Gateway in a K8s environment.
What is Consul Terminating Gateway
Terminating gateways are egress proxies that provide connectivity to external destinations by terminating mTLS connections, enforcing Consul intentions, and forwarding requests to the appropriate destination services.
Consul Terminating Gateway K8s deployment consists of two main components:
- terminating-gateway container - the data plane: an Envoy proxy that serves connections;
- consul-sidecar container - an ancillary container responsible for TGW service registration.
Details on configuring the Terminating Gateway can be found here: https://developer.hashicorp.com/consul/docs/k8s/connect/terminating-gateways
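To make the moving parts concrete, below is a minimal, hypothetical sketch of how an external destination ends up behind the gateway. It assumes a made-up external service example-https on example.com:443 and a gateway named terminating-gateway, and uses the raw HTTP API from a consul-server pod; in a real K8s deployment the config entry is usually managed via Helm/CRDs, and ACL/TLS options are omitted:
# Register the external destination as a catalog entry (hypothetical names).
cat > external.json <<'EOF'
{
  "Node": "external-example",
  "Address": "example.com",
  "NodeMeta": { "external-node": "true", "external-probe": "false" },
  "Service": { "ID": "example-https", "Service": "example-https", "Port": 443 }
}
EOF
curl -k -X PUT --data @external.json http://0:8500/v1/catalog/register
# Attach the service to the gateway with a terminating-gateway config entry.
cat > tgw.json <<'EOF'
{
  "Kind": "terminating-gateway",
  "Name": "terminating-gateway",
  "Services": [ { "Name": "example-https" } ]
}
EOF
curl -k -X PUT --data @tgw.json http://0:8500/v1/config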
Troubleshooting
Based on the deployment layout described above, there are several places from which useful data can be extracted.
consul-server
- Services configuration. This requires exec'ing into the consul-server pod [1][2][3]:
kubectl exec -it -n <consul_namespace> <any_consul-server_pod> -- sh
curl -k http://0:8500/v1/catalog/service/<external_service> | jq
curl -k http://0:8500/v1/catalog/service/<TGW_service> | jq
curl -k http://0:8500/v1/catalog/nodes | jq
curl -k http://0:8500/v1/catalog/service/<internal_app> | jq
curl -k http://0:8500/v1/health/connect/<external_service> | jq
curl -k http://0:8500/v1/connect/intentions | jq
curl -k http://0:8500/v1/config/service-router/<external_service> | jq
curl -k http://0:8500/v1/config/service-splitter/<external_service> | jq
curl -k http://0:8500/v1/config/service-resolver/<external_service> | jq
- Logs. By default, Consul servers log at the INFO level, which should be enough to see errors; the level can be raised to DEBUG via the Helm chart configuration [4]. Please collect logs from ALL Consul servers in the cluster. To view them, execute:
kubectl logs -n <consul_namespace> <consul-server_pod>
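For illustration, a minimal session with ACLs enabled could look like this (the namespace, pod, Helm release and service names below are placeholders):
kubectl exec -it -n consul consul-server-0 -- sh
curl -sk --header "X-Consul-Token: <ACL_token>" http://0:8500/v1/health/connect/example-https | jq '.[].Checks[] | {Name, Status}'
exit
# Optional: raise the server log level chart-wide (re-creates pods, see [4]).
helm upgrade consul hashicorp/consul -n consul --reuse-values --set global.logLevel=debug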
terminating-gateway pod
- Pod description.
kubectl describe pod -n <consul_namespace> <terminating_gateway_pod_name>
- Proxy configuration and logs. This requires enabling port-forwarding in order to curl the proxy's Envoy admin interface directly:
kubectl port-forward -n <consul_namespace> <terminating_gateway_pod_name> 19000:19000
In another terminal, please run:
curl localhost:19000/config_dump
curl localhost:19000/clusters
curl -X POST 'localhost:19000/logging?level=debug'
Reproduce the failing requests while DEBUG is active, then collect the logs and restore the level:
kubectl logs -n <consul_namespace> <terminating_gateway_pod_name> -c terminating-gateway
curl -X POST 'localhost:19000/logging?level=info'
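As a sketch, the dumps can also be redirected to files and then searched for the external service's Envoy cluster (names are placeholders):
curl -s localhost:19000/config_dump > tgw_config_dump.json
curl -s localhost:19000/clusters > tgw_clusters.txt
# The external service should appear as a cluster with resolved endpoints.
grep <external_service> tgw_clusters.txt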
internal-app pod
- Error. Please share the exact error message that the internal app throws.
- Pod description.
kubectl describe pod -n <internal_app_namespace> <internal_app_pod_name>
- Proxy configuration and clusters dump. This requires enabling port-forwarding in order to curl the proxy container directly:
kubectl port-forward -n <internal_app_namespace> <internal_app_pod_name> 19000:19000
In another terminal, please run:
curl localhost:19000/config_dump
curl localhost:19000/clusters
curl -X POST 'localhost:19000/logging?level=debug'
Reproduce the failing requests while DEBUG is active, then collect the logs and restore the level:
kubectl logs -n <internal_app_namespace> <internal_app_pod_name> -c envoy-sidecar
curl -X POST 'localhost:19000/logging?level=info'
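If the app reaches the external service through an explicit upstream on localhost (rather than transparent proxy), a quick probe from the app container can also help; the container name and port below are assumptions, with the port normally coming from the pod's consul.hashicorp.com/connect-service-upstreams annotation:
# Hypothetical connectivity probe through the local upstream listener.
kubectl exec -n <internal_app_namespace> <internal_app_pod_name> -c <app_container> -- curl -sv http://localhost:<upstream_port>/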
K8s Custom Resources
- CRDs.
kubectl get crd
- ServiceDefaults.
kubectl describe servicedefaults -n <consul_namespace>
kubectl describe servicedefaults -n <internal_app_namespace>
- ProxyDefaults.
kubectl describe proxydefaults -n <consul_namespace>
kubectl describe proxydefaults -n <internal_app_namespace>
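The consul-k8s controller records whether each CR was successfully synced to Consul, so it is worth checking the SYNCED column and, for a suspect entry, the Synced condition itself (a sketch; adjust names as needed):
kubectl get servicedefaults,proxydefaults -A
# Should print True; False is usually accompanied by a message explaining why.
kubectl get servicedefaults -n <internal_app_namespace> <servicedefaults_name> -o jsonpath='{.status.conditions[?(@.type=="Synced")].status}'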
Tcpdump (packet capture)
The ways to collect a packet capture on a K8s cluster vary depending on the environment and the security policies applied.
- Terminating Gateway. A tcpdump taken in the Terminating GW pod should include all ingress and egress traffic.
- Internal app. A tcpdump taken in the internal app pod should include all ingress, egress, and local (lo) traffic.
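One possible approach, assuming ephemeral containers are enabled on the cluster and a debug image such as nicolaka/netshoot is permitted, is to stream tcpdump from a debug container attached to the target pod:
# Capture TGW traffic into a local pcap file; stop with Ctrl-C (-i any also covers lo).
kubectl debug -i -q -n <consul_namespace> <terminating_gateway_pod_name> --image=nicolaka/netshoot --target=terminating-gateway -- tcpdump -i any -U -w - > tgw.pcap
# Same idea for the internal app pod.
kubectl debug -i -q -n <internal_app_namespace> <internal_app_pod_name> --image=nicolaka/netshoot --target=<app_container> -- tcpdump -i any -U -w - > app.pcap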
Apart from all the above, any data related to the setup configuration is also helpful. In particular:
- Setup details (on-prem K8s, OpenShift, EKS, etc.);
- Kubernetes version;
- <values>.yaml used for the Consul deployment;
- <external-service>.json, <terminating-gw>.hcl;
- Network diagram;
- HLD/use-case scenario.
Important!
Please collect Envoy logs and tcpdump at the same time as generating the failing/erroring requests, so the support team will be able to examine the processing flow.
________________
[1] The examples assume that the Consul HTTP API is available at http://localhost:8500 (the default). If your configuration differs, please adjust the protocol/port accordingly.
[2] If ACLs are enabled, please add --header "X-Consul-Token: <ACL_token>" (before the trailing | jq) to every curl command run on the consul-server.
[3] If Consul Namespaces are enabled, please add --header "X-Consul-Namespace: <Namespace>" (before the trailing | jq) to every curl command run on the consul-server.
[4] Changing the log level on the consul-server or api-gateway-controller side requires re-creating the pods, which can be undesirable in production.