The information contained in this article has been verified as up-to-date on the date of the original publication of the article. HashiCorp endeavors to keep this information up-to-date and correct, but it makes no representations or warranties of any kind, express or implied, about the ongoing completeness, accuracy, reliability, or suitability of the information provided.
All information contained in this article is for general information purposes only. Any reliance you place on such information as it applies to your use of your HashiCorp product is therefore strictly at your own risk.
Introduction
This guide demonstrates how to use cloud `auto-join` (Consul version 1.14.x and above) instead of a LoadBalancer for a Consul cluster on Kubernetes that connects to external servers hosted on VMs. Cloud `auto-join` provides fault tolerance similar to a LoadBalancer by selecting nodes with specific tags in a random sequence, using the `go-netaddr` library.
In earlier Consul versions (before 1.14.x), cloud `auto-join` used the `go-addr` library, which selected nodes sequentially. If the first server node was down, the installation of the entire set of Kubernetes cluster components would fail.
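For reference, a cloud `auto-join` address is a tag-based selector string rather than a fixed IP. Below is a minimal sketch of the AWS form used later in this guide; the tag key and value are illustrative, so substitute the tags applied to your own Consul server instances.

```yaml
# Hypothetical values.yaml fragment: matches any EC2 instance tagged Server=true
externalServers:
  enabled: true
  hosts:
    - 'provider=aws tag_key=Server tag_value=true'
```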
Use-Case
In Consul version 1.14.x, the client agent has been removed from Consul K8s. When Consul servers run outside the Kubernetes cluster, it has been observed that the `consul-dataplane` container (application sidecar) is configured with only a single server IP address:
```yaml
...
- args:
  - -addresses
  - 10.162.34.52
  - -grpc-port=8503
  - -proxy-service-id-path=/consul/connect-inject/proxyid
...
```
It appears from the output that the `consul-dataplane` container simply takes the first IP address listed in the Helm `values.yaml` file:

```yaml
...
externalServers:
  enabled: true
  hosts: ["10.162.34.52","10.162.34.53","10.162.34.54"]
...
```
What happens if the server node defined in the `consul-dataplane` argument fails? How will this be handled, and what impact will it have on the meshed application?

Possible Solutions
- [Recommended] We can place these cluster nodes behind a LoadBalancer and use its IP/DNS in `externalServers.hosts` in the Helm chart. This provides benefits such as health probes, meaning requests are always sent to a healthy node where the Consul service is running.
  - However, using a LoadBalancer also requires configuring `core-dns` through its `configMap` to forward Consul domain queries to the LB endpoint that fronts the Consul server agent VMs (see the ConfigMap sketch after this list).
- For the use case above, starting with Consul version 1.14.x, we can leverage cloud `auto-join` to achieve similar functionality to a LoadBalancer. Cloud `auto-join` will randomly select a Consul server agent and continue attempting to connect to other nodes if the initially selected node is down.
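If you choose the LoadBalancer option, the CoreDNS forwarding could look like the following sketch. This is a minimal, hypothetical example: `10.162.34.100` is a placeholder for the LB address, and it assumes the LB forwards DNS traffic to the Consul DNS port (default 8600) on the server VMs.

```yaml
# Sketch of the coredns ConfigMap in kube-system; only the additional consul
# stanza is shown - keep your cluster's existing default Corefile content.
# 10.162.34.100 is a placeholder LB address.
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    # ... existing default server block (.:53 { ... }) retained ...
    consul:53 {
        errors
        cache 30
        forward . 10.162.34.100
    }
```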
Procedure
In a lab setup on AWS:
- Consul version `1.13.9` and Helm chart `0.49.8`
- 3 Consul servers on EC2 VMs
- Clients and other components (such as controller, mesh-gateway, etc.) on an EKS cluster
- Instead of creating an LB, we used cloud `auto-join` as shown below in `values.yaml` to let the servers hosted on EC2 discover each other using the same `tag_key` and `tag_value`.
  - Please ensure proper networking and routing is in place between the EKS cluster and the EC2 instances.
```yaml
...
externalServers:
  enabled: true
  hosts:
    - 'provider=aws tag_key=Server tag_value=true'
  k8sAuthMethodHost: 'https://16E7A4DAC528AE39C477031B4732DF12.gr7.ap-south-1.eks.amazonaws.com' # Address of the Kubernetes API server
client:
  enabled: true
  join:
    - 'provider=aws tag_key=Server tag_value=true'
  exposeGossipPorts: true
...
```
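Note that the AWS provider for cloud `auto-join` discovers instances through the EC2 API, so the nodes or pods performing the lookup need IAM permission to describe instances, and the chart should not run servers in-cluster. Below is a minimal sketch of the surrounding values, assuming the servers are managed entirely on the EC2 VMs; these values are assumptions, not taken from the lab `values.yaml` above.

```yaml
# Sketch only - surrounding values assumed for an external-server deployment.
global:
  enabled: true
  datacenter: dc1   # must match the datacenter configured on the EC2 servers
server:
  enabled: false    # servers run on the EC2 VMs, not inside the EKS cluster
# The AWS provider of cloud auto-join calls ec2:DescribeInstances, so the EKS
# worker nodes (or the pod IAM role) need that permission.
```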
With the above `1.13.x` setup, the Consul clients used cloud `auto-join` via `go-addr`, which picked servers from `externalServers.hosts` sequentially; if the first server node was down, the installation of the entire set of K8s cluster components would fail.

However, starting with version `1.14.x`, since there is no dependency on Consul clients, cloud `auto-join` utilizes the `go-netaddr` library. This library discovers the addresses in `externalServers.hosts` in a random sequence, returning the IP address of a functioning Consul server.
To test this, upgrade the cluster to `1.14.9` using the guides for server upgrade on VMs and K8s cluster upgrade to Dataplane. Then, switch the cloud `auto-join` mechanism from `go-addr` to `go-netaddr` by using the `exec=discover` form of the address; the `exec=` prefix tells `consul-dataplane` to run the given command and use the IP addresses it prints:
```yaml
...
externalServers:
  enabled: true
  hosts:
    - 'exec=discover -q addrs provider=aws tag_key=Server tag_value=true'
  k8sAuthMethodHost: 'https://16E7A4DAC528AE39C477031B4732DF12.gr7.ap-south-1.eks.amazonaws.com' # Address of the Kubernetes API server
client:
  enabled: true
  join:
    - 'exec=discover -q addrs provider=aws tag_key=Server tag_value=true'
  exposeGossipPorts: true
...
```
With the `1.14.x` setup, we can verify this by stopping the first of the three server instances and re-performing a fresh installation of the `consul-k8s` cluster. The K8s components running `consul-dataplane` discovered the other server instances, and the entire cluster installation was successful.
```
...
consul-dataplane:
  Container ID:  containerd://080705a71b7e9784ccbde18cd6762c2fe9e6c5c757ab02b65c8bad2035f16b82
  Image:         hashicorp/consul-dataplane:1.0.5
  Image ID:      docker.io/hashicorp/consul-dataplane@sha256:b5a7e0f22dec65a90d2c3aff338c661e04d423c0060e77887ad5759f7e2b7b6b
  Port:          <none>
  Host Port:     <none>
  Args:
    -addresses
    exec=discover -q addrs provider=aws tag_key=Server tag_value=true
    -grpc-port=8502
    -proxy-service-id-path=/consul/connect-inject/proxyid
    -log-level=info
...
```
Reference