The information contained in this article has been verified as up-to-date on the date of the original publication of the article. HashiCorp endeavors to keep this information up-to-date and correct, but it makes no representations or warranties of any kind, express or implied, about the ongoing completeness, accuracy, reliability, or suitability of the information provided.
All information contained in this article is for general information purposes only. Any reliance you place on such information as it applies to your use of your HashiCorp product is therefore strictly at your own risk.
Audience: Platform engineers and SREs running Consul Enterprise on Kubernetes
Purpose: Show three supported methods to send traffic to a healthy peer when a service becomes unavailable in the local cluster.
Introduction
This article outlines three distinct methods for platform engineers and SREs operating Consul Enterprise on Kubernetes to ensure traffic is directed to a healthy peer when a service in the local cluster experiences unavailability. By leveraging Consul's Sameness Groups, Service Resolvers, and Prepared Queries, you can implement robust failover strategies tailored to your specific needs.
Expected Outcome
After completing the procedures in this article, you will be able to configure Consul Enterprise on Kubernetes to use any of the following failover patterns:
Pattern | Configuration Location | Best For |
---|---|---|
A (Global Sameness Group Failover) | SamenessGroup CRD only (`defaultForFailover: true`) | Simple, cluster-wide failover, including DNS look-ups. |
B (Service-Specific Failover) | ServiceResolver CRD referencing a sameness group | Granular control: only certain services (or subsets) fail over. |
C (Prepared-Query Failover) | Prepared Query object referencing a sameness group | DNS / HTTP look-ups outside the mesh, or WAN-federated topologies. |
These configurations will enhance the resilience and availability of your applications running on Consul Enterprise across multiple Kubernetes clusters.
Prerequisites
- Consul Enterprise version 1.16 or later is installed and running on at least two Kubernetes clusters.
- Cluster peering has been successfully set up between the Kubernetes clusters.
- Each cluster hosts services with the same name and within the same namespace.
- You have `kubectl` access to all participating Kubernetes clusters.
- You possess a Consul ACL token with the necessary permissions to create and manage Consul configuration entries (Sameness Groups, ServiceResolvers, Prepared Queries).
Use Case
Consider a scenario where you have a critical "backend" service running across three Kubernetes clusters (`dc1`, `dc2`, and `dc3`) peered with Consul Enterprise. You want to ensure that if the "backend" service in your primary cluster (`dc1`) becomes unavailable, traffic is automatically routed to healthy instances in `dc2`, and subsequently to `dc3` if `dc2` also experiences issues. This article provides three different approaches to achieve this, catering to different levels of granularity and application access patterns.
Procedure
Fail-over Pattern Compatibility Matrix
Pattern ↓ / Pattern → | Pattern A: Global Sameness Group | Pattern B: ServiceResolver + Sameness Group | Pattern C: Prepared Query + Sameness Group |
---|---|---|---|
Pattern A: Global Sameness Group | n/a | ✅ Compatible. The ServiceResolver overrides the global rule for the specific service, while all other services still use the Global Sameness Group. DNS look-ups continue to follow the Global Sameness Group. | ✅ Compatible. Prepared Queries always use the Sameness Group referenced inside the query; they do not interfere with the global rule. |
Pattern B: ServiceResolver + Sameness Group | ✅ Compatible. The ServiceResolver takes precedence for that service; the Global Sameness Group applies elsewhere. | n/a | ✅ Compatible. Prepared Queries operate independently of ServiceResolvers. |
Pattern C: Prepared Query + Sameness Group | ✅ Compatible. Prepared Queries and the Global Sameness Group can coexist; each affects only its own lookup path. | ✅ Compatible. Prepared Queries operate independently of ServiceResolvers. | n/a |
Key points
- Precedence – When both a global sameness group (SG) with `defaultForFailover: true` and a `ServiceResolver.failover.samenessGroup` exist for the same service, the ServiceResolver wins for mesh traffic; DNS look-ups still honor the global SG.
- Prepared Queries are always independent; they only use the SG referenced in the query payload and do not interfere with mesh behaviour.
- All three patterns can coexist in the same cluster; choose the most specific object for each traffic path you need to control.
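The precedence rules above can be summarized as a small decision function. This is only an illustrative model of the key points, not a Consul API: the function name `governing_object` and its arguments are hypothetical, and it assumes exactly the three lookup paths discussed in this article (mesh, DNS, prepared query).

```shell
#!/usr/bin/env bash
# Illustrative model only (hypothetical helper, not part of Consul).
# governing_object <path> <has_resolver> <default_for_failover>
#   path:                 mesh | dns | query
#   has_resolver:         "true" if a ServiceResolver with failover.samenessGroup exists
#   default_for_failover: "true" if the sameness group sets defaultForFailover: true
governing_object() {
  local path="$1" has_resolver="$2" default_for_failover="$3"
  case "$path" in
    query)
      # Prepared Queries only use the sameness group named in their own payload.
      echo "prepared-query" ;;
    mesh)
      if [ "$has_resolver" = "true" ]; then
        echo "service-resolver"
      elif [ "$default_for_failover" = "true" ]; then
        echo "global-sameness-group"
      else
        echo "none"
      fi ;;
    dns)
      # DNS look-ups follow the global sameness group, not the ServiceResolver.
      if [ "$default_for_failover" = "true" ]; then
        echo "global-sameness-group"
      else
        echo "none"
      fi ;;
    *)
      echo "unknown path: $path" >&2
      return 1 ;;
  esac
}

governing_object mesh true true     # prints: service-resolver
governing_object dns true true      # prints: global-sameness-group
governing_object query false false  # prints: prepared-query
```

Reading the three calls together shows why the patterns can coexist: each lookup path is governed by exactly one object.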
Create a Sameness Group
Perform this step on every Kubernetes cluster participating in the failover:
- Save the following YAML content as `sameness-group.yaml`:

      apiVersion: consul.hashicorp.com/v1alpha1
      kind: SamenessGroup
      metadata:
        name: peering-group          # reuse this name later
      spec:
        defaultForFailover: false    # switched per pattern (see next sections)
        includeLocal: true           # prefer local partition first
        members:                     # order matters; failover follows this list
          - peer: dc2
          - peer: dc3

- Edit the `members` list in the `sameness-group.yaml` file on each cluster to reflect your specific environment and desired failover order. The order of peers in this list is crucial because it dictates the failover sequence.

- Apply the `sameness-group.yaml` file to the Consul namespace in each cluster using `kubectl`:

      kubectl --context $CLUSTER1 apply -f sameness-group.yaml -n consul
      kubectl --context $CLUSTER2 apply -f sameness-group.yaml -n consul
      # …repeat for each cluster
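Because the `members` list differs per cluster, it can help to generate the manifest instead of editing it by hand. The following is a minimal sketch, assuming each cluster refers to its peers by the names `dc1`, `dc2`, and `dc3` and that the local cluster should be left out of `members` (it is already covered by `includeLocal: true`). The helper `render_sameness_group` is hypothetical and not part of Consul or kubectl.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a SamenessGroup manifest whose members are
# every listed peer except the local one, in the order given.
# Usage: render_sameness_group <local-peer> <peer>...
render_sameness_group() {
  local local_peer="$1"; shift
  cat <<'EOF'
apiVersion: consul.hashicorp.com/v1alpha1
kind: SamenessGroup
metadata:
  name: peering-group
spec:
  defaultForFailover: false
  includeLocal: true
  members:
EOF
  local peer
  for peer in "$@"; do
    # Skip the local cluster; assumption: includeLocal already covers it.
    if [ "$peer" != "$local_peer" ]; then
      printf '    - peer: %s\n' "$peer"
    fi
  done
}

# Manifest as seen from dc1: members are dc2, then dc3.
render_sameness_group dc1 dc1 dc2 dc3
```

The output can be redirected to `sameness-group.yaml` for the corresponding cluster before running the `kubectl apply` commands shown above. Verify the generated peer names against the peering names actually configured in your environment.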
Pattern A — Global Failover for Every Service
- Edit the `sameness-group.yaml` file on every cluster and modify the `spec` section to:

      spec:
        defaultForFailover: true
        includeLocal: true    # prefer local partition first
        members:
          - peer: dc2
          - peer: dc3

- Re-apply the updated `sameness-group.yaml` file to the Consul namespace in every cluster:

      kubectl --context $CLUSTER1 apply -f sameness-group.yaml -n consul
      kubectl --context $CLUSTER2 apply -f sameness-group.yaml -n consul
      # …repeat for each cluster

What happens?
- Mesh traffic: When a service within the mesh has no healthy local instances, Envoy will automatically retry the request against the first healthy service instance found in the `peering-group` member list.
- DNS look-ups: DNS queries for services (e.g., `backend.service.consul`, `backend.virtual.consul`, `backend.default.svc.cluster.local`) will also resolve to the addresses of healthy instances in the failover order defined in the Sameness Group.
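The ordered selection described above can be pictured with a small model. This sketch is not how Consul or Envoy is implemented; it only illustrates "local first, then members in list order". The function `first_healthy_target` and the `name=status` input format are assumptions made for the example.

```shell
#!/usr/bin/env bash
# Illustrative model of ordered failover; the health values are made-up inputs.
# Usage: first_healthy_target <name=status>...   (status: passing | critical)
first_healthy_target() {
  local entry
  for entry in "$@"; do
    # ${entry#*=} is the status, ${entry%%=*} is the target name.
    if [ "${entry#*=}" = "passing" ]; then
      echo "${entry%%=*}"
      return 0
    fi
  done
  echo "none"
  return 1
}

first_healthy_target local=critical dc2=passing dc3=passing    # prints: dc2
first_healthy_target local=critical dc2=critical dc3=passing   # prints: dc3
```

The arguments are given in the same order as the sameness group: the local cluster first (because `includeLocal: true`), then `dc2`, then `dc3`.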
Pattern B — Service-Specific Failover with a ServiceResolver
- Ensure that the `defaultForFailover` field in your `sameness-group.yaml` remains set to `false`.

- On the source cluster (the cluster from which the calls to the failing service originate), create a file named `backend-resolver.yaml` with the following content:

      apiVersion: consul.hashicorp.com/v1alpha1
      kind: ServiceResolver
      metadata:
        name: backend
      spec:
        failover:
          '*':                         # applies to every subset
            samenessGroup: peering-group

- Apply this `backend-resolver.yaml` file to the Consul namespace on the source cluster:

      kubectl --context $SOURCE_CLUSTER apply -f backend-resolver.yaml -n consul

With this configuration, only the `backend` service will fail over according to the `peering-group`. All other services in the source cluster will continue to resolve to local instances unless you define additional ServiceResolvers for them.
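If several services need the same failover rule, the resolver manifest can be generated per service. The sketch below assumes every service uses the wildcard subset and the same sameness group; `render_failover_resolver` is a hypothetical helper and `payments` is a made-up second service name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a ServiceResolver that fails over to a sameness group.
# Usage: render_failover_resolver <service> [sameness-group]
render_failover_resolver() {
  local service="$1" group="${2:-peering-group}"
  cat <<EOF
apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceResolver
metadata:
  name: ${service}
spec:
  failover:
    '*':
      samenessGroup: ${group}
EOF
}

# Emit one YAML document per service, separated by '---'.
for svc in backend payments; do
  render_failover_resolver "$svc"
  echo '---'
done
```

The combined output could be piped to `kubectl --context $SOURCE_CLUSTER apply -n consul -f -`, which reads manifests from standard input. Services not listed in the loop keep resolving to local instances only.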
Pattern C — Prepared-Query Failover (DNS / API)
Note: Prepared Queries work for applications that query Consul directly (DNS or HTTP API).

- Save the following JSON content as `pq-backend.json`. Adjust the `Partition` and `Namespace` fields if your Consul setup uses custom partitions or namespaces:

      {
        "Name": "backend-query",
        "Service": {
          "Service": "backend",
          "SamenessGroup": "peering-group",
          "Partition": "default",
          "Namespace": "consul"
        }
      }

- From any Consul client pod within your Kubernetes clusters (or using `curl` from a machine with access to the Consul HTTP API), create the prepared query:

      curl --request POST --data @pq-backend.json "${CONSUL_HTTP_ADDR}/v1/query"

- You can now query this prepared query via DNS. From a pod in your cluster, you can use `dig`:

      dig @<pod-ip> -p 8600 backend-query.query.consul SRV

  Replace `<pod-ip>` with the IP address of a Consul client pod in your cluster. The resulting SRV record will point to the first healthy instance of the `backend` service found within the `peering-group` according to the defined failover order.
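When the partition or namespace varies between environments, the payload can be rendered from parameters rather than edited by hand. This is a minimal sketch; `render_prepared_query` is a hypothetical helper, and the `<service>-query` naming simply follows the `backend-query` example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a prepared-query payload for a sameness group.
# Usage: render_prepared_query <service> <sameness-group> [partition] [namespace]
render_prepared_query() {
  local service="$1" group="$2" partition="${3:-default}" namespace="${4:-consul}"
  cat <<EOF
{
  "Name": "${service}-query",
  "Service": {
    "Service": "${service}",
    "SamenessGroup": "${group}",
    "Partition": "${partition}",
    "Namespace": "${namespace}"
  }
}
EOF
}

# Same payload as pq-backend.json above.
render_prepared_query backend peering-group

# To create the query directly (not executed here; assumes ACLs are enabled
# and CONSUL_HTTP_TOKEN holds a token with sufficient permissions):
#   render_prepared_query backend peering-group |
#     curl --request POST --header "X-Consul-Token: ${CONSUL_HTTP_TOKEN}" \
#          --data @- "${CONSUL_HTTP_ADDR}/v1/query"
```

Passing the token in the `X-Consul-Token` header matters on ACL-enabled clusters, where an unauthenticated `POST` to `/v1/query` would be rejected.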
Export Services & Intentions (Required for Mesh Traffic)
If the target service that you are failing over to resides in a different cluster, you need to explicitly export the service and create service intentions to allow traffic from the source cluster. Perform these steps on each cluster that hosts the destination service:
- Export the service: Create a file (e.g., `export-backend.yaml`) with the following content. Adjust the `name` to match the partition name if you are not using the default partition:

      apiVersion: consul.hashicorp.com/v1alpha1
      kind: ExportedServices
      metadata:
        name: default    # or the partition name
      spec:
        services:
          - name: backend
            namespace: consul
            consumers:
              - samenessGroup: peering-group

  Apply this file:

      kubectl --context $DESTINATION_CLUSTER apply -f export-backend.yaml -n consul

- Create service intentions (allow rules): Create a file (e.g., `intentions-backend.yaml`) to allow traffic from the `peering-group` to the `backend` service:

      apiVersion: consul.hashicorp.com/v1alpha1
      kind: ServiceIntentions
      metadata:
        name: backend
      spec:
        destination:
          name: backend
          namespace: consul
        sources:
          - name: '*'
            namespace: consul
            action: allow
            samenessGroup: peering-group

  Apply this file:

      kubectl --context $DESTINATION_CLUSTER apply -f intentions-backend.yaml -n consul

Repeat these export and intention steps on every cluster where the `backend` service might be the target of a failover.
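Repeating the two `kubectl apply` commands across clusters can be scripted. The sketch below prints the commands by default so they can be reviewed first; the helper `apply_to_clusters`, the `DRY_RUN` variable, and the kubectl context names `dc2` and `dc3` are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: applies one manifest to several kubectl contexts.
# Prints the commands by default; set DRY_RUN=false to actually run them.
# Usage: apply_to_clusters <file> <context>...
apply_to_clusters() {
  local file="$1"; shift
  local ctx
  for ctx in "$@"; do
    if [ "${DRY_RUN:-true}" = "true" ]; then
      echo "kubectl --context ${ctx} apply -f ${file} -n consul"
    else
      kubectl --context "${ctx}" apply -f "${file}" -n consul
    fi
  done
}

# Review the commands for every cluster that may receive failover traffic.
apply_to_clusters export-backend.yaml dc2 dc3
apply_to_clusters intentions-backend.yaml dc2 dc3
```

Running the script with `DRY_RUN=false` executes the same commands instead of printing them.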
Additional Information
Troubleshooting Quick-Checks
Additional Resources
- Failover with Sameness Groups – Consul documentation
- Sameness groups configuration reference – field definitions (`defaultForFailover`, `includeLocal`)
- ServiceResolver failover – per-service configuration examples
- Prepared Query API – JSON schema and usage
Need help?
Open a ticket with the Consul Support team and include:
- Sameness Group, ServiceResolver, or Prepared Query YAML/JSON.
- Output of `kubectl get samenessgroups,serviceresolvers,exportedservices -A -o yaml`.
- The failing DNS or proxy request logs.