Configuring Failover Between Peered Consul-K8s Clusters with Sameness Groups – HashiCorp Help Center

The information contained in this article has been verified as up-to-date on the date of the original publication of the article. HashiCorp endeavors to keep this information up-to-date and correct, but it makes no representations or warranties of any kind, express or implied, about the ongoing completeness, accuracy, reliability, or suitability of the information provided.

All information contained in this article is for general information purposes only. Any reliance you place on such information as it applies to your use of your HashiCorp product is therefore strictly at your own risk.

Audience: Platform engineers and SREs running Consul Enterprise on Kubernetes
Purpose: Show three supported methods to send traffic to a healthy peer when a service becomes unavailable in the local cluster.

Introduction

This article describes three methods that platform engineers and SREs operating Consul Enterprise on Kubernetes can use to direct traffic to a healthy peer when a service in the local cluster becomes unavailable. Using Consul's Sameness Groups, Service Resolvers, and Prepared Queries, you can choose the failover strategy that fits each traffic path you need to control.

Expected Outcome

After completing the procedures in this article, you will be able to configure failover in your Consul Enterprise on Kubernetes environment using any of the following patterns:

Pattern	Configuration Location	Best For
A (Global Sameness Group Failover)	`SamenessGroup` CRD only (`defaultForFailover: true`)	Simple, cluster-wide failover—including DNS look-ups.
B (Service-Specific Failover)	`ServiceResolver` CRD referencing a sameness group	Granular control—only certain services (or subsets) fail over.
C (Prepared-Query Failover)	Prepared Query object referencing a sameness group	DNS / HTTP look-ups outside the mesh, or WAN-federated topologies.

These configurations will enhance the resilience and availability of your applications running on Consul Enterprise across multiple Kubernetes clusters.

Prerequisites

Consul Enterprise version 1.16 or later is installed and running on at least two Kubernetes clusters.
Cluster peering has been successfully set up between the Kubernetes clusters.
Each cluster hosts services with the same name and within the same namespace.
You have kubectl access to all participating Kubernetes clusters.
You have a Consul ACL token with the permissions needed to create and manage Consul configuration entries (Sameness Groups, ServiceResolvers, ExportedServices, ServiceIntentions) and Prepared Queries.

Use Case

Consider a scenario where you have a critical "backend" service running across three Kubernetes clusters (dc1, dc2, and dc3) peered with Consul Enterprise. You want to ensure that if the "backend" service in your primary cluster (dc1) becomes unavailable, traffic is automatically routed to healthy instances in dc2, and subsequently to dc3 if dc2 also experiences issues. This article provides three different approaches to achieve this, catering to different levels of granularity and application access patterns.

Procedure

Fail-over Pattern Compatibility Matrix

Pattern ↓ / Pattern →	Pattern A: Global Sameness Group (`defaultForFailover: true`)	Pattern B: ServiceResolver + Sameness Group	Pattern C: Prepared Query + Sameness Group
Pattern A: Global Sameness Group (`defaultForFailover: true`)	N/A	✅ Compatible. The ServiceResolver overrides the global rule for the specific service, while all other services still use the Global Sameness Group. DNS look-ups continue to follow the Global Sameness Group.	✅ Compatible. Prepared Queries always use the Sameness Group referenced inside the query; they do not interfere with the global rule.
Pattern B: ServiceResolver + Sameness Group	✅ Compatible. The ServiceResolver takes precedence for that service; Global Sameness Group applies elsewhere.	N/A	✅ Compatible. Prepared Queries operate independently of ServiceResolvers.
Pattern C: Prepared Query + Sameness Group	✅ Compatible. Prepared Queries and the Global Sameness Group can coexist; each affects only its own lookup path.	✅ Compatible. Prepared Queries operate independently of ServiceResolvers.	N/A

Key points

Precedence – When both a global SG (defaultForFailover:true) and a ServiceResolver.failover.samenessGroup exist for the same service, ServiceResolver wins for mesh traffic; DNS look-ups still honor the global SG.
Prepared Queries are always independent; they only use the SG referenced in the query payload and do not interfere with mesh behaviour.
All three patterns can coexist in the same cluster; choose the most specific object for each traffic path you need to control.
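
The precedence rules above can be restated as a small shell function. This is only a sketch of the rules as this article describes them, not Consul's own logic; the function name `failover_source` and its yes/no arguments are hypothetical and exist purely for illustration.

```shell
# Hypothetical helper: report which object governs failover for one lookup.
#   $1  traffic path: mesh | dns | query
#   $2  "yes" if a ServiceResolver with failover.samenessGroup exists for the service
#   $3  "yes" if the sameness group sets defaultForFailover: true
failover_source() {
  case "$1" in
    query)
      # Prepared queries only use the sameness group named in the query.
      echo "prepared-query"
      ;;
    mesh)
      if [ "$2" = "yes" ]; then
        echo "service-resolver"        # the most specific object wins
      elif [ "$3" = "yes" ]; then
        echo "global-sameness-group"
      else
        echo "local-only"
      fi
      ;;
    dns)
      # DNS look-ups ignore the ServiceResolver and follow the global rule.
      if [ "$3" = "yes" ]; then
        echo "global-sameness-group"
      else
        echo "local-only"
      fi
      ;;
    *)
      echo "unknown path: $1" >&2
      return 1
      ;;
  esac
}

failover_source mesh yes yes   # prints: service-resolver
failover_source dns yes yes    # prints: global-sameness-group
```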

Create a Sameness Group

Perform this step on every Kubernetes cluster participating in the failover:

Save the following YAML content as sameness-group.yaml:

apiVersion: consul.hashicorp.com/v1alpha1
kind: SamenessGroup
metadata:
  name: peering-group          # reuse this name later
spec:
  defaultForFailover: false    # switched per pattern (see next sections)
  includeLocal: true           # prefer local partition first
  members:                     # order matters; failover follows this list
    - peer: dc2
    - peer: dc3

Edit the members list in sameness-group.yaml separately for each cluster so that it lists that cluster's own peers in the desired failover order; the example above is what dc1 would use. The order of peers in this list is crucial because it dictates the failover sequence.

Apply each cluster's version of sameness-group.yaml to the consul namespace in that cluster using kubectl:

kubectl --context $CLUSTER1 apply -f sameness-group.yaml -n consul
kubectl --context $CLUSTER2 apply -f sameness-group.yaml -n consul
# …repeat for each cluster
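
Because each cluster lists only its peers, the members block differs from cluster to cluster. The following sketch prints the block for one cluster from a single ordered list of clusters. The function name `render_members` is hypothetical, and the sketch assumes each peering connection is named after the remote cluster (dc1, dc2, dc3), as in the use case above; adjust it if your peer names differ.

```shell
# Hypothetical helper: print the members block for one cluster.
# $1 is the local cluster; the remaining arguments are all clusters in
# the desired failover order. Assumes peer names match cluster names.
render_members() {
  local_cluster="$1"
  shift
  echo "  members:"
  for cluster in "$@"; do
    # The local cluster is covered by includeLocal, so skip it here.
    if [ "$cluster" != "$local_cluster" ]; then
      echo "    - peer: $cluster"
    fi
  done
}

render_members dc2 dc1 dc2 dc3
```

For dc2 this prints a members block containing the dc1 and dc3 peers, in that order, which you can paste under spec in that cluster's copy of sameness-group.yaml.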

Pattern A — Global Failover for Every Service

Edit the sameness-group.yaml file on every cluster and modify the spec section to:

spec:
  defaultForFailover: true
  includeLocal: true           # prefer local partition first
  members:
    - peer: dc2
    - peer: dc3

Re-apply the updated sameness-group.yaml file to the Consul namespace in every cluster:
```
kubectl --context $CLUSTER1 apply -f sameness-group.yaml -n consul
kubectl --context $CLUSTER2 apply -f sameness-group.yaml -n consul
# …repeat for each cluster
```
What happens?
- Mesh traffic: When a service within the mesh has no healthy local instances, Envoy will automatically retry the request against the first healthy service instance found in the peering-group member list.
- DNS look-ups: DNS queries for services (e.g., backend.service.consul, backend.virtual.consul, backend.default.svc.cluster.local) will also resolve to the addresses of healthy instances in the failover order defined in the Sameness Group.
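
The selection order described above can be pictured with a simplified model. This is not how Envoy or Consul implement failover internally; it only illustrates "first location with healthy instances wins". The function name `first_healthy` and the `location=count` argument format are hypothetical.

```shell
# Simplified model: print the first location that has healthy instances.
# Each argument is "<location>=<healthy-instance-count>", listed in
# failover order (local first, because includeLocal is true).
first_healthy() {
  for entry in "$@"; do
    location=${entry%%=*}
    healthy=${entry#*=}
    if [ "$healthy" -gt 0 ]; then
      echo "$location"
      return 0
    fi
  done
  echo "none"
  return 1
}

# No healthy instances locally or in dc2, two in dc3.
first_healthy local=0 dc2=0 dc3=2   # prints: dc3
```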

Pattern B — Service-Specific Failover with a ServiceResolver

Ensure that the defaultForFailover field in your sameness-group.yaml remains set to false.

On the source cluster (the cluster from which the calls to the failing service originate), create a file named backend-resolver.yaml with the following content:

apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceResolver
metadata:
  name: backend
spec:
  failover:
    '*':  # applies to every subset
      samenessGroup: peering-group

Apply this backend-resolver.yaml file to the Consul namespace on the source cluster:
```
kubectl --context $SOURCE_CLUSTER apply -f backend-resolver.yaml -n consul
```
With this configuration, only the backend service will now fail over according to the peering-group. All other services in the source cluster will continue to resolve to local instances unless you define additional ServiceResolvers for them.
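
If several services need the same treatment, one way to avoid copying the file by hand is to generate the manifest. The sketch below prints a ServiceResolver equivalent to backend-resolver.yaml for any service name. The function name `render_resolver` and the service name `frontend` are hypothetical examples.

```shell
# Hypothetical helper: print a ServiceResolver manifest for one service.
# $1 = service name, $2 = sameness group name (defaults to peering-group).
render_resolver() {
  service="$1"
  group="${2:-peering-group}"
  cat <<EOF
apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceResolver
metadata:
  name: $service
spec:
  failover:
    '*':
      samenessGroup: $group
EOF
}

render_resolver frontend
```

The output can be piped to kubectl, for example `render_resolver frontend | kubectl --context $SOURCE_CLUSTER apply -n consul -f -`.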

Pattern C — Prepared-Query Failover (DNS / API)

Note: Prepared Queries work for applications that query Consul directly (DNS or HTTP API); they do not affect service mesh traffic.

Save the following JSON content as pq-backend.json. Adjust the Partition and Namespace fields if your Consul setup uses custom partitions or namespaces:

{
  "Name": "backend-query",
  "Service": {
    "Service": "backend",
    "SamenessGroup": "peering-group",
    "Partition": "default",
    "Namespace": "consul"
  }
}

From any Consul client pod within your Kubernetes clusters (or using curl from a machine with access to the Consul HTTP API), create the prepared query:
```
curl --request POST --header "X-Consul-Token: ${CONSUL_HTTP_TOKEN}" --data @pq-backend.json "${CONSUL_HTTP_ADDR}/v1/query"
```
You can now execute the prepared query through DNS. From a pod in your cluster, use dig:
```
dig @<pod-ip> -p 8600 backend-query.query.consul SRV
```
Replace <pod-ip> with the IP address of a Consul client pod in your cluster. The resulting SRV record will point to the first healthy instance of the backend service found within the peering-group according to the defined failover order.
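
Applications that use the HTTP API instead of DNS can execute the same query through the prepared query execute endpoint. The following is a minimal sketch; the function names are hypothetical, and it assumes CONSUL_HTTP_ADDR and CONSUL_HTTP_TOKEN are set in the environment as they were when the query was created.

```shell
# Sketch: build the execute URL for a prepared query, addressed by name.
query_execute_url() {
  echo "${CONSUL_HTTP_ADDR}/v1/query/$1/execute"
}

# Sketch: execute the query and print the raw JSON response.
# Assumes CONSUL_HTTP_TOKEN holds a token allowed to read the service.
execute_query() {
  curl --silent --header "X-Consul-Token: ${CONSUL_HTTP_TOKEN}" \
    "$(query_execute_url "$1")"
}
```

Running `execute_query backend-query` returns JSON describing the instances the query resolved to.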

Export Services & Intentions (Required for Mesh Traffic)

If the target service that you are failing over to resides in a different cluster, you need to explicitly export the service and create service intentions to allow traffic from the source cluster. Perform these steps on each cluster that hosts the destination service:

Export the service: Create a file (e.g., export-backend.yaml) with the following content. Adjust the name to match the partition name if you are not using the default partition:

apiVersion: consul.hashicorp.com/v1alpha1
kind: ExportedServices
metadata:
  name: default # or the partition name
spec:
  services:
    - name: backend
      namespace: consul
      consumers:
        - samenessGroup: peering-group

Apply this file:

kubectl --context $DESTINATION_CLUSTER apply -f export-backend.yaml -n consul

Create service intentions (allow rules): Create a file (e.g., intentions-backend.yaml) to allow traffic from the peering-group to the backend service:

apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceIntentions
metadata:
  name: backend
spec:
  destination:
    name: backend
    namespace: consul
  sources:
    - name: '*'
      namespace: consul
      action: allow
      samenessGroup: peering-group

Apply this file:

kubectl --context $DESTINATION_CLUSTER apply -f intentions-backend.yaml -n consul

Repeat these export and intention steps on every cluster where the backend service might be the target of a failover.
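
After applying both files, it can help to confirm that Consul accepted them. The sketch below reads the sync status that consul-k8s records on each custom resource. It assumes the resources expose a status condition of type `Synced`, and the function names are hypothetical; verify the condition name against your consul-k8s version.

```shell
# Hypothetical helper: print the Synced condition for one resource.
# $1 = kubectl context, $2 = resource kind, $3 = resource name.
synced_status() {
  kubectl --context "$1" get "$2" "$3" -n consul \
    -o jsonpath='{.status.conditions[?(@.type=="Synced")].status}'
}

# Check the export and the intention created in this section.
# $1 = kubectl context of the destination cluster.
check_backend_exports() {
  for item in exportedservices/default serviceintentions/backend; do
    kind=${item%%/*}
    name=${item#*/}
    echo "$kind/$name synced=$(synced_status "$1" "$kind" "$name")"
  done
}
```

Run `check_backend_exports "$DESTINATION_CLUSTER"` on each destination cluster; a value other than True suggests the resource was not written to Consul.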

Additional Information

Troubleshooting Quick-Checks

Symptom	Likely Cause	Fix
`SERVFAIL` from DNS	`defaultForFailover` not set to `true`	Edit the Sameness Group and re-apply it to all clusters.
Mesh calls never leave the local cluster	Missing `ServiceResolver` or incorrect subset name	Ensure a `ServiceResolver` exists for the service (Pattern B) or switch to Pattern A.
Prepared Query returns empty	Partition/namespace mismatch or incorrect ACL token	Verify JSON fields in the Prepared Query and use a token with read permissions on the remote partition.
Traffic loops back locally	`includeLocal: false` but local partition appears first in `members`	Remove the local partition from the `members` list or set `includeLocal: true`.

Additional Resources

Failover with Sameness Groups – Consul documentation
Sameness groups configuration reference – field definitions (defaultForFailover, includeLocal)
ServiceResolver failover – per-service configuration examples
Prepared Query API – JSON schema and usage

Need help?
Open a ticket with the Consul Support team and include:

Sameness Group, ServiceResolver, or Prepared Query YAML/JSON.
Output of kubectl get samenessgroups,serviceresolvers,exportedservices -A -o yaml.
The failing DNS or proxy request logs.
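
To gather the kubectl output listed above for each cluster, a small wrapper such as the following can be used. It is only a convenience sketch; the function name `collect_failover_config` and the default output file name are hypothetical.

```shell
# Hypothetical helper: save the failover-related resources of one cluster.
# $1 = kubectl context, $2 = output file (optional).
collect_failover_config() {
  ctx="$1"
  out="${2:-failover-config-$ctx.yaml}"
  kubectl --context "$ctx" get \
    samenessgroups,serviceresolvers,exportedservices -A -o yaml > "$out"
  # Print the file name so it can be attached to the ticket.
  echo "$out"
}
```

For example, `collect_failover_config "$CLUSTER1"` writes the output for the first cluster to a file named after that context.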
