The information contained in this article has been verified as up-to-date on the date of the original publication of the article. HashiCorp endeavors to keep this information up-to-date and correct, but it makes no representations or warranties of any kind, express or implied, about the ongoing completeness, accuracy, reliability, or suitability of the information provided.

All information contained in this article is for general information purposes only. Any reliance you place on such information as it applies to your use of your HashiCorp product is therefore strictly at your own risk.

Background

When attempting to remove a Datacenter (DC) from the federation, issues may arise with Consul contacting the removed DC, especially in scenarios where two or more federated DCs exist.

Consul employs a caching mechanism to retain LAN and WAN membership information for agents within each DC connected to the federation.

Typically, addressing this issue involves adjusting the reconnect_timeout_wan parameter. This parameter is the WAN equivalent of the reconnect_timeout parameter and determines how long it takes for a failed server to be completely removed from the WAN pool.

Solution

Please note that the warnings/errors mentioned herein are not expected to disrupt the functionality of Consul, and no further action is required. By default, the caching mechanism refreshes automatically every 72 hours, after which these log messages stop appearing on their own.

Nonetheless, if the presence of these messages in the logs triggers unnecessary alerts, such as those related to Consul Health Checks or Monitoring, or if they excessively occupy log space, there are practical measures available to address and mitigate these concerns.

The number of secondary DCs in the environment will determine which options are available.

In a federated environment with two or more secondary clusters, there is currently no way to stop the messages immediately. The only option is to shorten the cache time by adding the reconnect_timeout_wan parameter and setting it to its minimum value of 8 hours. This is the fastest available way to clear these messages in such a federated setup.
- See Add the reconnect_timeout_wan Parameter
In a scenario with just two federated datacenters (one primary and one secondary), where the reconnect_timeout_wan adjustment described above is not adequate, you can intentionally interrupt the connection between the datacenters on port 8302. You then use the consul force-leave command to remove the specified member.
- See Removing WAN Federation between TWO Consul Clusters

Examples of the Log Messages

2022-11-09T15:15:23.222Z [WARN] agent.server.rpc: RPC request for DC is currently failing as no path was found: datacenter=prod-floky-v1-1 method=Internal.ServiceDump

RPC failed to server: method=Internal.ServiceDump server=100.x.x.x:8300 error="rpc error making call: No path to datacenter"

Recommendations

Before implementing this in a production environment, it is highly recommended to conduct testing in a controlled environment such as a sandbox or staging setup. This practice serves to validate seamless functionality and mitigate any potential issues that may be specific to your environment, ensuring a smoother transition to production.

Add the reconnect_timeout_wan Parameter

Add the reconnect_timeout_wan parameter to the configuration file for all the server nodes in each datacenter (for example, setting it to 8 hours)
```
reconnect_timeout_wan = "8h"
```
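Before restarting anything, it can help to confirm that every server's configuration file actually carries the new setting. The following is a minimal sketch, not an official tool: the function name has_reconnect_timeout_wan and the example path are hypothetical, and the pattern only checks that the key is present, not that its value is valid.

```shell
# Sketch: succeed if a Consul config file sets reconnect_timeout_wan.
# Matches both HCL (key = "8h") and JSON ("key": "8h") styles.
# Assumption: the key sits at the start of a line, optionally indented.
has_reconnect_timeout_wan() {
  grep -Eq '^[[:space:]]*"?reconnect_timeout_wan"?[[:space:]]*[=:]' "$1"
}

# Example (hypothetical path):
#   has_reconnect_timeout_wan /etc/consul.d/server.hcl && echo "set" || echo "missing"
```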
Initiate a phased restart of the primary servers, starting with the followers and ending with the leader
- Note: Starting with Consul v1.15.x, you can transfer leadership to one of the followers using the consul operator raft transfer-leader command
  - Upon the successful reintegration of one of the followers into the cluster as a voter, you may transfer the leadership to one of your choosing
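The followers-first, leader-last order described above can be derived from the output of consul operator raft list-peers. The sketch below is illustrative only: restart_order is a hypothetical helper, and it assumes the default table layout in which the first column is the node name and the fourth column is the Raft state.

```shell
# Sketch: read `consul operator raft list-peers` output on stdin and print
# node names in restart order: every non-leader first, the leader last.
# Assumption: column 1 is Node and column 4 is State; line 1 is the header.
restart_order() {
  awk 'NR > 1 && $4 != "leader" { print $1 }
       NR > 1 && $4 == "leader" { leader = $1 }
       END { if (leader != "") print leader }'
}

# Example:
#   consul operator raft list-peers | restart_order
```

Re-run it after each restart, since leadership can change while servers are cycling.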
Follow your process for removing the secondary DC
Confirm the status of the specified secondary data center by executing the following command to ensure it is indicated as "left" or "failed"
```
consul members -wan
```
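On a large federation the member list can be long, so filtering it down to the removed datacenter makes the status easier to read. This is a minimal sketch under two assumptions: WAN members are listed as <node>.<datacenter> in the first column, and the status is in the third column. The name wan_members_in_dc is hypothetical.

```shell
# Sketch: read `consul members -wan` output on stdin and print
# "<member> <status>" for every member whose name ends in ".<datacenter>".
# $1 = datacenter name. Line 1 (the header) is skipped.
wan_members_in_dc() {
  awk -v suffix=".$1" 'NR > 1 {
    start = length($1) - length(suffix) + 1
    if (start > 0 && substr($1, start) == suffix) print $1, $3
  }'
}

# Example (dc2 is a placeholder datacenter name):
#   consul members -wan | wan_members_in_dc dc2
```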
Examine the logs after the designated period to ensure that error/warning messages no longer appear
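One way to make this log check repeatable is to count the warnings for the removed datacenter in a saved log file. The sketch below assumes the message format shown in the examples earlier in this article; count_no_path_warnings and the log path are hypothetical.

```shell
# Sketch: count "no path was found" warnings for one datacenter in a log file.
# $1 = datacenter name, $2 = path to a Consul log file.
# The datacenter is matched as a whole field (datacenter=<name>) so that
# "dc1" does not also match "dc10".
count_no_path_warnings() {
  awk -v dc="datacenter=$1" '
    index($0, "no path was found") {
      for (i = 1; i <= NF; i++) if ($i == dc) { n++; break }
    }
    END { print n + 0 }' "$2"
}

# Example (hypothetical log path):
#   count_no_path_warnings prod-floky-v1-1 /var/log/consul/consul.log
```

A count that stays at zero for new log entries after the timeout has elapsed suggests the stale WAN entries have been cleared.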

Removing WAN Federation between TWO Consul Clusters

NOTE: WAN traffic will experience disruption during this procedure, but all internal cluster traffic should remain unaffected.

Use iptables rules to reject all WAN gossip traffic on port 8302 between the two WAN federated clusters. This causes each cluster to mark the nodes in the other cluster as failed.
- Attempt this with root access:
```
sudo iptables -A OUTPUT -p tcp --dport 8302 -j REJECT
sudo iptables -A INPUT -p tcp --dport 8302 -j REJECT
sudo iptables -A OUTPUT -p udp --dport 8302 -j REJECT
sudo iptables -A INPUT -p udp --dport 8302 -j REJECT
```
- If you’re working remotely via SSH, you might need to open port 22
  - -I inserts it before all other rules in INPUT
```
iptables -I INPUT -p tcp --dport 22 -j ACCEPT
```
    - If your SSH service is listening on another port, you’ll have to use that port instead of 22
After the clusters have been cleanly separated, remove the retry_join_wan parameter from the configuration file on each Consul node
- Parameter example
```
retry_join_wan = ["dc2-server-1", "dc2-server-2", "dc2-server-3"]
```
- Note: The value can contain IPv4, IPv6, or DNS addresses.
Reboot each node to update these values
Run the force-leave CLI command to separate the two WAN federated clusters cleanly
```
consul force-leave [options] node
```
- If you have ACLs enabled and need to pass a token, use the -token=<value> in the options before specifying the node name
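When several servers of the other datacenter are marked as failed, the force-leave commands can be drafted from the WAN member list rather than typed by hand. The following sketch only prints the commands for review and does not run them. It assumes WAN members are listed as <node>.<datacenter> with the status in the third column; force_leave_commands is a hypothetical helper, and depending on your Consul version additional options may be needed (check consul force-leave -h).

```shell
# Sketch: read `consul members -wan` output on stdin and print one
# `consul force-leave` command per failed member of the given datacenter.
# $1 = datacenter name. Nothing is executed; the output is for review.
force_leave_commands() {
  awk -v suffix=".$1" 'NR > 1 && $3 == "failed" {
    start = length($1) - length(suffix) + 1
    if (start > 0 && substr($1, start) == suffix) print "consul force-leave " $1
  }'
}

# Example (dc2 is a placeholder datacenter name):
#   consul members -wan | force_leave_commands dc2
```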

To re-open port 8302 using iptables, use the same commands with ACCEPT instead of REJECT. Insert the rules with -I rather than appending them with -A, so that they are evaluated before the existing REJECT rules

sudo iptables -I INPUT -p tcp --dport 8302 -j ACCEPT
sudo iptables -I INPUT -p udp --dport 8302 -j ACCEPT
sudo iptables -I OUTPUT -p tcp --dport 8302 -j ACCEPT
sudo iptables -I OUTPUT -p udp --dport 8302 -j ACCEPT

You can also simply remove the REJECT rules by using the -D flag rather than the -A flag. The rest of each command must match the rule as it was originally added

sudo iptables -D INPUT -p tcp --dport 8302 -j REJECT
sudo iptables -D INPUT -p udp --dport 8302 -j REJECT
sudo iptables -D OUTPUT -p tcp --dport 8302 -j REJECT
sudo iptables -D OUTPUT -p udp --dport 8302 -j REJECT
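
Because iptables deletes a rule only when the specification matches the rule that was added, it can help to generate the block and unblock commands from a single place. This is a minimal sketch, assuming the four REJECT rules on port 8302 used in this article; wan_port_rules is a hypothetical helper that prints the commands without running them.

```shell
# Sketch: print the four iptables commands for port 8302 (TCP and UDP,
# INPUT and OUTPUT). $1 = -A to add the REJECT rules, -D to delete them.
# Commands are only echoed so they can be reviewed before being run.
wan_port_rules() {
  for chain in INPUT OUTPUT; do
    for proto in tcp udp; do
      echo "sudo iptables $1 $chain -p $proto --dport 8302 -j REJECT"
    done
  done
}

# Example:
#   wan_port_rules -A   # commands that block WAN gossip
#   wan_port_rules -D   # matching commands that remove the block
```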

Additional Resources

Remove WAN Federation between Consul Clusters - Kubernetes (K8s)

Remove WAN Federation between Consul Clusters - Virtual Machines (VMs)
