Introduction
This guide outlines the procedure for disabling gossip communication for Consul admin partitions that are not currently in use (i.e., have no active client nodes). The goal of this optimization is to reduce unnecessary resource consumption, specifically CPU usage and network connections associated with gossip protocols, in Consul environments with a large number of admin partitions.
Expected Outcome
By following this guide, you will be able to disable gossip for selected admin partitions, leading to:
- Reduced CPU utilization on Consul servers.
- Fewer "flood join" events in Consul logs.
- A decrease in the number of TIME_WAIT network connections related to Serf gossip.
- Improved overall performance and stability of the Consul cluster, especially in environments with numerous admin partitions.
Prerequisites
- Consul Version Compatibility: The ability to disable gossip on a per-partition basis was introduced in Consul Enterprise version 1.18.1. Ensure your Consul cluster meets this minimum version requirement to utilize this feature.
- jq: A lightweight and flexible command-line JSON processor.
- curl: A command-line tool for making HTTP requests.
- Consul CLI access: The Consul command-line interface must be installed and configured to communicate with your Consul server(s).
- Consul ACL Token (if enabled): If Access Control Lists (ACLs) are enforced in your Consul environment, you will need a valid ACL token with the necessary permissions. This typically includes partition write privileges or global operator capabilities.
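The version requirement above can be checked from a script. The following is a minimal sketch, not an official tool: version_at_least is a hypothetical helper name, and the usage comment assumes that the first line of `consul version` output looks like "Consul v1.18.1+ent".

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the actual version ($2) is at least
# the minimum version ($1). A leading "v" and any "+ent" or "-rc1" style
# suffix on the actual version are ignored.
version_at_least() {
    awk -v min="$1" -v cur="$2" 'BEGIN {
        sub(/^v/, "", cur)        # "v1.18.1+ent" -> "1.18.1+ent"
        sub(/[+-].*$/, "", cur)   # "1.18.1+ent"  -> "1.18.1"
        split(min, a, ".")
        split(cur, b, ".")
        # Compare major, minor, and patch numerically, in that order.
        for (i = 1; i <= 3; i++) {
            if (b[i] + 0 > a[i] + 0) exit 0
            if (b[i] + 0 < a[i] + 0) exit 1
        }
        exit 0
    }'
}

# Usage (assumes the first output line is "Consul v<version>"):
#   current=$(consul version | awk 'NR == 1 { print $2 }')
#   version_at_least 1.18.1 "$current" || echo "Consul is older than 1.18.1" >&2
```

The comparison is done field by field rather than as a string so that, for example, 1.9.0 is correctly treated as older than 1.18.1.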
Use Case
This procedure is particularly useful in scenarios where:
- You are gradually or rapidly adding a significant number of admin partitions to your Consul cluster.
- You observe an increase in CPU usage on your Consul servers that correlates with the number of partitions.
- Consul server logs contain messages indicating "flood joins" related to Serf gossip for various partitions.
- You have admin partitions that are created for specific purposes or future use but do not currently have any active client nodes registered within them.
Procedure
Follow these steps to disable gossip for an inactive admin partition:
Step 1: Verify the Absence of Active Clients
First, confirm that the target admin partition has no active client nodes. You can achieve this using either the Consul API or the Consul UI.
- Using the Consul API: Execute the following command in your terminal, replacing <partition name> with the actual name of the admin partition you want to check:
curl -s localhost:8500/v1/agent/members | jq '.[] | select(.Tags.ap == "<partition name>")'
If this command returns no output, no Consul agents (clients) tagged with that partition are among the members known to the local agent.
- Using the Consul UI: Navigate to your Consul UI and inspect the nodes registered under the specific admin partition. If no nodes are listed, the partition has no active clients. Alternatively, you can query the catalog API (the URL is quoted so that the shell does not interpret the ? character):
curl -s "localhost:8500/v1/catalog/nodes?partition=<partition name>"
An empty JSON array ([]) in the response signifies no registered nodes in the partition.
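Both API checks in this step can be combined into one reusable function. This is a rough sketch under a few assumptions: the function names are hypothetical, the agent listens on localhost:8500 as in the commands above, and ACL token handling is left out.

```shell
#!/bin/sh
# True when stdin contains nothing but whitespace (the members filter
# printed no matching agents).
is_blank() {
    [ -z "$(tr -d '[:space:]')" ]
}

# True when stdin is an empty JSON array, ignoring whitespace.
is_empty_json_array() {
    [ "$(tr -d '[:space:]')" = "[]" ]
}

# Hypothetical wrapper: returns 0 when the partition named in $1 has no
# gossip members and no catalog nodes, 1 when it has some, and 2 when a
# request or the jq filter fails (so an unreachable agent is never
# mistaken for an unused partition).
partition_is_unused() {
    members=$(curl -sf localhost:8500/v1/agent/members) || return 2
    nodes=$(curl -sf "localhost:8500/v1/catalog/nodes?partition=$1") || return 2
    tagged=$(printf '%s' "$members" |
        jq --arg ap "$1" '.[] | select(.Tags.ap == $ap)') || return 2
    printf '%s' "$tagged" | is_blank && printf '%s' "$nodes" | is_empty_json_array
}

# Usage:
#   partition_is_unused "<partition name>" && echo "no active clients"
```

The -f flag makes curl exit with an error on HTTP failures, which is why the wrapper can tell "no clients" apart from "could not ask".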
Step 2: Disable Gossip for the Partition
Once you have confirmed that the admin partition has no active clients, you can disable gossip using the Consul CLI (or API). Execute the following command, replacing <partition name> with the name of the target partition:
consul partition update -name <partition name> -disable-gossip=true
Ensure that your Consul CLI is configured to communicate with a Consul server in your cluster. If ACLs are enabled, make sure you are using a Consul ACL token with sufficient privileges to modify partitions. You can specify the token using the -token flag if necessary.
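When several partitions need the same change, it can help to preview each command before running it. The sketch below wraps the update command shown above; disable_partition_gossip and the DRY_RUN variable are hypothetical names introduced for illustration. As an alternative to the -token flag, the Consul CLI also reads the token from the CONSUL_HTTP_TOKEN environment variable.

```shell
#!/bin/sh
# Hypothetical wrapper around the update command from this step.
# With DRY_RUN set to a non-empty value, the command is printed instead
# of executed.
disable_partition_gossip() {
    if [ -z "$1" ]; then
        echo "usage: disable_partition_gossip <partition name>" >&2
        return 64
    fi
    if [ -n "${DRY_RUN:-}" ]; then
        echo "consul partition update -name $1 -disable-gossip=true"
        return 0
    fi
    consul partition update -name "$1" -disable-gossip=true
}

# Usage: preview first, then apply to a list of partitions, one per line
# in a file (partitions.txt is a placeholder name):
#   while read -r p; do DRY_RUN=1 disable_partition_gossip "$p"; done < partitions.txt
#   while read -r p; do disable_partition_gossip "$p"; done < partitions.txt
```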
Step 3: Verify the Gossip Status
After running the update command, verify that the DisableGossip flag has been successfully set for the partition. Use the following Consul CLI command, again replacing <partition name>:
consul partition list -format=json -show-meta | jq '.[] | select(.Name == "<partition name>")'
The output should be a JSON object containing details about the partition, including the DisableGossip field set to true, similar to the example below:
{
  "Name": "<partition name>",
  "CreateIndex": 4497,
  "ModifyIndex": 5592,
  "DisableGossip": true
}
This confirms that gossip has been successfully disabled for the specified admin partition.
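For scripted verification, the check can be reduced to an exit status. The following sketch assumes the JSON shape shown in the example output; the function names are hypothetical, and the final match uses grep so that the check itself can be exercised without a running cluster.

```shell
#!/bin/sh
# True when the partition JSON on stdin has DisableGossip set to true.
# Assumes the key and its value appear on the same line, as in the
# example output.
gossip_is_disabled() {
    grep -Eq '"DisableGossip"[[:space:]]*:[[:space:]]*true'
}

# Hypothetical wrapper: selects one partition by name from the list
# command used in this step and checks its flag.
verify_partition_gossip_disabled() {
    consul partition list -format=json -show-meta |
        jq --arg name "$1" '.[] | select(.Name == $name)' |
        gossip_is_disabled
}

# Usage:
#   verify_partition_gossip_disabled "<partition name>" \
#       && echo "gossip disabled" || echo "gossip still enabled or partition not found"
```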
Additional Information
- Symptoms of Excessive Partitions: In environments with a large number of admin partitions, you might observe the following symptoms if gossip is not managed efficiently:
  - Increased CPU Usage: Consul servers may experience higher CPU utilization due to the overhead of managing gossip across numerous partitions.
  - High TIME_WAIT Connections: A large number of network connections in the TIME_WAIT state can occur due to frequent Serf flood joins and gossip storms as servers attempt to establish and maintain membership in many partition gossip pools.
- Partition Scaling Considerations: It is important to monitor the number of active partitions in your Consul cluster. As a general guideline provided by HashiCorp, approximately 800 partitions can consume around 2 CPU cores. Plan your infrastructure and partition strategy accordingly. Remember that this is just a guideline, and actual resource consumption can vary based on cluster size, network latency, and other factors.
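For rough capacity planning, the guideline above can be turned into a quick calculation. This sketch assumes that CPU cost scales linearly with the number of partitions (about 2 cores per 800 partitions); that linearity is an assumption made here for illustration, and estimate_partition_cores is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: estimates CPU cores consumed by gossip for $1
# partitions, scaling the "800 partitions ~ 2 cores" guideline linearly.
estimate_partition_cores() {
    awk -v n="$1" 'BEGIN { printf "%.2f\n", n * 2 / 800 }'
}

# Usage:
#   estimate_partition_cores 1200   # prints 3.00
```

Treat the result as a starting point for monitoring rather than a sizing guarantee, since actual consumption depends on the factors listed above.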