Introduction
Consul's ability to maintain a stable and functioning cluster relies heavily on its leader election process. This process, powered by the Raft consensus protocol, requires a quorum of server nodes, known as "voters," to elect a leader and ensure cluster-wide agreement on its state.
In an ideal scenario, all server nodes actively participate in this election. However, there are circumstances where a Consul server might be demoted or removed as a voter, impacting the cluster's stability and fault tolerance.
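The effect of losing a voter on fault tolerance follows from Raft's majority rule: a cluster with N voters needs floor(N/2) + 1 of them to agree. A minimal sketch of that arithmetic follows. The function names are illustrative and not part of Consul.

```python
def quorum_size(voters: int) -> int:
    """Smallest majority of a voter set (the Raft quorum)."""
    if voters < 1:
        raise ValueError("a cluster needs at least one voter")
    return voters // 2 + 1


def failure_tolerance(voters: int) -> int:
    """How many voters can be lost while a quorum still exists."""
    return voters - quorum_size(voters)


for n in (3, 4, 5):
    # prints "3 2 1", then "4 3 1", then "5 3 2"
    print(n, quorum_size(n), failure_tolerance(n))
```

Dropping from five voters to four leaves the quorum at three but reduces the number of tolerable failures from two to one, which is why an unexpected demotion matters even when the cluster still has a leader.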
Understanding the dynamics of voter demotion and removal is crucial for maintaining a healthy Consul cluster. These dynamics differ significantly between open-source Consul and Consul Enterprise. In open-source deployments, cluster maintenance is primarily manual, with unhealthy agents being the most common cause for voter removal. Consul Enterprise, on the other hand, introduces automated features that can demote or remove voters under specific conditions.
This article explores the various scenarios and mechanisms that can lead to voter demotion or removal in both open-source and Enterprise Consul, providing valuable insights for administrators and operators.
Scenario
Unexpected voter changes:
Imagine you're managing a Consul cluster and notice that the number of voters has decreased unexpectedly. You're unsure why a particular server node is no longer participating in the leader election. This article will guide you through the possible causes of such an event, helping you diagnose whether it was due to an agent failure, network issues, or automated mechanisms within Consul Enterprise.
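One way to narrow down such an event is to compare two snapshots of the Raft configuration, for example as returned by the /v1/operator/raft/configuration HTTP endpoint. The sketch below assumes each snapshot is a dict with a "Servers" list whose entries carry "ID", "Node", and "Voter" fields. The helper name and the sample data are hypothetical.

```python
def lost_voters(before: dict, after: dict) -> dict:
    """Map node name -> 'demoted' or 'removed' for servers that stopped voting."""
    after_by_id = {s["ID"]: s for s in after.get("Servers", [])}
    changes = {}
    for server in before.get("Servers", []):
        if not server.get("Voter"):
            continue  # was not a voter to begin with
        current = after_by_id.get(server["ID"])
        if current is None:
            changes[server["Node"]] = "removed"   # gone from the Raft configuration
        elif not current.get("Voter"):
            changes[server["Node"]] = "demoted"   # still a peer, no longer votes
    return changes


# Illustrative snapshots, not real cluster output.
before = {"Servers": [
    {"ID": "a1", "Node": "server-1", "Voter": True},
    {"ID": "b2", "Node": "server-2", "Voter": True},
    {"ID": "c3", "Node": "server-3", "Voter": True},
]}
after = {"Servers": [
    {"ID": "a1", "Node": "server-1", "Voter": True},
    {"ID": "b2", "Node": "server-2", "Voter": False},
]}

print(lost_voters(before, after))
```

Distinguishing "demoted" from "removed" is a useful first split: a demotion with the server still present points toward an automated mechanism, while a removal points toward a failed or departed node.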
OSS vs Enterprise
Open Source:
In open-source Consul, a voter is removed only when its node fails or leaves the cluster, as determined via gossip.
Node status observed through gossip
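Because gossip status is what drives removal here, it can help to list the server agents that gossip no longer reports as alive. The sketch below assumes member records shaped like the output of the /v1/agent/members endpoint, with a numeric "Status" field and a "role" tag of "consul" for servers. The status mapping and the sample records are assumptions for illustration, so verify them against your Consul version.

```python
# Assumed mapping of Serf member status codes to names.
SERF_STATUS = {1: "alive", 2: "leaving", 3: "left", 4: "failed"}


def unhealthy_servers(members: list) -> dict:
    """Map server name -> gossip status for servers not reported as alive."""
    result = {}
    for member in members:
        if member.get("Tags", {}).get("role") != "consul":
            continue  # skip client agents, which never vote
        status = SERF_STATUS.get(member.get("Status"), "unknown")
        if status != "alive":
            result[member["Name"]] = status
    return result


# Illustrative records, not real cluster output.
members = [
    {"Name": "server-1", "Status": 1, "Tags": {"role": "consul"}},
    {"Name": "server-2", "Status": 4, "Tags": {"role": "consul"}},
    {"Name": "server-3", "Status": 3, "Tags": {"role": "consul"}},
    {"Name": "client-1", "Status": 4, "Tags": {"role": "node"}},
]

print(unhealthy_servers(members))
```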
Enterprise:
Consul Enterprise, with its Autopilot features, introduces additional scenarios where a server node might be demoted or removed as a voter. These automated actions aim to enhance cluster stability and self-healing. However, similar to open-source Consul, Enterprise also removes or demotes voters when a node is deemed unhealthy or has left the cluster.
Node status observed through gossip
Autopilot's Impact on Voter Demotion/Removal
Autopilot, Consul Enterprise's automation feature, introduces several scenarios where voters might be demoted or removed to maintain cluster health and stability:
- Automated Upgrades:
  - Rolling Upgrades: During automated upgrades, older-version server nodes are demoted as newer-version nodes are promoted to voters. This ensures a smooth transition with minimal disruption. For details on the timing of this process, refer to the "Add new servers" section in the Automate Upgrades with Consul Enterprise tutorial.
  - Upgrade Migration Control: You can control this automatic voter rotation during upgrades using the disable_upgrade_migration option. If set to true, voters won't be automatically demoted. To change the setting at runtime, use the consul operator autopilot set-config command: pass -disable-upgrade-migration=true to turn upgrade migration off, or -disable-upgrade-migration=false to turn it back on.
- Redundancy Zone Management:
  - Maintaining Zone Balance: Autopilot strives to maintain one voter per redundancy zone. If a zone has extra voters, they will be demoted once healthy replacements become available in other zones.
  - Zone Failure Recovery: In case of a total zone failure, Autopilot prioritizes maintaining the overall desired number of voters. This might lead to multiple voters existing in one zone temporarily. However, once the failed zone recovers, the extra voters will be demoted to restore balance.
- Unhealthy Server Handling:
  - Proactive Demotion/Removal: Autopilot proactively demotes or removes unhealthy servers in a zone to prevent disruptions.
  - Delayed Removal (When Autopilot is Disabled): When Autopilot is disabled, there's a 72-hour delay before unhealthy servers are removed. This grace period allows for potential recovery or manual intervention.
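The "one voter per redundancy zone" rule above can be checked against an inventory of servers. The sketch below is a simplified illustration of that rule only. It does not reproduce Autopilot's actual algorithm, and the record fields ("name", "zone", "voter") and sample data are hypothetical.

```python
from collections import defaultdict


def surplus_voters_by_zone(servers: list) -> dict:
    """Return zones holding more than one voter, with the voters' names."""
    voters = defaultdict(list)
    for server in servers:
        if server["voter"]:
            voters[server["zone"]].append(server["name"])
    return {zone: sorted(names) for zone, names in voters.items() if len(names) > 1}


# Illustrative state after zone-b failed and a second zone-a server was promoted.
servers = [
    {"name": "server-a1", "zone": "zone-a", "voter": True},
    {"name": "server-a2", "zone": "zone-a", "voter": True},
    {"name": "server-b1", "zone": "zone-b", "voter": False},
    {"name": "server-c1", "zone": "zone-c", "voter": True},
]

print(surplus_voters_by_zone(servers))
```

A zone that shows up in this result is one where a later demotion should be expected once the failed zone recovers, so the resulting voter change is not a sign of a new fault.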
Key Takeaways:
- Autopilot automates voter management to optimize cluster health and availability.
- Understanding these scenarios helps you anticipate and interpret voter changes in your Consul Enterprise deployments.
- Refer to the Consul documentation for detailed information on Autopilot's behavior and configuration options.
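To see which of these mechanisms could be active in a given cluster, the Autopilot configuration can be inspected, for example from the /v1/operator/autopilot/configuration endpoint. The sketch below assumes a config dict with the "CleanupDeadServers", "DisableUpgradeMigration", and "RedundancyZoneTag" fields. The helper name and sample values are illustrative.

```python
def voter_affecting_settings(config: dict) -> dict:
    """Summarize which Autopilot mechanisms may demote or remove voters."""
    return {
        # Dead server cleanup removes failed servers automatically.
        "dead_server_cleanup": bool(config.get("CleanupDeadServers")),
        # Upgrade migration is active unless explicitly disabled.
        "upgrade_migration": not config.get("DisableUpgradeMigration", False),
        # A non-empty tag means redundancy zones are in use.
        "redundancy_zones": bool(config.get("RedundancyZoneTag")),
    }


# Illustrative configuration, not real cluster output.
sample_config = {
    "CleanupDeadServers": True,
    "DisableUpgradeMigration": False,
    "RedundancyZoneTag": "zone",
}

print(voter_affecting_settings(sample_config))
```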
FAQ
| Question | Answer |
|---|---|
| How do I know if a server agent is currently a voter in the cluster? | On a server agent, run the command consul operator raft list-peers and review the “Voter” column of its output. |
| What kind of agent is eligible to become a voter? | Only server agents can become voters in Consul. |
| Why can’t client agents participate in the leader election? | A client agent cannot become the leader of the cluster, so it is ineligible to vote in the leader election. Every participating voter in an election is a candidate to become the leader. Requests sent to client agents are forwarded to a server agent and, where required, on to the cluster leader. |
| How can I remove a voter from the cluster? | To remove a voter from the cluster and from Raft, use the Consul CLI command consul operator raft remove-peer. |
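For scripted checks, the voter status can be read from the text output of consul operator raft list-peers. The sketch below assumes a whitespace-separated table with a "Voter" header and no spaces in any column that precedes it. The sample output is illustrative, and column layout can differ between Consul versions.

```python
def parse_list_peers(output: str) -> dict:
    """Map node name -> True/False from the 'Voter' column of list-peers output."""
    lines = [line for line in output.splitlines() if line.strip()]
    header = lines[0].split()
    voter_col = header.index("Voter")  # raises ValueError if the column is missing
    result = {}
    for line in lines[1:]:
        fields = line.split()
        result[fields[0]] = fields[voter_col] == "true"
    return result


# Illustrative output, not captured from a real cluster.
sample_output = """\
Node      ID  Address        State     Voter  RaftProtocol
server-1  a1  10.0.0.1:8300  leader    true   3
server-2  b2  10.0.0.2:8300  follower  true   3
server-3  c3  10.0.0.3:8300  follower  false  3
"""

peers = parse_list_peers(sample_output)
print([name for name, is_voter in peers.items() if not is_voter])
```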
Additional Information
- Consensus Protocol
- Gossip Protocol
- Autopilot Enterprise features
- Consul Operator Autopilot
- Redundancy Zones
- Autopilot Dead server cleanup