Introduction
This document addresses a specific error that can occur in a multi-datacenter Consul environment when the primary datacenter is upgraded to a newer version before the secondary datacenters. The error message relates to ACL replication and signals a version incompatibility between the datacenters.
Problem
After upgrading a primary Consul datacenter to a new version, secondary datacenters still running the older version report the following error in their logs:
failed to update local ACL policies: Failed to apply policy upserts: Changing the Rules for the builtin global-management policy is not permitted
This error indicates that ACL replication has failed, and the secondary datacenters are no longer synchronized with the primary.
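To confirm that a secondary server is hitting this specific error, its logs can be searched for the message quoted above. The following is a minimal sketch, not an official tool: the function name is hypothetical, and it assumes the logs are available as plain text lines (for example, read from a file or from the service manager's journal).

```python
from typing import Iterable, List

# Substring taken verbatim from the error message quoted in this article.
ERROR_MARKER = (
    "Changing the Rules for the builtin global-management policy is not permitted"
)


def find_replication_errors(lines: Iterable[str]) -> List[str]:
    """Return the log lines that contain the global-management replication error."""
    return [line.rstrip("\n") for line in lines if ERROR_MARKER in line]
```

Passing an open log file to find_replication_errors returns the matching lines. An empty result suggests that the server is not affected by this particular failure.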
Prerequisites
- Consul multi-datacenter setup (WAN federated) with ACL replication enabled.
- The primary datacenter has been upgraded to a newer version.
- Secondary datacenters are still running the older version.
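One way to check whether these prerequisites apply is to compare the build versions that servers advertise across the WAN. The sketch below assumes the input is the decoded JSON list returned by the agent HTTP endpoint /v1/agent/members?wan=1, and that each member carries "dc" and "build" entries in its "Tags" map, with the build formatted as a version followed by ":" and a commit hash. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def builds_by_datacenter(members: List[dict]) -> Dict[str, Set[str]]:
    """Group advertised Consul versions by datacenter.

    Assumes each member dict has a "Tags" map with "dc" and "build" keys.
    """
    result = defaultdict(set)
    for member in members:
        tags = member.get("Tags") or {}
        dc = tags.get("dc", "unknown")
        # Drop the commit suffix after ":" so that only the version remains.
        version = tags.get("build", "unknown").split(":", 1)[0]
        result[dc].add(version)
    return dict(result)


def is_mixed(builds: Dict[str, Set[str]]) -> bool:
    """Return True when more than one distinct version is present overall."""
    return len(set().union(*builds.values())) > 1
```

If is_mixed returns True and the primary datacenter reports the newer version, the environment matches the scenario described in this article.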
Cause
This issue is a direct result of version incompatibility. While the core function of the built-in global-management policy remains the same across versions, a bug fix or new feature in a release can cause a subtle, non-disruptive change to the policy's internal structure or content. The exact diff can be viewed here.
Consul agents are hard-coded to reject any changes to the global-management policy. When servers in a secondary datacenter that still run the older version receive the updated policy information from the new primary datacenter (e.g., 1.21.x), they interpret it as an unauthorized attempt to modify the policy's rules, which triggers the error and halts ACL replication.
Overview of Possible Solutions
The only recommended and safe solution is to complete the upgrade process across all datacenters. Operating with mixed versions and failed ACL replication creates an inconsistent and potentially insecure state.
The correct procedure is a phased upgrade:
- Temporarily Stop ACL Replication: On all secondary datacenters, modify the configuration to prevent them from attempting to replicate ACLs.
- Upgrade Secondary Datacenters: Perform a rolling upgrade of all secondary datacenters to the new version (e.g., 1.21.x).
- Re-enable ACL Replication: Restore the configuration on the secondary datacenters to re-enable replication and allow them to synchronize with the primary.
Warning: Ignoring the errors is not recommended. Doing so will lead to an inconsistent security state, potential service failures, and significant noise in your logs and monitoring systems.
Outcome
After following the recommended upgrade procedure, all Consul datacenters will be running the same version. ACL replication will resume successfully, and the secondary datacenters will be fully synchronized with the primary. All related error messages will cease, and the entire Consul environment will be in a healthy, secure, and consistent state.
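To verify this outcome, the replication status of a secondary datacenter can be read from the /v1/acl/replication HTTP endpoint on one of its servers. The sketch below is a heuristic check rather than a definitive health test. It assumes the response contains the Enabled, Running, LastSuccess, and LastError fields, and that both timestamps are UTC strings in the same format, so their first 19 characters (date and time to the second) can be compared directly. The function names and the base URL used in the usage note are assumptions.

```python
import json
import urllib.request
from typing import Optional


def fetch_replication_status(base_url: str, token: Optional[str] = None) -> dict:
    """GET /v1/acl/replication from a server in the secondary datacenter."""
    request = urllib.request.Request(f"{base_url}/v1/acl/replication")
    if token:
        # ACL token passed in the standard Consul header.
        request.add_header("X-Consul-Token", token)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def replication_looks_healthy(status: dict) -> bool:
    """Heuristic: enabled, running, and last success strictly after last error."""
    if not (status.get("Enabled") and status.get("Running")):
        return False
    # Compare only "YYYY-MM-DDTHH:MM:SS"; assumes both values are UTC.
    last_success = (status.get("LastSuccess") or "")[:19]
    last_error = (status.get("LastError") or "")[:19]
    return last_success > last_error
```

For example, replication_looks_healthy(fetch_replication_status("http://127.0.0.1:8500")) would query a local server on the default HTTP port. Adjust the address and supply a token as your environment requires.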
Additional Information
To temporarily stop ACL replication on a secondary datacenter:
- Edit the Consul server configuration file on each node in the secondary datacenter.
- Remove or comment out the primary_datacenter setting within the acl block. Keep the setting on its own line so that the comment marker does not also comment out the closing brace. For example, change:
acl = {
  primary_datacenter = "your_primary_datacenter_name"
}
to:
acl = {
  # primary_datacenter = "your_primary_datacenter_name"
}
- Restart the Consul agent on each server for the change to take effect.
Once the upgrade is complete, you can re-add this configuration line and restart the agents to re-enable replication. For more details, always refer to the official HashiCorp Consul upgrade documentation for your specific versions.