Upgrading Legacy ACL Multi-Datacenter Deployment
Introduction
This guide explains how best to upgrade a multi-datacenter Consul deployment that’s using
Legacy ACLs (i.e., versions < 1.4.0). Due to changes to the ACL system, you need to
upgrade from a version no earlier than 1.2.4 to the latest version in
the 1.6.x series. The 1.6.x series is the last series that supports legacy ACL tokens.
A migration process is required after upgrading, and this process requires a version of
Consul that recognizes these tokens. As such, upgrading to a 1.6.x version is required
before upgrading to newer versions.
Before we get started, here is some documentation that may be useful for reference during this process:
- ACL System in Legacy Mode - Describes legacy configuration options and the differences between modes.
- Configuration - Gives more details on legacy ACL and new ACL configuration options. Legacy ACL config options are listed as deprecated as of 1.4.0.
In this guide, we’ll be using an example with three datacenters (DCs) and will be
referring to them as DC1, DC2, and DC3. DC1 will be the primary datacenter.
Assumptions
This guide makes the following assumptions:
- You have at least two datacenters configured and have ACL replication enabled.
- All Consul servers are on version 1.2.4.
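The version constraints above can be checked before you start. The following is a minimal sketch, not part of Consul: the helper names `parse_version` and `upgrade_path_ok` are hypothetical, and it assumes you have already collected each server's version string by hand.

```python
# Hypothetical helpers for sanity-checking the upgrade path described in this guide.
MIN_SOURCE = (1, 2, 4)  # earliest version this guide supports upgrading from


def parse_version(text):
    """Turn a string such as 'v1.2.4' or '1.6.10+ent' into a tuple of ints."""
    core = text.strip().lstrip("v").split("+")[0].split("-")[0]
    return tuple(int(part) for part in core.split("."))


def upgrade_path_ok(current, target):
    """True if `current` is at least 1.2.4 and `target` is a 1.6.x release."""
    cur, tgt = parse_version(current), parse_version(target)
    return cur >= MIN_SOURCE and tgt[:2] == (1, 6) and cur <= tgt
```

For example, `upgrade_path_ok("1.2.4", "1.6.10")` is true, while a server on 1.2.3 or a target in the 1.7.x series is rejected.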
Considerations
There are a couple of things to be aware of when moving from version 1.2.4 to the latest
1.6.x release aside from the ACL changes mentioned in this document.
- 1.6.2 introduced stricter JSON decoding. Invalid JSON that was previously ignored might now result in errors (e.g., Connect: null in service definitions). See [GH#6680].
- 1.6.3 introduced the http_max_conns_per_client limit, which defaults to 200. Prior to this, connections per client were unbounded. See [GH#7159].
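To get ahead of the stricter decoding, you may want to look for null values in your JSON service definitions before upgrading. The sketch below is a rough pre-check only: `find_nulls` is a hypothetical helper, it does not reproduce Consul's decoder, and a null it reports is a candidate for review rather than a guaranteed error.

```python
import json


def find_nulls(value, path=""):
    """Return the dotted paths of every null found in a parsed JSON document."""
    found = []
    if value is None:
        found.append(path)
    elif isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{key}" if path else key
            found.extend(find_nulls(child, child_path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            found.extend(find_nulls(child, f"{path}[{index}]"))
    return found


# Illustrative service definition, not taken from a real deployment.
definition = json.loads('{"service": {"name": "web", "connect": null}}')
print(find_nulls(definition))  # ['service.connect']
```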
Procedure
1/ Check replication status in DC1 by running the following curl command from a
Consul server in that DC:
curl -s -H "X-Consul-Token: $MASTER_TOKEN" localhost:8500/v1/acl/replication | jq
You should see output that looks like this:
{
  "Enabled": false,
  "Running": false,
  "SourceDatacenter": "",
  "ReplicatedIndex": 0,
  "LastSuccess": "0001-01-01T00:00:00Z",
  "LastError": "0001-01-01T00:00:00Z"
}
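If you would rather script this check than run curl by hand, the same request can be made from Python's standard library. This is a sketch under assumptions: the function names are hypothetical, and the default base URL assumes the agent's HTTP API is listening on localhost:8500 as in the curl command.

```python
import json
import urllib.request


def build_status_request(token, base_url="http://localhost:8500"):
    """Build a GET request for the legacy ACL replication status endpoint."""
    return urllib.request.Request(
        f"{base_url}/v1/acl/replication",
        headers={"X-Consul-Token": token},
    )


def fetch_replication_status(token, base_url="http://localhost:8500"):
    """Send the request and return the decoded JSON status as a dict."""
    request = build_status_request(token, base_url)
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.load(response)
```

Calling `fetch_replication_status(os.environ["MASTER_TOKEN"])` from a Consul server would return a dict with the same fields as the output shown above.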
2/ Check replication status in DC2 by running the following curl command from a
Consul server in that DC:
curl -s -H "X-Consul-Token: $MASTER_TOKEN" localhost:8500/v1/acl/replication | jq
You should see output that looks like this:
{
  "Enabled": true,
  "Running": true,
  "SourceDatacenter": "dc1",
  "ReplicatedIndex": 24,
  "LastSuccess": "2020-09-08T15:09:05Z",
  "LastError": "0001-01-01T00:00:00Z"
}
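One way to read the status of a secondary datacenter is sketched here. The helper `replication_problems` is hypothetical, and it assumes whole-second UTC timestamps in the format shown in the sample output; a secondary is treated as healthy when replication is enabled and running, sourced from the expected primary, and the last error is not newer than the last success.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # assumes the timestamp format shown in this guide


def replication_problems(status, expected_source):
    """Return a list of human-readable problems; an empty list means healthy."""
    problems = []
    if not status.get("Enabled"):
        problems.append("replication is not enabled")
    if not status.get("Running"):
        problems.append("replication is not running")
    if status.get("SourceDatacenter") != expected_source:
        problems.append("unexpected source datacenter")
    last_success = datetime.strptime(status["LastSuccess"], TIME_FORMAT)
    last_error = datetime.strptime(status["LastError"], TIME_FORMAT)
    if last_error > last_success:
        problems.append("last error is more recent than last success")
    return problems
```

Applied to the DC2 output above with `expected_source="dc1"`, the function returns an empty list. The DC1 output would report problems, which is expected because the primary datacenter does not replicate from anywhere.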
3/ Upgrade DC2 and DC3 agents to the latest version of the 1.6.x series. Leave all DC1 agents at 1.2.4. After the upgrade, you should start seeing log messages like this:
2020/09/08 15:51:29 [DEBUG] acl: Cannot upgrade to new ACLs, servers in acl datacenter have not upgraded - found servers: true, mode: 3
NOTE: It’s important to upgrade your primary datacenter (the one specified in acl_datacenter) last.
If you upgrade the primary datacenter first, it will break replication between your
other datacenters. If you upgrade your other datacenters first, they will run in legacy mode and
replication from your primary datacenter will continue working.
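The ordering rule in the note is simple enough to encode if you drive upgrades from a script. This is a minimal sketch with a hypothetical helper name; it only computes the order and does not perform any upgrade.

```python
def upgrade_order(datacenters, primary):
    """Return the datacenters with every secondary first and the primary last."""
    if primary not in datacenters:
        raise ValueError(f"primary datacenter {primary!r} is not in the list")
    secondaries = [dc for dc in datacenters if dc != primary]
    return secondaries + [primary]


print(upgrade_order(["dc1", "dc2", "dc3"], "dc1"))  # ['dc2', 'dc3', 'dc1']
```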
4/ Check to see if replication is still working in DC3.
From a Consul server in DC3, record the current replication status and token list:
curl -s -H "X-Consul-Token: $MASTER_TOKEN" localhost:8500/v1/acl/replication | jq
curl -H "X-Consul-Token: $MASTER_TOKEN" localhost:8500/v1/acl/list | jq
From a Consul server in DC1, create a new token:
curl -X PUT -H "X-Consul-Token: $MASTER_TOKEN" -d @/policies/ui-policy.json localhost:8500/v1/acl/create
From a Consul server in DC3, check the replication status and token list again:
curl -s -H "X-Consul-Token: $MASTER_TOKEN" localhost:8500/v1/acl/replication | jq
curl -H "X-Consul-Token: $MASTER_TOKEN" localhost:8500/v1/acl/list | jq
ReplicatedIndex should have incremented and you should see the new token listed. If you try using CLI ACL commands, you’ll see this error:
Failed to retrieve the token list: Unexpected response code: 500 (The ACL system is currently in legacy mode.)
This is because Consul is running in legacy mode. ACL CLI commands won’t work, and you have to hit the old ACL HTTP endpoints (which is why curl is being used above rather than the consul CLI client).
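Replication is not instantaneous, so a scripted version of this check may need to poll. The sketch below assumes a caller-supplied `fetch_status` function that returns the decoded replication status, and a token list shaped like the legacy /v1/acl/list response, where each entry carries an "ID" field. Both helper names are hypothetical.

```python
import time


def wait_for_replication(fetch_status, baseline_index, attempts=10, delay=1.0):
    """Poll until ReplicatedIndex exceeds baseline_index.

    Returns the new index, or None if it never advanced within `attempts` polls.
    """
    for _ in range(attempts):
        index = fetch_status()["ReplicatedIndex"]
        if index > baseline_index:
            return index
        time.sleep(delay)
    return None


def token_is_listed(tokens, token_id):
    """True if a token with the given ID appears in the legacy token list."""
    return any(token.get("ID") == token_id for token in tokens)
```

Record ReplicatedIndex in DC3 first, create the token in DC1, then call `wait_for_replication` with the recorded index and confirm the ID returned by the create call shows up via `token_is_listed`.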
5/ Upgrade DC1 agents to the latest version of the 1.6.x series.
6/ Verify that everything is in a good state by running consul members and consul operator raft list-peers, along with watching your logs.
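As a rough aid for this verification, the output of consul members can be scanned for nodes that are not alive or not yet on 1.6.x. This sketch assumes the output is a whitespace-separated table whose header row includes Node, Status, and Build columns; the sample text is illustrative rather than captured from a real cluster, and the helper names are hypothetical.

```python
def parse_members(output):
    """Parse whitespace-separated `consul members` output into a list of dicts."""
    lines = [line.split() for line in output.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


def members_not_ready(output, series="1.6."):
    """Return the names of nodes that are not alive or not on the given series."""
    return [
        member["Node"]
        for member in parse_members(output)
        if member["Status"] != "alive" or not member["Build"].startswith(series)
    ]


# Illustrative sample only.
SAMPLE = """
Node    Address        Status  Type    Build   Protocol  DC   Segment
dc1-s1  10.0.0.1:8301  alive   server  1.6.10  2         dc1  <all>
dc1-s2  10.0.0.2:8301  alive   server  1.2.4   2         dc1  <all>
dc1-c1  10.0.0.3:8301  failed  client  1.6.10  2         dc1  <default>
"""
print(members_not_ready(SAMPLE))  # ['dc1-s2', 'dc1-c1']
```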
7/ Migrate your legacy ACL tokens to the new system by following the instructions in our ACL Token Migration guide.
Post-Upgrade Configuration Changes
The ACL system changes included renaming many configuration options.
You will need to update your Consul server configs with the new options before upgrading to server
versions more recent than 1.6.x. These are the changes you’ll need to make:
- acl_datacenter is now named primary_datacenter (see docs for more info)
- acl_*_token options are now specified like this (see docs for more info):
  tokens {
    master = "..."
    agent = "..."
    agent_master = "..."
    replication = "..."
    default = "..."
  }
- acl_default_policy, acl_down_policy, acl_ttl, and enable_acl_replication options are now specified like this (see docs for more info):
  acl {
    enabled = true/false
    default_policy = "..."
    down_policy = "..."
    policy_ttl = "..."
    role_ttl = "..."
    enable_token_replication = true/false
    enable_token_persistence = true/false
  }
You can verify your changes using consul validate $CONFIG_FILE_PATH to ensure they’re correct before restarting Consul to pick them up.
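For JSON-formatted configs, the renames can be sketched as a small transformation. Treat this as an illustration under assumptions rather than an official migration tool: the function name is hypothetical, the tokens block is placed inside the acl block, the legacy token option names are expanded from the acl_*_token pattern, and acl_ttl is mapped to token_ttl. Check each of these against the configuration reference for your target version. The sketch does not set acl.enabled, which you need to add yourself.

```python
# Mapping tables are assumptions; confirm them against the Consul configuration docs.
TOP_LEVEL_RENAMES = {"acl_datacenter": "primary_datacenter"}
ACL_RENAMES = {
    "acl_default_policy": "default_policy",
    "acl_down_policy": "down_policy",
    "acl_ttl": "token_ttl",  # assumption: not shown in the block above
    "enable_acl_replication": "enable_token_replication",
}
TOKEN_RENAMES = {
    "acl_master_token": "master",
    "acl_agent_token": "agent",
    "acl_agent_master_token": "agent_master",
    "acl_replication_token": "replication",
    "acl_token": "default",
}


def migrate_acl_config(legacy):
    """Return a copy of a parsed JSON config with legacy ACL options renamed."""
    new = {}
    acl = dict(legacy.get("acl") or {})
    tokens = dict(acl.get("tokens") or {})
    for key, value in legacy.items():
        if key == "acl":
            continue  # already merged into `acl` above
        if key in TOP_LEVEL_RENAMES:
            new[TOP_LEVEL_RENAMES[key]] = value
        elif key in ACL_RENAMES:
            acl[ACL_RENAMES[key]] = value
        elif key in TOKEN_RENAMES:
            tokens[TOKEN_RENAMES[key]] = value
        else:
            new[key] = value  # unrelated options pass through unchanged
    if tokens:
        acl["tokens"] = tokens
    if acl:
        new["acl"] = acl
    return new
```

After writing the result back to disk, run consul validate against the new file as described above before restarting any agent.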