After making a change in an environment, a previously working ACL process now fails with an error such as:

```
error creating bootstrap configuration for Connect proxy sidecar: exit status 1
```

This error message is one possible symptom of the issue described here.
It was noticed that the `connect_inject_init` container was not starting, even though the ACL policy for the sidecar proxy service granted full write permissions.
Prerequisites (if applicable)
- Using Consul Connect
Consul version in use was
This version still supports legacy ACL mode. On restart, the client nodes started with ACLs in legacy mode while the rest of the cluster was running ACLs in non-legacy mode, leaving the clients in a broken state.
The trace logs showed no sign of the ACL check loop running, which is what would normally transition the agent out of legacy ACL mode.
Remediation was to run `consul leave` on SERVER-1, removing it from the gossip ring. Gossip propagated the leave correctly and quickly, allowing the clients to come back up with the correct ACL setting.
The best way to confirm this discrepancy in ACL modes is to inspect the output of `consul members -detailed`. A member tagged `acls=2` is running in legacy ACL mode.

```
broken-client% consul members --detailed
Node      Address         Status  Tags
...
SERVER-1  127.0.0.1:8301  alive   acls=2,ap=default,build=1.9.14:5b3be833, <DETAILS REMOVED>
```
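To make that check repeatable across a whole cluster, the following added example (not part of the article or of Consul itself) parses `consul members -detailed` output and reports members that advertise `acls=2`, plus whether members disagree on ACL mode at all. The sample output is constructed; the non-legacy members are given `acls=1` as an assumption.

```shell
#!/bin/sh
# Added example: spot legacy-mode (acls=2) members in `consul members -detailed` output.

# Constructed sample output, modelled on the article's snippet. acls=1 on the
# healthy members is an assumption, not taken from the article.
SAMPLE_MEMBERS='Node      Address          Status  Tags
SERVER-1  10.0.0.1:8301    alive   acls=2,ap=default,build=1.9.14:5b3be833,dc=dc1,role=consul
SERVER-2  10.0.0.2:8301    alive   acls=1,ap=default,build=1.9.14:5b3be833,dc=dc1,role=consul
client-a  10.0.0.10:8301   alive   acls=1,ap=default,build=1.9.14:5b3be833,dc=dc1,role=node
client-b  10.0.0.11:8301   alive   acls=1,ap=default,build=1.9.14:5b3be833,dc=dc1,role=node'

# Read members output on stdin and print "node acls-value" per member.
# The Tags column is the last field; the acls=<n> entry is extracted from it.
acl_modes() {
  awk 'NR > 1 && NF > 0 {
         n = split($NF, kv, ",")
         mode = "unset"
         for (i = 1; i <= n; i++) {
           if (kv[i] ~ /^acls=/) { sub(/^acls=/, "", kv[i]); mode = kv[i] }
         }
         print $1, mode
       }'
}

# Print the names of members reporting acls=2 (legacy ACL mode).
legacy_members() {
  acl_modes | awk '$2 == "2" { print $1 }'
}

# Exit 0 when members disagree on ACL mode (the broken state in the article),
# exit 1 when every member reports the same mode.
acl_mode_mismatch() {
  distinct=$(acl_modes | awk '{ print $2 }' | sort -u | awk 'END { print NR }')
  [ "$distinct" -gt 1 ]
}

# Stand-alone run against the constructed sample. Against a live cluster:
#   consul members -detailed | legacy_members
printf '%s\n' "$SAMPLE_MEMBERS" | legacy_members
```

A non-empty list from `legacy_members`, or a zero exit from `acl_mode_mismatch`, is the signal to restart or remove the affected agents as described above.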
This GH card explains the issue.
This is not an issue for Consul 1.11.x and later, as legacy ACL mode was removed entirely and the server starts directly in the new mode.