Introduction
This guide helps troubleshoot Cloud Auto-Join issues in Consul caused by outdated AWS SDKs. Older Consul versions may fail to discover EC2 instances due to limited support for IMDSv2 and IAM credential resolution, even when roles are correctly configured.
Problem
Consul versions earlier than 1.7.0 bundle an outdated AWS SDK that does not fully support the EC2 Instance Metadata Service Version 2 (IMDSv2) or proper IAM credential chain resolution. As a result, AWS discovery fails even if IAM roles are properly configured.
Cause
Outdated AWS SDK in Older Consul Versions
Consul compiles the AWS SDK for Go into its binary, so the SDK version is fixed by the Consul release and cannot be updated separately. Releases earlier than v1.7.0 embed an SDK that does not fully support IMDSv2 or proper IAM credential chain resolution, so discovery fails even if the instance's IAM role is properly configured.
To confirm the embedded AWS SDK version, run the following from a terminal (requires the Go toolchain):
$ consul version && go version -m $(which consul) | grep aws
Example (failing version):
Consul v1.6.10
Protocol 2 spoken by default, understands 2 to 3
dep github.com/aws/aws-sdk-go v1.15.24 h1:xLAdTA/ore6xdPAljzZRed7IGqQgC+nY+ERS5vaj4Ro=
Example (working version):
Consul v1.7.10
Protocol 2 spoken by default, understands 2 to 3
dep github.com/aws/aws-sdk-go v1.25.41 h1:/hj7nZ0586wFqpwjNpzWiUTwtaMgxAZNZKHay80MdXw=
Note: The AWS SDK bundled with Consul was updated in v1.7.0.
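The check above can be scripted. The following is a minimal sketch, not an official tool: parse_aws_sdk_version and version_lt are hypothetical helper names, and v1.25.41 is used as the known-good reference only because it is the SDK version shown in the working example above (the exact minimum SDK version is not stated here). It assumes a POSIX shell and three-part version numbers.

```shell
# Print the aws-sdk-go version found in `go version -m` output read from stdin.
parse_aws_sdk_version() {
    while read -r kind module version _rest; do
        if [ "$kind" = "dep" ] && [ "$module" = "github.com/aws/aws-sdk-go" ]; then
            printf '%s\n' "$version"
        fi
    done
}

# Succeed if version $1 is strictly older than version $2.
# Assumes both look like v1.2.3 or 1.2.3 (three numeric parts).
# The body runs in a subshell so the IFS change does not leak to the caller.
version_lt() (
    IFS=.
    set -- ${1#v} ${2#v}
    if   [ "$1" -ne "$4" ]; then [ "$1" -lt "$4" ]
    elif [ "$2" -ne "$5" ]; then [ "$2" -lt "$5" ]
    else                         [ "$3" -lt "$6" ]
    fi
)

# Usage (on a host with both consul and go installed):
#   sdk=$(go version -m "$(command -v consul)" | parse_aws_sdk_version)
#   if version_lt "$sdk" v1.25.41; then
#       echo "embedded AWS SDK $sdk is older than the known-good example"
#   fi
```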
Error Messages
Log excerpts from affected Consul agents may include:
Error 1:
[ERR] agent: Cannot discover LAN provider=aws tag_key=abc tag_value=xyz: discover-aws: GetInstanceIdentityDocument failed: EC2MetadataRequestError: failed to get EC2 instance identity document caused by: EC2MetadataError: failed to make EC2Metadata request caused by:
Error 2:
[DEBUG] discover: Using provider "aws"
[DEBUG] discover-aws: Creating session...
[WARN] serf: Failed to re-join any previously known node
[ERR] agent: Cannot discover LAN provider=aws tag_key=consul tag_value=cluster-member region=ca-central-1: discover-aws: DescribeInstancesInput failed: NoCredentialProviders: no valid providers in chain. Deprecated.
For verbose messaging see aws.Config.CredentialsChainVerboseErrors
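To find these signatures in a long agent log, a small filter can help. This is a sketch only: classify_discover_errors is a hypothetical helper name, the "metadata" and "credentials" labels are its own output rather than anything Consul prints, and it matches only the error names quoted in the excerpts above.

```shell
# Read Consul agent log lines from stdin and label the two discovery
# errors shown above. Lines that match neither signature are ignored.
classify_discover_errors() {
    while IFS= read -r line; do
        case $line in
            *EC2MetadataRequestError*|*EC2MetadataError*)
                printf 'metadata: %s\n' "$line" ;;
            *NoCredentialProviders*)
                printf 'credentials: %s\n' "$line" ;;
        esac
    done
}

# Usage (assumes the agent runs as a systemd unit named "consul"):
#   journalctl -u consul --no-pager | classify_discover_errors
```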
Solutions
Upgrade to a Newer Consul Binary (Primary Fix)
The primary fix is to upgrade Consul to a version whose embedded AWS SDK supports IMDSv2. Older versions (e.g., v1.6.10) embed an outdated SDK that does not support IMDSv2 or proper IAM credential chain resolution, which leads to discovery failures.
Recommended Version: Consul v1.7.10 or later
Steps:
- Download the latest Consul binary (OSS or Enterprise) from the official site.
- Replace the existing binary on your EC2 instance.
- Restart the Consul agent.
- Verify successful AWS Cloud Auto-Join in logs.
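The steps above might look like the following on a Linux host. This is a sketch under stated assumptions, not a supported upgrade procedure: consul_release_url and upgrade_consul are hypothetical helper names, the OSS download is assumed to come from releases.hashicorp.com, the host is assumed to be linux/amd64 with curl, unzip, and sudo available, and the agent is assumed to run as a systemd unit named "consul". Adjust each of these to your environment.

```shell
# Build the download URL for an OSS Consul release archive.
# Arguments: version (e.g. 1.7.10), OS (e.g. linux), architecture (e.g. amd64).
consul_release_url() {
    printf 'https://releases.hashicorp.com/consul/%s/consul_%s_%s_%s.zip\n' \
        "$1" "$1" "$2" "$3"
}

# Download the given version, replace the binary currently on PATH,
# and restart the agent. Runs in a subshell so variables do not leak.
upgrade_consul() (
    version=$1
    tmp=$(mktemp -d) &&
    curl -fsSLo "$tmp/consul.zip" "$(consul_release_url "$version" linux amd64)" &&
    unzip -o -q "$tmp/consul.zip" consul -d "$tmp" &&
    sudo install -m 0755 "$tmp/consul" "$(command -v consul)" &&
    sudo systemctl restart consul
)

# Usage:
#   upgrade_consul 1.7.10
#   consul version     # confirm the new version is running
#   consul members     # confirm peers were discovered and joined
```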
Use Manual Join (Workaround)
If upgrading is not possible (due to operational constraints or licensing), configure manual joining as a workaround.
Manual join allows agents to discover peers via static IPs or DNS names.
Ensure that the OS firewall, VPC network ACLs, and security group rules allow traffic between the agents on the relevant Consul ports (default: TCP/8301, TCP/8300, TCP/8500, etc.).
Example:
consul join 10.0.1.12 10.0.1.13
Refer to the official documentation: https://developer.hashicorp.com/consul/commands/join
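The consul join command acts on the agent that is currently running. To have an agent try the same static addresses every time it starts, the retry_join agent configuration option can list them instead. The sketch below prints such a fragment; render_retry_join is a hypothetical helper name and the file path in the usage comment is an assumption about where your agent reads its configuration.

```shell
# Print an HCL retry_join setting for the addresses given as arguments.
# Runs in a subshell so the loop variables do not leak to the caller.
render_retry_join() (
    printf 'retry_join = ['
    sep=''
    for addr in "$@"; do
        printf '%s"%s"' "$sep" "$addr"
        sep=', '
    done
    printf ']\n'
)

# Usage (path is an assumption; use your agent's config directory):
#   render_retry_join 10.0.1.12 10.0.1.13 | sudo tee /etc/consul.d/join.hcl
```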
If the Issue Persists After Upgrading
If the issue still occurs after upgrading Consul, check the following:
1. Validate EC2 metadata service access
Ensure the EC2 instance can reach its metadata service (http://169.254.169.254) and that no OS firewall, VPC network ACL, or security group rule is preventing access.
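One way to test metadata access from the instance itself is to repeat the IMDSv2 exchange by hand: request a session token, then use it to fetch the instance identity document (the same document named in Error 1). This is a sketch; imds_v2_check and the IMDS_ENDPOINT override are hypothetical names introduced here, and curl is assumed to be installed.

```shell
# Request an IMDSv2 session token, then fetch the instance identity document.
# Prints the document on success; returns non-zero if either request fails.
imds_v2_check() (
    endpoint=${IMDS_ENDPOINT:-http://169.254.169.254}
    token=$(curl -fsS -m 3 -X PUT \
        -H 'X-aws-ec2-metadata-token-ttl-seconds: 60' \
        "$endpoint/latest/api/token") || {
        echo "IMDSv2 token request failed" >&2
        exit 1
    }
    curl -fsS -m 3 -H "X-aws-ec2-metadata-token: $token" \
        "$endpoint/latest/dynamic/instance-identity/document"
)

# Usage (run on the EC2 instance):
#   imds_v2_check && echo "metadata service reachable"
```

If the token request times out or is refused, look at local firewall rules on the instance first.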
2. Confirm IAM Role and Permissions
Check that the EC2 instance has an IAM role attached and that it grants sufficient permissions to discover other EC2 instances.
Minimum required policy:
{
  "Version": "2012-10-17",
  "Statement": [
    { "Effect": "Allow", "Action": ["ec2:DescribeInstances"], "Resource": "*" }
  ]
}
Confirm that the IAM role is correctly attached and active in your EC2 configuration.
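The role and permission can also be exercised from the instance with the AWS CLI. This is a sketch under assumptions: the AWS CLI is installed, ec2_tag_filter and check_ec2_discovery are hypothetical helper names, and the tag and region values in the usage comment are taken from the Error 2 log excerpt as placeholders. The AWS CLI uses its own SDK, so a successful run confirms the role and policy, not the SDK embedded in Consul.

```shell
# Build an EC2 tag filter in the form the AWS CLI expects.
# Arguments: tag key, tag value.
ec2_tag_filter() {
    printf 'Name=tag:%s,Values=%s\n' "$1" "$2"
}

# Show which identity the instance resolves to, then list the private IPs
# of running instances that carry the tag.
# Arguments: tag key, tag value, region.
check_ec2_discovery() {
    aws sts get-caller-identity --region "$3" &&
    aws ec2 describe-instances \
        --region "$3" \
        --filters "$(ec2_tag_filter "$1" "$2")" \
                  "Name=instance-state-name,Values=running" \
        --query 'Reservations[].Instances[].PrivateIpAddress' \
        --output text
}

# Usage (run on the EC2 instance):
#   check_ec2_discovery consul cluster-member ca-central-1
```

An authorization error from the second command points to a missing ec2:DescribeInstances permission; a credentials error from the first points to a missing or detached instance role.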
Outcome
After applying the above steps:
- Consul agents should successfully perform AWS-based peer discovery and cluster join.
- Errors related to metadata or credentials should no longer appear in logs.