Issue Summary:
Vault versions 1.19.6 through 1.19.8 and 1.20.0 through 1.20.2 fail to use AWS cloud auto_join when the region does not support AWS EC2 dual-stack API endpoints, or when network controls prevent access to these endpoints. The issue affects both fresh installations and upgrades of Vault clusters that rely on auto_join.
Symptoms:
- Vault nodes fail to discover peers using cloud auto_join in affected AWS regions.
- Error messages such as the following appear in the Vault logs:
  error="failed to parse addresses from auto-join metadata: discover-aws: DescribeInstancesInput failed: operation error EC2: DescribeInstances, https response error StatusCode: 0, RequestID: , request send failed, Post \"https://ec2.[region].api.aws/\": dial tcp: lookup ec2.[region].api.aws: no such host"
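To confirm that a node is hitting this specific failure rather than another discovery problem, the Vault log can be scanned for the dual-stack lookup error shown above. The Python sketch below is a hypothetical helper, not a HashiCorp tool. It assumes the log text contains the "lookup ec2.<region>.api.aws: no such host" fragment from the example (optionally with a resolver address such as "on 10.0.0.2:53" in between) and reports the host and region it found.

```python
import re
import sys

# Matches the resolver failure for the EC2 dual-stack hostname, e.g.
# "lookup ec2.<region>.api.aws: no such host" or
# "lookup ec2.<region>.api.aws on 10.0.0.2:53: no such host".
DUAL_STACK_LOOKUP = re.compile(
    r"lookup (ec2\.([a-z0-9-]+)\.api\.aws)(?: on \S+)?: no such host"
)


def find_dual_stack_failures(lines):
    """Yield (line_number, host, region) for each log line that matches."""
    for number, line in enumerate(lines, start=1):
        match = DUAL_STACK_LOOKUP.search(line)
        if match:
            yield number, match.group(1), match.group(2)


if __name__ == "__main__":
    # Read log lines from stdin; exit 1 if any matching error was found.
    hits = list(find_dual_stack_failures(sys.stdin))
    for number, host, region in hits:
        print(f"line {number}: {host} did not resolve (region {region})")
    sys.exit(1 if hits else 0)
```

Piping the Vault server log into the script prints one line per match; a non-zero exit status makes it usable in a health-check script.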
Cause:
The issue is caused by an underlying change in the go-discover module, which switched Vault’s auto_join discovery logic to require EC2 dual-stack API endpoints.
Affected Versions:
- Vault 1.19.6, Vault 1.19.7, and Vault 1.19.8
- Vault 1.20.0, Vault 1.20.1, and Vault 1.20.2
Fixed Versions:
- Vault 1.19.9
- Vault 1.20.3
Note: Even in the fixed versions above, AWS cloud auto_join still fails on clusters where outbound network traffic to the dual-stack endpoints is blocked. For those clusters, the only way to keep using auto_join is currently to allow connectivity to the dual-stack API endpoint.
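A quick way to check whether a given build falls in the affected ranges is to compare its version number against the lists above. The following Python sketch hard-codes those ranges. It only reads the first major.minor.patch number in its input and ignores any prefix or suffix, which is an assumption about how your version strings are formatted.

```python
import re
import sys

# Affected ranges from the lists above: (major, minor, first_patch, last_patch).
AFFECTED = [
    (1, 19, 6, 8),  # 1.19.6 to 1.19.8, fixed in 1.19.9
    (1, 20, 0, 2),  # 1.20.0 to 1.20.2, fixed in 1.20.3
]


def parse_version(text):
    """Return (major, minor, patch) from the first x.y.z found in text."""
    match = re.search(r"v?(\d+)\.(\d+)\.(\d+)", text)
    if not match:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def is_affected(text):
    """True if the version in text is inside one of the affected ranges."""
    major, minor, patch = parse_version(text)
    return any(
        major == a_major and minor == a_minor and first <= patch <= last
        for a_major, a_minor, first, last in AFFECTED
    )


if __name__ == "__main__":
    # Accept the version as an argument, or read it from stdin.
    text = sys.argv[1] if len(sys.argv) > 1 else sys.stdin.read()
    affected = is_affected(text)
    print("affected" if affected else "not in the affected ranges")
    sys.exit(1 if affected else 0)
```

Because it searches rather than anchoring at the start, the script also accepts output like that of vault version, which begins with a string such as "Vault v1.19.7"; for that input it reports "affected", while "1.20.3" is reported as outside the affected ranges.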
Impact:
- Fresh setups and upgrades in affected regions will not be able to use cloud auto_join and will fail during instance discovery.
- Regions without dual-stack endpoint support, and clusters whose proxy or network rules block these endpoints, will see discovery failures.
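The two failure modes in the impact list can usually be told apart from an affected node by resolving and then connecting to the regional EC2 endpoints. A name that does not resolve suggests the region lacks the endpoint (or DNS filtering is in place), while a name that resolves but cannot be reached suggests network controls. This Python sketch assumes the standard commercial-partition hostnames, ec2.<region>.api.aws for dual-stack (as in the error above) and ec2.<region>.amazonaws.com for IPv4. It makes a direct TCP connection on port 443, so a node that reaches AWS only through an HTTP proxy may report an endpoint as unreachable even when the proxy would allow it.

```python
import socket
import sys


def ec2_endpoints(region):
    """Return the EC2 API hostnames to probe (commercial AWS partition assumed)."""
    return {
        "dual-stack": f"ec2.{region}.api.aws",
        "ipv4": f"ec2.{region}.amazonaws.com",
    }


def resolves(host, port=443):
    """True if the local resolver returns at least one address for host."""
    try:
        return bool(socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP))
    except socket.gaierror:
        return False


def reachable(host, port=443, timeout=3.0):
    """True if a direct TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and DNS errors
        return False


def probe(region):
    """Map each endpoint kind to (host, resolves, reachable)."""
    results = {}
    for kind, host in ec2_endpoints(region).items():
        ok_dns = resolves(host)
        # Only attempt a connection when name resolution succeeded.
        ok_tcp = reachable(host) if ok_dns else False
        results[kind] = (host, ok_dns, ok_tcp)
    return results


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: probe_ec2_endpoints.py <aws-region>")
    for kind, (host, ok_dns, ok_tcp) in probe(sys.argv[1]).items():
        dns = "ok" if ok_dns else "FAIL"
        tcp = "ok" if ok_tcp else "FAIL"
        print(f"{kind:<10} {host:<40} dns={dns} tcp={tcp}")
```

Run it on a Vault node with the region as its only argument; if the IPv4 endpoint succeeds while the dual-stack one fails, the node matches the pattern described in this article.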
Workarounds:
- Replace the auto_join parameter with retry_join stanzas whose leader_api_addr values use static IP addresses or DNS names, so that cluster joining is configured manually.
- If feasible, add a DNS or network rule that routes dual-stack endpoint requests to the region’s IPv4 endpoint, or, in supported regions, allow outbound traffic to the dual-stack endpoints.
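The first workaround amounts to listing the peers explicitly instead of discovering them through the EC2 API. A hypothetical Python helper that renders one retry_join stanza per peer is sketched below. The addresses and CA path are placeholders, and the output is meant to replace the auto_join-based stanza inside the existing storage "raft" block. leader_api_addr and leader_ca_cert_file are retry_join parameters, but check the Vault configuration documentation for the TLS settings your cluster actually needs.

```python
def render_retry_join(leader_addrs, ca_cert_file=None):
    """Render HCL retry_join stanzas, one per leader API address."""
    blocks = []
    for addr in leader_addrs:
        # leader_api_addr is a full URL, so require an explicit scheme.
        if not addr.startswith(("https://", "http://")):
            raise ValueError(f"leader_api_addr needs a scheme: {addr!r}")
        lines = ["retry_join {", f'  leader_api_addr = "{addr}"']
        if ca_cert_file:
            lines.append(f'  leader_ca_cert_file = "{ca_cert_file}"')
        lines.append("}")
        blocks.append("\n".join(lines))
    # Separate stanzas with a blank line for readability.
    return "\n\n".join(blocks)


if __name__ == "__main__":
    # Placeholder addresses: replace with your nodes' static IPs or DNS names.
    print(render_retry_join(
        [
            "https://vault-1.example.internal:8200",
            "https://vault-2.example.internal:8200",
            "https://vault-3.example.internal:8200",
        ],
        ca_cert_file="/opt/vault/tls/ca.pem",
    ))
```

Generating the stanzas from a single list keeps the peer addresses consistent across every node's configuration file; the trade-off compared with auto_join is that the list must be updated by hand whenever nodes are replaced.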
Resources: