Prerequisites
- Vault 1.6.0 or later; [cloud auto join](https://learn.hashicorp.com/tutorials/vault/raft-storage-aws?in=vault/raft#cloud-auto-join) was introduced in this version.
- A Vault cluster is being set up in AWS, using [auto_join](https://www.vaultproject.io/docs/configuration/storage/raft#retry_join-stanza) to discover nodes.
- The AWS CLI is installed on the Vault nodes (EC2 instances), and those instances have [describe-instances](https://docs.aws.amazon.com/cli/latest/reference/ec2/describe-instances.html) permissions.
- Three EC2 instances are provisioned with the correct tags in the correct AWS region.
- One Vault node is initialized, and the other two nodes should join it to form a three-node Vault cluster using Integrated Storage.
Cause
- Vault uses `auto_join`, which takes cloud-provider-specific configuration as input. When `auto_join` is configured, Vault automatically attempts to discover and resolve potential leader addresses to set up the cluster.
- Sometimes `auto_join` in the Vault Integrated Storage stanza does not discover nodes as expected. The Vault logs show how Vault tried to discover the other Vault instances and which IP addresses it found.
- This example uses the Vault config file below, with `auto_join = "provider=aws addr_type=private_v4 tag_key=auto_join tag_value=vault-prd region=ap-southeast-1"`:
$ sudo cat /etc/vault.d/vault.hcl
storage "raft" {
path = "/vault/xxx"
node_id = "xxx"
retry_join {
auto_join_scheme = "http"
auto_join = "provider=aws addr_type=private_v4 tag_key=auto_join tag_value=vault-prd region=ap-southeast-1"
}
}
listener "tcp" {
address = "0.0.0.0:8200"
cluster_address = "0.0.0.0:8201"
tls_disable = true
}
.
.
seal "awskms" {
.
.
}
.
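To compare the configured filter against the tags actually set on the instances, it can help to split the `auto_join` string into its individual parameters. The following is a minimal sketch and not part of Vault; the function name `auto_join_params` is hypothetical, and it assumes the `auto_join` value sits on a single line, as in the file above.

```shell
# Hypothetical helper: print each key=value pair of the auto_join string
# on its own line. Reads Vault config text from stdin.
auto_join_params() {
  # Match only lines of the form: auto_join = "..."
  # (auto_join_scheme does not match, because "=" or whitespace must
  # follow "auto_join" directly.)
  sed -n 's/^[[:space:]]*auto_join[[:space:]]*=[[:space:]]*"\(.*\)"[[:space:]]*$/\1/p' \
    | tr ' ' '\n'
}
```

Running `sudo cat /etc/vault.d/vault.hcl | auto_join_params` prints one parameter per line, such as `tag_key=auto_join` and `tag_value=vault-prd`, which makes a mistyped tag value or region easier to spot.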
- In the logs below, this Vault node could not discover any IP addresses (`[DEBUG] discover-aws: Found ip addresses: []`) using the filter configured in `auto_join`, that is, `[INFO] discover-aws: Filter instances with auto_join=vault-prd` and `[INFO] discover-aws: Region is ap-southeast-1`. Because the node could not discover the already initialized node, it could not join the cluster (and unseal itself).
"Sep 29 16:28:19 vault: 2021-09-29T16:28:19.352Z [ERROR] core: failed to retry join raft cluster: retry=2s"
"Sep 29 16:28:19 vault: 2021-09-29T16:28:19.352Z [INFO] core: [DEBUG] discover-aws: Found ip addresses: []"
"Sep 29 16:28:19 vault: 2021-09-29T16:28:19.352Z [INFO] core: [DEBUG] discover-aws: Found 0 reservations"
"Sep 29 16:28:19 vault: 2021-09-29T16:28:19.307Z [INFO] core: [INFO] discover-aws: Filter instances with auto_join=vault-prd"
"Sep 29 16:28:19 vault: 2021-09-29T16:28:19.307Z [INFO] core: [DEBUG] discover-aws: Creating session..."
"Sep 29 16:28:19 vault: 2021-09-29T16:28:19.307Z [INFO] core: [INFO] discover-aws: Region is ap-southeast-1"
"Sep 29 16:28:19 vault: 2021-09-29T16:28:19.307Z [INFO] core: [DEBUG] discover-aws: Using environment variables, shared credentials or instance role
.
.
"Sep 29 16:46:47 vault: 2021-09-29T16:46:47.806-0400Z [INFO] core: security barrier not initialized"
"Sep 29 16:46:47 vault: 2021-09-29T16:46:47.377-0400Z [INFO] core: security barrier not initialized"
"Sep 29 16:46:47 vault: 2021-09-29T16:46:47.379-0400Z [INFO] core: stored unseal keys supported, attempting fetch
"Sep 29 16:46:47 vault: 2021-09-29T16:46:47.387-0400Z [WARN] failed to unseal core: error="stored unseal keys are supported, but none were found"
- The command below can be used to verify whether the filter configured in `auto_join` discovers any EC2 instances using the AWS CLI. Log in to one of the EC2 instances on which the Vault process is running and check whether it can discover the other Vault nodes. With the tag used above, the command returns no IP addresses.
# tag "auto_join=vault-prd" does not result in any IP addresses being discovered.
$ aws ec2 describe-instances --filters "Name=tag:auto_join,Values=vault-prd" --region ap-southeast-1 | jq '.Reservations[].Instances[].PublicIpAddress'
- After checking the tags on these nodes, the actual tag was found to be `auto_join=vault-prod`. Running the AWS CLI with the correct tag discovers the IP addresses of the EC2 instances.
# modify the aws command to use the correct tag, which returns the IP addresses of the Vault nodes.
# grab the private IP addresses of the Vault nodes tagged "auto_join=vault-prod"
$ aws ec2 describe-instances --filters "Name=tag:auto_join,Values=vault-prod" --region ap-southeast-1 | jq '.Reservations[].Instances[].PrivateIpAddress'
"10.0.101.23"
"10.0.101.24"
"10.0.101.22"
# grab the public IP addresses of the Vault nodes tagged "auto_join=vault-prod"
$ aws ec2 describe-instances --filters "Name=tag:auto_join,Values=vault-prod" --region ap-southeast-1 | jq '.Reservations[].Instances[].PublicIpAddress'
"54.255.241.88"
"13.228.29.39"
"13.229.54.249"
Solution
- Once the correct tag is identified, update the Vault config file with the correct values. The corrected vault.hcl file looks like this:
$ sudo cat /etc/vault.d/vault.hcl
storage "raft" {
path = "/vault/xxx"
node_id = "xxx"
retry_join {
auto_join_scheme = "http"
auto_join = "provider=aws addr_type=private_v4 tag_key=auto_join tag_value=vault-prod region=ap-southeast-1"
}
}
listener "tcp" {
address = "0.0.0.0:8200"
cluster_address = "0.0.0.0:8201"
tls_disable = true
}
.
.
.
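The change from `vault-prd` to `vault-prod` can also be previewed before the file is edited. The sketch below is illustrative only: `set_tag_value` is a hypothetical helper, it only rewrites the `tag_value` on the `auto_join` line, and it assumes the new value contains no `/` or `&` characters (which are special to `sed`).

```shell
# Hypothetical helper: read Vault config text on stdin and print it with
# the tag_value inside the auto_join string replaced by the first argument.
# The body runs in a subshell so new_value does not leak into the caller.
set_tag_value() (
  new_value="$1"
  # Restrict the substitution to the auto_join line; other lines pass through.
  sed "/^[[:space:]]*auto_join[[:space:]]*=/ s/tag_value=[^ \"]*/tag_value=${new_value}/"
)
```

For example, `sudo cat /etc/vault.d/vault.hcl | set_tag_value vault-prod` prints the updated config to the terminal for review; writing it back to the file is left to the operator, and a restart of the Vault service is assumed to be needed afterwards for the storage stanza change to take effect.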
- The Vault logs now show the discovered IP addresses, and the node joins the already initialized node to form a cluster.
Oct 01 00:16:02 ip-10-0-101-24 vault[8374]: 2021-10-01T00:16:02.959Z [INFO] core: [DEBUG] discover: Using provider "aws"
Oct 01 00:16:02 ip-10-0-101-24 vault[8374]: 2021-10-01T00:16:02.959Z [INFO] core: [DEBUG] discover-aws: Using region=ap-southeast-1 tag_key=auto_join tag_value=vault-prod addr_type
Oct 01 00:16:02 ip-10-0-101-24 vault[8374]: 2021-10-01T00:16:02.959Z [INFO] core: [DEBUG] discover-aws: No static credentials
Oct 01 00:16:02 ip-10-0-101-24 vault[8374]: 2021-10-01T00:16:02.959Z [INFO] core: [DEBUG] discover-aws: Using environment variables, shared credentials or instance role
Oct 01 00:16:02 ip-10-0-101-24 vault[8374]: 2021-10-01T00:16:02.959Z [INFO] core: [INFO] discover-aws: Region is ap-southeast-1
Oct 01 00:16:02 ip-10-0-101-24 vault[8374]: 2021-10-01T00:16:02.959Z [INFO] core: [DEBUG] discover-aws: Creating session...
Oct 01 00:16:02 ip-10-0-101-24 vault[8374]: 2021-10-01T00:16:02.959Z [INFO] core: [INFO] discover-aws: Filter instances with auto_join=vault-prod
Oct 01 00:16:03 ip-10-0-101-24 vault[8374]: 2021-10-01T00:16:03.146Z [INFO] core: [DEBUG] discover-aws: Found 3 reservations
Oct 01 00:16:03 ip-10-0-101-24 vault[8374]: 2021-10-01T00:16:03.147Z [INFO] core: [DEBUG] discover-aws: Reservation r-0e7fbaf5c0f3f5321 has 1 instances
Oct 01 00:16:03 ip-10-0-101-24 vault[8374]: 2021-10-01T00:16:03.147Z [INFO] core: [DEBUG] discover-aws: Found instance i-0cb82a603a6bd53ed
Oct 01 00:16:03 ip-10-0-101-24 vault[8374]: 2021-10-01T00:16:03.147Z [INFO] core: [INFO] discover-aws: Instance i-0cb82a603a6bd53ed has private ip 10.0.101.23
Oct 01 00:16:03 ip-10-0-101-24 vault[8374]: 2021-10-01T00:16:03.147Z [INFO] core: [DEBUG] discover-aws: Reservation r-09ae06171405fe029 has 1 instances
Oct 01 00:16:03 ip-10-0-101-24 vault[8374]: 2021-10-01T00:16:03.147Z [INFO] core: [DEBUG] discover-aws: Found instance i-020197b58ded5c18c
Oct 01 00:16:03 ip-10-0-101-24 vault[8374]: 2021-10-01T00:16:03.147Z [INFO] core: [INFO] discover-aws: Instance i-020197b58ded5c18c has private ip 10.0.101.24
Oct 01 00:16:03 ip-10-0-101-24 vault[8374]: 2021-10-01T00:16:03.148Z [INFO] core: [DEBUG] discover-aws: Reservation r-08b60e698586ed69b has 1 instances
Oct 01 00:16:03 ip-10-0-101-24 vault[8374]: 2021-10-01T00:16:03.148Z [INFO] core: [DEBUG] discover-aws: Found instance i-099f0c7b8d026f3a5
Oct 01 00:16:03 ip-10-0-101-24 vault[8374]: 2021-10-01T00:16:03.148Z [INFO] core: [INFO] discover-aws: Instance i-099f0c7b8d026f3a5 has private ip 10.0.101.22
Oct 01 00:16:03 ip-10-0-101-24 vault[8374]: 2021-10-01T00:16:03.148Z [INFO] core: [DEBUG] discover-aws: Found ip addresses: [10.0.101.23 10.0.101.24 10.0.101.22]
Oct 01 00:16:03 ip-10-0-101-24 vault[8374]: 2021-10-01T00:16:03.148Z [INFO] core: security barrier not initialized
Oct 01 00:16:03 ip-10-0-101-24 vault[8374]: 2021-10-01T00:16:03.148Z [INFO] core: attempting to join possible raft leader node: leader_addr=http://10.0.101.23:8200
Oct 01 00:16:03 ip-10-0-101-24 vault[8374]: 2021-10-01T00:16:03.166Z [INFO] core.cluster-listener.tcp: starting listener: listener_address=0.0.0.0:8201
Oct 01 00:16:03 ip-10-0-101-24 vault[8374]: 2021-10-01T00:16:03.169Z [INFO] core.cluster-listener: serving cluster requests: cluster_listen_address=[::]:8201
Oct 01 00:16:03 ip-10-0-101-24 vault[8374]: 2021-10-01T00:16:03.172Z [INFO] storage.raft: creating Raft: config="&raft.Config{ProtocolVersion:3, HeartbeatTimeout:5000000000, Electi
Oct 01 00:16:03 ip-10-0-101-24 vault[8374]: 2021-10-01T00:16:03.174Z [INFO] storage.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:vault_2 Address:10.0.101.22:82
Oct 01 00:16:03 ip-10-0-101-24 vault[8374]: 2021-10-01T00:16:03.174Z [INFO] core: successfully joined the raft cluster: leader_addr=""
Oct 01 00:16:03 ip-10-0-101-24 vault[8374]: 2021-10-01T00:16:03.174Z [INFO] storage.raft: entering follower state: follower="Node at 10.0.101.24:8201 [Follower]" leader=
Oct 01 00:16:03 ip-10-0-101-24 vault[8374]: 2021-10-01T00:16:03.325Z [WARN] storage.raft: failed to get previous log: previous-index=329 last-index=1 error="log not found"
Oct 01 00:16:07 ip-10-0-101-24 vault[8374]: 2021-10-01T00:16:07.959Z [INFO] core: stored unseal keys supported, attempting fetch
Oct 01 00:16:07 ip-10-0-101-24 vault[8374]: 2021-10-01T00:16:07.962Z [WARN] core: cluster listener is already started
Oct 01 00:16:07 ip-10-0-101-24 vault[8374]: 2021-10-01T00:16:07.962Z [INFO] core: vault is unsealed
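When scanning long logs like the ones above, only a few lines matter: the discovered addresses, the join message, and the unseal message. The following sketch condenses them into one line. The function name `join_summary` is hypothetical, and the matched strings are taken from the log excerpts in this article, so they may differ in other Vault versions.

```shell
# Hypothetical helper: read Vault log lines on stdin and summarize
# auto-join discovery, raft join, and unseal status.
join_summary() {
  awk '
    /discover-aws: Found ip addresses:/ {
      # Keep only the text between the square brackets of the latest match.
      ips = $0
      sub(/.*Found ip addresses: \[/, "", ips)
      sub(/\].*/, "", ips)
    }
    /successfully joined the raft cluster/ { joined = "yes" }
    /vault is unsealed/                    { unsealed = "yes" }
    END {
      printf "discovered=[%s] joined=%s unsealed=%s\n", ips, (joined ? joined : "no"), (unsealed ? unsealed : "no")
    }
  '
}
```

Assuming Vault runs as a systemd unit named `vault`, it could be used as `journalctl -u vault --no-pager | join_summary`. Cluster membership can additionally be confirmed from Vault itself with `vault operator raft list-peers`, which requires a valid token.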
Outcome
- We should now be able to identify the correct tags and region to use in the `auto_join` parameter of the `retry_join` stanza for Integrated Storage. Once the correct tags are set in `auto_join`, the Vault node should be able to join the other nodes and form a cluster.
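To keep the AWS CLI check in step with the Vault configuration, the tag filter and region can be derived from the `auto_join` string instead of being typed by hand. This is a rough sketch under stated assumptions: `describe_args` is a hypothetical helper, and it assumes the tag key, tag value, and region contain no spaces.

```shell
# Hypothetical helper: turn an auto_join string into the matching
# "aws ec2 describe-instances" filter and region arguments.
# The body runs in a subshell so its variables and set -f stay local.
describe_args() (
  set -f                       # no filename globbing while word-splitting
  key="" value="" region=""
  for pair in $1; do           # intentionally unquoted: split on spaces
    case "$pair" in
      tag_key=*)   key="${pair#tag_key=}" ;;
      tag_value=*) value="${pair#tag_value=}" ;;
      region=*)    region="${pair#region=}" ;;
    esac
  done
  printf '%s Name=tag:%s,Values=%s --region %s\n' --filters "$key" "$value" "$region"
)
```

The result can be passed to the CLI with an unquoted command substitution, for example `aws ec2 describe-instances $(describe_args "provider=aws addr_type=private_v4 tag_key=auto_join tag_value=vault-prod region=ap-southeast-1") --query 'Reservations[].Instances[].PrivateIpAddress' --output text`. If this returns no addresses, Vault's discovery with the same string is unlikely to find any either.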