Introduction:
Vault supports several storage backends for persisting its secrets and configuration data. Integrated Storage (i.e., Raft) offers advantages such as operational simplicity and network efficiency over external storage backends.
There is an official guide covering the steps required to perform storage migration. In this article we will cover one scenario where a Vault storage migration from Consul to Raft leaves the Vault nodes in an active-active state, and how to avoid it.
Setup:
Our lab consists of a minimal setup of two EC2 nodes, each running a Consul agent and a Vault server with minimal configuration. Additionally, we will be using an auto-unseal mechanism (AWS KMS), so Vault generates recovery keys/shards instead of unseal keys.
On "vault-1" node:-
- Run a Consul agent:
# consul agent -server -bootstrap=true -data-dir=/etc/consul/data/ -bind=172.31.17.225 -client=172.31.17.225
- Create an AWS KMS key for auto-unseal purposes:
# aws kms create-key --region ap-southeast-1
# aws kms describe-key --key-id 89659ef2-d21b-4f93-b8e8-75df5e314573 --region ap-southeast-1
{
    "KeyMetadata": {
        "AWSAccountId": "266349568266",
        "KeyId": "89659ef2-d21b-4f93-b8e8-75df5e314573",
        "Arn": "arn:aws:kms:ap-southeast-1:266349568266:key/89659ef2-d21b-4f93-b8e8-75df5e314573",
        "CreationDate": "2024-06-28T03:38:32.839000+00:00",
        "Enabled": true,
        "Description": "",
        "KeyUsage": "ENCRYPT_DECRYPT",
        "KeyState": "Enabled",
        "Origin": "AWS_KMS",
        "KeyManager": "CUSTOMER",
        "CustomerMasterKeySpec": "SYMMETRIC_DEFAULT",
        "KeySpec": "SYMMETRIC_DEFAULT",
        "EncryptionAlgorithms": [
            "SYMMETRIC_DEFAULT"
        ],
        "MultiRegion": false
    }
}
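For the awskms seal to work, the credentials available to Vault (e.g., via an EC2 instance profile) need Encrypt, Decrypt, and DescribeKey permissions on this key. A minimal IAM policy sketch (the resource ARN is the key created above; how the policy is attached depends on your environment):
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "kms:Encrypt",
                "kms:Decrypt",
                "kms:DescribeKey"
            ],
            "Resource": "arn:aws:kms:ap-southeast-1:266349568266:key/89659ef2-d21b-4f93-b8e8-75df5e314573"
        }
    ]
}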
- Create a Vault configuration file containing the required storage and seal stanzas, then start the Vault server with this config file.
storage "consul" {
address = "172.31.17.225:8500"
path = "vault/"
}
listener "tcp" {
address = "0.0.0.0:8200"
cluster_address = "0.0.0.0:8201"
tls_disable = 1
}
seal "awskms" {
region = "ap-southeast-1"
kms_key_id = "89659ef2-d21b-4f93-b8e8-75df5e314573"
}
api_addr = "http://172.31.17.225:8200"
cluster_addr = "http://172.31.17.225:8201"
disable_mlock = true
# vault server -config=./vault_config.hcl
==> Vault server configuration:
Administrative Namespace:
Api Address: http://172.31.17.225:8200
Cgo: disabled
Cluster Address: https://172.31.17.225:8201
Environment Variables: BASH_FUNC_which%%, GODEBUG, HISTSIZE, HOME, HOSTNAME, LANG, LESSOPEN, LOGNAME, LS_COLORS, MAIL, OLDPWD, PATH, PWD, SHELL, SHLVL, SUDO_COMMAND, SUDO_GID, SUDO_UID, SUDO_USER, SYSTEMD_COLORS, S_COLORS, TERM, USER, _, which_declare
Go Version: go1.21.5
Listener 1: tcp (addr: "0.0.0.0:8200", cluster address: "0.0.0.0:8201", max_request_duration: "1m30s", max_request_size: "33554432", tls: "disabled")
Log Level:
Mlock: supported: true, enabled: false
Recovery Mode: false
Storage: consul (HA available)
Version: Vault v1.15.5, built 2024-01-26T14:53:40Z
Version Sha: 0d8b67ef63815f20421c11fe9152d435af3403e6
==> Vault server started! Log data will stream in below:
2024-06-28T06:32:45.761Z [INFO] proxy environment: http_proxy="" https_proxy="" no_proxy=""
2024-06-28T06:32:45.811Z [INFO] incrementing seal generation: generation=1
2024-06-28T06:32:45.812Z [INFO] core: Initializing version history cache for core
2024-06-28T06:32:45.812Z [INFO] events: Starting event system
2024-06-28T06:32:45.812Z [INFO] core: stored unseal keys supported, attempting fetch
...
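Not shown in the logs above: on the very first start, the cluster still has to be initialized once. With KMS auto-unseal, initialization returns recovery keys rather than unseal keys (5 shares with a threshold of 3 by default, which matches the status output below). A minimal sketch:
# export VAULT_ADDR=http://172.31.17.225:8200
# vault operator init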
- Check the vault-1 node status; the HA Mode field shows active:
[root@ip-172-31-17-225 ec2-user]# vault status
Key                      Value
---                      -----
Seal Type                awskms
Recovery Seal Type       shamir
Initialized              true
Sealed                   false
Total Recovery Shares    5
Threshold                3
Version                  1.15.5
Build Date               2024-01-26T14:53:40Z
Storage Type             consul
Cluster Name             vault-cluster-7c302d46
Cluster ID               83898ee0-4d32-463c-663f-10bd9b13f10e
HA Enabled               true
HA Cluster               https://172.31.17.225:8201
HA Mode                  active
Active Since             2024-06-28T13:20:37.474663412Z
On "vault-2" node:-
- Run a Consul agent with the following consul_config.hcl file:
bind_addr   = "172.31.31.122"
client_addr = "172.31.31.122"
data_dir    = "/etc/consul/data/"
retry_join  = ["provider=aws region=ap-southeast-1 tag_key=Name tag_value=vault"]
# consul agent -config-file=./consul_config.hcl
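Before starting Vault on this node, it is worth confirming that the two Consul agents have actually formed a cluster. A quick check against the client address configured above might look like:
# consul members -http-addr=http://172.31.31.122:8500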
- Create the Vault configuration file for the vault-2 node with the same storage and seal stanzas (pointing at its local Consul agent):
storage "consul" {
address = "172.31.31.122:8500"
path = "vault/"
}
listener "tcp" {
address = "0.0.0.0:8200"
cluster_address = "0.0.0.0:8201"
tls_disable = 1
}
seal "awskms" {
region = "ap-southeast-1"
kms_key_id = "89659ef2-d21b-4f93-b8e8-75df5e314573"
}
api_addr = "http://172.31.31.122:8200"
cluster_addr = "http://172.31.31.122:8201"
disable_mlock = true
# vault server -config=./config.hcl
==> Vault server configuration:
Administrative Namespace:
Api Address: http://172.31.31.122:8200
Cgo: disabled
Cluster Address: https://172.31.31.122:8201
Environment Variables: BASH_FUNC_which%%, GODEBUG, HISTSIZE, HOME, HOSTNAME, LANG, LESSOPEN, LOGNAME, LS_COLORS, MAIL, OLDPWD, PATH, PWD, SHELL, SHLVL, SUDO_COMMAND, SUDO_GID, SUDO_UID, SUDO_USER, SYSTEMD_COLORS, S_COLORS, TERM, USER, _, which_declare
Go Version: go1.21.5
Listener 1: tcp (addr: "0.0.0.0:8200", cluster address: "0.0.0.0:8201", max_request_duration: "1m30s", max_request_size: "33554432", tls: "disabled")
Log Level:
Mlock: supported: true, enabled: false
Recovery Mode: false
Storage: consul (HA available)
Version: Vault v1.15.5, built 2024-01-26T14:53:40Z
Version Sha: 0d8b67ef63815f20421c11fe9152d435af3403e6
==> Vault server started! Log data will stream in below:
2024-06-28T06:33:12.124Z [INFO] proxy environment: http_proxy="" https_proxy="" no_proxy=""
2024-06-28T06:33:12.179Z [INFO] incrementing seal generation: generation=1
...
- Check the vault-2 node status; since it has joined the vault-1 node's cluster, its HA Mode shows standby:
[root@ip-172-31-31-122 ec2-user]# vault status
Key                      Value
---                      -----
Seal Type                awskms
Recovery Seal Type       shamir
Initialized              true
Sealed                   false
Total Recovery Shares    5
Threshold                3
Version                  1.15.5
Build Date               2024-01-26T14:53:40Z
Storage Type             consul
Cluster Name             vault-cluster-7c302d46
Cluster ID               83898ee0-4d32-463c-663f-10bd9b13f10e
HA Enabled               true
HA Cluster               https://172.31.17.225:8201
HA Mode                  standby
Active Node Address      http://172.31.17.225:8200
Now let us perform the storage migration from Consul to Raft.
On "vault-1" node:-
- Create the following migrate.hcl file to specify the source and destination storage backends. This file is then passed to vault operator migrate -config=<file_name>:
[root@ip-172-31-17-225 multiple_active_node_issue]# cat migrate.hcl
storage_source "consul" {
  address = "172.31.17.225:8500"
  path    = "vault"
}

storage_destination "raft" {
  path    = "/etc/vault/raft/"
  node_id = "vault-1"
}

cluster_addr = "http://172.31.17.225:8201"
[root@ip-172-31-17-225 multiple_active_node_issue]# vault operator migrate -config=migrate.hcl
2024-06-28T13:24:46.482Z [WARN] appending trailing forward slash to path
2024-06-28T13:24:46.496Z [INFO] creating Raft: config="&raft.Config{ProtocolVersion:3, HeartbeatTimeout:5000000000, ElectionTimeout:5000000000, CommitTimeout:50000000, MaxAppendEntries:64, BatchApplyCh:true, ShutdownOnRemove:true, TrailingLogs:0x2800, SnapshotInterval:120000000000, SnapshotThreshold:0x2000, LeaderLeaseTimeout:2500000000, LocalID:\"vault-1\", NotifyCh:(chan<- bool)(0xc002a5ea80), LogOutput:io.Writer(nil), LogLevel:\"DEBUG\", Logger:(*hclog.intLogger)(0xc002c99b80), NoSnapshotRestoreOnStart:true, skipStartup:false}"
2024-06-28T13:24:46.497Z [INFO] initial configuration: index=1 servers="[{Suffrage:Voter ID:vault-1 Address:172.31.17.225:8201}]"
2024-06-28T13:24:46.497Z [INFO] entering follower state: follower="Node at vault-1 [Follower]" leader-address= leader-id=
2024-06-28T13:24:55.236Z [WARN] heartbeat timeout reached, starting election: last-leader-addr= last-leader-id=
2024-06-28T13:24:55.236Z [INFO] entering candidate state: node="Node at vault-1 [Candidate]" term=2
2024-06-28T13:24:55.241Z [INFO] election won: term=2 tally=1
2024-06-28T13:24:55.241Z [INFO] entering leader state: leader="Node at vault-1 [Leader]"
2024-06-28T13:24:55.256Z [INFO] copied key: path=core/auth
2024-06-28T13:24:55.258Z [INFO] copied key: path=core/leader/04122fdb-a630-e024-575d-1d73cf11c08d
2024-06-28T13:24:55.258Z [INFO] copied key: path=core/cluster/local/info
2024-06-28T13:24:55.258Z [INFO] copied key: path=core/hsm/barrier-unseal-keys
2024-06-28T13:24:55.258Z [INFO] copied key: path=core/index-header-hmac-key
2024-06-28T13:24:55.258Z [INFO] copied key: path=core/audit
2024-06-28T13:24:55.258Z [INFO] copied key: path=core/cluster/feature-flags
2024-06-28T13:24:55.258Z [INFO] copied key: path=core/local-auth
2024-06-28T13:24:55.258Z [INFO] copied key: path=core/keyring
2024-06-28T13:24:55.258Z [INFO] copied key: path=core/local-audit
2024-06-28T13:24:55.267Z [INFO] copied key: path=logical/65928e8a-a4c5-6fbc-54ad-8281644bda7c/oidc_provider/provider/default
2024-06-28T13:24:55.267Z [INFO] copied key: path=core/mounts
2024-06-28T13:24:55.267Z [INFO] copied key: path=core/master
2024-06-28T13:24:55.267Z [INFO] copied key: path=logical/65928e8a-a4c5-6fbc-54ad-8281644bda7c/oidc_provider/assignment/allow_all
2024-06-28T13:24:55.267Z [INFO] copied key: path=core/recovery-key
2024-06-28T13:24:55.267Z [INFO] copied key: path=core/local-mounts
2024-06-28T13:24:55.267Z [INFO] copied key: path=core/seal-config
2024-06-28T13:24:55.267Z [INFO] copied key: path=core/versions/1.15.5
2024-06-28T13:24:55.267Z [INFO] copied key: path=core/wrapping/jwtkey
2024-06-28T13:24:55.267Z [INFO] copied key: path=core/recovery-config
2024-06-28T13:24:55.272Z [INFO] copied key: path=logical/65928e8a-a4c5-6fbc-54ad-8281644bda7c/oidc_tokens/named_keys/default
2024-06-28T13:24:55.275Z [INFO] copied key: path=sys/token/id/hd5e41c784c681de5fff6642add8123eaf7aee401370c53eadce92622a2905929
2024-06-28T13:24:55.275Z [INFO] copied key: path=sys/token/salt
2024-06-28T13:24:55.275Z [INFO] copied key: path=sys/token/accessor/dd33fd85917775f9ed82b1195aa6d56827d41430
2024-06-28T13:24:55.275Z [INFO] copied key: path=sys/policy/response-wrapping
2024-06-28T13:24:55.275Z [INFO] copied key: path=sys/policy/control-group
2024-06-28T13:24:55.275Z [INFO] copied key: path=sys/policy/default
Success! All of the keys have been migrated.
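At this point the migrated data lives under the Raft path given in migrate.hcl. As a quick sanity check (exact contents can vary by Vault version), the directory should now contain the Raft database files, roughly:
# ls /etc/vault/raft/
raft  vault.db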
- Update the Vault config file to replace the Consul storage backend with Raft, and use auto-join so the nodes can discover and join each other via the go-discover library:
[root@ip-172-31-17-225 multiple_active_node_issue]# cat vault_config.hcl
storage "raft" {
path = "/etc/vault/raft/"
node_id = "vault-1"
retry_join {
auto_join = "provider=aws region=ap-southeast-1 tag_key=Name tag_value=vault"
auto_join_scheme = "http"
}
}
#storage "consul" {
#address = "172.31.17.225:8500"
#path = "vault/"
#}
listener "tcp" {
address = "0.0.0.0:8200"
cluster_address = "0.0.0.0:8201"
tls_disable = 1
}
seal "awskms" {
region = "ap-southeast-1"
kms_key_id = "89659ef2-d21b-4f93-b8e8-75df5e314573"
}
api_addr = "http://172.31.17.225:8200"
cluster_addr = "http://172.31.17.225:8201"
disable_mlock = true
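The auto_join string above assumes both EC2 instances carry the tag Name=vault; without it, go-discover finds no peers. If the tag is missing, it can be added with something like the following (the instance ID is illustrative):
# aws ec2 create-tags --resources i-0123456789abcdef0 --tags Key=Name,Value=vault --region ap-southeast-1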
- Stop the running Vault server and start it again with the updated configuration file to load these changes:
[root@ip-172-31-17-225 multiple_active_node_issue]# vault server -config=./vault_config.hcl
==> Vault server configuration:
Administrative Namespace:
Api Address: http://172.31.17.225:8200
Cgo: disabled
Cluster Address: https://172.31.17.225:8201
Environment Variables: BASH_FUNC_which%%, GODEBUG, HISTSIZE, HOME, HOSTNAME, LANG, LESSOPEN, LOGNAME, LS_COLORS, MAIL, OLDPWD, PATH, PWD, SHELL, SHLVL, SUDO_COMMAND, SUDO_GID, SUDO_UID, SUDO_USER, SYSTEMD_COLORS, S_COLORS, TERM, USER, _, which_declare
Go Version: go1.21.5
Listener 1: tcp (addr: "0.0.0.0:8200", cluster address: "0.0.0.0:8201", max_request_duration: "1m30s", max_request_size: "33554432", tls: "disabled")
Log Level:
Mlock: supported: true, enabled: false
Recovery Mode: false
Storage: raft (HA available)
Version: Vault v1.15.5, built 2024-01-26T14:53:40Z
Version Sha: 0d8b67ef63815f20421c11fe9152d435af3403e6
==> Vault server started! Log data will stream in below:
2024-06-28T13:27:15.479Z [INFO] proxy environment: http_proxy="" https_proxy="" no_proxy=""
- Check the vault status output to verify that the storage backend has now changed to raft:
[root@ip-172-31-17-225 multiple_active_node_issue]# vault status
Key                      Value
---                      -----
Seal Type                awskms
Recovery Seal Type       shamir
Initialized              true
Sealed                   false
Total Recovery Shares    5
Threshold                3
Version                  1.15.5
Build Date               2024-01-26T14:53:40Z
Storage Type             raft
Cluster Name             vault-cluster-7c302d46
Cluster ID               83898ee0-4d32-463c-663f-10bd9b13f10e
HA Enabled               true
HA Cluster               https://172.31.17.225:8201
HA Mode                  active
Active Since             2024-06-28T13:27:20.721214948Z
Raft Committed Index     56
Raft Applied Index       56
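Listing the Raft peers on vault-1 at this stage would show a single-node cluster with vault-1 as leader, along these lines:
[root@ip-172-31-17-225 multiple_active_node_issue]# vault operator raft list-peers
Node       Address               State     Voter
----       -------               -----     -----
vault-1    172.31.17.225:8201    leader    true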
On "vault-2" node:-
If we were to run the migration (migrate.hcl) on this node as well, it would bootstrap a second, independent single-node Raft cluster in which vault-2 elects itself leader, turning the standby node into another active node (refer to the logs below).
# vault operator migrate -config=migrate.hcl
2024-06-28T06:28:58.261Z [WARN] appending trailing forward slash to path
2024-06-28T06:28:58.276Z [INFO] creating Raft: config="&raft.Config{ProtocolVersion:3, HeartbeatTimeout:5000000000, ElectionTimeout:5000000000, CommitTimeout:50000000, MaxAppendEntries:64, BatchApplyCh:true, ShutdownOnRemove:true, TrailingLogs:0x2800, SnapshotInterval:120000000000, SnapshotThreshold:0x2000, LeaderLeaseTimeout:2500000000, LocalID:\"vault-2\", NotifyCh:(chan<- bool)(0xc000127f10), LogOutput:io.Writer(nil), LogLevel:\"DEBUG\", Logger:(*hclog.intLogger)(0xc002e90c80), NoSnapshotRestoreOnStart:true, skipStartup:false}"
2024-06-28T06:28:58.278Z [INFO] initial configuration: index=1 servers="[{Suffrage:Voter ID:vault-2 Address:172.31.31.122:8201}]"
2024-06-28T06:28:58.278Z [INFO] entering follower state: follower="Node at vault-2 [Follower]" leader-address= leader-id=
2024-06-28T06:29:04.285Z [WARN] heartbeat timeout reached, starting election: last-leader-addr= last-leader-id=
2024-06-28T06:29:04.286Z [INFO] entering candidate state: node="Node at vault-2 [Candidate]" term=2
2024-06-28T06:29:04.291Z [INFO] election won: term=2 tally=1
2024-06-28T06:29:04.291Z [INFO] entering leader state: leader="Node at vault-2 [Leader]"
2024-06-28T06:29:04.318Z [INFO] copied key: path=core/leader/4883e328-f5f6-660b-2fe4-e19bdf5c645b
2024-06-28T06:29:04.319Z [INFO] copied key: path=core/audit
2024-06-28T06:29:04.319Z [INFO] copied key: path=core/auth
...
If we then change the storage stanza in the Vault config file to raft and restart the server, we will see that this node becomes active as well. Note that vault-2 now reports a different Cluster Name and Cluster ID than vault-1: we have ended up with two independent single-node clusters, each with its own active node.
[root@ip-172-31-31-122 vault]# vault operator raft list-peers
Node       Address               State     Voter
----       -------               -----     -----
vault-2    172.31.31.122:8201    leader    true
[root@ip-172-31-31-122 vault]# vault status
Key                      Value
---                      -----
Seal Type                awskms
Recovery Seal Type       shamir
Initialized              true
Sealed                   false
Total Recovery Shares    5
Threshold                3
Version                  1.15.5
Build Date               2024-01-26T14:53:40Z
Storage Type             raft
Cluster Name             vault-cluster-736b08f7
Cluster ID               0e3d2fce-6b2b-1251-aa81-aaeaf4553421
HA Enabled               true
HA Cluster               https://172.31.31.122:8201
HA Mode                  active
Active Since             2024-06-28T05:48:18.792686329Z
Raft Committed Index     62
Raft Applied Index       62
Solution:
We can avoid this issue by performing the storage migration in the following order.
- Start from the state where both Vault nodes use the Consul storage backend, with one node in active and the other in standby mode.
- On the active node, run vault operator migrate with the migrate.hcl file to migrate the data from Consul to Raft.
- Update the Vault configuration file on the active node to change the storage backend from Consul to Raft, then restart the Vault server on that node.
- On the standby node, simply update the Vault configuration file to change the storage backend from Consul to Raft, then restart the standby node.
Note: Do not run the migration (migrate.hcl) on both the active and standby nodes; the vault operator migrate step is required only on the active node. A condensed version of the sequence is sketched below.
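Putting the steps together (file names and paths as used in this lab; restart Vault however it is managed in your environment):
On the active node (vault-1):
# vault operator migrate -config=migrate.hcl
(edit vault_config.hcl: swap the consul storage stanza for the raft one)
# vault server -config=./vault_config.hcl
On the standby node (vault-2):
(edit the config file the same way; do NOT run vault operator migrate)
# vault server -config=./vault_config.hcl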
So now, on vault-2, if we simply change the storage backend in the Vault configuration file to raft without performing the migrate.hcl step, and then restart the Vault server, we will see that the node joins vault-1 as a standby:
[root@ip-172-31-31-122 vault]# vault status
Key                      Value
---                      -----
Seal Type                awskms
Recovery Seal Type       shamir
Initialized              true
Sealed                   false
Total Recovery Shares    5
Threshold                3
Version                  1.15.5
Build Date               2024-01-26T14:53:40Z
Storage Type             raft
Cluster Name             vault-cluster-7c302d46
Cluster ID               83898ee0-4d32-463c-663f-10bd9b13f10e
HA Enabled               true
HA Cluster               https://172.31.17.225:8201
HA Mode                  standby
Active Node Address      http://172.31.17.225:8200
Raft Committed Index     61
Raft Applied Index       61
# vault operator raft list-peers
Node       Address               State       Voter
----       -------               -----       -----
vault-1    172.31.17.225:8201    leader      true
vault-2    172.31.31.122:8201    follower    true