Introduction
Automated upgrades allow for automatically upgrading a cluster of Vault nodes to a new version as updated nodes join the cluster. This how-to walks through implementing Automated Upgrades in an AWS auto-scaling group.
Expected Outcome
Implement the Vault Enterprise Automated Upgrades feature on a Raft (Integrated Storage) backed Vault cluster in an AWS Auto Scaling Group.
Prerequisites:
-
Read Upgrading Vault
- A Vault cluster running Vault Enterprise 1.11.0 or later, in an AWS Auto Scaling Group.
- Said cluster should already have the necessary configurations to successfully scale while maintaining quorum, i.e.
retry_join / auto_join
. - Knowledge of configuring and managing a Vault cluster.
- Basic knowledge of AWS Auto Scaling Groups
Use Case
I want to take advantage of Vault Enterprise Automated Upgrades for my Vault cluster running in an AWS ASG.
Procedure
Before you begin:
It's always a good idea to try to ensure that the upgrade will be successful in your environment. The ideal way to do this is to take a snapshot of your data and load it into a test cluster. However, if you are issuing secrets to third party resources (cloud credentials, database credentials, etc.) ensure that you do not allow external network connectivity during testing, in case credentials expire. This prevents the test cluster from trying to revoke these resources along with the non-test cluster.
1. Existing "old version" Vault nodes should be running in the ASG.
2. Update the ASG to deploy new nodes with the desired version of Vault (e.g. userdata script, custom AMI).
3. Scale out the ASG to at least double the current number of nodes. For example, if 3 nodes are deployed, scale out to 6.
(The Autopilot subsystem within Vault needs the number of nodes running the newer Vault version to equal or exceed the number of pre-existing nodes in order to begin its process of promoting new nodes to voters.)
4. Via the Vault CLI, monitor the Auto Pilot status until it reaches await server
removal
.
Example command: watch -n 0.5 'curl -H "X-Vault-Token: $VAULT_TOKEN"
$VAULT_ADDR/v1/sys/storage/raft/autopilot/state | jq -r ".data.upgrade_info"'
The output from the above watch
command will look like this:
{
"other_version_non_voters": [
"vault_2",
"vault_3",
"vault_4"
],
"status": "await-server-removal",
"target_version": "1.12.0.1",
"target_version_voters": [
"vault_5",
"vault_6",
"vault_7"
]
}
(Note that your node-id's will be different.)
4. Remove old versions (other_version_non_voters
) from raft peers. Runvault operator remove-peer <node id>
for each node to be removed.
5. Add scale in protection to new version nodes (target_version_voters
)
6. Scale in ASG to the original number of nodes.
7. Remove scale in protection.
Additional Information:
https://developer.hashicorp.com/vault/docs/enterprise/automated-upgrades
https://developer.hashicorp.com/vault/tutorials/raft/raft-upgrade-automation