If Vault is hosted via a cloud provider, auto_join can be used to find nodes via tags. This article will help troubleshoot the Vault auto_join parameter within the retry_join stanza, specific to Google Cloud.
When using Vault with the Integrated Storage (Raft) backend, Vault's data is persisted on each node and replicated across all nodes via the Raft consensus algorithm. Within the raft storage stanza of the Vault configuration, a retry_join stanza can be added to automatically find other nodes in the cluster. If the connection details of all the nodes are known before the Raft cluster is bootstrapped, specifying retry_join stanzas enables the nodes to join the cluster automatically. Each node lists all the other nodes that it could join. When one of the nodes is initialized, it becomes the leader, and all the other nodes join the leader to form the cluster.
Instead of using a specific leader_api_addr within the retry_join stanza, we can use auto_join for cloud auto-join configurations. This uses the go-discover syntax (space-separated key=value pairs).
The gce provider accepts the following parameters:
project_name: The name of the project. Discovered automatically if not set.
tag_value: The tag value for filtering instances.
zone_pattern: An RE2 regular expression for filtering zones, e.g. us-west1-.*, or us-(?:west|east).*
credentials_file: The path to the credentials file. See below for more details.
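Since an auto_join value is one string of key=value pairs, a typo in a key or value is easy to miss. The sketch below splits such a string so that individual values can be checked. The function names parse_auto_join and auto_join_value are hypothetical helpers, not part of Vault or go-discover, and the sketch assumes that no value contains spaces or quotes.

```shell
# Hypothetical helpers (not part of Vault or go-discover) for inspecting an
# auto_join string. Assumption: no value contains spaces or quotes.

# Print each key=value pair on its own line.
parse_auto_join() {
  printf '%s\n' "$1" | tr -s ' ' '\n'
}

# Print the value stored under key $2, or nothing if the key is absent.
auto_join_value() {
  parse_auto_join "$1" |
    awk -F= -v key="$2" '$1 == key { print substr($0, length(key) + 2) }'
}
```

For example, auto_join_value "provider=gce tag_value=usc1-raft-prod zone_pattern=us-central1-.*" tag_value prints the tag that should also appear on the instances.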
The credentials for a GCE Service Account are required and are searched in the following locations:
1. Use credentials from "credentials_file", if provided.
2. Use JSON file from GOOGLE_APPLICATION_CREDENTIALS environment variable.
3. Use JSON file in a location known to the gcloud command-line tool. On Windows, this is %APPDATA%/gcloud/application_default_credentials.json. On other systems, $HOME/.config/gcloud/application_default_credentials.json.
4. On Google Compute Engine, use credentials from the metadata server. In this final case any provided scopes are ignored.
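When it is unclear which credentials Vault ends up using, it can help to walk through the same order by hand. The following is a rough sketch of the lookup order listed above for non-Windows systems. The function name gce_credentials_source is hypothetical, and the sketch only reports which location would be tried first; it does not validate the credentials themselves.

```shell
# Hypothetical helper mirroring the documented lookup order.
# $1 is the credentials_file value from auto_join (may be empty).
gce_credentials_source() {
  adc="${HOME:-}/.config/gcloud/application_default_credentials.json"
  if [ -n "$1" ]; then
    echo "credentials_file: $1"
  elif [ -n "${GOOGLE_APPLICATION_CREDENTIALS:-}" ]; then
    echo "GOOGLE_APPLICATION_CREDENTIALS: ${GOOGLE_APPLICATION_CREDENTIALS}"
  elif [ -f "$adc" ]; then
    echo "gcloud application default credentials: $adc"
  else
    # Only reachable when Vault runs on Google Compute Engine.
    echo "metadata server"
  fi
}
```

Run it as the same user and with the same environment as the Vault process, otherwise HOME and GOOGLE_APPLICATION_CREDENTIALS may differ from what Vault sees.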
retry_join stanza within the storage stanza in use:
storage "raft" {
  path = "/opt/vault/data"
  node_id = "<Raft-Node-ID>"
  retry_join {
    auto_join = "provider=gce project_name=<GCP-Project-ID> tag_value=usc1-raft-prod zone_pattern=us-central1-.*"
  }
}
Troubleshooting
- Verify that the correct tags are set on the instance(s) in question. We can do this by running the gcloud command:
$ gcloud compute instances describe <InstanceName>
In the tags section of the output we should see the tag used within the auto_join parameter:
$ gcloud compute instances describe prod-instance-1
...
tags:
  fingerprint: qQ7o42GRuLE=
  items:
  - usc1-raft-prod
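Describing instances one at a time becomes tedious with more than a few nodes. As a possible shortcut, the sketch below lists every instance in a project that carries a given tag, together with its zone and internal IP, which is roughly the set auto_join should discover. The function name list_tagged_instances is hypothetical, and the project ID is whatever is passed as the second argument.

```shell
# Hypothetical helper: list instances carrying a given tag.
# $1 = tag value, $2 = GCP project ID.
list_tagged_instances() {
  gcloud compute instances list \
    --project "$2" \
    --filter "tags.items=$1" \
    --format "value(name,zone.basename(),networkInterfaces[0].networkIP)"
}
```

Calling list_tagged_instances usc1-raft-prod <GCP-Project-ID> should print one line per tagged instance; an empty result suggests the tag is missing or misspelled.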
- If the tag_value set does not match any nodes, the Vault operational logs will show which zones are being searched but will not list any instance IPs. In the example below, the discover-gce: Zone "us-central1-*" log entries return empty lists:
2022-10-12T01:29:43.129Z [INFO] core: security barrier not initialized
2022-10-12T01:29:43.129Z [INFO] core: [DEBUG] discover: Using provider "gce"
2022-10-12T01:29:43.129Z [INFO] core: [INFO] discover-gce: Project name is "hc-d09fcbe9e5aa48fa83649eb111f"
2022-10-12T01:29:43.129Z [INFO] core: [INFO] discover-gce: Looking up zones matching us-central1-.*
2022-10-12T01:29:43.488Z [INFO] core: [INFO] discover-gce: Found zones [us-central1-c us-central1-a us-central1-f us-central1-b]
2022-10-12T01:29:43.556Z [INFO] core: [INFO] discover-gce: Zone "us-central1-c" has []
2022-10-12T01:29:43.629Z [INFO] core: [INFO] discover-gce: Zone "us-central1-a" has []
2022-10-12T01:29:43.682Z [INFO] core: [INFO] discover-gce: Zone "us-central1-f" has []
2022-10-12T01:29:43.746Z [INFO] core: [INFO] discover-gce: Zone "us-central1-b" has []
2022-10-12T01:29:43.746Z [ERROR] core: failed to retry join raft cluster: retry=2s
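On a busy node these entries repeat every retry interval, so it can be easier to filter the log than to read it. The sketch below extracts every address reported by the Zone "..." has [...] entries. The function name discovered_addresses is hypothetical, and the sketch assumes the log line format shown in this article.

```shell
# Hypothetical helper: read Vault log lines on stdin and print each unique
# address reported by discover-gce. Prints nothing if every zone was empty.
discovered_addresses() {
  sed -n 's/.*discover-gce: Zone "[^"]*" has \[\(.*\)]$/\1/p' |
    tr ' ' '\n' |
    grep -v '^$' |
    sort -u
}
```

Pipe the operational log into it, for example from a log file or from the service manager's journal. No output means discovery is running but nothing matches the tag and zone pattern.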
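- If the service account in use does not have the correct permissions or access scopes to search for instances within the project, zone, or tag, Vault will log a 403 error. In the example below, the reason ACCESS_TOKEN_SCOPE_INSUFFICIENT indicates that the access token's scopes do not cover the compute.v1.ZonesService.List call: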
[ERROR] core: error in retry_join stanza, will not use it for raft join:
error=
| failed to parse addresses from auto-join metadata: discover-gce: googleapi: Error 403: Request had insufficient authentication scopes.
| Details:
| [
| {
| "@type": "type.googleapis.com/google.rpc.ErrorInfo",
| "domain": "googleapis.com",
| "metadatas": {
| "method": "compute.v1.ZonesService.List",
| "service": "compute.googleapis.com"
| },
| "reason": "ACCESS_TOKEN_SCOPE_INSUFFICIENT"
| }
| ]
|
| More details:
| Reason: insufficientPermissions, Message: Insufficient Permission
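For the scope error above, one way to investigate is to inspect the access scopes attached to the instance's service account. The sketch below is an assumption-laden starting point: show_instance_scopes and has_compute_read_scope are hypothetical helper names, and the check assumes that any of the compute.readonly, compute, or cloud-platform scopes is sufficient for listing zones and instances.

```shell
# Hypothetical helper: print the service account and scopes of an instance.
# $1 = instance name, $2 = zone.
show_instance_scopes() {
  gcloud compute instances describe "$1" --zone "$2" \
    --format "yaml(serviceAccounts)"
}

# Hypothetical helper: succeed if any scope URL on stdin grants at least
# read access to the Compute Engine API.
has_compute_read_scope() {
  grep -Eq 'https://www\.googleapis\.com/auth/(compute\.readonly|compute|cloud-platform)$'
}
```

For example, show_instance_scopes prod-instance-1 us-central1-a | has_compute_read_scope returns a non-zero status when none of those scopes is present on the instance.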
- Here is an example of the Vault operational logs when instance(s) are found based on the configured auto_join. The discover-gce: Zone "us-central1-*" log entries now return the IPs of instances with the specified tag, and Vault then attempts to join each of them. The join attempts shown here are still retried; the final error reports that the node at 10.128.0.3 is sealed:
2022-10-12T01:29:58.648Z [INFO] core: security barrier not initialized
2022-10-12T01:29:58.648Z [INFO] core: [DEBUG] discover: Using provider "gce"
2022-10-12T01:29:58.648Z [INFO] core: [INFO] discover-gce: Project name is "hc-d09fcbe9e5aa48fa83649eb111f"
2022-10-12T01:29:58.648Z [INFO] core: [INFO] discover-gce: Looking up zones matching us-central1-.*
2022-10-12T01:29:59.072Z [INFO] core: [INFO] discover-gce: Found zones [us-central1-c us-central1-a us-central1-f us-central1-b]
2022-10-12T01:29:59.130Z [INFO] core: [INFO] discover-gce: Zone "us-central1-c" has []
2022-10-12T01:29:59.228Z [INFO] core: [INFO] discover-gce: Zone "us-central1-a" has [10.128.0.2]
2022-10-12T01:29:59.329Z [INFO] core: [INFO] discover-gce: Zone "us-central1-f" has []
2022-10-12T01:29:59.383Z [INFO] core: [INFO] discover-gce: Zone "us-central1-b" has []
2022-10-12T01:29:59.383Z [INFO] core: attempting to join possible raft leader node: leader_addr=http://10.128.0.2:8200
2022-10-12T01:29:59.385Z [ERROR] core: failed to retry join raft cluster: retry=2s
2022-10-12T01:30:01.385Z [INFO] core: security barrier not initialized
2022-10-12T01:30:01.385Z [INFO] core: [DEBUG] discover: Using provider "gce"
2022-10-12T01:30:01.386Z [INFO] core: [INFO] discover-gce: Project name is "hc-d09fcbe9e5aa48fa83649eb111f"
2022-10-12T01:30:01.386Z [INFO] core: [INFO] discover-gce: Looking up zones matching us-central1-.*
2022-10-12T01:30:01.907Z [INFO] core: [INFO] discover-gce: Found zones [us-central1-c us-central1-a us-central1-f us-central1-b]
2022-10-12T01:30:01.962Z [INFO] core: [INFO] discover-gce: Zone "us-central1-c" has []
2022-10-12T01:30:02.073Z [INFO] core: [INFO] discover-gce: Zone "us-central1-a" has [10.128.0.2 10.128.0.3]
2022-10-12T01:30:02.173Z [INFO] core: [INFO] discover-gce: Zone "us-central1-f" has []
2022-10-12T01:30:02.243Z [INFO] core: [INFO] discover-gce: Zone "us-central1-b" has []
2022-10-12T01:30:02.243Z [INFO] core: attempting to join possible raft leader node: leader_addr=http://10.128.0.2:8200
2022-10-12T01:30:02.243Z [INFO] core: attempting to join possible raft leader node: leader_addr=http://10.128.0.3:8200
2022-10-12T01:30:02.245Z [ERROR] core: failed to retry join raft cluster: retry=2s
2022-10-12T01:30:02.247Z [ERROR] core: failed to get raft challenge: leader_addr=http://10.128.0.3:8200
error=
| error during raft bootstrap init call: Error making API request.
|
| URL: PUT http://10.128.0.3:8200/v1/sys/storage/raft/bootstrap/challenge
| Code: 503. Errors:
|
| * Vault is sealed
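When discovery succeeds but the join keeps failing as in the log above, checking the seal state of each discovered address can narrow things down. The sketch below relies on the exit code of vault status (0 for unsealed, 2 for sealed, 1 for an error). The function name node_seal_state is hypothetical, and the addresses are the ones from the example log.

```shell
# Hypothetical helper: report the seal state of one node.
# $1 = API address of the node, e.g. http://10.128.0.3:8200
node_seal_state() {
  vault status -address="$1" >/dev/null 2>&1
  case $? in
    0) echo "unsealed" ;;
    2) echo "sealed" ;;
    *) echo "error" ;;
  esac
}
```

For example, node_seal_state http://10.128.0.3:8200 would be expected to print sealed for the node in the log above. A node can only be joined once the leader it contacts is initialized and unsealed.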