The information contained in this article has been verified as up-to-date on the date of the original publication of the article. HashiCorp endeavors to keep this information up-to-date and correct, but it makes no representations or warranties of any kind, express or implied, about the ongoing completeness, accuracy, reliability, or suitability of the information provided.
All information contained in this article is for general information purposes only. Any reliance you place on such information as it applies to your use of your HashiCorp product is therefore strictly at your own risk.
## Introduction

On a Kubernetes (K8s) cluster, `consul-snapshot-agent` runs as a container configured under `server.snapshotAgent` in Consul version >=1.14.x, and under `client.snapshotAgent` in Consul version <=1.13.x.

## Expected Outcome

`consul-snapshot-agent` running on a consul-k8s cluster should take a snapshot at a set interval and store it in a specific GCP bucket, while retaining `n` snapshots at any point in time.

## Prerequisites
- Install `kind`, `helm`, and `kubectl`.
- Create a consul-k8s cluster using any Consul version.
- As highlighted earlier in the Introduction section, please make sure to define the `consul-snapshot-agent` configuration in accordance with the Helm chart version.
- Ensure that a service account is created on GCP with enough privileges to access the Cloud Storage bucket (`read` or `write`).
  - In our example, we will create a service account with the `Storage Admin` role, as sketched below.
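If you prefer the `gcloud` CLI to the GCP Portal for this prerequisite, the following sketch creates the service account, grants it the `Storage Admin` role, and downloads a key file. The service account name `snapshot-sa` and project ID `river-sky-405814` are taken from the example credentials file later in this article; substitute your own values.

```shell
# Create the service account (names are illustrative)
$ gcloud iam service-accounts create snapshot-sa --project=river-sky-405814

# Grant the Storage Admin role so the snapshot agent can read/write the bucket
$ gcloud projects add-iam-policy-binding river-sky-405814 \
    --member="serviceAccount:snapshot-sa@river-sky-405814.iam.gserviceaccount.com" \
    --role="roles/storage.admin"

# Download a JSON key file for the service account
$ gcloud iam service-accounts keys create river-sky-405814-key.json \
    --iam-account=snapshot-sa@river-sky-405814.iam.gserviceaccount.com
```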
## Procedure & Configuration
- Create a `kind` cluster locally with the following `kind-config.yaml` file to create one control-plane node and three worker nodes:

```yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
  - role: worker
  - role: worker
```
```shell
$ kind create cluster --name snapshot-gcp --config kind-config.yaml --image kindest/node:v1.26.6
```
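As an optional sanity check, confirm the four nodes are up. `kind` names the kubeconfig context `kind-<cluster-name>`, so in this example:

```shell
# Expect 1 control-plane node and 3 worker nodes in Ready state
$ kubectl get nodes --context kind-snapshot-gcp
```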
- Use the snapshot agent configuration below to create a K8s secret `consul-snapshot-config`, which we are going to refer to later in the Helm chart `values.yaml` file under `server.snapshotAgent.configSecret.secretName` and `server.snapshotAgent.configSecret.secretKey`:

```json
{
  "snapshot_agent": {
    "log": {
      "level": "TRACE",
      "enable_syslog": false,
      "syslog_facility": "LOCAL0"
    },
    "snapshot": {
      "interval": "5m",
      "retain": 5,
      "stale": false,
      "service": "consul-snapshot",
      "deregister_after": "30m",
      "lock_key": "consul-snapshot/lock",
      "max_failures": 3
    },
    "google_storage": {
      "bucket": "snapshot-bucket-112233"
    },
    "http_addr": "127.0.0.1:8500",
    "datacenter": "dc1"
  }
}
```
```shell
$ kubectl create secret generic consul-snapshot-config --from-file='config=snapshot_agent_config'
```
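Optionally, verify that the secret holds the expected JSON before wiring it into the Helm chart:

```shell
# Decode the 'config' key and compare it against the snapshot agent config above
$ kubectl get secret consul-snapshot-config -o jsonpath='{.data.config}' | base64 -d
```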
- Create a Service Account on the GCP Portal under `IAM & Admin` and assign the `Storage Admin` role to the service account to allow full permission to the storage bucket. Once created, use the downloaded Service Account credentials file to create a K8s secret, which we are going to mount into containers at the path `/consul/userconfig/<secretName>/<secretKey>` under:
  - For Consul version >=1.14.x: `server.snapshotAgent.extraVolumes`
  - For Consul version <=1.13.x: `client.snapshotAgent.extraVolumes`

  The credentials file used in this example is `river-sky-405814-54c342cb8bd3.json`:
```shell
$ cat river-sky-405814-54c342cb8bd3.json
{
  "type": "service_account",
  "project_id": "river-sky-405814",
  "private_key_id": "--REDACTED--",
  "private_key": "--REDACTED--",
  "client_email": "--REDACTED--",
  "client_id": "--REDACTED--",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/snapshot-sa%40river-sky-405814.iam.gserviceaccount.com",
  "universe_domain": "googleapis.com"
}
```
```shell
$ kubectl create secret generic snapshot-adc-sa --from-file='creds=river-sky-405814-54c342cb8bd3.json'
```
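Note that the snapshot agent expects the target bucket to already exist. If the bucket from the agent configuration has not been created yet, a minimal sketch with `gsutil` (the project and location flags are illustrative):

```shell
# Create the target bucket referenced under "google_storage" in the agent config
$ gsutil mb -p river-sky-405814 -l us-east1 gs://snapshot-bucket-112233/
```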
- For the Google Cloud Storage option, we need to set the environment variable `GOOGLE_APPLICATION_CREDENTIALS` with credentials that authorize access to the GCP bucket.
  - In the `values.yaml` file, set this env variable through `server.extraEnvironmentVars` and reference the K8s secret `snapshot-adc-sa` so that it is mounted in the container under the path `/consul/userconfig/snapshot-adc-sa/creds`.
- Once the K8s secrets are created, use the `values.yaml` below to install the Helm chart:

```yaml
global:
  metrics:
    enabled: true
  name: consul
  image: 'hashicorp/consul-enterprise:1.16.1-ent'
  enterpriseLicense:
    secretName: 'consul-ent-license'
    secretKey: 'key'
    enableLicenseAutoload: true
server:
  enabled: true
  replicas: 3
  extraVolumes:
    - type: secret
      name: snapshot-adc-sa
      load: true
  extraEnvironmentVars:
    GOOGLE_APPLICATION_CREDENTIALS: "/consul/userconfig/snapshot-adc-sa/creds"
  snapshotAgent:
    enabled: true
    interval: 5m
    configSecret:
      secretName: consul-snapshot-config
      secretKey: config
connectInject:
  enabled: true
```
```shell
$ helm install consul hashicorp/consul -n consul --values values.yaml --version 1.2.1 --wait --debug
```
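After the install completes, it is worth confirming that the server pods are running and that each carries a `consul-snapshot-agent` container (the label selector below matches the chart's default server labels):

```shell
# List the Consul server pods in the release namespace
$ kubectl get pods -n consul -l app=consul,component=server

# Show the containers in the first server pod; consul-snapshot-agent should appear
$ kubectl get pod consul-server-0 -n consul -o jsonpath='{.spec.containers[*].name}'
```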
- In the `consul-snapshot-agent` container logs, we can see that the container could not find the default credentials:

```shell
% kubectl logs consul-server-0 -c consul-snapshot-agent -f
==> Consul snapshot agent running!
            Version: 1.16.1+ent
         Datacenter: "dc1"
           Interval: "5m0s"
             Retain: 5
              Stale: false
      Local Scratch: /tmp
               Mode: daemon
            Service: "consul-snapshot"
   Deregister After: "30m0s"
           Lock Key: "consul-snapshot/lock"
       Max Failures: 3
   Snapshot Storage: Google Cloud Storage -> Bucket: "snapshot-bucket-112233"

==> Log data will now stream in as it occurs:

2023-11-28T06:24:20.256Z [INFO]  snapshot: Waiting to obtain leadership...
2023-11-28T06:24:20.261Z [INFO]  snapshot: Obtained leadership
2023-11-28T06:24:20.456Z [DEBUG] snapshot: Taking a snapshot...
2023-11-28T06:24:20.756Z [ERROR] snapshot: Snapshot failed (will retry at next interval): error="dialing: google: could not find default credentials. See https://developers.google.com/accounts/docs/application-default-credentials for more information."
```
- The reason for the above failure is in the Helm chart itself, under `templates/server-statefulset.yaml`, which loads `server.extraVolumes` and `server.extraEnvironmentVars` only into the `consul` container, not into the `consul-snapshot-agent` container:

```shell
% helm template consul hashicorp/consul --values values.yaml --version 1.2.1 -s templates/server-statefulset.yaml | yq '.spec.template.spec.containers[] | select (.name=="consul") | .env'
- name: ADVERTISE_IP
  valueFrom:
    fieldRef:
      fieldPath: status.podIP
- name: HOST_IP
  valueFrom:
    fieldRef:
      fieldPath: status.hostIP
- name: POD_IP
  valueFrom:
    fieldRef:
      fieldPath: status.podIP
- name: CONSUL_DISABLE_PERM_MGMT
  value: "true"
- name: CONSUL_LICENSE_PATH
  value: /consul/license/key
- name: GOOGLE_APPLICATION_CREDENTIALS
  value: "/consul/userconfig/snapshot-adc-sa/creds"

% helm template consul hashicorp/consul --values values.yaml --version 1.2.1 -s templates/server-statefulset.yaml | yq '.spec.template.spec.containers[] | select (.name=="consul-snapshot-agent") | .env'
- name: CONSUL_HTTP_ADDR
  value: http://127.0.0.1:8500
- name: CONSUL_LICENSE_PATH
  value: /consul/license/key
```
- As a workaround, manually edit the `consul-server` StatefulSet by running `kubectl edit statefulsets/consul-server -n <consul_namespace>` to add the env variable `GOOGLE_APPLICATION_CREDENTIALS` and its `volumeMounts` to the `consul-snapshot-agent` container:

```yaml
...
  - command:
    - /bin/sh
    - -ec
    - |
      exec /bin/consul snapshot agent \
        -interval=5m \
        -config-dir=/consul/user-config \
    env:
    - name: CONSUL_HTTP_ADDR
      value: http://127.0.0.1:8500
    - name: CONSUL_LICENSE_PATH
      value: /consul/license/key
    - name: GOOGLE_APPLICATION_CREDENTIALS
      value: /consul/userconfig/snapshot-adc-sa/creds
    image: hashicorp/consul-enterprise:1.16.1-ent
    imagePullPolicy: IfNotPresent
    name: consul-snapshot-agent
...
    volumeMounts:
    - mountPath: /consul/user-config
      name: snapshot-agent-user-config
      readOnly: true
    - mountPath: /consul/license
      name: consul-license
      readOnly: true
    - mountPath: /consul/userconfig/snapshot-adc-sa
      name: userconfig-snapshot-adc-sa
      readOnly: true
  dnsPolicy: ClusterFirst
...
```
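Alternatively, the same workaround can be applied non-interactively with `kubectl patch`. This is only a sketch: the container index `1` and the volume name `userconfig-snapshot-adc-sa` are assumptions based on the StatefulSet excerpt above, so confirm them against your own rendered StatefulSet before patching.

```shell
# Add the env var and volumeMount to the consul-snapshot-agent container
# (container index 1 is an assumption; verify with 'kubectl get statefulset consul-server -n consul -o yaml')
$ kubectl patch statefulset consul-server -n consul --type='json' -p='[
  {"op": "add",
   "path": "/spec/template/spec/containers/1/env/-",
   "value": {"name": "GOOGLE_APPLICATION_CREDENTIALS",
             "value": "/consul/userconfig/snapshot-adc-sa/creds"}},
  {"op": "add",
   "path": "/spec/template/spec/containers/1/volumeMounts/-",
   "value": {"name": "userconfig-snapshot-adc-sa",
             "mountPath": "/consul/userconfig/snapshot-adc-sa",
             "readOnly": true}}
]'
```

Keep in mind that a subsequent `helm upgrade` re-renders the StatefulSet and will revert either form of this manual change.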
- The above changes allow `consul-snapshot-agent` to load the env variable with the right credentials file, and snapshots are now saved successfully:

```shell
% kubectl logs -f consul-server-0 -c consul-snapshot-agent
==> Consul snapshot agent running!
            Version: 1.16.1+ent
         Datacenter: "dc1"
           Interval: "5m0s"
             Retain: 5
              Stale: false
      Local Scratch: /tmp
               Mode: daemon
            Service: "consul-snapshot"
   Deregister After: "30m0s"
           Lock Key: "consul-snapshot/lock"
       Max Failures: 3
   Snapshot Storage: Google Cloud Storage -> Bucket: "https://storage.googleapis.com/snapshot-bucket-112233/"

==> Log data will now stream in as it occurs:

2023-11-24T20:08:51.949Z [INFO]  snapshot: Waiting to obtain leadership...
2023-11-24T20:08:52.151Z [INFO]  snapshot: Obtained leadership
2023-11-24T20:08:52.165Z [DEBUG] snapshot: Taking a snapshot...
2023-11-24T20:08:56.139Z [INFO]  snapshot: Saved snapshot: id=1700737733384006596
2023-11-24T20:08:56.660Z [DEBUG] snapshot: Rotated snapshots: number_deleted=0
2023-11-24T20:13:56.664Z [DEBUG] snapshot: Taking a snapshot...
2023-11-24T20:13:57.939Z [INFO]  snapshot: Saved snapshot: id=170073804589252534
2023-11-24T20:13:58.554Z [DEBUG] snapshot: Rotated snapshots: number_deleted=0
```
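To confirm the snapshots are actually landing in the bucket, list its contents; each `Saved snapshot` log line should correspond to one stored object:

```shell
# List stored snapshots in the target bucket
$ gsutil ls gs://snapshot-bucket-112233/
```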
## Conclusion

Our Engineering team is working on a plan to provide a fix in an upcoming release of consul-k8s that will let us set the environment variable `GOOGLE_APPLICATION_CREDENTIALS` for GCP Cloud Storage in the `consul-snapshot-agent` container through `server.extraVolumes` and `server.extraEnvironmentVars`.