Problem
This article describes how to troubleshoot Vault automated snapshot issues.
To configure automated snapshots in Vault, refer to the articles for GCP or AWS S3, or to the official documentation.
Prerequisites
- Vault Enterprise
- Vault Automated Snapshots
Overview of possible solutions
Solutions:
- First, list the existing automated snapshot configurations to see what is already configured:
vault list sys/storage/raft/snapshot-auto/config
- If the list returns one or more keys, automated snapshots are already configured. They may not be working, or may be in a bad state. Check the status of each one with the following command, replacing <key> with a name returned by the list above:
vault read sys/storage/raft/snapshot-auto/status/<key>
- You will see output similar to the following:
vault read sys/storage/raft/snapshot-auto/status/gcs
Key                    Value
---                    -----
consecutive_errors     0
last_snapshot_end      2022-05-13T00:21:59Z
last_snapshot_error    n/a
last_snapshot_start    2022-05-13T00:21:58Z
last_snapshot_url      https://storage.googleapis.com/test-vault/testvault-snapshot-1652401318922871403.snap
next_snapshot_start    2022-05-13T00:41:59Z
snapshot_start         2022-05-13T00:21:58Z
snapshot_url           https://storage.googleapis.com/test-vault/testvault-snapshot-1652401318922871403.snap
- These are the key fields to check: when the last snapshot ended (last_snapshot_end), where it is stored (last_snapshot_url), whether an error occurred (last_snapshot_error and consecutive_errors), and when the next snapshot is scheduled to start (next_snapshot_start).
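The status check above can be sketched in Python. This is a minimal illustration, not part of Vault: it assumes you have captured the two-column text printed by `vault read` (without the command line itself), and the helper names `parse_status` and `find_problems` are hypothetical.

```python
# Sample output copied from the status command shown in this article.
SAMPLE = """\
Key Value
--- -----
consecutive_errors 0
last_snapshot_end 2022-05-13T00:21:59Z
last_snapshot_error n/a
last_snapshot_start 2022-05-13T00:21:58Z
last_snapshot_url https://storage.googleapis.com/test-vault/testvault-snapshot-1652401318922871403.snap
next_snapshot_start 2022-05-13T00:41:59Z
"""


def parse_status(text):
    """Turn the key/value table into a dict of strings (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # key, then the rest of the line as the value
        if len(parts) != 2 or parts[0] in ("Key", "---"):
            continue  # skip blank lines and the table header
        fields[parts[0]] = parts[1].strip()
    return fields


def find_problems(fields):
    """Return human-readable warnings based on the error-related fields."""
    problems = []
    if int(fields.get("consecutive_errors", "0")) > 0:
        problems.append("consecutive_errors is " + fields["consecutive_errors"])
    error = fields.get("last_snapshot_error", "n/a")
    if error not in ("", "n/a"):
        problems.append("last_snapshot_error: " + error)
    return problems


if __name__ == "__main__":
    status = parse_status(SAMPLE)
    print(find_problems(status) or "no errors reported")
```

An empty list from `find_problems` only means the status endpoint reports no errors; it is still worth confirming that last_snapshot_end and next_snapshot_start look reasonable for the configured interval.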
- If an existing snapshot configuration is failing and you want to delete and reconfigure it, run the following command:
vault delete sys/storage/raft/snapshot-auto/config/<key>
- The Vault operational logs at trace level also show snapshot activity, for example:
2022-05-13T00:19:58.752Z [TRACE] core.snapshotmgr.gcs: starting snapshot runner: name=gcs interval=20m0s storage_type=google-gcs last_snapshot_start="" next_scheduled_snapshot=2022-05-13T00:21:58Z
...
2022-05-13T00:21:58.752Z [INFO] core.snapshotmgr.gcs: taking auto snapshot
2022-05-13T00:21:58.922Z [INFO] storage.raft: starting snapshot up to: index=1357
2022-05-13T00:21:58.924Z [INFO] storage.raft: snapshot complete up to: index=1357
2022-05-13T00:21:59.143Z [DEBUG] core.snapshotmgr.gcs: snapshot complete: name=gcs elapsed=390.721446ms size=0
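To compare how long each snapshot takes against its configured interval, the log lines above can be parsed with a short script. The following is a sketch only: it assumes the log format matches the sample lines shown here, that durations use Go-style strings such as 20m0s or 390.721446ms, and the function names are hypothetical.

```python
import re

# Seconds per unit for Go-style duration suffixes.
UNIT_SECONDS = {"ns": 1e-9, "us": 1e-6, "µs": 1e-6, "ms": 1e-3,
                "s": 1.0, "m": 60.0, "h": 3600.0}
# Two-letter units are listed first so "ms" is not read as "m".
DURATION_PART = re.compile(r"(\d+(?:\.\d+)?)(ns|us|µs|ms|s|m|h)")
FIELD = re.compile(r'(\w+)=("[^"]*"|\S+)')

# Sample lines copied from the log excerpt in this article.
LOG = """\
2022-05-13T00:19:58.752Z [TRACE] core.snapshotmgr.gcs: starting snapshot runner: name=gcs interval=20m0s storage_type=google-gcs last_snapshot_start="" next_scheduled_snapshot=2022-05-13T00:21:58Z
2022-05-13T00:21:58.752Z [INFO] core.snapshotmgr.gcs: taking auto snapshot
2022-05-13T00:21:59.143Z [DEBUG] core.snapshotmgr.gcs: snapshot complete: name=gcs elapsed=390.721446ms size=0
"""


def go_duration_seconds(text):
    """Convert a duration such as '20m0s' or '390.721446ms' to seconds."""
    total, pos = 0.0, 0
    for part in DURATION_PART.finditer(text):
        if part.start() != pos:
            raise ValueError("unrecognized duration: " + text)
        total += float(part.group(1)) * UNIT_SECONDS[part.group(2)]
        pos = part.end()
    if pos == 0 or pos != len(text):
        raise ValueError("unrecognized duration: " + text)
    return total


def snapshot_timings(lines):
    """Collect the interval and last elapsed time per snapshot config name."""
    timings = {}
    for line in lines:
        if "core.snapshotmgr" not in line:
            continue
        fields = {k: v.strip('"') for k, v in FIELD.findall(line)}
        name = fields.get("name")
        if name is None:
            continue  # e.g. the "taking auto snapshot" line has no fields
        entry = timings.setdefault(name, {})
        if "starting snapshot runner" in line:
            entry["interval_seconds"] = go_duration_seconds(fields["interval"])
        elif "snapshot complete:" in line:
            entry["elapsed_seconds"] = go_duration_seconds(fields["elapsed"])
    return timings


if __name__ == "__main__":
    print(snapshot_timings(LOG.splitlines()))
```

In the sample above the snapshot takes well under a second against a 20-minute interval, so there is ample margin. An elapsed time that approaches the interval would be a signal to lengthen the interval.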
- Note that if Vault holds a large amount of data, pay close attention to the interval and retain values. Common misconfigurations include an interval that is too short, so that the next snapshot is scheduled before the previous one has had a chance to complete, and a retain value that is too large, which can fill up the cloud storage.
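The two misconfigurations described above come down to simple arithmetic, sketched below. This is an illustration under stated assumptions: the function name `review_schedule`, the `headroom` safety factor of 2, and the idea of a fixed storage budget are all hypothetical and not Vault settings. The inputs would come from your own measurements, such as the elapsed time in the logs and the size of a recent snapshot file.

```python
def review_schedule(interval_seconds, elapsed_seconds, retain,
                    snapshot_bytes, storage_budget_bytes, headroom=2.0):
    """Return warnings for an interval that is too short or a retain value
    that would exceed the storage budget (hypothetical helper)."""
    warnings = []

    # The next snapshot should not be due before the previous one finishes.
    # headroom is an assumed safety factor to allow for slower runs.
    if elapsed_seconds * headroom >= interval_seconds:
        warnings.append(
            "interval %.0fs is too short for snapshots taking %.0fs"
            % (interval_seconds, elapsed_seconds))

    # Worst case, `retain` snapshots of the current size are kept at once.
    retained_bytes = retain * snapshot_bytes
    if retained_bytes > storage_budget_bytes:
        warnings.append(
            "retaining %d snapshots needs about %d bytes, over the %d byte budget"
            % (retain, retained_bytes, storage_budget_bytes))

    return warnings


if __name__ == "__main__":
    GIB = 1024 ** 3
    # 1-minute interval, 45-second snapshots, 100 retained snapshots of 1 GiB,
    # and a 50 GiB budget: both checks should produce a warning.
    for warning in review_schedule(60, 45, 100, 1 * GIB, 50 * GIB):
        print(warning)
```

Snapshot size tends to grow with the data in Vault, so it is worth repeating this estimate periodically rather than only when the configuration is first created.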