Problem
This article describes how to troubleshoot issues with Vault automated snapshots.
To configure automated snapshots in Vault, refer to the articles for GCP or AWS S3, or to the official documentation.
Prerequisites
- Vault Enterprise
- Vault Automated Snapshots
Overview of possible solutions
Solutions:
- List the automated snapshot configurations:
vault list sys/storage/raft/snapshot-auto/config
- From the list of configurations, check the status of the one you want to troubleshoot by running the following command (replace <key> with a name returned by the list command):
vault read sys/storage/raft/snapshot-auto/status/<key>
- The output looks similar to the following:
vault read sys/storage/raft/snapshot-auto/status/gcs
Key                    Value
---                    -----
consecutive_errors     0
last_snapshot_end      2022-05-13T00:21:59Z
last_snapshot_error    n/a
last_snapshot_start    2022-05-13T00:21:58Z
last_snapshot_url      https://storage.googleapis.com/test-vault/testvault-snapshot-1652401318922871403.snap
next_snapshot_start    2022-05-13T00:41:59Z
snapshot_start         2022-05-13T00:21:58Z
snapshot_url           https://storage.googleapis.com/test-vault/testvault-snapshot-1652401318922871403.snap
- Look for:
  - When the last snapshot ended (last_snapshot_end)
  - Where it is stored (last_snapshot_url)
  - Whether there is an error (last_snapshot_error, consecutive_errors)
  - When the next snapshot starts (next_snapshot_start)
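The checks above can be scripted. The following is a minimal sketch, not an official tool: `check_snapshot_status` is a hypothetical helper name, and it assumes the default table output of `vault read` with the field names shown in the sample output. It reads that output on stdin and reports whether the last snapshot looks healthy.

```shell
# check_snapshot_status (hypothetical helper): reads the table output of
#   vault read sys/storage/raft/snapshot-auto/status/<key>
# on stdin. Prints an OK or FAIL summary and returns non-zero on FAIL.
# Assumes the field names shown in the sample output of this article.
check_snapshot_status() {
  awk '
    $1 == "consecutive_errors"  { errs = $2 }
    # The error message may contain spaces, so keep everything after the key.
    $1 == "last_snapshot_error" { $1 = ""; sub(/^ +/, ""); last_err = $0; next }
    $1 == "last_snapshot_end"   { ended = $2 }
    $1 == "next_snapshot_start" { next_start = $2 }
    END {
      if (errs + 0 > 0 || (last_err != "" && last_err != "n/a")) {
        printf "FAIL consecutive_errors=%s last_snapshot_error=%s\n", errs, last_err
        exit 1
      }
      printf "OK last_snapshot_end=%s next_snapshot_start=%s\n", ended, next_start
    }
  '
}
```

Against a live cluster, the helper could be used as `vault read sys/storage/raft/snapshot-auto/status/<key> | check_snapshot_status`.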
- If an existing snapshot configuration runs into errors and you want to delete it and reconfigure, run the following command to delete the configuration:
vault delete sys/storage/raft/snapshot-auto/config/<key>
- The Vault operational logs at trace level also show the snapshot activity, for example:
2022-05-13T00:19:58.752Z [TRACE] core.snapshotmgr.gcs: starting snapshot runner: name=gcs interval=20m0s storage_type=google-gcs last_snapshot_start="" next_scheduled_snapshot=2022-05-13T00:21:58Z
...
...
...
2022-05-13T00:21:58.752Z [INFO] core.snapshotmgr.gcs: taking auto snapshot
2022-05-13T00:21:58.922Z [INFO] storage.raft: starting snapshot up to: index=1357
2022-05-13T00:21:58.924Z [INFO] storage.raft: snapshot complete up to: index=1357
...
...
...
2022-05-13T00:21:59.143Z [DEBUG] core.snapshotmgr.gcs: snapshot complete: name=gcs elapsed=390.721446ms size=0
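To pull these entries out of a busy log, a filter along the following lines may help. This is a sketch under assumptions: the function names are hypothetical, the patterns are based only on the sample log lines in this article, and the log is assumed to be available as plain text on stdin (for example from a log file).

```shell
# snapshot_log_lines (hypothetical helper): keep only log lines that mention
# the snapshot manager or raft snapshot activity.
snapshot_log_lines() {
  grep -E 'core\.snapshotmgr|storage\.raft: .*snapshot'
}

# snapshot_elapsed (hypothetical helper): print the elapsed= value of each
# "snapshot complete" line written by the snapshot manager.
snapshot_elapsed() {
  sed -n 's/.*core\.snapshotmgr[^:]*: snapshot complete: .*elapsed=\([^ ]*\).*/\1/p'
}
```

The elapsed values give a rough idea of how long each snapshot takes, which is useful when choosing the interval.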
- Note: If you have a large Vault database, pay attention to the interval and retain values. Common issues caused by misconfiguration include an interval that is too short, so that the next snapshot is scheduled before the previous one can complete, and a retain value that is too large, which exhausts the available storage.
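The two checks in the note come down to simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, durations are assumed to be whole seconds, and the storage figure is a rough upper estimate that assumes every retained snapshot is about the same size.

```shell
# interval_ok INTERVAL_SECONDS ELAPSED_SECONDS (hypothetical helper)
# Fails when a snapshot takes at least as long as the configured interval.
interval_ok() {
  if [ "$2" -ge "$1" ]; then
    echo "interval too short: snapshot took ${2}s, interval is ${1}s"
    return 1
  fi
  echo "ok: snapshot took ${2}s, interval is ${1}s"
}

# retained_bytes RETAIN SNAPSHOT_SIZE_BYTES (hypothetical helper)
# Rough storage needed if every retained snapshot has the given size.
retained_bytes() {
  echo $(( $1 * $2 ))
}
```

For example, `interval_ok 1200 1` corresponds to the 20 minute interval in the sample log with a snapshot that completes in about a second, and `retained_bytes 24 1073741824` estimates the space needed to retain 24 snapshots of 1 GiB each.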