The reindex process in Vault follows a fixed sequence of steps. Knowing this sequence gives a rough indication of how long a reindex might take to complete. The steps are as follows:
1. Cache the Last Seen WAL Index: The algorithm begins by caching the last seen Write-Ahead Log (WAL) index for the tree.
2. Copy Filtered Hashes: Hashes of filtered data are copied to the newly created Merkle tree.
3. Scan All Stored Data: Vault scans all data in storage to prepare for rebuilding the tree.
4. Build a New Tree: A new Merkle tree is constructed based on the scanned data.
5. Replay Initial WALs: WAL entries from the cached index (step 1) up to the latest WAL are replayed to ensure the new tree is up-to-date.
6. Lock the Tree: The tree is locked to prevent changes during the final synchronization.
7. Replay Remaining WALs: WAL entries from the end of the previous replay to the latest WAL are replayed to account for any changes that occurred during earlier steps.
8. Tree Diff and Replacement: The new tree is compared with the existing tree, and any differing pages are replaced.
9. Flush Dirty Pages to Disk: All modified pages in the new tree are written to disk to finalize the changes.
10. Unlock the Tree: The tree is unlocked, marking the completion of the reindex process.
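The sequence above can be illustrated with a small toy model. This is only a sketch and not Vault's actual implementation: `ToyStore`, `page_hash`, `replay`, `reindex`, and the `during_scan` hook are hypothetical names, a flat dictionary of hashes stands in for the Merkle tree, and a Python list stands in for the WAL. It assumes filtered keys are skipped during the scan and replay, and it does not handle deletions.

```python
import hashlib
import threading


def page_hash(value: str) -> str:
    """Hash one stored value; stands in for a Merkle page hash."""
    return hashlib.sha256(value.encode()).hexdigest()


class ToyStore:
    """Hypothetical stand-in for storage, its WAL, and the index tree."""

    def __init__(self, data):
        self.data = dict(data)
        self.wal = []  # append-only list of (key, value) writes
        self.tree = {k: page_hash(v) for k, v in self.data.items()}
        self.lock = threading.Lock()

    def write(self, key, value):
        with self.lock:  # blocks while a reindex holds the lock
            self.data[key] = value
            self.wal.append((key, value))
            self.tree[key] = page_hash(value)


def replay(store, new_tree, start, filtered):
    """Apply WAL entries from `start` onward; return the next unread index."""
    end = len(store.wal)
    for key, value in store.wal[start:end]:
        if key not in filtered:
            new_tree[key] = page_hash(value)
    return end


def reindex(store, filtered=frozenset(), during_scan=None):
    cached = len(store.wal)                           # step 1: cache WAL index
    new_tree = {k: store.tree[k]                      # step 2: copy filtered hashes
                for k in filtered if k in store.tree}
    snapshot = dict(store.data)                       # step 3: scan stored data
    if during_scan:
        during_scan()                                 # simulate writes arriving mid-scan
    for key, value in snapshot.items():               # step 4: build the new tree
        if key not in filtered:
            new_tree[key] = page_hash(value)
    pos = replay(store, new_tree, cached, filtered)   # step 5: replay initial WALs
    with store.lock:                                  # step 6: lock the tree
        replay(store, new_tree, pos, filtered)        # step 7: replay remaining WALs
        dirty = sorted(k for k in new_tree            # step 8: diff against old tree
                       if store.tree.get(k) != new_tree[k])
        for key in dirty:
            store.tree[key] = new_tree[key]           # replace differing pages
        # step 9: a real implementation would flush the dirty pages to disk here
    return dirty                                      # step 10: lock released on exit


if __name__ == "__main__":
    store = ToyStore({"a": "1", "b": "2"})
    store.tree["a"] = "stale"                         # simulate a corrupted page
    print(reindex(store, during_scan=lambda: store.write("c", "3")))  # ['a']
```

In this model, writes succeed freely until the lock is taken and only wait during the final replay, diff, and replacement. The write to `c` that arrives during the scan is picked up by the WAL replay rather than the scan itself, which is why the sketch replays from the index cached at the start.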
Service Availability During Reindexing
- Steps 1 to 5: Vault continues to serve requests during these steps without interruption.
- Steps 6 to 10: A brief service interruption occurs while the tree is locked for the final synchronization. This is especially relevant when syncing to other clusters.