Overview:
HashiCorp Vault performance replication can break when a global mount and a local mount use the same path. This article describes the problem, shows how to diagnose it from the server log and replication status, and explains how to resolve it.
Problem:
Replication breaks when a global (replicated) mount on the performance primary uses the same path as a local (non-replicated) mount on the performance secondary. When the global mount is replicated, the secondary finds that a local mount already occupies that path, halts replication, and logs an error stating that a local (non-replicated) mount already exists on the Vault cluster.
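Before (or after) replication halts, a minimal sketch like the one below can compare mount listings from both clusters and report paths that are global on the primary but local on the secondary. It assumes each listing is a dictionary mapping mount path to mount settings with a boolean "local" field, shaped like the output of the sys/mounts API or vault secrets list -format=json. The function names, the skipped built-in mounts, and the sample data are hypothetical.

```python
from typing import Dict, List

# Built-in mounts exist on every cluster; skipping them (an assumption)
# keeps the comparison focused on user-created mounts.
BUILTIN_MOUNTS = {"sys/", "cubbyhole/", "identity/"}


def normalize(path: str) -> str:
    """Vault reports mount paths with a trailing slash; enforce that form."""
    return path.strip("/") + "/"


def find_conflicts(primary_mounts: Dict[str, dict],
                   secondary_mounts: Dict[str, dict]) -> List[str]:
    """Return sorted paths that are global on the primary but local on the secondary."""
    primary_global = {
        normalize(p) for p, cfg in primary_mounts.items()
        if not cfg.get("local", False)
    }
    secondary_local = {
        normalize(p) for p, cfg in secondary_mounts.items()
        if cfg.get("local", False)
    }
    return sorted((primary_global & secondary_local) - BUILTIN_MOUNTS)


if __name__ == "__main__":
    # Hypothetical listings modeled on the kv_local/ path from the error log.
    primary = {
        "kv_local/": {"type": "kv", "local": False},
        "secret/": {"type": "kv", "local": False},
    }
    secondary = {
        "kv_local/": {"type": "kv", "local": True},
        "secret/": {"type": "kv", "local": False},
    }
    print(find_conflicts(primary, secondary))  # ['kv_local/']
```

Run against both clusters before creating a new global mount on the primary, this kind of check flags a collision before it can halt replication.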
Error Log:
2024-07-17T10:45:17.314Z [ERROR] core: a local (non-replicated) mount already exists on this Vault cluster; replication is halting until the mount is removed or remounted elsewhere: path=kv_local/
2024-07-17T10:45:17.315Z [ERROR] replication: failed to invalidate key, suspending replication: key=core/mounts error="local (non-replicated) mount already exists at kv_local/"
2024-07-17T10:45:17.315Z [ERROR] replication: encountered error, applying backoff: backoff=2s error="local (non-replicated) mount already exists at kv_local/"
2024-07-17T10:45:17.315Z [WARN] core: replication fsm error channel fired
2024-07-17T10:45:17.319Z [INFO] core: error from wal receive, exiting: error="rpc error: code = Canceled desc = context canceled"
2024-07-17T10:45:17.328Z [INFO] core: stopping replication
2024-07-17T10:45:17.328Z [INFO] core: closed sync connection
2024-07-17T10:45:17.329Z [INFO] core: replication stopped
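To confirm that halted replication is caused by this conflict, you can search the Vault server log for the error shown above and extract the offending path. The following sketch assumes the log lines match the format in this example; the regular expression and function name are illustrative and not part of Vault.

```python
import re
from typing import Iterable, List

# Matches the two message forms in the example log (format assumed stable):
#   ... remounted elsewhere: path=kv_local/
#   error="local (non-replicated) mount already exists at kv_local/"
CONFLICT_RE = re.compile(
    r'local \(non-replicated\) mount already exists.*?(?:path=| at )([^\s"]+)'
)


def conflicting_paths(log_lines: Iterable[str]) -> List[str]:
    """Return the distinct mount paths named in mount-conflict errors."""
    found = set()
    for line in log_lines:
        match = CONFLICT_RE.search(line)
        if match:
            found.add(match.group(1))
    return sorted(found)


if __name__ == "__main__":
    sample = [
        "2024-07-17T10:45:17.314Z [ERROR] core: a local (non-replicated) mount "
        "already exists on this Vault cluster; replication is halting until the "
        "mount is removed or remounted elsewhere: path=kv_local/",
        '2024-07-17T10:45:17.315Z [ERROR] replication: failed to invalidate key, '
        'suspending replication: key=core/mounts error="local (non-replicated) '
        'mount already exists at kv_local/"',
        "2024-07-17T10:45:17.328Z [INFO] core: stopping replication",
    ]
    print(conflicting_paths(sample))  # ['kv_local/']
```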
Status when replication is working as expected:
Performance Primary:
- State: Running
- Last WAL Entry: 749
- Connected Secondaries: pr-1 (connected), URL: http://192.168.64.16:8200
Performance Secondary:
- State: stream-wals
- Connection State: Ready
- Last Remote WAL: 749
- Primary Cluster Address: https://192.168.64.11:8201
- Secondary URL: http://192.168.64.16:8200
Status after replication breaks:
- Connection Status: Disconnected
- DR Primary: Not set up
- Performance Secondary: Idle
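The difference between the healthy and broken states above can also be checked programmatically. The sketch below assumes status dictionaries shaped like the data returned by the sys/replication/performance/status API, with state, connection_state, last_wal (primary), and last_remote_wal (secondary) fields; treat those field names and expected values as assumptions to verify against your Vault version. The function name and sample data are hypothetical.

```python
from typing import List


def replication_problems(primary: dict, secondary: dict,
                         max_wal_lag: int = 0) -> List[str]:
    """Return human-readable problems; an empty list means replication looks healthy.

    Field names and expected values are assumptions modeled on the
    sys/replication/performance/status response.
    """
    problems = []
    if primary.get("state") != "running":
        problems.append(f"primary state is {primary.get('state')!r}, expected 'running'")
    if secondary.get("state") != "stream-wals":
        problems.append(
            f"secondary state is {secondary.get('state')!r}, expected 'stream-wals'")
    if secondary.get("connection_state") != "ready":
        problems.append(
            f"secondary connection_state is {secondary.get('connection_state')!r}, "
            "expected 'ready'")
    # The secondary may trail briefly under load; max_wal_lag tolerates that.
    lag = primary.get("last_wal", 0) - secondary.get("last_remote_wal", 0)
    if lag > max_wal_lag:
        problems.append(f"secondary is {lag} WAL entries behind the primary")
    return problems


if __name__ == "__main__":
    primary = {"state": "running", "last_wal": 749}
    healthy = {"state": "stream-wals", "connection_state": "ready",
               "last_remote_wal": 749}
    print(replication_problems(primary, healthy))  # []

    # Hypothetical broken secondary that is no longer streaming WALs.
    broken = {"state": "idle", "connection_state": "idle", "last_remote_wal": 749}
    for problem in replication_problems(primary, broken):
        print(problem)
```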
In this example, the conflicting path is kv_local/: it is a global mount on the performance primary and a local mount on the performance secondary.
Solution:
To resolve this issue, follow these steps:
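1. Remove the conflicting global mount:
   - Access the performance primary cluster.
   - Identify the global mount whose path conflicts with a local mount on the performance secondary (kv_local/ in the error log above) and remove it. Disabling a mount permanently deletes the data stored in it, so back up or migrate anything you need first.
   - Alternatively, as the error message itself suggests, remove the local mount on the performance secondary or remount it at a different path.
2. Use the recovery option:
   - Navigate to the performance secondary.
   - After the conflicting mount has been removed, use the recovery option to re-establish the replication link.
3. Verify the connection:
   - Confirm that the replication status on both the primary and the secondary shows the secondary as connected.
   - Confirm that the WAL indices are in sync: the primary's Last WAL Entry should match the secondary's Last Remote WAL, as in the healthy status shown earlier.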
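For operators who prefer the HTTP API to the UI, the steps above can be sketched as an ordered list of requests. This is a minimal sketch under assumptions: the endpoints (DELETE sys/mounts/<path>, POST sys/replication/recover, GET sys/replication/performance/status) should be checked against the API documentation for your Vault version, and the addresses, tokens, and function names are hypothetical. The requests are built but not sent, so each one can be reviewed first; the DELETE step destroys the mount's data.

```python
import urllib.request
from typing import List


def _vault_request(addr: str, api_path: str, method: str,
                   token: str) -> urllib.request.Request:
    """Build (but do not send) a Vault HTTP API request."""
    return urllib.request.Request(
        url=f"{addr.rstrip('/')}/v1/{api_path}",
        method=method,
        headers={"X-Vault-Token": token},
    )


def remediation_requests(primary_addr: str, secondary_addr: str,
                         mount_path: str, primary_token: str,
                         secondary_token: str) -> List[urllib.request.Request]:
    """Return the resolution steps as ordered API requests (endpoints assumed)."""
    path = mount_path.strip("/")
    return [
        # Step 1: disable the conflicting global mount on the primary.
        # WARNING: this deletes all data stored in that mount.
        _vault_request(primary_addr, f"sys/mounts/{path}", "DELETE", primary_token),
        # Step 2: ask the secondary to attempt replication recovery.
        _vault_request(secondary_addr, "sys/replication/recover",
                       "POST", secondary_token),
        # Step 3: read replication status on both clusters to verify.
        _vault_request(primary_addr, "sys/replication/performance/status",
                       "GET", primary_token),
        _vault_request(secondary_addr, "sys/replication/performance/status",
                       "GET", secondary_token),
    ]


if __name__ == "__main__":
    # Hypothetical addresses and tokens; replace with your own.
    steps = remediation_requests("https://vault-primary.example:8200",
                                 "https://vault-secondary.example:8200",
                                 "kv_local/", "PRIMARY_TOKEN", "SECONDARY_TOKEN")
    for req in steps:
        # Review each request, then send it with urllib.request.urlopen(req).
        print(req.get_method(), req.full_url)
```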
Resolving the mount path conflict restores performance replication between the primary and secondary clusters.
Conclusion:
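When setting up performance replication, make sure that no local mount on a performance secondary uses a path that is, or may later be, used by a global mount on the primary. Giving local mounts distinct paths prevents these conflicts, and regularly monitoring replication status helps catch them early, keeping replication running and data consistent across clusters.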