Leader election in primary cluster breaks replication to secondary clusters.
During a leadership change on a primary cluster, two RPC clients on a secondary cluster (the WAL streaming client and another client, such as the one used for heartbeating) can race to authenticate and obtain a new token. The non-WAL-streaming RPC authenticates first; however, a bug in the WAL stream error handling can then wipe the new auth token. As a result, the primary cluster may still believe that the connection has a token. The end result is a secondary cluster that cannot maintain any replication activity until either:
1) Replication is restarted on either the primary or secondary cluster
2) A leadership change happens on either cluster
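The race described above can be illustrated with a deliberately simplified, hypothetical model. The class and function names below are invented for illustration and are not Vault's internal code. The model only shows how an error handler that clears a shared token unconditionally can discard a token that another client has just obtained. The compare-before-clear handler is one hypothetical way to avoid that outcome, not necessarily how the bug was fixed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SecondaryAuthState:
    """Hypothetical model of an auth token shared by RPC clients on a secondary."""
    token: Optional[str] = None


def heartbeat_authenticate(state: SecondaryAuthState, new_token: str) -> None:
    # The non-WAL RPC (e.g. heartbeating) wins the race and stores the new token.
    state.token = new_token


def wal_stream_on_error_buggy(state: SecondaryAuthState) -> None:
    # Buggy handler (hypothetical): clears whatever token is stored, even if
    # another client has just replaced the stale token with a fresh one.
    state.token = None


def wal_stream_on_error_guarded(state: SecondaryAuthState, stale_token: Optional[str]) -> None:
    # Hypothetical guarded handler: only clear the token the WAL stream itself used.
    if state.token == stale_token:
        state.token = None


def simulate(buggy: bool) -> Optional[str]:
    state = SecondaryAuthState(token="old-token")
    stale = state.token  # token the WAL stream held before the leadership change
    heartbeat_authenticate(state, "new-token")  # heartbeat authenticates first
    if buggy:
        wal_stream_on_error_buggy(state)
    else:
        wal_stream_on_error_guarded(state, stale)
    return state.token


print(simulate(buggy=True))   # None: the freshly obtained token was wiped
print(simulate(buggy=False))  # new-token
```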
There are three workarounds for this issue:
1) Trigger a leadership election in either cluster (for example, by stepping down the active node).
2) If an election does not restore replication between the clusters, send a POST request to the sys/replication/recover endpoint.
3) As the most involved option, follow the update-primary procedure to re-establish replication between the clusters.
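The first two workarounds can be scripted against the Vault HTTP API. The sketch below assumes a reachable cluster address (such as the value of VAULT_ADDR) and a token with sufficient privileges; it uses `PUT /v1/sys/step-down` to force a leader election and `POST /v1/sys/replication/recover` to request replication recovery, which is the path the Vault HTTP API documents for this operation. The helper names are hypothetical, and the functions only build requests, so nothing is sent until `send()` is called.

```python
import urllib.request


def vault_request(addr: str, token: str, method: str, path: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated Vault HTTP API request."""
    return urllib.request.Request(
        f"{addr.rstrip('/')}/v1/{path.lstrip('/')}",
        method=method,
        headers={"X-Vault-Token": token},
    )


def step_down_request(addr: str, token: str) -> urllib.request.Request:
    # Workaround 1: ask the active node to step down, forcing a leader election.
    return vault_request(addr, token, "PUT", "sys/step-down")


def replication_recover_request(addr: str, token: str) -> urllib.request.Request:
    # Workaround 2: ask Vault to attempt recovery of replication.
    return vault_request(addr, token, "POST", "sys/replication/recover")


def send(req: urllib.request.Request, timeout: float = 10.0) -> int:
    # Sends the request and returns the HTTP status code.
    # urlopen raises urllib.error.HTTPError for non-2xx responses.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

For example, `send(step_down_request("https://vault.example.com:8200", token))` triggers an election on that cluster; the `vault operator step-down` CLI command performs the same step-down from a shell. If replication is still broken afterwards, `send(replication_recover_request(...))` attempts recovery.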