Terraform Enterprise Active/Active Vault Cluster HA Mode is on Standby for both Nodes – HashiCorp Help Center

Problem

In a Terraform Enterprise Active/Active deployment, the Vault HA Mode shows standby on both nodes, and the application fails to start on one node. The Terraform Enterprise Vault container logs the following errors, and vault status reports an HA Mode of standby:

2023-04-24T22:32:33.684748000Z 2023-04-24T22:32:33.684Z [ERROR] core: error during forwarded RPC request: error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing read tcp 172.22.0.12:36792->10.4.1.1:8201: read: connection reset by peer\""
2023-04-24T22:32:33.685001000Z 2023-04-24T22:32:33.684Z [ERROR] core: forward request error: error="error during forwarding RPC request"

$ docker exec -ti tfe-vault vault status

Key                    Value
---                    -----
Seal Type              shamir
Initialized            true
Sealed                 false
Total Shares           1
Threshold              1
Version                1.10.3
Storage Type           postgresql
Cluster Name           vault-cluster-3dcccDca8e
Cluster ID             2410f422-1183-7444-d043-abc4n35b1
HA Enabled             true
HA Cluster             https://10.10.10.10:8201
HA Mode                standby
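
When comparing several nodes, it can help to pull only the HA Mode value out of this output. The following is a minimal sketch, not part of the original procedure: the helper name ha_mode is hypothetical, and it assumes the default table output format of vault status shown above.

```shell
#!/bin/sh
# Hypothetical helper (not a Vault or TFE command): reads `vault status`
# table output on stdin and prints the value of the "HA Mode" row.
ha_mode() {
  awk '$1 == "HA" && $2 == "Mode" { print $3 }'
}

# Example usage on each node, assuming the container is named tfe-vault.
# The -t flag is omitted so that no terminal control characters reach awk:
#   docker exec tfe-vault vault status | ha_mode
```

If the row is missing, for example because Vault is not running with HA enabled, the helper prints nothing.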

Cause

On occasion, no node in the Active/Active group acquires the Vault HA leader lock at startup, so every node remains in standby. To resolve this, run the command vault operator step-down; a Terraform Enterprise node should then acquire the lock on its next attempt.

Solution

Running vault operator step-down forces the Vault node within an HA cluster to step down from active duty. When it is executed against a non-active node, that is, a standby or performance standby node, the request is forwarded to the active node.

##### Perform the same steps on all nodes #####

# Connect to the tfe-vault container
docker exec -it tfe-vault sh

# Inside the container, run the step-down command
vault operator step-down

# Leave the container shell and return to the host
exit

# Shut down the TFE node gracefully
tfe-admin node-drain

# Stop the application on both nodes
replicatedctl app stop

# Monitor the status until the application has stopped
watch replicatedctl app status

# Start the application
replicatedctl app start

# Check the Vault status
docker exec -it tfe-vault vault status

Outcome

vault status should show HA Mode as active on one node and standby on the other.
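
This check can also be scripted once the HA Mode value has been collected from each node. The following is a minimal sketch under stated assumptions: the helper name one_active is hypothetical, and the mode values are gathered manually from each node's vault status output and passed in as arguments.

```shell
#!/bin/sh
# Hypothetical helper (not a Vault or TFE command): takes the HA Mode
# value reported by each node as arguments.
# Returns 0 if exactly one node is "active" and all others are "standby",
# 1 if the number of active nodes is not one, and 2 on an unexpected value.
one_active() {
  active=0
  for mode in "$@"; do
    case "$mode" in
      active)  active=$((active + 1)) ;;
      standby) ;;
      *)       return 2 ;;
    esac
  done
  [ "$active" -eq 1 ]
}

# Example usage with the values read from two nodes:
#   one_active active standby && echo "HA state looks healthy"
```

A non-zero result with both values set to standby corresponds to the problem state described in this article.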
