Problem
When using a Terraform Enterprise (TFE) Active/Active cluster, all SAML/SSO login attempts fail and redirect the user back to the login page, creating a continuous loop.
Cause
This issue occurs when the TFE nodes in the cluster cannot share session information correctly. The load balancer distributes requests between nodes, so a request that belongs to a session created on one node can arrive at another. If that node does not recognize the session token, the application requests a new login, and the cycle repeats.
This is typically caused by session-related configuration values, such as enc_password and auth_token_encryption_key, that were mismatched or missing when each node was installed.
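The following sketch is a toy model of that failure and is not how TFE handles sessions internally. It assumes a session token is validated with a keyed SHA-256 digest (a simplification, not a real HMAC), and the function names, keys, and session ID are all hypothetical. It only illustrates why a token issued by a node with one key is rejected by a node configured with a different key.

```shell
# Toy model only: illustrates key mismatch, not TFE's actual session logic.

# sign_session KEY SESSION_ID
# Prints a keyed SHA-256 digest that stands in for a session signature.
sign_session() {
  printf '%s:%s' "$1" "$2" | sha256sum | cut -d' ' -f1
}

# node_accepts NODE_KEY SESSION_ID SIGNATURE
# Succeeds only if the node's own key reproduces the presented signature.
node_accepts() {
  [ "$(sign_session "$1" "$2")" = "$3" ]
}

# Example (hypothetical keys):
#   sig=$(sign_session "key-a" "session-123")    # issued by node A
#   node_accepts "key-a" "session-123" "$sig"    # succeeds: same key
#   node_accepts "key-b" "session-123" "$sig"    # fails: node B asks for a new login
```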
Solutions
Use Solution 1 to confirm the cause, then Solution 2 to resolve the issue.
Solution 1: Diagnose the Session Sharing Issue
To confirm that a session sharing problem is the root cause, you can temporarily deactivate one of the nodes in the cluster.
- Log in to one of the TFE nodes via the command-line interface.
- Run the following command to stop the TFE application on that node. This places the node in a deactivated state.
$ replicatedctl app stop
- After the command completes, attempt to log in using SAML again. If the login succeeds while the node is stopped, the issue is related to session information sharing between the TFE nodes.
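Before retrying the login, it can help to confirm that the application on that node has actually stopped. The sketch below polls replicatedctl app status for this purpose. It assumes the command prints JSON containing a "State" field whose value is "stopped" once the application is down, so check the output on your own installation first. The helper names, retry count, and 5-second interval are illustrative.

```shell
# app_state: reads status JSON on stdin and prints the first "State" value.
# Assumption: the status output contains a field such as "State": "stopped".
app_state() {
  sed -n 's/.*"State"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# wait_until_stopped [TRIES]
# Polls the application state every 5 seconds, up to TRIES times (default 12).
# Returns 0 once the state is "stopped", 1 if it never reaches that state.
wait_until_stopped() {
  tries="${1:-12}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    if [ "$(replicatedctl app status | app_state)" = "stopped" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep 5
  done
  return 1
}

# Example:
#   replicatedctl app stop
#   wait_until_stopped && echo "Node stopped; retry the SAML login now."
```

When the test is finished, the application on the node can be started again with replicatedctl app start.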
Solution 2: Reinstall Nodes with Matching Configuration
To permanently resolve the issue, you must ensure all nodes are installed with identical configuration values for session management.
Reinstall the TFE nodes, ensuring that the values for enc_password and auth_token_encryption_key are identical on every node in the cluster.
For detailed instructions, refer to the Active/Active Installation: Add Common Configuration documentation.
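To check that the values match without displaying the secrets themselves, one option is to compare short fingerprints of them. The following is a minimal sketch, not an official procedure. The helper names and the ENC_PASSWORD variable are hypothetical, and how you retrieve each value depends on how the node was installed.

```shell
# fingerprint VALUE
# Prints the first 16 hex characters of the SHA-256 digest of VALUE, so the
# value can be compared across nodes without revealing it.
fingerprint() {
  printf '%s' "$1" | sha256sum | cut -c1-16
}

# all_match FINGERPRINT...
# Succeeds only when every fingerprint passed as an argument is identical.
all_match() {
  first="$1"
  for fp in "$@"; do
    [ "$fp" = "$first" ] || return 1
  done
  return 0
}

# Example (ENC_PASSWORD is a hypothetical variable holding the node's value):
#   On each node:   fingerprint "$ENC_PASSWORD"
#   Then compare:   all_match "$fp_node1" "$fp_node2" && echo "values match"
```

Repeat the comparison for auth_token_encryption_key. Any mismatch points to the node that needs to be reinstalled with corrected values.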
Outcome
After the nodes are reinstalled with matching configuration values, the SAML login loop is resolved, and users can log in to the Terraform Enterprise cluster successfully.