Introduction
This guide covers common issues encountered during the setup and migration of Terraform Enterprise with Flexible Deployment Options (FDO). The solutions are organized by deployment option to help you troubleshoot specific problems efficiently.
When using this guide, carefully review your deployment mode requirements. Many issues stem from mixing configuration settings between different modes or including unnecessary settings from example templates. Start with the minimal required configuration for your chosen deployment mode and add settings only as needed.
Common Red Herrings
Before deep-diving into troubleshooting, be aware of these common log entries that may not indicate a real problem:
- Database Timeouts: Multiple database timeouts in logs often indicate network or firewall issues rather than database configuration problems. Troubleshoot network connectivity first.
- Fluent-bit Errors: Errors from fluent-bit during startup typically self-resolve and are not problematic unless they persist after the system has fully initialized.
- Initial SSL Errors: SSL errors may appear in logs during startup before certificates are fully loaded. These should clear once startup is complete.
- Migration Locks: Migration lock messages are normal during startup when running multiple pods. The system handles these automatically unless a lock becomes stuck due to an improper shutdown.
Common Issues Across All Deployments
Immediate Container or Pod Crashes
Symptom: The container crashes before any useful logs are produced or before any commands can be run.
Resolution Steps:
Refer to the guide on how to start the Terraform Enterprise container without the application to allow the container to continue running so you can gather logs.
Database Connectivity
Symptom: The container stops shortly after startup with a database timeout.
Common Error:
check failed: name=database duration=1m30.00443164s err="timeout: context deadline exceeded"
Resolution Steps:
- Test basic network connectivity to the database server.
- Verify that the port defined in the configuration matches the actual database port and that access is allowed.
- Check firewall and security group configurations.
- Verify the database user has the correct password and appropriate permissions.
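The first step, testing basic network connectivity to the database port, can be sketched in Python as a plain TCP connection attempt made from the Terraform Enterprise host. This is a minimal sketch, assuming Python 3 is available on that host. The function name is hypothetical, and the host db.example.internal and port 5432 (the PostgreSQL default) in the usage comment are placeholders. A successful connection only shows that the port is reachable. It does not validate the user, password, or permissions.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection resolves the host name and tries each resolved address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts
        # (socket.timeout and socket.gaierror are OSError subclasses).
        return False


# Example (placeholder values; use the host and port from your configuration):
# can_reach("db.example.internal", 5432)
```

If this returns False while the database is known to be running, the problem most likely lies in the network path (firewall rules, security groups, routing) and not in the database configuration.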
Vault Encryption Errors
Symptom: Vault fails to decrypt the configuration.
Common Error:
Error reading Vault configuration: failed decrypting unseal key: could not decrypt ciphertext: chacha20poly1305: message authentication failed
Resolution Steps:
- Check for special characters in the encryption password. Docker Compose uses the $ character for variable interpolation, so it may cause issues.
- Verify the encryption password matches the one used in your previous configuration.
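The first check can be sketched in Python. Before you place a value in a Docker Compose file, the sketch flags any $ that Compose would treat as the start of a variable reference. Compose accepts $$ as an escaped literal dollar sign. This is a minimal sketch, and both function names are hypothetical. If you escape the password this way, the value Terraform Enterprise receives must still match the encryption password you used previously.

```python
from typing import List


def unescaped_dollar_positions(value: str) -> List[int]:
    """Return indexes of '$' characters that Docker Compose would treat as interpolation."""
    positions = []
    i = 0
    while i < len(value):
        if value[i] == "$":
            if i + 1 < len(value) and value[i + 1] == "$":
                # "$$" is Compose's escape for a literal "$"; skip both characters.
                i += 2
                continue
            positions.append(i)
        i += 1
    return positions


def escape_for_compose(value: str) -> str:
    """Double every '$' in a raw value so Compose passes it through literally."""
    return value.replace("$", "$$")
```

For example, unescaped_dollar_positions("pa$word") returns [2], and escape_for_compose("pa$word") returns "pa$$word".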
Certificate Errors
Symptom: You encounter SSL/TLS errors, failed health checks, or unsuccessful runs.
Resolution Steps:
- Verify the certificate paths in your compose or configuration file are correct.
- Trace and check the SSL certificate chain for completeness.
- Ensure certificates are in .pem format.
- Verify the order of certificates in the SSL certificate file: the server certificate comes first, followed by any intermediate certificates.
- If you use a private Certificate Authority (CA), provide the CA certificate in the installer to allow internal services to communicate properly.
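Python's standard library is enough for a quick structural check of the certificate file. This minimal sketch confirms three things: the file contains PEM text, every block is a CERTIFICATE block, and every block's base64 body decodes. It then returns the number of certificates in the chain. The function name is hypothetical. The sketch does not verify chain order or signatures. To check order, compare the subject and issuer of each certificate in turn: each certificate's issuer should match the subject of the certificate that follows it.

```python
import re
import ssl

# Matches one PEM block and captures its label, e.g. "CERTIFICATE" or "PRIVATE KEY".
PEM_BLOCK = re.compile(r"-----BEGIN ([A-Z0-9 ]+)-----.*?-----END \1-----", re.DOTALL)


def check_pem_bundle(text: str) -> int:
    """Validate a PEM certificate bundle and return the number of certificates.

    Raises ValueError if the text has no PEM blocks, has a block that is not a
    certificate, or has a certificate whose base64 body does not decode.
    """
    blocks = list(PEM_BLOCK.finditer(text))
    if not blocks:
        raise ValueError("no PEM blocks found; the file may be DER or another format")
    for number, match in enumerate(blocks, start=1):
        label = match.group(1)
        if label != "CERTIFICATE":
            raise ValueError(f"block {number} is '{label}', expected 'CERTIFICATE'")
        try:
            # Strips the BEGIN/END lines and base64-decodes the body.
            ssl.PEM_cert_to_DER_cert(match.group(0))
        except ValueError as exc:  # binascii.Error is a ValueError subclass
            raise ValueError(f"block {number} does not decode: {exc}") from exc
    return len(blocks)


# Example (placeholder path):
# with open("/path/to/cert.pem", encoding="ascii") as f:
#     print(check_pem_bundle(f.read()), "certificates found")
```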
VCS Integration Errors
Symptom: Terraform Enterprise is unable to connect to your Version Control System (VCS) provider.
Common Errors:
You don't have permission to access that OAuth Client
SSL_connect returned=1 errno=0 state=error: unexpected eof
Resolution Steps:
- Test basic connectivity to your VCS provider and the Terraform Enterprise health check endpoint.
$ curl -v -L https://<vcs-fqdn>
$ curl -v https://<tfe_fqdn>/_health_check
- Verify SSL certificates for both services.
$ openssl s_client -showcerts -connect <VCS_IP>:443 </dev/null
$ openssl s_client -showcerts -connect <TFE_IP>:443 </dev/null
- Check firewall and Web Application Firewall (WAF) configurations.
- You may need to create a new Application Link or OAuth application from your VCS provider.
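When a private CA is involved, the same health check can pass on one host and fail on another, because each host trusts a different CA bundle. The sketch below repeats the health check from Python while trusting only an explicit CA bundle. This lets you confirm that the bundle you plan to provide actually validates the Terraform Enterprise certificate. It is a minimal sketch: the function name is hypothetical, <tfe_fqdn> stays a placeholder, and ca_file is assumed to be the same PEM bundle you give Terraform Enterprise.

```python
import http.client
import ssl
import urllib.request
from typing import Optional


def health_check(url: str, ca_file: Optional[str] = None, timeout: float = 10.0) -> bool:
    """Return True if GET url answers with HTTP 200.

    If ca_file is given, only that CA bundle is trusted for HTTPS; otherwise the
    system default trust store is used.
    """
    context = ssl.create_default_context(cafile=ca_file)
    try:
        with urllib.request.urlopen(url, context=context, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException) as exc:
        # URLError, HTTPError (non-2xx answers), certificate verification
        # failures, and timeouts are all OSError subclasses.
        print(f"health check failed: {exc}")
        return False


# Example (placeholders for the host name and CA bundle path):
# health_check("https://<tfe_fqdn>/_health_check", ca_file="/path/to/ca-bundle.pem")
```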
Log Forwarding Errors
Symptom: Logs are not being forwarded to your syslog server.
Common Error:
[error] [output:syslog:syslog.1] no upstream connections available
Resolution Steps:
- Enable syslog UDP on port 514 in /etc/rsyslog.conf on the syslog server.
- Restart the syslog service.
- Verify the port is listening.
- Restart Terraform Enterprise.
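After you restart the syslog service, you can confirm that the server accepts UDP messages by sending a test message from the Terraform Enterprise host and checking that it appears in the server's logs. UDP is connectionless, so a send that completes without error does not prove delivery. Always confirm on the server side. This minimal sketch uses Python's standard logging.handlers.SysLogHandler. The function name is hypothetical, and syslog.example.internal is a placeholder.

```python
import logging
import logging.handlers
import socket


def send_test_syslog(host: str, port: int = 514, message: str = "tfe syslog connectivity test") -> None:
    """Send one syslog message over UDP to host:port.

    UDP is fire-and-forget: this returns without error even if nothing is
    listening, so confirm receipt in the syslog server's own log files.
    """
    handler = logging.handlers.SysLogHandler(address=(host, port), socktype=socket.SOCK_DGRAM)
    logger = logging.getLogger("tfe-syslog-test")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    try:
        logger.info(message)
    finally:
        logger.removeHandler(handler)
        handler.close()


# Example (placeholder host):
# send_test_syslog("syslog.example.internal")
```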
Docker-Specific Issues
Network Configuration
Symptom: The container fails to start with network-related errors.
Common Error:
Error response from daemon: network tfe_terraform_isolation not found
Resolution Steps:
- Create the required Docker network before starting Terraform Enterprise.
$ docker network create tfe_terraform_isolation
- Verify your Docker driver settings are configured correctly.
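If you start Terraform Enterprise from a script, you can make the network step idempotent: create the network only when docker network inspect reports that it does not exist. This is a minimal sketch, assuming the docker CLI is on the PATH of the user running it. The function name and the injectable run parameter are illustrative. The run parameter exists only so the logic can be exercised without Docker.

```python
import subprocess
from typing import Callable

NETWORK = "tfe_terraform_isolation"


def ensure_network(
    name: str = NETWORK,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Create the Docker network if it is missing. Return True if it was created."""
    # `docker network inspect` exits non-zero when the network does not exist
    # (or when the daemon is unreachable, in which case the create below fails).
    inspect = run(["docker", "network", "inspect", name], capture_output=True, text=True)
    if inspect.returncode == 0:
        return False
    # check=True raises CalledProcessError if the create itself fails.
    run(["docker", "network", "create", name], check=True, capture_output=True, text=True)
    return True
```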
Version Compatibility
Symptom: The container fails to start or exhibits unexpected behavior.
Resolution Steps:
- Ensure your Docker version is compatible with your version of Terraform Enterprise.
- If you must downgrade Docker, remove all Terraform Enterprise containers and volumes first.
Storage Configuration
Symptom: The service fails after you modify Docker's root directory or storage paths.
Resolution Steps:
- Stop the Terraform Enterprise and Docker services before making storage changes.
- Verify any mounted volumes have the correct ownership and permissions.
- Check that all storage paths in the compose file match the actual system paths.
- Ensure sufficient disk space is available in the new locations.
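The path, ownership, and disk space checks above can be scripted as a quick pre-flight pass over the host-side paths in your compose file. This is a minimal sketch, assuming you supply those paths yourself. The 10 GiB threshold and the example paths are hypothetical placeholders, not Terraform Enterprise requirements. For each path, the sketch prints the owner UID and GID and the permission bits so you can compare them with what the container expects. It also reports missing paths and low free space.

```python
import os
import shutil
import stat
from typing import Dict, Iterable, List

MIN_FREE_BYTES = 10 * 1024**3  # Hypothetical threshold; use your own sizing requirements.


def check_paths(paths: Iterable[str], min_free: int = MIN_FREE_BYTES) -> Dict[str, List[str]]:
    """Return a mapping of path -> problems found (an empty list means none)."""
    report: Dict[str, List[str]] = {}
    for path in paths:
        problems: List[str] = []
        if not os.path.exists(path):
            problems.append("does not exist")
        else:
            info = os.stat(path)
            print(f"{path}: owner uid={info.st_uid} gid={info.st_gid} mode={stat.filemode(info.st_mode)}")
            # Free space on the file system that holds this path.
            free = shutil.disk_usage(path).free
            if free < min_free:
                problems.append(f"only {free / 1024**3:.1f} GiB free")
        report[path] = problems
    return report


# Example (hypothetical host paths taken from a compose file):
# check_paths(["/opt/tfe/data", "/etc/tfe/certs"])
```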
systemd Service Issues
Symptom: The systemd service for Terraform Enterprise fails to start.
Resolution Steps:
- Verify the Docker binary path in the service file is correct (for example, /usr/bin/docker vs. /usr/local/bin/docker).
- Check for hidden or non-standard characters in the compose file.
- Validate the working directory exists and has the correct permissions.
- Ensure certificate paths are correct relative to any mounted directories.
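Hidden characters usually arrive when a compose file is copied from a web page or a word processor. Common culprits are non-breaking spaces, zero-width spaces, byte-order marks, smart quotes, and tab characters (YAML does not allow tabs for indentation). The sketch below is a minimal Python scanner for such characters. The set of flagged characters is an assumption you can extend, and docker-compose.yml in the usage comment is only an example file name.

```python
import unicodedata
from typing import List, Tuple

# Characters that look like plain spaces or quotes but are not, plus tabs.
SUSPICIOUS = {
    "\t": "TAB (not allowed for YAML indentation)",
    "\u00a0": "NO-BREAK SPACE",
    "\u200b": "ZERO WIDTH SPACE",
    "\ufeff": "BYTE ORDER MARK",
    "\u2018": "LEFT SINGLE QUOTATION MARK",
    "\u2019": "RIGHT SINGLE QUOTATION MARK",
    "\u201c": "LEFT DOUBLE QUOTATION MARK",
    "\u201d": "RIGHT DOUBLE QUOTATION MARK",
}


def find_hidden_chars(text: str) -> List[Tuple[int, int, str]]:
    """Return (line, column, description) for each suspicious character, 1-based."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in SUSPICIOUS:
                findings.append((line_no, col, SUSPICIOUS[ch]))
            elif unicodedata.category(ch) in ("Cc", "Cf"):
                # Any other control or invisible formatting character.
                name = unicodedata.name(ch, "control character")
                findings.append((line_no, col, f"U+{ord(ch):04X} {name}"))
    return findings


# Example (example file name):
# with open("docker-compose.yml", encoding="utf-8") as f:
#     for finding in find_hidden_chars(f.read()):
#         print(finding)
```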
Kubernetes-Specific Issues
For a list of common commands, see Frequently Used Kubectl Commands for Terraform Enterprise on Kubernetes.
Resource Constraints
Symptom: Pods fail to start or enter a CrashLoopBackOff state.
Common Error:
error waiting for kubernetes container to start: pod container is not ready: context deadline exceeded
Resolution Steps:
- Check for resource quotas in the namespace.
- Review the TFE_CAPACITY_CPU setting, as default values may conflict with quotas.
- Verify memory limits (default is 2048M).
- Ensure the node has sufficient resources to schedule the pod.
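Kubernetes quantities mix decimal and binary suffixes, which makes quota arithmetic easy to get wrong. For example, 2048M is 2,048,000,000 bytes, while 2Gi is 2,147,483,648 bytes. The sketch below converts the common suffixes to plain numbers so you can compare a pod's requests against the used and hard values reported by kubectl describe resourcequota. It is a minimal sketch that handles only the suffixes listed in the code (not exponent notation such as 1e9), and the function names are hypothetical.

```python
from decimal import Decimal

# Suffix multipliers for Kubernetes resource quantities.
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18,
}


def parse_quantity(quantity: str) -> Decimal:
    """Convert a quantity such as "2048M", "2Gi", "500m", or "2" to a plain number.

    Memory results are in bytes; CPU results are in cores ("500m" -> 0.5).
    """
    q = quantity.strip()
    # Check two-letter binary suffixes before one-letter decimal ones.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if q.endswith(suffix):
            return Decimal(q[: -len(suffix)]) * _SUFFIXES[suffix]
    if q.endswith("m"):
        return Decimal(q[:-1]) / 1000  # millicores
    return Decimal(q)


def fits_quota(request: str, used: str, hard: str) -> bool:
    """True if adding `request` to `used` stays within the quota limit `hard`."""
    return parse_quantity(used) + parse_quantity(request) <= parse_quantity(hard)
```

For example, fits_quota("2048M", "0", "2Gi") is True, but fits_quota("2Gi", "1Gi", "2Gi") is False.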
Permission Issues
Symptom: Task worker pods fail with permission errors.
Common Error:
error building kubernetes config: open /var/run/secrets/kubernetes.io/serviceaccount/token: permission denied
Resolution Steps:
- Review the pod security context configuration.
- Verify the service account has the necessary permissions.
- Review your RBAC configurations.
- If permissions appear correct but issues persist, consider running helm uninstall followed by helm install to reset the configuration.
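To confirm what the error reports, you can check whether the process user can read the mounted service account token. Run the check from inside an affected pod, or from an ad-hoc pod that uses the same service account and security context. This minimal sketch assumes Python is available in that container, and the function name is hypothetical. It prints the token file's owner and mode alongside the process's UID and GIDs. That is usually enough to tell whether the pod security context (for example, its runAsUser or fsGroup settings) matches the token's permissions.

```python
import os
import stat

TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"


def describe_token_access(path: str = TOKEN_PATH) -> bool:
    """Print ownership details for `path` and return True if this process can read it."""
    print(f"process uid={os.getuid()} gid={os.getgid()} groups={os.getgroups()}")
    try:
        info = os.stat(path)  # follows the symlink Kubernetes uses for projected volumes
    except OSError as exc:
        print(f"cannot stat {path}: {exc}")
        return False
    print(f"{path}: owner uid={info.st_uid} gid={info.st_gid} mode={stat.filemode(info.st_mode)}")
    try:
        # Actually opening the file is the most reliable test of read access.
        with open(path, "rb") as f:
            f.read(1)
        return True
    except OSError as exc:
        print(f"read failed: {exc}")
        return False
```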
Networking Issues
For detailed guidance on network troubleshooting in Kubernetes, refer to the guide on Troubleshooting Connectivity to External Services in Terraform Enterprise.