TFE FDO Troubleshooting Guide – HashiCorp Help Center

Introduction

This guide covers common issues encountered during setup and migration from Replicated to FDO (Flexible Deployment Options), organized by deployment option. It focuses on troubleshooting specific problems rather than full installation procedures. When using this guide alongside the installation documentation, carefully review the requirements for your deployment mode. Many issues stem from mixing configuration settings between modes or from carrying over unnecessary settings from example templates. Start with the minimal required configuration for your chosen deployment mode and add settings only as needed.

Common Red Herrings

Multiple database timeouts in logs are often a sign of network/firewall issues rather than actual database problems. Focus troubleshooting on network connectivity before investigating database configuration.
Fluent-bit errors during startup typically self-resolve and do not indicate actual problems unless they persist after system initialization. Wait a few minutes before investigating further.
Initial SSL errors may appear in logs during startup before certificates are fully loaded into the system. These should clear once startup is complete.
Migration lock messages are normal during startup when running multiple pods. The system will handle these automatically unless a lock becomes stuck due to an improper shutdown.

Common Issues Across All Deployments

Immediate Container or Pod Crashes

Symptom: Container crashes before any useful logs are produced, or before any commands can be run.
Resolution Steps: Use this article to allow the container to continue running and gather logs.

Database Connectivity

Symptom: Container stops shortly after startup with database timeout.
Common Error: check failed: name=database duration=1m30.00443164s err="timeout: context deadline exceeded"
Resolution Steps:
1. Test basic connectivity to the database server.
2. Verify port access is available, and that the port number defined in the configuration matches the actual database port.
3. Check security group configurations.
4. Verify database user has appropriate permissions and correct password.
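
The first two steps can be checked from the host that runs Terraform Enterprise. The following is a minimal sketch, assuming the database address is written as HOST[:PORT] with the PostgreSQL default port 5432 when no port is given, and that nc (netcat) is installed. The function names and db.example.com are illustrative placeholders, not part of any TFE tooling, and IPv6 literals are not handled.

```shell
# Print the host part of a HOST[:PORT] string.
db_host() {
  printf '%s\n' "${1%%:*}"
}

# Print the port part, falling back to the PostgreSQL default (5432).
db_port() {
  case "$1" in
    *:*) printf '%s\n' "${1##*:}" ;;
    *)   printf '%s\n' 5432 ;;
  esac
}

# Attempt a TCP connection with a 5-second timeout (requires nc).
check_db_port() {
  host=$(db_host "$1")
  port=$(db_port "$1")
  if nc -z -w 5 "$host" "$port"; then
    echo "OK: $host:$port is reachable"
  else
    echo "FAIL: cannot reach $host:$port (check firewalls and security groups)"
    return 1
  fi
}

# Usage (placeholder address):
#   check_db_port db.example.com:5432
```

A successful TCP connection only rules out network and firewall problems; user permissions and the password still need to be checked separately.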

Vault Encryption Errors

Symptom: Vault fails to decrypt configuration.
Common Error: Error reading Vault configuration: failed decrypting unseal key: could not decrypt ciphertext: chacha20poly1305: message authentication failed
Resolution Steps:
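1. Check for special characters in the encryption password. The $ character is reserved in Docker (Docker Compose, for example, treats it as the start of a variable reference), so an unescaped $ can change the value the container receives.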
2. Verify encryption password matches previous configuration.
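
As a rough illustration of the first step, the sketch below detects a $ in a password and escapes it for use in a Docker Compose file, where $$ is read as a literal $. The function names are hypothetical, and the sketch assumes the password is set directly in a compose file rather than through an env file or a secret.

```shell
# Return success if the value contains a literal dollar sign.
has_dollar() {
  case "$1" in
    *'$'*) return 0 ;;
    *)     return 1 ;;
  esac
}

# Escape each $ as $$ so Docker Compose passes a literal $ to the container.
escape_for_compose() {
  printf '%s\n' "$1" | sed 's/\$/$$/g'
}

# Usage (single quotes keep the shell itself from expanding the $):
#   has_dollar 'pa$sword' && escape_for_compose 'pa$sword'
```

Escaping changes only how the compose file is written. The password the container receives must still match the one used by the previous configuration.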

Certificate Errors

Symptom: SSL/TLS errors, failed health checks, or unsuccessful runs.
Resolution Steps:
1. Verify certificate paths match your compose/configuration file.
2. Check certificate chain completeness.
3. Ensure certificates are in .pem format.
4. Verify correct order of certificates.
5. If you use a certificate issued by a private Certificate Authority, you must provide the certificate for that CA in the Certificate Authority (CA) Bundle section of the installation. This allows services running within Terraform Enterprise to access each other properly.
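
Several of these checks can be scripted. The sketch below is one possible approach, assuming openssl and grep are available. The function names and the file names cert.pem and ca-bundle.pem are placeholders. It counts the certificates in a PEM file, confirms the file is PEM-encoded, and prints each certificate's subject and issuer in file order so the chain order can be inspected.

```shell
# Count the certificates in a PEM bundle.
count_certs() {
  grep -c -- '-----BEGIN CERTIFICATE-----' "$1"
}

# Return success if the file contains at least one PEM certificate block.
is_pem() {
  grep -q -- '-----BEGIN CERTIFICATE-----' "$1"
}

# Print subject and issuer of every certificate, in file order
# (requires openssl).
show_chain() {
  openssl crl2pkcs7 -nocrl -certfile "$1" | openssl pkcs7 -print_certs -noout
}

# Usage (placeholder file names):
#   is_pem cert.pem && count_certs cert.pem
#   show_chain cert.pem
#   openssl verify -CAfile ca-bundle.pem -untrusted cert.pem cert.pem
```

In a correctly ordered chain, each certificate's issuer typically matches the subject of the certificate that follows it.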

VCS Integration Errors

Symptom: Unable to connect to VCS provider.
Common Errors: "You don't have permission to access that OAuth Client" or "SSL_connect returned=1 errno=0 state=error: unexpected eof"

Resolution Steps:
1. Test basic connectivity:

curl -v -L https://<vcs-fqdn>

curl -v https://<tfe_fqdn>/_health_check

2. Verify SSL certificates:

openssl s_client -showcerts -connect VCS_IP:443 </dev/null

openssl s_client -showcerts -connect TFE_IP:443 </dev/null

3. Check firewall and WAF configurations.
4. If the problem persists, a new Application Link from the VCS provider may be required.

Log Forwarding Errors

Symptom: Logs not being forwarded.
Common Error: [error] [output:syslog:syslog.1] no upstream connections available
Resolution Steps:
1. Enable syslog UDP on port 514 in /etc/rsyslog.conf.
2. Restart syslog service.
3. Verify port is listening.
4. Restart TFE.
5. Additional.
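
The first three steps might look like the following on a host that runs rsyslog under systemd. This is a sketch under those assumptions: the service name rsyslog and the helper function udp_port_listed are illustrative, and the configuration lines in the comments use rsyslog's module/input syntax.

```shell
# Lines in /etc/rsyslog.conf that enable the UDP listener
# (older configs use: $ModLoad imudp / $UDPServerRun 514):
#   module(load="imudp")
#   input(type="imudp" port="514")

# Return success if the given UDP port appears as a local listener in
# `ss -lun` style output read from stdin (local address is column 4).
udp_port_listed() {
  awk -v port="$1" '$4 ~ (":" port "$") { found = 1 } END { exit !found }'
}

# Usage:
#   sudo systemctl restart rsyslog
#   ss -lun | udp_port_listed 514 && echo "UDP 514 is listening"
```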

Docker-Specific Issues

Network Configuration

Symptom: Container startup fails with network errors.
Common Error: Error response from daemon: network tfe_terraform_isolation not found
Resolution Steps:
1. Create required network: docker network create tfe_terraform_isolation
2. Verify Docker driver settings.
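
To make the first step safe to repeat, the network can be created only when it is missing. The sketch below wraps that check in a hypothetical helper named ensure_network. It assumes the docker CLI is on the PATH and the current user can talk to the Docker daemon.

```shell
# Create the network only if it does not already exist.
ensure_network() {
  if docker network inspect "$1" >/dev/null 2>&1; then
    echo "network $1 already exists"
  else
    docker network create "$1"
  fi
}

# Usage:
#   ensure_network tfe_terraform_isolation
#   docker network inspect --format '{{.Driver}}' tfe_terraform_isolation
```

The second usage line prints the network driver in use, which can help with the driver check in step 2.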

Version Compatibility

Symptom: Container fails to start or exhibits unexpected behavior.
Resolution Steps:
1. Ensure the installed Docker version is compatible with Terraform Enterprise.
2. Downgrade if using an incompatible version.
3. Remove all containers and volumes if downgrading.
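
One way to script the version check is to compare the running Docker server version against a supported range. In the sketch below, MIN_DOCKER and MAX_DOCKER are placeholders to be filled in from the release notes for your TFE version, the function names are illustrative, and `sort -V` is assumed to be available (GNU coreutils).

```shell
# Return success if version $1 is greater than or equal to version $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Return success if version $1 lies between $2 and $3 inclusive.
version_in_range() {
  version_ge "$1" "$2" && version_ge "$3" "$1"
}

# Usage (MIN_DOCKER and MAX_DOCKER are placeholders you must set):
#   current=$(docker version --format '{{.Server.Version}}')
#   version_in_range "$current" "$MIN_DOCKER" "$MAX_DOCKER" \
#     || echo "Docker $current is outside $MIN_DOCKER..$MAX_DOCKER"
```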

Storage Configuration

Symptom: Service fails after modifying Docker root or storage paths.
Resolution Steps:
1. Stop TFE and Docker services before making storage changes.
2. Verify any mounted volumes have correct ownership and permissions.
3. Check all storage paths in compose file match actual system paths.
4. Ensure sufficient disk space is available in new locations.
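
The ownership and disk space checks can be sketched as small helpers. The function names below are illustrative, and the path and threshold in the usage lines are placeholders; the required free space is not specified in this guide.

```shell
# Print available space in KiB for the filesystem that holds the path.
free_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Return success if the path has at least the requested KiB available.
has_free_kb() {
  [ "$(free_kb "$1")" -ge "$2" ]
}

# Print permissions, owner, and group of a path.
show_owner() {
  ls -ld -- "$1"
}

# Usage (placeholders):
#   show_owner <new_storage_path>
#   has_free_kb <new_storage_path> <required_kib> || echo "not enough free space"
```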

systemd Service Issues

Symptom: Service fails to start.
Resolution Steps:
1. Verify Docker binary path in service file (commonly /usr/bin/docker vs /usr/local/bin/docker).
2. Check for hidden characters in compose file.
3. Validate working directory exists and has correct permissions.
4. Ensure certificate paths are correct relative to mounted directories.
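
Hidden characters are hard to spot by eye. The sketch below flags carriage returns (Windows line endings) and non-printable bytes such as a UTF-8 byte order mark. The function name is hypothetical, compose.yaml and <unit> are placeholders, and note that the check also flags legitimate non-ASCII text.

```shell
# Return success if the file contains carriage returns or bytes that are
# neither printable ASCII nor ordinary whitespace.
has_hidden_chars() {
  cr=$(printf '\r')
  LC_ALL=C grep -q -e "$cr" -e '[^[:print:][:space:]]' "$1"
}

# Usage (placeholder names):
#   has_hidden_chars compose.yaml && echo "hidden characters found"
#   command -v docker     # compare with the ExecStart path in the unit
#   systemctl cat <unit>  # show the service file systemd actually loaded
```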

Kubernetes-Specific Issues

Frequently used Kubernetes Commands

Resource Constraints

Symptom: Pods fail to start or enter CrashLoopBackOff.
Common Error: error waiting for kubernetes container to start: pod container is not ready: context deadline exceeded
Resolution Steps:
1. Check resource quotas in namespace.
2. Review TFE_CAPACITY_CPU setting (default values may conflict with quotas).
3. Verify memory limits (default 2048M).
4. Check if node has sufficient resources.
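
The quota and node checks can be gathered in one pass. The following sketch assumes kubectl is configured for the cluster; the function name and the namespace in the usage line are placeholders.

```shell
# Print quota, event, and node information relevant to scheduling failures.
k8s_resource_report() {
  ns=$1
  echo "== Resource quotas in namespace $ns =="
  kubectl describe resourcequota --namespace "$ns"
  echo "== Recent events in namespace $ns =="
  kubectl get events --namespace "$ns" --sort-by=.lastTimestamp
  echo "== Node capacity and allocated resources =="
  kubectl describe nodes
}

# Usage (placeholder namespace):
#   k8s_resource_report <namespace>
```

Events that mention exceeded quotas or insufficient CPU or memory usually indicate which of the steps above to pursue.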

Permission Issues

Symptom: Task worker failures.
Common Error: error building kubernetes config: open /var/run/secrets/kubernetes.io/serviceaccount/token: permission denied
Resolution Steps:
1. Review pod security context.
2. Verify service account permissions.
3. Review RBAC configurations.
4. If permissions are correct and the error persists, consider reinstalling the release with helm uninstall followed by helm install.
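
Service account permissions can be probed with kubectl auth can-i, and the pod security context can be read back from the running pod. In the sketch below, the function names are illustrative, the verb and resource in the usage line are examples rather than a list of what TFE requires, and the caller is assumed to have permission to impersonate the service account.

```shell
# Build the RBAC subject name for a service account.
sa_subject() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# Ask the API server whether the service account may perform an action.
# Arguments: namespace, service account, verb, resource.
sa_can_i() {
  kubectl auth can-i "$3" "$4" --namespace "$1" --as="$(sa_subject "$1" "$2")"
}

# Usage (placeholders; the verb and resource are examples only):
#   sa_can_i <namespace> <service-account> create pods
#   kubectl get pod <pod> --namespace <namespace> \
#     -o jsonpath='{.spec.securityContext}'
```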

Networking Issues

Troubleshooting Connectivity to External Services Using Kubernetes