Problem
When a Terraform Enterprise pod on Kubernetes encounters a configuration error, it may enter a crash loop. This rapid cycle of starting and crashing prevents administrators from accessing the container's logs to diagnose the underlying issue, as the pod terminates before a manual connection can be established.
Prerequisites
- A Terraform Enterprise Flexible Deployment Options (FDO) installation on Kubernetes.
Cause
During startup, a configuration error can cause the main Terraform Enterprise process to fail, and Kubernetes then restarts the container repeatedly. You can view logs with the standard command below. However, its output may not include the critical error messages written to the log files under /var/log/terraform-enterprise before the pod crashes.
$ kubectl -n <NAMESPACE> logs <POD_NAME>
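The pod name in the sample output later in this article has a generated suffix (terraform-enterprise-6db8847d65-g4w65). Because of that, you usually need to look up the real name before you run the command above. The following is a minimal sketch that assumes the pod name starts with terraform-enterprise. The helper names find_tfe_pod and show_tfe_logs are hypothetical. The --previous flag of kubectl logs prints the output of the last terminated container instance, which can help after a restart. It may still miss messages that are written only to the log files.

```shell
#!/bin/bash
# Hypothetical helpers; the names are illustrative only.

# Print the first pod in the namespace whose name starts with the given prefix.
# The default prefix "terraform-enterprise" is an assumption; adjust it if needed.
find_tfe_pod() {
  local namespace="$1" prefix="${2:-terraform-enterprise}"
  kubectl get pods -n "$namespace" --no-headers -o custom-columns=":metadata.name" \
    | grep -m 1 -- "^${prefix}"
}

# Show the current container's logs, then the logs of the previous
# (crashed) container instance if Kubernetes still has them.
show_tfe_logs() {
  local namespace="$1" pod
  if ! pod=$(find_tfe_pod "$namespace"); then
    echo "No Terraform Enterprise pod found in namespace '$namespace'" >&2
    return 1
  fi
  kubectl -n "$namespace" logs "$pod"
  kubectl -n "$namespace" logs "$pod" --previous
}

# Usage: show_tfe_logs <NAMESPACE>
```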
A more direct method is to open a shell in the pod and read the log files. In a crash loop, however, this is often impractical because the pod terminates before you can run the commands.
Access the container's shell.
$ kubectl -n <NAMESPACE> exec -it <POD_NAME> -- bash
Navigate to the log directory and view the relevant log file.
$ cd /var/log/terraform-enterprise
$ cat terraform-enterprise.log
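You can also combine the two manual steps into one non-interactive kubectl call, so no time is lost typing in an interactive shell. The following is a minimal sketch that assumes the log directory shown above. The helper names cat_tfe_log and list_tfe_logs are hypothetical, and the container must still be running when the command executes.

```shell
#!/bin/bash
# Hypothetical helpers: read Terraform Enterprise log files from a running pod
# in a single kubectl call, without opening an interactive shell first.

# Arguments: <NAMESPACE> <POD_NAME> [LOGFILE]; LOGFILE defaults to terraform-enterprise.log.
cat_tfe_log() {
  local namespace="$1" pod="$2" logfile="${3:-terraform-enterprise.log}"
  kubectl -n "$namespace" exec "$pod" -- cat "/var/log/terraform-enterprise/${logfile}"
}

# List the files in the log directory, to see which log files exist.
list_tfe_logs() {
  local namespace="$1" pod="$2"
  kubectl -n "$namespace" exec "$pod" -- ls /var/log/terraform-enterprise
}

# Usage: cat_tfe_log terraform-enterprise terraform-enterprise-6db8847d65-g4w65
```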
Because this manual process is difficult to perform on a crashing pod, a script that connects as soon as the pod starts is more reliable.
Solutions
Solution 1: Use a script to automatically capture logs
To capture the logs from a crashing pod, run the following shell script in a separate terminal. The script polls the namespace every two seconds. As soon as it finds a pod that is not on the ignore list, it connects to that pod and streams the specified log file.
Save the following content as a script file, such as get-tfe-logs.sh, and execute it.
#!/bin/bash
## Specify the namespace you want to monitor.
NAMESPACE="terraform-enterprise"
## Specify any pods to ignore (exact pod names).
IGNORED_PODS=("another-pod" "another-pod2" "another-pod3")
## Specify the log file to tail. The default is terraform-enterprise.log.
LOGFILE="terraform-enterprise.log"
## Print the name of the first pod found in the namespace, excluding ignored pods.
get_pod_name() {
  local pods pod
  pods=$(kubectl get pods -n "$NAMESPACE" --no-headers -o custom-columns=":metadata.name")
  for pod in "${IGNORED_PODS[@]}"; do
    ## -x and -F match whole names literally, so "another-pod" does not also hide "another-pod2".
    pods=$(echo "$pods" | grep -vxF -- "$pod")
  done
  echo "$pods" | head -n 1
}
## Loop until a pod is detected.
while true; do
  POD_NAME=$(get_pod_name)
  if [ -n "$POD_NAME" ]; then
    echo "Pod '$POD_NAME' detected in namespace '$NAMESPACE'. Fetching logs:"
    ## tail -F keeps retrying if the log file has not been created yet.
    kubectl -n "$NAMESPACE" exec -it "$POD_NAME" -- tail -F /var/log/terraform-enterprise/"$LOGFILE"
    break
  else
    echo "No pods found in namespace '$NAMESPACE' (excluding ignored pods). Retrying in 2 seconds..."
    sleep 2
  fi
done
Outcome
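When the Terraform Enterprise pod starts, the script connects immediately and displays the specified log file, so you can read the startup error messages before the pod crashes. When the container terminates, the exec session ends and the script exits. Run the script again to capture the next restart.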
Pod 'terraform-enterprise-6db8847d65-g4w65' detected in namespace 'terraform-enterprise'. Fetching logs:
2024-10-10T14:07:57.839Z [INFO] terraform-enterprise: connected successfully to terraform_enterprise database
## ... (Error messages will appear here)
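The script above exits after its first kubectl exec session ends, so it captures at most one container start. The following is a minimal sketch of a variant that re-attaches after each restart. It assumes the same namespace and log path. The follow_across_restarts name and the MAX_ATTEMPTS limit are hypothetical, and the ignore list is omitted for brevity.

```shell
#!/bin/bash
# Hypothetical variant: re-attach to the pod after every crash instead of
# exiting after the first session. The ignore list is omitted for brevity.
NAMESPACE="${NAMESPACE:-terraform-enterprise}"
LOGFILE="${LOGFILE:-terraform-enterprise.log}"
MAX_ATTEMPTS="${MAX_ATTEMPTS:-30}"   # assumption: stop after this many polls

follow_across_restarts() {
  local attempt pod
  for ((attempt = 1; attempt <= MAX_ATTEMPTS; attempt++)); do
    # Take the first pod in the namespace, as the original script does.
    pod=$(kubectl get pods -n "$NAMESPACE" --no-headers -o custom-columns=":metadata.name" | head -n 1)
    if [ -n "$pod" ]; then
      echo "Attempt $attempt: attaching to '$pod'"
      # The session ends when the container dies (or fails if it is not running);
      # the error is ignored so that the loop keeps polling.
      kubectl -n "$NAMESPACE" exec "$pod" -- tail -F "/var/log/terraform-enterprise/$LOGFILE" || true
    fi
    sleep 2
  done
}

# Usage: source this file, then run: follow_across_restarts
```

Polling with a fixed attempt limit keeps the loop from running indefinitely if you forget it in a terminal. Raise MAX_ATTEMPTS if the back-off delay between restarts grows long.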
Additional Information
- For more information on managing Terraform Enterprise on Kubernetes, refer to the official Terraform Enterprise documentation.