Introduction:
This KB article covers troubleshooting for HCP Terraform Operator AgentPool deployment failures and upgrade procedures. It includes verification steps, common errors with solutions, the AgentPool lifecycle, and three production upgrade methods with their risks and recovery steps.
Problem:
Common symptoms include:
- AgentPool CRs exist, but their secrets show DATA: 0.
- Pods are in CrashLoopBackOff due to an invalid TFC_AGENT_TOKEN.
- No healthy pools appear in the TFE UI.
- Operator logs contain "failed to get HCP Terraform client" or "Name is already taken" errors.
- Production workspaces in agent execution mode have blocked runs.
Check with:
kubectl get agentpools -A      # CRs exist
kubectl get secrets -n <ns>    # agentpool-*-agent-pool DATA: 0 or missing
kubectl get pods -n <ns>       # CrashLoopBackOff (invalid TFC_AGENT_TOKEN)
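The secret check above can be scripted. The sketch below is one possible helper, not part of the operator: the function name empty_agent_pool_secrets is hypothetical, and it assumes the default kubectl get secrets column order (NAME, TYPE, DATA, AGE) with headers suppressed.

```shell
# Hypothetical helper: reads `kubectl get secrets --no-headers` output on stdin
# and prints the names of agent-pool secrets whose DATA column is 0.
# Assumes the default column order: NAME TYPE DATA AGE.
empty_agent_pool_secrets() {
  # Compare DATA as a string so a missing column is never treated as 0.
  awk '$1 ~ /agent-pool/ && $3 == "0" { print $1 }'
}

# Usage (requires cluster access):
#   kubectl get secrets -n <ns> --no-headers | empty_agent_pool_secrets
```

Any name printed by the helper is a secret that the operator created but never populated with an agent token.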
Prerequisites:
- HCP Terraform Operator installed via Helm
- Organization token with agent:pools:read/write scope (Owners team recommended)
- TFE/HCP Terraform organization access
Operator Installation:
helm repo add hashicorp https://helm.releases.hashicorp.com
helm repo update
helm install demo hashicorp/hcp-terraform-operator \
  --version 2.10.0 \
  --namespace tfc-operator-system \
  --create-namespace \
  --set operator.tfeAddress=https://your-tfe.example.com
Verification Checks:
Confirm all 6 CRDs: agentpools, agenttokens, modules, projects, runscollectors, workspaces.
kubectl get crds | grep terraform
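To avoid counting the CRDs by eye, the check can be expressed as a small function. This is a sketch only: the function name missing_operator_crds is hypothetical, and it assumes all six CRDs live in the app.terraform.io API group, as the agenttokens, runscollectors, and agentpools CRDs referenced later in this article do.

```shell
# Hypothetical helper: reads a list of installed CRD names on stdin and prints
# any of the six expected operator CRDs that are absent.
# Assumption: every operator CRD is named <kind>.app.terraform.io.
missing_operator_crds() {
  installed=$(cat)
  for kind in agentpools agenttokens modules projects runscollectors workspaces; do
    # -F matches the name literally, so the dots are not regex wildcards.
    if ! printf '%s\n' "$installed" | grep -qF "${kind}.app.terraform.io"; then
      printf '%s\n' "${kind}.app.terraform.io"
    fi
  done
}

# Usage (requires cluster access):
#   kubectl get crds -o name | missing_operator_crds
```

Empty output means all six CRDs are present.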
Operator pods healthy: 2/2 Running.
kubectl get pods -n tfc-operator-system
kubectl logs deploy/demo-hcp-terraform-operator -n tfc-operator-system | head -20
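The first 20 log lines may not include the errors listed in the Problem section. A minimal sketch for filtering the full log for those two messages follows; the function name operator_error_lines is hypothetical, and the matched strings are taken verbatim from the symptoms above.

```shell
# Hypothetical helper: reads operator log text on stdin and prints only the
# lines containing the two error messages named in the Problem section.
operator_error_lines() {
  # `|| true` keeps the exit status 0 when no error lines are found.
  grep -E 'failed to get HCP Terraform client|Name is already taken' || true
}

# Usage (requires cluster access):
#   kubectl logs deploy/demo-hcp-terraform-operator -n tfc-operator-system | operator_error_lines
```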
AgentPool Creation:
Create the tfc-owner secret with the team token in the target namespace:
kubectl create secret generic tfc-owner -n test-agentpool-prod --from-literal=token='hcp_...'
Apply the AgentPool YAML that references the secret, then verify that the stored token matches the one in the TFE UI:
kubectl get secret tfc-owner -n test-agentpool-prod -o jsonpath='{.data.token}' | base64 --decode
Check that the Deployment and ReplicaSet are ready and the pod is Running:
kubectl get all -n test-agentpool-prod
Upgrade Methods:
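1. Full Reinstall: The cleanest option; uninstall the operator, then reinstall it at the updated version.
helm uninstall hcp-terraform-operator -n tfc-operator-system
helm install ... --version 2.10.0 # Updated version
2. Helm Upgrade: Simple for minor versions.
helm upgrade --namespace tfc-operator-system hcp-terraform-operator hashicorp/hcp-terraform-operator --version 2.10.0
In both methods, hcp-terraform-operator is the Helm release name. Use the name chosen at install time (demo in the installation example above).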
3. CRD Force Replace: Risky; this deletes and replaces specific CRDs, which breaks existing pools until they are patched.
kubectl delete crd agenttokens.app.terraform.io runscollectors.app.terraform.io --ignore-not-found=true
kubectl replace -f https://raw.githubusercontent.com/hashicorp/hcp-terraform-operator/main/config/crd/bases/app.terraform.io_agentpools.yaml --force
These commands delete the agenttokens and runscollectors CRDs and replace the agentpools CRD with the definition compatible with the target version. The replace step prints output similar to:
customresourcedefinition.apiextensions.k8s.io "agentpools.app.terraform.io" deleted
customresourcedefinition.apiextensions.k8s.io/agentpools.app.terraform.io replaced
Next, re-apply the agenttokens and runscollectors CRDs:
kubectl apply -f https://raw.githubusercontent.com/hashicorp/hcp-terraform-operator/main/config/crd/bases/app.terraform.io_agenttokens.yaml
kubectl apply -f https://raw.githubusercontent.com/hashicorp/hcp-terraform-operator/main/config/crd/bases/app.terraform.io_runscollectors.yaml
Then patch the existing pool ID back into the AgentPool status. First read the current generation, which becomes the observedGeneration value, then apply the patch (substitute your own pool ID and generation value):
kubectl get agentpool <agentpoolname> -n test-agentpool-prod -o jsonpath='{.metadata.generation}'
kubectl patch agentpool <agentpoolname> -n test-agentpool-prod --subresource=status --type=merge --patch '{"status":{"agentPoolID":"agent-UPftHEEQxV4dmkK3","observedGeneration":2}}'
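Hand-editing the JSON patch is error-prone, so the body can be generated from the two values instead. The following is a sketch under stated assumptions: the function name build_status_patch is hypothetical, and it only builds the string shown in the patch command above without contacting the cluster.

```shell
# Hypothetical helper: prints the status patch body for a given agent pool ID
# and generation, in the same shape as the patch command above.
build_status_patch() {
  pool_id=$1
  generation=$2
  # Reject empty or non-numeric generation values before building the JSON.
  case $generation in
    ''|*[!0-9]*)
      echo "generation must be a non-negative integer" >&2
      return 1
      ;;
  esac
  printf '{"status":{"agentPoolID":"%s","observedGeneration":%s}}\n' "$pool_id" "$generation"
}

# Usage (requires cluster access):
#   gen=$(kubectl get agentpool <agentpoolname> -n test-agentpool-prod -o jsonpath='{.metadata.generation}')
#   kubectl patch agentpool <agentpoolname> -n test-agentpool-prod --subresource=status \
#     --type=merge --patch "$(build_status_patch <agent-pool-id> "$gen")"
```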
If the agent pod shows as exited after patching, find it and delete it:
kubectl get pods -n test-agentpool-prod -o wide
kubectl delete pod <podname> -n test-agentpool-prod
A new pod is then created with the same configuration.
Outcome:
- Secrets show DATA: 1:
kubectl get secrets -A | grep agent-pool
- AgentPools have an ID:
kubectl get agentpools -A -o custom-columns=ID:.status.agentPoolID
- Pods are running:
kubectl get pods -n test-agentpool-prod
- The AgentPool description confirms the state:
kubectl describe agentpool agent-pool-demo2 -n test-agentpool-prod
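The AgentPool ID check can also be automated. The sketch below is illustrative only: the function name pools_without_id is hypothetical, it expects three columns (namespace, name, ID) rather than the single ID column used above, and it relies on kubectl printing <none> in custom-columns output when a field is unset.

```shell
# Hypothetical helper: reads "NAMESPACE NAME ID" rows on stdin and prints
# namespace/name for every AgentPool whose ID column is <none>.
pools_without_id() {
  awk '$3 == "<none>" { print $1 "/" $2 }'
}

# Usage (requires cluster access):
#   kubectl get agentpools -A --no-headers \
#     -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,ID:.status.agentPoolID \
#     | pools_without_id
```

Empty output indicates that every AgentPool has a populated status.agentPoolID.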
References:
https://developer.hashicorp.com/terraform/cloud-docs/agents/agents
https://developer.hashicorp.com/terraform/tutorials/kubernetes/kubernetes-operator-v2-agentpool
https://support.hashicorp.com/hc/en-us/articles/39051484823955-How-to-debug-the-HCP-Terraform-Operator
https://support.hashicorp.com/hc/en-us/articles/41240272169875-How-to-connect-Terraform-Operator-for-Kubernetes-to-an-existing-agent-pool