Introduction
When deploying and operating the HCP Terraform Operator for Kubernetes, you may encounter issues where the operator fails to provision resources defined through its Custom Resource Definitions (CRDs) as expected. This article explains how to troubleshoot these issues by inspecting operator logs and comparing the results with direct API calls to your Terraform Enterprise instance.
Prerequisites
- Terraform Enterprise
- HCP Terraform Operator
Procedure
1. Identify the HCP Terraform Operator pod
First, list the pods in the deployment namespace:
~ kubectl -n tfe get pod
NAME READY STATUS RESTARTS AGE
terraform-enterprise-cfb844565-5qz4z 1/1 Running 0 3h22m
tfc-operator-hcp-terraform-operator-dfb5d7c64-kq78n 2/2 Running 0 2m
The pod name is prefixed with the Helm release name, which in this example is tfc-operator.
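In a namespace with many pods, the operator pod can be picked out by its release-name prefix. The following is a minimal sketch: the helper name find_operator_pods is hypothetical, and it assumes the default kubectl table layout shown above, with the pod name in the first column.

```shell
# Hypothetical helper: print pod names that start with "<release>-".
# Reads `kubectl get pod` table output on stdin; assumes the pod name
# is the first whitespace-separated column.
find_operator_pods() {
  awk -v prefix="$1-" 'index($1, prefix) == 1 { print $1 }'
}
```

It could be used as: kubectl -n tfe get pod | find_operator_pods tfc-operator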
2. Verify the API Endpoint in HCP Terraform Operator
By default, the operator communicates with the HCP Terraform endpoint at app.terraform.io. When running the operator against a self-managed Terraform Enterprise instance, override this default by setting operator.tfeAddress to the Terraform Enterprise hostname, either in the helm install command or in override.yaml.
You can verify the override value by running:
~ helm get values tfc-operator -n tfe
USER-SUPPLIED VALUES:
operator:
tfeAddress: https://example.terraform.com
replicaCount: 1
In this example, the operator is connecting to example.terraform.com. If no such override value is returned, the operator will default to app.terraform.io.
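The fallback rule described here can be expressed as a small filter over the helm output. This is a sketch under assumptions: the helper name operator_endpoint is hypothetical, and the parsing is deliberately naive (it matches any key named tfeAddress, regardless of nesting).

```shell
# Hypothetical helper: print the endpoint the operator is expected to use.
# Reads `helm get values` YAML output on stdin. If no tfeAddress override
# is found, prints the default HCP Terraform endpoint.
operator_endpoint() {
  awk '
    $1 == "tfeAddress:" { v = $2; gsub(/"/, "", v); if (v != "") addr = v }
    END {
      if (addr != "") print addr
      else print "https://app.terraform.io"
    }
  '
}
```

It could be used as: helm get values tfc-operator -n tfe | operator_endpoint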
3. Check the operator pod logs
Inspecting the operator logs is important because the operator pod processes the CRDs and communicates changes to the configured API endpoint. The pod logs indicate whether the process succeeded or failed.
If the logs indicate a permission error when creating an agent pool, you might see something like:
~ kubectl -n tfe logs --tail 6 pod/tfc-operator-hcp-terraform-operator-97f589466-lh9jx
Defaulted container "manager" out of: manager, kube-rbac-proxy
2025-03-03T02:43:47Z INFO Spec Validation {"agentpool": {"name":"k8s-agent-pool","namespace":"tfe"}, "msg": "validating instance object spec"}
2025-03-03T02:43:47Z INFO Spec Validation {"agentpool": {"name":"k8s-agent-pool","namespace":"tfe"}, "msg": "spec is valid"}
2025-03-03T02:43:47Z INFO Reconcile Agent Pool {"agentpool": {"name":"k8s-agent-pool","namespace":"tfe"}, "msg": "reconciling agent pool"}
2025-03-03T02:43:47Z INFO Reconcile Agent Pool {"agentpool": {"name":"k8s-agent-pool","namespace":"tfe"}, "msg": "status.agentPoolID is empty, creating a new agent pool"}
2025-03-03T02:43:48Z ERROR Reconcile Agent Pool {"agentpool": {"name":"k8s-agent-pool","namespace":"tfe"}, "msg": "failed to create a new agent pool", "error": "unauthorized"}
2025-03-03T02:43:48Z ERROR Agent Pool Controller {"agentpool": {"name":"k8s-agent-pool","namespace":"tfe"}, "msg": "reconcile agent pool", "error": "unauthorized"}
In this case, the error "unauthorized" suggests a permissions issue. Verify that the token or credentials used by the operator have the correct permissions to create agent pools.
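When the log is long, it can help to keep only the ERROR lines and count the distinct error values. A minimal sketch, assuming the log layout shown above (timestamp first, level second, and an "error" key in the JSON payload); both helper names are hypothetical.

```shell
# Hypothetical helper: keep only ERROR-level lines from operator logs on stdin.
# Assumes the log level is the second whitespace-separated field.
operator_errors() {
  awk '$2 == "ERROR"'
}

# Hypothetical helper: count occurrences of each "error" value in ERROR lines.
error_summary() {
  operator_errors |
    sed -n 's/.*"error": *"\([^"]*\)".*/\1/p' |
    sort | uniq -c
}
```

It could be used as: kubectl -n tfe logs pod/tfc-operator-hcp-terraform-operator-97f589466-lh9jx -c manager | error_summary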
If the CRD is processed successfully, the logs indicate that the reconciliation succeeded:
~ kubectl -n tfe logs -f --tail 6 tfc-operator-hcp-terraform-operator-dfb5d7c64-kq78n
Defaulted container "manager" out of: manager, kube-rbac-proxy
2025-03-03T02:54:53Z INFO Reconcile Agent Tokens {"agentpool": {"name":"k8s-agent-pool","namespace":"tfe"}, "msg": "successfully reconcilied agent tokens"}
2025-03-03T02:54:53Z INFO Reconcile Agent Deployment {"agentpool": {"name":"k8s-agent-pool","namespace":"tfe"}, "msg": "new reconciliation event"}
2025-03-03T02:54:53Z INFO Reconcile Agent Deployment {"agentpool": {"name":"k8s-agent-pool","namespace":"tfe"}, "mgs": "performing Deployment update"}
2025-03-03T02:54:53Z INFO Reconcile Agent Deployment {"agentpool": {"name":"k8s-agent-pool","namespace":"tfe"}, "msg": "successfully reconcilied agent deployment"}
2025-03-03T02:54:53Z INFO Reconcile Agent Autoscaling {"agentpool": {"name":"k8s-agent-pool","namespace":"tfe"}, "msg": "successfully reconcilied agent autoscaling"}
2025-03-03T02:54:53Z INFO Agent Pool Controller {"agentpool": {"name":"k8s-agent-pool","namespace":"tfe"}, "msg": "successfully reconcilied agent pool"}
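The failing log earlier in this step reports an empty status.agentPoolID, so reading that field back is one way to confirm the pool now exists. A minimal sketch: the helper name pool_state is hypothetical, and addressing the resource as agentpool in kubectl is an assumption; the pool name and namespace come from the logs above.

```shell
# Hypothetical helper: report whether an agent pool ID has been recorded.
# Takes the value of status.agentPoolID (possibly empty) as its argument.
pool_state() {
  if [ -n "$1" ]; then
    echo "created: $1"
  else
    echo "pending: status.agentPoolID is empty"
  fi
}
```

It could be used as: pool_state "$(kubectl -n tfe get agentpool k8s-agent-pool -o jsonpath='{.status.agentPoolID}')"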
4. Compare with API Endpoint Behavior
To further isolate an identified problem, manually test resource creation directly against the API endpoint. This helps determine whether the issue lies with the operator or with the permissions and values being sent.
For example, try creating an agent pool using the same token:
~ curl -s \
--header "Authorization: Bearer $TFE_TOKEN" \
--header "Content-Type: application/vnd.api+json" \
--request POST \
--data @payload.json \
https://example.terraform.com/api/v2/organizations/$ORG/agent-pools
The result helps narrow down the cause. If the API call returns a similar error (for example, a 4XX status), the problem likely lies with the token's permissions or with the payload. If the API call succeeds, the problem may be specific to how the operator processes and forwards the request.
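The comparison can be scripted so that the HTTP status code is captured and interpreted. The following is a sketch under assumptions: the three helper names and the pool name api-test-pool are hypothetical, the payload is a minimal JSON:API body for an agent pool, and the mapping from status codes to causes is a rough guide rather than an exhaustive list (a 404, for instance, can also mean the organization was not found).

```shell
# Hypothetical helper: write a minimal agent pool payload to the given file.
# The pool name is an illustrative value.
write_payload() {
  cat > "$1" <<'EOF'
{
  "data": {
    "type": "agent-pools",
    "attributes": {
      "name": "api-test-pool"
    }
  }
}
EOF
}

# Hypothetical helper: send the request and print only the HTTP status code.
# The response body is saved to response.json. Assumes TFE_TOKEN and ORG
# are set, as in the curl example above.
create_agent_pool() {
  curl -s -o response.json -w '%{http_code}' \
    --header "Authorization: Bearer $TFE_TOKEN" \
    --header "Content-Type: application/vnd.api+json" \
    --request POST \
    --data @"$1" \
    "https://example.terraform.com/api/v2/organizations/$ORG/agent-pools"
}

# Hypothetical helper: map a status code to a likely next step.
classify_status() {
  case "$1" in
    2??) echo "success: the API accepts the token and payload, look at the operator" ;;
    401|403|404) echo "auth: check the token, its permissions, and the organization name" ;;
    4??) echo "request: check the payload" ;;
    *) echo "other: check that Terraform Enterprise is reachable and healthy" ;;
  esac
}
```

It could be used as: write_payload payload.json; classify_status "$(create_agent_pool payload.json)"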
Additional Information