Introduction
When deploying and operating the HCP Terraform Operator for Kubernetes, you may encounter issues where the operator fails to provision Custom Resource Definitions (CRDs) as expected. This article explains how to troubleshoot these issues by inspecting operator logs and comparing the results with direct API calls to your Terraform Enterprise instance.
Prerequisites
- An operational Terraform Enterprise instance.
- The HCP Terraform Operator deployed in your Kubernetes cluster.
Procedure
Step 1: Identify the HCP Terraform Operator pod
List the pods in the deployment namespace to identify the operator pod. The pod name is prefixed with the Helm release name, which is tfc-operator in this example.
$ kubectl -n tfe get pod
NAME                                                  READY   STATUS    RESTARTS   AGE
terraform-enterprise-cfb844565-5qz4z                  1/1     Running   0          3h22m
tfc-operator-hcp-terraform-operator-dfb5d7c64-kq78n   2/2     Running   0          2m
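In a namespace with many pods, filtering by name saves scanning the list by eye. The following is a minimal sketch, assuming the operator pod name contains the string hcp-terraform-operator, as it does in the example output above. The function name operator_pods is hypothetical and not part of any tool.

```shell
# Hypothetical helper: read `kubectl get pod` output on stdin and print the
# names of pods whose name contains "hcp-terraform-operator".
# Assumption: the pod name keeps that string, as in the example output.
operator_pods() {
  awk '$1 ~ /hcp-terraform-operator/ { print $1 }'
}

# Usage (requires access to the cluster):
#   kubectl -n tfe get pod --no-headers | operator_pods
```

The helper only matches on the first column, so it works with or without the header row.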
Step 2: Verify the API Endpoint
By default, the operator communicates with the HCP Terraform endpoint at app.terraform.io. When using a self-managed Terraform Enterprise instance, you must override this default by setting the operator.tfeAddress value to your Terraform Enterprise URL, either in your helm install command or in your override.yaml file.
Verify the override value by running the following command.
$ helm get values tfc-operator -n tfe
USER-SUPPLIED VALUES:
operator:
  tfeAddress: https://example.terraform.com
replicaCount: 1
In this example, the operator is connecting to example.terraform.com. If this command does not return an override value, the operator defaults to app.terraform.io.
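The check above can be reduced to a single line of output. The following is a rough sketch that reads the helm get values output and prints the endpoint the operator is expected to use, falling back to app.terraform.io when no override is present. The function name effective_endpoint is hypothetical, and the match is a plain text match on the tfeAddress key, not a full YAML parse.

```shell
# Hypothetical helper: read `helm get values` output on stdin and print the
# configured tfeAddress, or app.terraform.io when no override is set.
# Assumption: the value is unquoted and appears on the same line as the key.
effective_endpoint() {
  addr=$(awk '$1 == "tfeAddress:" { print $2; exit }')
  echo "${addr:-app.terraform.io}"
}

# Usage (requires access to the cluster):
#   helm get values tfc-operator -n tfe | effective_endpoint
```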
Step 3: Check the operator pod logs
Inspecting the operator logs is critical because the operator pod processes the CRDs and communicates changes to the configured API endpoint. The pod logs indicate whether each operation succeeded or failed.
For example, if the operator's token lacks permission to create an agent pool, the logs show an unauthorized error.
$ kubectl -n tfe logs --tail 6 pod/tfc-operator-hcp-terraform-operator-97f589466-lh9jx
## Defaulted container "manager" out of: manager, kube-rbac-proxy
INFO Spec Validation {"agentpool": {"name":"k8s-agent-pool","namespace":"tfe"}, "msg": "validating instance object spec"}
INFO Spec Validation {"agentpool": {"name":"k8s-agent-pool","namespace":"tfe"}, "msg": "spec is valid"}
INFO Reconcile Agent Pool {"agentpool": {"name":"k8s-agent-pool","namespace":"tfe"}, "msg": "reconciling agent pool"}
INFO Reconcile Agent Pool {"agentpool": {"name":"k8s-agent-pool","namespace":"tfe"}, "msg": "status.agentPoolID is empty, creating a new agent pool"}
ERROR Reconcile Agent Pool {"agentpool": {"name":"k8s-agent-pool","namespace":"tfe"}, "msg": "failed to create a new agent pool", "error": "unauthorized"}
ERROR Agent Pool Controller {"agentpool": {"name":"k8s-agent-pool","namespace":"tfe"}, "msg": "reconcile agent pool", "error": "unauthorized"}
The "unauthorized" error suggests a permissions issue. Verify that the token or credentials used by the operator have the correct permissions to create agent pools.
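On a busy operator, failures can be buried among INFO lines. The following is a minimal sketch that keeps only the lines logged at ERROR level. The function name operator_errors is hypothetical. It assumes the level appears as the standalone word ERROR, as in the output above.

```shell
# Hypothetical helper: read operator logs on stdin and print only the lines
# that contain the standalone word ERROR. Prints nothing, and still returns
# success, when there are no such lines.
operator_errors() {
  grep -w 'ERROR' || true
}

# Usage (requires access to the cluster; "manager" is the container named in
# the "Defaulted container" message, <operator-pod> is a placeholder):
#   kubectl -n tfe logs <operator-pod> -c manager --tail 200 | operator_errors
```

The match is case-sensitive, so an INFO line that happens to carry an "error" field is not reported.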
If the CRD is processed successfully, the logs should indicate that the reconciliation was successful.
$ kubectl -n tfe logs -f --tail 6 tfc-operator-hcp-terraform-operator-dfb5d7c64-kq78n
## Defaulted container "manager" out of: manager, kube-rbac-proxy
INFO Reconcile Agent Tokens {"agentpool": {"name":"k8s-agent-pool","namespace":"tfe"}, "msg": "successfully reconcilied agent tokens"}
INFO Reconcile Agent Deployment {"agentpool": {"name":"k8s-agent-pool","namespace":"tfe"}, "msg": "new reconciliation event"}
INFO Reconcile Agent Deployment {"agentpool": {"name":"k8s-agent-pool","namespace":"tfe"}, "mgs": "performing Deployment update"}
INFO Reconcile Agent Deployment {"agentpool": {"name":"k8s-agent-pool","namespace":"tfe"}, "msg": "successfully reconcilied agent deployment"}
INFO Reconcile Agent Autoscaling {"agentpool": {"name":"k8s-agent-pool","namespace":"tfe"}, "msg": "successfully reconcilied agent autoscaling"}
INFO Agent Pool Controller {"agentpool": {"name":"k8s-agent-pool","namespace":"tfe"}, "msg": "successfully reconcilied agent pool"}
Step 4: Compare behavior with direct API calls
To further isolate an identified problem, manually test resource creation directly against the API endpoint. This helps determine whether the issue is with the operator or with the permissions and values being sent.
For example, attempt to create an agent pool using the same token.
$ curl -s \
    --header "Authorization: Bearer $TFE_TOKEN" \
    --header "Content-Type: application/vnd.api+json" \
    --request POST \
    --data @payload.json \
    https://example.terraform.com/api/v2/organizations/$ORG/agent-pools
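The command above reads its request body from payload.json, which this article does not show. The following is a sketch of how such a file might be generated, assuming a JSON:API body with the type agent-pools and a single name attribute. Confirm the exact attributes against the Agent Pools API documentation for your Terraform Enterprise version. The function name agent_pool_payload is hypothetical.

```shell
# Hypothetical helper: print a minimal JSON:API request body for creating an
# agent pool with the given name.
# Assumption: only the "name" attribute is needed for this test.
agent_pool_payload() {
  cat <<EOF
{
  "data": {
    "type": "agent-pools",
    "attributes": {
      "name": "$1"
    }
  }
}
EOF
}

# Usage:
#   agent_pool_payload k8s-agent-pool > payload.json
```

Using the same pool name as the custom resource keeps the manual test as close as possible to the request the operator attempts.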
If the API call returns a 4xx status code, the problem lies with the token's permissions or with the payload, not with the operator. If the API call succeeds, the problem is likely specific to how the operator processes and forwards the request.
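Because curl -s prints only the response body, the status code is easy to miss. The following is a rough sketch that maps a status code to the conclusions described above. The function name interpret_status is hypothetical, and the wording of its messages is illustrative.

```shell
# Hypothetical helper: map an HTTP status code to the conclusion it supports.
interpret_status() {
  case "$1" in
    2??) echo "API call succeeded: investigate the operator" ;;
    4??) echo "API call rejected: check token permissions and payload" ;;
    *)   echo "unexpected status $1: check the endpoint and instance health" ;;
  esac
}

# Usage (same token, payload, and URL as the curl example; -o /dev/null
# discards the body and -w prints only the status code):
#   code=$(curl -s -o /dev/null -w '%{http_code}' \
#     --header "Authorization: Bearer $TFE_TOKEN" \
#     --header "Content-Type: application/vnd.api+json" \
#     --request POST --data @payload.json \
#     "https://example.terraform.com/api/v2/organizations/$ORG/agent-pools")
#   interpret_status "$code"
```

Note that a successful call creates a real agent pool, so remove it afterward if it was only a test.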
Additional Information
- For more details, refer to the HCP Terraform Operator for Kubernetes API reference.