Overview
This article addresses common issues that users may encounter when attempting to deregister services in HashiCorp Cloud Platform (HCP) Consul. Deregistering services is essential when instances or services are decommissioned, but errors like "404 Not Found" or permission-related issues can arise. This guide will help troubleshoot these common issues and provide steps for proper service deregistration.
1. Error: Unexpected Response Code: 404
Issue: The error Unexpected response code: 404 occurs when Consul attempts to deregister a service that it cannot find. This is usually because the service ID in the request does not match any service registered on the agent, or because the service name was passed instead of the service ID.
Sample Error:
Error deregistering service "": Unexpected response code: 404 (Unknown service ID "xxxxx-xxxx-xxx-xxx". Ensure that the service ID is passed, not the service name.)
Possible causes of HTTP 404 errors:
- The service being deregistered does not exist on the specified EC2 instance.
- The wrong ServiceID or Node is being used.
- The service has already been deregistered or removed.
Solution:
- Verify the service is still registered on the agent using:
curl -X GET "http://localhost:8500/v1/agent/services"
This will list all services registered to the agent. Ensure the ServiceID exists.
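- If the service is no longer available, manually remove it from the Consul catalog using the following API call:
curl -X PUT -H "X-Consul-Token: <consulToken>" -H "Content-Type: application/json" -d '{"ServiceID": "<ServiceID>", "Datacenter": "<Datacenter>", "Node": "<Node>"}' "http://<consul-address>:8500/v1/catalog/deregister?tag=namespace:xx"
Replace <ServiceID> with the ID of the service you wish to deregister, <Datacenter> and <Node> with the datacenter and node it is registered on, and <consul-address> with the address of the Consul endpoint you are calling.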
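The verification step above can be scripted so that a service name passed by mistake is caught before the deregistration call is made. The following is a minimal sketch, not a definitive implementation. It assumes the agent listens on localhost:8500 and that the token is held in the consulToken variable, as in the other commands in this article. The names CONSUL_ADDR, service_status, and safe_deregister are hypothetical and exist only for this illustration. It reads the service by ID from the local agent (GET /v1/agent/service/<service_id>) and only calls the agent deregister endpoint when the agent answers 200.

```shell
# Assumed default address; override CONSUL_ADDR if the agent listens elsewhere.
CONSUL_ADDR="${CONSUL_ADDR:-http://localhost:8500}"

# Print the HTTP status the local agent returns when asked for a service ID.
service_status() {
  curl -s -o /dev/null -w '%{http_code}' \
    -H "X-Consul-Token: ${consulToken}" \
    "${CONSUL_ADDR}/v1/agent/service/$1"
}

# Deregister only if the local agent knows the ID; otherwise report and stop.
safe_deregister() {
  http_code="$(service_status "$1")"
  if [ "$http_code" = "200" ]; then
    curl -sf -X PUT -H "X-Consul-Token: ${consulToken}" \
      "${CONSUL_ADDR}/v1/agent/service/deregister/$1" &&
      echo "deregistered $1"
  else
    echo "skipped $1: agent returned HTTP ${http_code} (is this a service ID, not a name?)" >&2
    return 1
  fi
}
```

Calling safe_deregister with a value that the agent does not recognise prints the status code instead of failing with the 404 error shown above.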
2. ACL not found
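Issue: The error ACL not found usually occurs when the wrong ACL token is used, meaning Consul cannot resolve the token that was supplied. A token that resolves but does not have sufficient permissions to deregister the service will also cause the deregistration to fail, typically with a Permission denied error.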
Sample Error:
root@server:/home/ubuntu# curl -H "X-Consul-Token: ${consulToken}" http://127.0.0.1:8500/v1/agent/members
ACL not found
Possible Causes:
- The ACL token used does not have the correct permissions to deregister services.
- The token may have expired or been incorrectly entered.
Solution:
- Check the ACL token and ensure it has the correct permissions (a policy that grants write access) to deregister services. You can verify token permissions with:
curl -X GET -H "X-Consul-Token: <consulToken>" "http://localhost:8500/v1/acl/token/self"
This command will return the details of the token, including its permissions.
- If the token has expired, regenerate it or request a new token with appropriate privileges from the Consul UI or via the ACL system.
Note: See Update a Token for details on how to update a token.
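The token check above can be wrapped in a small helper that tells the two failure modes apart. This is a sketch under stated assumptions rather than an official procedure. The string "ACL not found" comes from the sample error in this section. The string "Permission denied" and the "AccessorID" field are what Consul is expected to return for an under-privileged token and for a successful token read respectively, so verify them against your own cluster. CONSUL_ADDR, classify_token_response, and check_token are hypothetical names.

```shell
# Assumed default address; override CONSUL_ADDR if the agent listens elsewhere.
CONSUL_ADDR="${CONSUL_ADDR:-http://localhost:8500}"

# Classify a response body returned by a Consul API call.
classify_token_response() {
  case "$1" in
    *"ACL not found"*)     echo "unknown-token" ;;
    *"Permission denied"*) echo "insufficient-permissions" ;;
    *'"AccessorID"'*)      echo "valid-token" ;;
    *)                     echo "unrecognized-response" ;;
  esac
}

# Read the current token's own record and classify the answer.
check_token() {
  body="$(curl -s -H "X-Consul-Token: ${consulToken}" \
    "${CONSUL_ADDR}/v1/acl/token/self")"
  classify_token_response "$body"
}
```

Running check_token before a bulk deregistration makes it clear whether the token itself is the problem or whether its policies need to be extended.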
3. Service Reappearing After Deregistration
Issue: A service that does not seem to stay deregistered is most likely being re-registered by a client agent. When a service is registered with a client agent using the v1/agent/service/register endpoint, the client agent takes responsibility for keeping that registration in sync with the cluster. If the service is then deregistered directly from the central catalog using the v1/catalog/deregister endpoint, the client agent will automatically re-register it the next time it syncs with the catalog.
Possible Causes:
- The Consul agent’s service registration is still active.
- There is a race condition in which multiple instances are trying to deregister services at the same time.
Solution:
- If the service was registered through an agent, deregister it at the agent level with the v1/agent/service/deregister/<service_id> endpoint. Because the agent itself drops the registration, the service is not re-registered automatically. To identify which services are registered on a specific node, access the node and run the following command:
curl localhost:8500/v1/agent/services
This will display all services registered with the client agent, allowing the user to decide which ones should be deleted.
- Ensure that the service is correctly deregistered from the agent and not re-registered by a running process. Use:
curl -X GET "http://localhost:8500/v1/agent/services"
This will display the current state of service registrations on the agent.
- To avoid race conditions, ensure that each instance only deregisters its own services. You can filter services based on the instance ID using a command similar to:
curl -s -H "X-Consul-Token: ${consulToken}" "localhost:8500/v1/catalog/service/<servicename>" |
  jq -r -c '.[] | { Node: .Node, ServiceID: .ServiceID } | @json' |
  grep "$(ec2-metadata -i | awk '{print $2}')" |
  jq -r -c '.ServiceID' |
  while read i; do
    curl -s -X PUT -H "X-Consul-Token: ${consulToken}" -H "Content-Type: application/json" "http://localhost:8500/v1/agent/service/deregister/$i"
  done
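To confirm that a service stays deregistered and is not being put back by a running process, the agent can be polled for a short period after the deregistration call. The sketch below is illustrative only. It assumes the agent listens on localhost:8500 and the token is in consulToken. The names CONSUL_ADDR, agent_has_service, and deregister_and_watch, as well as the default of three checks five seconds apart, are hypothetical choices for this example.

```shell
# Assumed default address; override CONSUL_ADDR if the agent listens elsewhere.
CONSUL_ADDR="${CONSUL_ADDR:-http://localhost:8500}"

# Succeed (exit 0) if the local agent still knows the service ID.
agent_has_service() {
  code="$(curl -s -o /dev/null -w '%{http_code}' \
    -H "X-Consul-Token: ${consulToken}" \
    "${CONSUL_ADDR}/v1/agent/service/$1")"
  [ "$code" = "200" ]
}

# Deregister at the agent, then re-check to catch automatic re-registration.
# Usage: deregister_and_watch <service_id> [checks] [delay_seconds]
deregister_and_watch() {
  svc_id="$1"; checks="${2:-3}"; delay="${3:-5}"
  curl -sf -X PUT -H "X-Consul-Token: ${consulToken}" \
    "${CONSUL_ADDR}/v1/agent/service/deregister/${svc_id}" || return 1
  n=0
  while [ "$n" -lt "$checks" ]; do
    sleep "$delay"
    if agent_has_service "$svc_id"; then
      echo "${svc_id} reappeared: another process is re-registering it" >&2
      return 2
    fi
    n=$((n + 1))
  done
  echo "${svc_id} stayed deregistered"
}
```

An exit status of 2 points at a process on the node (for example a service definition file or a sidecar) that registers the service again, which has to be stopped before the deregistration can stick.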
4. Deregistering Services Across Multiple EC2 Instances Using SSM
To deregister services across an entire EC2 fleet, AWS Systems Manager (SSM) can be used to execute the deregistration command on all instances simultaneously.
Steps:
- Log into your AWS Console and navigate to Systems Manager.
- Use the Run Command feature to execute the deregistration command across your EC2 instances.
- Ensure that each instance only attempts to deregister its own services by using the instance-id filter:
curl -s -H "X-Consul-Token: ${consulToken}" "localhost:8500/v1/catalog/service/<servicename>" |
  jq -r -c '.[] | { Node: .Node, ServiceID: .ServiceID } | @json' |
  grep "$(ec2-metadata -i | awk '{print $2}')" |
  jq -r -c '.ServiceID' |
  while read i; do
    curl -s -X PUT -H "X-Consul-Token: ${consulToken}" -H "Content-Type: application/json" "http://localhost:8500/v1/agent/service/deregister/$i"
  done
- curl -s -H "X-Consul-Token: ${consulToken}" "localhost:8500/v1/catalog/service/servicename": Fetches every registered instance of the named service from the Consul catalog. The response covers all nodes, so the following steps filter it using the instance ID from EC2 metadata.
- jq -r -c '.[] | { Node: .Node, ServiceID: .ServiceID } | @json': The jq tool extracts the node and service ID of each entry.
- grep "$(ec2-metadata -i | awk '{print $2}')": Filters the entries by the current EC2 instance’s ID.
- jq -r -c '.ServiceID': Extracts the service IDs from the filtered results.
- while read i; do ...; done: Loops over the service IDs and sends a deregistration request for each one to the Consul API on the local agent.
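The Run Command step can also be started from the AWS CLI instead of the console. The following is a rough sketch, assuming the deregistration command has been saved as a script on each instance and that the fleet can be selected by an EC2 tag. The tag key and value (Role, consul-client), the variables TARGET_TAG_KEY and TARGET_TAG_VALUE, and the function name run_on_fleet are hypothetical. AWS-RunShellScript is the standard SSM document for running shell commands on Linux instances.

```shell
# Hypothetical tag used to select the fleet; adjust to your own tagging scheme.
TARGET_TAG_KEY="${TARGET_TAG_KEY:-Role}"
TARGET_TAG_VALUE="${TARGET_TAG_VALUE:-consul-client}"

# Run a script on every matching instance through SSM Run Command.
# $1 is the path of a script that already exists on the instances.
# Prints the command ID, which can be used to look up the results.
run_on_fleet() {
  aws ssm send-command \
    --document-name "AWS-RunShellScript" \
    --targets "Key=tag:${TARGET_TAG_KEY},Values=${TARGET_TAG_VALUE}" \
    --parameters "commands=[\"bash $1\"]" \
    --comment "Deregister Consul services owned by each instance" \
    --query "Command.CommandId" --output text
}
```

The returned command ID can be passed to aws ssm list-command-invocations --command-id <id> --details to review the output from each instance.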
Key Challenges:
- Race Condition in Deregistration: If the EC2 instances query and deregister all service IDs indiscriminately, different instances can interfere with each other’s services. For example, if instance A deregisters a service belonging to instance B, instance B finds that the service has already been removed when it attempts to deregister it.
- Deregistering the Wrong Service: Without the instance ID filter, an instance can deregister services that belong to other instances rather than its own, causing further complications in the cluster.
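One way to make the ownership filter stricter is to compare the Node field exactly instead of matching the instance ID anywhere in the line, as grep does. The sketch below is a possible variation, not the method used in the commands above. It assumes, as those commands do, that the Consul node name is the EC2 instance ID. The function name service_ids_for_node is hypothetical, and jq must be installed.

```shell
# Read a /v1/catalog/service/<servicename> response on stdin and print the
# ServiceID of every entry whose Node field equals $1 exactly.
service_ids_for_node() {
  jq -r --arg node "$1" '.[] | select(.Node == $node) | .ServiceID'
}

# Example wiring (not executed here), using the same commands as this article:
#   node="$(ec2-metadata -i | awk '{print $2}')"
#   curl -s -H "X-Consul-Token: ${consulToken}" \
#     "localhost:8500/v1/catalog/service/<servicename>" |
#     service_ids_for_node "$node"
```

With an exact comparison, a service whose ID or metadata merely contains another instance's ID is not selected by accident.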
Best Practices for Service Deregistration in Consul
- Always verify the service list on the agent using v1/agent/services before attempting to deregister.
- Filter services by instance ID to avoid race conditions in multi-instance environments.
- Use the correct ACL tokens with proper permissions for service deregistration.
- Automate the process using AWS SSM or other configuration management tools for large-scale deregistration across multiple instances.
Conclusion
Service deregistration across multiple EC2 instances in HCP Consul can sometimes lead to errors due to misconfigurations or permission issues. This guide offers solutions for the most common problems.
Furthermore, the SSM approach with the instance ID filter ensures that each EC2 instance only deregisters the services it owns, preventing race conditions and ensuring proper cleanup across the cluster. Because the services are filtered by the instance’s own ID, the command avoids accidentally deregistering services that belong to other nodes, which maintains the integrity of the service catalog.
References
1. https://developer.hashicorp.com/consul/commands/services/deregister
2. https://developer.hashicorp.com/consul/docs/v1.15.x/security/acl/acl-rules#admin-partition-rules
3. https://developer.hashicorp.com/consul/docs/k8s/helm#synccatalog
4. https://developer.hashicorp.com/consul/docs/enterprise/namespaces
6. https://developer.hashicorp.com/consul/docs/architecture/anti-entropy#agent
7. https://developer.hashicorp.com/consul/api-docs/acl/tokens#read-self-token