Problem
A new Nomad client node is able to join the Nomad cluster, but new allocations cannot be placed on it, and errors like those shown below appear in the Nomad log.
Prerequisites (if applicable)
The Nomad client log contains ACL errors such as the following:
{"@level":"error","@message":"error performing RPC to server","@module":"client.rpc","@timestamp":"2022-02-16T22:19:14.309069Z","error":"rpc error: ACL token not found","rpc":"Alloc.GetAlloc","server":{"IP":"10.30.4.45","Port":4647,"Zone":""}}
{"@level":"error","@message":"error performing RPC to server which is not safe to automatically retry","@module":"client.rpc","@timestamp":"2022-02-16T22:19:14.309139Z","error":"rpc error: ACL token not found","rpc":"Alloc.GetAlloc","server":{"IP":"10.30.4.45","Port":4647,"Zone":""}}
{"@level":"error","@message":"error querying previous alloc","@module":"client.alloc_migrator","@timestamp":"2022-02-16T22:19:14.309160Z","alloc_id":"9a9ccbb4-dff0-5fe2-5274-4ad85485e38d","error":"rpc error: ACL token not found","previous_alloc":"0be6888a-fcab-34df-bafa-96a9d0412258"}
Cause
If this happens on new Nomad client nodes, the most likely cause is a duplicate client-id. The AMI used to bootstrap the client node may contain the client-id of a previously existing Nomad client.
Solutions:
- Confirm the location of data_dir in the client configuration file.
- Check whether the new nodes share the same node ID by comparing the content of the Nomad client-id file located under [data_dir/client/client-id] on each node after bootstrap.
- If they do, remove the Nomad [data_dir] on the affected node and restart the Nomad agent so that a new client-id is created.
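The comparison in the second step can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes the client-id values have already been collected from each node by whatever means is available, and the function names, host names, and ID values are hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def read_client_id(data_dir):
    """Read the node ID stored at <data_dir>/client/client-id on the local node."""
    return (Path(data_dir) / "client" / "client-id").read_text().strip()


def find_duplicate_ids(ids_by_host):
    """Map each client-id seen on more than one host to the sorted list of those hosts."""
    hosts_by_id = defaultdict(list)
    for host, client_id in ids_by_host.items():
        hosts_by_id[client_id].append(host)
    return {
        client_id: sorted(hosts)
        for client_id, hosts in hosts_by_id.items()
        if len(hosts) > 1
    }


# Hypothetical values gathered from three nodes (placeholders, not real IDs).
collected = {
    "client-a": "id-1111",
    "client-b": "id-1111",  # same ID as client-a, e.g. baked into the image
    "client-c": "id-2222",
}

for client_id, hosts in find_duplicate_ids(collected).items():
    print(f"client-id {client_id} is shared by: {', '.join(hosts)}")
```

An empty result from find_duplicate_ids means every node reported a distinct client-id, in which case the ACL errors probably have a different cause.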
Outcome
The content of [data_dir/client/client-id] should now be different on each node. If using AWS Auto Scaling groups, for example, the AMI used in the launch configuration should be updated so that nothing from data_dir is carried over and reused.