Best Practices: Nomad Server & Client Host Reboot
Introduction
Scenarios
Regular Nomad host maintenance requires the host operating system to reboot.
Nomad host outage requires the hosts to be started back up.
Recommendation
Important
Reboot the Nomad server cluster hosts first. Once all servers are back and healthy, move on to the Nomad client nodes.
Staggering the server reboots is recommended. It is best to use the Nomad Upgrade Process as a template.
Take a Nomad snapshot before any activity that may cause an outage.
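As a minimal sketch of the snapshot step, the helper below wraps nomad operator snapshot save. The function name take_snapshot, the target directory, and the timestamped file name are illustrative assumptions, not part of the original procedure.

```shell
# Save a point-in-time snapshot of Nomad server state before maintenance.
# The directory argument and file-name pattern are illustrative assumptions.
take_snapshot() {
  local dir="${1:-.}"
  local file="${dir}/nomad-$(date +%Y%m%d-%H%M%S).snap"
  # Send the CLI's own messages to stderr so stdout carries only the path.
  nomad operator snapshot save "$file" >&2 || return 1
  printf '%s\n' "$file"
}
```

For example, take_snapshot /var/backups would print the path of the snapshot file it wrote, assuming that directory exists and is writable.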
OS Reboot on Nomad Server Cluster Hosts
Reboot one Nomad server host in the cluster at a time.
Stop Nomad gracefully (sudo systemctl stop nomad).
If the server was the leader, wait for a follower to gain leadership.
Verify the cluster health (nomad server members).
Reboot the server host.
Start Nomad (sudo systemctl start nomad).
Verify the cluster health.
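The "wait for a follower to gain leadership" and "verify the cluster health" steps above could be scripted roughly as follows. This is a sketch under assumptions: the function names are hypothetical, and the parsing assumes the default column layout of nomad server members, where the fourth column is Status and the fifth is Leader.

```shell
# Succeed if `nomad server members` lists an alive server marked as leader.
# Assumes the default column layout: Name Address Port Status Leader ...
cluster_has_leader() {
  nomad server members |
    awk 'NR > 1 && $4 == "alive" && $5 == "true" { found = 1 } END { exit !found }'
}

# Poll until a leader is present, or give up after a number of attempts.
# Arguments: attempts (default 30), delay in seconds between attempts (default 5).
wait_for_leader() {
  local attempts="${1:-30}" delay="${2:-5}" i=0
  while [ "$i" -lt "$attempts" ]; do
    if cluster_has_leader; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

Running wait_for_leader after stopping Nomad on the old leader, and again after starting Nomad on the rebooted host, gives a simple gate before moving to the next server. It only confirms that a leader exists; reviewing the full nomad server members output is still advisable.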
OS Reboot on Nomad Client Node Hosts
You can reboot one client node at a time or several at once. Follow the steps for the approach you choose.
Before rebooting a Nomad client node, drain it so that all of its allocations are migrated to other nodes, which avoids disrupting Nomad job tasks.
Nomad job parameters have default values, including those of the migrate stanza, which controls how allocations are moved off a draining node:
max_parallel (int: 1)
health_check (string: "checks")
min_healthy_time (string: "10s")
healthy_deadline (string: "5m")
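To make the defaults above concrete, the sketch below prints a migrate block with each default written out explicitly. The function name print_migrate_defaults is hypothetical; the block itself would be placed at the job or group level of a job specification, and the values can be adjusted there if the defaults do not suit a workload.

```shell
# Print a migrate block with the default values written out explicitly.
# The quoted heredoc keeps the HCL fragment exactly as typed.
print_migrate_defaults() {
  cat <<'EOF'
migrate {
  max_parallel     = 1
  health_check     = "checks"
  min_healthy_time = "10s"
  healthy_deadline = "5m"
}
EOF
}
```

With these values, a drain migrates one allocation of a group at a time and waits for each replacement to pass its health checks for at least 10 seconds, within a 5 minute deadline.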
One Client Node Host at a Time
Drain the client node (nomad node drain -enable -yes <Node ID>).
Check the client node status (nomad node status <Node ID>).
Stop Nomad gracefully (sudo systemctl stop nomad).
Reboot the client node's host OS. If Nomad is not enabled to start at boot, start it (sudo systemctl start nomad).
Check the client node status.
Restore client node eligibility (nomad node drain -disable -yes <Node ID>).
If you get "Error toggling drain mode: Unexpected response code: 500 (no servers)", run the command from a Nomad Server host.
Check the client node status.
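The drain and restore steps above can be sketched as two helpers run from a machine that can reach the Nomad servers. The function names are hypothetical, and node_state assumes the "Status = <value>" line of the default nomad node status output. Stopping Nomad and rebooting happen on the client host itself and are deliberately left out.

```shell
# Print the Status field (for example "ready" or "down") of one node.
# Assumes the "Status = <value>" line of the default `nomad node status` output.
node_state() {
  nomad node status "$1" | awk '$1 == "Status" && $2 == "=" { print $3 }'
}

# Before the reboot: drain the node, then show its status.
# Without -detach, the drain command monitors until the drain completes.
drain_for_reboot() {
  nomad node drain -enable -yes "$1" || return 1
  nomad node status "$1"
}

# After the reboot: wait until the node reports ready, then end the drain.
# Arguments: node ID, attempts (default 30), delay in seconds (default 10).
wait_and_restore() {
  local node="$1" attempts="${2:-30}" delay="${3:-10}" i=0
  while [ "$i" -lt "$attempts" ]; do
    if [ "$(node_state "$node")" = "ready" ]; then
      nomad node drain -disable -yes "$node"
      return
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "node $node did not become ready" >&2
  return 1
}
```

A typical sequence would be drain_for_reboot <Node ID>, then the stop and reboot on the host, then wait_and_restore <Node ID>.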
Multiple Client Node Hosts at a Time
See Workload Migration, Drain multiple nodes for details. It includes shell scripting that can help automate manipulating multiple nodes at once.
Check the node status (nomad node status -allocs).
Set client nodes ineligible (nomad node eligibility -disable <Node ID>).
Check the node status.
Drain client nodes (nomad node drain -enable -yes <Node ID>).
Check the node status.
Stop Nomad gracefully (sudo systemctl stop nomad).
Reboot the drained client nodes' host OS. If Nomad is not enabled to start at boot, start it on each host (sudo systemctl start nomad).
Check the node status.
Restore the nodes' eligibility (nomad node drain -disable -yes <Node ID>).
If you get "Error toggling drain mode: Unexpected response code: 500 (no servers)", run the command from a Nomad Server host.
Check the node status.
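For several nodes, the ordering of the steps above matters: every node is marked ineligible before any drain starts, so drained allocations are not placed on a node that is about to be rebooted. The sketch below shows one way to express that. The function names are hypothetical, and the node IDs are passed as arguments.

```shell
# Mark every listed node ineligible first, then drain them one by one.
# Without -detach, each drain command waits for that node's drain to complete.
disable_then_drain() {
  local node
  for node in "$@"; do
    nomad node eligibility -disable "$node" || return 1
  done
  for node in "$@"; do
    nomad node drain -enable -yes "$node" || return 1
  done
}

# After the hosts are back up, end the drain on each node.
# Ending the drain also restores scheduling eligibility.
restore_nodes() {
  local node
  for node in "$@"; do
    nomad node drain -disable -yes "$node" || return 1
  done
}
```

For example, disable_then_drain <Node ID 1> <Node ID 2> before the reboots and restore_nodes <Node ID 1> <Node ID 2> afterwards. The drains run sequentially here for simplicity; the remaining eligible nodes still need enough capacity to take the migrated allocations.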