Introduction
Nomad provides the disconnect block in job specifications to handle scenarios where client nodes become unavailable. When an allocation enters an UNKNOWN state due to a node or agent failure, Nomad can automatically create replacement allocations on healthy nodes. Once the original client recovers, Nomad must decide whether to keep the original or replacement allocation.
This behaviour is controlled by the reconcile parameter, which can be set to keep_original, keep_replacement, or longest_running. Understanding these options, and tuning the related lost_after parameter, is critical to minimising downtime and preventing duplicate allocations.
Scenario
When a Nomad client node fails or gets disconnected from the cluster, all allocations running on it transition to the UNKNOWN state. If the disconnect block is configured with replace = true, Nomad schedules replacements on healthy client nodes. The challenge arises when the failed client comes back online:
Should the original allocations be recovered?
Should the replacement allocations be preserved?
What happens if one or both are unhealthy or have failed?
Key disconnect configuration fields:
disconnect {
  lost_after = "1h"
  replace    = true
  reconcile  = "keep_original" # or "keep_replacement" or "longest_running"
}

Nomad's decision depends on the reconcile setting:
keep_original → prioritise original allocations.
keep_replacement → prioritise replacement allocations.
longest_running → preserve whichever allocation has run the longest.
The lost_after value controls how long Nomad waits before marking an unreachable allocation as LOST, automatically transitioning its Desired state to STOP.
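For orientation, the disconnect block is configured at the task group level of a job specification. A minimal sketch of where it fits, assuming a hypothetical job named "web" with a Docker task (the job, group, task, and image names are illustrative):

```hcl
job "web" {
  datacenters = ["dc1"]

  group "app" {
    count = 2

    # Handle client disconnects: replace unreachable allocations,
    # then reconcile in favor of the replacements on reconnect.
    disconnect {
      lost_after = "10m"
      replace    = true
      reconcile  = "keep_replacement"
    }

    task "server" {
      driver = "docker"

      config {
        image = "nginx:1.25"
      }
    }
  }
}
```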
Recommendation
1. Use keep_replacement to Minimise Downtime
Best for high-availability environments where preserving the healthy replacement allocation is the priority.
Once the original client recovers, replacement allocations remain running, and original allocations are stopped.
Limitation: if replacements fail due to environment-specific safeguards (e.g., custom anti-duplicate logic, or Nomad's constraint block configuration), Nomad will continuously attempt replacements until conditions are met.
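To illustrate the limitation above: a constraint block can make replacement placement impossible while eligible nodes are unavailable, in which case Nomad keeps retrying until the constraint can be satisfied. A hedged sketch (the node attribute and value are illustrative):

```hcl
group "app" {
  # Replacements may only be placed on nodes matching this constraint.
  # If no eligible node exists while the original client is down,
  # Nomad keeps retrying placement until one becomes available.
  constraint {
    attribute = "${node.class}"
    value     = "gpu"
  }

  disconnect {
    replace   = true
    reconcile = "keep_replacement"
  }
}
```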
2. Use longest_running with a Tuned lost_after
Suitable for environments that need to protect against premature termination of replacements.
If the original client does not recover before lost_after, the original allocations are marked LOST and will no longer interfere with replacements.
Setting lost_after to a small value (e.g., 1–2 minutes) ensures that recovered clients don't override healthy replacements after transient outages.
Tradeoff: short lost_after values may mark temporarily partitioned nodes as lost too aggressively.
3. When to Use keep_original
Prioritises recovery of original allocations, regardless of runtime duration.
May be appropriate if workloads are tightly coupled to specific nodes.
Caveat: testing has shown inconsistent behavior in some edge cases (e.g., allocations in UNKNOWN sometimes recover, sometimes fail). Use with caution.
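For completeness, a sketch of a keep_original configuration (the lost_after value is illustrative and should match your tolerance for node outages):

```hcl
disconnect {
  lost_after = "1h"            # give the node ample time to return
  replace    = true            # still run replacements while disconnected
  reconcile  = "keep_original" # on reconnect, stop replacements, keep originals
}
```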
4. Operational Best Practices
Automate Desired-state changes: if you want to preserve replacements, explicitly stop the originals with nomad alloc stop <alloc-id> before the client recovers. This updates the original allocation's Desired state to STOP.
Prefer Nomad commands over container runtime stops: nomad alloc stop cleanly transitions the allocation to COMPLETE, whereas podman stop or docker stop results in FAILED, because Nomad treats the driver's container as having been manually killed outside its control.
Monitor Desired vs Status: reconciliation is driven by both the Desired state (RUN, STOP) and the Status (RUNNING, UNKNOWN, FAILED). Observing both is critical for troubleshooting.
Additional Information
Example Configurations
- Preserve replacements (common HA scenario):

  disconnect {
    lost_after = "10m"
    replace    = true
    reconcile  = "keep_replacement"
  }
- Preserve longest-running allocations, but ensure originals don't override after long outages:

  disconnect {
    lost_after = "2m"
    replace    = true
    reconcile  = "longest_running"
  }
Logs and Transitions
- When Nomad attempts to recover a missing container, the allocation events show transitions such as:

  Killing      Sent interrupt. Waiting 5s before force killing
  Terminated   Exit Code: 0, Exit Message: "Driver was unable to get the exit code. No such Container"
  Reconnected  Client reconnected

  FAILED or COMPLETE allocations are not restarted; Nomad creates new ones instead.
Key Takeaways
keep_replacement: safest for minimising downtime in most cases.
longest_running + tuned lost_after: useful when avoiding duplicates but requires careful tuning.
keep_original: rarely recommended due to inconsistencies.
Automating state changes (nomad alloc stop) and carefully selecting lost_after are critical to avoiding unwanted downtime or duplication.