Introduction
Nomad provides the disconnect block in job specifications to handle scenarios where client nodes become unavailable. When an allocation enters an UNKNOWN state due to a node or agent failure, Nomad can automatically create replacement allocations on healthy nodes. Once the original client recovers, Nomad must decide whether to keep the original or replacement allocation.
This behaviour is controlled by the reconcile parameter, which can be set to keep_original, keep_replacement, or longest_running. Understanding these options, and tuning the related lost_after parameter, is critical to minimising downtime and preventing duplicate allocations.
Scenario
When a Nomad client node fails or gets disconnected from the cluster, all allocations running on it transition to the UNKNOWN state. If the disconnect block is configured with replace = true, Nomad schedules replacements on healthy client nodes. The challenge arises when the failed client comes back online:
Should the original allocations be recovered?
Should the replacement allocations be preserved?
What happens if one or both are unhealthy or have failed?
Key disconnect configuration fields:
disconnect {
  lost_after = "1h"
  replace    = true
  reconcile  = "keep_original" # or "keep_replacement" or "longest_running"
}

Nomad's decision depends on the reconcile setting:
keep_original → prioritise original allocations.
keep_replacement → prioritise replacement allocations.
longest_running → preserve whichever allocation has run the longest.
The lost_after value controls how long Nomad waits before marking an unreachable allocation as LOST, automatically transitioning its Desired state to STOP.
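For orientation, the disconnect block is configured at the task group level of a job specification. A minimal sketch of where it fits, assuming a hypothetical job named "web" with a Docker task (the job, group, task, and image names are illustrative):

```hcl
job "web" {
  datacenters = ["dc1"]

  group "app" {
    count = 2

    # Handle client disconnects: replace unreachable allocations,
    # then reconcile in favor of the replacements on reconnect.
    disconnect {
      lost_after = "10m"
      replace    = true
      reconcile  = "keep_replacement"
    }

    task "server" {
      driver = "docker"

      config {
        image = "nginx:1.25"
      }
    }
  }
}
```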
Recommendation
1. Use keep_replacement to Minimise Downtime
Best for high-availability environments where preserving the healthy replacement allocation is the priority.
Once the original client recovers, replacement allocations remain running, and original allocations are stopped.
Limitation: if replacements fail due to environment-specific safeguards (e.g., custom anti-duplicate logic, or Nomad's constraint block configuration), Nomad will continuously attempt replacements until conditions are met.
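To illustrate the limitation above: a constraint block can make replacement placement impossible while eligible nodes are unavailable, in which case Nomad keeps retrying until the constraint can be satisfied. A hedged sketch (the node attribute and value are illustrative):

```hcl
group "app" {
  # Replacements may only be placed on nodes matching this constraint.
  # If no eligible node exists while the original client is down,
  # Nomad keeps retrying placement until one becomes available.
  constraint {
    attribute = "${node.class}"
    value     = "gpu"
  }

  disconnect {
    replace   = true
    reconcile = "keep_replacement"
  }
}
```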
2. Use longest_running with a Tuned lost_after
Suitable for environments that need to protect against premature termination of replacements.
If the original client does not recover before lost_after, the original allocations are marked LOST and will no longer interfere with replacements.
Setting lost_after to a small value (e.g., 1–2 minutes) ensures that recovered clients don't override healthy replacements after transient outages.
Tradeoff: short lost_after values may mark temporarily partitioned nodes as lost too aggressively.
3. When to Use keep_original
Prioritises recovery of original allocations, regardless of runtime duration.
May be appropriate if workloads are tightly coupled to specific nodes.
Caveat: testing has shown inconsistent behavior in some edge cases (e.g., allocations in UNKNOWN sometimes recover, sometimes fail). Use with caution.
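For completeness, a sketch of a keep_original configuration (the lost_after value is illustrative and should match your tolerance for node outages):

```hcl
disconnect {
  lost_after = "1h"            # give the node ample time to return
  replace    = true            # still run replacements while disconnected
  reconcile  = "keep_original" # on reconnect, stop replacements, keep originals
}
```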
4. Operational Best Practices
Automate Desired-state changes: if you want to preserve replacements, explicitly stop the originals with nomad alloc stop <alloc-id> before the client recovers. This updates the original allocation's Desired state to STOP.
Prefer Nomad commands over container runtime stops: nomad alloc stop cleanly transitions the allocation to COMPLETE, whereas podman stop or docker stop results in FAILED, because Nomad treats the driver's container as having been manually killed outside its control.
Monitor Desired vs Status: reconciliation is driven by both the Desired state (RUN, STOP) and the Status (RUNNING, UNKNOWN, FAILED). Observing both is critical for troubleshooting.
Additional Information
Example Configurations
- Preserve replacements (common HA scenario):

  disconnect {
    lost_after = "10m"
    replace    = true
    reconcile  = "keep_replacement"
  }
- Preserve longest-running allocations, but ensure originals don't override after long outages:

  disconnect {
    lost_after = "2m"
    replace    = true
    reconcile  = "longest_running"
  }
Logs and Transitions
- When Nomad attempts to recover a missing container, the allocation events show transitions such as:

  Killing      Sent interrupt. Waiting 5s before force killing
  Terminated   Exit Code: 0, Exit Message: "Driver was unable to get the exit code. No such Container"
  Reconnected  Client reconnected

  FAILED or COMPLETE allocations are not restarted; Nomad creates new ones instead.
Key Takeaways
keep_replacement: safest for minimising downtime in most cases.
longest_running + tuned lost_after: useful when avoiding duplicates but requires careful tuning.
keep_original: rarely recommended due to inconsistencies.
Automating state changes (nomad alloc stop) and carefully selecting lost_after are critical to avoiding unwanted downtime or duplication.