Troubleshooting Driver Failure and Container Process Initialization Errors in Nomad Jobs for Exec Driver – HashiCorp Help Center

Issue: When a Nomad job that uses the exec driver is run, its allocations fail with a driver failure and a container process initialization error. The failure has been observed across several RHEL and Nomad versions. This article analyzes the error message, explores potential causes, and provides troubleshooting steps to resolve the issue.

Error description:

Driver Failure: failed to launch command with executor: rpc error: code = Unknown desc = unable to start container process: error during container init: read init-p: connection reset by peer

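The error indicates that the command intended to run inside the container failed to launch, and that the executor reported the failure as a Remote Procedure Call (RPC) error with code Unknown. The failure occurred during the container's initialization phase: the connection used to set up the container's init process was closed unexpectedly by the peer. This most likely means the init process exited before initialization completed, rather than pointing to a problem on the host network.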

Reproduction steps: The steps below reproduce the issue, followed by the actual and expected behavior. The scenario was tested across several versions of RHEL and Nomad.

Step 1: Create a job file named httpd.hcl.

job "httpd" {
  group "httpd" {

    task "httpd" {
      driver = "exec"

      config {
        command = "bash"
        args    = ["-c", "while true; do sleep 500; done"]
      }
    }
  }
}

Step 2: Run the Nomad job with the following command.

nomad job run httpd.hcl

Step 3: Check the status of the Nomad job in the Nomad UI or with the following CLI command.

nomad job status httpd
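To confirm that a failed allocation hit this specific error rather than some other driver failure, the allocation's events can be searched for the error text. The following is a minimal sketch, not an official procedure: the function name has_init_reset_error and the ALLOC_ID variable are hypothetical, and it assumes the event description appears on a single line of the `nomad alloc status` output.

```shell
# Return success if the text on stdin contains the container init failure
# described in this article.
has_init_reset_error() {
  grep -q 'error during container init: read init-p: connection reset by peer'
}

# Hypothetical usage: ALLOC_ID is assumed to hold an allocation ID copied from
# the Allocations section of `nomad job status httpd`.
# The check is skipped when nomad is not installed or ALLOC_ID is unset.
if command -v nomad >/dev/null 2>&1 && [ -n "${ALLOC_ID:-}" ]; then
  if nomad alloc status "$ALLOC_ID" | has_init_reset_error; then
    echo "allocation $ALLOC_ID hit the exec driver init error"
  else
    echo "allocation $ALLOC_ID does not show the exec driver init error"
  fi
fi
```

Set ALLOC_ID in the environment before running the snippet, for example by exporting the ID of the failed allocation.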

Job Failure screenshot from UI:

Job successfully running screenshot:

Actual behavior:

Running the job file fails with the exec driver error shown below.

Error message:

Driver Failure: failed to launch command with executor: rpc error: code = Unknown desc = unable to start container process: error during container init: read init-p: connection reset by peer

Expected behavior:

The job runs without any errors.

Steps to mitigate this issue:

Use case	Linux distribution	Nomad version	Status	Comments
Scenario 1	RHEL 8	1.8.0	Not Running	Upgrade either the OS to RHEL 9 or Nomad to 1.8.1. The issue is caused by an exec driver failure.
Scenario 2	RHEL 8	1.8.1	Running	Nomad 1.8.1 includes a fix for a bug where `exec` driver tasks would fail on older versions of glibc [GH-23331]. The same release also lists: Driver: Fixed a bug where the exec, java, and raw_exec drivers would not configure cgroups to allow access to devices provided by device plugins [GH-22518].
Scenario 3	RHEL 8	1.6.8	Running	Nomad 1.6.8 is built with Go 1.21.6. The first Nomad version with the bug is 1.6.9, which is built with Go 1.22.1 [GH-20212].
Scenario 4	RHEL 9	1.8.1	Running	Driver: Fixed a bug where the exec, java, and raw_exec drivers would not configure cgroups to allow access to devices provided by device plugins [GH-22518].
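To compare a client node against the table above, the RHEL major release and the Nomad version can be read from the host and looked up in the tested matrix. This is a minimal sketch under stated assumptions: the function names are hypothetical, only the four tested scenarios are encoded, and any other combination is reported as Untested instead of guessed. It also does not check that the host is actually RHEL; it reads only the VERSION_ID field of /etc/os-release.

```shell
# Look up a RHEL major release ($1) and Nomad version ($2) in the tested
# matrix from this article. Combinations outside the table are "Untested".
tested_status() {
  case "$1:$2" in
    8:1.8.0) echo "Not Running" ;;
    8:1.8.1|8:1.6.8|9:1.8.1) echo "Running" ;;
    *) echo "Untested" ;;
  esac
}

# Extract the bare version from the first line of `nomad version` output,
# for example "Nomad v1.8.1" becomes "1.8.1".
parse_nomad_version() {
  sed -n '1s/^Nomad v\([0-9][0-9.]*\).*/\1/p'
}

# Extract the major release from the VERSION_ID line of an os-release file,
# for example VERSION_ID="8.10" becomes "8".
parse_rhel_major() {
  sed -n 's/^VERSION_ID="\{0,1\}\([0-9]*\).*/\1/p'
}

# Run the lookup on this host only when nomad and /etc/os-release exist.
if command -v nomad >/dev/null 2>&1 && [ -r /etc/os-release ]; then
  nomad_version="$(nomad version | parse_nomad_version)"
  rhel_major="$(parse_rhel_major < /etc/os-release)"
  echo "RHEL ${rhel_major}, Nomad ${nomad_version}: $(tested_status "$rhel_major" "$nomad_version")"
  # The 1.8.1 fix concerns older glibc, so print the glibc version for reference.
  getconf GNU_LIBC_VERSION 2>/dev/null || true
fi
```

An Untested result means only that the combination is not covered by this article. It says nothing about whether the combination is affected.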

Conclusion:

The error indicates that the executor could not launch the command: the container process failed to start because the connection was reset by the peer during container initialization, and the exec driver reported the failure as an RPC error with code Unknown. The issue was tested on different versions of RHEL and Nomad, and the results are listed in the table above. Customers who encounter this error can follow the recommendations in the Comments column: upgrade Nomad to 1.8.1 or upgrade the operating system to RHEL 9.

Reference Documents:

Nomad upgrade document

Nomad Specific version upgrade

RHEL 8 to RHEL 9 upgrade steps
