Introduction
Expected Outcome
This article describes how to create a Nomad debug bundle, a *.tar.gz file that contains logs and configuration from your Nomad environment (including servers and clients).
Prerequisites
A workstation or administrative server with network access and credentials to communicate with the Nomad, Consul, and, if applicable, Vault clusters, and with access to releases.hashicorp.com (needed to download the latest Nomad binary release).
Use Case
To aid in troubleshooting Nomad issues.
Procedure
-
Reference: nomad operator debug.
-
If you can, reproduce the issue while nomad operator debug is running, and set -duration long enough to cover the whole reproduction. Otherwise, use the default in the example below.
-
Before running the nomad operator debug command, please make sure you are connected to Nomad, Consul, and Vault (if you are running Vault) by running the following commands:
-
vault status
consul members
nomad server members
-
- If you receive any errors above, make sure the following environment variables are set properly:
-
VAULT_ADDR
VAULT_TOKEN
CONSUL_HTTP_ADDR
CONSUL_HTTP_TOKEN
NOMAD_ADDR
NOMAD_TOKEN
-
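The variable check above can be scripted. The following is a minimal sketch, not part of the original procedure: the helper name missing_env_vars is hypothetical, and it only reports whether each variable is non-empty, not whether its value is correct.

```shell
# Print the name of every listed environment variable that is unset or empty.
# Returns 0 when all are set, 1 when at least one is missing.
# NOTE: missing_env_vars is a hypothetical helper name used for illustration.
missing_env_vars() {
    missing=0
    for name in "$@"; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name"
            missing=1
        fi
    done
    return "$missing"
}
```

A possible invocation is missing_env_vars NOMAD_ADDR NOMAD_TOKEN CONSUL_HTTP_ADDR CONSUL_HTTP_TOKEN VAULT_ADDR VAULT_TOKEN, dropping any names that do not apply to your environment (for example, the Vault variables if you are not running Vault).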
- IMPORTANT: As the latest Nomad and Consul binaries have the best debug features, please use them to create the bundles. Download the binaries to your workstation or administrative server, unzip them in the directory of choice, then run the two debug commands locally:
-
# Nomad & Consul binary location:
https://releases.hashicorp.com/nomad/<choose-latest>/
# Download the binary packages. Choose the appropriate zip files for your OS:
# For Linux:
curl https://releases.hashicorp.com/nomad/<choose-latest>/nomad_N.N.N_linux_amd64.zip -o nomad_N.N.N.zip
# For Windows:
curl https://releases.hashicorp.com/nomad/<choose-latest>/nomad_N.N.N_windows_amd64.zip -o nomad_N.N.N.zip
# Unzip the downloaded packages. This leaves you with the "nomad" executable, which you use to run `nomad operator debug`.
# Run `nomad operator debug`. If reproducing the issue, set "-duration" long enough to capture the period before the issue occurs as well as the issue itself:
./nomad operator debug -duration=2m -interval=15s -log-level=TRACE -server-id=all -node-id=all
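The download URLs in the steps above follow a single pattern, so the URL can be assembled from the version, OS, and architecture. This is a minimal sketch assuming that pattern holds for the release you choose; the helper name nomad_release_url is hypothetical, and the defaults of linux and amd64 are only illustrative.

```shell
# Print the download URL for a Nomad release zip, following the
# releases.hashicorp.com pattern shown in the steps above.
# Arguments: version (the N.N.N value), OS (default linux), arch (default amd64).
# NOTE: nomad_release_url is a hypothetical helper name used for illustration.
nomad_release_url() {
    version=$1
    os=${2:-linux}
    arch=${3:-amd64}
    echo "https://releases.hashicorp.com/nomad/${version}/nomad_${version}_${os}_${arch}.zip"
}
```

For example, curl -fsSL "$(nomad_release_url N.N.N linux amd64)" -o nomad_N.N.N.zip downloads the Linux package, with N.N.N replaced by the release you selected.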
Error Troubleshooting
- ERRORS
- Failed to retrieve agent host data, err: Unexpected response code: 403 (Permission denied)
- Agent host retrieval requires agent:read ACL or enable_debug=true. See https://www.nomadproject.io/api-docs/agent#host for more information.
- client/b1f30685-d344-d747-1720-871189e5d62d: Failed to retrieve pprof profile.prof, err: Unexpected response code: 403 (Permission denied)
- Pprof retrieval requires agent:write ACL or enable_debug=true. See https://www.nomadproject.io/api-docs/agent#agent-runtime-profiles for more information
- INVESTIGATION
- Research pointed to ACL tokens. Running nomad acl token self to see which tokens were present showed not that tokens were missing, but that ACL support was disabled.
- Access Control Tokens
- https://learn.hashicorp.com/collections/nomad/access-control
-
nomad acl token self
- ACL support disabled
-
- This points to the other option named in the error messages, enable_debug=true. It is set in the Nomad configuration files. In the case of the environment https://github.com/watsonian/hashicorp-stack-ubuntu, set "enable_debug = true" in ./config/nomad-server.hcl and ./config/nomad-client.hcl, at the top of each file, as a global setting.
- CAUSE
- ACL support is disabled and enable_debug is not set.
- SOLUTION
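- Enable ACL support and use a token with the agent:read and agent:write capabilities named in the error messages.
- Or, set "enable_debug = true" in ./config/nomad-server.hcl and ./config/nomad-client.hcl, at the top of each file, as a global setting.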
- Restart both Nomad server and clients:
-
sudo systemctl restart nomad
journalctl -xu nomad # monitor the restart
-
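Before restarting, it can help to confirm that each configuration file actually contains the setting. The following is a rough sketch: the helper name has_enable_debug is hypothetical, and the check is a plain text match, so it does not verify that the line sits at the top level of the file rather than inside a block.

```shell
# Return 0 if the given Nomad config file contains an uncommented
# "enable_debug = true" line (leading whitespace allowed), 1 otherwise.
# NOTE: has_enable_debug is a hypothetical helper name used for illustration.
has_enable_debug() {
    grep -Eq '^[[:space:]]*enable_debug[[:space:]]*=[[:space:]]*true([[:space:]]|$)' "$1"
}
```

For example, has_enable_debug ./config/nomad-server.hcl && echo "enable_debug is set" prints the message only when the line is found; repeat the check for ./config/nomad-client.hcl.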
Additional Information
- Reference: Command: nomad operator debug
-
Related tool: HashiCorp Diagnostics (hcdiag)