The information contained in this article has been verified as up-to-date on the date of the original publication of the article. HashiCorp endeavors to keep this information up-to-date and correct, but it makes no representations or warranties of any kind, express or implied, about the ongoing completeness, accuracy, reliability, or suitability of the information provided.
All information contained in this article is for general information purposes only. Any reliance you place on such information as it applies to your use of your HashiCorp product is therefore strictly at your own risk.
Introduction
This guide explains how to identify when the netfilter connection tracking table (nf_conntrack) is full and how to increase its maximum size. A full table can prevent new connections to the instance from being established, including connections to the Consul agent running on it.
Problem
Services registered to Consul running on a virtual machine are getting deregistered due to I/O timeout and connection refused errors.
Cause
The Consul agent logs show the following errors.
I/O timeout error:
Mar 21 11:20:13 xxxxxxxxx consul: 2023-03-21T11:20:13.690-0300 [WARN] agent: Check socket connection failed: check=default/default/_nomad-check-XXXXXXXXXXXX error="dial tcp 10.14.51.43:25922: i/o timeout"
Connection refused error:
Mar 21 11:26:58 xxxxxxxxx consul: 2023-03-21T11:26:58.587-0300 [INFO] agent: Deregistered service: service=_nomad-task-XXXXXXXXXXXX
Mar 21 11:34:58 xxxxxxxxx consul: 2023-03-21T11:34:58.449-0300 [INFO] agent: Deregistered service: service=xxxxx-XXXXXXXXXXXX
Mar 21 11:34:59 xxxxxxxxx consul: 2023-03-21T11:34:59.128-0300 [WARN] agent: Check socket connection failed: check=default/default/_nomad-check-XXXXXXXXXXXX error="dial tcp 10.14.51.43:30692: connect: connection refused"
The above errors do not provide any details on what is causing the "connection refused" or "i/o timeout" failures. To investigate the issue, check the OS system-level logs and look for the following error.
sudo tail -F /var/log/messages
Mar 21 10:40:57 xxxxxxxxxx kernel: nf_conntrack: table full, dropping packet
Mar 21 10:40:57 xxxxxxxxxx kernel: nf_conntrack: table full, dropping packet
Mar 21 10:40:57 xxxxxxxxxx kernel: nf_conntrack: table full, dropping packet
Mar 21 10:40:57 xxxxxxxxxx kernel: nf_conntrack: table full, dropping packet
If the above error is found, /var/log/messages shows that the nf_conntrack table is full and the kernel is dropping packets.[1]
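To gauge how often packets were dropped, the matching kernel messages can be counted instead of tailed. The following is a minimal sketch, not part of the original procedure: the function name `count_conntrack_drops` is hypothetical, and the log path is passed as an argument because the kernel log location varies between distributions (the article uses /var/log/messages).

```shell
#!/bin/sh
# Count kernel messages reporting a full conntrack table in a log file.
# Argument: path to the log file to scan.
count_conntrack_drops() {
    logfile=$1
    if [ ! -r "$logfile" ]; then
        echo "cannot read $logfile" >&2
        return 1
    fi
    # grep -c prints the number of matching lines. It exits non-zero when
    # nothing matches, so the status is ignored to keep a clean "0" result.
    grep -c 'nf_conntrack: table full, dropping packet' "$logfile" || true
}

# Example (reading system logs usually requires root):
#   count_conntrack_drops /var/log/messages
```

A non-zero count confirms that the kernel dropped packets because the table was full during the period covered by the log file.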
Diagnosis
The conntrack entries in the /proc tree are only populated if conntrack is active. On Linux distributions where conntrack is not enabled, the nf_conntrack_max entry is not present and the table cannot be full. For distributions where it is active, check the maximum and current number of conntrack entries.
sysctl net.netfilter.nf_conntrack_max net.netfilter.nf_conntrack_count
If the nf_conntrack_count is near nf_conntrack_max, then the maximum may need to be increased.
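The comparison between the two values can be expressed as a percentage. The sketch below reads the same counters from /proc/sys/net/netfilter, which is where the net.netfilter.* sysctl keys are exposed. The function names and the 80% warning threshold are assumptions for illustration, not values from this article.

```shell
#!/bin/sh
# Usage of the conntrack table as a whole-number percentage.
# Arguments: current entry count, configured maximum.
conntrack_usage_pct() {
    cur=$1
    limit=$2
    if [ "$limit" -le 0 ]; then
        echo "invalid maximum: $limit" >&2
        return 1
    fi
    echo $(( $cur * 100 / $limit ))
}

# Print a one-line usage report. The directory argument is optional and
# defaults to the /proc location behind the net.netfilter.* sysctl keys.
report_conntrack_usage() {
    dir=${1:-/proc/sys/net/netfilter}
    if [ ! -r "$dir/nf_conntrack_count" ] || [ ! -r "$dir/nf_conntrack_max" ]; then
        echo "conntrack is not active on this host"
        return 0
    fi
    n=$(cat "$dir/nf_conntrack_count")
    m=$(cat "$dir/nf_conntrack_max")
    if ! pct=$(conntrack_usage_pct "$n" "$m"); then
        return 0
    fi
    echo "conntrack usage: $n / $m (${pct}%)"
    # 80 is an assumed warning threshold, not a value from the article.
    if [ "$pct" -ge 80 ]; then
        echo "WARNING: table is nearly full, consider raising nf_conntrack_max"
    fi
}

report_conntrack_usage
```

Running the script periodically, for example from cron, would show whether the count approaches the maximum before the kernel starts dropping packets.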
Solution
- Increase the value of net.netfilter.nf_conntrack_max to the desired value (run as root).
sysctl -w net.netfilter.nf_conntrack_max=262144
- To make the change persistent across reboots, add the entry to /etc/sysctl.conf (run as root).
echo "net.netfilter.nf_conntrack_max=262144" >> /etc/sysctl.conf
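Appending to /etc/sysctl.conf more than once leaves duplicate entries for the same key. One way to avoid that is to replace any existing assignment before writing the new one. The helper below is a hypothetical sketch (the name `set_sysctl_conf` is not a standard tool). It takes the target file as an argument so it can be tried on a copy first.

```shell
#!/bin/sh
# Set key=value in a sysctl-style config file, replacing earlier
# assignments of the same key instead of appending a duplicate.
# Arguments: key, value, path to the config file.
set_sysctl_conf() {
    key=$1
    value=$2
    file=$3
    tmp=$(mktemp) || return 1
    if [ -f "$file" ]; then
        # Keep every line that does not assign this key. The dots in the
        # key act as regex wildcards here, which is acceptable for a sketch.
        grep -v "^[[:space:]]*$key[[:space:]]*=" "$file" > "$tmp" || true
    fi
    echo "$key=$value" >> "$tmp"
    # Write through cat so the target file keeps its owner and mode.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example (as root), followed by a reload of /etc/sysctl.conf:
#   set_sysctl_conf net.netfilter.nf_conntrack_max 262144 /etc/sysctl.conf
#   sysctl -p
```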
Note: Use the following rule to calculate an appropriate value for nf_conntrack_max.[2]
CONNTRACK_MAX = RAMSIZE (in bytes) / 16384 / (x / 32), where x is the number of bits in a pointer (for example, 32 or 64).
For example, on a 64-bit OS with 8 GB of memory, the suggested value for net.netfilter.nf_conntrack_max is CONNTRACK_MAX = 8 * 1024^3 / 16384 / (64 / 32) = 524288 / 2 = 262144.
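The rule can be evaluated on the host itself. The sketch below assumes a Linux system where /proc/meminfo reports MemTotal in kB and where `getconf LONG_BIT` reflects the pointer width. It falls back to 64 if `getconf` is unavailable, which is an assumption. The function name `conntrack_max_for` is hypothetical.

```shell
#!/bin/sh
# CONNTRACK_MAX = RAM size in bytes / 16384 / (pointer bits / 32).
# Arguments: RAM size in bytes, pointer width in bits (32 or 64).
conntrack_max_for() {
    ram_bytes=$1
    ptr_bits=$2
    echo $(( $ram_bytes / 16384 / ($ptr_bits / 32) ))
}

# Worked example from the text: 8 GB of RAM on a 64-bit OS.
conntrack_max_for $(( 8 * 1024 * 1024 * 1024 )) 64

# Suggested value for this host, if /proc/meminfo is available.
if [ -r /proc/meminfo ]; then
    mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    # Assumed fallback of 64 bits when getconf is not installed.
    bits=$(getconf LONG_BIT 2>/dev/null || echo 64)
    echo "suggested nf_conntrack_max: $(conntrack_max_for $(( ${mem_kb:-0} * 1024 )) "$bits")"
fi
```

MemTotal is slightly lower than the installed RAM because the kernel reserves some memory, so the value computed for a host with 8 GB installed will be somewhat below 262144.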
References
[1] https://people.netfilter.org/pablo/docs/login.pdf
[2] https://wiki.khnet.info/index.php/Conntrack_tuning