consul.autopilot.healthy metric shows 0 for a healthy cluster in 1.10.x through 1.10.3
This metric exposes the overall health of the Consul cluster as a boolean. For a healthy datacenter the metric should show 1; for an unhealthy one it shows 0.
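As a rough illustration of reading this value, the sketch below parses Prometheus text-format output (for example, what an agent returns from `/v1/agent/metrics?format=prometheus`) and interprets the `consul_autopilot_healthy` samples. The function names and the sample text are hypothetical, and the parsing is deliberately simplified; it is not a full Prometheus parser.

```python
import math

METRIC = "consul_autopilot_healthy"


def read_autopilot_healthy(exposition_text):
    """Return the values of all consul_autopilot_healthy samples as floats."""
    values = []
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first "{" (labels) or space (value).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if name != METRIC:
            continue
        # The value is the first token after the optional {labels} block.
        tail = line.rsplit("}", 1)[-1] if "{" in line else line[len(name):]
        values.append(float(tail.split()[0]))
    return values


def describe(value):
    """Map a sample value to a label: 1 is healthy, 0 is unhealthy."""
    if math.isnan(value):
        return "unknown"  # NaN: no health verdict reported by this server
    return "healthy" if value == 1 else "unhealthy"


# Hypothetical sample output, for illustration only.
sample = """# TYPE consul_autopilot_healthy gauge
consul_autopilot_healthy 1
consul_raft_peers 3
"""
for v in read_autopilot_healthy(sample):
    print(METRIC, "=", v, "->", describe(v))
```

Treating NaN as "unknown" is an assumption made here so that a server that does not report a verdict is not mistaken for an unhealthy cluster.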
In Consul 1.10.0 through 1.10.3, a bug causes the consul.autopilot.healthy metric to show 0 even when the cluster is healthy. The changelog entry for the fix reads:
Telemetry: fixes a bug with Prometheus consul_autopilot_healthy metric where 0 is reported instead of NaN on servers. [GH-11231]
The fix went into 1.10.4. If the metric still shows 0 for a healthy cluster on versions >= `1.10.4`, check the following:
1. "prometheus_retention_time": the behavior may differ when this value is very large versus very small, so make sure it is set to the recommended value. "A good value for this parameter is at least 2 times the interval of scrape of Prometheus".
2. "disable_hostname": to avoid metrics being prefixed with the hostname, it is recommended to also enable the disable_hostname option.
If metrics are prefixed with the hostname, the consul_autopilot_healthy metric will show as 0.
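
The two checks above can be sketched as a small helper that derives a telemetry block from the Prometheus scrape interval. This is a minimal sketch: the function name `telemetry_config` and the 30-second scrape interval are assumptions for illustration, and the factor of 2 is the minimum from the recommendation quoted above.

```python
import json


def telemetry_config(scrape_interval_seconds, factor=2):
    """Build a Consul telemetry block from the Prometheus scrape interval.

    The retention time is set to `factor` times the scrape interval
    (at least 2, per the recommendation), and hostname prefixing is disabled.
    """
    retention_seconds = scrape_interval_seconds * factor
    return {
        "telemetry": {
            "prometheus_retention_time": f"{retention_seconds}s",
            "disable_hostname": True,
        }
    }


# Assumed scrape interval of 30 seconds; substitute the real value.
print(json.dumps(telemetry_config(30), indent=2))
```

The printed JSON can be compared against the telemetry settings in the agent's configuration file on each server.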