Overview
This document lists the recommendations and requirements for configuring and using Consul on a daily basis. They will also help you gather information when troubleshooting or opening a support ticket with Consul.
Checklist
-
System Requirements / Infrastructure Planning (hardware capacity recommendations, network requirements, and additional infrastructure considerations)
- Please follow the Consul Reference Architecture documentation
-
Check ports:
- Please make sure that all the necessary ports are open within the infrastructure, according to the Required Ports documentation.
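As a rough illustration of the port check above, the following sketch probes the default Consul TCP ports on a host. It assumes the agent's default port numbers have not been changed in its configuration, and the function names (`check_tcp_port`, `report_ports`) are hypothetical helpers, not part of Consul. It only tests TCP; 8301, 8302, and 8600 also use UDP, which this sketch does not cover.

```shell
# Default Consul ports (an assumption: the agent's "ports" settings are unchanged).
# 8300 server RPC, 8301 LAN gossip, 8302 WAN gossip, 8500 HTTP API, 8600 DNS.
CONSUL_TCP_PORTS="8300 8301 8302 8500 8600"

# check_tcp_port HOST PORT: succeed if a TCP connection can be opened.
# Relies on bash's /dev/tcp pseudo-device, so bash must be installed.
# A firewalled (filtered) port may block until the OS connect timeout expires.
check_tcp_port() {
  bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# report_ports HOST: print one line per port with its reachability.
report_ports() {
  for port in $CONSUL_TCP_PORTS; do
    if check_tcp_port "$1" "$port"; then
      echo "$1:$port open"
    else
      echo "$1:$port closed or filtered"
    fi
  done
}

# Usage (hypothetical server address):
#   report_ports 10.0.0.10
```

Run it from each node that needs to reach the servers, since a port can be open from one network segment and blocked from another.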
-
Server Performance:
- Please reference the Server Performance documentation in order to determine minimum requirements and read/write tuning.
-
Telemetry metrics setup:
- After setting up the first datacenter, please make sure the deployment is healthy and establish a baseline. Please review the Telemetry documents and Monitor Consul Datacenter Health tutorial.
-
Consul commands:
- Please familiarize yourself with the Consul Commands (CLI). Below are some basic commands whose output the Consul Support Engineering team will typically expect to see when a ticket is opened:
$ consul members - lists the servers and clients in the Consul cluster
$ consul operator raft list-peers - lists the leader state, voting status, and Raft protocol version
$ consul info - provides various debugging information that can be useful to operators
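The three commands above can be captured in one pass before opening a ticket. This is a minimal sketch: the function name `collect_consul_outputs` and the output file names are hypothetical. If ACLs are enabled, it assumes a token with sufficient permissions is already available to the CLI, for example through the `CONSUL_HTTP_TOKEN` environment variable.

```shell
# collect_consul_outputs DIR: save the output of the three commands into DIR.
collect_consul_outputs() {
  out_dir="$1"
  mkdir -p "$out_dir" || return 1

  # Stop early if the consul CLI is not available in this shell.
  if ! command -v consul >/dev/null 2>&1; then
    echo "consul not found in PATH" >&2
    return 1
  fi

  # Capture stderr as well, so error messages end up in the files too.
  consul members > "$out_dir/members.txt" 2>&1
  consul operator raft list-peers > "$out_dir/raft-peers.txt" 2>&1
  consul info > "$out_dir/info.txt" 2>&1
}

# Usage:
#   collect_consul_outputs ./consul-ticket-outputs
```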
- Where to look for logs:
- Please review the Knowledge Base article Where are my Consul logs and how do I access them to learn more.
-
How to Increase Log Verbosity on Consul agents:
- Review the Increase log verbosity on agents tutorial to learn more.
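One way to see more verbose logs from a running agent, without changing its configuration, is `consul monitor` with the `-log-level` flag. The wrapper below is only a sketch; the function name `stream_agent_logs` is hypothetical, and the tutorial referenced above remains the authoritative guide.

```shell
# stream_agent_logs [LEVEL]: stream logs from the local agent at LEVEL.
# LEVEL defaults to "debug"; "trace" is more verbose still.
stream_agent_logs() {
  consul monitor -log-level="${1:-debug}"
}

# Usage (stop with Ctrl-C):
#   stream_agent_logs debug > consul-debug.log
```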
-
Consul security considerations:
- When planning for security, you need to manage three distinct types of secrets within a Consul deployment: TLS certificates, ACL tokens, and gossip encryption keys. Please follow our security documentation to learn more.
-
Consul Health checks:
- When enabled, health checks are a crucial part of the operation of the Consul datacenter. Unhealthy services will not be published for discovery via standard DNS or some HTTP API calls. Please check the health-checks tutorial for details.
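To make the health check description above concrete, here is a sketch that writes a service definition with an HTTP check. The service name `web`, port 8080, and the `/health` path are hypothetical and stand in for your own service; the function name is also hypothetical.

```shell
# write_web_service_definition FILE: write an example service definition
# with an HTTP health check to FILE. All values are illustrative.
write_web_service_definition() {
  cat > "$1" <<'EOF'
{
  "service": {
    "name": "web",
    "port": 8080,
    "check": {
      "id": "web-http",
      "name": "HTTP check on port 8080",
      "http": "http://localhost:8080/health",
      "interval": "10s",
      "timeout": "1s"
    }
  }
}
EOF
}

# Usage:
#   write_web_service_definition web.json
#   consul services register web.json
```

Once the service is registered, the check status can be inspected with a request such as `curl http://127.0.0.1:8500/v1/health/checks/web`, assuming the HTTP API listens on its default address.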
-
Recovery and Outage preparation:
- Please make a habit of regularly taking a backup of Consul's internal state store (a snapshot) so that you can recover Consul after a disaster or during a long-term outage. Please review our backup-and-restore documentation to learn more.
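A regular snapshot routine could look like the sketch below, which uses `consul snapshot save` and `consul snapshot inspect`. The function names and the timestamped file naming scheme are hypothetical. Scheduling (for example with cron) and copying the file to off-host storage are left out.

```shell
# snapshot_name: build a UTC-timestamped file name for a snapshot.
snapshot_name() {
  echo "consul-$(date -u +%Y%m%dT%H%M%SZ).snap"
}

# take_snapshot DIR: save a snapshot into DIR, then inspect it as a
# basic sanity check that the file is readable.
take_snapshot() {
  file="$1/$(snapshot_name)"
  consul snapshot save "$file" && consul snapshot inspect "$file"
}

# Usage:
#   take_snapshot /var/backups/consul
# Restore later with:
#   consul snapshot restore /var/backups/consul/<file>.snap
```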
-
Recovery and outage preparation:
-
You must back up and secure the following Consul secrets in order to recover from the loss of your secured primary Consul datacenter.
The Consul ACL bootstrap token
The last active Consul CA cert
The last active Consul CA key
The last active gossip encryption key
-
How to collect Debug bundle:
- During deep-dive investigations of complex issues, customers are often asked to collect a debug bundle, which captures metrics, logs, profiling data, and other data and writes it to the current directory.
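The debug bundle described above is produced by the `consul debug` command. The sketch below wraps it with an explicit duration, interval, and output path. The function name `capture_debug` and the specific values are assumptions; use whatever the support engineer requests.

```shell
# capture_debug OUTPUT_PATH: collect a debug bundle from the local agent.
# Captures for 2 minutes, sampling every 30 seconds, and writes to OUTPUT_PATH
# instead of the default location in the current directory.
capture_debug() {
  consul debug -duration=2m -interval=30s -output="$1"
}

# Usage:
#   capture_debug /tmp/consul-debug-server1
```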
-
hcdiag:
- HashiCorp Diagnostics (hcdiag) is a troubleshooting data-gathering tool that you can use to collect and archive important data from Consul server environments. Please check hcdiag-with-consul for details.
-
For multi-cluster Deployment:
- For deploying multiple Consul clusters across multiple datacenters with basic or advanced federation topologies, please check multi-cluster-deploy for details.
-
For Scaling Reference:
- We are sometimes asked to recommend a maximum number of service instances that can be registered in a Consul cluster. There is no set number, and it is hard to give guidance, because there can be wide variations in how often service instances are updated and how they are queried. Instance counts also do not account for other load on Consul, such as traffic to the KV store. We have observed stable 5-server VM-based clusters (used primarily for service discovery) with 20-30 thousand service instances registered, but those numbers are specific to that deployment. In other cases, the number could be higher or lower depending on the specific workload. To learn more, please check our article hashicorp-consul-global-scale-benchmark.
-
Best Practices:
- To avoid downtime for your services, consider, as a general best practice, running multiple instances of the same service, each with its own Consul client agent. Please check here for reference.