Introduction
This article addresses a situation in which a Development (Dev) tier HCP Vault cluster becomes unresponsive for a period of time.
A Dev tier cluster is a single-node instance designed for individual users and proof-of-concept projects. It is not designed for production or other large-scale workloads.
Problem
When making requests to the HCP Vault API, you may receive a “TLS handshake timeout” error, similar to the attached screenshot.
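To tell whether the cluster is reachable at all, you can probe Vault's health endpoint (`/v1/sys/health`) with a short timeout. The following is a minimal sketch, not an official tool: the function name `probe_vault` and the injectable `opener` parameter are hypothetical, and the address you pass in is assumed to be your cluster's public URL. Python words a handshake timeout differently from the message quoted above, so the sketch reports whatever reason the library returns.

```python
import urllib.error
import urllib.request


def probe_vault(addr, timeout=5.0, opener=urllib.request.urlopen):
    """Return a short status string for the Vault cluster at addr.

    probe_vault and opener are illustrative names, not part of any Vault SDK.
    """
    url = addr.rstrip("/") + "/v1/sys/health"
    try:
        with opener(url, timeout=timeout) as resp:
            return f"reachable (HTTP {resp.status})"
    except urllib.error.HTTPError as err:
        # Vault answered with a non-2xx status, so the TLS handshake completed.
        return f"reachable (HTTP {err.code})"
    except OSError as err:
        # Connection failures and timeouts (including a stalled TLS handshake)
        # surface as OSError subclasses such as URLError or TimeoutError.
        return f"unreachable ({getattr(err, 'reason', err)})"
```

An "unreachable" result while the node is being replaced is consistent with the behavior described in this article. A "reachable" result with a non-200 status means the node answered, and the status code itself should be checked against the Vault health endpoint documentation.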
Cause
This usually indicates that the resources available to the cluster cannot sustain the workloads currently being run against it.
HCP Vault Dev tier clusters run on a single, extra-small node (t3a.micro on AWS). Because of this size, certain workloads cause CPU throttling during peaks in resource utilization, which makes the node more prone to replacement by the platform's underlying orchestration layer. Since the Dev tier consists of only a single node, each time the orchestration layer replaces the node, Vault cannot serve requests until a new node comes up.
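Because a Dev tier cluster cannot serve requests while its node is being replaced, a client can ride out short outages by retrying with increasing delays. The sketch below is one illustrative way to do that and is an assumption on our part, not a documented HCP feature: `call_with_backoff`, its parameters, and the default values are hypothetical. It does not address the underlying resource constraint, and backing off between attempts avoids adding load to an already throttled node.

```python
import time


def call_with_backoff(request, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call request() until it succeeds or the attempts are exhausted.

    request is any zero-argument callable that performs one Vault API call.
    All names and defaults here are illustrative.
    """
    for attempt in range(attempts):
        try:
            return request()
        except OSError:
            # Timeouts and connection errors are OSError subclasses in Python.
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last error to the caller.
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

With the defaults, the waits between attempts are 1, 2, 4, and 8 seconds. Tune `attempts` and `base_delay` to how long your application can tolerate waiting for a replacement node.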
If a single node in an HA cluster is unresponsive, you may see i/o timeout errors. These errors occur when a request reaches the bad node. HA should still work properly on the cluster, because the request is retried on a working node, where it then goes through.
Solution
Consider moving your workloads to a 3-node HA cluster, which has better fallback measures than the Dev tier (single-node cluster), where you experience downtime while the single node is being replaced.
If you see i/o timeout errors, contact HCP Support so that we can replace the unresponsive node.
Additional Information
For information on HCP pricing, please review our HCP Vault Pricing page.
To assist in planning Vault deployments, review Vault Limits and Maximums.
If you are experiencing this or similar issues and you believe it is not related to your workloads, contact HCP Support.