Introduction
This article explains a CPU overload issue caused by disconnections between Boundary workers and gives actionable steps to resolve or prevent it.
Issue Summary
Workers are distributed across two networks (Network1 and Network2). X ingress workers are hosted in Network1, and Y egress workers are hosted in Network2 and connect to the ingress workers in the upstream network, Network1. When one of the ingress workers in Network1 is stopped, CPU usage spikes on any egress worker in Network2 that was connected to it (as well as to the other ingress workers in Network1). The usage keeps climbing until the CPU eventually maxes out.
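As a rough illustration of this topology, an egress worker in Network2 lists the Network1 ingress workers as its upstreams through the initial_upstreams setting. This is a partial sketch that assumes three ingress workers at the hypothetical addresses 10.0.0.1 through 10.0.0.3. It shows only the relevant part of the worker stanza, not a complete worker configuration.

```hcl
# Hypothetical egress worker in Network2 (partial configuration only).
worker {
  # Each entry is an ingress worker in Network1; the addresses are placeholders.
  # No port is given, so the worker is assumed to dial the usual upstream port (:9201).
  initial_upstreams = [
    "10.0.0.1",
    "10.0.0.2",
    "10.0.0.3", # if this ingress worker is stopped, affected versions show the CPU spike
  ]
}
```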
Affected and Possible Fix Versions
Affected Versions: 0.17.0 - 0.17.1-HCP
Possible Fix Version: 0.18.0-HCP
Root Cause
This issue occurs when a Boundary worker is unable to maintain a stable connection with one of its upstream workers. The disconnection triggers a condition in which the downstream worker consumes excessive CPU resources while it attempts to reconnect to the unavailable upstream or to handle the processes affected by the lost connection.
Resolution Steps
A temporary workaround is to remove the IP address of the offline upstream worker from the downstream worker's configuration (in the initial_upstreams setting) whenever that upstream worker is taken offline.
Configuration Example:
listener "tcp" {
  purpose     = "proxy"
  tls_disable = true
  address     = "127.0.0.1"
}

worker {
  # Path for worker storage, assuming worker-led or controller-led registration. Must be unique across workers.
  auth_storage_path = "/boundary/demo-worker-1"

  # Local storage path required if session recording is enabled
  recording_storage_path = "/tmp/boundary/"

  # Minimum available disk space required in the local storage path if session recording is enabled
  recording_storage_minimum_available_capacity = "500MB"

  # Workers typically need to reach upstreams on :9201
  initial_upstreams = [
    "10.0.0.1",
    "10.0.0.2",
    "10.0.0.3", # For example, remove this IP if the upstream ingress worker is offline.
  ]

  public_addr = "myhost.mycompany.com"

  tags {
    type   = ["prod", "webservers"]
    region = ["us-east-1"]
  }
}
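A minimal sketch of the workaround applied to the configuration example, assuming the ingress worker at 10.0.0.3 is the one that was stopped. Its entry is removed so the downstream worker only dials the upstreams that are still online. Only initial_upstreams changes, and the rest of the worker stanza stays as shown in the example. The worker presumably has to restart (or otherwise reload its configuration) before the change takes effect. Add the address back once the upstream worker is online again.

```hcl
worker {
  # ... all other worker settings unchanged from the example ...

  initial_upstreams = [
    "10.0.0.1",
    "10.0.0.2",
    # "10.0.0.3" removed while that ingress worker is offline;
    # restore this entry when the worker is back online.
  ]
}
```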
References
Create a Boundary Instance on HCP
Manage Workers with HCP Boundary
Manage Multi-Hop Sessions with HCP Boundary