Envoy's concurrency setting (`--concurrency`) determines the number of worker threads used to handle incoming connections. Because how well Envoy scales depends on its threading model, proper tuning is necessary to optimize resource usage while avoiding unnecessary memory consumption or suboptimal request distribution.
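For a quick sanity check, you can start Envoy with an explicit worker count and confirm it from the admin interface. This is a minimal sketch: the admin port (`19000`) and the use of per-worker stat prefixes for counting are assumptions based on common Consul Connect defaults, so adjust them to your deployment.

```bash
# Start Envoy with an explicit worker count (2 workers here).
envoy -c bootstrap.yaml --concurrency 2

# In another shell: count active workers via the admin endpoint.
# Each worker exposes per-worker stats (e.g., watchdog counters),
# so counting unique "server.worker_N" prefixes gives the worker count.
curl -s localhost:19000/stats | grep -o 'server\.worker_[0-9]*' | sort -u | wc -l
```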
## Sidecar Proxy Configuration

*(For workload-mesh use cases where Envoy is deployed as a sidecar proxy next to application services)*
### Initial Default Value

- The default value is `1` (`--concurrency 1`), which is generally a good starting point for lightweight services with relatively low request volumes.
- Why? Because:
  - Sidecar proxies often sit next to applications that may be CPU/memory constrained.
  - Higher concurrency does not necessarily improve performance unless the application itself can utilize more CPU.
### Scaling Considerations

**Primary Factors:**

- **Application Threading Model:** If the application is single-threaded (e.g., Python Flask), there is no benefit in running multiple Envoy threads. However, for multi-threaded services (e.g., Go, Java), additional concurrency may be beneficial.
- **Traffic Volume:** If the service handles a large volume of inbound/outbound connections, increasing concurrency can help distribute the load.
- **CPU Allocation:** Envoy's threading model assumes roughly one worker per available CPU thread. If a sidecar has only 250 MHz of CPU allocated (as in the Nomad example), increasing concurrency may not yield performance benefits.
### Recommended Tuning Strategy

- **Low traffic applications:** Stick with the default (`--concurrency 1`).
- **Moderate traffic (e.g., 100-500 RPS per instance):** Scale concurrency to `2-4`, but only if the CPU allocation is at least 500-1000 MHz.
- **High traffic (e.g., 500+ RPS per instance):** Consider increasing to `4-8`, ensuring sufficient CPU (>1 core). A shell sketch of this CPU-to-concurrency mapping follows.
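To make these tiers concrete, here is a small, hypothetical shell helper that derives a sidecar concurrency value from the Nomad CPU allocation, assuming the ~1000 MHz-per-vCPU convention used throughout this section and capping at 4 workers per the tiers above:

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a Nomad CPU allocation (MHz) to a sidecar
# --concurrency value. Assumes ~1000 MHz per vCPU and caps at 4 workers,
# matching the tuning tiers above.
suggest_sidecar_concurrency() {
  local cpu_mhz=$1
  local workers=$(( (cpu_mhz + 999) / 1000 ))  # ceiling division; never 0
  (( workers > 4 )) && workers=4               # sidecars rarely need more
  echo "$workers"
}

suggest_sidecar_concurrency 250   # -> 1 (the 250 MHz sidecar example)
suggest_sidecar_concurrency 750   # -> 1
suggest_sidecar_concurrency 2000  # -> 2
```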
### Additional Optimization Notes

- **CPU Pinning:** If running Nomad with cgroups-based resource isolation, ensure that Envoy is allocated enough CPU cores to justify increasing concurrency.
- **Memory Usage Consideration:** More concurrency means more memory usage per worker for connection pools. Be mindful if sidecar memory is limited (128 MB in your Nomad config); see the memory check below.
- **Connection Pool Efficiency:** Too many workers can result in fragmented connection pools, leading to reduced reuse of HTTP/2 or TCP connections.
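Since the memory cost per worker depends on your traffic pattern, a practical approach is to snapshot Envoy's heap before and after changing concurrency. This assumes the admin interface is reachable on port 19000 (a common Consul Connect default); adjust to your setup.

```bash
# Snapshot Envoy's heap via the admin /memory endpoint (port assumed).
# Run once before and once after raising --concurrency, then compare
# the "allocated" and "heap_size" figures to see the per-worker cost.
curl -s localhost:19000/memory
```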
### TL;DR for Sidecars

- Default (`--concurrency 1`) is fine for most workloads.
- Scale based on CPU/memory allocation and the service's threading model.
- If your sidecar is handling 500+ RPS and has sufficient CPU, increase to `2-4` or more.
## Edge Proxy Configuration (Ingress/Egress)

*(For Envoy acting as a gateway, e.g., an NGINX ingress replacement or a terminating gateway egress proxy)*
### Initial Default Value

- Unlike sidecars, edge proxies require much higher concurrency due to their role in handling many independent connections across multiple backend services.
- **Good initial default:** Set `--concurrency` equal to the number of CPU cores allocated (`NOMAD_CPU_LIMIT / 1000`).
### Explanation

- `NOMAD_CPU_LIMIT` gives the allocated CPU in MHz.
- Since Envoy's concurrency is typically set to the number of CPU cores, we divide by `1000` (treating roughly 1000 MHz as one vCPU) to get the number of vCPUs.
Example Usage in Nomad Task Configuration
If you want to dynamically set concurrency based on available CPU, modify your args
field in Nomad. You'll need to update the Default Envoy Configuration within the sidecar_task block of each application:
sidecar_task { args = [ "-c", "${NOMAD_SECRETS_DIR}/envoy_bootstrap.json", "-l", "${meta.connect.log_level}", "--concurrency", "$(($((${NOMAD_CPU_LIMIT} + 999)) / 1000))", "--disable-hot-restart" ] }
To ensure a minimum concurrency of 1, use:
--concurrency=$(($((${NOMAD_CPU_LIMIT} + 999)) / 1000))
This ensures that even with NOMAD_CPU_LIMIT=500
, concurrency will not be 0
, but at least 1
.
### Explanations

- With plain integer division, if Nomad allocates 500 MHz (`NOMAD_CPU_LIMIT=500`), then `${NOMAD_CPU_LIMIT} / 1000` would yield a `--concurrency` of `0` (not ideal).
- The `+ 999` rounds the result up, so 500 MHz yields `1`, while 2000 MHz (`NOMAD_CPU_LIMIT=2000`) yields `2`, as the quick check below confirms.
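One caveat, offered as an assumption to verify: Nomad interpolates `${NOMAD_CPU_LIMIT}` itself, but task `args` are not generally evaluated by a shell, so depending on your task driver the `$((...))` arithmetic may need to live in a wrapper script instead. The arithmetic itself is easy to sanity-check locally:

```bash
# Verify the ceiling division for a few representative allocations.
for NOMAD_CPU_LIMIT in 250 500 1000 2000 2500; do
  echo "${NOMAD_CPU_LIMIT} MHz -> concurrency $(( (NOMAD_CPU_LIMIT + 999) / 1000 ))"
done
# Prints:
# 250 MHz -> concurrency 1
# 500 MHz -> concurrency 1
# 1000 MHz -> concurrency 1
# 2000 MHz -> concurrency 2
# 2500 MHz -> concurrency 3
```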
### Scaling Considerations

**Primary Factors:**

- **CPU Allocation:** Since each worker thread runs its own event loop, a good rule of thumb is 1 worker per vCPU core. Setting concurrency higher than the number of CPU cores often wastes memory.
- **Throughput Needs:** If the proxy needs to handle thousands of RPS, concurrency should match the expected workload.
- **Connection Characteristics:** If many long-lived connections exist (e.g., gRPC or WebSockets), you may want slightly higher concurrency.
### Recommended Tuning Strategy

- **1 vCPU** → Set concurrency to `1`.
- **2 vCPUs** → Set concurrency to `2`.
- **4 vCPUs** → Set concurrency to `4`.
- **8+ vCPUs** → Consider keeping concurrency at `8`, unless traffic volume justifies more (see the launcher sketch below).
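Outside Nomad, the same mapping can be applied when launching an edge Envoy directly on a host. This is a minimal sketch under those assumptions; the config path is illustrative:

```bash
#!/usr/bin/env bash
# Match edge-proxy workers to the host's cores, capped at 8 per the
# tiers above. /etc/envoy/edge.yaml is a placeholder path.
CORES=$(nproc)
WORKERS=$(( CORES > 8 ? 8 : CORES ))
exec envoy -c /etc/envoy/edge.yaml --concurrency "$WORKERS"
```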
### Additional Optimization Notes

- **Autoscaling:** If deploying in Nomad, consider auto-scaling based on request latency, CPU load, and the number of active connections.
- **Performance Bottlenecks:** If CPU usage is consistently high, increasing concurrency will not help. Instead, optimize filters and connection pooling, and avoid blocking filters.
- **TLS Handshakes:** Edge proxies often terminate TLS, which is CPU-intensive. Ensure CPU is appropriately allocated if handling high numbers of HTTPS requests; the stats check below can help quantify handshake load.
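To estimate how much TLS handshake work the proxy is absorbing, you can sample Envoy's per-listener `ssl.handshake` counters over an interval (admin port 19000 assumed; the stat appears under each TLS listener's prefix):

```bash
# Sample TLS handshake counters twice, 10 seconds apart; the delta
# approximates the handshake rate hitting the edge proxy.
curl -s localhost:19000/stats | grep 'ssl\.handshake'
sleep 10
curl -s localhost:19000/stats | grep 'ssl\.handshake'
```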
### TL;DR for Edge Proxies

- Set concurrency to match CPU cores (e.g., 4 vCPUs → `--concurrency 4`).
- Setting concurrency too high wastes memory and hurts connection pool efficiency.
- Use Nomad autoscaling for dynamic concurrency tuning.
## Summary of Recommendations

| Proxy Type | Default (`--concurrency`) | When to Scale Up? |
|---|---|---|
| Sidecar Proxy | `1` | If handling 500+ RPS **and** CPU allocation is at least 500 MHz; scale to `2-4` based on load. |
| Edge Proxy | Match CPU cores | If handling high sustained traffic; optimize based on CPU/memory/latency. |
## References

- Medium: Envoy Proxy Threading Model
- Envoy Docs: Envoy Listener Threading Model