When Vault uses Consul as its storage backend (or uses Integrated Storage, which is Raft-based), Vault can attempt to save data that exceeds the maximum entry or request size configured for that backend. These requests fail with an error similar to this:
* rpc error: code = Unknown desc = failed to create an entity for the authenticated alias: failed to persist packed storage entry: Failed request: Request body(524401 bytes) too large, max size: 524288 bytes.
These issues can happen if:
- You use Consul as a storage backend and have kv_max_value_size set too low in the Consul configuration.
- You use Vault's Integrated Storage backend and have max_entry_size configured too low.
- Some Vault activity is causing unreasonably large data to be saved.
It can also happen in the context of replication if the KV size settings listed above are larger on the primary cluster than on the secondary cluster. The error message states both the size of the request and the configured maximum, so based on that information you may need to tune the maximum values, investigate what is causing excessively large values to be saved in Vault, or both.
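As a minimal sketch of reading those two numbers out of the error, the Python snippet below extracts the request size and the configured maximum from a message shaped like the one shown above. The function name `parse_size_error` and the regular expression are illustrative assumptions, not part of Vault or Consul, and the pattern only matches the wording of that sample message.

```python
import re

# Assumption: the error contains a fragment worded like the sample message,
# e.g. "Request body(524401 bytes) too large, max size: 524288 bytes".
SIZE_PATTERN = re.compile(
    r"Request body\s*\((\d+) bytes\) too large, max size: (\d+) bytes"
)


def parse_size_error(message):
    """Return (body_bytes, max_bytes), or None if the fragment is absent."""
    match = SIZE_PATTERN.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


error = (
    "failed to persist packed storage entry: Failed request: "
    "Request body(524401 bytes) too large, max size: 524288 bytes."
)
body_bytes, max_bytes = parse_size_error(error)
print(body_bytes - max_bytes)  # 113 bytes over the limit
```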
If the request body only slightly exceeds the configured maximum, it is probably enough to raise the relevant setting: txn_max_req_len for Consul 1.7.2 or later, kv_max_value_size for Consul 1.7.1 or earlier, or max_entry_size for Vault Integrated Storage.
If the difference is extreme, and the requests contain unreasonably large sets of data, they should be investigated to determine if a workflow change of some kind is in order.
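The distinction between a small and an extreme overage can be sketched as a simple check. The helper name `triage` and the 10% cutoff below are hypothetical choices made for illustration; the article does not define a numeric threshold, so pick one that suits your environment.

```python
def triage(body_bytes, max_bytes, close_ratio=0.10):
    """Classify an oversized request by how far it exceeds the limit.

    close_ratio is a hypothetical cutoff: overages up to this fraction
    of the configured maximum are treated as "close".
    """
    if body_bytes <= max_bytes:
        return "within limit"
    overage = body_bytes - max_bytes
    if overage <= max_bytes * close_ratio:
        return "close"    # raising the configured maximum is likely enough
    return "extreme"      # investigate what is writing such large entries


print(triage(524401, 524288))     # close
print(triage(5_000_000, 524288))  # extreme
```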
Keep in mind that the larger the configured maximum is, the more room you allow for I/O delays across the entire cluster, which can lead to leadership instability across the Consul nodes. This can in turn affect Vault stability, because Vault needs a stable storage backend (with quorum) to read and write secret data on behalf of client requests.
Increasing this value does not automatically mean that I/O delays will occur, but it does allow larger KV entries to be written to the storage backend. If you have workloads that regularly create or update high volumes of large KV entries, you are at increased risk of I/O delays that could affect leadership stability.
For more details, please refer to the following relevant documentation pages: