On most Linux-type systems, a combination of CLI utilities such as grep, cut, sort & uniq can be used to find, scope, and count particular conditions of interest, for example errors in the Vault Operational Log.
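For example, rough counts can be taken with such tools alone (vault.log being a hypothetical file name here):
grep -c '"@level":"error"' vault.log
grep -o '"@module":"[^"]*"' vault.log | sort | uniq -c | sort -n
The first returns the number of error-level lines; the second tallies entries per module.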
When the Vault Operational Log is in JSON format, much of the required counting and other analysis can be gathered using jq alone, which is available for both Windows & Linux operating systems.
The JSON schema of Vault Operational Logs is like:
{
"@level":"...",
"@message":"...",
"@module":"...",
"@timestamp":"...",
"...":"...",
...
}
Where possible ranges and example values for each are:
level:
info, debug, error, trace
message:
arbitrary string with the message of the event or activity
module:
core,core.snapshotmgr,core...,audit,activity,identity,expiration,mfa,sealwrap,token,storage.raft,storage.raft...,auth...,auth.approle...tidy,auth.plugin...,auth.ldap...,secrets.database...,replication...
ISO timestamp:
2025-12-31T00:00:00.889487Z
Other Key-Value type pairs that follow can vary and are subject to the emitter.
Example entries at both error & info levels can resemble:
{"@level":"error","@message":"failed to install snapshot","@module":"storage.raft","@timestamp":"2025-12-28T13:24:39.817387Z","error":"dial tcp 10.1.1.149:8201: i/o timeout","id":"bolt-snapshot"}
{"@level":"info","@message":"revoked lease","@module":"expiration","@timestamp":"2025-12-28T13:24:40.992509Z","lease_id":"auth/aws/site1/login/h..."}The common structure between the two stops timestamp where subsequent values that follow after in the case of error level are with with an error KV as well as an id (specific to this example) where as in the case of info level revoke what follows is lease_id with the reference of the lease that was revoked.
This article details some example jq queries intended to assist Vault administrators with devising their own. All references to $LOG denote a source log file with the expected JSON entries; multiple log files can be stipulated in succession with the next file name, eg: $LOG $LOG2 $LOG3 ..., etc.
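For instance, a query of this shape runs across several files at once; the del filter below is an illustrative choice (not one of this article's queries) that strips the four fixed fields so only the emitter-specific pairs remain:
jq -c 'del(."@level", ."@message", ."@module", ."@timestamp")' $LOG $LOG2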
Extract JSON from mixed Logs
JSON lines may be found amidst other mixed content, for example the typical Vault startup header(s), which are not JSON:
==> Vault server configuration:
Api Address: https://...:8200
Cgo: disabled
Cluster Address: https://...:8201
Go Version: go1.16.15
Listener 1: tcp (addr: "0.0.0.0:8200", cluster address: "0.0.0.0:8201", ...)
Log Level: info
Mlock: supported: true, enabled: false
Recovery Mode: false
Storage: raft (HA available)
Version: Vault v1.8.12+ent
Version Sha: 709c4f28369da19d13d4e540f0860db48b8c9f10
==> Vault server started! Log data will stream in below:
{"@level":"info","@message":"proxy environment","@timestamp":"2025-11-29T12:01:11.320091Z","http_proxy":"","https_proxy":"","no_proxy":""}
{"@level":"info","@message":"using autoloaded license","@module":"core","@timestamp":"2025-11-29T12:01:11.582848Z","license":"{\"license_id\":\"...\",\"customer_id\":\"...\",\"installation_id\":\"*\",\"issue_time\":\"...\",\"start_time\":\"...\",\"expiration_time\":\"...\",\"flags\":{\"modules\":[]},\"features\":[\"DR Replication\",\"Namespaces\",\"Lease Count Quotas\",\"Automated Snapshots\"],\"performance_standby_count\":0}"}
{"@level":"info","@message":"Initializing new log shipper","@module":"replication.perf.logshipper","@timestamp":"2025-11-29T12:01:11.585814Z","max_bytes":3343111782,"max_elements":16384}
{"@level":"info","@message":"Initializing new log shipper","@module":"replication.dr.logshipper","@timestamp":"2025-11-29T12:01:11.585950Z","max_bytes":3343111782,"max_elements":16384}
{"@level":"info","@message":"raft retry join initiated","@module":"core","@timestamp":"2025-11-29T12:01:11.590508Z"}
{"@level":"info","@message":"stored unseal keys supported, attempting fetch","@module":"core","@timestamp":"2025-11-29T12:01:11.590574Z"}
[mysql] 2025/11/29 12:01:11 packets.go:122: closing bad idle connection: EOF
[other... system... events...] ....
{"@level":"warn","@message":"failed to unseal core","@timestamp":"2025-11-29T12:01:11.590944Z","error":"stored unseal keys are supported, but none were found"}
...or other journal logs that have been merged in. The non-JSON content may be stripped out using a query like:
jq -nR 'inputs|sub("^[^{]+"; "")|try fromjson catch empty|select(has("@module"))' file1.log file2.log ... > logs_cleaned.json
The logs_cleaned.json version can then be used from there on; otherwise, when needing to deal with the original mixed logs, the example queries provided below can be prefixed with this query.
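As an illustration of that prefixing, the cleanup query can be combined with, say, the error selection from the next section to pull errors straight out of a mixed log:
jq -nR 'inputs|sub("^[^{]+"; "")|try fromjson catch empty|select(has("@module"))|select(."@level"=="error")' $LOG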
Select by Level
To scope logs to only errors, a select can be performed, such as:
jq -c 'select(."@level"=="error")' $LOGSelect by Message
Select by Message
To select only events related to a specific message, one can perform:
jq -c 'select(."@message" | contains("lease renewal failed"))' $LOGTo obtain a total count for the above use jq map & length functions such as:
jq -s 'map(select(."@message" | contains("lease renewal failed")))| length' $LOGSelect by Module
An example selection of all internal .tidy related events can be made using:
jq -c 'select(."@module" | strings | contains(".tidy"))' $LOGThis will return all tidy cases related to all authentication and secrets mounts.
A count of the selected .tidy entries can be obtained with a map & length adaptation similar to what was demonstrated earlier, as per:
jq -s 'map(select(."@module" | strings | contains(".tidy")))|length' $LOGSummary Timeline & Ops
A brief summary of the total time span, total operations and average operations per second can be produced using a query similar to:
jq -s 'map(select(."@level"))|length as $L| sort_by(."@timestamp")|
(.[-1]."@timestamp"|sub("\\..*Z";"Z")|sub("\\..*+";"Z")|fromdate) as $d1|
(.[0]."@timestamp"|sub("\\..*Z";"Z")|sub("\\..*+";"Z")|fromdate) as $d2|
($d1-$d2) as $d3|
{
time_start: .[0]."@timestamp", time_ended: .[-1]."@timestamp",
span_seconds: $d3, span_minutes: ($d3/60), span_hours: ($d3/3600), span_days: ($d3/86400),
operations: $L, ops_average: ($L/$d3)
}' $LOG
Tidy Rates, Frequency & Totals
To get the number of tidy operations occurring per second, showing peak periods of revocation activity, perform:
jq -sc 'map(select(."@module" | contains(".tidy")) | ."@timestamp" | sub("\\..*";"")) | group_by(.)| map([(first),(length)])| .[]' $LOG
A substitution is made using the sub function so as to remove the fractional portion of the timestamp.
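The effect of that substitution can be previewed in isolation against a single literal timestamp:
echo '"2025-12-28T13:24:39.817387Z"' | jq 'sub("\\..*";"")'
# // "2025-12-28T13:24:39"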
Tidy Rates, Frequency & Totals with mixed Logs
To obtain a similar total as the previous example but using the original mixed format log use:
jq -cnRr '[ inputs | sub("^[^{]+"; "") | try fromjson catch empty |
select(has("@module")) | select(."@module" | contains(".tidy")) |
."@timestamp" | sub("\\..*";"") ] | group_by(.)[] | "\(.[0]) \(length)"' $LOGOperations Per Second
Obtaining an operational rate (events per second / QPS) is possible with a query similar to:
# // OPS chronological
jq -sr 'map(select(has("@level"))."@timestamp"|sub("\\..*Z";"Z")|sub("\\..*+";"Z"))|group_by(.)[]|"\(first): \(length)"' $LOGThe above returns events in the chronological order of their input. A low to high ordered list can be gathered using this query instead:
# // OPS low to high
jq -sr 'map(select(has("@level"))."@timestamp"|sub("\\..*Z";"Z")|sub("\\..*+";"Z"))|group_by(.)|sort_by(length)[]|"\(first) \(length)"' $LOGMost Frequent Modules
A grouped count of all module types can be gathered using the query below, so as to spot the particular portions of the system that may be contributing the most to overall activity.
jq -sr 'map(select(has("@module"))|{operation: ."@module"})|group_by(.operation)|sort_by(length)[]|"\(first) \(length)"' $LOGOnline play.jqlang.org
There are resources available online to preview and test jq queries, and the earlier example Tidy Rates, Frequency & Totals with mixed Logs can be found at the following URL:
All the examples provided in this article may also be trialed using the provided sample data.
Resources
Vault KB: Vault Operational Log analysis using jq and general use CLI tools
jq official: jq Manual (development version)