Audit logs are missing from Terraform Enterprise – HashiCorp Help Center

Problem

Audit log entries are intermittently missing from the Terraform Enterprise logs. This can affect any audit log entry or component. The following example uses a Sentinel run.

You would expect the following three audit log lines for a Sentinel run, one for each policy_check action:
policy_check - created, queued, passed

"log\":\"2024-02-10 00:58:41 [INFO] [ce17c08e-1225-4d75-a4da-428a4d5e78d2] [dd.service=atlas dd.trace_id=3995258940181134386 dd.span_id=0 ddsource=ruby] [Audit Log] {\\\"resource\\\":\\\"policy_check\\\",\\\"action\\\":\\\"created\\\"
"log\":\"2024-02-10 00:58:52 [INFO] [Audit Log] {\\\"resource\\\":\\\"policy_check\\\",\\\"action\\\":\\\"queued\\\"
"log\":\"2024-02-10 00:58:55 [INFO] [3374520e-833d-429b-a777-25bfd06a5c19] [dd.service=atlas dd.trace_id=1910763846437290669 dd.span_id=0 ddsource=ruby] [Audit Log] {\\\"resource\\\":\\\"policy_check\\\",\\\"action\\\":\\\"passed\\\

In your logs, you see only one audit log entry, the one for the policy_check queued action:

"log\":\"2024-02-10 16:58:53 [INFO] [Audit Log] {\\\"resource\\\":\\\"policy_check\\\",\\\"action\\\":\\\"queued\\\",\\\"resource_id\\\":\\\"polchk-3ucnF1gM4if7nPfK\\\",

Prerequisites

Terraform Enterprise versions earlier than (<) 202402-1

Cause

Inside the Terraform Enterprise container, fluent-bit collects the application logs. When a log line is larger than 32k, fluent-bit fails with the following error.

2024-02-15T13:09:17.764748000Z [2024/02/15 13:09:17] [error] [input:tail:tail.0] file=/var/log/terraform-enterprise/atlas.log requires a larger buffer size, lines are too long. Skipping file.

When this error occurs, fluent-bit crashes and stops picking up logs for a while. Audit log entries written during that period are lost.
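To confirm that oversized lines are the trigger, you can scan the log file named in the error for lines above the buffer limit. The following is a minimal sketch, not an official HashiCorp tool: the function name long_lines is hypothetical, and it assumes that "32k" means 32768 bytes and that you run it inside the Terraform Enterprise container where the file is readable.

```shell
#!/bin/sh
# Hypothetical helper: list every line in FILE that is longer than LIMIT bytes.
# Usage: long_lines FILE [LIMIT_BYTES]
# LIMIT_BYTES defaults to 32768 (assumed meaning of the 32k buffer limit).
long_lines() {
    file=$1
    limit=${2:-32768}
    # LC_ALL=C makes awk's length() count bytes instead of characters.
    LC_ALL=C awk -v limit="$limit" \
        'length($0) > limit { print NR ": " length($0) " bytes" }' "$file"
}
```

For example, long_lines /var/log/terraform-enterprise/atlas.log prints the line number and size of each offending line. Passing 131072 as the second argument checks against a 128k limit instead.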

Solution 1:

Upgrade your TFE environment to version 202402-1. In this release, the buffer limit has been increased from 32k to 128k.

If the issue persists after the upgrade, check whether a single log entry, for example the output of a Sentinel run, exceeds 128k.

Solution 2:

If you are still missing logs, verify that journald is not rate limiting messages. Check whether the journald logs contain a message like the following:

journalctl | grep -i suppressed

[ 3.495723] printk: systemd: 19 output lines suppressed due to ratelimiting

If they do, change the limits by editing the [Journal] section of the file /etc/systemd/journald.conf

RateLimitInterval=0
RateLimitBurst=0

Restart the journald service

sudo systemctl restart systemd-journald
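Before or after the restart, it can help to confirm which rate-limit values are actually set in the file. The sketch below is only an illustration: the function name show_rate_limits is hypothetical, it reads a single file, and it does not account for drop-in files or compiled-in defaults. It also matches RateLimitIntervalSec, the spelling used by newer systemd releases.

```shell
#!/bin/sh
# Hypothetical helper: print the rate-limit settings that are active
# (not commented out) in a journald configuration file.
# Usage: show_rate_limits [CONF_FILE]
show_rate_limits() {
    conf=${1:-/etc/systemd/journald.conf}
    # Match RateLimitInterval, RateLimitIntervalSec, and RateLimitBurst.
    # Lines starting with '#' are skipped because the key must come first.
    grep -E '^[[:space:]]*RateLimit(Interval(Sec)?|Burst)[[:space:]]*=' "$conf"
}
```

If the function prints nothing, both settings are commented out or absent in that file, so journald is not taking them from it. After the restart, rerunning journalctl | grep -i suppressed shows whether new suppression messages still appear.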

Additional Information

KB article related to solution 2 can be found here