Introduction
This article addresses an issue where the tfe-fluent-bit container in Terraform Enterprise (TFE) hangs when using instance profile credentials for log forwarding to AWS S3 or CloudWatch.
Prerequisites
- Terraform Enterprise release v202112-1 or later (which includes fluent-bit version 1.8.10).
- An AWS environment for the TFE installation.
- Log forwarding enabled with the S3 or CloudWatch output plugin.
Problem
The tfe-fluent-bit container becomes unresponsive when configured to use the S3 or CloudWatch output plugin with AWS instance profile credentials.
To confirm this issue, you can enable debug logging for fluent-bit.
- Find the path to the fluent-bit configuration file on the host.
## If jq is installed
$ TFE_FLUENT_BIT_CONFIG=$(docker inspect -f '{{json .HostConfig.Binds}}' tfe-fluent-bit | jq .[0] | awk -F':' '{print $1}' | sed 's/"//g')
## If jq is not installed
$ TFE_FLUENT_BIT_CONFIG=$(docker inspect -f '{{index (index .HostConfig.Binds) 0}}' tfe-fluent-bit | awk -F':' '{print $1}' | sed 's/"//g')
- Edit the configuration file to enable debug logging. This may be time-sensitive if the container is in a restart loop.
$ sudo vim "$TFE_FLUENT_BIT_CONFIG"
- In the [SERVICE] block, add the Log_Level setting.
[SERVICE]
    Parsers_File /fluent-bit/etc/parsers.conf
    HTTP_Server On
    HTTP_Listen 0.0.0.0
    HTTP_Port 2020
    Health_Check On
    Log_Level debug
- Save the changes and restart the container.
$ docker restart tfe-fluent-bit
- Check the logs for an HTTP 401 error when fluent-bit attempts to retrieve credentials from the EC2 metadata service.
$ docker logs -f tfe-fluent-bit
##...
[info] [engine] started (pid=25)
[debug] [aws_credentials] Initialized EC2 Provider in standard chain
[debug] [aws_credentials] Sync called on the STS provider
[debug] [aws_credentials] Sync called on the EC2 provider
##...
[debug] [aws_credentials] Init called on the EC2 IMDS provider
[debug] [aws_credentials] requesting credentials from EC2 IMDS
[debug] [http_client] not using http_proxy for header
[debug] [http_client] server 169.254.169.254:80 will close connection #27
[debug] [aws_client] (null): http_do=0, HTTP Status: 401
[debug] [http_client] not using http_proxy for header
##...
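The log check in the last step can also be scripted. The following is a minimal sketch that assumes the debug log format shown above; `check_imds_401` is a hypothetical helper name and not part of fluent-bit or TFE. It reads log lines on standard input and succeeds only if an `HTTP Status: 401` line appears after the IMDS credential request.

```shell
# Hypothetical helper (not part of fluent-bit or TFE).
# Reads fluent-bit debug log lines on stdin and exits 0 only if an
# "HTTP Status: 401" line appears after the EC2 IMDS credential request.
check_imds_401() {
  awk '
    /requesting credentials from EC2 IMDS/ { requested = 1 }
    requested && /HTTP Status: 401/        { found = 1; exit }
    END { exit (found ? 0 : 1) }
  '
}

# Example usage on the TFE host (docker logs writes to stdout and stderr):
#   docker logs tfe-fluent-bit 2>&1 | check_imds_401 && echo "IMDS returned 401"
```

Matching on the sequence of the two lines, rather than on any 401 in the log, reduces the chance of flagging an unrelated HTTP error.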
Cause
The EC2 instance metadata service (IMDS) has a default HttpPutResponseHopLimit of 1. The connection from the tfe-fluent-bit container must traverse the TFE outbound proxy (ptfe_outbound_http_proxy or tfe-outbound-http-proxy), which requires two hops. The request is denied because it exceeds the default hop limit.
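One way to observe the effect of the hop limit is to request an IMDSv2 session token with curl. The sketch below uses the standard IMDSv2 token endpoint and TTL header; the function name `imds_token_status`, the 60-second token TTL, and the 5-second timeout are illustrative choices, not values taken from TFE. Run directly on the EC2 host, the request normally returns `200`. Run from a process that sits behind an additional network hop, it would be expected to fail or time out while the hop limit is 1, in which case curl reports `000`.

```shell
# Hypothetical helper: print the HTTP status code returned by the
# IMDSv2 token endpoint, or 000 if no response arrives within 5 seconds.
imds_token_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 5 \
    -X PUT "http://169.254.169.254/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 60"
}

# Example usage:
#   status=$(imds_token_status)
#   echo "IMDS token endpoint returned: $status"
```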
You can verify the current hop limit setting for your instance.
$ aws ec2 describe-instances --instance-ids i-XXXXXX | jq -r '.Reservations[].Instances[].MetadataOptions'
{
"State": "applied",
"HttpTokens": "optional",
"HttpPutResponseHopLimit": 1,
"HttpEndpoint": "enabled",
"HttpProtocolIpv6": "disabled",
"InstanceMetadataTags": "disabled"
}
Solution
To resolve this issue, increase the HttpPutResponseHopLimit to 2 for the TFE EC2 instance.
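Before applying the change, it can be useful to confirm that it is needed. The sketch below reads the current value with the AWS CLI's built-in `--query` option, so it does not depend on jq. The helper names `current_hop_limit` and `needs_hop_limit_increase` are hypothetical, and the instance ID is a placeholder.

```shell
# Hypothetical helper: print the current HttpPutResponseHopLimit
# for the instance ID passed as the first argument.
current_hop_limit() {
  aws ec2 describe-instances \
    --instance-ids "$1" \
    --query 'Reservations[].Instances[].MetadataOptions.HttpPutResponseHopLimit' \
    --output text
}

# Hypothetical helper: exit 0 if the given limit is below the target of 2.
needs_hop_limit_increase() {
  [ "$1" -lt 2 ]
}

# Example usage (i-XXXXXX is a placeholder):
#   limit=$(current_hop_limit i-XXXXXX)
#   needs_hop_limit_increase "$limit" && echo "hop limit is $limit; increase it to 2"
```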
- Modify the instance metadata options using the AWS CLI.
$ aws ec2 modify-instance-metadata-options \
    --instance-id i-XXXXX \
    --http-put-response-hop-limit 2
- Restart the tfe-fluent-bit container for the change to take effect.
$ docker restart tfe-fluent-bit
- Verify that the stream processor starts successfully by checking the logs.
$ docker logs -f tfe-fluent-bit
##...
[debug] [aws_credentials] Requesting credentials for instance role $your_role
[debug] [imds] using IMDSv2
##...
[debug] [aws_credentials] upstream_set called on the EC2 provider
[debug] [router] match rule systemd.0:s3.0
[info] [http_server] listen iface=0.0.0.0 tcp_port=2020
[info] [sp] stream processor started
- Optionally, use the fluent-bit API to confirm that records are being processed.
$ curl -s "http://$(docker inspect tfe-fluent-bit | jq -r '.[].NetworkSettings.Networks[].IPAddress'):2020/api/v1/metrics" | jq .
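The log verification step above can be automated with a small polling loop. This is a sketch under stated assumptions: `wait_for_stream_processor` is a hypothetical helper, the retry count and 3-second delay are arbitrary, and the `--since` window is used so that log lines from before the restart are less likely to produce a false positive.

```shell
# Hypothetical helper: poll recent container logs until the
# "stream processor started" line appears, or give up.
#   $1 container name (default: tfe-fluent-bit)
#   $2 number of attempts (default: 10)
#   $3 log window passed to `docker logs --since` (default: 5m)
wait_for_stream_processor() {
  container="${1:-tfe-fluent-bit}"
  attempts="${2:-10}"
  since="${3:-5m}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if docker logs --since "$since" "$container" 2>&1 \
        | grep -q 'stream processor started'; then
      return 0
    fi
    i=$((i + 1))
    sleep 3
  done
  return 1
}

# Example usage, right after restarting the container:
#   docker restart tfe-fluent-bit
#   wait_for_stream_processor tfe-fluent-bit 10 2m && echo "fluent-bit is running"
```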