Introduction
The tfe-fluent-bit container can hang when log forwarding on AWS relies on the instance profile credentials used for object storage, which the container inherits.
Prerequisites
- AWS
- TFE release v202112-1 or later (fluent-bit 1.8.10)
- Log forwarding enabled with the S3 or CloudWatch output plugin
Problem
- The fluent-bit container hangs when using the S3 or CloudWatch output plugin. To confirm, enable debug-level logging on fluent-bit and restart the container.
# Edit the config file for fluent-bit inside the container. If your container is rebooting, this process is time-sensitive.
# If vim is not installed replace with an available editor, ex: vi or nano.
sudo vim $(docker inspect -f '{{json .HostConfig.Binds}}' tfe-fluent-bit | jq -r '.[0]' | awk -F':' '{print $1}')
# If jq is not installed use the command below instead
sudo vim $(docker inspect -f '{{index (index .HostConfig.Binds) 0}}' tfe-fluent-bit| awk -F':' '{print $1}'|sed 's/"//g')
# At the end of the [SERVICE] block add the log level line as follows
[SERVICE]
Parsers_File /fluent-bit/etc/parsers.conf
HTTP_Server On
HTTP_Listen 0.0.0.0
HTTP_Port 2020
Health_Check On
Log_Level debug
# Save the changes and restart the container
docker restart tfe-fluent-bit
# Check the logs to confirm the issue is present. Look for the EC2 IMDS credential request followed by HTTP Status: 401.
docker logs -f tfe-fluent-bit
Fluent Bit v1.8.10
* Copyright (C) 2019-2021 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io
[2022/01/30 09:15:34] [ info] Configuration:
[2022/01/30 09:15:34] [ info] flush time | 5.000000 seconds
[2022/01/30 09:15:34] [ info] grace | 5 seconds
[2022/01/30 09:15:34] [ info] daemon | 0
[2022/01/30 09:15:34] [ info] ___________
[2022/01/30 09:15:34] [ info] inputs:
[2022/01/30 09:15:34] [ info] systemd
[2022/01/30 09:15:34] [ info] ___________
[2022/01/30 09:15:34] [ info] filters:
[2022/01/30 09:15:34] [ info] modify.0
[2022/01/30 09:15:34] [ info] ___________
[2022/01/30 09:15:34] [ info] outputs:
[2022/01/30 09:15:34] [ info] cloudwatch_logs.0
[2022/01/30 09:15:34] [ info] ___________
[2022/01/30 09:15:34] [ info] collectors:
[2022/01/30 09:15:34] [ info] [engine] started (pid=25)
[2022/01/30 09:15:34] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2022/01/30 09:15:34] [debug] [storage] [cio stream] new stream registered: systemd.0
[2022/01/30 09:15:34] [ info] [storage] version=1.1.5, initializing...
[2022/01/30 09:15:34] [ info] [storage] in-memory
[2022/01/30 09:15:34] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2022/01/30 09:15:34] [ info] [cmetrics] version=0.2.2
[2022/01/30 09:15:34] [debug] [input:systemd:systemd.0] add filter: _SYSTEMD_UNIT=docker.service (or)
[2022/01/30 09:15:34] [debug] [input:systemd:systemd.0] jump to the end of journal and skip 1 last entries
[2022/01/30 09:15:34] [ warn] [input:systemd:systemd.0] seek_cursor failed
[2022/01/30 09:15:34] [debug] [input:systemd:systemd.0] sd_journal library may truncate values to sd_journal_get_data_threshold() bytes: 65536
[2022/01/30 09:15:34] [debug] [filter:modify:modify.0] Initialized modify filter with 0 conditions and 1 rules
[2022/01/30 09:15:34] [debug] [cloudwatch_logs:cloudwatch_logs.0] created event channels: read=25 write=26
[2022/01/30 09:15:34] [debug] [aws_credentials] Initialized Env Provider in standard chain
[2022/01/30 09:15:34] [debug] [aws_credentials] Initialized AWS Profile Provider in standard chain
[2022/01/30 09:15:34] [debug] [aws_credentials] Not initializing EKS provider because AWS_ROLE_ARN was not set
[2022/01/30 09:15:34] [debug] [aws_credentials] Not initializing ECS Provider because AWS_CONTAINER_CREDENTIALS_RELATIVE_URI is not set
[2022/01/30 09:15:34] [debug] [aws_credentials] Initialized EC2 Provider in standard chain
[2022/01/30 09:15:34] [debug] [aws_credentials] Sync called on the STS provider
[2022/01/30 09:15:34] [debug] [aws_credentials] Sync called on the EC2 provider
[2022/01/30 09:15:34] [debug] [aws_credentials] Init called on the STS provider
[2022/01/30 09:15:34] [debug] [aws_credentials] Init called on the env provider
[2022/01/30 09:15:34] [debug] [aws_credentials] Init called on the profile provider
[2022/01/30 09:15:34] [debug] [aws_credentials] Reading shared config file.
[2022/01/30 09:15:34] [debug] [aws_credentials] Shared config file /root/.aws/config does not exist
[2022/01/30 09:15:34] [debug] [aws_credentials] Reading shared credentials file.
[2022/01/30 09:15:34] [debug] [aws_credentials] Shared credentials file /root/.aws/credentials does not exist
[2022/01/30 09:15:34] [debug] [aws_credentials] Init called on the EC2 IMDS provider
[2022/01/30 09:15:34] [debug] [aws_credentials] requesting credentials from EC2 IMDS
[2022/01/30 09:15:34] [debug] [http_client] not using http_proxy for header
[2022/01/30 09:15:34] [debug] [http_client] server 169.254.169.254:80 will close connection #27
[2022/01/30 09:15:34] [debug] [aws_client] (null): http_do=0, HTTP Status: 401
[2022/01/30 09:15:34] [debug] [http_client] not using http_proxy for header
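The log check above can also be scripted. The following is a minimal sketch rather than part of the original procedure: the function name imds_rejected is hypothetical, and it assumes debug logging is enabled so that the aws_client line with HTTP Status: 401 appears as in the excerpt.

```shell
# Reads fluent-bit debug logs on stdin and succeeds (exit 0) if an AWS
# client call returned HTTP 401, the symptom shown in the log excerpt.
# imds_rejected is an illustrative name, not a fluent-bit or TFE command.
imds_rejected() {
  grep -q 'aws_client.*HTTP Status: 401'
}

# Usage (docker logs relays the container's stderr, hence 2>&1):
#   docker logs tfe-fluent-bit 2>&1 | imds_rejected && echo "IMDS request rejected"
```

If the function reports no match, the hang may have a different cause and the full debug log is worth reading.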
Cause
- The default value of HttpPutResponseHopLimit under MetadataOptions is 1, but the connection has to traverse ptfe_outbound_http_proxy (tfe-outbound-http-proxy for TFE v202205-1 and later), which requires 2 hops.
# Verify the value of HttpPutResponseHopLimit
aws ec2 describe-instances --instance-ids i-XXXXXX | jq -r '.Reservations[].Instances[].MetadataOptions'
{
"State": "applied",
"HttpTokens": "optional",
"HttpPutResponseHopLimit": 1,
"HttpEndpoint": "enabled",
"HttpProtocolIpv6": "disabled",
"InstanceMetadataTags": "disabled"
}
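The same check can be done without jq by using the AWS CLI's built-in --query option. This is a sketch under assumptions: hop_limit_sufficient is a hypothetical helper name, i-XXXXXX is a placeholder instance ID, and the threshold of 2 comes from the hop count described above.

```shell
# Succeeds when the given hop limit allows a metadata request that passes
# through one intermediate hop, i.e. the value is at least 2.
# Non-numeric or empty input is treated as insufficient.
hop_limit_sufficient() {
  [ "$1" -ge 2 ] 2>/dev/null
}

# Usage (i-XXXXXX is a placeholder instance ID):
#   limit=$(aws ec2 describe-instances --instance-ids i-XXXXXX \
#     --query 'Reservations[].Instances[].MetadataOptions.HttpPutResponseHopLimit' \
#     --output text)
#   hop_limit_sufficient "$limit" || echo "hop limit '$limit' is too low for the container"
```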
Solution
- Change the value of HttpPutResponseHopLimit to 2, restart the fluent-bit container, and check the fluent-bit logs.
# Change the setting via the AWS CLI.
aws ec2 modify-instance-metadata-options --instance-id i-XXXXX --http-put-response-hop-limit 2
docker restart tfe-fluent-bit
docker logs -f tfe-fluent-bit
# Verify the stream processor has started
[2022/02/01 16:02:59] [debug] [http_client] not using http_proxy for header
[2022/02/01 16:02:59] [debug] [http_client] server 169.254.169.254:80 will close connection #34
[2022/02/01 16:02:59] [debug] [aws_credentials] Requesting credentials for instance role $your_role
[2022/02/01 16:02:59] [debug] [imds] using IMDSv2
[2022/02/01 16:02:59] [debug] [http_client] not using http_proxy for header
[2022/02/01 16:02:59] [debug] [http_client] server 169.254.169.254:80 will close connection #34
[2022/02/01 16:02:59] [debug] [aws_credentials] upstream_set called on the EC2 provider
[2022/02/01 16:02:59] [debug] [router] match rule systemd.0:s3.0
[2022/02/01 16:02:59] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
[2022/02/01 16:02:59] [ info] [sp] stream processor started
# Verify using the fluent-bit API that records are being processed successfully
curl -s http://$(docker inspect tfe-fluent-bit | jq -r '.[].NetworkSettings.Networks[].IPAddress'):2020/api/v1/metrics | jq .
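To turn the metrics output into a quick pass/fail check, the error counters can be summed. The sketch below is illustrative only: sum_output_errors is a hypothetical helper, and it assumes the metrics response reports an "errors" counter for each output plugin. Confirm the field names against the actual response from your fluent-bit version.

```shell
# Reads the JSON body of /api/v1/metrics on stdin and prints the sum of
# every "errors" counter found in it. Assumes (not verified here) that
# only output plugins report a field named "errors".
sum_output_errors() {
  grep -o '"errors": *[0-9][0-9]*' |
    awk -F: '{ total += $2 } END { print total + 0 }'
}

# Usage: pipe the curl command shown above (without the trailing jq) into it:
#   curl -s http://<container-ip>:2020/api/v1/metrics | sum_output_errors
```

A result of 0 together with growing record counts in the metrics suggests the output plugin is delivering logs; a non-zero and increasing value points back to the fluent-bit logs.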