Description
Bug Report
Approximately one hour after fluentbit starts, all fluentbit_ internal metrics stop being written by the prometheus_remote_write output. These are all of the metrics produced by the fluentbit_metrics input. This continues indefinitely until fluentbit is restarted; the existing process never resumes sending these metrics.
Metrics from any other metric-producing inputs, such as prometheus_scrape and prometheus_textfile, continue to be sent normally. Also, if a prometheus_exporter output is configured, the fluentbit_metrics metrics are still exported there (a sample output for that cross-check is shown below).
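An output along these lines is enough for that cross-check (a sketch; host/port are the plugin's documented defaults, not my exact config):

  outputs:
    - name: prometheus_exporter
      match: 'metrics_*'
      # exposes matched metrics over HTTP, by default at http://0.0.0.0:2021/metrics
      host: 127.0.0.1
      port: 2021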
To Reproduce
I can reproduce this with a minimal configuration running on my local MacBook.
After starting Victoria Metrics listening on localhost:8428, I run fluent-bit with this config:
---
service:
  flush: 1
  daemon: Off
  log_level: debug
  # Enable/Disable the built-in HTTP Server for metrics
  http_server: Off
  http_listen: 127.0.0.1
  http_port: 2020
pipeline:
  inputs:
    - name: fluentbit_metrics
      tag: metrics_fluentbit
      scrape_interval: 60s
  outputs:
    - name: prometheus_remote_write
      match: 'metrics_*'
      host: localhost
      port: 8428
      uri: /api/v1/write
      retry_limit: 2
      log_response_payload: True
      tls: Off
      add_label: job fluentbit2
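For completeness, the reproduction amounts to something like this (a sketch; the binary names and the config filename are assumptions about the local setup):

# Start a single-node Victoria Metrics instance on its default port
victoria-metrics -httpListenAddr=127.0.0.1:8428 &

# Run fluent-bit with the YAML config above, saved as fluent-bit.yaml
fluent-bit -c fluent-bit.yaml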
Metrics such as fluentbit_output_upstream_total_connections and fluentbit_build_info begin appearing immediately, but cease after approximately one hour. After that point, fluentbit continues to log that it is sending prometheus remote writes, and continues to log HTTP status=204 and FLB_OK, but those metrics no longer arrive.
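The cutoff is easy to confirm by querying Victoria Metrics directly, using the Prometheus-compatible query API it serves on the same port; for example:

# Returns the most recent sample while ingestion works, then comes back empty
curl -s 'http://localhost:8428/api/v1/query?query=fluentbit_build_info'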
If I add an additional input with any other metrics, those metrics continue to be sent. For example, I created a file /tmp/node_info.prom with a single static metric (a sample file is shown after the input snippet below) and added this input to the config:
    - name: prometheus_textfile
      tag: metrics_textfile
      path: /tmp/node_info.prom
      scrape_interval: 60s
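A file along these lines works as /tmp/node_info.prom; it is a single static gauge in the Prometheus text exposition format (the metric name and label here are arbitrary):

# TYPE node_info gauge
node_info{source="textfile_test"} 1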
After the fluentbit_ metrics ceased, this one additional metric continued to be sent for as long as the fluentbit process ran, which was more than a day in a couple of my tests.
Your Environment
- Version used: 4.0.3, 4.0.8, 4.1.1 (I reproduced with the same minimal config in all three of these versions)
- Configuration: See above
- Server type and version: MacBook Pro and AWS EC2
- Operating System and version: macOS Sequoia 15.7.1 and Amazon Linux 2023
Additional context
We started observing this issue earlier this month. We use the fluentbit_ metrics to monitor fluentbit and alert on problems, but we can no longer do so because these metrics are not being sent consistently.