Fluentd Architecture and Key Concepts
In the previous article, we explored what Fluentd is and why it's an essential tool for log management and data collection.
With this solid foundation, we can switch gears and take a look at some fundamental Fluentd concepts. Having a solid understanding of why and where you would want to use each of these concepts can greatly improve your experience.
Fluentd uses a plugin-based architecture, which means each component is “pluggable” and can be swapped out for custom logic as long as it adheres to Fluentd’s specifications.
Plugins are typically written in Ruby. While it is not the primary focus of this series, later on, we will take a look at how to write one.
Fluentd ships with a set of core plugins, which we'll go into more detail about shortly. At a high level, however, Fluentd’s architecture can be represented as in the diagram below:
Figure 1: Fluentd architecture
Understanding input plugins
Input plugins serve as an entry point for your logs; they take logs from a given source and pass them along to the next plugin in the pipeline.
To better understand this, let's take a look at one of the most popular input plugin configurations, the tail input plugin:
<source>
  @type tail
  path /var/log/httpd-access.log
  pos_file /var/log/td-agent/httpd-access.log.pos
  tag apache.access
  <parse>
    @type apache2
  </parse>
</source>
Every input plugin begins with a type, which is specified using the @type directive. Next, you specify where the plugin should fetch logs from using the path directive; in the example above, that is /var/log/httpd-access.log.
The pos_file directive gives the tail plugin a file in which to track its current read position. If Fluentd restarts, it consults the pos_file to determine where to resume reading from.
The tag directive is used to identify a source in a given configuration file. We'll take a deeper look at tags later in the series.
Finally, the <parse> section enables Fluentd to use one of its inbuilt parser plugins. Recall from the diagram above that logs are sent to a parser before filtering. Fluentd supports multiple parsers, of which apache2 is one; others include JSON, Nginx, and msgpack, to name a few.
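To see where the tag comes into play, here's a minimal sketch (not part of the configuration above) that routes the apache.access events from that source to standard output using the built-in stdout output plugin:
<match apache.access>
  @type stdout
</match>
Any event whose tag matches apache.access is picked up by this block, which is exactly how Fluentd ties sources to downstream processing.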
Filter plugins
Filter plugins are an optional but powerful part of Fluentd’s pipeline. They allow you to manipulate and transform log data before it proceeds further downstream. While you may not always need them, filter plugins can be invaluable when you want to ensure only relevant or processed logs reach their final destination.
Let’s take a look at a configuration example using the grep filter plugin:
<filter foo.bar>
  @type grep
  <regexp>
    key message
    pattern /cool/
  </regexp>
</filter>
In this example, the <filter> directive defines a filter plugin. Similar to input plugins, every filter plugin begins with a @type directive to specify the type of filter. Here, the grep plugin is used to filter logs based on patterns.
The foo.bar tag determines which logs this filter applies to. Fluentd matches this tag against events processed earlier in the pipeline, typically from an input plugin. If the tag matches, the filter processes the logs.
The <regexp> section defines a regular expression to apply to log entries: key names the field to inspect, and pattern is the expression to match against it. In this case, the filter checks whether the message field contains the word cool. Logs that do not meet this condition are discarded, ensuring that only relevant logs continue through the pipeline.
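The grep plugin can also work in the opposite direction. As a quick sketch (the debug pattern here is purely illustrative), an <exclude> section drops events whose field matches the pattern instead of keeping them:
<filter foo.bar>
  @type grep
  <exclude>
    key message
    pattern /debug/
  </exclude>
</filter>
With this in place, any event whose message field matches debug is removed from the pipeline.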
Why use filter plugins?
Filter plugins are optional because not every use case requires filtering or transformation. However, here are a few scenarios where they become important:
- Log Reduction: Reducing noise by discarding unnecessary logs.
- Enrichment: Adding metadata or modifying log entries before storage (see the sketch below).
- Data Masking: Anonymizing or masking sensitive data for security and compliance purposes.
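For the enrichment case, here's a minimal sketch using the built-in record_transformer filter plugin (the app.** tag pattern is illustrative); it stamps each matching event with the host it came from and its original tag:
<filter app.**>
  @type record_transformer
  <record>
    hostname "#{Socket.gethostname}"
    original_tag ${tag}
  </record>
</filter>
The "#{...}" form is embedded Ruby evaluated once when the configuration is loaded, while ${tag} is a placeholder expanded per event.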
Output plugins
Output plugins are arguably the most critical part of Fluentd's plugin architecture. They are responsible for writing logs to their final destination, which can range from local files to cloud storage or databases.
Fluentd supports various output modes to suit different use cases, including Non-Buffered, Synchronous Buffered, and Asynchronous Buffered modes.
- Non-Buffered Mode: This mode writes logs immediately to the destination without any intermediate buffering. It’s simple but may not be optimal for high-throughput scenarios.
- Synchronous Buffered Mode: In this mode, Fluentd stages log data into chunks and queues them for delivery. The behavior of the buffer is configured in the <buffer> section (a minimal sketch follows this list).
- Asynchronous Buffered Mode: This mode also stages and queues log chunks but commits them to the destination asynchronously. It’s designed for performance and scalability, especially with remote or cloud storage systems.
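To make buffered output concrete, here's a minimal sketch of a synchronous buffered configuration using the built-in file output plugin; the path and timing values are illustrative:
<match app.**>
  @type file
  path /var/log/fluent/app
  <buffer time>
    timekey 1d        # one chunk per day
    timekey_wait 10m  # wait 10 minutes for late-arriving events
  </buffer>
</match>
Events are staged into time-keyed chunks and flushed to the destination once each day's chunk is complete.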
The diagram below shows how the buffer operates in the pipeline:
Figure 2: Fluentd buffer in the pipeline
Using the S3 output plugin
While the built-in Fluentd file output plugin is easy to configure, scaling systems often require a more durable and distributed storage solution. For this reason, cloud storage like Amazon S3 becomes an attractive choice. Let’s look at an example configuration for the S3 output plugin:
<match pattern>
  @type s3
  aws_key_id YOUR_AWS_KEY_ID
  aws_sec_key YOUR_AWS_SECRET_KEY
  s3_bucket YOUR_S3_BUCKET_NAME
  s3_region ap-northeast-1
  path logs/
  # if you want to use ${tag} or %Y/%m/%d/ like syntax in path / s3_object_key_format,
  # need to specify tag for ${tag} and time for %Y/%m/%d in <buffer> argument.
  <buffer tag,time>
    @type file
    path /var/log/fluent/s3
    timekey 3600 # 1 hour partition
    timekey_wait 10m
    timekey_use_utc true # use utc
    chunk_limit_size 256m
  </buffer>
</match>
Every output plugin begins with a type specified using the @type directive. In the example above, the s3 plugin is used to send logs to an Amazon S3 bucket.
Next, you provide credentials for AWS authentication using the aws_key_id and aws_sec_key directives. These should correspond to your AWS account and are required to write logs to the specified S3 bucket.
The s3_bucket directive identifies the bucket where Fluentd will store logs, while the path directive specifies the key prefix within the bucket. In this example, logs are stored under the logs/ prefix.
The <buffer> section is where Fluentd’s buffering capabilities come into play. This section defines how logs are staged before being uploaded to S3. For example:
- The timekey directive specifies a time-based partitioning interval. In this configuration, Fluentd creates a new file every hour (3600 seconds).
- The timekey_wait directive delays flushing by 10 minutes to capture late-arriving events.
- The chunk_limit_size directive defines the maximum size of each buffer chunk. Here, a limit of 256 MB ensures optimal use of resources and minimizes API calls to S3.
Finally, the path directive within <buffer> specifies the local path where Fluentd stages logs temporarily. If Fluentd restarts or fails, it can resume from this staging area, ensuring no logs are lost.
Recall from the diagram above that logs pass through a buffer stage before reaching their destination. The S3 output plugin takes full advantage of Fluentd's buffering modes, making it a reliable option for production environments.
Looking ahead
Now that we have a solid grasp of Fluentd's architecture and key components, the next logical step is to get hands-on with Fluentd.
In the upcoming part of this series, we will guide you through installing Fluentd on various platforms. This will help solidify your understanding and get you ready to start implementing Fluentd in your environment.