Fluentd Architecture and Key Concepts
In the previous article, we explored what Fluentd is and why it's an essential tool for log management and data collection.
With this solid foundation, we can switch gears and take a look at some fundamental Fluentd concepts. Having a solid understanding of why and where you would want to use each of these concepts can greatly improve your experience.
Fluentd uses a plugin-based architecture, which means each component is “pluggable” and can be swapped out for custom logic as long as it adheres to Fluentd’s specifications.
Plugins are typically written in Ruby. While it is not the primary focus of this series, later on, we will take a look at how to write one.
Fluentd ships with a set of core plugins, which we'll go into more detail about shortly. At a high level, however, Fluentd’s architecture can be represented as in the diagram below:
Figure 1: Fluentd architecture
Understanding input plugins
Input plugins serve as an entry point for your logs; they take logs from a given source and pass them along to the next plugin in the pipeline.
To better understand this, let's take a look at one of the most popular input plugin configurations, the tail input plugin:
<source>
  @type tail
  path /var/log/httpd-access.log
  pos_file /var/log/td-agent/httpd-access.log.pos
  tag apache.access
  <parse>
    @type apache2
  </parse>
</source>
Every input plugin begins with a type, which is specified using the @type directive. Next, you specify where the plugin should fetch logs from using the path directive; in the example above, that is /var/log/httpd-access.log.
The pos_file directive gives the tail plugin a file in which to track its current read position. If Fluentd restarts, it consults the pos_file to determine where to resume reading from.
The tag directive is used to identify a source in a given configuration file. We'll take a deeper look at tags later in the series.
Finally, the <parse> section enables Fluentd to use one of its inbuilt parser plugins. Recall from the diagram above that logs are sent to a parser before filtering. Fluentd supports multiple parsers, of which apache2 is one; others include JSON, Nginx, and msgpack, to name a few.
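To see where the tag comes into play, here's a minimal sketch (not part of the configuration above) that routes the apache.access events from that source to standard output using the built-in stdout output plugin:
<match apache.access>
  @type stdout
</match>
Any event whose tag matches apache.access is picked up by this block, which is exactly how Fluentd ties sources to downstream processing.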
Filter plugins
Filter plugins are an optional but powerful part of Fluentd’s pipeline. They allow you to manipulate and transform log data before it proceeds further downstream. While you may not always need them, filter plugins can be invaluable when you want to ensure only relevant or processed logs reach their final destination.
Let’s take a look at a configuration example using the grep filter plugin:
<filter foo.bar>
  @type grep
  <regexp>
    key message
    pattern /cool/
  </regexp>
</filter>
In this example, the <filter> directive defines a filter plugin. Similar to input plugins, every filter plugin begins with a @type directive to specify the type of filter. Here, the grep plugin is used to filter logs based on patterns.
The foo.bar tag determines which logs this filter applies to. Fluentd matches this tag against events processed earlier in the pipeline, typically from an input plugin. If the tag matches, the filter processes the logs.
The <regexp> section defines a regular expression to apply to log entries: key names the field to inspect, and pattern is the expression to match against it. In this case, the filter checks whether the message field contains the word cool. Logs that do not meet this condition are discarded, ensuring that only relevant logs continue through the pipeline.
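The grep plugin can also work in the opposite direction. As a quick sketch (the debug pattern here is purely illustrative), an <exclude> section drops events whose field matches the pattern instead of keeping them:
<filter foo.bar>
  @type grep
  <exclude>
    key message
    pattern /debug/
  </exclude>
</filter>
With this in place, any event whose message field matches debug is removed from the pipeline.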
Why use filter plugins?
Filter plugins are optional because not every use case requires filtering or transformation. However, here are a few scenarios where they become important:
- Log Reduction: Reducing noise by discarding unnecessary logs.
- Enrichment: Adding metadata or modifying log entries before storage (see the sketch below).
- Data Masking: Anonymizing or masking sensitive data for security and compliance purposes.
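For the enrichment case, here's a minimal sketch using the built-in record_transformer filter plugin (the app.** tag pattern is illustrative); it stamps each matching event with the host it came from and its original tag:
<filter app.**>
  @type record_transformer
  <record>
    hostname "#{Socket.gethostname}"
    original_tag ${tag}
  </record>
</filter>
The "#{...}" form is embedded Ruby evaluated once when the configuration is loaded, while ${tag} is a placeholder expanded per event.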
Output plugins
Output plugins are arguably the most critical part of Fluentd's plugin architecture. They are responsible for writing logs to their final destination, which can range from local files to cloud storage or databases.
Fluentd supports various output modes to suit different use cases, including Non-Buffered, Synchronous Buffered, and Asynchronous Buffered modes.
- Non-Buffered Mode: This mode writes logs immediately to the destination without any intermediate buffering. It’s simple but may not be optimal for high-throughput scenarios.
- Synchronous Buffered Mode: In this mode, Fluentd stages log data into chunks and queues them for delivery. The behavior of the buffer is configured in the <buffer> section (a minimal sketch follows this list).
- Asynchronous Buffered Mode: This mode also stages and queues log chunks but commits them to the destination asynchronously. It’s designed for performance and scalability, especially with remote or cloud storage systems.
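To make buffered output concrete, here's a minimal sketch of a synchronous buffered configuration using the built-in file output plugin; the path and timing values are illustrative:
<match app.**>
  @type file
  path /var/log/fluent/app
  <buffer time>
    timekey 1d        # one chunk per day
    timekey_wait 10m  # wait 10 minutes for late-arriving events
  </buffer>
</match>
Events are staged into time-keyed chunks and flushed to the destination once each day's chunk is complete.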
The diagram below shows how the buffer operates in the pipeline:
Figure 2: Fluentd buffer in the pipeline
Using the S3 output plugin
While the built-in Fluentd file output plugin is easy to configure, scaling systems often require a more durable and distributed storage solution. For this reason, cloud storage like Amazon S3 becomes an attractive choice. Let’s look at an example configuration for the S3 output plugin:
<match pattern>
  @type s3
  aws_key_id YOUR_AWS_KEY_ID
  aws_sec_key YOUR_AWS_SECRET_KEY
  s3_bucket YOUR_S3_BUCKET_NAME
  s3_region ap-northeast-1
  path logs/
  # if you want to use ${tag} or %Y/%m/%d/ like syntax in path / s3_object_key_format,
  # need to specify tag for ${tag} and time for %Y/%m/%d in <buffer> argument.
  <buffer tag,time>
    @type file
    path /var/log/fluent/s3
    timekey 3600 # 1 hour partition
    timekey_wait 10m
    timekey_use_utc true # use utc
    chunk_limit_size 256m
  </buffer>
</match>
Every output plugin begins with a type specified using the @type directive. In the example above, the s3 plugin is used to send logs to an Amazon S3 bucket.
Next, you provide credentials for AWS authentication using the aws_key_id and aws_sec_key directives. These should correspond to your AWS account and are required to write logs to the specified S3 bucket.
The s3_bucket directive identifies the bucket where Fluentd will store logs, while the path directive specifies the key prefix within the bucket. In this example, logs are stored under the logs/ prefix.
The <buffer> section is where Fluentd’s buffering capabilities come into play. This section defines how logs are staged before being uploaded to S3. For example:
- The timekey directive specifies a time-based partitioning interval. In this configuration, Fluentd creates a new file every hour (3600 seconds).
- The timekey_wait directive delays flushing by 10 minutes to capture late-arriving events.
- The chunk_limit_size directive defines the maximum size of each buffer chunk. Here, a limit of 256 MB ensures optimal use of resources and minimizes API calls to S3.
Finally, the path directive within <buffer> specifies the local path where Fluentd stages logs temporarily. If Fluentd restarts or fails, it can resume from this staging area, ensuring no logs are lost.
Recall from the diagram above that logs pass through a buffer stage before reaching their destination. The S3 output plugin takes full advantage of Fluentd's buffering modes, making it a reliable option for production environments.
Looking ahead
Now that we have a solid grasp of Fluentd's architecture and key components, the next logical step is to get hands-on with Fluentd.
In the upcoming part of this series, we will guide you through installing Fluentd on various platforms. This will help solidify your understanding and get you ready to start implementing Fluentd in your environment.