
Comparison among log collection tools: Logstash, fluentd, and Logtail

Last Updated: Apr 19, 2018

Assessments of log collection clients

In the data technology (DT) era, hundreds of millions of servers, mobile terminals, and network devices generate massive volumes of logs every day. A centralized log processing solution effectively supports log consumption throughout the log lifecycle, and the first step is to collect logs from these devices to the cloud.

[Figure: centralized log collection architecture]

Three log collection tools

  • Logstash

    • Logstash is the “L” in the well-known open source ELK stack, with an active community and a rich plug-in ecosystem.
    • Logstash is implemented in JRuby and runs on the JVM, which makes it cross-platform.
    • Its modular design delivers high scalability and interoperability.
  • Fluentd

    • Fluentd is a popular log collection tool in the open source community. td-agent, the commercial version of fluentd, is maintained by Treasure Data and is assessed in this document.
    • Fluentd is implemented in CRuby, with performance-critical components re-implemented in C, and its overall performance is good.
    • Fluentd features a concise design and provides reliable data transfer in the pipeline.
    • Compared with Logstash, fluentd supports fewer plug-ins.
  • Logtail
    • Logtail is the data collection agent (producer) of Alibaba Cloud Log Service and has been widely deployed in large-scale big data scenarios within Alibaba Group for more than three years.
    • Logtail is implemented in C++ and delivers good performance, with substantial work invested in stability, resource control, and manageability.
    • Compared with the community-driven Logstash and fluentd, Logtail focuses purely on log collection and offers fewer general-purpose functions.

Function comparison

| Function | Logstash | Fluentd | Logtail |
| --- | --- | --- | --- |
| Log reading | Polling | Polling | Event triggered |
| File rotation | Supported | Supported | Supported |
| Failover (local checkpoint) | Supported | Supported | Supported |
| General log parsing | Grok parsing (based on regular expressions) | Regular expression parsing | Regular expression parsing |
| Specific log types | Supports mainstream formats such as delimiter, key-value, and JSON | Supports mainstream formats such as delimiter, key-value, and JSON | Supports mainstream formats such as delimiter, key-value, and JSON |
| Data compression before sending | Supported by plug-ins | Supported by plug-ins | LZ4 |
| Data filtering | Supported | Supported | Supported |
| Buffer-based data transfer | Supported by plug-ins | Supported by plug-ins | Supported |
| Transfer exception handling | Supported by plug-ins | Supported by plug-ins | Supported |
| Running environment | JRuby implementation; requires a JVM | CRuby and C implementation; requires a Ruby environment | C++ implementation; no special requirements |
| Thread support | Multiple threads | Multiple threads, restricted by the GIL | Multiple threads |
| Hot upgrade | Not supported | Not supported | Supported |
| Centralized configuration management | Not supported | Not supported | Supported |
| Self-monitoring of running status | Not supported | Not supported | Supports CPU/memory threshold protection |
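
To make the “Log reading” and “Failover (local checkpoint)” rows concrete: polling-based collectors such as Logstash and fluentd wake up on a timer, read whatever has been appended since the last recorded offset, and persist that offset locally so collection can resume after a crash or restart, whereas Logtail reacts to file system events instead of a timer. The Python sketch below is not taken from any of the three tools and uses hypothetical file paths; it only shows the polling-plus-checkpoint approach in its simplest form.

    import json
    import os
    import time

    LOG_PATH = "/var/log/nginx/access.log"          # hypothetical log file
    CHECKPOINT_PATH = "/tmp/access.log.checkpoint"  # hypothetical checkpoint file

    def load_checkpoint():
        """Resume from the last persisted offset, or start at the beginning."""
        try:
            with open(CHECKPOINT_PATH) as f:
                return json.load(f)["offset"]
        except (FileNotFoundError, ValueError, KeyError):
            return 0

    def save_checkpoint(offset):
        with open(CHECKPOINT_PATH, "w") as f:
            json.dump({"offset": offset}, f)

    def process(line):
        print(line.decode("utf-8", errors="replace"), end="")

    def poll_forever(interval=1.0):
        offset = load_checkpoint()
        while True:
            size = os.path.getsize(LOG_PATH)
            if size < offset:           # file was rotated or truncated: start over
                offset = 0
            if size > offset:           # new data was appended since the last poll
                with open(LOG_PATH, "rb") as f:
                    f.seek(offset)
                    for line in f:
                        process(line)   # parse/forward the line downstream
                    offset = f.tell()
                save_checkpoint(offset)
            time.sleep(interval)        # polling: wake up on a fixed timer

    if __name__ == "__main__":
        poll_forever()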

Log file collection – Performance comparison

Log sample: Nginx access logs are used as the example. The sample below is a 365-byte Nginx access log containing 14 structured fields.

[Figure: sample Nginx access log]

The following test repeatedly writes the log to a file at different simulated load levels. The time field of each log is set to the system time at the moment of writing, and the other 13 fields are identical across all logs.

The log parsing work in this simulated scenario is the same as in a real scenario; the only difference is that the nearly identical log lines compress unusually well, so the network traffic generated when sending the data is lower than it would be in practice.
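
The benchmark harness itself is not included in this article, but the setup described above is straightforward to reproduce. The following Python sketch uses hypothetical field values (the file path is the one that appears in the fluentd configuration later in this article); it appends an Nginx-style access log line at a target TPS, stamping each line with the time of writing and keeping the other 13 fields constant.

    import time

    LOG_PATH = "/home/admin/workspace/temp/mock_log/access.log"

    # Constant part of the simulated log line; the time field is filled in per write.
    TEMPLATE = ('192.168.1.100 25 - [{ts}] "GET /item/detail?id=12345 HTTP/1.1" '
                '200 5368 "http://example.com/list" "Mozilla/5.0 (compatible; bench)" '
                '"1234567890" "cookie2value" "trace0001" cell01 ups01 54321')

    def run(tps, duration_s=60):
        """Append `tps` log lines per second for `duration_s` seconds."""
        # At very high TPS a real generator would batch writes instead of
        # sleeping once per line; this is kept simple for illustration.
        interval = 1.0 / tps
        with open(LOG_PATH, "a") as f:
            end = time.time() + duration_s
            while time.time() < end:
                ts = time.strftime("%d/%b/%Y:%H:%M:%S +0800")  # time of writing
                f.write(TEMPLATE.format(ts=ts) + "\n")
                f.flush()
                time.sleep(interval)

    if __name__ == "__main__":
        for tps in (500, 1000, 5000, 10000):
            run(tps)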

Logstash

Logstash 2.0.0 parses logs with grok and writes the parsed logs to Kafka using the built-in Kafka output plug-in, with Gzip compression enabled.

Log parsing configuration:

    grok {
        patterns_dir=>"/home/admin/workspace/survey/logstash/patterns"
        match=>{ "message"=>"%{IPORHOST:ip} %{USERNAME:rt} - \[%{HTTPDATE:time}\] \"%{WORD:method} %{DATA:url}\" %{NUMBER:status} %{NUMBER:size} \"%{DATA:ref}\" \"%{DATA:agent}\" \"%{DATA:cookie_unb}\" \"%{DATA:cookie_cookie2}\" \"%{DATA:monitor_traceid}\" %{WORD:cell} %{WORD:ups} %{BASE10NUM:remote_port}" }
        remove_field=>["message"]
    }
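
Grok is essentially a library of named, reusable regular expressions. The Python sketch below is not the real grok implementation and uses deliberately simplified pattern definitions, but it illustrates how a grok expression like the one above expands into an ordinary regular expression with named capture groups; the sample log fragment is hypothetical.

    import re

    # Simplified stand-ins for the official grok pattern definitions.
    GROK_PATTERNS = {
        "IPORHOST": r"\S+",              # the real pattern matches IPs or hostnames
        "USERNAME": r"[a-zA-Z0-9._-]+",
        "HTTPDATE": r"[^\]]+",
        "WORD": r"\w+",
        "DATA": r".*?",
        "NUMBER": r"\d+(?:\.\d+)?",
    }

    def grok_to_regex(expr):
        """Replace every %{PATTERN:field} token with a named capture group."""
        def expand(m):
            pattern, field = m.group(1), m.group(2)
            return "(?P<{}>{})".format(field, GROK_PATTERNS[pattern])
        return re.sub(r"%\{(\w+):(\w+)\}", expand, expr)

    # A shortened version of the grok expression used in the configuration above.
    expr = (r'%{IPORHOST:ip} %{USERNAME:rt} - \[%{HTTPDATE:time}\] '
            r'"%{WORD:method} %{DATA:url}" %{NUMBER:status} %{NUMBER:size}')

    # Hypothetical log fragment, for illustration only.
    line = '192.168.0.1 10 - [19/Apr/2018:12:00:00 +0800] "GET /index.html HTTP/1.1" 200 365'
    print(re.match(grok_to_regex(expr), line).groupdict())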

Test results

| Write TPS | Write traffic (KB/s) | CPU usage (%) | Memory usage (MB) |
| --- | --- | --- | --- |
| 500 | 178.22 | 22.4 | 427 |
| 1000 | 356.45 | 46.6 | 431 |
| 5000 | 1782.23 | 221.1 | 440 |
| 10000 | 3564.45 | 483.7 | 450 |

Fluentd

td-agent 2.2.1 parses logs with a regular expression and writes the parsed logs to Kafka using the third-party fluent-plugin-kafka plug-in, with Gzip compression enabled.

Log parsing configuration:

    <source>
      type tail
      format /^(?<ip>\S+)\s(?<rt>\d+)\s-\s\[(?<time>[^\]]*)\]\s"(?<url>[^\"]+)"\s(?<status>\d+)\s(?<size>\d+)\s"(?<ref>[^\"]+)"\s"(?<agent>[^\"]+)"\s"(?<cookie_unb>\d+)"\s"(?<cookie_cookie2>\w+)"\s"(?<monitor_traceid>\w+)"\s(?<cell>\w+)\s(?<ups>\w+)\s(?<remote_port>\d+).*$/
      time_format %d/%b/%Y:%H:%M:%S %z
      path /home/admin/workspace/temp/mock_log/access.log
      pos_file /home/admin/workspace/temp/mock_log/nginx_access.pos
      tag nginx.access
    </source>

Test results

| Write TPS | Write traffic (KB/s) | CPU usage (%) | Memory usage (MB) |
| --- | --- | --- | --- |
| 500 | 178.22 | 13.5 | 61 |
| 1000 | 356.45 | 23.4 | 61 |
| 5000 | 1782.23 | 94.3 | 103 |

Note: A single fluentd process can use at most one CPU core because of the GIL. The multiprocess plug-in can be used to run multiple processes and achieve higher log throughput.

Logtail

Logtail 0.9.4 structures logs with a regular expression and sends LZ4-compressed data to Alibaba Cloud Log Service over HTTP, with batch_size set to 4000.

Log parsing configuration:

    logRegex : (\S+)\s(\d+)\s-\s\[([^]]+)]\s"([^"]+)"\s(\d+)\s(\d+)\s"([^"]+)"\s"([^"]+)"\s"(\d+)"\s"(\w+)"\s"(\w+)"\s(\w+)\s(\w+)\s(\d+).*
    keys : ip,rt,time,url,status,size,ref,agent,cookie_unb,cookie_cookie2,monitor_traceid,cell,ups,remote_port
    timeformat : %d/%b/%Y:%H:%M:%S
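
In the configuration above, the capture groups of logRegex are mapped, in order, to the field names listed in keys. The short Python sketch below only illustrates that mapping (it is not how Logtail itself is implemented), applied to a hypothetical log line shaped like the benchmark sample.

    import re

    # The regular expression and key list from the Logtail configuration above.
    LOG_REGEX = re.compile(
        r'(\S+)\s(\d+)\s-\s\[([^]]+)]\s"([^"]+)"\s(\d+)\s(\d+)\s"([^"]+)"\s'
        r'"([^"]+)"\s"(\d+)"\s"(\w+)"\s"(\w+)"\s(\w+)\s(\w+)\s(\d+).*')
    KEYS = ("ip,rt,time,url,status,size,ref,agent,cookie_unb,"
            "cookie_cookie2,monitor_traceid,cell,ups,remote_port").split(",")

    # Hypothetical access-log line, shaped like the benchmark sample.
    line = ('192.168.1.100 25 - [19/Apr/2018:12:00:00 +0800] '
            '"GET /item/detail?id=12345 HTTP/1.1" 200 5368 '
            '"http://example.com/list" "Mozilla/5.0" "1234567890" '
            '"cookie2value" "trace0001" cell01 ups01 54321')

    fields = dict(zip(KEYS, LOG_REGEX.match(line).groups()))  # one key per group, in order
    print(fields["ip"], fields["time"], fields["status"], fields["remote_port"])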

Test results

| Write TPS | Write traffic (KB/s) | CPU usage (%) | Memory usage (MB) |
| --- | --- | --- | --- |
| 500 | 178.22 | 1.7 | 13 |
| 1000 | 356.45 | 3 | 15 |
| 5000 | 1782.23 | 15.3 | 23 |
| 10000 | 3564.45 | 31.6 | 25 |

Single-core processing capability comparison

[Figure: single-core processing capability comparison]

Derived from the test results above, the approximate single-core processing capability is about 2,000 logs/s for Logstash (10,000 TPS at 483.7% CPU), about 5,300 logs/s for fluentd (5,000 TPS at 94.3% CPU), and over 30,000 logs/s for Logtail (10,000 TPS at 31.6% CPU).

Conclusion

Logstash, fluentd, and Logtail have their own features as follows:

  • Logstash supports all the mainstream log types, diverse plug-ins, and flexible customization, but has relatively low performance and is prone to high memory usage because it runs on the JVM.
  • Fluentd supports all the mainstream log types and many plug-ins, and delivers good performance.
  • Logtail consumes the least CPU and memory on the machine, delivers high throughput, and fully covers common log collection scenarios. However, it lacks plug-in support and is less flexible and extensible than Logstash and fluentd.