Getting Started with Beats

This tutorial aims to help users who are new to Elasticsearch understand the concepts of Beats, a collection of lightweight data collectors, and deploy them quickly.

By Liu Xiaoguo, an Elastic Community Evangelist in China

Released by ELK Geek

Elasticsearch

Elasticsearch is a distributed, open-source search and analytics engine that supports all types of data, including text, numbers, geospatial data, structured data, and unstructured data. Elasticsearch is built on Apache Lucene and was first released in 2010 by Elasticsearch N.V. (now known as Elastic).

Elasticsearch is famous for its simple RESTful APIs, distributed nature, speed, and scalability. It also provides a search experience with scale, speed, and relevance. These three properties differentiate Elasticsearch from other products, making Elasticsearch very popular.

Scale: Scalability refers to the capability of ingesting and processing petabytes of data. An Elasticsearch cluster is distributed, so if more data needs to be stored, users can scale out by adding servers to the cluster to meet business needs.

Speed: Elasticsearch can return search results from petabytes of data in milliseconds. Newly indexed data typically becomes searchable within about one second, which enables near real-time search. In contrast, some other databases may take far longer to run a comparable search.

Relevance: Elasticsearch lets you query text, numbers, geospatial data, and other data types in many ways to obtain relevant results. Results are returned according to how well they match the query: each search result has a score that represents its relevance, and the best-matching result is ranked first.
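
To make the idea of a relevance score concrete, here is a minimal sketch of BM25, the similarity that recent Elasticsearch versions use by default. It is a textbook version under simplifying assumptions (whitespace tokenization, a single in-memory list of documents), not Elasticsearch's actual implementation. The sample documents and function names are made up for illustration.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with a textbook BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    scores = []
    for tokens in tokenized:
        freqs = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            # Number of documents that contain the term at least once.
            doc_freq = sum(1 for other in tokenized if term in other)
            if doc_freq == 0:
                continue
            idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
            tf = freqs[term]
            # Longer-than-average documents are penalized through b.
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores

def rank(query, docs):
    """Return (score, doc) pairs with the best match first."""
    pairs = zip(bm25_scores(query, docs), docs)
    return sorted(pairs, key=lambda pair: pair[0], reverse=True)

# Hypothetical sample documents.
docs = [
    "filebeat ships log files",
    "metricbeat ships metrics",
    "kibana draws charts",
]
ranked = rank("log files", docs)
```

Only the first document contains the query terms, so it receives a positive score and is ranked first, while the other two score zero.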

Elastic Stack

"ELK" is an acronym for three open-source projects: Elasticsearch, Logstash, and Kibana. Elasticsearch is a search and analytics engine and the core component of the Elastic Stack. Logstash is a server-side data processing pipeline that ingests data from a multitude of sources, transforms it, and then sends it to a "stash" such as Elasticsearch. Beats is a collection of lightweight data collectors that send data either directly to Elasticsearch or to Logstash for further processing before the data goes to Elasticsearch. Kibana lets you visualize the data in Elasticsearch with charts.

Elastic Solutions

Elastic provides many out-of-the-box solutions built on the Elastic Stack. Many search and database companies have good products, but to implement a complete solution, they have to expend a lot of effort combining their products with products from other companies. To address this, Elastic has released the 3+1 Elastic Stack.

Elastic's three major solutions are as follows:

Enterprise search
Observability
Security

These three solutions are based on the same Elastic Stack: Elasticsearch, Logstash, and Kibana.

Beats

In centralized logging, a data pipeline involves three main phases: aggregation, processing, and storage. In the earlier ELK Stack, Logstash was responsible for the first two phases. This came at a cost: partly because of its design, Logstash was prone to internal problems and performance issues, especially in complex pipelines that process large amounts of data. It therefore made sense to offload some of Logstash's responsibilities, in particular data extraction, to other tools. This idea first took shape in Lumberjack and then in Logstash Forwarder. After several development cycles, a new and improved protocol was introduced, which became the backbone of the "Beats" family.

Beats are a collection of open-source, lightweight (resource-efficient, dependency-free, and small) log shippers. They act as agents installed on the different servers in your infrastructure, where they collect logs or metrics. The collected data can be log files (Filebeat), network data (Packetbeat), server metrics (Metricbeat), or other types of data, which can be collected by the growing number of Beats developed by Elastic and the community. Beats send the collected data to Elasticsearch or to Logstash for processing. Beats are built on a Go framework called libbeat, which handles data forwarding, and the community continues to develop and contribute new Beats.

Elastic Beats

Filebeat

Filebeat is the most commonly used Beat for collecting and transferring log files. A reason why Filebeat is so efficient is the way it handles backpressure. If Logstash is busy, Filebeat will temporarily slow down its read speed.

Filebeat can be installed on almost any operating system, and it can also run as a Docker container. It ships with built-in modules for certain platforms, such as Apache, MySQL, and Docker, which include default configurations and Kibana objects.

I have presented several examples of how to use Filebeat in my previous articles:

Beats: Use Filebeat to Write Logs to Elasticsearch
Logstash: Import Apache Logs to Elasticsearch

Packetbeat

Packetbeat, a network packet analyzer, was the first Beat introduced. Packetbeat captures network traffic between servers so that you can monitor applications and their performance.

Packetbeat can be installed on monitored servers or a dedicated server. Packetbeat tracks network traffic, decodes protocols, and records data for each transaction. Packetbeat supports DNS, HTTP, ICMP, Redis, MySQL, MongoDB, Cassandra, and other protocols.

Metricbeat

Metricbeat is a popular Beat that collects and reports system-level metrics for various systems and platforms. It also supports built-in modules that collect statistics from specific platforms. You can use these modules and their metricsets to configure how often Metricbeat collects metrics and which specific metrics it collects.

Heartbeat

Heartbeat is used for uptime monitoring. In essence, it checks whether services are reachable. For example, you can use it to verify that a service's uptime meets specific SLA requirements. You provide Heartbeat with a list of URLs to check, and it ships the resulting uptime metrics either directly to Elasticsearch or to Logstash for processing before they are indexed.
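
The following sketch illustrates the kind of check described above: probe whether a service accepts connections, then compare the observed uptime with an SLA target. It is a simplified stand-in written for this article, not Heartbeat's code, and the function names and sample numbers are hypothetical.

```python
import socket

def check_tcp(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False

def uptime_percent(results):
    """Percentage of checks that found the service up."""
    if not results:
        raise ValueError("no check results")
    return 100.0 * sum(1 for up in results if up) / len(results)

def meets_sla(results, target_percent):
    """True if the observed uptime reaches the SLA target."""
    return uptime_percent(results) >= target_percent

# Hypothetical history: 999 successful checks and 1 failed check.
history = [True] * 999 + [False]
sla_999_ok = meets_sla(history, 99.9)
sla_9995_ok = meets_sla(history, 99.95)
```

With one failure in a thousand checks, the service meets a 99.9% target but misses a 99.95% target.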

Auditbeat

Auditbeat is used to audit user and process activity on Linux servers. Similar to other traditional system auditing tools, such as systemd and auditd, Auditbeat can help identify security breaches, such as unexpected file changes, configuration changes, and malicious behavior.

Winlogbeat

Winlogbeat is specifically designed to collect Windows event logs. It analyzes security events and installed updates.

Functionbeat

Functionbeat is a "serverless" shipper that can be deployed as a function to collect data and ship it to the ELK Stack. Designed for monitoring cloud environments, Functionbeat is currently tailored for Amazon setups and can be deployed as an AWS Lambda function to collect data from Amazon CloudWatch, Kinesis, and SQS.

Incorporation of Beats in Elastic Stack

Currently, three methods are used to import data of interest into Elasticsearch.

As shown in the preceding figure, these methods are as follows:

1) Beats: Use Beats to import data into Elasticsearch.
2) Logstash: Use Logstash to import data into Elasticsearch. The Logstash data source can also be Beats.
3) RESTful APIs: Import data into Elasticsearch through its RESTful APIs, typically by using one of the client libraries provided by Elastic, such as the Java, Python, Go, and Node.js clients.
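
As a small illustration of the third method, the sketch below builds the newline-delimited JSON body that the Elasticsearch bulk API (`_bulk`) expects: one action line followed by one document line per document. It only constructs the request body and sends nothing over the network. The index name `logs-demo` and the sample documents are assumptions made for this example.

```python
import json

def build_bulk_body(index_name, docs):
    """Build an NDJSON body for the _bulk API: an action line, then the document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"

# Hypothetical index name and documents.
bulk_body = build_bulk_body(
    "logs-demo",
    [{"message": "service started"}, {"message": "service stopped"}],
)
print(bulk_body)
```

A client would send this body in a POST request to the `_bulk` endpoint with the `Content-Type: application/x-ndjson` header. The official client libraries wrap this step in helper functions.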

Next, let's see how Beats work with other Elastic Stack components. The following block diagram shows how the different Elastic Stack components interact.

As shown in the preceding figure, Beats data can be imported into Elasticsearch using any one of the following three methods:

Beats > Elasticsearch: Send Beats data directly to Elasticsearch. This is a popular solution in many scenarios, and it becomes more powerful when combined with the ingest pipelines provided by Elasticsearch.
Beats > Logstash > Elasticsearch: Use powerful filter combinations provided by Logstash to process data streams, including parsing, enrichment, conversion, deletion, and addition. For more information, see my article "Data Conversion, Analysis, Extraction, Enrichment, and Core Operations."
Beats > Kafka > Logstash > Elasticsearch: In scenarios with unpredictable data volumes, such as when a large amount of data is generated at a specific point in time and Logstash cannot process it quickly enough, use Kafka as a buffer. For more information, see my article "Using Kafka to Deploy Elastic Stack."

Ingest Pipeline

Some nodes in an Elasticsearch cluster act as ingest nodes (article in Chinese). Ingest pipelines run on ingest nodes and preprocess documents before indexing as follows:

Parse, convert, and enrich data
Configure the processors to be used

As shown in the above figure, the ingest nodes in the Elasticsearch cluster run the processors that you define. For more information about these processors, see Processors on the official Elastic website.
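
To show what a pipeline definition looks like, here is a minimal sketch that builds one as a Python dictionary and wraps it in a request body for the pipeline simulate API (`_ingest/pipeline/_simulate`), which lets you try a pipeline against sample documents. The `set`, `lowercase`, and `remove` processors exist in Elasticsearch, but the field names, values, and sample document are hypothetical, and nothing is sent to a cluster.

```python
import json

# Hypothetical pipeline: tag the environment, normalize the log level,
# and drop a field that should not be indexed.
pipeline = {
    "description": "Example pipeline for illustration only",
    "processors": [
        {"set": {"field": "environment", "value": "staging"}},
        {"lowercase": {"field": "level"}},
        {"remove": {"field": "debug_info"}},
    ],
}

def simulate_request(pipeline_definition, sources):
    """Build the body for a simulate request with inline sample documents."""
    return {
        "pipeline": pipeline_definition,
        "docs": [{"_source": source} for source in sources],
    }

simulate_body = simulate_request(
    pipeline,
    [{"level": "ERROR", "debug_info": "trace-123", "message": "disk full"}],
)
print(json.dumps(simulate_body, indent=2))
```

Processors run in the order in which they are listed, so the order of the `processors` array matters.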

Libbeat: The Go Framework for Building Beats

Libbeat is a library used for data forwarding, and Beats are built on this Go framework. Libbeat is open-source software. To view its source code, visit https://github.com/elastic/beats/tree/master/libbeat. Libbeat makes it easy to build a custom Beat for any type of data that you want to send to Elasticsearch.

To build your own beat, see the following articles:

Build Your Own Beat
Generate Your Beat

Also, refer to my article "How to Customize an Elastic Beat."

A Beat consists of two parts: a data collector, and a data processor and publisher. Libbeat provides the data processor and publisher, which leaves only the collection logic to be written for a new Beat.

For more information about the preceding processors, see "Define processors." Some processor examples are as follows.

- add_cloud_metadata
- add_locale
- decode_json_fields
- add_fields
- drop_event
- drop_fields
- include_fields
- add_kubernetes_metadata
- add_docker_metadata
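
The split between collector, processors, and publisher can be sketched in a few lines. The code below is a conceptual model written in Python for this article. It is not libbeat, which is written in Go, and the three processor functions are only loosely modeled on `drop_event`, `drop_fields`, and `add_fields`. All names and sample events are hypothetical.

```python
def collect():
    """Collector: the Beat-specific part. Here it yields hard-coded sample events."""
    yield {"message": "user login", "level": "info", "password": "secret"}
    yield {"message": "cache probe", "level": "debug"}
    yield {"message": "disk full", "level": "error"}

def drop_debug(event):
    """Loosely modeled on drop_event: returning None discards the event."""
    return None if event.get("level") == "debug" else event

def drop_password(event):
    """Loosely modeled on drop_fields: remove a field from the event."""
    return {key: value for key, value in event.items() if key != "password"}

def add_env(event):
    """Loosely modeled on add_fields: attach extra fields under "fields"."""
    return {**event, "fields": {"env": "staging"}}

def run_beat(collector, processors, publish):
    """Pass every collected event through the processors, then publish it."""
    for event in collector():
        for process in processors:
            event = process(event)
            if event is None:
                break  # the event was dropped by a processor
        else:
            publish(event)

published = []
run_beat(collect, [drop_debug, drop_password, add_env], published.append)
```

Of the three sample events, the debug event is dropped, and the remaining two are published without the `password` field and with the extra `fields.env` value.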

Start Filebeat and Metricbeat

Filebeat Overview

Filebeat is a lightweight shipper for forwarding and centralizing log data. Installed as an agent on a server, Filebeat monitors the log files or locations that you specify, collects log events, and forwards them to Elasticsearch or Logstash for indexing. Filebeat features include:

Correct Log Rotation Handling: Filebeat correctly handles log files that are rotated periodically.
Backpressure Sensitive: If logs are generated faster than they can be processed downstream, Filebeat automatically slows down so that Elasticsearch can keep up.
At-Least-Once Delivery: Filebeat delivers each log event at least once.
Structured Logs: Filebeat can process structured log data, such as JSON.
Multi-line Events: Filebeat can combine log messages that span multiple lines, such as error stack traces, into single events.
Conditional Filtering: Filebeat can filter out certain events based on conditions.
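
Multi-line handling is easiest to see with an example. The sketch below joins continuation lines (lines that start with whitespace, as in a Java stack trace) to the event that precedes them. It roughly mirrors what a Filebeat multiline configuration with a pattern such as `^[[:space:]]`, `negate: false`, and `match: after` is meant to achieve, but it is an illustrative reimplementation, not Filebeat's code. The sample log lines are made up.

```python
import re

def join_multiline(lines, pattern=r"^\s"):
    """Append every line that matches pattern to the event before it."""
    continuation = re.compile(pattern)
    events = []
    for line in lines:
        if events and continuation.search(line):
            events[-1] += "\n" + line
        else:
            events.append(line)
    return events

# Hypothetical log lines: a stack trace followed by a normal message.
raw_lines = [
    "Exception in thread main java.lang.IllegalStateException",
    "    at com.example.Foo.bar(Foo.java:10)",
    "    at com.example.Main.main(Main.java:5)",
    "INFO service started",
]
events = join_multiline(raw_lines)
```

The four input lines become two events: the complete stack trace and the single-line message.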

When you start Filebeat, it starts one or more inputs that look in the locations you specified for log data. For each log that it finds, Filebeat starts a harvester. Each harvester reads a single log for new content and sends the new log data to libbeat. Libbeat aggregates the events and sends the aggregated data to the output configured for Filebeat.
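
The core of a harvester, reading only the content that is new since the last read, can be sketched as follows. This is a simplified model that assumes the reader remembers a byte offset per file. It ignores rotation, truncation, and encoding issues that a real shipper has to handle, and the file name and contents are hypothetical.

```python
import os
import tempfile

def harvest(path, offset):
    """Read complete lines added to path since offset; return (lines, new_offset)."""
    with open(path, "rb") as log_file:
        log_file.seek(offset)
        data = log_file.read()
    # Consume only up to the last newline so that a half-written line
    # is left in place and picked up on the next call.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8").splitlines()
    return lines, offset + end

with tempfile.TemporaryDirectory() as tmp_dir:
    log_path = os.path.join(tmp_dir, "app.log")
    with open(log_path, "wb") as log_file:
        log_file.write(b"first\nsecond\npart")
    first_read, offset = harvest(log_path, 0)

    # The application finishes the partial line and writes another one.
    with open(log_path, "ab") as log_file:
        log_file.write(b"ial\nthird\n")
    second_read, offset = harvest(log_path, offset)
```

The first call returns the two complete lines and leaves the unfinished one alone. The second call resumes from the stored offset and returns the completed line together with the new one.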

As shown in the above figure, the spooler caches some data that can be re-sent, which ensures that each event is delivered at least once. The same mechanism supports backpressure handling: when Filebeat generates events faster than Elasticsearch can handle them, some events are held in the cache.
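
The at-least-once idea can be modeled with a sender that holds on to each batch until the output acknowledges it. The sketch below is a conceptual illustration, not Filebeat's spooler. `FlakyOutput` is a hypothetical stand-in for an output that rejects the first few batches, and a real shipper would also wait between retries.

```python
class FlakyOutput:
    """Stand-in for an output that rejects the first `failures` batches."""

    def __init__(self, failures):
        self.failures = failures
        self.received = []

    def send(self, batch):
        if self.failures > 0:
            self.failures -= 1
            return False  # batch not acknowledged
        self.received.extend(batch)
        return True  # batch acknowledged

def ship(events, output, batch_size=2, max_retries=5):
    """Send events in batches, keeping each batch until it is acknowledged."""
    attempts = 0
    for start in range(0, len(events), batch_size):
        batch = events[start:start + batch_size]
        for _ in range(max_retries):
            attempts += 1
            if output.send(batch):
                break
        else:
            raise RuntimeError("output did not acknowledge the batch")
    return attempts

flaky = FlakyOutput(failures=2)
attempts = ship(["e1", "e2", "e3"], flaky)
```

The first batch is sent three times before it is acknowledged, and no event is lost. In a real system an acknowledgment can also get lost after the output has stored the batch, which is why this guarantee is "at least once" rather than "exactly once": duplicates are possible.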

Metricbeat Overview

Metricbeat is a lightweight shipper that can be installed on a server to periodically collect metrics from the operating system and from the services running on the server. Metricbeat takes the metrics and statistics that it collects and ships them to a specified output, such as Elasticsearch or Logstash.

Metricbeat helps to monitor the server by collecting metrics from systems and services running on the server, including:

Apache
HAProxy
MongoDB
MySQL
Nginx
PostgreSQL
Redis
System
Zookeeper

Metricbeat features the following:

Polls service APIs to collect metrics.
Stores metrics efficiently in Elasticsearch.
Collects application metrics through JMX/Jolokia, Prometheus, Dropwizard, and Graphite.
Labels metrics with metadata from AWS, Docker, Kubernetes, Google Cloud, or Azure.

Metricbeat consists of modules and metricsets. A Metricbeat module defines the basic logic for collecting data from a specific service, such as Redis or MySQL. The module also specifies details about the service, including how to connect to it, how often to collect metrics, and which metrics to collect.

Each module has one or more metricsets, which are the parts of a module that fetch and structure the data. Rather than collecting each metric as a separate event, a metricset retrieves a list of related metrics in a single request to the remote system. For example, the Redis module provides an info metricset that collects information and statistics from Redis by running the INFO command and parsing the output.

Similarly, the MySQL module provides a status metricset that collects data from MySQL by running a SHOW GLOBAL STATUS SQL query. Metricsets thus group sets of related metrics together in a single request returned by the remote server. Most modules have default metricsets that are enabled if you do not enable any metricsets yourself.
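
To illustrate how one request can yield many related metrics, the sketch below parses the text format returned by the Redis INFO command (section headers that start with `#`, followed by `key:value` lines) into a single nested structure. It is an illustrative parser, not Metricbeat's implementation, and the sample output and its values are made up.

```python
def parse_redis_info(text):
    """Parse Redis INFO output into {section: {key: value}}."""
    sections = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # A line such as "# Server" opens a new section.
            current = line.lstrip("#").strip().lower()
            sections[current] = {}
        elif ":" in line and current is not None:
            key, value = line.split(":", 1)
            sections[current][key] = value
    return sections

# Hypothetical, shortened INFO output.
sample_info = (
    "# Server\r\n"
    "redis_version:6.2.6\r\n"
    "uptime_in_seconds:3600\r\n"
    "\r\n"
    "# Clients\r\n"
    "connected_clients:4\r\n"
)
info_event = parse_redis_info(sample_info)
```

One INFO call produces one event that carries all of the related values, instead of one event per metric.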

Metricbeat retrieves metrics periodically from the host system based on the period specified in the module configuration. When multiple metricsets send requests to the same service, Metricbeat reuses connections whenever possible. If Metricbeat cannot connect to the host system within the timeout specified in the configuration, it returns an error. Metricbeat sends events asynchronously, which means that event retrieval is not acknowledged. If the configured output is unavailable, events may be lost.
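
The interplay of period and timeout can be sketched as a simple collection loop. This is a conceptual model built on Python threads, not how Metricbeat is implemented. The `period` and `timeout` parameters mirror the configuration settings described above, while the fetch functions and their values are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FetchTimeout

def fetch_with_timeout(pool, fetch, timeout):
    """Run fetch() in the pool; return a metrics event or an error event."""
    future = pool.submit(fetch)
    try:
        return {"metrics": future.result(timeout=timeout)}
    except FetchTimeout:
        return {"error": "fetch timed out"}

def collect(fetch, period, timeout, rounds):
    """Call fetch once per period for the given number of rounds."""
    events = []
    # One worker per round, so a fetch that hangs cannot block later rounds.
    with ThreadPoolExecutor(max_workers=rounds) as pool:
        for _ in range(rounds):
            started = time.monotonic()
            events.append(fetch_with_timeout(pool, fetch, timeout))
            remaining = period - (time.monotonic() - started)
            if remaining > 0:
                time.sleep(remaining)
    return events

def fast_fetch():
    return {"cpu_pct": 0.5}

def slow_fetch():
    time.sleep(0.5)  # simulates a service that answers too slowly
    return {"cpu_pct": 0.5}

healthy_events = collect(fast_fetch, period=0.01, timeout=1.0, rounds=2)
timeout_events = collect(slow_fetch, period=0.01, timeout=0.05, rounds=1)
```

A fetch that finishes in time yields a metrics event, while a fetch that exceeds the timeout yields an error event instead of blocking the schedule.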

Filebeat and Metricbeat Modules

The following figure shows the components of a Filebeat module.

A Filebeat module simplifies the collection, parsing, and visualization of logs in common formats.

A typical Filebeat module consists of one or more filesets, such as the access and error filesets for Nginx logs. A fileset contains the following content:

Filebeat input configuration, including the default paths for finding log files. These default paths vary depending on the operating system. The Filebeat configuration is also responsible for stitching multi-line events together when needed.
Elasticsearch ingest node pipeline definitions, which are used to parse log lines.
Field definitions that are used to configure the correct Elasticsearch type for each field and contain a brief description of each field.
Sample Kibana dashboard (if available), which can be used to visualize log files.

Filebeat automatically adjusts these configurations based on the user environment and loads them into the corresponding Elastic Stack components.

Modules for other Beats are structured in much the same way as Filebeat modules. Elastic currently provides a large number of modules.

This article was authorized for publication by the official blog of the CSDN-Elastic China community.

Source title: Beats: Getting Started with Beats (1)

Source link: (Page in Chinese) https://elasticstack.blog.csdn.net/article/details/104432643

Alibaba Cloud One-Stop Fully-Managed Beats Service

The Alibaba Cloud fully-managed Beats collection center offers batch management of Filebeat, Metricbeat, and Heartbeat collection clients.

The Alibaba Cloud Elastic Stack is completely compatible with open-source Elasticsearch and has nine unique capabilities.
