Understanding the SAE Log Collection Architecture in One Article

The importance of logs to a program is self-evident. Whether used for troubleshooting, recording key events, alerting, or building monitoring dashboards, logs play a crucial role, and every category of application needs to record and inspect them. In the cloud-native era, log collection differs from traditional approaches in both the collection scheme and the collection architecture. We have summarized some common practical problems encountered during log collection, such as:

• For applications deployed on Kubernetes, local disk space is far smaller than on a physical machine, so not all logs can be stored for long, yet there is still a need to query historical data

• Log data is critical and must not be lost, even after the application restarts or the instance is rebuilt

• Alerts need to be triggered on keywords or other information in the logs, and monitoring dashboards need to be configured

• Permission control is very strict: some teams cannot use or query a log system such as SLS and need to import logs into their own log collection system

• Exception stacks from Java, PHP, and other applications span multiple lines. How can a multi-line stack trace be merged and viewed as a single log entry?

How do users actually collect logs in a production environment? Faced with different business scenarios and demands, which collection scheme is the better choice? Serverless App Engine (SAE), a fully managed, maintenance-free, and highly elastic general-purpose PaaS platform, provides SLS collection, NAS-mounted collection, Kafka collection, and other collection methods for different scenarios. This article focuses on the characteristics of each log collection method and its best-fit scenarios, to help you design an appropriate collection architecture and avoid some common problems.

SAE log collection methods

SLS collection architecture

SLS log collection is the scheme recommended by SAE. It provides one-stop data collection, processing, query and analysis, visualization, alerting, consumption, and shipping.

SAE has integrated SLS collection, which makes it easy to collect business logs and container standard output into SLS. The architecture of the SAE-SLS integration is shown in the following figure:

• SAE mounts a logtail (the SLS collector) sidecar in the pod.

• The files or paths that the customer configures for collection are then shared between the business container and the logtail sidecar as a volume (see the sketch after this list). This is also why /home/admin itself cannot be configured as an SAE log collection path: the service's startup files live under /home/admin, and mounting a volume there would overwrite them.

• logtail reports its data through the SLS internal network endpoint, so no public network access is required.

• To avoid affecting the business container, resource limits are set on the SLS sidecar, for example a CPU limit of 0.25 cores and a memory limit of 100 MB.
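The volume-sharing pattern can be pictured with a minimal pod manifest. This is only an illustrative sketch of the sidecar pattern, not the manifest SAE actually generates; all names, images, and paths here are made up:

    apiVersion: v1
    kind: Pod
    metadata:
      name: app-with-logtail          # illustrative name
    spec:
      containers:
        - name: business-app
          image: my-app:latest        # illustrative image
          volumeMounts:
            - name: app-logs          # the app writes its logs here...
              mountPath: /home/admin/logs
        - name: logtail-sidecar
          image: logtail:latest       # illustrative image
          resources:
            limits:
              cpu: "0.25"             # matches the limits described above
              memory: 100Mi
          volumeMounts:
            - name: app-logs          # ...and the sidecar reads them from the same volume
              mountPath: /home/admin/logs
      volumes:
        - name: app-logs
          emptyDir: {}                # shared between the two containers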

SLS fits most business scenarios and supports configuring alerts and monitoring dashboards. In most cases, choosing SLS directly is the right call.

NAS collection architecture

NAS is a distributed file system offering shared access, elastic scaling, high reliability, and high performance. It provides high throughput and high IOPS and supports random reads/writes and in-place file modification, which suits log scenarios well. If you want to keep more logs, or larger ones, locally, you can mount a NAS and point the log file path to the NAS mount directory. Mounting a NAS on SAE does not involve many technical points or architecture, so we skip the details here.

When a NAS is used for log collection, it can be treated as a local disk: even if the instance crashes and is rebuilt, no logs are lost. For very important scenarios where data loss is unacceptable, this scheme is worth considering.

Kafka collection architecture

Users can also collect the contents of log files into Kafka and obtain the logs by consuming Kafka data. Downstream, users can import the logs from Kafka into Elasticsearch, or consume the Kafka data with their own programs for custom processing, as sketched below.
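As a minimal illustration of the consuming side, here is a Java sketch that reads log records from Kafka. The broker address, topic name, and group id are made-up placeholders, and the actual processing (for example, forwarding to Elasticsearch) is left as a comment:

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class LogConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "kafka-broker:9092"); // placeholder address
            props.put("group.id", "log-processor");              // placeholder group id
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("app-logs"));         // placeholder topic
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<String, String> record : records) {
                        // Forward to Elasticsearch or apply custom processing here
                        System.out.println(record.value());
                    }
                }
            }
        }
    }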

There are many components that can collect logs into Kafka, such as the widely used Logstash, the relatively lightweight Filebeat, vector, and so on. The collection component used by SAE is vector. The architecture of the SAE-vector integration is shown in the following figure:

• SAE mounts a vector collector sidecar in the pod.

• The files or paths that the customer configures for collection are shared between the business container and the vector sidecar as a volume.

• vector periodically sends the collected log data to Kafka. vector itself has a rich set of parameters for configuring data compression, transmission interval, collection metrics, and so on.

Kafka collection complements SLS collection. In real production environments, some customers have very strict permission controls: they may have SAE permissions only, without SLS permissions, and therefore need logs collected into Kafka for later viewing. Others need to post-process their logs. In both cases, the Kafka log collection scheme is a good choice.

Important parameters explained (see the illustrative config after this list):

• multiline.start_pattern: when a line matching this pattern is detected, it is treated as the start of a new log event

• multiline.condition_pattern: when a line matching this pattern is detected, it is merged with the previous line and treated as part of the same event

• sinks.internal_metrics_to_prom: once configured, vector's own collection metadata and metrics are reported to Prometheus
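Putting these together, a vector configuration for shipping multi-line Java logs to Kafka might look like the following. This is an illustrative sketch, not the config SAE generates; the paths, broker address, and topic are placeholders, and the multiline patterns follow vector's documented Java-exception example:

    [sources.app_logs]
    type = "file"
    include = ["/home/admin/logs/*.log"]              # paths shared with the business container
    multiline.start_pattern = '^[^\s]'                # a non-indented line starts a new event
    multiline.mode = "continue_through"
    multiline.condition_pattern = '^[\s]+(at|\.{3})'  # indented stack-trace lines are merged
    multiline.timeout_ms = 1000

    [sinks.to_kafka]
    type = "kafka"
    inputs = ["app_logs"]
    bootstrap_servers = "kafka-broker:9092"           # placeholder broker address
    topic = "app-logs"                                # placeholder topic
    compression = "gzip"                              # compress collected data in transit
    encoding.codec = "json"

    # Optional: report vector's own metrics to Prometheus
    [sources.internal]
    type = "internal_metrics"

    [sinks.internal_metrics_to_prom]
    type = "prometheus_exporter"
    inputs = ["internal"]
    address = "0.0.0.0:9598"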

Best Practices

In actual use, you can choose a log collection method according to your business demands. Note that logback's own rotation strategy must limit the file size and file count, otherwise it is easy to fill up the pod's disk. Taking a Java application as an example, a configuration like the following keeps at most 7 files, each at most 100 MB.
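The original article's exact configuration is not reproduced here, so the following logback sketch is a reconstruction under those constraints; the file names and paths are illustrative:

    <configuration>
      <appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
        <file>/home/admin/logs/app.log</file>
        <!-- Keep at most 7 rotated files... -->
        <rollingPolicy class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
          <fileNamePattern>/home/admin/logs/app.%i.log</fileNamePattern>
          <minIndex>1</minIndex>
          <maxIndex>7</maxIndex>
        </rollingPolicy>
        <!-- ...rotating whenever the active file reaches 100 MB -->
        <triggeringPolicy class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
          <maxFileSize>100MB</maxFileSize>
        </triggeringPolicy>
        <encoder>
          <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
        </encoder>
      </appender>
      <root level="INFO">
        <appender-ref ref="FILE"/>
      </root>
    </configuration>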

This kind of rotation configuration is common; log4j provides equivalent settings.

There are two common log rotation modes, create mode and copytruncate mode, and different log collection components support the two to different degrees.

Create mode renames the original log file and creates a new log file to replace it. Taking logback as an example, rollover proceeds in three steps:

1. Before writing a log event, logback checks whether the maximum file size has been reached. If not, the write completes; if so, rollover enters the second stage

2. The file pointed to by currentlyActiveFile is closed first; the original file is then renamed, and a new file is created with the same name as the one currentlyActiveFile previously pointed to

3. currentlyActiveFile is switched to the file created in step 2

The idea of copytruncate mode is to copy the log file that is being written and then truncate the original in place.
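In shell terms, copytruncate amounts to roughly the following (an illustrative sketch; the file names are made up):

    cp app.log app.log.1   # copy the live log file while the app keeps writing to it
    : > app.log            # truncate the original in place; the writer keeps its file handle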

Case study

Here is a real scenario from a customer's production environment.

Customer A rotates the program's logs, collects them into SLS, and then configures keyword-based alerts and monitoring dashboards.

First, through the log4j configuration, at most 10 log files are kept, each up to 200 MB, keeping disk usage under control. The log files are saved under /home/admin/logs. We won't go into detail here; you can adapt the configuration described in the Best Practices section.

Next, the logs are collected into SLS through SAE's SLS log collection feature.

Finally, alerts are configured on keywords in the program's logs, or on other rules such as the proportion of 200 status codes.

The monitoring dashboard is built from the Nginx logs, for example with a query like the one sketched below.
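As an illustration, an SLS analytic query like the following computes the proportion of 200 responses from Nginx access logs; it assumes a field named status has been extracted (as text) at collection time:

    * | SELECT sum(CASE WHEN status = '200' THEN 1 ELSE 0 END) * 100.0 / count(*) AS ratio_200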

Common problems

Log merging

Often we don't want to collect logs one line at a time, but rather merge multiple lines into a single entry, for example Java exception logs. This is where the log merging feature comes in.

SLS offers a multi-line collection mode, which requires the user to set a regular expression that matches the first line of each entry.
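For example, if each log entry starts with a timestamp like 2023-01-01 10:00:00, a first-line regex such as the following (illustrative) causes any non-matching line, such as a stack-trace line, to be merged into the previous entry:

    \d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}.*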

vector collection has similar parameters: multiline.start_pattern sets the regex for the start of a new entry, and a line that matches it is treated as a new entry. It works together with the multiline.mode parameter (see the vector config sketch earlier). For more parameters, see vector's official documentation.

Log collection loss analysis

Both SLS collection and vector collection into Kafka try to ensure that no logs are lost. Checkpoint information is saved locally, so after an unexpected machine shutdown, process crash, or other abnormal condition, collection resumes from the last recorded position, ensuring as far as possible that no data is lost.

However, this does not mean logs can never be lost. In some extreme scenarios, collected logs may be lost, for example:

1. The Kubernetes pod's process crashes and liveness probes keep failing, causing the pod to be rebuilt

2. Logs rotate extremely fast, for example once per second

3. Log collection cannot keep up with the log generation rate for a long time

For scenarios 2 and 3, check whether your application is printing too many unnecessary logs or whether the rotation settings are abnormal, because these situations should not occur under normal circumstances. For scenario 1, if the log requirements are very strict and nothing may be lost across pod rebuilds, use a mounted NAS as the log storage path; then even after the pod is rebuilt, no logs are lost.

Summary

This article has introduced the various log collection schemes provided by SAE, along with their architectures and usage scenarios. To sum up, there are three points:

1. SLS collection is broadly applicable and practical for most scenarios

2. NAS collection does not lose logs in any scenario and suits scenarios with very strict log requirements

3. Kafka collection complements SLS collection. If logs need secondary processing, or SLS cannot be used due to permissions or other reasons, you can collect logs into Kafka for further processing
