From ELK to EFK: Rebuilding the Full Observability Solution with New Features of Flink and Elasticsearch

The figure above shows the stages of log usage maturity. Most enterprises are currently at level 3 or level 4, with level 3 accounting for the larger share. At the retrieval level, logs are rarely parsed into structured fields; a single platform provides global search across all operation and maintenance data, but no correlation analysis. The analysis level has a precondition: the raw text logs must be well processed so that useful structured information can be extracted from unstructured text. That is hard to sustain when logs arrive in large volumes.

If an enterprise produces well-structured logs with many meaningful columns, those logs support analysis well. When the structure is insufficient, only simple search is possible.

Many enterprise logs are huge in volume and low in quality. Many logs are meaningless, and meaningful events in the application are often not fully recorded, so the logs do not provide real observability. In addition, log output is arbitrary and completely unstructured, often just plain text, so important information can only be extracted with regular expressions. Finally, O&M and development are separated, and O&M staff do not understand the logs that developers emit.

The first goal of many O&M staff is simply to collect the logs. The collected logs contain only a few structured fields; only the fields provided by the collector are indexed, and key information is not extracted. When log requirements are this simple, however, it is no longer necessary to build indexes on the logs. Full-text search is enough.

Post-incident diagnosis, such as searching by error code and time period, returns results quickly even with brute-force computing.
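As a minimal sketch of what such a brute-force lookup amounts to, the Python below scans every line and filters by time period and error code without consulting any index. The log format, the sample lines, and the function name are assumptions made for illustration only.

```python
from datetime import datetime

# Hypothetical raw log lines: "<ISO timestamp> <level> <error code> <message>"
RAW_LOGS = [
    "2023-01-01T10:00:00 INFO E0000 service started",
    "2023-01-01T10:05:12 ERROR E1042 payment gateway timeout",
    "2023-01-01T10:07:45 ERROR E2001 cache miss storm",
    "2023-01-01T11:30:00 ERROR E1042 payment gateway timeout",
]


def brute_force_search(lines, error_code, start, end):
    """Scan every line and keep those in [start, end) that mention error_code."""
    hits = []
    for line in lines:
        # The timestamp is the first space-separated token of each line.
        timestamp = datetime.fromisoformat(line.split(" ", 1)[0])
        if start <= timestamp < end and error_code in line:
            hits.append(line)
    return hits


hits = brute_force_search(
    RAW_LOGS,
    "E1042",
    datetime(2023, 1, 1, 10, 0),
    datetime(2023, 1, 1, 11, 0),
)
```

The cost grows linearly with the number of lines scanned, which is why narrowing the time period matters so much for this style of query.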

Therefore, a "new" school of logging has emerged in the industry, with cost reduction at its core. For example, it relies on brute-force computing without indexing, uses the cheapest object storage on the cloud instead of SSDs or disks, and raises the compression ratio. This inevitably sacrifices some performance, because data must be decompressed at query time.

However, these ideas have obvious shortcomings. Scanning massive logs without any index takes time. At a minimum, you need to index the timestamp and tag the original text.

To meet these requirements, Elasticsearch officially introduced asynchronous search. After a search is submitted, it runs in the background and its results can be retrieved in batches. When searching for error logs, for example, you can get the first 100 results right away and the next 100 a little later.
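The request flow can be sketched as follows. The code only builds the two HTTP requests involved (submit, then poll) as plain dictionaries and does not contact a cluster. The `_async_search` endpoints and the `wait_for_completion_timeout` and `keep_alive` parameters come from the Elasticsearch async search API; the index name, the `error_code` field, and the timeout values are assumptions for illustration.

```python
def build_submit_request(index, error_code, page_size=100):
    """Describe POST /<index>/_async_search without sending it."""
    return {
        "method": "POST",
        "path": f"/{index}/_async_search",
        "params": {
            # Return whatever is ready after 2s instead of blocking.
            "wait_for_completion_timeout": "2s",
            # Keep the search and its results available for one hour.
            "keep_alive": "1h",
        },
        "body": {
            "size": page_size,
            # "error_code" is a hypothetical field name.
            "query": {"term": {"error_code": error_code}},
            "sort": [{"@timestamp": "desc"}],
        },
    }


def build_poll_request(search_id):
    """Describe GET /_async_search/<id>, used to fetch later results."""
    return {
        "method": "GET",
        "path": f"/_async_search/{search_id}",
        "params": {"wait_for_completion_timeout": "2s"},
    }


submit = build_submit_request("app-logs", "E1042")
poll = build_poll_request("hypothetical-search-id")
```

The response to the submit request carries an `id`; polling with that id returns the results gathered so far until the search reports completion.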

Another new Elasticsearch feature is the runtime field. It does not build an index, yet it still supports all kinds of searches.

You define a script (brute-force computing) that extracts the desired information from a string, which amounts to a virtual field. All ES query statements can query virtual fields just like normal fields, but underneath it is still brute-force computing. If the time range is narrowed, the query speed is generally good enough for users.
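A search that defines such a virtual field at query time might look like the sketch below. It uses the `runtime_mappings` section of the search API with a Painless script that calls `grok` and `emit`. The log layout, the field name `error_code`, and the grok pattern are assumptions, and the script has not been run against a live cluster here. A small Python function mirrors the same extraction locally so the intent is easy to check.

```python
import re

# Painless script for the runtime field (assumed log layout:
# "<timestamp> <level> <code> <message>", stored in _source.message).
EXTRACT_SCRIPT = (
    "String code = grok('%{TIMESTAMP_ISO8601:ts} %{LOGLEVEL:level} "
    "%{WORD:code} %{GREEDYDATA:rest}')"
    ".extract(params._source.message)?.code; "
    "if (code != null) emit(code);"
)


def build_runtime_search(error_code, start, end):
    """Body for POST /<index>/_search using a query-time runtime field."""
    return {
        "runtime_mappings": {
            "error_code": {"type": "keyword", "script": {"source": EXTRACT_SCRIPT}}
        },
        "query": {
            "bool": {
                "filter": [
                    # Narrow the time range first: fewer documents to scan.
                    {"range": {"@timestamp": {"gte": start, "lt": end}}},
                    # Then query the virtual field like a normal field.
                    {"term": {"error_code": error_code}},
                ]
            }
        },
    }


LINE_PATTERN = re.compile(r"^(\S+)\s+(\w+)\s+(\w+)\s+(.*)$")


def emulate_runtime_field(message):
    """Local stand-in for the script: return the third token, or None."""
    match = LINE_PATTERN.match(message)
    return match.group(3) if match else None


body = build_runtime_search("E1042", "2023-01-01T10:00:00", "2023-01-01T11:00:00")
code = emulate_runtime_field("2023-01-01T10:05:12 ERROR E1042 payment gateway timeout")
```

Because the script runs against every document that survives the other filters, the range filter on `@timestamp` does most of the work of keeping the query fast.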

In the past, the community snapshotted long-term logs into object storage, but only for backup; the snapshots could not be queried. To query them, you had to restore the snapshot to the cluster, which took a lot of manual work. Ideally, logs could be queried directly in object storage. Elasticsearch therefore developed the official Searchable Snapshot feature, which stores logs in object storage. As logs move through their life cycle they become colder and colder, yet they remain directly queryable. Results cannot be returned quickly, however, so such queries are generally run through asynchronous search. This feature is well suited to large enterprises that search logs over very long time spans.
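One way to tie searchable snapshots to the log life cycle is an index lifecycle management (ILM) policy whose cold phase uses the `searchable_snapshot` action. The sketch below only builds the request as a dictionary. The policy name, the repository name `logs-object-store`, and all age thresholds are hypothetical, and the snapshot repository is assumed to be registered already.

```python
def build_ilm_policy_request(policy_name, repository):
    """Describe PUT /_ilm/policy/<name> without sending it."""
    return {
        "method": "PUT",
        "path": f"/_ilm/policy/{policy_name}",
        "body": {
            "policy": {
                "phases": {
                    # Hot: newest logs, rolled over daily (assumed threshold).
                    "hot": {"actions": {"rollover": {"max_age": "1d"}}},
                    # Cold: data moves to the object store but stays searchable.
                    "cold": {
                        "min_age": "7d",
                        "actions": {
                            "searchable_snapshot": {
                                "snapshot_repository": repository
                            }
                        },
                    },
                    # Delete: drop logs after a year (assumed retention).
                    "delete": {"min_age": "365d", "actions": {"delete": {}}},
                }
            }
        },
    }


request = build_ilm_policy_request("logs-policy", "logs-object-store")
```

Queries against indices in the cold phase use the normal search APIs, which is why pairing them with asynchronous search is a natural fit for slow, long-range lookups.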

Alibaba Cloud provides a similar, self-developed feature called OpenStore. Because Alibaba Cloud can optimize more aggressively for its own infrastructure, OpenStore's multi-level cache works differently from the official implementation: it uses SSDs or spinning disks for the cache levels, so search is faster than with the official implementation.

Flink works well for tagging logs before they are indexed. The official Elastic stack uses Logstash (the L in ELK), but EFK uses Flink in its place. Logstash is not a cluster but a set of load-balanced instances, whereas Flink's cluster performance, exactly-once delivery, and other features are better than those of Logstash.

Flink therefore has advantages in many scenarios. For example, a Flink window can merge multiple log lines into one record. When adding dimensions, Flink can look up large amounts of structured data, join it into the log stream, and produce structured output. Only when labels and dimensions are comprehensive can brute-force scans be kept to a minimum, which avoids sacrificing too much performance while significantly reducing costs.
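The two operations described above, merging multi-line logs and adding dimensions, can be illustrated in plain Python. This is not the Flink API: in a real job the merge would typically be a keyed window and the lookup a join against a dimension table. The log layout, the host names, and the `HOST_DIMENSIONS` table are assumptions for illustration.

```python
import re

# A line that starts with an ISO date begins a new record (assumed layout).
NEW_RECORD = re.compile(r"^\d{4}-\d{2}-\d{2}T")

# Hypothetical dimension table keyed by host name.
HOST_DIMENSIONS = {"web-01": {"region": "cn-hangzhou", "team": "payments"}}


def merge_multiline(lines):
    """Fold continuation lines (e.g. stack traces) into the preceding record."""
    merged, current = [], None
    for line in lines:
        if NEW_RECORD.match(line):
            if current is not None:
                merged.append(current)
            current = line
        elif current is not None:
            current += " | " + line.strip()
        # Continuation lines seen before any record start are dropped.
    if current is not None:
        merged.append(current)
    return merged


def enrich(record):
    """Split a merged record and attach dimensions for its host, if known."""
    timestamp, host, message = record.split(" ", 2)
    return {
        "@timestamp": timestamp,
        "host": host,
        "message": message,
        **HOST_DIMENSIONS.get(host, {}),
    }


raw = [
    "2023-01-01T10:05:12 web-01 ERROR NullPointerException",
    "    at com.example.Pay.charge(Pay.java:42)",
    "    at com.example.Main.run(Main.java:7)",
    "2023-01-01T10:05:13 web-02 INFO ok",
]
merged = merge_multiline(raw)
enriched = [enrich(record) for record in merged]
```

After this step each event carries tags such as `region` and `team`, so later queries can filter on those tags and leave far less raw text to scan.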

Flink is also a good choice for metric storage and downsampling.
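Downsampling here means replacing many raw metric points with one summary per time window. The sketch below shows the idea in plain Python with fixed (tumbling) windows; it is an illustration of the aggregation, not Flink code, and the 60-second window and the sample points are assumptions.

```python
from collections import defaultdict


def downsample(points, window_seconds=60):
    """Aggregate (epoch_seconds, value) points into fixed-size windows."""
    buckets = defaultdict(list)
    for epoch_seconds, value in points:
        # Align each point to the start of its window.
        window_start = epoch_seconds - (epoch_seconds % window_seconds)
        buckets[window_start].append(value)
    return [
        {
            "window_start": start,
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "avg": sum(values) / len(values),
        }
        for start, values in sorted(buckets.items())
    ]


points = [(0, 10.0), (30, 20.0), (59, 30.0), (60, 40.0), (119, 60.0)]
summary = downsample(points)
```

Five raw points become two stored rows, which is where the storage saving comes from; the trade-off is that detail inside each window is lost.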

EFK stands for Elasticsearch + Flink + Kibana. Flink replaces Logstash, and the entire pipeline is fully managed, so a full log observability solution can be built quickly.

When massive logs are written, Flink handles tasks such as label extraction, and does so better than similar products. Handling a large number of concurrent writes is also a major problem, and ES provides a solution for it. In addition, OpenStore helps reduce high log storage costs. This open source ecosystem is also highly scalable, with strong upstream and downstream integration capabilities.

Today's IoT scenarios generate a large number of logs, which puts great pressure on both writes and storage.

Alibaba Cloud provides a solution with low migration costs. Built on ES and Flink, it can readily serve both on-cloud and off-cloud environments and connect the two. Alibaba Cloud ES OpenStore can store long-term logs in OSS, which effectively reduces costs. Flink's processing capability under highly concurrent writes also provides good support.

During major traffic surges, write traffic inevitably has peaks and troughs, and peak clipping and valley filling are easy to achieve on the cloud.
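Peak clipping and valley filling can be pictured as a buffer that absorbs bursts and drains at a bounded rate. The simulation below is a simplified sketch of that idea only. It does not model any Alibaba Cloud service, and the arrival numbers and the drain rate are assumptions.

```python
def smooth_writes(arrivals_per_tick, drain_rate):
    """Buffer incoming writes and emit at most drain_rate per tick."""
    backlog = 0
    written = []
    for arrivals in arrivals_per_tick:
        backlog += arrivals
        batch = min(backlog, drain_rate)  # clip the peak
        written.append(batch)
        backlog -= batch
    # Keep draining after arrivals stop (filling the valley).
    while backlog > 0:
        batch = min(backlog, drain_rate)
        written.append(batch)
        backlog -= batch
    return written


arrivals = [100, 900, 50, 0]
written = smooth_writes(arrivals, drain_rate=300)
```

Every write still reaches the cluster, but the cluster never sees more than the drain rate per tick, so it can be sized for the average load rather than the peak.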

Alibaba Cloud ES Indexing Service provides a massive ES pool that absorbs incoming traffic to support massive writes. The Indexing Service builds the index and transfers it to the ES cluster, so the ES cluster only needs to provide search. This lets the service handle peak traffic without reserving too many ES compute instances.
