When talking about real-time log analysis, people will think of using ELK (Elastic, Logstash, and Kibana) to implement it. ELK Stack is an open-source solution that has accumulated many contents and use cases in the community.
The new version of Alibaba Cloud Log Service adds enhancements of LogSearch/LogAnalytics to support real-time indexing, query, and analysis of log data and optimize query performance and data volume computing in many aspects. This document conducts a comprehensive comparison and analyzes the aspects that you pay attention.
- Ease of use: The cost when you get started and use the function.
- Functions (important): Query and analysis.
- Performance (important): The query and analysis requirements for unit data volume and how is the latency.
- Scale (important): The data volumes that can be processed and the scalability.
- Cost (important): The cost for using the same function and performance.
Ease of use
A log analysis system is used in the following procedures:
- Collection: Write data in a stable manner.
- Configuration: How to configure the data source.
- Expansion: Access more data sources and machines. Expand the storage space and machines.
- Usage: Described in the Functions section.
- Export: Whether data can be conveniently exported to other systems for operations such as streaming computing and storage in OSS for backup purposes
- Multi-tenant: The way data is shared to others and whether or not data can be used securely.
|Item||Sub item||Self-built ELK||LogSearch/Analytics|
|Collection||Protocol||Restful API||Restful API|
|Agent||Logstash/Beats/Fluentd, with rich ecosystem||Logtail (main) + Others (for example, Logstash)|
|Configuration||Unit||Use index to differentiate logs||Project + Logstore. Provide a concept of two levels. A project is considered as a namespace, and multiple Logstores can be created in a project|
|Attribute||API + kikana||API + SDK + Console|
|Expansion||Storage||Add machines and purchase cloud disks||No operation is needed|
|machine||Add machines||No operation is needed|
|Configuration||Configure Logstash and apply Logstash to machines by using the configuration management system||Perform operations in the console or by using APIs, without using the configuration management system|
|Collection point||Install configuration and Logstash on a machine group by using the configuration management system||Perform operations in the console or by using APIs, without using the configuration management system|
|Export||Method||API/SDK||API/SDK + Stream computing engines (Spark, Storm, Flink, and CloudMonitor) + Storage (OSS)|
|Multi-tenant||Safety||None (non-commercial version)||HTTPS + Transmission signature + Multi-tenant isolation + Access control|
|Decompress the file by using||Same account||Sub-account, role, product, and temporary authorization|
The ELK has many ecosystem and write tools and supports many installation and configuration tools. LogSearch/Analytics is a hosting service with a high degree of integration in terms of access, configuration, and usage. Normal users can access LogSearch/Analytics in five minutes, without worrying about the capacity and concurrency issues. The billing method is Pay-As-You-Go and auto scaling is supported.
Functions (query and analysis)
The query function enables quick hitting of logs that comply with search criteria. The analysis function performs statistics and computing of data.
For example, you have an analysis requirement that intends to collect statistics by IP address on the number and traffic of all read requests with a status code greater than 200. This analysis requirement can be converted to two operations: query specified results and perform statistical analysis of the results. In some cases, you can directly analyze all the logs without query.
1. Status in (200,500] and Method:Get* 2. select count(1) as c, sum(inflow) as sum_inflow, ip group by Ip
Comparison of query capability
|Type||Sub item||Self-built ELK||LogSearch/Analytics|
|Chinese word segmentation||Supported||Not supported|
|Geo||Geo query||Supported||Not directly supported. You can use the range query to have the same effect|
|Ip||IP address query||Supported||Not directly supported. You can use the string query to have the same effect|
Elasticsearch supports more data types and more advanced query methods. LogSearch/Analytics supports most of the common queries with unique features (for example, context query and expansion of program logs).
Comparison of analysis capability
|Type||Sub item||Self-built ELK||LogSearch/Analytics|
|Interface||Method||API/SDK||API/SDK + SQL92|
|Pipeline||Limited support||Full support|
|Arithmetic operation||Numeric value||Supported|
|Join||Join of multiple tables||Supported|
LogSearch/LogAnalytics provides a superset of functions compared with Elasticsearch and fully supports SQL-92. LogSearch/LogAnalytics can be directly used in SQL writing scenarios.
By using the same data set, compare the self-built ELK and LogSearch/Analytics in terms of data writing, data query, and aggregation.Hands-on Environment
- Test configuration
Type Self-built ELK LogSearch/Analytics Environment Elastic Compute Service (ECS) instance (4 core and 16 GB) x 4 + Efficient SSD cloud disk - Shard 10 10 Number of copies 2 3 (configured by default and invisible to users)
- Test data
The preceding fields are random. The following is a sample test log:
- Five columns of double-type data.
- Five columns of long-type data.
- Five columns of text-type data, with the dictionary sizes 256, 512, 768, 1024, and 1280, respectively.
timestamp:August 27th 2017, 21:50:19.000 long_1:756,444 double_1:0 text_1:value_136 long_2:-3,839,872,295 double_2:-11.13 text_2:value_475 long_3:-73,775,372,011,896 double_3:-70,220.163 text_3:value_3 long_4:173,468,492,344,196 double_4:35,123.978 text_4:value_124 long_5:389,467,512,234,496 double_5:-20,10.312 text_5:value_1125
- Size of the data set
- Size of raw data: 50 GB
- Size of raw data with the key removed: 27 GB (LogSearch/Analytics uses this size as the unit of storage and billing.)
- Number of log lines: 162,640,232 (about 160 million logs)
Write test results
Elasticsearch writes data in batches using the Bulk API, whereas LogSearch/LogAnalytics performs batch write using the PostLogstoreLogs API. The results are shown as follows:
|Latency||Average write latency||40 ms||14 ms|
|Storage||Data volume copied at a time||86 GB||58 GB|
|Expansion rate: Data volume/Raw data size||172%||121%|
|The storage size of LogSearch/Analytics that generates bills includes the volume of compressed raw data that has been written (23 GB) and the indexing traffic (27 GB), amounting to 50 GB in total.|
According to the test results:
- LogSearch/Analytics has a lower write latency (14 ms) than Elasticsearch (40 ms).
- Space: The size of raw data is 50 GB. The storage space expands because the test data is random. (In most real scenarios, the storage space after the compression is smaller than the size of raw data.) The storage space occupied by Elasticsearch expands to 86 GB, with an expansion rate of 172%, which is 58% more than the storage space occupied by LogSearch/Analytics.
Read (query + analysis) test
Use two common scenarios as an example: log query and aggregation. The average latency in the two cases is counted when concurrency is 1, 5, and 10, respectively.
- Perform GROUP BY calculation on any text column of full data. Calculate the avg, min, max, sum, and count values of five columns and sort the values by count. The first 1,000 results are obtained. Example:
select count(long_1) as pv,sum(long_2),min(long_3),max(long_4),sum(long_5) group by text_1 order by pv desc limit 1000
- For full data, randomly query logs by using a keyword, such as value_126. Obtain the number of logs that meet the query condition and the first 100 log lines. Example:
|Type||Number of concurrencies||Latency (unit: seconds) of Elasticsearch||Latency (unit: seconds) of LogSearch/Analytics|
|Case1: Analysis class||1||3.76||3.4|
|Case 2: Query||1||0.097||0.086|
- According to the test results, for the scale of 150 million data, both Elasticsearch and LogSearch/Analytics can query and analyze data within seconds.
- In Case 1 (statistics), Elasticsearch and Log Service are at the same performance level in terms of latency. Elasticsearch with SSD cloud disks has I/O advantage over Log Service when reading large amounts of data.
- In Case 2 (query), LogSearch/Analytics has much lower latency than Elasticsearch. As concurrency increases, the latency of the ELK increases, while that of LogSearch/Analytics remains stable and even decreases.
- LogSearch/Analytics can index petabytes of data in one day and query dozens of terabytes of data within seconds at a time, and supports auto scaling and horizontal scaling of data volume.
- Elasticsearch is applicable to writing gigabytes to terabytes of data in one day and storing terabytes of data. The main limits are as follows:
- Single cluster scale: The ideal condition is that one cluster contains about 20 machines. In the industry, one cluster can contain up to 100 nodes.
- Expansion of write capability: The write capability cannot be modified after shards are created. Nodes are dynamically expanded when the throughput rate is increased. The maximum number of nodes that can be used is the number of shards.
Storage expansion: When the primary shard reaches the upper limit of disk capacity, it must be migrated to another disk with larger capacity, or more shards must be allocated. Generally, you can create an index, specify more shards, and rebuild existing data.
LogSearch/Analytics does not have expansion issues because each shard is in distributed storage. When the throughput rate is increased, shards can be dynamically split for horizontal scaling of the processing capability.
Based on the preceding test data, this section calculates the average monthly cost in the case that 50 GB data is written on a daily basis and stored for 90 days (the actual data size is 27 GB).
- 1.The billing method of Log Service LogSearch/Analytics includes read and write traffic, indexing traffic, and storage space. The query function is free of charge. For more information, see Billing method
Billing item Value Unit price Cost (USD) Read and write traffic 23 GB x 30 USD 0.2/GB 138 Storage space (data stored for 90 days) 50 GB x 90 USD 0.3/GB x Month 1350 Indexing traffic 27 GB x 30 USD 0.0875/GB 283 In total - - 1771
- The Elasticsearch costs include the machine costs and the costs of SSD cloud disks used for data storage
- Generally, cloud disks provide high reliability. Therefore, the storage of copies is not billed.
- Generally, for storage disks, 15% available space must be reserved to avoid full space occupation by written data. Therefore, a factor of 1.15 is multiplied.
Billing item Value Unit price Cost (USD) Server Server of 4 cores and 16 GB x 4 (three months) (ecs.mn4.xlarge) Cost of monthly or yearly subscription: USD 675/month 2021 Storage 86*1.15*90 (only one copy is calculated here) SSD: USD 1/GB x Month 8901 - SATA: USD 0.35/GB x Month 3115 In total 12943 (SSD) 5135 (SATA)
With the same performance, the cost ratio of LogSearch/Analytics to the ELK (SSD) is 13.6%. During the test process, SSD is replaced with SATA to lower costs (the cost ratio of LogSearch/Analytics to the ELK with SATA is 34%). However, latency increases from 40 ms to 150 ms. After a long period of reading and writing, the query and read/write latency increases greatly and the query and analysis functions become abnormal.
Compared with the open-source ELK, LogSearch/Analytics provides the same query speed but higher throughput, more robust analysis capability, and a 87% cost reduction, with a support for the Pay-As-You-Go billing method and zero O&M, allowing you to focus on business analysis.
In addition to LogSearch/Analytics, Log Service also provides the LogHub and LogShipper functions and supports real-time data collection and interconnection with stream computing systems (Spark, Storm, and Flink) and offline analysis systems (E-MapReduce, Presto, and Hive) to provide a one-stop real-time data solution.