Simple Log Service offers four data processing solutions: processing plugins, Ingest Processor, data transformation, and Consumer Processor. This topic compares their features and scenarios to help you choose the most suitable solution for your needs.
Background information
- Processing plug-in configuration: The Simple Log Service data collector offers various configurations for data processing. You can use processing plug-ins and SPL statements to process data on the client.
- Ingest Processor: An ingest processor can be associated with a Logstore. By default, data written to the Logstore is processed by the ingest processor on the server side.
- Data transformation: Data is written to a source Logstore and then processed based on data transformation rules. The processed data is written to the destination Logstore.
- Consumer Processor: You can configure a Consumer Processor to process data from a Logstore in real time by using SPL. The Consumer Processor supports consumption through SDKs and third-party integrations such as Flink and DataWorks.
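All four methods accept SPL statements for row-level processing. As a minimal sketch, a statement such as the following expands a JSON payload, filters rows, and adds a static field; the `content` and `status` field names are assumptions for illustration:

```
* | parse-json content
  | where status = '404'
  | extend source_env = 'prod'
```

Each pipeline stage receives one row and emits zero or one row, which is the shape of statement that processing plug-ins and ingest processors support.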
Comparison of the methods
Processing plug-ins, ingest processors, data transformation, and Consumer Processors cover the entire data lifecycle: before storage (at collection), during storage (at write time), and after storage. They have similarities. For example, they can all process data and support the SPL language. However, these data processing methods differ in their specific use cases and capabilities.
| Comparison dimension | Processing plug-in | Ingest processor | Data transformation | Consumer Processor |
| --- | --- | --- | --- | --- |
| Data processing stage | Before storage (during data collection). | During storage. | After storage. | After storage. |
| Write to multiple Logstores | Not supported by a single collection configuration. You can combine multiple collection configurations, each with its own processing plug-ins. | Not supported. | Supported. | Not supported. |
| SPL | Supported. | Supported. | Supported. | Supported. |
| Supported SPL instructions | Instructions that process single-line data only. The input is one row, and the output is zero or one row. | Instructions that process single-line data only. The input is one row, and the output is zero or one row. | Complete set of SPL instructions. | Complete set of SPL instructions. |
| No sensitive data written to disks | Supported. | Supported. | Not supported. Raw data is written to the source Logstore. | Not supported. Raw data is written to the source Logstore. |
| Resource usage | Consumes some client resources. | Server-side resources are automatically scaled. This process is transparent to users. | Server-side resources are automatically scaled. This process is transparent to users. | Server-side resources are automatically scaled. This process is transparent to users. |
| Performance impact | Collection performance is slightly affected by the number of plug-ins and the complexity of the configurations. Data write performance is not affected. | Write performance is slightly affected by the complexity of the data and SPL statements. The latency of a single request can increase by several milliseconds to tens of milliseconds, depending on the size of the request packet and the complexity of the SPL statements. | The write performance of the source Logstore is not affected. | The write performance of the source Logstore is not affected. |
| Scenario coverage | Wide. | Moderate. | Wide. | Wide. |
| Cost | No Simple Log Service data processing fees are charged, but some client resources are consumed. | Data processing fees are charged. In data filtering scenarios, this fee is typically lower than the cost saved from reduced data traffic and storage. | Source Logstore fees and data processing fees are charged. You can set the data retention period of the source Logstore to one day and disable indexing to reduce the cost of the source Logstore. | Source Logstore fees and data processing fees are charged. You can set the data retention period of the source Logstore to one day and disable indexing to reduce the cost of the source Logstore. |
| Fault tolerance | In the plug-in, you can configure whether to retain the original fields if processing fails. | You can configure whether to retain the original data if processing fails. | Because the source data is already stored, you can reprocess the data if a transformation rule fails. You can also create multiple data transformation jobs to process data separately. | Because the source data is already stored, Flink, DataWorks, and SDK consumer groups that integrate SPL consumption rules automatically retry when errors occur. |
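To illustrate the "no sensitive data written to disks" distinction: because processing plug-ins and ingest processors run before or during storage, an SPL statement like the following sketch could mask a phone number before the raw value ever reaches disk. The `phone` field name and the 11-digit number format are assumptions; `regexp_replace` follows the Presto-style function syntax that SPL uses:

```
* | extend phone = regexp_replace(phone, '(\d{3})\d{4}(\d{4})', '$1****$2')
```

With data transformation or a Consumer Processor, the same statement would run only after the unmasked value had already been stored in the source Logstore.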
Based on these differences in capabilities, the following table compares how Logtail processing configurations, ingest processors, data transformation, and Consumer Processors fit typical scenarios.
| Scenario | Logtail processing configuration | Ingest processor | Data transformation | Consumer Processor |
| --- | --- | --- | --- | --- |
| Simple data processing, such as single-line data processing that does not involve complex computational logic. | Recommended | Recommended | Recommended | Recommended |
| Complex data processing that involves complex computational logic or requires multiple conditions, window aggregation, or dimension table enrichment. | General | General | Recommended | Recommended |
| Limited client resources, such as when the compute resources available to Logtail are limited. | General | Recommended | Recommended | Recommended |
| Limited client-side control, such as no permission to modify collection-side Logtail configurations or SDK write logic. | Not recommended | Recommended | Recommended | Recommended |
| Limited server-side control, such as no permission to modify Logstore or data transformation configurations. | Recommended | Not recommended | Not recommended | Not recommended |
| Sensitive to data write latency and performance, such as when you want raw data to be collected as quickly as possible. | General | General | Recommended | Recommended |
| Data masking, where sensitive data can be written to disks. | Recommended | Recommended | Recommended | Recommended |
| Data masking, where sensitive data cannot be written to disks. | Recommended | Recommended | Not recommended | Not recommended |
| Data enrichment that does not depend on external data sources, such as adding a new field whose value is static or extracted from an existing field. | General | Recommended | Recommended | Recommended |
| Data enrichment that depends on external data sources, such as querying a MySQL table for other enrichment data based on a log field. | Not recommended | Not recommended | Recommended | Recommended |
| Data distribution, which writes data to different Logstores based on different conditions. | General | Not recommended | Recommended | Not recommended |
| Data filtering, where raw data is not required, to save costs. | General | Recommended | General | General |
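For the data filtering scenario, a minimal SPL sketch attached to an ingest processor could drop verbose records before they are stored, so that filtered-out data never incurs traffic or storage fees. The `level` field name and the `DEBUG` value are assumptions for illustration:

```
* | where level != 'DEBUG'
```

Because this runs during storage, only the rows that pass the filter are written to the Logstore, which is why the ingest processor is the recommended choice when raw data does not need to be retained.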