Simple Log Service provides four data processing methods: processing plugins, ingest processors, data transformation, and consumer processors. This topic compares the features and applicable scenarios of these methods to help you select the most suitable one.
Background information
Processing plugin configuration: Simple Log Service data collectors provide various processing configurations. These configurations support processing plugins and client-side data processing with SLS Processing Language (SPL) statements.
Ingest processor: An ingest processor can be associated with a Logstore. By default, data written to the Logstore is processed by the ingest processor on the server side.
Data transformation: Data is first written to a source Logstore and then processed based on transformation rules. The processed data is written to a destination Logstore.
Consumer processor: A consumer processor uses SPL to process data in real time as the data is consumed from a Logstore. Consumer processors can be used with consumption channels such as SDKs, Flink, and DataWorks.
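All four methods run the same SPL syntax: a pipeline of instructions chained with `|`. The following is a minimal sketch of a single-line SPL statement that keeps only error logs and masks the middle digits of a phone number. The field names `level` and `phone` are hypothetical and depend on your log schema, and `regexp_replace` is the SQL function from Simple Log Service's query syntax, which SPL expressions can call.

```
*
| where level = 'ERROR'
| extend phone_masked = regexp_replace(phone, '(\d{3})\d{4}(\d{4})', '$1****$2')
| project-away phone
```

Because every instruction here maps one input line to at most one output line, a statement like this can run in a processing plugin, an ingest processor, a data transformation job, or a consumer processor.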
Comparison
Processing plugins, ingest processors, data transformation, and consumer processors cover the entire data lifecycle: before storage (at collection), during storage (at write), and after storage (after write). All four methods process data and support SPL, but they differ in their applicable scenarios and capabilities.
| Comparison dimension | Processing plugin | Ingest processor | Data transformation | Consumer processor |
| --- | --- | --- | --- | --- |
| Data processing stage | Before storage (during data collection). | During storage (at write). | After storage. | After storage. |
| Write to multiple Logstores | Not supported in a single collection configuration. You can use multiple collection configurations with processing plugins. | Not supported. | Supported. | Not supported. |
| SPL support | Supported. | Supported. | Supported. | Supported. |
| Supported SPL instructions | Single-line processing instructions, which take one line of data as input and produce zero or one line of output. | Single-line processing instructions, which take one line of data as input and produce zero or one line of output. | The complete set of SPL instructions. | The complete set of SPL instructions. |
| Prevents sensitive data from being written to disk | Supported. | Supported. | Not supported. Data passes through the source Logstore. | Not supported. Data passes through the source Logstore. |
| Resource usage | Consumes some client resources. | Server-side resources are scaled automatically and transparently to users. | Server-side resources are scaled automatically and transparently to users. | Server-side resources are scaled automatically and transparently to users. |
| Performance impact | Collection performance is slightly affected by the number of plugins and the complexity of the configuration. Write performance is not affected. | Write performance is slightly affected by the complexity of the data and the SPL statements. The latency of a single request can increase by several to tens of milliseconds, depending on the size of the request packet and the complexity of the SPL statements. | The write performance of the source Logstore is not affected. | The write performance of the source Logstore is not affected. |
| Scenario coverage | Broad. | Moderate. | Broad. | Broad. |
| Cost | No Simple Log Service data processing fees are charged, but some client resources are consumed. | Data processing fees are charged. In data filtering scenarios, this fee is usually lower than the savings from reduced data traffic and storage costs. | Source Logstore fees and data processing fees are charged. You can reduce source Logstore costs by setting its data retention period to one day and disabling indexing. | Source Logstore fees and data processing fees are charged. You can reduce source Logstore costs by setting its data retention period to one day and disabling indexing. |
| Fault tolerance | You can configure the plugin to retain the original fields if processing fails. | You can configure the processor to retain the raw data if processing fails. | Because the source data is already stored, it can be reprocessed if a transformation rule fails. You can also create multiple data transformation jobs that process the data independently. | Because the source data is already stored, consumer groups that apply SPL rules through Flink, DataWorks, or SDKs automatically retry when errors occur. |
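To make the "Supported SPL instructions" row concrete, the sketch below uses only single-line instructions (`parse-regexp`, `where`, `project`), so each input line yields at most one output line and the statement is valid in all four methods, including processing plugins and ingest processors. The `content` field, the access-log layout, and the use of a SQL-style `cast` in the filter expression are assumptions for illustration.

```
*
| parse-regexp content, '^(\S+) "(\S+)" (\d{3})' as client_ip, request_uri, status
| where cast(status as bigint) >= 500
| project client_ip, request_uri, status
```

Pipelines that need windowed aggregation or that emit multiple output lines per input line fall under the complete instruction set and are therefore available only in data transformation and consumer processors.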
The following table compares the capabilities of processing plugins (Logtail processing configurations), ingest processors, data transformation, and consumer processors in typical scenarios. Acceptable indicates that the method works in the scenario but is not the preferred choice.
| Scenario | Processing plugin (Logtail) | Ingest processor | Data transformation | Consumer processor |
| --- | --- | --- | --- | --- |
| Simple data processing, such as single-line processing without complex computational logic. | Recommended | Recommended | Recommended | Recommended |
| Complex data processing, such as tasks that involve complex computational logic, multiple conditions, window aggregation, or dimension table enrichment. | Acceptable | Acceptable | Recommended | Recommended |
| Limited client resources, such as when the compute resources available to Logtail are limited. | Acceptable | Recommended | Recommended | Recommended |
| Limited client-side control, such as no permission to modify Logtail configurations or SDK write logic on the client. | Not recommended | Recommended | Recommended | Recommended |
| Limited server-side control, such as no permission to modify Logstore or data transformation configurations. | Recommended | Not recommended | Not recommended | Not recommended |
| Sensitivity to write latency and performance, such as when raw data must be collected as soon as possible. | Acceptable | Acceptable | Recommended | Recommended |
| Data masking where sensitive data may be written to disk. | Recommended | Recommended | Recommended | Recommended |
| Data masking where sensitive data must not be written to disk. | Recommended | Recommended | Not recommended | Not recommended |
| Data enrichment that does not depend on external data sources, such as adding a field whose value is static or extracted from an existing field. | Acceptable | Recommended | Recommended | Recommended |
| Data enrichment that depends on external data sources, such as querying enrichment data from a MySQL table based on a log field. | Not recommended | Not recommended | Recommended | Recommended |
| Data distribution, which writes data to different Logstores based on conditions. | Acceptable | Not recommended | Recommended | Not recommended |
| Data filtering to save costs when the raw data is not required. | Acceptable | Recommended | Acceptable | Acceptable |
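For the data filtering row, a hedged sketch of an ingest processor statement: because filtering happens during storage, only the retained lines incur storage and indexing fees. The `level` and `debug_context` field names are hypothetical.

```
*
| where level != 'DEBUG'
| project-away debug_context
```

The data processing fee still applies to the full incoming stream, which is why the Cost row notes that in filtering scenarios this fee is usually lower than the savings from reduced traffic and storage.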