Simple Log Service (SLS) offers real-time data consumption through Consume Processors, which use Structured Process Language (SPL) to process data on the server before it is consumed. This topic describes how Consume Processors work and covers their benefits, billing rules, supported consumption targets, usage notes, and limits.
How it works
A Consume Processor uses SPL to process SLS data in real time. This feature is suitable for various applications, including third-party software, multi-language apps, cloud products, and stream computing frameworks. SPL is a high-performance data processing language designed for semi-structured logs. It allows for pre-processing and cleaning of log data on the server, including row filtering, column cropping, and regex-based extraction. After processing, the client receives the data in a structured format. For more information about SPL syntax, see SPL syntax.
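For example, the following sketch of an SPL pipeline filters rows, extracts a field with a regular expression, and trims the result to two columns. The field names (level, message, request_uri) are illustrative, not from the source; see SPL syntax for the exact instruction forms.

```
* | where level = 'ERROR'
  | parse-regexp message, 'uri=(\S+)' as request_uri
  | project level, request_uri
```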
Benefits
Reduces data transfer costs for data consumption over the internet.
For example, if you write logs to SLS and then consume them over the internet, you may need to filter the logs before distributing them to internal systems. With SPL-based consumption, you can filter logs directly in SLS. This prevents large volumes of irrelevant logs from being sent to consumers and saves data transfer costs.
Saves local CPU resources and accelerates computing.
For example, if you write logs to SLS and then consume them on a local machine for computation, you can use SPL-based consumption to offload that computation to SLS. This reduces local resource consumption and speeds up downstream processing.
Billing rules
If the Logstore uses the pay-by-data-written billing mode, using a Consume Processor incurs no extra fees. You are charged only for outbound data transfer when you pull data from the public endpoint of SLS. The cost is calculated based on the amount of compressed data. For more information, see Billing items for the pay-by-data-written mode.
If the Logstore uses the pay-by-feature billing mode, using a Consume Processor incurs server-side computation fees. You may also incur data transfer costs if you use the public endpoint of SLS. For more information, see Billing items for the pay-by-feature mode.
Consumption targets
The following table describes the consumption targets that SLS supports for Consume Processors.
Type | Target | Description |
--- | --- | --- |
Multi-language applications | Multi-language applications | Applications written in languages such as Java, Python, and Go can consume data from SLS through Consume Processor-based consumer groups, as shown in the sketch after this table. For more information, see Consume data using an API and Consume logs using a consumer group. Best practice: Consume logs based on a Consume Processor (SPL) using an SDK |
Cloud products | Alibaba Cloud Flink | You can use Alibaba Cloud Flink real-time computing to consume data from SLS. For more information, see Simple Log Service (SLS). Best practices: |
Stream computing | Kafka | If you need this feature, submit a ticket. |
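For reference, here is a minimal, hedged sketch of SPL-based pull consumption with the aliyun-log-python-sdk. The endpoint, project, logstore, and SPL field names are placeholders, and the `query` keyword argument on `pull_logs` is an assumption about SDK versions that accept an SPL statement for server-side processing; verify the parameter name against your SDK version's reference before relying on it.

```python
# Hedged sketch: pull-based consumption with server-side SPL filtering.
# The `query` parameter is an assumed hook for the SPL statement; verify
# it against your aliyun-log-python-sdk version.
import time

from aliyun.log import LogClient

client = LogClient("cn-hangzhou.log.aliyuncs.com",   # placeholder endpoint
                   "<access_key_id>", "<access_key_secret>")
project, logstore, shard_id = "my-project", "my-logstore", 0  # placeholders

# Start from data written one hour ago.
cursor = client.get_cursor(project, logstore, shard_id,
                           int(time.time()) - 3600).get_cursor()

# Server-side SPL: keep POST requests only and trim to two columns.
spl = "* | where method = 'POST' | project method, latency"

while True:
    res = client.pull_logs(project, logstore, shard_id, cursor,
                           count=100, query=spl)  # query: assumed SPL hook
    for group in res.get_loggroup_list().LogGroups:
        for log in group.Logs:
            print({kv.Key: kv.Value for kv in log.Contents})
    next_cursor = res.get_next_cursor()
    if next_cursor == cursor:   # reached the shard tail; wait for new data
        time.sleep(1)
    cursor = next_cursor
```

With a consumer group, you would instead reference the Consume Processor in the consumer configuration so that every consumer receives pre-processed data; see Consume logs using a consumer group.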
Usage notes
Consume Processors perform complex computations on the server. Server-side read latency might therefore increase slightly, depending on the complexity of the SPL statement and the characteristics of the data. For example, processing 5 MB of data might add 10 ms to 100 ms of latency. However, because less data is transferred and less local computation remains, the overall end-to-end latency, which is the total time from data pull to local computation completion, usually decreases despite the increased server-side latency.
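As a rough, hypothetical illustration: if an SPL filter reduces a 5 MB pull to 1 MB, the 4 MB of avoided transfer saves about 0.32 seconds on a 100 Mbit/s (12.5 MB/s) internet link, which more than offsets the 10 ms to 100 ms of added server-side processing.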
When you use a Consume Processor, issues such as SPL syntax errors or missing source data fields can cause data loss or consumption failures. For more information, see Error handling.
The maximum length of an SPL statement in a Consume Processor configuration is 4,000 characters (about 4 KB).
The shard read limits for Consume Processors are the same as those for normal real-time consumption. The shard read traffic for a Consume Processor is calculated based on the raw data volume before SPL processing. For more information about the limits, see Data reads and writes.
Limits
Item | Description |
--- | --- |
Number of Consume Processors | You can create a maximum of 100 Consume Processors in each project. To request a higher quota, submit a ticket. |
SPL statement length in a Consume Processor configuration | Each SPL statement cannot exceed 4,000 characters. |
SPL instruction limits in a Consume Processor | Only row-processing instructions are supported, as shown in the example after this table. Instructions for aggregation, logical judgment, and similar operations are not supported. |
Effective period for Consume Processor updates or deletions | Updates to or deletions of a Consume Processor configuration take effect within one minute. |
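For example, a row-level pipeline such as the following hedged sketch (field names hypothetical) is accepted, whereas any instruction that aggregates rows or evaluates branching logic is rejected:

```
* | where status >= 500 | project status, path, latency
```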
FAQ
How do I handle the ShardReadQuotaExceed error when using a Consume Processor?
This error code indicates that the shard read traffic has exceeded the quota. To resolve this issue, use one of the following solutions:
If your client application encounters this error, wait for a period and then retry the operation.
Alternatively, split the shard. Splitting distributes newly written data across the resulting shards, so the read traffic for each shard decreases when you consume the new data. See the sketch after this answer for both approaches.
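Both solutions, sketched with the aliyun-log-python-sdk under stated assumptions (the retry parameters and the split key are illustrative choices, and `split_shard` should be verified against your SDK version):

```python
# Hedged sketch: back off and retry on ShardReadQuotaExceed, or split
# the hot shard so that read traffic spreads across the new shards.
import time

from aliyun.log import LogClient
from aliyun.log.logexception import LogException

def pull_with_backoff(client, project, logstore, shard_id, cursor,
                      max_retries=5):
    delay = 1
    for _ in range(max_retries):
        try:
            return client.pull_logs(project, logstore, shard_id, cursor,
                                    count=100)
        except LogException as e:
            if e.get_error_code() != "ShardReadQuotaExceed":
                raise               # unrelated error: do not retry here
            time.sleep(delay)       # wait, then retry with a longer delay
            delay = min(delay * 2, 30)
    raise RuntimeError("shard read quota still exceeded after retries")

# Alternative: split the shard. The MD5 key below is an illustrative
# midpoint; pick a key inside the target shard's hash range.
# client.split_shard(project, logstore, shard_id,
#                    "7fffffffffffffffffffffffffffffff")
```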
What is the traffic shaping policy for Consume Processors?
The throttling policy for Consume Processors is the same as for standard data consumption. For more information, see Data reads and writes. The traffic for a Consume Processor is calculated based on the raw data volume before SPL processing.
For example, assume the raw data size is 100 MB (compressed). After the data is filtered by the SPL statement `* | where method = 'POST'`, the data returned to the client is 20 MB (compressed). For throttling purposes, the traffic is calculated as 100 MB.
Why is the outbound traffic low in the "Traffic/min" chart in Project Monitoring after I use a rule to consume data?
The outbound traffic value in the "Traffic/min" chart in Project Monitoring shows the data volume after SPL processing, not the raw data volume. If your SPL statement includes instructions that reduce the data volume, such as row filtering or column cropping, the reported outbound traffic can be low.