Data transformation limits and quotas - Simple Log Service

Task configuration

Limit	Description
Number of tasks	You can create up to 100 data transformation jobs in one project. Important A job consumes quota even when it is stopped or has completed. To reduce quota usage, delete stopped or completed jobs that you no longer need. For more information, see Manage data transformation jobs. If you need a higher quota, submit a ticket.
Source data LogStore consumer group dependencies	Each data transformation job uses one consumer group from its source LogStore. While a job is running, do not delete that consumer group or reset its checkpoint. Otherwise, the job restarts data consumption from its configured start time and may produce duplicate output. Important To improve processing efficiency, the job writes its shard consumption progress to the consumer group only at regular intervals. Therefore, the result of the GetCheckPoint API for this consumer group does not reflect the latest processing progress. For accurate progress, check the Shard consumption latency module in the Data Transformation Dashboard. For more information, see How data transformation works, Glossary, and Consumer group APIs.
Number of source LogStore consumer groups	You can create up to 30 consumer groups in one LogStore. Therefore, one source LogStore can support up to 30 data transformation jobs. For more information, see Basic resource limits. If this limit is exceeded, new jobs fail to run after they are started, and the operational logs show the specific error messages. For more information, see How to view error logs. Important Simple Log Service does not automatically delete consumer groups used by jobs that are stopped or have completed. To reduce the number of unused consumer groups, delete stopped or completed jobs that you no longer need. For more information, see Manage data transformation jobs.
Modifying the time range of a running job	After you modify the time range for a running job, the job starts processing from the new start time and handles all data within the new time range. To extend the time range: Keep the existing job and create a new job to cover the extended period. To shrink the time range: Data that is already written to the destination is not deleted. If necessary, purge the existing destination data before you modify the job to prevent duplicate data.
Number of output destinations	You can configure up to 20 static output destinations in one data transformation job. In your transformation code, if you use a single static output destination configuration and dynamically specify the project and LogStore, you can write to a maximum of 200 destinations. If you exceed this limit, data written to additional destinations is dropped.
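
The numeric limits in the table above can be checked before a job is created. The following is a minimal sketch, not part of any Simple Log Service SDK: the function name `check_task_limits` and its parameters are hypothetical, and the caller is assumed to supply the counts (for example, from its own inventory of jobs and consumer groups).

```python
from typing import Iterable, List, Tuple

# Limits quoted from the table above.
MAX_JOBS_PER_PROJECT = 100
MAX_CONSUMER_GROUPS_PER_LOGSTORE = 30
MAX_STATIC_OUTPUTS = 20
MAX_DYNAMIC_DESTINATIONS = 200

Destination = Tuple[str, str]  # (project, logstore); hypothetical representation


def check_task_limits(
    jobs_in_project: int,
    consumer_groups_on_source: int,
    static_outputs: int,
    dynamic_destinations: Iterable[Destination] = (),
) -> List[str]:
    """Return the limits a planned job would exceed (empty list if none).

    All counts describe the state after the new job has been created.
    Stopped and completed jobs must be included in the counts, because
    they still consume quota and keep their consumer groups.
    """
    problems = []
    if jobs_in_project > MAX_JOBS_PER_PROJECT:
        problems.append("more than 100 jobs in the project")
    if consumer_groups_on_source > MAX_CONSUMER_GROUPS_PER_LOGSTORE:
        problems.append("more than 30 consumer groups on the source LogStore")
    if static_outputs > MAX_STATIC_OUTPUTS:
        problems.append("more than 20 static output destinations")
    # Only distinct (project, logstore) pairs count as separate destinations.
    if len(set(dynamic_destinations)) > MAX_DYNAMIC_DESTINATIONS:
        problems.append("more than 200 dynamic destinations; extra data is dropped")
    return problems
```

An empty result means none of the four limits is exceeded. The check covers the quoted numbers only and does not replace the error messages in the operational logs.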

Data transformation

Limitations	Description
Quick preview	The quick preview feature helps you debug transformation code. It has the following limits: It does not support connections to external resources such as RDS, OSS, or SLS. Use custom input to test dimension table data. Each request processes no more than 1 MB of raw data and no more than 1 MB of dimension table data. Requests exceeding these limits return an error. Each request returns at most the first 100 processed results. The Advanced preview feature has no such limits.
Runtime concurrency	A data transformation job uses the number of read/write shards in its source LogStore as its maximum runtime concurrency. For more information, see How data transformation works. For LogStore shard limits, see Basic resource limits. To learn how to split shards, see Manage shards. Important Insufficient runtime concurrency for a data transformation job does not trigger the automatic sharding feature for the source LogStore. You must manually split shards in the source LogStore to increase the job's runtime concurrency. For details about automatic sharding, see Manage shards. Splitting shards in the source LogStore increases the maximum runtime concurrency only for data written after the split. For data written before the split, the maximum runtime concurrency depends on the number of read/write shards that the source LogStore had when the data was written.
Data load per concurrent unit	The data load per concurrent unit depends on the storage data size in the source LogStore shard where the job runs. If data is unevenly distributed across shards in the source LogStore, some concurrent units become hot spots and cause processing delays for those shards. If the source data uses KeyHash routing, distribute keys across shards evenly to reduce imbalance.
Memory usage	Each concurrent unit has a memory limit of 6 GB. Exceeding this limit slows down job performance and causes processing delays. This limit is usually exceeded when too many LogGroups are pulled in one batch. Adjust the advanced parameter `system.process.batch_size` to control memory usage. Important The default (and maximum) value for the advanced parameter `system.process.batch_size` is 1000. You can set it to any positive integer up to 1000.
CPU usage	Each concurrent unit has a CPU limit of 100%. To meet higher CPU requirements, increase the job's runtime concurrency by splitting shards in the source LogStore.
Dimension table data volume	Dimension tables support up to two million entries and up to 2 GB of memory usage. If the data exceeds either limit, it is truncated to fit. Affected functions include res_rds_mysql, res_log_logstore_pull, and res_oss_file. Important If a job uses multiple dimension tables, they share this limit. Keep dimension table data as small as possible.
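
Two of the limits in the table above lend themselves to a quick pre-flight check: the allowed range of `system.process.batch_size` and the shared dimension table budget. The following is a minimal sketch under stated assumptions. The helper names are hypothetical, "2 GB" is interpreted as 2 × 1024³ bytes, and the per-table row counts and sizes are estimates that the caller provides.

```python
from typing import Iterable, Tuple

MAX_BATCH_SIZE = 1000              # default and maximum per the table above
MAX_DIM_ROWS = 2_000_000           # entries, shared by all dimension tables
MAX_DIM_BYTES = 2 * 1024 ** 3      # assumption: "2 GB" read as 2 GiB


def validate_batch_size(value: int) -> int:
    """Return value if it is a valid system.process.batch_size, else raise."""
    if not isinstance(value, int) or isinstance(value, bool):
        raise ValueError("batch size must be an integer")
    if not 1 <= value <= MAX_BATCH_SIZE:
        raise ValueError(f"batch size must be between 1 and {MAX_BATCH_SIZE}")
    return value


def dimension_tables_fit(tables: Iterable[Tuple[int, int]]) -> bool:
    """Check whether all dimension tables together stay within the limits.

    tables holds one (row_count, size_in_bytes) pair per dimension table.
    The limits are shared, so rows and bytes are summed across tables.
    """
    total_rows = 0
    total_bytes = 0
    for rows, size in tables:
        total_rows += rows
        total_bytes += size
    return total_rows <= MAX_DIM_ROWS and total_bytes <= MAX_DIM_BYTES
```

If `dimension_tables_fit` returns `False`, the table data would be truncated at run time, so reducing the data at its source is the safer option. Lowering the batch size trades throughput for a smaller memory footprint per concurrent unit.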

Result data writing

Limit	Description

Writing to destination LogStore

Warning

Do not configure the destination store as the current source store (same-source configuration). Otherwise, logs may be written in a loop, which incurs additional storage and traffic costs. You are responsible for the resource consumption and costs incurred.

When writing processed results to a destination LogStore, follow the LogStore write limits. For details, see Basic resource limits and Data read and write.

If you use the e_output function and specify the hash_key_field or hash_key parameter to write with KeyHash routing, distribute keys across shards evenly to reduce imbalance.

You can tell when a job hits this limit by checking its operational logs. See How to view error logs.

Important

When a data transformation job hits a destination LogStore write limit, it retries indefinitely to ensure data integrity. However, this affects job progress and causes processing delays for the current source shard.
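
To judge whether a candidate hash key spreads writes evenly, you can estimate the per-shard load offline from a sample of key values. The following is a rough sketch, not the service's actual routing code. It assumes that each key is mapped into a 128-bit MD5 hash space and that the shards split that space into equal ranges. Both are assumptions; real shard ranges can be uneven after splits. All function names are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Iterable, List


def shard_for_key(key: str, shard_count: int) -> int:
    """Map a key to a shard index, assuming equal MD5 ranges per shard."""
    digest = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)
    # digest is below 2**128, so the result is always in [0, shard_count).
    return (digest * shard_count) >> 128


def shard_load(keys: Iterable[str], shard_count: int) -> List[int]:
    """Count how many sampled records would land on each shard."""
    counts = Counter(shard_for_key(k, shard_count) for k in keys)
    return [counts.get(i, 0) for i in range(shard_count)]


def skew_ratio(load: List[int]) -> float:
    """Return busiest-shard load divided by mean load (1.0 is perfectly even)."""
    mean = sum(load) / len(load)
    return max(load) / mean if mean else 0.0
```

A ratio close to 1.0 suggests an even spread. A ratio approaching the shard count means that nearly all records share a few key values, which is the hot-spot pattern that leads to write limits and processing delays. In that case, a higher-cardinality field is usually a better hash key.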

Cross-region transmission

When you use a public network endpoint for cross-region data transmission, unpredictable network quality may cause network errors when writing results to the destination LogStore. This leads to processing delays for data transformation jobs. For Simple Log Service endpoints, see Service endpoints.

We recommend enabling transfer acceleration for the destination project and configuring its transfer acceleration endpoint in your data transformation job to improve network stability. For more information, see Manage transfer acceleration.