This topic describes the limits on data transformation in Simple Log Service.
Job configuration
| Item | Description |
| --- | --- |
| Number of jobs | You can create up to 100 data transformation jobs in a project. Important: A data transformation job that is stopped or complete still consumes the job quota. To free the quota, we recommend that you delete the stopped or complete jobs after you confirm that you no longer require them. For more information, see Manage a data transformation job. To increase the quota, submit a ticket. |
| Dependency on a consumer group in the source Logstore | A running data transformation job depends on a consumer group in the source Logstore. While the job is running, do not delete the consumer group or reset its consumption checkpoint. If you do, the job consumes data again from the start time that you specified, and the results contain duplicate data. Important: To improve transformation efficiency, the data consumption progress of a job in a shard is committed to the consumer group only at regular intervals. Therefore, the result of the GetCheckPoint operation does not reflect the latest transformation progress (see the GetCheckPoint sketch after this table). To obtain the accurate progress of a job, view the shard consumption delay chart on the dashboard that is created for the job. For more information about the dashboard, see Data transformation dashboard. For more information, see Data transformation basics, Terms, and API operations related to consumer groups. |
| Number of consumer groups in the source Logstore | You can create up to 30 consumer groups in a Logstore. Therefore, you can create up to 30 data transformation jobs on a source Logstore. For more information, see Basic resources. If you create more than 30 consumer groups, the data transformation jobs cannot run as expected after they are started, and the run logs of the jobs record the error details. For more information, see View error logs. Important: When a data transformation job is stopped or complete, Simple Log Service does not automatically delete the consumer group on which the job depends. To reduce the number of unused consumer groups, we recommend that you delete the stopped or complete jobs that you no longer require (see the cleanup sketch after this table). For more information, see Manage a data transformation job. |
| Change to the time range of a job | If you change the time range of a running job, the job restarts consumption from the start time that you specify and transforms all data that is generated within the new time range. |
| Number of storage destinations | You can configure up to 20 static storage destinations for a data transformation job. In addition, up to 200 projects and 200 Logstores can be dynamically specified in the transformation code (see the e_output sketch after this table). If these limits are exceeded, the data that is written to storage destinations beyond the limits is discarded. |
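
The following minimal sketch shows how the checkpoint that a job committed to its consumer group can be read with the GetCheckPoint operation through the aliyun-log-python-sdk. The endpoint, credentials, and resource names are placeholders, and, as noted above, the returned checkpoints may lag behind the job's actual progress because they are committed only at intervals.

```python
from aliyun.log import LogClient

# Placeholder endpoint, credentials, and resource names.
client = LogClient("cn-hangzhou.log.aliyuncs.com",
                   "your-access-key-id", "your-access-key-secret")

# GetCheckPoint returns the checkpoint that the transformation job last
# committed for each shard. Checkpoints are committed at intervals, so
# the values may trail the job's real progress; use the shard consumption
# delay chart on the job dashboard for the authoritative view.
resp = client.get_check_point("my-project", "my-source-logstore",
                              "my-etl-consumer-group")
resp.log_print()  # prints the checkpoint and update time of each shard
```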
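
Because the consumer group of a stopped or complete job is not deleted automatically, leftover groups can exhaust the 30-group quota. The following sketch, again with placeholder names and credentials, lists the consumer groups of a source Logstore and deletes one that you have confirmed is no longer needed:

```python
from aliyun.log import LogClient

client = LogClient("cn-hangzhou.log.aliyuncs.com",
                   "your-access-key-id", "your-access-key-secret")

# A Logstore allows at most 30 consumer groups, which caps the number of
# transformation jobs on the source Logstore. Review what exists first.
resp = client.list_consumer_group("my-project", "my-source-logstore")
resp.log_print()

# Delete a leftover group only after you confirm that the job that owned
# it is stopped or complete and is no longer required.
client.delete_consumer_group("my-project", "my-source-logstore",
                             "leftover-etl-consumer-group")
```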
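
As an illustration of static versus dynamic destinations, the following SLS DSL sketch writes events through a static destination named target0 while resolving the destination project and Logstore from event fields at runtime; the destination name and the dst_project and dst_logstore fields are hypothetical.

```python
# SLS DSL sketch. "target0" must be one of the job's (up to 20) static
# storage destinations. The project and logstore arguments are resolved
# from event fields at runtime and count against the limits of 200
# dynamically specified projects and 200 Logstores; events that resolve
# to destinations beyond these limits are discarded.
e_output(name="target0",
         project=v("dst_project"),    # hypothetical field with the project
         logstore=v("dst_logstore"))  # hypothetical field with the Logstore
```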
Data transformation
| Item | Description |
| --- | --- |
| Quick preview | The quick preview feature of data transformation is used to debug data transformation code. The feature is subject to limits that the advanced preview feature does not have. |
| Runtime concurrency | The number of shards in the readwrite state in the source Logstore determines the maximum runtime concurrency, that is, the number of concurrent units, of a data transformation job (see the shard split sketch after this table). For more information, see Data transformation basics. For more information about the limits on the shards of a Logstore, see Basic resources. For more information about how to split a shard, see Manage shards. |
| Data load of a concurrent unit | The data load of a concurrent unit in a data transformation job depends on the amount of data that the job consumes from the corresponding shard of the source Logstore. If data in the source Logstore is unevenly distributed across shards, some concurrent units carry a heavier load than others. Such a unit is considered a hot concurrent unit, and the transformation of data in the corresponding shards is delayed. If data is written to the source Logstore in KeyHash mode, we recommend that you allocate hash keys and shards appropriately to minimize uneven distribution (see the PutLogs sketch after this table). For more information about data writing, see PutLogs. |
| Memory usage | The memory usage threshold of a concurrent unit in a data transformation job is 6 GB. If the threshold is exceeded, the job performance is throttled and transformation latency occurs. The threshold is typically exceeded when a large number of log groups are pulled at the same time. In this case, you can reduce the number of log groups that are pulled at a time by decreasing the corresponding job parameter. Important: The maximum value allowed for the parameter is its default value. |
| CPU utilization | The CPU utilization threshold of a concurrent unit in a data transformation job is 100%. If you require higher overall CPU utilization, increase the runtime concurrency of the job as described in the Runtime concurrency item. |
| Data amount in a dimension table | The maximum number of entries allowed in a dimension table is 2 million, and the maximum amount of memory that the data in a dimension table can occupy is 2 GB. If either limit is exceeded, the data is truncated, and only the data within the limits can be used. The related functions are res_rds_mysql, res_log_logstore_pull, and res_oss_file. For more information, see res_rds_mysql, res_log_logstore_pull, and res_oss_file. Important: If a single data transformation job pulls data from multiple dimension tables, the limits apply to all the tables combined (see the dimension table sketch after this table). We recommend that you minimize the amount of data in dimension tables. |
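
Because the concurrency ceiling follows the number of shards in the readwrite state, splitting a shard is the usual way to raise it. The following aliyun-log-python-sdk sketch assumes a placeholder project and Logstore, shard 0, and a split hash inside that shard's MD5 range:

```python
from aliyun.log import LogClient

client = LogClient("cn-hangzhou.log.aliyuncs.com",
                   "your-access-key-id", "your-access-key-secret")

# Splitting shard 0 at the given MD5 hash yields two readwrite shards,
# which raises the maximum concurrency of the transformation job by one.
# Choose a hash that falls inside the hash range of the shard you split.
client.split_shard("my-project", "my-source-logstore",
                   0, "7f000000000000000000000000000000")
```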
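
The following sketch illustrates KeyHash writing with the aliyun-log-python-sdk; the endpoint, credentials, and names are placeholders. Deriving the hash key from a high-cardinality field spreads writes across shards and helps avoid hot concurrent units:

```python
import hashlib
import time
from aliyun.log import LogClient, LogItem, PutLogsRequest

client = LogClient("cn-hangzhou.log.aliyuncs.com",
                   "your-access-key-id", "your-access-key-secret")

item = LogItem()
item.set_time(int(time.time()))
item.push_back("user_id", "u-1001")

# Derive the hash key from a high-cardinality field so that writes are
# spread evenly across the shard hash ranges instead of piling onto one
# shard, which would later become a hot concurrent unit of the job.
hash_key = hashlib.md5(b"u-1001").hexdigest()

req = PutLogsRequest("my-project", "my-source-logstore",
                     topic="", source="",
                     logitems=[item],
                     hashKey=hash_key)
client.put_logs(req)
```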
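
As an example of the dimension table functions named above, this SLS DSL sketch enriches events from an RDS for MySQL table by combining res_rds_mysql with e_table_map; the connection parameters, table, and field names are placeholders.

```python
# SLS DSL sketch: enrich events from an RDS for MySQL dimension table.
# Keep the table small: at most 2 million rows and 2 GB of memory,
# counted across all dimension tables that the job loads; data beyond
# these limits is truncated.
e_table_map(
    res_rds_mysql(address="rm-xxxx.mysql.rds.aliyuncs.com",
                  username="etl_reader",
                  password="your-password",
                  database="dim_db",
                  table="user_dim"),
    "user_id",                     # join key shared by the event and table
    ["user_name", "user_level"])   # columns copied onto matching events
```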
Result data writing
| Item | Description |
| --- | --- |
| Data writing to a destination Logstore | When transformation results are written to a destination Logstore, the write limits of the Logstore cannot be exceeded. For more information, see Basic resources and Data read and write. If you configure a KeyHash field for result writing, data is routed to a specific shard, and the write limits of a single shard are more likely to be exceeded. You can locate a write limit error based on the run logs of the data transformation job (see the log query sketch after this table). For more information, see View error logs. Important: If a write limit error occurs when the results of a data transformation job are written to a destination Logstore, the job retries repeatedly to ensure that the transformation results are complete. In this case, the progress of the job is slowed, and the transformation of data in the source shards is delayed. |
| Cross-region data transmission | When data is transmitted across regions over a public endpoint, network quality cannot be guaranteed, and a network error may occur when the results of a data transformation job are written to a destination Logstore. This delays the progress of the entire job. For more information about Simple Log Service endpoints, see Endpoints. To improve the stability of network transmission, we recommend that you enable the transfer acceleration feature for your project and use a transfer acceleration endpoint in the data transformation job (see the endpoint sketch after this table). |
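
Write limit errors can also be located programmatically by searching the job's run logs. In the following sketch, the internal-etl-log Logstore and the WriteQuotaExceed keyword are assumptions based on the default job logging setup; adjust them to your environment.

```python
import time
from aliyun.log import LogClient

client = LogClient("cn-hangzhou.log.aliyuncs.com",
                   "your-access-key-id", "your-access-key-secret")

# Full-text search of the last hour of job run logs for write quota
# errors. internal-etl-log is assumed to be the Logstore that stores the
# run logs of data transformation jobs in this project.
now = int(time.time())
for resp in client.get_log_all("my-project", "internal-etl-log",
                               from_time=now - 3600, to_time=now,
                               query="WriteQuotaExceed"):
    resp.log_print()
```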
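
A minimal sketch of using the transfer acceleration endpoint with the aliyun-log-python-sdk follows; log-global.aliyuncs.com is the documented acceleration endpoint, the credentials are placeholders, and the project must have transfer acceleration enabled first. In a data transformation job, you would select the acceleration endpoint in the destination configuration rather than in client code.

```python
from aliyun.log import LogClient

# The transfer acceleration endpoint replaces a region-specific public
# endpoint for cross-region access; enable transfer acceleration for the
# project before using it.
client = LogClient("log-global.aliyuncs.com",
                   "your-access-key-id", "your-access-key-secret")
```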