MaxCompute: Data organization optimization

Last Updated: Apr 24, 2024

This topic describes how data organization optimization works for Transaction Table 2.0, covering clustering, compaction, and data reclamation.

Clustering

  • Pain points

    Transaction Table 2.0 supports near-real-time incremental data import at minute-level latency. Under high-traffic write loads, a large number of small incremental files may accumulate, which increases access load and storage costs. A large number of small files also causes other issues, such as heavy metadata update overhead, slow data analysis, and low I/O efficiency. To resolve these issues, you can rely on the clustering operation to automatically merge small files in these scenarios.

  • Solution

    The storage service of MaxCompute performs the clustering operation to merge small files. Note that clustering only reorganizes files: the intermediate historical states of data of the UPDATE and DELETE types still exist after the operation is performed.

  • Process of the clustering operation

    The following figure shows the process of the clustering operation.

    (Figure: clustering process across levels and buckets)

    • A clustering policy can be configured based on typical read and write business scenarios. Based on this policy, MaxCompute merges data files in a hierarchical manner along multiple dimensions, such as the size and number of data files. In the preceding figure, the original delta files that are highlighted in blue at Level 0 are merged into medium-sized delta files that are highlighted in yellow at Level 1. If the number of medium-sized delta files at Level 1 reaches the specified threshold, these files are further merged into large-sized delta files that are highlighted in orange at Level 2. A conceptual sketch of this policy is provided after this list.

    • Data files that exceed a specific size are isolated: further merging is not triggered on these delta files, which prevents unnecessary read and write amplification. In the preceding figure, the t8 data file in Bucket 3 is not merged with other files. Files that span more than a specific time range are also not merged. If such files were merged, a time travel query or an incremental query might read a large amount of historical data that does not belong to the queried time range, which causes unnecessary read amplification.

    • Data in a Transaction Table 2.0 table is split into buckets by BucketIndex for storage. Therefore, the clustering operation is executed concurrently at the bucket granularity, which significantly reduces the overall runtime.

    • The clustering operation interacts with Meta Service to obtain the list of tables or partitions on which the operation needs to be performed. After the merge is complete, the information about the old and new data files is passed to Meta Service, which detects transaction conflicts for the clustering operation, atomically updates the metadata of the old and new data files, and reclaims the old data files.

    • The clustering operation resolves the read and write efficiency issues caused by a large number of small data files. However, it also consumes computing and I/O resources: the data in the selected files must be fully read and rewritten each time the operation is performed, which causes read and write amplification. Therefore, a high clustering frequency is not recommended. MaxCompute automatically triggers the clustering operation based on the system status to keep the operation efficient.
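
The following Python sketch is a conceptual illustration of the hierarchical, per-bucket merge policy described in the preceding list. It is not MaxCompute code: the names (DeltaFile, MergeTask, plan_clustering) and the thresholds are hypothetical, and real clustering policies consider additional dimensions, such as time ranges, that are omitted here for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical level thresholds: a file smaller than the threshold of its level
# is still considered "small" at that level and is a merge candidate.
LEVEL_SIZE_THRESHOLDS_MB = [64, 512]   # Level 0 -> 1, Level 1 -> 2
MIN_FILES_PER_MERGE = 4                # merge only after enough files accumulate
MAX_MERGE_SIZE_MB = 2048               # oversized files are isolated and never merged


@dataclass
class DeltaFile:
    name: str
    bucket: int
    level: int      # 0 = original delta file, 1 = medium-sized, 2 = large-sized
    size_mb: int


@dataclass
class MergeTask:
    bucket: int
    target_level: int
    inputs: list    # files to merge into one larger file at target_level


def plan_clustering(files: list[DeltaFile]) -> list[MergeTask]:
    """Plan per-bucket merge tasks; the tasks can then run concurrently."""
    tasks = []
    by_bucket = defaultdict(list)
    for f in files:
        by_bucket[f.bucket].append(f)

    for bucket, bucket_files in by_bucket.items():
        for level, threshold in enumerate(LEVEL_SIZE_THRESHOLDS_MB):
            candidates = [
                f for f in bucket_files
                if f.level == level
                and f.size_mb < threshold          # still small at this level
                and f.size_mb < MAX_MERGE_SIZE_MB  # isolate oversized files
            ]
            if len(candidates) >= MIN_FILES_PER_MERGE:
                tasks.append(MergeTask(bucket, level + 1, candidates))
    return tasks
```

Each MergeTask touches only one bucket, which mirrors the bucket-level concurrency described above. Committing the merged files and reclaiming the input files would then be coordinated through Meta Service.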

Compaction

  • Pain points

    Transaction Table 2.0 allows you to write data of the UPDATE and DELETE types. If a large amount of such data is written to a Transaction Table 2.0 table, a large amount of redundant intermediate state data accumulates, which increases storage and computing costs and reduces query efficiency. Therefore, we recommend that you perform the compaction operation based on your business requirements to remove these intermediate states.

  • Solution

    The storage service of MaxCompute performs the compaction operation to merge data files. You can manually execute SQL statements to trigger the compaction operation, or configure table property parameters so that it is automatically triggered based on dimensions such as the time frequency and the number of commits. The compaction operation merges the selected data files, including base files and delta files, to eliminate the intermediate states of data of the UPDATE and DELETE types. If multiple rows of data have the same primary key, only the row with the latest state is retained. As a result, a new base file in which all data is of the INSERT type is generated. A conceptual sketch of this merge is provided after this list.

  • Process of the compaction operation

    The following figure shows the process of the compaction operation.

    (Figure: compaction process across buckets over time)

    • From t1 to t3, delta files are written into the buckets. After a number of delta files accumulate, the compaction operation is triggered: all delta files in each bucket are merged, and a new base file is generated for each bucket. From t4 to t6, new delta files are written into the buckets. When the compaction operation is triggered again, the existing base file and the newly added delta files in each bucket are merged to generate a new base file.

    • The compaction operation also interacts with Meta Service to obtain the list of tables or partitions on which the operation needs to be performed, in a process similar to that of the clustering operation. After the merge is complete, the information about the old and new data files is passed to Meta Service, which detects transaction conflicts for the compaction operation, atomically updates the metadata of the old and new data files, and reclaims the old data files.

    • The compaction operation removes intermediate historical states, which reduces computing and storage costs and effectively improves the efficiency of full snapshot queries. However, each compaction reads and merges all data in the selected data files, which consumes a large amount of computing and I/O resources, and the new base file that is generated incurs additional storage costs. The old delta files may still be needed for time travel queries and cannot be deleted within a short period of time, so they continue to incur storage costs as well. Therefore, we recommend that you perform the compaction operation at an appropriate frequency based on your business requirements and data characteristics. In scenarios where a large amount of data of the UPDATE or DELETE type exists and full data is frequently queried, you can increase the compaction frequency to improve query efficiency.
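
The following Python sketch is a conceptual illustration of the compaction merge semantics for a single bucket. It is not MaxCompute code: Record, compact_bucket, and the field names are hypothetical. The sketch only shows how the latest state of each primary key survives while intermediate UPDATE and DELETE states are discarded.

```python
from dataclasses import dataclass


@dataclass
class Record:
    pk: str          # primary key
    op: str          # "INSERT", "UPDATE", or "DELETE"
    commit_ts: int   # commit timestamp; a larger value means a newer state
    value: dict


def compact_bucket(base_rows: list[Record], delta_rows: list[Record]) -> list[Record]:
    """Merge the base file and delta files of one bucket into a new base file."""
    latest: dict[str, Record] = {r.pk: r for r in base_rows}

    # Apply delta rows in commit order so that the newest state of each key wins.
    for r in sorted(delta_rows, key=lambda r: r.commit_ts):
        latest[r.pk] = r

    # Drop keys whose latest state is a DELETE, and rewrite every surviving row
    # as an INSERT: the new base file contains no intermediate states.
    return [
        Record(pk=r.pk, op="INSERT", commit_ts=r.commit_ts, value=r.value)
        for r in latest.values()
        if r.op != "DELETE"
    ]
```

Running this logic for each bucket after t3, and again after t6 with the base file produced by the first round plus the new delta files as input, corresponds to the two compaction rounds in the figure. Registering the new base file and reclaiming the old files is again coordinated through Meta Service.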

Data reclamation

Time travel queries and incremental queries read the historical states of data. Therefore, historical data must be retained for a specific period of time. You can configure the table property parameter acid.data.retain.hours to specify the period during which historical data is retained. Historical data that was generated before the retention period specified by the acid.data.retain.hours parameter is automatically reclaimed and deleted by the system. After the data is deleted, you can no longer use the time travel feature to query its historical states. Both the operation logs and the data files are reclaimed.

The Purge command is also provided so that you can manually force the clearing of historical data in special scenarios.

If delta files are continuously written to the buckets of a Transaction Table 2.0 table, none of the delta files can be deleted, because newer delta files may depend on earlier ones. To remove these dependencies, you can perform the compaction operation or the INSERT OVERWRITE operation. The data files that these operations generate do not depend on the earlier delta files. After the period during which the earlier delta files can still be queried by using the time travel feature elapses, those delta files can be deleted.
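
The following Python sketch combines the two conditions described above into a single reclamation check. Only the acid.data.retain.hours property name comes from this topic; HistoricalFile, the superseded flag, and is_reclaimable are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class HistoricalFile:
    name: str
    generated_at: datetime   # when this historical state was produced
    superseded: bool         # True if a later compaction or INSERT OVERWRITE
                             # operation removed the dependency on this file


def is_reclaimable(f: HistoricalFile, table_properties: dict, now: datetime) -> bool:
    """A file can be reclaimed only if it has fallen out of the time travel
    window and no newer delta files depend on it."""
    retain_hours = int(table_properties["acid.data.retain.hours"])
    outside_window = f.generated_at < now - timedelta(hours=retain_hours)
    return outside_window and f.superseded


# Example: with a 24-hour retention period, a superseded file that was
# generated 3 days ago is eligible for reclamation.
now = datetime.now(timezone.utc)
old_file = HistoricalFile("delta_0001", now - timedelta(days=3), superseded=True)
print(is_reclaimable(old_file, {"acid.data.retain.hours": "24"}, now))  # True
```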

In extreme scenarios where users never perform the compaction operation, MaxCompute applies internal optimizations to prevent unlimited historical records from accumulating: the backend system periodically compacts the base files and delta files that can no longer be queried by using the time travel feature, so that the compacted historical data files can be reclaimed as expected.