
MaxCompute:Near-real-time incremental import

Last Updated: Aug 01, 2023

Transaction Table 2.0 supports two data write modes: near-real-time incremental write and batch write. This topic describes the architecture of high-concurrency near-real-time incremental write.

Actual business data processing scenarios involve various data sources, such as databases, log systems, and message queue systems. To help you write this data to Transaction Table 2.0 tables, MaxCompute provides an open source Flink connector, which can be used together with Data Integration of DataWorks and other data import tools to deliver low latency and high data accuracy in scenarios that require high concurrency, fault tolerance, and transactional commits.

[Figure: architecture of near-real-time incremental data import]

The preceding figure shows how business data is imported into Transaction Table 2.0 tables.

  • The data import tool integrates the SDK client provided by the Tunnel service of MaxCompute to support high-concurrency, minute-level data writes to the Tunnel server. The Tunnel server then launches multiple worker nodes that write data in parallel to the data files of each bucket.

  • You can configure the table property write.bucket.num to specify the degree of write parallelism, so write traffic can be scaled out horizontally. For more information about the benefits of bucket splitting, see Table data format.

  • The data write interface provided by the Tunnel SDK supports only UPSERT and DELETE operations.

  • A call to the commit interface atomically commits all data that is written before the call.

    • If the call succeeds, the written data becomes queryable and meets read/write snapshot isolation requirements.

    • If the call fails, you can retry the commit. Unless the failure is caused by an unrecoverable error, such as data corruption, the retry may succeed and you do not need to rewrite the data. Otherwise, you must rewrite and recommit the data.
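The bucketing and UPSERT/DELETE semantics described above can be sketched with a small conceptual model. This is plain Python, not the real Tunnel SDK; the class name BucketedTable and the write_bucket_num parameter are illustrative stand-ins for the table property of the same name.

```python
from dataclasses import dataclass, field

@dataclass
class BucketedTable:
    """Toy model of a bucketed transactional table (illustrative only)."""
    write_bucket_num: int                       # analogous to write.bucket.num
    buckets: list = field(default_factory=list)

    def __post_init__(self):
        # One dict per bucket; dict keys play the role of primary keys.
        self.buckets = [{} for _ in range(self.write_bucket_num)]

    def _bucket_of(self, pk):
        # Rows are routed to buckets by hashing the primary key, so each
        # bucket can be written by an independent worker in parallel.
        return hash(pk) % self.write_bucket_num

    def upsert(self, pk, row):
        # UPSERT: insert the row, or overwrite an existing row with the same key.
        self.buckets[self._bucket_of(pk)][pk] = row

    def delete(self, pk):
        # DELETE: remove the row with the given key; a no-op if it is absent.
        self.buckets[self._bucket_of(pk)].pop(pk, None)

    def lookup(self, pk):
        return self.buckets[self._bucket_of(pk)].get(pk)

table = BucketedTable(write_bucket_num=4)
table.upsert(1, {"name": "a"})
table.upsert(1, {"name": "b"})   # second upsert overwrites the first
table.delete(2)                  # deleting an absent key does nothing
print(table.lookup(1))           # {'name': 'b'}
```

A larger write_bucket_num spreads the same key space over more buckets, which is why raising the property increases the achievable write parallelism.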
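The commit semantics in the last two bullets can be sketched the same way: writes are buffered in a session and invisible to readers, a successful commit makes them visible atomically, and a failed commit can be retried without rewriting the buffered data unless the error is unrecoverable. Again, this is a hypothetical sketch, not the Tunnel SDK API; WriteSession, fail_commits, and commit_with_retry are invented names.

```python
class TransientError(Exception):
    """A recoverable failure (e.g. a timeout): retry without rewriting."""

class WriteSession:
    """Toy model of write-then-atomic-commit semantics (illustrative only)."""

    def __init__(self, table, fail_commits=0):
        self.table = table                # committed, queryable state
        self.pending = []                 # buffered ops, invisible until commit
        self.fail_commits = fail_commits  # simulate N transient commit failures

    def upsert(self, pk, row):
        self.pending.append(("upsert", pk, row))

    def delete(self, pk):
        self.pending.append(("delete", pk, None))

    def commit(self):
        if self.fail_commits > 0:
            self.fail_commits -= 1
            raise TransientError("simulated commit failure")
        # Atomic apply: all buffered operations become visible together,
        # so a reader sees either none of them or all of them.
        for op, pk, row in self.pending:
            if op == "upsert":
                self.table[pk] = row
            else:
                self.table.pop(pk, None)
        self.pending.clear()

def commit_with_retry(session, max_retries=3):
    """Retry a failed commit; the buffered data does not need rewriting."""
    for _ in range(max_retries + 1):
        try:
            session.commit()
            return True                   # committed: data is now queryable
        except TransientError:
            continue                      # recoverable: retry the same commit
    return False

table = {}
session = WriteSession(table, fail_commits=2)
session.upsert(1, "v1")
session.delete(2)
assert table == {}                        # nothing visible before the commit
assert commit_with_retry(session)         # succeeds on the third attempt
assert table == {1: "v1"}                 # all buffered ops applied together
```

Note how the buffered operations survive the failed attempts: only the commit is retried, which mirrors the documented behavior that a recoverable failure does not require rewriting the data.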