MaxCompute: Overall architecture

Last Updated: May 09, 2024

The incremental storage and processing architecture of Transaction Table 2.0 introduces special designs in five modules: data access, compute engine, data optimization service, metadata management, and data file organization. The remaining modules are the same as those in the general architecture of MaxCompute. This topic describes the core architecture of Transaction Table 2.0.

The following figure shows the architecture of Transaction Table 2.0.

(Figure: Architecture of Transaction Table 2.0)

Description of the modules in the architecture:

  • Data access

    • Full import and near-real-time incremental import for various data sources: MaxCompute works with related services to provide custom data import tools, such as the Flink connector of MaxCompute and Data Integration of DataWorks. These tools support efficient near-real-time incremental import and can connect to the Tunnel server of the MaxCompute Tunnel service to support high-concurrency, minute-level incremental writes.

    • Full write and incremental batch write for various data sources: interfaces such as MaxCompute SQL support efficient full writes and incremental batch writes. For a sketch of this SQL-based write path, see the first example after the module descriptions.

  • Compute engine

    This module mainly consists of the SQL engine that is developed by MaxCompute. The SQL engine parses, optimizes, and executes DDL, DML, and DQL statements, including time travel queries and incremental queries. For a sketch of these queries, see the second example after the module descriptions.

  • Data optimization service

    The storage service of MaxCompute intelligently manages incremental data files. This management includes optimization operations such as clustering of small files, data compaction, and data sorting. For some of these operations, the storage service automatically executes data optimization tasks based on a comprehensive evaluation of multiple dimensions, such as data characteristics and time series, to keep data storage and computing stable and efficient.

  • Metadata management

    This module is used for conflict management of concurrent transactions, data version management, time travel management, and metadata updates and analytics.

  • Data file organization

    This module is used to manage the formats of full and incremental data files.
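
The following sketch illustrates the SQL-based full write and incremental batch write path that is described in the data access module. It uses PyODPS to submit MaxCompute SQL statements. The project, endpoint, table name mf_tt, its schema, and the DDL properties used to declare a Transaction Table 2.0 table are assumptions for illustration only; verify the table creation syntax against the Transaction Table 2.0 documentation.

```python
# Hypothetical sketch: create a Transaction Table 2.0 table and run an
# incremental batch write through MaxCompute SQL, submitted with PyODPS.
# Credentials, project, endpoint, and the table name and schema are placeholders.
import os

from odps import ODPS

o = ODPS(
    os.getenv("ALIBABA_CLOUD_ACCESS_KEY_ID"),
    os.getenv("ALIBABA_CLOUD_ACCESS_KEY_SECRET"),
    project="your_project",
    endpoint="https://service.cn-hangzhou.maxcompute.aliyun.com/api",
)

# Assumed DDL: a primary key table with the transactional property enabled,
# which is how Transaction Table 2.0 tables are commonly declared.
o.execute_sql("""
    CREATE TABLE IF NOT EXISTS mf_tt (
        pk  BIGINT NOT NULL PRIMARY KEY,
        val STRING
    )
    TBLPROPERTIES ("transactional" = "true")
""")

# Incremental batch write: on a primary key table, INSERT INTO is expected
# to merge rows that share the same primary key (upsert semantics).
o.execute_sql("INSERT INTO mf_tt VALUES (1, 'a'), (2, 'b')")
```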
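
As a companion to the compute engine description, the following sketch shows how a time travel query and an incremental query can be submitted through PyODPS. The TIMESTAMP AS OF and TIMESTAMP BETWEEN ... AND ... clauses reflect the query forms commonly associated with Transaction Table 2.0, but the exact clause syntax, the timestamps, and the table mf_tt are assumptions; verify them against the time travel and incremental query documentation.

```python
# Hypothetical sketch: a time travel query and an incremental query on a
# Transaction Table 2.0 table (mf_tt), submitted with PyODPS. The clause
# syntax is an assumption to verify against the MaxCompute documentation.
import os

from odps import ODPS

o = ODPS(
    os.getenv("ALIBABA_CLOUD_ACCESS_KEY_ID"),
    os.getenv("ALIBABA_CLOUD_ACCESS_KEY_SECRET"),
    project="your_project",
    endpoint="https://service.cn-hangzhou.maxcompute.aliyun.com/api",
)

# Time travel query: read the table as it was at a historical point in time.
time_travel = o.execute_sql(
    "SELECT * FROM mf_tt TIMESTAMP AS OF '2024-05-01 00:00:00'"
)
with time_travel.open_reader() as reader:
    for record in reader:
        print(record)

# Incremental query: read only the changes made between two points in time.
incremental = o.execute_sql(
    "SELECT * FROM mf_tt TIMESTAMP BETWEEN '2024-05-01 00:00:00' "
    "AND '2024-05-02 00:00:00'"
)
with incremental.open_reader() as reader:
    for record in reader:
        print(record)
```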