Hologres and MaxCompute are deeply integrated at the underlying layer, enabling seamless bidirectional access to the storage system of each other and one-click metadata mapping capabilities, which significantly enhances data access efficiency. This topic describes the capabilities and limits of using Hologres to accelerate data reads from and data writes to MaxCompute.
Background information
MaxCompute is a fast and fully managed computing platform for large-scale data warehousing. It can process up to exabytes of data. MaxCompute is used to batch store and compute large amounts of structured data. MaxCompute provides various data warehousing solutions and big data analysis and modeling services. For more information, see What is MaxCompute?
Hologres is an all-in-one real-time data warehouse service developed by Alibaba Cloud and allows you to write, update, process, and analyze large amounts of data in real time. Hologres is deeply integrated with MaxCompute. It allows you to create external databases or foreign tables to map to MaxCompute metadata and accelerate queries of MaxCompute data. This allows you to quickly obtain query results based on one set of storage data and provide external services without the need to import data. In addition, Hologres uses the standard PostgreSQL protocol to seamlessly integrate with all mainstream BI tools in the PostgreSQL ecosystem. For more information, see BI analysis and visualization.
Starting from Hologres V3.1, the direct read capability of Hologres for MaxCompute data is fully upgraded to version 2.0. This version rebuilds the underlying direct-read mechanism based on the MaxCompute C++ Native SDK, further enhancing the performance and user experience when Hologres accesses MaxCompute data sources. Compared to the approach of using the Storage API, the direct-read method offers significant advantages. Since it directly reads data from the Apsara Distributed File System, there are no additional computational costs or rate limiting on the MaxCompute side, and the read performance is more than twice as fast as the Storage API method.
NoteTo access a MaxCompute table using the new routing path, you need to add a GUC parameter. It is recommended to enable it at the session level. Add the
SET hg_experimental_external_catalog_routing = 'odps:common_table';
command before your SQL query statement and execute them together.
Scenarios
Use external databases or foreign tables to accelerate queries of MaxCompute data
In a solution that integrates real-time and offline data processing, the most critical aspect is the simplification of data exchange. It is expected that users do not need to move data and a single storage system can support both offline and real-time computing workflows. The capabilities of using external databases or foreign tables to query data effectively meet this requirement.
You can execute the CREATE EXTERNAL DATABASE statement to create an external database in Hologres to map schemas and tables in a MaxCompute project. You can also execute the IMPORT FOREIGN SCHEMA statement to create multiple foreign tables at a time. If you want to accelerate queries on a small number of MaxCompute tables or on specific fields in a MaxCompute table, you can execute the CREATE FOREIGN TABLE statement. Select a method based on your business characteristics and scenarios.
Starting from Hologres V3.0, the following limits are eliminated when you use a foreign table to query MaxCompute data: A maximum of 200 GB of data in a maximum of 512 partitions can be scanned at a time. Large-scale queries consume more resources in Hologres. We recommend that you perform resource isolation. For more information, see Getting started with virtual warehouse instances. You can also use serverless computing resources to perform occasional large-scale queries. For more information, see Overview of Serverless Computing.
Import MaxCompute data to Hologres
To improve query performance, you can import data to Hologres and then query the data. This method can make full use of the indexing capabilities and storage optimization of Hologres to further accelerate queries and implement real-time BI analysis, high-concurrency point queries, and complex association queries on multiple tables. For more information, see Import data from MaxCompute to Hologres by executing SQL statements.
Export data from Hologres to MaxCompute
In actual business scenarios, users typically need to write the data that is written to Hologres in real time and the aggregated analysis results back to MaxCompute. This helps meet requirements for historical data archiving, tiered storage management for hot and cold data, and offline data warehouse replenishment. You can execute the INSERT INTO or INSERT OVERWRITE statement to write data back to MaxCompute. For more information, see Export data to MaxCompute.
Feature description
The following table describes the capabilities provided by Hologres for accelerated queries on MaxCompute data by using external databases and foreign tables.
Hologres version | Supported capability | Description |
V3.1 | The direct read capability of Hologres for MaxCompute data is fully upgraded to version 2.0. This version rebuilds the underlying direct-read mechanism based on the MaxCompute C++ Native SDK, further enhancing the performance and user experience when Hologres accesses MaxCompute data sources. | Compared to the read method based on the Storage API, the direct-read mechanism in version 2.0 can directly access the Apsara Distributed File System, eliminating additional computational costs on the MaxCompute side and avoiding rate limiting. In addition, in terms of read performance, the direct-read approach offers more than twice the performance advantage. |
Hologres supports direct reading of MaxCompute Delta table data. Currently, it only supports full-access to the latest snapshot of Delta tables. | A Delta table is an incremental data table format introduced by MaxCompute, supporting data ingestion at a minute-level granularity. It uses AliORC as the underlying file format, supports UPSERT semantics, and enables reading and writing incremental data in Change Data Capture (CDC) format. For more information, see Access MaxCompute Delta tables from Hologres. | |
Hologres supports direct reading of MaxCompute data for which the dynamic data masking feature is enabled. | The dynamic data masking feature of MaxCompute helps customers hide or replace sensitive data in real time during data access or display, to prevent sensitive data leakage. For more information, see Dynamic data masking. | |
Hologres supports direct reading of MaxCompute Schema Evolution tables. | The supported operations include adding columns, dropping columns, modifying column types, and reordering columns. | |
Hologres supports direct reading of MaxCompute clustered tables, including range-clustered tables that use fields of the DATETIME, TIMESTAMP, or DECIMAL types as cluster columns. | None. | |
V3.0.7 | The CREATE EXTERNAL DATABASE statement is supported to create an external database in Hologres for mapping schemas and tables in a MaxCompute project. | You can execute the CREATE EXTERNAL DATABASE statement to create an external database in Hologres and use the external database to load the metadata of an external data source to Hologres. This allows you to manage internal and external data in Hologres and helps implement centralized metadata management by using the integrated lakehouse architecture. For more information, see CREATE EXTERNAL DATABASE. |
The limits that a maximum of 200 GB of MaxCompute data in a maximum of 512 partitions can be scanned by using a foreign table at a time are removed. | This way, you can use Hologres to query large amounts of MaxCompute data. You can use virtual warehouse instances and the Serverless Computing feature to isolate resources. This prevents large queries from consuming excessive instance resources and affecting the execution of other queries and write jobs. | |
V2.2.1 | The Auto Load feature supports MaxCompute projects that use the three-layer model. | Schemas in the three-layer model of MaxCompute are used to classify tables, resources, and functions in a project. For more information about schemas, see Schema. For more information about the Auto Load feature, see Auto Load. |
The Auto Load feature supports automatic updates of metadata. | None. | |
The Auto Load feature supports schema evolution on MaxCompute foreign tables. | Schema evolution on MaxCompute foreign tables includes operations such as deleting columns, changing the order of columns, and changing the data types of columns. For more information, see Auto Load. | |
The service-linked role of Hologres can be used to access MaxCompute. | A service-linked role is a RAM role whose trusted entity is an Alibaba Cloud service. Service-linked roles can implement authorized access across Alibaba Cloud services. For more information, see Service-linked role for Hologres. | |
V2.1 | You can write data to tables in a specified schema in a MaxCompute project that uses the three-layer model. | Schemas in the three-layer model of MaxCompute are used to classify tables, resources, and functions in a project. For more information about schemas, see Schema. |
You can access tables in a specified schema in a MaxCompute project that uses the three-layer model in the HoloWeb console. | For more information, see Create foreign tables mapped to MaxCompute in the HoloWeb console. | |
V2.0 | The engine for accelerated queries on MaxCompute data is upgraded to Hologres Query Engine (HQE) to improve the compatibility and stability. | None. |
V1.3 | You can use foreign tables to query data in MaxCompute transactional tables. | None. |
You can read data from MaxCompute tables in the schema evolution state. | A table is in the schema evolution state if its columns are deleted, the order of the columns is changed, or the data types of the columns are changed in MaxCompute. | |
Dual-signature authentication between Hologres and MaxCompute is supported. | For more information, see Create a Hologres external table in dual-signature mode. | |
You can access MaxCompute tables that are encrypted based on Bring Your Own Key (BYOK). | For more information, see Query MaxCompute data encrypted based on BYOK. | |
V1.1.43 | The Auto Load feature is supported. Foreign tables can be automatically created for queried MaxCompute tables based on your business requirements or all MaxCompute tables. | For more information, see Auto Load. |
Precautions
When you use external databases or foreign tables in Hologres to accelerate queries on MaxCompute data, take note of the following items:
You must make sure that your Hologres instance is in the same region as the MaxCompute project. Otherwise, the stability of data queries cannot be ensured because of potential network issues.
You can accelerate queries on data only in internal tables in MaxCompute but not external tables or views in MaxCompute.
Partition fields in MaxCompute tables are mapped to regular fields in Hologres. You can use values of partition fields as filter conditions to query data.
You cannot query data of the MAP or STRUCT type from MaxCompute tables.
Range-clustered tables in MaxCompute that use columns of the DATETIME, TIMESTAMP, or DECIMAL type as clustering key columns are not supported.
You can only query MaxCompute data that uses the standard storage but not MaxCompute data that uses the infrequent access (IA) storage or long-term storage.