All Products
Search
Document Center

Hologres:Create a foreign table in Hologres to accelerate queries of MaxCompute data

Last Updated:Jul 20, 2023

This topic describes how to create a foreign table in Hologres to achieve accelerated queries of MaxCompute data.

Background information

MaxCompute is a fast and fully managed computing platform for large-scale data warehousing. It can process exabytes of data. MaxCompute is used to store and compute large amounts of structured data at a time. It provides various data warehousing solutions as well as big data analytics and modeling services.

Hologres is a real-time interactive analytics engine that is compatible with PostgreSQL. Hologres seamlessly integrates with MaxCompute storage resources. In Hologres, you can perform accelerated queries of MaxCompute data by creating foreign tables, without the need to store data in Hologres or import data to and export data from Hologres. You can use Hologres to connect to mainstream business intelligence (BI) tools.

You can also import data to Hologres before you query the data. Compared with other similar services that are not oriented to the big data ecosystem, Hologres offers higher performance and faster data import and export.

You can select one of the following query methods based on the business features and scenarios:

  • Query MaxCompute data in Hologres.

    This method is suitable for scenarios in which the amount of underlying data to be scanned in a query is less than 200 GB, and the number of partitions to be queried at a time is less than 512.

    Note

    The amount of underlying data to be scanned in a query is the volume of data in the partitions that hit the filter conditions, not the size of queried fields. The limit on the amount of underlying data to be scanned in a query ensures the stability of data queries. If you perform accelerated queries of MaxCompute data by creating foreign tables, Hologres loads the MaxCompute data in the queried partitions to its memory and cache. If the amount of underlying data to be scanned is too large, a lot of bandwidth and computing resources are occupied, which affects the performance of concurrent queries.

  • Import MaxCompute data to Hologres before you query the data.

    This method has no limit on the amount of underlying data to be scanned in a query. The method is suitable for the following scenarios: complex queries, index-based queries, and queries that involve UPDATE, INSERT, and DELETE operations.

Prerequisites

Usage notes

When you create a foreign table in Hologres to accelerate queries of MaxCompute data, take note of the following items:

  • You can accelerate queries of only internal tables in MaxCompute. Queries of foreign tables and views in MaxCompute cannot be accelerated.

  • The amount of underlying data to be scanned in a query cannot exceed 200 GB, and the number of partitions to be queried at a time cannot exceed 512. However, if you import MaxCompute data to Hologres internal tables before you query the data, the amount of underlying data to be scanned and the number of partitions to be queried are not limited.

  • After data in a MaxCompute table is updated, the cache in Hologres is automatically updated within 5 minutes. Before that, you cannot accelerate queries of the updated data. If you want to query the updated data in real time, execute the IMPORT FOREIGN SCHEMA statement to update the metadata of the foreign table. If you import data from the MaxCompute table to Hologres V1.1.25 or later, you do not need to manually update the metadata of the foreign table. After you execute the statement used to import data, Hologres automatically obtains the updated metadata of the MaxCompute table.

  • If the schema of a MaxCompute table is updated, the cache in Hologres is not automatically updated. You must manually update the foreign table.

  • Partition fields in MaxCompute tables are mapped to regular fields in Hologres tables. Therefore, you can import data from a partitioned MaxCompute table to a partitioned or non-partitioned Hologres table.

  • Data types that are supported by MaxCompute and those supported by Hologres have a one-to-one mapping relationship. You can create tables based on the mappings between data types. For more information, see Data types.

  • You can query MaxCompute data across regions. However, if your Hologres instance resides in China, you can accelerate only the queries of data in MaxCompute tables that reside in China. In addition, if your Hologres instance resides outside China, you cannot accelerate queries of MaxCompute data across regions. We recommend that you do not accelerate queries of MaxCompute data across regions. We recommend that you create a foreign table in the region in which a MaxCompute table resides. Otherwise, the stability of data queries cannot be ensured because of possible network errors.

  • For encrypted MaxCompute data, only data that is encrypted based on Bring Your Own Key (BYOK) can be queried. For more information, see Query MaxCompute data encrypted based on BYOK.

  • Foreign tables do not store data and are used only to map the source data. The source data is stored in MaxCompute tables.

  • MaxCompute transactional tables can be used as foreign tables for querying MaxCompute data in Hologres V1.3 and later.

  • If data is written to a MaxCompute table by using Streaming Tunnel, the table is in the streaming state before it is asynchronously merged to the ORC format. Hologres cannot read data from a MaxCompute table that is in the streaming state. In this case, you can pause the data write, manually perform the merge operation, and then read data from the MaxCompute table.

  • MaxCompute tables on which schema-related operations have been performed are supported in Hologres V1.3 and later. Schema-related operations in MaxCompute include deleting columns, changing the order of columns, and changing the data types of columns.

  • The three-layer model of MaxCompute is supported in Hologres V1.3 and later. This model allows you to create schemas in MaxCompute projects and use these schemas to classify objects such as tables. You can read data from but cannot write data to these schemas. For more information, see Schema-related operations.

  • Dual-signature authentication between Hologres and MaxCompute is supported in Hologres V1.3 and later. For more information, see Create a Hologres foreign table in dual-signature mode.

Query data in a non-partitioned MaxCompute table

  1. Create a non-partitioned table in MaxCompute.

    Create a non-partitioned table in MaxCompute and import data to the table. For more information, see Create tables. You can also use an existing non-partitioned MaxCompute table.

    In this example, an existing non-partitioned MaxCompute table is used. To create a non-partitioned table in MaxCompute and import data to the table, execute the following statements:

    CREATE TABLE weather (
        city            STRING ,
        temp_lo         int,           -- The lowest temperature.
        temp_hi         int           -- The highest temperature.
    );
    INSERT INTO weather VALUES 
    ('beijing',40,50),
    ('hangzhou',46,55);
  2. Create a foreign table in Hologres.

    Create a foreign table in Hologres to map the source table in MaxCompute. You can configure the foreign table to include some or all fields in the MaxCompute table. For example, you can execute the following statement to create a foreign table:

    CREATE FOREIGN TABLE weather1 (
     city text,
     temp_lo int4,
     temp_hi int4
    )
    SERVER odps_server
    OPTIONS (project_name '<projectname>',table_name 'weather');

    The following table describes the parameters in the statement.

    Parameter

    Description

    SERVER

    The server on which you want to create the foreign table.

    You can use the odps_server server that is created at the underlying layer of Hologres. For more information, see postgres_fdw.

    project_name

    The name of the project in which the MaxCompute table to be queried resides.

    table_name

    The name of the MaxCompute table to be queried.

    Note
    • The data types of the fields in the foreign table must map those in the MaxCompute table. For more information about the mappings between data types, see Data type mappings between MaxCompute and Hologres.

    • Hologres allows you to execute the IMPORT FOREIGN SCHEMA statement to create multiple foreign tables at a time. For more information, see IMPORT FOREIGN SCHEMA.

    • You can accelerate queries of only internal tables in MaxCompute. Queries of foreign tables and views in MaxCompute cannot be accelerated.

  3. Query data in the non-partitioned MaxCompute table by using the foreign table in Hologres.

    After you create a foreign table, you can directly query the foreign table for data in the mapping MaxCompute table. For example, you can execute the following statement to query the foreign table:

    SELECT * FROM weather1;

Query data in a partitioned MaxCompute table

  1. Create a partitioned table in MaxCompute.

    Create a partitioned table in MaxCompute and import data to the table. For more information, see Partition and column operations. You can also use an existing partitioned MaxCompute table.

    In this example, an existing partitioned MaxCompute table is used. To create a partitioned table in MaxCompute, execute the following statement:

    create table odps_test
    (
        shop_name     string,
        customer_id   string,
        total_price   INT 
    )
    partitioned by (sale_date string);
  2. Create a foreign table in Hologres.

    Create a foreign table in Hologres to map the source table in MaxCompute. For example, you can execute the following statement to create a foreign table:

    CREATE FOREIGN TABLE table_odps (
     shop_name text,
     customer_id text,
     total_price int4,
     sale_date text
    )
    SERVER odps_server
    OPTIONS (project_name '<projectname>', table_name 'odps_test');
    Note

    A foreign table in Hologres is used to map fields in the source table, but is not used to store data. A partition field of the source table in MaxCompute maps a regular field of the foreign table in Hologres.

  3. Query data in the partitioned MaxCompute table by using the foreign table in Hologres.

    To query all the data in the partitioned table, execute the following statement:

    SELECT * FROM table_odps;

    To query the data in a specific partition, execute the following statement:

    SELECT * FROM table_odps 
    WHERE sale_date = '2013';

Create multiple foreign tables at a time

To accelerate queries of a MaxCompute table that contains a large amount of data, you can create multiple foreign tables at a time. In Hologres, you can create multiple foreign tables at a time by executing SQL statements or by using the Hologres console.

Create a foreign table in the HoloWeb console

You can use HoloWeb to create a foreign table and query table data in a visualized way, without the need to write SQL statements. To do so, perform the following steps:

  1. Go to the HoloWeb console. For more information, see Connect to HoloWeb.

  2. In the main menu bar of the HoloWeb console, choose Metadata Management > MaxCompute Acceleration > Create Foreign Table.

    You can also click Instances Connected in the left-side navigation pane of the Metadata Management tab. Click the instance that you want to manage and click the database that you want to manage. Right-click the schema that you want to manage and select New external table.

  3. On the New external table tab, configure the parameters.

    Parameter

    Description

    Schema

    The name of the schema.

    You can select the default schema named public or a custom schema.

    Table name

    The name of the Hologres foreign table.

    After you enter the name of the source MaxCompute table, the name of the foreign table must be the same as that of the source MaxCompute table. In this case, you cannot modify the name of the foreign table. To rename the foreign table, you must create the foreign table, find the foreign table in the left-side Instances Connected list, and then right-click the foreign table.

    Description

    The description of the Hologres foreign table.

    Type

    The service type of the source table.

    The default value is MaxCompute, which cannot be changed.

    Server List

    You can use the odps_server server that is created at the underlying layer of Hologres. For more information, see postgres_fdw.

    Table

    The name of the source MaxCompute table to be mapped, including the name of the MaxCompute project.

    Format: project.table_name.

    Note
    • You cannot query data in a source table that is stored in a region different from that of the selected Hologres Shared Cluster (BI-based MaxCompute Acceleration Edition) instance.

    • After you enter the name of the source MaxCompute table, all the fields of the source table are displayed. By default, the created foreign table contains all the fields of the source table. If you need to create a foreign table that contains only specific fields of the source table, use SQL statements to create the foreign table. For more information, see CREATE FOREIGN TABLE.

    Note

    When you create a foreign table to synchronize data from a MaxCompute table, comments of both field names and columns of the MaxCompute table are synchronized to the foreign table.

  4. Click Submit form. After the foreign table is submitted, you can refresh the left-side instance lists. The created foreign table is displayed in the schema that you selected.

  5. Optional. Preview the data in the table.

    1. In the Instance Management pane, click Instances Connected and double-click the table whose data you want to preview.

    2. On the table information tab, click Data preview to preview the data in the table.

      Data preview tab
  6. Optional. View the DDL statements used to create the table.

    On the table information tab, click DDL statement to view the DDL statements used to create the table. DDL statement tab

Optimize the performance of querying MaxCompute tables in Hologres

If the performance of querying MaxCompute tables in Hologres does not meet your requirements, you can merge small files in MaxCompute or optimize Hologres SQL statements. Hologres V0.10 and later adopt a new engine to query MaxCompute data. Compared with the versions earlier than V0.10, Hologres V0.10 and later improve query performance by about 30% to 100%. For more information, see Optimize the performance of querying MaxCompute tables in Hologres.

FAQ

For more information about the answers to frequently asked questions about the permissions required when you create a foreign table in Hologres to accelerate queries of MaxCompute data, see Permissions on MaxCompute.

For more information about the answers to frequently asked questions about the feature of creating a foreign table in Hologres to accelerate queries of MaxCompute data, see Common errors and troubleshooting of integration with MaxCompute.