All Products
Search
Document Center

Hologres:Create a foreign table in Hologres to accelerate queries on MaxCompute data

Last Updated:Mar 27, 2024

Hologres allows you to create foreign tables to accelerate queries on MaxCompute data. This way, you can directly access and analyze data stored in MaxCompute by using foreign tables in Hologres. This improves query efficiency and simplifies data processing.

Background information

MaxCompute is a fast and fully managed computing platform for large-scale data warehousing. It can process up to exabytes of data. MaxCompute is used to store and compute large amounts of structured data at a time. MaxCompute provides various data warehousing solutions and big data analytics and modeling services.

Hologres is a real-time interactive analytics engine that is compatible with PostgreSQL and uses the standard PostgreSQL protocol. Hologres is natively integrated with MaxCompute storage resources. In Hologres, you can perform accelerated queries on MaxCompute data by using foreign tables, without the need to store data in Hologres or import data to or export data from Hologres. You can use Hologres to connect to mainstream business intelligence (BI) tools.

You can also import data to Hologres before you query the data. Compared with other similar services that are not oriented for the big data ecosystem, Hologres allows you to import or export data at a higher speed.

You can select one of the following query methods based on the business characteristics and scenarios:

  • Query MaxCompute data in Hologres.

    This method is suitable for scenarios in which the amount of underlying data to be scanned in a query is less than 200 GB, and the number of partitions to be queried at a time is less than 512.

    Note

    The amount of underlying data to be scanned in a query is the amount of data in the partitions that meet the filter conditions, not the size of queried fields. The upper limit on the amount of underlying data to be scanned in a query is used to ensure the stability of data queries. If you perform accelerated queries on MaxCompute data by using foreign tables, Hologres loads the MaxCompute data in the queried partitions to its memory and cache. If the amount of underlying data to be scanned is too large, a lot of bandwidth and computing resources are occupied, which affects the performance of concurrent queries.

  • Import MaxCompute data to Hologres before you query the data.

    This method has no limit on the amount of underlying data to be scanned in a query. The method is suitable for the following scenarios: complex queries, index-based queries, and queries that involve UPDATE, INSERT, or DELETE operations.

Prerequisites

Usage notes

When you use a foreign table in Hologres to accelerate queries on MaxCompute data, take note of the following items:

  • You can accelerate queries on data only in internal tables in MaxCompute but not external tables or views in MaxCompute.

  • The amount of underlying data to be scanned in a query cannot exceed 200 GB, and the number of partitions to be queried at a time cannot exceed 512. However, if you import MaxCompute data to Hologres internal tables before you query the data, the amount of underlying data to be scanned and the number of partitions to be queried are not limited.

  • After data in a MaxCompute table is updated, the cache in Hologres is automatically updated within 5 minutes. Before the cache is updated, you cannot query the updated data. If you want to query the updated data in real time, execute the IMPORT FOREIGN SCHEMA statement to update the metadata of the foreign table. If you import data from a MaxCompute table to a Hologres instance of V1.1.25 or later, you do not need to manually update the metadata of the foreign table. After you execute the statement used to import data, Hologres automatically obtains the updated metadata of the MaxCompute table.

  • If the schema of a MaxCompute table is updated, the cache in Hologres is not automatically updated. You must manually update the foreign table.

  • Partition fields in MaxCompute tables are mapped to regular fields in Hologres foreign tables.

  • Data types that are supported by MaxCompute and those supported by Hologres have a one-to-one mapping relationship. You can create tables based on the data type mappings. For more information, see Data types.

  • You can query MaxCompute data across regions. However, if your Hologres instance resides in China, you can accelerate only the queries on data in MaxCompute projects that reside in China. If your Hologres instance resides outside China, you cannot accelerate queries on MaxCompute data across regions. We recommend that you do not accelerate queries on MaxCompute data across regions. We recommend that you create a foreign table in the same region as the MaxCompute project whose table data you want to query. Otherwise, the stability of data queries cannot be ensured because of possible network errors.

  • For encrypted MaxCompute data, only data that is encrypted based on Bring Your Own Key (BYOK) can be queried. For more information, see Query MaxCompute data encrypted based on BYOK.

  • Foreign tables do not store data and are used only to map the source data. The source data is stored in MaxCompute tables.

  • In Hologres V1.3 and later, you can use foreign tables to query data in MaxCompute transactional tables.

  • If data is written to a MaxCompute table by using Streaming Tunnel, the table is in the streaming state before it is asynchronously merged to the ORC format. Hologres cannot read data from a MaxCompute table that is in the streaming state. In this case, you can pause the data write, manually perform the merge operation, and then read data from the MaxCompute table.

  • MaxCompute tables on which schema-related operations have been performed are supported in Hologres V1.3 and later. Schema-related operations in MaxCompute include deleting columns, changing the order of columns, and changing the data types of columns. If you use Hologres foreign tables to read data from Optimized Row Columnar (ORC)-formatted MaxCompute tables on which schema-related operations are performed, Seahawks Query Engine (SQE) and the CFile format are used. As a result, the performance deteriorates.

  • The three-layer model of MaxCompute is supported in Hologres V1.3 and later. This model allows you to create schemas in MaxCompute projects and use these schemas to classify objects such as tables. You can read data from but cannot write data to these schemas. For more information, see Schema-related operations.

  • Dual-signature authentication between Hologres and MaxCompute is supported in Hologres V1.3 and later. For more information, see Hologres external tables.

Query data in a non-partitioned MaxCompute table

  1. Create a non-partitioned table in MaxCompute.

    Create a non-partitioned table in MaxCompute and import data to the table. For more information, see Create tables. You can also use an existing non-partitioned MaxCompute table.

    In this example, an existing non-partitioned MaxCompute table is used. Sample statements:

    CREATE TABLE weather (
        city            STRING ,
        temp_lo         int,           -- The lowest temperature.
        temp_hi         int           -- The highest temperature.
    );
    INSERT INTO weather VALUES 
    ('beijing',40,50),
    ('hangzhou',46,55);
  2. Create a foreign table in Hologres.

    Create a foreign table in Hologres to map the source table in MaxCompute. You can configure the foreign table to include some or all fields in the MaxCompute table. Sample statement:

    CREATE FOREIGN TABLE weather1 (
     city text,
     temp_lo int4,
     temp_hi int4
    )
    SERVER odps_server
    OPTIONS (project_name '<projectname>',table_name 'weather');

    The following table describes the parameters in the preceding statement.

    Parameter

    Description

    SERVER

    The server on which you want to create the foreign table.

    You can use the odps_server server that is created at the underlying layer of Hologres. For more information, see Postgres FDW.

    project_name

    The name of the project where the MaxCompute table that you want to query resides.

    table_name

    The name of the MaxCompute table that you want to query.

    Note
    • The data types of the fields in the foreign table must map those in the MaxCompute table. For more information about data type mappings, see Data type mappings between MaxCompute and Hologres in the "Data types" topic.

    • Hologres allows you to execute the IMPORT FOREIGN SCHEMA statement to create multiple foreign tables at a time. For more information, see IMPORT FOREIGN SCHEMA.

    • You can accelerate queries on only internal tables in MaxCompute but not external tables and views in MaxCompute.

  3. Query data in the MaxCompute table by using the foreign table in Hologres.

    After you create a foreign table, you can directly query data in the MaxCompute table by using the foreign table. Sample statement:

    SELECT * FROM weather1;

Query data in a partitioned MaxCompute table

  1. Create a partitioned table in MaxCompute.

    Create a partitioned table in MaxCompute and import data to the table. For more information, see Partition and column operations. You can also use an existing partitioned MaxCompute table.

    In this example, an existing partitioned MaxCompute table is used. Sample statement:

    create table odps_test
    (
        shop_name     string,
        customer_id   string,
        total_price   INT 
    )
    partitioned by (sale_date string);
  2. Create a foreign table in Hologres.

    Create a foreign table in Hologres to map the source table in MaxCompute. Sample statement:

    CREATE FOREIGN TABLE table_odps (
     shop_name text,
     customer_id text,
     total_price int4,
     sale_date text
    )
    SERVER odps_server
    OPTIONS (project_name '<projectname>', table_name 'odps_test');
    Note

    A foreign table in Hologres is used to map fields in the source table, but is not used to store data. A partition field of the source table in MaxCompute maps to a regular field of the foreign table in Hologres.

  3. Query data in the partitioned MaxCompute table.

    To query all the data in the table, execute the following statement:

    SELECT * FROM table_odps;

    To query the data in a specific partition, execute the following statement:

    SELECT * FROM table_odps 
    WHERE sale_date = '2013';

Create multiple foreign tables at a time

To accelerate queries of a large number of MaxCompute tables, you can create multiple foreign tables at a time. In Hologres, you can create multiple foreign tables at a time by executing SQL statements or by using the Hologres console.

Create a foreign table in the HoloWeb console

You can use HoloWeb to create a foreign table and query table data in a visualized way, without the need to write SQL statements. Procedure:

  1. Go to the HoloWeb console. For more information, see Connect to HoloWeb.

  2. In the main menu bar of the HoloWeb console, choose Metadata Management > MaxCompute Acceleration > Create Foreign Table.

    You can also click Instances Connected in the left-side navigation pane of the Metadata Management tab. Click the instance that you want to manage and click the database that you want to manage. Right-click the schema that you want to manage and select New foreign table.

  3. On the New foreign table tab, configure the parameters.

    Section

    Parameter

    Description

    Source MaxCompute Table

    Project Name

    The name of the MaxCompute project.

    Schema Name

    The name of the schema in which the MaxCompute table resides. If your MaxCompute project uses the two-layer model, this parameter is not displayed by default. If your MaxCompute project uses the three-layer model, you can select an authorized schema from the drop-down list.

    Table Name

    The name of the MaxCompute table. Prefix-based fuzzy search is supported.

    Destination Hologres Table

    Database Name

    The name of the Hologres database to which the foreign table belongs.

    Schema Name

    Schema Name

    You can select the default schema named public or a custom schema.

    Table Name

    The name of the Hologres foreign table.

    The name of the source MaxCompute table is automatically specified for this parameter. You can also manually rename the table.

    Destination Table Description

    The description of the Hologres foreign table that you create, which is user-defined.

    Note

    When you create a foreign table to synchronize data from a MaxCompute table, comments of both field names and columns of the MaxCompute table are synchronized to the foreign table.

  4. Click Submit. After the foreign table is submitted, you can refresh the left-side instance lists. The created foreign table is displayed in the schema that you selected.

  5. Optional. Preview the data in the table.

    1. In the Instance Management pane, click Instances Connected and double-click the desired table.

    2. On the table information tab, click Data preview to preview the data in the table.

      数据预览

  6. Optional. View the DDL statements of the table.

    On the table details tab, click DDL statement to view the DDL statements of the table.ddl语句

Optimize the performance of querying MaxCompute tables by using Hologres foreign tables

If the performance of querying MaxCompute tables in Hologres does not meet your requirements, you can merge small files in MaxCompute or optimize Hologres SQL statements. Hologres V0.10 and later adopt a new engine to query MaxCompute data. Compared with the versions earlier than V0.10, Hologres V0.10 and later improve query performance by about 30% to 100%. For more information, see Optimize the performance of querying MaxCompute tables in Hologres.

FAQ

For more information about the answers to frequently asked questions about the permissions required when you create a foreign table in Hologres to accelerate queries on MaxCompute data, see Permissions on MaxCompute.

For more information about the answers to frequently asked questions about the feature of creating a foreign table in Hologres to accelerate queries on MaxCompute data, see FAQ about integration with MaxCompute.

References

You can also execute MaxCompute SQL statements in Hologres to quickly perform operations on MaxCompute. For more information, see Execute MaxCompute SQL statements.