This topic describes how to create a foreign table in Hologres to achieve accelerated queries of MaxCompute data.

Background information

MaxCompute is a fast and fully managed computing platform for large-scale data warehousing. It can process exabytes of data. MaxCompute is used to store and compute large amounts of structured data at a time. It provides various data warehousing solutions as well as big data analytics and modeling services.

Hologres is a real-time interactive analytics engine that is compatible with PostgreSQL. Hologres seamlessly integrates with MaxCompute storage resources. In Hologres, you can perform accelerated queries of MaxCompute data by creating foreign tables, without the need to store data in Hologres or import data to and export data from Hologres. You can use Hologres to connect to mainstream business intelligence (BI) tools.

You can also import data to Hologres before you query the data. Compared with other similar services that are not oriented for the big data ecosystem, Hologres offers higher performance and faster data import and export.

You can select one of the following query methods based on the business features and scenarios:
  • Query MaxCompute data in Hologres.
    This method is suitable for scenarios in which the amount of underlying data to be scanned in a query is less than 200 GB, and the number of partitions to be queried at a time is less than 512.
    Note The amount of underlying data to be scanned in a query is the volume of data in the partitions that hit the filter conditions, not the size of queried fields. The limit on the amount of underlying data to be scanned in a query ensures the stability of data queries. If you perform accelerated queries of MaxCompute data by creating foreign tables, Hologres loads the MaxCompute data in the queried partitions to its memory and cache. If the amount of underlying data to be scanned is too large, a lot of bandwidth and computing resources are occupied, which affects the performance of concurrent queries.
  • Import MaxCompute data to Hologres before you query the data.

    This method has no limit on the amount of underlying data to be scanned in a query. The method is suitable for the following scenarios: complex queries, index-based queries, and queries that involve UPDATE, INSERT, and DELETE operations.

Usage notes

When you create a foreign table in Hologres to accelerate queries of MaxCompute data, take note of the following items:

  • You can accelerate queries of only internal tables in MaxCompute. Queries of foreign tables and views in MaxCompute cannot be accelerated.
  • The amount of underlying data to be scanned in a query cannot exceed 200 GB, and the number of partitions to be queried at a time cannot exceed 512. However, if you import MaxCompute data to Hologres internal tables before you query the data, the amount of underlying data to be scanned and the number of partitions to be queried are not limited.
  • After data in a MaxCompute table is updated, the cache in Hologres is automatically updated within 5 minutes. Before that, you cannot accelerate queries of the updated data. If you want to query the updated data in real time, execute the IMPORT FOREIGN SCHEMA statement to update the metadata of the foreign table. If you import data from the MaxCompute table to Hologres V1.1.25 or later, you do not need to manually update the metadata of the foreign table. After you execute the statement used to import data, Hologres automatically obtains the updated metadata of the MaxCompute table.
  • If the schema of a MaxCompute table is updated, the cache in Hologres is not automatically updated. You must manually update the foreign table.
  • Partition fields in MaxCompute tables can be mapped to partition fields or regular fields in Hologres tables. Therefore, you can import data from a partitioned MaxCompute table to a partitioned or non-partitioned Hologres table.
  • Data types that are supported by MaxCompute and those supported by Hologres have a one-to-one mapping relationship. You can create tables based on the mappings between data types. For more information, see Data types.
  • You can query MaxCompute data across regions. However, if your Hologres instance resides in China, you can accelerate only the queries of data in MaxCompute tables that reside in China. In addition, if your Hologres instance resides outside China, you cannot accelerate queries of MaxCompute data across regions. We recommend that you do not accelerate queries of MaxCompute data across regions. We recommend that you create a foreign table in the region in which a MaxCompute table resides. Otherwise, the stability of data queries cannot be ensured because of possible network errors.
  • Foreign tables do not store data and are used only to map the source data. The source data is stored in MaxCompute tables.
  • MaxCompute transactional tables and tables in the schema evolution state are not supported.
  • When you use a streaming tunnel to write data to a MaxCompute table, the table is in the streaming state. The data in the table is asynchronously compacted and converted to the Optimized Row Columnar (ORC) format. Hologres V1.1.45 and later can read tables in the streaming state. Hologres of earlier versions cannot read tables whose compaction is not complete.

Query data in a non-partitioned MaxCompute table

  1. Create a non-partitioned table in MaxCompute.
    Create a non-partitioned table in MaxCompute and import data to the table. For more information, see Create tables. You can also use an existing non-partitioned MaxCompute table.
    In this example, an existing non-partitioned MaxCompute table is used. To create a non-partitioned table in MaxCompute and import data to the table, execute the following statements:
    CREATE TABLE weather (
        city            STRING ,
        temp_lo         int,           -- The lowest temperature.
        temp_hi         int           -- The highest temperature.
    );
    INSERT INTO weather VALUES 
    ('beijing',40,50),
    ('hangzhou',46,55);
  2. Create a foreign table in Hologres.
    Create a foreign table in Hologres to map the source table in MaxCompute. You can configure the foreign table to include some or all fields in the MaxCompute table. For example, you can execute the following statement to create a foreign table:
    CREATE FOREIGN TABLE weather1 (
     city text,
     temp_lo int8,
     temp_hi int8
    )
    SERVER odps_server
    OPTIONS (project_name '<projectname>',table_name 'weather');
    The following table describes the parameters in the statement.
    Parameter Description
    SERVER The server on which you want to create the foreign table.

    You can use the odps_server server that is created at the underlying layer of Hologres. For more information, see postgres_fdw.

    project_name The name of the project in which the MaxCompute table to be queried resides.
    table_name The name of the MaxCompute table to be queried.
    Note
    • The data types of the fields in the foreign table must map those in the MaxCompute table. For more information about the mappings between data types, see Data type mappings between MaxCompute and Hologres when you create a foreign table.
    • Hologres allows you to execute the IMPORT FOREIGN SCHEMA statement to create multiple foreign tables at a time. For more information, see IMPORT FOREIGN SCHEMA.
    • You can accelerate queries of only internal tables in MaxCompute. Queries of foreign tables and views in MaxCompute cannot be accelerated.
  3. Query data in the non-partitioned MaxCompute table by using the foreign table in Hologres.
    After you create a foreign table, you can directly query the foreign table for data in the mapping MaxCompute table. For example, you can execute the following statement to query the foreign table:
    SELECT * FROM weather1;

Query data in a partitioned MaxCompute table

  1. Create a partitioned table in MaxCompute.
    Create a partitioned table in MaxCompute and import data to the table. For more information, see Partition and column operations. You can also use an existing partitioned MaxCompute table.
    In this example, an existing partitioned MaxCompute table is used. To create a partitioned table in MaxCompute, execute the following statement:
    create table odps_test
    (
        shop_name     string,
        customer_id   string,
        total_price   INT 
    )
    partitioned by (sale_date string);
  2. Create a foreign table in Hologres.
    Create a foreign table in Hologres to map the source table in MaxCompute. For example, you can execute the following statement to create a foreign table:
    CREATE FOREIGN TABLE table_odps (
     shop_name text,
     customer_id text,
     total_price int8,
     sale_date text
    )
    SERVER odps_server
    OPTIONS (project_name '<projectname>', table_name 'odps_test');
    Note A foreign table in Hologres is used to map fields in the source table, but is not used to store data. A partition field of the source table in MaxCompute maps a regular field of the foreign table in Hologres.
  3. Query data in the partitioned MaxCompute table by using the foreign table in Hologres.
    To query all the data in the partitioned table, execute the following statement:
    SELECT * FROM table_odps;
    To query the data in a specific partition, execute the following statement:
    SELECT * FROM table_odps 
    WHERE sale_date = '2013';

Create multiple foreign tables at a time

To accelerate queries of a MaxCompute table that contains a large amount of data, you can create multiple foreign tables at a time. In Hologres, you can create multiple foreign tables at a time by executing SQL statements or by using the Hologres console.

Create a foreign table in the HoloWeb console

You can use HoloWeb to create a foreign table and query table data in a visualized way, without the need to write SQL statements. To do so, perform the following steps:

  1. Log on to the HoloWeb console. For more information, see HoloWeb quick start.
  2. In the top navigation bar of the HoloWeb console, choose Metadata Management > MaxCompute Acceleration > Create Foreign Table.
    You can also find the left-side Instances Connected list on the Metadata Management tab. Click the instance that you want to manage and click the database that you want to manage. Right-click the schema that you want to manage and select New external table. Create an internal table
  3. On the New external table tab, set the parameters as required. Click Submit form. Create a foreign table
    Parameter Description
    Instance Name The name of the current instance.
    Database The name of the current database.
    Table name The name of the Hologres foreign table.

    After you enter the name of the source MaxCompute table, the name of the foreign table must be the same as that of the source MaxCompute table. In this case, you cannot modify the name of the foreign table. To rename the foreign table, you must create the foreign table, find the foreign table in the left-side Instances Connected list, and then right-click the foreign table.

    Description The description of the Hologres foreign table.
    Schema The name of the schema.

    You can select the default schema public or a custom schema.

    Type The service type of the source table.

    The default value is MaxCompute, which cannot be changed.

    Server List You can select the odps_server server that is created at the underlying layer of Hologres. For more information, see postgres_fdw.
    Table The name of the source MaxCompute table to be mapped, including the name of the MaxCompute project.

    Format: project.table_name.

    Note
    • You cannot query data in a source table that is stored in a region different from that of the selected Hologres instance.
    • After you enter the name of the source MaxCompute table, all the fields of the source table are displayed. By default, the created foreign table contains all the fields of the source table. If you need to create a foreign table that contains only specific fields of the source table, use SQL statements to create the foreign table. For more information, see CREATE FOREIGN TABLE.
    Note When you create a foreign table to synchronize data from a MaxCompute table, the table comments and column comments of the MaxCompute table are synchronized to the foreign table.
  4. Click Submit form. The foreign table is created. After the foreign table is submitted, you can refresh the left-side instance lists. The created foreign table is displayed under the schema that you selected.
  5. Optional. Preview the data in the table.
    1. In the Instance Management pane, click Instances Connected and double-click the table whose data you want to preview.
    2. On the table information tab, click Data preview to preview the data in the table.
      Data preview tab
  6. Optional. View the DDL statements used to create the table.
    On the table information tab, click DDL statement to view the DDL statements used to create the table. DDL statement

Optimize the performance of querying MaxCompute tables in Hologres

If the performance of querying MaxCompute tables in Hologres does not meet your requirements, you can merge small files in MaxCompute or optimize Hologres SQL statements. Hologres V0.10 and later adopt a new engine to query MaxCompute data. Compared with the versions earlier than V0.10, Hologres V0.10 and later improve query performance by about 30% to 100%. For more information, see Optimize the performance of querying MaxCompute tables in Hologres.

FAQ

For more information about the answers to frequently asked questions about the feature of creating a foreign table in Hologres to accelerate queries of MaxCompute data, see Common errors and troubleshooting methods of integration with MaxCompute.