This topic describes how to import data from MaxCompute to Hologres by executing SQL statements.

Background information

Foreign tables do not store data or provide indexing capabilities. In specific scenarios, you may have more than 200 GB of business data in MaxCompute and want to receive responses to complex queries on that data within seconds. To meet this requirement, you can import the MaxCompute data to Hologres and query it there. This delivers faster queries than queries that are performed on foreign tables.

Usage notes

When you import data from MaxCompute to Hologres by executing SQL statements, take note of the following items:

  • Partition fields in MaxCompute tables can be mapped to partition fields or regular fields in Hologres tables. Therefore, data can be imported from a partitioned MaxCompute table to a partitioned or non-partitioned Hologres table.
  • In a partitioned Hologres table, the partition key can include only a single partition field. If you import data from a partitioned MaxCompute table whose partition key includes multiple partition fields to a partitioned Hologres table, you need to map only one of the partition fields in the MaxCompute table to the partition key of the Hologres table. The other partition fields in the MaxCompute table are mapped to regular fields in the Hologres table.
  • To update and overwrite existing data when you import data, you must use an INSERT ON CONFLICT statement, as shown in the first example after this list.
  • For information about data type mappings between MaxCompute and Hologres, see Data types.
  • After the data in MaxCompute is updated, up to 10 minutes may be required before the updated data is available in Hologres. We recommend that you execute an IMPORT FOREIGN SCHEMA statement to update the foreign table before you import data, as shown in the second example after this list.
  • To import data from MaxCompute to Hologres, we recommend that you use SQL statements instead of the Data Integration service of DataWorks. This is because data import by executing SQL statements delivers better performance.
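
The following statement is a minimal sketch of an import that overwrites existing rows. It assumes that the destination Hologres table defines a primary key (for example, on c_customer_sk); the sample table created later in this topic does not define one, so adjust the conflict target and the SET list to your own schema.

  --Sketch only: overwrite rows that already exist in the destination table.
  INSERT INTO holo_customer
  SELECT * FROM foreign_customer
  ON CONFLICT (c_customer_sk)
  DO UPDATE SET c_email_address = EXCLUDED.c_email_address,
                c_last_review_date = EXCLUDED.c_last_review_date;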
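
The following statement is a minimal sketch of refreshing a foreign table with IMPORT FOREIGN SCHEMA before an import. It reuses the project, table, and server names from the example in this topic and creates or updates a foreign table named customer in the public schema; replace the names with your own.

  --Sketch only: refresh the foreign table definition for the customer table.
  IMPORT FOREIGN SCHEMA public_data LIMIT TO (customer)
  FROM SERVER odps_server INTO public
  OPTIONS (if_table_exist 'update');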

Import data from a non-partitioned MaxCompute table to Hologres for queries

  1. Create a non-partitioned table in MaxCompute.
    Create a non-partitioned table in MaxCompute, or use an existing non-partitioned MaxCompute table.
    In this example, the customer table in the MaxCompute public dataset named public_data is used. You can log on to the MaxCompute console and query the dataset. For more information, see Overview. The following code shows the DDL statement that is used to create the table and the SELECT statement that is used to query the table:
    --The DDL statement that is used to create the customer table.
    CREATE TABLE IF NOT EXISTS public_data.customer(
      c_customer_sk BIGINT,
      c_customer_id STRING,
      c_current_cdemo_sk BIGINT,
      c_current_hdemo_sk BIGINT,
      c_current_addr_sk BIGINT,
      c_first_shipto_date_sk BIGINT,
      c_first_sales_date_sk BIGINT,
      c_salutation STRING,
      c_first_name STRING,
      c_last_name STRING,
      c_preferred_cust_flag STRING,
      c_birth_day BIGINT,
      c_birth_month BIGINT,
      c_birth_year BIGINT,
      c_birth_country STRING,
      c_login STRING,
      c_email_address STRING,
      c_last_review_date STRING,
      useless STRING);
    
    --The SELECT statement that is used to query the customer table.
    SELECT * FROM public_data.customer;
    The following figure shows part of the data in the customer table.
  2. Create a foreign table in Hologres.
    Create a foreign table in Hologres to map the source table in MaxCompute. For example, you can execute the following SQL statement:
    CREATE FOREIGN TABLE foreign_customer (
        "c_customer_sk" int8,
        "c_customer_id" text,
        "c_current_cdemo_sk" int8,
        "c_current_hdemo_sk" int8,
        "c_current_addr_sk" int8,
        "c_first_shipto_date_sk" int8,
        "c_first_sales_date_sk" int8,
        "c_salutation" text,
        "c_first_name" text,
        "c_last_name" text,
        "c_preferred_cust_flag" text,
        "c_birth_day" int8,
        "c_birth_month" int8,
        "c_birth_year" int8,
        "c_birth_country" text,
        "c_login" text,
        "c_email_address" text,
        "c_last_review_date" text,
        "useless" text
    )
    SERVER odps_server
    OPTIONS (project_name 'public_data', table_name 'customer');
    The following list describes the parameters in the statement:
    • server: The server where the foreign table resides. You can use the odps_server server that is created at the underlying layer of Hologres. For more information, see postgres_fdw.
    • project_name: The name of the MaxCompute project where the table to be queried resides.
    • table_name: The name of the MaxCompute table to be queried.
    The data types of the fields in the foreign table must match those of the corresponding fields in the non-partitioned MaxCompute table. For more information about the mappings between data types, see Data type mappings between MaxCompute and Hologres when you create a foreign table.
  3. Create a table in Hologres to store imported data.
    Create a table in Hologres to store the data imported from the non-partitioned MaxCompute table.
    The following sample statements are for reference only. When you create your own table, set the schema based on your business needs and create appropriate indexes to achieve better query performance. For more information about table properties, see CREATE TABLE.
    --Create a column-oriented table to store imported data.
    BEGIN;
    CREATE TABLE public.holo_customer (
     "c_customer_sk" int8,
     "c_customer_id" text,
     "c_current_cdemo_sk" int8,
     "c_current_hdemo_sk" int8,
     "c_current_addr_sk" int8,
     "c_first_shipto_date_sk" int8,
     "c_first_sales_date_sk" int8,
     "c_salutation" text,
     "c_first_name" text,
     "c_last_name" text,
     "c_preferred_cust_flag" text,
     "c_birth_day" int8,
     "c_birth_month" int8,
     "c_birth_year" int8,
     "c_birth_country" text,
     "c_login" text,
     "c_email_address" text,
     "c_last_review_date" text,
     "useless" text
    );
    CALL SET_TABLE_PROPERTY('public.holo_customer', 'orientation', 'column');
    CALL SET_TABLE_PROPERTY('public.holo_customer', 'bitmap_columns', 'c_customer_id,c_salutation,c_first_name,c_last_name,c_preferred_cust_flag,c_birth_country,c_login,c_email_address,c_last_review_date,useless');
    CALL SET_TABLE_PROPERTY('public.holo_customer', 'dictionary_encoding_columns', 'c_customer_id:auto,c_salutation:auto,c_first_name:auto,c_last_name:auto,c_preferred_cust_flag:auto,c_birth_country:auto,c_login:auto,c_email_address:auto,c_last_review_date:auto,useless:auto');
    CALL SET_TABLE_PROPERTY('public.holo_customer', 'time_to_live_in_seconds', '3153600000');
    CALL SET_TABLE_PROPERTY('public.holo_customer', 'storage_format', 'segment');
    COMMIT;
  4. Import data to Hologres.
    Execute an INSERT statement to import all fields or only some of the fields from the non-partitioned MaxCompute table to Hologres. If you import only some of the fields, specify the destination fields in the INSERT clause and make sure that the selected fields are listed in the same order as the destination fields in the Hologres table created in Step 3. For example, you can execute one of the following statements:
    --Import some of the fields.
    INSERT INTO holo_customer (
        "c_customer_sk",
        "c_customer_id",
        "c_email_address",
        "c_last_review_date",
        "useless"
    )
    SELECT
        "c_customer_sk",
        "c_customer_id",
        "c_email_address",
        "c_last_review_date",
        "useless"
    FROM foreign_customer;
    
    --Import all the fields.
    INSERT INTO holo_customer
    SELECT * FROM foreign_customer;
  5. Query the data imported from the non-partitioned MaxCompute table in Hologres.
    Execute a SELECT statement to query the data that is imported from the non-partitioned MaxCompute table in Hologres. For example, you can execute the following SQL statement:
    SELECT * FROM holo_customer;
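
After the import completes, you can optionally verify the result and collect statistics on the new internal table so that the query optimizer has up-to-date metadata, which also helps avoid the out-of-memory issue described in the troubleshooting section of this topic. The following statements are a minimal sketch that uses the table names from this example:

  --Compare the row counts of the foreign table and the internal table.
  SELECT count(*) FROM foreign_customer;
  SELECT count(*) FROM holo_customer;
  --Collect statistics on the internal table.
  ANALYZE holo_customer;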

Import data from a partitioned MaxCompute table to Hologres for queries

For more information, see Import data from a partitioned MaxCompute table.
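
The overall procedure is the same as that for a non-partitioned table, except that the destination Hologres table is partitioned and data is usually imported partition by partition. The following statements are a minimal sketch that uses hypothetical names: a MaxCompute table partitioned by a ds column, a foreign table named foreign_orders, and a destination table named holo_orders. Adjust the names, fields, and partition values to your own tables.

  --Create a partitioned destination table. The partition key can contain only one field.
  CREATE TABLE holo_orders (
      order_id int8,
      order_amount float8,
      ds text
  )
  PARTITION BY LIST (ds);

  --Create a child partition and import the corresponding MaxCompute partition into it.
  CREATE TABLE holo_orders_20230101 PARTITION OF holo_orders FOR VALUES IN ('20230101');
  INSERT INTO holo_orders_20230101
  SELECT order_id, order_amount, ds
  FROM foreign_orders
  WHERE ds = '20230101';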

Synchronize data by using a visualization tool or periodic scheduling

You can synchronize a large amount of data at a time to Hologres by using a visualization tool or periodic scheduling.

  • To synchronize data from MaxCompute by using HoloWeb, perform the following steps:
    1. Go to the HoloWeb console. For more information, see HoloWeb quick start.
    2. In the top navigation bar of the HoloWeb console, choose Metadata Management > MaxCompute Acceleration > Import MaxCompute Data.
    3. On the Create MaxCompute Data Import Task page, configure the parameters described below.
      Note In the SQL Statements section, the SQL statements that correspond to your visualized operations are automatically generated in the SQL editor. You cannot modify the statements in this section, but you can copy them to another SQL editor, modify them there, and then execute the modified statements.
      • Connection Information
        • Instance Name: The name of the current instance.
        • Database: The name of the current database.
      • Select MaxCompute Source Table
        • Source Table Type: The type of the source table. Valid values:
          • Existing Foreign Table: uses an existing foreign table that maps the MaxCompute table.
          • New external table: creates a foreign table that maps the MaxCompute table.
        • Source Schema: The name of the schema in which the foreign table that maps the MaxCompute table resides. This parameter is available only if you set Source Table Type to Existing Foreign Table.
        • Foreign Table Name: The name of the foreign table that maps the MaxCompute table. This parameter is available only if you set Source Table Type to Existing Foreign Table.
        • Server: The server on which the foreign table is stored. You can use the odps_server server that is created at the underlying layer of Hologres. For more information, see postgres_fdw. This parameter is available only if you set Source Table Type to New external table.
        • Table Name: The name of the MaxCompute table, including the name of the MaxCompute project, in the format project.table_name. This parameter is available only if you set Source Table Type to New external table.
      • Destination Table
        • Destination Schema: The name of the schema in which the destination table resides. If you have not created a schema, select the default public schema. If you have created a schema, you can select the created schema.
        • Destination Table Name: The name of the destination table. The name of the Hologres internal table that stores the MaxCompute data cannot be the same as the name of the foreign table in the same schema.
        • Destination Table Description: The description of the destination table.
      • Import Task
        • Field: The fields to be synchronized from the MaxCompute table. All fields are imported; you cannot import only a subset of the fields.
      • Partition
        • Partition Field: If you select a partition field, Hologres automatically creates a partitioned table as the destination table. Hologres supports only one level of partitioning. When you import data from a MaxCompute table that has multiple levels of partitions, map only the first-level partition field of the MaxCompute table to the partition key of the destination table. The other partition fields of the MaxCompute table are mapped to regular fields in the destination table.
        • Data Timestamp: If the MaxCompute table is partitioned by date, you can specify a date. The system automatically imports the data of the specified date to the destination table.
      • Property
        • Storage Format: The storage mode of the destination table. Default value: Column storage.
          • Column storage: applicable to various complex queries.
          • Row storage: applicable to point queries and scans based on primary keys.
        • Lifecycle: The lifecycle of the table data. Default value: Forever. If the data is not updated within the specified period, the system deletes the data after the period expires.
        • Clustered Index: The clustering key that is used to sort the data. The order of the index fields determines the sort order. Hologres can use clustered indexes to accelerate range queries and filter queries on the index fields.
        • Event Time Column: The fields that are used to segment data. If the specified fields are used in query conditions, Hologres can locate the storage position of the data based on these fields.
        • Dictionary Encoding: The fields for which dictionary mappings are built. Dictionary encoding converts string comparisons into numeric comparisons to accelerate queries such as GROUP BY and FILTER queries. By default, the system selects all fields of the text type for this parameter.
        • Bitmap Column: The fields on which bitmap indexes are built. Hologres can use bitmap indexes to quickly filter the data that meets query conditions based on these fields. By default, the system selects all fields of the text type for this parameter.
        • Distribution Column: The distribution key. Hologres shuffles data to shards based on the specified columns. Data entries with the same distribution key value are distributed to the same shard. If you use the distribution key in filter conditions, execution efficiency can be improved.
    4. Click Submit. After data is imported, you can query internal table data in Hologres.
  • HoloWeb does not support synchronization by using periodic scheduling. To synchronize a large amount of historical data or import data by using periodic scheduling, you must use the DataStudio service of DataWorks. For more information, see Use DataWorks to periodically import MaxCompute data.

Troubleshoot a common issue

When data is imported from MaxCompute to Hologres, an out-of-memory (OOM) error may occur and the Query executor exceeded total memory limitation xxxxx: yyyy bytes used error message is returned. The following part describes four possible causes of this error and their corresponding solutions.

  • Cause 1
    • Possible cause:

      The INSERT statement contains a SELECT clause, but the ANALYZE statement has not been executed for some of the involved tables, or the statistics are outdated because the data has changed since the ANALYZE statement was executed. As a result, the query optimizer chooses a poor join order, which causes excessive memory overhead.

    • Solution:

      Execute the ANALYZE statement for all involved internal and foreign tables to update the statistical metadata of the tables. This can help the query optimizer generate better execution plans and resolve the OOM issue.
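      A minimal example, using the foreign table and internal table names from this topic:
      analyze foreign_customer; --Collect statistics on the foreign table.
      analyze holo_customer; --Collect statistics on the internal table.
      insert into holo_customer select * from foreign_customer;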

  • Cause 2
    • Possible cause:

      The table from which the data is imported contains a large number of columns, and each row contains a large amount of data. As a result, a large amount of data is read at a time, which causes excessive memory overhead.

    • Solution:
      Add a set flag before the INSERT statement to control the number of data rows to be read at a time, as shown in the following code. This can reduce OOM issues.
      set hg_experimental_query_batch_size = 1024; --The default value is 8192.
      insert into holo_table select * from mc_table;
  • Cause 3
    • Possible cause:

      A large amount of data is imported, which consumes large amounts of CPU resources and affects queries in internal tables.

    • Solution:
      For Hologres versions earlier than V1.1, the concurrency is specified by the hg_experimental_foreign_table_executor_max_dop parameter. The default concurrency is the number of cores of the instance. Specify a smaller value for the hg_experimental_foreign_table_executor_max_dop parameter in the set flag before the INSERT statement. This can reduce memory usage during data import and resolve the OOM issue. This parameter takes effect on all jobs executed on the foreign table. Sample code:
      set hg_experimental_foreign_table_executor_max_dop = 8;
      insert into holo_table select * from mc_table;
  • Cause 4
    • Possible cause:

      A large amount of data is imported, which consumes large amounts of CPU resources and affects queries in internal tables.

    • Solution:
      For Hologres versions earlier than V1.1, the concurrency is specified by the hg_experimental_foreign_table_executor_dml_max_dop parameter. The default value is 32. Specify a smaller value for the hg_experimental_foreign_table_executor_dml_max_dop parameter in the set flag before the INSERT statement. This can decrease the concurrency of DML statements in data import and export scenarios and prevent DML statements from occupying excessive resources. Sample code:
      set hg_experimental_foreign_table_executor_dml_max_dop = 8;
      insert into holo_table select * from mc_table;