All Products
Search
Document Center

Hologres:Import data from MaxCompute to Hologres by executing SQL statements

Last Updated:Mar 19, 2024

If the size of your business data in MaxCompute exceeds 200 GB, the query complexity is high, and responses are required to be returned within seconds, you can import your business data from MaxCompute to Hologres internal tables for queries. This query method allows you to configure indexes and helps achieve a higher query efficiency than queries based on foreign tables. This topic describes how to import data from MaxCompute to Hologres and provides answers to some frequently asked questions.

Usage notes

When you import data from MaxCompute to Hologres by executing SQL statements, take note of the following items:

  • Partition fields in MaxCompute tables can be mapped to partition fields or regular fields in Hologres tables. Data can be imported from a partitioned MaxCompute table to a partitioned or non-partitioned Hologres table.

  • Hologres supports one level of partitioning. MaxCompute supports multiple levels of partitioning. When you import data from a MaxCompute table that involves multiple levels of partitioning to a partitioned Hologres table, you need to map only the first-level partition field of the MaxCompute table to the partition field of the Hologres table. Other partition fields in the MaxCompute table are mapped to regular fields in the Hologres table.

  • To overwrite existing data when you import data, you must use the INSERT ON CONFLICT(UPSERT) statement.

  • For more information about the data type mappings between MaxCompute and Hologres, see Data types.

  • After data in the MaxCompute table is updated, the data cached in Hologres is updated within 10 minutes. We recommend that you use the IMPORT FOREIGN SCHEMA statement to update the foreign table before you import data.

  • To import data from MaxCompute to Hologres, we recommend that you use SQL statements instead of the Data Integration service of DataWorks. This is because data import by executing SQL statements delivers better performance.

Import data from a MaxCompute non-partitioned table to a Hologres table for queries

  1. Create a non-partitioned table in MaxCompute.

    Create a non-partitioned table in MaxCompute. You can also use an existing non-partitioned table in MaxCompute.

    In this example, the customer table in the MaxCompute public dataset named public_data is used. You can log on to the MaxCompute console and query the dataset. For more information, see Public dataset reference. The following sample code shows the DDL statement that is used to create the customer table and the SELECT statement that is used to query data from the table:

    -- The DDL statement that is used to create the customer table.
    CREATE TABLE IF NOT EXISTS public_data.customer(
      c_customer_sk BIGINT,
      c_customer_id STRING,
      c_current_cdemo_sk BIGINT,
      c_current_hdemo_sk BIGINT,
      c_current_addr_sk BIGINT,
      c_first_shipto_date_sk BIGINT,
      c_first_sales_date_sk BIGINT,
      c_salutation STRING,
      c_first_name STRING,
      c_last_name STRING,
      c_preferred_cust_flag STRING,
      c_birth_day BIGINT,
      c_birth_month BIGINT,
      c_birth_year BIGINT,
      c_birth_country STRING,
      c_login STRING,
      c_email_address STRING,
      c_last_review_date STRING,
      useless STRING);
    
    -- The SELECT statement that is used to query the customer table.
    SELECT * FROM public_data.customer;

    The following figure shows part of the data in the customer table.customer

  2. Create a foreign table in Hologres.

    Create a foreign table in Hologres that is mapped to the source table in MaxCompute. Sample statement:

    CREATE FOREIGN TABLE foreign_customer (
        "c_customer_sk" int8,
        "c_customer_id" text,
        "c_current_cdemo_sk" int8,
        "c_current_hdemo_sk" int8,
        "c_current_addr_sk" int8,
        "c_first_shipto_date_sk" int8,
        "c_first_sales_date_sk" int8,
        "c_salutation" text,
        "c_first_name" text,
        "c_last_name" text,
        "c_preferred_cust_flag" text,
        "c_birth_day" int8,
        "c_birth_month" int8,
        "c_birth_year" int8,
        "c_birth_country" text,
        "c_login" text,
        "c_email_address" text,
        "c_last_review_date" text,
        "useless" text
    )
    SERVER odps_server
    OPTIONS (project_name 'public_data', table_name 'customer');

    Parameter

    Description

    Server

    The server in which the foreign table resides. You can use the odps_server server that is created at the underlying layer of Hologres. For more information, see Postgres FDW.

    Project_Name

    The name of the project where the MaxCompute table resides.

    Table_Name

    The name of the MaxCompute table that you want to query.

    The data types of the fields in the foreign table must map to those in the MaxCompute table. For more information about the data type mappings, see Data type mappings between MaxCompute and Hologres.

  3. Create a table in Hologres to store imported data.

    Create a table in Hologres to store the data imported from the MaxCompute source table.

    The following sample statements are for reference only. When you create a table, configure the table schema based on your business requirements and create appropriate indexes to deliver better query performance. For more information about table properties, see Overview.

    -- Create a column-oriented table to store imported data.
    BEGIN;
    CREATE TABLE public.holo_customer (
     "c_customer_sk" int8,
     "c_customer_id" text,
     "c_current_cdemo_sk" int8,
     "c_current_hdemo_sk" int8,
     "c_current_addr_sk" int8,
     "c_first_shipto_date_sk" int8,
     "c_first_sales_date_sk" int8,
     "c_salutation" text,
     "c_first_name" text,
     "c_last_name" text,
     "c_preferred_cust_flag" text,
     "c_birth_day" int8,
     "c_birth_month" int8,
     "c_birth_year" int8,
     "c_birth_country" text,
     "c_login" text,
     "c_email_address" text,
     "c_last_review_date" text,
     "useless" text
    );
    CALL SET_TABLE_PROPERTY('public.holo_customer', 'orientation', 'column');
    CALL SET_TABLE_PROPERTY('public.holo_customer', 'bitmap_columns', 'c_customer_id,c_salutation,c_first_name,c_last_name,c_preferred_cust_flag,c_birth_country,c_login,c_email_address,c_last_review_date,useless');
    CALL SET_TABLE_PROPERTY('public.holo_customer', 'dictionary_encoding_columns', 'c_customer_id:auto,c_salutation:auto,c_first_name:auto,c_last_name:auto,c_preferred_cust_flag:auto,c_birth_country:auto,c_login:auto,c_email_address:auto,c_last_review_date:auto,useless:auto');
    CALL SET_TABLE_PROPERTY('public.holo_customer', 'time_to_live_in_seconds', '3153600000');
    CALL SET_TABLE_PROPERTY('public.holo_customer', 'storage_format', 'segment');
    COMMIT;
  4. Import data to the Hologres table.

    Execute the INSERT statement to import data from the MaxCompute source table to the Hologres table. You can import part of or all the fields. If you import part of the fields, make sure that the fields in the INSERT statement are arranged in the same order as the fields in the SELECT statement. Sample statements:

    -- Import part of the fields.
    INSERT INTO holo_customer (c_customer_sk,c_customer_id,c_email_address,c_last_review_date,useless)
    SELECT 
        c_customer_sk,
        c_customer_id,
        c_email_address,
        c_last_review_date,
        useless
    FROM foreign_customer;
    
    -- Import all the fields.
    INSERT INTO holo_customer
    SELECT * FROM foreign_customer;
  5. Query the data in Hologres.

    Query the data that is imported from the MaxCompute source table in Hologres. Sample statement:

    SELECT * FROM holo_customer;

Import data from a partitioned table in MaxCompute to a Hologres table for queries

For more information, see Import data from a partitioned MaxCompute table.

Best practices for the INSERT OVERWRITE statement

For more information, see INSERT OVERWRITE (Beta).

Synchronize data by using a visualization tool or periodic scheduling

You can synchronize a large amount of data at the same time to Hologres by using a visualization tool or periodic scheduling.

  • To synchronize data from MaxCompute by using HoloWeb, perform the following steps:

    1. Log on to the HoloWeb console. For more information, see Connect to HoloWeb.

    2. In the top navigation bar of the HoloWeb page, choose Metadata Management > MaxCompute Acceleration > Import MaxCompute Data.

    3. On the Create MaxCompute Data Import Task page, configure the parameters. 一键同步The following table describes the parameters.

      Note

      In the SQL Statements section, the SQL statements of the visualized operations are automatically generated. You cannot modify SQL statements in the SQL Statements section. However, you can copy the SQL statements to other places and modify them. Then, you can execute the modified SQL statements to synchronize data.

      Section

      Parameter

      Description

      Connection Information

      Instance Name

      The name of the instance.

      Database

      The name of the Hologres database.

      Select MaxCompute Source Table

      Source Table Type

      • Existing Foreign Table: uses an existing foreign table that maps the MaxCompute table.

      • New external table: creates a foreign table that is used to map the MaxCompute table.

      Source Schema

      The schema in which the foreign table that maps the MaxCompute table resides.

      This parameter is available if you set the Source Table Type parameter to Existing Foreign Table.

      External Table Name

      The name of the foreign table that maps the MaxCompute table.

      This parameter is available if you set the Source Table Type parameter to Existing Foreign Table.

      Server

      The server in which the foreign table resides. You can use the odps_server server that is created at the underlying layer of Hologres. For more information, see Postgres FDW.

      This parameter is available if you set the Source Table Type parameter to New external table.

      Table Name

      The name of the MaxCompute table, including the name of the MaxCompute project.

      Format: project.table_name.

      This parameter is available if you set the Source Table Type parameter to New external table.

      Destination Table

      Destination Schema

      The name of the schema in which the destination table resides.

      If a schema is not created, select the default public schema. If a schema is created, you can select the created schema.

      Destination Table Name

      The name of the Hologres internal table that is used to store MaxCompute data. The name cannot be the same as the name of the foreign table in the same schema.

      Destination Table Description

      The description of the destination table.

      Import Task

      Field

      The fields to be synchronized from the MaxCompute table.

      You can import all fields but not part of the fields.

      Partition

      Partition Field

      If you select a partition field, Hologres automatically creates a partitioned table as the destination table.

      Hologres supports one level of partitioning. MaxCompute supports multiple levels of partitioning. When you import data from a MaxCompute table that involves multiple levels of partitioning to a partitioned Hologres table, you need to set only the first-level partition field of the MaxCompute table for the destination table. Other partition fields in the MaxCompute table are mapped to regular fields in the destination table.

      Data Timestamp

      If a MaxCompute table is partitioned by date, you can specify a date. The system automatically imports data of the specified date to the destination table.

      Property

      Storage Format

      • Column storage: This mode is applicable to various complex queries.

      • Row storage: This mode is applicable to point queries and scans based on primary keys.

      Default value: Column storage.

      Data Lifecycle

      The lifecycle of table data. If you do not configure this parameter, the table data is permanently stored.

      If the data is not updated within the specified period, the system deletes the data after the period expires.

      Clustering Key

      The clustering key that is used to sort columns.

      The type of indexes determines the order of fields. Hologres can use clustering indexes to accelerate range and filter queries on index fields.

      Event Time Column

      The fields that are used to segment data. If the specified fields are involved in the query conditions, Hologres can find the storage location of data based on the fields.

      Dictionary Encoding

      The fields based on whose values a dictionary mapping is built.

      Dictionary encoding can convert string comparisons to numeric comparisons to accelerate GROUP BY and filter queries.

      By default, the system selects all the fields of the TEXT type for this parameter.

      Bitmap Column

      The bit fields on which bit codes are built.

      You can filter the data that meets the query conditions based on the specified fields.

      By default, the system selects all the fields of the TEXT type for this parameter.

      Distribution Column

      The distribution key. Hologres shuffles data to each shard based on the specified column. Data entries with the same distribution key value are distributed to the same shard. If you use a distribution key as a filter condition, the execution efficiency can be improved.

    4. Click Submit. After data is imported, you can query internal table data in Hologres.

  • HoloWeb does not support synchronization by using periodic scheduling. To synchronize a large amount of historical data or import data by using periodic scheduling, you must use the DataStudio service of DataWorks. For more information, see Use DataWorks to periodically import MaxCompute data.

FAQ

An out-of-memory (OOM) issue occurs when data is imported from MaxCompute to Hologres, and the returned error message indicates that the memory limit is exceeded. In most cases, the "Query executor exceeded total memory limitation xxxxx: yyyy bytes used" error message is reported. The following section describes possible causes for the error and the solutions.

  • Step 1

    • Possible causes:

      The INSERT statement contains a SELECT statement, but the ANALYZE statement is not executed on some tables. Another possible cause is that the ANALYZE statement is executed, but the data is updated, which causes inaccuracy. As a result, the join order determined by the query optimizer is invalid, and excessive memory overheads are generated.

    • Solution:

      Execute the ANALYZE statement on all involved internal and foreign tables to update the statistical metadata of the tables. This helps the query optimizer generate optimal execution plans and resolve the OOM issue.

  • Step 2

    • Possible causes:

      The table from which data is imported contains a large number of columns and each row contains a large amount of data. In this case, a large amount of data is read at the same time. This causes excessive memory overheads.

    • Solution:

      Run the following SET command before the INSERT statement to control the number of data rows to be read at a time. This reduces OOM issues.

      set hg_experimental_query_batch_size = 1024;-- Default value: 8192.
      insert into holo_table select * from mc_table;
  • Step 3

    • Possible causes:

      The concurrency of data import is high, which consumes large amounts of CPU resources and affects queries in internal tables.

    • Solution:

      For Hologres versions earlier than V1.1, the concurrency is specified by the hg_experimental_foreign_table_executor_max_dop parameter. The default concurrency is the number of CPU cores of the instance. Specify a smaller value for the hg_experimental_foreign_table_executor_max_dop parameter. This helps reduce memory usage during data import and resolve the OOM issue. This parameter takes effect on all jobs executed on the foreign table. Sample code:

      set hg_experimental_foreign_table_executor_max_dop = 8;
      insert into holo_table select * from mc_table;
  • Step 4

    • Possible causes:

      The concurrency of data import is high, which consumes large amounts of CPU resources and affects queries in internal tables.

    • Solution:

      For Hologres versions earlier than V1.1, the concurrency is specified by the hg_foreign_table_executor_dml_max_dop parameter. The default value is 32. Specify a smaller value for the hg_foreign_table_executor_dml_max_dop parameter. This helps decrease the concurrency of DML statements in data import and export scenarios and prevent DML statements from occupying excessive resources. Sample code:

      set hg_foreign_table_executor_dml_max_dop = 8;
      insert into holo_table select * from mc_table;