Hologres: FAQ about integration with MaxCompute

Last Updated: Mar 21, 2024

Background information

  • The following table compares Hologres with MaxCompute.

    | Item | MaxCompute | Hologres |
    | --- | --- | --- |
    | Use scenario | Data extract, transform, and load (ETL) at the data warehouse detail (DWD) and data warehouse service (DWS) layers. | Interactive queries and online data services at the application data service (ADS) layer. |
    | Usage | Asynchronous MaxCompute jobs. | Synchronous queries. |
    | Cluster | Large shared software as a service (SaaS)-based clusters. | Exclusive platform as a service (PaaS)-based clusters. |
    | Compute engine | Uses the job execution model to convert MaxCompute jobs into stages and applies for computing resources for each stage based on your business requirements. Intermediate results are persisted to files during job execution. | Uses the massively parallel processing (MPP) model to manage memory resources in a fine-grained manner, runs the query engine (QE) on resident memory resources, and schedules fine-grained SQL operators in user mode. Computing results are not stored on disks. |
    | Scheduling mode | Schedules processes. Resources are applied for and allocated based on your business requirements. | Schedules lightweight threads. Resources are reserved at startup. |
    | Scalability | Almost unrestricted. | Avoids data shuffling across nodes in complex queries. |
    | Storage mode | Column-oriented storage. | Row-oriented, column-oriented, or row-column hybrid storage. |
    | Storage cost | Data storage based on Apsara Distributed File System, which is cost-effective. | Data storage based on Apsara Distributed File System, plus SSD-based cache acceleration, which brings relatively high costs. |
    | Supported interface | Interfaces compatible with MaxCompute SQL, which is equivalent to Hive SQL. | Interfaces compatible with PostgreSQL. |

  • You can use Hologres internal tables or foreign tables to accelerate queries.

    • Create a foreign table to directly query external data.

      Foreign tables do not store data. Instead, the data remains in the MaxCompute source tables. Indexes cannot be created on foreign tables, so queries through foreign tables consume a large amount of CPU resources. Therefore, we recommend that you use this query acceleration method to query small amounts of data at a low queries per second (QPS) rate.

    • Import data to a Hologres internal table to accelerate queries.

      Data in Hologres internal tables is stored in Hologres. To maximize the advantages of Hologres, we recommend that you use this query acceleration method if the data that you want to query needs to be updated, the queries are complex, or the QPS is high.
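
    For reference, a minimal sketch of both methods, assuming a MaxCompute project my_project, a table my_table, and the built-in odps_server foreign server (all names are placeholders):

      -- Method 1: map the MaxCompute table to a foreign table and query it directly.
      IMPORT FOREIGN SCHEMA my_project LIMIT TO (my_table)
      FROM SERVER odps_server INTO public OPTIONS (if_table_exist 'update');

      -- Method 2: import the data into a Hologres internal table created beforehand
      -- with a matching schema, and query the internal table instead.
      INSERT INTO holo_my_table SELECT * FROM my_table;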

Performance tuning

What do I do if query performance significantly deteriorates after schema-related operations are performed on a MaxCompute foreign table?

  • Cause: When you use a foreign table to accelerate the query on MaxCompute data from Hologres, the Hologres Query Engine (HQE) that uses the direct read mode is used by default. If the schema of the MaxCompute foreign table changes, the Seahawks Query Engine (SQE) is used. In this case, the access process is longer, and the query performance deteriorates.

  • Solution: Query the hologres.hg_query_log table to obtain slow query logs. If the engine_type value is SQE, this issue is confirmed. In this case, recreate the MaxCompute foreign table so that the schema change takes effect and HQE is used again.
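
    For example, a minimal check against the slow query log; a sketch assuming the default hologres.hg_query_log schema, in which engine_type is a text array:

      SELECT query_id, engine_type, duration, query
      FROM hologres.hg_query_log
      WHERE engine_type @> ARRAY['SQE']   -- queries that fell back to SQE
      ORDER BY query_start DESC
      LIMIT 20;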

FAQ

What do I do if the error message "SERVER_INTERNAL_ERROR message: GetRecordBatch() is not implemented" is reported?

The following error message is reported when I query data by using a MaxCompute foreign table: query next from foreign table executor failed, GetRecordBatch() is not implemented.

  • Cause: Hologres can read data from a MaxCompute table only if the table is of the CFile type. The current MaxCompute table is not of the CFile type.

  • Solution: Add the following Grand Unified Configuration (GUC) parameter setting before the SQL statement that you want to execute for a data query:

    set hg_experimental_enable_access_odps_with_table_api = on;

What do I do if the error message "ERROR: not an ORC file" is reported?

The following error message is reported when I query data by using a foreign table: ERROR: status { code: SERVER_INTERNAL_ERROR message: "hos_exception: Invalid argument: ../../../src/OrcFileMetaHelper.cc(234)-<operator()>: not an ORC file.

  • Cause: The foreign table maps to a stream table to which data is written by using MaxCompute Streaming Tunnel. Hologres cannot use a foreign table to read data from a MaxCompute table of the stream type.

  • Solution: Add the following GUC parameter settings before the SQL statement that you want to execute for a data query:

    set hg_experimental_enable_access_odps_with_table_api = on;
    set hg_experimental_enable_access_odps_orc_via_holo = off;

What do I do if the error message "specified partitions count in MaxCompute table: exceeds the limitation of 512, please add stricter partition filter or set axf_MaxCompute_partition_limit" or "Build desc failed: Exceeds the partition limitation of 512, current match xxx partitions" is reported?

  • Cause:

    The number of partitions queried in Hologres exceeds the upper limit of 512.

  • Solution:

    • Specify filter conditions to filter out unnecessary partitions before you execute the query statement, so that the number of partitions in a query does not exceed 512. (See the example after this list.)

    • Import data to a Hologres internal table to accelerate data queries. This way, no upper limit is configured on the number of partitions. For more information, see Import data from MaxCompute to Hologres by executing SQL statements.

    • Execute one of the following statements to configure the upper limit on the number of partitions. By default, the upper limit is 512. You can set it to an integer up to 1024. To ensure that query performance is not affected, we recommend that you do not set the upper limit to an integer that is excessively large.

      -- Hologres V1.1 and later
      set hg_foreign_table_max_partition_limit = 128;
      -- Hologres V0.10
      set hg_experimental_foreign_table_max_partition_limit = xxx;
      Note

      If you set multiple levels of partitions for MaxCompute source tables, Hologres counts the number of queried partitions based on the most fine-grained level.
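
    For the first solution, a minimal sketch of a partition filter; the table name mc_sale_detail and the partition column ds are placeholders:

      SELECT COUNT(*)
      FROM mc_sale_detail
      WHERE ds >= '20240301' AND ds < '20240308';  -- matches only 7 partitions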

What do I do if the error message "Build desc failed: Exceeds the scan limitation of 200 GB, current scan xxx GB" is reported?

  • Cause:

    By default, Hologres allows a query to scan a maximum of 200 GB of underlying data. The limit applies to the amount of data scanned after partition filtering, not to the total amount of data stored in the MaxCompute table. The preceding error message is reported if the amount of scanned data exceeds this limit.

  • Solution:

    • Specify filter conditions to filter out unnecessary partitions before you execute the query statement. This ensures that the amount of underlying data to be scanned in a query does not exceed 200 GB.

    • Import data from the MaxCompute table to a Hologres internal table and query data from the internal table. For more information, see Import data from MaxCompute to Hologres by executing SQL statements.

    • (Not recommended) Execute the set hg_experimental_foreign_table_max_scan_size = xxx; statement to increase the upper limit on the amount of underlying data to be scanned. Replace xxx with an integer, such as 400. Unit: GB. However, if the integer you entered is excessively large, the query performance may deteriorate or an out of memory (OOM) exception may occur.

What do I do if the error message "query next from foreign table executor failed, Not implemented" is reported?

  • Cause: Data is written to the MaxCompute table by using the MaxCompute Streaming Tunnel command tunnel.createStreamUploadSession. To read such a table from Hologres, you must configure two GUC parameters.

  • Solution:

    • Add the following GUC parameter settings:

      Note

      Only Hologres V1.3 and later support the GUC parameters.

      set hg_experimental_enable_access_odps_with_table_api = on;
      set hg_experimental_enable_access_odps_orc_via_holo = off;
    • If the version of your Hologres instance is earlier than V1.3, we recommend that you stop writing data to the MaxCompute table by using MaxCompute Streaming Tunnel. Then, execute the following statements to merge data:

      set odps.merge.task.mode=sql;
      set odps.merge.lock.expire.time=0;
      ALTER TABLE <tablename> [PARTITION] MERGE SMALLFILES;

What do I do if the following error message is reported: Build desc failed: failed to check permission: Currently not supported table type "view"?

  • Cause: Hologres does not support querying data in MaxCompute views.

What do I do if the error message "Build desc failed: failed to get foreign table split:MaxCompute-0010000: System internal error - get input pangu dir meta fai" is reported?

  • Cause:

    The parameters that you configure to read MaxCompute data are not updated.

  • Solution:

    Wait a few minutes and try again. If the error persists after multiple retries, contact technical support for troubleshooting.

What do I do if the error message "ERROR: Query:[xxx] Build desc failed: failed to get foreign table split:ERPC_ERROR_CONNECTION_CLOSED" is reported?

  • Cause:

    The MaxCompute table contains a large number of small files. As a result, the metadata of the request exceeds the 1 GB upper limit governed by the Remote Procedure Call (RPC) protocol.

  • Solution:

    • Execute the following statements to merge small files:

      set odps.merge.task.mode=sql;
      set odps.merge.lock.expire.time=0;
      ALTER TABLE <tablename> [PARTITION] MERGE SMALLFILES;
    • Upgrade your Hologres instance. This issue is resolved in Hologres V0.10.21 and later. For more information, see Instance upgrades.

    • Contact technical support personnel of MaxCompute if the amount of data that you want to query is small. They can help you write the required data to Hologres.

What do I do if the following error message is reported: ERROR: status { code: SERVER_INTERNAL_ERROR message: "hos_exception: IO error: Failed to execute pangu open normal file, err: PanguParameterInvalidException" }?

  • Cause:

    Hologres Query Engine (HQE) cannot read MaxCompute data that is encrypted and stored in Apsara Distributed File System.

  • Solution:

    Execute the ALTER DATABASE <dbname> SET hg_experimental_enable_access_odps_orc_via_holo = false; statement to switch the query engine for foreign tables to Seahawks Query Engine (SQE), which can access encrypted data in MaxCompute. This setting takes effect at the database level, and you must establish a new connection before it takes effect. You can also execute the SET hg_experimental_enable_access_odps_orc_via_holo = false; statement at the session level.
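
    For reference, the two scopes side by side (<dbname> is a placeholder):

      -- Database level. Establish a new connection for the setting to take effect.
      ALTER DATABASE <dbname> SET hg_experimental_enable_access_odps_orc_via_holo = false;
      -- Session level. Takes effect only for the current connection.
      SET hg_experimental_enable_access_odps_orc_via_holo = false;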

What do I do if the error message "failed to import foreign schema:Failed to get MaxCompute table:Not enable schema evolution" is reported?

  • Cause:

    The metadata of the MaxCompute table is modified.

  • Solution:

    • Upgrade the version of your Hologres instance to V1.3 or later. You can manually upgrade your Hologres instance in the Hologres console or join a Hologres DingTalk group to apply for an instance upgrade. For more information about how to manually upgrade a Hologres instance, see Instance upgrades. For more information about how to join a Hologres DingTalk group, see Obtain online support for Hologres.

    • Execute the IMPORT FOREIGN SCHEMA statement to recreate the corresponding foreign table, as shown in the sketch after this list. This operation is required if the schema of the MaxCompute source table is updated, for example, if a column is added or deleted.

    • In some cases, the error persists after you execute the IMPORT FOREIGN SCHEMA statement. This is because the MaxCompute table is in the schema evolution state after its schema is modified. Tables in this state cannot be read by Hologres. To fix this error, recreate the MaxCompute table and then recreate the corresponding foreign table.
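
    A minimal sketch of recreating the foreign table, assuming the built-in odps_server foreign server and placeholder project and table names:

      IMPORT FOREIGN SCHEMA my_project LIMIT TO (my_table)
      FROM SERVER odps_server INTO public
      OPTIONS (if_table_exist 'update');  -- replace the existing foreign table definition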

What do I do if the error message "Open ORC file failed for schema mismatch. Reader schema" is reported?

  • Cause:

    If the MaxCompute table is stored in the Optimized Row Columnar (ORC) format and the storage mode of its DECIMAL data is modified, Hologres cannot read the DECIMAL data. This usually happens when fields of the DECIMAL type are added to the table or the table configuration is modified in canary release mode.

  • Solution:

    • Execute the set odps.storage.orc.enable.binary.decimal=false statement and then import the data from the MaxCompute table to the Hologres table again.

    • Convert the data of the DECIMAL type in the MaxCompute table to data of the DOUBLE type, and then update the data.

What do I do if the error message "failed to import foreign schema:Failed to get MaxCompute table:Not enable acid table" is reported?

  • Cause:

    The MaxCompute table is a transactional table.

  • Solution:

    Change the table to a regular MaxCompute table. Hologres does not allow you to read data from a MaxCompute transactional table.

What do I do if the error message "Request denied, may caused by server busy" is reported?

  • Cause:

    The resources allocated for the foreign table are fully occupied, and the CPU utilization exceeds the expected value.

  • Solution:

    • Optimize SQL statements to reduce the amount of resources to use. For more information, see Optimize the performance of querying MaxCompute tables in Hologres.

    • Reduce the concurrency of data queries.

      1. Execute the show hg_foreign_table_executor_max_dop; statement to view the current concurrency configurations.

      2. Execute the following statement to reduce the concurrency by 50%:

        -- Syntax
        set hg_foreign_table_executor_max_dop = <Concurrency>; 
        -- Sample statement 
        set hg_foreign_table_executor_max_dop = 18;

        Concurrency: the number of queries performed by an execution node. Valid values: 0 to 1024. Default value: 256. If the concurrency that you configure is high, an OOM exception may occur. As a result, the data import or query fails, or even the Hologres instance restarts. During the restart, services are unavailable. If the concurrency that you configure is low, the performance of querying foreign tables or importing MaxCompute data may be poor.

    • Import data from the MaxCompute table to a Hologres internal table. You can create indexes for the Hologres internal table, which improves data query performance. For more information, see Import data from MaxCompute to Hologres by executing SQL statements.

What do I do if the error message "Query executor exceeded total memory limitation xxxxx: yyyy bytes used" is reported when I import data?

  • Cause:

    The upper limit on the memory size is exceeded because the amount of imported data is too large or the data import logic is complex. A Hologres instance consists of multiple nodes, and each node has a memory limit of 64 GB. Memory resources on each node are evenly allocated for data computing, data caching, and metadata storage. In this case, the memory resources for data computing are exhausted.

  • Solution:

    1. Query the execution plan.

      You can execute the explain analyze <sql>; statement to view the actual number of rows involved in the execution plan. In some cases, the INSERT statement contains a SELECT clause, but no ANALYZE operation has been performed on some of the tables. In other cases, the table data changed after the ANALYZE operation was performed, so the statistics are stale. In all these cases, the query optimizer chooses a poor join order, and memory overheads increase.

      To fix this issue, execute the analyze <tablename>; statement on all internal and foreign tables involved in the query to refresh their statistics. This helps the query optimizer generate better execution plans.
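
      For example (table names are placeholders):

      analyze mc_table;     -- refresh statistics on the MaxCompute foreign table
      analyze holo_table;   -- refresh statistics on the Hologres internal table
      -- Note: explain analyze executes the statement while profiling it.
      explain analyze insert into holo_table select * from mc_table;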

    2. Limit the number of data entries imported at a time.

      If the MaxCompute table has a large number of columns and the amount of data in each row is large, the amount of data imported at a time is large. This causes excessive memory overheads. To fix this issue, you can add a statement for setting the hg_experimental_query_batch_size parameter before the INSERT statement.

      set hg_experimental_query_batch_size = 1024;  -- Default value: 8192.
      insert into holo_table select * from mc_table;
    3. Reduce the concurrency of data import.

      You can also reduce memory overheads during data import by reducing the data import concurrency. The concurrency is specified by the hg_foreign_table_executor_max_dop parameter. The default value of this parameter equals the number of CPU cores of the Hologres instance. You can configure a smaller value to reduce memory overheads during data import.

      set hg_foreign_table_executor_max_dop = 8;
      insert into holo_table select * from mc_table;
    4. Check whether the foreign table has a large amount of duplicate data.

      If you execute the insert on conflict statement to import data and the error persists after you perform the preceding steps, check whether the foreign table has a large amount of duplicate data. If the foreign table has a large amount of duplicate data, performance of data import is affected. We recommend that you remove the duplicate data from the MaxCompute table before you import data from the table. For more information, see Merge multiple rows of data into one row.
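
      For instance, a minimal MaxCompute SQL dedup sketch, assuming a hypothetical key column pk, an ordering column ts, and a payload column val:

      INSERT OVERWRITE TABLE mc_table
      SELECT pk, ts, val
      FROM (
        SELECT pk, ts, val, ROW_NUMBER() OVER (PARTITION BY pk ORDER BY ts DESC) AS rn
        FROM mc_table
      ) t
      WHERE rn = 1;  -- keep only the latest row per key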

    5. Upgrade your Hologres instance to dynamically adjust the memory size.

      Hologres V1.1.24 or later can dynamically adjust the memory size allocated to a task. Hologres detects the overall memory usage in real time. When Hologres detects idle memory resources, it allocates the resources to the data import task. To use this feature, upgrade your Hologres instance. For more information, see Instance upgrades.

    6. Scale out your Hologres instance.

      If the error persists after you perform the preceding steps, scale out your Hologres instance. For more information, see Instances.

What do I do if the error message "Timestamp overflow detected while converting timestampfrom orc VectorBatch to arrow" is reported?

  • Cause:

    You have written data of the TIMESTAMP type to the MaxCompute table by using a streaming tunnel, so the written data is accurate to the nanosecond. However, Hologres does not support TIMESTAMP data with nanosecond precision.

  • Solution:

    • Convert data of the TIMESTAMP type in the MaxCompute table into data of the DATETIME type, as in the sketch after this list.

    • Upgrade your Hologres instance to V1.1.70 or later.
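
    For the first option, a minimal MaxCompute SQL sketch (table and column names are placeholders):

      -- Create a copy of the table with the TIMESTAMP column converted to DATETIME:
      CREATE TABLE mc_table_dt AS
      SELECT CAST(ts_col AS DATETIME) AS ts_col, val
      FROM mc_table;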

What do I do if the error message "You have NO privilege 'MaxCompute:Select' on xxx" is reported?

  • Cause:

    The current user is not granted the SELECT permission on the data in specific MaxCompute tables.

  • Solution:

    Contact the MaxCompute administrator to grant the current user the SELECT permission on the data in the MaxCompute tables. For more information, see MaxCompute permissions.
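
    For reference, a minimal sketch of the grant that the administrator runs on the MaxCompute client (the account and table names are placeholders):

      GRANT Select ON TABLE my_table TO USER ALIYUN$someone@example.com;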

What do I do if the error message "The sensitive label of column 'xxx' is 2, but your effective label is 0" is reported?

  • Cause:

    The current user is granted permissions to query only some of the fields in the MaxCompute table.

  • Solution:

    • Check whether the current account has the permissions to query the specified fields. If not, apply for the permissions, or query only the fields that you are allowed to access.

    • If the current account has the permissions to query the specified fields, the error may be caused by bugs in Hologres instances of earlier versions. In this case, add the following statements before your query statements as a workaround:

      set hg_experimental_enable_odps_executor = on;
      set hg_experimental_enable_query_master = on;

      For more information about errors associated with MaxCompute permissions, see Permissions on MaxCompute.

What do I do if the error message "query next from foreign table executor failed validate userinfao" is reported?

  • Cause:

    Required permissions are not granted to the AliyunHologresEncryptionDefaultRole role. This issue may also occasionally occur due to caching if the permissions were granted less than 3 hours ago.

  • Solution:

    Attach the AliyunHologresEncryptionDefaultRolePolicy policy to the role. For more information, see Query MaxCompute data encrypted based on BYOK.

What do I do if the data query by using a foreign table is slow?

We recommend that you optimize the SQL statements. For more information, see Optimize the performance of querying MaxCompute tables in Hologres.