All Products
Search
Document Center

:FAQ about lakehouse and external tables

Last Updated:Mar 27, 2026

This topic answers frequently asked questions about external tables.

Category

FAQ

Object Storage Service (OSS) external table

Hologres external table

How do I resolve the ODPS-0130071 error when I directly read Hologres data from MaxCompute?

Performance issues

How do I troubleshoot slow SQL jobs on external tables?

ODPS-0123131 error: Reading DATETIME types with custom extractors

  • Issue

    When you use a custom extractor to read unstructured data, the following error is reported if a data field is of the DATETIME type, such as 2019-11-11 06:43:36.

    FAILED: ODPS-0123131:User defined function exception - Traceback:
    java.lang.IllegalArgumentException
        at java.sql.Date.valueOf(Date.java:143)
        at com.aliyun.odps.udf.example.text.TextExtractor.textLineToRecord(TextExtractor.java:194)
        at com.aliyun.odps.udf.example.text.TextExtractor.extract(TextExtractor.java:153)
        at com.aliyun.odps.udf.ExtractorHandler.extract(ExtractorHandler.java:120)       
  • Cause

    In the code Date.valueOf(parts[i]), the java.sql.Date.valueOf() function only supports STRING parameters in the "yyyy-[m]m-[d]d" format and does not support DATETIME parameters.

  • Resolution

    1. Add the Joda-Time dependency to your project and import the required classes into your code.

      -- Dependency.
      <dependency>
        <groupId>joda-time</groupId>
        <artifactId>joda-time</artifactId>
        <version>2.10</version>
      </dependency> 
      -- Import information.
      import org.joda.time.DateTime;
      import org.joda.time.format.DateTimeFormat;                           
    2. Use the DateTimeFormat.forPattern() function to convert a value of the DATETIME type to the STRING type for reading.

      record.setDate(index, new Date(DateTime.parse(parts[i], DateTimeFormat.forPattern("yyyy-MM-dd HH:mi:ss")).getMillis()));                           

    Example:

    1. Use the MaxCompute client to upload the JAR file for your extractor project.

      add jar /Users/gary/big_data/odps/text_extractor/target/text_extractor-1.0-SNAPSHOT.jar      

      /Users/gary/big_data/odps/text_extractor/target/text_extractor-1.0-SNAPSHOT.jar is the local storage path for the generated JAR file.

    2. Use the MaxCompute client to upload the third-party Joda-Time JAR file.

      add jar /Users/gary/.m2/repository/joda-time/joda-time/2.10/joda-time-2.10.jar                         

      /Users/gary/.m2/repository/joda-time/joda-time/2.10/joda-time-2.10.jar is the local path of the third-party Joda-Time JAR file.

    3. Upload test data to the specified directory in OSS. For a file named video_play_log.txt, the sample data is as follows.

      5c661071dba64d5080c91da085ff1073^Music-Click-Fast forward^26.12.XX.XX^2019-11-11 06:43:36                           
    4. Read data from the external table.

      select * from <project_name>.video_play_log;
      +------+-------+---+----------------+
      | uuid  | action  | ip  | time      |
      +------+-------+---+----------------+
      | 5c661071dba64d5080c91da085ff1073 | Music-Click-Fast forward | 26.12.XX.XX | 2019-11-11 06:43:36 |
      +------+-------+---+----------------+     

Memory overflow with UDFs on OSS external tables

  • Issue

    When you use a UDF to access an OSS external table, a memory overflow error occurs, even though the UDF passed local tests.

    FAILED: ODPS-0123131:User defined function exception - Traceback:
    java.lang.OutOfMemoryError: Java heap space        

    After you set the following parameters, the runtime increases, but the error persists.

    set odps.stage.mapper.mem = 2048; 
    set odps.stage.mapper.jvm.mem = 4096;       
  • Cause

    The external table contains too many object files, which consume a large amount of memory. In addition, no partitions are configured.

  • Resolution

    • Query a smaller amount of data.

    • Use partitions to reduce the amount of data scanned in each query.

Error: Inline data exceeds maximum allowed size

  • Issue

    The error Inline data exceeds the maximum allowed size occurs when you process data in Object Storage Service (OSS).

  • Cause

    When MaxCompute writes data to an OSS external table, it may attempt to create a single file that exceeds an operational size limit.

  • Resolution

    Adjust the following properties to control the execution plan. This ensures that individual files written to the OSS external table do not exceed the size limit.

    -- Adjust the amount of data read by each mapper. Unit: MB.
    SET odps.sql.mapper.split.size=256; 
    -- Adjust the number of workers in the reduce phase.
    SET odps.stage.reducer.num=100; 

Read JSON data by using OSS external tables

For instructions, see Read JSON data by using an OSS external table.

Merge small files with OSS external tables

Use the Logview log to check whether the last operator in the SQL execution plan is a Reducer or a Joiner. If it is a Reducer, execute the statement set odps.stage.reducer.num=1;. If it is a Joiner, execute the statement set odps.stage.joiner.num=1;.

ODPS-0130071 error: Reading Hologres data from MaxCompute

  • Issue

    When you directly read data from Hologres, the error "ODPS-0130071 Failed to split to equal size...max count: 7777" is reported. Example:

    ODPS-0130071:[0,0] Semantic analysis exception - physical plan generation failed: storage/table/src/input_splits_builder.cpp(195): StorageException: Failed to split to equal size, total size: 2143570729934, min size: 268435456, max size: 272629760, max count: 7777, split size: 275629513, split count: 7777
  • Cause

    This error occurs because MaxCompute's default mapper splitting strategy (input data volume / split.size = 256 MB) causes the number of concurrent mappers to exceed the maximum limit of 7,777.

    Note

    This limit prevents users from submitting jobs that generate a large number of mappers, which could affect the stability of Hologres or the network connection.

  • Resolution

    You can set the following parameters to resolve the error.

    -- Increase the mapper concurrency limit. The maximum value is 10,000.
    SET odps.external.holo.mapper.instances=10000; 
    -- Adjust task concurrency by increasing the split size. The maximum value is 512 MB.
    SET odps.sql.mapper.split.size=512; 

Slow SQL jobs on external tables

SQL jobs on external tables can be slow for several common reasons:

  • Slow reading of GZ compressed files from an OSS external table

    • Issue

      Reading a large, compressed file (such as a 200 GB GZ file) from an OSS external table is slow.

    • Cause

      The job is slow because compressed formats like GZ are not splittable, which forces MaxCompute to use only a single mapper.

    • Resolution

      • To improve performance, split the large compressed file into multiple smaller files in OSS. This allows MaxCompute to use multiple mappers in parallel, which increases read throughput.

        -- Sets the data split size for each mapper in MB.
        SET odps.sql.mapper.split.size=256;       
      • For unstructured data, check if the path for the Object Storage Service (OSS) external table contains only one file. If there is only one file, only one Mapper is generated because compressed unstructured data cannot be split. This slows down processing. Split the large OSS file into smaller files in the path of the OSS external table. This increases the number of Mappers and improves the read speed.

  • Slow data query in a MaxCompute external table by using an SDK

    • Issue

      Querying data from a MaxCompute external table by using an SDK is slow.

    • Resolution

      External tables support only full table scans, which can be slow. We recommend that you use MaxCompute internal tables instead.

  • Slow data query from a Tablestore external table

    • Issue

      A query on a Tablestore external table is significantly slower than an equivalent query on a MaxCompute internal table containing the same data.

    • Resolution

      This performance difference is expected because querying an external table involves remote data reads. For better performance, import the data into a MaxCompute internal table first and then run your queries.

"Couldn't connect to server" error with OSS external tables

  • Issue

    When you read data from an OSS external table, the following error is reported: ODPS-0123131:User defined function exception - common/io/oss/oss_client.cpp(95): OSSRequestException: req_id: , http status code: -998, error code: HttpIoError, message: Couldn't connect to server.

  • Causes

    • Reason 1: When you create an OSS external table, a public endpoint is used for the oss_endpoint in the oss_location instead of an internal endpoint.

    • Cause 2: When you create an OSS external table, the oss_endpoint specified in the oss_location path belongs to a different region.

  • Resolution

    • For Cause 1:

      You need to check the oss_location parameter in the CREATE TABLE statement for the OSS external table. If the oss_endpoint is a public endpoint, change it to an internal endpoint. For more information about the parameters, see Parameter description.

      For example, if you are in the Indonesia (Jakarta) region and use the oss://oss-ap-southeast-5.aliyuncs.com/<bucket>/.... endpoint to create an external table, you should change it to the corresponding internal endpoint: oss://oss-ap-southeast-5-internal.aliyuncs.com/<bucket>/.....

    • For Cause 2:

      You need to verify that the oss_endpoint in the oss_location parameter of the CREATE TABLE statement for the OSS external table is the endpoint of the region that you want to access. For more information about OSS classic network domain names, see Regions and Endpoints.

"Network is unreachable (connect failed)" error during table creation

  • Issue

    When you create an external table for Object Storage Service (OSS), the following error is reported:ODPS-0130071:[1,1] Semantic analysis exception - external table checking failure, error message: Cannot connect to the endpoint 'oss-cn-beijing.aliyuncs.com': Connect to bucket.oss-cn-beijing.aliyuncs.com:80 [bucket.oss-cn-beijing.aliyuncs.com] failed: Network is unreachable (connect failed).

  • Cause

    When creating an OSS external table, the oss_endpoint in the oss_location address uses a public endpoint, not an internal endpoint.

  • Resolution

    You need to check the CREATE TABLE statement for the OSS external table to verify that the oss_endpoint specified in the oss_location parameter is an internal endpoint. If it is a public endpoint, you need to change it to an internal endpoint. For more information about the parameters, see Parameter description.

    For example, in the China (Beijing) region, if you use the oss://oss-cn-beijing.aliyuncs.com/<bucket>/.... endpoint to create an external table, you should change it to the corresponding internal endpoint oss://oss-cn-beijing-internal.aliyuncs.com/<bucket>/.....