This topic answers frequently asked questions about external tables.
ODPS-0123131 error: Reading DATETIME types with custom extractors
Issue
When you use a custom extractor to read unstructured data, the following error is reported if a data field is of the DATETIME type, such as
2019-11-11 06:43:36.FAILED: ODPS-0123131:User defined function exception - Traceback: java.lang.IllegalArgumentException at java.sql.Date.valueOf(Date.java:143) at com.aliyun.odps.udf.example.text.TextExtractor.textLineToRecord(TextExtractor.java:194) at com.aliyun.odps.udf.example.text.TextExtractor.extract(TextExtractor.java:153) at com.aliyun.odps.udf.ExtractorHandler.extract(ExtractorHandler.java:120)Cause
In the code
Date.valueOf(parts[i]), thejava.sql.Date.valueOf()function only supports STRING parameters in the"yyyy-[m]m-[d]d"format and does not support DATETIME parameters.Resolution
Add the Joda-Time dependency to your project and import the required classes into your code.
-- Dependency. <dependency> <groupId>joda-time</groupId> <artifactId>joda-time</artifactId> <version>2.10</version> </dependency> -- Import information. import org.joda.time.DateTime; import org.joda.time.format.DateTimeFormat;Use the
DateTimeFormat.forPattern()function to convert a value of the DATETIME type to the STRING type for reading.record.setDate(index, new Date(DateTime.parse(parts[i], DateTimeFormat.forPattern("yyyy-MM-dd HH:mi:ss")).getMillis()));
Example:
Use the MaxCompute client to upload the JAR file for your extractor project.
add jar /Users/gary/big_data/odps/text_extractor/target/text_extractor-1.0-SNAPSHOT.jar/Users/gary/big_data/odps/text_extractor/target/text_extractor-1.0-SNAPSHOT.jaris the local storage path for the generated JAR file.Use the MaxCompute client to upload the third-party Joda-Time JAR file.
add jar /Users/gary/.m2/repository/joda-time/joda-time/2.10/joda-time-2.10.jar/Users/gary/.m2/repository/joda-time/joda-time/2.10/joda-time-2.10.jaris the local path of the third-party Joda-Time JAR file.Upload test data to the specified directory in OSS. For a file named
video_play_log.txt, the sample data is as follows.5c661071dba64d5080c91da085ff1073^Music-Click-Fast forward^26.12.XX.XX^2019-11-11 06:43:36Read data from the external table.
select * from <project_name>.video_play_log;+------+-------+---+----------------+ | uuid | action | ip | time | +------+-------+---+----------------+ | 5c661071dba64d5080c91da085ff1073 | Music-Click-Fast forward | 26.12.XX.XX | 2019-11-11 06:43:36 | +------+-------+---+----------------+
Memory overflow with UDFs on OSS external tables
Issue
When you use a UDF to access an OSS external table, a memory overflow error occurs, even though the UDF passed local tests.
FAILED: ODPS-0123131:User defined function exception - Traceback: java.lang.OutOfMemoryError: Java heap spaceAfter you set the following parameters, the runtime increases, but the error persists.
set odps.stage.mapper.mem = 2048; set odps.stage.mapper.jvm.mem = 4096;Cause
The external table contains too many object files, which consume a large amount of memory. In addition, no partitions are configured.
Resolution
Query a smaller amount of data.
Use partitions to reduce the amount of data scanned in each query.
Error: Inline data exceeds maximum allowed size
Issue
The error
Inline data exceeds the maximum allowed sizeoccurs when you process data in Object Storage Service (OSS).Cause
When MaxCompute writes data to an OSS external table, it may attempt to create a single file that exceeds an operational size limit.
Resolution
Adjust the following properties to control the execution plan. This ensures that individual files written to the OSS external table do not exceed the size limit.
-- Adjust the amount of data read by each mapper. Unit: MB. SET odps.sql.mapper.split.size=256; -- Adjust the number of workers in the reduce phase. SET odps.stage.reducer.num=100;
Read JSON data by using OSS external tables
For instructions, see Read JSON data by using an OSS external table.
Merge small files with OSS external tables
Use the Logview log to check whether the last operator in the SQL execution plan is a Reducer or a Joiner. If it is a Reducer, execute the statement set odps.stage.reducer.num=1;. If it is a Joiner, execute the statement set odps.stage.joiner.num=1;.
ODPS-0130071 error: Reading Hologres data from MaxCompute
Issue
When you directly read data from Hologres, the error "ODPS-0130071 Failed to split to equal size...max count: 7777" is reported. Example:
ODPS-0130071:[0,0] Semantic analysis exception - physical plan generation failed: storage/table/src/input_splits_builder.cpp(195): StorageException: Failed to split to equal size, total size: 2143570729934, min size: 268435456, max size: 272629760, max count: 7777, split size: 275629513, split count: 7777Cause
This error occurs because MaxCompute's default mapper splitting strategy (input data volume / split.size = 256 MB) causes the number of concurrent mappers to exceed the maximum limit of 7,777.
NoteThis limit prevents users from submitting jobs that generate a large number of mappers, which could affect the stability of Hologres or the network connection.
Resolution
You can set the following parameters to resolve the error.
-- Increase the mapper concurrency limit. The maximum value is 10,000. SET odps.external.holo.mapper.instances=10000; -- Adjust task concurrency by increasing the split size. The maximum value is 512 MB. SET odps.sql.mapper.split.size=512;
Slow SQL jobs on external tables
SQL jobs on external tables can be slow for several common reasons:
Slow reading of GZ compressed files from an OSS external table
Issue
Reading a large, compressed file (such as a 200 GB GZ file) from an OSS external table is slow.
Cause
The job is slow because compressed formats like GZ are not splittable, which forces MaxCompute to use only a single mapper.
Resolution
To improve performance, split the large compressed file into multiple smaller files in OSS. This allows MaxCompute to use multiple mappers in parallel, which increases read throughput.
-- Sets the data split size for each mapper in MB. SET odps.sql.mapper.split.size=256;For unstructured data, check if the path for the Object Storage Service (OSS) external table contains only one file. If there is only one file, only one Mapper is generated because compressed unstructured data cannot be split. This slows down processing. Split the large OSS file into smaller files in the path of the OSS external table. This increases the number of Mappers and improves the read speed.
Slow data query in a MaxCompute external table by using an SDK
Issue
Querying data from a MaxCompute external table by using an SDK is slow.
Resolution
External tables support only full table scans, which can be slow. We recommend that you use MaxCompute internal tables instead.
Slow data query from a Tablestore external table
Issue
A query on a Tablestore external table is significantly slower than an equivalent query on a MaxCompute internal table containing the same data.
Resolution
This performance difference is expected because querying an external table involves remote data reads. For better performance, import the data into a MaxCompute internal table first and then run your queries.
"Couldn't connect to server" error with OSS external tables
Issue
When you read data from an OSS external table, the following error is reported:
ODPS-0123131:User defined function exception - common/io/oss/oss_client.cpp(95): OSSRequestException: req_id: , http status code: -998, error code: HttpIoError, message: Couldn't connect to server.Causes
Reason 1: When you create an OSS external table, a public endpoint is used for the
oss_endpointin the oss_location instead of an internal endpoint.Cause 2: When you create an OSS external table, the
oss_endpointspecified in the oss_location path belongs to a different region.
Resolution
For Cause 1:
You need to check the
oss_locationparameter in theCREATE TABLEstatement for the OSS external table. If theoss_endpointis a public endpoint, change it to an internal endpoint. For more information about the parameters, see Parameter description.For example, if you are in the Indonesia (Jakarta) region and use the
oss://oss-ap-southeast-5.aliyuncs.com/<bucket>/....endpoint to create an external table, you should change it to the corresponding internal endpoint:oss://oss-ap-southeast-5-internal.aliyuncs.com/<bucket>/.....For Cause 2:
You need to verify that the
oss_endpointin theoss_locationparameter of the CREATE TABLE statement for the OSS external table is the endpoint of the region that you want to access. For more information about OSS classic network domain names, see Regions and Endpoints.
"Network is unreachable (connect failed)" error during table creation
Issue
When you create an external table for Object Storage Service (OSS), the following error is reported:
ODPS-0130071:[1,1] Semantic analysis exception - external table checking failure, error message: Cannot connect to the endpoint 'oss-cn-beijing.aliyuncs.com': Connect to bucket.oss-cn-beijing.aliyuncs.com:80 [bucket.oss-cn-beijing.aliyuncs.com] failed: Network is unreachable (connect failed).Cause
When creating an OSS external table, the
oss_endpointin the oss_location address uses a public endpoint, not an internal endpoint.Resolution
You need to check the
CREATE TABLEstatement for the OSS external table to verify that theoss_endpointspecified in theoss_locationparameter is an internal endpoint. If it is a public endpoint, you need to change it to an internal endpoint. For more information about the parameters, see Parameter description.For example, in the China (Beijing) region, if you use the
oss://oss-cn-beijing.aliyuncs.com/<bucket>/....endpoint to create an external table, you should change it to the corresponding internal endpointoss://oss-cn-beijing-internal.aliyuncs.com/<bucket>/.....