This topic describes how to create an OSS external table and use a custom storage handler to read data from and write data to the table.
Applicability
The cluster property is not supported for OSS external tables.
A single file cannot exceed 3 GB. If a file is larger than 3 GB, you must split it.
Create an external table
Syntax
CREATE EXTERNAL TABLE [IF NOT EXISTS] mc_oss_extable_name
(
col_name date_type,
...
)
[comment table_comment]
[partitioned BY (col_name data_type, ...)]
stored BY '<YOUR_DEFINED_STORAGEHANDLER>'
WITH serdeproperties (
['property_name'='property_value',...]
)
location 'oss_location'
USING 'jar_name';To avoid correctness issues, a custom storage handler does not shard data by default. If you confirm that your storage handler can handle sharding, you can enable the data sharding feature to start multiple mappers by running the following command:
SET odps.sql.unstructured.data.single.file.split.enabled=true;Common parameters
For information about common parameters, see Basic syntax parameters.
Exclusive parameters
Parameter | Required | Description |
your_defined_storagehandler | Yes | You can use a MaxCompute user-defined function (UDF) to write a custom resolver. For more information, see Develop UDFs. |
jar_name | Yes | The JAR package that contains the code of the custom storage handler. The JAR package must be added as a resource to the MaxCompute project. For more information about how to add resources, see Resource operations. |
resource_name | No | When you use a custom serde class, you must specify the dependency resource. The resource contains the custom serde class. The JAR package that contains the serde class must be added as a resource to the MaxCompute project. For more information about how to add resources, see Resource operations. |
Write data
For more information about the data write syntax of MaxCompute, see Write data to OSS.
Query and analysis
For more information about the SELECT syntax, see Read OSS data.
For more information about how to optimize query plans, see Query optimization.
Example: Create an OSS external table using a custom storage handler
Establish a mapping with the SampleData/ directory (custom storage handler) in Appendix: Prepare sample data. The procedure is as follows:
Prepare the data.
Log on to the OSS console and upload the test RCFILE format data file to the specified directory
oss-mc-test/Demo_rcfile+pt/dt=20250521/in your OSS bucket. For more information, see Upload files to OSS.Use MaxCompute Studio to create the following four Java classes: TextExtractor.java, TextOutputer.java, SplitReader.java, and TextStorageHandler.java. For more information about how to develop Java programs, see Develop a UDF.
Use the one-click packaging feature of MaxCompute Studio to package TextStorageHandler.java and upload it as a MaxCompute resource.
Assume that the resource name is
javatest-1.0-SNAPSHOT.jar. For more information about packaging and uploading, see Package, upload, and register.NoteIf multiple dependencies are required, package them separately and upload them as MaxCompute resources.
Execute the following statement to create an OSS external table:
CREATE EXTERNAL TABLE ambulance_data_txt_external ( vehicleId INT, recordId INT, patientId INT, calls INT, locationLatitute DOUBLE, locationLongtitue DOUBLE, recordTime STRING, direction STRING ) stored BY 'com.aliyun.odps.udf.example.text.TextStorageHandler' WITH serdeproperties ( 'delimiter'='|', 'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole' ) location 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/SampleData/' USING 'javatest-1.0-SNAPSHOT.jar'; -- You can execute the desc extended ambulance_data_txt_external; statement to view the structure of the created external table.NoteThe delimiter parameter is a user-defined delimiter that is used to separate column values in each row of an OSS object. You can use any valid string as a delimiter.
Read data from OSS. Sample statement:
SELECT recordId, patientId, direction FROM ambulance_data_txt_external WHERE patientId > 25;The following result is returned:
+----------+-----------+-----------+ | recordid | patientid | direction | +----------+-----------+-----------+ | 1 | 51 | S | | 3 | 48 | NE | | 4 | 30 | W | | 5 | 47 | S | | 7 | 53 | N | | 8 | 63 | SW | | 10 | 31 | N | +----------+-----------+-----------+Write data to the OSS external table.
INSERT INTO ambulance_data_txt_external VALUES (1,16,76,1,'46.81006','-92.08174','9/14/2014 0:10','SW'); -- Query the data again to check whether the data is written. You can also check whether a new file is generated in the OSS directory. SELECT * FROM ambulance_data_txt_external WHERE recordId='16';
FAQ
Why is error ODPS-0123131 reported when I use a custom Extractor to read a DATETIME data field from unstructured data?
Symptom
When you use a custom Extractor to read unstructured data, the following error is returned if a data field is of the DATETIME type, such as 2019-11-11 06:43:36.
FAILED: ODPS-0123131:User defined function exception - Traceback: java.lang.IllegalArgumentException at java.sql.Date.valueOf(Date.java:143) at com.aliyun.odps.udf.example.text.TextExtractor.textLineToRecord(TextExtractor.java:194) at com.aliyun.odps.udf.example.text.TextExtractor.extract(TextExtractor.java:153) at com.aliyun.odps.udf.ExtractorHandler.extract(ExtractorHandler.java:120)Cause
The error occurs at
Date.valueOf(parts). Thejava.sql.Date.valueOf()function supports only string parameters in the"yyyy-[m]m-[d]d"format. The function does not support the DATETIME format.Solution
Add the Joda-Time dependency and import the required classes into your code.
-- Dependency. <dependency> <groupId>joda-time</groupId> <artifactId>joda-time</artifactId> <version>2.10</version> </dependency> -- Import information. import org.joda.time.DateTime; import org.joda.time.format.DateTimeFormat;Use the
DateTimeFormat.forPattern()function to convert the DATETIME value to a string when you read the data.record.setDate(index, new Date(DateTime.parse(parts[i], DateTimeFormat.forPattern("yyyy-MM-dd HH:mi:ss")).getMillis()));
The following is an example.
Use the MaxCompute client to upload the JAR package that is generated from the Extractor project.
add jar /Users/gary/big_data/odps/text_extractor/target/text_extractor-1.0-SNAPSHOT.jar/Users/gary/big_data/odps/text_extractor/target/text_extractor-1.0-SNAPSHOT.jaris the local path where the generated JAR package is stored.Use the MaxCompute client to upload the third-party Joda-Time JAR package.
add jar /Users/gary/.m2/repository/joda-time/joda-time/2.10/joda-time-2.10.jar/Users/gary/.m2/repository/joda-time/joda-time/2.10/joda-time-2.10.jaris the local path where the third-party Joda-Time JAR package is stored.Upload a test data file, such as
video_play_log.txt, to the specified directory in OSS. The file can contain the following sample data.5c661071dba64d5080c91da085ff1073^music-click-fast_forward^26.12.XX.XX^2019-11-11 06:43:36Read data from the external table.
select * from <project_name>.video_play_log;The following result is returned.
+------+-------+---+----------------+ | uuid | action | ip | time | +------+-------+---+----------------+ | 5c661071dba64d5080c91da085ff1073 | music-click-fast_forward | 26.12.XX.XX | 2019-11-11 06:43:36 | +------+-------+---+----------------+