This topic describes how to use a custom storage handler to create an OSS external table and read data from and write data into the table.
Prerequisites
The Alibaba Cloud account or RAM user is granted permissions to access OSS. Alibaba Cloud accounts, RAM users, and RAM roles can access OSS external tables. For more information about authorization, see STS authorization.
A MaxCompute project is created.
MaxCompute is deployed only in specific regions. To prevent a cross-region data connectivity issue, we recommend that you use a bucket in the same region as your MaxCompute project.
The Alibaba Cloud account or RAM user is granted the CreateTable permission on your project. For more information about table operation permissions, see MaxCompute permissions.
(Optional) An OSS bucket, OSS directories, and OSS data files are prepared. For more information, see Create a bucket, Manage directories, and Simple upload.
MaxCompute can automatically create an OSS directory in OSS. For SQL statements that include external tables and UDFs, you can execute operations to read and write external tables and UDFs with a single SQL statement. You can also manually create an OSS directory.
Limits
OSS external tables do not support the clustering.
A single file cannot exceed 3 GB. If one does, we recommend that you split it into multiple files.
Precautions
To avoid correctness issues, a custom storage handler does not shard data by default. If you confirm that your storage handler can handle sharding, you can enable the data sharding feature to start multiple mappers by running the following command:
SET odps.sql.unstructured.data.single.file.split.enabled=true;Create an external table
Syntax
CREATE EXTERNAL TABLE [IF NOT EXISTS] mc_oss_extable_name
(
col_name date_type,
...
)
[comment table_comment]
[partitioned BY (col_name data_type, ...)]
stored BY '<your_defined_storagehandler>'
WITH serdeproperties (
['property_name'='property_value',...]
)
location 'oss_location'
USING 'jar_name';For more information about how to create OSS external tables in other formats, see OSS external tables.
Common parameters
For more information about common parameters, see Basic syntax parameters.
Exclusive parameters
Parameter | Required | Description |
your_defined_storagehandler | Yes | A custom storage handler that is written by using MaxCompute UDFs. For more information about how to write a MaxCompute UDF, see Develop a UDF. |
jar_name | Yes | The JAR package that contains the code of the custom storage handler. The JAR package must be added as a resource to the MaxCompute project. For more information about how to add resources, see Resource operations. |
resource_name | No | When you use a custom serde class, you must specify the dependency resource. The resource contains the custom serde class. The JAR package that contains the serde class must be added as a resource to the MaxCompute project. For more information about how to add resources, see Resource operations. |
Write data
For more information about the data write syntax of MaxCompute, see Write data to OSS.
Query data
For more information about the SELECT syntax, see Read OSS data.
For more information about how to optimize query plans, see Query optimization.
Example: Create an OSS external table by using a custom storage handler
Establish a mapping with the SampleData/ directory (custom storage handler) in Appendix: Prepare sample data. The procedure is as follows:
Use MaxCompute Studio to create the following four Java classes: TextExtractor.java, TextExtractor.java, SplitReader.java, and TextStorageHandler.java. For more information about how to develop Java programs, see Develop a UDF.
Use the one-click packaging feature of MaxCompute Studio to package TextStorageHandler.java and upload it as a MaxCompute resource.
Assume that the resource name is
javatest-1.0-SNAPSHOT.jar. For more information about packaging and uploading, see Package, upload, and register.NoteIf multiple dependencies are required, package them separately and upload them as MaxCompute resources.
Execute the following statement to create an OSS external table:
CREATE EXTERNAL TABLE ambulance_data_txt_external ( vehicleId INT, recordId INT, patientId INT, calls INT, locationLatitute DOUBLE, locationLongtitue DOUBLE, recordTime STRING, direction STRING ) stored BY 'com.aliyun.odps.udf.example.text.TextStorageHandler' WITH serdeproperties ( 'delimiter'='|', 'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole' ) location 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/SampleData/' USING 'javatest-1.0-SNAPSHOT.jar'; -- You can execute the desc extended ambulance_data_txt_external; statement to view the structure of the created external table.NoteThe delimiter parameter is a user-defined delimiter that is used to separate column values in each row of an OSS object. You can use any valid string as a delimiter.
Read data from OSS. Sample statement:
SELECT recordId, patientId, direction FROM ambulance_data_txt_external WHERE patientId > 25;The following result is returned:
+----------+-----------+-----------+ | recordid | patientid | direction | +----------+-----------+-----------+ | 1 | 51 | S | | 3 | 48 | NE | | 4 | 30 | W | | 5 | 47 | S | | 7 | 53 | N | | 8 | 63 | SW | | 10 | 31 | N | +----------+-----------+-----------+Write data to the OSS external table.
INSERT INTO ambulance_data_txt_external VALUES (1,16,76,1,'46.81006','-92.08174','9/14/2014 0:10','SW'); -- Query the data again to check whether the data is written. You can also check whether a new file is generated in the OSS directory. SELECT * FROM ambulance_data_txt_external WHERE recordId='16';