When your data in OSS uses a format that MaxCompute's built-in handlers do not support — such as a proprietary binary format, a custom delimiter, or a domain-specific encoding — a custom storage handler lets you implement the read and write logic yourself. You implement the handler as a MaxCompute user-defined function (UDF) in Java, package it as a JAR, and reference it when creating the external table. MaxCompute then delegates all data reading and writing to your handler.
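At a high level the handler contract is small: one class that tells MaxCompute which extractor (read path) and outputer (write path) to use. The sketch below assumes the shape of the `com.aliyun.odps.udf` API; the `Extractor`, `Outputer`, and `OdpsStorageHandler` types are stubbed so the snippet stands alone, and the `My*` class names are illustrative, not real SDK classes. A real handler imports these types from the `odps-sdk-udf` dependency instead.

```java
// Minimal sketch of the storage handler contract, assuming the shape of the
// com.aliyun.odps.udf API. The SDK types are stubbed here so the example is
// self-contained; a real handler imports them from odps-sdk-udf instead.
abstract class Extractor {}   // stand-in: reads records from OSS objects
abstract class Outputer {}    // stand-in: writes records back to OSS

abstract class OdpsStorageHandler {
    // MaxCompute calls these to discover the classes that do the actual I/O.
    public abstract Class<? extends Extractor> getExtractorClass();
    public abstract Class<? extends Outputer> getOutputerClass();
}

class MyTextExtractor extends Extractor {}
class MyTextOutputer extends Outputer {}

// The class you name in STORED BY only wires the two halves together.
class MyTextStorageHandler extends OdpsStorageHandler {
    @Override
    public Class<? extends Extractor> getExtractorClass() {
        return MyTextExtractor.class;
    }

    @Override
    public Class<? extends Outputer> getOutputerClass() {
        return MyTextOutputer.class;
    }
}

class StorageHandlerDemo {
    public static void main(String[] args) {
        OdpsStorageHandler handler = new MyTextStorageHandler();
        System.out.println("extractor: " + handler.getExtractorClass().getSimpleName());
        System.out.println("outputer:  " + handler.getOutputerClass().getSimpleName());
    }
}
```

All of the format-specific logic lives in the extractor and outputer; the handler class itself is only the entry point referenced in the `STORED BY` clause.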
Usage notes
| Constraint | Detail |
|---|---|
| `cluster` property | Not supported on OSS external tables. |
| Single file size | Cannot exceed 2 GB. Split files larger than 2 GB before referencing them. |
| Data sharding | Disabled by default to avoid correctness issues. Enable it only if your storage handler explicitly supports sharding. |
To enable sharding and start multiple mappers, run:

```sql
SET odps.sql.unstructured.data.single.file.split.enabled=true;
```

Create an external table
Syntax
```sql
CREATE EXTERNAL TABLE [IF NOT EXISTS] <table_name>
(
  col_name data_type,
  ...
)
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type, ...)]
STORED BY '<storage_handler_class>'
WITH SERDEPROPERTIES (
  ['property_name'='property_value', ...]
)
LOCATION 'oss://<endpoint>/<bucket>/<path>/'
USING '<jar_name>';
```

For common parameters such as column definitions and partition syntax, see Basic syntax parameters.
Parameters
| Parameter | Required | Description |
|---|---|---|
| storage_handler_class | Yes | Fully qualified class name of your custom storage handler, implemented as a MaxCompute UDF. See Develop UDFs. |
| jar_name | Yes | JAR package containing the storage handler code. Add the JAR as a resource to your MaxCompute project before creating the table. See Resource operations. |
| resource_name | No | Additional JAR packages required when your handler uses a custom SerDe class. Add each JAR as a separate resource to the MaxCompute project. See Resource operations. |
Write data
For the data write syntax, see Write data to OSS.
Query and analysis
For SELECT syntax, see Read OSS data. To optimize query plans, see Query optimization.
Example: Create an OSS external table using a custom storage handler
This example maps a MaxCompute external table to the SampleData/ directory in OSS using a text-based custom storage handler. The handler reads pipe-delimited (|) rows from OSS objects and maps them to typed columns.
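Conceptually, the per-row work such a text extractor performs is simple string splitting followed by type conversion. The self-contained sketch below is illustrative only (the `PipeRowDemo` class and sample row are hypothetical, not part of any SDK); note that `String.split` takes a regex, so the pipe metacharacter must be escaped:

```java
// Hypothetical sketch of the per-row work a pipe-delimited text extractor
// performs: split the line into fields, then convert fields to typed values.
class PipeRowDemo {
    // String.split takes a regex, so '|' must be escaped as "\\|";
    // the -1 limit keeps trailing empty fields instead of dropping them.
    static String[] splitRow(String line) {
        return line.split("\\|", -1);
    }

    public static void main(String[] args) {
        String row = "1|1|51|1|46.81006|-92.08174|9/14/2014 0:00|S";
        String[] parts = splitRow(row);
        int recordId = Integer.parseInt(parts[1]);
        double latitude = Double.parseDouble(parts[4]);
        System.out.println(parts.length + " fields, recordId=" + recordId
                + ", latitude=" + latitude);
    }
}
```

The same escaping caveat applies to any regex-significant character you configure as the `delimiter` in `SERDEPROPERTIES` if your extractor splits rows with a regex-based API.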
Prerequisites
Before you begin, ensure that you have:
- An OSS bucket and folder in the same region as your MaxCompute project. See Create a bucket and Manage folders. MaxCompute can create OSS folders automatically when an SQL statement runs, so manual creation is optional.
- OSS access permission granted via an Alibaba Cloud account, a Resource Access Management (RAM) user, or a RAM role. See Authorize access in STS mode for OSS.
- The `CreateTable` permission in the MaxCompute project. See MaxCompute permissions.
Procedure
Use MaxCompute Studio to create the following four Java classes. The links point to reference implementations in the Alibaba Cloud Java SDK:
For Java UDF development guidance, see Develop a UDF.
Use the one-click packaging feature of MaxCompute Studio to package `TextStorageHandler.java` and upload it as a MaxCompute resource. This example assumes the resource is named `javatest-1.0-SNAPSHOT.jar`. For packaging and upload steps, see Package, upload, and register. If your storage handler has multiple dependencies, package each one separately and upload them as individual MaxCompute resources.
Create the external table:

```sql
CREATE EXTERNAL TABLE ambulance_data_txt_external
(
  vehicleId INT,
  recordId INT,
  patientId INT,
  calls INT,
  locationLatitute DOUBLE,
  locationLongtitue DOUBLE,
  recordTime STRING,
  direction STRING
)
STORED BY 'com.aliyun.odps.udf.example.text.TextStorageHandler'
WITH SERDEPROPERTIES (
  'delimiter'='|',
  'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
)
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/SampleData/'
USING 'javatest-1.0-SNAPSHOT.jar';
```

The `delimiter` property defines the column separator used in each row of OSS objects. Any valid string is accepted.

To verify the table structure after creation, run:

```sql
DESC EXTENDED ambulance_data_txt_external;
```

Read data from the table:

```sql
SELECT recordId, patientId, direction
FROM ambulance_data_txt_external
WHERE patientId > 25;
```

Expected output:

```
+----------+-----------+-----------+
| recordid | patientid | direction |
+----------+-----------+-----------+
| 1        | 51        | S         |
| 3        | 48        | NE        |
| 4        | 30        | W         |
| 5        | 47        | S         |
| 7        | 53        | N         |
| 8        | 63        | SW        |
| 10       | 31        | N         |
+----------+-----------+-----------+
```

Write data to the table:

```sql
INSERT INTO ambulance_data_txt_external
VALUES (1,16,76,1,'46.81006','-92.08174','9/14/2014 0:10','SW');
```

To verify the write, query the row you inserted or check the OSS directory for a new file:

```sql
SELECT * FROM ambulance_data_txt_external WHERE recordId = 16;
```
FAQ
Why does error ODPS-0123131 occur when reading a DATETIME field using a custom Extractor?
The root cause is java.sql.Date.valueOf(), which only accepts strings in "yyyy-[m]m-[d]d" format. When the field value includes a time component (for example, 2019-11-11 06:43:36), the call fails with IllegalArgumentException.
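The failure can be reproduced with the standard library alone. In this minimal sketch, the `DateValueOfDemo` class name is hypothetical; the behavior shown is that of `java.sql.Date.valueOf` as documented:

```java
// Reproduces the root cause: java.sql.Date.valueOf accepts only
// "yyyy-[m]m-[d]d", so a value carrying a time component fails with
// an IllegalArgumentException.
import java.sql.Date;

class DateValueOfDemo {
    static boolean parses(String s) {
        try {
            Date.valueOf(s);
            return true;
        } catch (IllegalArgumentException e) {
            // NumberFormatException is a subclass of IllegalArgumentException,
            // so this catch covers both documented failure modes.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("2019-11-11"));           // date-only: accepted
        System.out.println(parses("2019-11-11 06:43:36"));  // with time: rejected
    }
}
```

On Java 8 and later, `java.time.LocalDateTime.parse` with `DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")` is a standard-library alternative that needs no extra JARs; the steps below follow the Joda-Time approach used in the original fix.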
Fix this by adding the Joda-Time library and using DateTimeFormat.forPattern() to parse the value:
Add the Joda-Time dependency to your project:
```xml
<dependency>
    <groupId>joda-time</groupId>
    <artifactId>joda-time</artifactId>
    <version>2.10</version>
</dependency>
```

Import the required classes:

```java
import org.joda.time.DateTime;
import org.joda.time.format.DateTimeFormat;
```

Replace the `Date.valueOf()` call with a Joda-Time parse. Note that the minute pattern letter in Joda-Time is `mm`, not `mi`:

```java
record.setDate(index, new Date(DateTime.parse(parts[i],
    DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss")).getMillis()));
```

Upload the updated Extractor JAR and the Joda-Time JAR as MaxCompute resources:

```sql
ADD JAR /path/to/text_extractor-1.0-SNAPSHOT.jar;
ADD JAR /path/to/joda-time-2.10.jar;
```

Upload a test file to OSS. Sample row format:
```
5c661071dba64d5080c91da085ff1073^music-click-fast_forward^26.12.XX.XX^2019-11-11 06:43:36
```

Query the external table to confirm successful parsing:

```sql
SELECT * FROM <project_name>.video_play_log;
```

Expected output:

```
+----------------------------------+--------------------------+-------------+---------------------+
| uuid                             | action                   | ip          | time                |
+----------------------------------+--------------------------+-------------+---------------------+
| 5c661071dba64d5080c91da085ff1073 | music-click-fast_forward | 26.12.XX.XX | 2019-11-11 06:43:36 |
+----------------------------------+--------------------------+-------------+---------------------+
```