
MaxCompute: Custom storage handler

Last Updated: Mar 26, 2026

When your data in OSS uses a format that MaxCompute's built-in handlers do not support — such as a proprietary binary format, a custom delimiter, or a domain-specific encoding — a custom storage handler lets you implement the read and write logic yourself. You implement the handler as a MaxCompute user-defined function (UDF) in Java, package it as a JAR, and reference it when creating the external table. MaxCompute then delegates all data reading and writing to your handler.
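
Conceptually, the handler binds two pieces together: an extractor that turns raw OSS bytes into records, and an outputer that does the reverse. The self-contained sketch below mimics that split with hypothetical stand-in types; the real base classes live in com.aliyun.odps.udf and carry richer lifecycle methods (see Develop UDFs).

```java
// Hypothetical, simplified mock of the storage-handler pattern. The real
// Extractor/Outputer/StorageHandler types in the MaxCompute SDK have
// additional methods; this only illustrates the overall structure.
interface Extractor { /* reads raw bytes from OSS and emits records */ }
interface Outputer  { /* receives records and writes raw bytes to OSS */ }

abstract class MockStorageHandler {
    public abstract Class<? extends Extractor> getExtractorClass();
    public abstract Class<? extends Outputer> getOutputerClass();
}

// A concrete handler is a thin binder; the format-specific read/write
// logic lives in the Extractor and Outputer implementations it names.
public class MyFormatHandler extends MockStorageHandler {
    static class MyExtractor implements Extractor {}
    static class MyOutputer  implements Outputer  {}

    @Override public Class<? extends Extractor> getExtractorClass() { return MyExtractor.class; }
    @Override public Class<? extends Outputer>  getOutputerClass()  { return MyOutputer.class;  }

    public static void main(String[] args) {
        MyFormatHandler h = new MyFormatHandler();
        System.out.println(h.getExtractorClass().getSimpleName()); // MyExtractor
    }
}
```

The fully qualified name of the concrete handler class is what you later pass to STORED BY.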

Usage notes

Constraint          Detail
Cluster property    Not supported on OSS external tables.
Single file size    Cannot exceed 2 GB. Split files larger than 2 GB before referencing them.
Data sharding       Disabled by default to avoid correctness issues. Enable it only if your storage handler explicitly supports sharding.

To enable sharding and start multiple mappers, run:

SET odps.sql.unstructured.data.single.file.split.enabled=true;

Create an external table

Syntax

CREATE EXTERNAL TABLE [IF NOT EXISTS] <table_name>
(
  col_name data_type,
  ...
)
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type, ...)]
STORED BY '<storage_handler_class>'
WITH SERDEPROPERTIES (
  ['property_name'='property_value', ...]
)
LOCATION 'oss://<endpoint>/<bucket>/<path>/'
USING '<jar_name>';

For common parameters such as column definitions and partition syntax, see Basic syntax parameters.

Parameters

Parameter              Required  Description
storage_handler_class  Yes       Fully qualified class name of your custom storage handler, implemented as a MaxCompute UDF. See Develop UDFs.
jar_name               Yes       JAR package containing the storage handler code. Add the JAR as a resource to your MaxCompute project before creating the table. See Resource operations.
resource_name          No        Additional JAR packages required when your handler uses a custom SerDe class. Add each JAR as a separate resource to the MaxCompute project. See Resource operations.

Write data

For the data write syntax, see Write data to OSS.

Query and analysis

For SELECT syntax, see Read OSS data. To optimize query plans, see Query optimization.

Example: Create an OSS external table using a custom storage handler

This example maps a MaxCompute external table to the SampleData/ directory in OSS using a text-based custom storage handler. The handler reads pipe-delimited (|) rows from OSS objects and maps them to typed columns.

Prerequisites

Before you begin, ensure that you have:

Procedure

  1. Use MaxCompute Studio to create the following four Java classes. The links point to reference implementations in the Alibaba Cloud Java SDK:

    For Java UDF development guidance, see Develop a UDF.

  2. Use the one-click packaging feature of MaxCompute Studio to package the handler classes into a JAR and upload it as a MaxCompute resource. This example assumes the resource is named javatest-1.0-SNAPSHOT.jar. For packaging and upload steps, see Package, upload, and register.

    If your storage handler has multiple dependencies, package each one separately and upload them as individual MaxCompute resources.
  3. Create the external table:

    CREATE EXTERNAL TABLE ambulance_data_txt_external
    (
      vehicleId INT,
      recordId INT,
      patientId INT,
      calls INT,
      locationLatitute DOUBLE,
      locationLongtitue DOUBLE,
      recordTime STRING,
      direction STRING
    )
    STORED BY 'com.aliyun.odps.udf.example.text.TextStorageHandler'
    WITH SERDEPROPERTIES (
      'delimiter'='|',
      'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
    )
    LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/SampleData/'
    USING 'javatest-1.0-SNAPSHOT.jar';

    The delimiter property defines the column separator used in each row of the OSS objects. It can be any string, not only a single character.

    To verify the table structure after creation, run:

    DESC EXTENDED ambulance_data_txt_external;
  4. Read data from the table:

    SELECT recordId, patientId, direction FROM ambulance_data_txt_external WHERE patientId > 25;

    Expected output:

    +----------+-----------+-----------+
    | recordid | patientid | direction |
    +----------+-----------+-----------+
    | 1        | 51        | S         |
    | 3        | 48        | NE        |
    | 4        | 30        | W         |
    | 5        | 47        | S         |
    | 7        | 53        | N         |
    | 8        | 63        | SW        |
    | 10       | 31        | N         |
    +----------+-----------+-----------+
  5. Write data to the table:

    INSERT INTO ambulance_data_txt_external VALUES (1,16,76,1,46.81006,-92.08174,'9/14/2014 0:10','SW');

    To verify the write, query the row you inserted or check the OSS directory for a new file:

    SELECT * FROM ambulance_data_txt_external WHERE recordId = 16;
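
The per-row conversion the handler performs in the steps above can be sketched as follows. This is a self-contained illustration of splitting one pipe-delimited line and converting a field to the table's declared type (a hypothetical helper, not the SDK's TextExtractor code):

```java
import java.util.regex.Pattern;

public class PipeRowSketch {
    // Split one delimited row and convert the patientId column (index 2 in
    // the ambulance_data_txt_external schema) to INT, as a typed extractor
    // implementation would have to do for each declared column.
    public static int patientId(String line, String delimiter) {
        // Pattern.quote so "|" is treated literally, -1 keeps empty columns
        String[] parts = line.split(Pattern.quote(delimiter), -1);
        return Integer.parseInt(parts[2]);
    }

    public static void main(String[] args) {
        // vehicleId|recordId|patientId|calls|lat|lon|recordTime|direction
        String row = "1|1|51|1|46.81006|-92.08174|9/14/2014 0:10|S";
        System.out.println(patientId(row, "|")); // 51
    }
}
```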

FAQ

Why does error ODPS-0123131 occur when reading a DATETIME field using a custom Extractor?

The root cause is java.sql.Date.valueOf(), which only accepts strings in "yyyy-[m]m-[d]d" format. When the field value includes a time component (for example, 2019-11-11 06:43:36), the call fails with IllegalArgumentException.
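
The failure is easy to reproduce with the standard library alone. The sketch below shows the root cause, independent of any Extractor code:

```java
import java.sql.Date;

public class DateValueOfFailure {
    // Returns true when java.sql.Date.valueOf rejects the input, as it does
    // for any value carrying a time component.
    public static boolean rejects(String s) {
        try {
            Date.valueOf(s);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejects("2019-11-11"));          // false: date-only parses
        System.out.println(rejects("2019-11-11 06:43:36")); // true: time component fails
    }
}
```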

Fix this by adding the Joda-Time library and using DateTimeFormat.forPattern() to parse the value:

  1. Add the Joda-Time dependency to your project:

    <dependency>
      <groupId>joda-time</groupId>
      <artifactId>joda-time</artifactId>
      <version>2.10</version>
    </dependency>
  2. Import the required classes:

    import org.joda.time.DateTime;
    import org.joda.time.format.DateTimeFormat;
  3. Replace the Date.valueOf() call with a Joda-Time parse:

    record.setDate(index, new Date(DateTime.parse(parts[i], DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss")).getMillis()));
  4. Upload the updated Extractor JAR and the Joda-Time JAR as MaxCompute resources:

    ADD JAR /path/to/text_extractor-1.0-SNAPSHOT.jar;
    ADD JAR /path/to/joda-time-2.10.jar;
  5. Upload a test file to OSS. Sample row format:

    5c661071dba64d5080c91da085ff1073^music-click-fast_forward^26.12.XX.XX^2019-11-11 06:43:36
  6. Query the external table to confirm successful parsing:

    SELECT * FROM <project_name>.video_play_log;

    Expected output:

    +----------------------------------+--------------------------+-------------+---------------------+
    | uuid                             | action                   | ip          | time                |
    +----------------------------------+--------------------------+-------------+---------------------+
    | 5c661071dba64d5080c91da085ff1073 | music-click-fast_forward | 26.12.XX.XX | 2019-11-11 06:43:36 |
    +----------------------------------+--------------------------+-------------+---------------------+
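
If adding the Joda-Time dependency is undesirable and your runtime is Java 8 or later, the same parse can be done with the standard java.time API. This is an alternative sketch, not the fix described in the steps above:

```java
import java.sql.Date;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class JavaTimeParseSketch {
    // Pattern matching the sample data: 2019-11-11 06:43:36
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Parse the datetime string and convert it to java.sql.Date, the type
    // that record.setDate(...) expects in the Extractor.
    public static Date toSqlDate(String value) {
        LocalDateTime ldt = LocalDateTime.parse(value, FMT);
        return new Date(ldt.atZone(ZoneId.systemDefault()).toInstant().toEpochMilli());
    }

    public static void main(String[] args) {
        System.out.println(toSqlDate("2019-11-11 06:43:36")); // 2019-11-11
    }
}
```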