MaxCompute: Hudi external tables

Last Updated: Aug 14, 2025

This topic describes how to create Hudi external tables that map to data in Object Storage Service (OSS) and how to read data from them.

Prerequisites

  • The Alibaba Cloud account or RAM user is granted permissions to access OSS. Alibaba Cloud accounts, RAM users, and RAM roles can access OSS external tables. For more information about authorization, see STS authorization.

  • A MaxCompute project is created.

    MaxCompute is deployed only in specific regions. To prevent a cross-region data connectivity issue, we recommend that you use a bucket in the same region as your MaxCompute project.
  • The Alibaba Cloud account or RAM user is granted the CreateTable permission on your project. For more information about table operation permissions, see MaxCompute permissions.

  • (Optional) An OSS bucket, OSS directories, and OSS data files are prepared. For more information, see Create buckets, Manage directories, and Simple upload.

    MaxCompute can create OSS directories automatically. If an SQL statement involves both an external table and a UDF, that single statement can read from or write to the external table and call the UDF. You can still create the OSS directory manually.

Limits

  • OSS external tables do not support clustering.

  • A single file cannot exceed 3 GB. If a file exceeds this limit, we recommend that you split it into multiple files.

Notes

  • Hudi external tables can read data only from the files that are mapped to the table. They do not support automatic hiding of system columns, incremental reads, snapshot reads, or write operations. You can use MaxCompute Delta tables or Paimon external tables to implement read and write operations with atomicity, consistency, isolation, and durability (ACID).

  • The Hudi software development kit (SDK) version that is integrated into MaxCompute by default is org.apache.hudi:hudi-hadoop-mr-bundle:0.12.2-emr-1.0.6. MaxCompute cannot guarantee forward or backward compatibility for the Hudi SDK. The open source community is responsible for ensuring compatibility.

  • MaxCompute supports only the external table features available in the Hudi SDK version listed above. MaxCompute will not provide version updates or feature enhancements for Hudi external tables. To read data in data lake formats, we recommend that you use Paimon external tables.

Supported data types

Note

In the following table, Yes indicates that a data type is supported and No indicates that it is not supported.

For more information about MaxCompute data types, see Data types (Edition 1.0) and Data types (Edition 2.0).

| Data type | Supported | Data type | Supported |
| --- | --- | --- | --- |
| TINYINT | Yes | STRING | No |
| SMALLINT | Yes | DATE | No |
| INT | Yes | DATETIME | No |
| BIGINT | Yes | TIMESTAMP | No |
| BINARY | Yes | TIMESTAMP_NTZ | No |
| FLOAT | No | BOOLEAN | Yes |
| DOUBLE | No | ARRAY | Yes |
| DECIMAL(precision,scale) | Yes | MAP | No |
| VARCHAR(n) | No | STRUCT | No |
| CHAR(n) | No | JSON | No |

Create an external table

Syntax

CREATE EXTERNAL TABLE [IF NOT EXISTS] <mc_oss_extable_name>
(
  <col_name> <data_type>,
  ...
)
[COMMENT <table_comment>]
[PARTITIONED BY (<col_name> <data_type>)]
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS
INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION '<oss_location>';
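
As an illustration of how the optional clauses fit together, the following sketch fills in the syntax for a non-partitioned table. The table name vehicle_hudi, the comment text, and the Demo_hudi/ directory are hypothetical. The endpoint, bucket, and column layout are borrowed from the example later in this topic. Treat it as a starting point under those assumptions, not as a verified statement for your environment.

```sql
-- Hypothetical non-partitioned Hudi external table.
-- vehicle_hudi and Demo_hudi/ are illustrative names, not taken from this topic.
CREATE EXTERNAL TABLE IF NOT EXISTS vehicle_hudi
(
  _hoodie_commit_time    STRING,
  _hoodie_commit_seqno   STRING,
  _hoodie_record_key     STRING,
  _hoodie_partition_path STRING,
  _hoodie_file_name      STRING,
  id                     STRING,
  name                   STRING
)
COMMENT 'Read-only mapping of Hudi files in OSS'
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS
INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/Demo_hudi/';
```

Because the table is not partitioned, no partition needs to be added before you query it. IF NOT EXISTS makes the statement safe to rerun.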

Parameters

For more information, see Basic syntax parameters.

Query analysis

Example

This example shows how to create an external table for Hudi-formatted data using the built-in open source data parser and then query the data.

  1. Prepare the data.

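    Log on to the OSS console and upload the Hudi-formatted test data file to the oss-mc-test/Demo_hudi_pt/ds=20250612/ directory in an OSS bucket. The partition directory name uses the partition column ds that is declared in the table in the next step. For more information, see Upload files.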

  2. Create a Hudi external table.

    CREATE EXTERNAL TABLE vehicle_hudi_pt (
      -- Hudi metadata columns
      _hoodie_commit_time    STRING,
      _hoodie_commit_seqno   STRING,
      _hoodie_record_key     STRING,
      _hoodie_partition_path STRING,
      _hoodie_file_name      STRING,
      -- Business columns
      id                     STRING,
      name                   STRING
    )
    PARTITIONED BY (ds STRING)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
    STORED AS
    INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
    LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/Demo_hudi_pt/';
  3. Import partition data. If the OSS external table is partitioned, you must add its partitions to the table before you can query the data. For more information, see Syntax for adding partition data to OSS external tables.

    MSCK REPAIR TABLE vehicle_hudi_pt ADD PARTITIONS;
  4. Read data from the Hudi external table.

    SELECT * FROM vehicle_hudi_pt WHERE ds='20250612';
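
The statements below sketch a few optional follow-ups to the steps above. They assume that the partition directory in OSS is named in the <partition column>=<value> form and that it sits directly under the table location. SHOW PARTITIONS confirms that the partition was registered. The ALTER TABLE statement is a manual alternative to MSCK REPAIR TABLE for a single partition. Because system columns are not hidden automatically, SELECT * also returns the _hoodie_* columns, so the last query lists only the business columns.

```sql
-- Check which partitions are registered for the external table.
SHOW PARTITIONS vehicle_hudi_pt;

-- Manual alternative to MSCK REPAIR TABLE: register one partition explicitly.
-- Assumes the data is in <table location>/ds=20250612/.
ALTER TABLE vehicle_hudi_pt ADD IF NOT EXISTS PARTITION (ds = '20250612');

-- Read only the business columns and skip the Hudi metadata columns.
SELECT id, name
FROM vehicle_hudi_pt
WHERE ds = '20250612';
```

Filtering on ds limits the query to the files of one partition, and naming the columns keeps the result free of Hudi bookkeeping fields.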