
MaxCompute:Hudi external tables

Last Updated: Jan 01, 2026

This topic describes how to create OSS external tables in the Hudi format and read data from them.

Important

MaxCompute supports the external table feature only for the published Hudi software development kit (SDK) version. No further version updates or feature enhancements will be provided. To read data in data lake table formats, use Paimon external tables instead.

Limits

  • The cluster property is not supported for OSS external tables.

  • A single file cannot exceed 3 GB. If a file is larger than 3 GB, you must split it.

  • MaxCompute and OSS must be deployed in the same region.

  • Hudi external tables support only reading all data from the mapped files. Automatic hiding of system columns, incremental reads, snapshot reads, and write operations are not supported. For read and write operations that require atomicity, consistency, isolation, and durability (ACID), use features such as MaxCompute Delta tables or Paimon external tables.

  • The default Hudi SDK version integrated into MaxCompute is org.apache.hudi:hudi-hadoop-mr-bundle:0.12.2-emr-1.0.6. MaxCompute does not guarantee forward or backward compatibility for the Hudi SDK. The open source community is responsible for compatibility.

Supported data types

For more information about MaxCompute data types, see Data types (Version 1.0) and Data types (Version 2.0).

| Data type | Support | Data type | Support |
| --- | --- | --- | --- |
| TINYINT | Supported | STRING | Not supported |
| SMALLINT | Supported | DATE | Not supported |
| INT | Supported | DATETIME | Not supported |
| BIGINT | Supported | TIMESTAMP | Not supported |
| BINARY | Supported | TIMESTAMP_NTZ | Not supported |
| FLOAT | Not supported | BOOLEAN | Supported |
| DOUBLE | Not supported | ARRAY | Supported |
| DECIMAL(precision,scale) | Supported | MAP | Not supported |
| VARCHAR(n) | Not supported | STRUCT | Not supported |
| CHAR(n) | Not supported | JSON | Not supported |

Create an external table

Syntax

CREATE EXTERNAL TABLE [IF NOT EXISTS] mc_oss_extable_name
(
   <col_name> <data_type>,
  ...
)
[COMMENT <table_comment>]
[PARTITIONED BY (<col_name> <data_type>)]
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS 
INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 'oss_location';

Parameters

For more information, see Basic syntax parameters.
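
As a minimal sketch of the syntax above, a DDL statement for a non-partitioned Hudi external table might look like the following. The bucket (oss-mc-test), directory (hudi_demo/), region endpoint, and column names are hypothetical placeholders, not values from an actual environment:

```sql
-- Sketch only: replace the bucket, path, and endpoint with your own values.
CREATE EXTERNAL TABLE IF NOT EXISTS mc_oss_hudi_demo
(
  id   BIGINT,
  name STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS
INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/hudi_demo/';
```

Because the table is not partitioned, no PARTITIONED BY clause or partition import step is needed; the table maps directly to all Hudi data files under the LOCATION directory.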

Query and analysis

Example scenario

Use the built-in open source data parser to create a Hudi external table that maps to test data in OSS, and then query the data.

  1. Upload test data.

    Log on to the OSS console and upload the Hudi-formatted test data file to the oss-mc-test/Demo_hudi_pt/ds=20250612/ folder in the OSS bucket. For more information, see Upload files to OSS.

  2. Create a Hudi external table.

    CREATE EXTERNAL TABLE vehicle_hudi_pt (
      _hoodie_commit_time STRING,
      _hoodie_commit_seqno STRING,
      _hoodie_record_key STRING,
      _hoodie_partition_path STRING,
      _hoodie_file_name STRING,
      id STRING,
      name STRING
    )
    PARTITIONED BY (ds STRING)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
    STORED AS
    INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
    LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/Demo_hudi_pt/';
  3. Import partition data. After you create a partitioned OSS external table, you must add its partitions. For more information, see Syntax for adding partition data to an OSS external table.

    MSCK REPAIR TABLE vehicle_hudi_pt ADD PARTITIONS;
  4. Read the Hudi external table.

    SELECT * FROM vehicle_hudi_pt WHERE ds='20250612';
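
To confirm that the partition was registered and that the Hudi system columns are readable, queries along the following lines can be used. This is a sketch that assumes the vehicle_hudi_pt table and the ds='20250612' partition created above:

```sql
-- List the partitions registered by MSCK REPAIR TABLE.
SHOW PARTITIONS vehicle_hudi_pt;

-- Read the Hudi system columns alongside the data columns.
SELECT _hoodie_commit_time, _hoodie_record_key, id, name
FROM vehicle_hudi_pt
WHERE ds = '20250612'
LIMIT 10;
```

If SHOW PARTITIONS returns no rows, verify that the OSS directory name matches the partition column (ds=20250612) before re-running the repair statement.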