All Products
Search
Document Center

MaxCompute:Hudi external tables

Last Updated:Mar 26, 2026

This topic describes how to create, read, and write data to OSS external tables in Hudi format.

Important

MaxCompute only supports the external table feature for the published Hudi software development kit (SDK) version. No further version updates or feature enhancements will be provided. Use Paimon external tables to read data in data lake table formats.

Limitations

Before creating a Hudi external table, review the following constraints:

  • OSS external tables do not support the cluster property.

  • The maximum size of a single file is 2 GB. Split files that exceed this limit before reading them.

  • MaxCompute and Object Storage Service (OSS) must be in the same region.

  • Hudi external tables support reading all data from mapped files only. The following operations are not supported:

    • Automatic hiding of system columns

    • Incremental reads

    • Snapshot reads

    • Write operations

  • For operations that require atomicity, consistency, isolation, and durability (ACID), use MaxCompute Delta tables or Paimon external tables instead.

  • The default Hudi SDK version integrated into MaxCompute is org.apache.hudi:hudi-hadoop-mr-bundle:0.12.2-emr-1.0.6. MaxCompute does not guarantee forward or backward compatibility for the Hudi SDK. The open source community is responsible for compatibility.

  • For partitioned external tables, you must run MSCK REPAIR TABLE <table_name> ADD PARTITIONS; after creating the table to register existing OSS partitions with MaxCompute. Queries on a partitioned table return no data until this step is complete.

Supported data types

For the full type reference, see Data types (Version 1.0) and Data types (Version 2.0).

Data type Support Data type Support
TINYINT STRING
SMALLINT DATE
INT DATETIME
BIGINT TIMESTAMP
BINARY TIMESTAMP_NTZ
FLOAT BOOLEAN
DOUBLE ARRAY
DECIMAL(precision,scale) MAP
VARCHAR(n) STRUCT
CHAR(n) JSON

Create an external table

Syntax

CREATE EXTERNAL TABLE [IF NOT EXISTS] mc_oss_extable_name
(
   <col_name> <data_type>,
  ...
)
[COMMENT <table_comment>]
[PARTITIONED BY (<col_name> <data_type>)]
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS
INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 'oss_location';

The INPUTFORMAT, OUTPUTFORMAT, and ROW FORMAT SERDE values must be used exactly as shown. For descriptions of other parameters, see Basic syntax parameters.

Query data

Example

This example creates a partitioned Hudi external table, registers its partitions, and queries the data.

Prerequisites

Before you begin, ensure that you have:

Step 1: Upload test data

Log on to the OSS console and upload the Hudi-formatted test data file to the oss-mc-test/Demo_hudi_pt/dt=20250612/ folder. For details, see Upload files to OSS.

Step 2: Create a Hudi external table

Run the following statement to create a partitioned Hudi external table mapped to the OSS location:

CREATE EXTERNAL TABLE vehicle_hudi_pt (
  _hoodie_commit_time  string,
  _hoodie_commit_seqno string,
  _hoodie_record_key string,
  _hoodie_partition_path string,
  _hoodie_file_name STRING,
  id STRING,
  name STRING
)
PARTITIONED BY (ds STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS
INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/Demo_hudi_pt/';

Step 3: Register partitions

After creating a partitioned external table, register its OSS partitions with MaxCompute. Without this step, queries return no data.

MSCK REPAIR TABLE vehicle_hudi_pt ADD PARTITIONS;

For the full partition syntax, see Syntax for adding partition data to an OSS external table.

Step 4: Query the table

SELECT * FROM vehicle_hudi_pt WHERE ds='20250612';