This topic describes how to create, read, and write data to OSS external tables in Hudi format.
MaxCompute only supports the external table feature for the published Hudi software development kit (SDK) version. No further version updates or feature enhancements will be provided. Use Paimon external tables to read data in data lake table formats.
Limitations
Before creating a Hudi external table, review the following constraints:
-
OSS external tables do not support the cluster property.
-
The maximum size of a single file is 2 GB. Split files that exceed this limit before reading them.
-
MaxCompute and Object Storage Service (OSS) must be in the same region.
-
Hudi external tables support reading all data from mapped files only. The following operations are not supported:
-
Automatic hiding of system columns
-
Incremental reads
-
Snapshot reads
-
Write operations
-
-
For operations that require atomicity, consistency, isolation, and durability (ACID), use MaxCompute Delta tables or Paimon external tables instead.
-
The default Hudi SDK version integrated into MaxCompute is
org.apache.hudi:hudi-hadoop-mr-bundle:0.12.2-emr-1.0.6. MaxCompute does not guarantee forward or backward compatibility for the Hudi SDK. The open source community is responsible for compatibility. -
For partitioned external tables, you must run
MSCK REPAIR TABLE <table_name> ADD PARTITIONS;after creating the table to register existing OSS partitions with MaxCompute. Queries on a partitioned table return no data until this step is complete.
Supported data types
For the full type reference, see Data types (Version 1.0) and Data types (Version 2.0).
| Data type | Support | Data type | Support |
|---|---|---|---|
| TINYINT | ✓ | STRING | ✗ |
| SMALLINT | ✓ | DATE | ✗ |
| INT | ✓ | DATETIME | ✗ |
| BIGINT | ✓ | TIMESTAMP | ✗ |
| BINARY | ✓ | TIMESTAMP_NTZ | ✗ |
| FLOAT | ✗ | BOOLEAN | ✓ |
| DOUBLE | ✗ | ARRAY | ✓ |
| DECIMAL(precision,scale) | ✓ | MAP | ✗ |
| VARCHAR(n) | ✗ | STRUCT | ✗ |
| CHAR(n) | ✗ | JSON | ✗ |
Create an external table
Syntax
CREATE EXTERNAL TABLE [IF NOT EXISTS] mc_oss_extable_name
(
<col_name> <data_type>,
...
)
[COMMENT <table_comment>]
[PARTITIONED BY (<col_name> <data_type>)]
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS
INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 'oss_location';
The INPUTFORMAT, OUTPUTFORMAT, and ROW FORMAT SERDE values must be used exactly as shown. For descriptions of other parameters, see Basic syntax parameters.
Query data
-
For SELECT syntax, see Read data from OSS.
-
For query plan optimization, see Query optimization.
Example
This example creates a partitioned Hudi external table, registers its partitions, and queries the data.
Prerequisites
Before you begin, ensure that you have:
-
An OSS bucket and folder in the same region as your MaxCompute project. To create a bucket, see Create a bucket. To create a folder, see Manage folders. MaxCompute also supports automatic folder creation in OSS.
-
OSS access permissions granted to an Alibaba Cloud account, a Resource Access Management (RAM) user, or a RAM role. For details, see Authorize access in STS mode for OSS
-
The CreateTable permission in the MaxCompute project. For details, see MaxCompute permissions
Step 1: Upload test data
Log on to the OSS console and upload the Hudi-formatted test data file to the oss-mc-test/Demo_hudi_pt/dt=20250612/ folder. For details, see Upload files to OSS.
Step 2: Create a Hudi external table
Run the following statement to create a partitioned Hudi external table mapped to the OSS location:
CREATE EXTERNAL TABLE vehicle_hudi_pt (
_hoodie_commit_time string,
_hoodie_commit_seqno string,
_hoodie_record_key string,
_hoodie_partition_path string,
_hoodie_file_name STRING,
id STRING,
name STRING
)
PARTITIONED BY (ds STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS
INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/Demo_hudi_pt/';
Step 3: Register partitions
After creating a partitioned external table, register its OSS partitions with MaxCompute. Without this step, queries return no data.
MSCK REPAIR TABLE vehicle_hudi_pt ADD PARTITIONS;
For the full partition syntax, see Syntax for adding partition data to an OSS external table.
Step 4: Query the table
SELECT * FROM vehicle_hudi_pt WHERE ds='20250612';