This topic describes how to create and read OSS external tables in Hudi format.
MaxCompute supports the external table feature only for the published Hudi software development kit (SDK) version. No further version updates or feature enhancements will be provided. Use Paimon external tables to read data in data lake table formats.
Applicability
The cluster property is not supported for OSS external tables.
A single file cannot exceed 3 GB. If a file is larger than 3 GB, you must split it.
MaxCompute and OSS must be deployed in the same region.
Hudi external tables support only full reads of the data in the mapped files. Automatic hiding of system columns, incremental reads, snapshot reads, and write operations are not supported. For read and write operations that require atomicity, consistency, isolation, and durability (ACID), use features such as MaxCompute Delta tables or Paimon external tables.
The default Hudi SDK version integrated into MaxCompute is org.apache.hudi:hudi-hadoop-mr-bundle:0.12.2-emr-1.0.6. MaxCompute does not guarantee forward or backward compatibility for the Hudi SDK. The open source community is responsible for compatibility.
Supported data types
For more information about MaxCompute data types, see Data types (Version 1.0) and Data types (Version 2.0).
Data type | Support | Data type | Support
TINYINT | | STRING |
SMALLINT | | DATE |
INT | | DATETIME |
BIGINT | | TIMESTAMP |
BINARY | | TIMESTAMP_NTZ |
FLOAT | | BOOLEAN |
DOUBLE | | ARRAY |
DECIMAL(precision,scale) | | MAP |
VARCHAR(n) | | STRUCT |
CHAR(n) | | JSON |
Create an external table
Syntax
CREATE EXTERNAL TABLE [IF NOT EXISTS] mc_oss_extable_name
(
<col_name> <data_type>,
...
)
[COMMENT <table_comment>]
[PARTITIONED BY (<col_name> <data_type>)]
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS
INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 'oss_location';
Parameters
For more information, see Basic syntax parameters.
Query and analysis
For more information about the SELECT syntax, see Read data from OSS.
For more information about optimizing a query plan, see Query optimization.
Example scenario
This example creates a Hudi external table by using the built-in open source data parser, maps it to Hudi data uploaded to OSS, and queries the data.
Prepare the data.
Log on to the OSS console and upload the Hudi-formatted test data file to the oss-mc-test/Demo_hudi_pt/dt=20250612/ folder in your OSS bucket. For more information, see Upload files to OSS.
Create a Hudi external table.
CREATE EXTERNAL TABLE vehicle_hudi_pt
(
  _hoodie_commit_time STRING,
  _hoodie_commit_seqno STRING,
  _hoodie_record_key STRING,
  _hoodie_partition_path STRING,
  _hoodie_file_name STRING,
  id STRING,
  name STRING
)
PARTITIONED BY (ds STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS
INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/Demo_hudi_pt/';
Import partition data.
After you create a partitioned OSS external table, you must add its partitions. For more information, see Syntax for adding partition data to an OSS external table.
MSCK REPAIR TABLE vehicle_hudi_pt ADD PARTITIONS;
Read the Hudi external table.
SELECT * FROM vehicle_hudi_pt WHERE ds='20250612';
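Because system columns are not hidden automatically, SELECT * also returns the _hoodie_* metadata columns. The following is a minimal sketch, assuming the vehicle_hudi_pt table from this example and that the partition was registered as ds='20250612'. It first lists the registered partitions and then reads only the business columns id and name.

```sql
-- List the partitions currently registered for the external table.
SHOW PARTITIONS vehicle_hudi_pt;

-- Read only the business columns; the _hoodie_* metadata columns
-- are regular columns of this table and are skipped by naming
-- the required columns explicitly.
SELECT id, name
FROM vehicle_hudi_pt
WHERE ds = '20250612';
```

If SHOW PARTITIONS returns no rows, the partition has not been added and the SELECT statement returns no data.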