This topic describes how to create and read data from Hudi-formatted external tables in Object Storage Service (OSS).
Prerequisites
The Alibaba Cloud account or RAM user is granted permissions to access OSS. Alibaba Cloud accounts, RAM users, and RAM roles can access OSS external tables. For more information about authorization, see STS authorization.
A MaxCompute project is created.
MaxCompute is deployed only in specific regions. To prevent a cross-region data connectivity issue, we recommend that you use a bucket in the same region as your MaxCompute project.
The Alibaba Cloud account or RAM user is granted the CreateTable permission on your project. For more information about table operation permissions, see MaxCompute permissions.
(Optional) An OSS bucket, OSS directories, and OSS data files are prepared. For more information, see Create buckets, Manage directories, and Simple upload.
MaxCompute can automatically create directories in OSS. If an SQL statement involves both external tables and UDFs, you can run that single statement to read from and write to the external tables and to call the UDFs. You can also create OSS directories manually.
Limits
OSS external tables do not support clustering.
A single file cannot exceed 3 GB. If a file exceeds 3 GB, we recommend that you split it into multiple files.
Notes
Hudi external tables can read data only from the files that are mapped to the table. They do not support automatic hiding of system columns, incremental reads, snapshot reads, or write operations. You can use MaxCompute Delta tables or Paimon external tables to implement read and write operations with atomicity, consistency, isolation, and durability (ACID).
The Hudi software development kit (SDK) version that is integrated into MaxCompute by default is org.apache.hudi:hudi-hadoop-mr-bundle:0.12.2-emr-1.0.6. MaxCompute cannot guarantee forward or backward compatibility for the Hudi SDK. The open source community is responsible for ensuring compatibility.
MaxCompute supports only the external table features available in the previously mentioned Hudi SDK version. MaxCompute will not provide version updates or feature enhancements. Use Paimon external tables to read data in data lake formats.
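Because Hudi system columns are not hidden automatically, a query can name only the business columns that it needs. The following is a minimal sketch, assuming the vehicle_hudi_pt table that is created in the Example section of this topic. The column list and partition value are taken from that example.

```sql
-- Select only the business columns so that the _hoodie_* system columns
-- (for example, _hoodie_commit_time) do not appear in the result.
-- Assumes the vehicle_hudi_pt table from the Example section of this topic.
SELECT id, name
FROM vehicle_hudi_pt
WHERE ds = '20250612';
```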
Supported data types
In the following tables, [icon] indicates supported and [icon] indicates not supported.
For more information about MaxCompute data types, see Data types (Edition 1.0) and Data types (Edition 2.0).
Data type | Supported | Data type | Supported |
TINYINT | [icon] | STRING | [icon] |
SMALLINT | [icon] | DATE | [icon] |
INT | [icon] | DATETIME | [icon] |
BIGINT | [icon] | TIMESTAMP | [icon] |
BINARY | [icon] | TIMESTAMP_NTZ | [icon] |
FLOAT | [icon] | BOOLEAN | [icon] |
DOUBLE | [icon] | ARRAY | [icon] |
DECIMAL(precision,scale) | [icon] | MAP | [icon] |
VARCHAR(n) | [icon] | STRUCT | [icon] |
CHAR(n) | [icon] | JSON | [icon] |
Create an external table
Syntax
CREATE EXTERNAL TABLE [IF NOT EXISTS] mc_oss_extable_name
(
<col_name> <data_type>,
...
)
[COMMENT <table_comment>]
[PARTITIONED BY (<col_name> <data_type>)]
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS
INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 'oss_location';
Parameters
For more information, see Basic syntax parameters.
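As an illustration of how the syntax above can be filled in, the following sketch declares a non-partitioned Hudi external table. The table name hudi_demo_np, the <your-bucket> placeholder, and the Demo_hudi_np/ directory are hypothetical and do not come from this topic. The endpoint and the column layout follow the example later in this topic, including the Hudi system columns, which are declared explicitly because they are not hidden automatically.

```sql
-- Hypothetical names: hudi_demo_np, <your-bucket>, and Demo_hudi_np/ are placeholders.
-- Replace them with your own table name, bucket, and directory.
CREATE EXTERNAL TABLE IF NOT EXISTS hudi_demo_np
(
    _hoodie_commit_time    STRING,
    _hoodie_commit_seqno   STRING,
    _hoodie_record_key     STRING,
    _hoodie_partition_path STRING,
    _hoodie_file_name      STRING,
    id                     STRING,
    name                   STRING
)
COMMENT 'Non-partitioned Hudi external table (illustrative)'
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS
INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/<your-bucket>/Demo_hudi_np/';
```

Because this table has no PARTITIONED BY clause, no partition import step is expected before you query it.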
Query analysis
For more information about the SELECT syntax, see Read data from OSS.
For more information about optimizing query plans, see Query optimization.
Example
This example shows how to create an external table for Hudi-formatted data using the built-in open source data parser and then query the data.
Prepare the data.
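Log on to the OSS console and upload the test data file in Hudi format to the oss-mc-test/Demo_hudi_pt/ds=20250612/ directory in an OSS bucket. The directory name uses the partition column name ds so that it matches the partition column of the external table. For more information, see Upload files.
Create a Hudi external table.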
CREATE EXTERNAL TABLE vehicle_hudi_pt
(
_hoodie_commit_time STRING,
_hoodie_commit_seqno STRING,
_hoodie_record_key STRING,
_hoodie_partition_path STRING,
_hoodie_file_name STRING,
id STRING,
name STRING
)
PARTITIONED BY (ds STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS
INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/Demo_hudi_pt/';
Import partition data.
If the OSS external table is a partitioned table, you must import the partition data. For more information, see Syntax for adding partition data to OSS external tables.
MSCK REPAIR TABLE vehicle_hudi_pt ADD PARTITIONS;
Read data from the Hudi external table.
SELECT * FROM vehicle_hudi_pt WHERE ds='20250612';
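After the partition import and the first read, a quick sanity check can confirm that the partition was registered and that rows are readable. The following is a sketch only, assuming the vehicle_hudi_pt table and the ds='20250612' partition from this example. The alias row_cnt is introduced here for illustration.

```sql
-- List the partitions that are registered for the external table.
SHOW PARTITIONS vehicle_hudi_pt;

-- Count the rows that are readable in the imported partition.
-- row_cnt is an illustrative alias, not a name from this topic.
SELECT ds, COUNT(*) AS row_cnt
FROM vehicle_hudi_pt
WHERE ds = '20250612'
GROUP BY ds;
```

If the partition does not appear in the SHOW PARTITIONS output, check that the OSS directory name matches the partition column name and value before you rerun the import statement.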