Delta Lake external tables in OSS let you query Delta Lake format data in place, without moving or converting it. This topic explains how to create, configure, and query these tables.
MaxCompute supports Delta Lake external tables only with a fixed version of the Delta Lake software development kit (SDK) and will not update the SDK or develop this feature further. For new data lake workloads, use Paimon external tables instead.
Limitations
Before creating a Delta Lake external table, review the following constraints:
| Constraint | Detail | Impact |
|---|---|---|
| Read-only, full scan | Supports only full reads of all data mapped to the table. Automatic hiding of system columns, incremental reads, snapshot reads, and write operations are not supported. | For ACID operations, use MaxCompute Delta tables or Paimon external tables. |
| No cluster property | OSS external tables do not support the cluster property. | Omit the CLUSTERED BY clause when writing DDL statements. |
| 2 GB per-file limit | A single file cannot exceed 2 GB. | Split files that exceed this limit before creating the table. |
| Same-region requirement | MaxCompute and the OSS bucket must be in the same region. | Cross-region access is not supported and will cause query failures. |
| Fixed SDK version | The Delta Lake SDK integrated into MaxCompute is io.delta:delta-core_2.11:0.2.0.5. Forward and backward compatibility is not guaranteed; compatibility is maintained by the open source community. | Use Delta Lake data written with a compatible SDK version. |
Supported data types
All of the following MaxCompute data types are supported. For type details, see Data types (version 1.0) and Data types (version 2.0).
| Data type | Supported | Data type | Supported |
|---|---|---|---|
| TINYINT | Yes | STRING | Yes |
| SMALLINT | Yes | DATE | Yes |
| INT | Yes | DATETIME | Yes |
| BIGINT | Yes | TIMESTAMP | Yes |
| BINARY | Yes | TIMESTAMP_NTZ | Yes |
| FLOAT | Yes | BOOLEAN | Yes |
| DOUBLE | Yes | ARRAY | Yes |
| DECIMAL(precision,scale) | Yes | MAP | Yes |
| VARCHAR(n) | Yes | STRUCT | Yes |
| CHAR(n) | Yes | JSON | Yes |
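As a sketch of how these types look in DDL, the hypothetical table below declares a mix of primitive and complex columns. The table name, column names, and OSS path are placeholders for illustration, not values from the official examples:

```sql
-- Hypothetical sketch only: table name, columns, and LOCATION are placeholders.
CREATE EXTERNAL TABLE IF NOT EXISTS demo_delta_types
(
    id       BIGINT,
    price    DECIMAL(18, 4),
    tags     ARRAY<STRING>,
    attrs    MAP<STRING, STRING>,
    geo      STRUCT<lat: DOUBLE, lon: DOUBLE>
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS
INPUTFORMAT 'io.delta.hive.DeltaInputFormat'
OUTPUTFORMAT 'io.delta.hive.DeltaOutputFormat'
LOCATION 'oss://<bucket>/<folder>/';
```

Complex types such as ARRAY, MAP, and STRUCT require the version 2.0 data type edition; see Data types (version 2.0).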
Create an external table
Syntax
CREATE EXTERNAL TABLE [IF NOT EXISTS] <mc_oss_extable_name>
(
    <col_name> <data_type>,
    ...
)
[COMMENT <table_comment>]
[PARTITIONED BY (<col_name> <data_type>, ...)]
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS
INPUTFORMAT 'io.delta.hive.DeltaInputFormat'
OUTPUTFORMAT 'io.delta.hive.DeltaOutputFormat'
LOCATION '<oss_location>';
DeltaInputFormat and DeltaOutputFormat are Delta Lake's Hive-compatible input/output classes. ParquetHiveSerDe handles the underlying Parquet serialization, since Delta Lake stores data in Parquet format. You must specify all three to register the table correctly.
For parameter details, see Basic syntax parameters.
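As a minimal instantiation of the syntax above, the hedged sketch below creates a non-partitioned table; the table name, column list, endpoint, bucket, and folder are placeholders, not values from the official examples:

```sql
-- Minimal non-partitioned sketch; endpoint, bucket, and folder are placeholders.
CREATE EXTERNAL TABLE IF NOT EXISTS mc_oss_delta_simple
(
    id   INT,
    name STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS
INPUTFORMAT 'io.delta.hive.DeltaInputFormat'
OUTPUTFORMAT 'io.delta.hive.DeltaOutputFormat'
LOCATION 'oss://<endpoint>/<bucket>/<folder>/';
```

Note that CLUSTERED BY is omitted, per the cluster-property limitation above, and the OSS path must point to the Delta Lake table's root folder.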
Query data
-
For SELECT syntax, see Read data from OSS.
-
For query plan optimization, see Query optimization.
Example: create and query a partitioned Delta Lake external table
This example walks through the full workflow: uploading Delta Lake data to OSS, creating a partitioned external table in MaxCompute, importing partition metadata, and running a query.
Prerequisites
Before you begin, ensure that you have:
-
An OSS bucket in the same region as your MaxCompute project. See Create a bucket and Manage folders.
-
OSS access permission via an Alibaba Cloud account, a Resource Access Management (RAM) user, or a RAM role. See Authorize access in STS mode for OSS.
-
The CreateTable permission in the MaxCompute project. See MaxCompute permissions.
Steps
-
Upload test data. Log in to the OSS console and upload a test data file in Delta Lake format to the oss-mc-test/Demo_delta_pt/dt=20250612/ folder. For details, see Upload files to OSS.
MaxCompute creates OSS folders automatically when SQL statements reference external tables that include user-defined functions (UDFs). For other cases, create the folder manually.
-
Create a Delta Lake external table.
CREATE EXTERNAL TABLE test_delta_pt
(
    id INT
)
PARTITIONED BY (dt STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS
INPUTFORMAT 'io.delta.hive.DeltaInputFormat'
OUTPUTFORMAT 'io.delta.hive.DeltaOutputFormat'
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/Demo_delta_pt/';
-
Import partition metadata. For partitioned tables, run MSCK REPAIR TABLE to detect and register partitions from the OSS path. For details, see Populate partitions for an OSS external table.
MSCK REPAIR TABLE test_delta_pt ADD PARTITIONS;
-
Query the table.
SELECT * FROM test_delta_pt WHERE dt='20250612';
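If the query returns no rows, one quick check is whether the MSCK step actually registered the partition. SHOW PARTITIONS is regular MaxCompute SQL; the statement below assumes the table created in this example:

```sql
-- Verify that MSCK REPAIR TABLE registered the expected partition.
-- If the import succeeded, the output should include dt=20250612.
SHOW PARTITIONS test_delta_pt;
```

If the partition is missing, re-check the OSS path in the LOCATION clause and rerun the MSCK REPAIR TABLE statement.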