Map a Paimon table stored in Object Storage Service (OSS) to a MaxCompute external table and run SQL queries against it directly—without ETL migration.
Apache Paimon is a lake storage format for batch and streaming workloads. It supports high-throughput writes and low-latency queries, and integrates with engines such as Spark, Hive, Trino, Realtime Compute for Apache Flink, and E-MapReduce. By creating a Paimon external table in MaxCompute, you can query Paimon data on OSS directly. Metadata filtering speeds up reads by skipping unnecessary OSS directory files during read operations.
Limitations
Schema changes in the underlying Paimon files are not reflected automatically. Recreate the external table DDL manually when the Paimon schema changes.
Cluster attributes and primary keys cannot be set on Paimon external tables.
Querying historical versions of data (time travel) is not supported.
Do not write data directly into a Paimon external table. To export data to OSS, use UNLOAD or a similar method.
INSERT INTOandINSERT OVERWRITEare supported, but writing to Dynamic Bucket tables and Cross Partition tables is not.UPDATEandDELETEare not supported.MaxCompute and the OSS bucket must be in the same region.
Prerequisites
Before you begin, make sure you have:
A MaxCompute project. See Create a MaxCompute project.
An OSS bucket and directory in the same region as your MaxCompute project. To set these up, see Create a bucket. To create directories inside the bucket, see Manage folders.
OSS access permissions for your Alibaba Cloud account, Resource Access Management (RAM) user, or RAM role. See STS-mode authorization for OSS.
The
CreateTablepermission in your MaxCompute project. See MaxCompute permissions.
Create a Paimon external table
Syntax
CREATE EXTERNAL TABLE [IF NOT EXISTS] <mc_oss_extable_name>
(
<col_name> <data_type>,
...
)
[COMMENT <table_comment>]
[PARTITIONED BY (<col_name> <data_type>, ...)]
STORED BY 'org.apache.paimon.hive.PaimonStorageHandler'
WITH serdeproperties (
'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
)
LOCATION '<oss_location>';For a full description of external table syntax, see OSS external tables. For parameter details, see Basic syntax parameters.
Write data
INSERT INTO and INSERT OVERWRITE are supported. For the write syntax, see Write syntax.
Writing to Dynamic Bucket tables and Cross Partition tables is not supported.
Query data
Paimon uses its own internal file organization and sharding mechanism (Split), which does not fully align with MaxCompute native table parameters. Metadata filtering optimizes query performance by skipping unnecessary OSS directory files during read operations. For the SELECT syntax, see Query syntax. To optimize query plans, see Query optimization. For BadRowSkipping configuration, see BadRowSkipping.
Example: query Paimon data from MaxCompute
This example walks through the full flow: create a Paimon table in Flink, write data to OSS, create a Paimon external table in MaxCompute, and query the results.
Step 1: Prepare Paimon data in Flink
If you already have Paimon data in OSS, skip this step.
1.1 Create a Paimon filesystem catalog
Log on to the Flink console. In the upper-left corner, select a region.
Click the target workspace name. In the left navigation pane, select Catalogs.
On the Catalog List page, click Create Catalog. In the Create Catalog dialog box, select Apache Paimon, click Next, and configure the following parameters:
Parameter Required Description metastore Yes Metastore type. Use filesystem.catalog name Yes A custom name, such as paimon-catalog.warehouse Yes The OSS warehouse directory, such as oss://paimon-fs/paimon-test/.fs.oss.endpoint Yes The OSS endpoint. For example, oss-cn-hangzhou-internal.aliyuncs.comfor the China (Hangzhou) region.fs.oss.accessKeyId Yes The AccessKey ID for OSS access. fs.oss.accessKeySecret Yes The AccessKey Secret for OSS access.
1.2 Create a Paimon table and insert data
Log on to the Flink console. Select a region, then click the target workspace name.
In the left navigation pane, select Development > Scripts.
On the New Script tab, click
to create a query script.Run the following SQL:
CREATE TABLE `paimon_catalog`.`default`.test_tbl ( id BIGINT, data STRING, dt STRING, PRIMARY KEY (dt, id) NOT ENFORCED ) PARTITIONED BY (dt); INSERT INTO `paimon-catalog`.`default`.test_tbl VALUES (1,'CCC','2024-07-18'), (2,'DDD','2024-07-18');
If the job is rate-limited when running INSERT INTO ... VALUES ...:
In the left navigation pane, select O&M > Deployments.
On the Deployments page, click the target job to open its Configuration page.
In the Runtime Parameter Settings area, click Edit. In the Other Configuration section, add:
execution.checkpointing.checkpoints-after-tasks-finish.enabled: trueFor details, see Configure job deployment information.
Step 2: Create a Paimon external table in MaxCompute
Run the following SQL in MaxCompute. Replace <uid> with your Alibaba Cloud account ID, and replace <table_path> with the path to your Paimon table directory in OSS (without the oss:// prefix).
CREATE EXTERNAL TABLE oss_extable_paimon_pt
(
id BIGINT,
data STRING
)
PARTITIONED BY (dt STRING)
STORED BY 'org.apache.paimon.hive.PaimonStorageHandler'
WITH serdeproperties (
'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
)
LOCATION 'oss://oss-cn-<your region>-internal.aliyuncs.com/<table_path>';Find the table path
<table_path> is the path of the Paimon table in OSS, for example paimon-fs/paimon-test/default.db/test_tbl. To find it:
In the Flink console, select the target workspace and go to Catalogs.
On the Metadata page, click default under the target catalog, then click View for the target table.
On the Table Schema tab, in the Properties area, copy the value of the path parameter. Use only the part after
oss://.
Step 3: Load partition data
Because oss_extable_paimon_pt is a partitioned table, run the following command to discover and register existing partitions:
MSCK REPAIR TABLE oss_extable_paimon_pt ADD PARTITIONS;For the full syntax, see Load partition data for OSS external tables.
Step 4: Query the Paimon external table
The following two SET statements enable Hive-compatible query execution, which is required for Paimon external tables.
SET odps.sql.common.table.planner.ext.hive.bridge = true;
SET odps.sql.hive.compatible = true;
SELECT * FROM oss_extable_paimon_pt WHERE dt='2024-07-18';Expected output:
+------------+------------+------------+
| id | data | dt |
+------------+------------+------------+
| 1 | CCC | 2024-07-18 |
| 2 | DDD | 2024-07-18 |
+------------+------------+------------+Schema mismatch behavior
If the Paimon file schema differs from the external table DDL:
Fewer columns in the file than in the DDL: Missing values are filled with
NULL.More columns in the file than in the DDL: Extra columns are discarded.
Data type mismatch: Reading
STRINGdata into anINTcolumn is not supported. ReadingINTdata into aSTRINGcolumn is supported but not recommended.
Supported data types
For MaxCompute data type details, see Data type edition 1.0. If you use the newer type system, see Data type edition 2.0.
| Paimon data type | MaxCompute 2.0 data type | Read/write support | Notes |
|---|---|---|---|
| TINYINT | TINYINT | Supported | 8-bit signed integer |
| SMALLINT | SMALLINT | Supported | 16-bit signed integer |
| INT | INT | Supported | 32-bit signed integer |
| BIGINT | BIGINT | Supported | 64-bit signed integer |
| BINARY(MAX_LENGTH) | BINARY | Supported | Maximum length: 8 MB |
| FLOAT | FLOAT | Supported | 32-bit binary floating-point |
| DOUBLE | DOUBLE | Supported | 64-bit binary floating-point |
| DECIMAL(precision,scale) | DECIMAL(precision,scale) | Supported | Default: decimal(38,18). Precision: 1–38. Scale: 0–18. |
| VARCHAR(n) | VARCHAR(n) | Supported | n ranges from 1 to 65535 |
| CHAR(n) | CHAR(n) | Supported | n ranges from 1 to 255 |
| VARCHAR(MAX_LENGTH) | STRING | Supported | Maximum length: 8 MB |
| DATE | DATE | Supported | Format: yyyy-mm-dd |
| TIME, TIME(p) | Not supported | Not supported | MaxCompute has no corresponding type. The Paimon TIME type represents time without time zone, composed of hours, minutes, and seconds, with nanosecond precision. TIME(p) specifies fractional-second precision from 0 to 9 (default is 0). |
| TIMESTAMP, TIMESTAMP(p) | TIMESTAMP_NTZ | Supported | Timestamp without time zone, precise to nanoseconds. Before reading, run: SET odps.sql.common.table.jni.disable.native=true; |
| TIMESTAMP WITH LOCAL TIME_ZONE(9) | TIMESTAMP | Supported | Format: yyyy-mm-dd hh:mm:ss.xxxxxxxxx. Precision truncation applies: 0–3 → 3 digits, 4–6 → 6 digits, 7–9 → 9 digits. |
| TIMESTAMP WITH LOCAL TIME_ZONE(9) | DATETIME | Not supported | Timestamp type precise to nanoseconds. Format: yyyy-mm-dd hh:mm:ss.xxxxxxxxx |
| BOOLEAN | BOOLEAN | Supported | — |
| ARRAY | ARRAY | Supported | Complex type |
| MAP | MAP | Supported | Complex type |
| ROW | STRUCT | Supported | Complex type |
| MULTISET\<t\> | Not supported | Not supported | MaxCompute has no corresponding type. |
| VARBINARY, VARBINARY(n), BYTES | BINARY | Supported | Variable-length binary string |
Troubleshooting
kSIGABRT error when reading a Paimon external table
Error message:
ODPS-0123144: Fuxi job failed - kSIGABRT(errCode:6) at Odps/*****_SQL_0_1_0_job_0/M1@f01b17437.cloud.eo166#3.
Detail error msg: CRASH_CORE, maybe caused by jvm crash, please check your java udf/udaf/udtf.
| fatalInstance: Odps/*****_SQL_0_1_0_job_0/M1#0_0This error occurs when reading TIMESTAMP_NTZ data in JNI mode. Before querying the table, run:
SET odps.sql.common.table.jni.disable.native=true;What's next
You can also create a MaxCompute Paimon external table directly from Flink as a custom catalog, write data to it, and then query and consume the Paimon data from MaxCompute. See Create a MaxCompute Paimon external table based on Flink.