All Products
Search
Document Center

MaxCompute:Paimon external tables

Last Updated:Mar 26, 2026

Map a Paimon table stored in Object Storage Service (OSS) to a MaxCompute external table and run SQL queries against it directly—without ETL migration.

Apache Paimon is a lake storage format for batch and streaming workloads. It supports high-throughput writes and low-latency queries, and integrates with engines such as Spark, Hive, Trino, Realtime Compute for Apache Flink, and E-MapReduce. By creating a Paimon external table in MaxCompute, you can query Paimon data on OSS directly. Metadata filtering speeds up reads by skipping unnecessary OSS directory files during read operations.

Limitations

  • Schema changes in the underlying Paimon files are not reflected automatically. Recreate the external table DDL manually when the Paimon schema changes.

  • Cluster attributes and primary keys cannot be set on Paimon external tables.

  • Querying historical versions of data (time travel) is not supported.

  • Do not write data directly into a Paimon external table. To export data to OSS, use UNLOAD or a similar method.

  • INSERT INTO and INSERT OVERWRITE are supported, but writing to Dynamic Bucket tables and Cross Partition tables is not.

  • UPDATE and DELETE are not supported.

  • MaxCompute and the OSS bucket must be in the same region.

Prerequisites

Before you begin, make sure you have:

Create a Paimon external table

Syntax

CREATE EXTERNAL TABLE [IF NOT EXISTS] <mc_oss_extable_name>
(
  <col_name> <data_type>,
  ...
)
[COMMENT <table_comment>]
[PARTITIONED BY (<col_name> <data_type>, ...)]
STORED BY 'org.apache.paimon.hive.PaimonStorageHandler'
WITH serdeproperties (
  'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
)
LOCATION '<oss_location>';

For a full description of external table syntax, see OSS external tables. For parameter details, see Basic syntax parameters.

Write data

INSERT INTO and INSERT OVERWRITE are supported. For the write syntax, see Write syntax.

Writing to Dynamic Bucket tables and Cross Partition tables is not supported.

Query data

Paimon uses its own internal file organization and sharding mechanism (Split), which does not fully align with MaxCompute native table parameters. Metadata filtering optimizes query performance by skipping unnecessary OSS directory files during read operations. For the SELECT syntax, see Query syntax. To optimize query plans, see Query optimization. For BadRowSkipping configuration, see BadRowSkipping.

Example: query Paimon data from MaxCompute

This example walks through the full flow: create a Paimon table in Flink, write data to OSS, create a Paimon external table in MaxCompute, and query the results.

Step 1: Prepare Paimon data in Flink

If you already have Paimon data in OSS, skip this step.

1.1 Create a Paimon filesystem catalog

  1. Log on to the Flink console. In the upper-left corner, select a region.

  2. Click the target workspace name. In the left navigation pane, select Catalogs.

  3. On the Catalog List page, click Create Catalog. In the Create Catalog dialog box, select Apache Paimon, click Next, and configure the following parameters:

    ParameterRequiredDescription
    metastoreYesMetastore type. Use filesystem.
    catalog nameYesA custom name, such as paimon-catalog.
    warehouseYesThe OSS warehouse directory, such as oss://paimon-fs/paimon-test/.
    fs.oss.endpointYesThe OSS endpoint. For example, oss-cn-hangzhou-internal.aliyuncs.com for the China (Hangzhou) region.
    fs.oss.accessKeyIdYesThe AccessKey ID for OSS access.
    fs.oss.accessKeySecretYesThe AccessKey Secret for OSS access.

1.2 Create a Paimon table and insert data

  1. Log on to the Flink console. Select a region, then click the target workspace name.

  2. In the left navigation pane, select Development > Scripts.

  3. On the New Script tab, click image to create a query script.

  4. Run the following SQL:

    CREATE TABLE `paimon_catalog`.`default`.test_tbl (
        id BIGINT,
        data STRING,
        dt STRING,
        PRIMARY KEY (dt, id) NOT ENFORCED
    ) PARTITIONED BY (dt);
    
    INSERT INTO `paimon-catalog`.`default`.test_tbl VALUES (1,'CCC','2024-07-18'), (2,'DDD','2024-07-18');

If the job is rate-limited when running INSERT INTO ... VALUES ...:

  1. In the left navigation pane, select O&M > Deployments.

  2. On the Deployments page, click the target job to open its Configuration page.

  3. In the Runtime Parameter Settings area, click Edit. In the Other Configuration section, add:

    execution.checkpointing.checkpoints-after-tasks-finish.enabled: true

    For details, see Configure job deployment information.

Step 2: Create a Paimon external table in MaxCompute

Run the following SQL in MaxCompute. Replace <uid> with your Alibaba Cloud account ID, and replace <table_path> with the path to your Paimon table directory in OSS (without the oss:// prefix).

CREATE EXTERNAL TABLE oss_extable_paimon_pt
(
    id BIGINT,
    data STRING
)
PARTITIONED BY (dt STRING)
STORED BY 'org.apache.paimon.hive.PaimonStorageHandler'
WITH serdeproperties (
    'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
)
LOCATION 'oss://oss-cn-<your region>-internal.aliyuncs.com/<table_path>';

Find the table path

<table_path> is the path of the Paimon table in OSS, for example paimon-fs/paimon-test/default.db/test_tbl. To find it:

  1. In the Flink console, select the target workspace and go to Catalogs.

  2. On the Metadata page, click default under the target catalog, then click View for the target table.

  3. On the Table Schema tab, in the Properties area, copy the value of the path parameter. Use only the part after oss://.

Step 3: Load partition data

Because oss_extable_paimon_pt is a partitioned table, run the following command to discover and register existing partitions:

MSCK REPAIR TABLE oss_extable_paimon_pt ADD PARTITIONS;

For the full syntax, see Load partition data for OSS external tables.

Step 4: Query the Paimon external table

The following two SET statements enable Hive-compatible query execution, which is required for Paimon external tables.

SET odps.sql.common.table.planner.ext.hive.bridge = true;
SET odps.sql.hive.compatible = true;
SELECT * FROM oss_extable_paimon_pt WHERE dt='2024-07-18';

Expected output:

+------------+------------+------------+
| id         | data       | dt         |
+------------+------------+------------+
| 1          | CCC        | 2024-07-18 |
| 2          | DDD        | 2024-07-18 |
+------------+------------+------------+

Schema mismatch behavior

If the Paimon file schema differs from the external table DDL:

  • Fewer columns in the file than in the DDL: Missing values are filled with NULL.

  • More columns in the file than in the DDL: Extra columns are discarded.

  • Data type mismatch: Reading STRING data into an INT column is not supported. Reading INT data into a STRING column is supported but not recommended.

Supported data types

For MaxCompute data type details, see Data type edition 1.0. If you use the newer type system, see Data type edition 2.0.

Paimon data typeMaxCompute 2.0 data typeRead/write supportNotes
TINYINTTINYINTSupported8-bit signed integer
SMALLINTSMALLINTSupported16-bit signed integer
INTINTSupported32-bit signed integer
BIGINTBIGINTSupported64-bit signed integer
BINARY(MAX_LENGTH)BINARYSupportedMaximum length: 8 MB
FLOATFLOATSupported32-bit binary floating-point
DOUBLEDOUBLESupported64-bit binary floating-point
DECIMAL(precision,scale)DECIMAL(precision,scale)SupportedDefault: decimal(38,18). Precision: 1–38. Scale: 0–18.
VARCHAR(n)VARCHAR(n)Supportedn ranges from 1 to 65535
CHAR(n)CHAR(n)Supportedn ranges from 1 to 255
VARCHAR(MAX_LENGTH)STRINGSupportedMaximum length: 8 MB
DATEDATESupportedFormat: yyyy-mm-dd
TIME, TIME(p)Not supportedNot supportedMaxCompute has no corresponding type. The Paimon TIME type represents time without time zone, composed of hours, minutes, and seconds, with nanosecond precision. TIME(p) specifies fractional-second precision from 0 to 9 (default is 0).
TIMESTAMP, TIMESTAMP(p)TIMESTAMP_NTZSupportedTimestamp without time zone, precise to nanoseconds. Before reading, run: SET odps.sql.common.table.jni.disable.native=true;
TIMESTAMP WITH LOCAL TIME_ZONE(9)TIMESTAMPSupportedFormat: yyyy-mm-dd hh:mm:ss.xxxxxxxxx. Precision truncation applies: 0–3 → 3 digits, 4–6 → 6 digits, 7–9 → 9 digits.
TIMESTAMP WITH LOCAL TIME_ZONE(9)DATETIMENot supportedTimestamp type precise to nanoseconds. Format: yyyy-mm-dd hh:mm:ss.xxxxxxxxx
BOOLEANBOOLEANSupported
ARRAYARRAYSupportedComplex type
MAPMAPSupportedComplex type
ROWSTRUCTSupportedComplex type
MULTISET\<t\>Not supportedNot supportedMaxCompute has no corresponding type.
VARBINARY, VARBINARY(n), BYTESBINARYSupportedVariable-length binary string

Troubleshooting

kSIGABRT error when reading a Paimon external table

Error message:

ODPS-0123144: Fuxi job failed - kSIGABRT(errCode:6) at Odps/*****_SQL_0_1_0_job_0/M1@f01b17437.cloud.eo166#3.
  Detail error msg: CRASH_CORE, maybe caused by jvm crash, please check your java udf/udaf/udtf.
  | fatalInstance: Odps/*****_SQL_0_1_0_job_0/M1#0_0

This error occurs when reading TIMESTAMP_NTZ data in JNI mode. Before querying the table, run:

SET odps.sql.common.table.jni.disable.native=true;

What's next

You can also create a MaxCompute Paimon external table directly from Flink as a custom catalog, write data to it, and then query and consume the Paimon data from MaxCompute. See Create a MaxCompute Paimon external table based on Flink.