MaxCompute:Paimon external tables

Last Updated:Aug 26, 2025

MaxCompute lets you create Paimon external tables to map to Paimon table folders in Object Storage Service (OSS) and access the data within these folders. This topic describes how to create a Paimon external table and access its data using MaxCompute.

Background information

Apache Paimon is a lake storage format that supports both streaming and batch processing and provides high-throughput writes and low-latency queries. Common compute engines such as Spark, Hive, and Trino are fully integrated with Paimon, and Paimon is available in services such as Realtime Compute for Apache Flink and E-MapReduce. With Apache Paimon, you can quickly build your own data lake storage service in OSS and connect it to MaxCompute for data lake analytics. MaxCompute uses metadata filtering to avoid reading unnecessary files from OSS folders during task processing.

Prerequisites

  • Ensure your Alibaba Cloud account, RAM user, or RAM role has the required permissions to access OSS external tables. For more information about authorization, see STS authorization for OSS.

  • (Optional) Prepare an OSS bucket, directories, and data files. For more information, see Create buckets, Manage directories, and Simple upload.

    MaxCompute can automatically create directories in OSS, so you can read from or write to an external table with a single SQL statement, including statements that use UDFs. Manual directory creation is no longer required, but the legacy method is still supported.
  • A MaxCompute project has been created. For more information, see Create a MaxCompute project.

    MaxCompute is deployed only in specific regions. To prevent cross-region data connectivity issues, we recommend that you use a bucket in the same region as your MaxCompute project.
  • The Alibaba Cloud account or RAM user has the CreateTable permission on your project. For more information about table operation permissions, see MaxCompute permissions.

Limits

  • MaxCompute currently supports only read operations for Paimon external tables. Write operations and automatic schema evolution for Paimon tables are not supported.

  • Paimon external tables do not support the `cluster` property.

  • Paimon external tables do not support features such as querying historical data versions.

Usage notes

When the schema in the Paimon file does not match the external table schema:

  • Mismatched column count: If the Paimon file contains fewer columns than the external table schema, the missing columns are filled with NULL values when the data is read. If the Paimon file contains more columns, the extra column data is discarded.

  • Mismatched column types: MaxCompute does not allow you to use the `INT` type to read `STRING` data from a Paimon file. You can use the `STRING` type to read `INT` data, but this practice is not recommended.
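
The column-count rule above can be illustrated with a sketch. Assume a Paimon file that provides only the columns `id` and `data` (the table name and the extra column below are hypothetical, for illustration only):

```sql
-- Hypothetical external table that declares one more column than the Paimon file.
-- The underlying Paimon file provides only (id, data).
CREATE EXTERNAL TABLE IF NOT EXISTS oss_extable_paimon_demo
(
    id    BIGINT,
    data  STRING,
    extra STRING   -- not present in the Paimon file; read back as NULL
)
STORED BY 'org.apache.paimon.hive.PaimonStorageHandler'
WITH serdeproperties (
    'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
)
LOCATION 'oss://<endpoint>/<table_path>';

-- Every row returns NULL for the extra column:
-- SELECT id, data, extra FROM oss_extable_paimon_demo;
```

Conversely, if the file contained a fourth column that the DDL omits, that column's data would simply not be returned.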

Supported data types

Note

For more information about MaxCompute data types, see Data types (Version 1.0) and Data types (Version 2.0).

Data type                   Supported
TINYINT                     Yes
SMALLINT                    Yes
INT                         Yes
BIGINT                      Yes
BINARY                      Yes
FLOAT                       Yes
DOUBLE                      Yes
DECIMAL(precision,scale)    Yes
VARCHAR(n)                  Yes
CHAR(n)                     Yes
STRING                      Yes
DATE                        Yes
DATETIME                    No
TIMESTAMP                   Yes
TIMESTAMP_NTZ               No
BOOLEAN                     Yes
ARRAY                       Yes
MAP                         Yes
STRUCT                      Yes
JSON                        No
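
The supported complex types listed above can be declared directly in the external table DDL. The following sketch shows a table definition that maps a Paimon table using nested types; the table name and columns are hypothetical, for illustration only:

```sql
CREATE EXTERNAL TABLE IF NOT EXISTS oss_extable_paimon_types
(
    id     BIGINT,
    price  DECIMAL(18,2),
    tags   ARRAY<STRING>,
    attrs  MAP<STRING, STRING>,
    detail STRUCT<name:STRING, qty:INT>
)
STORED BY 'org.apache.paimon.hive.PaimonStorageHandler'
WITH serdeproperties (
    'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
)
LOCATION 'oss://<endpoint>/<table_path>';
```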

Create an external table

Syntax

For more information about the syntax for external tables of different formats, see OSS external tables.

CREATE EXTERNAL TABLE [IF NOT EXISTS] <mc_oss_extable_name>
(
  <col_name> <data_type>,
  ...
)
[COMMENT <table_comment>]
[PARTITIONED BY (<col_name> <data_type>, ...)]
STORED BY 'org.apache.paimon.hive.PaimonStorageHandler'
WITH serdeproperties (
  'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
)
LOCATION '<oss_location>';

Common parameters

For information about common parameters, see Basic syntax parameters.

Write data

MaxCompute currently does not support writing to Paimon external tables. For the write syntax that applies to other OSS external table formats, see Write syntax.

Query analysis

Example

Step 1: Prepare data in Flink

Create a Paimon catalog and a Paimon table, and then insert data into the table. The following example shows how.

Note

If you already have Paimon table data in OSS, you can skip this step.

  1. Log on to the Realtime Compute for Apache Flink console and create a Paimon Filesystem Catalog. For this example, the catalog is named paimoncatalog. For more information, see Create an Apache Paimon Filesystem catalog.

  2. Create a Paimon table. For more information, see Manage Apache Paimon tables.

    1. In the text editor on the Data Query page, enter the following statement, select the code, and then click Run.

      CREATE TABLE `paimoncatalog`.`default`.test_tbl (
          id BIGINT,
          data STRING,
          dt STRING,
          PRIMARY KEY (dt, id) NOT ENFORCED
      ) PARTITIONED BY (dt);
    2. On the Data Development > ETL page, create an SQL job that contains the following statement. Then, deploy and run the job. For more information, see Develop an SQL draft.

      INSERT INTO `paimoncatalog`.`default`.test_tbl VALUES (1,'CCC','2024-07-18'), (2,'DDD','2024-07-18');
      Important
      • Make sure that the engine version for the SQL job is vvr-8.0.1-flink-1.17 or later.

      • If the SQL job is a bounded stream job, for example, a job that executes an INSERT INTO ... VALUES ... statement, perform the following steps:

        1. On the O&M page, click the target job to go to its Deployment Details tab.

        2. Edit the Runtime Parameter Configuration. In the Other Configurations section, set execution.checkpointing.checkpoints-after-tasks-finish.enabled: true, and then save the configuration. For more information about how to configure runtime parameters for a job, see Configure job deployment information.

Step 2: Create a Paimon external table in MaxCompute

In MaxCompute, execute the following SQL statement to create a MaxCompute Paimon external table.

CREATE EXTERNAL TABLE oss_extable_paimon_pt
(
    id BIGINT,
    data STRING
)
PARTITIONED BY (dt STRING )
STORED BY 'org.apache.paimon.hive.PaimonStorageHandler'
WITH serdeproperties (
    'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
)
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/<table_path>'
;

In the preceding statement, table_path is the path of the Paimon table that you created in Flink, for example, oss-mc-test/paimon/default.db/test_tbl. To obtain the path, perform the following steps:

  1. Log on to the Realtime Compute for Apache Flink console and click the name of your workspace.

  2. In the navigation pane on the left, click Data Management.

  3. On the Metadata page, click default in the target catalog. On the default page, click View in the target table's Actions column.

  4. In the Table Properties section on the Table Schema Details tab, obtain the value of the path parameter. Use only the part of the path that follows oss:// as the value for table_path.
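
For example, if the path parameter shown in Flink is oss://oss-mc-test/paimon/default.db/test_tbl, the table_path is oss-mc-test/paimon/default.db/test_tbl, and the resulting LOCATION clause would look like the following (the oss-cn-hangzhou-internal endpoint matches the earlier example; use the internal endpoint of your bucket's own region):

```sql
LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/paimon/default.db/test_tbl'
```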

Step 3: Import partition data

If the OSS external table that you created is a partitioned table, you must also import the partition data. For more information, see Syntax for adding partition data to an OSS external table.

MSCK REPAIR TABLE oss_extable_paimon_pt ADD PARTITIONS;

Step 4: Read the Paimon external table using MaxCompute

In MaxCompute, execute the following statement to query the MaxCompute Paimon external table `oss_extable_paimon_pt`.

SET odps.sql.common.table.planner.ext.hive.bridge = true;
SET odps.sql.hive.compatible = true;
SELECT * FROM oss_extable_paimon_pt WHERE dt='2024-07-18';

The following result is returned:

+------------+------------+------------+
| id         | data       | dt         |
+------------+------------+------------+
| 1          | CCC        | 2024-07-18 |
| 2          | DDD        | 2024-07-18 |
+------------+------------+------------+
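
The two SET flags enable the Hive-compatible execution path and are set per session, so include them with each query script that reads the external table. For example, a follow-up aggregate query in the same style might look like the following (the alias cnt is hypothetical; results depend on your data):

```sql
SET odps.sql.common.table.planner.ext.hive.bridge = true;
SET odps.sql.hive.compatible = true;
SELECT dt, COUNT(*) AS cnt
FROM oss_extable_paimon_pt
GROUP BY dt;
```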

Related topics

You can also create a MaxCompute Paimon external table in Flink using a custom catalog. After data is written to the table, you can query and consume the Paimon data using MaxCompute. For more information, see Create an Apache Paimon external table by using Realtime Compute for Apache Flink.