All Products
Search
Document Center

OpenLake:(Invitation-only Beta) Accessing Paimon Data from external projects using DLF

Last Updated:Jan 27, 2026

This topic describes how to use Flink to create a Paimon catalog in Data Lake Formation (DLF), read MySQL Change Data Capture (CDC) data, and write the data to DLF. You can then use a MaxCompute external project to perform federated queries and analysis on the data lake and write the results back to DLF. This topic applies to the new version of DLF, which is different from the old version of DLF. For more information about the new version of DLF, see Data Lake Formation.

Prerequisites

Procedure

Step 1: Prepare the source data

  • If you already have MySQL test data, you can skip this step.

  • This step simulates real-time data updates from a business system. The data is written to the data lake in Paimon format using Flink.

  1. Log on to the RDS console.

  2. In the navigation pane on the left, click Instances. Then, select a region in the upper-left corner.

  3. On the Instances page, click the target instance's Instance ID/Name to open its details page.

  4. In the left navigation pane, click Databases.

  5. Click Create Database. Configure the following parameters:

    Parameter

    Required

    Description

    Example

    Database Name

    Required

    • The name must be 2 to 64 characters in length.

    • It must start with a letter and end with a letter or a digit.

    • It can contain lowercase letters, digits, underscores (_), and hyphens (-).

    • The database name must be unique within the instance.

    • If the database name contains a hyphen (-), the hyphen (-) in the name of the created database folder is replaced with @002d.

    dlf25_paimon

    Supported Character Set

    Required

    Select a character set as needed.

    utf8

    Authorized By

    Optional

    • Select the accounts that need to access this database. You can leave this parameter empty and attach accounts after the database is created.

    • Only standard accounts are displayed here. Privileged accounts have all permissions on all databases and do not require authorization.

    Default

    Description

    Optional

    Enter remarks about the database for future management. The remarks can be up to 256 characters in length.

    Create a test database for the external project DLF 2.5.

  6. Click Log On to Database. In the left navigation pane, select Database Instances. Double-click the database that you created. On the SQLConsole page, execute the following statements to create a test table and write test data.

    If the instance exists but the target database is not displayed after you expand the instance, the reason may be one of the following:

    • The logon account does not have access to the target database: You can go to the Accounts page of the RDS instance to modify the account permissions or change the logon database account.

    • The metadata is not synchronized, which prevents the directory from being displayed: Hover the mouse pointer over the instance that contains the target database. Click the image button to the right of the instance name to refresh the database list.

    CREATE TABLE emp (
        empno    INT PRIMARY KEY,
        ename    VARCHAR(20),
        job      VARCHAR(20),
        mgr      INT,
        hiredate DATE,
        sal      DECIMAL(10,2),
        comm     DECIMAL(10,2),
        deptno   INT
    );
    
    INSERT INTO emp VALUES
    (7369,'SMITH','CLERK',7902,'2020-12-17', 800.00,NULL,20),
    (7499,'ALLEN','SALESMAN',7698,'2021-02-20',1600.00,300.00,30),
    (7521,'WARD','SALESMAN',7698,'2021-02-22',1250.00,500.00,30),
    (7566,'JONES','MANAGER',7839,'2021-04-02',2975.00,NULL,20),
    (7654,'MARTIN','SALESMAN',7698,'2021-09-28',1250.00,1400.00,30),
    (7698,'BLAKE','MANAGER',7839,'2021-05-01',2850.00,NULL,30),
    (7782,'CLARK','MANAGER',7839,'2021-06-09',2450.00,NULL,10),
    (7788,'SCOTT','ANALYST',7566,'2021-12-09',3000.00,NULL,20),
    (7839,'KING','PRESIDENT',NULL,'2021-11-17',5000.00,NULL,10),
    (7844,'TURNER','SALESMAN',7698,'2021-09-08',1500.00,0.00,30),
    (7876,'ADAMS','CLERK',7788,'2022-01-12',1100.00,NULL,20),
    (7900,'JAMES','CLERK',7698,'2021-12-03', 950.00,NULL,30),
    (7902,'FORD','ANALYST',7566,'2021-12-03',3000.00,NULL,20),
    (7934,'MILLER','CLERK',7782,'2022-01-23',1300.00,NULL,10),
    (8001,'DUKE','ENGINEER',7788,'2023-03-15',3500.00,NULL,20),
    (8002,'DANIEL','ENGINEER',7788,'2023-04-01',3400.00,NULL,20),
    (8003,'SANDRA','CONSULTANT',7698,'2023-05-10',2700.00,NULL,30),
    (8004,'KAREN','CLERK',7902,'2023-06-11',1200.00,NULL,20),
    (8005,'JOHN','MANAGER',7839,'2023-07-12',4000.00,NULL,10),
    (8006,'LISA','DESIGNER',7698,'2023-08-13',2200.00,NULL,30);
  7. Query the data in the test table.

    SELECT * FROM emp;

    The following result is returned:

    image

Step 2: Prepare a DLF metadatabase

  1. Log on to the Data Lake Formation (DLF) console and select a region in the upper-left corner.

  2. In the navigation pane on the left, choose Catalog List.

  3. On the Catalogs page, click Create Catalog.

  4. On the Data Lake Formation (DLF) page, enter the following parameters and then click Create Catalog:

    Parameter

    Required

    Description

    Catalog Name

    Required

    A custom catalog name. The name must start with a letter and can be 1 to 256 characters in length. It can contain letters, digits, and underscores (_). For example, db_dlf25_oss.

    Description

    Optional

    A custom description.

    Storage Type

    Required

    Standard.

    Storage Redundancy Type

    Required

    • Locally redundant storage: Data is stored in a single zone. If the zone where the data is stored becomes unavailable, the data becomes inaccessible. We recommend that you use zone-redundant storage.

    • Zone-redundant storage: This is a multi-zone redundancy mechanism within the same region. If a single zone becomes unavailable, your data remains available. After a catalog is created, you cannot change its storage redundancy type from zone-redundant to locally redundant. Zone-redundant storage provides higher data availability than locally redundant storage but is more expensive. We recommend zone-redundant storage for data that requires high availability.

Step 3: Create Paimon and MySQL catalogs using Flink

  1. Create a Paimon catalog

    1. Log on to the Flink console. In the upper-left corner of the page, select a region.

    2. Click the name of the target workspace. In the navigation pane on the left, select Catalogs.

    3. On the Catalog List page, click Create Catalog . In the Create Catalog dialog box, select Apache Paimon, click Next, and then configure the following parameters:

      Parameter

      Required

      Description

      metastore

      Required

      The metastore type. In this example, select dlf.

      catalog name

      Required

      Select the DLF Catalog of the version you want to associate. Select DLF V2.5. In this example, it is db_dlf25_oss created in DLF.

  2. Create a MySQL catalog

    1. Log on to the Flink console. In the upper-left corner of the page, select a region.

    2. Add an IP address to the whitelist.

      1. In the Actions column of the target workspace, click Details.

        In the Workspace Details panel that appears, copy the value of CIDR Block for the vSwitch.

      2. Log on to the RDS console.

        In the navigation pane on the left, click Instances. Then, select a region in the upper-left corner.

        On the Instances page, click the target instance's Instance ID/Name to open its details page.

      3. In the navigation pane on the left, click Whitelist and SecGroup.

        On the Whitelist Settings tab, click Modify.

      4. In the Edit Whitelist dialog box, add the copied CIDR block to the IP Addresses field, and then click OK.

    3. Log on to the Flink console. In the upper-left corner of the page, select a region.

      Click the name of the target workspace. In the navigation pane on the left, select Catalogs.

    4. On the Catalog List page, click Create Catalog. In the Create Catalog dialog box, select MySQL, click Next, and then configure the following parameters:

      Parameter

      Required

      Description

      catalog name

      Required

      A custom name for the MySQL catalog. For example, mysql-catalog-dlf25.

      hostname

      Required

      • The IP address or hostname of the MySQL database.

      • You can log on to the RDS MySQL console. On the database instance details page, click Database Connection to view the database's Internal Endpoint, Public Endpoint, and Internal Port.

      • When accessing across VPCs or over the Internet, you need to establish network connectivity. For more information, see Network Connectivity.

      port

      Default

      The port used to connect to the server. The default value is 3306.

      default database

      Required

      The default database name. For example, dlf25_paimon.

      username

      Required

      The username used to connect to the MySQL database server. You can log on to the RDS for MySQL console. On the instance details page, click Account Management to view the username.

      password

      Required

      The password used to connect to the MySQL database server. You can log on to the RDS for MySQL console. On the instance details page, click Account Management to view the password.

Step 4: Read MySQL data and write it to a Paimon table in DLF using Flink

  1. Log on to the Flink console. In the upper-left corner of the page, select a region.

  2. Click the name of the target workspace. In the navigation pane on the left, select Data Development > ETL.

  3. On the Drafts tab, click image to create a new folder.

  4. Right-click a folder and select New Blank Stream Draft. In the New Draft dialog box, enter a Name and select an Engine Version.

  5. In the file, write and execute the following SQL statement. Make sure to modify the names in the code based on your actual environment.

    CREATE TABLE IF NOT EXISTS `db_dlf25_oss`.`default`.`emp`
    WITH (
      'bucket' = '4',
      'changelog-producer' = 'input'
    ) AS TABLE `mysql-catalog-dlf25`.`dlf25_paimon`.`emp`;
    1. Optional: In the upper-right corner, click Validate to check the Flink SQL statement for syntax errors.

    2. In the upper-right corner, click Deploy. In the Deploy draft dialog box, enter values for Comment, Label, and Deployment Target, and then click Confirm.

  6. Click the name of the target workspace. In the navigation pane on the left, select O&M > Deployments.

  7. On the Deployments page, click the name of the target job to open the Configuration page.

  8. In the upper-right corner of the deployment details page, click Start, select an Initial Mode, and then click Start.

  9. Query the Paimon data

    In the navigation pane on the left, select Development > Scripts.

    On the New Script tab, click image to create a new query script.

    SELECT * FROM `db_dlf25_oss`.`default`.`emp`;

    The following result is returned:

    image

  10. Log on to the Data Lake Formation (DLF) console. Choose Catalog List > Catalogs, click the data catalog to go to the database, and view the details of the synchronized table.

image

Step 5: Create a Paimon_DLF external data source in MaxCompute

  1. Log on to the MaxCompute console and select a region in the top-left corner.

  2. In the navigation pane on the left, choose Manage Configurations > Foreign Server.

  3. On the Foreign Server page, click Create Foreign Server.

  4. In the Add Foreign Server dialog box, configure the parameters. The following tables describe the parameters.

    Parameter

    Required

    Description

    Foreign Server Type

    Required

    Select DLF+OSS.

    Foreign Server Name

    Required

    A custom name. The naming conventions are as follows:

    • The name must start with a letter and can contain only lowercase letters, underscores (_), and digits.

    • The name cannot exceed 128 characters in length.

    For example, mysql_paimon_dlf25.

    Foreign Server Description

    Optional

    Enter a description as needed.

    Region

    Required

    The default value is the current region.

    DLF Endpoint

    Required

    The default value is the DLF Endpoint of the current region.

    OSS Endpoint

    Required

    The default value is the OSS Endpoint of the current region.

    RoleARN

    Required

    The Alibaba Cloud Resource Name (ARN) of the RAM role. This role must have permissions to access both DLF and OSS.

    1. Log on to the Resource Access Management (RAM) console.

    2. In the navigation pane on the left, choose Identities > Roles.

    3. In the Basic Information section, you can find the ARN.

    Example: acs:ram::124****:role/aliyunodpsdefaultrole.

    Foreign Server Supplemental Properties

    Optional

    Special attributes for the external data source. After you specify them, tasks that use this external data source can access the source system according to the defined behavior.

    Note

    For information about the supported parameters, see the official documentation. More parameters will be supported as the product evolves.

  5. Click OK to create the external data source.

  6. On the Foreign Server page, find the target data source and click Details in the Actions column.

Step 6: Create a Paimon_DLF external project in MaxCompute

  1. Log on to the MaxCompute console and select a region in the top-left corner.

  2. In the navigation pane on the left, choose Manage Configurations > Projects.

  3. On the External Project tab, click Create Project.

  4. In the Create Project dialog box, configure the project information as prompted and click OK.

    Parameter

    Required

    Description

    Project Type

    Required

    The default value is External Project.

    Region

    Required

    The default value is the current region. You cannot change it here.

    Project Name (Globally Unique)

    Required

    The name must start with a letter, contain letters, digits, and underscores (_), and be 3 to 28 characters in length.

    MaxCompute Foreign Server Type

    Optional

    The default value is Paimon_DLF.

    MaxCompute Foreign Server

    Optional

    • Use Existing: A list of created external data sources appears.

    • Create Foreign Server: Create and use a new external data source.

    MaxCompute Foreign Server Name

    Required

    • Select Existing: Select the name of a created external data source from the drop-down list.

    • Create External Data Source: The name of the new external data source is used.

    Authentication And Authorization

    Required

    The identity of the task executor. If a service-linked role does not exist, you must create one before you can use this mode.

    Service-linked Role

    Required

    Generated by default.

    Endpoint

    Required

    Generated by default.

    Data Catalog

    Required

    The DLF data catalog.

    Billing Method

    Required

    Subscription or Pay-as-you-go.

    Default Quota

    Required

    Select an existing quota.

    Description

    Optional

    A custom project description.

Step 7: Analyze data

  1. Use a connection tool to log on to the external project.

  2. List the schemas in the external project.

    -- Enable the schema syntax at the session level.
    SET odps.namespace.schema=true;
    SHOW schemas;
    
    -- The following result is returned.
    ID = 20250919****am4qb
    default
    system
    
    
    OK
  3. List the tables in the schema of the external project.

    USE schema default;
    SHOW tables;
    
    -- The following result is returned.
    ID = 20250919****am4qb
    acs:ram::<uid>:root  emp
    
    
    OK
  4. Read data from the Paimon table in DLF.

    SELECT * FROM emp;
    
    -- The following result is returned.
    +------------+------------+------------+------------+------------+------------+------------+------------+
    | empno      | ename      | job        | mgr        | hiredate   | sal        | comm       | deptno     | 
    +------------+------------+------------+------------+------------+------------+------------+------------+
    | 7521       | WARD       | SALESMAN   | 7698       | 2021-02-22 | 1250       | 500        | 30         | 
    | 7844       | TURNER     | SALESMAN   | 7698       | 2021-09-08 | 1500       | 0          | 30         | 
    | 7876       | ADAMS      | CLERK      | 7788       | 2022-01-12 | 1100       | NULL       | 20         | 
    | 7900       | JAMES      | CLERK      | 7698       | 2021-12-03 | 950        | NULL       | 30         | 
    | 7934       | MILLER     | CLERK      | 7782       | 2022-01-23 | 1300       | NULL       | 10         | 
    | 8005       | JOHN       | MANAGER    | 7839       | 2023-07-12 | 4000       | NULL       | 10         | 
    | 7369       | SMITH      | CLERK      | 7902       | 2020-12-17 | 800        | NULL       | 20         | 
    | 7566       | JONES      | MANAGER    | 7839       | 2021-04-02 | 2975       | NULL       | 20         | 
    | 7654       | MARTIN     | SALESMAN   | 7698       | 2021-09-28 | 1250       | 1400       | 30         | 
    | 7698       | BLAKE      | MANAGER    | 7839       | 2021-05-01 | 2850       | NULL       | 30         | 
    | 7839       | KING       | PRESIDENT  | NULL       | 2021-11-17 | 5000       | NULL       | 10         | 
    | 8002       | DANIEL     | ENGINEER   | 7788       | 2023-04-01 | 3400       | NULL       | 20         | 
    | 8006       | LISA       | DESIGNER   | 7698       | 2023-08-13 | 2200       | NULL       | 30         | 
    | 7499       | ALLEN      | SALESMAN   | 7698       | 2021-02-20 | 1600       | 300        | 30         | 
    | 7782       | CLARK      | MANAGER    | 7839       | 2021-06-09 | 2450       | NULL       | 10         | 
    | 7788       | SCOTT      | ANALYST    | 7566       | 2021-12-09 | 3000       | NULL       | 20         | 
    | 7902       | FORD       | ANALYST    | 7566       | 2021-12-03 | 3000       | NULL       | 20         | 
    | 8001       | DUKE       | ENGINEER   | 7788       | 2023-03-15 | 3500       | NULL       | 20         | 
    | 8003       | SANDRA     | CONSULTANT | 7698       | 2023-05-10 | 2700       | NULL       | 30         | 
    | 8004       | KAREN      | CLERK      | 7902       | 2023-06-11 | 1200       | NULL       | 20         | 
    +------------+------------+------------+------------+------------+------------+------------+------------+
  5. Query the emp table to retrieve the complete information of the employees who have the highest and lowest salaries in each department.

    WITH ranked AS (
        SELECT e.*,
               ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY sal DESC) AS rn_desc,
               ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY sal ASC)  AS rn_asc
        FROM emp e
    )
    SELECT *
    FROM ranked
    WHERE rn_desc = 1 
       OR rn_asc  = 1
    ORDER BY deptno, sal DESC;
    
    -- The following result is returned.
    +-------+--------+-----------+------+------------+------+------+--------+------------+------------+
    | empno | ename  | job       | mgr  | hiredate   | sal  | comm | deptno | rn_desc    | rn_asc     |
    +-------+--------+-----------+------+------------+------+------+--------+------------+------------+
    | 7839  | KING   | PRESIDENT | NULL | 2021-11-17 | 5000 | NULL | 10     | 1          | 4          |
    | 7934  | MILLER | CLERK     | 7782 | 2022-01-23 | 1300 | NULL | 10     | 4          | 1          |
    | 8001  | DUKE   | ENGINEER  | 7788 | 2023-03-15 | 3500 | NULL | 20     | 1          | 8          |
    | 7369  | SMITH  | CLERK     | 7902 | 2020-12-17 | 800  | NULL | 20     | 8          | 1          |
    | 7698  | BLAKE  | MANAGER   | 7839 | 2021-05-01 | 2850 | NULL | 30     | 1          | 8          |
    | 7900  | JAMES  | CLERK     | 7698 | 2021-12-03 | 950  | NULL | 30     | 8          | 1          |
    +-------+--------+-----------+------+------------+------+------+--------+------------+------------+

Step 8: Write the analysis results back to DLF

  1. In the external project from the previous step, create a table to store the SQL analysis results.

    CREATE TABLE emp_detail (
        empno    INT,
        ename    VARCHAR(20),
        job      VARCHAR(20),
        mgr      INT,
        hiredate DATE,
        sal      DECIMAL(10,2),
        comm     DECIMAL(10,2),
        deptno   INT
    );
  2. Write the analysis results from Step 7 to the new table.

    WITH ranked AS (
        SELECT e.*,
               ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY sal DESC) AS rn_desc,
               ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY sal ASC)  AS rn_asc
        FROM emp e
    )
    insert into emp_detail 
    SELECT 
        empno,ename,job,mgr,
        hiredate,sal,comm,deptno
    FROM ranked
    WHERE rn_desc = 1 
       OR rn_asc  = 1
    ORDER BY deptno, sal DESC;
  3. Query the new table.

    SELECT * FROM emp_detail;
    
    -- The following result is returned.
    +------------+------------+------------+------------+------------+------------+------------+------------+
    | empno      | ename      | job        | mgr        | hiredate   | sal        | comm       | deptno     | 
    +------------+------------+------------+------------+------------+------------+------------+------------+
    | 7839       | KING       | PRESIDENT  | NULL       | 2021-11-17 | 5000       | NULL       | 10         | 
    | 7934       | MILLER     | CLERK      | 7782       | 2022-01-23 | 1300       | NULL       | 10         | 
    | 8001       | DUKE       | ENGINEER   | 7788       | 2023-03-15 | 3500       | NULL       | 20         | 
    | 7369       | SMITH      | CLERK      | 7902       | 2020-12-17 | 800        | NULL       | 20         | 
    | 7698       | BLAKE      | MANAGER    | 7839       | 2021-05-01 | 2850       | NULL       | 30         | 
    | 7900       | JAMES      | CLERK      | 7698       | 2021-12-03 | 950        | NULL       | 30         | 
    +------------+------------+------------+------------+------------+------------+------------+------------+
  4. Log on to the Data Lake Formation (DLF) console. In the navigation pane on the left, choose Catalog List. The newly created emp_detail table is displayed.