
OpenLake: Paimon Quick Start: Basic features

Last Updated: Jan 16, 2026

This topic describes how to use basic features of Apache Paimon in the development console of Realtime Compute for Apache Flink. The basic features allow you to create and delete an Apache Paimon catalog, create and delete an Apache Paimon table, write data to an Apache Paimon table, and update and consume data in an Apache Paimon table.

Prerequisites

  • If you want to use a RAM user or RAM role to access the development console of Realtime Compute for Apache Flink, make sure that the RAM user or RAM role has the required permissions. For more information, see Permission management.

  • A workspace is created. For more information, see Create a workspace.

  • Object Storage Service (OSS) is activated and an OSS bucket whose storage class is Standard is created. For more information, see Get started with the OSS console. OSS is used to store files related to Apache Paimon tables, such as data files and metadata files.

  • Your workspace uses Ververica Runtime (VVR) 8.0.5 or later. Earlier VVR versions of Realtime Compute for Apache Flink do not support Apache Paimon tables.

Step 1: Create an Apache Paimon DLF catalog

  1. Create a Paimon catalog in DLF. For more information, see Quick start with DLF.

    The DLF catalog must be in the same region as the Flink workspace. Otherwise, you cannot associate them in the subsequent steps.

  2. Create a Paimon catalog in the Realtime Compute for Apache Flink development console.

    Note

    This operation creates a mapping to your DLF catalog. Creating or deleting the catalog in Flink does not affect actual data in DLF.

    1. Log on to the Realtime Compute for Apache Flink management console.

    2. Click your workspace name to open the Development Console.

    3. Register your catalog using one of the following methods:

      UI

      1. In the left navigation menu, click Catalogs.

      2. On the Catalog List page, click Create Catalog.

      3. In the Create Catalog wizard, select Apache Paimon, and then click Next.

      4. Set Metastore to DLF. For Catalog Name, select the DLF catalog that you want to connect to.

      5. Click Confirm.

      SQL commands

      In the Scripts SQL editor, copy and run the following SQL code to register a DLF catalog in Flink.

      CREATE CATALOG `my-catalog` 
      WITH (
        'type' = 'paimon',
        'metastore' = 'rest',
        'token.provider' = 'dlf',
        'uri' = 'http://cn-hangzhou-vpc.dlf.aliyuncs.com',
        'warehouse' = 'dlf_test'
      );

      The following table describes the connector options:

      | Option | Description | Required | Example |
      | --- | --- | --- | --- |
      | type | The catalog type. Set this option to paimon. | Yes | paimon |
      | metastore | The catalog metastore. Set this option to rest. | Yes | rest |
      | token.provider | The token provider. Set this option to dlf. | Yes | dlf |
      | uri | The REST URI of the DLF catalog service. Format: http://[region-id]-vpc.dlf.aliyuncs.com. For region IDs, see Endpoints. | Yes | http://ap-southeast-1-vpc.dlf.aliyuncs.com |
      | warehouse | The name of the DLF Paimon catalog. | Yes | dlf_test |
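To verify that the catalog is registered, you can run the following statements on the Scripts tab. This is an optional check; `my-catalog` is the catalog name used in the subsequent steps of this topic, so substitute your own name if it differs:

```sql
-- Switch to the registered Paimon catalog and list its databases.
USE CATALOG `my-catalog`;
SHOW DATABASES;
```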

Step 2: Create an Apache Paimon table

  1. On the Scripts tab, enter the following code in the script editor to create an Apache Paimon database named my_db and an Apache Paimon table named my_tbl:

    CREATE DATABASE `my-catalog`.`my_db`;
    CREATE TABLE `my-catalog`.`my_db`.`my_tbl` (
      dt STRING,
      id BIGINT,
      content STRING,
      PRIMARY KEY (dt, id) NOT ENFORCED
    ) PARTITIONED BY (dt) WITH (
      'changelog-producer' = 'lookup'  
    );
    Note

    In this example, the changelog-producer parameter is set to lookup in the WITH clause to use the lookup policy to generate change logs. This way, data can be consumed from the Apache Paimon table in streaming mode. For more information about change log generation, see Change data generation mechanism.

  2. Select the code for creating the Apache Paimon database and the Apache Paimon table, and click Run on the left side of the script editor.

    If the "The following statement has been executed successfully!" message is returned, the Apache Paimon database named my_db and the Apache Paimon table named my_tbl are created.
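Optionally, you can confirm the table definition in the same script editor before writing data. This is a verification sketch, not a required step:

```sql
-- Show the schema of the new table, then the full DDL including table options.
DESCRIBE `my-catalog`.`my_db`.`my_tbl`;
SHOW CREATE TABLE `my-catalog`.`my_db`.`my_tbl`;
```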

Step 3: Write data to the Apache Paimon table

  1. On the Drafts tab of the Development > ETL page, click New. In the New Draft dialog box, click Blank Stream Draft on the SQL Scripts tab. For more information about how to develop an SQL draft, see Job development map. Then, copy the following INSERT statement to the SQL editor:

    -- The Apache Paimon result table commits data only after each checkpointing is complete. 
    -- In this example, the checkpointing interval is reduced to 10s to help you quickly obtain the results. 
    -- In the production environment, the checkpointing interval and the minimal pause between checkpointing attempts vary based on your business requirements for latency. In most cases, they are set to 1 to 10 minutes. 
    SET 'execution.checkpointing.interval'='10s';
    INSERT INTO `my-catalog`.`my_db`.`my_tbl` VALUES ('20240108',1,'apple'), ('20240108',2,'banana'), ('20240109',1,'cat'), ('20240109',2,'dog');
  2. In the upper-right corner of the SQL editor page, click Deploy. In the Deploy draft dialog box, configure the parameters and click Confirm.

  3. On the O&M > Deployments page, find the desired deployment, and click Start in the Actions column. In the Start Job panel, select Initial Mode, and click Start.

    If the deployment status changes to FINISHED, the data is written to the Apache Paimon table.
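In addition to the streaming read in the next step, a one-off batch query from the Scripts tab is a quick way to verify the write. A minimal sketch:

```sql
-- Run a bounded batch query; it scans the latest snapshot and then finishes.
SET 'execution.runtime-mode' = 'batch';
SELECT * FROM `my-catalog`.`my_db`.`my_tbl` ORDER BY dt, id;
```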

Step 4: Consume data from the Apache Paimon table in streaming mode

  1. Create a blank streaming draft, and copy the following code to the SQL editor. The code uses the Print connector to export all data from the my_tbl table to logs.

    CREATE TEMPORARY TABLE Print (
      dt STRING,
      id BIGINT,
      content STRING
    ) WITH (
      'connector' = 'print'
    );
    INSERT INTO Print SELECT * FROM `my-catalog`.`my_db`.`my_tbl`;
  2. In the upper-right corner of the SQL editor page, click Deploy. In the Deploy draft dialog box, configure the parameters and click Confirm.

  3. On the O&M > Deployments page, find the desired deployment, and click Start in the Actions column. In the Start Job panel, select Initial Mode, and click Start.

  4. On the Deployments page, view the computing result.

    1. In the left-side navigation pane, click O&M > Deployments. On the Deployments page, click the name of the deployment that you want to manage.

    2. On the Logs tab, click the Running Task Managers tab, and then click the value in the Path, ID column.

    3. Click the Stdout tab to view the consumed Apache Paimon data.

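By default, a streaming read of an Apache Paimon table first produces the latest full snapshot and then incremental changes. If you want to skip the existing snapshot and consume only new changes, Apache Paimon provides scan modes that can be set per query with a dynamic-options hint. The following is a sketch under that assumption, reusing the Print table from the draft above:

```sql
-- Consume only changes produced after the job starts; the existing snapshot is skipped.
-- 'scan.mode' = 'latest' is an Apache Paimon table option set here through a per-query hint.
INSERT INTO Print
SELECT * FROM `my-catalog`.`my_db`.`my_tbl` /*+ OPTIONS('scan.mode' = 'latest') */;
```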

Step 5: Update data in the Apache Paimon table

  1. Create a blank streaming draft, and copy the following code to the SQL editor:

    SET 'execution.checkpointing.interval' = '10s';
    INSERT INTO `my-catalog`.`my_db`.`my_tbl` VALUES ('20240108', 1, 'hello'), ('20240109', 2, 'world');
  2. In the upper-right corner of the SQL editor page, click Deploy. In the Deploy draft dialog box, configure the parameters and click Confirm.

  3. On the O&M > Deployments page, find the desired deployment, and click Start in the Actions column. In the Start Job panel, select Initial Mode, and click Start.

    If the deployment status changes to FINISHED, data is written to the Apache Paimon table.

  4. Go to the Stdout tab from the Deployments page as described in Step 4, and view the data that is updated in the Apache Paimon table.

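Because my_tbl is a primary key table, the second INSERT does not append rows; rows with the same (dt, id) key are merged, and the latest content value wins. Apache Paimon also retains earlier snapshots for a period of time, so you can compare the table contents before and after the update with a batch time-travel read. The snapshot ID below is illustrative; look up the actual snapshot IDs of your table first:

```sql
-- Batch-read an earlier snapshot of the table (the ID 1 is illustrative, not guaranteed).
SET 'execution.runtime-mode' = 'batch';
SELECT * FROM `my-catalog`.`my_db`.`my_tbl` /*+ OPTIONS('scan.snapshot-id' = '1') */;
```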

(Optional) Step 6: Cancel the deployment in which data is consumed in streaming mode and clear the resources

After the test is complete, you can perform the following steps to cancel the deployment in which data is consumed in streaming mode and clear the resources:

  1. On the O&M > Deployments page, find the deployment that you want to cancel and click Cancel in the Actions column.

  2. On the SQL Editor page, click the Scripts tab. In the SQL editor on the Scripts tab, enter the following code to delete the Apache Paimon data files and the Apache Paimon catalog:

    DROP DATABASE `my-catalog`.`my_db` CASCADE; -- Deletes the Apache Paimon database and all of its data files stored in OSS. 
    DROP CATALOG `my-catalog`; -- Deletes the Apache Paimon catalog from the metadata in the development console of Realtime Compute for Apache Flink. Data files stored in OSS are not deleted.

    If the "The following statement has been executed successfully!" message is returned, the Apache Paimon data files and the Apache Paimon catalog are deleted.

References

  • For more information about how to write data to or consume data from an Apache Paimon table, see Write data to or consume data from an Apache Paimon table.

  • For more information about how to modify the schema of an Apache Paimon table, such as adding a column and changing the data type of a column, and how to temporarily modify the parameters of an Apache Paimon table, see Modify a table schema.

  • For more information about how to optimize the Apache Paimon primary key tables and Append Scalable tables in different scenarios, see Optimize performance of Apache Paimon tables.

  • For more information about how to resolve issues related to Apache Paimon, see Connectors.