This topic describes how to use Data Transmission Service (DTS) to migrate data from a PolarDB for PostgreSQL (Compatible with Oracle) cluster to a Message Queue for Apache Kafka instance.
Prerequisites
The value of the wal_level parameter is set to logical for the source PolarDB for PostgreSQL (Compatible with Oracle) cluster. This adds the information required for logical decoding to the write-ahead log (WAL). For more information, see Set cluster parameters. A verification sketch follows this list.
A destination Message Queue for Apache Kafka instance is created. The available storage space of the destination instance must be larger than the storage space occupied by data in the source PolarDB for PostgreSQL (Compatible with Oracle) cluster.
Note: For information about the supported database versions, see Migration solutions.
A topic is created in the destination Message Queue for Apache Kafka instance to receive the migrated data. For more information, see Step 1: Create a topic.
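You can verify the wal_level prerequisite before you configure the task. The following is a minimal sketch in Python, assuming the psycopg2 package and PostgreSQL-protocol access to the cluster; the endpoint, port, database name, and credentials are placeholders that you must replace with your own values.

```python
# Minimal pre-flight check (sketch). Assumes psycopg2 is installed and the
# placeholder connection values below are replaced with your own.
import psycopg2

conn = psycopg2.connect(
    host="pc-xxxxxxxx.o.polardb.rds.aliyuncs.com",  # placeholder cluster endpoint
    port=1521,                 # adjust to your cluster's port
    dbname="mydb",             # placeholder database name
    user="migration_user",     # placeholder privileged account
    password="your_password",  # placeholder password
)
with conn.cursor() as cur:
    cur.execute("SHOW wal_level;")
    wal_level = cur.fetchone()[0]
    print(f"wal_level = {wal_level}")
    # DTS logical decoding requires wal_level to be logical.
    assert wal_level == "logical", "Set wal_level to logical before you configure DTS."
conn.close()
```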
Precautions
Type | Description |
Source database limits | |
Other limits | |
Billing
Migration type | Link configuration fees | Data transfer cost |
Schema migration and full data migration | Free of charge. | You are charged for data transfer when you migrate data out of Alibaba Cloud over the public network. For more information, see Billing overview. |
Incremental data migration | Charged. For more information, see Billing overview. | Same as above. |
SQL operations that support incremental migration
Operation type | SQL statement |
DML | INSERT, UPDATE, DELETE |
DDL | |
Permissions required for database accounts
Database | Required permissions | Account creation and authorization method |
PolarDB for PostgreSQL (Compatible with Oracle) cluster | Privileged account | |
Procedure
Go to the migration task list page of the destination region. You can use one of the following methods.
From the DTS console
Log on to the Data Transmission Service (DTS) console.
In the navigation pane on the left, click Data Migration.
In the upper-left corner of the page, select the region where the migration instance is located.
From the DMS console
Note: The actual operations may vary based on the mode and layout of the DMS console. For more information, see Simple mode console and Customize the layout and style of the DMS console.
Log on to the Data Management (DMS) console.
In the top navigation bar, choose .
To the right of Data Migration Tasks, select the region where the migration instance is located.
Click Create Task to go to the task configuration page.
Configure the source and destination databases.
Note: For information about how to obtain the parameters of the destination Message Queue for Apache Kafka instance, see Configure the parameters of a Message Queue for Apache Kafka instance.
Category
Configuration
Description
N/A
Task Name
DTS automatically generates a task name. We recommend that you specify a descriptive name for easy identification. The name does not have to be unique.
Source Database
Select Existing Connection
To use a database instance that has already been added to DTS (created or saved), select it from the drop-down list. The database information below is then populated automatically.
Note: In the DMS console, this parameter is named Select a DMS database instance.
If you have not registered the database instance with the system, or do not need to use a registered instance, manually configure the database information below.
Database Type
Select PolarDB (Compatible with Oracle).
Access Method
Select Alibaba Cloud Instance.
Instance Region
Select the region where the source PolarDB for PostgreSQL (Compatible with Oracle) cluster resides.
Replicate Data Across Alibaba Cloud Accounts
This example migrates data within the same Alibaba Cloud account. Select No.
Instance ID
Select the ID of the source PolarDB for PostgreSQL (Compatible with Oracle) cluster.
Database Name
Enter the name of the database that contains the objects to be migrated from the source PolarDB for PostgreSQL (Compatible with Oracle) cluster.
Database Account
Enter the database account of the source PolarDB for PostgreSQL (Compatible with Oracle) cluster. For information about the required permissions, see Permissions required for database accounts.
Database Password
Enter the password that corresponds to the database account.
Destination Database
Select Existing Connection
To use a database instance that has already been added to DTS (created or saved), select it from the drop-down list. The database information below is then populated automatically.
Note: In the DMS console, this parameter is named Select a DMS database instance.
If you have not registered the database instance with the system, or do not need to use a registered instance, manually configure the database information below.
Database Type
Select Kafka.
Access Method
Select Express Connect, VPN Gateway, or Smart Access Gateway.
Note: Here, the Message Queue for Apache Kafka instance is configured as a self-managed Kafka database for the migration instance.
Instance Region
Select the region where the destination Message Queue for Apache Kafka instance resides.
Connected VPC
Select the ID of the virtual private cloud (VPC) to which the destination Message Queue for Apache Kafka instance belongs.
Domain Name or IP
Enter an IP address from the Default Endpoint of the destination Message Queue for Apache Kafka instance.
Port Number
Enter the service port of the destination Message Queue for Apache Kafka instance. The default port is 9092.
Database Account and Database Password
You do not need to configure these parameters in this example.
Kafka Version
Select the version of the Kafka instance.
Encryption
Select Non-encrypted or SCRAM-SHA-256 based on your business and security requirements.
Topic
From the drop-down list, select the topic that receives the migrated data.
Use Kafka Schema Registry
Kafka Schema Registry is a metadata service layer that provides a RESTful interface for storing and retrieving Avro schemas.
No: Do not use Kafka Schema Registry.
Yes: Use Kafka Schema Registry. You must enter the URL or IP address of the Avro schema registered in Kafka Schema Registry in the URL or IP Address of Schema Registry text box.
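If you set Use Kafka Schema Registry to Yes, you can confirm that the registry endpoint is reachable before you proceed. The following is a minimal sketch, assuming a Confluent-compatible Schema Registry REST interface and Python with the requests package; the registry URL is a placeholder.

```python
# Sanity check for a Schema Registry endpoint (sketch).
# Assumes a Confluent-compatible REST interface; the URL is a placeholder.
import requests

registry_url = "http://192.168.0.10:8081"  # placeholder registry address

# List the registered subjects to confirm that the registry responds.
resp = requests.get(f"{registry_url}/subjects", timeout=5)
resp.raise_for_status()
print("Registered subjects:", resp.json())
```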
After you complete the configuration, click Test Connectivity and Proceed at the bottom of the page.
Note: Ensure that the IP address segment of the DTS service is automatically or manually added to the security settings of the source and destination databases to allow access from DTS servers. For more information, see Add DTS server IP addresses to a whitelist.
If the source or destination database is a self-managed database (the Access Method is not Alibaba Cloud Instance), you must also click Test Connectivity in the CIDR Blocks of DTS Servers dialog box that appears.
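Independently of the DTS connectivity test, you can check from a machine in the connected VPC that the Kafka endpoint is reachable at the TCP level. The following is a minimal sketch in Python; the host and port are placeholders. Note that this probe runs from your machine, whereas DTS connects from its own IP address segments, which is why the whitelist settings above are still required.

```python
# TCP reachability probe for the Kafka endpoint (sketch).
# The host and port are placeholders; replace them with your own values.
import socket

host, port = "192.168.0.20", 9092
try:
    with socket.create_connection((host, port), timeout=5):
        print(f"TCP connection to {host}:{port} succeeded.")
except OSError as exc:
    print(f"Cannot reach {host}:{port}: {exc}")
```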
Configure the task objects.
On the Configure Objects page, configure the objects to be migrated.
Configuration
Description
Migration Types
To perform only a full migration, select both Schema Migration and Full Data Migration.
To perform a zero-downtime migration, select Schema Migration, Full Data Migration, and Incremental Data Migration.
Note: If the Access Method of the destination Kafka instance is Alibaba Cloud Instance, Schema Migration is not supported.
If you do not select Incremental Data Migration, do not write new data to the source instance during data migration to ensure data consistency.
Processing Mode of Conflicting Tables
Precheck and Report Errors: Checks whether tables with the same names exist in the destination database. If no tables with the same names exist, the precheck item is passed. If tables with the same names exist, an error is reported during the precheck phase, and the data migration task does not start.
Note: If a table in the destination database has the same name as a table to be migrated and cannot be deleted or renamed, you can use the object name mapping feature to rename the table that is migrated. For more information, see Object name mapping.
Ignore Errors and Proceed: Skips the check for tables with the same names.
Warning: Selecting Ignore Errors and Proceed may cause data inconsistency and business risks. For example:
If the table schemas are consistent and a record in the destination database has the same primary key value as a record in the source database:
During full migration, DTS keeps the record in the destination cluster. The record from the source database is not migrated to the destination database.
During incremental migration, DTS does not keep the record in the destination cluster. The record from the source database overwrites the record in the destination database.
If the table schemas are inconsistent, only some columns of data may be migrated, or the migration may fail. Proceed with caution.
Data Format in Kafka
Select the desired data format for storage in the Kafka instance.
If you select Canal JSON, see Canal JSON for parameter descriptions and examples.
Note: Currently, only the China (Qingdao) and China (Beijing) regions support selecting Canal JSON.
If you select DTS Avro, you must parse the data based on the DTS Avro schema definition. For more information, see DTS Avro schema definition and DTS Avro deserialization sample code. A consumer sketch for inspecting delivered messages follows this table.
If you select Shareplex JSON, see Shareplex Json for parameter descriptions and examples.
Kafka Data Compression Format
Select the compression format for Kafka messages based on your requirements.
LZ4 (Default): Low compression ratio, high compression speed.
GZIP: High compression ratio, low compression speed.
Note: CPU usage is high.
Snappy: Medium compression ratio, medium compression speed.
Policy for Shipping Data to Kafka Partitions
Select the desired policy.
Message acknowledgement mechanism
Select the desired message acknowledgment mechanism.
Topic That Stores DDL Information
Select a topic from the drop-down list to store DDL information.
Note: If you do not select a topic, the DDL information is stored in the topic that receives data by default.
Capitalization of Object Names in Destination Instance
You can configure the capitalization policy for the names of migrated objects, such as databases, tables, and columns, in the destination instance. The default value is DTS default policy. You can also choose to keep object names consistent with the default policy of the source or destination database. For more information, see Case sensitivity of object names in the destination database.
Source Objects
In the Source Objects box, select the objects to migrate, and then click the rightwards arrow icon to move them to the Selected Objects box.
Note: You can select tables as the objects to be migrated.
Selected Objects
No extra configuration is needed for this example. You can use the mapping feature to set the topic name, number of topic partitions, and partition key for the source table in the destination Kafka instance. For more information, see Mapping information.
Note: If you use the object name mapping feature, the migration of other objects that depend on this object may fail.
To select the SQL operations for incremental migration, right-click the migration object in the Selected Objects section, and select the desired SQL operations in the dialog box that appears.
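After the task is running, you can inspect the messages that DTS delivers to the destination topic to confirm the configured data format. The following is a minimal consumer sketch, assuming Python with the kafka-python package; the topic name and bootstrap server are placeholders. DTS Avro payloads must still be decoded against the DTS Avro schema definition, whereas Canal JSON and Shareplex JSON payloads are plain JSON bytes.

```python
# Minimal consumer for inspecting delivered messages (sketch).
# Assumes the kafka-python package; topic and endpoint are placeholders.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "migration_topic",                      # placeholder destination topic
    bootstrap_servers="192.168.0.20:9092",  # placeholder Kafka endpoint
    auto_offset_reset="earliest",
    consumer_timeout_ms=10000,              # stop after 10 s without messages
)
for message in consumer:
    # Print the first 200 bytes of each payload for a quick format check.
    print(message.topic, message.partition, message.offset, message.value[:200])
consumer.close()
```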
Click Next: Advanced Settings to configure advanced parameters.
Configuration
Description
Dedicated Cluster for Task Scheduling
By default, DTS schedules tasks on a shared cluster. You do not need to select one. If you want more stable tasks, you can purchase a dedicated cluster to run DTS migration tasks.
Retry Time for Failed Connections
After the migration task starts, if the connection to the source or destination database fails, DTS reports an error and immediately starts continuous retry attempts. The default retry duration is 720 minutes. You can also customize the retry time within a range of 10 to 1440 minutes. We recommend that you set it to more than 30 minutes. If DTS reconnects to the source and destination databases within the set time, the migration task automatically resumes. Otherwise, the task fails.
Note: For multiple DTS instances that share the same source or destination, the network retry time is determined by the setting of the last created task.
Because you are charged for the task during the connection retry period, we recommend that you customize the retry time based on your business needs, or release the DTS instance as soon as possible after the source and destination database instances are released.
Retry Time for Other Issues
After the migration task starts, if other non-connectivity issues occur in the source or destination database (such as a DDL or DML execution exception), DTS reports an error and immediately starts continuous retry attempts. The default retry duration is 10 minutes. You can also customize the retry time within a range of 1 to 1440 minutes. We recommend that you set it to more than 10 minutes. If the related operations succeed within the set retry time, the migration task automatically resumes. Otherwise, the task fails.
Important: The value of Retry Time for Other Issues must be less than the value of Retry Time for Failed Connections.
Enable Throttling for Full Data Migration
During the full migration phase, DTS consumes read and write resources of the source and destination databases, which may increase the database load. You can throttle the full migration task as needed: set Queries per second (QPS) to the source database, RPS of Full Data Migration, and Data migration speed for full migration (MB/s) to reduce the pressure on the destination database.
Note: This configuration item is available only if you select Full Data Migration for Migration Types.
You can also adjust the full migration speed after the migration instance is running.
Enable Throttling for Incremental Data Migration
You can also throttle the incremental migration task as needed: set RPS of Incremental Data Migration and Data migration speed for incremental migration (MB/s) to reduce the pressure on the destination database.
Note: This configuration item is available only if you select Incremental Data Migration for Migration Types.
You can also adjust the incremental migration speed after the migration instance is running.
Environment Tag
You can select an environment tag to identify the instance if needed. This is not required for this example.
Configure ETL
Choose whether to enable the extract, transform, and load (ETL) feature. For more information, see What is ETL? Valid values:
Yes: Enables the ETL feature. Enter data processing statements in the code editor. For more information, see Configure ETL in a data migration or data synchronization task.
No: Disables the ETL feature.
Monitoring and Alerting
Select whether to set alerts and receive alert notifications based on your business needs.
No: Does not set an alert.
Yes: Sets an alert. You must also set the alert threshold and alert notifications. The system sends an alert notification if the migration fails or the latency exceeds the threshold.
Save the task and run a precheck.
To view the parameters for configuring this instance when you call the API operation, move the pointer over the Next: Save Task Settings and Precheck button and click Preview OpenAPI parameters in the bubble.
If you do not need to view or have finished viewing the API parameters, click Next: Save Task Settings and Precheck at the bottom of the page.
Note: Before the migration task starts, a precheck is performed. The task starts only after it passes the precheck.
If the precheck fails, click View Details next to the failed check item, fix the issue based on the prompt, and then run the precheck again.
If a warning is reported during the precheck:
For check items that cannot be ignored, click View Details next to the failed item, fix the issue based on the prompt, and then run the precheck again.
For check items that can be ignored and do not need to be fixed, you can click Confirm Alert Details, and then click Ignore, OK, and Precheck Again in sequence to skip the alert item and run the precheck again. If you skip an alert item, data inconsistency may occur and your business may be exposed to risks.
Purchase the instance.
When the Success Rate is 100%, click Next: Purchase Instance.
On the Purchase page, select the link specification for the data migration instance. For more information, see the following table.
Category
Parameter
Description
New Instance Class
Resource Group Settings
Select the resource group to which the instance belongs. The default value is default resource group. For more information, see What is Resource Management?
Instance Class
DTS provides migration specifications with different performance levels. The link specification affects the migration speed. You can select a specification based on your business scenario. For more information, see Data migration link specifications.
After the configuration is complete, read and select the check box for Data Transmission Service (Pay-as-you-go) Service Terms.
Click Buy and Start, and in the OK dialog box that appears, click OK.
You can view the progress of the migration instance on the Data Migration Tasks list page.
Note: If the migration instance does not include an incremental migration task, it stops automatically. After the instance stops, its Status is Completed.
If the migration instance includes an incremental migration task, it does not stop automatically, and the incremental migration task continues to run. While the incremental migration task is running normally, the Status of the instance is Running.
Mapping information
In the Selected Objects area, hover the mouse pointer over the destination topic name at the table level.
Click Edit for the destination topic.
In the Edit Table dialog box, you can configure the mapping information.
Note: At the schema level, the dialog box is named Edit Schema and contains fewer configurable parameters. At the table level, the dialog box is named Edit Table.
If the migration granularity is not an entire schema, you cannot modify the Name of target Topic or Number of Partitions in the Edit Schema dialog box.
Configuration
Description
Name of target Topic
The name of the destination topic to which the source table is migrated. By default, this is the topic that you selected in the Destination Database section when you configured the source and destination databases.
Important: If the destination database is a Message Queue for Apache Kafka instance, the specified topic name must exist in the destination Kafka instance. Otherwise, the data migration fails. If the destination database is a self-managed Kafka database and the migration instance includes a schema migration task, DTS attempts to create the topic you specify in the destination database.
If you modify the Name of target Topic, the data is written to the topic you specify.
Filter Conditions
For more information, see Set filter conditions.
Number of Partitions
The number of partitions for the destination topic to which data is written.
Partition Key
This parameter is required when Policy for Shipping Data to Kafka Partitions is set to Ship Data to Separate Partitions Based on Hash Values of Primary Keys. Specify one or more columns as the partition key. DTS calculates a hash value from these columns and delivers each row to a partition of the destination topic based on the hash value. If you do not specify a partition key, this delivery policy does not take effect during the incremental write phase. A conceptual sketch of hash-based assignment follows this procedure.
Note: You can select Partition Key only in the Edit Table dialog box.
Click OK.
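To illustrate the Partition Key setting, the following is a conceptual sketch of hash-based partition assignment in Python. It is not DTS's internal algorithm, and the column names and partition count are hypothetical. It demonstrates the property the policy relies on: rows with the same partition-key values always map to the same partition, which preserves their relative order within that partition.

```python
# Conceptual illustration of hash-based partition assignment (sketch).
# This is not DTS's internal algorithm; names and values are hypothetical.
import hashlib

def assign_partition(row: dict, key_columns: list, num_partitions: int) -> int:
    """Map a row to a stable partition index based on its partition-key columns."""
    key = "|".join(str(row[column]) for column in key_columns)
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions

row = {"order_id": 1001, "customer": "alice"}
print(assign_partition(row, ["order_id"], num_partitions=12))  # same key -> same partition
```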