Data Transmission Service (DTS) streams change data from a Db2 for LUW (Linux, UNIX, and Windows) database into a self-managed Kafka cluster using CDC replication. Use this guide to configure the synchronization task from prerequisites to a running task.
Prerequisites
Before you begin, ensure that you have:
A Kafka cluster running version 0.10.1.0 to 2.7.0
Enough free storage on the Kafka cluster to hold all data in the source Db2 for LUW database (required for full data synchronization)
Database administrator permissions on the source Db2 for LUW database
Log archiving enabled on the source Db2 for LUW database — set LOGARCHMETH1 or LOGARCHMETH2 (or both). See logarchmeth1 - Primary log archive method configuration parameter and logarchmeth2 - Secondary log archive method configuration parameter
Limitations
Foreign keys
DTS does not synchronize foreign keys. Cascade and delete operations on the source database are not replicated to the destination.
Source database limits
| Limit | Details |
|---|---|
| Outbound bandwidth | The source server must have sufficient outbound bandwidth. Insufficient bandwidth reduces synchronization speed. |
| Primary key or unique constraints | Tables to be synchronized must have PRIMARY KEY or UNIQUE constraints with all fields unique. Without these, the destination may contain duplicate records. |
| Table count per task | If you select tables as objects and plan to rename tables or columns in the destination, a single task supports up to 5,000 tables. Exceeding this limit causes a request error — split into multiple tasks or synchronize at the database level instead. |
| Log retention for incremental-only tasks | Retain logs for more than 24 hours. If DTS cannot read the logs, the task may fail or data inconsistency may occur. |
| Log retention for full + incremental tasks | Retain logs for at least seven days before starting the task. After full synchronization completes, you can reduce the retention period to more than 24 hours. If you do not meet these retention requirements, the reliability and performance guarantees in the DTS SLA do not apply. |
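The primary-key requirement above matters because, without a unique key, the destination has no way to match an incoming row to an existing one. The following sketch uses hypothetical data (not DTS internals) to show why re-delivered rows — for example, when a task resumes and re-reads part of the source — duplicate without a key but apply cleanly with one:

```python
# Simulate the same batch of rows arriving at the destination twice
# (hypothetical example; DTS delivery internals are not public).
rows = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]

# Destination table WITHOUT a primary key: every row is a plain append.
no_key_table = []
for _ in range(2):                      # the batch arrives twice
    no_key_table.extend(rows)

# Destination table WITH a primary key: rows are upserts keyed on "id".
keyed_table = {}
for _ in range(2):
    for row in rows:
        keyed_table[row["id"]] = row    # second delivery overwrites; no duplicate

print(len(no_key_table))   # 4 — duplicates
print(len(keyed_table))    # 2 — idempotent
```

This is why DTS warns that tables without PRIMARY KEY or UNIQUE constraints may end up with duplicate records in the destination.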
CDC-specific limits
DTS uses Db2 for LUW CDC replication technology for incremental data. This technology has its own restrictions — see General data restrictions for SQL Replication.
Other limits
Schedule synchronization during off-peak hours. Full data synchronization consumes read and write resources on both source and destination databases and may increase server load.
After full synchronization, the destination tablespace may be larger than the source because concurrent INSERT operations cause fragmentation.
Write data to the destination only through DTS during synchronization to prevent data inconsistency. After synchronization completes, you can run DDL statements online using Data Management (DMS) — see Perform lock-free DDL operations.
If a primary/secondary switchover occurs on the source while the task is running, the task fails.
If the destination ApsaraMQ for Kafka instance is scaled during synchronization, restart the instance.
Synchronization latency
DTS calculates latency based on the timestamp of the latest synchronized record in the destination versus the current source timestamp. If no DML operations occur on the source for an extended period, the reported latency may be inaccurate. Run a DML operation on the source to refresh the latency value.
If you synchronize an entire database, create a heartbeat table. DTS updates the heartbeat table every second to keep the latency reading accurate.
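The latency behavior described above can be modeled as the gap between "now" on the source and the commit timestamp of the last record applied to the destination. This is an illustrative model, not DTS code: a quiet source makes the gap grow even though nothing is pending, while a heartbeat row written every second keeps the last-applied timestamp fresh.

```python
import datetime as dt

def reported_latency(now: dt.datetime, last_applied: dt.datetime) -> dt.timedelta:
    """Latency as monitoring of this kind would compute it (illustrative)."""
    return now - last_applied

now = dt.datetime(2024, 1, 1, 12, 0, 0)

# No DML for 10 minutes: the last applied record is stale, so the
# reported latency balloons even though replication is fully caught up.
quiet_source = dt.datetime(2024, 1, 1, 11, 50, 0)
print(reported_latency(now, quiet_source))   # 0:10:00

# With a heartbeat row updated every second, the last applied record is
# at most about one second old, so the reading stays accurate.
heartbeat = dt.datetime(2024, 1, 1, 11, 59, 59)
print(reported_latency(now, heartbeat))      # 0:00:01
```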
Billing
| Synchronization type | Fee |
|---|---|
| Schema synchronization and full data synchronization | Free of charge |
| Incremental data synchronization | Charged — see Billing overview |
Supported synchronization topologies
One-way one-to-one synchronization
One-way one-to-many synchronization
One-way cascade synchronization
One-way many-to-one synchronization
For details, see Synchronization topologies.
SQL operations that can be synchronized
| Operation type | SQL statements |
|---|---|
| DML | INSERT, UPDATE, DELETE |
Create a synchronization task
Step 1: Open the Data Synchronization Tasks page
Log on to the Data Management (DMS) console.
In the top navigation bar, click Data + AI.
In the left-side navigation pane, choose DTS > Data Synchronization.
The navigation path may vary by console mode and layout. See Simple mode and Customize the layout and style of the DMS console. You can also go directly to the Data Synchronization Tasks page.
Step 2: Select the region
On the right side of Data Synchronization Tasks, select the region where your synchronization instance resides.
In the new DTS console, select the region from the top navigation bar.
Step 3: Configure source and destination databases
Click Create Task. In the wizard, configure the following parameters.
Task information
| Parameter | Description |
|---|---|
| Task Name | A name for the DTS task. DTS generates a name automatically. Specify a descriptive name to help identify the task — uniqueness is not required. |
Source database
| Parameter | Description |
|---|---|
| Select a DMS database instance | Select an existing database instance, or leave blank and configure manually. If you select an existing instance, DTS auto-fills the remaining parameters. |
| Database Type | Select DB2 for LUW. |
| Connection Type | Select the access method based on where the source database is deployed. This example uses Self-managed Database on ECS. If your source is a self-managed database, set up the network environment first — see Preparation overview. |
| Instance Region | The region where the source Db2 for LUW database resides. |
| Replicate Data Across Alibaba Cloud Accounts | Whether to synchronize data across Alibaba Cloud accounts. This example uses No. |
| ECS Instance ID | The ID of the Elastic Compute Service (ECS) instance hosting the source database. |
| Port Number | The service port of the source Db2 for LUW database. Default: 50000. |
| Database Name | The name of the source Db2 for LUW database. |
| Database Account | The username for connecting to the source database. The account requires database administrator permissions. |
| Database Password | The password for the database account. |
Destination database
| Parameter | Description |
|---|---|
| Select a DMS database instance | Select an existing database instance, or leave blank and configure manually. |
| Database Type | Select Kafka. |
| Connection Type | Select the access method based on where the Kafka cluster is deployed. This example uses Self-managed Database on ECS. See Preparation overview for network setup requirements. |
| Instance Region | The region where the destination Kafka cluster resides. |
| ECS Instance ID | The ID of the ECS instance hosting the Kafka cluster. For a multi-node cluster, select any one node — DTS automatically discovers topic information for all nodes. |
| Port Number | The service port of the Kafka cluster. Default: 9092. |
| Database Account | The username for connecting to the Kafka cluster. Leave blank if authentication is not enabled. |
| Database Password | The password for the Kafka account. Leave blank if authentication is not enabled. |
| Kafka Version | The version of the self-managed Kafka cluster. For version 1.0 or later, select Later Than 1.0. |
| Encryption | The connection encryption method. Select Non-encrypted or SCRAM-SHA-256 based on your business and security requirements. |
| Topic | The destination topic. Select from the drop-down list. |
| Topic That Stores DDL Information | The topic for storing DDL information. If left blank, DDL information is stored in the topic specified by Topic. |
| Use Kafka Schema Registry | Whether to use Kafka Schema Registry for Avro schema storage and retrieval via a RESTful API. Select No to skip, or Yes and provide the URL or IP address of your Kafka Schema Registry. |
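The connection parameters above map onto a standard Kafka client configuration. The sketch below is an assumption-laden illustration, not DTS internals: the property names follow the common librdkafka-style convention, and the host, port, and credentials are placeholders. It shows how the Encryption choice (Non-encrypted vs. SCRAM-SHA-256) changes the client settings:

```python
from typing import Optional

def kafka_client_config(host: str, port: int,
                        user: Optional[str] = None,
                        password: Optional[str] = None) -> dict:
    """Build a Kafka client property map (illustrative only)."""
    config = {"bootstrap.servers": f"{host}:{port}"}
    if user is None:
        # "Non-encrypted" with no authentication: plaintext listener.
        config["security.protocol"] = "PLAINTEXT"
    else:
        # "SCRAM-SHA-256": SASL authentication on the connection.
        config.update({
            "security.protocol": "SASL_PLAINTEXT",
            "sasl.mechanism": "SCRAM-SHA-256",
            "sasl.username": user,
            "sasl.password": password,
        })
    return config

print(kafka_client_config("192.0.2.10", 9092))                        # no auth
print(kafka_client_config("192.0.2.10", 9092, "dts_user", "secret"))  # SCRAM
```

If authentication is not enabled on the cluster, leave Database Account and Database Password blank, which corresponds to the plaintext branch above.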
Step 4: Test connectivity
Click Test Connectivity and Proceed at the bottom of the page.
DTS automatically adds its server CIDR blocks to the security settings of Alibaba Cloud database instances and ECS-hosted databases. For databases in data centers or on third-party clouds, manually add the DTS server CIDR blocks to the database whitelist — see Add the CIDR blocks of DTS servers.
Adding DTS CIDR blocks to whitelists or security group rules introduces security exposure. Before proceeding, take preventive measures such as: strengthening username and password security, restricting exposed ports, authenticating API calls, auditing whitelist and security group rules regularly, and removing unauthorized CIDR blocks. For higher security, connect the database to DTS over Express Connect, VPN Gateway, or Smart Access Gateway.
Step 5: Configure objects and advanced settings
Basic settings
| Parameter | Description |
|---|---|
| Synchronization Types | By default, Incremental Data Synchronization is selected. Also select Schema Synchronization and Full Data Synchronization. DTS runs full synchronization first to copy existing data, which serves as the baseline for incremental synchronization. |
| Processing Mode of Conflicting Tables | How DTS handles destination tables that share names with source tables: Precheck and Report Errors (default) — fails the precheck if identical table names exist; Clear Destination Table — clears data from matching destination tables before synchronization (use with caution); Ignore Errors and Proceed — skips the name conflict check. If you choose this option, data inconsistency may occur: during full synchronization, existing records with matching primary keys are retained; during incremental synchronization, existing records are overwritten. If schemas differ, some columns may not be synchronized or the task may fail. To resolve name conflicts without deleting destination tables, use the object name mapping feature — see Map object names. |
| Data Format in Kafka | The format for records stored in the destination Kafka topic. Default: DTS Avro. For format details, see Data formats in a message queue. |
| Policy for Shipping Data to Kafka Partitions | How DTS routes records to Kafka partitions. See Specify the policy for synchronizing data to Kafka partitions. |
| Capitalization of Object Names in Destination Instance | Controls whether database, table, and column names in the destination are uppercased or lowercased. Default: DTS default policy. See Specify the capitalization of object names in the destination instance. |
| Source Objects | Select objects from the Source Objects section and click the right-arrow icon to move them to Selected Objects. You can select columns, tables, or databases. Selecting tables or columns excludes views, triggers, and stored procedures. |
| Selected Objects | To rename a single object in the destination, right-click it in this section — see Map the name of a single object. To rename multiple objects at once, click Batch Edit — see Map multiple object names at a time. To filter rows by SQL conditions, right-click an object and specify WHERE conditions — see Specify filter conditions. |
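The partition-shipping policy above determines ordering guarantees in the destination topic. A common approach — sketched here as an assumption rather than DTS's exact algorithm — hashes a routing key such as the table name or primary-key value, so all changes for one table (or one row) land in the same partition and stay ordered:

```python
import hashlib

def route_to_partition(routing_key: str, num_partitions: int) -> int:
    """Map a routing key (e.g. 'db.table' or a primary-key value) to a
    partition. A stable hash sends the same key to the same partition
    every time; illustrative only."""
    digest = hashlib.md5(routing_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions

# All changes to one table go to one partition, preserving their order.
p1 = route_to_partition("sales.orders", 12)
p2 = route_to_partition("sales.orders", 12)
assert p1 == p2          # deterministic routing

print(route_to_partition("sales.orders", 12))
print(route_to_partition("sales.customers", 12))
```

Routing by table keeps per-table ordering; routing by primary key spreads load across partitions while keeping per-row ordering. See the linked policy topic for the options DTS actually offers.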
Advanced settings
| Parameter | Description |
|---|---|
| Monitoring and Alerting | Whether to enable alerting for task failures or high synchronization latency. Select No to skip, or Yes and configure the alert threshold and notification settings — see Configure monitoring and alerting when you create a DTS task. |
| Retry Time for Failed Connections | How long DTS retries failed connections after the task starts. Range: 10–1440 minutes. Default: 720 minutes. We recommend that you set this to more than 30 minutes. If DTS reconnects within this window, the task resumes; otherwise, it fails. If multiple tasks share the same source or destination database, the shortest retry window takes effect. Note that DTS charges for the instance during retry attempts. |
| Configure ETL | Whether to enable extract, transform, and load (ETL) processing. Select Yes to enter data transformation statements in the code editor — see Configure ETL in a data migration or data synchronization task. Select No to skip. For an ETL overview, see What is ETL? |
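The Retry Time for Failed Connections behavior can be modeled as a deadline loop: DTS keeps retrying until the connection succeeds or the window elapses. This is a simplified sketch — the retry interval and scheduling are assumptions, since DTS's actual scheduler is not public:

```python
def run_with_retry(connect, max_retry_minutes: int = 720) -> bool:
    """Retry a failed connection until it succeeds or the retry window
    (default 720 minutes) elapses; illustrative model only."""
    elapsed = 0
    interval = 1                       # retry every minute (assumed)
    while elapsed <= max_retry_minutes:
        if connect(elapsed):
            return True                # reconnected: the task resumes
        elapsed += interval
    return False                       # window exhausted: the task fails

# A source that recovers after 30 simulated minutes of downtime.
recovers = lambda elapsed: elapsed >= 30
print(run_with_retry(recovers))                               # True

# A source that never recovers within a 10-minute window.
print(run_with_retry(lambda _: False, max_retry_minutes=10))  # False
```

This is also why the recommended window is more than 30 minutes: a shorter window can fail the task during routine maintenance on the source, while you remain billed for the instance during the retries.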
Step 6: Run the precheck
Click Next: Save Task Settings and Precheck.
To preview the API parameters for this task configuration, hover over Next: Save Task Settings and Precheck and click Preview OpenAPI parameters.
DTS runs a precheck before starting the task. If any item fails:
Click View Details next to the failed item, resolve the issue, and run the precheck again.
For alert items that can be ignored: click Confirm Alert Details, then Ignore in the dialog, click OK, and click Precheck Again. Ignoring alerts may lead to data inconsistency.
Step 7: Purchase the instance
Wait until Success Rate reaches 100%, then click Next: Purchase Instance.
On the purchase page, configure the following:
| Parameter | Description |
|---|---|
| Billing Method | Subscription — pay upfront for a fixed term (1–9 months, or 1, 2, 3, or 5 years). More cost-effective for long-term use. Pay-as-you-go — billed hourly. Suitable for short-term use. Release the instance when no longer needed to stop charges. |
| Resource Group Settings | The resource group for this instance. Default: default resource group. See What is Resource Management? |
| Instance Class | The synchronization throughput class. Select based on your data volume and latency requirements. See Instance classes of data synchronization instances. |
| Subscription Duration | The subscription term. Available only for the Subscription billing method. |
Read and select Data Transmission Service (Pay-as-you-go) Service Terms, then click Buy and Start. In the confirmation dialog, click OK.
The task appears in the task list. Monitor its progress there.