Kafka is a widely used distributed, high-throughput, and scalable message queue. It is commonly used for log collection, monitoring data aggregation, stream data processing, and online and offline analytics, making it an indispensable part of the big data ecosystem. Using Data Transmission Service (DTS), you can synchronize data from PolarDB for MySQL to a self-managed Kafka cluster to extend your message processing capabilities.
Prerequisites
- The Kafka cluster version must be in the range 0.10.1.0 to 2.7.0.
- Binary logging must be enabled for the PolarDB for MySQL cluster. For more information, see Enable binary logging.
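The version prerequisite can be checked before you purchase a task. The following is a minimal sketch, not part of DTS itself: the function names are hypothetical, and treating both ends of the range as inclusive is an assumption.

```python
def parse_version(version):
    """Turn a dotted version such as "2.7.0" into a 4-part tuple of integers."""
    parts = [int(p) for p in version.strip().split(".")]
    parts += [0] * (4 - len(parts))  # pad "2.7.0" to (2, 7, 0, 0)
    return tuple(parts)

# Bounds come from the prerequisite above; inclusive bounds are an assumption.
MIN_SUPPORTED = parse_version("0.10.1.0")
MAX_SUPPORTED = parse_version("2.7.0")

def is_supported_kafka_version(version):
    """Return True if the version falls inside the supported range."""
    return MIN_SUPPORTED <= parse_version(version) <= MAX_SUPPORTED

if __name__ == "__main__":
    for candidate in ("0.10.0.1", "0.10.1.0", "2.4.1", "2.7.0", "2.8.0"):
        print(candidate, is_supported_kafka_version(candidate))
```

Comparing integer tuples avoids the pitfall of string comparison, where "0.9.0" would sort after "0.10.1.0".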
Notes
If a source table lacks a primary key or unique constraint, duplicate data may appear in the destination database.
Billing
Synchronization type | Pricing |
Schema synchronization and full data synchronization | Free of charge. |
Incremental data synchronization | Charged. For more information, see Billing overview. |
Limitations
- Only table-level data synchronization is supported.
- DTS does not automatically adjust synchronization objects.
Note: If you rename a source table, DTS stops synchronizing it. To resume synchronization, you must add the new table name as a synchronization object.
Procedure
- Purchase a data synchronization task. For more information, see Purchase a DTS task.
Note: During the purchase, set the source instance to PolarDB, the destination instance to Kafka, and the synchronization topology to One-way Synchronization.
- Log on to the DTS console.
Note: If you are automatically redirected to the Data Management (DMS) console, you can click the [icon] in the lower-right corner and then click [icon] to return to the classic DTS console.
- In the left-side navigation pane, click Data Synchronization.
- At the top of the Synchronization Tasks page, select the region where your destination instance is located.
- Find the data synchronization task that you purchased and click Configure Task.
- Configure the source and destination instances.
Category | Setting | Description |
N/A | Synchronization task name | DTS automatically generates a task name. Specify a descriptive name for easy identification. The name does not need to be unique. |
Source instance information | Instance type | Set to PolarDB Instance by default and cannot be changed. |
Source instance information | Instance region | The region of the source instance that you selected during the purchase. This setting cannot be changed. |
Source instance information | PolarDB instance ID | Select the ID of the PolarDB for MySQL cluster. |
Source instance information | Database account | Enter the database account for the PolarDB for MySQL cluster. This account must have read permissions on the databases to be synchronized. |
Source instance information | Database password | Enter the password for the database account. |
Destination instance information | Instance type | Select an option based on your Kafka cluster's deployment location. This topic uses a self-managed database on an ECS instance as an example. Note: If you select a different instance type, you must perform additional preparation steps. For more information, see Preparation overview. |
Destination instance information | Instance region | The region of the destination instance that you selected during the purchase. This setting cannot be changed. |
Destination instance information | ECS instance ID | Select the ID of the ECS instance where the Kafka cluster is deployed. |
Destination instance information | Database type | Select Kafka. |
Destination instance information | Port | The service port of the Kafka cluster. The default is 9092. |
Destination instance information | Database account | Enter the username for the Kafka cluster. You can leave this blank if authentication is not enabled for the Kafka cluster. |
Destination instance information | Database password | Enter the password for the username. You can leave this blank if authentication is not enabled for the Kafka cluster. |
Destination instance information | Topic | Click Get Topic List on the right and select a topic from the drop-down list. |
Destination instance information | Kafka version | Select the version that matches your destination Kafka cluster. |
Destination instance information | Connection method | Select Non-encrypted or SCRAM-SHA-256 based on your business and security requirements. |
- In the lower-right corner of the page, click Set Whitelist and Next.
How the CIDR blocks of DTS servers are added depends on where the source or destination database is hosted:
  - Alibaba Cloud database instance, such as an ApsaraDB RDS for MySQL or ApsaraDB for MongoDB instance: DTS automatically adds the CIDR blocks of DTS servers to the IP address whitelist of the instance.
  - Self-managed database hosted on an Elastic Compute Service (ECS) instance: DTS automatically adds the CIDR blocks of DTS servers to the security group rules of the ECS instance, and you must make sure that the ECS instance can access the database. If the self-managed database is hosted on multiple ECS instances, you must manually add the CIDR blocks of DTS servers to the security group rules of each ECS instance.
  - Self-managed database that is deployed in a data center or provided by a third-party cloud service provider: you must manually add the CIDR blocks of DTS servers to the IP address whitelist of the database to allow DTS to access the database.
For more information, see Whitelist DTS server IP addresses.
Warning: Adding the public IP address blocks of the DTS service, either automatically or manually, may pose security risks. By using this product, you acknowledge that you understand and accept the potential security risks and that you must implement basic security measures. These measures include, but are not limited to, strengthening password security, limiting the ports open to each CIDR block, using authentication for internal API calls, and regularly checking and restricting unnecessary CIDR blocks. Alternatively, you can connect through a private network using a leased line, VPN Gateway, or Smart Access Gateway.
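When you maintain a whitelist or security group rules manually, it can help to verify which required CIDR blocks are still missing. The following is a minimal sketch using Python's standard ipaddress module. The CIDR blocks shown are placeholder documentation ranges, not the real DTS server ranges, and the function names are hypothetical.

```python
import ipaddress

# Placeholder blocks (RFC 5737 documentation ranges), NOT real DTS server ranges.
EXAMPLE_DTS_CIDR_BLOCKS = ["192.0.2.0/24", "198.51.100.0/24"]

def is_allowed(client_ip, cidr_blocks):
    """Return True if client_ip falls inside any of the given CIDR blocks."""
    address = ipaddress.ip_address(client_ip)
    return any(address in ipaddress.ip_network(block) for block in cidr_blocks)

def missing_blocks(required, configured):
    """Return the required CIDR blocks not covered by any configured block."""
    configured_nets = [ipaddress.ip_network(block) for block in configured]
    missing = []
    for block in required:
        net = ipaddress.ip_network(block)
        covered = any(
            net.version == conf.version and net.subnet_of(conf)
            for conf in configured_nets
        )
        if not covered:
            missing.append(block)
    return missing

if __name__ == "__main__":
    current_rules = ["192.0.2.0/24"]  # hypothetical existing whitelist
    print(missing_blocks(EXAMPLE_DTS_CIDR_BLOCKS, current_rules))
    print(is_allowed("192.0.2.15", current_rules))
```

Replace the placeholder list with the blocks published in Whitelist DTS server IP addresses for your region before relying on the result.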
- Configure the synchronization objects.
Parameter | Description |
Data Format in Kafka | Data synchronized to the Kafka cluster is stored in Avro or Canal JSON format. For more information, see Data formats in a message queue. |
Policy for shipping data to Kafka partitions | Select the policy that meets your business requirements. For a detailed description, see Policies for synchronizing data to Kafka partitions. |
Synchronization objects | In the Source Objects box, select the objects to synchronize (tables are the finest granularity), and then click the icon to move them to the Selected box. Note: DTS automatically maps the table name to the topic name that you selected in Step 6. To change the destination topic for a table, use the object name mapping feature. For more information, see Set object names in the destination instance. |
Object name mapping | Change the names of synchronized objects in the destination instance. For more information, see Map databases, tables, and columns. |
Retry time for failed connections | If DTS cannot connect to the source or destination instance, it retries for 720 minutes (12 hours) by default. You can also specify a custom retry duration. If DTS reconnects to the source or destination instance within the specified duration, the synchronization task automatically resumes. Otherwise, the task fails. Note: You are billed for task run time during connection retries. Customize the retry duration based on your business needs, or release the DTS instance as soon as the source and destination instances are released. |
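The retry setting can be pictured as a loop bounded by a deadline: the task resumes if a connection succeeds within the window and fails otherwise. The following is an illustrative sketch only, not the DTS implementation. The function name, the interval between attempts, and the injectable clock are all assumptions made for the example.

```python
import time

DEFAULT_RETRY_MINUTES = 720  # default window stated in the table above (12 hours)

def retry_until_deadline(connect, retry_minutes=DEFAULT_RETRY_MINUTES,
                         interval_seconds=30.0,
                         clock=time.monotonic, sleep=time.sleep):
    """Call connect() until it succeeds or the retry window has elapsed.

    Returns "resumed" on success and "failed" once the window is used up.
    The 30-second interval is a hypothetical value chosen for illustration.
    """
    deadline = clock() + retry_minutes * 60
    while True:
        try:
            connect()
            return "resumed"
        except ConnectionError:
            if clock() >= deadline:
                return "failed"
            sleep(interval_seconds)

if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_connect():
        # Hypothetical endpoint that becomes reachable on the third attempt.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise ConnectionError("endpoint unreachable")

    print(retry_until_deadline(flaky_connect, retry_minutes=1, interval_seconds=0.01))
```

Because run time during retries is billed, a shorter window limits cost when the source or destination instance has been released on purpose.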
- After you complete the preceding configurations, click Next in the lower-right corner of the page.
- Configure advanced settings for initial synchronization.
Parameter | Description |
Initial synchronization | By default, both Initial Schema Synchronization and Initial Full Data Synchronization are selected. Before synchronizing incremental data, DTS synchronizes the schema and existing data of the selected objects to the destination. |
Filter options | By default, Ignore DDL in incremental synchronization phase is selected. This means that DTS does not synchronize DDL operations performed on the source database during incremental data synchronization. |
- After completing the preceding configurations, click Precheck and Start in the lower-right corner of the page.
Note:
  - A precheck runs before the synchronization task starts. You can start the task only after the precheck passes.
  - If the precheck fails, click the icon next to the failed item to view the details.
    - Fix the issues based on the cause and run the precheck again.
    - If you do not need to fix the items that triggered warnings, click Ignore or Ignore Warnings and Rerun Precheck to skip the warnings and run the precheck again.
- After the Precheck dialog box shows Precheck Passed, close the Precheck dialog box. The data synchronization task starts.
On the Data Synchronization page, the task list displays key information, including Instance ID/Task Name, Status (for example, Synchronizing), Synchronization Overview (including latency and speed), Billing Method, and Synchronization Topology. The Actions column provides options such as Pause Synchronization, Convert to Subscription, and Upgrade.
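Once the task status is Synchronizing, you can inspect messages in the destination topic to confirm that changes arrive as expected. The following is a minimal sketch for the Canal JSON option, using only the standard json module. The field names (database, table, type, data, old) and the sample message are assumptions based on the commonly seen Canal JSON layout. Check Data formats in a message queue for the authoritative definition. The function name is hypothetical.

```python
import json

def summarize_change(raw_message):
    """Extract the table, operation, and row images from one Canal JSON message.

    Field names are assumed; missing fields fall back to None or empty lists.
    """
    event = json.loads(raw_message)
    return {
        "table": "{}.{}".format(event.get("database"), event.get("table")),
        "operation": event.get("type"),
        "rows": event.get("data") or [],
        "before": event.get("old") or [],
    }

if __name__ == "__main__":
    # Hypothetical message value, as it might be read from the Kafka topic.
    sample = json.dumps({
        "database": "shop",
        "table": "orders",
        "type": "UPDATE",
        "data": [{"id": "1", "status": "paid"}],
        "old": [{"status": "new"}],
    }).encode("utf-8")
    print(summarize_change(sample))
```

In practice the raw bytes would come from a Kafka consumer of your choice. The parsing step stays the same regardless of the client library.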