Kafka is a distributed messaging system that provides high throughput and high scalability. Kafka is widely used in big data scenarios such as log collection, data aggregation, stream processing, and online and offline analysis, and is an important part of the big data ecosystem. This topic describes how to synchronize data from a PolarDB for Oracle cluster to a self-managed Kafka cluster by using Data Transmission Service (DTS). The data synchronization feature allows you to extend message processing capabilities.
Prerequisites
- The source PolarDB for Oracle cluster uses the latest version. For more information, see Version Management.
- The tables to be synchronized contain primary keys or UNIQUE NOT NULL indexes.
- The value of the wal_level parameter is set to logical for the source PolarDB for Oracle cluster. This setting ensures that logical decoding is supported in write-ahead logging (WAL). For more information, see Configure cluster parameters.
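The prerequisites above can be captured in a simple pre-flight check. The sketch below is an illustration only, not a DTS or PolarDB API: it assumes you have already queried the cluster for the `wal_level` setting and for whether each table to be synchronized has a primary key or UNIQUE NOT NULL index, and it merely evaluates those results.

```python
# Hypothetical pre-flight check for the prerequisites listed above.
# The inputs are assumed to come from queries you run against the
# source cluster; the function itself is pure validation logic.

def check_prerequisites(wal_level: str, tables_with_key: dict) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    # wal_level must be 'logical' so that write-ahead logging carries
    # enough information for logical decoding.
    if wal_level != "logical":
        problems.append(f"wal_level is '{wal_level}', expected 'logical'")
    # Each table to be synchronized needs a primary key or a
    # UNIQUE NOT NULL index so that rows can be uniquely identified.
    for table, has_key in tables_with_key.items():
        if not has_key:
            problems.append(
                f"table '{table}' has no primary key or UNIQUE NOT NULL index"
            )
    return problems

# Example: one table is missing a usable key.
for msg in check_prerequisites("replica", {"orders": True, "audit_log": False}):
    print(msg)
```

In a real setup, the `wal_level` value would come from `SHOW wal_level;` on the source cluster, and the key information from the catalog tables.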
Precautions
- In this scenario, DTS supports only incremental data synchronization. DTS does not support schema synchronization or full data synchronization.
- A single data synchronization task can synchronize data from only one database. To synchronize data from multiple databases, you must create a data synchronization task for each database.
- To ensure that the synchronization latency is measured accurately, DTS adds a heartbeat table named dts_postgres_heartbeat to the source database. The following figure shows the schema of the heartbeat table.
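The idea behind a heartbeat table is that DTS periodically writes a timestamped row to the source database; the gap between the current time and the newest heartbeat observed on the target side approximates the synchronization delay. The sketch below is a simplified assumption for illustration, since the actual columns of dts_postgres_heartbeat are shown only in the figure above.

```python
# Illustrative latency calculation based on a heartbeat timestamp.
# The real heartbeat table's schema may differ; this only shows the
# principle behind using heartbeats to measure synchronization delay.
from datetime import datetime, timedelta, timezone

def sync_delay(last_heartbeat: datetime, now: datetime) -> timedelta:
    """Delay = wall-clock time elapsed since the newest heartbeat
    row that has been replicated to the target."""
    return now - last_heartbeat

beat = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)  # last replicated heartbeat
now = datetime(2024, 1, 1, 12, 0, 5, tzinfo=timezone.utc)   # current time on the target
print(sync_delay(beat, now))  # 0:00:05
```

Because the heartbeat is written at a fixed interval, a steadily growing delay value indicates that incremental synchronization is falling behind rather than merely idle.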