ApsaraMQ for Kafka: Migrate data from Apache Kafka MirrorMaker to Confluent Replicator

Last Updated: Mar 18, 2024

Apache Kafka MirrorMaker is a stand-alone tool that replicates data between Kafka clusters by connecting a Kafka consumer to a Kafka producer. It reads data from a topic in the source cluster and writes the data to a topic with the same name in the destination cluster. Confluent Replicator provides a more comprehensive solution: it replicates topic configurations as well as topic data, and it integrates with Kafka Connect and Control Center to improve availability, scalability, and usability.

Background information

This topic describes how to migrate a data center from Apache Kafka MirrorMaker to Confluent Replicator. During the migration, Confluent Replicator starts replicating messages from a specific point in time rather than from the beginning of the topic. As a result, legacy messages that you do not want to migrate are left in place and are not replicated. The examples in this topic involve two data centers, DC1 and DC2. Each data center runs a Kafka cluster, and data in a single topic named inventory in DC1 is replicated to a topic with the same name in DC2.

Example 1: Same number of partitions in DC1 and DC2

In this example, after the migration, the inventory topic in DC2 has the same number of partitions as the inventory topic in DC1.

Prerequisites

  • Confluent Platform 5.0.0 or later is installed. For more information, see Install Confluent Platform On-Premises.

  • The number of partitions in the inventory topic in DC1 must be the same as the number of partitions in the inventory topic in DC2.

  • The value of the src.consumer.group.id parameter in Confluent Replicator must match the value of the group.id parameter in Apache Kafka MirrorMaker.
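
The partition-count prerequisite can be checked before the migration. The following is a minimal sketch and not part of the original procedure. partition_count and same_partition_count are hypothetical helpers, and the broker addresses in the usage comment are the placeholder addresses used in this topic. The sketch assumes that the kafka-topics tool is on the PATH. Older Kafka versions may require --zookeeper instead of --bootstrap-server.

```shell
# Hypothetical helper: read `kafka-topics --describe` output on stdin and
# print the partition count from the first summary line it finds.
partition_count() {
  sed -n 's/.*PartitionCount:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: succeed only if both counts are non-empty and equal.
same_partition_count() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# Usage (assumes the placeholder addresses from this topic):
#   src=$(kafka-topics --bootstrap-server localhost:9082 --describe --topic inventory | partition_count)
#   dst=$(kafka-topics --bootstrap-server localhost:9092 --describe --topic inventory | partition_count)
#   same_partition_count "$src" "$dst" || echo "partition counts differ: $src vs $dst"
```

The helpers only parse text, so they can be tried on saved command output before they are pointed at a live cluster.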

Procedure

  1. Run the following command to stop the Apache Kafka MirrorMaker instance that is running in DC1. In the command, <mm pid> is the process ID of Apache Kafka MirrorMaker.

    kill <mm pid>
  2. Configure and start Confluent Replicator. In this example, Confluent Replicator runs as an executable, either from the command line or from a Docker image.

    1. Add the following lines to the CONFLUENT_HOME/etc/kafka-connect-replicator/replicator_consumer.properties file. Replace localhost:9082 with the bootstrap.servers address of the source cluster. Setting topic.preserve.partitions to true writes each message to the same partition in DC2 as in DC1.

      bootstrap.servers=localhost:9082
      topic.preserve.partitions=true
    2. Add the following line to the CONFLUENT_HOME/etc/kafka-connect-replicator/replicator_producer.properties file. Replace localhost:9092 with the bootstrap.servers address of the destination cluster.

      bootstrap.servers=localhost:9092
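    3. Set the replication factors to 2 or 3. The following commands, run from the CONFLUENT_HOME directory, append the settings with a value of 3 to the quickstart-replicator.properties file.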

      echo "confluent.topic.replication.factor=3" >> ./etc/kafka-connect-replicator/quickstart-replicator.properties
      echo "offset.storage.replication.factor=3" >> ./etc/kafka-connect-replicator/quickstart-replicator.properties
      echo "config.storage.replication.factor=3" >> ./etc/kafka-connect-replicator/quickstart-replicator.properties
      echo "status.storage.replication.factor=3" >> ./etc/kafka-connect-replicator/quickstart-replicator.properties
    4. Start Confluent Replicator.

      replicator --cluster.id <new-cluster-id> \
      --producer.config ./etc/kafka-connect-replicator/replicator_producer.properties \
      --consumer.config ./etc/kafka-connect-replicator/replicator_consumer.properties \
      --replication.config ./etc/kafka-connect-replicator/quickstart-replicator.properties

    Confluent Replicator starts replicating messages to DC2 from the offsets that Apache Kafka MirrorMaker committed in DC1.
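
Because the replication factor commands append with >>, running them more than once leaves duplicate lines in the quickstart-replicator.properties file. The following is a minimal sketch of an idempotent alternative. set_prop is a hypothetical helper, not a Confluent tool, and it handles only plain key=value lines without spaces around the equal sign.

```shell
# Hypothetical helper: set KEY=VALUE in a properties file, replacing any
# existing plain `KEY=...` line instead of appending a duplicate.
set_prop() {
  file=$1 key=$2 value=$3
  tmp=$(mktemp) || return 1
  # Copy every line except earlier assignments of this key.
  if [ -f "$file" ]; then
    awk -F= -v k="$key" '$1 != k' "$file" > "$tmp"
  fi
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Write back in place so the file keeps its permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Usage, run from the CONFLUENT_HOME directory:
#   for prefix in confluent.topic offset.storage config.storage status.storage; do
#     set_prop ./etc/kafka-connect-replicator/quickstart-replicator.properties \
#       "$prefix.replication.factor" 3
#   done
```

Matching on the exact key, rather than on a regular expression, avoids treating the dots in property names as wildcards.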

Example 2: Different numbers of partitions in DC1 and DC2

In this example, after the migration, the inventory topic in DC2 has a different number of partitions than the inventory topic in DC1.

Prerequisites

  • Confluent Platform 5.0.0 or later is installed. For more information, see Install Confluent Platform On-Premises.

  • The value of the src.consumer.group.id parameter in Confluent Replicator must match the value of the group.id parameter in Apache Kafka MirrorMaker.
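
The group ID prerequisite can be checked by comparing the two configuration files. The following is a minimal sketch. get_prop is a hypothetical helper, and both file paths in the usage comment are assumptions, because this topic does not state where either value is stored in your deployment.

```shell
# Hypothetical helper: print the value of KEY from a Java-style properties
# file. Only plain `key=value` lines are handled; the last assignment wins.
get_prop() {
  awk -F= -v k="$2" '$1 == k { v = substr($0, length(k) + 2) } END { print v }' "$1"
}

# Usage (both file paths are assumptions, not paths from this topic):
#   mm_group=$(get_prop /path/to/mirrormaker-consumer.properties group.id)
#   rep_group=$(get_prop /path/to/replicator.properties src.consumer.group.id)
#   if [ -z "$mm_group" ] || [ "$mm_group" != "$rep_group" ]; then
#     echo "group IDs do not match: '$mm_group' vs '$rep_group'"
#   fi
```

The value is taken from everything after the first equal sign, so values that themselves contain an equal sign are returned unchanged.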

Procedure

  1. Run the following command to stop the Apache Kafka MirrorMaker instance that is running in DC1. In the command, <mm pid> is the process ID of Apache Kafka MirrorMaker.

    kill <mm pid>
  2. Configure and start Confluent Replicator. In this example, Confluent Replicator runs as an executable, either from the command line or from a Docker image.

    1. Add the following lines to the CONFLUENT_HOME/etc/kafka-connect-replicator/replicator_consumer.properties file. Replace localhost:9082 with the bootstrap.servers address of the source cluster. Setting topic.preserve.partitions to false allows a message to be written to a different partition in DC2 than in DC1.

      bootstrap.servers=localhost:9082
      topic.preserve.partitions=false
    2. Add the following line to the CONFLUENT_HOME/etc/kafka-connect-replicator/replicator_producer.properties file. Replace localhost:9092 with the bootstrap.servers address of the destination cluster.

      bootstrap.servers=localhost:9092
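    3. Set the replication factors to 2 or 3. The following commands, run from the CONFLUENT_HOME directory, append the settings with a value of 3 to the quickstart-replicator.properties file.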

      echo "confluent.topic.replication.factor=3" >> ./etc/kafka-connect-replicator/quickstart-replicator.properties
      echo "offset.storage.replication.factor=3" >> ./etc/kafka-connect-replicator/quickstart-replicator.properties
      echo "config.storage.replication.factor=3" >> ./etc/kafka-connect-replicator/quickstart-replicator.properties
      echo "status.storage.replication.factor=3" >> ./etc/kafka-connect-replicator/quickstart-replicator.properties
    4. Start Confluent Replicator.

      replicator --cluster.id <new-cluster-id> \
      --producer.config ./etc/kafka-connect-replicator/replicator_producer.properties \
      --consumer.config ./etc/kafka-connect-replicator/replicator_consumer.properties \
      --replication.config ./etc/kafka-connect-replicator/quickstart-replicator.properties

    Confluent Replicator starts replicating messages to DC2 from the offsets that Apache Kafka MirrorMaker committed in DC1.
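
Because replication resumes from the offsets that Apache Kafka MirrorMaker committed in DC1, it can help to inspect those offsets after you stop MirrorMaker and before you start Confluent Replicator. The following is a minimal sketch that assumes the kafka-consumer-groups tool that ships with Kafka is on the PATH. total_lag is a hypothetical helper that sums the LAG column of the --describe output. It locates the column by its header because the column layout differs between Kafka versions.

```shell
# Hypothetical helper: read `kafka-consumer-groups --describe` output on
# stdin and print the sum of the LAG column.
total_lag() {
  awk '
    # Until the header row is seen, look for the column named LAG.
    col == 0 { for (i = 1; i <= NF; i++) if ($i == "LAG") col = i; next }
    # Data rows: add numeric lag values; "-" (no committed offset) is skipped.
    $col ~ /^[0-9]+$/ { sum += $col }
    END { print sum + 0 }
  '
}

# Usage (<mm-group-id> stands for the group.id of Apache Kafka MirrorMaker):
#   kafka-consumer-groups --bootstrap-server localhost:9082 \
#     --describe --group <mm-group-id> | total_lag
```

Under these assumptions, the printed total approximates the number of messages that Confluent Replicator still has to copy for that consumer group.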