Use the cluster script feature of E-MapReduce (EMR) to deploy MirrorMaker 2.0 (MM2) in dedicated mode and synchronize data between Kafka clusters.
In this guide, an EMR Dataflow cluster serves as both the destination cluster and the dedicated MirrorMaker cluster. In production, you can run the MirrorMaker cluster on a separate server.
When to use this method
MM2 can run in three ways. Choose the method that fits your scenario before proceeding.
| Method | When to use |
|---|---|
| Distributed Kafka Connect cluster (recommended) | Most production scenarios. Run MM2 connector tasks in an existing distributed Kafka Connect cluster and manage them via the Connect REST API. See Use Kafka MM2 to synchronize data across clusters. |
| Dedicated MirrorMaker cluster (this guide) | When you need a standalone driver process to manage all MM2 tasks, separate from an existing Connect cluster. |
| Single Connect worker | Test and development environments only. Run a MirrorSourceConnector task on a single Connect worker. |
Use cases
MM2 is suitable for the following scenarios:
Remote data synchronization: Replicate data between Kafka clusters in different regions.
Disaster recovery: Build a primary/secondary architecture across data centers with real-time synchronization. If one cluster becomes unavailable, fail over traffic to the other cluster for geo-disaster recovery.
Data migration: Migrate data from an existing cluster to a new one during cloud migrations, hybrid cloud deployments, or cluster upgrades, without interrupting business operations.
Data aggregation: Consolidate data from multiple Kafka sub-clusters into a central Kafka cluster.
Key features
As a data replication tool, Kafka MM2 provides the following features:
Replicates the data and configuration information of topics.
Replicates the offsets of consumer groups and the topics that they consume.
Replicates access control lists (ACLs).
Automatically detects new topics and partitions.
Provides Kafka MM2 metrics.
Supports high-availability deployments that scale horizontally.
Prerequisites
Before you begin, make sure you have:
Two EMR Dataflow clusters with the Kafka service enabled — one source cluster and one destination cluster. See Create a cluster.
Kafka service version 2.12_2.4.1 or later on both clusters.
An Object Storage Service (OSS) bucket. See Create buckets.
The example in this guide uses source and destination clusters running EMR V3.42.0.
Deploy MM2 on a dedicated cluster
Step 1: Prepare the MM2 configuration file
Create mm2.properties with the following content. Replace src.bootstrap.servers and dest.bootstrap.servers with the bootstrap server addresses of your source and destination clusters. Adjust other parameters to match your business requirements.
# Cluster aliases — use these aliases in all subsequent settings
clusters = src, dest
# Bootstrap server addresses (required)
src.bootstrap.servers = <source-bootstrap-servers>
dest.bootstrap.servers = <destination-bootstrap-servers>
# Replication flow: enable replication from src to dest (required)
src->dest.enabled = true
# Topics to replicate — uses a regex pattern (required)
src->dest.topics = foo-.*
# Consumer groups to replicate
groups = .*
# Exclude internal topics
topics.blacklist = __.*
# Replication factor for mirror topics
replication.factor = 3
For the full list of MM2 configuration options, see Configuring geo-replication in the Apache Kafka documentation.
Upload mm2.properties to your OSS bucket.
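If ossutil is installed on your local machine, you can upload the file from the command line, as shown in the following sketch. It assumes the same placeholder bucket, endpoint, and AccessKey pair that the deployment script in Step 2 uses to download the file; you can also upload the file in the OSS console instead.
# Upload the MM2 configuration file to OSS (replace the placeholders with your own values)
ossutil64 cp mm2.properties oss://<yourBucket>/mm2.properties -e <yourEndpoint> -i <yourAccessKeyId> -k <yourAccessKeySecret>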
Step 2: Prepare the deployment script
Create kafka_mm2_deploy.sh with the following content.
#!/bin/bash
SIGNAL=${SIGNAL:-TERM}
# Stop any MirrorMaker process that is already running
PIDS=$(ps ax | grep -i 'org.apache.kafka.connect.mirror.MirrorMaker' | grep java | grep -v grep | awk '{print $1}')
if [ -n "$PIDS" ]; then
  echo "Stopping the existing MirrorMaker process."
  kill -s $SIGNAL $PIDS
fi
# Paths on the EMR cluster; verify them against the table below before running the script
KAFKA_CONF=/etc/taihao-apps/kafka-conf/kafka-conf
TAIHAO_EXECUTOR=/usr/local/taihao-executor-all/executor/1.0.1
cd $KAFKA_CONF
# Back up any existing MM2 configuration file
if [ -e "./mm2.properties" ]; then
  mv mm2.properties mm2.properties.bak
fi
# Download the MM2 configuration file from OSS
${TAIHAO_EXECUTOR}/ossutil64 cp oss://<yourBucket>/mm2.properties ./ -e <yourEndpoint> -i <yourAccessKeyId> -k <yourAccessKeySecret>
# Start MirrorMaker 2.0 as a daemon under the kafka user
su - kafka <<EOF
exec connect-mirror-maker.sh -daemon $KAFKA_CONF/mm2.properties
exit;
EOF
The script contains two groups of parameters. Update them before uploading.
Script path variables — verify that these paths match the actual locations on your cluster:
| Variable | Default value | Description |
|---|---|---|
| KAFKA_CONF | /etc/taihao-apps/kafka-conf/kafka-conf | Directory where Kafka configuration files are stored. |
| TAIHAO_EXECUTOR | /usr/local/taihao-executor-all/executor/1.0.1 | Directory where the taihao-executor tools are installed. |
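To confirm that these defaults match your cluster before you edit the script, you can log on to a broker node and check that the directories exist, for example:
# Run on a broker node of the destination cluster; if either path differs, update the variables in the script
ls -d /etc/taihao-apps/kafka-conf/kafka-conf
ls -d /usr/local/taihao-executor-all/executor/1.0.1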
OSS access parameters — replace with your actual values:
| Parameter | Description |
|---|---|
| oss://<yourBucket>/mm2.properties | Full OSS path to the mm2.properties file you uploaded in Step 1. |
| <yourEndpoint> | OSS service endpoint for your region. |
| <yourAccessKeyId> | AccessKey ID of your Alibaba Cloud account. |
| <yourAccessKeySecret> | AccessKey secret of your Alibaba Cloud account. |
Upload the updated script to your OSS bucket.
Step 3: Run the script in the EMR console
Run the script as a cluster script task. See Manually run scripts for step-by-step instructions.
Select all broker nodes when running the script.
After the script runs successfully on the selected nodes, MM2 starts and begins replicating data from the source cluster to the destination cluster.
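To verify the replication flow, you can produce a few test messages to a topic that matches the src->dest.topics pattern on the source cluster, then consume the mirrored topic on the destination cluster. With the default replication policy, MM2 prefixes remote topics with the source cluster alias, so a source topic named foo-test appears as src.foo-test on the destination. The following sketch uses an example topic name; replace the bootstrap server placeholders with your own addresses.
# On the source cluster: create a test topic that matches the replication pattern and produce messages
kafka-topics.sh --bootstrap-server <source-bootstrap-servers> --create --topic foo-test --partitions 3 --replication-factor 3
kafka-console-producer.sh --broker-list <source-bootstrap-servers> --topic foo-test
# (On Kafka 2.5 and later, you can use --bootstrap-server instead of --broker-list.)

# On the destination cluster: list mirrored topics and consume the prefixed topic
kafka-topics.sh --bootstrap-server <destination-bootstrap-servers> --list | grep '^src\.'
kafka-console-consumer.sh --bootstrap-server <destination-bootstrap-servers> --topic src.foo-test --from-beginning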
What's next
To manage MM2 tasks using the Kafka Connect REST API instead, see Use Kafka MM2 to synchronize data across clusters.
For full MM2 configuration reference, see Configuring geo-replication.