This topic describes the usage notes, parameters, and examples for Kafka Rebalancer. The examples in this topic use E-MapReduce (EMR) Kafka V2.4.1.
Background information
A Kafka cluster may encounter the following load imbalance issues:
- Uneven message distribution among leader partitions: This leads to unbalanced load among brokers and decreased read/write throughput.
- Unbalanced data distribution among brokers: This results in higher disk usage of some brokers compared with the average disk usage of the cluster, which increases the risk of broker breakdown.
- Unbalanced disk usage in a broker: The usage of some disks is significantly higher than the average disk usage in the broker, which increases the risk that replicas go offline or even that the broker breaks down.
- Hot topics: These cause unbalanced load among disks.
If the preceding issues occur, load balancing is required. To perform load balancing, you need to perform operations such as electing new leader partition replicas and reassigning partitions. Kafka provides tools such as kafka-preferred-replica-election.sh and kafka-reassign-partitions.sh for load balancing. However, you must configure these tools before you use them, which increases the O&M workload and difficulty.
The Rebalancer tool provided by EMR Kafka encapsulates tools such as kafka-preferred-replica-election.sh and kafka-reassign-partitions.sh, which reduces the O&M workload and difficulty. You can still use the underlying tools directly for O&M tasks.
Usage notes
- When you use Kafka Rebalancer, you must throttle the O&M traffic.
- Kafka Rebalancer generates a reassignment file in the JSON format based on specific configuration items. You must check the generated file to ensure that the reassignment result meets your expectations.
- Before you copy or move a large number of partition replicas, decide whether to use this tool based on how long the O&M task is expected to take. If the task takes a long time, you can use the kafka-reassign-partitions.sh tool to split it into smaller tasks and run them in different time periods.
- If you want to monitor the O&M process by using the kafka-reassign-partitions.sh tool, you must manually save the JSON file of the reassignment configuration. This file is used as the input for the verify action of the kafka-reassign-partitions.sh tool.
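The relationship between throttling and O&M duration in the preceding notes can be estimated with simple arithmetic: the time a reassignment needs is roughly the amount of data to move divided by the throttle rate. The following sketch is for illustration only. The helper name estimate_seconds and the sample figures (100 GiB of data, a throttle of 10 MiB/s) are assumptions, not values from EMR, and the estimate ignores new data that is written during the migration.

```shell
#!/usr/bin/env bash
# Rough lower bound on migration time: bytes to move / throttle (bytes per second).
# estimate_seconds is a hypothetical helper, not part of Kafka or EMR.
estimate_seconds() {
  local bytes=$1 throttle=$2
  # Ceiling division so that a partial second counts as a full second.
  echo $(( (bytes + throttle - 1) / throttle ))
}

# Example: 100 GiB of replica data moved with a throttle of 10 MiB/s.
estimate_seconds $((100 * 1024 * 1024 * 1024)) $((10 * 1024 * 1024))   # prints 10240
```

In this example the task needs at least 10240 seconds, or about 2.8 hours. An estimate of this kind can help you decide whether to split the task into several time periods.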
Features
- preferred-election: This feature allows you to balance the leadership in a topic. You must set the topics parameter to specify the topics for which the leadership is balanced. For more information, see preferred-election.
This feature encapsulates the kafka-preferred-replica-election.sh tool.
- balance-disks: This feature allows you to balance the distribution of partition replicas among the disks of a node based on disk usage. For more information, see balance-disks.
- rebalance: This feature allows you to balance the distribution of partition replicas among the nodes of a cluster based on disk usage. For more information, see rebalance.
- remove-broker-ids: This feature allows you to remove all partition replicas from a broker. After the replicas are removed, you can decommission the broker. For more information, see remove-broker-ids.
preferred-election
If the leader partition replica is not on the preferred broker, unbalanced load may occur among brokers. In this case, the leadership must be balanced.
Parameters
topics: the topics for which preferred election is performed.
Examples
- Create a test topic.
kafka-topics.sh --create --topic elelction-topic --bootstrap-server core-1-1:9092 --replication-factor 2 --partitions 50
- Trigger preferred election for the test topic.
kafka-rebalancer.sh --zookeeper master-1-1:2181/emr-kafka --preferred-election --bootstrap-server core-1-1:9092 --topics elelction-topic
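To confirm the effect of the election, one option is to compare the current leader of each partition with its preferred replica, which is the first broker in the Replicas list reported by kafka-topics.sh --describe. The following sketch assumes the Leader: and Replicas: fields that Kafka 2.4.x prints for each partition. The helper name count_non_preferred is hypothetical.

```shell
#!/usr/bin/env bash
# Count partitions whose leader is not the preferred (first listed) replica.
# Reads the output of "kafka-topics.sh --describe" from stdin.
# count_non_preferred is a hypothetical helper, not part of Kafka or EMR.
count_non_preferred() {
  awk '
    {
      leader = ""; preferred = ""
      for (i = 1; i < NF; i++) {
        if ($i == "Leader:")   leader = $(i + 1)
        if ($i == "Replicas:") { split($(i + 1), r, ","); preferred = r[1] }
      }
      # Lines without a Leader: field (such as the topic summary) are skipped.
      if (leader != "" && leader != preferred) n++
    }
    END { print n + 0 }
  '
}

# Usage against the test topic created above:
#   kafka-topics.sh --describe --topic elelction-topic \
#     --bootstrap-server core-1-1:9092 | count_non_preferred
```

A result of 0 indicates that every partition is led by its preferred replica.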
balance-disks
This feature is used to balance the allocation of partition replicas among the disks of a broker. This feature encapsulates the kafka-reassign-partitions.sh tool. Unlike the kafka-reassign-partitions.sh tool, Kafka Rebalancer automatically generates the file that allocates partition replicas within a broker, based on the disk usage of the broker.
Parameters
| Parameter | Description |
|---|---|
| replica-alter-log-dirs-throttle | Throttles the traffic consumed during the migration of replicas between log directories on a broker. Note: To prevent resource competition, you must set this parameter to a proper value when you use this feature. |
| threshold | The difference threshold for the disk usage. Default value: 0.1. If the disk usage difference in a broker is greater than this value, replica migration is performed between disks of the broker. |
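The threshold comparison can be illustrated with a small calculation. The following sketch assumes that the disk usage difference is the gap between the most used and the least used disk, expressed as fractions between 0 and 1. The source does not state how Kafka Rebalancer computes the difference, so this definition, the helper name needs_balance, and the sample usage values are all assumptions.

```shell
#!/usr/bin/env bash
# Decide whether the usage gap between disks exceeds a threshold.
# Assumption: "difference" means (highest usage - lowest usage).
# needs_balance is a hypothetical helper, not part of Kafka or EMR.
needs_balance() {
  local threshold=$1
  shift
  printf '%s\n' "$@" | awk -v t="$threshold" '
    NR == 1 { min = $1 + 0; max = $1 + 0 }
    {
      if ($1 + 0 < min) min = $1 + 0
      if ($1 + 0 > max) max = $1 + 0
    }
    END { if (max - min > t + 0) print "yes"; else print "no" }
  '
}

# Example: three disks at 82%, 55%, and 60% usage, default threshold 0.1.
needs_balance 0.1 0.82 0.55 0.60   # prints yes (gap is 0.27)
```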
Examples
- Use the kafka-rebalancer.sh tool to balance the replicas among disks in a broker.
- Monitor and check the disk balance process. You can use the kafka-configs.sh tool to check the throttling parameter and the kafka-reassign-partitions.sh tool to check the migration process.
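The monitoring step can be sketched with the standard Kafka tools named above. The following commands use options available in Kafka 2.4.x. The ZooKeeper and broker addresses are taken from the preferred-election example, and reassign.json is a hypothetical name for the reassignment JSON file that you saved manually. The function names are also hypothetical.

```shell
#!/usr/bin/env bash
# Addresses reused from the preferred-election example in this topic.
ZK="master-1-1:2181/emr-kafka"
BOOTSTRAP="core-1-1:9092"

# List the dynamic configurations of all brokers, which include any
# throttling parameters that are currently in effect.
show_broker_configs() {
  kafka-configs.sh --zookeeper "$ZK" --describe --entity-type brokers
}

# Check the progress of a reassignment described by a saved JSON file.
# --bootstrap-server is passed as well because moves between log
# directories are verified through the brokers.
verify_reassignment() {
  local json_file=$1
  kafka-reassign-partitions.sh --zookeeper "$ZK" --bootstrap-server "$BOOTSTRAP" \
    --reassignment-json-file "$json_file" --verify
}

# Usage (reassign.json is a hypothetical file name):
#   show_broker_configs
#   verify_reassignment reassign.json
```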
rebalance
This feature is used to balance the allocation of partition replicas among brokers. This feature encapsulates the kafka-reassign-partitions.sh tool. Unlike the kafka-reassign-partitions.sh tool, Kafka Rebalancer automatically generates the file that allocates partition replicas among brokers, based on the disk usage of the brokers.
Parameters
| Parameter | Description |
|---|---|
| throttle | Throttles the traffic consumed for partition replica migration during reassignment. Note: To prevent resource competition, you must set this parameter to a proper value when you use this feature. |
| threshold | The difference threshold for the disk usage. Default value: 0.1. If the disk usage difference among brokers is greater than this value, a rebalance task is triggered. |
Examples
- Use the kafka-rebalancer.sh tool to balance the allocation of partition replicas among brokers.
- Monitor and check the rebalance process. You can use the kafka-configs.sh tool to check the throttling parameter and the kafka-reassign-partitions.sh tool to check the migration process.
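When you check the throttling parameter, the values of interest can be filtered out of the kafka-configs.sh output. The following sketch assumes that Kafka Rebalancer applies the same broker-level settings as the stock kafka-reassign-partitions.sh tool, namely leader.replication.throttled.rate and follower.replication.throttled.rate. The source does not confirm this, and the helper name extract_throttle_rates is hypothetical.

```shell
#!/usr/bin/env bash
# Print the inter-broker replication throttle settings found in the text
# on stdin, one key=value pair per line.
# extract_throttle_rates is a hypothetical helper, not part of Kafka or EMR.
extract_throttle_rates() {
  grep -oE '(leader|follower)\.replication\.throttled\.rate=[0-9]+'
}

# Usage:
#   kafka-configs.sh --zookeeper master-1-1:2181/emr-kafka \
#     --describe --entity-type brokers | extract_throttle_rates
```

If nothing is printed, no inter-broker throttle is set on the brokers. With the stock Kafka tool, the verify action removes the throttle after the reassignment is complete.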
remove-broker-ids
This feature is used to remove all partition replicas from a broker. If you want to decommission a broker, you can use this feature to move all partition replicas on the broker to other brokers.
In most cases, a partition has three replicas. Therefore, you must retain at least three brokers in a cluster. We recommend that you do not decommission brokers in a Kafka cluster that contains three or fewer brokers.
Parameters
Examples
- Use the kafka-rebalancer.sh tool to remove partition replicas from a broker.
- Monitor and check the remove process. You can use the kafka-configs.sh tool to check the throttling parameter, and the kafka-reassign-partitions.sh tool to check the migration of partition replicas. You can also use the kafka-log-dirs.sh tool to check whether the partition replicas to be removed from the broker are migrated to other brokers.
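The final check with kafka-log-dirs.sh can be sketched as a count of the partition replicas that remain on the broker. The following example assumes that the tool prints its result as compact JSON in which each replica appears as a "partition":"topic-N" entry, and it uses broker ID 1 as a hypothetical example. The helper name count_replicas is also hypothetical.

```shell
#!/usr/bin/env bash
# Count the partition replicas listed in kafka-log-dirs.sh output on stdin.
# count_replicas is a hypothetical helper, not part of Kafka or EMR.
count_replicas() {
  # Each replica appears once as "partition":"<topic>-<number>".
  grep -oE '"partition":"[^"]+"' | wc -l | tr -d ' '
}

# Usage (broker ID 1 is an example value):
#   kafka-log-dirs.sh --bootstrap-server core-1-1:9092 --describe \
#     --broker-list 1 | count_replicas
```

A result of 0 indicates that no partition replicas remain in the log directories of the broker.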