
E-MapReduce:FAQ

Last Updated:Jan 22, 2024

This topic provides answers to some frequently asked questions about Kafka.

How do I clean up the output logs of a Kafka component?

If the output logs of a Kafka component grow too large and occupy excessive storage space, you can go to the $LOG_DIR_ROOT directory that stores the output logs and delete the log files from the directory of that component. The default storage directory is /mnt/disk1/log. You can delete the log files of various Kafka components, such as kafka, cruise-control, kafka-schema-registry, and kafka-rest-proxy, based on your business requirements.
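For example, assuming the default log root and using the kafka component, you might check the space that the logs occupy and remove old rotated log files as sketched below. The 7-day retention window is illustrative; adjust it to your needs.

```shell
# Assumed default log root on EMR nodes; substitute your actual $LOG_DIR_ROOT.
LOG_DIR_ROOT=${LOG_DIR_ROOT:-/mnt/disk1/log}

if [ -d "$LOG_DIR_ROOT/kafka" ]; then
  # Show how much space the component's logs currently occupy.
  du -sh "$LOG_DIR_ROOT/kafka"
  # Delete rotated log files older than 7 days; keep the active *.log files.
  find "$LOG_DIR_ROOT/kafka" -type f -name '*.log.*' -mtime +7 -delete
fi
```

The same pattern applies to the other component directories, such as cruise-control or kafka-rest-proxy.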

How do I clean up the output logs of the Kafka Manager component?

If the output logs of the Kafka Manager component grow too large and occupy excessive storage space, you can go to the $LOG_DIR_ROOT/kafka-manager directory that stores the output logs and delete the log files from the log directory of the Kafka Manager component. The default storage directory is /mnt/disk1/log/kafka-manager.

Can I stop the Kafka Manager component?

Kafka Manager is management software for a Kafka cluster. Kafka does not rely on the Kafka Manager component to provide read and write services. If no other Kafka management platform is integrated, we recommend that you retain the Kafka Manager component. If you do not need it, you can stop the component on the Services tab of the cluster in the E-MapReduce (EMR) console.

How do I fix the error "ERROR: While executing topic command: Replication factor: 1 larger than available brokers: 0"?

Problem description:

  • An error occurs in the Kafka service. The broker process of the cluster exits.

  • The information about the ZooKeeper hosts of the Kafka cluster is invalid.

Solution:

  • Troubleshoot the cause of the broker exit based on the broker logs, and then restart the broker process.

  • Change the value of the Cluster Zookeeper Hosts parameter to the value of the kafka.manager.zookeeper.hosts parameter for the Kafka cluster in the EMR console.
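To confirm whether any brokers are currently registered, you can list the broker IDs recorded in ZooKeeper. The endpoint below is a placeholder; use the value of the kafka.manager.zookeeper.hosts parameter from the EMR console.

```shell
# Placeholder ZooKeeper endpoint; replace with your cluster's actual value.
ZK_HOSTS=${ZK_HOSTS:-master-1-1:2181}

# List the broker IDs registered in ZooKeeper. An empty list means that no
# brokers are alive, which matches "available brokers: 0" in the error.
if command -v zookeeper-shell.sh >/dev/null 2>&1; then
  zookeeper-shell.sh "$ZK_HOSTS" ls /brokers/ids
fi
```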

How do I fix the error "java.net.BindException: Address already in use (Bind failed)"?

The Java Management Extensions (JMX) port is occupied. Before you run a command, specify a JMX port. Sample code:

JMX_PORT=10101 kafka-topics.sh --bootstrap-server core-1-1:9092 --list

How do I fix the error "current leader's latest offset xxxx is less than replica's latest offset xxxxxx"?

If you confirm that all data is consumed or data loss is allowed, you can change the value of the unclean.leader.election.enable parameter of the Kafka broker component to true and restart the Kafka broker component. After the component is restarted, you can change the value of the unclean.leader.election.enable parameter back to false.
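If your Kafka version supports dynamic broker configurations, one possible way to toggle the setting from the CLI instead of editing it in the console is sketched below. The bootstrap address is a placeholder; this is an alternative sketch, not the procedure the console-based method above describes.

```shell
# Placeholder broker address; replace with a broker of your cluster.
BOOTSTRAP=${BOOTSTRAP:-core-1-1:9092}

if command -v kafka-configs.sh >/dev/null 2>&1; then
  # Temporarily allow unclean leader election cluster-wide.
  kafka-configs.sh --bootstrap-server "$BOOTSTRAP" \
    --entity-type brokers --entity-default \
    --alter --add-config unclean.leader.election.enable=true

  # After the affected partitions elect a leader and recover, revert the
  # setting so that it falls back to the static default (false).
  kafka-configs.sh --bootstrap-server "$BOOTSTRAP" \
    --entity-type brokers --entity-default \
    --alter --delete-config unclean.leader.election.enable
fi
```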

What do I do if the disk that stores Kafka data in the log directory is full?

If the disk that stores Kafka data in the log directory is full, the log directory goes offline. You can resolve this issue by using the method provided in Perform O&M operations when the disk space of an EMR Kafka cluster is full.
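Before you follow that guide, it can help to check which disks and which topic-partition directories consume the most space. The data path below is an assumption; confirm the real path against the log.dirs setting of your brokers.

```shell
# Show usage of all mounted file systems to find the full disk.
df -h

# Assumed Kafka data directory; confirm against the broker's log.dirs config.
KAFKA_DATA_DIR=${KAFKA_DATA_DIR:-/mnt/disk1/kafka/log}

if [ -d "$KAFKA_DATA_DIR" ]; then
  # List the largest topic-partition directories first.
  du -sh "$KAFKA_DATA_DIR"/* 2>/dev/null | sort -rh | head -n 20
fi
```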

How do I fix the error "Too many open files"?

Problem description:

An excessively large number of partitions or network connections exist.

Solution:

Increase the open file limit in the /etc/security/limits.conf system configuration file. Change the values of the * soft nofile and * hard nofile entries at the end of the configuration file based on your business requirements. Then, restart the abnormal components.
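Before you size the new values, you can inspect the current limit and how many file descriptors the broker actually holds. The limits.conf numbers in the comments are illustrative only.

```shell
# Current open-file limit of this shell session.
ulimit -n

# Count the file descriptors held by the Kafka broker process, if one is running.
BROKER_PID=$(pgrep -f kafka.Kafka | head -n 1)
if [ -n "$BROKER_PID" ]; then
  ls "/proc/$BROKER_PID/fd" | wc -l
fi

# Illustrative /etc/security/limits.conf entries (choose values that fit
# your partition and connection counts):
#   * soft nofile 655350
#   * hard nofile 655350
```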

How do I estimate the number of partitions required for a Kafka topic?

The number of partitions required for a Kafka topic varies based on various factors, and you can adjust the number based on your business requirements. First, perform a stress test to obtain the per-partition producer throughput of the cluster within the expected latency. Then, estimate the expected business traffic of the topic. Dividing the expected traffic by the per-partition throughput gives an initial estimate of the number of partitions required for a single topic. You can then adjust the number of partitions based on the consumption speed of consumers so that the consumption latency meets expectations. In most cases, more partitions allow higher parallelism for consumers.
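As a rough worked example of the estimate above, with hypothetical stress-test figures (all numbers below are made up for illustration):

```shell
PRODUCER_MBPS_PER_PARTITION=40   # measured single-partition producer throughput
CONSUMER_MBPS_PER_PARTITION=60   # measured single-partition consumer throughput
TARGET_MBPS=600                  # expected peak traffic of the topic

# The topic needs enough partitions for both the producer side and the
# consumer side; take the larger of the two estimates, rounding up.
P_PROD=$(( (TARGET_MBPS + PRODUCER_MBPS_PER_PARTITION - 1) / PRODUCER_MBPS_PER_PARTITION ))
P_CONS=$(( (TARGET_MBPS + CONSUMER_MBPS_PER_PARTITION - 1) / CONSUMER_MBPS_PER_PARTITION ))
PARTITIONS=$(( P_PROD > P_CONS ? P_PROD : P_CONS ))
echo "$PARTITIONS"   # 15 with these example figures
```

Leaving headroom above the computed number is common, because traffic estimates are imprecise and repartitioning later can be disruptive.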