This topic provides answers to some frequently asked questions about ZooKeeper.

What do I do if the ZooKeeper service is unstable and unexpectedly restarts?

The ZooKeeper service may become unstable for various reasons. The most common cause is an excessively large number of ZooKeeper nodes (znodes) or an excessively large snapshot. ZooKeeper keeps all znodes in memory and synchronizes the full data tree across the servers in the ensemble. If the number of znodes or the size of a snapshot is excessively large, the service becomes unstable. ZooKeeper is a distributed coordination service and cannot be used as a file system. We recommend that you keep the number of znodes below 100,000 and the size of each snapshot below 800 MB.
  • To view the number of znodes in a cluster, go to the Monitoring tab of the cluster details page in the E-MapReduce (EMR) console. A command-line alternative is shown in the sketch after this list.
  • To view the size of snapshots, perform the following steps:
    1. On the Configure tab of the ZooKeeper service page, search for the dataDir parameter and obtain its value. The value indicates the data directory of ZooKeeper.
    2. Run the following command to view the size of snapshots in the data directory:
      ls -lrt /mnt/disk1/zookeeper/data/version-2/snapshot*

      If the number of znodes or the size of a snapshot is excessively large, check how the znodes are distributed across paths. Then, based on that distribution, stop the upper-layer applications that use ZooKeeper excessively.
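
Alternatively, you can query these metrics directly from a ZooKeeper server by using the mntr four-letter command. The following is a minimal sketch that assumes the server listens on the default client port 2181 and that the mntr command is allowed; in ZooKeeper 3.5 and later, you may need to add it to the 4lw.commands.whitelist parameter:
      echo mntr | nc localhost 2181 | grep -E 'zk_znode_count|zk_approximate_data_size'

The zk_znode_count value is the total number of znodes, and the zk_approximate_data_size value is the approximate size of the data tree in bytes.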

How do I smoothly migrate data from the data directory of ZooKeeper to a new data directory?

If you want to move the data directory of ZooKeeper to a new location due to issues such as insufficient disk space or poor disk performance, perform the following steps on each node in your cluster to migrate the data smoothly without interrupting the ZooKeeper service.
Note In the following example, the data directory of ZooKeeper needs to be changed from /mnt/disk1/zookeeper to /mnt/disk2/zookeeper. In the cluster, the master-1-2 node is the leader, and the master-1-1 and master-1-3 nodes are followers. We recommend that you migrate the followers first and the leader last. You can check the role of each node as shown in the sketch below.
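
To check whether a node is the leader or a follower, you can run the srvr four-letter command on the node. This is a minimal sketch that assumes the default client port 2181 and that the srvr command is allowed by the 4lw.commands.whitelist parameter:
      echo srvr | nc localhost 2181 | grep Mode

The command returns Mode: leader on the leader and Mode: follower on the followers.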
  1. Change the data directory and save the configurations.
    1. On the Configure tab of the ZooKeeper service page, search for the dataDir parameter and change the value of this parameter to /mnt/disk2/zookeeper.
    2. In the lower part of the Configure tab, click Save.
    3. In the Save dialog box, configure the Execution Reason parameter and click Save.
  2. Deploy client configurations.
    1. In the upper-right corner of the Configure tab of the ZooKeeper service page, click Deploy Client Configuration.
    2. In the dialog box that appears, configure the Execution Reason parameter and click OK.
    3. In the Confirm message, click OK.
  3. Optional: Verify that the new data directory is specified.
    1. Log on to your EMR cluster in SSH mode. For more information, see Log on to a cluster.
    2. Run the following command to view the value of the dataDir parameter in the zoo.cfg configuration file:
       cat /etc/emr/zookeeper-conf/zoo.cfg

      If the dataDir parameter in the output is set to the new directory, the change has taken effect.
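      For example, you can filter the output for the dataDir parameter:
        grep dataDir /etc/emr/zookeeper-conf/zoo.cfg

      Assuming the directory change in this example, the expected output is dataDir=/mnt/disk2/zookeeper.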

  4. Stop the master-1-1 node.
    1. On the Status tab of the ZooKeeper service page, find the master-1-1 node and click Stop in the Actions column.
    2. In the dialog box that appears, configure the Execution Reason parameter and click OK.
    3. In the Confirm message, click OK.
  5. Copy the data to the new directory.
    1. Log on to the master-1-1 node of the EMR cluster in SSH mode. For more information, see Log on to a cluster.
    2. Run the following command to copy the ZooKeeper data to the new directory and grant the required permissions on the master-1-1 node:
      sudo rm -rf /mnt/disk2/zookeeper && sudo cp -rf /mnt/disk1/zookeeper /mnt/disk2/zookeeper && sudo chown -R hadoop:hadoop /mnt/disk2/zookeeper
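      Before you restart the node, you can verify that the copy is complete. This is a minimal sketch that assumes the standard data directory layout, in which the myid file and the version-2 subdirectory reside directly in the data directory:
        du -s /mnt/disk1/zookeeper /mnt/disk2/zookeeper && ls /mnt/disk2/zookeeper/myid /mnt/disk2/zookeeper/version-2

      The two directory sizes should be nearly identical, and both paths should exist in the new directory.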
  6. Restart the master-1-1 node.
    1. On the Status tab of the ZooKeeper service page, find the master-1-1 node and click Start in the Actions column.
    2. In the dialog box that appears, configure the Execution Reason parameter and click OK.
    3. In the Confirm message, click OK.

      Refresh the page until the value of the Health Status parameter for the master-1-1 node is Healthy.
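
      You can also confirm from the command line that the node rejoined the ensemble. This sketch assumes the default client port 2181 and that the ruok and srvr four-letter commands are allowed by the 4lw.commands.whitelist parameter:
        echo ruok | nc localhost 2181 && echo srvr | nc localhost 2181 | grep Mode

      The first command returns imok if the server is running, and the second command shows that the node rejoined as a follower.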

  7. Log on to the master-1-3 node and repeat Step 4 to Step 6.
  8. Log on to the master-1-2 node and repeat Step 4 to Step 6.
    The data is migrated to the new data directory when the Health Status parameter of all nodes is Healthy.
    Note The master-1-2 node was the original leader. After you stop this node, a new leader is elected from the master-1-1 and master-1-3 nodes. When the master-1-2 node restarts, it rejoins the ensemble as a follower. You can verify the final state of the ensemble as shown in the sketch below.
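    After all three nodes are migrated, you can verify the state of the ensemble from any node. This is a hedged sketch that assumes the hostnames resolve as shown, the default client port 2181 is used, and the srvr command is allowed by the 4lw.commands.whitelist parameter:
      for host in master-1-1 master-1-2 master-1-3; do echo "== $host =="; echo srvr | nc $host 2181 | grep -E 'Mode|Node count'; done

    Exactly one node should report Mode: leader, and the Node count values should match across all nodes.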