- Why is my ZooKeeper service unstable or restarting unexpectedly?
- How do I migrate the ZooKeeper data directory without interrupting the service?
Why is my ZooKeeper service unstable or restarting unexpectedly?
The most common cause is too many znodes or snapshots that are too large. ZooKeeper keeps all znodes in memory and replicates the full data tree across every server in the ensemble, so if either limit is exceeded, memory pressure makes the service unstable or causes it to crash.
ZooKeeper is a distributed coordination service, not a file system. If your znode count is climbing into the hundreds of thousands, check whether upstream applications are writing to ZooKeeper beyond its intended purpose.
Keep within the following limits:
| Resource | Recommended limit |
|---|---|
| Znode count | Fewer than 100,000 |
| Snapshot size | Smaller than 800 MB per snapshot |
To check the znode count, go to the Monitoring tab on the cluster details page in the E-MapReduce (EMR) console.
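If you have shell access, the znode count is also exposed by ZooKeeper's `mntr` four-letter command (on ZooKeeper 3.5 and later it must be allowed via `4lw.commands.whitelist`). The following sketch parses the `zk_znode_count` field from sample output; in practice, capture the output with `echo mntr | nc <host> 2181`:

```shell
# Sample mntr output (illustrative); in practice capture it with:
#   mntr_output=$(echo mntr | nc localhost 2181)
mntr_output="zk_version	3.6.3
zk_avg_latency	0
zk_znode_count	120000"

# Pull out the znode count field.
znode_count=$(printf '%s\n' "$mntr_output" | awk '$1=="zk_znode_count" {print $2}')
echo "znode count: $znode_count"

# Warn when the recommended limit is exceeded.
if [ "$znode_count" -gt 100000 ]; then
  echo "WARN: znode count exceeds 100,000"
fi
```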
To check snapshot sizes:
1. On the Configure tab of the ZooKeeper service page, search for `dataDir` to find the data directory path.
2. Run the following command to list snapshot files and their sizes:

   ```shell
   ls -lrt /mnt/disk1/zookeeper/data/version-2/snapshot*
   ```
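To flag oversized snapshots automatically rather than reading the listing by eye, `find -size` can do the comparison. This is a self-contained sketch that uses sparse files in a temporary directory; in practice, point `snap_dir` at the `version-2` directory under your real `dataDir`:

```shell
# Demo directory with sparse files; replace snap_dir with the version-2
# directory under your dataDir in practice.
snap_dir=$(mktemp -d)
truncate -s 900M "$snap_dir/snapshot.1a0"   # oversized example
truncate -s 10M  "$snap_dir/snapshot.1b0"   # within the limit

# -size +800M matches files strictly larger than 800 MB.
oversized=$(find "$snap_dir" -name 'snapshot.*' -size +800M | wc -l)
echo "$oversized snapshot(s) exceed 800 MB"

rm -rf "$snap_dir"
```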
If either limit is exceeded, inspect the distribution of znodes to identify the paths that are growing, then stop the upstream applications that are writing excessively to them.
How do I migrate the ZooKeeper data directory without interrupting the service?
If disk space runs out or disk performance degrades, migrate the ZooKeeper data directory to a new path. Migrate the followers first and the leader last; this keeps the ZooKeeper ensemble available throughout the migration.
The following example migrates from /mnt/disk1/zookeeper to /mnt/disk2/zookeeper. In this cluster, master-1-2 is the leader and master-1-1 and master-1-3 are followers.
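To confirm which node is currently the leader before you start, the `srvr` four-letter command reports each server's mode (like `mntr`, it must be allowed via `4lw.commands.whitelist`). A sketch that parses the `Mode` line from sample output; in practice, capture the output per node with `echo srvr | nc <node-host> 2181`:

```shell
# Sample srvr output (illustrative); in practice capture it per node with:
#   srvr_output=$(echo srvr | nc <node-host> 2181)
srvr_output="Zookeeper version: 3.6.3
Latency min/avg/max: 0/0/5
Mode: follower"

# Extract the role (leader, follower, or standalone).
role=$(printf '%s\n' "$srvr_output" | awk -F': ' '/^Mode/ {print $2}')
echo "role: $role"
```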
Step 1: Update the data directory configuration
1. On the Configure tab of the ZooKeeper service page, search for `dataDir` and change its value to `/mnt/disk2/zookeeper`.
2. Click Save.
3. In the Save dialog, fill in Execution Reason and click Save.
Step 2: Deploy the updated configuration
1. In the upper-right corner of the Configure tab, click Deploy Client Configuration.
2. In the dialog, fill in Execution Reason and click OK.
3. In the confirmation message, click OK.
Step 3: (Optional) Verify the new data directory
1. Log on to your EMR cluster in SSH mode. For more information, see Log on to a cluster.
2. Run the following command and confirm that `dataDir` points to `/mnt/disk2/zookeeper`:

   ```shell
   cat /etc/emr/zookeeper-conf/zoo.cfg
   ```
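To check only the `dataDir` setting instead of reading the whole file, you can filter for that key. This self-contained sketch parses a sample config written to a temporary file; in practice, run the same `awk` against /etc/emr/zookeeper-conf/zoo.cfg directly:

```shell
# Sample zoo.cfg written to a temp file for illustration; in practice read
# /etc/emr/zookeeper-conf/zoo.cfg directly.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
tickTime=2000
dataDir=/mnt/disk2/zookeeper
clientPort=2181
EOF

# Split on '=' and print the value for the dataDir key.
data_dir=$(awk -F= '$1=="dataDir" {print $2}' "$cfg")
echo "dataDir: $data_dir"

rm -f "$cfg"
```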
Step 4: Migrate data on each node
Perform the following on master-1-1 and master-1-3 (followers) first, then on master-1-2 (leader).
For each node:
1. On the Status tab of the ZooKeeper service page, find the node and click Stop in the Actions column. Fill in Execution Reason and click OK, then OK again.
2. Log on to the node in SSH mode and run the following command to copy the data directory and set the correct ownership:

   ```shell
   sudo rm -rf /mnt/disk2/zookeeper && \
   sudo cp -rf /mnt/disk1/zookeeper /mnt/disk2/zookeeper && \
   sudo chown -R hadoop:hadoop /mnt/disk2/zookeeper
   ```

3. On the Status tab, find the node and click Start in the Actions column. Fill in Execution Reason and click OK, then OK again.
4. Refresh the page until Health Status shows Healthy for the node before moving on to the next one.
Migration is complete when all nodes show Health Status as Healthy.
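While waiting for the console to report Healthy, you can also poll the node directly: the `ruok` four-letter command answers `imok` once the server is serving requests (again subject to `4lw.commands.whitelist`). A sketch of the polling loop with a mocked check; in practice, replace `check_node` with `echo ruok | nc <node-host> 2181`:

```shell
# check_node stands in for: echo ruok | nc <node-host> 2181
check_node() { echo imok; }   # mock answer for illustration

status=unhealthy
for attempt in 1 2 3 4 5; do
  if [ "$(check_node)" = "imok" ]; then
    status=healthy
    break
  fi
  sleep 5   # wait before retrying
done
echo "node is $status"
```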