This topic describes how to adjust the settings of related parameters to optimize the performance of the Hadoop Distributed File System (HDFS) balancer in E-MapReduce.

Question:

How does the HDFS balancer work? How can I optimize the performance of the HDFS balancer?

Answer:

HDFS provides a balancer utility that analyzes block placement and balances data across DataNodes. HDFS also relies on data locality, which moves computation close to the data, so a balanced cluster performs better. For example, after you add a large number of empty DataNodes to a Hadoop cluster, run the balancer promptly so that data, and therefore computation, is spread across the new nodes. This improves the performance of the Hadoop cluster. To run the HDFS balancer, perform the following steps:

  1. Log on to a DataNode in the target cluster.

  2. Run the following commands to switch to the HDFS user and run the HDFS balancer:

    su hdfs
    /usr/lib/hadoop-current/sbin/start-balancer.sh -threshold 10

  3. Run either of the following commands to check whether the HDFS balancer is running:

    less /var/log/hadoop-hdfs/hadoop-hdfs-balancer-emr-header-xx.cluster-xxx.log

    or

    tailf /var/log/hadoop-hdfs/hadoop-hdfs-balancer-emr-header-xx.cluster-xxx.log
    Note: If the command output includes Successfully, the HDFS balancer is running.
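The log check in step 3 can also be scripted. The following is a minimal sketch and not part of E-MapReduce: `balancer_log_status` is a hypothetical helper name, and the log path is passed as an argument because the actual file name depends on your cluster. It only counts the lines that contain Successfully, as described in the note above.

```shell
# Hypothetical helper: report whether a balancer log contains "Successfully".
# Usage: balancer_log_status /path/to/balancer.log
balancer_log_status() {
  bls_log_file="$1"
  if [ ! -r "$bls_log_file" ]; then
    echo "unreadable"
    return 1
  fi
  # grep -c prints the number of matching lines; it exits with 1 when the
  # count is 0, so "|| true" keeps the helper usable under "set -e".
  bls_hits=$(grep -c 'Successfully' "$bls_log_file" || true)
  if [ "$bls_hits" -gt 0 ]; then
    echo "running ($bls_hits matching lines)"
  else
    echo "no matching lines"
  fi
}
```

Call the helper with the log file of your own cluster, for example the file that you opened with less or tailf in step 3.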

    The following table describes the HDFS balancer parameters.

    Parameter Description
    threshold

    The maximum allowed difference, in percent, between the storage utilization of a DataNode and the average storage utilization of the cluster. The default value is 10, which means that the storage utilization of each DataNode may differ from the average storage utilization of the cluster by at most 10 percentage points.

    Decrease the value when the storage utilization of the cluster is high, and increase the value when the storage utilization of the cluster is low. This improves the cluster performance. After you add a large number of DataNodes to the cluster, you can specify a high threshold first. In this way, the HDFS balancer efficiently moves data from overutilized DataNodes to underutilized DataNodes. After the HDFS balancer runs for a while, decrease the threshold to involve more DataNodes in load balancing. This increases the number of concurrent threads and guarantees efficient load balancing.
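To make the threshold rule concrete, the following sketch classifies DataNodes against the cluster average for a given threshold. It is a simplified model for illustration, not the balancer's own logic: `classify_nodes` and the input format (one line per node with name, used space, and capacity) are assumptions.

```shell
# classify_nodes THRESHOLD
# Reads "name used capacity" lines on stdin (same unit for used and capacity)
# and prints "name utilization% status" for each node.
classify_nodes() {
  awk -v t="$1" '
    { name[NR] = $1; used[NR] = $2; cap[NR] = $3; tu += $2; tc += $3 }
    END {
      avg = 100 * tu / tc                 # cluster-wide utilization in percent
      for (i = 1; i <= NR; i++) {
        u = 100 * used[i] / cap[i]        # utilization of this node in percent
        if (u > avg + t)      s = "over"
        else if (u < avg - t) s = "under"
        else                  s = "within"
        printf "%s %.1f %s\n", name[i], u, s
      }
    }'
}
```

With three equally sized nodes at 90%, 50%, and 10% utilization, a threshold of 10 marks the first node as over and the last as under. With a threshold of 45, all three fall within the allowed range, so fewer nodes take part in balancing.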

    dfs.datanode.balance.max.concurrent.moves

    The maximum number of concurrent threads on a DataNode used by the HDFS balancer for moving blocks. The default value is 5.

    Generally, set the value based on the number of disks on the DataNode. We recommend that you set the value to four times the number of disks on the DataNode side, and to the number of disks on the HDFS balancer side. In this way, the HDFS balancer can schedule concurrent moves within the DataNode limit.

    Assume that a DataNode has 28 disks. Set the value of this parameter to 28 on the HDFS balancer side, and set the value of this parameter to 112 (28 x 4) on the DataNode side. Adjust the value based on the cluster load. Increase the value when the cluster load is low, and decrease the value when the cluster load is high. If the maximum numbers of concurrent threads set on the DataNode and HDFS balancer sides are different, the smaller one prevails.
    Note: After you set this parameter for a DataNode, restart the DataNode for the parameter setting to take effect.

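The sizing rule for this parameter is plain arithmetic. The sketch below computes both recommended values from a disk count, and the limit that applies when the two sides differ. The function names are hypothetical.

```shell
# concurrent_moves DISKS
# Prints the suggested value for the balancer side (number of disks)
# and for the DataNode side (four times the number of disks).
concurrent_moves() {
  cm_disks="$1"
  echo "balancer=$cm_disks datanode=$((cm_disks * 4))"
}

# effective_moves BALANCER_VALUE DATANODE_VALUE
# Prints the smaller of the two values, which is the one that takes effect.
effective_moves() {
  if [ "$1" -lt "$2" ]; then echo "$1"; else echo "$2"; fi
}
```

For the 28-disk example, `concurrent_moves 28` prints `balancer=28 datanode=112`, and `effective_moves 28 112` prints `28`.
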
    dfs.balancer.dispatcherThreads

    The number of dispatcher threads used by the HDFS balancer to decide which blocks to move. Before the HDFS balancer moves a certain amount of data between two DataNodes, it repeatedly gets block lists for moving blocks until the required amount of data is scheduled.

    Note: The default value is 200.

    dfs.balancer.rpc.per.sec

    The number of remote procedure calls (RPCs) sent by dispatcher threads per second. The default value is 20.

    Before the HDFS balancer moves a certain amount of data between two DataNodes, it uses dispatcher threads to repeatedly send getBlocks() RPCs to the NameNode, which can put a heavy load on the NameNode. To avoid this issue, we recommend that you set this parameter to limit the number of RPCs sent per second. For example, you can lower the value to 10 or 5, which has only a slight effect on block moves in the cluster.

    dfs.balancer.getBlocks.size

    The total data size of the blocks returned in each block list. Before the HDFS balancer moves a certain amount of data between two DataNodes, it repeatedly gets block lists until the required amount of data is scheduled. By default, the total size of the blocks in each block list is 2 GB. When the NameNode receives a getBlocks() RPC, the NameNode is locked. If an RPC queries a large number of blocks, the NameNode is locked for a long time, which slows down data writing. To avoid this issue, we recommend that you set this parameter based on the NameNode load. A smaller value shortens each lock.

    dfs.balancer.moverThreads

    The number of threads used by the HDFS balancer to move blocks. The default value is 1000.

    Each block move requires a thread. This parameter limits the number of total concurrent moves for balancing in the entire cluster.

    dfs.namenode.balancer.request.standby

    Specifies whether the HDFS balancer sends RPCs to query the standby NameNode for blocks to be moved. The default value is false.

    When the NameNode receives a getBlocks() RPC, the NameNode is locked. If an RPC queries a large number of blocks, the NameNode is locked for a long time, which slows down data writing. In a high-availability (HA) cluster, you can set this parameter to true so that the HDFS balancer sends getBlocks() RPCs only to the standby NameNode and does not lock the active NameNode.

    dfs.balancer.getBlocks.min-block-size

    The minimum size of blocks to be queried by the getBlocks() RPC. The default value is 10 MB. After you set this parameter, the getBlocks() RPC skips blocks smaller than the minimum size. This improves the query efficiency.

    dfs.balancer.max-iteration-time

    The maximum duration of each iteration for moving blocks between two DataNodes. The default value is 1200000, in milliseconds (20 minutes).

    After the duration of an iteration exceeds this limit, the HDFS balancer enters the next iteration.

    dfs.balancer.block-move.timeout

    The timeout period for a block move. The default value is 0, in milliseconds, which indicates that block moves do not time out.

    When the HDFS balancer moves blocks, an iteration may last for a long time because some block moves are not complete. Set this parameter to a value greater than 0 so that the HDFS balancer stops waiting for such block moves.
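Balancer-side parameters from this table can be supplied for a single run. The sketch below only builds and prints a command line so that you can review it before running it. It assumes that `hdfs balancer` accepts Hadoop generic `-D` options placed before `-threshold`. `build_balancer_cmd` is a hypothetical helper, and the property values are the examples and defaults quoted in the table.

```shell
# build_balancer_cmd THRESHOLD RPC_PER_SEC
# Prints a balancer command line; it does not run anything.
# Assumption: generic -D options are accepted and must precede -threshold.
build_balancer_cmd() {
  bbc_threshold="$1"
  bbc_rpc="$2"
  printf '%s' 'hdfs balancer'
  printf ' -Ddfs.balancer.rpc.per.sec=%s' "$bbc_rpc"        # limit NameNode RPC load
  printf ' -Ddfs.balancer.moverThreads=%s' 1000              # default from the table
  printf ' -Ddfs.balancer.max-iteration-time=%s' 1200000     # 20 minutes in milliseconds
  printf ' -threshold %s\n' "$bbc_threshold"
}
```

For example, `build_balancer_cmd 10 5` prints a command that limits the dispatcher threads to 5 RPCs per second and keeps the other values at the defaults listed in the table.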

    The following table describes the DataNode parameters.

    Parameter Description
    dfs.datanode.balance.bandwidthPerSec

    The maximum bandwidth, in bytes per second, that each DataNode can use to balance the cluster. The default value is 1 MB/s.

    We recommend that you set this parameter to a value greater than 100 MB/s so that the cluster can be balanced quickly. You can also adjust the value based on the cluster load: increase the bandwidth when the cluster load is low, and decrease the bandwidth when the cluster load is high.

    Alternatively, run the hdfs dfsadmin -setBalancerBandwidth command to specify the bandwidth. Changing the bandwidth with this command does not require restarting DataNodes.
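The `hdfs dfsadmin -setBalancerBandwidth` command takes a value in bytes per second, so a target expressed in megabytes must be converted first. The sketch below shows the conversion. `mb_to_bytes` is a hypothetical helper, and the 100 MB/s target is an assumption chosen for illustration.

```shell
# mb_to_bytes MEGABYTES
# Converts a bandwidth in MB/s to bytes per second (1 MB = 1024 * 1024 bytes).
mb_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

# Example (run as the hdfs user on a cluster node; not executed here):
#   hdfs dfsadmin -setBalancerBandwidth "$(mb_to_bytes 100)"
```

`mb_to_bytes 100` prints `104857600`, which is the value to pass to the command for a limit of 100 MB/s.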

    dfs.datanode.balance.max.concurrent.moves

    The maximum number of concurrent threads on a DataNode used by the HDFS balancer for moving blocks.