This topic describes how to troubleshoot the issue that files cannot be closed when data is being written to Hadoop Distributed File System (HDFS).

Error message

java.io.IOException: Unable to close file because the last block xxx:xxx does not have enough number of replicas.

Cause

In most cases, this error occurs because DataNodes are under heavy write load and cannot report newly written blocks to the NameNode in time. As a result, the last block of the file does not reach the required number of replicas before the client exhausts its retries to close the file.

Solution

We recommend that you refer to the following instructions to resolve the issue:
  • View the configurations of the HDFS service.
    Check the value of the dfs.client.block.write.locateFollowingBlock.retries parameter in the hdfs-site.xml file. This parameter specifies the number of times the client retries to close a file after data is written to its data blocks. By default, the system tries to close a file five times within 30 seconds. We recommend that you set this parameter to 8, which indicates that the system tries to close a file eight times within 2 minutes. You can further increase the value of this parameter for a cluster with a high load. For a client-side example, see the first sketch after this list.
    Note If you increase the value of the dfs.client.block.write.locateFollowingBlock.retries parameter, the wait time before the system closes a file is prolonged when a node is busy, but data writing is not affected.
  • Check whether the cluster has only a small number of DataNodes but a large number of task nodes. If a large number of jobs are submitted concurrently, a large number of JAR files are uploaded to HDFS at the same time, which may place heavy load on the DataNodes. In this case, you can increase the value of the dfs.client.block.write.locateFollowingBlock.retries parameter or increase the number of DataNodes.
  • Check whether jobs that consume a large amount of DataNode resources exist in the cluster. For example, Flink checkpoints create and delete a large number of small files, which places heavy load on the DataNodes. In such scenarios, you can run Flink on an independent cluster so that its checkpoints are written to a separate HDFS cluster. You can also use the Object Storage Service (OSS) Connector or OSS-HDFS Connector so that checkpoints are written to OSS instead of the cluster's HDFS. For an example of pointing Flink checkpoints at OSS, see the second sketch after this list.
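The following is a minimal client-side sketch of the first method. It assumes a Hadoop 2.x or 3.x Java client whose default file system points to the affected HDFS cluster; the class name and output path are placeholders. Setting the property in the hdfs-site.xml file on the client or gateway node has the same effect.

// Minimal sketch: raise the close-retry count on the HDFS client side.
// Assumption: fs.defaultFS points to the HDFS cluster; /tmp/example.txt is a placeholder path.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteWithMoreCloseRetries {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Retry closing a file up to 8 times (default: 5) while waiting for the
        // last block to reach the required number of replicas.
        conf.setInt("dfs.client.block.write.locateFollowingBlock.retries", 8);

        FileSystem fs = FileSystem.get(conf);
        try (FSDataOutputStream out = fs.create(new Path("/tmp/example.txt"))) {
            out.writeBytes("hello hdfs");
        } // close() now waits longer before throwing "Unable to close file ...".
    }
}

Because this is a client-side property, it takes effect per application and does not require a restart of the HDFS service.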
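The following is a minimal sketch of the Flink approach, assuming Flink 1.13 or later with an OSS file system plugin already configured; the bucket name and checkpoint path are placeholders.

// Minimal sketch: write Flink checkpoints to OSS instead of the cluster's HDFS.
// Assumption: an OSS file system plugin (for example, flink-oss-fs-hadoop) is enabled;
// oss://my-bucket/flink/checkpoints is a placeholder location.
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointToOss {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000L); // trigger a checkpoint every 60 seconds
        // Store checkpoint data in OSS so that it does not load the HDFS DataNodes.
        env.getCheckpointConfig().setCheckpointStorage("oss://my-bucket/flink/checkpoints");
        // ... define sources, transformations, and sinks, then call env.execute() ...
    }
}

With this layout, the small-file traffic generated by checkpoints goes to OSS, and the HDFS DataNodes serve only regular job data.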