This topic provides answers to some frequently asked questions about Flume.

What do I do if the number of logs that are written to Hive is less than the number of logs that are generated?

  • Problem description: Fewer logs are written to Hive by Flume than are generated.
  • Solution: Add the hdfs.batchSize parameter in the EMR console. For more information, see Manage parameters for services. HDFS Sink uses the hdfs.batchSize parameter to specify the number of events that are written to a file before the file is flushed to HDFS. If the hdfs.batchSize parameter is not specified, the file is flushed to HDFS each time 100 events are written to it. As a result, data may not be flushed to HDFS in time, and recently generated logs may be missing from Hive.
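
After the parameter is added, it can help to confirm which value an agent's sink actually picks up. The following is a minimal sketch, assuming that the effective configuration is a standard Flume properties file with a line such as `a1.sinks.k1.hdfs.batchSize = 1000`. The agent name `a1`, the sink name `k1`, the value, and the file path are hypothetical placeholders, and the helper `batch_size_of` is introduced here only for illustration.

```shell
# Print the hdfs.batchSize configured for one sink of one agent, or 100
# (the default described above) when the property is not set.
# Only "key = value" lines are handled; commented-out lines are ignored.
# Usage: batch_size_of <conf-file> <agent-name> <sink-name>
batch_size_of() {
    conf=$1
    # Escape the dots so that sed matches the property key literally.
    key=$(printf '%s' "$2.sinks.$3.hdfs.batchSize" | sed 's/\./\\./g')
    # Keep the last assignment of the key and trim trailing whitespace.
    value=$(sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$conf" \
        | tail -n 1 | sed 's/[[:space:]]*$//')
    echo "${value:-100}"
}
```

For example, `batch_size_of /path/to/flume-conf.properties a1 k1` prints the configured value, where the path is a placeholder for wherever your cluster stores the Flume configuration.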

What do I do if a DeadLock error occurs when I terminate the Flume process?

  • Problem description: When you invoke the exit method to terminate the Flume process, a DeadLock error occasionally occurs.
  • Solution: Run the kill -9 command to forcibly terminate the Flume process.
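
Because a forced termination can leave File Channel in an inconsistent state, one option is to attempt a normal termination first and fall back to kill -9 only if the process is still alive after a timeout. The following is a sketch of that idea, not a procedure prescribed by Flume or EMR. The function name `stop_with_fallback` and the 30-second default timeout are assumptions, and how you obtain the Flume process ID depends on your environment.

```shell
# Send SIGTERM first, then escalate to kill -9 only if the process is
# still running after the timeout.
# Usage: stop_with_fallback <pid> [timeout_seconds]
stop_with_fallback() {
    pid=$1
    timeout=${2:-30}
    # If the signal cannot be delivered, the process is already gone
    # (or cannot be signaled by this user); nothing more to do.
    kill "$pid" 2>/dev/null || return 0
    waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            # Graceful shutdown did not finish in time: force termination.
            kill -9 "$pid" 2>/dev/null || true
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
}
```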

How do I handle the occasional exception that occurs on File Channel after I run the kill -9 command to forcibly terminate the Flume process?

  • Problem 1
    • Problem description: File Channel is used. After you run the kill -9 command to forcibly terminate the Flume process, Flume cannot obtain the lock on a File Channel directory and fails to restart. The following error message appears:
      Due to java.io.IOException: Cannot lock data/checkpoints/xxx. The directory is already locked.
    • Solution: Delete the in_use.lock file from the directory that is named in the error message before you restart Flume. We recommend that you run the kill -9 command only when necessary.
  • Problem 2
    • Problem description: File Channel is used. After you run the kill -9 command to forcibly terminate the Flume process, Flume cannot parse the files in the data directories and fails to restart. The following error message appears:
      org.apache.flume.channel.file.CorruptEventException: Could not parse event from data file.
    • Solution: Delete the checkpoint directory and the data directories of the File Channel before you restart Flume. We recommend that you run the kill -9 command only when necessary.
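
The two cleanup steps above can be sketched as shell helpers. This is an illustrative sketch under stated assumptions, to be run only after the Flume process has stopped and as the user that runs Flume. The function names are hypothetical, and the directories that you pass must be the checkpoint and data directories configured for your File Channel. For Problem 2, the sketch deliberately renames the directories instead of deleting them, which differs from the solution above: events that were stored in the channel and not yet delivered are no longer processed either way, but the renamed copies remain available for inspection.

```shell
# Problem 1: remove stale in_use.lock files left behind by kill -9.
# Usage: remove_stale_locks <checkpoint-dir> <data-dir>...
remove_stale_locks() {
    for dir in "$@"; do
        rm -f "${dir%/}/in_use.lock"
    done
}

# Problem 2: set aside File Channel directories that can no longer be
# parsed, then recreate them empty so that Flume starts from a clean state.
# Renaming instead of deleting is a conservative variation (an assumption).
# Usage: set_aside_channel_dirs <checkpoint-dir> <data-dir>...
set_aside_channel_dirs() {
    suffix=$(date +%Y%m%d%H%M%S)
    for dir in "$@"; do
        dir=${dir%/}                        # tolerate a trailing slash
        [ -d "$dir" ] || continue           # skip directories that do not exist
        mv "$dir" "$dir.corrupt.$suffix"    # keep the old contents for inspection
        mkdir -p "$dir"
    done
}
```

Once Flume has restarted successfully and the renamed directories are no longer needed, they can be deleted to free disk space.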