This topic provides answers to some frequently asked questions about Flume.

What do I do if the number of logs that are written to Hive is less than the number of logs that are generated?

  • Problem description: The number of logs that are written to Hive by using Flume is less than the number of logs that are generated.
  • Solution: In the EMR console, add the hdfs.batchSize parameter and set it to a value smaller than the default so that data is flushed to HDFS more often. For more information, see Add parameters. HDFS Sink uses the hdfs.batchSize parameter to specify how many events are written to a file before the file is flushed to HDFS. If hdfs.batchSize is not specified, the default value of 100 is used. The file is then flushed to HDFS only once every 100 events, so data is not updated in time.
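The following Python sketch is one way to confirm which hdfs.batchSize value each HDFS sink actually uses. It assumes that the agent configuration is available as plain Apache Flume properties text. The agent name a1, the sink and channel names, the paths, and the value 10 are hypothetical placeholders, not recommended settings. A sink that does not set hdfs.batchSize is reported with the Flume default of 100.

```python
import re
from typing import Dict

# Default of hdfs.batchSize for the HDFS sink, per the Apache Flume user guide.
DEFAULT_HDFS_BATCH_SIZE = 100
HDFS_SINK_CLASS = "org.apache.flume.sink.hdfs.HDFSEventSink"

# Hypothetical agent configuration: a1, c1, k1, k2, the paths, and 10 are placeholders.
SAMPLE_CONFIG = """
a1.channels = c1
a1.sinks = k1 k2
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /user/hive/warehouse/example_table
a1.sinks.k1.hdfs.batchSize = 10
a1.sinks.k2.type = hdfs
a1.sinks.k2.channel = c1
a1.sinks.k2.hdfs.path = /tmp/flume/example
"""


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple 'key = value' lines, skipping blank lines and # comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def effective_batch_sizes(text: str) -> Dict[str, int]:
    """Map '<agent>.<sink>' to the hdfs.batchSize in effect for every HDFS sink."""
    props = parse_properties(text)
    sizes: Dict[str, int] = {}
    for key, value in props.items():
        match = re.fullmatch(r"([^.]+)\.sinks\.([^.]+)\.type", key)
        if match is None:
            continue
        if value.lower() != "hdfs" and value != HDFS_SINK_CLASS:
            continue  # not an HDFS sink
        agent, sink = match.groups()
        configured = props.get(f"{agent}.sinks.{sink}.hdfs.batchSize")
        # A sink without an explicit value falls back to the Flume default.
        sizes[f"{agent}.{sink}"] = int(configured) if configured else DEFAULT_HDFS_BATCH_SIZE
    return sizes


print(effective_batch_sizes(SAMPLE_CONFIG))  # {'a1.k1': 10, 'a1.k2': 100}
```

In this example, sink k2 is the one that would still wait for 100 events per flush.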

What do I do if a DeadLock error occurs when I terminate the Flume process?

  • Problem description: When you invoke the exit method to terminate the Flume process, a DeadLock error occasionally occurs.
  • Solution: Run the kill -9 command to forcibly terminate the Flume process.
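If you need to locate the agent process before you kill it, the following Python sketch shows one approach on Linux. It assumes that the agent was started by the flume-ng script, whose main class is org.apache.flume.node.Application, and that /proc is available. The function names are illustrative and are not part of Flume. force_kill sends SIGKILL, which is the same signal that kill -9 sends.

```python
import os
import signal
from pathlib import Path
from typing import List

# Main class that the flume-ng launcher starts for an agent.
FLUME_AGENT_CLASS = b"org.apache.flume.node.Application"


def is_flume_agent(cmdline: bytes) -> bool:
    """Return True if a NUL-separated /proc/<pid>/cmdline contains the agent main class."""
    return FLUME_AGENT_CLASS in cmdline.split(b"\0")


def find_flume_pids(proc_root: str = "/proc") -> List[int]:
    """Scan a Linux /proc tree for Flume agent JVMs; return [] if it does not exist."""
    root = Path(proc_root)
    if not root.is_dir():
        return []
    pids: List[int] = []
    for entry in root.iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/self
        try:
            cmdline = (entry / "cmdline").read_bytes()
        except OSError:
            continue  # the process exited or its cmdline is not readable
        if is_flume_agent(cmdline):
            pids.append(int(entry.name))
    return sorted(pids)


def force_kill(pid: int) -> None:
    """Send SIGKILL to the process, equivalent to `kill -9 <pid>`."""
    os.kill(pid, signal.SIGKILL)
```

For example, `for pid in find_flume_pids(): force_kill(pid)` run with sufficient privileges terminates every agent it finds. Because this is a forced kill, File Channel may be left in an inconsistent state.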

How do I handle exceptions that occasionally occur in File Channel after I run the kill -9 command to forcibly terminate the Flume process?

  • Problem 1
    • Problem description: File Channel is used. After you run the kill -9 command to forcibly terminate the Flume process, Flume fails to obtain a directory lock and cannot be restarted. The following error message appears:
      Due to java.io.IOException: Cannot lock data/checkpoints/xxx. The directory is already locked.
    • Solution: Delete the in_use.lock file from the directory that is reported in the error message, and then restart Flume. We recommend that you run the kill -9 command only when necessary.
  • Problem 2
    • Problem description: File Channel is used. After you run the kill -9 command to forcibly terminate the Flume process, Flume fails to parse events from the files in the data directories and cannot be restarted. The following error message appears:
      org.apache.flume.channel.file.CorruptEventException: Could not parse event from data file.
    • Solution: Delete the checkpoint directory and the data directories before you restart Flume. We recommend that you run the kill -9 command only when necessary.
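Both recovery steps can be scripted. The following Python sketch assumes that Flume is already stopped and that you pass in the directories configured by the File Channel checkpointDir and dataDirs properties (dataDirs is a comma-separated list). The helper names are hypothetical. remove_lock_files covers Problem 1 and delete_channel_dirs covers Problem 2. Deleting the directories also discards any events that are still stored in the channel.

```python
import shutil
from pathlib import Path
from typing import Iterable, List

# Name of the lock file that File Channel keeps in each directory it uses.
LOCK_FILE_NAME = "in_use.lock"


def channel_dirs(checkpoint_dir: str, data_dirs: str) -> List[str]:
    """Combine the checkpointDir value with the comma-separated dataDirs value."""
    dirs = [checkpoint_dir]
    dirs.extend(d.strip() for d in data_dirs.split(",") if d.strip())
    return dirs


def remove_lock_files(dirs: Iterable[str]) -> List[Path]:
    """Problem 1: delete stale in_use.lock files and return the paths removed."""
    removed: List[Path] = []
    for d in dirs:
        lock = Path(d) / LOCK_FILE_NAME
        if lock.is_file():
            lock.unlink()
            removed.append(lock)
    return removed


def delete_channel_dirs(dirs: Iterable[str]) -> None:
    """Problem 2: delete the checkpoint and data directories entirely.

    Any events still buffered in the channel are lost.
    """
    for d in dirs:
        path = Path(d)
        if path.exists():
            shutil.rmtree(path)
```

For example, with hypothetical paths, `remove_lock_files(channel_dirs("/mnt/flume/checkpoint", "/mnt/flume/data1,/mnt/flume/data2"))` removes the lock files before you restart Flume.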