Apache Kafka is widely used in scenarios such as log collection and monitoring data aggregation, and supports both offline and streaming data processing as well as real-time data analysis. This topic describes how to import data from Kafka to JindoFileSystem (JindoFS).

Import methods

  • Use Flume

    Apache Flume is a distributed service for collecting and moving large amounts of data into Hadoop Distributed File System (HDFS). We recommend that you use Flume to import data from Kafka to JindoFS. To do so, configure an HDFS sink: set the sink's type parameter to hdfs and its hdfs.path parameter to a directory in JindoFS:

    a1.sinks = emr-jfs
    ...
    # Use the standard HDFS sink; JindoFS paths work through the HDFS interface.
    a1.sinks.emr-jfs.type = hdfs
    # Write to a JindoFS directory, partitioned by topic and date.
    a1.sinks.emr-jfs.hdfs.path = jfs://emr-jfs/kafka/%{topic}/%y-%m-%d
    # Roll files every 10 seconds; disable size-based and count-based rolling.
    a1.sinks.emr-jfs.hdfs.rollInterval = 10
    a1.sinks.emr-jfs.hdfs.rollSize = 0
    a1.sinks.emr-jfs.hdfs.rollCount = 0
    # Write raw event bodies instead of SequenceFile-wrapped records.
    a1.sinks.emr-jfs.hdfs.fileType = DataStream
  • Call a Kafka API

    Compute engines such as MapReduce and Spark can call Kafka APIs to export data from Kafka to HDFS. Because JindoFS is accessed through the HDFS interface, you need only to set the export path to a directory in JindoFS.
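    As an illustration, the following is a minimal Spark Structured Streaming sketch that consumes a Kafka topic and writes it to a JindoFS directory. It assumes the spark-sql-kafka connector is on the classpath; the broker address, topic name, and the emr-jfs namespace are placeholders for your own values.

    ```scala
    import org.apache.spark.sql.SparkSession

    object KafkaToJindoFS {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("kafka-to-jindofs")
          .getOrCreate()

        // Read from Kafka; the broker address and topic name are placeholders.
        val df = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "my-topic")
          .load()

        // Write message values as text to a JindoFS directory,
        // exactly as you would write to any HDFS path.
        df.selectExpr("CAST(value AS STRING) AS value")
          .writeStream
          .format("text")
          .option("path", "jfs://emr-jfs/kafka/my-topic")
          .option("checkpointLocation", "jfs://emr-jfs/checkpoints/my-topic")
          .start()
          .awaitTermination()
      }
    }
    ```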

  • Use Kafka HDFS Connector

    You can also use the Kafka HDFS Connector to export data from Kafka to HDFS. Set the sink's export path to a directory in JindoFS.
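    A minimal sink configuration might look like the following sketch, which assumes Confluent's HDFS sink connector; the connector name, topic, and flush size are placeholders.

    ```properties
    name=jindofs-sink
    connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
    tasks.max=1
    topics=my-topic
    # Point the HDFS URL at the JindoFS namespace instead of an HDFS namenode.
    hdfs.url=jfs://emr-jfs
    # Number of records to accumulate before writing a file.
    flush.size=1000
    ```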