Stream Kafka Data into JindoFS with Flume & Kafka HDFS Connector - E-MapReduce

To build log collection, monitoring aggregation, and real-time analysis pipelines on E-MapReduce (EMR), you can import data from Apache Kafka into JindoFileSystem (JindoFS). Three methods are supported: Apache Flume, Kafka API (MapReduce or Spark), and Kafka HDFS Connector.

Choose an import method

Method	Best for	Configuration effort
Apache Flume (recommended)	General-purpose log and event ingestion	Low — configure a sink in the Flume agent file
Kafka API (MapReduce or Spark)	Jobs already using MapReduce or Spark for processing	Low — set the output path to a JindoFS directory
Kafka HDFS Connector	Kafka Connect-based pipelines	Medium — configure the connector sink path

Use Apache Flume

Apache Flume moves data into Hadoop Distributed File System (HDFS) through its HDFS sink. Because JindoFS exposes an HDFS-compatible interface, you can point the HDFS sink at a JindoFS path. Only the sink configuration in the agent file changes.

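In the Flume agent configuration, set the sink's type property to hdfs and its hdfs.path property to a directory in JindoFS. In the following example, %{topic} resolves to the value of the event's topic header and %y-%m-%d to the event date. Files roll every 10 seconds, because rollSize = 0 and rollCount = 0 disable size-based and count-based rolling: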

a1.sinks = emr-jfs
...
a1.sinks.emr-jfs.type = hdfs
a1.sinks.emr-jfs.hdfs.path = jfs://emr-jfs/kafka/%{topic}/%y-%m-%d
a1.sinks.emr-jfs.hdfs.rollInterval = 10
a1.sinks.emr-jfs.hdfs.rollSize = 0
a1.sinks.emr-jfs.hdfs.rollCount = 0
a1.sinks.emr-jfs.hdfs.fileType = DataStream
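
The sink settings above are only part of an agent definition. A complete agent file might look like the following minimal sketch, which assumes a Kafka source and a memory channel. The component names kafka-src and mem-ch, the consumer group flume-jfs, the channel capacities, and the <broker-host> and <your-topic> placeholders are illustrative and do not come from the original configuration.

```properties
# Minimal sketch of a full agent file. kafka-src and mem-ch are
# hypothetical names chosen for illustration only.
a1.sources = kafka-src
a1.channels = mem-ch
a1.sinks = emr-jfs

# Kafka source: consumes events from the listed topics and, by default,
# records the topic name in the "topic" header used by %{topic} below.
a1.sources.kafka-src.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.kafka-src.kafka.bootstrap.servers = <broker-host>:9092
a1.sources.kafka-src.kafka.topics = <your-topic>
a1.sources.kafka-src.kafka.consumer.group.id = flume-jfs
a1.sources.kafka-src.channels = mem-ch

# Memory channel: buffers events between the source and the sink.
# The capacities are assumed values; size them for your workload.
a1.channels.mem-ch.type = memory
a1.channels.mem-ch.capacity = 10000
a1.channels.mem-ch.transactionCapacity = 1000

# HDFS sink pointed at JindoFS, as in the snippet above.
a1.sinks.emr-jfs.type = hdfs
a1.sinks.emr-jfs.channel = mem-ch
a1.sinks.emr-jfs.hdfs.path = jfs://emr-jfs/kafka/%{topic}/%y-%m-%d
a1.sinks.emr-jfs.hdfs.rollInterval = 10
a1.sinks.emr-jfs.hdfs.rollSize = 0
a1.sinks.emr-jfs.hdfs.rollCount = 0
a1.sinks.emr-jfs.hdfs.fileType = DataStream
```

If this were saved as jfs-agent.conf (a hypothetical file name), the agent could be started with flume-ng agent --conf conf --conf-file jfs-agent.conf --name a1. A memory channel loses buffered events if the agent stops, so a file channel may be a better fit where durability matters.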

Use a Kafka API

Engines such as MapReduce and Spark can read from Kafka through the Kafka API and write the results to HDFS. To write to JindoFS instead, keep the job's HDFS output logic and set the output path to a JindoFS directory:

jfs://emr-jfs/<your-output-path>

Replace <your-output-path> with the target directory in JindoFS.

Use Kafka HDFS Connector

The Kafka HDFS Connector exports data from Kafka topics to HDFS. To write to JindoFS, set the sink export path to a JindoFS directory:

jfs://emr-jfs/<your-output-path>

Replace <your-output-path> with the target directory in JindoFS.
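
As an illustration, a connector properties file could look like the sketch below. It assumes that the connector in use is Confluent's kafka-connect-hdfs sink and that the JindoFS client libraries are available on the Kafka Connect worker's classpath. The connector name jfs-sink, the <your-topic> placeholder, and the flush.size value are illustrative and not taken from the original text.

```properties
# Sketch only: assumes Confluent's HDFS sink connector (kafka-connect-hdfs).
# The connector name, topic placeholder, and flush.size are illustrative.
name=jfs-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=<your-topic>

# File system URL and top-level output directory. Together they give
# the export path jfs://emr-jfs/<your-output-path>.
hdfs.url=jfs://emr-jfs
topics.dir=<your-output-path>

# Number of records written to a file before it is committed.
flush.size=1000
```

Under these assumptions, the connector resolves the file system from the URL scheme, so the only JindoFS-specific settings are hdfs.url and topics.dir. Newer connector releases may prefer store.url over hdfs.url, so check the documentation for the version you deploy.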