
E-MapReduce: Import data from Kafka to JindoFS

Last Updated: Mar 26, 2026

To build log collection, monitoring aggregation, and real-time analysis pipelines on E-MapReduce (EMR), you can import data from Apache Kafka into JindoFileSystem (JindoFS). Three methods are supported: Apache Flume, Kafka API (MapReduce or Spark), and Kafka HDFS Connector.

Choose an import method

| Method | Best for | Configuration effort |
| --- | --- | --- |
| Apache Flume (recommended) | General-purpose log and event ingestion | Low: configure a sink in the Flume agent file |
| Kafka API (MapReduce or Spark) | Jobs already using MapReduce or Spark for processing | Low: set the output path to a JindoFS directory |
| Kafka HDFS Connector | Kafka Connect-based pipelines | Medium: configure the connector sink path |

Use Apache Flume

Apache Flume moves data into Hadoop Distributed File System (HDFS). Because JindoFS exposes an HDFS-compatible interface, you can point the Flume HDFS sink at a JindoFS path without modifying Flume itself; only the sink settings in the agent configuration file change.

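In the agent configuration file, set the sink's type property to hdfs and its hdfs.path property to a directory in JindoFS. In the following example the sink is named emr-jfs, so the full property names are a1.sinks.emr-jfs.type and a1.sinks.emr-jfs.hdfs.path: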

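a1.sinks = emr-jfs
...
# Use the HDFS sink; JindoFS is reached through its HDFS-compatible interface.
a1.sinks.emr-jfs.type = hdfs
# Target directory in JindoFS, partitioned by the event's topic header and date.
a1.sinks.emr-jfs.hdfs.path = jfs://emr-jfs/kafka/%{topic}/%y-%m-%d
# Roll to a new file every 10 seconds.
a1.sinks.emr-jfs.hdfs.rollInterval = 10
# 0 disables rolling by file size and by event count.
a1.sinks.emr-jfs.hdfs.rollSize = 0
a1.sinks.emr-jfs.hdfs.rollCount = 0
# Write events as a plain, uncompressed stream instead of a SequenceFile.
a1.sinks.emr-jfs.hdfs.fileType = DataStream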
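
To make the escape sequences in hdfs.path concrete, the following Python sketch approximates how a pattern such as jfs://emr-jfs/kafka/%{topic}/%y-%m-%d resolves for a single event. It is an illustration only, not Flume's implementation. The function name resolve_sink_path, the app-logs topic, and the use of UTC are assumptions made for this example; Flume itself takes the time from the event's timestamp header and applies the agent's time zone settings.

```python
import re
from datetime import datetime, timezone


def resolve_sink_path(pattern: str, headers: dict, timestamp_ms: int) -> str:
    """Approximate Flume's expansion of %{header} and date escapes in hdfs.path.

    Hypothetical helper for illustration; it is not part of Flume.
    """
    # Replace each %{name} with the value of the event header of that name.
    with_headers = re.sub(r"%\{(\w+)\}", lambda m: headers[m.group(1)], pattern)
    # Expand %y (two-digit year), %m (month), and %d (day) from the event
    # timestamp. UTC is an assumption made here for reproducibility.
    when = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
    return when.strftime(with_headers)


# An event from the (hypothetical) topic app-logs, stamped 2026-03-26 12:00 UTC.
example_path = resolve_sink_path(
    "jfs://emr-jfs/kafka/%{topic}/%y-%m-%d",
    {"topic": "app-logs"},
    1774526400000,
)
print(example_path)  # jfs://emr-jfs/kafka/app-logs/26-03-26
```

With this pattern, each topic gets its own directory and each day gets its own subdirectory, which keeps files from different topics and dates apart.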

Use a Kafka API

Jobs that run on engines such as MapReduce and Spark can call the Kafka API to read from Kafka and write the results to HDFS. To write to JindoFS instead, keep the HDFS output logic in your job and set the output path to a JindoFS directory:

jfs://emr-jfs/<your-output-path>

Replace <your-output-path> with the target directory in JindoFS.
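
As one possible illustration for Spark, the following PySpark sketch reads a topic with Structured Streaming and writes the records as text files under a JindoFS path. It is a sketch under stated assumptions, not a prescribed method. It assumes that the Spark Kafka integration package is on the classpath and that the cluster resolves the jfs:// scheme. The helper names, the kafka/<topic> output layout, and the checkpoints directory are hypothetical.

```python
def jindofs_path(namespace: str, *parts: str) -> str:
    """Build a jfs:// URI such as jfs://emr-jfs/kafka/app-logs."""
    cleaned = [p.strip("/") for p in parts if p.strip("/")]
    return "jfs://" + "/".join([namespace, *cleaned])


def start_kafka_to_jindofs(bootstrap_servers: str, topic: str):
    """Start a streaming query that copies a Kafka topic to JindoFS."""
    # Imported lazily so the path helper can be used without PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("kafka-to-jindofs").getOrCreate()
    records = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", topic)
        .load()
        # The text sink expects a single string column.
        .selectExpr("CAST(value AS STRING) AS value")
    )
    return (
        records.writeStream.format("text")
        # The only JindoFS-specific part: the output path uses the jfs:// scheme.
        .option("path", jindofs_path("emr-jfs", "kafka", topic))
        # Hypothetical checkpoint location, also kept in JindoFS.
        .option("checkpointLocation", jindofs_path("emr-jfs", "checkpoints", topic))
        .start()
    )
```

Calling start_kafka_to_jindofs with your broker list and topic name returns a running query. The rest of the job is the same as one that writes to HDFS.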

Use Kafka HDFS Connector

The Kafka HDFS Connector exports data from Kafka topics to HDFS. To write to JindoFS, set the sink export path to a JindoFS directory:

jfs://emr-jfs/<your-output-path>

Replace <your-output-path> with the target directory in JindoFS.
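
The following Python sketch builds a connector definition of the kind that can be submitted to the Kafka Connect REST API. It assumes that the connector in use is the Confluent HDFS sink connector, whose hdfs.url property holds the target file system URL, and that the JindoFS client libraries are available to the Connect workers. The connector name, topic, and the kafka-connect directory are hypothetical values chosen for this example.

```python
import json


def hdfs_sink_config(name: str, topics: list, jindofs_url: str, flush_size: int = 1000) -> dict:
    """Return a Kafka Connect connector definition that writes to JindoFS.

    Assumes the Confluent HDFS sink connector; adjust the property names
    if your connector differs.
    """
    return {
        "name": name,
        "config": {
            "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
            "tasks.max": "1",
            "topics": ",".join(topics),
            # Point the sink at a JindoFS directory instead of an hdfs:// URL.
            "hdfs.url": jindofs_url,
            # Number of records written before a file is committed.
            "flush.size": str(flush_size),
        },
    }


# Hypothetical connector name, topic, and target directory.
payload = hdfs_sink_config("jindofs-sink", ["app-logs"], "jfs://emr-jfs/kafka-connect")
print(json.dumps(payload, indent=2))
```

The printed JSON can be saved to a file and posted to the connectors endpoint of a Kafka Connect worker.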