To build log collection, monitoring data aggregation, and real-time analysis pipelines on E-MapReduce (EMR), you can import data from Apache Kafka into JindoFileSystem (JindoFS). Three methods are supported: Apache Flume, a Kafka API called from MapReduce or Spark, and the Kafka HDFS Connector.
Choose an import method
| Method | Best for | Configuration effort |
|---|---|---|
| Apache Flume (recommended) | General-purpose log and event ingestion | Low — configure a sink in the Flume agent file |
| Kafka API (MapReduce or Spark) | Jobs already using MapReduce or Spark for processing | Low — set the output path to a JindoFS directory |
| Kafka HDFS Connector | Kafka Connect-based pipelines | Medium — configure the connector sink path |
Use Apache Flume
Apache Flume moves data into Hadoop Distributed File System (HDFS). Because JindoFS exposes an HDFS-compatible interface, you can direct the Flume HDFS sink to a JindoFS path without modifying the Flume agent.
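The HDFS sink path can contain Flume escape sequences, so a pattern such as jfs://emr-jfs/kafka/%{topic}/%y-%m-%d places each topic and day in its own JindoFS directory. The Python sketch below only illustrates how such a pattern expands. It is not Flume code: the function name expand_sink_path is hypothetical, it handles only %{header}, %y, %m, and %d, it assumes every event carries topic and timestamp headers, and it uses UTC where Flume would by default use the agent's local time zone.

```python
import re
from datetime import datetime, timezone

# Matches %{headerName} or one of the date escapes %y, %m, %d.
_ESCAPE = re.compile(r"%\{([^}]+)\}|%([ymd])")


def expand_sink_path(pattern, headers):
    """Expand a Flume-style path pattern for one event (illustrative only).

    headers is a dict of event headers; "timestamp" is assumed to hold
    milliseconds since the epoch, as a string or an int.
    """
    event_time = datetime.fromtimestamp(
        int(headers["timestamp"]) / 1000, tz=timezone.utc
    )

    def substitute(match):
        header_name, date_code = match.groups()
        if header_name is not None:
            # %{name} is replaced by the value of the event header "name".
            return str(headers[header_name])
        # %y = two-digit year, %m = month, %d = day of month.
        return event_time.strftime("%" + date_code)

    return _ESCAPE.sub(substitute, pattern)
```

For an event from a hypothetical topic named orders with a timestamp on 5 March 2024, the pattern above expands to jfs://emr-jfs/kafka/orders/24-03-05.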
In the Flume agent configuration file, set the sink's type to hdfs and its hdfs.path to a directory in JindoFS. The sink in this example is named emr-jfs:
a1.sinks = emr-jfs
...
a1.sinks.emr-jfs.type = hdfs
a1.sinks.emr-jfs.hdfs.path = jfs://emr-jfs/kafka/%{topic}/%y-%m-%d
a1.sinks.emr-jfs.hdfs.rollInterval = 10
a1.sinks.emr-jfs.hdfs.rollSize = 0
a1.sinks.emr-jfs.hdfs.rollCount = 0
a1.sinks.emr-jfs.hdfs.fileType = DataStream
Use a Kafka API
Engines such as MapReduce and Spark can call a Kafka API to read from Kafka and then write the data to HDFS. To write to JindoFS instead, use the HDFS interfaces in your job as usual and set the output path to a JindoFS directory:
jfs://emr-jfs/<your-output-path>
Replace <your-output-path> with the target directory in JindoFS.
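As a rough illustration of the Spark variant, the following PySpark Structured Streaming sketch reads a topic and writes the record values as text files under a JindoFS directory. It is a minimal sketch under several assumptions: the Spark Kafka integration package (spark-sql-kafka-0-10) is available to the job, the cluster resolves jfs:// URIs, and the helper names jindofs_path and kafka_to_jindofs as well as all argument values are hypothetical.

```python
def jindofs_path(namespace, relative_path):
    """Build a jfs:// URI from a JindoFS namespace and a relative path."""
    return f"jfs://{namespace}/{relative_path.strip('/')}"


def kafka_to_jindofs(bootstrap_servers, topic, output_dir, checkpoint_dir):
    """Stream one Kafka topic into text files under output_dir."""
    # Imported lazily so the path helper works without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("kafka-to-jindofs").getOrCreate()

    # Read the topic as a stream and keep only the message value as a string.
    records = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", topic)
        .load()
        .selectExpr("CAST(value AS STRING) AS value")
    )

    # The only JindoFS-specific part is that both paths use the jfs:// scheme.
    query = (
        records.writeStream.format("text")
        .option("path", output_dir)
        .option("checkpointLocation", checkpoint_dir)
        .start()
    )
    query.awaitTermination()
```

A job submitted with spark-submit could then call kafka_to_jindofs with its own broker list and topic, passing jindofs_path("emr-jfs", ...) for both the output and the checkpoint directory.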
Use Kafka HDFS Connector
The Kafka HDFS Connector exports data from Kafka topics to HDFS. To write to JindoFS, set the sink export path to a JindoFS directory:
jfs://emr-jfs/<your-output-path>
Replace <your-output-path> with the target directory in JindoFS.
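To make this more concrete, the sketch below builds a connector definition of the kind that can be submitted to the Kafka Connect REST API (POST /connectors). It assumes the Confluent HDFS sink connector, that its hdfs.url setting accepts a jfs:// URI, and that the JindoFS client libraries are on the Connect worker's classpath; none of this is confirmed by the text above. The function name and all argument values are hypothetical.

```python
import json


def hdfs_sink_connector_config(name, topics, store_url, topics_dir, flush_size=1000):
    """Return a Kafka Connect connector definition as a plain dict."""
    return {
        "name": name,
        "config": {
            # Assumption: the Confluent HDFS sink connector is installed.
            "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
            "tasks.max": "1",
            "topics": ",".join(topics),
            # Assumption: a jfs:// URI is accepted in place of an hdfs:// URL.
            "hdfs.url": store_url,
            # Top-level directory under store_url where topic data is written.
            "topics.dir": topics_dir,
            # Number of records written to a file before it is committed.
            "flush.size": str(flush_size),
        },
    }


def to_request_body(definition):
    """Serialize the definition to the JSON body for POST /connectors."""
    return json.dumps(definition, indent=2)
```

Keeping the definition as a dict makes it easy to review the JindoFS-specific values (hdfs.url and topics.dir) before sending it to the Connect worker.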