Apache Flume is a distributed, reliable, and highly available system that collects, aggregates, and moves large amounts of log data to centralized storage. Various data sources are supported. Flume is built around agents: each agent contains one or more sources, channels, and sinks.
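To illustrate how sources, channels, and sinks fit together, the following is a minimal agent definition sketch. The agent name (a1), component names, and the port are hypothetical placeholders; the netcat source, memory channel, and logger sink are standard Flume component types used here only for demonstration.

```properties
# Hypothetical agent "a1" with one source, one channel, and one sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: accepts newline-terminated text over TCP (placeholder port)
a1.sources.r1.type = netcat
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 44444

# Channel: buffers events in memory between the source and the sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

# Sink: writes events to the agent log (for demonstration only)
a1.sinks.k1.type = logger

# Wire the components together
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```

In a production topology, the logger sink would be replaced with a persistent sink such as HDFS, and the memory channel with a file channel if durability across agent restarts is required.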

In EMR V3.19.0 and later, you can configure and manage Flume agents in the E-MapReduce (EMR) console.

The following figure shows a typical Flume agent topology.
Notice You can adjust the topology based on your business requirements. For more information about topology configuration, see Synchronize audit logs to HDFS.
EMR Flume can be used to collect data from data sources such as EMR Kafka clusters and Alibaba Cloud Log Service. You can write the collected data to persistent storage, such as Hadoop Distributed File System (HDFS), Hive, HBase, or Object Storage Service (OSS). Examples:
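As one sketch of such a flow, the configuration below wires a Kafka source through a file channel to an HDFS sink. The broker address, topic name, consumer group, and HDFS path are placeholder values, not EMR defaults; the component types shown are the standard Apache Flume Kafka source and HDFS sink.

```properties
# Hypothetical agent "a1": Kafka source -> file channel -> HDFS sink
a1.sources = kafka-src
a1.channels = ch1
a1.sinks = hdfs-sink

# Kafka source: broker address and topic are placeholders
a1.sources.kafka-src.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.kafka-src.kafka.bootstrap.servers = kafka-broker:9092
a1.sources.kafka-src.kafka.topics = app-logs
a1.sources.kafka-src.kafka.consumer.group.id = flume-group

# File channel: persists buffered events to disk for reliability
a1.channels.ch1.type = file

# HDFS sink: path is a placeholder; %Y%m%d expands from event timestamps
a1.sinks.hdfs-sink.type = hdfs
a1.sinks.hdfs-sink.hdfs.path = /flume/app-logs/%Y%m%d
a1.sinks.hdfs-sink.hdfs.fileType = DataStream
a1.sinks.hdfs-sink.hdfs.useLocalTimeStamp = true

# Wiring
a1.sources.kafka-src.channels = ch1
a1.sinks.hdfs-sink.channel = ch1
```

Writing to OSS instead of HDFS follows the same pattern with the sink path pointed at an OSS location, provided the cluster is configured with OSS access.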