This topic describes how to use Flume to write and consume log data. You can use the aliyun-log-flume plug-in to connect LogHub of Log Service to Flume, and then write log data to or consume log data from Log Service.

After you connect LogHub to Flume, you can connect Log Service through Flume to other data systems, such as Hadoop Distributed File System (HDFS) and Kafka. Flume provides plug-ins for data systems such as HDFS, Kafka, Hive, HBase, and Elasticsearch, and the Flume community offers plug-ins for other common data sources. The aliyun-log-flume plug-in connects LogHub with Flume by providing the following two plug-ins:
  • Sink: Flume reads data from other data sources and writes the data to LogHub.
  • Source: Flume consumes log data from LogHub and writes the data to other systems.
For more information, visit GitHub.

LogHub Sink

You can use the LogHub Sink plug-in to transmit data from other data sources to LogHub through Flume. Data can be parsed into either of the following formats before it is written:
  • SIMPLE: writes an entire Flume event to LogHub as a single field.
  • DELIMITED: splits a Flume event by a delimiter, parses the event into fields based on the configured column names, and then writes the fields to LogHub.
The following table describes the relevant parameters.
| Parameter | Required | Description |
| --- | --- | --- |
| type | Yes | Set this parameter to com.aliyun.loghub.flume.sink.LoghubSink. |
| endpoint | Yes | The endpoint of Log Service. |
| project | Yes | The name of the project. |
| logstore | Yes | The name of the Logstore. |
| accessKeyId | Yes | The AccessKey ID of the Alibaba Cloud account. |
| accessKey | Yes | The AccessKey secret of the Alibaba Cloud account. |
| batchSize | No | The number of data entries that are written to LogHub at a time. Default value: 1000. |
| maxBufferSize | No | The size of the cache queue. Default value: 1000. |
| serializer | No | The serialization format of events. Valid values: SIMPLE (the default), DELIMITED, or the fully qualified class name of a custom serializer. If you set this parameter to DELIMITED, you must also set the columns parameter. |
| columns | No | The column names. Required if you set the serializer parameter to DELIMITED. Separate multiple column names with commas (,), and sort them in the same order as the fields in the log data. |
| separatorChar | No | The delimiter, which must be a single character. Takes effect only if the serializer parameter is set to DELIMITED. Default value: a comma (,). |
| quoteChar | No | The quote character. Takes effect only if the serializer parameter is set to DELIMITED. Default value: a double quotation mark ("). |
| escapeChar | No | The escape character. Takes effect only if the serializer parameter is set to DELIMITED. Default value: a double quotation mark ("). |
| useRecordTime | No | Specifies whether to use the value of the timestamp field as the log time. If you set this parameter to false, the current time is used. Default value: false. |
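
For reference, the following snippet is a minimal sketch of a Flume agent configuration that writes data to LogHub through the LogHub Sink plug-in. The agent, source, and channel names, the tailed file path, the endpoint, the project and Logstore names, the AccessKey pair, and the column list are placeholders for illustration; replace them with the values of your own environment.

```
# Minimal sketch of a Flume agent (named "agent") that tails a local
# file and writes each line to LogHub in DELIMITED format.
agent.sources = fileSource
agent.channels = memChannel
agent.sinks = loghubSink

# Example upstream source: tail a local log file. Any Flume source works here.
agent.sources.fileSource.type = exec
agent.sources.fileSource.command = tail -F /var/log/app.log
agent.sources.fileSource.channels = memChannel

# In-memory channel between the source and the sink.
agent.channels.memChannel.type = memory
agent.channels.memChannel.capacity = 10000

# LogHub Sink: fill in your endpoint, project, Logstore, and AccessKey pair.
agent.sinks.loghubSink.type = com.aliyun.loghub.flume.sink.LoghubSink
agent.sinks.loghubSink.endpoint = cn-hangzhou.log.aliyuncs.com
agent.sinks.loghubSink.project = your-project
agent.sinks.loghubSink.logstore = your-logstore
agent.sinks.loghubSink.accessKeyId = YOUR_ACCESS_KEY_ID
agent.sinks.loghubSink.accessKey = YOUR_ACCESS_KEY_SECRET
# Parse each event into the listed fields by delimiter before writing.
agent.sinks.loghubSink.serializer = DELIMITED
agent.sinks.loghubSink.columns = time,method,url,status
agent.sinks.loghubSink.separatorChar = ,
agent.sinks.loghubSink.channel = memChannel
```

You can then start the agent with the standard Flume launcher, for example: bin/flume-ng agent --conf conf --conf-file loghub-sink.conf --name agent.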

LogHub Source

You can use the LogHub Source plug-in to transmit data from LogHub to other data systems through Flume. Data can be output in either of the following formats:
  • DELIMITED: writes each log to Flume as one row of delimiter-separated fields.
  • JSON: writes each log to Flume as a JSON object.
The following table describes the relevant parameters.
| Parameter | Required | Description |
| --- | --- | --- |
| type | Yes | Set this parameter to com.aliyun.loghub.flume.source.LoghubSource. |
| endpoint | Yes | The endpoint of Log Service. |
| project | Yes | The name of the project. |
| logstore | Yes | The name of the Logstore. |
| accessKeyId | Yes | The AccessKey ID of the Alibaba Cloud account. |
| accessKey | Yes | The AccessKey secret of the Alibaba Cloud account. |
| heartbeatIntervalMs | No | The heartbeat interval between the Flume client and LogHub. Unit: milliseconds. Default value: 30000. |
| fetchIntervalMs | No | The interval at which data is pulled from LogHub. Unit: milliseconds. Default value: 100. |
| fetchInOrder | No | Specifies whether to consume log data in order. Default value: false. |
| batchSize | No | The number of data entries that are read at a time. Default value: 100. |
| consumerGroup | No | The name of the consumer group. If you do not set this parameter, a random name is generated. |
| initialPosition | No | The position from which data starts to be read. Valid values: begin, end, and timestamp. Default value: begin. Note: If a checkpoint exists on the server, the checkpoint is used instead. |
| timestamp | No | The Unix timestamp from which data starts to be read. You must set this parameter if you set the initialPosition parameter to timestamp. |
| deserializer | Yes | The deserialization format of events. Valid values: DELIMITED (the default), JSON, or the fully qualified class name of a custom deserializer. If you set this parameter to DELIMITED, you must also set the columns parameter. |
| columns | No | The column names. Required if you set the deserializer parameter to DELIMITED. Separate multiple column names with commas (,), and sort them in the same order as the fields in the log data. |
| separatorChar | No | The delimiter, which must be a single character. Takes effect only if the deserializer parameter is set to DELIMITED. Default value: a comma (,). |
| quoteChar | No | The quote character. Takes effect only if the deserializer parameter is set to DELIMITED. Default value: a double quotation mark ("). |
| escapeChar | No | The escape character. Takes effect only if the deserializer parameter is set to DELIMITED. Default value: a double quotation mark ("). |
| appendTimestamp | No | Specifies whether to append the timestamp as a field to the end of each row. Takes effect only if the deserializer parameter is set to DELIMITED. Default value: false. |
| sourceAsField | No | Specifies whether to add the log source as a field named __source__. Takes effect only if the deserializer parameter is set to JSON. Default value: false. |
| tagAsField | No | Specifies whether to add log tags as fields named in the format __tag__:{tag name}. Takes effect only if the deserializer parameter is set to JSON. Default value: false. |
| timeAsField | No | Specifies whether to add the log time as a field named __time__. Takes effect only if the deserializer parameter is set to JSON. Default value: false. |
| useRecordTime | No | Specifies whether to use the log time of the record. If you set this parameter to false, the current time is used. Default value: false. |
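
The following snippet is a minimal sketch of a Flume agent configuration that consumes data from LogHub through the LogHub Source plug-in and prints each event with the built-in logger sink. The agent, source, and channel names, the endpoint, the project and Logstore names, the AccessKey pair, the consumer group name, and the column list are placeholders; replace them with the values of your own environment.

```
# Minimal sketch: consume a Logstore and print events with the logger sink.
agent.sources = loghubSource
agent.channels = memChannel
agent.sinks = loggerSink

# LogHub Source: fill in your endpoint, project, Logstore, and AccessKey pair.
agent.sources.loghubSource.type = com.aliyun.loghub.flume.source.LoghubSource
agent.sources.loghubSource.endpoint = cn-hangzhou.log.aliyuncs.com
agent.sources.loghubSource.project = your-project
agent.sources.loghubSource.logstore = your-logstore
agent.sources.loghubSource.accessKeyId = YOUR_ACCESS_KEY_ID
agent.sources.loghubSource.accessKey = YOUR_ACCESS_KEY_SECRET
# Emit each log as one delimited row with the listed columns.
agent.sources.loghubSource.deserializer = DELIMITED
agent.sources.loghubSource.columns = time,method,url,status
agent.sources.loghubSource.consumerGroup = flume-consumer-group
# Read from the earliest data unless a server-side checkpoint exists.
agent.sources.loghubSource.initialPosition = begin
agent.sources.loghubSource.channels = memChannel

# In-memory channel between the source and the sink.
agent.channels.memChannel.type = memory
agent.channels.memChannel.capacity = 10000

# Logger sink prints events to the Flume log for verification only.
agent.sinks.loggerSink.type = logger
agent.sinks.loggerSink.channel = memChannel
```

The logger sink is useful only for verifying the setup; in production, replace it with a downstream sink such as HDFS or Kafka to move the consumed log data into that system.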