This topic describes how to connect Log Service to Flume by using the aliyun-log-flume plug-in, so that you can write log data to Log Service or consume log data from Log Service.
Background information
The aliyun-log-flume plug-in provides sinks and sources that connect Log Service to Flume. After Log Service is connected to Flume, Log Service can exchange data with other systems, such as Hadoop Distributed File System (HDFS) and Kafka, by using Flume.
- Sink: reads data from other data sources and writes the data to Log Service.
- Source: consumes log data from Log Service and writes the log data to other systems.
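To illustrate how the pieces fit together, the following is a minimal sketch of a Flume agent that consumes log data from Log Service and writes it to HDFS. The agent, source, channel, and sink names (a1, r1, c1, k1) and the HDFS path are placeholders, and the Loghub connection parameters are omitted here; they are described in the tables below.

```
# Minimal sketch of a Flume agent that ships Log Service data to HDFS.
# The names a1, r1, c1, and k1 are placeholders.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Loghub source: consumes log data from Log Service
# (connection parameters omitted; see the Source table below).
a1.sources.r1.type = com.aliyun.loghub.flume.source.LoghubSource
a1.sources.r1.channels = c1

# Standard Flume memory channel that buffers events between source and sink.
a1.channels.c1.type = memory

# Standard Flume HDFS sink that writes the consumed events to HDFS.
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/loghub
a1.sinks.k1.channel = c1
```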
Procedure
Sink
You can configure a sink to write data from other data sources to Log Service by using Flume. The following parsing modes are supported:
- SIMPLE: A Flume event is written to Log Service as a field.
- DELIMITED: A Flume event is parsed into fields based on the configured column names and written to Log Service.
The following table describes the parameters of a sink. A minimal configuration sketch is provided after the table. For a complete configuration example, visit GitHub.
| Parameter | Required | Description |
| --- | --- | --- |
| type | Yes | The type of the sink. Set the value to com.aliyun.loghub.flume.sink.LoghubSink. |
| endpoint | Yes | The endpoint of the Log Service project. Example: http://cn-qingdao.log.aliyuncs.com. Enter an endpoint based on your business requirements. For more information, see Endpoints. |
| project | Yes | The name of the project. |
| logstore | Yes | The name of the Logstore. |
| accessKeyId | Yes | The AccessKey ID provided by Alibaba Cloud. The AccessKey ID is used to identify the user. To ensure the security of your account, we recommend that you use the AccessKey pair of a RAM user. For more information about how to obtain an AccessKey pair, see AccessKey pair. |
| accessKey | Yes | The AccessKey secret provided by Alibaba Cloud. The AccessKey secret is used to authenticate the user. To ensure the security of your account, we recommend that you use the AccessKey pair of a RAM user. For more information about how to obtain an AccessKey pair, see AccessKey pair. |
| batchSize | No | The number of data entries that are written to Log Service at a time. Default value: 1000. |
| maxBufferSize | No | The maximum number of data entries in the cache queue. Default value: 1000. |
| serializer | No | The serialization mode of Flume events. Valid values: SIMPLE and DELIMITED. |
| columns | No | The columns. If you set the serializer parameter to DELIMITED, you must configure this parameter. Separate multiple columns with commas (,). The columns must be sorted in the same order as the fields in the data entries. |
| separatorChar | No | The delimiter, which must be a single character. This parameter takes effect only if you set the serializer parameter to DELIMITED. By default, commas (,) are used. |
| quoteChar | No | The quote character. This parameter takes effect only if you set the serializer parameter to DELIMITED. By default, double quotation marks (") are used. |
| escapeChar | No | The escape character. This parameter takes effect only if you set the serializer parameter to DELIMITED. By default, double quotation marks (") are used. |
| useRecordTime | No | Specifies whether to use the value of the timestamp field in the data entries as the log time when data is written to Log Service. Default value: false, which indicates that the current time is used as the log time. |
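Based on the parameters in the table, the following is a minimal sketch of a sink configuration that parses events in DELIMITED mode. The agent and sink names (a1, k1), the column names, and the values in angle brackets are placeholders, not values defined by the plug-in.

```
# Minimal sink sketch; a1 and k1 are placeholder names,
# and all values in angle brackets must be replaced.
a1.sinks.k1.type = com.aliyun.loghub.flume.sink.LoghubSink
a1.sinks.k1.endpoint = http://cn-qingdao.log.aliyuncs.com
a1.sinks.k1.project = <your-project>
a1.sinks.k1.logstore = <your-logstore>
a1.sinks.k1.accessKeyId = <your-accesskey-id>
a1.sinks.k1.accessKey = <your-accesskey-secret>
a1.sinks.k1.batchSize = 1000

# DELIMITED mode: parse each event into the listed fields by delimiter.
# The column names here are hypothetical examples.
a1.sinks.k1.serializer = DELIMITED
a1.sinks.k1.columns = time,ip,method,status
a1.sinks.k1.separatorChar = ,
a1.sinks.k1.channel = c1
```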
Source
You can configure a source to ship log data from Log Service to other systems by using Flume. The following writing modes are supported:
- DELIMITED: Log data is written to Flume in delimiter mode.
- JSON: Log data is written to Flume in JSON mode.
The following table describes the parameters of a source. A minimal configuration sketch is provided after the table. For a complete configuration example, visit GitHub.
| Parameter | Required | Description |
| --- | --- | --- |
| type | Yes | The type of the source. Set the value to com.aliyun.loghub.flume.source.LoghubSource. |
| endpoint | Yes | The endpoint of the Log Service project. Example: http://cn-qingdao.log.aliyuncs.com. Enter an endpoint based on your business requirements. For more information, see Endpoints. |
| project | Yes | The name of the project. |
| logstore | Yes | The name of the Logstore. |
| accessKeyId | Yes | The AccessKey ID provided by Alibaba Cloud. The AccessKey ID is used to identify the user. To ensure the security of your account, we recommend that you use the AccessKey pair of a RAM user. For more information about how to obtain an AccessKey pair, see AccessKey pair. |
| accessKey | Yes | The AccessKey secret provided by Alibaba Cloud. The AccessKey secret is used to authenticate the user. To ensure the security of your account, we recommend that you use the AccessKey pair of a RAM user. For more information about how to obtain an AccessKey pair, see AccessKey pair. |
| heartbeatIntervalMs | No | The interval at which the client sends heartbeat messages to Log Service. Unit: milliseconds. Default value: 30000. |
| fetchIntervalMs | No | The interval at which data is read from Log Service. Unit: milliseconds. Default value: 100. |
| fetchInOrder | No | Specifies whether to consume log data in the order in which it is written to Log Service. Default value: false. |
| batchSize | No | The number of data entries that are read at a time. Default value: 100. |
| consumerGroup | No | The name of the consumer group that is used to read log data. |
| initialPosition | No | The starting point from which data is read. Valid values: begin, end, and timestamp. Default value: begin. Note: If a checkpoint exists on Log Service, the checkpoint takes precedence. |
| timestamp | No | The UNIX timestamp. If you set the initialPosition parameter to timestamp, you must configure this parameter. |
| deserializer | Yes | The deserialization mode of events. Valid values: DELIMITED and JSON. |
| columns | No | The columns. If you set the deserializer parameter to DELIMITED, you must configure this parameter. Separate multiple columns with commas (,). The columns must be sorted in the same order as the fields in the data entries. |
| separatorChar | No | The delimiter, which must be a single character. This parameter takes effect only if you set the deserializer parameter to DELIMITED. By default, commas (,) are used. |
| quoteChar | No | The quote character. This parameter takes effect only if you set the deserializer parameter to DELIMITED. By default, double quotation marks (") are used. |
| escapeChar | No | The escape character. This parameter takes effect only if you set the deserializer parameter to DELIMITED. By default, double quotation marks (") are used. |
| appendTimestamp | No | Specifies whether to append the log timestamp as a field to each log entry. This parameter takes effect only if you set the deserializer parameter to DELIMITED. Default value: false. |
| sourceAsField | No | Specifies whether to add the log source as a field named __source__. This parameter takes effect only if you set the deserializer parameter to JSON. Default value: false. |
| tagAsField | No | Specifies whether to add log tags as fields named in the format of __tag__:{tag name}. This parameter takes effect only if you set the deserializer parameter to JSON. Default value: false. |
| timeAsField | No | Specifies whether to add the log time as a field named __time__. This parameter takes effect only if you set the deserializer parameter to JSON. Default value: false. |
| useRecordTime | No | Specifies whether to use the value of the timestamp field in the logs as the log time when log data is read from Log Service. Default value: false, which indicates that the current time is used as the log time. |
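Similarly, the following is a minimal sketch of a source configuration that reads log data in JSON mode through a consumer group. The agent and source names (a1, r1), the consumer group name, and the values in angle brackets are placeholders.

```
# Minimal source sketch; a1 and r1 are placeholder names,
# and all values in angle brackets must be replaced.
a1.sources.r1.type = com.aliyun.loghub.flume.source.LoghubSource
a1.sources.r1.endpoint = http://cn-qingdao.log.aliyuncs.com
a1.sources.r1.project = <your-project>
a1.sources.r1.logstore = <your-logstore>
a1.sources.r1.accessKeyId = <your-accesskey-id>
a1.sources.r1.accessKey = <your-accesskey-secret>

# JSON mode: emit each log entry to Flume as a JSON object.
a1.sources.r1.deserializer = JSON
# Start from the earliest data unless a checkpoint already exists
# for this (hypothetical) consumer group.
a1.sources.r1.consumerGroup = flume-consumer
a1.sources.r1.initialPosition = begin
a1.sources.r1.channels = c1
```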