The logstash-input-datahub plug-in allows you to read data from DataHub. This topic describes how to use the logstash-input-datahub plug-in.

Prerequisites

  • The logstash-input-datahub plug-in is installed.

    For more information, see Install and remove a plug-in.

  • DataHub is activated, a project is created, a topic is created for the project, and data is imported to the topic.

    For more information, see Get started with DataHub.

Use logstash-input-datahub

Create a pipeline by following the instructions in Use configuration files to manage pipelines. When you create the pipeline, configure the pipeline parameters based on the descriptions in the table of the Parameters section. After you save the settings and deploy the pipeline, Logstash is triggered to read data from DataHub and transfer the data to the destination data source.

The following code provides a pipeline configuration example. For more information about the parameters, see Parameters.

input {
    datahub {
        access_id => "Your accessId"
        access_key => "Your accessKey"
        endpoint => "Endpoint"
        project_name => "test_project"
        topic_name => "test_topic"
        interval => 5
        #cursor => {
        #    "0"=>"20000000000000000000000003110091"
        #    "2"=>"20000000000000000000000003110091"
        #    "1"=>"20000000000000000000000003110091"
        #    "4"=>"20000000000000000000000003110091"
        #    "3"=>"20000000000000000000000003110000"
        #}
        shard_ids => []
        pos_file => "/ssd/1/<Logstash cluster ID>/logstash/data/File name"
    }
}
output {
    elasticsearch {
        hosts => ["http://es-cn-mp91cbxsm000c****.elasticsearch.aliyuncs.com:9200"]
        user => "elastic"
        password => "your_password"
        index => "datahubtest"
        document_type => "_doc"
    }
}
Notice By default, Alibaba Cloud Logstash supports data transmission only over the same virtual private cloud (VPC). If source data is on the Internet, configure a Network Address Translation (NAT) gateway for your Logstash cluster to enable the cluster to access the Internet. For more information, see Configure a NAT gateway for data transmission over the Internet.
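To verify that Logstash can read data from DataHub before you write data to Elasticsearch, you can temporarily replace the elasticsearch output with a stdout output. The following configuration is a minimal sketch for testing. The credential, endpoint, project, topic, and pos_file values are placeholders that you must replace with your own values.

input {
    datahub {
        access_id => "Your accessId"
        access_key => "Your accessKey"
        endpoint => "Endpoint"
        project_name => "test_project"
        topic_name => "test_topic"
        interval => 5
        pos_file => "/ssd/1/<Logstash cluster ID>/logstash/data/File name"
    }
}
output {
    # Print each event to the Logstash logs in a readable format for debugging.
    stdout {
        codec => rubydebug
    }
}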

Parameters

The following table describes the parameters supported by logstash-input-datahub.
Parameter Type Required Description
endpoint string Yes The endpoint that is used to access DataHub. For more information, see Endpoints.
access_id string Yes The AccessKey ID of your Alibaba Cloud account.
access_key string Yes The AccessKey secret of your Alibaba Cloud account.
project_name string Yes The name of the DataHub project.
topic_name string Yes The name of the DataHub topic.
retry_times number No The maximum number of retries. The value -1 indicates no limit, the value 0 indicates that no retries are allowed, and a value greater than 0 allows the specified number of retries. Default value: -1.
retry_interval number No The interval between retries. Unit: seconds.
shard_ids array No The IDs of the shards that you want to consume. This parameter is empty by default, which indicates that all shards are consumed.
cursor string No The start point of consumption. This parameter is empty by default, which indicates that shards are consumed from the beginning.
pos_file string Yes The checkpoint file. Checkpoints are preferentially used to resume consumption.
enable_pb boolean No Specifies whether to enable Protocol Buffers (Protobuf) for data transfer. Default value: true. If Protobuf is not supported for data transfer, set the value to false.
compress_method string No The compression algorithm for data transfer. This parameter is empty by default. Valid values: lz4 and deflate.
print_debug_info boolean No Specifies whether to display the debugging information of DataHub. Default value: false. If you set the value to true, a large amount of information is displayed. The information is used only to debug scripts.
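The following input configuration is a sketch that shows how the optional parameters described in the preceding table can be combined. The values are examples only, and the credential, endpoint, project, topic, and pos_file values are placeholders that you must replace with your own values.

input {
    datahub {
        access_id => "Your accessId"
        access_key => "Your accessKey"
        endpoint => "Endpoint"
        project_name => "test_project"
        topic_name => "test_topic"
        interval => 5
        # Consume only shards 0 and 1 instead of all shards.
        shard_ids => ["0", "1"]
        # Allow up to three retries, with 5 seconds between retries.
        retry_times => 3
        retry_interval => 5
        # Disable Protobuf if it is not supported for data transfer.
        enable_pb => false
        # Compress data with the lz4 algorithm during transfer.
        compress_method => "lz4"
        # Display DataHub debugging information. Use only when you debug scripts.
        print_debug_info => true
        pos_file => "/ssd/1/<Logstash cluster ID>/logstash/data/File name"
    }
}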