The logstash-input-oss plug-in is built on Alibaba Cloud Message Service (MNS). When an associated object in Object Storage Service (OSS) is updated, MNS sends a notification to the plug-in, and the plug-in then triggers Alibaba Cloud Logstash to read the latest data from OSS. To have OSS automatically send a notification to MNS whenever an associated object is updated, configure event notification rules on the event notification page of OSS.

Note: logstash-input-oss is an open source plug-in maintained by Alibaba Cloud. For more information, see logstash-input-oss.

Precautions

  • After logstash-input-oss receives a notification from MNS, Logstash synchronizes all data in the associated object.
  • If the name of an associated object ends in .gz or .gzip, Logstash processes the object as a gzip-compressed object. All other objects are processed as plain text.
  • Because objects are read as text, an object in a format that cannot be parsed as text, such as .jar or .bin, may be read as garbled characters.
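
One way to act on these precautions is to keep non-text objects away from the plug-in. The following is a minimal sketch, not a verified configuration. It assumes that the prefix and exclude_pattern parameters (described in the Parameters section) are also applied to objects announced through MNS. The logs/ prefix and the pattern are hypothetical values chosen for illustration.

```
input {
  oss {
    endpoint => "oss-cn-hangzhou-internal.aliyuncs.com"
    bucket => "zl-ossou****"
    access_key_id => "******"
    access_key_secret => "*********"
    # Hypothetical directory that holds only text or gzip-compressed logs.
    prefix => "logs/"
    # Hypothetical Ruby regular expression: skip keys that end in .jar or .bin,
    # which would otherwise be read as garbled text.
    exclude_pattern => "[.](jar|bin)$"
    mns_settings => {
      endpoint => "******.mns.cn-hangzhou-internal.aliyuncs.com"
      queue => "aliyun-es-sample-mns"
    }
  }
}
```

The pattern uses a character class ([.]) instead of a backslash so that it does not depend on how escape sequences are handled in the configuration file.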

Prerequisites

Use logstash-input-oss

Create a pipeline by following the instructions provided in Use configuration files to manage pipelines. When you create the pipeline, configure the pipeline parameters that are described in the table of the Parameters section. After you configure the parameters, save the settings and deploy the pipeline. This way, Logstash can be triggered to obtain data from OSS.

The following sample code shows how to configure a pipeline that obtains data from OSS and writes the data to Alibaba Cloud Elasticsearch:
input {
  oss {
    endpoint => "oss-cn-hangzhou-internal.aliyuncs.com"
    bucket => "zl-ossou****"
    access_key_id => "******"
    access_key_secret => "*********" 
    prefix => "file-sample-prefix"
    mns_settings => {
       endpoint => "******.mns.cn-hangzhou-internal.aliyuncs.com"
       queue => "aliyun-es-sample-mns"
    }
    codec => json {
      charset => "UTF-8"
    }
  }
}

output {
  elasticsearch {
    hosts => ["http://es-cn-***.elasticsearch.aliyuncs.com:9200"]
    index => "aliyun-es-sample"
    user => "elastic"
    password => "changeme"
  }
}
Notice: The MNS endpoint must be an internal endpoint and cannot be prefixed with http. Otherwise, an error is reported.

Parameters

The following table describes the parameters supported by logstash-input-oss.

Parameter Type Required Description
endpoint string Yes The endpoint that is used to access OSS. For more information, see Regions and endpoints.
bucket string Yes The name of the OSS bucket.
access_key_id string Yes The AccessKey ID of your Alibaba Cloud account.
access_key_secret string Yes The AccessKey secret of your Alibaba Cloud account.
prefix string No The prefix that object keys must start with. The value must match the prefix of the name of an object or directory in the OSS bucket and is treated as a literal string, not a regular expression. Use this parameter to have Logstash read data only from one or more specific directories in the OSS bucket.
additional_oss_settings hash No Additional OSS client configurations. You can configure the following parameters:
  • secure_connection_enabled: specifies whether to enable secure connections.
  • max_connections_to_oss: specifies the maximum number of connections to OSS.
delete boolean No Specifies whether to delete processed objects from the original OSS bucket. Valid values:
  • true
  • false (default value)
backup_to_bucket string No The name of the OSS bucket that stores the backups of processed objects.
backup_to_dir string No The local directory that stores the backups of processed files.
backup_add_prefix string No The prefix that is added to the key of an object after the object is processed. The key is the full path of the object in OSS, including the object name. If you back up data to the same or another OSS bucket, you can use this parameter to specify a new folder to store backups.
include_object_properties boolean No Specifies whether to include the properties of an OSS object in [@metadata][oss]. The properties refer to last_modified, content_type, and metadata. Valid values:
  • true
  • false

Regardless of the value of this parameter, [@metadata][oss][key] always exists.

exclude_pattern string No A Ruby regular expression. Objects in the OSS bucket whose keys match the expression are excluded.
mns_settings hash Yes The configurations of MNS. You can configure the following parameters:
  • endpoint: the endpoint that is used to access MNS. The endpoint must be an internal endpoint and cannot be prefixed with http. Otherwise, an error is reported.
  • queue: the name of an MNS queue.
  • poll_interval_seconds: the interval at which the plug-in polls MNS again after a ReceiveMessage request finds no messages in the queue. Default value: 10. Unit: seconds.
  • wait_seconds: the maximum time that a single ReceiveMessage request waits for a message when the plug-in polls MNS. Unit: seconds.

For more information about ReceiveMessage, see ReceiveMessage.
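
The optional parameters in the table can be combined as in the following sketch. The values are illustrative assumptions, not recommendations: the backup bucket name, the processed/ prefix, the connection limit, and the polling values are hypothetical. The filter block uses the standard mutate filter to copy [@metadata][oss][key] into a hypothetical oss_key field, because fields under @metadata are not written by outputs.

```
input {
  oss {
    endpoint => "oss-cn-hangzhou-internal.aliyuncs.com"
    bucket => "zl-ossou****"
    access_key_id => "******"
    access_key_secret => "*********"
    prefix => "file-sample-prefix"
    # Back up processed objects to another bucket under a new prefix
    # (both names are hypothetical), and delete them from the source bucket.
    backup_to_bucket => "example-backup-bucket"
    backup_add_prefix => "processed/"
    delete => true
    # Also expose last_modified, content_type, and metadata in [@metadata][oss].
    include_object_properties => true
    additional_oss_settings => {
      secure_connection_enabled => true
      max_connections_to_oss => 1024
    }
    mns_settings => {
      endpoint => "******.mns.cn-hangzhou-internal.aliyuncs.com"
      queue => "aliyun-es-sample-mns"
      poll_interval_seconds => 10
      wait_seconds => 3
    }
  }
}

filter {
  mutate {
    # Copy the object key into a regular field so that it reaches the output.
    add_field => { "oss_key" => "%{[@metadata][oss][key]}" }
  }
}
```

Setting delete to true removes objects from the source bucket, so test a configuration like this against a non-production bucket first.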

FAQ

  • Q: Why is the logstash-input-oss plug-in developed based on MNS?

    A: A mechanism is required to notify clients of OSS object updates. The events about OSS object updates can be seamlessly written to MNS.

  • Q: Why is the ListObjects API operation of OSS not used to obtain updated objects?

    A: With ListObjects, unprocessed and processed objects must be recorded locally, which consumes local storage. When the local storage consumption is high, the performance of ListObjects decreases. Plug-ins for other object storage systems, such as those from the open source community for Amazon S3, have also replaced ListObjects with a message notification mechanism.