logstash-input-maxcompute plug-in

The logstash-input-maxcompute plug-in allows you to read data from the offline tables of MaxCompute.

Prerequisites

The logstash-input-maxcompute plug-in is installed.
For more information, see Install and remove a plug-in.
Alibaba Cloud MaxCompute is activated, a project is created, a table is created for the project, and data is imported to the table.
For more information, see Prepare and Getting Started.

Use logstash-input-maxcompute

After the prerequisites are met, create a pipeline by following the instructions in Use configuration files to manage pipelines. When you create the pipeline, configure the pipeline parameters as described in the table in the Parameters section, then save the settings and deploy the pipeline. Once the pipeline is deployed, Logstash reads data from MaxCompute and transfers it to the destination data source.

The following code provides a pipeline configuration example. For more information about the parameters, see Parameters.

input {
    maxcompute {
        access_id => "Your accessId"
        access_key => "Your accessKey"
        endpoint => "maxcompute service endpoint"
        project_name => "Your project"
        table_name => "Your table name"
        partition => "pt='p1',dt='d1'"
        thread_num => 1
        dirty_data_file => "/ssd/1/<Logstash cluster ID>/logstash/data/XXXXX.txt"
    }
}

output {
    stdout {
        codec => rubydebug
    }
}
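
The example above prints events to stdout, which is useful for checking that data is being read. As a minimal sketch of sending the same events to a destination data source, the following output block uses the standard Logstash elasticsearch output plug-in. The endpoint, index name, and credentials are hypothetical placeholders, not values from this topic; replace them with the values for your own cluster.

```
output {
    elasticsearch {
        # Hypothetical placeholders: replace with your own cluster values.
        hosts => ["http://<Elasticsearch endpoint>:9200"]
        index => "<Your index name>"
        user => "<Your username>"
        password => "<Your password>"
    }
}
```

Keeping the stdout output alongside this block during initial testing can make it easier to confirm what is being sent.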

Important

By default, Alibaba Cloud Logstash supports data transmission only within the same virtual private cloud (VPC). If the source data is on the Internet, configure a Network Address Translation (NAT) gateway for your Logstash cluster so that the cluster can access the Internet. For more information, see Configure a NAT gateway for data transmission over the Internet.
logstash-input-maxcompute performs a full read of the data in the specified MaxCompute table and partition.

Parameters

The following table describes the parameters supported by logstash-input-maxcompute.

Parameter	Type	Required	Description
`endpoint`	string	Yes	The endpoint that is used to access MaxCompute. For more information, see Endpoints in different regions (Internet).
`access_id`	string	Yes	The AccessKey ID of your Alibaba Cloud account.
`access_key`	string	Yes	The AccessKey secret of your Alibaba Cloud account.
`project_name`	string	Yes	The name of the MaxCompute project.
`table_name`	string	Yes	The name of the MaxCompute table.
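`partition`	string	Yes	The partition to read, specified by the fields that the MaxCompute table is partitioned by and their values. Example: `sale_date='201911'` and `region='hangzhou'`. Separate multiple partition fields with commas, as in the pipeline configuration example.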
`thread_num`	number	Yes	The number of threads. Default value: 1.
`retry_interval`	number	No	The interval between retries. Unit: seconds.
`dirty_data_file`	string	Yes	The path of the file that records logs about processing failures. Note: Set the path to `/ssd/1/<Logstash cluster ID>/logstash/data/`.
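
As a sketch of how the parameters in the table fit together, the following input block reads a two-level partition using the example partition values from the table and sets the optional `retry_interval` parameter. The `retry_interval` value of 5 is an illustrative assumption, not a recommended setting, and the quoted placeholders must be replaced with your own values.

```
input {
    maxcompute {
        access_id => "Your accessId"
        access_key => "Your accessKey"
        endpoint => "maxcompute service endpoint"
        project_name => "Your project"
        table_name => "Your table name"
        # Partition fields are separated by commas.
        partition => "sale_date='201911',region='hangzhou'"
        # Default number of threads.
        thread_num => 1
        # Assumed value for illustration. Unit: seconds.
        retry_interval => 5
        dirty_data_file => "/ssd/1/<Logstash cluster ID>/logstash/data/XXXXX.txt"
    }
}
```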