The logstash-input-maxcompute plug-in reads data from the offline tables of MaxCompute and transfers it to a destination data source.
The plug-in performs a full read every time it runs. Incremental reads are not supported.
Prerequisites
Before you begin, ensure that you have:
- Installed the logstash-input-maxcompute plug-in. For more information, see Install and remove a plug-in.
- Activated Alibaba Cloud MaxCompute, created a project, created a table, and imported data into the table. For more information, see Prepare and Getting Started.
Configure the pipeline
Use a configuration file to define a pipeline that reads from MaxCompute. The following example reads all data from a partitioned MaxCompute table and prints it to stdout for verification:
```
input {
  maxcompute {
    access_id => "Your accessId"
    access_key => "Your accessKey"
    endpoint => "maxcompute service endpoint"
    project_name => "Your project"
    table_name => "Your table name"
    partition => "pt='p1',dt='d1'"
    thread_num => 1
    dirty_data_file => "/ssd/1/<Logstash cluster ID>/logstash/data/XXXXX.txt"
  }
}
output {
  stdout {
    codec => rubydebug
  }
}
```
After configuring the parameters, save and deploy the pipeline. For instructions, see Use configuration files to manage pipelines.
By default, Alibaba Cloud Logstash transmits data only within the same virtual private cloud (VPC). If your MaxCompute data source is accessible over the Internet, configure a Network Address Translation (NAT) gateway for your Logstash cluster first. See Configure a NAT gateway for data transmission over the Internet.
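In production, the stdout output is usually replaced with a real destination. The following sketch, which assumes an Elasticsearch cluster reachable from the Logstash instance (the host, credentials, and index name are placeholders, not values from this document), uses the standard Logstash elasticsearch output plug-in instead:

```
output {
  elasticsearch {
    # Placeholder host and credentials: replace with your cluster's values.
    hosts => ["http://your-es-host:9200"]
    user => "elastic"
    password => "your_password"
    # Write each day's rows to a dated index.
    index => "maxcompute-%{+YYYY.MM.dd}"
  }
}
```

Because the plug-in performs a full read on every run, consider how repeated runs interact with your index naming to avoid duplicate documents.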
Parameters
The following table lists all parameters supported by logstash-input-maxcompute.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| endpoint | string | Yes | The endpoint used to access MaxCompute. |
| access_id | string | Yes | The AccessKey ID of your Alibaba Cloud account. |
| access_key | string | Yes | The AccessKey secret of your Alibaba Cloud account. |
| project_name | string | Yes | The name of the MaxCompute project. |
| table_name | string | Yes | The name of the MaxCompute table. |
| partition | string | Yes | The partition to read, specified as partition field and value pairs. |
| thread_num | number | Yes | The number of threads used to read data. Default value: 1. |
| dirty_data_file | string | Yes | The path of the file that records logs about processing failures. |
| retry_interval | number | No | The interval between retries, in seconds. |
endpoint
Required. Type: string. No default value.
The endpoint used to access MaxCompute. For endpoints by region, see Endpoints in different regions (Internet).
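As an illustration, a public MaxCompute endpoint typically follows the pattern `http://service.<region>.maxcompute.aliyun.com/api`. The region in the following snippet is an assumption for example purposes only; verify the exact value for your region and network type against the region list:

```
# Example only: the cn-hangzhou public endpoint. Use the endpoint for your
# own region and network type (Internet vs. VPC) from the region list.
endpoint => "http://service.cn-hangzhou.maxcompute.aliyun.com/api"
```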
access_id
Required. Type: string. No default value.
The AccessKey ID of your Alibaba Cloud account.
access_key
Required. Type: string. No default value.
The AccessKey secret of your Alibaba Cloud account.
project_name
Required. Type: string. No default value.
The name of the MaxCompute project.
table_name
Required. Type: string. No default value.
The name of the MaxCompute table.
partition
Required. Type: string. No default value.
The partition to read, specified as one or more partition field and value pairs. Example: sale_date='201911' and region='hangzhou'.
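Rendered as a pipeline setting in the comma-separated form used by the sample configuration (the field names here simply restate the example values), this would look like:

```
# Read only the partition where sale_date='201911' and region='hangzhou'.
partition => "sale_date='201911',region='hangzhou'"
```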
thread_num
Required. Type: number. Default value: 1.
The number of threads used to read data.
dirty_data_file
Required. Type: string. No default value.
The path of the file that records logs about processing failures. Set the path to /ssd/1/<Logstash cluster ID>/logstash/data/.
retry_interval
Optional. Type: number. No default value.
The interval between retries, in seconds.