The logstash-input-maxcompute plug-in reads data from the offline tables of MaxCompute and transfers it to a destination data source.
The plug-in performs a full read every time it runs. Incremental reads are not supported.
Prerequisites
Before you begin, ensure that you have:
- Installed the logstash-input-maxcompute plug-in. For more information, see Install and remove a plug-in.
- Activated Alibaba Cloud MaxCompute, created a project, created a table, and imported data into the table. For more information, see Prepare and Getting Started.
Configure the pipeline
Use a configuration file to define a pipeline that reads from MaxCompute. The following example reads all data from a partitioned MaxCompute table and prints it to stdout for verification:
```
input {
  maxcompute {
    access_id => "Your accessId"
    access_key => "Your accessKey"
    endpoint => "maxcompute service endpoint"
    project_name => "Your project"
    table_name => "Your table name"
    partition => "pt='p1',dt='d1'"
    thread_num => 1
    dirty_data_file => "/ssd/1/<Logstash cluster ID>/logstash/data/XXXXX.txt"
  }
}
output {
  stdout {
    codec => rubydebug
  }
}
```
After configuring the parameters, save and deploy the pipeline. For instructions, see Use configuration files to manage pipelines.
By default, Alibaba Cloud Logstash transmits data only within the same virtual private cloud (VPC). If your MaxCompute data source is accessible over the Internet, configure a Network Address Translation (NAT) gateway for your Logstash cluster first. See Configure a NAT gateway for data transmission over the Internet.
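In production, the stdout output is usually replaced with a real destination. The following sketch, which assumes an Elasticsearch cluster reachable from the Logstash instance (the host, credentials, and index name are placeholders, not values from this document), uses the standard Logstash elasticsearch output plug-in instead:

```
output {
  elasticsearch {
    # Placeholder host and credentials: replace with your cluster's values.
    hosts => ["http://your-es-host:9200"]
    user => "elastic"
    password => "your_password"
    # Write each day's rows to a dated index.
    index => "maxcompute-%{+YYYY.MM.dd}"
  }
}
```

Because the plug-in performs a full read on every run, consider how repeated runs interact with your index naming to avoid duplicate documents.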
Parameters
The following table lists all parameters supported by logstash-input-maxcompute.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| endpoint | string | Yes | The endpoint used to access MaxCompute. |
| access_id | string | Yes | The AccessKey ID of your Alibaba Cloud account. |
| access_key | string | Yes | The AccessKey secret of your Alibaba Cloud account. |
| project_name | string | Yes | The name of the MaxCompute project. |
| table_name | string | Yes | The name of the MaxCompute table. |
| partition | string | Yes | The partition to read, specified as partition field and value pairs. |
| thread_num | number | Yes | The number of threads used to read data. Default value: 1. |
| dirty_data_file | string | Yes | The path of the file that records logs about processing failures. |
| retry_interval | number | No | The interval between retries, in seconds. |
endpoint
Required. Type: string. No default value.
The endpoint used to access MaxCompute. For endpoints by region, see Endpoints in different regions (Internet).
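As an illustration, a public MaxCompute endpoint typically follows the pattern `http://service.<region>.maxcompute.aliyun.com/api`. The region in the following snippet is an assumption for example purposes only; verify the exact value for your region and network type against the region list:

```
# Example only: the cn-hangzhou public endpoint. Use the endpoint for your
# own region and network type (Internet vs. VPC) from the region list.
endpoint => "http://service.cn-hangzhou.maxcompute.aliyun.com/api"
```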
access_id
Required. Type: string. No default value.
The AccessKey ID of your Alibaba Cloud account.
access_key
Required. Type: string. No default value.
The AccessKey secret of your Alibaba Cloud account.
project_name
Required. Type: string. No default value.
The name of the MaxCompute project.
table_name
Required. Type: string. No default value.
The name of the MaxCompute table.
partition
Required. Type: string. No default value.
The partition to read, specified as one or more partition field and value pairs. Example: sale_date='201911' and region='hangzhou'.
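Rendered as a pipeline setting in the comma-separated form used by the sample configuration (the field names here simply restate the example values), this would look like:

```
# Read only the partition where sale_date='201911' and region='hangzhou'.
partition => "sale_date='201911',region='hangzhou'"
```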
thread_num
Required. Type: number. Default value: 1.
The number of threads used to read data.
dirty_data_file
Required. Type: string. No default value.
The path of the file that records logs about processing failures. Set the path to /ssd/1/<Logstash cluster ID>/logstash/data/.
retry_interval
Optional. Type: number. No default value.
The interval between retries, in seconds.