logstash-input-sls is a built-in input plug-in for Alibaba Cloud Logstash that pulls log data from Simple Log Service (SLS). It is open source and maintained by Alibaba Cloud.
The plug-in handles distributed consumption, checkpointing, and shard allocation automatically, so you can focus on building the pipeline configuration rather than managing the consumer infrastructure.
Key capabilities
Distributed consumption: Deploy one pipeline per server across multiple servers and have them consume the same Logstore in parallel. All servers share the same consumer_group and consumer_name, with consumer_name_with_ip set to true to make each consumer uniquely identifiable.
High throughput: A single-core CPU can sustain 20 MB/s using the Java consumer group implementation.
Checkpoint-based reliability: The plug-in saves the consumption progress on each server. If a server restarts after a failure, consumption resumes from the last checkpoint.
Automatic shard rebalancing: Shards are redistributed across active consumers whenever a consumer joins or leaves the group.
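As a rough illustration of the rebalancing behavior above, shards can be thought of as dealt round-robin across whichever consumers are currently active. This is a hypothetical sketch, not the plug-in's actual Java consumer group implementation:

```python
# Hypothetical sketch of round-robin shard allocation; the real
# logstash-input-sls plug-in delegates this to the SLS consumer group.
def allocate_shards(shards, consumers):
    """Deal shards round-robin across the currently active consumers."""
    assignment = {c: [] for c in consumers}
    for i, shard in enumerate(sorted(shards)):
        assignment[consumers[i % len(consumers)]].append(shard)
    return assignment

# 10 shards, 3 consumers: shards spread as evenly as possible.
print(allocate_shards(list(range(10)), ["c1", "c2", "c3"]))
# If c3 leaves the group, the same shards redistribute over the remaining two.
print(allocate_shards(list(range(10)), ["c1", "c2"]))
```

When a consumer joins or leaves, re-running the allocation over the new membership yields the redistributed assignment; the real consumer group performs this automatically.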
Prerequisites
Before you begin, ensure that you have:
The logstash-input-sls plug-in installed. See Install and remove a plug-in.
An SLS project and Logstore with data collected. See Quick start.
(If using a RAM user) Consumer group permissions granted to that user. See Use consumer groups to consume data.
Configure the pipeline
Create a pipeline configuration file following the instructions in Use configuration files to manage pipelines. After saving and deploying the pipeline, Alibaba Cloud Logstash starts retrieving data from SLS.
The following example pulls log data from an SLS Logstore and writes it to Alibaba Cloud Elasticsearch:
input {
  logservice {
    endpoint => "your project endpoint"
    access_id => "your access id"
    access_key => "your access key"
    project => "your project name"
    logstore => "your logstore name"
    consumer_group => "consumer group name"
    consumer_name => "consumer name"
    position => "end"
    checkpoint_second => 30
    include_meta => true
    consumer_name_with_ip => true
  }
}
output {
  elasticsearch {
    hosts => ["http://es-cn-***.elasticsearch.aliyuncs.com:9200"]
    index => "<your_index>"
    user => "elastic"
    password => "changeme"
  }
}
Parameters
| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| endpoint | String | Yes | — | The VPC endpoint of the SLS project. See Internal Simple Log Service endpoints. |
| access_id | String | Yes | — | The AccessKey ID with consumer group permissions. See Use consumer groups to consume data. |
| access_key | String | Yes | — | The AccessKey secret with consumer group permissions. See Use consumer groups to consume data. |
| project | String | Yes | — | The name of the SLS project. |
| logstore | String | Yes | — | The name of the Logstore. |
| consumer_group | String | Yes | — | The consumer group name. Can be customized. |
| consumer_name | String | Yes | — | The consumer name. Must be unique within the consumer group; duplicate names cause undefined behavior. |
| position | String | Yes | — | The start position for consumption. Valid values: begin (consume from the earliest data) and end (consume from the latest data). |
| checkpoint_second | Number | No | 30 | The checkpoint interval in seconds. Recommended range: 10–60. Minimum: 10. |
| include_meta | Boolean | No | true | Whether to include log metadata (source, time, tag, and topic) in the input. |
| consumer_name_with_ip | Boolean | No | true | Whether to append the server's IP address to the consumer name. Set to true for distributed consumption so that each consumer is uniquely identifiable. |
Best practices
Distributed consumption
Deploy exactly one pipeline with logstash-input-sls on each server. If multiple pipelines on the same server consume the same Logstore, duplicate data will appear in the output.
For all servers to participate in the same consumer group, configure them with identical consumer_group and consumer_name values, and set consumer_name_with_ip to true. The plug-in appends each server's IP to the consumer name, making every consumer unique and allowing the group to track each server's consumption position separately.
Example: A Logstore has 10 shards ingesting 1 MB/s per shard, consumed by 5 servers that can each process up to 3 MB/s. With one pipeline per server and consumer_name_with_ip set to true, the plug-in allocates 2 shards to each server, so each server processes 2 MB/s, well within its capacity.
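The arithmetic in this example can be checked directly (all values taken from the example above):

```python
shards, per_shard_mb = 10, 1.0   # Logstore: 10 shards at 1 MB/s each
servers, capacity_mb = 5, 3.0    # 5 servers, 3 MB/s processing capacity apiece

shards_per_server = shards // servers                # 2 shards per server
load_per_server = shards_per_server * per_shard_mb   # 2.0 MB/s per server

assert load_per_server <= capacity_mb  # each server runs within capacity
print(shards_per_server, load_per_server)  # 2 2.0
```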
Checkpoint interval
Set checkpoint_second to a value between 10 and 60 seconds. A shorter interval reduces the amount of data re-processed after a server restart, but increases the frequency of checkpoint writes. The default of 30 seconds is appropriate for most workloads.
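The trade-off can be quantified with a simple worst-case bound (an illustrative sketch, not a plug-in API): after a restart, everything consumed since the last saved checkpoint is read again, so the replayed volume is roughly the checkpoint interval times the ingest rate.

```python
def max_replayed_mb(checkpoint_second, ingest_mb_per_s):
    """Worst-case data re-processed after a restart: everything consumed
    since the last saved checkpoint is read again."""
    return checkpoint_second * ingest_mb_per_s

# With the default 30 s interval at 2 MB/s, up to ~60 MB may be replayed;
# shortening the interval to 10 s cuts that to ~20 MB, at the cost of
# three times as many checkpoint writes.
print(max_replayed_mb(30, 2.0), max_replayed_mb(10, 2.0))
```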
Performance benchmark
The following test results show throughput and resource usage on a 4-core Intel Xeon Platinum 8163 @ 2.50 GHz with 8 GB memory running Linux.
Test setup: A Java producer sends log entries (10 key-value pairs, ~500 bytes each) at increasing rates. Logstash consumes from the Logstore using the pipeline configuration below, writing output to Elasticsearch. The test verifies that consumption latency does not increase and consumption speed keeps pace with the ingest rate.
input {
  logservice {
    endpoint => "cn-hangzhou-intranet.log.aliyuncs.com"
    access_id => "***"
    access_key => "***"
    project => "test-project"
    logstore => "logstore1"
    consumer_group => "consumer_group1"
    consumer_name => "consumer1"
    position => "end"
    checkpoint_second => 30
    include_meta => true
    consumer_name_with_ip => true
  }
}
output {
  elasticsearch {
    hosts => ["http://es-cn-***.elasticsearch.aliyuncs.com:9200"]
    index => "myindex"
    user => "elastic"
    password => "changeme"
  }
}
Results:
| Traffic (MB/s) | CPU utilization (%) | Memory usage (GB) |
| --- | --- | --- |
| 2 | 11.3 | 1.3 |
| 4 | 21.0 | 1.3 |
| 8 | 41.5 | 1.3 |
| 16 | 83.3 | 1.3 |
| 32 | 170.3 | 1.3 |
CPU usage scales linearly with traffic. Memory usage remains constant at 1.3 GB across all traffic levels.
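The linear scaling can be verified from the benchmark table: a least-squares fit through the five data points gives a slope of roughly 5.3 CPU percentage points per MB/s of traffic.

```python
traffic = [2, 4, 8, 16, 32]            # MB/s, from the benchmark table
cpu = [11.3, 21.0, 41.5, 83.3, 170.3]  # CPU utilization (%)

# Least-squares slope: CPU percentage points consumed per MB/s of traffic.
n = len(traffic)
mx = sum(traffic) / n
my = sum(cpu) / n
slope = sum((x - mx) * (y - my) for x, y in zip(traffic, cpu)) / \
        sum((x - mx) ** 2 for x in traffic)
print(round(slope, 2))  # ~5.3 % CPU per MB/s
```

At this rate, saturating all 4 cores (400%) would occur at roughly 75 MB/s, though other bottlenecks would likely appear first.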
What's next
Use consumer groups to consume data: Learn how to manage consumer group permissions and monitor consumption progress.
Use configuration files to manage pipelines: Create and deploy Logstash pipeline configurations.