Alibaba Cloud Elasticsearch supports four data collection methods: Elastic Beats, Logstash, Elasticsearch clients, and the Kibana console. Choose based on your data type, transformation requirements, and whether data originates from infrastructure, applications, or direct API interaction.
Elasticsearch is widely used for application search, website search, logging, infrastructure monitoring, application performance monitoring (APM), and security analytics. Solutions for these scenarios are provided free of charge, but you must first import the required data into Elasticsearch.
Choose a collection method
Start with the simplest option that meets your requirements.
| Your scenario | Recommended method |
|---|---|
| Collecting logs, metrics, or system events from servers, edge devices, or IoT sensors | Elastic Beats |
| Collecting the same data types but needing enrichment, conditional routing, or multiple outputs | Logstash |
| Sending data programmatically from your application code | Elasticsearch clients |
| Testing, debugging Elasticsearch requests, or indexing individual documents | Kibana console |
When your needs are unclear, start with Beats. It has the lowest setup overhead and works for the majority of log and metric collection scenarios. Escalate to Logstash only when Beats modules cannot meet your transformation or routing requirements.
Elastic Beats
Beats is a collection of lightweight data shippers. Each shipper targets a specific data type and runs with minimal resource overhead. This makes Beats the default choice for edge devices, IoT sensors, firewalls, and any host where a heavier agent is not practical.
Beats also works on fully resourced servers. When collection needs are straightforward, Beats is faster to configure than Logstash. Pre-built modules handle parsing, indexing, and dashboard setup for common software stacks. A basic deployment typically runs in under five minutes.
Available shippers
| Shipper | What it collects | Key data sources |
|---|---|---|
| Filebeat | Log files and text streams | Files, TCP, UDP, containers, Redis, syslogs; modules for Apache, MySQL, Kafka |
| Metricbeat | System and service metrics | CPU, memory, disk, network, running processes; modules for Kafka, Redis, Palo Alto Networks |
| Packetbeat | Real-time network traffic | DHCP, DNS, HTTP, MongoDB, NFS, TLS; security analytics and APM |
| Winlogbeat | Windows event logs | Application, hardware, security, and system events |
| Auditbeat | File integrity and audit events | Linux audit framework; security analytics |
| Heartbeat | Service availability | ICMP, TCP, HTTP probes; infrastructure monitoring |
| Functionbeat | Serverless logs and metrics | AWS Lambda and other serverless environments |
For a hands-on example using Metricbeat, see Use self-managed Metricbeat to collect system metrics. Other shippers follow the same setup pattern.
Logstash
Logstash reads, transforms, and routes data from virtually any source. It handles complex enrichment tasks that Beats alone cannot perform—such as querying external data sources, applying conditional logic, or routing events to multiple outputs simultaneously.
Logstash requires more CPU and memory than Beats. Do not use it on low-resource devices. Use Logstash when your transformation requirements go beyond what Beats modules provide.
Alibaba Cloud Logstash is a fully managed service. You do not need to provision or maintain Logstash infrastructure. It is compatible with all open-source Logstash capabilities and can collect data from multiple sources simultaneously.
Pipeline components
A Logstash pipeline consists of three stages:
Input plugins: read data from files, HTTP endpoints, IMAP, JDBC, Kafka, syslogs, TCP, or UDP.
Filter plugins: parse and enrich data. Parsing filters handle CSV, JSON, and key-value pairs, and the Grok filter plugin extracts fields from unstructured text by using regular expressions. Other plugins add IP geolocation, DNS lookups, or value lookups from custom dictionaries or Elasticsearch indexes. The mutate plugin renames, copies, or removes fields.
Output plugins: write results to a destination. The Elasticsearch output plugin sends processed data to your cluster.
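The three stages can be pictured as functions chained over a stream of events. The following Python sketch is only an analogy for how a pipeline is composed, not how Logstash itself is implemented. All function names and the sample log line are hypothetical.

```python
import json
import re


def read_input(lines):
    """Input stage: wrap each raw line in an event (a dict)."""
    for line in lines:
        yield {"message": line.rstrip("\n")}


def parse_key_values(event):
    """Filter stage: extract key=value pairs from the message field."""
    for key, value in re.findall(r"(\w+)=(\S+)", event["message"]):
        event[key] = value
    return event


def rename_field(event, old, new):
    """Filter stage: rename a field, similar in spirit to mutate's rename."""
    if old in event:
        event[new] = event.pop(old)
    return event


def write_output(events):
    """Output stage: serialize each event as one JSON line."""
    return [json.dumps(event, sort_keys=True) for event in events]


def run_pipeline(lines):
    """Chain the stages: input -> filters -> output."""
    events = read_input(lines)
    events = (parse_key_values(e) for e in events)
    events = (rename_field(e, "message", "raw") for e in events)
    return write_output(events)


if __name__ == "__main__":
    for line in run_pipeline(["user=alice status=200\n"]):
        print(line)
```

Each filter receives an event and returns it modified, which mirrors how filter plugins are applied in the order they appear in the pipeline configuration.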
Example: ingest an RSS feed
The following pipeline reads the Elastic Blogs RSS feed, strips HTML tags, and sends the cleaned data to Elasticsearch.
Prerequisites
Before you begin, confirm:
You have an Alibaba Cloud Logstash instance.
You have an Alibaba Cloud Elasticsearch cluster with a known internal endpoint and access credentials.
Network connectivity exists between the Logstash instance and the Elasticsearch cluster.
Configure the pipeline
```
input {
  rss {
    url => "https://www.elastic.co/blog/feed"
    # Poll the feed every 120 seconds.
    interval => 120
  }
}
filter {
  mutate {
    # Keep the raw HTML in blog_html, work on a copy in blog_text,
    # and use the publication date as the event timestamp.
    rename => { "message" => "blog_html" }
    copy => {
      "blog_html" => "blog_text"
      "published" => "@timestamp"
    }
  }
  mutate {
    # Strip HTML tags, then replace newlines and tabs with spaces.
    gsub => [
      "blog_text", "<.*?>", "",
      "blog_text", "[\n\t]", " "
    ]
    remove_field => [ "published", "author" ]
  }
}
output {
  # Print one dot per event as a progress indicator.
  stdout {
    codec => dots
  }
  elasticsearch {
    hosts => [ "https://<your-elasticsearch-internal-endpoint>:9200" ]
    index => "elastic_blog"
    user => "elastic"
    password => "<your-elasticsearch-password>"
  }
}
```
Replace the following placeholders before running the pipeline:
| Placeholder | Description | Example |
|---|---|---|
| `<your-elasticsearch-internal-endpoint>` | Internal endpoint of your Elasticsearch cluster | es-cn-xxxx.elasticsearch.aliyuncs.com |
| `<your-elasticsearch-password>` | Password for the elastic user on the cluster | N/A |
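To see what the filter stage of this pipeline does to a single event, the following Python sketch applies the same steps in the same order: rename, copy, two regular-expression substitutions, and field removal. It is an approximation for illustration only. The function name `apply_filters` and the sample event are hypothetical, and real RSS events carry more fields.

```python
import re


def apply_filters(event):
    """Mimic the two mutate blocks of the RSS pipeline on one event."""
    event = dict(event)  # work on a copy so the caller's dict is untouched

    # First mutate block: rename message, then copy fields.
    event["blog_html"] = event.pop("message")
    event["blog_text"] = event["blog_html"]
    event["@timestamp"] = event["published"]

    # Second mutate block: gsub rules applied to blog_text.
    text = re.sub(r"<.*?>", "", event["blog_text"])  # strip HTML tags
    text = re.sub(r"[\n\t]", " ", text)              # newlines and tabs to spaces
    event["blog_text"] = text

    # remove_field drops fields that are no longer needed.
    for name in ("published", "author"):
        event.pop(name, None)
    return event


# Hypothetical sample event, not taken from the real feed.
sample = {
    "message": "<p>Hello <b>Elastic</b></p>\n<p>Bye</p>",
    "published": "2019-08-15T14:12:12Z",
    "author": "someone",
}

if __name__ == "__main__":
    print(apply_filters(sample)["blog_text"])  # prints: Hello Elastic Bye
```

The raw HTML stays available in `blog_html`, while `blog_text` holds the cleaned version that is better suited to full-text search.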
Verify the results
After the pipeline runs, query the index from the Kibana console:
```
POST elastic_blog/_search
```
For details on reading query results, see Step 3: View synchronization results.
Elasticsearch clients
Elasticsearch clients are language-specific libraries that abstract the low-level HTTP details of the RESTful API. Use clients when your application needs to index or query data programmatically. You can also use the RESTful API to manage Elasticsearch clusters and indexes.
Officially supported languages include Java, JavaScript, Go, .NET, PHP, Perl, Python, and Ruby. For full documentation and code samples, see Elasticsearch clients.
If your language is not in the list above, check Community contributed clients.
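As a rough illustration of the kind of detail a client hides, the following Python sketch assembles the newline-delimited JSON body that the Elasticsearch `_bulk` endpoint accepts: one action line followed by one source line per document. It uses only the standard library and does not send anything. The helper name `build_bulk_body` and the sample documents are hypothetical. In practice, an official client's bulk helper would normally do this for you.

```python
import json


def build_bulk_body(index, docs):
    """Build an NDJSON body for the _bulk API.

    docs is an iterable of (doc_id, source_dict) pairs.
    """
    lines = []
    for doc_id, source in docs:
        # Action line: index this document under the given ID.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps(source))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    body = build_bulk_body(
        "my_first_index",
        [("1", {"title": "First post"}), ("2", {"title": "Second post"})],
    )
    print(body, end="")
```

A body built this way would be sent with the `Content-Type: application/x-ndjson` header.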
Kibana console
The Kibana console provides a browser-based interface to the Elasticsearch RESTful API. It abstracts the technical details of underlying HTTP requests. Use it for development, debugging, and one-off indexing tasks.
The following example adds a single JSON document to an index:
```
PUT my_first_index/_doc/1
{
  "title" : "How to Ingest Into Elasticsearch Service",
  "date" : "2019-08-15T14:12:12",
  "description" : "This is an overview article about the various ways to ingest into Elasticsearch Service"
}
```
For scripted or automated workflows, cURL is a command-line alternative that communicates with Elasticsearch through the same RESTful API, so you can call it from your own scripts.
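The console request shown earlier maps onto a plain HTTP call, which is what makes scripting possible. The following Python sketch builds the equivalent `PUT` request with the standard library, assuming the cluster uses HTTP basic authentication. The endpoint `https://es.example.internal:9200`, the password, and the helper name `build_index_request` are placeholders chosen for illustration. The sketch only constructs the request and does not send it.

```python
import base64
import json
import urllib.request


def build_index_request(endpoint, index, doc_id, document, user, password):
    """Build a PUT <index>/_doc/<id> request with basic authentication."""
    url = f"{endpoint}/{index}/_doc/{doc_id}"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(document).encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


if __name__ == "__main__":
    request = build_index_request(
        "https://es.example.internal:9200",  # placeholder endpoint
        "my_first_index",
        1,
        {"title": "How to Ingest Into Elasticsearch Service"},
        "elastic",
        "example-password",  # placeholder credential
    )
    print(request.get_method(), request.full_url)
    # To send it against a reachable cluster: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes the script easy to check without a live cluster.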
Summary
Select the method that matches your data origin and operational constraints:
Beats — convenient, lightweight, and out-of-the-box. Pre-built modules cover common databases, operating systems, containers, web servers, and caches. A dashboard can be running in under five minutes. Best suited for embedded devices, IoT sensors, or firewalls with limited resources.
Logstash — use when Beats cannot meet your enrichment, filtering, or routing requirements. Alibaba Cloud Logstash is fully managed; you do not operate the infrastructure.
Elasticsearch clients — recommended when your application collects or produces data that must be sent programmatically.
Kibana console — recommended for developing, testing, or debugging Elasticsearch requests interactively.