Methods for ingesting data from various sources - Elasticsearch

Alibaba Cloud Elasticsearch supports four data collection methods: Elastic Beats, Logstash, Elasticsearch clients, and the Kibana console. Choose based on your data type, transformation requirements, and whether data originates from infrastructure, applications, or direct API interaction.

Elasticsearch is used in many scenarios, including application search, website search, logging, infrastructure monitoring, application performance monitoring (APM), and security analytics. Solutions for these scenarios are provided free of charge. Before you can use them, you must import the required data into Elasticsearch.

Choose a collection method

Start with the simplest option that meets your requirements.

Your scenario	Recommended method
Collecting logs, metrics, or system events from servers, edge devices, or IoT sensors	Elastic Beats
Collecting the same data types but needing enrichment, conditional routing, or multiple outputs	Logstash
Sending data programmatically from your application code	Elasticsearch clients
Testing, debugging Elasticsearch requests, or indexing individual documents	Kibana console

If your needs are unclear, start with Beats. It has the lowest setup overhead and covers the majority of log and metric collection scenarios. Move to Logstash only when Beats modules cannot meet your transformation or routing requirements.

Elastic Beats

Beats is a collection of lightweight data shippers. Each shipper targets a specific data type and runs with minimal resource overhead. This makes Beats the default choice for edge devices, IoT sensors, firewalls, and any host where a heavier agent is not practical.

Beats also works on fully resourced servers. When collection needs are straightforward, Beats is faster to configure than Logstash. Pre-built modules handle parsing, indexing, and dashboard setup for common software stacks. A basic deployment can typically be up and running in under five minutes.

Available shippers

Shipper	What it collects	Key data sources
Filebeat	Log files and text streams	Files, TCP, UDP, containers, Redis, syslogs; modules for Apache, MySQL, Kafka
Metricbeat	System and service metrics	CPU, memory, disk, network, running processes; modules for Kafka, Redis, Palo Alto Networks
Packetbeat	Real-time network traffic	DHCP, DNS, HTTP, MongoDB, NFS, TLS; security analytics and APM
Winlogbeat	Windows event logs	Application, hardware, security, and system events
Auditbeat	File integrity and audit events	Linux audit framework; security analytics
Heartbeat	Service availability	ICMP, TCP, HTTP probes; infrastructure monitoring
Functionbeat	Serverless logs and metrics	AWS Lambda and other serverless environments

For a hands-on example using Metricbeat, see Use self-managed Metricbeat to collect system metrics. Other shippers follow the same setup pattern.

Logstash

Logstash reads, transforms, and routes data from virtually any source. It handles complex enrichment tasks that Beats alone cannot perform, such as querying external data sources, applying conditional logic, or routing events to multiple outputs simultaneously.

Logstash requires more CPU and memory than Beats. Do not use it on low-resource devices. Use Logstash when your transformation requirements go beyond what Beats modules provide.

Alibaba Cloud Logstash is a fully managed service. You do not need to provision or maintain Logstash infrastructure. It is compatible with all open-source Logstash capabilities and can collect data from multiple sources simultaneously.

Pipeline components

A Logstash pipeline consists of three stages:

Input plugins: read data from files, HTTP endpoints, IMAP, JDBC, Kafka, syslogs, TCP, or UDP.
Filter plugins: parse and enrich data. The grok filter parses unstructured text with regular expressions, and dedicated filters handle CSV, JSON, and key-value pairs. Other plugins add IP geolocation, DNS lookups, or value lookups from custom dictionaries or Elasticsearch indexes. The mutate plugin renames, copies, or removes fields.
Output plugins: write results to a destination. The Elasticsearch output plugin sends processed data to your cluster.
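
The three stages can be pictured with a toy model. The following Python sketch is only an illustration of the input, filter, output flow and is not how Logstash is implemented; the function names (`read_input`, `parse_key_values`, `run_pipeline`) and the sample log line are hypothetical.

```python
from typing import Iterable, Iterator


def read_input(lines: Iterable[str]) -> Iterator[dict]:
    """Input stage: wrap each raw line in an event with a 'message' field."""
    for line in lines:
        yield {"message": line}


def parse_key_values(event: dict) -> dict:
    """Filter stage: split 'key=value' tokens in 'message' into separate fields."""
    parsed = dict(event)
    for token in event["message"].split():
        key, sep, value = token.partition("=")
        if sep:  # only tokens that actually contain '='
            parsed[key] = value
    return parsed


def run_pipeline(lines, filters, outputs):
    """Pass every event through all filters, then hand it to every output."""
    for event in read_input(lines):
        for apply_filter in filters:
            event = apply_filter(event)
        for write in outputs:
            write(event)


collected = []  # stands in for an output destination such as an index
run_pipeline(
    lines=["status=200 path=/index.html"],
    filters=[parse_key_values],
    outputs=[collected.append],
)
print(collected)
```

Because `outputs` is a list, the same event can be handed to several destinations, which mirrors the multiple-output routing described above.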

Example: ingest an RSS feed

The following pipeline reads the Elastic Blogs RSS feed, strips HTML tags, and sends the cleaned data to Elasticsearch.

Prerequisites

Before you begin, confirm:

You have an Alibaba Cloud Logstash instance.
You have an Alibaba Cloud Elasticsearch cluster with a known internal endpoint and access credentials.
Network connectivity exists between the Logstash instance and the Elasticsearch cluster.

Configure the pipeline

input {
  rss {
    url      => "https://www.elastic.co/blog/feed"
    interval => 120
  }
}

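filter {
  # Rename the raw feed body, then copy it so the original HTML is preserved.
  # Within one mutate block, rename runs before copy.
  mutate {
    rename => { "message" => "blog_html" }
    copy   => {
      "blog_html" => "blog_text"
      "published" => "@timestamp"
    }
  }
  # Strip HTML tags and replace newlines and tabs with spaces in the text copy,
  # then drop fields that are no longer needed.
  mutate {
    gsub => [
      "blog_text", "<.*?>",  "",
      "blog_text", "[\n\t]", " "
    ]
    remove_field => [ "published", "author" ]
  }
}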

output {
  stdout {
    codec => dots
  }
  elasticsearch {
    hosts    => [ "https://<your-elasticsearch-internal-endpoint>:9200" ]
    index    => "elastic_blog"
    user     => "elastic"
    password => "<your-elasticsearch-password>"
  }
}

Replace the following placeholders before running the pipeline:

Placeholder	Description	Example
`<your-elasticsearch-internal-endpoint>`	Internal endpoint of your Elasticsearch cluster	`es-cn-xxxx.elasticsearch.aliyuncs.com`
`<your-elasticsearch-password>`	Password for the `elastic` user on the cluster	—
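
To preview what the two `gsub` rules in the filter stage do to a field before running the pipeline, the same substitutions can be approximated locally. This is a minimal sketch using Python's `re` module rather than Logstash's own regex engine; for these two simple patterns the behavior should match, but treat it as an approximation. The helper name `clean_blog_text` and the sample string are made up for illustration.

```python
import re


def clean_blog_text(blog_html: str) -> str:
    """Approximate the two gsub rules applied to blog_text in the pipeline."""
    # Rule 1: remove anything that looks like an HTML tag (non-greedy match).
    without_tags = re.sub(r"<.*?>", "", blog_html)
    # Rule 2: replace each newline or tab with a single space.
    return re.sub(r"[\n\t]", " ", without_tags)


sample = "<p>Hello <b>world</b></p>\n\tNext"
print(repr(clean_blog_text(sample)))  # 'Hello world  Next'
```

Note the two consecutive spaces in the result: the newline and the tab are each replaced separately, so runs of whitespace are not collapsed.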

Verify the results

After the pipeline runs, query the index from the Kibana console:

POST elastic_blog/_search

For details on reading query results, see Step 3: View synchronization results.

Elasticsearch clients

Elasticsearch clients are language-specific libraries that abstract the low-level HTTP details of the RESTful API. Use clients when your application needs to index or query data programmatically. You can also use the RESTful API to manage Elasticsearch clusters and indexes.

Officially supported languages include Java, JavaScript, Go, .NET, PHP, Perl, Python, and Ruby. For full documentation and code samples, see Elasticsearch clients.

If your language is not in the list above, check Community contributed clients.
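
To make concrete what a client library hides, the following sketch builds the raw HTTP request for indexing one document using only the Python standard library. It is not the official Python client and does not send anything; the endpoint, index name, and document are hypothetical, and the helper `build_index_request` is introduced here purely for illustration.

```python
import base64
import json
import urllib.request


def build_index_request(endpoint, index, doc_id, document, user, password):
    """Build (but do not send) the HTTP request that indexes one document."""
    credentials = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"{endpoint}/{index}/_doc/{doc_id}",
        data=json.dumps(document).encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {credentials}",
        },
    )


request = build_index_request(
    endpoint="https://es.example.internal:9200",  # hypothetical endpoint
    index="app_events",                            # hypothetical index name
    doc_id="1",
    document={"event": "login", "user_id": 42},    # hypothetical document
    user="elastic",
    password="<your-elasticsearch-password>",
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

A client library wraps these details (URL construction, serialization, authentication headers) and typically adds features such as connection pooling and retries, which is why it is usually the better choice for application code.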

Kibana console

The Kibana console provides a browser-based interface to the Elasticsearch RESTful API. It abstracts the technical details of underlying HTTP requests. Use it for development, debugging, and one-off indexing tasks.

The following example adds a single JSON document to an index:

PUT my_first_index/_doc/1
{
    "title"       : "How to Ingest Into Elasticsearch Service",
    "date"        : "2019-08-15T14:12:12",
    "description" : "This is an overview article about the various ways to ingest into Elasticsearch Service"
}

For scripted or automated workflows, cURL is a command-line alternative. It communicates with Elasticsearch through the same RESTful API and is easy to call from custom scripts.

Summary

Select the method that matches your data origin and operational constraints:

Beats: lightweight and ready to use out of the box. Pre-built modules cover common databases, operating systems, containers, web servers, and caches, and a dashboard can be running in under five minutes. Well suited to resource-constrained hosts such as embedded devices, IoT sensors, and firewalls.
Logstash: use when Beats cannot meet your enrichment, filtering, or routing requirements. Alibaba Cloud Logstash is fully managed; you do not operate the infrastructure.
Elasticsearch clients: recommended when your application collects or produces data that must be sent programmatically.
Kibana console: recommended for developing, testing, or debugging Elasticsearch requests interactively.
