
PolarDB: Use Logstash to sync data to PolarSearch

Last Updated:Mar 28, 2026

Use Logstash to sync data from Elasticsearch or OpenSearch into PolarSearch. The pipeline reads documents from the source cluster, preserves index names and document IDs, and writes them to PolarSearch in a single pass.

Prerequisites

Before you begin, ensure that you have:

  • A server that can reach both the source cluster and the PolarSearch cluster over the network

  • Source cluster: an account with read and read_metadata permissions on the indexes to sync

  • PolarSearch cluster: an account with write and create_index permissions
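
To confirm network reachability and credentials before you start, you can query each cluster's root endpoint from the server. The endpoints and accounts below are placeholders; substitute your own:

    # Source cluster: should return a JSON banner with version info
    curl -sk -u your_source_user:your_source_password "https://source-cluster-endpoint:9200"

    # PolarSearch cluster
    curl -sk -u your_target_user:your_target_password "https://polarsearch-cluster-endpoint:9200"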

Step 1: Prepare the Logstash environment

This guide uses Logstash 8.8.2 on Linux x86_64, with Elasticsearch 7.10 or OpenSearch as the source and PolarSearch as the destination.

  1. Download and decompress Logstash. Select the package that matches your environment.

    Different Logstash versions have minor differences, but the core configuration is the same. For other versions, see the Logstash version list.
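
    For example, on Linux x86_64, the 8.8.2 package can be downloaded and decompressed as follows. The URL follows the standard Elastic artifacts naming pattern; adjust the version and platform to match your environment:

    wget https://artifacts.elastic.co/downloads/logstash/logstash-8.8.2-linux-x86_64.tar.gz
    tar -xzf logstash-8.8.2-linux-x86_64.tar.gz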
  2. Install the required plugins. For the full plugin catalog, see the plugin repository.

    Source plugin — install based on your source cluster type:

    • Elasticsearch source — the logstash-input-elasticsearch plugin is pre-installed. Verify with:

      # Go to the Logstash root directory
      cd /path/to/logstash-8.8.2
      
      # List installed plugins
      ./bin/logstash-plugin list
      
      # If not listed, install it
      ./bin/logstash-plugin install logstash-input-elasticsearch
    • OpenSearch source — install logstash-input-opensearch:

      cd /path/to/logstash-8.8.2
      ./bin/logstash-plugin install logstash-input-opensearch

    Destination plugin — PolarSearch is compatible with OpenSearch APIs, so install logstash-output-opensearch regardless of the source type:

    cd /path/to/logstash-8.8.2
    ./bin/logstash-plugin install logstash-output-opensearch

Step 2: Create the configuration file

In the Logstash root directory, create a file named synchronization.conf. The pipeline reads all matching documents from the source cluster and writes them to PolarSearch, keeping the original index name and document ID intact.

For reference, see Ship Logstash events to OpenSearch.

Key parameters

  • index: Indexes to sync. Supports wildcards (*). Avoid using a bare *, which copies internal indexes such as .kibana.

  • docinfo: Set to true to capture the original index name and document ID from the source.

  • slices: Number of concurrent read slices. Set to the number of primary shards in the source index. Do not exceed the shard count — more slices than shards degrades query and cluster performance.

  • size: Number of documents fetched per batch.

  • scroll: Time-to-live for the scroll query. Increase this for large datasets to avoid timeout interruptions.

  • manage_template: Set to false in the output block to prevent Logstash from overwriting any index template you created in PolarSearch.

Each input plugin uses its own @metadata path structure. Enable the stdout debug output during testing to confirm the exact path before setting index and document_id in the output block.
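
For illustration, an event printed by the rubydebug codec with metadata enabled might look like the following for the opensearch input. The field values here are hypothetical:

    {
        "message"    => "GET /index.html 200",
        "@timestamp" => 2026-03-01T00:00:00.000Z,
        "@metadata"  => {
            "input" => {
                "opensearch" => {
                    "_index" => "your-business-logs-2026.03.01",
                    "_id"    => "kx8mB4kBqQz1a2b3c4d5"
                }
            }
        }
    }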

Configuration file example

# synchronization.conf
# Syncs data from Elasticsearch or OpenSearch to PolarSearch.

input {
  # Use 'elasticsearch' instead of 'opensearch' if the source cluster is Elasticsearch.
  # Both plugins share the same parameters, but their @metadata paths differ.
  opensearch {
    # [Required] Source cluster endpoint. HTTPS is recommended.
    hosts => ["https://source-cluster-endpoint:9200"]
    # [Required] Source cluster credentials.
    user => "your_source_user"
    password => "your_source_password"

    # [Required] Indexes to sync. Wildcards are supported.
    # Avoid using "*" or ".*" to prevent copying internal indexes (such as .kibana).
    index => "your-business-logs-*"

    # SSL/TLS
    ssl => true
    # If the source cluster uses a self-signed certificate, specify the CA path.
    # ca_file => "/path/to/source_ca.crt"

    # Capture the original index name and document ID.
    docinfo => true
    # Set to the number of primary shards in the source index.
    # Do not exceed the shard count — over-slicing degrades performance.
    slices => 4
    # Documents per batch.
    size => 1000
    # Scroll query time-to-live. Increase for large datasets.
    scroll => "5m"
  }
}

output {
  opensearch {
    # [Required] PolarSearch cluster endpoint.
    hosts => ["https://polarsearch-cluster-endpoint:9200"]
    # [Required] PolarSearch credentials.
    user => "your_target_user"
    password => "your_target_password"

    # SSL/TLS
    ssl => true
    # If PolarSearch uses a self-signed certificate, specify the CA path.
    # cacert => "/path/to/polarsearch_ca.crt"

    # Read the index name from source metadata to keep it consistent with the source.
    # Confirm this path from debug output — it varies by input plugin.
    index => "%{[@metadata][input][opensearch][_index]}"
    # Preserve the original document ID.
    document_id => "%{[@metadata][input][opensearch][_id]}"

    # Prevent Logstash from overwriting a custom index template in PolarSearch.
    # manage_template => false
  }

  # Debug output — uncomment during testing, comment out for production.
  # stdout {
  #   codec => rubydebug {
  #     metadata => true
  #   }
  # }
}

Step 3: Run the sync task and verify the result

  1. Start the task from the Logstash root directory:

    ./bin/logstash -f synchronization.conf

    For large-scale syncs, run Logstash as a background process:

    nohup ./bin/logstash -f synchronization.conf &
  2. After the task completes, log in to the PolarSearch cluster and verify that the indexes and data were created and written successfully.
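
A quick way to check is to list the synced indexes with the _cat API and compare document counts against the source. The endpoints and accounts below are the placeholders used in the configuration example:

    # Indexes and document counts on PolarSearch
    curl -sk -u your_target_user:your_target_password "https://polarsearch-cluster-endpoint:9200/_cat/indices/your-business-logs-*?v"

    # The same query against the source cluster; docs.count should match once the task completes
    curl -sk -u your_source_user:your_source_password "https://source-cluster-endpoint:9200/_cat/indices/your-business-logs-*?v"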

FAQ

How do I find the correct `@metadata` path for the index name?

Enable debug output in the configuration: uncomment the stdout block with codec => rubydebug { metadata => true }. Start Logstash, then check the console — each document prints its full @metadata structure. Find the _index and _id fields and copy the exact path into the output block.

For the logstash-input-opensearch plugin, the path is [@metadata][input][opensearch][_index]. For logstash-input-elasticsearch, confirm the path from the debug output, as it differs.

The sync is slow. How can I speed it up?

Try these options in order:

  1. Tune `slices`: Set it to match the number of primary shards in the source index. Do not go higher — more slices than shards hurts performance rather than improving it.

  2. Add pipeline workers: Pass --pipeline.workers when starting Logstash. Set it to the number of CPU cores on the server, for example: ./bin/logstash -f synchronization.conf --pipeline.workers 8.

  3. Increase Java Virtual Machine (JVM) heap: Edit config/jvm.options and raise -Xms and -Xmx, for example, -Xms4g -Xmx4g.
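
For the slices value in item 1, the primary-shard count can be read from the source cluster's _cat/shards API, where p marks a primary shard and r a replica. A minimal sketch using a saved, hypothetical response:

```shell
# Hypothetical output saved from:
#   curl -sk -u user:pass "https://source-cluster-endpoint:9200/_cat/shards/your-business-logs-2026?h=prirep"
printf 'p\nr\np\nr\np\nr\np\nr\n' > /tmp/shards.txt

# slices should equal the number of primary ("p") rows
grep -c '^p' /tmp/shards.txt    # prints 4
```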

How do I preserve index settings like tokenizers and field types?

Export the index mapping and settings from the source cluster, then manually create a matching index template in PolarSearch before running the sync. In the output block, set manage_template => false to prevent Logstash from overwriting the template.
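
Assuming the placeholder endpoints from the configuration example and a hypothetical template name and body, the workflow might look like this. Verify the settings and mappings against what you exported from the source before applying the template:

    # Export the mapping and settings of a source index
    curl -sk -u your_source_user:your_source_password "https://source-cluster-endpoint:9200/your-business-logs-2026/_mapping"
    curl -sk -u your_source_user:your_source_password "https://source-cluster-endpoint:9200/your-business-logs-2026/_settings"

    # Create a matching index template in PolarSearch before starting the sync
    curl -sk -u your_target_user:your_target_password -X PUT \
      -H "Content-Type: application/json" \
      "https://polarsearch-cluster-endpoint:9200/_index_template/business-logs-template" \
      -d '{
        "index_patterns": ["your-business-logs-*"],
        "template": {
          "settings": { "number_of_shards": 4 },
          "mappings": { "properties": { "message": { "type": "text" } } }
        }
      }'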