
ApsaraDB for SelectDB: Migrate Elasticsearch data

Last Updated: Mar 30, 2026

Use a catalog to migrate offline data from an Elasticsearch cluster to an ApsaraDB for SelectDB instance. The catalog maps Elasticsearch indexes to SelectDB tables, so you can run a single INSERT INTO ... SELECT statement to complete the migration.

Prerequisites

Before you begin, ensure that you have:

- An Elasticsearch cluster that contains the source data.
- An ApsaraDB for SelectDB instance to receive the data.
- Network connectivity between the Elasticsearch cluster and the SelectDB instance.

How it works

Creating a catalog establishes a read-only connection from SelectDB to your Elasticsearch cluster. Once the catalog is created, SelectDB automatically creates a database named default_db inside it and maps each Elasticsearch index to a table in that database. You can then query the catalog and run INSERT INTO ... SELECT to copy data into a SelectDB table.

Note

SelectDB can only read data from an external catalog. Writing back to the Elasticsearch cluster is not supported.

Sample environment

The following steps use these example values. Substitute your own values as needed.

| Item | Value |
| --- | --- |
| Elasticsearch index (source) | product_info |
| SelectDB database (destination) | es_db |
| SelectDB table (destination) | test_Es2SelectDB |

Prepare the sample source data

For more information, see Getting started.

Migrate offline data

Step 1: Connect to the SelectDB instance

For connection instructions, see Connect to an instance.

Step 2: Create an Elasticsearch catalog

Run the following statement to create a catalog that points to your Elasticsearch cluster:

CREATE CATALOG es_catalog PROPERTIES (
    "type"="es",
    "hosts"="http://127.0.0.1:9200",
    "user"="test_user",
    "password"="test_passwd",
    "nodes_discovery"="false"
);

The following table describes the catalog properties:

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| es_catalog | Yes | | The name of the catalog. Change the name based on your business requirements. |
| type | Yes | | Set to es. |
| hosts | Yes | | The access URL of the Elasticsearch cluster. Format: http://&lt;IP address&gt;:&lt;port&gt;. To use a Server Load Balancer (SLB) endpoint, set this to the SLB URL. |
| user | No | | The username for accessing the Elasticsearch cluster. |
| password | No | | The password for accessing the Elasticsearch cluster. |
| doc_value_scan | No | true | Enables reading field values from column-oriented storage (doc_values). When enabled, SelectDB automatically queries fields with doc_values enabled, which significantly improves performance for queries on a small number of columns. |
| keyword_sniff | No | true | Detects TEXT fields in Elasticsearch and queries them through the corresponding KEYWORD fields. When true, SelectDB matches the full field value before tokenization. When false, SelectDB matches individual terms after tokenization. |
| nodes_discovery | No | true | Enables Elasticsearch node discovery. When true, SelectDB discovers and connects to all available data nodes. When false, SelectDB accesses nodes only through the URL specified in hosts. Set to false for Alibaba Cloud Elasticsearch clusters, which can only be accessed via URL. |
| mapping_es_id | No | false | Maps the Elasticsearch _id field to a SelectDB column. Set to true if you need to query the primary key value. When false, the _id field is not accessible in queries. |
| like_push_down | No | true | Converts LIKE queries to Elasticsearch wildcard queries and pushes them down to the cluster. This can increase CPU consumption on the Elasticsearch cluster in some scenarios. Set to false to disable LIKE pushdown and improve accuracy at the cost of recall rate. |
| include_hidden_index | No | false | Includes hidden indexes in query conditions. |
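For example, to connect to an Alibaba Cloud Elasticsearch cluster through an SLB endpoint while keeping the _id field queryable, the catalog might look like the following sketch. The endpoint, catalog name, and credentials are placeholders; substitute your own values.

```sql
CREATE CATALOG es_catalog_slb PROPERTIES (
    "type" = "es",
    -- SLB endpoint of the cluster; replace with your own URL
    "hosts" = "http://slb-endpoint.example.com:9200",
    "user" = "test_user",
    "password" = "test_passwd",
    -- Alibaba Cloud Elasticsearch clusters are reachable only via URL,
    -- so node discovery must be disabled
    "nodes_discovery" = "false",
    -- expose the Elasticsearch _id field as a queryable column
    "mapping_es_id" = "true"
);
```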

Step 3: Verify the catalog

Run SHOW CATALOGS to confirm the catalog was created:

SHOW CATALOGS;

Expected output:

+--------------+--------------+----------+-----------+-------------------------+---------------------+------------------------+
| CatalogId    | CatalogName  | Type     | IsCurrent | CreateTime              | LastUpdateTime      | Comment                |
+--------------+--------------+----------+-----------+-------------------------+---------------------+------------------------+
| 436009309195 | es_catalog   | es       |           | 2024-08-06 17:09:08.058 | 2024-07-19 18:04:37 |                        |
|            0 | internal     | internal | yes       | UNRECORDED              | NULL                | Doris internal catalog |
+--------------+--------------+----------+-----------+-------------------------+---------------------+------------------------+

Step 4: (Optional) Query the Elasticsearch catalog

Switch to the external catalog to browse the mapped tables:

SWITCH es_catalog;

Query and access data in es_catalog the same way you would in the internal catalog. For query examples, see the "Query data from an Elasticsearch data source" section of the Elasticsearch data source topic.
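For instance, a quick row count and spot check against the mapped index might look like this, assuming the product_info index from the sample environment:

```sql
-- count the rows SelectDB sees in the mapped index
SELECT COUNT(*) FROM es_catalog.default_db.product_info;

-- spot-check a few rows before migrating
SELECT * FROM es_catalog.default_db.product_info LIMIT 5;
```

Because the table names are fully qualified, these queries also work without switching catalogs first.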

Switch back to the internal catalog when done:

SWITCH internal;

Step 5: Create the destination database

Skip this step if the database already exists.

CREATE DATABASE es_db;

Switch to the destination database:

USE es_db;

Step 6: Create the destination table

Before creating the table, review the column type mappings between Elasticsearch and SelectDB. For the full type mapping reference, see the "Column type mappings" section of the Elasticsearch data source topic.
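One way to check how SelectDB mapped the index columns is to describe the mapped table before writing the CREATE TABLE statement. A sketch using the sample product_info index:

```sql
DESC es_catalog.default_db.product_info;
```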

Create the table using column types that match the Elasticsearch index schema:

CREATE TABLE test_Es2SelectDB
(
    `annual_rate` VARCHAR(200),
    `describe`    TEXT,
    `productName` VARCHAR(200)
)
DISTRIBUTED BY HASH(productName) BUCKETS 4
PROPERTIES("replication_num" = "1");

Step 7: Migrate data

Run INSERT INTO ... SELECT to copy all rows from the Elasticsearch index into the SelectDB table:

INSERT INTO test_Es2SelectDB SELECT * FROM es_catalog.default_db.product_info;
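If you only need a subset of the data, the same pattern works with an explicit column list and a filter. For example (the WHERE condition here is purely illustrative):

```sql
INSERT INTO test_Es2SelectDB (annual_rate, `describe`, productName)
SELECT annual_rate, `describe`, productName
FROM es_catalog.default_db.product_info
WHERE annual_rate != '';
```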

Step 8: Verify the migration

Query the destination table to confirm the data was imported:

SELECT * FROM test_Es2SelectDB;
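To confirm that no rows were lost, you can also compare row counts between the source index and the destination table. A sketch based on the sample names:

```sql
SELECT
    (SELECT COUNT(*) FROM es_catalog.default_db.product_info) AS source_rows,
    (SELECT COUNT(*) FROM test_Es2SelectDB) AS dest_rows;
```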

Migrate incremental data

In production environments, Elasticsearch data typically includes both historical (offline) data and ongoing incremental data. Use the following strategies based on your data type.

Log data

Write new log entries simultaneously to both the Elasticsearch cluster and the SelectDB instance. Once enough data accumulates in SelectDB, run your analysis queries directly against SelectDB.

Transactional or event data

Use SelectDB's Unique key model to handle deduplication automatically. Write incremental data to both the Elasticsearch cluster and SelectDB in parallel. Migrate historical data using the offline migration steps above. If historical and incremental records overlap, the Unique key model deduplicates on the primary key, so no manual conflict resolution is needed.
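As a sketch of this approach, a Unique key table keyed on the product name keeps only the latest row for each key when historical and incremental loads overlap. The column names follow the sample table; the choice of productName as the key is an assumption for illustration.

```sql
-- Unique key table: rows with the same key are deduplicated automatically,
-- so overlapping historical and incremental loads need no manual conflict handling
CREATE TABLE test_Es2SelectDB_unique
(
    `productName` VARCHAR(200),
    `annual_rate` VARCHAR(200),
    `describe`    TEXT
)
UNIQUE KEY(`productName`)
DISTRIBUTED BY HASH(`productName`) BUCKETS 4
PROPERTIES("replication_num" = "1");
```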

ELK ecosystem compatibility

For log collection, SelectDB provides customized and enhanced Filebeat and Logstash plugins that efficiently collect and process log data. You can also ingest logs through the regular data import methods. For more information, see the following topics:

What's next