The OpenSearch Writer plugin in DataWorks Data Integration writes offline data to Alibaba Cloud OpenSearch. This topic covers supported editions, limitations, field type mappings, and the full parameter reference for configuring a batch synchronization task in code editor mode.
Supported editions and versions
OpenSearch Writer supports the following commercial editions of Alibaba Cloud OpenSearch:
- Industry Algorithm Edition
- LLM-based AI Chat Edition
- High-performance Search Edition
- Vector Search Edition
- Retrieval Engine Edition
Version notes:
- Version 3 uses a second-party package. Add the following pom dependency: com.aliyun.opensearch:aliyun-sdk-opensearch:2.1.3.
- OpenSearch Writer requires JDK 1.6-32 or later. Run java -version to check your installed version.
Limitations
- OpenSearch Writer supports serverless resource groups (recommended) and exclusive resource groups for Data Integration. Custom resource groups are not supported.
- Columns in OpenSearch are unordered. Specify the column order explicitly when writing data. Columns not specified are set to their default values or null.
- Offline data can be written to OpenSearch only in code editor mode.
- OpenSearch Writer validates the column count. Writing more columns than exist in the destination table returns an error.
- Write data to only one table per task. Multiple tables in a single task are not supported.
- On task rerun, existing records are overwritten based on document ID. The column list must include an ID column as the unique identifier.
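The column-related limitations above can be checked before a task is submitted. The following is a minimal sketch, not part of OpenSearch Writer: check_columns and its arguments are hypothetical names, and it assumes the ID column is called id, as in the code sample later in this topic.

```python
def check_columns(columns, destination_columns, id_column="id"):
    """Return a list of problems with a writer column list (empty if none).

    Hypothetical pre-flight helper. It mirrors the documented limitations
    but does not call OpenSearch or DataWorks.
    """
    problems = []
    # Reruns overwrite records by document ID, so the ID column must be listed.
    if id_column not in columns:
        problems.append(f"missing ID column: {id_column}")
    # Writing more columns than the destination table has returns an error.
    if len(columns) > len(destination_columns):
        problems.append("more columns than the destination table")
    # Assumption: a column unknown to the destination is also worth flagging.
    unknown = [c for c in columns if c not in destination_columns]
    if unknown:
        problems.append(f"unknown columns: {', '.join(unknown)}")
    return problems
```

An empty result means the column list passes these three checks; it does not guarantee that the task itself will succeed.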
Supported field types
OpenSearch Writer supports the following OpenSearch data type mappings.
| Category | OpenSearch data type |
| --- | --- |
| Integer | INT |
| Floating-point | DOUBLE, FLOAT |
| String | TEXT, LITERAL, SHORT_TEXT |
| Date and time | INT |
| Boolean | LITERAL |
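Because date and time values map to INT and Boolean values map to LITERAL, source values usually need converting before they are written. The sketch below shows one possible conversion. The encoding is an assumption that this topic does not confirm: dates become epoch seconds, and Booleans become the strings "true" and "false". The function name to_opensearch_value is hypothetical.

```python
from datetime import datetime, timezone


def to_opensearch_value(value):
    """Convert a Python value to a form that fits the mapping table.

    Assumptions (not confirmed by this topic): datetimes are written as
    epoch seconds (INT) and booleans as "true"/"false" (LITERAL).
    """
    # Check bool first: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, datetime):
        # Assumption: naive datetimes are treated as UTC.
        if value.tzinfo is None:
            value = value.replace(tzinfo=timezone.utc)
        return int(value.timestamp())
    # Integers, floats, and strings pass through unchanged.
    return value
```

If your application expects milliseconds or a different Boolean spelling, adjust the conversion to match the field definitions in your OpenSearch application schema.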
Write modes
Use the writeMode parameter to control how data is written and to ensure write idempotence.
| Mode | Behavior | Atomic | Supported versions |
| --- | --- | --- | --- |
| add | On rerun, clears existing data and imports new data | Yes | All versions |
| update | Inserts data as an update | Yes | Version 2 only (not supported in Version 3) |
Batch inserts in OpenSearch are not atomic operations: within one batch, some records may succeed while others fail. Choose writeMode based on your consistency requirements.
Performance considerations
OpenSearch is optimized for queries, not writes. Its transactions per second (TPS) for write operations is limited.
- Typically, a single data record is smaller than 1 MB.
- Typically, a single batch write is smaller than 2 MB.
- Set batchSize based on the resources allocated to your account. The default is 300 records per batch.
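The size guidance above can be turned into a rough starting value for batchSize. The following sketch is an illustration only: suggest_batch_size is a hypothetical helper, and it assumes you already know the average record size in bytes. It keeps a batch under 2 MB and never exceeds the default of 300 records.

```python
def suggest_batch_size(avg_record_bytes, max_batch_bytes=2 * 1024 * 1024, default=300):
    """Suggest a batchSize that keeps one batch under max_batch_bytes.

    Hypothetical helper: the 2 MB ceiling and the default of 300 come from
    this topic; the average record size is an input you must measure.
    """
    if avg_record_bytes <= 0:
        raise ValueError("avg_record_bytes must be positive")
    # How many average-sized records fit in one batch.
    fits = max_batch_bytes // avg_record_bytes
    # Never go above the default, and always write at least one record.
    return max(1, min(default, fits))
```

For example, with records of about 100 KB, 20 records fit in a 2 MB batch, so the helper returns 20 instead of the default. Treat the result as a starting point and tune it against the resources allocated to your account.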
Configure a data synchronization task
Configure OpenSearch Writer in code editor mode. For the general procedure, see Configure in code editor mode.
The following sections provide code samples and parameter references for each supported edition group.
Industry Algorithm Edition, LLM-based AI Chat Edition, and High-performance Search Edition
Code sample
{
    "type": "job",
    "version": "1.0",
    "configuration": {
        "reader": {},
        "writer": {
            "plugin": "opensearch",
            "parameter": {
                "accessId": "<your-access-key-id>",
                "accessKey": "<your-access-key-secret>",
                "host": "http://yyyy.aliyuncs.com",
                "endpoint": "http://yyyy.aliyuncs.com",
                "indexName": "datax_xxx",
                "table": "datax_yyy",
                "column": [
                    "appkey",
                    "id",
                    "title",
                    "gmt_create",
                    "pic_default"
                ],
                "batchSize": 500,
                "writeMode": "add",
                "version": "v2",
                "ignoreWriteError": false
            }
        }
    }
}
Replace the following placeholders with your actual values:
| Placeholder | Description | Example |
| --- | --- | --- |
| <your-access-key-id> | Your AccessKey ID | |
| <your-access-key-secret> | Your AccessKey secret | |
Parameters
| Parameter | Description | Required | Default |
| --- | --- | --- | --- |
| accessId | The AccessKey ID of your AccessKey pair. | Yes | N/A |
| accessKey | The AccessKey secret of your AccessKey pair. | Yes | N/A |
| host | The traffic domain name of OpenSearch. Get this value from the instance details page in the OpenSearch console. | Yes | N/A |
| endpoint | The control endpoint of OpenSearch. Get this value from the service endpoints page for your edition. For example, see Service endpoints for the Industry Algorithm Edition. | Yes | N/A |
| indexName | The name of the OpenSearch project (index). | Yes | N/A |
| table | The name of the destination table. Only one table is supported per task. | Yes | N/A |
| column | The columns to write data to. Use | Yes | N/A |
| batchSize | The number of records per batch write. Required for partitioned tables; omit for non-partitioned tables. | Required for partitioned tables | 300 |
| writeMode | The write mode: | Yes | N/A |
| ignoreWriteError | Whether to ignore write failures for the current batch. Set to | No | |
| version | The OpenSearch version, such as | No | |
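A quick check for missing required parameters can catch configuration mistakes before a task runs. The sketch below is a hypothetical helper, not a DataWorks API. It assumes the parameter names used in the code sample above (accessId, accessKey, host, endpoint, indexName, table, column, writeMode) correspond to the rows marked as required.

```python
# Assumption: these names, taken from the code sample, are the required ones.
REQUIRED = (
    "accessId", "accessKey", "host", "endpoint",
    "indexName", "table", "column", "writeMode",
)


def missing_required(parameter):
    """Return the required keys that are absent or empty in a writer
    "parameter" object (given as a Python dict)."""
    return [k for k in REQUIRED if parameter.get(k) in (None, "", [])]
```

To use it, load the task JSON with json.loads, pass configuration["writer"]["parameter"] to missing_required, and fix any keys it returns before submitting the task.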
Vector Search Edition and Retrieval Engine Edition
Code sample
{
    "stepType": "opensearch",
    "parameter": {
        "indexName": "<your-index-name>",
        "datasource": "<your-datasource-name>",
        "table": "<your-table-name>",
        "column": [
            {
                "name": "col3double",
                "type": "DOUBLE"
            },
            {
                "name": "col2vector",
                "type": "MULTI_FLOAT"
            }
        ],
        "batchSize": "500"
    },
    "name": "Writer",
    "category": "writer"
}
Parameters
| Parameter | Description | Required | Default |
| --- | --- | --- | --- |
| table | The name of the destination table. Only one table is supported per task. | Yes | N/A |
| column | The columns to write data to. Each entry specifies a | Yes | N/A |
| batchSize | The number of records per batch write. Required for partitioned tables; omit for non-partitioned tables. | Required for partitioned tables | 300 |
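When many columns are involved, the writer step for these editions can be generated rather than typed by hand. The following sketch builds a structure with the same shape as the code sample in this section. build_writer_step is a hypothetical helper, and it keeps batchSize as a string only because the sample does.

```python
import json


def build_writer_step(index_name, datasource, table, columns, batch_size=500):
    """Build a writer step shaped like the sample in this section.

    columns is a list of (name, type) pairs, for example
    [("col3double", "DOUBLE"), ("col2vector", "MULTI_FLOAT")].
    """
    return {
        "stepType": "opensearch",
        "parameter": {
            "indexName": index_name,
            "datasource": datasource,
            "table": table,
            # Each column entry carries a name and an OpenSearch type.
            "column": [{"name": n, "type": t} for n, t in columns],
            # The sample passes batchSize as a string, so this does too.
            "batchSize": str(batch_size),
        },
        "name": "Writer",
        "category": "writer",
    }


if __name__ == "__main__":
    step = build_writer_step(
        "<your-index-name>", "<your-datasource-name>", "<your-table-name>",
        [("col3double", "DOUBLE"), ("col2vector", "MULTI_FLOAT")],
    )
    print(json.dumps(step, indent=4))
```

Paste the printed JSON into the writer step of the task in code editor mode and replace the placeholders with your own values.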