You can use the OpenSearch Writer plugin in DataWorks Data Integration to write data to OpenSearch. This topic describes how to write data to OpenSearch in offline mode.
Supported versions
Version 3 uses a second-party package. The pom dependency is com.aliyun.opensearch:aliyun-sdk-opensearch:2.1.3.
To use the OpenSearch Writer plugin, you must have JDK 1.6-32 or a later version. You can run the java -version command to check your Java version number.
The following commercial editions of Alibaba Cloud OpenSearch are supported: Industry Algorithm Edition, LLM-based AI Chat Edition, High-performance Search Edition, Vector Search Edition, and Retrieval Engine Edition.
Limitations
OpenSearch Writer supports serverless resource groups (recommended) and exclusive resource groups for Data Integration but does not support custom resource groups.
Columns in OpenSearch are unordered. Therefore, OpenSearch Writer requires you to specify the order of columns for writing data. If you specify fewer columns than the number of columns in the destination OpenSearch table, the unspecified columns are set to their default values or null.
For example, if an OpenSearch table contains columns a, b, and c, and you want to import data into columns b and c, you can set the "column":["c","b"] parameter. This configuration imports the first and second columns from the reader into columns c and b of the OpenSearch table, respectively. Column a is set to its default value or null.
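The positional mapping described above can be illustrated with a small Python sketch. This is not the plugin's implementation; the function name map_record and the use of None to stand in for "default value or null" are assumptions made only for illustration.

```python
from typing import Any, Dict, List, Optional


def map_record(
    reader_values: List[Any],
    writer_columns: List[str],
    table_columns: List[str],
) -> Dict[str, Optional[Any]]:
    """Pair reader values with the configured writer columns by position."""
    # Every destination column starts unset. None stands in for
    # "default value or null" (an assumption of this sketch).
    row: Dict[str, Optional[Any]] = {name: None for name in table_columns}
    # The i-th reader value goes to the i-th configured column.
    for name, value in zip(writer_columns, reader_values):
        row[name] = value
    return row


# Table has columns a, b, c; the writer is configured with "column": ["c", "b"].
row = map_record(["first", "second"], ["c", "b"], ["a", "b", "c"])
print(row)  # {'a': None, 'b': 'second', 'c': 'first'}
```

The first reader value lands in column c and the second in column b, while column a is left unset.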
You can write offline data to OpenSearch only in code editor mode.
Supported field types
OpenSearch Writer supports most OpenSearch data types. The following table lists the supported data type mappings.
Category | OpenSearch data type |
Integer | INT |
Floating-point | DOUBLE and FLOAT |
String | TEXT, LITERAL, and SHORT_TEXT |
Date and time | INT |
Boolean | LITERAL |
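Because the table maps date and time values to INT and Boolean values to LITERAL, source values may need converting before they are written. The following Python sketch shows one possible conversion. The choice of epoch seconds and of the strings "true" and "false" is an assumption of this sketch, not something the OpenSearch Writer documentation specifies, and both helper names are hypothetical.

```python
from datetime import datetime, timezone


def datetime_to_int(moment: datetime) -> int:
    """Convert a timezone-aware datetime to an integer (assumed: epoch seconds)."""
    if moment.tzinfo is None:
        # Naive datetimes would be interpreted in local time; reject them
        # so that the result does not depend on the machine's time zone.
        raise ValueError("expected a timezone-aware datetime")
    return int(moment.timestamp())


def bool_to_literal(flag: bool) -> str:
    """Convert a Boolean to a string for a LITERAL field (assumed spelling)."""
    return "true" if flag else "false"


print(datetime_to_int(datetime(2024, 1, 1, tzinfo=timezone.utc)))  # 1704067200
print(bool_to_literal(True))  # true
```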
Develop a data synchronization task
For the entry point and the procedure for configuring a synchronization task, see the following configuration guides.
For the procedure, see Configure in code editor mode.
For all parameters and a code sample for the code editor mode, see Appendix: Code sample and parameters.
FAQ
Handle column configuration errors
To ensure data reliability, OpenSearch Writer validates the number of columns. The writer reports an error if you try to write more columns than exist in the destination table. For example, if an OpenSearch table has columns a, b, and c, OpenSearch Writer reports an error if you try to write more than three columns.
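The column-count check can be pictured with the following Python sketch. It only mirrors the rule stated above; the function name check_column_count and the error message are hypothetical and do not reproduce the writer's actual error output.

```python
from typing import List


def check_column_count(writer_columns: List[str], table_columns: List[str]) -> None:
    """Raise an error if more columns are configured than the table has."""
    if len(writer_columns) > len(table_columns):
        raise ValueError(
            f"writer specifies {len(writer_columns)} columns, "
            f"but the destination table has only {len(table_columns)}"
        )


table = ["a", "b", "c"]
check_column_count(["c", "b"], table)  # fewer columns than the table: accepted

try:
    check_column_count(["a", "b", "c", "d"], table)  # four columns for a three-column table
except ValueError as err:
    print(err)
```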
Notes on table configuration
OpenSearch Writer can write data to only one table at a time.
Task reruns and failover
When a task is rerun, existing data is overwritten based on the document ID. Therefore, the columns that you write to OpenSearch must include an ID column, which serves as the unique identifier for a row. Data with a matching ID is overwritten.
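The reason an ID column makes reruns safe can be sketched in Python by modeling the destination as a dictionary keyed by document ID. This is an illustration only; the in-memory dictionary, the function write_records, and the column name "id" are assumptions of the sketch.

```python
from typing import Any, Dict, List


def write_records(
    index: Dict[Any, Dict[str, Any]],
    records: List[Dict[str, Any]],
    id_column: str = "id",
) -> None:
    """Write records into an in-memory stand-in for the destination table."""
    for record in records:
        if id_column not in record:
            raise KeyError(f"record has no '{id_column}' column: {record}")
        # A record with an existing ID replaces the stored one.
        index[record[id_column]] = dict(record)


index: Dict[Any, Dict[str, Any]] = {}
batch = [{"id": "1", "title": "first"}, {"id": "2", "title": "second"}]

write_records(index, batch)
snapshot = dict(index)
write_records(index, batch)  # simulate a rerun of the same task

print(index == snapshot)  # True
print(len(index))  # 2
```

Writing the same batch twice leaves two documents, not four, because each write replaces the document that has the same ID.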
Appendix: Code sample and parameters
Configure a batch synchronization task by using the code editor
If you want to configure a batch synchronization task by using the code editor, you must configure the related parameters in the script based on the unified script format requirements. For more information, see Configuration in the code editor. The following information describes the parameters that you must configure for data sources when you configure a batch synchronization task by using the code editor.
Code sample for Writer (Industry Algorithm Edition, LLM-based AI Chat Edition, and High-performance Search Edition)
{
    "type": "job",
    "version": "1.0",
    "configuration": {
        "reader": {},
        "writer": {
            "plugin": "opensearch",
            "parameter": {
                "accessId": "*********",
                "accessKey": "********",
                "host": "http://yyyy.aliyuncs.com",
                "endpoint": "http://yyyy.aliyuncs.com",
                "indexName": "datax_xxx",
                "table": "datax_yyy",
                "column": [
                    "appkey",
                    "id",
                    "title",
                    "gmt_create",
                    "pic_default"
                ],
                "batchSize": 500,
                "writeMode": "add",
                "version": "v2",
                "ignoreWriteError": false
            }
        }
    }
}
Parameters for Writer (Industry Algorithm Edition, LLM-based AI Chat Edition, and High-performance Search Edition)
Parameter | Description | Required | Default value |
accessId | The AccessKey ID of your AccessKey pair. | Yes | N/A |
accessKey | The AccessKey secret of your AccessKey pair. It is used as a logon password. | Yes | N/A |
host | The traffic domain name of OpenSearch. You can log on to the OpenSearch console and go to the instance details page to obtain the domain name. | Yes | N/A |
endpoint | The control endpoint of OpenSearch. You can obtain the endpoint from the official website of the corresponding OpenSearch edition. For example, for the Industry Algorithm Edition, see Service endpoints. | Yes | N/A |
indexName | The name of the OpenSearch project. | Yes | N/A |
table | The name of the destination table. You can specify only one table because DataX does not support writing data to multiple tables at the same time. | Yes | N/A |
column | The columns to which you want to write data. To write data to all columns, set this parameter to … OpenSearch supports column filtering and reordering. For example, a table has columns a, b, and c. If you want to synchronize data only to columns c and b, you can set this parameter to "column":["c","b"]. | Yes | N/A |
batchSize | The number of data records to write in each batch. OpenSearch performs batch writes. The main strength of OpenSearch is in queries, and its write transactions per second (TPS) is not high. Set this parameter based on the resources allocated to your account. Typically, a single data record is smaller than 1 MB, and a single batch write is smaller than 2 MB. | This parameter is required for partitioned tables. Do not specify this parameter for non-partitioned tables. | 300 |
writeMode | The write mode. Set this parameter to "add" or "update" to ensure write idempotence. | Yes | N/A |
ignoreWriteError | Specifies whether to ignore write errors. The code sample sets "ignoreWriteError": false. | No | false |
version | The version of OpenSearch, such as v2. | No | v2 |
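The effect of batchSize can be illustrated with a short Python sketch that splits records into fixed-size batches. It shows only the grouping described in the table; the helper name batches is hypothetical, and the sketch does not enforce the 1 MB per record or 2 MB per batch sizes mentioned above.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batches(records: Sequence[T], batch_size: int = 300) -> Iterator[List[T]]:
    """Yield consecutive groups of at most batch_size records."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(records), batch_size):
        yield list(records[start:start + batch_size])


# 1200 records with "batchSize": 500 are written as three batches.
sizes = [len(batch) for batch in batches(list(range(1200)), batch_size=500)]
print(sizes)  # [500, 500, 200]
```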
Code sample for Writer (Vector Search Edition and Retrieval Engine Edition)
{
    "stepType": "opensearch",
    "parameter": {
        "indexName": "",
        "column": [
            {
                "name": "col3double",
                "type": "DOUBLE"
            },
            {
                "name": "col2vector",
                "type": "MULTI_FLOAT"
            }
        ],
        "datasource": "zm_test_vector_01",
        "batchSize": "500",
        "table": "demotable"
    },
    "name": "Writer",
    "category": "writer"
}
Parameters for Writer (Vector Search Edition and Retrieval Engine Edition)
Parameter | Description | Required | Default value |
table | The name of the destination table. You can specify only one table because DataX does not support writing data to multiple tables at the same time. | Yes | N/A |
column | The columns to which you want to write data. To write data to all columns, set this parameter to … OpenSearch supports column filtering and reordering. For example, a table has columns a, b, and c. If you want to synchronize data only to columns c and b, specify only columns c and b, in that order. | Yes | N/A |
batchSize | The number of data records to write in each batch. OpenSearch performs batch writes. The main strength of OpenSearch is in queries, and its write transactions per second (TPS) is not high. Set this parameter based on the resources allocated to your account. Typically, a single data record is smaller than 1 MB, and a single batch write is smaller than 2 MB. | This parameter is required for partitioned tables. Do not specify this parameter for non-partitioned tables. | 300 |