DataWorks: OpenSearch data source

Last Updated: Sep 24, 2024

DataWorks provides OpenSearch Writer for writing data to OpenSearch data sources. This topic describes how data is written to OpenSearch data sources during batch (offline) synchronization.

Supported OpenSearch versions

  • For OpenSearch V3, OpenSearch Writer uses a second-party package with the following POM coordinates: group ID com.aliyun.opensearch, artifact ID aliyun-sdk-opensearch, version 2.1.3.

  • To use OpenSearch Writer, you must install JDK 1.6-32 or later. You can run the java -version command to view the JDK version.

Limits

  • OpenSearch Writer supports only exclusive resource groups for Data Integration, but not custom resource groups for Data Integration.

  • The columns in OpenSearch are unordered. OpenSearch Writer writes data in strict accordance with the order of the columns that you specify. If you specify fewer columns than the OpenSearch table contains, the unspecified columns are set to their default values or null.

    For example, an OpenSearch table contains columns a, b, and c, and you want to write data to columns b and c. You can set the column parameter to ["c","b"]. In this case, OpenSearch Writer imports the first and second columns of the source data that is obtained from a reader to columns c and b in the OpenSearch table. Column a in the OpenSearch table is set to the default value or null.
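The positional mapping in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the OpenSearch Writer implementation: map_record and its arguments are hypothetical names, and None stands in for "default value or null".

```python
def map_record(source_row, column, table_columns):
    """Pair source fields with the configured column names by position.

    Hypothetical helper for illustration; not part of OpenSearch Writer.
    """
    if len(source_row) != len(column):
        raise ValueError("source row and column list differ in length")
    # Columns that are not listed start as None (stand-in for default/null).
    doc = {name: None for name in table_columns}
    # The i-th source field goes to the i-th configured column.
    doc.update(zip(column, source_row))
    return doc


row = map_record(["first", "second"], ["c", "b"], ["a", "b", "c"])
print(row)  # {'a': None, 'b': 'second', 'c': 'first'}
```

The first source field lands in column c and the second in column b, because the order of the column parameter decides the mapping, not the order of the columns in the table.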

  • You can use only the code editor to configure a batch synchronization node to write data to OpenSearch data sources.

Data type mappings

OpenSearch Writer supports most OpenSearch data types. Make sure that the data types of your database are supported. The following table lists the data type mappings based on which OpenSearch Writer converts data types.

Category          OpenSearch data type
Integer           INT
Floating point    DOUBLE and FLOAT
String            TEXT, LITERAL, and SHORT_TEXT
Date and time     INT
Boolean           LITERAL
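As a pre-flight check, the mappings above can be expressed as a small lookup. The following Python sketch is an assumption-laden illustration: TYPE_MAPPING and is_supported are hypothetical names, and the sketch only checks type names. It does not convert values.

```python
# Category -> accepted OpenSearch field types, copied from the mappings above.
TYPE_MAPPING = {
    "integer": {"INT"},
    "floating point": {"DOUBLE", "FLOAT"},
    "string": {"TEXT", "LITERAL", "SHORT_TEXT"},
    "date and time": {"INT"},
    "boolean": {"LITERAL"},
}


def is_supported(category, opensearch_type):
    """Return True if the source category may be written to the field type."""
    allowed = TYPE_MAPPING.get(category.lower(), set())
    return opensearch_type.upper() in allowed


print(is_supported("Boolean", "LITERAL"))  # True
print(is_supported("Boolean", "INT"))      # False
```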

Develop a data synchronization node

Additional information

Handling column configuration errors

To prevent data loss caused by redundant columns and ensure high data reliability, OpenSearch Writer returns an error if the number of columns that are to be written is more than that in the destination table. For example, an OpenSearch table contains columns a, b, and c. If more than three columns need to be written to the table, OpenSearch Writer returns an error.
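A check of this kind might look like the following Python sketch. check_column_count is a hypothetical name, and the real error raised by OpenSearch Writer is not shown in this topic, so the message is invented for illustration.

```python
def check_column_count(column, table_columns):
    """Reject a column list that is longer than the destination table."""
    if len(column) > len(table_columns):
        raise ValueError(
            f"{len(column)} columns configured, but the destination table "
            f"has only {len(table_columns)}"
        )


# Two configured columns against a three-column table: passes silently.
check_column_count(["c", "b"], ["a", "b", "c"])

# Four configured columns against a three-column table: rejected.
try:
    check_column_count(["a", "b", "c", "d"], ["a", "b", "c"])
    rejected = False
except ValueError:
    rejected = True
```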

Table configuration

OpenSearch Writer can write data to only one table at a time.

Node rerunning

After a node is rerun, data is overwritten based on IDs. Therefore, the data written to OpenSearch must contain an ID column. An ID is a unique identifier of a row in OpenSearch. The existing data that has the same ID as the new data is overwritten.
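The overwrite-by-ID behavior can be modeled with a dictionary keyed by ID. This is a minimal sketch that assumes the ID column is named id. write_records and the in-memory table are hypothetical stand-ins for OpenSearch, not its API.

```python
def write_records(table, records, id_field="id"):
    """Store each record under its ID; an existing ID is overwritten."""
    for record in records:
        if id_field not in record:
            raise KeyError(f"record has no {id_field!r} column: {record}")
        table[record[id_field]] = dict(record)


table = {}
batch = [{"id": 1, "title": "a"}, {"id": 2, "title": "b"}]

write_records(table, batch)
write_records(table, batch)  # simulated rerun of the same node

rows_after_rerun = len(table)  # still 2: the rerun overwrote, not duplicated
```

Because every write is keyed by ID, running the same batch twice leaves the table in the same state, which is why a record without an ID cannot be rerun safely.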

Appendix: Code and parameters

Appendix: Configure a batch synchronization node by using the code editor

If you use the code editor to configure a batch synchronization task, you must configure parameters for the reader and writer of the related data source based on the format requirements in the code editor. For more information about the format requirements, see Configure a batch synchronization task by using the code editor. The following information describes the configuration details of parameters for the reader and writer in the code editor.

Code for OpenSearch Writer

{
    "type": "job",
    "version": "1.0",
    "configuration": {
        "reader": {},
        "writer": {
            "plugin": "opensearch",
            "parameter": {
                "accessId": "*********",
                "accessKey": "********",
                "host": "http://yyyy.aliyuncs.com",
                "indexName": "datax_xxx",
                "table": "datax_yyy",
                "column": [
                    "appkey",
                    "id",
                    "title",
                    "gmt_create",
                    "pic_default"
                ],
                "batchSize": 500,
                "writeMode": "add",
                "version": "v2",
                "ignoreWriteError": false
            }
        }
    }
}
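If you prefer to generate this configuration rather than edit it by hand, the following Python sketch builds the same structure and serializes it with json.dumps, which guarantees syntactically valid JSON. The environment variable names OPENSEARCH_ACCESS_ID and OPENSEARCH_ACCESS_KEY are assumptions made for this example, not names that DataWorks defines. The placeholder host, index, and table values are taken from the sample above.

```python
import json
import os

writer_parameter = {
    # Credentials are read from hypothetical environment variables so that
    # they are not hard-coded; the fallbacks are placeholders only.
    "accessId": os.environ.get("OPENSEARCH_ACCESS_ID", "<your-access-id>"),
    "accessKey": os.environ.get("OPENSEARCH_ACCESS_KEY", "<your-access-key>"),
    "host": "http://yyyy.aliyuncs.com",
    "indexName": "datax_xxx",
    "table": "datax_yyy",
    "column": ["appkey", "id", "title", "gmt_create", "pic_default"],
    "batchSize": 500,
    "writeMode": "add",
    "version": "v2",
    "ignoreWriteError": False,
}

job = {
    "type": "job",
    "version": "1.0",
    "configuration": {
        "reader": {},  # fill in the reader of your source data source
        "writer": {"plugin": "opensearch", "parameter": writer_parameter},
    },
}

job_json = json.dumps(job, indent=4)
print(job_json)
```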

Parameters in code for OpenSearch Writer

accessId
    Description: The AccessKey ID of the account that you use to connect to OpenSearch.
    Required: Yes
    Default value: No default value

accessKey
    Description: The AccessKey secret of the account that you use to connect to OpenSearch.
    Required: Yes
    Default value: No default value

host
    Description: The endpoint of OpenSearch. You can obtain the endpoint in the Alibaba Cloud Management Console.
    Required: Yes
    Default value: No default value

indexName
    Description: The name of the OpenSearch project.
    Required: Yes
    Default value: No default value

table
    Description: The name of the table to which you want to write data. You can specify only one table because Data Integration cannot import data to multiple tables at a time.
    Required: Yes
    Default value: No default value

column
    Description: The names of the columns to which you want to write data. To write data to all the columns in the destination table, set this parameter to an asterisk (*), such as "column":["*"]. To write data only to specific columns in the destination table, set this parameter to the column names, separated with commas (,), such as "column":["id","name"].
    OpenSearch Writer can filter columns and change the order of columns. For example, an OpenSearch table contains three columns: a, b, and c. If you want to write data only to columns c and b, you can set the column parameter to ["c","b"]. During data synchronization, column a is automatically set to its default value or null.
    Required: Yes
    Default value: No default value

batchSize
    Description: The number of data records to write at a time. OpenSearch Writer writes data records to OpenSearch in batches. OpenSearch provides a data query feature, and in most cases its transactions per second (TPS) is not high. Set this parameter based on the resources available to the account that is used to connect to OpenSearch.
    In most cases, the size of a data record must be less than 1 MB, and the total size of the data records to write at a time must be less than 2 MB.
    Required: Required only for writing data to a partitioned table
    Default value: 300

writeMode
    Description: The write mode. To ensure the idempotence of write operations, set this parameter to add or update.
      • add: If a failure occurs and the synchronization node is rerun, OpenSearch Writer deletes existing data records and inserts new data records to OpenSearch. This is an atomic operation.
      • update: OpenSearch Writer updates existing data records based on new data records. This is also an atomic operation.
    Note: Writing multiple data records to OpenSearch at a time is not an atomic operation. Some of the data records may fail to be written. Exercise caution when you configure the writeMode parameter. OpenSearch V3 does not support the update mode.
    Required: Yes
    Default value: No default value

ignoreWriteError
    Description: Specifies whether to ignore the write operations that fail. Example: "ignoreWriteError":true.
    If OpenSearch Writer writes multiple data records to OpenSearch at a time, this parameter specifies whether to ignore write operations that fail in the current batch. If you set this parameter to true, OpenSearch Writer continues to perform other write operations. If you set this parameter to false, the synchronization node ends, and OpenSearch Writer returns an error. We recommend that you use the default value.
    Required: No
    Default value: false

version
    Description: The version of OpenSearch, such as "version":"v3". We recommend that you use OpenSearch V3 because the push operation has many limits in OpenSearch V2.
    Required: No
    Default value: v2
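The interaction between batchSize, the 1 MB and 2 MB size limits, and ignoreWriteError can be sketched as follows. This is an illustration under stated assumptions, not the OpenSearch Writer implementation. make_batches, write_all, and the push callable are hypothetical names. Record size is approximated as the length of the UTF-8 encoded JSON text. A failed push is treated as failing the whole batch, which simplifies the per-operation behavior described for ignoreWriteError.

```python
import json

MAX_RECORD_BYTES = 1024 * 1024     # a single record must stay below 1 MB
MAX_BATCH_BYTES = 2 * 1024 * 1024  # one batch must stay below 2 MB


def make_batches(records, batch_size=300):
    """Yield lists of records that respect the count and size limits."""
    batch, batch_bytes = [], 0
    for record in records:
        size = len(json.dumps(record, ensure_ascii=False).encode("utf-8"))
        if size >= MAX_RECORD_BYTES:
            raise ValueError("record is 1 MB or larger")
        # Flush before adding if the batch is full by count or by size.
        full = len(batch) >= batch_size or batch_bytes + size >= MAX_BATCH_BYTES
        if batch and full:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(record)
        batch_bytes += size
    if batch:
        yield batch


def write_all(records, push, batch_size=300, ignore_write_error=False):
    """Send every batch through push; return the number of skipped records."""
    skipped = 0
    for batch in make_batches(records, batch_size):
        try:
            push(batch)  # hypothetical callable that raises on failure
        except Exception:
            if not ignore_write_error:
                raise  # the node ends with an error
            skipped += len(batch)  # continue with the remaining batches
    return skipped


def flaky_push(batch):
    """Stand-in for a real write; fails for the batch that contains id 4."""
    if any(r["id"] == 4 for r in batch):
        raise RuntimeError("simulated write failure")


records = [{"id": i} for i in range(7)]
sizes = [len(b) for b in make_batches(records, batch_size=3)]
skipped = write_all(records, flaky_push, batch_size=3, ignore_write_error=True)
print(sizes, skipped)  # [3, 3, 1] 3
```

With ignore_write_error=False, the same call would raise at the second batch and the remaining records would not be written, which corresponds to the default behavior of ending the node with an error.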