This topic describes how Open Search Writer works, its features, data types, and parameters, and how to configure it by using the code editor.
How it works
Open Search Writer inserts data into Open Search or updates existing data in Open Search. It is intended for developers who import data into Open Search so that the data can be searched.
Specifically, Open Search Writer uses the search API that is provided by Open Search to import data.
- Open Search V3 uses an internal dependency whose POM coordinates are com.aliyun.opensearch aliyun-sdk-opensearch 2.1.3.
- To use Open Search Writer, you must install JDK 1.6-32 or later. You can run the java -version command to view the JDK version.
- A sync node that is run on the default resource group may fail to connect to an Open Search instance that is deployed in a virtual private cloud (VPC).
Features
The columns in Open Search are unordered. Open Search Writer writes data in strict accordance with the order of the specified columns. If you specify fewer columns than the Open Search table contains, the columns that are not specified are set to the default value or null.
Assume that an Open Search table contains columns a, b, and c, and you only need to write data to columns b and c. You can set the column parameter to ["c","b"]. In this case, Open Search Writer imports the first and second columns of the source data that is obtained from a reader to columns c and b in the Open Search table. Column a in the Open Search table is set to the default value or null.
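The positional mapping described above can be sketched in a few lines of Python. This is only an illustration of the rule, not the writer's actual implementation: the function name `map_record_to_columns` and the `defaults` argument are hypothetical.

```python
def map_record_to_columns(record, columns, table_columns, defaults=None):
    """Pair source values with destination columns by position (illustrative only)."""
    defaults = defaults or {}
    # Columns that are not listed fall back to a default value or None (null).
    row = {name: defaults.get(name) for name in table_columns}
    # The i-th value of the source record goes to the i-th configured column.
    for name, value in zip(columns, record):
        row[name] = value
    return row


row = map_record_to_columns(
    record=["value_1", "value_2"],   # first and second source columns
    columns=["c", "b"],              # the column parameter
    table_columns=["a", "b", "c"],   # columns of the Open Search table
)
print(row)  # {'a': None, 'b': 'value_2', 'c': 'value_1'}
```

The first source value lands in column c and the second in column b, while column a keeps its default (here None).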
- Handling of column configuration errors
To avoid losing the data of redundant columns and ensure high data reliability, Open Search Writer returns an error message if the number of columns to be written is more than that in the destination Open Search table. For example, if an Open Search table contains columns a, b, and c, Open Search Writer returns an error if more than three columns are to be written to the table.
- Table configuration
Open Search Writer can write data to only one table at a time.
- Node rerunning
After a node is rerun, data is overwritten based on IDs. Therefore, the data written to Open Search must contain an ID column. An ID is a unique identifier of a row in Open Search. The existing data with the same ID as the new data will be overwritten.
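The effect of ID-based overwriting on a rerun can be pictured with a small Python sketch. An in-memory dictionary stands in for the Open Search table here, and `write_rows` is a hypothetical helper rather than part of any Open Search SDK.

```python
def write_rows(index, rows, id_column="id"):
    """Write rows into a dict keyed by ID, replacing rows that share an ID."""
    for row in rows:
        if id_column not in row:
            # Without an ID there is nothing to overwrite by, so reject the row.
            raise ValueError(f"row is missing the ID column {id_column!r}: {row}")
        index[row[id_column]] = row  # same ID: the existing row is replaced
    return index


index = {}  # stand-in for the destination table
batch = [{"id": 1, "title": "first"}, {"id": 2, "title": "second"}]
write_rows(index, batch)
write_rows(index, batch)  # rerun the node with the same data
print(len(index))  # 2
```

Because rows are keyed by ID, running the same write twice leaves two rows rather than four, which is why every record needs an ID column.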
Data types
Open Search Writer supports most Open Search data types. Make sure that your data types are supported.
Category | Open Search data type |
---|---|
Integer | INT |
Floating point | DOUBLE and FLOAT |
String | TEXT, LITERAL, and SHORT_TEXT |
Date and time | INT |
Boolean | LITERAL |
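Because date and time values map to INT and Boolean values map to LITERAL, source values of those types usually need to be converted before they are written. The sketch below shows one possible conversion. The millisecond epoch representation, the UTC handling of naive datetimes, and the "true"/"false" strings are assumptions made for illustration; check how your Open Search fields are actually defined.

```python
from datetime import datetime, timezone


def to_opensearch_value(value):
    """Convert a Python value to a form that fits the table above (assumed encodings)."""
    # bool must be checked before other types: bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "true" if value else "false"  # assumption: LITERAL holds "true"/"false"
    if isinstance(value, datetime):
        if value.tzinfo is None:
            value = value.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
        return int(value.timestamp() * 1000)  # assumption: milliseconds since the epoch
    return value  # integers, floats, and strings pass through unchanged


print(to_opensearch_value(True))
print(to_opensearch_value(datetime(2024, 1, 1, tzinfo=timezone.utc)))
```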
Parameters
Parameter | Description | Required | Default value |
---|---|---|---|
accessId | The AccessKey ID of the account that you can use to connect to the Open Search project. | Yes | N/A |
accessKey | The AccessKey secret of the account that you can use to connect to the Open Search project. | Yes | N/A |
host | The endpoint of Open Search. You can view the endpoint in the Alibaba Cloud Management Console. | Yes | N/A |
indexName | The name of the Open Search project. | Yes | N/A |
table | The name of the table to which data is written. You can specify only one table because Data Integration cannot import data to multiple tables at a time. | Yes | N/A |
column | The columns in the destination table to which data is written. To write data to all the columns in the destination table, set the value to an asterisk (*), for example, "column":["*"]. To write data to only specific columns, list those columns and separate them with commas (,), for example, "column":["id","name"]. Open Search Writer can filter columns and change the order of columns. For example, an Open Search table has three columns: a, b, and c. If you want to write data only to columns c and b, you can set the column parameter in the format of "column":["c","b"]. | Yes | N/A |
batchSize | The number of data records to write at a time. Open Search Writer writes data to Open Search in batches. Open Search is optimized for queries, so its write throughput in transactions per second (TPS) is generally not high. Set this parameter based on the resources available for the account that is used to connect to Open Search. Generally, the size of a data record must be less than 1 MB, and the size of the data records to write at a time must be less than 2 MB. | Required only for writing data to a partitioned table | 300 |
writeMode | The write mode. To ensure the idempotence of write operations, set the writeMode parameter to add/update when you configure Open Search Writer. | Yes | N/A |
ignoreWriteError | Specifies whether to ignore failed write operations. Example: "ignoreWriteError":false. | No | false |
version | The version of Open Search, for example, "version":"v3". We recommend that you use Open Search V3 because the push operation faces many constraints in Open Search V2. | No | v2 |
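The size guidance for batchSize (each record under 1 MB, each batch under 2 MB) can be combined with the record count when grouping records. The following Python sketch shows one way to do that. Measuring size as the UTF-8 length of the JSON-serialized record is an assumption, and `make_batches` is a hypothetical helper, not part of Data Integration.

```python
import json

MAX_RECORD_BYTES = 1 * 1024 * 1024  # a record must be smaller than 1 MB
MAX_BATCH_BYTES = 2 * 1024 * 1024   # a batch must be smaller than 2 MB


def make_batches(records, batch_size=300):
    """Yield lists of records that respect both the count and the size limits."""
    batch, batch_bytes = [], 0
    for record in records:
        # Assumption: size is measured on the JSON-serialized record.
        size = len(json.dumps(record, ensure_ascii=False).encode("utf-8"))
        if size >= MAX_RECORD_BYTES:
            raise ValueError("record is 1 MB or larger")
        # Close the current batch if adding this record would break a limit.
        if batch and (len(batch) >= batch_size or batch_bytes + size >= MAX_BATCH_BYTES):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(record)
        batch_bytes += size
    if batch:
        yield batch


batches = list(make_batches([{"id": i} for i in range(700)], batch_size=300))
print([len(b) for b in batches])  # [300, 300, 100]
```

A smaller batchSize lowers the risk of hitting the 2 MB limit when records are large, at the cost of more write requests.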
Configure Open Search Writer by using the code editor
{
    "type": "job",
    "version": "1.0",
    "configuration": {
        "reader": {},
        "writer": {
            "plugin": "opensearch",
            "parameter": {
                "accessId": "*********",
                "accessKey": "********",
                "host": "http://yyyy.aliyuncs.com",
                "indexName": "datax_xxx",
                "table": "datax_yyy",
                "column": [
                    "appkey",
                    "id",
                    "title",
                    "gmt_create",
                    "pic_default"
                ],
                "batchSize": 500,
                "writeMode": "add",
                "version": "v2",
                "ignoreWriteError": false
            }
        }
    }
}
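Before you submit a job, it can help to check the writer parameters against the parameter table. The Python sketch below is a hypothetical pre-flight check, not a tool shipped with Data Integration. It assumes that writeMode accepts the two values add and update, and that version accepts v2 and v3.

```python
REQUIRED = ("accessId", "accessKey", "host", "indexName", "table", "column", "writeMode")
WRITE_MODES = ("add", "update")  # assumption: the two values behind "add/update"
VERSIONS = ("v2", "v3")          # versions mentioned in the parameter table


def check_writer_parameter(parameter):
    """Return a list of problems found in the writer's parameter object."""
    problems = [f"missing required parameter: {name}"
                for name in REQUIRED if name not in parameter]
    column = parameter.get("column")
    if column is not None and (not isinstance(column, list) or not column):
        problems.append("column must be a non-empty list")
    mode = parameter.get("writeMode")
    if mode is not None and mode not in WRITE_MODES:
        problems.append(f"unexpected writeMode: {mode!r}")
    version = parameter.get("version", "v2")  # v2 is the documented default
    if version not in VERSIONS:
        problems.append(f"unexpected version: {version!r}")
    return problems


sample = {
    "accessId": "*********",
    "accessKey": "********",
    "host": "http://yyyy.aliyuncs.com",
    "indexName": "datax_xxx",
    "table": "datax_yyy",
    "column": ["appkey", "id", "title", "gmt_create", "pic_default"],
    "batchSize": 500,
    "writeMode": "add",
    "version": "v2",
    "ignoreWriteError": False,
}
print(check_writer_parameter(sample))  # []
```

An empty list means that none of the checked rules was violated. It does not guarantee that the job can connect to Open Search.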