Configure OpenSearch Writer

Last Updated: Nov 26, 2018

The OpenSearch Writer plug-in inserts or updates data in OpenSearch. It is mainly intended for data developers who import processed data into OpenSearch so that the data can be retrieved by searching. The transmission speed depends on the QPS of the account that owns the OpenSearch table.

Under the hood, OpenSearch Writer writes data through the publicly available OpenSearch API.

Note:

  • If you use the OpenSearch Writer plug-in, make sure that the JDK version is 1.6-32 or later. You can run the java -version command to check the JDK version.
  • Currently, the default resource group cannot connect to a VPC environment. Running the plug-in in a VPC environment may cause network problems.

Plug-in features

Column issue

Columns in OpenSearch are unordered. OpenSearch Writer writes data strictly in the order of the columns you specify. If you specify fewer columns than the OpenSearch table contains, the remaining columns are filled with their default values or null.

For example, suppose the fields to import are b and c, but the OpenSearch table has three columns: a, b, and c. You can set the column configuration to "column": ["b", "c"], which means that the first and second columns from the Reader are imported into the OpenSearch fields b and c. Field a of each newly inserted row is set to its default value or null.
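
The positional mapping in the example above can be sketched in Python as follows. This is only an illustration of the rule, not the plug-in's actual code: the helper name `map_row` is hypothetical, and `None` stands in for "default value or null".

```python
def map_row(row, column, table_fields):
    """Map one Reader row onto OpenSearch fields by position (illustrative only)."""
    if len(row) != len(column):
        raise ValueError("row length must match the configured column list")
    # Every table field starts out empty; None stands in for "default value or null".
    record = {field: None for field in table_fields}
    # The i-th Reader value goes to the i-th configured column.
    for name, value in zip(column, row):
        record[name] = value
    return record


table_fields = ["a", "b", "c"]   # fields that exist in the OpenSearch table
column = ["b", "c"]              # the "column" setting of the Writer
record = map_row(["hello", 42], column, table_fields)
print(record)  # {'a': None, 'b': 'hello', 'c': 42}
```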

  • Error handling of column configuration

    To keep data writes reliable and avoid losing data from extra columns, OpenSearch Writer reports an error when more columns are written than the table contains.

    For example, if the OpenSearch table has columns a, b, and c, and OpenSearch Writer is configured to import more than three columns, OpenSearch Writer reports an error.

  • Table configuration notes

    OpenSearch Writer can write to only one table at a time.

  • Task re-running and failover

    When a task is rerun, the system automatically overwrites data by ID. Every record inserted into OpenSearch must have an ID, which uniquely identifies a row in OpenSearch. Records with the same ID are overwritten.
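
The overwrite-by-ID behavior on rerun can be pictured with a small Python sketch. It uses an in-memory dictionary as a stand-in for an OpenSearch table; the function name `write_records` and the field name `id` are assumptions made for illustration.

```python
def write_records(index, records, id_field="id"):
    """Write records into a dict keyed by ID; the same ID overwrites the old row."""
    for record in records:
        if record.get(id_field) is None:
            raise ValueError("every record must carry an ID")
        index[record[id_field]] = record
    return index


index = {}  # stand-in for the OpenSearch table
batch = [{"id": 1, "title": "first"}, {"id": 2, "title": "second"}]

write_records(index, batch)
write_records(index, batch)  # rerun of the same task: rows are overwritten, not duplicated
print(len(index))  # 2
```

Because each row is keyed by its ID, running the same batch twice leaves the table in the same state as running it once.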

OpenSearch Writer supports most data types in OpenSearch. Check whether your data type is supported.

OpenSearch Writer converts the data types in OpenSearch as follows.

Category          OpenSearch data type
Integer           Int
Floating point    Double/Float
String            TEXT/Literal/SHORT_TEXT
Date and time     Int
Boolean           Literal

Parameter description

  • datasource

    • Description: the data source name. It must be identical to the name of the data source you added. Data sources can be added in script mode.

    • Required: Yes

    • Default value: None

  • accessId

    • Description: logon ID for the Alibaba Cloud system.

    • Required: Yes

    • Default value: None
  • accessKey

    • Description: logon Key for the Alibaba Cloud system.

    • Required: Yes

    • Default value: None
  • endpoint

    • Description: the endpoint linked to OpenSearch.

    • Required: Yes

    • Default value: None
  • indexName

    • Description: the name of the OpenSearch project.

    • Required: Yes

    • Default value: None
  • table

    • Description: the name of the table to write data to.

    • Required: Yes

    • Default value: None
  • host

    • Description: the linked endpoint.

    • Required: This option is required for a partition table. Do not enter this field if the target table is not a partition table.

    • Default value: null
  • column

    • Description: the list of fields to import. To import all fields, set "column": ["*"]. To import only some of the OpenSearch columns, list them, for example, "column": ["id", "name"]. OpenSearch Writer supports column filtering and column reordering. For example, if a table has three fields a, b, and c, and you only want to synchronize fields c and b, set the value to ["c", "b"]. During the import, field a is automatically set to null.

    • Required: Yes

    • Default value: None
  • batchSize

    • Description: the number of rows to write in each batch. Data is written to OpenSearch in batches. OpenSearch is optimized for queries, and its write throughput (TPS) is comparatively low, so set this value based on the resources allocated to your account. In OpenSearch, a single record is generally smaller than 1 MB, and the data written in one request is generally smaller than 2 MB.

    • Required: No

    • Default value: 300
  • writeMode

    • Description: the write mode. Set "writeMode" to "add" or "update" to make write operations idempotent.

      - "add": When a write is retried after a failed attempt, OpenSearch Writer clears the existing data and imports the new data (atomic operation).
      - "update": The data is written as a modification of the existing record (atomic operation).

      In OpenSearch, a batch insert is not an atomic operation and may partially succeed. Therefore, writeMode is a critical option.

    • Required: Yes

    • Default value: None
  • ignoreWriteError

    • Description: specifies whether to ignore write errors.

      Configuration example: "ignoreWriteError": true. OpenSearch write operations are performed in batches. This parameter specifies whether to ignore a write failure that occurs in the current batch. If it is set to true, the remaining write operations continue. If it is set to false, the current task ends and an error is returned. We recommend that you use the default value.

    • Required: No

    • Default value: False
  • version

    • Description: The version information of OpenSearch.

    • Required: No

    • Default value: v2
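
How batchSize and ignoreWriteError interact can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the plug-in's implementation: `split_batches`, `write_all`, and the `send_batch` callback are hypothetical names, and the failing sender only simulates a write error.

```python
def split_batches(rows, batch_size=300):
    """Yield consecutive slices of at most batch_size rows."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def write_all(rows, send_batch, batch_size=300, ignore_write_error=False):
    """Send rows batch by batch; return the number of batches that failed."""
    failed = 0
    for batch in split_batches(rows, batch_size):
        try:
            send_batch(batch)
        except Exception:
            if not ignore_write_error:
                raise          # the task ends and the error is reported
            failed += 1        # the failure is skipped and later batches continue
    return failed


rows = list(range(700))
sent = []

def flaky_send(batch):
    # Simulated sender: the batch that starts at row 300 fails.
    if batch[0] == 300:
        raise RuntimeError("simulated write failure")
    sent.append(len(batch))

failed = write_all(rows, flaky_send, batch_size=300, ignore_write_error=True)
print(failed, sent)  # 1 [300, 100]
```

With ignore_write_error left at False, the same call would stop at the failing batch and raise the error instead of continuing.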

Development in wizard mode

Currently, development in wizard mode is not supported.

Development in script mode

Configure the data synchronization job to write data to OpenSearch.

  {
    "type": "job",
    "version": "1.0",
    "configuration": {
      "reader": {},
      "writer": {
        "plugin": "opensearch",
        "parameter": {
          "accessId": "*********",
          "accessKey": "********",
          "host": "http://yyyy.aliyuncs.com",
          "indexName": "datax_xxx",
          "table": "datax_yyy",
          "column": [
            "appkey",
            "id",
            "title",
            "gmt_create",
            "pic_default"
          ],
          "batchSize": 500,
          "writeMode": "add",
          "version": "v2",
          "ignoreWriteError": false
        }
      }
    }
  }
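
Because the job is plain JSON, it can be sanity-checked locally before it is submitted. The Python sketch below is a hypothetical pre-check, not part of DataWorks: the function name `check_writer` and the specific checks are assumptions, and the sample job is a shortened version of the script above.

```python
import json

job_text = """
{
  "type": "job",
  "configuration": {
    "reader": {},
    "writer": {
      "plugin": "opensearch",
      "parameter": {
        "indexName": "datax_xxx",
        "table": "datax_yyy",
        "column": ["id", "title"],
        "batchSize": 500,
        "writeMode": "add",
        "ignoreWriteError": false
      }
    }
  }
}
"""


def check_writer(text):
    """Return a list of problems found in the writer parameters (illustrative checks only)."""
    job = json.loads(text)  # raises json.JSONDecodeError for invalid JSON, e.g. an unquoted value
    params = job["configuration"]["writer"]["parameter"]
    problems = []
    if params.get("writeMode") not in ("add", "update"):
        problems.append('writeMode must be "add" or "update"')
    column = params.get("column")
    if not isinstance(column, list) or not column:
        problems.append("column must be a non-empty list")
    batch_size = params.get("batchSize", 300)  # 300 is the documented default
    if type(batch_size) is not int or batch_size <= 0:
        problems.append("batchSize must be a positive integer")
    return problems


problems = check_writer(job_text)
print(problems)  # []
```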