Use data processing plug-ins

Last Updated: Sep 09, 2021

OpenSearch allows you to upload data by using the API, SDKs, or the console. You can also configure a data source to synchronize data from an existing database to OpenSearch. If you use a data source to synchronize data to OpenSearch, you must configure the information about the data source in the console. OpenSearch provides several data processing plug-ins that perform simple data conversion operations. You can use a data processing plug-in when you configure field mappings between OpenSearch tables and source tables. If you use the API or SDKs to upload data, see the relevant topics. In this case, you cannot use the data processing plug-ins that are described in this topic, and you must process the data yourself before you upload it.
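
Because the plug-ins are unavailable for API and SDK uploads, equivalent conversions have to run in your own code before the upload. The following Python snippet is a minimal sketch of such a pre-upload step, not an OpenSearch SDK call. The record layout (`id`, `category`, `title_html`) and the helper names `strip_html` and `prepare_record` are assumptions made for illustration. It removes HTML tags from one field and joins two fields with an underscore (_).

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects only text nodes and ignores every tag."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(value: str) -> str:
    """Return `value` with HTML tags removed."""
    parser = _TextOnly()
    parser.feed(value)
    parser.close()  # flush any text that is still buffered
    return "".join(parser.parts)


def prepare_record(raw: dict) -> dict:
    """Hypothetical pre-upload conversion; the field names are illustrative."""
    return {
        "id": raw["id"],
        "title": strip_html(raw["title_html"]),
        "doc_key": "_".join([raw["category"], raw["id"]]),
    }


record = prepare_record(
    {"id": "42", "category": "news",
     "title_html": '<div id="copyright">OpenSearch</div>'}
)
print(record)
```

The converted record can then be passed to whichever upload method you use. The upload call itself is omitted here because it depends on the API or SDK version.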

You can associate an OpenSearch table with multiple tables in an ApsaraDB RDS for MySQL or a PolarDB for MySQL data source in the case of database and table sharding. However, you can associate an OpenSearch table with only one MaxCompute source table. If you need to synchronize data from multiple MaxCompute source tables, join the tables to form one table and upload the table.

Data processing plug-ins

To use specific search features or functions, you must configure fields of specific types. For example, to convert fields of other types to fields of an array type, you must use a plug-in that is described in the following table.

Note: You can configure a plug-in when you configure a data source for an application rather than when you define the application schema. You can configure a plug-in only after a data source is configured.

| Plug-in | Description | Example |
| --- | --- | --- |
| JsonKeyValueExtractor | This plug-in extracts the value of a specified key from a source field in JSON format. The extracted value is used as the value of the destination field. Only the value of the specified key can be extracted. | The value of the title key is extracted from {"title":"the content","body":"the content"}. If the extracted value is in JSON array format, it is converted to a field value of an array type. Make sure that the type of the extracted value is consistent with that of the destination field. Otherwise, the extracted value is lost. The JSON array format here refers to the JSON array format that is defined by OpenSearch. Sample field of the LITERAL_ARRAY type: {"tags":["a","b","c"]}. Sample field of the INT_ARRAY type: {"tags":[1,2,3]}. |
| MultiValueSpliter | This plug-in uses a delimiter to divide the value of a source field into multiple values. The values are used as the array elements of the destination field. The destination field must be of an array type. If you use a non-printable character as the delimiter, you must use Unicode characters to identify the non-printable character. | The value of a source field is 1,2,3. You can enter a comma (,) when you specify the delimiter. |
| KeyValueExtractor | This plug-in extracts the value of a specified key from a source field that consists of key-value pairs. The extracted value is used as the value of the destination field. Only the value of the specified key can be extracted. Delimiters are not required. | For the source value key1:value1,value2;key2:value3, the keys are key1 and key2. Key-value pairs are separated by semicolons (;), keys and values are separated by colons (:), and values are separated by commas (,). If the extracted value is separated by delimiters, it is converted to a field value of an array type. Make sure that the type of the extracted value is consistent with that of the destination field. Otherwise, the extracted value is lost. If two identical keys exist, only the value of the second key is extracted. |
| StringCatenateExtractor | This plug-in concatenates the values of specified fields into a string in a specified sequence. This plug-in cannot concatenate fields of integer types. We recommend that you use fields of literal types. Separate multiple fields with commas (,). | You can use the plug-in to concatenate the field1 field and the field2 field into a new field by using an underscore (_). In addition, you can use the $table system variable to obtain the current table name. |
| HTMLTagRemover | This plug-in removes HTML tags from the value of a source field. The value without HTML tags is then used as the value of the destination field. | The value of a source field is <div id="copyright">OpenSearch</div>. If you use the plug-in to remove HTML tags, the value of the destination field is OpenSearch. |
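
The extraction rules for JsonKeyValueExtractor and KeyValueExtractor can be illustrated with a short Python sketch. This is only a local emulation of the behavior described above, not the plug-in implementation. The function names and the default delimiters (taken from the key1:value1,value2;key2:value3 example) are assumptions, and edge cases such as type mismatches with the destination field are not modeled.

```python
import json


def json_key_value_extract(source: str, key: str):
    """Return the value stored under `key` in a JSON object string.

    A JSON array is returned as a Python list, which mirrors the
    conversion to an array-type field. Returns None if the key is absent.
    """
    return json.loads(source).get(key)


def key_value_extract(source: str, key: str,
                      pair_sep: str = ";", kv_sep: str = ":",
                      value_sep: str = ","):
    """Return the value of `key` from text such as 'k1:v1,v2;k2:v3'.

    A later duplicate overwrites an earlier one, so with two identical
    keys the value of the second key is returned. A value that contains
    `value_sep` is returned as a list. Returns None if the key is absent.
    """
    found = None
    for pair in source.split(pair_sep):
        name, sep, value = pair.partition(kv_sep)
        if sep and name == key:
            found = value.split(value_sep) if value_sep in value else value
    return found


doc = '{"title":"the content","body":"the content"}'
print(json_key_value_extract(doc, "title"))
print(json_key_value_extract('{"tags":["a","b","c"]}', "tags"))

pairs = "key1:value1,value2;key2:value3"
print(key_value_extract(pairs, "key1"))
print(key_value_extract(pairs, "key2"))
```

In this sketch, extracting key1 yields a list of two values and extracting key2 yields a single string, which corresponds to mapping the first to an array-type destination field and the second to a non-array field.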

Procedure for using a data processing plug-in

This section describes how to use the MultiValueSpliter plug-in to map a field of the STRING type to a field of the LITERAL_ARRAY type in OpenSearch. In this example, the value of the source field is "opensearch,odps,oss,rds,polardb". To map the source field to a field of the LITERAL_ARRAY type in OpenSearch, perform the following steps:

1. On the details page of an application, click Offline Change. On the Modify Application page, add a field of the LITERAL_ARRAY type.

2. Click Next. Then, click Next again to go to the step of configuring a data source.

3. Click Edit to configure a data processing plug-in. Add a field mapping and select the source field to be mapped. To configure a data processing plug-in for the field to be mapped, click the plus sign (+) in the Content Conversion column.

4. Click Save. After the reindexing is complete, you can check the field for which the data processing plug-in is configured.
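
To check what the destination field should contain after reindexing, you can reproduce the split locally. The following Python sketch emulates the effect of MultiValueSpliter on the sample value from this procedure. The function name `multi_value_split` is hypothetical, and the sketch assumes that the comma (,) is configured as the delimiter.

```python
def multi_value_split(value: str, delimiter: str = ",") -> list:
    """Split a source string into the elements of an array-type field."""
    return value.split(delimiter)


source_value = "opensearch,odps,oss,rds,polardb"
literal_array = multi_value_split(source_value)
print(literal_array)
# ['opensearch', 'odps', 'oss', 'rds', 'polardb']
```

Each element of the resulting list corresponds to one element of the LITERAL_ARRAY field.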