OpenSearch: Use data processing plug-ins

Last Updated: Apr 09, 2024

OpenSearch allows you to upload data by using an API operation, OpenSearch SDKs, or the console. You can also configure a data source to synchronize data from an existing database to OpenSearch. If you use a data source to synchronize data on the cloud to OpenSearch, you must configure the data source in the console. OpenSearch provides several data processing plug-ins for simple data conversions. You can apply a plug-in when you configure field mappings between OpenSearch tables and source tables. If you use the API operation or OpenSearch SDKs to upload data, you cannot use the data processing plug-ins that are described in this topic and must process the data yourself before you upload it. For more information about these upload methods, see the relevant topics.

If a database or table is sharded, you can associate an OpenSearch table with multiple tables in an ApsaraDB RDS or PolarDB data source. However, you can associate an OpenSearch table with only one MaxCompute source table. To synchronize data from multiple MaxCompute source tables, join them into one table and then upload that table.

Data processing plug-ins

Specific search features and functions require fields of specific types. For example, to convert a field of another type to a field of the ARRAY type, you must use one of the plug-ins described in the following table. Otherwise, you cannot reference the field.

Note: You configure a plug-in when you configure a data source for an application, not when you define the application schema. A plug-in can be configured only after the data source is configured.

Plug-in

Description

Example

JsonKeyValueExtractor

This plug-in extracts the value of a specified key from a source field in JSON format. The extracted value is used as the value of the destination table field. Only the value of the specified key can be extracted.

For the source value {"title":"the content","body":"the content"}, the plug-in extracts the value of the title key. If the extracted value is a JSON array, it is converted to a field value of an array type. Here, JSON array refers to the JSON array format that is defined by OpenSearch. Sample field of the LITERAL_ARRAY type: {"tags":["a","b","c"]}. Sample field of the INT_ARRAY type: {"tags":[1,2,3]}. Make sure that the type of the extracted value is consistent with the type of the destination table field. Otherwise, the extracted value is lost.
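The extraction described above can be approximated locally, for example to check what a source value would yield before you configure the plug-in. The following is a minimal Python sketch, not the plug-in's implementation; the function name extract_json_key is hypothetical, and returning None for invalid input is an assumption of this sketch.

```python
import json


def extract_json_key(source_value, key):
    """Return the value stored under `key` in a JSON object string.

    Hypothetical helper that only mimics the documented behavior;
    it returns None when the input is not a JSON object or the key is absent.
    """
    try:
        doc = json.loads(source_value)
    except (TypeError, ValueError):
        return None  # not valid JSON, so there is nothing to extract
    if not isinstance(doc, dict):
        return None
    return doc.get(key)


# A plain string value maps to a scalar field.
title = extract_json_key('{"title":"the content","body":"the content"}', "title")

# JSON arrays map to array values (compare LITERAL_ARRAY and INT_ARRAY).
literal_tags = extract_json_key('{"tags":["a","b","c"]}', "tags")
int_tags = extract_json_key('{"tags":[1,2,3]}', "tags")

print(title, literal_tags, int_tags)
```

In this sketch the Python type of the result (str, list of str, list of int) stands in for the destination field type that the extracted value must match.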

MultiValueSpliter

This plug-in splits the value of a source field into multiple values at a specified delimiter. The resulting values are used as the content of the destination table field. The destination table field must be a field of the ARRAY type.

Note

  • If the delimiter is a common non-printable character, such as \t, you can enter it directly. If the delimiter is an uncommon non-printable character, you must enter it as a Unicode escape sequence, such as \u001D.

  • The plug-in also supports multi-character delimiters, such as ## and \t\t.

If the value of the source field is 1,2,3, specify a comma (,) as the delimiter.
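The splitting rules above, including multi-character and non-printable delimiters, can be illustrated with a short Python sketch. The function name split_multi_value is hypothetical, and the sketch assumes the delimiter is matched as a literal string rather than as a pattern.

```python
def split_multi_value(source_value, delimiter):
    """Split a source value on a literal delimiter of one or more characters.

    Hypothetical helper; treating the delimiter as a literal string
    is an assumption of this sketch.
    """
    if not delimiter:
        raise ValueError("delimiter must not be empty")
    return source_value.split(delimiter)


# Single-character delimiter, as in the example above.
numbers = split_multi_value("1,2,3", ",")

# Multi-character delimiters such as ## and \t\t.
hashed = split_multi_value("a##b##c", "##")
tabbed = split_multi_value("p\t\tq", "\t\t")

# An uncommon non-printable delimiter written as a Unicode escape.
grouped = split_multi_value("x\u001Dy", "\u001D")

print(numbers, hashed, tabbed, grouped)
```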

KeyValueExtractor

This plug-in extracts the value of a specified key from a source field that contains key-value pairs. The extracted value is used as the value of the destination table field. Only the value of the specified key can be extracted. Specifying a delimiter for multiple values is optional.

For the source value key1:value1,value2;key2:value3, the keys are key1 and key2. Key-value pairs are separated by semicolons (;), keys and values are separated by colons (:), and multiple values are separated by commas (,). If you specify a delimiter for the extracted value, the value is converted to a field value of the ARRAY type. Make sure that the type of the extracted value is consistent with the type of the destination table field. Otherwise, the extracted value is lost. If two identical keys exist, only the value of the second key is extracted.
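The parsing rules in this example can be sketched in Python as follows. The function name extract_kv and its parameter names are hypothetical, and the defaults simply mirror the separators used in the example above; this is an illustration, not the plug-in's implementation.

```python
def extract_kv(source_value, key, pair_sep=";", kv_sep=":", value_sep=None):
    """Return the value of `key` from a string such as 'k1:v1,v2;k2:v3'.

    Hypothetical helper. If `value_sep` is given, the value is split into
    a list (array); otherwise it is returned as a single string.
    When the key occurs more than once, the later occurrence wins.
    """
    result = None
    for pair in source_value.split(pair_sep):
        k, sep, v = pair.partition(kv_sep)
        if not sep or k != key:
            continue  # malformed pair or a different key
        result = v.split(value_sep) if value_sep else v
    return result


source = "key1:value1,value2;key2:value3"

as_array = extract_kv(source, "key1", value_sep=",")   # split into an array
as_string = extract_kv(source, "key1")                 # kept as one string
second_wins = extract_kv("k:a;k:b", "k")               # duplicate key

print(as_array, as_string, second_wins)
```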

StringCatenateExtractor

This plug-in concatenates the values of the specified fields into a string, in the order in which the fields are specified. This plug-in cannot concatenate fields of the INT type. We recommend that you use fields of the LITERAL type. Separate multiple field names with commas (,). The fields must be fields of the destination table.

For example, you can use the plug-in to concatenate the field1 and field2 fields into a new field, joined by an underscore (_). You can also obtain the name of the current table from the system variable $table. $table is displayed only when a table-sharding wildcard is configured.
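A minimal Python sketch of this concatenation follows. The function name concatenate_fields, the row dictionary, and the table name orders_01 are hypothetical. Rejecting non-string values is a choice made in this sketch to reflect the restriction on INT fields; it is not a statement about how the plug-in reports such fields.

```python
def concatenate_fields(row, field_names, joiner="_", table_name=None):
    """Join the values of `field_names` from `row` in the given order.

    Hypothetical helper. The name "$table" is replaced by `table_name`,
    standing in for the system variable described above.
    """
    parts = []
    for name in field_names:
        if name == "$table":
            if table_name is None:
                raise ValueError("$table requires a table name")
            parts.append(table_name)
            continue
        value = row[name]
        if not isinstance(value, str):
            # Sketch-only check: only string (LITERAL-like) values are accepted.
            raise TypeError(f"field {name!r} is not a string")
        parts.append(value)
    return joiner.join(parts)


row = {"field1": "abc", "field2": "def"}

joined = concatenate_fields(row, ["field1", "field2"])
with_table = concatenate_fields(row, ["$table", "field1"], table_name="orders_01")

print(joined, with_table)
```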

HTMLTagRemover

This plug-in removes HTML tags from the value of a source field. The value without HTML tags is then used as the value of the destination field.

The value of a source field is <div id="copyright">OpenSearch</div>. After the plug-in removes the HTML tags, the value of the field is OpenSearch.
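Tag removal of this kind can be sketched with the HTML parser in the Python standard library. The class name TextCollector and the function name remove_html_tags are hypothetical, and the sketch keeps all text content, which may differ from how the plug-in treats elements such as scripts or styles.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def remove_html_tags(source_value):
    """Return `source_value` with HTML tags dropped (hypothetical helper)."""
    collector = TextCollector()
    collector.feed(source_value)
    collector.close()  # flush any buffered text before reading the chunks
    return "".join(collector.chunks)


cleaned = remove_html_tags('<div id="copyright">OpenSearch</div>')
print(cleaned)
```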

Note

For more information, see MultiValueSpliter configuration.