OpenSearch provides data processing plugins for simple data transformations during data source configuration.
Overview
You can import data into OpenSearch by using an API, an SDK, or the upload interface, or sync data directly from an ApsaraDB database. If you upload data by using an API or SDK, follow the API reference. The plugins in this topic do not apply to this method, and you must process the data before you push it. If you sync data from a cloud database, configure the data source in the console. You can use data processing plugins for simple transformations when you configure field mappings.
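Because the plugins do not run on data pushed through the API or an SDK, any equivalent transformation has to happen in your own code before the push. The following is a minimal Python sketch of two such transformations, stripping HTML tags and concatenating fields, which mirror the HTMLTagRemover and StringConcatenateExtractor plugins described in the table later in this topic. The record layout, the field names (`id`, `shop`, `body`, `shop_id`), and the helper functions are hypothetical, and the actual push call is omitted because it depends on the API or SDK you use.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(value):
    """Return the plain text of an HTML fragment, with tags removed."""
    collector = _TextCollector()
    collector.feed(value)
    collector.close()  # flush any text still buffered by the parser
    return "".join(collector.parts)


def concat_fields(record, field_names, separator="_"):
    """Join the string values of the named fields in the given order."""
    return separator.join(record[name] for name in field_names)


# Hypothetical source row; the field names are illustrative only.
row = {"id": "42", "shop": "books", "body": '<div id="copyright">OpenSearch</div>'}

document = {
    "id": row["id"],
    "body": strip_html(row["body"]),                 # "OpenSearch"
    "shop_id": concat_fields(row, ["shop", "id"]),   # "books_42"
}
print(document)
```

The resulting `document` dictionary is what you would then hand to your push code. This sketch only approximates the plugin behavior described in this topic; it is not the plugins' implementation.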
An OpenSearch table can sync data from multiple RDS or PolarDB source tables, for example when the source database is sharded. However, you can configure only one ODPS source. If you need data from multiple ODPS source tables, you must merge it into a single table before you import it.
Data processing plugins
Some search features require special field types. For example, you cannot map a source value directly to an Array-type field; the value must first be transformed by one of the following plugins.
Configure these plugins during data source configuration, not when you define the application schema. You can set up the plugins only after you configure a data source.
| Configuration item name | Description | Example |
| --- | --- | --- |
| JsonKeyValueExtractor | Extracts the value of a specified key from a JSON-formatted source field and writes it to the destination field. Only one key can be extracted per configuration. | Extracts the value of the `title` key from `{"title":"the content","body":"the content"}`. If the value is a JSON array, it is converted to the content of a field of the Array type. You must ensure that the data type of the extracted value matches the data type of the destination field. If the data types do not match, the data is lost. The JSON array format is defined by OpenSearch. For example, for a `literal_array` field type, the format is `{"tags":["a","b","c"]}`. For an `int_array` field type, the format is `{"tags":[1,2,3]}`. |
| MultiValueSplitter | Splits a source field into multiple values by using a specified separator and writes the results to the destination field, which must be of the Array type. For non-printable separators, use Unicode representation, such as `\u001D`. | If the source field contains `1,2,3`, you can specify a comma (`,`) as the separator. |
| KeyValueExtractor | Extracts the value of a specified key from a source field in key-value (KV) format and writes it to the destination field. Only one key can be extracted per configuration. Separator parameters are optional. | For example, the source field contains `key1:value1,value2;key2:value3`. In this example, the key separator is a semicolon (`;`), the key-value separator is a colon (`:`), and the multi-value separator is a comma (`,`). If a multi-value separator is configured, the extracted value is converted to the content of a field of the Array type. You must ensure that the data type of the extracted value matches the data type of the destination field. If the data types do not match, the data is lost. If the source field contains duplicate keys, only the value of the last key is extracted. |
| StringConcatenateExtractor | Concatenates the values of multiple fields into a single string in a specified order. This plugin supports only `literal`-type fields, not `int`-type fields. Separate fields in the list with commas. All specified fields must be destination fields. | For example, you can concatenate the values of `field1` and `field2` with an underscore (`_`) to form the content of a new field. Additionally, you can use the system variable `$table` to retrieve the current table name. The `$table` variable is available only when a wildcard character for table sharding is configured. |
| HTMLTagRemover | Strips HTML tags from a source field and copies the plain text to the destination field. If the source and destination fields are the same, the original content is replaced. | The plugin parses the text `<div id="copyright">OpenSearch</div>` and extracts "OpenSearch". |
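
To make the extraction rules in the table concrete, the following Python sketch imitates what JsonKeyValueExtractor, MultiValueSplitter, and KeyValueExtractor are described as doing. It illustrates the documented semantics only and is not the plugins' implementation. The function names and parameter names are hypothetical, and details the table does not specify, such as whitespace handling or the result for a missing key, are assumptions.

```python
import json


def json_key_value(source, key):
    """Parse a JSON object string and return the value stored under `key`.

    A JSON array comes back as a Python list (the multi-value case).
    Returning None for a missing key is an assumption of this sketch.
    """
    return json.loads(source).get(key)


def multi_value_split(source, separator=","):
    """Split a delimited string into a list of values."""
    return source.split(separator)


def key_value(source, key, key_sep=";", kv_sep=":", multi_sep=None):
    """Extract the value of `key` from a KV-formatted string.

    If a key appears more than once, the last occurrence wins.
    If `multi_sep` is given, the value is returned as a list.
    """
    found = None
    for pair in source.split(key_sep):
        name, sep, value = pair.partition(kv_sep)
        if sep and name == key:
            found = value  # later duplicates overwrite earlier ones
    if found is not None and multi_sep is not None:
        return found.split(multi_sep)
    return found


print(json_key_value('{"tags":["a","b","c"]}', "tags"))   # ['a', 'b', 'c']
print(multi_value_split("1,2,3"))                          # ['1', '2', '3']
print(multi_value_split("a\u001Db", "\u001D"))             # ['a', 'b']
print(key_value("key1:value1,value2;key2:value3", "key1", multi_sep=","))
# ['value1', 'value2']
print(key_value("k:old;k:new", "k"))                       # new
```

Note that every split value is still a string here. In OpenSearch the extracted value must match the data type of the destination field, so a sketch like this would need an explicit conversion (for example `int(...)`) before the values could stand in for an `int_array` field.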
For best practices, see MultiValueSplitter plugin settings.