OpenSearch provides several data processing plugins for simple data transformations. This topic describes these plugins and how they apply to each data source.
Overview
You can import data into OpenSearch using an API, an SDK, or the upload interface. You can also sync data directly from an ApsaraDB database. If you upload data using an API or an SDK, follow the API reference and push the data directly. The plugins described in this topic are not available for this method, so you must transform the data yourself before you push it. When you sync data from a cloud database, you configure the data source in the console, and you can apply the data processing plugins for simple transformations when you configure the field mappings.
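Because the plugins are not available for API or SDK uploads, equivalent transformations have to run on the client before the push. The following is a minimal sketch of such a step, assuming a raw row with the hypothetical fields `id`, `body_html`, `category`, and `brand`. It strips HTML tags and concatenates two string values with the Python standard library only. It does not call any OpenSearch API, and the output field names are illustrative.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects text nodes and ignores the tags around them."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(text):
    """Return the text content of an HTML fragment, without tags."""
    parser = _TextOnly()
    parser.feed(text)
    parser.close()
    return "".join(parser.parts)


def prepare_record(raw):
    """Build the document to push from one raw row (hypothetical field names)."""
    return {
        "id": str(raw["id"]),
        "body": strip_html(raw["body_html"]),
        # Join two string values with an underscore.
        "category_brand": "_".join([raw["category"], raw["brand"]]),
    }


doc = prepare_record({
    "id": 1,
    "body_html": '<div id="copyright">OpenSearch</div>',
    "category": "phone",
    "brand": "acme",
})
print(doc)  # {'id': '1', 'body': 'OpenSearch', 'category_brand': 'phone_acme'}
```

The resulting dictionary can then be serialized in whatever format the API reference requires for pushed documents.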
An OpenSearch table can take data from multiple RDS and PolarDB source tables, for example when the source database is sharded. However, you can configure only one ODPS source. If your data spans multiple ODPS source tables, you must merge it into a single table before you import it.
Data processing plugins
Some search features or functions require special field types. For example, the content of a field of the Array type cannot be entered directly. It must be produced from a source field by one of the following plugins.
Configure these plugins during data source configuration, not when you define the application schema. You can set up the plugins only after you configure a data source.
| Plugin | Description | Example |
| --- | --- | --- |
| JsonKeyValueExtractor | Extracts the value of a specified key from a source field in JSON format. The extracted value becomes the content of the destination field. You can extract the value of only one key. | Extracts the value of the `title` key from `{"title":"the content","body":"the content"}`. If the value is a JSON array, it is converted to the content of a field of the Array type. You must ensure that the data type of the extracted value matches the data type of the destination field. If the data types do not match, the data is lost. The JSON array format is defined by OpenSearch. For example, for a `literal_array` field type, the format is `{"tags":["a","b","c"]}`. For an `int_array` field type, the format is `{"tags":[1,2,3]}`. |
| MultiValueSplitter | Splits the content of a source field into multiple values based on a specified separator. The split values are used as the content of the destination field. The destination field must be of the Array type. If the separator is a non-printable character, you must use its Unicode escape to represent it, such as `\u001D`. | If the source field contains `1,2,3`, you can specify a comma (`,`) as the separator. |
| KeyValueExtractor | Extracts the value of a specified key from a source field that is in key-value (KV) format. The extracted value is used as the content of the destination field. You can extract the value of only one key. The parameters for specifying separators are optional. | For example, the source field contains `key1:value1,value2;key2:value3`. In this example, the key separator is a semicolon (`;`), the key-value separator is a colon (`:`), and the multi-value separator is a comma (`,`). If a multi-value separator is configured, the extracted value is converted to the content of a field of the Array type. You must ensure that the data type of the extracted value matches the data type of the destination field. If the data types do not match, the data is lost. If the source field contains duplicate keys, only the value of the last key is extracted. |
| StringConcatenateExtractor | Concatenates the values of multiple specified fields into a single string in a specified order. This plugin does not support fields of the `int` type. You must use fields of the `literal` type instead. The fields in the list must be separated by commas. The specified fields must be destination fields. | For example, you can concatenate the values of `field1` and `field2` with an underscore (`_`) to form the content of a new field. Additionally, you can use the system variable `$table` to retrieve the current table name. The `$table` variable is available only when a wildcard character for table sharding is configured. |
| HTMLTagRemover | Removes HTML tags from the content of a source field. The content without the HTML tags is copied to the destination field. If the source field is also the destination field, its original content is replaced. | For example, from `<div id="copyright">OpenSearch</div>`, the plugin removes the tags and keeps `OpenSearch`. |
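The extraction rules for JsonKeyValueExtractor and KeyValueExtractor can be checked locally before you configure a data source. The following is a minimal sketch, not the plugins' actual implementation. The function names and default separators are hypothetical, and details that the table does not specify (malformed segments, missing keys) are handled by assumption.

```python
import json


def json_key_value_extract(source, key):
    """Return the value of `key` from a JSON object string, or None if absent."""
    obj = json.loads(source)
    return obj.get(key)


def key_value_extract(source, key, key_sep=";", kv_sep=":", multi_sep=None):
    """Return the value of `key` from a KV-formatted string.

    If multi_sep is given, the value is split into a list (Array content).
    If the key occurs more than once, the last occurrence wins.
    """
    found = None
    for pair in source.split(key_sep):
        if kv_sep not in pair:
            continue  # assumption: segments without a key-value separator are skipped
        k, v = pair.split(kv_sep, 1)
        if k == key:
            found = v  # later duplicates overwrite earlier ones
    if found is not None and multi_sep is not None:
        return found.split(multi_sep)
    return found


# A JSON array value maps to Array content.
print(json_key_value_extract('{"tags":["a","b","c"]}', "tags"))  # ['a', 'b', 'c']

# The KV example from the table, with a multi-value separator configured.
print(key_value_extract("key1:value1,value2;key2:value3", "key1", multi_sep=","))  # ['value1', 'value2']

# Duplicate keys: only the last value is kept.
print(key_value_extract("k:1;k:2", "k"))  # 2
```

Note that both functions return strings or lists as parsed. Matching the destination field's data type, for example converting to integers for an `int_array` field, remains a separate step.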
For best practices, see MultiValueSplitter plugin settings.
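To illustrate the splitting behavior described for MultiValueSplitter, including a non-printable separator written as a Unicode escape, here is a minimal local sketch. The function name is hypothetical, and returning an empty list for empty input is an assumption rather than documented plugin behavior.

```python
def multi_value_split(source, separator=","):
    """Split a source string into a list of values for an Array field."""
    if source == "":
        return []  # assumption: empty input yields an empty array
    return source.split(separator)


# Printable separator, as in the table's example.
print(multi_value_split("1,2,3"))  # ['1', '2', '3']

# Non-printable separator: U+001D, written as the escape \u001D.
packed = "\u001D".join(["red", "green", "blue"])
print(multi_value_split(packed, "\u001D"))  # ['red', 'green', 'blue']

# For an int_array destination, the split values still need converting.
print([int(v) for v in multi_value_split("1,2,3")])  # [1, 2, 3]
```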