Convert Solr syntax, schema, and features for OpenSearch

Schema

OpenSearch supports various data types and analyzers to meet the requirements in most scenarios. Take note of the following items:

Supported data types: fields of the INT, INT_ARRAY, FLOAT, FLOAT_ARRAY, DOUBLE, DOUBLE_ARRAY, LITERAL, LITERAL_ARRAY, TEXT, and SHORT_TEXT types.
DynamicField: Dynamic fields are not supported. However, OpenSearch allows you to modify the schema of an application, so you can add or change fields when needed.
CopyField: Copy fields are not supported. Merge the source fields into the target field before you push the data.
A maximum of 256 fields are allowed. If more than 256 fields need to be added for an OpenSearch application, you can merge the fields that are used for non-range queries into a single field of the ARRAY type. This can reduce the total number of fields that are added for the application.
patternTokenizer: OpenSearch supports custom analyzers, but their default delimiter is \t. You must replace the original delimiter in your data with \t.
LOCATION: This type of field is converted to two fields of the FLOAT or DOUBLE type, which are used to store the longitude and latitude values.
BOOLEAN: This type of field is converted to a field of the INT type, whose value is 0 or 1.
DATE: This type of field is converted to a field of the INT type. After you push the fields of this type from a data source to an OpenSearch application, the fields are automatically converted to timestamps in units of milliseconds. If you push the fields to an OpenSearch application by calling API operations, you must manually convert them.
Payload analyzer: This type of analyzer is not supported.
Bitwise analyzer: This type of analyzer is not supported.
Paoding analyzer: This type of analyzer is replaced by the basic Chinese analyzers of OpenSearch.
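
The LOCATION, BOOLEAN, DATE, and patternTokenizer conversions above can be applied to each document before it is pushed through the API. The following Python sketch is illustrative only. It assumes that Solr location values use the "lat,lon" string form and that dates use the ISO 8601 UTC form ending in Z. The helper names, the kinds mapping, and the _lat/_lon field suffixes are hypothetical and are not part of any OpenSearch SDK.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def date_to_millis(value: str) -> int:
    """Convert a Solr-style UTC date string to a millisecond timestamp."""
    dt = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    # Integer division of timedeltas avoids floating-point rounding.
    return (dt - EPOCH) // timedelta(milliseconds=1)


def convert_document(doc: dict, kinds: dict, delimiter: str = "|") -> dict:
    """Rewrite one Solr document into fields that OpenSearch can store.

    kinds maps a field name to "location", "boolean", "date", or "pattern"
    (hypothetical labels); other fields are copied unchanged.
    """
    out = {}
    for name, value in doc.items():
        kind = kinds.get(name)
        if kind == "location":
            # One LOCATION field becomes two numeric fields.
            lat, lon = (float(part) for part in value.split(","))
            out[f"{name}_lat"] = lat
            out[f"{name}_lon"] = lon
        elif kind == "boolean":
            out[name] = 1 if value else 0
        elif kind == "date":
            out[name] = date_to_millis(value)
        elif kind == "pattern":
            # Custom analyzers split on \t, so replace the original delimiter.
            out[name] = value.replace(delimiter, "\t")
        else:
            out[name] = value
    return out


solr_doc = {
    "id": "1",
    "in_stock": True,
    "created": "2000-01-01T00:00:00Z",
    "store": "45.5,-93.25",
    "tags": "red|green",
}
kinds = {"in_stock": "boolean", "created": "date", "store": "location", "tags": "pattern"}
converted = convert_document(solr_doc, kinds)
```

Documents that are pushed from a data source do not need the date step, because that conversion is automatic.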

Search syntax

OpenSearch supports features such as query, filtering, statistics, aggregation, and sorting.

q: Required. This parameter is equivalent to the query clause in OpenSearch. The specific conversion rules are listed below.

q conversion rules

*:* is not supported.

Range queries on an index are converted to range conditions in the filter clause.

+A ==> A

-A ==> This type of conversion is not supported.

A AND B ==> A AND B

A AND -B ==> A ANDNOT B

A OR B ==> A OR B

A OR +B ==> B RANK A

A AND B OR C ==> A AND B RANK C. Example: Hongfushi AND Apple OR Shandong.

A OR B AND C ==> B AND C RANK A. Example: Hongfushi OR Apple AND Shandong.

A AND B OR +C ==> A AND B AND C. Example: Hongfushi AND Apple OR +Shandong.

A OR +B AND C ==> B AND C RANK A. Example: Hongfushi OR +Apple AND Shandong.

+A OR B AND C ==> A AND B AND C. Example: +Hongfushi OR Apple AND Shandong.

A AND B OR -C ==> (A AND B) ANDNOT C. Example: Hongfushi AND Apple OR -Shandong.

A AND -B OR C ==> A ANDNOT B RANK C. Example: Apple AND -Hongfushi OR Shandong.

-A AND B OR C ==> B ANDNOT A RANK C. Example: -Hongfushi AND Apple OR Shandong.

A OR B AND -C ==> B ANDNOT C RANK A. Example: Hongfushi OR Apple AND -Shandong.

A OR -B AND C ==> C ANDNOT B RANK A. Example: Hongfushi OR -Shandong AND Apple.

-A OR B AND C ==> (B AND C) ANDNOT A. Example: -Hongfushi OR Shandong AND Apple.

A OR B OR -C == A OR -C OR B == -C OR A OR B ==> (A OR B) ANDNOT C

A AND B OR C AND D ==> A AND B AND C AND D
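
The rules above follow a pattern: a term that is joined by AND or prefixed with + is required, a term prefixed with - is excluded, and any other term only affects ranking. The following Python sketch applies that reading to flat expressions of bare terms. It is an assumption drawn from the listed rules, not an official converter. It does not handle parentheses, field prefixes, phrases, or range queries, and convert_q is a hypothetical helper name.

```python
def convert_q(solr_q: str) -> str:
    """Convert a flat Solr boolean expression to OpenSearch query operators."""
    tokens = solr_q.split()
    required, optional, prohibited = [], [], []
    for i, tok in enumerate(tokens):
        if tok in ("AND", "OR"):
            continue
        prev_tok = tokens[i - 1] if i > 0 else None
        next_tok = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok.startswith("-"):
            prohibited.append(tok[1:])
        elif tok.startswith("+"):
            required.append(tok[1:])
        elif "AND" in (prev_tok, next_tok):
            # A neighboring AND makes the term required.
            required.append(tok)
        else:
            optional.append(tok)

    if required:
        group, joiner, rank_terms = required, " AND ", optional
    elif optional:
        group, joiner, rank_terms = optional, " OR ", []
    else:
        # A query that consists only of excluded terms cannot be converted.
        raise ValueError("a purely negative query is not supported")

    result = joiner.join(group)
    if prohibited:
        if len(group) > 1:
            result = f"({result})"
        result += "".join(f" ANDNOT {term}" for term in prohibited)
    result += "".join(f" RANK {term}" for term in rank_terms)
    return result


print(convert_q("Hongfushi AND Apple OR Shandong"))
print(convert_q("-Hongfushi OR Shandong AND Apple"))
```

Expressions with more than one ranking-only term are not covered by the listed rules, so verify the output of the sketch for such cases before you rely on it.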

fq: filters the documents that are retrieved. Filtering affects which documents are returned but not the calculation of matching scores. The filter clause is used for non-fuzzy queries, and the query clause is used for fuzzy queries. Do not specify this field when you use the sorting feature.
fl: You can use the fetch_fields parameter of OpenSearch to define the return value.
hl: You can configure the summary and the highlighting HTML tags in the OpenSearch console.
start and rows: equivalent to start and hit in the config clause.
wt: equivalent to format in the config clause.
df: the default field of the query.
sort: field desc => -field: sorts results in descending order of the field value. field asc => +field: sorts results in ascending order of the field value. score => sort=RANK: sorts results by relevance score.
facet: The field that is used for statistics must be configured as an attribute index.

Statistics conversion rules

facet.field => The group_key parameter in the aggregate clause of OpenSearch.

facet.limit => The max_group parameter in the aggregate clause of OpenSearch. The default value is 1000.

facet.mincount => Not supported. You must manually process all the results.

facet.offset => Not supported. You must manually configure pagination for all the results.

facet.sort => Not supported. You must manually sort all the results.

facet=true&facet.field=price&facet.limit=200 ==> aggregate=group_key:price,agg_fun:count(),max_group:200
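
The facet rules above can be sketched in Python as follows. The function name convert_facet is hypothetical. The sketch returns the aggregate clause together with the list of Solr parameters that have no equivalent and must be handled in client code.

```python
from urllib.parse import parse_qsl

# Solr facet parameters that the rules above mark as not supported.
CLIENT_SIDE = ("facet.mincount", "facet.offset", "facet.sort")


def convert_facet(solr_query: str):
    """Return (aggregate_clause, parameters_to_handle_in_client_code)."""
    params = dict(parse_qsl(solr_query))
    if params.get("facet", "").lower() != "true" or "facet.field" not in params:
        raise ValueError("expected facet=true and a facet.field parameter")

    parts = [f"group_key:{params['facet.field']}", "agg_fun:count()"]
    if "facet.limit" in params:
        parts.append(f"max_group:{params['facet.limit']}")

    leftovers = [name for name in CLIENT_SIDE if name in params]
    return "aggregate=" + ",".join(parts), leftovers


clause, leftovers = convert_facet("facet=true&facet.field=price&facet.limit=200")
print(clause)
```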

group: Not supported. For some simple scenarios, you can use the distinct clause together with the sort clause as a substitute.
stats: Some of its features are equivalent to the features of the aggregate clause in OpenSearch. However, agg_fun supports only min, max, count, and avg. It does not support missing, sumOfSquares, mean, stddev, distinctValue, or countDistinct.
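
The mappings for start, rows, wt, fl, and sort can be combined into one small helper. The following Python sketch is illustrative. The function names are hypothetical, and the use of a semicolon to separate multiple sort fields and multiple fetch_fields values is an assumption that you should check against the OpenSearch API reference.

```python
def convert_sort(solr_sort: str) -> str:
    """Map 'field desc' to '-field', 'field asc' to '+field', and score to RANK."""
    clauses = []
    for part in solr_sort.split(","):
        field, _, direction = part.strip().partition(" ")
        name = "RANK" if field == "score" else field
        # No sign is added when the Solr clause carries no direction.
        sign = {"desc": "-", "asc": "+"}.get(direction.strip().lower(), "")
        clauses.append(sign + name)
    return ";".join(clauses)  # separator is an assumption


def convert_params(solr: dict) -> dict:
    """Translate common Solr request parameters to OpenSearch clauses."""
    config = []
    if "start" in solr:
        config.append(f"start:{solr['start']}")
    if "rows" in solr:
        config.append(f"hit:{solr['rows']}")
    if "wt" in solr:
        config.append(f"format:{solr['wt']}")

    out = {}
    if config:
        out["config"] = ",".join(config)
    if "fl" in solr:
        out["fetch_fields"] = solr["fl"].replace(",", ";")  # separator is an assumption
    if "sort" in solr:
        out["sort"] = convert_sort(solr["sort"])
    return out


request = convert_params({"start": 20, "rows": 10, "wt": "json", "sort": "price desc"})
```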

Search features

Deep pagination: OpenSearch provides two query interfaces: search and scroll. The search interface covers common query scenarios. It returns a maximum of 5,000 results, supports paging, and displays a maximum of 500 results on each page. The scroll interface covers data export scenarios. It can export tens of millions of data records for further analysis, but it does not support sorting.
Accuracy of statistical results: To ensure better retrieval performance, OpenSearch performs sampling and estimation in many cases, which may lead to inaccurate statistical results.
Total number of search results: To ensure the search performance, OpenSearch estimates the total number of results to be returned for queries regardless of the total amount of data.
Multiple OR operators in a query: The length of a query string can be up to 1 KB after encoding. If excessive OR operators are used, an error occurs and no results are returned. In this case, we recommend that you increase the upper limit of the query string length or concurrently perform multiple queries and merge the results that are returned.
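
Two of the limits above lend themselves to a short sketch: the paging limits of the search interface and the 1 KB limit on the encoded query string. The following Python code is illustrative only. It assumes that "after encoding" means URL encoding and that the 5,000-result cap applies to start plus hit. All function names are hypothetical.

```python
from urllib.parse import quote

MAX_RESULTS = 5000        # maximum results returned by the search interface
MAX_PAGE_SIZE = 500       # maximum results per page
MAX_ENCODED_BYTES = 1024  # query string limit after encoding


def page_windows(wanted: int, page_size: int = MAX_PAGE_SIZE):
    """Return (start, hit) pairs that stay within the search limits."""
    if not 0 < page_size <= MAX_PAGE_SIZE:
        raise ValueError("page_size must be between 1 and 500")
    total = min(wanted, MAX_RESULTS)
    return [(start, min(page_size, total - start))
            for start in range(0, total, page_size)]


def split_or_query(terms, limit: int = MAX_ENCODED_BYTES):
    """Split a long list of OR terms into queries that fit the length limit."""
    batches, current = [], []
    for term in terms:
        if len(quote(term)) > limit:
            raise ValueError("a single term exceeds the query length limit")
        candidate = current + [term]
        if current and len(quote(" OR ".join(candidate))) > limit:
            batches.append(" OR ".join(current))
            current = [term]
        else:
            current = candidate
    if current:
        batches.append(" OR ".join(current))
    return batches


def merge_hits(result_lists, key: str = "id"):
    """Merge hits from several queries and drop duplicates by key."""
    merged = {}
    for hits in result_lists:
        for hit in hits:
            merged.setdefault(hit[key], hit)
    return list(merged.values())


queries = split_or_query([f"term{i}" for i in range(300)])
```

Each string in queries can be sent as a separate request, concurrently if needed, and the responses can be combined with merge_hits. Scores from separate requests are not necessarily comparable, so re-sort the merged hits in client code if order matters.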