OpenSearch: Custom text analyzers

Last Updated: Mar 06, 2023

Overview

Analysis is a basic but important feature of search engines, and analysis results directly affect search results. The meaning of a phrase varies with the business scenario and context, so the expected analysis results vary as well. In addition to basic analyzers that apply to all industries, OpenSearch provides industry-specific analyzers, such as the analyzer for text from the E-commerce industry.

To meet diversified business requirements, OpenSearch allows you to create a custom analyzer by using a built-in analyzer and intervention entries. You can select analyzers when you configure index fields for an application. This way, OpenSearch can adjust the process of analysis during indexing and searches to ensure that search results meet your expectations.

Intervention entries

When you manage intervention entries, you can enable or disable the secondary analysis feature.

If you enable secondary analysis, the text in the results of the original custom analyzer is segmented again. If you disable secondary analysis, the results of the original custom analyzer are retained.

For example, assume that the entry is "开放搜索" and the general analyzer for Chinese text is used. The following figure shows the results with secondary analysis enabled.

image

The following figure shows the results with secondary analysis disabled.

image
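The effect of the switch can be sketched in a few lines of Python. This is only an illustrative model, not the OpenSearch implementation: `TOY_VOCAB`, `toy_base_segment`, and `apply_intervention` are hypothetical names, the toy vocabulary stands in for the built-in analyzer's dictionary, and the sketch only handles input that exactly matches an entry key. The terms it prints are what the toy model produces and may differ from the results in the figures above.

```python
from typing import Callable, Dict, List

# Hypothetical toy vocabulary standing in for the built-in analyzer's dictionary.
TOY_VOCAB = {"开放", "搜索"}


def toy_base_segment(text: str) -> List[str]:
    """Greedy longest-match segmentation over TOY_VOCAB.

    Characters that start no known word become single-character terms.
    """
    terms: List[str] = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in TOY_VOCAB:
                terms.append(text[i:j])
                i = j
                break
        else:
            terms.append(text[i])
            i += 1
    return terms


def apply_intervention(
    text: str,
    entries: Dict[str, str],
    secondary_analysis: bool,
    base_segment: Callable[[str], List[str]] = toy_base_segment,
) -> List[str]:
    """Segment text, letting an intervention entry override the base analyzer."""
    value = entries.get(text)
    if value is None:
        # No entry for this text: fall back to the base analyzer.
        return base_segment(text)
    terms = value.split()  # terms in an entry value are separated by spaces
    if not secondary_analysis:
        return terms  # keep the entry's terms as they are
    result: List[str] = []
    for term in terms:
        result.extend(base_segment(term))  # segment each term again
    return result


entries = {"开放搜索": "开放搜索"}  # assumed entry: the value is a single term
print(apply_intervention("开放搜索", entries, secondary_analysis=False))  # ['开放搜索']
print(apply_intervention("开放搜索", entries, secondary_analysis=True))   # ['开放', '搜索']
```

With secondary analysis disabled, the entry's term survives unchanged. With it enabled, the base analyzer splits the term again.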

Usage notes

  • The entries in a custom analyzer are composed of all entries for the specified analyzer type and entries manually added to the analyzer. The manually added entries have a higher priority than the entries for the specified analyzer type.

  • Up to 20 custom analyzers can be created in the new OpenSearch console.

  • A custom analyzer can contain up to 1,000 intervention entries.

  • The key of each entry cannot exceed 10 characters in length, and the value of each entry cannot exceed 32 characters in length. Each character can be a Chinese character or a letter.

  • The key and value of an entry cannot contain uppercase letters, full-width characters, or Chinese punctuation marks.

  • The key and value of an intervention entry for semantic-based analysis must be the same after spaces in the value are removed. Sample entries:

    The key is "不正确的词条", and the value is "错误 的 词条".
    The key is "正确的词条", and the value is "正确 的 词条".

    The first entry is invalid because the key is not the same as the value after spaces are removed.

  • The key of an entry cannot contain spaces. Sample entries:

    The key is "不正确 词条", and the value is "不 正确 词条".
    The key is "正确词条", and the value is "正确 词条".

    The first entry is invalid because the key contains spaces.

  • The key of an entry cannot be part of the value of another entry in the same intervention dictionary. Sample entries:

    The key is "自定义分词器", and the value is "自定义 分词器".
    The key is "分词器".
    The key is "分词".

    The second entry is invalid because its key "分词器" appears as a term in the value of the first entry. The third entry is valid because its key "分词" does not match any term in that value.

Create and use a custom analyzer

Process

1. Create a custom analyzer.
2. Modify the offline version of an application.
3. Perform reindexing.
4. Use the custom analyzer.

Procedure

1. Log on to the OpenSearch console. In the left-side navigation pane, choose Search Algorithm Center > Retrieval Configuration. On the Basic Configuration page, click Analyzer Management in the left-side pane. On the Analyzer Management page, click Create on the Text Analyzer tab.

image

2. In the Create Analyzer panel, enter an analyzer name, select an analyzer type, select an analyzer, and then click Save.

image

3. On the Text Analyzer tab, find the created custom analyzer and click Manage Entries in the Actions column. On the Manage Entries page, click Add. In the Add Intervention Entries panel, configure the Search Query and Analysis Results parameters, and turn on Secondary Analysis. In this example, the phrase "糯米" is used.

image

Note: Separate terms with spaces. Example: The key is "糯米", and the value is "糯 米".

4. Run an analysis test to check analysis results after the added intervention entry takes effect.

image
  • Enter 糯米 in the Test Text field.

image
  • The following figure shows the analysis results of multiple custom analyzers.

image

5. After the analysis test is complete, go to the Basic Configuration page to modify the offline version of an application.

image

Note: OpenSearch generates an offline version of an application to hold the settings that you modify for the online version. Modifying the offline version does not affect the online version.

6. In the Index Field List section, find the index for which you want to configure the custom analyzer and select the custom analyzer from the drop-down list in the Analysis Method column.

image

7. Perform reindexing. The custom analyzer takes effect after reindexing is complete.

image

Search results of a custom analyzer

For example, you use the general analyzer for Chinese text, but documents that contain "糯米", "小米", or "大米" cannot be retrieved when you search for "米".

In this case, you can perform the preceding operations to create a custom analyzer that is named test_zw. After you modify the schema of the application for which the custom analyzer is configured and perform reindexing, the documents can be retrieved as expected, as shown in the following figure.
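A toy inverted index illustrates why the intervention changes what a query can match. This is a minimal sketch rather than a description of how OpenSearch stores terms. The term lists are assumptions: the default analysis is assumed to keep each word as a single term, and the custom analyzer is assumed to hold entries for "小米" and "大米" that mirror the "糯米" entry shown earlier. `build_index` and `search` are hypothetical helper names.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_index(docs: Dict[int, List[str]]) -> Dict[str, Set[int]]:
    """Build a term -> document IDs mapping from per-document term lists."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            index[term].add(doc_id)
    return index


def search(index: Dict[str, Set[int]], term: str) -> List[int]:
    """Return the sorted IDs of the documents that contain the exact term."""
    return sorted(index.get(term, set()))


# Assumed analysis results before the intervention: one term per word.
default_terms = {1: ["糯米"], 2: ["小米"], 3: ["大米"]}
# Assumed analysis results after entries such as "糯米" -> "糯 米" take effect.
custom_terms = {1: ["糯", "米"], 2: ["小", "米"], 3: ["大", "米"]}

print(search(build_index(default_terms), "米"))  # []
print(search(build_index(custom_terms), "米"))   # [1, 2, 3]
```

Because matching is done on whole terms, the query "米" finds nothing until the indexed terms themselves include "米". This is also why reindexing is required before the custom analyzer affects search results.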

Usage notes

  • The new OpenSearch console allows you to add intervention entries to existing custom analyzers. If you add intervention entries to a custom analyzer that is used by an application, the intervention entries take effect only after reindexing is performed. If you want the intervention entries to take effect at the earliest opportunity, you can update documents whose analysis results are not as expected to trigger reindexing.

  • The key of an entry in a custom analyzer cannot exceed 10 characters in length.

  • The key and value of an entry in a custom analyzer cannot contain uppercase letters, full-width characters, or Chinese punctuation marks.

  • If you disable secondary analysis, OpenSearch keeps the terms produced by the first pass of analysis as they are. If you enable secondary analysis, OpenSearch segments those terms again.

  • Only applications of the Industry-specific Enhanced Edition can use custom analyzers that are created based on the general analyzer for text from the E-commerce industry.

  • You cannot delete a custom analyzer that is used by an application.