Custom analyzers

Last Updated: Sep 09, 2021

Overview

Analysis, the process of segmenting text into terms, is a basic but important feature of search engines, and its results directly affect search quality. The meaning of a phrase varies with the business scenario and context, so the expected analysis results vary as well. In addition to basic analyzers that apply to all industries, OpenSearch provides industry-specific analyzers, such as the analyzer for text from the E-commerce industry. To meet diverse business requirements, OpenSearch also allows you to create a custom analyzer from a built-in analyzer and a set of intervention entries. You select analyzers when you configure the index fields of an application. OpenSearch then applies the selected analyzer during both indexing and searches so that search results meet your expectations.

Intervention entries

You can use the secondary analysis setting to control how intervention entries are applied.

If you enable secondary analysis, the terms in the custom analysis results are analyzed again. If you disable secondary analysis, the custom analysis results are retained as they are.

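The following example uses the entry "开放搜索" (open search) with the Chinese general analyzer.
If you enable secondary analysis, the results are as follows:
[Figure: analysis results with secondary analysis enabled]
If you disable secondary analysis, the results are as follows:
[Figure: analysis results with secondary analysis disabled]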
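
The difference between the two settings can be sketched in a few lines of Python. This is only an illustration and not how OpenSearch is implemented: `base_analyze` is a hypothetical stand-in for a built-in analyzer (greedy longest match over a toy vocabulary), `TOY_VOCAB` is invented for the example, and the sketch matches an intervention entry only when the whole input equals the key.

```python
from typing import Dict, List

# Hypothetical vocabulary for the toy base analyzer; not OpenSearch's real dictionary.
TOY_VOCAB = {"自定义", "分词", "器"}


def base_analyze(text: str) -> List[str]:
    """Toy stand-in for a built-in analyzer: greedy longest match,
    falling back to a single character when nothing in the vocabulary matches."""
    terms: List[str] = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in TOY_VOCAB or j == i + 1:
                terms.append(text[i:j])
                i = j
                break
    return terms


def analyze(text: str, interventions: Dict[str, str], secondary_analysis: bool) -> List[str]:
    """Apply an intervention entry if the text matches a key (simplifying assumption)."""
    if text not in interventions:
        return base_analyze(text)
    terms = interventions[text].split()  # terms in the value are separated by spaces
    if not secondary_analysis:
        return terms  # custom analysis results are retained
    result: List[str] = []
    for term in terms:
        result.extend(base_analyze(term))  # each term is analyzed again
    return result


interventions = {"自定义分词器": "自定义 分词器"}
print(analyze("自定义分词器", interventions, secondary_analysis=False))
print(analyze("自定义分词器", interventions, secondary_analysis=True))
```

With secondary analysis disabled, the sketch returns the two terms of the entry unchanged. With it enabled, the toy analyzer splits the second term again because its vocabulary contains only the shorter pieces.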

Usage notes

  • You can create up to 20 custom analyzers by using the new OpenSearch console.

  • A custom analyzer can contain up to 1,000 intervention entries.

  • The key of each entry cannot exceed 10 characters in length and the value of each entry cannot exceed 32 characters in length. Each character can be a Chinese character or a letter.

  • The key and value of an entry cannot contain uppercase letters, full-width characters (\uff01 to \uff5e), or Chinese punctuation marks.

  • The key of an intervention entry for semantic-based analysis must be identical to its value after the spaces in the value are removed. Sample entries:

    不正确的词条 => 错误 的 词条
    正确的词条 => 正确 的 词条

    The first entry is invalid because the key is not the same as the value after spaces are deleted.

  • The key of an entry cannot contain spaces. Sample entries:

    不正确 词条 => 不 正确 词条
    正确词条 => 正确 词条

    The first entry is invalid because the key contains spaces.

  • The key of an entry cannot be part of the value of another entry in the same intervention dictionary. Sample entries:

    自定义分词器 => 自定义 分词器
    分词器
    分词

    The second entry is invalid because its key "分词器" appears as a term in the value of the first entry. The third entry is valid because "分词" is not a complete term in that value.
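
The entry rules above can be checked before you add entries in the console. The following is a minimal sketch and not an official tool. It makes several assumptions: entries are held as a plain key-to-value dictionary, an entry without a value is written with the value equal to its key, spaces count toward the 32-character value limit, "part of the value" is read as "equal to one space-separated term of the value" (which matches the sample entries), and `CHINESE_PUNCT` is only a small illustrative sample of punctuation marks. The names `check_text` and `validate_entries` are hypothetical.

```python
import re
from typing import Dict, List

MAX_KEY_LEN = 10    # maximum key length stated in the usage notes
MAX_VALUE_LEN = 32  # maximum value length; counting spaces is an assumption
FULL_WIDTH = re.compile("[\uff01-\uff5e]")
# Assumption: a small illustrative sample of Chinese punctuation, not a complete list.
CHINESE_PUNCT = re.compile("[\u3001\u3002\u300a\u300b\u3010\u3011\u201c\u201d]")


def check_text(label: str, text: str) -> List[str]:
    """Character-level rules that apply to both keys and values."""
    problems = []
    if any(ch.isupper() for ch in text):
        problems.append(f"{label} contains uppercase letters")
    if FULL_WIDTH.search(text):
        problems.append(f"{label} contains full-width characters")
    if CHINESE_PUNCT.search(text):
        problems.append(f"{label} contains Chinese punctuation marks")
    return problems


def validate_entries(entries: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each invalid key to the rules it breaks; valid keys are omitted."""
    errors: Dict[str, List[str]] = {}
    for key, value in entries.items():
        problems = check_text("key", key) + check_text("value", value)
        if len(key) > MAX_KEY_LEN:
            problems.append("key exceeds 10 characters")
        if len(value) > MAX_VALUE_LEN:
            problems.append("value exceeds 32 characters")
        if " " in key:
            problems.append("key contains spaces")
        if key != value.replace(" ", ""):
            problems.append("key differs from value with spaces removed")
        for other_key, other_value in entries.items():
            # Whole-term match against the other entry's value (assumption).
            if other_key != key and key in other_value.split():
                problems.append(f"key is a term in the value of '{other_key}'")
        if problems:
            errors[key] = problems
    return errors


entries = {"自定义分词器": "自定义 分词器", "分词器": "分词器", "分词": "分词"}
print(validate_entries(entries))
```

For the sample dictionary from the last rule, the sketch reports only the key 分词器, which agrees with the explanation above. The limits on the number of analyzers and entries are not checked here.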

Procedure for creating and using a custom analyzer

Overview

1. Create a custom analyzer.
2. Modify an application offline.
3. Rebuild indexes.
4. Use the custom analyzer.

Procedure

1. Log on to the OpenSearch console. In the left-side navigation pane, choose Search Algorithm Center > Retrieval Configuration. On the Basic Configuration page, click Analyzer Management in the left-side pane. On the Analyzer Management page, click Create.

2. In the Create Analyzer panel, enter an analyzer name, select an analyzer type, and then click Save.

3. On the Manage Entries page of the created custom analyzer, click Add. In the Add Intervention Entry panel, set the Search Query and Analysis Results parameters, and turn on Secondary Analysis. In this example, the phrase "糯米" (glutinous rice) is used.

Note: Separate terms with spaces. Example: "糯米" => "糯 米"

4. Run an analysis test to check the analysis results after the added intervention entry takes effect.

  • 4.1. Enter 糯米 in the Test Text field.

  • 4.2. View the analysis results of multiple custom analyzers.

[Figure: analysis results of multiple custom analyzers]

5. After the analysis test is complete, go to the Basic Configuration page to modify an application offline.

Note: OpenSearch generates an offline application based on the settings of an online application. If you modify the offline application, the online application is not affected.

6. In the Index Field List section, find the index for which you want to configure the custom analyzer, and then select the custom analyzer from the drop-down list in the Analysis Method column.

7. Wait for reindexing to complete. The custom analyzer takes effect after reindexing.

Usage notes

  • The new OpenSearch console allows you to add intervention entries to existing custom analyzers. If you add intervention entries to a custom analyzer that is used by an application, the intervention entries take effect only after reindexing is performed. If you want the intervention entries to take effect as soon as possible, you can update the documents whose analysis results do not meet your expectations in order to trigger reindexing.

  • The key of an entry in a custom analyzer cannot exceed 10 characters in length.

  • The key and value of an entry in a custom analyzer cannot contain uppercase letters, full-width characters, or Chinese punctuation marks.

  • If you turn off Secondary Analysis, OpenSearch keeps the terms that are produced by the intervention entry as they are. If you turn on Secondary Analysis, OpenSearch segments those terms further.

  • Only applications of the Industry-specific Enhanced Edition can use custom analyzers that are created based on the common analyzer for the E-commerce industry.

  • You cannot delete a custom analyzer that is used by an application.