OpenSearch: Fuzzy analyzer

Last Updated: Aug 27, 2024

Overview

The fuzzy analyzer (fuzzy) is suitable for fuzzy searches in which the search intent is unclear. In a fuzzy search, OpenSearch Vector Search Edition retrieves documents that match the search query in fuzzy match mode. A document is considered relevant to a search query if the search query is the full spelling, abbreviation, or acronym of specific characters in the document, or if the search query is contained in the document. Because OpenSearch Vector Search Edition cannot identify the search intent in a fuzzy search, a large number of unexpected documents may be retrieved. We recommend that you decide whether to use fuzzy searches based on your actual business scenario.

To use the fuzzy analyzer, set the analyzer to fuzzy when you configure a schema. When you perform a query, set the analyzer for the corresponding index to fuzzy_search_analyzer in the analyzer clause.

Example:
Configure a schema.
{
    "fields":[
        {
            "field_type":"INT64",
            "field_name":"id"
        },
        {
            "field_type":"TEXT",
            "field_name":"title",
            "analyzer":"fuzzy"
        }
    ],
    "summarys":{
        "compress":false,
        "summary_fields":[
            "id",
            "title"
        ]
    },
    "indexs":[
        {
            "has_primary_key_attribute":true,
            "index_fields":"id",
            "is_primary_key_sorted":true,
            "index_name":"id",
            "index_type":"PRIMARYKEY64"
        },
        {
            "doc_payload_flag":1,
            "index_fields":"title",
            "index_name":"title_index",
            "index_type":"TEXT"
        }
    ],
    "attributes":[
        "id"
    ],
    "table_name":"test_table"
}

Query: config=start:0,hit:10,format:json&&query=title_index:'abc'&&analyzer=specific_index_analyzer:title_index#fuzzy_search_analyzer

Note:
* The fuzzy analyzer is specified for the title field in the schema, and title_index is created.
* In the query, fuzzy_search_analyzer is specified for title_index in the analyzer clause.
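
The query string above can also be assembled programmatically. The following is a minimal sketch, assuming a hypothetical helper named build_fuzzy_query; it only concatenates the config, query, and analyzer clauses shown in the example and does not escape or URL-encode the keyword.

```python
def build_fuzzy_query(index_name: str, keyword: str, start: int = 0, hit: int = 10) -> str:
    """Assemble a query string that applies fuzzy_search_analyzer to one index.

    build_fuzzy_query is a hypothetical helper, not part of any OpenSearch SDK.
    """
    config = f"config=start:{start},hit:{hit},format:json"
    # Single quotation marks, as in the documented example query.
    query = f"query={index_name}:'{keyword}'"
    # Bind the query-time analyzer to the index that was built with "fuzzy".
    analyzer = f"analyzer=specific_index_analyzer:{index_name}#fuzzy_search_analyzer"
    return "&&".join([config, query, analyzer])


print(build_fuzzy_query("title_index", "abc"))
```

Calling the helper with title_index and abc reproduces the example query shown above.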

Scenarios

Fuzzy searches are suitable if your search intent is unclear or you want to increase the number of documents that are retrieved in search results. Fuzzy searches apply to the following scenarios:

Pinyin searches

Description: In this scenario, you can use search queries that are in the form of full pinyin spellings or pinyin abbreviations or acronyms to retrieve Chinese documents.

Example:

Document: 开放搜索
Search queries: "kai", "kaifang", "sousuo", "kaifangsousuo", "k", "kf", "ss", and "kfss" 
All these search queries can be used to retrieve the document.

Usage notes

  • Double quotation marks (" ") are used in pinyin searches.

  • If you want the Chinese characters that are specified in a search query to be consecutive in the retrieved documents, you can enclose the search query in double quotation marks (" "). In most cases, Chinese characters that are specified in the form of full pinyin spellings or pinyin abbreviations or acronyms in a search query are expected to be consecutive in the retrieved documents. For example, if the search query is "kfss", "开放搜索" is expected to be contained in the retrieved documents. Therefore, we recommend that you enclose search queries in double quotation marks (" ") for pinyin searches.
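
To make the relationship between a document and its pinyin search queries concrete, the following sketch enumerates the full spellings and acronyms of every consecutive run of characters in the sample document. It is a toy model, not the analyzer's actual implementation: the SYLLABLES table and the pinyin_forms function are hypothetical, the pinyin is hard-coded for the four sample characters, and the model may generate forms that the service does not match.

```python
# Per-character pinyin for the sample document; hard-coded for illustration only.
SYLLABLES = {"开": "kai", "放": "fang", "搜": "sou", "索": "suo"}


def pinyin_forms(text: str) -> set:
    """Return full spellings and acronyms for every consecutive run of characters."""
    syllables = [SYLLABLES[ch] for ch in text]
    forms = set()
    for start in range(len(syllables)):
        for end in range(start + 1, len(syllables) + 1):
            run = syllables[start:end]
            forms.add("".join(run))                # full spelling, e.g. "kaifang"
            forms.add("".join(s[0] for s in run))  # acronym, e.g. "kf"
    return forms


forms = pinyin_forms("开放搜索")
documented = ["kai", "kaifang", "sousuo", "kaifangsousuo", "k", "kf", "ss", "kfss"]
# Every search query listed in the example is among the generated forms.
print(all(query in forms for query in documented))
```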

Prefix searches

Description: In this scenario, you can search for the content that is prefixed with a specified search query.

Example:

# In prefix searches, a caret (^) is used as the prefix identifier.
To return mobile numbers that are prefixed with "138", use the search query "^138". The double quotation marks (" ") are required.

Usage notes

  • Prefix searches do not support Chinese characters.

  • In prefix searches, you must enclose search queries in double quotation marks (" ").

Suffix searches

Description: In this scenario, you can search for the content that is suffixed with a specified search query.

Example:

# In suffix searches, a dollar sign ($) is used as the suffix identifier.
To return mobile numbers that are suffixed with "9527", use the search query "9527$". The double quotation marks (" ") are required.

Usage notes

  • Suffix searches do not support Chinese characters.

  • In suffix searches, you must enclose search queries in double quotation marks (" ").
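
The prefix and suffix rules can be sketched as a pair of small helpers. This is an illustration under stated assumptions: prefix_query and suffix_query are hypothetical names, the index name title_index comes from the schema example, and the check for Chinese characters covers only the basic CJK Unified Ideographs block.

```python
import re

# Basic CJK Unified Ideographs block; used to reject Chinese characters.
_CHINESE = re.compile(r"[\u4e00-\u9fff]")


def _check(term: str) -> None:
    """Reject terms that prefix and suffix searches cannot handle."""
    if not term:
        raise ValueError("search term must not be empty")
    if _CHINESE.search(term):
        raise ValueError("prefix and suffix searches do not support Chinese characters")


def prefix_query(index_name: str, term: str) -> str:
    """Build a query clause with the caret (^) prefix identifier in double quotes."""
    _check(term)
    return f'query={index_name}:"^{term}"'


def suffix_query(index_name: str, term: str) -> str:
    """Build a query clause with the dollar sign ($) suffix identifier in double quotes."""
    _check(term)
    return f'query={index_name}:"{term}$"'


print(prefix_query("title_index", "138"))
print(suffix_query("title_index", "9527"))
```

Validating the term before the request is sent is a design choice of this sketch; the source states only that Chinese characters are not supported, not how the service responds to them.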

Searches of single letters or words

Description: In this scenario, you can search for specific content by specifying search queries in the form of single letters or words. This type of search is applicable if you want to obtain more documents in the search results. However, the search results may not be accurate.

Example:

# Document: '开放搜索 open search'
Query clause: query=default:'放' or query=default:'o'. Both query clauses can be used to retrieve the document.

Phrase searches

Description: In this scenario, you must enclose search queries in double quotation marks (" "). In phrase searches, only documents that contain the consecutive string of letters and digits in search queries are retrieved.

Example:

# Query clause 1: query=default:"开放搜索"
In this case, only documents that contain "xxx开放搜索xxx" can be retrieved. Documents that contain "xxx搜索开放xxx" cannot be retrieved.

# Query clause 2: query=default:"华为P"
In this case, documents that contain "华为P20" cannot be retrieved. This is because the letters and digits in the document form the consecutive string "P20", which the "P" in the query clause does not match as a whole. If you want to use this query clause to obtain the documents that contain "华为P20", enclose the search query in single quotation marks (' ').

Usage notes

  • To implement phrase searches, you must enclose search queries in double quotation marks (" ").

  • Phrase searches return highly accurate results but retrieve a small number of documents and consume more resources than other fuzzy searches. We recommend that you use the general analyzer for Chinese in phrase searches.

  • Fuzzy searches are suitable if the search intent is unclear or you want more documents to be returned in the search results. For fuzzy searches other than pinyin searches, prefix searches, suffix searches, and phrase searches, you must enclose search queries in single quotation marks (' ').
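
The quoting rules for the different search types can be summarized in a short sketch. The names DOUBLE_QUOTED, quote_term, and query_clause are hypothetical, and the search type labels are chosen for this illustration only. Note that double quotation marks are recommended rather than required for pinyin searches; the sketch applies them by default.

```python
# Search types whose queries are enclosed in double quotation marks.
# For pinyin searches this is a recommendation; for the others it is required.
DOUBLE_QUOTED = {"pinyin", "prefix", "suffix", "phrase"}


def quote_term(term: str, search_type: str = "general") -> str:
    """Wrap a search term in the quotation marks that fit the search type."""
    quote = '"' if search_type in DOUBLE_QUOTED else "'"
    return f"{quote}{term}{quote}"


def query_clause(index_name: str, term: str, search_type: str = "general") -> str:
    """Build a query clause such as query=default:"开放搜索"."""
    return f"query={index_name}:{quote_term(term, search_type)}"


print(query_clause("default", "开放搜索", "phrase"))  # phrase search
print(query_clause("default", "华为P"))               # general fuzzy search
print(query_clause("default", "kfss", "pinyin"))      # pinyin search
```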

Usage notes

  • By default, the documents that are returned for a fuzzy search are sorted by the position of the matched term in the field value: the earlier the match appears, the higher the document is ranked. For example, assume that you use the title field in your application for fuzzy searches, document 1 contains "开放搜索", and document 2 contains "喜欢使用开放搜索". If you set the search query to "kfss", document 1 is sorted before document 2 by default.

  • You cannot perform suffix or prefix searches by using search queries that contain Chinese characters. When you perform a suffix or prefix search, the search query can contain only letters, digits, and pinyin spellings.

  • Punctuation marks in the values of the fields that use the fuzzy analyzer are filtered out.

  • Full-width characters in the values of the fields that use the fuzzy analyzer are converted to half-width characters.

  • Letters, digits, and pinyin spellings in the values of the fields that use the fuzzy analyzer cannot be highlighted.
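
The normalization and default ordering described in these notes can be approximated locally. The sketch below is an assumption-laden illustration, not the analyzer's implementation: normalize_field_value and sort_by_match_position are hypothetical names, Unicode NFKC normalization stands in for the full-width to half-width conversion (NFKC also folds other compatibility characters), and punctuation is defined as any character whose Unicode category starts with "P".

```python
import unicodedata


def normalize_field_value(value: str) -> str:
    """Approximate the analyzer's handling of full-width characters and punctuation."""
    # NFKC maps full-width letters, digits, and punctuation to half-width forms.
    half_width = unicodedata.normalize("NFKC", value)
    # Drop every character in a Unicode punctuation category (Pc, Pd, Po, ...).
    return "".join(ch for ch in half_width if not unicodedata.category(ch).startswith("P"))


def sort_by_match_position(field_values, matched_term):
    """Order matching field values by where the matched term first appears."""
    hits = [value for value in field_values if matched_term in value]
    return sorted(hits, key=lambda value: value.find(matched_term))


print(normalize_field_value("开放搜索，Ｏｐｅｎ１２３！"))
print(sort_by_match_position(["喜欢使用开放搜索", "开放搜索"], "开放搜索"))
```

In the second call, "开放搜索" is listed first because the matched term starts at position 0 in that value and at position 4 in "喜欢使用开放搜索", which mirrors the document 1 and document 2 example.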