OpenSearch: Fuzzy searches

Last Updated: Feb 13, 2023

Overview

Fuzzy searches are useful when the search intent is unclear. In a fuzzy search, the system retrieves the documents that match the search query in fuzzy match mode. A document is considered relevant to a search query if the search query is the full pinyin spelling or the pinyin acronym of Chinese characters in the document, or if the document contains the search query. Because the system cannot identify the search intent, a fuzzy search may retrieve a large number of unexpected documents. We recommend that you use fuzzy searches only where your business scenario calls for them.

Precautions

  • The analyzer for fuzzy searches applies only to fields of the SHORT_TEXT type.

  • In most cases, you can use single quotation marks (' ') to implement fuzzy searches. The following section describes the specific scenarios in which double quotation marks (" ") are required.

Scenarios

Fuzzy searches are suitable if the search intent is unclear or you want to increase the number of documents that are retrieved in search results. Fuzzy searches apply to the following scenarios:

Pinyin searches

Description: In this scenario, you can use search queries that are in the form of full pinyin spellings or pinyin abbreviations to retrieve Chinese documents.

Example:

Document: 开放搜索
Search queries: "kai", "kaifang", "sousuo", "kaifangsousuo", "k", "kf", "ss", and "kfss" 
All these search queries can be used to retrieve the document.

Usage notes:

  • Double quotation marks (" ") are used in pinyin searches.

  • Enclose a search query in double quotation marks (" ") if the Chinese characters that it represents must appear consecutively in the retrieved documents. This is usually the case for full pinyin spellings and pinyin abbreviations. For example, the search query "kfss" is expected to retrieve documents that contain "开放搜索". Therefore, we recommend that you enclose search queries in double quotation marks (" ") for pinyin searches.
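The pinyin example above can be reproduced with a small local sketch. This is a toy model, not the analyzer's actual algorithm: it assumes that a query matches when it equals the full pinyin spelling or the acronym of a run of consecutive characters. The `PINYIN` table and the `pinyin_keys` and `matches` functions are hypothetical names introduced only for illustration.

```python
# Toy model of pinyin matching. The table is hand-written for this one
# example; a real analyzer would rely on a complete pinyin dictionary.
PINYIN = {"开": "kai", "放": "fang", "搜": "sou", "索": "suo"}


def pinyin_keys(text: str) -> set:
    """Return full-pinyin and acronym keys for every run of consecutive characters."""
    syllables = [PINYIN[ch] for ch in text]
    keys = set()
    for start in range(len(syllables)):
        for end in range(start + 1, len(syllables) + 1):
            run = syllables[start:end]
            keys.add("".join(run))                # full spelling, e.g. "kaifang"
            keys.add("".join(s[0] for s in run))  # acronym, e.g. "kf"
    return keys


def matches(document: str, query: str) -> bool:
    """Assumption: a query matches if it equals one of the generated keys."""
    return query in pinyin_keys(document)


queries = ["kai", "kaifang", "sousuo", "kaifangsousuo", "k", "kf", "ss", "kfss"]
print(all(matches("开放搜索", q) for q in queries))
```

In this model, every key comes from characters that are adjacent in the document, which mirrors the expectation that the characters behind a double-quoted pinyin query appear consecutively.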

Prefix searches

Description: In this scenario, you can search for content that starts with a specific search query.

Example:

# In prefix searches, a caret (^) is the prefix identifier. To return mobile numbers that start with "138",
use the search query "^138". Take note that the double quotation marks (" ") are required.

Usage notes:

  • Prefix searches do not support Chinese characters.

  • In prefix searches, you must enclose search queries in double quotation marks (" ").

Suffix searches

Description: In this scenario, you can search for content that ends with a specific search query.

Example:

# In suffix searches, a dollar sign ($) is the suffix identifier. To return mobile numbers that end with "9527",
use the search query "9527$". Take note that the double quotation marks (" ") are required.

Usage notes:

  • Suffix searches do not support Chinese characters.

  • In suffix searches, you must enclose search queries in double quotation marks (" ").
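The intended semantics of the "^" and "$" identifiers can be illustrated on plain strings. The following is a minimal sketch that uses Python string methods to mimic the behavior. It does not call OpenSearch, and the `matches_anchored` function and the sample mobile numbers are hypothetical.

```python
def matches_anchored(field_value: str, query: str) -> bool:
    """Mimic the ^text (prefix) and text$ (suffix) query forms on a plain string."""
    if query.startswith("^"):
        return field_value.startswith(query[1:])
    if query.endswith("$"):
        return field_value.endswith(query[:-1])
    raise ValueError("expected a query of the form ^text or text$")


# Hypothetical mobile numbers, used only for illustration.
numbers = ["13812345678", "13900009527", "13800009527"]

print([n for n in numbers if matches_anchored(n, "^138")])   # numbers that start with 138
print([n for n in numbers if matches_anchored(n, "9527$")])  # numbers that end with 9527
```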

Searches of single characters or words

Description: In this scenario, you can search for specific content by specifying search queries in the form of single characters or words. This type of search is suitable if you want to obtain more documents in the search results. However, the search results may not be accurate.

Example:

# Document: '开放搜索 open search'
Query clause: query=default:'放' or query=default:'o'. Both query clauses can be used to retrieve the document.

Phrase searches

Description: In a phrase search, only documents that contain the letters and digits of the search query as one consecutive string are retrieved. In this scenario, you must enclose search queries in double quotation marks (" ").

Example:

# Query clause 1: query=default:"OpenSearch"
In this case, only documents that contain "xxxOpenSearchxxx" can be retrieved. Documents that contain "xxxSearchOpenxxx" cannot be retrieved.

# Query clause 2: query=default:"HUAWEIP"
In this case, documents that contain "HUAWEIP20" cannot be retrieved. This is because the search query "HUAWEIP" is not a complete consecutive string of letters and digits in "HUAWEIP20". If you want to use this search query to retrieve the documents that contain "HUAWEIP20", enclose the search query in single quotation marks (' ').

Usage notes:

  • In phrase searches, you must enclose search queries in double quotation marks (" ").

  • Phrase searches help improve the accuracy of search results and reduce the number of documents that are retrieved. This type of search consumes more resources. We recommend that you use the general-purpose Chinese text analyzer in phrase searches.

  • Fuzzy searches are suitable if the search intent is unclear or you want more documents to be returned in the search results. For all fuzzy searches other than pinyin searches, prefix searches, suffix searches, and phrase searches, you must enclose search queries in single quotation marks (' ').
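The quotation mark rules for the scenarios above can be collected in one helper. The following is a minimal sketch: the `build_query_clause` function and its `search_type` labels are hypothetical, and the `default` index name is taken from the query clause examples in this topic. The helper only builds the clause string and does not send a request.

```python
# Search types that the usage notes enclose in double quotation marks.
# For pinyin searches, double quotation marks are recommended rather than required.
DOUBLE_QUOTED_TYPES = {"pinyin", "prefix", "suffix", "phrase"}


def build_query_clause(text: str, search_type: str = "other", index: str = "default") -> str:
    """Build a query clause string with the quotation marks that match the search type."""
    quote = '"' if search_type in DOUBLE_QUOTED_TYPES else "'"
    if quote in text:
        raise ValueError("search text must not contain the enclosing quotation mark")
    return f"query={index}:{quote}{text}{quote}"


print(build_query_clause("kfss", "pinyin"))
print(build_query_clause("^138", "prefix"))
print(build_query_clause("9527$", "suffix"))
print(build_query_clause("OpenSearch", "phrase"))
print(build_query_clause("放"))  # single character search, single quotation marks
```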

Limits

To use the fuzzy search feature, you must set the fields that are used for fuzzy searches to the SHORT_TEXT type and specify an analyzer for fuzzy searches when you create an application.

By default, the documents that are returned for a fuzzy search are sorted by the position of the matched term in the field value. For example, suppose that you use the title field in your application for fuzzy searches, the doc1 document contains "开放搜索", and the doc2 document contains "喜欢使用开放搜索". If you set the search query to "kfss", doc1 is sorted before doc2 by default.

Fuzzy searches apply to scenarios in which the search intent is unclear. Take note of the following rules when you implement fuzzy searches:

  • You cannot implement suffix or prefix searches for search queries that contain Chinese characters. Search queries that contain only letters, digits, and pinyin are supported.

  • The punctuation marks in the values of fields of the SHORT_TEXT type are filtered out.

  • After punctuation marks are filtered out from a field of the SHORT_TEXT type, up to 100 bytes of the field value can be retained. The excess part is discarded.

  • You can create a drop-down suggestions model based on a field of the SHORT_TEXT type.

  • You cannot use the query analysis feature for indexes that are created based on fields of the SHORT_TEXT type.

  • If only the analyzer for fuzzy searches is used for a field of the SHORT_TEXT type to create an index, full-width characters are converted into half-width characters in search result summaries. To prevent the conversion, you can use an analyzer for Chinese to create an index.

  • In search result summaries, letters, digits, and pinyin cannot be highlighted in red.
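Two behaviors described in this section can be mimicked locally to check what a field value might look like after processing: the punctuation filtering with the 100-byte limit, and the default ordering of results. The following is a rough sketch under stated assumptions. It treats punctuation as the Unicode "P" categories, measures the limit in UTF-8 bytes, and ranks documents by the character offset of the matched term. The function names are hypothetical, and the actual service may differ on each of these points.

```python
import unicodedata

MAX_BYTES = 100  # limit stated in the rules above


def normalize_short_text(value: str, encoding: str = "utf-8") -> str:
    """Drop punctuation, then keep at most MAX_BYTES bytes of the value."""
    # Assumption: punctuation means Unicode general categories that start with "P".
    kept = "".join(ch for ch in value if not unicodedata.category(ch).startswith("P"))
    # Assumption: the limit is counted in UTF-8 bytes; a partial trailing character is dropped.
    return kept.encode(encoding)[:MAX_BYTES].decode(encoding, errors="ignore")


def default_order(docs: dict, matched_term: str) -> list:
    """Order matching document IDs by where the matched term first appears."""
    hits = {doc_id: value.find(matched_term) for doc_id, value in docs.items()}
    return sorted((d for d, pos in hits.items() if pos >= 0), key=hits.get)


print(normalize_short_text("Open-Search, 2023!"))
print(default_order({"doc2": "喜欢使用开放搜索", "doc1": "开放搜索"}, "开放搜索"))
```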