
Fuzzy searches

Last Updated: Sep 09, 2021

Overview

Fuzzy searches are applicable if the search intent is unclear. In a fuzzy search, the system retrieves documents that match a search query in fuzzy match mode. A document is considered relevant to a search query if the query is the pinyin or the pinyin initials of Chinese characters in the document, or if the query is contained in the document. Because the system cannot identify the search intent in a fuzzy search, it may retrieve a large number of unexpected documents. Exercise caution when you implement fuzzy searches. We recommend that you use this feature only for specific business requirements.

Precautions

  • The analyzer for fuzzy searches applies only to fields of the SHORT_TEXT type.

  • Generally, you can use single quotation marks (' ') to implement fuzzy searches. The following section describes the specific scenarios in which double quotation marks (" ") are required.

Scenarios

Fuzzy searches are applicable if the search intent is unclear or more documents need to be retrieved in the search results. Fuzzy searches apply to the following scenarios:

Pinyin searches

Description: In this scenario, search queries are specified as pinyin or pinyin initials, and the Chinese documents that contain the corresponding characters are retrieved.

Example:

# Sample document: 开放搜索
Sample search queries: "kai", "kaifang", "sousuo", "kaifangsousuo", "k", "kf", "ss", and "kfss" 
Each of the preceding search queries can be used to retrieve the sample document.

Usage notes:

  • Double quotation marks (" ") are used in pinyin searches.

  • If you want the Chinese characters that match a search query to be consecutive in the retrieved documents, enclose the search query in double quotation marks (" "). In most cases, Chinese characters that are specified as pinyin or pinyin initials are expected to be consecutive in the retrieved documents. For example, if the search query is "kfss", the retrieved documents are expected to contain "开放搜索". Therefore, we recommend that you enclose search queries in double quotation marks (" ") in pinyin searches.
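The matching behavior described above can be modeled locally. The following Python sketch is an illustration only, not the OpenSearch implementation: the `PINYIN` table is a hypothetical lookup that covers only the sample document, and `pinyin_matches` and `pinyin_clause` are names invented for this example. The model treats a query as a match when it spells a consecutive run of characters either as full pinyin or as pinyin initials.

```python
# Hypothetical pinyin table that covers only the sample document.
# A real engine uses its own complete pinyin dictionary.
PINYIN = {"开": "kai", "放": "fang", "搜": "sou", "索": "suo"}


def pinyin_matches(query: str, text: str) -> bool:
    """Return True if `query` spells a consecutive run of characters in `text`,
    either as full pinyin or as pinyin initials (simplified model)."""
    syllables = [PINYIN.get(ch, ch) for ch in text]
    q = query.lower()
    for start in range(len(syllables)):
        full, initials = "", ""
        # Extend the run one character at a time from `start`.
        for syl in syllables[start:]:
            full += syl
            initials += syl[0]
            if q in (full, initials):
                return True
    return False


def pinyin_clause(index: str, query: str) -> str:
    # Pinyin queries are enclosed in double quotation marks.
    return f'query={index}:"{query}"'


doc = "开放搜索"
for q in ["kai", "kaifang", "sousuo", "kaifangsousuo", "k", "kf", "ss", "kfss"]:
    print(pinyin_clause("default", q), pinyin_matches(q, doc))
```

Under this model, a query such as "ks" does not match, because "开" and "搜" are not consecutive in the sample document.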

Prefix searches

Description: In this scenario, you can search for the content that is prefixed with a specified search query.

Example:

# In prefix searches, the caret (^) is used as the prefix identifier. For example, to return mobile numbers that start with "138",
specify the search query as "^138". Note that the double quotation marks (" ") are required.

Usage notes:

  • Prefix searches do not support Chinese characters.

  • You must enclose search queries in double quotation marks (" ").
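The following Python sketch shows one way to build a prefix query clause on the client side and to reject Chinese characters before a request is sent. The `phone` index name, the sample numbers, and the `prefix_clause` helper are hypothetical. The CJK check covers only the CJK Unified Ideographs block (U+4E00 to U+9FFF), which is an assumption rather than the service's own validation rule, and the `startswith` filter only models the expected result locally.

```python
import re

# Simplified check: any character in the CJK Unified Ideographs block.
CJK = re.compile(r"[\u4e00-\u9fff]")


def prefix_clause(index: str, term: str) -> str:
    """Build a prefix query clause; Chinese characters are not supported."""
    if CJK.search(term):
        raise ValueError("prefix searches do not support Chinese characters")
    # The caret marks a prefix search; double quotation marks are required.
    return f'query={index}:"^{term}"'


# Local model of the expected matches, for illustration only.
phones = ["13800001111", "13912345678", "13899990000"]
print(prefix_clause("phone", "138"))                  # query=phone:"^138"
print([p for p in phones if p.startswith("138")])    # ['13800001111', '13899990000']
```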

Suffix searches

Description: In this scenario, you can search for the content that is suffixed with a specified search query.

Example:

# In suffix searches, the dollar sign ($) is used as the suffix identifier. For example, to return mobile numbers that end with "9527",
specify the search query as "9527$". Note that the double quotation marks (" ") are required.

Usage notes:

  • Suffix searches do not support Chinese characters.

  • You must enclose search queries in double quotation marks (" ").
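A suffix clause can be built the same way, with the identifier placed after the term. In this Python sketch, the `phone` index name, the sample numbers, and the `suffix_clause` helper are hypothetical, and the CJK range check is a simplified assumption. The `endswith` filter only models the expected result locally.

```python
import re

# Simplified check: any character in the CJK Unified Ideographs block.
CJK = re.compile(r"[\u4e00-\u9fff]")


def suffix_clause(index: str, term: str) -> str:
    """Build a suffix query clause; Chinese characters are not supported."""
    if CJK.search(term):
        raise ValueError("suffix searches do not support Chinese characters")
    # The dollar sign marks a suffix search; double quotation marks are required.
    return f'query={index}:"{term}$"'


# Local model of the expected matches, for illustration only.
phones = ["13800009527", "13912345678", "15000009527"]
print(suffix_clause("phone", "9527"))                 # query=phone:"9527$"
print([p for p in phones if p.endswith("9527")])     # ['13800009527', '15000009527']
```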

Searches of single characters or words

Description: In this scenario, you can search for specific content by specifying search queries in the form of single characters or words. This type of search is applicable if you want to obtain more documents in the search results. However, the search results may not be accurate.

Example:

# Sample document: '开放搜索 open search'
Sample query clauses: query=default:'放' and query=default:'o'. Either of them can be used to retrieve the sample document.
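As a rough local model of this search type, the following Python sketch treats a document as a match when the query is a substring of it, which mirrors the "contained in the document" rule in the overview. The `contains_match` and `fuzzy_clause` helpers are hypothetical, and the real engine may apply normalization that this sketch ignores.

```python
def contains_match(query: str, doc: str) -> bool:
    # Simplified model: a document matches when the query is a substring of it.
    return query in doc


def fuzzy_clause(index: str, query: str) -> str:
    # Searches of single characters or words use single quotation marks.
    return f"query={index}:'{query}'"


doc = "开放搜索 open search"
for q in ["放", "o"]:
    print(fuzzy_clause("default", q), contains_match(q, doc))
```

Both sample clauses match in this model, which illustrates why this search type tends to return many documents.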

Phrase searches

Description: In this scenario, you must enclose search queries in double quotation marks (" "). In phrase searches, only documents that contain the content of the search query consecutively and in the same order are retrieved.

Example:

# Sample query clause 1: query=default:"开放搜索"
In this case, only documents that contain "xxx开放搜索xxx" can be retrieved. Documents that contain "xxx搜索开放xxx" cannot be retrieved.

# Sample query clause 2: query=default:"华为P"
In this case, documents that contain "华为P20" cannot be retrieved, because the query clause does not contain the consecutive letter and digits "P20". To retrieve such documents with this search query, enclose the search query in single quotation marks (' ') instead, for example, query=default:'华为P'.
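To see why these two clauses behave differently, the following Python sketch models phrase matching as "the query tokens appear consecutively and in order in the document tokens". The tokenization rule, which treats each Chinese character as one token and each run of ASCII letters and digits as one token, is an assumption chosen to reproduce the examples above; the actual analyzer may tokenize differently. The helper names are hypothetical.

```python
import re

# Assumed tokenization for illustration: each Chinese character is one token,
# and each run of ASCII letters and digits (such as "P20") is one token.
TOKEN = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9]+")


def tokenize(text: str) -> list[str]:
    return TOKEN.findall(text)


def phrase_matches(query: str, doc: str) -> bool:
    """True if the query tokens appear consecutively and in order in the doc."""
    q, d = tokenize(query), tokenize(doc)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


print(phrase_matches("开放搜索", "xxx开放搜索xxx"))  # True
print(phrase_matches("开放搜索", "xxx搜索开放xxx"))  # False: wrong order
print(phrase_matches("华为P", "华为P20"))            # False: token "P" != "P20"
```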

Usage notes:

  • To implement phrase searches, you must enclose search queries in double quotation marks (" ").

  • In phrase searches, the search results are highly accurate, but fewer documents are retrieved and more resources are consumed. We recommend that you use the general analyzer for Chinese in phrase searches.

  • For all fuzzy searches other than pinyin searches, prefix searches, suffix searches, and phrase searches, you must enclose search queries in single quotation marks (' ').
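The quotation rules in this topic can be collected in one client-side helper. In the following Python sketch, `build_clause` and its `kind` labels ("pinyin", "prefix", "suffix", "phrase", and any other value for single-quoted searches) are hypothetical names for illustration. Only the clause shapes, such as query=default:'放' and query=default:"开放搜索", and the ^ and $ identifiers come from this topic.

```python
# Search types that require double quotation marks, per the usage notes above.
DOUBLE_QUOTED = {"pinyin", "prefix", "suffix", "phrase"}


def build_clause(index: str, query: str, kind: str) -> str:
    """Build a query clause with the quotation marks that `kind` requires."""
    if kind == "prefix":
        query = "^" + query   # caret marks a prefix search
    elif kind == "suffix":
        query = query + "$"   # dollar sign marks a suffix search
    quote = '"' if kind in DOUBLE_QUOTED else "'"
    return f"query={index}:{quote}{query}{quote}"


print(build_clause("default", "kfss", "pinyin"))      # query=default:"kfss"
print(build_clause("phone", "138", "prefix"))         # query=phone:"^138"
print(build_clause("phone", "9527", "suffix"))        # query=phone:"9527$"
print(build_clause("default", "开放搜索", "phrase"))  # query=default:"开放搜索"
print(build_clause("default", "放", "single"))        # query=default:'放'
```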

Limits

To use the fuzzy search feature, you must set the fields that are used for fuzzy searches to the SHORT_TEXT type and specify an analyzer for fuzzy searches when you create an application. By default, the documents that are returned in a fuzzy search are ranked based on the position of the matched term in the field value. For example, assume that you use the title field of your application for fuzzy searches, the doc1 document contains "开放搜索", and the doc2 document contains "喜欢使用开放搜索". If the search query is "kfss", doc1 is ranked before doc2 by default, because the matched term appears earlier in doc1. Take note of the following rules when you implement fuzzy searches:
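The default ranking rule can be illustrated with the title example. The following Python sketch assumes that the pinyin query "kfss" has already been resolved to "开放搜索" and ranks the matching documents by the position at which that term first occurs. The `rank_by_position` helper and the extra doc3 document are hypothetical, and the sketch models only the default position-based rule described above.

```python
def rank_by_position(term: str, docs: dict[str, str]) -> list[str]:
    """Return IDs of docs that contain `term`, earliest match position first."""
    hits = {doc_id: text.find(term) for doc_id, text in docs.items()}
    matched = [doc_id for doc_id, pos in hits.items() if pos >= 0]
    # sorted() is stable, so documents with equal positions keep input order.
    return sorted(matched, key=lambda doc_id: hits[doc_id])


docs = {
    "doc2": "喜欢使用开放搜索",  # term starts at position 4
    "doc1": "开放搜索",          # term starts at position 0
    "doc3": "搜索开放",          # no consecutive match, so it is excluded
}
print(rank_by_position("开放搜索", docs))  # ['doc1', 'doc2']
```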

  • You cannot implement suffix or prefix searches for search queries that contain Chinese characters. Search queries that contain only letters, digits, and pinyin are supported.

  • The punctuation marks in the values of fields of the SHORT_TEXT type are filtered out.

  • After punctuation marks are filtered out from the value of a field of the SHORT_TEXT type, up to 100 bytes of the value are retained, and any excess is discarded.

  • You can create a drop-down suggestions model based on a field of the SHORT_TEXT type.

  • You cannot use the query analysis feature for indexes that are created based on fields of the SHORT_TEXT type.

  • If only the analyzer for fuzzy searches is used for a field of the SHORT_TEXT type to create an index, full-width characters are converted into half-width characters in search result summaries. To prevent the conversion, you can use an analyzer for Chinese to create an index.

  • In search result summaries, letters, digits, and pinyin cannot be highlighted.
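The punctuation and length rules for SHORT_TEXT values can be approximated as a preprocessing step. The following Python sketch is a local approximation, not the service's implementation. It assumes that punctuation means Unicode characters whose category starts with "P", that the 100-byte limit is measured in UTF-8, and that a multi-byte character cut at the boundary is dropped. The `normalize_short_text` helper is hypothetical.

```python
import unicodedata

MAX_BYTES = 100  # limit stated in this topic; UTF-8 measurement is an assumption


def normalize_short_text(value: str) -> str:
    """Drop punctuation, then keep at most MAX_BYTES bytes of UTF-8."""
    # Assumption: punctuation = Unicode categories Pc, Pd, Ps, Pe, Pi, Pf, Po.
    kept = "".join(ch for ch in value if not unicodedata.category(ch).startswith("P"))
    data = kept.encode("utf-8")[:MAX_BYTES]
    # Assumption: a multi-byte character cut at the limit is dropped entirely.
    return data.decode("utf-8", errors="ignore")


print(normalize_short_text("开放搜索, open-search!"))  # 开放搜索 opensearch
print(len(normalize_short_text("搜" * 40)))            # 33 (99 bytes kept)
```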