OpenSearch: General-purpose Chinese text analyzer

Last Updated: Apr 01, 2026

chn_standard is the general-purpose Chinese text analyzer for OpenSearch Retrieval Engine Edition. It tokenizes Chinese text based on semantics and is suited to general web content across all industries. Unlike basic character-splitting approaches, chn_standard produces both base search units and extended terms, improving search recall without requiring manual synonym configuration.

How it works

chn_standard splits text into search units, the smallest units of text that the analyzer produces. In addition to base search units, the analyzer generates extended terms, which are semantically related expansions of a base unit, to broaden search recall.

Example

Input: 菊花茶

Token    Type
菊花     Base search unit
茶       Base search unit
花茶     Extended term
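
The effect of extended terms on recall can be illustrated with a minimal Python sketch. It does not reproduce the real semantic segmentation: the analysis result for 菊花茶 is hard-coded, the second base unit is assumed to be 茶 (the remainder of the input), and the names Token, TOY_ANALYSIS, and index_terms are hypothetical and not part of OpenSearch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    text: str
    kind: str  # "base" for a base search unit, "extended" for an extended term


# Hypothetical, hard-coded analysis result. The real analyzer derives
# these tokens from semantics; this table only mirrors the example above.
TOY_ANALYSIS = {
    "菊花茶": [
        Token("菊花", "base"),
        Token("茶", "base"),        # assumed: the remainder of the input
        Token("花茶", "extended"),
    ],
}


def index_terms(text, include_extended=True):
    """Return the set of terms that would be searchable for `text`."""
    tokens = TOY_ANALYSIS.get(text, [])
    return {t.text for t in tokens if include_extended or t.kind == "base"}


# With extended terms, a query for 花茶 can find a document containing 菊花茶.
print("花茶" in index_terms("菊花茶"))
# With base search units only, the same query would miss it.
print("花茶" in index_terms("菊花茶", include_extended=False))
```

In this toy model, the extended term is what makes 花茶 searchable; the base search units alone would not.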

Customize tokenization with a dictionary

To control how chn_standard tokenizes specific terms, add intervention entries to its dictionary.

An intervention entry is a medium-granularity entry. When a user searches for an intervention entry, the engine looks it up in the chn_standard.dict dictionary, then converts it into its constituent search units for matching.

Example

Add 搜索引擎 as an intervention entry. When a user searches for 搜索引擎, the engine:

  1. Finds 搜索引擎 in the chn_standard.dict dictionary.

  2. Converts it into two search units: 搜索 and 引擎.
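
The lookup above can be pictured as a simple mapping from an intervention entry to its search units. The Python sketch below is illustrative only: INTERVENTION_ENTRIES is a hypothetical in-memory stand-in for chn_standard.dict (the real file format is not shown here), and expand_query is an assumed helper name, not an OpenSearch API.

```python
# Hypothetical in-memory stand-in for the chn_standard.dict dictionary.
# Each intervention entry maps to the search units it is converted into.
INTERVENTION_ENTRIES = {
    "搜索引擎": ["搜索", "引擎"],
}


def expand_query(query):
    """Return the search units used for matching a query.

    If the query is an intervention entry, return its constituent search
    units; otherwise return the query unchanged as a single unit.
    """
    units = INTERVENTION_ENTRIES.get(query)
    if units is not None:
        return list(units)  # copy so callers cannot mutate the dictionary
    return [query]


print(expand_query("搜索引擎"))
```

A query that is not in the dictionary passes through unchanged in this sketch; the real engine would still run it through the analyzer.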

To add an intervention entry:

  1. Go to Advanced settings and open the chn_standard.dict dictionary.

  2. Add the term as an intervention entry.

  3. Publish the modified configuration as a new version.

Constraints

  • chn_standard applies only to fields of the TEXT data type.

  • To enable it, set the analyzer field to chn_standard when configuring a schema.
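
Both constraints can be checked mechanically before a schema is published. The following Python sketch assumes a schema represented as a dictionary with a fields list; the key names fields, field_name, and field_type, the INT64 type, and the function check_analyzer_fields are assumptions made for illustration, and only the analyzer field and the TEXT data type come from the constraints above.

```python
def check_analyzer_fields(schema):
    """Return the names of fields that set chn_standard but are not TEXT."""
    return [
        field["field_name"]
        for field in schema["fields"]
        if field.get("analyzer") == "chn_standard"
        and field.get("field_type") != "TEXT"
    ]


# Hypothetical schema fragment; the key names are assumed, not confirmed.
schema = {
    "fields": [
        {"field_name": "title", "field_type": "TEXT", "analyzer": "chn_standard"},
        {"field_name": "id", "field_type": "INT64"},
    ]
}

invalid = check_analyzer_fields(schema)
print(invalid)
```

An empty result means every field that uses chn_standard is of the TEXT data type.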