Text analyzers - OpenSearch - Alibaba Cloud Documentation Center

OpenSearch text analyzers control how field values are broken into tokens during indexing and how search queries are parsed. Choosing the right analyzer for each field directly determines which queries retrieve a document.

Analyzer overview

The following table summarizes all available analyzers. Use it to narrow down your options before reading the detailed reference sections.

Analyzer	Supported field types	Best for	Availability
Keyword analyzer	LITERAL, ARRAY, INT	Exact-match searches; tags, IDs, whole strings	—
General analyzer for Chinese	TEXT, SHORT_TEXT	General Chinese full-text search	—
E-commerce analyzer for Chinese	TEXT, SHORT_TEXT	Chinese product catalog search	—
Single character analyzer for Chinese	TEXT, SHORT_TEXT	Non-semantic Chinese searches; author or store names	—
Fuzzy analyzer	SHORT_TEXT	Pinyin search; prefix/suffix search; single-letter search	—
Word stemming analyzer for English	TEXT, SHORT_TEXT	English full-text search with stemming	—
Unstemmed word analyzer for English	TEXT, SHORT_TEXT	Non-semantic English searches; book titles, author names	—
Analyzer for fine-grained analysis for English	TEXT, SHORT_TEXT	English full-text search with compound-word splitting	Exclusive applications
Full pinyin spelling analyzer	SHORT_TEXT	Full-pinyin or first-letter abbreviated pinyin search	—
Abbreviated pinyin spelling analyzer	SHORT_TEXT	First-letter abbreviated pinyin search	—
Simple analyzer	TEXT, SHORT_TEXT	Custom tokenization with tab-separated terms	—
Numerical value analyzer	INT, TIMESTAMP	Time-interval and numerical-range queries	—
Geo-location analyzer	GEO_POINT	Geographical-location queries	—
IT content analyzer	TEXT, SHORT_TEXT	Technical content with IT-specific terminology (e.g., `c++`)	—
General analyzer for E-commerce for Chinese	TEXT	Chinese E-commerce product search using NLP	Industry-specific Enhanced Edition for E-commerce
General analyzer for Thai	TEXT, SHORT_TEXT	Thai full-text search	Exclusive applications
Analyzer for E-commerce for Thai	TEXT, SHORT_TEXT	Thai E-commerce product search	Exclusive applications
General analyzer for Vietnamese	TEXT, SHORT_TEXT	Vietnamese full-text search	Exclusive applications
General analyzer for Gaming	TEXT, SHORT_TEXT	Gaming industry content search	Industry-specific Enhanced Edition for Gaming
General analyzer for E-commerce for English	TEXT	English E-commerce product search	Industry-specific Enhanced Edition for E-commerce
Character analyzer for Chinese	TEXT, SHORT_TEXT	Character-level non-semantic Chinese search	Exclusive applications
Custom analyzer for text	TEXT, SHORT_TEXT	Scenarios where built-in analyzers cannot meet requirements	—

Analyzer reference

Keyword analyzer

Outputs the entire field value as a single token without any segmentation. Use it for exact-match searches on tags, keywords, IDs, or any string that must be treated as a whole.

Supported field types: LITERAL, ARRAY, INT

Example: If a field value is juhuacha, the document is retrieved only when a user searches for juhuacha.

General analyzer for Chinese

Segments Chinese text into search units based on Chinese semantics. This is the recommended starting point for most Chinese full-text search use cases.

Supported field types: TEXT, SHORT_TEXT

Example: If a field value is juhuacha, the document is retrieved when a user searches for juhuacha, juhua, cha, or huacha.

E-commerce analyzer for Chinese

An industry-specific analyzer tuned for Chinese E-commerce product search. It handles product names, brand names, and mixed Chinese-English terms that are common in retail catalogs.

Supported field types: TEXT, SHORT_TEXT

Example: If a field value is Dabao SOD honey, the document is retrieved when a user searches for Dabao, sod, sodhoney, SOD honey, or honey.

Single character analyzer for Chinese

Segments Chinese text into individual characters as well as multi-character words. Use it when semantic meaning is not required and higher recall is preferred — for example, when searching author names or store names.

Supported field types: TEXT, SHORT_TEXT

Example: If a field value is juhuacha, the document is retrieved when a user searches for juhuacha, juhua, cha, huacha, ju, hua, or jucha.

This analyzer treats numbers and English words as single tokens. A search for he does not retrieve a document whose field contains hello. To support partial English-word matching, use the fuzzy analyzer instead.

Fuzzy analyzer

Supports pinyin, prefix/suffix, and single-word or single-letter searches. Fields must not exceed 100 bytes.

Supported field types: SHORT_TEXT

Chinese text does not support prefix or suffix searches. Prefix and suffix searches apply to letters, numbers, and pinyin only. For details, see Fuzzy search.

Examples:

Chinese with pinyin: If a field value is juhuacha, the document is retrieved when a user searches for juhuacha, juhua, cha, huacha, ju, hua, jucha, ju, juhua, juhuacha, j, jh, or jhc.
Prefix/suffix on numbers: If a field value is 138****5678, searching ^138 retrieves all numbers starting with 138; searching 5678$ retrieves all numbers ending with 5678.
Latin strings: If a field value is OpenSearch, the document is retrieved when a user searches for any single letter or combination of letters contained in the value.

Word stemming analyzer for English

Reduces each English word to its root form, enabling searches across inflected variants. Consecutive Chinese characters are treated as a single token.

Supported field types: TEXT, SHORT_TEXT

Example: If a field value is English tokenizer english analyzer, the document is retrieved when a user searches for English tokenizer, english, analyz, analyzer, analyzers, analyze, analyzed, or analyzing.

Unstemmed word analyzer for English

Segments text on spaces and punctuation marks without applying stemming. Use it for non-semantic English searches such as book titles or author names. Consecutive Chinese characters are treated as a single token.

Supported field types: TEXT, SHORT_TEXT

Example: If a field value is English tokenizer english analyzer, the document is retrieved when a user searches for English tokenizer, english, or analyzer.

Analyzer for fine-grained analysis for English

Segments English text by search unit and splits compound words into their components. This is a general-purpose English analyzer for industry-wide use cases.

Supported field types: TEXT, SHORT_TEXT

Availability: Exclusive applications only.

Example: If a field value is dataprocess, the analysis result is data process. The document is retrieved when a user searches for dataprocess, data process, data, or process.

Full pinyin spelling analyzer

Lets users find Chinese text by entering the full pinyin spelling or the first letters of each syllable (abbreviated pinyin). The user must type a complete pinyin syllable — partial syllables do not match.

Supported field types: SHORT_TEXT

Example: If a field value is DaNeiMiTan007, the document is retrieved when a user searches for d, dn, dnm, dnmt, dnmt007, da, danei, daneimi, or daneimitan. Searching for an or anei does not retrieve the document.

Abbreviated pinyin spelling analyzer

Lets users find Chinese text by entering the first letters of each pinyin syllable. Unlike the full pinyin spelling analyzer, partial syllable entries are supported.

Supported field types: SHORT_TEXT

Example: If a field value is DaNeiMiTan007, the document is retrieved when a user searches for d, dn, dnm, dnmt, dnmt0, damt007, m, mt, mt007, or 007.

Simple analyzer

Gives you full control over tokenization. Terms in field values and search queries must be separated by tab characters (\t). Field values and queries must use the same segmentation; otherwise, documents cannot be retrieved.

Supported field types: TEXT, SHORT_TEXT

Example: If a field value is ju\thuacha\thao, the document is retrieved when a user searches for ju, huacha, ju\thuacha, huacha\thao, ju\thao, or ju\thuacha\thao.

Numerical value analyzer

Indexes numeric and timestamp fields for range queries.

Supported field types: INT, TIMESTAMP

Example:

query=default:'kaifang sousuo' AND index:[number1,number2]

In this example, index is the name of the index field configured with the numerical value analyzer.

Geo-location analyzer

Indexes GEO_POINT fields for geographical-location queries such as radius searches.

Supported field types: GEO_POINT

Example:

query=spatial_index:'circle(116.5806 39.99624, 1000)'

This query retrieves documents within a circle whose radius can be several kilometers.

IT content analyzer

An industry-specific analyzer for technical IT content. Compared with the general analyzer for Chinese, it handles IT-specific character sequences differently — for example, keeping c++ as a single token instead of splitting it.

Supported field types: TEXT, SHORT_TEXT

Example:

Input	General analyzer result	IT content analyzer result
`c++array usage notes`	`c` `++` `array usage notes`	`c++` `array usage notes`

General analyzer for E-commerce for Chinese

An industry-specific analyzer for Chinese E-commerce product search. It uses natural language processing (NLP) technology from DAMO Academy to produce finer-grained segmentation than the general analyzer.

Supported field types: TEXT

Availability: Industry-specific Enhanced Edition for E-commerce only.

Example:

Input	General analyzer result	General E-commerce analyzer result
`small gold tube concealer`	`small gold tube concealer`	`small gold tube` `concealer` `paste`

General analyzer for Thai

Segments Thai text into search units for general-purpose Thai full-text search.

Supported field types: TEXT, SHORT_TEXT

Availability: Exclusive applications only.

Example: If a field value is แหล่งดึงดูดนักท่องเที่ยว, the analysis result is แหล่ง ดึง ดูด นักท่องเที่ยว. The document is retrieved when a user searches for นักท่องเที่ยว or แหล่งดึงดูดนักท่องเที่ยว.

Analyzer for E-commerce for Thai

Segments Thai text for E-commerce product search scenarios.

Supported field types: TEXT, SHORT_TEXT

Availability: Exclusive applications only.

Example: If a field value is หน้าจอโทรศัพท์, the analysis result is น้าจอ โทรศัพท์. The document is retrieved when a user searches for หน้าจอโทรศัพท์, หน้าจอ, or โทรศัพท์.

General analyzer for Vietnamese

Segments Vietnamese text for general-purpose Vietnamese full-text search.

Supported field types: TEXT, SHORT_TEXT

Availability: Exclusive applications only.

General analyzer for Gaming

An industry-specific analyzer tuned for gaming content.

Supported field types: TEXT, SHORT_TEXT

Availability: Industry-specific Enhanced Edition for Gaming only.

Example: If a field value is Genshin equipment, the analysis result is Genshin equipment. The document is retrieved when a user searches for Genshin equipment, Genshin, or equipment.

General analyzer for E-commerce for English

An industry-specific analyzer for English E-commerce product search.

Supported field types: TEXT

Availability: Industry-specific Enhanced Edition for E-commerce only.

Character analyzer for Chinese

Segments text into individual Chinese characters, English letters, numbers, and punctuation marks. Use it for character-level non-semantic searches.

Supported field types: TEXT, SHORT_TEXT

Availability: Exclusive applications only.

Example: If a field value is kaifang sousuo OpenSearch123, the document is retrieved when a user searches for any individual character or letter: kai, fang, sou, suo, O, p, e, n, S, e, a, r, c, h, or ..

Custom analyzer for text

Combines an industry-specific analyzer — the general analyzer, E-commerce analyzer, or person name analyzer — with custom intervention entries. For configuration details, see Custom analyzers.

Supported field types: TEXT, SHORT_TEXT

Choose an analyzer

Use the following guidance to select the right analyzer for your use case.

Chinese full-text search

For most scenarios, start with the general analyzer for Chinese or the E-commerce analyzer for Chinese.
When stringent ranking is not required and higher recall matters — such as short text or non-semantic content — use the single character analyzer for Chinese.

Combining analyzers for better ranking

Index the same field with both a semantic and a character-level analyzer to retrieve more documents while ranking semantically relevant results higher. For example:

query=title_index:'juhuacha' OR sws_title_index:'juhuacha'

Fine sort expression:

text_relevance(title)*5+field_proximity(sws_title)

This configuration allows users to retrieve all documents that contain xxjuxxhuaxxchaxx. In addition, documents that contain juhuacha are ranked first.

Pinyin search

Use the fuzzy analyzer for pinyin-based searches.

English full-text search

Use the word stemming analyzer for English to match inflected word forms.

Test an analyzer

To verify how an analyzer segments a specific input, use the Word Analysis Test tool in the OpenSearch console.

Log on to the OpenSearch console.
In the left-side navigation pane, choose Search Algorithm Center > Retrieval Configuration.
On the Basic Configuration page, click Analyzer Management in the left-side pane.
On the Analyzer Management page, find the target analyzer and click Word Analysis Test in the Actions column.

Usage notes

Index field types: The following field types support index configuration: INT, INT_ARRAY, TEXT, SHORT_TEXT, LITERAL, LITERAL_ARRAY, TIMESTAMP, and GEO_POINT. The following types do not: FLOAT, FLOAT_ARRAY, DOUBLE, and DOUBLE_ARRAY.
Search result highlighting: For TEXT fields, terms that belong to extended search units — such as huacha derived from juhuacha — may not be wrapped in HTML highlighting tags in search result summaries.
Single character analyzer and English words: This analyzer treats numbers and English words as single tokens. A search for he does not retrieve a document whose field contains hello. To support partial English-word matching, use the fuzzy analyzer instead.
Primary key index field: The primary key of the primary table is automatically set as an index field named id. This field cannot be modified.