English text analyzers

Last Updated: Aug 27, 2024

English word stemming analyzer

Overview

The English word stemming analyzer (eng_standard) reduces each English word to its root form (stem). This analyzer is suitable for searches based on English semantics, in which different inflected forms of a word should match one another.

Example: If a field in a document contains "英文分词器 english analyzer" (英文分词器 is Chinese for "English analyzer") and the English word stemming analyzer is specified, the document can be retrieved by searches for "英文分词器", "english", "analyz", "analyzer", "analyzers", "analyze", "analyzed", or "analyzing", because these forms of "analyzer" are all reduced to the same stem.
Note that English text analyzers treat a run of consecutive Chinese characters as a single term.
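To make the matching behavior in the example concrete, the following Python sketch indexes the example field value and tests the listed queries. It is an illustration only, not the eng_standard implementation: the suffix list (SUFFIXES), the helper names (tokenize, toy_stem, analyze, matches), and the rule that a query matches when all of its analyzed terms appear in the document are hypothetical assumptions. Real stemmers, such as the Porter stemmer, apply far more rules.

```python
import re

# Hypothetical suffix rules for illustration only; a real stemmer such as
# the Porter stemmer applies many more context-sensitive rules.
SUFFIXES = ("ers", "ing", "er", "ed", "es", "e", "s")

def tokenize(text):
    """Return runs of ASCII letters/digits and runs of CJK ideographs as tokens."""
    return re.findall(r"[A-Za-z0-9]+|[\u4e00-\u9fff]+", text)

def toy_stem(token):
    """Strip the first matching suffix, keeping a stem of at least 3 characters."""
    if not token.isascii():
        return token  # a run of Chinese characters stays one unstemmed term
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def analyze(text):
    """Apply the same tokenize-then-stem pipeline to documents and queries."""
    return [toy_stem(t) for t in tokenize(text)]

def matches(document, query):
    """Assumed rule: every analyzed query term must be an analyzed document term."""
    index = set(analyze(document))
    return all(term in index for term in analyze(query))

doc = "英文分词器 english analyzer"
for q in ["英文分词器", "english", "analyz", "analyzer", "analyzers",
          "analyze", "analyzed", "analyzing"]:
    print(q, matches(doc, q))  # each line prints True
```

Because the query passes through the same stemming step as the document, "analyzing" and "analyzers" both become "analyz" and match the indexed stem of "analyzer".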

Usage notes

  • This analyzer applies only to fields of the TEXT data type. To use it, specify eng_standard as the analyzer of the field when you configure the schema.

Unstemmed English word analyzer

Overview

The unstemmed English word analyzer (eng_nostem) splits text into terms at spaces and punctuation marks and does not stem the terms. This analyzer is suitable for searches that are not based on English semantics and require exact word matches, such as searches for book titles or author names.

Example: If a field in a document contains "英文分词器 english analyzer" and the unstemmed English word analyzer is specified, the document can be retrieved by searches for "英文分词器", "english", or "analyzer".
Note that English text analyzers treat a run of consecutive Chinese characters as a single term.
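The contrast with the stemming analyzer can be sketched in Python as follows. This is a hypothetical illustration, not the eng_nostem implementation: the helper names (nostem_analyze, matches) and the rule that a query matches when all of its terms appear in the document are assumptions. Text is split at whitespace and punctuation, and terms are kept exactly as written.

```python
import re

def nostem_analyze(text):
    """Split at whitespace and punctuation; terms are kept exactly as written."""
    # In Python 3, \W matches any non-word character. Chinese characters count
    # as word characters, so a run of them stays together as one term.
    return [t for t in re.split(r"[\W_]+", text) if t]

def matches(document, query):
    """Assumed rule: every query term must appear among the document terms."""
    index = set(nostem_analyze(document))
    return all(term in index for term in nostem_analyze(query))

doc = "英文分词器 english analyzer"
print(nostem_analyze(doc))        # ['英文分词器', 'english', 'analyzer']
print(matches(doc, "analyzer"))   # True
print(matches(doc, "analyzers"))  # False: without stemming, other forms do not match
```

Unlike the stemming sketch, "analyzers" finds nothing here, which is the desired behavior for exact lookups such as titles or names.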

Usage notes

  • This analyzer applies only to fields of the TEXT data type. To use it, specify eng_nostem as the analyzer of the field when you configure the schema.

English minimum-granularity analyzer

Overview

The English minimum-granularity analyzer (en_min) splits English text into the smallest meaningful terms (search units) based on English semantics. It can also split a string of words that are written together without spaces. This analyzer is suitable for English text in any industry.

Example: If the value of a field is "dataprocess" in a document and the English minimum-granularity analyzer is specified, the analysis result is "data process". In this case, the document can be retrieved when a user searches for "dataprocess", "data process", "data", or "process".
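The following Python sketch shows one common way to split run-together words: a dictionary-based segmentation that uses dynamic programming to choose the fewest vocabulary words. It is a stand-in technique under stated assumptions, not how en_min builds its search units. The vocabulary (VOCAB), the helper names, and the rule that queries go through the same analysis as documents are hypothetical.

```python
# Hypothetical vocabulary; the real analyzer's dictionary is not documented here.
VOCAB = {"data", "process"}

def segment(word, vocab=VOCAB):
    """Split word into the fewest vocabulary words, or keep it whole if impossible."""
    n = len(word)
    best = [None] * (n + 1)  # best[i]: shortest word list covering word[:i]
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            piece = word[start:end]
            if best[start] is not None and piece in vocab:
                candidate = best[start] + [piece]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[n] if best[n] is not None else [word]

def en_min_analyze(text):
    """Split at whitespace, then segment each chunk into vocabulary words."""
    terms = []
    for chunk in text.split():
        terms.extend(segment(chunk))
    return terms

def matches(document, query):
    """Assumed rule: every analyzed query term must be an analyzed document term."""
    index = set(en_min_analyze(document))
    return all(term in index for term in en_min_analyze(query))

doc = "dataprocess"
print(en_min_analyze(doc))  # ['data', 'process']
for q in ["dataprocess", "data process", "data", "process"]:
    print(q, matches(doc, q))  # each line prints True
```

In this sketch, the query "dataprocess" matches because it is segmented into the same terms as the document, which is one plausible reading of why the example lists it as a successful search.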

Usage notes

  • This analyzer applies only to fields of the TEXT data type. To use it, specify en_min as the analyzer of the field when you configure the schema.