
Tablestore: String types

Last Updated: Dec 25, 2024

This topic describes the Keyword, FuzzyKeyword, and Text types, which are the string-related data types in search indexes, and explains how to choose the type that meets your business requirements.

Background information

The String type is the most common data type in data systems and is used in almost all business scenarios. Traditional relational databases divide strings into several types, such as fixed-length and variable-length strings.

The String type in Tablestore is similar to the String type in programming languages such as C++ and Java. To support more features, search indexes divide strings into three finer-grained types: the Keyword type, which is similar to the String type in databases; the FuzzyKeyword type, which is designed for fuzzy queries; and the Text type, which supports tokenization and is used for full-text search.

This topic describes the relationship between the Keyword, FuzzyKeyword, and Text types and how to select one of the types based on your business requirements.

Data type mappings between tables and search indexes

The String type in tables can be mapped to the Keyword, FuzzyKeyword, or Text type in search indexes. You can also use the virtual column feature to map the Integer or Double type in tables to the Keyword, FuzzyKeyword, or Text type in search indexes. The following table describes the mappings.

Data type in tables    Method            Data type in search indexes
String                 Direct use        Keyword, FuzzyKeyword, or Text
Integer                Virtual column    Keyword, FuzzyKeyword, or Text
Double                 Virtual column    Keyword, FuzzyKeyword, or Text

Data types

Keyword

The Keyword type is the most commonly used string type in search indexes. It is similar to the String type in databases and in programming languages such as C++ and Java. A Keyword value is indexed as a whole and is not tokenized.

The Keyword type supports term queries, range queries, wildcard queries, prefix queries, exists queries, sorting, and aggregation operations such as GroupBy. However, when you run wildcard or prefix queries on a Keyword field in a dataset that contains millions of rows, query performance declines as the amount of data increases.

If you do not need fuzzy queries or full-text search, use the Keyword type. If you need sorting or aggregation, you must use the Keyword type, because the FuzzyKeyword and Text types do not support these operations.
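The following Python sketch is a toy model, not Tablestore's implementation, that illustrates why term queries and wildcard queries behave differently on a Keyword field. Each whole value maps to the rows that contain it, so a term query is a single lookup, while a wildcard query in this model has to test every distinct value. The class ToyKeywordIndex and the helper wildcard_to_regex are hypothetical.

```python
import re


def wildcard_to_regex(pattern):
    # Translate "*" (any run of characters) and "?" (one character); escape the rest.
    return re.compile("".join(
        ".*" if c == "*" else "." if c == "?" else re.escape(c) for c in pattern
    ), re.DOTALL)


class ToyKeywordIndex:
    """Hypothetical model of a Keyword field: whole values, no tokenization."""

    def __init__(self):
        self.postings = {}  # exact value -> set of row IDs

    def add(self, row_id, value):
        self.postings.setdefault(value, set()).add(row_id)

    def term_query(self, value):
        # Exact match: one dictionary lookup, however many values exist.
        return sorted(self.postings.get(value, set()))

    def wildcard_query(self, pattern):
        # No shortcut in this model: every distinct value is tested,
        # so the cost grows with the number of distinct values.
        regex = wildcard_to_regex(pattern)
        hits = set()
        for value, rows in self.postings.items():
            if regex.fullmatch(value):
                hits |= rows
        return sorted(hits)


idx = ToyKeywordIndex()
idx.add(1, "apple")
idx.add(2, "apricot")
idx.add(3, "banana")
print(idx.term_query("apple"))       # [1]
print(idx.wildcard_query("ap*"))     # [1, 2]
print(idx.wildcard_query("*an?na"))  # [3]
```

Because the wildcard scan visits every distinct value, its cost in this model grows with the data, which mirrors the performance behavior described above.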

Text

The Text type supports multiple tokenization methods and is similar to the Text type in search engines. The system tokenizes data of the Text type into tokens based on the specified tokenization method and stores the tokens. The Text type is used in full-text search scenarios.

The tokenization methods supported by the Text type include single-word tokenization, delimiter tokenization, minimum semantic unit-based tokenization, and maximum semantic unit-based tokenization. You can specify a tokenization method based on your business requirements. For more information, see Tokenization.
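To make the difference between two of these methods concrete, the following Python sketch approximates single-word tokenization and delimiter tokenization. It is a simplification under stated assumptions, not Tablestore's tokenizer: the single-word function keeps runs of ASCII letters and digits as one token, splits CJK characters into one token each, lowercases text unless case_sensitive is set, and ignores all other characters. Both function names are hypothetical. The semantic unit-based methods depend on a dictionary and are not sketched here.

```python
import re

# Assumption: runs of ASCII letters/digits, or a single CJK Unified Ideograph.
_SINGLE_WORD = re.compile(r"[A-Za-z0-9]+|[\u4e00-\u9fff]")


def single_word_tokenize(text, case_sensitive=False):
    """Approximation of single-word tokenization (hypothetical helper)."""
    if not case_sensitive:
        text = text.lower()
    return _SINGLE_WORD.findall(text)


def delimiter_tokenize(text, delimiter=","):
    """Split on a caller-chosen delimiter and drop empty tokens (hypothetical helper)."""
    return [token for token in text.split(delimiter) if token]


print(single_word_tokenize("Tablestore 搜索 Index"))  # ['tablestore', '搜', '索', 'index']
print(delimiter_tokenize("red,green,,blue"))          # ['red', 'green', 'blue']
```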

If you want to use full-text search, you must use the Text type.
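Full-text search operates on the stored tokens rather than on whole values. The following Python sketch shows, in simplified form, how a match query differs from a match phrase query. In this sketch, a match query succeeds when a document shares any token with the query, and a match phrase query requires all query tokens to appear consecutively and in order. These semantics, the stand-in tokenizer, and the function names are assumptions for illustration; relevance scoring is not modeled.

```python
import re


def tokenize(text):
    # Stand-in tokenizer for this sketch: lowercase ASCII words and digits only.
    return re.findall(r"[a-z0-9]+", text.lower())


def match_query(doc, query):
    """Simplified match query: true if the document shares any token with the query."""
    return bool(set(tokenize(doc)) & set(tokenize(query)))


def match_phrase_query(doc, query):
    """Simplified match phrase query: query tokens must appear consecutively, in order."""
    d, q = tokenize(doc), tokenize(query)
    if not q:
        return False
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


doc = "Tablestore search indexes support full-text search"
print(match_query(doc, "search engine"))            # True
print(match_phrase_query(doc, "full text search"))  # True
print(match_phrase_query(doc, "search full"))       # False
```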

FuzzyKeyword

The FuzzyKeyword type is a string type that supports high-performance wildcard, prefix, and suffix queries. Unlike the Keyword type, its query performance does not decline as the amount of data increases.

If you need millisecond-level wildcard, prefix, or suffix queries, you must use the FuzzyKeyword type.
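This topic does not describe how FuzzyKeyword is implemented. One well-known technique that keeps wildcard, prefix, and suffix queries fast as data grows is an n-gram index, sketched below in Python as an assumption for illustration only, not as Tablestore's design. Each value is indexed by its three-character substrings (trigrams), with sentinel characters marking the start and end so that prefixes and suffixes become ordinary trigrams. A query first intersects the row sets of the trigrams it requires and then verifies only the surviving candidates. ToyNgramIndex and its methods are hypothetical.

```python
import re

START, END = "\x02", "\x03"  # sentinels marking the start and end of a value


def trigrams(s):
    return {s[i:i + 3] for i in range(len(s) - 2)}


def wildcard_to_regex(pattern):
    # "*" matches any run of characters, "?" matches one; everything else is literal.
    return re.compile("".join(
        ".*" if c == "*" else "." if c == "?" else re.escape(c) for c in pattern
    ), re.DOTALL)


class ToyNgramIndex:
    """Hypothetical trigram index; illustrates a technique, not Tablestore internals."""

    def __init__(self):
        self.values = {}    # row ID -> original value
        self.postings = {}  # trigram -> set of row IDs

    def add(self, row_id, value):
        self.values[row_id] = value
        for gram in trigrams(START + value + END):
            self.postings.setdefault(gram, set()).add(row_id)

    def wildcard_query(self, pattern):
        # Literal pieces between wildcards; the sentinels anchor non-wildcard ends.
        pieces = re.split(r"[*?]", START + pattern + END)
        needed = set().union(*(trigrams(p) for p in pieces))
        if needed:
            # A match must contain every required trigram.
            candidates = set.intersection(
                *(self.postings.get(g, set()) for g in needed))
        else:
            candidates = set(self.values)  # pattern too short to filter on
        # Trigram filtering can admit false positives, so verify each candidate.
        regex = wildcard_to_regex(pattern)
        return sorted(r for r in candidates if regex.fullmatch(self.values[r]))

    def prefix_query(self, prefix):
        return self.wildcard_query(prefix + "*")  # toy: prefix has no wildcards

    def suffix_query(self, suffix):
        return self.wildcard_query("*" + suffix)  # toy: suffix has no wildcards


idx = ToyNgramIndex()
idx.add(1, "order-2024-001")
idx.add(2, "order-2023-777")
idx.add(3, "refund-2024-001")
print(idx.prefix_query("order-"))    # [1, 2]
print(idx.suffix_query("-001"))      # [1, 3]
print(idx.wildcard_query("*2024*"))  # [1, 3]
```

The trade-off in this technique is extra index space: every value is stored once per trigram, in exchange for queries that touch only candidate rows instead of scanning every value.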

To run suffix queries, you can use SuffixQuery on a FuzzyKeyword field, or you can store the reversed string in a Keyword field and use PrefixQuery on that field. The FuzzyKeyword approach provides better performance.
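The reversal approach can be sketched in Python as follows. The application writes the reversed value into an additional Keyword column, and a suffix query becomes a prefix query on that column. Here, str.startswith stands in for a PrefixQuery, and reverse_value and suffix_via_reversed_prefix are hypothetical helpers.

```python
def reverse_value(value):
    """Value to write into the extra, reversed Keyword column (hypothetical)."""
    return value[::-1]


def suffix_via_reversed_prefix(reversed_column, suffix):
    """Emulate a suffix query with a prefix match on reversed values.

    reversed_column maps row ID -> reversed string; startswith stands in
    for a PrefixQuery on the reversed Keyword field.
    """
    target = suffix[::-1]
    return sorted(row for row, rev in reversed_column.items() if rev.startswith(target))


rows = {1: "report.pdf", 2: "image.png", 3: "notes.pdf"}
reversed_column = {row: reverse_value(v) for row, v in rows.items()}
print(suffix_via_reversed_prefix(reversed_column, ".pdf"))  # [1, 3]
```

This approach needs an extra column that must be kept in sync on every write, and the prefix query on a Keyword field is still subject to the performance limits described for the Keyword type.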

Note

If you want to query a field in multiple ways, such as by term query, high-performance fuzzy query, and full-text search, you can use the virtual column feature to map the field to three search index fields of the Keyword, FuzzyKeyword, and Text types. For more information, see Virtual columns.
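As an illustration only, the following Python sketch describes such a mapping with a hypothetical IndexField class (not a Tablestore SDK class) and checks each field against the mapping rules in this topic: a String column can be used directly or through a virtual column, whereas an Integer or Double column must go through a virtual column. Field names such as title_kw are also hypothetical.

```python
from dataclasses import dataclass

STRING_INDEX_TYPES = {"Keyword", "FuzzyKeyword", "Text"}


@dataclass
class IndexField:
    """Hypothetical description of one search index field (not an SDK class)."""
    name: str
    index_type: str      # "Keyword", "FuzzyKeyword", or "Text"
    source_column: str   # column in the data table
    source_type: str     # "String", "Integer", or "Double"
    is_virtual: bool = False


def is_valid_mapping(field):
    """Apply the table-to-index mapping rules described in this topic."""
    if field.index_type not in STRING_INDEX_TYPES:
        return False
    if field.source_type == "String":
        return True                # direct use or virtual column
    if field.source_type in ("Integer", "Double"):
        return field.is_virtual    # numbers need a virtual column
    return False


# One String column "title" exposed as three virtual fields of different types.
title_fields = [
    IndexField("title_kw", "Keyword", "title", "String", is_virtual=True),
    IndexField("title_fuzzy", "FuzzyKeyword", "title", "String", is_virtual=True),
    IndexField("title_text", "Text", "title", "String", is_virtual=True),
]
print(all(is_valid_mapping(f) for f in title_fields))                       # True
print(is_valid_mapping(IndexField("price", "Keyword", "price", "Double")))  # False
```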

Differences between the Keyword, Text, and FuzzyKeyword types

The Keyword, Text, and FuzzyKeyword types support different query methods and maximum field lengths. The following table describes the differences.

Note

A tick (✔️) indicates that the feature is supported, and a cross (×) indicates that the feature is not supported.

Item                                         Keyword                FuzzyKeyword           Text
Term query                                   ✔️                     ×                      ×
Terms query                                  ✔️                     ×                      ×
Range query                                  ✔️                     ×                      ×
Exists query                                 ✔️                     ×                      ×
Sorting                                      ✔️                     ×                      ×
Aggregation                                  ✔️                     ×                      ×
Tokenization                                 ×                      ×                      ✔️
Full-text search: keyword relevance score    ×                      ×                      ✔️
Full-text search: highlight                  ×                      ×                      ✔️
Match query                                  ×                      ×                      ✔️
Match phrase query                           ×                      ×                      ✔️
Wildcard query                               ✔️ (poor performance)  ✔️ (high performance)  ×
Prefix query                                 ✔️ (poor performance)  ✔️ (high performance)  ×
Suffix query                                 ×                      ✔️                     ×
Maximum field length                         4 KB                   2 KB                   2 MB
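The comparison table can also be read as a lookup that narrows down the candidate types for a field. The following Python sketch encodes the table rows and returns the types that support every required feature and can hold a sample value. Feature names such as "sort" and the helper candidate_types are hypothetical, and converting the length limits to bytes assumes that 1 KB is 1,024 bytes measured on the UTF-8 encoding of the value. The sketch ignores the performance notes, for example that wildcard and prefix queries perform poorly on Keyword fields.

```python
# Capabilities per type, transcribed from the comparison table.
SUPPORT = {
    "Keyword": {"term", "terms", "range", "exists", "sort", "aggregation",
                "wildcard", "prefix"},
    "FuzzyKeyword": {"wildcard", "prefix", "suffix"},
    "Text": {"tokenization", "relevance_score", "highlight", "match",
             "match_phrase"},
}

# Assumption: 1 KB = 1024 bytes, measured on the UTF-8 encoding of the value.
MAX_BYTES = {"Keyword": 4 * 1024, "FuzzyKeyword": 2 * 1024, "Text": 2 * 1024 * 1024}


def candidate_types(required, sample_value=""):
    """Return the types that support all required features and fit the value."""
    size = len(sample_value.encode("utf-8"))
    return sorted(t for t, features in SUPPORT.items()
                  if set(required) <= features and size <= MAX_BYTES[t])


print(candidate_types({"sort", "range"}))        # ['Keyword']
print(candidate_types({"prefix"}))               # ['FuzzyKeyword', 'Keyword']
print(candidate_types({"prefix"}, "x" * 3000))   # ['Keyword']
print(candidate_types({"match", "sort"}))        # []
```

An empty result means that no single type covers all the requirements. In that case, you can map the column to several search index fields by using virtual columns.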