This topic describes the Keyword, FuzzyKeyword, and Text data types, which are the string-related data types in search indexes. It also explains how to choose the type that meets your business requirements.
Background information
The String type is the most common data type in data systems and is used in almost all business scenarios. In traditional relational databases, the String type is classified into various types, such as fixed-length and variable-length strings.
The String type in Tablestore is similar to the String type in programming languages such as C++ and Java. Search indexes divide the String type more finely to support additional features. In search indexes, the String type is classified into three types: the Keyword type, which is similar to the String type in databases; the FuzzyKeyword type, which supports high-performance fuzzy queries; and the Text type, which supports tokenization and is used for full-text search.
This topic describes the relationship between the Keyword, FuzzyKeyword, and Text types and how to select one of the types based on your business requirements.
Data type mappings between tables and search indexes
The String type in tables can be mapped to the Keyword, FuzzyKeyword, or Text type in search indexes. You can also use the virtual column feature to map the Integer or Double type in tables to the Keyword, FuzzyKeyword, or Text type in search indexes.
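Data type in tables | Method | Data type in search indexes |
String | Direct use | Keyword |
String | Direct use | FuzzyKeyword |
String | Direct use | Text |
Integer | Virtual column | Keyword |
Integer | Virtual column | FuzzyKeyword |
Integer | Virtual column | Text |
Double | Virtual column | Keyword |
Double | Virtual column | FuzzyKeyword |
Double | Virtual column | Text |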
Data types
Keyword
The Keyword type in search indexes is the most commonly used String type and is similar to the String type in databases and programming languages such as C++ and Java.
The Keyword type supports term query, range query, wildcard query, prefix query, exists query, sorting, and aggregation operations such as GroupBy. Wildcard queries and prefix queries on Keyword fields become slower as the amount of data increases, which is noticeable in datasets that contain millions of rows.
If you do not need fuzzy queries or full-text search, you can use the Keyword type. If you need features such as sorting and aggregation, you must use the Keyword type.
Text
The Text type supports multiple tokenization methods and is similar to the Text type in search engines. The system tokenizes data of the Text type into tokens based on the specified tokenization method and stores the tokens. The Text type is used in full-text search scenarios.
The tokenization methods supported by the Text type include single-word tokenization, delimiter tokenization, minimum semantic unit-based tokenization, and maximum semantic unit-based tokenization. You can specify a tokenization method based on your business requirements. For more information, see Tokenization.
You must use the Text type in scenarios in which you want to use full-text search.
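The following Python sketch illustrates the general idea behind tokenization and full-text matching described above. It is a simplified illustration and not the tokenizer that Tablestore uses: the function names `delimiter_tokenize`, `build_inverted_index`, and `match_all` are hypothetical, the delimiter is assumed to be a single space, and no SDK calls are involved.

```python
from collections import defaultdict


def delimiter_tokenize(text, delimiter=" "):
    """Split text on a delimiter and drop empty tokens (illustrative only)."""
    return [token for token in text.split(delimiter) if token]


def build_inverted_index(rows, delimiter=" "):
    """Map each token to the set of row IDs whose text contains it."""
    index = defaultdict(set)
    for row_id, text in rows.items():
        for token in delimiter_tokenize(text, delimiter):
            index[token].add(row_id)
    return index


def match_all(index, query, delimiter=" "):
    """Return the IDs of rows that contain every token of the query."""
    tokens = delimiter_tokenize(query, delimiter)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical sample data: row ID -> text value.
rows = {
    1: "tablestore search index",
    2: "search engine basics",
    3: "index maintenance",
}
index = build_inverted_index(rows)
```

With this sample data, `match_all(index, "search index")` returns only row 1, because it is the only row whose text contains both tokens. Storing tokens instead of whole strings is what makes this kind of lookup possible.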
FuzzyKeyword
The FuzzyKeyword type is a String type that supports high-performance wildcard query, prefix query, and suffix query. The query performance of the FuzzyKeyword type does not decline when the amount of data increases.
You must use the FuzzyKeyword type in scenarios in which you want to use millisecond-level wildcard query, prefix query, or suffix query.
If you want to use suffix query, you can either use SuffixQuery to query data of the FuzzyKeyword type, or store the data in reverse order and use PrefixQuery to query data of the Keyword type. Of these two approaches, the FuzzyKeyword type provides better performance than the Keyword type.
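The reverse-and-prefix approach mentioned above can be sketched in plain Python. This is only a minimal illustration of the technique, assuming an in-memory sorted list instead of a search index. The class name `ReversedPrefixIndex` and its methods are hypothetical and are not part of any Tablestore SDK.

```python
import bisect


class ReversedPrefixIndex:
    """Answers suffix queries by running prefix lookups over reversed values."""

    def __init__(self, values):
        # Store each value reversed and sorted, so that all entries sharing
        # a prefix sit next to each other.
        self._reversed = sorted(value[::-1] for value in values)

    def suffix_query(self, suffix):
        """Return all original values that end with the given suffix."""
        prefix = suffix[::-1]
        # Binary search for the first entry that could start with the prefix.
        start = bisect.bisect_left(self._reversed, prefix)
        matches = []
        for candidate in self._reversed[start:]:
            if not candidate.startswith(prefix):
                break
            # Reverse again to recover the original value.
            matches.append(candidate[::-1])
        return matches


# Hypothetical sample data.
files = ReversedPrefixIndex(["report.pdf", "notes.txt", "slides.pdf"])
pdf_files = sorted(files.suffix_query(".pdf"))
```

Here `pdf_files` contains `report.pdf` and `slides.pdf`. The cost of this approach is that the application has to reverse every value when writing and reverse every query string when reading, which SuffixQuery on a FuzzyKeyword field avoids.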
If you want to query a field by using multiple query methods, such as term query, high-performance fuzzy query, and full-text search, you can use the virtual column feature to map the field to three fields of the Keyword, FuzzyKeyword, and Text types in a search index. For more information, see Virtual columns.
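The selection guidance in this section can be summarized as a small decision helper. The following Python sketch only encodes the rules stated above. The function and parameter names are hypothetical, and the helper is not part of any Tablestore SDK.

```python
def choose_index_types(needs_term_sort_or_agg=False,
                       needs_fast_fuzzy=False,
                       needs_full_text=False):
    """Return the search index types suggested by the guidance above."""
    types = []
    if needs_term_sort_or_agg:
        # Term query, sorting, and aggregation call for Keyword.
        types.append("Keyword")
    if needs_fast_fuzzy:
        # Millisecond-level wildcard, prefix, or suffix query.
        types.append("FuzzyKeyword")
    if needs_full_text:
        # Tokenized full-text search.
        types.append("Text")
    if not types:
        # Neither fuzzy query nor full-text search is needed.
        types.append("Keyword")
    return types


def needs_multiple_index_fields(types):
    """More than one type means the field must be indexed more than once,
    for example through virtual columns."""
    return len(types) > 1


# Example: a title field that is sorted on and also searched as full text.
title_types = choose_index_types(needs_term_sort_or_agg=True,
                                 needs_full_text=True)
```

In this example, `title_types` is `["Keyword", "Text"]`, so the field would be mapped to two fields in the search index.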
Differences between the Keyword, Text, and FuzzyKeyword types
The Keyword, Text, and FuzzyKeyword types support different query methods and maximum field lengths. The following table describes the differences.
A tick (✔️) indicates that the feature is supported, and a cross (×) indicates that the feature is not supported.
Item | Keyword | FuzzyKeyword | Text |
 | ✔️ | × | × |
 | ✔️ | × | × |
 | ✔️ | × | × |
 | ✔️ | × | × |
 | ✔️ | × | × |
 | ✔️ | × | × |
 | × | × | ✔️ |
Full-text search: keyword relevance score | × | × | ✔️ |
Full-text search: highlight | × | × | ✔️ |
 | × | × | ✔️ |
 | × | × | ✔️ |
Wildcard query | ✔️ (poor performance) | ✔️ (high performance) | × |
Prefix query | ✔️ (poor performance) | ✔️ (high performance) | × |
Suffix query | × | ✔️ | × |
Maximum field length | 4 KB | 2 KB | 2 MB |
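The maximum field lengths in the table can be checked on the client before data is written. The following Python sketch makes two assumptions that the table does not state: that 1 KB equals 1,024 bytes, and that the length is measured on the UTF-8 encoded value. The names `MAX_FIELD_BYTES`, `fits_type`, and `types_that_fit` are hypothetical.

```python
# Limits taken from the table above.
# Assumption: 1 KB = 1,024 bytes and 1 MB = 1,024 KB.
MAX_FIELD_BYTES = {
    "Keyword": 4 * 1024,
    "FuzzyKeyword": 2 * 1024,
    "Text": 2 * 1024 * 1024,
}


def fits_type(value, index_type):
    """Check whether a string fits the limit of the given type.

    Assumption: the limit applies to the UTF-8 encoded length.
    """
    return len(value.encode("utf-8")) <= MAX_FIELD_BYTES[index_type]


def types_that_fit(value):
    """List the types whose maximum field length can hold the value."""
    return [name for name in MAX_FIELD_BYTES if fits_type(value, name)]


# Example: a 3,000-byte value is too long for FuzzyKeyword.
long_value = "a" * 3000
allowed = types_that_fit(long_value)
```

In this example, `allowed` is `["Keyword", "Text"]`. Multi-byte characters count more than once under the UTF-8 assumption, so a string with fewer characters than the limit can still exceed it.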