Search index SQL query functions - Tablestore - Tablestore

Use search index query functions in SQL WHERE clauses to perform full-text search, array, nested, vector, and JSON queries on search index mapping tables.

Supported operations

Note

Before you use search index SQL queries, create a mapping table for the search index. For more information, see DDL operations.

Function	Query type	Description
TEXT_MATCH	Full-text search	Matches rows that contain at least one token from the query text.
TEXT_MATCH_PHRASE	Full-text search	Matches rows in which the tokens appear consecutively in the specified order.
ARRAY_EXTRACT	Array query	Expands an array column for filtering with operators.
NESTED_QUERY	Nested type query	Requires all conditions to be met by a single JSON element.
VECTOR_QUERY_FLOAT32	Vector search	Performs approximate nearest neighbor (ANN) queries.
SCORE()	Vector search	Returns the relevance score of a vector search result.
->>	JSON function	Extracts a value at the specified path and converts it to a string.
JSON_UNQUOTE	JSON function	Removes the outer quotation marks from a JSON value.
JSON_EXTRACT	JSON function	Extracts the sub-document at the specified path.

Full-text search

Match data in Text-type fields with TEXT_MATCH for token matching or TEXT_MATCH_PHRASE for phrase matching.

Note

Before you use full-text search, configure the target column as the Text type in the search index and set a Tokenization. For columns with fuzzy tokenization, use TEXT_MATCH_PHRASE for high-performance fuzzy queries.

TEXT_MATCH (match query)

Tokenizes the query text and matches rows that contain at least one token. Returns a Boolean value: true for a match, false for no match.

TEXT_MATCH(fieldName, text [, options])

Parameter	Type	Description
fieldName	STRING	The name of the column to match. The column must be the Text type in the search index.
text	STRING	The query text. The text is tokenized and then matched against the row data. A row matches if it contains any token. The search index tokenizer determines how the text is split into tokens. If no tokenizer is specified, single-character tokenization is used by default.
options	STRING	Optional match parameters, including operator (the logical operator, which can be OR or AND, default OR) and minimum_should_match (the minimum number of matching tokens, default 1). If operator is OR, a row matches when it contains at least minimum_should_match tokens. If operator is AND, all tokens must be present in the row.

TEXT_MATCH_PHRASE (phrase match query)

Similar to TEXT_MATCH, but requires the tokens to appear consecutively in the same order in the row data. Returns a Boolean value.

TEXT_MATCH_PHRASE(fieldName, text)

The parameters are the same as TEXT_MATCH, but TEXT_MATCH_PHRASE requires the tokens to match consecutively in order. For example, the query text "this is" matches "this is tablestore" but does not match "this table is" or "is this".

Examples

Query data in the content column that contains the "tablestore" token:

SELECT * FROM search_exampletable WHERE TEXT_MATCH(content, 'tablestore') LIMIT 10;

Query data in the content column that contains "sql query" as a consecutive phrase:

SELECT * FROM search_exampletable WHERE TEXT_MATCH_PHRASE(content, 'sql query') LIMIT 10;

Use the options parameter to match data that contains at least 2 tokens:

SELECT * FROM search_exampletable WHERE TEXT_MATCH(content, 'tablestore is cool', 'or', '2') LIMIT 10;

Use the AND operator to require all tokens to be present:

SELECT * FROM search_exampletable WHERE TEXT_MATCH(content, 'tablestore is cool', 'and') LIMIT 10;

Array queries

Use the ARRAY_EXTRACT function to query data in array-type columns. Configure the column as an array type in the search index by enabling the array option in the console or setting IsArray to true in the SDK. When you write data, array values must be in JSON array format, such as ["a","b","c"].

Data type mapping

Data table type	Search index type	SQL type
String	The actual type of array elements (Long, Double, Boolean, Keyword, or Text), with the array property enabled for the column	VARCHAR (primary key) or MEDIUMTEXT (predefined column)

ARRAY_EXTRACT(col_name)

ARRAY_EXTRACT expands an array column and combines with operators as a WHERE clause condition. Supported operators include equality (=), range (>, <), and LIKE.

Important

You cannot use an array column directly with operators as a query condition. Use the ARRAY_EXTRACT function.

Limitations

ARRAY_EXTRACT can be used only on search index mapping tables, and only one array column parameter is allowed per call. The function can be used only as a WHERE clause condition. It cannot be used as a SELECT expression or for aggregation and sorting.
An array column without ARRAY_EXTRACT can be used as a SELECT column name or expression but cannot be used for aggregation and sorting.
When ARRAY_EXTRACT is combined with operators as a query condition, data type conversion is not supported. The query value must match the data type of the array column. For example, a Long-type array column supports ARRAY_EXTRACT(col) = 1 but does not support ARRAY_EXTRACT(col) = '1'.
Text-type array elements must be used with the TEXT_MATCH or TEXT_MATCH_PHRASE function, such as TEXT_MATCH(ARRAY_EXTRACT(col_text), 'keyword').

Examples

-- Query rows that contain the value 'apple' in the array
SELECT * FROM search_exampletable WHERE ARRAY_EXTRACT(col_array) = 'apple';

-- Query rows that contain elements starting with 'd' in the array
SELECT * FROM search_exampletable WHERE ARRAY_EXTRACT(col_array) LIKE 'd%';

Nested type queries

Nested type columns store JSON arrays in which each element contains multiple subcolumns. The column data type in the data table must be String. Configure the column as the Nested type and specify the subcolumn data types when you create the search index.

When you create a mapping table, define nested type columns as MEDIUMTEXT. Internal subcolumns are created automatically and can be viewed with DESCRIBE, such as col_nested.name and col_nested.age. In queries, subcolumn names use the format nested_column.sub_column, with periods (.) separating multiple nesting levels, such as col1.col2.col3.

Data type mapping

Data table type	Search index type	SQL type
String	Nested type. The subcolumn data types match the actual data types of the written data.	VARCHAR (primary key) or MEDIUMTEXT (predefined column)

Query methods

Direct subcolumn query

Use nested subcolumns directly with operators. A row matches if any JSON element in the row has a subcolumn that meets the condition.

SELECT * FROM search_exampletable WHERE `col_nested.age` > 30;

NESTED_QUERY function

Requires all conditions to be met by a single JSON element.

NESTED_QUERY(subcol_column_condition)

subcol_column_condition specifies query conditions on subcolumns at the same nesting level. Combine multiple conditions with AND or OR.

Difference between the two methods

Assume the nested column tags contains the following row data: [{"tagName":"tag1", "score":0.8}, {"tagName":"tag2", "score":0.2}]:

tags.tagName — Matches, because the first element meets the tagName condition and the second element meets the score condition.
NESTED_QUERY( — Does not match, because no single element meets both conditions.

Limitations

NESTED_QUERY can be used only on search index mapping tables and only as a WHERE clause. It cannot be used as a SELECT expression or for aggregation, grouping, or sorting.
Nested subcolumns cannot be used as SELECT expressions or for aggregation, grouping, or sorting.
ALTER TABLE cannot directly add or delete nested subcolumns. You can only add or delete the entire nested column, and the subcolumns are automatically added or deleted with it.
Nested subcolumns do not support data type conversion or function computations that cannot be pushed down to the search index. Make sure the data types of nested subcolumns are correct.

Examples

-- Direct subcolumn query: query rows where age > 30 in the nested column
SELECT * FROM search_exampletable WHERE `col_nested.age` > 30;

-- NESTED_QUERY: query rows where a single element has name starting with 'I' and age < 20
SELECT * FROM search_exampletable WHERE NESTED_QUERY(`col_nested.name` LIKE 'I%' AND `col_nested.age` < 20);

-- Multi-level nesting
SELECT * FROM search_exampletable WHERE NESTED_QUERY(`col1.col2` = 1 AND NESTED_QUERY(`col1.col3.col4` = 2));

Virtual column queries

Search index virtual columns let you query new fields and types by modifying the search index schema, without changing the data table storage structure. Virtual columns are defined in the mapping table with their actual SQL data types.

Data type mapping

Search index virtual column type	SQL type	Description
Keyword	MEDIUMTEXT	Virtual columns have no corresponding column in the data table. Only their source columns have corresponding columns.
Text	MEDIUMTEXT
Long	BIGINT
Double	DOUBLE

Supported usage

Filter data in WHERE clauses. The data type of the virtual column in the condition must match the query parameter type.
Use in aggregation and grouping. The source data type of the virtual column must be compatible with the operation. For example, only Long and Double types support SUM. Keyword-type virtual columns cannot be summed, and Text-type virtual columns do not support grouping.
TopN queries and sorting are supported. Sorting requires LIMIT.

Limitations

Virtual columns can be used only in search index mapping tables.
Virtual columns can be used only in query conditions. They cannot be used in SELECT to return column values. To return values, specify the source column of the virtual column. SELECT * is not affected and automatically excludes virtual columns from the results.
Virtual columns cannot be used for column comparisons, computations, or JOINs.
Virtual columns do not support data type conversion or function computations that cannot be pushed down to the search index. Currently, only aggregate functions can be pushed down in SQL queries.

Examples

Create a search index mapping table that includes virtual columns:

CREATE TABLE search_exampletable(
    col_keyword MEDIUMTEXT,
    col_keyword_virtual_long BIGINT
)
ENGINE='searchindex',
ENGINE_ATTRIBUTE='{"index_name":"exampletable_index","table_name":"exampletable"}';

Query with virtual columns:

SELECT * FROM search_exampletable WHERE col_keyword_virtual_long > 100 LIMIT 10;

Vector search

Use the VECTOR_QUERY_FLOAT32 function for approximate nearest neighbor (ANN) queries. Vector fields are stored as strings in the data table. Configure them as the vector type in the search index, and specify the dimensions, data type, and distance metric. The SQL data type of vector columns in the mapping table is MEDIUMTEXT.

VECTOR_QUERY_FLOAT32

VECTOR_QUERY_FLOAT32(fieldName, float32QueryVector, topK, filter)

Parameter	Required	Description
fieldName	Yes	The name of the vector column. The column must be the vector type in the search index.
float32QueryVector	Yes	The query vector. The dimensions must match those of the vector field in the search index.
topK	Yes	The number of nearest results to return. A larger K value improves recall but increases query latency and cost. If topK is less than the LIMIT value, the server automatically increases topK to match LIMIT. For the maximum topK value, see Search index limits.
filter	No	A query filter that supports any combination of non-vector query conditions. The filter conditions are applied before vector search to narrow down the candidate set for more accurate results. You can also add filter conditions in the WHERE clause with AND, but those conditions filter the topK results after vector search.

SCORE() function

Use SCORE() with VECTOR_QUERY_FLOAT32 as a SELECT expression to return the relevance score of each result. A higher score indicates greater similarity.

SCORE()

Limitations

VECTOR_QUERY_FLOAT32 can be used only on search index mapping tables and must be used with LIMIT. HAVING clauses are not supported.
VECTOR_QUERY_FLOAT32 can be used only as a WHERE clause. It cannot be used as a SELECT expression or for aggregation, grouping, or sorting.
SCORE() can be used only with VECTOR_QUERY_FLOAT32 and only as a SELECT expression. It cannot be used in WHERE clauses, aggregation, or sorting.
Other conditions in the WHERE clause must support pushdown to the search index. Otherwise, the query fails. For supported push-down operators, see Query optimization.

Examples

Query the 10 most similar results to the specified vector in the col_vector column:

SELECT *, SCORE() FROM exampletable WHERE VECTOR_QUERY_FLOAT32(col_vector, '[1.5, -1.5, 2.5, -2.5]', 10) LIMIT 10;

Use filter to narrow the candidate set before vector search for more accurate results:

SELECT *, SCORE() FROM exampletable WHERE VECTOR_QUERY_FLOAT32(col_vector, '[1.5, -1.5, 2.5, -2.5]', 100, col_keyword='cat_a' AND year_num=2024) LIMIT 10;

Use AND in the WHERE clause for post-vector-search filtering. The topK results may not include all matching rows:

SELECT *, SCORE() FROM exampletable WHERE col_keyword='cat_a' AND VECTOR_QUERY_FLOAT32(col_vector, '[1.5, -1.5, 2.5, -2.5]', 500) LIMIT 10;

JSON functions

Tablestore SQL JSON functions follow the MySQL 5.7 syntax and extract data from JSON-format columns.

Function	Syntax	Description
->>	`col->>'$.path'`	Extracts the value at the specified path and converts it to a string. Equivalent to `JSON_UNQUOTE(JSON_EXTRACT())`.
JSON_UNQUOTE	`JSON_UNQUOTE(json_val)`	Removes the outer quotation marks from a JSON value and returns a string.
JSON_EXTRACT	`JSON_EXTRACT(json_doc, path[, path] ...)`	Extracts the sub-document at the specified path. The return value retains the JSON format.

->> (JSON path extraction)

Extracts the value at the specified path from a JSON column and unquotes it to a string. Equivalent to JSON_UNQUOTE(JSON_EXTRACT()).

column->>'$.path'

Parameter	Type	Description
column	STRING	The column name.
path	STRING	A JSON path expression that must start with `$`. For the detailed syntax, see JSON Path syntax below.

Example

SELECT col_json->>'$.city' AS city FROM exampletable LIMIT 10;

JSON_UNQUOTE

Removes the outer quotation marks from a JSON value and returns a string. Returns NULL if the argument is NULL.

JSON_UNQUOTE(json_val)

Parameter	Type	Description
json_val	STRING	A JSON value, typically the return value of JSON_EXTRACT. An error is returned if the value starts and ends with double quotation marks but is not a valid JSON string literal.

Example

SELECT JSON_UNQUOTE(JSON_EXTRACT(col_json, '$.city')) AS city FROM exampletable LIMIT 10;

JSON_EXTRACT

Extracts the sub-document at the specified path from a JSON column. The return value retains the JSON format, with string values wrapped in quotation marks. Multiple paths can be specified, and the results are returned in array format.

Important

Tablestore does not support native JSON types. JSON_EXTRACT cannot be used alone and returns an invalid column type: json error. Use JSON_EXTRACT with JSON_UNQUOTE.

JSON_EXTRACT(json_doc, path[, path] ...)

Parameter	Type	Description
json_doc	STRING	A JSON document. An error is returned if the value is not a valid JSON document.
path	STRING	A JSON path expression that must start with `$`. Multiple paths are supported. Returns NULL if any argument is NULL or if the path does not exist in the document. An error is returned if the path is not a valid path expression.

Examples

Extract a single path:

SELECT JSON_UNQUOTE(JSON_EXTRACT(col_json, '$.city')) AS city FROM exampletable WHERE pk = 1;

Extract multiple paths. The results are returned in array format:

-- Assume col_json contains {"a": 1, "b": 2, "c": {"d": 4}}
SELECT JSON_UNQUOTE(JSON_EXTRACT(col_json, '$.a', '$.b', '$.c.d')) AS subdoc FROM exampletable WHERE pk = 1;
-- Result: [1, 2, 4]

JSON Path syntax

Paths must start with $, which represents the entire JSON document. Append path selectors after $. Selectors can be combined.

Selector	Example	Description
$.key	$.a, $.c.d	Accesses an object member. Enclose keys that contain spaces in double quotation marks, such as `$."a fish"`.
[N]	$[0], $.f[1]	Accesses an array element. Indexes start from 0.
.*	$.*	Object wildcard. Returns the values of all members.
[*]	$.arr[*]	Array wildcard. Returns the values of all elements.
prefix**suffix	$**.d	Path wildcard. Matches all paths that start with prefix and end with suffix.

JSON object query examples

Assume the JSON column contains {"a": 1, "f": [1, 2, 3], "c": {"d": 4}}:

Path	Return value	Description
$	{"a": 1, "c": {"d": 4}, "f": [1, 2, 3]}	The entire document
$.a	1	A direct member
$.c	{"d": 4}	A nested object
$.c.d	4	A nested object member
$.f[1]	2	An array element

JSON array query examples

Assume the JSON column contains [3, {"a": [5, 6], "b": 10}, [99, 100]]. Non-scalar return values support nested queries.

Path	Return value	Description
$[0]	3	A scalar element
$[1]	{"a": [5, 6], "b": 10}	A non-scalar value. Nested queries can continue.
$[1].a	[5, 6]	A nested object member
$[1].a[1]	6	A nested array element
$[1].b	10	A nested object member
$[2][0]	99	A nested array element
$[3]	NULL	Out of bounds. Returns NULL.