To improve the performance of LIKE queries, ApsaraDB for SelectDB supports NGram Bloom filter indexes. The effects of NGram Bloom filter indexes are similar to those of Bloom filter indexes. This topic describes how to create and use NGram Bloom filter indexes in ApsaraDB for SelectDB.
Usage notes
To create an
NGram Bloom filterindex for a column, the column must meet all the following conditions:The column type is STRING or VARCHAR.
An
NGram Bloom filterindex and aBloom filterindex are mutually exclusive for the same column. Make sure that noBloom filterindex is created for the column for which you want to create an NGram Bloom filter index.
To make an
NGram Bloom filterindex valid in a query, the query must meet all the following conditions:The query is a LIKE query.
The number of consecutive characters in the LIKE pattern must be greater than or equal to the value of the
gram_sizeparameter in the index definition.NoteIf you do not specify the
gram_sizeparameter when you create an index, the default value 2 is used.If you specify the
gram_sizeparameter when you create an index, you can execute theSHOW INDEXES FROM <table_name>;statement to query the indexes of the table.
Create NGram Bloom filter indexes
Create indexes when you create a table
This operation is synchronous. If you create indexes when you create a table, the table and indexes are synchronously created.
Syntax
CREATE TABLE [IF NOT EXISTS] [db_name.]<table_name>
(
<column_definition_list>,
[<index_definition_list>]
)
table_properties;Parameters
Parameters that are used to create a table
Parameter | Required | Description |
db_name | No | The name of the database in which you want to create a table. |
table_name | Yes | The name of the table that you want to create. |
column_definition_list | Yes | The list of column definitions. |
table_properties | Yes | The properties of the table, such as the data model and partitioning and bucketing settings. For more information, see Data models. |
index_definition_list | No | The list of index definitions. |
index_definition_list
When you create a table, you can define multiple indexes in the index_definition[, index_definition][, index_definition]... format.
Syntax of index_definition
INDEX <index_name>(<column_name>) <index_type> [PROPERTIES("<key>" = "<value>")] [COMMENT '<comment>']Parameters of index_definition
If you do not specify the gram_size and bf_size parameters for the optional parameter PROPERTIES, the system automatically sets the two parameters to the following default values: "gram_size"="2" and "bf_size"="256". We recommend that you manually specify the two parameters based on your business requirements to deliver the optimal performance of NGram Bloom filter indexes. For more information about how to specify the two parameters, see the description of the optional parameter PROPERTIES.
Required parameters
Parameter | Description |
index_name | The name of the index. We recommend that you add the Important An index name must be unique in a table. |
column_name | The name of the column for which the index is created. Important A column can have only one |
index_type | The type of the index. Set the value to |
Optional parameters
PROPERTIES
The PROPERTIES parameter specifies whether to split a data block based on the index. The value of the PROPERTIES parameter consists of one or more key-value pairs that are separated by commas (,). Each key-value pair is in the format of "<key>" = "<value>".
key | value |
gram_size | The number of consecutive characters based on which a data block is split. For example, if this parameter is set to Note
|
bf_size | The number of bytes of the Unit: byte. This parameter determines the index size of each data block. In most cases, a greater value indicates better filtering performance and a lower probability of hash collisions. However, a larger number of bytes lead to more storage and memory resources consumed by the index. We recommend that you set this parameter to 256 for verification testing and optimal performance. |
COMMENT
Parameter | Description |
comment | The description of the index. |
Example
CREATE TABLE `test_table` (
`siteid` int(11) NULL DEFAULT "10" COMMENT "",
`citycode` smallint(6) NULL COMMENT "",
`username` varchar(32) NULL DEFAULT "" COMMENT "",
`review_body` varchar(320) NULL,
INDEX idx_ngrambf (`review_body`) USING NGRAM_BF
PROPERTIES("gram_size"="3", "bf_size"="256")
COMMENT 'review_body ngram_bf index'
) ENGINE=OLAP
AGGREGATE KEY(`siteid`, `citycode`, `username`, `review_body`) COMMENT "OLAP"
DISTRIBUTED BY HASH(`siteid`) BUCKETS 10;Create indexes for an existing table
This operation is asynchronous. You can execute the SHOW ALTER TABLE COLUMN; statement to query the index creation progress.
Syntax
ALTER TABLE <table_name> ADD INDEX <index_name>(<column_name>) <index_type> [PROPERTIES("<key>" = "<value>")];Parameters
The parameters are the same as those used to create indexes when you create a table.
Example
ALTER TABLE test_table ADD INDEX idx_ngrambf2(username) using NGRAM_BF PROPERTIES("gram_size"="2", "bf_size"="512")comment 'username ngram_bf index' Query all indexes that are created for a table
Syntax
SHOW INDEXES FROM <table_name>;Example
Query all indexes that are created for the test_table table.
SHOW INDEX FROM test_table;Delete an NGram Bloom filter index
This operation is asynchronous. For more information about how to query the index deletion progress, see Query the information about inverted indexes.
If you delete NGram Bloom filter indexes, the query performance decreases. Proceed with caution.
Syntax
ALTER TABLE <table_name> DROP INDEX <index_name>;Example
Delete the idx_ngrambf index from the test_table table.
ALTER TABLE test_table DROP INDEX idx_ngrambf;Query the index change progress
You can execute the ALTER statement to modify an NGram Bloom filter index. This operation is asynchronous. You can execute the following statement to query the index change progress:
SHOW ALTER TABLE COLUMN;FAQ
Q: How do I check whether an NGram Bloom filter index is used in a query?
A: You can check whether an NGram Bloom filter index is used based on a query profile. For more information about how to use query profiles, see Query profile.
Q: What do I do if the following error is reported when I execute the ALTER statement to create NGram Bloom filter indexes for an existing table?
ERROR 1105 (HY000): errCode = 2, detailMessage = NGRAM_BF index for columns (review_body ) already exist.A: You can create multiple NGram Bloom filter indexes for a table. However, these indexes must be configured for different columns. If an NGram Bloom filter index or a Bloom filter index is already created for a column, you cannot create another NGram Bloom filter index for the column.