All Products
Search
Document Center

ApsaraDB for SelectDB:NGram Bloom filter indexes

Last Updated:Jan 26, 2025

To improve the performance of LIKE queries, ApsaraDB for SelectDB supports NGram Bloom filter indexes. The effects of NGram Bloom filter indexes are similar to those of Bloom filter indexes. This topic describes how to create and use NGram Bloom filter indexes in ApsaraDB for SelectDB.

Usage notes

  • To create an NGram Bloom filter index for a column, the column must meet all the following conditions:

    • The column type is STRING or VARCHAR.

    • An NGram Bloom filter index and a Bloom filter index are mutually exclusive for the same column. Make sure that no Bloom filter index is created for the column for which you want to create an NGram Bloom filter index.

  • To make an NGram Bloom filter index valid in a query, the query must meet all the following conditions:

    • The query is a LIKE query.

    • The number of consecutive characters in the LIKE pattern must be greater than or equal to the value of the gram_size parameter in the index definition.

      Note
      • If you do not specify the gram_size parameter when you create an index, the default value 2 is used.

      • If you specify the gram_size parameter when you create an index, you can execute the SHOW INDEXES FROM <table_name>; statement to query the indexes of the table.

Create NGram Bloom filter indexes

Create indexes when you create a table

This operation is synchronous. If you create indexes when you create a table, the table and indexes are synchronously created.

Syntax

CREATE TABLE  [IF NOT EXISTS] [db_name.]<table_name>
(
  <column_definition_list>,
  [<index_definition_list>] 
)
table_properties;

Parameters

Parameters that are used to create a table

Parameter

Required

Description

db_name

No

The name of the database in which you want to create a table.

table_name

Yes

The name of the table that you want to create.

column_definition_list

Yes

The list of column definitions.

table_properties

Yes

The properties of the table, such as the data model and partitioning and bucketing settings. For more information, see Data models.

index_definition_list

No

The list of index definitions.

index_definition_list

When you create a table, you can define multiple indexes in the index_definition[, index_definition][, index_definition]... format.

Syntax of index_definition

INDEX <index_name>(<column_name>) <index_type> [PROPERTIES("<key>" = "<value>")] [COMMENT '<comment>']

Parameters of index_definition

Important

If you do not specify the gram_size and bf_size parameters for the optional parameter PROPERTIES, the system automatically sets the two parameters to the following default values: "gram_size"="2" and "bf_size"="256". We recommend that you manually specify the two parameters based on your business requirements to deliver the optimal performance of NGram Bloom filter indexes. For more information about how to specify the two parameters, see the description of the optional parameter PROPERTIES.

Required parameters

Parameter

Description

index_name

The name of the index.

We recommend that you add the idx_ prefix to a column name to generate an index name.

Important

An index name must be unique in a table.

column_name

The name of the column for which the index is created.

Important

A column can have only one NGram Bloom filter index or Bloom filter index.

index_type

The type of the index. Set the value to USING NGRAM_BF, which specifies that the index type is NGram Bloom filter index.

Optional parameters
PROPERTIES

The PROPERTIES parameter specifies whether to split a data block based on the index. The value of the PROPERTIES parameter consists of one or more key-value pairs that are separated by commas (,). Each key-value pair is in the format of "<key>" = "<value>".

key

value

gram_size

The number of consecutive characters based on which a data block is split.

For example, if this parameter is set to 3, an ngram can be split into 'an ', 'n n', ' ng', 'ngr', 'gra', and 'ram'.

Note
  • The value of this parameter varies based on the actual query scenario. We recommend that you use the minimum length of a string in the LIKE pattern, which is not less than 2.

  • If the value of this parameter is small, you can increase the number of bytes of the Bloom filter by adjusting the value of the bf_size parameter. This optimizes the performance of the NGram Bloom filter index. We recommend that you set the gram_size parameter to 3 and the bf_size parameter to 1024. For more information about how to use query profiles for performance optimization, see Query profile.

bf_size

The number of bytes of the Bloom filter.

Unit: byte.

This parameter determines the index size of each data block. In most cases, a greater value indicates better filtering performance and a lower probability of hash collisions. However, a larger number of bytes lead to more storage and memory resources consumed by the index. We recommend that you set this parameter to 256 for verification testing and optimal performance.

COMMENT

Parameter

Description

comment

The description of the index.

Example

CREATE TABLE `test_table` (
  `siteid` int(11) NULL DEFAULT "10" COMMENT "",
  `citycode` smallint(6) NULL COMMENT "",
  `username` varchar(32) NULL DEFAULT "" COMMENT "",
  `review_body` varchar(320) NULL,
  INDEX idx_ngrambf (`review_body`) USING NGRAM_BF 
  PROPERTIES("gram_size"="3", "bf_size"="256") 
  COMMENT 'review_body ngram_bf index'
) ENGINE=OLAP
AGGREGATE KEY(`siteid`, `citycode`, `username`, `review_body`) COMMENT "OLAP"
DISTRIBUTED BY HASH(`siteid`) BUCKETS 10;

Create indexes for an existing table

This operation is asynchronous. You can execute the SHOW ALTER TABLE COLUMN; statement to query the index creation progress.

Syntax

ALTER TABLE <table_name> ADD INDEX <index_name>(<column_name>) <index_type> [PROPERTIES("<key>" = "<value>")];

Parameters

The parameters are the same as those used to create indexes when you create a table.

Example

ALTER TABLE test_table ADD INDEX idx_ngrambf2(username) using NGRAM_BF PROPERTIES("gram_size"="2", "bf_size"="512")comment 'username ngram_bf index' 

Query all indexes that are created for a table

Syntax

SHOW INDEXES FROM <table_name>;

Example

Query all indexes that are created for the test_table table.

SHOW INDEX FROM test_table;

Delete an NGram Bloom filter index

This operation is asynchronous. For more information about how to query the index deletion progress, see Query the information about inverted indexes.

Important

If you delete NGram Bloom filter indexes, the query performance decreases. Proceed with caution.

Syntax

ALTER TABLE <table_name> DROP INDEX <index_name>;

Example

Delete the idx_ngrambf index from the test_table table.

ALTER TABLE test_table DROP INDEX idx_ngrambf;

Query the index change progress

You can execute the ALTER statement to modify an NGram Bloom filter index. This operation is asynchronous. You can execute the following statement to query the index change progress:

SHOW ALTER TABLE COLUMN;

FAQ

Q: How do I check whether an NGram Bloom filter index is used in a query?

A: You can check whether an NGram Bloom filter index is used based on a query profile. For more information about how to use query profiles, see Query profile.

Q: What do I do if the following error is reported when I execute the ALTER statement to create NGram Bloom filter indexes for an existing table?

ERROR 1105 (HY000): errCode = 2, detailMessage = NGRAM_BF index for columns (review_body ) already exist.

A: You can create multiple NGram Bloom filter indexes for a table. However, these indexes must be configured for different columns. If an NGram Bloom filter index or a Bloom filter index is already created for a column, you cannot create another NGram Bloom filter index for the column.