High-concurrency point query in ApsaraDB for SelectDB - ApsaraDB for SelectDB

This topic describes how to optimize and use high-concurrency point queries in ApsaraDB for SelectDB. This helps you optimize concurrent queries and reduce response latency.

Background information

In high-concurrency service scenarios, you need to query rows of data from the system in most cases. However, ApsaraDB for SelectDB is developed based on the column store format. If wide tables are used, the column store format amplifies random read I/O, which reduces the query speed. The query optimizer and execution engine of ApsaraDB for SelectDB are not suitable for some simple queries, such as point queries. The query optimizer needs to be able to plan a short path to process such queries. In addition, the query access layer of ApsaraDB for SelectDB is written in Java. High CPU overhead is caused when ApsaraDB for SelectDB analyzes and parses high-concurrency SQL queries. To solve these issues, ApsaraDB for SelectDB provides the row store mode, short query path, and prepared statements.

Row store

You can enable the row store mode for a table in ApsaraDB for SelectDB. However, this consumes additional storage space. To implement row store, ApsaraDB for SelectDB encodes the data stored in the row store format and stores the data in separate columns, which simplifies the implementation of row store. You can enable the row store mode for a table only when you create the table. You must configure the following property in the CREATE TABLE statement.

"store_row_column" = "true"

Optimize point queries in the Unique key model

If you enable the row store mode in the Unique key model, you can enable the merge on write (MoW) policy to reduce the I/O overhead during point queries. If the enable_unique_key_merge_on_write and store_row_column parameters are set to true in the CREATE TABLE statement in the Unique key model, the query optimizer plans a short path to optimize the execution of the point query that is performed based on the primary key, and only one remote procedure call (RPC) is required to execute the query.

The following sample code provides an example on how to enable the row store mode and MoW policy.

CREATE TABLE `tbl_point_query` (
    `key` int(11) NULL,
    `v1` decimal(27, 9) NULL,
    `v2` varchar(30) NULL,
    `v3` varchar(30) NULL,
    `v4` date NULL,
    `v5` datetime NULL,
    `v6` float NULL,
    `v7` datev2 NULL
) ENGINE=OLAP
UNIQUE KEY(`key`)
COMMENT 'OLAP'
DISTRIBUTED BY HASH(`key`) BUCKETS 16
PROPERTIES (
    "enable_unique_key_merge_on_write" = "true",
    "light_schema_change" = "true",
    "store_row_column" = "true"
);

Note

We recommend that you set the enable_unique_key_merge_on_write parameter to true to enable the storage engine to quickly perform point queries based on the primary key.
If only the primary key is involved in the conditions of a query, such as SELECT * FROM tbl_point_query WHERE id = 123, the query optimizer plans a short path to optimize the query.
We recommend that you set the light_schema_change parameter to true because the optimization of point queries based on the primary key depends on the unique column ID in the lightweight schema change to locate columns.
Point queries must contain equality conditions related to key columns in a single table. JOIN queries and nested subqueries are not supported. The WHERE clause supports only queries based on the values of the key columns, which can be considered as key-value pair queries.

Use prepared statements

To reduce the overhead of SQL parsing and expression calculation, ApsaraDB for SelectDB provides the prepared statements that are fully compatible with the MySQL protocol at the query access layer. You can use the prepared statements only when you perform point queries based on primary keys. If the prepared statement feature is enabled on the frontend (FE), SQL statements and their expressions are calculated in advance and cached in the session-level memory cache. Subsequent queries can directly use the cached objects. If the CPU limits the performance of primary key-based point queries, the performance of such queries can be improved by more than four times after the prepared statement feature is enabled.

The following example shows how to use the prepared statement feature in Java Database Connectivity (JDBC).

Specify the JDBC URL and enable the prepared statement feature on the server.
```
url = jdbc:mysql://127.0.0.1:9030/ycsb?useServerPrepStmts=true
```

Use the prepared statements.

// use `?` for placement holders, readStatement should be reused
PreparedStatement readStatement = conn.prepareStatement("select * from tbl_point_query where key = ?");
...
readStatement.setInt(1234);
ResultSet resultSet = readStatement.executeQuery();
...
readStatement.setInt(1235);
resultSet = readStatement.executeQuery();
...

Enable the row cache feature

ApsaraDB for SelectDB provides the page-level cache feature that stores data of a specific column on each page. Each page stores the data of a column. This indicates that the page-level cache is used for columns. In row store mode, a row contains data of multiple columns, and the cache may be evicted by large queries, which can reduce the hit rate. To increase the hit rate, ApsaraDB for SelectDB provides the row cache feature. The row cache feature reuses the least recently used (LRU) cache mechanism in ApsaraDB for SelectDB to ensure sufficient memory usage. You can enable the row cache feature by configuring the following parameters of the backend (BE) configurations:

disable_storage_row_cache: specifies whether to enable the row cache feature. By default, this parameter is set to false to reduce the memory overhead.
row_cache_mem_limit: the maximum percentage of memory that can be occupied by the row cache. Default value: 20, which indicates that 20% of memory can be occupied by the row cache.